
The previous lesson covered how to set up a GitHub Actions workflow for Python. Now we go deeper into the testing layer itself. Running pytest in CI is a start, but a production-quality pipeline tests across multiple environments, enforces coverage standards, and includes checks that AI consistently skips.

Test matrix

A test matrix runs your entire test suite across multiple configurations in parallel. For Python projects, the two most useful matrix dimensions are Python version and operating system.

yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
        os: [ubuntu-latest, macos-latest]
      fail-fast: false

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest

This creates six parallel jobs: three Python versions times two operating systems. Each job runs on a fresh VM with its own isolated environment.

| Setting | What it does |
| --- | --- |
| matrix.python-version | Tests against multiple Python versions |
| matrix.os | Tests against multiple operating systems |
| fail-fast: false | Lets all jobs finish even if one fails |

The fail-fast: false setting is important. By default, GitHub Actions cancels all remaining matrix jobs the moment one fails. With fail-fast: false, you see all failures at once instead of fixing them one at a time.
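
To see where the six jobs come from, the matrix can be modeled as a cross product of its two dimensions. This is only a sketch of the combinatorics, not of how GitHub Actions schedules runners; `expand_matrix` is a hypothetical helper written for this illustration.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix block.
python_versions = ["3.11", "3.12", "3.13"]
operating_systems = ["ubuntu-latest", "macos-latest"]


def expand_matrix(versions, systems):
    """Return one job description per (os, python-version) combination."""
    return [
        {"os": os_name, "python-version": version}
        for os_name, version in product(systems, versions)
    ]


jobs = expand_matrix(python_versions, operating_systems)
for job in jobs:
    print(f"{job['os']} / Python {job['python-version']}")
print(f"{len(jobs)} jobs in total")
```

Adding a third operating system would raise the count to nine, which is why it is worth listing only the versions and platforms you actually support.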

AI pitfall
AI hardcodes a single Python version (usually 3.11) and ubuntu-latest. If your library supports 3.11 through 3.13 or your users run macOS, you will not discover compatibility issues until they file bug reports.

Parallel jobs for speed

Beyond the matrix, you can split your CI into parallel jobs by responsibility. Linting, type checking, and testing are independent, so there is no reason to run them sequentially.

yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install ruff
      - run: ruff check .
      - run: ruff format --check .

  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: mypy src/

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest --cov=src --cov-report=xml

All three jobs start at the same time (the test job additionally fans out into one run per Python version). Linting usually finishes in seconds, type checking in roughly 10-30 seconds, and tests might take a minute or more. Total wall-clock time equals the slowest job, not the sum of all jobs.
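
The difference between sequential and parallel wall-clock time is simple arithmetic. The durations below are made-up values chosen to match the rough ranges mentioned above, and the sketch ignores queueing and runner start-up time.

```python
# Illustrative durations in seconds; real numbers depend on the project.
job_durations = {"lint": 5, "typecheck": 30, "test": 75}

# Run one after another: the durations add up.
sequential_time = sum(job_durations.values())

# Run side by side: the pipeline is done when the slowest job is done.
parallel_time = max(job_durations.values())

print(f"sequential: {sequential_time}s, parallel: {parallel_time}s")
```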


Coverage reports with pytest-cov

pytest-cov is a pytest plugin that measures code coverage: which lines of your source code are executed during tests and which are not.

# Install
pip install pytest-cov

# Run with coverage
pytest --cov=src --cov-report=term-missing

The --cov-report=term-missing flag shows exactly which lines are not covered:

Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
src/auth.py                45      3    93%   67-69
src/routes/users.py        82     12    85%   44-48, 91-97
src/database.py            34      0   100%
-----------------------------------------------------
TOTAL                     161     15    91%

Enforcing a coverage threshold

You can make CI fail if coverage drops below a threshold:

yaml
- run: pytest --cov=src --cov-fail-under=80

This fails the job if overall coverage is below 80%. It prevents the slow erosion that happens when every new feature adds untested code.
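
As a rough illustration of what the threshold compares, the following sketch recomputes the TOTAL line of the sample report and checks it against 80. It models plain statement coverage only and is not how pytest-cov is implemented; `coverage_percent` and `FAIL_UNDER` are names invented for this example.

```python
# (statements, missed) per file, copied from the sample report above.
report = {
    "src/auth.py": (45, 3),
    "src/routes/users.py": (82, 12),
    "src/database.py": (34, 0),
}


def coverage_percent(statements: int, missed: int) -> float:
    """Percentage of statements that were executed at least once."""
    return 100.0 * (statements - missed) / statements


total_statements = sum(stmts for stmts, _ in report.values())
total_missed = sum(missed for _, missed in report.values())
total = coverage_percent(total_statements, total_missed)

FAIL_UNDER = 80  # same value as --cov-fail-under=80
passes = total >= FAIL_UNDER

print(f"TOTAL {total_statements} {total_missed} {round(total)}%")
print("pass" if passes else "fail")
```

The check applies to the overall figure, so a single poorly covered file can hide behind well-covered ones as long as the total stays above the threshold.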

| Flag | Purpose |
| --- | --- |
| --cov=src | Measure coverage for the src/ directory |
| --cov-report=term-missing | Show uncovered lines in terminal |
| --cov-report=xml | Generate XML report for upload to Codecov |
| --cov-fail-under=80 | Fail if coverage drops below 80% |

AI pitfall
AI-generated test pipelines rarely include coverage thresholds. Without one, coverage can slide from 90% to 60% to 30% over months without anyone noticing. By the time someone does, there are hundreds of untested lines and no one remembers what they do.

Type checking with mypy

Mypy is a static type checker for Python. It reads your type annotations and verifies that function calls, return values, and variable assignments are consistent.

yaml
- name: Type check
  run: mypy src/ --strict

The --strict flag turns on a broad set of optional checks, including requiring type annotations on function definitions, checking the bodies of unannotated functions, and warning when a function returns Any. It is aggressive, but it catches the bugs that matter most in production: the ones where a function returns None when the caller expects a dict.
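
A minimal sketch of the None-versus-dict case, assuming a lookup that may come back empty. `USERS`, `find_user`, and `display_name` are hypothetical names, not part of any project in this lesson.

```python
from typing import Optional

# Hypothetical in-memory store standing in for a real database.
USERS: dict[int, dict[str, str]] = {1: {"name": "Ada"}}


def find_user(user_id: int) -> Optional[dict[str, str]]:
    """Return the user record, or None when the id is unknown."""
    return USERS.get(user_id)


def display_name(user_id: int) -> str:
    user = find_user(user_id)
    # Without this check, mypy rejects user["name"] because user may be None.
    # At runtime the unchecked version would raise TypeError for unknown ids.
    if user is None:
        return "unknown"
    return user["name"]


print(display_name(1))
print(display_name(2))
```

The annotation on `find_user` is what makes the problem visible: once the return type admits None, every caller has to handle it before mypy passes.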

Why mypy belongs in CI, not just your editor

Your editor's mypy plugin typically checks only the files you have open. CI runs mypy across the entire codebase in one pass. This catches cross-file issues: you change a function signature in auth.py, and mypy flags every caller in routes/, services/, and tests/ that passes the wrong arguments, as long as those directories are passed to mypy.

src/routes/users.py:34: error: Argument "role" to "create_user" has incompatible type "str"; expected "UserRole"
src/services/email.py:12: error: Missing return statement
tests/test_auth.py:56: error: "None" has no attribute "id"
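
The first error above can be reproduced with a small sketch. The names `create_user` and `UserRole` come from the error message; their definitions here are assumptions made for illustration.

```python
from enum import Enum


class UserRole(Enum):
    ADMIN = "admin"
    MEMBER = "member"


def create_user(name: str, role: UserRole) -> dict[str, str]:
    """Build a user record; role must be a UserRole member, not a string."""
    return {"name": name, "role": role.value}


# Correct call: passes both mypy and the runtime.
user = create_user("ada", role=UserRole.ADMIN)

# A stale caller that still passes a plain string. mypy reports this line
# statically; without mypy it only fails when the code path is executed.
try:
    create_user("ada", role="admin")
    stale_call_failed = False
except AttributeError:
    stale_call_failed = True

print(user, stale_call_failed)
```

If the stale caller lives in a rarely exercised route, the runtime failure may never show up in tests, which is the gap a full-codebase mypy run closes.
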
AI pitfall
AI almost never includes mypy in CI workflows. It adds pytest and sometimes ruff, but type checking is consistently skipped. In a FastAPI project with Pydantic models, this means your CI cannot catch type mismatches between your API schemas and your database models, exactly the kind of bug that causes 500 errors in production.

Integration tests with service containers

Unit tests mock the database. Integration tests use a real one. GitHub Actions lets you spin up service containers: Docker containers that run alongside your test job.

yaml
jobs:
  integration:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: testdb
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/testdb
        run: pytest tests/integration/ -v

The services block starts a PostgreSQL container before your steps run. The health-cmd ensures the database is ready before tests start. The DATABASE_URL environment variable tells your application where to connect.
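
On the application side, the test code has to read DATABASE_URL and turn it into connection parameters. How your project does this is not covered here; the sketch below assumes a small hypothetical helper, `database_settings`, built on the standard library only.

```python
import os
from urllib.parse import urlsplit

# Default mirrors the value set in the workflow's env block.
DEFAULT_URL = "postgresql://test:test@localhost:5432/testdb"


def database_settings(url=None):
    """Split a PostgreSQL URL into the pieces a driver usually needs."""
    parts = urlsplit(url or os.environ.get("DATABASE_URL", DEFAULT_URL))
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "dbname": parts.path.lstrip("/"),
    }


settings = database_settings(DEFAULT_URL)
print(settings)
```

The host is localhost because the job runs directly on the runner and the container's port 5432 is mapped to it by the ports entry.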

Service containers for Redis

The same pattern works for Redis, RabbitMQ, or any service available as a Docker image:

yaml
services:
  redis:
    image: redis:7
    ports:
      - 6379:6379
    options: >-
      --health-cmd "redis-cli ping"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

What AI skips in test pipelines

Here is a direct comparison of what AI generates versus what a production pipeline needs:

| What AI generates | What production needs |
| --- | --- |
| pytest (unit tests only) | Unit tests, integration tests, E2E tests |
| No coverage measurement | pytest-cov with --cov-fail-under threshold |
| No type checking | mypy --strict across the full codebase |
| No linting | ruff check . and ruff format --check . |
| Single Python version | Matrix across supported versions |
| No database in CI | Service containers for PostgreSQL, Redis |
| Sequential steps | Parallel jobs for lint, typecheck, test |


Quick reference

| Pattern | Purpose |
| --- | --- |
| strategy.matrix | Test across multiple Python versions/OS |
| fail-fast: false | See all failures, not just the first |
| pytest --cov=src | Measure test coverage |
| --cov-fail-under=80 | Fail CI if coverage drops |
| mypy src/ --strict | Catch type errors across the codebase |
| services: postgres: | Spin up a real database for integration tests |
| Parallel jobs | Run lint, typecheck, test simultaneously |