The previous lesson covered how to set up a GitHub Actions workflow for Python. Now we go deeper into the testing layer itself. Running pytest in CI is a start, but a production-quality pipeline tests across multiple environments, enforces coverage standards, and includes checks that AI consistently skips.
Test matrix
A test matrix runs your entire test suite across multiple configurations in parallel. For Python projects, the two most useful matrix dimensions are Python version and operating system.
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
        os: [ubuntu-latest, macos-latest]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest

This creates six parallel jobs: three Python versions times two operating systems. Each job runs on a fresh VM with its own isolated environment.
| Setting | What it does |
|---|---|
| matrix.python-version | Tests against multiple Python versions |
| matrix.os | Tests against multiple operating systems |
| fail-fast: false | Lets all jobs finish even if one fails |
The fail-fast: false setting is important. By default, GitHub Actions cancels all remaining matrix jobs the moment one fails. With fail-fast: false, you see all failures at once instead of fixing them one at a time.
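To make the job count concrete, the matrix expansion can be sketched in Python. This is only an illustration of the cross product that the matrix describes, not how GitHub Actions is implemented; the variable names are made up for the example.

```python
from itertools import product

# The two matrix dimensions from the workflow in this section.
python_versions = ["3.11", "3.12", "3.13"]
operating_systems = ["ubuntu-latest", "macos-latest"]

# One job is scheduled per combination of matrix values.
matrix_jobs = [
    {"python-version": version, "os": os_name}
    for version, os_name in product(python_versions, operating_systems)
]

for job in matrix_jobs:
    print(f"python {job['python-version']} on {job['os']}")

print(f"{len(matrix_jobs)} jobs in total")
```

Adding a third operating system would raise the count to nine, which is why matrix dimensions are worth choosing deliberately.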
AI-generated workflows typically test a single Python version on ubuntu-latest. If your library supports 3.11 through 3.13 or your users run macOS, you will not discover compatibility issues until they file bug reports.

Parallel jobs for speed
Beyond the matrix, you can split your CI into parallel jobs by responsibility. Linting, type checking, and testing are independent, so there is no reason to run them sequentially.
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install ruff
      - run: ruff check .
      - run: ruff format --check .
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: mypy src/
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest --cov=src --cov-report=xml

All three jobs start simultaneously, and the test job fans out into one run per Python version. Linting finishes in seconds. Type checking finishes in 10-30 seconds. Tests might take a minute or more. Total wall-clock time equals the slowest job, not the sum of all jobs.
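The wall-clock claim is simple arithmetic, sketched here in Python. The durations are hypothetical values picked to match the rough figures in the text; real timings depend on the project and on runner availability.

```python
# Hypothetical job durations in seconds (assumed for illustration only).
job_durations = {"lint": 5, "typecheck": 20, "test": 75}

# Run one after another, the durations add up.
sequential_seconds = sum(job_durations.values())

# Started together, the pipeline finishes when the slowest job does.
parallel_seconds = max(job_durations.values())

print(f"sequential: {sequential_seconds}s, parallel: {parallel_seconds}s")
```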
Coverage reports with pytest-cov
pytest-cov is a pytest plugin that measures code coverage: which lines of your source code are executed during tests and which are not.
# Install
pip install pytest-cov
# Run with coverage
pytest --cov=src --cov-report=term-missing

The --cov-report=term-missing flag shows exactly which lines are not covered:
Name                  Stmts   Miss  Cover   Missing
-----------------------------------------------------
src/auth.py              45      3    93%   67-69
src/routes/users.py      82     12    85%   44-48, 91-97
src/database.py          34      0   100%
-----------------------------------------------------
TOTAL                   161     15    91%

Enforcing a coverage threshold
You can make CI fail if coverage drops below a threshold:
- run: pytest --cov=src --cov-fail-under=80

This fails the job if overall coverage is below 80%. It prevents the slow erosion that happens when every new feature adds untested code.
| Flag | Purpose |
|---|---|
| --cov=src | Measure coverage for the src/ directory |
| --cov-report=term-missing | Show uncovered lines in terminal |
| --cov-report=xml | Generate XML report for upload to Codecov |
| --cov-fail-under=80 | Fail if coverage drops below 80% |
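As a rough model of what the threshold compares, this Python sketch recomputes the TOTAL line from the sample report and checks it against a limit. It is a simplification: the function names are invented for the example, and the rounding and branch-coverage handling of the real tool are not reproduced.

```python
# (statements, missed) per file, copied from the sample term-missing report.
report = {
    "src/auth.py": (45, 3),
    "src/routes/users.py": (82, 12),
    "src/database.py": (34, 0),
}


def total_coverage(files: dict[str, tuple[int, int]]) -> float:
    """Return the percentage of statements executed across all files."""
    statements = sum(stmts for stmts, _ in files.values())
    missed = sum(miss for _, miss in files.values())
    return 100.0 * (statements - missed) / statements


def passes_threshold(files: dict[str, tuple[int, int]], fail_under: float) -> bool:
    """Mimic the pass/fail decision: total coverage must reach the limit."""
    return total_coverage(files) >= fail_under


print(f"total: {total_coverage(report):.1f}%")
print("passes 80%:", passes_threshold(report, 80))
```

Note that the check applies to the overall figure. A single poorly covered file can hide behind well-covered neighbours, so the per-file Missing column is still worth reading.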
Type checking with mypy
Mypy is a static type checker for Python. It reads your type annotations and verifies that function calls, return values, and variable assignments are consistent.
- name: Type check
  run: mypy src/ --strict

The --strict flag enables most of mypy's optional strictness checks, such as requiring annotations on function definitions, checking the bodies of untyped functions, and warning when a typed function returns Any. It is aggressive, but it catches the bugs that matter most in production: the ones where a function returns None when the caller expects a dict.
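A minimal sketch of that None-versus-dict case, with hypothetical names (USERS, find_user, display_name) invented for illustration. The annotation tells both the reader and the type checker that the lookup may return None, so the caller has to handle it.

```python
from typing import Optional

# Hypothetical in-memory lookup table standing in for a real data source.
USERS: dict[int, dict[str, str]] = {1: {"name": "Ada"}}


def find_user(user_id: int) -> Optional[dict[str, str]]:
    """Return the user record, or None when the id is unknown."""
    return USERS.get(user_id)


def display_name(user_id: int) -> str:
    user = find_user(user_id)
    # Without this guard a type checker flags the indexing below,
    # because `user` may be None; at runtime it would raise TypeError.
    if user is None:
        return "unknown"
    return user["name"]


print(display_name(1))
print(display_name(2))
```

Removing the `if user is None` guard leaves code that works in every test that uses a known id and fails only in production, which is the kind of gap static checking is meant to close.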
Why mypy belongs in CI, not just your editor
An editor's mypy plugin typically reports errors only for the files you have open. CI runs mypy across the entire codebase in one pass. This catches cross-file issues: you change a function signature in auth.py, and mypy flags every caller in routes/, services/, and tests/ that passes the wrong arguments.
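The sketch below shows why such a stale caller goes unnoticed without a type checker. The names create_user and UserRole are borrowed from the sample mypy output in this section; their bodies are assumptions made up for the example.

```python
from enum import Enum


class UserRole(Enum):
    # Hypothetical members; the real enum is not shown in the lesson.
    ADMIN = "admin"
    MEMBER = "member"


def create_user(name: str, role: UserRole) -> dict[str, object]:
    """Build a user record; `role` is annotated as UserRole."""
    return {"name": name, "role": role}


# Up-to-date caller: passes an enum member.
ok = create_user("ada", role=UserRole.ADMIN)

# Stale caller, as might survive in another module after the signature
# changed from str to UserRole. Python executes it without complaint;
# only a static type checker rejects the str argument.
stale = create_user("bob", role="admin")

print(type(ok["role"]).__name__, type(stale["role"]).__name__)
```

Because the stale call runs cleanly, a test suite that never inspects the role value would pass. Checking the whole codebase in CI is what surfaces it.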
src/routes/users.py:34: error: Argument "role" to "create_user" has incompatible type "str"; expected "UserRole"
src/services/email.py:12: error: Missing return statement
tests/test_auth.py:56: error: "None" has no attribute "id"

Integration tests with service containers
Unit tests mock the database. Integration tests use a real one. GitHub Actions lets you spin up service containers: Docker containers that run alongside your test job.
jobs:
  integration:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: testdb
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/testdb
        run: pytest tests/integration/ -v

The services block starts a PostgreSQL container before your steps run. The health-cmd ensures the database is ready before tests start. The DATABASE_URL environment variable tells your application where to connect.
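On the application side, reading that variable might look like the following sketch. It uses only the standard library to split the URL into its parts; the function name database_settings and the fallback constant are assumptions for illustration, and a real project would usually hand the URL straight to its database driver or ORM.

```python
import os
from collections.abc import Mapping
from urllib.parse import urlsplit

# Fallback mirroring the value set in the workflow step (assumed default
# for local runs; in CI the environment variable is already present).
DEFAULT_URL = "postgresql://test:test@localhost:5432/testdb"


def database_settings(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Split DATABASE_URL into the parts a driver typically needs."""
    parts = urlsplit(env.get("DATABASE_URL", DEFAULT_URL))
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "dbname": parts.path.lstrip("/"),
    }


print(database_settings())
```

Passing the environment in as a parameter keeps the function easy to test: a unit test can supply a plain dict instead of mutating os.environ.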
Service containers for Redis
The same pattern works for Redis, RabbitMQ, or any service available as a Docker image:
services:
  redis:
    image: redis:7
    ports:
      - 6379:6379
    options: >-
      --health-cmd "redis-cli ping"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

What AI skips in test pipelines
Here is a direct comparison of what AI generates versus what a production pipeline needs:
| What AI generates | What production needs |
|---|---|
| pytest (unit tests only) | Unit tests, integration tests, E2E tests |
| No coverage measurement | pytest-cov with --cov-fail-under threshold |
| No type checking | mypy --strict across the full codebase |
| No linting | ruff check . and ruff format --check . |
| Single Python version | Matrix across supported versions |
| No database in CI | Service containers for PostgreSQL, Redis |
| Sequential steps | Parallel jobs for lint, typecheck, test |
Quick reference
| Pattern | Purpose |
|---|---|
| strategy.matrix | Test across multiple Python versions/OS |
| fail-fast: false | See all failures, not just the first |
| pytest --cov=src | Measure test coverage |
| --cov-fail-under=80 | Fail CI if coverage drops |
| mypy src/ --strict | Catch type errors across the codebase |
| services: postgres: | Spin up a real database for integration tests |
| Parallel jobs | Run lint, typecheck, test simultaneously |