Day 10 of 180 - PyTest & Virtual Environments
Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI
Introduction
There’s a moment every developer faces. You ship code to production. It works. Customers use it. And then one of them does something you didn’t expect. Your app crashes. They post a 1-star review. You don’t sleep.
Here’s the thing: that crash was waiting in your code the entire time. You just never tested that specific path. You tested the happy path (“user enters valid data, everything works”), but you never tested “user enters empty data” or “user enters huge numbers” or “user is offline.”
Today is the day you stop letting bugs hide.
Day 10 is a turning point in your journey as an engineer. You’re moving from “code that works on my machine” to “code I can prove works.” This is where professionalism begins.
You’ll learn three tools:
- Virtual environments - Keep your projects isolated, like separate apartments instead of one giant dorm
- requirements.txt + pip - Make your dependencies reproducible, like a recipe with exact measurements
- pytest - Write tests that catch bugs before users do, like crash-test dummies instead of real crashes
By the end, you’ll have a test suite that gives you confidence. And confidence is everything.
Setup
First, create a project directory and set up your environment:
# Create and navigate to project directory
mkdir ~/workspace/day10-testing
cd ~/workspace/day10-testing
# Create virtual environment
python -m venv .venv
# Activate it (macOS/Linux)
source .venv/bin/activate
# On Windows, use:
# .venv\Scripts\Activate.ps1
You should see (.venv) appear in your terminal prompt. Good: your venv is active.
Now install the tools you need:
pip install pytest==7.4.3 pytest-cov==4.1.0
Create your requirements.txt:
pytest==7.4.3
pytest-cov==4.1.0
Create your .gitignore:
.venv/
__pycache__/
*.pyc
.pytest_cache/
.coverage
htmlcov/
dist/
build/
*.egg-info/
.DS_Store
Your project structure so far:
day10-testing/
├── .venv/
├── .gitignore
├── requirements.txt
└── (code files go here)
Part 1: Virtual Environments
Why Virtual Environments Exist
Imagine you’re building two houses on the same street. The first house needs blue paint. The second needs red paint. If they shared the same paint supply, disaster: when you paint one red, the other turns red too.
Python projects are like that. Project A might need Django 3.0 (an older web framework). Project B might need Django 4.0 (a newer version with different features). If both projects share the same Python installation, you can’t install both versions at the same time.
Virtual environments solve this by creating a separate Python installation for each project. Each project gets its own site-packages/ folder (where packages live). Paint one project red, the other stays blue.
Creating a Virtual Environment
python -m venv .venv
Let me break this down:
- python - run Python itself
- -m venv - run Python’s built-in venv module
- .venv - the name of the folder to create (hidden on macOS/Linux because it starts with a dot)
What did Python create? A folder with this structure:
.venv/
├── bin/ # (on macOS/Linux) or Scripts/ (on Windows)
│ ├── python # A Python binary specific to this venv
│ ├── pip # A pip specific to this venv
│ ├── activate # Script to activate the venv
│ └── ...
├── lib/
│ └── python3.11/
│ └── site-packages/ # Empty folder where packages will install
└── pyvenv.cfg
All of this is isolated. Your system Python is untouched.
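You can verify the isolation from inside Python itself. Here's a small check (my example, not something venv ships) using only the standard library: when a venv is active, sys.prefix points inside .venv/, while sys.base_prefix still points at the installation the venv was created from.

```python
import sys

def in_virtualenv() -> bool:
    # A venv redirects sys.prefix; sys.base_prefix keeps pointing
    # at the base Python installation.
    return sys.prefix != sys.base_prefix

print(f"prefix:      {sys.prefix}")
print(f"base prefix: {sys.base_prefix}")
print(f"venv active: {in_virtualenv()}")
```

Run this with the venv activated and again after `deactivate` to see the answer flip.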
Activating and Deactivating
Activation means “use this venv’s Python instead of the system Python.”
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
After activation, your terminal prompt changes:
# Before activation:
$ python --version
Python 3.11.0
# After activation:
(.venv) $ python --version
Python 3.11.0 (from /Users/edward/workspace/day10-testing/.venv/bin/python)
That (.venv) prefix tells you: you are now using the isolated Python.
To deactivate (go back to system Python):
(.venv) $ deactivate
$ python --version
Python 3.11.0 (system Python again)
Why .venv Goes in .gitignore
Your .venv/ folder contains thousands of files. If you committed it to Git, your repo would be:
- Massive (hundreds of MB or even GB)
- Non-portable (another person’s M1 Mac can’t use your Intel Windows .venv/)
- Unnecessary (they can recreate it by installing from requirements.txt)
So: Never commit .venv/. Commit only requirements.txt.
When someone clones your repo:
git clone https://github.com/edward/my-project.git
cd my-project
# Create their own venv
python -m venv .venv
source .venv/bin/activate
# Install packages
pip install -r requirements.txt
They get the exact same environment as you, without downloading gigabytes.
A Brief Note on Conda
On your M1 Mac, you might hear about conda. Conda is like venv’s more powerful cousin-it handles non-Python dependencies (C libraries, CUDA, even system libraries). M1 Macs especially benefit from conda because it handles ARM architecture seamlessly.
For now, stick with venv. It’s built into Python and sufficient for what we’re doing. Conda is useful later when you’re installing scientific libraries like TensorFlow.
Part 2: pip and requirements.txt
The Problem: Version Hell
Let’s say you write code that uses NumPy:
import numpy as np
data = np.array([1, 2, 3])
result = data.mean()
You run pip install numpy (without specifying a version). You get version 1.26.0. Your code works perfectly.
Six months later, a colleague clones your repo and runs pip install numpy. NumPy is now at 2.0.0. There’s a breaking change-some API you used was removed. Suddenly your code breaks.
You didn’t change your code. NumPy changed. And now you’re debugging at midnight.
The Solution: Exact Version Pinning
# Instead of
pip install numpy
# Do this
pip install numpy==1.26.0
The ==1.26.0 means: “Install version 1.26.0, exactly. Nothing else.”
Now if someone else installs the same requirements, they get 1.26.0 too. Same code, same dependencies, same behavior.
Creating requirements.txt
Option 1: Freeze your current environment
pip freeze > requirements.txt
This command captures everything you’ve installed, with versions:
certifi==2024.2.2
charset-normalizer==3.3.2
idna==3.6
pytest==7.4.3
pytest-cov==4.1.0
requests==2.31.0
urllib3==2.1.0
Any package you’ve installed appears. Some you might not have explicitly asked for-they’re dependencies of dependencies (transitive dependencies).
Option 2: Write it by hand
For a fresh project, just list the packages you actually need:
pytest==7.4.3
pytest-cov==4.1.0
Later, if you add other packages, update this file.
Installing from requirements.txt
Once you’ve created requirements.txt, sharing your environment is one command:
pip install -r requirements.txt
-r means “read from file.” All packages with exact versions install. Reproducibility achieved.
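Part of the appeal of requirements.txt is how simple the pinned format is: each line is just name==version. As a quick illustration, here's a toy parser (my sketch, not anything pip provides) for a fully pinned file:

```python
def parse_pins(text: str) -> dict:
    """Toy parser for a fully pinned requirements file: 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name] = version
    return pins

requirements = "pytest==7.4.3\npytest-cov==4.1.0\n"
print(parse_pins(requirements))  # {'pytest': '7.4.3', 'pytest-cov': '4.1.0'}
```

Real requirements files support more syntax (version ranges, extras, comments after pins), but for pinned files this is the whole story.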
Other Useful pip Commands
# List what you have installed
pip list
# Show details about a specific package
pip show pytest
# Output:
# Name: pytest
# Version: 7.4.3
# Summary: pytest: simple powerful testing with Python
# ...
# Uninstall a package
pip uninstall numpy
# See what's outdated
pip list --outdated
# Update a specific package
pip install --upgrade pytest
# or
pip install -U pytest
requirements.txt vs pyproject.toml
Newer Python projects use pyproject.toml instead of requirements.txt. It’s more powerful and standards-based (PEP 518 introduced the file; PEP 621 defines the [project] metadata).
pyproject.toml looks like this:
[project]
name = "my-project"
version = "1.0.0"
dependencies = [
"pytest==7.4.3",
"pytest-cov==4.1.0",
]
For Day 10, we’re sticking with requirements.txt-simpler, universally understood, and fully supported everywhere. You’ll learn pyproject.toml later.
Part 3: pytest - Your New Testing Standard
Why Testing Matters: The Crash-Test Dummy Analogy
Before a car ships to customers, engineers do something that seems wasteful: they crash it.
They strap a dummy (made of plastic and sensors) into the car. They launch it into a wall at 30 mph. The dummy experiences the impact. Sensors record everything-forces on the head, chest, legs. Engineers study the data. If the dummy’s chest took too much force, they redesign the airbags.
Then they crash it again. And again. They test every scenario: head-on collision, side impact, rollover. Each test teaches them something.
By the time the car reaches you, it’s been “crashed” hundreds of times. But those crashes were tests, not failures.
Code is the same. pytest is your crash-test dummy.
Without Tests: Bugs Hide Until Production
Let me show you a real bug:
def calculate_average(numbers):
"""Calculate the average of a list of numbers."""
return sum(numbers) / len(numbers)
# Happy path: works fine
result = calculate_average([1, 2, 3])
print(result) # 2.0
# But what if someone passes an empty list?
result = calculate_average([]) # ZeroDivisionError!
You test the happy path manually. You see it works. You ship it. A user passes an empty list. Your app crashes. You find out via a 1-star review.
It’s not a failure of effort. But it is a bug that tests would have caught immediately.
With Tests: The Bug Gets Caught During Development
Now, with pytest:
import logging
import pytest
logger = logging.getLogger(__name__)
def calculate_average(numbers: list) -> float:
"""Calculate the average of a list of numbers.
Args:
numbers: List of numeric values
Returns:
The average as a float
Raises:
ValueError: If the list is empty
"""
if not numbers:
logger.error("Cannot calculate average of empty list")
raise ValueError("Cannot calculate average of empty list")
logger.debug(f"Calculating average of {len(numbers)} values")
return sum(numbers) / len(numbers)
# Test it
def test_calculate_average_happy_path():
"""Test normal case."""
result = calculate_average([1, 2, 3])
assert result == 2.0
logger.info("Happy path test passed")
def test_calculate_average_empty_list():
"""Test edge case: empty list."""
with pytest.raises(ValueError):
calculate_average([])
logger.info("Empty list edge case handled correctly")
When you run pytest -v:
tests/test_stats.py::test_calculate_average_happy_path PASSED
tests/test_stats.py::test_calculate_average_empty_list PASSED
======================== 2 passed in 0.05s ========================
You find the bug during development, not in production. You log it. You fix it. Users never see it. Everyone’s happy.
Test Discovery: Naming Matters
pytest finds tests automatically based on naming conventions. Follow these rules:
- Test files are named test_*.py or *_test.py
- Test functions are named test_*
- Test classes are named Test*
pytest will find these:
project/
├── test_mean.py ✓ Starts with test_
├── tests/test_variance.py ✓ File starts with test_
├── stats_test.py ✓ Ends with _test
│
└── THESE GET IGNORED:
├── check_mean.py ✗ Doesn't follow pattern
├── testmean.py ✗ Missing the underscore (no test_* or *_test)
└── tests/stats.py ✗ Doesn't start with test_
Break this rule, and pytest won’t find your tests.
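You can sanity-check a filename against the two default patterns with the standard library's fnmatch. This mimics pytest's default file matching by name only; it isn't pytest's actual discovery code:

```python
from fnmatch import fnmatch

def looks_like_test_file(filename: str) -> bool:
    # pytest's default file patterns are test_*.py and *_test.py
    return fnmatch(filename, "test_*.py") or fnmatch(filename, "*_test.py")

for name in ["test_mean.py", "stats_test.py", "check_mean.py", "stats.py"]:
    print(name, looks_like_test_file(name))
```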
Your First Test: The AAA Pattern
The gold standard for test structure is Arrange → Act → Assert:
def test_mean_of_positive_integers():
# ARRANGE: Set up the data you're testing
numbers = [1, 2, 3, 4, 5]
# ACT: Call the function
result = mean(numbers)
# ASSERT: Check the result
assert result == 3.0
Breaking it down:
- Arrange: Create any inputs or objects the function needs
- Act: Call the function
- Assert: Verify the output is what you expect
This structure makes tests easy to read and understand.
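Here's the same AAA test made self-contained, with a minimal mean() stub (the full implementation comes later in the project) so you can run it right now:

```python
def mean(numbers):
    """Minimal stand-in for the function under test."""
    return sum(numbers) / len(numbers)

def test_mean_of_positive_integers():
    numbers = [1, 2, 3, 4, 5]   # ARRANGE: set up the data
    result = mean(numbers)      # ACT: call the function
    assert result == 3.0        # ASSERT: check the result

test_mean_of_positive_integers()  # pytest would discover and call this for you
print("test passed")
```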
Running Tests
# Run all tests in the current directory
pytest
# Run all tests, verbose (shows each test name)
pytest -v
# Run only one file
pytest tests/test_mean.py
# Run one specific test
pytest tests/test_mean.py::test_mean_of_positive_integers
# Run tests matching a pattern
pytest -k "mean" # Runs test_mean_*, *_mean, etc.
# Stop on the first failure
pytest -x
# Show print statements (normally hidden)
pytest -s
# Quiet mode (only show summary)
pytest -q
Example output from pytest -v:
tests/test_stats.py::test_mean_happy_path PASSED [ 10%]
tests/test_stats.py::test_mean_empty_list FAILED [ 20%]
tests/test_stats.py::test_mean_single_number PASSED [ 30%]
tests/test_stats.py::test_variance_happy_path PASSED [ 40%]
tests/test_stats.py::test_bayes_update_basic PASSED [ 50%]
======= FAILED tests/test_stats.py::test_mean_empty_list ========
def test_mean_empty_list():
with pytest.raises(ValueError):
mean([])
E Failed: DID NOT RAISE <class 'ValueError'>
=========== 1 failed, 4 passed in 0.23s ============
This tells you: “The function didn’t raise ValueError when you expected it to. Go fix that.”
Testing for Exceptions
Many functions should raise exceptions for bad input. You test that they do:
def test_mean_raises_on_empty_list():
"""The function should reject empty lists."""
with pytest.raises(ValueError):
mean([])
The with pytest.raises(ValueError): context manager says: “I expect this code to raise ValueError. If it does, the test passes. If it doesn’t, the test fails.”
You can even check the error message:
def test_mean_raises_with_correct_message():
"""Check the error message too."""
with pytest.raises(ValueError, match="Cannot calculate mean"):
mean([])
match takes a regex pattern. The error message must contain “Cannot calculate mean” or the test fails.
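Under the hood, match is applied with re.search against the string form of the exception, which is why a plain substring works and so does any regex. A quick illustration with the re module directly:

```python
import re

message = "Cannot calculate mean of empty list"

# match="Cannot calculate mean" passes because re.search finds the pattern
print(bool(re.search("Cannot calculate mean", message)))  # True

# A real regex works too: "empty" followed by whitespace and "list"
print(bool(re.search(r"empty\s+list", message)))          # True
```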
Parametrised Tests: DRY Testing
Parametrised tests let you test many inputs with one test function.
Without parametrisation (repetitive):
def test_mean_case_1():
assert mean([1, 2, 3]) == 2.0
def test_mean_case_2():
assert mean([10, 20]) == 15.0
def test_mean_case_3():
assert mean([5]) == 5.0
def test_mean_case_4():
assert mean([-1, -2, -3]) == -2.0
# ... 10 more similar tests
With parametrisation (clean):
@pytest.mark.parametrize("numbers,expected", [
([1, 2, 3], 2.0),
([10, 20], 15.0),
([5], 5.0),
([-1, -2, -3], -2.0),
([0, 0, 0], 0.0),
([1.5, 2.5, 3.5], 2.5),
])
def test_mean_multiple_cases(numbers, expected):
assert mean(numbers) == expected
pytest runs this one test function six times-once for each tuple. Much cleaner.
You can parametrize multiple arguments:
@pytest.mark.parametrize("p_value,alpha,expected", [
    (0.03, 0.05, True),
    (0.08, 0.05, False),
    (0.08, 0.10, True),
])
def test_is_significant_with_alpha(p_value, alpha, expected):
    result = is_significant(p_value, alpha)
    assert result is expected
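One helper worth knowing before we go further: pytest.approx. It exists because floating-point arithmetic rounds, so exact == comparisons on computed floats are fragile. The standard library's math.isclose makes the same point without pytest installed:

```python
import math

# Binary floats round: 0.1 + 0.2 is actually 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False: exact comparison fails
print(math.isclose(0.1 + 0.2, 0.3))  # True: tolerant comparison passes

# In a test, pytest.approx(0.3) plays the same role inside an assert:
#   assert 0.1 + 0.2 == pytest.approx(0.3)
```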
Fixtures: Reusable Test Setup
If many tests need the same data, use a fixture:
import pytest
@pytest.fixture
def sample_dataset():
"""Fixture: provides sample data to tests that ask for it."""
return [1, 2, 3, 4, 5]
def test_mean_with_fixture(sample_dataset):
"""This test receives sample_dataset automatically."""
result = mean(sample_dataset)
assert result == 3.0
def test_variance_with_fixture(sample_dataset):
"""Different test, same fixture."""
result = variance(sample_dataset)
assert result == 2.0
The sample_dataset parameter tells pytest: “I need the fixture named sample_dataset.” pytest calls the fixture function and injects the return value.
Why fixtures are great:
- DRY: Define data once, use in many tests
- Readable: Test code is cleaner without setup clutter
- Maintainable: Change the fixture once, all tests update
- Shareable: Fixtures work across multiple test files
conftest.py: Sharing Fixtures Across Test Files
If you have multiple test files and want to share fixtures:
project/
├── stats.py
├── tests/
├── conftest.py ← Fixtures defined here
├── test_mean.py ← Uses fixtures from conftest
├── test_variance.py ← Uses fixtures from conftest
└── test_bayes.py
tests/conftest.py:
import pytest
@pytest.fixture
def simple_numbers():
return [1.0, 2.0, 3.0]
@pytest.fixture
def large_numbers():
return list(range(1, 101))
@pytest.fixture
def negative_numbers():
return [-5.0, -2.0, 0.0, 2.0, 5.0]
tests/test_mean.py:
def test_mean(simple_numbers):
# No import needed, fixture comes from conftest.py
assert mean(simple_numbers) == 2.0
tests/test_variance.py:
def test_variance(simple_numbers):
# Same fixture, different file
result = variance(simple_numbers)
assert result == pytest.approx(2.0/3.0)
pytest automatically finds fixtures in conftest.py. No imports needed.
Mocking: Testing Code That Calls External Services
Here’s a brief intro (we’ll do this more on Day 14):
Sometimes your code calls an external API:
import requests
def fetch_stock_price(ticker):
"""Call an external API to get stock price."""
response = requests.get(f"https://api.example.com/price/{ticker}")
return response.json()["price"]
In tests, you don’t want to hit the real API. It’s slow, unreliable, and might cost money. Instead, you mock it:
from unittest.mock import patch
def test_fetch_stock_price():
"""Test without calling the real API."""
with patch('requests.get') as mock_get:
# Make requests.get return fake data
mock_get.return_value.json.return_value = {"price": 150.0}
result = fetch_stock_price("AAPL")
assert result == 150.0
The test never touches the real API. It’s fast (milliseconds), isolated, and reliable.
Test Coverage: Proving Your Code Is Tested
You write tests. But how do you know if you’ve tested enough?
Code coverage measures what percentage of your code is executed by tests.
pip install pytest-cov
pytest --cov=stats --cov-report=term-missing
Output:
Name Stmts Miss Cover Missing
stats.py 95 5 94% 45-46, 78, 102-104
This says:
- stats.py has 95 statements (lines of code)
- 5 are not executed by any test (missing)
- Coverage is 94%
- Lines 45-46, 78, 102-104 are untested
You then write tests for those lines until coverage is 100% (or as close as practical).
Aim for:
- Green (90%+): Excellent
- Yellow (70-89%): Good
- Red (<70%): Needs work
Good Test Names: Be Descriptive
A good test name tells you exactly what’s being tested and what’s expected.
Bad names:
def test_mean():
pass
def test_variance():
pass
def test_1():
pass
When they fail, you have no idea what broke.
Good names:
def test_mean_of_positive_integers_returns_correct_average():
pass
def test_variance_of_empty_list_raises_valueerror():
pass
def test_bayes_update_with_strong_evidence_increases_posterior():
pass
When they fail, you know exactly what broke. The test name is a specification of what the function should do.
Pattern: test_[function_name]_[condition]_[expected_result]
FIRST Principles: What Makes a Good Test
F - Fast: Tests should run in milliseconds. If a test takes seconds, you won’t run them often. You’ll skip them. Bugs hide.
I - Isolated: Tests shouldn’t depend on each other. If test A must run before test B, you have a problem. Tests should be independent.
R - Repeatable: Same test, same result every time. No random variation. No flaky tests that sometimes pass and sometimes fail.
S - Self-Validating: A test either passes or fails. No manual checking (“did this look right?”). No human judgment.
T - Timely: Ideally, write tests before the code (Test-Driven Development). At minimum, write tests alongside the code, not months later. Fresh context makes better tests.
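"Repeatable" in practice usually means controlling randomness and external state. A common trick (my example, not from the post) is to seed any random source the code uses, so the test produces identical results on every run and every machine:

```python
import random

def noisy_sample(seed: int) -> list:
    """Simulate a function with internal randomness, made deterministic by a seed."""
    rng = random.Random(seed)  # a private, seeded generator, not the global one
    return [rng.randint(1, 100) for _ in range(5)]

def test_noisy_sample_is_repeatable():
    # Same seed in, same output out, every single run
    assert noisy_sample(42) == noisy_sample(42)

test_noisy_sample_is_repeatable()
print("repeatable")
```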
The Project: Testing the Stats Functions from Day 9
Today you’ll build a complete pytest test suite for the statistics functions from Day 9.
Project Structure
day10-testing/
├── .venv/
├── .gitignore
├── requirements.txt
├── stats.py ← Code being tested
└── tests/
├── __init__.py ← Empty file, makes tests/ a package
├── conftest.py ← Shared fixtures
└── test_stats.py ← All tests
stats.py - The Code
Create stats.py with full type hints and logging:
"""
Statistics module with type hints and logging.
Day 10: Production-grade Python with testing.
"""
import logging
from typing import List, Optional, Tuple
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
def mean(numbers: List[float]) -> float:
"""
Calculate the arithmetic mean of a list of numbers.
Args:
numbers: List of floats or ints
Returns:
The mean as a float
Raises:
ValueError: If the list is empty
TypeError: If numbers contains non-numeric values
Example:
>>> mean([1, 2, 3])
2.0
"""
if not numbers:
logger.error("Cannot calculate mean of empty list")
raise ValueError("Cannot calculate mean of empty list")
try:
total = sum(numbers)
result = total / len(numbers)
logger.debug(f"Calculated mean of {len(numbers)} values: {result}")
return result
except TypeError as e:
logger.error(f"Invalid data type in numbers list: {e}")
raise TypeError("All elements must be numeric") from e
def variance(numbers: List[float]) -> float:
"""
Calculate the variance (average squared deviation from mean).
Args:
numbers: List of floats or ints
Returns:
The variance as a float
Raises:
ValueError: If the list is empty or has only one element
TypeError: If numbers contains non-numeric values
Example:
>>> variance([1, 2, 3])
0.6666666666666666
"""
if not numbers:
logger.error("Cannot calculate variance of empty list")
raise ValueError("Cannot calculate variance of empty list")
if len(numbers) < 2:
logger.error("Variance requires at least 2 values")
raise ValueError("Variance requires at least 2 values (population has no variance)")
try:
m = mean(numbers)
squared_diffs = [(x - m) ** 2 for x in numbers]
result = sum(squared_diffs) / len(numbers)
logger.debug(f"Calculated variance: {result}")
return result
except TypeError as e:
logger.error(f"Invalid data type in numbers list: {e}")
raise TypeError("All elements must be numeric") from e
def std_dev(numbers: List[float]) -> float:
"""
Calculate the standard deviation (square root of variance).
Args:
numbers: List of floats or ints
Returns:
The standard deviation as a float
Raises:
ValueError: If the list is empty or has only one element
TypeError: If numbers contains non-numeric values
Example:
>>> std_dev([1, 2, 3])
0.8164965809004287
"""
if not numbers:
logger.error("Cannot calculate std dev of empty list")
raise ValueError("Cannot calculate std dev of empty list")
    # variance() already raises ValueError/TypeError for bad input;
    # let those propagate rather than catching and re-raising.
    var = variance(numbers)
    result = var ** 0.5
    logger.debug(f"Calculated std dev: {result}")
    return result
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
"""
Determine if a p-value is statistically significant.
In plain English: Is this result unlikely to happen by chance?
If p_value < alpha, we say "yes, this is surprising" (significant).
Args:
p_value: The p-value (must be between 0 and 1)
alpha: The significance level (default 0.05 = 5%)
Returns:
True if p_value < alpha, False otherwise
Raises:
ValueError: If p_value or alpha are outside [0, 1]
Example:
>>> is_significant(0.03, 0.05)
True
>>> is_significant(0.08, 0.05)
False
"""
if not (0 <= p_value <= 1):
logger.error(f"p_value {p_value} is outside [0, 1]")
raise ValueError("p_value must be between 0 and 1")
if not (0 <= alpha <= 1):
logger.error(f"alpha {alpha} is outside [0, 1]")
raise ValueError("alpha must be between 0 and 1")
result = p_value < alpha
logger.debug(f"p_value={p_value}, alpha={alpha} → significant={result}")
return result
def bayes_update(
prior: float,
likelihood: float,
likelihood_complement: Optional[float] = None
) -> float:
"""
Update a prior probability using Bayes' Theorem.
In plain English: You had a guess (prior). You saw new evidence (likelihood).
What should your new guess be (posterior)?
Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
Args:
prior: Your initial guess (0 to 1)
likelihood: Probability of evidence given your guess (0 to 1)
likelihood_complement: Probability of evidence given NOT your guess.
If None, assumed to be 1 - likelihood
Returns:
The updated probability (posterior)
Raises:
ValueError: If any probability is outside [0, 1]
    Example:
        >>> round(bayes_update(0.5, 0.9), 4)
        0.9
"""
# Validate inputs
if not (0 <= prior <= 1):
logger.error(f"prior {prior} is outside [0, 1]")
raise ValueError("prior must be between 0 and 1")
if not (0 <= likelihood <= 1):
logger.error(f"likelihood {likelihood} is outside [0, 1]")
raise ValueError("likelihood must be between 0 and 1")
# Default likelihood_complement
if likelihood_complement is None:
likelihood_complement = 1 - likelihood
if not (0 <= likelihood_complement <= 1):
logger.error(f"likelihood_complement {likelihood_complement} is outside [0, 1]")
raise ValueError("likelihood_complement must be between 0 and 1")
# Bayes' Theorem
posterior = (
likelihood * prior /
(likelihood * prior + likelihood_complement * (1 - prior))
)
logger.info(
f"Bayes update: prior={prior} → posterior={posterior:.4f}"
)
return posterior
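Before writing tests for bayes_update, it's worth sanity-checking the arithmetic by hand. Here's a trimmed copy of the formula (validation and logging stripped, and renamed bayes_posterior so it's clearly my illustration, not the module's function):

```python
from typing import Optional

def bayes_posterior(prior: float, likelihood: float,
                    likelihood_complement: Optional[float] = None) -> float:
    """Bare Bayes' rule, without the validation/logging of the full version."""
    if likelihood_complement is None:
        likelihood_complement = 1 - likelihood
    return (likelihood * prior /
            (likelihood * prior + likelihood_complement * (1 - prior)))

# Even prior, strong evidence: 0.45 / (0.45 + 0.05) = 0.9
print(round(bayes_posterior(0.5, 0.9), 4))
# Low prior, strong evidence: 0.08 / (0.08 + 0.18), belief rises but stays moderate
print(round(bayes_posterior(0.1, 0.8), 4))
```

Working these out on paper first gives you known-good expected values to assert against in the test suite.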
tests/conftest.py - Shared Fixtures
Create tests/conftest.py:
"""
Pytest configuration and shared fixtures for stats tests.
"""
import pytest
from typing import List
@pytest.fixture
def simple_numbers() -> List[float]:
"""Simple dataset for testing."""
return [1.0, 2.0, 3.0, 4.0, 5.0]
@pytest.fixture
def large_numbers() -> List[float]:
"""Larger dataset for testing edge cases."""
return list(range(1, 101)) # 1 to 100
@pytest.fixture
def negative_numbers() -> List[float]:
"""Numbers including negatives."""
return [-5.0, -2.0, 0.0, 2.0, 5.0]
@pytest.fixture
def single_number() -> List[float]:
"""Single value (edge case)."""
return [42.0]
@pytest.fixture
def duplicate_numbers() -> List[float]:
"""All values the same."""
return [7.0, 7.0, 7.0, 7.0]
tests/test_stats.py - Complete Test Suite
Create tests/test_stats.py:
"""
Complete test suite for stats module.
Day 10: pytest introduction with happy paths, edge cases, and parametrisation.
"""
import pytest
import logging
from typing import List
from stats import mean, variance, std_dev, is_significant, bayes_update
# Get logger for test output
logger = logging.getLogger(__name__)
# ============================================================================
# TESTS FOR mean()
# ============================================================================
class TestMean:
"""Test the mean() function."""
def test_mean_happy_path(self, simple_numbers):
"""Test normal case: positive integers."""
result = mean(simple_numbers)
logger.info(f"mean(simple_numbers) = {result}")
assert result == 3.0
def test_mean_single_number(self, single_number):
"""Edge case: single value."""
result = mean(single_number)
logger.info(f"mean(single_number) = {result}")
assert result == 42.0
def test_mean_all_same(self, duplicate_numbers):
"""Edge case: all values identical."""
result = mean(duplicate_numbers)
logger.info(f"mean(duplicate_numbers) = {result}")
assert result == 7.0
def test_mean_with_negatives(self, negative_numbers):
"""Include negative numbers."""
result = mean(negative_numbers)
logger.info(f"mean(negative_numbers) = {result}")
assert result == pytest.approx(0.0)
def test_mean_empty_list_raises(self):
"""Empty list should raise ValueError."""
logger.info("Testing mean([]) raises ValueError")
with pytest.raises(ValueError, match="Cannot calculate mean"):
mean([])
def test_mean_with_floats(self):
"""Floats should work correctly."""
result = mean([1.5, 2.5, 3.5])
logger.info(f"mean([1.5, 2.5, 3.5]) = {result}")
assert result == pytest.approx(2.5)
@pytest.mark.parametrize("numbers,expected", [
([1, 2, 3], 2.0),
([10, 20], 15.0),
([5], 5.0),
([-1, -2, -3], -2.0),
([0, 0, 0], 0.0),
([1.1, 2.2, 3.3], 2.2),
])
def test_mean_parametrised(self, numbers, expected):
"""Parametrised: test many cases at once."""
result = mean(numbers)
logger.info(f"mean({numbers}) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_mean_non_numeric_raises(self):
"""Non-numeric values should raise TypeError."""
logger.info("Testing mean() with non-numeric values")
with pytest.raises(TypeError):
mean([1, 2, "three"])
# ============================================================================
# TESTS FOR variance()
# ============================================================================
class TestVariance:
"""Test the variance() function."""
def test_variance_happy_path(self, simple_numbers):
"""Test normal case."""
result = variance(simple_numbers)
expected = 2.0
logger.info(f"variance(simple_numbers) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_variance_two_elements(self):
"""Edge case: minimum valid input."""
result = variance([1.0, 3.0])
expected = 1.0
logger.info(f"variance([1.0, 3.0]) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_variance_all_same(self, duplicate_numbers):
"""Edge case: variance of identical values is zero."""
result = variance(duplicate_numbers)
logger.info(f"variance(duplicate_numbers) = {result}, expected = 0.0")
assert result == pytest.approx(0.0)
def test_variance_empty_list_raises(self):
"""Empty list should raise ValueError."""
logger.info("Testing variance([]) raises ValueError")
with pytest.raises(ValueError, match="Cannot calculate variance"):
variance([])
def test_variance_single_element_raises(self):
"""Single element is not enough for variance."""
logger.info("Testing variance([42.0]) raises ValueError")
with pytest.raises(ValueError, match="at least 2 values"):
variance([42.0])
def test_variance_negative_numbers(self, negative_numbers):
"""Variance of numbers including negatives."""
result = variance(negative_numbers)
expected = 10.4
logger.info(f"variance(negative_numbers) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
@pytest.mark.parametrize("numbers,expected", [
([1, 2, 3], 2.0/3.0),
([1, 3], 1.0),
([0, 0], 0.0),
([2, 2, 2, 2], 0.0),
])
def test_variance_parametrised(self, numbers, expected):
"""Parametrised variance tests."""
result = variance(numbers)
logger.info(f"variance({numbers}) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
# ============================================================================
# TESTS FOR std_dev()
# ============================================================================
class TestStdDev:
"""Test the std_dev() function."""
def test_std_dev_happy_path(self, simple_numbers):
"""Test normal case."""
result = std_dev(simple_numbers)
expected = 2.0 ** 0.5
logger.info(f"std_dev(simple_numbers) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_std_dev_all_same(self, duplicate_numbers):
"""Edge case: std dev of identical values is zero."""
result = std_dev(duplicate_numbers)
logger.info(f"std_dev(duplicate_numbers) = {result}, expected = 0.0")
assert result == pytest.approx(0.0)
def test_std_dev_empty_list_raises(self):
"""Empty list should raise ValueError."""
logger.info("Testing std_dev([]) raises ValueError")
with pytest.raises(ValueError):
std_dev([])
def test_std_dev_single_element_raises(self):
"""Single element is not enough."""
logger.info("Testing std_dev([42.0]) raises ValueError")
with pytest.raises(ValueError):
std_dev([42.0])
# ============================================================================
# TESTS FOR is_significant()
# ============================================================================
class TestIsSignificant:
"""Test the is_significant() function."""
def test_significant_below_threshold(self):
"""p-value < alpha → True."""
result = is_significant(0.03, 0.05)
logger.info(f"is_significant(0.03, 0.05) = {result}, expected = True")
assert result is True
def test_not_significant_above_threshold(self):
"""p-value > alpha → False."""
result = is_significant(0.08, 0.05)
logger.info(f"is_significant(0.08, 0.05) = {result}, expected = False")
assert result is False
def test_boundary_equals_alpha(self):
"""p-value == alpha → False (not strictly less)."""
result = is_significant(0.05, 0.05)
logger.info(f"is_significant(0.05, 0.05) = {result}, expected = False")
assert result is False
def test_boundary_just_below_alpha(self):
"""Just below alpha threshold."""
result = is_significant(0.049999, 0.05)
logger.info(f"is_significant(0.049999, 0.05) = {result}, expected = True")
assert result is True
def test_custom_alpha(self):
"""Different significance level."""
result = is_significant(0.08, alpha=0.10)
logger.info(f"is_significant(0.08, alpha=0.10) = {result}, expected = True")
assert result is True
def test_p_value_zero(self):
"""p-value = 0 is always significant."""
result = is_significant(0.0, 0.05)
logger.info(f"is_significant(0.0, 0.05) = {result}, expected = True")
assert result is True
def test_p_value_one(self):
"""p-value = 1 is never significant."""
result = is_significant(1.0, 0.05)
logger.info(f"is_significant(1.0, 0.05) = {result}, expected = False")
assert result is False
def test_invalid_p_value_raises(self):
"""p-value outside [0, 1] raises ValueError."""
logger.info("Testing is_significant with invalid p_value")
with pytest.raises(ValueError, match="p_value must be between"):
is_significant(-0.05, 0.05)
with pytest.raises(ValueError, match="p_value must be between"):
is_significant(1.5, 0.05)
def test_invalid_alpha_raises(self):
"""alpha outside [0, 1] raises ValueError."""
logger.info("Testing is_significant with invalid alpha")
with pytest.raises(ValueError, match="alpha must be between"):
is_significant(0.05, -0.05)
@pytest.mark.parametrize("p_value,alpha,expected", [
(0.01, 0.05, True),
(0.05, 0.05, False),
(0.10, 0.05, False),
(0.001, 0.01, True),
(0.0, 0.05, True),
(1.0, 0.05, False),
])
def test_is_significant_parametrised(self, p_value, alpha, expected):
"""Parametrised tests for various thresholds."""
result = is_significant(p_value, alpha)
logger.info(f"is_significant({p_value}, {alpha}) = {result}, expected = {expected}")
assert result is expected
# ============================================================================
# TESTS FOR bayes_update()
# ============================================================================
class TestBayesUpdate:
"""Test the bayes_update() function."""
def test_bayes_update_basic(self):
"""Test standard Bayes' Theorem calculation."""
result = bayes_update(0.5, 0.9)
expected = 0.5 * 0.9 / (0.5 * 0.9 + 0.5 * 0.1)
logger.info(f"bayes_update(0.5, 0.9) = {result}, expected ≈ {expected}")
assert result == pytest.approx(expected)
assert result == pytest.approx(0.9)
def test_bayes_update_weak_prior(self):
"""Low prior gets updated strongly by evidence."""
result = bayes_update(prior=0.1, likelihood=0.8)
expected = 0.8 * 0.1 / (0.8 * 0.1 + 0.2 * 0.9)
logger.info(f"bayes_update(0.1, 0.8) = {result}, expected ≈ {expected}")
assert result == pytest.approx(expected)
assert result > 0.1 # Posterior > prior
def test_bayes_update_strong_prior(self):
"""High prior stays high unless evidence is strong."""
result = bayes_update(prior=0.9, likelihood=0.6)
logger.info(f"bayes_update(0.9, 0.6) = {result}, expected > 0.9")
assert result > 0.9 # Still high
def test_bayes_update_explicit_complement(self):
"""Can specify likelihood_complement explicitly."""
result1 = bayes_update(0.5, 0.8, likelihood_complement=0.3)
result2 = bayes_update(0.5, 0.8) # Default: 1 - 0.8 = 0.2
logger.info(f"explicit complement: {result1}, default complement: {result2}")
assert result1 != result2
def test_bayes_update_zero_prior(self):
"""Posterior is zero if prior is zero."""
result = bayes_update(0.0, 0.9)
logger.info(f"bayes_update(0.0, 0.9) = {result}, expected = 0.0")
assert result == 0.0
def test_bayes_update_one_prior(self):
"""Posterior stays high if prior is very high."""
result = bayes_update(1.0, 0.5)
logger.info(f"bayes_update(1.0, 0.5) = {result}, expected = 1.0")
assert result == 1.0
def test_bayes_update_invalid_prior_raises(self):
"""Prior outside [0, 1] raises ValueError."""
logger.info("Testing bayes_update with invalid prior")
with pytest.raises(ValueError, match="prior must be between"):
bayes_update(-0.1, 0.5)
with pytest.raises(ValueError, match="prior must be between"):
bayes_update(1.5, 0.5)
def test_bayes_update_invalid_likelihood_raises(self):
"""Likelihood outside [0, 1] raises ValueError."""
logger.info("Testing bayes_update with invalid likelihood")
with pytest.raises(ValueError, match="likelihood must be between"):
bayes_update(0.5, 1.5)
def test_bayes_update_invalid_complement_raises(self):
"""Complement outside [0, 1] raises ValueError."""
logger.info("Testing bayes_update with invalid complement")
with pytest.raises(ValueError, match="likelihood_complement must be between"):
bayes_update(0.5, 0.8, likelihood_complement=1.5)
@pytest.mark.parametrize("prior,likelihood,expected_comparison", [
(0.5, 0.9, "greater"), # Strong evidence increases posterior
(0.5, 0.5, "equal"), # Equal evidence keeps it steady
(0.5, 0.1, "less"), # Weak evidence decreases posterior
])
def test_bayes_update_parametrised(self, prior, likelihood, expected_comparison):
"""Parametrised Bayes update tests."""
result = bayes_update(prior, likelihood)
logger.info(f"bayes_update({prior}, {likelihood}) = {result}")
        if expected_comparison == "greater":
            assert result > prior  # strong evidence should strictly raise the posterior
        elif expected_comparison == "equal":
            assert result == pytest.approx(prior)
        else:  # "less"
            assert result < prior  # weak evidence should strictly lower the posterior
Create tests/__init__.py
touch tests/__init__.py
This makes tests/ a Python package.
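One assumption worth making explicit: the test file calls `logger.info(...)` throughout, so it needs a module-level logger near the top of `tests/test_stats.py`. A minimal setup looks like this (the exact format string is my choice; the original file may configure logging differently):

```python
import logging

# Configure root logging once so `pytest -s` (or log_cli) shows the messages.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")

# Module-level logger used by every test in this file.
logger = logging.getLogger(__name__)
```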
Running the Tests
# Make sure you're in the project root
cd ~/workspace/day10-testing
# Make sure venv is activated
source .venv/bin/activate
# Run all tests
pytest -v
# Run with coverage
pytest --cov=stats --cov-report=term-missing
# Run one test class
pytest -v tests/test_stats.py::TestMean
# Run one test
pytest -v tests/test_stats.py::TestMean::test_mean_happy_path
# Show logging output
pytest -v -s
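If you get tired of typing flags, pytest can read defaults from a config file. A possible `pytest.ini` for this project (optional; the commands above work fine on their own, and `log_cli` is pytest's built-in live-logging switch, an alternative to `-s`):

```ini
[pytest]
addopts = -v --cov=stats --cov-report=term-missing
log_cli = true
log_cli_level = INFO
```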
Expected output from pytest -v:
tests/test_stats.py::TestMean::test_mean_happy_path PASSED [ 5%]
tests/test_stats.py::TestMean::test_mean_single_number PASSED [10%]
tests/test_stats.py::TestMean::test_mean_all_same PASSED [15%]
tests/test_stats.py::TestMean::test_mean_with_negatives PASSED [20%]
tests/test_stats.py::TestMean::test_mean_empty_list_raises PASSED [25%]
tests/test_stats.py::TestMean::test_mean_with_floats PASSED [30%]
tests/test_stats.py::TestMean::test_mean_parametrised[numbers0-expected0] PASSED [35%]
... (more tests)
tests/test_stats.py::TestBayesUpdate::test_bayes_update_parametrised[prior2-likelihood2-less] PASSED [100%]
======================== 60 passed in 0.34s ========================
Expected output from pytest --cov=stats --cov-report=term-missing:
Name       Stmts   Miss  Cover   Missing
----------------------------------------
stats.py     145      0   100%
----------------------------------------
TOTAL        145      0   100%
Congratulations! 100% coverage.
What’s Next
Day 11 will cover async/await and Python packaging.