Day 10 of 180 - PyTest & Virtual Environments
Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI
Introduction
There’s a moment every developer faces. You ship code to production. It works. Customers use it. And then one of them does something you didn’t expect. Your app crashes. They post a 1-star review. You don’t sleep.
Here’s the thing: that crash was waiting in your code the entire time. You just never tested that specific path. You tested the happy path (“user enters valid data, everything works”), but you never tested “user enters empty data” or “user enters huge numbers” or “user is offline.”
Today is the day you stop letting bugs hide.
Day 10 is a turning point in your journey as an engineer. You’re moving from “code that works on my machine” to “code I can prove works.” This is where professionalism begins.
You’ll learn three tools:
- Virtual environments - Keep your projects isolated, like separate apartments instead of one giant dorm
- requirements.txt + pip - Make your dependencies reproducible, like a recipe with exact measurements
- pytest - Write tests that catch bugs before users do, like crash-test dummies instead of real crashes
By the end, you’ll have a test suite that gives you confidence. And confidence is everything.
Setup
First, create a project directory and set up your environment:
# Create and navigate to project directory
mkdir ~/workspace/day10-testing
cd ~/workspace/day10-testing
# Create virtual environment
python -m venv .venv
# Activate it (macOS/Linux)
source .venv/bin/activate
# On Windows, use:
# .venv\Scripts\Activate.ps1
You should see (.venv) appear in your terminal prompt. Good: your venv is active.
Now install the tools you need:
pip install pytest==7.4.3 pytest-cov==4.1.0
Create your requirements.txt:
pytest==7.4.3
pytest-cov==4.1.0
Create your .gitignore:
.venv/
__pycache__/
*.pyc
.pytest_cache/
.coverage
htmlcov/
dist/
build/
*.egg-info/
.DS_Store
Your project structure so far:
day10-testing/
├── .venv/
├── .gitignore
├── requirements.txt
└── (code files go here)
Part 1: Virtual Environments
Why Virtual Environments Exist
Imagine you’re building two houses on the same street. The first house needs blue paint. The second needs red paint. If they shared the same paint supply, disaster: when you paint one red, the other turns red too.
Python projects are like that. Project A might need Django 3.0 (an older web framework). Project B might need Django 4.0 (a newer version with different features). If both projects share the same Python installation, you can’t install both versions at the same time.
Virtual environments solve this by creating a separate Python installation for each project. Each project gets its own site-packages/ folder (where packages live). Paint one project red, the other stays blue.
Creating a Virtual Environment
python -m venv .venv
Let me break this down:
- python - run Python itself
- -m venv - run Python’s built-in venv module
- .venv - the name of the folder to create (hidden on macOS/Linux because it starts with a dot)
What did Python create? A folder with this structure:
.venv/
├── bin/ # (on macOS/Linux) or Scripts/ (on Windows)
│ ├── python # A Python binary specific to this venv
│ ├── pip # A pip specific to this venv
│ ├── activate # Script to activate the venv
│ └── ...
├── lib/
│ └── python3.11/
│ └── site-packages/ # Empty folder where packages will install
└── pyvenv.cfg
All of this is isolated. Your system Python is untouched.
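You can verify the isolation from inside Python itself. Here's a small check (my example, not something venv ships) using only the standard library: when a venv is active, sys.prefix points inside .venv/, while sys.base_prefix still points at the installation the venv was created from.

```python
import sys

def in_virtualenv() -> bool:
    # A venv redirects sys.prefix; sys.base_prefix keeps pointing
    # at the base Python installation.
    return sys.prefix != sys.base_prefix

print(f"prefix:      {sys.prefix}")
print(f"base prefix: {sys.base_prefix}")
print(f"venv active: {in_virtualenv()}")
```

Run this with the venv activated and again after `deactivate` to see the answer flip.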
Activating and Deactivating
Activation means “use this venv’s Python instead of the system Python.”
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
After activation, your terminal prompt changes:
# Before activation:
$ python --version
Python 3.11.0
# After activation:
(.venv) $ python --version
Python 3.11.0 (from /Users/edward/workspace/day10-testing/.venv/bin/python)
That (.venv) prefix tells you: you are now using the isolated Python.
To deactivate (go back to system Python):
(.venv) $ deactivate
$ python --version
Python 3.11.0 (system Python again)
Why .venv Goes in .gitignore
Your .venv/ folder contains thousands of files. If you committed it to Git, your repo would be:
- Massive (hundreds of MB or even GB)
- Non-portable (another person’s M1 Mac can’t use your Intel Windows .venv/)
- Unnecessary (they can recreate it by installing from requirements.txt)
So: Never commit .venv/. Commit only requirements.txt.
When someone clones your repo:
git clone https://github.com/edward/my-project.git
cd my-project
# Create their own venv
python -m venv .venv
source .venv/bin/activate
# Install packages
pip install -r requirements.txt
They get the exact same environment as you, without downloading gigabytes.
A Brief Note on Conda
On your M1 Mac, you might hear about conda. Conda is like venv’s more powerful cousin-it handles non-Python dependencies (C libraries, CUDA, even system libraries). M1 Macs especially benefit from conda because it handles ARM architecture seamlessly.
For now, stick with venv. It’s built into Python and sufficient for what we’re doing. Conda is useful later when you’re installing scientific libraries like TensorFlow.
Part 2: pip and requirements.txt
The Problem: Version Hell
Let’s say you write code that uses NumPy:
import numpy as np
data = np.array([1, 2, 3])
result = data.mean()
You run pip install numpy (without specifying a version). You get version 1.26.0. Your code works perfectly.
Six months later, a colleague clones your repo and runs pip install numpy. NumPy is now at 2.0.0. There’s a breaking change-some API you used was removed. Suddenly your code breaks.
You didn’t change your code. NumPy changed. And now you’re debugging at midnight.
The Solution: Exact Version Pinning
# Instead of
pip install numpy
# Do this
pip install numpy==1.26.0
The ==1.26.0 means: “Install version 1.26.0, exactly. Nothing else.”
Now if someone else installs the same requirements, they get 1.26.0 too. Same code, same dependencies, same behavior.
Creating requirements.txt
Option 1: Freeze your current environment
pip freeze > requirements.txt
This command captures everything you’ve installed, with versions:
certifi==2024.2.2
charset-normalizer==3.3.2
idna==3.6
pytest==7.4.3
pytest-cov==4.1.0
requests==2.31.0
urllib3==2.1.0
Any package you’ve installed appears. Some you might not have explicitly asked for-they’re dependencies of dependencies (transitive dependencies).
Option 2: Write it by hand
For a fresh project, just list the packages you actually need:
pytest==7.4.3
pytest-cov==4.1.0
Later, if you add other packages, update this file.
Installing from requirements.txt
Once you’ve created requirements.txt, sharing your environment is one command:
pip install -r requirements.txt
-r means “read from file.” All packages with exact versions install. Reproducibility achieved.
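Part of the appeal of requirements.txt is how simple the pinned format is: each line is just name==version. As a quick illustration, here's a toy parser (my sketch, not anything pip provides) for a fully pinned file:

```python
def parse_pins(text: str) -> dict:
    """Toy parser for a fully pinned requirements file: 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name] = version
    return pins

requirements = "pytest==7.4.3\npytest-cov==4.1.0\n"
print(parse_pins(requirements))  # {'pytest': '7.4.3', 'pytest-cov': '4.1.0'}
```

Real requirements files support more syntax (version ranges, extras, comments after pins), but for pinned files this is the whole story.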
Other Useful pip Commands
# List what you have installed
pip list
# Show details about a specific package
pip show pytest
# Output:
# Name: pytest
# Version: 7.4.3
# Summary: pytest: simple powerful testing with Python
# ...
# Uninstall a package
pip uninstall numpy
# See what's outdated
pip list --outdated
# Update a specific package
pip install --upgrade pytest
# or
pip install -U pytest
requirements.txt vs pyproject.toml
Newer Python projects use pyproject.toml instead of requirements.txt. It’s more powerful and standards-based (PEP 518 introduced the file; PEP 621 defines the [project] metadata).
pyproject.toml looks like this:
[project]
name = "my-project"
version = "1.0.0"
dependencies = [
"pytest==7.4.3",
"pytest-cov==4.1.0",
]
For Day 10, we’re sticking with requirements.txt-simpler, universally understood, and fully supported everywhere. You’ll learn pyproject.toml later.
Part 3: pytest - Your New Testing Standard
Why Testing Matters: The Crash-Test Dummy Analogy
Before a car ships to customers, engineers do something that seems wasteful: they crash it.
They strap a dummy (made of plastic and sensors) into the car. They launch it into a wall at 30 mph. The dummy experiences the impact. Sensors record everything-forces on the head, chest, legs. Engineers study the data. If the dummy’s chest took too much force, they redesign the airbags.
Then they crash it again. And again. They test every scenario: head-on collision, side impact, rollover. Each test teaches them something.
By the time the car reaches you, it’s been “crashed” hundreds of times. But those crashes were tests, not failures.
Code is the same. pytest is your crash-test dummy.
Without Tests: Bugs Hide Until Production
Let me show you a real bug:
def calculate_average(numbers):
"""Calculate the average of a list of numbers."""
return sum(numbers) / len(numbers)
# Happy path: works fine
result = calculate_average([1, 2, 3])
print(result) # 2.0
# But what if someone passes an empty list?
result = calculate_average([]) # ZeroDivisionError!
You test the happy path manually. You see it works. You ship it. A user passes an empty list. Your app crashes. You find out via a 1-star review.
It’s not a failure of effort. But it is a bug that tests would have caught immediately.
With Tests: The Bug Gets Caught During Development
Now, with pytest:
import logging
import pytest
logger = logging.getLogger(__name__)
def calculate_average(numbers: list) -> float:
"""Calculate the average of a list of numbers.
Args:
numbers: List of numeric values
Returns:
The average as a float
Raises:
ValueError: If the list is empty
"""
if not numbers:
logger.error("Cannot calculate average of empty list")
raise ValueError("Cannot calculate average of empty list")
logger.debug(f"Calculating average of {len(numbers)} values")
return sum(numbers) / len(numbers)
# Test it
def test_calculate_average_happy_path():
"""Test normal case."""
result = calculate_average([1, 2, 3])
assert result == 2.0
logger.info("Happy path test passed")
def test_calculate_average_empty_list():
"""Test edge case: empty list."""
with pytest.raises(ValueError):
calculate_average([])
logger.info("Empty list edge case handled correctly")
When you run pytest -v:
tests/test_stats.py::test_calculate_average_happy_path PASSED
tests/test_stats.py::test_calculate_average_empty_list PASSED
======================== 2 passed in 0.05s ========================
You find the bug during development, not in production. You log it. You fix it. Users never see it. Everyone’s happy.
Test Discovery: Naming Matters
pytest finds tests automatically based on naming conventions. Follow these rules:
- Test files are named test_*.py or *_test.py
- Test functions are named test_*
- Test classes are named Test*
pytest will find these:
project/
├── test_mean.py ✓ Starts with test_
├── tests/test_variance.py ✓ File starts with test_
├── stats_test.py ✓ Ends with _test
│
└── THESE GET IGNORED:
├── check_mean.py ✗ Doesn't follow pattern
├── testmean.py ✗ Missing the underscore (no test_* or *_test)
└── tests/stats.py ✗ Doesn't start with test_
Break this rule, and pytest won’t find your tests.
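You can sanity-check a filename against the two default patterns with the standard library's fnmatch. This mimics pytest's default file matching by name only; it isn't pytest's actual discovery code:

```python
from fnmatch import fnmatch

def looks_like_test_file(filename: str) -> bool:
    # pytest's default file patterns are test_*.py and *_test.py
    return fnmatch(filename, "test_*.py") or fnmatch(filename, "*_test.py")

for name in ["test_mean.py", "stats_test.py", "check_mean.py", "stats.py"]:
    print(name, looks_like_test_file(name))
```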
Your First Test: The AAA Pattern
The gold standard for test structure is Arrange → Act → Assert:
def test_mean_of_positive_integers():
# ARRANGE: Set up the data you're testing
numbers = [1, 2, 3, 4, 5]
# ACT: Call the function
result = mean(numbers)
# ASSERT: Check the result
assert result == 3.0
Breaking it down:
- Arrange: Create any inputs or objects the function needs
- Act: Call the function
- Assert: Verify the output is what you expect
This structure makes tests easy to read and understand.
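Here's the same AAA test made self-contained, with a minimal mean() stub (the full implementation comes later in the project) so you can run it right now:

```python
def mean(numbers):
    """Minimal stand-in for the function under test."""
    return sum(numbers) / len(numbers)

def test_mean_of_positive_integers():
    numbers = [1, 2, 3, 4, 5]   # ARRANGE: set up the data
    result = mean(numbers)      # ACT: call the function
    assert result == 3.0        # ASSERT: check the result

test_mean_of_positive_integers()  # pytest would discover and call this for you
print("test passed")
```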
Running Tests
# Run all tests in the current directory
pytest
# Run all tests, verbose (shows each test name)
pytest -v
# Run only one file
pytest tests/test_mean.py
# Run one specific test
pytest tests/test_mean.py::test_mean_of_positive_integers
# Run tests matching a pattern
pytest -k "mean" # Runs test_mean_*, *_mean, etc.
# Stop on the first failure
pytest -x
# Show print statements (normally hidden)
pytest -s
# Quiet mode (only show summary)
pytest -q
Example output from pytest -v:
tests/test_stats.py::test_mean_happy_path PASSED [ 10%]
tests/test_stats.py::test_mean_empty_list FAILED [ 20%]
tests/test_stats.py::test_mean_single_number PASSED [ 30%]
tests/test_stats.py::test_variance_happy_path PASSED [ 40%]
tests/test_stats.py::test_bayes_update_basic PASSED [ 50%]
======= FAILED tests/test_stats.py::test_mean_empty_list ========
def test_mean_empty_list():
with pytest.raises(ValueError):
mean([])
E Failed: DID NOT RAISE <class 'ValueError'>
=========== 1 failed, 4 passed in 0.23s ============
This tells you: “The function didn’t raise ValueError when you expected it to. Go fix that.”
Testing for Exceptions
Many functions should raise exceptions for bad input. You test that they do:
def test_mean_raises_on_empty_list():
"""The function should reject empty lists."""
with pytest.raises(ValueError):
mean([])
The with pytest.raises(ValueError): context manager says: “I expect this code to raise ValueError. If it does, the test passes. If it doesn’t, the test fails.”
You can even check the error message:
def test_mean_raises_with_correct_message():
"""Check the error message too."""
with pytest.raises(ValueError, match="Cannot calculate mean"):
mean([])
match takes a regex pattern. The error message must contain “Cannot calculate mean” or the test fails.
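Under the hood, match is applied with re.search against the string form of the exception, which is why a plain substring works and so does any regex. A quick illustration with the re module directly:

```python
import re

message = "Cannot calculate mean of empty list"

# match="Cannot calculate mean" passes because re.search finds the pattern
print(bool(re.search("Cannot calculate mean", message)))  # True

# A real regex works too: "empty" followed by whitespace and "list"
print(bool(re.search(r"empty\s+list", message)))          # True
```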
Parametrised Tests: DRY Testing
Parametrised tests let you test many inputs with one test function.
Without parametrisation (repetitive):
def test_mean_case_1():
assert mean([1, 2, 3]) == 2.0
def test_mean_case_2():
assert mean([10, 20]) == 15.0
def test_mean_case_3():
assert mean([5]) == 5.0
def test_mean_case_4():
assert mean([-1, -2, -3]) == -2.0
# ... 10 more similar tests
With parametrisation (clean):
@pytest.mark.parametrize("numbers,expected", [
([1, 2, 3], 2.0),
([10, 20], 15.0),
([5], 5.0),
([-1, -2, -3], -2.0),
([0, 0, 0], 0.0),
([1.5, 2.5, 3.5], 2.5),
])
def test_mean_multiple_cases(numbers, expected):
assert mean(numbers) == expected
pytest runs this one test function six times-once for each tuple. Much cleaner.
You can parametrize multiple arguments:
@pytest.mark.parametrize("p_value,alpha,expected", [
    (0.03, 0.05, True),
    (0.08, 0.05, False),
    (0.08, 0.10, True),
])
def test_is_significant_with_alpha(p_value, alpha, expected):
    result = is_significant(p_value, alpha)
    assert result is expected
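One helper worth knowing before we go further: pytest.approx. It exists because floating-point arithmetic rounds, so exact == comparisons on computed floats are fragile. The standard library's math.isclose makes the same point without pytest installed:

```python
import math

# Binary floats round: 0.1 + 0.2 is actually 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False: exact comparison fails
print(math.isclose(0.1 + 0.2, 0.3))  # True: tolerant comparison passes

# In a test, pytest.approx(0.3) plays the same role inside an assert:
#   assert 0.1 + 0.2 == pytest.approx(0.3)
```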
Fixtures: Reusable Test Setup
If many tests need the same data, use a fixture:
import pytest
@pytest.fixture
def sample_dataset():
"""Fixture: provides sample data to tests that ask for it."""
return [1, 2, 3, 4, 5]
def test_mean_with_fixture(sample_dataset):
"""This test receives sample_dataset automatically."""
result = mean(sample_dataset)
assert result == 3.0
def test_variance_with_fixture(sample_dataset):
"""Different test, same fixture."""
result = variance(sample_dataset)
assert result == 2.0
The sample_dataset parameter tells pytest: “I need the fixture named sample_dataset.” pytest calls the fixture function and injects the return value.
Why fixtures are great:
- DRY: Define data once, use in many tests
- Readable: Test code is cleaner without setup clutter
- Maintainable: Change the fixture once, all tests update
- Shareable: Fixtures work across multiple test files
conftest.py: Sharing Fixtures Across Test Files
If you have multiple test files and want to share fixtures:
project/
├── stats.py
├── tests/
├── conftest.py ← Fixtures defined here
├── test_mean.py ← Uses fixtures from conftest
├── test_variance.py ← Uses fixtures from conftest
└── test_bayes.py
tests/conftest.py:
import pytest
@pytest.fixture
def simple_numbers():
return [1.0, 2.0, 3.0]
@pytest.fixture
def large_numbers():
return list(range(1, 101))
@pytest.fixture
def negative_numbers():
return [-5.0, -2.0, 0.0, 2.0, 5.0]
tests/test_mean.py:
def test_mean(simple_numbers):
# No import needed, fixture comes from conftest.py
assert mean(simple_numbers) == 2.0
tests/test_variance.py:
def test_variance(simple_numbers):
# Same fixture, different file
result = variance(simple_numbers)
assert result == pytest.approx(2.0/3.0)
pytest automatically finds fixtures in conftest.py. No imports needed.
Mocking: Testing Code That Calls External Services
Here’s a brief intro (we’ll do this more on Day 14):
Sometimes your code calls an external API:
import requests
def fetch_stock_price(ticker):
"""Call an external API to get stock price."""
response = requests.get(f"https://api.example.com/price/{ticker}")
return response.json()["price"]
In tests, you don’t want to hit the real API. It’s slow, unreliable, and might cost money. Instead, you mock it:
from unittest.mock import patch
def test_fetch_stock_price():
"""Test without calling the real API."""
with patch('requests.get') as mock_get:
# Make requests.get return fake data
mock_get.return_value.json.return_value = {"price": 150.0}
result = fetch_stock_price("AAPL")
assert result == 150.0
The test never touches the real API. It’s fast (milliseconds), isolated, and reliable.
Test Coverage: Proving Your Code Is Tested
You write tests. But how do you know if you’ve tested enough?
Code coverage measures what percentage of your code is executed by tests.
pip install pytest-cov
pytest --cov=stats --cov-report=term-missing
Output:
Name Stmts Miss Cover Missing
stats.py 95 5 94% 45-46, 78, 102-104
This says:
- stats.py has 95 statements (lines of code)
- 5 are not executed by any test (missing)
- Coverage is 94%
- Lines 45-46, 78, 102-104 are untested
You then write tests for those lines until coverage is 100% (or as close as practical).
Aim for:
- Green (90%+): Excellent
- Yellow (70-89%): Good
- Red (<70%): Needs work
Good Test Names: Be Descriptive
A good test name tells you exactly what’s being tested and what’s expected.
Bad names:
def test_mean():
pass
def test_variance():
pass
def test_1():
pass
When they fail, you have no idea what broke.
Good names:
def test_mean_of_positive_integers_returns_correct_average():
pass
def test_variance_of_empty_list_raises_valueerror():
pass
def test_bayes_update_with_strong_evidence_increases_posterior():
pass
When they fail, you know exactly what broke. The test name is a specification of what the function should do.
Pattern: test_[function_name]_[condition]_[expected_result]
FIRST Principles: What Makes a Good Test
F - Fast: Tests should run in milliseconds. If a test takes seconds, you won’t run them often. You’ll skip them. Bugs hide.
I - Isolated: Tests shouldn’t depend on each other. If test A must run before test B, you have a problem. Tests should be independent.
R - Repeatable: Same test, same result every time. No random variation. No flaky tests that sometimes pass and sometimes fail.
S - Self-Validating: A test either passes or fails. No manual checking (“did this look right?”). No human judgment.
T - Timely: Ideally, write tests before the code (Test-Driven Development). At minimum, write tests alongside the code, not months later. Fresh context makes better tests.
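"Repeatable" in practice usually means controlling randomness and external state. A common trick (my example, not from the post) is to seed any random source the code uses, so the test produces identical results on every run and every machine:

```python
import random

def noisy_sample(seed: int) -> list:
    """Simulate a function with internal randomness, made deterministic by a seed."""
    rng = random.Random(seed)  # a private, seeded generator, not the global one
    return [rng.randint(1, 100) for _ in range(5)]

def test_noisy_sample_is_repeatable():
    # Same seed in, same output out, every single run
    assert noisy_sample(42) == noisy_sample(42)

test_noisy_sample_is_repeatable()
print("repeatable")
```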
The Project: Testing the Stats Functions from Day 9
Today you’ll build a complete pytest test suite for the statistics functions from Day 9.
Project Structure
day10-testing/
├── .venv/
├── .gitignore
├── requirements.txt
├── stats.py ← Code being tested
└── tests/
├── __init__.py ← Empty file, makes tests/ a package
├── conftest.py ← Shared fixtures
└── test_stats.py ← All tests
stats.py - The Code
Create stats.py with full type hints and logging:
"""
Statistics module with type hints and logging.
Day 10: Production-grade Python with testing.
"""
import logging
from typing import List, Optional, Tuple
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
def mean(numbers: List[float]) -> float:
"""
Calculate the arithmetic mean of a list of numbers.
Args:
numbers: List of floats or ints
Returns:
The mean as a float
Raises:
ValueError: If the list is empty
TypeError: If numbers contains non-numeric values
Example:
>>> mean([1, 2, 3])
2.0
"""
if not numbers:
logger.error("Cannot calculate mean of empty list")
raise ValueError("Cannot calculate mean of empty list")
try:
total = sum(numbers)
result = total / len(numbers)
logger.debug(f"Calculated mean of {len(numbers)} values: {result}")
return result
except TypeError as e:
logger.error(f"Invalid data type in numbers list: {e}")
raise TypeError("All elements must be numeric") from e
def variance(numbers: List[float]) -> float:
"""
Calculate the variance (average squared deviation from mean).
Args:
numbers: List of floats or ints
Returns:
The variance as a float
Raises:
ValueError: If the list is empty or has only one element
TypeError: If numbers contains non-numeric values
Example:
>>> variance([1, 2, 3])
0.6666666666666666
"""
if not numbers:
logger.error("Cannot calculate variance of empty list")
raise ValueError("Cannot calculate variance of empty list")
if len(numbers) < 2:
logger.error("Variance requires at least 2 values")
raise ValueError("Variance requires at least 2 values (population has no variance)")
try:
m = mean(numbers)
squared_diffs = [(x - m) ** 2 for x in numbers]
result = sum(squared_diffs) / len(numbers)
logger.debug(f"Calculated variance: {result}")
return result
except TypeError as e:
logger.error(f"Invalid data type in numbers list: {e}")
raise TypeError("All elements must be numeric") from e
def std_dev(numbers: List[float]) -> float:
"""
Calculate the standard deviation (square root of variance).
Args:
numbers: List of floats or ints
Returns:
The standard deviation as a float
Raises:
ValueError: If the list is empty or has only one element
TypeError: If numbers contains non-numeric values
Example:
>>> std_dev([1, 2, 3])
0.8164965809004287
"""
if not numbers:
logger.error("Cannot calculate std dev of empty list")
raise ValueError("Cannot calculate std dev of empty list")
    # variance() already raises ValueError/TypeError for bad input;
    # let those propagate rather than catching and re-raising.
    var = variance(numbers)
    result = var ** 0.5
    logger.debug(f"Calculated std dev: {result}")
    return result
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
"""
Determine if a p-value is statistically significant.
In plain English: Is this result unlikely to happen by chance?
If p_value < alpha, we say "yes, this is surprising" (significant).
Args:
p_value: The p-value (must be between 0 and 1)
alpha: The significance level (default 0.05 = 5%)
Returns:
True if p_value < alpha, False otherwise
Raises:
ValueError: If p_value or alpha are outside [0, 1]
Example:
>>> is_significant(0.03, 0.05)
True
>>> is_significant(0.08, 0.05)
False
"""
if not (0 <= p_value <= 1):
logger.error(f"p_value {p_value} is outside [0, 1]")
raise ValueError("p_value must be between 0 and 1")
if not (0 <= alpha <= 1):
logger.error(f"alpha {alpha} is outside [0, 1]")
raise ValueError("alpha must be between 0 and 1")
result = p_value < alpha
logger.debug(f"p_value={p_value}, alpha={alpha} → significant={result}")
return result
def bayes_update(
prior: float,
likelihood: float,
likelihood_complement: Optional[float] = None
) -> float:
"""
Update a prior probability using Bayes' Theorem.
In plain English: You had a guess (prior). You saw new evidence (likelihood).
What should your new guess be (posterior)?
Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
Args:
prior: Your initial guess (0 to 1)
likelihood: Probability of evidence given your guess (0 to 1)
likelihood_complement: Probability of evidence given NOT your guess.
If None, assumed to be 1 - likelihood
Returns:
The updated probability (posterior)
Raises:
ValueError: If any probability is outside [0, 1]
    Example:
        >>> round(bayes_update(0.5, 0.9), 4)
        0.9
"""
# Validate inputs
if not (0 <= prior <= 1):
logger.error(f"prior {prior} is outside [0, 1]")
raise ValueError("prior must be between 0 and 1")
if not (0 <= likelihood <= 1):
logger.error(f"likelihood {likelihood} is outside [0, 1]")
raise ValueError("likelihood must be between 0 and 1")
# Default likelihood_complement
if likelihood_complement is None:
likelihood_complement = 1 - likelihood
if not (0 <= likelihood_complement <= 1):
logger.error(f"likelihood_complement {likelihood_complement} is outside [0, 1]")
raise ValueError("likelihood_complement must be between 0 and 1")
# Bayes' Theorem
posterior = (
likelihood * prior /
(likelihood * prior + likelihood_complement * (1 - prior))
)
logger.info(
f"Bayes update: prior={prior} → posterior={posterior:.4f}"
)
return posterior
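Before writing tests for bayes_update, it's worth sanity-checking the arithmetic by hand. Here's a trimmed copy of the formula (validation and logging stripped, and renamed bayes_posterior so it's clearly my illustration, not the module's function):

```python
from typing import Optional

def bayes_posterior(prior: float, likelihood: float,
                    likelihood_complement: Optional[float] = None) -> float:
    """Bare Bayes' rule, without the validation/logging of the full version."""
    if likelihood_complement is None:
        likelihood_complement = 1 - likelihood
    return (likelihood * prior /
            (likelihood * prior + likelihood_complement * (1 - prior)))

# Even prior, strong evidence: 0.45 / (0.45 + 0.05) = 0.9
print(round(bayes_posterior(0.5, 0.9), 4))
# Low prior, strong evidence: 0.08 / (0.08 + 0.18), belief rises but stays moderate
print(round(bayes_posterior(0.1, 0.8), 4))
```

Working these out on paper first gives you known-good expected values to assert against in the test suite.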
tests/conftest.py - Shared Fixtures
Create tests/conftest.py:
"""
Pytest configuration and shared fixtures for stats tests.
"""
import pytest
from typing import List
@pytest.fixture
def simple_numbers() -> List[float]:
"""Simple dataset for testing."""
return [1.0, 2.0, 3.0, 4.0, 5.0]
@pytest.fixture
def large_numbers() -> List[float]:
"""Larger dataset for testing edge cases."""
return list(range(1, 101)) # 1 to 100
@pytest.fixture
def negative_numbers() -> List[float]:
"""Numbers including negatives."""
return [-5.0, -2.0, 0.0, 2.0, 5.0]
@pytest.fixture
def single_number() -> List[float]:
"""Single value (edge case)."""
return [42.0]
@pytest.fixture
def duplicate_numbers() -> List[float]:
"""All values the same."""
return [7.0, 7.0, 7.0, 7.0]
tests/test_stats.py - Complete Test Suite
Create tests/test_stats.py:
"""
Complete test suite for stats module.
Day 10: pytest introduction with happy paths, edge cases, and parametrisation.
"""
import pytest
import logging
from typing import List
from stats import mean, variance, std_dev, is_significant, bayes_update
# Get logger for test output
logger = logging.getLogger(__name__)
# ============================================================================
# TESTS FOR mean()
# ============================================================================
class TestMean:
"""Test the mean() function."""
def test_mean_happy_path(self, simple_numbers):
"""Test normal case: positive integers."""
result = mean(simple_numbers)
logger.info(f"mean(simple_numbers) = {result}")
assert result == 3.0
def test_mean_single_number(self, single_number):
"""Edge case: single value."""
result = mean(single_number)
logger.info(f"mean(single_number) = {result}")
assert result == 42.0
def test_mean_all_same(self, duplicate_numbers):
"""Edge case: all values identical."""
result = mean(duplicate_numbers)
logger.info(f"mean(duplicate_numbers) = {result}")
assert result == 7.0
def test_mean_with_negatives(self, negative_numbers):
"""Include negative numbers."""
result = mean(negative_numbers)
logger.info(f"mean(negative_numbers) = {result}")
assert result == pytest.approx(0.0)
def test_mean_empty_list_raises(self):
"""Empty list should raise ValueError."""
logger.info("Testing mean([]) raises ValueError")
with pytest.raises(ValueError, match="Cannot calculate mean"):
mean([])
def test_mean_with_floats(self):
"""Floats should work correctly."""
result = mean([1.5, 2.5, 3.5])
logger.info(f"mean([1.5, 2.5, 3.5]) = {result}")
assert result == pytest.approx(2.5)
@pytest.mark.parametrize("numbers,expected", [
([1, 2, 3], 2.0),
([10, 20], 15.0),
([5], 5.0),
([-1, -2, -3], -2.0),
([0, 0, 0], 0.0),
([1.1, 2.2, 3.3], 2.2),
])
def test_mean_parametrised(self, numbers, expected):
"""Parametrised: test many cases at once."""
result = mean(numbers)
logger.info(f"mean({numbers}) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_mean_non_numeric_raises(self):
"""Non-numeric values should raise TypeError."""
logger.info("Testing mean() with non-numeric values")
with pytest.raises(TypeError):
mean([1, 2, "three"])
# ============================================================================
# TESTS FOR variance()
# ============================================================================
class TestVariance:
"""Test the variance() function."""
def test_variance_happy_path(self, simple_numbers):
"""Test normal case."""
result = variance(simple_numbers)
expected = 2.0
logger.info(f"variance(simple_numbers) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_variance_two_elements(self):
"""Edge case: minimum valid input."""
result = variance([1.0, 3.0])
expected = 1.0
logger.info(f"variance([1.0, 3.0]) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_variance_all_same(self, duplicate_numbers):
"""Edge case: variance of identical values is zero."""
result = variance(duplicate_numbers)
logger.info(f"variance(duplicate_numbers) = {result}, expected = 0.0")
assert result == pytest.approx(0.0)
def test_variance_empty_list_raises(self):
"""Empty list should raise ValueError."""
logger.info("Testing variance([]) raises ValueError")
with pytest.raises(ValueError, match="Cannot calculate variance"):
variance([])
def test_variance_single_element_raises(self):
"""Single element is not enough for variance."""
logger.info("Testing variance([42.0]) raises ValueError")
with pytest.raises(ValueError, match="at least 2 values"):
variance([42.0])
def test_variance_negative_numbers(self, negative_numbers):
"""Variance of numbers including negatives."""
result = variance(negative_numbers)
expected = 10.4
logger.info(f"variance(negative_numbers) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
@pytest.mark.parametrize("numbers,expected", [
([1, 2, 3], 2.0/3.0),
([1, 3], 1.0),
([0, 0], 0.0),
([2, 2, 2, 2], 0.0),
])
def test_variance_parametrised(self, numbers, expected):
"""Parametrised variance tests."""
result = variance(numbers)
logger.info(f"variance({numbers}) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
# ============================================================================
# TESTS FOR std_dev()
# ============================================================================
class TestStdDev:
"""Test the std_dev() function."""
def test_std_dev_happy_path(self, simple_numbers):
"""Test normal case."""
result = std_dev(simple_numbers)
expected = 2.0 ** 0.5
logger.info(f"std_dev(simple_numbers) = {result}, expected = {expected}")
assert result == pytest.approx(expected)
def test_std_dev_all_same(self, duplicate_numbers):
"""Edge case: std dev of identical values is zero."""
result = std_dev(duplicate_numbers)
logger.info(f"std_dev(duplicate_numbers) = {result}, expected = 0.0")
assert result == pytest.approx(0.0)
def test_std_dev_empty_list_raises(self):
"""Empty list should raise ValueError."""
logger.info("Testing std_dev([]) raises ValueError")
with pytest.raises(ValueError):
std_dev([])
def test_std_dev_single_element_raises(self):
"""Single element is not enough."""
logger.info("Testing std_dev([42.0]) raises ValueError")
with pytest.raises(ValueError):
std_dev([42.0])
# ============================================================================
# TESTS FOR is_significant()
# ============================================================================
class TestIsSignificant:
"""Test the is_significant() function."""
def test_significant_below_threshold(self):
"""p-value < alpha → True."""
result = is_significant(0.03, 0.05)
logger.info(f"is_significant(0.03, 0.05) = {result}, expected = True")
assert result is True
def test_not_significant_above_threshold(self):
"""p-value > alpha → False."""
result = is_significant(0.08, 0.05)
logger.info(f"is_significant(0.08, 0.05) = {result}, expected = False")
assert result is False
def test_boundary_equals_alpha(self):
"""p-value == alpha → False (not strictly less)."""
result = is_significant(0.05, 0.05)
logger.info(f"is_significant(0.05, 0.05) = {result}, expected = False")
assert result is False
def test_boundary_just_below_alpha(self):
"""Just below alpha threshold."""
result = is_significant(0.049999, 0.05)
logger.info(f"is_significant(0.049999, 0.05) = {result}, expected = True")
assert result is True
def test_custom_alpha(self):
"""Different significance level."""
result = is_significant(0.08, alpha=0.10)
logger.info(f"is_significant(0.08, alpha=0.10) = {result}, expected = True")
assert result is True
def test_p_value_zero(self):
"""p-value = 0 is always significant."""
result = is_significant(0.0, 0.05)
logger.info(f"is_significant(0.0, 0.05) = {result}, expected = True")
assert result is True
def test_p_value_one(self):
"""p-value = 1 is never significant."""
result = is_significant(1.0, 0.05)
logger.info(f"is_significant(1.0, 0.05) = {result}, expected = False")
assert result is False
def test_invalid_p_value_raises(self):
"""p-value outside [0, 1] raises ValueError."""
logger.info("Testing is_significant with invalid p_value")
with pytest.raises(ValueError, match="p_value must be between"):
is_significant(-0.05, 0.05)
with pytest.raises(ValueError, match="p_value must be between"):
is_significant(1.5, 0.05)
def test_invalid_alpha_raises(self):
"""alpha outside [0, 1] raises ValueError."""
logger.info("Testing is_significant with invalid alpha")
with pytest.raises(ValueError, match="alpha must be between"):
is_significant(0.05, -0.05)
@pytest.mark.parametrize("p_value,alpha,expected", [
(0.01, 0.05, True),
(0.05, 0.05, False),
(0.10, 0.05, False),
(0.001, 0.01, True),
(0.0, 0.05, True),
(1.0, 0.05, False),
])
def test_is_significant_parametrised(self, p_value, alpha, expected):
"""Parametrised tests for various thresholds."""
result = is_significant(p_value, alpha)
logger.info(f"is_significant({p_value}, {alpha}) = {result}, expected = {expected}")
assert result is expected
# ============================================================================
# TESTS FOR bayes_update()
# ============================================================================
class TestBayesUpdate:
"""Test the bayes_update() function."""
def test_bayes_update_basic(self):
"""Test standard Bayes' Theorem calculation."""
result = bayes_update(0.5, 0.9)
expected = 0.5 * 0.9 / (0.5 * 0.9 + 0.5 * 0.1)
logger.info(f"bayes_update(0.5, 0.9) = {result}, expected ≈ {expected}")
assert result == pytest.approx(expected)
assert result == pytest.approx(0.9)
def test_bayes_update_weak_prior(self):
"""Low prior gets updated strongly by evidence."""
result = bayes_update(prior=0.1, likelihood=0.8)
expected = 0.8 * 0.1 / (0.8 * 0.1 + 0.2 * 0.9)
logger.info(f"bayes_update(0.1, 0.8) = {result}, expected ≈ {expected}")
assert result == pytest.approx(expected)
assert result > 0.1 # Posterior > prior
def test_bayes_update_strong_prior(self):
"""High prior stays high unless evidence is strong."""
result = bayes_update(prior=0.9, likelihood=0.6)
logger.info(f"bayes_update(0.9, 0.6) = {result}, expected > 0.9")
assert result > 0.9 # Still high
def test_bayes_update_explicit_complement(self):
"""Can specify likelihood_complement explicitly."""
result1 = bayes_update(0.5, 0.8, likelihood_complement=0.3)
result2 = bayes_update(0.5, 0.8) # Default: 1 - 0.8 = 0.2
logger.info(f"explicit complement: {result1}, default complement: {result2}")
assert result1 != result2
def test_bayes_update_zero_prior(self):
"""Posterior is zero if prior is zero."""
result = bayes_update(0.0, 0.9)
logger.info(f"bayes_update(0.0, 0.9) = {result}, expected = 0.0")
assert result == 0.0
def test_bayes_update_one_prior(self):
"""Posterior stays high if prior is very high."""
result = bayes_update(1.0, 0.5)
logger.info(f"bayes_update(1.0, 0.5) = {result}, expected = 1.0")
assert result == 1.0
def test_bayes_update_invalid_prior_raises(self):
"""Prior outside [0, 1] raises ValueError."""
logger.info("Testing bayes_update with invalid prior")
with pytest.raises(ValueError, match="prior must be between"):
bayes_update(-0.1, 0.5)
with pytest.raises(ValueError, match="prior must be between"):
bayes_update(1.5, 0.5)
def test_bayes_update_invalid_likelihood_raises(self):
"""Likelihood outside [0, 1] raises ValueError."""
logger.info("Testing bayes_update with invalid likelihood")
with pytest.raises(ValueError, match="likelihood must be between"):
bayes_update(0.5, 1.5)
def test_bayes_update_invalid_complement_raises(self):
"""Complement outside [0, 1] raises ValueError."""
logger.info("Testing bayes_update with invalid complement")
with pytest.raises(ValueError, match="likelihood_complement must be between"):
bayes_update(0.5, 0.8, likelihood_complement=1.5)
@pytest.mark.parametrize("prior,likelihood,expected_comparison", [
(0.5, 0.9, "greater"), # Strong evidence increases posterior
(0.5, 0.5, "equal"), # Equal evidence keeps it steady
(0.5, 0.1, "less"), # Weak evidence decreases posterior
])
def test_bayes_update_parametrised(self, prior, likelihood, expected_comparison):
"""Parametrised Bayes update tests."""
result = bayes_update(prior, likelihood)
logger.info(f"bayes_update({prior}, {likelihood}) = {result}")
        if expected_comparison == "greater":
            assert result > prior  # strong evidence should strictly raise the posterior
        elif expected_comparison == "equal":
            assert result == pytest.approx(prior)
        else:  # "less"
            assert result < prior  # weak evidence should strictly lower the posterior
Create tests/__init__.py
touch tests/__init__.py
This makes tests/ a Python package.
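One assumption worth making explicit: the test file calls `logger.info(...)` throughout, so it needs a module-level logger near the top of `tests/test_stats.py`. A minimal setup looks like this (the exact format string is my choice; the original file may configure logging differently):

```python
import logging

# Configure root logging once so `pytest -s` (or log_cli) shows the messages.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")

# Module-level logger used by every test in this file.
logger = logging.getLogger(__name__)
```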
Running the Tests
# Make sure you're in the project root
cd ~/workspace/day10-testing
# Make sure venv is activated
source .venv/bin/activate
# Run all tests
pytest -v
# Run with coverage
pytest --cov=stats --cov-report=term-missing
# Run one test class
pytest -v tests/test_stats.py::TestMean
# Run one test
pytest -v tests/test_stats.py::TestMean::test_mean_happy_path
# Show logging output
pytest -v -s
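If you get tired of typing flags, pytest can read defaults from a config file. A possible `pytest.ini` for this project (optional; the commands above work fine on their own, and `log_cli` is pytest's built-in live-logging switch, an alternative to `-s`):

```ini
[pytest]
addopts = -v --cov=stats --cov-report=term-missing
log_cli = true
log_cli_level = INFO
```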
Expected output from pytest -v:
tests/test_stats.py::TestMean::test_mean_happy_path PASSED [ 5%]
tests/test_stats.py::TestMean::test_mean_single_number PASSED [10%]
tests/test_stats.py::TestMean::test_mean_all_same PASSED [15%]
tests/test_stats.py::TestMean::test_mean_with_negatives PASSED [20%]
tests/test_stats.py::TestMean::test_mean_empty_list_raises PASSED [25%]
tests/test_stats.py::TestMean::test_mean_with_floats PASSED [30%]
tests/test_stats.py::TestMean::test_mean_parametrised[numbers0-expected0] PASSED [35%]
... (more tests)
tests/test_stats.py::TestBayesUpdate::test_bayes_update_parametrised[prior2-likelihood2-less] PASSED [100%]
======================== 60 passed in 0.34s ========================
Expected output from pytest --cov=stats --cov-report=term-missing:
Name       Stmts   Miss  Cover   Missing
----------------------------------------
stats.py     145      0   100%
----------------------------------------
TOTAL        145      0   100%
Congratulations! 100% coverage.
What’s Next
Day 11 will cover async/await and Python packaging.