Day 5 of 180 - Comprehensions, Generators, Decorators & Modern Features

March 24, 2026 17 minute read

Part of my 180-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. This blog is written with the help of AI.

Why These Topics Matter for AI Engineering

When working with large datasets, every byte of memory and every millisecond of computation matters. Today’s topics are the tools that separate production-grade Python from tutorial code.

Generators let you process gigabytes of data without loading it all into RAM. List comprehensions make data transformation readable and fast. Decorators let you add behavior (like caching or timing) without rewriting code. Type hints let your editor and static type checkers catch bugs before your code runs, which is critical when working in teams. Dataclasses eliminate boilerplate for data structures. pathlib handles file paths correctly on Windows, Mac, and Linux without painful string concatenation.

Together, these are the everyday tools of AI engineers processing datasets, building ML pipelines, and shipping production code.

Setup

You need only Python’s standard library for this lesson. Open a terminal and create a virtual environment:

python3 -m venv day5_env
source day5_env/bin/activate  # On Windows: day5_env\Scripts\activate
python --version  # 3.9 works for most examples; 3.10+ is required for the `str | None` style hints used in Part 4

Create a working directory:

mkdir -p day5_project/{data_samples,analysis_output}
cd day5_project

Part 1: List Comprehensions - Readable Data Transformation

The Analogy

Imagine a factory assembly line where items pass through a filter gate. Some are rejected, the rest are transformed. List comprehensions do exactly that in one readable line.

Basic Syntax

[expression for item in iterable if condition]

The if is optional. Read it left to right: “Make a list of (expression) for each (item) in (iterable) if (condition).”

Example 1: Simple Transformation

# Without comprehension (verbose)
numbers = [1, 2, 3, 4, 5]
squared = []
for n in numbers:
    squared.append(n ** 2)
print(squared)  # [1, 4, 9, 16, 25]
 
# With comprehension (readable)
squared = [n ** 2 for n in numbers]
print(squared)  # [1, 4, 9, 16, 25]

Example 2: Filtering

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
 
# Filter: keep only even numbers
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # [2, 4, 6, 8, 10]
 
# Filter and transform: keep even numbers, then square them
evens_squared = [n ** 2 for n in numbers if n % 2 == 0]
print(evens_squared)  # [4, 16, 36, 64, 100]
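One detail that often trips beginners up: the trailing `if` above filters items out, while an `if ... else` placed in the expression position keeps every item and only changes its value. A minimal sketch of the difference (the variable names are illustrative only):

```python
numbers = [1, 2, 3, 4, 5, 6]

# Trailing "if" FILTERS: odd numbers are dropped entirely
evens_only = [n for n in numbers if n % 2 == 0]

# "if/else" in the expression position TRANSFORMS: every item is kept
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

print(evens_only)  # [2, 4, 6]
print(labels)      # ['odd', 'even', 'odd', 'even', 'odd', 'even']
```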

Example 3: Dictionary Comprehensions

# Create a dict mapping word to word length
words = ["python", "engineering", "data"]
word_lengths = {word: len(word) for word in words}
print(word_lengths)  # {'python': 6, 'engineering': 11, 'data': 4}
 
# Swap keys and values
original = {"a": 1, "b": 2, "c": 3}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}

Example 4: Set Comprehensions

numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
 
# Remove duplicates and filter
unique_evens = {n for n in numbers if n % 2 == 0}
print(unique_evens)  # {2, 4}

Example 5: Nested Comprehensions - When They Help

# Create a 3x3 matrix (list of lists)
matrix = [[i * 3 + j for j in range(3)] for i in range(3)]
print(matrix)
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
 
# Flatten a nested list
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [item for sublist in nested for item in sublist]
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

Example 6: Nested Comprehensions - When to Avoid

# Too complex - hard to read. Use a regular loop or generator instead.
result = [
    x * y
    for x in range(10)
    for y in range(10)
    if (x + y) % 2 == 0 and x * y > 20
]
 
# Better: be explicit
result = []
for x in range(10):
    for y in range(10):
        if (x + y) % 2 == 0 and x * y > 20:
            result.append(x * y)

Rule of thumb: If a comprehension takes more than a few seconds to understand, use a regular loop. Comprehensions should be readable.

Example 7: Generator Expressions (Lazy Evaluation)

# List comprehension - evaluates immediately, stores all in memory
eager_list = [n ** 2 for n in range(1_000_000)]  # Uses lots of memory
 
# Generator expression - evaluates on demand, one at a time
lazy_gen = (n ** 2 for n in range(1_000_000))  # Uses almost no memory
 
# They look identical except () vs []
# But generators are lazy - they don't compute until you ask
 
print(next(lazy_gen))  # 0 (first square)
print(next(lazy_gen))  # 1 (second square)
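Two practical consequences of that laziness are worth seeing in a short sketch: a generator expression can be passed straight into a function like `sum()` without building a list first, and it can only be consumed once. The variable names below are illustrative.

```python
# Feed a generator expression straight into sum(): no intermediate list is built
total = sum(n ** 2 for n in range(1_000))
print(total)  # 332833500

# A generator can only be consumed once
squares = (n ** 2 for n in range(5))
first_pass = list(squares)   # consumes every value
second_pass = list(squares)  # nothing left to yield
print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []
```

If you need to loop over the values more than once, either build a list or create a fresh generator for each pass.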

Part 2: Generators - Processing Without Loading Everything

The Analogy

A vending machine gives you one item at a time when you press the button, instead of handing you the entire inventory.

Why Generators Matter

Processing a 100 GB dataset? Load it line by line with a generator instead of loading all 100 GB into RAM.

import sys
 
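# List: stores all in memory at once
list_squares = [n ** 2 for n in range(1000)]
print(f"List memory: {sys.getsizeof(list_squares)} bytes")  # ~9000 bytes for the list itself; grows with n
 
# Generator: computes one at a time
def gen_squares(n):
    for i in range(n):
        yield i ** 2
 
gen_squares_obj = gen_squares(1000)
print(f"Generator memory: {sys.getsizeof(gen_squares_obj)} bytes")  # a few hundred bytes at most, no matter how large n is
# Exact numbers vary by Python version; sys.getsizeof() measures only the container, not the items inside it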

Example 1: Simple Generator with `yield`

def count_up(max_num: int):
    """Generator that yields numbers from 0 up to (but not including) max_num."""
    current = 0
    while current < max_num:
        yield current
        current += 1
 
# Use it
for num in count_up(5):
    print(num)
# Output: 0 1 2 3 4

Example 2: Reading Large Files Line by Line

def read_large_file(file_path: str):
    """Generator that reads file line by line without loading all into memory."""
    with open(file_path, "r") as f:
        for line in f:
            yield line.strip()
 
# Usage (assumes a file named huge_file.txt exists and process() is your own function):
# for line in read_large_file("huge_file.txt"):
#     process(line)
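To try the pattern without a real multi-gigabyte file, here is a small runnable sketch. It writes a tiny temporary file (the name `sample.log` and its contents are made up for illustration) and counts matching lines while holding only one line in memory at a time.

```python
import tempfile
from pathlib import Path
from typing import Iterator

def read_large_file(file_path: str) -> Iterator[str]:
    """Yield stripped lines one at a time (same idea as the generator above)."""
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.strip()

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.log"  # hypothetical file name
    sample.write_text("INFO start\nERROR disk full\nINFO done\n", encoding="utf-8")

    # sum() pulls lines from the generator one by one; no list of lines is created
    error_count = sum(
        1 for line in read_large_file(str(sample)) if line.startswith("ERROR")
    )

print(error_count)  # 1
```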

Example 3: `yield from` - Delegating to Another Generator

def gen_a():
    """First generator."""
    yield 1
    yield 2
 
def gen_b():
    """Second generator."""
    yield 3
    yield 4
 
def gen_combined():
    """Combine both generators."""
    yield from gen_a()
    yield from gen_b()
 
for value in gen_combined():
    print(value)
# Output: 1 2 3 4

Example 4: Manual Iteration with `next()`

def countdown(n: int):
    """Countdown generator."""
    while n > 0:
        yield n
        n -= 1
 
counter = countdown(3)
print(next(counter))  # 3
print(next(counter))  # 2
print(next(counter))  # 1
# print(next(counter))  # Would raise StopIteration
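If you would rather not handle `StopIteration` yourself, the built-in `next()` accepts a second argument that is returned once the generator is exhausted. A short sketch, reusing the same `countdown` idea:

```python
def countdown(n: int):
    """Countdown generator."""
    while n > 0:
        yield n
        n -= 1

counter = countdown(2)

# The second argument to next() is a default returned instead of raising StopIteration
values = [next(counter, None) for _ in range(4)]
print(values)  # [2, 1, None, None]
```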

Example 5: Infinite Generator

def infinite_counter(start: int = 0):
    """Infinite generator - keeps going forever."""
    current = start
    while True:
        yield current
        current += 1
 
counter = infinite_counter(10)
print(next(counter))  # 10
print(next(counter))  # 11
print(next(counter))  # 12
# Keep calling next() - it never stops
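Looping over an infinite generator with a plain `for` would never finish, so in practice you cap it. One common way, sketched here, is `itertools.islice` from the standard library, which takes only the first N values:

```python
from itertools import islice

def infinite_counter(start: int = 0):
    """Infinite generator - keeps going forever."""
    current = start
    while True:
        yield current
        current += 1

# islice stops after 5 items, so the infinite generator is never fully consumed
first_five = list(islice(infinite_counter(10), 5))
print(first_five)  # [10, 11, 12, 13, 14]
```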

Example 6: Generator with Conditional Yield

def fibonacci(limit: int):
    """Generate Fibonacci numbers up to limit."""
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b
 
for fib in fibonacci(100):
    print(fib, end=" ")
# Output: 0 1 1 2 3 5 8 13 21 34 55 89

Part 3: Decorators - Adding Behavior Without Rewriting Code

The Analogy

A gift wrapper takes your present and wraps it in paper and ribbon, then hands back the wrapped version. The present is unchanged, but it’s now decorated. Decorators do the same for functions.
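Before writing a full decorator, it may help to see that the `@` syntax is just shorthand for passing a function to another function and keeping the result. In this sketch, `shout`, `greet_plain`, and `greet_decorated` are hypothetical names used only for illustration.

```python
from typing import Callable

def shout(func: Callable[[str], str]) -> Callable[[str], str]:
    """Hypothetical decorator: upper-cases whatever the wrapped function returns."""
    def wrapper(name: str) -> str:
        return func(name).upper()
    return wrapper

def greet_plain(name: str) -> str:
    return f"hello, {name}"

# Wrapping by hand ...
greet_manual = shout(greet_plain)

# ... is what the @ syntax does for you
@shout
def greet_decorated(name: str) -> str:
    return f"hello, {name}"

print(greet_manual("ada"))     # HELLO, ADA
print(greet_decorated("ada"))  # HELLO, ADA
```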

Example 1: Write a Decorator from Scratch

import time
from typing import Callable, Any
 
def timer(func: Callable) -> Callable:
    """Decorator that measures how long a function takes."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f"{func.__name__} took {end - start:.4f} seconds")
        return result
    return wrapper
 
# Use it
@timer
def slow_function():
    time.sleep(1)
    return "Done!"
 
result = slow_function()
print(result)
# Output (exact time varies):
# slow_function took 1.0001 seconds
# Done!

Problem: Decorators Lose Function Metadata

def simple_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper
 
@simple_decorator
def greet(name: str) -> str:
    """Say hello to someone."""
    return f"Hello, {name}!"
 
print(greet.__name__)  # 'wrapper' - WRONG! Should be 'greet'
print(greet.__doc__)   # None - WRONG! Should be the docstring

Example 2: Fix with `functools.wraps`

import functools
import time
from typing import Callable, Any
 
def timer(func: Callable) -> Callable:
    """Decorator that measures function execution time."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f"{func.__name__} took {end - start:.4f} seconds")
        return result
    return wrapper
 
@timer
def greet(name: str) -> str:
    """Say hello to someone."""
    return f"Hello, {name}!"
 
print(greet.__name__)  # 'greet' - CORRECT
print(greet.__doc__)   # 'Say hello to someone.' - CORRECT

Example 3: Decorator with Arguments (Decorator Factory)

import functools
from typing import Callable, Any
 
def repeat(times: int) -> Callable:
    """Decorator factory that repeats a function N times."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> None:
            for _ in range(times):
                func(*args, **kwargs)
        return wrapper
    return decorator
 
@repeat(times=3)
def say_hello():
    print("Hello!")
 
say_hello()
# Output:
# Hello!
# Hello!
# Hello!

Example 4: Stacking Decorators

import functools
import time
from typing import Callable, Any
 
def timer(func: Callable) -> Callable:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f"[TIMER] {func.__name__}: {end - start:.4f}s")
        return result
    return wrapper
 
def log_call(func: Callable) -> Callable:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        print(f"[LOG] Calling {func.__name__} with args={args}, kwargs={kwargs}")
        return func(*args, **kwargs)
    return wrapper
 
# Stack decorators - applied bottom to top
@timer
@log_call
def add(a: int, b: int) -> int:
    time.sleep(0.1)
    return a + b
 
result = add(2, 3)
print(result)
# Output (exact time varies):
# [LOG] Calling add with args=(2, 3), kwargs={}
# [TIMER] add: 0.1005s
# 5
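The "bottom to top" order is easier to see when each decorator leaves a visible mark. In this sketch, `tag` is a hypothetical decorator factory that wraps the returned text in a label; the decorator closest to the function is applied first and ends up innermost.

```python
import functools
from typing import Callable

def tag(label: str) -> Callable:
    """Hypothetical decorator factory: wraps the returned text in <label>...</label>."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> str:
            return f"<{label}>{func(*args, **kwargs)}</{label}>"
        return wrapper
    return decorator

@tag("b")   # applied second: becomes the outermost wrapper
@tag("i")   # applied first: sits closest to the function
def text() -> str:
    return "hi"

print(text())  # <b><i>hi</i></b>
```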

Example 5: Caching Decorator (Memoization)

import functools
import time
from typing import Callable, Any
 
def cache(func: Callable) -> Callable:
    """Decorator that caches function results (positional, hashable arguments only)."""
    results: dict[tuple, Any] = {}
 
    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        if args not in results:
            results[args] = func(*args)
        return results[args]
    return wrapper
 
@cache
def expensive_computation(n: int) -> int:
    """Simulates a slow calculation."""
    time.sleep(1)
    return n ** 2
 
print(expensive_computation(5))  # 25 - takes 1 second
print(expensive_computation(5))  # 25 - instant (from cache)
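Writing a cache by hand is a good exercise, but the standard library already ships one: `functools.lru_cache`. The sketch below uses a hypothetical `slow_square` function and a call counter to show that the function body only runs on cache misses.

```python
import functools

call_count = 0  # counts how often the real function body runs

@functools.lru_cache(maxsize=128)
def slow_square(n: int) -> int:
    """Stand-in for an expensive computation."""
    global call_count
    call_count += 1
    return n ** 2

slow_square(5)  # miss: body runs
slow_square(5)  # hit: served from the cache
slow_square(6)  # miss: body runs

print(call_count)  # 2
info = slow_square.cache_info()
print(info.hits, info.misses)  # 1 2
```

Like the hand-written version, `lru_cache` requires the arguments to be hashable; unlike it, it also handles keyword arguments and evicts old entries once `maxsize` is reached.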

Part 4: Type Hints - Your First Production Standard

Why Type Hints Matter

Catch bugs early: IDEs warn you when you pass the wrong type
Better autocomplete: Your editor knows what methods are available
Self-documenting: Readers immediately know what types are expected
Easier refactoring: Change a type signature, find all broken calls

From Day 5 onward, type hints are a production standard in this series: every function must have type hints on its parameters and its return type.
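One thing to keep in mind: Python itself does not enforce type hints when the code runs. They are information for editors and static type checkers (mypy is one example of such a tool). The sketch below shows a call that violates the hints and still runs.

```python
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# The hints say int, but Python does not check them at runtime:
# this call runs and concatenates the two strings instead
result = add("2", "3")
print(result)  # 23

# The hints are stored on the function, which is how tools can read them
print(add.__annotations__["a"] is int)  # True
```

A static type checker would flag the `add("2", "3")` call before the program ever runs, which is where the bug-catching benefit comes from.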

Example 1: Basic Types

def greet(name: str) -> str:
    """Say hello to someone."""
    return f"Hello, {name}!"
 
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b
 
def is_positive(num: float) -> bool:
    """Check if a number is positive."""
    return num > 0

Example 2: Collections

def first_three(numbers: list[int]) -> list[int]:
    """Return the first three numbers."""
    return numbers[:3]
 
def count_words(text: str) -> dict[str, int]:
    """Count occurrences of each word."""
    words = text.split()
    return {word: words.count(word) for word in set(words)}
 
def get_coords() -> tuple[float, float]:
    """Return latitude and longitude."""
    return (40.7128, -74.0060)
 
def unique_tags(tags: list[str]) -> set[str]:
    """Return unique tags."""
    return set(tags)
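The `count_words` version above calls `words.count()` once per unique word, which rescans the whole list each time. For longer texts, a single-pass alternative is `collections.Counter` from the standard library. This is a sketch of the same function with the same signature:

```python
from collections import Counter

def count_words(text: str) -> dict[str, int]:
    """Count occurrences of each word in a single pass."""
    return dict(Counter(text.split()))

counts = count_words("the cat and the hat")
print(counts)  # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```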

Example 3: Optional Types

from typing import Optional
 
def find_user(user_id: int) -> Optional[str]:
    """Find a user by ID, or None if not found."""
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)
 
# Python 3.10+ allows this syntax
def find_user_modern(user_id: int) -> str | None:
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)

Example 4: Union Types (Multiple Possible Types)

from typing import Union
 
def process_data(data: Union[int, str]) -> str:
    """Accept either int or str."""
    return str(data).upper()
 
# Python 3.10+ allows this syntax
def process_data_modern(data: int | str) -> str:
    return str(data).upper()

Example 5: Generic Types with TypeVar

from typing import TypeVar
 
T = TypeVar('T')  # 'T' can be any type
 
def get_first(items: list[T]) -> T:
    """Get the first item from a list."""
    return items[0]
 
# Works with any type
first_num = get_first([1, 2, 3])      # int
first_str = get_first(["a", "b"])     # str

Example 6: Callable Types (Function Types)

from typing import Callable
 
def apply_twice(func: Callable[[int], int], value: int) -> int:
    """Apply a function twice to a value."""
    return func(func(value))
 
def square(n: int) -> int:
    return n ** 2
 
result = apply_twice(square, 3)  # 3 -> 9 -> 81
print(result)  # 81

Example 7: Protocol for Duck Typing

from typing import Protocol
 
class Drawable(Protocol):
    """Anything with a draw() method."""
    def draw(self) -> None:
        ...
 
class Circle:
    def draw(self) -> None:
        print("Drawing circle")
 
class Square:
    def draw(self) -> None:
        print("Drawing square")
 
def render(shape: Drawable) -> None:
    """Draw any shape."""
    shape.draw()
 
render(Circle())   # Works
render(Square())   # Works
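By default a Protocol is checked only by static type checkers. If you also want an `isinstance()` check at runtime, `typing.runtime_checkable` enables it. A small sketch follows; `Rock` is a hypothetical class with no `draw()` method, added for contrast.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Drawable(Protocol):
    """Anything with a draw() method."""
    def draw(self) -> None:
        ...

class Circle:
    def draw(self) -> None:
        print("Drawing circle")

class Rock:  # hypothetical class that does NOT implement draw()
    pass

print(isinstance(Circle(), Drawable))  # True
print(isinstance(Rock(), Drawable))    # False
```

The runtime check only confirms that a `draw` attribute exists; it does not verify the method's parameter or return types.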

Example 8: TypedDict for Typed Dictionaries

from typing import TypedDict
 
class Person(TypedDict):
    name: str
    age: int
    email: str
 
def greet_person(person: Person) -> str:
    return f"Hello {person['name']}, age {person['age']}"
 
# Annotate the variable as Person so a type checker validates the keys and value types
person: Person = {"name": "Alice", "age": 30, "email": "alice@example.com"}
print(greet_person(person))
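As with other type hints, a TypedDict adds no runtime validation: the value is an ordinary `dict`. The sketch below uses a trimmed-down `Person` with a deliberately wrong value to make that visible.

```python
from typing import TypedDict

class Person(TypedDict):
    name: str
    age: int

person: Person = {"name": "Alice", "age": 30}
print(type(person) is dict)  # True - it is a plain dict at runtime

# A type checker would report this line, but Python runs it without complaint
bad: Person = {"name": "Bob", "age": "unknown"}
print(bad["age"])  # unknown
```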

Part 5: Dataclasses - Auto-Generating Data Structure Boilerplate

The Analogy

A dataclass is like a form template. Instead of writing __init__, __repr__, and __eq__ by hand, the decorator fills them in automatically.

Example 1: Basic Dataclass

from dataclasses import dataclass
 
@dataclass
class Person:
    name: str
    age: int
    email: str
 
# Automatically gets __init__, __repr__, __eq__
person = Person(name="Alice", age=30, email="alice@example.com")
print(person)  # Person(name='Alice', age=30, email='alice@example.com')
 
# Can compare directly
person2 = Person(name="Alice", age=30, email="alice@example.com")
print(person == person2)  # True

Example 2: Default Values

from dataclasses import dataclass
 
@dataclass
class Config:
    host: str
    port: int = 8000
    debug: bool = False
 
# Can omit fields with defaults
config = Config(host="localhost")
print(config)  # Config(host='localhost', port=8000, debug=False)

Example 3: `field()` for Custom Defaults

from dataclasses import dataclass, field
 
@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)
    scores: dict[str, int] = field(default_factory=dict)
 
team1 = Team(name="Alpha")
team2 = Team(name="Beta")
 
# Each team has its own list/dict (not shared!)
team1.members.append("Alice")
print(team1.members)  # ['Alice']
print(team2.members)  # []
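To see why `default_factory` is needed, try the obvious alternative of writing `= []` directly. Dataclasses refuse it at class-definition time, because a single list would otherwise be shared by every instance. A sketch, using a hypothetical `BrokenTeam` class:

```python
from dataclasses import dataclass

try:
    @dataclass
    class BrokenTeam:
        name: str
        members: list[str] = []  # mutable default: rejected by @dataclass
except ValueError as e:
    error_message = str(e)
    print(f"Error: {error_message}")
```

The error message points you to `default_factory`, which is the fix used in the example above.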

Example 4: Validation with `__post_init__`

from dataclasses import dataclass
 
@dataclass
class Person:
    name: str
    age: int
 
    def __post_init__(self) -> None:
        """Validate after initialization."""
        if self.age < 0:
            raise ValueError(f"Age cannot be negative: {self.age}")
        if not self.name:
            raise ValueError("Name cannot be empty")
 
# This works
person = Person(name="Alice", age=30)
 
# This fails
try:
    bad_person = Person(name="Bob", age=-5)
except ValueError as e:
    print(f"Error: {e}")  # Error: Age cannot be negative: -5

Example 5: Immutable Dataclass with `frozen=True`

from dataclasses import dataclass
 
@dataclass(frozen=True)
class Point:
    x: float
    y: float
 
point = Point(x=10.0, y=20.0)
print(point)  # Point(x=10.0, y=20.0)
 
# Try to modify
try:
    point.x = 15.0
except Exception as e:
    print(f"Error: {type(e).__name__}")  # Error: FrozenInstanceError
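Two things follow from freezing a dataclass. To "change" a value you create a modified copy, for which the standard library offers `dataclasses.replace`. And because frozen instances are hashable, they can go into sets or serve as dictionary keys. A short sketch:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Point:
    x: float
    y: float

origin = Point(x=0.0, y=0.0)

# replace() builds a NEW Point; the original is untouched
moved = replace(origin, x=5.0)
print(origin)  # Point(x=0.0, y=0.0)
print(moved)   # Point(x=5.0, y=0.0)

# Frozen dataclasses are hashable, so equal points collapse in a set
visited = {origin, moved, Point(x=0.0, y=0.0)}
print(len(visited))  # 2
```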

Part 6: pathlib - Working with File Paths Correctly

Why pathlib Over os.path

os.path works on plain strings and needs a function call for every join: os.path.join("folder", "file.txt")
pathlib uses the / operator: Path("folder") / "file.txt"
Handles Windows, Mac, Linux paths automatically
More readable, more Pythonic
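A side-by-side sketch of the two styles. To keep the printed output identical on every operating system, it uses `posixpath` (the POSIX flavor of `os.path`) and `PurePosixPath`; in everyday code you would normally write `os.path` and `Path`. The file names are made up for illustration.

```python
import posixpath
from pathlib import PurePosixPath

# String-based style: one function call per operation
joined_str = posixpath.join("data", "raw", "train.csv")
ext_str = posixpath.splitext(joined_str)[1]

# pathlib style: the / operator plus attributes on the path object
joined_path = PurePosixPath("data") / "raw" / "train.csv"
ext_path = joined_path.suffix

print(joined_str)   # data/raw/train.csv
print(joined_path)  # data/raw/train.csv
print(ext_str, ext_path)  # .csv .csv
```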

Example 1: Creating and Checking Paths

from pathlib import Path
 
# Create a Path
file_path = Path("data") / "input.txt"  # Works on any OS
print(file_path)  # data/input.txt (or data\input.txt on Windows)
 
# Check if it exists
if file_path.exists():
    print("File exists!")
else:
    print("File does not exist")
 
# Check type
if file_path.is_file():
    print("It's a file")
elif file_path.is_dir():
    print("It's a directory")

Example 2: Reading and Writing

from pathlib import Path
 
# Write text
output_file = Path("output") / "result.txt"
output_file.parent.mkdir(parents=True, exist_ok=True)  # Create parent dir if needed
output_file.write_text("Hello, World!")
 
# Read text
content = output_file.read_text()
print(content)  # Hello, World!
 
# Append text (read, modify, write back)
content = output_file.read_text()
output_file.write_text(content + "\nNew line added")
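Reading the whole file just to add a line works for small files, but it rewrites everything each time. For logs or larger files, opening the path in append mode is a common alternative. A sketch using a temporary directory and a hypothetical `run.log` file:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "run.log"  # hypothetical file name
    log_file.write_text("first line\n", encoding="utf-8")

    # Mode "a" adds to the end of the file without reading or rewriting it
    with log_file.open("a", encoding="utf-8") as f:
        f.write("second line\n")

    content = log_file.read_text(encoding="utf-8")

print(content.splitlines())  # ['first line', 'second line']
```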

Example 3: Listing Files with `.glob()`

from pathlib import Path
 
# Find all .txt files recursively
for txt_file in Path("data").glob("**/*.txt"):
    print(txt_file)
 
# Find all .py files in current directory (non-recursive)
for py_file in Path(".").glob("*.py"):
    print(py_file)

Example 4: Iterating Directories with `.iterdir()`

from pathlib import Path
 
# List everything in a directory
for item in Path("data").iterdir():
    if item.is_file():
        print(f"File: {item.name}")
    elif item.is_dir():
        print(f"Directory: {item.name}")

Example 5: Working with Path Components

from pathlib import Path
 
file_path = Path("data/analysis/results.txt")
 
print(file_path.name)        # results.txt
print(file_path.stem)        # results (filename without extension)
print(file_path.suffix)      # .txt (extension)
print(file_path.parent)      # data/analysis
print(file_path.parts)       # ('data', 'analysis', 'results.txt')

Example 6: Resolving to Absolute Path

from pathlib import Path
 
relative = Path("data/input.txt")
absolute = relative.resolve()
print(absolute)  # /full/path/to/data/input.txt

The Project: File Analysis Tool

Let’s build a complete file analysis tool that uses all six concepts together:

Requirements

Use pathlib to scan a directory
Use a generator to lazily yield file statistics for each file
Use a list comprehension to filter by file extension
Use @dataclass for the FileStats data structure
Use the @timer decorator to measure analysis time
Use type hints on every function

Step 1: Create the Main Script

Create analyzer.py:

import time
import functools
from dataclasses import dataclass
from pathlib import Path
from typing import Generator, Callable, Any
 
# =============================================================================
# Data Structure: FileStats
# =============================================================================
 
@dataclass
class FileStats:
    """Statistics for a single file."""
    name: str
    path: Path
    size_bytes: int
    extension: str
 
    def __post_init__(self) -> None:
        """Validate after initialization."""
        if self.size_bytes < 0:
            raise ValueError(f"Size cannot be negative: {self.size_bytes}")
 
# =============================================================================
# Decorator: Timer
# =============================================================================
 
def timer(func: Callable) -> Callable:
    """Decorator that measures function execution time."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        elapsed = end - start
        print(f"\n[TIMER] {func.__name__} took {elapsed:.4f} seconds\n")
        return result
    return wrapper
 
# =============================================================================
# Generator: Analyze Files
# =============================================================================
 
def analyze_directory(directory: Path) -> Generator[FileStats, None, None]:
    """
    Generator that yields FileStats for each file in a directory.
    Lazy evaluation - doesn't load all files at once.
    """
    if not directory.is_dir():
        raise ValueError(f"Not a directory: {directory}")
 
    for file_path in directory.rglob("*"):
        if file_path.is_file():
            try:
                size = file_path.stat().st_size
                yield FileStats(
                    name=file_path.name,
                    path=file_path,
                    size_bytes=size,
                    extension=file_path.suffix,
                )
            except OSError:
                # Skip files we can't access
                pass
 
# =============================================================================
# Main Analysis Functions with Type Hints
# =============================================================================
 
def filter_by_extension(
    stats: Generator[FileStats, None, None],
    extension: str
) -> list[FileStats]:
    """
    Filter files by extension using list comprehension.
    Example: filter_by_extension(gen, ".py") returns only Python files.
    """
    return [
        stat for stat in stats
        if stat.extension.lower() == extension.lower()
    ]
 
def get_largest_files(
    stats: Generator[FileStats, None, None],
    top_n: int = 5
) -> list[FileStats]:
    """
    Return the N largest files, sorted by size descending.
    """
    all_stats = list(stats)
    return sorted(all_stats, key=lambda s: s.size_bytes, reverse=True)[:top_n]
 
def total_size_by_extension(
    stats: Generator[FileStats, None, None]
) -> dict[str, int]:
    """
    Use dict comprehension to sum sizes by extension.
    """
    all_stats = list(stats)
    return {
        ext: sum(s.size_bytes for s in all_stats if s.extension == ext)
        for ext in {s.extension for s in all_stats}
    }
 
def format_size(size_bytes: int) -> str:
    """Convert bytes to human-readable format."""
    size = float(size_bytes)  # work on a float copy so the int parameter keeps its declared type
    for unit in ["B", "KB", "MB", "GB"]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"
 
# =============================================================================
# Main Entry Point
# =============================================================================
 
@timer
def main(directory: str) -> None:
    """
    Run full analysis with timing.
    """
    target_dir = Path(directory)
 
    if not target_dir.exists():
        print(f"Error: Directory does not exist: {target_dir}")
        return
 
    print(f"Analyzing directory: {target_dir}\n")
 
    # Example 1: All Python files
    print("=" * 70)
    print("PYTHON FILES")
    print("=" * 70)
    py_files = filter_by_extension(analyze_directory(target_dir), ".py")
    for stat in py_files:
        print(f"  {stat.name:<40} {format_size(stat.size_bytes):>10}")
 
    # Example 2: Largest files
    print("\n" + "=" * 70)
    print("TOP 5 LARGEST FILES")
    print("=" * 70)
    largest = get_largest_files(analyze_directory(target_dir), top_n=5)
    for stat in largest:
        print(f"  {stat.name:<40} {format_size(stat.size_bytes):>10}")
 
    # Example 3: Size by extension
    print("\n" + "=" * 70)
    print("SIZE BY FILE EXTENSION")
    print("=" * 70)
    sizes = total_size_by_extension(analyze_directory(target_dir))
    for ext, size in sorted(sizes.items(), key=lambda x: x[1], reverse=True):
        ext_label = ext if ext else "[no extension]"
        print(f"  {ext_label:<40} {format_size(size):>10}")
 
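if __name__ == "__main__":
    import sys  # local import keeps this entry-point block self-contained

    # Analyze the directory given on the command line, or the current directory by default
    main(sys.argv[1] if len(sys.argv) > 1 else ".")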
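The `total_size_by_extension` function above rescans the full list once per extension. That is fine for small directories; for large ones, a single pass with `collections.defaultdict` is a common alternative. The sketch below is a simplified stand-in, not the script's actual function: it takes hypothetical `(extension, size_bytes)` tuples instead of `FileStats` objects so that it runs on its own.

```python
from collections import defaultdict
from typing import Iterable

def total_size_single_pass(files: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Sum sizes per extension in one pass over (extension, size_bytes) pairs."""
    totals: defaultdict[str, int] = defaultdict(int)
    for extension, size_bytes in files:
        totals[extension] += size_bytes  # missing keys start at 0
    return dict(totals)

# Made-up sample values for illustration
sample = [(".py", 2100), (".md", 2880), (".py", 1150)]
totals = total_size_single_pass(iter(sample))
print(totals)  # {'.py': 3250, '.md': 2880}
```

Because it never calls `list()` on its input, this version also keeps the laziness of the generator it consumes.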

Step 2: Create Sample Data

Create create_sample_data.py to generate test files:

from pathlib import Path
 
def create_sample_structure() -> None:
    """Create sample directory structure for testing."""
    base = Path("sample_data")
    base.mkdir(exist_ok=True)
 
    # Create subdirectories
    (base / "python_scripts").mkdir(exist_ok=True)
    (base / "config").mkdir(exist_ok=True)
    (base / "docs").mkdir(exist_ok=True)
 
    # Create sample files
    (base / "python_scripts" / "main.py").write_text(
        "print('Hello World')\n" * 100
    )
    (base / "python_scripts" / "utils.py").write_text(
        "def helper():\n    pass\n" * 50
    )
    (base / "config" / "settings.json").write_text(
        '{"debug": true, "port": 8000}\n' * 30
    )
    (base / "config" / "database.ini").write_text(
        "[database]\nhost=localhost\nport=5432\n" * 40
    )
    (base / "docs" / "README.md").write_text(
        "# Project Documentation\n\nThis is a sample file.\n" * 60
    )
    (base / "notes.txt").write_text(
        "Random notes and ideas\n" * 25
    )
 
    print(f"Sample data created in: {base}")
 
if __name__ == "__main__":
    create_sample_structure()

Step 3: Run the Analysis

# Create sample data
python create_sample_data.py
# Output: Sample data created in: sample_data
 
# Analyze the sample data
python analyzer.py sample_data

Expected Output

Analyzing directory: sample_data
 
======================================================================
PYTHON FILES
======================================================================
  main.py                                      2.1 KB
  utils.py                                     1.1 KB
 
======================================================================
TOP 5 LARGEST FILES
======================================================================
  README.md                                    2.8 KB
  main.py                                      2.1 KB
  database.ini                                 1.4 KB
  utils.py                                     1.1 KB
  settings.json                               900.0 B
 
======================================================================
SIZE BY FILE EXTENSION
======================================================================
  .py                                          3.2 KB
  .md                                          2.8 KB
  .ini                                         1.4 KB
  .json                                       900.0 B
  .txt                                        575.0 B
 
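[TIMER] main took 0.0024 seconds

The exact time will differ on your machine, and the order of files within the first section depends on the filesystem. The sizes assume Unix line endings; on Windows, newline translation makes each file slightly larger.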

What’s Next

Day 6 introduces NumPy and Pandas, Python’s powerhouses for numerical computing and data analysis.

Edward
