Episode 3: Dependencies & Environments
Learning Objectives
By the end of this episode, you will:
- Understand how to specify package dependencies
- Use optional dependencies with "extras"
- Work with virtual environments effectively
- Handle version constraints properly
- Understand the difference between package and development dependencies
Adding New Features
Dr. Sarah's kir-pydemo package is getting popular! Now she wants to add some new features:
- Read FASTA files (needs biopython)
- Create plots of GC content distributions (needs matplotlib)
- Statistical analysis of sequences (needs numpy, scipy)
But she has concerns:
"Not everyone needs all these features. Do I force all users to install matplotlib even if they just want basic sequence analysis? What if someone's using an old version of numpy that conflicts with what I need?"
The solution? Proper dependency management with pyproject.toml!
Understanding Dependencies
Dependencies are other Python packages that your package needs to work. There are different types:
1. Core Dependencies
Required for basic functionality and installed automatically with your package. They are listed under the dependencies key in the [project] table.
2. Optional Dependencies
Needed for extra features - installed only when requested:
[project.optional-dependencies]
plotting = ["matplotlib>=3.5.0"]
dev = ["pytest>=7.0", "black>=22.0"]
Installed with: pip install kir-pydemo[plotting]
3. Development Dependencies
Tools for development - not needed by users:
- Testing frameworks (pytest)
- Code formatters (black, ruff)
- Documentation builders (sphinx)
- Type checkers (mypy)
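To make the split between core and optional dependencies concrete, here is a minimal sketch of how an installer might decide which requirements to request for a given set of extras. The `requirements_for` function and the `some-core-package` requirement are hypothetical and exist only for illustration; the extras mirror the example above. This is not how pip is implemented, only a model of the behaviour.

```python
from typing import Dict, Iterable, List

# Hypothetical metadata. "some-core-package" is a placeholder, not a real
# dependency of kir-pydemo; the extras mirror the example above.
CORE: List[str] = ["some-core-package>=1.0"]
OPTIONAL: Dict[str, List[str]] = {
    "plotting": ["matplotlib>=3.5.0"],
    "dev": ["pytest>=7.0", "black>=22.0"],
}


def requirements_for(extras: Iterable[str] = ()) -> List[str]:
    """Return the requirement strings requested for the given extras.

    Core dependencies are always included; each named extra adds its own
    list on top. This sketch raises on an unknown extra, which is stricter
    than a real installer may be.
    """
    selected = list(CORE)
    for extra in extras:
        if extra not in OPTIONAL:
            raise KeyError(f"unknown extra: {extra!r}")
        selected.extend(OPTIONAL[extra])
    return selected


if __name__ == "__main__":
    print(requirements_for())              # core only
    print(requirements_for(["plotting"]))  # core + plotting extra
```

A plain install corresponds to `requirements_for()`, while `pip install kir-pydemo[plotting]` corresponds to `requirements_for(["plotting"])`.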
Hands-On: Adding Dependencies
Step 1: Decide What's Core vs. Optional
For kir-pydemo, let's say we want to add:
- Core: None (our basic functions use only stdlib!)
- Optional - bio: biopython for FASTA file support
- Optional - plotting: matplotlib for visualization
- Optional - dev: Testing and code quality tools
Step 2: Update pyproject.toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "kir-pydemo"
version = "0.2.0"  # Bumped version
description = "A demonstration package for DNA sequence analysis"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
{name = "BMRC Training", email = "training@example.com"}
]
keywords = ["bioinformatics", "DNA", "sequence analysis", "tutorial"]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Science/Research",
"Topic :: Scientific/Engineering :: Bio-Informatics",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
# NEW: Optional dependencies
[project.optional-dependencies]
bio = [
"biopython>=1.80",
]
plotting = [
"matplotlib>=3.5.0",
"numpy>=1.20.0",
]
dev = [
"pytest>=7.4.0",
"pytest-cov>=4.1.0",
"black>=23.0.0",
"ruff>=0.1.0",
"mypy>=1.5.0",
]
# Convenience: Install everything
all = [
"kir-pydemo[bio,plotting]",
]
[project.scripts]
kir-pydemo = "kir_pydemo.cli:main"
[project.urls]
Homepage = "https://github.com/bmrc/kir-pydemo"
Documentation = "https://kir-pydemo.readthedocs.io"
Repository = "https://github.com/bmrc/kir-pydemo"
Issues = "https://github.com/bmrc/kir-pydemo/issues"
Step 3: Version Constraints
Let's understand version specifiers:
dependencies = [
"numpy", # Any version (not recommended!)
"numpy>=1.20.0", # Minimum version
"numpy>=1.20.0,<2.0.0", # Version range
"numpy~=1.20.0", # Compatible release (>=1.20.0, <1.21.0)
"numpy==1.20.0", # Exact version (too restrictive!)
]
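The compatible-release operator is the least obvious of these specifiers, so here is a simplified sketch of what `~=` means, written with plain tuples. The helper names are hypothetical, and the sketch only handles purely numeric versions such as `1.20.3`; real tools also handle pre-releases, post-releases, and epochs, which are ignored here.

```python
from typing import List, Tuple


def _parts(version: str) -> List[int]:
    """Split a purely numeric version like '1.20.3' into [1, 20, 3]."""
    return [int(piece) for piece in version.split(".")]


def _pad(parts: List[int], width: int) -> Tuple[int, ...]:
    """Pad with zeros so that 1.20 and 1.20.0 compare as equal."""
    return tuple(parts + [0] * (width - len(parts)))


def compatible_release(candidate: str, base: str) -> bool:
    """Simplified check for 'candidate ~= base'.

    The candidate must be at least the base version and must share every
    component of the base except the last one.
    """
    base_parts = _parts(base)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two version components")
    cand_parts = _parts(candidate)
    width = max(len(base_parts), len(cand_parts))
    cand = _pad(cand_parts, width)
    floor = _pad(base_parts, width)
    prefix = tuple(base_parts[:-1])
    return cand >= floor and cand[: len(prefix)] == prefix


if __name__ == "__main__":
    for version in ("1.19.5", "1.20.0", "1.20.3", "1.21.0"):
        print(version, compatible_release(version, "1.20.0"))
```

Note that the number of components in the base matters: under this reading, `~=1.20.0` stays within the 1.20 series, while `~=1.20` accepts any 1.x release from 1.20 onwards.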
Best practices:
- ✅ Use minimum versions: package>=1.0.0
- ✅ Exclude known broken versions: package>=1.0.0,!=1.2.0
- ✅ Use upper bounds cautiously: package>=1.0.0,<2.0.0
- ❌ Avoid pinning exact versions in libraries: package==1.0.0
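Pinning vs. Constraints
Libraries (packages imported by others) should use loose constraints, for example numpy>=1.20.0, so they can coexist with whatever else the user has installed.
Applications (final products) can pin exact versions, for example numpy==1.20.0, because they control their whole environment.
kir-pydemo is a library, so we use minimum version constraints.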
Step 4: Add FASTA Support
Create a new module src/kir_pydemo/io.py that uses biopython:
"""File I/O utilities for sequence data."""

from pathlib import Path
from typing import List, Tuple

try:
    from Bio import SeqIO
    HAS_BIOPYTHON = True
except ImportError:
    HAS_BIOPYTHON = False


def read_fasta(filepath: Path) -> List[Tuple[str, str]]:
    """
    Read sequences from a FASTA file.

    Parameters
    ----------
    filepath : Path
        Path to the FASTA file

    Returns
    -------
    List[Tuple[str, str]]
        List of (name, sequence) tuples

    Raises
    ------
    ImportError
        If biopython is not installed
    FileNotFoundError
        If the file doesn't exist

    Examples
    --------
    >>> sequences = read_fasta(Path("sequences.fasta"))
    >>> for name, seq in sequences:
    ...     print(f"{name}: {len(seq)} bp")
    """
    if not HAS_BIOPYTHON:
        raise ImportError(
            "biopython is required for FASTA support. "
            "Install with: pip install kir-pydemo[bio]"
        )
    if not filepath.exists():
        raise FileNotFoundError(f"File not found: {filepath}")
    sequences = []
    for record in SeqIO.parse(filepath, "fasta"):
        sequences.append((record.id, str(record.seq)))
    return sequences
Graceful Degradation
Notice the pattern:
- Try to import optional dependency
- Set a HAS_* flag
- Check the flag before using the feature
- Raise helpful error if not installed
This allows users to install only what they need!
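The same pattern can be wrapped in a small reusable helper. The sketch below is one possible variant, not part of kir-pydemo: the `require_optional` name and its parameters are hypothetical. It uses the standard library's `importlib.util.find_spec` to check whether a top-level module is available before importing it, and raises an `ImportError` that names the extra to install.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(
    module_name: str, extra: str, package: str = "kir-pydemo"
) -> ModuleType:
    """Import a top-level optional module or raise a helpful ImportError.

    `module_name` is the import name (for example "Bio"), and `extra` is
    the extras group that provides it (for example "bio").
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name} is required for this feature. "
            f'Install with: pip install "{package}[{extra}]"'
        )
    return importlib.import_module(module_name)


if __name__ == "__main__":
    # "json" is in the standard library, so this always succeeds.
    json_module = require_optional("json", extra="bio")
    print(json_module.dumps({"ok": True}))
```

Compared with a module-level HAS_* flag, this defers the check to the moment the feature is used, at the cost of one lookup per call.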
Step 5: Update CLI for FASTA Support
Update src/kir_pydemo/cli.py to support FASTA files:
# Needed at the top of cli.py if not already imported
import argparse
import sys
from pathlib import Path

# Add to the gc-content subcommand
gc_parser.add_argument(
    "--fasta",
    type=Path,
    help="read sequences from FASTA file (requires: pip install kir-pydemo[bio])",
)

# In cmd_gc_content function
def cmd_gc_content(args: argparse.Namespace) -> int:
    """Handle the gc-content command."""
    sequences = []
    if args.fasta:
        try:
            from kir_pydemo.io import read_fasta
            fasta_sequences = read_fasta(args.fasta)
            for name, seq in fasta_sequences:
                result = gc_content(seq)
                print(f"{name}: GC content = {result:.{args.precision}f}%")
            return 0
        except (ImportError, FileNotFoundError) as e:
            # read_fasta raises ImportError without biopython and
            # FileNotFoundError for a missing file; report both cleanly
            print(f"Error: {e}", file=sys.stderr)
            return 1
    # ... rest of the function
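For readers who want to try the wiring end to end, here is a self-contained sketch of a tiny CLI built along the same lines. Several parts are assumptions made for illustration: the `gc_content` stand-in, the positional `sequence` argument, the `--precision` default of 2, and the `build_parser` and `main` names. The real cli.py may be organised differently.

```python
import argparse
import sys
from pathlib import Path
from typing import List, Optional


def gc_content(seq: str) -> float:
    """Stand-in for the package function: percentage of G and C bases."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kir-pydemo")
    subparsers = parser.add_subparsers(dest="command", required=True)
    gc_parser = subparsers.add_parser("gc-content")
    # Assumed positional argument; optional so that --fasta can replace it.
    gc_parser.add_argument("sequence", nargs="?", help="DNA sequence")
    gc_parser.add_argument("--fasta", type=Path, help="read sequences from FASTA file")
    gc_parser.add_argument("--precision", type=int, default=2)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.fasta is not None:
        try:
            # Imported lazily so the basic command works without biopython.
            from kir_pydemo.io import read_fasta
            records = read_fasta(args.fasta)
        except (ImportError, FileNotFoundError) as exc:
            print(f"Error: {exc}", file=sys.stderr)
            return 1
    elif args.sequence:
        records = [("input", args.sequence)]
    else:
        print("Error: give a sequence or --fasta", file=sys.stderr)
        return 1
    for name, seq in records:
        print(f"{name}: GC content = {gc_content(seq):.{args.precision}f}%")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the import of `read_fasta` inside the `--fasta` branch means users who never pass that flag never touch the optional dependency.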
Virtual Environments
Virtual environments isolate your project's dependencies from the system Python.
Why Use Virtual Environments?
Without virtual environments:
System Python
├── numpy==1.19.0 (old project needs this)
├── pandas==1.3.0
└── kir-pydemo attempts to install numpy>=1.20.0 ❌ CONFLICT!
With virtual environments:
System Python
└── virtualenv installed
Project A (venv-a/)
├── numpy==1.19.0
└── pandas==1.3.0
Project B (venv-b/)
├── numpy==1.23.0 ✅ No conflict!
└── kir-pydemo
Creating Virtual Environments
Using venv (built-in)
# Create a virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
# Your prompt changes: (venv) user@host:~$
# Install packages in this environment
pip install -e ".[dev]"
# Deactivate when done
deactivate
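The same steps can also be driven from Python, which is sometimes handy in setup scripts. The sketch below uses the standard library's `venv` module to create an environment and checks whether the current interpreter is running inside one. The function names and the temporary directory are illustrative assumptions, and `with_pip=False` is used only to keep the sketch fast; the shell command above installs pip into the environment by default.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    print("Running inside a virtual environment:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "venv")
        print("Created:", cfg.name, "exists:", cfg.is_file())
```

Activation itself is a shell concern: the activate script only adjusts PATH and the prompt, so it has no direct Python equivalent.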
Installing with Extras
# Install just the package
uv pip install kir-pydemo
# Install with bio support
uv pip install kir-pydemo[bio]
# Install with multiple extras
uv pip install kir-pydemo[bio,plotting]
# Install everything
uv pip install kir-pydemo[all]
# For development (editable install with dev tools)
uv pip install -e ".[dev]"
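Quote the Extras
On some shells (especially zsh), square brackets are treated as glob patterns, so put the requirement in quotes: pip install "kir-pydemo[bio,plotting]"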
requirements.txt vs pyproject.toml
People often ask: "Should I use requirements.txt or pyproject.toml?"
pyproject.toml (for libraries)
Use when:
- Building a package to distribute
- Want to specify minimum requirements
- Need flexibility for users
requirements.txt (for applications)
Use when:
- Deploying an application
- Need reproducible environments
- Want exact versions
Both Together
For kir-pydemo development, you might have:
pyproject.toml - Loose constraints for users, for example matplotlib>=3.5.0
requirements-dev.txt - Pinned versions for development, for example pytest==7.4.0
Generate from current environment: pip freeze > requirements-dev.txt
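To see what "pinned versions from the current environment" means in practice, here is a rough sketch that builds pinned `name==version` lines from the installed distributions using the standard library's `importlib.metadata`. The `freeze` function name is hypothetical, and the output is only an approximation of what a real freeze command produces (it ignores editable installs and direct URL references, for instance).

```python
from importlib.metadata import distributions
from typing import List


def freeze() -> List[str]:
    """Return 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print in requirements.txt style; redirect to a file to save it.
    print("\n".join(freeze()))
```

Every line carries an exact `==` pin, which is what makes the resulting file reproducible and also what makes it unsuitable as a library's dependency list.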
Dependency Lock Files
Modern tools provide lock files for reproducible installs:
Poetry (poetry.lock)
# Install poetry
uv pip install poetry
# Initialize
poetry init
# Add dependency
poetry add numpy
# Generates poetry.lock with exact versions
PDM (pdm.lock)
# Install pdm
uv pip install pdm
# Initialize
pdm init
# Add dependency
pdm add numpy
# Generates pdm.lock
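pip-tools (requirements.txt + requirements.in)
The pip-compile command reads the loose constraints in requirements.in and writes fully pinned versions to requirements.txt.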
Lock Files in Practice
For kir-pydemo (a library), we don't commit lock files to the repository. For applications, lock files ensure everyone uses identical dependency versions.
Checkpoint: What Have We Achieved?
Verify you've successfully completed Episode 3:
- Added optional dependencies to pyproject.toml
- Created extras: [bio], [plotting], [dev], [all]
- Implemented graceful dependency handling with try/except
- Added FASTA file support with biopython
- Created and activated a virtual environment
- Installed package with extras: pip install -e ".[dev]"
- Understand version constraints and when to use them
Key Takeaways
- Dependencies listed under the dependencies key in the [project] table are always installed
- Optional dependencies in [project.optional-dependencies] are installed with [extras]
- Version constraints should be loose for libraries, strict for applications
- Virtual environments isolate project dependencies
- Graceful degradation provides helpful errors when optional deps are missing
- Lock files ensure reproducible environments (more important for apps than libraries)
What's Next?
In Episode 4, we'll add Testing & Quality tools to ensure kir-pydemo is reliable and maintainable:
- Writing tests with pytest
- Code formatting with black/ruff
- Type checking with mypy
- Pre-commit hooks for automation
This will make your package production-ready!