Episode 1: Project Structure & pyproject.toml
Learning Objectives
By the end of this episode, you will:
- Understand what Python packaging is and why it matters
- Know the modern approach using pyproject.toml
- Set up a proper project structure with the src/ layout
- Create your first installable Python package
- Install and test your package in editable mode
The Story Begins
Meet Dr. Sarah, a bioinformatics researcher who has written some useful Python functions for DNA sequence analysis. Currently, her code looks like this:
# sequence_utils.py
def gc_content(sequence):
    """Calculate GC content of a DNA sequence."""
    sequence = sequence.upper()
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    return (g_count + c_count) / len(sequence) * 100

def reverse_complement(sequence):
    """Get reverse complement of a DNA sequence."""
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join(complement.get(base, base) for base in reversed(sequence.upper()))
Sarah copies this file into each new analysis project. But there are problems:
- No version control - Which version is the latest?
- Code duplication - Changes must be made everywhere
- Hard to share - Colleagues can't easily use her functions
- No discoverability - Others don't know these tools exist
The solution? Package it! Let's help Sarah transform her scripts into kir-pydemo, a proper Python package.
What is Python Packaging?
Python packaging is the process of bundling your code so it can be:
- Installed with pip install
- Imported in any Python script: import kir_pydemo
- Versioned and tracked
- Shared with colleagues or the world
- Managed with proper dependencies
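Once a package is installed, its version is recorded in metadata that any script can query. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is illustrative and not part of kir-pydemo.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string once kir-pydemo is installed, otherwise None.
print(installed_version("kir-pydemo"))
```

Note that the lookup uses the package name (with a hyphen), not the import name.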
Modern Python Packaging: The Evolution
The Old Way: setup.py
Previously, Python packages used setup.py:
# setup.py (old approach - don't use this!)
from setuptools import setup, find_packages

setup(
    name="kir-pydemo",
    version="0.1.0",
    packages=find_packages(),
    # ... more configuration
)
Problems with setup.py:
- Mixes configuration with executable code
- Different tools had different formats
- Security concerns (executes arbitrary Python)
- Hard to parse by automated tools
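The Modern Way: pyproject.toml
PEP 518 introduced pyproject.toml as the place to declare build requirements, PEP 517 defined a standard interface for build backends, and PEP 621 added the declarative [project] metadata table. Together they give a standardized format: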
# pyproject.toml (modern approach - use this!)
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "kir-pydemo"
version = "0.1.0"
Benefits:
- Declarative and readable (TOML format)
- Standardized across the Python ecosystem
- Safer (no code execution)
- Tool-independent
- Can include configuration for linters, formatters, etc.
Why TOML?
TOML (Tom's Obvious, Minimal Language) is designed for configuration files. It's more human-readable than JSON and less ambiguous than YAML.
Project Structure: The src/ Layout
There are two common project layouts. We'll use the src/ layout (recommended):
kir-pydemo/
├── src/
│   └── kir_pydemo/          # Note: underscores in Python package names
│       ├── __init__.py
│       └── sequence.py
├── tests/
│   └── test_sequence.py
├── pyproject.toml
├── README.md
└── LICENSE
Why src/ Layout?
The src/ layout has a key advantage: it prevents you from accidentally importing the development version of your package instead of the installed version.
Without src/ (flat layout):
kir-pydemo/
├── kir_pydemo/              # Warning: Python might import this directly
│   ├── __init__.py
│   └── sequence.py
├── tests/
└── pyproject.toml
If you're in the project directory and run python -c "import kir_pydemo", Python imports the local directory instead of the installed package, because the current directory comes first on the import path. This can hide installation problems!
With src/ layout:
kir-pydemo/
├── src/
│   └── kir_pydemo/          # Must be installed to import
│       ├── __init__.py
│       └── sequence.py
├── tests/
└── pyproject.toml
Now you must install the package to import it, catching installation issues early.
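To see which copy of a package Python would actually pick up, you can ask the import system directly. This is a minimal sketch using importlib.util.find_spec from the standard library; the helper name import_origin is illustrative.

```python
import importlib.util
from typing import Optional


def import_origin(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# In a flat layout, running this from the project root reports the local
# folder. In a src/ layout it reports None until the package is installed.
print(import_origin("kir_pydemo"))
```

Running it from the project root before and after installation makes the difference between the two layouts visible.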
Package Name vs. Import Name
- Package name (for PyPI): Can use hyphens: kir-pydemo
- Import name (in Python): Must use underscores: kir_pydemo
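The relationship between the two names can be sketched in a few lines. Package indexes compare names after normalization (runs of hyphens, underscores, and dots are treated alike, case is ignored), while an import name has to be a valid Python identifier. Both helper names below are illustrative, and default_import_name encodes a common convention rather than a rule enforced by any tool.

```python
import re


def normalize_dist_name(name: str) -> str:
    """Normalize a distribution name the way package indexes compare names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def default_import_name(dist_name: str) -> str:
    """Illustrative convention: swap hyphens and dots for underscores."""
    candidate = re.sub(r"[-.]+", "_", dist_name).lower()
    if not candidate.isidentifier():
        raise ValueError(f"{candidate!r} is not a valid Python identifier")
    return candidate


print(normalize_dist_name("Kir_PyDemo"))  # kir-pydemo
print(default_import_name("kir-pydemo"))  # kir_pydemo
```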
Hands-On: Creating Your Package
Step 1: Create the Directory Structure
# Create the project directory
mkdir kir-pydemo
cd kir-pydemo
# Create the src layout
mkdir -p src/kir_pydemo
mkdir tests
# Create essential files
touch src/kir_pydemo/__init__.py
touch src/kir_pydemo/sequence.py
touch README.md
touch pyproject.toml
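The mkdir -p and touch commands above are not available in every shell (for example, the default Windows command prompt). As a portable alternative, here is a sketch that builds the same skeleton with pathlib; the scaffold function is a hypothetical helper written for this tutorial.

```python
from pathlib import Path


def scaffold(root: Path) -> None:
    """Create the src/ layout skeleton used in this episode."""
    package_dir = root / "src" / "kir_pydemo"
    package_dir.mkdir(parents=True, exist_ok=True)  # like mkdir -p
    (root / "tests").mkdir(exist_ok=True)
    for file_path in (
        package_dir / "__init__.py",
        package_dir / "sequence.py",
        root / "README.md",
        root / "pyproject.toml",
    ):
        file_path.touch(exist_ok=True)  # like touch


if __name__ == "__main__":
    scaffold(Path("kir-pydemo"))
```

Running it twice is harmless, because existing directories and files are left in place.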
Step 2: Write the Package Code
src/kir_pydemo/__init__.py
This file makes kir_pydemo a Python package. It can be empty, or we can expose our main functions:
"""kir-pydemo: A demonstration package for DNA sequence analysis."""
__version__ = "0.1.0"
# Import main functions to make them easily accessible
from kir_pydemo.sequence import gc_content, reverse_complement
__all__ = ["gc_content", "reverse_complement"]
src/kir_pydemo/sequence.py
"""DNA sequence analysis utilities."""


def gc_content(sequence: str) -> float:
    """
    Calculate GC content of a DNA sequence.

    Parameters
    ----------
    sequence : str
        DNA sequence containing A, T, G, C nucleotides

    Returns
    -------
    float
        GC content as a percentage (0-100)

    Examples
    --------
    >>> gc_content("ATGC")
    50.0
    >>> gc_content("AAAA")
    0.0
    """
    if not sequence:
        raise ValueError("Sequence cannot be empty")
    sequence = sequence.upper()
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    return (g_count + c_count) / len(sequence) * 100


def reverse_complement(sequence: str) -> str:
    """
    Get the reverse complement of a DNA sequence.

    Parameters
    ----------
    sequence : str
        DNA sequence containing A, T, G, C nucleotides

    Returns
    -------
    str
        Reverse complement of the input sequence

    Examples
    --------
    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("AAAA")
    'TTTT'
    """
    complement_map = {
        'A': 'T', 'T': 'A',
        'G': 'C', 'C': 'G',
        'a': 't', 't': 'a',
        'g': 'c', 'c': 'g'
    }
    return ''.join(complement_map.get(base, base) for base in reversed(sequence))
Type Hints and Docstrings
Notice we're using:
- Type hints (sequence: str, -> float) - Make code more readable and enable static analysis
- NumPy-style docstrings - Standard format for scientific Python packages
- Examples in docstrings - Can be tested with doctest
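To illustrate how docstring examples double as tests, the sketch below runs the examples of a single function with the standard-library doctest module. It repeats a shortened gc_content so the block is self-contained; run_docstring_examples is a hypothetical helper, not part of kir-pydemo.

```python
import doctest


def gc_content(sequence: str) -> float:
    """
    Calculate GC content of a DNA sequence as a percentage.

    >>> gc_content("ATGC")
    50.0
    >>> gc_content("AAAA")
    0.0
    """
    if not sequence:
        raise ValueError("Sequence cannot be empty")
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence) * 100


def run_docstring_examples(func) -> doctest.TestResults:
    """Run the examples in one function's docstring and return the tally."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False skips module lookup; globs supplies the names the examples use.
    for test in finder.find(func, name=func.__name__, module=False,
                            globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_docstring_examples(gc_content)
print(results)
```

For a whole file, running python -m doctest src/kir_pydemo/sequence.py from the command line does the same job without any helper code.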
Step 3: Create pyproject.toml
Now for the heart of our package - the pyproject.toml file:
pyproject.toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "kir-pydemo"
version = "0.1.0"
description = "A demonstration package for DNA sequence analysis"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "MIT"}
authors = [
    {name = "BMRC Training", email = "training@example.com"}
]
keywords = ["bioinformatics", "DNA", "sequence analysis", "tutorial"]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Science/Research",
    "Topic :: Scientific/Engineering :: Bio-Informatics",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
[project.urls]
Homepage = "https://github.com/bmrc/kir-pydemo"
Documentation = "https://kir-pydemo.readthedocs.io"
Repository = "https://github.com/bmrc/kir-pydemo"
Issues = "https://github.com/bmrc/kir-pydemo/issues"
Let's break this down:
[build-system]
- requires: Tools needed to build your package (setuptools is the most common)
- build-backend: Which build system to use (setuptools, flit, poetry, hatch, etc.)
Build Backends
While we use setuptools (the most common and compatible choice), alternatives include:
- flit: Simpler, for pure Python packages
- hatch: Modern, with environment management
- poetry: Includes dependency management
- PDM: Dependency management with lock files (its PEP 582 support is historical, since that PEP was rejected)
For scientific packages, setuptools remains a common default.
[project]
This section contains your package metadata:
- name: Package name on PyPI (can use hyphens)
- version: Current version (we'll cover version management in Episode 5)
- description: One-line summary
- readme: Path to README file (supports .md, .rst, .txt)
- requires-python: Minimum Python version
- license: License information
- authors: List of authors with name and email
- keywords: For discoverability on PyPI
- classifiers: Standardized categories (see PyPI classifiers)
Step 4: Add a README
README.md
# kir-pydemo

A demonstration package for DNA sequence analysis, created as part of the Python Packaging Basics training series.

## Features

- Calculate GC content of DNA sequences
- Generate reverse complement of DNA sequences

## Installation

```bash
pip install kir-pydemo
```

## Usage

```python
from kir_pydemo import gc_content, reverse_complement

# Calculate GC content
sequence = "ATGCATGC"
gc = gc_content(sequence)
print(f"GC content: {gc}%")  # Output: GC content: 50.0%

# Get reverse complement
rev_comp = reverse_complement(sequence)
print(f"Reverse complement: {rev_comp}")  # Output: Reverse complement: GCATGCAT
```

## License

MIT
Step 5: Install in Editable Mode
Now comes the magic moment: installing your package!
# Make sure you're in the kir-pydemo directory
cd /path/to/kir-pydemo
# Install in editable mode
pip install -e .
The -e flag means editable mode (also called development mode):
- Changes to your source code are immediately reflected
- No need to reinstall after each change
- Perfect for development
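If you are unsure whether a package was installed in editable mode, the installation metadata can often tell you. The sketch below assumes the installer recorded a direct_url.json file as described in PEP 610 (pip does this for installs from a local path); other installers may behave differently, and is_editable_install is a hypothetical helper.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution
from typing import Optional


def is_editable_install(dist_name: str) -> Optional[bool]:
    """Return True/False for an installed distribution, None if not installed."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    # direct_url.json is only written for installs from a path or URL (PEP 610).
    raw = dist.read_text("direct_url.json")
    if raw is None:
        return False
    return bool(json.loads(raw).get("dir_info", {}).get("editable", False))


print(is_editable_install("kir-pydemo"))
```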
Virtual Environments
It's best practice to use a virtual environment:
# Create a virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # On Linux/Mac
venv\Scripts\activate # On Windows
# Now install
pip install -e .
We'll cover this more in Episode 3!
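A quick way to confirm that the environment is active before installing is to compare sys.prefix with sys.base_prefix, which differ inside an environment created by venv. A minimal sketch; note that conda environments are not detected this way.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
print(sys.executable)  # the interpreter currently running
```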
Step 6: Test Your Package
Now test that everything works:
# Open a Python interpreter from anywhere (not in the project directory!)
python
>>> from kir_pydemo import gc_content, reverse_complement
>>> gc_content("ATGCATGC")
50.0
>>> reverse_complement("ATGC")
'GCAT'
>>> import kir_pydemo
>>> kir_pydemo.__version__
'0.1.0'
Success! Your package is installed and importable from any directory, as long as you use the Python environment you installed it into.
Checkpoint: What Have We Achieved?
Let's verify you've successfully completed Episode 1:
- Created a src/kir_pydemo/ directory structure
- Written __init__.py with version and imports
- Implemented functions in sequence.py with type hints and docstrings
- Created pyproject.toml with proper metadata
- Installed the package with pip install -e .
- Successfully imported and used the package
Key Takeaways
- Modern packaging uses pyproject.toml, not setup.py
- The src/ layout prevents import confusion and catches installation issues
- Package names (with hyphens) and import names (with underscores) are different
- Editable mode (pip install -e .) is perfect for development
- Type hints and docstrings make your package professional and maintainable
What's Next?
In Episode 2, we'll add command-line interface (CLI) capabilities to kir-pydemo, allowing users to run our functions directly from the terminal.
This will introduce entry points and console scripts, making your package even more useful!