🚧 Work in Progress
This repository is under active development.
Expected completion: 20th of May 2026
Who this workshop is for
Researchers and research software engineers who are comfortable with Python and Bash and want to move beyond shell scripts and for loops. No prior Snakemake experience is assumed.
What you will build
By the end of Episode 5, you will have a complete, cluster-ready RNA-seq quantification pipeline:
Paired-end FASTQs
│
▼
FastQC: quality assessment (per sample, per read)
│
▼
fastp: adapter trimming (intermediate outputs auto-deleted)
│
▼
HISAT2: splice-aware alignment (BAMs write-protected)
│
▼
Subread: quantification across all samples in one call
│
▼
counts matrix
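Snakemake works out the order of these stages itself, by matching the outputs one rule declares against the inputs another rule asks for. The pure-Python sketch below imitates that idea for the pipeline above using the standard library's `graphlib`. It is not Snakemake's implementation: the sample names, file paths and job names are hypothetical, and the sketch assumes FastQC runs as a side branch on the raw reads rather than feeding fastp.

```python
from graphlib import TopologicalSorter

# Hypothetical sample and read names, for illustration only.
SAMPLES = ["sampleA", "sampleB"]
READS = ["R1", "R2"]


def build_jobs(samples, reads):
    """Map each (hypothetical) job name to its (inputs, outputs) file lists."""
    jobs = {}
    for s in samples:
        raw = [f"data/{s}_{r}.fastq.gz" for r in reads]
        trimmed = [f"trimmed/{s}_{r}.fastq.gz" for r in reads]
        # FastQC runs once per sample and per read file (assumed side branch).
        for r, fq in zip(reads, raw):
            jobs[f"fastqc:{s}_{r}"] = ([fq], [f"qc/{s}_{r}_fastqc.html"])
        jobs[f"fastp:{s}"] = (raw, trimmed)
        jobs[f"hisat2:{s}"] = (trimmed, [f"aligned/{s}.bam"])
    # A single quantification job consumes every BAM in one call.
    jobs["quantify"] = ([f"aligned/{s}.bam" for s in samples], ["counts/counts.txt"])
    return jobs


def execution_order(jobs):
    """Order jobs so that every file is produced before it is consumed."""
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    # Each job depends on the jobs producing its inputs; raw FASTQs have no producer.
    graph = {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())


for job in execution_order(build_jobs(SAMPLES, READS)):
    print(job)
```

The point of the exercise is that nobody writes the order down: it falls out of the file names, which is the same property that lets Snakemake skip jobs whose outputs already exist.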
Episodes
| Episode | Topic | Key concepts |
|---|---|---|
| Episode 1 | From shell scripts to Snakemake | Rules, inputs/outputs, rule all, dry runs |
| Episode 2 | Wildcards, expand(), and the DAG | {wildcards}, expand(), --dag, --rulegraph |
| Episode 3 | A real Snakefile: RNA-seq from scratch | configfile:, params:, log:, temp(), protected() |
| Episode 4 | Scaling to the cluster: Slurm via DRMAA | threads:, resources:, --executor drmaa, --drmaa-args, profiles |
| Episode 5 | Robustness and best practices | benchmark:, --rerun-incomplete, wildcard_constraints:, conda envs |
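As a rough mental model for Episode 2, the default behaviour of expand() can be imitated in plain Python: build every combination of the supplied wildcard values and substitute each one into the pattern. `expand_like` below is a hypothetical helper, not Snakemake's own code, and the real expand() offers more than this (for example, passing `zip` as the second argument pairs values up instead of taking every combination).

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    keys = list(wildcards)
    # Treat a bare string as a single value rather than a sequence of characters.
    value_lists = [
        [wildcards[k]] if isinstance(wildcards[k], str) else list(wildcards[k])
        for k in keys
    ]
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*value_lists)
    ]


files = expand_like(
    "trimmed/{sample}_{read}.fastq.gz",
    sample=["A", "B"],
    read=["R1", "R2"],
)
print(files)
# ['trimmed/A_R1.fastq.gz', 'trimmed/A_R2.fastq.gz', 'trimmed/B_R1.fastq.gz', 'trimmed/B_R2.fastq.gz']
```

Two samples times two reads gives four file names, which is why a single expand() call in `rule all` can request every final output of the pipeline at once.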
Before you start
This workshop assumes that Snakemake 9, snakemake-executor-plugin-drmaa, and the Python DRMAA bindings are already installed. See the installation guide for instructions specific to this cluster.
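A quick way to confirm the prerequisites is to ask the active Python environment which package versions it has installed. The sketch below is an assumption-laden helper, not part of the workshop materials: it assumes the PyPI distribution names `snakemake`, `snakemake-executor-plugin-drmaa` and `drmaa` (the Python DRMAA bindings), and it reads package metadata instead of importing `drmaa`, since importing the bindings may also require the cluster's DRMAA library to be locatable.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the workshop's prerequisites.
REQUIRED = ["snakemake", "snakemake-executor-plugin-drmaa", "drmaa"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    versions = installed_versions(REQUIRED)
    for name, ver in versions.items():
        print(f"{name:35} {ver or 'NOT INSTALLED'}")
    # The workshop assumes Snakemake 9; flag anything else.
    smk = versions["snakemake"]
    if smk is not None and not smk.startswith("9."):
        print(f"warning: expected Snakemake 9, found {smk}")
```

Run it with the same Python interpreter that provides the `snakemake` command, otherwise it reports on the wrong environment.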
A note on the examples
All exercises use toy data (text files, word counts) in Episodes 1–2, then switch to a realistic but deliberately simplified RNA-seq skeleton in Episodes 3–5. The pipeline is designed to illustrate Snakemake concepts cleanly. For a production-grade RNA-seq workflow ready to run out of the box, see the Snakemake wrappers and community workflow catalogues.
