🚧 Work in Progress

This repository is under active development.
Expected completion: 20th of May 2026

Who this workshop is for

Researchers and research software engineers comfortable with Python and bash who want to move beyond shell scripts and for loops. No prior Snakemake experience is assumed.

What you will build

By the end of Episode 5, you will have a complete, cluster-ready RNA-seq quantification pipeline:

Paired-end FASTQs
       │
       ▼
    FastQC          ← quality assessment (per sample, per read)
       │
       ▼
     fastp          ← adapter trimming (intermediate outputs auto-deleted)
       │
       ▼
    HISAT2          ← splice-aware alignment (BAMs write-protected)
       │
       ▼
    Subread         ← quantification across all samples in one call
       │
       ▼
  counts matrix
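
To make the per-stage annotations in the diagram concrete, here is a minimal Python sketch that counts how many jobs each stage would contribute. The sample names and read labels are hypothetical, and it assumes fastp and HISAT2 each run once per paired-end sample, which the diagram does not state explicitly.

```python
# Hypothetical sample names and read labels, for illustration only.
samples = ["sampleA", "sampleB", "sampleC"]
reads = ["R1", "R2"]

# Jobs per stage, following the annotations in the diagram.
jobs_per_stage = {
    "FastQC": len(samples) * len(reads),  # per sample, per read
    "fastp": len(samples),                # assumption: one paired-end trim per sample
    "HISAT2": len(samples),               # assumption: one alignment per sample
    "Subread": 1,                         # all samples quantified in one call
}

total_jobs = sum(jobs_per_stage.values())

for stage, n_jobs in jobs_per_stage.items():
    print(f"{stage:<8} {n_jobs} job(s)")
print(f"{'total':<8} {total_jobs} job(s)")
```

Adding a sample adds jobs to every per-sample stage but leaves the final quantification step as a single job, which is why that step has to wait for all alignments to finish.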

Episodes

| Episode | Topic | Key concepts |
| --- | --- | --- |
| Episode 1 | From shell scripts to Snakemake | Rules, inputs/outputs, rule all, dry runs |
| Episode 2 | Wildcards, expand(), and the DAG | {wildcards}, expand(), --dag, --rulegraph |
| Episode 3 | A real Snakefile: RNA-seq from scratch | configfile:, params:, log:, temp(), protected() |
| Episode 4 | Scaling to the cluster: Slurm via DRMAA | threads:, resources:, --executor drmaa, --drmaa-args, profiles |
| Episode 5 | Robustness and best practices | benchmark:, --rerun-incomplete, wildcard_constraints:, conda envs |
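
As a preview of Episode 2, the idea behind expand() can be imitated in plain Python: fill a filename pattern with every combination of the given wildcard values. This is only a sketch of the default all-combinations behaviour, not Snakemake's implementation; `expand_sketch`, the path pattern, and the sample names are hypothetical.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# Hypothetical output layout: one FastQC report per sample and per read.
targets = expand_sketch(
    "qc/{sample}_{read}_fastqc.html",
    sample=["A", "B"],
    read=["R1", "R2"],
)

for path in targets:
    print(path)
```

In a Snakefile, a list like this typically becomes the input of rule all, so a single command requests every final file at once.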

Before you start

This workshop assumes Snakemake 9, snakemake-executor-plugin-drmaa, and Python DRMAA bindings are already installed. See the installation guide for instructions specific to this cluster.

A note on the examples

Exercises in Episodes 1–2 use toy data (text files, word counts). Episodes 3–5 switch to a realistic but deliberately simplified RNA-seq skeleton, designed to illustrate Snakemake concepts cleanly rather than to serve as a production workflow. For a production-grade RNA-seq workflow ready to run out of the box, see the Snakemake wrappers and community workflow catalogues.