Scaling Limits¶
Amdahl's law and Gustafson's law¶
When you submit a job to a cluster with more CPU cores, you might reasonably expect twice the cores to give twice the speed. In practice, the relationship is more complicated — and two laws describe exactly why.
The core problem: serial bottlenecks¶
Any parallel workload has two components:
- Serial work — steps that must happen in sequence (reading input, writing output, checkpointing, initialising data structures). Only one processor can do this at a time.
- Parallel work — steps that can be split across many processors simultaneously (computing, transforming, simulating).
Adding more processors only accelerates the parallel portion. The serial portion is immune to parallelism, and as you scale up, it increasingly dominates total runtime.
Amdahl's law (1967)¶
Amdahl's law gives the theoretical maximum speedup for a fixed-size problem:
where \(s\) is the serial fraction of the workload and \(p\) is the number of processors.
The critical insight is that as \(p \to \infty\), speedup approaches \(\tfrac{1}{s}\) — a hard ceiling. A workload that is 10% serial can never exceed 10× speedup, no matter how many processors you throw at it.
| Serial fraction | Maximum possible speedup |
|---|---|
| 50% | 2× |
| 25% | 4× |
| 10% | 10× |
| 5% | 20× |
| 1% | 100× |
Amdahl's pessimism
A workload that feels "mostly parallel" can still have a brutal ceiling. 10% serial code caps you at 10× regardless of whether you use 16, 256, or 10,000 cores.
Gustafson's law (1988)¶
Gustafson's law challenges Amdahl's framing. The speedup formula looks similar:
but the underlying assumption is fundamentally different. Gustafson observed that in practice, researchers don't run the same fixed job faster — they use extra processors to tackle larger problems in the same wall-clock time.
- Amdahl: fixed problem, fewer serial seconds per run → serial becomes the bottleneck.
- Gustafson: fixed time, larger problem → serial fraction stays small as the parallel portion grows.
The result is a much more optimistic scaling curve. There is no hard ceiling — speedup scales roughly linearly with \(p\) when \(s\) is small.
When Gustafson applies
Gustafson's framing fits most real HPC workloads well: running a finer-resolution simulation, processing more samples in a cohort, using a larger training dataset. The question isn't "how fast can I finish this specific job?" but "how much science can I do in a given allocation window?"
Interactive explorer¶
Adjust the sliders to see how both laws respond to changes in the serial fraction and processor count.
Practical implications on the cluster¶
Choosing your core count¶
Amdahl's law is most relevant when your job has a hard serial phase you cannot avoid — for example, a pipeline step that reads and indexes a large file before the parallel processing begins. In these cases, requesting hundreds of cores will not help if that indexing step takes 30 minutes regardless.
Profile first. Tools like Slurm's job efficiency reports and sacct can tell you whether your jobs are actually using the cores they request.
Scaling out vs. scaling up¶
| Scenario | Better fit |
|---|---|
| Fixed dataset, want it faster | Amdahl — check the serial ceiling before requesting more cores |
| Larger dataset, same time budget | Gustafson — more cores let you do more work |
| Embarrassingly parallel (e.g. per-sample pipelines) | Gustafson — near-linear scaling, use job arrays |
| Tightly coupled simulation (e.g. MPI across nodes) | Amdahl — communication overhead adds to the effective serial fraction |
The hidden serial fraction¶
In MPI jobs, communication and synchronisation overhead act as an additional serial penalty on top of whatever is inherently sequential in your code. This is why strong scaling (same problem, more cores) typically plateaus well before Amdahl's theoretical ceiling — the effective \(s\) is higher than the code alone suggests.
Rule of thumb
For most bioinformatics pipelines on BMRC, requesting more than 16–32 cores per step rarely helps unless you have profiled the job and confirmed the parallel fraction justifies it. Over-requesting cores wastes allocation and increases queue wait time.
Summary¶
| Amdahl's law | Gustafson's law | |
|---|---|---|
| Problem size | Fixed | Scales with processors |
| Serial portion | Hard speedup ceiling | Remains a small fraction |
| Scaling outlook | Pessimistic | Optimistic |
| Best applies to | Latency-sensitive, fixed jobs | Throughput-oriented, scalable workloads |
| Formula | \(S = \frac{1}{s + (1-s)/p}\) | \(S = p - s(p-1)\) |
Both laws are correct within their assumptions. The key is recognising which regime your workload lives in before deciding how many cores to request.