Episode 1: From Script to Package
Learning Objectives
By the end of this episode you will be able to:
- Explain the anatomy of an R package
- Scaffold a new package with usethis::create_package()
- Write your first exported functions in R/
- Use the devtools::load_all() development loop
- Initialise a reproducibility lockfile with renv
1.1 Why Package Your Code?
You've written a perfectly good R script:
# gc_analysis.R — the script we've all written
gc_content <- function(sequence) {
  sequence <- toupper(sequence)
  bases <- strsplit(sequence, "")[[1]]
  gc <- sum(bases %in% c("G", "C"))
  round(gc / length(bases) * 100, 2)
}

reverse_complement <- function(sequence) {
  map <- c(A = "T", T = "A", G = "C", C = "G")
  bases <- strsplit(toupper(sequence), "")[[1]]
  paste(rev(map[bases]), collapse = "")
}
This works fine — until:
- A colleague asks "can I install this?"
- You want to source() it in three different projects
- You need to add documentation, tests, or a version number
A package solves all of this. In R, packages are the fundamental unit of reusable code.
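1.2 Anatomy of an R Package
A minimal R package needs only three things: a DESCRIPTION file (metadata), a NAMESPACE file (what the package exports and imports), and an R/ directory holding your function definitions.
That's it. No complex build system. No src/ layout debate. R packages have had a stable, well-understood structure for decades.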
Compared to Python
The R equivalent of pyproject.toml is DESCRIPTION. The R/ directory maps to src/kirgcdemo/. And unlike Python, you never manually edit NAMESPACE — roxygen2 writes it for you.
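To make the anatomy concrete, here is a minimal sketch in base R that builds the bare skeleton by hand in a temporary directory. The file contents and the `anatomy-demo` folder are illustrative assumptions; in practice usethis and roxygen2 generate these files for you.

```r
# Build a bare-bones package skeleton by hand (illustration only).
pkg_dir <- file.path(tempdir(), "anatomy-demo", "kirgcdemo")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: package metadata, one "Field: value" pair per line.
writeLines(
  c("Package: kirgcdemo",
    "Title: DNA Sequence Analysis Utilities",
    "Version: 0.1.0"),
  file.path(pkg_dir, "DESCRIPTION")
)

# NAMESPACE: which functions are exported (normally written by roxygen2).
writeLines("export(gc_content)", file.path(pkg_dir, "NAMESPACE"))

# R/: function definitions live here, in one or more .R files.
writeLines(
  c("gc_content <- function(sequence) {",
    "  bases <- strsplit(toupper(sequence), \"\")[[1]]",
    "  round(sum(bases %in% c(\"G\", \"C\")) / length(bases) * 100, 2)",
    "}"),
  file.path(pkg_dir, "R", "sequences.R")
)

# List everything that was created, relative to the package root.
skeleton <- sort(list.files(pkg_dir, recursive = TRUE))
print(skeleton)
```

The listing should contain exactly one metadata file, one namespace file, and one source file under R/, which is all the structure a small package needs.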
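1.3 Scaffolding with usethis
usethis is the modern way to set up package infrastructure. Let's create our package by calling usethis::create_package() with the path of the new package directory, here kirgcdemo.
This opens a new RStudio project (or, outside RStudio, sets the working directory) that typically contains a DESCRIPTION file, a NAMESPACE file, an empty R/ directory, and a .Rbuildignore file, plus a kirgcdemo.Rproj file when you work in RStudio.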
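The scaffolding step can be tried safely with a sketch like the one below. It assumes usethis is installed and, purely for illustration, creates the package under a temporary `scaffold-demo` folder instead of your real working location.

```r
# Scaffold the package in a temporary directory so nothing is overwritten.
pkg_path <- file.path(tempdir(), "scaffold-demo", "kirgcdemo")
dir.create(dirname(pkg_path), recursive = TRUE, showWarnings = FALSE)

if (requireNamespace("usethis", quietly = TRUE)) {
  # open = FALSE: create the files without switching to a new session.
  usethis::create_package(pkg_path, open = FALSE)
  # Show what was generated, including dotfiles such as .Rbuildignore.
  print(list.files(pkg_path, all.files = TRUE, no.. = TRUE))
} else {
  message("usethis is not installed; run install.packages(\"usethis\") first.")
}
```

For the workshop itself you would normally call usethis::create_package("kirgcdemo") from the directory where the package should live and let it open the new project.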
The DESCRIPTION file
Open DESCRIPTION — it looks like this:
Package: kirgcdemo
Title: What the Package Does (One Line, Title Case)
Version: 0.1.0
Authors@R:
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
License: `use_mit_license()` to add a license.
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Let's fill it in properly:
Package: kirgcdemo
Title: DNA Sequence Analysis Utilities
Version: 0.1.0
Authors@R:
    person("BMRC Training", role = c("aut", "cre"),
           email = "rescomp@kennedy.ox.ac.uk")
Description: A demonstration package providing GC content calculation
    and reverse complement utilities for DNA sequences.
    Created for the KIR R Packaging workshop.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
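DESCRIPTION uses the Debian control file format, in which continuation lines must be indented. One way to catch a malformed file early is to parse it with base R's read.dcf(). The sketch below writes the main fields from above to a temporary file (an assumption made so the example is self-contained) and checks that they parse.

```r
# Write the edited DESCRIPTION fields to a temporary file (illustration only).
desc_lines <- c(
  "Package: kirgcdemo",
  "Title: DNA Sequence Analysis Utilities",
  "Version: 0.1.0",
  "Authors@R:",
  "    person(\"BMRC Training\", role = c(\"aut\", \"cre\"),",
  "           email = \"rescomp@kennedy.ox.ac.uk\")",
  "Description: A demonstration package providing GC content calculation",
  "    and reverse complement utilities for DNA sequences.",
  "    Created for the KIR R Packaging workshop.",
  "License: MIT + file LICENSE",
  "Encoding: UTF-8"
)
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_file)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_file)

# The version must be a valid package version string.
pkg_version <- package_version(desc[1, "Version"])

# Authors@R holds R code; evaluating it should yield a person object.
authors <- eval(parse(text = desc[1, "Authors@R"]))

print(colnames(desc))
print(authors)
```

If a continuation line loses its indentation, read.dcf() stops with an error instead of returning the matrix, which makes this a cheap check after hand-editing the file.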
Then run usethis::use_mit_license() so that the LICENSE file named in the License field exists.
1.4 Writing Your First Functions
Create R/sequences.R:
# R/sequences.R
#' Calculate GC content of a DNA sequence
#'
#' @param sequence A character string of DNA bases (A, T, G, C).
#' Case-insensitive. Ambiguous bases (N, etc.) are counted in the
#' denominator but not as GC.
#'
#' @returns A numeric value: percentage GC content (0–100).
#'
#' @examples
#' gc_content("ATGCATGC") # 50
#' gc_content("GGGGAAAA") # 50
#' gc_content("GCGCGCGC") # 100
#'
#' @export
gc_content <- function(sequence) {
  if (!is.character(sequence) || length(sequence) != 1L) {
    stop("`sequence` must be a single character string.", call. = FALSE)
  }
  bases <- strsplit(toupper(sequence), "")[[1]]
  if (length(bases) == 0L) return(NA_real_)
  gc <- sum(bases %in% c("G", "C"))
  round(gc / length(bases) * 100, 2)
}

#' Reverse complement of a DNA sequence
#'
#' Returns the reverse complement of a DNA sequence using the standard
#' Watson-Crick base pairing rules (A↔T, G↔C). Ambiguous bases are
#' passed through unchanged.
#'
#' @param sequence A character string of DNA bases. Case-insensitive.
#'
#' @returns A character string: the reverse complement, upper-case.
#'
#' @examples
#' reverse_complement("ATGC") # "GCAT"
#' reverse_complement("AAAA") # "TTTT"
#' reverse_complement("GCGCGC") # "GCGCGC"
#'
#' @export
reverse_complement <- function(sequence) {
  if (!is.character(sequence) || length(sequence) != 1L) {
    stop("`sequence` must be a single character string.", call. = FALSE)
  }
  map <- c(A = "T", T = "A", G = "C", C = "G")
  bases <- strsplit(toupper(sequence), "")[[1]]
  rev_comp <- map[bases]
  rev_comp[is.na(rev_comp)] <- bases[is.na(rev_comp)] # pass-through unknowns
  paste(rev(rev_comp), collapse = "")
}
R 4.1+ native pipe
The examples above are written without pipes for clarity. From Episode 2 onwards we'll use the native pipe |> in vignette examples. Unlike %>% (magrittr), it requires no package dependency, only R 4.1 or later.
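As a small preview, the sketch below splits a sequence into bases twice, once with nested calls and once with the native pipe. It assumes R 4.1 or later; the variable names are illustrative only.

```r
# Nested calls: read from the inside out.
bases_nested <- unlist(strsplit(toupper("atgcatgc"), split = ""))

# Native pipe: read left to right; each result becomes the first argument
# of the next call. No package needs to be attached.
bases_piped <- "atgcatgc" |>
  toupper() |>
  strsplit(split = "") |>
  unlist()

# Both forms produce the same vector of single-letter bases.
same_result <- identical(bases_nested, bases_piped)

# GC percentage computed from the piped result.
gc_percent <- round(mean(bases_piped %in% c("G", "C")) * 100, 2)
print(gc_percent)
```

The two forms are equivalent; the pipe only changes how the code reads, which tends to matter more as the chain of steps grows.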
1.5 The load_all() Development Loop
This is the single most important habit in R package development: run devtools::load_all() after every change to R/.
It loads your functions straight from source, simulating an install followed by library() without actually building the package, which makes iteration very fast.
devtools::load_all()
#> ℹ Loading kirgcdemo
gc_content("ATGCNN")
#> [1] 33.33
reverse_complement("ATGC")
#> [1] "GCAT"
Keyboard shortcut
In RStudio: Ctrl+Shift+L (or Cmd+Shift+L on Mac) runs load_all().
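To build intuition for the edit-and-reload loop, here is a rough base R approximation. The helper `load_sources()` and the toy function `shout()` are hypothetical names invented for this sketch; this is not how devtools implements load_all(), which also handles NAMESPACE, imports, compiled code, data and more.

```r
# Set up a throwaway "package" with one source file (illustration only).
pkg_dir <- file.path(tempdir(), "loadall-demo")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
src_file <- file.path(pkg_dir, "R", "sequences.R")
writeLines("shout <- function(x) toupper(x)", src_file)

# Hypothetical helper: source every file in R/ into a fresh environment.
load_sources <- function(pkg_dir) {
  env <- new.env(parent = globalenv())
  r_files <- list.files(file.path(pkg_dir, "R"), pattern = "\\.[Rr]$",
                        full.names = TRUE)
  for (f in r_files) sys.source(f, envir = env)
  env
}

# First load: the function behaves as originally written.
pkg_env <- load_sources(pkg_dir)
first <- pkg_env$shout("atgc")

# Edit the source, reload, and the new definition is picked up.
writeLines("shout <- function(x) paste0(toupper(x), \"!\")", src_file)
pkg_env <- load_sources(pkg_dir)
second <- pkg_env$shout("atgc")

print(c(first, second))
```

The point of the sketch is the rhythm: change a file, reload, try the function again, with no build or install step in between.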
1.6 Initialise renv
Before we go any further, lock the environment by running renv::init() in the package directory.
This creates:
kirgcdemo/
├── renv/
│ └── activate.R
├── renv.lock # ← the lockfile
└── .Rprofile # sources renv/activate.R on startup
renv.lock records the exact version of every package your project depends on. It is the R equivalent of a uv.lock or a fully pinned requirements.txt. Commit it to git.
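Conceptually, a lockfile is a record of "which version of what was installed". The base R sketch below illustrates that idea with a hypothetical helper, `snapshot_versions()`; it does not reproduce the renv.lock format, which is a JSON file holding more metadata (for example the R version and where each package came from).

```r
# Hypothetical helper: record the installed version of each named package.
snapshot_versions <- function(pkgs) {
  vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
         character(1))
}

# Pin two packages that ship with every R installation.
pinned <- snapshot_versions(c("stats", "utils"))
print(pinned)

# The R version is part of the environment too.
r_version <- as.character(getRversion())
print(r_version)
```

In the real project you would not write this by hand: renv::snapshot() updates renv.lock after you install or upgrade packages, and renv::restore() reinstalls the recorded versions on another machine.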
✅ Episode 1 Checkpoint
Run the following — everything should pass:
Your package directory should look like: