Episode 1: From Script to Package

Learning Objectives

By the end of this episode you will be able to:

  • Explain the anatomy of an R package
  • Scaffold a new package with usethis::create_package()
  • Write your first exported functions in R/
  • Use the devtools::load_all() development loop
  • Initialise a reproducibility lockfile with renv

1.1 Why Package Your Code?

You've written a perfectly good R script:

# gc_analysis.R  — the script we've all written
gc_content <- function(sequence) {
  sequence <- toupper(sequence)
  bases    <- strsplit(sequence, "")[[1]]
  gc       <- sum(bases %in% c("G", "C"))
  round(gc / length(bases) * 100, 2)
}

reverse_complement <- function(sequence) {
  map  <- c(A="T", T="A", G="C", C="G")
  bases <- strsplit(toupper(sequence), "")[[1]]
  paste(rev(map[bases]), collapse = "")
}

This works fine — until:

  • A colleague asks "can I install this?"
  • You want to source() it in three different projects
  • You need to add documentation, tests, or a version number

A package solves all of this. In R, packages are the fundamental unit of reusable code.


1.2 Anatomy of an R Package

A minimal R package looks like this:

kirgcdemo/
├── DESCRIPTION        # metadata: name, version, dependencies
├── NAMESPACE          # what gets exported (auto-generated by roxygen2)
├── R/
│   └── sequences.R    # your functions live here
└── man/               # documentation (auto-generated by roxygen2)

That's it. No complex build system. No src/ layout debate. R packages have had a stable, well-understood structure for decades.

Compared to Python

The R equivalent of pyproject.toml is DESCRIPTION. The R/ directory maps to src/kirgcdemo/. And unlike Python, you never manually edit NAMESPACE — roxygen2 writes it for you.


1.3 Scaffolding with usethis

usethis is the modern way to set up package infrastructure. Let's create our package:

usethis::create_package("kirgcdemo")

This creates the package directory and opens it as a new RStudio project (outside RStudio, it switches the working directory instead). The new directory contains:

kirgcdemo/
├── .Rbuildignore
├── DESCRIPTION
├── NAMESPACE
├── R/
└── kirgcdemo.Rproj

The DESCRIPTION file

Open DESCRIPTION — it looks like this:

Package: kirgcdemo
Title: What the Package Does (One Line, Title Case)
Version: 0.1.0
Authors@R:
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
License: `use_mit_license()` to add a license.
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2

Let's fill it in properly:

Package: kirgcdemo
Title: DNA Sequence Analysis Utilities
Version: 0.1.0
Authors@R:
    person("BMRC Training", role = c("aut", "cre"),
           email = "rescomp@kennedy.ox.ac.uk")
Description: A demonstration package providing GC content calculation
    and reverse complement utilities for DNA sequences.
    Created for the KIR R Packaging workshop.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2

Then run the following to add the LICENSE and LICENSE.md files and a README.md skeleton:

usethis::use_mit_license()
usethis::use_readme_md()
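
DESCRIPTION is a plain key-value file in Debian control file (DCF) format, so base R can read it with read.dcf(). The sketch below writes a trimmed copy of the fields above to a temporary file (the temporary path is an assumption for illustration; in the package you would point at the real DESCRIPTION) and pulls out individual fields.

```r
# Write a trimmed DESCRIPTION to a temporary location (illustration only).
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: kirgcdemo",
  "Title: DNA Sequence Analysis Utilities",
  "Version: 0.1.0",
  "License: MIT + file LICENSE",
  "Encoding: UTF-8"
), desc_path)

# read.dcf() returns a character matrix: one row per record, one column per field.
desc <- read.dcf(desc_path)

pkg_name    <- unname(desc[1, "Package"])  # "kirgcdemo"
pkg_version <- unname(desc[1, "Version"])  # "0.1.0"

# Versions compare as versions, not as strings.
is_released <- package_version(pkg_version) >= "0.1.0"
```

This can be a quick sanity check that a hand edit to DESCRIPTION still parses.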

1.4 Writing Your First Functions

Create R/sequences.R:

# R/sequences.R

#' Calculate GC content of a DNA sequence
#'
#' @param sequence A character string of DNA bases (A, T, G, C).
#'   Case-insensitive. Ambiguous bases (N, etc.) are counted in the
#'   denominator but not as GC.
#'
#' @returns A numeric value: percentage GC content (0–100).
#'
#' @examples
#' gc_content("ATGCATGC")   # 50
#' gc_content("GGGGAAAA")   # 50
#' gc_content("GCGCGCGC")   # 100
#'
#' @export
gc_content <- function(sequence) {
  if (!is.character(sequence) || length(sequence) != 1L) {
    stop("`sequence` must be a single character string.", call. = FALSE)
  }
  bases <- strsplit(toupper(sequence), "")[[1]]
  if (length(bases) == 0L) return(NA_real_)
  gc <- sum(bases %in% c("G", "C"))
  round(gc / length(bases) * 100, 2)
}


#' Reverse complement of a DNA sequence
#'
#' Returns the reverse complement of a DNA sequence using the standard
#' Watson-Crick base pairing rules (A↔T, G↔C). Ambiguous bases are
#' passed through unchanged.
#'
#' @param sequence A character string of DNA bases. Case-insensitive.
#'
#' @returns A character string: the reverse complement, upper-case.
#'
#' @examples
#' reverse_complement("ATGC")    # "GCAT"
#' reverse_complement("AAAA")    # "TTTT"
#' reverse_complement("GCGCGC")  # "GCGCGC"
#'
#' @export
reverse_complement <- function(sequence) {
  if (!is.character(sequence) || length(sequence) != 1L) {
    stop("`sequence` must be a single character string.", call. = FALSE)
  }
  map   <- c(A = "T", T = "A", G = "C", C = "G")
  bases <- strsplit(toupper(sequence), "")[[1]]
  rev_comp <- map[bases]
  rev_comp[is.na(rev_comp)] <- bases[is.na(rev_comp)]  # pass-through unknowns
  paste(rev(rev_comp), collapse = "")
}
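
The pass-through step in reverse_complement() relies on how R indexes a named vector: looking up a name that is not in map yields NA rather than an error. The following standalone sketch walks through that mechanism outside the function; the input "ATGN" is an arbitrary example chosen for illustration.

```r
map   <- c(A = "T", T = "A", G = "C", C = "G")
bases <- strsplit("ATGN", "")[[1]]      # "A" "T" "G" "N"

# Lookup by name: known bases are complemented, unknown ones become NA.
comp <- unname(map[bases])              # "T" "A" "C" NA

# Put the original character back wherever the lookup failed.
unknown <- is.na(comp)
comp[unknown] <- bases[unknown]         # "T" "A" "C" "N"

# Reverse and glue back into a single string.
result <- paste(rev(comp), collapse = "")
result
#> [1] "NCAT"
```

Storing the is.na() mask in a variable first is only a readability choice; it behaves the same as the one-line form used in the function.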

R 4.1+ native pipe

The examples in this episode avoid pipes for clarity. From Episode 2 onwards, the vignette examples use the native pipe |>. Unlike %>% (magrittr), it is built into R 4.1 and later, so it adds no package dependency.
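
As a small preview, here is the same chain of base R calls written nested and then with the native pipe. This is only a sketch to show the syntax; it needs R 4.1 or later, and the input string is an arbitrary example.

```r
# Nested form: read from the inside out.
nested <- paste(rev(unlist(strsplit(toupper("atgc"), ""))), collapse = "")

# Piped form: read left to right. Each step's result becomes the
# first argument of the next call.
piped <- "atgc" |>
  toupper() |>
  strsplit("") |>
  unlist() |>
  rev() |>
  paste(collapse = "")

piped
#> [1] "CGTA"
```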


1.5 The load_all() Development Loop

This is the single most important habit in R package development:

devtools::load_all()   # re-load your package from source — like `pip install -e .`

Run this after every change to R/. It simulates installing the package without actually building it, making iteration very fast.

devtools::load_all()
#> ℹ Loading kirgcdemo

gc_content("ATGCNN")
#> [1] 33.33

reverse_complement("ATGC")
#> [1] "GCAT"

Keyboard shortcut

In RStudio: Ctrl+Shift+L (or Cmd+Shift+L on Mac) runs load_all().
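
To build intuition for the edit-and-reload loop, the sketch below imitates it with base R only: it sources every file under R/ into a fresh environment, then repeats that after an edit. This is a conceptual toy, not how devtools implements load_all(), which also handles the namespace, imports, and more. The toy package directory, the reload() helper, and scale_it() are all hypothetical names invented for this illustration.

```r
# A throwaway "package" with a single R/ file (hypothetical, for illustration).
pkg_dir <- tempfile("toypkg_")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
r_file <- file.path(pkg_dir, "R", "toy.R")

# Toy stand-in for load_all(): source all R/ files into a new environment.
reload <- function(dir) {
  env   <- new.env(parent = globalenv())
  files <- list.files(file.path(dir, "R"), pattern = "\\.R$", full.names = TRUE)
  for (f in files) sys.source(f, envir = env)
  env
}

# First version of the function.
writeLines("scale_it <- function(x) x * 2", r_file)
toy    <- reload(pkg_dir)
before <- toy$scale_it(21)   # 42

# Edit the source file, then reload to pick up the change.
writeLines("scale_it <- function(x) x * 3", r_file)
toy   <- reload(pkg_dir)
after <- toy$scale_it(21)    # 63
```

The point is the rhythm: edit a file in R/, reload, try the function, repeat.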


1.6 Initialise renv

Before we go any further, lock the environment:

renv::init()

This creates:

kirgcdemo/
├── renv/
│   └── activate.R
├── renv.lock          # ← the lockfile
└── .Rprofile          # sources renv/activate.R on startup

renv.lock records the exact version of every package the project depends on, along with the R version, which makes it the R equivalent of uv.lock or a fully pinned requirements.txt. Commit it to git.

usethis::use_git()     # initialise git repo if you haven't already
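
renv.lock itself is a JSON file that renv writes and reads for you. The sketch below does not use renv's API; it only uses base R to show the kind of information a lockfile pins, namely the R version and an exact version per package. The three packages queried are arbitrary examples that ship with R.

```r
# Packages to inspect (arbitrary examples bundled with every R installation).
pkgs <- c("stats", "utils", "tools")

# Exact installed version of each package, as a named character vector.
pinned <- vapply(
  pkgs,
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)

# The running R version, which a lockfile also records.
r_version <- as.character(getRversion())

pinned
r_version
```

The values differ from machine to machine, which is exactly the drift a lockfile is meant to remove.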

✅ Episode 1 Checkpoint

Run the following; each call should return the value shown in its comment:

devtools::load_all()
gc_content("GCGCGCGC")   # → 100
reverse_complement("ATGC")  # → "GCAT"

Your package directory should look like:

kirgcdemo/
├── .Rbuildignore
├── .Rprofile
├── .gitignore
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R/
│   └── sequences.R
├── README.md
├── renv/
├── renv.lock
└── kirgcdemo.Rproj