Modules: Organizing R Source Code

2024-01-20

Introduction

This vignette explains how to use modules outside of R packages as a means to organize a project or data analysis. With modules we gain some of the features we expect from packages, but with less overhead.

A lot of R projects run into problems when they grow. Even relatively simple data analysis projects can easily span a thousand lines. R has two important building blocks to organize projects: functions and packages. However, packages present a hurdle for many users with little programming background. In those cases we often rely on splitting the code base into files and sourcing them into our R session (using the function source). Modules, in this context, present a more sophisticated way to source files by providing three important features:

Example

You can load scripts as modules when you refer to a file (or directory) in a call to use. Inside such a script you can use import and use in the same way you typically use library. Consider the following example where we create a module in a temporary file with its dependencies.

code <- "
import('stats', 'median')
functionWithDep <- function(x) median(x)
"

fileName <- tempfile(fileext = ".R")
writeLines(code, fileName)

We can then load the module into the current session as follows:

library(modules)
m <- use(fileName)
m$functionWithDep(1:2)
#> [1] 1.5
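
Since a script can itself call use, one module file can load another. The following is a minimal sketch of that idea, not part of the original example: the file contents and the names helperFile, mainFile, square and sumOfSquares are made up for illustration, and both files live in temporary locations so the paths are absolute.

```r
library(modules)

# Hypothetical helper module: a single function built on base R only
helperFile <- tempfile(fileext = ".R")
writeLines("square <- function(x) x^2", helperFile)

# Hypothetical main module: loads the helper with use() and exports one function
mainFile <- tempfile(fileext = ".R")
writeLines(c(
  sprintf("helpers <- use(%s)", deparse(helperFile)),
  "export('sumOfSquares')",
  "sumOfSquares <- function(x) sum(helpers$square(x))"
), mainFile)

m <- use(mainFile)
m$sumOfSquares(1:3)
#> [1] 14
```

Because the main module calls export, only sumOfSquares should be visible from the outside; the helpers object stays internal to the module.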

Pseudo-code example

To give a bit more context on how you can structure a project, consider the following file structure:

/
  /R
    munging.R
    graphics.R
  /data
    some.csv
  /results
    /tables
      ...
    /figs
  main.R
  README.md

You put all your R code into the R folder. This folder may or may not have a nested folder structure itself. You probably have a folder for your data and one in which you store all results. The important part here is that you have split your code base into different files. main.R in the project root acts as the master file in this example. This file kicks off all steps of the analysis and connects the dots. munging.R and graphics.R implement helper functions.

main.R

lib <- modules::use("R")
dat <- read.csv("data/some.csv")

# munging
dat <- lib$munging$clean(dat)
dat <- lib$munging$recode(dat)

# generate results
lib$graphics$barplot(dat)
lib$graphics$lineplot(dat)

The main.R file implements no logic of the analysis. Its responsibility is to connect all steps. Each file in the R folder then implements a phase of the project. In larger projects it is likely that each phase will need its own folder. The implementation may then look something along the lines of:

R/munging.R

export("clean")
clean <- function(dat) {
  # ...
}

export("recode")
recode <- function(dat) {
  # ...
}

helper <- function(...) {
  # This function is private
  # ...
}

R/graphics.R

import("ggplot2")
export("barplot", "lineplot")

barplot <- function(dat) {
  # ...
}

lineplot <- function(dat) {
  # ...
}

helper <- function(...) {
  # ...
}
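
To try the layout above without setting up a real project, the following sketch builds a stripped-down version in a temporary directory. It is an illustration under stated assumptions: the project path, the body of clean and the private helper isMissing are all hypothetical, and only a munging file is created.

```r
library(modules)

# Hypothetical project root in a temporary directory
project <- file.path(tempdir(), "modules-demo")
dir.create(file.path(project, "R"), recursive = TRUE, showWarnings = FALSE)

# R/munging.R: one exported function and one private helper (illustrative bodies)
writeLines(c(
  "export('clean')",
  "clean <- function(dat) dat[!isMissing(dat$x), , drop = FALSE]",
  "isMissing <- function(x) is.na(x)"
), file.path(project, "R", "munging.R"))

# Work from the project root so that use("R") matches the call in main.R
oldwd <- setwd(project)
lib <- use("R")
setwd(oldwd)

dat <- data.frame(x = c(1, NA, 3))
cleaned <- lib$munging$clean(dat)
nrow(cleaned)
#> [1] 2

# The helper was not exported, so it is not part of the module's interface
"isMissing" %in% names(lib$munging)
#> [1] FALSE
```

Only base R functions are used inside the module file here, so no import call is needed; a file that relies on other packages would declare them with import, as graphics.R does above.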

Documentation

If you want proper documentation for your functions or modules, you really want a package. For ad-hoc documentation of modules, a simple option is to use comments:

module({
  fun <- function(x) {
    ## A function for illustrating documentation
    ## x (numeric) some values
    x
  }
})
#> fun:
#> function(x)
#> ## A function for illustrating documentation
#> ## x (numeric) some values

Best practices