Suggestions for optimizing package dependencies

For the strong parents

Heaviness analysis provides hints for reducing the complexity of package dependencies, but how to optimize depends on the specific use of parent packages in the corresponding package. First we need to be aware of that there are scenarios when reduction of heavy parents could not be performed:

  1. When a package extends the functionality of a heavy parents, the parent must be its strong parent.
  2. A heavy parent provides core functionality to a package.
  3. S4 methods or S4 classes are imported from a parent.

Now let’s assume the dependencies from heavy parents can be reduced. We give the following suggestions for reducing them:

  1. Re-implement the functions that were imported from heavy parents. Take package mapStats (version 2.4) as an example, an extremely heavy parent Hmisc can be observed where Hmisc has a heaviness of 49 which is almost 60% of the total number of required packages of mapStats. If moving Hmisc to “Suggests” of mapStats, the number of strong dependencies can be reduced from 83 to 34. The dependency heaviness analysis shows there is only one function imported from Hmisc. A deep inspection into the source code of mapStats reveals that only a function capitalize() from Hmisc is imported to mapStats. capitalize() is a simple function that only capitalizes the first letter of a string. The 49 additional dependencies imported from Hmisc can be avoided by simply reimplementing a function capitalize() by developer’s own.

  2. The previous scenario is actually not very common. More common cases are when heavy parents only provide analysis that are not frequently used in the package, which we call it as “secondary analysis”. We suggest to move heavy parent packages that are less used to “Suggests” and they are only loaded when the corresponding functions are called. Dependency packages listed in “Suggests” by default are not mandatory to be installed, thus, there will be errors when the functions are called but these dependency packages are not installed yet. pkgndep provides a function check_pkg() that checks the availability of a package, and if the package has not been installed yet, check_pkg() prints friendly messages to guide user to install it.

    Assume foo() is a function that uses another function bar() from a heavy parent pkg, foo() can be written as:

foo = function(...) {
    check_pkg("pkg", ...)

    pkg::bar(...)
    ...
}
  1. Another way for reducing the package dependencies is to directly copy code from heavy parents. This approach is of course NOT recommended from the aspect of software engineering, but actually it is widely used in CRAN packages.

  2. Try to separate a large package into several smaller packages which focus on more specific tasks. For example, the package remotes separated from the package devtools only focuses on installing packages from remote repositories, where remotes only has 4 strong dependencies, but devtools has 76.

For the weak parents

Weak parent packages are those listed in “Suggested/Enhances” fields in the DESCRIPTION file. The heaviness mainly measures the effects on strong parents (those in “Depends/Imports”) and we suggest moving heavy and strong parents to “Suggested/Enhances” if they are not frequently used by users. As “suggested” packages are not necessarily required for users, optimizing the dependency complexity for suggested packages is less necessary.

But still, on the developer’s side, it is still important to keep the dependency complexity of “all parent packages” to a reasonable size. CRAN and Bioconductor perform full checks on packages, which means they require all the packages in “Depends”, “Imports” and “Suggests” to be installed. A large dependency (normally from suggested packages) more likely produces failures, which prevents acceptance or updates on CRAN or on Bioconductor. In general, we have the following suggestions for handling dependency complexity from suggested packages.

  1. If removing a heavy suggested package does not affect the functionality of the package, or in other words, this dependency package only “enhances” the package, then it can be put in the “Enhances” field of the DESCRIPTION file. Packages listed in “Enhances” will never be checked by R CMD check.
  2. If a heavy suggested package is only used in examples and in vignettes. Developers may think about whether they really need the corresponding code to run during the building and the checking of the package. More complex vignettes can be served separately while not being shipped with the package. A nice example is the Seurat package.