Using transformations to pre-install R packages. #42

drtconway · 2020-10-13T01:02:25Z

Following up from a conversation with Michael about the upcoming "transformations" feature.

It would be much easier to build R scripts into workflows if we did not have to build a separate docker/singularity image for each script just to get its package dependencies installed. Currently, the only alternative is to have the script install the packages itself, but that creates its own problems: because workflow execution is stateless, the packages would be reinstalled on every run.

I note, in passing, that one advantage of creating separate images is reproducibility: if dependencies are updated and behaviour changes, installing them at workflow run time could lead to different results when a workflow is rerun. However, I think a more flexible way of running R scripts would still be valuable, especially during workflow development.

My understanding of transformations, from the brief discussion with Michael, is that they allow an argument of one type to be transformed into another by executing a command (or a workflow, perhaps?). For example, an argument of type FASTA could be transformed into a BwaIndex by building the index, a relatively expensive operation that need only be done once for an overall workflow; or Bam files might be transformed into Cram files; and so on.
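To make that understanding concrete, here is a minimal sketch of the idea as I picture it. The names `Transformation` and `TransformationCache` are hypothetical and are not taken from the actual feature; the lambda merely stands in for an expensive step such as building an index.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Transformation:
    """Hypothetical description of a conversion between two argument types."""
    source_type: str
    target_type: str
    run: Callable[[str], str]  # takes a source value, returns the derived value


class TransformationCache:
    """Runs each (transformation, input) pair at most once per workflow."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str, str], str] = {}
        self.runs = 0  # how many times the expensive step actually executed

    def apply(self, t: Transformation, value: str) -> str:
        key = (t.source_type, t.target_type, value)
        if key not in self._results:
            self._results[key] = t.run(value)
            self.runs += 1
        return self._results[key]


# Stand-in for an expensive step such as building a BwaIndex from a FASTA file.
fasta_to_bwa_index = Transformation(
    "FASTA", "BwaIndex", lambda path: path + ".bwa_index"
)

cache = TransformationCache()
first = cache.apply(fasta_to_bwa_index, "ref.fa")
second = cache.apply(fasta_to_bwa_index, "ref.fa")  # served from the cache
```

The point of the cache is only that the second request for the same input reuses the first result instead of repeating the work.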

For R scripts, the same feature could be used for transforming a list of required packages (Array(String)?) into RSitePackages or something similar. The implementation would need to capture the directory hierarchy under /usr/local/lib/R/site-library which is the default location for package installation, and then bind it as a volume for running an R script with the RSitePackages installed.
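A rough sketch of what the two steps might look like as container invocations, assuming docker and the `r-base` image. The helper names, the `/work` mount point, and the choice of CRAN mirror are all assumptions made for illustration; the functions only build the argument lists and do not run anything.

```python
from typing import List

# Default package location mentioned above.
R_SITE_LIBRARY = "/usr/local/lib/R/site-library"


def install_command(packages: List[str], bundle_dir: str,
                    image: str = "r-base") -> List[str]:
    """Build a docker command that installs packages into a host directory."""
    quoted = ", ".join('"{}"'.format(p) for p in packages)
    # The repos URL is an assumed CRAN mirror; any mirror would do.
    r_expr = 'install.packages(c({}), lib="{}", repos="{}")'.format(
        quoted, R_SITE_LIBRARY, "https://cloud.r-project.org"
    )
    return [
        "docker", "run", "--rm",
        "-v", "{}:{}".format(bundle_dir, R_SITE_LIBRARY),
        image, "Rscript", "-e", r_expr,
    ]


def run_command(script_dir: str, script_name: str, bundle_dir: str,
                image: str = "r-base") -> List[str]:
    """Build a docker command that runs a script with the bundle bound read-only."""
    return [
        "docker", "run", "--rm",
        "-v", "{}:{}:ro".format(bundle_dir, R_SITE_LIBRARY),
        "-v", "{}:/work:ro".format(script_dir),  # /work is a hypothetical mount point
        image, "Rscript", "/work/" + script_name,
    ]


install = install_command(["ggplot2", "dplyr"], "/tmp/r-bundle")
run = run_command("/home/me/scripts", "plot.R", "/tmp/r-bundle")
```

Binding a host directory over the site-library hides anything the image already had installed there, so packages the script relies on would need to appear in the list explicitly.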

Possible problems:

- The r-base docker image already has docopt installed in /usr/local/lib/R/site-library.
- The transformation will need to track which packages are installed, so that if the list of packages changes, the bundle of files also changes appropriately.
- This solution definitely doesn't provide a way to compose package dependencies - that is, you can't take two lists of dependencies and combine them without reinstalling them. If all transitive dependencies were explicitly listed, you might be able to gather up the specific subdirectories of /usr/local/lib/R/site-library and at least merge package bundles in the case where they don't overlap, but it seems preferable to wear the cost and use a simpler implementation.
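One simple way to handle the tracking problem would be to derive the bundle's identity from the package list itself, so that a changed list automatically maps to a different bundle. The sketch below is only an illustration of that idea; `bundle_key` and `combined_key` are hypothetical names, and including the R version in the key is my own assumption.

```python
import hashlib
from typing import Iterable, List


def normalise(packages: Iterable[str]) -> List[str]:
    """Sort and de-duplicate so that ordering does not affect the key."""
    return sorted(set(packages))


def bundle_key(packages: Iterable[str], r_version: str) -> str:
    """Identify a package bundle by its contents rather than by a name."""
    payload = r_version + "\n" + "\n".join(normalise(packages))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def combined_key(a: Iterable[str], b: Iterable[str], r_version: str) -> str:
    """Key for the union of two lists; the union is installed from scratch."""
    return bundle_key(list(a) + list(b), r_version)


key_one = bundle_key(["dplyr", "ggplot2"], "4.0.2")
key_two = bundle_key(["ggplot2", "dplyr", "dplyr"], "4.0.2")  # same set
key_three = bundle_key(["dplyr", "ggplot2", "tidyr"], "4.0.2")  # list changed
```

This matches the simpler implementation preferred above: combining two lists just produces a new key and a fresh install, with no attempt to merge existing bundles.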

Once the transformation feature is complete, this should be pretty easy to implement, and a similar mechanism should be possible for other scripting environments with established package repositories (e.g. Python!).

I note that a related use of transformations may be to collect online resources such as reference sequences or databases that are used by scripts or other tools. A URI->File transformer (i.e. curl/wget) should be a relatively simple matter, and extremely useful!
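As a sketch of how small such a transformer could be, here is a version using only the Python standard library in place of curl/wget. The function name `fetch_uri` and the cache layout (one file per URI, named by its hash) are assumptions for illustration.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_uri(uri: str, cache_dir: str) -> Path:
    """Download a URI into cache_dir once and return the local file path."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Name the cached file after the URI so repeated requests hit the cache.
    target = cache / hashlib.sha256(uri.encode("utf-8")).hexdigest()
    if not target.exists():
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(uri) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        # Rename only after a complete download, so a failed fetch
        # never leaves a truncated file under the final name.
        partial.replace(target)
    return target
```

A call such as `fetch_uri(uri, "/tmp/resources")` would then return the same local path on every subsequent use within a workflow. Keying on the URI alone means a resource that changes upstream is not refetched, which helps reproducibility within a run but would need a checksum or version to be robust across runs.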

T.
