Using transformations to pre-install R packages. #42

Open
drtconway opened this issue Oct 13, 2020 · 0 comments
Following up from a conversation with Michael about the upcoming "transformations" feature.

It would be much easier to build R scripts into workflows if it were not necessary to build a separate docker/singularity image for each script just to have its package dependencies installed. Currently, the only alternative is to have the script install the packages itself, but that creates its own problems: because workflow execution is stateless, the packages would be reinstalled on every run.

I note, in passing, that an advantage of creating separate images is reproducibility: if dependencies are updated and their behaviour changes, installing them at workflow run time could lead to different results when a workflow is rerun. However, I think a more flexible way of running R scripts would still be valuable, especially during workflow development.

My understanding of transformations, from the brief discussion with Michael, is that they allow arguments of one type to be transformed into another by the execution of a command (or a workflow, perhaps?). For example, an argument of type FASTA could be transformed into a BwaIndex by building the index, a relatively expensive operation that only needs to be done once for an overall workflow; or Bam files might be transformed into Cram files; and so on.
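To make that understanding concrete, here is a minimal sketch of what a transformation registry might look like. Everything in it (`Transformation`, `REGISTRY`, `register`, `plan`) is a hypothetical name invented for illustration and is not the feature's actual API; the only real command used is `bwa index`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Transformation:
    """Hypothetical record describing how one type is turned into another."""
    source: str                          # e.g. "Fasta"
    target: str                          # e.g. "BwaIndex"
    command: Callable[[str], List[str]]  # builds the argv that does the work


# Registry keyed by (source type, target type); names are illustrative only.
REGISTRY: Dict[Tuple[str, str], Transformation] = {}


def register(t: Transformation) -> None:
    REGISTRY[(t.source, t.target)] = t


def plan(source: str, target: str, value: str) -> List[str]:
    """Return the command that would convert `value`, without running it."""
    return REGISTRY[(source, target)].command(value)


# The FASTA -> BwaIndex example from above: the index is built by `bwa index`.
register(Transformation("Fasta", "BwaIndex", lambda fasta: ["bwa", "index", fasta]))
```

An engine built along these lines could run the planned command once and reuse the result wherever a BwaIndex is requested for the same input.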

For R scripts, the same feature could be used for transforming a list of required packages (Array(String)?) into RSitePackages or something similar. The implementation would need to capture the directory hierarchy under /usr/local/lib/R/site-library, which is the default location for package installation, and then bind it as a volume when running an R script that needs those RSitePackages.
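As a rough sketch of the two halves of that idea, the helpers below build (but do not run) an install command and the docker volume argument for binding a captured library. The function names and the choice of CRAN mirror are my assumptions; `install.packages` with `lib` and `repos`, and docker's `-v host:container:ro`, are the only real interfaces relied on.

```python
import json
from typing import List

# Default package location named above.
SITE_LIBRARY = "/usr/local/lib/R/site-library"


def install_command(packages: List[str],
                    repo: str = "https://cloud.r-project.org") -> List[str]:
    """Build the argv that installs `packages` into SITE_LIBRARY via Rscript."""
    # json.dumps produces double-quoted strings, which are also valid R string
    # literals for ordinary package names (an assumption about the inputs).
    pkgs = ", ".join(json.dumps(p) for p in packages)
    expr = (f"install.packages(c({pkgs}), "
            f"lib={json.dumps(SITE_LIBRARY)}, repos={json.dumps(repo)})")
    return ["Rscript", "-e", expr]


def bind_args(host_dir: str) -> List[str]:
    """Build docker arguments mounting a captured library read-only."""
    return ["-v", f"{host_dir}:{SITE_LIBRARY}:ro"]
```

Binding over the whole directory would hide anything the image already ships there, so a real implementation would probably need to start from a copy of the image's existing site-library.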

Possible problems:

  1. The r-base docker image has docopt already installed in /usr/local/lib/R/site-library.
  2. The transformation will need to track which packages are installed, so that if the list of packages changes, the bundle of files also changes appropriately.
  3. This solution definitely doesn't provide a way to compose package dependencies; that is, you can't take two lists of dependencies and trivially combine them without reinstalling them. If all transitive dependencies were explicitly listed, you might be able to gather up the specific subdirectories of /usr/local/lib/R/site-library and at least merge package bundles in the case where they don't overlap, but it seems preferable to wear the cost and use a simpler implementation.
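One possible way to handle the tracking in point 2 is to derive a bundle identifier from the package list, so that a changed list maps to a different bundle. This is only a sketch under that assumption: `bundle_key` and `can_merge` are hypothetical helpers, and the key ignores package versions, which a real implementation would likely want to include.

```python
import hashlib
from typing import Iterable, Set


def bundle_key(packages: Iterable[str]) -> str:
    """Stable identifier for a package list; order and duplicates are ignored."""
    normalised = sorted(set(packages))
    return hashlib.sha256("\n".join(normalised).encode("utf-8")).hexdigest()


def can_merge(bundle_a: Set[str], bundle_b: Set[str]) -> bool:
    """True when two bundles install no package in common (the case in point 3)."""
    return bundle_a.isdisjoint(bundle_b)
```

With a key like this, the engine could look for an existing bundle directory named after the key and only run the installation when none is found.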

Once the transformation feature is complete, this should be pretty easy to implement, and a similar mechanism should be possible for other scripting environments with established package repositories (e.g. Python!).

I note that a related use of transformations may be to collect online resources such as reference sequences or databases that are used by scripts or other tools. A URI->File transformer (i.e. curl/wget) should be a relatively simple matter, and extremely useful!
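For illustration, a URI->File step could be as small as the sketch below, which uses Python's standard library in place of curl/wget. The `fetch` name and signature are hypothetical, and there is no retry, checksum, or caching logic.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(uri: str, destination: Path) -> Path:
    """Copy the resource at `uri` to `destination` and return that path."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk rather than reading it all into memory.
    with urllib.request.urlopen(uri) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Because `urlopen` also accepts `file://` URIs, the same function works for resources already on local or shared storage.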

T.
