Installation from software.bib does not look at dependencies #31
Comments
Hi Christoffer!

That's a valid point. TAF is designed so that DATA.bib is a hard requirement, the only valid pathway to use data in a TAF analysis, while SOFTWARE.bib entries are not a hard requirement, especially for R packages. In other words, a TAF analysis is perfectly valid if it uses R packages without declaring them in SOFTWARE.bib.

The main purpose of declaring an R package in SOFTWARE.bib is to specify the exact version number (or SHA code) of a key package used in the analysis, typically a stock assessment package. This is important and useful information for scientific purposes, and it strengthens reproducibility. Even so, we cannot fully guarantee that it will be straightforward to rerun the analysis next year, or in 10 years, running R 7.0 in Windows 14.

Some analyses are more reproducible than others, and we can usually tell by looking at the scripts: fewer dependencies mean better reproducibility. For example, I was recently involved in a TAF analysis that uses the sraplus package, which has a large number of dependencies. From a fresh R install, you need to install nearly 200 packages just to get sraplus to work. It will probably be a challenge to rerun this analysis a few years from now. That's an extreme example, but in ICES assessments we can expect many analyses to start a TAF script with library(tidyverse), for example.

The idea in TAF is not to have SOFTWARE.bib install every package used in the TAF analysis, along with all dependencies, but rather to selectively pinpoint the location and version of key software used. In the case of SPiCT, for example, it could make sense to declare in SOFTWARE.bib not only the version of SPiCT used, but also the matching version of TMB and perhaps the Matrix package.
This can be practical for reproducibility, for example when rerunning an analysis that uses an old version of SPiCT on a computer that has a newer version of SPiCT installed, where the newer version should not be used for this particular analysis.

Given the sraplus example above, and by extension other packages with several layers of dependencies, the

Do you think, in the analysis you're working on and for your purposes, that it would be enough to declare in SOFTWARE.bib the version of the key software that is used, or would it be useful to also declare the versions of some key dependencies?

We appreciate insights and suggestions from TAF users on this topic. Based on experience and user feedback on SOFTWARE.bib entries, we can copy parts of this essay and add specific recommendations to the TAF documentation, e.g. https://github.com/ices-taf/doc/wiki/Bib-entries#software-version.
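To make the idea of pinning key software concrete, here is a minimal R sketch that compares installed package versions with the versions an analysis expects. The helper `check_versions()` is hypothetical and not part of TAF, and the version numbers in the example call are purely illustrative.

```r
# Hypothetical helper, not part of TAF: compare installed package versions
# with the versions an analysis expects (e.g. those noted in SOFTWARE.bib).
check_versions <- function(expected) {
  pkgs <- names(expected)
  installed <- rep(NA_character_, length(pkgs))
  match <- rep(FALSE, length(pkgs))
  for (i in seq_along(pkgs)) {
    # system.file() returns "" when the package is not installed
    if (nzchar(system.file(package = pkgs[i]))) {
      v <- utils::packageVersion(pkgs[i])
      installed[i] <- as.character(v)
      # Compare as version objects so that "1.6-5" and "1.6.5" are equal
      match[i] <- v == package_version(expected[[i]])
    }
  }
  data.frame(package = pkgs, expected = unname(expected),
             installed = installed, match = match,
             stringsAsFactors = FALSE)
}

# Illustrative version numbers only, not taken from any real analysis
check_versions(c(spict = "1.3.8", TMB = "1.9.11", Matrix = "1.6-5"))
```

A check like this only reports mismatches; it does not install or switch versions.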
Hi Arni,

I completely agree that listing all 200 dependencies of the "tidyverse" in SOFTWARE.bib is not useful to anyone! A good alternative, in my opinion, would be to catch the error when a package cannot be installed automatically, stop the script, and give an informative error message.

Best,
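The suggestion above could be sketched roughly as follows. This is not how TAF implements its bootstrap procedure; `install_or_stop()` is a hypothetical wrapper, and the `installer` argument is an assumption added so that the check can be tried without network access.

```r
# Hypothetical wrapper, not part of TAF: attempt an installation, then
# verify the package is actually present and stop with a clear message if not.
install_or_stop <- function(pkg, installer = utils::install.packages, ...) {
  # install.packages() often reports failure as a warning rather than an
  # error, so any error is captured and the result is verified afterwards
  err <- tryCatch({
    installer(pkg, ...)
    NULL
  }, error = function(e) conditionMessage(e))

  if (!nzchar(system.file(package = pkg))) {
    stop("Package '", pkg, "' could not be installed. ",
         "Check that its dependencies are installed, then rerun the script.",
         if (!is.null(err)) paste0(" Installer error: ", err),
         call. = FALSE)
  }
  invisible(TRUE)
}
```

Verifying with `system.file()` after the attempt, instead of relying on the installer's return value, means that a failed installation stops the script even when it only produced a warning.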
When running, e.g., taf.bootstrap(), the procedure tries to install the R packages listed in the software.bib file, but it does not install missing dependencies, and it is not stopped by the resulting error.
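As a rough way to diagnose this kind of situation, the following R sketch lists the declared dependencies of an installed package that are not themselves installed. Both helpers, `parse_deps()` and `missing_deps()`, are hypothetical and not part of TAF; they only read the Depends, Imports, and LinkingTo fields of the package's DESCRIPTION.

```r
# Hypothetical helpers, not part of TAF.

# Turn a DESCRIPTION field such as "R (>= 3.5.0), Matrix (>= 1.0), TMB"
# into a character vector of package names.
parse_deps <- function(field) {
  if (is.null(field) || is.na(field)) return(character(0))
  deps <- strsplit(field, ",", fixed = TRUE)[[1]]
  deps <- sub("\\(.*\\)", "", deps)   # drop version constraints like (>= 1.0)
  deps <- trimws(deps)
  setdiff(deps[nzchar(deps)], "R")    # R itself is not a package
}

# List direct dependencies of an installed package that are not installed.
missing_deps <- function(pkg) {
  desc <- suppressWarnings(utils::packageDescription(pkg))
  if (!is.list(desc)) stop("Package '", pkg, "' is not installed.", call. = FALSE)
  fields <- c("Depends", "Imports", "LinkingTo")
  deps <- unique(unlist(lapply(fields, function(f) parse_deps(desc[[f]]))))
  found <- vapply(deps, function(p) nzchar(system.file(package = p)), logical(1))
  deps[!found]
}
```

This only inspects direct dependencies; packages with several layers of dependencies would need the check repeated for each missing entry.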