create and release a Docker image to run validations #57

elray1 · 2023-10-12T16:20:32Z

this would likely live in a different repo, but am filing the issue here for now

nickreich · 2024-05-17T19:50:12Z

I removed the assignees from this issue, as I don't think this is being actively worked on.

zkamvar · 2025-01-02T17:04:23Z

I will investigate this today.

zkamvar · 2025-01-03T01:03:36Z

Initial container in https://github.com/hubverse-org/test-docker-hubValidations. I will follow up tomorrow by experimenting with incorporating this into a fork of a hub.

zkamvar · 2025-01-03T16:27:00Z

I've tested it in zkamvar/variant-nowcast-hub#1, which was created by duplicating the output in reichlab/variant-nowcast-hub#246 and adding an error to the value column

Some observations:

The test runs about 2 minutes faster than the original run because we did not have to install R or check for dependency updates.
The workflow is shorter
The container takes 15 to 20 minutes to build on GitHub actions, but once it's built, it only takes 20 to 30 seconds to provision in a GitHub workflow.
I do not yet know how to version this, but I can dig into it

annakrystalli · 2025-01-06T09:31:05Z

My feeling is that we should not be wrapping the validation code within scripts in the container. I feel using an image that contains all necessary software and packages and using it to run the rest of the workflow steps is preferable for a number of reasons:

Reusability: Image can be used for other purposes, including locally for validating submissions.
Modularity: It can still form the basis of a container that does run validation code if we decide this is necessary. It can even be used by hub admins as the basis for other containers.
The validate_pr() function has a number of arguments that allow for customisation of validation. Hiding it away in a container script in a different repo means folks can't access that or we would need to write extra code to pass all validation arguments as command line args which feels unnecessary work.
Equally if any custom validation functions require additional dependencies, they cannot be configured as described in our vignette using the current approach.

In addition before deploying any of this, I feel we need a clear system that will ensure images are up to date, ideally as automated as possible.

Give all this and how important validation is to hub functionality, I feel this issue requires a RFC and decision document before moving forward further.

annakrystalli · 2025-01-06T13:10:23Z

Sorry! I think I confused what was going on when I looked at it last week but now that all the docs are in the repo, it's clear the way you've designed this can be used already as described above 🚀.

Given the step to install packages will no longer be required in the vanilla validation workflow, it would be good once we've decided to roll the above out in our validation workflows to include an example of how hub admins could amend their workflow to include additional dependencies. I imagine the vignette here in hubValidations for deploying custom functions might be a good place and perhaps the updated hubverse-actions documentation for the updated validate-submission action?

annakrystalli · 2025-01-06T13:24:04Z

One last question, should we be installing the released hubverse packages from r-universe instead of from GitHub? Would it be useful to maintain a dev version of the image with dev versions of the packages?

zkamvar · 2025-01-06T20:04:59Z

should we be installing the released hubverse packages from r-universe instead of from GitHub?

No. We are using {renv} to control the versioning, which expects an archival repository. Theoretically, {renv} should fall back to GitHub if the package is from a universe repository, but I've found in rstudio/renv#2068 that this is still not the case.

Would it be useful to maintain a dev version of the image with dev versions of the packages?

I think so.

zkamvar · 2025-01-06T20:14:18Z

Give all this and how important validation is to hub functionality, I feel this issue requires a RFC and decision document before moving forward further.

I agree and I want to make it clear that my exercise here was to explore the space and come up with a prototype/minimal viable product. There are a lot of different layers that can be optimized and I wanted to explore them sufficiently before proposing a project.

zkamvar · 2025-01-06T23:35:59Z

include an example of how hub admins could amend their workflow to include additional dependencies.

That's a good point. I'll have to look at https://rstudio.github.io/renv/articles/docker.html#dynamically-provisioning-r-libraries-with-renv to make sure our implementation can be extended.

annakrystalli · 2025-01-17T13:35:46Z

I know this is still WIP but I think some of the functionality is really useful and might be better as official hubValidations functionality. I'm thinking specifically the get_error_data() which would also resolve #16

It could be that we add it to the print method but only deploy through an argument (e.g. show_errors = TRUE)?

zkamvar · 2025-01-17T15:47:02Z

I know this is still WIP but I think some of the functionality is really useful and might be better as official hubValidations functionality. I'm thinking specifically the get_error_data() which would also resolve #16

It could be that we add it to the print method but only deploy through an argument (e.g. show_errors = TRUE)?

I can take a stabby-stab at it. I agree that the functionality should live in hubValidaitons.

zkamvar · 2025-01-17T20:20:33Z

Back on thread:

I'm now realizing that the renv.lock is over-complicating things a bit. The docker container could be built from a simple docker file.

I think it might be possible for the docker files to actually live inside the repository and ignored in .Rbuildignore

docker
|- dev/
|    |- Dockerfile
|- release/
|    |- Dockerfile

We could have a GitHub workflow that will build the dev container on each push and on a schedule (to capture updates from dependencies). The release container will be built when a release is created.

I've demonstrated this in my fork of hubUtils: https://github.com/zkamvar/hubUtils/
The Dockerfiles live in the places I described above and they are built with two nearly identical workflows, the difference being that the dev container is built every push and the release container is built on demand or on a release tag.

The container with the released version is https://github.com/zkamvar/hubUtils/pkgs/container/hubutils/339039092?tag=main

The released version contains a renv.lock file in /etc/renv.lock (I wasn't sure of the right place to put it), but this demonstrates that it should be possible to have a workflow that emits the renv.lock file and attaches it to a release.

and the container with the dev version is
https://github.com/zkamvar/hubUtils/pkgs/container/hubutils-dev/339039095?tag=main

elray1 · 2025-01-17T21:02:40Z

I'm now realizing that the renv.lock is over-complicating things a bit. The docker container could be built from a simple docker file.

In past investigations essentially independently replicated by @matthewcornell and me, we've landed on building R-based containers using renv in order to get the installation of specific R package versions that renv provides, and which we've found somewhat difficult to achieve without renv. @matthewcornell has written up the procedures we've used here: https://github.com/reichlab/operational-models/blob/main/flu_ar2/..%2FREADME.md#requirementstxt-and-renvlock-details. I (and I'm sure Matt) would be interested to hear if you're envisioning something else specific to deal with this?

zkamvar · 2025-01-17T21:42:41Z

we've landed on building R-based containers using renv in order to get the installation of specific R package versions that renv provides

I guess it really depends on what the container is for. If it's a drop-in replacement for our current GitHub workflow setup, then a simple Dockerfile that generates a container at release time is sufficient.

If, however, we would need this for future reproducibility, then the renv.lock file is an important element because we would be able to re-build the container from an old renv.lock.

I think for a container that should mirror the release and development versions of the packages , using a renv.lock might be a bit too prescriptive because we rebuild the containers every time we do a release, containing a snapshot of the R package environment at the time of release, which better reflects the testing environment and it reduces the burden of maintaining and synchronizing a renv.lock file across the dependent package containers (e.g. the renv.lock for hubUtils will serve as the base renv.lock for hubValidations).

zkamvar · 2025-01-17T21:43:36Z

That all being said, we might be able to have the Dockerfile even generate the renv.lock as an artifact when it's built.

elray1 · 2025-01-17T21:55:35Z

makes sense. i guess there's a plan to distribute the built images via docker hub or similar, and that's what a user would effectively interact with?

zkamvar · 2025-01-17T22:04:15Z

makes sense. i guess there's a plan to distribute the built images via docker hub or similar, and that's what a user would effectively interact with?

Yes. The plan is to use GitHub Workflows to build and deploy the containers to GitHub's container registry (ghcr.io). This is how https://github.com/hubverse-org/hub-dash-site-deployer and https://github.com/hubverse-org/hubPredEvalsData-docker are deployed.

github-project-automation bot added this to hubverse Development overview Oct 12, 2023

github-project-automation bot moved this to Todo in hubverse Development overview Oct 12, 2023

annakrystalli assigned bsweger and annakrystalli Jan 12, 2024

nickreich unassigned bsweger and annakrystalli May 17, 2024

bsweger moved this from Todo to Up Next in hubverse Development overview May 22, 2024

bsweger moved this from Up Next to Todo in hubverse Development overview May 24, 2024

annakrystalli mentioned this issue Dec 11, 2024

Ubuntu latest removes R from its default installation hubverse-org/hubverse-actions#30

Closed

zkamvar self-assigned this Jan 2, 2025

zkamvar moved this from Wishlist to In Progress in hubverse Development overview Jan 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

create and release a Docker image to run validations #57

create and release a Docker image to run validations #57

elray1 commented Oct 12, 2023

nickreich commented May 17, 2024

zkamvar commented Jan 2, 2025

zkamvar commented Jan 3, 2025

zkamvar commented Jan 3, 2025

annakrystalli commented Jan 6, 2025

annakrystalli commented Jan 6, 2025 •

edited

Loading

annakrystalli commented Jan 6, 2025

zkamvar commented Jan 6, 2025

zkamvar commented Jan 6, 2025

zkamvar commented Jan 6, 2025

annakrystalli commented Jan 17, 2025 •

edited

Loading

zkamvar commented Jan 17, 2025 •

edited

Loading

zkamvar commented Jan 17, 2025 •

edited

Loading

elray1 commented Jan 17, 2025

zkamvar commented Jan 17, 2025

zkamvar commented Jan 17, 2025

elray1 commented Jan 17, 2025

zkamvar commented Jan 17, 2025

create and release a Docker image to run validations #57

create and release a Docker image to run validations #57

Comments

elray1 commented Oct 12, 2023

nickreich commented May 17, 2024

zkamvar commented Jan 2, 2025

zkamvar commented Jan 3, 2025

zkamvar commented Jan 3, 2025

annakrystalli commented Jan 6, 2025

annakrystalli commented Jan 6, 2025 • edited Loading

annakrystalli commented Jan 6, 2025

zkamvar commented Jan 6, 2025

zkamvar commented Jan 6, 2025

zkamvar commented Jan 6, 2025

annakrystalli commented Jan 17, 2025 • edited Loading

zkamvar commented Jan 17, 2025 • edited Loading

zkamvar commented Jan 17, 2025 • edited Loading

elray1 commented Jan 17, 2025

zkamvar commented Jan 17, 2025

zkamvar commented Jan 17, 2025

elray1 commented Jan 17, 2025

zkamvar commented Jan 17, 2025

annakrystalli commented Jan 6, 2025 •

edited

Loading

annakrystalli commented Jan 17, 2025 •

edited

Loading

zkamvar commented Jan 17, 2025 •

edited

Loading

zkamvar commented Jan 17, 2025 •

edited

Loading