feat(docker): Add zstd to the Docker image used in SILO preprocessing #574
Conversation
There is no change in the changelog. This PR will not produce a new releasable version.
We install zstd as a dependency; I do not quite understand why this is required from the CLI?
@Taepper I would like to use zstd in the silo_import_job.sh script; I could also call it from the silo api somehow, if that is possible. However, adding it to the CLI here makes debugging easier. Here is the relevant location: https://github.com/loculus-project/loculus/pull/2773/files#diff-4d103a16b3b0f2a73a5fcac5f2db3d5c401311c01652196e13f2cd060792951b
But this is a script that is not part of SILO. Whether the downstream consumer wants to run the script in the SILO image or not should not introduce this package for all consumers. I would rather introduce the apt install here.
I can try this out. I thought that normally, when running a process in a docker image, the process cannot install dependencies into the image, as this would require the process to run with root privileges; so this would normally need to be done in the docker image set-up. But maybe it is possible here.
Hmm, that is true. I am currently also not quite clear on why the silo import script needs to be executed inside the silo container, but I guess this would be a little complicated to change? The mental model of the silo container is that it has all dependencies installed and is able to run silo. Adding additional binaries to the image because a user script interactively execs into the container to run bash commands is not the way I envisioned it being used.
I guess that could be solved by also publishing binaries for all releases instead of just docker images. Then the docker image we use to execute the commands could just be some Ubuntu base image with the silo binary and all the tools you need installed, with the loculus silo import script as the entry point. This could still be facilitated by building an image on top of the silo base image. What do you think?
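Layering on top of the SILO base image, as suggested above, might look roughly like this; this is only a sketch, and the image name, script path, and entry point are all hypothetical:

```dockerfile
# Hypothetical Loculus-side image layered on the SILO image, so that
# extra tools (like zstd) live outside the SILO image itself.
FROM ghcr.io/genspectrum/lapis-silo:latest
RUN apt-get update \
    && apt-get install -y --no-install-recommends zstd \
    && rm -rf /var/lib/apt/lists/*
COPY silo_import_job.sh /usr/local/bin/silo_import_job.sh
ENTRYPOINT ["/usr/local/bin/silo_import_job.sh"]
```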
The silo_import_script runs inside the silo container. But it is OK, I got it to work like you suggested. I wonder if this means any process running in your silo container is root. I think having silo binaries would make sense; adding @corneliusroemer as he had some thoughts in loculus-project/loculus#2765.
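As an aside, whether an exec'd process is root can be checked from inside the container; a minimal sketch (hypothetical helper, not part of this PR):

```shell
# Hypothetical helper: report whether the current process is root.
# In images without a USER directive, both the main process and
# `docker exec` sessions run as uid 0 by default, which is what makes
# an in-container `apt-get install` possible at all.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

if is_root; then
    echo "running as root: apt-get install would work"
else
    echo "running as uid $(id -u): apt-get install would fail"
fi
```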
I also would not introduce this here if it's not necessary.
There are arguments for and against; it's not about necessity but about the pro/con tradeoff. For:
Against:
Overall, I see a clear case for adding it to unblock loculus-project/loculus#2773, and I struggle to see downsides that are worth the extra work in PP at this point (later we might want to change how we use silo preprocessing, but not right now). @chaoran-chen's view here would be valuable, I think, as he's involved in both projects and could make a call on the tradeoffs.
IMO Loculus "misuses" the SILO Docker image by running a script in it that the image is not intended for. And it apparently works by simply installing zstd on startup of the container. That's why I see this as a Loculus issue rather than a SILO issue. But we've discussed splitting SILO and providing two images: one for the API and one for the preprocessing. This might be an aspect worth considering there, too.
It's not complicated to change, but it would somewhat unnecessarily cost dev time to reconfigure the architecture and build a separate docker image in PP. It doesn't need to be executed inside the silo container; that's just how things are done right now.
Curl was added a long time ago (71c5664) and has far greater security implications than a tiny zstd binary.
Yeah, that would be nice; I made #575, but it would still cost a lot of downstream dev work.
Hi! Thank you all for detailing and weighing up the pros and cons! All your points make a lot of sense, and I think we all agree that there are things to improve with regard to how SILO is distributed. It has been on our roadmap for a long time to release binaries that people can use without Docker, but the challenge is that it is not easy (or even possible) to provide a fully statically-linked binary, and we will probably not be able to solve that within the next few days. Offering different Docker images with different bases also sounds good in principle and is common practice for many projects, but it will take quite some effort.

Installing things that are unrelated to SILO into the image is not ideal, and we should improve this both on the Loculus side (by moving the data download from the backend into a separate image, which could be combined with loculus-project/loculus#2765) and on the SILO side (by providing binaries and/or other images). However, I do see that this is a very simple way to solve a problem in Loculus, and I see very few disadvantages. So, for now, if adding the one-liner is the best and easiest solution for Loculus, I am in favor of merging this PR.

That said, I see that we now have https://github.com/loculus-project/loculus/pull/2773/files#diff-f451cc1794ed7b689f4eddbe5eed0d6d7ce35e599c44a628e50af4b6c41a3499R93, so is this PR and discussion still relevant, or obsolete anyway?
Quite, yeah, but that's how it is right now and has been for a while, since loculus-project/loculus@b41659d.
That's a good workaround idea. I still think just adding zstd here is the easiest option with little harm. Downsides of installing after startup: increased startup time and a requirement for internet access. But yeah, we can just do that then.
Agreed that we can think about this, but it's not a quick fix for the immediate need we have for the zstd binary.
Sorry, I really didn't mean to block on this. After being confused by the issue at hand, and then finding out that the import script runs in the silo container, I thought that line change would have been the quickest solution. I also do not think it would have harmed anything for zstd to be installed into the docker image. But it could be confusing in a few months, when someone thinks that this install is required for some dynamic linking of the silo binary; a comment linking to the loculus script could clear that up.
So, if you think this or something else helps development in loculus, just mention it, and we can move forward with it until we refactor the import script / amend the interface to start silo.
I now just install zstd every time we start the silo preprocessing script: inefficient, but not terrible as a temporary solution. I think it's good to follow up in #575 and work on splitting SILO into two images, one for the API and one for the preprocessing, which we could then use, for example, in the silo_import_job.sh script. I will close this PR for now.
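The install-on-startup workaround can be sketched as a small guard at the top of the script; the helper name is hypothetical, and it assumes a Debian/Ubuntu base image, root privileges, and network access inside the container:

```shell
# Hypothetical guard for silo_import_job.sh: install a tool only if it
# is missing, so repeated container starts skip the apt call once the
# binary is already present.
ensure_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 already installed"
    else
        # Requires root and network access inside the container.
        apt-get update -qq && apt-get install -y --no-install-recommends "$1"
    fi
}
```

The script would then call `ensure_tool zstd` before any decompression step.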
resolves #
Summary
Adds zstd to the silo image. As we use zstd file compression, this generally makes sense to have; I also need it for the changes proposed in loculus-project/loculus#2773 (checking that we have received all streamed data).
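Concretely, the change amounts to one extra package in the image's existing apt install step; the following is a sketch only, since the real base image and package list are assumptions (curl is included because the discussion notes it was added in 71c5664):

```dockerfile
# Sketch, not the actual SILO Dockerfile: base image and the existing
# package set are assumptions.
FROM ubuntu:24.04
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl zstd \
    && rm -rf /var/lib/apt/lists/*
```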