If possible, do not run worker containers with privileged access or a 'root' user #238
Comments
Some investigation has started, but several stages of investigation may be necessary for this; i.e., the issue may pass in and out of "In progress" status several times.
My earlier (brief) research suggests this is feasible: there are switches available for sshd to prevent it from trying to change user contexts and to have it use a port >1024. Anything trying to connect would have to know to use the nonstandard port, but that also seems tractable. The other known tricky bit is getting datasets mounted in the worker, which is complicated by the fact that a container usually must be privileged or have the SYS_ADMIN capability to mount things, even with FUSE. There are convoluted ways around this that require certain kernel versions, compiled options, and a modified security policy, but those are tricky in themselves. Still, having every worker run privileged is worth trying to avoid. The best bet so far looks like using some kind of volume plugin, but more research is needed.
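To make the sshd idea above a bit more concrete, here is a rough sketch of generating a per-user `sshd_config` and command line that avoid root-owned paths and privileged ports. It has not been tried against the worker image; the directory layout, key file names, and the `build_sshd_invocation` helper are assumptions for illustration, and the comment above does not say which exact switches were meant.

```python
from pathlib import Path
from typing import List, Tuple


def build_sshd_invocation(workdir: Path, port: int = 2222) -> Tuple[str, List[str]]:
    """Return (config_text, argv) for an sshd intended to run as a non-root user.

    `workdir` is a hypothetical directory owned by the worker user.
    """
    if not 1024 < port <= 65535:
        raise ValueError("an unprivileged sshd needs a port in 1025..65535")

    config_path = workdir / "sshd_config"
    # Keep every file under a user-owned directory so sshd does not need
    # to read or write root-owned locations such as /etc/ssh or /var/run.
    config_text = "\n".join([
        f"Port {port}",
        f"HostKey {workdir / 'ssh_host_ed25519_key'}",
        f"PidFile {workdir / 'sshd.pid'}",
        "UsePAM no",
        "PasswordAuthentication no",
        f"AuthorizedKeysFile {workdir / 'authorized_keys'}",
    ]) + "\n"

    # -D: stay in the foreground, -e: log to stderr, -f: use our config file.
    argv = ["/usr/sbin/sshd", "-D", "-e", "-f", str(config_path)]
    return config_text, argv


if __name__ == "__main__":
    text, argv = build_sshd_invocation(Path("/home/worker/ssh"))
    print(text)
    print(" ".join(argv))
```

Clients (including the MPI launcher) would then need to be told about the nonstandard port, e.g., via their own ssh client configuration.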
Regarding getting data into the workers, we could develop a custom BMI module implemented to retrieve forcings data from a DMOD dataset (or even multiple datasets). This wouldn't work for other things like config data, but those datasets are small enough that simply copying/downloading the data into workers is probably feasible. That typically isn't the case with forcings. We could also conceivably inject a second custom BMI module that encapsulated writing data back out to a DMOD dataset (we are also currently writing to a dataset). Speaking of "inject," we would likely need to add support for on-the-fly manipulation of realization configs in order to do things like this. While a significant task resource-wise, this is in principle something we already need to do, as noted in #407. It may also be a better general solution than what we have now, which leans heavily on a specific implementation of the DMOD Dataset abstraction.
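As a very rough illustration of the forcings idea, the sketch below shows a BMI-style class that pulls values from a dataset through an injected reader instead of a mounted file system. Everything specific here is an assumption: `DatasetForcingsBmi`, the `ForcingReader` callable, and the `precip_rate` variable name are hypothetical, no DMOD API is used, and the method set only mirrors a few BMI function names (real BMI `get_value` fills a caller-supplied array rather than returning a list).

```python
from typing import Callable, Dict, List

# Hypothetical reader: (variable name, time step index) -> one value per catchment.
# A real implementation would wrap whatever client talks to the dataset store.
ForcingReader = Callable[[str, int], List[float]]


class DatasetForcingsBmi:
    """BMI-style module that serves forcings fetched from a dataset reader."""

    def __init__(self, reader: ForcingReader, variables: List[str],
                 time_step_s: float = 3600.0) -> None:
        self._reader = reader
        self._variables = list(variables)
        self._dt = time_step_s
        self._step = 0
        self._values: Dict[str, List[float]] = {}

    def initialize(self, config_file: str = "") -> None:
        # A real module would parse config_file for the dataset name, etc.
        self._step = 0
        self._load()

    def update(self) -> None:
        # Advance one time step and fetch only that step's values,
        # so the full forcings dataset never has to be copied locally.
        self._step += 1
        self._load()

    def get_current_time(self) -> float:
        return self._step * self._dt

    def get_output_var_names(self) -> List[str]:
        return list(self._variables)

    def get_value(self, name: str) -> List[float]:
        # Simplified: returns a copy instead of filling a destination array.
        return list(self._values[name])

    def finalize(self) -> None:
        self._values.clear()

    def _load(self) -> None:
        self._values = {v: self._reader(v, self._step) for v in self._variables}


if __name__ == "__main__":
    def fake_reader(name: str, step: int) -> List[float]:
        # Stand-in for a dataset lookup; returns made-up numbers.
        return [float(step), float(step) + 0.5]

    module = DatasetForcingsBmi(fake_reader, ["precip_rate"])
    module.initialize()
    module.update()
    print(module.get_current_time(), module.get_value("precip_rate"))
```

Injecting the reader keeps the module independent of any one Dataset implementation, which is the property the comment above is after.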
If we run into a good bit of trouble with using the current MinIO-based solution for getting data because of #242, we may have to go in a direction like the one I suggested above.
@robertbartel, if we continue to face issues with the MinIO Docker plugin we are using, we may be able to use Docker cluster volumes. I am not well read on the details of this addition, but it does seem like our use case fits within the feature set introduced.
@aaraney I need to study that further, but on quick examination it at least looks promising. We may still run into issues with the plugins needing CAP_SYS_ADMIN, but we can cross that bridge when we get there.
At present, the `ngen` worker containers require two less-than-ideal settings: running as the `root` user and running with `SYS_ADMIN` added to grant privileged capabilities. These requirements are at least partially due to the needs of the `sshd` service and MPI process communication inside the workers. There may be implications for access to datasets as well (i.e., any potential solution should be well tested to make sure that functionality still works).
For technical reasons, it must also be assumed that not all Docker host environments allow use of `sudo` within containers. Therefore, simply switching to `sudo` isn't a viable alternative.
If possible, we should move away from these two requirements. It may not be possible, though, for one or both of them.
If either or both cannot be changed, a complete explanation should be included here of why they are required and why there are no alternatives. Otherwise, if there are external blocking factors, then issues for those dependencies should be opened and linked to.
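For checking whether a change actually removed the two settings, something like the following could help. It is only a sketch, assuming the input is a dict shaped like one entry of `docker inspect <container>` output (the `Config.User`, `HostConfig.Privileged`, and `HostConfig.CapAdd` fields); the `privilege_findings` name is made up for illustration.

```python
from typing import Any, Dict, List


def privilege_findings(inspect_entry: Dict[str, Any]) -> List[str]:
    """List the less-than-ideal privilege settings found in one inspect entry."""
    config = inspect_entry.get("Config") or {}
    host_config = inspect_entry.get("HostConfig") or {}
    findings: List[str] = []

    # User may be "name", "uid", or "uid:gid"; an empty value is treated
    # as root here, since that is Docker's default when no user is set.
    user = (config.get("User") or "").split(":")[0]
    if user in ("", "0", "root"):
        findings.append("runs as root")

    if host_config.get("Privileged"):
        findings.append("privileged mode")

    # CapAdd can be null in inspect output, hence the "or []".
    for cap in host_config.get("CapAdd") or []:
        normalized = cap.upper()
        if normalized.startswith("CAP_"):
            normalized = normalized[len("CAP_"):]
        if normalized in ("SYS_ADMIN", "ALL"):
            findings.append(f"added capability {cap}")

    return findings


if __name__ == "__main__":
    current_worker = {
        "Config": {"User": ""},
        "HostConfig": {"Privileged": False, "CapAdd": ["SYS_ADMIN"]},
    }
    print(privilege_findings(current_worker))
```

An empty result for a worker container would indicate that neither requirement is still in effect, at least as far as these three fields show.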