Kubeflow-Containers explore multi-stage builds #1991

Closed
EveningStarlight opened this issue Nov 25, 2024 · 6 comments
Labels: kind/feature (New feature or request)

Comments

@EveningStarlight
Contributor

Explore the use of multi-stage builds in Docker.
See if they can be used to:

  • help organize architecture
  • improve build speed
  • improve clarity
@EveningStarlight EveningStarlight added the kind/feature New feature or request label Nov 25, 2024
@EveningStarlight EveningStarlight self-assigned this Nov 25, 2024
@EveningStarlight
Contributor Author

The current Dockerfile usage for each image:

[Diagram: current Dockerfile usage for each image]

The colouring only highlights related blocks.

@EveningStarlight
Contributor Author

An alternative build order, with possible build stages:

[Diagram: alternative build order with possible build stages]

This could simplify all the JupyterLab and RStudio builds by letting them share stages and cached resources, which would also improve build speed.

@EveningStarlight
Contributor Author

This structure would allow all images to be defined in a single Dockerfile.

Using only the FROM instruction simplifies the early stages of the Jupyter builds.
Each stage only needs to reference the previous stage in the build chain and name the current stage.
For example:
FROM base-cpu AS base-jupyter
This declares a stage named base-jupyter whose base image is the base-cpu stage.

The limitation of this is that the finalize stage will need to be duplicated for each final image.
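A rough sketch of such a chain, including the duplicated finalize step. Only base-cpu and base-jupyter come from this issue; the other stage names are hypothetical, and the real setup steps are replaced by comments. Because a stage can have only one parent in its FROM line, a single shared finalize stage cannot sit on top of every image, so the finalize steps are repeated per final image.

```dockerfile
# Global ARG so it can be used in the FROM line below.
ARG BASE_VERSION=2024-06-17

FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION} AS base-cpu
# ... common CPU setup shared by every image ...

FROM base-cpu AS base-jupyter
# ... setup shared by all Jupyter images ...

# Hypothetical image-specific stages.
FROM base-jupyter AS jupyterlab-cpu
# ... JupyterLab-specific tooling ...

FROM base-cpu AS rstudio
# ... RStudio-specific tooling ...

# Each stage has exactly one parent, so the finalize steps
# must be repeated once per final image.
FROM jupyterlab-cpu AS jupyterlab-cpu-final
# ... finalize steps (copy 1) ...

FROM rstudio AS rstudio-final
# ... finalize steps (copy 2) ...
```

Each final image would then be selected at build time with the --target flag, for example: docker build --target rstudio-final .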

@EveningStarlight
Contributor Author

The COPY --from instruction can copy files and directories from one stage into another. This could be used to start sharing some steps with the sas and remote-desktop images and reduce code duplication.
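A minimal sketch of sharing work through COPY --from, assuming a hypothetical shared-config stage and a hypothetical ubuntu:22.04 base for remote-desktop (the real remote-desktop base is not stated in this issue). The consuming stages do not inherit from shared-config; they only pull in the files they need.

```dockerfile
ARG BASE_VERSION=2024-06-17

FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION} AS base-cpu

# Hypothetical stage that prepares files needed by several images.
FROM base-cpu AS shared-config
# The Jupyter images default to a non-root user, so switch to root to write under /opt.
USER root
RUN mkdir -p /opt/shared && echo "example setting" > /opt/shared/settings.txt

# sas keeps its own base image but reuses the prepared files.
FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION} AS sas
COPY --from=shared-config /opt/shared /opt/shared

# Hypothetical base image for remote-desktop; only the copied files are shared.
FROM ubuntu:22.04 AS remote-desktop
COPY --from=shared-config /opt/shared /opt/shared
```

Unlike chaining FROM lines, this lets images with different base images reuse the output of a common stage.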

@EveningStarlight
Contributor Author

Build arguments (ARG) used in FROM instructions must be declared at the top of the Dockerfile, before the first FROM.
The specific case is:

ARG BASE_VERSION=2024-06-17
FROM quay.io/jupyter/datascience-notebook:$BASE_VERSION

This FROM line is used for both the CPU base and SAS. As long as they always use the same base version, this works fine for now.
However, the argument can't be defined separately for each image, so if the versions need to differ, each one needs its own argument name.
Those arguments must also be declared at the top of the file, before the first FROM statement. Otherwise the variable expands to an empty string and the build fails.
See
https://github.com/StatCan/aaw-kubeflow-containers/actions/runs/12032558256/job/33544669100

 1684 |     ARG BASE_VERSION_SAS=2024-06-17
 1685 | >>> FROM quay.io/jupyter/datascience-notebook:$BASE_VERSION_SAS as sas
 1686 |     
 1687 |     USER root
--------------------
ERROR: failed to solve: failed to parse stage name "quay.io/jupyter/datascience-notebook:": invalid reference format
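A minimal sketch of the fix for this first failure, assuming both images keep using datascience-notebook: every argument that appears in a FROM line is declared before the first FROM, so it is in global scope. The stage names base-cpu and sas follow the issue.

```dockerfile
# Global scope: ARGs declared before the first FROM are visible to every FROM line.
ARG BASE_VERSION=2024-06-17
ARG BASE_VERSION_SAS=2024-06-17

FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION} AS base-cpu
# An ARG declared here would only be scoped to base-cpu and would not be
# visible to the next FROM line, which is what produced the empty tag above.

FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION_SAS} AS sas
USER root
```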

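The variable also can't be declared at the top and then redefined later: an ARG declared after a FROM only applies within that stage, so the later FROM line still uses the top-level value.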
See
https://github.com/StatCan/aaw-kubeflow-containers/actions/runs/12032666248/job/33545016945

    7 |     ARG BASE_VERSION_SAS=2024-06-17-test
  ...
 1686 |     ARG BASE_VERSION_SAS=2024-06-17
 1687 | >>> FROM quay.io/jupyter/datascience-notebook:$BASE_VERSION_SAS as sas
 1688 |     
 1689 |     USER root
--------------------
ERROR: failed to solve: quay.io/jupyter/datascience-notebook:2024-06-17-test: failed to resolve source metadata for quay.io/jupyter/datascience-notebook:2024-06-17-test: quay.io/jupyter/datascience-notebook:2024-06-17-test: not found
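This second failure happens because the top-level default is the only value a FROM line ever sees. A minimal sketch of the alternatives, assuming the sas stage from the issue: change the version with --build-arg instead of redefining the ARG further down, and, if the value is needed inside the stage, redeclare the ARG without a value to bring the global value into scope.

```dockerfile
# Override at build time rather than redefining the ARG later, e.g.:
#   docker build --build-arg BASE_VERSION_SAS=2024-06-17 --target sas .
ARG BASE_VERSION_SAS=2024-06-17

FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION_SAS} AS sas
# A global ARG is out of scope inside a stage until it is redeclared.
# Redeclaring it WITHOUT a value picks up the global default (or the
# --build-arg override). Assigning a new value here would only affect
# later instructions in this stage, never the FROM line above.
ARG BASE_VERSION_SAS
RUN echo "Built from datascience-notebook:${BASE_VERSION_SAS}"
```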

@EveningStarlight
Contributor Author

Setting DOCKER_BUILDKIT=0 breaks the simple multi-stage build approach.
With the legacy builder enabled, every stage up to the target stage is built, even stages the target does not depend on. This greatly inflates the build time.
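A small illustration of the difference, using hypothetical stage names (base, sas, jupyterlab) and comments in place of real steps. The target jupyterlab depends only on base, so BuildKit skips sas, while the legacy builder processes every stage that appears before the target in the file.

```dockerfile
# BuildKit (default):   docker build --target jupyterlab .
#   -> builds base and jupyterlab only; sas is skipped.
# Legacy builder:       DOCKER_BUILDKIT=0 docker build --target jupyterlab .
#   -> processes base, sas, then jupyterlab, in file order.
ARG BASE_VERSION=2024-06-17

FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION} AS base

FROM quay.io/jupyter/datascience-notebook:${BASE_VERSION} AS sas
# ... expensive SAS-specific steps the jupyterlab target never uses ...

FROM base AS jupyterlab
# ... JupyterLab-specific steps ...
```

With many images in a single Dockerfile, the legacy behaviour means building a late target also pays for every unrelated stage above it.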
