
Process multiple targets in single action call and support S3 backend #10

Closed
strophy opened this issue Oct 10, 2023 · 4 comments · Fixed by #25
Labels
enhancement New feature or request

Comments

strophy commented Oct 10, 2023

Hi @AkihiroSuda thank you for picking up maintenance of this important action!

We have added two features on a fork over at https://github.com/dcginfra/buildkit-cache-dance, and I wonder if you would be interested in PRs adding these features to v2 of the action, now that its use is recommended in the official Docker documentation. We have two main changes:

  • Process multiple cache mounts in a single pass by specifying an ID for each mount
  • Support AWS S3 as an alternative cache storage backend

The changes require the user's Dockerfile to be modified with cache IDs like this:

FROM ubuntu:22.04
RUN \
  --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-cache \
  --mount=type=cache,target=/var/lib/apt,sharing=locked,id=apt-lib \
  apt-get update && apt-get install -y gcc

And the action is called something like this:

- name: inject cache mounts into docker
  uses: reproducible-containers/buildkit-cache-dance@mount-id-example
  with:
    mounts: |
      apt-cache
      apt-lib

The main change is in the Dancefile, which is generated on the fly with as many mounts and copy operations as necessary. There is no need to pass the cache-source and cache-target separately anymore because the cache is identified by its unique ID instead, like this:

- name: Prepare list of cache mounts for Dancefile
  uses: actions/github-script@v6
  id: mounts
  with:
    script: |
      const mountIds = `${{ inputs.mounts }}`.split(/[\r\n,]+/)
        .map((mount) => mount.trim())
        .filter((mount) => mount.length > 0);
      
      const cacheMountArgs = mountIds.map((mount) => (
        `--mount=type=cache,sharing=shared,id=${mount},target=/cache-mounts/${mount}`
      )).join(' ');
      
      const s3commands = mountIds.map((mount) => (
        `aws s3 sync --no-follow-symlinks --quiet s3://${{inputs.bucket}}/cache-mounts/${mount} /cache-mounts/${mount}`
      )).join('\n');

      core.setOutput('cacheMountArgs', cacheMountArgs);
      core.setOutput('s3commands', s3commands);
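For reference, the parsing and flag-generation logic in the step above can be exercised outside of github-script as plain Node.js. This is a standalone sketch of the same split/trim/filter chain and `--mount` string construction; the function names are mine, not part of the action:

```javascript
// Parse a newline/comma-separated list of cache mount IDs, as the
// github-script step above does with `inputs.mounts`.
function parseMounts(raw) {
  return raw.split(/[\r\n,]+/)
    .map((mount) => mount.trim())
    .filter((mount) => mount.length > 0);
}

// Build the BuildKit cache-mount flags for the generated Dancefile.
function buildCacheMountArgs(mountIds) {
  return mountIds.map((mount) =>
    `--mount=type=cache,sharing=shared,id=${mount},target=/cache-mounts/${mount}`
  ).join(' ');
}

const ids = parseMounts('apt-cache\napt-lib\n');
console.log(ids); // [ 'apt-cache', 'apt-lib' ]
console.log(buildCacheMountArgs(ids));
```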

- name: Inject cache data into buildx context
  shell: bash
  run: |
    docker build ${{ inputs.cache-source }} --file - <<EOF
    FROM amazon/aws-cli:2.13.17
    COPY buildstamp buildstamp
    RUN ${{ steps.mounts.outputs.cacheMountArgs }} <<EOT
        echo -e '${{ steps.mounts.outputs.s3commands }}' | sh && \
        chmod 777 -R /cache-mounts || true
    EOT
    EOF

The code is currently still written in JS, and is quite tightly bound to S3 (since that is what we need) but I'd love to see features like this supported in the maintained version of the action, since there has been a lot of discussion about this (as I'm sure you're aware). Thoughts?

@AkihiroSuda AkihiroSuda added the enhancement New feature or request label Oct 10, 2023
@AkihiroSuda
Member

Thanks for the proposal, SGTM

  • How will “mounts” work with actions/cache?
  • Do we really need to execute the awscli inside Dockerfile?
  • Probably, composite actions such as github-script cannot be used: Avoid composite action #4

@strophy
Author

strophy commented Oct 10, 2023

  • I'm not sure about this, but I think we can call the GH cache API directly? The action would therefore require two inputs:
    • list of mount ids
    • (optional) cache backend (default to using GHA cache, if using S3 then bucket name is also needed)
  • Executing the cache call directly inside the Dockerfile results in a significant speedup for large caches by removing one of the copy operations, and it uses less disk space because the cache no longer needs to be stored in an intermediate step: the path cache mount -> runner local storage -> external cache becomes cache mount -> external cache directly
  • Yes, this would need to be rewritten in bash

We could probably even go a step further for point 1 and implement Apache OpenDAL as the backend, immediately adding support for a wide range of cloud storage. See https://github.com/everpcpc/actions-cache for an existing implementation of this.
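The two-input interface described in point 1 could look roughly like the following sketch. The names (`backend`, `bucket`) and the GHA-side command are hypothetical placeholders, not an existing API; only the S3 branch mirrors the `s3commands` mapping quoted above:

```javascript
// Hypothetical backend selection for the proposed action inputs:
// backend defaults to the GHA cache; 's3' additionally requires a bucket.
function buildSyncCommands(mountIds, backend = 'gha', bucket) {
  if (backend === 's3') {
    if (!bucket) throw new Error('S3 backend requires a bucket name');
    return mountIds.map((id) =>
      `aws s3 sync --no-follow-symlinks --quiet s3://${bucket}/cache-mounts/${id} /cache-mounts/${id}`
    );
  }
  // The GHA cache backend would go through the cache API rather than a CLI;
  // this placeholder string only marks which mount would be restored.
  return mountIds.map((id) => `restore-gha-cache cache-mounts/${id}`);
}

console.log(buildSyncCommands(['apt-cache'], 's3', 'my-bucket'));
```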

@AkihiroSuda
Member

AkihiroSuda commented Oct 10, 2023

OpenDAL

What about rclone?
https://github.com/rclone/rclone
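If rclone were used instead of the AWS CLI, the per-mount sync commands in the github-script step could be generated the same way as `s3commands`. This sketch assumes an rclone remote has already been configured; the remote name `s3remote` and the function name are placeholders:

```javascript
// Generate one `rclone sync` invocation per cache mount, analogous to the
// s3commands mapping quoted earlier. `remote` must be a pre-configured
// rclone remote (e.g. one backed by S3, GCS, or any other rclone backend).
function buildRcloneCommands(mountIds, remote, bucket) {
  return mountIds.map((id) =>
    `rclone sync ${remote}:${bucket}/cache-mounts/${id} /cache-mounts/${id}`
  ).join('\n');
}

console.log(buildRcloneCommands(['apt-cache', 'apt-lib'], 's3remote', 'my-bucket'));
```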

@strophy
Author

strophy commented Oct 10, 2023

rclone looks perfect!
