This repository has been archived by the owner on Nov 18, 2024. It is now read-only.

Add DockerHub -> GHCR sync workflow #11

Open · ferferga wants to merge 3 commits into master

Conversation

@ferferga (Contributor) commented Sep 8, 2021

All the CI migration and revamp is in full swing, and it will simplify publishing container images to multiple registries besides DockerHub (kudos to @h1dden-da3m0n for the huge amount of work!)

However, it likely won't be ready for 10.8 and, once implemented, none of the images previously published to DockerHub will be available in GHCR. This might frustrate some users and would look half-baked.

In order to have one more "new thing" for 10.8, and to provide an alternate way to access our software in the meantime (as GHCR is becoming quite a popular alternative), this workflow will run at midnight to check which container tags are missing or outdated in GHCR and update them from the ones present in DockerHub, skipping everything that is already up to date in GHCR. This means that all the previous versions present in DockerHub will be present in GHCR as well!
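
For reference, this is roughly the shape of trigger such a workflow implies (a hedged sketch; step names are illustrative, not necessarily the exact contents of this PR):

    on:
      schedule:
        # Run every day at midnight (UTC)
        - cron: '0 0 * * *'
      workflow_dispatch:

    jobs:
      sync:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - name: Sync missing/outdated tags from DockerHub to GHCR
            run: ./sync_image_ghcr.sh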

As soon as all the CI work is complete, this workflow will no longer be necessary and can be removed.

Needed tokens

This PR requires the following org-wide secrets to be made available in this repo:

  • JF_BOT_TOKEN: for pushing images to GHCR. Needs the read:packages, write:packages, delete:packages and read:org scopes
  • DOCKERHUB_TOKEN: needed to increase the 6-hour DockerHub pull limit from 100 to 200
  • DOCKERHUB_USERNAME: needed to increase the 6-hour DockerHub pull limit from 100 to 200

(When DockerHub limits are reached, the workflow will fail. However, as the script only pulls and pushes what is outdated in GHCR, the process will continue where it left off on the next scheduled run, after the API quota has been reset.)
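
For illustration, the authenticated logins inside the workflow would look something like this hedged sketch (the GHCR username below is an assumption, not taken from this PR):

    # Log in to DockerHub so pulls count against the authenticated
    # quota (200 per 6 hours) rather than the anonymous one (100)
    echo "${DOCKERHUB_TOKEN}" | docker login -u "${DOCKERHUB_USERNAME}" --password-stdin

    # Log in to GHCR with the bot token so images can be pushed
    # (the 'jellyfin-bot' username is illustrative)
    echo "${JF_BOT_TOKEN}" | docker login ghcr.io -u jellyfin-bot --password-stdin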

@h1dden-da3m0n left a comment


Nice one so far, left some ideas

.github/workflows/sync.yml (resolved)
sync_image_ghcr.sh (resolved)

echo "Fetching tags from DockerHub..."
while true; do
results=$(wget -q "$url" -O - | jq -r '.')
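
(For context: such a loop typically walks the public v2 tags endpoint page by page, following the next link. A hedged sketch of a plausible continuation, not the PR's exact code; the repository name is illustrative:)

    # Illustrative sketch: follow the paginated DockerHub tags
    # endpoint via its 'next' field until it is exhausted
    url="https://hub.docker.com/v2/repositories/jellyfin/jellyfin/tags?page_size=100"
    tags=""
    while true; do
        results=$(wget -q "$url" -O - | jq -r '.')
        tags="$tags $(echo "$results" | jq -r '.results[].name')"
        url=$(echo "$results" | jq -r '.next')
        if [ "$url" = "null" ]; then
            break
        fi
    done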


So this will loop at most ~100 times due to API rate limits (thank you, Docker Inc. 🙄), which would lead to the script abruptly exiting due to the if check on L30.

(at least I am fairly certain the tags endpoint counts toward that quota too)

@ferferga (Contributor, Author) commented Sep 8, 2021

Nope, this one doesn't have any quota. I didn't use any authentication while testing either 😁.

I tested this locally and it queried over 300 pages without any issues. The problem is down below, when docker pull runs 😁.

As I mentioned in the OP, this workflow will only fail during the pull or push process due to API constraints, but that's not problematic at all, as the work will be resumed on the next scheduled run.

I also expect a lot of API problems in the first weeks of this workflow running, but once GHCR is up to date it won't be a problem, as we'll only need to sync the images published in the last 24 hours.

@h1dden-da3m0n commented Sep 8, 2021

Okay, interesting. I could have sworn that any GET against the API was counting towards that cap, but this is good news if it does not 🎉

Yeah, as soon as the pull starts this will be 'FUN'!
However, on a somewhat related note: how long do we keep, or intend to keep, unstable builds available?

@ferferga (Contributor, Author) replied:

By unstable, do you mean this method of syncing, or the unstable images that we're pushing daily? AFAIK we store them locally on our infrastructure VPS for 2 days, while on DockerHub (and now GHCR, once this is merged) they've been kept since the beginning of Jellyfin! 😋

@h1dden-da3m0n commented Sep 8, 2021

I was referring to the actual images that get built, not only daily but on each push to master (web or server).

But given your answer for that ... we should consider some form of cleanup strategy for those, IMHO (regardless of the registry).

edit: maybe outside the scope of this PR so not blocking
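
(One possible direction for such a cleanup, purely as a hedged illustration: GitHub's actions/delete-package-versions action can prune old container versions on a schedule. The inputs below would need to be verified against that action's documentation:)

    # Illustrative sketch only; verify inputs against the
    # actions/delete-package-versions documentation
    - name: Prune old unstable images
      uses: actions/delete-package-versions@v1
      with:
        package-name: 'jellyfin'
        num-old-versions-to-delete: 10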

sync_image_ghcr.sh (outdated, resolved)
Comment on lines +57 to +63
docker pull $original_image:$tag
docker tag $original_image:$tag $target_repo/$original_image:$tag
docker push $target_repo/$original_image:$tag

# Delete pushed images from local system
docker image rm $original_image:$tag
docker image rm $target_repo/$original_image:$tag

@h1dden-da3m0n commented Sep 8, 2021

You could use skopeo (which is also part of the virtual environment) and have a way better time xD, as it's designed to copy & sync images from one registry to another.

e.g. (auth options not included since they should be detected)

--authfile <path>
    Path of the authentication file. Default is ${XDG_RUNTIME_DIR}/containers/auth.json, which is set using skopeo login. If the authorization state is not found there, $HOME/.docker/config.json is checked, which is set using docker login.

    skopeo copy docker://docker.io/${original_image}:${tag} docker://${target_repo}/${original_image}:${tag}

Or you could read into skopeo sync, which sounds even more interesting in this case. However, I have not used that command YET.
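
(For reference, a hedged sketch of what skopeo sync usage might look like here; the repository paths are illustrative, and when no tag is given skopeo sync copies every tag of the source repository:)

    # Illustrative: mirror all tags of a DockerHub repository into GHCR
    skopeo sync --src docker --dest docker \
        docker.io/jellyfin/jellyfin ghcr.io/jellyfin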

@ferferga (Contributor, Author) replied:

This is really interesting, had no idea about this tool (and if I had, this script wouldn't exist 😄).

I'll definitely star the repo and install skopeo on my machine, as it seems really useful, but I wouldn't bother with it for this PR: we're going to remove this as soon as all the CI work is complete, so I don't see the point in throwing away something that I know for sure works (at least on my machine; fingers crossed the GH Actions environment doesn't mess it up with permissions or whatever) for something meant for the short term.

wdyt?

@h1dden-da3m0n replied:

Yes, I too wasn't aware of skopeo for some time, until docker push started acting up in the CI I manage at work (it froze indefinitely when it encountered network issues and never timed out until the pipeline timed out).
Skopeo ended up being the solution, since it detects network issues better and times out properly. Ever since then I have really enjoyed using it for various tasks.

Yes, for this intermediate solution your approach is perfectly adequate; I just felt like mentioning skopeo as it really fits this use-case 😉


# The 'echo' command masks the environment variable
- name: Prepare environment
  run: chmod +x sync_image_ghcr.sh && echo "::add-mask::${GITHUB_TOKEN}"

@h1dden-da3m0n commented:

Just to note this too: I initially commented on the explicit masking of the secret due to its use within the external shell script, but this should usually not be necessary in regular workflows, as secrets are masked by default.
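
(For illustration, a hedged sketch of the usual pattern, where the secret is passed through env and masked automatically:)

    # Secrets exposed via the 'env' map are masked in workflow logs by
    # default, so no explicit ::add-mask:: is needed in the typical case
    - name: Run sync script
      env:
        GITHUB_TOKEN: ${{ secrets.JF_BOT_TOKEN }}
      run: ./sync_image_ghcr.sh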
