This repository has been archived by the owner on Nov 18, 2024. It is now read-only.

Add DockerHub -> GHCR sync workflow #11

Open · ferferga wants to merge 3 commits into master
.github/workflows/sync.yml — 51 additions, 0 deletions
@@ -0,0 +1,51 @@
name: Sync DockerHub images with GHCR 🔁

## This workflow is a temporary solution for keeping all our Docker images synced to GHCR. However,
## as soon as all of the repo's CI is migrated to GitHub Actions, it can be removed.
##
## Project tracking the progress: https://github.com/orgs/jellyfin/projects/31

on:
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    name: Image sync 🔁
    # Errors will most likely be caused by exceeding the API quota, so we can safely continue.
    # Any remaining work will be resumed in the next scheduled run.
    continue-on-error: true
    strategy:
      fail-fast: false
      matrix:
        image:
          - 'jellyfin/jellyfin'

    steps:
      - name: Clone repository
        uses: actions/[email protected]

      # The 'echo' command masks the environment variable in the workflow logs
      - name: Prepare environment
        run: chmod +x sync_image_ghcr.sh && echo "::add-mask::${GITHUB_TOKEN}"

Review comment:

Just to note this too: I commented on the explicit masking of the secret initially due to its use within the external shell script, but this should usually not be necessary in regular workflows, as secrets are masked by default.
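For context on that remark: GitHub masks values registered as secrets automatically, but values derived from a secret are not masked unless registered explicitly with ::add-mask::. A minimal sketch, with hypothetical variable names:

    # GITHUB_TOKEN itself is already masked when it comes from ${{ secrets.* }};
    # a value derived from it, e.g. its base64 encoding, is NOT auto-masked:
    derived=$(printf '%s' "${GITHUB_TOKEN}" | base64)
    echo "::add-mask::${derived}"   # from here on, the derived value is redacted in logs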


      ## Logging in to DockerHub allows for more pulls before hitting DockerHub's quota (200 authenticated vs. 100 anonymous).
      - name: Login to Docker Hub
        uses: docker/[email protected]
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Login to GitHub Container Registry
        uses: docker/[email protected]
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.JF_BOT_TOKEN }}

      - name: Run syncing script
        run: ./sync_image_ghcr.sh ${{ matrix.image }}
        env:
          GITHUB_TOKEN: ${{ secrets.JF_BOT_TOKEN }}
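To see the quota those logins buy, Docker documents a rate-limit preview endpoint; a quick sketch (not part of this PR, anonymous case shown):

    TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r '.token')
    curl -s --head -H "Authorization: Bearer ${TOKEN}" \
      "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" | grep -i ratelimit
    # prints headers such as 'ratelimit-limit: 100;w=21600' (pulls per 6-hour window)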
sync_image_ghcr.sh — 69 additions, 0 deletions
@@ -0,0 +1,69 @@
#!/bin/bash
set -e
# Simple script that pushes missing/outdated tags for an image from DockerHub to GHCR (the GitHub
# Container Registry), skipping tags that are already up to date. You need to be logged in with
# 'docker login' first: https://docs.github.com/es/packages/working-with-a-github-packages-registry/working-with-the-container-registry
target_repo="ghcr.io"
tag_file="tag_list.txt"
original_image="$1"
rm -f "$tag_file"
url="https://hub.docker.com/v2/repositories/${original_image}/tags"
tag_count=$(wget -q "$url" -O - | jq -r '.count')

echo "Fetching tags from DockerHub..."
while true; do
    results=$(wget -q "$url" -O - | jq -r '.')

Review comment:

So this will loop at most ~100 times due to API rate limits (thank you, Docker Inc. 🙄), which would lead to the script abruptly exiting because of the if check in L30.

(At least I am fairly certain the tags endpoint counts toward that quota too.)

ferferga (Contributor, Author) commented on Sep 8, 2021:

Nope, this one doesn't have any quota. I didn't use any authentication while testing either 😁.

I tested this locally and it queried over 300 pages without any issues. The problem is down below, when docker pull runs 😁.

As I mentioned in the OP, this workflow will fail only in the pull or push process due to API constraints, but that's not problematic at all, as the work is going to be resumed in the next scheduled run.

I also expect a lot of API problems in the first weeks of this workflow running, but once GHCR gets up to date it won't be a problem, as we will only need to sync the images that have been published in the last 24 hours.

Reply:

Okay, interesting. I could have sworn that any GET against the API counted towards that cap, but this is good news if it does not 🎉

Yeah, as soon as the pull starts this will be 'FUN'!
However, on a somewhat related note: how long do we keep, or intend to keep, unstable builds available?

ferferga (Contributor, Author) replied:

By unstable, do you mean this method of syncing, or the unstable images that we're pushing daily? AFAIK we store them locally on our infrastructure VPS for 2 days, while on DockerHub (and now GHCR, once this is merged) they've been kept since the beginning of Jellyfin! 😋

h1dden-da3m0n commented on Sep 8, 2021:

I was referring to the actual images that get built not only daily, but on each push to master (web or server).

But given your answer for that... we should consider some form of cleanup strategy for those, IMHO (irrespective of the registry).

edit: maybe outside the scope of this PR, so not blocking

    url=$(echo "$results" | jq -r '.next')
    echo "$results" | jq -r '.results[] | {name: .name, last_pushed: .tag_last_pushed, digests: [.images[].digest]}' >> "$tag_file"
    if [ "${url}" = "null" ]; then
        break
    fi
done
unset results url
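The shape of one page from the tags endpoint, and the jq filter above applied to it (a trimmed, hypothetical payload; real pages carry many more fields):

    page='{"count": 1, "next": null, "previous": null,
      "results": [{"name": "10.7.7", "tag_last_pushed": "2021-09-08T00:00:00Z",
                   "images": [{"digest": "sha256:aaaa..."}, {"digest": "sha256:bbbb..."}]}]}'
    echo "$page" | jq -r '.results[] | {name: .name, last_pushed: .tag_last_pushed, digests: [.images[].digest]}'
    # emits one {name, last_pushed, digests} object per tag — the lines appended to the tag file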

sorted=$(jq -s 'sort_by(.last_pushed)' "$tag_file")
echo "$sorted" > "$tag_file"
file_tag_count=$(jq length "$tag_file")

if [ "$tag_count" = "$file_tag_count" ]; then
    echo -e "All the data was retrieved correctly. Pushing missing/modified tags to ${target_repo}...\n"
else
    echo "The retrieved data doesn't match the number of tags reported by the Docker API. Exiting script..."
    exit 1
fi

unset sorted file_tag_count tag_count

## The token that GitHub provides here grants read-only access to the registry, so users are able to
## use GHCR without signing up to GitHub. By using this token to check for already-published images,
## we don't consume our own API quota.
dest_token=$(wget -q "https://${target_repo}/token?scope=repository:${original_image}:pull" -O - | jq -r '.token')
tag_names=$(jq -r '.[] | .name' "$tag_file")
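What that anonymous token grants can be checked against the standard OCI distribution API; a hypothetical one-off, reusing the script's own variables (assumes the package is public):

    tok=$(curl -s "https://${target_repo}/token?scope=repository:${original_image}:pull" | jq -r '.token')
    curl -s -H "Authorization: Bearer ${tok}" \
      "https://${target_repo}/v2/${original_image}/tags/list" | jq '.tags | length'
    # counts the tags GHCR already knows about, without touching our authenticated API quota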

while read -r tag; do
    # Digests that DockerHub reports for this tag, sorted for a stable comparison
    source_digests=$(jq -r --arg TAG_NAME "$tag" '.[] | select(.name == $TAG_NAME) | .digests | sort | .[]' "$tag_file" | cat)
    # Manifest list currently published on GHCR for the same tag. The trailing '| cat' swallows
    # non-zero exit codes (e.g. a tag that doesn't exist on GHCR yet), so 'set -e' doesn't abort.
    target_manifest=$(wget --header="Authorization: Bearer ${dest_token}" -q "https://${target_repo}/v2/${original_image}/manifests/${tag}" -O - | cat)
    target_digests=$(echo "$target_manifest" | jq '.manifests | .[] | .digest' | jq -s '. | sort' | jq -r '.[]' | cat)
    if [ "$source_digests" = "$target_digests" ]; then
        echo "The tag ${tag} is fully up to date in ${target_repo}"
        continue
    else
        echo "Updating ${tag} in ${target_repo}"
        docker pull "${original_image}:${tag}"
        docker tag "${original_image}:${tag}" "${target_repo}/${original_image}:${tag}"
        docker push "${target_repo}/${original_image}:${tag}"

        # Delete the pushed images from the local system to save disk space
        docker image rm "${original_image}:${tag}"
        docker image rm "${target_repo}/${original_image}:${tag}"
Review comment on lines +57 to +63 (h1dden-da3m0n, Sep 8, 2021):

You could use skopeo (which is also part of the virt env) and have a way better time xD (as it's designed to 'copy & sync' images from one registry to another)

e.g. (auth options not included, since they should be detected):

--authfile <path>
    Path of the authentication file. Default is ${XDG_RUNTIME_DIR}/containers/auth.json, which is set using skopeo login. If the authorization state is not found there, $HOME/.docker/config.json is checked, which is set using docker login.

skopeo copy docker://docker.io/${original_image}:${tag} docker://${target_repo}/${original_image}:${tag}

Or you could read into skopeo sync, which sounds even more interesting in this case. However, I have not used that command YET.

ferferga (Contributor, Author) replied:

This is really interesting; I had no idea about this tool (and if I had, this script wouldn't exist 😄).

I'll definitely star the repo and install skopeo on my machine, as it seems really useful, but I wouldn't bother with it for this PR: we're going to remove this as soon as we have all the CI work completed, so I don't see the point in throwing away something that I know for sure works (at least on my machine; fingers crossed that the GH Actions environment doesn't mess it up with permissions or whatever) for something meant for the short term.

wdyt?

h1dden-da3m0n replied:

Yes, I too wasn't aware of skopeo for some time, until docker push started to act up in the CI I manage at work (it froze indefinitely when it encountered network issues and never timed out until the pipeline timed out). Skopeo ended up being the solution, since it detects network issues better and times out properly. Ever since then I have really enjoyed using it for various tasks.

Yes, for this intermediate solution your approach is perfectly adequate; I just felt like mentioning skopeo as it really fits this use-case 😉

    fi
done <<< "$tag_names"
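What the digest comparison consumes on the GHCR side, sketched with a trimmed, hypothetical manifest list (the standard Docker manifest-list format):

    manifest='{"schemaVersion": 2,
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "manifests": [
        {"digest": "sha256:bbbb...", "platform": {"architecture": "arm64", "os": "linux"}},
        {"digest": "sha256:aaaa...", "platform": {"architecture": "amd64", "os": "linux"}}
      ]}'
    echo "$manifest" | jq '.manifests | .[] | .digest' | jq -s '. | sort' | jq -r '.[]'
    # prints the digests sorted, i.e. the same normal form as 'source_digests' from DockerHub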

rm -f "$tag_file"
echo -e "\nAll the tags have been updated successfully"
exit 0
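As a footnote to the skopeo discussion above: the whole per-tag loop could likely collapse into a single command. A hedged sketch, untested here, with the syntax taken from skopeo's documentation (the destination prefix layout is worth verifying; 'jellyfin' is assumed as the target namespace):

    # Log in once; skopeo also falls back to ~/.docker/config.json from 'docker login'
    skopeo login ghcr.io
    # Copy every tag of the DockerHub repository to GHCR in one go
    skopeo sync --src docker --dest docker "docker.io/${original_image}" "ghcr.io/jellyfin"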