
Deploy releases by GitHub actions #538

Open
wants to merge 17 commits into base: main

Conversation

ITViking (Contributor) commented Dec 2, 2024

What does this PR do?

This is a test.
What we hope to achieve with this in its full form is a safer, more stable deployment that is much less manual than it has been hitherto.
This PR tests the viability of having a merged PR trigger a deployment to Lagoon.

Should this be tested by the reviewer and how?

Read it through.
If you want to test it, I highly recommend that you install [act](https://github.com/nektos/act).
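
For reference, a rough sketch of how the workflow could be exercised locally with act; the workflow file path below is an assumption, not something named in this PR:

# Run the manually triggered workflow locally with act.
# The workflow path is an assumption; point it at the file added in this PR.
act workflow_dispatch -W .github/workflows/deploy-release.yml

# Dry run: list the jobs act would execute without running them
act workflow_dispatch -W .github/workflows/deploy-release.yml --list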

Any specific requests for how the PR should be reviewed?

What are the relevant tickets?

https://reload.atlassian.net/jira/software/c/projects/DDFDRIFT/boards/464?selectedIssue=DDFDRIFT-264

ath88 (Contributor) commented Dec 3, 2024

Does updating the commit deploy a new version?

- name: Check if the dpl-cms-release version changed
  id: check-version
  run: |
    CURRENT_VERSION=$(sed -n 's/^FROM ghcr.io\/danskernesdigitalebibliotek\/dpl-cms-source:\([^ ]*\) .*/\1/p' env-canary/lagoon/cli.dockerfile)
Contributor:

Could something like dockerfile-json and jq be a more readable solution here?

https://github.com/keilerkonzept/dockerfile-json
https://jqlang.github.io/jq/
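
For illustration, a hedged sketch of that suggestion; the .Stages[0].From.Image path is an assumption about dockerfile-json's output shape and should be verified:

# Sketch: extract the base image tag with dockerfile-json + jq instead of sed.
# The JSON path below is an assumption about dockerfile-json's output format.
CURRENT_VERSION=$(dockerfile-json env-canary/lagoon/cli.dockerfile \
  | jq -r '.Stages[0].From.Image' \
  | cut -d: -f2)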

Contributor (Author):

Hmmm.... That could be an option. Do you dislike sed?

Contributor:

sed is fine - it's just that only people who have worked with it extensively understand the syntax. I guess I was inspired by the talk at KCD to read Dockerfiles as JSON/YAML. :)

Contributor (Author):

Fair, then I would rather use yq instead :) I don't know if we will have to use awk, but let's see.

Contributor (Author):

OK. It became a really long line when used, so I'd rather stick with what I've got for now. Can this be approved?

ITViking (Author) commented Dec 3, 2024

I'm not sure I entirely understand the question?
Right now it has to be run manually, but the idea is to have it run on a merge.
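
For context, a minimal sketch of what a merge-triggered setup could look like, keeping the manual trigger alongside a push trigger on main; this is an assumption about the intended end state, not part of the PR as it stands:

on:
  workflow_dispatch: # Keep the manual trigger
  push:
    branches:
      - main # Run when a PR is merged into main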

kasperg (Contributor) left a comment

I appreciate the effort here, but I find this somewhat hard to review because I do not know what thought process went into this suggestion. With this in mind I will just leave my thoughts as a rather long comment.

I am unsure about what part of the deployment process you want to automate.

Some candidates I see:

  1. Updating sites.yaml
  2. Deploying to affected sites (task sites:sync) and running associated tasks (setting mode, adjusting resource requests etc.)
  3. More?

Personally I think updating sites.yaml can be tricky to get entirely right. Perhaps it would be possible to create GitHub workflows which map to the runbooks we already have established.

Running task sites:sync is time-consuming, but as I see it that is primarily because you have to wait for it to finish. Running it in GitHub Actions allows you to detach from the process locally. We know from experience that once deployments have kicked off, many things can happen, and for the moment we cannot be certain that we are ready to reset the cluster state once the first round of deployments has been started.

This PR lands somewhere between all this in a manner that I do not understand. For one thing it seems to bypass the work that task sites:sync currently handles in maintaining the GitHub repo for the individual library site.

If I were to look into automating deployments I would start out by reusing dplsh to effectuate changes from sites.yaml to the corresponding GitHub repos. This could be in several forms:

  • Running task sites:sync in one process to have a central log for it.
  • Running SITE=site task site:full-sync in a matrix with one process per site in sites.yaml (see the sketch after this list). Then the process could watch for the deployment to complete and report status. Bear in mind that we still have limited parallelization in Lagoon at the moment, so some processes would need to wait quite a while.
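
A rough sketch of that matrix variant; the sites.yaml structure (a top-level sites map) and the availability of yq, task and dplsh tooling on the runner are assumptions:

jobs:
  list-sites:
    runs-on: ubuntu-latest
    outputs:
      sites: ${{ steps.sites.outputs.sites }}
    steps:
      - uses: actions/checkout@v4
      # Assumes sites.yaml has a top-level "sites" map; emit its keys as a JSON array.
      - id: sites
        run: echo "sites=$(yq -o=json -I0 '.sites | keys' sites.yaml)" >> "$GITHUB_OUTPUT"

  full-sync:
    needs: list-sites
    runs-on: ubuntu-latest
    strategy:
      matrix:
        site: ${{ fromJson(needs.list-sites.outputs.sites) }}
      max-parallel: 2 # Lagoon parallelization is limited, per the comment above
    steps:
      - uses: actions/checkout@v4
      # Assumes task (go-task) and any required dplsh tooling are available to the runner.
      - run: SITE=${{ matrix.site }} task site:full-sync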

My primary caveat around automating deployments is that it often assumes stable deployments. That is currently not something we have. I think we should either focus our efforts there or ensure that we have instability in mind when designing the automation.

Comment on lines 3 to 4
on:
  workflow_dispatch: # Allows manual trigger
Contributor:

Consider making the target version an input to the workflow instead of reading it from sites.yaml.
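
A sketch of that suggestion; the input name, description and type are assumptions:

on:
  workflow_dispatch:
    inputs:
      version:
        description: "dpl-cms-source version to deploy" # assumed input name and wording
        required: true
        type: string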

ITViking (Author) commented Dec 3, 2024

@kasperg I've been making other efforts towards a more stable and secure deployment of releases these last few weeks, e.g. installing vertical pod autoscalers for site-related, and some Lagoon-related, workloads. Right now they are running in "suggestion mode", so in practice they don't do anything until I turn them on, but once turned on they remove the need for manually adjusting pod resources.

This addition will, for starters, replace task sites:sync and site:sync, which are completely manual and prone to human error, as we have seen demonstrated more than once, in various forms. This will bring us to a new and safer release procedure, which will require a review and approval, but which will also be much easier for e.g. other developers to participate in.

In a short while I will figure out how to rerun failed deployments, so the process is fully automated.

That will leave the syncing of moduletest with production, which should stay in dplsh, because it is quite harmless. It is also not something we run on every single deployment.

I hope this clarifies the path I'm currently on.

kasperg (Contributor) commented Dec 3, 2024

@ITViking

I've been making other efforts towards a more stable and secure deployment of releases these last few weeks [...] In a short while I will figure out how to rerun failed deployments, so the process is fully automated.

Sounds awesome! I am very much looking forward to seeing this in action.

This addition will, for starters, replace task sites:sync and site:sync, which are completely manual and prone to human error, as we have seen demonstrated more than once, in various forms.

Personally I do not see the two specific tasks as manual or error-prone. We usually call them with few (one?) arguments and they seldom fail. However, our runbooks and experience with deploying show that we currently have a lengthy manual process around them, with other failing parts, in order to complete a deployment successfully.

Deployment as currently suggested in this PR (pushing an updated tag to a Dockerfile in a Git repo) only handles part of the responsibility of task site:sync. I worry that over time this will simply recreate the task in GitHub workflow scripts.

The current combination of task and shell scripts for sites:sync may not be optimal, but my suggestion would be to start out by recreating our runbook in a GitHub Workflow, run the existing tasks from dplsh as a part of this and work our way out from there.

With the increased stability, autoscaling, automated rerun of failed deployments etc. the resulting runbook (and thus workflow) should be significantly simpler.

That will leave the syncing of moduletest with production, which should stay in dplsh, because it is quite harmless.

As I see it we could also run this from a manually triggered GitHub workflow.

ITViking (Author) commented Dec 3, 2024

The thing left out of the intended goal of deploying a new release via GitHub Actions is everything surrounding the Lagoon-related files.
We should fix that too, because the way we do it right now is what requires us to first deploy and then make a manual change to kobenhavn-main's docker-compose file, as the current way of creating the "profile-template" completely replaces the existing branch with what we're bringing from our local machines. If that got merged instead, we wouldn't have the same problem.
