
Deploy releases by GitHub actions #538

Open
wants to merge 17 commits into base: main

Conversation

ITViking (Contributor) commented Dec 2, 2024

What does this PR do?

This is a test.
What we hope to achieve with this in its full form is a safer, more stable deployment that is much less manual than it has been hitherto.
This PR tests the viability of having a merged PR trigger a deployment to Lagoon.

Should this be tested by the reviewer and how?

Read it through.
If you want to test it, I highly recommend that you install [act](https://github.com/nektos/act).
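
For reference, a rough sketch of how the workflow could be exercised locally with act; the workflow file path below is an assumption, not something named in this PR:

# Run the manually triggered workflow locally with act.
# The workflow path is an assumption; point it at the file added in this PR.
act workflow_dispatch -W .github/workflows/deploy-release.yml

# Dry run: list the jobs act would execute without running them
act workflow_dispatch -W .github/workflows/deploy-release.yml --list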

Any specific requests for how the PR should be reviewed?

What are the relevant tickets?

https://reload.atlassian.net/jira/software/c/projects/DDFDRIFT/boards/464?selectedIssue=DDFDRIFT-264

ath88 (Contributor) commented Dec 3, 2024

Does updating the commit deploy a new version?

- name: Check if the dpl-cms-release version changed
  id: check-version
  run: |
    CURRENT_VERSION=$(sed -n 's/^FROM ghcr.io\/danskernesdigitalebibliotek\/dpl-cms-source:\([^ ]*\) .*/\1/p' env-canary/lagoon/cli.dockerfile)
Contributor:

Could something like dockerfile-json and jq be a more readable solution here?

https://github.com/keilerkonzept/dockerfile-json
https://jqlang.github.io/jq/
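
For illustration, a hedged sketch of that suggestion; the .Stages[0].From.Image path is an assumption about dockerfile-json's output shape and should be verified:

# Sketch: extract the base image tag with dockerfile-json + jq instead of sed.
# The JSON path below is an assumption about dockerfile-json's output format.
CURRENT_VERSION=$(dockerfile-json env-canary/lagoon/cli.dockerfile \
  | jq -r '.Stages[0].From.Image' \
  | cut -d: -f2)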

Contributor (Author):

Hmmm.... That could be an option. Do you dislike sed?

Contributor:

sed is fine - it's just that only people who have worked with it extensively understand the syntax. I guess I was inspired by the talk at KCD to read Dockerfiles as JSON/YAML. :)

Contributor (Author):

Fair, then I would rather use yq instead :) I don't know if we will have to use awk, but let's see.

Contributor (Author):

OK. It became a really long line when used, so I'd rather stick with what I've got for now. Can this be approved?

ITViking (Author) commented Dec 3, 2024

I'm not sure I entirely understand the question?
Right now it has to be run manually, but the idea is to have it run on a merge.
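
For context, a minimal sketch of what a merge-triggered setup could look like, keeping the manual trigger alongside a push trigger on main; this is an assumption about the intended end state, not part of the PR as it stands:

on:
  workflow_dispatch: # Keep the manual trigger
  push:
    branches:
      - main # Run when a PR is merged into main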

kasperg (Contributor) left a comment

I appreciate the effort here, but I find this somewhat hard to review because I do not know what thought process went into this suggestion. With this in mind I will just leave my thoughts as a rather long comment.

I am unsure about what part of the deployment process you want to automate.

Some candidates I see:

  1. Updating sites.yaml
  2. Deploying to affected sites (task sites:sync) and running associated tasks (setting mode, adjusting resource requests etc.)
  3. More?

Personally I think updating sites.yaml can be tricky to get entirely right. Perhaps it would be possible to create GitHub workflows which map to the runbooks we already have established.

Running task sites:sync is time-consuming, but as I see it that is primarily because you have to wait for it to finish. Running it in GitHub Actions allows you to detach from the process locally. We know from experience that once deployments have kicked off, many things can happen, and for the moment we cannot be certain that we are ready to reset the cluster state once the first round of deployments has been started.

This PR lands somewhere between all this in a manner that I do not understand. For one thing it seems to bypass the work that task sites:sync currently handles in maintaining the GitHub repo for the individual library site.

If I were to look into automating deployments I would start out by reusing dplsh to effectuate changes from sites.yaml to the corresponding GitHub repos. This could be in several forms:

  • Running task sites:sync in one process to have a central log for it.
  • Running SITE=site task site:full-sync in a matrix with one process per site in sites.yaml (see the sketch after this list). Then the process could watch for the deployment to complete and report status. Bear in mind that we still have limited parallelization in Lagoon at the moment, so some processes would need to wait quite a while.
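
A rough sketch of that matrix variant; the sites.yaml structure (a top-level sites map) and the availability of yq, task and dplsh tooling on the runner are assumptions:

jobs:
  list-sites:
    runs-on: ubuntu-latest
    outputs:
      sites: ${{ steps.sites.outputs.sites }}
    steps:
      - uses: actions/checkout@v4
      # Assumes sites.yaml has a top-level "sites" map; emit its keys as a JSON array.
      - id: sites
        run: echo "sites=$(yq -o=json -I0 '.sites | keys' sites.yaml)" >> "$GITHUB_OUTPUT"

  full-sync:
    needs: list-sites
    runs-on: ubuntu-latest
    strategy:
      matrix:
        site: ${{ fromJson(needs.list-sites.outputs.sites) }}
      max-parallel: 2 # Lagoon parallelization is limited, per the comment above
    steps:
      - uses: actions/checkout@v4
      # Assumes task (go-task) and any required dplsh tooling are available to the runner.
      - run: SITE=${{ matrix.site }} task site:full-sync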

My primary caveat around automating deployments is that it often assumes stable deployments. That is currently not something we have. I think we should either focus our efforts there or ensure that we have instability in mind when designing the automation.

Comment on lines 3 to 4
on:
  workflow_dispatch: # Allows manual trigger
Contributor:

Consider making the target version an input to the workflow instead of reading it from sites.yaml.
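
A sketch of that suggestion; the input name, description and type are assumptions:

on:
  workflow_dispatch:
    inputs:
      version:
        description: "dpl-cms-source version to deploy" # assumed input name and wording
        required: true
        type: string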

ITViking (Author) commented Dec 3, 2024

@kasperg I've been making other efforts towards a more stable and secure deployment of releases these last few weeks, e.g. installing vertical pod autoscalers for site-related, and some Lagoon-related, workloads. Right now they are running in "suggestion mode", so in practice they don't do anything until I turn them on, but once turned on they remove the need for manually adjusting pod resources.

This addition will, for starters, replace task sites:sync and site:sync, which are completely manual and prone to human error, as we have seen demonstrated more than once, in various forms. This will bring us to a new and safer release procedure, which will require a review and approval, but which will also be much easier for e.g. other developers to participate in.

In a short while I will figure out how to rerun failed deployments, so the process is fully automated.

That will leave the syncing of moduletest with production, which should stay in dplsh, because it is quite harmless. It is also not something we run on every single deployment.

I hope this clarifies the path I'm currently on.

kasperg (Contributor) commented Dec 3, 2024

@ITViking

I've been making other efforts towards a more stable and secure deployment of releases these last few weeks [...] In a short while I will figure out how to rerun failed deployments, so the process is fully automated.

Sounds awesome! I am very much looking forward to seeing this in action.

This addition will, for starters, replace task sites:sync and site:sync, which are completely manual and prone to human error, as we have seen demonstrated more than once, in various forms.

Personally I do not see the two specific tasks as manual or error-prone. We usually call them with few (one?) arguments and they seldom fail. However, our runbooks and experience with deploying show that we currently have a lengthy manual process around them, with other failing parts, in order to complete a deployment successfully.

Deployment as currently suggested in this PR (pushing an updated tag to a Dockerfile in a Git repo) only handles part of the responsibility of task site:sync. I worry that over time this will simply recreate the task in GitHub workflow scripts.

The current combination of task and shell scripts for sites:sync may not be optimal, but my suggestion would be to start out by recreating our runbook in a GitHub Workflow, run the existing tasks from dplsh as a part of this and work our way out from there.

With the increased stability, autoscaling, automated rerun of failed deployments etc. the resulting runbook (and thus workflow) should be significantly simpler.

That will leave the syncing of moduletest with production, which should stay in dplsh, because it is quite harmless.

As I see it we could also run this from a manually triggered GitHub workflow.

ITViking (Author) commented Dec 3, 2024

The thing left out of the intended goal of deploying a new release via GitHub Actions is everything surrounding the Lagoon-related files.
We should fix that too, because the way we do it right now is what requires us to first deploy and then make a manual change to kobenhavn-main's docker-compose file, as the current way of creating the "profile-template" completely replaces the existing branch with what we're bringing from our local machines. If that got merged instead, we wouldn't have the same problem.
