-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental version checks during migration #2399
base: main
Are you sure you want to change the base?
Conversation
c3dca19
to
21653d6
Compare
What is the motivation for keeping the phase of prepare in-cluster? Isn't it just critical to track the apply phase? |
Seemed a natural starting point for these checks.
Yes. |
#2315 (comment) is probably where I got that from, to track the whole upgrade process from the start. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM!
Regarding the tracking of the prepare phase, I like that we are able to make sure that the config has been properly updated and get rid of the ck8sVersion
from the config on the other hand if there is no Kubernetes cluster to verify against then we still have to rely on something external.
No strong opinion here though. It doesn't hurt but might add some extra complexity.
bin/upgrade.bash
Outdated
prepared_version="$(get_prepared_version sc || true)" | ||
if [ -z "${prepared_version}" ]; then | ||
log_fatal "'prepare' step does not appear to have been run, do so first" | ||
fi | ||
|
||
if [[ "${prepared_version}" != "${CK8S_TARGET_VERSION}" ]]; then | ||
log_fatal "'prepare' step in sc appears to have been run for version ${prepared_version}, not ${CK8S_TARGET_VERSION}" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Extract into a separate function and reuse for both sc and wc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3162bf7
to
53ac173
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still approved after removing the prepare phase tracking and keep relying on ck8sVersion
in the config file. However, I still do think that getting rid of the ck8sVersion
config value and only rely on the apps-meta
config map would have been neat.
But if it is in cluster it doesn't say anything, as the config isn't attached to it. Then you must apply some sort of checksumming to assert that it is that config you mean. |
bin/upgrade.bash
Outdated
# Create a configmap. This fails if done twice. | ||
if [[ "${CK8S_CLUSTER:-}" =~ ^(sc|both)$ ]]; then | ||
if ! record_migration_begin sc; then | ||
log_fatal "prepare already started in sc (kubectl delete -n kube-system configmap apps-upgrade to try again)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log_fatal "prepare already started in sc (kubectl delete -n kube-system configmap apps-upgrade to try again)" | |
log_fatal "apply already started in sc (kubectl delete -n kube-system configmap apps-upgrade to try again)" |
Also, would it make sense to add a ck8s upgrade <prefix> unlock-apply
to do this?
(With some warning+confirm that you probably shouldn't unless you know what you are doing.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a thing with --force
, would a separate sub-command be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, because else some people might just always run with --force
and that would defeat the purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a ck8s reset-upgrade
command to bypass some of the argument related logic that ck8s upgrade ...
does. How's it look?
Yes you're right, the inherent issue of working with distributed configuration files still applies. At least now we can make sure that the cluster has been migrated to a new version which is a big win. I think my main concern with the |
bin/ck8s
Outdated
version) | ||
shift | ||
case "${1:-both}" in | ||
both) | ||
log_info "sc version:" "$(get_version "sc")" | ||
log_info "wc version:" "$(get_version "wc")" | ||
;; | ||
sc | wc) | ||
log_info "${1} version:" "$(get_version "${1}")" | ||
;; | ||
esac | ||
;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have this command. I think it could be even better if it also printed the version in the config, so you can easily see if they are the same. Does that sound good to you as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
Should just be one version in defaults/common-config.yaml
, or are there cases where wc and sc may differ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the only logical place to override the version is in common. It might be technically possible to have it in sc or wc, but that is not intended.
So just comparing with common should be fine, but doing the regular merge of sc with common and wc with common might be safest.
scripts/migration/lib.sh
Outdated
cluster_version="$(get_apps_version "${1}" >/dev/null 2>&1 || true)" | ||
if [ -n "${cluster_version}" ]; then | ||
log_info "Currently running ${cluster_version} according to cluster" | ||
if [ "${common_override}" != "null" ] && [ "${cluster_version%.*}" != "${common_override}" ]; then | ||
log_warn "Version mismatch, cluster ${cluster_version}, config ${common_override}" | ||
fi | ||
fi | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Could you elaborate on why we have this block? If I understand it correctly then it will warn if there is a version override in the common config and that is not the same as the applied version. But it will not exit and is not part of a prompt, so it would just be informational when looking at what is happening.
Could it perhaps make sense to merge this check into the prompt above it? Then you would still get this information, but it can be used as more context to decide if you want to abort or not. What do you think of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, probably missed that it's the override config it checks against. The intent was probably to check the pre-upgrade config version (defaults/common-config.yaml → .global.ck8sVersion
) against the in-cluster version (if available). E.g. if you try to upgrade to v0.43
and your config says v0.42
but your cluster says v0.41
or similar. This gets complicated and hard to think about since the config is changed by ck8s init
.
Maybe just drop this, or what do you think is the way forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the best way forward is.
Merging it with the block above might still be best, so that you get a chance to abort. Because I think that there is some value to having this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the block above the abort check but removed the mismatch check since it would be comparing the to-be-upgraded-to version recorded by ck8s init
to the currently running version, which wasn't the intent and it would always mismatch unless try to migrate a second time.
If that is the concern then we should change how |
This is meant to atomically record the state transition into a migration (by creation) and then out of it (by deletion). The general apps version is recorded at the end, and compared to the config version on next upgrade.
This reverts commit 53ac173. Decided to go with the way where it starts tracking at the prepare stage.
should apply have one too?
This reverts commit b90a9b3.
separate command to discourage always running with --force
going trough ck8s upgrade invokes a number of behaviors that are not needed
one of the config checks were wrong, better to only have one
it's going to mismatch every you upgrade because it compares the version you have with the version you are upgrading to since ck8s init changes it
608960d
to
4150a95
Compare
Would it be enough to add something here that checks in the apply steps that |
Or just move the update of ck8sVersion to be done as a final step when upgrade prepare completes. |
bin/ck8s
Outdated
@@ -35,7 +35,9 @@ usage() { | |||
echo " update-ips <wc|sc|both> <apply|dry-run> Automatically fetches and applies the IPs for network policies" 1>&2 | |||
echo " upgrade <wc|sc|both> <vX.Y> apply runs all apply steps upgrading the environment" 1>&2 | |||
echo " upgrade <wc|sc|both> <vX.Y> prepare runs all prepare steps upgrading the configuration" 1>&2 | |||
echo " unlock-upgrade <wc|sc|both> reset a migration to allow trying again" 1>&2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, could this be upgrade-unlock
instead? Makes similar commands stay together.
Or perhaps upgrade <wc|sc|both> unlock
to be more similar in structure to the rest of the upgrade commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, could this be
upgrade-unlock
instead?
Sure
Or perhaps
upgrade <wc|sc|both> unlock
Finding a good way to bypass the existing path that requires a version number was a bit awkward.
[[ "${2}" =~ ^(wc|sc|both)$ ]] || usage | ||
export CK8S_CLUSTER="${2}" | ||
"${here}/upgrade.bash" "unlock" | ||
;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo " upgrade <wc|sc|both> unlock reset a migration to allow trying again" 1>&2
####
upgrade)
[[ "${2}" =~ ^(wc|sc|both)$ ]] || usage
if [[ "${3}" == "unlock" ]]; then
export CK8S_CLUSTER="${2}"
"${here}/upgrade.bash" "unlock"
exit 0
fi
[[ "${3}" =~ ^(v[0-9]+\.[0-9]+)$ ]] || usage
[[ "${4}" =~ ^(prepare|apply)$ ]] || usage
check_tools
export CK8S_CLUSTER="${2}"
"${here}/upgrade.bash" "${3}" "${4}"
;;
I guess this would work, tho feels a bit awkward to jump out in the middle here. @aarnq ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on how you write it, you don't have to jump in the middle.
[[ "${2}" =~ ^(wc|sc|both)$ ]] || usage
[[ "${3}" =~ ^(v[0-9]+\.[0-9]+|unlock)$ ]] || usage
if [[ "${3}" != "unlock" ]]; then
[[ "${4}" =~ ^(prepare|apply)$ ]] || usage
fi
check_tools
export CK8S_CLUSTER="${2}"
"${here}/upgrade.bash" "${3}" "${4:-}"
Warning
This is a public repository, ensure not to disclose:
What kind of PR is this?
Required: Mark one of the following that is applicable:
Optional: Mark one or more of the following that are applicable:
Important
Breaking changes should be marked
kind/admin-change
orkind/dev-change
depending on typeCritical security fixes should be marked with
kind/security
What does this PR do / why do we need this PR?
ck8s upgrade prepare
is run beforeck8s upgrade apply
ck8s upgrade apply
is run beforeck8s apply
TODOThis uses a pair of ConfigMaps to keep track of progress during apps upgrades
in order to enforce order (prepare first, then apply, and for the same version)and atomicity
(only allow prepare once)per step.apps-meta
stores the current version of apps.data.version
the version number itselfapps-upgrade
stores details about an in-progress upgrade, and is deleted to signify a completed migration.data.prepare
records thatck8s upgrade ... prepare
has been started.data.prepared
records thatck8s upgrade ... prepare
has completed.data.prepared
records thatck8s upgrade ... apply
has started.data.last_step
records the last snippet run byck8s upgrade ...
Information to reviewers
This is intended to behave as a state machine, going from start
to preparetoapply (for each step) to done, at which point the upgraded-to version is
recorded.
How to test: I applied the changes on top of the
release-0.42
branch and upgraded from 0.41.x to 0.42.1, then again to 0.43 by creating a local tag (not to be published).Checklist