Add a blog post to summarize the issue in 3.5 to 3.6 upgrade #974

Merged
merged 1 commit into from
Mar 28, 2025

ahrtr
Member

@ahrtr ahrtr commented Mar 25, 2025

The blog is to summarize the upgrade issue etcd-io/etcd#19557. We need to broadcast the blog post to increase awareness, so that users take action before upgrading to v3.6.0[-rc.x].

cc @fuweid @ivanvc @jberkus @jmhbnz @serathius @siyuanfoundation @spzala @wenjiaswe

@ahrtr
Member Author

ahrtr commented Mar 25, 2025

also cc @neolit123

@serathius
Member

Should users care at all if v3.6 will come with a complete fix and no user action is needed? Or maybe I didn't understand etcd-io/etcd#19636.

@ahrtr
Member Author

ahrtr commented Mar 25, 2025

Should users care at all if v3.6 will come with a complete fix and no user action is needed? Or maybe I didn't understand etcd-io/etcd#19636.

It answers the question "What happens if users still upgrade directly from etcd v3.5.1-v3.5.19 to v3.6.0?".

@ahrtr
Member Author

ahrtr commented Mar 25, 2025

Should users care at all if v3.6 will come with a complete fix and no user action is needed? Or maybe I didn't understand etcd-io/etcd#19636.

It answers the question "What happens if users still upgrade directly from etcd v3.5.1-v3.5.19 to v3.6.0?".

Also it's part of the fixes. The summary should be complete.

All the links are just for experienced users' reference. I don't think normal users will understand the details of the PRs without etcd expertise. That's exactly why we need to ensure the blog is clear and easy to understand.

Also, we have a "TL; DR" section for entry-level users or anyone who doesn't care about the details.

@serathius
Member

It answers the question "What happens if users still upgrade directly from etcd v3.5.1-v3.5.19 to v3.6.0?".

With etcd-io/etcd#19636 would it just work?

@ahrtr
Member Author

ahrtr commented Mar 25, 2025

It answers the question "What happens if users still upgrade directly from etcd v3.5.1-v3.5.19 to v3.6.0?".

With etcd-io/etcd#19636 would it just work?

Obviously it won't work. Please see the first case described under "If the etcd cluster has already been affected by the issue, there are two possible outcomes".

@ahrtr
Member Author

ahrtr commented Mar 25, 2025

It answers the question "What happens if users still upgrade directly from etcd v3.5.1-v3.5.19 to v3.6.0?".

With etcd-io/etcd#19636 would it just work?

Obviously it won't work. Please see the first case described under "If the etcd cluster has already been affected by the issue, there are two possible outcomes".

To be clearer, etcd-io/etcd#19636 only addresses the second of the two possible cases that arise if users still upgrade from 3.5.1-3.5.19 to 3.6.0.

Contributor

@jberkus jberkus left a comment

One tiny grammatical change, but I also think we really should change the title to be more attention-grabbing.

Member

@spzala spzala left a comment

Thanks @ahrtr !! I have added a few small comments inline.

@ahrtr ahrtr force-pushed the upgrade_issue_20250325 branch from 55613bc to 8701d69 Compare March 25, 2025 20:42
@ahrtr
Member Author

ahrtr commented Mar 25, 2025

Thanks both @jberkus and @spzala for the review!

Basically addressed all comments, PTAL, thx

@ahrtr ahrtr force-pushed the upgrade_issue_20250325 branch 3 times, most recently from 942757c to 8db3f37 Compare March 25, 2025 21:06
Member

@spzala spzala left a comment

Thanks for quickly addressing the review comments @ahrtr !!

@siyuanfoundation
Contributor

With etcd-io/etcd#19636 would it just work?

etcd-io/etcd#19636 would just work if the v2store snapshot is saved. If not, we have to assume the v3store is correct, and prevent too many learners.

@serathius
Member

serathius commented Mar 26, 2025

It answers the question "What happens if users still upgrade directly from etcd v3.5.1-v3.5.19 to v3.6.0?".

With etcd-io/etcd#19636 would it just work?

Obviously it won't work. Please see the first case described under "If the etcd cluster has already been affected by the issue, there are two possible outcomes".

Oh, I forgot that in v3.6 etcd will bootstrap membership from v3 storage. Sorry, I'm 100% focused on KubeCon. This means that etcd might have incorrect information until we fully process the local WAL.

With etcd-io/etcd#19636 in, wouldn't a complete fix just require one more step? When patching storage during bootstrap, also read the WAL for any PromoteLearner proto, and we are done? No need to communicate, no need for user action, etc.

@ahrtr
Member Author

ahrtr commented Mar 26, 2025

When patching storage during bootstrap also read WAL for any PromoteLearner proto, and we are done?

Yes. That's exactly why we sync the v3store not only on bootstrap (inside "(*bootstrappedCluster) Finalize(...)"), but also during apply (in case the learner promotion is still in the WAL records).
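
As a rough illustration of the two sync points mentioned above, here is a toy model in Python. It is only a sketch under stated assumptions: the dictionaries, the function names `sync_v3_from_v2` and `apply_promote`, and the member IDs are all hypothetical and do not mirror etcd's actual data structures or code. Membership is modeled as a mapping from member ID to an "is learner" flag.

```python
# Toy model only: these names are hypothetical, not etcd's real API.
# Each store maps a member ID to True (learner) or False (voting member).


def sync_v3_from_v2(v2store, v3store):
    """Overwrite the v3store view of membership with the v2store view."""
    v3store.clear()
    v3store.update(v2store)


def apply_promote(member_id, v2store, v3store):
    """Replay a learner-promotion entry and update both stores."""
    for store in (v2store, v3store):
        if member_id in store:
            store[member_id] = False


# Case A: the promotion already reached v2store, but v3store is stale.
v2 = {"m1": False, "m2": False}
v3 = {"m1": False, "m2": True}
sync_v3_from_v2(v2, v3)  # the bootstrap-time sync repairs v3store

# Case B: the promotion is still only in the WAL when the node starts.
v2_b = {"m1": False, "m2": True}
v3_b = {"m1": False, "m2": True}
sync_v3_from_v2(v2_b, v3_b)      # nothing to repair yet
apply_promote("m2", v2_b, v3_b)  # replaying the WAL entry updates both stores
```

In this simplified picture, syncing only at bootstrap would miss Case B, which is why a second sync point during apply is needed.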

No need to communicate, no need for user action, etc.

Right, no further action is needed from users. We just need to clarify it in the blog to avoid any potential confusion or questions.

Overall, users are only required to upgrade to 3.5.20 (or a higher version) before upgrading to 3.6.0. We also clarify what will happen if users do not follow this guide (we guarantee that we won't silently swallow any errors).
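
The upgrade rule stated above can be sketched as a small pre-flight check. This is a minimal, hypothetical helper written for illustration, not an etcd tool: the function names are made up here, and the only facts taken from this thread are that 3.5.20 (or a higher 3.5.x) is the required stepping stone before 3.6.0.

```python
import re

# From the thread: upgrade to 3.5.20 or higher before going to 3.6.0.
MIN_SAFE_3_5_PATCH = 20


def parse_version(text):
    """Parse 'v3.5.19' or '3.5.19' into (3, 5, 19); suffixes such as '-rc.3' are ignored."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.groups())


def ready_for_3_6(current):
    """Return True if `current` is a 3.5.x release at or above 3.5.20."""
    major, minor, patch = parse_version(current)
    return (major, minor) == (3, 5) and patch >= MIN_SAFE_3_5_PATCH


def upgrade_plan(current):
    """Return the ordered upgrade targets for a 3.5.x cluster (hypothetical helper)."""
    major, minor, _ = parse_version(current)
    if (major, minor) != (3, 5):
        raise ValueError("this sketch only covers clusters running 3.5.x")
    if ready_for_3_6(current):
        return ["v3.6.0"]
    return ["v3.5.20 or higher 3.5.x", "v3.6.0"]
```

For example, `upgrade_plan("v3.5.19")` returns a two-step plan, while `upgrade_plan("v3.5.20")` returns only the 3.6.0 step.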

Comment on lines +66 to +67
it was discovered in Kubernetes' workflow test. To address this gap, we added a similar e2e
test via [19634][], which was also backported to release-3.6 via [19662][].
Contributor

nit:

Suggested change
- it was discovered in Kubernetes' workflow test. To address this gap, we added a similar e2e
- test via [19634][], which was also backported to release-3.6 via [19662][].
+ it was discovered in Kubernetes' workflow test. To address this gap, we added a Kubernetes style upgrade e2e
+ test via [19634][], which was also backported to release-3.6 via [19662][].

Member Author

Let's try not to introduce a new concept, "Kubernetes style upgrade", to avoid unnecessary confusion or questions, and to avoid a wordy explanation.

I used "Kubeadm style upgrade" in the description of etcd-io/etcd#19557. It's OK in that informal case, but I suggest not using it in this formal (and soon to be widely broadcast) blog.

Member

@ivanvc ivanvc left a comment

LGTM. I left a comment with a minor nit. Thanks, @ahrtr.

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr ahrtr force-pushed the upgrade_issue_20250325 branch from 8db3f37 to d49f19c Compare March 27, 2025 06:51
@ahrtr
Member Author

ahrtr commented Mar 27, 2025

@jberkus we will release etcd v3.6.0-rc.3 tonight. I am going to merge this PR right after the release is out.

Do you have any further comments?

@jberkus
Contributor

jberkus commented Mar 27, 2025

Nope, it's good to go

/hold

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, ivanvc, jberkus, neolit123, siyuanfoundation, spzala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ahrtr,ivanvc,jberkus,spzala]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ivanvc
Member

ivanvc commented Mar 28, 2025

/unhold

v3.6.0-rc.3 is out :)

@ahrtr
Member Author

ahrtr commented Mar 28, 2025

Thanks, @ivanvc.

@ahrtr ahrtr merged commit 25cc3bc into etcd-io:main Mar 28, 2025
6 checks passed
@ahrtr
Member Author

ahrtr commented Mar 28, 2025

Thanks all for the review.

Do you think we should broadcast this blog now (given v3.6.0-rc.3 is already out) or wait until v3.6.0 is out? @fuweid @jberkus @jmhbnz @ivanvc @serathius @spzala @wenjiaswe

https://etcd.io/blog/2025/upgrade_from_3.5_to_3.6_issue/

@spzala
Member

spzala commented Mar 28, 2025

Thanks all for the review.

Do you think we should broadcast this blog now (given v3.6.0-rc.3 is already out) or wait until v3.6.0 is out? @fuweid @jberkus @jmhbnz @ivanvc @serathius @spzala @wenjiaswe

https://etcd.io/blog/2025/upgrade_from_3.5_to_3.6_issue/

Considering the blog post is public and upgrades to rc versions can potentially hit the issue covered in the blog, I think it's probably a good idea to broadcast it (after KubeCon). It can also serve as a good reminder/heads-up for v3.6.0, and at that time we can amplify it more. Just my thoughts, if others agree. Thanks @ahrtr !!

@ahrtr
Member Author

ahrtr commented Mar 28, 2025

Thanks @spzala for the feedback, which makes sense.

I will call this out in the KubeCon maintainer session regardless. Leave it to @jberkus to broadcast?

@spzala
Member

spzala commented Mar 28, 2025

Thanks @spzala for the feedback, which makes sense.

I will call this out in the KubeCon maintainer session regardless. Leave it to @jberkus to broadcast?

Thanks @ahrtr !!
