-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DPE-6341] Upgrade to 24.04 base and etcd v3.5.18 #32
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fixes issue where files in charm directory conflict with files created by poetry plugin during staging
The continuous writes function in the integration test suite needs to be able to run independently. Currently it is executed via the `etcdctl` command being run in one of the deployed juju units. This is problematic as we might want to remove this specific unit as part of the scaling/HA tests. This PR adjusts integration test setup and installs the deb-package `etcd-client` on the host machine. The continuous writes function is then run with this client's `etcdctl` and thus not depending on execution in specific units anymore. The function is still kept very simple: - It needs to be started with `start_continuous_writes(endpoints=...)` and stopped with `stop_continuous_writes()`. - It can be started with multiple endpoints though and persists through changes of Raft or Juju leadership. - Use the `assert_continuous_writes_increasing` and `assert_continuous_writes_consistent` helpers to ensure that the writes continued throughout any cluster/HA related operation. **Note:** There is no currently maintained Python client lib for etcd (see [issue](etcd-io/etcd#13285)), that's why we keep using `etcdctl` to connect to etcd on each write. Testing with several of the available, unsupported clients failed because they either don't support authentication or cannot be installed to due incompatibilities with protobuf.
This pull request added CA rotation to the charm. ### TLS Certificate Management Enhancements: * **New TLS CA Rotation States:** - Added `TLSCARotationState` enum to represent different stages of CA rotation (`src/literals.py`). * **CA Rotation and Cleanup Methods:** - Introduced methods in `TLSManager` for setting CA rotation state, checking CA status across servers, and cleaning up old CAs (`src/managers/tls.py`). [[1]](diffhunk://#diff-a2c036f44ae9bbab595aad84b23bc3106a342c5ab12f7b0cf3d26268b51cdc83R136-R204) [[2]](diffhunk://#diff-a2c036f44ae9bbab595aad84b23bc3106a342c5ab12f7b0cf3d26268b51cdc83L57-R102) * **Event Handling Updates:** - Added `CleanCAEvent` to handle CA cleanup events and integrated it into the `TLSEvents` class (`src/events/tls.py`). [[1]](diffhunk://#diff-3dfb4a104b407f7d4b2e07795a63745a09cde228a8dbe86602e7b72ea600fbe4L15-R57) [[2]](diffhunk://#diff-3dfb4a104b407f7d4b2e07795a63745a09cde228a8dbe86602e7b72ea600fbe4R88-R89) [[3]](diffhunk://#diff-3dfb4a104b407f7d4b2e07795a63745a09cde228a8dbe86602e7b72ea600fbe4R229-R243) ### Codebase Improvements: * **Logging Enhancements:** - Improved logging messages to provide more context during rolling restarts and CA rotations (`src/charm.py`). [[1]](diffhunk://#diff-b9ed39bbc9c0387bd3e07da31d13373745534a1cd723d3e292c73496b12e307cL68-R78) [[2]](diffhunk://#diff-b9ed39bbc9c0387bd3e07da31d13373745534a1cd723d3e292c73496b12e307cR178-R202)
This reverts commit 9054884.
This PR adds scale-down functionality to the etcd operator. It includes the following changes: **EtcdEvents** - new methods for handling the `RelationDepartedEvent` and `StorageDetachingEvent` **ClusterManager** - refactoring of the `get_leader()` method (now `leader` property) to return the `leader id` instead of the `endpoint` of the leader - a new method for selecting a new Raft leader from the remaining cluster members before removing a cluster member - a new method `remove_member()` which makes sure a) the cluster is always available even if removing the Raft leader and b) the unit removal is blocked until the cluster could be reconfigured or errors out in case it can't **EtcdClient** - new method `remove_member()` to reconfigure the etcd cluster without the departing unit - new method `move_leader()` to transfer Raft leadership from in the etcd cluster to a different unit - remove duplicate arguments for the `_run_etcdctl()` method (accidentally introduced in [PR#13](#13)) **TLSEvents** - an update to the `_on_certificates_broken` handler to only initiate a rolling restart if the respective unit is started and not departing soon **Workload** - new method `stop()` to stop a snap service **Test Coverage** Integration tests in `test_scaling.py` for - removing one unit - removing the Raft leader - removing multiple units - scaling to zero and back up - remove the Juju leader - remove the application Unit tests in `test_charm.py` for the `storage_detaching` handler **Integration Test Helpers** - a new method `get_raft_leader()` returning the member-name of the Raft leader in an etcd cluster - a new method `wait_for_cluster_formation()` that waits until all cluster members have been promoted to full-voting member - `get_application_relation_data()` taken over from other integration test suits to check for peer relation data in integration tests - a fix to the `continuous_writes` for cleanup of previous runs in the same etcd cluster
…ember anymore, use all cluster endpoints instead
# Conflicts: # src/managers/cluster.py # tests/integration/ha/test_scaling.py # tests/integration/test_charm.py # tests/integration/tls/test_ca_rotation.py # tests/integration/tls/test_tls.py # tox.ini
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks René, I have a question / concern on raise_on_error
Mehdi-Bendriss
approved these changes
Feb 3, 2025
skourta
approved these changes
Feb 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you @reneradoi
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR upgrades the etcd operator to run on Ubuntu 24.04 base image and deploy etcd in version 3.5.18.
Besides those changes, it includes the following adjustments / fixes:
ops_test.build_charm
is deprecated for 24.04 in data-platform-workflows (see deprecation notice here). Therefore we now build the charm outside of the tests when running the tests locally (see here), and assume the charm file exists in the tests.raise_on_error=False
inwait_for_idle
when scaling down. Themember remove
operation might result in errors on thestorage_detaching
hook if the member cannot be removed at the specific moment (desired behaviour). The operation might succeed on the next run, but the test would fail then.