Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LP2053031: Adding tuning params #213

Conversation

FrancescoDeSimone
Copy link
Contributor

@FrancescoDeSimone FrancescoDeSimone commented Mar 25, 2024

this PR add tuning parameters like heartbeat_interval, election_timeout, snapshot_count, as describe in LP:#2053031

@reneradoi
Copy link

Hello @FrancescoDeSimone, I'm not sure if you saw the reply on Matrix, that's why I'll leave my comment here as well. Please refer here for further discussion.

Copy link
Member

@addyess addyess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this PR and bringing attention to it. I hope to start a good conversation around this bug. So far the work looks good i only have a problem with the default for snapshot_count

Copy link
Contributor

@HomayoonAlimohammadi HomayoonAlimohammadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks a lot @FrancescoDeSimone! Left some comments and suggestions.

FrancescoDeSimone and others added 4 commits January 22, 2025 14:29
Co-authored-by: Homayoon Alimohammadi <[email protected]>
Co-authored-by: Homayoon Alimohammadi <[email protected]>
Co-authored-by: Homayoon Alimohammadi <[email protected]>
Copy link
Member

@addyess addyess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a tiny nit remains. Really nice work @FrancescoDeSimone

can you check out the unit test failures?

Copy link
Member

@addyess addyess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful work. LGTM 🍰

@addyess
Copy link
Member

addyess commented Jan 23, 2025

odd... the first run of the integration tests failed to complete test_snapshot_restore

I'll run again and see if it is successful

@addyess
Copy link
Member

addyess commented Jan 23, 2025

@FrancescoDeSimone well I guess it wasn't a fluke. The integration tests are failing in test_snapshot_restore due to these changes. DO you have time to look into why this failure occurs?

@FrancescoDeSimone
Copy link
Contributor Author

hi @addyess try to understand what's the issue. Don't seems related to my changes. I tried to run the same test on the main branch and I got the same results

======================================== short test summary info =========================================
FAILED tests/integration/test_etcd.py::test_snapshot_restore - AssertionError: assert 'failed' == 'completed'
========================= 1 failed, 10 passed, 6 warnings in 1190.96s (0:19:50) ==========================
integration: exit 1 (1191.70 seconds) /home/ubuntu/test/layer-etcd> pytest -v -s --tb native --log-cli-level=INFO /home/ubuntu/test/layer-etcd/tests/integration pid=110510
  integration: FAIL code 1 (1204.01=setup[12.31]+cmd[1191.70] seconds)
  evaluation failed :( (1204.08 seconds)
ubuntu@ubuntu:~/test/layer-etcd$ git branch
* main

looks something relative to getent binary. I will try to dig deeper

unit-etcd-1: 18:48:14 ERROR unit.etcd/1.juju-log b'cmd_run.go:1276: WARNING: cannot create user data directory: cannot get the current user: getent could not be executed: exec: "getent": executable file not found in $PATH\ncmd_run.go:1281: WARNING: cannot copy user Xauthority file: cannot get the current user: getent could not be executed: exec: "getent": executable file not found in $PATH\nCouldn\'t find a member in the cluster with an ID of cmd_run.go.\n'

@FrancescoDeSimone
Copy link
Contributor Author

@addyess I am currently facing the same problem with the upstream etcd, running this script

juju add-model test
juju deploy easyrsa 
juju deploy ch:etcd --to 0
juju relate etcd easyrsa
juju wait-for application etcd
juju run etcd/leader snapshot
juju scp etcd/leader:"/home/ubuntu/etcd-snapshots/etcd-snapshot-*.tar.gz" snapshot.tar.gz 
juju attach-resource etcd snapshot=./snapshot.tar.gz
juju run etcd/leader restore

got the same error in juju debug-log

unit-etcd-0: 19:29:27 ERROR unit.etcd/0.juju-log b'cmd_run.go:1276: WARNING: cannot create user data directory: cannot get the current user: getent could not be executed: exec: "getent": executable file not found in $PATH\ncmd_run.go:1281: WARNING: cannot copy user Xauthority file: cannot get the current user: getent could not be executed: exec: "getent": executable file not found in $PATH\nCouldn\'t find a member in the cluster with an ID of cmd_run.go.\n'

@addyess
Copy link
Member

addyess commented Jan 28, 2025

This will pass once the following is merged

@addyess
Copy link
Member

addyess commented Jan 28, 2025

@FrancescoDeSimone please rebase from main

@HomayoonAlimohammadi HomayoonAlimohammadi merged commit c086dd5 into charmed-kubernetes:main Jan 29, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants