fix(nodetool rebuild): use repair instead of rebuild if no tablets support #9073

Draft: wants to merge 1 commit into master from skip_rebuild_streaming_err_with_tablets


yarongilor (Contributor)

If tablets are not supported by nodetool rebuild, the test should fall back to repair instead. In that case it should disable load balancing and repair all nodes in the target node's datacenter.
refs: scylladb/scylladb#17575
refs: scylladb/scylladb#20084 (comment)
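A minimal sketch of the fallback described above, assuming hypothetical Node, run_nodetool and set_balancing stand-ins (none of these are the SCT APIs used by the actual change): keep rebuild when it is supported, otherwise disable tablet load balancing and repair every node in the target's DC.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for an SCT node object (not the real class)."""
    name: str
    dc_idx: int


def restore_target_data(
    target: Node,
    nodes: List[Node],
    tablets_enabled: bool,
    rebuild_supported: bool,
    run_nodetool: Callable[[Node, str], None],
    set_balancing: Callable[[bool], None],
) -> None:
    """Rebuild the target node, or fall back to repairing its whole DC.

    run_nodetool and set_balancing are hypothetical callbacks supplied by the
    caller; they are not SCT APIs.
    """
    if not tablets_enabled or rebuild_supported:
        # Rebuild works in this configuration, so keep the original action.
        run_nodetool(target, "rebuild")
        return

    # Fallback: repair every node that shares the target's datacenter.
    same_dc = [n for n in nodes if n.dc_idx == target.dc_idx]
    set_balancing(False)  # stop tablet migrations while the repairs run
    try:
        for node in same_dc:
            run_nodetool(node, "repair")
    finally:
        set_balancing(True)  # assumption: restore balancing even on failure
```

The try/finally that re-enables balancing is an assumption of this sketch; the description above only says that balancing should be disabled.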

Testing

  • [ ]

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@yarongilor yarongilor added the backport/2024.2 Need backport to 2024.2 label Oct 28, 2024
yarongilor (Contributor, Author) commented Oct 28, 2024

@bhalevy, could you please advise? Following scylladb/scylladb#20084 (comment):

IIUC, as long as scylladb/scylladb#17852 is open, all nodes in the DC should be repaired manually.
Otherwise, is no repair needed at all, or should only the target node be repaired?

Second, I'm not sure whether this fix should be backported to 2024/6.x (it may have an extensive impact on longevity runs and on testing for this PR).

@yarongilor yarongilor added area/tablets and removed backport/2024.2 Need backport to 2024.2 labels Oct 28, 2024
@yarongilor yarongilor force-pushed the skip_rebuild_streaming_err_with_tablets branch from 56d1bbe to 5340448 Compare October 29, 2024 08:13
@yarongilor yarongilor force-pushed the skip_rebuild_streaming_err_with_tablets branch from 5340448 to f24debe Compare October 30, 2024 15:19
with self.cluster.cql_connection_patient(self.target_node) as session:
    if is_tablets_feature_enabled(session=session) and not is_rebuild_supported:
        for node in [n for n in self.cluster.nodes if n.dc_idx == self.target_node.dc_idx]:
            node.run_nodetool(sub_cmd="repair")
Contributor

I would recommend passing long_running=True, retry=0.

Also, maybe consider a hard timeout.
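One way to get the hard timeout and no-retry behaviour suggested above, sketched with Python's standard subprocess module rather than SCT's own helpers (whether run_nodetool already exposes an equivalent timeout is not stated here): subprocess.run kills the child and raises subprocess.TimeoutExpired once the limit elapses, and the command is attempted exactly once.

```python
import subprocess
from typing import Sequence


def run_with_hard_timeout(cmd: Sequence[str], timeout_s: float) -> subprocess.CompletedProcess:
    """Run cmd exactly once (no retry) and kill it after timeout_s seconds.

    On timeout, subprocess.run() kills the child, waits for it, and re-raises
    subprocess.TimeoutExpired; check=True turns a non-zero exit status into
    subprocess.CalledProcessError instead of silently succeeding.
    """
    return subprocess.run(
        list(cmd),
        capture_output=True,  # keep stdout/stderr for the test log
        text=True,
        timeout=timeout_s,
        check=True,
    )
```

For example, run_with_hard_timeout(["nodetool", "repair"], timeout_s=6 * 3600) would cap a repair at six hours; the limit is an arbitrary placeholder.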

Contributor

Also, I'm not sure you have a guarantee that all the nodes in this DC are up and running...
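A sketch of a pre-check addressing this concern, assuming a hypothetical is_up probe supplied by the caller (for example, a wrapper around nodetool status) and a hypothetical Node stand-in: collect the target DC's nodes and fail fast if any of them is down, instead of starting repairs against an incomplete DC.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for an SCT node object (not the real class)."""
    name: str
    dc_idx: int


def nodes_to_repair(target: Node, nodes: List[Node],
                    is_up: Callable[[Node], bool]) -> List[Node]:
    """Return the target DC's nodes, raising if any of them is down.

    is_up is a hypothetical liveness probe supplied by the caller; it is not
    an SCT API.
    """
    same_dc = [n for n in nodes if n.dc_idx == target.dc_idx]
    down = [n.name for n in same_dc if not is_up(n)]
    if down:
        # Fail fast rather than repairing only part of the DC.
        raise RuntimeError(f"cannot repair DC {target.dc_idx}: nodes down: {down}")
    return same_dc
```

Skipping down nodes is an alternative; which behaviour fits the nemesis is a judgment call this sketch does not settle.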

2 participants