Fix Fullscanoperation thread to choose only alive node #9284

Open
2 tasks
aleksbykov opened this issue Nov 19, 2024 · 1 comment · May be fixed by #9370
@aleksbykov
Contributor

Packages

Scylla version: 2024.2.0-20241118.614d56348f46 with build-id e67376d9ddfea081a3bab398f4581ecdde59911d

Kernel Version: 5.15.0-1072-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

The full scan operation chose a node that was then restarted by the rolling restart nemesis, which caused this error event:

2024-11-18 22:32:44.845: (FullScanAggregateEvent Severity.ERROR) period_type=end event_id=e79676a5-c38c-4b33-b7cf-b3f9d96610a8 during_nemesis=RollingRestartCluster duration=13s node=longevity-tls-50gb-3d-2024-2-db-node-c5d16022-6 select_from=keyspace1.standard1 message=FullScanAggregatesOperation operation failed, ReadTimeout error: ReadTimeout('Error from server: code=1200 [Coordinator node timed out waiting for replica nodes\' responses] message="Operation failed for keyspace1.standard1 - received 0 responses and 1 failures from 1 CL=ONE." info={\'consistency\': \'ONE\', \'required_responses\': 1, \'received_responses\': 0}')

The FullScan thread needs to be fixed to choose only alive nodes
-or-
the rolling restart nemesis needs to be fixed to mark the restarting node as busy for other operations.
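
A minimal sketch of the first option, assuming each node object carries a liveness flag and the `running_nemesis` marker that comes up later in this thread. The `Node` class, the `is_alive` field, and the `pick_scan_target` helper are hypothetical names used only for illustration; they are not taken from scylla-cluster-tests.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node object."""
    name: str
    is_alive: bool = True                  # assumed liveness flag
    running_nemesis: Optional[str] = None  # nemesis currently using this node, if any


def pick_scan_target(nodes: Sequence[Node],
                     rng: Optional[random.Random] = None) -> Optional[Node]:
    """Return a random node that is alive and not used by a nemesis, or None."""
    if rng is None:
        rng = random.Random()
    candidates = [n for n in nodes if n.is_alive and n.running_nemesis is None]
    return rng.choice(candidates) if candidates else None
```

Returning `None` instead of raising lets the scan thread skip one iteration when every node is busy, which is one possible design choice; the caller has to handle that case explicitly.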

Impact

The reported ERROR event marks the job as failed.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 6 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-9 (52.4.92.28 | 10.12.35.142) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-8 (98.85.39.206 | 10.12.35.198) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-7 (52.72.119.13 | 10.12.34.73) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-6 (44.214.249.197 | 10.12.34.72) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-5 (35.172.65.4 | 10.12.32.10) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-4 (34.227.247.191 | 10.12.35.166) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-3 (50.19.104.112 | 10.12.32.218) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-2 (34.237.79.41 | 10.12.34.15) (shards: 14)
  • longevity-tls-50gb-3d-2024-2-db-node-c5d16022-1 (54.81.140.125 | 10.12.34.222) (shards: 14)

OS / Image: ami-06d63888ff4cf3d3f (aws: undefined_region)

Test: longevity-50gb-3days-test
Test id: c5d16022-93b6-44b1-9bab-22571a3eade5
Test name: enterprise-2024.2/tier1/longevity-50gb-3days-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor c5d16022-93b6-44b1-9bab-22571a3eade5
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs c5d16022-93b6-44b1-9bab-22571a3eade5

Logs:

Jenkins job URL
Argus

temichus self-assigned this Nov 19, 2024
@roydahan
Contributor

I also noticed this issue.
Is it really relevant only to rolling restart?
Isn't it relevant to every FullScan that may happen during a disruptive nemesis?

temichus added a commit to temichus/scylla-cluster-tests that referenced this issue Nov 26, 2024
This commit has the following changes:

1 introduce a common targed_node_lock mechanism
that can be used in nemesis and Scan operations

2 the common run_nemesis wrapper can provide a node
that is not under disruptive_nemesis, in addition to providing a node with no nemesis.
This will allow non-disruptive operations to pick the same node

3 change all node.running_nemesis settings to use the common methods
set/unset_running_nemesis from the common targed_node_lock file (except unit tests)

fixes: scylladb#9284
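
The locking idea in this commit message could look roughly like the sketch below. The names `set_running_nemesis`, `unset_running_nemesis`, `run_nemesis`, and `running_nemesis` come from the commit message; their signatures, the module-level lock, the `Node` class, and `NoFreeNodeError` are assumptions made for illustration and may differ from the actual pull request.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional

# Assumption: a single lock guards every read and write of running_nemesis.
NEMESIS_TARGET_LOCK = threading.Lock()


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node object."""
    name: str
    running_nemesis: Optional[str] = None


class NoFreeNodeError(Exception):
    """Raised when every node is already used by some operation (hypothetical)."""


def set_running_nemesis(node: Node, nemesis_name: str) -> None:
    with NEMESIS_TARGET_LOCK:
        node.running_nemesis = nemesis_name


def unset_running_nemesis(node: Node) -> None:
    with NEMESIS_TARGET_LOCK:
        node.running_nemesis = None


@contextmanager
def run_nemesis(nodes: List[Node], nemesis_name: str) -> Iterator[Node]:
    """Reserve the first free node for the duration of the with-block."""
    # Check and mark under the same lock so two threads cannot pick one node.
    with NEMESIS_TARGET_LOCK:
        free = [n for n in nodes if n.running_nemesis is None]
        if not free:
            raise NoFreeNodeError(f"no free node for {nemesis_name}")
        target = free[0]
        target.running_nemesis = nemesis_name
    try:
        yield target
    finally:
        # Always release the node, even if the operation failed.
        unset_running_nemesis(target)
```

A scan thread and a nemesis thread would then both go through `with run_nemesis(nodes, "..."):` and never end up holding the same node at the same time.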
temichus linked a pull request Nov 26, 2024 that will close this issue
temichus added a commit to temichus/scylla-cluster-tests that referenced this issue Nov 26, 2024
This commit has the following changes:

1 introduce a common targed_node_lock mechanism
that can be used in nemesis and Scan operations

2 the FullScan operation now runs only on a node that is free of nemeses

3 change all node.running_nemesis settings to use the common methods
set/unset_running_nemesis from the common targed_node_lock file (except unit tests)

fixes: scylladb#9284
temichus added a commit to temichus/scylla-cluster-tests that referenced this issue Dec 3, 2024
This commit has the following changes:

1 introduce a common targed_node_lock mechanism
that can be used in nemesis and Scan operations

2 the FullScan operation now runs only on a node that is free of nemeses

3 change all node.running_nemesis settings to use the common methods
set/unset_running_nemesis from the common targed_node_lock file (except unit tests)

4 change the disrupt_rolling_restart_cluster nemesis to lock all nodes in
the cluster before performing the restart

fixes: scylladb#9284
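
Item 4 can be illustrated with the sketch below, which marks every node as busy before the first restart and releases them all afterwards. `lock_all_nodes`, `rolling_restart`, the `Node` class, and the module-level lock are hypothetical names; failing immediately when a node is already in use is also an assumption, since the real nemesis might wait or retry instead.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional

# Assumption: the same shared lock guards all running_nemesis updates.
NEMESIS_TARGET_LOCK = threading.Lock()


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node object."""
    name: str
    running_nemesis: Optional[str] = None


@contextmanager
def lock_all_nodes(nodes: List[Node], nemesis_name: str) -> Iterator[List[Node]]:
    """Mark every node as busy, or fail without marking any of them."""
    with NEMESIS_TARGET_LOCK:
        busy = [n.name for n in nodes if n.running_nemesis is not None]
        if busy:
            raise RuntimeError(f"nodes already in use: {busy}")
        for node in nodes:
            node.running_nemesis = nemesis_name
    try:
        yield nodes
    finally:
        with NEMESIS_TARGET_LOCK:
            for node in nodes:
                node.running_nemesis = None


def rolling_restart(nodes: List[Node], restart_one: Callable[[Node], None]) -> None:
    """Restart nodes one by one while the whole cluster is marked as busy."""
    with lock_all_nodes(nodes, "RollingRestartCluster") as locked:
        for node in locked:
            restart_one(node)
```

Because all nodes are marked before the first restart, a scan that only picks nodes with `running_nemesis is None` cannot start on a node that is about to be restarted later in the same rolling pass.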