
Schema restore failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" #4041

Open
yarongilor opened this issue Sep 23, 2024 · 27 comments · May be fixed by scylladb/scylla-cluster-tests#9492

@yarongilor

Packages

Scylla version: 2024.2.0~rc2-20240904.4c26004e5311 with build-id a8549197de3c826053f88ddfd045b365b9cd8692

Kernel Version: 5.15.0-1068-aws

Issue description

The backup restore failed with the error:

restore data: create "100gb_sizetiered_6_0" ("100gb_sizetiered_6_0") with CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true: Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace 100gb_sizetiered_6_0

The restore task was started like this:

< t:2024-09-19 15:13:31,266 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: restore/08395d3e-1492-4af3-86dc-d9b0b03039fc
< t:2024-09-19 15:13:31,564 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.255Z","N":"restore","M":"Initialized views","views":null,"_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}
< t:2024-09-19 15:13:31,564 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.257Z","N":"scheduler","M":"PutTask","task":"restore/08395d3e-1492-4af3-86dc-d9b0b03039fc","schedule":{"cron":"{\"spec\":\"\",\"start_date\":\"0001-01-01T00:00:00Z\"}","window":null,"timezone":"Etc/UTC","start_date":"0001-01-01T00:00:00Z","interval":"","num_retries":3,"retry_wait":"10m"},"properties":{"location":["s3:manager-backup-tests-permanent-snapshots-us-east-1"],"restore_schema":true,"snapshot_tag":"sm_20240812164539UTC"},"create":true,"_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}
< t:2024-09-19 15:13:31,565 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.264Z","N":"scheduler.4253a65e","M":"Schedule","task":"restore/08395d3e-1492-4af3-86dc-d9b0b03039fc","in":"0s","begin":"2024-09-19T15:13:31.264Z","retry":0,"_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}
< t:2024-09-19 15:13:31,565 f:base.py         l:231  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Sep 19 15:13:31 alternator-ttl-4-loaders-no-lwt-sis-monitor-node-4afc0c3a-1 scylla-manager[11147]: {"L":"INFO","T":"2024-09-19T15:13:31.264Z","N":"http","M":"POST /api/v1/cluster/4253a65e-2c97-48dc-a939-7c7590741a75/tasks","from":"127.0.0.1:34234","status":201,"bytes":0,"duration":"3766ms","_trace_id":"jlyaoqGiR0ab_NHSOrxl0g"}

Then it failed:

< t:2024-09-19 15:13:35,364 f:base.py         l:143  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.4.1.202>: Command "sudo sctool  -c 4253a65e-2c97-48dc-a939-7c7590741a75 progress restore/08395d3e-1492-4af3-86dc-d9b0b03039fc" finished with status 0
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > sctool output: Restore progress
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Run:              bb1652f4-7699-11ef-bc2a-0a833fefb519
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Status:           ERROR (restoring backed-up data)
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Cause:            restore data: create "100gb_sizetiered_6_0" ("100gb_sizetiered_6_0") with CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true: Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace 100gb_sizetiered_6_0
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Start time:       19 Sep 24 15:13:31 UTC
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > End time: 19 Sep 24 15:13:33 UTC
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Duration: 2s
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Progress: 0% | 0%
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > Snapshot Tag:     sm_20240812164539UTC
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > 
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > ╭───────────────┬──────────┬──────────┬─────────┬────────────┬────────╮
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > │ Keyspace      │ Progress │     Size │ Success │ Downloaded │ Failed │
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > ├───────────────┼──────────┼──────────┼─────────┼────────────┼────────┤
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > │ system_schema │  0% | 0% │ 352.731k │       0 │          0 │      0 │
< t:2024-09-19 15:13:35,364 f:cli.py          l:1132 c:sdcm.mgmt.cli        p:DEBUG > ╰───────────────┴──────────┴──────────┴─────────┴────────────┴────────╯
< t:2024-09-19 15:13:35,364 f:cli.py          l:1148 c:sdcm.mgmt.cli        p:DEBUG > sctool res after parsing: [['Restore progress'], ['Run: bb1652f4-7699-11ef-bc2a-0a833fefb519'], ['Status: ERROR (restoring backed-up data)'], ['Cause: restore data: create "100gb_sizetiered_6_0" ("100gb_sizetiered_6_0") with CREATE KEYSPACE "100gb_sizetiered_6_0" WITH replication = {\'class\': \'org.apache.cassandra.locator.NetworkTopologyStrategy\', \'us-east\': \'3\'} AND durable_writes = true: Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace 100gb_sizetiered_6_0'], ['Start time: 19 Sep 24 15:13:31 UTC'], ['End time: 19 Sep 24 15:13:33 UTC'], ['Duration: 2s'], ['Progress: 0%', '0%'], ['Snapshot Tag: sm_20240812164539UTC'], ['Keyspace', 'Progress', 'Size', 'Success', 'Downloaded', 'Failed'], ['system_schema', '0%', '0%', '352.731k', '0', '0', '0']]
2024-09-19 15:13:39.530: (DisruptionEvent Severity.ERROR) period_type=end event_id=74265a87-4830-422e-a42f-7081a9ec6230 duration=58s: nemesis_name=MgmtRestore target_node=Node alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-3 [34.242.246.113 | 10.4.3.150] errors=Schema restoration of sm_20240812164539UTC has failed!
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5207, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2966, in disrupt_mgmt_restore
    assert restore_task.status == TaskStatus.DONE, \
AssertionError: Schema restoration of sm_20240812164539UTC has failed!

Installation details

Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-6 (18.202.235.208 | 10.4.3.36) (shards: -1)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-5 (54.75.40.118 | 10.4.3.65) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-4 (34.241.184.210 | 10.4.0.247) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-3 (34.242.246.113 | 10.4.3.150) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-2 (108.129.126.116 | 10.4.1.130) (shards: 14)
  • alternator-ttl-4-loaders-no-lwt-sis-db-node-4afc0c3a-1 (34.245.137.137 | 10.4.1.50) (shards: 14)

OS / Image: ami-0555cb82c50d0d5f1 (aws: undefined_region)

Test: longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-sisyphus-test
Test id: 4afc0c3a-7457-4d8b-a69a-8ee387d26369
Test name: enterprise-2024.2/alternator_tablets/longevity-alternator-1h-scan-12h-ttl-no-lwt-2h-grace-4loaders-sisyphus-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 4afc0c3a-7457-4d8b-a69a-8ee387d26369
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 4afc0c3a-7457-4d8b-a69a-8ee387d26369

Logs:

Jenkins job URL
Argus

@Michal-Leszczynski
Collaborator

Well, this is expected.
SM restores the schema by applying the output of DESC SCHEMA WITH INTERNALS queried during the backup.
The problem is that the schema contains topology-related information: the DCs in which the keyspace is replicated.
So in order to use the SM restore schema task, the restore destination cluster needs to consist of the same DCs as the backed-up cluster.

A workaround is to take the schema file from the backup location, modify it to fit your needs, and apply it manually.

@karol-kokoszka karol-kokoszka changed the title restore data failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Schema restore data failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Sep 30, 2024
@karol-kokoszka
Collaborator

karol-kokoszka commented Sep 30, 2024

@yarongilor Is there anything you suggest changing in Scylla Manager? As per #4041 (comment), this is the expected behavior of the manager.

It looks like there is no datacenter named "us-east" in the destination cluster.

# cassandra-rackdc.properties
#
# The lines may include white spaces at the beginning and the end.
# The rack and data center names may also include white spaces.
# All trailing and leading white spaces will be trimmed.
#
dc=thedatacentername
rack=therackname
# prefer_local=<false | true>
# dc_suffix=<Data Center name suffix, used by EC2SnitchXXX snitches>

@yarongilor
Author

yarongilor commented Sep 30, 2024

@roydahan, @fruch, is there any known resolution for this issue?
The test ran in the eu-west-1 region (with Datacenter: eu-west) and failed restoring a backup to the us-east datacenter. Is it a matter of a wrongly selected region for the test, or does it require an SCT fix?

@roydahan

It's not a new issue, mostly a usability issue.
@mikliapko I think the original issue is assigned to you; are you planning to change SCT so that it changes the DC name while trying to restore?

@Michal-Leszczynski
Collaborator

Issue about restoring schema into a different DC setting: #4049.

@fruch
Contributor

fruch commented Sep 30, 2024

Issue about restoring schema into a different DC setting: #4049.

So currently the user is supposed to do the schema restore manually.

@mikliapko so I'd say we should at least skip the nemesis if the region of the snapshots doesn't match, at least until it is implemented on the test end or the manager end.

@karol-kokoszka karol-kokoszka changed the title Schema restore data failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Schema restore failed with: "Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy" Sep 30, 2024
@mikliapko

It's not a new issue, mostly a usability issue. @mikliapko I think the original issue is assigned to you; are you planning to change SCT so that it changes the DC name while trying to restore?

I don't remember us having an issue for that.
I created a new one for myself to correctly handle the case when the region of the snapshots doesn't match: #4052

@rayakurl

rayakurl commented Oct 1, 2024

@mikliapko - IMO we can plan for a workaround, depending on when this issue will be fixed on the Manager side.
@karol-kokoszka, @Michal-Leszczynski - please discuss this in the next Manager refinement meeting. If it's not going to be handled soon, @mikliapko will create a workaround in the test for it.

@fruch
Contributor

fruch commented Oct 1, 2024

It's not a new issue, mostly a usability issue. @mikliapko I think the original issue is assigned to you; are you planning to change SCT so that it changes the DC name while trying to restore?

I don't remember us having an issue for that. I created a new one for myself to correctly handle the case when the region of the snapshots doesn't match: #4052

There was an issue about this, long ago:
https://github.com/scylladb/qa-tasks/issues/1477

I don't know if anything was done to try to apply any workaround.

@timtimb0t

timtimb0t commented Oct 22, 2024

@roydahan

@mikliapko we want to have at least a workaround for this issue until it is fixed in Manager.

@mikliapko

@mikliapko we want to have at least a workaround for this issue until it is fixed in Manager.

Sorry, lost track of this issue for a while. I’ll try to come up with a solution no later than next week.

@roydahan

If there is no fix planned in Manager or an easy workaround on the Manager side, you can work around it in SCT by uploading backups to several regions (each with the correct region in its schema).
Then change the nemesis to pull the backup from the matching region.

@mikliapko

mikliapko commented Nov 25, 2024

A workaround is to take the schema file from the backup location, modify it to fit your needs, and apply it manually.

@Michal-Leszczynski
I'd like to work around this issue.
Where can I find the list of CQL statements I need to apply after manually modifying the schema file?

@Michal-Leszczynski
Collaborator

Where can I find a list of CQL statements I need to apply after manual modification of schema file?

It's in the backup location under /backup/schema/cluster/<clusterID>/task_<taskID>_tag_<snapshotTag>_schema_with_internals.json.gz.
This file contains a JSON array of the schema statements returned from DESCRIBE SCHEMA WITH INTERNALS.

The uncompressed file can look like this:

[
  {
    "keyspace": "restoretest_full",
    "type": "keyspace",
    "name": "restoretest_full",
    "cql_stmt": "CREATE KEYSPACE restoretest_full WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc1': '2'} AND durable_writes = true;"
  },
  {
    "keyspace": "restoretest_full",
    "type": "table",
    "name": "big_table",
    "cql_stmt": "CREATE TABLE restoretest_full.big_table (\n    id int,\n    data blob,\n    PRIMARY KEY (id)\n) WITH ID = d3607a70-a8bf-11ef-851d-38d002d034f1\nAND bloom_filter_fp_chance = 0.01\n    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}\n    AND comment = ''\n    AND compaction = {'class': 'NullCompactionStrategy', 'enabled': 'false'}\n    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}\n    AND crc_check_chance = 1\n    AND default_time_to_live = 0\n    AND gc_grace_seconds = 864000\n    AND max_index_interval = 2048\n    AND memtable_flush_period_in_ms = 0\n    AND min_index_interval = 128\n    AND speculative_retry = '99.0PERCENTILE'\n    AND paxos_grace_seconds = 864000\n    AND tombstone_gc = {'mode': 'repair', 'propagation_delay_in_seconds': '3600'};\n"
  },
  {
    "keyspace": "restoretest_full",
    "type": "index",
    "name": "bydata_index",
    "cql_stmt": "CREATE INDEX bydata ON restoretest_full.big_table(data);\n"
  },
  {
    "keyspace": "restoretest_full",
    "type": "view",
    "name": "testmv",
    "cql_stmt": "CREATE MATERIALIZED VIEW restoretest_full.testmv AS\n    SELECT id, data\n    FROM restoretest_full.big_table\n    WHERE data IS NOT null\n    PRIMARY KEY (id, data)\n    WITH ID = d38ae5d0-a8bf-11ef-8cb1-45ed76572674\nAND CLUSTERING ORDER BY (data ASC)\n    AND bloom_filter_fp_chance = 0.01\n    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}\n    AND comment = ''\n    AND compaction = {'class': 'SizeTieredCompactionStrategy'}\n    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}\n    AND crc_check_chance = 1\n    AND default_time_to_live = 0\n    AND gc_grace_seconds = 864000\n    AND max_index_interval = 2048\n    AND memtable_flush_period_in_ms = 0\n    AND min_index_interval = 128\n    AND speculative_retry = '99.0PERCENTILE'\n    AND paxos_grace_seconds = 864000\n    AND tombstone_gc = {'mode': 'repair', 'propagation_delay_in_seconds': '3600'};\n"
  }
]

@timtimb0t

Reproduced here:

Packages

Scylla version: 6.3.0~dev-20241122.e2e6f4f441be with build-id 2493a7aae1f855d3df502197f757822b6afc1033

Kernel Version: 6.8.0-1019-aws

Installation details

Cluster size: 5 nodes (i4i.8xlarge)

Scylla Nodes used in this run:

  • longevity-mv-si-4d-master-db-node-299884c7-8 (3.250.175.37 | 10.4.11.76) (shards: 30)
  • longevity-mv-si-4d-master-db-node-299884c7-7 (3.255.213.87 | 10.4.11.152) (shards: 30)
  • longevity-mv-si-4d-master-db-node-299884c7-6 (54.75.36.72 | 10.4.10.102) (shards: 30)
  • longevity-mv-si-4d-master-db-node-299884c7-5 (34.255.195.233 | 10.4.9.117) (shards: 30)
  • longevity-mv-si-4d-master-db-node-299884c7-4 (18.202.252.170 | 10.4.9.20) (shards: 30)
  • longevity-mv-si-4d-master-db-node-299884c7-3 (3.255.99.166 | 10.4.10.136) (shards: 30)
  • longevity-mv-si-4d-master-db-node-299884c7-2 (52.50.139.63 | 10.4.11.189) (shards: 30)
  • longevity-mv-si-4d-master-db-node-299884c7-1 (46.137.67.238 | 10.4.8.195) (shards: 30)

OS / Image: ami-001a2091244fdbdf3 (aws: undefined_region)

Test: longevity-mv-si-4days-streaming-test
Test id: 299884c7-f5ee-4e0d-8e21-a27a3509b0a6
Test name: scylla-master/tier1/longevity-mv-si-4days-streaming-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 299884c7-f5ee-4e0d-8e21-a27a3509b0a6
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 299884c7-f5ee-4e0d-8e21-a27a3509b0a6

Logs:

Jenkins job URL
Argus

@mikliapko

I was trying to work around it via a manual restore of the schema (putting the right region into the CQL statements for keyspace creation). It failed because of a kms:decryption issue (see details here): the backup was encrypted with the key of the region it belongs to and can't be decrypted with the key from the replaced region.

Looks like uploading the backup to several regions is the only way left so far.

@fruch
Contributor

fruch commented Nov 26, 2024

I was trying to work around it via a manual restore of the schema (putting the right region into the CQL statements for keyspace creation). It failed because of a kms:decryption issue (see details here): the backup was encrypted with the key of the region it belongs to and can't be decrypted with the key from the replaced region.

Looks like uploading the backup to several regions is the only way left so far.

Those backups were created with KMS keys which are long gone, regardless of region.
Or am I missing something about how the restore flow for KMS EaR-encrypted sstables should work?

@mikliapko

Those backups were created with KMS keys which are long gone, regardless of region.

I recreated those backups a few months ago.
They must be encrypted with the relevant key.

@fruch
Contributor

fruch commented Nov 26, 2024

Those backups were created with KMS keys which are long gone, regardless of region.

I recreated those backups a few months ago. They must be encrypted with the relevant key.

I take it back; we clear the aliases, not the keys.

@mikliapko

mikliapko commented Dec 3, 2024

Hm, we have an issue related to the backup_bucket_region parameter.

In test_defaults.yaml this parameter is an empty string:

backup_bucket_region: ''  # use the same region as a cluster

Then it gets rewritten by aws_config.yaml:

backup_bucket_region: 'us-east-1'

This parameter is used to configure the manager agent:

        node.update_manager_agent_backup_config(
            region=self.params.get("backup_bucket_region"),
            general_config=agent_backup_general_config,
        )

As a result, if the region_name in the test differs from us-east-1, we have a misconfiguration between the actual region and the region configured in scylla-agent.yaml.

I'm thinking about adding a validation rule for the backup_bucket_region parameter in sdcm/sct_config.py, something like this, to forbid such situations:

if self.get("backup_bucket_region") != self.get("region_name"):
    self["backup_bucket_region"] = self.get("region_name")

@fruch Could this change lead to any unexpected consequences? What do you think?
Or perhaps you have better ideas on how to fix it?
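The suggested coercion can be expressed as a small helper. This is a hypothetical sketch (the function name and dict-style params access are assumptions; the real change would live in sdcm/sct_config.py):

```python
def normalize_backup_bucket_region(params):
    """If backup_bucket_region diverges from region_name (e.g. the
    aws_config.yaml default of 'us-east-1' leaking into a eu-west-1 run),
    fall back to region_name so the manager agent config and the actual
    cluster region agree."""
    if params.get("backup_bucket_region") != params.get("region_name"):
        params["backup_bucket_region"] = params.get("region_name")
    return params
```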

@fruch
Contributor

fruch commented Dec 3, 2024

Hm, we have an issue related to the backup_bucket_region parameter. [...] @fruch Could this change lead to any unexpected consequences? Or perhaps you have better ideas on how to fix it?

As long as you have buckets in all regions for running the backup nemesis (including in GCE / Azure), I don't think it would be a problem.

I would recommend removing backup_bucket_region if it's not going to be really usable; that can be a follow-up.

@mikliapko

I would recommend removing backup_bucket_region if it's not going to be really usable; that can be a follow-up.

Yes, since for now we can't set different region_name and backup_bucket_region, I suppose it makes sense to get rid of it and just use region_name instead. Okay, I'll remove it in the scope of this ticket.

@mikliapko

After duplicating all snapshots and fixing backup location issues, disrupt_mgmt_restore is still failing with:

Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5426, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 187, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3092, in disrupt_mgmt_restore
    self.tester.set_ks_strategy_to_network_and_rf_according_to_cluster(
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 1111, in set_ks_strategy_to_network_and_rf_according_to_cluster
    NetworkTopologyReplicationStrategy(**datacenters).apply(node, keyspace)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/replication_strategy_utils.py", line 47, in apply
    session.execute(cql)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1318, in execute_verbose
    return execute_orig(*args, **kwargs)
  File "cassandra/cluster.py", line 2729, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 5120, in cassandra.cluster.ResponseFuture.result
cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="Only one DC's RF can be changed at a time and not by more than 1"

The test alters a keyspace from RF=3 to RF=5, which is prohibited.
https://argus.scylladb.com/tests/scylla-cluster-tests/0089ac32-5cc0-4168-852a-73718ab10242

@mikliapko

I suppose that to fix it, we need to implement the procedure described here:
https://opensource.docs.scylladb.com/stable/kb/rf-increase.html#example
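The documented procedure boils down to changing the RF one step at a time, with a repair after each step. A sketch of how the nemesis could do this, assuming a cassandra-driver-style session object and a caller-supplied repair callback (both assumptions; this is not existing SCT code):

```python
def alter_rf_stepwise(session, keyspace, dc, current_rf, target_rf, repair):
    """Scylla only allows changing one DC's RF by 1 at a time, so step
    from current_rf to target_rf one increment at a time, running a
    repair after each step as the RF-increase procedure recommends."""
    step = 1 if target_rf >= current_rf else -1
    for rf in range(current_rf + step, target_rf + step, step):
        session.execute(
            f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', '{dc}': {rf}}}"
        )
        repair()  # e.g. trigger a full repair on every node before the next step
```

For the failing case above (RF=3 to RF=5), this issues two ALTER KEYSPACE statements instead of one.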

@mikliapko

mikliapko commented Dec 6, 2024

I wonder how this test was introduced in the first place?
With this limitation, the disrupt_mgmt_restore test couldn't pass.

@timtimb0t

Reproduced here:
https://argus.scylladb.com/tests/scylla-cluster-tests/1b3e80d1-e6cb-46c0-a07b-0ca1c1b8974d
Backend: aws
Region: eu-west-1, eu-west-2, eu-north-1
Image id: ami-0c7b4b0835c9342f7 ami-039f35b0f1e04947e ami-03a78f37d7eaf9c88
SCT commit sha: 1d4cbaa1ed74fd3d4748a4636c3f6d57805bc24b
SCT repository: [email protected]:scylladb/scylla-cluster-tests.git
SCT branch name: origin/master
Kernel version: 6.8.0-1019-aws
Scylla version: 6.3.0~dev-20241206.7e2875d6489d
Build id: 5227dd2a3fce4d2beb83ec6c17d47ad2e8ba6f5c
Instance type: i4i.4xlarge
Node amount: 8
