improvement(gemini): run gemini from scylladb/gemini
Dockerfile has been moved to gemini project, and now
SCT should use that image.

Changes:
- Use image from scylladb/gemini
- Remove the Dockerfile for gemini and point the README to the image
  locations
- Add CQL Statement Logging to gemini output
- Forward outputs from docker to $HOME/*.log
- Run default gemini flags from gemini_thread.py

Signed-off-by: Dusan Malusev <[email protected]>
CodeLieutenant authored and Dusan Malusev committed Jan 10, 2025
1 parent 6355082 commit 0822faa
Showing 20 changed files with 189 additions and 169 deletions.
2 changes: 1 addition & 1 deletion defaults/docker_images/gemini/values_gemini.yaml
@@ -1,2 +1,2 @@
gemini:
image: scylladb/hydra-loaders:gemini-v1.8.6
image: scylladb/gemini:1.8.9
5 changes: 2 additions & 3 deletions defaults/test_default.yaml
@@ -90,7 +90,6 @@ alternator_access_key_id: ''
alternator_secret_access_key: ''
dynamodb_primarykey_type: 'HASH'


store_cdclog_reader_stats_in_es: false
region_aware_loader: false

@@ -158,7 +157,7 @@ jepsen_test_cmd:
jepsen_test_count: 1
jepsen_test_run_policy: all

max_events_severities: ""
max_events_severities: ''
scylla_mgmt_agent_version: '3.4.0'
mgmt_docker_image: 'scylladb/scylla-manager:3.4.0'
k8s_log_api_calls: false
@@ -283,7 +282,7 @@ latency_decorator_error_thresholds:
mixed:
default:
P90 write:
fixed_limit: 5
fixed_limit: 5
P90 read:
fixed_limit: 5
P99 write:
19 changes: 0 additions & 19 deletions docker/gemini/Dockerfile

This file was deleted.

17 changes: 6 additions & 11 deletions docker/gemini/README.md
@@ -1,11 +1,6 @@
Currently, when releasing a new version of Gemini, there's no need to push the image to Docker Hub.
The image is built and pushed automatically by `goreleaser` when a new version is released.
Docs from Gemini repo: https://github.com/scylladb/gemini/blob/master/docs/release-process.md
Steps to release gemini :
```
0. Make sure you have proper go installed. See the version in https://github.com/scylladb/gemini/blob/master/go.mod
1. update changelog and tag the commit
2. create github token with write:packages permissions here: https://github.com/settings/tokens/new
3. export GITHUB_TOKEN="YOUR_GH_TOKEN"
4. Run `goreleaser` from the `cmd/gemini` directory
```
# Gemini Image

## Locations

- [DockerHub](https://hub.docker.com/r/scylladb/gemini)
- [Gemini Github](https://github.com/scylladb/gemini)
1 change: 0 additions & 1 deletion docker/gemini/image

This file was deleted.

130 changes: 96 additions & 34 deletions sdcm/gemini_thread.py
@@ -66,50 +66,105 @@ def __init__(self, test_cluster, oracle_cluster, loaders, stress_cmd, timeout=No
super().__init__(loader_set=loaders, stress_cmd=stress_cmd, timeout=timeout, params=params)
self.test_cluster = test_cluster
self.oracle_cluster = oracle_cluster
self._gemini_result_file = None
self.gemini_commands = []
self.gemini_request_timeout = 180
self.gemini_connect_timeout = 120

@property
def gemini_result_file(self):
if not self._gemini_result_file:
self._gemini_result_file = os.path.join("/", "gemini_result_{}.log".format(uuid.uuid4()))
return self._gemini_result_file
self.unique_id = uuid.uuid4()
self.gemini_default_flags = {
"level": "info",
"request-timeout": "60s",
"connect-timeout": "60s",
"consistency": "QUORUM",
"async-objects-stabilization-backoff": "1s",
"async-objects-stabilization-attempts": 10,
"max-mutation-retries-backoff": "1s",
"max-mutation-retries": 10,
"dataset-size": "large",
"oracle-host-selection-policy": "token-aware",
"test-host-selection-policy": "token-aware",
"drop-schema": "true",
"cql-features": "normal",
"fail-fast": "true",
"materialized-views": "false",
"use-server-timestamps": "true",
"use-lwt": "false",
"use-counters": "false",
"max-tables": 1,
"max-columns": 16,
"min-columns": 8,
"max-partition-keys": 6,
"min-partition-keys": 2,
"max-clustering-keys": 4,
"min-clustering-keys": 2,
"partition-key-distribution": "normal", # Distribution for hitting the partition
# These two are used to control the memory usage of Gemini
"token-range-slices": 512, # Number of partitions
"partition-key-buffer-reuse-size": 100, # Internal channel size per partition value generation
}

self.gemini_oracle_statements_file = f"gemini_oracle_statements_{self.unique_id}.log"
self.gemini_test_statements_file = f"gemini_test_statements_{self.unique_id}.log"
self.gemini_result_file = f"gemini_result_{self.unique_id}.log"

def _generate_gemini_command(self):
seed = self.params.get('gemini_seed')
seed = self.params.get('gemini_seed') or random.randint(1, 100)
table_options = self.params.get('gemini_table_options')
if not seed:
seed = random.randint(1, 100)
log_statements = self.params.get('gemini_log_cql_statements') or False

test_nodes = ",".join(self.test_cluster.get_node_cql_ips())
oracle_nodes = ",".join(self.oracle_cluster.get_node_cql_ips()) if self.oracle_cluster else None

cmd = "./{} --test-cluster={} --outfile {} --seed {} --request-timeout {}s --connect-timeout {}s ".format(
self.stress_cmd.strip(),
test_nodes,
self.gemini_result_file,
seed,
self.gemini_request_timeout,
self.gemini_connect_timeout)
if oracle_nodes:
cmd += "--oracle-cluster={} ".format(oracle_nodes)
oracle_nodes = ",".join(self.oracle_cluster.get_node_cql_ips())

cmd = f"gemini \
--non-interactive \
--oracle-cluster=\"{oracle_nodes}\" \
--test-cluster=\"{test_nodes}\" \
--seed={seed} \
--schema-seed={seed} \
--profiling-port=6060 \
--bind=0.0.0.0:2121 \
--outfile=/{self.gemini_result_file} \
--replication-strategy=\"{{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}}\" \
--oracle-replication-strategy=\"{{'class': 'NetworkTopologyStrategy', 'replication_factor': '1'}}\" "

if log_statements:
cmd += f"--test-statement-log-file=/{self.gemini_test_statements_file} \
--oracle-statement-log-file=/{self.gemini_oracle_statements_file} "

credentials = self.loader_set.get_db_auth()

if credentials and '--test-username' not in cmd:
cmd += f"--test-username={credentials[0]} \
--test-password={credentials[1]} \
--oracle-username={credentials[0]} \
--oracle-password={credentials[1]} "

if table_options:
cmd += " ".join([f"--table-options \"{table_opt}\"" for table_opt in table_options])
cmd += " ".join([f"--table-options=\"{table_opt}\"" for table_opt in table_options]) + " "

stress_cmd = self.stress_cmd.replace('\n', ' ').strip()

for key, value in self.gemini_default_flags.items():
if key not in stress_cmd:
cmd += f"--{key}={value} "

cmd += stress_cmd
self.gemini_commands.append(cmd)
return cmd
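
The default-flag loop above appends a default only when its name does not occur anywhere in the user-supplied command. A substring test can be fooled by prefixes: `max-mutation-retries` is a substring of `max-mutation-retries-backoff`, so passing only the backoff flag would also suppress the retries default. A minimal sketch of a stricter merge, assuming flags are written as `--name value` or `--name=value`; `merge_default_flags` is a hypothetical helper and not part of SCT:

```python
import shlex


def merge_default_flags(stress_cmd: str, defaults: dict) -> str:
    """Append defaults for flags that the user command does not set explicitly."""
    # Flatten a multi-line command (e.g. a YAML block scalar) into one line.
    flat = stress_cmd.replace("\n", " ").strip()
    # Collect the exact long-flag names present, ignoring any "=value" suffix.
    present = {
        token[2:].split("=", 1)[0]
        for token in shlex.split(flat)
        if token.startswith("--")
    }
    extra = [f"--{key}={value}" for key, value in defaults.items() if key not in present]
    return " ".join([flat, *extra]).strip()


# Hypothetical subset of the defaults, for illustration only.
defaults = {"max-mutation-retries": 10, "max-mutation-retries-backoff": "1s", "level": "info"}
merged = merge_default_flags("--duration 3h\n--max-mutation-retries-backoff 500ms\n", defaults)
print(merged)
# --duration 3h --max-mutation-retries-backoff 500ms --max-mutation-retries=10 --level=info
```

Matching on parsed flag names keeps the user's value for the backoff flag while still applying the retries default.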

def _run_stress(self, loader, loader_idx, cpu_idx):

cpu_options = ""
if self.stress_num > 1:
cpu_options = f'--cpuset-cpus="{cpu_idx}"'

docker = cleanup_context = RemoteDocker(loader, self.docker_image_name,
extra_docker_opts=f'{cpu_options} --label shell_marker={self.shell_marker} '
'--network=host '
'--security-opt seccomp=unconfined '
'--entrypoint=""')
for file_name in [self.gemini_result_file, self.gemini_test_statements_file, self.gemini_oracle_statements_file]:
loader.remoter.run(f"touch $HOME/{file_name}", ignore_status=True, verbose=False)

docker = cleanup_context = RemoteDocker(
loader,
self.docker_image_name,
# Parenthesize the conditional so the remaining options are always appended
extra_docker_opts=(f'--cpuset-cpus="{cpu_idx}" ' if self.stress_num > 1 else "") +
f'--label shell_marker={self.shell_marker} '
'--network=host '
'--security-opt seccomp=unconfined '
'--entrypoint="" '
f'-v $HOME/{self.gemini_result_file}:/{self.gemini_result_file} '
f'-v $HOME/{self.gemini_test_statements_file}:/{self.gemini_test_statements_file} '
f'-v $HOME/{self.gemini_oracle_statements_file}:/{self.gemini_oracle_statements_file} '
)
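
The option string passed as `extra_docker_opts` can also be assembled from a list, which sidesteps any precedence question between a conditional expression and adjacent string literals. The host files are created with `touch` beforehand because Docker's `-v` creates a directory when the host path does not exist. A minimal sketch under those assumptions; `build_docker_opts` is a hypothetical helper, not an SCT function:

```python
def build_docker_opts(cpu_idx, stress_num, shell_marker, log_files):
    """Return a docker option string with one bind mount per log file."""
    opts = []
    if stress_num > 1:
        # Pin the container to one CPU only when several stress threads share a loader.
        opts.append(f'--cpuset-cpus="{cpu_idx}"')
    opts += [
        f"--label shell_marker={shell_marker}",
        "--network=host",
        "--security-opt seccomp=unconfined",
        '--entrypoint=""',
    ]
    # Mount each pre-created host file at the container root under the same name.
    opts += [f"-v $HOME/{name}:/{name}" for name in log_files]
    return " ".join(opts)


opts = build_docker_opts(1, 2, "abc", ["gemini_result_x.log"])
print(opts)
```

Joining with a single space also guarantees that no two options run together when one of them is omitted.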

if not os.path.exists(loader.logdir):
os.makedirs(loader.logdir, exist_ok=True)
@@ -148,6 +203,13 @@ def _run_stress(self, loader, loader_idx, cpu_idx):
results_copied = docker.receive_files(src=self.gemini_result_file, dst=local_gemini_result_file)
assert results_copied, "gemini results aren't available, did gemini even run ?"

local_gemini_test_statements_file = os.path.join(
docker.node.logdir, os.path.basename(self.gemini_test_statements_file))
local_gemini_oracle_statements_file = os.path.join(
docker.node.logdir, os.path.basename(self.gemini_oracle_statements_file))
docker.receive_files(src=self.gemini_test_statements_file, dst=local_gemini_test_statements_file)
docker.receive_files(src=self.gemini_oracle_statements_file, dst=local_gemini_oracle_statements_file)

return docker, result, local_gemini_result_file

def get_gemini_results(self):
2 changes: 2 additions & 0 deletions sdcm/logcollector.py
@@ -847,6 +847,8 @@ class LoaderLogCollector(LogCollector):
search_locally=True),
FileLog(name='*gemini-l*.log',
search_locally=True),
FileLog(name='*gemini_*_statements_*.log',
search_locally=True),
FileLog(name='gemini_result*.log',
search_locally=True),
FileLog(name='cdclogreader*.log',
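
The new `FileLog` entry relies on a glob pattern to pick up both statement logs. A quick check of that pattern against the file names built in `gemini_thread.py`, assuming shell-style wildcard semantics as implemented by Python's `fnmatch` (how `FileLog` matches names internally is not shown in this diff):

```python
import uuid
from fnmatch import fnmatch

unique_id = uuid.uuid4()
pattern = "*gemini_*_statements_*.log"

# File names as constructed in GeminiStressThread.__init__
names = [
    f"gemini_test_statements_{unique_id}.log",
    f"gemini_oracle_statements_{unique_id}.log",
    f"gemini_result_{unique_id}.log",
]

# Map each file name to whether the statement-log pattern selects it.
matches = {name: fnmatch(name, pattern) for name in names}
for name, matched in matches.items():
    print(f"{name}: {matched}")
```

Under these semantics the two statement logs match, while the result file is left to the separate `gemini_result*.log` entry.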
3 changes: 3 additions & 0 deletions sdcm/sct_config.py
@@ -708,6 +708,9 @@ class SCTConfiguration(dict):

dict(name="gemini_seed", env="SCT_GEMINI_SEED", type=int,
help="Seed number for gemini command"),
dict(name="gemini_log_cql_statements",
env="SCT_GEMINI_LOG_CQL_STATEMENTS",
type=boolean, help="Log CQL statements to file"),
dict(name="gemini_table_options", env="SCT_GEMINI_TABLE_OPTIONS", type=list,
help="""table options for created table. example:
["cdc={'enabled': true}"]
13 changes: 6 additions & 7 deletions test-cases/gemini/gemini-1tb-10h.yaml
@@ -13,15 +13,14 @@ nemesis_selector: ['run_with_gemini']
nemesis_interval: 5
nemesis_seed: '041'

gemini_cmd: "gemini -d --duration 8h --warmup 2h -c 50 \
-m mixed -f --non-interactive --cql-features normal \
--max-mutation-retries 10 --max-mutation-retries-backoff 500ms \
--async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms \
--replication-strategy \"{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}\"
--oracle-replication-strategy \"{'class': 'NetworkTopologyStrategy', 'replication_factor': '1'}\" "

gemini_cmd: |
--duration 8h
--warmup 2h
--concurrency 50
--mode mixed
gemini_schema_url: 'https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json' # currently is not used
gemini_log_cql_statements: false

db_type: mixed_scylla
instance_type_db_oracle: 'i4i.16xlarge'
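
With the `|` block scalar, `gemini_cmd` reaches Python as a multi-line string, which `_generate_gemini_command` flattens with `replace('\n', ' ').strip()` before appending it to the command. A small sketch of that step; the string literal below stands in for the parsed YAML value:

```python
# What a YAML parser yields for the `gemini_cmd: |` block scalar:
# one flag per line, with a single trailing newline.
gemini_cmd = "--duration 8h\n--warmup 2h\n--concurrency 50\n--mode mixed\n"

# Same flattening as in _generate_gemini_command.
flat = gemini_cmd.replace("\n", " ").strip()
print(flat)
# --duration 8h --warmup 2h --concurrency 50 --mode mixed
```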
13 changes: 6 additions & 7 deletions test-cases/gemini/gemini-3h-cdc-postimage-write.yaml
@@ -4,17 +4,16 @@ n_loaders: 1
n_monitor_nodes: 1
instance_type_db: 'i4i.4xlarge'

user_prefix: "gemini-cdc-postimage-write"

gemini_cmd: "gemini -d --duration 3h \
-c 30 -m write -f --non-interactive --cql-features normal \
--max-mutation-retries 5 --max-mutation-retries-backoff 500ms \
--async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms \
--replication-strategy \"{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}\" "
user_prefix: 'gemini-cdc-postimage-write'

gemini_cmd: |
--duration 3h
--concurrency 30
--mode write
gemini_schema_url: 'https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json' # currently is not used
gemini_table_options:
- "cdc={'enabled': true, 'postimage': true}"
gemini_log_cql_statements: false

db_type: scylla
13 changes: 6 additions & 7 deletions test-cases/gemini/gemini-3h-cdc-preimage-write.yaml
@@ -4,17 +4,16 @@ n_loaders: 1
n_monitor_nodes: 1
instance_type_db: 'i4i.4xlarge'

user_prefix: "gemini-cdc-preimage-write"

gemini_cmd: "gemini -d --duration 3h \
-c 30 -m write -f --non-interactive --cql-features normal \
--max-mutation-retries 5 --max-mutation-retries-backoff 500ms \
--async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms \
--replication-strategy \"{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}\" "
user_prefix: 'gemini-cdc-preimage-write'

gemini_cmd: |
--duration 3h
--concurrency 30
--mode write
gemini_schema_url: 'https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json' # currently is not used
gemini_table_options:
- "cdc={'enabled': true, 'preimage': true}"
gemini_log_cql_statements: false

db_type: scylla
16 changes: 7 additions & 9 deletions test-cases/gemini/gemini-3h-cdc-write.yaml
@@ -4,18 +4,16 @@ n_loaders: 1
n_monitor_nodes: 1
instance_type_db: 'i4i.4xlarge'

user_prefix: "gemini-cdc-write"
user_prefix: 'gemini-cdc-write'

gemini_cmd: "gemini -d --duration 3h \
-c 30 -m write -f --non-interactive --cql-features normal \
--max-mutation-retries 5 --max-mutation-retries-backoff 500ms \
--async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms \
--replication-strategy \"{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}\" "
gemini_cmd: |
--duration 3h
--concurrency 30
--mode write
gemini_schema_url: 'https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json' # currently is not used
gemini_table_options:
- "cdc={'enabled': true}"


gemini_schema_url: 'https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json' # currently is not used
gemini_log_cql_statements: false

db_type: scylla
18 changes: 7 additions & 11 deletions test-cases/gemini/gemini-3h-ics-cdc-with-nemesis.yaml
@@ -11,21 +11,17 @@ ami_db_scylla_user: 'centos'
nemesis_class_name: 'GeminiChaosMonkey'
nemesis_interval: 5

# gemini
# cmd: gemini -d -n [NUM_OF_TEST_ITERATIONS] -c [NUM_OF_THREADS] -m mixed -f
# the below cmd runs about 3 hours
gemini_cmd: "gemini -d --duration 3h --warmup 30m -c 50 -m mixed -f --non-interactive \
--cql-features normal --table-options \"compaction={'class': 'IncrementalCompactionStrategy'}\" \
--max-mutation-retries 5 --max-mutation-retries-backoff 500ms \
--async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms \
--replication-strategy \"{'class': 'NetworkTopologyStrategy', 'replication_factor': '3'}\" \
--oracle-replication-strategy \"{'class': 'NetworkTopologyStrategy', 'replication_factor': '1'}\" "

gemini_cmd: |
--duration 3h
--warmup 30m
--concurrency 50
--mode write
gemini_schema_url: 'https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json' # currently is not used

gemini_table_options:
- "cdc={'enabled': true, 'preimage': true, 'postimage': true}"
- "compaction={'class': 'IncrementalCompactionStrategy'}"
gemini_log_cql_statements: false

stress_cdclog_reader_cmd: "cdc-stressor -duration 215m -stream-query-round-duration 30s"

15 changes: 8 additions & 7 deletions test-cases/gemini/gemini-3h-ics-with-nondisruptive-nemesis.yaml
@@ -10,16 +10,17 @@ ami_db_scylla_user: 'centos'

nemesis_class_name: 'GeminiNonDisruptiveChaosMonkey'
nemesis_interval: 5
# gemini
# cmd: gemini -d -n [NUM_OF_TEST_ITERATIONS] -c [NUM_OF_THREADS] -m mixed -f
# the below cmd runs about 3 hours
gemini_cmd: "gemini -d --duration 10800s --warmup 1800s -c 50 -m mixed -f --non-interactive \
--cql-features normal --table-options \"compaction={'class': 'IncrementalCompactionStrategy'}\" \
--max-mutation-retries 5 --max-mutation-retries-backoff 500ms \
--async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms "

gemini_cmd: |
--duration 3h
--warmup 30m
--concurrency 50
--mode write
gemini_schema_url: 'https://s3.amazonaws.com/scylla-gemini/Binaries/schema.json' # currently is not used
gemini_table_options:
- "compaction={'class': 'IncrementalCompactionStrategy'}"
gemini_log_cql_statements: false

db_type: mixed_scylla
instance_type_db_oracle: 'i4i.8xlarge'