C11613 sai compression #1474

pkolaczk · 2024-12-20T15:58:58Z

What is the issue

SAI indexes sometimes consume too much storage space.

What does this PR fix and why was it fixed

This PR makes it possible to compress both the per-sstable and per-index components of SAI.
Use the `index_compression` table parameter to control compression of the per-sstable components.
Use the `compression` property on the index to control compression of the per-index components.

Checklist before you submit for review

Make sure there is a PR in the CNDB project updating the Converged Cassandra version
Use NoSpamLogger for log lines that may appear frequently in the logs
Verify test results on Butler
Test coverage for new/modified code is > 80%
Proper code formatting
Proper title for each commit starting with the project-issue number, like CNDB-1234
Each commit has a meaningful description
Each commit is not very long and contains related changes
Renames, moves and reformatting are in distinct commits

What was ported:
- current compaction throughput measurement by CompactionManager
- exposing current compaction throughput in StorageService and CompactionMetrics
- nodetool getcompactionthroughput, including tests

Not ported:
- changes to `nodetool compactionstats`, because that would require porting also the tests which are currently missing in CC and porting those tests turned out to be a complex task without porting the other changes in the CompactionManager API
- code for getting / setting compaction throughput as double

This commit introduces a new AdaptiveCompressor class.

AdaptiveCompressor uses ZStandard compression with a dynamic compression level based on the current write load. Its goal is to provide write performance similar to LZ4Compressor for write-heavy workloads, but a significantly better compression ratio for databases with a moderate amount of writes or on systems with a lot of spare CPU power.

If the memtable flush queue builds up, and it turns out the compression is a significant bottleneck, then the compression level used for flushing is decreased to gain speed. Similarly, when pending compaction tasks build up, the compression level used for compaction is decreased.

In order to enable adaptive compression:
- set the `-Dcassandra.default_sstable_compression=adaptive` JVM option to automatically select `AdaptiveCompressor` as the main compressor for flushes and new tables, if not overridden by specific options in cassandra.yaml or table schema
- set `flush_compression: adaptive` in cassandra.yaml to enable it for flushing
- set `AdaptiveCompressor` in Table options to enable it for compaction

Caution: this feature is not turned on by default because it may impact read speed negatively in some rare cases.

Fixes riptano/cndb#11532
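
To illustrate the load-based level selection described in the AdaptiveCompressor commit message, here is a minimal Java sketch. It is not the actual AdaptiveCompressor code: the class `AdaptiveLevelSketch`, the method `pickLevel`, its parameters, and the linear mapping from queue depth to compression level are all assumptions made for illustration. The commit message also says the level is lowered only when compression turns out to be a significant bottleneck; this sketch leaves that check out.

```java
public final class AdaptiveLevelSketch {

    private AdaptiveLevelSketch() {}

    /**
     * Hypothetical policy: choose a compression level from the number of
     * pending tasks (flushes or compactions). An empty queue gets the
     * highest level (best ratio); a backlog at or above maxPendingTasks
     * gets the lowest level (fastest).
     */
    public static int pickLevel(int pendingTasks, int maxPendingTasks, int minLevel, int maxLevel) {
        if (minLevel > maxLevel)
            throw new IllegalArgumentException("minLevel must not exceed maxLevel");
        if (maxPendingTasks <= 0)
            throw new IllegalArgumentException("maxPendingTasks must be positive");

        // Clamp the observed backlog into [0, maxPendingTasks].
        int clamped = Math.max(0, Math.min(pendingTasks, maxPendingTasks));
        int range = maxLevel - minLevel;

        // Linear interpolation from maxLevel (idle) down to minLevel (saturated).
        return maxLevel - (range * clamped) / maxPendingTasks;
    }

    public static void main(String[] args) {
        for (int pending : new int[] {0, 4, 8, 16}) {
            System.out.println(pending + " pending -> level " + pickLevel(pending, 8, 1, 9));
        }
    }
}
```

With levels 1 to 9 and a backlog limit of 8, an empty queue yields level 9 and a full (or overfull) queue yields level 1. A real policy would likely use separate inputs for flushing and compaction, as the commit message distinguishes the two.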

Reuses ZStd compression/decompression contexts instead of creating them on every call. This removes some of the setup / teardown overhead that happened inside the calls to Zstd.compress / Zstd.decompress, which makes a difference with very small chunks. Additionally, adds some compression/decompression rate metrics.
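
The context-reuse idea can be sketched with the JDK alone. The example below is an analogy, not the PR's code: it swaps ZStd for `java.util.zip.Deflater` / `Inflater` so that it runs without extra libraries, and the class name `ReusedContextSketch` and the per-thread caching are assumptions. The point it shows is the same: create the compressor state once, then reset it between chunks instead of rebuilding it on every call.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public final class ReusedContextSketch {

    // One compressor and one decompressor per thread, created lazily and reused.
    private static final ThreadLocal<Deflater> DEFLATER = ThreadLocal.withInitial(Deflater::new);
    private static final ThreadLocal<Inflater> INFLATER = ThreadLocal.withInitial(Inflater::new);

    private ReusedContextSketch() {}

    public static byte[] compress(byte[] chunk) {
        Deflater deflater = DEFLATER.get();
        deflater.reset();                 // cheap reset instead of a new native context
        deflater.setInput(chunk);
        deflater.finish();

        byte[] buffer = new byte[chunk.length + 64];
        int written = 0;
        while (!deflater.finished()) {
            if (written == buffer.length)
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            written += deflater.deflate(buffer, written, buffer.length - written);
        }
        return Arrays.copyOf(buffer, written);
    }

    public static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = INFLATER.get();
        inflater.reset();
        inflater.setInput(compressed);

        byte[] out = new byte[originalLength];
        int read = 0;
        try {
            while (read < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, read, originalLength - read);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    break;                // no progress possible: input is truncated
                read += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed chunk", e);
        }
        if (read != originalLength)
            throw new IllegalStateException("expected " + originalLength + " bytes, got " + read);
        return out;
    }
}
```

Calling `compress` and `decompress` repeatedly on the same thread reuses the same two objects, which matters most when chunks are small and setup cost dominates the per-call work.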

cassci-bot · 2024-12-20T16:56:39Z

❌ Build ds-cassandra-pr-gate/PR-1474 rejected by Butler

211 new test failure(s) in 1 builds

Found 211 new test failures

Showing only first 15 new test failures

Test	Explanation	Branch history	Upstream history
o.a.c.d.SchemaCQLHelperTest.testCfmOptionsCQL	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...ToolEnableDisableBinaryTest.testMaybeChangeDocs	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...tionRestrictedWidePartitionPqCompressedTest[ca]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...tionRestrictedWidePartitionPqCompressedTest[dc]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...alTest.partitionRestrictedWidePartitionTest[ca]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...alTest.partitionRestrictedWidePartitionTest[dc]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...teforceLocalTest.randomizedBqCompressedTest[ca]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
....c.VectorBruteforceLocalTest.randomizedTest[ca]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...ctorBruteforceLocalTest.rangeRestrictedTest[ca]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...s.c.VectorLocalTest.partitionRestrictedTest[ca]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...s.c.VectorLocalTest.partitionRestrictedTest[dc]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...tionRestrictedWidePartitionPqCompressedTest[dc]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
o.a.c.i.s.c.VectorLocalTest.rangeRestrictedTest...	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...d.v.SegmentFlushTest.testFlushBetweenRowIds[ca]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵
...d.v.SegmentFlushTest.testFlushBetweenRowIds[db]	regression	🔴	🔵🔵🔵🔵🔵🔵🔵

Found 8 known test failures

maoling and others added 6 commits December 16, 2024 17:26

Reuse ZStd compression/decompression context

e6baaee

CNDB-11613: Compressed SAI

25c366e

Separate control of compression of per-sstable index components

6a611d1

Enable chunk cache on compressed index components

b6c7de2
