C11613 sai compression #1474

Open
wants to merge 6 commits into
base: main

pkolaczk

What is the issue

SAI indexes sometimes consume too much storage space.

What does this PR fix and why was it fixed

This PR makes it possible to compress both the per-sstable and the per-index components of SAI.
Use the `index_compression` table param to control compression of the per-sstable components.
Use the `compression` property on the index to control compression of the per-index components.

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

maoling and others added 6 commits December 16, 2024 17:26
What was ported:
- current compaction throughput measurement by CompactionManager
- exposing current compaction throughput in StorageService
  and CompactionMetrics
- nodetool getcompactionthroughput, including tests

Not ported:
- changes to `nodetool compactionstats`, because that would
  require porting also the tests which are currently missing in CC
  and porting those tests turned out to be a complex task without
  porting the other changes in the CompactionManager API
- Code for getting / setting compaction throughput as double

This commit introduces a new AdaptiveCompressor class.

AdaptiveCompressor uses ZStandard compression with a dynamic
compression level based on the current write load. Its goal is
to provide write performance similar to LZ4Compressor
for write-heavy workloads, but a significantly better compression ratio
for databases with a moderate amount of writes or on systems
with a lot of spare CPU power.

If the memtable flush queue builds up, and it turns out the compression
is a significant bottleneck, then the compression level used for
flushing is decreased to gain speed. Similarly, when pending
compaction tasks build up, then the compression level used
for compaction is decreased.
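
The idea of lowering the compression level as a backlog grows can be sketched roughly as follows. This is only an illustration, not the PR's implementation: the function names, the level bounds, and the `MAX_PENDING` threshold are all hypothetical, the mapping is a plain linear interpolation, and `zlib` stands in for ZStandard so that the sketch needs only the standard library. It also ignores the check for whether compression is actually the bottleneck.

```python
import zlib

# Hypothetical bounds; the PR text does not state the actual levels or thresholds.
MIN_LEVEL = 1
MAX_LEVEL = 9
MAX_PENDING = 8  # backlog size at which the fastest level is used


def choose_level(pending_tasks: int,
                 min_level: int = MIN_LEVEL,
                 max_level: int = MAX_LEVEL,
                 max_pending: int = MAX_PENDING) -> int:
    """Map a backlog size to a compression level: more backlog, lower level."""
    # Clamp the backlog into [0, max_pending].
    backlog = min(max(pending_tasks, 0), max_pending)
    span = max_level - min_level
    # Linear interpolation from max_level (idle) down to min_level (saturated).
    return max_level - (span * backlog) // max_pending


def compress_chunk(data: bytes, pending_tasks: int) -> bytes:
    """Compress one chunk at a level chosen from the current backlog."""
    # zlib is a stand-in for Zstandard (an assumption made for this sketch).
    return zlib.compress(data, choose_level(pending_tasks))
```

With these assumed bounds, an idle system compresses at the highest level, and a backlog of `MAX_PENDING` or more tasks drops to the lowest one. Separate counters for pending flushes and pending compactions could be passed in to mirror the two cases described above.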

In order to enable adaptive compression:
  - set `-Dcassandra.default_sstable_compression=adaptive` JVM option
    to automatically select `AdaptiveCompressor` as the main compressor
    for flushes and new tables, if not overridden by specific options in
    cassandra.yaml or table schema
  - set `flush_compression: adaptive` in cassandra.yaml to enable it
    for flushing
  - set `AdaptiveCompressor` in Table options to enable it
    for compaction

Caution: this feature is not turned on by default because it
may impact read speed negatively in some rare cases.

Fixes riptano/cndb#11532

Reduces some of the overhead of setting up / tearing down the Zstd
contexts, which previously happened inside the calls to Zstd.compress
/ Zstd.decompress. Makes a difference with very small chunks.

Additionally, added some compression/decompression
rate metrics.
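
A compression rate metric of this kind could look roughly like the sketch below. It is an assumption-laden illustration rather than the code in this commit: `RateMeter` and `timed_compress` are hypothetical names, and `zlib` again stands in for Zstd.

```python
import time
import zlib


class RateMeter:
    """Accumulates bytes processed and time spent, and reports bytes/second."""

    def __init__(self) -> None:
        self.total_bytes = 0
        self.total_seconds = 0.0

    def record(self, num_bytes: int, seconds: float) -> None:
        self.total_bytes += num_bytes
        self.total_seconds += seconds

    def rate(self) -> float:
        # Return 0.0 before any time has been recorded to avoid dividing by zero.
        if self.total_seconds <= 0.0:
            return 0.0
        return self.total_bytes / self.total_seconds


def timed_compress(data: bytes, meter: RateMeter) -> bytes:
    """Compress data and record the uncompressed size and elapsed time."""
    start = time.perf_counter()
    out = zlib.compress(data)  # stand-in for the Zstd call (assumption)
    meter.record(len(data), time.perf_counter() - start)
    return out
```

A second `RateMeter` instance wrapped around the decompression call would give the decompression rate in the same way.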
@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1474 rejected by Butler


211 new test failure(s) in 1 builds


Found 211 new test failures

Showing only first 15 new test failures

Test Explanation Branch history Upstream history
o.a.c.d.SchemaCQLHelperTest.testCfmOptionsCQL regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...ToolEnableDisableBinaryTest.testMaybeChangeDocs regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...tionRestrictedWidePartitionPqCompressedTest[ca] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...tionRestrictedWidePartitionPqCompressedTest[dc] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...alTest.partitionRestrictedWidePartitionTest[ca] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...alTest.partitionRestrictedWidePartitionTest[dc] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...teforceLocalTest.randomizedBqCompressedTest[ca] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
....c.VectorBruteforceLocalTest.randomizedTest[ca] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...ctorBruteforceLocalTest.rangeRestrictedTest[ca] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...s.c.VectorLocalTest.partitionRestrictedTest[ca] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...s.c.VectorLocalTest.partitionRestrictedTest[dc] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...tionRestrictedWidePartitionPqCompressedTest[dc] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
o.a.c.i.s.c.VectorLocalTest.rangeRestrictedTest... regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...d.v.SegmentFlushTest.testFlushBetweenRowIds[ca] regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...d.v.SegmentFlushTest.testFlushBetweenRowIds[db] regression 🔴 🔵🔵🔵🔵🔵🔵🔵

Found 8 known test failures

3 participants