CASSANDRA-18802: Parallelized UCS compactions #3688
CASSANDRA-18802
Implements parallelization of UCS compactions that split their output into multiple sstables starting at boundaries known in advance. To do this, introduces a `CompositeLifecycleTransaction` that consists of multiple `PartialLifecycleTransaction` parts and only commits after all individual parts have completed, and creates multiple compaction tasks under the same composite transaction. Each individual task has a separate operation UUID, but to make it possible to recognize composite tasks, their UUIDs have a part index as their sequence component (visible as "800n" as the second-to-last component of the UUID string), while non-parallelized ones have 0.
Major compaction is also changed to take advantage of parallelization. To make sure that it can be carried out on an active node, the number of threads used by a major compaction can be limited, to half the compaction threads by default. This is implemented by creating `CompositeCompactionTasks` that execute multiple tasks serially and grouping the major compaction tasks into a limited number of composites.
Because we do not currently support arbitrary filtering of the ranges of an sstable, parallelized compactions cannot use early open. Despite this, they are able to achieve comparable or better performance.
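The "commit only after all parts have completed" behaviour of the composite transaction can be illustrated with a minimal, self-contained sketch. This is not the code in this patch: the class `CompositeTransactionSketch`, its `Part` type and the `State` enum are hypothetical names used only for illustration, and the choice to abort the whole composite when any part aborts is an assumption of the sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the CompositeLifecycleTransaction from the patch.
class CompositeTransactionSketch {
    enum State { PENDING, COMMITTED, ABORTED }

    private final Part[] parts;
    private final AtomicInteger remaining;                      // parts that have not finished yet
    private final AtomicBoolean failed = new AtomicBoolean(false);
    private volatile State state = State.PENDING;

    CompositeTransactionSketch(int partCount) {
        if (partCount <= 0)
            throw new IllegalArgumentException("partCount must be positive");
        parts = new Part[partCount];
        for (int i = 0; i < partCount; i++)
            parts[i] = new Part(i);
        remaining = new AtomicInteger(partCount);
    }

    Part part(int index) { return parts[index]; }

    State state() { return state; }

    // Called exactly once per part. The composite is resolved only by the last part to finish.
    private void partFinished(boolean success) {
        if (!success)
            failed.set(true);
        if (remaining.decrementAndGet() == 0)
            // Assumption of this sketch: one aborted part aborts the whole composite.
            state = failed.get() ? State.ABORTED : State.COMMITTED;
    }

    final class Part {
        final int index;
        private final AtomicBoolean done = new AtomicBoolean(false);

        Part(int index) { this.index = index; }

        // Repeated calls on the same part are ignored.
        void commit() { if (done.compareAndSet(false, true)) partFinished(true); }
        void abort()  { if (done.compareAndSet(false, true)) partFinished(false); }
    }

    public static void main(String[] args) {
        CompositeTransactionSketch txn = new CompositeTransactionSketch(3);
        txn.part(0).commit();
        txn.part(1).commit();
        System.out.println(txn.state()); // still pending: one part is outstanding
        txn.part(2).commit();
        System.out.println(txn.state()); // resolved once the last part finishes
    }
}
```

Each part can be handed to a separate compaction thread. The counter makes the thread that finishes last responsible for resolving the composite, so no part's output becomes visible before the others are done.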
The first two commits bring in CASSANDRA-20092, which is needed for correct calculation of total compaction sizes. The next commit introduces some utilities that are helpful but not ultimately necessary for this patch (it can be easily adjusted to not use them; it is likely that SAI changes will bring these in independently). The final commit simplifies the method of creating the compaction-strategy-specific scanner list variation and is also optional.
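As a note on the "800n" marker mentioned in the description: in the standard UUID string layout, the fourth group holds the two variant bits followed by the 14-bit clock sequence, so a sequence value n below 16 renders as `800n`. The sketch below shows this with plain `java.util.UUID`. It assumes the standard version 1 bit layout and does not use Cassandra's own time-UUID class; `PartIndexUuidSketch`, `withPartIndex` and `partIndex` are hypothetical names.

```java
import java.util.UUID;

// Hypothetical illustration using java.util.UUID; not the patch's UUID generation code.
class PartIndexUuidSketch {
    // The 14 clock-sequence bits sit just below the two variant bits of the low 64 bits.
    private static final long SEQUENCE_MASK = 0x3FFF000000000000L;

    // Returns a copy of the given UUID with its clock sequence replaced by partIndex.
    static UUID withPartIndex(UUID base, int partIndex) {
        if (partIndex < 0 || partIndex > 0x3FFF)
            throw new IllegalArgumentException("part index must fit in 14 bits");
        long lsb = base.getLeastSignificantBits() & ~SEQUENCE_MASK;
        lsb |= ((long) partIndex) << 48;
        return new UUID(base.getMostSignificantBits(), lsb);
    }

    // Reads the part index back; clockSequence() is only defined for version 1 UUIDs.
    static int partIndex(UUID id) {
        return id.clockSequence();
    }

    public static void main(String[] args) {
        // A fixed version 1 UUID with sequence 0, standing in for a non-parallelized task.
        UUID base = new UUID(0x0000000000001000L, 0x8000000000000000L);
        UUID part3 = withPartIndex(base, 3);
        System.out.println(base);   // 00000000-0000-1000-8000-000000000000
        System.out.println(part3);  // 00000000-0000-1000-8003-000000000000
    }
}
```

Under these assumptions, a log line or file name containing a UUID whose fourth group is `8000` would belong to a non-parallelized operation, while `8003` would indicate a part of a composite one.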