Fix compression rate degradation occurring after a dictionary overflow for some workloads (#82)

This PR addresses the progressive degradation of the compression rate observed over time in Lightstep's span-type data. To help analyze the dynamics at play, instrumentation has been added that reports the average compression rate in response to various schema modifications. A new CLI command has also been introduced; it extends the simulation capabilities to cover diverse OTel Arrow stream life cycles, including variations in batch size and in the number of batches per stream.

The root cause of the diminishing compression rate has been identified as a dictionary overflow event. The overflow triggered an automatic fallback to a column without dictionary encoding, which is a standard and often appropriate response. However, scenarios have been identified where keeping dictionary encoding and resetting the dictionary may be more beneficial.

To choose between the two responses, a ratio is used: the number of distinct values in the dictionary divided by the number of values inserted into the dictionary. This ratio indicates when a dictionary reset is preferable to the default fallback procedure. Empirical analysis led to a default threshold of 0.3 for this ratio. Further research may refine this threshold with a more systematic approach to its determination.

The following chart compares compression efficiency gains before and after the optimization.

![compression-efficiency-gain-after-optimization](https://github.com/open-telemetry/otel-arrow/assets/657994/ed3375c3-a553-476f-807f-2b45f6e29b6c)
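
The ratio-based decision described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the names `OverflowAction`, `DistinctRatio`, and `ChooseOverflowAction` are hypothetical, and the comparison direction (reset when the ratio is below the threshold, meaning values repeat often) is an assumption, since the message does not say which side of 0.3 triggers the reset.

```go
package main

import "fmt"

// OverflowAction is a hypothetical type naming the two responses to a
// dictionary overflow that the commit message describes.
type OverflowAction int

const (
	// FallbackToPlainColumn drops dictionary encoding for the column.
	FallbackToPlainColumn OverflowAction = iota
	// ResetDictionary keeps dictionary encoding and starts a fresh dictionary.
	ResetDictionary
)

func (a OverflowAction) String() string {
	if a == ResetDictionary {
		return "reset dictionary"
	}
	return "fall back to plain column"
}

// DefaultResetThreshold is the empirically chosen default from the commit message.
const DefaultResetThreshold = 0.3

// DistinctRatio returns distinct values divided by inserted values.
// With no insertions it returns 1.0, which is treated as "no repetition observed".
func DistinctRatio(distinct, inserted uint64) float64 {
	if inserted == 0 {
		return 1.0
	}
	return float64(distinct) / float64(inserted)
}

// ChooseOverflowAction picks a response to an overflow.
// ASSUMPTION: a ratio below the threshold means values repeat often, so the
// dictionary is still worth keeping and is reset; otherwise fall back.
func ChooseOverflowAction(distinct, inserted uint64, threshold float64) OverflowAction {
	if DistinctRatio(distinct, inserted) < threshold {
		return ResetDictionary
	}
	return FallbackToPlainColumn
}

func main() {
	// 65,536 distinct values over 1,000,000 insertions: ratio is about 0.066.
	fmt.Println(ChooseOverflowAction(65536, 1_000_000, DefaultResetThreshold))
	// 65,536 distinct values over 100,000 insertions: ratio is about 0.655.
	fmt.Println(ChooseOverflowAction(65536, 100_000, DefaultResetThreshold))
}
```

Under these assumptions, the first call selects a reset because most inserted values were repeats, while the second selects the fallback because most inserted values were unique.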
Showing 51 changed files with 1,062 additions and 249 deletions.