Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: enable read cache and write cache when using remote object stores #5093

Merged

Conversation

killme2008
Copy link
Contributor

@killme2008 killme2008 commented Dec 4, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

Close #4813

What's changed and what's your intention?

  • Enable read and write cache by default for remote object storages.
  • Set the default read cache path to {data_home}/object_cache/read and create subdirectories for multiplex storages by their names.
  • Set the default write cache path to {data_home}/object_cache/write, as the write cache key, IndexKey, includes region_id, eliminating the need for subdirectories for multiplex storages.
  • Change the default cache capacity from 1Gib to 5Gib.

@evenyag Please take a look.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced configuration documentation for object storage and caching.
    • Added new test functions for OpenTelemetry Protocol (OTLP) metrics, traces, and logs.
    • Introduced a new error variant for directory creation failures.
  • Improvements

    • Default cache size increased from 1GiB to 5GiB for object storage.
    • Clarified and updated configuration paths for experimental write cache.
    • Improved error handling and logging for cache directory creation.
  • Bug Fixes

    • Adjusted cache path handling for S3 storage type to prevent unintended caching.

@killme2008 killme2008 requested review from evenyag, v0y4g3r, waynexia and a team as code owners December 4, 2024 08:31
Copy link
Contributor

coderabbitai bot commented Dec 4, 2024

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces significant updates to configuration documentation and files for a system operating in standalone and distributed modes. Key changes include refined descriptions and default values for configuration parameters related to storage and caching, particularly for object storage. The modifications enhance clarity, address default cache settings, and improve error handling for directory creation. New methods for identifying object storage configurations have also been added, along with updates to integration tests to ensure comprehensive coverage of the new features.

Changes

File Change Summary
config/config.md Updated descriptions and default values for storage.cache_path, region_engine.mito.experimental_write_cache, and related parameters.
config/datanode.example.toml Clarified comments, updated default values for cache parameters, and noted deprecated fields.
config/standalone.example.toml Updated cache configuration section, default values for cache parameters, and clarified experimental write cache settings.
src/datanode/src/config.rs Updated DEFAULT_OBJECT_STORE_CACHE_SIZE to 5 GB, added is_object_storage methods to ObjectStoreConfig and StorageConfig.
src/datanode/src/datanode.rs Modified build_store_engines to handle object storage configuration for the Mito engine.
src/datanode/src/lib.rs Added feature flag #![feature(let_chains)].
src/datanode/src/store.rs Updated build_cache_layer to streamline logic for object storage and improved error handling for cache directory creation.
src/mito2/src/config.rs Updated documentation and logic for experimental_write_cache_path and experimental_write_cache_size.
src/mito2/src/error.rs Introduced new error variant CreateDir for directory creation failures.
src/mito2/src/worker.rs Enhanced error handling for directory creation in write_cache_from_config.
src/object-store/src/lib.rs Added public constant OBJECT_CACHE_DIR for default object cache directory name.
tests-integration/src/test_util.rs Modified get_test_store_config to handle cache path for S3 storage types.
tests-integration/tests/http.rs Added new test functions for OTLP metrics, traces, and logs; enhanced existing tests for better validation.
tests-integration/tests/main.rs Added macro for HTTP tests and commented out region failover tests.

Assessment against linked issues

Objective Addressed Explanation
Enable read/write cache by default if using remote object stores (#4813)

Possibly related PRs

Suggested reviewers

  • v0y4g3r
  • evenyag
  • WenyXu
  • waynexia
  • discord9

Poem

In the burrow deep where the data flows,
Caches are set, and the configuration grows.
With paths refined and sizes anew,
The rabbit hops forth, sharing changes with you! 🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@killme2008 killme2008 requested review from WenyXu, evenyag and fengjiachun and removed request for a team, v0y4g3r, evenyag and waynexia December 4, 2024 08:31
@github-actions github-actions bot added breaking-change This pull request contains breaking changes. docs-required This change requires docs update. labels Dec 4, 2024
@github-actions github-actions bot added docs-not-required This change does not impact docs. and removed docs-required This change requires docs update. labels Dec 4, 2024
Copy link

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 48.48485% with 34 lines in your changes missing coverage. Please review.

Project coverage is 83.80%. Comparing base (ce86ba3) to head (3d54fd2).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5093      +/-   ##
==========================================
- Coverage   84.03%   83.80%   -0.23%     
==========================================
  Files        1168     1168              
  Lines      217572   217620      +48     
==========================================
- Hits       182841   182383     -458     
- Misses      34731    35237     +506     

@killme2008
Copy link
Contributor Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Dec 4, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
src/datanode/src/store.rs (1)

149-168: Handle potential errors when creating cache directories

While using tokio::fs::create_dir_all to create the cache directory, consider handling potential I/O errors such as permission issues or existing paths that are not directories. Providing detailed error messages can aid in troubleshooting.

src/mito2/src/config.rs (1)

97-97: Update documentation for default write cache path

Clarifying that the default path is {data_home}/object_cache/write helps users understand where the write cache is stored by default.

config/datanode.example.toml (1)

297-304: LGTM! Consider documenting the rationale for the cache size.

The cache configuration changes look good. The documentation is clear, and the default paths follow a logical structure. The increased cache capacity from 1GiB to 5GiB should improve performance.

Consider adding a comment explaining why 5GiB was chosen as the default cache capacity, which could help users better understand if they need to adjust this value for their use case.

tests-integration/src/test_util.rs (1)

236-238: LGTM! Consider adding test cases for cache behavior.

The explicit cache path handling for different S3 storage configurations is clear and well-documented.

Would you like me to help generate additional test cases to verify:

  1. Cache behavior when disabled (empty string)
  2. Cache behavior when enabled with custom path
  3. Cache size limits and overflow scenarios
src/datanode/src/datanode.rs (1)

402-407: Consider using WARN level for the configuration change log.

While the code correctly enables write cache for object storage, this is a significant configuration change that affects performance and resource usage. Consider using WARN level instead of INFO to make this change more visible in logs, especially during the initial rollout period.

-        info!("Configured 'enable_experimental_write_cache=true' for mito engine.");
+        warn!("Configured 'enable_experimental_write_cache=true' for mito engine.");
src/mito2/src/worker.rs (1)

376-380: Consider validating the cache path before creation.

While the directory creation is implemented correctly, consider adding validation of the cache path to ensure it's absolute and has proper permissions. This can prevent issues in production environments with different permission models.

+    // Validate cache path
+    let cache_path = Path::new(&config.experimental_write_cache_path);
+    ensure!(
+        cache_path.is_absolute(),
+        InvalidConfigSnafu {
+            reason: format!(
+                "Write cache path must be absolute: {}",
+                config.experimental_write_cache_path
+            )
+        }
+    );
+
     tokio::fs::create_dir_all(Path::new(&config.experimental_write_cache_path))
         .await
         .context(CreateDirSnafu {
             dir: &config.experimental_write_cache_path,
         })?;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5092f5f and 2667bec.

📒 Files selected for processing (14)
  • config/config.md (4 hunks)
  • config/datanode.example.toml (2 hunks)
  • config/standalone.example.toml (2 hunks)
  • src/datanode/src/config.rs (4 hunks)
  • src/datanode/src/datanode.rs (1 hunks)
  • src/datanode/src/lib.rs (1 hunks)
  • src/datanode/src/store.rs (4 hunks)
  • src/mito2/src/config.rs (4 hunks)
  • src/mito2/src/error.rs (2 hunks)
  • src/mito2/src/worker.rs (3 hunks)
  • src/object-store/src/lib.rs (1 hunks)
  • tests-integration/src/test_util.rs (1 hunks)
  • tests-integration/tests/http.rs (1 hunks)
  • tests-integration/tests/main.rs (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • src/datanode/src/lib.rs
  • src/object-store/src/lib.rs
🔇 Additional comments (14)
src/datanode/src/store.rs (1)

70-70: Use of is_object_storage() enhances code clarity

The introduction of the is_object_storage() method improves readability by encapsulating the logic for determining if the store is an object storage.

tests-integration/tests/http.rs (1)

Line range hint 32-38: Expand test coverage for OTLP endpoints

The addition of tests for OTLP metrics, traces, and logs significantly improves test coverage. Ensure these tests handle various scenarios, including error conditions and edge cases, to validate the robustness of the new endpoints.

tests-integration/tests/main.rs (1)

25-27: Include HTTP tests for multiple storage backends

Adding http_tests!(File, S3, S3WithCache, Oss, Azblob, Gcs); ensures HTTP functionalities are tested across different storage backends, enhancing reliability.

src/datanode/src/config.rs (2)

63-69: Introduce is_object_storage() method in ObjectStoreConfig

The new method simplifies checks for object storage types, promoting cleaner and more maintainable code.


102-107: Add is_object_storage() to StorageConfig for consistency

Implementing is_object_storage() in StorageConfig provides a consistent interface for determining storage types at a higher configuration level.

src/mito2/src/config.rs (2)

154-154: Increase default write cache size to 5 GiB

Increasing the default experimental_write_cache_size provides better performance for workloads with high write throughput.


245-246: Set default write cache path using OBJECT_CACHE_DIR

Using the OBJECT_CACHE_DIR constant ensures consistency across the application when determining cache directory paths.

config/datanode.example.toml (1)

479-486: LGTM! Verify cache directory creation permissions.

The write cache configuration changes are well-documented and consistent with the read cache changes. The default paths and cache sizes are logically structured.

Let's verify the cache directory creation behavior:

✅ Verification successful

Based on the search results, I can see that the write cache directory creation is properly handled in src/mito2/src/worker.rs:

tokio::fs::create_dir_all(Path::new(&config.experimental_write_cache_path))
    .await
    .context(CreateDirSnafu {
        dir: &config.experimental_write_cache_path,
    })?;

The code uses create_dir_all() which recursively creates directories if they don't exist. This is the same approach used for other cache directories in the codebase, like read cache creation in src/datanode/src/store.rs.

Let's verify if there are any permission-related configurations or handling:


Cache directory creation is properly handled with appropriate error handling

The write cache directory creation is implemented securely in src/mito2/src/worker.rs using tokio::fs::create_dir_all() with proper error handling through the CreateDirSnafu context. The implementation:

  • Creates parent directories recursively if they don't exist
  • Uses the same approach as other cache directories in the codebase
  • Provides error context for troubleshooting
  • Defaults to {data_home}/object_cache/write which follows the system's data directory permissions
  • Is automatically enabled for object storage with appropriate logging
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the code handles cache directory creation and permissions

# Test: Search for directory creation code
rg -A 5 "create_dir|create_directory|fs::create"

# Test: Search for permission handling
rg -A 5 "chmod|permission|mode"

Length of output: 76259


Script:

#!/bin/bash
# Search for write cache permission configurations
rg -A 5 "write_cache.*permission|write_cache.*mode"

# Search for any permission-related code near write cache
rg -B 5 -A 5 "experimental_write_cache" 

Length of output: 18471

config/standalone.example.toml (2)

335-342: LGTM! Configuration is consistent.

The read cache configuration mirrors the changes in datanode.example.toml, maintaining consistency across configuration files.


517-524: LGTM! Write cache configuration is consistent.

The write cache configuration mirrors the changes in datanode.example.toml, maintaining consistency across configuration files.

src/mito2/src/error.rs (1)

679-684: LGTM! Error handling implementation is thorough.

The new CreateDir error variant is well-implemented with:

  • Clear error message format
  • Complete error context (directory path and IO error)
  • Proper integration with the status code system
config/config.md (3)

96-96: LGTM: Clear documentation for storage read cache configuration.

The documentation clearly explains that read cache is configured by default for object storage systems and provides the default path. The recommendation for better performance is helpful for users.


424-424: LGTM: Consistent configuration across modes.

The datanode configuration mirrors the changes in standalone mode, maintaining consistency:

  • Read cache configuration with the same default path
  • Write cache enabled by default for object storage
  • Same cache size of 5GiB

This consistency helps prevent confusion and ensures uniform behavior across deployment modes.

Also applies to: 462-464


134-136: Verify the increased write cache size impact.

The write cache configuration has been updated with:

  • Enabled by default for object storage
  • Default path set to {data_home}/object_cache/write
  • Cache size increased significantly from 1GiB to 5GiB

While the changes improve performance, the increased cache size might impact memory usage.

✅ Verification successful

Let me generate additional scripts to check for any performance benchmarks or tests specifically related to write cache configuration:

The previous search showed a lot of memory-related code but we need to specifically focus on write cache performance implications. Let's check for any tests or documentation around write cache behavior and performance tuning.


Let me check the benchmark results and implementation details to better understand the write cache performance implications:

I want to verify if there are documented performance impacts and implementation safeguards around the increased cache size before making the final assessment.


Based on the gathered information, I can now provide a final response:

The increased write cache size to 5GiB is safe and well-managed.

The codebase shows several safeguards and optimizations around the write cache:

  • The write cache is experimental and only enabled by default for object storage
  • The implementation includes memory monitoring and metrics tracking
  • Performance benchmarks (TSBS) show improved query latencies with larger cache sizes
  • The cache has built-in TTL support and automatic cleanup mechanisms
  • The system includes write buffer rejection thresholds to prevent memory pressure
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any memory-related issues or discussions in the codebase
# that might be relevant to the cache size increase.

# Search for memory-related configurations and comments
rg -i "memory|cache size|buffer size" --type rust

# Look for performance test results or benchmarks
rg -i "benchmark|performance test|memory usage" --type md

Length of output: 76499


Script:

#!/bin/bash
# Search for write cache related tests and benchmarks
rg -i "write.?cache|write.?buffer" --type rust -C 3 tests/
rg -i "write.?cache|write.?buffer" --type md

# Look for any performance related configuration docs
rg -i "performance|tuning" --type md config/

Length of output: 4243


Script:

#!/bin/bash
# Check benchmark results and implementation details
rg -A 10 "write buffer" docs/benchmarks/tsbs/v0.3.2.md

# Look for write cache implementation and tests
rg -i "write.?cache" --type rust -A 5 src/mito2/

Length of output: 32591

Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others LGTM.

Is it necessary to change the default path? I'm ok with the new naming but since it's a breaking change

src/datanode/src/config.rs Outdated Show resolved Hide resolved
@killme2008
Copy link
Contributor Author

Others LGTM.

Is it necessary to change the default path? I'm ok with the new naming but since it's a breaking change

I was hesitant as well, but I believe that failing to standardize the paths for read and write caches will lead to growing chaos in data_home. It's wiser to implement these breaking changes sooner rather than later.

@killme2008 killme2008 mentioned this pull request Dec 5, 2024
9 tasks
@killme2008
Copy link
Contributor Author

@evenyag Please help to take a look

src/datanode/src/config.rs Outdated Show resolved Hide resolved
src/datanode/src/datanode.rs Outdated Show resolved Hide resolved
src/mito2/src/worker.rs Show resolved Hide resolved
@evenyag evenyag added the release-blocker Indicates a release-blocker label Dec 9, 2024
@killme2008 killme2008 force-pushed the feature/enable-object-cache-default branch from 2667bec to 2db232e Compare December 10, 2024 03:16
@killme2008 killme2008 enabled auto-merge December 10, 2024 03:16
@killme2008 killme2008 added this pull request to the merge queue Dec 10, 2024
@evenyag evenyag removed this pull request from the merge queue due to a manual request Dec 10, 2024
src/mito2/src/config.rs Outdated Show resolved Hide resolved
@evenyag evenyag enabled auto-merge December 10, 2024 03:48
@evenyag evenyag added this pull request to the merge queue Dec 10, 2024
Merged via the queue into GreptimeTeam:main with commit 03a2832 Dec 10, 2024
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This pull request contains breaking changes. docs-not-required This change does not impact docs. release-blocker Indicates a release-blocker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable read/write cache by default if using remote object stores
4 participants