-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: register & deregister region failure detectors actively #4223
feat: register & deregister region failure detectors actively #4223
Conversation
WalkthroughThe changes introduce a Changes
Tip Early access features: disabledWe are currently testing the following features in early access:
Note:
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Outside diff range and nitpick comments (1)
src/meta-srv/src/procedure/region_migration.rs (1)
Line range hint
158-203
: Assess the impact of the failure detector on region migration logic.The integration of the
region_failure_detector_controller
into the region migration logic is a significant change. It is essential to conduct thorough testing, particularly integration tests, to ensure that the failure detection mechanisms do not adversely affect the migration processes.Would you like help in setting up the integration tests or reviewing the existing test scenarios to ensure comprehensive coverage?
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (22)
- src/cmd/src/standalone.rs (2 hunks)
- src/common/meta/src/ddl.rs (4 hunks)
- src/common/meta/src/ddl/create_table.rs (1 hunks)
- src/common/meta/src/ddl/drop_database.rs (4 hunks)
- src/common/meta/src/ddl/drop_database/cursor.rs (3 hunks)
- src/common/meta/src/ddl/drop_database/executor.rs (7 hunks)
- src/common/meta/src/ddl/drop_database/metadata.rs (2 hunks)
- src/common/meta/src/ddl/drop_database/start.rs (3 hunks)
- src/common/meta/src/ddl/drop_table.rs (1 hunks)
- src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
- src/common/meta/src/ddl_manager.rs (2 hunks)
- src/common/meta/src/test_util.rs (2 hunks)
- src/meta-srv/src/failure_detector.rs (4 hunks)
- src/meta-srv/src/handler/failure_handler.rs (2 hunks)
- src/meta-srv/src/metasrv/builder.rs (4 hunks)
- src/meta-srv/src/procedure/region_migration.rs (4 hunks)
- src/meta-srv/src/procedure/region_migration/test_util.rs (2 hunks)
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
- src/meta-srv/src/procedure/utils.rs (2 hunks)
- src/meta-srv/src/region/failure_detector.rs (2 hunks)
- src/meta-srv/src/region/supervisor.rs (13 hunks)
- tests-integration/src/standalone.rs (2 hunks)
Files not reviewed due to errors (1)
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (no review received)
Additional comments not posted (34)
src/meta-srv/src/handler/failure_handler.rs (2)
29-32
: Constructor updated to includeHeartbeatAcceptor
.The addition of
HeartbeatAcceptor
as a parameter in the constructor is aligned with the PR's goal to enhance failure detection. Ensure that all instances ofRegionFailureHandler
creation are updated accordingly.
76-82
: Updated test method to includeHeartbeatAcceptor
.The test method
test_handle_heartbeat
now properly initializesHeartbeatAcceptor
, which is crucial for testing the new functionality. Ensure the rest of the test suite is similarly updated to cover all new changes.src/meta-srv/src/region/failure_detector.rs (2)
54-54
: Methodregion_failure_detector
correctly handlesIdent
.This method efficiently handles the retrieval or creation of
PhiAccrualFailureDetector
instances, ensuring that failure detection can be dynamically managed for eachIdent
.
64-76
: Methodmaybe_init_region_failure_detector
efficiently handles initialization.This method correctly initializes a
PhiAccrualFailure Detector
with a timestamp, which is crucial for accurate failure detection based on recent activity.src/common/meta/src/ddl/drop_database/start.rs (1)
92-92
: Proper initialization ofcluster_id
inDropDatabaseContext
.The
cluster_id
field is consistently initialized across different test scenarios. Ensure that the actual usage ofcluster_id
in production scenarios is similarly consistent and correct.Also applies to: 108-108, 131-131
src/common/meta/src/ddl/drop_database/metadata.rs (1)
121-121
: Proper initialization ofcluster_id
inDropDatabaseContext
.The
cluster_id
field is consistently initialized across different test scenarios. Ensure that the actual usage ofcluster_id
in production scenarios is similarly consistent and correct.Also applies to: 148-148
src/common/meta/src/ddl.rs (1)
105-125
: Introduction ofRegionFailureDetectorController
and its no-op implementation.The new trait
RegionFailureDetectorController
and its no-operation implementationNoopRegionFailureDetectorControl
are well-defined. They provide a clear interface for failure detection management, enhancing the system's modularity and extensibility.src/common/meta/src/ddl/drop_database.rs (1)
38-38
: Ensure appropriate imports forClusterId
.The file now uses
ClusterId
, but it's unclear if it's properly imported. This could potentially lead to compilation errors if not handled correctly.Verification successful
The import of
ClusterId
is correctly handled.The
ClusterId
type is defined within the crate as a type alias foru64
, ensuring that the import statementuse crate::ClusterId;
insrc/common/meta/src/ddl/drop_database.rs
is appropriate and will not cause any compilation issues.
src/common/meta/src/lib.rs
:pub type ClusterId = u64;
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify if `ClusterId` is imported in the file. # Test: Search for the import statement for `ClusterId`. rg --type rust 'use crate::ClusterId;' src/common/meta/src/ddl/drop_database.rsLength of output: 103
Script:
#!/bin/bash # Description: Verify if `ClusterId` is defined within the crate. # Test: Search for the definition of `ClusterId` within the crate. ast-grep --lang rust --pattern 'struct ClusterId { $$$ }'Length of output: 59
Script:
#!/bin/bash # Description: Verify if `ClusterId` is defined within the crate. # Test: Search for any definition of `ClusterId` within the crate. rg --type rust 'ClusterId'Length of output: 10475
src/common/meta/src/test_util.rs (2)
27-27
: Addition ofNoopRegionFailureDetectorControl
for testing purposes.The introduction of
NoopRegionFailureDetectorControl
in the testing utilities is appropriate for simulating behavior without actual side effects. Ensure that this aligns with the testing strategies and that it is used consistently across tests.
185-185
: Usage ofNoopRegionFailureDetectorControl
innew_ddl_context
.This ensures that the testing environment simulates the failure detection mechanism without performing real operations, which is crucial for unit tests that don't require actual side effects.
src/meta-srv/src/procedure/utils.rs (2)
117-117
: Addition ofNoopRegionFailureDetectorControl
in procedural utilities.The inclusion of
NoopRegionFailureDetectorControl
in procedural utilities is vital for testing and simulating procedural operations without real side effects. This is particularly important for unit tests and integration tests where actual failure detection is not required.
229-229
: Consistent use ofNoopRegionFailureDetectorControl
innew_ddl_context
.The consistent use of
NoopRegionFailureDetectorControl
across different utility functions ensures that the testing environment is uniform and predictable, which is crucial for reliable testing outcomes.
[APROVED]tests-integration/src/standalone.rs (2)
26-26
: Inclusion ofNoopRegionFailureDetectorControl
in integration tests.The use of
NoopRegionFailureDetectorControl
in integration tests is appropriate for simulating the behavior of the region failure detection without actual network or hardware interaction, which is crucial for isolating tests from external dependencies.
202-202
: Proper integration ofNoopRegionFailureDetectorControl
inDdlContext
.This ensures that the integration tests simulate the region failure detection mechanism accurately without performing real operations, which is essential for testing in a controlled environment.
src/common/meta/src/ddl/drop_table.rs (1)
282-282
: Ensure consistent usage ofcluster_id
across related methods.The addition of
cluster_id
to theDropTableExecutor
constructor is consistent with the changes made in other parts of the system to enhance region failure handling. Confirm thatcluster_id
is being used appropriately within theDropTableExecutor
methods to manage region-specific operations effectively.Verification successful
Ensure consistent usage of
cluster_id
across related methods.The
cluster_id
is consistently used in theDropTableExecutor
constructor and within a method to manage region-specific operations. The usage appears to be appropriate for the intended functionality.
DropTableExecutor
constructor: Initializescluster_id
.- Method usage: Utilizes
self.cluster_id
in region-related operations.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify consistent usage of `cluster_id` in related methods. # Test: Search for `cluster_id` usage in `DropTableExecutor` methods. Expect: Consistent and correct usage. rg --type rust --context 5 $'cluster_id' src/common/meta/src/ddl/drop_table/executor.rsLength of output: 1090
Script:
#!/bin/bash # Description: Verify consistent usage of `cluster_id` in all methods of `DropTableExecutor`. # Step 1: List all methods within the `DropTableExecutor` implementation. ast-grep --lang rust --pattern 'impl DropTableExecutor { $$$ }' src/common/meta/src/ddl/drop_table/executor.rs # Step 2: Check for `cluster_id` usage within each method of `DropTableExecutor`. rg --type rust 'impl DropTableExecutor' -A 1000 src/common/meta/src/ddl/drop_table/executor.rs | rg 'fn ' | sed 's/^/rg cluster_id -A 5 -B 5 /' | bashLength of output: 15703
Script:
#!/bin/bash # Description: Verify consistent usage of `cluster_id` in all methods of `DropTableExecutor`. # Step 1: List all methods within the `DropTableExecutor` implementation. ast-grep --lang rust --pattern 'impl DropTableExecutor { $$$ }' src/common/meta/src/ddl/drop_table/executor.rs # Step 2: Check for `cluster_id` usage within each method of `DropTableExecutor`. rg --type rust 'impl DropTableExecutor' -A 1000 src/common/meta/src/ddl/drop_table/executor.rs | rg 'fn ' | awk '{print $2}' | xargs -I {} rg 'cluster_id' -A 5 -B 5 src/common/meta/src/ddl/drop_table/executor.rs | grep -B 5 -A 5 -E 'fn {}'Length of output: 15713
Script:
#!/bin/bash # Description: Verify consistent usage of `cluster_id` in all methods of `DropTableExecutor`. # Search for `cluster_id` within the `DropTableExecutor` implementation block. rg 'impl DropTableExecutor' -A 1000 src/common/meta/src/ddl/drop_table/executor.rs | rg 'cluster_id' -A 5 -B 5Length of output: 1111
src/common/meta/src/ddl/drop_table/executor.rs (1)
54-59
: Proper integration ofcluster_id
inDropTableExecutor
.The addition of
cluster_id
to theDropTableExecutor
constructor is correctly implemented. Ensure thatcluster_id
is utilized effectively within the executor's methods to manage cluster-specific operations.Verification successful
Proper integration of
cluster_id
inDropTableExecutor
.The addition of
cluster_id
to theDropTableExecutor
constructor is correctly implemented and effectively used within its methods.
DropTableExecutor
constructor initializescluster_id
properly.cluster_id
is utilized in a method that processes region leaders.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the effective usage of `cluster_id` within `DropTableExecutor` methods. # Test: Search for `cluster_id` usage in `DropTableExecutor` methods. Expect: Effective and correct usage. rg --type rust --context 5 $'cluster_id' src/common/meta/src/ddl/drop_table/executor.rsLength of output: 1090
src/common/meta/src/ddl/drop_database/executor.rs (1)
190-190
: Consistency check forcluster_id
default value inDropDatabaseContext
.The repeated initialization of
DropDatabaseContext
with a defaultcluster_id
of 0 across multiple methods raises concerns about the appropriateness of this default value. Ensure thatcluster_id
is set dynamically based on operational requirements or a meaningful default is provided.Also applies to: 203-203, 257-257, 347-347, 377-377
src/common/meta/src/ddl/create_table.rs (1)
293-296
: Add logging for failure detector registration.Consider adding a log statement after registering the failure detectors to improve traceability and help in debugging.
+ info!("Registered failure detectors for table {table_id} with identifiers {ident:?}"); self.context .region_failure_detector_controller .register_failure_detectors(ident) .await;
src/meta-srv/src/metasrv/builder.rs (3)
25-27
: Check usage of newly imported modules.The imports
NoopRegionFailureDetectorControl
andRegionFailureDetectorControllerRef
are part of the new failure detection mechanism. Ensure that these are utilized effectively in the server initialization and configuration.
290-311
: Ensure correct conditional logic for failure detection setup.The conditional setup for the
region_failure_detector_controller
andregion_supervisor_ticker
is crucial for the correct functioning of the failure detection mechanism. Make sure that the conditions (options.enable_region_failover
andis_remote_wal
) and their implications are correctly implemented and tested.- if !is_remote_wal && options.enable_region_failover { + if options.enable_region_failover && !is_remote_wal { return error::UnexpectedSnafu { violated: "Region failover is not supported in the local WAL implementation!", } .fail(); }Verification successful
Conditional logic for failure detection setup is correctly implemented.
The condition
if !is_remote_wal && options.enable_region_failover
is present and correctly handles the scenario where region failover is not supported in the local WAL implementation.
src/meta-srv/src/metasrv/builder.rs
: Line 290Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the conditional logic in failure detection setup. # Test: Check if the conditions are correctly implemented. rg --type rust $'if options.enable_region_failover && !is_remote_wal'Length of output: 70
Script:
#!/bin/bash # Description: Verify the usage of `is_remote_wal` and `options.enable_region_failover` in the codebase. # Search for the presence and usage of `is_remote_wal` in the codebase. rg --type rust 'is_remote_wal' # Search for the presence and usage of `options.enable_region_failover` in the codebase. rg --type rust 'options.enable_region_failover'Length of output: 920
Line range hint
325-355
: Validate final server assembly in theMetasrvBuilder
build method.The final assembly of the
Metasrv
object involves integrating various components such asregion_failure_detector_controller
,region_migration_manager
, and others. Ensure that all components are correctly initialized and integrated, and that the configuration options are respected throughout the process.src/meta-srv/src/failure_detector.rs (1)
Line range hint
18-88
: Review enhancements toPhiAccrualFailureDetector
.The enhancements to the
PhiAccrualFailureDetector
, including new traits (Debug
,Clone
,PartialEq
) and configuration options (PhiAccrualFailureDetectorOptions
), are crucial for the flexibility and testability of the failure detection mechanism. Ensure that these enhancements are correctly implemented and that they integrate seamlessly with the existing logic.src/meta-srv/src/procedure/region_migration/test_util.rs (2)
23-23
: Added import of NoopRegionFailureDetectorControl.This import is essential for the new testing environment setup that includes the failure detector control. Ensure that this class is appropriately utilized within the test suites to simulate the behavior of the region failure detectors.
154-154
: Integration of NoopRegionFailureDetectorControl into the TestingEnv context factory.The integration of
NoopRegionFailureDetectorControl
into thecontext_factory
method is crucial for ensuring that the region failure detection mechanism is appropriately simulated during testing. This change helps maintain the fidelity of the test environment relative to the production environment where such detectors are active.src/meta-srv/src/region/supervisor.rs (4)
19-19
: Import of RegionFailureDetectorController.The import of
RegionFailureDetectorController
is critical for the new functionality that allows dynamic management of region failure detectors. This aligns with the PR's objectives of enhancing the system's reliability and fault tolerance.
79-110
: Addition of new Event variants for failure detector management.The introduction of
RegisterFailureDetectors
andDeregisterFailureDetectors
as new event types in theEvent
enum is a significant enhancement. These events facilitate the dynamic management of failure detectors, which is central to the functionality introduced in this PR. Ensure that these events are handled appropriately in the system's event loop to maintain robustness.
133-139
: Initialization of RegionSupervisorTicker.The setup of
RegionSupervisorTicker
with a configurable tick interval is a good practice, allowing for flexibility in how frequently the system checks for region failures. This setup is crucial for maintaining the system's responsiveness to failure conditions.
307-320
: Methods for registering and deregistering failure detectors in RegionSupervisor.These methods are crucial for managing the state of region failure detectors dynamically. The implementation is straightforward and aligns with the system's requirements for handling region failures proactively.
src/cmd/src/standalone.rs (1)
28-28
: Approved import of failure detector related entities.The import of
DdlContext
,NoopRegionFailureDetectorControl
, andProcedureExecutorRef
aligns with the PR's goal to enhance region failure detection capabilities.src/common/meta/src/ddl_manager.rs (1)
Line range hint
808-859
: Approved the import and initialization ofNoopRegionFailureDetectorControl
The import and initialization of
NoopRegionFailureDetectorControl
in the test setup are correctly handled. This ensures that the new region failure detection feature can be adequately tested.src/meta-srv/src/procedure/region_migration.rs (4)
158-158
: Integration ofregion_failure_detector_controller
in structs and methods.The
region_failure_detector_controller
is added to multiple structs (DefaultContextFactory
,Context
) and is used in method implementations. Ensure that this integration does not interfere with existing functionalities and adheres to the intended use cases for region failure management.Also applies to: 164-164, 168-168, 176-176, 190-190, 203-203
158-176
: Proper initialization ofregion_failure_detector_controller
.The initialization in
DefaultContextFactory
appears correct. It is important to ensure that theregion_failure_detector_controller
is properly passed to each new context created, maintaining thread safety and consistency.
32-32
: Ensure appropriate usage of importedRegionFailureDetectorControllerRef
.It's crucial to verify that the imported
RegionFailureDetectorControllerRef
is used correctly throughout the file. This is part of the new mechanism to actively manage region failure detection.
190-190
: Verify the propagation ofregion_failure_detector_controller
in context creation.The
region_failure_detector_controller
is cloned and passed to theContext
struct in thenew_context
method. This is crucial for ensuring that each context has its own reference to the controller, which is necessary for correct failure detection operations.Also applies to: 203-203
Verification successful
The
region_failure_detector_controller
is correctly cloned and propagated in thenew_context
method.The
new_context
method insrc/meta-srv/src/procedure/region_migration.rs
correctly clones theregion_failure_detector_controller
and passes it to theContext
struct, ensuring that each context has its own reference to the controller.
- Verified at:
src/meta-srv/src/procedure/region_migration.rs:190
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the correct propagation and cloning of `region_failure_detector_controller` in context creation. # Test: Search for the `new_context` method calls and verify the propagation of the controller. rg --type rust $'new_context'Length of output: 7604
Script:
#!/bin/bash # Description: Extract the implementation of the `new_context` method to verify the propagation and cloning of `region_failure_detector_controller`. # Find the implementation of `new_context` method. ast-grep --lang rust --pattern 'fn new_context($_, $_) -> $_ { $$$ }' src/meta-srv/src/procedure/region_migration.rsLength of output: 1164
45a2ac8
to
d0f2134
Compare
@coderabbitai resolve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (14)
- src/common/meta/src/ddl/create_table.rs (1 hunks)
- src/common/meta/src/ddl/drop_database.rs (4 hunks)
- src/common/meta/src/ddl/drop_database/cursor.rs (3 hunks)
- src/common/meta/src/ddl/drop_database/executor.rs (7 hunks)
- src/common/meta/src/ddl/drop_database/metadata.rs (2 hunks)
- src/common/meta/src/ddl/drop_database/start.rs (3 hunks)
- src/common/meta/src/ddl/drop_table.rs (1 hunks)
- src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
- src/meta-srv/src/handler/failure_handler.rs (2 hunks)
- src/meta-srv/src/metasrv/builder.rs (4 hunks)
- src/meta-srv/src/procedure/region_migration.rs (4 hunks)
- src/meta-srv/src/procedure/region_migration/test_util.rs (2 hunks)
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
- src/meta-srv/src/region/supervisor.rs (13 hunks)
Files skipped from review as they are similar to previous changes (13)
- src/common/meta/src/ddl/create_table.rs
- src/common/meta/src/ddl/drop_database.rs
- src/common/meta/src/ddl/drop_database/cursor.rs
- src/common/meta/src/ddl/drop_database/executor.rs
- src/common/meta/src/ddl/drop_database/metadata.rs
- src/common/meta/src/ddl/drop_database/start.rs
- src/common/meta/src/ddl/drop_table.rs
- src/common/meta/src/ddl/drop_table/executor.rs
- src/meta-srv/src/handler/failure_handler.rs
- src/meta-srv/src/metasrv/builder.rs
- src/meta-srv/src/procedure/region_migration.rs
- src/meta-srv/src/procedure/region_migration/test_util.rs
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
Additional comments not posted (7)
src/meta-srv/src/region/supervisor.rs (7)
Line range hint
19-39
: Ensure appropriate use of newly added imports and declarations.The imports and global declarations related to
RegionFailureDetectorController
and related entities are correctly added. These are necessary for the new functionality around region failure detection. Ensure these are used consistently throughout the file.
39-39
: Review theDatanodeHeartbeat
struct.The
DatanodeHeartbeat
struct is well-defined and includes necessary fields for handling heartbeats. This struct will play a critical role in the new failure detection mechanism by providing timely data from data nodes.
79-110
: Ensure correct integration of new events in theEvent
enum.The addition of
RegisterFailureDetectors
andDeregisterFailureDetectors
to theEvent
enum is crucial for the dynamic management of failure detectors. Ensure that these events are handled appropriately in all relevant parts of the system to maintain robustness.
[APROVED]
133-140
: Review modifications in theRegionSupervisorTicker
struct.The modifications to
RegionSupervisorTicker
are appropriate and support the new functionality of handling tick events for failure detection. Ensure that the implementation does not introduce any performance bottlenecks or race conditions.
249-252
: Review theHeartbeatAcceptor
implementation.The
HeartbeatAcceptor
is correctly implemented to forward heartbeat signals. Ensure that this implementation is efficient and error-resistant, particularly in high-load scenarios.
290-295
: Ensure robust handling of failure detector events in the main loop.The handling of
RegisterFailureDetectors
andDeregisterFailureDetectors
events in the main loop ofRegionSupervisor
is crucial for dynamic failure management. Ensure that these events are processed efficiently and correctly.
Line range hint
456-622
: Review the test implementations forRegionSupervisor
.The tests appear to be comprehensive and effectively simulate various scenarios related to region failure detection. Ensure that all new functionalities are covered by the tests to prevent regressions.
Actions performedComments resolved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- config/metasrv.example.toml (1 hunks)
Files skipped from review due to trivial changes (1)
- config/metasrv.example.toml
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4223 +/- ##
==========================================
- Coverage 84.86% 84.59% -0.28%
==========================================
Files 1045 1046 +1
Lines 184265 184597 +332
==========================================
- Hits 156384 156158 -226
- Misses 27881 28439 +558 |
e03ba5a
to
e4f318e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- config/config.md (1 hunks)
- config/metasrv.example.toml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- config/metasrv.example.toml
Additional context used
LanguageTool
config/config.md
[uncategorized] ~31-~31: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload.
For now, gRPC tls config does not ...(AUTO_HYPHEN)
[uncategorized] ~31-~31: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...r now, gRPC tls config does not support auto reload. | |mysql
| -- | -- | MySQL server o...(AUTO_HYPHEN)
[uncategorized] ~40-~40: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | |postgres
| -- | -- | PostgresSQL ...(AUTO_HYPHEN)
[uncategorized] ~49-~49: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | |opentsdb
| -- | -- | OpenTSDB pro...(AUTO_HYPHEN)
[grammar] ~93-~93: “Google” is a proper noun and needs to be capitalized.
Context: ...e| String |
None` | The scope of the google cloud storage.
**It's only used whe...(A_GOOGLE)
[grammar] ~94-~94: “Google” is a proper noun and needs to be capitalized.
Context: ...g |None
| The credential path of the google cloud storage.
**It's only used whe...(A_GOOGLE)
[style] ~134-~134: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |
None| The log level. Can be
info/
debug/
warn/
error. | |
...(MISSING_IT_THERE)
[grammar] ~143-~143: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...Forstandalone
mode,self_import
is recommend to collect metrics generated by itself ...(BEEN_PART_AGREEMENT)
[uncategorized] ~179-~179: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload.
For now, gRPC tls config does not ...(AUTO_HYPHEN)
[uncategorized] ~179-~179: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...r now, gRPC tls config does not support auto reload. | |mysql
| -- | -- | MySQL server o...(AUTO_HYPHEN)
[uncategorized] ~188-~188: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | |postgres
| -- | -- | PostgresSQL ...(AUTO_HYPHEN)
[uncategorized] ~197-~197: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload | |opentsdb
| -- | -- | OpenTSDB pro...(AUTO_HYPHEN)
[style] ~221-~221: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |
None| The log level. Can be
info/
debug/
warn/
error. | |
...(MISSING_IT_THERE)
[grammar] ~230-~230: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...Forstandalone
mode,self_import
is recommend to collect metrics generated by itself ...(BEEN_PART_AGREEMENT)
[style] ~283-~283: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |
None| The log level. Can be
info/
debug/
warn/
error. | |
...(MISSING_IT_THERE)
[grammar] ~292-~292: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...Forstandalone
mode,self_import
is recommend to collect metrics generated by itself ...(BEEN_PART_AGREEMENT)
[uncategorized] ~326-~326: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...for Certificate and key file change and auto reload.
For now, gRPC tls config does not ...(AUTO_HYPHEN)
[uncategorized] ~326-~326: It appears that a hyphen is missing (if ‘auto’ is not used in the context of ‘cars’).
Context: ...r now, gRPC tls config does not support auto reload. | |runtime
| -- | -- | The runtime ...(AUTO_HYPHEN)
[grammar] ~374-~374: “Google” is a proper noun and needs to be capitalized.
Context: ...e| String |
None` | The scope of the google cloud storage.
**It's only used whe...(A_GOOGLE)
[grammar] ~375-~375: “Google” is a proper noun and needs to be capitalized.
Context: ...g |None
| The credential path of the google cloud storage.
**It's only used whe...(A_GOOGLE)
[style] ~415-~415: To form a complete sentence, be sure to include a subject.
Context: ...vel| String |
None| The log level. Can be
info/
debug/
warn/
error. | |
...(MISSING_IT_THERE)
[grammar] ~424-~424: Consider using either the past participle “recommended” or the present participle “recommending” here.
Context: ...Forstandalone
mode,self_import
is recommend to collect metrics generated by itself ...(BEEN_PART_AGREEMENT)
Markdownlint
config/config.md
5-5: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
6-6: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
7-7: Expected: 2; Actual: 4
Unordered list indentation(MD007, ul-indent)
151-151: Expected: 1; Actual: 2
Multiple consecutive blank lines(MD012, no-multiple-blanks)
238-238: Expected: 1; Actual: 2
Multiple consecutive blank lines(MD012, no-multiple-blanks)
300-300: Expected: 1; Actual: 2
Multiple consecutive blank lines(MD012, no-multiple-blanks)
37-37: null
Bare URL used(MD034, no-bare-urls)
185-185: null
Bare URL used(MD034, no-bare-urls)
247-247: null
Bare URL used(MD034, no-bare-urls)
Additional comments not posted (1)
config/config.md (1)
262-262
: Updated Heartbeat Pause ConfigurationThe
acceptable_heartbeat_pause
setting was increased from3000ms
to10000ms
. This change should help in environments with variable network conditions but might delay the detection of node failures. It's crucial to monitor the effects of this change on the system's ability to detect and respond to failures in a timely manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- src/common/meta/src/ddl.rs (4 hunks)
- src/common/meta/src/ddl/create_table.rs (1 hunks)
- src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
Files skipped from review as they are similar to previous changes (3)
- src/common/meta/src/ddl/create_table.rs
- src/common/meta/src/ddl/drop_table/executor.rs
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
Additional comments not posted (2)
src/common/meta/src/ddl.rs (2)
149-150
: Addition ofregion_failure_detector_controller
inDdlContext
.The addition of
region_failure_detector_controller
toDdlContext
aligns with the PR's objective to enhance region failure detection. This change integrates the newly defined controller into the system context, allowing for broader access and management within DDL operations. Ensure that all usages ofDdlContext
correctly initialize this new field.
105-129
: Review of theRegionFailureDetectorController
interface and itsNoopRegionFailureDetectorControl
implementation.
- Interface Definition (Lines 107-117): The interface and its methods are well-defined for async operations. The method parameters use tuples, which align with previous suggestions by
fengjiachun
. However, detailed Rustdoc comments are needed to explain the parameters and expected behavior.- Noop Implementation (Lines 119-128): The noop implementation is clear and adheres to the async trait. It correctly implements the interface with no operations, which is typical for a noop pattern. Ensure that this implementation is only used where appropriate, as it effectively disables failure detection.
: The method signatures match the suggestions by
fengjiachun
.Would you like me to add detailed Rustdoc comments to these methods?
@fengjiachun PTAL |
Tip For best results, initiate chat on the files or code changes. @fengjiachun Hi! How can I assist you today? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/meta-srv/src/procedure/region_migration.rs (4 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/meta-srv/src/procedure/region_migration.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/common/meta/src/ddl/create_table.rs (1 hunks)
- src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
Files skipped from review as they are similar to previous changes (2)
- src/common/meta/src/ddl/create_table.rs
- src/common/meta/src/ddl/drop_table/executor.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- tests-fuzz/targets/failover/fuzz_failover_mito_regions.rs (2 hunks)
Additional comments not posted (3)
tests-fuzz/targets/failover/fuzz_failover_mito_regions.rs (3)
267-267
: Reduced parallelism in fuzz tests.The parallelism setting has been reduced from 8 to 4. While this might stabilize the tests, consider verifying if it still effectively uncovers concurrency-related issues.
87-87
: Reduced range for row generation in fuzz tests.The maximum limit for rows has been halved from 4096 to 2048. This reduction could make the tests less stressful and more focused, but it's important to ensure that it still meets the testing requirements for edge cases.
89-89
: Reduced range for insert generation in fuzz tests.The maximum limit for inserts has been reduced from 16 to 8. This change might focus the testing on more typical scenarios, but ensure it still adequately tests the system's handling of multiple concurrent inserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another suggestion: can we add some tests to cover these cases?
Yes, tests will be introduced in following PRs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- src/common/meta/src/ddl.rs (4 hunks)
- src/common/meta/src/ddl/create_table.rs (2 hunks)
- src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
- src/common/meta/src/ddl/utils.rs (2 hunks)
- src/meta-srv/src/procedure/region_migration.rs (5 hunks)
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
- src/meta-srv/src/region/failure_detector.rs (5 hunks)
- src/meta-srv/src/region/supervisor.rs (12 hunks)
Files skipped from review as they are similar to previous changes (7)
- src/common/meta/src/ddl.rs
- src/common/meta/src/ddl/create_table.rs
- src/common/meta/src/ddl/drop_table/executor.rs
- src/meta-srv/src/procedure/region_migration.rs
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
- src/meta-srv/src/region/failure_detector.rs
- src/meta-srv/src/region/supervisor.rs
Additional comments not posted (1)
src/common/meta/src/ddl/utils.rs (1)
134-147
: Validate the new utility function for converting region routes.The new function
convert_region_routes_to_detecting_regions
correctly mapsRegionRoute
objects toDetectingRegion
tuples. This is crucial for the new failure detection features. However, ensure that all callers handle the output correctly, especially regarding empty or null leader peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/meta-srv/src/metasrv/builder.rs (4 hunks)
- src/meta-srv/src/region/supervisor.rs (12 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/meta-srv/src/region/supervisor.rs
Additional comments not posted (2)
src/meta-srv/src/metasrv/builder.rs (2)
25-27
: Approved import changes for region failure detection.The newly imported
NoopRegionFailureDetectorControl
andRegionFailureDetectorControllerRef
are crucial for the enhancements related to region failure detection as described in the PR. Ensure these are integrated and utilized correctly in the subsequent code.
290-311
: Refine error handling and verify logical conditions for region failure detection setup.The conditional logic for setting up the region failure detectors seems appropriate. However, consider using a more descriptive error type or message than
UnexpectedSnafu
to clarify the specific conditions under which the error occurs.- return error::UnexpectedSnafu { - violated: "Region failover is not supported in the local WAL implementation!", - } - .fail(); + return Err(MetasrvError::ConfigurationError { + message: "Region failover requires a remote WAL setup. Please configure your WAL settings appropriately.", + });Ensure the conditions
is_remote_wal
andenable_region_failover
are documented and clearly explained in the system configuration or setup documentation.
40b24d5
to
cef5b65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- src/common/meta/src/ddl.rs (4 hunks)
- src/common/meta/src/ddl/drop_table/executor.rs (7 hunks)
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs (5 hunks)
- src/meta-srv/src/region/failure_detector.rs (4 hunks)
- src/meta-srv/src/region/supervisor.rs (13 hunks)
Files skipped from review as they are similar to previous changes (3)
- src/common/meta/src/ddl/drop_table/executor.rs
- src/meta-srv/src/procedure/region_migration/update_metadata/rollback_downgraded_region.rs
- src/meta-srv/src/region/failure_detector.rs
Additional comments not posted (6)
src/common/meta/src/ddl.rs (3)
19-19
: Imports are appropriate and well-organized.These imports are necessary for the new functionality related to region failure detection and handling.
Also applies to: 33-33
151-153
: Integration of failure detection into DDL operations is well executed.The addition of
region_failure_detector_controller
inDdlContext
struct allows for seamless integration of the new failure detection functionality.
160-164
: Effective management of failure detectors through new methods inDdlContext
.The methods
register_failure_detectors
andderegister_failure_detectors
are well-implemented, providing essential functionality for managing region failure detectors dynamically.Also applies to: 170-174
src/meta-srv/src/region/supervisor.rs (3)
19-19
: Imports are correctly updated to support new functionality.The imports are necessary for the new event types and failure detector management functionality introduced in this file.
79-80
: New event types for failure detectors are well-designed and implemented.The addition of
RegisterFailureDetectors
andDeregisterFailureDetectors
event types allows for effective asynchronous management of failure detectors, enhancing the system's resilience and adaptability.Also applies to: 103-110
310-323
: Effective event handling for failure detectors inRegionSupervisor
.The methods
register_failure_detectors
andderegister_failure_detectors
are correctly implemented to handle respective events, ensuring robust management of region failure detectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#4161
What's changed and what's your intention?
Checklist
Summary by CodeRabbit
New Features
Improvements
Configuration
acceptable_heartbeat_pause
value from"3000ms"
to"10000ms"
for better failure detection timing.Tests