[BUGFIX] add Health Check for Range over gRPC Connection Loop #2828
base: main
Conversation
📝 Walkthrough

This pull request introduces extensive changes across the repository. The updates include modifications to Makefiles (adding new variables and updating command invocations), refactoring of Go packages to pass context parameters through gRPC client, pool, and test methods, and improvements to error logging and handling. API documentation in the Swagger JSON and additional docs has been expanded with fuller summaries, structured status-code tables, and troubleshooting information. Additionally, numerous dependency and version updates have been applied — changes to Kubernetes deployment annotations, GitHub Action versions, and the Rust toolchain — along with minor typographical corrections in API and documentation files.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant gRPCClient
    participant ConnectionPool
    Client->>+gRPCClient: ConnectedAddrs(ctx)
    gRPCClient->>+ConnectionPool: rangeConns(ctx, force, callback)
    ConnectionPool-->>-gRPCClient: Connection health data
    gRPCClient-->>-Client: List of active addresses
```
Suggested labels
Suggested reviewers
📜 Recent review details
Configuration used: CodeRabbit UI
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (3)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (110)
🔇 Additional comments (2)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
🪧 Tips
Chat
There are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
[CHATOPS:HELP] ChatOps commands.
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/net/grpc/pool/pool.go (2)
143-160: Fallback initial connection dial.
The fallback logic for re-dialing on the same or a new port is well structured. Consider extracting this repeated dial fallback into a helper function if it grows more complex.
437-464: New Reconnect method logic.
The flow checks whether the pool is closing or already healthy, then conditionally triggers fresh DNS resolution. This seems correct; however, ensure that concurrent calls to `Reconnect` won't lead to inconsistent states (e.g., changing `reconnectHash` while another goroutine is also resolving DNS). Overall, a good expansion of functionality.

pkg/gateway/lb/handler/grpc/aggregation.go (1)
108-108: Consider using structured logging with error context.
The added debug logging is good for visibility, but consider using structured logging to include additional context such as the target address and request ID. This will make debugging easier by providing more context around the errors.
Example:

```diff
-log.Debug(err)
+log.Debug("aggregation search error",
+	"error", err,
+	"target", target,
+	"request_id", bcfg.GetRequestId())
```

Also applies to: 120-120, 177-177, 189-189
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- Makefile.d/k3d.mk (3 hunks)
- internal/backoff/backoff.go (2 hunks)
- internal/net/grpc/client.go (12 hunks)
- internal/net/grpc/client_test.go (2 hunks)
- internal/net/grpc/pool/pool.go (27 hunks)
- internal/net/grpc/pool/pool_test.go (1 hunk)
- internal/observability/metrics/grpc/grpc.go (3 hunks)
- internal/test/mock/grpc/grpc_client_mock.go (1 hunk)
- internal/test/mock/grpc_testify_mock.go (1 hunk)
- pkg/gateway/lb/handler/grpc/aggregation.go (4 hunks)
- pkg/gateway/mirror/service/mirror.go (1 hunk)
- versions/K3S_VERSION (1 hunk)
✅ Files skipped from review due to trivial changes (2)
- versions/K3S_VERSION
- pkg/gateway/mirror/service/mirror.go
🔇 Additional comments (36)
internal/net/grpc/pool/pool.go (17)
23-40: Imports look consistent.
The newly added imports (`maps`, `codes`, `status`, `singleflight`) fit the added functionality (maps.Clone usage, gRPC error codes, singleflight to handle concurrency). No issues spotted.
50-54: Context-aware method signatures.
Introducing context parameters into `Disconnect` and `Get` helps ensure clean cancellations and timeouts. This update aligns well with modern best practices.
79-79: Singleflight for connection concurrency control.
Defining `group singleflight.Group[Conn]` is a neat approach for preventing duplicate connections and consolidating in-flight requests.
93-94: Shared metrics map under locking.
Using `mu` to guard the `metrics` map ensures thread safety. Confirm consistent locking around every access to avoid data races.
107-107: Instantiating singleflight.
Initializing `p.group = singleflight.New[Conn]()` on creation is helpful for grouping connection calls. Good addition.
193-193
: Debug log usage.
The debug log here is straightforward. No issues noted.
241-241
: Lock usage within store method.
Acquiring a write lock for the pool store operation is correct. Ensures correct pointer updates.
355-356
: Additional debug message.
Just a debug statement; no functional impact.
368-370: Check if pool is closing before singleTargetConnect.
Short-circuiting when `p == nil || p.closing.Load()` helps avoid redundant dial attempts. Overall logic is sound.
Also applies to: 376-376
466-467: Context-aware disconnection.
Providing a `ctx` argument to `Disconnect` is a nice extension for controlled shutdowns.
494-494
: Dial debug logging.
This addition is harmless and helpful for tracing dial attempts.
555-559: Calculation of unhealthy connections.
Assigning `unhealthy = pl - cnt` implies that the remaining connections are unhealthy. If `cnt` ever surpasses `pl`, this yields a negative result. Verify that such a scenario cannot occur, or add a safeguard if needed.
604-633: Enhanced Do method error recovery flow.
• Incorporating status checks and re-closing the connection if it's canceled or fails is a solid approach to preserving resilience.
• The subsequent `refreshConn` call (line 626) properly attempts a retry.
Looks reasonable overall.
640-644: Get method with a context parameter.
Using `getHealthyConn` matches the "do once, fail fast" logic. A good step for context awareness.
795-843
: poolConn.Close with delayed shutdown logic.
The approach thoroughly checks connection states, waiting for a stable state or the context to expire. Good for graceful closings.
845-872: isHealthy extended logging.
The function properly triggers `conn.Connect()` for IDLE states and recurses upon state changes. This handles transient states robustly.
874-883: Metrics function.
Exposing a safe clone of the `metrics` map is a good pattern to avoid concurrency issues with external callers.

internal/net/grpc/client.go (7)
93-93: Method now accepts context.
Switching `ConnectedAddrs()` to be context-aware aligns with gRPC best practices, enabling cancellation/timeouts during address checking.
253-253: Using rangeConns with force=true for internal loops.
These lines trigger a forced iteration over connections for rebalancing/health checks. Ensure concurrency is well managed when multiple rebalances happen simultaneously.
Also applies to: 289-289
418-418: Non-forced rangeConns calls.
Calling `rangeConns(ctx, false, ...)` indicates partial reconnection logic only if connections are unhealthy. This approach is consistent with the incremental healing design.
Also applies to: 481-481
568-568
: Checking IsHealthy before usage.
The approach to skip or reconnect invalid pools is correct. Logging warns about unreachable addresses, which is helpful for debugging.
Also applies to: 637-637
703-703: RoundRobin approach calls rangeConns.
Applying `rangeConns(ctx, false, ...)` is a valid pattern for enumerating connections until one is healthy. No concerns here.
1088-1101
: ConnectedAddrs now uses rangeConns.
Implementation filters out only healthy connections, returning an updated slice of addresses. Helps external callers handle connection states.
1121-1149: New internal helper rangeConns.
• The `force` parameter, selecting forced vs. selective iteration, is a nice design.
• The check that reconnects unhealthy pool objects enables self-healing.
Looks cohesive.

Makefile.d/k3d.mk (3)
20-20: New K3D_NETWORK variable.
Declaring `K3D_NETWORK = bridge` is straightforward; no issues.
33-34: Customized install directory.
Using `K3D_INSTALL_DIR=$(BINDIR)` and ensuring `chmod a+x` is performed afterwards looks good. This keeps binaries organized.
46-47: Eviction policy arguments for K3S.
These additional kubelet arguments help manage disk-usage thresholds. The usage is correct for the "agent" node contexts. No issues spotted.

internal/test/mock/grpc/grpc_client_mock.go (1)
54-54: LGTM! Context parameter properly added.
The method signature update is consistent with the interface changes across the codebase.
internal/observability/metrics/grpc/grpc.go (3)
33-34: LGTM! New metrics for pool connection health.
Clear and descriptive metric name and description for tracking pool connection health.
69-77: LGTM! Proper metric view configuration.
The view is correctly configured with sum aggregation, which is appropriate for connection count metrics.
83-102: LGTM! Well-implemented metrics registration.
The implementation properly:
- Creates an observable gauge
- Handles registration errors
- Sets up callback for metrics collection
internal/backoff/backoff.go (1)
244-244: LGTM! Improved metrics cloning.
Good use of maps.Clone for a more concise and efficient implementation.
internal/net/grpc/client_test.go (1)
3110-3110: LGTM! Context parameter additions improve consistency.
The addition of context parameters to `ConnectedAddrs` and `rangeConns` aligns with the context-aware design pattern used throughout the codebase. This change enables better control over cancellation and timeouts.
Also applies to: 3306-3306
internal/net/grpc/pool/pool_test.go (3)
2347-2347: LGTM! Context parameter added correctly.
The change to add `context.Background()` to the `Disconnect` method call aligns with the updated method signature.
1-14: LGTM! License header is properly maintained.
The Apache 2.0 license header is correctly formatted and includes the appropriate copyright information.
18-108: Implement missing test cases.
The file contains extensive test case structures, but most are commented out with TODO markers. Consider implementing these test cases to ensure proper coverage of the pool functionality, especially for the updated `Disconnect` method.
Let's check the test coverage:
Also applies to: 110-265, 267-422, 424-584, 586-744, 746-909, 911-1061, 1063-1213, 1215-1360, 1362-1531, 1533-1696, 1698-1866, 1868-2035, 2037-2201, 2203-2352, 2354-2522, 2524-2684, 2686-2849, 2851-3015, 3017-3187, 3189-3339, 3341-3491, 3493-3657, 3659-3823, 3825-3975, 3977-4127, 4129-4232, 4234-4321
Force-pushed 3b75a31 to ac007bc
Deploying vald with

| Latest commit | 9a713bb |
| Status | ✅ Deploy successful! |
| Preview URL | https://daffb205.vald.pages.dev |
| Branch Preview URL | https://bugfix-internal-grpc-add-hea.vald.pages.dev |
Force-pushed ac007bc to e317f7a
Actionable comments posted: 0
🧹 Nitpick comments (5)
internal/observability/metrics/grpc/grpc.go (1)
81-103: Consider error-handling improvements in the callback function.
The callback function for observing pool connection metrics could benefit from more robust error handling.
Consider this improvement:

```diff
 func (gm *grpcServerMetrics) Register(m metrics.Meter) error {
 	healthyConn, err := m.Int64ObservableGauge(
 		poolConnMetricsName,
 		metrics.WithDescription(poolConnMetricsDescription),
 		metrics.WithUnit(metrics.Dimensionless),
 	)
 	if err != nil {
 		return err
 	}
 	_, err = m.RegisterCallback(
 		func(ctx context.Context, o api.Observer) error {
 			ms := pool.Metrics(ctx)
 			if len(ms) == 0 {
 				return nil
 			}
+			var errs error
 			for name, cnt := range ms {
-				o.ObserveInt64(healthyConn, cnt, api.WithAttributes(attribute.String(gm.poolTargetAddrKey, name)))
+				if err := o.ObserveInt64(healthyConn, cnt, api.WithAttributes(attribute.String(gm.poolTargetAddrKey, name))); err != nil {
+					errs = errors.Join(errs, fmt.Errorf("failed to observe metrics for %s: %w", name, err))
+				}
 			}
-			return nil
+			return errs
 		},
 		healthyConn,
 	)
 	return err
 }
```

internal/net/grpc/pool/pool.go (2)
92-95: Consider using sync.Map for metrics.
The current implementation uses a global map with a mutex, which could be replaced with sync.Map for better concurrent access patterns. (Note that sync.Map is not generic, so values would be stored as `any` rather than `int64`.)
Consider this improvement:

```diff
-var (
-	mu      sync.RWMutex
-	metrics map[string]int64 = make(map[string]int64)
-)
+var metrics sync.Map // map[string]int64
```
466-492: Consider adding metrics for disconnection events.
The Disconnect method could benefit from tracking disconnection events in metrics.
Consider adding a new metric for tracking disconnection events:

```diff
 func (p *pool) Disconnect(ctx context.Context) (err error) {
 	log.Debug("Disconnecting...")
 	p.closing.Store(true)
 	defer p.closing.Store(false)
 	emap := make(map[string]error, p.len())
+	disconnectedConns := 0
 	err = p.loop(ctx, func(ctx context.Context, _ int, pc *poolConn) bool {
 		if pc != nil && pc.conn != nil {
 			ierr := pc.conn.Close()
 			if ierr != nil {
 				if !errors.Is(ierr, context.DeadlineExceeded) && !errors.Is(ierr, context.Canceled) {
 					log.Debugf("failed to close connection pool addr = %s\terror = %v", pc.addr, ierr)
 					emap[ierr.Error()] = err
 				} else {
 					log.Debugf("Disconnect loop operation canceled while closing pool member connection to %s,\terror: %v", pc.addr, ierr)
 					return false
 				}
 			}
+			disconnectedConns++
 		}
 		return true
 	})
 	p.flush()
+	mu.Lock()
+	metrics[p.addr] = 0 // Reset metrics for this pool
+	mu.Unlock()
 	for _, e := range emap {
 		err = errors.Join(err, e)
 	}
 	return err
 }
```

internal/net/grpc/client.go (1)
1128-1140: Consider adding a retry mechanism for connection recovery.
The connection recovery logic could benefit from a retry mechanism with backoff.
Consider this improvement:

```diff
 if p == nil || !p.IsHealthy(ctx) {
+	var retries int
+	maxRetries := 3
+	backoffDuration := time.Second
+	for retries < maxRetries {
 		pc, err := p.Connect(ctx)
 		if pc == nil || err != nil || !pc.IsHealthy(ctx) {
 			if pc != nil {
 				if derr := pc.Disconnect(ctx); derr != nil {
 					log.Debugf("Failed to disconnect unhealthy connection for %s: %v", addr, derr)
 				}
 			}
+			retries++
+			if retries < maxRetries {
+				select {
+				case <-ctx.Done():
+					return true
+				case <-time.After(backoffDuration * time.Duration(retries)):
+					continue
+				}
+			}
 			log.Debugf("Unhealthy connection detected for %s during gRPC Connection Range over Loop:\t%s", addr, p.String())
 			return true
 		}
 		p = pc
+		break
+	}
 }
```

internal/net/grpc/pool/pool_test.go (1)
2347-2347: Update test to include context parameter.
The test calls `p.Disconnect(context.Background())` with a default background context. Consider adding test cases that verify context cancellation behavior.
Add test cases to verify:
- Context cancellation during disconnect
- Context timeout handling
- Context propagation to underlying connections

```diff
-err := p.Disconnect(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+defer cancel()
+err := p.Disconnect(ctx)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- Makefile.d/k3d.mk (3 hunks)
- internal/backoff/backoff.go (2 hunks)
- internal/net/grpc/client.go (12 hunks)
- internal/net/grpc/client_test.go (2 hunks)
- internal/net/grpc/pool/pool.go (27 hunks)
- internal/net/grpc/pool/pool_test.go (1 hunk)
- internal/observability/metrics/grpc/grpc.go (3 hunks)
- internal/test/mock/grpc/grpc_client_mock.go (1 hunk)
- internal/test/mock/grpc_testify_mock.go (1 hunk)
- pkg/gateway/lb/handler/grpc/aggregation.go (4 hunks)
- pkg/gateway/mirror/service/mirror.go (1 hunk)
- versions/K3S_VERSION (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (8)
- versions/K3S_VERSION
- internal/backoff/backoff.go
- internal/test/mock/grpc_testify_mock.go
- pkg/gateway/mirror/service/mirror.go
- internal/test/mock/grpc/grpc_client_mock.go
- Makefile.d/k3d.mk
- pkg/gateway/lb/handler/grpc/aggregation.go
- internal/net/grpc/client_test.go
⏰ Context from checks skipped due to timeout of 90000ms (93)
- GitHub Check: build / build
- GitHub Check: build / dump-contexts-to-log
🔇 Additional comments (9)
internal/observability/metrics/grpc/grpc.go (3)
33-34: LGTM! Clear and descriptive metric names.
The new metric names and descriptions are clear and follow the existing naming pattern.
37-39: LGTM! Well-structured struct update.
The addition of the poolTargetAddrKey field with a clear purpose enhances the struct's functionality for tracking pool connection metrics.
69-77: LGTM! Consistent view registration.
The new view for pool connection metrics follows the same pattern as existing views and uses an appropriate aggregation type (Sum) for counting connections.
internal/net/grpc/pool/pool.go (2)
586-589: LGTM! Thread-safe metrics update.
The metrics update is properly protected by the mutex lock.
873-881: LGTM! Clean metrics retrieval implementation.
The Metrics function safely clones and returns the metrics map.
internal/net/grpc/client.go (2)
1121-1148: LGTM! Improved connection health checking.
The rangeConns method now properly handles connection health checks and includes a force parameter for different use cases.
1088-1100: LGTM! Context-aware ConnectedAddrs implementation.
The ConnectedAddrs method now properly uses context for health checks.
internal/net/grpc/pool/pool_test.go (2)
16-108: Implement missing test cases for the `New` function.
The test cases for the `New` function are commented out with TODO placeholders. Since this PR adds health check functionality, it's important to have test coverage for pool initialization.
Run this script to check if there are any existing tests for the `New` function.
Would you like me to help implement test cases for the `New` function that verify:
- Pool initialization with health check configuration
- Error cases for invalid health check parameters
- Default health check behavior
4234-4320: Add test cases for health check functionality.
The `isHealthy` test function is crucial for this PR's health check feature but lacks test implementations.
Run this script to check if there are any existing health check tests.
Would you like me to help implement test cases that verify:
- Health check success/failure scenarios
- Health check timeout handling
- Health check error conditions
- Health check metrics collection
Force-pushed 716f8e0 to a453411
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2828      +/-   ##
==========================================
- Coverage   23.92%   17.18%   -6.75%
==========================================
  Files         547      136     -411
  Lines       54586    15042   -39544
==========================================
- Hits        13061     2585   -10476
+ Misses      40741    12154   -28587
+ Partials      784      303     -481
```

☔ View full report in Codecov by Sentry.
Force-pushed 8759e30 to 9ecb684
Actionable comments posted: 2
🔭 Outside diff range comments (1)
internal/net/grpc/client_test.go (1)
1-3490: Test Implementation Incompleteness – Add Test Cases
It appears that the test file contains extensive scaffolding with 38 TODO markers and 38 placeholder "test_case_X" sections. Although the placeholders are present, no actual test logic is implemented. To ensure thorough coverage, we need to fill these scaffolds with meaningful tests addressing:
• Validating gRPC client connectivity, including context parameter changes
• Testing both happy paths and error scenarios
• Covering concurrent and ordered operations as defined in methods such as Range, RangeConcurrent, ConnectedAddrs, and rangeConns
Would you like assistance in generating concrete test implementations for these sections?
🔗 Analysis chain
Add test cases for the TODO sections.
The test file has extensive test case structures defined but most are commented out and marked as TODO. This indicates incomplete test coverage which should be addressed.
Let's verify the test coverage:
Would you like me to help generate test cases for the TODO sections? I can provide implementations that cover:
- Happy path scenarios
- Error cases
- Edge cases
- Concurrent execution scenarios
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Count total number of test functions vs implemented test cases
echo "Test functions with TODO markers:"
rg -c "TODO test cases" internal/net/grpc/client_test.go
echo "Actual implemented test cases:"
rg -c "test_case_[0-9]+" internal/net/grpc/client_test.go
```

Length of output: 272
🧹 Nitpick comments (6)
internal/observability/metrics/grpc/grpc.go (1)
81-103: Consider adding error handling for empty metrics.
The implementation looks good but could benefit from additional error handling.
Consider adding error handling when `ms` is nil:

```diff
 		func(ctx context.Context, o api.Observer) error {
 			ms := pool.Metrics(ctx)
-			if len(ms) == 0 {
+			if ms == nil || len(ms) == 0 {
 				return nil
 			}
 			for name, cnt := range ms {
 				o.ObserveInt64(healthyConn, cnt, api.WithAttributes(attribute.String(gm.poolTargetAddrKey, name)))
 			}
 			return nil
 		},
```

internal/net/grpc/pool/pool.go (1)
800-848: Consider adding timeout validation.
The connection closing logic could benefit from timeout validation.
Consider validating the delay parameter:

```diff
 func (pc *poolConn) Close(ctx context.Context, delay time.Duration) error {
+	if delay <= 0 {
+		return errors.New("delay must be positive")
+	}
 	tdelay := delay / 10
 	if tdelay < time.Millisecond*5 {
 		tdelay = time.Millisecond * 5
```

internal/net/grpc/client.go (1)
1121-1148: Consider adding a connection retry limit.
The connection retry logic in rangeConns could benefit from a retry limit to prevent infinite loops.
Consider adding a retry limit:

```diff
 func (g *gRPCClient) rangeConns(ctx context.Context, force bool, fn func(addr string, p pool.Conn) bool) error {
 	var cnt int
+	const maxRetries = 3
+	retries := 0
 	g.conns.Range(func(addr string, p pool.Conn) bool {
 		if force {
 			cnt++
 			return fn(addr, p)
 		}
 		if p == nil || !p.IsHealthy(ctx) {
+			if retries >= maxRetries {
+				log.Debugf("Max retries reached for %s", addr)
+				return true
+			}
+			retries++
 			pc, err := p.Connect(ctx)
```

internal/net/grpc/client_test.go (2)
87-87: Consider disabling parallel test execution.
The tests use `t.Parallel()`, but given the shared nature of gRPC connections and the potential for race conditions, parallel execution may lead to flaky tests.
Consider removing `t.Parallel()` or implementing proper test isolation through:
- Unique connection pools per test
- Isolated test servers
- Proper cleanup between tests
Also applies to: 441-441, 632-632, 824-824, 1019-1019, 1212-1212, 1408-1408, 1611-1611, 1786-1786, 2138-2138, 2322-2322, 2518-2518, 2707-2707, 3071-3071
87-103: Add goleak cleanup in TestMain.
While individual tests use `goleak.VerifyNone()`, it would be better to centralize this in a `TestMain` function.
Add a TestMain function:

```go
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m,
		goleak.IgnoreCurrent(),
		// Add other specific goroutine ignore rules
	)
}
```

internal/net/grpc/pool/pool_test.go (1)
45-46: Implement test cases for the pool operations.
The test file has a well-structured test table setup but lacks actual test cases. I can help implement test cases for various pool operations like connection management, health checks, and reconnection logic.
Would you like me to help generate test cases for any specific pool operations? For example, I can start with:
- Connection initialization tests
- Health check tests
- Reconnection tests
- Error handling tests
Also applies to: 147-148, 304-305, 466-467, 624-625, 789-790, 949-950, 1101-1102, 1248-1249, 1407-1408, 1579-1580, 1746-1747, 1915-1916, 2083-2084, 2241-2242, 2402-2403, 2566-2567, 2729-2730, 2897-2898, 3065-3066, 3227-3228, 3379-3380, 3539-3540, 3704-3705, 3863-3864, 4015-4016, 4157-4158, 4257-4258
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- Makefile.d/k3d.mk (3 hunks)
- internal/backoff/backoff.go (2 hunks)
- internal/net/grpc/client.go (12 hunks)
- internal/net/grpc/client_test.go (2 hunks)
- internal/net/grpc/pool/pool.go (26 hunks)
- internal/net/grpc/pool/pool_test.go (1 hunk)
- internal/net/net.go (1 hunk)
- internal/observability/metrics/grpc/grpc.go (3 hunks)
- internal/test/mock/grpc/grpc_client_mock.go (1 hunk)
- internal/test/mock/grpc_testify_mock.go (1 hunk)
- pkg/gateway/lb/handler/grpc/aggregation.go (4 hunks)
- pkg/gateway/mirror/service/mirror.go (1 hunk)
- versions/K3S_VERSION (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (7)
- pkg/gateway/mirror/service/mirror.go
- internal/test/mock/grpc_testify_mock.go
- internal/test/mock/grpc/grpc_client_mock.go
- Makefile.d/k3d.mk
- versions/K3S_VERSION
- internal/backoff/backoff.go
- pkg/gateway/lb/handler/grpc/aggregation.go
🔇 Additional comments (9)
internal/observability/metrics/grpc/grpc.go (3)
33-34: LGTM! Clear and descriptive metric naming.
The new metric for tracking pool connections has a clear name and description that accurately reflect its purpose.
37-39: LGTM! Good field naming and initialization.
The `poolTargetAddrKey` field is appropriately named and initialized with a descriptive value in the `New` function.
69-77: LGTM! Proper metric view configuration.
The view for pool connections is correctly configured with `AggregationSum`, which is appropriate for counting healthy connections.

internal/net/net.go (1)
199-199: LGTM! Improved IP address logging.
Using `StringExpanded()` provides a more detailed representation of IP addresses in logs, which is helpful for debugging.

internal/net/grpc/pool/pool.go (2)
544-607: LGTM! Robust health check implementation.
The health check implementation is thorough and handles various edge cases appropriately:
- Checks connection state
- Handles reconnection attempts
- Updates metrics
- Proper error handling
92-95: 🛠️ Refactor suggestion — Consider a more thread-safe metrics initialization.
The global metrics map initialization could be improved for thread safety.
Consider using sync.Once for thread-safe initialization:

```diff
 var (
 	mu      sync.RWMutex
-	metrics map[string]int64 = make(map[string]int64)
+	metrics     map[string]int64
+	metricsOnce sync.Once
 )
+
+func init() {
+	metricsOnce.Do(func() {
+		metrics = make(map[string]int64)
+	})
+}
```

Likely invalid or redundant comment.
internal/net/grpc/client.go (1)
1088-1100: LGTM! Proper context handling in ConnectedAddrs.
The implementation correctly uses context for connection health checks and properly handles errors.
internal/net/grpc/pool/pool_test.go (2)
2347-2347: LGTM! The context parameter addition is consistent.
The change to add a `context.Background()` parameter to the `p.Disconnect()` call aligns with the goal of improving context handling in gRPC connections.
16-108: Implement the TODO test cases to ensure comprehensive test coverage.
The file contains numerous test function stubs with TODO placeholders. Given that this is a gRPC connection pool implementation, comprehensive test coverage is crucial for ensuring reliability and proper connection handling.
Run the following script to check test coverage.
Also applies to: 109-265, 266-422, 423-584, 585-744, 745-909, 910-1061, 1062-1213, 1214-1360, 1361-1531, 1532-1697, 1698-1866, 1867-2035, 2036-2201, 2202-2353, 2354-2522, 2523-2684, 2685-2849, 2850-3015, 3016-3187, 3188-3339, 3340-3491, 3492-3657, 3658-3823, 3824-3975, 3976-4127, 4128-4232, 4233-4321
Force-pushed 278a532 to 4247d97
Actionable comments posted: 1
🧹 Nitpick comments (5)
internal/observability/metrics/grpc/grpc.go (1)
81-104: Consider adding error handling for callback registration.
The error from `RegisterCallback` is correctly propagated, but consider adding specific error handling for the callback function itself to prevent potential panics.

```diff
 func (gm *grpcServerMetrics) Register(m metrics.Meter) error {
 	// The metrics are dynamically registered at the grpc server interceptor package,
 	healthyConn, err := m.Int64ObservableGauge(
 		poolConnMetricsName,
 		metrics.WithDescription(poolConnMetricsDescription),
 		metrics.WithUnit(metrics.Dimensionless),
 	)
 	if err != nil {
 		return err
 	}
 	_, err = m.RegisterCallback(
 		func(ctx context.Context, o api.Observer) error {
 			ms := pool.Metrics(ctx)
 			if len(ms) == 0 {
 				return nil
 			}
+			var errs error
 			for name, cnt := range ms {
-				o.ObserveInt64(healthyConn, cnt, api.WithAttributes(attribute.String(gm.poolTargetAddrKey, name)))
+				if err := o.ObserveInt64(healthyConn, cnt, api.WithAttributes(attribute.String(gm.poolTargetAddrKey, name))); err != nil {
+					errs = errors.Join(errs, fmt.Errorf("failed to observe metrics for %s: %w", name, err))
+				}
 			}
-			return nil
+			return errs
 		},
 		healthyConn,
 	)
 	return err
 }
```
internal/net/grpc/pool/pool.go (2)
92-95: Consider using sync.Map for metrics.
The global `metrics` map with a mutex could potentially become a bottleneck under high concurrency. Consider using `sync.Map` for better performance. (Note: `sync.Map` in the standard library is not generic, so the replacement would be a plain `sync.Map` storing `string` keys and `int64` values.)

```diff
-var (
-	mu      sync.RWMutex
-	metrics map[string]int64 = make(map[string]int64)
-)
+// metrics stores string keys (target addrs) and int64 values (healthy conns).
+var metrics sync.Map
```
592-594: Consider batching metrics updates.
The metrics update in `IsHealthy` could be optimized by batching updates or using atomic operations. (Note: Go forbids taking the address of a map element, so `atomic.StoreInt64(&metrics[p.addr], ...)` would not compile as written; the map would need to hold `*atomic.Int64` values instead.)

```diff
-mu.Lock()
-metrics[p.addr] = int64(pl - unhealthy)
-mu.Unlock()
+// assumes metrics is a map[string]*atomic.Int64 whose entries are created up front
+metrics[p.addr].Store(int64(pl - unhealthy))
```
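If atomic updates are desired, one workable pattern stores `*atomic.Int64` values keyed by address, since each pointed-to counter can be updated lock-free. This sketch is illustrative and not code from the PR:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counters maps an address to a pointer so each value can be updated
// atomically; Go forbids taking the address of a map element directly.
var counters sync.Map // string -> *atomic.Int64

// counterFor returns the counter for addr, creating it on first use.
func counterFor(addr string) *atomic.Int64 {
	v, _ := counters.LoadOrStore(addr, new(atomic.Int64))
	return v.(*atomic.Int64)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counterFor("10.0.0.1:8081").Add(1) // lock-free increment
		}()
	}
	wg.Wait()
	fmt.Println(counterFor("10.0.0.1:8081").Load()) // 100
}
```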
1129-1156
: Consider adding timeout for health checks.The
rangeConns
method should consider adding a timeout for health checks to prevent long-running operations.func (g *gRPCClient) rangeConns(ctx context.Context, force bool, fn func(addr string, p pool.Conn) bool) error { + // Add timeout for health checks + ctx, cancel := context.WithTimeout(ctx, 5*time.Second) + defer cancel() + var cnt int g.conns.Range(func(addr string, p pool.Conn) bool { if force { cnt++ return fn(addr, p) } if p == nil || !p.IsHealthy(ctx) { pc, err := p.Connect(ctx) if pc == nil || err != nil || !pc.IsHealthy(ctx) { if pc != nil { if derr := pc.Disconnect(ctx); derr != nil { log.Debugf("Failed to disconnect unhealthy connection for %s: %v", addr, derr) } } log.Debugf("Unhealthy connection detected for %s during gRPC Connection Range over Loop:\t%s", addr, p.String()) return true } p = pc } cnt++ return fn(addr, p) }) if cnt == 0 { return errors.ErrGRPCClientConnNotFound("*") } return nil }internal/net/grpc/client_test.go (1)
3110-3110: Add timeout to context in test.
When testing network operations, it's recommended to use a context with a timeout to prevent tests from hanging indefinitely.

```diff
-gotAddrs := g.ConnectedAddrs(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+defer cancel()
+gotAddrs := g.ConnectedAddrs(ctx)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- Makefile (2 hunks)
- Makefile.d/functions.mk (2 hunks)
- Makefile.d/k3d.mk (3 hunks)
- internal/backoff/backoff.go (2 hunks)
- internal/net/grpc/client.go (16 hunks)
- internal/net/grpc/client_test.go (2 hunks)
- internal/net/grpc/pool/pool.go (26 hunks)
- internal/net/grpc/pool/pool_test.go (1 hunks)
- internal/net/net.go (1 hunks)
- internal/observability/metrics/grpc/grpc.go (3 hunks)
- internal/test/mock/grpc/grpc_client_mock.go (1 hunks)
- internal/test/mock/grpc_testify_mock.go (1 hunks)
- pkg/gateway/lb/handler/grpc/aggregation.go (4 hunks)
- pkg/gateway/mirror/service/mirror.go (1 hunks)
- versions/K3S_VERSION (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- versions/K3S_VERSION
- pkg/gateway/mirror/service/mirror.go
- internal/backoff/backoff.go
- internal/net/net.go
- Makefile.d/k3d.mk
- internal/test/mock/grpc/grpc_client_mock.go
- pkg/gateway/lb/handler/grpc/aggregation.go
- internal/test/mock/grpc_testify_mock.go
- internal/net/grpc/pool/pool_test.go
🔇 Additional comments (13)
Makefile.d/functions.mk (1)
62-67: LGTM: CGO flags are properly defined and ordered.
The CGO flags are correctly defined with proper ordering, ensuring that all necessary flags (`CFLAGS`, `CXXFLAGS`, `FFLAGS`) are set before they're used in CGO-specific variables.
Makefile (4)
161-164: Excellent optimization and security flags configuration.
The new `COMMON_FLAGS` variable provides a comprehensive set of optimization and security flags:
- Performance: `-Ofast`, `-march=native`, `-mtune=native`, `-flto=auto`
- Security: `-fvisibility=hidden`
- Size optimization: `-ffunction-sections`, `-fdata-sections`
165-167: LGTM: Language standards are properly set.
The language standards are correctly set for each compiler:
- C: gnu23
- C++: gnu++23
- Fortran: f2018
181-182: LGTM: Clean flag handling for arm64.
The flags for arm64 architecture are cleanly handled without unnecessary modifications.
176-178: Verify AVX-512 disabling flags for NGT on amd64.
The NGT_FLAGS disable AVX-512 instructions, which might affect performance on newer CPUs that support these instructions.
Run this script to check if the target system supports AVX-512:
internal/observability/metrics/grpc/grpc.go (3)
33-34
: LGTM! Clear and descriptive metric names.The new metric names and descriptions are clear and accurately describe their purpose.
37-39
: LGTM! Well-structured struct update.The addition of
poolTargetAddrKey
field togrpcServerMetrics
struct is appropriate for storing the target address key.
69-77
: LGTM! Appropriate metric view configuration.The new view for pool connection metrics is correctly configured with sum aggregation, which is appropriate for counting healthy connections.
internal/net/grpc/pool/pool.go (1)
879-887
: LGTM! Clean metrics retrieval implementation.The
Metrics
function correctly usesmaps.Clone
to return a copy of the metrics map, preventing external modifications.internal/net/grpc/client.go (2)
1096-1108
: LGTM! Context-aware connection listing.The
ConnectedAddrs
method now correctly uses context for health checks and connection listing.
1115-1124
: LGTM! Proper context handling in Close method.The
Close
method now correctly handles context cancellation during disconnection.internal/net/grpc/client_test.go (2)
3306-3308
: LGTM! Context parameter correctly added.The test args struct has been updated to include the context parameter as previously requested.
20-105
: Implement test cases for better coverage.The test file contains extensive boilerplate but most test cases are commented out with TODO placeholders. Consider implementing these test cases to ensure proper test coverage of the gRPC client functionality.
Let's check the current test coverage:
Would you like me to help generate test cases for the key functionality?
Also applies to: 159-246, 346-435, 535-627, 728-819, 921-1014, 1118-1207, 1312-1403, 1510-1605, 1702-1781, 1876-1957, 2052-2133, 2228-2317, 2422-2513, 2612-2702, 2802-2891, 2986-3067, 3166-3254, 3352-3440
6c78959 to 7ba32c0 (Compare)
Actionable comments posted: 5
🧹 Nitpick comments (9)
internal/observability/metrics/grpc/grpc.go (1)
81-103: Improve error handling in callback registration.
While the implementation is solid, consider handling errors from the callback function more explicitly. Currently, if the callback returns an error, it's only returned from the Register method without any logging or cleanup.
Consider this improvement:
```diff
 func (gm *grpcServerMetrics) Register(m metrics.Meter) error {
 	healthyConn, err := m.Int64ObservableGauge(
 		poolConnMetricsName,
 		metrics.WithDescription(poolConnMetricsDescription),
 		metrics.WithUnit(metrics.Dimensionless),
 	)
 	if err != nil {
 		return err
 	}
 	_, err = m.RegisterCallback(
 		func(ctx context.Context, o api.Observer) error {
 			ms := pool.Metrics(ctx)
 			if len(ms) == 0 {
 				return nil
 			}
 			for name, cnt := range ms {
 				o.ObserveInt64(healthyConn, cnt, api.WithAttributes(attribute.String(gm.poolTargetAddrKey, name)))
 			}
 			return nil
 		},
 		healthyConn,
 	)
-	return err
+	if err != nil {
+		log.Errorf("Failed to register metrics callback: %v", err)
+		return fmt.Errorf("failed to register metrics callback: %w", err)
+	}
+	return nil
 }
```
internal/net/grpc/pool/pool.go (2)
92-95: Consider encapsulating metrics storage.
While the implementation is thread-safe, consider encapsulating the metrics storage in a dedicated type to avoid global variables and provide better control over the metrics lifecycle.
Consider this alternative approach:
```diff
-var (
-	mu      sync.RWMutex
-	metrics map[string]int64 = make(map[string]int64)
-)
+type metricsStorage struct {
+	mu      sync.RWMutex
+	metrics map[string]int64
+}
+
+func newMetricsStorage() *metricsStorage {
+	return &metricsStorage{
+		metrics: make(map[string]int64),
+	}
+}
+
+var defaultMetricsStorage = newMetricsStorage()
```
544-607: Enhance logging consistency in health check.
While the health check implementation is robust, the logging could be more structured and consistent. Consider using structured logging with consistent fields across all log messages.
Consider this improvement:
```diff
 func (p *pool) IsHealthy(ctx context.Context) (healthy bool) {
 	if p == nil || p.closing.Load() {
 		return false
 	}
-	log.Debug("Checking health...")
+	log.WithFields(log.Fields{
+		"addr":     p.addr,
+		"pool_len": p.len(),
+	}).Debug("Starting health check")
 	var cnt, unhealthy int
 	pl := p.len()
 	err := p.loop(ctx, func(ctx context.Context, idx int, pc *poolConn) bool {
 		if pc == nil || !isHealthy(ctx, pc.conn) {
 			if p.isIP {
 				if pc != nil && pc.addr != "" {
 					err := p.refreshConn(ctx, idx, pc, pc.addr)
 					if err != nil {
 						unhealthy = pl - cnt
+						log.WithFields(log.Fields{
+							"addr":  pc.addr,
+							"error": err,
+						}).Debug("Failed to refresh IP connection")
 						return false
 					}
 					return true
 				}
 				unhealthy = pl - cnt
 				return false
 			}
```
879-887: Consider optimizing metrics map cloning.
While the implementation is thread-safe, consider optimizing the map cloning for better performance with large maps. Pre-allocating the map with the exact size would avoid map growth and rehashing.
Consider this improvement:
```diff
 func Metrics(context.Context) map[string]int64 {
 	mu.RLock()
-	defer mu.RUnlock()
 	if len(metrics) == 0 {
+		mu.RUnlock()
 		return nil
 	}
-	return maps.Clone(metrics)
+	result := make(map[string]int64, len(metrics))
+	for k, v := range metrics {
+		result[k] = v
+	}
+	mu.RUnlock()
+	return result
 }
```
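As background for this suggestion, the copy semantics of `maps.Clone` (standard library, Go 1.21+) can be demonstrated in isolation. `cloneSnapshot` below is an illustrative stand-in for the pool's `Metrics` accessor, not the actual Vald code:

```go
package main

import (
	"fmt"
	"maps"
)

// cloneSnapshot mimics a Metrics-style accessor: it hands callers a
// copy so that later writes to the copy cannot touch the shared map.
func cloneSnapshot(src map[string]int64) map[string]int64 {
	if len(src) == 0 {
		return nil
	}
	return maps.Clone(src)
}

func main() {
	shared := map[string]int64{"vald-agent-0:8081": 3}
	snap := cloneSnapshot(shared)
	snap["vald-agent-0:8081"] = 0 // mutate only the copy
	fmt.Println(shared["vald-agent-0:8081"], snap["vald-agent-0:8081"]) // 3 0
}
```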
1129-1158: Enhance error handling in rangeConns.
While the implementation is solid, consider improving error handling by aggregating errors from connection operations and providing more context in error messages.
Consider this improvement:
```diff
 func (g *gRPCClient) rangeConns(
 	ctx context.Context, force bool, fn func(addr string, p pool.Conn) bool,
 ) error {
 	var cnt int
+	var errs []error
 	g.conns.Range(func(addr string, p pool.Conn) bool {
 		if force {
 			cnt++
 			return fn(addr, p)
 		}
 		if p == nil || !p.IsHealthy(ctx) {
 			pc, err := p.Connect(ctx)
 			if pc == nil || err != nil || !pc.IsHealthy(ctx) {
 				if pc != nil {
 					if derr := pc.Disconnect(ctx); derr != nil {
-						log.Debugf("Failed to disconnect unhealthy connection for %s: %v", addr, derr)
+						errs = append(errs, fmt.Errorf("disconnect %s: %w", addr, derr))
 					}
 				}
-				log.Debugf("Unhealthy connection detected for %s during gRPC Connection Range over Loop:\t%s", addr, p.String())
+				log.WithFields(log.Fields{
+					"addr":  addr,
+					"error": err,
+					"pool":  p.String(),
+				}).Debug("Unhealthy connection detected")
 				return true
 			}
 			p = pc
 		}
 		cnt++
 		return fn(addr, p)
 	})
 	if cnt == 0 {
-		return errors.ErrGRPCClientConnNotFound("*")
+		if len(errs) > 0 {
+			return fmt.Errorf("no healthy connections found: %w", errors.Join(errs...))
+		}
+		return errors.ErrGRPCClientConnNotFound("*")
 	}
 	return nil
 }
```
1096-1108: Optimize ConnectedAddrs implementation.
While the implementation is correct, consider pre-allocating the slice with the exact capacity to avoid reallocations during append operations.
Consider this improvement:
```diff
 func (g *gRPCClient) ConnectedAddrs(ctx context.Context) (addrs []string) {
-	addrs = make([]string, 0, g.conns.Len())
+	// Pre-count healthy connections to allocate exact capacity
+	healthyCount := 0
+	g.rangeConns(ctx, false, func(addr string, p pool.Conn) bool {
+		if p != nil && p.IsHealthy(ctx) {
+			healthyCount++
+		}
+		return true
+	})
+	addrs = make([]string, 0, healthyCount)
 	err := g.rangeConns(ctx, false, func(addr string, p pool.Conn) bool {
 		if p != nil && p.IsHealthy(ctx) {
 			addrs = append(addrs, addr)
 		}
 		return true
 	})
 	if err != nil {
 		return nil
 	}
 	return addrs
 }
```
1115-1124: Consider parallel disconnection in Close.
While the implementation is correct, consider implementing parallel disconnection for faster cleanup of multiple connections.
Consider this improvement:
```diff
-select {
-case <-ctx.Done():
-	return false
-default:
-	derr := g.Disconnect(ctx, addr)
-	if derr != nil && !errors.Is(derr, errors.ErrGRPCClientConnNotFound(addr)) {
-		err = errors.Join(err, derr)
-	}
-	return true
-}
+eg, egctx := errgroup.New(ctx)
+eg.SetLimit(runtime.GOMAXPROCS(0)) // Or another suitable concurrency limit
+g.conns.Range(func(addr string, p pool.Conn) bool {
+	addr := addr // Capture for goroutine
+	eg.Go(safety.RecoverFunc(func() error {
+		select {
+		case <-egctx.Done():
+			return egctx.Err()
+		default:
+			if derr := g.Disconnect(egctx, addr); derr != nil &&
+				!errors.Is(derr, errors.ErrGRPCClientConnNotFound(addr)) {
+				return derr
+			}
+			return nil
+		}
+	}))
+	return true
+})
+if err := eg.Wait(); err != nil {
+	return err
+}
```
apis/swagger/v1/vald/insert.swagger.json (1)
17-18: Fix typo in API documentation.
There's a typo in the summary: "Inset RPC" should be "Insert RPC".
Apply this diff to fix the typo:
```diff
-"summary": "Overview\nInset RPC is the method to add a new single vector.\n
+"summary": "Overview\nInsert RPC is the method to add a new single vector.\n
```
50-52: MultiUpdate Endpoint Documentation Added:
The changes for the `/update/multiple` endpoint clearly articulate the multi-update functionality and include a helpful notice regarding gRPC message size limitations. For consistency, consider adding more detailed troubleshooting information in future revisions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (31)
- apis/grpc/v1/agent/core/agent.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/agent/sidecar/sidecar.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/discoverer/discoverer.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/filter/egress/egress_filter.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/filter/ingress/ingress_filter.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/meta/meta.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/mirror/mirror.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/mirror/mirror_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/payload/payload.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/rpc/errdetails/error_details.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/filter.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/filter_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/flush.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/flush_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/index.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/index_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/insert.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/insert_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/object.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/object_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/remove.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/remove_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/search.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/search_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/update.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/update_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/upsert.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- apis/grpc/v1/vald/upsert_vtproto.pb.go is excluded by `!**/*.pb.go`, `!**/*.pb.go`
- example/client/go.sum is excluded by `!**/*.sum`
- go.sum is excluded by `!**/*.sum`
- rust/Cargo.lock is excluded by `!**/*.lock`
📒 Files selected for processing (68)
- .gitfiles (4 hunks)
- .github/ISSUE_TEMPLATE/bug_report.md (1 hunks)
- .github/ISSUE_TEMPLATE/security_issue_report.md (1 hunks)
- .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
- Makefile (2 hunks)
- Makefile.d/functions.mk (2 hunks)
- Makefile.d/k3d.mk (3 hunks)
- apis/swagger/v1/mirror/mirror.swagger.json (1 hunks)
- apis/swagger/v1/vald/filter.swagger.json (8 hunks)
- apis/swagger/v1/vald/flush.swagger.json (1 hunks)
- apis/swagger/v1/vald/index.swagger.json (5 hunks)
- apis/swagger/v1/vald/insert.swagger.json (2 hunks)
- apis/swagger/v1/vald/object.swagger.json (4 hunks)
- apis/swagger/v1/vald/remove.swagger.json (3 hunks)
- apis/swagger/v1/vald/search.swagger.json (8 hunks)
- apis/swagger/v1/vald/update.swagger.json (3 hunks)
- apis/swagger/v1/vald/upsert.swagger.json (2 hunks)
- example/client/go.mod (2 hunks)
- go.mod (12 hunks)
- internal/backoff/backoff.go (2 hunks)
- internal/net/grpc/client.go (16 hunks)
- internal/net/grpc/client_test.go (2 hunks)
- internal/net/grpc/pool/pool.go (26 hunks)
- internal/net/grpc/pool/pool_test.go (1 hunks)
- internal/net/net.go (1 hunks)
- internal/observability/metrics/grpc/grpc.go (3 hunks)
- internal/test/mock/grpc/grpc_client_mock.go (1 hunks)
- internal/test/mock/grpc_testify_mock.go (1 hunks)
- k8s/discoverer/deployment.yaml (1 hunks)
- k8s/gateway/gateway/lb/deployment.yaml (1 hunks)
- k8s/gateway/gateway/mirror/deployment.yaml (1 hunks)
- k8s/index/operator/deployment.yaml (1 hunks)
- k8s/manager/index/deployment.yaml (1 hunks)
- pkg/gateway/lb/handler/grpc/aggregation.go (4 hunks)
- pkg/gateway/mirror/service/mirror.go (1 hunks)
- rust/rust-toolchain (1 hunks)
- versions/BUF_VERSION (1 hunks)
- versions/CMAKE_VERSION (1 hunks)
- versions/DOCKER_VERSION (1 hunks)
- versions/FAISS_VERSION (1 hunks)
- versions/GOLANGCILINT_VERSION (1 hunks)
- versions/HELM_VERSION (1 hunks)
- versions/K3S_VERSION (1 hunks)
- versions/KIND_VERSION (1 hunks)
- versions/KUBECTL_VERSION (1 hunks)
- versions/NGT_VERSION (1 hunks)
- versions/OPERATOR_SDK_VERSION (1 hunks)
- versions/PROMETHEUS_STACK_VERSION (1 hunks)
- versions/PROTOBUF_VERSION (1 hunks)
- versions/RUST_VERSION (1 hunks)
- versions/TELEPRESENCE_VERSION (1 hunks)
- versions/USEARCH_VERSION (1 hunks)
- versions/YQ_VERSION (1 hunks)
- versions/actions/ACTIONS_SETUP_GO (1 hunks)
- versions/actions/ACTIONS_SETUP_NODE (1 hunks)
- versions/actions/ACTIONS_UPLOAD_ARTIFACT (1 hunks)
- versions/actions/CODECOV_CODECOV_ACTION (1 hunks)
- versions/actions/DOCKER_SETUP_BUILDX_ACTION (1 hunks)
- versions/actions/DOCKER_SETUP_QEMU_ACTION (1 hunks)
- versions/actions/GITHUB_CODEQL_ACTION_ANALYZE (1 hunks)
- versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD (1 hunks)
- versions/actions/GITHUB_CODEQL_ACTION_INIT (1 hunks)
- versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF (1 hunks)
- versions/actions/GITHUB_ISSUE_METRICS (1 hunks)
- versions/actions/PETER_EVANS_CREATE_PULL_REQUEST (1 hunks)
- versions/actions/REVIEWDOG_ACTION_HADOLINT (1 hunks)
- versions/actions/REVIEWDOG_ACTION_LANGUAGETOOL (1 hunks)
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (55)
- versions/HELM_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/actions/PETER_EVANS_CREATE_PULL_REQUEST
- versions/KIND_VERSION
- versions/actions/ACTIONS_SETUP_NODE
- versions/OPERATOR_SDK_VERSION
- versions/actions/REVIEWDOG_ACTION_LANGUAGETOOL
- versions/KUBECTL_VERSION
- versions/NGT_VERSION
- versions/actions/GITHUB_CODEQL_ACTION_ANALYZE
- versions/DOCKER_VERSION
- versions/YQ_VERSION
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
- versions/actions/DOCKER_SETUP_BUILDX_ACTION
- versions/actions/DOCKER_SETUP_QEMU_ACTION
- versions/PROTOBUF_VERSION
- versions/K3S_VERSION
- versions/actions/CODECOV_CODECOV_ACTION
- versions/BUF_VERSION
- versions/TELEPRESENCE_VERSION
- versions/actions/GITHUB_CODEQL_ACTION_INIT
- versions/actions/REVIEWDOG_ACTION_HADOLINT
- versions/RUST_VERSION
- versions/USEARCH_VERSION
- apis/swagger/v1/vald/flush.swagger.json
- versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD
- versions/actions/GITHUB_ISSUE_METRICS
- versions/PROMETHEUS_STACK_VERSION
- versions/GOLANGCILINT_VERSION
- rust/rust-toolchain
- .github/ISSUE_TEMPLATE/bug_report.md
- versions/actions/ACTIONS_UPLOAD_ARTIFACT
- versions/FAISS_VERSION
- versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF
- pkg/gateway/mirror/service/mirror.go
- internal/test/mock/grpc_testify_mock.go
- k8s/index/operator/deployment.yaml
- Makefile.d/functions.mk
- internal/test/mock/grpc/grpc_client_mock.go
- apis/swagger/v1/vald/index.swagger.json
- internal/backoff/backoff.go
- k8s/discoverer/deployment.yaml
- k8s/gateway/gateway/lb/deployment.yaml
- Makefile.d/k3d.mk
- versions/CMAKE_VERSION
- .github/ISSUE_TEMPLATE/security_issue_report.md
- internal/net/grpc/pool/pool_test.go
- internal/net/net.go
- k8s/gateway/gateway/mirror/deployment.yaml
- k8s/manager/index/deployment.yaml
- Makefile
- .github/PULL_REQUEST_TEMPLATE.md
- example/client/go.mod
- pkg/gateway/lb/handler/grpc/aggregation.go
- go.mod
🧰 Additional context used
📓 Learnings (1)
.gitfiles (1)
Learnt from: datelier
PR: vdaas/vald#2807
File: rust/bin/agent/src/handler/object.rs:36-38
Timestamp: 2025-01-22T06:26:59.130Z
Learning: In Rust code for Vald handlers, avoid using unwrap() on Option types. Instead, handle None values explicitly by returning appropriate error statuses to maintain robustness.
🔇 Additional comments (34)
apis/swagger/v1/mirror/mirror.swagger.json (1)
15-18: Enhanced Documentation through Detailed Summary:
The updated summary on line 17 now clearly outlines the endpoint's purpose and includes a structured table of status codes. This approach improves clarity and provides valuable context regarding the expected responses from the endpoint. However, please verify that the multiline formatting and table syntax (using newline characters and pipe symbols) is rendered as expected in your Swagger UI/editor. In some cases, certain tools might not fully support this markdown-like formatting within JSON strings.
.gitfiles (3)
126-126: LGTM! Addition of Rust unit test workflow.
The addition of a dedicated Rust unit test workflow improves the testing coverage for Rust components.

161-163: LGTM! New buf.gen configuration files.
The addition of buf.gen configuration files for documentation generation enhances the project's documentation tooling.

164-177: LGTM! Comprehensive API documentation.
The addition of multiple API documentation files improves the project's documentation coverage and maintainability.
internal/observability/metrics/grpc/grpc.go (3)
16-24: LGTM!
The imports and struct changes are well-organized and necessary for implementing the pool connection metrics functionality.
Also applies to: 37-39

33-34: LGTM!
The metric constants are well-named and have clear descriptions.

69-77: LGTM!
The pool connection metrics view is properly configured with sum aggregation, which is appropriate for tracking the count of healthy connections.
internal/net/grpc/client_test.go (2)
3110-3110: Verify context handling in ConnectedAddrs test.
The test should verify context cancellation and timeout scenarios.
Apply this diff to improve context handling:
```diff
-gotAddrs := g.ConnectedAddrs(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+defer cancel()
+gotAddrs := g.ConnectedAddrs(ctx)
```
3306-3308: 🛠️ Refactor suggestion: Update rangeConns test to include context parameter.
The test function signature has been updated but test cases need to be updated accordingly.
Apply this diff to update the test:
```diff
 type args struct {
+	ctx context.Context
 	fn  func(addr string, p pool.Conn) bool
 }
```
✅ Verification successful
Test Case Update Required: Update the args struct to include the context parameter.
The test case in `internal/net/grpc/client_test.go` still contains commented-out code where the `ctx context.Context` field is present. This confirms that after the signature change, the test cases haven't been updated to include the context parameter as expected. Please apply the provided diff to uncomment (or add) the `ctx context.Context` field within the `args` struct so that it aligns with the updated function signature.
- Location: `internal/net/grpc/client_test.go` (lines around 3306-3308)
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts or web queries found
Script:
```sh
#!/bin/bash
# List occurrences of the 'args' struct definition in the test file to inspect context usage.
echo "Occurrences of 'type args' in internal/net/grpc/client_test.go:"
rg -n "type args *{" internal/net/grpc/client_test.go
echo "--------------------------------------------"
# List occurrences of 'ctx context.Context' to see if the context parameter is currently present.
echo "Occurrences of 'ctx context.Context' in internal/net/grpc/client_test.go:"
rg -n "ctx context.Context" internal/net/grpc/client_test.go
echo "--------------------------------------------"
# List the lines around where the commented-out code exists to verify removal or changes.
echo "Context around commented code for 'fn func(addr string, p pool.Conn) bool':"
rg -n -C 5 "fn func\(addr string, p pool.Conn\)" internal/net/grpc/client_test.go
```

Length of output: 2154
apis/swagger/v1/vald/object.swagger.json (2)
17-18: LGTM! Well-structured API documentation.
The documentation for the /exists/{id} endpoint is comprehensive and follows best practices:
- Clear overview section
- Structured status code table
- Detailed troubleshooting guide with common errors and resolutions
106-107: LGTM! Well-structured API documentation.
The documentation for the /object/{id.id} endpoint is comprehensive and follows best practices:
- Clear overview section
- Structured status code table
- Detailed troubleshooting guide with common errors and resolutions
apis/swagger/v1/vald/insert.swagger.json (1)
50-51: LGTM! Well-structured API documentation.
The documentation for the /insert/multiple endpoint is comprehensive and follows best practices:
- Clear overview section
- Important notice about gRPC message size limitation
- Structured status code table
- Detailed troubleshooting guide with common errors and resolutions
apis/swagger/v1/vald/upsert.swagger.json (2)
17-18: Enhanced Upsert Endpoint Documentation:
The updated summary and description now clearly explain that the Upsert RPC handles both updating an existing vector and inserting a new one. The detailed status code table and troubleshooting section greatly improve clarity. Consider a slight rewording (for example, “update an existing vector or insert a new one”) to make it even clearer.
50-52: Enhanced MultiUpsert Endpoint Documentation:
The revised summary for the `/upsert/multiple` endpoint now explains the dual functionality of updating existing vectors and adding new ones in a single request. The added notice on gRPC message size limitations and the structured table of status codes/troubleshooting details provide excellent guidance.
apis/swagger/v1/vald/update.swagger.json (3)
17-18: Enhanced Update Endpoint Documentation:
The updated summary and description for the `/update` endpoint now include a comprehensive status code table with troubleshooting details. This clear presentation helps users understand the purpose and common error scenarios of this RPC.
apis/swagger/v1/vald/remove.swagger.json (3)
17-19: Improved Single Remove Documentation:
The revised summary and description for the `/remove` endpoint now include a detailed table of possible responses and troubleshooting tips. This enhancement clarifies what a client should expect in error conditions, improving the overall API usability.
50-52: Enhanced MultiRemove Endpoint Documentation:
The updated documentation for the `/remove/multiple` endpoint now mirrors the style of the single remove endpoint, with a detailed status code table and troubleshooting suggestions. The clarity and consistency across endpoints are commendable.
83-85: Detailed RemoveByTimestamp Documentation:
The summary and description for the `/remove/timestamp` endpoint now provide a comprehensive explanation, including how multiple timestamps are combined using an AND condition, with a clear status code table and troubleshooting section.
17-18: InsertObject Endpoint Documentation Enhancement:
The updated summary now specifies that the InsertObject RPC is used for inserting objects via the Vald Filter Gateway. The added status code table clearly lays out possible responses, which enhances user understanding.
49-50: MultiInsertObject Endpoint Documentation Enhancement:
The revised summary for the `/insert/object/multiple` endpoint makes it clear that multiple new objects can be inserted in a single request. The detailed presentation of status codes improves clarity.
81-82: SearchObject Endpoint Documentation Updated:
The updated summary for the `/search/object` endpoint clearly articulates that the SearchObject RPC is used for object-based searches. The error handling table and troubleshooting information add valuable context.
112-113: StreamSearchObject Endpoint Documentation Updated:
The enhanced summary for the `/search/object/multiple` endpoint now includes a notice on bidirectional streaming, which is very useful. The structured status code table and troubleshooting details further improve the documentation.
145-146: UpdateObject Endpoint Documentation Improved:
For the `/update/object` endpoint, the revised summary now effectively communicates the update operation for a single object, complete with a clear status code table.
177-178: MultiUpdateObject Endpoint Documentation Updated:
The updated summary and description emphasize the multi-update functionality and note the gRPC message size limitation. Clear troubleshooting steps are provided, enhancing the endpoint’s usability.
210-211: UpsertObject Endpoint Documentation Enhanced:
The summary for the `/upsert/object` endpoint now clearly states that it is used for both updating an existing object and inserting a new one. The concise explanation is very helpful.
242-244: MultiUpsertObject Endpoint Documentation Enhanced:
The changes for the `/upsert/object/multiple` endpoint now offer a detailed summary with a clear notice on gRPC message size limitations and troubleshooting information. This consistency and level of detail across endpoints is excellent.
apis/swagger/v1/vald/search.swagger.json (8)
16-19: LinearSearch Endpoint Documentation Enhanced:
The updated summary for the `/linearsearch` endpoint clearly describes the LinearSearch RPC with a detailed status code table and troubleshooting guide. This helps set a clear expectation for users.
49-52: LinearSearchByID Endpoint Documentation Updated:
The revised documentation for the `/linearsearch/id` endpoint now explains that the search is performed by a user-defined vector ID, including a note that the vector must be indexed prior to querying. This adds useful context regarding prerequisites.
81-85: MultiLinearSearchByID Endpoint Documentation Enhanced:
The enhanced summary for the `/linearsearch/id/multiple` endpoint now includes a clear notice about gRPC message size limitations, a thorough status code table, and detailed troubleshooting steps, which together offer a comprehensive guide.
115-118
: MultiLinearSearch Endpoint Documentation Improved:
The updated summary for /linearsearch/multiple
explains the functionality clearly and provides users with detailed troubleshooting information. This ensures that clients are aware of both the capabilities and limitations of the RPC.
148-151
: Search Endpoint Documentation Enhanced:
The /search
endpoint now features an expanded summary and detailed troubleshooting information. This helps in setting clear expectations regarding both the success and error conditions of the search operation.
180-184
: SearchByID Endpoint Documentation Updated:
The summary for the /search/id
endpoint now clearly indicates that the RPC searches based on a user-defined vector ID, including important validation notes and detailed error guidance.
213-217
: MultiSearchByID Endpoint Documentation Enhanced:
The updated documentation for /search/id/multiple
now includes a detailed status code table and clear troubleshooting advice. This helps users diagnose potential issues when searching with multiple IDs.
246-250
: MultiSearch Endpoint Documentation Improved:
The enhanced summary and comprehensive troubleshooting information for the /search/multiple
endpoint provide a clear and useful guide for users. The documentation now effectively communicates both functional details and considerations (such as gRPC limitations).
9c82017
to
0e02ce2
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 4
🧹 Nitpick comments (7)
internal/observability/metrics/grpc/grpc.go (1)
81-104
: Consider adding error handling for metrics registration.
While the implementation is good, consider adding cleanup or fallback handling if registration fails.
func (gm *grpcServerMetrics) Register(m metrics.Meter) error {
+	var registeredMetrics []api.Observable
	healthyConn, err := m.Int64ObservableGauge(
		poolConnMetricsName,
		metrics.WithDescription(poolConnMetricsDescription),
		metrics.WithUnit(metrics.Dimensionless),
	)
	if err != nil {
+		// Clean up any previously registered metrics
+		for _, metric := range registeredMetrics {
+			m.UnregisterCallback(metric)
+		}
		return err
	}
+	registeredMetrics = append(registeredMetrics, healthyConn)
	_, err = m.RegisterCallback(
		func(ctx context.Context, o api.Observer) error {
			ms := pool.Metrics(ctx)
			if len(ms) == 0 {
				return nil
			}
			for name, cnt := range ms {
				o.ObserveInt64(healthyConn, cnt, api.WithAttributes(attribute.String(gm.poolTargetAddrKey, name)))
			}
			return nil
		},
		healthyConn,
	)
	return err
}
internal/net/grpc/pool/pool.go (1)
592-594
: Consider using deferred mutex unlock for metrics update.
The current implementation could lead to mutex contention. Consider using a deferred unlock pattern.
-	mu.Lock()
-	metrics[p.addr] = int64(pl - unhealthy)
-	mu.Unlock()
+	func() {
+		mu.Lock()
+		defer mu.Unlock()
+		metrics[p.addr] = int64(pl - unhealthy)
+	}()
apis/swagger/v1/vald/insert.swagger.json (1)
17-18
: Typo in Endpoint Summary – “Inset RPC” vs “Insert RPC”.
The updated summary for the/insert
endpoint now reads “Inset RPC is the method to add a new single vector.” This is likely a typographical mistake. Please update “Inset” to “Insert” for clarity and consistency with the rest of the API documentation.apis/swagger/v1/vald/filter.swagger.json (1)
177-179
: MultiUpdateObject Endpoint – Notice on Message Size Limitation.
The /update/object/multiple
endpoint now includes a notice about gRPC message size limitations, which is important for users creating bulk requests. Ensure the “exceed” wording is grammatically adjusted (consider “exceeds” instead of “exceed”).
Makefile (3)
161-167
: Enhance Compiler and Linker Flags Consolidation
The new definitions for LDFLAGS
, COMMON_FLAGS
, and the language-specific flag variables are well organized. Using a central COMMON_FLAGS
variable helps maintain consistency across C, C++, and Fortran builds.
Suggestion: Consider adding an inline comment above these definitions explaining the rationale behind using aggressive optimizations (e.g.
-Ofast
) and the potential trade-offs regarding floating‐point precision or portability.
181-182
: Arm64 Flag Handling
For the arm64
architecture, the flags remain unchanged as indicated by the reassignment. Although functionally correct, it might be helpful to add a brief comment noting that no extra optimizations or disabling flags are required for arm64 targets.
185-186
: Default Flag Preservation for Other Architectures
The explicit reassignment of CFLAGS
and CXXFLAGS
in the fallback path (for architectures other than amd64 or arm64) confirms that the defaults are preserved. This is clear and acceptable as long as no additional architecture-specific flags are needed.
Suggestion: A comment might be added to explain this branch if future maintainers expect architecture-specific optimizations.
🛑 Comments failed to post (4)
.gitfiles (1)
1951-1951: 💡 Codebase verification
Robust Error Handling: Multiple unwrap() Calls Detected
The QBG implementation and object handler contain numerous calls to
unwrap()
, which do not align with the robust error handling practices outlined in PR #2807. Specifically:
- rust/libs/algorithms/qbg/src/lib.rs has multiple
unwrap()
usages (e.g., lines 379, 397, 448, etc.) that may cause runtime panics if an error occurs.
- rust/bin/agent/src/handler/object.rs (line 43) calls
unwrap()
on the conversion of hostname, which can also lead to crashes if the hostname isn’t valid.
It’s recommended to refactor these instances to use proper error handling methods, such as
?
for propagating errors or matching on the result to handle potential errors gracefully.
🔗 Analysis chain
LGTM: New Rust implementation files.
The addition of Rust source files for object handling and the QBG algorithm implementation aligns with the project's goals. Based on the retrieved learning from PR #2807, ensure that the Rust implementation follows robust error handling practices.
Run the following script to verify the Rust implementation follows the project's error handling guidelines:
Also applies to: 1970-1974
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for unwrap() usage in Rust files
# Note: unwrap() should be avoided as per PR #2807 learning
rg -n 'unwrap\(\)' 'rust/bin/agent/src/handler/object.rs' 'rust/libs/algorithms/qbg/src/lib.rs'

Length of output: 2839
apis/swagger/v1/vald/object.swagger.json (2)
77-77: 🛠️ Refactor suggestion
Add missing status codes and troubleshooting information.
The documentation for
/object/meta/{id.id}
endpoint has TODO placeholders for status codes and troubleshooting.- "summary": "Overview\nRepresent the RPC to get the vector metadata. This RPC is mainly used for index correction process\n---\nStatus Code\nTODO\n---\nTroubleshooting\nTODO", + "summary": "Overview\nRepresent the RPC to get the vector metadata. This RPC is mainly used for index correction process\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 5 | NOT_FOUND |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.", + "description": "Here are some common reasons and how to resolve each error.\n\n| name | common reason | how to resolve |\n| :---------------- | :---------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------- |\n| CANCELLED | Executed cancel() of rpc from client/server-side or network problems between client and server. | Check the code, especially around timeout and connection management, and fix if needed. |\n| INVALID_ARGUMENT | The Requested vector's ID is empty, or some request payload is invalid. | Check request payload and fix request payload. |\n| DEADLINE_EXCEEDED | The RPC timeout setting is too short on the client/server side. | Check the gRPC timeout setting on both the client and server sides and fix it if needed. |\n| NOT_FOUND | Requested ID is NOT inserted. | Send a request with an ID that is already inserted. |\n| INTERNAL | Target Vald cluster or network route has some critical error. | Check target Vald cluster first and check network route including ingress as second. |",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements."summary": "Overview\nRepresent the RPC to get the vector metadata. This RPC is mainly used for index correction process\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 5 | NOT_FOUND |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.", "description": "Here are some common reasons and how to resolve each error.\n\n| name | common reason | how to resolve |\n| :---------------- | :---------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------- |\n| CANCELLED | Executed cancel() of rpc from client/server-side or network problems between client and server. | Check the code, especially around timeout and connection management, and fix if needed. |\n| INVALID_ARGUMENT | The Requested vector's ID is empty, or some request payload is invalid. | Check request payload and fix request payload. |\n| DEADLINE_EXCEEDED | The RPC timeout setting is too short on the client/server side. | Check the gRPC timeout setting on both the client and server sides and fix it if needed. |\n| NOT_FOUND | Requested ID is NOT inserted. | Send a request with an ID that is already inserted. |\n| INTERNAL | Target Vald cluster or network route has some critical error. | Check target Vald cluster first and check network route including ingress as second. |",
47-47: 🛠️ Refactor suggestion
Add missing status codes and troubleshooting information.
The documentation for
/object/list
endpoint has TODO placeholders for status codes and troubleshooting.- "summary": "Overview\nA method to get all the vectors with server streaming\n---\nStatus Code\nTODO\n---\nTroubleshooting\nTODO", + "summary": "Overview\nA method to get all the vectors with server streaming\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.", + "description": "Here are some common reasons and how to resolve each error.\n\n| name | common reason | how to resolve |\n| :---------------- | :---------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------- |\n| CANCELLED | Executed cancel() of rpc from client/server-side or network problems between client and server. | Check the code, especially around timeout and connection management, and fix if needed. |\n| INVALID_ARGUMENT | Some request payload is invalid. | Check request payload and fix request payload. |\n| DEADLINE_EXCEEDED | The RPC timeout setting is too short on the client/server side. | Check the gRPC timeout setting on both the client and server sides and fix it if needed. |\n| INTERNAL | Target Vald cluster or network route has some critical error. | Check target Vald cluster first and check network route including ingress as second. |",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements."summary": "Overview\nA method to get all the vectors with server streaming\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.", "description": "Here are some common reasons and how to resolve each error.\n\n| name | common reason | how to resolve |\n| :---------------- | :---------------------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------------- |\n| CANCELLED | Executed cancel() of rpc from client/server-side or network problems between client and server. | Check the code, especially around timeout and connection management, and fix if needed. |\n| INVALID_ARGUMENT | Some request payload is invalid. | Check request payload and fix request payload. |\n| DEADLINE_EXCEEDED | The RPC timeout setting is too short on the client/server side. | Check the gRPC timeout setting on both the client and server sides and fix it if needed. |\n| INTERNAL | Target Vald cluster or network route has some critical error. | Check target Vald cluster first and check network route including ingress as second. |",
apis/swagger/v1/vald/update.swagger.json (1)
83-84:
⚠️ Potential issue
Incomplete Placeholder for UpdateTimestamp Endpoint.
The summary for /update/timestamp
still contains “TODO” placeholders for both status codes and troubleshooting. This should be completed before merging—the documentation must reflect actual status values and common error resolutions for the update timestamp operation.
Actionable comments posted: 0
🧹 Nitpick comments (6)
apis/swagger/v1/vald/upsert.swagger.json (2)
50-51
: Clarify MultiUpsert Endpoint Summary and Description.
The updated summary and description for the /upsert/multiple
endpoint clearly outline the method’s purpose and call out the gRPC message size limitation. One minor nitpick: consider whether using "1 request" is optimal for clarity—using "a single request" might be more immediately clear for all readers.
252-253
: Fix Typographical Error in UpsertMulti Request Definition.
There is a typo in the description field: "Represent mthe ultiple upsert request." should be corrected to "Represent the multiple upsert request." This change will improve clarity and maintain the professionalism of the API documentation.
Below is a suggested diff for the correction:
- "description": "Represent mthe ultiple upsert request."
+ "description": "Represent the multiple upsert request."
internal/net/grpc/pool/pool.go (1)
1129-1158
: Enhance error handling in rangeConns.
The
rangeConns
function has been updated to handle connection health checks, but there are a few improvements that could be made:
- Consider adding error propagation from
Connect
andDisconnect
calls- Consider adding retry logic for transient failures
Apply this diff to improve error handling:
func (g *gRPCClient) rangeConns(
	ctx context.Context, force bool, fn func(addr string, p pool.Conn) bool,
) error {
	var cnt int
+	var errs []error
	g.conns.Range(func(addr string, p pool.Conn) bool {
		if force {
			cnt++
			return fn(addr, p)
		}
		if p == nil || !p.IsHealthy(ctx) {
			pc, err := p.Connect(ctx)
			if pc == nil || err != nil || !pc.IsHealthy(ctx) {
				if pc != nil {
					if derr := pc.Disconnect(ctx); derr != nil {
						log.Debugf("Failed to disconnect unhealthy connection for %s: %v", addr, derr)
+						errs = append(errs, derr)
					}
				}
				log.Debugf("Unhealthy connection detected for %s during gRPC Connection Range over Loop:\t%s", addr, p.String())
				return true
			}
			p = pc
		}
		cnt++
		return fn(addr, p)
	})
	if cnt == 0 {
		return errors.ErrGRPCClientConnNotFound("*")
	}
+	if len(errs) > 0 {
+		return errors.Join(errs...)
+	}
	return nil
}
internal/net/grpc/client.go (1)
257-291
: Consider adding retry logic for connection failures.
The connection monitoring loops could benefit from retry logic for transient failures.
Consider adding exponential backoff for connection retries:
err = g.rangeConns(ctx, true, func(addr string, p pool.Conn) bool {
+	var retries int
+	var backoff = time.Millisecond * 100
	if addr == "" || p == nil {
		disconnectTargets = append(disconnectTargets, addr)
		return true
	}
	var err error
-	p, err = p.Reconnect(ctx, false)
+	for retries < 3 {
+		p, err = p.Reconnect(ctx, false)
+		if err == nil {
+			break
+		}
+		time.Sleep(backoff)
+		backoff *= 2
+		retries++
+	}
	if err != nil {
		// ... error handling ...
	}
	return true
})
apis/swagger/v1/vald/filter.swagger.json (2)
175-178
: Revised Summary & Description for /update/object/multiple
Endpoint
The updated text now incorporates a notice regarding potential gRPC message size limitations and provides a comprehensive status code table. One suggestion is to verify that the HTML snippet (i.e. <div class="notice">
) renders correctly in your Swagger UI.
240-243
: Enhanced Summary & Description for /upsert/object/multiple
Endpoint
The updated summary and description now indicate that the endpoint performs multi-upsert operations, complete with a detailed status code table and an HTML-formatted notice about gRPC message size limitations. It is recommended to check that the embedded HTML is supported by all Swagger UI renderers in your environment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (31)
apis/grpc/v1/agent/core/agent.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/agent/sidecar/sidecar.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/discoverer/discoverer.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/filter/egress/egress_filter.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/filter/ingress/ingress_filter.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/meta/meta.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/mirror/mirror.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/mirror/mirror_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/payload/payload.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/rpc/errdetails/error_details.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/filter.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/filter_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/flush.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/flush_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/index.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/index_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/insert.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/insert_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/object.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/object_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/remove.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/remove_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/search.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/search_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/update.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/update_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/upsert.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/upsert_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (68)
.gitfiles
(4 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)Makefile
(6 hunks)Makefile.d/functions.mk
(2 hunks)Makefile.d/k3d.mk
(3 hunks)apis/swagger/v1/mirror/mirror.swagger.json
(1 hunks)apis/swagger/v1/vald/filter.swagger.json
(8 hunks)apis/swagger/v1/vald/flush.swagger.json
(1 hunks)apis/swagger/v1/vald/index.swagger.json
(5 hunks)apis/swagger/v1/vald/insert.swagger.json
(2 hunks)apis/swagger/v1/vald/object.swagger.json
(4 hunks)apis/swagger/v1/vald/remove.swagger.json
(3 hunks)apis/swagger/v1/vald/search.swagger.json
(8 hunks)apis/swagger/v1/vald/update.swagger.json
(3 hunks)apis/swagger/v1/vald/upsert.swagger.json
(2 hunks)example/client/go.mod
(2 hunks)go.mod
(12 hunks)internal/backoff/backoff.go
(2 hunks)internal/net/grpc/client.go
(16 hunks)internal/net/grpc/client_test.go
(2 hunks)internal/net/grpc/pool/pool.go
(26 hunks)internal/net/grpc/pool/pool_test.go
(1 hunks)internal/net/net.go
(1 hunks)internal/observability/metrics/grpc/grpc.go
(3 hunks)internal/test/mock/grpc/grpc_client_mock.go
(1 hunks)internal/test/mock/grpc_testify_mock.go
(1 hunks)k8s/discoverer/deployment.yaml
(1 hunks)k8s/gateway/gateway/lb/deployment.yaml
(1 hunks)k8s/gateway/gateway/mirror/deployment.yaml
(1 hunks)k8s/index/operator/deployment.yaml
(1 hunks)k8s/manager/index/deployment.yaml
(1 hunks)pkg/gateway/lb/handler/grpc/aggregation.go
(4 hunks)pkg/gateway/mirror/service/mirror.go
(1 hunks)rust/rust-toolchain
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/CMAKE_VERSION
(1 hunks)versions/DOCKER_VERSION
(1 hunks)versions/FAISS_VERSION
(1 hunks)versions/GOLANGCILINT_VERSION
(1 hunks)versions/HELM_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KIND_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/YQ_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/ACTIONS_UPLOAD_ARTIFACT
(1 hunks)versions/actions/CODECOV_CODECOV_ACTION
(1 hunks)versions/actions/DOCKER_SETUP_BUILDX_ACTION
(1 hunks)versions/actions/DOCKER_SETUP_QEMU_ACTION
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_ANALYZE
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_INIT
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/PETER_EVANS_CREATE_PULL_REQUEST
(1 hunks)versions/actions/REVIEWDOG_ACTION_HADOLINT
(1 hunks)versions/actions/REVIEWDOG_ACTION_LANGUAGETOOL
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (58)
- versions/actions/PETER_EVANS_CREATE_PULL_REQUEST
- versions/KUBECTL_VERSION
- versions/OPERATOR_SDK_VERSION
- versions/CMAKE_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/DOCKER_VERSION
- versions/actions/ACTIONS_SETUP_NODE
- versions/HELM_VERSION
- versions/actions/DOCKER_SETUP_QEMU_ACTION
- versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF
- versions/PROTOBUF_VERSION
- versions/actions/ACTIONS_UPLOAD_ARTIFACT
- versions/KIND_VERSION
- versions/actions/DOCKER_SETUP_BUILDX_ACTION
- versions/actions/CODECOV_CODECOV_ACTION
- versions/RUST_VERSION
- versions/NGT_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/FAISS_VERSION
- versions/actions/GITHUB_CODEQL_ACTION_ANALYZE
- versions/BUF_VERSION
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
- versions/USEARCH_VERSION
- versions/K3S_VERSION
- internal/net/net.go
- versions/actions/REVIEWDOG_ACTION_HADOLINT
- versions/actions/GITHUB_ISSUE_METRICS
- versions/actions/GITHUB_CODEQL_ACTION_INIT
- versions/GOLANGCILINT_VERSION
- versions/actions/REVIEWDOG_ACTION_LANGUAGETOOL
- versions/YQ_VERSION
- versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD
- internal/test/mock/grpc/grpc_client_mock.go
- Makefile.d/functions.mk
- internal/test/mock/grpc_testify_mock.go
- k8s/gateway/gateway/lb/deployment.yaml
- pkg/gateway/lb/handler/grpc/aggregation.go
- internal/backoff/backoff.go
- versions/TELEPRESENCE_VERSION
- .github/PULL_REQUEST_TEMPLATE.md
- rust/rust-toolchain
- apis/swagger/v1/vald/flush.swagger.json
- apis/swagger/v1/mirror/mirror.swagger.json
- k8s/index/operator/deployment.yaml
- pkg/gateway/mirror/service/mirror.go
- apis/swagger/v1/vald/index.swagger.json
- k8s/discoverer/deployment.yaml
- .github/ISSUE_TEMPLATE/bug_report.md
- k8s/gateway/gateway/mirror/deployment.yaml
- k8s/manager/index/deployment.yaml
- .github/ISSUE_TEMPLATE/security_issue_report.md
- Makefile.d/k3d.mk
- internal/net/grpc/pool/pool_test.go
- apis/swagger/v1/vald/remove.swagger.json
- Makefile
- example/client/go.mod
- apis/swagger/v1/vald/update.swagger.json
- go.mod
🧰 Additional context used
📓 Learnings (1)
.gitfiles (1)
Learnt from: datelier
PR: vdaas/vald#2807
File: rust/bin/agent/src/handler/object.rs:36-38
Timestamp: 2025-01-22T06:26:59.130Z
Learning: In Rust code for Vald handlers, avoid using unwrap() on Option types. Instead, handle None values explicitly by returning appropriate error statuses to maintain robustness.
🔇 Additional comments (34)
apis/swagger/v1/vald/upsert.swagger.json (1)
17-18
: Enhance Upsert Endpoint Documentation.
The updated summary and description now provide a comprehensive overview of the Upsert RPC, including detailed status codes and troubleshooting guidelines. This enriched documentation should be very helpful to API consumers. Please verify that the multiline text with embedded markdown-style tables renders as expected in your Swagger UI.
apis/swagger/v1/vald/object.swagger.json (4)
17-18
: Enhanced Documentation for the /exists/{id}
Endpoint.
The updated summary
and addeddescription
now clearly outline the purpose of the Exists RPC along with detailed error codes and troubleshooting steps. Please ensure that these textual details remain in sync with the backend’s error-handling logic and overall documentation standards.
47-47
: Incomplete Documentation in/object/list
Endpoint.
Thesummary
field still has placeholders ("Status Code TODO" and "Troubleshooting TODO"). Please fill in these sections with the appropriate status codes and troubleshooting information so that API consumers receive complete documentation.
77-77
: Incomplete Documentation in/object/meta/{id.id}
Endpoint.
Similar to the/object/list
endpoint, thesummary
contains placeholders for "Status Code" and "Troubleshooting". Completing these sections will improve clarity and consistency with other endpoints.
106-107
: Enhanced Documentation for the/object/{id.id}
Endpoint.
The improvedsummary
anddescription
now offer a detailed explanation of the GetObject RPC, including a clear table of status codes and error scenarios. Verify that these details accurately reflect the actual behavior and error responses from the service, and ensure consistency with similar endpoints such as/exists/{id}
..gitfiles (2)
1951-1951
: Verify error handling in the new Rust handler.Based on previous feedback, ensure that the new object handler follows proper error handling practices by avoiding
unwrap()
on Option types.Run the following script to check for unwrap() usage:
#!/bin/bash # Description: Check for unwrap() usage in the new Rust handler rg -A 2 'unwrap\(\)' rust/bin/agent/src/handler/object.rs
1970-1974
: Integration Testing and Documentation for QBG Algorithm Are Missing.The new QBG algorithm implementation requires proper integration testing and documentation.
Run the following script to check for tests and documentation:
#!/bin/bash # Description: Check for test files and documentation for QBG algorithm echo "Checking for tests..." fd -t f "test" rust/libs/algorithms/qbg/ echo "Checking for documentation..." fd -t f "README|\.md" rust/libs/algorithms/qbg/internal/observability/metrics/grpc/grpc.go (4)
33-34
: LGTM! New metrics for pool connection health.The new metrics
server_pool_conn
with description "Count of healthy pool connections by target address" will help monitor the health of gRPC connection pools.
37-39
: LGTM! Added field for target address key.The
poolTargetAddrKey
field ingrpcServerMetrics
struct will be used as a label key for the pool connection metrics.
69-77
: LGTM! Added view for pool connection metrics.The new view for
poolConnMetricsName
withAggregationSum
is correctly configured to track the count of healthy pool connections.
83-103
: LGTM! Added registration for pool connection metrics.
The registration logic correctly:
- Creates an observable gauge for pool connections
- Registers a callback to observe connection health
- Uses the target address as an attribute
internal/net/grpc/pool/pool.go (3)
93-95
: LGTM! Added global metrics storage.
The global
metrics
map with mutex synchronization will store connection health metrics per target address.
592-594
: LGTM! Added metrics update in health check.
The health check function now updates the metrics map with the count of healthy connections per target address.
879-887
: LGTM! Added metrics retrieval function.
The
Metrics
function safely returns a clone of the metrics map using maps.Clone
.
internal/net/grpc/client.go (2)
1096-1108
: LGTM! Added context to ConnectedAddrs.
The
ConnectedAddrs
function now accepts a context parameter and uses it for health checks, improving context propagation.
1115-1124
: LGTM! Added context handling in Close.
The
Close
function now properly handles context cancellation during cleanup.
internal/net/grpc/client_test.go (2)
3110-3110
: Context parameter added but test cases need update.
The
ConnectedAddrs
method now accepts a context parameter but the test cases haven't been fully updated to reflect this change.
The test should verify context cancellation and timeout scenarios. Apply this diff:
-gotAddrs := g.ConnectedAddrs(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+defer cancel()
+gotAddrs := g.ConnectedAddrs(ctx)
3306-3308
: Context parameter added to rangeConns but test cases need update.
The
rangeConns
method signature has been updated to include a context parameter but the test cases need to be updated accordingly.
The test should verify context handling. Apply this diff:
type args struct {
-	ctx context.Context
+	ctx context.Context
	fn func(addr string, p pool.Conn) bool
}
17-18
: LGTM! Comprehensive API documentation improvements.
The documentation improvements for the
/insert
endpoint are excellent, providing:
- Clear status code table
- Detailed troubleshooting guide
- Common error scenarios and resolutions
50-51
: LGTM! Well-structured documentation for multi-insert endpoint.
The documentation for
/insert/multiple
endpoint is well-structured with:
- Clear warning about gRPC message size limitations
- Comprehensive status code table
- Detailed troubleshooting section
apis/swagger/v1/vald/filter.swagger.json (6)
15-17
: Enhanced Summary for /insert/object
Endpoint
The updated summary now clearly specifies that the endpoint uses the InsertObject RPC method and includes a detailed status code table. This increased clarity should help API consumers understand the possible responses.
47-50
: Improved Summary for /insert/object/multiple
Endpoint
The revised summary explicitly states that the endpoint corresponds to the MultiInsertObject RPC and incorporates a status code table with a notice on gRPC message size limitations. This additional information is useful for users concerned with message size constraints.
79-82
: Updated Summary for /search/object
Endpoint
The summary now clearly indicates that this endpoint uses the SearchObject RPC and provides a detailed status code table. This update enhances overall clarity and consistency.
111-114
: Clarified Summary for `/search/object/multiple` Endpoint
The updated summary emphasizes that the endpoint employs the bidirectional streaming mechanism (StreamSearchObject RPC) while including a status code table. This change makes it easier for consumers to understand the streaming behavior.
143-146
: Refined Summary for `/update/object` Endpoint
The summary update explicitly denotes the UpdateObject RPC along with a detailed list of status codes, which promotes better understanding of the operation’s expected outcomes.
208-211
: Improved Summary for `/upsert/object` Endpoint
The revised summary clearly describes that this endpoint facilitates both updating and inserting a single object, and it details the corresponding status codes. This enhances the clarity of the API documentation.
apis/swagger/v1/vald/search.swagger.json (8)
15-19
: Enhanced Summary and Troubleshooting for `/linearsearch` Endpoint
The updated summary now explains that the endpoint uses the LinearSearch RPC method and includes a detailed status code table with troubleshooting guidance. This comprehensive information should help users diagnose potential issues.
48-52
: Updated Summary for `/linearsearch/id` Endpoint
The enhanced summary now identifies the endpoint as using the LinearSearchByID RPC, noting that the vector must be indexed into `vald-agent`
beforehand. Detailed status codes and troubleshooting instructions are provided, which improve clarity.
81-85
: Improved Summary for `/linearsearch/id/multiple` Endpoint
The revised summary now clarifies that this endpoint supports multi-linear search by IDs in a single call. The inclusion of an HTML notice regarding gRPC message size limitations and detailed troubleshooting steps significantly enhances the documentation.
114-118
: Revised Summary for `/linearsearch/multiple` Endpoint
The updated summary and description clearly communicate that the endpoint enables multi-linear search with multiple vectors in one request. The detailed status codes and troubleshooting information offer valuable guidance for resolving issues.
147-151
: Enhanced Summary for `/search` Endpoint
The updated summary now includes a detailed status code table and troubleshooting instructions for the Search RPC endpoint, thereby improving usability and user guidance.
180-184
: Updated Summary for `/search/id` Endpoint
The summary has been refined to clearly indicate that the endpoint uses the SearchByID RPC. It also provides important notes about pre-indexing requirements along with a comprehensive list of status codes and troubleshooting advice.
213-217
: Enhanced Summary for `/search/id/multiple` Endpoint
The revised summary now reflects that this endpoint processes multi-search requests by IDs and provides extensive troubleshooting details, including an HTML notice on gRPC message size limitations.
246-250
: Updated Summary for `/search/multiple` Endpoint
The enhanced summary clearly outlines the multi-search functionality using multiple vectors in a single request. The inclusion of a detailed status code table, troubleshooting guidance, and a notice regarding gRPC message size limitations makes the endpoint documentation much clearer.
Actionable comments posted: 0
🧹 Nitpick comments (5)
Makefile.d/build.mk (1)
45-45
: LGTM! Switching to static linking improves binary portability. The change from `$(LDFLAGS)` to `$(STATIC_LDFLAGS)` for CGO-enabled targets is a good improvement that:
- Reduces runtime dependencies by statically linking libraries
- Makes binaries more portable across different environments
- Ensures consistent behavior by bundling all dependencies
Note that this may increase binary sizes and build times.
Also applies to: 50-50, 102-102, 110-110, 114-114
Makefile (4)
161-162
: Improved linker flags organization. The separation of `LDFLAGS` and `STATIC_LDFLAGS` provides better control over linking behavior. The new flags focus on security (`-z relro -z now`), optimization (`-flto=auto`, `-Wl,-O2`), and size reduction (`-Wl,--gc-sections`).
170-175
: Well-structured compiler optimization flags. The introduction of `COMMON_FLAGS` with carefully chosen optimization flags is excellent:
- Performance optimizations: `-O3`, `-ffast-math`, `-march=native`
- Size optimizations: `-ffunction-sections`, `-fdata-sections`
- Security improvements: `-fvisibility=hidden`
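For readers unfamiliar with this split, here is a minimal Makefile sketch of how such variables might be wired together. This is an illustrative config fragment only: the variable names mirror the review, but the exact values and the build target are assumptions, not Vald's actual build rules.

```make
# Illustrative sketch only — not the actual Vald Makefile.
COMMON_FLAGS   := -O3 -ffunction-sections -fdata-sections -fvisibility=hidden
LDFLAGS        := -Wl,-O2 -Wl,--gc-sections
STATIC_LDFLAGS := $(LDFLAGS) -static -Wl,-z,relro -Wl,-z,now

# CGO-enabled static build: pass the static flags through to the external linker.
bin/app-static:
	CGO_ENABLED=1 go build -ldflags '-extldflags "$(STATIC_LDFLAGS)"' -o $@ ./cmd/app
```

The point of the split is that dynamically linked targets keep the lighter `LDFLAGS`, while only the CGO/static targets pay the cost of full static linking.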
177-190
: Good architecture-specific optimizations. The architecture-specific handling is well-implemented:
- amd64: Disables newer AVX instructions for better compatibility
- arm64: Uses appropriate march setting
- Fallback for other architectures
688-706
: Consistent CMake configurations across different installations. The CMake configurations for NGT, Faiss, and usearch are consistently structured with appropriate flags for:
- Build type (Release)
- Static linking
- Optimization flags
- Installation paths
Also applies to: 711-734, 740-766
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
Makefile
(5 hunks)Makefile.d/build.mk
(2 hunks)Makefile.d/tools.mk
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile.d/tools.mk
⏰ Context from checks skipped due to timeout of 90000ms (98)
- GitHub Check: build / build
- GitHub Check: build / dump-contexts-to-log
Actionable comments posted: 0
🧹 Nitpick comments (4)
internal/net/grpc/pool/pool.go (4)
303-347
: Consider optimizing the connection refresh logic. The `refreshConn` function could be optimized by:
- Reducing nested if conditions
- Moving the connection closure logic to a separate helper function
- Adding retry mechanism for failed connections
```diff
 func (p *pool) refreshConn(ctx context.Context, idx int, pc *poolConn, addr string) (err error) {
+	closeConn := func(pc *poolConn) {
+		if pc != nil && pc.conn != nil {
+			p.eg.Go(safety.RecoverFunc(func() error {
+				log.Debugf("waiting for invalid connection to %s to be closed...", pc.addr)
+				err := pc.Close(ctx, p.roccd)
+				if err != nil {
+					log.Debugf("failed to close connection pool addr = %s\terror = %v", pc.addr, err)
+				}
+				return nil
+			}))
+		}
+	}
+
 	if pc != nil && pc.addr == addr && isHealthy(ctx, pc.conn) {
 		return nil
 	}
-	if pc != nil {
-		log.Debugf("connection for %s pool %d/%d len %d is unhealthy trying to establish new pool member connection to %s", pc.addr, idx+1, p.Size(), p.Len(), addr)
-	} else {
-		log.Debugf("connection pool %d/%d len %d is empty, establish new pool member connection to %s", idx+1, p.Size(), p.Len(), addr)
-	}
+	log.Debugf("connection pool %d/%d len %d status: %s, establish new pool member connection to %s",
+		idx+1, p.Size(), p.Len(),
+		map[bool]string{true: "unhealthy", false: "empty"}[pc != nil],
+		addr)
+
 	conn, err := p.dial(ctx, addr)
 	if err != nil {
 		if pc != nil && isHealthy(ctx, pc.conn) {
 			log.Debugf("dialing new connection to %s failed,\terror: %v,\tbut existing connection to %s is healthy will keep existing connection", addr, err, pc.addr)
 			return nil
 		}
-		if pc != nil {
-			if pc.conn != nil {
-				p.eg.Go(safety.RecoverFunc(func() error {
-					log.Debugf("waiting for invalid connection to %s to be closed...", pc.addr)
-					err = pc.Close(ctx, p.roccd)
-					if err != nil {
-						log.Debugf("failed to close connection pool addr = %s\terror = %v", pc.addr, err)
-					}
-					return nil
-				}))
-			}
-		}
+		closeConn(pc)
 		return errors.Join(err, errors.ErrInvalidGRPCClientConn(addr))
 	}
+
 	p.store(idx, &poolConn{
 		conn: conn,
 		addr: addr,
 	})
-	if pc != nil {
-		p.eg.Go(safety.RecoverFunc(func() error {
-			log.Debugf("waiting for old connection to %s to be closed...", pc.addr)
-			err = pc.Close(ctx, p.roccd)
-			if err != nil {
-				log.Debugf("failed to close connection pool addr = %s\terror = %v", pc.addr, err)
-			}
-			return nil
-		}))
-	}
+	closeConn(pc)
 	return nil
 }
```
617-651
: Enhance error handling in the `Do` method. The error handling in the `Do` method could be improved by:
- Adding retries with backoff for transient failures
- Better handling of context cancellation
```diff
 func (p *pool) Do(ctx context.Context, f func(conn *ClientConn) error) (err error) {
 	if p == nil {
 		return errors.ErrGRPCClientConnNotFound("*")
 	}
+
+	var attempts int
+	maxAttempts := 3
+
+	for attempts < maxAttempts {
+		if ctx.Err() != nil {
+			return ctx.Err()
+		}
+
 	idx, pc, ok := p.getHealthyConn(ctx, 0, p.Len())
 	if !ok || pc == nil || pc.conn == nil {
+		attempts++
+		if attempts == maxAttempts {
 		return errors.ErrGRPCClientConnNotFound(p.addr)
+		}
+		time.Sleep(time.Duration(attempts) * 100 * time.Millisecond)
+		continue
 	}
 	conn := pc.conn
 	err = f(conn)
 	if err != nil {
 		st, ok := status.FromError(err)
-		if ok && st != nil && st.Code() != codes.Canceled {
+		if ok && st != nil {
+			if st.Code() == codes.Canceled {
+				return err
+			}
 			if conn != nil {
 				cerr := conn.Close()
 				if cerr != nil {
 					st, ok := status.FromError(cerr)
 					if ok && st != nil && st.Code() != codes.Canceled {
 						log.Warnf("Failed to close connection: %v", cerr)
 					}
 				}
 			}
 			rerr := p.refreshConn(ctx, idx, pc, p.addr)
 			if rerr == nil {
 				if newErr := f(p.load(idx).conn); newErr != nil {
 					return errors.Join(err, newErr)
 				}
 				return nil
 			}
 			err = errors.Join(err, rerr)
+			attempts++
+			if attempts < maxAttempts {
+				time.Sleep(time.Duration(attempts) * 100 * time.Millisecond)
+				continue
+			}
 		}
 	}
+		return err
+	}
 	return err
 }
```
547-615
: Optimize the health check implementation. The `IsHealthy` method could be improved by:
- Using atomic operations for counters
- Adding metrics for individual connection states
- Implementing circuit breaker pattern
```diff
+type healthMetrics struct {
+	total     int64
+	healthy   int64
+	unhealthy int64
+	states    map[connectivity.State]int64
+}
+
 func (p *pool) IsHealthy(ctx context.Context) (healthy bool) {
 	if p == nil || p.closing.Load() {
 		return false
 	}
 	log.Debug("Checking health...")
-	var cnt, unhealthy int
+	metrics := &healthMetrics{
+		states: make(map[connectivity.State]int64),
+	}
 	pl := p.len()
+	atomic.StoreInt64(&metrics.total, int64(pl))
+
 	err := p.loop(ctx, func(ctx context.Context, idx int, pc *poolConn) bool {
+		if pc != nil && pc.conn != nil {
+			state := pc.conn.GetState()
+			atomic.AddInt64(&metrics.states[state], 1)
+		}
+
 		if pc == nil || !isHealthy(ctx, pc.conn) {
+			atomic.AddInt64(&metrics.unhealthy, 1)
 			if p.isIP {
 				// ... rest of the IP handling logic
 			}
 			// ... rest of the DNS handling logic
 		}
-		cnt++
+		atomic.AddInt64(&metrics.healthy, 1)
 		return true
 	})
 	mu.Lock()
-	metrics[p.addr] = int64(pl - unhealthy)
+	metrics[p.addr] = atomic.LoadInt64(&metrics.healthy)
 	mu.Unlock()
-	if cnt == 0 {
+	if atomic.LoadInt64(&metrics.healthy) == 0 {
 		return false
 	}
 	if p.isIP {
-		return unhealthy == 0
+		return atomic.LoadInt64(&metrics.unhealthy) == 0
 	}
-	return unhealthy < pl
+	return atomic.LoadInt64(&metrics.unhealthy) < int64(pl)
 }
```
893-901
: Consider enhancing metrics with more detailed information. The current metrics implementation could be improved by adding:
- Connection state counts
- Connection age
- Error counts
- Latency metrics
```diff
+type ConnectionMetrics struct {
+	HealthyConnections int64
+	StateMetrics       map[connectivity.State]int64
+	LastErrorTimestamp int64
+	ErrorCount         int64
+	AverageLatency     int64
+}
+
+var (
+	metricsData map[string]*ConnectionMetrics
+)
+
-func Metrics(context.Context) map[string]int64 {
+func Metrics(context.Context) map[string]*ConnectionMetrics {
 	mu.RLock()
 	defer mu.RUnlock()
-	if len(metrics) == 0 {
+	if len(metricsData) == 0 {
 		return nil
 	}
-	return maps.Clone(metrics)
+	return maps.Clone(metricsData)
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
internal/net/grpc/pool/pool.go
(27 hunks)internal/net/net.go
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/net/net.go
⏰ Context from checks skipped due to timeout of 90000ms (109)
- GitHub Check: coverage
- GitHub Check: check-format-diff
- GitHub Check: Run tests for pkg packages
- GitHub Check: Run tests for internal packages
- GitHub Check: Run tests for cmd packages
- GitHub Check: grpc-stream
- GitHub Check: grpc-sequential
- GitHub Check: CodeQL
- GitHub Check: runner / go build
- GitHub Check: build / build
- GitHub Check: build / build
🔇 Additional comments (1)
internal/net/grpc/pool/pool.go (1)
92-95
: LGTM! Global metrics map is properly synchronized. The implementation uses a mutex to protect concurrent access to the metrics map, which is a good practice for thread safety.
fix fix fix fix fix fix Signed-off-by: kpango <[email protected]>
9726362 to 341a593 (Compare)
Signed-off-by: kpango <[email protected]>
Signed-off-by: kpango <[email protected]>
Signed-off-by: kpango <[email protected]>
Actionable comments posted: 2
🧹 Nitpick comments (2)
internal/observability/metrics/grpc/grpc.go (1)
83-103
: Consider adding error metrics for connection failures. While the current implementation tracks healthy connections, it would be valuable to also track connection failures to help diagnose issues.
Consider adding a counter for failed connections:
```diff
+const (
+	poolConnFailedMetricsName        = "server_pool_conn_failed"
+	poolConnFailedMetricsDescription = "Count of failed pool connections by target address"
+)
+
 func (gm *grpcServerMetrics) Register(m metrics.Meter) error {
 	healthyConn, err := m.Int64ObservableGauge(
 		poolConnMetricsName,
 		metrics.WithDescription(poolConnMetricsDescription),
 		metrics.WithUnit(metrics.Dimensionless),
 	)
 	if err != nil {
 		return err
 	}
+	failedConn, err := m.Int64Counter(
+		poolConnFailedMetricsName,
+		metrics.WithDescription(poolConnFailedMetricsDescription),
+		metrics.WithUnit(metrics.Dimensionless),
+	)
+	if err != nil {
+		return err
+	}
```
internal/net/grpc/pool/pool.go (1)
92-95
: Consider using sync.Map for metrics. The current implementation uses a regular map with a mutex, which could be replaced with `sync.Map` for better performance in concurrent scenarios.

```diff
-var (
-	mu      sync.RWMutex
-	metrics map[string]int64 = make(map[string]int64)
-)
+var metrics sync.Map
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (31)
apis/grpc/v1/agent/core/agent.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/agent/sidecar/sidecar.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/discoverer/discoverer.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/filter/egress/egress_filter.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/filter/ingress/ingress_filter.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/meta/meta.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/mirror/mirror.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/mirror/mirror_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/payload/payload.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/rpc/errdetails/error_details.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/filter.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/filter_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/flush.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/flush_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/index.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/index_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/insert.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/insert_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/object.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/object_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/remove.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/remove_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/search.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/search_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/update.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/update_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/upsert.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
apis/grpc/v1/vald/upsert_vtproto.pb.go
is excluded by!**/*.pb.go
,!**/*.pb.go
example/client/go.sum
is excluded by!**/*.sum
go.sum
is excluded by!**/*.sum
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (70)
.gitfiles
(4 hunks).github/ISSUE_TEMPLATE/bug_report.md
(1 hunks).github/ISSUE_TEMPLATE/security_issue_report.md
(1 hunks).github/PULL_REQUEST_TEMPLATE.md
(1 hunks)Makefile
(5 hunks)Makefile.d/build.mk
(2 hunks)Makefile.d/functions.mk
(3 hunks)Makefile.d/k3d.mk
(3 hunks)Makefile.d/tools.mk
(2 hunks)apis/swagger/v1/mirror/mirror.swagger.json
(1 hunks)apis/swagger/v1/vald/filter.swagger.json
(8 hunks)apis/swagger/v1/vald/flush.swagger.json
(1 hunks)apis/swagger/v1/vald/index.swagger.json
(5 hunks)apis/swagger/v1/vald/insert.swagger.json
(2 hunks)apis/swagger/v1/vald/object.swagger.json
(4 hunks)apis/swagger/v1/vald/remove.swagger.json
(3 hunks)apis/swagger/v1/vald/search.swagger.json
(8 hunks)apis/swagger/v1/vald/update.swagger.json
(3 hunks)apis/swagger/v1/vald/upsert.swagger.json
(2 hunks)example/client/go.mod
(2 hunks)go.mod
(12 hunks)internal/backoff/backoff.go
(2 hunks)internal/net/grpc/client.go
(18 hunks)internal/net/grpc/client_test.go
(2 hunks)internal/net/grpc/pool/pool.go
(27 hunks)internal/net/grpc/pool/pool_test.go
(1 hunks)internal/net/net.go
(3 hunks)internal/observability/metrics/grpc/grpc.go
(3 hunks)internal/test/mock/grpc/grpc_client_mock.go
(1 hunks)internal/test/mock/grpc_testify_mock.go
(1 hunks)k8s/discoverer/deployment.yaml
(1 hunks)k8s/gateway/gateway/lb/deployment.yaml
(1 hunks)k8s/gateway/gateway/mirror/deployment.yaml
(1 hunks)k8s/index/operator/deployment.yaml
(1 hunks)k8s/manager/index/deployment.yaml
(1 hunks)pkg/gateway/lb/handler/grpc/aggregation.go
(4 hunks)pkg/gateway/mirror/service/mirror.go
(1 hunks)rust/rust-toolchain
(1 hunks)versions/BUF_VERSION
(1 hunks)versions/CMAKE_VERSION
(1 hunks)versions/DOCKER_VERSION
(1 hunks)versions/FAISS_VERSION
(1 hunks)versions/GOLANGCILINT_VERSION
(1 hunks)versions/HELM_VERSION
(1 hunks)versions/K3S_VERSION
(1 hunks)versions/KIND_VERSION
(1 hunks)versions/KUBECTL_VERSION
(1 hunks)versions/NGT_VERSION
(1 hunks)versions/OPERATOR_SDK_VERSION
(1 hunks)versions/PROMETHEUS_STACK_VERSION
(1 hunks)versions/PROTOBUF_VERSION
(1 hunks)versions/RUST_VERSION
(1 hunks)versions/TELEPRESENCE_VERSION
(1 hunks)versions/USEARCH_VERSION
(1 hunks)versions/YQ_VERSION
(1 hunks)versions/actions/ACTIONS_SETUP_GO
(1 hunks)versions/actions/ACTIONS_SETUP_NODE
(1 hunks)versions/actions/ACTIONS_UPLOAD_ARTIFACT
(1 hunks)versions/actions/CODECOV_CODECOV_ACTION
(1 hunks)versions/actions/DOCKER_SETUP_BUILDX_ACTION
(1 hunks)versions/actions/DOCKER_SETUP_QEMU_ACTION
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_ANALYZE
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_INIT
(1 hunks)versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF
(1 hunks)versions/actions/GITHUB_ISSUE_METRICS
(1 hunks)versions/actions/PETER_EVANS_CREATE_PULL_REQUEST
(1 hunks)versions/actions/REVIEWDOG_ACTION_HADOLINT
(1 hunks)versions/actions/REVIEWDOG_ACTION_LANGUAGETOOL
(1 hunks)versions/actions/SOFTPROPS_ACTION_GH_RELEASE
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (59)
- versions/actions/PETER_EVANS_CREATE_PULL_REQUEST
- versions/actions/DOCKER_SETUP_QEMU_ACTION
- versions/HELM_VERSION
- versions/KUBECTL_VERSION
- versions/KIND_VERSION
- versions/actions/ACTIONS_SETUP_GO
- versions/BUF_VERSION
- versions/USEARCH_VERSION
- versions/actions/GITHUB_CODEQL_ACTION_UPLOAD_SARIF
- versions/OPERATOR_SDK_VERSION
- versions/TELEPRESENCE_VERSION
- versions/CMAKE_VERSION
- versions/actions/CODECOV_CODECOV_ACTION
- versions/NGT_VERSION
- versions/FAISS_VERSION
- versions/actions/ACTIONS_SETUP_NODE
- versions/actions/ACTIONS_UPLOAD_ARTIFACT
- versions/actions/SOFTPROPS_ACTION_GH_RELEASE
- versions/GOLANGCILINT_VERSION
- versions/PROTOBUF_VERSION
- versions/actions/GITHUB_CODEQL_ACTION_INIT
- versions/actions/GITHUB_CODEQL_ACTION_ANALYZE
- versions/actions/GITHUB_CODEQL_ACTION_AUTOBUILD
- versions/K3S_VERSION
- versions/actions/DOCKER_SETUP_BUILDX_ACTION
- versions/RUST_VERSION
- versions/PROMETHEUS_STACK_VERSION
- versions/actions/REVIEWDOG_ACTION_LANGUAGETOOL
- versions/actions/GITHUB_ISSUE_METRICS
- apis/swagger/v1/vald/flush.swagger.json
- k8s/index/operator/deployment.yaml
- k8s/gateway/gateway/lb/deployment.yaml
- apis/swagger/v1/vald/index.swagger.json
- internal/test/mock/grpc/grpc_client_mock.go
- .github/ISSUE_TEMPLATE/bug_report.md
- .github/PULL_REQUEST_TEMPLATE.md
- internal/test/mock/grpc_testify_mock.go
- Makefile.d/build.mk
- Makefile.d/k3d.mk
- apis/swagger/v1/mirror/mirror.swagger.json
- internal/backoff/backoff.go
- k8s/manager/index/deployment.yaml
- k8s/discoverer/deployment.yaml
- pkg/gateway/lb/handler/grpc/aggregation.go
- versions/YQ_VERSION
- internal/net/grpc/pool/pool_test.go
- versions/DOCKER_VERSION
- rust/rust-toolchain
- versions/actions/REVIEWDOG_ACTION_HADOLINT
- .github/ISSUE_TEMPLATE/security_issue_report.md
- internal/net/net.go
- pkg/gateway/mirror/service/mirror.go
- Makefile.d/tools.mk
- k8s/gateway/gateway/mirror/deployment.yaml
- apis/swagger/v1/vald/remove.swagger.json
- Makefile.d/functions.mk
- apis/swagger/v1/vald/update.swagger.json
- example/client/go.mod
- go.mod
🧰 Additional context used
📓 Learnings (1)
.gitfiles (1)
Learnt from: datelier
PR: vdaas/vald#2807
File: rust/bin/agent/src/handler/object.rs:36-38
Timestamp: 2025-01-22T06:26:59.130Z
Learning: In Rust code for Vald handlers, avoid using unwrap() on Option types. Instead, handle None values explicitly by returning appropriate error statuses to maintain robustness.
🔇 Additional comments (39)
apis/swagger/v1/vald/upsert.swagger.json (2)
17-18
: Enhanced /upsert Documentation: Detailed Summary and Description
The updated summary now offers a comprehensive overview of the Upsert RPC, including a clear description of its update/insert behavior and a detailed status code table. The added description provides valuable troubleshooting tips for common error scenarios. Please verify that the embedded markdown (tables, line breaks, and code formatting with backticks) renders correctly in your Swagger UI output.
50-51
: Enhanced MultiUpsert Endpoint Documentation
The changes for the `/upsert/multiple` endpoint now include a more descriptive summary emphasizing the ability to update or add multiple vectors in a single request, with bold formatting for clarity. The description starts with an HTML notice warning about gRPC message size limitations and then outlines detailed status codes and troubleshooting steps. Ensure that the HTML snippet and markdown content display as intended across all consuming tools.
apis/swagger/v1/vald/insert.swagger.json (3)
18-18
: Enhanced Error Handling and Troubleshooting (Description Update)
The updated description for the `/insert` endpoint now provides a detailed set of common error reasons and resolutions. This added clarity is very beneficial to API users.
endpoint now provides a detailed set of common error reasons and resolutions. This added clarity is very beneficial to API users.
50-50
: Clear MultiInsert Summary
The summary for the/insert/multiple
endpoint is concise and clearly states that multiple vectors can be added in a single request. The use of markdown (e.g., 1) appropriately emphasizes the key point.
51-51
: Detailed MultiInsert Description with Warning Note
The new description for the/insert/multiple
endpoint adds a useful warning about gRPC message size limitations and provides comprehensive status code and troubleshooting information. Please verify that the HTML encoded elements (e.g.,\u003cdiv class="notice"\u003e
) render correctly in your Swagger UI environment.internal/observability/metrics/grpc/grpc.go (3)
33-34
: LGTM! Clear and descriptive metric names.The metric names and descriptions are well-defined and follow observability best practices.
37-39
: LGTM! Good struct field naming.The
poolTargetAddrKey
field is appropriately named and its purpose is clear from the context.
69-77
: LGTM! Well-structured metric view.The view for pool connection metrics is properly configured with:
- Clear name and description
- Appropriate aggregation type (Sum)
internal/net/grpc/pool/pool.go (2)
600-602
: LGTM! Thread-safe metrics update.The metrics update is properly protected by a mutex lock.
894-902
: Consider adding new Metrics function.The new
Metrics
function effectively usesmaps.Clone
for safe access to metrics data.internal/net/grpc/client.go (2)
1138-1140
: LGTM! Good method signature update.The
rangeConns
method now properly accepts context and force parameters, improving control over the connection loop.
1124-1133
: LGTM! Proper context cancellation handling. The code correctly handles context cancellation during connection closure.
.gitfiles (2)
161-177
: LGTM! Comprehensive API documentation coverage. The addition of detailed API documentation files for core Vald operations (filter, flush, index, insert, mirror, object, remove, search, update, upsert) improves the project's usability and maintainability.
1970-1974
: Integration Testing and Documentation for QBG Algorithm Are Missing
The new QBG algorithm implementation lacks:
- Test files in the QBG directory
- README or documentation explaining the algorithm's purpose and usage
Run the following script to verify the missing test files and documentation:
```bash
#!/bin/bash
# Description: Check for test files and documentation for QBG algorithm
echo "Checking for tests..."
fd -t f "test" rust/libs/algorithms/qbg/
echo "Checking for documentation..."
fd -t f "README|\.md" rust/libs/algorithms/qbg/
```
internal/net/grpc/client_test.go (2)
3306-3308
: Context parameter added to rangeConns but test cases need update. The `rangeConns` method signature has been updated to include a context parameter, but the test cases need to be updated accordingly. The test should verify context handling. Apply this diff:
```diff
 type args struct {
-	ctx context.Context
+	ctx context.Context
 	fn  func(addr string, p pool.Conn) bool
 }
```
3110-3110
: 🛠️ Refactor suggestion: Context parameter added but test cases need update.
The `ConnectedAddrs` method now accepts a context parameter, but the test cases haven't been fully updated to reflect this change. The test should verify context cancellation and timeout scenarios. Apply this diff:
```diff
-gotAddrs := g.ConnectedAddrs(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), time.Second)
+defer cancel()
+gotAddrs := g.ConnectedAddrs(ctx)
```
Likely invalid or redundant comment.
apis/swagger/v1/vald/object.swagger.json (4)
17-18
: LGTM! Documentation improvements for /exists/{id} endpoint. The documentation has been significantly enhanced with:
- Clear overview section
- Comprehensive status code table
- Detailed troubleshooting guide with common errors and resolutions
47-47
: Complete the TODO sections in documentation. The /object/list endpoint documentation has incomplete sections:
- Status Code: TODO
- Troubleshooting: TODO
Would you like me to help generate the missing documentation sections?
77-77
: Complete the TODO sections in documentation. The /object/meta/{id.id} endpoint documentation has incomplete sections:
- Status Code: TODO
- Troubleshooting: TODO
Would you like me to help generate the missing documentation sections?
106-107
: LGTM! Documentation improvements for /object/{id.id} endpoint. The documentation has been significantly enhanced with:
- Clear overview section
- Comprehensive status code table
- Detailed troubleshooting guide with common errors and resolutions
apis/swagger/v1/vald/filter.swagger.json (8)
17-17
: Enhanced InsertObject Documentation
The updated summary now clearly explains that the endpoint uses the InsertObject RPC with a detailed table of status codes. This added context will help API consumers understand potential responses.
49-49
: Improved MultiInsertObject Summary
The description for the `/insert/object/multiple` endpoint has been expanded to explain that multiple objects can be inserted in one request and now includes clear status code details.
81-81
: Clarified SearchObject Summary
The updated summary for `/search/object` now explains that the SearchObject RPC is used and provides users with a table of status codes. This is a welcome improvement in clarity.
113-113
: Enhanced StreamSearchObject Documentation
The revision for `/search/object/multiple` now emphasizes the bidirectional streaming nature of the RPC along with status codes and troubleshooting information. This extra detail should aid both developers and integrators.
145-145
: Clearer UpdateObject Summary
The new summary for `/update/object` succinctly explains that this endpoint updates a single vector. Including the status code table ensures that consumers know what to expect on error or success.
177-178
: MultiUpdateObject Enhanced with Warning Notice
The `/update/object/multiple` endpoint now includes a summary and a descriptive block notifying users of gRPC message size limitations. This additional troubleshooting section, with potential reasons and guidance, is especially useful.
210-210
: UpsertObject Operation Clarified
The summary for `/upsert/object` has been updated to indicate that it both updates an existing object and inserts a new one, which improves the understanding of its dual role.
241-243
: Comprehensive MultiUpsertObject Documentation
The revision in `/upsert/object/multiple` now details that multiple upsert operations can be handled in one request. The added warning about message size limitations and the structured status code table are both valuable improvements.
Makefile (3)
161-162
: Refactored Linker Flags
The adjustment of `LDFLAGS` to include options like `-Wl,--gc-sections` and `-Wl,-O2`, together with the introduction of `STATIC_LDFLAGS` (which prepends `-static`), helps optimize the linking process and manage binary size/security more effectively. Please ensure that these removals (e.g., of `-static` from the dynamic flag set) are fully intended.
170-176
: Introduction of COMMON_FLAGS and Compiler Settings
The new `COMMON_FLAGS` variable consolidates many optimization and visibility options, and its use in `CFLAGS`, `CXXFLAGS`, and `FFLAGS` improves maintainability. The choice of standards (gnu17, gnu++23, f2018) is clearly documented by the variable names. Consider adding a brief comment in the Makefile about the rationale behind these specific flags for future maintainers.
177-190
: Architecture-Specific Adjustments
The conditional block for `GOARCH` correctly applies additional flags for amd64 (with `NGT_FLAGS`) while leaving arm64 untouched. This ensures that architecture optimizations are only applied where relevant.
apis/swagger/v1/vald/search.swagger.json (8)
17-18
: Detailed LinearSearch RPC Overview
The updated summary for `/linearsearch` now includes a comprehensive status code table and a troubleshooting section. This makes it much clearer what each response code means and which remedies to consider if an error occurs.
50-52
: Improved LinearSearchByID Documentation
The `/linearsearch/id` endpoint now clearly states that the vector must already be indexed and provides precise troubleshooting details. This clarification should help prevent misuse and reduce confusion for API consumers.
83-85
: Enhanced MultiLinearSearchByID Summary
The revision for `/linearsearch/id/multiple` includes both an updated summary and a troubleshooting table, along with a notice about gRPC message size limitations. These additions are very helpful for guiding users through common pitfalls.
116-118
: Clear MultiLinearSearch RPC Description
The updated summary and description for `/linearsearch/multiple` now call out the potential issue with gRPC message sizes along with detailed status codes and troubleshooting tips. This is a very user-friendly enhancement.
149-151
: Refined Search RPC Overview
The `/search` endpoint summary is now more descriptive and includes the necessary troubleshooting information and status code table. It maintains consistency with the other search endpoints.
182-183
: Enhanced SearchByID Documentation
By detailing that the vector must be pre-indexed and offering clear troubleshooting steps, the updated summary for `/search/id` significantly reduces ambiguity.
214-218
: Comprehensive MultiSearchByID Information
The `/search/id/multiple` update provides an extensive troubleshooting table and clearly states the limitations and remedies. This added level of detail is very helpful for debugging issues when searching by multiple IDs.
248-250
: Improved MultiSearch RPC Clarity
The changes in `/search/multiple` standardize the wording for multi-vector searches. The combination of summary, status code table, and troubleshooting instructions delivers a consistent and clear documentation experience.
```diff
@@ -14,7 +14,8 @@
   "paths": {
     "/insert": {
       "post": {
-        "summary": "A method to add a new single vector.",
+        "summary": "Overview\nInset RPC is the method to add a new single vector.\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 5 | NOT_FOUND |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.",
```
🛠️ Refactor suggestion
Typo in RPC Name (Summary Update)
The summary for the `/insert` endpoint currently reads "Inset RPC", which appears to be a typographical error. It should likely be "Insert RPC" to match the intended meaning.
Proposed change:
- "summary": "Overview\nInset RPC is the method to add a new single vector.\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 5 | NOT_FOUND |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.",
+ "summary": "Overview\nInsert RPC is the method to add a new single vector.\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 5 | NOT_FOUND |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.",
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"summary": "Overview\nInset RPC is the method to add a new single vector.\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 5 | NOT_FOUND |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`.", | |
"summary": "Overview\nInsert RPC is the method to add a new single vector.\n---\nStatus Code\n| 0 | OK |\n| 1 | CANCELLED |\n| 3 | INVALID_ARGUMENT |\n| 4 | DEADLINE_EXCEEDED |\n| 5 | NOT_FOUND |\n| 13 | INTERNAL |\n---\nTroubleshooting\nThe request process may not be completed when the response code is NOT `0 (OK)`", |
```go
if force {
	cnt++
	return fn(addr, p)
}
if p == nil || !p.IsHealthy(ctx) {
	pc, err := p.Connect(ctx)
	if pc == nil || err != nil || !pc.IsHealthy(ctx) {
		if pc != nil && pc.IsIPConn() {
			log.Debugf("Failed to re-connect unhealthy connection for %s: %v, trying to disconnect unhealthy", addr, err)
			if derr := pc.Disconnect(ctx); derr != nil {
				log.Debugf("Failed to disconnect unhealthy connection for %s: %v", addr, derr)
			} else {
				g.conns.Delete(addr)
			}
		}
		log.Debugf("Unhealthy connection detected for %s during gRPC Connection Range over Loop:\t%s", addr, p.String())
		return true
	}
	p = pc
}
```
🛠️ Refactor suggestion
Enhance error handling in forced connection checks.
The forced connection check logic could be improved to handle errors more gracefully.
```diff
 if force {
+	if p == nil {
+		log.Debugf("Nil connection detected for %s during forced check", addr)
+		return true
+	}
 	cnt++
 	return fn(addr, p)
 }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
if force {
	if p == nil {
		log.Debugf("Nil connection detected for %s during forced check", addr)
		return true
	}
	cnt++
	return fn(addr, p)
}
if p == nil || !p.IsHealthy(ctx) {
	pc, err := p.Connect(ctx)
	if pc == nil || err != nil || !pc.IsHealthy(ctx) {
		if pc != nil && pc.IsIPConn() {
			log.Debugf("Failed to re-connect unhealthy connection for %s: %v, trying to disconnect unhealthy", addr, err)
			if derr := pc.Disconnect(ctx); derr != nil {
				log.Debugf("Failed to disconnect unhealthy connection for %s: %v", addr, derr)
			} else {
				g.conns.Delete(addr)
			}
		}
		log.Debugf("Unhealthy connection detected for %s during gRPC Connection Range over Loop:\t%s", addr, p.String())
		return true
	}
	p = pc
}
```
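To see why the nil guard matters independently of the Vald types, here is a stdlib-only sketch of a range-over-connections loop with the suggested guard. Every type and helper below is hypothetical; it only mirrors the shape of the reviewed logic (skip nil entries under force, lazily reconnect unhealthy ones, drop those that stay broken):

```go
package main

import "fmt"

// conn is a hypothetical, simplified stand-in for pool.Conn.
type conn struct {
	healthy bool
}

// IsHealthy is safe to call on a nil receiver, like a health probe should be.
func (c *conn) IsHealthy() bool { return c != nil && c.healthy }

// reconnect simulates Connect: it succeeds only for the "good" address.
func reconnect(addr string) *conn {
	if addr == "good" {
		return &conn{healthy: true}
	}
	return nil
}

// rangeConns visits each entry. With force set it guards against nil
// (the refactor suggestion) before counting; otherwise it tries to
// restore unhealthy entries and deletes the ones that stay broken.
func rangeConns(conns map[string]*conn, force bool, fn func(string, *conn) bool) int {
	visited := 0
	for addr, p := range conns {
		if force {
			if p == nil {
				continue // nil guard: fn must never see a nil conn
			}
			visited++
			if !fn(addr, p) {
				break
			}
			continue
		}
		if p == nil || !p.IsHealthy() {
			pc := reconnect(addr)
			if !pc.IsHealthy() {
				delete(conns, addr) // disconnect path: drop the dead entry
				continue
			}
			p = pc
		}
		visited++
		if !fn(addr, p) {
			break
		}
	}
	return visited
}

func main() {
	conns := map[string]*conn{
		"good": nil,              // reconnectable
		"bad":  {healthy: false}, // stays unhealthy, gets dropped
		"ok":   {healthy: true},  // already healthy
	}
	n := rangeConns(conns, false, func(string, *conn) bool { return true })
	fmt.Println(n, len(conns))
}
```

This prints `2 2`: "good" is restored, "bad" is dropped, and "ok" passes through. Without the nil guard, a forced pass over the same map would hand `fn` a nil connection, which is the crash class the suggestion closes off.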
Signed-off-by: kpango <[email protected]>
Description
[WIP]
Related Issue
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit
Documentation
Chores