feat(sn): add metrics for Append and Replicate RPCs #806
Merged
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```
@@        Coverage Diff                         @@
##   dockerfile_grpc_health_probe   #806   +/- ##
==================================================
  Coverage          ?   68.46%
==================================================
  Files             ?      182
  Lines             ?    17899
  Branches          ?        0
==================================================
  Hits              ?    12255
  Misses            ?     4859
  Partials          ?      785
```

☔ View full report in Codecov by Sentry.
ijsong added a commit that referenced this pull request on Jun 8, 2024:
Improve unmarshaling performance by reusing buffers for ReplicateRequest in the backup replica.

The protobuf message `github.com/kakao/varlog/proto/snpb.(ReplicateRequest)` has two slice fields: LLSN (`[]uint64`) and Data (`[][]byte`). The backup replica receives replicated log entries from the primary replica via the gRPC service `github.com/kakao/varlog/proto/snpb.(ReplicatorServer).Replicate`, which sends `ReplicateRequest` messages. Upon receiving a `ReplicateRequest`, the backup replica unmarshals the message, which involves growing slices for fields such as LLSN and Data. This growth causes copy overhead whenever the slice capacities need to expand.

To address this, we introduce a new method, `ResetReuse`, for reusing slices instead of resetting them completely. The `ResetReuse` method shrinks the slice lengths while preserving their capacities, thus avoiding the overhead of reallocating memory. Example implementation:

```go
type Message struct {
	Buffer []byte
	// Other fields
}

func (m *Message) Reset() {
	*m = Message{}
}

func (m *Message) ResetReuse() {
	s := m.Buffer[:0]
	*m = Message{}
	m.Buffer = s
}
```

Risks: This approach has potential downsides. Since the heap space consumed by the slices is not reclaimed, the storage node's memory consumption may increase. Currently, there is no mechanism to shrink the heap usage. Additionally, this PR changes the generated code. The protobuf compiler can revert it, which is contrary to our intention. To catch this mistake, this PR includes a unit test (github.com/kakao/varlog/proto/snpb.TestReplicateRequest) to verify that the buffer backing the slices is reused.

Resolves: #795
See also: #806
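As a quick illustration of the property the mentioned unit test checks, here is a minimal, self-contained sketch using the simplified `Message` type from the example above (this is not the actual `TestReplicateRequest`): it verifies that `ResetReuse` zeroes the length but keeps the capacity, so the backing array can be reused on the next unmarshal.

```go
package main

import "fmt"

// Message mirrors the simplified example from the commit message.
type Message struct {
	Buffer []byte
}

func (m *Message) ResetReuse() {
	s := m.Buffer[:0]
	*m = Message{}
	m.Buffer = s
}

func main() {
	m := &Message{Buffer: make([]byte, 10, 64)}
	capBefore := cap(m.Buffer)

	m.ResetReuse()

	// The length drops to zero, but the capacity (and thus the backing
	// array) is preserved, so subsequent appends avoid reallocation.
	fmt.Println(len(m.Buffer) == 0)         // true
	fmt.Println(cap(m.Buffer) == capBefore) // true
}
```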
hungryjang approved these changes on Jun 10, 2024.
Force-pushed from ccee2ed to 5aae6ec (Compare).
ijsong added further commits that referenced this pull request on Jun 13, Jun 14, Jun 15, and Jun 16, 2024, each with the same commit message as above.
What this PR does

This pull request defines metrics for measuring RPCs such as Append and Replicate. It introduces four metrics:

- `log_rpc.server.duration` measures the time spent processing inbound RPC calls in microseconds. It is very similar to the `rpc.server.duration` metric defined by OpenTelemetry, but our metric also measures the processing time triggered by each call on a gRPC stream.
- `log_rpc.server.log_entry.size` measures the size of appended log entries. It is similar to the `rpc.server.request.size` metric, but our metric measures the size of each log entry included in the appended batch.
- `log_rpc.server.batch.size` measures the size of appended log entry batches.
- `log_rpc.server.log_entries_per_batch` measures the number of log entries per appended batch.

These metrics are histogram-type, allowing us to compute percentiles and analyze histograms and heat maps. Users can leverage these metrics to analyze the duration of RPCs, the distribution of log entry sizes, and the length of batches. We expect users to find better configurations to optimize storage node performance.
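For a concrete picture of how histogram instruments like these are created and recorded, here is a minimal sketch using the OpenTelemetry Go API. The instrument names come from the list above; the meter name, attribute key, and recording sites are illustrative assumptions, not the PR's actual code.

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()
	// Hypothetical meter name; without a configured SDK this is a no-op meter.
	meter := otel.Meter("varlog.storagenode")

	// Create the four histograms (error handling elided for brevity).
	duration, _ := meter.Int64Histogram("log_rpc.server.duration",
		metric.WithUnit("us"),
		metric.WithDescription("time spent processing inbound RPC calls"))
	entrySize, _ := meter.Int64Histogram("log_rpc.server.log_entry.size",
		metric.WithUnit("By"))
	batchSize, _ := meter.Int64Histogram("log_rpc.server.batch.size",
		metric.WithUnit("By"))
	entriesPerBatch, _ := meter.Int64Histogram("log_rpc.server.log_entries_per_batch")

	// Record one hypothetical Append batch of two log entries.
	attrs := metric.WithAttributes(attribute.String("rpc.method", "Append"))
	start := time.Now()
	batch := [][]byte{[]byte("entry-1"), []byte("entry-2")}
	var total int64
	for _, e := range batch {
		entrySize.Record(ctx, int64(len(e)), attrs) // size of each entry in the batch
		total += int64(len(e))
	}
	batchSize.Record(ctx, total, attrs)                            // total batch size
	entriesPerBatch.Record(ctx, int64(len(batch)), attrs)          // batch length
	duration.Record(ctx, time.Since(start).Microseconds(), attrs)  // per-call duration in us
}
```

With an SDK exporter configured, these histograms yield the percentile, histogram, and heat-map views described above.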
Which issue(s) this PR resolves
Update: #795