feat: tracer: upgrade elastic search transport for pubsub traces #10405
Updates
The suggested changes have been running for 5 consecutive days on a locally monitored node. To give more information about how the remote trace submission impacts the overall performance of the Lotus daemon, these are some plots I generated when comparing the resources used the NOTE:
Highlights
Resource profiling
a028d9e to 67d419e
Rebased PR to |
b76f4f5 to 87235d2
Rebase fork to latest |
dff25de to c2e2725
Hey, @snadrus! Thank you so much for the feedback! I just addressed your comments, bringing back the |
Related Issues
The current lotus trace wrapper in `master` pushes only a limited number of traces to the remote Elastic Search (ES) instance. The daemon only pushes traces related to:
Furthermore, the current code also forces the user to set up a `RemoteTrace` in the `config-file` to track the published messages of the desired topics.
Related to the proposed gossipsub measurement study protocol/network-measurements#17: to track broadcasting latencies of gossipsub messages, the `PRC_Recv` events are needed. Thus, this PR enables the remote submission of RPC-related traces. As RPC calls already include all the `mesh-control` messages, those are disabled to reduce the overall overhead.
Proposed Changes
Submitting all the RPC traces to the remote ES instance through individual HTTP requests adds significant overhead to the lotus daemon, making it lose synchronization with the head of the chain. Thus, this PR upgrades the Elastic Search transport to support trace batching and a faster HTTP transport in the ES client.
The PR uses parameters that appear to work well and are currently being tested. Feel free to suggest better approaches or parameters, such as the `flushing time` or `buffer limit` for the batching system.
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>