-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instrument driver with OpenTelemetry tracing #147
base: scylla-3.x
Are you sure you want to change the base?
Conversation
1f667e4
to
0c3522e
Compare
Some of the tests are failing. Those should be fixed by: zpp-2021-telemetry/java-driver#10 - please look if this PR could be simplified (the reason it wasn't merged was that it was a bit too complex to understand and not documented well enough). |
Minor nit: formatting of commit messages: there shouldn't be a |
driver-opentelemetry/src/test/java/com/datastax/driver/opentelemetry/OpenTelemetryTest.java
Show resolved
Hide resolved
driver-opentelemetry/src/test/java/com/datastax/driver/opentelemetry/OpenTelemetryTest.java
Show resolved
Hide resolved
driver-opentelemetry/src/test/java/com/datastax/driver/opentelemetry/OpenTelemetryTest.java
Show resolved
Hide resolved
6134b17
to
f838d27
Compare
In order to be generic and to still stick to Java 6 requirement for driver-core, abstract classes: TracingInfo and TracingInfoFactory are introduced. Further implementations of tracing are expected to be placed apart from driver-core and hence keep Java 8 (or higher) requirements away.
5b6e60b
to
15e2d53
Compare
4cbfa72
to
e210be5
Compare
After long adventures with failing tests, the PR is finally ready for last review. |
e210be5
to
d302542
Compare
@avelanarius the requested fixes regading boundValues and partitionKey have been added. |
As some gathered data can be heavy or sensitive or we might want to not have them in the trace for some reason, verbosity level layer has been added. Currently, statement is both possibly long and containing some sensitive data, so it is included only on level FULL (and not NORMAL).
d302542
to
c833b22
Compare
While building a Cluster object, one can pass a TracingInfo class implementation and it will be used for tracing in all sessions with this cluster from now on.
Every request (in RequestHandler) is covered by one "request" span, and every (speculative) execution has its own "speculative_execution" span. Each retry attempt is covered by "attempt" span, effectively yielding a tree of form: "request" -1--many-> "speculative_execution" -1--many-> "attempt"
After a request is completed, its status code (OK or ERROR) is added to trace data, as well as exception information (if one occured).
All data that constitute added tags are already know by the driver, i.e. they do not require fetching any additional metadata from the cluster.
Renamed PrecisionLevel to VerbosityLevel. |
These tests verify that the span tree structure is valid and that spans are filled with tags of proper content, by providing a mock implementation of TracingInfo.
Added TracingInfo and TracingInfoFactory implementations in separate driver-opentelemetry module.
Added OpenTelemetry and Zipkin connection configuration classes, along with docker-compose file, example run script and related changes in README.
53cdcaa
to
cc23ceb
Compare
The test verifies that span tree structure and status code are valid. Speculative executions are run parallel to the main thread, so some of them can finish only after query result has been returned. Thus, in order to collect span data from entire request, we decided to wait until all speculative executions end. The main thread uses conditional variable `allEnded` to wait for them and lock is used for concurrent mutation of activeSpans.
0d1c3e9
to
f5fc83d
Compare
In order to increase driver's observability, it has been instrumented with OTel tracing. Requests, speculative executions and particular retries are covered with appropriate spans, which are filled with informative tags. The tracing incurs no overhead by default (when off), and bearable overhead when on. The PR contains (not many yet) tests and a Zipkin-based example.