-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to auto-instrument with the OTel Java SDK #1704
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1704 +/- ##
==========================================
+ Coverage 65.29% 69.68% +4.38%
==========================================
Files 196 210 +14
Lines 20086 21282 +1196
==========================================
+ Hits 13116 14831 +1715
+ Misses 6152 5721 -431
+ Partials 818 730 -88
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is legit great!
I left a bunch of nits/suggestions, feel free to ignore. I guess the only potential issue is the assumption of "." as the current working directory - let me know what you think
sdkInstrumented := false | ||
if ta.sdkInjectionPossible(&instr.Obj) { | ||
if err := ta.sdkInjector.NewExecutable(&instr.Obj); err == nil { | ||
sdkInstrumented = true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a cosmetic nit - if you wrap this into a function, you can reword this as
sdkInstrumented := ta.trySDKInjection(&instr.Obj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's amazing!!!
I'd raise some concerns/questions, not to be addressed right now in this PR, but to think about in the future:
- What happens if we stop Beyla? Is the agent unloaded or it keeps running and instrumenting until the service is restarted?
- If Beyla is restarted with a new configuration (for example, a new OTEL endpoint or credentials), the new configuration should be applied in the target service. This might require a way to know when a service has been instrumented "manually" or when the instrumentation has been injected from Beyla.
- Having the JAR file embedded in our repository might look a bit obscure for some users. I'd discuss the option of either compile it from the sources or, download the binary from a trusted environment.
But anyway, this is an amazing leap forward!
Dockerfile
Outdated
@@ -39,6 +40,7 @@ COPY --from=builder /opt/app-root/bin/beyla . | |||
COPY --from=builder /opt/app-root/LICENSE . | |||
COPY --from=builder /opt/app-root/NOTICE . | |||
COPY --from=builder /opt/app-root/third_party_licenses.csv . | |||
COPY --from=builder /opt/app-root/opentelemetry-javaagent.jar . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As something to improve in the future, we should find a way track the javaagent version and include it in our vulnerability scans.
pkg/internal/otelsdk/sdk_inject.go
Outdated
"github.com/grafana/beyla/v2/pkg/internal/svc" | ||
) | ||
|
||
const OTelJDKAgent = "opentelemetry-javaagent.jar" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a future task, I'd set this value configurable so the user can provide its own version of the java-agent (e.g. a patched version, or an updated version to temporarily fix a discovered vulnerability).
Actually, I think we should address the issues @mariomac brought up before we merge this PR. So far Beyla is low risk: If instrumentation causes issues, you can always stop Beyla and everything's back to normal. If that changes because attaching the JVM agent can't be undone, we need an idea what to do if the agent causes issues or if there is a misconfiguration (wrong OTLP endpoint, too high sampling rate, etc). |
…t into docker_utils
Generalise endpoint resolution func
Good point, I think we can work on fixing this in follow-up PRs. I have some ideas, I think we should be able to supply set of options to make the agent effectively disabled.
Great suggestion, I think we can do this. Right now we avoid double instrumenting, but we can do better than this. We should be able to tell if this agent was loaded on the command line or not, so to tell if it was managed by Beyla or not.
Rafael pushed changes to this PR and resolved this issue and the stuff that was bugging him :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Groovy!
We are not going to document this option for now, so that we can work on making it more solid. As you and Mario mentioned there are edge-cases we need to consider. But if we did it all at once it would be a much larger PR and complex. This way we can iterate on this until we are confident we can make this option suitable for customers to turn on. |
@@ -59,4 +59,7 @@ type EBPFTracer struct { | |||
|
|||
// Enables debug printing of the protocol data | |||
ProtocolDebug bool `yaml:"protocol_debug_print" env:"BEYLA_PROTOCOL_DEBUG_PRINT"` | |||
|
|||
// Enables Java instrumentation with the OpenTelemetry JDK Agent | |||
UseOTelSDKForJava bool `yaml:"use_otel_sdk_for_java" env:"BEYLA_USE_OTEL_SDK_FOR_JAVA"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code above, I assume that enabling this would only generate OTEL metrics or traces. I think we should validate that if Prometheus is enabled this option can't be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this happens right now:
https://github.com/grafana/beyla/pull/1704/files#diff-d6000537195fb5cd7aa9e0b6bea705c332ce53d5db9e070e39dffcdc191562e3R65
With prometheus scrape Metrics will be false, so we'll program it to send traces only.
The Hotspot JVM allows for dynamic loading of agents and this provides a way for auto-instrumentation of Java applications without the use of eBPF, but by loading the OpenTelemetry Java agent dynamically. This PR introduces an experimental support for using the OpenTelemetry agent instead of eBPF when instrumenting Java applications.
The way this is implemented is as follows:
jcmd
commands.