-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the overall fidelity of JFR profiles and reduce sampling bias #1376
base: develop
Are you sure you want to change the base?
Conversation
Note due to https://bugs.openjdk.org/browse/JDK-8201516 there may still be potentially small incorrect attribution, but overall this tradeoff is worth the sampling bias impact. See references: * http://hirt.se/blog/?p=609 * https://docs.oracle.com/javacomponents/jmc-5-5/jfr-runtime-guide/about.htm#JFRRT111 * https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html
Generate changelog in
|
// see http://hirt.se/blog/?p=609 & https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html | ||
// Per https://bugs.openjdk.org/browse/JDK-8201516 may still be potentially incorrect attribution. | ||
"-XX:+UnlockDiagnosticVMOptions", | ||
"-XX:+DebugNonSafepoints", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider testing impact of preserving frame pointers for native and kernel traces, though that may be too expensive to enable by default:
- https://github.com/jvm-profiling-tools/async-profiler
- http://techblog.netflix.com/2015/07/java-in-flames.html
- https://bugs.openjdk.org/browse/JDK-8068945
"-XX:+DebugNonSafepoints", | |
"-XX:+DebugNonSafepoints", | |
"-XX:+PreserveFramePointer", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a chance to benchmark myself, do you know if there's a measurable runtime cost associated with + DebugNonSafepoints
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had cycles yet to benchmark with/without -XX:+DebugNonSafepoints
on current JDKs for our various workflows such as WC infra. The JDK ticket and Jean-Phillipe Bempel's Why JVM modern profilers are still safepoint biased?
blog post show ~2% overhead. If you have some pointers to spin those WC benchmarks up, I can try to give them a shot to understand potential fleet implications.
Async-profiler enables DebugNonSafepoints via JVMTI agent registering CompileMethodLoad
Also worth reading the note http://hirt.se/blog/?p=609 from Marcus Hirt, one of the original JRockit and JFR authors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks don't appear to be impacted (results are within margin of error)
// Improve the overall fidelity of JFR profiles and reduce sampling bias, | ||
// see http://hirt.se/blog/?p=609 & https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html | ||
// Per https://bugs.openjdk.org/browse/JDK-8201516 may still be potentially incorrect attribution. | ||
"-XX:+UnlockDiagnosticVMOptions", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some danger opting into this by default: jvm option overrides may rely upon this without realizing, meaning it cannot safely be removed from sls-packaging. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, adding this now would make removing any diagnostic VM arg that depends on it a break. We could mitigate a bit with -XX:+IgnoreUnrecognizedVMOptions
where startup would no longer fail on those args, but we could silently end up in a situation where we assume args are in effect when they're not actually.
@carterkozak curious for your thoughts on this again. I can rebase on develop if interested. |
I think last time we tried this, it produced method and allocation stack traces that pointed near the points we were measuring, but off by enough to cause confusion. It's certainly better to avoid sampling bias, but I'm not sure whether or not it's net positive if the stack traces lead us down rabbit holes. |
Before this PR
sls-packaging did not enable
DebugNonSafepoints
by default, so JFR profiles may exhibit safepoint sampling bias for method profiling.After this PR
==COMMIT_MSG==
Improve the overall fidelity of JFR profiles and reduce safepoint sampling bias
Note due to https://bugs.openjdk.org/browse/JDK-8201516 there may still
be potentially small incorrect attribution, but overall this tradeoff is
worth the sampling bias impact.
See references:
==COMMIT_MSG==
Possible downsides?
Enables diagnostic JVM options by default now, and also JFR profiles may exhibit some incorrect attribution due to https://bugs.openjdk.org/browse/JDK-8201516