Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the overall fidelity of JFR profiles and reduce sampling bias #1376

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions changelog/@unreleased/pr-1376.v2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
type: improvement
improvement:
description: |-
Improve the overall fidelity of JFR profiles and reduce safepoint sampling bias

Note due to https://bugs.openjdk.org/browse/JDK-8201516 there may still
be potentially small incorrect attribution, but overall this tradeoff is
worth the sampling bias impact.

See references:
* http://hirt.se/blog/?p=609
* https://docs.oracle.com/javacomponents/jmc-5-5/jfr-runtime-guide/about.htm#JFRRT111
* https://psy-lob-saw.blogspot.com/2016/02/why-most-sampling-java-profilers-are.html
* https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html
links:
- https://github.com/palantir/sls-packaging/pull/1376
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,11 @@ public abstract class LaunchConfigTask extends DefaultTask {
// AWS-managed systems that modify DNS records on failover.
"-Dsun.net.inetaddr.ttl=20",
"-XX:NativeMemoryTracking=summary",
// Improve the overall fidelity of JFR profiles and reduce sampling bias,
// see http://hirt.se/blog/?p=609 & https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html
// Per https://bugs.openjdk.org/browse/JDK-8201516 may still be potentially incorrect attribution.
"-XX:+UnlockDiagnosticVMOptions",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some danger opting into this by default: jvm option overrides may rely upon this without realizing, meaning it cannot safely be removed from sls-packaging. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, adding this now would make removing any diagnostic VM arg that depends on it a break. We could mitigate a bit with -XX:+IgnoreUnrecognizedVMOptions where startup would no longer fail on those args, but we could silently end up in a situation where we assume args are in effect when they're not actually.

"-XX:+DebugNonSafepoints",
Copy link
Contributor Author

@schlosna schlosna Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider testing impact of preserving frame pointers for native and kernel traces, though that may be too expensive to enable by default:

Suggested change
"-XX:+DebugNonSafepoints",
"-XX:+DebugNonSafepoints",
"-XX:+PreserveFramePointer",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to benchmark myself, do you know if there's a measurable runtime cost associated with + DebugNonSafepoints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had cycles yet to benchmark with/without -XX:+DebugNonSafepoints on current JDKs for our various workflows such as WC infra. The JDK ticket and Jean-Phillipe Bempel's Why JVM modern profilers are still safepoint biased? blog post show ~2% overhead. If you have some pointers to spin those WC benchmarks up, I can try to give them a shot to understand potential fleet implications.

Async-profiler enables DebugNonSafepoints via JVMTI agent registering CompileMethodLoad

Also worth reading the note http://hirt.se/blog/?p=609 from Marcus Hirt, one of the original JRockit and JFR authors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarks don't appear to be impacted (results are within margin of error)

// Increase default JFR stack depth beyond the default (conservative) 64 frames.
// This can be overridden by user-provided options.
// See sls-packaging#1230
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,8 @@ class JavaServiceDistributionPluginTests extends GradleIntegrationSpec {
'-XX:HeapDumpPath=var/log',
'-Dsun.net.inetaddr.ttl=20',
'-XX:NativeMemoryTracking=summary',
'-XX:+UnlockDiagnosticVMOptions',
'-XX:+DebugNonSafepoints',
'-XX:FlightRecorderOptions=stackdepth=256',
'-XX:+UseParallelGC',
'-Xmx4M',
Expand All @@ -432,6 +434,8 @@ class JavaServiceDistributionPluginTests extends GradleIntegrationSpec {
'-XX:HeapDumpPath=var/log',
'-Dsun.net.inetaddr.ttl=20',
'-XX:NativeMemoryTracking=summary',
'-XX:+UnlockDiagnosticVMOptions',
'-XX:+DebugNonSafepoints',
'-XX:FlightRecorderOptions=stackdepth=256',
'-Xmx4M',
'-Djavax.net.ssl.trustStore=truststore.jks'])
Expand Down Expand Up @@ -471,6 +475,8 @@ class JavaServiceDistributionPluginTests extends GradleIntegrationSpec {
'-XX:HeapDumpPath=var/log',
'-Dsun.net.inetaddr.ttl=20',
'-XX:NativeMemoryTracking=summary',
'-XX:+UnlockDiagnosticVMOptions',
'-XX:+DebugNonSafepoints',
'-XX:FlightRecorderOptions=stackdepth=256',
"-XX:+PrintGCDateStamps",
"-XX:+PrintGCDetails",
Expand Down