Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the overall fidelity of JFR profiles and reduce sampling bias #1376

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

schlosna
Copy link
Contributor

@schlosna schlosna commented Aug 6, 2022

Before this PR

sls-packaging did not enable DebugNonSafepoints by default, so JFR profiles may exhibit safepoint sampling bias for method profiling.

After this PR

==COMMIT_MSG==
Improve the overall fidelity of JFR profiles and reduce safepoint sampling bias

Note due to https://bugs.openjdk.org/browse/JDK-8201516 there may still
be potentially small incorrect attribution, but overall this tradeoff is
worth the sampling bias impact.

See references:

Possible downsides?

Enables diagnostic JVM options by default now, and also JFR profiles may exhibit some incorrect attribution due to https://bugs.openjdk.org/browse/JDK-8201516

@schlosna schlosna requested a review from carterkozak August 6, 2022 15:14
@changelog-app
Copy link

changelog-app bot commented Aug 6, 2022

Generate changelog in changelog/@unreleased

Type
See change types. Select one:

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Improve the overall fidelity of JFR profiles and reduce safepoint sampling bias

Note due to https://bugs.openjdk.org/browse/JDK-8201516 there may still
be potentially small incorrect attribution, but overall this tradeoff is
worth the sampling bias impact.

See references:

Check the box to generate changelog(s)

  • Generate changelog entry

// see http://hirt.se/blog/?p=609 & https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html
// Per https://bugs.openjdk.org/browse/JDK-8201516 may still be potentially incorrect attribution.
"-XX:+UnlockDiagnosticVMOptions",
"-XX:+DebugNonSafepoints",
Copy link
Contributor Author

@schlosna schlosna Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider testing impact of preserving frame pointers for native and kernel traces, though that may be too expensive to enable by default:

Suggested change
"-XX:+DebugNonSafepoints",
"-XX:+DebugNonSafepoints",
"-XX:+PreserveFramePointer",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to benchmark myself, do you know if there's a measurable runtime cost associated with + DebugNonSafepoints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had cycles yet to benchmark with/without -XX:+DebugNonSafepoints on current JDKs for our various workflows such as WC infra. The JDK ticket and Jean-Phillipe Bempel's Why JVM modern profilers are still safepoint biased? blog post show ~2% overhead. If you have some pointers to spin those WC benchmarks up, I can try to give them a shot to understand potential fleet implications.

Async-profiler enables DebugNonSafepoints via JVMTI agent registering CompileMethodLoad

Also worth reading the note http://hirt.se/blog/?p=609 from Marcus Hirt, one of the original JRockit and JFR authors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarks don't appear to be impacted (results are within margin of error)

// Improve the overall fidelity of JFR profiles and reduce sampling bias,
// see http://hirt.se/blog/?p=609 & https://jpbempel.github.io/2022/06/22/debug-non-safepoints.html
// Per https://bugs.openjdk.org/browse/JDK-8201516 may still be potentially incorrect attribution.
"-XX:+UnlockDiagnosticVMOptions",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some danger opting into this by default: jvm option overrides may rely upon this without realizing, meaning it cannot safely be removed from sls-packaging. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, adding this now would make removing any diagnostic VM arg that depends on it a break. We could mitigate a bit with -XX:+IgnoreUnrecognizedVMOptions where startup would no longer fail on those args, but we could silently end up in a situation where we assume args are in effect when they're not actually.

@schlosna
Copy link
Contributor Author

@carterkozak curious for your thoughts on this again. I can rebase on develop if interested.

@carterkozak
Copy link
Contributor

I think last time we tried this, it produced method and allocation stack traces that pointed near the points we were measuring, but off by enough to cause confusion. It's certainly better to avoid sampling bias, but I'm not sure whether or not it's net positive if the stack traces lead us down rabbit holes.
I'd certainly be willing to try it more broadly for a time to collect more data, the cost of regret should be very low.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants