enable pprof cpu profiling #5081
Conversation
This adds two routes to control pprof:

- Enable CPU profiling (100Hz): http://localhost:7280/pprof/start
- Get the profile as a flamegraph: http://localhost:7280/pprof/stop

The routes are behind the `pprof` feature flag.
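For orientation, here is a minimal sketch of how such routes could be wired with warp and the pprof crate (built with its `flamegraph` feature); the function name `pprof_routes` and the exact wiring are illustrative assumptions, not the PR's actual code:

```rust
use std::sync::{Arc, Mutex};

use pprof::ProfilerGuard;
use warp::Filter;

/// Illustrative sketch of `/pprof/start` and `/pprof/stop`.
pub fn pprof_routes() -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
    // Shared slot holding the active profiler guard, if any.
    let guard: Arc<Mutex<Option<ProfilerGuard<'static>>>> = Arc::new(Mutex::new(None));
    let start_guard = guard.clone();

    let start = warp::path!("pprof" / "start").map(move || {
        let mut slot = start_guard.lock().unwrap();
        if slot.is_none() {
            // 100Hz sampling, as in the PR description.
            *slot = ProfilerGuard::new(100).ok();
        }
        "profiling started"
    });

    let stop = warp::path!("pprof" / "stop").map(move || {
        // Taking and dropping the guard stops sampling; the report becomes
        // the SVG response body.
        let mut body = String::from("no profile collected");
        if let Some(profiler) = guard.lock().unwrap().take() {
            if let Ok(report) = profiler.report().build() {
                let mut buffer = Vec::new();
                if report.flamegraph(&mut buffer).is_ok() {
                    body = String::from_utf8_lossy(&buffer).into_owned();
                }
            }
        }
        warp::reply::html(body)
    });

    start.or(stop)
}
```

The `stop` handler in this sketch builds the report inline; the review below discusses moving that work off the async runtime.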
quickwit/quickwit-serve/src/pprof.rs (outdated diff)
    })
};

fn get_flamegraph(profiler_guard: Arc<Mutex<Option<ProfilerGuard>>>) -> impl warp::Reply {
Wouldn't the following have worked too?

- fn get_flamegraph(profiler_guard: Arc<Mutex<Option<ProfilerGuard>>>) -> impl warp::Reply {
+ fn get_flamegraph(profiler_guard: &Mutex<Option<ProfilerGuard>>) -> impl warp::Reply {
In the old version, yes; in the new version we need a `'static` lifetime.
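To illustrate the lifetime point (my own sketch, not the PR's code): the `move` closure handed to a warp filter must be `'static`, so the captured state has to be owned, e.g. an `Arc`, rather than a short-lived borrow:

```rust
use std::sync::{Arc, Mutex};

use warp::Filter;

// The `move` closure must satisfy a `'static` bound, so the shared state is
// owned via `Arc`. A `&Mutex<u64>` parameter would only work if the
// reference itself were `&'static`.
fn counter_route(
    counter: Arc<Mutex<u64>>,
) -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
    warp::path!("count").map(move || {
        let mut count = counter.lock().unwrap();
        *count += 1;
        (*count).to_string()
    })
}
```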
quickwit/quickwit-serve/src/pprof.rs (outdated diff)
if let Some(profiler) = guard.take() {
    if let Ok(report) = profiler.report().build() {
        let mut buffer = Vec::new();
        if report.flamegraph(&mut buffer).is_ok() {
Do you know if generating the report is CPU-heavy? If so, we should probably run it on the thread pool for CPU-intensive work: `run_cpu_intensive` in `quickwit_common`.
Yes, they can be CPU-intensive; I moved it to a new thread.
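A sketch of what that offloading can look like with `tokio::task::spawn_blocking` (the helper name `build_flamegraph` is hypothetical; the PR's later revision, quoted below, takes a similar approach):

```rust
use pprof::ProfilerGuard;
use tokio::task::spawn_blocking;

// Building the report walks and symbolizes every collected sample, which can
// be CPU-heavy, so it is moved onto a blocking thread.
async fn build_flamegraph(profiler: ProfilerGuard<'static>) -> Option<Vec<u8>> {
    spawn_blocking(move || {
        let report = profiler.report().build().ok()?;
        let mut buffer = Vec::new();
        report.flamegraph(&mut buffer).ok()?;
        Some(buffer)
    })
    .await
    .ok()
    .flatten()
}
```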
That looks good for experimentation, but it is quite the footgun. Are the gathered statistics really bounded?
What would happen if someone called start and never called stop?
Can we guard ourselves against that?
I assume your point is that it feels silly to expect a GET request to run for over 30s.
Could it be two endpoints like you did, but where `pprof/run` would start profiling for 30s, and `pprof/flamegraph` would return the last computed profile, or something like that?
In your tests, was 100Hz sufficient? Should we maybe make the frequency and the duration configurable?
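A rough sketch of that suggested shape (the parameter names `duration` and `frequency`, the 300s/1000Hz caps, and the route wiring are my assumptions, not the PR's final code):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Duration;

use warp::Filter;

/// `/pprof/run?duration=30&frequency=100`: profile for a bounded time, then
/// store the rendered flamegraph for later retrieval.
fn pprof_run_route(
    last_flamegraph: Arc<Mutex<Vec<u8>>>,
) -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
    warp::path!("pprof" / "run")
        .and(warp::query::<HashMap<String, String>>())
        .map(move |params: HashMap<String, String>| {
            // Clamped defaults guard against "start and never stop".
            let duration: u64 = params
                .get("duration")
                .and_then(|s| s.parse().ok())
                .unwrap_or(30)
                .min(300);
            let frequency: i32 = params
                .get("frequency")
                .and_then(|s| s.parse().ok())
                .unwrap_or(100)
                .min(1000);
            let last_flamegraph = last_flamegraph.clone();
            // A dedicated thread bounds the profiling window without tying
            // up the async runtime.
            std::thread::spawn(move || {
                if let Ok(profiler) = pprof::ProfilerGuard::new(frequency) {
                    std::thread::sleep(Duration::from_secs(duration));
                    if let Ok(report) = profiler.report().build() {
                        let mut buffer = Vec::new();
                        if report.flamegraph(&mut buffer).is_ok() {
                            *last_flamegraph.lock().unwrap() = buffer;
                        }
                    }
                }
            });
            format!("profiling for {duration}s at {frequency}Hz")
        })
}

/// `/pprof/flamegraph`: return the last computed profile.
fn pprof_flamegraph_route(
    last_flamegraph: Arc<Mutex<Vec<u8>>>,
) -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
    warp::path!("pprof" / "flamegraph").map(move || {
        let svg = last_flamegraph.lock().unwrap().clone();
        warp::reply::html(String::from_utf8_lossy(&svg).into_owned())
    })
}
```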
add safeguards
add query parameters
keep and return last flamegraph
async fn save_flamegraph(
    profiler_state: Arc<Mutex<ProfilerState>>,
) -> tokio::task::JoinHandle<()> {
    spawn_blocking(move || {
Technically it is better to use `quickwit_common::run_cpu_intensive`, because `spawn_blocking` is backed by an unbounded thread pool.
That being said, here I think we don't care much.
Fix if you want.
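For completeness, a hypothetical sketch of that swap; I am assuming `run_cpu_intensive` takes a closure and returns a future of its result, which the discussion suggests but which I have not verified against quickwit_common:

```rust
// Hypothetical sketch: `run_cpu_intensive` runs the closure on a bounded
// pool sized for CPU-bound work, unlike `spawn_blocking`, whose thread pool
// can grow without limit. (Signature assumed: closure in, future out.)
async fn save_flamegraph_on_cpu_pool(report: pprof::Report) -> Option<Vec<u8>> {
    quickwit_common::run_cpu_intensive(move || {
        let mut buffer = Vec::new();
        report.flamegraph(&mut buffer).ok()?;
        Some(buffer)
    })
    .await
    .ok()
    .flatten()
}
```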
I didn't see memory issues while collecting, but there is usually a performance hit, so we should avoid running it indefinitely by accident. 100Hz is usually fine, except for very short-running queries.
I'm going to move this endpoint to …
This adds two routes to control pprof:

- Enable CPU profiling (100Hz): http://localhost:7280/pprof/start
- Get the profile as a flamegraph: http://localhost:7280/pprof/flamegraph

The routes are behind the `pprof` feature flag: `cargo install --locked --path . --features pprof`