
enable pprof cpu profiling #5081

Merged: 3 commits merged into main from pprof_rs on Jun 5, 2024
Conversation

@PSeitz (Contributor) commented on Jun 5, 2024:

This adds two routes to control pprof:

- Enable CPU profiling (100 Hz): http://localhost:7280/pprof/start
- Get the profile as a flamegraph: http://localhost:7280/pprof/flamegraph

The routes are behind the `pprof` feature flag:

```sh
cargo install --locked --path . --features pprof
```
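For reference, a minimal self-contained sketch of what such routes can look like with warp and the pprof crate. This is an illustration, not the actual Quickwit code: error handling is simplified, and it assumes pprof is built with its `flamegraph` feature.

```rust
use std::sync::{Arc, Mutex};

use warp::{Filter, Reply};

// Shared profiler handle: Some(_) while profiling is running.
type SharedGuard = Arc<Mutex<Option<pprof::ProfilerGuard<'static>>>>;

#[tokio::main]
async fn main() {
    let guard: SharedGuard = Arc::new(Mutex::new(None));
    let start_guard = guard.clone();

    // GET /pprof/start: begin sampling the process at 100 Hz.
    let start = warp::path!("pprof" / "start").map(move || {
        let mut slot = start_guard.lock().unwrap();
        if slot.is_none() {
            *slot = pprof::ProfilerGuard::new(100).ok();
        }
        "profiling started"
    });

    // GET /pprof/flamegraph: stop sampling and render the profile as SVG.
    let flamegraph = warp::path!("pprof" / "flamegraph").map(move || {
        let report = guard
            .lock()
            .unwrap()
            .take()
            .and_then(|profiler| profiler.report().build().ok());
        match report {
            Some(report) => {
                let mut svg = Vec::new();
                report.flamegraph(&mut svg).ok();
                warp::http::Response::builder()
                    .header("content-type", "image/svg+xml")
                    .body(svg)
                    .unwrap()
                    .into_response()
            }
            None => warp::http::StatusCode::BAD_REQUEST.into_response(),
        }
    });

    warp::serve(start.or(flamegraph))
        .run(([127, 0, 0, 1], 7280))
        .await;
}
```

To try it: hit http://localhost:7280/pprof/start, let the workload run, then open http://localhost:7280/pprof/flamegraph in a browser.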

github-actions bot commented Jun 5, 2024

On SSD:

Average search latency is 1.0x that of the reference (lower is better).
Ref run id: 1870, ref commit: b57eb12

On GCS:

Average search latency is 1.2x that of the reference (lower is better).
Ref run id: 1871, ref commit: b57eb12

```rust
    })
};

fn get_flamegraph(profiler_guard: Arc<Mutex<Option<ProfilerGuard>>>) -> impl warp::Reply {
```
A reviewer (Contributor) commented:

Wouldn't the following have worked too?

Suggested change:

```diff
-fn get_flamegraph(profiler_guard: Arc<Mutex<Option<ProfilerGuard>>>) -> impl warp::Reply {
+fn get_flamegraph(profiler_guard: &Mutex<Option<ProfilerGuard>>) -> impl warp::Reply {
```

@PSeitz (Author) replied:

In the old version, yes; in the new version we need a `'static` lifetime.
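To illustrate the constraint with a hypothetical example (not from this PR): warp filters are typically stored and served for the lifetime of the process, so the closures they capture must be `'static` and cannot borrow a caller's `Mutex`. An `Arc` clone moved into the closure satisfies this.

```rust
use std::sync::{Arc, Mutex};

use warp::Filter;

// Compiles: the moved Arc gives the closure ownership of the state,
// so the returned filter is `'static` and can outlive the caller.
fn counter_route(
    state: Arc<Mutex<u64>>,
) -> impl Filter<Extract = (String,), Error = warp::Rejection> + Clone {
    warp::path!("count").map(move || {
        let mut count = state.lock().unwrap();
        *count += 1;
        count.to_string()
    })
}

// A `state: &Mutex<u64>` parameter would not compile once the filter
// must be `'static` (e.g. passed to `warp::serve` from another function):
// the returned filter would borrow from the caller's stack frame.
```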

```rust
if let Some(profiler) = guard.take() {
    if let Ok(report) = profiler.report().build() {
        let mut buffer = Vec::new();
        if report.flamegraph(&mut buffer).is_ok() {
```
A reviewer (Contributor) commented:

Do you know if generating the report is CPU-heavy? If so, we should probably run it on the thread pool for CPU-intensive work: `run_cpu_intensive` in `quickwit_common`.

@PSeitz (Author) replied:

Yes, they can be CPU-intensive; I moved it to a new thread.
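The pattern under discussion, as a hedged sketch (hypothetical helper name `render_flamegraph`; the actual diff further below uses `spawn_blocking` inside `save_flamegraph`): build the pprof report on a blocking thread so the async runtime's worker threads are not stalled.

```rust
use std::sync::{Arc, Mutex};

use tokio::task::spawn_blocking;

// Stop the profiler, build the report, and render the flamegraph SVG,
// all off the async runtime's worker threads.
async fn render_flamegraph(
    profiler_guard: Arc<Mutex<Option<pprof::ProfilerGuard<'static>>>>,
) -> Option<Vec<u8>> {
    spawn_blocking(move || {
        let profiler = profiler_guard.lock().unwrap().take()?;
        let report = profiler.report().build().ok()?;
        let mut svg = Vec::new();
        report.flamegraph(&mut svg).ok()?;
        Some(svg)
    })
    .await
    .ok()
    .flatten()
}
```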

@fulmicoton (Contributor) left a comment:

That looks good for experimentation, but this is quite the footgun. Are the gathered statistics really bounded?

What would happen if someone called start and never called stop? Can we guard against that?

I assume your point is that it feels silly to expect a GET request to run for over 30s. Could it be two endpoints like you did, but:

- pprof/run would start profiling for 30s, and
- pprof/flamegraph would return the last computed profile, or something like that?

In your test, was 100 Hz sufficient? Should we maybe make the frequency and the duration configurable?
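One possible shape for these safeguards, as a hypothetical sketch (not the PR's actual code: the parameter names, clamping limits, and caching structure are illustrative assumptions): profiling runs for a bounded duration with a configurable frequency, then stops itself and caches the flamegraph.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

use serde::Deserialize;
use warp::Filter;

#[derive(Deserialize)]
struct ProfilerParams {
    // Sampling frequency in Hz and profiling duration in seconds;
    // both optional, with illustrative defaults and upper bounds.
    frequency: Option<i32>,
    duration: Option<u64>,
}

type SharedGuard = Arc<Mutex<Option<pprof::ProfilerGuard<'static>>>>;
type LastFlamegraph = Arc<Mutex<Option<Vec<u8>>>>;

// GET /pprof/start?frequency=100&duration=30: profile for a bounded time,
// then stop automatically and cache the flamegraph, so a forgotten "stop"
// can never leave the profiler running indefinitely.
fn start_route(
    guard: SharedGuard,
    last_flamegraph: LastFlamegraph,
) -> impl Filter<Extract = (&'static str,), Error = warp::Rejection> + Clone {
    warp::path!("pprof" / "start")
        .and(warp::query::<ProfilerParams>())
        .map(move |params: ProfilerParams| {
            let frequency = params.frequency.unwrap_or(100).clamp(1, 1000);
            let duration = params.duration.unwrap_or(30).min(300);
            let mut slot = guard.lock().unwrap();
            if slot.is_some() {
                return "profiler already running";
            }
            *slot = pprof::ProfilerGuard::new(frequency).ok();
            drop(slot);

            // Safeguard: after the deadline, stop the profiler and cache
            // the rendered flamegraph for a later GET /pprof/flamegraph.
            let guard = guard.clone();
            let last_flamegraph = last_flamegraph.clone();
            tokio::spawn(async move {
                tokio::time::sleep(Duration::from_secs(duration)).await;
                if let Some(profiler) = guard.lock().unwrap().take() {
                    if let Ok(report) = profiler.report().build() {
                        let mut svg = Vec::new();
                        if report.flamegraph(&mut svg).is_ok() {
                            *last_flamegraph.lock().unwrap() = Some(svg);
                        }
                    }
                }
            });
            "profiling started"
        })
}
```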

Follow-up commits:

- add safeguards
- add query parameters
- keep and return last flamegraph
```rust
async fn save_flamegraph(
    profiler_state: Arc<Mutex<ProfilerState>>,
) -> tokio::task::JoinHandle<()> {
    spawn_blocking(move || {
```
A reviewer (Contributor) commented:

Technically, it is better to use `quickwit_common::run_cpu_intensive`, because `spawn_blocking` is backed by an unbounded thread pool.

That being said, I don't think we care much here. Fix if you want.

@PSeitz (Author) commented on Jun 5, 2024:

I didn't see memory issues while collecting, but there is usually a performance hit, so we should avoid running it indefinitely by accident.

100 Hz is usually fine, except for very short-running queries.

@PSeitz merged commit 01571db into main on Jun 5, 2024; 5 checks passed. @PSeitz then deleted the pprof_rs branch.
@guilload (Member) commented on Jun 5, 2024:

I'm going to move this endpoint to api/developer/pprof*.
