Last value Cache - Increase in outliers beyond 16 threads for 100 series #25562

MaduMitha-Ravi · 2024-11-18T03:36:37Z

Increase in Outliers beyond 16 thread concurrency (Last value cache)

With more than 16 concurrent threads, we observe more outliers, at roughly 5x-10x the typical latency, which impacts the P95 numbers.
CPU usage was below 20% and memory consumption was below 25%.
This pattern suggests that some restriction or limitation is producing the latency outliers.

Could there be a wait happening on some internal resources?

Evidence

Note: We capture latency (reported as P95) by running the load in background threads (12, 14, 16, etc.) and collecting metrics from just one of them. This shows how a particular user observes performance under concurrent load. That said, QPS could also have been impacted by the observed outliers.
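The measurement approach described in the note can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark harness: `run_query`, `measure_probe_latency`, and `percentile` are hypothetical names, and the query is replaced by a short sleep.

```python
import threading
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(samples)
    # Ceiling division: smallest rank covering at least pct% of the samples.
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]


def measure_probe_latency(run_query, threads, probe_requests):
    """Run `threads - 1` background load threads and time one probe thread.

    `run_query` is a hypothetical callable that issues a single query.
    Only the probe's latencies are returned, mirroring "collect the
    metrics from just one" thread.
    """
    stop = threading.Event()

    def background():
        # Background threads only generate load; their timings are discarded.
        while not stop.is_set():
            run_query()

    workers = [
        threading.Thread(target=background, daemon=True)
        for _ in range(threads - 1)
    ]
    for worker in workers:
        worker.start()

    latencies = []
    try:
        for _ in range(probe_requests):
            start = time.perf_counter()
            run_query()
            latencies.append(time.perf_counter() - start)
    finally:
        stop.set()
        for worker in workers:
            worker.join()
    return latencies


if __name__ == "__main__":
    # Assumption: a 1 ms sleep stands in for a real last-value-cache query.
    samples = measure_probe_latency(lambda: time.sleep(0.001), threads=16, probe_requests=200)
    print("P95 seconds:", percentile(samples, 95))
```

Because only one thread is sampled, a handful of slow requests on that thread is enough to move the reported P95, which is consistent with outliers dominating the numbers at higher thread counts.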

hiltontj · 2024-11-18T15:42:10Z

Hey @MaduMitha-Ravi - I'm wondering if we have observed a similar breakdown in performance at higher thread counts when issuing regular queries, i.e., queries that do not go to the last cache? I want to rule out that this is something systemic, rather than specific to the last cache, before digging into what might be wrong in the cache.

MaduMitha-Ravi · 2024-11-18T15:45:45Z

I will do some quick runs and update in here. We can modify the issue based on evidence.

MaduMitha-Ravi · 2024-11-18T21:42:05Z

@hiltontj Your suspicion is right. Outliers increase with concurrency for regular queries as well.

pauldix · 2024-11-19T00:36:02Z

@hiltontj We encountered a concurrency problem in IOx before that required moving query planning off of the IO threadpool and onto the DF (DataFusion) threadpool. The PR is influxdata/influxdb_iox#11029, which has pointers to related PRs and issues that are worth reading through.

Basically, we weren't able to take advantage of all the cores of a larger machine because we have two threadpools: one for tokio IO and one for DF query execution. Too much happening in the IO threadpool would cause IO stalls and make it so we couldn't effectively utilize all cores.
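As a loose analogy only (this is not IOx or tokio code), the effect of doing heavy work on the IO side can be sketched in Python with `asyncio`: the event loop stands in for the IO threadpool, and the default executor stands in for a separate execution pool. `plan_query` and `max_heartbeat_gap` are hypothetical names, and the planning work is simulated with a blocking sleep.

```python
import asyncio
import time


def plan_query():
    # Hypothetical stand-in for blocking planning work (simulated, 200 ms).
    time.sleep(0.2)


async def max_heartbeat_gap(offload):
    """Return the longest gap seen by an IO-like task that wants to run every 10 ms."""
    loop = asyncio.get_running_loop()
    gaps = []
    done = asyncio.Event()

    async def heartbeat():
        last = time.perf_counter()
        while not done.is_set():
            await asyncio.sleep(0.01)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)  # let the heartbeat record a few normal gaps

    if offload:
        # Work runs on a separate pool; the loop stays free for IO tasks.
        await loop.run_in_executor(None, plan_query)
    else:
        # Work runs on the loop itself; every IO task stalls until it returns.
        plan_query()

    done.set()
    await task
    return max(gaps)


if __name__ == "__main__":
    print("work on the loop :", asyncio.run(max_heartbeat_gap(offload=False)))
    print("work offloaded   :", asyncio.run(max_heartbeat_gap(offload=True)))
```

When the work runs on the loop, the heartbeat's worst gap is at least as long as the blocking call, which mirrors the IO stalls described above; offloading keeps the gap close to the heartbeat interval.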

Might be the case again, but might not. Thought it was worth highlighting.

hiltontj · 2024-11-19T14:36:56Z

Thanks for confirming @MaduMitha-Ravi and for the pointer @pauldix - @MaduMitha-Ravi is this a major blocker? If so, I can start looking into it; otherwise, I will dive into this next week once I am through with #25539

MaduMitha-Ravi · 2024-11-19T14:44:58Z

@hiltontj Not a blocker, just a concern. We can take it up next week.

MaduMitha-Ravi added the v3 label Nov 18, 2024
