Close traps when testing concurrent search #275
Comments
It's the last
Grouping was added here: #56, because without it the QPS measurement per task was meaningless.
OH! Thanks for digging @msokolov ... hmm, I think this is why QPS is such a tricky metric ...
It's been a while, but I think the issue was that we want to measure wall-clock time per task. This works OK when we run single-threaded, since the times in the log represent time spent working on that single task. However, when we run multiple tasks concurrently, the wall-clock times represent work done on all the tasks we are running, which can be of mixed types. So this grouping change ensures that we run only tasks of the same type at the same time, enabling us to attribute the wall-clock times in the logs to that task unambiguously.
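The grouping described here could be sketched roughly as follows. This is a hypothetical Java illustration, not luceneutil's actual code (the class and method names are made up): run every task of one category to completion before starting the next category, so each category's wall-clock window is attributable to that category alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GroupedRunner {
    // Run all tasks of one category to completion before starting the next
    // category, so the measured wall-clock window of each category contains
    // work from that category only. Hypothetical sketch, not luceneutil code.
    static Map<String, Long> runGrouped(Map<String, List<Runnable>> byCategory, int threads) {
        Map<String, Long> wallNanos = new LinkedHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Map.Entry<String, List<Runnable>> e : byCategory.entrySet()) {
                long start = System.nanoTime();
                List<Future<?>> futures = new ArrayList<>();
                for (Runnable task : e.getValue()) {
                    futures.add(pool.submit(task));
                }
                // Barrier: wait for the whole category before moving on.
                for (Future<?> f : futures) {
                    try {
                        f.get();
                    } catch (InterruptedException | ExecutionException ex) {
                        throw new RuntimeException(ex);
                    }
                }
                wallNanos.put(e.getKey(), System.nanoTime() - start);
            }
        } finally {
            pool.shutdown();
        }
        return wallNanos;
    }

    public static void main(String[] args) {
        Map<String, List<Runnable>> cats = new LinkedHashMap<>();
        cats.put("TermQuery", List.of(() -> {}, () -> {}));
        cats.put("Browse", List.of(() -> {}));
        System.out.println(runGrouped(cats, 2));
    }
}
```

Without the barrier between categories, the log's wall-clock times would mix work from different task types, which is exactly the ambiguity the grouping avoids.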
I guess we could switch from measuring wall-clock time to measuring CPU time using Java's JMX API. If we did that, we wouldn't need to group tasks this way, and we would possibly get a more realistic task mix, simulating cross-task effects in a mixed query load.
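A minimal sketch of what that JMX measurement could look like, using `java.lang.management.ThreadMXBean` to read per-thread CPU time (the helper name is an assumption for illustration, not existing luceneutil code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimedTask {
    // Measure CPU time (not wall-clock time) consumed by the current thread
    // while running a task. CPU time per thread is unaffected by other
    // concurrently running tasks, so no category grouping would be needed.
    static long cpuTimeNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not supported");
        }
        long start = bean.getCurrentThreadCpuTime();
        task.run();
        return bean.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) {
        long nanos = cpuTimeNanos(() -> {
            double x = 0;
            for (int i = 0; i < 1_000_000; i++) x += Math.sqrt(i);
        });
        System.out.println("CPU ns: " + nanos);
    }
}
```

Note that `getCurrentThreadCpuTime` returns -1 when the feature is disabled on the JVM, so a real harness would also check `isThreadCpuTimeEnabled`.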
+1 to explore this! This would be better than elapsed wall-clock time ... e.g. it would've prevented the whole confusion around lower QPS when multiple queries run concurrently (intra- and inter-query concurrency) ...
The thing that caused me to dig into this particular issue was watching the
Hmm OK I tried reverting that change, so we no longer group by task category:
Then I ran this
And got these results after 10 iterations:
I ran it again and got different QPS for each category because
Maybe this is also fixed by forcing |
…from 2 to max(2, <cpu-cores>/3); don't run with both intra- and inter-query concurrency
OK, I had wondered why the nightly benchmarks didn't show the (trappy) "effective QPS" slowdown when we enabled it. I'll change the nightly bench to use higher intra-query concurrency, with only one query in flight at once, so it doesn't take forever to run. But the reported effective QPS will be wonky (too high). I think we really need to switch to the true per-thread CPU time aggregation from the JMX bean. I'll open a spinoff for that ...
Spinoff from apache/lucene#13472, where we thought we saw massive gains from an improvement to how Lucene implements intra-query concurrency, but at least some of the gains were due to `luceneutil` trappiness. Some ideas:

- Rename `Competitor`'s `numThreads` to `numConcurrentQueries` (or maybe `numQueriesInFlight`?) to make it clear this controls how many concurrent queries are sent to `IndexSearcher`
- If `searchConcurrency > 1`, maybe we should not allow `numThreads > 1`? This is so trappy because the reported QPS gain from running concurrently is trappily divided by `numThreads` (see details from this comment)
- … `luceneutil`? This would drive enough threads (concurrent queries or worker thread pool) to saturate CPU and/or IO. This is a tricky change tho ... we should open a separate spinoff. E.g. the precise set of tasks matters, their mix and order matters, reproducible arrival (Poisson-like) times matter, etc.
- … `luceneutil` runs enough concurrent things (queries or work units) such that the tests run as efficiently as possible, but does not bump up against false contention on the box (i.e. does NOT saturate CPU or IO) that messes up the results
- `luceneutil` should not even report QPS but rather latency? It's confusing because it is an "effective QPS" (1/avg-of-median-query-wall-clock-time-after-discarding-warmup-and-outliers), not an actual "red-line capacity".

Also, strangely, if `searchConcurrency > 1`, we oddly group the tasks so that all tasks within a single cat are done at the same time (across all threads). This is weird, e.g. the `Browse*` tasks are single-threaded even if you ask for concurrency, and so `top` will show only 100% CPU used if you are running with six threads ...
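The "effective QPS" formula mentioned above (reciprocal of the average of per-task median wall-clock times) could be sketched roughly like this. The helper is hypothetical, not luceneutil's actual implementation, and the warmup/outlier discarding step is omitted for brevity:

```java
import java.util.Arrays;

public class EffectiveQps {
    // "Effective QPS": 1 / avg-of-median-query-wall-clock-time.
    // Each inner array holds the wall-clock times (seconds) observed for one
    // task; we take the median per task, average the medians, and invert.
    // Warmup/outlier discarding is omitted; this is a sketch, not real code.
    static double effectiveQps(double[][] perTaskWallTimesSec) {
        double sumOfMedians = 0;
        for (double[] times : perTaskWallTimesSec) {
            double[] sorted = times.clone();
            Arrays.sort(sorted);
            sumOfMedians += sorted[sorted.length / 2];
        }
        double avgMedian = sumOfMedians / perTaskWallTimesSec.length;
        return 1.0 / avgMedian;
    }

    public static void main(String[] args) {
        double[][] times = {{0.1, 0.2, 0.3}, {0.2, 0.2, 0.2}};
        System.out.println(effectiveQps(times)); // medians 0.2 and 0.2 -> 5.0
    }
}
```

This makes the trap visible: the metric is derived from per-query latency, not from measured throughput at saturation, so it is not a "red-line capacity" number.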