
Search is too slow #5176

Open

Tyriar opened this issue Oct 3, 2024 · 8 comments

Tyriar (Member) commented Oct 3, 2024

Currently the search addon limits itself to 1000 highlights by default, compared to Monaco's 20000 results before it gives up highlighting. When we increase our limit to 20000, the search itself ends up blocking the renderer pretty significantly.

Here is a profile using my i7-12700KF 3.61 GHz:

Test file: test.txt

(profile screenshots)

Note that the search is also not incremental, so successive searches all take the same amount of time, even for non-regex searches (#5177).

VS Code issue: microsoft/vscode#230365

jerch (Member) commented Oct 3, 2024

Hmm, I am pretty sure that Monaco makes use of an index; text data in editors tends to stay pretty stable aside from the few lines the user is currently editing, so even a more costly index build is justified by the low change rate over time.

The terminal buffer is not quite the same in that regard: its text data changes substantially over time, so the question is whether we can find a low-cost index representation that still speeds up the search significantly.

I have literally no clue about the search addon, so I don't know how it currently approaches this. I would probably check whether a trie / suffix array can help here, but those are typically quite expensive to create, so they might be too much for fast-changing data. If that's the case, a lighter version that only caches line & content positions for the first n letters and then switches to evaluating line content may be faster. How deep such an index cache should go depends on the trie branching and the size of the position buckets, which is again data-dependent (so there is no universally good solution). A rough sketch of that lighter caching variant follows below.
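To make that idea concrete, here is a minimal sketch of a first-codepoint position cache. It assumes the buffer is exposed as a plain string array, which is a big simplification of xterm.js's real cell-based buffer; all names (BufferPos, buildIndex, searchWithIndex) are made up for illustration.

// Hypothetical light index: first codepoint -> positions where it occurs.
// The plain string array is an assumption, not xterm's actual buffer shape.
interface BufferPos {
  row: number;
  col: number;
}

function buildIndex(lines: string[]): Map<number, BufferPos[]> {
  const index = new Map<number, BufferPos[]>();
  for (let row = 0; row < lines.length; row++) {
    const line = lines[row];
    for (let col = 0; col < line.length; col++) {
      const cp = line.codePointAt(col)!;
      const bucket = index.get(cp);
      if (bucket) {
        bucket.push({ row, col });
      } else {
        index.set(cp, [{ row, col }]);
      }
      if (cp > 0xffff) col++; // skip the low surrogate of an astral codepoint
    }
  }
  return index;
}

function searchWithIndex(lines: string[], index: Map<number, BufferPos[]>, needle: string): BufferPos[] {
  // Only positions whose first codepoint matches are evaluated in full.
  const candidates = index.get(needle.codePointAt(0)!) ?? [];
  return candidates.filter(p => lines[p.row].startsWith(needle, p.col));
}

The data dependence mentioned above shows up directly in the bucket sizes: for a buffer that is a flat repeat of the needle, every bucket is huge and the index buys nothing.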

Tyriar (Member, Author) commented Oct 3, 2024

I don't think Monaco actually has a particularly clever index as you're suggesting, just caching similar to ours (I might be wrong). I think the performance problems mainly stem from the fact that neither of us got very involved in the search addon, and performance wasn't a big focus. The cost of registering so many decorations might be a part of this too; that would need investigation.

jerch (Member) commented Oct 3, 2024

I don't think monaco has a particularly clever index as you're suggesting actually...

Ah ok, well that was just a wild guess on my side. But if they do it the same way the search addon currently does, the perf difference might indeed come from another bottleneck rather than the actual positional search.

jerch (Member) commented Oct 6, 2024

Did a few orienting perf tests with your test file to see where we start from (scrollback set to 100k, searching the full file line count in one go with sync code that just collects buffer positions):

  • turn the whole buffer into one big string and search with indexOf('FIXES'): 72960 matches in ~100 ms
    • (translateToString takes ~45 ms of that time)
  • search directly on the arraybuffer with codepoints:
    • uncached: 72960 matches in ~55 ms
    • 1st codepoint cached:
      • 1st run: 72960 matches in ~70 ms
      • later runs: 72960 matches in ~40 ms

Seems the lower limit for a naive search over the whole buffer is 50-100 ms with JavaScript on my machine for your test file. While WASM typically shows a 2-5x speedup for those "chew on a big chunk of data" tasks, I don't expect it to be faster here due to the needed data copies, so I did not test it. For reference, the direct codepoint scan is sketched below.
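Conceptually, the uncached direct-buffer variant measured above is just a naive scan over codepoints. A sketch under the assumption of a flat Uint32Array of codepoints; the real buffer stores cells with attributes, so this is not the addon's actual data layout.

// Naive substring search over a flat Uint32Array of codepoints.
function toCodepoints(text: string): Uint32Array {
  // Iterating a string yields full codepoints, not UTF-16 units.
  return Uint32Array.from(text, ch => ch.codePointAt(0)!);
}

function findAll(haystack: Uint32Array, needle: Uint32Array): number[] {
  const matches: number[] = [];
  const last = haystack.length - needle.length;
  outer:
  for (let i = 0; i <= last; i++) {
    if (haystack[i] !== needle[0]) continue; // cheap first-codepoint filter
    for (let j = 1; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    matches.push(i);
  }
  return matches;
}

// findAll(toCodepoints('FIXES a FIXES'), toCodepoints('FIXES')) -> [0, 8]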

Configuring the current code to also match all occurrences currently takes ~6.3 s:

(profile screenshot)

mainly caused by setTimeout/clearTimeout calls:

(profile screenshot)

from these lines:

window.clearTimeout(this._linesCacheTimeoutId);
this._linesCacheTimeoutId = window.setTimeout(() => this._destroyLinesCache(), LINES_CACHE_TIME_TO_LIVE);

Disabling them, it runs at ~800 ms (~8x faster):

(profile screenshot)

Remember that those numbers cannot be compared directly (the current addon code does a lot more than just collecting buffer positions), but they still give some hints about the possible gains. Also, I did not test any more clever caching, as the file is a flat repeat of the search string, so any branch cutting on lines won't apply at all. A sketch of a timer-free cache invalidation follows below.
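One cheap way to get rid of the timer churn, as a sketch rather than the actual fix: stamp the cache with a timestamp and invalidate lazily on the next access, instead of rescheduling a timeout on every call. The cache shape below is a placeholder, and the LINES_CACHE_TIME_TO_LIVE value is an assumed stand-in for the constant referenced above.

const LINES_CACHE_TIME_TO_LIVE = 15 * 1000; // ms, assumed value for illustration

class LinesCache {
  private _lines: string[] | undefined;
  private _touchedAt = 0;

  get(build: () => string[]): string[] {
    const now = Date.now();
    // Invalidate lazily on access; no clearTimeout/setTimeout per search call.
    if (!this._lines || now - this._touchedAt > LINES_CACHE_TIME_TO_LIVE) {
      this._lines = build();
    }
    this._touchedAt = now;
    return this._lines;
  }
}

A single long-lived interval could still reclaim the memory of an idle terminal; the point is only to avoid allocating one timer pair per cache access.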

jerch (Member) commented Oct 7, 2024

@Tyriar Did a bit more investigation: when I add the highlightAll part to my fast direct buffer search, things don't look good anymore; the runtime for 72960 highlighted matches is now ~600 ms:

  • buffer search: 46 ms
  • _createResultDecoration is at ~470 ms, mostly resulting from:
    • registerMarker ~170 ms
    • registerDecoration ~170 ms

Tracing things further down, big portions of the overhead runtime come from:

  • src/vs/base/common/event.ts:L1097 with ~290 ms (mostly from markers)
  • minor GC with ~190 ms (GC invocations mostly point into vs event codebase)

To me it seems that the vs event system is too heavy for fast marker creation. The results also suggest that the event impl is not specifically optimized for a low GC profile:

(profile screenshot)

registerMarker path

(profile screenshot)

registerDecoration path

(profile screenshot)

I'd suggest re-evaluating the usage of the vs event emitter for terminal markers. I don't know if it could be made faster per se; the code looks quite convoluted to me, and I guess that's for a reason. Maybe we don't need all the event fanciness for markers and can get away with a less complete but much faster impl instead, along the lines of the sketch below.
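As a thought experiment, a "less complete but much faster" emitter could be as small as the following: one listener array, no leak tracking, no delivery queues, and explicitly not drop-in compatible with vs/base's Emitter.

type Listener<T> = (e: T) => void;

class TinyEmitter<T> {
  private _listeners: Listener<T>[] = [];

  event = (listener: Listener<T>): { dispose(): void } => {
    this._listeners.push(listener);
    return {
      dispose: () => {
        const i = this._listeners.indexOf(listener);
        if (i !== -1) this._listeners.splice(i, 1);
      }
    };
  };

  fire(e: T): void {
    // Copy so listeners removed during dispatch don't shift the iteration.
    for (const l of this._listeners.slice()) l(e);
  }
}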

jerch (Member) commented Oct 8, 2024

More profiling, this time to get to the bottom of the high GC:

(allocation profile screenshot)

I had to lower scrollback to 10000 to get this profile to load; it matched 10018 times in the buffer and highlighted those. The allocation table looks crazy: 360000 function allocations alone for 10k highlights. Note that all numbers are almost perfect multiples of 10018, so what we see here is the per-decoration "workload":

  • Function: 36x, ~1700 bytes
  • Object: 12x, ~1690 bytes
  • Context: 33x, ~1632 bytes
  • (JSArrayBufferData: those are the bufferlines, so not newly allocated)
  • Array: 4x, ~1465 bytes
  • _Marker2: 1x, ~1148 bytes
  • UniqueContainer: 8x, ~384 bytes
  • Emitter: 3x, ~368 bytes
  • Decoration: 1x, ~332 bytes

Now those numbers cannot simply be added up to get the full cost of one decoration, as "Context" also accounts for the arrow "Function"s, and "Object" also contains the bootstrapping setup (thus all bufferlines). What can be said, though: decoration creation is highly dominated by function creations, most of them arrow functions. Looking into the details under "Function" and "Context", it becomes clear that it is mostly the marker attached to a decoration that creates this function pressure. In the code, all these anonymous arrow functions have to go through the event emitter setup, which explains why event.ts:1097 is so high in the timeline profile and triggers GC so often.

The numbers above align with the JS Heap growing from initially 8 MB to 40 MB after decorating 10k matches.

Long story short: our marker setup, which places tons of arrow functions on onDispose, is very heavy. We should probably make that much more lightweight if we want it to work with 10k+ decorations. One possible direction is sketched below.
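Purely a sketch of that direction: let a container own the markers and do the disposal housekeeping in one sweep, so no per-marker onDispose closure is ever allocated. LightMarker below is a stand-in for xterm's real IMarker, not its actual interface.

class LightMarker {
  isDisposed = false;
  constructor(public readonly line: number) {}
}

class MarkerSet {
  private _markers = new Set<LightMarker>();

  add(line: number): LightMarker {
    const m = new LightMarker(line);
    this._markers.add(m); // no per-marker onDispose closure allocated
    return m;
  }

  disposeAll(): void {
    // One sweep replaces thousands of individual onDispose callbacks.
    for (const m of this._markers) m.isDisposed = true;
    this._markers.clear();
  }
}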

jerch (Member) commented Oct 8, 2024

And here are the allocation counts for 10k matches:

(allocation profile screenshot)

indicating that _event gets called very often from addMarker and _createResultDecoration. The high runtime is, in the end, the result of a high call count on that getter, which creates function objects over and over (also causing high GC pressure).

Looking at event.ts:1097, I wonder whether reshaping the getter with its anonymous function creation into a directly called function would lower the runtime stress a lot here. Something like the pattern below.
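Illustrative only, since the real code lives in vs/base/common/event.ts and is far more involved: the difference between a getter that hands out a fresh closure per access and one that caches it looks roughly like this.

type Handler = (e: number) => void;

class PerAccessEmitter {
  private _handlers: Handler[] = [];
  // Every read of `.event` allocates a fresh closure; with one read per
  // marker this shows up as function allocation and GC pressure.
  get event(): (h: Handler) => void {
    return h => { this._handlers.push(h); };
  }
}

class CachedEmitter {
  private _handlers: Handler[] = [];
  private _event?: (h: Handler) => void;
  // The subscription function is created once and reused afterwards.
  get event(): (h: Handler) => void {
    return this._event ??= (h => { this._handlers.push(h); });
  }
}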

jerch (Member) commented Oct 8, 2024

Profiling v5.5.0 (before the vs addition): it finishes in ~400 ms (vs 600 ms in master) and shows a less toxic function creation overhead:

(profile screenshots)

So with the VS event emitter we lost ~50% performance on marker/decoration creation (+200 ms). The original runtime of 400 ms with the old event emitter is still high, mainly caused by the big stack of single actions at each marker/decoration creation. To fix the latter, the whole marker/decoration impl would be on trial: it either needs batchers that save runtime across multiple creations at once, or very reduced marker/decoration impls with containers doing the event housekeeping in batches, roughly like the sketch below.
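The batching idea could look roughly like this sketch, with all names hypothetical: requests are collected and handed over in one pass, so event housekeeping runs once per batch instead of once per marker.

interface DecorationRequest {
  line: number;
}

class DecorationBatch {
  private _pending: DecorationRequest[] = [];

  add(line: number): void {
    this._pending.push({ line });
  }

  // `register` stands in for a hypothetical bulk registration API that
  // would fire a single change event for the whole batch.
  flush(register: (requests: DecorationRequest[]) => void): void {
    if (this._pending.length === 0) return;
    register(this._pending);
    this._pending = [];
  }
}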
