build: faster newLinesIndices via bytes.IndexByte and buffer re-use #680

keegancsmith · 2023-11-06T09:55:08Z

See the individual commits for an explanation. The overall performance improvement of BenchmarkTagsToSections on my machine is:

goos: linux
goarch: amd64
pkg: github.com/sourcegraph/zoekt/build
cpu: AMD Ryzen 9 5950X 16-Core Processor

old time/op    new time/op    delta
   188µs ± 7%     101µs ± 3%  -46.10%  (p=0.000 n=10+10)

old alloc/op   new alloc/op   delta
  79.3kB ± 0%    36.3kB ± 0%  -54.24%  (p=0.000 n=9+10)

old allocs/op  new allocs/op  delta
     443 ± 0%       441 ± 0%   -0.45%  (p=0.000 n=10+10)

keegancsmith · 2023-11-06T10:01:05Z

This is quite fast, so I don't this the speed improvements are that important. However, the memory usage improvements are likely noticeable due to less work the GC needs to do.

Also I recently instrumented the time taken by the ctags parser and didn't include the time taken by tagsToSections. I think given the speed of this that is likely the correct call. If we are slow here it will be ctags not this conversion code.

jtibshirani

Looks good to me! I didn't see that this folded in my commits and already merged the original one, so you'll need to rebase.

On my machine this reduces wall clock time of BenchmarkTagsToSections by 38%. This is faster since bytes.IndexByte relies on CPU specific optimizations to find the next new line (eg uses AVX2 if available). goos: linux goarch: amd64 pkg: github.com/sourcegraph/zoekt/build cpu: AMD Ryzen 9 5950X 16-Core Processor old time/op new time/op delta 188µs ± 7% 117µs ± 4% -37.96% (p=0.000 n=10+10) old alloc/op new alloc/op delta 79.3kB ± 0% 79.3kB ± 0% ~ (all equal) old allocs/op new allocs/op delta 443 ± 0% 443 ± 0% ~ (all equal) Test Plan: go test -bench BenchmarkTagsToSections

I noticed in the profiler a nonsignificant chunk in the garbage collector. The slice built by newLinesIndices is allocated and thrown away for each call to tagsToSections. This means we can re-use it which this commit implements by introducing a struct storing the buffer. We now use this buffer per shard of symbols we analyse. Even with the improvements of the previous commit, this further improves performance by 13% and halves memory use: $ benchstat after.txt after2.txt old time/op new time/op delta 117µs ± 4% 101µs ± 3% -13.12% (p=0.000 n=10+10) old alloc/op new alloc/op delta 79.3kB ± 0% 36.3kB ± 0% -54.24% (p=0.000 n=9+10) old allocs/op new allocs/op delta 443 ± 0% 441 ± 0% -0.45% (p=0.000 n=10+10) Test Plan: go test -bench BenchmarkTagsToSections

keegancsmith requested a review from a team November 6, 2023 09:55

keegancsmith mentioned this pull request Nov 6, 2023

Add benchmark for ctags conversion #679

Merged

Base automatically changed from jtibs/ctags to main November 6, 2023 15:18

jtibshirani approved these changes Nov 6, 2023

View reviewed changes

keegancsmith added 2 commits November 6, 2023 19:47

keegancsmith force-pushed the k/ctags branch from be8e4a7 to fd5fdea Compare November 6, 2023 17:48

keegancsmith merged commit bec12a7 into main Nov 6, 2023
8 checks passed

keegancsmith deleted the k/ctags branch November 6, 2023 17:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

build: faster newLinesIndices via bytes.IndexByte and buffer re-use #680

build: faster newLinesIndices via bytes.IndexByte and buffer re-use #680

keegancsmith commented Nov 6, 2023

keegancsmith commented Nov 6, 2023

jtibshirani left a comment

build: faster newLinesIndices via bytes.IndexByte and buffer re-use #680

build: faster newLinesIndices via bytes.IndexByte and buffer re-use #680

Conversation

keegancsmith commented Nov 6, 2023

keegancsmith commented Nov 6, 2023

jtibshirani left a comment

Choose a reason for hiding this comment