Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: faster newLinesIndices via bytes.IndexByte and buffer re-use #680

Merged
merged 2 commits into from
Nov 6, 2023

Conversation

keegancsmith
Copy link
Member

See the individual commits for an explanation. The overall performance improvement of BenchmarkTagsToSections on my machine is:

goos: linux
goarch: amd64
pkg: github.com/sourcegraph/zoekt/build
cpu: AMD Ryzen 9 5950X 16-Core Processor

old time/op    new time/op    delta
   188µs ± 7%     101µs ± 3%  -46.10%  (p=0.000 n=10+10)

old alloc/op   new alloc/op   delta
  79.3kB ± 0%    36.3kB ± 0%  -54.24%  (p=0.000 n=9+10)

old allocs/op  new allocs/op  delta
     443 ± 0%       441 ± 0%   -0.45%  (p=0.000 n=10+10)

@keegancsmith keegancsmith requested a review from a team November 6, 2023 09:55
@keegancsmith
Copy link
Member Author

This is quite fast, so I don't this the speed improvements are that important. However, the memory usage improvements are likely noticeable due to less work the GC needs to do.

Also I recently instrumented the time taken by the ctags parser and didn't include the time taken by tagsToSections. I think given the speed of this that is likely the correct call. If we are slow here it will be ctags not this conversion code.

Base automatically changed from jtibs/ctags to main November 6, 2023 15:18
Copy link
Member

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I didn't see that this folded in my commits and already merged the original one, so you'll need to rebase.

On my machine this reduces wall clock time of BenchmarkTagsToSections by
38%. This is faster since bytes.IndexByte relies on CPU specific
optimizations to find the next new line (eg uses AVX2 if available).

  goos: linux
  goarch: amd64
  pkg: github.com/sourcegraph/zoekt/build
  cpu: AMD Ryzen 9 5950X 16-Core Processor

  old time/op    new time/op    delta
     188µs ± 7%     117µs ± 4%  -37.96%  (p=0.000 n=10+10)

  old alloc/op   new alloc/op   delta
    79.3kB ± 0%    79.3kB ± 0%     ~     (all equal)

  old allocs/op  new allocs/op  delta
       443 ± 0%       443 ± 0%     ~     (all equal)

Test Plan: go test -bench BenchmarkTagsToSections
I noticed in the profiler a nonsignificant chunk in the garbage
collector. The slice built by newLinesIndices is allocated and thrown
away for each call to tagsToSections. This means we can re-use it which
this commit implements by introducing a struct storing the buffer. We
now use this buffer per shard of symbols we analyse.

Even with the improvements of the previous commit, this further improves
performance by 13% and halves memory use:

  $ benchstat after.txt after2.txt
  old time/op    new time/op    delta
     117µs ± 4%     101µs ± 3%  -13.12%  (p=0.000 n=10+10)

  old alloc/op   new alloc/op   delta
    79.3kB ± 0%    36.3kB ± 0%  -54.24%  (p=0.000 n=9+10)

  old allocs/op  new allocs/op  delta
       443 ± 0%       441 ± 0%   -0.45%  (p=0.000 n=10+10)

Test Plan: go test -bench BenchmarkTagsToSections
@keegancsmith keegancsmith merged commit bec12a7 into main Nov 6, 2023
8 checks passed
@keegancsmith keegancsmith deleted the k/ctags branch November 6, 2023 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants