very slow with 10k tags #248
Loading every single page takes 20-30 seconds. If I remove the get tags function from the context, it only takes a couple of seconds.
Tags don't seem that important; how come they have such a terrible performance impact?
Could some caching logic help?

Comments
I'd be interested in speeding things up. Can you provide an example repo that I can use to reproduce the issue? You can also simply disable ctags for the time being.
Is this about git tags or ctags?
This is about git tags.
I did some investigation today. We have multiple causes here.

The time spent compiling the list of tags (and branches) for display is spent 50% on listing refs (Dulwich’s `refs.as_dict()`), and 50% on looking up each ref’s time stamp for sorting.

Then, looking up refs and checking their time stamps is not cached at the moment. We can easily cache the time stamp part; there’s already logic in the code base for cache invalidation based on a changed list of refs. The ref lookup would still need to be performed on each page load, since it is used as the cache validator, so we can only save 50% of page load time using this method.

@jelmer two Dulwich/Git questions here:
- Is there a faster way to check for changed refs (tags and branches) than compiling the dict of refs? Probably we can use ref files’ modification time stamps as a cheap has-any-ref-been-modified pre-check, and only then resolve each ref to check for changes? Is this a reliable method?
- Is there a faster way to look up a bunch of refs at the same time? The use case would be looking up all refs’ time stamps for sorting.
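A hypothetical sketch of the caching idea described above (not klaus’s actual code; `ref_timestamps` is a made-up name, and it assumes each ref ultimately resolves to a commit):

```python
from dulwich.repo import Repo

_cache_key = None    # refs snapshot used as the cache validator
_cache_value = None  # {refname: commit_time}

def ref_timestamps(repo):
    global _cache_key, _cache_value
    refs = repo.refs.as_dict()          # still paid on every page load
    key = tuple(sorted(refs.items()))   # changes whenever any ref changes
    if key != _cache_key:
        _cache_value = {}
        for name, sha in refs.items():
            obj = repo[sha]
            # Annotated tags point at a tag object; peel it to the commit.
            if hasattr(obj, "object"):
                obj = repo[obj.object[1]]
            _cache_value[name] = obj.commit_time
        _cache_key = key
    return _cache_value
```

The `as_dict()` call still runs on every page load; only the per-ref timestamp resolution is skipped, which matches the “save 50%” estimate above.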
Btw, Mathias, if you have no more than 10k tags and page load takes up to 30 seconds, are you using a terribly slow file system or operating system or a very slow computer? My benchmarks were more in the range of 1-2 seconds for 10k tags on a moderately capable MacBook.
On Tue, Mar 17, 2020 at 01:55:35PM -0700, Jonas Haag wrote:
> I did some investigation today. We have multiple causes here.
> The time spent compiling the list of tags (and branches) for display is spent 50% on listing refs (Dulwich’s `refs.as_dict()`), and 50% on looking up each ref’s time stamp for sorting.
> Then, looking up refs and checking their time stamps is not cached at the moment.
> We can easily cache the time stamp part; there’s already logic in the code base for cache invalidation based on changed list-of-refs. The ref lookup would still need to be performed on each page load, since it is used as cache validator, so we can only save 50% of page load time using this method.
> @jelmer two Dulwich/Git questions here:
> - Is there a faster way to check for changed refs (tags and branches) than compiling the dict of refs? Probably we can use ref files’ modification time stamps as a cheap has-any-ref-been-modified pre-check, and only then resolve each ref to check for changes? Is this a reliable method?
> - Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.
Packing the refs into a packed-refs file would be a good alternative
that is probably much faster to parse. That does however involve write
access to the repository.
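For illustration, packing could be done out of band with the stock git CLI (a sketch, not something klaus does itself; as noted, it needs write access to the repository):

```python
import subprocess

# Consolidate loose refs under refs/ into a single packed-refs file,
# which is one file to parse instead of one file per ref.
subprocess.run(["git", "pack-refs", "--all"],
               cwd="/path/to/repo.git", check=True)
```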
Another option may be to provide notifications for tag changes on top
of inotify, either inside of dulwich or inside of klaus.
I'm a little wary of tracking file timestamps; the refs files
themselves are tiny, and the overhead of tracking timestamps versus
actually reading the files is also non-trivial (both in terms of
performance and in terms of additional code).
Cheers,
Jelmer
--
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc
Thanks!

> Me: Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.

Do you think we could gain some speedup by batching the ref lookups?

> the overhead of tracking timestamps versus actually reading the files is also non-trivial (both in terms of performance and in terms of additional code).

Code-wise, isn't it only a matter of calling `os.stat()` on each ref file? Do you think that's slower than reading each file's contents?
On Wed, Mar 18, 2020 at 05:52:17AM -0700, Jonas Haag wrote:
> Me: Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.
> Do you think we could gain some speedup by batching the ref lookups?
Not by much - the performance of Refs.as_dict() can probably be
improved somewhat, but not by much. packed-refs will make a *big*
difference here, rather than using individual files.
> the overhead of tracking timestamps versus
> actually reading the files is also non-trivial (both in terms of
> performance and in terms of additional code).
> Code-wise, isn't it only a matter of calling `os.stat()` on each ref file? Do you think that's slower than reading each file's contents?
statting 10k files is not fast either; I'm not convinced it's going to
be a significant win over just reading the file contents.
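For reference, the pre-check being debated would look roughly like this (a sketch, not an endorsement; it assumes loose refs live under `refs/` and ignores packed-refs entirely):

```python
import os

def latest_ref_mtime(repo_path):
    # Walk the loose-ref files and take the newest mtime as a cheap
    # "has anything changed?" signal -- still one stat() per ref file.
    latest = 0.0
    for dirpath, _dirnames, filenames in os.walk(os.path.join(repo_path, "refs")):
        for name in filenames:
            latest = max(latest, os.stat(os.path.join(dirpath, name)).st_mtime)
    return latest
```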
I think we can add a mechanism in Dulwich that allows you to listen to
updates to refs - I've filed jelmer/dulwich#751 about this.
Jelmer
--
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc
You can now use RefsContainer.watch() to wait for changes to refs:
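A minimal sketch, assuming `watch()` acts as a context manager yielding `(refname, new_sha)` pairs (check the Dulwich documentation for the exact interface):

```python
import threading
from dulwich.repo import Repo

repo = Repo("/path/to/repo.git")

def watch_refs():
    # watch() blocks until a ref changes, hence the separate thread.
    with repo.refs.watch() as watcher:
        for refname, new_sha in watcher:
            print("ref changed:", refname, new_sha)
            # e.g. invalidate a cached tag/timestamp list here

threading.Thread(target=watch_refs, daemon=True).start()
```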
Note that you'd probably have to run this from another thread, since it will block until one of the refs changes.
Great news! A minor downer is that inotify is Linux-only, so no macOS or BSD support. But that’s probably easy to add, and maybe entirely unnecessary, since 99% of users with a huge number of repositories will run Linux anyway.
The API on the Dulwich side should be generic enough that we could add a Windows-specific implementation (perhaps using watchdog?) to it, without any changes on the klaus side.
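For example, a cross-platform fallback could watch the refs directory with the watchdog package (a sketch only; neither klaus nor Dulwich ships this):

```python
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class RefChangeHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # Any create/modify/delete under refs/ could invalidate the cache.
        print("refs changed:", event.src_path)

observer = Observer()
observer.schedule(RefChangeHandler(), "/path/to/repo.git/refs", recursive=True)
observer.start()
```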
Not working on caching at the moment... This reverts commit e21cf6a.