very slow with 10k tags #248
Loading every single page takes 20-30 seconds. If I remove the get tags function from the context, it only takes a couple of seconds.
Tags don't seem that important; how come they have such a terrible performance impact?
Could some caching logic help?

Comments
I'd be interested in speeding things up. Can you provide an example repo that I can use to reproduce the issue? You can also simply disable ctags for the time being.
Is this about git tags or ctags?
This is about git tags.
I did some investigation today. We have multiple causes here.

The time spent compiling the list of tags (and branches) for display is spent 50% on listing refs (Dulwich’s `refs.as_dict()`), and 50% on looking up each ref’s time stamp for sorting.

Then, looking up refs and checking their time stamps is not cached at the moment. We can easily cache the time stamp part; there’s already logic in the code base for cache invalidation based on a changed list of refs. The ref lookup would still need to be performed on each page load, since it is used as the cache validator, so we can only save 50% of page load time using this method.

@jelmer two Dulwich/Git questions here:
- Is there a faster way to check for changed refs (tags and branches) than compiling the dict of refs? Probably we can use ref files’ modification time stamps as a cheap has-any-ref-been-modified pre-check, and only then resolve each ref to check for changes? Is this a reliable method?
- Is there a faster way to look up a bunch of refs at the same time? The use case would be looking up all refs’ time stamps for sorting.
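A hypothetical sketch of the caching idea described above (not klaus’s actual code; `ref_timestamps` is a made-up name, and it assumes each ref ultimately resolves to a commit):

```python
from dulwich.repo import Repo

_cache_key = None    # refs snapshot used as the cache validator
_cache_value = None  # {refname: commit_time}

def ref_timestamps(repo):
    global _cache_key, _cache_value
    refs = repo.refs.as_dict()          # still paid on every page load
    key = tuple(sorted(refs.items()))   # changes whenever any ref changes
    if key != _cache_key:
        _cache_value = {}
        for name, sha in refs.items():
            obj = repo[sha]
            # Annotated tags point at a tag object; peel it to the commit.
            if hasattr(obj, "object"):
                obj = repo[obj.object[1]]
            _cache_value[name] = obj.commit_time
        _cache_key = key
    return _cache_value
```

The `as_dict()` call still runs on every page load; only the per-ref timestamp resolution is skipped, which matches the “save 50%” estimate above.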
Btw, Mathias, if you have no more than 10k tags and page load takes up to 30 seconds, are you using a terribly slow file system or operating system or a very slow computer? My benchmarks were more in the range of 1-2 seconds for 10k tags on a moderately capable MacBook.
On Tue, Mar 17, 2020 at 01:55:35PM -0700, Jonas Haag wrote:
> I did some investigation today. We have multiple causes here.
> The time spent compiling the list of tags (and branches) for display is spent 50% on listing refs (Dulwich’s `refs.as_dict()`), and 50% on looking up each ref’s time stamp for sorting.
> Then, looking up refs and checking their time stamps is not cached at the moment.
> We can easily cache the time stamp part; there’s already logic in the code base for cache invalidation based on changed list-of-refs. The ref lookup would still need to be performed on each page load, since it is used as cache validator, so we can only save 50% of page load time using this method.
> @jelmer two Dulwich/Git questions here:
> - Is there a faster way to check for changed refs (tags and branches) than compiling the dict of refs? Probably we can use ref files’ modification time stamps as a cheap has-any-ref-been-modified pre-check, and only then resolve each ref to check for changes? Is this a reliable method?
> - Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.
Packing the refs into a packed-refs file would be a good alternative
that is probably much faster to parse. That does however involve write
access to the repository.
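For illustration, packing could be done out of band with the stock git CLI (a sketch, not something klaus does itself; as noted, it needs write access to the repository):

```python
import subprocess

# Consolidate loose refs under refs/ into a single packed-refs file,
# which is one file to parse instead of one file per ref.
subprocess.run(["git", "pack-refs", "--all"],
               cwd="/path/to/repo.git", check=True)
```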
Another option may be to provide notifications for tag changes on top
of inotify, either inside of dulwich or inside of klaus.
I'm a little wary of tracking file timestamps; the refs files
themselves are tiny, and the overhead of tracking timestamps versus
actually reading the files is also non-trivial (both in terms of
performance and in terms of additional code).
Cheers,
Jelmer
--
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc
Thanks!

> Me: Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.

Do you think we could gain some speedup by batching the ref lookups?

> the overhead of tracking timestamps versus actually reading the files is also non-trivial (both in terms of performance and in terms of additional code).

Code-wise, isn't it only a matter of calling `os.stat()` on each ref file? Do you think that's slower than reading each file's contents?
On Wed, Mar 18, 2020 at 05:52:17AM -0700, Jonas Haag wrote:
> Me: Is there a faster way to look up a bunch of refs at the same time? Use case would be looking up all refs’ time stamps for sorting.
> Do you think we could gain some speedup by batching the ref lookups?
Not by much - the performance of Refs.as_dict() can probably be
improved somewhat, but not by much. packed-refs will make a *big*
difference here, rather than using individual files.
> the overhead of tracking timestamps versus
> actually reading the files is also non-trivial (both in terms of
> performance and in terms of additional code).
> Code-wise, isn't it only a matter of calling `os.stat()` on each ref file? Do you think that's slower than reading each file's contents?
statting 10k files is not fast either; I'm not convinced it's going to
be a significant win over just reading the file contents.
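For reference, the pre-check being debated would look roughly like this (a sketch, not an endorsement; it assumes loose refs live under `refs/` and ignores packed-refs entirely):

```python
import os

def latest_ref_mtime(repo_path):
    # Walk the loose-ref files and take the newest mtime as a cheap
    # "has anything changed?" signal -- still one stat() per ref file.
    latest = 0.0
    for dirpath, _dirnames, filenames in os.walk(os.path.join(repo_path, "refs")):
        for name in filenames:
            latest = max(latest, os.stat(os.path.join(dirpath, name)).st_mtime)
    return latest
```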
I think we can add a mechanism in Dulwich that allows you to listen to
updates to refs - I've filed jelmer/dulwich#751 about this.
Jelmer
--
Jelmer Vernooij <[email protected]>
PGP Key: https://www.jelmer.uk/D729A457.asc
You can now use RefsContainer.watch() to wait for changes to refs:
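A minimal sketch, assuming `watch()` acts as a context manager yielding `(refname, new_sha)` pairs (check the Dulwich documentation for the exact interface):

```python
import threading
from dulwich.repo import Repo

repo = Repo("/path/to/repo.git")

def watch_refs():
    # watch() blocks until a ref changes, hence the separate thread.
    with repo.refs.watch() as watcher:
        for refname, new_sha in watcher:
            print("ref changed:", refname, new_sha)
            # e.g. invalidate a cached tag/timestamp list here

threading.Thread(target=watch_refs, daemon=True).start()
```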
Note that you'd probably have to run this from another thread, since it will block until one of the refs changes.
Great news! A minor downer is that inotify is Linux-only, so no macOS or BSD support. But that’s probably easy to add, and maybe entirely unnecessary, since 99% of users with a huge number of repositories will run Linux anyway.
The API on the Dulwich side should be generic enough that we could add a Windows-specific implementation (perhaps using watchdog?) to it, without any changes on the klaus side.
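For example, a cross-platform fallback could watch the refs directory with the watchdog package (a sketch only; neither klaus nor Dulwich ships this):

```python
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class RefChangeHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # Any create/modify/delete under refs/ could invalidate the cache.
        print("refs changed:", event.src_path)

observer = Observer()
observer.schedule(RefChangeHandler(), "/path/to/repo.git/refs", recursive=True)
observer.start()
```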
Not working on caching at the moment... This reverts commit e21cf6a.