Make performance counter #20

foriequal0 · 2020-02-17T02:24:30Z

To diagnose operations, I need a performance counter.
How many branches are there? How big is the repository? How long does each operation take?
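A minimal sketch of what such a counter could look like, assuming a plain in-process accumulator keyed by operation name. `PerfCounter` and its methods are hypothetical names for illustration and are not part of git-trim:

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical counter: accumulates call count and total time per named operation.
#[derive(Default)]
pub struct PerfCounter {
    stats: BTreeMap<String, (u64, Duration)>,
}

impl PerfCounter {
    /// Runs `f`, records its wall-clock duration under `name`, and returns its result.
    pub fn time<T>(&mut self, name: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let result = f();
        let elapsed = start.elapsed();
        let entry = self
            .stats
            .entry(name.to_string())
            .or_insert((0, Duration::from_secs(0)));
        entry.0 += 1;
        entry.1 += elapsed;
        result
    }

    /// Number of recorded calls for `name` (0 if it was never timed).
    pub fn calls(&self, name: &str) -> u64 {
        self.stats.get(name).map_or(0, |s| s.0)
    }

    /// One line per operation, sorted by name.
    pub fn report(&self) -> String {
        self.stats
            .iter()
            .map(|(name, (calls, total))| format!("{}: {} calls, {:?} total\n", name, calls, total))
            .collect()
    }
}
```

Wrapping each expensive step, e.g. `counter.time("merge_base", || ...)`, and printing `counter.report()` at exit would show where the time goes without reading the full debug log.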

siedentop · 2020-11-13T03:11:43Z

Yet one more ticket that I wanted to comment on. ;)

From the description, it is not clear to me what you want to accomplish with these metrics. Repo size and number of branches, for example, are not in and of themselves performance counters.

However, I am using this on a repo and it basically does not work for performance reasons. I have ~3500 branches (git branch --all | wc -l). I don't know the repo size (I'm writing this on an iPad), but it is definitely very big (think LLVM repo size). Could you tell me how you would calculate repo size? Possibilities: number of git objects, size of the .git folder, size of the checked-out repo, depth of history, max width of the git commit tree.
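One of the possibilities listed above, the size of the .git folder, can be sketched with the standard library alone. `dir_size` is a hypothetical helper, and this says nothing about which metric git-trim should actually use:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums the sizes in bytes of all regular files under `dir`, recursing into
/// subdirectories. Symlinks are neither followed nor counted.
pub fn dir_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // DirEntry::metadata does not traverse symlinks.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            total += dir_size(&entry.path())?;
        } else if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}
```

Calling `dir_size(Path::new(".git"))` from the repository root gives the on-disk size. For the object count, `git count-objects -v` reports loose and packed object numbers directly.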

"How long it takes for each operation."

I tried this earlier on this particular repo, running it with env_logger set to "debug" level. That provides some timings. As I said, it took too long, so I aborted it.

foriequal0 · 2020-11-13T03:59:51Z

Thank you for pointing it out. I made this issue to leave some notes while closing this issue. The current log is quite tedious to inspect. I might be able to find the slow spot from the timings in the log, but it doesn't tell me why it is slow. I wanted to make that easier to analyze. But it was a vague idea, and I lost motivation since there hasn't been a performance issue after closing it.

By the way, can you tell me how many local branches you have? IIRC, total time should by default be proportional to the number of local branches, not total branches, unless the --delete remote flag is given.

siedentop · 2020-11-13T07:01:09Z

Here's the data you requested in the linked ticket.

git rev-list --all --count ==> 369170
All CPU cores are in use
git branch | wc -l ==> 102
git branch --all | wc -l ==> ~3500

Feature Request: Provide output during the run showing which stage it is at (out of how many total stages), ideally in the form of a status bar.
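A rough sketch of the kind of status line this request describes, assuming the tool knows how many units of work (stages or branches) there are in total. `progress_bar` is a hypothetical helper, not an existing git-trim function:

```rust
/// Renders a fixed-width text bar such as "[#####-----] 51/102".
/// `done` is clamped to `total` for the bar, and an empty workload renders as full.
pub fn progress_bar(done: usize, total: usize, width: usize) -> String {
    let filled = if total == 0 {
        width
    } else {
        done.min(total) * width / total
    };
    format!(
        "[{}{}] {}/{}",
        "#".repeat(filled),
        "-".repeat(width - filled),
        done,
        total
    )
}
```

Printing this with a leading `\r` to stderr after each processed branch would update the line in place without polluting stdout.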

I am going to do two things next: (1) Run cargo-flamegraph (I tried but couldn't get it to work in a different directory). (2) Run it overnight.

foriequal0 · 2020-11-13T07:08:59Z

Thanks! However, requested features might take some time. I'm busy at work for months. Also I might be able to try to trim some branches even if it is aborted.

siedentop · 2020-11-13T18:14:02Z

Results from running the command overnight:

The logs suggest it took 2:45h to run (i.e. last log message timestamp minus first timestamp).

       Command being timed: "git trim --no-update --dry-run"
        User time (seconds): 180559.03
        System time (seconds): 10047.93
        Percent of CPU this job got: 1926%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:44:52
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1151504
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 3845
        Minor (reclaiming a frame) page faults: 446960188
        Voluntary context switches: 1063108901
        Involuntary context switches: 3131583
        Swaps: 0
        File system inputs: 655272
        File system outputs: 880
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
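As a sanity check on the figures above, the CPU percentage is consistent with (user + system) time divided by wall-clock time, i.e. roughly 19 cores kept busy for the whole run. A small sketch of that arithmetic, with hypothetical helper names:

```rust
/// Converts an h:mm:ss wall-clock reading to seconds.
pub fn hms_to_secs(h: u64, m: u64, s: u64) -> u64 {
    h * 3600 + m * 60 + s
}

/// CPU utilisation in percent, truncated: (user + system) / elapsed * 100.
pub fn cpu_percent(user_s: f64, system_s: f64, elapsed_s: f64) -> u64 {
    ((user_s + system_s) / elapsed_s * 100.0) as u64
}
```

With the values from the report, `hms_to_secs(2, 44, 52)` is 9892 seconds and `cpu_percent(180559.03, 10047.93, 9892.0)` gives 1926, matching the reported 1926%.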

"Also I might be able to try to trim some branches even if it is aborted."

I have a feeling that the complexity is non-linear. If so, trimming only the first N branches would be beneficial. That would also be beneficial from the perspective of reviewing the proposed changes.

git2::Repository::remotes() and find_remote(..) are incredibly slow. Calling them again for each branch is unnecessarily slow. Please note that this is not yet the best way to implement this. We are still iterating through all remote.refspecs() for each branch; the only change is that all Remote structs are cached. An alternative would be to work on `expand_refspec()` and create a map/index of (branch name, remote branches) once. foriequal0#20
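The alternative mentioned above, building the (branch name, remote branches) index once, might look roughly like this. It is a sketch on plain strings rather than git2 types; `expand_one` and `build_index` are hypothetical names, only a single `*` per refspec is handled, and each branch is assumed to track a remote ref of the same name:

```rust
use std::collections::HashMap;

/// Expands one fetch refspec (e.g. "+refs/heads/*:refs/remotes/origin/*")
/// for `reference`. Returns None if the source side does not match.
pub fn expand_one(refspec: &str, reference: &str) -> Option<String> {
    let spec = refspec.strip_prefix('+').unwrap_or(refspec);
    let (src, dst) = spec.split_once(':')?;
    match src.split_once('*') {
        Some((prefix, suffix)) => {
            // The part matched by `*` is substituted into the destination.
            let middle = reference.strip_prefix(prefix)?.strip_suffix(suffix)?;
            Some(dst.replacen('*', middle, 1))
        }
        None => {
            if src == reference {
                Some(dst.to_string())
            } else {
                None
            }
        }
    }
}

/// Builds the branch -> remote-tracking refs index in one pass.
/// `remotes` pairs a remote name with its fetch refspecs.
pub fn build_index(
    remotes: &[(&str, Vec<&str>)],
    branches: &[&str],
) -> HashMap<String, Vec<String>> {
    let mut index = HashMap::new();
    for branch in branches {
        let reference = format!("refs/heads/{}", branch);
        let upstreams: Vec<String> = remotes
            .iter()
            .flat_map(|(_, refspecs)| refspecs.iter())
            .filter_map(|spec| expand_one(spec, &reference))
            .collect();
        index.insert((*branch).to_string(), upstreams);
    }
    index
}
```

After this, a per-branch lookup is a single hash-map access instead of a scan over every remote and refspec.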

siedentop · 2020-11-16T07:02:52Z

@foriequal0 sorry to take your time again. I realize that you might be busy. If so, no need to respond.

I identified MergeTracker<T>::check_and_track as taking up all the time in the earlier stages of the runtime (after making fixes in 42a0874). In particular, it calls repo.merge_base which takes all the time.

Here's a list of things that I don't understand:

What is the idea of MergeTracker? What is the meaning of merged_set, what does it contain?
How does the check_and_track function work?

Many thanks and also totally fine if you don't have time to answer this.

foriequal0 · 2020-11-16T14:00:08Z

The basic idea hasn't changed from this merge-testing script:

MERGE_BASE=$(git merge-base "$BASE" "$BRANCH")
# Is branch merged by cherry-pick or rebase?
git rev-list --cherry-pick --right-only --no-merges -n1 "$MERGE_BASE...$BRANCH" # empty if merged
# Is branch merged by squash? https://stackoverflow.com/questions/43489303/how-can-i-delete-all-git-branches-which-have-been-squash-and-merge-via-github/56026209#56026209
TREE=$(git rev-parse "$BRANCH^{tree}")
SQUASH=$(git commit-tree "$TREE" -p "$MERGE_BASE" -m _)
git rev-list --cherry-pick --right-only --no-merges -n1 "$MERGE_BASE...$SQUASH" # empty if merged

Also, git branch --merged $BASE gives you a list of no-ff merged branches. I've tried to optimize further by avoiding rev-list and commit-tree, since rev-list is slow and commit-tree is slower, especially on WSL (its disk operations are notoriously slow, even slower than non-optimized, non-cached Git for Windows).

This is the core idea of MergeTracker.

merged_set contains the set of branches that are already merged into the bases. It starts from the set of bases plus the output of git branch --merged.
Any branch that is an ancestor of a merged branch is trivially merged, so it needs no testing. (Are children of un-merged branches un-merged?)
If A is an ancestor of B, we know that A == B or $(git merge-base A B) == A. This check short-circuits the test, since it is much cheaper than running a series of rev-list and commit-tree.
If the short-circuit fails, we fall back to the rev-list and commit-tree method. However, can we share the result with other tasks so they can short-circuit on it?
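The idea described above can be sketched on a toy commit graph. This is an illustration under stated assumptions, not git-trim's actual code: the `Graph` type stands in for the repository, `is_ancestor` replaces the merge-base comparison, and `slow_check` stands in for the rev-list and commit-tree test.

```rust
use std::collections::{HashMap, HashSet};

/// Toy commit graph: commit id -> parent ids (stand-in for a real repository).
pub type Graph = HashMap<&'static str, Vec<&'static str>>;

/// True if `ancestor` equals `descendant` or is reachable from it via parents.
pub fn is_ancestor(graph: &Graph, ancestor: &str, descendant: &str) -> bool {
    let mut stack = vec![descendant];
    let mut seen = HashSet::new();
    while let Some(commit) = stack.pop() {
        if commit == ancestor {
            return true;
        }
        if seen.insert(commit) {
            if let Some(parents) = graph.get(commit) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    false
}

pub struct MergeTracker {
    /// Commits already known to be merged; seeded with the bases.
    merged_set: HashSet<&'static str>,
}

impl MergeTracker {
    pub fn new(bases: &[&'static str]) -> Self {
        MergeTracker {
            merged_set: bases.iter().copied().collect(),
        }
    }

    /// Cheap path: `tip` is an ancestor of something already merged.
    /// Otherwise fall back to `slow_check`. Merged tips are remembered so
    /// later calls can short-circuit on them.
    pub fn check_and_track(
        &mut self,
        graph: &Graph,
        tip: &'static str,
        slow_check: impl FnOnce(&str) -> bool,
    ) -> bool {
        let known = self
            .merged_set
            .iter()
            .any(|merged| is_ancestor(graph, tip, merged));
        if known || slow_check(tip) {
            self.merged_set.insert(tip);
            return true;
        }
        false
    }
}
```

Because every positive result is added to `merged_set`, a branch found to be squash-merged by the slow path lets its own ancestors take the cheap path afterwards.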

Also, I thought that calling the libgit2 API would be faster than executing git with std::process::Command. I thought MergeTracker would be the bottleneck, not repo.remotes()?.iter() and find_remote(), so I left them as they were for brevity and simplicity.

jyn514 · 2023-02-25T18:27:42Z

FWIW, git branch --all --merged does roughly the same thing and runs several orders of magnitude faster. Maybe it would be simpler to do that and prune the branches instead of trying to determine them from scratch?

foriequal0 · 2023-02-27T04:10:35Z

As far as I know, git branch --all --merged doesn't count squash-merged or rebase-merged branches.
git-trim was created when I preferred rebase-merge.

foriequal0 self-assigned this Feb 17, 2020

foriequal0 added the enhancement New feature or request label Mar 9, 2020
