On Wed, Aug 04, 2021 at 04:59:40PM -0400, Jeff King wrote:
> On Wed, Aug 04, 2021 at 03:56:11PM +0200, Patrick Steinhardt wrote:
> 
> > When doing reference negotiation, git-fetch-pack(1) is loading all refs
> > from disk in order to determine which commits it has in common with the
> > remote repository. This can be quite expensive in repositories with many
> > references though: in a real-world repository with around 2.2 million
> > refs, fetching a single commit by its ID takes around 44 seconds.
> > 
> > Dominating the loading time is decompression and parsing of the objects
> > which are referenced by commits. Given the fact that we only care about
> > commits (or tags which can be peeled to one) in this context, there is
> > thus an easy performance win by switching the parsing logic to make use
> > of the commit graph in case we have one available. Like this, we avoid
> > hitting the object database to parse these commits but instead only load
> > them from the commit-graph. This results in a significant performance
> > boost when executing git-fetch in said repository with 2.2 million refs:
> > 
> >     Benchmark #1: HEAD~: git fetch $remote $commit
> >       Time (mean ± σ):     44.168 s ±  0.341 s    [User: 42.985 s, System: 1.106 s]
> >       Range (min … max):   43.565 s … 44.577 s    10 runs
> > 
> >     Benchmark #2: HEAD: git fetch $remote $commit
> >       Time (mean ± σ):     19.498 s ±  0.724 s    [User: 18.751 s, System: 0.690 s]
> >       Range (min … max):   18.629 s … 20.454 s    10 runs
> > 
> >     Summary
> >       'HEAD: git fetch $remote $commit' ran
> >         2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit'
> 
> Nice. I've sometimes wondered if parse_object() should be doing this
> optimization itself. Though we'd possibly still want callers (like this
> one) to give us more hints, since we already know the type is
> OBJ_COMMIT. Whereas parse_object() would have to discover that itself
> (though we already incur the extra type lookup there to handle blobs).

That would certainly make it much harder to hit this pitfall. The one
thing to be careful about is that we'd need to verify that the object
still exists in our ODB. Otherwise you may look up a commit via the
commit-graph even though the commit doesn't exist anymore.
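
To illustrate the concern, here is a toy sketch, not git code: every
name in it (`toy_commit`, `toy_store`, `toy_find()`,
`toy_lookup_commit_checked()`) is made up for this mail, and both the
graph and the ODB are modelled as plain arrays. It only shows the shape
of the check, i.e. a hit in the graph is trusted only if the object
store still has the object:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins; none of these are git's real types or functions. */
struct toy_commit {
	const char *oid;    /* hex object ID */
	const char *parent; /* single parent, NULL for a root commit */
};

struct toy_store {
	const struct toy_commit *entries;
	size_t nr;
};

/* Linear scan for clarity; a real index would use a sorted lookup. */
const struct toy_commit *toy_find(const struct toy_store *s, const char *oid)
{
	assert(s && oid);
	for (size_t i = 0; i < s->nr; i++)
		if (!strcmp(s->entries[i].oid, oid))
			return &s->entries[i];
	return NULL;
}

/*
 * Look up a commit through the (possibly stale) graph, but only trust
 * the hit if the object database still contains the object.
 */
const struct toy_commit *toy_lookup_commit_checked(const struct toy_store *graph,
						   const struct toy_store *odb,
						   const char *oid)
{
	const struct toy_commit *c = toy_find(graph, oid);

	if (!c)
		return toy_find(odb, oid); /* not in the graph: use the ODB */
	if (!toy_find(odb, oid))
		return NULL;               /* in the graph but pruned: missing */
	return c;
}
```

The existence check is of course not free, so how much of the win
survives depends on how cheap that check can be made.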

> Do you have a lot of tags in your repository?

No, it's only about 2000 tags.

> I wonder where the remaining 20s is going. 

Rebasing this commit on top of my git-rev-list(1) series [1] for the
connectivity check gives another 25% speedup, going down from 20s to 14s
(numbers are a bit different given that I'm on a different machine right
now). From there on, several things take up the time:

    - 20% of the time is spent sorting the refs in
      `mark_complete_and_common_ref()`. This time around I feel less
      comfortable just disabling the sorting, given that doing so may
      impact correctness.

    - 30% of the time is spent looking up object types via
      `oid_object_info_extended()`, where 75% of these lookups come from
      `deref_without_lazy_fetch()`. This can be improved a bit by doing
      the `lookup_unknown_object()` dance, buying a modest speedup of
      ~8%. But this again has memory tradeoffs given that we must
      allocate the object such that all types would fit.
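
To make the memory tradeoff concrete, a toy sketch with made-up structs
(`sketch_object`, `sketch_blob`, `sketch_commit`, `sketch_any_object`
are all hypothetical and their layouts are not git's): when the type is
unknown at allocation time, the allocation has to be as large as the
biggest type, so every object that later turns out to be a small one
carries the difference as overhead.

```c
#include <stddef.h>

/* Hypothetical layouts, chosen only to show the sizing effect. */
struct sketch_object {
	unsigned type : 3;
	unsigned flags : 29;
	unsigned char oid[32];
};

struct sketch_blob {
	struct sketch_object object; /* nothing beyond the common header */
};

struct sketch_commit {
	struct sketch_object object;
	void *parents;
	void *tree;
	unsigned long date;
	unsigned index;
};

/* An allocation that can later become any type must fit the largest. */
union sketch_any_object {
	struct sketch_object object;
	struct sketch_blob blob;
	struct sketch_commit commit;
};

/* Bytes wasted for each object that turns out to be a blob. */
size_t sketch_wasted_bytes_per_blob(void)
{
	return sizeof(union sketch_any_object) - sizeof(struct sketch_blob);
}
```

The actual overhead depends on the real struct sizes and on how many of
the looked-up objects end up being of the smaller types.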

Other than that I don't see anything obvious in the flame graphs. In
case anybody is interested, I've posted flame graphs in our GitLab issue
at [2]: the state before this patch, with this patch, and with this
patch in combination with [1].

[1]: http://public-inbox.org/git/cover.1627896460.git.ps@pks.im/
[2]: https://gitlab.com/gitlab-org/gitlab/-/issues/336657#note_642957933

>   - you'd want to double check that we always call this during ref
>     iteration (it looks like we do, and I think peel_iterated_ref()
>     falls back to a normal peel otherwise)
> 
>   - for a tag-of-tag-of-X, that will give us the complete peel to X. But
>     it looks like deref_without_lazy_fetch() marks intermediate tags
>     with the COMPLETE flag, too. I'm not sure how important that is
>     (i.e., is it necessary for correctness, or just an optimization, in
>     which case we might be better off guessing that tags are
>     single-layer, as it's by far the common case).
> 
> If we don't go that route, there's another possible speedup: after
> parsing a tag, the type of tag->tagged (if it is not NULL) will be known
> from the tag's contents, and we can avoid the oid_object_info_extended()
> type lookup. It might need some extra surgery to convince the tag-parse
> not to fetch promisor objects, though.
> 
> I'm not sure it would make that big a difference, though. If we save one
> type-lookup per parsed tag, then the tag parsing is likely to dwarf it.

Yeah, I'd assume the same. And in any case, our repo doesn't really have
any problems with tags given that there are so few of them. So I
wouldn't really have the data to back up any performance improvements
here.

Patrick