From: Patrick Steinhardt <ps@pks.im>
To: Derrick Stolee <derrickstolee@github.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD
Date: Tue, 1 Mar 2022 09:44:00 +0100
Message-ID: <Yh3c0PvV1f3HEsOI@ncase>
In-Reply-To: <33154b3c-1316-aab3-46a1-a0fbd87293de@github.com>
On Wed, Feb 23, 2022 at 09:18:05AM -0500, Derrick Stolee wrote:
> On 2/23/2022 7:35 AM, Patrick Steinhardt wrote:
> > When fetching from a remote repository we will by default write what has
> > been fetched into the special FETCH_HEAD reference. The order in which
> > references are written depends on whether the reference is for merge or
> > not, which, besides some other conditions, is also determined based on
> > whether the old object ID the reference is being updated from actually
> > exists in the repository.
> >
> > To write FETCH_HEAD we thus loop through all references thrice: once for
> > the references that are about to be merged, once for the references that
> > are not for merge, and finally for all references that are ignored. For
> > every iteration, we then look up the old object ID to determine whether
> > the referenced object exists so that we can label it as "not-for-merge"
> > if it doesn't exist. It goes without saying that this can be expensive
> > in the case where we are fetching a lot of references.
> >
> > While this is hard to avoid in the case where we're writing FETCH_HEAD,
> > users can in fact ask us to skip this work via `--no-write-fetch-head`.
> > In that case, we do not care for the result of those lookups at all
> > because we don't have to order writes to FETCH_HEAD in the first place.
> >
> > Skip this busywork in case we're not writing to FETCH_HEAD. The
> > following benchmark performs a mirror-fetch in a repository with about
> > two million references:
> >
> > Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
> > Time (mean ± σ): 75.388 s ± 1.942 s [User: 71.103 s, System: 8.953 s]
> > Range (min … max): 73.184 s … 76.845 s 3 runs
> >
> > Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
> > Time (mean ± σ): 69.486 s ± 1.016 s [User: 65.941 s, System: 8.806 s]
> > Range (min … max): 68.864 s … 70.659 s 3 runs
> >
> > Summary
> > 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
> > 1.08 ± 0.03 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'
>
> I have a super-small nitpick here.
>
> I see that you are using '-n' to name your experiments. These names
> are a bit long, especially since they are the same Git command but
> built at different commits. It would be enough to say the command
> you are testing before the stats and leave the names as "HEAD" and
> "HEAD~" (or, I typically use "new" and "old", respectively).
Fair enough, will change.
Patrick
> >  /*
> > - * References in "refs/tags/" are often going to point
> > - * to annotated tags, which are not part of the
> > - * commit-graph. We thus only try to look up refs in
> > - * the graph which are not in that namespace to not
> > - * regress performance in repositories with many
> > - * annotated tags.
> > + * When writing FETCH_HEAD we need to determine whether
> > + * we already have the commit or not. If not, then the
> > + * reference is not for merge and needs to be written
> > + * to the reflog after other commits which we already
> > + * have. We're not interested in this property though
> > + * in case FETCH_HEAD is not to be updated, so we can
> > + * skip the classification in that case.
> >   */
> > -if (!starts_with(rm->name, "refs/tags/"))
> > -	commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
> > -if (!commit) {
> > -	commit = lookup_commit_reference_gently(the_repository,
> > -	                                        &rm->old_oid,
> > -	                                        1);
> > -	if (!commit)
> > -		rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
> > +if (fetch_head->fp) {
> > +	struct commit *commit = NULL;
> > +
> > +	/*
> > +	 * References in "refs/tags/" are often going to point
> > +	 * to annotated tags, which are not part of the
> > +	 * commit-graph. We thus only try to look up refs in
> > +	 * the graph which are not in that namespace to not
> > +	 * regress performance in repositories with many
> > +	 * annotated tags.
> > +	 */
> > +	if (!starts_with(rm->name, "refs/tags/"))
> > +		commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
> > +	if (!commit) {
> > +		commit = lookup_commit_reference_gently(the_repository,
> > +		                                        &rm->old_oid,
> > +		                                        1);
> > +		if (!commit)
> > +			rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
> > +	}
> >  }
> Looks good. Most of the diff is whitespace.
>
> Thanks,
> -Stolee
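
The guard discussed in the quoted diff can be sketched in isolation. This is a minimal illustration and not Git's actual code: `ref_sketch`, `fetch_head_sketch`, `lookup_object()`, `classify_refs()` and the `lookup_calls` counter are hypothetical stand-ins, and the object lookup is simulated with a boolean. It only shows the idea that the not-for-merge classification is skipped when no FETCH_HEAD file is open, which is assumed here to be the case with `--no-write-fetch-head`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for Git's internal types; not the real definitions. */
enum fetch_head_status {
	FETCH_HEAD_MERGE,
	FETCH_HEAD_NOT_FOR_MERGE,
};

struct ref_sketch {
	const char *name;
	bool object_exists;            /* simulates the object database */
	enum fetch_head_status status;
};

struct fetch_head_sketch {
	FILE *fp;                      /* assumed NULL when FETCH_HEAD is not written */
};

/* Counts how often the simulated expensive lookup runs. */
static size_t lookup_calls;

static bool lookup_object(const struct ref_sketch *ref)
{
	lookup_calls++;
	return ref->object_exists;
}

/*
 * Classify refs for FETCH_HEAD ordering, but only when FETCH_HEAD is
 * actually going to be written. Returns the number of refs that were
 * marked as not-for-merge.
 */
static size_t classify_refs(const struct fetch_head_sketch *fetch_head,
			    struct ref_sketch *refs, size_t nr)
{
	size_t not_for_merge = 0;

	assert(fetch_head && (refs || !nr));

	for (size_t i = 0; i < nr; i++) {
		/* Nobody consumes the classification, so skip the lookup. */
		if (!fetch_head->fp)
			continue;
		if (!lookup_object(&refs[i])) {
			refs[i].status = FETCH_HEAD_NOT_FOR_MERGE;
			not_for_merge++;
		}
	}
	return not_for_merge;
}
```

With `fp` set to NULL the loop performs no lookups at all, which is where the saving in the benchmark would come from under these assumptions; with a non-NULL `fp` every ref is looked up once per call.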
Thread overview: 20+ messages
2022-02-23 12:35 [PATCH 0/5] fetch: more optimizations for mirror fetches Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 1/5] upload-pack: look up "want" lines via commit-graph Patrick Steinhardt
2022-02-23 14:13 ` Derrick Stolee
2022-03-01 8:43 ` Patrick Steinhardt
2022-03-01 9:24 ` Patrick Steinhardt
2022-03-02 18:53 ` Derrick Stolee
2022-02-23 12:35 ` [PATCH 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD Patrick Steinhardt
2022-02-23 14:18 ` Derrick Stolee
2022-03-01 8:44 ` Patrick Steinhardt [this message]
2022-02-23 12:35 ` [PATCH 3/5] refs: add ability for backends to special-case reading of symbolic refs Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 4/5] remote: read symbolic refs via `refs_read_symbolic_ref()` Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 5/5] refs/files-backend: optimize reading of symbolic refs Patrick Steinhardt
2022-03-01 9:33 ` [PATCH v2 0/5] fetch: more optimizations for mirror fetches Patrick Steinhardt
2022-03-01 9:33 ` [PATCH v2 1/5] upload-pack: look up "want" lines via commit-graph Patrick Steinhardt
2022-03-01 9:33 ` [PATCH v2 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD Patrick Steinhardt
2022-03-01 9:33 ` [PATCH v2 3/5] refs: add ability for backends to special-case reading of symbolic refs Patrick Steinhardt
2022-03-01 9:33 ` [PATCH v2 4/5] remote: read symbolic refs via `refs_read_symbolic_ref()` Patrick Steinhardt
2022-03-01 9:33 ` [PATCH v2 5/5] refs/files-backend: optimize reading of symbolic refs Patrick Steinhardt
2022-03-01 22:02 ` [PATCH v2 0/5] fetch: more optimizations for mirror fetches Junio C Hamano
2022-03-02 18:54 ` Derrick Stolee