git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Derrick Stolee <derrickstolee@github.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD
Date: Tue, 1 Mar 2022 09:44:00 +0100	[thread overview]
Message-ID: <Yh3c0PvV1f3HEsOI@ncase> (raw)
In-Reply-To: <33154b3c-1316-aab3-46a1-a0fbd87293de@github.com>

[-- Attachment #1: Type: text/plain, Size: 4708 bytes --]

On Wed, Feb 23, 2022 at 09:18:05AM -0500, Derrick Stolee wrote:
> On 2/23/2022 7:35 AM, Patrick Steinhardt wrote:
> > When fetching from a remote repository we will by default write what has
> > been fetched into the special FETCH_HEAD reference. The order in which
> > references are written depends on whether the reference is for merge or
> > not, which, despite some other conditions, is also determined based on
> > whether the old object ID the reference is being updated from actually
> > exists in the repository.
> > 
> > To write FETCH_HEAD we thus loop through all references thrice: once for
> > the references that are about to be merged, once for the references that
> > are not for merge, and finally for all references that are ignored. For
> > every iteration, we then look up the old object ID to determine whether
> > the referenced object exists so that we can label it as "not-for-merge"
> > if it doesn't exist. It goes without saying that this can be expensive
> > in case where we are fetching a lot of references.
> > 
> > While this is hard to avoid in the case where we're writing FETCH_HEAD,
> > users can in fact ask us to skip this work via `--no-write-fetch-head`.
> > In that case, we do not care for the result of those lookups at all
> > because we don't have to order writes to FETCH_HEAD in the first place.
> > 
> > Skip this busywork in case we're not writing to FETCH_HEAD. The
> > following benchmark performs a mirror-fetch in a repository with about
> > two million references:
> > 
> >     Benchmark 1: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)
> >       Time (mean ± σ):     75.388 s ±  1.942 s    [User: 71.103 s, System: 8.953 s]
> >       Range (min … max):   73.184 s … 76.845 s    3 runs
> > 
> >     Benchmark 2: git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)
> >       Time (mean ± σ):     69.486 s ±  1.016 s    [User: 65.941 s, System: 8.806 s]
> >       Range (min … max):   68.864 s … 70.659 s    3 runs
> > 
> >     Summary
> >       'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD)' ran
> >         1.08 ± 0.03 times faster than 'git fetch --prune --no-write-fetch-head +refs/*:refs/* (HEAD~)'
> 
> I have a super-small nitpick here.
> 
> I see that you are using '-n' to name your experiments. These names
> are a bit long, especially since they are the same Git command but
> built at different commits. It would be enough to say the command
> you are testing before the stats and leave the names as "HEAD" and
> "HEAD~" (or, I typically use "new" and "old", respectively).

Fair enough, will change.

Patrick

> >  			/*
> > -			 * References in "refs/tags/" are often going to point
> > -			 * to annotated tags, which are not part of the
> > -			 * commit-graph. We thus only try to look up refs in
> > -			 * the graph which are not in that namespace to not
> > -			 * regress performance in repositories with many
> > -			 * annotated tags.
> > +			 * When writing FETCH_HEAD we need to determine whether
> > +			 * we already have the commit or not. If not, then the
> > +			 * reference is not for merge and needs to be written
> > +			 * to the reflog after other commits which we already
> > +			 * have. We're not interested in this property though
> > +			 * in case FETCH_HEAD is not to be updated, so we can
> > +			 * skip the classification in that case.
> >  			 */
> > -			if (!starts_with(rm->name, "refs/tags/"))
> > -				commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
> > -			if (!commit) {
> > -				commit = lookup_commit_reference_gently(the_repository,
> > -									&rm->old_oid,
> > -									1);
> > -				if (!commit)
> > -					rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
> > +			if (fetch_head->fp) {
> > +				struct commit *commit = NULL;
> > +
> > +				/*
> > +				 * References in "refs/tags/" are often going to point
> > +				 * to annotated tags, which are not part of the
> > +				 * commit-graph. We thus only try to look up refs in
> > +				 * the graph which are not in that namespace to not
> > +				 * regress performance in repositories with many
> > +				 * annotated tags.
> > +				 */
> > +				if (!starts_with(rm->name, "refs/tags/"))
> > +					commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
> > +				if (!commit) {
> > +					commit = lookup_commit_reference_gently(the_repository,
> > +										&rm->old_oid,
> > +										1);
> > +					if (!commit)
> > +						rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
> > +				}
> >  			}
> Looks good. Most of the diff is whitespace.
> 
> Thanks,
> -Stolee

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2022-03-01  8:44 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-23 12:35 [PATCH 0/5] fetch: more optimizations for mirror fetches Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 1/5] upload-pack: look up "want" lines via commit-graph Patrick Steinhardt
2022-02-23 14:13   ` Derrick Stolee
2022-03-01  8:43     ` Patrick Steinhardt
2022-03-01  9:24       ` Patrick Steinhardt
2022-03-02 18:53         ` Derrick Stolee
2022-02-23 12:35 ` [PATCH 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD Patrick Steinhardt
2022-02-23 14:18   ` Derrick Stolee
2022-03-01  8:44     ` Patrick Steinhardt [this message]
2022-02-23 12:35 ` [PATCH 3/5] refs: add ability for backends to special-case reading of symbolic refs Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 4/5] remote: read symbolic refs via `refs_read_symbolic_ref()` Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 5/5] refs/files-backend: optimize reading of symbolic refs Patrick Steinhardt
2022-03-01  9:33 ` [PATCH v2 0/5] fetch: more optimizations for mirror fetches Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 1/5] upload-pack: look up "want" lines via commit-graph Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 3/5] refs: add ability for backends to special-case reading of symbolic refs Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 4/5] remote: read symbolic refs via `refs_read_symbolic_ref()` Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 5/5] refs/files-backend: optimize reading of symbolic refs Patrick Steinhardt
2022-03-01 22:02   ` [PATCH v2 0/5] fetch: more optimizations for mirror fetches Junio C Hamano
2022-03-02 18:54   ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yh3c0PvV1f3HEsOI@ncase \
    --to=ps@pks.im \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).