git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v2 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD
Date: Tue, 1 Mar 2022 10:33:41 +0100	[thread overview]
Message-ID: <fbe76b78c30d98f1b10c474b1e0ddf6fa4db44fc.1646127015.git.ps@pks.im> (raw)
In-Reply-To: <cover.1646127015.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4415 bytes --]

When fetching from a remote repository we will by default write what has
been fetched into the special FETCH_HEAD reference. The order in which
references are written depends on whether the reference is for merge or
not, which, despite some other conditions, is also determined based on
whether the old object ID the reference is being updated from actually
exists in the repository.

To write FETCH_HEAD we thus loop through all references thrice: once for
the references that are about to be merged, once for the references that
are not for merge, and finally for all references that are ignored. For
every iteration, we then look up the old object ID to determine whether
the referenced object exists so that we can label it as "not-for-merge"
if it doesn't exist. It goes without saying that this can be expensive
in case where we are fetching a lot of references.

While this is hard to avoid in the case where we're writing FETCH_HEAD,
users can in fact ask us to skip this work via `--no-write-fetch-head`.
In that case, we do not care for the result of those lookups at all
because we don't have to order writes to FETCH_HEAD in the first place.

Skip this busywork in case we're not writing to FETCH_HEAD. The
following benchmark performs a mirror-fetch in a repository with about
two million references via `git fetch --prune --no-write-fetch-head
+refs/*:refs/*`:

    Benchmark 1: HEAD~
      Time (mean ± σ):     75.388 s ±  1.942 s    [User: 71.103 s, System: 8.953 s]
      Range (min … max):   73.184 s … 76.845 s    3 runs

    Benchmark 2: HEAD
      Time (mean ± σ):     69.486 s ±  1.016 s    [User: 65.941 s, System: 8.806 s]
      Range (min … max):   68.864 s … 70.659 s    3 runs

    Summary
      'HEAD' ran
        1.08 ± 0.03 times faster than 'HEAD~'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e8305b6662..4d12c2fd4d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1146,7 +1146,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	     want_status <= FETCH_HEAD_IGNORE;
 	     want_status++) {
 		for (rm = ref_map; rm; rm = rm->next) {
-			struct commit *commit = NULL;
 			struct ref *ref = NULL;
 
 			if (rm->status == REF_STATUS_REJECT_SHALLOW) {
@@ -1157,21 +1156,34 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 			}
 
 			/*
-			 * References in "refs/tags/" are often going to point
-			 * to annotated tags, which are not part of the
-			 * commit-graph. We thus only try to look up refs in
-			 * the graph which are not in that namespace to not
-			 * regress performance in repositories with many
-			 * annotated tags.
+			 * When writing FETCH_HEAD we need to determine whether
+			 * we already have the commit or not. If not, then the
+			 * reference is not for merge and needs to be written
+			 * to the reflog after other commits which we already
+			 * have. We're not interested in this property though
+			 * in case FETCH_HEAD is not to be updated, so we can
+			 * skip the classification in that case.
 			 */
-			if (!starts_with(rm->name, "refs/tags/"))
-				commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
-			if (!commit) {
-				commit = lookup_commit_reference_gently(the_repository,
-									&rm->old_oid,
-									1);
-				if (!commit)
-					rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
+			if (fetch_head->fp) {
+				struct commit *commit = NULL;
+
+				/*
+				 * References in "refs/tags/" are often going to point
+				 * to annotated tags, which are not part of the
+				 * commit-graph. We thus only try to look up refs in
+				 * the graph which are not in that namespace to not
+				 * regress performance in repositories with many
+				 * annotated tags.
+				 */
+				if (!starts_with(rm->name, "refs/tags/"))
+					commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
+				if (!commit) {
+					commit = lookup_commit_reference_gently(the_repository,
+										&rm->old_oid,
+										1);
+					if (!commit)
+						rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
+				}
 			}
 
 			if (rm->fetch_head_status != want_status)
-- 
2.35.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2022-03-01  9:34 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-23 12:35 [PATCH 0/5] fetch: more optimizations for mirror fetches Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 1/5] upload-pack: look up "want" lines via commit-graph Patrick Steinhardt
2022-02-23 14:13   ` Derrick Stolee
2022-03-01  8:43     ` Patrick Steinhardt
2022-03-01  9:24       ` Patrick Steinhardt
2022-03-02 18:53         ` Derrick Stolee
2022-02-23 12:35 ` [PATCH 2/5] fetch: avoid lookup of commits when not appending to FETCH_HEAD Patrick Steinhardt
2022-02-23 14:18   ` Derrick Stolee
2022-03-01  8:44     ` Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 3/5] refs: add ability for backends to special-case reading of symbolic refs Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 4/5] remote: read symbolic refs via `refs_read_symbolic_ref()` Patrick Steinhardt
2022-02-23 12:35 ` [PATCH 5/5] refs/files-backend: optimize reading of symbolic refs Patrick Steinhardt
2022-03-01  9:33 ` [PATCH v2 0/5] fetch: more optimizations for mirror fetches Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 1/5] upload-pack: look up "want" lines via commit-graph Patrick Steinhardt
2022-03-01  9:33   ` Patrick Steinhardt [this message]
2022-03-01  9:33   ` [PATCH v2 3/5] refs: add ability for backends to special-case reading of symbolic refs Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 4/5] remote: read symbolic refs via `refs_read_symbolic_ref()` Patrick Steinhardt
2022-03-01  9:33   ` [PATCH v2 5/5] refs/files-backend: optimize reading of symbolic refs Patrick Steinhardt
2022-03-01 22:02   ` [PATCH v2 0/5] fetch: more optimizations for mirror fetches Junio C Hamano
2022-03-02 18:54   ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fbe76b78c30d98f1b10c474b1e0ddf6fa4db44fc.1646127015.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).