git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: s@kazlauskas.me, Jeff King <peff@peff.net>,
	git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH] files-backend: cheapen refname_available check when locking refs
Date: Thu, 17 Aug 2017 17:12:50 +0200	[thread overview]
Message-ID: <4e81f1ecf190082d3415d96650014841cd4c5b19.1502982012.git.mhagger@alum.mit.edu> (raw)
In-Reply-To: <20170709112932.njac5m6jmgmjywoz@sigill.intra.peff.net>

When locking references in preparation for updating them, we need to
check that none of the newly added references D/F conflict with
existing references (e.g., we don't allow `refs/foo` to be added if
`refs/foo/bar` already exists, or vice versa).

Prior to 524a9fdb51 (refs_verify_refname_available(): use function in
more places, 2017-04-16), conflicts with existing loose references
were checked by looking directly in the filesystem, and then conflicts
with existing packed references were checked by running
`verify_refname_available_dir()` against the packed-refs cache.

But that commit changed the final check to call
`refs_verify_refname_available()` against the *whole* files ref-store,
including both loose and packed references, with the following
comment:

> This means that those callsites now check for conflicts with all
> references rather than just packed refs, but the performance cost
> shouldn't be significant (and will be regained later).

That comment turned out to be too sanguine. User s@kazlauskas.me
reported that fetches involving a very large number of references in
neighboring directories were slowed down by that change.

The problem is that when fetching, each reference is updated
individually, within its own reference transaction. This is done
because some reference updates might succeed even though others fail.
But every time a reference update transaction is finished,
`clear_loose_ref_cache()` is called. So when it is time to update the
next reference, part of the loose ref cache has to be repopulated for
the `refs_verify_refname_available()` call. If the references are all
in neighboring directories, then the cost of repopulating the
reference cache increases with the number of references, resulting in
O(N²) effort.

The comment above also claims that the performance cost "will be
regained later". The idea was that once the packed-refs were finished
being split out into a separate ref-store, we could limit the
`refs_verify_refname_available()` call to the packed references again.
That is what we do now.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
This patch applies on top of branch mh/packed-ref-store. It can also
be obtained from my fork [1] as branch "faster-refname-available-check".

I was testing this using the reporter's recipe (but fetching from a
local clone), and found the following surprising timing numbers:

b05855b5bc (before the slowdown): 22.7 s
524a9fdb51 (immediately after the slowdown): 13 minutes
4e81f1ecf1 (after this fix): 14.5 s

The fact that the fetch is now significantly *faster* than before the
slowdown seems not to have anything to do with the reference code.

 refs/files-backend.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index e9b95592b6..f2a420c611 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -631,11 +631,11 @@ static int lock_raw_ref(struct files_ref_store *refs,
 
 		/*
 		 * If the ref did not exist and we are creating it,
-		 * make sure there is no existing ref that conflicts
-		 * with refname:
+		 * make sure there is no existing packed ref that
+		 * conflicts with refname:
 		 */
 		if (refs_verify_refname_available(
-				    &refs->base, refname,
+				    refs->packed_ref_store, refname,
 				    extras, skip, err))
 			goto error_return;
 	}
@@ -938,7 +938,7 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs,
 	 * our refname.
 	 */
 	if (is_null_oid(&lock->old_oid) &&
-	    refs_verify_refname_available(&refs->base, refname,
+	    refs_verify_refname_available(refs->packed_ref_store, refname,
 					  extras, skip, err)) {
 		last_errno = ENOTDIR;
 		goto error_return;
-- 
2.11.0


  reply	other threads:[~2017-08-17 15:13 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-09 10:25 Fetching new refs gets progressively slower s
2017-07-09 11:29 ` Jeff King
2017-08-17 15:12   ` Michael Haggerty [this message]
2017-08-17 15:22     ` [PATCH] files-backend: cheapen refname_available check when locking refs Jeff King
2017-08-17 17:56       ` Brandon Williams
2017-08-17 21:37       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e81f1ecf190082d3415d96650014841cd4c5b19.1502982012.git.mhagger@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=s@kazlauskas.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).