git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: steve.norman@thomsonreuters.com
Cc: pclouds@gmail.com, git@vger.kernel.org
Subject: [PATCH] index-pack: avoid excessive re-reading of pack directory
Date: Fri, 5 Jun 2015 08:29:21 -0400	[thread overview]
Message-ID: <20150605122921.GA7586@peff.net> (raw)
In-Reply-To: <20150605121817.GA22125@peff.net>

On Fri, Jun 05, 2015 at 08:18:17AM -0400, Jeff King wrote:

>   1. Devise some torture to tests to see whether my patch series is in
>      fact racy on Linux.
>
>   2. Assuming it is, scrap it and make a has_sha1_file_quick() which
>      might sometimes return a false negative (if somebody else is
>      repacking). Use that in index-pack (and possibly other places, but
>      we can start with index-pack).
> 
> If we skip step 1 out of pessimism (which I think is a reasonable thing
> to do), then step 2 should not be all that much work. I'm going to be
> offline for a few days, though, so I won't get to it until next week at
> the earliest. If you (or someone else) wants to take a stab at it,
> please feel free.

Actually, it really is a tiny bit of work. I knocked this out in less
than 10 minutes. It passes the test suite, but it's entirely possible
that I did something boneheaded that does not fix your problem. ;)

Please test and confirm that it makes things better on your system.

-- >8 --
Subject: [PATCH] index-pack: avoid excessive re-reading of pack directory

Since 45e8a74 (has_sha1_file: re-check pack directory before
giving up, 2013-08-30), we spend extra effort for
has_sha1_file to give the right answer when somebody else is
repacking. Usually this effort does not matter, because
after finding that the object does not exist, the next step
is usually to die().

However, some code paths make a large number of
has_sha1_file checks which are _not_ expected to return 1.
The collision test in index-pack.c is such a case. On a
local system, this can cause a performance slowdown of
around 5%. But on a system with high-latency system calls
(like NFS), it can be much worse.

This patch introduces a "quick" flag to has_sha1_file which
callers can use when they would prefer high performance at
the cost of false negatives during repacks. There may be
other code paths that can use this, but the index-pack one
is the most obviously critical, so we'll start with
switching that one.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/index-pack.c |  2 +-
 cache.h              | 11 ++++++++++-
 sha1_file.c          |  4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ea2020..0761dfd 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -784,7 +784,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 	assert(data || obj_entry);
 
 	read_lock();
-	collision_test_needed = has_sha1_file(sha1);
+	collision_test_needed = has_sha1_file_with_flags(sha1, HAS_SHA1_QUICK);
 	read_unlock();
 
 	if (collision_test_needed && !data) {
diff --git a/cache.h b/cache.h
index 571c98f..98edc2c 100644
--- a/cache.h
+++ b/cache.h
@@ -943,8 +943,17 @@ extern int has_sha1_pack(const unsigned char *sha1);
  * Return true iff we have an object named sha1, whether local or in
  * an alternate object database, and whether packed or loose.  This
  * function does not respect replace references.
+ *
+ * If the QUICK flag is set, do not re-check the pack directory
+ * when we cannot find the object (this means we may give a false
+ * negative answer if another process is simultaneously repacking).
  */
-extern int has_sha1_file(const unsigned char *sha1);
+#define HAS_SHA1_QUICK 0x1
+extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags);
+static inline int has_sha1_file(const unsigned char *sha1)
+{
+	return has_sha1_file_with_flags(sha1, 0);
+}
 
 /*
  * Return true iff an alternate object database has a loose object
diff --git a/sha1_file.c b/sha1_file.c
index 7e38148..dc8746b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3161,7 +3161,7 @@ int has_sha1_pack(const unsigned char *sha1)
 	return find_pack_entry(sha1, &e);
 }
 
-int has_sha1_file(const unsigned char *sha1)
+int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
 	struct pack_entry e;
 
@@ -3169,6 +3169,8 @@ int has_sha1_file(const unsigned char *sha1)
 		return 1;
 	if (has_loose_object(sha1))
 		return 1;
+	if (flags & HAS_SHA1_QUICK)
+		return 0;
 	reprepare_packed_git();
 	return find_pack_entry(sha1, &e);
 }
-- 
2.4.2.752.geeb594a

  reply	other threads:[~2015-06-05 12:29 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 13:13 Troubleshoot clone issue to NFS steve.norman
2015-05-21 14:00 ` Christian Couder
2015-05-21 14:31 ` Duy Nguyen
2015-05-21 14:38   ` Duy Nguyen
2015-05-21 15:53     ` steve.norman
2015-05-22  0:16       ` Duy Nguyen
2015-05-22  7:12         ` Jeff King
2015-05-22  8:35           ` steve.norman
2015-05-22 10:05             ` Duy Nguyen
2015-05-22 14:37               ` Junio C Hamano
2015-05-22 15:02               ` steve.norman
2015-05-22 23:51                 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Jeff King
2015-05-22 23:51                   ` [PATCH 1/3] stat_validity: handle non-regular files Jeff King
2015-05-23 11:00                     ` Michael Haggerty
2015-05-24  8:29                       ` Jeff King
2015-05-22 23:52                   ` [PATCH 2/3] cache.h: move stat_validity definition up Jeff King
2015-05-22 23:54                   ` [PATCH 3/3] prepare_packed_git: use stat_validity to avoid re-reading packs Jeff King
2015-05-23  1:19                   ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Duy Nguyen
2015-05-23  1:21                     ` Duy Nguyen
2015-05-24  8:20                     ` Jeff King
2015-05-24  9:00           ` Troubleshoot clone issue to NFS Duy Nguyen
2015-06-05 12:01             ` steve.norman
2015-06-05 12:18               ` Jeff King
2015-06-05 12:29                 ` Jeff King [this message]
2015-06-09 17:24                   ` [PATCH] index-pack: avoid excessive re-reading of pack directory Jeff King
2015-06-09 17:41                     ` Jeff King
2015-06-10  3:46                   ` Shawn Pearce
2015-06-10 14:00                     ` Jeff King
2015-06-10 14:36                       ` Duy Nguyen
2015-06-10 21:34                       ` Shawn Pearce
2015-06-05 14:20                 ` Troubleshoot clone issue to NFS steve.norman
2015-06-16 20:50                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150605122921.GA7586@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=steve.norman@thomsonreuters.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).