git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: steve.norman@thomsonreuters.com, Git Mailing List <git@vger.kernel.org>
Subject: Re: Troubleshoot clone issue to NFS.
Date: Fri, 22 May 2015 03:12:24 -0400	[thread overview]
Message-ID: <20150522071224.GA10734@peff.net> (raw)
In-Reply-To: <CACsJy8BULBJ=cL1+4TFX_7tYSCFL3MNEz1Ay0YGqx8W_8=nwAg@mail.gmail.com>

On Fri, May 22, 2015 at 07:16:54AM +0700, Duy Nguyen wrote:

> > Is there anything else I can provide or test?
> 
> In builtin/index-pack.c, replace the line "collision_test_needed =
> has_sha1_file(sha1);" with "collision_test_needed = 0;". Security is
> compromised but for this test it should be ok. Then clone again. I
> hope the new number gets down close to v1.8.4.1.

Yeah, I think that is a good starting point. I timed a local clone
before and after 45e8a74; there is a small difference on my system
(about 5%), but it goes away with your suggestion.

The problem with has_sha1_file() prior to v1.8.4.2 is that it is racy
with respect to simultaneous operations; we might claim we do not have
an object, when in fact we do. As you noted, usually has_sha1_file()
returns true (i.e., we look up objects that we expect to have), and the
performance impact is minimal.

But for code paths where _not_ having the object is normal, the impact
is much greater. So I think there are two possibilities for improving
this:

  1. Find places where we expect the object will not exist (like the
     collision_test check you pointed out) and use a
     "has_sha1_file_fast" that accepts that it may very occasionally
     erroneously return false. In this case it would mean potentially
     skipping a collision check, but I think that is OK. That could have
     security implications, but only if an attacker:

       a. has broken sha1 to generate a colliding object

       b. can manipulate the victim into repacking in a loop

       c. can manipulate the victim into fetching (or receiving a push)
          simultaneously with (b)

     at which point they can try to race the repack procedure to add
     their colliding object to the repository. It seems rather unlikely
     (especially part a).

  2. Make reprepare_packed_git() cheaper in the common case that nothing
     has changed. It would probably be enough to stat("objects/pack").
     We know that packfiles themselves do not change; we may only add or
     delete them. And since the hierarchy of objects/pack is flat, we
     know that the mtime on that directory will change if any packs are
     added or removed.

     Of course, we are still doing an extra stat() for each has_sha1_file
     call. Whether that helps for the NFS case depends on whether stat()
     is significantly cheaper than opendir/readdir/closedir. On my local
     disk, the hacky patch below did seem to give me back the 5% lost by
     45e8a74 (I did it directly on master, since that old commit does
     not have the stat_validity infrastructure).

---
diff --git a/cache.h b/cache.h
index c97d807..17c840c 100644
--- a/cache.h
+++ b/cache.h
@@ -1661,6 +1661,7 @@ int sane_execvp(const char *file, char *const argv[]);
  */
 struct stat_validity {
 	struct stat_data *sd;
+	unsigned mode;
 };
 
 void stat_validity_clear(struct stat_validity *sv);
diff --git a/read-cache.c b/read-cache.c
index 36ff89f..1d8702f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2281,6 +2281,7 @@ void stat_validity_clear(struct stat_validity *sv)
 {
 	free(sv->sd);
 	sv->sd = NULL;
+	sv->mode = 0;
 }
 
 int stat_validity_check(struct stat_validity *sv, const char *path)
@@ -2291,18 +2292,19 @@ int stat_validity_check(struct stat_validity *sv, const char *path)
 		return sv->sd == NULL;
 	if (!sv->sd)
 		return 0;
-	return S_ISREG(st.st_mode) && !match_stat_data(sv->sd, &st);
+	return sv->mode == st.st_mode && !match_stat_data(sv->sd, &st);
 }
 
 void stat_validity_update(struct stat_validity *sv, int fd)
 {
 	struct stat st;
 
-	if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode))
+	if (fstat(fd, &st) < 0)
 		stat_validity_clear(sv);
 	else {
 		if (!sv->sd)
 			sv->sd = xcalloc(1, sizeof(struct stat_data));
+		sv->mode = st.st_mode;
 		fill_stat_data(sv->sd, &st);
 	}
 }
diff --git a/sha1_file.c b/sha1_file.c
index ccc6dac..abe1be9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1225,7 +1225,8 @@ static void report_pack_garbage(struct string_list *list)
 	report_helper(list, seen_bits, first, list->nr);
 }
 
-static void prepare_packed_git_one(char *objdir, int local)
+static void prepare_packed_git_one(char *objdir, int local,
+				   struct stat_validity *validity)
 {
 	struct strbuf path = STRBUF_INIT;
 	size_t dirnamelen;
@@ -1235,6 +1236,12 @@ static void prepare_packed_git_one(char *objdir, int local)
 
 	strbuf_addstr(&path, objdir);
 	strbuf_addstr(&path, "/pack");
+
+	if (stat_validity_check(validity, path.buf)) {
+		strbuf_release(&path);
+		return;
+	}
+
 	dir = opendir(path.buf);
 	if (!dir) {
 		if (errno != ENOENT)
@@ -1243,6 +1250,10 @@ static void prepare_packed_git_one(char *objdir, int local)
 		strbuf_release(&path);
 		return;
 	}
+
+	/* XXX dirfd is a BSD-ism that only made it into POSIX.1-2008 */
+	stat_validity_update(validity, dirfd(dir));
+
 	strbuf_addch(&path, '/');
 	dirnamelen = path.len;
 	while ((de = readdir(dir)) != NULL) {
@@ -1348,15 +1359,17 @@ static void rearrange_packed_git(void)
 static int prepare_packed_git_run_once = 0;
 void prepare_packed_git(void)
 {
+	static struct stat_validity validity;
 	struct alternate_object_database *alt;
 
 	if (prepare_packed_git_run_once)
 		return;
-	prepare_packed_git_one(get_object_directory(), 1);
+	prepare_packed_git_one(get_object_directory(), 1, &validity);
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
 		alt->name[-1] = 0;
-		prepare_packed_git_one(alt->base, 0);
+		/* XXX should carry pack validity struct for each alternate */
+		prepare_packed_git_one(alt->base, 0, NULL);
 		alt->name[-1] = '/';
 	}
 	rearrange_packed_git();

  reply	other threads:[~2015-05-22  7:12 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-21 13:13 Troubleshoot clone issue to NFS steve.norman
2015-05-21 14:00 ` Christian Couder
2015-05-21 14:31 ` Duy Nguyen
2015-05-21 14:38   ` Duy Nguyen
2015-05-21 15:53     ` steve.norman
2015-05-22  0:16       ` Duy Nguyen
2015-05-22  7:12         ` Jeff King [this message]
2015-05-22  8:35           ` steve.norman
2015-05-22 10:05             ` Duy Nguyen
2015-05-22 14:37               ` Junio C Hamano
2015-05-22 15:02               ` steve.norman
2015-05-22 23:51                 ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Jeff King
2015-05-22 23:51                   ` [PATCH 1/3] stat_validity: handle non-regular files Jeff King
2015-05-23 11:00                     ` Michael Haggerty
2015-05-24  8:29                       ` Jeff King
2015-05-22 23:52                   ` [PATCH 2/3] cache.h: move stat_validity definition up Jeff King
2015-05-22 23:54                   ` [PATCH 3/3] prepare_packed_git: use stat_validity to avoid re-reading packs Jeff King
2015-05-23  1:19                   ` [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir Duy Nguyen
2015-05-23  1:21                     ` Duy Nguyen
2015-05-24  8:20                     ` Jeff King
2015-05-24  9:00           ` Troubleshoot clone issue to NFS Duy Nguyen
2015-06-05 12:01             ` steve.norman
2015-06-05 12:18               ` Jeff King
2015-06-05 12:29                 ` [PATCH] index-pack: avoid excessive re-reading of pack directory Jeff King
2015-06-09 17:24                   ` Jeff King
2015-06-09 17:41                     ` Jeff King
2015-06-10  3:46                   ` Shawn Pearce
2015-06-10 14:00                     ` Jeff King
2015-06-10 14:36                       ` Duy Nguyen
2015-06-10 21:34                       ` Shawn Pearce
2015-06-05 14:20                 ` Troubleshoot clone issue to NFS steve.norman
2015-06-16 20:50                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150522071224.GA10734@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=steve.norman@thomsonreuters.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).