git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Michael Giuffrida" <michaelpg@chromium.org>,
	git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [BUG] add_again() off-by-one error in custom format
Date: Sun, 18 Jun 2017 12:58:49 +0200	[thread overview]
Message-ID: <b728352d-7fa5-2adb-af00-5f232b85b2bf@web.de> (raw)
In-Reply-To: <20170615132532.nivmj22dctowxssm@sigill.intra.peff.net>

Am 15.06.2017 um 15:25 schrieb Jeff King:
> On Thu, Jun 15, 2017 at 01:33:34PM +0200, René Scharfe wrote:
>> Can we trust object directory time stamps for cache invalidation?
> 
> I think it would work on POSIX-ish systems, since loose object changes
> always involve new files, not modifications of existing ones. I don't
> know if there are systems that don't update directory mtimes even for
> newly added files.
> 
> That would still be a stat() per request. I'd be curious how the timing
> compares to the opendir/readdir that happens now.

Modification times wouldn't be exact, as you mentioned above: An object
could be added just after checking the time stamp.  So perhaps we don't
need that, or a time-based invalidation suffices, e.g. we discard the
cache after five minutes or so?

Anyway, here's a patch for stat-based invalidation, on top of the other
one.  Array removal is really slow (hope I didn't sneak a bug in there,
but my confidence in this code isn't very high).  No locking is done;
parallel threads removing and adding entries could make a mess, but
that's not an issue for log.

Timings for "time git log --pretty=%h >/dev/null" in my git repository
with 5094 loose objects on Debian:

        master       first patch  this patch
real    0m1.065s     0m0.581s     0m0.633s
user    0m0.648s     0m0.564s     0m0.580s
sys     0m0.412s     0m0.016s     0m0.052s


And on mingw with 227 loose objects:

        master       first patch  this patch
real    0m1.756s     0m0.546s     0m1.659s
user    0m0.000s     0m0.000s     0m0.000s
sys     0m0.000s     0m0.000s     0m0.000s

So at least for Windows it would be really nice if we could avoid
calling stat..
---
 cache.h     |  1 +
 sha1_name.c | 32 ++++++++++++++++++++++++++++----
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 8c6748907b..386b09710d 100644
--- a/cache.h
+++ b/cache.h
@@ -1589,6 +1589,7 @@ extern struct alternate_object_database {
 
 	uint32_t loose_objects_subdir_bitmap[8];
 	struct oid_array loose_objects_cache;
+	time_t loose_objects_subdir_mtime[256];
 
 	char path[FLEX_ARRAY];
 } *alt_odb_list;
diff --git a/sha1_name.c b/sha1_name.c
index ae6a5c3abe..baecb10454 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -77,6 +77,24 @@ static void update_candidates(struct disambiguate_state *ds, const struct object
 	/* otherwise, current can be discarded and candidate is still good */
 }
 
+static void remove_subdir_entries(struct oid_array *array, int subdir_nr)
+{
+	struct object_id oid;
+	int start, end;
+
+	memset(&oid, 0, sizeof(oid));
+	oid.hash[0] = subdir_nr;
+	start = oid_array_lookup(array, &oid);
+	if (start < 0)
+		start = -1 - start;
+	end = start;
+	while (end < array->nr && array->oid[end].hash[0] == subdir_nr)
+		end++;
+	memmove(array->oid + start, array->oid + end,
+		(array->nr - end) * sizeof(array->oid[0]));
+	array->nr -= end - start;
+}
+
 static void read_loose_object_subdir(struct alternate_object_database *alt,
 				     int subdir_nr)
 {
@@ -86,15 +104,21 @@ static void read_loose_object_subdir(struct alternate_object_database *alt,
 	struct dirent *de;
 	size_t bitmap_index = subdir_nr / 32;
 	uint32_t bitmap_mask = 1 << (subdir_nr % 32);
-
-	if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask)
-		return;
-	alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask;
+	struct stat st;
 
 	buf = alt_scratch_buf(alt);
 	strbuf_addf(buf, "%02x/", subdir_nr);
 	xsnprintf(hex, sizeof(hex), "%02x", subdir_nr);
 
+	stat(buf->buf, &st);
+	if (alt->loose_objects_subdir_bitmap[bitmap_index] & bitmap_mask) {
+		if (alt->loose_objects_subdir_mtime[subdir_nr] == st.st_mtime)
+			return;
+		remove_subdir_entries(&alt->loose_objects_cache, subdir_nr);
+	}
+	alt->loose_objects_subdir_mtime[subdir_nr] = st.st_mtime;
+	alt->loose_objects_subdir_bitmap[bitmap_index] |= bitmap_mask;
+
 	dir = opendir(buf->buf);
 	if (!dir)
 		return;
-- 
2.13.0

  parent reply	other threads:[~2017-06-18 10:59 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-12  3:13 [BUG] add_again() off-by-one error in custom format Michael Giuffrida
2017-06-12 22:49 ` Junio C Hamano
2017-06-13 18:09   ` René Scharfe
2017-06-13 18:29     ` Junio C Hamano
2017-06-13 20:29       ` René Scharfe
2017-06-13 21:20         ` Junio C Hamano
2017-06-14 18:24           ` René Scharfe
2017-06-15  5:56             ` Jeff King
2017-06-15 11:33               ` René Scharfe
2017-06-15 13:25                 ` Jeff King
2017-06-18 10:58                   ` René Scharfe
2017-06-18 11:49                     ` Jeff King
2017-06-18 12:59                       ` René Scharfe
2017-06-18 13:56                         ` Jeff King
2017-06-22 18:19                           ` René Scharfe
2017-06-22 23:15                             ` Jeff King
2017-06-18 10:58                   ` René Scharfe [this message]
2017-06-18 11:50                     ` Jeff King
2017-06-19  4:46                       ` Junio C Hamano
2017-06-22 18:19                         ` [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename() René Scharfe
2017-06-22 23:10                           ` Jeff King
2017-06-24 12:12                             ` René Scharfe
2017-06-24 12:14                               ` Jeff King
2017-06-24 12:12                             ` René Scharfe
2017-06-24 12:20                               ` Jeff King
2017-06-24 14:09                                 ` René Scharfe
2017-06-24 14:12                                   ` Jeff King
2017-06-15 18:37             ` [BUG] add_again() off-by-one error in custom format Junio C Hamano
2017-06-13 22:24         ` SZEDER Gábor
2017-06-14 17:34           ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b728352d-7fa5-2adb-af00-5f232b85b2bf@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michaelpg@chromium.org \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).