git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Matt McCutchen <matt@mattmccutchen.net>, git <git@vger.kernel.org>
Subject: Re* Protecting old temporary objects being reused from concurrent "git gc"?
Date: Wed, 16 Nov 2016 17:35:47 -0800	[thread overview]
Message-ID: <xmqqvavmopl8.fsf_-_@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20161117010449.6k3cwo3njvrid4jy@sigill.intra.peff.net> (Jeff King's message of "Wed, 16 Nov 2016 17:04:50 -0800")

Jeff King <peff@peff.net> writes:

> I think the case that is helped here is somebody who runs "git
> write-tree" and expects that the timestamp on those trees is fresh. So
> even more a briefly used index, like:
>
>   export GIT_INDEX_FILE=/tmp/foo
>   git read-tree ...
>   git write-tree
>   rm -f $GIT_INDEX_FILE
>
> we'd expect that a "git gc" which runs immediately after would see those
> trees as recent and avoid pruning them (and transitively, any blobs that
> are reachable from the trees). But I don't think that write-tree
> actually freshens them (it sees "oh, we already have these; there is
> nothing to write").

OK, here is what I have queued.

-- >8 --
Subject: cache-tree: make sure to "touch" tree objects the cache-tree records

The cache_tree_fully_valid() function is called by callers that want
to know if they need to call cache_tree_update(), i.e. as an attempt
to optimize. They all want to have a fully valid cache-tree in the
end so that they can write a tree object out.

We used to check if the cached tree object is up-to-date (i.e. the
index entires covered by the cache-tree entry hasn't been changed
since the roll-up hash was computed for the cache-tree entry) and
made sure the tree object is already in the object store.  Since the
top-level tree we would soon be asked to write out (and would find
in the object store) may not be anchored to any refs or commits
immediately, freshen the tree object when it happens.

Similarly, when we actually compute the cache-tree entries in
cache_tree_update(), we refrained from writing a tree object out
when it already exists in the object store.  We would want to
freshen the tree object in that case to protect it from premature
pruning.

Strictly speaking, freshing these tree objects at each and every
level is probably unnecessary, given that anything reachable from a
young object inherits the youth from the referring object to be
protected from pruning.  It should be sufficient to freshen only the
very top-level tree instead.  Benchmarking and optimization is left
as an exercise for later days.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache-tree.c | 6 +++---
 cache.h      | 1 +
 sha1_file.c  | 9 +++++++--
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 3ebf9c3aa4..c8c74a1e07 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -225,7 +225,7 @@ int cache_tree_fully_valid(struct cache_tree *it)
 	int i;
 	if (!it)
 		return 0;
-	if (it->entry_count < 0 || !has_sha1_file(it->sha1))
+	if (it->entry_count < 0 || !freshen_object(it->sha1))
 		return 0;
 	for (i = 0; i < it->subtree_nr; i++) {
 		if (!cache_tree_fully_valid(it->down[i]->cache_tree))
@@ -253,7 +253,7 @@ static int update_one(struct cache_tree *it,
 
 	*skip_count = 0;
 
-	if (0 <= it->entry_count && has_sha1_file(it->sha1))
+	if (0 <= it->entry_count && freshen_object(it->sha1))
 		return it->entry_count;
 
 	/*
@@ -393,7 +393,7 @@ static int update_one(struct cache_tree *it,
 	if (repair) {
 		unsigned char sha1[20];
 		hash_sha1_file(buffer.buf, buffer.len, tree_type, sha1);
-		if (has_sha1_file(sha1))
+		if (freshen_object(sha1))
 			hashcpy(it->sha1, sha1);
 		else
 			to_invalidate = 1;
diff --git a/cache.h b/cache.h
index 4ff196c259..72c0b321ac 100644
--- a/cache.h
+++ b/cache.h
@@ -1077,6 +1077,7 @@ extern int sha1_object_info(const unsigned char *, unsigned long *);
 extern int hash_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1);
 extern int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *return_sha1);
 extern int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type, unsigned char *sha1, unsigned flags);
+extern int freshen_object(const unsigned char *);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern int git_open_noatime(const char *name);
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa029b..9acce3d3b8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3151,6 +3151,11 @@ static int freshen_packed_object(const unsigned char *sha1)
 	return 1;
 }
 
+int freshen_object(const unsigned char *sha1)
+{
+	return freshen_packed_object(sha1) || freshen_loose_object(sha1);
+}
+
 int write_sha1_file(const void *buf, unsigned long len, const char *type, unsigned char *sha1)
 {
 	char hdr[32];
@@ -3160,7 +3165,7 @@ int write_sha1_file(const void *buf, unsigned long len, const char *type, unsign
 	 * it out into .git/objects/??/?{38} file.
 	 */
 	write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
-	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
+	if (freshen_object(sha1))
 		return 0;
 	return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }
@@ -3178,7 +3183,7 @@ int hash_sha1_file_literally(const void *buf, unsigned long len, const char *typ
 
 	if (!(flags & HASH_WRITE_OBJECT))
 		goto cleanup;
-	if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
+	if (freshen_object(sha1))
 		goto cleanup;
 	status = write_loose_object(sha1, header, hdrlen, buf, len, 0);
 
-- 
2.11.0-rc1-156-g07127df5c1


  reply	other threads:[~2016-11-17  1:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-15 14:13 Protecting old temporary objects being reused from concurrent "git gc"? Matt McCutchen
2016-11-15 17:06 ` Jeff King
2016-11-15 17:33   ` Matt McCutchen
2016-11-15 17:40     ` Jeff King
2016-11-15 19:08       ` [PATCH] git-gc.txt: expand discussion of races with other processes Matt McCutchen
2016-11-15 19:12       ` Protecting old temporary objects being reused from concurrent "git gc"? Matt McCutchen
2016-11-15 20:01       ` Junio C Hamano
2016-11-16  8:07         ` Jeff King
2016-11-16 18:18           ` Junio C Hamano
2016-11-16 18:58       ` Junio C Hamano
2016-11-17  1:04         ` Jeff King
2016-11-17  1:35           ` Junio C Hamano [this message]
2016-11-17  1:43             ` Re* " Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqvavmopl8.fsf_-_@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=matt@mattmccutchen.net \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).