git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Derrick Stolee <stolee@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>, Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sun, 22 Mar 2020 02:04:34 -0400
Message-ID: <20200322060434.GC578498@coredump.intra.peff.net>
In-Reply-To: <20200322054916.GB578498@coredump.intra.peff.net>

On Sun, Mar 22, 2020 at 01:49:16AM -0400, Jeff King wrote:

> [1] I'm actually not quite sure about correctness here. It should be
>     fine to generate a graph file without any given commit; readers will
>     just have to load that commit the old-fashioned way. But at this
>     phase of "commit-graph write", I think we'll already have done the
>     close_reachable() check. What does it mean to throw away a commit at
>     this stage? If we're the parent of another commit, then it will have
>     trouble referring to us by a uint32_t. Will the actual writing phase
>     barf, or will we generate an invalid graph file?

It doesn't seem great. If I instrument Git like this to simulate an
object being temporarily "missing" (if it were really missing, the whole
repo would be corrupt; we're trying to see what would happen if a race
causes us to momentarily not see it):

diff --git a/commit-graph.c b/commit-graph.c
index 3da52847e4..71419c2532 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1596,6 +1596,19 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 	}
 }
 
+static int pretend_commit_is_missing(const struct object_id *oid)
+{
+	static int initialized;
+	static struct object_id missing;
+	if (!initialized) {
+		const char *x = getenv("PRETEND_COMMIT_IS_MISSING");
+		if (x)
+			get_oid_hex(x, &missing);
+		initialized = 1;
+	}
+	return oideq(&missing, oid);
+}
+
 static void merge_commit_graph(struct write_commit_graph_context *ctx,
 			       struct commit_graph *g)
 {
@@ -1612,6 +1625,11 @@ static void merge_commit_graph(struct write_commit_graph_context *ctx,
 
 		load_oid_from_graph(g, i + offset, &oid);
 
+		if (pretend_commit_is_missing(&oid)) {
+			warning("pretending %s is missing", oid_to_hex(&oid));
+			continue;
+		}
+
 		/* only add commits if they still exist in the repo */
 		result = lookup_commit_reference_gently(ctx->r, &oid, 1);
 

and then I make a fully-graphed repo like this:

  git init repo
  cd repo
  for i in $(seq 10); do
    git commit --allow-empty -m $i
  done
  git commit-graph write --input=reachable --split=no-merge

If I then pretend that a parent is missing, I get a BUG():

  $ git rev-parse HEAD |
    PRETEND_COMMIT_IS_MISSING=$(git rev-parse HEAD^) \
    git commit-graph write --stdin-commits --split=merge-all
  warning: pretending 35e6e15c738cf2bfbe495957b2a941c2efe86dd9 is missing
  BUG: commit-graph.c:879: missing parent 35e6e15c738cf2bfbe495957b2a941c2efe86dd9 for commit d4141fb57a9bbe26b247f23c790d63d078977833
  Aborted

So it seems like just skipping here (either with the new patch or
without) isn't really a good strategy.

-Peff
