git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Cc: Taylor Blau <me@ttaylorr.com>,
	GIT Mailing-list <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: 2.29.0.rc0.windows.1: Duplicate commit id error message when fetching
Date: Thu, 8 Oct 2020 08:06:58 -0400	[thread overview]
Message-ID: <20201008120658.GA2689590@coredump.intra.peff.net> (raw)
In-Reply-To: <329d91ed-097f-38ac-f1b1-73b4d57ce8ad@virtuell-zuhause.de>

On Thu, Oct 08, 2020 at 11:52:03AM +0200, Thomas Braun wrote:

> > Is it possible to share the contents of your .git directory? If not, can
> > you look in .git/objects/info/ and see if there are multiple
> > commit-graph files (and if so, possibly share those; they don't contain
> > any identifying info).
> 
> Yes sure, I can share that [1]. Thanks for looking into that.
> 
> [1]:
> http://byte-physics.de/Downloads/dotGitWriteCommitGraphDuplicatedCommitIssue.tar.gz

Thanks, I was able to easily reproduce with:

  # make a cheap copy of the repo; if our test succeeds it modifies the
  # graph files, so just try it on a fresh copy each time
  rm -rf repo.git
  cp -al /path/to/extracted/tarball/.git repo.git
  git -C repo.git -c fetch.writecommitgraph fetch .

which yields:

  From .
   * branch                HEAD       -> FETCH_HEAD
  fatal: unexpected duplicate commit id 31a13139875bc5f49ddcbd42b4b4d3dc18c16576

The good news is that this isn't a regression in v2.29. The bad news is
that it's been broken for many versions. :)

To solve your immediate problem, you can just remove the whole
.git/objects/info/commit-graphs directory. It doesn't have any data that
can't be regenerated from the actual objects.

The rest of this email is my look at what the actual bug is.

Bisecting finds a very curious culprit: 0bd52e27e3 (commit-graph.h:
store an odb in 'struct write_commit_graph_context', 2020-02-03). That
commit causes us to use a more consistent object directory name. As a
result, this loop in split_graph_merge_strategy():

        while (g && (g->num_commits <= size_mult * num_commits ||
                    (max_commits && num_commits > max_commits))) {
                if (strcmp(g->obj_dir, ctx->odb->path))
                        break;

                num_commits += g->num_commits;
                g = g->base_graph;

                ctx->num_commit_graphs_after--;
        }

does not trigger the "break" on the strcmp, whereas before that commit
it did. As a result, our "after" graph count is smaller (2 versus 4).
And then later in merge_commit_graphs(), we know we're shrinking the
number of graphs, so we add in the contents of those graph files.

So I _think_ everything being done by that patch is correct, and we
didn't see the problem before simply because we were erroneously not
rolling up the graph files. And once we do, we can see that indeed we
have the same commit in two files. If I instrument the commit-graph code
like this (I couldn't find a command to dump incremental graph file
data; is there one?):

diff --git a/commit-graph.c b/commit-graph.c
index cbfeece112..4d22fa3b41 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1603,13 +1603,16 @@ static void merge_commit_graph(struct write_commit_graph_context *ctx,
 
 	ALLOC_GROW(ctx->commits.list, ctx->commits.nr + g->num_commits, ctx->commits.alloc);
 
+	warning("opening graph file %s", g->filename);
+
 	for (i = 0; i < g->num_commits; i++) {
 		struct object_id oid;
 		struct commit *result;
 
 		display_progress(ctx->progress, i + 1);
 
 		load_oid_from_graph(g, i + offset, &oid);
+		warning("has oid %s", oid_to_hex(&oid));
 
 		/* only add commits if they still exist in the repo */
 		result = lookup_commit_reference_gently(ctx->r, &oid, 1);

I see:

  warning: opening graph file objects/info/commit-graphs/graph-6ed4c0bfb0adcc15a7dc58159b3652a23d6d8c14.graph
  ...
  warning: has oid 31a13139875bc5f49ddcbd42b4b4d3dc18c16576
  ...
  warning: opening graph file objects/info/commit-graphs/graph-6444f51143e12b3f34c031e60a672d2b29d1c09e.graph
  ...
  warning: has oid 31a13139875bc5f49ddcbd42b4b4d3dc18c16576

I'm not sure how that happened, and whether it's a bug that we got into
this state at all. But regardless, it seems unfriendly that we can't
get out of it while merging the graphs. Doing this obviously makes the
problem go away:

diff --git a/commit-graph.c b/commit-graph.c
index cb042bdba8..ae1f94ccc4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2023,8 +2023,11 @@ static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
 
 		if (i && oideq(&ctx->commits.list[i - 1]->object.oid,
 			  &ctx->commits.list[i]->object.oid)) {
-			die(_("unexpected duplicate commit id %s"),
-			    oid_to_hex(&ctx->commits.list[i]->object.oid));
+			/*
+			 * quietly ignore duplicates; these could come from
+			 * incremental graph files mentioning the same commit.
+			 */
+			continue;
 		} else {
 			unsigned int num_parents;
 

but it's not clear to me if that's papering over another bug, or
gracefully handling a situation that we ought to be.

-Peff

  reply	other threads:[~2020-10-08 12:07 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-07 20:28 2.29.0.rc0.windows.1: Duplicate commit id error message when fetching Thomas Braun
2020-10-07 21:06 ` Jeff King
2020-10-08  9:52   ` Thomas Braun
2020-10-08 12:06     ` Jeff King [this message]
2020-10-08 12:50       ` Derrick Stolee
2020-10-08 13:22         ` Derrick Stolee
2020-10-09 15:29           ` Thomas Braun
2020-10-09 16:49             ` Derrick Stolee
2020-10-09 17:12               ` Thomas Braun
2020-10-09 17:46                 ` Derrick Stolee
2020-10-09 17:55                   ` Jeff King
2020-10-09 18:28                     ` Taylor Blau
2020-10-09 18:33                       ` Derrick Stolee
2020-10-09 18:37                         ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201008120658.GA2689590@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=thomas.braun@virtuell-zuhause.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).