git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v3] commit-graph: ignore duplicates when merging layers
Date: Thu, 8 Oct 2020 12:26:29 -0400	[thread overview]
Message-ID: <e1d4c427-b9a7-261a-6297-4a4768e8dbc0@gmail.com> (raw)
In-Reply-To: <20201008155304.GA2823778@coredump.intra.peff.net>

On 10/8/2020 11:53 AM, Jeff King wrote:
> On Thu, Oct 08, 2020 at 03:04:39PM +0000, Derrick Stolee via GitGitGadget wrote:
>> @@ -2023,17 +2023,27 @@ static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx)
>>  
>>  		if (i && oideq(&ctx->commits.list[i - 1]->object.oid,
>>  			  &ctx->commits.list[i]->object.oid)) {
>> -			die(_("unexpected duplicate commit id %s"),
>> -			    oid_to_hex(&ctx->commits.list[i]->object.oid));
>> +			/*
>> +			 * Silently ignore duplicates. These were likely
>> +			 * created due to a commit appearing in multiple
>> +			 * layers of the chain, which is unexpected but
>> +			 * not invalid. We should make sure there is a
>> +			 * unique copy in the new layer.
>> +			 */
> 
> You mentioned earlier checking tha the metadata for the duplicates was
> identical. How hard would that be to do here?

I do think it is a bit tricky, since we would need to identify
from these duplicates which commit-graph layers they live in,
then compare the binary data in each row (for tree, date, generation)
and also logical data (convert parent int-ids into oids). One way
to do this would be to create distinct 'struct commit' objects (do
not use lookup_commit()) but finding the two positions within the
layers is the hard part.

At this point, any disagreement between rows would be corrupt data
in one or the other, and it should be caught by the 'verify'
subcommand. It definitely would be caught by 'verify' in the merged
layer after the 'write' completes.

At this point, we don't have any evidence that whatever causes the
duplicate rows could possibly write the wrong data to the duplicate
rows. I'll keep it in mind as we look for that root cause.

Thanks,
-Stolee

  reply	other threads:[~2020-10-08 16:26 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-08 13:56 [PATCH] commit-graph: ignore duplicates when merging layers Derrick Stolee via GitGitGadget
2020-10-08 14:15 ` Taylor Blau
2020-10-08 14:29   ` Derrick Stolee
2020-10-08 14:59 ` [PATCH v2] " Derrick Stolee via GitGitGadget
2020-10-08 15:04   ` [PATCH v3] " Derrick Stolee via GitGitGadget
2020-10-08 15:53     ` Jeff King
2020-10-08 16:26       ` Derrick Stolee [this message]
2020-10-08 16:42         ` Taylor Blau
2020-10-08 16:43         ` Jeff King
2020-10-09 20:53     ` [PATCH v4 0/2] " Derrick Stolee via GitGitGadget
2020-10-09 20:53       ` [PATCH v4 1/2] " Derrick Stolee via GitGitGadget
2020-10-09 20:53       ` [PATCH v4 2/2] commit-graph: don't write commit-graph when disabled Derrick Stolee via GitGitGadget
2020-10-09 21:12         ` Junio C Hamano
2020-10-09 21:17         ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e1d4c427-b9a7-261a-6297-4a4768e8dbc0@gmail.com \
    --to=stolee@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).