user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 2/2] v2: mirrors don't clobber msgs w/ reused Message-IDs
Date: Mon, 18 Oct 2021 05:09:05 +0000
Message-ID: <20211018050905.21275-3-e@80x24.org>
In-Reply-To: <20211018050905.21275-1-e@80x24.org>

When a Message-ID is reused by a different message, the second
message showing up in a mirror (via git-fetch + -index) should
never clobber an existing over entry that refers to a different
blob.

This is only noticeable when an indexing run happens between the
arrivals of the two messages.

Fixes: 4441a38481ed ("v2: index forwards (via `git log --reverse')")
---
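A minimal standalone sketch of the guard this patch adds to
index_oid, for illustration only.  The %over and %num_for hashes and
the reusable_num() helper are hypothetical stand-ins (not part of
public-inbox) for $oidx->get_art and $arg->{mm_tmp}->num_for:

```perl
use strict; use v5.10.1;

# Hypothetical stand-in for over: article number => { blob => OID }
my %over = (1 => { blob => 'aaaa' });
# Hypothetical stand-in for the Message-ID => article number map
my %num_for = ('<orig@example>' => 1);

# Returns the article number to reuse for ($mid, $oid), or undef
# when a new number must be allocated instead.
sub reusable_num {
	my ($mid, $oid) = @_;
	my $num = $num_for{$mid};
	return unless defined $num; # Message-ID never seen before
	my $x = $over{$num};
	# don't clobber existing if Message-ID is reused:
	return if $x && $x->{blob} ne $oid;
	$num;
}

say reusable_num('<orig@example>', 'aaaa') // 'undef'; # same blob
say reusable_num('<orig@example>', 'bbbb') // 'undef'; # reused Message-ID
say reusable_num('<new@example>', 'cccc') // 'undef'; # unseen Message-ID
```

Only the first call yields an existing article number; the other two
fall through to allocating a new one, so the original entry stays
untouched.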
 MANIFEST                      |  1 +
 lib/PublicInbox/V2Writable.pm |  7 ++++++-
 t/v2index-late-dupe.t         | 37 +++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 t/v2index-late-dupe.t

diff --git a/MANIFEST b/MANIFEST
index b5aae77747dd..af1522d71bd1 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -552,6 +552,7 @@ t/v1-add-remove-add.t
 t/v1reindex.t
 t/v2-add-remove-add.t
 t/v2dupindex.t
+t/v2index-late-dupe.t
 t/v2mda.t
 t/v2mirror.t
 t/v2reindex.t
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 3914383cc9d3..ed5182ae8460 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -813,8 +813,8 @@ sub index_oid { # cat_async callback
 			}
 		}
 	}
+	my $oidx = $self->{oidx};
 	if (!defined($num)) { # reuse if reindexing (or duplicates)
-		my $oidx = $self->{oidx};
 		for my $mid (@$mids) {
 			($num, $mid0) = $oidx->num_mid0_for_oid($oid, $mid);
 			last if defined $num;
@@ -822,6 +822,11 @@ sub index_oid { # cat_async callback
 	}
 	$mid0 //= do { # is this a number we got before?
 		$num = $arg->{mm_tmp}->num_for($mids->[0]);
+
+		# don't clobber existing if Message-ID is reused:
+		if (my $x = defined($num) ? $oidx->get_art($num) : undef) {
+			undef($num) if $x->{blob} ne $oid;
+		}
 		defined($num) ? $mids->[0] : undef;
 	};
 	if (!defined($num)) {
diff --git a/t/v2index-late-dupe.t b/t/v2index-late-dupe.t
new file mode 100644
index 000000000000..c83e3409044f
--- /dev/null
+++ b/t/v2index-late-dupe.t
@@ -0,0 +1,37 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# this simulates a mirror path: git fetch && -index
+use strict; use v5.10.1; use PublicInbox::TestCommon;
+use Test::More; # redundant, used for bisect
+require_mods 'v2';
+require PublicInbox::Import;
+require PublicInbox::Inbox;
+require PublicInbox::Git;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $inboxdir = "$tmpdir/i";
+PublicInbox::Import::init_bare(my $e0 = "$inboxdir/git/0.git");
+open my $fh, '>', "$inboxdir/inbox.lock" or xbail $!;
+my $git = PublicInbox::Git->new($e0);
+my $im = PublicInbox::Import->new($git, qw(i i@example.com));
+$im->{lock_path} = undef;
+$im->{path_type} = 'v2';
+my $eml = eml_load('t/plack-qp.eml');
+ok($im->add($eml), 'add original');
+$im->done;
+run_script([qw(-index -Lbasic), $inboxdir]);
+is($?, 0, 'basic index');
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir });
+my $orig = $ibx->over->get_art(1);
+
+my @mid = $eml->header_raw('Message-ID');
+$eml->header_set('Message-ID', @mid, '<extra@z>');
+ok($im->add($eml), 'add another');
+$im->done;
+run_script([qw(-index -Lbasic), $inboxdir]);
+is($?, 0, 'basic index again');
+
+my $after = $ibx->over->get_art(1);
+is_deeply($after, $orig, 'original unchanged') or note explain([$orig,$after]);
+
+done_testing;
