* Re: Troubleshooting threads missing from /all/
2021-10-18 5:25 6% ` Eric Wong
@ 2021-10-18 14:04 0% ` Konstantin Ryabitsev
0 siblings, 0 replies; 4+ results
From: Konstantin Ryabitsev @ 2021-10-18 14:04 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Mon, Oct 18, 2021 at 05:25:26AM +0000, Eric Wong wrote:
> > Btw, I'm chasing a separate bug in v2 which causes recycled
> > Message-IDs to go missing sometimes from a v2 over.sqlite3;
> > which then causes -extindex to lose a message...
>
> I just pushed out commit 325fbe26c3e7731e
> (v2: mirrors don't clobber msgs w/ reused Message-IDs, 2021-10-18)
>
> Now I'm reindexing all my v2 inboxes before running
> "-extindex --all --reindex --fast". Fortunately, v2 inboxes
> are all "-L basic" so they're not too expensive to reindex.
Okay, I guess I should plan the same, then. I'll see if I can pair this with
switching the individual inboxes over to "basic" indexing.
-K
^ permalink raw reply [relevance 0%]
* Re: Troubleshooting threads missing from /all/
@ 2021-10-18 5:25 6% ` Eric Wong
2021-10-18 14:04 0% ` Konstantin Ryabitsev
0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-18 5:25 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: meta
Eric Wong <e@80x24.org> wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > On Sat, Oct 16, 2021 at 09:43:24AM +0000, Eric Wong wrote:
> > > Eric Wong <e@80x24.org> wrote:
> > > > Yes. Though given the current situation with missing messages
> > > > from /all/, I'd wait until a reindex recovers the missing
> > > > messages (and probably a fast fsck checker).
> > >
> > > I think "public-inbox-extindex --reindex --all --fast" is
> > > reasonably ready as an fsck checker. I've been running it a
> > > bunch in recent days/weeks and also found+fixed some other bugs
> > > along the way.
>
> Btw, I'm chasing a separate bug in v2 which causes recycled
> Message-IDs to go missing sometimes from a v2 over.sqlite3;
> which then causes -extindex to lose a message...
I just pushed out commit 325fbe26c3e7731e
(v2: mirrors don't clobber msgs w/ reused Message-IDs, 2021-10-18)
Now I'm reindexing all my v2 inboxes before running
"-extindex --all --reindex --fast". Fortunately, v2 inboxes
are all "-L basic" so they're not too expensive to reindex.
Really hoping this is the last bug related to indexing for a
while...
^ permalink raw reply [relevance 6%]
* [PATCH 2/2] v2: mirrors don't clobber msgs w/ reused Message-IDs
2021-10-18 5:09 6% [PATCH 0/2] fix v2 mirrors of reused Message-IDs Eric Wong
@ 2021-10-18 5:09 7% ` Eric Wong
0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2021-10-18 5:09 UTC (permalink / raw)
To: meta
For odd messages with reused Message-IDs, the second message
showing up in a mirror (via git-fetch + -index) should never
clobber an entry with a different blob in over.
This is noticeable only if the messages arrive in-between
indexing runs.
Fixes: 4441a38481ed ("v2: index forwards (via `git log --reverse')")
---
MANIFEST | 1 +
lib/PublicInbox/V2Writable.pm | 7 ++++++-
t/v2index-late-dupe.t | 37 +++++++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+), 1 deletion(-)
create mode 100644 t/v2index-late-dupe.t
diff --git a/MANIFEST b/MANIFEST
index b5aae77747dd..af1522d71bd1 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -552,6 +552,7 @@ t/v1-add-remove-add.t
t/v1reindex.t
t/v2-add-remove-add.t
t/v2dupindex.t
+t/v2index-late-dupe.t
t/v2mda.t
t/v2mirror.t
t/v2reindex.t
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 3914383cc9d3..ed5182ae8460 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -813,8 +813,8 @@ sub index_oid { # cat_async callback
}
}
}
+ my $oidx = $self->{oidx};
if (!defined($num)) { # reuse if reindexing (or duplicates)
- my $oidx = $self->{oidx};
for my $mid (@$mids) {
($num, $mid0) = $oidx->num_mid0_for_oid($oid, $mid);
last if defined $num;
@@ -822,6 +822,11 @@ sub index_oid { # cat_async callback
}
$mid0 //= do { # is this a number we got before?
$num = $arg->{mm_tmp}->num_for($mids->[0]);
+
+ # don't clobber existing if Message-ID is reused:
+ if (my $x = defined($num) ? $oidx->get_art($num) : undef) {
+ undef($num) if $x->{blob} ne $oid;
+ }
defined($num) ? $mids->[0] : undef;
};
if (!defined($num)) {
diff --git a/t/v2index-late-dupe.t b/t/v2index-late-dupe.t
new file mode 100644
index 000000000000..c83e3409044f
--- /dev/null
+++ b/t/v2index-late-dupe.t
@@ -0,0 +1,37 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# this simulates a mirror path: git fetch && -index
+use strict; use v5.10.1; use PublicInbox::TestCommon;
+use Test::More; # redundant, used for bisect
+require_mods 'v2';
+require PublicInbox::Import;
+require PublicInbox::Inbox;
+require PublicInbox::Git;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $inboxdir = "$tmpdir/i";
+PublicInbox::Import::init_bare(my $e0 = "$inboxdir/git/0.git");
+open my $fh, '>', "$inboxdir/inbox.lock" or xbail $!;
+my $git = PublicInbox::Git->new($e0);
+my $im = PublicInbox::Import->new($git, qw(i i@example.com));
+$im->{lock_path} = undef;
+$im->{path_type} = 'v2';
+my $eml = eml_load('t/plack-qp.eml');
+ok($im->add($eml), 'add original');
+$im->done;
+run_script([qw(-index -Lbasic), $inboxdir]);
+is($?, 0, 'basic index');
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir });
+my $orig = $ibx->over->get_art(1);
+
+my @mid = $eml->header_raw('Message-ID');
+$eml->header_set('Message-ID', @mid, '<extra@z>');
+ok($im->add($eml), 'add another');
+$im->done;
+run_script([qw(-index -Lbasic), $inboxdir]);
+is($?, 0, 'basic index again');
+
+my $after = $ibx->over->get_art(1);
+is_deeply($after, $orig, 'original unchanged') or note explain([$orig,$after]);
+
+done_testing;
^ permalink raw reply related [relevance 7%]
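For readers skimming the patch above, the guard added to index_oid can be modeled in isolation. This is only a toy sketch, not the real PublicInbox code: %mm and %over are hypothetical in-memory stand-ins for the msgmap and over.sqlite3 lookups, and reusable_num is an illustrative name that does not exist in V2Writable.pm. It shows the one rule the patch introduces: an article number already recorded for a Message-ID is reused only when the stored blob matches the OID being indexed.

```perl
#!/usr/bin/perl
# Toy model of the "don't clobber" guard; NOT the actual V2Writable.pm logic.
use strict;
use warnings;

# Hypothetical stand-ins (assumptions for illustration only):
#   %mm   maps Message-ID     => article number  (msgmap)
#   %over maps article number => { blob => OID } (over.sqlite3)
my %mm   = ('<a@example>' => 1);
my %over = (1 => { blob => 'aaaa' });

# Return the article number that may be reused for ($oid, $mid),
# or undef when the existing number must not be reused.
sub reusable_num {
	my ($oid, $mid) = @_;
	my $num = $mm{$mid};
	return unless defined $num;	# Message-ID never seen before
	my $x = $over{$num};
	# same Message-ID but a different blob: leave the old entry alone
	return if $x && $x->{blob} ne $oid;
	$num;
}

printf "same blob:   %s\n", reusable_num('aaaa', '<a@example>') // 'new';
printf "other blob:  %s\n", reusable_num('bbbb', '<a@example>') // 'new';
printf "unknown mid: %s\n", reusable_num('cccc', '<b@example>') // 'new';
```

Here "new" simply marks the cases where the old number is not reused. What the real code does next after $num is undefined lies outside the hunk shown in the patch.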
* [PATCH 0/2] fix v2 mirrors of reused Message-IDs
@ 2021-10-18 5:09 6% Eric Wong
2021-10-18 5:09 7% ` [PATCH 2/2] v2: mirrors don't clobber msgs w/ " Eric Wong
0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-18 5:09 UTC (permalink / raw)
To: meta
Eeep! :<
Eric Wong (2):
extindex: show mismatches for messages deleted from inbox
v2: mirrors don't clobber msgs w/ reused Message-IDs
MANIFEST | 1 +
lib/PublicInbox/ExtSearchIdx.pm | 14 ++++++++++---
lib/PublicInbox/V2Writable.pm | 7 ++++++-
t/v2index-late-dupe.t | 37 +++++++++++++++++++++++++++++++++
4 files changed, 55 insertions(+), 4 deletions(-)
create mode 100644 t/v2index-late-dupe.t
^ permalink raw reply [relevance 6%]
2021-10-01 23:11 Troubleshooting threads missing from /all/ Konstantin Ryabitsev
2021-10-01 23:46 ` Eric Wong
2021-10-05 4:39 ` Eric Wong
2021-10-05 18:03 ` Konstantin Ryabitsev
2021-10-07 21:33 ` Eric Wong
2021-10-08 17:33 ` Konstantin Ryabitsev
2021-10-08 21:34 ` Eric Wong
2021-10-16 9:43 ` Eric Wong
2021-10-17 18:04 ` Konstantin Ryabitsev
2021-10-17 23:12 ` Eric Wong
2021-10-18 5:25 6% ` Eric Wong
2021-10-18 14:04 0% ` Konstantin Ryabitsev
2021-10-18 5:09 6% [PATCH 0/2] fix v2 mirrors of reused Message-IDs Eric Wong
2021-10-18 5:09 7% ` [PATCH 2/2] v2: mirrors don't clobber msgs w/ " Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git