user/dev discussion of public-inbox itself
* Re: Troubleshooting threads missing from /all/
  2021-10-18  5:25  6%                   ` Eric Wong
@ 2021-10-18 14:04  0%                     ` Konstantin Ryabitsev
  0 siblings, 0 replies; 4+ results
From: Konstantin Ryabitsev @ 2021-10-18 14:04 UTC
  To: Eric Wong; +Cc: meta

On Mon, Oct 18, 2021 at 05:25:26AM +0000, Eric Wong wrote:
> > Btw, I'm chasing a separate bug in v2 which causes recycled
> > Message-IDs to go missing sometimes from a v2 over.sqlite3;
> > which then causes -extindex to lose a message...
> 
> I just pushed out commit 325fbe26c3e7731e
> (v2: mirrors don't clobber msgs w/ reused Message-IDs, 2021-10-18)
> 
> Now I'm reindexing all my v2 inboxes before running
> "-extindex --all --reindex --fast".  Fortunately, v2 inboxes
> are all "-L basic" so they're not too expensive to reindex.

Okay, I guess I should plan to do the same, then. I'll see if I can pair this
with switching individual inboxes over to "basic" indexing.

-K

* Re: Troubleshooting threads missing from /all/
  @ 2021-10-18  5:25  6%                   ` Eric Wong
  2021-10-18 14:04  0%                     ` Konstantin Ryabitsev
  0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-18  5:25 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > On Sat, Oct 16, 2021 at 09:43:24AM +0000, Eric Wong wrote:
> > > Eric Wong <e@80x24.org> wrote:
> > > > Yes.  Though given the current situation with missing messages
> > > > from /all/, I'd wait until a reindex recovers the missing
> > > > messages (and probably a fast fsck checker).
> > > 
> > > I think "public-inbox-extindex --reindex --all --fast" is
> > > reasonably ready as an fsck checker.  I've been running it a
> > > bunch in recent days/weeks and also found+fixed some other bugs
> > > along the way.
> 
> Btw, I'm chasing a separate bug in v2 which causes recycled
> Message-IDs to go missing sometimes from a v2 over.sqlite3;
> which then causes -extindex to lose a message...

I just pushed out commit 325fbe26c3e7731e
(v2: mirrors don't clobber msgs w/ reused Message-IDs, 2021-10-18)

Now I'm reindexing all my v2 inboxes before running
"-extindex --all --reindex --fast".  Fortunately, v2 inboxes
are all "-L basic" so they're not too expensive to reindex.

Really hoping this is the last bug related to indexing for a
while...
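
To make "go missing" concrete, here is a small Perl illustration. It is an assumption-laden sketch, not how `-extindex --reindex --fast` is implemented: over.sqlite3 is modeled as a hash of article number to blob OID, the blob list from git history is made up, and the script lists any blob with no overview row. That is the kind of gap a reindex is meant to recover.

```perl
use strict;
use warnings;

# Hypothetical inputs (not public-inbox APIs): blob OIDs found in the
# inbox's git history, and over.sqlite3 rows modeled as
# article number => { blob => OID }.
my @git_blobs = qw(blob-1 blob-2 blob-3);
my %over = (1 => { blob => 'blob-1' }, 2 => { blob => 'blob-3' });

# Record which blobs have an overview row, then report any that don't.
my %indexed = map { $_->{blob} => 1 } values %over;
my @missing = grep { !$indexed{$_} } @git_blobs;

print "missing from over: @missing\n"; # prints "missing from over: blob-2"
```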

* [PATCH 2/2] v2: mirrors don't clobber msgs w/ reused Message-IDs
  2021-10-18  5:09  6% [PATCH 0/2] fix v2 mirrors of reused Message-IDs Eric Wong
@ 2021-10-18  5:09  7% ` Eric Wong
  0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2021-10-18  5:09 UTC
  To: meta

For odd messages with reused Message-IDs, the second message
showing up in a mirror (via git-fetch + -index) should never
clobber an existing over.sqlite3 entry that points to a
different blob.

This is only noticeable if the messages arrive between
indexing runs, so that each is indexed in a separate run.

Fixes: 4441a38481ed ("v2: index forwards (via `git log --reverse')")
---
 MANIFEST                      |  1 +
 lib/PublicInbox/V2Writable.pm |  7 ++++++-
 t/v2index-late-dupe.t         | 37 +++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 t/v2index-late-dupe.t

diff --git a/MANIFEST b/MANIFEST
index b5aae77747dd..af1522d71bd1 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -552,6 +552,7 @@ t/v1-add-remove-add.t
 t/v1reindex.t
 t/v2-add-remove-add.t
 t/v2dupindex.t
+t/v2index-late-dupe.t
 t/v2mda.t
 t/v2mirror.t
 t/v2reindex.t
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 3914383cc9d3..ed5182ae8460 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -813,8 +813,8 @@ sub index_oid { # cat_async callback
 			}
 		}
 	}
+	my $oidx = $self->{oidx};
 	if (!defined($num)) { # reuse if reindexing (or duplicates)
-		my $oidx = $self->{oidx};
 		for my $mid (@$mids) {
 			($num, $mid0) = $oidx->num_mid0_for_oid($oid, $mid);
 			last if defined $num;
@@ -822,6 +822,11 @@ sub index_oid { # cat_async callback
 	}
 	$mid0 //= do { # is this a number we got before?
 		$num = $arg->{mm_tmp}->num_for($mids->[0]);
+
+		# don't clobber existing if Message-ID is reused:
+		if (my $x = defined($num) ? $oidx->get_art($num) : undef) {
+			undef($num) if $x->{blob} ne $oid;
+		}
 		defined($num) ? $mids->[0] : undef;
 	};
 	if (!defined($num)) {
diff --git a/t/v2index-late-dupe.t b/t/v2index-late-dupe.t
new file mode 100644
index 000000000000..c83e3409044f
--- /dev/null
+++ b/t/v2index-late-dupe.t
@@ -0,0 +1,37 @@
+# Copyright (C) all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# this simulates a mirror path: git fetch && -index
+use strict; use v5.10.1; use PublicInbox::TestCommon;
+use Test::More; # redundant, used for bisect
+require_mods 'v2';
+require PublicInbox::Import;
+require PublicInbox::Inbox;
+require PublicInbox::Git;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $inboxdir = "$tmpdir/i";
+PublicInbox::Import::init_bare(my $e0 = "$inboxdir/git/0.git");
+open my $fh, '>', "$inboxdir/inbox.lock" or xbail $!;
+my $git = PublicInbox::Git->new($e0);
+my $im = PublicInbox::Import->new($git, qw(i i@example.com));
+$im->{lock_path} = undef;
+$im->{path_type} = 'v2';
+my $eml = eml_load('t/plack-qp.eml');
+ok($im->add($eml), 'add original');
+$im->done;
+run_script([qw(-index -Lbasic), $inboxdir]);
+is($?, 0, 'basic index');
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir });
+my $orig = $ibx->over->get_art(1);
+
+my @mid = $eml->header_raw('Message-ID');
+$eml->header_set('Message-ID', @mid, '<extra@z>');
+ok($im->add($eml), 'add another');
+$im->done;
+run_script([qw(-index -Lbasic), $inboxdir]);
+is($?, 0, 'basic index again');
+
+my $after = $ibx->over->get_art(1);
+is_deeply($after, $orig, 'original unchanged') or note explain([$orig,$after]);
+
+done_testing;
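
For readers following the `index_oid` change above, here is a minimal Perl sketch of the number-assignment rule it enforces. It is a simplification under stated assumptions: plain hashes stand in for the over.sqlite3 lookups (`num_mid0_for_oid`, `get_art`) and for `mm_tmp->num_for`, and `assign_num` is a hypothetical helper, not part of public-inbox. A blob that was already indexed keeps its number. A Message-ID seen before reuses its number only if the stored article has the same blob. Otherwise a fresh number is allocated, so the original entry is not clobbered.

```perl
use strict;
use warnings;

# Hypothetical stand-ins (not public-inbox APIs):
#   %mid2num : Message-ID => article number (role of mm_tmp)
#   %over    : article number => { blob, mid } (role of over.sqlite3)
my (%mid2num, %over);
my $next_num = 1;

# Return the article number to use for blob $oid carrying Message-ID $mid.
sub assign_num {
	my ($mid, $oid) = @_;

	# 1. reuse the number if this exact blob was indexed before (reindex)
	my ($num) = grep {
		$over{$_}{blob} eq $oid && $over{$_}{mid} eq $mid
	} sort { $a <=> $b } keys %over;

	# 2. is this a number we got before for this Message-ID?
	if (!defined $num) {
		$num = $mid2num{$mid};
		# don't clobber existing if Message-ID is reused:
		undef $num if defined($num) && $over{$num}{blob} ne $oid;
	}

	# 3. otherwise allocate a fresh number; keep the first mapping
	if (!defined $num) {
		$num = $next_num++;
		$mid2num{$mid} //= $num;
	}
	$over{$num} = { blob => $oid, mid => $mid };
	return $num;
}

my $n1 = assign_num('<a@example>', 'blob-1'); # first message
my $n2 = assign_num('<a@example>', 'blob-2'); # reused Message-ID, new blob
my $n3 = assign_num('<a@example>', 'blob-1'); # same blob again (reindex)

print "$n1 $n2 $n3 $over{1}{blob}\n"; # prints "1 2 1 blob-1"
```

The second call mirrors the case exercised by t/v2index-late-dupe.t: the late duplicate gets its own article number, and article 1 keeps its original blob.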

* [PATCH 0/2] fix v2 mirrors of reused Message-IDs
@ 2021-10-18  5:09  6% Eric Wong
  2021-10-18  5:09  7% ` [PATCH 2/2] v2: mirrors don't clobber msgs w/ " Eric Wong
  0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2021-10-18  5:09 UTC
  To: meta

Eeep! :<

Eric Wong (2):
  extindex: show mismatches for messages deleted from inbox
  v2: mirrors don't clobber msgs w/ reused Message-IDs

 MANIFEST                        |  1 +
 lib/PublicInbox/ExtSearchIdx.pm | 14 ++++++++++---
 lib/PublicInbox/V2Writable.pm   |  7 ++++++-
 t/v2index-late-dupe.t           | 37 +++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 4 deletions(-)
 create mode 100644 t/v2index-late-dupe.t

2021-10-01 23:11     Troubleshooting threads missing from /all/ Konstantin Ryabitsev
2021-10-01 23:46     ` Eric Wong
2021-10-05  4:39       ` Eric Wong
2021-10-05 18:03         ` Konstantin Ryabitsev
2021-10-07 21:33           ` Eric Wong
2021-10-08 17:33             ` Konstantin Ryabitsev
2021-10-08 21:34               ` Eric Wong
2021-10-16  9:43                 ` Eric Wong
2021-10-17 18:04                   ` Konstantin Ryabitsev
2021-10-17 23:12                     ` Eric Wong
2021-10-18  5:25  6%                   ` Eric Wong
2021-10-18 14:04  0%                     ` Konstantin Ryabitsev
2021-10-18  5:09  6% [PATCH 0/2] fix v2 mirrors of reused Message-IDs Eric Wong
2021-10-18  5:09  7% ` [PATCH 2/2] v2: mirrors don't clobber msgs w/ " Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).