user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: Kyle Meyer <kyle@kyleam.com>
Cc: meta@public-inbox.org
Subject: [PATCH] over: ensure old, merged {tid} is really gone
Date: Fri, 4 Dec 2020 12:09:29 +0000
Message-ID: <20201204120929.GA22736@dcvr>
In-Reply-To: <87pn3qjlb8.fsf@kyleam.com>

Kyle Meyer <kyle@kyleam.com> wrote:
> Eric Wong writes:
> 
> > Yes, the fix is quite small (I think the below test case can be
> > made smaller).
> >
> > --rethread seems to be a separate bug, will fix when more awake.
> 
> Great, thank you for the explanation and fix.

No problem, thanks for the excellent bug report.  The patch below
fixes the rethread case, too, which has the same conceptual
problem in that it was leaving a stale {tid} in the DB.

I also made the test data smaller and added a shuffle test to
ensure the fix is truly order-independent.

> By the way, I've been enjoying playing with the extindex feature.  I
> haven't been doing anything fancy and it's just a few inboxes, but it's
> been _really_ nice to be able to group the inboxes for a project.
> Anyway, off topic for this thread, but thanks for that too!

Great to know.  Working on some other stuff in that area to
allow selecting inbox subsets and (hopefully) reduce SSD wear.

-------------------8<---------------

Subject: [PATCH] over: ensure old, merged {tid} is really gone

We must use the result of link_refs() since it can trigger
merge_threads() and invalidate $old_tid.  If merge_threads()
isn't triggered, link_refs() returns $old_tid anyway.
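To illustrate why the return value matters, here is a toy model in
plain Perl.  It is not PublicInbox::OverIdx: %tid_of, link_refs() and
merge_threads() below are simplified, hypothetical stand-ins that only
keep the thread-merging behavior.  Which {tid} survives a merge is a
choice made by the toy, not necessarily what the real code does.

```perl
#!/usr/bin/perl
# Toy model only: these subs are hypothetical stand-ins, NOT the real
# PublicInbox::OverIdx functions of the same name.
use strict;
use warnings;

# Message-ID => thread id ({tid}); <a@example.com> is already in thread 1
my %tid_of = ('a@example.com' => 1);
my $next_tid = 2;

# relabel every message in thread $loser as belonging to thread $winner
sub merge_threads {
	my ($winner, $loser) = @_;
	for my $mid (keys %tid_of) {
		$tid_of{$mid} = $winner if $tid_of{$mid} == $loser;
	}
}

# Attach each referenced Message-ID to thread $tid.  If a reference
# already lives in another thread, merge into that thread; $tid is then
# stale, so the surviving tid is returned.  Without a merge, the
# caller's $tid comes back unchanged.
sub link_refs {
	my ($refs, $tid) = @_;
	for my $ref (@$refs) {
		my $cur = $tid_of{$ref};
		if (!defined $cur) {
			$tid_of{$ref} = $tid; # placeholder joins our thread
		} elsif ($cur != $tid) {
			merge_threads($cur, $tid); # toy choice: existing thread wins
			$tid = $cur;
		}
	}
	$tid;
}

# a new message <b@example.com> starts out in a freshly allocated thread
my $old_tid = $next_tid++;
$tid_of{'b@example.com'} = $old_tid;
my $stale = $old_tid;

# it references <a@example.com>, so the two threads get merged
$old_tid = link_refs(['a@example.com'], $old_tid);
print "ignored result: $stale, used result: $old_tid\n";

# no merge needed: link_refs() hands back the tid it was given
my $same = link_refs(['c@example.com'], $old_tid);
print "unchanged: $same\n";
```

Holding on to $stale (the pre-patch behavior) would leave later rows
tagged with a {tid} that no message in %tid_of carries anymore.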

When rethreading and allocating a new {tid}, we must also update
the row the now-expired {tid} came from, so that only the new
{tid} is seen when reindexing subsequent messages in history.
Otherwise, every subsequently reindexed+rethreaded message could
end up getting a new {tid}.
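Below is a toy model of the rethread problem, again in plain Perl.
%over, %fresh, resolve_tid() and rethread() are hypothetical
simplifications, not public-inbox code.  The toy treats any {tid}
not allocated during the current pass as expired, which is only an
approximation of how the real code decides.  Two replies to the same
article land in two threads unless that article's row gets the newly
allocated {tid}.

```perl
#!/usr/bin/perl
# Toy model only: %over and resolve_tid() are hypothetical stand-ins for
# the "over" table and _resolve_mid_to_tid(); ghosts (num < 0) are ignored.
use strict;
use warnings;

my %over;           # article number => tid
my %fresh;          # tids allocated during the current rethread pass
my $next_tid = 100;

# Look up the thread of article $num.  A tid not allocated in this pass
# is treated as expired and replaced with a new one.
sub resolve_tid {
	my ($num, $update_row) = @_;
	my $cur = $over{$num};
	return $cur if $fresh{$cur};
	my $tid = $next_tid++;
	$fresh{$tid} = 1;
	# the fix, i.e. "UPDATE over SET tid = ? WHERE num = ?"
	$over{$num} = $tid if $update_row;
	$tid;
}

# Rethread articles 2 and 3, which both reply to article 1; every row
# still carries tid 7 from the previous indexing run.  Returns the
# number of distinct threads the replies end up in.
sub rethread {
	my ($update_row) = @_;
	%over = (1 => 7, 2 => 7, 3 => 7);
	%fresh = ();
	my %seen;
	for my $reply (2, 3) {
		my $tid = resolve_tid(1, $update_row);
		$over{$reply} = $tid;
		$seen{$tid} = 1;
	}
	scalar(keys %seen);
}

my $without = rethread(0);
my $with = rethread(1);
print "without UPDATE: $without threads, with UPDATE: $with thread\n";
```

Without the row update, article 1 keeps its expired {tid}, so every
later lookup allocates yet another one.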

Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87360nlc44.fsf@kyleam.com/
---
 MANIFEST                   |  1 +
 lib/PublicInbox/OverIdx.pm | 12 ++++++--
 t/thread-index-gap.t       | 57 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 t/thread-index-gap.t

diff --git a/MANIFEST b/MANIFEST
index 544ec5f9..946e4b8a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -366,6 +366,7 @@ t/solver_git.t
 t/spamcheck_spamc.t
 t/spawn.t
 t/thread-cycle.t
+t/thread-index-gap.t
 t/time.t
 t/uri_imap.t
 t/utf8.eml
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 07cca4e5..88daa64f 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -170,8 +170,14 @@ sub _resolve_mid_to_tid {
 		$$tid = $cur_tid;
 	} else { # rethreading, queue up dead ghosts
 		$$tid = next_tid($self);
-		my $num = $smsg->{num};
-		push(@{$self->{-ghosts_to_delete}}, $num) if $num < 0;
+		my $n = $smsg->{num};
+		if ($n > 0) {
+			$self->{dbh}->prepare_cached(<<'')->execute($$tid, $n);
+UPDATE over SET tid = ? WHERE num = ?
+
+		} elsif ($n < 0) {
+			push(@{$self->{-ghosts_to_delete}}, $n);
+		}
 	}
 	1;
 }
@@ -298,7 +304,7 @@ sub _add_over {
 		}
 	} elsif ($n < 0) { # ghost
 		$$old_tid //= $cur_valid ? $cur_tid : next_tid($self);
-		link_refs($self, $refs, $$old_tid);
+		$$old_tid = link_refs($self, $refs, $$old_tid);
 		delete_by_num($self, $n);
 		$$v++;
 	}
diff --git a/t/thread-index-gap.t b/t/thread-index-gap.t
new file mode 100644
index 00000000..49f254e9
--- /dev/null
+++ b/t/thread-index-gap.t
@@ -0,0 +1,57 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use PublicInbox::InboxWritable;
+use PublicInbox::Config;
+use List::Util qw(shuffle);
+require_mods(qw(DBD::SQLite));
+require_git(2.6);
+
+chomp(my @msgs = split(/\n\n/, <<'EOF')); # "git log" order
+Subject: [bug#45000] [PATCH 1/9]
+References: <20201202045335.31096-1-j@example.com>
+Message-Id: <20201202045540.31248-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 0/9]
+Message-Id: <20201202045335.31096-1-j@example.com>
+
+Subject: [bug#45000] [PATCH 0/9]
+References: <20201202045335.31096-1-j@example.com>
+Message-ID: <86sg8o1mou.fsf@example.com>
+
+Subject: [bug#45000] [PATCH 8/9]
+Message-Id: <20201202045540.31248-8-j@example.com>
+References: <20201202045540.31248-1-j@example.com>
+
+EOF
+
+my ($home, $for_destroy) = tmpdir();
+local $ENV{HOME} = $home;
+for my $msgs (['orig', reverse @msgs], ['shuffle', shuffle(@msgs)]) {
+	my $desc = shift @$msgs;
+	my $n = "index-cap-$desc";
+	run_script([qw(-init -L basic -V2), $n, "$home/$n",
+		"http://example.com/$n", "$n\@example.com"]) or
+		BAIL_OUT 'init';
+	my $ibx = PublicInbox::Config->new->lookup_name($n);
+	my $im = PublicInbox::InboxWritable->new($ibx)->importer(0);
+	for my $m (@$msgs) {
+		$im->add(PublicInbox::Eml->new("$m\nFrom: x\@example.com\n\n"));
+	}
+	$im->done;
+	my $over = $ibx->over;
+	my @tid = $over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over');
+	is(scalar(@tid), 1, "only one thread initially ($desc)");
+	$over->dbh_close;
+	run_script([qw(-index --reindex --rethread), $ibx->{inboxdir}]) or
+		BAIL_OUT 'rethread';
+	@tid = $over->dbh->selectall_array('SELECT DISTINCT(tid) FROM over');
+	is(scalar(@tid), 1, "only one thread after rethread ($desc)");
+}
+
+done_testing;

Thread overview: 5+ messages
2020-12-03  4:59 missing messages in thread overview Kyle Meyer
2020-12-03 20:19 ` Eric Wong
2020-12-04  2:12   ` [WIP] over: ensure old, merged {tid} is really gone Eric Wong
2020-12-04  3:35     ` Kyle Meyer
2020-12-04 12:09       ` Eric Wong [this message]

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
