user/dev discussion of public-inbox itself
* Re: [RFC] overidx: preserve `tid' column on re-indexing
  2018-08-05  8:19 14%       ` [RFC] overidx: preserve `tid' column on re-indexing Eric Wong
@ 2018-08-05 21:41  8%         ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2018-08-05 21:41 UTC (permalink / raw)
  To: meta; +Cc: Konstantin Ryabitsev

Eric Wong <e@80x24.org> wrote:
> Lightly tested, but seems to make sense...
> Reindexing http://czquwvybam4bgbro.onion/git/ now...

Seems fine, updating the non-.onion sites

http://hjrcffqmbrq6wope.onion/git will be last

(And czquwvybam4bgbro.onion going down for unrelated maintenance)


* [RFC] overidx: preserve `tid' column on re-indexing
  @ 2018-08-05  8:19 14%       ` Eric Wong
  2018-08-05 21:41  8%         ` Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2018-08-05  8:19 UTC (permalink / raw)
  To: meta; +Cc: Konstantin Ryabitsev

Eric Wong <e@80x24.org> wrote:
> While working on this, I noticed the backwards --reindex walk
> breaks `tid' on v1 repositories, at least.  That bug was hidden
> by the Subject: match logic and not discovered until now.  It
> will be fixed separately.

Lightly tested, but seems to make sense...
Reindexing http://czquwvybam4bgbro.onion/git/ now...

-------8<-------
Subject: [RFC] overidx: preserve `tid' column on re-indexing

Otherwise, walking backwards through history could cause the root
message of a thread to lose its `tid', which prevents messages in
that thread from being looked up by it.

This bug was hidden by the fact that `sid' matches were often
good enough to link threads together.
---
 lib/PublicInbox/OverIdx.pm | 11 +++++++++--
 t/search-thr-index.t       | 40 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 62fec0d..cc9bd7d 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -79,8 +79,15 @@ sub mid2id {
 }
 
 sub delete_by_num {
-	my ($self, $num) = @_;
+	my ($self, $num, $tid_ref) = @_;
 	my $dbh = $self->{dbh};
+	if ($tid_ref) {
+		my $sth = $dbh->prepare_cached(<<'', undef, 1);
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+		$sth->execute($num);
+		$$tid_ref = $sth->fetchrow_array; # may be undef
+	}
 	foreach (qw(over id2num)) {
 		$dbh->prepare_cached(<<"")->execute($num);
 DELETE FROM $_ WHERE num = ?
@@ -262,7 +269,7 @@ sub add_over {
 	my $vivified = 0;
 
 	$self->begin_lazy;
-	$self->delete_by_num($num);
+	$self->delete_by_num($num, \$old_tid);
 	foreach my $mid (@$mids) {
 		my $v = 0;
 		each_by_mid($self, $mid, ['tid'], sub {
diff --git a/t/search-thr-index.t b/t/search-thr-index.t
index 2aa97bf..ab6d1b0 100644
--- a/t/search-thr-index.t
+++ b/t/search-thr-index.t
@@ -48,9 +48,49 @@ foreach (reverse split(/\n\n/, $data)) {
 }
 
 my $prev;
+my %tids;
+my $dbh = $rw->{over}->connect;
 foreach my $mid (@mids) {
 	my $msgs = $rw->{over}->get_thread($mid);
 	is(3, scalar(@$msgs), "got all messages from $mid");
+	foreach my $m (@$msgs) {
+		my $tid = $dbh->selectrow_array(<<'', undef, $m->{num});
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+		$tids{$tid}++;
+	}
+}
+
+is(scalar keys %tids, 1, 'all messages have the same tid');
+
+$rw->commit_txn_lazy;
+
+$xdb = $rw->begin_txn_lazy;
+{
+	my $mime = Email::MIME->new(<<'');
+Subject: [RFC 00/14]
+Message-Id: <1-bw@g>
+From: bw@g
+To: git@vger.kernel.org
+
+	my $dbh = $rw->{over}->connect;
+	my ($id, $prev);
+	my $reidx = $rw->{over}->next_by_mid('1-bw@g', \$id, \$prev);
+	ok(defined $reidx);
+	my $num = $reidx->{num};
+	my $tid0 = $dbh->selectrow_array(<<'', undef, $num);
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+	my $bytes = bytes::length($mime->as_string);
+	my $mid = mids($mime->header_obj)->[0];
+	my $doc_id = $rw->add_message($mime, $bytes, $num, 'ignored', $mid);
+	ok($doc_id, "message reindexed: $mid");
+	is($doc_id, $num, "article number unchanged: $num");
+
+	my $tid1 = $dbh->selectrow_array(<<'', undef, $num);
+SELECT tid FROM over WHERE num = ? LIMIT 1
+
+	is($tid1, $tid0, 'tid unchanged on reindex');
 }
 
 $rw->commit_txn_lazy;
-- 
EW
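
The idea behind the patch above can be sketched outside the project's
Perl code. What follows is a minimal Python `sqlite3` illustration, not
public-inbox's implementation: the `over` table here has only `num`,
`tid`, and a hypothetical `subject` column, and `delete_by_num` and
`reindex` are simplified stand-ins for the real subs. It shows reading
the old `tid' before the row is deleted, so that the re-inserted row
can keep it.

```python
import sqlite3


def delete_by_num(dbh, num, want_tid=False):
    """Delete row `num`; optionally return the tid it had (None if absent)."""
    old_tid = None
    if want_tid:
        row = dbh.execute(
            "SELECT tid FROM over WHERE num = ? LIMIT 1", (num,)
        ).fetchone()
        old_tid = row[0] if row else None  # may be None, like undef in the patch
    dbh.execute("DELETE FROM over WHERE num = ?", (num,))
    return old_tid


def reindex(dbh, num, subject, next_tid):
    """Re-add message `num`, reusing its previous tid when one existed."""
    old_tid = delete_by_num(dbh, num, want_tid=True)
    # Assumption: a fresh tid (next_tid) is only used when no old one was found.
    tid = old_tid if old_tid is not None else next_tid
    dbh.execute(
        "INSERT INTO over (num, tid, subject) VALUES (?, ?, ?)",
        (num, tid, subject),
    )
    return tid


dbh = sqlite3.connect(":memory:")
dbh.execute(
    "CREATE TABLE over (num INTEGER PRIMARY KEY, tid INTEGER, subject TEXT)"
)
# Three messages in one thread, all sharing tid 7 (made-up sample data).
dbh.executemany(
    "INSERT INTO over (num, tid, subject) VALUES (?, ?, ?)",
    [(1, 7, "[RFC 00/14]"), (2, 7, "Re: [RFC 00/14]"), (3, 7, "Re: [RFC 00/14]")],
)

# Reindex the root message; without the lookup it would get tid 8.
tid_after = reindex(dbh, 1, "[RFC 00/14]", next_tid=8)
tids = {row[0] for row in dbh.execute("SELECT tid FROM over")}
print(tid_after, tids)
```

As in the new test in t/search-thr-index.t, the check is that the
reindexed root keeps its `tid' and that every message in the thread
still shares a single `tid' afterwards.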


2018-08-03 18:26     Threading/searching problem Konstantin Ryabitsev
2018-08-03 19:20     ` Eric Wong
2018-08-03 19:38       ` Konstantin Ryabitsev
2018-08-05  6:04         ` [PATCH] view: distinguish strict and loose thread matches Eric Wong
2018-08-05  8:19 14%       ` [RFC] overidx: preserve `tid' column on re-indexing Eric Wong
2018-08-05 21:41  8%         ` Eric Wong
