user/dev discussion of public-inbox itself
* [PATCH 0/3] force reindex for threading changes
From: Eric Wong @ 2017-02-06 21:55 UTC
  To: meta

We cannot rely on an in-place --reindex to handle thread_id
changes when we fix threading bugs in the search indexer,
such as the one fixed in commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers").

So, bump the schema version and accept the cost of the extra
disk space needed to build a new index in parallel.
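A minimal Perl sketch of why a schema version bump gives a parallel rebuild. It assumes the index directory name embeds the schema version (e.g. `public-inbox/xapian13`), which is our understanding of the layout, not something this patch shows; `index_dir` and the path below are hypothetical.

```perl
use strict;
use warnings;

# Assumption: the index directory name embeds the schema version, so
# bumping the version points indexing at a new, empty directory while
# the old one is left untouched on disk.
use constant SCHEMA_VERSION => 13;

# Hypothetical helper, not the actual public-inbox API.
sub index_dir {
	my ($git_dir, $version) = @_;
	$version = SCHEMA_VERSION unless defined $version;
	return "$git_dir/public-inbox/xapian$version";
}

my $git_dir = '/path/to/meta.git';  # placeholder path
printf "old index: %s\n", index_dir($git_dir, 12);
printf "new index: %s\n", index_dir($git_dir);
```

Both directories coexist until the new index is complete, which is where the extra disk space goes; the old directory can be removed once nothing reads it.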



* [PATCH 1/3] searchidx: reindex clobbers old thread IDs
From: Eric Wong @ 2017-02-06 21:55 UTC
  To: meta

We cannot always reuse thread IDs since our threading
logic may change as bugs are fixed.
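A toy Perl model of why stored thread IDs cannot simply be reused. It is not public-inbox code: the "old" rule, which treats an empty In-Reply-To as a link to a ghost with an empty Message-ID, is a hypothetical stand-in for a threading bug, and the thread ID is simplified to the root Message-ID.

```perl
use strict;
use warnings;

# Toy model, not public-inbox code: each message maps to the Message-ID
# named by its In-Reply-To header ('' means the header was empty).
my %irt = (
	'a@example' => undef,   # no In-Reply-To header at all
	'b@example' => '',      # empty In-Reply-To header
	'c@example' => '',      # empty In-Reply-To header
);

# Hypothetical "old" rule: an empty header still names a parent, so
# unrelated messages end up under one ghost with an empty Message-ID.
sub old_parent { my ($m) = @_; exists $irt{$m} ? $irt{$m} : undef }

# Hypothetical "fixed" rule: an empty header means "no parent".
sub new_parent {
	my ($m) = @_;
	my $p = old_parent($m);
	return (defined $p && $p ne '') ? $p : undef;
}

# The thread ID here is simply the root reached by following parents.
sub thread_id {
	my ($m, $parent_of) = @_;
	while (defined(my $p = $parent_of->($m))) { $m = $p }
	return $m;
}

my @mids = sort keys %irt;
my %old = map { $_ => thread_id($_, \&old_parent) } @mids;

# In-place reindex that keeps any thread ID already stored: the bad
# grouping produced by the old rule survives.
my %reused = map { $_ => $old{$_} // thread_id($_, \&new_parent) } @mids;

# Fresh index built only with the fixed rule.
my %fresh = map { $_ => thread_id($_, \&new_parent) } @mids;

for my $m (@mids) {
	printf "%-10s reused=%-12s fresh=%s\n", $m, "'$reused{$m}'", "'$fresh{$m}'";
}
```

With reuse, b and c stay lumped under the empty ghost; only the fresh index separates them.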
---
 lib/PublicInbox/SearchIdx.pm | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 1142ca7..bc003c6 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -157,6 +157,10 @@ sub add_message {
 			# it will also clobber any existing regular message
 			$doc_id = $smsg->{doc_id};
 			$old_tid = $smsg->thread_id;
+
+			# no need to remove_term for old_tid, we use a new
+			# doc to replace the old one when reindexing:
+			$old_tid = undef if $self->{reindex};
 		}
 		$smsg = PublicInbox::SearchMsg->new($mime);
 		my $doc = $smsg->{doc};
@@ -464,7 +468,7 @@ sub _git_log {
 sub _index_sync {
 	my ($self, $opts) = @_;
 	my $tip = $opts->{ref} || 'HEAD';
-	my $reindex = $opts->{reindex};
+	$self->{reindex} = $opts->{reindex};
 	my ($mkey, $last_commit, $lx, $xlog);
 	$self->{git}->batch_prepare;
 	my $xdb = _xdb_acquire($self);
@@ -474,7 +478,7 @@ sub _index_sync {
 		$mkey = 'last_commit';
 		$last_commit = $xdb->get_metadata('last_commit');
 		$lx = $last_commit;
-		if ($reindex) {
+		if ($self->{reindex}) {
 			$lx = '';
 			$mkey = undef if $last_commit ne '';
 		}
-- 
EW



* [PATCH 2/3] Revert "searchidx: reindex clobbers old thread IDs"
From: Eric Wong @ 2017-02-06 21:55 UTC
  To: meta

Oops, that's broken, too.  I guess the only way to reindex
after fixing the thread detection is to start from scratch.

This reverts commit 5d91adedf5f33ef1cb87df2a86306ddf370b4f8d.
---
 lib/PublicInbox/SearchIdx.pm | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index bc003c6..1142ca7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -157,10 +157,6 @@ sub add_message {
 			# it will also clobber any existing regular message
 			$doc_id = $smsg->{doc_id};
 			$old_tid = $smsg->thread_id;
-
-			# no need to remove_term for old_tid, we use a new
-			# doc to replace the old one when reindexing:
-			$old_tid = undef if $self->{reindex};
 		}
 		$smsg = PublicInbox::SearchMsg->new($mime);
 		my $doc = $smsg->{doc};
@@ -468,7 +464,7 @@ sub _git_log {
 sub _index_sync {
 	my ($self, $opts) = @_;
 	my $tip = $opts->{ref} || 'HEAD';
-	$self->{reindex} = $opts->{reindex};
+	my $reindex = $opts->{reindex};
 	my ($mkey, $last_commit, $lx, $xlog);
 	$self->{git}->batch_prepare;
 	my $xdb = _xdb_acquire($self);
@@ -478,7 +474,7 @@ sub _index_sync {
 		$mkey = 'last_commit';
 		$last_commit = $xdb->get_metadata('last_commit');
 		$lx = $last_commit;
-		if ($self->{reindex}) {
+		if ($reindex) {
 			$lx = '';
 			$mkey = undef if $last_commit ne '';
 		}
-- 
EW



* [PATCH 3/3] search: schema version bump for empty References/In-Reply-To
From: Eric Wong @ 2017-02-06 21:55 UTC
  To: meta

In indexes built before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers"),
we cannot distinguish between legitimate ghosts and mis-threaded
messages, so we must rebuild the index in parallel to fix them.
---
 lib/PublicInbox/Search.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index c909424..8c72fa1 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -39,7 +39,9 @@ use constant {
 	# 10 - optimize doc for NNTP overviews
 	# 11 - merge threads when vivifying ghosts
 	# 12 - change YYYYMMDD value column to numeric
-	SCHEMA_VERSION => 12,
+	# 13 - fix threading for empty References/In-Reply-To
+	#      (commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0)
+	SCHEMA_VERSION => 13,
 
 	# n.b. FLAG_PURE_NOT is expensive not suitable for a public website
 	# as it could become a denial-of-service vector
-- 
EW


Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox