user/dev discussion of public-inbox itself
* [PATCH 0/3] force reindex for threading changes
@ 2017-02-06 21:55  4% Eric Wong
  2017-02-06 21:55  4% ` [PATCH 3/3] search: schema version bump for empty References/In-Reply-To Eric Wong
  0 siblings, 1 reply; 3+ results
From: Eric Wong @ 2017-02-06 21:55 UTC (permalink / raw)
  To: meta

We cannot rely on in-place --reindex to handle thread_id
changes when we fix threading bugs in the search indexer,
as in commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers").

So, bump the schema version and pay the cost of requiring
extra disk space to create a new index in parallel.
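
To illustrate why a version bump forces a rebuild in parallel, here is a minimal sketch. It assumes the on-disk index directory name embeds the schema version; the "xapian<version>" path layout and the helpers xdir_for and index_plan are hypothetical names used for illustration, not code from this series. Under that assumption, a new version selects a fresh directory, so the old index stays readable while the new one is built, at the cost of the extra disk space mentioned above.

```perl
use strict;
use warnings;

# Stand-in for the constant bumped in lib/PublicInbox/Search.pm
use constant SCHEMA_VERSION => 13;

# ASSUMPTION: one index directory per schema version (hypothetical layout)
sub xdir_for {
	my ($git_dir, $version) = @_;
	"$git_dir/public-inbox/xapian$version";
}

# Decide whether the existing index can be updated in place or whether
# a fresh index must be built next to the old one (hypothetical helper).
sub index_plan {
	my ($git_dir, $existing_version) = @_;
	my $new = xdir_for($git_dir, SCHEMA_VERSION);
	if (defined($existing_version) &&
			$existing_version == SCHEMA_VERSION) {
		return { action => 'update', dir => $new };
	}
	my $old = defined($existing_version)
		? xdir_for($git_dir, $existing_version) : undef;
	# the old directory is left alone until the rebuild completes
	return { action => 'rebuild', dir => $new, old => $old };
}

my $plan = index_plan('/path/to/inbox.git', 12);
print "$plan->{action}: $plan->{dir}\n";
```

The point of the design is that readers never see a half-migrated index: they keep using the old directory until the new one is complete.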



* [PATCH 3/3] search: schema version bump for empty References/In-Reply-To
  2017-02-06 21:55  4% [PATCH 0/3] force reindex for threading changes Eric Wong
@ 2017-02-06 21:55  4% ` Eric Wong
  0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2017-02-06 21:55 UTC (permalink / raw)
  To: meta

We cannot distinguish between legitimate ghosts and mis-threaded
messages before commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0
("searchidx: deal with empty In-Reply-To and References headers")
so we must rebuild the index in parallel to fix it.
---
 lib/PublicInbox/Search.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index c909424..8c72fa1 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -39,7 +39,9 @@ use constant {
 	# 10 - optimize doc for NNTP overviews
 	# 11 - merge threads when vivifying ghosts
 	# 12 - change YYYYMMDD value column to numeric
-	SCHEMA_VERSION => 12,
+	# 13 - fix threading for empty References/In-Reply-To
+	#      (commit 83425ef12e4b65cdcecd11ddcb38175d4a91d5a0)
+	SCHEMA_VERSION => 13,
 
 	# n.b. FLAG_PURE_NOT is expensive not suitable for a public website
 	# as it could become a denial-of-service vector
-- 
EW



* [PATCH] searchidx: deal with empty In-Reply-To and References headers
@ 2017-02-06 20:02  7% Eric Wong
  0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2017-02-06 20:02 UTC (permalink / raw)
  To: meta; +Cc: Johannes Schindelin

In some messages, these headers exist but have empty values.
Do not let empty values confuse the search indexer when it ties
threads together, as they can produce nonsensical threads grouped
under a Message-Id of "" (empty string).

See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.
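
The handling described above can be sketched as a standalone script. This is an illustration under stated assumptions, not the indexer itself: thread_parents is a hypothetical helper, and mid_clean_sketch is a simplified stand-in for the real mid_clean, assumed here to do no more than strip angle brackets. The sketch treats an empty In-Reply-To the same as a missing one, so no message is ever attached to a parent with an empty Message-Id.

```perl
use strict;
use warnings;

# ASSUMPTION: simplified stand-in for mid_clean; only strips <...>
sub mid_clean_sketch {
	my ($mid) = @_;
	$mid = $1 if $mid =~ /<([^>]+)>/;
	$mid;
}

# Hypothetical helper: given this message's own Message-Id and the raw
# References / In-Reply-To header values (undef when the header is
# absent), return the usable In-Reply-To (or undef) and the references.
sub thread_parents {
	my ($mid, $refs_raw, $irt_raw) = @_;
	# an empty References value simply yields no matches
	my @refs = defined($refs_raw) ? ($refs_raw =~ /<([^>]+)>/g) : ();
	my $irt = $irt_raw;
	if (defined $irt) {
		if ($irt eq '') {
			$irt = undef; # header present but empty: ignore it
		} else {
			$irt = mid_clean_sketch($irt);
			$irt = undef if $mid eq $irt; # self-reference
		}
	}
	($irt, @refs);
}

for my $case (['', ''], ['<a@example.com>', '<a@example.com>']) {
	my ($irt_raw, $refs_raw) = @$case;
	my ($irt, @refs) = thread_parents('c@example.com',
					$refs_raw, $irt_raw);
	printf("irt=%s refs=%s\n",
		(defined($irt) ? $irt : '(none)'),
		(@refs ? join(',', @refs) : '(none)'));
}
```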

Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
  <https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
---
 Not fixed on the live sites yet, but it will be once reindexing
 finishes (eatmydata public-inbox-index --reindex $GIT_DIR)

 lib/PublicInbox/SearchIdx.pm | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index d63dd7c..1142ca7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -292,11 +292,15 @@ sub link_message {
 	my $mime = $smsg->{mime};
 	my $hdr = $mime->header_obj;
 	my $refs = $hdr->header_raw('References');
-	my @refs = $refs ? ($refs =~ /<([^>]+)>/g) : ();
+	my @refs = defined $refs ? ($refs =~ /<([^>]+)>/g) : ();
 	my $irt = $hdr->header_raw('In-Reply-To');
 	if (defined $irt) {
-		$irt = mid_clean($irt);
-		$irt = undef if $mid eq $irt;
+		if ($irt eq '') {
+			$irt = undef;
+		} else {
+			$irt = mid_clean($irt);
+			$irt = undef if $mid eq $irt;
+		}
 	}
 
 	my $tid;
-- 
EW

