user/dev discussion of public-inbox itself
* [PATCH] searchidx: deal with empty In-Reply-To and References headers
From: Eric Wong @ 2017-02-06 20:02 UTC
  To: meta; +Cc: Johannes Schindelin

In some messages, these headers exist but have empty values.
Do not let empty values throw off the thread linking in our
search indexer, as they can cause nonsensical threads grouped
under a Message-Id of "" (empty string).

See
<https://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@rawuncut.elitemail.org/raw>
for an example of such a message.

Thanks-to: Johannes Schindelin <Johannes.Schindelin@gmx.de>
  <https://public-inbox.org/git/alpine.DEB.2.20.1702041206130.3496@virtualbox/>
---
 Not fixed on the live sites, yet, but it will be once reindexing
 finishes (eatmydata public-inbox-index --reindex $GIT_DIR)
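
 For illustration only, a minimal standalone sketch of the before
 and after In-Reply-To handling. It does not use the real
 PublicInbox modules: mid_clean_sketch, irt_old, irt_new and
 refs_list are hypothetical names, and mid_clean_sketch merely
 assumes that mid_clean() strips the surrounding angle brackets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for mid_clean() (an assumption, not the real
# function): strips the angle brackets around a Message-Id if present,
# otherwise returns the input unchanged.
sub mid_clean_sketch {
	my ($mid) = @_;
	$mid = $1 if $mid =~ /<([^>]+)>/;
	$mid;
}

# Old logic: any defined In-Reply-To value is cleaned and kept,
# so an empty header survives as "".
sub irt_old {
	my ($mid, $irt) = @_;
	if (defined $irt) {
		$irt = mid_clean_sketch($irt);
		$irt = undef if $mid eq $irt;
	}
	$irt;
}

# New logic: an empty In-Reply-To is treated like a missing one.
sub irt_new {
	my ($mid, $irt) = @_;
	if (defined $irt) {
		if ($irt eq '') {
			$irt = undef;
		} else {
			$irt = mid_clean_sketch($irt);
			$irt = undef if $mid eq $irt;
		}
	}
	$irt;
}

# Extract Message-Ids from a References value, as in the patched line.
sub refs_list {
	my ($refs) = @_;
	defined $refs ? ($refs =~ /<([^>]+)>/g) : ();
}

# Compare both versions for a missing, an empty and a normal header.
for my $irt (undef, '', '<parent@example.com>') {
	my $shown = defined $irt ? "'$irt'" : 'undef';
	my ($old, $new) = map { defined $_ ? "'$_'" : 'undef' }
		(irt_old('self@example.com', $irt),
		 irt_new('self@example.com', $irt));
	print "In-Reply-To $shown: old=$old new=$new\n";
}

my @refs = refs_list('');
print "empty References: ", scalar(@refs), " ids\n";
```

 With the empty header, the old logic hands "" on as the parent
 Message-Id, while the new logic yields undef, the same as when
 the header is absent.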

 lib/PublicInbox/SearchIdx.pm | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index d63dd7c..1142ca7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -292,11 +292,15 @@ sub link_message {
 	my $mime = $smsg->{mime};
 	my $hdr = $mime->header_obj;
 	my $refs = $hdr->header_raw('References');
-	my @refs = $refs ? ($refs =~ /<([^>]+)>/g) : ();
+	my @refs = defined $refs ? ($refs =~ /<([^>]+)>/g) : ();
 	my $irt = $hdr->header_raw('In-Reply-To');
 	if (defined $irt) {
-		$irt = mid_clean($irt);
-		$irt = undef if $mid eq $irt;
+		if ($irt eq '') {
+			$irt = undef;
+		} else {
+			$irt = mid_clean($irt);
+			$irt = undef if $mid eq $irt;
+		}
 	}
 
 	my $tid;
-- 
EW

