From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 081371FAFC;
	Mon,  6 Feb 2017 20:02:17 +0000 (UTC)
Date: Mon, 6 Feb 2017 20:02:17 +0000
From: Eric Wong
To: meta@public-inbox.org
Cc: Johannes Schindelin
Subject: [PATCH] searchidx: deal with empty In-Reply-To and References headers
Message-ID: <20170206200216.GA26676@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id:

In some messages, these headers exist, but have empty values.
Do not let empty values throw off our search indexer to tie
threads together, as it can make non-sensical threads grouped
to a Message-Id of "" (empty string).

See for an example of such a message.

Thanks-to: Johannes Schindelin
---
  Not fixed on the live sites, yet, but it will be once reindexing
  finishes (eatmydata public-inbox-index --reindex $GIT_DIR)

 lib/PublicInbox/SearchIdx.pm | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index d63dd7c..1142ca7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -292,11 +292,15 @@ sub link_message {
 	my $mime = $smsg->{mime};
 	my $hdr = $mime->header_obj;
 	my $refs = $hdr->header_raw('References');
-	my @refs = $refs ? ($refs =~ /<([^>]+)>/g) : ();
+	my @refs = defined $refs ? ($refs =~ /<([^>]+)>/g) : ();
 	my $irt = $hdr->header_raw('In-Reply-To');
 	if (defined $irt) {
-		$irt = mid_clean($irt);
-		$irt = undef if $mid eq $irt;
+		if ($irt eq '') {
+			$irt = undef;
+		} else {
+			$irt = mid_clean($irt);
+			$irt = undef if $mid eq $irt;
+		}
 	}
 
 	my $tid;
-- 
EW