From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 4C5E720714 for ; Fri, 9 Sep 2016 00:01:35 +0000 (UTC) From: Eric Wong To: meta@public-inbox.org Subject: [PATCH 06/10] search: increase term positions for each quoted hunk Date: Fri, 9 Sep 2016 00:01:27 +0000 Message-Id: <20160909000131.18584-7-e@80x24.org> In-Reply-To: <20160909000131.18584-1-e@80x24.org> References: <20160909000131.18584-1-e@80x24.org> List-Id: We pay a storage cost for storing positional information in Xapian, make good use of it by attempting to preserve it for (hopefully) better search results. --- lib/PublicInbox/SearchIdx.pm | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 25452da..0e499ad 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/PublicInbox/SearchIdx.pm @@ -135,6 +135,13 @@ sub index_users ($$) { $tg->increase_termpos; } +sub index_body ($$$) { + my ($tg, $lines, $inc) = @_; + $tg->index_text(join("\n", @$lines), $inc, $inc ? 'XNQ' : 'XQUOT'); + @$lines = (); + $tg->increase_termpos; +} + sub add_message { my ($self, $mime, $bytes, $num, $blob) = @_; # mime = Email::MIME object my $db = $self->{xdb}; @@ -185,23 +192,15 @@ sub add_message { my @lines = split(/\n/, $body); while (defined(my $l = shift @lines)) { if ($l =~ /^>/) { + index_body($tg, \@orig, 1) if @orig; push @quot, $l; } else { + index_body($tg, \@quot, 0) if @quot; push @orig, $l; } } - if (@quot) { - my $s = join("\n", @quot); - @quot = (); - $tg->index_text($s, 0, 'XQUOT'); - $tg->increase_termpos; - } - if (@orig) { - my $s = join("\n", @orig); - @orig = (); - $tg->index_text($s, 1, 'XNQ'); - $tg->increase_termpos; - } + index_body($tg, \@quot, 0) if @quot; + index_body($tg, \@orig, 1) if @orig; }); link_message($self, $smsg, $old_tid); -- EW