* [PATCH 06/10] search: increase term positions for each quoted hunk
From: Eric Wong @ 2016-09-09 0:01 UTC
To: meta
We pay a storage cost for positional information in Xapian,
so make good use of it by attempting to preserve hunk order
and boundaries in term positions for (hopefully) better search
results.
---
lib/PublicInbox/SearchIdx.pm | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 25452da..0e499ad 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -135,6 +135,13 @@ sub index_users ($$) {
$tg->increase_termpos;
}
+sub index_body ($$$) {
+ my ($tg, $lines, $inc) = @_;
+ $tg->index_text(join("\n", @$lines), $inc, $inc ? 'XNQ' : 'XQUOT');
+ @$lines = ();
+ $tg->increase_termpos;
+}
+
sub add_message {
my ($self, $mime, $bytes, $num, $blob) = @_; # mime = Email::MIME object
my $db = $self->{xdb};
@@ -185,23 +192,15 @@ sub add_message {
my @lines = split(/\n/, $body);
while (defined(my $l = shift @lines)) {
if ($l =~ /^>/) {
+ index_body($tg, \@orig, 1) if @orig;
push @quot, $l;
} else {
+ index_body($tg, \@quot, 0) if @quot;
push @orig, $l;
}
}
- if (@quot) {
- my $s = join("\n", @quot);
- @quot = ();
- $tg->index_text($s, 0, 'XQUOT');
- $tg->increase_termpos;
- }
- if (@orig) {
- my $s = join("\n", @orig);
- @orig = ();
- $tg->index_text($s, 1, 'XNQ');
- $tg->increase_termpos;
- }
+ index_body($tg, \@quot, 0) if @quot;
+ index_body($tg, \@orig, 1) if @orig;
});
link_message($self, $smsg, $old_tid);
--
EW
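To see what the per-hunk increase_termpos calls buy, here is a small self-contained Perl sketch that reuses the loop and index_body() from the patch but swaps Search::Xapian's TermGenerator for a hypothetical FakeTermGen class that only records (prefix, word, position) triples. The gap of 100 mirrors the default delta of Xapian's TermGenerator::increase_termpos; the word splitting is deliberately naive and is not how Xapian tokenizes text. The index_hunks() wrapper is likewise an illustrative name, not part of SearchIdx.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a Xapian TermGenerator: it only records
# [prefix, word, position] so the position gaps between hunks are visible.
package FakeTermGen;
sub new { bless { pos => 0, postings => [] }, shift }
sub index_text {
	# $inc is the wdf increment in the real API; it is ignored here
	my ($self, $text, $inc, $prefix) = @_;
	for my $w (split(/\W+/, lc $text)) {
		next if $w eq ''; # skip the empty field left by a leading '>'
		push @{$self->{postings}}, [ $prefix, $w, ++$self->{pos} ];
	}
}
sub increase_termpos {
	my ($self, $delta) = @_;
	$self->{pos} += defined $delta ? $delta : 100; # same default as Xapian
}

package main;

# Same shape as index_body() in the patch: index one hunk, then leave a gap
sub index_body {
	my ($tg, $lines, $inc) = @_;
	$tg->index_text(join("\n", @$lines), $inc, $inc ? 'XNQ' : 'XQUOT');
	@$lines = ();
	$tg->increase_termpos;
}

# The loop from add_message(): flush the other kind of hunk on each switch
sub index_hunks {
	my ($tg, $body) = @_;
	my (@orig, @quot);
	for my $l (split(/\n/, $body)) {
		if ($l =~ /^>/) {
			index_body($tg, \@orig, 1) if @orig;
			push @quot, $l;
		} else {
			index_body($tg, \@quot, 0) if @quot;
			push @orig, $l;
		}
	}
	index_body($tg, \@quot, 0) if @quot;
	index_body($tg, \@orig, 1) if @orig;
}

my $tg = FakeTermGen->new;
index_hunks($tg, "> quoted text\nmy reply\n> more quoted\nfinal words");
printf("%-5s %-6s %d\n", @$_) for @{$tg->{postings}};
```

With this body the quoted words "text" (position 2) and "more" (position 205) are no longer adjacent, so a phrase search for "text more" within quoted text should not match across two separate quoted hunks. Before the patch, both quoted lines were joined into one string and those words would have landed at consecutive positions.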
* [PATCH 0/10] search: more mairix prefix compatibility
From: Eric Wong @ 2016-09-09 0:01 UTC
To: meta
This brings us closer to the behavior of mairix(1) for search
by supporting the t:, c:, f:, tc:, tcf:, n:, b:, and bs:
prefixes as documented in the mairix(1) manpage.
We also introduce the use of q: and nq: prefixes for quoted and
non-quoted text, respectively.
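The mairix(1) prefix semantics amount to a table from prefix to the fields it searches: tc: means To or Cc, tcf: adds From, and bs: covers body or subject. The Perl sketch below expands a single prefix:term into an OR of field-qualified terms. The field names and the expand_term() helper are hypothetical; the real Search.pm presumably maps these prefixes onto Xapian term prefixes (such as the XQUOT and XNQ prefixes used in [PATCH 06/10]) through the query parser rather than by rewriting strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping from mairix-style prefixes to searchable fields
my %PREFIX_FIELDS = (
	't'   => [ 'to' ],
	'c'   => [ 'cc' ],
	'f'   => [ 'from' ],
	'tc'  => [ 'to', 'cc' ],
	'tcf' => [ 'to', 'cc', 'from' ],
	'n'   => [ 'attachment_name' ],
	'b'   => [ 'body' ],
	'bs'  => [ 'body', 'subject' ],
	'q'   => [ 'quoted' ],
	'nq'  => [ 'non_quoted' ],
);

# Expand "tc:alice" into "(to:alice OR cc:alice)"; terms without a
# known prefix are returned unchanged.
sub expand_term {
	my ($term) = @_;
	my ($pfx, $rest) = ($term =~ /\A([a-z]+):(.+)\z/) or return $term;
	my $fields = $PREFIX_FIELDS{$pfx} or return $term;
	my @alts = map { "$_:$rest" } @$fields;
	return @alts == 1 ? $alts[0] : '(' . join(' OR ', @alts) . ')';
}

print expand_term($_), "\n" for qw(tc:alice bs:patch q:wheezy s:foo);
```

Keeping the mapping in one table means adding a prefix such as n: for attachment names is a one-line change.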
There is a schema version change in [PATCH 7/10] to maintain
compatibility with Debian 7.x (wheezy) installs. In-place
reindexing would have been expensive anyway, so the schema
bump is perhaps a good idea: creating a fresh index should be
faster than --reindex.
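One way a schema bump can lead to a fresh index instead of an in-place rewrite is to key the index directory on the schema version, so a new version simply finds no existing index. The Perl sketch below assumes that layout; the directory names and the xapian_dir()/needs_fresh_index() helpers are hypothetical illustrations, not code from these patches.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical layout: one Xapian directory per schema version
sub xapian_dir {
	my ($inbox_dir, $version) = @_;
	return File::Spec->catdir($inbox_dir, 'public-inbox', "xapian$version");
}

# A bumped version has no directory yet, so the indexer would build
# a new index from scratch instead of rewriting the old one in place.
sub needs_fresh_index {
	my ($inbox_dir, $version) = @_;
	my $dir = xapian_dir($inbox_dir, $version);
	return !-d $dir;
}

my ($inbox, $ver) = ('/path/to/inbox', 2); # placeholder values
print needs_fresh_index($inbox, $ver) ? "build fresh index\n"
					: "reuse existing index\n";
```

Old directories for earlier versions are left untouched in this sketch, so older installs could keep using them until they are removed.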
Eric Wong (10):
search: allow searching user fields (To/Cc/From)
search: drop longer subject: prefix for search
search: more granular message body searching
search: fix space regressions from recent changes
search: match quote detection behavior of view
search: increase term positions for each quoted hunk
search: fix compatibility with Debian wheezy
search: avoid mindlessly calling body_set
search: match the behavior of WWW for indexing text
search: index attachment filenames
lib/PublicInbox/Search.pm | 32 +++++++++---
lib/PublicInbox/SearchIdx.pm | 104 ++++++++++++++++++++++++-------------
t/search.t | 120 ++++++++++++++++++++++++++++++++++++++++---
3 files changed, 206 insertions(+), 50 deletions(-)
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git