* [PATCH 03/10] search: more granular message body searching
From: Eric Wong @ 2016-09-09 0:01 UTC
To: meta
"bs:" (body and subject) and "b:" (body only) are adapted from mairix(1).
We also support searching explicitly for quoted vs. non-quoted
text via the "q:" and "nq:" prefixes, since readers sometimes
do not care about quoted text.
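The quoted/non-quoted split can be illustrated with a rough pure-Perl sketch. It assumes that quoted lines are simply those beginning with ">" (the real quote detection in SearchIdx.pm lies outside the diff context shown below), and the helpers `split_hunks` and `index_plan` are hypothetical, not public-inbox code. The (prefix, wdf increment) pairs mirror the index_text() calls in the SearchIdx.pm hunk: quoted text gets a wdf increment of 0 for XBS, XBODY and the unprefixed terms, which is presumably why the test expects the non-quoted message to rank first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group consecutive body lines into hunks,
# tagging each hunk as quoted (leading '>', an assumption) or not.
# Returns a list of [ $is_quoted, \@lines ] pairs.
sub split_hunks {
	my ($body) = @_;
	my @hunks;
	foreach my $line (split(/\n/, $body)) {
		my $q = ($line =~ /\A\s*>/) ? 1 : 0;
		if (@hunks && $hunks[-1]->[0] == $q) {
			push @{$hunks[-1]->[1]}, $line; # same kind: extend hunk
		} else {
			push @hunks, [ $q, [ $line ] ]; # kind changed: new hunk
		}
	}
	return @hunks;
}

# Hypothetical helper: the (prefix, wdf_inc) pairs a hunk would be
# passed to index_text() with, following the SearchIdx.pm change.
# An empty prefix stands for the unprefixed index_text() call.
sub index_plan {
	my ($is_quoted) = @_;
	return $is_quoted
		? ([ 'XQUOT', 1 ], [ 'XBS', 0 ], [ 'XBODY', 0 ], [ '', 0 ])
		: ([ 'XNQ', 1 ], [ 'XBS', 1 ], [ 'XBODY', 1 ], [ '', 1 ]);
}

# Usage: show how a two-hunk body would be indexed.
my $body = "> I saw it at the theatre\nI prefer the theatre too\n";
foreach my $h (split_hunks($body)) {
	my ($q, $lines) = @$h;
	my $text = join("\n", @$lines);
	foreach my $p (index_plan($q)) {
		printf("%-6s wdf+%d: %s\n", $p->[0] || '(none)', $p->[1], $text);
	}
}
```

Giving quoted text a zero wdf increment still makes it findable via b: and bs:, but keeps it from outweighing the author's own words when ranking.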
In the future, we will support parsing diffs (perhaps when
repobrowse integration is complete).
Note: this roughly doubles the size of the Xapian database due
to the additional terms, so this change may not be worth it.
---
lib/PublicInbox/Search.pm | 18 ++++++++++++------
lib/PublicInbox/SearchIdx.pm | 17 ++++++++++++++---
t/search.t | 25 +++++++++++++++++++++++++
3 files changed, 51 insertions(+), 9 deletions(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 3b25b66..f74129d 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -58,16 +58,22 @@ my %bool_pfx_external = (
);
my %prob_prefix = (
- s => 'S', # for mairix compatibility
+ # for mairix compatibility
+ s => 'S',
m => 'Q', # 'mid' is exact, 'm' can do partial
- f => 'A', # for mairix compatibility
- t => 'XTO', # for mairix compatibility
- tc => 'XTC', # for mairix compatibility
- c => 'XCC', # for mairix compatibility
- tcf => 'XTCF', # for mairix compatibility
+ f => 'A',
+ t => 'XTO',
+ tc => 'XTC',
+ c => 'XCC',
+ tcf => 'XTCF',
+ b => 'XBODY',
+ bs => 'XBS',
+
# n.b.: leaving out "a:" alias for "tcf:" even though
# mairix supports it. It is only mentioned in passing in mairix(1)
# and the extra two letters are not significantly longer.
+ q => 'XQUOT',
+ nq => 'XNQ',
);
# not documenting m: and mid: for now, the using the URLs works w/o Xapian
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 37fefbe..cd27a29 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -173,7 +173,10 @@ sub add_message {
my $tg = $self->term_generator;
$tg->set_document($doc);
- $tg->index_text($subj, 1, 'S') if $subj;
+ if ($subj) {
+ $tg->index_text($subj, 1, 'S');
+ $tg->index_text($subj, 1, 'XBS');
+ }
$tg->increase_termpos;
$tg->index_text($subj) if $subj;
$tg->increase_termpos;
@@ -199,13 +202,21 @@ sub add_message {
}
}
if (@quot) {
- $tg->index_text(join("\n", @quot), 0);
+ my $s = join("\n", @quot);
@quot = ();
+ $tg->index_text($s, 1, 'XQUOT');
+ $tg->index_text($s, 0, 'XBS');
+ $tg->index_text($s, 0, 'XBODY');
+ $tg->index_text($s, 0);
$tg->increase_termpos;
}
if (@orig) {
- $tg->index_text(join("\n", @orig));
+ my $s = join("\n", @orig);
@orig = ();
+ $tg->index_text($s, 1, 'XNQ');
+ $tg->index_text($s, 1, 'XBS');
+ $tg->index_text($s, 1, 'XBODY');
+ $tg->index_text($s);
$tg->increase_termpos;
}
});
diff --git a/t/search.t b/t/search.t
index 7abaf83..bddb545 100644
--- a/t/search.t
+++ b/t/search.t
@@ -361,6 +361,31 @@ sub filter_mids {
}
}
+{
+ $rw_commit->();
+ $ro->reopen;
+ my $res = $ro->query('b:hello');
+ is(scalar @{$res->{msgs}}, 0, 'no match on body search only');
+ $res = $ro->query('bs:smith');
+ is(scalar @{$res->{msgs}}, 0,
+ 'no match on body+subject search for From');
+
+ $res = $ro->query('q:theatre');
+ is(scalar @{$res->{msgs}}, 1, 'only one quoted body');
+ like($res->{msgs}->[0]->from, qr/\AQuoter/, 'got quoted body');
+
+ $res = $ro->query('nq:theatre');
+ is(scalar @{$res->{msgs}}, 1, 'only one non-quoted body');
+ like($res->{msgs}->[0]->from, qr/\ANon-Quoter/, 'got non-quoted body');
+
+ foreach my $pfx (qw(b: bs:)) {
+ $res = $ro->query($pfx . 'theatre');
+ is(scalar @{$res->{msgs}}, 2, "searched both bodies for $pfx");
+ like($res->{msgs}->[0]->from, qr/\ANon-Quoter/,
+ "non-quoter first for $pfx");
+ }
+}
+
done_testing();
1;
--
EW
* [PATCH 0/10] search: more mairix prefix compatibility
From: Eric Wong @ 2016-09-09 0:01 UTC
To: meta
This brings us closer to the behavior of mairix(1) for search
by supporting the n:, t:, c:, f:, tc:, tcf:, b:, and bs:
prefixes as documented in the mairix(1) manpage.
We also introduce q: and nq: prefixes for quoted and
non-quoted text, respectively.
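To make the prefix mapping concrete, here is a simplified pure-Perl sketch of how a prefixed query token relates to a term in the index. The table is copied from %prob_prefix in the Search.pm hunk of [PATCH 03/10] (n: is omitted because its term prefix does not appear there). The sketch ignores stemming, phrase handling and Xapian's QueryParser entirely, and `to_term` is a hypothetical helper, not public-inbox code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# User-facing prefix => Xapian term prefix, as in %prob_prefix
# from the Search.pm hunk of [PATCH 03/10].
my %prob_prefix = (
	s => 'S', m => 'Q', f => 'A', t => 'XTO', tc => 'XTC',
	c => 'XCC', tcf => 'XTCF', b => 'XBODY', bs => 'XBS',
	q => 'XQUOT', nq => 'XNQ',
);

# Hypothetical helper: turn one "pfx:word" token into the prefixed,
# lowercased term a TermGenerator-style indexer would have stored.
# Unknown or missing prefixes fall back to a plain lowercased term.
sub to_term {
	my ($token) = @_;
	if ($token =~ /\A([a-z]+):(.+)\z/ && exists $prob_prefix{$1}) {
		return $prob_prefix{$1} . lc($2);
	}
	return lc($token);
}

# prints XBODYhello, XBSsmith, XNQtheatre, x:foo (one per line)
print to_term($_), "\n" foreach qw(b:Hello bs:smith nq:theatre x:foo);
```

Because each user-facing prefix selects a separate term namespace, the same word can be matched in the body only (XBODY), in body plus subject (XBS), or only in quoted (XQUOT) or non-quoted (XNQ) text.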
There is a schema version change in [PATCH 7/10] to maintain
compatibility with Debian 7.x wheezy installs. The in-place
reindexing would have been expensive anyway, so the schema
bump is perhaps a good idea, as creating a fresh index
should be faster than --reindex.
Eric Wong (10):
search: allow searching user fields (To/Cc/From)
search: drop longer subject: prefix for search
search: more granular message body searching
search: fix space regressions from recent changes
search: match quote detection behavior of view
search: increase term positions for each quoted hunk
search: fix compatibility with Debian wheezy
search: avoid mindlessly calling body_set
search: match the behavior of WWW for indexing text
search: index attachment filenames
lib/PublicInbox/Search.pm | 32 +++++++++---
lib/PublicInbox/SearchIdx.pm | 104 ++++++++++++++++++++++++-------------
t/search.t | 120 ++++++++++++++++++++++++++++++++++++++++---
3 files changed, 206 insertions(+), 50 deletions(-)
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git