user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 10/10] search: index attachment filenames
Date: Fri,  9 Sep 2016 00:01:31 +0000
Message-ID: <20160909000131.18584-11-e@80x24.org>
In-Reply-To: <20160909000131.18584-1-e@80x24.org>

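Index attachment filenames under a new "XFN" term prefix,
searchable via "n:" or the default (unprefixed) search.
While we're at it, add a test to ensure searching inside
displayable attachment bodies works.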
---
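A minimal sketch of what the Search.pm change means for queries,
using only the two %prob_prefix entries touched below. The
term_prefixes() helper is hypothetical (it is not part of
public-inbox), and the sketch assumes each space-separated entry
is registered as a separate Xapian term prefix for the user-facing
name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Subset of %prob_prefix after this patch (values copied from the diff).
my %prob_prefix = (
	n  => 'XFN',
	'' => 'XMID S A XNQ XQUOT XFN',
);

# Hypothetical helper (not in public-inbox): list the Xapian term
# prefixes a user-facing search prefix would expand to, assuming each
# space-separated entry is registered separately.
sub term_prefixes {
	my ($user_prefix) = @_;
	my $spec = $prob_prefix{$user_prefix};
	return defined $spec ? split(/ /, $spec) : ();
}

for my $p ('n', '') {
	my $label = $p eq '' ? '(default)' : "${p}:";
	print "$label => ", join(', ', term_prefixes($p)), "\n";
}
```

Under that assumption, "n:part_deux.txt" would only consult XFN
terms, while an unprefixed "part_deux.txt" would consult XFN
alongside the existing default fields, which is what the new test
in t/search.t checks from the outside.
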
 lib/PublicInbox/Search.pm    |  3 ++-
 lib/PublicInbox/SearchIdx.pm |  4 ++++
 t/search.t                   | 44 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index ceee39a..0c05677 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -69,6 +69,7 @@ my %prob_prefix = (
 	tcf => 'XTO XCC A',
 	b => 'XNQ XQUOT',
 	bs => 'XNQ XQUOT S',
+	n => 'XFN',
 
 	# n.b.: leaving out "a:" alias for "tcf:" even though
 	# mairix supports it.  It is only mentioned in passing in mairix(1)
@@ -77,7 +78,7 @@ my %prob_prefix = (
 	nq => 'XNQ',
 
 	# default:
-	'' => 'XMID S A XNQ XQUOT',
+	'' => 'XMID S A XNQ XQUOT XFN',
 );
 
 # not documenting m: and mid: for now, the using the URLs works w/o Xapian
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index fb68f4b..23aef9f 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -181,6 +181,10 @@ sub add_message {
 		msg_iter($mime, sub {
 			my ($part, $depth, @idx) = @{$_[0]};
 			my $ct = $part->content_type || 'text/plain';
+			my $fn = $part->filename;
+			if (defined $fn && $fn ne '') {
+				$tg->index_text($fn, 1, 'XFN');
+			}
 
 			return if $ct =~ m!\btext/x?html\b!i;
 
diff --git a/t/search.t b/t/search.t
index bddb545..cce3b9e 100644
--- a/t/search.t
+++ b/t/search.t
@@ -386,6 +386,50 @@ sub filter_mids {
 	}
 }
 
+{
+	my $part1 = Email::MIME->create(
+                 attributes => {
+                     content_type => 'text/plain',
+                     disposition  => 'attachment',
+                     charset => 'US-ASCII',
+		     encoding => 'quoted-printable',
+		     filename => 'attached_fart.txt',
+                 },
+                 body_str => 'inside the attachment',
+	);
+	my $part2 = Email::MIME->create(
+                 attributes => {
+                     content_type => 'text/plain',
+                     disposition  => 'attachment',
+                     charset => 'US-ASCII',
+		     encoding => 'quoted-printable',
+		     filename => 'part_deux.txt',
+                 },
+                 body_str => 'inside another',
+	);
+	my $amsg = Email::MIME->create(
+		header_str => [
+			Subject => 'see attachment',
+			'Message-ID' => '<file@attached>',
+			From => 'John Smith <js@example.com>',
+			To => 'list@example.com',
+		],
+		parts => [ $part1, $part2 ],
+	);
+	ok($rw->add_message($amsg), 'added attachment');
+	$rw_commit->();
+	$ro->reopen;
+	my $n = $ro->query('n:attached_fart.txt');
+	is(scalar @{$n->{msgs}}, 1, 'got result for n:');
+	my $res = $ro->query('part_deux.txt');
+	is(scalar @{$res->{msgs}}, 1, 'got result without n:');
+	is($n->{msgs}->[0]->mid, $res->{msgs}->[0]->mid,
+		'same result with and without');
+	my $txt = $ro->query('"inside another"');
+	is($txt->{msgs}->[0]->mid, $res->{msgs}->[0]->mid,
+		'search inside text attachments works');
+}
+
 done_testing();
 
 1;
-- 
EW


Thread overview: 11+ messages
2016-09-09  0:01 [PATCH 0/10] search: more mairix prefix compatibility Eric Wong
2016-09-09  0:01 ` [PATCH 01/10] search: allow searching user fields (To/Cc/From) Eric Wong
2016-09-09  0:01 ` [PATCH 02/10] search: drop longer subject: prefix for search Eric Wong
2016-09-09  0:01 ` [PATCH 03/10] search: more granular message body searching Eric Wong
2016-09-09  0:01 ` [PATCH 04/10] search: fix space regressions from recent changes Eric Wong
2016-09-09  0:01 ` [PATCH 05/10] search: match quote detection behavior of view Eric Wong
2016-09-09  0:01 ` [PATCH 06/10] search: increase term positions for each quoted hunk Eric Wong
2016-09-09  0:01 ` [PATCH 07/10] search: fix compatibility with Debian wheezy Eric Wong
2016-09-09  0:01 ` [PATCH 08/10] search: avoid mindlessly calling body_set Eric Wong
2016-09-09  0:01 ` [PATCH 09/10] search: match the behavior of WWW for indexing text Eric Wong
2016-09-09  0:01 ` Eric Wong [this message]

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
