From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH] search: index byte size of a message for IMAP search
Date: Mon, 1 Jun 2020 22:10:35 +0000
Message-ID: <20200601221035.31273-1-e@yhbt.net>
Searching for messages smaller than a certain size is allowed by
offlineimap(1), mbsync(1), and possibly other tools. Maybe
public-inbox-watch will support it, too.
I don't see a reason to expose searching by size via WWW search
right now (but maybe in the future, I could be convinced to).
Note: we only store the byte size of the message as it exists
in git. That is typically LF-only, so it won't match the
size after CRLF conversion for NNTP or IMAP.
However, since most folks using tools like mbsync(1) and
offlineimap(1) would be on *nix systems where LF-only is
expected, I don't see the point of spending LoC or CPU cycles to
count bytes for CRLF on the wire.
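
To illustrate the size difference described above, here is a minimal
sketch that is not part of this patch. The sample message and the
variable names are made up for illustration; only core Perl is used.
It compares the LF-only byte count (what gets indexed) with the byte
count after CRLF conversion (what goes over the wire):

```perl
use strict;
use warnings;

# Hypothetical sample message, LF-only as it would typically be in git
my $lf = "From: a\@example.com\nSubject: hi\n\nbody line 1\nbody line 2\n";

# Size as indexed: bytes of the LF-only message
my $lf_bytes = length($lf);

# Size on the wire after converting bare LF to CRLF
(my $crlf = $lf) =~ s/(?<!\r)\n/\r\n/g;
my $crlf_bytes = length($crlf);

# The two sizes differ by one byte per line ending
my $nl = ($lf =~ tr/\n//);
printf "LF=%d CRLF=%d line endings=%d\n", $lf_bytes, $crlf_bytes, $nl;
```

So the indexed size undercounts the wire size by exactly the number of
lines, which is a small error for a "smaller than N bytes" filter.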
---
lib/PublicInbox/Search.pm | 12 ++++++++----
lib/PublicInbox/SearchIdx.pm | 2 ++
t/search.t | 6 ++++++
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index cb669e8733e..f2d3b92dc82 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -5,12 +5,16 @@
# Read-only search interface for use by the web and NNTP interfaces
package PublicInbox::Search;
use strict;
-use warnings;
# values for searching
-use constant TS => 0; # Received: header in Unix time
-use constant YYYYMMDD => 1; # Date: header for searching in the WWW UI
-use constant DT => 2; # Date: YYYYMMDDHHMMSS
+use constant {
+ TS => 0, # Received: header in Unix time (IMAP INTERNALDATE)
+ YYYYMMDD => 1, # Date: header for searching in the WWW UI
+ DT => 2, # Date: YYYYMMDDHHMMSS
+ BYTES => 3, # IMAP RFC822.SIZE
+ # TODO
+ # REPLYCNT => 4, # IMAP ANSWERED
+};
use PublicInbox::Smsg;
use PublicInbox::Over;
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index f10a9104e78..5c161b9accf 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -341,6 +341,7 @@ sub add_xapian ($$$$) {
add_val($doc, PublicInbox::Search::YYYYMMDD(), $yyyymmdd);
my $dt = strftime('%Y%m%d%H%M%S', @ds);
add_val($doc, PublicInbox::Search::DT(), $dt);
+ add_val($doc, PublicInbox::Search::BYTES(), $smsg->{bytes});
my $tg = term_generator($self);
$tg->set_document($doc);
@@ -388,6 +389,7 @@ sub add_message {
# v1 and tests only:
$smsg->populate($hdr, $self);
+ $smsg->{bytes} //= length($mime->as_string);
eval {
# order matters, overview stores every possible piece of
diff --git a/t/search.t b/t/search.t
index 6cf2bc2d6b4..cf3254169ca 100644
--- a/t/search.t
+++ b/t/search.t
@@ -318,6 +318,12 @@ $ibx->with_umask(sub {
foreach my $m ($mset->items) {
my $smsg = $ro->{over_ro}->get_art($m->get_docid);
like($smsg->{to}, qr/\blist\@example\.com\b/, 'to appears');
+ my $doc = $m->get_document;
+ my $col = PublicInbox::Search::BYTES();
+ my $bytes = PublicInbox::Smsg::get_val($doc, $col);
+ like($bytes, qr/\A[0-9]+\z/, '$bytes stored as digit');
+ ok($bytes > 0, '$bytes is > 0');
+ is($bytes, $smsg->{bytes}, 'bytes Xapian value matches Over');
}
$mset = $ro->query('tc:list@example.com', {mset => 1});