* [PATCH 18/26] lei_store: keyword extraction from mbox and Maildir
From: Eric Wong @ 2020-12-18 12:09 UTC
To: meta
Dovecot, mutt, and likely many other programs support the mbox
Status/X-Status headers. Ensure we have a way to extract these
headers as JMAP-compatible keywords before removing them for git
storage.
->add_eml now accepts setting keywords at import time,
and will probably be called like this:
$lst->add_eml($eml, $lst->mbox_keywords($eml));
$lst->add_eml($eml, $lst->maildir_keywords($fn));
---
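A minimal standalone sketch of the two mappings described above, for
readers without the rest of the series applied. It assumes plain
strings as input rather than a PublicInbox::Eml object, and the
function names status_to_keywords and maildir_to_keywords are
hypothetical; the patch itself implements these as methods on
LeiStore.

```perl
use strict;
use warnings;

# Hypothetical standalone helpers; the patch's versions are methods on
# PublicInbox::LeiStore and read headers from an Eml object instead.
my %status2kw = (F => 'flagged', A => 'answered', R => 'seen', T => 'draft');
my %c2kw = (D => 'draft', F => 'flagged', R => 'answered', S => 'seen');

# Takes raw Status/X-Status header values, returns sorted unique keywords.
sub status_to_keywords {
    my %kw;
    for my $hdr (grep { defined } @_) {
        $kw{$status2kw{$_}} = 1 for ($hdr =~ /([FART])/g);
    }
    return sort keys %kw;
}

# Takes a Maildir filename, returns sorted unique keywords from ":2,FLAGS".
sub maildir_to_keywords {
    my ($fn) = @_;
    return () unless $fn =~ /:2,([A-Za-z]*)\z/;
    # unknown flag characters (e.g. "Z") have no mapping and are dropped
    my %kw = map { $_ => 1 } grep { defined } map { $c2kw{$_} } split(//, $1);
    return sort keys %kw;
}

print join(',', status_to_keywords('RO', 'A')), "\n";
print join(',', maildir_to_keywords('/foo:2,RSZ')), "\n";
```

Both calls print "answered,seen", matching the expectations in
t/lei_store.t below. Unlike the patch, this sketch also deduplicates
repeated Maildir flag characters.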
lib/PublicInbox/LeiStore.pm | 23 ++++++++++++++++++++++-
t/lei_store.t | 14 ++++++++++++++
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index c95df785..553adbc8 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -162,8 +162,27 @@ sub remove_eml_keywords {
\@docids;
}
+# cf: https://doc.dovecot.org/configuration_manual/mail_location/mbox/
+my %status2kw = (F => 'flagged', A => 'answered', R => 'seen', T => 'draft');
+# O (old/non-recent), and D (deleted) aren't in JMAP,
+# so probably won't be supported by us.
+sub mbox_keywords {
+ my $eml = $_[-1];
+ my $s = "@{[$eml->header_raw('X-Status'),$eml->header_raw('Status')]}";
+ my %kw;
+ $s =~ s/([FART])/$kw{$status2kw{$1}} = 1/sge;
+ sort(keys %kw);
+}
+
+# cf: https://cr.yp.to/proto/maildir.html
+my %c2kw = ('D' => 'draft', F => 'flagged', R => 'answered', S => 'seen');
+sub maildir_keywords {
+ $_[-1] =~ /:2,([A-Z]+)\z/i ?
+ sort(map { $c2kw{$_} // () } split(//, $1)) : ();
+}
+
sub add_eml {
- my ($self, $eml) = @_;
+ my ($self, $eml, @kw) = @_;
my $eidx = eidx_init($self);
my $oidx = $eidx->{oidx};
my $smsg = bless { -oidx => $oidx }, 'PublicInbox::Smsg';
@@ -178,6 +197,7 @@ sub add_eml {
my $idx = $eidx->idx_shard($docid);
$oidx->add_xref3($docid, -1, $smsg->{blob}, '.');
$idx->shard_add_eidx_info($docid, '.', $eml); # List-Id
+ $idx->shard_add_keywords($docid, @kw) if @kw;
}
} else {
$smsg->{num} = $oidx->adj_counter('eidx_docid', '+');
@@ -185,6 +205,7 @@ sub add_eml {
$oidx->add_xref3($smsg->{num}, -1, $smsg->{blob}, '.');
my $idx = $eidx->idx_shard($smsg->{num});
$idx->index_raw($msgref, $eml, $smsg);
+ $idx->shard_add_keywords($smsg->{num}, @kw) if @kw;
}
$smsg->{blob}
}
diff --git a/t/lei_store.t b/t/lei_store.t
index c18a9620..03ab5af6 100644
--- a/t/lei_store.t
+++ b/t/lei_store.t
@@ -19,6 +19,20 @@ like($oid, qr/\A[0-9a-f]+\z/, 'add returned OID');
my $eml = eml_load('t/data/0001.patch');
is($lst->add_eml($eml), undef, 'idempotent');
$lst->done;
+is_deeply([$lst->mbox_keywords($eml)], [], 'no keywords');
+$eml->header_set('Status', 'RO');
+is_deeply([$lst->mbox_keywords($eml)], ['seen'], 'seen extracted');
+$eml->header_set('X-Status', 'A');
+is_deeply([$lst->mbox_keywords($eml)], [qw(answered seen)],
+ 'seen+answered extracted');
+$eml->header_set($_) for qw(Status X-Status);
+
+is_deeply([$lst->maildir_keywords('/foo:2,')], [], 'Maildir no keywords');
+is_deeply([$lst->maildir_keywords('/foo:2,S')], ['seen'], 'Maildir seen');
+is_deeply([$lst->maildir_keywords('/foo:2,RS')], ['answered', 'seen'],
+ 'Maildir answered + seen');
+is_deeply([$lst->maildir_keywords('/foo:2,RSZ')], ['answered', 'seen'],
+ 'Maildir answered + seen w/o Z');
{
my $es = $lst->search;
my $msgs = $es->over->query_xover(0, 1000);
* [PATCH 00/26] lei: basic UI + IPC work
From: Eric Wong @ 2020-12-18 12:09 UTC
To: meta
Some work on the storage side, but MiscIdx still needs work to
handle existing publicinboxes, extinboxes (over HTTP(S)), and
other config things.
PATCH 22/26 - bash completion sort of works, but filename
completion gets broken. I'm not sure why, and help would be
greatly appreciated (along with help for other shells).
I don't know bash-specific features well at all, and know even
less about other non-POSIX shells.
Somewhat nice UI things (at least to my delirious sleep-deprived
state):
* -$DIGIT option parsing works (e.g. "git log -10",
"kill -9")
* help-based CLI arg/prototype checking seems to be working
and hopefully cuts down on long-term maintenance work
while promoting UI consistency
* having IO::FDPass hides startup time; 20-30ms isn't
really noticeable for humans on interactive terminals,
but it's still not ideal for loops.
* lei.sh + "make symlink-install"
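To illustrate the -$DIGIT item above, here is one possible way to
support such switches on top of stock Getopt::Long. This is only a
sketch and not necessarily how the series does it: the parse_args
helper and the "limit" option name are invented for the example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: rewrite a bare "-NUMBER" switch into "--limit=NUMBER"
# so an ordinary Getopt::Long spec can handle it.  "limit" is an invented
# option name used only for this example.
sub parse_args {
    my @argv = @_;
    for (@argv) {
        last if $_ eq '--'; # leave everything after "--" untouched
        s/\A-([0-9]+)\z/--limit=$1/;
    }
    my %opt;
    GetOptionsFromArray(\@argv, \%opt, 'limit|n=i', 'verbose|v')
        or die "bad usage\n";
    return (\%opt, \@argv);
}

my ($opt, $rest) = parse_args(qw(-10 -v foo));
print "limit=$opt->{limit} rest=@$rest\n";
```

One caveat of rewriting arguments up front: a negative number given
as the value of another option (e.g. "--offset -5") would also be
rewritten, so a real implementation needs to account for that.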
And some internal improvements:
* several simplifications to existing Search code,
->xdb_shards_flat will come in handy
* generic OnDestroy - long overdue
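For readers unfamiliar with the OnDestroy idea (a "localized END", as
the patch subject puts it), the general scope-guard technique in Perl
looks roughly like this. The package name My::ScopeGuard is
hypothetical and this is not the code from lib/PublicInbox/OnDestroy.pm;
it only shows the pattern of running a callback when an object is
freed.

```perl
use strict;
use warnings;

package My::ScopeGuard; # hypothetical name, not the module from this series

# new($cb, @args): remember a callback to run when the object is freed
sub new {
    my ($class, $cb, @args) = @_;
    return bless [ $cb, @args ], $class;
}

# Perl calls DESTROY when the last reference to the object goes away
sub DESTROY {
    my ($cb, @args) = @{$_[0]};
    $cb->(@args) if $cb;
}

package main;

my @log;
{
    my $guard = My::ScopeGuard->new(sub { push @log, "cleanup(@_)" }, 'tmp');
    push @log, 'working';
} # $guard goes out of scope here and the callback fires
print join(' -> ', @log), "\n";
```

Because Perl uses reference counting, the callback runs as soon as
the enclosing block exits, whether normally or via die.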
Eric Wong (26):
lei: FD-passing and IPC basics
lei: proposed command-listing and options
lei_store: local storage for Local Email Interface
tests: more common JSON module loading
lei: use spawn (vfork + execve) for lazy start
lei: refine help/option parsing, implement "init"
t/lei-oneshot: standalone oneshot (non-socket) test
lei: ensure we run a restrictive umask
lei: support `daemon-env' for modifying long-lived env
lei_store: simplify git_epoch_max, slightly
search: simplify initialization, add ->xdb_shards_flat
rename LeiDaemon package to PublicInbox::LEI
lei: support pass-through for `lei config'
lei: help: show actual paths being operated on
lei: rename $client => $self and bless
lei: micro-optimize startup time
lei_store: relax GIT_COMMITTER_IDENT check
lei_store: keyword extraction from mbox and Maildir
on_destroy: generic localized END
lei: restore default __DIE__ handler for event loop
lei: drop $SIG{__DIE__}, add oneshot fallbacks
lei: start working on bash completion
build: add lei.sh + "make symlink-install" target
lei: support for -$DIGIT and -$SIG CLI switches
lei: revise output routines
lei: extinbox: start implementing in config file
MANIFEST | 11 +
Makefile.PL | 11 +
contrib/completion/lei-completion.bash | 11 +
lei.sh | 7 +
lib/PublicInbox/Daemon.pm | 6 +-
lib/PublicInbox/ExtSearch.pm | 10 +-
lib/PublicInbox/ExtSearchIdx.pm | 35 +-
lib/PublicInbox/Import.pm | 4 +
lib/PublicInbox/LEI.pm | 776 +++++++++++++++++++++++++
lib/PublicInbox/LeiExtinbox.pm | 52 ++
lib/PublicInbox/LeiSearch.pm | 39 ++
lib/PublicInbox/LeiStore.pm | 227 ++++++++
lib/PublicInbox/ManifestJsGz.pm | 2 +-
lib/PublicInbox/OnDestroy.pm | 16 +
lib/PublicInbox/OverIdx.pm | 10 +
lib/PublicInbox/Search.pm | 65 +--
lib/PublicInbox/SearchIdx.pm | 62 +-
lib/PublicInbox/SearchIdxShard.pm | 33 ++
lib/PublicInbox/TestCommon.pm | 7 +-
lib/PublicInbox/V2Writable.pm | 10 +-
script/lei | 76 +++
t/extsearch.t | 3 +-
t/lei-oneshot.t | 25 +
t/lei.t | 306 ++++++++++
t/lei_store.t | 88 +++
t/on_destroy.t | 25 +
t/www_listing.t | 8 +-
27 files changed, 1843 insertions(+), 82 deletions(-)
create mode 100644 contrib/completion/lei-completion.bash
create mode 100755 lei.sh
create mode 100644 lib/PublicInbox/LEI.pm
create mode 100644 lib/PublicInbox/LeiExtinbox.pm
create mode 100644 lib/PublicInbox/LeiSearch.pm
create mode 100644 lib/PublicInbox/LeiStore.pm
create mode 100644 lib/PublicInbox/OnDestroy.pm
create mode 100755 script/lei
create mode 100644 t/lei-oneshot.t
create mode 100644 t/lei.t
create mode 100644 t/lei_store.t
create mode 100644 t/on_destroy.t
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git