user/dev discussion of public-inbox itself
* [PATCH 18/26] lei_store: keyword extraction from mbox and Maildir
  2020-12-18 12:09  6% [PATCH 00/26] lei: basic UI + IPC work Eric Wong
@ 2020-12-18 12:09  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2020-12-18 12:09 UTC (permalink / raw)
  To: meta

Dovecot, mutt, and likely many other programs support the mbox
Status/X-Status headers.  Ensure we have a way to extract these
headers as JMAP-compatible keywords before removing them for git
storage.

->add_eml now accepts setting keywords at import time,
and will probably be called like this:

	$lst->add_eml($eml, $lst->mbox_keywords($eml));
	$lst->add_eml($eml, $lst->maildir_keywords($fn));
---
 lib/PublicInbox/LeiStore.pm | 23 ++++++++++++++++++++++-
 t/lei_store.t               | 14 ++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiStore.pm b/lib/PublicInbox/LeiStore.pm
index c95df785..553adbc8 100644
--- a/lib/PublicInbox/LeiStore.pm
+++ b/lib/PublicInbox/LeiStore.pm
@@ -162,8 +162,27 @@ sub remove_eml_keywords {
 	\@docids;
 }
 
+# cf: https://doc.dovecot.org/configuration_manual/mail_location/mbox/
+my %status2kw = (F => 'flagged', A => 'answered', R => 'seen', T => 'draft');
+# O (old/non-recent), and D (deleted) aren't in JMAP,
+# so probably won't be supported by us.
+sub mbox_keywords {
+	my $eml = $_[-1];
+	my $s = "@{[$eml->header_raw('X-Status'),$eml->header_raw('Status')]}";
+	my %kw;
+	$s =~ s/([FART])/$kw{$status2kw{$1}} = 1/sge;
+	sort(keys %kw);
+}
+
+# cf: https://cr.yp.to/proto/maildir.html
+my %c2kw = ('D' => 'draft', F => 'flagged', R => 'answered', S => 'seen');
+sub maildir_keywords {
+	$_[-1] =~ /:2,([A-Z]+)\z/i ?
+		sort(map { $c2kw{$_} // () } split(//, $1)) : ();
+}
+
 sub add_eml {
-	my ($self, $eml) = @_;
+	my ($self, $eml, @kw) = @_;
 	my $eidx = eidx_init($self);
 	my $oidx = $eidx->{oidx};
 	my $smsg = bless { -oidx => $oidx }, 'PublicInbox::Smsg';
@@ -178,6 +197,7 @@ sub add_eml {
 			my $idx = $eidx->idx_shard($docid);
 			$oidx->add_xref3($docid, -1, $smsg->{blob}, '.');
 			$idx->shard_add_eidx_info($docid, '.', $eml); # List-Id
+			$idx->shard_add_keywords($docid, @kw) if @kw;
 		}
 	} else {
 		$smsg->{num} = $oidx->adj_counter('eidx_docid', '+');
@@ -185,6 +205,7 @@ sub add_eml {
 		$oidx->add_xref3($smsg->{num}, -1, $smsg->{blob}, '.');
 		my $idx = $eidx->idx_shard($smsg->{num});
 		$idx->index_raw($msgref, $eml, $smsg);
+		$idx->shard_add_keywords($smsg->{num}, @kw) if @kw;
 	}
 	$smsg->{blob}
 }
diff --git a/t/lei_store.t b/t/lei_store.t
index c18a9620..03ab5af6 100644
--- a/t/lei_store.t
+++ b/t/lei_store.t
@@ -19,6 +19,20 @@ like($oid, qr/\A[0-9a-f]+\z/, 'add returned OID');
 my $eml = eml_load('t/data/0001.patch');
 is($lst->add_eml($eml), undef, 'idempotent');
 $lst->done;
+is_deeply([$lst->mbox_keywords($eml)], [], 'no keywords');
+$eml->header_set('Status', 'RO');
+is_deeply([$lst->mbox_keywords($eml)], ['seen'], 'seen extracted');
+$eml->header_set('X-Status', 'A');
+is_deeply([$lst->mbox_keywords($eml)], [qw(answered seen)],
+	'seen+answered extracted');
+$eml->header_set($_) for qw(Status X-Status);
+
+is_deeply([$lst->maildir_keywords('/foo:2,')], [], 'Maildir no keywords');
+is_deeply([$lst->maildir_keywords('/foo:2,S')], ['seen'], 'Maildir seen');
+is_deeply([$lst->maildir_keywords('/foo:2,RS')], ['answered', 'seen'],
+	'Maildir answered + seen');
+is_deeply([$lst->maildir_keywords('/foo:2,RSZ')], ['answered', 'seen'],
+	'Maildir answered + seen w/o Z');
 {
 	my $es = $lst->search;
 	my $msgs = $es->over->query_xover(0, 1000);

^ permalink raw reply related	[relevance 7%]
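
The two mappings in the patch above can be tried outside of LeiStore with
a minimal standalone sketch.  The function names below are hypothetical
and take plain strings rather than a PublicInbox::Eml object; only the two
lookup tables are copied from the patch.  Unlike the patch, the Maildir
variant also deduplicates repeated flag letters.

```perl
use strict;
use warnings;

# Lookup tables copied from the patch; everything else here is an
# illustrative assumption, not the LeiStore API.
my %status2kw = (F => 'flagged', A => 'answered', R => 'seen', T => 'draft');
my %c2kw = (D => 'draft', F => 'flagged', R => 'answered', S => 'seen');

# $status holds the joined values of the Status and X-Status headers.
# Letters outside of F, A, R, T (e.g. O and D) are ignored.
sub mbox_status_to_keywords {
	my ($status) = @_;
	my %kw;
	$kw{$status2kw{$_}} = 1 for ($status =~ /([FART])/g);
	return sort keys %kw;
}

# $fn is a Maildir filename; flags follow the ":2," info separator.
# Unknown flag letters (e.g. Z) are dropped.
sub maildir_name_to_keywords {
	my ($fn) = @_;
	return () unless $fn =~ /:2,([A-Za-z]*)\z/;
	my %kw = map { $_ => 1 } grep { defined $_ }
		map { $c2kw{$_} } split(//, $1);
	return sort keys %kw;
}

print join(',', mbox_status_to_keywords('RO A')), "\n";
print join(',', maildir_name_to_keywords('/foo:2,RSZ')), "\n";
```

Both calls print "answered,seen", matching the expectations in the
t/lei_store.t hunk above.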

* [PATCH 00/26] lei: basic UI + IPC work
@ 2020-12-18 12:09  6% Eric Wong
  2020-12-18 12:09  7% ` [PATCH 18/26] lei_store: keyword extraction from mbox and Maildir Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2020-12-18 12:09 UTC (permalink / raw)
  To: meta

This series includes some work on the storage side, but MiscIdx
still needs work to handle existing publicinboxes, extinboxes
(over HTTP(S)), and other configuration.

PATCH 22/26: bash completion mostly works, but filename
completion is broken.  I'm not sure why, and help would be
greatly appreciated (along with help for other shells).
I don't know bash-specific features well, and know even
less about other non-POSIX shells.

Somewhat nice UI things (at least to my delirious sleep-deprived
state):

* -$DIGIT option parsing works (e.g. "git log -10",
  "kill -9")

* help-based CLI arg/prototype checking seems to be working
  and hopefully cuts down on long-term maintenance work
  while promoting UI consistency

* having IO::FDPass hides startup time; 20-30ms isn't
  really noticeable for humans on interactive terminals,
  but is still not ideal for loops.

* lei.sh + "make symlink-install"
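
One way to support a -$DIGIT switch is to pull it out of the argument
list before Getopt::Long sees it.  The cover letter does not say how lei
does this, so the following is only a sketch under that assumption;
extract_digit_opt is a hypothetical helper and not part of lei.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: remove any "-NUMBER" switch from @$argv and return
# the number (undef if none was given).  The last one wins.
sub extract_digit_opt {
	my ($argv) = @_;
	my ($n, @rest);
	while (defined(my $arg = shift @$argv)) {
		if ($arg eq '--') { # keep everything after "--" untouched
			push @rest, $arg, @$argv;
			@$argv = ();
		} elsif ($arg =~ /\A-([0-9]+)\z/) {
			$n = $1;
		} else {
			push @rest, $arg;
		}
	}
	@$argv = @rest;
	return $n;
}

my @argv = qw(-10 --verbose query);
my $limit = extract_digit_opt(\@argv);
GetOptionsFromArray(\@argv, 'verbose' => \(my $verbose))
	or die "bad options\n";
print "limit=$limit verbose=$verbose args=@argv\n";
```

A caveat of this approach: an option which takes a negative number as a
separate argument would have its value consumed by the helper, so a real
implementation would need to know which switches take values.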

And some internal improvements:

* several simplifications to existing Search code,
  ->xdb_shards_flat will come in handy

* generic OnDestroy - long overdue
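
For readers unfamiliar with the idea, a generic destroy hook can be
sketched as a blessed array holding a callback which runs when the object
goes out of scope.  My::ScopeGuard below is a hypothetical name used for
illustration; it is not the PublicInbox::OnDestroy code from this series.

```perl
use strict;
use warnings;

# Hypothetical scope guard, for illustration only
package My::ScopeGuard;

sub new {
	my ($class, $cb, @args) = @_;
	bless [ $cb, @args ], $class;
}

# Perl calls DESTROY once the last reference to the object goes away
sub DESTROY {
	my ($cb, @args) = @{$_[0]};
	$cb->(@args) if $cb;
}

package main;

my @log;
{
	my $guard = My::ScopeGuard->new(sub { push @log, "cleanup: @_" },
					'tmpdir');
	push @log, 'working';
} # $guard goes out of scope here and the callback fires
push @log, 'after block';
print "$_\n" for @log;
```

The callback runs at the closing brace of the block, so this acts like
an END block scoped to a lexical variable rather than to the whole
process.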

Eric Wong (26):
  lei: FD-passing and IPC basics
  lei: proposed command-listing and options
  lei_store: local storage for Local Email Interface
  tests: more common JSON module loading
  lei: use spawn (vfork + execve) for lazy start
  lei: refine help/option parsing, implement "init"
  t/lei-oneshot: standalone oneshot (non-socket) test
  lei: ensure we run a restrictive umask
  lei: support `daemon-env' for modifying long-lived env
  lei_store: simplify git_epoch_max, slightly
  search: simplify initialization, add ->xdb_shards_flat
  rename LeiDaemon package to PublicInbox::LEI
  lei: support pass-through for `lei config'
  lei: help: show actual paths being operated on
  lei: rename $client => $self and bless
  lei: micro-optimize startup time
  lei_store: relax GIT_COMMITTER_IDENT check
  lei_store: keyword extraction from mbox and Maildir
  on_destroy: generic localized END
  lei: restore default __DIE__ handler for event loop
  lei: drop $SIG{__DIE__}, add oneshot fallbacks
  lei: start working on bash completion
  build: add lei.sh + "make symlink-install" target
  lei: support for -$DIGIT and -$SIG CLI switches
  lei: revise output routines
  lei: extinbox: start implementing in config file

 MANIFEST                               |  11 +
 Makefile.PL                            |  11 +
 contrib/completion/lei-completion.bash |  11 +
 lei.sh                                 |   7 +
 lib/PublicInbox/Daemon.pm              |   6 +-
 lib/PublicInbox/ExtSearch.pm           |  10 +-
 lib/PublicInbox/ExtSearchIdx.pm        |  35 +-
 lib/PublicInbox/Import.pm              |   4 +
 lib/PublicInbox/LEI.pm                 | 776 +++++++++++++++++++++++++
 lib/PublicInbox/LeiExtinbox.pm         |  52 ++
 lib/PublicInbox/LeiSearch.pm           |  39 ++
 lib/PublicInbox/LeiStore.pm            | 227 ++++++++
 lib/PublicInbox/ManifestJsGz.pm        |   2 +-
 lib/PublicInbox/OnDestroy.pm           |  16 +
 lib/PublicInbox/OverIdx.pm             |  10 +
 lib/PublicInbox/Search.pm              |  65 +--
 lib/PublicInbox/SearchIdx.pm           |  62 +-
 lib/PublicInbox/SearchIdxShard.pm      |  33 ++
 lib/PublicInbox/TestCommon.pm          |   7 +-
 lib/PublicInbox/V2Writable.pm          |  10 +-
 script/lei                             |  76 +++
 t/extsearch.t                          |   3 +-
 t/lei-oneshot.t                        |  25 +
 t/lei.t                                | 306 ++++++++++
 t/lei_store.t                          |  88 +++
 t/on_destroy.t                         |  25 +
 t/www_listing.t                        |   8 +-
 27 files changed, 1843 insertions(+), 82 deletions(-)
 create mode 100644 contrib/completion/lei-completion.bash
 create mode 100755 lei.sh
 create mode 100644 lib/PublicInbox/LEI.pm
 create mode 100644 lib/PublicInbox/LeiExtinbox.pm
 create mode 100644 lib/PublicInbox/LeiSearch.pm
 create mode 100644 lib/PublicInbox/LeiStore.pm
 create mode 100644 lib/PublicInbox/OnDestroy.pm
 create mode 100755 script/lei
 create mode 100644 t/lei-oneshot.t
 create mode 100644 t/lei.t
 create mode 100644 t/lei_store.t
 create mode 100644 t/on_destroy.t

^ permalink raw reply	[relevance 6%]

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
