* [PATCH 00/36] another round of lei stuff
@ 2020-12-31 13:51 7% Eric Wong
From: Eric Wong @ 2020-12-31 13:51 UTC (permalink / raw)
To: meta
This is against the lei branch @ commit
0c8106d44f317175e122744b43407bf067183175 in
https://public-inbox.org/public-inbox.git

Infrastructure for reading + writing local Maildirs and a
bunch of mbox formats is done (including gz/bz2/xz support),
and its usage should be familiar to mairix(1) users.

Infrastructure for deduplication + augmenting search results
is in place and tested.

Going to skip MH and MMDF for now.  IMAP/JMAP might happen
sooner, but deduplication needs low latency.

"extinbox" is renamed to "external".

Basic infrastructure like PublicInbox::IPC and SharedKV
should've been done and in use ages ago... I look forward to
using them, at least.

Some DS safety fixes, since lei will use it in stranger ways
than current code does.

Bad enough we have messages with duplicate Message-IDs; lei will
need to deal with Unsent/Drafts messages w/o Message-IDs at all!

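To illustrate the Message-ID problem above, here is a minimal sketch of a
dedupe key that falls back to a body digest when a message has no
Message-ID.  This is only an assumption for illustration, not necessarily
one of the strategies LeiDedupe implements; dedupe_key is a hypothetical
helper and only core modules are used.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper (not part of this series): returns a dedupe key
# for a raw RFC 5322 message given as a string.
sub dedupe_key {
	my ($raw) = @_;
	my ($hdr, $body) = split(/\r?\n\r?\n/, $raw, 2);
	$hdr =~ s/\r?\n[ \t]+/ /g; # unfold continuation lines
	if ($hdr =~ /^Message-ID:[ \t]*<([^>]+)>/mi) {
		return "mid:$1";
	}
	# no Message-ID (e.g. Unsent/Drafts): key on the body instead
	'sha256:' . sha256_hex($body // '');
}

my %seen;
my @msgs = (
	"Message-ID: <a\@example>\nSubject: x\n\nhello\n",
	"Message-ID: <a\@example>\nSubject: x (dup)\n\nhello again\n",
	"Subject: draft\n\nunsent text\n",
	"Subject: draft copy\n\nunsent text\n",
);
my @uniq = grep { !$seen{dedupe_key($_)}++ } @msgs;
print scalar(@uniq), " unique of ", scalar(@msgs), "\n";
```

Keying on Message-ID alone drops the second message even though its
content differs, which is exactly the duplicate Message-ID hazard
mentioned above; a stricter key would also mix in a content digest.
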
Eric Wong (36):
import: respect init.defaultBranch
lei_store: use per-machine refname as git HEAD
revert "lei_store: use per-machine refname as git HEAD"
lei_to_mail: initial implementation for writing mbox formats
sharedkv: fork()-friendly key-value store
sharedkv: split out index_values
lei_to_mail: start atomic and compressed mbox writing
mboxreader: new class for reading various mbox formats
lei_to_mail: start --augment, dedupe, bz2 and xz
lei: implement various deduplication strategies
lei_to_mail: lazy-require LeiDedupe
lei_to_mail: support for non-seekable outputs
lei_to_mail: support Maildir, fix+test --augment
ipc: generic IPC dispatch based on Storable
ipc: support Sereal
lei_store: add ->set_eml, ->add_eml can return smsg
lei: rename "extinbox" => "external"
mid: use defined-or with `push' for uniqueness check
mid: hoist out mids_in sub
lei_store: handle messages without Message-ID at all
ipc: use shutdown(2), base atfork* callback
lei_to_mail: unlink mboxes if not augmenting
lei: add --mfolder as an option
spawn: move run_die here from PublicInbox::Import
init: remove embedded UnlinkMe package
t/run.perl: avoid uninitialized var on incomplete test
gcf2client: reap process on DESTROY
lei_to_mail: open FIFOs O_WRONLY so we block
searchidxshard: call DS->Reset at worker start
t/ipc.t: test for references via `die'
use PublicInbox::DS for dwaitpid
syscall: SFD_NONBLOCK can be a constant, again
lei: avoid Spawn package when starting daemon
avoid calling waitpid from children in DESTROY
ds: clobber $in_loop first at reset
on_destroy: support PID owner guard
MANIFEST | 12 +-
lib/PublicInbox/DS.pm | 42 +-
lib/PublicInbox/DSKQXS.pm | 4 +-
lib/PublicInbox/Daemon.pm | 4 +-
lib/PublicInbox/Gcf2Client.pm | 18 +-
lib/PublicInbox/Git.pm | 7 +-
lib/PublicInbox/IPC.pm | 165 ++++++++
lib/PublicInbox/Import.pm | 36 +-
lib/PublicInbox/LEI.pm | 44 +--
lib/PublicInbox/LeiDedupe.pm | 100 +++++
.../{LeiExtinbox.pm => LeiExternal.pm} | 18 +-
lib/PublicInbox/LeiStore.pm | 32 +-
lib/PublicInbox/LeiToMail.pm | 361 ++++++++++++++++++
lib/PublicInbox/LeiXSearch.pm | 2 +-
lib/PublicInbox/Lock.pm | 17 +-
lib/PublicInbox/MID.pm | 15 +-
lib/PublicInbox/MboxReader.pm | 127 ++++++
lib/PublicInbox/OnDestroy.pm | 5 +
lib/PublicInbox/OverIdx.pm | 2 +
lib/PublicInbox/ProcessPipe.pm | 34 +-
lib/PublicInbox/Qspawn.pm | 43 +--
lib/PublicInbox/SearchIdxShard.pm | 1 +
lib/PublicInbox/SharedKV.pm | 148 +++++++
lib/PublicInbox/Sigfd.pm | 4 +-
lib/PublicInbox/Smsg.pm | 6 +-
lib/PublicInbox/Spawn.pm | 9 +-
lib/PublicInbox/Syscall.pm | 4 +-
lib/PublicInbox/TestCommon.pm | 25 +-
lib/PublicInbox/V2Writable.pm | 10 +-
script/lei | 17 +-
script/public-inbox-init | 32 +-
script/public-inbox-watch | 4 +-
t/convert-compact.t | 4 +-
t/index-git-times.t | 3 +-
t/ipc.t | 80 ++++
t/lei.t | 22 +-
t/lei_dedupe.t | 59 +++
t/lei_store.t | 47 ++-
t/lei_to_mail.t | 246 ++++++++++++
t/lei_xsearch.t | 2 +-
t/mbox_reader.t | 75 ++++
t/on_destroy.t | 9 +
t/plack.t | 4 +-
t/run.perl | 3 +-
t/shared_kv.t | 58 +++
t/sigfd.t | 6 +-
46 files changed, 1755 insertions(+), 211 deletions(-)
create mode 100644 lib/PublicInbox/IPC.pm
create mode 100644 lib/PublicInbox/LeiDedupe.pm
rename lib/PublicInbox/{LeiExtinbox.pm => LeiExternal.pm} (75%)
create mode 100644 lib/PublicInbox/LeiToMail.pm
create mode 100644 lib/PublicInbox/MboxReader.pm
create mode 100644 lib/PublicInbox/SharedKV.pm
create mode 100644 t/ipc.t
create mode 100644 t/lei_dedupe.t
create mode 100644 t/lei_to_mail.t
create mode 100644 t/mbox_reader.t
create mode 100644 t/shared_kv.t
* [PATCH 04/36] lei_to_mail: initial implementation for writing mbox formats
2020-12-31 13:51 7% [PATCH 00/36] another round of lei stuff Eric Wong
@ 2020-12-31 13:51 5% ` Eric Wong
From: Eric Wong @ 2020-12-31 13:51 UTC (permalink / raw)
To: meta
No Maildir support yet, but it'll come.
---
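For reviewers: a standalone sketch of how keywords collapse into the
Status and X-Status headers in the patch below.  The mapping is copied
from %kw2status; status_headers is a hypothetical helper written only for
this note, and it silently skips unsupported keywords where the patch
warns.

```perl
use strict;
use warnings;

# Same mapping as %kw2status in the patch
my %kw2status = (
	flagged  => [ 'X-Status' => 'F' ],
	answered => [ 'X-Status' => 'A' ],
	seen     => [ 'Status'   => 'R' ],
	draft    => [ 'X-Status' => 'T' ],
);

# Hypothetical helper: returns (header name => flag string) pairs
sub status_headers {
	my (@kw) = @_;
	my %hdr;
	for my $k (@kw) {
		my $ent = $kw2status{$k} or next; # unsupported: skipped here
		push @{$hdr{$ent->[0]}}, $ent->[1];
	}
	# flags are sorted so the header value is stable regardless of
	# the order keywords were given in
	map { ($_, join('', sort @{$hdr{$_}})) } sort keys %hdr;
}

my %h = status_headers(qw(seen answered flagged));
print "$_: $h{$_}\n" for sort keys %h;
# Status: R
# X-Status: AF
```
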
MANIFEST | 2 +
lib/PublicInbox/LeiToMail.pm | 109 +++++++++++++++++++++++++++++++++++
t/lei_to_mail.t | 65 +++++++++++++++++++++
3 files changed, 176 insertions(+)
create mode 100644 lib/PublicInbox/LeiToMail.pm
create mode 100644 t/lei_to_mail.t
diff --git a/MANIFEST b/MANIFEST
index a5ff81cf..12b67e95 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -165,6 +165,7 @@ lib/PublicInbox/LEI.pm
lib/PublicInbox/LeiExtinbox.pm
lib/PublicInbox/LeiSearch.pm
lib/PublicInbox/LeiStore.pm
+lib/PublicInbox/LeiToMail.pm
lib/PublicInbox/LeiXSearch.pm
lib/PublicInbox/Linkify.pm
lib/PublicInbox/Listener.pm
@@ -328,6 +329,7 @@ t/kqnotify.t
t/lei-oneshot.t
t/lei.t
t/lei_store.t
+t/lei_to_mail.t
t/lei_xsearch.t
t/linkify.t
t/main-bin/spamc
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
new file mode 100644
index 00000000..b0d4b664
--- /dev/null
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -0,0 +1,109 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# Writes PublicInbox::Eml objects atomically to a mbox variant or Maildir
+package PublicInbox::LeiToMail;
+use strict;
+use v5.10.1;
+use PublicInbox::Eml;
+
+my %kw2char = ( # Maildir characters
+ draft => 'D',
+ flagged => 'F',
+ answered => 'R',
+ seen => 'S'
+);
+
+my %kw2status = (
+ flagged => [ 'X-Status' => 'F' ],
+ answered => [ 'X-Status' => 'A' ],
+ seen => [ 'Status' => 'R' ],
+ draft => [ 'X-Status' => 'T' ],
+);
+
+sub _mbox_hdr_buf ($$$) {
+ my ($eml, $type, $kw) = @_;
+ $eml->header_set($_) for (qw(Lines Bytes Content-Length));
+ my %hdr; # set Status, X-Status
+ for my $k (@$kw) {
+ if (my $ent = $kw2status{$k}) {
+ push @{$hdr{$ent->[0]}}, $ent->[1];
+ } else { # X-Label?
+ warn "TODO: keyword `$k' not supported for mbox\n";
+ }
+ }
+ while (my ($name, $chars) = each %hdr) {
+ $eml->header_set($name, join('', sort @$chars));
+ }
+ my $buf = delete $eml->{hdr};
+
+ # fixup old bug from import (pre-a0c07cba0e5d8b6a)
+ $$buf =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
+
+ substr($$buf, 0, 0, # prepend From line
+ "From lei\@$type Thu Jan 1 00:00:00 1970$eml->{crlf}");
+ $buf;
+}
+
+sub write_in_full_atomic ($$) {
+ my ($fh, $buf) = @_;
+ defined(my $w = syswrite($fh, $$buf)) or die "write: $!";
+ $w == length($$buf) or die "short write: $w != ".length($$buf);
+}
+
+sub eml2mboxrd ($;$) {
+ my ($eml, $kw) = @_;
+ my $buf = _mbox_hdr_buf($eml, 'mboxrd', $kw);
+ if (my $bdy = delete $eml->{bdy}) {
+ $$bdy =~ s/^(>*From )/>$1/gm;
+ $$buf .= $eml->{crlf};
+ substr($$bdy, 0, 0, $$buf); # prepend header
+ $buf = $bdy;
+ }
+ $$buf .= $eml->{crlf};
+ $buf;
+}
+
+sub eml2mboxo {
+ my ($eml, $kw) = @_;
+ my $buf = _mbox_hdr_buf($eml, 'mboxo', $kw);
+ if (my $bdy = delete $eml->{bdy}) {
+ $$bdy =~ s/^From />From /gm;
+ $$buf .= $eml->{crlf};
+ substr($$bdy, 0, 0, $$buf); # prepend header
+ $buf = $bdy;
+ }
+ $$buf .= $eml->{crlf};
+ $buf;
+}
+
+# mboxcl still escapes "From " lines
+sub eml2mboxcl {
+ my ($eml, $kw) = @_;
+ my $buf = _mbox_hdr_buf($eml, 'mboxcl', $kw);
+ my $crlf = $eml->{crlf};
+ if (my $bdy = delete $eml->{bdy}) {
+ $$bdy =~ s/^From />From /gm;
+ $$buf .= 'Content-Length: '.length($$bdy).$crlf.$crlf;
+ substr($$bdy, 0, 0, $$buf); # prepend header
+ $buf = $bdy;
+ }
+ $$buf .= $crlf;
+ $buf;
+}
+
+# mboxcl2 has no "From " escaping
+sub eml2mboxcl2 {
+ my ($eml, $kw) = @_;
+ my $buf = _mbox_hdr_buf($eml, 'mboxcl2', $kw);
+ my $crlf = $eml->{crlf};
+ if (my $bdy = delete $eml->{bdy}) {
+ $$buf .= 'Content-Length: '.length($$bdy).$crlf.$crlf;
+ substr($$bdy, 0, 0, $$buf); # prepend header
+ $buf = $bdy;
+ }
+ $$buf .= $crlf;
+ $buf;
+}
+
+1;
diff --git a/t/lei_to_mail.t b/t/lei_to_mail.t
new file mode 100644
index 00000000..089a422e
--- /dev/null
+++ b/t/lei_to_mail.t
@@ -0,0 +1,65 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use v5.10.1;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use_ok 'PublicInbox::LeiToMail';
+my $from = "Content-Length: 10\nSubject: x\n\nFrom hell\n";
+my $noeol = "Subject: x\n\nFrom hell";
+my $crlf = $noeol;
+$crlf =~ s/\n/\r\n/g;
+my $kw = [qw(seen answered flagged)];
+for my $mbox (qw(mboxrd mboxo mboxcl mboxcl2)) {
+ my $m = "eml2$mbox";
+ my $cb = PublicInbox::LeiToMail->can($m);
+ my $s = $cb->(PublicInbox::Eml->new($from), $kw);
+ is(substr($$s, -1, 1), "\n", "trailing LF in normal $mbox");
+ my $eml = PublicInbox::Eml->new($s);
+ is($eml->header('Status'), 'R', "Status: set by $m");
+ is($eml->header('X-Status'), 'AF', "X-Status: set by $m");
+ if ($mbox eq 'mboxcl2') {
+ like($eml->body_raw, qr/^From /, "From not escaped $m");
+ } else {
+ like($eml->body_raw, qr/^>From /, "From escaped once by $m");
+ }
+ my @cl = $eml->header('Content-Length');
+ if ($mbox =~ /mboxcl/) {
+ is(scalar(@cl), 1, "$m only has one Content-Length header");
+ is($cl[0] + length("\n"),
+ length($eml->body_raw), "$m Content-Length matches");
+ } else {
+ is(scalar(@cl), 0, "$m clobbered Content-Length");
+ }
+ $s = $cb->(PublicInbox::Eml->new($noeol), $kw);
+ is(substr($$s, -1, 1), "\n",
+ "trailing LF added by $m when original lacks EOL");
+ $eml = PublicInbox::Eml->new($s);
+ if ($mbox eq 'mboxcl2') {
+ is($eml->body_raw, "From hell\n", "From not escaped by $m");
+ } else {
+ is($eml->body_raw, ">From hell\n", "From escaped once by $m");
+ }
+ $s = $cb->(PublicInbox::Eml->new($crlf), $kw);
+ is(substr($$s, -2, 2), "\r\n",
+		"trailing CRLF added by $m when original lacks EOL");
+ $eml = PublicInbox::Eml->new($s);
+ if ($mbox eq 'mboxcl2') {
+ is($eml->body_raw, "From hell\r\n", "From not escaped by $m");
+ } else {
+ is($eml->body_raw, ">From hell\r\n", "From escaped once by $m");
+ }
+ if ($mbox =~ /mboxcl/) {
+ is($eml->header('Content-Length') + length("\r\n"),
+ length($eml->body_raw), "$m Content-Length matches");
+ } elsif ($mbox eq 'mboxrd') {
+ $s = $cb->($eml, $kw);
+ $eml = PublicInbox::Eml->new($s);
+ is($eml->body_raw,
+ ">>From hell\r\n\r\n", "From escaped again by $m");
+ }
+}
+
+done_testing;
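
The difference between the mboxo and mboxrd escaping used in this patch
can be sketched standalone.  The two substitutions are taken from
eml2mboxo and eml2mboxrd; the function names here, including
unescape_mboxrd, are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# mboxo: only unquoted "From " lines are escaped, so a body line that
# already read ">From " cannot be told apart afterwards (lossy)
sub escape_mboxo { my ($s) = @_; $s =~ s/^From />From /gm; $s }

# mboxrd: "From " preceded by any number of ">" gets one more ">",
# which makes the escaping reversible
sub escape_mboxrd { my ($s) = @_; $s =~ s/^(>*From )/>$1/gm; $s }

# Assumed inverse of escape_mboxrd: strip exactly one leading ">"
sub unescape_mboxrd { my ($s) = @_; $s =~ s/^>(>*From )/$1/gm; $s }

my $body = "From hell\n>From here\n";
print escape_mboxo($body);  # both lines now start with ">From "
print escape_mboxrd($body); # second line becomes ">>From here"
```

This is also why the test above expects ">>From hell" after running an
already-escaped message through eml2mboxrd a second time.
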
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git