path: root/lib/PublicInbox/LeiToMail.pm
Date Commit message
2021-10-22 lei: use RENAME_NOREPLACE on Linux 3.15+
One syscall is better than two for atomicity in Maildirs. This means there's no window where another process can see both the old and new file at the same time (link && unlink), nor a window where we might inadvertently clobber an existing file if we were to do `stat && rename'.
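The trade-off described above can be sketched in Python (not the Perl used by this module). This is only an illustration under stated assumptions: `rename_noreplace` is a hypothetical helper, the constants are the Linux values of `AT_FDCWD` and `RENAME_NOREPLACE`, and the libc `renameat2` wrapper is reached through `ctypes` because the Python standard library does not expose it. When the syscall is unavailable, the sketch falls back to the two-syscall link && unlink sequence.

```python
import ctypes
import errno
import os

AT_FDCWD = -100        # Linux value: resolve relative paths against the cwd
RENAME_NOREPLACE = 1   # Linux value: fail with EEXIST instead of clobbering dst


def rename_noreplace(src, dst):
    """Move src to dst; raise FileExistsError if dst already exists.

    Hypothetical helper for illustration, not part of lei.
    """
    try:
        renameat2 = ctypes.CDLL(None, use_errno=True).renameat2
    except (OSError, AttributeError, TypeError):
        renameat2 = None  # no usable libc wrapper on this platform
    if renameat2 is not None:
        renameat2.argtypes = [ctypes.c_int, ctypes.c_char_p,
                              ctypes.c_int, ctypes.c_char_p, ctypes.c_uint]
        renameat2.restype = ctypes.c_int
        if renameat2(AT_FDCWD, os.fsencode(src),
                     AT_FDCWD, os.fsencode(dst), RENAME_NOREPLACE) == 0:
            return "renameat2"  # one syscall, no intermediate state
        err = ctypes.get_errno()
        if err not in (errno.ENOSYS, errno.EINVAL):
            # EEXIST becomes FileExistsError, ENOENT FileNotFoundError, etc.
            raise OSError(err, os.strerror(err), dst)
    # Fallback: two syscalls; both names are briefly visible in between.
    os.link(src, dst)    # raises FileExistsError if dst exists
    os.unlink(src)
    return "link+unlink"
```

Either path refuses to overwrite an existing destination; only the single-syscall path avoids the moment where both names exist.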
2021-10-16 lei_to_mail: quiet down abort messages
We don't need to flood the terminal with "W: $oid is (!= blob)\n" messages when somebody nukes a git cat-file process from under us.
2021-10-15 lei + ipc: simplify process reaping
Simplify our APIs and force dwaitpid() to work in async mode for all lei workers. This avoids having lingering zombies for parallel searches if one worker finishes soon before another. The old distinction between "old" and "new" workers was needlessly complex, error-prone, and embarrassingly bad. We also never handled v2:// writers properly before on Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS to ensure they get handled by $lei when appropriate.
2021-10-15 lei: TSTP affects all curl and related subprocesses
By relying more on pgroups for remaining processes, this lets us pause all curl+tail subprocesses with a single kill(2) to avoid cluttering stderr. We won't bother pausing the pigz/gzip/bzip2/xz compressor process nor cat-file processes, though, since those don't write to the terminal and they idle soon after the workers react to SIGSTOP. AutoReap is hoisted out from TestCommon.pm. CLONE_SKIP is gone since we won't be using Perl threads any time soon (they're discouraged by the maintainers of Perl).
2021-10-10 lei_to_mail: show --output on augment progress failure
Just in case it fails when there's many parallel invocations.
2021-09-25 lei: make pkt_op easier-to-use and understand
Since switching to SOCK_SEQPACKET, we no longer have to use fixed-width records to guarantee atomic reads. Thus we can maintain more human-readable/searchable PktOp opcodes. Furthermore, we can infer the subroutine name in many cases to avoid repeating ourselves by specifying a command-name twice (e.g. $ops->{CMD} => [ \&CMD, $obj ]; can now simply be written as: $ops->{CMD} => [ $obj ] if CMD is a method of $obj).
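The name-inference idea can be illustrated with a small dispatch table. This is a Python sketch of the pattern, not the PktOp code itself; `pkt_do`, `Worker`, and the `incr` opcode are hypothetical names chosen for the example.

```python
class Worker:
    """Hypothetical object whose method names double as opcodes."""

    def __init__(self):
        self.seen = []

    def incr(self, n):
        self.seen.append(n)
        return sum(self.seen)


def pkt_do(ops, cmd, *args):
    """Run the handler registered for cmd (illustrative stand-in for dispatch)."""
    head, *bound = ops[cmd]
    if callable(head):
        # Explicit form: [func, obj, ...] calls func(obj, ..., *args).
        return head(*bound, *args)
    # Short form: [obj, ...]; the method name is inferred from the opcode.
    return getattr(head, cmd)(*bound, *args)
```

With `w = Worker()`, both `{"incr": [Worker.incr, w]}` and `{"incr": [w]}` route the `incr` opcode to the same method; the second form avoids spelling the name twice.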
2021-09-25 lei2mail: augment_inprogress: guard against closed FDs
I'm not sure what caused it, but $err was undef and caused print to fail, leading to an event loop error. Guard the timer with an eval and assume warn() can't trigger an event loop failure.
2021-09-21 lei q: show progress on >1s preparation phase
Overwriting existing destinations is safe (but slow) by default, so show a progress message noting what we're doing while a user waits.
2021-09-19 lei/store: use SOCK_SEQPACKET rather than pipe
This has several advantages:
* no need to use ipc.lock to protect a pipe for non-atomic writes
* ability to pass FDs. In another commit, this will let us simplify lei->sto_done_request and pass newly-created sockets to lei/store directly.
disadvantages:
- an extra pipe is required for rare messages over several hundred KB, this is probably a non-issue, though
The performance delta is unknown, but I expect shards (which remain pipes) to be the primary bottleneck IPC-wise for lei/store.
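The record-preserving behavior that makes the lock unnecessary can be seen with a Unix socket pair. This is a minimal Python sketch, assuming a platform that supports `SOCK_SEQPACKET` on `AF_UNIX` (Linux does); `seqpacket_roundtrip` is a hypothetical helper name.

```python
import socket


def seqpacket_roundtrip(records):
    """Send each record as one packet and read them back one recv() at a time."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for rec in records:
            a.send(rec)  # each send() is delivered as one whole record
        # Unlike a pipe or SOCK_STREAM, recv() never merges or splits records.
        return [b.recv(65536) for _ in records]
    finally:
        a.close()
        b.close()
```

Each `recv()` returns exactly one record that was written by one `send()`, so variable-length opcodes need no framing and concurrent writers cannot interleave partial writes.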
2021-09-18 lei up: automatically use dt: for remote externals
Since we can't use maxuid for remote externals, automatically maintaining the last time we got results and appending a dt: range to the query will prevent HTTP(S) responses from getting too big. We could be using "rt:", but no stable release of public-inbox supports it, yet, so we'll use dt:, instead. By default, there's a two day fudge factor to account for MTA downtime and delays, which is hopefully enough. The fudge factor may be changed per-invocation with the --remote-fudge-factor=INTERVAL option. Since different externals can have different message transport routes, "lastresult" entries are stored on a per-external basis.
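The arithmetic behind the appended range is simple enough to sketch. The following Python is an illustration only: `dt_since` is a hypothetical helper, and the open-ended `dt:YYYYMMDDhhmmss..` form in UTC is an assumption about how the range is rendered, not something this commit message states.

```python
import time

TWO_DAYS = 2 * 24 * 60 * 60  # default fudge factor described above, in seconds


def dt_since(lastresult, fudge=TWO_DAYS):
    """Build a dt: prefix query starting `fudge` seconds before lastresult.

    lastresult is a Unix timestamp of the last successful query
    (hypothetical representation of the per-external "lastresult" entry).
    """
    start = time.gmtime(max(0, lastresult - fudge))
    return "dt:" + time.strftime("%Y%m%d%H%M%S", start) + ".."
```

Subtracting the fudge factor widens the window backwards, so messages delayed in transit are still picked up; deduplication handles the resulting overlap.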
2021-09-18 lei_mail_sync: rely on flock(2), avoid IPC
Since 44917fdd24a8bec1 ("lei_mail_sync: do not use transactions"), relying on lei/store to serialize access was a pointless endeavor. Rely on flock(2) to serialize multiple writers since (in my experience) it's the easiest way to deal with parallel writers when using SQLite. This allows us to simplify existing callers while speeding up 'lei refresh-mail-sync --all=local' by 5% or so.
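The flock(2) approach can be sketched as follows in Python. The lock file name, the `seen` table, and both function names are hypothetical; the point is only that an exclusive advisory lock held around each write serializes parallel writers without SQLite transactions or a broker process.

```python
import fcntl
import sqlite3
from contextlib import contextmanager


@contextmanager
def flocked(lock_path):
    """Hold an exclusive flock(2) on lock_path for the duration of the block."""
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def record_oid(db_path, oid):
    """Insert oid once; safe to call from several processes at a time."""
    with flocked(db_path + ".flock"):
        # isolation_level=None: autocommit, i.e. no explicit transactions
        con = sqlite3.connect(db_path, isolation_level=None)
        try:
            con.execute("CREATE TABLE IF NOT EXISTS seen (oid TEXT PRIMARY KEY)")
            con.execute("INSERT OR IGNORE INTO seen (oid) VALUES (?)", (oid,))
        finally:
            con.close()
```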
2021-09-11 lei q|lcat: support "-f reply" output format
When composing replies in "git format-patch" cover letters, I'd been relying on "lei q -f text ...", but that still requires several steps to make it suitable for composing a reply:
* s/^/> / to quote the body
* drop existing In-Reply-To+References
* s/^Message-ID:/In-Reply-To:/;
* add an attribution line
...
"lei q -f reply" takes care of most of that and users will only have to trim "From " lines, unnecessary results and over-quoted text (and trimming is likely less error-prone than doing all the steps above manually). This should also be a good replacement for "git format-patch --in-reply-to=...", since copying long Message-IDs can be error-prone (and this lets you include quoted text in replies).
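The manual steps listed above can be sketched in a few lines of Python. This is not the "-f reply" implementation; `to_reply` is a hypothetical helper that only handles a single-part plain-text message, and the wording of the attribution line is an assumption.

```python
from email import message_from_string


def to_reply(raw):
    """Turn one raw plain-text message into a reply template."""
    msg = message_from_string(raw)
    out = []
    for name, value in msg.items():
        lname = name.lower()
        if lname in ("in-reply-to", "references"):
            continue                      # drop existing threading headers
        if lname == "message-id":
            name = "In-Reply-To"          # s/^Message-ID:/In-Reply-To:/
        out.append(f"{name}: {value}")
    out.append("")
    out.append(f"{msg['From']} wrote:")   # attribution line (assumed wording)
    body = msg.get_payload()
    out.extend("> " + line for line in body.splitlines())  # s/^/> /
    return "\n".join(out) + "\n"
```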
2021-09-08 lei q|up: fix write counter for v2
It's a bit confusing to see "0 written to ..." when we actually wrote something.
2021-09-07 lei up: support --all for IMAP folders
Since "lei up" is expected to be a heavily-used command, better support for IMAP seems like a reasonable idea. This is inefficient because we waste an IMAP(S) TCP connection, which dies when an auth-only LeiUp worker process dies, but it's better than not working at all right now.
2021-09-06 lei_auth: simplify users
There's no need to alias net_merge_all in each WQ class which uses LeiAuth, `$obj->$sub' works even when `$sub' is a fully-qualified subroutine name with `::' in it. perlobj(1) documents it under "Method Call Variations".
2021-09-04 lei_to_mail+mbox_reader: fix handling of empty/bogus emails
We may be handling invalid mboxes, so just return no objects in that case. While "lei q" on HTTP(S) externals expects a gzipped mboxrd, there's always a chance something else gzipped can be sent to us. There are also changes to lei_to_mail to better handle emails which lack a body and/or headers (e.g. t/solve/bare.patch).
Link: https://public-inbox.org/meta/20210903151500.h72mzcpqixgtytjs@meerkat.local/
2021-09-03 lei: fix read/write IMAP access
xt/net_writer-imap.t was completely broken in recent months and I completely forgot this test. net->add_url still only accepts bare scalars (and not scalar refs), so we must set that up properly. Furthermore, our changes to do FLAGS-only synchronization in lei of old messages were causing us to not handle FLAGS properly for the test.
2021-08-19 lei q: make --save the default
Since "lei up" is more often useful than not and incurs negligible overhead, enable --save by default and allow --no-save to work. This also fixes a long-standing bug when overwriting --output destinations with saved searches: dedupe data from previous searches is reset and no longer influences the new (changed) search, so results no longer go missing if two sequential invocations of "lei q --save" point to the same --output.
2021-07-25 lei: avoid SQLite COUNT() for dedupe
SQLite COUNT() is a slow operation that does a full table scan with no conditions. There's no need for it, since lei dedupe only needs to know if it's empty or not to decide between new/ and cur/ for Maildir outputs.
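The difference between the two queries can be sketched with Python's sqlite3 module. The `dedupe` table name and the helper names are hypothetical; the technique is simply to ask for at most one row instead of counting every row.

```python
import sqlite3


def is_empty(con, table="dedupe"):
    """True if the table has no rows; stops at the first row found.

    The table name is interpolated, so only pass trusted identifiers.
    """
    return con.execute(f"SELECT 1 FROM {table} LIMIT 1").fetchone() is None


def is_empty_slow(con, table="dedupe"):
    """Same answer via COUNT(*), which has to visit every row first."""
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0] == 0
```

Both functions return the same boolean; only the first does a constant amount of work regardless of table size.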
2021-06-04 pkt_op: make pkt_do an OO method
This will make it easier to use for internal use such as managing Maildir and IMAP IDLE watches.
2021-05-30 lei q: --sort and --save|v2 are incompatible
Saved searches rely on (reverse) docid ordering for efficient incremental results, and sorting any other way prevents that. Update comment description in LeiQuery while we're at it: "ls-query" and "rm-query" are "ls-search" and "forget-search", respectively, and "mv-query" is implicit with "edit-search"
2021-05-30 lei import|lcat: improve+fix single message IMAP support
lcat can now dump the memoized contents of entire IMAP folders, not just a single UID. It's now parallelized and pipelined for multiple lei2mail workers. Furthermore, various forms of JSON output work consistently with blob-only output, now. While working on this, I noticed NetReader was passing UID URLs to imap_each callbacks, which was causing mail_sync.sqlite3 to store UIDs in `folders'; that was clearly wrong, so it's now fixed.
2021-05-29 lei_to_mail: use abs_path for Maildir in mail_sync.sqlite3
lei->rel2abs doesn't resolve symlinks, which could cause synchronization problems with export-kw or other commands.
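The distinction matters because two spellings of the same directory would otherwise be recorded as two folders. A Python analogy, with `os.path.abspath` standing in for a rel2abs-style helper and `os.path.realpath` for abs_path; `folder_key` and its `maildir:` prefix are hypothetical names used for illustration:

```python
import os


def folder_key(maildir):
    """Canonical key for a Maildir: absolute path with symlinks resolved."""
    return "maildir:" + os.path.realpath(maildir)


def folder_key_unresolved(maildir):
    """Absolute but not canonical: a symlinked path keeps its own name."""
    return "maildir:" + os.path.abspath(maildir)
```

If `~/Mail` is a symlink to `/srv/mail`, the unresolved keys differ while the resolved keys are identical, so a later command sees the same folder whichever path the user types.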
2021-05-28 lei q|up: support v2:/path/to/inboxdir destination
This allows "lei-managed pseudo mailing lists" as described by Konstantin. Alternates use is optional and can be enabled via --shared. This doesn't manage or edit ~/.public-inbox/config; presumably there'll need to be some tweaking of search parameters before finalizing and making the inbox publicly accessible via HTTP/NNTP.
Link: https://public-inbox.org/meta/20210426164454.5zd5kgugfhfwfkpo@nitro.local/T/
2021-05-28 lei: handle a single IMAP message in most places
"lei import" can now import a single IMAP message via <imaps://example.com/MAILBOX/;UID=$UID>. Likewise, "lei inspect" can show the blob information for UID URLs and "lei lcat" can display the blob without network access if imported. "lei lcat" also gets rid of some unused code and supports "blob:$OIDHEX" syntax as described in the comments (and used by our "text" output format).
v2: enforce UID in URL, fail without
v3: fix error reporting (s/fail/child_error/)
2021-05-23 lei <q|up>: set \Recent on non-empty mbox and Maildir
Despite JMAP not supporting the equivalent of the IMAP \Recent flag, it is useful for "lei q --augment" and "lei up" users to be able to distinguish new results from old-but-unread messages in an mbox or Maildir. For mbox family messages, we'll drop the "O" status flag when appending to mboxes, and we'll write to the "new" subdirectory of Maildirs. Behavior when writing to initially empty Maildirs and mboxes remains unchanged since there's no need to distinguish between new and old results in the initial case. Having users wait for a rename(2) storm or complete mbox rewrite hurts UX. With IMAP mailboxes, \Recent is already enforced by the IMAP server and IMAP clients have no way of changing it(*).
(*) mutt uses the "Old" IMAP flag which isn't part of RFC 3501; other MUAs may do similar things.
2021-05-23 lei export-kw: support exporting keywords to IMAP
We support writing to IMAP stores in other places (just like Maildir), and it's actually less complex for us to write to IMAP. Neither usability nor performance is ideal, but usability will be addressed in the next commit to relax CLI argument checking. Performance is poor due to the synchronous Mail::IMAPClient API and will need to be addressed with pipelining sometime further in the future.
2021-05-23 net_reader|net_writer: pass URI refs deeper into callbacks
This will give us more flexibility in the future w.r.t. dealing with UIDVALIDITY and AUTH= info with IMAP. The LoC reduction is welcome, too.
2021-05-23 lei export-kw: new command to export keywords to Maildirs
IMAP will eventually be supported.
2021-05-23 lei tag: support tagging index-only messages
This will make some of our tests faster and allow users to try more features of lei without high storage requirements.
2021-05-04 lei index: new command to index mail w/o git storage
Since completely purging blobs from git is slow, users may wish to index messages in Maildirs (and eventually other local storage) without storing data in git. Much code from LeiImport and LeiInput is reused, and a new dummy FakeImport class supplies a non-storing $im->add and minimizes changes to LeiStore. The tricky part of this command is to support "lei import" after a message has gone through "lei index". Relying on $smsg->{bytes} == 0 (as we do for external-only vmd storage) does not work here, since it would break searching for "z:" byte-ranges when not using externals. This eventually required PublicInbox::Import::add to use a SharedKV to keep track of imported blobs and prevent duplication.
2021-05-04 lei up: fix dedupe with remote externals on Maildir + IMAP
LeiToMail Maildir and IMAP write callbacks need to account for the caller-supplied smsg. We'll also make better use of the user-supplied smsg object by ensuring blob deduplication happens ASAP.
Fixes: e76683309ca4f254 ("lei <q|up>: distinguish between mset and l2m counts")
2021-05-03 lei <q|up>: writes to Maildirs and IMAP use mail-sync
This will allow keyword updates from other folders to propagate to folders where search results may be duplicated.
2021-05-01 lei_auth: s/net_merge_complete/net_merge_all_done/
We use the "done" term elsewhere for similar things, and my easily-confused mind equates "complete" with shell completion.
2021-05-01 lei <q|up>: distinguish between mset and l2m counts
The number of messages we write to --output is usually different than the mset count due to deduplication from combining multiple sources. This change makes the stderr output of "lei up --all=local" way more useful IMHO.
2021-04-30 lei: IMAP .onion support via --proxy=s switch
Mail::IMAPClient provides the ability to pass a pre-connected Socket to it. We can rely on this functionality to use IO::Socket::Socks in place of whatever socket class Mail::IMAPClient chooses to use. The --proxy=s is shared with curl(1), though we only support socks5h:// at the moment. Is there any need for SOCKS4 or SOCKS5 without name resolution? Tor .onions require socks5h:// for name resolution and to prevent data leakage.
2021-04-27 lei q + lcat: support --format=text output
This is mainly for "lei lcat" where it's the default, but I find it useful anyways compared to the JSON view. Colors are loaded from ~/.config/lei/config, and fall back to using diff colors from a normal git config (e.g. ~/.gitconfig).
2021-04-23 lei_to_mail: cwd-agnostic Maildir wakeup
Since we don't have *at() syscalls readily available to us, lei-daemon may call ->poke_dst in the wrong relative directory. Despite not having *at() syscalls, we can still capture the "$MAILDIR/cur" directory handle at pre_augment time so we can reliably call futimes(2) on it using the `utime' perlop.
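The same trick can be sketched in Python, which also lets `os.utime` operate on an open file descriptor where the platform supports it (Linux does). `open_cur` and `poke_dst` are hypothetical names mirroring the description above, not the Perl subs.

```python
import os


def open_cur(maildir):
    """Capture a handle on $MAILDIR/cur while the relative path is still valid."""
    return os.open(os.path.join(maildir, "cur"), os.O_RDONLY)


def poke_dst(cur_fd, when=None):
    """Bump the directory's timestamps through the saved descriptor.

    Works no matter what the current working directory is by now,
    because no path lookup happens here.
    """
    times = None if when is None else (when, when)
    os.utime(cur_fd, times=times)
```

A daemon can call `open_cur("Maildir")` while it is in the client's directory, change directories any number of times, and still wake up Maildir watchers later with `poke_dst(fd)`.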
2021-04-22 lei: flesh out `forwarded' kw support for Maildir and IMAP
Maildir and IMAP can both handle `forwarded'. Ensure we don't lose `forwarded' when reading from stores which do not support it, but ensure we can set it when reading from IMAP and Maildir stores.
2021-04-19 lei q: implement import-before default for --save
This makes "lei q --save" as safe as "lei q" to protect against accidental data loss when clobbering an existing output.
2021-04-16 lei_to_mail: cast to URIimap object early
NetReader->add_url supports URI-like objects, now. We'll be relying on the canonicalization for LeiSavedSearch.
2021-04-13 lei: add "lei up" to complement "lei q --save"
The command isn't finalized, yet, but it's intended to update an existing saved search.
2021-04-13 lei q: start wiring up saved search
This will have an over.sqlite3 for content-based deduplication. It may exhibit ibxish methods, so serving a read-only (or even R/W) IMAP or instance or displaying HTML isn't outside the realm of possibility.
2021-04-13 lei_dedupe: adjust to prepare for saved searches
LeiSavedSearch will use a LeiDedupe-like internal API, so we won't have to make as many changes to callsites between saved and unsaved searches.
2021-04-05 lei q: fix auth IMAP --output with remote mboxrd
IMAP authentication info is only shared amongst lei2mail workers, so we must ensure all IMAP writes go through lei2mail workers even if we don't have to access the mail through git. This allows us to decouple the latency of the remote mboxrd from the latency of the IMAP --output at the expense of extra IPC overhead within our own processes.
2021-04-05 lei_to_mail: improve comments and reduce LoC
We don't need to waste LoC on corner cases, single-use internal subs, or restoring SIG{__WARN__} when a process exits. All that extra code contributes to memory use and startup time, especially for users who can't use FD passing.
2021-04-05 lei: maildir: move shard support to MdirReader
We'll eventually want lei_input users like "lei import" and "lei tag" to support parallel reads.
2021-04-05 lei_to_mail: trim down imports
We don't need to import so many things. None of the Errno constants are in common paths so unlikely to benefit from constant folding.
2021-04-01 lei: maildir: handle "forwarded" keyword as "P"
mbox and IMAP seem to have no way of describing this keyword, but Maildir does with the "P" flag (for "passed").
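For reference, the keyword-to-flag translation for Maildir file names can be sketched as below. The flag letters are the standard Maildir ones (D, F, P, R, S); the keyword spellings on the left and the helper name `maildir_info` are assumptions made for this Python illustration rather than lei's actual table.

```python
# Keyword -> Maildir flag letter; "forwarded" maps to "P" (for "passed").
KW2CHAR = {
    "draft": "D",
    "flagged": "F",
    "forwarded": "P",
    "answered": "R",
    "seen": "S",
}


def maildir_info(keywords):
    """Build the ":2,FLAGS" file name suffix for a set of keywords.

    Unknown keywords are ignored; flags are emitted in ASCII order
    as the Maildir convention requires.
    """
    flags = sorted({KW2CHAR[kw] for kw in keywords if kw in KW2CHAR})
    return ":2," + "".join(flags)
```

For example, a message that is both seen and forwarded gets the suffix `:2,PS`.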
2021-04-01 lei q: reduce lei/store work for kw changes to stored mail
We can tweak lse->kw_changed to return docids and reduce IPC traffic and reduce work the lei/store worker needs to do.