path: root/lib
Date Commit message
2021-06-14 lei_input: allow keywords when importing 1 file from Maildir
This will eventually be useful for supporting inotify watches on Maildir. It will also allow users to script their own FS watchers more easily.
2021-06-13 net_reader: canonicalize URL args on add_url
This fixes cases when users specify an IMAP or NNTP URL with standard port numbers explicitly. In other words, this allows users to use "lei ls-mail-source nntps://public-inbox.org:563/" and "lei ls-mail-source imaps://public-inbox.org:993/" without hitting "BUG:" errors.
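As a rough illustration of the canonicalization described above (not the actual Perl code in net_reader), a minimal Python sketch could drop an explicit port whenever it matches the scheme's well-known default. The `DEFAULT_PORTS` table and the `canonicalize_url` name are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Well-known default ports for the schemes mentioned in the commit message.
DEFAULT_PORTS = {"imap": 143, "imaps": 993, "nntp": 119, "nntps": 563}


def canonicalize_url(url):
    """Return url with an explicit port removed when it is the scheme default.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    netloc = parts.netloc
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        # Strip only the trailing ":port"; userinfo and host stay untouched.
        netloc = netloc.rsplit(":", 1)[0]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(canonicalize_url("nntps://public-inbox.org:563/"))
print(canonicalize_url("imaps://public-inbox.org:993/"))
```

With this approach, both spellings of a URL map to the same cache key, while non-default ports are preserved.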
2021-06-13 lei import: use url_folder_cache for completion
And fix "lei index" completion while we're at it.
2021-06-13 lei ls-mail-source: write through to URL folder cache
We'll be able to use this for shell completion for lei import, lcat, tag, etc. This also adds --url support for scripting purposes.
2021-06-13 lei: stop pager early on exit
This is necessary when using "ls-mail-source" on an unreachable IMAP server.
2021-06-12 lei ls-mail-source: list IMAP folders and NNTP groups
While other tools can provide the same functionality, having integration with git-credential is convenient, here. Caching and completion will be implemented separately.
2021-06-10 lei tag: less confusing warning about unimported messages
"unimported" is more meaningful than "missing", here. And instead of having every worker spew about unimported messages, we'll accumulate and only print one warning line. This necessitated altering ->DESTROY behavior and persisting the client socket within the $lei object itself, not just the PktOp consumer object.
2021-06-10 lei import: support --new-only for IMAP
Taking ~40s to synchronize a ~75K message IMAP folder is still a lot of time, so support an option to only touch new messages. This is similar to "offlineimap -q" (quick) or "mbsync --new" switches, but lei already accepts "-q" as a shortcut for --quiet. "--new" could work, but "--new-only" might be more descriptive (or "--only-new"?), since the default also fetches new messages. v2: warn for non-IMAP sources, I'm not sure it's worth it for Maildir or other sources, yet. It will also make sense for MH and JMAP once we support them.
2021-06-09 lei prune-mail-sync: new command to prune invalid sync data
This will be invoked automatically by "lei import" eventually, but it may make sense to expose as a separate command.
2021-06-09 lei_mail_sync: hoist out --all handling from export-kw
We'll be reusing it in other commands, too.
2021-06-09 lei tag: parallelize Maildir access
Since Maildir isn't guaranteed to have any sort of order, we can parallelize inputs, here. On a 4-core system, this reduced one of my tag invocations from 5.5 to 1.4s.
2021-06-09 mdir_reader: maildir_each_file: pass flags, skip Trash
This is a slight behavior change for "lei q": Trashed (but not-yet-expunged) messages no longer get unlinked when --output is used without --augment.
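To make the "pass flags, skip Trash" idea concrete, here is a minimal Python sketch (not the Perl implementation in mdir_reader). It assumes the standard Maildir naming convention, where names in cur/ end in ":2," followed by flag letters and "T" marks a Trashed message. The `each_maildir_file` name is hypothetical.

```python
import os


def each_maildir_file(maildir):
    """Yield (path, flags) for each message in new/ and cur/.

    Hypothetical helper for illustration: messages carrying the
    "T" (Trashed) flag are skipped instead of being yielded.
    """
    for sub in ("new", "cur"):
        subdir = os.path.join(maildir, sub)
        for name in sorted(os.listdir(subdir)):
            # Names look like "<unique>:2,<FLAGS>"; files without the
            # ":2," info suffix simply get an empty flags string.
            _, _, flags = name.partition(":2,")
            if "T" in flags:
                continue  # Trashed but not yet expunged: leave it alone
            yield os.path.join(subdir, name), flags
```

A caller that unlinks or rewrites messages can then iterate over the result without ever touching Trashed files.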
2021-06-09 inbox_writable: fix import_maildir
I'm not sure if anybody uses this, but it exists. It'll likely be dropped in the future. Fixes: fa3f0cbcd1af5008 ("use MdirReader in -watch and InboxWritable")
2021-06-09 lei/store: do eidx_init before creating R/W lms dbh
Sharing lms->{dbh} with eidx shards appears to be the cause of the "Issuing rollback() due to DESTROY without explicit disconnect() of DBD::SQLite::db handle" messages I've been seeing from "lei up".
2021-06-09 lei edit-search: fix and add a (weak) test
This broke recently and lacked an automated test, so rely on EDITOR=cat to ensure we have some coverage. Fixes: d2670108f71b1eff ("pkt_op: make pkt_do an OO method")
2021-06-09 lei pmdir: fix nproc for <= 4 CPUs
I forgot my FreeBSD VM has 8 cores, actually, and tweaked the nproc detection on that machine before finalizing commit 10b523eb017162240b1ac3647f8dcbbf2be348a7 ("lei import: speed up repeated Maildir imports") Fixes: 10b523eb01716224 ("lei import: speed up repeated Maildir imports")
2021-06-08 lei import: speed up repeated Maildir imports
On a 4-core CPU, this speeds up "lei import" on a largish Maildir inbox with 75K messages from ~8 minutes down to ~40s. Parallelizing alone did not bring any improvement and may even hurt performance slightly, depending on CPU availability. However, creating the index on the "fid" and "name" columns in blob2name yields us the same speedup we got. Parallelizing IMAP makes more sense due to the fact most IMAP stores are non-local and subject to network latency. Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
2021-06-08 lei: generalize auxiliary WQ handling
op_wait_event is now more lei-specific since we no longer have to care about oneshot and use a synchronous loop. {ikw} (import-keywords) started a trend, but LeiPmdir (parallel Maildir) is an upcoming WQ class that will follow this idea. Eventually, {l2m} usage may be updated to follow this, too.
2021-06-08 lei: safety fix for multiple WQ classes
For commands utilizing multiple workers, this simple change generalizes the persistence mechanism and prevents lei->dclose from causing script/lei to exit if there are still in-flight workers. This ought to prevent read-after-write consistency problems that occasionally manifest in scripts (e.g. test cases) but usually go unnoticed in normal use.
2021-06-08 lei/store: checkpoint commits mail_sync.sqlite3
We mainly rely on ->done with lei/store, but moving to ->checkpoint probably makes sense. Note: over, msgmap, and mail_sync all have slightly different transaction behavior; perhaps they can be unified in the future.
2021-06-06 lei: don't drop WQ workers on normal exit
This is dangerous and causes race conditions on commands which utilize multiple workqueues.
2021-06-04 pkt_op: make pkt_do an OO method
This will make it easier to use for internal use such as managing Maildir and IMAP IDLE watches.
2021-06-03 pkt_op: remove blocking I/O support
Since lei-daemon is guaranteed to be running, there's no need to keep blocking I/O support around (and we can get it back via git if we need it). Followup-to: 1d6e1f9a6a66a42d ("lei: require Socket::MsgHdr or Inline::C, drop oneshot")
2021-06-03 lei import: speed up kw updates for old IMAP messages
On a 4-core CPU, this speeds up "lei import" on a largish IMAP inbox with 75K messages from ~21 minutes down to 40s. Parallelizing with the new LeiImportKw WQ worker class gives a near-linear speedup and brought the runtime down to ~5:40. The new idx_fid_uid index on the "fid" and "uid" columns of blob2num in mail_sync.sqlite3 brought us the final speedup. An additional index on over.sqlite3#xref3(oidbin) did not help, since idx_nntp already exists and speeds up the new ->oidbin_exists internal API. I initially experimented with a separate "lei import-kw" command but decided against it since it's useless outside of IMAP+JMAP and would require extra cognitive overhead for both users and hackers. So LeiImportKw is just a WQ worker used by "lei import" and not its own user-visible command. v2: fix ikw_done_wait arg handling (ugh, confusing API :x)
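The effect of the idx_fid_uid index can be sketched with Python's sqlite3 module. The table and index names come from the message above, but the column types and the query are simplified assumptions, not the real mail_sync.sqlite3 schema.

```python
import sqlite3


def query_plan(db):
    """Return SQLite's plan for a (fid, uid) lookup as one string."""
    rows = db.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT oidbin FROM blob2num WHERE fid = ? AND uid = ?",
        (1, 42),
    ).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " ".join(row[3] for row in rows)


db = sqlite3.connect(":memory:")
# Simplified stand-in schema (assumption): only the columns named above.
db.execute(
    "CREATE TABLE blob2num ("
    "oidbin BLOB NOT NULL, fid INTEGER NOT NULL, uid INTEGER NOT NULL)"
)
before = query_plan(db)  # no usable index yet: expect a full table scan
db.execute("CREATE INDEX idx_fid_uid ON blob2num (fid, uid)")
after = query_plan(db)   # the planner should now search via idx_fid_uid

print("before:", before)
print("after: ", after)
```

Without the index, every per-message lookup walks the whole table, which is the kind of cost that grows quadratically over a 75K-message folder.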
2021-06-02 lei export-kw: do not write directly to mail_sync.sqlite3
Only the lei/store process should be writing to files/DBs in lei/store.
2021-06-02 lei: remove "forget" (old name for "rm")
"rm" is probably the better name for it, since it matches "public-inbox-learn rm"
2021-06-01 lei_mail_sync: more debug info for uncommitted txn
I'm not actually sure if I hit an uncommitted transaction just now, it doesn't seem like it.
2021-06-01 lei import: reduce writes to lei/store on IMAP sync
We don't need to write VMD changes to lei/store if local keywords are unchanged.
2021-05-30 lei import: import IMAP flag changes from old messages
This makes "lei import" behavior with IMAP folders more consistent with its Maildir behavior. Opening IMAP folders read-write with "SELECT" (instead of read-only with "EXAMINE") was necessary, since it lets an IMAP server communicate to us as to whether or not it's worth refetching IMAP flags of previously imported messages. Fetching UID+FLAGS only is one of the fastest IMAP operations with dovecot, our -imapd and presumably other common IMAP servers. It is issued by common MUAs such as mutt after every SELECT. Users may now rely on "lei import" exclusively to merge mail and keywords into lei/store, and "lei export-kw" to propagate keyword changes back to IMAP servers. A sticks-and-stones workflow for personal mailboxes is currently:
    lei import imaps://$MY_PERSONAL_INBOX
    lei q --mua=$MUA -o /tmp/results SEARCH TERMS...
    # do stuff from within $MUA to /tmp/results
    lei import /tmp/results # read keyword changes from MUA
    lei export-kw imaps://$MY_PERSONAL_INBOX
    # repeat when new stuff shows up in personal inbox
The next goal is to automate repeated imports + export-kw commands with inotify and IMAP IDLE.
2021-05-30 lei: support implicit stdin by default
This adds implicit stdin support for p2q and lcat, while rm and rediff no longer need explicit support for it.
2021-05-30 lei lcat: support maildir: paths, too
This could be helpful in cases where a Maildir is on a slow or unmounted filesystem and lei/store is on fast storage.
2021-05-30 lei lcat: allow IMAP folder URLs w/o UIDVALIDITY
Requiring UIDVALIDITY on the command-line is of course unreasonable.
2021-05-30 lei lcat+inspect: start wiring up completion
Colons and other delimiters still cause problems for our bash completion, but some completion is better than no completion.
2021-05-30 lei q: --sort and --save|v2 are incompatible
Saved searches rely on (reverse) docid ordering for efficient incremental results, and sorting any other way prevents that. Update comment description in LeiQuery while we're at it: "ls-query" and "rm-query" are "ls-search" and "forget-search", respectively, and "mv-query" is implicit with "edit-search"
2021-05-30 lei import|lcat: improve+fix single message IMAP support
lcat can now dump the memoized contents of entire IMAP folders, not just a single UID. It's now parallelized and pipelined for multiple lei2mail workers. Furthermore, various forms of JSON output work consistently with blob-only output, now. While working on this, I noticed NetReader was passing UID URLs to imap_each callbacks, which was causing mail_sync.sqlite3 to store UIDs in `folders'. That was clearly wrong, so it's now fixed.
2021-05-29 lei_to_mail: use abs_path for Maildir in mail_sync.sqlite3
lei->rel2abs doesn't resolve symlinks, which could cause synchronization problems with export-kw or other commands.
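The difference between making a path absolute and resolving its symlinks can be shown with a small Python analogy (not the Perl code itself): `os.path.abspath` leaves symlinks in place, while `os.path.realpath` resolves them. The `folder_key` helper and its "maildir:" prefix are assumptions for illustration.

```python
import os
import tempfile


def folder_key(path):
    """Hypothetical sync key for a Maildir: the symlink-resolved absolute path."""
    return "maildir:" + os.path.realpath(path)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    real = os.path.join(base, "Maildir")
    link = os.path.join(base, "mail-link")
    os.mkdir(real)
    os.symlink(real, link)  # two names for the same directory

    # abspath keeps the symlink, so the two names would be tracked separately
    print(os.path.abspath(link) == os.path.abspath(real))
    # realpath resolves it, so both names share one key
    print(folder_key(link) == folder_key(real))
```

Keying sync data on the resolved path means a folder reached through a symlink and the same folder reached directly are treated as one.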
2021-05-28 lei q|up: support v2:/path/to/inboxdir destination
This allows "lei-managed pseudo mailing lists" as described by Konstantin. Alternates use is optional and can be enabled via --shared. This doesn't manage or edit ~/.public-inbox/config; presumably there'll need to be some tweaking of search parameters before finalizing and making the inbox publicly accessible via HTTP/NNTP. Link: https://public-inbox.org/meta/20210426164454.5zd5kgugfhfwfkpo@nitro.local/T/
2021-05-28 lei: retry_reopen on read-only Xapian access
Xapian DBs may be modified by a parallel process while we're reading them, and Xapian's MVCC model places the burden on readers to retry operations. We'll also have retry_reopen croak instead of die on errors, which ought to help us track down some "Document not found" errors I've occasionally seen when using "lei <q|up>".
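The reader-side retry pattern described above can be sketched generically in Python. This is not the Perl retry_reopen; the `DatabaseModifiedError` class, the `reopen()` method on `db`, and the `max_tries` limit are all stand-ins assumed for the example.

```python
class DatabaseModifiedError(Exception):
    """Hypothetical stand-in for the error a reader gets after a concurrent write."""


def retry_reopen(db, op, max_tries=10):
    """Run op(db); on a concurrent-modification error, reopen and retry.

    Sketch only: `db` is any object with a reopen() method and `op` is a
    callable that performs the read.
    """
    for _ in range(max_tries):
        try:
            return op(db)
        except DatabaseModifiedError:
            # Our snapshot went stale; refresh it and try the read again.
            db.reopen()
    raise RuntimeError(f"gave up after {max_tries} tries")
```

Bounding the number of retries keeps a reader from spinning forever when a writer is committing continuously.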
2021-05-28 lei: restore working directory in more places
Every tick of the event loop can change the working directory, so we need to restore it for every client if they operate in different directories. This would be easier if we had openat(2) and friends in Perl; but Inline::C is practically required for lei, now.
2021-05-28 lei: handle a single IMAP message in most places
"lei import" can now import a single IMAP message via <imaps://example.com/MAILBOX/;UID=$UID> Likewise, "lei inspect" can show the blob information for UID URLs and "lei lcat" can display the blob without network access if imported. "lei lcat" also gets rid of some unused code and supports "blob:$OIDHEX" syntax as described in the comments (and used by our "text" output format). v2: enforce UID in URL, fail without v3: fix error reporting (s/fail/child_error/)
2021-05-28 lei_mail_sync: debug code for uncommitted txn
I'm not 100% sure why, but "lei up" seems to cause uncommitted transaction errors. LeiToMail calls sto->set_sync_info, but LeiXSearch should call sto->done and lms_commit, so I'm not sure where the uncommitted transaction is coming from...
2021-05-28 lei: add TODO item for FUSE mount
It seems possible and natural to allow browsing lei/store as a Maildir (as well as read-write JMAP/IMAP store).
2021-05-28 lei: mark reorder-and-rewrite-local-history as a TODO item
This is low priority, for now.
2021-05-28 viewdiff: escape '{' and '}' for regexp
Perl 5 doesn't warn on this, yet, but it warns on unescaped '(' and ')' nowadays, so it's conceivable Perl could start warning on this in the future. So future-proof our code and reduce reader confusion.
2021-05-28 viewdiff: make $UNSAFE a variable
There's no sense in using a constant here since it gets copied into the uri_escape_utf8 function anyways. Furthermore, inlined constants still leave behind a subroutine and subs cost several KB of memory. Finally, add a comment as to why it's different than the default escape, since I just spent a minute wondering that.
2021-05-27 lei rm: new command to remove messages from index
This is similar to "public-inbox-learn rm", but it's possible to point an entire Maildir/IMAP/mbox*/newsgroup at it.
2021-05-26 lei: require Socket::MsgHdr or Inline::C, drop oneshot
The cost of supporting separate code paths between oneshot and daemon isn't worth the trouble; especially if there are more users to support. The test suite time nearly doubles with oneshot, so that's hurting developer productivity. FD passing is currently required to work efficiently with remote HTTP(S) queries which return large messages, as seen in commit 708b182a57373172f5523f3dc297659d58e03b58 ("ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case"). Additionally, upcoming support for IMAP IDLE and inotify-based monitoring of Maildirs cannot work properly without a background daemon.
2021-05-25 ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case
WQWorkers are limited roughly to MAX_ARG_STRLEN (the kernel limit of argv + environ) to avoid excessive memory growth. Occasionally, we need to send larger messages via workqueues that are too small to hit EMSGSIZE on the sender. This fixes "lei q" when using HTTP(S) externals, since that code path sends large Eml objects from lei_xsearch workers directly to lei2mail WQ workers.
2021-05-25 ipc: avoid potential stack-not-refcounted bug
This fixes a potential problem with Carp::longmess firing somewhere deeper in the stack. This is not a known problem at this time, but something I noticed while chasing something else.
2021-05-25 lei forget-mail-sync: new command to drop sync information
Sometimes a user stops caring to sync an IMAP or Maildir folder, or wants to force a resync. Let them run this command to have lei forget all the sync information about the mail folder. This won't delete any stored messages in git, but will leave "lei index" users with dangling references.