path: root/lib/PublicInbox/LEI.pm
Date Commit message
2021-09-17 lei refresh-mail-sync: replace prune-mail-sync
Merely pruning mail synchronization information was insufficient for Maildir: renames are common in Maildir, and we need to detect them after the fact when lei-daemon isn't running. Running this command could make "lei index" far more useful...
v2: close R/O mail_sync.sqlite3 dbh before fork. Keeping the DB file handle open across fork can cause bad things to happen even if we don't use it, since sqlite3 itself still knows about it (but doesn't know that the Perl code is unaware of it).
2021-09-16 lei: git_oid: replace git_blob_id
We'll be using binary SHA-1 and SHA-256 in-memory since that's what mail_sync.sqlite3 stores.
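As a rough illustration of what a binary object ID is: git hashes the header "blob <size>\0" followed by the content, and the raw digest (20 bytes for SHA-1, 32 for SHA-256) is the compact form a database column can hold. This is a Python sketch; the git_oid name is borrowed for readability and is not the Perl method's signature.

```python
import hashlib

def git_oid(data: bytes, algo: str = "sha1") -> bytes:
    """Return the binary git blob OID of data (hypothetical Python helper).

    .digest() gives the raw bytes; .hex() on the result gives the
    familiar 40- or 64-character form printed by git.
    """
    h = hashlib.new(algo)
    h.update(b"blob %d\0" % len(data))  # git's object header
    h.update(data)
    return h.digest()
```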
2021-09-14 lei up: fix env/cwd mismatches with multiple folders
By moving %ENV localization and fchdir into ->dispatch, we can maintain a consistent environment across multiple dispatches while having different clients.
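The idea of scoping the environment and working directory to each dispatch can be sketched in Python as a context manager that swaps in one client's %ENV equivalent and fchdir()s into its directory, then restores the daemon's own state. client_context and dispatch are hypothetical names, not lei's API.

```python
import os
from contextlib import contextmanager

@contextmanager
def client_context(env, cwd):
    """Apply one client's environment and cwd for a single dispatch,
    then restore the daemon's own environment and cwd."""
    saved_env = dict(os.environ)
    saved_cwd = os.open(".", os.O_RDONLY)  # fd to fchdir() back to later
    try:
        client_cwd = os.open(cwd, os.O_RDONLY)
    except OSError:
        os.close(saved_cwd)
        raise
    try:
        os.environ.clear()
        os.environ.update(env)
        os.fchdir(client_cwd)
        yield
    finally:
        os.environ.clear()
        os.environ.update(saved_env)
        os.fchdir(saved_cwd)
        os.close(client_cwd)
        os.close(saved_cwd)

def dispatch(env, cwd, handler):
    """Hypothetical dispatcher: every call runs in its own client context."""
    with client_context(env, cwd):
        return handler()
```

Because the swap happens per dispatch rather than once per process, two folders updated back to back each see the environment of the client that asked for them.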
2021-09-14 lei: sto_done_request: add eval guard
Failures here can cause the lei-daemon event loop to break since PktOp doesn't guard dispatch. Add a guard here (and not deeper in the stack) so we can use the $lei object to report errors.
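The guard pattern described above, sketched in Python: the exception is caught at the callback boundary, where a per-client object is still in scope to report the error, so one failing callback does not take down the loop. Lei, guarded and event_loop are hypothetical stand-ins.

```python
class Lei:
    """Hypothetical stand-in for the per-client $lei object."""
    def __init__(self):
        self.errors = []

    def err(self, msg):
        self.errors.append(msg)

def guarded(lei, callback, *args):
    """Run callback, reporting failures to its client instead of
    letting the exception escape into the event loop."""
    try:
        return callback(*args)
    except Exception as exc:
        lei.err(f"{callback.__name__}: {exc}")
        return None

def event_loop(events):
    # a failure in one event must not stop the rest from being handled
    for lei, callback, args in events:
        guarded(lei, callback, *args)
```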
2021-09-14 lei: warn on event loop errors
This should help us notice (and fix) bugs more easily.
2021-09-13 lei: stop_pager: restore stdout when done
The reason(s) we had for not restoring stdout are no longer valid, since script/lei (not lei-daemon) spawns the pager nowadays.
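Saving and restoring stdout around a pager is a dup()/dup2() dance; here is a minimal Python sketch under the assumption that the pager reads from a pipe on its stdin. start_pager and stop_pager are hypothetical names echoing the commit subject, not lei's implementation.

```python
import os
import subprocess
import sys

def start_pager(argv):
    """Spawn argv as a pager and point fd 1 at its stdin.
    Returns the state needed to undo the redirection."""
    sys.stdout.flush()
    saved_fd = os.dup(1)                  # keep the original stdout alive
    pager = subprocess.Popen(argv, stdin=subprocess.PIPE)
    os.dup2(pager.stdin.fileno(), 1)      # fd 1 now feeds the pager
    return saved_fd, pager

def stop_pager(saved_fd, pager):
    """Restore the original stdout, then let the pager see EOF and exit."""
    sys.stdout.flush()
    os.dup2(saved_fd, 1)                  # drops our copy of the pipe
    os.close(saved_fd)
    pager.stdin.close()                   # last write end closed: EOF
    return pager.wait()
```

Restoring fd 1 before closing the pipe matters: otherwise fd 1 would keep the pipe's write end open and the pager would never see EOF.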
2021-09-12 new public-inbox-{clone,fetch} commands
Setting up and maintaining git-only mirrors of v2 inboxes is complex since multiple commands are required to clone and fetch into epochs. Unlike grokmirror, these commands do not require any configuration. Instead, they rely on existing git config files and work like "git clone --mirror" and "git fetch", respectively. Like grokmirror, they use manifest.js.gz, but only on a per-inbox basis so users won't have to clone every inbox of a large instance nor edit config files to include/exclude inboxes they're interested in.
2021-09-11 lei: pass client stderr to git-config in more places
This should improve the users' chances of seeing errors in various git config files we use.
2021-09-11 lei: fix handling of broken lei.saved-search config files
lei shouldn't become unusable if a config file is invalid. Instead, show the "git config" stderr and attempt to continue gracefully. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
2021-09-10 lei: split out @net_opt for curl/torsocks use
IMAP and NNTP connections share some curl(1) options for TLS, IPv4/IPv6, or netrc, etc...
2021-09-09 lei-rm: add man page, support LeiInput args
-F/--in-format and --lock=TYPE(S) are easily supported by all classes using LeiInput.
2021-09-07 lei up: support --all for IMAP folders
Since "lei up" is expected to be a heavily-used command, better support for IMAP seems like a reasonable idea. This is inefficient: we waste an IMAP(S) TCP connection, because it dies when the auth-only LeiUp worker process dies, but it's better than not working at all, right now.
2021-09-07 lei: dump and clear log at exit
This may be helpful for diagnosing errors in case we missed any.
2021-09-03 lei inspect: support reading eml from --stdin
This can be useful inside mutt since I was diagnosing why a label ("L:$FOO") search was giving me a false-positive search result...
2021-09-03 lei: ->child_error less error-prone
I was calling "child_error(1, ...)" in a few places where I meant to be calling "child_error(1 << 8, ...)" and inadvertently triggering SIGHUP in script/lei. Since giving a zero exit code to child_error makes no sense, just allow falsy values to default to 1 << 8.
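Why 1 and 1 << 8 differ: in wait(2) status encoding the low 7 bits hold a terminating signal number and the next byte holds the exit code, so a raw 1 reads as "killed by signal 1" (SIGHUP). A Python sketch of the falsy-defaults-to-1 << 8 rule; child_error and client_exit_action are hypothetical analogues, not lei's Perl code.

```python
import os

def child_error(status=0, msg=None, log=None):
    """Record an error; a falsy status means "exit code 1"."""
    if not status:
        status = 1 << 8          # exit code 1 in wait(2) encoding
    if msg is not None and log is not None:
        log.append(msg)
    return status

def client_exit_action(status):
    """How a client could interpret the wait status it is handed."""
    if os.WIFSIGNALED(status):
        return ("kill", os.WTERMSIG(status))
    return ("exit", os.WEXITSTATUS(status))
```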
2021-09-03 lei: dump errors to syslog, and not to CLI
Dumping errors from the previous run can often get lost, so just spew to syslog since it's a standard place to put errors that don't make it to a client. Note: we don't rely on $SIG{__WARN__} since some of the Net:: stuff will write directly to STDERR (as will external processes).
2021-08-31 lei: refresh watches before MUA spawn for Maildir
If we possibly just wrote or created a Maildir, ensure it's monitored by the lei watch mechanism.
2021-08-31 lei note-event: always flush changes on daemon exit
Because the timer may not fire in time before daemon shutdown.
2021-08-25 lei up: improve --all=local stderr output
The "# $NR written to $DEST ($total matches)" messages are arguably the most useful output of "lei up --all=local", but they get intermixed with progress messages from various workers. Queue up these finalization messages and only spit them out on ->DESTROY.
2021-08-24 lei: non-blocking lei/store->done in lei-daemon
This allows client sockets to wait for "done" commits to lei/store while the daemon reacts asynchronously. The goal of this change is to keep the script/lei client alive until lei/store commits changes to the filesystem, but without blocking the lei-daemon event loop. It depends on Perl refcounting to close the socket.
This change also highlighted our over-use of "done" requests to lei/store processes, which is now corrected so we only issue it on collective socket EOF rather than upon reaping every single worker. This also fixes "lei forget-mail-sync" when it is the initial command.
This took several iterations and much debugging to arrive at the current implementation:
1. The initial iteration of this change utilized socket passing from lei-daemon to lei/store, which necessitated switching from faster pipes to slower Unix sockets.
2. The second iteration switched to registering notification sockets independently of "done" requests, but that could lead to early wakeups when "done" was requested by other workers. This appeared to work most of the time, but suffered races under high load which were difficult to track down.
Finally, this iteration passes the stringified socket GLOB ref to lei/store which is echoed back to lei-daemon upon completion of that particular "done" request.
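The final approach is a request/acknowledgement correlation: the daemon sends an opaque token with each "done" request, and the store echoes it back, so only the client that asked is woken. A Python sketch with hypothetical Store and Daemon classes; id() plays the role of the stringified GLOB ref (unique among live objects).

```python
class Store:
    """Stand-in for lei/store: echoes each caller-supplied token back
    once the commit for that particular "done" request has finished."""
    def __init__(self):
        self.queue = []

    def done(self, token):
        self.queue.append(token)

    def run_once(self):
        # pretend the commit happened, then acknowledge queued requests
        acks, self.queue = self.queue, []
        return acks

class Daemon:
    def __init__(self, store):
        self.store = store
        self.waiting = {}            # token -> client awaiting that commit

    def request_done(self, client):
        token = f"client-{id(client):x}"
        self.waiting[token] = client
        self.store.done(token)

    def on_ack(self, token):
        client = self.waiting.pop(token, None)
        if client is not None:
            client.append("done")    # wake only the requesting client
```

Unlike registering notification sockets separately from "done" requests, a token can only match its own request, so another worker's "done" cannot cause an early wakeup.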
2021-08-24 lei: add missing LeiWatch lazy-load
I'm not sure if this class will actually be needed, but we need to load it while we're using it.
2021-08-21 lei: implicitly watch all Maildirs it knows about
This allows MUA-made flag changes to Maildirs to be instantly read and acknowledged for future search results. In the future, it may be used to speed up --augment and --import-before (the default) with "lei q".
2021-08-19 lei q: make --save the default
Since "lei up" is more often useful than not and incurs negligible overhead, enable --save by default and allow --no-save to work. This also fixes a long-standing bug when overwriting --output destinations with saved searches: dedupe data from previous searches is reset and no longer influences the new (changed) search, so results no longer go missing if two sequential invocations of "lei q --save" point to the same --output.
2021-08-18 lei: add ->lms shortcut for LeiMailSync
We access this read-only in many places (and will in more), so provide a shortcut to simplify callers.
2021-08-14 lei: hexdigest mocks account for unwanted headers
PublicInbox::Import never imports @UNWANTED_HEADERS, so ensure our mock blob OIDs do the same. This ought to prevent duplicates if the PSGI mboxrd download starts setting "X-Status: F" like "lei q -tt .."
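A sketch of the idea: strip headers that never get imported before hashing, so a mock OID matches the blob that would actually be stored. The header list below is an assumption for illustration (check @UNWANTED_HEADERS in PublicInbox::Import for the real one), and the parser only handles LF line endings.

```python
import hashlib

# ASSUMED list; the authoritative one is @UNWANTED_HEADERS in PublicInbox::Import
UNWANTED = {b"status", b"x-status", b"lines", b"bytes", b"content-length"}

def strip_unwanted(raw: bytes) -> bytes:
    """Drop unwanted header fields (including folded continuation lines)."""
    head, sep, body = raw.partition(b"\n\n")
    kept, skipping = [], False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):    # continuation of the previous field
            if not skipping:
                kept.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower()
        skipping = name in UNWANTED
        if not skipping:
            kept.append(line)
    return b"\n".join(kept) + sep + body

def mock_blob_oid(raw: bytes) -> str:
    """Hex git blob OID of the message as it would be imported."""
    data = strip_unwanted(raw)
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
```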
2021-08-13 lei up: support multiple output folders w/o --all=local
Being able to update 1 folder, or all (local) folders is sometimes too limiting, so just allow updating any subset of local folders.
2021-08-11 lei: attempt to canonicalize away "/../" pathnames
As documented, File::Spec->canonpath does not canonicalize "/../". While we want to do our best to preserve symlinks in pathnames, leaving "/../" can mislead our inotify|kqueue usage.
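A purely lexical collapse of "/../" can be sketched in Python; it never touches the filesystem, so symlinks elsewhere in the path survive, at the cost that a symlink just before a ".." may resolve differently than the kernel would. Python's posixpath.normpath behaves the same way for these inputs; collapse_dotdot is a hypothetical name.

```python
def collapse_dotdot(path: str) -> str:
    """Lexically remove ".", empty components, and "dir/.." pairs."""
    absolute = path.startswith("/")
    parts = []
    for comp in path.split("/"):
        if comp in ("", "."):
            continue
        if comp == "..":
            if parts and parts[-1] != "..":
                parts.pop()          # "dir/.." cancels out
            elif not absolute:
                parts.append("..")   # relative path climbing above its start
            continue                 # "/.." at the root stays at the root
        parts.append(comp)
    out = "/".join(parts)
    return "/" + out if absolute else (out or ".")
```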
2021-08-11 treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and it's not worth our time to start. So stop wasting CPU cycles giving the illusion that we'd care about non-*nix pathnames.
2021-08-04 lei: close inotify FD in forked child
Linux::Inotify2 2.3+ includes an ->fh method to give us the ability to safely close an FD without hitting EBADF (and automatically use FD_CLOEXEC). We'll still need a new wrapper class (LI2Wrap) to handle it for users of old versions, though. Link: http://lists.schmorp.de/pipermail/perl/2021q3/thread.html
2021-07-28 listener: maximize listen(2) backlog
This helps avoid errors from script/lei dying on ECONNRESET when a single lei-daemon is serving all tests when run via "make check-run". Instead of using some arbitrary limit, use INT_MAX and let the kernel clamp it (both Linux and FreeBSD do). There's no need to call listen() in LEI.pm, either, since Listener->new takes care of it.
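Passing INT_MAX as the backlog and letting the kernel clamp it (to net.core.somaxconn on Linux) looks like this in Python, using a Unix socket as lei does; make_unix_listener is a hypothetical helper, not the Listener class.

```python
import socket

INT_MAX = 2**31 - 1

def make_unix_listener(path):
    """Bind a Unix stream socket and request the largest possible backlog;
    the kernel silently clamps it to its own configured maximum."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)
    s.listen(INT_MAX)
    return s
```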
2021-07-28 treewide: s/sequential_shard/sequential-shard/g
The underscore variant was never documented and maintaining the difference between the command-line and internal hash is not worth it.
2021-07-25 lei rm-watch: new command to support removing watches
Pretty trivial since it just invokes "git-config". It's mainly intended to make shell completion easier.
2021-07-22 lei: auto-refresh watches in config, cancel missing
This makes behavior less surprising, since we no longer lose state on restarts and there's no need to manually run "lei add-watch" to re-enable watches. This also allows us to transparently handle changes if somebody edits the lei config file directly or via git-config(1).
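Refreshing watches from the config and cancelling missing ones amounts to a set difference between what is configured and what is running. A Python sketch; refresh_watches and its callbacks are hypothetical.

```python
def refresh_watches(configured, active, add_watch, cancel_watch):
    """Start configured-but-inactive watches; cancel active ones that
    disappeared from the config. Returns the new active set."""
    want, have = set(configured), set(active)
    for path in sorted(want - have):
        add_watch(path)
    for path in sorted(have - want):
        cancel_watch(path)
    return want
```

Running the same reconciliation at startup and after any config change covers both restarts and hand edits with one code path.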
2021-07-22 lei: start implementing inotify Maildir support
This allows lei to automatically note keyword (message flag) changes made to a Maildir and propagate them into lei/store:
    lei add-watch --state=tag-ro /path/to/Maildir
This doesn't persist across restarts, yet. In the future, it will be applied automatically to "lei q" output Maildirs by default (with an option to disable it). State values of tag-rw, index-<ro|rw>, import-<ro|rw> will all be supported for Maildir. This represents a major and fairly intrusive internal change, but the whole daemon-oriented design was meant to facilitate automatically monitoring (and propagating) Maildir/IMAP flag changes.
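Maildir encodes flags in the filename suffix ":2,<flags>", so an inotify rename event carries the new flags directly. A Python sketch of turning such a filename into keywords; the flag letters come from the Maildir convention, while the mapping to lei keyword names is an assumption for illustration.

```python
# Maildir flag letters; the keyword names on the right are ASSUMED
# for illustration (P=passed and T=trashed are deliberately ignored here).
FLAG2KW = {"D": "draft", "F": "flagged", "R": "answered", "S": "seen"}

def maildir_keywords(filename):
    """Return sorted keywords encoded in a Maildir filename's info suffix."""
    base = filename.rsplit("/", 1)[-1]
    _, sep, info = base.rpartition(":2,")
    if not sep:                  # e.g. messages still in new/ have no info
        return []
    return sorted({FLAG2KW[c] for c in info if c in FLAG2KW})
```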
2021-07-05 lei: drop workers on EOF from clients
Sometimes a user will get bored waiting for a command to finish, so ensure we drop (disconnect) workers in this case.
2021-07-03 lei inspect: help+completion for --dir option
It's the most generic name I could find for it since it can mean so many things...
2021-06-22 lei: use open() perlop for -C (chdir)
This is for consistency with the open() at initial accept, in case we hit a code path which expects Perl directory handles rather than "file handles". Both work with the chdir() perlop (fchdir(2), in our case).
2021-06-20 lei import: help + completion for --new-only
I've found it's very helpful for large IMAP folders.
2021-06-13 lei ls-mail-source: write through to URL folder cache
We'll be able to use this for shell completion for lei import, lcat, tag, etc. This also adds --url support for scripting purposes.
2021-06-13 lei: stop pager early on exit
This is necessary when using "ls-mail-source" on an unreachable IMAP server.
2021-06-12 lei ls-mail-source: list IMAP folders and NNTP groups
While other tools can provide the same functionality, having integration with git-credential is convenient, here. Caching and completion will be implemented separately.
2021-06-10 lei tag: less confusing warning about unimported messages
"unimported" is more meaningful than "missing", here. And instead of having every worker spew about unimported messages, we'll accumulate and only print one warning line. This necessitated altering ->DESTROY behavior and persisting the client socket within the $lei object itself, not just the PktOp consumer object.
2021-06-10 lei import: support --new-only for IMAP
Taking ~40s to synchronize a ~75K message IMAP folder is still a lot of time, so support an option to only touch new messages. This is similar to the "offlineimap -q" (quick) or "mbsync --new" switches, but lei already accepts "-q" as a shortcut for --quiet. "--new" could work, but "--new-only" might be more descriptive (or "--only-new"?), since the default fetch also fetches new messages.
v2: warn for non-IMAP sources; I'm not sure it's worth it for Maildir or other sources, yet. It will also make sense for MH and JMAP once we support them.
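One way a "new messages only" sync can work with IMAP: UIDs are assigned in strictly ascending order within a mailbox for a given UIDVALIDITY, so anything above the last UID seen is new, and a UIDVALIDITY change invalidates the saved state. A Python sketch; uids_to_fetch and its state dict are hypothetical, not lei's storage format.

```python
def uids_to_fetch(server_uids, uidvalidity, last):
    """Pick UIDs to fetch. last is None or a dict with the
    "uidvalidity" and highest "uid" recorded by the previous sync."""
    if last is None or last["uidvalidity"] != uidvalidity:
        return sorted(server_uids)   # no usable state: fetch everything
    return sorted(u for u in server_uids if u > last["uid"])
```

Against a real server the same filter would normally be expressed as a UID FETCH over the range "<last+1>:*" rather than by listing every UID first.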
2021-06-09 lei prune-mail-sync: new command to prune invalid sync data
This will be invoked automatically by "lei import" eventually, but it may make sense to expose as a separate command.
2021-06-08 lei import: speed up repeated Maildir imports
On a 4-core CPU, this speeds up "lei import" on a largish Maildir inbox with 75K messages from ~8 minutes down to ~40s. Parallelizing alone did not bring any improvement and may even hurt performance slightly, depending on CPU availability. However, creating the index on the "fid" and "name" columns in blob2name yields the full speedup on its own. Parallelizing IMAP makes more sense due to the fact most IMAP stores are non-local and subject to network latency. Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
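The effect of such an index can be checked with SQLite's EXPLAIN QUERY PLAN: a lookup by (fid, name) goes from a full table scan to an index search. A Python sketch; only the blob2name table name and the fid and name columns come from the commit, while the oidbin column, the query, and the idx_fid_name name are assumptions.

```python
import sqlite3

# ASSUMED schema: only blob2name, fid and name come from the commit message
SCHEMA = """
CREATE TABLE blob2name (
    oidbin BLOB NOT NULL,
    fid INTEGER NOT NULL,
    name VARCHAR NOT NULL
)
"""
LOOKUP = "SELECT oidbin FROM blob2name WHERE fid = ? AND name = ?"

def query_plan(conn, sql, params):
    # the last column of each EXPLAIN QUERY PLAN row is the readable detail
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    before = query_plan(conn, LOOKUP, (1, "cur/1:2,S"))
    conn.execute("CREATE INDEX IF NOT EXISTS idx_fid_name ON blob2name(fid, name)")
    after = query_plan(conn, LOOKUP, (1, "cur/1:2,S"))
    conn.close()
    return before, after
```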
2021-06-08 lei: generalize auxiliary WQ handling
op_wait_event is now more lei-specific since we no longer have to care about oneshot and use a synchronous loop. {ikw} (import-keywords) started a trend, but LeiPmdir (parallel Maildir) is an upcoming WQ class that will follow this idea. Eventually, {l2m} usage may be updated to follow this, too.
2021-06-08 lei: safety fix for multiple WQ classes
For commands utilizing multiple workers, this simple change generalizes the persistence mechanism and prevents lei->dclose from causing script/lei to exit if there are still in-flight workers. This ought to prevent read-after-write consistency problems that occasionally manifest in scripts (e.g. test cases) but usually go unnoticed in normal use.
2021-06-06 lei: don't drop WQ workers on normal exit
This is dangerous and causes race conditions on commands which utilize multiple workqueues.
2021-06-04 pkt_op: make pkt_do an OO method
This will make it easier to use for internal use such as managing Maildir and IMAP IDLE watches.
2021-06-03 lei import: speed up kw updates for old IMAP messages
On a 4-core CPU, this speeds up "lei import" on a largish IMAP inbox with 75K messages from ~21 minutes down to 40s. Parallelizing with the new LeiImportKw WQ worker class gives a near-linear speedup and brought the runtime down to ~5:40. The new idx_fid_uid index on the "fid" and "uid" columns of blob2num in mail_sync.sqlite3 brought us the final speedup. An additional index on over.sqlite3#xref3(oidbin) did not help, since idx_nntp already exists and speeds up the new ->oidbin_exists internal API. I initially experimented with a separate "lei import-kw" command but decided against it since it's useless outside of IMAP+JMAP and would require extra cognitive overhead for both users and hackers. So LeiImportKw is just a WQ worker used by "lei import" and not its own user-visible command. v2: fix ikw_done_wait arg handling (ugh, confusing API :x)