Date | Commit message (Collapse) |
|
Simplify our APIs and force dwaitpid() to work in async mode for
all lei workers. This avoids having lingering zombies for
parallel searches if one worker finishes soon before another.
The old distinction between "old" and "new" workers was
needlessly complex, error-prone, and embarrassingly bad.
We also never handled v2:// writers properly before on
Ctrl-C/Ctrl-Z (SIGINT/SIGTSTP), so add them to @WQ_KEYS
to ensure they get handled by $lei when appropriate.
|
|
By relying more on pgroups for remaining processes,
this lets us pause all curl+tail subprocesses with a single
kill(2) to avoid cluttering stderr.
We won't bother pausing the pigz/gzip/bzip2/xz compressor
processes nor cat-file processes, though, since those don't write
to the terminal and they idle soon after the workers react to
SIGSTOP.
AutoReap is hoisted out from TestCommon.pm. CLONE_SKIP
is gone since we won't be using Perl threads any time
soon (they're discouraged by the maintainers of Perl).
|
|
warn() is easier to augment with context information, and
frankly unavoidable in the presence of 3rd-party libraries
we don't control.
|
|
This is useful in finding the cause of deduplication bugs,
and possibly the cause of missing threads reported by
Konstantin in <20211001130527.z7eivotlgqbgetzz@meerkat.local>
usage:
u=https://yhbt.net/lore/all/87czop5j33.fsf@tynnyri.adurom.net/raw
lei mail-diff $u
|
|
This has several advantages:
* no need to use ipc.lock to protect a pipe for non-atomic writes
* ability to pass FDs. In another commit, this will let us
simplify lei->sto_done_request and pass newly-created
sockets to lei/store directly.
disadvantages:
- an extra pipe is required for rare messages over several
hundred KB; this is probably a non-issue, though
The performance delta is unknown, but I expect shards
(which remain pipes) to be the primary bottleneck IPC-wise
for lei/store.
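
For illustration only, here is a minimal Python sketch of passing a file
descriptor alongside a message. It assumes the change moves to a
message-oriented Unix socket (SOCK_SEQPACKET is used here); the function
name pass_fd_demo and the payloads are hypothetical, and this is not the
Perl code from the commit.

```python
import os
import socket

def pass_fd_demo():
    """Send one message plus the write end of a pipe over a Unix socket."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    r, w = os.pipe()
    try:
        # One sendmsg(2) carries both the payload and the descriptor.
        socket.send_fds(a, [b"take this pipe"], [w])
        msg, fds, _flags, _addr = socket.recv_fds(b, 1024, 1)
        received = fds[0]  # a new descriptor referring to the same pipe
        os.write(received, b"hello")
        os.close(received)
        return msg, os.read(r, 5)
    finally:
        os.close(r)
        os.close(w)
        a.close()
        b.close()

if __name__ == "__main__":
    print(pass_fd_demo())
```

Each send on such a socket arrives as one whole message, which is why no
lock is needed to keep concurrent writers from interleaving.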
|
|
Since 44917fdd24a8bec1 ("lei_mail_sync: do not use transactions"),
relying on lei/store to serialize access was a pointless endeavor.
Rely on flock(2) to serialize multiple writers since (in my
experience) it's the easiest way to deal with parallel writers
when using SQLite. This allows us to simplify existing callers
while speeding up 'lei refresh-mail-sync --all=local' by 5% or
so.
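
A rough Python sketch of the flock(2) approach, for illustration. The
lock-file suffix, the "seen" table, and the helper names are assumptions
made for this example and do not come from the lei sources.

```python
import fcntl
import os
import sqlite3
import tempfile
from contextlib import contextmanager

@contextmanager
def locked(lock_path):
    """Hold an exclusive flock(2) on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other writers finish
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock

def record(db_path, fid, name):
    """Insert one row while holding the lock file next to the database."""
    with locked(db_path + ".flock"):
        con = sqlite3.connect(db_path)
        try:
            con.execute("CREATE TABLE IF NOT EXISTS seen (fid INTEGER, name TEXT)")
            con.execute("INSERT INTO seen VALUES (?, ?)", (fid, name))
            con.commit()
        finally:
            con.close()

def count(db_path):
    """Return the number of rows written so far."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute("SELECT COUNT(*) FROM seen").fetchone()[0]
    finally:
        con.close()

if __name__ == "__main__":
    db = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
    for i in range(3):
        record(db, i, "msg%d" % i)
    print(count(db))
```

Because every writer takes the same lock before opening the database,
writers queue up in the kernel instead of colliding inside SQLite.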
|
|
There's no point in keeping mail_sync.sqlite3 entries around
if the folder is gone. We do keep saved-search configs around,
however, since somebody may decide to blow away a search and
start over.
|
|
I just made this mistake running "lei import" myself, so
I figure giving a hint makes sense, here.
|
|
I was calling "child_error(1, ...)" in a few places where I meant
to be calling "child_error(1 << 8, ...)" and inadvertently
triggering SIGHUP in script/lei. Since giving a zero exit code
to child_error makes no sense, just allow falsy values to
default to 1 << 8.
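
The difference between 1 and 1 << 8 comes from the wait(2)-style status
encoding: the low 7 bits hold a signal number and the next byte holds the
exit code. A small Python sketch of that decoding follows; decode_status
is a hypothetical helper for illustration, not the project's child_error.

```python
DEFAULT_STATUS = 1 << 8  # wait(2)-style encoding of "exit code 1"

def decode_status(status):
    """Split a wait(2)-style status into (kind, value)."""
    status = status or DEFAULT_STATUS  # falsy values default to exit code 1
    sig = status & 0x7F  # low 7 bits: terminating signal, if any
    if sig:
        return ("signal", sig)
    return ("exit", status >> 8)  # next byte: exit code

if __name__ == "__main__":
    print(decode_status(1))       # signal 1 is SIGHUP on POSIX systems
    print(decode_status(1 << 8))  # exit code 1
    print(decode_status(0))       # falls back to exit code 1
```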
|
|
This will be needed as we track changes in real-time, especially
for "lei index" since there's no storage involved.
|
|
Pretty trivial since it just invokes "git-config". It's mainly
intended to make shell completion easier.
|
|
This allows lei to automatically note keyword (message flag)
changes made to a Maildir and propagate them into lei/store:
lei add-watch --state=tag-ro /path/to/Maildir
This doesn't persist across restarts, yet. In the future,
it will be applied automatically to "lei q" output Maildirs
by default (with an option to disable it).
State values of tag-rw, index-<ro|rw>, import-<ro|rw> will all
be supported for Maildir.
This represents a fairly major internal change that's fairly
intrusive, but the whole daemon-oriented design was to
facilitate being able to automatically monitor (and propagate)
Maildir/IMAP flag changes.
|
|
This will simplify upcoming code for watches.
|
|
This will eventually be useful for supporting inotify watches
on Maildir. It will also allow users to script their own FS
watchers more easily.
|
|
While other tools can provide the same functionality, having
integration with git-credential is convenient, here. Caching
and completion will be implemented separately.
|
|
On a 4-core CPU, this speeds up "lei import" on a largish
Maildir inbox with 75K messages from ~8 minutes down to ~40s.
Parallelizing alone did not bring any improvement and may
even hurt performance slightly, depending on CPU availability.
However, creating an index on the "fid" and "name" columns in
blob2name yields the speedup noted above.
Parallelizing IMAP makes more sense due to the fact most IMAP
stores are non-local and subject to network latency.
Followup-to: bdecd7ed8e0dcf0b45491b947cd737ba8cfe38a3 ("lei import: speed up kw updates for old IMAP messages")
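
To illustrate why the index matters, this Python sketch asks SQLite for
its query plan with and without an index on (fid, name). The column list,
the index name idx_fid_name, and the query are assumptions made for the
example; the real blob2name schema may differ.

```python
import sqlite3

def lookup_plan(with_index):
    """Return SQLite's query plan text for a lookup by (fid, name)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: only the column names "fid" and "name" come
    # from the commit message.
    con.execute("CREATE TABLE blob2name (oidbin BLOB, fid INTEGER, name TEXT)")
    if with_index:
        con.execute("CREATE INDEX idx_fid_name ON blob2name (fid, name)")
    rows = con.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT oidbin FROM blob2name WHERE fid = ? AND name = ?",
        (1, "x"),
    ).fetchall()
    con.close()
    return " ".join(row[3] for row in rows)  # 4th column is the plan detail

if __name__ == "__main__":
    print(lookup_plan(False))  # a full table scan
    print(lookup_plan(True))   # a search using idx_fid_name
```

Without the index, every lookup scans the whole table, so the cost of
importing N messages grows roughly quadratically.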
|
|
This is needed for the upcoming "lei export-kw".
|
|
|
|
Sometimes a mailed patch is generated with non-ideal output,
(lacking context, noisy whitespace changes, etc.), or a user
wants to use the same external diff viewer they've configured
git to use.
Since we have SolverGit to regenerate arbitrary blobs from
patches, this new command allows us to regenerate a diff with
different options using the blobs SolverGit gives us.
The number of git-diff(1) options is mind-numbing, so it's
likely I missed some favorites or botched the getopt spec
translation.
This also fixes Inbox::base_url to check psgi.url_scheme
before attempting to generate URLs and avoid uninitialized
variable warnings. Oddly, the "lei blob" tests did not
trigger these uninitialized warnings.
Note: this will automatically import+index the message(s)
it's regenerating, because solver relies on being able
to lookup pre/postimage OIDs and read blobs.
|
|
Since completely purging blobs from git is slow, users may wish
to index messages in Maildirs (and eventually other local
storage) without storing data in git.
Much code from LeiImport and LeiInput is reused, and a new dummy
FakeImport class supplies a non-storing $im->add to minimize
changes to LeiStore.
The tricky part of this command is to support "lei import"
after a message has gone through "lei index". Relying on
$smsg->{bytes} == 0 (as we do for external-only vmd storage)
does not work here, since it would break searching for "z:"
byte-ranges when not using externals.
This eventually required PublicInbox::Import::add to use a
SharedKV to keep track of imported blobs and prevent
duplication.
|
|
I'm not sure how we'll distinguish JMAP vs read-only HTTPS,
yet; but we'll focus on currently-supported stuff, first.
|
|
I suspect there'll be more lei_input-only things in the future.
|
|
"lei inspect" also shows "mail-sync" as a field name
|
|
Mail::IMAPClient provides the ability to pass a pre-connected
Socket to it. We can rely on this functionality to use
IO::Socket::Socks in place of whatever socket class
Mail::IMAPClient chooses to use.
The --proxy=s is shared with curl(1), though we only support
socks5h:// at the moment. Is there any need for SOCKS4 or SOCKS5
without name resolution? Tor .onions require socks5h:// for
name resolution and to prevent data leakage.
|
|
Specifying a UIDVALIDITY value allows the user to enforce
a strict match and force failure. This necessitated changes
to NetReader to allow die() and make error reporting more
suitable for CLI usage rather than daemonized usage of -watch.
|
|
IMAPTracker has a UNIQUE constraint on the `url' column,
which may cause compatibility and/or rollback problems
in attempting to deal with UIDVALIDITY changes.
Having multiple sources of truth leads to confusion and bugs,
so relying on LeiMailSync exclusively ought to simplify things.
Furthermore, since LeiMailSync is only written to by LeiStore,
it is safer in that it won't mark a UID or article as imported
until git-fast-import has seen it, and the SQLite commit always
happens after "done\n" is sent to fast-import.
This mostly reverts recent commits to IMAPTracker to support
lei; those are:
1) commit 7632d8f7590daf70c65d4270e750c36552fa9389
("net_reader: restart on first UID when UIDVALIDITY changes")
2) commit 311a5d37ad275cd75b1e64d87827c4d13fe4bfab
("imap_tracker: prepare for use with lei").
This means public-inbox-watch will not change between 1.6 and
1.7: -watch stops synching a folder when UIDVALIDITY changes.
|
|
This gives "lei import", "lei tag", and similar commands
the ability to use URLs recognized by our PSGI frontend
directly.
This is more convenient than an equivalent shell pipeline
since "set -o pipefail" is not portable and errors may be
lost.
|
|
We aren't using it, yet, but the plan is to be able to use
this information to propagate keyword changes back to IMAP
and Maildir folders using some to-be-implemented command.
"lei inspect" is a half-baked new command to make testing this
change easier. It will be updated to support more SQLite+Xapian
introspection duties in the future, including public-inbox
things independent of lei.
|
|
Followup-to: 49b036771ef3bf45 ("lei_input: support compressed mboxes")
|
|
This saves some work and makes it easier to set volatile
metadata on a message at import time.
|
|
We'll eventually want lei_input users like "lei import" and
"lei tag" to support parallel reads.
|
|
No point in sending a command for every input when a
single one will do. We'll also trigger LeiStore->done
sooner in the worker rather than later.
|
|
Since "lei q" and "lei convert" already support writing these
compressed inboxes, it makes sense that all mbox readers support
them, as well.
Using compression is one reliable way to know an mboxrd or mboxo
hasn't been unexpectedly truncated.
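
A quick Python illustration of the truncation point, using gzip only. The
helper name is_complete and the sample mbox bytes are made up for this
example.

```python
import gzip
import zlib

def is_complete(blob):
    """Return True if blob decompresses as a whole gzip stream."""
    try:
        gzip.decompress(blob)
        return True
    except (EOFError, OSError, zlib.error):
        # A stream cut short lacks its end-of-stream marker and trailer.
        return False

if __name__ == "__main__":
    mbox = b"From a@example.com Thu Jan  1 00:00:00 1970\nSubject: hi\n\nbody\n" * 50
    blob = gzip.compress(mbox)
    print(is_complete(blob))
    print(is_complete(blob[: len(blob) // 2]))
```

An uncompressed mboxrd or mboxo cut at a message boundary looks valid,
whereas a cut gzip stream fails to decompress.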
|
|
".eml" is a suffix supported by (/usr/local)/etc/mime.types
on Debian and FreeBSD systems using the "mime-support" package.
".patch" is what "git format-patch" generates by default since
git v1.5.0 in 2007.
|
|
We can consistently open /dev/stdin correctly nowadays, so
drop the input_stdin and just use the normal ->path_to_fd
code path.
|
|
Since lei-daemon won't have the same FDs as the client, we
need to special-case these mappings and won't be able to open
arbitrary, non-standard FDs.
We also won't attempt to support /proc/self/fd/[0-2] since
that's a Linux-ism. /dev/fd/[0-2] and /dev/std{in,out,err}
are portable to FreeBSD, at least. mawk(1) also supports
/dev/std{out,err}, as does gawk(1) (which supports everything
we can support, and arbitrary /dev/fd/$FD).
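
The special-casing can be pictured with a short Python sketch that maps
the supported path names to descriptors 0-2 and rejects everything else.
The function name std_fd is hypothetical and the real mapping lives in
the Perl sources.

```python
import re

_STD_NAMES = {"in": 0, "out": 1, "err": 2}
_DEV_RE = re.compile(r"/dev/(?:fd/([0-2])|std(in|out|err))")

def std_fd(path):
    """Map /dev/fd/[0-2] or /dev/std{in,out,err} to 0, 1 or 2, else None."""
    m = _DEV_RE.fullmatch(path)
    if m is None:
        return None  # arbitrary FDs and /proc/self/fd/* are not mapped
    if m.group(1) is not None:
        return int(m.group(1))
    return _STD_NAMES[m.group(2)]

if __name__ == "__main__":
    for p in ("/dev/stdin", "/dev/fd/2", "/dev/fd/3", "/proc/self/fd/0"):
        print(p, std_fd(p))
```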
|
|
"lei convert" is actually a bit of the odd one, since
it uses lei2mail for auth, unlike the others.
|
|
Otherwise we could get nonsensical results if somebody tries
running "lei atfork_child" from the command-line.
|
|
Only tested for keywords and labels with file inputs, so far;
but it seems to do what it needs to do. There's a bit more
redundant code than I'd like, and more opportunities for code
sharing in the future.
"lei import" will be expanded to support +kw:$KEYWORD and
+L:$LABEL in the future.
|
|
This matches the long-standing behavior of public-inbox-mda,
public-inbox-learn and our other tools. It is useful because
mutt, "git format-patch", and likely other tools will
pipe a single message with a "From " header line, but with
no further "From " escaping or Content-Length: header.
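
As a sketch of the idea (in Python, not the project's Perl), dropping a
single leading "From " line from an otherwise unescaped message could
look like this. The helper name strip_from_line is hypothetical.

```python
def strip_from_line(raw):
    """Drop one leading mbox-style "From " line, if present."""
    if raw.startswith(b"From "):
        _, sep, rest = raw.partition(b"\n")
        return rest if sep else b""
    return raw  # no "From " line: leave the message untouched

if __name__ == "__main__":
    msg = b"From a@example.com Thu Jan  1 00:00:00 1970\nSubject: hi\n\nbody\n"
    print(strip_from_line(msg))
```

Only the first line is examined, so "From " lines inside the body are
left alone.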
|
|
This improves code regularity, and will let us deal with
the "RFC822" messages with "From " line that mutt pipes
to.
|
|
Relying on UNIVERSAL::can may cause internal helper methods
to be used, which can lead to failures or nonsensical results.
|
|
These commands accept mail the same way, and this forces
us to maintain consistent input format support between
commands.
We'll be using this for "lei mark", too.
|