Date | Commit message |
|
Only tested for keywords and labels with file inputs, so far;
but it seems to do what it needs to do. There's a bit more
redundant code than I'd like, and more opportunities for code
sharing in the future.
"lei import" will be expanded to support +kw:$KEYWORD and
+L:$LABEL in the future.
|
|
These commands accept mail the same way, and this forces
us to maintain consistent input format support between
commands.
We'll be using this for "lei mark", too.
|
|
This will be used for keyword (and label) storage for externals.
We'll be using this to ensure we don't redundantly auto-import
messages into lei/store if they're already in a local external
(they can still be imported explicitly via "lei import").
|
|
We'll try to share a bit more configuration with
extindex entries for WWW PSGI usage.
|
|
This was for quote-folding behavior we had long ago, but
it ended up just being yet another import test.
|
|
This saves over 100ms in t/lei-q-remote-import.t so far when
TMPDIR is on an SSD. If we can memoize inbox creation to save a
few dozen milliseconds every test, this could add up to
noticeable savings across our entire test suite.
|
|
Some poorly-configured MUAs will send application/octet-stream
even for text-only attachments. We can't expect all MUAs to be
configured with proper MIME types, and there is plenty of
historical mail that falls into this unfortunate category.
v2: simplify the check and ensure returned text is Perl "utf8"
|
|
This is intended to keep track of concepts with different terms
between NNTP, IMAP, config file, lei storage, and upcoming
JMAP support.
|
|
This will eventually be supported for other mail stores,
but Maildir is the easiest to test and support, here.
This lets us avoid a situation where flag changes get
lost between search results.
|
|
Instead of teaching the to-be-implemented "lei show" to search
threads/messages based on commits, this orthogonal sub-command is
designed to generate queries for use with "lei q --stdin".
URI-escaped query parameters may be generated with --uri for
HTTP(S) public-inbox instances, but otherwise the output is
designed for "lei q --stdin".
To find threads for a given git commit from a git worktree:
    lei p2q $COMMIT_OID | lei q --stdin -t ...
It can also read via --stdin|-:
    curl $INBOX_URL/$MSGID/raw | lei p2q - | lei q --stdin -t
Or from the filesystem:
    lei p2q $(git format-patch -1) | lei q --stdin -t
This defaults to only generating "dfpost:"-prefixed terms since
I've found those most useful for finding messages relating to a
commit. This is subject to change.
--want=s@ is a comma-separated or multi-value list of prefixes
that defaults to "dfpost7". Not all are implemented, yet, but
s, dfn, dfpre, and dfpost all seem to mostly work. Phrase
handling may need to be tweaked to work with Xapian.
OR, NEAR, ADJ, AND, NOT may be used with --want
(e.g. --want=dfpost,OR,dfn)
Prefixing the field prefix with '+' or '-' (e.g. --want=+dfpost)
generates "+dfpost:$EXTRACTED_OID" for Xapian. For non-boolean
search prefixes, wildcard (*) may also be supplied: (--want=dfn*)
For boolean search prefixes, suffixing the field prefix with a
digit (e.g. --want=dfpost7) provides a minimum length, allowing
truncated variations to be searched. This is helpful for
finding older messages as git chooses longer dfpost|dfpre
abbreviations as repos get larger.
Automatic date range generation is not implemented, yet.
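The minimum-length suffix described above can be illustrated with a
small Python sketch. This is not lei's Perl code: the function names
are hypothetical, and joining the truncated variations with OR is an
assumption about how such terms might be combined into a query.

```python
def truncated_terms(prefix, oid, min_len):
    """Return "prefix:OID" terms from the full OID down to min_len chars.

    Hypothetical helper: models "--want=dfpost7" as prefix="dfpost" and
    min_len=7, so shorter historical abbreviations are also searched.
    """
    if min_len < 1 or min_len > len(oid):
        min_len = len(oid)  # nothing to truncate; use the OID as given
    return [f"{prefix}:{oid[:n]}" for n in range(len(oid), min_len - 1, -1)]


def or_query(terms):
    """Join terms with OR (an assumed way of combining the variations)."""
    return " OR ".join(terms)


if __name__ == "__main__":
    print(or_query(truncated_terms("dfpost", "abcdef0123", 7)))
```

A 10-character abbreviation with a minimum of 7 yields four terms, one
for each length from 10 down to 7.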
|
|
While this diverges from mairix(1) behavior, it's the safer
option. We'll follow Debian policy by supporting fcntl and
dotlocks by default (in that order). Users who do not want
locking can use "--lock=none".
This will be used in a read-only capacity for watching
mailboxes for keyword updates via inotify or EVFILT_VNODE.
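A rough Python sketch of taking an fcntl lock and then a dotlock, in
that order, might look like the following. It is a simplified
illustration rather than the Perl implementation: lock_mbox and
unlock_mbox are hypothetical names, the dotlock is a plain O_EXCL
create (real dotlock code is usually more careful on NFS), and
nothing retries on contention.

```python
import fcntl
import os


def lock_mbox(path, methods=("fcntl", "dotlock")):
    """Open path for appending and take each requested lock in order.

    Returns (fd, dotlock_path_or_None).  methods=("none",) skips locking.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    dotlock = None
    try:
        for method in methods:
            if method == "fcntl":
                # POSIX record lock on the whole file, fail if contended
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            elif method == "dotlock":
                # O_EXCL makes creation fail if "$path.lock" already exists
                lock_fd = os.open(path + ".lock",
                                  os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
                os.close(lock_fd)
                dotlock = path + ".lock"
            elif method != "none":
                raise ValueError(f"unknown lock method: {method}")
    except BaseException:
        if dotlock:
            os.unlink(dotlock)
        os.close(fd)
        raise
    return fd, dotlock


def unlock_mbox(fd, dotlock):
    """Remove the dotlock (if any); closing fd drops the fcntl lock."""
    if dotlock:
        os.unlink(dotlock)
    os.close(fd)
```

Taking the locks in a fixed order matters when several programs use
both methods, since a consistent order avoids deadlocks between them.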
|
|
This can be used to quickly distinguish messages which were
direct hits when doing thread expansion vs messages that
were merely part of the same thread.
This is NOT mairix-derived behavior, but I occasionally found
it useful when looking at results in an MUA to know whether
a message was a direct hit or not.
This makes "-t" consistent with non-"-t" cases as far as keyword
reading goes.
|
|
This lets users avoid network traffic on subsequent searches at
the expense of local disk space. --no-import-remote may be
specified to reverse this trade-off for users with little
storage.
|
|
We can read NNTP in -watch and Net::NNTP is shipped with Perl5,
so lei import and convert have no excuse not to support NNTP
as a client.
Authentication is not tested, yet; but should be close to what
IMAP is like...
|
|
We prefer the IANA-registered form of URIs to avoid confusing
users, but the URI package has yet to support it.
cf. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983419
|
|
Requiring TEST_IMAP_WRITE_URL to be set to a writable IMAP
server URL isn't ideal, but it works for now until we have time
to set up a mock dovecot/cyrus/etc... instance for testing.
|
|
We need to ensure authentication failures and error codes get
propagated to the parent process(es) properly.
v2: update MANIFEST
v3: LeiAuth.pm ->_lei_cfg bit moved to a previous commit
|
|
This makes "lei import" more similar to "lei convert" and
allows importing from disparate sources simultaneously.
We'll also fix some ->child_error usage errors and make
the style of the code more similar to the "lei convert"
code.
v2: fix missing requires
|
|
This will make testing IMAP support for other commands easier, as
it doesn't write to lei/store at all. Like the pager and MUA,
"git credential" is always spawned by script/lei (and not
lei-daemon) so it has a controlling terminal for password
prompts.
v2: fix missing requires, correct test ordering
v3: ensure config exists for IMAP auth
|
|
-imapd won't support newsgroups ending with /\.[0-9]+\z/ since
it reserves those for partitioning inboxes into 50K slices.
So bump the home[0-9]+ version and switch to IMAP-friendly
newsgroup names.
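For illustration, the naming restriction can be checked with the same
pattern in Python (which spells Perl's \z as \Z). The function name
and the sample newsgroup names are made up for this sketch.

```python
import re

# Same pattern as /\.[0-9]+\z/ in the commit message above
PARTITION_SUFFIX = re.compile(r"\.[0-9]+\Z")


def imap_friendly(newsgroup):
    """True if the name does not end in a ".<digits>" slice suffix."""
    return PARTITION_SUFFIX.search(newsgroup) is None


if __name__ == "__main__":
    for name in ("inbox.test.1", "inbox.test.v2", "inbox.1.test"):
        print(name, imap_friendly(name))
```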
|
|
|
|
We'll be using some of this for IMAP and NNTP support in lei,
too. More will need to be done to improve code sharing and
reusability, soon, but this is a start.
|
|
MdirReader now handles files in "$MAILDIR/new" properly and
is stricter about what it accepts. eml_from_path is also
made robust against FIFOs while eliminating TOCTOU races
between stat(2) and open(2) calls.
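One common way to get both properties is to open first with
O_NONBLOCK and then fstat(2) the resulting descriptor, so the check
and the read refer to the same file. The Python sketch below shows
that general technique; read_regular_file is a hypothetical name and
this is not a transcription of eml_from_path.

```python
import os
import stat


def read_regular_file(path):
    """Return the file's bytes, or None if path is not a regular file."""
    try:
        # O_NONBLOCK keeps open(2) from hanging on a FIFO with no writer
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError:
        return None
    try:
        # fstat the descriptor we already hold: nothing can swap the
        # path between the check and the read
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            return None
        with os.fdopen(fd, "rb", closefd=False) as fh:
            return fh.read()
    finally:
        os.close(fd)
```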
|
|
We'll do more requires in the top-level lei-daemon process to
save work in workers. We can also work towards aborting on
user errors in lei-daemon rather than worker processes.
"lei import -f mbox*" is finally tested inside t/lei_to_mail.t
|
|
It seems to be working trivially, though I'm probably
going to split out Maildir reading into a separate
package rather than using LeiToMail.
|
|
We'll reword and improve formatting with non-breaking spaces
("\xa0") which are only replaced with SP after wrapping.
Some terminology is shortened (e.g. "URL_OR_PATHNAME" => "LOCATION")
to improve formatting.
This also enables completion for -h/--help and lets us
prioritize favored switch names while attempting to
satisfy users relying on muscle memory from other tools.
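The non-breaking space trick can be sketched in Python, whose
textwrap module only breaks lines at ASCII whitespace. The function
name and the help text below are invented for the example.

```python
import textwrap

NBSP = "\xa0"


def wrap_help(text, width=72):
    """Wrap text, keeping NBSP-joined words together on one line.

    textwrap does not treat "\xa0" as a break point, so it is only
    turned into a plain space after wrapping is finished.
    """
    lines = textwrap.wrap(text, width=width, break_long_words=False)
    return [line.replace(NBSP, " ") for line in lines]


if __name__ == "__main__":
    # hypothetical help text: "-o LOCATION" must not be split
    msg = f"write results with -o{NBSP}LOCATION instead of stdout"
    print("\n".join(wrap_help(msg, width=24)))
```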
|
|
This can be useful for users who want to clone and
mirror an existing public-inbox. This doesn't have
update support, yet, so users will need to run
"git fetch && public-inbox-index" for now.
|
|
This makes it easier for hackers to find daemon-specific
tests and forces us to always test both daemon and
oneshot mode.
|
|
We'll probably use this in many more existing places
and likely change non-lei tests to use it.
|
|
This is still overloaded with "lei q" stuff, but that's
somewhat inevitable.
|
|
This will make it easier to maintain and test lei going forward;
we need to be testing against existing read-only daemons. We'll
also save ourselves some boilerplate by exporting all the
Test::More methods directly in TestCommon.
We'll start using this by splitting out the latest "lei import"
tests into their own file.
|
|
Only tested with .eml files so far, but Maildir + IMAP
will be supported.
|
|
This will be useful on shared machines when a user doesn't want
search queries visible to other users looking at the ps(1)
output or similar.
|
|
This will allow us to use larger messages and accumulate
progress reports in the main daemon.
|
|
This allows us to avoid repeated open() and close() syscalls
and speeds up the new xt/stress-sharedkv.t maintainer test
by roughly 7%.
|
|
This will let us maximize the capability of our asynchronous
git API. This lets us avoid relying on EOF to notify lei2mail
workers; thus giving us the option of running fewer lei_xsearch
worker processes in parallel than local sources.
I tried using a synchronous git API; but even libgit2 in the
same process (to avoid the IPC cost) failed to match the
throughput afforded by this change. This is because libgit2 is
built (at least on Debian) with the SHA-1 collision code enabled
and ubc_check stuff was dominating my profiles.
|
|
[ew: s/mboxrd/mboxcl2/ since that's what mutt uses]
|
|
Add manpages for lei and the currently implemented subcommands. The
included options and their descriptions follow to a large degree the
--help output, dropping some options that are not currently wired up.
|
|
It's barely started, though I began writing this weeks ago;
I'm still unsure about some behavioral/usability things and
hoping work on lei(1) can flush them out.
|
|
I missed these during the merge :x
|
|
We'll do a better job canonicalizing URLs and path names
for proper matching. Of course, users may edit
the file outside of lei, so ensure we try both the canonicalized
and as-is form provided by the user.
I also don't think we'll need to store externals info in
MiscIdx; just the config file is fine.
|
|
All the augment and deduplication stuff seems to be working
based on unit tests. OpPipe is a nice general addition that
will probably make future state machines easier.
|
|
Most writes to stdout aren't atomic and we need locking to
prevent workers from interleaving and corrupting JSON output.
The one case stdout won't require locking is if it's pointed
to a regular file with O_APPEND; as POSIX O_APPEND semantics
guarantees atomicity.
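The regular-file-with-O_APPEND case can be detected with fstat(2) and
fcntl(F_GETFL). A minimal Python sketch of that check follows;
needs_lock is a hypothetical name, not a function from this codebase.

```python
import fcntl
import os
import stat


def needs_lock(fd):
    """True unless fd is a regular file opened with O_APPEND."""
    is_regular = stat.S_ISREG(os.fstat(fd).st_mode)
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return not (is_regular and flags & os.O_APPEND)


if __name__ == "__main__":
    # e.g. "script >>out.json" gives False, "script | less" gives True
    print(needs_lock(1))
```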
|
|
The new test ensures consistency between oneshot and
client/daemon users. Cancelling an in-progress result now also
stops xsearch workers to avoid wasted CPU and I/O.
Note the lei->atfork_child_wq usage changes; they work around
a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
This switches the internal protocol to SOCK_SEQPACKET AF_UNIX
sockets so the daemon-to-client messages which run the pager and
kill/exit the client script can no longer be merged.
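The difference from SOCK_STREAM is that SOCK_SEQPACKET preserves
message boundaries: each send is delivered as one recv. The Python
sketch below demonstrates this on Linux (some platforms lack
SOCK_SEQPACKET for AF_UNIX). The message contents are invented and do
not reflect lei's actual wire format.

```python
import socket


def seqpacket_demo():
    """Send two messages back-to-back; return them as received."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        a.send(b"exec pager")  # hypothetical daemon-to-client messages
        a.send(b"exit 0")
        # each recv returns exactly one message, even though both were
        # already queued; a stream socket could return them merged
        return [b.recv(4096), b.recv(4096)]
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(seqpacket_demo())
```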
|
|
This internal API is better suited for fork-friendliness (but
locking + dedupe still needs to be re-added).
Normal "json" is the default, though stream-friendly "concatjson"
and "jsonl" (AKA "ndjson" AKA "ldjson") all seem to be working
(though tests aren't working, yet).
For normal "json", the biggest downside is the necessity of a
trailing "null" element at the end of the array because of
parallel processes, since (AFAIK) regular JSON doesn't allow
trailing commas, unlike JavaScript.
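To illustrate the trailing "null": if every worker emits its element
followed by a comma, no worker needs to know whether it is last, and a
final "null" keeps the array valid JSON. A small Python sketch with
hypothetical function names and record contents:

```python
import json


def emit(results):
    """Build the array the way independent writers would.

    Every element is followed by ",\n"; the closing "null]" supplies
    the value JSON requires after the last comma.
    """
    out = ["["]
    for record in results:
        out.append(json.dumps(record) + ",\n")
    out.append("null]\n")
    return "".join(out)


def parse(text):
    """Parse the output and drop the trailing null placeholder."""
    array = json.loads(text)
    if not array or array[-1] is not None:
        raise ValueError("expected trailing null element")
    return array[:-1]


if __name__ == "__main__":
    print(emit([{"s": "hello"}, {"s": "world"}]), end="")
```

Consumers have to be prepared to skip the final null element.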
|
|
It's easier to make the code more generic by transferring
all four FDs (std(in|out|err) + socket) instead of omitting
stdin.
We'll be reading from stdin on some imports, and possibly
outputting to stdout, so omitting stdin now would needlessly
complicate things.
The differences between the IO::FDPass "1" code paths and the
"4" code paths used by Inline::C and Socket::MsgHdr are far too
much to support and test at the moment.
|
|
For another step in syscall reduction, we'll support
transferring 3 FDs and a buffer with a single sendmsg/recvmsg
syscall using Socket::MsgHdr if available.
Beyond script/lei itself, this will be used for internal IPC
between search backends (perhaps with SOCK_SEQPACKET). There's
a chance this could make it to the public-facing daemons, too.
This adds an optional dependency on the Socket::MsgHdr package,
available as libsocket-msghdr-perl on Debian-based distros
(but not CentOS 7.x and FreeBSD 11.x, at least).
Our Inline::C version in PublicInbox::Spawn remains the last
choice for script/lei due to the high startup time, and
IO::FDPass remains supported for non-Debian distros.
Since the socket name prefix changes from 3 to 4, we'll also
take this opportunity to make the argv+env buffer transfer less
error-prone by relying on argc instead of designated delimiters.
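As a rough illustration of both ideas (several FDs plus a buffer in
one sendmsg, and an argc-prefixed buffer instead of delimiters between
argv and env), here is a Python sketch using SCM_RIGHTS. The buffer
layout and function names are assumptions made for the example; they
are not the actual lei wire format.

```python
import array
import os
import socket


def send_request(sock, fds, argv, env):
    """Send fds and an argc-prefixed argv+env buffer in one sendmsg."""
    # assumed layout: "argc\0arg1\0...\0argN\0KEY=VAL\0..."
    parts = [str(len(argv))] + list(argv)
    parts += [f"{k}={v}" for k, v in env.items()]
    buf = "\0".join(parts).encode()
    rights = array.array("i", fds).tobytes()
    sock.sendmsg([buf], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)])


def recv_request(sock, nfds=3, bufsize=65536):
    """Receive (fds, argv, env) from a single recvmsg call."""
    fds = array.array("i")
    buf, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(nfds * fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # ignore any truncated trailing bytes
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    parts = buf.decode().split("\0")
    argc = int(parts[0])  # argc says where argv ends and env begins
    argv = parts[1:1 + argc]
    env = dict(p.split("=", 1) for p in parts[1 + argc:])
    return list(fds), argv, env
```

Because argc is explicit, arguments which happen to look like a
delimiter cannot be mistaken for the boundary between argv and env.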
|
|
Parallelism and interactivity with pager + SIGPIPE need work;
but results are shown and phrase search works without shell
users having to apply Xapian quoting rules on top of standard
shell quoting.
|
|
The words "extinbox" and "extindex" are too close and easy to
confuse with each other. Rename "extinbox" to "external", since
these could be IMAP, JMAP or other non-public-inbox search APIs.
Link: https://public-inbox.org/meta/20201226112649.GB6226@dcvr/
|
|
I intend to use this with LeiStore when importing from multiple
slow sources at once (e.g. curl, IMAP, etc). This is because
over.sqlite3 can only have a single writer, and we'll have
several slow readers running in parallel.
Watch and SearchIdxShard should also be able to use this code
in the future, but this will be proven with LeiStore, first.
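The general shape of "several slow readers, one writer" can be
sketched in Python with a queue feeding a single SQLite writer.
Threads stand in for the separate processes described above, and the
table layout, function name and source names are all invented for
the example.

```python
import queue
import sqlite3
import threading


def import_all(sources, db_path=":memory:"):
    """Read each source in parallel; write rows from a single thread.

    sources maps a name to an iterable of message bodies.
    Returns the number of rows stored.
    """
    q = queue.Queue()
    done = object()  # sentinel each reader sends when it is finished
    result = {}

    def writer():
        # only this thread ever touches the database
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS msg (src TEXT, body TEXT)")
        remaining = len(sources)
        while remaining:
            item = q.get()
            if item is done:
                remaining -= 1
                continue
            db.execute("INSERT INTO msg VALUES (?, ?)", item)
        db.commit()
        result["count"] = db.execute("SELECT COUNT(*) FROM msg").fetchone()[0]
        db.close()

    def reader(name, messages):
        for body in messages:
            q.put((name, body))
        q.put(done)

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader, args=kv)
                for kv in sources.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["count"]
```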
|