Date | Commit message |
|
Only the lei/store process should be writing to files/DBs
in lei/store.
|
|
This is similar to "public-inbox-learn rm", but it's
possible to point an entire Maildir/IMAP/mbox*/newsgroup
at it.
|
|
The cost of supporting separate code paths between oneshot and
daemon isn't worth the trouble; especially if there are more
users to support. The test suite time nearly doubles with
oneshot, so that's hurting developer productivity.
FD passing is currently required to work efficiently with
remote HTTP(S) queries which return large messages, as seen in
commit 708b182a57373172f5523f3dc297659d58e03b58
("ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case").
Additionally, upcoming support for IMAP IDLE and inotify-based
monitoring of Maildirs cannot work properly without a background
daemon.
|
|
Since completely purging blobs from git is slow, users may wish
to index messages in Maildirs (and eventually other local
storage) without storing data in git.
Much code from LeiImport and LeiInput is reused, and a new dummy
FakeImport class supplies a non-storing $im->add to minimize
changes to LeiStore.
The tricky part of this command is to support "lei import"
after a message has gone through "lei index". Relying on
$smsg->{bytes} == 0 (as we do for external-only vmd storage)
does not work here, since it would break searching for "z:"
byte-ranges when not using externals.
This eventually required PublicInbox::Import::add to use a
SharedKV to keep track of imported blobs and prevent
duplication.
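A minimal sketch of the duplicate-prevention idea, assuming blobs are
tracked by their git object ID. A plain in-memory hash stands in for
the SharedKV here, and blob_oid, add_once and the $store_cb callback
are hypothetical names used only for illustration; this is not the
PublicInbox::Import code.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical stand-in for a persistent key-value store of imported blobs.
my %imported;

# git computes a blob's object ID as SHA-1 over "blob <size>\0<content>".
# $raw is assumed to be a byte string, so length() counts bytes.
sub blob_oid {
	my ($raw) = @_;
	sha1_hex('blob '.length($raw)."\0".$raw);
}

# Returns the OID when $raw is stored for the first time, and undef if
# the same blob was already imported.  $store_cb is a hypothetical
# callback which performs the actual write.
sub add_once {
	my ($raw, $store_cb) = @_;
	my $oid = blob_oid($raw);
	return if $imported{$oid}++;
	$store_cb->($oid, $raw);
	$oid;
}
```

Keying on the object ID rather than on the byte count leaves
$smsg->{bytes} untouched, so byte-range searches are unaffected.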
|
|
This will allow keyword updates from other folders to propagate
to folders where search results may be duplicated.
|
|
Prior to this change, it was possible for oneshot lei processes
to race on epoch creation/rollover. lei-daemon normally
prevents the problem by funnelling all writes to a single
socket, but oneshot lei has no such protection.
|
|
"lei import" is probably the only place where users
might care about warnings.
|
|
We aren't using it, yet, but the plan is to be able to use
this information to propagate keyword changes back to IMAP
and Maildir folders using some to-be-implemented command.
"lei inspect" is a half-baked new command to make testing this
change easier. It will be updated to support more SQLite+Xapian
introspection duties in the future, including public-inbox
things independent of lei.
|
|
It's nicer in case a user transfers lei/store across machines
and wants a way to track when/where they imported something.
|
|
Since every command that writes to lei/store calls ->done
to commit its output, we can rely on that to return a
pathname for a readable file with errors in it.
Errors can still get crossed up if multiple lei commands
are writing to the store at once, but this reduces the delay
in seeing them and ensures they won't be seen when somebody
is attempting to use shell completion.
|
|
We'll just let the ExtSearchIdx code handle this uncommon case
by doing a full commit.
|
|
It's needless noise when doing augment and output preparation
and shows up way too late and out-of-band with lei-daemon.
|
|
lei_store contents aren't intended to become public, so there's
no point in nagging users for their email address for git
committer information like git does.
|
|
There's no point in adding vmd information for an external
message if it was never stored and there's no vmd at all.
We also don't need to check _docids_for for similar messages,
either, since we always check lse->kw_changed, first.
|
|
We can tweak lse->kw_changed to return docids and reduce IPC
traffic and reduce work the lei/store worker needs to do.
|
|
"lei q" now displays labels in JSON output, and "lei mark"
can add or remove labels for any messages.
"lei ls-label" is supported, too.
Unfortunately, "lei q" won't handle "kw:" or "L:" for
external messages; they must be imported first.
|
|
We'll prioritize the last two components of the path name
("lei/store") since that's how I often refer to the on-disk
location. Then, show the XDG_DATA_HOME it belongs to in case
a user changes HOME or XDG_* for testing purposes.
|
|
Otherwise we could get nonsensical results if somebody tries
running "lei atfork_child" from the command-line.
|
|
Only tested for keywords and labels with file inputs, so far;
but it seems to do what it needs to do. There's a bit more
redundant code than I'd like, and more opportunities for code
sharing in the future.
"lei import" will be expanded to support +kw:$KEYWORD and
+L:$LABEL in the future.
|
|
Keyword storage for external-only messages was preventing
messages from being explicitly imported. Teach lei_store
to vivify keyword-only entries into fully-indexed messages
on import.
|
|
"lei q" now preserves changes to per-message keywords across
invocations when its --output (Maildir or mbox) is reused
(with or without --augment).
In the future, these changes will be monitored via inotify,
EVFILT_VNODE or IMAP IDLE, too.
Unfortunately, this currently prevents "lei import" from ever
importing a message that's in an external. That will be fixed
in a future change.
|
|
This will be used for keyword (and label) storage for externals.
We'll be using this to ensure we don't redundantly auto-import
messages into lei/store if they're already in a local external
(they can still be imported explicitly via "lei import").
|
|
This was causing errors in a mass keyword import patch
I'm working on.
|
|
Since keywords and mailboxes (AKA labels) are separate things in
JMAP; and only keywords can map reliably to Maildir and mbox;
we'll keep them separate in our internal data representations,
too.
I initially wanted to call this just "meta" for "metadata", but
that might be confused with our mailing list name. "metadata"
is already used in Xapian's own API, to add another layer of
confusion.
"tags" was also considered, but probably confusing to notmuch
users since our "labels" are analogous to "tags" in notmuch,
and notmuch doesn't seem to cover "keywords" separately...
So "vmd" it is, since we haven't used this particular
three-letter-abbreviation anywhere before; and "volatile" seems
like a good description of this metadata since everything else
up to this point has been mostly WORM (write-once, read-many).
|
|
It's redundant and the same functionality is in MdirReader.
|
|
MboxReader is a more appropriate place for it than LeiStore.
|
|
We only want to auto import messages that are exclusively in
remote externals. Messages in local externals are not
auto-imported to save space and reduce wear on storage devices.
|
|
Having a one-off Maildir functionality in LeiStore doesn't seem
worth the maintenance burden, especially given an upcoming
change to skip trashed messages.
I expect this will hurt performance slightly with extra IPC
overhead for the socket copy, but "lei import" may eventually
become rare or at least not hit messages redundantly.
|
|
This will eventually be supported for other mail stores,
but Maildir is the easiest to test and support, here.
This lets us avoid a situation where flag changes get
lost between search results.
|
|
Since eidx_init updates ALL.git/objects/info/alternates, we need
to ensure new epochs we create from LeiStore->importer exist
before eidx_init writes alternates.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/8735xou0gq.fsf@kyleam.com/
|
|
It seems to be working trivially, though I'm probably
going to split out Maildir reading into a separate
package rather than using LeiToMail.
|
|
Only tested with .eml files so far, but Maildir + IMAP
will be supported.
|
|
Parallelism and interactivity with pager + SIGPIPE need work;
but results are shown and phrase search works without shell
users having to apply Xapian quoting rules on top of standard
shell quoting.
|
|
We don't need to be keeping the raw message around after it hits
git. Shard work now relies on Storable (or Sereal) and all of
the indexing code relies on the Email::MIME-like API of Eml to
access interesting parts of the message.
Similarly, smsg->{raw_bytes} is no longer carried around and we
do the CRLF adjustment when setting smsg->{bytes}.
There's also a small simplification to t/import.t while
we're in the area to use xqx instead of spawn/popen_rd.
|
|
Since Storable and Sereal are designed for lossless
serialization, we'll just pass $eml objects to whatever process
is running SearchIdx.
|
|
We can remove some now-pointless wrapper functions by using
->ipc_do in even more places.
|
|
While the changes to git->qx/git->popen from commit 171a9c24022ad7ef
will be useful for the lei daemon, hiding git error messages from
actual users is probably wrong and we'll just localize GIT_*
vars for testing.
|
|
Hopefully this will make it easier to spot dependency
bugs in the future.
|
|
$git->qx and $git->popen now accept $env and $opt for redirects
like lower-level popen_rd. This may be beneficial in other
places.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak.
|
|
For personal mail, unsent draft messages are a common source of
messages without Message-IDs.
|
|
Add a ->set_eml method which can be a useful fire-and-forget
way of either adding new files to store OR setting keywords
on them.
When seeing brand-new messages, add_eml can afford to return
more information in the smsg instead of just the OID.
|
|
I intend to use this with LeiStore when importing from multiple
slow sources at once (e.g. curl, IMAP, etc). This is because
over.sqlite3 can only have a single writer, and we'll have
several slow readers running in parallel.
Watch and SearchIdxShard should also be able to use this code
in the future, but this will be proven with LeiStore, first.
|
|
In retrospect, per-machine HEADs was a bad idea because users
of removable storage would be thrown off when moving storage
between different machines.
This is only a partial revert, the Import::init_bare change to
support alternate head names still exists because we may use it
for other reasons.
|
|
It may be helpful to identify the source of messages
and perhaps avoid conflicting history.
On the other hand, this may be a terrible idea for users who
move portable storage (e.g. USB sticks) across computers...
|
|
Dovecot, mutt, and likely much other software support mbox
Status/X-Status headers. Ensure we have a way to extract these
headers as JMAP-compatible keywords before removing them for git
storage.
->add_eml now accepts setting keywords at import time,
and will probably be called like this:

	$lst->add_eml($eml, $lst->mbox_keywords($eml));
	$lst->add_eml($eml, $lst->maildir_keywords($fn));
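As a rough illustration of this kind of extraction, the sketch below
maps mbox Status/X-Status letters and Maildir info flags to keyword
names (seen, answered, flagged, draft). The function names
sketch_mbox_keywords and sketch_maildir_keywords and both lookup tables
are hypothetical; they work on plain strings and are not the LeiStore
methods, which take an $eml or a filename. The letters follow the
common mbox and Maildir conventions.

```perl
use strict;
use warnings;

# Hypothetical lookup tables; unknown letters are ignored.
my %MBOX_KW = (R => 'seen', A => 'answered', F => 'flagged', T => 'draft');
my %MDIR_KW = (S => 'seen', R => 'answered', F => 'flagged', D => 'draft');

# $status and $x_status are the raw Status and X-Status header values
# (either may be undef).  Returns an arrayref of sorted keyword names.
sub sketch_mbox_keywords {
	my ($status, $x_status) = @_;
	my %kw;
	for my $c (split(//, ($status // '').($x_status // ''))) {
		$kw{$MBOX_KW{$c}} = 1 if exists $MBOX_KW{$c};
	}
	[ sort keys %kw ];
}

# $fn is a Maildir filename such as "cur/1234.host:2,FS".  Files
# without a ":2," info suffix (e.g. under new/) yield no keywords.
sub sketch_maildir_keywords {
	my ($fn) = @_;
	my %kw;
	if ($fn =~ /:2,([A-Za-z]*)\z/) {
		for my $c (split(//, $1)) {
			$kw{$MDIR_KW{$c}} = 1 if exists $MDIR_KW{$c};
		}
	}
	[ sort keys %kw ];
}
```

Note that "R" means "read" in an mbox Status header but "replied" in a
Maildir filename, which is why the two formats need separate tables.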
|
|
It's pretty meaningless, since probably nobody notices committer
info and we extract author info from individual emails, anyway.
|
|
This follows how we detect the max epoch for v2 and shard count
in Xapian.
|
|
There's a bunch of work in here as the foundations are being
fleshed out. One of the UI/UX goals is to make it easy to keep
built-in help and shell completions consistent.
|
|
Still unstable, this builds off the equally unstable extindex :P
This will be used for caching/memoization of traditional mail
stores (IMAP, Maildir, etc) while providing indexing via Xapian,
along with compression, and checksumming from git.
Most notably, this adds the ability to add/remove per-message
keywords (draft, seen, flagged, answered) as described in the
JMAP specification (RFC 8621 section 4.1.1).
We'll use `.' (a single period) as an $eidx_key since it's an
invalid {inboxdir} or {newsgroup} name.
|