Date | Commit message |
|
Not documented, yet, but it runs...
|
|
But warn on it. This lets us test new or throwaway commands more
easily, since we don't have to start a new POD for everything we
want to dump in script/.
|
|
In case we want to reuse code with ExtSearchIdx or V2Writable.
|
|
This will let us work consistently with both existing inboxes
and external indices.
|
|
A couple more things to prepare us to run syncs on
both v1 and v2 inboxes.
|
|
We'll be needing it in ExtSearchIdx for the next commit.
|
|
Now that the V2Writable code is more generic, we can
sync with it to use `units' which represent either
a v2 epoch or an entire v1 inbox.
|
|
We'll be validating against this in the future to stop
bugs from creeping in.
|
|
Moved to per-epoch "units".
|
|
And clearly label it. We may try to reuse some of this for v1
indexing code paths.
|
|
We'll use `index_oid' and `unindex_oid' as our method names
so V2Writable methods may use `$self->can' to access them.
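A rough Python analogue of the `$self->can' lookup described above. The class names `BaseWriter' and `ExtIndexer' and the helper `callback_for' are hypothetical, and the real code is Perl, so this is only a sketch of the idea: resolve `index_oid' or `unindex_oid' by name at runtime so that a subclass override is the one that gets called.

```python
class BaseWriter:
    """Hypothetical stand-in for a V2Writable-like class."""

    def index_oid(self, oid):
        return f"base indexed {oid}"

    def unindex_oid(self, oid):
        return f"base unindexed {oid}"

    def callback_for(self, name):
        # Similar in spirit to Perl's `$self->can($name)`: look the method
        # up by name on this object, returning None when it is missing.
        return getattr(self, name, None)


class ExtIndexer(BaseWriter):
    """Hypothetical subclass that overrides only index_oid."""

    def index_oid(self, oid):
        return f"ext indexed {oid}"


def run_sync(writer, oids):
    # The caller never names a class; it only asks the object for a callback.
    index_cb = writer.callback_for("index_oid")
    return [index_cb(oid) for oid in oids]
```

With this shape, shared sync code can be handed either object and will pick up whichever `index_oid' the object actually provides.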
|
|
This will let us use it from ExtSearchIdx.
|
|
This will allow ExtSearchIdx to override or reuse them more
easily. Unfortunately we lose prototype validation, but that
seems to be discouraged anyways given the 'signatures' feature
in Perl 5.20+.
|
|
This will make it easier to reuse some indexing code for ExtSearchIdx.
|
|
Using `->can(method)' allows subclasses to override `index_oid'
and `unindex_oid' methods.
|
|
We want to reuse this code for ExtSearchIdx, eventually.
|
|
Since we store {ibx} in $sync state, we no longer have to
pass it as an argument to log2stack.
|
|
This will allow reuse with ExtSearchIdx.
|
|
Having a special init path for external indices is probably
easier than further overloading SearchIdx->new initialization
to work without an Inbox object.
|
|
Not yet tested, but Perl compiles it!
|
|
Using `O' (owner) here (according to Xapian omega's
termprefixes.rst) since we could say the newsgroup or inbox is
the owner of the given message.
|
|
It compiles...
|
|
ExtSearchIdx will not have Msgmap, since it may index
non-email blobs in the future (it'll still be usable
with IMAP, but not NNTP).
|
|
"remote" used to imply "child process on the same machine" which
was somewhat nonsensical, anyways. And OverIdx has been in the
same process since v2 was finalized. So use the suffix "aux"
for "auxiliary" since it can be safely jettisoned without
breaking URLs.
|
|
This is preferable to open-coding "newsgroup // inboxdir" everywhere.
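As an illustration only, here is a Python analogue of wrapping the "newsgroup // inboxdir" fallback in one helper. The function name `inbox_key' is hypothetical and not taken from the source.

```python
def inbox_key(newsgroup, inboxdir):
    # Perl's `//` (defined-or) falls back only when the left side is undef,
    # so an empty string still counts as a value. `is not None` mirrors
    # that, whereas a plain `or` would also fall back on "".
    return newsgroup if newsgroup is not None else inboxdir
```

Callers then ask for the key in one place instead of repeating the fallback expression.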
|
|
We'll be using per-sync-state {ibx} refs instead, so make parts
of the v2 indexing code less dependent on $self->{ibx} where
$self is a V2Writable object.
|
|
Since external indices won't have msgmap.sqlite3, we'll need to
store last_commit-* metadata in over.sqlite3 instead. This
has longer limits to account for path names or newsgroup names
stored in keys.
We'll also rely on built-in counters for Xapian document IDs,
since msgmap.sqlite3 no longer provides an AUTOINCREMENT column.
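A minimal sketch of keeping last_commit-* style metadata as key/value rows in an SQLite file, written in Python for illustration. The table name `meta', its columns, and the key layout used in the usage note are assumptions; the actual over.sqlite3 schema is not shown in the source.

```python
import sqlite3


def open_meta(path=":memory:"):
    db = sqlite3.connect(path)
    # Hypothetical table: one row per key. TEXT has no fixed length, so
    # long path names or newsgroup names fit inside the key.
    db.execute(
        "CREATE TABLE IF NOT EXISTS meta ("
        "key TEXT PRIMARY KEY, val TEXT NOT NULL)"
    )
    return db


def set_meta(db, key, val):
    # Overwrite any previous value stored under the same key.
    db.execute("INSERT OR REPLACE INTO meta (key, val) VALUES (?, ?)", (key, val))
    db.commit()


def get_meta(db, key):
    row = db.execute("SELECT val FROM meta WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None
```

For example, `set_meta(db, "last_commit-inbox.test", oid)' would record the last indexed commit for one inbox under a made-up key layout.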
|
|
This will be needed for ExtSearchIdx which doesn't have a
persistent PublicInbox::Inbox object.
|
|
This will make it easier to use in ExtSearchIdx.
|
|
We don't need to keep it in code paths which are guaranteed to
only see PublicInbox::Eml (and not Email::MIME or PublicInbox::MIME
which did not round-trip properly). However, we must set
{raw_bytes} since PublicInbox::Eml may add an extra "\n" for
rare messages with no bodies.
|
|
We'll be reusing this for external indices and possibly
other places.
|
|
External indices won't have $self->{ibx} since they need to
deal with multiple inboxes. We can also hoist out
->parallel_init to make it easier to distinguish the
non-parallel control flow.
|
|
This will be used to track cross-posted messages in the
external/detached index.
|
|
We can simplify callers by using $self->{xpfx} instead of
passing another arg on the stack.
|
|
We'll try to reuse as much V2Writable code as possible for
external indices, but the way "last_commit" info is stored
must be different as external indices will deal with last_commit
info for multiple inboxes.
|
|
This will make it easier to share code with ExtSearchIdx.
|
|
This will be used by external/detached indices, too.
|
|
This will provide a similar API to PublicInbox::Inbox for
read-only WWW, -imapd, and -nntpd interfaces.
|
|
We'll be using this in detached (ext) Xapian indexes
for cross-inbox search.
|
|
We linkify these in the WWW UI, and will support them in other
places. These URL schemes may end up being stored in
external/detached indices for indexing non-git-based mail
stores.
|
|
Perhaps some NNTP clients would be unhappy with the old value
"y". So use a bit more bandwidth+space to use the server-name
and historical "!not-for-mail" tail-entry to better conform to
a published RFC.
Reported-by: Andrey Melnikov <temnota.am@gmail.com>
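For illustration, a tiny Python sketch of the header shape described above: the server name followed by the "!not-for-mail" tail-entry. The function name `path_header' and the example host name are hypothetical.

```python
def path_header(server_name):
    # "!" separates path entries; "not-for-mail" is the historical
    # tail-entry mentioned in the commit message above.
    return f"Path: {server_name}!not-for-mail"
```

Calling `path_header("news.example.com")' yields "Path: news.example.com!not-for-mail" in place of the old "y" value.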
|
|
...instead of spaces. This is specified in RFC 5536 3.1.4.
Include references to RFC 1036, 5536 and 5537 in our docs while
we're at it.
Reported-by: Andrey Melnikov <temnota.am@gmail.com>
Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
|
|
Apparently they happen (triggered by my -imapd instance), so
bail out by closing the underlying socket rather than stopping
the event loop and daemon process.
|
|
Fixes: 6550226296e9db79 ("xt: remove eml_check_roundtrip")
|
|
If there's no body ({bdy} field), ->each_part sets the {bdy}
field to "\n" and the ->as_string result afterwards is one
"\n" byte longer than the original.
It's not worth extra cycles in common ->each_part calls to
ensure 100% round-trip matches of header-only messages (which
are likely spam), especially when the only difference is a
trailing "\n".
|
|
->cat_async and ->check_async may trigger each other (in future
callers) while waiting, so we need a unified method to ensure
both complete. This doesn't affect current code, but allows us
to slightly simplify existing callers.
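The idea of one wait method covering both queues can be sketched in Python as follows. The class `AsyncGit', the method name `wait_all', and the two-queue layout are assumptions made for illustration; no real git process is involved.

```python
from collections import deque


class AsyncGit:
    """Toy model: requests complete as soon as they are stepped."""

    def __init__(self):
        self.cat_q = deque()    # pending (callback, arg) for cat requests
        self.check_q = deque()  # pending (callback, arg) for check requests

    def cat_async(self, arg, cb):
        self.cat_q.append((cb, arg))

    def check_async(self, arg, cb):
        self.check_q.append((cb, arg))

    def wait_all(self):
        # A callback from one queue may enqueue onto the other, so keep
        # looping until both are empty instead of draining each queue once.
        while self.cat_q or self.check_q:
            q = self.cat_q if self.cat_q else self.check_q
            cb, arg = q.popleft()
            cb(self, arg)
```

Draining the two queues one after the other, once each, could return while a request added by a late callback is still pending; the combined loop avoids that.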
|
|
Once again we'll need O_APPEND on a temporary file, so note that
we support it here, since Perl 5.32 is way too new to depend on
our users having it.
|
|
We need to loop the inflight check for nested callback
invocations to ensure we don't clog the pipe that feeds
`git cat-file'.
This bug was obscured by the fact that we're already
accounting for 64-char git OIDs with SHA-256 in the
pipe space calculation; perhaps we shouldn't do that.
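A small Python model of why the inflight check has to loop. The class `CatQueue', the byte budget, and the immediate completion of requests are assumptions for illustration; the 65-byte request size reflects a 64-char OID plus a newline.

```python
from collections import deque

REQ_BYTES = 65  # 64 hex chars (SHA-256 OID) plus "\n"


class CatQueue:
    def __init__(self, budget):
        self.budget = budget      # assumed pipe space, in bytes
        self.inflight = deque()   # requests written but not yet completed
        self.peak = 0             # highest inflight count seen

    def _step(self):
        # Complete the oldest request; its callback may enqueue more.
        oid, cb = self.inflight.popleft()
        cb(self, oid)

    def cat_async(self, oid, cb):
        # `while`, not `if`: a callback run by _step may itself call
        # cat_async and refill the queue, so a single step does not
        # guarantee there is room for this request.
        while (len(self.inflight) + 1) * REQ_BYTES > self.budget:
            self._step()
        self.inflight.append((oid, cb))
        self.peak = max(self.peak, len(self.inflight))

    def wait(self):
        while self.inflight:
            self._step()
```

With a budget of two requests and callbacks that each enqueue two more, an `if' in place of the `while' would let a third request into the queue after a nested callback refilled it.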
|
|
For external indices, we'll need to support nested cat_async
invocations to deduplicate cross-posted messages.
Thus we need to ensure we do not clobber the {inflight*} queues
while stepping through and ensure {cat_rbuf} is stored before
invoking callbacks.
This fixes the ->cat_async-only case, but does not yet
account for the mix of ->check_async interspersed with
->cat_async calls. More work will be needed on that
front at a later date.
|
|
It's currently not a problem as ->destroy doesn't
happen for no reason, but we'll need to ensure future uses of
->destroy correctly discard the check_async buffer.
|