Date Commit message
2020-11-07 script: add preliminary eindex implementation
Not documented yet, but it runs...
2020-11-07 Makefile.PL: do not build manpage if POD is missing
But warn about it. This lets us test new or throwaway commands more easily, since we don't have to start a new POD for everything we want to dump in script/.
2020-11-07 searchidx: favor $sync->{ibx} (over $self->{ibx})
In case we want to reuse code with ExtSearchIdx or V2Writable.
2020-11-07 searchidx: reduce inbox-dependency, wrap ->with_umask
This will let us work consistently with both existing inboxes and external indices.
2020-11-07 extsearchidx: sync updates
A couple more things to prepare us to run syncs on both v1 and v2 inboxes.
2020-11-07 searchidx: export prepare_stack
We'll be needing it in ExtSearchIdx for the next commit.
2020-11-07 extsearchidx: sync unit updates
Now that the V2Writable code is more generic, we can sync with it to use `units' which represent either a v2 epoch or an entire v1 inbox.
2020-11-07 v2writable: pass oid to uindex_oid
We'll be validating against this in the future to stop bugs from creeping in.
2020-11-07 extsearchidx: remove {unindex_range} field
Moved to per-epoch "units".
2020-11-07 v2writable: reduce scope of epoch-aware code
And clearly label it. We may try to reuse some of this for v1 indexing code paths.
2020-11-07 extsearchidx: more compatibility with V2Writable callers
We'll use `index_oid' and `unindex_oid' as our method names so V2Writable methods may use `$self->can' to access them.
2020-11-07 v2writable: move size check init to sync_prepare
This will let us use it from ExtSearchIdx.
2020-11-07 v2writable: make *last_commits and sync_prepare OO methods
This will allow ExtSearchIdx to override or reuse them more easily. Unfortunately we lose prototype validation, but that seems to be discouraged anyways given the 'signatures' feature in Perl 5.20+.
2020-11-07 v2writable: rename {v2w} field to {self}
This will make it easier to reuse some indexing code for ExtSearchIdx.
2020-11-07 v2writable: allow OO method references
Using `->can(method)' allows subclasses to override `index_oid' and `unindex_oid' methods.
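The override pattern described here can be sketched in Python, where `getattr` plays roughly the role of Perl's `->can`. This is only an illustrative analog, not public-inbox code: the class names `Indexer` and `ExtIndexer` and the `sync_one` helper are hypothetical.

```python
class Indexer:
    """Hypothetical stand-in for a V2Writable-like base class."""

    def index_oid(self, oid):
        return f"v2 index {oid}"

    def unindex_oid(self, oid):
        return f"v2 unindex {oid}"

    def sync_one(self, oid, delete=False):
        # Resolve the handler on the instance at call time (the analog of
        # `$self->can('index_oid')`), so a subclass override is picked up.
        handler = getattr(self, "unindex_oid" if delete else "index_oid")
        return handler(oid)


class ExtIndexer(Indexer):
    """Hypothetical stand-in for an ExtSearchIdx-like subclass."""

    def index_oid(self, oid):
        return f"ext index {oid}"


print(Indexer().sync_one("abc123"))
print(ExtIndexer().sync_one("abc123"))
print(ExtIndexer().sync_one("abc123", delete=True))
```

The shared `sync_one` code never names a concrete class, so the subclass only overrides the methods it needs and inherits the rest.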
2020-11-07 v2writable: more generic sync setup code
We want to reuse this code for ExtSearchIdx, eventually.
2020-11-07 searchidx: log2stack: simplify callers
Since we store {ibx} in $sync state, we no longer have to pass it as an argument to log2stack.
2020-11-07 searchidx: put {ibx} into $sync state
This will allow reusability with ExtSearchIdx.
2020-11-07 searchidxshard: special init for eidx
Having a special init path for external indices is probably easier than further overloading SearchIdx->new initialization to work without an Inbox object.
2020-11-07 searchidx: xref3 delete support
Not yet tested, but Perl compiles it!
2020-11-07 searchidx: index eidx_key as a boolean term
Using `O' (owner) here (according to Xapian omega's termprefixes.rst) since we could say the newsgroup or inbox is the owner of the given message.
2020-11-07 extsearchidx: initial implementation
It compiles...
2020-11-07 v2writable: checkpoint: account for lack of {mm}
ExtSearchIdx will not have Msgmap, since it may index non-email blobs in the future (it'll still be usable with IMAP, but not NNTP).
2020-11-07 v2writable: rename remaining "remote" terminology
"remote" used to imply "child process on the same machine", which was somewhat nonsensical anyway. And OverIdx has been in the same process since v2 was finalized. So use the suffix "aux" for "auxiliary" since it can be safely jettisoned without breaking URLs.
2020-11-07 inboxwritable: eidx_key for external index
This is preferable to open-coding "newsgroup // inboxdir" everywhere.
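As a rough illustration of what the open-coded expression does: Perl's `//` (defined-or) yields its left operand unless that operand is undefined. A minimal Python sketch follows. The free function `eidx_key` with two arguments is a hypothetical signature chosen for the example, not the actual method, and the sample values are made up.

```python
def eidx_key(newsgroup, inboxdir):
    # Perl's `//` falls back only when the left side is undef, so compare
    # against None instead of relying on truthiness.
    return newsgroup if newsgroup is not None else inboxdir


# Hypothetical sample values:
print(eidx_key("inbox.example.test", "/srv/inboxes/test"))
print(eidx_key(None, "/srv/inboxes/test"))
```

Centralizing the fallback in one helper means every caller derives the same key for a given inbox.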
2020-11-07 v2: some changes for ExtSearchIdx compatibility
We'll be using per-sync-state {ibx} refs instead, so make parts of the v2 indexing code less dependent on $self->{ibx} where $self is a V2Writable object.
2020-11-07 overidx: introduce changes for external index
Since external indices won't have msgmap.sqlite3, we'll need to store last_commit-* metadata in over.sqlite3 instead. This has longer limits to account for path names or newsgroup names stored in keys. We'll also rely on built-in counters for Xapian document IDs, since msgmap.sqlite3 no longer provides an AUTOINCREMENT column.
2020-11-07 v2writable: count_shards: allow working without {ibx}
This will be needed for ExtSearchIdx which doesn't have a persistent PublicInbox::Inbox object.
2020-11-07 v2writable: idx_shard: simplify callers
This will make it easier to use in ExtSearchIdx.
2020-11-07 searchidxshard: allow msgref to be undef
We don't need to keep it in code paths which are guaranteed to only see PublicInbox::Eml (and not Email::MIME or PublicInbox::MIME which did not round-trip properly). However, we must set {raw_bytes} since PublicInbox::Eml may add an extra "\n" for rare messages with no bodies.
2020-11-07 v2writable: hoist out write_alternates
We'll be reusing this for external indices and possibly other places.
2020-11-07 v2writable: prepare initialization for external indices
External indices won't have $self->{ibx} since they need to deal with multiple inboxes. We can also hoist out ->parallel_init to make it easier to distinguish the non-parallel control flow.
2020-11-07 searchidx: introduce "xref3" concept
This will be used to track cross-posted messages in the external/detached index.
2020-11-07 search: xdb_sharded: make this a public method for ExtSearch
We can simplify callers by using $self->{xpfx} instead of passing another arg on the stack.
2020-11-07 v2writable: make OO calls to last_commit-related methods
We'll try to reuse as much V2Writable code as possible for external indices, but the way "last_commit" info is stored must be different as external indices will deal with last_commit info for multiple inboxes.
2020-11-07 v2writable: add git method
This will make it easier to share code with ExtSearchIdx.
2020-11-07 searchidx: expose INDEXLEVELS as `our'
This will be used by external/detached indices, too.
2020-11-07 extsearch: start mocking out
This will provide a similar API to PublicInbox::Inbox for read-only WWW, -imapd, and -nntpd interfaces.
2020-11-07 search: hoist out _xdb_sharded for v2 inboxes
We'll be using this in detached (ext) Xapian indexes for cross-inbox search.
2020-11-05 doc/standards: add RFCs for URL schemes
We linkify these in the WWW UI, and will support them in other places. These URL schemes may end up being stored in external/detached indices for indexing non-git-based mail stores.
2020-11-04 nntp: attempt RFC 5536 3.1.5-conformant Path: headers
Perhaps some NNTP clients would be unhappy with the old value "y". So use a bit more bandwidth+space to use the server-name and historical "!not-for-mail" tail-entry to better conform to a published RFC. Reported-by: Andrey Melnikov <temnota.am@gmail.com>
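A minimal sketch of the header shape this message describes, written in Python for illustration. The helper name `path_header` and the host name are hypothetical, and the exact string public-inbox emits is an assumption here: the message only says the server-name is followed by the "!not-for-mail" tail-entry.

```python
def path_header(server_name):
    # Assumed layout: a single path-identity (the server-name) followed by
    # the historical "not-for-mail" tail-entry, delimited by "!".
    return f"Path: {server_name}!not-for-mail"


print(path_header("news.example.com"))  # hypothetical host name
```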
2020-11-04 nntp: delimit Newsgroup: header with commas
...instead of spaces. This is specified in RFC 5536 3.1.4. Include references to RFC 1036, 5536 and 5537 in our docs while we're at it. Reported-by: Andrey Melnikov <temnota.am@gmail.com> Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
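The difference is small but visible on the wire. A minimal Python sketch, assuming a hypothetical helper `newsgroups_header` and made-up group names (RFC 5536 section 3.1.4 names the header `Newsgroups:`):

```python
def newsgroups_header(groups):
    # RFC 5536 3.1.4 separates newsgroup names with commas, not spaces.
    return "Newsgroups: " + ",".join(groups)


# Hypothetical group names for a cross-posted message:
print(newsgroups_header(["inbox.example.a", "inbox.example.b"]))
```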
2020-10-30 tls: epollbit: account for miscellaneous OpenSSL errors
Apparently they happen (triggered by my -imapd instance), so bail out by closing the underlying socket rather than stopping the event loop and daemon process.
2020-10-17 actually remove xt/eml_check_roundtrip.t
Fixes: 6550226296e9db79 ("xt: remove eml_check_roundtrip")
2020-10-17 xt: remove eml_check_roundtrip
If there's no body ({bdy} field), ->each_part sets the {bdy} field to "\n" and the ->as_string result afterwards is one extra "\n" byte longer than the original. It's not worth extra cycles in common ->each_part calls to ensure 100% round-trip matches of header-only messages (which are likely spam), especially when the only difference is a trailing "\n".
2020-10-17 git: introduce async_wait_all
->cat_async and ->check_async may trigger each other (in future callers) while waiting, so we need a unified method to ensure both complete. This doesn't affect current code, but allows us to slightly simplify existing callers.
2020-10-16 tmpfile: modernize to 5.10.1+, note O_APPEND workaround
Once again we'll need O_APPEND on a temporary file, so note here that we support it, since Perl 5.32 is way too new to depend on our users having.
2020-10-16 git: async: loop inflight checks for nested callbacks
We need to loop the inflight check for nested callback invocations to ensure we don't clog the pipe that feeds `git cat-file'. This bug was obscured by the fact that we're already accounting for 64-char git OIDs with SHA-256 in the pipe space calculation; perhaps we shouldn't do that.
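Why a loop rather than a single check matters can be shown with a small simulation. This Python sketch is not the public-inbox implementation: the class `AsyncCat`, its method names, and the capacity numbers are all assumptions made for illustration. Draining one in-flight request runs its callback, which may enqueue another request, so the queue can be full again by the time control returns.

```python
from collections import deque

PIPE_BUF = 4096                    # assumed pipe capacity in bytes
REQ_LEN = 65                       # 64-char SHA-256 OID plus "\n"
MAX_INFLIGHT = PIPE_BUF // REQ_LEN


class AsyncCat:
    """Hypothetical model of a bounded queue of pending requests."""

    def __init__(self):
        self.inflight = deque()
        self.done = []

    def _step(self):
        # Complete the oldest request; its callback may enqueue more work.
        oid, cb = self.inflight.popleft()
        self.done.append(oid)
        cb(self, oid)

    def cat_async(self, oid, cb):
        # `while`, not `if`: a nested cat_async call made from a callback
        # can refill the slot that _step just freed.
        while len(self.inflight) >= MAX_INFLIGHT:
            self._step()
        self.inflight.append((oid, cb))
        assert len(self.inflight) <= MAX_INFLIGHT  # the pipe never overfills

    def wait_all(self):
        while self.inflight:
            self._step()


def parent_cb(cat, oid):
    # Nested invocation: each parent request triggers one child request.
    cat.cat_async("child-" + oid, lambda c, o: None)


cat = AsyncCat()
for i in range(200):
    cat.cat_async(f"oid{i}", parent_cb)
cat.wait_all()
print(len(cat.done))
```

With an `if` in place of the `while`, the nested request would take the freed slot and the outer request would then push the queue past its limit, tripping the invariant check.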
2020-10-16 git: *_async: support nested callback invocations
For external indices, we'll need to support nested cat_async invocations to deduplicate cross-posted messages. Thus we need to ensure we do not clobber the {inflight*} queues while stepping through and ensure {cat_rbuf} is stored before invoking callbacks. This fixes the ->cat_async-only case, but does not yet account for the mix of ->check_async interspersed with ->cat_async calls. More work will be needed on that front at a later date.
2020-10-16 git: ensure ->destroy clobbers check_async read buffer
It's currently not a problem as ->destroy doesn't happen for no reason, but we'll need to ensure future uses of ->destroy correctly discard the check_async buffer.