path: root/lib
Date Commit message
2020-11-07 v2writable: pass oid to unindex_oid
We'll be validating against this in the future to stop bugs from creeping in.
2020-11-07 extsearchidx: remove {unindex_range} field
Moved to per-epoch "units".
2020-11-07 v2writable: reduce scope of epoch-aware code
And clearly label it. We may try to reuse some of this for v1 indexing code paths.
2020-11-07 extsearchidx: more compatibility with V2Writable callers
We'll use `index_oid' and `unindex_oid' as our method names so V2Writable methods may use `$self->can' to access them.
2020-11-07 v2writable: move size check init to sync_prepare
This will let us use it from ExtSearchIdx.
2020-11-07 v2writable: make *last_commits and sync_prepare OO methods
This will allow ExtSearchIdx to override or reuse them more easily. Unfortunately, we lose prototype validation, but prototypes seem to be discouraged anyway given the 'signatures' feature in Perl 5.20+.
2020-11-07 v2writable: rename {v2w} field to {self}
This will make it easier to reuse some indexing code for ExtSearchIdx.
2020-11-07 v2writable: allow OO method references
Using `->can(method)' allows subclasses to override `index_oid' and `unindex_oid' methods.
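The `->can(method)' idea above can be sketched outside Perl. This is a minimal Python analogue, not public-inbox code: the class names are hypothetical stand-ins for V2Writable and ExtSearchIdx, and getattr() plays the role of Perl's `$self->can', resolving a method by name on the instance so that a subclass override wins.

```python
class V2WritableSketch:
    """Hypothetical stand-in for V2Writable; method names mirror the Perl ones."""

    def index_oid(self, oid):
        return f"v2 indexed {oid}"

    def unindex_oid(self, oid):
        return f"v2 unindexed {oid}"

    def process(self, oids, action):
        # Resolve the method by name on the instance (analogous to
        # Perl's $self->can($name)); subclass overrides win via the MRO.
        fn = getattr(self, action, None)
        if fn is None:
            raise AttributeError(f"no method {action!r}")
        return [fn(oid) for oid in oids]


class ExtSearchIdxSketch(V2WritableSketch):
    """Hypothetical stand-in for ExtSearchIdx overriding only index_oid."""

    def index_oid(self, oid):
        return f"eidx indexed {oid}"
```

The shared `process' loop never names a class, so the subclass only has to override the methods whose behavior differs; `unindex_oid' is inherited unchanged here.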
2020-11-07 v2writable: more generic sync setup code
We want to reuse this code for ExtSearchIdx, eventually.
2020-11-07 searchidx: log2stack: simplify callers
Since we store {ibx} in $sync state, we no longer have to pass it as an argument to log2stack.
2020-11-07 searchidx: put {ibx} into $sync state
This will allow reusability with ExtSearchIdx.
2020-11-07 searchidxshard: special init for eidx
Having a special init path for external indices is probably easier than further overloading SearchIdx->new initialization to work without an Inbox object.
2020-11-07 searchidx: xref3 delete support
Not yet tested, but Perl compiles it!
2020-11-07 searchidx: index eidx_key as a boolean term
Using `O' (owner) here (according to Xapian omega's termprefixes.rst), since we could say the newsgroup or inbox is the owner of the given message.
2020-11-07 extsearchidx: initial implementation
It compiles...
2020-11-07 v2writable: checkpoint: account for lack of {mm}
ExtSearchIdx will not have Msgmap, since it may index non-email blobs in the future (it'll still be usable with IMAP, but not NNTP).
2020-11-07 v2writable: rename remaining "remote" terminology
"remote" used to imply "child process on the same machine", which was somewhat nonsensical anyway. And OverIdx has been in the same process since v2 was finalized. So use the suffix "aux" for "auxiliary", since it can be safely jettisoned without breaking URLs.
2020-11-07 inboxwritable: eidx_key for external index
This is preferable to open-coding "newsgroup // inboxdir" everywhere.
2020-11-07 v2: some changes for ExtSearchIdx compatibility
We'll be using per-sync-state {ibx} refs instead, so make parts of the v2 indexing code less dependent on $self->{ibx}, where $self is a V2Writable object.
2020-11-07 overidx: introduce changes for external index
Since external indices won't have msgmap.sqlite3, we'll need to store last_commit-* metadata in over.sqlite3 instead. This has longer limits to account for path names or newsgroup names stored in keys. We'll also rely on built-in counters for Xapian document IDs, since there is no msgmap.sqlite3 to provide an AUTOINCREMENT column.
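One way to keep last_commit-* metadata in an SQLite database without msgmap.sqlite3 is a simple key/value table. The sketch below uses Python's sqlite3 module; the table name `eidx_meta', its columns, and the `last_commit-<eidx_key>-<epoch>' key format are assumptions for illustration, not the actual over.sqlite3 schema.

```python
import sqlite3


def open_meta(path=":memory:"):
    db = sqlite3.connect(path)
    # Hypothetical schema. Keys embed inbox paths or newsgroup names,
    # hence the generous declared length (SQLite does not enforce it).
    db.execute(
        "CREATE TABLE IF NOT EXISTS eidx_meta ("
        "key VARCHAR(255) PRIMARY KEY, val VARCHAR(255) NOT NULL)"
    )
    return db


def set_last_commit(db, eidx_key, epoch, oid):
    key = f"last_commit-{eidx_key}-{epoch}"
    with db:  # commits the transaction on success
        db.execute(
            "INSERT OR REPLACE INTO eidx_meta (key, val) VALUES (?, ?)",
            (key, oid),
        )


def get_last_commit(db, eidx_key, epoch):
    row = db.execute(
        "SELECT val FROM eidx_meta WHERE key = ?",
        (f"last_commit-{eidx_key}-{epoch}",),
    ).fetchone()
    return row[0] if row else None
```

Because each key names both the inbox and the epoch, one table can track progress for many inboxes at once, which a per-inbox msgmap could not.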
2020-11-07 v2writable: count_shards: allow working without {ibx}
This will be needed for ExtSearchIdx, which doesn't have a persistent PublicInbox::Inbox object.
2020-11-07 v2writable: idx_shard: simplify callers
This will make it easier to use in ExtSearchIdx.
2020-11-07 searchidxshard: allow msgref to be undef
We don't need to keep it in code paths which are guaranteed to only see PublicInbox::Eml (and not Email::MIME or PublicInbox::MIME, which did not round-trip properly). However, we must set {raw_bytes}, since PublicInbox::Eml may add an extra "\n" for rare messages with no bodies.
2020-11-07 v2writable: hoist out write_alternates
We'll be reusing this for external indices and possibly other places.
2020-11-07 v2writable: prepare initialization for external indices
External indices won't have $self->{ibx}, since they need to deal with multiple inboxes. We can also hoist out ->parallel_init to make it easier to distinguish the non-parallel control flow.
2020-11-07 searchidx: introduce "xref3" concept
This will be used to track cross-posted messages in the external/detached index.
2020-11-07 search: xdb_sharded: make this a public method for ExtSearch
We can simplify callers by using $self->{xpfx} instead of passing another arg on the stack.
2020-11-07 v2writable: make OO calls to last_commit-related methods
We'll try to reuse as much V2Writable code as possible for external indices, but the way "last_commit" info is stored must be different, as external indices will deal with last_commit info for multiple inboxes.
2020-11-07 v2writable: add git method
This will make it easier to share code with ExtSearchIdx.
2020-11-07 searchidx: expose INDEXLEVELS as `our'
This will be used by external/detached indices, too.
2020-11-07 extsearch: start mocking out
This will provide a similar API to PublicInbox::Inbox for read-only WWW, -imapd, and -nntpd interfaces.
2020-11-07 search: hoist out _xdb_sharded for v2 inboxes
We'll be using this in detached (ext) Xapian indexes in cross-inbox search.
2020-11-04 nntp: attempt RFC 5536 3.1.5-conformant Path: headers
Perhaps some NNTP clients would be unhappy with the old value "y". So spend a bit more bandwidth and space on the server-name and the historical "!not-for-mail" tail-entry to better conform to a published RFC.
Reported-by: Andrey Melnikov <temnota.am@gmail.com>
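For an article originating on the server, the resulting header might look like the output of this Python sketch. It assumes the server-name is a bare host name (no `!' or whitespace); the function name is hypothetical, and the exact value public-inbox emits may differ.

```python
def path_header(server_name: str) -> str:
    """Build a Path: header for an article originating on this server."""
    if not server_name or "!" in server_name or any(c.isspace() for c in server_name):
        raise ValueError("server-name must be a non-empty host name without '!' or whitespace")
    # Our own path-identity first, then the historical "not-for-mail" tail-entry
    return f"Path: {server_name}!not-for-mail"
```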
2020-11-04 nntp: delimit Newsgroups: header with commas
...instead of spaces. This is specified in RFC 5536 3.1.4. Include references to RFC 1036, 5536 and 5537 in our docs while we're at it.
Reported-by: Andrey Melnikov <temnota.am@gmail.com>
Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
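A minimal Python sketch of the comma-delimited form from RFC 5536 3.1.4 follows. The function name is hypothetical; it drops empty entries and duplicates while preserving the original order.

```python
def newsgroups_header(groups):
    """Build a Newsgroups: header, comma-delimited per RFC 5536 3.1.4."""
    # Strip whitespace, drop empties, and de-duplicate while keeping order
    names = list(dict.fromkeys(g.strip() for g in groups if g.strip()))
    if not names:
        raise ValueError("at least one newsgroup name is required")
    return "Newsgroups: " + ",".join(names)
```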
2020-10-30 tls: epollbit: account for miscellaneous OpenSSL errors
Apparently they happen (triggered by my -imapd instance), so bail out by closing the underlying socket rather than stopping the event loop and daemon process.
2020-10-17 git: introduce async_wait_all
->cat_async and ->check_async may trigger each other (in future callers) while waiting, so we need a unified method to ensure both complete. This doesn't affect current code, but allows us to slightly simplify existing callers.
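The reason for a unified wait can be modeled with two queues whose callbacks may enqueue work on each other. This Python sketch is a hypothetical model only: it has no `git cat-file' pipes or read buffers, and the class and method names merely mirror the Perl ones.

```python
from collections import deque


class AsyncQueuesSketch:
    """Hypothetical model of cat_async/check_async request queues."""

    def __init__(self):
        self.cat_q = deque()
        self.check_q = deque()

    def cat_async(self, oid, cb):
        self.cat_q.append((oid, cb))

    def check_async(self, oid, cb):
        self.check_q.append((oid, cb))

    def _drain(self, q):
        # Pop one request at a time, so a callback may enqueue more work
        # without clobbering the queue currently being drained.
        while q:
            oid, cb = q.popleft()
            cb(oid)

    def async_wait_all(self):
        # Draining one queue can refill the other, so repeat until both are empty.
        while self.cat_q or self.check_q:
            self._drain(self.check_q)
            self._drain(self.cat_q)
```

Waiting on either queue alone could return while the other still holds work queued by a callback; looping on both conditions avoids that.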
2020-10-16 tmpfile: modernize to 5.10.1+, note O_APPEND workaround
Once again we'll need O_APPEND on a temporary file, so note here that we support it, since Perl 5.32 is far too new to expect our users to have.
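On POSIX systems, one workaround is to add O_APPEND to an already-created temporary file with fcntl(2). The Python sketch below assumes a POSIX platform and a hypothetical helper name; it is not the PublicInbox::Tmpfile implementation.

```python
import fcntl
import os
import tempfile


def append_tmpfile():
    """Return an anonymous temporary file whose writes always append."""
    f = tempfile.TemporaryFile()
    fd = f.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    # Add O_APPEND after creation; the kernel then positions every write
    # at end-of-file regardless of the current file offset.
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_APPEND)
    return f
```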
2020-10-16 git: async: loop inflight checks for nested callbacks
We need to loop the inflight check for nested callback invocations to ensure we don't clog the pipe that feeds `git cat-file'. This bug was obscured by the fact that we're already accounting for 64-char git OIDs with SHA-256 in the pipe space calculation; perhaps we shouldn't do that.
2020-10-16 git: *_async: support nested callback invocations
For external indices, we'll need to support nested cat_async invocations to deduplicate cross-posted messages. Thus we need to ensure we do not clobber the {inflight*} queues while stepping through them, and ensure {cat_rbuf} is stored before invoking callbacks. This fixes the ->cat_async-only case, but does not yet account for a mix of ->check_async calls interspersed with ->cat_async calls. More work will be needed on that front at a later date.
2020-10-16 git: ensure ->destroy clobbers check_async read buffer
It's not currently a problem, since ->destroy doesn't happen for no reason, but we'll need to ensure future uses of ->destroy correctly discard the check_async buffer.
2020-10-16 inbox: add uidvalidity method
This will make it easier to deal with ExtSearchIdx, which won't have msgmap.
2020-10-13 admin: preserve config ordering of `--all' switch
When `--all' is passed to -index and similar commands, process inboxes in the same order as they appear in the config file. This ensures predictable behavior, so admins can ensure certain inboxes see updated indices before others. For (upcoming) external indices, this will ensure stable Xref: ordering for predictable caching/memoization by NNTP clients.
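Preserving config order mostly means collecting inbox names in first-seen order. This Python sketch assumes `git config -l'-style `publicinbox.<name>.<key>=<value>' lines and relies on dicts keeping insertion order; the function name is hypothetical.

```python
def inboxes_in_config_order(config_lines):
    """Return inbox names in the order they first appear in the config."""
    order = {}
    for line in config_lines:
        key, sep, _value = line.partition("=")
        if not sep:
            continue
        parts = key.split(".")
        # section.subsection.variable: the subsection (inbox name) may itself
        # contain dots, so take everything between the first and last parts.
        if len(parts) >= 3 and parts[0].lower() == "publicinbox":
            name = ".".join(parts[1:-1])
            order.setdefault(name, None)  # dicts keep insertion order
    return list(order)
```

Top-level keys such as `publicinbox.wwwlisting' have no subsection and are skipped.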
2020-10-05 manifest: favor Cpanel::JSON::XS
JSON::MaybeXS already favors Cpanel::JSON::XS (and has for many years now). Allow users to skip installing JSON::MaybeXS if they want an XS-based JSON implementation.
2020-09-30 v2writable: use "HEAD" to match v1 indexing behavior
Users may want to change the default branch used for git epochs in v2 (v1 SearchIdx always used whatever "HEAD" pointed to).
2020-09-29 searchidx: index lower-case List-Id value
We don't want a List-Id value being confused with a Xapian term prefix here.
Followup-to: 8b06cda3a3af3f0e ("mda: match List-Id insensitively")
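Xapian's convention is that term prefixes are upper-case, so a value beginning with an upper-case letter could be mistaken for part of the prefix. Lower-casing avoids that, as in this Python sketch. The prefix "G" and the function name are placeholders, not necessarily what SearchIdx uses.

```python
import re


def list_id_term(header_value: str, prefix: str = "G") -> str:
    """Return a boolean term for a List-Id header value.

    The default prefix "G" is a hypothetical placeholder.
    """
    # List-Id values usually look like "Some Phrase <list.example.org>"
    m = re.search(r"<([^>]+)>", header_value)
    value = m.group(1) if m else header_value.strip()
    # Lower-case the value so it never starts with an upper-case letter,
    # which Xapian's convention would read as part of the term prefix.
    return prefix + value.lower()
```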
2020-09-28 gcf2: improve error handling and do not ->fail on wbuf
For historical reasons, both Danga::Socket::write and PublicInbox::DS::write will return 0 when data is buffered, so Gcf2Client must not call ->fail when DS::write returns 0. We'll also improve robustness by recreating the entire Gcf2Client object if it does die for other reasons, instead of risking mismatched fields due to deferred close. We also need to ensure we only get one EPOLLERR wakeup and issue EPOLL_CTL_DEL if ->event_step is triggered by a dying Gcf2 process, so always register the FD with EPOLLONESHOT.
2020-09-27 ds: add missing label for systems w/o EPOLLEXCLUSIVE
Oops :x
2020-09-26 imap: avoid raising exception if client disconnects
This ought to save a few cycles if a client disconnects while in the middle of a (UID) FETCH. This avoids "Can't call method "git" on an undefined value at .../PublicInbox/IMAP.pm" errors in stderr.
2020-09-24 searchidx: fix (undocumented) --skip-docdata handling
This switch is still undocumented, but we can reduce the scope of our Xapian docdata dependency by moving its only caller to SearchIdx. This reduces the amount of code loaded by read-only code paths.
2020-09-24 v2writable: drop outdated {unindex_range} check
{unindex_range} only exists in the $sync state nowadays, not the V2Writable ($self) object. $sync->{unindex_range} won't be populated if $regen_max is zero, either, unless somebody is injecting importable commits into an epoch history, in which case this change will result in no-op indexing doing no work.