about summary refs log tree commit homepage
path: root/lib
DateCommit message (Collapse)
2020-11-07overidx: introduce changes for external index
Since external indices won't have msgmap.sqlite3, we'll need to store last_commit-* metadata in over.sqlite3 instead. This has a longer limits to account for path names or newsgroup names stored in keys. We'll also rely on built-in counters for Xapian document IDs, since msgmap.sqlite3 no longer provides an AUTOINCREMENT column.
2020-11-07v2writable: count_shards: allow working without {ibx}
This will be needed for ExtSearchIdx which doesn't have a persistent PublicInbox::Inbox object.
2020-11-07v2writable: idx_shard: simplify callers
This will make it easier-to-use in ExtSearchIdx.
2020-11-07searchidxshard: allow msgref to be undef
We don't need to keep it in code paths which are guaranteed to only see PublicInbox::Eml (and not Email::MIME or PublicInbox::MIME which did not round-trip properly). However, we must set {raw_bytes} since PublicInbox::Eml may add an extra "\n" for rare messages with no bodies.
2020-11-07v2writable: hoist out write_alternates
We'll be reusing this for external indices and possibly other places.
2020-11-07v2writable: prepare initialization for external indices
External indices won't have $self->{ibx} since it needs to deal with multiple inboxes. We can also hoist out ->parallel_init to make it easier to distinguish the non-parallel control flow.
2020-11-07searchidx: introduce "xref3" concept
This will be used to track cross-posted messages in the external/detached index.
2020-11-07search: xdb_sharded: make this a public method for ExtSearch
We can simplify callers by using $self->{xpfx} instead of passing another arg on the stack.
2020-11-07v2writable: make OO calls to last_commit-related methods
We'll try to reuse as much V2Writable code as possible for external indices, but the way "last_commit" info is stored must be different as external indices will deal with last_commit info for multiple inboxes.
2020-11-07v2writable: add git method
This will make it easier to share code with ExtSearchIdx.
2020-11-07searchidx: expose INDEXLEVELS as `our'
This will be used by external/detached indices, too.
2020-11-07extsearch: start mocking out
This will provide a similar API to PublicInbox::Inbox for read-only WWW, -imapd, and -nntpd interfaces.
2020-11-07search: hoist out _xdb_sharded for v2 inboxes
We'll be using this in detached (ext) Xapian indexes in cross inbox search.
2020-11-04nntp: attempt RFC 5536 3.1.5-conformant Path: headers
Perhaps some NNTP clients would be unhappy with the old value "y". So use a bit more bandwidth+space to use the server-name and historical "!not-for-mail" tail-entry to better conform to a published RFC. Reported-by: Andrey Melnikov <temnota.am@gmail.com>
2020-11-04nntp: delimit Newsgroup: header with commas
...instead of spaces. This is specified in RFC 5536 3.1.4. Include references to RFC 1036, 5536 and 5537 in our docs while we're at it. Reported-by: Andrey Melnikov <temnota.am@gmail.com> Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
2020-10-30tls: epollbit: account for miscellaneous OpenSSL errors
Apparently they happen (triggered by my -imapd instance), so bail out by closing the underlying socket rather than stopping the event loop and daemon process.
2020-10-17git: introduce async_wait_all
->cat_async and ->check_async may trigger each other (in future callers) while waiting, so we need a unified method to ensure both complete. This doesn't affect current code, but allows us to slightly simplify existing callers.
2020-10-16tmpfile: modernize to 5.10.1+, note O_APPEND workaround
Once again we'll need O_APPEND on a temporary file, so note we support it, here; since Perl 5.32 is way too new to depend on our users having.
2020-10-16git: async: loop inflight checks for nested callbacks
We need to loop the inflight check for nested callback invocations to ensure we don't clog the pipe that feeds `git cat-file'. This bug was obscured by the fact that we're already accounting for 64-char git OIDs with SHA-256 in the pipe space calculation; perhaps we shouldn't do that.
2020-10-16git: *_async: support nested callback invocations
For external indices, we'll need to support nested cat_async invocations to deduplicate cross-posted messages. Thus we need to ensure we do not clobber the {inflight*} queues while stepping through and ensure {cat_rbuf} is stored before invoking callbacks. This fixes the ->cat_async-only case, but does not yet account for the mix of ->check_async interspersed with ->cat_async calls, yet. More work will be needed on that front at a later date.
2020-10-16git: ensure ->destroy clobbers check_async read buffer
It's currently not a problem as ->destroy doesn't happen for no reason, we'll need to ensure future uses of ->destroy correctly discard the check_async buffer.
2020-10-16inbox: add uidvalidity method
This will make it easier to deal with ExtSearchIdx, which won't have msgmap.
2020-10-13admin: preserve config ordering of `--all' switch
When `--all' is passed to -index and similar commands, process them in the same order as what is given in the config file. This ensures predictable behavior so admins can ensure certain inboxes see updated indices before others. For (upcoming) external indices, this will ensure stable Xref: ordering for predictable caching/memoization by NNTP clients.
2020-10-05manifest: favor Cpanel::JSON::XS
JSON::MaybeXS already favors Cpanel::JSON::XS (and has for many years, now). Allow users to skip installing JSON::MaybeXS if they want an XS-based JSON implementation.
2020-09-30v2writable: use "HEAD" to match v1 indexing behavior
Users may want to change the default branch used for git epochs in v2 (v1 SearchIdx always used whatever "HEAD" pointed to).
2020-09-29searchidx: index lower-case List-Id value
We don't want a List-Id value being confused with a Xapian term prefix, here. Followup-to: 8b06cda3a3af3f0e ("mda: match List-Id insensitively")
2020-09-28gcf2: improve error handling and do not ->fail on wbuf
For historical reasons, both Danga::Socket::write and PublicInbox::DS::write will return 0 when data is buffered; so Gcf2Client must not call ->fail when DS::write returns 0. We'll also improve robustness by recreating the entire Gcf2Client object if it does die for other reasons, instead of risking mismatched fields due to deferred close. We also need to ensure we only get one EPOLLERR wakeup and issue EPOLL_CTL_DEL if ->event_step is triggered by a dying Gcf2 process, so always register the FD with EPOLLONESHOT.
2020-09-27ds: add missing label for systems w/o EPOLLEXCLUSIVE
Oops :x
2020-09-26imap: avoid raising exception if client disconnects
This ought to save a few cycles if a client disconnects while in the middle of a (UID) FETCH. This avoids: Can't call method "git" on an undefined value at .../PublicInbox/IMAP.pm errors in stderr.
2020-09-24searchidx: fix (undocumented) --skip-docdata handling
This switch is still undocumented, but we can reduce the scope of our Xapian docdata dependency by moving its only caller to SearchIdx. This reduces the amount of code loaded by read-only code paths.
2020-09-24v2writable: drop outdated {unindex_range} check
{unindex_range} only exists in the $sync state, nowadays, not the V2Writable ($self) object. $sync->{unindex_range} won't be populated if $regen_max is zero, either, unless somebody is injecting importable commits into an epoch history, in which this change will result in no-op indexing doing no work.
2020-09-24idxstack: fix comment about file_char
It's `d' for deletes, not `a'.
2020-09-22mda: match List-Id insensitively
This follows -watch commit b70473ab8296d31ebb600adb4fa8fe0ac5935ca8 to match List-Id headers case-insensitively. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20200921180152.uyqluod7qxbwqubo@chatter.i7.local/
2020-09-20mid: drop repeated ';' in mid_escape() regular expression
2020-09-20config: warn on multiple values for some fields
Our code doesn't support multi-values for these, and having unexpected arrays leads to unexpected results (e.g. showing stuff like "ARRAY(0xDEADBEEFADD12E55)" in user interfaces). So warn and only use the last value (matching git-config(1) behavior without `--get-all').
2020-09-19gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19gcf2: require git dir with OID
This amortizes the cost of recreating PublicInbox::Gcf2 objects when alternates change in v2 all.git.
2020-09-19gcf2*: more descriptive package descriptions
Hopefully this allows others to more quickly figure out what's going on.
2020-09-19gcf2: transparently retry on missing OID
Since we only get OIDs from trusted local data sources (over.sqlite3), we can safely retry within the -gcf2 process without worry about clients spamming us with requests for invalid OIDs and triggering reopens.
2020-09-19add gcf2 client and executable script
This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed.
2020-09-19gcf2: libgit2-based git cat-file alternative
Having tens of thousands of inboxes and associated git processes won't work well, so we'll use libgit2 to access the object DB directly. We only care about OID lookups and won't need to rely on per-repo revision names or paths. The Git::Raw XS package won't be used since its manpages don't promise a stable API. Since we already use Inline::C and have experience with I::C when it comes to compatibility, this only introduces libgit2 itself as a source of new incompatibilities. This also provides an excuse for me to writev(2) to reduce syscalls, but liburing is on the horizon for next year.
2020-09-18git_async_cat: inline + drop redundant batch_prepare call
$git->cat_async already calls $git->batch_prepare iff needed, so we can reduce subroutine calls and inline a one-off subroutine to save some memory, here.
2020-09-16git_async_cat: fix outdated comment
We replaced Danga::Socket with PublicInbox::DS roughly a year before GitAsyncCat was introduced into our git history.
2020-09-16wwwtext: link to public-inbox.org/meta archives
Since we're advertising our address at meta@public-inbox.org, we should advertise the archives, too.
2020-09-16wwwstream: link to cgit URLs for coderepo
Hopefully this reduces the ambiguity between code for the project(s) using public-inbox and the code for public-inbox itself.
2020-09-16treewide: relax allow >=40 chars for git OID
This will help with eventual git SHA-256 transitions.
2020-09-16mid: rename MID_MAX to ID_MAX
It's only used for HTML anchors which we will need indefinitely.
2020-09-15imap: quiet uninitialized variable warning on FETCH
This was triggered by blindly trying to FETCH an MSN (not "UID FETCH") on an empty dummy inbox. It's harmless, and probably triggered by a wayward client or misbehaving bot.
2020-09-14tests: consistently check for xapian-compact
We may need to test against development versions of Xapian, which may rely on setting `XAPIAN_COMPACT=xapian-compact-1.5'. Ensure it's possible to do that. And add a missing check in t/xcpdb-reshard.t, too.
2020-09-14sigfd: fix typos and scoping on systems w/o epoll+kqueue
Unfortunately, I'm not sure how easy catching these at compile-time, is. Prototypes do not seem to check these at compile time when crossing packages (not even with exported subroutines).