path: root/lib/PublicInbox
Date Commit message
2020-11-07 search: hoist out _xdb_sharded for v2 inboxes
We'll be using this in detached (ext) Xapian indexes in cross inbox search.
2020-11-04 nntp: attempt RFC 5536 3.1.5-conformant Path: headers
Perhaps some NNTP clients would be unhappy with the old value "y". So use a bit more bandwidth+space to use the server-name and historical "!not-for-mail" tail-entry to better conform to a published RFC. Reported-by: Andrey Melnikov <temnota.am@gmail.com>
2020-11-04 nntp: delimit Newsgroup: header with commas
...instead of spaces. This is specified in RFC 5536 3.1.4. Include references to RFC 1036, 5536 and 5537 in our docs while we're at it. Reported-by: Andrey Melnikov <temnota.am@gmail.com> Link: https://public-inbox.org/meta/CA+PODjpUN5Q4gBFQhAzUNuMasVEdmp9f=8Uo0Ej0mFumdSwi4w@mail.gmail.com/
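As a rough illustration of the two NNTP header changes above, the following Python sketch builds a Path: value ending in the historical "!not-for-mail" tail-entry and a comma-delimited Newsgroups: value. The function names and the example server and group names are hypothetical; public-inbox itself is written in Perl and its actual code may differ.

```python
from typing import List


def path_header(server_name: str) -> str:
    """Build a Path: value of the form "<server-name>!not-for-mail"."""
    return f"{server_name}!not-for-mail"


def newsgroups_header(groups: List[str]) -> str:
    """Join newsgroup names with commas (not spaces), per RFC 5536 3.1.4."""
    return ",".join(groups)


# Hypothetical server and group names, for illustration only.
print("Path: " + path_header("news.example.com"))
print("Newsgroups: " + newsgroups_header(["inbox.comp.test", "inbox.comp.meta"]))
```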
2020-10-30 tls: epollbit: account for miscellaneous OpenSSL errors
Apparently they happen (triggered by my -imapd instance), so bail out by closing the underlying socket rather than stopping the event loop and daemon process.
2020-10-17 git: introduce async_wait_all
->cat_async and ->check_async may trigger each other (in future callers) while waiting, so we need a unified method to ensure both complete. This doesn't affect current code, but allows us to slightly simplify existing callers.
2020-10-16 tmpfile: modernize to 5.10.1+, note O_APPEND workaround
Once again we'll need O_APPEND on a temporary file, so note here that we support it, since Perl 5.32 is far too new to expect our users to have.
2020-10-16 git: async: loop inflight checks for nested callbacks
We need to loop the inflight check for nested callback invocations to ensure we don't clog the pipe that feeds `git cat-file'. This bug was obscured by the fact that we're already accounting for 64-char git OIDs with SHA-256 in the pipe space calculation; perhaps we shouldn't do that.
2020-10-16 git: *_async: support nested callback invocations
For external indices, we'll need to support nested cat_async invocations to deduplicate cross-posted messages. Thus we need to ensure we do not clobber the {inflight*} queues while stepping through and ensure {cat_rbuf} is stored before invoking callbacks. This fixes the ->cat_async-only case, but does not yet account for a mix of ->check_async calls interspersed with ->cat_async calls. More work will be needed on that front at a later date.
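The idea of keeping the in-flight queue consistent while callbacks enqueue further requests can be sketched as below. This is a simplified in-memory model in Python, not the `git cat-file' pipe protocol; the class, method, and OID names are all hypothetical.

```python
from collections import deque


class AsyncCat:
    """Toy model of a request queue whose callbacks may enqueue more requests."""

    def __init__(self, blobs):
        self.blobs = blobs          # hypothetical OID -> content store
        self.inflight = deque()     # pending (oid, callback) pairs

    def cat_async(self, oid, callback):
        self.inflight.append((oid, callback))

    def step(self):
        # Detach the entry from the queue *before* running its callback,
        # so a nested cat_async() call appends to a consistent queue.
        oid, callback = self.inflight.popleft()
        callback(oid, self.blobs.get(oid))

    def wait_all(self):
        # Loop rather than iterate once: callbacks may add new entries.
        while self.inflight:
            self.step()


seen = []
cat = AsyncCat({"aaa": b"first", "bbb": b"second"})


def on_first(oid, data):
    seen.append((oid, data))
    # Nested invocation from inside a callback.
    cat.cat_async("bbb", lambda o, d: seen.append((o, d)))


cat.cat_async("aaa", on_first)
cat.wait_all()
print(seen)
```

Looping in `wait_all` until the queue is empty is what lets the nested request complete, since it is only added while the first callback runs.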
2020-10-16 git: ensure ->destroy clobbers check_async read buffer
It's currently not a problem, as ->destroy doesn't happen for no reason, but we'll need to ensure future uses of ->destroy correctly discard the check_async buffer.
2020-10-16 inbox: add uidvalidity method
This will make it easier to deal with ExtSearchIdx, which won't have msgmap.
2020-10-13 admin: preserve config ordering of `--all' switch
When `--all' is passed to -index and similar commands, process them in the same order as what is given in the config file. This ensures predictable behavior so admins can ensure certain inboxes see updated indices before others. For (upcoming) external indices, this will ensure stable Xref: ordering for predictable caching/memoization by NNTP clients.
2020-10-05 manifest: favor Cpanel::JSON::XS
JSON::MaybeXS already favors Cpanel::JSON::XS (and has for many years, now). Allow users to skip installing JSON::MaybeXS if they want an XS-based JSON implementation.
2020-09-30 v2writable: use "HEAD" to match v1 indexing behavior
Users may want to change the default branch used for git epochs in v2 (v1 SearchIdx always used whatever "HEAD" pointed to).
2020-09-29 searchidx: index lower-case List-Id value
We don't want a List-Id value being confused with a Xapian term prefix, here. Followup-to: 8b06cda3a3af3f0e ("mda: match List-Id insensitively")
2020-09-28 gcf2: improve error handling and do not ->fail on wbuf
For historical reasons, both Danga::Socket::write and PublicInbox::DS::write will return 0 when data is buffered; so Gcf2Client must not call ->fail when DS::write returns 0. We'll also improve robustness by recreating the entire Gcf2Client object if it does die for other reasons, instead of risking mismatched fields due to deferred close. We also need to ensure we only get one EPOLLERR wakeup and issue EPOLL_CTL_DEL if ->event_step is triggered by a dying Gcf2 process, so always register the FD with EPOLLONESHOT.
2020-09-27 ds: add missing label for systems w/o EPOLLEXCLUSIVE
Oops :x
2020-09-26 imap: avoid raising exception if client disconnects
This ought to save a few cycles if a client disconnects while in the middle of a (UID) FETCH. This avoids: Can't call method "git" on an undefined value at .../PublicInbox/IMAP.pm errors in stderr.
2020-09-24 searchidx: fix (undocumented) --skip-docdata handling
This switch is still undocumented, but we can reduce the scope of our Xapian docdata dependency by moving its only caller to SearchIdx. This reduces the amount of code loaded by read-only code paths.
2020-09-24 v2writable: drop outdated {unindex_range} check
{unindex_range} only exists in the $sync state, nowadays, not the V2Writable ($self) object. $sync->{unindex_range} won't be populated if $regen_max is zero, either, unless somebody is injecting importable commits into an epoch history, in which case this change will result in no-op indexing doing no work.
2020-09-24 idxstack: fix comment about file_char
It's `d' for deletes, not `a'.
2020-09-22 mda: match List-Id insensitively
This follows -watch commit b70473ab8296d31ebb600adb4fa8fe0ac5935ca8 to match List-Id headers case-insensitively. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20200921180152.uyqluod7qxbwqubo@chatter.i7.local/
2020-09-20 mid: drop repeated ';' in mid_escape() regular expression
2020-09-20 config: warn on multiple values for some fields
Our code doesn't support multi-values for these, and having unexpected arrays leads to unexpected results (e.g. showing stuff like "ARRAY(0xDEADBEEFADD12E55)" in user interfaces). So warn and only use the last value (matching git-config(1) behavior without `--get-all').
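The "warn and only use the last value" behavior described above might look roughly like the following Python sketch. The helper name and the config key are hypothetical, and which fields are treated as single-valued is not specified here.

```python
import warnings


def single_value(key, value):
    """Return a scalar for `key`, warning if the config gave several values."""
    if isinstance(value, list):
        if len(value) > 1:
            warnings.warn(f"{key} has multiple values, only using the last")
        # Mirror git-config(1) without --get-all: the last value wins.
        return value[-1] if value else None
    return value


# "section.name.field" is a made-up key used only for this example.
print(single_value("section.name.field", ["first", "second"]))
```

Collapsing to a scalar early keeps a stringified array reference from leaking into user-facing output.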
2020-09-19 gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19 gcf2: require git dir with OID
This amortizes the cost of recreating PublicInbox::Gcf2 objects when alternates change in v2 all.git.
2020-09-19 gcf2*: more descriptive package descriptions
Hopefully this allows others to more quickly figure out what's going on.
2020-09-19 gcf2: transparently retry on missing OID
Since we only get OIDs from trusted local data sources (over.sqlite3), we can safely retry within the -gcf2 process without worrying about clients spamming us with requests for invalid OIDs and triggering reopens.
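A retry-once-on-miss lookup of the kind described above can be sketched as follows. This is a toy Python model with hypothetical class and function names; it does not use libgit2 and only illustrates reopening a stale handle before retrying.

```python
class ObjectStore:
    """Toy object DB whose view of the repository can go stale."""

    def __init__(self, backing):
        self.backing = backing            # hypothetical on-disk state
        self.snapshot = dict(backing)     # what this handle currently sees
        self.reopens = 0

    def reopen(self):
        self.snapshot = dict(self.backing)
        self.reopens += 1

    def lookup(self, oid):
        return self.snapshot.get(oid)


def cat_oid(store, oid):
    """Look up `oid`, reopening the store and retrying once on a miss."""
    data = store.lookup(oid)
    if data is None:
        store.reopen()
        data = store.lookup(oid)
    return data


disk = {"abc": b"old"}
store = ObjectStore(disk)
disk["def"] = b"new"          # object added after the store was opened
print(cat_oid(store, "def"), store.reopens)
```

Each miss costs one reopen, which is only acceptable because the OIDs come from a trusted source rather than arbitrary client input.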
2020-09-19 add gcf2 client and executable script
This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed.
2020-09-19 gcf2: libgit2-based git cat-file alternative
Having tens of thousands of inboxes and associated git processes won't work well, so we'll use libgit2 to access the object DB directly. We only care about OID lookups and won't need to rely on per-repo revision names or paths. The Git::Raw XS package won't be used since its manpages don't promise a stable API. Since we already use Inline::C and have experience with I::C when it comes to compatibility, this only introduces libgit2 itself as a source of new incompatibilities. This also provides an excuse for me to writev(2) to reduce syscalls, but liburing is on the horizon for next year.
2020-09-18 git_async_cat: inline + drop redundant batch_prepare call
$git->cat_async already calls $git->batch_prepare iff needed, so we can reduce subroutine calls and inline a one-off subroutine to save some memory, here.
2020-09-16 git_async_cat: fix outdated comment
We replaced Danga::Socket with PublicInbox::DS roughly a year before GitAsyncCat was introduced into our git history.
2020-09-16 wwwtext: link to public-inbox.org/meta archives
Since we're advertising our address at meta@public-inbox.org, we should advertise the archives, too.
2020-09-16 wwwstream: link to cgit URLs for coderepo
Hopefully this reduces the ambiguity between code for the project(s) using public-inbox and the code for public-inbox itself.
2020-09-16 treewide: relax allow >=40 chars for git OID
This will help with eventual git SHA-256 transitions.
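Accepting 40 or more hex characters, rather than exactly 40, could be expressed with a pattern like the one in this Python sketch. The exact regular expression public-inbox uses is not shown here, so the pattern and the helper name are assumptions.

```python
import re

# Accept 40 or more lowercase hex characters: 40 for SHA-1, 64 for SHA-256.
OID_RE = re.compile(r"\A[0-9a-f]{40,}\Z")


def is_oid(s: str) -> bool:
    """Return True if `s` looks like a full-length git object ID."""
    return OID_RE.match(s) is not None


print(is_oid("0" * 40), is_oid("0" * 64), is_oid("0" * 39))
```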
2020-09-16 mid: rename MID_MAX to ID_MAX
It's only used for HTML anchors which we will need indefinitely.
2020-09-15 imap: quiet uninitialized variable warning on FETCH
This was triggered by blindly trying to FETCH an MSN (not "UID FETCH") on an empty dummy inbox. It's harmless, and probably triggered by a wayward client or misbehaving bot.
2020-09-14 tests: consistently check for xapian-compact
We may need to test against development versions of Xapian, which may rely on setting `XAPIAN_COMPACT=xapian-compact-1.5'. Ensure it's possible to do that. And add a missing check in t/xcpdb-reshard.t, too.
2020-09-14 sigfd: fix typos and scoping on systems w/o epoll+kqueue
Unfortunately, I'm not sure how easy it is to catch these at compile time. Prototypes do not seem to check these at compile time when crossing packages (not even with exported subroutines).
2020-09-12 nntp: share more code between art_lookup callers
This prepares us for future changes to improve scalability to many inboxes.
2020-09-12 treewide: avoid `goto &NAME' for tail recursion
While Perl implements tail recursion via `goto', which allows avoiding warnings on deep recursion, it doesn't (as of 5.28) optimize the speed of such dispatches, though it may reduce ephemeral memory usage. Make the code less alien to hackers coming from other languages by using normal subroutine dispatch. It's actually slightly faster in micro benchmarks due to the complexity of `goto &NAME'.
2020-09-10 wwwstream: show init + index instructions for -V1, too
This should've always been there. I'm not sure how widely spread 1.0 and earlier releases were, but we'll keep documenting the version requirement.
2020-09-10 solver: async blob retrieval for diff extraction
Like the rest of the WWW code, public-inbox-httpd now uses git_async_cat to retrieve blobs without blocking the event loop. This improves fairness when git blobs are on slow storage and allows us to take better advantage of SMP systems.
2020-09-10 solver: break apart inbox blob retrieval
To avoid hogging the event loop in public-inbox-httpd when many candidate messages match, we'll separate the steps to ensure fairness on slow storage.
2020-09-10 solver: check one git coderepo and inbox at a time
With public-inbox-httpd, this mitigates the effect of slow git blob storage with multiple coderepos configured for an inbox. It's still synchronous for now (and may need to remain that way for ->last_check_err), but no longer monopolizes the event loop when checking multiple coderepos. We don't support multi-inbox scanning yet, but this also prepares us for a future where we do. We'll also support >=40 char blob OIDs in preparation for future git SHA-256 support.
2020-09-10 wwwlisting: avoid hogging event loop
Use the just-introduced ConfigIter class, and make ManifestJsGz a subclass of it to reduce duplication.
2020-09-10 extmsg: prevent cross-inbox matches from hogging event loop
With many inboxes, checking multiple SQLite repos will be slow and time-consuming, so ensure we can schedule it fairly between multiple inboxes.
2020-09-10 config: split out iterator into separate object
We will need to allow simultaneous iterators on the same config object, since we'll need this for ExtMsg, NNTPD, WwwListing, NewsWWW, and other places.
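Why a separate iterator object allows simultaneous iteration can be sketched in Python as below: each iterator keeps its own cursor, so two of them can walk the same inbox list one step at a time. The class here only borrows the ConfigIter name; its fields, its `step` method, and the inbox names are hypothetical and do not reflect the real Perl implementation.

```python
class ConfigIter:
    """Walks the inbox names of a config with its own private cursor."""

    def __init__(self, inbox_names, callback):
        self.names = list(inbox_names)
        self.callback = callback
        self.pos = 0

    def step(self):
        """Handle one inbox; return True while more work remains."""
        if self.pos >= len(self.names):
            return False
        self.callback(self.names[self.pos])
        self.pos += 1
        return self.pos < len(self.names)


names = ["alpha", "beta"]
log = []
a = ConfigIter(names, lambda n: log.append(("a", n)))
b = ConfigIter(names, lambda n: log.append(("b", n)))

# An event loop would call step() once per tick; here we interleave by hand.
pending = [a, b]
while pending:
    pending = [it for it in pending if it.step()]
print(log)
```

Doing one inbox per step is also what lets an event loop schedule other clients between steps instead of blocking on the whole list.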
2020-09-10 config: flatten each_inbox and iterate_start args
In Perl, we can simplify callers by passing a single array all the way down the stack instead of a single array ref which needs to be expanded every call.
2020-09-10 www: manifest.js.gz generation no longer hogs event loop
It's still as slow as before with hundreds/thousands of inboxes, but at least it's fair. Future changes will allow it to be cached and memoized with persistent HTTP servers.
2020-09-10 use "\&" where possible when referring to subroutines
"*foo" is ambiguous in that it may refer to a bareword file handle, so we'll use "\&foo" where we can without triggering warnings. PublicInbox::TestCommon::run_script_exit required dropping the prototype, however. We'll also future-proof by dropping "use warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm while we're in the area.