Date Commit message
2020-09-20 doc: post-1.6 updates, start 1.7
I should've dropped "PENDING" notes before the 1.6 release; they're dropped now, and a note is added to remind my future self to drop them before 1.7.
2020-09-20 config: warn on multiple values for some fields
Our code doesn't support multi-values for these, and having unexpected arrays leads to unexpected results (e.g. showing stuff like "ARRAY(0xDEADBEEFADD12E55)" in user interfaces). So warn and only use the last value (matching git-config(1) behavior without `--get-all').
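A minimal sketch of the "warn and use the last value" behavior described above. The helper name `last_value' and the sample keys are hypothetical and are not taken from public-inbox's actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: if a config field unexpectedly holds an array,
# warn and keep only the last element (as git-config(1) does without
# --get-all); scalars pass through untouched.
sub last_value {
	my ($key, $val) = @_;
	return $val if ref($val) ne 'ARRAY';
	warn "W: $key has multiple values, only using the last: $val->[-1]\n";
	$val->[-1];
}

# Sample data, made up for illustration.
my %cfg = (
	'publicinbox.test.inboxdir' => [ '/path/a', '/path/b' ],
	'publicinbox.test.address' => 'test@example.com',
);
my $inboxdir = last_value('publicinbox.test.inboxdir',
			$cfg{'publicinbox.test.inboxdir'});
my $address = last_value('publicinbox.test.address',
			$cfg{'publicinbox.test.address'});
print "$inboxdir $address\n";
```

Without such a check, interpolating the array ref into a string is what produces output like "ARRAY(0x...)".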
2020-09-19 gcf2: wire up read-only daemons and rm -gcf2 script
It seems easiest to have a singleton Gcf2Client object per daemon worker for all inboxes to use. This reduces overall FD usage from pipes. The `public-inbox-gcf2' command + manpage are gone and a `$^X' one-liner is used, instead. This saves inodes for internal commands and hopefully makes it easier to avoid mismatched PERL5LIB include paths (as noticed during development :x). We'll also make the existing cat-file process management infrastructure more resilient to BOFHs on process killing sprees (or in case our libgit2-based code fails on us). (Rare) PublicInbox::WWW PSGI users NOT using public-inbox-httpd won't automatically benefit from this change, and extra configuration will be required (to be documented later).
2020-09-19 gcf2: require git dir with OID
This amortizes the cost of recreating PublicInbox::Gcf2 objects when alternates change in v2 all.git.
2020-09-19 gcf2*: more descriptive package descriptions
Hopefully this allows others to more quickly figure out what's going on.
2020-09-19 gcf2: transparently retry on missing OID
Since we only get OIDs from trusted local data sources (over.sqlite3), we can safely retry within the -gcf2 process without worrying about clients spamming us with requests for invalid OIDs and triggering reopens.
2020-09-19 add gcf2 client and executable script
This should be able to replace multiple `git cat-file' for blob retrieval, but adjustments may be needed.
2020-09-19 t/gcf2: test changes to alternates
Calling ->add_alternate won't pick up new additions to $OBJDIR/info/alternates, unfortunately. Thus v2 inboxes will need to do something to invalidate Gcf2 objects.
2020-09-19 gcf2: libgit2-based git cat-file alternative
Having tens of thousands of inboxes and associated git processes won't work well, so we'll use libgit2 to access the object DB directly. We only care about OID lookups and won't need to rely on per-repo revision names or paths. The Git::Raw XS package won't be used since its manpages don't promise a stable API. Since we already use Inline::C and have experience with I::C when it comes to compatibility, this only introduces libgit2 itself as a source of new incompatibilities. This also provides an excuse for me to writev(2) to reduce syscalls, but liburing is on the horizon for next year.
2020-09-18 git_async_cat: inline + drop redundant batch_prepare call
$git->cat_async already calls $git->batch_prepare iff needed, so we can reduce subroutine calls and inline a one-off subroutine to save some memory, here.
2020-09-18 doc: txt2pre: more manpage URLs
We host our own -imapd manpage, and we started using a few more git commands (fast-import for ages). We'll also need to link to manpages.debian.org and live with long URLs for a few non-standard manpages in software we reference.
2020-09-18 doc: flow: include -imapd
It's another read-only daemon, and it may see more usage than -nntpd as more users have IMAP support than NNTP.
2020-09-16 t/indexlevels-mirror: fix improperly skipped test
Oops :x
2020-09-16 public-inbox 1.6.0 (v1.6.0)
2020-09-16 git_async_cat: fix outdated comment
We replaced Danga::Socket with PublicInbox::DS roughly a year before GitAsyncCat was introduced into our git history.
2020-09-16 wwwtext: link to public-inbox.org/meta archives
Since we're advertising our address at meta@public-inbox.org, we should advertise the archives, too.
2020-09-16 wwwstream: link to cgit URLs for coderepo
Hopefully this reduces the ambiguity between code for the project(s) using public-inbox and the code for public-inbox itself.
2020-09-16 treewide: relax allow >=40 chars for git OID
This will help with eventual git SHA-256 transitions.
2020-09-16 mid: rename MID_MAX to ID_MAX
It's only used for HTML anchors which we will need indefinitely.
2020-09-15 imap: quiet uninitialized variable warning on FETCH
This was triggered by blindly trying to FETCH an MSN (not "UID FETCH") on an empty dummy inbox. It's harmless, and probably triggered by a wayward client or misbehaving bot.
2020-09-15 ci/deps: add Plack::Test::ExternalServer for devtest
More of our Plack tests exercise public-inbox-httpd, nowadays; and ExternalServer lets us test it easily alongside generic PSGI stuff.
2020-09-15 t/imapd.t: skip dependent test on failure
We don't want to cascade failures/warnings when something else breaks. There's likely more of these to be fixed as we encounter them.
2020-09-14 doc: TODO and release notes updates ahead of 1.6
Some more things have happened... And drop some items which are too expensive to support, such as automatic mirroring.
2020-09-14 tests: consistently check for xapian-compact
We may need to test against development versions of Xapian, which may rely on setting `XAPIAN_COMPACT=xapian-compact-1.5'. Ensure it's possible to do that. And add a missing check in t/xcpdb-reshard.t, too.
2020-09-14 sigfd: fix typos and scoping on systems w/o epoll+kqueue
Unfortunately, I'm not sure how easy it is to catch these at compile time. Prototypes do not seem to check these at compile time when crossing packages (not even with exported subroutines).
2020-09-14 doc: Add piem to list of clients
2020-09-12 nntp: share more code between art_lookup callers
This prepares us for future changes to improve scalability to many inboxes.
2020-09-12 t/nntpd: add test for the XPATH command
It's only in RFC 2980 (not 977 or 3977), but Net::NNTP has supported it since 2001, at least. We'll be making changes to avoid pathological behavior, so test it, first.
2020-09-12 treewide: avoid `goto &NAME' for tail recursion
While Perl implements tail recursion via `goto', which allows avoiding warnings on deep recursion, it doesn't (as of 5.28) optimize the speed of such dispatches, though it may reduce ephemeral memory usage. Make the code less alien to hackers coming from other languages by using normal subroutine dispatch. It's actually slightly faster in micro benchmarks due to the complexity of `goto &NAME'.
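To make the two call styles concrete, here is a small sketch contrasting `goto &NAME' with normal subroutine dispatch. The `sum_goto' and `sum_call' subroutines are hypothetical and exist only for illustration; they are not from the public-inbox tree:

```perl
use strict;
use warnings;

# Tail call via `goto &NAME': @_ is set up by hand and the current
# frame is replaced, so the stack does not grow and Perl emits no
# "Deep recursion" warning.
sub sum_goto {
	my ($n, $acc) = @_;
	return $acc if $n == 0;
	@_ = ($n - 1, $acc + $n);
	goto &sum_goto;
}

# Normal subroutine dispatch: one new frame per level, but it reads
# like a tail call in most other languages.
sub sum_call {
	my ($n, $acc) = @_;
	return $acc if $n == 0;
	sum_call($n - 1, $acc + $n);
}

my $via_goto = sum_goto(50, 0);
my $via_call = sum_call(50, 0);
print "$via_goto $via_call\n";
```

Both forms compute the same result; the difference lies in stack usage, warnings on deep recursion, and readability.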
2020-09-10 wwwstream: show init + index instructions for -V1, too
This should've always been there. I'm not sure how widely spread 1.0 and earlier releases were, but we'll keep documenting the version requirement.
2020-09-10 solver: async blob retrieval for diff extraction
Like the rest of the WWW code, public-inbox-httpd now uses git_async_cat to retrieve blobs without blocking the event loop. This improves fairness when git blobs are on slow storage and allows us to take better advantage of SMP systems.
2020-09-10 solver: break apart inbox blob retrieval
To avoid hogging the event loop in public-inbox-httpd when many candidate messages match, we'll separate the steps to ensure fairness on slow storage.
2020-09-10 solver: check one git coderepo and inbox at a time
With public-inbox-httpd, this mitigates the effect of slow git blob storage with multiple coderepos configured for an inbox. It's still synchronous for now (and may need to remain that way for ->last_check_err), but no longer monopolizes the event loop when checking multiple coderepos. We don't support multi-inbox scanning yet, but this also prepares us for a future where we do. We'll also support >=40 char blob OIDs in preparation for future git SHA-256 support.
2020-09-10 wwwlisting: avoid hogging event loop
By using the just-introduced ConfigIter class. And make ManifestJsGz a subclass of it to reduce duplication.
2020-09-10 extmsg: prevent cross-inbox matches from hogging event loop
With many inboxes, checking multiple SQLite repos will be slow and time-consuming, so ensure we can schedule it fairly between multiple inboxes.
2020-09-10 t/cgi.t: show stderr on failures
This helped me diagnose an error I would've introduced in the next commit.
2020-09-10 config: split out iterator into separate object
We will need to allow simultaneous iterators on the same config object, since we'll need this for ExtMsg, NNTPD, WwwListing, NewsWWW, and other places.
2020-09-10 config: flatten each_inbox and iterate_start args
In Perl, we can simplify callers by passing a single array all the way down the stack instead of a single array ref which needs to be expanded every call.
2020-09-10 www: manifest.js.gz generation no longer hogs event loop
It's still as slow as before with hundreds/thousands of inboxes, but at least it's fair. Future changes will allow it to be cached and memoized with persistent HTTP servers.
2020-09-10 use "\&" where possible when referring to subroutines
"*foo" is ambiguous in that it may refer to a bareword file handle; so we'll use "\&foo" where we can without triggering warnings. PublicInbox::TestCommon::run_script_exit required dropping the prototype, however. We'll also future-proof by dropping "use warnings" in Cgit.pm and use the less-ambiguous "//=" in Inbox.pm while we're in the area.
2020-09-10 solver: drop warnings, modernize use v5.10.1, use SEEK_SET
With Perl upstream preparing to deprecate things, we'll move towards only enabling warnings during development via shebang and stop enabling them via "use". We'll also favor "use v5.10.1" over the Perl 5.6-compatible "use 5.010_001", since our code base never worked on 5.6. Finally, we were also importing SEEK_SET without using it; just use it for readability, since we can't avoid loading Fcntl in other places and it'll get constant-folded, anyways.
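A short sketch of the style this message describes: the v-string form of the version requirement, no "use warnings", and SEEK_SET in place of a bare 0. The temporary file and its contents are made up for illustration and are not part of the solver code:

```perl
use strict;
use v5.10.1; # v-string form of the same minimum as "use 5.010_001"
use Fcntl qw(SEEK_SET);

# Anonymous temporary file, opened read-write (illustration only).
open(my $fh, '+>', undef) or die "open: $!";
print $fh "hello\n" or die "print: $!";

# SEEK_SET says "absolute offset" more clearly than a literal 0 would.
seek($fh, 0, SEEK_SET) or die "seek: $!";
my $line = <$fh>;
print $line;
```

With no "use warnings" in the file, warnings can still be enabled for development with `-w' on the shebang line or the command line.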
2020-09-10 xt/solver: test with public-inbox-httpd, too
We'll be making changes to solver to make it even fairer to slow clients on slow storage. Ensure we test with public-inbox-httpd-specific codepaths, since the generic PSGI code paths are rare in production use.
2020-09-10 wwwtext: config comment improvements
Use the full URL of the inbox being mirrored to reduce ambiguity (instead of just the inbox name). Using asymmetric quotes (e.g. `foo') improves readability for me in that it's more obvious when a quote begins and ends. It also lights up fewer pixels and reduces visual noise compared to double-quotes. We'll also reflow the `mainrepo' vs `inboxdir' comment slightly to emphasize the word `instead'.
2020-09-10 wwwtext: don't blindly quote "git clone" destination
Save screen space and light up fewer pixels to reduce visual noise.
2020-09-10 wwwtext: describe the use of `coderepo' entries
The `solver' feature is not very obvious, give potential users a hint about it.
2020-09-10 nntp: fix cross-newsgroup Message-ID lookups
We cannot blindly use the selected newsgroup for HEAD/ARTICLE/BODY requests using Message-ID, since those commands look across all newsgroups; not just the selected one (if any). So stuff a reference to the Inbox object into $smsg. We can reduce args passed into set_nntp_headers() and msg_hdr_write(), too. Fixes: 0e6ceff37fc38f28 ("nntp: support slow blob retrievals")
2020-09-09 wwwstream: fix "Atom feed" link
Oops, I wanted to stop escaping double-quotes with `qq()' but used `q()' instead :x Fixes: 2f61828fcb727e51 ("www: make mirror instructions more prominent")
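For readers unfamiliar with the two operators, a minimal sketch of the difference behind this bug. The `$href' value and the markup are invented for illustration and are not the actual wwwstream code:

```perl
use strict;
use warnings;

my $href = 'new.atom'; # made-up link target

# q() behaves like single quotes: no interpolation, so the literal
# text "$href" ends up in the output (the bug described above).
my $broken = q(<a href="$href">Atom feed</a>);

# qq() behaves like double quotes: $href is interpolated, and the
# embedded double-quotes still need no escaping.
my $fixed = qq(<a href="$href">Atom feed</a>);

print "$broken\n$fixed\n";
```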
2020-09-09 contrib/css: limit <a> coloring to links, only
We don't want <a> tags without href= attributes to be colored, since the `<a id=mirror>' tag in the HTML footer is intended as an anchor destination for the `<a href=#mirror>' link at the top.
2020-09-09 www: make mirror instructions more prominent
In order to fight the misconception that public-inboxes are centralized, anchor "#mirror" to the clone instructions and place an emphasis on "mirror", not just cloning. While we're at it, better describe multi-epoch -V2 inboxes, since some users do not seem to realize epochs consist of different data.
2020-09-03 v2writable: reuse read-only shard counting code
We'll also fix the read-only code to ensure we notice missing Xapian shards, since gaps would throw off our expectation that Xapian document IDs and NNTP article numbers are interchangeable.