path: root/lib/PublicInbox/ManifestJsGz.pm
Date        Commit message
2021-09-12  www: use ->ALL for per-inbox manifest.js.gz, too
With 11 epochs on LKML, the lkml/manifest.js.gz response time goes from around 60ms to around 10ms, a significant improvement. And improve test coverage while we're at it.
2021-09-12  manifest.js.gz: avoid long-lived per-epoch cat-file processes
When generating per-inbox manifests, we were forgetting to clean up per-epoch "git cat-file --batch" processes. Our previous method of generating modified times was also stupidly inefficient, so replace the pipeline with a single "git for-each-ref" invocation.
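A minimal Python sketch of the idea, assuming the modified time is the newest committer date across a repository's refs. The helper names `newest_ref_time` and `parse_ref_time` are hypothetical, and the exact flags public-inbox passes are not shown in this log; the ones below are standard git-for-each-ref options (the `unix` date format needs a reasonably modern git).

```python
import subprocess
from typing import Optional


def parse_ref_time(out: str) -> Optional[int]:
    """Parse for-each-ref output (one line, a Unix timestamp) into an int."""
    line = out.strip()
    # Empty output: the repository has no refs with a committer date
    return int(line) if line else None


def newest_ref_time(git_dir: str) -> Optional[int]:
    """Newest committer date among all refs, from one short-lived git process."""
    out = subprocess.run(
        ["git", f"--git-dir={git_dir}", "for-each-ref",
         "--sort=-committerdate", "--count=1",
         "--format=%(committerdate:unix)"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ref_time(out)
```

One process per repository that exits on its own leaves nothing to clean up, unlike a persistent "git cat-file --batch" pipe.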
2021-08-28  get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd was the raw $INBOX_DIR/description text, which is now converted to octets. All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode, so length() and bytes::length() are equivalent on reads. For socket writes, any non-octet data would warn about wide characters, and test_httpd is strict about warnings. All gzipped buffers are also octets, as are PublicInbox::Eml->body and anything from PerlIO objects ("git cat-file --batch" output, filesystems), so bytes::length was unnecessary in all those places.
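As a Python analogy only (the module itself is Perl), the distinction is between counting characters and counting octets: once text is encoded, a plain length on the result is the byte count an HTTP Content-Length needs. The helper `description_body` is hypothetical.

```python
from typing import Tuple


def description_body(text: str) -> Tuple[bytes, int]:
    """Encode description text to UTF-8 octets and return (body, Content-Length)."""
    body = text.encode("utf-8")
    # len() on bytes counts octets, so no separate "byte length" helper is needed
    return body, len(body)


desc = "\u00dcn\u00efcode list"  # two non-ASCII characters, each 2 bytes in UTF-8
body, length = description_body(desc)
print(len(desc), length)
```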
2021-06-29  www: fix manifest.js.gz for default publicInbox.grokManifest
ManifestJsGz->response was not invoking the new "url_filter" method properly. Furthermore, fix url_filter for returning 404 responses.
Reported-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87fsx3128a.fsf@kyleam.com/
Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
2021-06-24  www_listing: fix manifest.js.gz generation with extindex "all"
WwwListing and ManifestJsGz may be too different nowadays to be worth sharing code between them. Update some comments and note that we still need better tests :x
Fixes: 520be116e8a686cb ("www_listing: start updating for pagination + search")
2021-06-23  www_listing: start updating for pagination + search
When dealing with thousands of inboxes, displaying all of them on a single page isn't going to work. So steal some pagination and search results code from the message search to generate some basic HTML output that looks good in w3m.
2021-01-01  update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-31  Merge remote-tracking branch 'origin/master' into lorelei
* origin/master: (58 commits)
  ds: flatten + reuse @events, epoll_wait style fixes
  ds: simplify EventLoop implementation
  check defined return value for localized slurp errors
  import: check for git->qx errors, clearer return values
  git: qx: avoid extra "local" for scalar context case
  search: remove {mset} option for ->mset method
  search: remove pointless {relevance} setting
  miscsearch: take reopen from Search and use it
  extsearch: unconditionally reopen on access
  extindex: allow using --all without EXTINDEX_DIR
  extindex: add undocumented --no-scan switch
  extindex: enable autoflush on STDOUT/STDERR
  extindex: various --watch signal handling fixes
  extindex: --watch for inotify-based updates
  eml: fix undefined vars on <Perl 5.28
  t/config: test --get-urlmatch for git <2.26
  default to CORE::warn in $SIG{__WARN__} handlers
  inbox: name variable for values loop iterator
  inboxidle: avoid needless syscalls on refresh
  inboxidle: clue users into resolving ENOSPC from inotify
  ...
2020-12-21  manifest.js.gz: fix per-inbox /$INBOX/manifest.js.gz
/$INBOX/manifest.js.gz should not attempt to match every inbox in the domain (or every inbox); that is for /manifest.js.gz (without a /$INBOX prefix).
Fixes: f303b4add8ea1883 ("wwwlisting: avoid hogging event loop")
2020-12-19  tests: more common JSON module loading
We'll probably be using JSON more in the future, so make it easier to require in tests.
2020-12-10  manifest: account for future cache in MiscIdx docdata
We'll be storing private data inside the "" (empty string) key of the JSON doc we store for manifest.js.gz generation. This private data will allow us to reduce FS activity and speed up startup times, but some of it will also be in Xapian boolean terms and values for searching and filtering.
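A minimal Python sketch of how such private data could be kept out of the published manifest. It assumes each stored doc maps repository paths to grokmirror entries, with private data under the "" key; the function name `public_manifest` and the `cached_mtime` field are hypothetical.

```python
import json
from typing import Dict, Iterable


def public_manifest(docdata: Iterable[str]) -> Dict[str, dict]:
    """Merge per-inbox JSON docdata into one manifest, dropping the private "" key."""
    manifest: Dict[str, dict] = {}
    for raw in docdata:
        doc = json.loads(raw)
        doc.pop("", None)  # private cache data: used internally, never published
        manifest.update(doc)
    return manifest


docs = [
    '{"": {"cached_mtime": 1}, "/meta/git/0.git": {"modified": 1607558400}}',
    '{"/test/git/0.git": {"modified": 1607558401}}',
]
print(sorted(public_manifest(docs)))
```

The empty string cannot collide with a repository path, which makes it a convenient reserved key.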
2020-12-09  rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places for now, since a separate namespace may come into this codebase for local/private client tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object.
2020-11-24  manifest: support faster generation via [extindex "all"]
For a mirror of lore.kernel.org with >140 inboxes, this speeds up manifest.js.gz generation from ~1s to 40ms on my HW. This is still unacceptable when dealing with thousands of inboxes, but gets us closer to where we need to be.
2020-11-24  manifest: use ibx->git_epoch method for v2
We can slightly reduce the amount of version-specific logic here.
2020-11-24  git: add manifest_entry method
We'll be using this for MiscIdx and pre-generating the necessary JSON for manifest.js.gz, so make it easier to share code for generating per-repo JSON entries for grokmirror.
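A Python analog of what a per-repo manifest entry might contain; this is not public-inbox's code. It assumes, following grokmirror's manifest convention as we understand it, that `fingerprint` is the SHA-1 hex digest of `git show-ref` output and `modified` is a Unix timestamp. The function names are hypothetical.

```python
import gzip
import hashlib
import json
from typing import Dict, Optional


def manifest_entry(show_ref_output: bytes, modified: int,
                   description: Optional[str] = None,
                   owner: Optional[str] = None) -> dict:
    """Build one grokmirror-style entry from `git show-ref` output and an mtime."""
    entry = {
        "fingerprint": hashlib.sha1(show_ref_output).hexdigest(),
        "modified": modified,
    }
    # Optional fields are only emitted when known
    if description is not None:
        entry["description"] = description
    if owner is not None:
        entry["owner"] = owner
    return entry


def manifest_js_gz(entries: Dict[str, dict]) -> bytes:
    """Serialize {repo_path: entry} as gzipped JSON, the manifest.js.gz payload."""
    return gzip.compress(json.dumps(entries, sort_keys=True).encode("utf-8"))


entry = manifest_entry(b"0123 refs/heads/master\n", 1606176000, "test inbox")
blob = manifest_js_gz({"/test/git/0.git": entry})
```

Keeping entry generation in one shared helper lets a cache (such as MiscIdx) and the live response produce identical JSON.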
2020-11-24  move JSON module portability into PublicInbox::Config
We'll be using JSON in MiscIdx and MiscSearch, and PublicInbox::Config seems like an appropriate place to put it.
2020-10-05  manifest: favor Cpanel::JSON::XS
JSON::MaybeXS already favors Cpanel::JSON::XS (and has for many years, now). Allow users to skip installing JSON::MaybeXS if they want an XS-based JSON implementation.
2020-09-10  wwwlisting: avoid hogging event loop
This uses the just-introduced ConfigIter class, and makes ManifestJsGz a subclass of it to reduce duplication.
2020-09-10  www: manifest.js.gz generation no longer hogs event loop
It's still as slow as before with hundreds/thousands of inboxes, but at least it's fair. Future changes will allow it to be cached and memoized with persistent HTTP servers.
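The fairness idea can be sketched with Python's asyncio as an analogy (public-inbox uses its own Perl event loop and ConfigIter, not asyncio): total work stays the same, but the builder yields after each batch of inboxes so other ready clients get serviced in between. `build_manifest`, `make_entry`, and `batch` are hypothetical names.

```python
import asyncio
from typing import Callable, Dict, Iterable, List, Tuple


async def build_manifest(inboxes: Iterable[str],
                         make_entry: Callable[[str], Dict[str, dict]],
                         batch: int = 1) -> Dict[str, dict]:
    """Build a manifest incrementally, yielding to the event loop every `batch` inboxes."""
    manifest: Dict[str, dict] = {}
    for n, ibx in enumerate(inboxes, 1):
        manifest.update(make_entry(ibx))
        if n % batch == 0:
            await asyncio.sleep(0)  # let other ready tasks (clients) run
    return manifest


async def demo() -> Tuple[Dict[str, dict], List[str]]:
    log: List[str] = []

    async def other_client() -> None:
        for _ in range(3):
            log.append("other")
            await asyncio.sleep(0)

    def make_entry(name: str) -> Dict[str, dict]:
        log.append(name)
        return {f"/{name}/git/0.git": {}}

    task = asyncio.create_task(other_client())
    manifest = await build_manifest(["a", "b", "c"], make_entry)
    await task
    return manifest, log


manifest, log = asyncio.run(demo())
print(log)
```

The other client's work is interleaved with manifest generation instead of waiting for all inboxes to finish, which is the "fair but still slow" trade-off described above.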