about summary refs log tree commit homepage
path: root/lib/PublicInbox/MiscSearch.pm
DateCommit message (Collapse)
2021-06-23www_listing: start updating for pagination + search
When dealing with thousands of inboxes, displaying all of them on a single page isn't going to work. So steal some pagination and search results code from the message search to generate some basic HTML output that looks good in w3m.
2021-06-23search: make xap_terms easier-to-use and use it more
This allows us to simplify callers throughout, and exceptions are can no longer be silently hidden. MiscSearch now uses xap_terms for looking up eidx_key terms for a code reduction. We also simplify LeiStore->_msg_kw for runtime use by moving the MsetIterator handling into t/lei_store.t test case.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-28miscsearch: take reopen from Search and use it
As with ExtSearch, MiscSearch lacks a janky cleanup timer of PublicInbox::Inbox objects, leading to info about inboxes/newsgroups going stale. Fortunately, we don't use MiscSearch very heavily, yet. In the future, we may be able to detect new inboxes without having to SIGHUP or restart daemons using MiscSearch.
2020-12-23miscsearch: index UIDVALIDITY, use as startup cache
This brings -nntpd startup time down from ~35s to ~5s with 50K inboxes. Further improvements ought to be possible with deeper changes to MiscIdx, since -mda having to load every inbox seems unreasonable; but this general change is fairly unintrusive.
2020-12-23miscsearch: load Xapian at initialization
We need Xapian bindings loaded before calling (Search::)Xapian::Database->new
2020-11-28miscsearch: implement ->newsgroup_matches
This may be used to speed up newsgroup searches down-the-line, but the grep perlop isn't too shabby, at the moment.
2020-11-24*search: simplify retry_reopen users
Every callback uses `$self', and creating short-lived array references is not necessary when it's just as easy to copy the array in Perl (unlike C).
2020-11-24manifest: support faster generation via [extindex "all"]
For a mirror of lore.kernel.org with >140 inboxes, this speeds up manifest.js.gz generation from ~1s to 40ms on my HW. This is still unacceptable when dealing with thousands of inboxes, but gets us closer to where we need to be.
2020-11-24miscsearch: a new Xapian sub-DB for extindex
This will be used to index and search Inbox objects and perhaps individual git repositories/epochs for grokmirror manifest.js.gz generation. There is no sharding planned for this at the moment since inbox count should remain low (~100K to 1M) compared to message count. Folding this into the existing sharded DBs could be possible; but would likely increase query and maintenance costs, as well as development complexity. So we'll use a few more inodes and FDs at runtime, instead.