path: root/lib/PublicInbox/Isearch.pm
Date Commit message
2024-04-24 www: wire up search to use async xap_helper
The C++ version of xap_helper will allow more complex and expensive queries. Both the Perl and C++-only versions will allow offloading search into a separate process which can be killed via ITIMER_REAL or RLIMIT_CPU in the face of overload. The xap_helper `mset' command wrapper is simplified to unconditionally return rank, percentage, and estimated matches information. This may slightly penalize mbox retrievals and lei users, but perhaps that can be a different command entirely.
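As a rough illustration of the "separate process which can be killed" idea, the following sketch forks a child, arms ITIMER_REAL in it, and lets the default SIGALRM action terminate the child when its work overruns. This is not the xap_helper code: `run_with_deadline` and the busy-loop workload are hypothetical names made up for this example, and it only uses the core POSIX and Time::HiRes modules.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);
use Time::HiRes qw(setitimer ITIMER_REAL);

# Hypothetical helper (not part of public-inbox): run $work in a child
# process which the kernel terminates once $seconds of real time elapse.
# Returns the number of the signal that killed the child, or 0 if the
# child exited on its own.
sub run_with_deadline {
    my ($seconds, $work) = @_;
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) { # child
        $SIG{ALRM} = 'DEFAULT'; # default action for SIGALRM: terminate
        setitimer(ITIMER_REAL, $seconds); # one-shot real-time timer
        $work->();
        POSIX::_exit(0); # skip END blocks and destructors in the child
    }
    waitpid($pid, 0);
    return $? & 127; # low 7 bits of the wait status hold the signal
}

# A runaway "query" is killed by the timer; a cheap one finishes normally.
my $sig_slow = run_with_deadline(0.2, sub { while (1) {} });
my $sig_fast = run_with_deadline(5, sub { 1 });
print "slow child died from signal $sig_slow\n";
print "fast child died from signal $sig_fast\n";
```

RLIMIT_CPU works along similar lines but counts CPU time rather than wall-clock time; setting it from Perl needs a non-core module, so it is left out of this sketch.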
2023-08-24 introduce optional C++ xap_helper
This allows us to perform the expensive "dump_ibx" operations in native C++ code using the Xapian C++ library. This provides the majority of the speedup with the -cindex --associate switch. Eventually this may be expanded to cover all uses of Xapian within the project to ensure we have access to Xapian APIs which aren't available in XS|SWIG bindings; and also for ease of installation on systems which don't provide pre-packaged Perl Xapian bindings (e.g. OpenBSD 7.3) but do provide Xapian development libraries. Most of the C++ code is still C, as I'm not remotely as familiar with C++ as I am with C. I suspect many users and potential hackers, being from the git, Linux kernel, and glibc worlds, are in the same boat.
2023-08-19 isearch: avoid hex string for Xapian sortable_serialise
While a string representing an integer in hex is fine for DBI and SQLite, Xapian's sortable_serialise requires a Perl integer value. So just retrieve the last Xapian DB document ID in this rare code path because we can't use 64-bit integer literals in some 32-bit Perl builds (e.g. OpenBSD on i386).
Fixes: be2a0a353d60 ("isearch: support 64-bit article numbers for SQLite query")
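To see why a hex string is a poor fit wherever Perl expects a number, here is a minimal sketch of how Perl numifies such a string. The exact form of the string used in the original code is not shown in this log, so the `0x`-prefixed value below is an assumption made purely for illustration.

```perl
use strict;
use warnings;

# Assumed form of a hex string; the value is made up for illustration.
my $hex_str = '0x7fffffff';

# Implicit numeric conversion stops at the first non-digit ("x"),
# so the whole string silently becomes 0.
my $implicit;
{
    no warnings 'numeric'; # silence the "isn't numeric" warning here
    $implicit = $hex_str + 0;
}

# hex() parses the string explicitly and yields the intended value.
my $explicit = hex($hex_str);

print "implicit: $implicit\n";
print "explicit: $explicit\n";
```

Even explicit conversion only helps while the value fits in the Perl build's native integer, which is why the commit sidesteps the literal entirely on 32-bit builds.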
2023-05-07 isearch: support 64-bit article numbers for SQLite query
While IMAP UIDs are specified as 32-bit in RFC 3501, there's no reason we can't support 64-bit article numbers on our end when the time comes. Neither NNTP nor POP3 have the 32-bit limitation, even, so it's not inconceivable that IMAP will drop that limitation at some point, too.
2022-08-04 isearch: mset_to_artnums: avoid unnecessary ops
We can use DBI's selectcol_arrayref directly (as we do in other places) to avoid unnecessary arrays and ops on our end.
2021-10-12 isearch: do not access Extsearch->{over} directly
It may not exist due to periodic cleanup to avoid excessive FD use.
2021-02-11 search: use git approxidate in WWW and "lei q --stdin"
This greatly improves the usability of d:, dt:, and rt: search prefixes for users already familiar with git's "approxidate" feature. That is, users familiar with the --(since|after|until|before)= options in git-log(1) and similar commands will be able to use those dates in the WWW UI.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-28 search: remove {mset} option for ->mset method
The ->mset method always returns a Xapian mset nowadays, so naming a parameter {mset} is too confusing. As it does with MiscSearch, setting the {relevance} parameter to -1 now sorts by ascending docid order. -2 is now supported for descending docid order, too, since it may be useful for lei users.
2020-12-21 isearch: use numeric sort for article numbers
Perl sort is alphabetical by default and Xapian uses numeric document IDs, so sort must be told explicitly to use numeric comparisons even if the scalars are integer values (IV) internally. And eliminate extra hash marks ("#") since they're probably too noisy if there are many IDs. Note: I haven't seen this warning message in syslog, yet :>
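The difference between the default and numeric comparisons can be sketched as follows. The document IDs are made-up values, not taken from any real index.

```perl
use strict;
use warnings;

# Hypothetical document IDs, stored as integers.
my @docids = (10, 9, 100, 2);

# The default sort compares elements as strings, even for integers.
my @by_string = sort @docids;

# An explicit <=> comparator gives numeric order.
my @by_number = sort { $a <=> $b } @docids;

print "string order:  @by_string\n"; # 10 100 2 9
print "numeric order: @by_number\n"; # 2 9 10 100
```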
2020-12-05 imap: support isearch and reduce Xapian queries
Since IMAP search (either with Isearch or traditional per-Inbox search) only returns UIDs, we can safely set the limit to the UID slice size(*). With isearch, we can also trust the Xapian result to fit any docid range we specify. Limiting Xapian results to 1000 was making ->ALL docid <=> per-Inbox UID mapping impossible since results could overlap between ranges unpredictably. Finally, we can map the ->ALL docids into per-Inbox UIDs and show them to the client in the UID order of the Inbox, not the docid order of the ->ALL extindex. This also lets us get rid of the "uid:" query parser prefix and use the Xapian::Query API directly to reduce our search prefix footprint. For mbox.gz downloads in WWW, we'll also make a best effort to preserve the order from the Inbox, not the order of extindex; though it's possible large result sets can have non-overlapping windows. (*) By definition, UID slice size is a "safe" value which shouldn't OOM either the server or clients.
2020-12-05 isearch: emulate per-inbox search with ->ALL
Using the "eidx_key:" boolean prefix to limit results to a given inbox, we can use ->ALL to emulate and replace per-Inbox xap15/[0-9] search indices. With this change, the presence of "extindex.all.topdir" in the $PI_CONFIG will cause the WWW code to use that extindex and ignore per-inbox Xapian DBs in xap15/[0-9]. Unfortunately, IMAP search still requires old per-inbox indices, for now. Mapping extindex Xapian docids to per-Inbox UIDs and vice-versa is proving tricky. Fortunately, IMAP search is rarely used and optional. The RFCs don't specify expensive phrase search, either, so `indexlevel=medium' can be used in per-inbox Xapian indices to save space. For primarily WWW (and future JMAP) users, this should result in significant disk space, FD, and page cache footprint savings for large instances with many inboxes and many cross-posted messages.