about summary refs log tree commit homepage
path: root/t/psgi_search.t
DateCommit message (Collapse)
2023-11-27www: qs_html: fix escaping of `q' param
Our use of MID_ESC characters was only intended for the pathname component of URIs and not appropriate for the query string component. So use a different $unsafe parameter list for uri_escape to make the result appropriate for query strings by disallowing [\&\'\+=] characters. Most notably, this change also allows us to accept `/' (slash) unescaped to make dfn: queries nicer to look at. Finally, we'll also add a ascii_html call on the URI-escaped result as an extra safety measure even though it's not really needed. As far as I can tell, the code without this fix didn't result in in an HTML injection since all our uses of uri_escape did escape angle brackets. Reported-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Link: https://public-inbox.org/meta/87o7ff4nlk.fsf@collabora.com/ Tested-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
2023-09-24config: drop scalar ref support from internal API
It's a needless branch to maintain exclusively for our tests. The `git config -l' output isn't pleasant to write in tests, anyways. So just use heredocs to write git configs in their native format rather than emulate the output of `git config -l'. This does make the test suite do more work with temporary files and process invocations, but it doesn't seem very measurable when testing on tmpfs (TMPDIR=/dev/shm). We'll make a minor improvement to TestCommon::tmpdir by allowing it to return a single value (which I suspect we can rely on in more places since File::Temp::Dir overloads stringification).
2023-09-11treewide: favor Xapian (SWIG binding) over Search::Xapian
The Xapian SWIG bindings are favored by Xapian upstream for ease-of-maintenance compared to the XS version. While Debian lags on this front, the SWIG bindings are widely available on all *BSDs.
2023-01-13search_view: show "No results" text on 404
Oops, this was broken a while ago Fixes: 55263c56cf41c87f (wwwstream: reduce blob fetch paths for ->getline, 2020-07-05)
2022-11-26eml: header_raw converts octets to Perl UTF-8
This fixes the display of raw (non-RFC 2047) names and subjects in HTML message views. SMTPUTF8 (RFC 6531) allows raw UTF-8 in headers without RFC 2047 encoding, so let Perl handle it as a character sequence for the rest of our consumers. Thus, the old special case in PublicInbox::Smsg->populate is no longer necessary and gone. The one regression notice so far (and fixed here) is compressed IMAP envelope responses still needs raw bytes since the zlib wrapper is designed for octets, not Perl UTF-8 chars. Thus we reverse utf8::decode with utf8::encode in PublicInbox::IMAP::_esc. ->header_set also forces encoding to bytes, since all existing callers would either be dealing with ->header_raw results or be RFC-2047-encoded anyways. Reindexing is not necessary with this change due to the prior PublicInbox::Smsg->populate special case. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20221124153715.3nenjpjzj43vqxr2@meerkat.local/
2021-08-28get rid of unnecessary bytes::length usage
The only place where we could return wide characters with -httpd was the raw $INBOX_DIR/description text, which is now converted to octets. All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode, so length() and bytes::length() are equivalent on reads. For socket writes, any non-octet data would warn about wide characters and we are strict in warnings with test_httpd. All gzipped buffers are also octets, as is PublicInbox::Eml->body, and anything from PerlIO objects ("git cat-file --batch" output, filesystems), so bytes::length was unnecessary in all those places.
2021-06-23www: do not warn on blank query parameters
Sometimes users (or bots) may lead queries with '&' and trigger uninitialized variable warnings, just ignore them and give consumers a $ctx->{qp}->{''} entry. While we're in the area, pass a regexp rather than scalar string to the `split' perlop to prevent Perl from recompiling the regexp on every call.
2021-03-15t/psgi_search: use create_inbox
2021-02-15t/psgi_search: fix test around date boundaries
git approxidate won't actually return times in the future, so "1.{hour,day,year}.from.now" all return the current epoch time. So just use "now" and ensure we have a predictable time zone for testing.
2021-02-11search: use git approxidate in WWW and "lei q --stdin"
This greatly improves the usability of d:, dt:, and rt: search prefixes for users already familiar git's "approxidate" feature. That is, users familiar with the --(since|after|until|before)= options in git-log(1) and similar commands will be able to use those dates in the WWW UI.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-08-23mbox: disable "&t" on existing Xapian until full reindex
Expanding threads via over.sqlite3 for mbox.gz downloads without Xapian effectively collapsing on the THREADID column leads to repeated messages getting downloaded. To avoid that situation, use a "has_threadid" Xapian metadata flag that's only set on --reindex (and brand new Xapian DBs). This allows admins to upgrade WWW or do --reindex in any order; without worrying about users eating up bandwidth and CPU cycles.
2020-08-19smsg: handle wide characters in raw mail headers
There may be messages in the wild with wide characters in headers which aren't non-RFC2047 encoded. Assume UTF-8 so those fields can round trip through over.sqlite3. This doesn't affect docdata.glass in Xapian, but it does affect how over.sqlite3 stores the same deflated info.
2020-08-01improve error handling on import fork / lock failures
v?fork failures seems to be the cause of locks not getting released in -watch. Ensure lock release doesn't get skipped in ->done for both v1 and v2 inboxes. We also need to do everything we can to ensure DB handles, pipes and processes get released even in the face of failure. While we're at it, make failures around `git update-server-info' non-fatal, since smart HTTP seems more popular anyways. v2 changes: - spawn: show failing command - ensure waitpid is synchronous for inotify events - teardown all fast-import processes on exception, not just the failing one - beef up lock_release error handling - release lock on fast-import spawn failure
2020-05-09remove most internal Email::MIME usage
We no longer load or use Email::MIME outside of comparison tests.
2020-04-22t/*.t: reduce dependency on Email::MIME APIs
Instead, favor PublicInbox::MIME->new for non-attachment emails. We may support alternatives to Email::MIME down the line. We'll still keep Email::MIME->create to deal with attachments, for now, but there's also a fair amount of test duplication we should eliminate, later.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-31mboxgz: ensure gzipped mboxes always have filenames
Lets always have Content-Disposition for files intended to be downloaded for consumption by non-browsers, such as pigz, zcat, "git am". This is also to be consistent with the non-gzipped mbox $MESSAGE_ID/raw endpoint.
2020-01-31t/psgi_search: test for subject-free messages
Apparently I fixed this bug a while back in commit f94c3a195a25a31d0215cd175938008fca473378 but did not write tests.
2020-01-11make Plack optional for non-WWW and non-httpd users
Some users just want to run -mda, -watch, and/or -nntpd. Let them run just those without forcing them to pull in a bunch of dependencies.
2019-12-24search: support SWIG-generated Xapian.pm
Xapian upstream is slowly phasing out the XS-based Search::Xapian in favor of the SWIG-generated "Xapian" package. While Debian and both FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian" binding. More information about the status of the "Xapian" Perl module here: https://trac.xapian.org/ticket/523
2019-12-24testcommon: add require_mods method and use it
This cuts down on lines of code in individual test cases and fixes some misnamed error messages by using "$0" consistently. This will also provide us with a method of swapping out dependencies which provide equivalent functionality (e.g "Xapian" SWIG can replace "Search::Xapian" XS bindings).
2019-12-19tests: move t/common.perl to PublicInbox::TestCommon
We want to be able to use run_script with *.t files, so t/common.perl putting subs into the top-level "main" namespace won't work. Instead, make it a module which uses Exporter like other libraries.
2019-11-24tests: use File::Temp->newdir instead of tempdir()
We'll also introduce a tmpdir() API to give tempdirs consistent names.
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-10-15config: we always have {-section_order}
Rewrite a bunch of tests to use ordered input (emulating "git config -l" output) so we can always walk sections in the order they were given in the config file.
2019-09-09run update-copyrights from gnulib for 2019
2019-06-16t/psgi_search.t: use higher-level APIs
No point in using lower-level APIs for a PSGI test.
2019-06-14searchidx: require PublicInbox::Inbox (or InboxWritable) ref
PublicInbox::Inbox objects have minimal dependencies, so drop code to support old tests which existed before the PublicInbox::Inbox object came into existence.
2019-06-03t/psgi_search.t: require DBD::SQLite
In case we encounter an odd system which has Search::Xapian but not DBD::SQLite.
2019-05-15lazy load Xapian and make it optional for v2
More tests work without Search::Xapian, now. Usability issues still need to be fixed
2019-02-13ensure bytes::length is available to callers
We were relying on Danga::Socket using the "bytes" pragma, previously. Nowadays, the "bytes" pragma is not recommended in general, but bytes::length remains acceptable for getting the byte-size of a scalar.
2018-04-22extmsg: use Xapian only for partial matches
"LIKE" in SQLite (and other SQL implementations I've seen) is expensive with nearly 3 million messages in the archives. This caused some partial Message-ID lookups to take over 600ms on my workstation (~300ms on a faster Xeon). Cut that to below under 30ms on average on my workstation by relying exclusively on Xapian for partial Message-ID lookups as we have in the past. Unlike in the past when we tried using Xapian to match partial Message-IDs; we now optimize our indexing of Message-IDs to break apart "words" in Message-IDs for searching, yielding (hopefully) "good enough" accuracy for folks who get long URLs broken across lines when copy+pasting. We'll also drop the (in retrospect) pointless stripping of "/[tTf]" suffixes for the partial match, since anybody who hits that codepath would be hitting an invalid message ID. Finally, limit wildcard expansion to prevent easy DoS vectors on short terms. And blame Pine and alpine for generating Message-IDs with low-entropy prefixes :P
2018-04-18Merge remote-tracking branch 'origin/master' into v2
* origin/master: nntp: allow and ignore empty commands mbox: do not barf on queries which return no results nntp: fix NEWNEWS command searchview: fix non-numeric comparison Allow specification of the number of search results to return githttpbackend: avoid infinite loop on generic PSGI servers http: fix modification of read-only value extmsg: use news.gmane.org for Message-ID lookups extmsg: rework partial MID matching to favor current inbox Update the installation instructions with Fedora package names nntp: do not drain rbuf if there is a command pending nntp: improve fairness during XOVER and similar commands searchidx: do not modify Xapian DB while iterating Don't use LIMIT in UPDATE statements
2018-04-03mbox: do not barf on queries which return no results
Having zero search results means we never get a chance to populate the Content-Disposition header for mbox downloads.
2018-04-02replace Xapian skeleton with SQLite overview DB
This ought to provide better performance and scalability which is less dependent on inbox size. Xapian does not seem optimized for some queries used by the WWW homepage, Atom feeds, XOVER and NEWNEWS NNTP commands. This can actually make Xapian optional for NNTP usage, and allow more functionality to work without Xapian installed. Indexing performance was extremely bad at first, but DBI::Profile helped me optimize away problematic queries.
2018-04-01searchview: fix non-numeric comparison
We don't want non-fully-numeric limits being compared and tripping warnings. While we're at it, avoid hard-coding '200' and reuse $LIM as the default.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-05-23www: do not mangle characters from search queries
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> https://public-inbox.org/meta/CACBZZX5Gnow08r=0A1J_kt3a=zpGyMfvsqu8nAN7kacNnDm+dg@mail.gmail.com/