Date       Commit message
2021-02-12 search: query_approxidate: cleanup regexp, more tests
The cleanup doesn't seem to matter. I initially thought I needed to handle "" (two double quotes) explicitly, because that's how Xapian escapes a double quote inside a double-quoted phrase. It turns out we only need to pass phrases through to Xapian unmodified, and the existing group of ["\x{201c}\x{201d}] is sufficient for our purposes.
2021-02-12 mbox_reader: do not chomp non-blank EOL
It's conceivable some cases won't generate an empty line before an mboxrd or mboxo From_ line. Ensure we can handle that case and don't leave the Eml->{bdy} without a trailing LF character. And drop an unnecessary alarm import while we're in the area.
2021-02-12 import_mbox: use MboxReader
It supports more mbox variants, and its trailing newline behavior is probably more correct despite the previous change to PublicInbox::Filter::Vger.
2021-02-12 filter/vger: kill trailing newlines aggressively
PublicInbox::MboxReader->(mboxrd|mboxo) only deletes the last trailing newline, not every trailing newline like InboxWritable->import_mbox does. When testing PublicInbox::MboxReader->mboxrd (next commit) with scripts/import_vger_from_mbox on the LKML archive I got in 2018 for v2 development, this difference caused a single spam message(*) out of 2722831 to not be filtered correctly, producing a different result. (*) dated 2014-08-25
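A minimal Python sketch of the two behaviors (public-inbox itself is Perl, and the function names here are hypothetical): dropping only the final newline can leave extra blank lines at the end of a body, while stripping every trailing newline normalizes bodies regardless of which reader produced them.

```python
def strip_last_newline(body: str) -> str:
    """Remove at most one trailing LF, like a reader that only drops the
    newline separating one mbox message from the next."""
    return body[:-1] if body.endswith("\n") else body


def strip_all_newlines(body: str) -> str:
    """Remove every trailing LF, the more aggressive behavior."""
    return body.rstrip("\n")


raw = "spam body\n\n\n"
print(repr(strip_last_newline(raw)))  # 'spam body\n\n'
print(repr(strip_all_newlines(raw)))  # 'spam body'
```

Applying the aggressive form after either reader yields the same body, which is why a filter that normalizes this way gives consistent results.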
2021-02-11 search: disallow spaces in argv approxidate queries
This is for consistency with --stdin and WWW front ends which can't distinguish between phrase searches and prefix ranges used for d:/dt:/rt:. In any case, I expect users on the lei command-line are more likely to use `5.days.ago' instead of `"5 days ago"'
2021-02-11 search: use git approxidate in WWW and "lei q --stdin"
This greatly improves the usability of the d:, dt:, and rt: search prefixes for users already familiar with git's "approxidate" feature. That is, users familiar with the --(since|after|until|before)= options in git-log(1) and similar commands will be able to use those dates in the WWW UI.
2021-02-11 doc: lei: update manpages
Catch up with recent developments.
2021-02-11 doc: add lei-import(1)
2021-02-11 doc: lei: prefer 'location' and 'dirname'
This follows the help output change in 52342875 (lei help: split out into separate file, 2021-02-06).
2021-02-11 doc: lei q: use 'mfolder' as --output placeholder
'mfolder' is familiar to mairix users, and 'path' isn't a good choice because support will be added for IMAP. Link: https://public-inbox.org/meta/YCBh62OqkYnr5cqw@dcvr
2021-02-10 tests: skip properly with git <2.6
Tested with git 1.8.3.1 on CentOS 7.x. `plan skip_all => ...' doesn't work after some tests have run, so we have to call skip() instead.
2021-02-10 search: fix argv handling of quoted phrases
This fixes both an old bug in "lei q" argv handling and one recent regression introduced with the change to use approxidate. Field prefixes are also handled correctly inside parenthesized statements when the field follows "(" without a separation character. Fixes: fbb7ccabbf54a405 ("lei q: use git approxidate with d:, dt: and rt: ranges")
2021-02-10 lei_external: fix+test handling of escaped braces
While '{' and '}' are rare in path names, somebody may still use them or deal with software which does (e.g. GNU arch).
2021-02-10 net_reader: new package split from -watch
We'll be using some of this for IMAP and NNTP support in lei, too. More will need to be done to improve code sharing and reusability, soon, but this is a start.
2021-02-10 lei: note some TODO items (curl, externals)
I don't know if using libcurl directly is worth it (or worth the effort of supporting it and maintaining tests).
2021-02-10 lei ls-external: support --local and --remote
Similar to "lei q", "--local" means local only and "--remote" means remote only. I can't think of a reason to have --no-* variants of these switches. There are also updates to TestCommon for more common lei cases.
2021-02-10 test_common: support lei-daemon only testing
Daemon-only tests can be significantly faster due to cached configs, so give developers a chance to test only daemons to improve productivity. The differences between daemon and oneshot modes are minimal at this point.
2021-02-10 lei_external: remove unnecessary Exporter use
We don't need to export for methods which are only called via "->" or "->can".
2021-02-10 lei *external: glob improvements, ls-external filtering
The "ls-external" command now accepts the same glob patterns used with lei q --{include,only,exclude}. If no glob is detected, the argument is treated as a literal substring match (like "grep -F"). Inverting matches is also supported (like "grep -v").
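A rough Python sketch of the matching rules described above, assuming a pattern counts as a glob when it contains `*`, `?` or `[` (an assumption; lei's actual glob detection may differ). `filter_externals` is a hypothetical name, and Python's `fnmatch` has no `{a,b}` alternation, so brace patterns are not covered here.

```python
import fnmatch
import re

# Glob metacharacters recognized by this sketch (an assumption, see above).
GLOB_CHARS = re.compile(r"[*?\[]")


def filter_externals(locations, pattern, invert=False):
    """Return the locations matching PATTERN.

    A pattern containing glob metacharacters is matched as a glob;
    otherwise it is a literal substring match (like "grep -F").
    INVERT keeps the non-matching locations instead (like "grep -v").
    """
    if GLOB_CHARS.search(pattern):
        def match(loc):
            return fnmatch.fnmatchcase(loc, pattern)
    else:
        def match(loc):
            return pattern in loc
    # match(loc) != invert flips the result only when inverting
    return [loc for loc in locations if match(loc) != invert]
```

Falling back to a substring match keeps the common case (typing part of a URL or path) from needing any glob syntax at all.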
2021-02-10 tests|lei: fixes for TEST_RUN_MODE=0 and lei oneshot
DESTROY callbacks can clobber $?, so we must take care to preserve it when exiting. We'll also make an effort to ensure better DESTROY ordering and delete as much as possible before x_it finishes. We also need to load PublicInbox::Config when setting up public inboxes.
2021-02-10 lei: replace "I:"-prefixed info messages with "#"
The "#" is what TAP <https://testanything.org/> uses, which is also consistent with what our (and many other) test suites emit.
2021-02-10 t/run.perl: drop Cwd dependency
Perl 5.8.8/5.10.0+ can use fchdir(), and we depend on 5.10.1+.
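To illustrate why fchdir() removes the need for Cwd: save a descriptor for the current directory, and later return to it with fchdir(2) without ever needing the directory's path. This is a Python sketch using the same syscall via `os.fchdir`; `run_in_dir` is a hypothetical helper, not part of t/run.perl.

```python
import os


def run_in_dir(path, func):
    """Call FUNC with the working directory set to PATH, then return to
    the original directory via fchdir(2) on a saved descriptor instead of
    remembering the directory by name (no getcwd needed)."""
    saved = os.open(".", os.O_RDONLY)  # descriptor for the current dir
    try:
        os.chdir(path)
        return func()
    finally:
        os.fchdir(saved)  # go back even if FUNC raised
        os.close(saved)
```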
2021-02-10 lei q: prefix --alert ops with ':' instead of '-'
Using dashed keywords confuses the option parser without "=" signs (and bash completion doesn't yet work with "="). So use ":" instead of "-" as the prefix for internal ops, since ":" is just as unlikely to be the first character of an executable file in a user's $PATH.
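A Python sketch of the prefix convention. `classify_alert` is hypothetical, and the op names `:WINCH` and `:bell` are assumptions for illustration. Because an op value starts with ':' rather than '-', an option parser never mistakes `--alert :WINCH` for another switch, and anything without the prefix can be treated as a command line.

```python
import shlex


def classify_alert(value):
    """Split one --alert VALUE into (kind, payload).

    Values starting with ':' are internal ops (hypothetical names such
    as ':WINCH' or ':bell'); anything else is an external command line,
    split shell-style into an argv list.
    """
    if value.startswith(":"):
        return ("op", value[1:])
    return ("cmd", shlex.split(value))


print(classify_alert(":WINCH"))         # ('op', 'WINCH')
print(classify_alert("play done.wav"))  # ('cmd', ['play', 'done.wav'])
```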
2021-02-10 use MdirReader in -watch and InboxWritable
MdirReader now handles files in "$MAILDIR/new" properly and is stricter about what it accepts. eml_from_path is also made robust against FIFOs, while eliminating TOCTOU races between stat(2) and open(2) calls.
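A minimal Python sketch of the open-then-fstat pattern (not MdirReader's actual code; `read_regular_file` is hypothetical). Checking the already-open descriptor with fstat(2) means the type check and the read refer to the same file, and O_NONBLOCK keeps open(2) from hanging on a FIFO that has no writer.

```python
import os
import stat


def read_regular_file(path):
    """Return the bytes of PATH, or None if it is missing or not a
    regular file (FIFO, directory, device, ...)."""
    try:
        # O_NONBLOCK: opening a FIFO for reading won't wait for a writer
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except FileNotFoundError:
        return None  # e.g. the message was moved or deleted meanwhile
    try:
        # fstat the descriptor we hold, not the path: no stat/open race
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            return None
        with os.fdopen(fd, "rb", closefd=False) as fh:
            return fh.read()
    finally:
        os.close(fd)
```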
2021-02-10 t/run.perl: fix for >128 tests
We need to explicitly close the write-end of the pipe in workers to ensure they don't prevent each other from seeing EOF. Also, make a note to keep using the pipe for now since Linux <3.14 had broken read(2) semantics when file descriptions are shared across threads/processes.
2021-02-10 lei: split out MdirReader package, lazy-require earlier
We'll do more requires in the top-level lei-daemon process to save work in workers. We can also work towards aborting on user errors in lei-daemon rather than in worker processes. "lei import -f mbox*" is finally tested inside t/lei_to_mail.t.
2021-02-10 git: ->qx: respect caller's $/ in array context
This could lead to bad results when running ls-tree -z for v2 imports if there are multiple files. In any case, the `local $/ = "\0"' in Import.pm is also eliminated to reduce potential confusion and surprises.
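A Python sketch of splitting command output on a caller-chosen separator, in the spirit of Perl's list-context readline honoring $/. `qx_records` and `split_records` are hypothetical names. Records keep their trailing separator as Perl's do, and the empty-separator case (Perl's paragraph mode) is deliberately not handled.

```python
import subprocess


def split_records(data, sep):
    """Split DATA into records, each ending with SEP (the last record may
    lack it if DATA doesn't end with SEP)."""
    if not sep:
        raise ValueError("empty separator not supported in this sketch")
    records = []
    start = 0
    while start < len(data):
        end = data.find(sep, start)
        if end < 0:
            records.append(data[start:])  # trailing record, no separator
            break
        records.append(data[start:end + len(sep)])
        start = end + len(sep)
    return records


def qx_records(argv, sep=b"\n"):
    """Run ARGV and split its stdout using the caller's separator,
    e.g. sep=b"\0" for "git ls-tree -z" output."""
    out = subprocess.run(argv, check=True, stdout=subprocess.PIPE).stdout
    return split_records(out, sep)
```

With a newline separator, NUL-terminated `ls-tree -z` entries would come back as one large record, which is the kind of mismatch described above.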
2021-02-10 t/cgi.t: modernizations and style updates
We prefer BAIL_OUT or fail over die in tests (I didn't know BAIL_OUT existed when I started the project). We can also depend on IO::Uncompress::Gunzip being available. We'll keep the cgi_run wrapper since the .cgi could use some coverage, and remove the FIXME note; run_script makes tests fast enough.
2021-02-10 test_common: disable fsync on the CLI where possible
This makes tests faster for users on slow TMPDIR (or not using eatmydata) and forces coverage on a non-default switch. Unfortunately, this doesn't yet cover InboxWritable usage.
2021-02-10 t/thread-index-gap.t: avoid unnecessary map
We only care about the number of results.
2021-02-09 www: stream mboxrd in descending docid order
Order doesn't matter when users are completely downloading mboxrds onto the FS and then opening them with an MUA. The MUA is expected to sort the results in the user's preferred order. However, lei can start streaming the results to its destination Maildir (or eventually IMAP/JMAP mailbox) with an MUA already open. This will let users see recent results sooner in their MUA, as those tend to have a higher docid. This matches the behavior of the HTML results, as well. As a bonus, this is around 5% faster in a one-off, informal test case with 66k results. I expect this to hold true in all cases, since git has always optimized storage to favor recent objects.
2021-02-08 spawnpp: raise exception on E2BIG errors
This matches the Inline::C version and lets us test argv overflow with $search->query_argv_to_string.
2021-02-08 search: use one git-rev-parse process for all dates
This is necessary to avoid slowdowns with pathological cases with many dates in the query, since each rev-parse invocation takes ~5ms. This is immeasurably slower with one open-ended range, but already faster with any closed range featuring two dates which require parsing via git.
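A Python sketch of the batching idea, assuming `git rev-parse --since=DATE` prints one `--max-age=EPOCH` line per date, in argument order (as git-rev-parse(1) documents for `--since`). The helper names are hypothetical, `git_dir` is assumed to be an existing repository, and only `--since` is shown (`--until` would yield `--min-age=` lines instead).

```python
import subprocess


def build_argv(git_dir, dates):
    """Build a single git-rev-parse command line covering every date."""
    return ["git", "--git-dir=" + git_dir, "rev-parse"] + [
        "--since=" + d for d in dates
    ]


def parse_epochs(output):
    """Extract epoch seconds from lines like '--max-age=1612137600'."""
    epochs = []
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if key != "--max-age" or not sep:
            raise ValueError("unexpected rev-parse output: %r" % line)
        epochs.append(int(value))
    return epochs


def dates_to_epochs(git_dir, dates):
    """Map each approxidate string to epoch seconds using one process,
    instead of paying process startup (~5ms) once per date."""
    out = subprocess.run(
        build_argv(git_dir, dates),
        check=True,
        stdout=subprocess.PIPE,
        text=True,
    ).stdout
    epochs = parse_epochs(out)
    if len(epochs) != len(dates):
        raise ValueError("expected one --max-age line per date")
    return dict(zip(dates, epochs))
```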
2021-02-08 lei q: use git approxidate with d:, dt: and rt: ranges
Instead of having --(sent|received)-(before|after)=s command-line switches, we'll just try to make sense of argv so it's usable within parenthesized statements and such. Given the negligible performance penalty with Inline::C process spawning, we'll probably wire this up to the WWW interface, too. "d:" is for mairix compatibility. I don't know if "dt:" and "rt:" will be too useful, but they exist because of IMAP (and JMAP).
2021-02-08 git: implement date_parse method
Users are expected to be familiar with git's "approxidate" functionality for parsing dates, so we'll expose that in our UIs. Xapian itself has limited date parsing functionality, and I can't expect users to learn it. This takes around 4-5ms on my aging workstation, so it'll probably be acceptable even for the WWW UI. libgit2 has a git__date_parse function which I expect to have less overhead, but it's only for internal use at the moment.
2021-02-08 lei: drop BSD::Resource usage
It's no longer necessary with the changes to stop doing FD passing in our backend. cf. commits 5180ed0a1cd65139 and 7d440bf3667b8ef5 ("lei q: eliminate $not_done temporary git dir hack") ("lei q: reorder internals to reduce FD passing")
2021-02-08 lei: avoid racing on unlink + bind + listen
When multiple lei(1) processes are starting in parallel without lei-daemon already running, it's possible for them to trample each other's socket path while trying to start lei-daemon. Lock errors.log before unlink/bind/listen. We'll add an extra connect(2) attempt to check if the starter lost the race. Without this change, a stress script like the following could easily cause problems:
  lei q -o ~/tmp/a foo ... &
  lei q -o ~/tmp/b bar ... &
  lei q -o ~/tmp/c quux ... &
  lei q -o ~/tmp/d baz ... &
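A Python sketch of the lock-then-check sequence, assuming a Unix system with flock(2). `start_or_connect` and its lock path argument are hypothetical (lei locks errors.log). The connect attempt made while holding the lock tells a late starter that another process already won the race, so only one process ever unlinks and rebinds the socket path.

```python
import fcntl
import os
import socket


def start_or_connect(sock_path, lock_path):
    """Return ("connected", sock) if a daemon already listens on
    SOCK_PATH, else ("listening", sock) after binding it ourselves."""
    with open(lock_path, "a") as lock:
        # Serialize starters; the lock is released when the file closes.
        fcntl.flock(lock, fcntl.LOCK_EX)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            client.connect(sock_path)
            return ("connected", client)  # someone else won the race
        except (FileNotFoundError, ConnectionRefusedError):
            client.close()
        try:
            os.unlink(sock_path)  # remove a stale socket, if any
        except FileNotFoundError:
            pass
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(sock_path)
        server.listen(1024)
        return ("listening", server)
```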
2021-02-08 lei: start_pager: drop COLUMNS default
It shouldn't be needed since none of our subcommands will care or attempt to format output. Once "lei show" is implemented, we'll run "git show" directly on the result.
2021-02-08 ds: improve add_timer usability
Packing args into an arrayref is awkward and we may be using this API more in lei.
2021-02-08 tests: favor IPv6
IPv4 gets plenty of real-world coverage, and apparently there are Debian buildd hosts which lack IPv4(*). So ensure everything works on IPv6 and doesn't cause problems for odd setups. (*) https://bugs.debian.org/979432
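A minimal Python sketch of preferring IPv6 in a test helper: try the ::1 loopback first and fall back to 127.0.0.1 only when IPv6 is unavailable. `bind_loopback` is hypothetical, and the IPv4 fallback is an assumption of this sketch rather than something the commit describes.

```python
import socket


def bind_loopback():
    """Return a listening TCP socket on ::1, or on 127.0.0.1 if the host
    has no usable IPv6 loopback."""
    for family, addr in ((socket.AF_INET6, "::1"),
                         (socket.AF_INET, "127.0.0.1")):
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
        except OSError:
            continue  # address family not supported at all
        try:
            sock.bind((addr, 0))  # port 0: let the kernel pick a free port
            sock.listen(16)
            return sock
        except OSError:
            sock.close()
    raise OSError("no loopback address available")
```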
2021-02-08 lei q: support --alert=CMD for early MUA users
For --mua users writing to lock-free -o MFOLDER destinations, we'll keep -WINCH and send an ASCII terminal bell when results are complete. This is intended to let early MUA spawners know when lei2mail is done writing results. We'll also support running arbitrary commands. These may be used to run play(1) (from SoX), handle pipelines and redirects (e.g. "/bin/sh -c 'echo search done | wall'"), or run other commands.
2021-02-08 lei q: SIGWINCH process group with the terminal
While using utime on the destination Maildir is enough for mutt to eventually notice new mail, "eventually" isn't good enough. Send a SIGWINCH to wake mutt (and likely other MUAs) immediately. This is more portable than relying on MUAs to support inotify or EVFILT_VNODE.
2021-02-08 lei_xsearch: quiet Eml warnings from remote mboxrds
This will probably cover full Atom/HTML feed generation or any outputs which are order-dependent, but those aren't prioritized at the moment.
2021-02-08 lei q: improve remote mboxrd UX + MUA
For early MUA spawners using lock-free outputs, we need to wait on the startq pipe to silence progress reporting. For --augment users, we can start the MUA even earlier by creating Maildirs in the pre-augment phase. To improve progress reporting for non-MUA (or late-MUA) spawners, we'll no longer blindly append "--compressed" to the curl(1) command when POST-ing for the gzipped mboxrd. Furthermore, we'll overload stringification ('""') in LeiCurl to ensure the empty -d '' string shows up properly. v2: fix startq waiting with --threads. mset_progress is never shown with early MUA spawning; the plan is to still show progress when augmenting and deduping. This fixes all local search cases. A leftover debug bit is dropped, too.
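In Python terms, the stringification overload amounts to rendering argv with shell quoting, so an empty argument prints as '' instead of vanishing. `CurlCmd` is a hypothetical stand-in for LeiCurl, and `shlex.join` needs Python 3.8+.

```python
import shlex


class CurlCmd:
    """Hypothetical argv holder whose str() is a shell-quoted command
    line, so empty arguments such as -d '' stay visible."""

    def __init__(self, *argv):
        self.argv = list(argv)

    def __str__(self):
        return shlex.join(self.argv)


print(CurlCmd("curl", "-d", ""))  # curl -d ''
```

A plain `" ".join(argv)` would instead print `curl -d ` with the empty argument invisible, which is misleading when showing the command to users.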
2021-02-08 INSTALL: depend on Text::ParseWords
It's been distributed with Perl since 1994, and we use it for both -imapd and lei. It's split out as a separate package in CentOS 7.x, so we'll depend on it to avoid surprising users of RPM-based distros.
2021-02-08 lei q: fix arbitrary --mua command handling
Perl doesn't seem to warn for shadowed variables, here :x
2021-02-08 lei import: support Maildirs
It seems to be working trivially, though I'm probably going to split out Maildir reading into a separate package rather than using LeiToMail.
2021-02-07 httpd/async: avoid unnecessary on-stack delete
While this doesn't fix a known problem, this was a risky construct in case somebody uses confess/longmess inside the user-supplied callback. cf. commit 0795b0906cc81f40 ("ds: guard against stack-not-refcounted quirk of Perl 5")
2021-02-07 imap: avoid unnecessary on-stack delete
None of the Content-Type attributes are long-lived (and unlikely to be memory intensive). While these callsites won't trigger $DB::args segfaults via confess or longmess, it'll make future code audits easier. cf. commit 0795b0906cc81f40 ("ds: guard against stack-not-refcounted quirk of Perl 5")
2021-02-07 lei: replace --thread with --threads
Nobody is expected to use long options, but for consistency with mairix(1), we'll use the pluralized option throughout (including existing PublicInbox::{Search,SearchView}). Link: https://public-inbox.org/meta/20210206090119.GA14519@dcvr/