Date | Commit message |
|
The cleanup doesn't seem to matter. I initially thought I needed
to handle "" (two double quotes) explicitly because that's what
Xapian does to escape a double quote inside a double-quoted
phrase. It turns out we only need to be able to pass phrases
through to Xapian unmodified, and the existing group of
["\x{201c}\x{201d}] is sufficient for our purposes.
|
|
It's conceivable some cases won't generate an empty line before
an mboxrd or mboxo From_ line. Ensure we can handle that case
and don't leave the Eml->{bdy} without a trailing LF character.
And drop an unnecessary alarm import while we're in the area.
|
|
It supports more mbox variants and its trailing newline
behavior is probably more correct despite the previous change
to PublicInbox::Filter::Vger.
|
|
PublicInbox::MboxReader->(mboxrd|mboxo) only deletes the last
trailing newline, not every single trailing newline like
InboxWritable->import_mbox does.
While testing PublicInbox::MboxReader->mboxrd (next commit) with
scripts/import_vger_from_mbox on the LKML archive I got in 2018
for v2 development, this difference was responsible for a single
spam message(*) out of 2722831 not being filtered correctly
and returning a different result.
(*) dated 2014-08-25
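
A minimal sketch of the difference, in Python rather than Perl and
with hypothetical helper names (these are not the MboxReader or
InboxWritable functions). It only contrasts "drop one trailing LF"
with "drop every trailing LF":

```python
def strip_last_newline(body: bytes) -> bytes:
    """Drop at most one trailing LF, leaving any earlier ones alone."""
    return body[:-1] if body.endswith(b"\n") else body


def strip_all_newlines(body: bytes) -> bytes:
    """Drop every trailing LF."""
    return body.rstrip(b"\n")


msg = b"spam body\n\n\n"  # made-up message body with three trailing LFs
print(strip_last_newline(msg))  # two LFs survive
print(strip_all_newlines(msg))  # none survive
```

A filter that looks at the end of the body can therefore see
different input depending on which of the two was applied.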
|
|
This is for consistency with --stdin and WWW front ends
which can't distinguish between phrase searches and
prefix ranges used for d:/dt:/rt:.
In any case, I expect users on the lei command-line are more
likely to use `5.days.ago' instead of `"5 days ago"'
|
|
This greatly improves the usability of d:, dt:, and rt: search
prefixes for users already familiar with git's "approxidate" feature.
That is, users familiar with the --(since|after|until|before)=
options in git-log(1) and similar commands will be able to use
those dates in the WWW UI.
|
|
Catch up with recent developments.
|
|
|
|
This follows the help output change in 52342875 (lei help: split out
into separate file, 2021-02-06).
|
|
'mfolder' is familiar to mairix users, and 'path' isn't a good choice
because support will be added for IMAP.
Link: https://public-inbox.org/meta/YCBh62OqkYnr5cqw@dcvr
|
|
Tested with git 1.8.3.1 on CentOS 7.x
`plan skip_all => ...' doesn't work after some tests have run,
so we have to call skip() instead.
|
|
This fixes both an old bug in "lei q" argv handling and one
recent regression introduced with the change to use approxidate.
Field prefixes are also handled correctly inside parenthesized
statements when the field follows "(" without a separation
character.
Fixes: fbb7ccabbf54a405 ("lei q: use git approxidate with d:, dt: and rt: ranges")
|
|
While '{' and '}' are rare in path names, somebody may still
use them or deal with software which does (e.g. GNU arch).
|
|
We'll be using some of this for IMAP and NNTP support in lei,
too. More will need to be done to improve code sharing and
reusability, soon, but this is a start.
|
|
I don't know if it's worth it to use libcurl directly
(nor the effort to support and maintain tests)
|
|
Similar to "lei q", "--local" means only local and "--remote"
means remote only. I can't think of a reason to have --no-*
variants for these switches.
There are also updates to TestCommon for more common lei
cases.
|
|
Daemon-only tests can be significantly faster due to cached
configs, so give developers a chance to test only daemons to
improve productivity.
The differences between daemon and oneshot modes are minimal,
at this point.
|
|
We don't need to export for methods which are only called via
"->" or "->can".
|
|
The "ls-external" command now accepts the same glob patterns
used by lei q --{include,only,exclude}. If no glob is detected, it
will be treated as a literal substring match (like "grep -F").
Inverting matches is also supported ("grep -v").
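
The matching rules above can be sketched roughly as follows. This
is Python with a hypothetical filter_externals() helper, and it
assumes fnmatch-style globs, which may differ from the glob dialect
lei actually accepts:

```python
import fnmatch

GLOB_CHARS = set("*?[")  # assumption: these characters mark a glob


def filter_externals(externals, pattern, invert=False):
    """Return the entries of `externals` selected by `pattern`."""
    is_glob = bool(GLOB_CHARS & set(pattern))

    def matches(loc):
        if is_glob:
            return fnmatch.fnmatchcase(loc, pattern)
        return pattern in loc  # literal substring, like "grep -F"

    # invert=True keeps only the non-matching entries, like "grep -v"
    return [loc for loc in externals if matches(loc) != invert]


# Made-up external locations for illustration
externals = [
    "https://example.org/meta/",
    "/home/user/mail/lkml",
    "/home/user/mail/git",
]
print(filter_externals(externals, "*/mail/*"))
print(filter_externals(externals, "meta"))
print(filter_externals(externals, "meta", invert=True))
```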
|
|
DESTROY callbacks can clobber $?, so we must take care to
preserve it when exiting. We'll also try to make an effort to
ensure better DESTROY ordering and delete as much as possible
before x_it finishes.
We also need to load PublicInbox::Config when setting up
public inboxes.
|
|
The "#" is what TAP <https://testanything.org/> uses,
which is also consistent with what our (and many other)
test suites emit.
|
|
Perl 5.8.8/5.10.0+ can use fchdir(), and we depend on 5.10.1+
|
|
Using dashed keywords confuses the option parser without
"=" signs (and bash completion doesn't yet work with "=").
So use ":" instead of "-" as the prefix for internal ops,
since ":" is just as unlikely to be the first character of
an executable file in a user's $PATH.
|
|
MdirReader now handles files in "$MAILDIR/new" properly and
is stricter about what it accepts. eml_from_path is also
made robust against FIFOs while eliminating TOCTOU races
between stat(2) and open(2) calls.
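
One common way to avoid that kind of race is to open first and
then fstat(2) the descriptor. The sketch below shows the idea in
Python on a POSIX system. read_regular_file() is a hypothetical
name and this is not the eml_from_path code:

```python
import os
import stat


def read_regular_file(path):
    """Return the bytes of `path`, or None if it is not a regular file."""
    try:
        # O_NONBLOCK keeps open(2) from hanging on a FIFO with no writer
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError:
        return None
    try:
        # fstat(2) inspects the file we actually opened, so nothing can
        # be swapped in between the check and the read
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            return None
        with os.fdopen(fd, "rb") as fh:
            fd = -1  # the file object owns the descriptor now
            return fh.read()
    finally:
        if fd >= 0:
            os.close(fd)
```

Calling stat(2) on the path and then open(2) would leave a window
where the path could be replaced between the two calls.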
|
|
We need to explicitly close the write-end of the pipe in workers
to ensure they don't prevent each other from seeing EOF.
Also, make a note to keep using the pipe for now since
Linux <3.14 had broken read(2) semantics when file descriptions
are shared across threads/processes.
|
|
We'll do more requires in the top-level lei-daemon process to
save work in workers. We can also work towards aborting on
user errors in lei-daemon rather than worker processes.
"lei import -f mbox*" is finally tested inside t/lei_to_mail.t
|
|
This could lead to bad results when doing ls-tree -z
for v2 import in case there are multiple files. In any case,
the `local $/ = "\0"' in Import.pm is also eliminated to
reduce potential confusion and surprises.
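
For reference, NUL-terminated "git ls-tree -z" output can be split
explicitly instead of relying on a changed input record separator.
This is a Python sketch with a hypothetical parse_ls_tree_z()
helper, not the Import.pm code:

```python
def parse_ls_tree_z(output: bytes):
    """Parse `git ls-tree -z` output into (mode, type, oid, path) tuples."""
    entries = []
    for record in output.split(b"\0"):
        if not record:  # the trailing NUL leaves one empty field
            continue
        # each record is "<mode> SP <type> SP <object> TAB <file>"
        meta, path = record.split(b"\t", 1)
        mode, objtype, oid = meta.split(b" ")
        # paths stay as bytes since they may contain any byte but NUL
        entries.append((mode.decode(), objtype.decode(), oid.decode(), path))
    return entries
```

Every record is handled the same way, so it makes no difference
whether the tree holds one file or several.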
|
|
We prefer BAIL_OUT or fail to die in tests (I didn't know
BAIL_OUT existed when I started the project). We can also
depend on IO::Uncompress::Gunzip being available.
We'll keep the cgi_run wrapper since the .cgi could
use some coverage and remove the FIXME note. run_script
makes tests fast enough.
|
|
This makes tests faster for users on slow TMPDIR (or not using
eatmydata) and forces coverage on a non-default switch.
Unfortunately, this doesn't yet cover InboxWritable usage.
|
|
We only care about the number of results.
|
|
Order doesn't matter when users are completely downloading
mboxrds onto the FS and then opening them with an MUA. The
MUA is expected to sort the results in the user's preferred
order.
However, lei can start streaming the results to its destination
Maildir (or eventually IMAP/JMAP mailbox) with an MUA already
open. This will let users see recent results sooner in their
MUA, as those tend to have a higher docid. This matches the
behavior of the HTML results, as well.
As a bonus, this is around 5% faster in a one-off, informal
test case with 66k results. I expect this to hold true in all
cases since git has always optimized storage to favor recent
objects.
|
|
This matches the Inline::C version, and lets us test
argv overflow with $search->query_argv_to_string;
|
|
This is necessary to avoid slowdowns with pathological cases
with many dates in the query, since each rev-parse invocation
takes ~5ms.
This is not measurably slower with one open-ended range, but
already faster with any closed range featuring two dates which
require parsing via git.
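
A rough Python sketch of the batching idea, with hypothetical
helper names. It assumes, as far as I can tell from git's behavior,
that "git rev-parse" echoes each --since= argument back as a
"--max-age=<epoch>" line, and that it needs to run inside a git
repository (hence the cwd parameter):

```python
import subprocess


def approxidate_argv(dates):
    """Build one `git rev-parse` command line covering every date."""
    return ["git", "rev-parse"] + ["--since=" + d for d in dates]


def parse_max_ages(output: str):
    """Extract epoch seconds from `--max-age=N` lines, in order."""
    prefix = "--max-age="
    return [int(line[len(prefix):]) for line in output.splitlines()
            if line.startswith(prefix)]


def approxidates(dates, cwd=None):
    """Resolve all dates with a single git process (needs git in PATH)."""
    out = subprocess.run(approxidate_argv(dates), cwd=cwd, check=True,
                         capture_output=True, text=True).stdout
    return parse_max_ages(out)
```

With N dates in a query this spawns one process instead of N, which
is where the savings for closed ranges would come from.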
|
|
Instead of having --(sent|received)-(before|after)=s
command-line switches, we'll just try to make sense of argv so
it's usable within parenthesized statements and such.
Given the negligible performance penalty with Inline::C
process spawning, we'll probably wire this up to the
WWW interface, too.
"d:" is for mairix compatibility. I don't know if "dt:" and
"rt:" will be too useful, but they exist because of IMAP
(and JMAP).
|
|
Users are expected to be familiar with git's "approxidate"
functionality for parsing dates, so we'll expose that
in our UIs. Xapian itself has limited date parsing functionality
and I can't expect users to learn it.
This takes around 4-5ms on my aging workstation, so it'll
probably be made acceptable for the WWW UI, even.
libgit2 has a git__date_parse function which I expect to have
less overhead, but it's only for internal use at the moment.
|
|
It's no longer necessary with the changes to stop doing
FD passing in our backend.
cf. commits 5180ed0a1cd65139 and 7d440bf3667b8ef5
("lei q: eliminate $not_done temporary git dir hack")
("lei q: reorder internals to reduce FD passing")
|
|
When multiple lei(1) processes are starting in parallel without
lei-daemon already running, it's possible for them to trample
each other's socket path trying to start lei-daemon. Lock
errors.log before unlink/bind/listen. We'll add an extra
connect(2) attempt to check if the starter lost the race.
Without this change, a stress script like the following could
easily cause problems:
lei q -o ~/tmp/a foo ... &
lei q -o ~/tmp/b bar ... &
lei q -o ~/tmp/c quux ... &
lei q -o ~/tmp/d baz ... &
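
The lock-then-recheck sequence can be sketched like this in Python.
connect_or_start() is a hypothetical name, the socket type is an
assumption, and the real starter hands the listener to a daemon
process rather than returning it:

```python
import fcntl
import os
import socket


def connect_or_start(sock_path, lock_path):
    """Return ("connected", sock) or ("started", listener)."""

    def try_connect():
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(sock_path)
            return s
        except OSError:
            s.close()
            return None

    s = try_connect()
    if s:
        return "connected", s
    # closing the lock file on the way out of "with" releases the flock
    with open(lock_path, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # one starter at a time
        # extra connect attempt: another starter may have won the race
        # while we were waiting for the lock
        s = try_connect()
        if s:
            return "connected", s
        try:
            os.unlink(sock_path)  # stale socket left by a dead daemon
        except FileNotFoundError:
            pass
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(sock_path)
        listener.listen(16)
        return "started", listener
```

Because unlink/bind/listen only happen while the lock is held, a
late starter cannot remove a socket that a winner just bound.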
|
|
It shouldn't be needed since none of our subcommands will care
or attempt to format output. Once "lei show" is implemented,
we'll run "git show" directly on the result.
|
|
Packing args into an arrayref is awkward and we may be using
this API more in lei.
|
|
IPv4 gets plenty of real-world coverage, and apparently there are
Debian buildd hosts which lack IPv4(*). So ensure everything
can work on IPv6 and not cause problems for odd setups.
(*) https://bugs.debian.org/979432
|
|
For --mua users writing to lock-free -o MFOLDER destinations,
we'll keep -WINCH and send an ASCII terminal bell when results
are complete. This is intended to let early MUA spawners know
when lei2mail is done writing results.
We'll also support running arbitrary commands. It may be used
to run play(1) (from SoX), handle pipelines+redirects
(e.g. "/bin/sh -c 'echo search done | wall'") or other commands.
|
|
While using utime on the destination Maildir is enough for mutt
to eventually notice new mail, "eventually" isn't good enough.
Send a SIGWINCH to wake mutt (and likely other MUAs)
immediately. This is more portable than relying on MUAs to
support inotify or EVFILT_VNODE.
|
|
This will probably cover full Atom/HTML feed generation or any
outputs which are order-dependent, but those aren't prioritized
at the moment.
|
|
For early MUA spawners using lock-free outputs, we need to
wait on the startq pipe to silence progress reporting. For
--augment users, we can start the MUA even earlier by
creating Maildirs in the pre-augment phase.
To improve progress reporting for non-MUA (or late-MUA)
spawners, we'll no longer blindly append "--compressed" to the
curl(1) command when POST-ing for the gzipped mboxrd.
Furthermore, we'll overload stringify ('""') in LeiCurl to
ensure the empty -d '' string shows up properly.
v2: fix startq waiting with --threads
mset_progress is never shown with early MUA spawning.
The plan is to still show progress when augmenting and
deduping. This fixes all local search cases.
A leftover debug bit is dropped, too.
|
|
It's been distributed with Perl since 1994, and we use it for
both -imapd and lei. It's split out as a separate package in
CentOS 7.x, so we'll depend on it to avoid surprising users
of RPM-based distros.
|
|
Perl doesn't seem to warn for shadowed variables, here :x
|
|
It seems to be working trivially, though I'm probably
going to split out Maildir reading into a separate
package rather than using LeiToMail.
|
|
While this doesn't fix a known problem, this was a risky
construct in case somebody uses confess/longmess inside
the user-supplied callback.
cf. commit 0795b0906cc81f40
("ds: guard against stack-not-refcounted quirk of Perl 5")
|
|
None of the Content-Type attributes are long-lived
(and unlikely to be memory intensive). While these
callsites won't trigger $DB::args segfaults via
confess or longmess, it'll make future code audits
easier.
cf. commit 0795b0906cc81f40
("ds: guard against stack-not-refcounted quirk of Perl 5")
|
|
Nobody is expected to use long options, but for consistency
with mairix(1), we'll use the pluralized option throughout
(including existing PublicInbox::{Search,SearchView}).
Link: https://public-inbox.org/meta/20210206090119.GA14519@dcvr/
|