path: root/lib/PublicInbox/LeiOverview.pm
Date Commit message
2021-10-19 lei inspect: show ISO8601 {rt} and {dt}, too
While inspect is intended for debugging, the Unix epoch in seconds requires extra steps for human consumption; just steal what we used for "lei q -f json" output.
2021-10-16 lei_overview: die rather than lei->fail
This will make our code more flexible in case it gets used in non-lei things.
2021-08-11 lei: attempt to canonicalize away "/../" pathnames
As documented, File::Spec->canonpath does not canonicalize "/../". While we want to do our best to preserve symlinks in pathnames, leaving "/../" can mislead our inotify|kqueue usage.
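As a rough illustration of what collapsing "/../" lexically can look like (this is Python, not the project's Perl code), here is a minimal sketch. The function name `collapse_dotdot` is hypothetical, and the purely lexical approach is an assumption: it never consults the filesystem, so it gives a different answer from the kernel whenever the component before ".." is a symlink.

```python
def collapse_dotdot(path):
    """Lexically collapse "/../" in an absolute path (hypothetical helper)."""
    if not path.startswith("/"):
        return path  # this sketch only handles absolute paths
    out = []
    for part in path.split("/"):
        if part in ("", "."):
            continue  # skip empty and "." components
        if part == "..":
            if out:
                out.pop()  # drop the previous component
            continue  # ".." at the root stays at the root
        out.append(part)
    return "/" + "/".join(out)
```

A watcher keyed on pathnames would then see "/a/b/../c" and "/a/c" as the same entry, at the cost of possibly misresolving paths that traverse symlinks.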
2021-05-28 lei q|up: support v2:/path/to/inboxdir destination
This allows "lei-managed pseudo mailing lists" as described by Konstantin. Alternates use is optional and can be enabled via --shared. This doesn't manage or edit ~/.public-inbox/config; presumably there'll need to be some tweaking of search parameters before finalizing and making the inbox publicly accessible via HTTP/NNTP. Link: https://public-inbox.org/meta/20210426164454.5zd5kgugfhfwfkpo@nitro.local/T/
2021-05-23 lei: drop EOFpipe in favor of PktOp
lei already uses PktOp and SOCK_SEQPACKET throughout; whereas EOFpipe had one single use in lei. Since PktOp is a strict superset of EOFpipe functionality, we may be able to get rid of EOFpipe entirely. However, lei is considered a portability canary and I'm not sure if the stable public-inbox-* code can drop EOFpipe just yet.
2021-04-05 lei q: fix auth IMAP --output with remote mboxrd
IMAP authentication info is only shared amongst lei2mail workers, so we must ensure all IMAP writes go through lei2mail workers even if we don't have to access the mail through git. This allows us to decouple the latency of the remote mboxrd from the latency of the IMAP --output at the expense of extra IPC overhead within our own processes.
2021-03-30 lei q: avoid redundant default setting for sort with l2m
No point in munging user-supplied $lei->{opt} when %mset_opt exists. We'll be depending on docid being in descending order for saved search support.
2021-03-28 treewide: shorten temporary filename
File::Temp only requires four 'X' characters (unlike mkstemp(3), which requires six). So only give it 4 to avoid an 80-column violation and maybe save metadata space on FSes.
2021-03-26 lei: support /dev/fd/[0-2] inputs and outputs in daemon
Since lei-daemon won't have the same FDs as the client, we need to special-case these mappings and won't be able to open arbitrary, non-standard FDs. We also won't attempt to support /proc/self/fd/[0-2] since that's a Linux-ism. /dev/fd/[0-2] and /dev/std{in,out,err} are portable to FreeBSD, at least. mawk(1) also supports /dev/std{out,err}, as does gawk(1) (which supports everything we can support, and arbitrary /dev/fd/$FD).
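The special-casing described above can be sketched as a small name-to-number mapping. This is an illustrative Python sketch, not the project's Perl; `std_fd_for` and the table names are hypothetical. The idea assumed here is that the daemon recognizes these names and uses the client's transmitted standard FD of that number instead of opening the path itself.

```python
import re

# Names accepted in this sketch; /proc/self/fd/* is deliberately not matched.
_STD_NAMES = {"/dev/stdin": 0, "/dev/stdout": 1, "/dev/stderr": 2}
_DEV_FD = re.compile(r"/dev/fd/([0-2])")


def std_fd_for(path):
    """Return 0, 1 or 2 if path names a standard FD, else None (hypothetical)."""
    if path in _STD_NAMES:
        return _STD_NAMES[path]
    m = _DEV_FD.fullmatch(path)
    return int(m.group(1)) if m else None
```

Anything that returns None here (for example /dev/fd/3) would fall through to ordinary path handling, which in a daemon refers to the daemon's own FD table rather than the client's.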
2021-03-26 lei q: skip lei/store->write_prepare for JSON outputs
JSON outputs won't write to lei/store at all, so there's no point in forking the store worker if it's not already running. LeiSearch object ($lse) is also fork-safe until it opens a persistent FD for Xapian/SQLite so we can unconditionally carry it across fork.
2021-03-26 lei: add some labels support
"lei q" now displays labels in JSON output, "lei mark" can add or remove labels for any messages. "lei ls-label" is supported, too. Unfortunately, "lei q" won't handle "kw:" or "L:" for external messages; they must be imported first.
2021-03-21 lei q: trim JSON output
Stop showing `docid' since it's not useful with shards. `bytes' and `lines' are probably noise, but maybe could be visible in some "fuller" view. v2: t/lei_xsearch: fix warnings from {docid} removal
2021-03-21 lei q: put keywords on one line in --pretty output
Don't waste precious terminal space when there are only a small number of possible keywords supported/reserved for JMAP. In the future, we may implement more sophisticated wrapping for labels, but we'll cross that bridge when we come to it.
2021-03-21 lei q: support vmd for external-only messages
"lei q" now preserves changes to per-message keywords across invocations when its --output (Maildir or mbox) is reused (with or without --augment). In the future, these changes will be monitored via inotify, EVFILT_VNODE or IMAP IDLE, too. Unfortunately, this currently prevents "lei import" from ever importing a message that's in an external. That will be fixed in a future change.
2021-03-21 lei: All Local Externals: bare git dir for alternates
This will be used for keyword (and label) storage for externals. We'll be using this to ensure we don't redundantly auto-import messages into lei/store if they're already in a local external (they can still be imported explicitly via "lei import").
2021-03-19 lei_overview: unnecessary g2m capture
Nothing like the -Wunused C compiler flag in perl, AFAIK...
2021-03-09 lei q: remove angle brackets around Message-IDs
They're unnecessary visual noise, and angle brackets don't always work as intended when going through Xapian's query parser. Since we already use "m:" and "refs:" instead of the actual header names, it should be obvious we're at liberty to abbreviate such things. Link: https://public-inbox.org/meta/20210304184348.GA19350@dcvr/
2021-02-21 lei q: support IMAP/IMAPS --output destinations
Augment (and dedupe) aren't parallel, yet, so it's more sensitive to high-latency networks.
2021-02-18 lei convert: mail format conversion sub-command
This will make testing IMAP support for other commands easier, as it doesn't write to lei/store at all. Like the pager and MUA, "git credential" is always spawned by script/lei (and not lei-daemon) so it has a controlling terminal for password prompts. v2: fix missing requires, correct test ordering v3: ensure config exists for IMAP auth
2021-02-10 lei q: prefix --alert ops with ':' instead of '-'
Using dashed keywords confuses the option parser without "=" signs (and bash completion doesn't yet work with "="). So use ":" instead of "-" as the prefix for internal ops, since ":" is just as unlikely to be the first character of an executable file in a user's $PATH.
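The prefix rule can be sketched as a tiny classifier. This is an illustrative Python sketch, not the project's Perl; `parse_alert` is a hypothetical name, the op name `bell` is made up for the example, and splitting commands with shell-like quoting is an assumption rather than documented behavior.

```python
import shlex


def parse_alert(arg):
    """Classify one --alert value as an internal op or a command (hypothetical)."""
    if arg.startswith(":"):
        # Internal op: strip the ':' prefix (op names here are assumed).
        return ("op", arg[1:])
    # Anything else is treated as a command line to run.
    return ("cmd", shlex.split(arg))
```

Because the value no longer starts with "-", an option parser that accepts "--alert VALUE" without "=" has no reason to mistake the value for another switch.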
2021-02-08 lei q: support --alert=CMD for early MUA users
For --mua users writing to lock-free -o MFOLDER destinations, we'll keep -WINCH and send an ASCII terminal bell when results are complete. This is intended to let early MUA spawners know when lei2mail is done writing results. We'll also support running arbitrary commands. It may be used to run play(1) (from SoX), handle pipelines+redirects (e.g. "/bin/sh -c 'echo search done | wall'") or other commands.
2021-02-08 lei q: improve remote mboxrd UX + MUA
For early MUA spawners using lock-free outputs, we need to wait on the startq pipe to silence progress reporting. For --augment users, we can start the MUA even earlier by creating Maildirs in the pre-augment phase. To improve progress reporting for non-MUA (or late-MUA) spawners, we'll no longer blindly append "--compressed" to the curl(1) command when POST-ing for the gzipped mboxrd. Furthermore, we'll overload stringify ('""') in LeiCurl to ensure the empty -d '' string shows up properly. v2: fix startq waiting with --threads. mset_progress is never shown with early MUA spawning. The plan is to still show progress when augmenting and deduping. This fixes all local search cases. A leftover debug bit is dropped, too.
2021-02-07 ipc: wq_do => wq_io_do
We will have a ->wq_do that doesn't pass FDs for I/O.
2021-02-07 lei_overview: drop unnecessary autoflush call
This was actually causing xt/lei-sigpipe.t failures, presumably due to reused/recycled workers with many externals.
2021-02-05 lei q: eliminate $not_done temporary git dir hack
Another step towards simplifying lei internals. None of our current uses of ->wq_do involve FD passing, and the plan is to only rely on FD passing between lei-daemon and lei(1). Internally, it ought to be possible for lei-daemon internal bits to be ordered properly to not need FD passing.
2021-02-05 lei q: only start pager if output is to stdout
No need to be starting a pager if we're writing to a regular file.
2021-02-05 lei q: reorder internals to reduce FD passing
While FD passing is critical for script/lei <=> lei-daemon, lei-daemon doesn't need to use it internally if FDs are created in the proper order before forking.
2021-02-04 lei q: support reading queries from stdin
This will be useful on shared machines when a user doesn't want search queries visible to other users looking at the ps(1) output or similar.
2021-02-04 lei: propagate curl errors, improve internal consistency
IO::Uncompress::Gunzip seems to be losing $? when closing PublicInbox::ProcessPipe. To work around this, do a synchronous waitpid ourselves to force proper $? reporting, and update tests to use the new --only feature for testing invalid URLs. This improves internal code consistency by having {pkt_op} parse the same ASCII-only protocol script/lei understands. We no longer pass {sock} to worker processes at all, further reducing FD pressure on per-user limits.
2021-02-04 lei: further reduce lei2mail FD pressure
We don't need to be sending errors directly to the client, but instead go through lei-daemon or the top-level one-shot process.
2021-02-04 lei: reduce FD pressure from lei2mail worker
lei2mail doesn't need stdin anymore, so we can use the [0] slot for the $not_done keepalive purposes.
2021-02-03 lei q: tidy up progress reporting
We won't be reporting progress when output is going to stdout since it can clutter up the terminal unless stderr != stdout, which probably isn't worth checking. We'll also use a more agnostic mset_progress which may make it easier to support worker-less invocations.
2021-02-03 lei_overview: avoid unnecessary {l2m} delete
We may reuse these objects in the non-worker code paths.
2021-02-03 lei q: do not leave temporary files after oneshot exit
Avoid on-stack shortcuts which may prevent destructors from firing since we're not inside the event loop. We'll also tidy up the unlink mechanism in LeiOverview while we're at it.
2021-02-01 lei: remove SIGPIPE handler
It doesn't save us any code, and the action-at-a-distance element was making it confusing to track down actual problems. Another potential problem was keeping references alive too long. So do like we would a C100K server and check every write while still ensuring lei(1) exits with a proper SIGPIPE iff needed.
2021-02-01 lei: more consistent dedupe and ovv_buf init
This fixes "--dedupe none" with Maildir where we don't create the object at all.
2021-01-30 lei: less error-prone FD mapping
Keeping track of non-standard FDs gets tricky, so make it easier by relying on st_dev/st_ino mapping in the transmitted objects. We'll keep using numbers for the standard FDs since we need to be able to easily redirect them in the producer (main daemon) process for (gzip|bzip2|xz) if writing to a compressed mbox.
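The st_dev/st_ino idea can be sketched as follows. This is an illustrative Python sketch, not the project's Perl; `fd_key`, `build_fd_map` and `resolve` are hypothetical names. The assumption is that the sender records the (st_dev, st_ino) pair of each file inside the object it transmits, and the receiver looks that pair up among the descriptors it actually received rather than trusting FD numbers.

```python
import os


def fd_key(fd):
    """Identify the file behind a descriptor by (st_dev, st_ino)."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)


def build_fd_map(fds):
    """Map (st_dev, st_ino) -> local fd for descriptors received from a peer."""
    return {fd_key(fd): fd for fd in fds}


def resolve(fd_map, dev_ino):
    """Find the local fd for a (st_dev, st_ino) pair recorded by the sender."""
    return fd_map.get(tuple(dev_ino))
```

One caveat of this sketch: descriptors that refer to the same file (for example dup(2) copies, or on some systems both ends of a pipe) share a key, so the map keeps only one of them.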
2021-01-29 lei_overview: clear redundant ovv_buf definition
Declaring "my $buf" inside that `if' branch was useless, so we've been declaring it per-callback when {l2m} isn't in use.
2021-01-26 lei q: drop "oid" output format
The default deduplication command-line arguments would be nonsensical for such an option and probably confusing. It doesn't seem worth the code to support OID-only output when it's easy enough to use one of the JSON formats to extract the same info. We also don't have OIDs if using remotes, and the to-be-implemented memoization will be optional.
2021-01-26 lei: reinstate JSON smsg output deduplication
This was accidentally clobbered completely in ("lei q: fix JSON overview with remote externals"). There are now more tests to prevent future regressions.
2021-01-24 lei q: fix JSON overview with remote externals
We can't (and don't need to) repeatedly get the $each_smsg callback for each URI since that clobbers {ovv_buf} before it can be output. I initially thought this was a dedupe-related bug and moved the dedupe code into the $each_smsg callback to minimize differences. Nevertheless it's a nice code reduction. I also thought it was related to incomplete smsg info, so {references} is now filled in correctly for dedupe.
2021-01-23 lei: support remote externals
Via curl(1), since that lets us easily use tor on a per-connection basis via LD_PRELOAD (torsocks) or proxy. We'll eventually support more curl options which can allow users to get past firewalls and deal with other odd network configurations.
2021-01-22 lei: show {pct} and {oid} in From_ lines and filenames
From_ lines are shown when mbox* variants are output to stdout, making {oid} and {pct} information visible without risking being propagated to other importer processes if they were in lei-specific X-* headers. Maildirs already had OIDs in the filename, now they gain Xapian {pct} in case anybody cares.
2021-01-22 lei q: retrieve keywords for local, non-external messages
This isn't tested for now, so maybe it works.
2021-01-22 lei_overview: rename {relevance} => {pct}
The old name was too long compared to the rest of the field names. With the Xapian method being named ->get_percent, "pct" is a well known abbreviation for "percent" and already used internally by our wrapper. Also clean up some excess whitespace while we're in the area.
2021-01-21 lei: allow more mbox inode types
We may attempt to write an mbox to any terminal, block, or character device, not just regular files and FIFOs/pipes. The only thing that is known to not work is a directory. Sockets may be possible with some OSes (e.g. Plan 9) or filesystems. This fixes t/lei.t on FreeBSD 11.x
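A permissive destination check along these lines might look like the following. This is an illustrative Python sketch, not the project's Perl; `mbox_dest_ok` is a hypothetical name, and treating a nonexistent path as acceptable (to be created as a regular file) is an assumption.

```python
import os
import stat


def mbox_dest_ok(path):
    """Accept any inode type except a directory (hypothetical check)."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return True  # assumed: a missing path will be created as a regular file
    # Regular files, FIFOs, terminals, block and character devices all pass.
    return not stat.S_ISDIR(mode)
```

Rejecting only the one type known not to work, instead of allowing a fixed list of types, leaves room for platforms where sockets or other inode types happen to be writable.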
2021-01-21 lei_overview: start implementing format detection
We'll need it for IMAP support, at least. Proper mbox family detection will be expensive, so deal with it later.
2021-01-21 lei: test some likely errors due to misuse
Because user errors happen...
2021-01-21 lei_overview: do not write if $lei->{1} is gone
We'll invalidate the {1} (stdout) field on SIGPIPE, so don't trigger a Perl warning by writing to it.
2021-01-18 lei q: parallelize Maildir and mbox writing
With 4 dedicated workers, this seems to provide a 100-120% speedup on a 4 core machine when writing thousands of search results to a Maildir or mbox. This also sets us up for high-latency IMAP destinations in the future. This opens the door to more speedup opportunities such as optimizing dedupe locking and other ways to reduce contention. This change is fairly complex and convoluted, unfortunately. Further work may allow us to simplify it and even improve performance.