about summary refs log tree commit homepage
path: root/script
DateCommit message (Collapse)
2021-09-24clone|--mirror: support --epoch=RANGE for partial clones
Partial (v2) clones should be useful addition for users wanting to conserve storage while having fast access to recent messages. Continuing work started in 876e74283ff3 (fetch: ignore non-writable epoch dirs, 2021-09-17), this creates bare, read-only epoch git repos. These git repos have the remotes pre-configured, but does not fetch any objects. The goal is to allow users to set the writable bit on a previously-skipped epoch and start fetching it. Shell completion support may not be necessary given how short the epoch ranges are, here. Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
2021-09-22lei: drop redundant WQ EOF callbacks
Redundant code is noise and therefore confusing :<
2021-09-22script/lei: describe purpose of sleep loop
It looks dumb, but I'm not about to take a runtime penalty to use signalfd|EVFILT_SIGNAL, here, either.
2021-09-21script/lei: handle SIGTSTP and SIGCONT
Sometimes it's useful to pause an expensive query or refresh-mail-sync to do something else. While lei-daemon and lei/store can't be paused since they're shared across clients, per-invocation WQ workers can be paused safely using the unblockable SIGSTOP. While we're at it, drop the ETOOMANYREFS hint since it hasn't been a problem since we drastically reduced FD passing early in development.
2021-09-17script/lei: umask(077) before execve
While my MUA also runs umask(077) unconditionally, not all MUAs do. Additionally, pagers may support writing its buffer to disk, so ensure anything else we spawn has umask(077).
2021-09-15support -C (chdir) for most non-daemon commands
Because make(1), git(1), tar(1) all support -C in this form, as do our newer commands such as lei, public-inbox-{clone,fetch}.
2021-09-15fetch: support --exit-code switch
As noted in the new manpage entry, this is useful for avoiding public-inbox-index invocations when there's nothing to update. We use 127 to match "grok-pull", and also because it doesn't conflict with any of the current curl(1) exit codes.
2021-09-15multi_git: hoist out common epoch/alternates handling
IMHO, this greatly improves code sharing and organization between v2, extindex, and lei/store. Common git-related logic for these is lightly-refactored and easier to reason about. The impetus for this big change was to ensure inboxes created+managed by public-inbox-{clone,fetch} could have alternates and configs setup properly without depending on SQLite (via V2Writable). This change does that while making old code shorter and better factored.
2021-09-12init: set a useful description
"Unnamed repository" for v1 inboxes was misleading, and having a non-existent description for v2 was equally annoying, so set a short description based on the primary address. We remove descriptions when setting up new test inboxes to preserve the behavior of the t/lei-mirror.t test case.
2021-09-12new public-inbox-{clone,fetch} commands
Setting up and maintaining git-only mirrors of v2 inboxes is complex since multiple commands are required to clone and fetch into epochs. Unlike grokmirror, these commands do not require any configuration. Instead, they rely on existing git config files and work like "git clone --mirror" and "git fetch", respectively. Like grokmirror, they use manifest.js.gz, but only on a per-inbox basis so users won't have to clone every inbox of a large instance nor edit config files to include/exclude inboxes they're interested in.
2021-09-11lei: fix handling of broken lei.saved-search config files
lei shouldn't become unusable if a config file is invalid. Instead, show the "git config" stderr and attempt to continue gracefully. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210910141157.6u5adehpx7wftkor@meerkat.local/
2021-08-11treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and it's not worth our time to start. So stop wasting CPU cycles giving the illusion that we'd care about non-*nix pathnames.
2021-08-08httpd: set psgi.url_scheme to 'https' for TLS listeners
For users using the native TLS functionality of -httpd (instead of using nginx + Plack::Middleware::ReverseProxy), psgi.url_scheme=http was wrong and would lead to improper redirects.
2021-08-04extindex: fix boost with partial runs
Boost relies on knowledge of all inboxes in a given config file to work properly. So while we support indexing a subset of inboxes, we must still account for boost in inboxes we're not indexing. So split internal inbox groups into "known" and "active", where previously we only cared for inboxes which were being actively indexed. Furthermore, boost checks need to be applied when a message arrives in different inboxes across multiple invocations. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210802204058.vscbxs5q7xyolyu2@nitro.local/
2021-07-31extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-07-28lei: die on ECONNRESET
ECONNRESET should be rare on a private local socket, and if we hit it, it's because we're hitting the listen() limit.
2021-07-28treewide: s/sequential_shard/sequential-shard/g
The underscore variant was never documented and maintaining the difference between the command-line and internal hash is not worth it.
2021-07-25init: support git <2.30 for "-c KEY=VALUE" args
It turns out `--fixed-value' is a relatively new git-config(1) feature in git 2.30+ (December 2020). So use the quotemeta perlop for now since it seems compatible-enough for POSIX ERE used by git.
2021-07-25extindex: support --dedupe[=MSGID]
Sometimes I just want to dedupe a single Message-ID to test something, and this lets me do it. This patch appears to do what its supposed to. But it also appears to be finding duplicates that were previously missed. That's a good thing, but I wish I understood what seems to be fixed :x I'm not sure why the previous ExtSearchIdx.pm (blob 357312b8) was causing messages to be missed, even, and why this patch seems to fix it... And it's not infinite looping, either. Anyways, before this patch, "-extindex --dedupe" was taking ~5 min to no-op every message (after the initial full --dedupe run which took over a day to run). No-op --dedupes now take just under 2 hours to scan every single cross-posted message for a no-op dedupe. The initial dedupe took nearly 44 hours on my system for <https://yhbt.net/lore/all/> due to SATA-2 TLC SSD latency on 3 gigantic Xapian shards. Running --dedupe with this change seems to prevent /BUG\?.*?not deduplicated properly/ stderr messages from being triggered by View.pm. Current versions of -extindex do not seem susceptible to introducing duplicates.
2021-07-22init: allow arbitrary key-values via -c KEY=VALUE
This won't blindly append identical key=values, but allows specifying multiple, different key=value pairs as long as the values are different.
2021-07-20httpd: fix SIGHUP by invalidating cache on reload
Since we require separate PublicInbox::HTTPD instances for each listen socket address (in order to support {SERVER_<NAME|PORT>} for PSGI env), the old cache needed to be invalidated on rare app refreshes. SIGHUP has always been broken in -httpd (but not -imapd or -nntpd) due to this cache. Update the daemon documentation and 5.10.1-ize some bits while we're in the area.
2021-07-06extindex: implement --dedupe to fix old extindices
This is intended to fix older indices that had deduplication bugs for matching content. It'll also make dealing with future changes to ContentHash easier since that's never guaranteed stable. It also supports --dry-run to print changes only without making them.
2021-05-28script/lei: drop leftover message about fallback
Non-daemon lei isn't implemented, anymore.
2021-05-26lei: require Socket::MsgHdr or Inline::C, drop oneshot
The cost of supporting separate code paths between oneshot and daemon isn't worth the trouble; especially if there are more users to support. The test suite time nearly doubles with oneshot, so that's hurting developer productivity. FD passing is currently required to work efficiently with remote HTTP(S) queries which return large messages, as seen in commit 708b182a57373172f5523f3dc297659d58e03b58 ("ipc: wq: handle >MAX_ARG_STRLEN && <EMSGSIZE case"). Additionally, upcoming support for IMAP IDLE and inotify-based monitoring of Maildirs cannot work properly without a background daemon.
2021-05-05script/public-inbox-extindex: chmod +x
Everything else that's intended to be executable at some point has the executable bit set. Remove an inaccurate comment while we're at it.
2021-05-01lei edit-search: support relocating lei.q.output
The contents of the old lei.q.output will not be removed, but will be converted into the new one.
2021-04-28lei: simple WQ workers use {wq1} field
This lets us share more code and reduces cognitive overhead when it comes to picking names (because {lsss} was ridiculous). We'll need to ensure the first error set in lei is the actual error we exit with, otherwise things can get confusing and errors may get lost.
2021-04-22lei: XDG_RUNTIME_DIR=/dev/null disables daemon mode
We'll support this mode of operation for now to quiet down testing of oneshot mode where the daemon doesn't persist.
2021-04-17lei q: fix MUA spawn after reading query from stdin
Since "lei q" may read queries from stdin, we must reconnect a known terminal before spawning terminal MUAs. Attempt to use stdout as stdin for this purpose, since terminal MUAs tend to expect stdout to be a terminal. Reported-By: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/87v98klxg3.fsf@kyleam.com/
2021-04-05script/lei: waitpid for git-credential and pager
We need to ensure we reap things we spawn.
2021-04-02lei: fix git-credential handling
I completely forgot about git-credential prompting when making lei background the client process for MUA. Now it backgrounds itself only for the MUA when no FDs are passed, since the MUA is the final command run. Otherwise, it relies on FD passing as before. Fixes: c790a75439f3a1db ("script/lei: background ourselves on MUA/pager exec")
2021-04-01script/lei: background ourselves on MUA/pager exec
This ought to give the MUA or pager exclusive access to the controlling terminal. The downside is we can only exec the pager or MUA once per invocation, but I can't imagine a valid case for running those things multiple times, either. Note: I'm no expert when it comes to terminal control matters, but this allows Ctrl-Z-ed mutt instance to come back and is a nice code reduction, as well.
2021-03-28treewide: shorten temporary filename
File::Temp only requires four 'X' characters (unlike mkstemp(3), which requires six). So only so only give it 4 to avoid an 80-column violation and maybe save metadata space on FSes.
2021-02-08lei: drop BSD::Resource usage
It's no longer necessary with the changes to stop doing FD passing in our backend. cf. commits 5180ed0a1cd65139 and 7d440bf3667b8ef5 ("lei q: eliminate $not_done temporary git dir hack") ("lei q: reorder internals to reduce FD passing")
2021-02-08lei q: SIGWINCH process group with the terminal
While using utime on the destination Maildir is enough for mutt to eventually notice new mail, "eventually" isn't good enough. Send a SIGWINCH to wake mutt (and likely other MUAs) immediately. This is more portable than relying on MUAs to support inotify or EVFILT_VNODE.
2021-02-07script/lei: avoid waitpid(-1, ...) to keep tests fast
We only spawn one process to be reaped at the moment. tests will run the contents of script/* in the same process if possible, so any test scripts which spawn -httpd or other read-only can cause us to stall with waitpid(-1, ...)
2021-02-07init: lowercase -j for --jobs
This is taken from common implementations of make(1) and only affected people using the command-line help output.
2021-02-04lei: use sleep(1) loop for infinite sleep
Perl may internally race and miss signals due to a lack of self-pipe / eventfd / signalfd / EVFILT_SIGNAL usage. While our event loop paths avoid these problems by using signalfd or EVFILT_SIGNAL, thse sleep() calls are not within the event loop.
2021-02-01lei: avoid ETOOMANYREFS, cleanup imports
As with PublicInbox::IPC, we'll attempt to bump RLIMIT_NOFILE and transparently workaround ETOOMANYREFS. If that fails, we'll give the user a hint to bump RLIMIT_NOFILE since ETOOMANYREFS is an uncommon error which users may be unfamiliar with. Found while stress testing for segfaults.
2021-02-01lei: increase initial timeout
PublicInbox::Listener unconditionally sets O_NONBLOCK upon accept(), so we need a larger timeout under heavy load since there's no "dataready" accept filter on the listener. With O_NONBLOCK already set, we don't have to set it at ->event_step_init
2021-01-24ipc: get rid of wq_set_recv_modes
Just open every FD as read/write. Perl (or any non-broken runtime) won't care and won't attempt to use F_SETFL to alter file description flags; as attempting to change those would lead to unpleasant side effects if the file description is shared with another process.
2021-01-23lei: support remote externals
Via curl(1), since that lets us easily use tor on a per-connection basis via LD_PRELOAD (torsocks) or proxy. We'll eventually support more curl options which can allow users to get past firewalls and deal with other odd network configurations.
2021-01-22lei: remove INT/QUIT/TERM handlers, fix daemon EOF
The signal handlers on the client side were unnecessary, all we need is to handle socket EOF properly in the daemon by killing xsearch and l2m workers.
2021-01-15lei: pass FD to CWD via cmsg, use fchdir on server
Perl chdir() automatically does fchdir(2) if given a file or directory handle since 5.8.8/5.10.0, so we can safely rely on it given our 5.10.1+ requirement. This means we no longer have to waste several milliseconds loading the Cwd.so and making stat() calls to ensure ENV{PWD} is correct and usable in the server. It also lets us work in directories that are no longer accessible via pathname.
2021-01-14daemon+watch: fix localization of %SIG for non-signalfd users
It turns out "local" did not take effect in the way we used it: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Fortunately, none of the old use cases seem affected, unlike the previous lei change to ensure consistent SIGPIPE handling.
2021-01-14lei: test SIGPIPE, stop xsearch workers on client abort
The new test ensures consistency between oneshot and client/daemon users. Cancelling an in-progress result now also stops xsearch workers to avoid wasted CPU and I/O. Note the lei->atfork_child_wq usage changes, it is to workaround a bug in Perl 5: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> This switches the internal protocol to use SOCK_SEQPACKET AF_UNIX sockets to prevent merging messages from the daemon to client to run pager and kill/exit the client script.
2021-01-12lei_xsearch: transfer 4 FDs internally, drop IO::FDPass
It's easier to make the code more generic by transferring all four FDs (std(in|out|err) + socket) instead of omitting stdin. We'll be reading from stdin on some imports, and possibly outputting to stdout, so omitting stdin now would needlessly complicate things. The differences with IO::FDPass "1" code paths and the "4" code paths used by Inline::C and Socket::MsgHdr are far too much to support and test at the moment.
2021-01-12lei: run pager in client script
While most single keystrokes work fine when the pager is launched from the background daemon, Ctrl-C and WINCH can cause strangeness when connected to the wrong terminal.
2021-01-12lei: get rid of client {pid} field
Using kill(2) is too dangerous since extremely long queries may mean the original PID of the aborted lei(1) client process to be recycled by a new process. It would be bad if the lei_xsearch worker process issued a kill on the wrong process. So just rely on sending the exit message via socket.
2021-01-12ipc: start supporting sending/receiving more than 3 FDs
Actually, sending 4 FDs will be useful for lei internal xsearch work once we start accepting input from stdin. It won't be used with the lightweight lei(1) client, however. For WWW (eventually), a single FD may be enough.