path: root/lib
Date Commit message
2021-09-27 lei completion: workaround old Perl bug
While `$argv[-1]' is `undef' on an empty @argv, using `$argv[-1]' as a subroutine argument would fail incorrectly with: Modification of non-creatable array value attempted, subscript -1 at ... ...even though we'd never attempt to modify @_ itself in the subroutines being called. Work around the bug (tested on 5.16.3) by passing `undef' explicitly when `$argv[-1]' is already `undef'. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
2021-09-27 fetch: support running as root
The "-w" perlop always succeeds as root, so we need to check st_mode for writability bits to detect directories we shouldn't write to. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210927124056.kj5okiefvs4ztk27@meerkat.local/
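The mode-bit check described above can be sketched as follows. This is a minimal illustration in Python rather than the project's Perl, and `dir_is_writable` is a hypothetical name, not a public-inbox function. It assumes that "writable" means any of the user, group, or other write bits is set.

```python
import os
import stat

def dir_is_writable(path):
    """Return True if any write bit (user, group, or other) is set on path.

    Unlike os.access(path, os.W_OK), which typically reports success for
    root regardless of the mode, this looks only at the st_mode bits.
    """
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A directory that has had its write bits removed with chmod is reported as non-writable by this check even when the caller is root.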
2021-09-27 config: get_1: use full parameter name
Instead of passing the prefix section and key separately, pass them together as is commonly done with git-config(1) usage as well as our ->get_all API. This inconsistency in the get_1 API is a needless footgun and confused me a bit while working on "lei up" the other week.
2021-09-27 lei rediff: add --drq and --dequote-only
More switches which can be useful for users who pipe from text editors. --drq can be helpful while writing patch review email replies, and perhaps --dequote-only, too.
2021-09-27 lei rediff: quiet warnings from Import and Eml
lei rediff is expected to see partial patch fragments and such, so silence warnings when something isn't exactly a valid email message.
2021-09-26 net_reader: drop support for IgnoreSizeErrors option
Only the ->message_string method of Mail::IMAPClient uses it, and we have no intention of using ->message_string outside of tests.
2021-09-26 lei: ensure refresh_watches isn't called from workers
Only the top-level lei-daemon will do inotify/kevent.
2021-09-26 inbox: cloneurl: avoid undef to hash table value
This saves us some memory for the hash slot in the common case the `cloneurl' file doesn't exist.
2021-09-26 lei -f reply: fix Cc: header combining
When combining lines from To: and Cc: headers, ", " needs to be used to separate them.
2021-09-26 www_listing: support /all/ search as a 302 redirect
This allows users to search /all/ from the top-level WwwListing without extra manual steps, although there's still extra network roundtrips incurred. No vertical whitespace is added, and there's no clumsy radio buttons nor menus to deal with. Users only have to use a different <input type=submit /> button. I forgot how to do this until I realized we already do something similar with multiple submit buttons for threaded vs non-threaded mboxrd.gz downloads. Link: https://public-inbox.org/meta/20210827120845.29682-1-e@80x24.org/
2021-09-26 lei note-event: ignore kw_changed exceptions
The note-event worker may see changes before a Xapian shard commit happens, meaning keyword lookups fail as a result. Just emit the request to the lei/store worker since it's a fairly cheap operation at this point. We'll try harder to look for kw changes, too, since deduplication changes may lead to multiple docids being resolved for a single message.
2021-09-26 search: avoid setting undef hashtable entries
`undef' entries still take up a slot in the hash table, and cause the `exists' check to false-positive in ->cleanup_shards. This should fully fix the (innocuous) messages introduced in commit 63d7b8ce (daemons: revamp periodic cleanup task, 2021-09-23)
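The same pitfall exists in other languages. As a rough analogy in Python (not the project's Perl), a dict key whose value is `None` still counts as present for a membership test, much like a Perl hash key with an `undef` value still satisfies `exists`. The helper name `remember` is hypothetical.

```python
def remember(cache, key, value):
    """Store value under key, but never store a None placeholder."""
    if value is None:
        cache.pop(key, None)  # make sure no stale or empty slot remains
    else:
        cache[key] = value

naive = {"shard0": None}       # placeholder entry still occupies a slot
print("shard0" in naive)       # membership test sees the key

careful = {}
remember(careful, "shard0", None)
print("shard0" in careful)     # no slot was created
```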
2021-09-26 extmsg: search_partial: use ->isrch if available
This allows us to avoid creating ibx->{search}->{xdb} at this spot by using an `undef' value. This is a step towards eliminating the innocuous "/path/to/inboxdir/xap15 has no shards" messages introduced in commit 63d7b8ce (daemons: revamp periodic cleanup task, 2021-09-23)
2021-09-25 lei ls-external: split into separate file
This was written before we had auto-loading and is rarely used.
2021-09-25 lei add-external: split into separate file
This was also written before we had auto-loading and is rarely used.
2021-09-25 lei forget-external: split into separate file
This was written before we had auto-loading, and forget-external should be a rarely-used command that's not worth loading at startup. Do some golfing while we're in the area, too.
2021-09-25 lei: make pkt_op easier-to-use and understand
Since switching to SOCK_SEQPACKET, we no longer have to use fixed-width records to guarantee atomic reads. Thus we can maintain more human-readable/searchable PktOp opcodes. Furthermore, we can infer the subroutine name in many cases to avoid repeating ourselves by specifying a command-name twice (e.g. $ops->{CMD} => [ \&CMD, $obj ]; can now simply be written as: $ops->{CMD} => [ $obj ] if CMD is a method of $obj).
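The handler inference described above can be illustrated with a small dispatch table. This is a sketch in Python rather than the project's Perl; `Worker`, `ping`, and `dispatch` are hypothetical names, and the real PktOp code may differ in how it distinguishes the two entry forms.

```python
class Worker:
    """Stand-in object whose method name doubles as the command name."""
    def __init__(self):
        self.seen = []

    def ping(self, *args):
        self.seen.append(("ping", args))
        return "pong"

def dispatch(ops, cmd, *args):
    """Look up cmd in ops and call its handler.

    An entry is either [callable, obj, ...] (explicit handler) or
    [obj, ...] (handler inferred as the method of obj named cmd).
    """
    entry = ops[cmd]
    head, rest = entry[0], list(entry[1:])
    if callable(head):
        return head(*rest, *args)
    return getattr(head, cmd)(*rest, *args)

w = Worker()
explicit = {"ping": [Worker.ping, w]}  # command name given twice
inferred = {"ping": [w]}               # method name inferred from the key
```

Both tables route "ping" to the same method; the second avoids repeating the name.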
2021-09-25 lei2mail: augment_inprogress: guard against closed FDs
I'm not sure what caused it, but $err was undef and caused print to fail, leading to an event loop error. Guard the timer with an eval and assume warn() can't trigger an event loop failure.
2021-09-25 lei: restore old sigmask before daemon exit
If the event loop fails, we want blocking waitpid (wait4) calls to be interruptible with SIGTERM via "kill $PID" rather than SIGKILL. Though a failing event loop is something we should avoid...
2021-09-25 lei up: show timezone offset with localtime
Sometimes a user (e.g. me) isn't really sure what timezone they're in...
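As an illustration of a local timestamp that carries its UTC offset, here is a short Python sketch (not the project's Perl). The exact format string lei prints is not stated above, so the one used here is an assumption, and `local_stamp` is a hypothetical name.

```python
import time

def local_stamp(epoch_seconds):
    """Format a Unix timestamp in local time with a numeric UTC offset."""
    # %z appends the offset (e.g. +0000), so the zone is never ambiguous
    return time.strftime("%Y-%m-%d %H:%M:%S %z", time.localtime(epoch_seconds))

print(local_stamp(time.time()))
```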
2021-09-24 fetch: support v2 w/o manifest on old WWW
There may still be pre-manifest.js.gz versions of PublicInbox::WWW running and serving v2 inboxes. While -clone and "add-external --mirror" were working, -fetch was failing due to a 301 redirect to $INBOX_URL/manifest.js.gz/ instead of the expected 404. Update the code to deal with a JSON decode error (from the 301) and ensure v2 epoch detection is correct (and not using a shadowed variable).
2021-09-24 clone|fetch|--mirror: cull manifest in partial mirrors
This makes it easier for users to enable fetching on a previously read-only epoch. Prior to this change, users were required to delete manifest.js.gz in addition to adding the writable bit. Now, they just have to "chmod +w $EPOCH_DIR".
2021-09-24 clone|--mirror: fix and test against pre-manifest WWW
There may still be pre-manifest.js.gz versions of PublicInbox::WWW running and serving v2 inboxes. Since $INBOX_URL/manifest.js.gz was not understood, it was assumed to be a Message-ID and 301-ed to "$INBOX_URL/manifest.js.gz/" with a trailing slash, so our 404 checks were invalid. Update our fallbacks to deal with 301 by catching JSON decoding errors to trigger HTML scraping. For HTML parsing, be sure to not be fooled by potential user-generated content and only scan the part after the last <hr>. We also need to avoid propagating $? from curl unnecessarily when we can continue safely. Finally, update v2mirror.t with tests to use PublicInbox::WWW from our "v1.1.0-pre1" tag to ensure these code paths get tested.
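The fallback strategy can be sketched roughly as below, in Python rather than the project's Perl. `parse_manifest` and `scrapeable_part` are hypothetical names; the sketch only shows the two ideas from the message: treat a decode failure as "no manifest", and restrict scraping to the text after the last `<hr>`.

```python
import gzip
import json
import zlib

def parse_manifest(raw):
    """Return the decoded manifest, or None if raw is not gzipped JSON."""
    try:
        return json.loads(gzip.decompress(raw))
    except (OSError, EOFError, zlib.error, ValueError):
        # Not gzip, truncated, or not JSON (e.g. the HTML body of a redirect)
        return None

def scrapeable_part(html):
    """Return only the text after the last <hr>.

    If there is no <hr> at all, the whole input is returned, so a real
    caller may want to treat that case as "nothing trustworthy to scan".
    """
    return html.rpartition("<hr>")[2]
```

A caller would try `parse_manifest` first and fall back to scanning `scrapeable_part(body)` only when it returns None.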
2021-09-24 fetch: fix skipping with multi-epoch inboxes
We need to check every epoch for writability, so don't break out of the loop when we find a URL.
2021-09-24 clone|--mirror: support --epoch=RANGE for partial clones
Partial (v2) clones should be a useful addition for users wanting to conserve storage while having fast access to recent messages. Continuing work started in 876e74283ff3 (fetch: ignore non-writable epoch dirs, 2021-09-17), this creates bare, read-only epoch git repos. These git repos have the remotes pre-configured, but do not fetch any objects. The goal is to allow users to set the writable bit on a previously-skipped epoch and start fetching it. Shell completion support may not be necessary given how short the epoch ranges are, here. Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://public-inbox.org/meta/20210917002204.GA13112@dcvr/T/#u
2021-09-23 lei_xsearch: use localtime for user message
It's probably least confusing for user-facing messages to display times in the user's configured timezone. I considered appending "UTC" to the message and sticking with gmtime(), too, but this output isn't intended to be web-cache friendly nor expect users from across multiple timezones to view the same output.
2021-09-23 lei: common --all[=remote|local] help message
It helps to be consistent and reduce the learning curve, here.
2021-09-23 xcpdb: avoid race when shards are added
It's possible for the rename() sequence to cause read-only daemons using ->xdb_shards_flat to load an incomplete set of contiguous shards and get invalid docids for search results. With this change, we favor the case where search is momentarily unavailable rather than giving wrong results during the small window where Xapcmd->commit_changes runs.
2021-09-23 xcpdb: -R$SHARDS creates new shards with correct perms
"Correct" meaning the permissions match that of the parent xap15 or ei15 directory.
2021-09-23 test_common: reset umask on non-forking run_script
public-inbox-init sets umask for git <2.1.0, so our fork+exec replacement needs to restore the original umask of the "parent".
2021-09-23 daemons: revamp periodic cleanup task
Neither Inboxes nor ExtSearch objects were retrying correctly when there are live git processes, but the inboxes were getting rescanned for search or other reasons. Ensure the scan retries eventually if there's live processes. We also need to update the cleanup task to detect Xapian shard count changes, since Xapian ->reopen is enough to detect any other Xapian changes. Otherwise, we just issue an inexpensive ->reopen call and let Xapian check whether there's anything worth reopening. This also lets us eliminate the Devel::Peek dependency.
2021-09-23 gcf2 + extsearch: check for unlinked files on Linux
Check for unlinked mmap-ed files via /proc/$PID/maps every 60s or so. ExtSearch (extindex) is compatible-enough with Inbox objects to be wired into the old per-inbox code, but the startup cost is projected to be much higher down the line when there's >30K inboxes, so we scan /proc/$PID/maps for deleted files before unlinking. With old Inbox objects, it was (and is) simpler to just kill processes w/o checking due to the low startup cost (and non-portability of checking). Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210921144754.gulkneuulzo27qbw@meerkat.local/
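The scan for unlinked mappings can be approximated as follows. This is a Python sketch, not the project's Perl, and `deleted_mappings` is a hypothetical name. It relies on the Linux convention that the pathname column of /proc/$PID/maps ends in " (deleted)" once the backing file has been unlinked.

```python
def deleted_mappings(maps_text):
    """Return pathnames of mappings whose backing file has been unlinked.

    maps_text is the content of /proc/<pid>/maps; Linux appends
    " (deleted)" to the pathname of an unlinked file.
    """
    suffix = " (deleted)"
    found = []
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].endswith(suffix):
            found.append(fields[5][:-len(suffix)])
    return found
```

On Linux, the input for a given process could be read from `/proc/<pid>/maps` and passed in as a string; anonymous mappings, which have no pathname column, are ignored.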
2021-09-22 lei: drop redundant WQ EOF callbacks
Redundant code is noise and therefore confusing :<
2021-09-22 lei up: avoid excessively parallel --all
We shouldn't dispatch all outputs right away since they can be expensive CPU-wise. Instead, rely on DESTROY to trigger further redispatches. This also fixes a circular reference bug for the single-output case that could lead to a leftover script/lei after MUA exit. I'm not sure how --jobs/-j should work when the actual xsearch and lei2mail have their own parallelism ("--jobs=$X,$M"), but it's better than having thousands of subtasks running. Fixes: b34a267efff7b831 ("lei up: fix --mua with single output")
2021-09-22 inbox: do not waste hash slot on httpbackend_limiter
A few dozen bytes saved here can add up when we have thousands of inboxes. It also makes Data::Dumper debug output a bit cleaner.
2021-09-22 lei: dclose: do not close unnecessarily
The bit about reap_compress is no longer true since LeiXSearch->query_done triggers it, instead. I only noticed this while working on "lei up".
2021-09-22 treewide: fix %SIG localization, harder
This fixes the occasional t/lei-sigpipe.t infinite loop under "make check-run". Link: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
2021-09-22 ipc: do not add "0" to $0 of solo workers
It's needless noise and misleads users reading "ps" into thinking there's more workers when there's only one.
2021-09-21 lei: umask(077) before opening errors.log
There's a chance some sensitive information (e.g. folder names) can end up in errors.log, though $XDG_RUNTIME_DIR or /tmp/lei-$UID/ will have 0700 permissions, anyways.
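The effect of setting the umask before creating a log file can be shown with a short Python sketch (not the project's Perl). `open_private_log` is a hypothetical name, and restoring the previous umask afterwards is a choice made for this sketch; the message above does not say whether lei does so.

```python
import os

def open_private_log(path):
    """Open path for appending so a newly created file is owner-only."""
    old = os.umask(0o077)  # clear group/other bits for files created below
    try:
        return open(path, "a")
    finally:
        os.umask(old)  # restore the caller's umask (sketch-only choice)
```

A file newly created this way ends up with no group or other permission bits, independent of the permissions on its parent directory.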
2021-09-21 script/lei: handle SIGTSTP and SIGCONT
Sometimes it's useful to pause an expensive query or refresh-mail-sync to do something else. While lei-daemon and lei/store can't be paused since they're shared across clients, per-invocation WQ workers can be paused safely using the unblockable SIGSTOP. While we're at it, drop the ETOOMANYREFS hint since it hasn't been a problem since we drastically reduced FD passing early in development.
2021-09-21 lei q: improve --limit behavior and progress
Avoid slurping gigantic (e.g. 100000) result sets into a single response if a giant limit is specified, and instead use 10000 as a window for the mset with a given offset. We'll also warn and hint towards the --limit= switch when the estimated result set is larger than the default limit.
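The windowing arithmetic can be sketched as a generator of (offset, size) pairs. This is Python rather than the project's Perl, `mset_windows` is a hypothetical name, and how each window is actually fetched from Xapian is left out.

```python
def mset_windows(limit, window=10000):
    """Yield (offset, size) pairs that cover `limit` results in windows."""
    offset = 0
    while offset < limit:
        size = min(window, limit - offset)  # last window may be smaller
        yield offset, size
        offset += size

for offset, size in mset_windows(25000):
    print(offset, size)
```

With a limit of 25000 this yields two full windows of 10000 and a final window of 5000, so no single response has to hold the whole result set.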
2021-09-21 lei q: update messages to reflect --save default
I wanted to try --dedupe=none for something, but it failed since I forgot --no-save :x So hint users towards --no-save if necessary.
2021-09-21 search: drop reopen retry message
It's needless noise in syslogs for daemons and unnecessarily alarming to users on the command-line.
2021-09-21 lei q: show progress on >1s preparation phase
Overwriting existing destinations is safe (but slow) by default, so show a progress message noting what we're doing while a user waits.
2021-09-21 lei: various completion improvements
"lei export-kw" no longer completes for anonymous sources. More commands use "lei refresh-mail-sync" as a basis for their completion work, as well. ";AUTH=ANONYMOUS@" is stripped from completions since it was preventing bash completion from working on AUTH=ANONYMOUS IMAP URLs. I'm not sure if there's a better way, but all of our code works fine without specifying AUTH=ANONYMOUS as a command-line arg. Finally, we fall back to using more candidates if none can be found, allowing multiple URLs to be completed.
2021-09-21 lei lcat: support NNTP URLs
NNTP URLs are probably more prevalent in public message archives than IMAP URLs.
2021-09-21 lei lcat: use single queue for ordering
If lcat-ing multiple argument types (blobs vs folders), maintain the original order of the arguments instead of dumping all blobs before folder contents.
2021-09-21 lei: simplify internal arg2folder usage
We can set opt->{quiet} for (internal) 'note-event' command to quiet ->qerr, since we use ->qerr everywhere else. And we'll just die() instead of setting a ->{fail} message, since eval + die are more in line with the rest of our Perl code.
2021-09-21 lei_mail_sync: account for non-unique cases
NNTP servers, IMAP servers, and various MUAs may recycle "unique" identifiers due to software bugs or careless BOFHs. Warn about them, but always be prepared to account for them.
2021-09-21 lei inspect: support NNTP URLs
No reason not to support them, since there's more public-inbox-nntpd instances than -imapd instances, currently.