about summary refs log tree commit homepage
path: root/lib
DateCommit message (Collapse)
2021-11-10lei q: make HTTP(S) query strings even less ugly
Following commit 57fed2e4b78ed394 (lei: normalize whitespace in remote queries, 2021-09-11), leaving the trailing `\n' from stdin queries to be normalized to ` ' (SP) causes it to appear as `+' in URLs, which Xapian ignores.
2021-11-10lei q: disallow "\n" in argv[] elements
I don't expect this to be hit in real-world use via normal interactive shells. However, somebody could accidentally add "\n" in languages (e.g. Perl, C) where it's easy to pass "\n" in argv[].
2021-11-10lei up: infer rawstr from old searches via trailing "\n"
For --stdin searches created prior to commit 666dde69a3f6 (lei q|up: fix saved searches for single-phrase search, 2021-11-08) we still want to be able to run "lei up" on them without regressions. So assume nobody manages to enter "\n" as an argv[] element and consider the presence of "\n" as a previous --stdin use. This fixes errors from "lei up" such as: lei_xsearch 2 wq_worker: Exception: Key too long: length was 840 bytes, maximum length of a key is 255 bytes at ../PublicInbox/IPC.pm line 250. Fixes: 666dde69a3f6 ("lei q|up: fix saved searches for single-phrase search")
2021-11-10ipc: note failing sub name
Hopefully problems can get diagnosed more quickly with the sub name in the error message.
2021-11-10solver: support sha256 coderepos
Tested manually on a newish project I'm working on.
2021-11-09lei q|up: fix saved searches for single-phrase search
`"' (double-quote) needs to be quoted for stdin searches. We also need to differentiate between "lei q --stdin" usage when calling "lei up", do it by setting an internal "rawstr" knob to ensure we can parse the config properly regardless of whether the initial search used --stdin or not.
2021-11-09searchidx: index "diff --git a/... b/..." headers
While we do detailed indexing of git diffs, the header itself was failing and queries like 'nq:diff' would not work. Noticed-by: Rob Herring <robh@kernel.org>
2021-11-04lei_curl: use http.proxy knob via URL match for curl
Using the --proxy on the command-line affects the entire lei invocation, and users searching HTTP(S) remotes and writing to an IMAP folder may want more fine-grained proxy use: lei q -o imap://no-proxy.example/foo -O https://need-proxy.example/bar ...
2021-11-03doc: lei-q: document SEARCH TERMS prefixes
The new Documentation/common.perl file will be used for all manpages in the future.
2021-11-02lei <rediff|rm|tag>: stdin implies `-F eml'
These commands are usually run on a single message, so saving the user the trouble of typing `-F eml' on the command-line seems reasonable. I don't think commands like "index" and "import" will be too useful for single messages, though.
2021-11-02lei: simplify common LeiInput users with ->wq1_start
This method replaces a common pattern of starting workers, preparing internal auth ops, and asynchronous waiting of command completion. It also adds missing LeiAuth support to rediff and rm which rarely need auth.
2021-11-02lei mail-diff: do not default to 'eml'
In retrospect, this doesn't make sense, since it needs at least two messages to diff. So go about "normal" input rules and require users to specify the format.
2021-11-02mbox_reader: do not blindly pass --rsyncable to gzip
FreeBSD gzip does not support --rsyncable, though my VM usually has pigz installed.
2021-11-01treewide: kill problematic "$h->{k} //= do {" assignments
As stated in the previous change, conditional hash assignments which trigger other hash assignments seem problematic, at times. So replace: $h->{k} //= do { $h->{x} = ...; $val }; $h->{k} // do { $h->{x} = ...; $hk->{k} = $val }; "||=" is affected the same way, and some instances of "||=" are replaced with "//=" or "// do {", now.
2021-11-01idx_stack: avoid conditional hash assignment weirdness
I've been seeing the following error on occasion during "make check-run": $PWD/t/data-gen/reindex-time-range.v1-master index failed: Modification of a read-only value attempted at $DIR/lib/PublicInbox/SearchIdx.pm line 899, <$r> line 1. Perhaps this fixes it. In any case, a construct of: $h->{k} //= do { $h->{x} = ...; $val }; seems wrong and may cause Perl to error out depending on how hashes are randomized.
2021-10-31lei_input: disallow uppercase characters for labels
Xapian boolean terms rely on upper-case prefixes, so the terms themselves need to be all lowercase.
2021-10-30lei_to_mail: avoid SEGV on worker exit via SIGTERM
->DESTROY ordering via "exit()" calls is tricky, and dedupe checks were causing problems. AFAIK, this only affects users who manually enable WAL on lei/store/ei*/over.sqlite3. Fortunately, there is no data corruption as a result even though "read-only" WAL requires write permissions.
2021-10-30lei_xsearch: quiet error message on SIG{PIPE,TERM}
SIGPIPE and SIGTERM are common and user-induced, so they're not worth warning on. Add the value of "$?", though, since it can help users notice other errors (e.g. SIGSEGV).
2021-10-30lei_to_mail: limit workers for text, reply and v2 outputs
"text" and "reply" outputs are intended for the pager, so parallelizing them is a waste of resources. v2 has shards, of course, so parallelizing writes to it is also a waste since the deduplication work is a bit more complex.
2021-10-30lei: do not access {sock} after SIGPIPE
It's possible for this to break out of the event loop if note_sigpipe fires via PktOp in the same iteration.
2021-10-28test_common: clear XDG_CACHE_HOME before lei tests
We don't want to read a users' $XDG_CACHE_HOME/lei/all_locals_ever.git during tests. Reported-by: Thomas Weißschuh <thomas@t-8ch.de> Tested-by: Thomas Weißschuh <thomas@t-8ch.de> Link: https://public-inbox.org/meta/f239abac-4aee-4573-a0d6-e533c7a32662@t-8ch.de/
2021-10-28lei rm: move generic input_maildir_cb to LeiInput parent class
It's not much of a savings, right now, but maybe it can be in the future. I wanted to eliminate the "lei convert" one, too, but convert needs to preserve keywords which isn't possible with the generic fallback, so new tests were written for convert, instead.
2021-10-28lei sucks: show nproc in CPU info
Some bugs are triggered with more CPUs, some with 1 CPU.
2021-10-28lei convert: remove redundant input_net_cb
Use the one provided by the LeiInput parent class.
2021-10-28lei convert: use "--output" in failure message
The extra dashes should help users find the correct option more easily.
2021-10-28xt/net_writer_imap: test "lei convert" w/ IMAP source
I just did a double-take and nearly thought authentication was broken while reading LeiConvert.pm. Add a comment in LeiConvert.pm to clarify things, too.
2021-10-28lei add-watch: ensure folders are known to mail_sync.sqlite3
This prevents noisy errors in syslog when running t/lei-watch.t
2021-10-27lei q: fix remote import accounting
We need to update the {-nr_remote_eml} counter regardless of progress display being enabled since it's needed for saved searches. We'll also split out the {-imported} flag separately and only call LeiStore->done if a new message was imported. Note: this change is NOT expected to fix errors reported by Thomas in <ebf92218-1470-4602-b534-6dae59639dc6@t-8ch.de> Cc: Thomas Weißschuh <thomas@t-8ch.de>
2021-10-27test_common: key test inboxes to init.defaultBranch
This lets users change their global init.defaultBranch config knob in ~/.gitconfig or similar without breaking tests. Reported-by: Thomas Weißschuh <thomas@t-8ch.de> Tested-by: Thomas Weißschuh <thomas@t-8ch.de>
2021-10-27lei mail-diff: support more inputs, split newlines
Support --in-format like the rest of LeiInput users, and don't default to .eml if a per-input format was specified. In any case, I saved a bunch of messages from mutt which uses mboxcl2. We'll also split newlines for diff, since it's a pain to read diffs with escaped "\n" characters in them.
2021-10-26lei_to_mail: only run lms_write_prepare for IMAP+Maildir
Mail synchronization in lei_to_mail only works for IMAP and Maildir; so don't waste time preparing mbox* writers for it.
2021-10-26input_pipe: account for undefined {sock}
It's possible for ->event_step to fire twice due to ->requeue with EPOLLET (but not EPOLLONESHOT). So account for that and avoid causing event loop errors as a result.
2021-10-26lei rm|tag: drop redundant mbox+net callbacks
These are supplied by the base LeiInput class
2021-10-26lei p2q: use LeiInput for multi-patch series
The LeiInput backend now allows p2q to work like any other command which reads .eml, .patch, mbox*, Maildir, IMAP, and NNTP input. Running "git format-patch --stdout -1 $COMMIT" remains supported. This is intended to allow lower memory use while parsing "git log --pretty=mboxrd -p" output. Previously, the entire output of "git log" would be slurped into memory at once. The intended use is to allow easy(-ish :P) searching for unapplied patches as documented in the new example in the manpage.
2021-10-26lei: add net getopt spec to various commands
All of these commands should support --proxy, at least, if not other curl options.
2021-10-26lei inspect: fix atfork hook
The misnamed sub wasn't firing, but was unlikely to be noticeable given the short lifetime of the process. Fixes: 1f887bd51d92b0d4 ("lei inspect: add atfork hook")
2021-10-26lei q: enable expensive Xapian flags
FLAG_PURE_NOT is too expensive for public-facing WWW use, but lei isn't public-facing. We'll also unconditionally enable phrase search on old "chert" DBs since lei doesn't need to worry about fairness across 10K users.
2021-10-26eml: keep body if no headers are found
This easily allows us to treat "git diff" output as header-less "messages" for commands such as "lei p2q".
2021-10-26lei p2q: document --uri, add examples
This is useful for users lacking in local storage. Also, referencing lei-add-external(1) seems to make less sense than referencing lei-q(1). We'll also start dropping years from the copyright statement to reduce future churn.
2021-10-26www: mirror: fix rendering of NNTP URLs
As of commit 738c4a65, the code for reporting NNTP information in _/text/mirror/ incorrectly uses ->imap_url rather than ->nntp_url. Fixes: 738c4a65719e6278 ("www: various help text updates")
2021-10-25lei_to_mail: write directly to mail_sync.sqlite3
No need to go through the lei/store process when we write mail_sync.sqlite3. This ought to reduce ENOBUFS errors (and the sleep workaround) on RAM-starved systems.
2021-10-25www: $MSGID/raw: set charset in HTTP response
By using the charset specified in the message, web browsers are more likely to display the raw text properly for human readers. Inspired by a patch by Thomas Weißschuh: https://public-inbox.org/meta/20211024214337.161779-3-thomas@t-8ch.de/ Cc: Thomas Weißschuh <thomas@t-8ch.de>
2021-10-25gzip_filter: delay async wcb call
This will let us modify the response header later to set a proper charset for Content-Type when displaying raw messages. Cc: Thomas Weißschuh <thomas@t-8ch.de>
2021-10-24viewvcs: die on tmpfile() errors
Just let Plack::Util::run_app catch the error and generate a 500 response for it.
2021-10-24git: avoid Perl5 internal scratchpad target cache
Creating a scalar ref directly off substr() seemed to be causing the underlying non-ref scalar to end up in Perl's scratchpad. Assign the substr result to a local variable seems sufficient to prevent multi-megabyte SVs from lingering indefinitely when a read-only daemon serves rare, oversized blobs.
2021-10-24thread: avoid Perl5 internal scratchpad target cache
The use of array-returning built-ins such as `grep' inside arrayref declarations appears to result in permanently allocated scratchpad space for caching according to my malloc inspector. Thread skeletons get discarded every response, but multiple skeletons can exist in memory at once, so do what we can to prevent long-lived allocations from being made, here. In other words, replacing constructs such as: my $foo = [ grep(...) ]; with: my @foo = grep(...); Seems to ensure the mortality of the underlying array.
2021-10-24listener: emit warnings on EPERM
In retrospect, warnings for EPERM on accept4(2) failure may help detect misconfigured firewalls, so start emitting warnings for EPERM. Fwiw, I've never known excessive EPERM warnings to be excessively noisy in other TCP services I've run over the years.
2021-10-24http: use a larger buffer for ->getline responses
64K matches the Linux pipe default, and matches what we use in httpd/async and qspawn. This should reduce syscalls used for serving git packs via dumb HTTP and any ->getline code paths used by other PSGI code. This appears to speed up HTML rendering by w3m when serving giant HTML responsees from the Devel::Mwrap::PSGI memory debugger.
2021-10-24shared_kv: remove cache_size attribute support
We're not using it, anywhere.
2021-10-24lei export-kw: skip read-only IMAP folders
Since we want to store IMAP flags asynchronously and not wait for results, we can't check for IMAP errors this way and end up wasting bandwidth on public-inbox-imapd. Now, we just check PERMANENTFLAGS up front to ensure a folder can handle IMAP flag storage before proceeding.