public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2023-11-16	lei convert: fix repeat and idempotent v2 output
	We should be able to treat v2 outputs just like any other mail format, with the exception that content dedupe is always enforced by the v2 format. This allows users hosting v2 public-inboxes to recover from broken synchronization by catching up from alternate archives such as the mbox archives hosted by https://lists.gnu.org/ Link: https://public-inbox.org/meta/20231114-hypersonic-papaya-starling-e1cfc8@nitro/
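	The always-on content dedupe is what makes a repeated convert into v2 idempotent. Below is a minimal Python sketch of that idea. It assumes a hypothetical `content_key()` that hashes a few headers plus the decoded body parts; public-inbox's real hashing rules are not reproduced here.

```python
import hashlib
from email import message_from_bytes


def content_key(raw: bytes) -> str:
    """Hash a few identifying headers plus decoded body parts.
    The header selection is hypothetical, chosen for illustration only."""
    msg = message_from_bytes(raw)
    h = hashlib.sha256()
    for name in ("Message-ID", "From", "Subject", "Date"):
        for value in msg.get_all(name) or []:
            h.update(f"{name.lower()}\0{value}\0".encode("utf-8", "surrogateescape"))
    for part in msg.walk():  # walk() yields msg itself when it is not multipart
        if not part.is_multipart():
            h.update(part.get_payload(decode=True) or b"")
    return h.hexdigest()


def import_deduped(store: dict, messages) -> int:
    """Add each raw message unless identical content is already stored.
    Returns the number of new messages, so re-running on the same input adds 0."""
    added = 0
    for raw in messages:
        key = content_key(raw)
        if key not in store:
            store[key] = raw
            added += 1
    return added
```

Because the key depends only on content, feeding the same mbox twice (or overlapping mboxes) leaves the store unchanged on the second pass.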
2023-11-15	treewide: more autodie safety fixes for older Perl
	Avoid mixing autodie use in different scopes since it's likely to cause problems like it did in Gcf2. While none of these fix known problems with test cases, it's likely worthwhile to avoid it anyways to avoid future surprises. For Process::IO, we'll add some additional tests in t/io.t to ensure we don't get unintended exceptions for try_cat.
2023-11-15	t/lei-import: account for more verbose error
	Perl 5.16.3 on CentOS seems more verbose in one of the EIO tests. Relax the regexp so we can account for extra errors reported by Perl.
2023-11-14	config: avoid eidx_key and newsgroup conflicts
	Start lowercasing newsgroup names automatically, since uppercase names are incompatible with IMAP and POP3 and also cause problems with both -extindex and -cindex. We'll also warn on eidx_key and newsgroup conflicts to avoid sometimes subtle breakage when using -extindex and -cindex.
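	Here is a Python sketch of the two checks. It assumes a simplified, hypothetical config shape in which each inbox has an optional newsgroup and an eidx_key string. The actual config parsing lives in public-inbox's Perl code.

```python
import warnings


def normalize_newsgroups(inboxes: dict) -> dict:
    """inboxes maps a (hypothetical) inbox name to
    {"newsgroup": str or None, "eidx_key": str}.
    Returns name -> lowercased newsgroup, warning when two inboxes
    would claim the same newsgroup/eidx_key string."""
    claimed = {}  # key string -> first inbox name that claimed it
    result = {}
    for name, cfg in inboxes.items():
        ng = cfg.get("newsgroup")
        if ng is not None:
            ng = ng.lower()  # uppercase names break IMAP/POP3 and the indexers
            result[name] = ng
        # an inbox may reuse its own newsgroup as its eidx_key; the set dedupes that
        for key in {ng, cfg["eidx_key"]} - {None}:
            owner = claimed.setdefault(key, name)
            if owner != name:
                warnings.warn(f"{key!r} claimed by both {owner!r} and {name!r}")
    return result
```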
2023-11-13	xap_helper: stricter and harsher error handling
	We'll require an error stream for dump_ibx and dump_roots commands; they're too important to ignore. Instead of writing code to provide diagnostics for errors, rely on abort(3) and the -ggdb3 compiler flag to generate nice core dumps for gdb since all commands sent to xap_helper are from internal users. We'll even abort on most usage errors since they could be bugs in split2argv or our use of getopt(3). We'll also just exit on ENOMEM errors since it's the easiest way to recover from those errors by starting a new process which closes all open Xapian DB handles.
2023-11-13	spawn: don't append to scalarrefs on stdout/stderr
	None of our current code relies on it, and I can't imagine it's something we'd need in the future, actually... This keeps the door open for relying more on Spawn in TestCommon.
2023-11-13	xap_client: spawn C++ xap_helper directly
	No need to suffer through an extra dose of slow Perl load times when we can drive the build in the big parent Perl process and get the executable path name to pass to spawn directly.
2023-11-11	mda: fix and test some usage problems
	-mda now honors `--help' properly and invocations missing ORIGINAL_RECIPIENT now fail with EX_NOUSER. Helped-by: Leah Neukirchen <leah@vuxu.org> Link: https://public-inbox.org/meta/87msvlguqu.fsf@vuxu.org/
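	An MTA runs -mda and interprets its exit status according to sysexits.h. The following is a Python sketch of that failure path. The `original_recipient()` helper is hypothetical; only the ORIGINAL_RECIPIENT variable and EX_NOUSER come from the commit.

```python
import os
import sys


def original_recipient(environ) -> str:
    """Return the recipient passed in by the MTA, or exit with EX_NOUSER
    (67 in sysexits.h) so the MTA can treat it as an unknown addressee."""
    rcpt = environ.get("ORIGINAL_RECIPIENT")
    if not rcpt:
        print("ORIGINAL_RECIPIENT is not set", file=sys.stderr)
        sys.exit(os.EX_NOUSER)
    return rcpt
```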
2023-11-11	mda|learn|watch: support dropUniqueUnsubscribe config
	List-Unsubscribe headers with unique identifiers (such as those generated by our examples/unsubscribe.milter) should not end up in public archives. Add a new config knob to strip List-Unsubscribe headers if they have the `List-Unsubscribe-Post: List-Unsubscribe=One-Click' header. Unfortunately, this breaks DKIM signatures if the signature covers either of these List-Unsubscribe* headers. However, breaking DKIM is the lesser evil compared to any archive reader being able to stop archival by an independent archivist. As much as I would like this to be the default, it probably affects few users at the moment since very few mailing lists use unique identifiers in List-Unsubscribe (but that number has grown, recently).
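	The following Python sketch uses the standard `email` package. `drop_unique_unsubscribe()` and `strip_raw()` are hypothetical names. Removing `List-Unsubscribe-Post` along with `List-Unsubscribe` is an assumption based on the mention of List-Unsubscribe* headers.

```python
from email import message_from_string
from email.message import Message

# RFC 8058 one-click marker; compared case-insensitively in this sketch
ONE_CLICK = "list-unsubscribe=one-click"


def drop_unique_unsubscribe(msg: Message) -> bool:
    """Strip List-Unsubscribe* headers only when the one-click marker is
    present. Returns True if headers were removed (which invalidates any
    DKIM signature that covered them)."""
    post_values = msg.get_all("List-Unsubscribe-Post") or []
    if not any(str(v).strip().lower() == ONE_CLICK for v in post_values):
        return False
    # `del` removes every occurrence and is a no-op if the header is absent
    del msg["List-Unsubscribe"]
    del msg["List-Unsubscribe-Post"]
    return True


def strip_raw(raw: str) -> str:
    """Convenience wrapper: parse raw text, strip, and re-serialize."""
    msg = message_from_string(raw)
    drop_unique_unsubscribe(msg)
    return msg.as_string()
```

Messages without the one-click marker keep their List-Unsubscribe header untouched, matching the opt-in nature of the knob.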
2023-11-11	t/lei-import: skip strace for restricted systems
	Systems with Yama can restrict ptrace(2) (the underlying syscall used by strace(1)) and make it difficult to test error handling via error injection. Just skip the tests on such systems since it's probably not worth the effort to start using prctl(2) to enable the test on such systems.
2023-11-10	www: add topics_(new|active).(html|atom) endpoints
	This seems like an easy (but WWW-specific) way to get recently created and recently active topics as suggested by Konstantin. Doing this with Xapian would require new columns and reindexing, and I'm not sure if the current lei handling of search results by dumping results to a format readable by common MUAs would work well with this. A new TUI may be required... Suggested-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20231107-skilled-cobra-of-swiftness-a6ff26@meerkat/
2023-11-09	lei_input: always close single `eml' inputs
	This matches the behavior we have for multi-message mbox files since we rely on ->close to detect errors on bad mboxes. This ensures we'll notice errors reading single messages from stdin. We'll also start relying more on strace error injection to test error handling.
2023-11-09	ipc: simplify partial sendmsg fallback
	In the rare case sendmsg(2) isn't able to send the full amount (due to buffers >=2GB on Linux), use print + (autodie)close to send the remainder and retry on EINTR. `substr' should be able to avoid a large malloc via offsets and CoW on modern Perl.
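	The same pattern can be sketched in Python: make one `sendmsg()` call and, on a short write, hand the unsent tail to `sendall()`. Slicing a `memoryview` plays the role the commit gives to `substr` offsets, so the large buffer is not copied. Python already retries EINTR itself (PEP 475), so no explicit retry loop is needed. `send_with_fallback()` is a hypothetical name.

```python
import socket


def send_with_fallback(sock: socket.socket, buf: bytes) -> None:
    """Send all of buf: one sendmsg() attempt, then sendall() for any remainder."""
    view = memoryview(buf)  # slicing a memoryview shares memory instead of copying
    sent = sock.sendmsg([view])
    if sent < len(view):
        # short write (e.g. very large buffers): push the rest in a blocking loop
        sock.sendall(view[sent:])
```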
2023-11-07	lei: fix SIGPIPE on large result sets to pager
	When dealing with large search results, we need to deal with EPIPE not just from the pager, but also EPIPE or ECONNRESET between lei_xsearch and lei2mail processes. Without this fix, lei_xsearch processes could linger and get stuck writing to dead lei2mail processes if a user aborts the pager early during a large result set. To ensure lei_xsearch processes don't linger around after lei2mail workers all die, we must close $l2m->{-wq_s2} before spawning lei_xsearch processes, since $l2m->{-wq_s2} is only used in lei2mail workers. For `git cat-file' processes, we also need to trigger PublicInbox::Git->close to handle unpredictable destructor ordering to avoid using uninitialized IO refs. This combines with the `git_to_mail' change to deal with process cleanup handling from premature shutdowns. To test all this, we can't just rely on a single message being large, but also need to rely on the result set being large enough to saturate the lei_xsearch -> lei2mail socket so we rely on GIANT_INBOX_DIR once again.
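	The core rule is to treat EPIPE and ECONNRESET as a sign that the consumer is gone and production should stop, rather than as fatal errors. Here is a Python sketch of just that rule; the process-teardown ordering described above is not modeled, and `write_results()` is a hypothetical helper.

```python
import os


def write_results(fd: int, results) -> int:
    """Write each result to fd, stopping quietly once the reader is gone.
    Returns how many results were written in full."""
    written = 0
    try:
        for chunk in results:
            view = memoryview(chunk)
            while view:  # os.write may write only part of the chunk
                n = os.write(fd, view)
                view = view[n:]
            written += 1
    except (BrokenPipeError, ConnectionResetError):
        # The pager (or a downstream worker) exited early: stop producing output
        # instead of lingering. CPython ignores SIGPIPE, so EPIPE surfaces here.
        pass
    return written
```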
2023-11-03	t/cindex+extsearch: use write_file, autodie, etc.
	write_file is a new API which makes setting up config files more pleasant, while autodie and scalarref redirects (in tests) have been available for a while, now. So do what we can to reduce the code burden we have.
2023-11-03	move read_all, try_cat, and poll_in to PublicInbox::IO
	The IO package seems like a better home for I/O subs than the Git package. We lose the 60 second read timeout for `git cat-file --batch-*' processes since it's probably not necessary given how reliable the code has proven and things would fall over hard in other ways if the storage device were completely hosed.
2023-11-03	io: introduce write_file helper sub
	This is a pretty convenient way to create files for diff generation in both WWW and lei. The test suite should also be able to take advantage of it.
2023-11-03	replace ProcessIO with untied PublicInbox::IO
	This fixes two major problems with the use of tie for filehandles:
	* no way to do fcntl, stat, etc. calls directly on the tied handle, forcing callers to use the `tied' perlop to access the underlying IO::Handle
	* needing separate classes to handle blocking and non-blocking I/O
	As a result, Git->cleanup_if_unlinked, InputPipe->consume, and Qspawn->_yield_start have fewer bizarre bits and we can call `$io->blocking(0)' directly instead of `(tied *$io)->{fh}->blocking(0)'. Having a PublicInbox::IO class will also allow us to support custom read buffering which allows inspecting the current state.
2023-11-03	treewide: use ->close to call ProcessIO->CLOSE
	This will open the door for us to drop `tie' usage from ProcessIO completely in favor of OO method dispatch. While OO method dispatches (e.g. `$fh->close') are slower than normal subroutine calls, it hardly matters in this case since process teardown is a fairly rare operation and we continue to use `close($fh)' for Maildir writes.
2023-11-03	ds: don't try ->close after ->accept_SSL failure
	Eric Wong <e@80x24.org> wrote:
	> --- a/lib/PublicInbox/DS.pm
	> +++ b/lib/PublicInbox/DS.pm
	> @@ -341,8 +341,8 @@ sub greet {
	>  	my $ev = EPOLLIN;
	>  	my $wbuf;
	>  	if ($sock->can('accept_SSL') && !$sock->accept_SSL) {
	> -		return CORE::close($sock) if $! != EAGAIN;
	> -		$ev = PublicInbox::TLS::epollbit() or return CORE::close($sock);
	> +		return $sock->close if $! != EAGAIN;
	> +		$ev = PublicInbox::TLS::epollbit() or return $sock->close;
	>  		$wbuf = [ \&accept_tls_step, $self->can('do_greet')];
	>  	}
	>  	new($self, $sock, $ev | EPOLLONESHOT);
	Noticed this on deploy:
	-----8<-----
	Subject: [PATCH] ds: don't try ->close after ->accept_SSL failure
	->accept_SSL failures leave the socket ref as a GLOB (not IO::Handle) and unable to respond to the ->close method. Calling close in any form isn't actually necessary at all, so just let refcounting destroy the socket.
2023-11-01	ds: move maxevents further down the stack
	The epoll implementation is the only one which respects the limit (kevent would, but IO::KQueue does not). In any case, I'm not a fan of the maxevents=1000 historical default since it leads to fairness problems with shared non-blocking listeners across multiple daemon workers.
2023-10-31	poll+select: check EBADF + POLLNVAL errors
	I hit this via select(2) while running -cindex with some other experimental patches. I can't reproduce the problem, but this ensures we have a chance to diagnose it if it happens again instead of looping on select(2) => EBADF.
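	Here is a Python sketch of the same defensive check using `select.poll()`. A closed descriptor comes back flagged `POLLNVAL`, and turning that into an exception stops an event loop from spinning on it. `readable_fds()` is hypothetical.

```python
import errno
import select


def readable_fds(fds, timeout_ms=0):
    """Return the fds that poll(2) reports any event for (readable, hangup,
    or error). Raises OSError(EBADF) on POLLNVAL instead of polling again."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    ready = []
    for fd, events in poller.poll(timeout_ms):
        if events & select.POLLNVAL:
            raise OSError(errno.EBADF, f"POLLNVAL: fd {fd} is not open")
        ready.append(fd)
    return ready
```

With `select.select()`, Python already raises `OSError` with `EBADF` for a closed descriptor, which corresponds to the select(2) half of the commit.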
2023-10-28	treewide: use run_qx where appropriate
	This saves us some code, and is a small step towards getting ProcessIO working with stat, fcntl and other perlops that don't work with tied handles.
2023-10-25	cindex: quiet --prune when checking objectFormat
	Most coderepos don't have extensions.objectFormat set, so it's senseless to emit warnings on failures. Fixes: 709fcf00c4d5 (cindex: use run_await to read extensions.objectFormat)
2023-10-25	drop psgi_return, httpd/async and GetlineBody
	Now that psgi_yield is used everywhere, the more complex psgi_return and its helper bits can be removed. We'll also fix some outdated comments now that everything on psgi_return has switched to psgi_yield. GetlineResponse replaces GetlineBody and does a better job of isolating generic PSGI-only code.
2023-10-25	xt/check-run: call DS->Reset after all tests
	This ensures reused processes get a clean start and avoids surprises as we develop more code around the DS event loop.
2023-10-25	spawn: support synchronous run_qx
	This is similar to `backtick` but supports all our existing spawn functionality (chdir, env, rlimit, redirects, etc.). It also supports SCALAR ref redirects like run_script in our test suite for std{in,out,err}. We can probably use :utf8 by default for these redirects, even.
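	Python's `subprocess.run()` covers similar ground. Below is a sketch of a synchronous capture helper loosely modeled on the description. The name mirrors the commit, but the signature is an assumption. The rlimit option is omitted; it would need `preexec_fn` with the `resource` module.

```python
import subprocess


def run_qx(cmd, cwd=None, env=None, stdin=None, stderr=None):
    """Hypothetical Python analogue of a backtick-style capture: run cmd
    (a list, no shell) to completion and return (stdout_bytes, exit_status).
    cwd/env mirror the chdir/env options; stdin may be bytes to feed in."""
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        env=env,
        input=stdin,
        stdout=subprocess.PIPE,
        stderr=stderr,  # None inherits the caller's stderr
        check=False,
    )
    return proc.stdout, proc.returncode
```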
2023-10-25	limiter: split out from qspawn
	It's slightly better organized this way, especially since `publicinboxLimiter' has its own user-facing config section and knobs. I may use it in LeiMirror and CodeSearchIdx for process management.
2023-10-24	cindex: basic inboxes are non-fatal for --associate
	We need to gracefully continue when a user tries to associate with --all but has basic (or completely unindexed) inboxes.
2023-10-24	t/cindex: use autodie
	More tests to come, so cut down on the noise in the test code.
2023-10-23	t/init.t: don't modify $HOME/.public-inbox/config in test
	Oops :x
2023-10-18	init: drop extraneous `+'
	It's actually valid Perl syntax, but still confusing to look at. Fixes: add90b9504f4 ("support -C (chdir) for most non-daemon commands")
2023-10-18	t/lei-up: additional diagnostics for match failures
	I'm not sure why, but this test just failed from `make check-run' on my Debian bullseye workstation.
2023-10-18	syscall: common $F_SETPIPE_SZ definition
	We use this in various places to minimize or maximize pipe size on Linux. So keep it all in one place.
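	For reference, here is a Python sketch of keeping that definition in one place. The fallback numbers are the Linux `F_LINUX_SPECIFIC_BASE` offsets, and `set_pipe_size()` is a hypothetical helper.

```python
import fcntl

# Linux-only constants: F_LINUX_SPECIFIC_BASE (1024) + 7 and + 8.
# Python 3.10+ exposes them on fcntl; fall back to the raw values otherwise.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def set_pipe_size(fd: int, size: int) -> int:
    """Ask the kernel to resize a pipe; returns the size actually granted
    (Linux rounds up to a power-of-two number of pages)."""
    return fcntl.fcntl(fd, F_SETPIPE_SZ, size)
```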
2023-10-18	ds: get rid of SetLoopTimeout
	It's not worth the code and memory to have a setter method we never use outside of tests.
2023-10-17	lei: consolidate stdin slurp, fix warnings
	We can share more code amongst stdin slurper (not streaming) commands. This also fixes uninitialized variable warnings when feeding an empty stdin to these commands.
2023-10-15	learn: respect indexlevel for v1 inboxes
	v2 never suffered from this bug, apparently, but -learn didn't seem able to handle indexlevel=basic (nor respect `medium') for v1 inboxes. I only noticed this bug because I converted some ancient v1 inboxes to `basic' to save space.
2023-10-11	lei import|tag|rm: support --commit-delay=SECONDS
	Delayed commits allow users to trade off immediate safety for throughput and reduced storage wear when running multiple discrete commands. This feature is currently useful for providing a way to make t/lei-store-fail.t reliable and for ensuring `lei blob' can retrieve messages which have not yet been committed. In the future, it'll also be useful for the FUSE layer to batch git activity.
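	Here is a Python sketch of the batching idea. The first change arms a timer, and later changes within the delay ride along with it. An explicit flush, which a reader like `lei blob` would need, commits immediately. `DelayedCommitter` and its `commit` callback are hypothetical; this is not lei's implementation.

```python
import threading


class DelayedCommitter:
    """Coalesce many changes into one commit issued `delay` seconds after
    the first uncommitted change."""

    def __init__(self, commit, delay: float):
        self._commit = commit  # callable that performs the real commit
        self._delay = delay
        self._lock = threading.Lock()
        self._timer = None  # pending threading.Timer, if any

    def changed(self) -> None:
        """Record a change; arm the timer unless one is already pending."""
        with self._lock:
            if self._timer is None:
                self._timer = threading.Timer(self._delay, self._fire)
                self._timer.daemon = True
                self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            # flush() may have claimed this timer already; only commit once
            if self._timer is not threading.current_thread():
                return
            self._timer = None
        self._commit()

    def flush(self) -> None:
        """Commit now if anything is pending (e.g. before reading data back)."""
        with self._lock:
            timer, self._timer = self._timer, None
        if timer is not None:
            timer.cancel()
            self._commit()
```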
2023-10-10	t/imap_searchqp.t: retry bad query test on failure
	I really don't understand why this fails, sometimes; but it does.
2023-10-10	t/nntp.t: attempt to track source of undefined vars
	Occasionally, t/nntp.t spews undefined variable warnings under `make check-run'. While the test doesn't fail, it's annoying to see them and it could be a source of deeper problems.
2023-10-08	process_io: fix binmode and use it in lei_xsearch
	The `binmode' perlop can only take two scalars, so passing `@_' blindly won't work since prototypes are checked. This means we can get IO::Uncompress::Gunzip working properly with ProcessIO and use it for curl. We'll also just autodie (instead of warn) on FS errors when dealing with curl stderr; since the process will likely be in bigger trouble soon, anyways.
2023-10-08	process_io: pass args to awaitpid as list
	Specifying {cb_args} in the options hash felt awkward to me. Instead, just use the Perl stack like we do with awaitpid() and pass the list down directly.
2023-10-08	rename ProcessPipe to ProcessIO
	Since we deal with pipes (of either direction) and bidirectional stream sockets for this class, it's better to remove the `Pipe' from the name and replace it with `IO' to communicate that it works for any form of IO::Handle-like object tied to a process.
2023-10-08	ipc: require fork+SOCK_SEQPACKET for wq_* functions
	None of the lei internals work properly without forking and sockets. The fallback code increases the potential to accidentally call subs in the wrong process during the teardown phase. We'll still support ipc_do w/o forking for now since forking doesn't benefit small indexing runs from -mda and such.
2023-10-08	lei: always use async `done' requests to store
	It's safer against deadlocks and we still get proper error reporting by passing stderr across in addition to the lei socket.
2023-10-06	finalize DragonFlyBSD support
	require_bsd and require_mods(':fcntl_lock') are now supported in TestCommon to make it easier to maintain than a big list of regexps. getsockopt for SO_ACCEPTFILTER seems to always succeed, even if the retrieved struct is all zeroes.
2023-10-06	t/dir_idle: dump event list on failure
	Hopefully this makes it easier to diagnose portability problems on new OSes we use.
2023-10-06	ipc: lower-level send_cmd/recv_cmd handle EINTR directly
	This ensures script/lei $send_cmd usage is EINTR-safe (since I prefer to avoid loading PublicInbox::IPC for startup time). Overall, it saves us some code, too.
2023-10-04	treewide: use PublicInbox::Lock->new
	This gets rid of a few bare bless statements and helps ensure we properly load Lock.pm before using it.
2023-10-04	t/lei_to_mail: modernize and document test