public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-04-25	watchmaildir: match List-ID case-insensitively
	RFC 2919 section 6 states the following: There is only one operation defined for list identifiers, that of case insensitive equality. So no arguing with that. Now, the other headers are open to interpretation, so put a note about them.
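The case-insensitive equality described above can be sketched in a few lines. This is a minimal illustration in Python rather than the project's own Perl; the function name `list_id_matches` and the handling of the optional phrase before the `<...>` identifier are assumptions made for the example, not public-inbox code.

```python
from email import message_from_string


def list_id_matches(raw_message: str, wanted: str) -> bool:
    """Return True if any List-Id header equals `wanted`, ignoring case."""
    msg = message_from_string(raw_message)
    wanted = wanted.strip().lower()
    # get_all() looks the header name up case-insensitively and returns
    # every occurrence of it.
    for value in msg.get_all("List-Id", []):
        # A List-Id value may carry a phrase before the <id>; keep only the id.
        start, end = value.rfind("<"), value.rfind(">")
        list_id = value[start + 1:end] if 0 <= start < end else value
        if list_id.strip().lower() == wanted:
            return True
    return False
```

Lowercasing both sides before comparing is the simplest way to get the single operation RFC 2919 defines for list identifiers.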
2020-04-25	watchmaildir: scan all matching headers
	Some headers may appear more than once in a message, so it's probably best to ensure we attempt matches on all of them. This ought to allow matching on Received: or similar because a list lacks List-IDs :P
2020-04-22	t/mda.t: avoid needless use of Email::Simple
	Totally pointless to create an object only to convert it back to a raw string for -mda input.
2020-04-22	t/*.t: reduce dependency on Email::MIME APIs
	Instead, favor PublicInbox::MIME->new for non-attachment emails. We may support alternatives to Email::MIME down the line. We'll still keep Email::MIME->create to deal with attachments, for now, but there's also a fair amount of test duplication we should eliminate, later.
2020-04-22	t/*.t: use Email::MIME->create over PublicInbox::MIME->create
	PublicInbox::MIME only supports ->new, and is only different from Email::MIME for old versions of Email::MIME. In the future, PublicInbox::MIME may not be a subclass of Email::MIME at all.
2020-04-22	t/feed: remove useless $ENV{GIT_DIR} assignment
	I don't think this has been useful since we stopped supporting ssoma in this test.
2020-04-22	make zlib-related modules a hard dependency
	This allows us to simplify some of our existing code and make future changes easier. I doubt anybody goes through the trouble to have a Perl installation without zlib support. The zlib source code is even bundled with Perl since 5.9.3 for systems without existing zlib development headers and libraries. Of course, zlib is also a requirement of git, too; and we're not going to stop using git :) [squashed: "wwwaltid: use gzipfilter up front"]
2020-04-22	view: actually omit subject text when dumping topics
	Despite dump_topics() calling dedupe_subject() on the subject, the index shows partly duplicated subjects, for example:
	  ` [PATCH 2/2] t/www_listing: avoid 'once' warnings
	  ` [PATCH v2] t/www_listing: avoid 'once' warnings "
	In the second line, the omission character " is appended, but the entire subject is shown. To display the subject with duplicated parts omitted, regenerate it from the array that is modified by dedupe_subject().
2020-04-22	view: strip omission character from current message in thread view
	In the thread view shown at the top of a message, the subject for the current message is dropped, leaving just the sender's name. However, if skel_dump() omitted part of the subject because it was duplicated, the omission character is still displayed:
	  * [PATCH v2] t/www_listing: avoid 'once' warnings
	  2020-03-21 1:10 ` [PATCH 2/2] t/www_listing: avoid 'once' warnings Eric Wong
	  @ 2020-03-21 5:24 ` " Eric Wong
	Note the " on the last line. Adjust the regular expression in _th_index_lite() to account for the omission character. [ew: avoid capturing $1, keep under 80 cols]
2020-04-22	Merge branch '1.4.0-tag-merge'
	* 1.4.0-tag-merge: public-inbox 1.4.0
2020-04-22	Merge tag 'v1.4.0' into 1.4.0-tag-merge
	Oops, it looks like I tagged, amended, and pushed 1.4.0 with the wrong tag and Message-ID * tag 'v1.4.0': public-inbox 1.4.0 Link: https://public-inbox.org/meta/87r1wgqkff.fsf@kyleam.com/
2020-04-21	t/nntpd: die if we can't open stderr output
	We need to detect FS errors and bail out on the test if we can't open a file -nntpd was just writing to.
2020-04-21	t/nntpd: reduce dependencies on internal API
	Since the advent of run_script(), we can rely on it to simplify our test code. Changes like this will let us evolve the internal API more easily while preserving stable CLI interfaces, especially since we test the v2 path by default, now.
2020-04-21	t/nntpd: fix lsof check w/ TEST_RUN_MODE=0
	The `xqx' sub requires an absolute path for optional commands. Fixes: 6e07def560b211d9 ("testcommon: spawn-aware system() and qx[] workalikes")
2020-04-21	index: support --max-size / publicinbox.indexMaxSize
	In normal mail paths, we can rely on MTAs being configured with reasonable limits in the -watch and -mda mail injection paths. However, the MTA is bypassed in a git-only delivery path, so a BOFH could inject a large message and DoS users attempting to mirror a public-inbox. This doesn't protect unindexed WWW interfaces from Email::MIME memory explosions on v1 inboxes. Probably nobody cares about unindexed WWW interfaces anymore, especially now that Xapian is optional for indexing.
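The idea behind --max-size / publicinbox.indexMaxSize is to filter on blob size before any parsing happens. A rough sketch in Python follows; the function name `indexable`, the (oid, size) pair shape, and the choice to treat a blob exactly at the limit as acceptable are all assumptions for illustration, not the actual indexer logic.

```python
from typing import Iterable, Iterator, Optional, Tuple


def indexable(
    blobs: Iterable[Tuple[str, int]], max_size: Optional[int]
) -> Iterator[Tuple[str, int]]:
    """Yield (oid, size) pairs small enough to index; skip the rest."""
    for oid, size in blobs:
        # With no limit configured, everything is indexed.
        if max_size is not None and size > max_size:
            continue  # oversized message: never handed to the MIME parser
        yield oid, size
```

Checking the size up front means an oversized message costs one comparison rather than a full parse.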
2020-04-21	qspawn: remove Perl 5.16.x leak workaround
	It seems no longer necessary to work around this Perl 5.16.3 bug after the removal of anonymous subs from all of our internal code in https://public-inbox.org/meta/20191225075104.22184-1-e@80x24.org/ Tested with repeated clones (both aborted and completed) in a CentOS 7.x VM which was once able to reproduce leaks before the workaround appeared in 2fc42236f72ad16a ("qspawn: workaround Perl 5.16.3 leak, re-enable Deflater") Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
2020-04-20	v2writable: drop SQLite-based multi_mid_q_new
	We switched to the SDBM-based queue to store author/committer info last month. Fixes: c7acdfe78bda5bf3 ("v2: SDBM-based multi Message-ID queue")
2020-04-20	drop needless `eval {}' around Config->new
	It hasn't been needed since commit 089cca37fa036411 ("config: ignore missing config files"). And we actually want to propagate errors when we can't start new processes or if git(1) is missing.
2020-04-20	doc: txt2pre: fix URL of git-filter-repo(1)
	Probably a typo when doing concatenation.
2020-04-20	doc: HACKING: add a bit about faster testing
	`make test' is annoyingly slow, and `make check-run' works wonders for improving the edit && test cycle.
2020-04-20	testcommon: spawn-aware system() and qx[] workalikes
	Barely noticeable on Linux, but this gives a 1-2% speedup on a FreeBSD 11.3 VM and lets us use built-in redirects rather than relying on /bin/sh.
2020-04-20	t/ds-leak: use BSD::Resource
	We use BSD::Resource in other places, so there's no sense in avoiding it, here.
2020-04-20	import: init_bare: use pure Perl
	Even on systems with Inline::C spawn(), this cuts a primed "make check-run" time by 2-3% on Linux, and roughly 5-7% on FreeBSD when using vfork-enabled spawn. I doubt anybody cares: this omits the sample hooks and some empty and useless-for-us or obsolete directories created by git-init(1).
2020-04-20	import: init_bare: allow use as method, use in tests
	Allowing ->init_bare to be used as a method saves some keystrokes, and we can save a little bit of time on systems with our vfork(2)-enabled spawn(). This also sets us up for future improvements where we can avoid spawning a process at all.
2020-04-20	watchmaildir: support multiple watchheader values
	The watchheader key supports only a single value. Supporting multiple watchheader values was mentioned in discussion [1] of 8d3e3bd8 (doc: explain publicinbox.<name>.watchheader, 2019-10-09), and it wasn't clear if there was a need. One scenario in which matching multiple headers would be convenient is when someone wants to set up public-inbox archives for some small projects but does _not_ want to run mailing lists for them, instead allowing others to follow the project by any of the pull mechanisms. Using a common underlying address, an address alias for each project is configured via a third-party email provider, with messages for each alias being exposed as a separate public-inbox archive. In this setup, messages for an inbox cannot be selected by a List-ID header but can be identified by the inbox's address in either the To or Cc header. To support such a use case, update the watchheader handling to consider multiple values, accepting a message if it matches any value. While selecting a message based on matching _any_ rather than _all_ values is motivated by the above scenario, it's worth noting that the "any" behavior is consistent with how multiple listid config values are handled. [1] https://public-inbox.org/meta/20191010085118.r3amey4cayazfycb@dcvr/
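The "any value matches" rule can be sketched as follows. This is a simplified Python illustration, not the Perl implementation: the `Header:value` spec format mirrors the watchheader config style, while the function name `matches_any` and the substring comparison are assumptions. List-Id is compared case-insensitively; other headers are matched as-is.

```python
from email import message_from_string
from typing import Iterable


def matches_any(raw_message: str, watchheaders: Iterable[str]) -> bool:
    """Accept a message if any `Header:value` spec matches any header."""
    msg = message_from_string(raw_message)
    for spec in watchheaders:
        name, _, wanted = spec.partition(":")
        name, wanted = name.strip(), wanted.strip()
        # Only List-Id gets case-insensitive comparison in this sketch.
        fold = name.lower() == "list-id"
        # A header may appear more than once, so check every occurrence.
        for value in msg.get_all(name, []):
            if fold:
                if wanted.lower() in value.lower():
                    return True
            elif wanted in value:
                return True
    return False
```

For the alias scenario described above, an inbox would list both `To:` and `Cc:` specs for its address, and a message addressed either way is accepted.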
2020-04-19	t/v*-add-remove-add: fix typo in description of 'removed' check

2020-04-19	doc: start writeup on semi-automatic memory management
	I don't consider Perl's memory management "automatic". Instead, having an extra bit of control as a hacker is nice and there's no need to burden ordinary users with GC tuning knobs.
2020-04-19	reduce scope of mbox From_ line removal
	It's unnecessary overhead for anything which does Email::MIME parsing. It was never done for v2 indexing, even though v1->v2 conversions did NOT remove those From_ lines. There was never a need to remove From_ lines in the v1 SearchIdx paths, either. Hitting a /$INBOX_URL/$MSGID/T/ endpoint with an 18 message thread reveals a ~0.5% speed improvement. This will become more apparent when we have a faster MIME parser.
2020-04-19	mbox: use per-message line-ending for From_ line
	Email::Simple preserves the message line ending in headers, so make the From_ line consistent with the rest of the headers.
2020-04-19	wwwatomstream: move {emit_header} field to $self
	There's no need to pollute the cross-package $ctx with it.
2020-04-19	favor `do {}' over `eval {}' for localized slurp
	I did not know to use the return value of `do' back in the day. There's probably no practical difference in these cases, but `eval' is overkill for these uses and may hide actual errors. We can get rid of a few redundant `scalar' ops and pass scalar refs to Email::MIME->new to avoid copies in a few more places, too.
2020-04-19	inbox: replace `eval {}' with `do {}' where appropriate
	-Git->new and -Limiter->new will never fail unless there's an OOM, so using `eval' is incorrect.
2020-04-19	inbox: don't memoize missing description\|cloneurl
	It's probably common to have inboxes initially setup without these files properly configured, so don't memoize at that stage.
2020-04-19	searchidx: die on cat-file failures
	We always use the object ID from "git <log\|rev-list>" for retrieving blobs, so fail loudly if the git repository is corrupt instead of silently continuing.
2020-04-19	inboxwritable: mime_from_path: reuse in more places
	There's nothing Maildir-specific about the function, so `maildir_path_load' was a bad name. So give it a more appropriate name and use it in our tests. This saves us some code and inconsistency by reusing an existing internal library routine in more places. We can drop the "From_" line in some of our (formerly) mbox sample files.
2020-04-17	searchthread: reduce indirection by removing container
	We can rid ourselves of a layer of indirection by subclassing PublicInbox::Smsg instead of using a container object to hold each $smsg. Furthermore, the `{id}' vs. `{mid}' field name confusion is eliminated. This reduces the size of the $rootset passed to walk_thread by around 15%, that is over 50K memory when rendering a /$INBOX/ landing page.
2020-04-17	doc: update 1.4.0 relnotes with date, start 1.5.0

2020-04-17	public-inbox 1.4.0

2020-04-17	public-inbox 1.4.0 v1.4.0

2020-04-17	t/httpd-unix: skip some tests w/o signalfd\|EVFILT_SIGNAL
	Some of these tests just don't seem reliable enough with the way we or Perl do portable signal handling.
2020-04-16	t/httpd-corner: improve reliability and diagnostics
	The graceful-shutdown-on-PUT test is unreliable because we can't rely on a FIFO as we do with the GET tests. So increase the delay to 100ms since that seems enough on my system even with CONFIG_HZ=100. Add a timeout and backtrace to the $check_self sub to help with further diagnostics while we're at it, too. It would be nice if there were a portable syscall tracing mechanism we could attach to the -httpd process to make the test more deterministic...
2020-04-15	t/httpd-corner.t: relax read-after-failed-write handling
	I've observed FreeBSD 11.2 read(2) having one of three behaviors after a failed write(2) on a socket: 1) returning number of bytes read, 2) failing with ECONNRESET, 3) returning with EOF. 1) is the most common, and I've only seen 1) on Linux. It may be possible to use SO_LINGER or shutdown(2) to ensure 1) always happens, but SO_LINGER behavior seems inconsistent across OSes, especially with non-blocking sockets. Since these tests are corner-cases where we're dealing with broken/malicious clients, let's continue spending the least amount of syscalls protecting ourselves in the daemon and instead make the client-side test code tolerate more socket implementations.
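A client-side check that tolerates all three outcomes might look like the sketch below. It is written in Python for illustration; the function name `read_after_failed_write` and the string labels are hypothetical, and the real test code is Perl.

```python
import socket


def read_after_failed_write(sock: socket.socket, bufsize: int = 4096) -> str:
    """Classify what a read returns once the peer has given up on us."""
    try:
        data = sock.recv(bufsize)
    except ConnectionResetError:
        return "reset"  # outcome 2: read fails with ECONNRESET
    # outcome 1: some bytes were read; outcome 3: empty read means EOF
    return "data" if data else "eof"
```

A test built on this would accept any of the three labels instead of asserting one specific behavior, which keeps the portability burden in the test rather than in the daemon.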
2020-04-15	t/*.t: localize $SIG{__WARN__} changes
	We don't want to propagate %SIG changes to other tests when running multiple tests within the same process via t/run.perl.
2020-04-15	dskqxs: ignore EV_SET errors on EVFILT_WRITE
	Just like the EPOLL_CTL_ADD emulation path, the EPOLL_CTL_MOD and EPOLL_CTL_DEL emulation paths can fail if attempting to install an EVFILT_WRITE for a read-only pipe. I've only observed this on the EPOLL_CTL_DEL emulation path, but I suspect it could happen on the EPOLL_CTL_MOD path as well. Increasing the amount of read-only pipes we rely on with altid exports via sqlite3 made this old bug more apparent and reproducible while looping the test suite. This may be adjusted in the future to deal with write-only pipes, but we currently don't have any of those watched by kqueue.
2020-04-15	testcommon: DESTROY: wait for killed daemon
	Otherwise, the waitpid(-1, 0) call in Xapcmd::process_queue() may reap it in a subsequent test when using t/run.perl to reuse processes for testing. While we're at it, make Xapcmd::process_queue warn about unknown PIDs in case other PIDs leak through to us in the future.
2020-04-15	MANIFEST update

2020-04-13	doc: add technical/whyperl
	Some people don't like Perl; but it exists, there's no avoiding it with everything that depends on it. And nearly all code still works unmodified after 20 years.
2020-04-13	doc: start reproducibility document
	Not new ideas, just gathering thoughts.
2020-04-12	doc: escape internal ">" in listid code snippet
	A code snippet in the listid description is incorrectly rendered as "publicinbox.$NAME.watchheader=List-Id:<foo.example.com"> Escape the closing bracket around the List-Id value to avoid this. Also escape the opening bracket for symmetry/readability.
2020-04-09	t/httpd-unix: improve test reliability
	Net::Server::Daemonize::create_pid_file does not write the PID file atomically, so we need to barf if it's incomplete.
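For contrast, an atomic PID file write and a reader that rejects partial content could be sketched like this in Python. Nothing here reflects Net::Server internals: both function names are hypothetical, and treating a trailing newline as the "complete" marker is an assumption made for the example.

```python
import os
import tempfile


def write_pid_file_atomically(path: str, pid: int) -> None:
    """Write to a temp file in the same directory, then rename into place."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(f"{pid}\n")
        # Readers see either the old file or the complete new one.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def read_pid_file(path: str) -> int:
    """Return the PID, or raise ValueError if the file looks incomplete."""
    with open(path) as fh:
        content = fh.read()
    if not content.endswith("\n") or not content.strip().isdigit():
        raise ValueError(f"incomplete PID file: {content!r}")
    return int(content)
```

When the writer is not atomic, only the reader-side check is available, so a test has to fail (or retry) on a file that does not yet hold a full PID.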