public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2019-11-08	t/httpd-corner.t: get rid of IPC::Run for running curl
	We already load PublicInbox::Spawn, so there's no need to add another dependency to make life difficult for potential contributors.
2019-11-08	t/httpd-corner.t: drop unnecessary bytes:: for length()
	We don't need to force byte semantics for a buffer we clearly create (via ->read) with byte semantics. Since we didn't "use bytes" in t/httpd-corner.t, it was inadvertently made available by IPC::Run (which goes away, next).
2019-11-08	t/*.t: remove IPC::Run dependency for git commands
	One small step towards making tests easier-to-run. We can rely on "local $ENV{GIT_DIR}" for potentially shell-unsafe path names, and the rest of our path names are relative and don't contain characters which require escaping.
2019-11-04	t/edit: use PublicInbox::Git::qx for pathname safety
	Another case where spaces can be in TMPDIR and cause shell expansion with `command` to fail.
2019-11-04	tests: rely on PublicInbox::Git for pathname safety
	It's possible (but unlikely) a user will put spaces in TMPDIR and cause File::Temp::tempdir() to return a temporary directory with spaces in the filename, making it unsafe for shell expansion. PublicInbox::Git didn't exist when t/mda.t was written, and I just forgot about PublicInbox::Git->qx for t/plack.t :x
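The hazard described in this entry is not specific to Perl. A minimal Python sketch, assuming a temporary directory whose name contains a space: passing the command as an argument list bypasses the shell, so the path reaches the child process intact. `path_seen_by_child` is a hypothetical helper for illustration, not part of public-inbox.

```python
import subprocess
import sys

def path_seen_by_child(path):
    """Return `path` exactly as a child process receives it in argv.

    The command is an argument list, so no shell is involved and the
    path is never split on whitespace or subjected to expansion.
    """
    code = "import sys; print(sys.argv[1])"
    done = subprocess.run([sys.executable, "-c", code, path],
                          check=True, capture_output=True, text=True)
    return done.stdout.rstrip("\n")
```

Building one command string and handing it to a shell would instead split the path at the space.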
2019-11-04	t/httpd-corner.t: check for curl(1) errors in big async test
	curl(1) can fail and we need to invalidate the test in the rare case it fails.
2019-11-04	index: "git log" failures are fatal
	While I've never seen "git log" fail on its own, it could happen one day and we should be prepared to abort indexing when it happens. Beef up tests for t/spawn.t to ensure close() behaves on popen_rd the way we expect it to.
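As a rough illustration of treating a failed child as fatal, here is a Python sketch that reads a command's output through a pipe and raises if the exit status is non-zero. It mirrors the idea of checking close() on a read pipe; `read_all_or_die` is a hypothetical name and this is not the project's Perl code.

```python
import subprocess

def read_all_or_die(cmd):
    """Read all of a command's stdout; raise if the command fails."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    data = proc.stdout.read()
    proc.stdout.close()
    status = proc.wait()  # reap the child and collect its exit status
    if status != 0:
        raise RuntimeError(f"{cmd[0]} failed with exit status {status}")
    return data
```

A caller such as an indexer would let the exception propagate so that partial output is never treated as complete.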
2019-10-31	hval: replace "&apos;" with "&#39;" for compatibility
	While testing 216light.css changes, I managed to hit some cases where dillo failed to render "&apos;" correctly, but I also can't reproduce it reliably. Anyways, it's definitely a problem with some old browsers and newer versions of highlight already work around it, but Debian 10.x has 3.41, so use "&#39;" to maximize compatibility.
2019-10-31	msgiter: do not assume UTF-8 if Email::MIME->body_str succeeds
	ISO-2022-JP and other non-UTF-8 messages need to be displayed correctly. Fixes: 7d82a8bc04ce ('handle "multipart/mixed" messages which are not multipart')
2019-10-30	Merge branch 'learn'
	* learn:
	  doc: add public-inbox-learn(1) manpage
	  mda: support multiple List-ID matches
	  mda: prepare for multiple destinations
	  inboxwritable: add assert_usable_dir sub
	  mda: skip MIME parsing if spam
	  mda: hoist out mda_filter_adjust
	  filter/base: remove MAX_MID_SIZE constant
	  mda: hoist out List-ID handling and reuse in -learn
	  learn: hoist out remove_or_add subroutine
	  learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0"
	  learn: update usage statement
	  learn: only map recipient list on "ham" or "rm"
	  learn: support multiple To/Cc headers
2019-10-30	mda: support multiple List-ID matches
	While it's not RFC2919-conformant, mail software can theoretically set multiple List-ID headers. Deliver to all inboxes which match a given List-ID since that's likely what was intended. Cc: Eric W. Biederman <ebiederm@xmission.com> Link: https://public-inbox.org/meta/87pniltscf.fsf@x220.int.ebiederm.org/
2019-10-30	inboxwritable: add assert_usable_dir sub
	And use it for mda, since "0" could be a usable directory if somebody insists on using relative paths...
2019-10-28	index: allow search/lookups on X-Alt-Message-ID
	Since we replace extra Message-ID headers with X-Alt-Message-ID to placate NNTP clients, we should allow searching and indexing on X-Alt-Message-ID just like we do with Message-ID.
2019-10-28	view: move '<' and '>' outside <a>
	Browsers may underline '<' and '>' in links, which may be confused with '≤' and '≥'. So have the Message-ID header display follow what we do with In-Reply-To headers and move the "<" and ">" outside of <a> in the HTML.
2019-10-28	search: support multiple From/To/Cc/Subject headers
	We can easily support searching on messages with multiple From/To/Cc/Subject headers just like we do with multiple Message-ID headers. This matches the normal mutt pager display behavior.
2019-10-21	v2writable: reindex handles 3-headered monsters
	And maybe 8-headered ones, too... I noticed --reindex failing on the linux-renesas-soc mirror due to one 3-headed monster of a message having 3 sets of headers; while another normal message had a Message-ID that matched one of the 3 IDs of the 3-headed monster. We still try to do the majority of indexing backwards, but we defer indexing multi-Message-ID'd messages until the end to ensure we get all the "good" messages in before we process the multi-headered ones. Link: https://public-inbox.org/meta/20191016211415.GA6084@dcvr/
2019-10-17	Merge remote-tracking branch 'origin/inboxdir'
	* origin/inboxdir:
	  config: remove redundant inboxdir check
	  config: support "inboxdir" in addition to "mainrepo"
	  examples/grok-pull.post_update_hook: use "inbox_dir"
2019-10-16	config: support "inboxdir" in addition to "mainrepo"
	"mainrepo" was a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-10-16	mda: support --no-precheck option
	Since -mda now supports List-ID to better support mirroring of existing mailing lists, it probably makes sense to support disabling the precheck function to provide more accurate (though potentially spammier) mirrors of lists.
2019-10-15	mda, watch: wire up List-ID header support
	This also adds watchheader tests for -watch, which we never had before :x
2019-10-15	config: we always have {-section_order}
	Rewrite a bunch of tests to use ordered input (emulating "git config -l" output) so we can always walk sections in the order they were given in the config file.
2019-10-10	t/git-http-backend: disable worker processes
	We want to ensure we run lsof(8) on the worker (if needed), and not the master, which doesn't serve requests. This was originally on top of a test-only patch in https://public-inbox.org/meta/20190913015043.17149-1-e@80x24.org/ In any case, no point in spawning extra processes for this test.
2019-10-05	init: implement locking
	First, we use flock(2) to wait on parallel public-inbox-init(1) invocations while we make multiple changes using git-config(1). This flock allows -init processes to wait on each other if using reasonable POSIX filesystems. Then, we also need a git-config(1)-compatible lock to prevent user-invoked git-config(1) processes from clobbering our changes while we're holding the flock.
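The two layers of locking described here can be sketched in Python. This is an illustration under assumptions, not the Perl code in public-inbox-init: the helper names are hypothetical, and the second function only shows git's general `<file>.lock` convention of exclusive creation.

```python
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def exclusive_flock(path):
    """Hold an exclusive flock(2) on `path` while the with-block runs."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits for other holders to release
        yield fd
    finally:
        os.close(fd)  # closing this descriptor releases the lock

def take_git_style_lock(config_path):
    """Create `<config_path>.lock` exclusively; fail if it already exists."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    return os.open(config_path + ".lock", flags, 0o666)
```

The flock lets cooperating processes queue up behind each other, while the exclusively created lock file makes a concurrent git-config(1) writer fail instead of clobbering changes.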
2019-10-05	init: favor --skip-epoch instead of --skip
	Since I intend to add support for --skip-artnum, disambiguating the long option name makes sense. We'll support --skip indefinitely for compatibility.
2019-10-05	t/search: bail out on `git init --shared' failures
	We can save future testers some time if we bail out early on "git init --shared" failures, since things like seccomp or non-POSIX FSes would trigger failures. BAIL_OUT has been in Test::Simple since Perl v5.10.0, so it's old-enough to call for our purposes. Thanks-to: Alyssa Ross <hi@alyssa.is> Reviewed-by: Alyssa Ross <hi@alyssa.is> Tested-by: Alyssa Ross <hi@alyssa.is> Link: https://public-inbox.org/meta/878sq2hd08.fsf@alyssa.is/
2019-10-03	t/search: show file modes as octal on failures
	This ought to make permissions errors on odd systems easier to diagnose in the future.
2019-10-02	tests: recommend running create-certs.pl with $^X
	This is better than recommending running the script directly because it will ensure the correct version of perl is used.
2019-10-01	www: fix absolute URLs when mounted under a subdir
	While we avoid generating absolute URLs in most cases, our "git clone" instructions and URL headers in mboxrd files contain full URLs. So do the same thing we do for WwwAtomStream and pre-generate the full URL before Plack::App::URLMap changes $env->{PATH_INFO} and $env->{SCRIPT_NAME} back to their original values. Reported-by: edef <edef@edef.eu> Link: https://public-inbox.org/meta/cover.0f97c47bb88db8b875be7497289d8fedd3b11991.1569296942.git-series.edef@edef.eu/
2019-09-30	config: use NUL-delimited git-config(1) output
	This allows us to deal with newlines in config values, since git-config(1) acquired "-z" support in git v1.5.3. I'm not sure if it's actually useful in our case, but maybe some multi-line texts could be added. And newlines in path names are super useful!
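To show why NUL delimiters help, here is a small Python sketch that parses `git config -z -l` style output, where each record is the key, a newline, the value, and a terminating NUL. The function name and the sample keys in the usage note are hypothetical.

```python
def parse_config_z(raw):
    """Parse NUL-delimited `git config -z -l` output into (key, value) pairs.

    Only the first newline in a record separates key from value, so
    values may themselves contain newlines. A record without a newline
    is a key with no value, reported here as None.
    """
    entries = []
    for record in raw.split(b"\0"):
        if not record:
            continue  # skip the empty piece after the final NUL
        key, sep, value = record.partition(b"\n")
        entries.append((key.decode(), value.decode() if sep else None))
    return entries
```

For instance, `parse_config_z(b"example.text\nline one\nline two\0")` keeps the two-line value in one piece, which line-oriented parsing cannot do.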
2019-09-27	wwwtext: support $INBOX_URL/_/text/config/raw
	This returns a git-config(1)-compatible file to make it easier to get started on mirroring an existing public-inbox. Omitting the "raw" from the URL works, as well, but I'm not sure if it's very useful.
2019-09-18	config: boolean handling matches git-config(1)
	We need to handle arbitrary integers and case-insensitive variations of human words to match git-config(1) behavior, since that's what users would expect given we use config files parseable by git-config(1).
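A hedged Python approximation of git-config(1) boolean rules, for illustration only: the function name is hypothetical, and git's unit suffixes for integers (such as "k") are deliberately left out.

```python
import re

_INT = re.compile(r"[-+]?[0-9]+")

def git_bool(value, default=None):
    """Interpret a config value roughly the way git-config(1) does.

    None stands for a key given without any value, which git treats
    as true. Unrecognized strings yield `default`.
    """
    if value is None:
        return True
    v = value.lower()
    if v in ("true", "yes", "on"):
        return True
    if v in ("false", "no", "off", ""):
        return False
    if _INT.fullmatch(v):
        return int(v) != 0  # any non-zero integer counts as true
    return default
```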
2019-09-17	t/httpd-corner.t: don't fail lsof test if stdin is a pipe (try #2)
	Actually do the redirect properly
2019-09-17	t/httpd-corner.t: don't fail lsof test if stdin is a pipe
	We don't want the stdin from the test runner to accidentally cause this test to fail.
2019-09-17	qspawn: remove return value from ->finish
	We don't use the return value in real code since we do waitpid asynchronously, now. So simplify our runtime code at the cost of making our test slightly more complex.
2019-09-15	t/httpd-corner: use which() sub for detecting curl(1)
	We already import `which' for lsof(8), so we might as well use it to detect curl(1), too.
2019-09-14	t/httpd-corner: check for leaking FDs and pipes
	-W0 (no workers) should not create any pipes on its own, and we shouldn't have any deleted FDs if no clients are connected. This can find leaks which may be triggered by PublicInbox::HTTP (and not Qspawn or GitHTTPBackend).
2019-09-09	run update-copyrights from gnulib for 2019

2019-09-09	tests: add tcp_connect() helper
	IO::Socket::INET->new is rather verbose with the options hash, so extract it into a standalone sub.
2019-07-13	nntp: support optional [range] arg in LISTGROUP
	RFC3977 6.1.2.2 LISTGROUP allows a [range] arg after [group], and supporting it allows NNTP support in neomutt to work again. Tested with NeoMutt 20170113 (1.7.2) on Debian stretch (oldstable).
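For reference, an RFC 3977 range takes one of three forms: "N", "N-", or "N-M". A minimal Python sketch of parsing one and clamping it to a group's article numbers follows. The function name and the `low`/`high` parameters are assumptions for illustration, not the server's code.

```python
import re

_RANGE = re.compile(r"([0-9]+)(?:(-)([0-9]+)?)?")

def parse_range(arg, low, high):
    """Parse "N", "N-" or "N-M" and clamp the result to [low, high].

    Returns a (first, last) tuple, or None if `arg` is malformed.
    A result with first > last denotes an empty range.
    """
    m = _RANGE.fullmatch(arg)
    if m is None:
        return None
    first = int(m.group(1))
    if m.group(2) is None:
        last = first            # "N": a single article
    elif m.group(3) is None:
        last = high             # "N-": open-ended, up to the newest
    else:
        last = int(m.group(3))  # "N-M": explicit bounds
    return max(first, low), min(last, high)
```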
2019-07-13	nntp: fix LIST OVERVIEW.FMT ordering and format
	RFC3977 8.4.2 mandates the order of non-standard headers to be after the first seven standard headers/metadata; so "Xref:" must appear after "Lines:"|":lines". Additionally, non-required header names must be followed by ":full". Cc: Jonathan Corbet <corbet@lwn.net> Reported-by: Urs Janßen <E1hmKBw-0008Bq-8t@akw>
2019-07-06	nntp: reduce memory overhead of zlib
	Using Z_FULL_FLUSH at the right places in our event loop, it appears we can share a single zlib deflate context across ALL clients in a process. The zlib deflate context is the biggest factor in per-client memory use, so being able to share that across many clients results in large memory savings. With 10K idle-but-did-something NNTP clients connected to a single process on a 64-bit system, TLS+DEFLATE used around 1.8 GB of RSS before this change. It now uses around 300 MB. TLS via IO::Socket::SSL alone uses <200MB in the same situation, so the actual memory reduction is over 10x. This makes compression less efficient and bandwidth increases around 45% in informal testing, but it's far better than no compression at all. It's likely around the same level of compression gzip gives on the HTTP side. Security implications with TLS? I don't know, but I don't really care, either... public-inbox-nntpd doesn't support authentication and it's up to the client to enable compression. It's not too different than Varnish caching gzipped responses on the HTTP side and having responses go to multiple HTTPS clients.
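The sharing trick can be demonstrated with Python's zlib module. This sketch assumes raw DEFLATE streams (negative window bits) and a hypothetical `deflate_for_client` helper; it illustrates the technique and is not the nntpd implementation. Because Z_FULL_FLUSH resets the compressor's history and ends output on a byte boundary, each chunk can be decoded by a client that never saw the chunks sent to others.

```python
import zlib

# One compressor shared by every client in the process (raw DEFLATE).
shared = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)

def deflate_for_client(data):
    """Compress one client's reply using the shared context.

    The full flush after each reply means later output never refers
    back to bytes that were sent to a different client.
    """
    return shared.compress(data) + shared.flush(zlib.Z_FULL_FLUSH)
```

Each client still needs its own inflate state, but the large deflate state exists once per process. The cost is that no history carries over between replies, which fits the lower compression ratio noted in this entry.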
2019-07-06	nntp: support COMPRESS DEFLATE per RFC 8054
	This is only tested so far with my patches to Net::NNTP at: https://rt.cpan.org/Ticket/Display.html?id=129967 Memory use in C10K situations is disappointing, but that's the nature of compression. gzip compression over HTTPS does have the advantage of not keeping zlib streams open when clients are idle, at the cost of worse compression.
2019-07-05	t/nntpd*.t: require IO::Socket::SSL 2.007 for Net::NNTP tests
	Net::NNTP won't attempt to use older versions of IO::Socket::SSL because 2.007 is the "first version with default CA on most platforms" according to comments in Net::NNTP. But then again we don't make remote requests when testing...
2019-07-04	qspawn: retry sysread when parsing headers, too
	We need to ensure the BIN_DETECT (8000 byte) check in ViewVCS can be handled properly when sending giant files. Otherwise, EPOLLET won't notify us, again, and responses can get stuck. While we're at it, bump the read size up to 4096 bytes so we make fewer trips to the kernel.
2019-06-30	Merge remote-tracking branch 'origin/nntp'
	* origin/nntp:
	  nntp: add support for CAPABILITIES command
	  nntp: remove DISABLED hash checks
2019-06-30	nntp: add support for CAPABILITIES command
	Some clients may rely on this for STARTTLS support.
2019-06-30	t/httpd-unix.t: avoid race in between bind() and listen()
	We need to be able to successfully connect() to the socket before attempting further tests. Merely testing for the existence of a socket isn't enough, since the server may've only done bind(), not listen().
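A Python sketch of the approach, assuming a Unix stream socket and a hypothetical helper name: keep retrying connect() until it succeeds, since the socket file appears at bind() time while connections are refused until listen() has been called.

```python
import socket
import time

def wait_for_unix_listener(path, timeout=5.0, interval=0.01):
    """Retry connect() until the server at `path` accepts connections.

    A missing path means bind() has not happened yet; a refused
    connection means bind() happened but listen() has not.
    """
    deadline = time.monotonic() + timeout
    while True:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return s
        except (FileNotFoundError, ConnectionRefusedError):
            s.close()
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```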
2019-06-30	daemon: warn on inheriting blocking listeners
	For users relying on socket activation via service manager (e.g. systemd) and running multiple service instances (@1, @2), we need to ensure configuration of the socket is NonBlocking. Otherwise, service managers such as systemd may clear the O_NONBLOCK flag for a small window where accept/accept4 blocks:
	
	public-inbox-nntpd@1      |systemd         |public-inbox-nntpd@2
	--------------------------+----------------+--------------------
	F_SETFL,O_NONBLOCK|O_RDWR |                | (not running, yet)
	                          |F_SETFL, O_RDWR |
	                          |fork+exec @2... |
	accept(...) # blocks!     |                |(started by systemd)
	                          |                |F_SETFL,O_NONBLOCK|O_RDWR
	                          |                |accept(...) non-blocking
	
	It's a very small window where O_NONBLOCK can be cleared, but it exists, and I finally hit it after many years.
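Detecting the condition is a matter of reading the file status flags on the inherited descriptor. A minimal Python sketch with hypothetical helper names and warning text follows. It shows only the check, not the daemon's code.

```python
import fcntl
import os

def is_blocking(sock):
    """Return True if O_NONBLOCK is not set on the socket's descriptor."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
    return not (flags & os.O_NONBLOCK)

def listener_warning(sock):
    """Return a warning string for a blocking listener, else None."""
    if is_blocking(sock):
        return "inherited a blocking listener; configure it as non-blocking"
    return None
```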
2019-06-30	tests: common tcp_server and unix_server helpers
	IO::Socket::*->new options are verbose and we can save a bunch of code by putting this into t/common.perl, since the related spawn_listener stuff is already there.
2019-06-30	t/perf-nntpd.t: fix off-by-one if NEWNEWS_DATE is unset
	20190431 isn't real, NNTP.pm failed to parse it when our test client sent it.