public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-01-27	switch to sysseek + sysread for serving static files
	The "perlio" layer doesn't do read(2) syscalls over 8192 bytes at the moment, and binmode($fh, ':unix') leaks[1]. So use sysseek and sysread for now, since I can't see retaining compatibility with PerlIO::scalar being worth the trouble. [1] http://nntp.perl.org/group/perl.perl5.porters/256918
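A minimal sketch of reading a byte range with sysseek + sysread, bypassing PerlIO buffering. This is not the code from the commit; `read_range` is a hypothetical helper name and the loop exists only because sysread may return fewer bytes than requested.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: return up to $len bytes starting at byte $off
# of $path, using unbuffered sysseek + sysread.
sub read_range {
	my ($path, $off, $len) = @_;
	open(my $fh, '<', $path) or die "open $path: $!";
	defined(sysseek($fh, $off, SEEK_SET)) or die "sysseek: $!";
	my $buf = '';
	while ($len > 0) {
		# read into the end of $buf; short reads are possible
		my $r = sysread($fh, $buf, $len, length($buf));
		die "sysread: $!" unless defined $r;
		last if $r == 0; # EOF before $len bytes were read
		$len -= $r;
	}
	$buf;
}
```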
2020-01-25	s/news.gmane.org/news.gmane.io/
	gmane still has an NNTP server, so update links to point to it. cf. https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
2020-01-25	wwwstatic: wire up buffer bypass for -httpd
	This prevents public-inbox-httpd from buffering ->getline results from a static file into another temporary file when writing to slow clients. Instead we inject the static file ref with offsets and length directly into the {wbuf} queue. It took me a while to decide to go this route, some rejected ideas: 1. Using Plack::Util::set_io_path and having PublicInbox::HTTP serve the result directly. This is compatible with what some other PSGI servers do using sendfile. However, neither Starman nor Twiggy currently use sendfile for partial responses. 2. Parsing the Content-Range response header for offsets and lengths to use with set_io_path for partial responses. These rejected ideas required increasing the complexity of HTTP response writing in PublicInbox::HTTP in the common, non-static file cases. Instead, we made minor changes to the colder write buffering path of PublicInbox::DS and leave the hot paths untouched. We still support generic PSGI servers via ->getline. However, since we don't know the characteristics of other PSGI servers, we no longer do a 64K initial read in an attempt to negotiate a larger TCP window.
2020-01-25	ds: tmpio: store offsets per-buffer
	We want to be able to inject existing file handles + offsets and even lengths into this in the future, without going through the ->getline interface[1]. We also switch to using a 64K buffer size since we can safely discard whatever got truncated on write and full writes can help negotiate a larger TCP window for high-latency, high-bandwidth links. While we're at it, make it obvious that we're using O_APPEND for our tmpfile() interface so we can seek freely for reading while the writer always prints to the end of the file. [1] the getline interface for serving static files may result in us buffering on-FS data into another temporary file, which is a waste.
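The O_APPEND behavior described above can be illustrated with a small sketch. It is not the project's tmpfile() implementation; `append_demo` is a hypothetical name, and the sketch assumes a POSIX system where O_APPEND forces every write to the end of the file regardless of the current read offset.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_APPEND SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical demo: writes always land at EOF, reads may seek freely.
sub append_demo {
	my ($tmp, $path) = tempfile(UNLINK => 1);
	close $tmp or die "close: $!";
	sysopen(my $fh, $path, O_RDWR|O_APPEND) or die "sysopen: $!";
	defined(syswrite($fh, 'abcdef')) or die "syswrite: $!";
	defined(sysseek($fh, 0, SEEK_SET)) or die "sysseek: $!";
	defined(sysread($fh, my $head, 3)) or die "sysread: $!";
	# the file offset is now 3, but O_APPEND still writes at EOF
	defined(syswrite($fh, 'ghi')) or die "syswrite: $!";
	defined(sysseek($fh, 0, SEEK_SET)) or die "sysseek: $!";
	defined(sysread($fh, my $all, 64)) or die "sysread: $!";
	($head, $all);
}
```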
2020-01-25	wwwstatic: offload error handling to PSGI server
	The PSGI server needs to account for ->getline failing due to disk failures or truncated files, anyways. So just die() ourselves and let the PSGI server log and drop the client.
2020-01-25	http: eliminate short-lived cyclic ref for psgix.io
	While there is no known actual leak due to reference cycles, here, eliminating a potential source of leaks is helpful.
2020-01-25	spelling: favor `publicly' over `publically'
	While both can be correct, the former seems more common, is shorter, and is also consistent with the spelling found in the AGPL-3.0 text.
2020-01-25	website: omit technical/ and other subdirs
	We don't need to clutter the website with unnecessary technical information. Anybody who reads the technical/ directory should be looking at our source code, anyways; and we also have cgit and gitweb mirrors.
2020-01-25	doc: INSTALL describe required deps for released versions
	1.3.0 isn't out, yet, and sometimes folks will rely on INSTALL on our website while installing older versions, so try to clarify that.
2020-01-25	website: re-add top-level files
	I noticed the TODO was out-of-date on the website, among some other things. This was broken in moving GNU-isms in the Makefile to Perl.
2020-01-25	doc: avoid needless rebuilds of NEWS
	Repeatedly rebuilding `NEWS' because the mtime of `NEWS' is synched to the latest release .eml is a bit annoying, but necessary to save bandwidth for the website. So we'll also update the mtime of the source .eml file when reading them. It's kinda gross to be setting mtimes of source .eml files in Documentation/RelNotes/, but I can't think of anything better at the moment...
2020-01-25	mbox: handle empty subjects after dropping "Re:" prefix
	We can't pass empty strings to `to_filename' without triggering warnings, and `to_filename' on an empty string makes no sense.
2020-01-24	contentid: ignore duplicate References: headers
	OverIdx::parse_references already skips duplicate References (which we use in SearchThread for rendering). So there's no reason for our content deduplication logic to care if a Message-Id in the Reference header is mentioned twice.
2020-01-24	wwwstream: shorten cloneurl uniquification
	Another place where List::Util::uniq doesn't make sense, but there's a small op reduction to be had anyways.
2020-01-24	mid: shorten uniq_mids logic
	We won't be able to use List::Util::uniq here, but we can still shorten our logic and make it more consistent with the rest of our code which does similar things.
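For readers unfamiliar with deduplicating without List::Util::uniq, a common Perl idiom is sketched below. It is an assumption that the commit uses something similar; `uniq_keep_order` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: drop repeated values, keeping first-seen order.
sub uniq_keep_order {
	my %seen;
	# $seen{$_}++ is false (0) only the first time a value appears
	grep { !$seen{$_}++ } @_;
}
```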
2020-01-24	inbox: simplify filtering for duplicate NNTP URLs
	And add a note to remind ourselves to use List::Util::uniq when it becomes common.
2020-01-24	nntp: simplify setting X-Alt-Message-ID
	We can cut down on the number of operations required using "grep" instead of "foreach".
2020-01-24	contentid: use map to generate %seen for Message-Ids
	This use of map {} is a common idiom as we no longer consider the Message-ID as part of the digest.
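The map {} idiom mentioned above generally looks like the following sketch; `seen_map` and the sample Message-IDs in the usage are hypothetical and not taken from the commit.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash keyed by each given value.
sub seen_map {
	my %seen = map { $_ => 1 } @_;
	\%seen;
}
```

Membership checks then become a plain hash lookup, e.g. `seen_map(@mids)->{$mid}`.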
2020-01-23	hval: from_attr: move to unit test
	We don't call from_attr anywhere outside of tests, so don't bloat normal processes with it.
2020-01-23	hval: to_attr: support wide characters
	We need to escape wide characters when making attribute names from filename-looking things in diffstats.
2020-01-19	doc: some 1.3.0 release notes updates

2020-01-13	sigfd: simplify loop and improve documentation
	We can use the return value of sysread to bound our loop instead of repeatedly shortening the string. Furthermore add some comments which can be easily checked against the signalfd(2) manpage.
2020-01-13	ds: flatten $EXPMAP, delete entries on close
	We can reduce the amount of small arrayrefs in memory by flattening $EXPMAP. This forces us to properly clean up references during deferred close handling, so NNTP (and soon HTTP) connections no longer linger until expiry.
2020-01-13	ds: rely on autovivification for $later_queue
	No reason to have an empty arrayref lying around when not everybody needs it. Re-indent the later-related subs since we're changing a bunch of lines, anyways.
2020-01-13	ds: rely on autovivification for waitpid bits
	No need to create an arrayref until we need it, and fix up a comment while we're in the area. Some aesthetic changes while we're at it: - Rename $WaitPids to $wait_pids to make it clear this is unique to our implementation and not in Danga::Socket. - rewrite dwaitpid() to reduce indentation level
2020-01-13	ds: rely on autovivification for nextq
	Another place we can delay creating arrays until needed.
2020-01-13	ds|http|nntp: simplify {wbuf} population
	We can rely on autovivification to turn `undef' value of {wbuf} into an arrayref. Furthermore, "push" returns the (new) size of the array since at least Perl 5.0 (I didn't look further back), so we can use that return value instead of calling "scalar" again.
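Both points can be seen in a short sketch. The `enqueue` sub is a hypothetical stand-in, not the project's code; only the {wbuf} field name comes from the commit message.

```perl
use strict;
use warnings;

# Hypothetical helper: queue $item and return the new queue length.
sub enqueue {
	my ($self, $item) = @_;
	# {wbuf} may be undef here; dereferencing it for push
	# autovivifies an arrayref, and push returns the new size
	push @{$self->{wbuf}}, $item;
}
```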
2020-01-13	ds: guard ToClose against DESTROY side-effects
	This does not affect our current code, but theoretically a DESTROY callback could call PublicInbox::DS::close to enqueue elements into the ToClose array. So take a similar strategy as we do with other queues (e.g. $nextq) by swapping references to arrays, rather than operating on the array itself. Since close operations are relatively rare, we can rely on auto-vivification via "push" ops to create the array on an as-needed basis. Since we're in the area, clean up the PostLoopCallback invocation to use the ternary operator rather than a confusing (to me) combination of statements. Finally, add a prototype to strengthen compile-time checking, and move it in front of our only caller to make use of the prototype.
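The reference-swapping strategy can be sketched as follows. This is an illustration under assumptions, not PublicInbox::DS itself: `enqueue_close`, `run_close_queue` and the `$ctx` hash are hypothetical, with a callback standing in for whatever work a DESTROY hook might trigger.

```perl
use strict;
use warnings;

# Hypothetical queue: push autovivifies {to_close} on demand.
sub enqueue_close {
	my ($ctx, $item) = @_;
	push @{$ctx->{to_close}}, $item;
}

# Detach the current array before iterating, so callbacks which
# enqueue more items fill a fresh array instead of the one in use.
sub run_close_queue {
	my ($ctx, $cb) = @_;
	my $n = 0;
	while (my $q = delete $ctx->{to_close}) {
		for my $item (@$q) {
			$cb->($item);
			++$n;
		}
	}
	$n;
}
```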
2020-01-13	ds: remove Timer->cancel and Timer class+bless
	It doesn't seem needed at the moment, and we can re-add it in the future if needed.
2020-01-13	ds: add an in_loop() function for Inbox.pm use
	Inbox.pm accessing the $in_loop variable directly raises warnings when Inbox is loaded without DS.
2020-01-13	ds: add_timer: rename from AddTimer, remove a parameter
	The class parameter is pointless, especially for an internal sub which only has one external caller in a test. Add a sub prototype while we're at it to get some compile time checking.
2020-01-13	ds: new: avoid redundant check, make clobbering fatal
	"fileno(undef)" already dies under "use strict", so there's no need to check for it ourselves. As far as "fileno($closed_io)" or "fileno($fake_io)" goes, we'll let epoll_ctl detect the error, instead. Our design should make DescriptorMap entries impossible to clobber, so make it fatal via confess in case it does happen, because inadvertently clobbering a FD would be very bad. While we're at it, remove a redundant return statement and rely on implicit returns.
2020-01-13	use popen_rd for bidirectional pipes
	popen_rd accepts arbitrary redirects, so we can reuse its code to setup the pipe end we want to read, saving each caller a few lines of code compared to calling pipe+spawn.
2020-01-13	t/solver_git: avoid uninitialized warnings in hostname generation
	Outside of tests, this is only relevant for non-PSGI use, which may happen someday... Fixes: cb1c874520153f5c ("inbox: use PublicInbox::Git::host_prefix_url for base_url")
2020-01-13	xt/git_async_cmp: do not slurp large OID list into memory
	I somehow thought "foreach (<$cat>)" could work like "while (<$cat>)" when it came to iterating over file handles...
2020-01-13	xapcmd: use popen_rd for running xapian-compact
	public-inbox-compact wrapper displays progress by default, anyways, and there's not a lot of output, so simplify our code by using popen_rd instead of spawn + optional pipe. While we're at it use "while (<HANDLE>)" to display progress as it happens, since "foreach (<$HANDLE>)" slurps the contents into an array, first.
2020-01-13	cgit: drop cgit_parse_hdr wrapper
	Unlike PublicInbox::GitHTTPBackend::git_parse_hdr, cgit_parse_hdr does nothing interesting besides calling parse_cgi_headers. So just make a reference to PublicInbox::GitHTTPBackend::parse_cgi_headers and call it.
2020-01-13	solver: path_a may be undef from /dev/null
	This avoids uninitialized variable warnings when viewing newly-created files.
2020-01-13	git: packed_bytes: use GLOB_NOSORT
	File::Glob is loaded by perl for the "glob()" op, anyways, so call bsd_glob with GLOB_NOSORT to avoid needless sorting of the output.
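A minimal sketch of calling bsd_glob with GLOB_NOSORT to total file sizes. The `pack_bytes` name and the "*.pack" pattern are assumptions for illustration, not necessarily what the commit does.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_NOSORT);

# Hypothetical helper: sum the sizes of *.pack files under $dir.
# Order does not matter for a sum, so skip the sort.
sub pack_bytes {
	my ($dir) = @_;
	my $n = 0;
	for my $path (bsd_glob("$dir/*.pack", GLOB_NOSORT)) {
		$n += (-s $path) // 0; # file may vanish between glob and stat
	}
	$n;
}
```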
2020-01-13	git: modified: don't slurp `rev-parse --branches'
	While v1 inboxes typically only have one branch, code repositories may have dozens or even hundreds. Slurping those into memory is a waste.
2020-01-13	config: do not slurp entire cgitrc at once
	cgitrc files can have hundreds or thousands of lines in them and slurping them into memory is a waste. "while (<$fh>)" only reads one line at a time, whereas "for (<$fh>)" reads the entire contents of the file into a temporary array.
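The difference can be seen in a short sketch; `count_matching` is a hypothetical helper and not the config parser from the commit.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines matching $re without slurping.
sub count_matching {
	my ($fh, $re) = @_;
	my $n = 0;
	# scalar context: readline returns one line per iteration,
	# whereas "for (<$fh>)" would build a list of every line first
	while (my $line = <$fh>) {
		++$n if $line =~ $re;
	}
	$n;
}
```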
2020-01-12	examples/unsubscribe.milter: support unique mailto:
	Instead of providing a generic "mailto:foo+unsubscribe@example.com" address in List-Unsubscribe which requires confirmation, replace it with a mailto: header with a unique subject which contains the same unique ID we put in the https:// URL. This makes it easier for some MUAs without https:// support to unsubscribe with a single action via the List-Unsubscribe header.
2020-01-12	examples/unsubscribe.milter: skip gmane-mx
	Mail to gmane is being delivered to gmane-mx.org, nowadays, and we don't want ordinary readers to be able to trigger unconfirmed unsubscription off any mailing lists which go through our unsubscribe.milter. https://lars.ingebrigtsen.no/2020/01/06/whatever-happened-to-news-gmane-org/
2020-01-12	www: discard multipart parent on iteration
	We're often iterating through messages while writing to another buffer in our WWW interface, causing memory usage to multiply. Since we know we won't need to keep the MIME object around in some cases, we can tell msg_iter to clobber the on-stack variable while it operates on subparts of multipart messages. With xt/mem-msgview.t switched to multipart from the previous commit, this shows a 13 MB memory reduction on that test.
2020-01-12	xt/mem-msgview.t: change to test one multipart message
	A single multipart message is far more common than a reused Message-ID, so rewrite the test to only have a single multipart message. Memory improvements will be implemented in the next commit.
2020-01-11	make Filesys::Notify::Simple optional
	It's only used by us in public-inbox-watch, and maybe not for long. It's in most installations because Plack pulls it in, but Plack is no longer required.
2020-01-11	make Plack optional for non-WWW and non-httpd users
	Some users just want to run -mda, -watch, and/or -nntpd. Let them run just those without forcing them to pull in a bunch of dependencies.
2020-01-11	doc: technical/ds.txt: describe PublicInbox::DS divergences
	Danga::Socket 1.62 was released a few months back and the maintainer indicated it would be the last release. We've diverged significantly in incompatible ways... While most of this should've already been documented in commit messages, putting it all into one document could make it easier-to-digest. It's also a strange design for anybody used to conventional event loops. Maybe this is an unconventional project :P
2020-01-11	spawn (and thus popen_rd) die on failure
	Most spawn and popen_rd callers die on failure to spawn, anyways, and some are missing checks entirely. This saves us a bunch of verbose error-checking code in callers. This also makes popen_rd more consistent, since it already dies on pipe creation failures.
2020-01-11	git: remove ->commit_title method
	We haven't used it in SolverGit, yet, and I'll be reworking it to work with ->cat_async, instead.