public-inbox.git - an "archives first" approach to mailing lists

Date	Commit message
2020-03-19	http: fix RFC conformance w.r.t. message length
	We need to favor "Transfer-Encoding: chunked" over the value of the Content-Length header. We should also reject bogus, duplicate and/or unreasonable values for both these, since they can trigger unexpected behavior when combined with other HTTP parsers in proxies such as varnish, nginx, haproxy, etc... See RFC 7230 (and RFC 2616) for more details: https://tools.ietf.org/html/rfc7230 https://www.rfc-editor.org/errata_search.php?rfc=7230
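The precedence rule above can be sketched as follows. This is a minimal Python illustration, not the project's Perl code: the function name `body_framing` and the `max_len` cap are hypothetical, and treating anything other than a single "chunked" Transfer-Encoding as an error is a simplifying assumption.

```python
import re

# ASCII digits only; int() alone would also accept "+5" or non-ASCII digits
ASCII_DIGITS = re.compile(r"[0-9]+")

def body_framing(headers, max_len=2**40):
    """headers: list of (name, value) pairs in the order received.

    Returns ("chunked", None) or ("length", n); raises ValueError on
    bogus, duplicate, or unreasonable values.
    """
    te = [v.strip().lower() for k, v in headers if k.lower() == "transfer-encoding"]
    cl = [v.strip() for k, v in headers if k.lower() == "content-length"]
    if te:
        # Assumption: only a single "chunked" coding is accepted here
        if te != ["chunked"]:
            raise ValueError("unsupported or duplicate Transfer-Encoding")
        # Transfer-Encoding wins; any Content-Length is ignored
        return ("chunked", None)
    if not cl:
        return ("length", 0)  # a request with neither header has no body
    if len(cl) != 1 or ASCII_DIGITS.fullmatch(cl[0]) is None:
        raise ValueError("bogus or duplicate Content-Length")
    n = int(cl[0])
    if n > max_len:
        raise ValueError("unreasonable Content-Length")
    return ("length", n)
```

A request carrying both headers is framed as chunked, which keeps this parser in agreement with a conforming proxy in front of it.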
2020-02-06	treewide: run update-copyrights from gnulib for 2019
	I didn't wait until September to do it, this year!
2020-01-25	http: eliminate short-lived cyclic ref for psgix.io
	While there is no known actual leak due to reference cycles, here, eliminating a potential source of leaks is helpful.
2020-01-13	ds|http|nntp: simplify {wbuf} population
	We can rely on autovivification to turn `undef' value of {wbuf} into an arrayref. Furthermore, "push" returns the (new) size of the array since at least Perl 5.0 (I didn't look further back), so we can use that return value instead of calling "scalar" again.
2020-01-11	allow HTTP_HOST to be '0' via defined() checks
	'0' is a valid value for HTTP_HOST, and maybe some folks will want to hit that as port 80 where the HTTP client won't send the ":$PORT" suffix.
2020-01-09	http: log response_write errors
	Application-supplied callbacks may error out; try to log the errors so the PSGI app developer can figure out what went wrong.
2020-01-06	treewide: "require" + "use" cleanup and docs
	There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2020-01-01	http: update comment about psgix.io usage
	We've been using async_pass for a while.
2019-12-22	http: avoid anonymous sub for getline callback
	We can avoid the danger of self-referential subs entirely for code internal to PublicInbox::HTTP. This change was only made possible by commit 8e1c3155da4edc082e8e3d8b30351f0c861757a7 ("ds: pass $self to code references")
2019-12-22	http: get rid of anonymous subs for write/close
	Each sub costs us several kilobytes of memory for every response we make. An arrayref only costs 80 bytes on 64-bit, so bless that to packages with appropriate ->write and ->close methods.
2019-12-14	ds: move EvCleanup code into DS
	EvCleanup only existed because Danga::Socket was a separate component, and cleanup code belongs with the event loop.
2019-09-17	http: remove unnecessary delete
	Only removing $http->{env} is needed to prevent circular references. $env->{'psgix.io'} does not need to be deleted since $env will no longer have any references to it when ->close returns.
2019-09-17	http: drop unused `$env' variable after delete
	And explain why we need to do that delete in a comment.
2019-09-14	tmpfile: give temporary files meaningful names
	Although we always unlink temporary files, give them a meaningful name so that we can still make sense of the pre-unlink name when using lsof(8) or similar tools on Linux.
2019-09-09	run update-copyrights from gnulib for 2019

2019-07-10	http|nntp: avoid recursion inside ->write
	In HTTP.pm, we can use the same technique NNTP.pm uses with long_response with the $long_cb callback and avoid storing $pull in the per-client structure at all. We can also reuse the same logic to push the callback into wbuf from NNTP. This does NOT introduce a new circular reference, but documents it more clearly.
2019-07-08	http|nntp: "use PublicInbox::DS" instead of ->import
	Relying on "use" to import during BEGIN means we get to take advantage of prototype checking of function args during the rest of the compilation phase.
2019-06-29	httpd/async: switch to buffering-as-fast-as-possible
	With DS buffering to a temporary file nowadays, applying backpressure to git-http-backend(1) hurts overall memory usage of the system. Instead, try to get git-http-backend(1) to finish as quickly as possible and use edge-triggered notifications to reduce wakeups on our end.
2019-06-29	http: support HTTPS (kinda)
	It's barely any effort at all to support HTTPS now that we have NNTPS support and can share all the code for writing daemons. However, we still depend on Varnish to avoid hug-of-death situations, so supporting reverse-proxying will be required.
2019-06-29	ds: handle deferred DS->close after timers
	Our hacks in EvCleanup::next_tick and EvCleanup::asap were due to the fact "closed" sockets were deferred and could not wake up the event loop, causing certain actions to be delayed until an event fired. Instead, ensure we don't sleep if there are pending sockets to close. We can then remove most of the EvCleanup stuff. While we're at it, split out immediate timer handling into a separate array so we don't need to deal with time calculations for the event loop.
2019-06-29	http: use requeue instead of watch_in1
	Don't use epoll or kqueue to watch for anything unless we hit EAGAIN, since we don't know if a socket is SSL or not.
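The "try first, watch only on EAGAIN" pattern described above might look like this. It is a rough Python sketch, not the project's Perl: `read_or_watch` is a hypothetical helper, and the standard `selectors` module stands in for epoll/kqueue.

```python
import selectors
import socket

def read_or_watch(sel, sock, nbytes=4096):
    """Attempt a non-blocking read right away.

    Only when the kernel reports EAGAIN (BlockingIOError in Python)
    is the socket registered with the selector. Returns the bytes
    read, or None if we are now waiting for readiness.
    """
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        sel.register(sock, selectors.EVENT_READ)
        return None
```

The caller unregisters the socket after the wakeup and simply calls the helper again, so the event notification facility is never touched for a socket that always has data ready.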
2019-06-29	ds: share lazy rbuf handling between HTTP and NNTP
	Doing this for HTTP cuts the memory usage of 10K idle-after-one-request HTTP clients from 92 MB to 47 MB. The savings over the equivalent NNTP change in commit 6f173864f5acac89769a67739b8c377510711d49 ("nntp: lazily allocate and stash rbuf") seem to come down to the size of HTTP requests and the fact that HTTP is a client-sends-first protocol, whereas NNTP is server-sends-first.
2019-06-24	allow use of PerlIO layers for filesystem writes
	It may make sense to use PerlIO::mmap or PerlIO::scalar for DS write buffering with IO::Socket::SSL or similar (since we can't use MSG_MORE), so that means we need to go through buffering in userspace for the common case; while still being easily compatible with slow clients. And it also simplifies GitHTTPBackend slightly. Maybe it can make sense for HTTP input buffering, too...
2019-06-24	ds: hoist out do_read from NNTP and HTTP
	Both NNTP and HTTP have common needs and we can factor out some common code to make dealing with IO::Socket::SSL easier.
2019-06-24	http|nntp: be explicit about bytes::length on rbuf
	It should not matter because our rbuf is always from a socket without encoding layers, but this makes things easier to follow.
2019-06-24	ds: pass $self to code references
	We can reduce the amount of short-lived anonymous subs we create by passing $self to code references.
2019-06-24	http: don't pass extra args to PublicInbox::DS::close
	YAGNI. Followup-to: commit 30ab5cf82b9d47242640f748a0f9a088ca783e32 ("ds: reduce Errno imports and drop ->close reason")
2019-06-24	ds: favor `delete' over assigning fields to `undef'
	This is cleaner in most cases and may allow Perl to reuse memory from unused fields. We can do this now that we no longer support Perl 5.8; since Danga::Socket was written with struct-like pseudo-hash support in mind, and Perl 5.9+ dropped support for pseudo-hashes over a decade ago.
2019-06-24	http|nntp: favor "$! == EFOO" over $!{EFOO} checks
	Integer comparisons of "$!" are faster than hash lookups. See commit 6fa2b29fcd0477d126ebb7db7f97b334f74bbcbc ("ds: cleanup Errno imports and favor constant comparisons") for benchmarks.
2019-06-24	ds: get rid of event_watch field
	We don't need to keep track of that field since we always know what events we're interested in when using one-shot wakeups.
2019-06-24	ds: set event flags directly at initialization
	We can avoid the EPOLL_CTL_ADD && EPOLL_CTL_MOD sequence with a single EPOLL_CTL_ADD.
2019-06-24	ds: switch write buffering to use a tempfile
	Data which can't fit into a generously-sized socket buffer has no business being stored in heap.
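As an illustration of the idea, here is a minimal Python sketch of a writer that spills unsent data to an anonymous temporary file instead of holding it in memory. The class `BufferedWriter` and its `send` callable are hypothetical and do not mirror the actual PublicInbox::DS interface.

```python
import tempfile

class BufferedWriter:
    def __init__(self, send):
        self.send = send      # callable(bytes) -> number of bytes accepted
        self.spill = None     # temporary file, created only when needed
        self.read_pos = 0     # offset of the first unsent byte in the file

    def write(self, data):
        """Return True if data was sent in full, False if some was buffered."""
        if self.spill is None:
            n = self.send(data)
            if n == len(data):
                return True
            self.spill = tempfile.TemporaryFile()
            data = data[n:]
        self.spill.seek(0, 2)  # append after anything already buffered
        self.spill.write(data)
        return False

    def flush(self, chunk=65536):
        """Call when the socket is writable; True once the buffer is drained."""
        while self.spill is not None:
            self.spill.seek(self.read_pos)
            buf = self.spill.read(chunk)
            if not buf:
                self.spill.close()
                self.spill = None
                self.read_pos = 0
                return True
            n = self.send(buf)
            self.read_pos += n
            if n < len(buf):
                return False  # socket is full again; wait for the next wakeup
        return True
```

Fast clients never cause a file to be created, so the common case pays nothing for the buffering.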
2019-06-24	ds: share send(..., MSG_MORE) logic
	No sense in having similar Linux-specific functionality in both our NNTP.pm and HTTP.pm
2019-06-24	http: favor DS->write(strref) when reasonable
	This can avoid large memory copies when strings can't be copy-on-write and saves us the trouble of creating new refs in the code.
2019-06-24	ds: lazy-initialize wbuf
	We don't need write buffering unless we encounter slow clients requesting large responses. So don't waste a hash slot or (empty) arrayref for it.
2019-06-24	ds: get rid of {closed} field
	Merely checking the presence of the {sock} field is enough, and having multiple sources of truth increases confusion and the likelihood of bugs.
2019-06-16	ds: stop distinguishing event read and write callbacks
	Having separate read/write callbacks in every class is too confusing to my easily-confused mind. Instead, give every class an "event_step" callback which is easier to wrap my head around. This will make future code to support IO::Socket::SSL-wrapped sockets easier-to-digest, since SSL_write() can require waiting on POLLIN events, and SSL_read() can require waiting on POLLOUT events.
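The reason a single callback helps with TLS can be shown with a small Python sketch. It is an illustration only: `event_step` here is a hypothetical free function, and Python's `ssl` exceptions stand in for the SSL_read()/SSL_write() "want read"/"want write" conditions.

```python
import ssl

def event_step(attempt):
    """Run one non-blocking I/O attempt.

    attempt is a zero-argument callable performing a single read or
    write. Returns None when it completed, otherwise the readiness
    ("in" or "out") the caller should wait for before retrying.
    """
    try:
        attempt()
    except ssl.SSLWantReadError:
        return "in"   # may happen even when attempt() was a write
    except ssl.SSLWantWriteError:
        return "out"  # may happen even when attempt() was a read
    return None
```

Because the retry direction is decided by the TLS layer rather than by the kind of operation, one callback that re-runs the pending step is simpler than separate read and write handlers.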
2019-06-10	ds: do not distinguish between POLLHUP and POLLERR
	In my experience, both are worthless as any normal read/write call path will be wanting to check errors and deal with them appropriately; so we can just call event_read, for now. Eventually, there'll probably be only one callback for dealing with all in/out/err/hup events to simplify logic, especially w.r.t. TLS socket negotiation.
2019-06-10	ds: simplify write buffer accounting
	Keeping track of write_buf_size was redundant and pointless when we can simply check the number of elements in the buffer array. Multiple sources of truth lead to confusion; confusion leads to bugs. Finally, rename the prefixes to 'wbuf' to ensure we loudly (instead of silently) break any external dependencies being ported over from Danga::Socket, as further changes are pending.
2019-06-04	http: require SERVER_PORT to be ASCII digit
	I'm not sure what middlewares care for SERVER_PORT; but allowing non-ASCII digits seems nonsensical, here.
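The distinction between "digit" and "ASCII digit" is easy to miss. The following Python sketch shows the same pitfall outside Perl; `valid_server_port` is a hypothetical helper, not the check used in the commit.

```python
import re

# An explicit [0-9] class matches ASCII digits only
ASCII_PORT = re.compile(r"[0-9]+")

def valid_server_port(value):
    """True only if value consists entirely of ASCII digits."""
    return ASCII_PORT.fullmatch(value) is not None

# Arabic-Indic digits for "80": str.isdigit() and an unflagged \d
# both accept them, but the explicit character class does not.
arabic_80 = "\u0668\u0660"
```

Here `arabic_80.isdigit()` is true and `re.fullmatch(r"\d+", arabic_80)` matches, while `valid_server_port(arabic_80)` is false.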
2019-05-15	remove hard Devel::Peek dependency and lazy load for daemons
	It's only useful for a corner case in long-running daemons when an admin decides to compact or vacuum a Xapian or SQLite DB. As a result, other scripts should run slightly faster. For instance, this saves about 80ms (2.710s => 2.630s) in t/mda.t on my remote workstation. While we're at it, make sure EvCleanup is properly require'd in Daemon.pm and HTTP.pm and document our use of Devel::Peek.
2019-05-04	bundle Danga::Socket and Sys::Syscall
	These modules are unmaintained upstream at the moment, but I'll be able to help with the intended maintainer once/if CPAN ownership is transferred. OTOH, we've been waiting for that transfer for several years, now...
	Changes I intend to make:
	* EPOLLEXCLUSIVE for Linux
	* remove unused fields wasting memory
	* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
	* accept4 support
	And some lower priority experiments:
	* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
	* nginx-style buffering to tmpfile instead of string array
	* sendfile off tmpfile buffers
	* io_uring maybe?
2019-02-13	ensure bytes::length is available to callers
	We were relying on Danga::Socket using the "bytes" pragma, previously. Nowadays, the "bytes" pragma is not recommended in general, but bytes::length remains acceptable for getting the byte-size of a scalar.
2019-02-07	http: cleanup partial-write handling on readonly values
	Don't bother assigning to $_[1]; just let Danga::Socket do its thing since $_[1] should be out-of-scope soon.
2018-03-27	http: fix modification of read-only value
	This fails in the rare case we get a partial send() on "\r\n" when writing chunked HTTP responses out.
2018-02-07	update copyrights for 2018
	Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-01-04	http: remove weaken usage, reduce anonsub capture scope
	Avoiding weaken here is no more dangerous than the existing circular refs (e.g. psgix.io) we create and manage throughout the lifetime of the connection. So, trust ourselves to maintain the data structure properly and avoid triggering extra memory usage. While we're at it, avoid having anonymous subroutines capture more variables than necessary to simplify reference auditing.
2017-01-04	http: fix spelling error
	Oops. And we'll be fixing circular references from now...
2016-12-25	http: fix clobbering of $null_io
	Oops, this would be disastrous if we started handling bigger request bodies or slow clients. Fixes: c008654229a9 ("avoid IO::File for anonymous temporary files")
2016-11-26	avoid IO::File for anonymous temporary files
	We do not need to import IO::File into the main programs since Perl 5.8+ supports literal "undef" for generating anonymous temporary file handles.