|
64K matches the Linux pipe default, and matches what we use in
httpd/async and qspawn. This should reduce syscalls used for
serving git packs via dumb HTTP and any ->getline code paths
used by other PSGI code.
This appears to speed up HTML rendering by w3m when serving
giant HTML responses from the Devel::Mwrap::PSGI memory
debugger.
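A minimal sketch of the read loop (filehandle names are
illustrative, not from this patch):

    use constant BUF_SIZE => 65536; # matches the Linux pipe default
    my $buf;
    while (sysread($in, $buf, BUF_SIZE)) {
        $dst->write($buf); # fewer, larger writes => fewer syscalls
    }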
|
|
Malicious clients may attempt HTTP request smuggling this way.
This doesn't affect our current code as we only look for exact
matches, but it could affect other servers behind a
to-be-implemented reverse proxy built around our -httpd.
This doesn't affect users behind varnish at all, nor the
HTTPS/HTTP reverse proxy I use (I don't know about nginx), but
could be passed through by other reverse proxies.
This change is only needed for HTTP::Parser::XS which most users
probably use. Users of the pure Perl parser (via
PLACK_HTTP_PARSER_PP=1) already hit 400 errors in this case,
so this makes the common XS case consistent with the pure Perl
case.
cf. https://www.mozilla.org/en-US/security/advisories/mfsa2006-33/
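The actual check isn't shown here; a hypothetical post-parse
guard (the regex and the quick_400() helper are assumptions)
could look like this, assuming $hdr holds only the header lines:

    # require a token field-name immediately followed by ':',
    # rejecting whitespace in (or before) header names
    for (split(/\r\n/, $hdr)) {
        return quick_400($self) if !/\A[!#\$%&'*+.^_`|~0-9A-Za-z-]+:/;
    }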
|
|
Most of the HTTP server code was written for Danga::Socket and
never fully transitioned to take advantage of PublicInbox::DS.
This change brings it up-to-date with the style of pipeline
handling used for -imapd and -nntpd.
|
|
It's needlessly complex and O(n), so it doesn't scale well to a
high number of clients, nor is it easy to scale with the data
structures available to us in pure Perl.
In any case, I see no evidence of either -imapd or -nntpd
experiencing high connection loads on public-facing sites.
-httpd has never had its own timer-based expiration, either.
Fwiw, public-inbox.org itself has been running a public-facing
HTTP/HTTPS server with no userspace idle client expiration for
the past 8 years with no ill effect. Clients can come and go
as they wish, and SO_KEEPALIVE takes care of truly broken
connections if they're gone for ~2 hours.
Internet connections drop all the time, so it should be harmless to
drop connections w/o warning since both NNTP and IMAP protocols
have well-defined semantics for determining if a message was
truncated (as does HTTP/1.1+).
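Enabling keep-alives is a one-liner at accept time (a minimal
sketch):

    use Socket qw(SOL_SOCKET SO_KEEPALIVE);
    # probes kick in after ~2 hours of idle by default on Linux
    setsockopt($sock, SOL_SOCKET, SO_KEEPALIVE, 1) or warn "setsockopt: $!";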
|
|
Large chunks of our codebase and 3rd-party dependencies do not
use ->{psgi.errors}, so trying to standardize on it was a
fruitless endeavor. Since warn() and carp() are standard
mechanisms within Perl, just use those instead and simplify a
bunch of existing code.
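For illustration, the before/after shape of error logging (the
message text is made up):

    # before: required plumbing $env through to every error site
    $env->{'psgi.errors'}->print("W: lookup failed\n");
    # after: plain Perl, usable anywhere, trappable via $SIG{__WARN__}
    warn("W: lookup failed\n");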
|
|
By using syswrite to populate env->{psgi.input}. The substr()
call IO::Handle->write performs triggers the use of Perl's
target/scratchpad (TARG) and results in a permanent allocation.
Since this is a cold path, that allocation is pointless, and
syswrite() can already write a substring directly.
Allowing Perl to cache a large allocation in a cold path only
results in fragmentation and wasted RAM.
write(2) on a regular file won't result in short writes unless
FS quotas or free space limits are hit, or the write count
approaches the 0x7ffff000-byte Linux limit.
Since our HTTP server will never buffer that much in RAM,
there's no need to retry syswrite nor rely on the retrying
implicit in IO::Handle->write and the "print" perlop.
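A minimal sketch of the 4-arg form (variable names are
illustrative):

    # writes a substring of $buf directly via the OFFSET arg,
    # with no substr() temporary on the scratchpad
    my $w = syswrite($input, $buf, $len, $off);
    die "write: $!" unless defined $w;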
|
|
The only place where we could return wide characters with -httpd
was the raw $INBOX_DIR/description text, which is now converted
to octets.
All daemon (HTTP/NNTP/IMAP) sockets are opened in binary mode,
so length() and bytes::length() are equivalent on reads. For
socket writes, any non-octet data would trigger wide character
warnings, and we are strict about warnings in test_httpd.
All gzipped buffers are also octets, as is PublicInbox::Eml->body,
and anything from PerlIO objects ("git cat-file --batch" output,
filesystems), so bytes::length was unnecessary in all those places.
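A minimal sketch of the description conversion (the variable
name is illustrative; both functions are core Perl):

    # downgrade wide characters to UTF-8 octets before writing
    # to a binary-mode socket
    utf8::encode($desc) if utf8::is_utf8($desc);
    my $n = length($desc); # now equals bytes::length($desc)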
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Apparently they happen (triggered by my -imapd instance), so
bail out by closing the underlying socket rather than stopping
the event loop and daemon process.
|
|
This lets the -httpd worker process make better use of time
instead of waiting for git-cat-file to respond. With 4 jobs in
the new test case against a clone of
<https://public-inbox.org/meta/>, a speedup of 10-12% is shown.
Even a single job shows a 2-5% improvement on an SSD.
|
|
Since the removal of pseudo-hash support in Perl 5.10, the
"fields" module no longer provides the space or speed benefits
it did in 5.8. It also does not allow for compile-time checks,
only run-time checks.
To me, the extra developer overhead of maintaining "use fields"
declarations has become a hassle. None of our non-DS-related code uses
fields.pm, nor do any of our current dependencies. In fact,
Danga::Socket (which DS was originally forked from) and its
subclasses are the only fields.pm users I've ever encountered in
the wild. Removing fields may make our code more approachable
to other Perl hackers.
So stop using fields.pm and locked hashes, but continue to
document what fields do for non-trivial classes.
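The before/after shape, roughly (field names are illustrative):

    # before, with fields.pm:
    #   use fields qw(sock wbuf);
    #   my $self = fields::new($class);
    # after, a plain blessed hashref; fields documented in comments:
    my $self = bless { sock => $sock }, $class; # {wbuf} made lazily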
|
|
Doing a ref($obj) string comparison ties us to IO::Socket::SSL
(and OpenSSL). In the future, we may support GnuTLS or other TLS
implementations. This was already done in the IMAP code.
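A possible duck-typed check (the method name is an assumption,
not necessarily what the patch uses):

    # any TLS module providing this capability gets TLS handling,
    # with no hard-coded ref($sock) eq 'IO::Socket::SSL' comparison
    if ($sock->can('accept_SSL')) {
        # TLS-specific negotiation goes here
    }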
|
|
We need to favor "Transfer-Encoding: chunked" over the value of
the Content-Length header. We should also reject bogus,
duplicate and/or unreasonable values for both of these, since they
can trigger unexpected behavior when combined with other HTTP
parsers in proxies such as varnish, nginx, haproxy, etc...
See RFC 7230 (and RFC 2616) for more details:
https://tools.ietf.org/html/rfc7230
https://www.rfc-editor.org/errata_search.php?rfc=7230
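A minimal sketch of the rejection logic (r() as a hypothetical
400-responder; see RFC 7230 3.3.3):

    my $te = $env->{HTTP_TRANSFER_ENCODING};
    my $cl = $env->{CONTENT_LENGTH};
    if (defined $te) {
        # Transfer-Encoding wins; a conflicting Content-Length is
        # a smuggling vector, as is any T-E we don't implement
        return r(400) if lc($te) ne 'chunked' || defined $cl;
    } elsif (defined $cl) {
        # reject signs, spaces, and comma-joined duplicates
        return r(400) if $cl !~ /\A[0-9]+\z/;
    }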
|
|
I didn't wait until September to do it this year!
|
|
While there is no known actual leak due to reference cycles
here, eliminating a potential source of leaks is helpful.
|
|
We can rely on autovivification to turn an `undef' {wbuf}
value into an arrayref.
Furthermore, "push" returns the (new) size of the array since at
least Perl 5.0 (I didn't look further back), so we can use that
return value instead of calling "scalar" again.
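That is, the whole thing reduces to one statement:

    # autovivification creates the arrayref on first push, and
    # push returns the new element count:
    my $nbufs = push(@{$self->{wbuf}}, $chunk);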
|
|
'0' is a valid value for HTTP_HOST, and some folks may want to
hit a host named '0' on port 80, where the HTTP client won't
send the ":$PORT" suffix.
|
|
Application-supplied callbacks may error out, so try to log
those errors so the PSGI app developer can figure out what went
wrong.
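A minimal sketch of the guard (callback and message are
illustrative):

    eval { $cb->(@args) };
    warn "E: callback died: $@" if $@;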
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it gives
us extra compile-time checking.
|
|
We've been using async_pass for a while.
|
|
We can avoid the danger of self-referential subs entirely for
code internal to PublicInbox::HTTP.
This change was only made possible by
commit 8e1c3155da4edc082e8e3d8b30351f0c861757a7
("ds: pass $self to code references")
|
|
Each sub costs us several kilobytes of memory for every
response we make. An arrayref only costs 80 bytes on
64-bit, so bless that to packages with appropriate ->write
and ->close methods.
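Roughly (the package name is hypothetical):

    # an 80-byte blessed arrayref replaces a multi-KB closure
    my $body = bless [ $http, $filter ], 'PublicInbox::HTTP::Body';
    $body->write($chunk); # ->write/->close live in that package
    $body->close;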
|
|
EvCleanup only existed because Danga::Socket was a separate
component, and cleanup code belongs with the event loop.
|
|
Only removing $http->{env} is needed to prevent circular
references. $env->{'psgix.io'} does not need to be deleted
since $env will no longer have any references to it when
->close returns.
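That is, the entire cleanup is:

    # the cycle was $http->{env}->{'psgix.io'} pointing back at
    # $http; dropping {env} is sufficient to break it
    delete $self->{env};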
|
|
And explain why we need to do that delete in a comment.
|
|
Although we always unlink temporary files, give them a
meaningful name so that we can still make sense of the
pre-unlink name when using lsof(8) or similar tools on Linux.
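A minimal sketch (the TEMPLATE string is illustrative):

    use File::Temp ();
    # shows up with a recognizable "PublicInbox-httpd-" prefix
    # in lsof(8) output, even after the unlink
    my $tmp = File::Temp->new(TEMPLATE => 'PublicInbox-httpd-XXXX',
                              TMPDIR => 1, UNLINK => 1);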
|
|
In HTTP.pm, we can use the same technique NNTP.pm uses with
long_response with the $long_cb callback and avoid storing
$pull in the per-client structure at all. We can also reuse
the same logic to push the callback into wbuf from NNTP.
This does NOT introduce a new circular reference; it merely
documents the existing one more clearly.
|
|
Relying on "use" to import during BEGIN means we get to take
advantage of prototype checking of function args during the rest
of the compilation phase.
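For example:

    # "use" runs at BEGIN, so the prototype of first() (&@) is
    # known while the rest of the file compiles:
    use List::Util qw(first);
    # require List::Util;  # runtime load: no prototype checking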
|
|
With DS buffering to a temporary file nowadays, applying
backpressure to git-http-backend(1) hurts overall memory
usage of the system. Instead, try to get git-http-backend(1)
to finish as quickly as possible and use edge-triggered
notifications to reduce wakeups on our end.
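A minimal sketch of the drain loop (field names and the 64K
size are assumptions):

    # with edge-triggered wakeups, read until EAGAIN; the kernel
    # only notifies us again when new data arrives
    while (defined(my $r = sysread($self->{rpipe}, my $buf, 65536))) {
        last if $r == 0; # EOF: git-http-backend finished
        $self->{tmp}->write($buf); # DS buffers to a temporary file
    }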
|
|
It's barely any effort at all to support HTTPS now that we have
NNTPS support and can share all the code for writing daemons.
However, we still depend on Varnish to avoid hug-of-death
situations, so supporting reverse-proxying will be required.
|
|
Our hacks in EvCleanup::next_tick and EvCleanup::asap were due
to the fact "closed" sockets were deferred and could not wake
up the event loop, causing certain actions to be delayed until
an event fired.
Instead, ensure we don't sleep if there are pending sockets to
close.
We can then remove most of the EvCleanup stuff.
While we're at it, split out immediate timer handling into a
separate array so we don't need to deal with time calculations
for the event loop.
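A minimal sketch of the timeout logic (variable names are
assumptions):

    # zero timeout when deferred closes or immediate timers are
    # queued, so the loop never sleeps on pending work
    my $timeout = (@pending_close || @immediate) ? 0 : $next_timer_msec;
    epoll_wait($epfd, 1000, $timeout, \@events);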
|
|
Don't use epoll or kqueue to watch for anything unless we hit
EAGAIN, since we don't know if a socket is SSL or not.
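A minimal sketch of the optimistic-write pattern (watch_write
being the Danga::Socket-style re-arm):

    use Errno qw(EAGAIN);
    my $w = syswrite($sock, $buf);
    if (!defined($w) && $! == EAGAIN) {
        $self->watch_write(1); # only now do we poll for writability
    }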
|
|
Doing this for HTTP cuts the memory usage of 10K
idle-after-one-request HTTP clients from 92 MB to 47 MB.
The savings over the equivalent NNTP change in commit
6f173864f5acac89769a67739b8c377510711d49
("nntp: lazily allocate and stash rbuf") seem to come down to
the size of HTTP requests and the fact that HTTP is a
client-sends-first protocol, whereas NNTP is server-sends-first.
|
|
It may make sense to use PerlIO::mmap or PerlIO::scalar for
DS write buffering with IO::Socket::SSL or similar (since we
can't use MSG_MORE), so we need to go through userspace
buffering in the common case while remaining compatible with
slow clients.
It also simplifies GitHTTPBackend slightly.
Maybe it can make sense for HTTP input buffering, too...
|
|
Both NNTP and HTTP have common needs and we can factor
out some common code to make dealing with IO::Socket::SSL
easier.
|
|
It should not matter because our rbuf is always from
a socket without encoding layers, but this makes things
easier to follow.
|
|
We can reduce the amount of short-lived anonymous subs we
create by passing $self to code references.
|
|
YAGNI
Followup-to: commit 30ab5cf82b9d47242640f748a0f9a088ca783e32
("ds: reduce Errno imports and drop ->close reason")
|
|
This is cleaner in most cases and may allow Perl to reuse memory
from unused fields.
We can do this now that we no longer support Perl 5.8:
Danga::Socket was written with struct-like pseudo-hash support
in mind, and Perl 5.9+ dropped support for pseudo-hashes over
a decade ago.
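That is:

    delete $self->{wbuf};    # releases the slot for possible reuse
    # $self->{wbuf} = undef; # keeps the slot allocated forever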
|
|
Integer comparisons of "$!" are faster than hash lookups.
See commit 6fa2b29fcd0477d126ebb7db7f97b334f74bbcbc
("ds: cleanup Errno imports and favor constant comparisons")
for benchmarks.
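For example:

    use Errno qw(EAGAIN);
    my $retry = ($! == EAGAIN); # one integer comparison
    # my $retry = $!{EAGAIN};   # goes through the tied %! hash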
|
|
We don't need to keep track of that field since we always
know what events we're interested in when using one-shot
wakeups.
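A minimal sketch (constants via our syscall wrapper; the exact
call is an assumption):

    # one-shot re-arm states the full event mask every time, so
    # there's no need to remember what we previously watched
    epoll_ctl($epfd, EPOLL_CTL_MOD, fileno($sock),
              EPOLLIN | EPOLLONESHOT);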
|
|
We can avoid the EPOLL_CTL_ADD && EPOLL_CTL_MOD sequence with
a single EPOLL_CTL_ADD.
|
|
Data which can't fit into a generously-sized socket buffer
has no business being stored on the heap.
|
|
No sense in having similar Linux-specific functionality in
both our NNTP.pm and HTTP.pm.
|
|
This can avoid large memory copies when strings can't be
copy-on-write and saves us the trouble of creating new
refs in the code.
|
|
We don't need write buffering unless we encounter slow clients
requesting large responses. So don't waste a hash slot or
(empty) arrayref for it.
|
|
Merely checking the presence of the {sock} field is
enough, and having multiple sources of truth increases
confusion and the likelihood of bugs.
|
|
Having separate read/write callbacks in every class is too
confusing to my easily-confused mind. Instead, give every class
an "event_step" callback which is easier to wrap my head around.
This will make future code to support IO::Socket::SSL-wrapped
sockets easier-to-digest, since SSL_write() can require waiting
on POLLIN events, and SSL_read() can require waiting on POLLOUT
events.
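A minimal sketch of the shape (method and field names are
assumptions):

    sub event_step {
        my ($self) = @_;
        # finish pending writes first; TLS may have woken us via
        # POLLIN even though we were blocked on a write
        return $self->flush_write if $self->{wbuf};
        $self->process_input;
    }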
|
|
In my experience, both are worthless, as any normal read/write
code path will want to check errors and deal with them
appropriately; so we can just call event_read for now.
Eventually, there'll probably be only one callback for dealing
with all in/out/err/hup events to simplify logic, especially w.r.t
TLS socket negotiation.
|