path: root/lib/PublicInbox/GitHTTPBackend.pm
Date Commit message
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-06 treewide: "require" + "use" cleanup and docs
There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2020-01-01 wwwstatic: move r(...) functions here
Remove redundant "r" functions for generating short error responses. These responses will no longer be cached by clients, which is probably a good thing since most errors ought to be transient, anyways. This also fixes error responses for our cgit wrapper when static files are missing.
2020-01-01 githttpbackend: remove ancient compatibility check
The ref() call could be hitting memory leaks on Perl 5.16.x. It's been 3 years (2016-12-25) since 292ca34140489da2 ("githttpbackend: simplify compatibility code") back when this project was barely known and probably nobody used examples/public-inbox.psgi...
2019-12-30 spawn: allow passing GLOB handles for redirects
We can save callers the trouble of {-hold} and {-dev_null} refs as well as the trouble of calling fileno().
2019-12-27 githttpbackend: split out wwwstatic
Make it easier to share code between our GitHTTPBackend and Cgit packages, for now, and possibly other packages in the future. We can avoid inline_object and anonymous subs at the same time, reducing per-request memory overhead.
2019-12-26 qspawn: psgi_return: allow non-anon parse_hdr callback
Callers can supply an arg to parse_hdr, now, eliminating the need for closures to capture local variables.
2019-09-17 qspawn: improve variable naming and commenting
Naming $start_cb consistently helps avoid confusing new readers, and some comments will help with understanding flow.
2019-09-14 githttpbackend: use REMOTE_ADDR for deleted identifier
REMOTE_HOST is not set by us (it is the reverse DNS name of REMOTE_ADDR), and there are few better ways to kill HTTP server performance than to use standard name resolution APIs like getnameinfo(3).
2019-09-14 tmpfile: give temporary files meaningful names
Although we always unlink temporary files, give them a meaningful name so that we can still make sense of the pre-unlink name when using lsof(8) or similar tools on Linux.
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-24 allow use of PerlIO layers for filesystem writes
It may make sense to use PerlIO::mmap or PerlIO::scalar for DS write buffering with IO::Socket::SSL or similar (since we can't use MSG_MORE), so that means we need to go through buffering in userspace for the common case; while still being easily compatible with slow clients. And it also simplifies GitHTTPBackend slightly. Maybe it can make sense for HTTP input buffering, too...
2019-06-04 githttpbackend: require ASCII in path
We mainly support git-upload-pack; and maybe somebody uses git-receive-pack with this. Perhaps other (experimental) command names are acceptable. But it's unlikely anybody will want Unicode command names for git services.
2019-06-04 githttpbackend: require Range:, Status: to be ASCII digits
Non-ASCII digits would be interpreted as zeroes when treated as integers. While we're at it, ensure the Status: code is an ASCII digit, too; though I would not expect git-http-backend(1) or cgit(1) to start spewing non-ASCII digits at us.
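The difference between the two digit classes can be sketched as follows. This is a minimal illustration, not the module's actual check: ascii_digits_only is a hypothetical helper, and the sample string is made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper, not the project's code: true only for ASCII digits.
sub ascii_digits_only {
    my ($s) = @_;
    return $s =~ /\A[0-9]+\z/ ? 1 : 0;
}

my $non_ascii = "\x{0663}\x{0662}"; # two ARABIC-INDIC digits

# On character strings, \d accepts any Unicode decimal digit.
my $unicode_ok = $non_ascii =~ /\A\d+\z/ ? 1 : 0;

# Numeric conversion only understands ASCII digits, so this yields 0.
my $as_int = do { no warnings 'numeric'; $non_ascii + 0 };

printf("\\d: %d, [0-9]: %d, as integer: %d\n",
    $unicode_ok, ascii_digits_only($non_ascii), $as_int);
```

An explicit [0-9] class (or the /a regex modifier) keeps such input from passing validation and then silently turning into zero.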
2019-05-04 bundle Danga::Socket and Sys::Syscall
These modules are unmaintained upstream at the moment, but I'll be able to help with the intended maintainer once/if CPAN ownership is transferred. OTOH, we've been waiting for that transfer for several years, now...
Changes I intend to make:
* EPOLLEXCLUSIVE for Linux
* remove unused fields wasting memory
* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
* accept4 support
And some lower priority experiments:
* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
* nginx-style buffering to tmpfile instead of string array
* sendfile off tmpfile buffers
* io_uring maybe?
2019-04-15 cgit: serve static css, logo, favicon directly
We can reduce the configuration needed to run cgit by reusing the static file handling logic of the dumb git HTTP protocol. I hate logos and icons, so don't expect public-inbox.org or 80x24.org to ever have those to waste users' bandwidth with :P But I expect other users to find this useful.
2019-04-04 githttpbackend: check for other errors and relax CRLF check
Reads to git-http-backend(1) could fail or EOF prematurely, so we must be ready for that case. Furthermore, cgit (and possibly other CGI) uses LF instead of CRLF, so support those programs, too.
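A relaxed header parser along these lines might look like the sketch below. It is an assumption-laden illustration rather than the code from this commit: split_cgi_response is a hypothetical helper, and returning an empty list stands in for "headers are not complete yet".

```perl
use strict;
use warnings;

# Hypothetical helper: split a CGI response into header pairs and body.
sub split_cgi_response {
    my ($buf) = @_;
    # Headers end at the first blank line; the \r is optional so that
    # both CRLF and bare LF line endings are accepted.
    $buf =~ /\A(.*?)\r?\n\r?\n(.*)\z/s or return; # incomplete headers
    my ($head, $body) = ($1, $2);
    my @hdr = map { [ split(/:\s*/, $_, 2) ] } split(/\r?\n/, $head);
    return (\@hdr, $body);
}

for my $eol ("\r\n", "\n") {
    my $res = "Status: 200 OK${eol}Content-Type: text/plain${eol}${eol}hi";
    my ($hdr, $body) = split_cgi_response($res);
    print join(', ', map { "$_->[0]=$_->[1]" } @$hdr), " body=$body\n";
}
```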
2019-04-04 githttpbackend: move more psgi.input handling into subroutine
This will be useful for other CGI wrappers we make. This also fixes a bug with some PSGI servers which did not present a real IO::Handle in the psgi.input env field.
2019-04-02 githttpbackend: serve $GIT_DIR/info/attributes
This will be useful for reproducibility when mirroring coderepos and generating diffs.
2019-01-22 qspawn: implement psgi_return and use it for githttpbackend
Was: ("repobrowse: port patch generation over to qspawn")
We'll be using it for githttpbackend and maybe other things.
2019-01-09 doc: various overview-level module comments
Hopefully this helps people familiarize themselves with the source code.
2018-03-27 githttpbackend: avoid infinite loop on generic PSGI servers
We must detect EOF when reading a POST body with standard PSGI servers. This does not affect deployments using the standard public-inbox-httpd; but most smaller inboxes should be able to get away using a generic PSGI server.
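The kind of loop this guards against can be sketched as below. read_body is a hypothetical helper written for illustration, and an in-memory filehandle stands in for the psgi.input stream; the project's real code differs.

```perl
use strict;
use warnings;
use IO::Handle; # provides the ->read method on plain filehandles

# Hypothetical helper: read up to $len bytes from $in, stopping at EOF.
sub read_body {
    my ($in, $len) = @_;
    my $body = '';
    while ($len > 0) {
        my $want = $len > 8192 ? 8192 : $len;
        my $buf;
        my $r = $in->read($buf, $want);
        die "read error: $!" unless defined $r;
        # Without this check a short body would leave $len > 0 forever.
        last if $r == 0;
        $body .= $buf;
        $len -= $r;
    }
    return $body;
}

# The client claims 100 bytes but only sends 8.
my $data = "0032want";
open(my $in, '<', \$data) or die "open: $!";
my $body = read_body($in, 100);
print length($body), " bytes read before EOF\n";
```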
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2016-12-25 githttpbackend: minor cleanups to improve readability
Fewer returns improves readability and the diffstat agrees.
2016-12-25 githttpbackend: simplify compatibility code
Fewer conditionals means there are fewer code paths to test and makes things easier to read.
2016-12-25 githttpbackend: minor readability improvement
Use a more meaningful variable name for the Qspawn object, since this module is the reference for its use.
2016-12-22 doc: various comments on async handling
Notes for future developers (myself included) since we can't assume people can read my mind.
2016-11-26 avoid IO::File for anonymous temporary files
We do not need to import IO::File into the main programs since Perl 5.8+ supports literal "undef" for generating anonymous temporary file handles.
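As a brief illustration of the built-in feature referred to here (the data written is made up for the example):

```perl
use strict;
use warnings;

# Passing a literal undef as the filename with a read/write mode asks
# Perl for an anonymous temporary file; no IO::File import is needed.
open(my $tmp, '+>', undef) or die "open: $!";
print $tmp "buffered request body\n" or die "print: $!";
seek($tmp, 0, 0) or die "seek: $!"; # rewind before reading back
my $line = <$tmp>;
close($tmp) or die "close: $!";
print $line;
```

The file has no name to clean up afterwards, which is why it suits short-lived buffers.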
2016-11-26 githttpbackend: error checking for input handling
This was sloppy code; all calls need to be checked for failure.
2016-07-09 www: add configurable limiters
Currently only for git-http-backend use, this allows limiting the number of spawned processes per-inbox or by group, if there are multiple large inboxes amidst a sea of small ones. For example, a "big" repo limiter could be used for big inboxes, and would be shared between multiple repos:
    [limiter "big"]
        max = 4
    [publicinbox "git"]
        address = git@vger.kernel.org
        mainrepo = /path/to/git.git
        ; shared limiter with giant:
        httpbackendmax = big
    [publicinbox "giant"]
        address = giant@project.org
        mainrepo = /path/to/giant.git
        ; shared limiter with git:
        httpbackendmax = big
    ; This is a tiny inbox, use the default limiter with 32 slots:
    [publicinbox "meta"]
        address = meta@public-inbox.org
        mainrepo = /path/to/meta.git
2016-07-09 qspawn: allow configurable limiters
And bump the default limit to 32 so we match git-daemon behavior. This shall allow us to configure different levels of concurrency for different repositories and prevent clones of giant repos from stalling service to small repos.
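The general idea of a slot-based limiter can be sketched as follows. Sketch::Limiter is a hypothetical class invented for this illustration and is not the project's Qspawn implementation; only the default of 32 slots comes from the commit message.

```perl
use strict;
use warnings;

package Sketch::Limiter; # hypothetical name, not PublicInbox::Qspawn

sub new {
    my ($class, $max) = @_;
    # 32 slots by default, the figure mentioned in the commit message
    return bless { max => $max || 32, running => 0, queue => [] }, $class;
}

# Run $cb immediately if a slot is free, otherwise queue it.
sub start {
    my ($self, $cb) = @_;
    if ($self->{running} < $self->{max}) {
        $self->{running}++;
        $cb->();
    } else {
        push @{$self->{queue}}, $cb;
    }
}

# Release a slot and hand it to the oldest queued job, if any.
sub finish {
    my ($self) = @_;
    $self->{running}--;
    my $next = shift @{$self->{queue}} or return;
    $self->{running}++;
    $next->();
}

package main;

my $big = Sketch::Limiter->new(4);
my @started;
for my $n (1 .. 6) {
    $big->start(sub { push @started, $n });
}
print "started: @started\n"; # jobs 5 and 6 wait in the queue
$big->finish;                 # one job ends, so job 5 starts
print "started: @started\n";
```

Giving large repositories their own small limiter keeps their clones from occupying every slot that small repositories would otherwise share.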
2016-07-09 cleanup some unnecessary use/requires
Hopefully this can reduce memory overhead for people that use one-shot CGI.
2016-07-07 githttpbackend: avoid intermediate array creation from stat
No need to keep an extra array around for this.
2016-07-03 githttpbackend: match Content-Type of git-http-backend(1)
This will allow cache proxies such as Varnish to avoid caching data sent by us.
2016-07-01 git: allow cloning from the URL root, too
This means we can still show non-git users a somewhat browseable URL with a link to the README.html file while allowing git users to type less when cloning. All of the following are supported:
    git clone https://public-inbox.org/ public-inbox
    git clone https://public-inbox.org/public-inbox
    git clone https://public-inbox.org/public-inbox.git
    torsocks git clone http://ou63pmih66umazou.onion/public-inbox
2016-07-01 githttpbackend: allow git to be a regular scalar string
No point in forcing users to pass a hashref/object to get a single git directory.
2016-06-24 githttpbackend: shallow clone workaround
Apparently git-http-backend exits with a non-zero status on shallow clones (due to git-upload-pack), so there is a to-be-fixed bug in git.git
    http://mid.gmane.org/20160621112303.GA21973@dcvr.yhbt.net
    http://mid.gmane.org/20160621121041.GA29156@sigill.intra.peff.net
2016-05-30 git-http-backend: remove dependency on Plack::Request
Plack::Request is unnecessary overhead for this given the strictness of git-http-backend. Furthermore, having to make commit 311c2adc8c63 ("avoid Plack::Request parsing body") to avoid tempfiles should not have been necessary.
2016-05-27 git-http-backend: close pipe for generic PSGI on errors
The generic PSGI code needs to avoid resource leaks if smart cloning is disabled (due to resource constraints).
2016-05-27 git-http-backend: move real close to GetlineBody
This makes more sense as it keeps management of rpipe nice and neat.
2016-05-27 git-http-backend: fix aborts for generic PSGI clone
We need to avoid circular references in the generic PSGI layer; do it by abusing DESTROY.
2016-05-24 git-http-backend: use qspawn to limit running processes
Having an excessive amount of git-pack-objects processes is dangerous to the health of the server. Queue up process spawning for long-running responses and serve them sequentially, instead.
2016-05-23 git-http-backend: refactor to support cleanup
We will have clients dropping connections during long clone and fetch operations; so do not retain references holding backend processes once we detect a client has dropped.
2016-05-23 git-http-backend: avoid Plack::Request parsing body
Only check query parameters since there's no useful body in there.
2016-05-23 git-http-backend: cleanup vestigial the process limiter code
This bit is still being redone to support gigantic repos.
2016-05-22 git-http-backend: switch to async_pass
This simplifies the code somewhat; but it could probably still be made simpler. It will need to support command queueing for expensive commands so expensive processes can be queued up.
2016-05-22 git-http-backend: simplify dumb serving
We can rely entirely on getline + close callbacks and be compatible with 100% of PSGI servers.
2016-05-22 git-http-backend: remove process limit
We will figure out a different way to avoid overloading...
2016-05-15 git-http-backend: set cache headers
Mostly stolen from git upstream, these should prevent any caches such as varnish or squid from acting improperly.
2016-05-12 git-http-backend: do not drop connection on successful finish
We can maintain the client HTTP connection if the process exited with failure as long as we terminated our own response properly.