path: root/lib/PublicInbox/GitHTTPBackend.pm
Date Commit message
2019-06-04 githttpbackend: require ASCII in path
We mainly support git-upload-pack; and maybe somebody uses git-receive-pack with this. Perhaps other (experimental) command names are acceptable. But it's unlikely anybody will want Unicode command names for git services.
2019-06-04 githttpbackend: require Range:, Status: to be ASCII digits
Non-ASCII digits would be interpreted as zeroes when converted to integers. While we're at it, ensure the Status: code is an ASCII digit, too; though I would not expect git-http-backend(1) or cgit(1) to start spewing non-ASCII digits at us.
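The ASCII-only requirement can be illustrated with a short Python sketch. This is not the module's Perl code; the function names `parse_range` and `parse_status` and the exact patterns are assumptions made for illustration. The point is that a Unicode-aware `\d` also matches digits such as ARABIC-INDIC DIGIT THREE, so the patterns are compiled with `re.ASCII` to accept only `0-9`.

```python
import re

# With re.ASCII, \d matches only the characters 0-9.
# Both patterns are illustrative, not taken from the module.
_RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)", re.ASCII)
_STATUS_RE = re.compile(r"(\d{3})(?: .*)?", re.ASCII)


def parse_range(value):
    """Return (start, end) for a single-range header value, else None.

    end is None for an open-ended range such as "bytes=5-".
    """
    m = _RANGE_RE.fullmatch(value)
    if m is None:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else None
    return (start, end)


def parse_status(value):
    """Return the integer code of a CGI Status: value, else None."""
    m = _STATUS_RE.fullmatch(value)
    return int(m.group(1)) if m else None
```

Rejecting the header outright, rather than coercing it, avoids silently treating a malformed value as offset zero.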
2019-05-04 bundle Danga::Socket and Sys::Syscall
These modules are unmaintained upstream at the moment, but I'll be able to help with the intended maintainer once/if CPAN ownership is transferred. OTOH, we've been waiting for that transfer for several years, now...

Changes I intend to make:

* EPOLLEXCLUSIVE for Linux
* remove unused fields wasting memory
* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
* accept4 support

And some lower priority experiments:

* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
* nginx-style buffering to tmpfile instead of string array
* sendfile off tmpfile buffers
* io_uring maybe?
2019-04-15 cgit: serve static css, logo, favicon directly
We can reduce the configuration needed to run cgit by reusing the static file handling logic of the dumb git HTTP protocol. I hate logos and icons, so don't expect public-inbox.org or 80x24.org to ever have those to waste users' bandwidth with :P But I expect other users to find this useful.
2019-04-04 githttpbackend: check for other errors and relax CRLF check
Reads to git-http-backend(1) could fail or EOF prematurely, so we must be ready for that case. Furthermore, cgit (and possibly other CGI) uses LF instead of CRLF, so support those programs, too.
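A minimal Python sketch of the relaxed check follows. It is an illustration only, not the module's Perl code, and `split_cgi_response` is a hypothetical name. It accepts a header block terminated by either CRLF or bare LF, and returns `None` while the terminator has not arrived yet, so the caller can decide whether to read more or treat a premature EOF as an error.

```python
import re

# End of the CGI header block: a blank line, with CRLF or bare LF endings.
_HDR_END = re.compile(rb"\r?\n\r?\n")
_EOL = re.compile(rb"\r?\n")


def split_cgi_response(buf):
    """Split CGI output into ([(name, value), ...], body).

    Returns None if the header terminator has not been seen yet.
    Raises ValueError on a header line without a colon.
    """
    m = _HDR_END.search(buf)
    if m is None:
        return None  # incomplete: caller must read more or handle EOF
    head, body = buf[:m.start()], buf[m.end():]
    headers = []
    for line in _EOL.split(head):
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        headers.append((name.decode("ascii"), value.strip().decode("ascii")))
    return headers, body
```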
2019-04-04 githttpbackend: move more psgi.input handling into subroutine
This will be useful for other CGI wrappers we make. This also fixes a bug with some PSGI servers which did not present a real IO::Handle in the psgi.input env field.
2019-04-02 githttpbackend: serve $GIT_DIR/info/attributes
This will be useful for reproducibility when mirroring coderepos and generating diffs.
2019-01-22 qspawn: implement psgi_return and use it for githttpbackend
Was: ("repobrowse: port patch generation over to qspawn") We'll be using it for githttpbackend and maybe other things.
2019-01-09 doc: various overview-level module comments
Hopefully this helps people familiarize themselves with the source code.
2018-03-27 githttpbackend: avoid infinite loop on generic PSGI servers
We must detect EOF when reading a POST body with standard PSGI servers. This does not affect deployments using the standard public-inbox-httpd; but most smaller inboxes should be able to get away using a generic PSGI server.
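The failure mode is a copy loop that keeps reading after the input has hit EOF. A minimal Python sketch of the guard, assuming a file-like input and a declared body length (the name `copy_body` and its parameters are hypothetical, and this is not the module's Perl code):

```python
import io


def copy_body(src, dst, length, bufsize=8192):
    """Copy exactly `length` bytes from src to dst.

    A read returning an empty chunk means EOF; without this check the
    loop would spin forever when the client sends less than it declared.
    """
    remaining = length
    while remaining > 0:
        chunk = src.read(min(bufsize, remaining))
        if not chunk:
            raise EOFError("body ended %d bytes early" % remaining)
        dst.write(chunk)
        remaining -= len(chunk)
    return length
```

For instance, `copy_body(io.BytesIO(b"abc"), io.BytesIO(), 5)` raises `EOFError` instead of looping.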
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib.

While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2016-12-25 githttpbackend: minor cleanups to improve readability
Fewer returns improves readability and the diffstat agrees.
2016-12-25 githttpbackend: simplify compatibility code
Fewer conditionals means there are fewer code paths to test and makes things easier to read.
2016-12-25 githttpbackend: minor readability improvement
Use a more meaningful variable name for the Qspawn object, since this module is the reference for its use.
2016-12-22 doc: various comments on async handling
Notes for future developers (myself included) since we can't assume people can read my mind.
2016-11-26 avoid IO::File for anonymous temporary files
We do not need to import IO::File into the main programs since Perl 5.8+ supports literal "undef" for generating anonymous temporary file handles.
2016-11-26 githttpbackend: error checking for input handling
This was sloppy code; all calls need to be checked for failure.
2016-07-09 www: add configurable limiters
Currently only for git-http-backend use, this allows limiting the number of spawned processes per-inbox or by group, if there are multiple large inboxes amidst a sea of small ones.

For example, a "big" repo limiter could be used for big inboxes and shared between multiple repos:

    [limiter "big"]
        max = 4
    [publicinbox "git"]
        address = git@vger.kernel.org
        mainrepo = /path/to/git.git
        ; shared limiter with giant:
        httpbackendmax = big
    [publicinbox "giant"]
        address = giant@project.org
        mainrepo = /path/to/giant.git
        ; shared limiter with git:
        httpbackendmax = big

    ; This is a tiny inbox, use the default limiter with 32 slots:
    [publicinbox "meta"]
        address = meta@public-inbox.org
        mainrepo = /path/to/meta.git
2016-07-09 qspawn: allow configurable limiters
And bump the default limit to 32 so we match git-daemon behavior. This shall allow us to configure different levels of concurrency for different repositories and prevent clones of giant repos from stalling service to small repos.
2016-07-09 cleanup some unnecessary use/requires
Hopefully this can reduce memory overhead for people that use one-shot CGI.
2016-07-07 githttpbackend: avoid intermediate array creation from stat
No need to keep an extra array around for this.
2016-07-03 githttpbackend: match Content-Type of git-http-backend(1)
This will allow cache proxies such as Varnish to avoid caching data sent by us.
2016-07-01 git: allow cloning from the URL root, too
This means we can still show non-git users a somewhat browseable URL with a link to the README.html file while allowing git users to type less when cloning. All of the following are supported:

    git clone https://public-inbox.org/ public-inbox
    git clone https://public-inbox.org/public-inbox
    git clone https://public-inbox.org/public-inbox.git
    torsocks git clone http://ou63pmih66umazou.onion/public-inbox
2016-07-01 githttpbackend: allow git to be a regular scalar string
No point in forcing users to pass a hashref/object to get a single git directory.
2016-06-24 githttpbackend: shallow clone workaround
Apparently git-http-backend exits with a non-zero status on shallow clones (due to git-upload-pack), so there is a to-be-fixed bug in git.git:

    http://mid.gmane.org/20160621112303.GA21973@dcvr.yhbt.net
    http://mid.gmane.org/20160621121041.GA29156@sigill.intra.peff.net
2016-05-30 git-http-backend: remove dependency on Plack::Request
Plack::Request is unnecessary overhead for this given the strictness of git-http-backend. Furthermore, having to make commit 311c2adc8c63 ("avoid Plack::Request parsing body") to avoid tempfiles should not have been necessary.
2016-05-27 git-http-backend: close pipe for generic PSGI on errors
The generic PSGI code needs to avoid resource leaks if smart cloning is disabled (due to resource constraints).
2016-05-27 git-http-backend: move real close to GetlineBody
This makes more sense as it keeps management of rpipe nice and neat.
2016-05-27 git-http-backend: fix aborts for generic PSGI clone
We need to avoid circular references in the generic PSGI layer; do it by abusing DESTROY.
2016-05-24 git-http-backend: use qspawn to limit running processes
Having an excessive amount of git-pack-objects processes is dangerous to the health of the server. Queue up process spawning for long-running responses and serve them sequentially, instead.
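The queueing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Qspawn implementation: the class name `Limiter`, its methods, and the callback-style jobs are all hypothetical. A job runs immediately while a slot is free; otherwise it waits in FIFO order until a running job finishes.

```python
from collections import deque


class Limiter:
    """Run at most max_running jobs at once; queue the rest (sketch)."""

    def __init__(self, max_running=1):
        self.max_running = max_running
        self.running = 0
        self.queue = deque()

    def start(self, job):
        """Run job() now if a slot is free, otherwise queue it."""
        if self.running < self.max_running:
            self.running += 1
            job()
        else:
            self.queue.append(job)

    def finish(self):
        """Release a slot when a job exits and start the next queued job."""
        self.running -= 1
        if self.queue:
            self.running += 1
            self.queue.popleft()()
```

In a real server, `job` would spawn the backend process and `finish` would be called from the process-exit handler; both are left abstract here.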
2016-05-23 git-http-backend: refactor to support cleanup
We will have clients dropping connections during long clone and fetch operations; so do not retain references holding backend processes once we detect a client has dropped.
2016-05-23 git-http-backend: avoid Plack::Request parsing body
Only check query parameters since there's no useful body in there.
2016-05-23 git-http-backend: cleanup the vestigial process limiter code
This bit is still being redone to support gigantic repos.
2016-05-22 git-http-backend: switch to async_pass
This simplifies the code somewhat; but it could probably still be made simpler. It will need to support command queueing for expensive commands so expensive processes can be queued up.
2016-05-22 git-http-backend: simplify dumb serving
We can rely entirely on getline + close callbacks and be compatible with 100% of PSGI servers.
2016-05-22 git-http-backend: remove process limit
We will figure out a different way to avoid overloading...
2016-05-15 git-http-backend: set cache headers
Mostly stolen from git upstream, these should prevent any caches such as varnish or squid from acting improperly.
2016-05-12 git-http-backend: do not drop connection on successful finish
We can maintain the client HTTP connection if the process exited with failure as long as we terminated our own response properly.
2016-05-03 git-http-backend: reduce memory use for clone/fetch
When serving large static files or large packs, we may call Danga::Socket::write directly to queue up callbacks to resume reading and defer firing them until the socket is writable. This prevents us from scheduling writes or buffering until we know the socket is writable and prevents needless buffering by Danga::Socket when faced with slow clients.

For smart clones, this comes at the cost of throttling the output of "git pack-objects" to the speed of the client connection. This is probably not ideal, but is the behavior of the standard git-daemon, too; and is preferable to running the httpd out-of-memory. Buffering to the filesystem may be an option in the future...
2016-05-01 git-http-backend: use real lseek for Content-Range
Since we use sysread, we must use sysseek for symmetry although PerlIO may be doing a real lseek with "seek", anyways.

Fixes: 310819ea86ac ("git-http-backend: favor sysread for regular files")
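The same pairing exists outside Perl: unbuffered reads must be positioned with an unbuffered seek, because a buffered seek only moves the position of the buffering layer. A minimal Python sketch using `os.lseek` with `os.read` (the function name `read_range` and the inclusive `end` offset are assumptions for illustration, not the module's code):

```python
import os


def read_range(path, start, end, bufsize=65536):
    """Return bytes start..end (inclusive) of a file via raw lseek/read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Position the file descriptor itself, matching the raw reads below.
        os.lseek(fd, start, os.SEEK_SET)
        remaining = end - start + 1
        chunks = []
        while remaining > 0:
            buf = os.read(fd, min(bufsize, remaining))
            if not buf:  # EOF: the file is shorter than the requested range
                break
            chunks.append(buf)
            remaining -= len(buf)
        return b"".join(chunks)
    finally:
        os.close(fd)
```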
2016-04-29 http: improve error handling for aborted responses
We need to abort connections properly if a response is prematurely truncated. This includes problems with serving static files, since a clumsy admin or broken FS could return truncated responses and inadvertently leave a client waiting (since the client saw "Content-Length" in the header and expected a certain length).
2016-04-29 git-http-backend: check EINTR as well as EAGAIN
The blocking PSGI server may cause EINTR to be hit, here.
2016-04-28 githttpbackend: clamp to one smart HTTP request at-a-time
Server admins may not be able to afford to have too many git-pack-objects processes running at once. Since PSGI HTTP servers should already be configured to use multiple processes for other requests; limit concurrency of smart backends to one; and fall back to dumb responses if we're already generating a pack.
2016-04-28 githttpbackend: fall back to dumb if smart HTTP is off
Using http.getanyfile still keeps the http-backend process alive, so it's better to break out of that process and handle serving entirely within the HTTP server.
2016-04-25 githttpbackend: require IO::File explicitly
This is used all over the place, but may not be in the future, so ensure we explicitly load it ourselves.
2016-03-05 git-http-backend: favor sysread for regular files
We do not need line buffering, here; so favor sysread to bypass extra copies which may be done by normal read.
2016-03-01 httpd: document pi-httpd.async as totally unstable
We'll have to use it some more before deciding it is a public interface. I do hope for it to be a usable public interface one day for other users.
2016-02-29 git-http-backend: fixes for mod_perl
Apache2 mod_perl does not give us a real file handle, so we must translate that before giving that to git-http-backend(1). Also, parse the Status: correctly for errors since we failed to set %ENV properly before the previous fix for SpawnPP.
2016-02-29 git-http-backend: stricter parsing of CRLF
It is not needed as we know git uses CRLF termination.
2016-02-27 git: use built-in spawn implementation for vfork
This should reduce overhead of spawning git processes from our long-running httpd and nntpd servers.