Date | Commit message |
|
We mainly support git-upload-pack; and maybe somebody uses
git-receive-pack with this. Perhaps other (experimental)
command names are acceptable. But it's unlikely anybody will
want Unicode command names for git services.
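
A minimal sketch of an ASCII-only check on the service name, written in
Python rather than the Perl this commit touches. The pattern and the
name `is_valid_service` are assumptions for illustration; the set of
names actually accepted is whatever the commit implements.

```python
import re

# Hypothetical pattern: "git-" followed by ASCII lowercase words joined
# by hyphens.  Explicit [a-z] classes avoid matching Unicode letters,
# which \w would match on str patterns by default.
SERVICE_RE = re.compile(r"git-[a-z]+(?:-[a-z]+)*")


def is_valid_service(name: str) -> bool:
    """Return True only for ASCII-only git service names."""
    return SERVICE_RE.fullmatch(name) is not None
```

With this pattern, "git-upload-pack" and "git-receive-pack" pass, while
a name containing a non-ASCII letter is rejected.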
|
|
Non-ASCII digits would be interpreted as zeroes when converted
to integers. While we're at it, ensure the Status: code consists
of ASCII digits, too; though I would not expect
git-http-backend(1) or cgit(1) to start spewing non-ASCII digits
at us.
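
The same idea can be sketched in Python, assuming a hypothetical
`parse_status` helper: match `[0-9]` explicitly instead of `\d`, which
also matches digits from other scripts on str patterns. Note that
Python's int() accepts such digits rather than yielding zero, so this
is only an analogue of the Perl behavior described above.

```python
import re
from typing import Optional

# [0-9] matches ASCII digits only; \d on a str pattern would also match
# digits from other scripts (e.g. Arabic-Indic).
STATUS_RE = re.compile(r"Status:[ \t]*([0-9]{3})\b")


def parse_status(line: str) -> Optional[int]:
    """Return the 3-digit status code, or None if it is not ASCII digits."""
    m = STATUS_RE.match(line)
    return int(m.group(1)) if m else None
```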
|
|
These modules are unmaintained upstream at the moment, but I'll
be able to help with the intended maintainer once/if CPAN
ownership is transferred. OTOH, we've been waiting for that
transfer for several years, now...
Changes I intend to make:
* EPOLLEXCLUSIVE for Linux
* remove unused fields wasting memory
* kqueue bugfixes e.g. https://rt.cpan.org/Ticket/Display.html?id=116615
* accept4 support
And some lower priority experiments:
* switch to EV_ONESHOT / EPOLLONESHOT (incompatible changes)
* nginx-style buffering to tmpfile instead of string array
* sendfile off tmpfile buffers
* io_uring maybe?
|
|
We can reduce the configuration needed to run cgit by reusing
the static file handling logic of the dumb git HTTP protocol.
I hate logos and icons, so don't expect public-inbox.org or
80x24.org to ever have those to waste users' bandwidth with :P
But I expect other users to find this useful.
|
|
Reads to git-http-backend(1) could fail or EOF prematurely,
so we must be ready for that case.
Furthermore, cgit (and possibly other CGI programs) uses LF
instead of CRLF to terminate header lines, so support those
programs, too.
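
A rough Python sketch of header splitting that tolerates both line
endings and an incomplete header block. The function name
`split_cgi_output` is hypothetical and is not taken from the Perl
module.

```python
import re
from typing import List, Optional, Tuple

# A blank line ends the header block; accept both CRLF and bare LF.
HEADER_END = re.compile(rb"\r?\n\r?\n")


def split_cgi_output(buf: bytes) -> Optional[Tuple[List[bytes], bytes]]:
    """Split buffered CGI output into (header lines, body so far).

    Returns None when the header block is incomplete, which the caller
    must treat as an error if the pipe has already hit EOF.
    """
    m = HEADER_END.search(buf)
    if m is None:
        return None
    headers = re.split(rb"\r?\n", buf[: m.start()])
    return headers, buf[m.end():]
```

Returning None instead of raising lets the caller keep reading when
more data may still arrive, and fail only once the pipe is at EOF.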
|
|
This will be useful for other CGI wrappers we make.
This also fixes a bug with some PSGI servers which did not
present a real IO::Handle in the psgi.input env field.
|
|
This will be useful for reproducibility when mirroring
coderepos and generating diffs.
|
|
Was: ("repobrowse: port patch generation over to qspawn")
We'll be using it for githttpbackend and maybe other things.
|
|
Hopefully this helps people familiarize themselves with
the source code.
|
|
We must detect EOF when reading a POST body with standard PSGI servers.
This does not affect deployments using the standard public-inbox-httpd;
but most smaller inboxes should be able to get away using a generic
PSGI server.
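
As an illustration only, here is a Python sketch of reading a request
body while checking for premature EOF. `read_body` and its parameters
are hypothetical names; the stream stands in for psgi.input.

```python
import io


def read_body(stream, content_length: int, chunk_size: int = 8192) -> bytes:
    """Read exactly content_length bytes, failing on premature EOF."""
    chunks = []
    remaining = content_length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # EOF before the declared length was reached
            got = content_length - remaining
            raise EOFError(f"got {got} of {content_length} bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Usage with an in-memory stream standing in for the request input:
body = read_body(io.BytesIO(b"abcd"), 4)
```

Without the empty-read check, a loop like this would spin forever on a
client that closed the connection early.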
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Fewer returns improve readability and the diffstat agrees.
|
|
Fewer conditionals means there are fewer code paths to test
and makes things easier to read.
|
|
Use a more meaningful variable name for the Qspawn
object, since this module is the reference for its
use.
|
|
Notes for future developers (myself included) since we
can't assume people can read my mind.
|
|
We do not need to import IO::File into the main programs
since Perl 5.8+ supports literal "undef" for generating
anonymous temporary file handles.
|
|
This was sloppy code; all calls need to be checked
for failure.
|
|
Currently only for git-http-backend use, this allows limiting
the number of spawned processes per-inbox or by group, if there
are multiple large inboxes amidst a sea of small ones.
For example, a "big" limiter could be shared between multiple
big inboxes:
    [limiter "big"]
        max = 4
    [publicinbox "git"]
        address = git@vger.kernel.org
        mainrepo = /path/to/git.git
        ; shared limiter with giant:
        httpbackendmax = big
    [publicinbox "giant"]
        address = giant@project.org
        mainrepo = /path/to/giant.git
        ; shared limiter with git:
        httpbackendmax = big
    ; This is a tiny inbox, use the default limiter with 32 slots:
    [publicinbox "meta"]
        address = meta@public-inbox.org
        mainrepo = /path/to/meta.git
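
The lookup implied by the config above could be sketched in Python as
follows. The class names, `for_inbox`, and `max_procs` are hypothetical
and only model the name-to-limiter mapping, not the actual Perl
implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

DEFAULT_MAX = 32  # default slot count mentioned in the commit message


@dataclass
class Limiter:
    """Hypothetical limiter: caps concurrently running backend processes."""
    max_procs: int = DEFAULT_MAX
    running: int = 0


@dataclass
class Limiters:
    named: Dict[str, Limiter] = field(default_factory=dict)
    default: Limiter = field(default_factory=Limiter)

    def for_inbox(self, httpbackendmax: Optional[str]) -> Limiter:
        """Resolve an inbox's httpbackendmax value to a limiter object."""
        if httpbackendmax is None:
            return self.default
        return self.named[httpbackendmax]


limiters = Limiters(named={"big": Limiter(max_procs=4)})
git = limiters.for_inbox("big")      # shares 4 slots with giant
giant = limiters.for_inbox("big")
meta = limiters.for_inbox(None)      # default limiter, 32 slots
```

Because "git" and "giant" resolve to the same object, their spawned
processes count against one shared limit.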
|
|
And bump the default limit to 32 so we match git-daemon
behavior. This shall allow us to configure different levels
of concurrency for different repositories and prevent clones
of giant repos from stalling service to small repos.
|
|
Hopefully this can reduce memory overhead for people that
use one-shot CGI.
|
|
No need to keep an extra array around for this.
|
|
This will allow cache proxies such as Varnish to avoid
caching data sent by us.
|
|
This means we can still show non-git users a somewhat browseable
URL with a link to the README.html file while allowing git users
to type less when cloning.
All of the following are supported:
git clone https://public-inbox.org/ public-inbox
git clone https://public-inbox.org/public-inbox
git clone https://public-inbox.org/public-inbox.git
torsocks git clone http://ou63pmih66umazou.onion/public-inbox
|
|
No point in forcing users to pass a hashref/object to
get a single git directory.
|
|
Apparently git-http-backend exits with a non-zero
status on shallow clones (due to git-upload-pack),
so there is a to-be-fixed bug in git.git
http://mid.gmane.org/20160621112303.GA21973@dcvr.yhbt.net
http://mid.gmane.org/20160621121041.GA29156@sigill.intra.peff.net
|
|
Plack::Request is unnecessary overhead for this given the
strictness of git-http-backend. Furthermore, having to make
commit 311c2adc8c63 ("avoid Plack::Request parsing body")
to avoid tempfiles should not have been necessary.
|
|
The generic PSGI code needs to avoid resource leaks if
smart cloning is disabled (due to resource constraints).
|
|
This makes more sense as it keeps management of rpipe
nice and neat.
|
|
We need to avoid circular references in the generic PSGI layer;
do so by abusing DESTROY.
|
|
Having an excessive amount of git-pack-objects processes is
dangerous to the health of the server. Queue up process spawning
for long-running responses and serve them sequentially, instead.
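
A minimal Python sketch of this kind of spawn queue, assuming a
hypothetical `SpawnQueue` class whose jobs are plain callables. It
illustrates the queueing idea only and is not the module's code.

```python
from collections import deque
from typing import Callable, Deque


class SpawnQueue:
    """Run at most max_procs jobs at once; queue the rest in FIFO order."""

    def __init__(self, max_procs: int) -> None:
        self.max_procs = max_procs
        self.running = 0
        self.waiting: Deque[Callable[[], None]] = deque()

    def start(self, spawn: Callable[[], None]) -> None:
        """Spawn now if a slot is free, otherwise queue the request."""
        if self.running < self.max_procs:
            self.running += 1
            spawn()
        else:
            self.waiting.append(spawn)

    def finish(self) -> None:
        """Call when a job exits: hand its slot to the next waiter, if any."""
        if self.waiting:
            self.waiting.popleft()()  # slot is reused, running is unchanged
        else:
            self.running -= 1
```

With `max_procs` set to 1, a second clone request waits until
`finish()` is called for the first.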
|
|
We will have clients dropping connections during long clone
and fetch operations; so do not retain references holding
backend processes once we detect a client has dropped.
|
|
Only check query parameters since there's no useful body
in there.
|
|
This bit is still being redone to support gigantic repos.
|
|
This simplifies the code somewhat; but it could probably
still be made simpler. It will need to support command
queueing so that expensive processes can be queued up
instead of all running at once.
|
|
We can rely entirely on getline + close callbacks
and be compatible with 100% of PSGI servers.
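
To illustrate the shape of such a body object, here is a Python
analogue with only getline() and close(). `PipeBody` and `on_close` are
hypothetical names; PSGI itself is Perl and signals EOF from getline
with undef, modelled here as None.

```python
import io
from typing import BinaryIO, Callable, Optional


class PipeBody:
    """Response body exposing only getline() and close()."""

    def __init__(self, rpipe: BinaryIO, on_close: Callable[[], None],
                 bufsize: int = 8192) -> None:
        self.rpipe = rpipe
        self.on_close = on_close
        self.bufsize = bufsize

    def getline(self) -> Optional[bytes]:
        """Return the next chunk, or None at EOF."""
        buf = self.rpipe.read(self.bufsize)
        return buf if buf else None

    def close(self) -> None:
        """Release the pipe and let the caller reap the child process."""
        self.rpipe.close()
        self.on_close()


# Usage with an in-memory stand-in for the child's stdout pipe:
closed = []
body = PipeBody(io.BytesIO(b"0008NAK\n"), lambda: closed.append(True))
first = body.getline()
last = body.getline()
body.close()
```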
|
|
We will figure out a different way to avoid overloading...
|
|
Mostly stolen from git upstream, these should prevent any caches
such as varnish or squid from acting improperly.
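
For reference, a Python sketch of the kind of headers meant here. The
values are what git's http-backend sends for uncacheable responses as
best I recall; treat the exact strings as an assumption and verify them
against git.git. The function name is hypothetical.

```python
from typing import List, Tuple


def no_cache_headers() -> List[Tuple[str, str]]:
    """Headers telling intermediaries not to cache a dynamic response.

    Assumed to mirror git's http-backend; verify before relying on
    the exact strings.
    """
    return [
        ("Expires", "Fri, 01 Jan 1980 00:00:00 GMT"),
        ("Pragma", "no-cache"),
        ("Cache-Control", "no-cache, max-age=0, must-revalidate"),
    ]
```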
|
|
We can maintain the client HTTP connection if the process exited
with failure as long as we terminated our own response properly.
|
|
When serving large static files or large packs, we may call
Danga::Socket::write directly to queue up callbacks to resume
reading and defer firing them until the socket is writable.
This prevents us from scheduling writes or buffering until we
know the socket is writable and prevents needless buffering by
Danga::Socket when faced with slow clients.
For smart clones, this comes at the cost of throttling the
output of "git pack-objects" to the speed of the client
connection. This is probably not ideal, but is the behavior of
the standard git-daemon, too; and is preferable to running the
httpd out-of-memory. Buffering to the filesystem may be an
option in the future...
|
|
Since we use sysread, we must use sysseek for symmetry although
PerlIO may be doing a real lseek with "seek", anyways.
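
The same pairing can be shown in Python, where os.lseek and os.read are
thin wrappers around lseek(2) and read(2), much like sysseek and
sysread in Perl. `read_range` is a hypothetical helper for
illustration.

```python
import os
import tempfile


def read_range(fd: int, offset: int, length: int) -> bytes:
    """Read up to length bytes starting at offset using unbuffered calls."""
    os.lseek(fd, offset, os.SEEK_SET)
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF: file is shorter than requested
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# Usage: read bytes 2..5 of a temporary file
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"0123456789")
    tmp.flush()
    part = read_range(tmp.fileno(), 2, 4)
```

Keeping both calls unbuffered means the seek position and the read
position cannot drift apart through an intermediate buffer.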
Fixes: 310819ea86ac ("git-http-backend: favor sysread for regular files")
|
|
We need to abort connections properly if a response is prematurely
truncated. This includes problems with serving static files, since
a clumsy admin or broken FS could return truncated responses and
inadvertently leave a client waiting (since the client saw
"Content-Length" in the header and expected a certain length).
|
|
The blocking PSGI server may cause EINTR to be hit, here.
|
|
Server admins may not be able to afford to have too many
git-pack-objects processes running at once. Since PSGI
HTTP servers should already be configured to use multiple
processes for other requests; limit concurrency of smart
backends to one; and fall back to dumb responses if we're
already generating a pack.
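
A small Python sketch of the "one smart backend, otherwise dumb"
decision, using a non-blocking lock. `choose_backend` and
`release_smart` are hypothetical names, and a per-process lock is an
assumption about scope.

```python
import threading

# One smart backend (pack generation) at a time per process.
smart_lock = threading.Lock()


def choose_backend() -> str:
    """Return "smart" if the single slot is free, else "dumb".

    The caller must call release_smart() once a smart response finishes.
    """
    if smart_lock.acquire(blocking=False):
        return "smart"
    return "dumb"


def release_smart() -> None:
    """Free the slot so the next request may use the smart backend."""
    smart_lock.release()
```

Trying the lock without blocking means a busy server answers
immediately with the dumb protocol instead of making the client wait.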
|
|
Using http.getanyfile still keeps the http-backend process
alive, so it's better to break out of that process and
handle serving entirely within the HTTP server.
|
|
This is used all over the place, but may not be in the future,
so ensure we explicitly load it ourselves.
|
|
We do not need line buffering, here; so favor sysread to
bypass extra copies which may be done by normal read.
|
|
We'll have to use it some more before deciding it is a public
interface. I do hope for it to be a usable public interface
one day for other users.
|
|
Apache2 mod_perl does not give us a real file handle, so
we must translate that before giving that to git-http-backend(1).
Also, parse the Status: header correctly for errors since we
failed to set %ENV properly before the previous fix for SpawnPP.
|
|
It is not needed as we know git uses CRLF termination.
|
|
This should reduce overhead of spawning git processes
from our long-running httpd and nntpd servers.
|