Date | Commit message |
|
We must detect EOF when reading a POST body with standard PSGI servers.
This does not affect deployments using the standard public-inbox-httpd;
but most smaller inboxes should be able to get away with using a
generic PSGI server.
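
The EOF check described above can be sketched as follows. This is an illustration in Python, not the Perl code from this commit: `read_body` and its parameters are hypothetical names, and `stream` stands in for the request input handle of a generic server.

```python
import io


def read_body(stream, content_length, chunk_size=8192):
    """Read exactly content_length bytes, raising if EOF arrives early."""
    chunks = []
    remaining = content_length
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # An empty read means EOF: the client stopped sending before
            # delivering everything promised by Content-Length.
            raise EOFError(f"body truncated, {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

Without the empty-read check, a loop of this shape would spin forever on a client that disconnects mid-upload.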
|
|
Using update-copyrights from gnulib.
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Fewer returns improve readability, and the diffstat agrees.
|
|
Fewer conditionals mean there are fewer code paths to test,
and make things easier to read.
|
|
Use a more meaningful variable name for the Qspawn
object, since this module is the reference for its
use.
|
|
Notes for future developers (myself included) since we
can't assume people can read my mind.
|
|
We do not need to import IO::File into the main programs
since Perl 5.8+ supports literal "undef" for generating
anonymous temporary file handles.
|
|
This was sloppy code; all calls need to be checked
for failure.
|
|
Currently only for git-http-backend use, this allows limiting
the number of spawned processes per-inbox or by group, if there
are multiple large inboxes amidst a sea of small ones.
For example, a "big" limiter could be defined once and shared
between multiple big inboxes:
[limiter "big"]
max = 4
[publicinbox "git"]
address = git@vger.kernel.org
mainrepo = /path/to/git.git
; shared limiter with giant:
httpbackendmax = big
[publicinbox "giant"]
address = giant@project.org
mainrepo = /path/to/giant.git
; shared limiter with git:
httpbackendmax = big
; This is a tiny inbox, use the default limiter with 32 slots:
[publicinbox "meta"]
address = meta@public-inbox.org
mainrepo = /path/to/meta.git
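
A minimal sketch of the limiter idea, in Python rather than the project's Perl. The class, method and table names here are hypothetical; only the slot counts (4 for "big", 32 for the default) and the inbox names come from the config above.

```python
from collections import deque


class Limiter:
    """Run at most max_slots jobs at once; queue the rest in FIFO order."""

    def __init__(self, max_slots=32):
        self.max_slots = max_slots
        self.running = 0
        self.queue = deque()

    def start(self, job):
        """Run job now if a slot is free, else queue it; True if started."""
        if self.running < self.max_slots:
            self.running += 1
            job()
            return True
        self.queue.append(job)
        return False

    def finish(self):
        """Release a slot and start the next queued job, if any."""
        self.running -= 1
        if self.queue:
            self.running += 1
            self.queue.popleft()()


# Hypothetical lookup mirroring the config: "git" and "giant" share "big",
# every other inbox falls through to the default limiter.
limiters = {"default": Limiter(32), "big": Limiter(4)}
inbox_limiter = {"git": "big", "giant": "big"}


def limiter_for(inbox):
    return limiters[inbox_limiter.get(inbox, "default")]
```

Here `job` is assumed to spawn a process and return immediately, with `finish` called once that process exits.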
|
|
And bump the default limit to 32 so we match git-daemon
behavior. This shall allow us to configure different levels
of concurrency for different repositories and prevent clones
of giant repos from stalling service to small repos.
|
|
Hopefully this can reduce memory overhead for people that
use one-shot CGI.
|
|
No need to keep an extra array around for this.
|
|
This will allow cache proxies such as Varnish to avoid
caching data sent by us.
|
|
This means we can still show non-git users a somewhat browseable
URL with a link to the README.html file while allowing git users
to type less when cloning.
All of the following are supported:
git clone https://public-inbox.org/ public-inbox
git clone https://public-inbox.org/public-inbox
git clone https://public-inbox.org/public-inbox.git
torsocks git clone http://ou63pmih66umazou.onion/public-inbox
|
|
No point in forcing users to pass a hashref/object to
get a single git directory.
|
|
Apparently git-http-backend exits with a non-zero
status on shallow clones (due to git-upload-pack),
so there is a to-be-fixed bug in git.git:
http://mid.gmane.org/20160621112303.GA21973@dcvr.yhbt.net
http://mid.gmane.org/20160621121041.GA29156@sigill.intra.peff.net
|
|
Plack::Request is unnecessary overhead for this given the
strictness of git-http-backend. Furthermore, having to make
commit 311c2adc8c63 ("avoid Plack::Request parsing body")
to avoid tempfiles should not have been necessary.
|
|
The generic PSGI code needs to avoid resource leaks if
smart cloning is disabled (due to resource constraints).
|
|
This makes more sense as it keeps management of rpipe
nice and neat.
|
|
We need to avoid circular references in the generic PSGI layer;
do so by abusing DESTROY.
|
|
Having an excessive amount of git-pack-objects processes is
dangerous to the health of the server. Queue up process spawning
for long-running responses and serve them sequentially, instead.
|
|
We will have clients dropping connections during long clone
and fetch operations; so do not retain references holding
backend processes once we detect a client has dropped.
|
|
Only check query parameters since there's no useful body
in there.
|
|
This bit is still being redone to support gigantic repos.
|
|
This simplifies the code somewhat; but it could probably
still be made simpler. It will still need to support command
queueing so that expensive processes can wait their turn
instead of all running at once.
|
|
We can rely entirely on getline + close callbacks
and be compatible with 100% of PSGI servers.
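
The getline + close contract can be sketched like this. It is a Python analogue, not the module's Perl; `PipeBody` and `serve` are hypothetical names, and the server loop is only an assumption about how a generic server drains such a body.

```python
import subprocess
import sys


class PipeBody:
    """Response body exposing only getline() and close()."""

    def __init__(self, argv, bufsize=8192):
        self.proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
        self.bufsize = bufsize

    def getline(self):
        """Return the next chunk of child output, or None at EOF."""
        chunk = self.proc.stdout.read(self.bufsize)
        return chunk if chunk else None

    def close(self):
        """Release the pipe and reap the child; return its exit status."""
        self.proc.stdout.close()
        return self.proc.wait()


def serve(body, write):
    """Pull chunks until None, then always close, even if write() fails."""
    try:
        while (chunk := body.getline()) is not None:
            write(chunk)
    finally:
        body.close()
```

Because the server only ever pulls, the body needs no knowledge of the event loop (or lack of one) it is running under.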
|
|
We will figure out a different way to avoid overloading...
|
|
Mostly stolen from git upstream, these headers should prevent any
caches such as Varnish or Squid from acting improperly.
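
For reference, a sketch of the no-cache headers sent by git's own http-backend (its `hdr_nocache` helper), written in Python for illustration. Whether this commit uses exactly the same set is an assumption; `nocache_headers` is a hypothetical name.

```python
def nocache_headers():
    """Headers marking a response as uncacheable, modeled on git upstream."""
    return [
        # A date far in the past, so the response is already stale.
        ("Expires", "Fri, 01 Jan 1980 00:00:00 GMT"),
        # For HTTP/1.0 caches that do not understand Cache-Control.
        ("Pragma", "no-cache"),
        ("Cache-Control", "no-cache, max-age=0, must-revalidate"),
    ]
```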
|
|
We can maintain the client HTTP connection if the process exited
with failure as long as we terminated our own response properly.
|
|
When serving large static files or large packs, we may call
Danga::Socket::write directly to queue up callbacks to resume
reading and defer firing them until the socket is writable.
This prevents us from scheduling writes or buffering until we
know the socket is writable and prevents needless buffering by
Danga::Socket when faced with slow clients.
For smart clones, this comes at the cost of throttling the
output of "git pack-objects" to the speed of the client
connection. This is probably not ideal, but is the behavior of
the standard git-daemon, too; and is preferable to running the
httpd out of memory. Buffering to the filesystem may be an
option in the future...
|
|
Since we use sysread, we must use sysseek for symmetry although
PerlIO may be doing a real lseek with "seek", anyways.
Fixes: 310819ea86ac ("git-http-backend: favor sysread for regular files")
|
|
We need to abort connections properly if a response is prematurely
truncated. This includes problems with serving static files, since
a clumsy admin or broken FS could return truncated responses and
inadvertently leave a client waiting (since the client saw
"Content-Length" in the header and expected a certain length).
|
|
The blocking PSGI server may cause EINTR to be hit, here.
|
|
Server admins may not be able to afford to have too many
git-pack-objects processes running at once. Since PSGI
HTTP servers should already be configured to use multiple
processes for other requests, limit concurrency of smart
backends to one, and fall back to dumb responses if we're
already generating a pack.
|
|
Using http.getanyfile still keeps the http-backend process
alive, so it's better to break out of that process and
handle serving entirely within the HTTP server.
|
|
This is used all over the place, but may not be in the future,
so ensure we explicitly load it ourselves.
|
|
We do not need line buffering, here; so favor sysread to
bypass extra copies which may be done by normal read.
|
|
We'll have to use it some more before deciding it is a public
interface. I do hope for it to be a usable public interface
one day for other users.
|
|
Apache2 mod_perl does not give us a real file handle, so
we must translate that before giving that to git-http-backend(1).
Also, parse the Status: header correctly for errors, since we failed
to set %ENV properly before the previous fix for SpawnPP.
|
|
It is not needed as we know git uses CRLF termination.
|
|
This should reduce overhead of spawning git processes
from our long-running httpd and nntpd servers.
|
|
This will allow us to more easily read and test later.
|
|
Even with output buffering disabled via IO::Handle::autoflush,
writes are not atomic unless it is a single argument passed to
"print". Multiple arguments to "print" will show up as multiple
calls to write(2) instead of a single, atomic writev(2).
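
The remedy, joining everything into one buffer before writing, can be sketched as below. This is Python, whose I/O layer differs from Perl's, so it only illustrates the "one buffer, one write" idea; `write_once` is a hypothetical helper.

```python
import os


def write_once(fd, *parts):
    """Join parts first so the kernel sees one write(2), not one per part."""
    buf = b"".join(parts)
    # A single small write to a pipe is not interleaved with other writers;
    # several separate writes give no such guarantee.
    return os.write(fd, buf)
```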
|
|
git-http-backend may take a while; ensure we can process other
requests while waiting on it. We currently do this via
Danga::Socket in public-inbox-httpd; but avoid exposing this
internal implementation detail to the PSGI interface and
instead only expose a callback via: $env->{'pi-httpd.async'}
|
|
Designing for asynchronous, non-blocking operations makes
adapting for synchronous, blocking operation easy.
Going the other way around is not easy, so do it now and
allow us to be more easily adapted for non-blocking use
in the next commit...
|
|
This allows us to stream the output to the client without buffering
everything up-front. Next, we'll let Danga::Socket (or AE in the
future) wait for readability.
|
|
Relying on Plack::Handler::CGI is much easier for long-term
maintenance and development.
Nowadays, we even include our own httpd implementation to
facilitate easier deployment with PSGI/Plack.
|
|
This requires POST and (small file) upload support from the
PSGI/Plack web server. CGI.pm is currently not supported with
this feature.
We'll serve everything git can handle by default for performance
in the general case.
To avoid introducing cognitive overhead for sysadmins managing
existing HTTP backends, we do not introduce new configuration
directives.
Thus, setting http.uploadpack=false in the relevant git config
file for each public-inbox (ssoma) git repo will disable smart
HTTP for CPU/memory-constrained systems.
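
For instance, the setting above would look like this in the git config file of an inbox's repository (which file that is depends on the deployment and is not shown here):

```ini
; disable smart HTTP (git-upload-pack) for this repository only
[http]
	uploadpack = false
```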
Technically we could support http.receivepack to allow posting
messages to a public-inbox over HTTP(S), but that breaks
the public-inbox model of encouraging users to Cc: everyone.
Again, we encourage users to Cc: everyone to reduce the chance
of a public-inbox becoming a centralized point of
failure/censorship.
|