user/dev discussion of public-inbox itself
 help / color / Atom feed
* [PATCH 00/13] support parsing cgitrc and spawning cgit
@ 2019-03-12  4:00 Eric Wong
  2019-03-12  4:00 ` [PATCH 01/13] git: add "commit_title" method Eric Wong
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

Parsing an existing cgitrc reduces the amount of work some for
admins (who already maintain cgit instances) by allowing them to
skip the tedious setup of of setting up [coderepo "..."]
sections.

We currently do not support "scan-path", "project-list" or
macros in cgitrc processing, yet.  So it's expected that
"repo.url" and "repo.path" be configured in the cgitrc for each
code repo.  (I started using cgit in 2008 before cgit supported
path scanning, and never updated my setup :x)

Since cgit does not serve smart HTTP fetch/clone, we can
intercept requests for those and route such requests to
git-http-backend(1) using the same mechanisms we use for
serving inboxes.

Eric Wong (13):
  git: add "commit_title" method
  viewvcs: preliminary support for showing non-blobs
  viewvcs: match 8000-byte lookup for git
  spawn: support RLIMIT_CPU, RLIMIT_DATA and RLIMIT_CORE
  support publicinbox.cgitrc directive
  githttpbackend: move more psgi.input handling into subroutine
  githttpbackend: check for other errors and relax CRLF check
  spawn: support absolute paths
  cgit: support running cgit as a standalone CGI
  www: wire up cgit as a 404 handler if cgitrc is configured
  qspawn: wire up RLIMIT_* handling to limiters
  cgit: use a dedicated named limiter
  spawn: require soft and hard entries in RLIMIT_* handling

 Documentation/public-inbox-config.pod | 44 +++++++++++++-
 MANIFEST                              |  2 +
 examples/cgit.psgi                    | 29 +++++++++
 lib/PublicInbox/Cgit.pm               | 88 +++++++++++++++++++++++++++
 lib/PublicInbox/Config.pm             | 61 +++++++++++++++++--
 lib/PublicInbox/Git.pm                | 20 +++++-
 lib/PublicInbox/GitHTTPBackend.pm     | 29 ++++-----
 lib/PublicInbox/Qspawn.pm             | 41 ++++++++++++-
 lib/PublicInbox/SolverGit.pm          | 14 +++--
 lib/PublicInbox/Spawn.pm              | 40 ++++++++++--
 lib/PublicInbox/SpawnPP.pm            |  9 ++-
 lib/PublicInbox/ViewVCS.pm            | 34 ++++++++++-
 lib/PublicInbox/WWW.pm                | 20 +++++-
 t/solver_git.t                        |  9 ++-
 t/spawn.t                             | 19 ++++++
 15 files changed, 416 insertions(+), 43 deletions(-)
 create mode 100644 examples/cgit.psgi
 create mode 100644 lib/PublicInbox/Cgit.pm

-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 01/13] git: add "commit_title" method
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 02/13] viewvcs: preliminary support for showing non-blobs Eric Wong
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

This will be useful for extracting titles/subjects from
commit objects when displaying commits.
---
 lib/PublicInbox/Git.pm | 7 +++++++
 t/solver_git.t         | 9 ++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index a756684..265c3fb 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -296,6 +296,13 @@ sub pub_urls {
 	local_nick($self);
 }
 
+sub commit_title ($$) {
+	my ($self, $oid) = @_; # PublicInbox::Git, $sha1hex
+	my $buf = cat_file($self, $oid) or return;
+	utf8::decode($$buf);
+	($$buf =~ /\r?\n\r?\n([^\r\n]+)\r?\n?/)[0]
+}
+
 1;
 __END__
 =pod
diff --git a/t/solver_git.t b/t/solver_git.t
index 8de6398..6f0ce77 100644
--- a/t/solver_git.t
+++ b/t/solver_git.t
@@ -40,7 +40,14 @@ sub deliver_patch ($) {
 
 deliver_patch('t/solve/0001-simple-mod.patch');
 
-$ibx->{-repo_objs} = [ PublicInbox::Git->new($git_dir) ];
+my $git = PublicInbox::Git->new($git_dir);
+is('public-inbox 1.0.0',
+	$git->commit_title('cb7c42b1e15577ed2215356a2bf925aef59cdd8d'),
+	'commit_title works on 1.0.0');
+
+is(undef, $git->commit_title('impossible'), 'undef on impossible object');
+
+$ibx->{-repo_objs} = [ $git ];
 my $res;
 my $solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
 open my $log, '+>>', "$mainrepo/solve.log" or die "open: $!";
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 02/13] viewvcs: preliminary support for showing non-blobs
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
  2019-03-12  4:00 ` [PATCH 01/13] git: add "commit_title" method Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 03/13] viewvcs: match 8000-byte lookup for git Eric Wong
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

Eventually, we'll have special displays for various git objects
(commit, tree, tag).  But for now, we'll just use git-show
to spew whatever comes from git.
---
 lib/PublicInbox/SolverGit.pm | 11 +++++++----
 lib/PublicInbox/ViewVCS.pm   | 28 ++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 463a9b6..cd0f94a 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -62,14 +62,15 @@ sub ERR ($$) {
 	die $err;
 }
 
-# look for existing blobs already in git repos
+# look for existing objects already in git repos
 sub solve_existing ($$) {
 	my ($self, $want) = @_;
 	my $oid_b = $want->{oid_b};
+	my $have_hints = scalar keys %$want > 1;
 	my @ambiguous; # Array of [ git, $oids]
 	foreach my $git (@{$self->{gits}}) {
 		my ($oid_full, $type, $size) = $git->check($oid_b);
-		if (defined($type) && $type eq 'blob') {
+		if (defined($type) && (!$have_hints || $type eq 'blob')) {
 			return [ $git, $oid_full, $type, int($size) ];
 		}
 
@@ -480,10 +481,11 @@ sub resolve_patch ($$) {
 		die "Loop detected solving $cur_want\n";
 	}
 	if (my $existing = solve_existing($self, $want)) {
+		my ($found_git, undef, $type, undef) = @$existing;
 		dbg($self, "found $cur_want in " .
-			join("\n", $existing->[0]->pub_urls));
+			join("\n", $found_git->pub_urls));
 
-		if ($cur_want eq $self->{oid_want}) { # all done!
+		if ($cur_want eq $self->{oid_want} || $type ne 'blob') {
 			eval { delete($self->{user_cb})->($existing) };
 			die "E: $@" if $@;
 			return;
@@ -540,6 +542,7 @@ sub new {
 }
 
 # recreate $oid_want using $hints
+# hints keys: path_a, path_b, oid_a
 # Calls {user_cb} with: [ ::Git object, oid_full, type, size, di (diff_info) ]
 # with found object, or undef if nothing was found
 # Calls {user_cb} with a string error on fatal errors
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index f537451..4a97b9a 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -67,6 +67,33 @@ sub stream_large_blob ($$$$) {
 	});
 }
 
+sub show_other ($$$$) {
+	my ($ctx, $res, $logref, $fn) = @_;
+	my ($git, $oid, $type, $size) = @$res;
+	if ($size > $max_size) {
+		$$logref = "$oid is too big to show\n" . $$logref;
+		return html_page($ctx, 200, $logref);
+	}
+	my $cmd = ['git', "--git-dir=$git->{git_dir}",
+		qw(show --encoding=UTF-8 --no-color --no-abbrev), $oid ];
+	my $qsp = PublicInbox::Qspawn->new($cmd);
+	my $env = $ctx->{env};
+	$qsp->psgi_qx($env, undef, sub {
+		my ($bref) = @_;
+		if (my $err = $qsp->{err}) {
+			utf8::decode($$err);
+			$$logref .= "git show error: $err";
+			return html_page($ctx, 500, $logref);
+		}
+		my $l = PublicInbox::Linkify->new;
+		utf8::decode($$bref);
+		$l->linkify_1($$bref);
+		$$bref = '<pre>'. $l->linkify_2(ascii_html($$bref));
+		$$bref .= '</pre><hr>' . $$logref;
+		html_page($ctx, 200, $bref);
+	});
+}
+
 sub solve_result {
 	my ($ctx, $res, $log, $hints, $fn) = @_;
 
@@ -86,6 +113,7 @@ sub solve_result {
 	$ref eq 'ARRAY' or return html_page($ctx, 500, \$log);
 
 	my ($git, $oid, $type, $size, $di) = @$res;
+	return show_other($ctx, $res, \$log, $fn) if $type ne 'blob';
 	my $path = to_filename($di->{path_b} || $hints->{path_b} || 'blob');
 	my $raw_link = "(<a\nhref=$path>raw</a>)";
 	if ($size > $max_size) {
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 03/13] viewvcs: match 8000-byte lookup for git
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
  2019-03-12  4:00 ` [PATCH 01/13] git: add "commit_title" method Eric Wong
  2019-03-12  4:00 ` [PATCH 02/13] viewvcs: preliminary support for showing non-blobs Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 04/13] spawn: support RLIMIT_CPU, RLIMIT_DATA and RLIMIT_CORE Eric Wong
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

No need to scan the entire string, but prefer to match git
behavior.  This might be faster if/when Perl can create
substrings efficiently using CoW.

Fix a 80-column violation while we're at it.
---
 lib/PublicInbox/ViewVCS.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 4a97b9a..b90cd8c 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -131,14 +131,14 @@ sub solve_result {
 		return html_page($ctx, 500, \$log);
 	}
 
-	my $binary = index($$blob, "\0") >= 0;
+	my $bin = index(substr($$blob, 0, $BIN_DETECT), "\0") >= 0;
 	if (defined $fn) {
 		my $h = [ 'Content-Length', $size, 'Content-Type' ];
-		push(@$h, ($binary ? 'application/octet-stream' : 'text/plain'));
+		push(@$h, ($bin ? 'application/octet-stream' : 'text/plain'));
 		return delete($ctx->{-wcb})->([200, $h, [ $$blob ]]);
 	}
 
-	if ($binary) {
+	if ($bin) {
 		$log = "<pre>$oid $type $size bytes (binary)" .
 			" $raw_link</pre>" . $log;
 		return html_page($ctx, 200, \$log);
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 04/13] spawn: support RLIMIT_CPU, RLIMIT_DATA and RLIMIT_CORE
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (2 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 03/13] viewvcs: match 8000-byte lookup for git Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 05/13] support publicinbox.cgitrc directive Eric Wong
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

We'll be spawning cgit and git-diff, which can take gigantic
amounts of CPU time and/or heap given the right (ermm... wrong)
input.  Limit the damage that large/expensive diffs can cause.
---
 lib/PublicInbox/Spawn.pm   | 43 ++++++++++++++++++++++++++++++++++----
 lib/PublicInbox/SpawnPP.pm |  9 ++++++--
 t/spawn.t                  | 18 ++++++++++++++++
 3 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index 91a3c12..8ea255a 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -22,6 +22,8 @@ our @EXPORT_OK = qw/which spawn popen_rd/;
 my $vfork_spawn = <<'VFORK_SPAWN';
 #include <sys/types.h>
 #include <sys/uio.h>
+#include <sys/time.h>
+#include <sys/resource.h>
 #include <unistd.h>
 #include <alloca.h>
 #include <signal.h>
@@ -74,11 +76,12 @@ static void xerr(const char *msg)
  * whatever we'll need in the future.
  * Be sure to update PublicInbox::SpawnPP if this changes
  */
-int public_inbox_fork_exec(int in, int out, int err,
-			SV *file, SV *cmdref, SV *envref)
+int pi_fork_exec(int in, int out, int err,
+			SV *file, SV *cmdref, SV *envref, SV *rlimref)
 {
 	AV *cmd = (AV *)SvRV(cmdref);
 	AV *env = (AV *)SvRV(envref);
+	AV *rlim = (AV *)SvRV(rlimref);
 	const char *filename = SvPV_nolen(file);
 	pid_t pid;
 	char **argv, **envp;
@@ -99,12 +102,27 @@ int public_inbox_fork_exec(int in, int out, int err,
 	pid = vfork();
 	if (pid == 0) {
 		int sig;
+		I32 i, max;
 
 		REDIR(in, 0);
 		REDIR(out, 1);
 		REDIR(err, 2);
 		for (sig = 1; sig < NSIG; sig++)
 			signal(sig, SIG_DFL); /* ignore errors on signals */
+
+		max = av_len(rlim);
+		for (i = 0; i < max; i += 3) {
+			struct rlimit rl;
+			SV **res = av_fetch(rlim, i, 0);
+			SV **soft = av_fetch(rlim, i + 1, 0);
+			SV **hard = av_fetch(rlim, i + 2, 0);
+
+			rl.rlim_cur = SvIV(*soft);
+			rl.rlim_max = SvIV(*hard);
+			if (setrlimit(SvIV(*res), &rl) < 0)
+				xerr("sertlimit");
+		}
+
 		/*
 		 * don't bother unblocking, we don't want signals
 		 * to the group taking out a subprocess
@@ -145,7 +163,7 @@ if (defined $vfork_spawn) {
 unless (defined $vfork_spawn) {
 	require PublicInbox::SpawnPP;
 	no warnings 'once';
-	*public_inbox_fork_exec = *PublicInbox::SpawnPP::public_inbox_fork_exec
+	*pi_fork_exec = *PublicInbox::SpawnPP::pi_fork_exec
 }
 
 # n.b. we never use absolute paths with this
@@ -182,7 +200,24 @@ sub spawn ($;$$) {
 	my $in = $opts->{0} || 0;
 	my $out = $opts->{1} || 1;
 	my $err = $opts->{2} || 2;
-	my $pid = public_inbox_fork_exec($in, $out, $err, $f, $cmd, \@env);
+	my $rlim = [];
+
+	foreach my $l (qw(RLIMIT_CPU RLIMIT_CORE RLIMIT_DATA)) {
+		defined(my $v = $opts->{$l}) or next;
+		my ($soft, $hard);
+		if (ref($v)) {
+			($soft, $hard) = @$v;
+		} else {
+			$soft = $hard = $v;
+		}
+		my $r = eval "require BSD::Resource; BSD::Resource::$l();";
+		unless (defined $r) {
+			warn "$l undefined by BSD::Resource: $@\n";
+			next;
+		}
+		push @$rlim, $r, $soft, $hard;
+	}
+	my $pid = pi_fork_exec($in, $out, $err, $f, $cmd, \@env, $rlim);
 	$pid < 0 ? undef : $pid;
 }
 
diff --git a/lib/PublicInbox/SpawnPP.pm b/lib/PublicInbox/SpawnPP.pm
index 743db22..8692b76 100644
--- a/lib/PublicInbox/SpawnPP.pm
+++ b/lib/PublicInbox/SpawnPP.pm
@@ -9,8 +9,8 @@ use warnings;
 use POSIX qw(dup2 :signal_h);
 
 # Pure Perl implementation for folks that do not use Inline::C
-sub public_inbox_fork_exec ($$$$$$) {
-	my ($in, $out, $err, $f, $cmd, $env) = @_;
+sub pi_fork_exec ($$$$$$) {
+	my ($in, $out, $err, $f, $cmd, $env, $rlim) = @_;
 	my $old = POSIX::SigSet->new();
 	my $set = POSIX::SigSet->new();
 	$set->fillset or die "fillset failed: $!";
@@ -22,6 +22,11 @@ sub public_inbox_fork_exec ($$$$$$) {
 		$pid = -1;
 	}
 	if ($pid == 0) {
+		while (@$rlim) {
+			my ($r, $soft, $hard) = splice(@$rlim, 0, 3);
+			BSD::Resource::setrlimit($r, $soft, $hard) or
+			  warn "failed to set $r=[$soft,$hard]\n";
+		}
 		if ($in != 0) {
 			dup2($in, 0) or die "dup2 failed for stdin: $!";
 		}
diff --git a/t/spawn.t b/t/spawn.t
index db3f2dc..5abedc9 100644
--- a/t/spawn.t
+++ b/t/spawn.t
@@ -92,6 +92,24 @@ use PublicInbox::Spawn qw(which spawn popen_rd);
 	isnt($?, 0, '$? set properly: '.$?);
 }
 
+SKIP: {
+	eval {
+		require BSD::Resource;
+		defined(BSD::Resource::RLIMIT_CPU())
+	} or skip 'BSD::Resource::RLIMIT_CPU missing', 3;
+	my ($r, $w);
+	pipe($r, $w) or die "pipe: $!";
+	my $cmd = ['sh', '-c', 'while true; do :; done'];
+	my $opt = { RLIMIT_CPU => [ 1, 1 ], RLIMIT_CORE => 0, 1 => fileno($w) };
+	my $pid = spawn($cmd, undef, $opt);
+	close $w or die "close(w): $!";
+	my $rset = '';
+	vec($rset, fileno($r), 1) = 1;
+	ok(select($rset, undef, undef, 5), 'child died before timeout');
+	is(waitpid($pid, 0), $pid, 'XCPU child process reaped');
+	isnt($?, 0, 'non-zero exit status');
+}
+
 done_testing();
 
 1;
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 05/13] support publicinbox.cgitrc directive
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (3 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 04/13] spawn: support RLIMIT_CPU, RLIMIT_DATA and RLIMIT_CORE Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 06/13] githttpbackend: move more psgi.input handling into subroutine Eric Wong
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

We can save admins the trouble of declaring [coderepo "..."]
sections in the public-inbox config by parsing the cgitrc
directly.

Macro expansion (e.g. $HTTP_HOST) expansion is not supported,
yet; but may be in the future.
---
 Documentation/public-inbox-config.pod | 15 +++++++-
 lib/PublicInbox/Config.pm             | 54 +++++++++++++++++++++++++--
 lib/PublicInbox/Git.pm                | 13 ++++++-
 lib/PublicInbox/SolverGit.pm          |  5 ++-
 4 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index 27d27e4..5ee93e2 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -188,16 +188,27 @@ be treated as the default value.
 
 Default: 25
 
-=item coderepo.<name>.dir
+=item coderepo.<nick>.dir
 
 The path to a git repository for "publicinbox.<name>.coderepo"
 
-=item coderepo.<name>.cgitUrl
+=item coderepo.<nick>.cgitUrl
 
 The URL of the cgit instance associated with the coderepo.
 
 Default: none
 
+=item publicinbox.cgitrc
+
+A path to a L<cgitrc(5)> file.  "repo.url" directives in the cgitrc
+will be mapped to the nickname of a coderepo (without trailing slash),
+and "repo.path" directives map to "coderepo.<nick>.dir".
+Use of this directive allows admins of existing cgit installations
+to skip declaring coderepo sections and map inboxes directly to
+code repositories known to cgit.
+
+Macro expansion (e.g. C<$HTTP_HOST>) is not yet supported.
+
 =back
 
 =head2 NAMED LIMITER (PSGI)
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index da443e5..490b4c4 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -196,15 +196,59 @@ sub valid_inbox_name ($) {
 	1;
 }
 
+sub cgit_repo_merge ($$) {
+	my ($self, $repo) = @_;
+	# $repo = { url => 'foo.git', path => '/path/to/foo.git' }
+	my $nick = $repo->{url};
+	$self->{"coderepo.$nick.dir"} ||= $repo->{path};
+	$self->{"coderepo.$nick.cgiturl"} ||= $nick;
+}
+
+sub parse_cgitrc {
+	my ($self, $cgitrc, $nesting) = @_;
+
+	# same limit as cgit/configfile.c::parse_configfile
+	return if $nesting > 8;
+
+	open my $fh, '<', $cgitrc or do {
+		warn "failed to open cgitrc=$cgitrc: $!\n";
+		return;
+	};
+
+	# FIXME: this doesn't support macro expansion via $VARS, yet
+	my $repo;
+	foreach (<$fh>) {
+		chomp;
+		if (m!\Arepo\.url=(.+?)/*\z!) {
+			my $nick = $1;
+			cgit_repo_merge($self, $repo) if $repo;
+			$repo = { url => $nick };
+		} elsif (m!\Arepo\.path=(.+)\z!) {
+			if (defined $repo) {
+				$repo->{path} = $1;
+			} else {
+				warn "$_ without repo.url\n";
+			}
+		} elsif (m!\Ainclude=(.+)\z!) {
+			parse_cgitrc($self, $1, $nesting + 1);
+		}
+	}
+	cgit_repo_merge($self, $repo) if $repo;
+}
+
 # parse a code repo
 # Only git is supported at the moment, but SVN and Hg are possibilities
 sub _fill_code_repo {
 	my ($self, $nick) = @_;
 	my $pfx = "coderepo.$nick";
 
+	# TODO: support gitweb and other repository viewers?
+	if (defined(my $cgitrc = delete $self->{'publicinbox.cgitrc'})) {
+		parse_cgitrc($self, $cgitrc, 0);
+	}
 	my $dir = $self->{"$pfx.dir"}; # aka "GIT_DIR"
 	unless (defined $dir) {
-		warn "$pfx.repodir unset";
+		warn "$pfx.dir unset";
 		return;
 	}
 
@@ -225,8 +269,6 @@ sub _fill_code_repo {
 			} @$cgits;
 		}
 	}
-	# TODO: support gitweb and other repository viewers?
-	# TODO: parse cgitrc
 
 	$git;
 }
@@ -291,7 +333,11 @@ sub _fill {
 		my $code_repos = $self->{-code_repos};
 		my $repo_objs = $rv->{-repo_objs} = [];
 		foreach my $nick (@$ibx_code_repos) {
-			valid_inbox_name($nick) or next;
+			my @parts = split(m!/!, $nick);
+			my $valid = 0;
+			$valid += valid_inbox_name($_) foreach (@parts);
+			$valid == scalar(@parts) or next;
+
 			my $repo = $code_repos->{$nick} ||=
 						_fill_code_repo($self, $nick);
 			push @$repo_objs, $repo if $repo;
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 265c3fb..8a96e10 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -288,10 +288,19 @@ sub src_blob_url {
 	local_nick($self);
 }
 
+sub host_prefix_url ($$) {
+	my ($env, $url) = @_;
+	return $url if index($url, '//') >= 0;
+	my $scheme = $env->{'psgi.url_scheme'};
+	my $host_port = $env->{HTTP_HOST} ||
+		"$env->{SERVER_NAME}:$env->{SERVER_PORT}";
+	"$scheme://$host_port". ($env->{SCRIPT_NAME} || '/') . $url;
+}
+
 sub pub_urls {
-	my ($self) = @_;
+	my ($self, $env) = @_;
 	if (my $urls = $self->{cgit_url}) {
-		return @$urls;
+		return map { host_prefix_url($env, $_) } @$urls;
 	}
 	local_nick($self);
 }
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index cd0f94a..3841c56 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -85,7 +85,8 @@ sub solve_existing ($$) {
 		# push @ambiguous, [ $git, @oids ];
 
 		dbg($self, "`$oid_b' ambiguous in " .
-				join("\n\t", $git->pub_urls) . "\n" .
+				join("\n\t", $git->pub_urls($self->{psgi_env}))
+                                . "\n" .
 				join('', map { "$_ blob\n" } @oids));
 	}
 	scalar(@ambiguous) ? \@ambiguous : undef;
@@ -483,7 +484,7 @@ sub resolve_patch ($$) {
 	if (my $existing = solve_existing($self, $want)) {
 		my ($found_git, undef, $type, undef) = @$existing;
 		dbg($self, "found $cur_want in " .
-			join("\n", $found_git->pub_urls));
+			join("\n", $found_git->pub_urls($self->{psgi_env})));
 
 		if ($cur_want eq $self->{oid_want} || $type ne 'blob') {
 			eval { delete($self->{user_cb})->($existing) };
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 06/13] githttpbackend: move more psgi.input handling into subroutine
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (4 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 05/13] support publicinbox.cgitrc directive Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 07/13] githttpbackend: check for other errors and relax CRLF check Eric Wong
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

This will be useful for other CGI wrappers we make.
This also fixes a bug with some PSGI servers which did not
present a real IO::Handle in the psgi.input env field.
---
 lib/PublicInbox/GitHTTPBackend.pm | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/GitHTTPBackend.pm b/lib/PublicInbox/GitHTTPBackend.pm
index ab43a00..be5b924 100644
--- a/lib/PublicInbox/GitHTTPBackend.pm
+++ b/lib/PublicInbox/GitHTTPBackend.pm
@@ -182,11 +182,6 @@ sub prepare_range {
 # returns undef if 403 so it falls back to dumb HTTP
 sub serve_smart {
 	my ($env, $git, $path) = @_;
-	my $in = $env->{'psgi.input'};
-	my $fd = eval { fileno($in) };
-	unless (defined $fd && $fd >= 0) {
-		$in = input_to_file($env) or return r(500);
-	}
 	my %env = %ENV;
 	# GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL
 	# may be set in the server-process and are passed as-is
@@ -202,7 +197,7 @@ sub serve_smart {
 	my $limiter = $git->{-httpbackend_limiter} || $default_limiter;
 	$env{GIT_HTTP_EXPORT_ALL} = '1';
 	$env{PATH_TRANSLATED} = "$git->{git_dir}/$path";
-	my $rdr = { 0 => fileno($in) };
+	my $rdr = input_prepare($env) or return r(500);
 	my $qsp = PublicInbox::Qspawn->new([qw(git http-backend)], \%env, $rdr);
 	$qsp->psgi_return($env, $limiter, sub {
 		my ($r, $bref) = @_;
@@ -211,14 +206,19 @@ sub serve_smart {
 	});
 }
 
-sub input_to_file {
+sub input_prepare {
 	my ($env) = @_;
+
+	my $input = $env->{'psgi.input'};
+	my $fd = eval { fileno($input) };
+	if (defined $fd && $fd >= 0) {
+		return { 0 => $fd };
+	}
 	open(my $in, '+>', undef);
 	unless (defined $in) {
 		err($env, "could not open temporary file: $!");
 		return;
 	}
-	my $input = $env->{'psgi.input'};
 	my $buf;
 	while (1) {
 		my $r = $input->read($buf, 8192);
@@ -243,7 +243,7 @@ sub input_to_file {
 		err($env, "error seeking temporary file: $!");
 		return;
 	}
-	return $in;
+	{ 0 => fileno($in), -hold => $in };
 }
 
 sub parse_cgi_headers {
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 07/13] githttpbackend: check for other errors and relax CRLF check
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (5 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 06/13] githttpbackend: move more psgi.input handling into subroutine Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 08/13] spawn: support absolute paths Eric Wong
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

Reads to git-http-backend(1) could fail or EOF prematurely,
so we must be ready for that case.

Furthermore, cgit (and possibly other CGI) uses LF instead
of CRLF, so support those programs, too.
---
 lib/PublicInbox/GitHTTPBackend.pm | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/GitHTTPBackend.pm b/lib/PublicInbox/GitHTTPBackend.pm
index be5b924..2e4a954 100644
--- a/lib/PublicInbox/GitHTTPBackend.pm
+++ b/lib/PublicInbox/GitHTTPBackend.pm
@@ -201,8 +201,8 @@ sub serve_smart {
 	my $qsp = PublicInbox::Qspawn->new([qw(git http-backend)], \%env, $rdr);
 	$qsp->psgi_return($env, $limiter, sub {
 		my ($r, $bref) = @_;
-		$r = parse_cgi_headers($bref) or return; # incomplete headers
-		$r->[0] == 403 ? serve_dumb($env, $git, $path) : $r;
+		my $res = parse_cgi_headers($r, $bref) or return; # incomplete
+		$res->[0] == 403 ? serve_dumb($env, $git, $path) : $res;
 	});
 }
 
@@ -247,12 +247,13 @@ sub input_prepare {
 }
 
 sub parse_cgi_headers {
-	my ($bref) = @_;
-	$$bref =~ s/\A(.*?)\r\n\r\n//s or return;
+	my ($r, $bref) = @_;
+	return r(500) unless defined $r && $r >= 0;
+	$$bref =~ s/\A(.*?)\r?\n\r?\n//s or return $r == 0 ? r(500) : undef;
 	my $h = $1;
 	my $code = 200;
 	my @h;
-	foreach my $l (split(/\r\n/, $h)) {
+	foreach my $l (split(/\r?\n/, $h)) {
 		my ($k, $v) = split(/:\s*/, $l, 2);
 		if ($k =~ /\AStatus\z/i) {
 			($code) = ($v =~ /\b(\d+)\b/);
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 08/13] spawn: support absolute paths
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (6 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 07/13] githttpbackend: check for other errors and relax CRLF check Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 09/13] cgit: support running cgit as a standalone CGI Eric Wong
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

cgit (and most other CGI executables) is not typically installed
for use via $PATH, so we'll need to support absolute paths to
run it.
---
 lib/PublicInbox/Spawn.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index 8ea255a..202cfca 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -166,9 +166,9 @@ unless (defined $vfork_spawn) {
 	*pi_fork_exec = *PublicInbox::SpawnPP::pi_fork_exec
 }
 
-# n.b. we never use absolute paths with this
 sub which ($) {
 	my ($file) = @_;
+	return $file if index($file, '/') == 0;
 	foreach my $p (split(':', $ENV{PATH})) {
 		$p .= "/$file";
 		return $p if -x $p;
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 09/13] cgit: support running cgit as a standalone CGI
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (7 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 08/13] spawn: support absolute paths Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 10/13] www: wire up cgit as a 404 handler if cgitrc is configured Eric Wong
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

We depend on git-http-backend for smart HTTP clone support,
however; since cgit does not support smart clones natively.
WWW.pm will be able to cascade down to this as a 404 handler in
the future.
---
 MANIFEST                |  2 +
 examples/cgit.psgi      | 29 ++++++++++++++
 lib/PublicInbox/Cgit.pm | 87 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+)
 create mode 100644 examples/cgit.psgi
 create mode 100644 lib/PublicInbox/Cgit.pm

diff --git a/MANIFEST b/MANIFEST
index e6316ef..150e337 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -38,6 +38,7 @@ examples/apache2_perl.conf
 examples/apache2_perl_old.conf
 examples/cgi-webrick.rb
 examples/cgit-commit-filter.lua
+examples/cgit.psgi
 examples/highlight.psgi
 examples/logrotate.conf
 examples/newswww.psgi
@@ -58,6 +59,7 @@ examples/varnish-4.vcl
 lib/PublicInbox/Address.pm
 lib/PublicInbox/Admin.pm
 lib/PublicInbox/AltId.pm
+lib/PublicInbox/Cgit.pm
 lib/PublicInbox/Config.pm
 lib/PublicInbox/ContentId.pm
 lib/PublicInbox/Daemon.pm
diff --git a/examples/cgit.psgi b/examples/cgit.psgi
new file mode 100644
index 0000000..ca93f92
--- /dev/null
+++ b/examples/cgit.psgi
@@ -0,0 +1,29 @@
+#!/usr/bin/perl -w
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: GPL-3.0+ <https://www.gnu.org/licenses/gpl-3.0.txt>
+#
+# PublicInbox::Cgit may be used independently of WWW.
+#
+# Usage:
+#	plackup -I lib -o 127.0.0.1 -R lib -r examples/cgit.psgi
+use strict;
+use warnings;
+use Plack::Builder;
+use PublicInbox::Cgit;
+use PublicInbox::Config;
+my $pi_config = PublicInbox::Config->new;
+my $cgit = PublicInbox::Cgit->new($pi_config);
+
+builder {
+	eval {
+		enable 'Deflater',
+			content_type => [ qw(
+				text/html
+				text/plain
+				application/atom+xml
+				)]
+	};
+	eval { enable 'ReverseProxy' };
+	enable 'Head';
+	sub { $cgit->call($_[0]) }
+}
diff --git a/lib/PublicInbox/Cgit.pm b/lib/PublicInbox/Cgit.pm
new file mode 100644
index 0000000..3d1a0d5
--- /dev/null
+++ b/lib/PublicInbox/Cgit.pm
@@ -0,0 +1,87 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# wrapper for cgit(1) and git-http-backend(1) for browsing and
+# serving git code repositories.  Requires 'publicinbox.cgitrc'
+# directive to be set in the public-inbox config file.
+
+package PublicInbox::Cgit;
+use strict;
+use PublicInbox::GitHTTPBackend;
+# not bothering with Exporter for a one-off
+*r = *PublicInbox::GitHTTPBackend::r;
+*input_prepare = *PublicInbox::GitHTTPBackend::input_prepare;
+*parse_cgi_headers = *PublicInbox::GitHTTPBackend::parse_cgi_headers;
+*serve = *PublicInbox::GitHTTPBackend::serve;
+use warnings;
+use PublicInbox::Qspawn;
+
+sub new {
+	my ($class, $pi_config) = @_;
+	my $cgit_bin = $pi_config->{'publicinbox.cgitbin'} ||
+		# Debian default location:
+		'/usr/lib/cgit/cgit.cgi';
+
+	my $self = bless {
+		cmd => [ $cgit_bin ],
+		pi_config => $pi_config,
+	}, $class;
+
+	$pi_config->each_inbox(sub {}); # fill in -code_repos mapped to inboxes
+
+	# some cgit repos may not be mapped to inboxes, so ensure those exist:
+	my $code_repos = $pi_config->{-code_repos};
+	foreach my $k (keys %$pi_config) {
+		$k =~ /\Acoderepo\.(.+)\.dir\z/ or next;
+		my $dir = $pi_config->{$k};
+		$code_repos->{$1} ||= PublicInbox::Git->new($dir);
+	}
+	while (my ($nick, $repo) = each %$code_repos) {
+		$self->{"\0$nick"} = $repo;
+	}
+	$self;
+}
+
+# only what cgit cares about:
+my @PASS_ENV = qw(
+	HTTP_HOST
+	QUERY_STRING
+	REQUEST_METHOD
+	SCRIPT_NAME
+	SERVER_NAME
+	SERVER_PORT
+	HTTP_COOKIE
+	HTTP_REFERER
+	CONTENT_LENGTH
+);
+# XXX: cgit filters may care about more variables...
+
+sub call {
+	my ($self, $env) = @_;
+	my $path_info = $env->{PATH_INFO};
+
+	# handle requests without spawning cgit iff possible:
+	if ($path_info =~ m!\A/(.+?)/($PublicInbox::GitHTTPBackend::ANY)\z!ox) {
+		my ($nick, $path) = ($1, $2);
+		if (my $git = $self->{"\0$nick"}) {
+			return serve($env, $git, $path);
+		}
+	}
+
+	my $cgi_env = { PATH_INFO => $path_info };
+	foreach (@PASS_ENV) {
+		defined(my $v = $env->{$_}) or next;
+		$cgi_env->{$_} = $v;
+	}
+	$cgi_env->{'HTTPS'} = 'on' if $env->{'psgi.url_scheme'} eq 'https';
+
+	my $rdr = input_prepare($env) or return r(500);
+	my $qsp = PublicInbox::Qspawn->new($self->{cmd}, $cgi_env, $rdr);
+	$qsp->psgi_return($env, undef, sub {
+		my ($r, $bref) = @_;
+		my $res = parse_cgi_headers($r, $bref) or return; # incomplete
+		$res;
+	});
+}
+
+1;
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 10/13] www: wire up cgit as a 404 handler if cgitrc is configured
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (8 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 09/13] cgit: support running cgit as a standalone CGI Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 11/13] qspawn: wire up RLIMIT_* handling to limiters Eric Wong
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

Requests intended for cgit are unlikely to conflict with
requests to inboxes.  So we can safely hand those requests
off to cgit.cgi.
---
 Documentation/public-inbox-config.pod |  8 ++++++++
 lib/PublicInbox/Config.pm             |  3 ++-
 lib/PublicInbox/WWW.pm                | 20 +++++++++++++++++++-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index 5ee93e2..9647e4a 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -209,6 +209,14 @@ code repositories known to cgit.
 
 Macro expansion (e.g. C<$HTTP_HOST>) is not yet supported.
 
+=item publicinbox.cgitbin
+
+A path to the C<cgit.cgi> executable.  The L<PublicInbox::WWW>
+interface can spawn cgit as a fallback if the publicinbox.cgitrc
+directive is configured.
+
+Default: /usr/lib/cgit/cgit.cgi
+
 =back
 
 =head2 NAMED LIMITER (PSGI)
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 490b4c4..15664c6 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -30,6 +30,7 @@ sub new {
 	$self->{-no_obfuscate} ||= {};
 	$self->{-limiters} ||= {};
 	$self->{-code_repos} ||= {}; # nick => PublicInbox::Git object
+	$self->{-cgitrc_unparsed} = $self->{'publicinbox.cgitrc'};
 
 	if (my $no = delete $self->{'publicinbox.noobfuscate'}) {
 		$no = _array($no);
@@ -243,7 +244,7 @@ sub _fill_code_repo {
 	my $pfx = "coderepo.$nick";
 
 	# TODO: support gitweb and other repository viewers?
-	if (defined(my $cgitrc = delete $self->{'publicinbox.cgitrc'})) {
+	if (defined(my $cgitrc = delete $self->{-cgitrc_unparsed})) {
 		parse_cgitrc($self, $cgitrc, 0);
 	}
 	my $dir = $self->{"$pfx.dir"}; # aka "GIT_DIR"
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 7ed4f65..6e14e8c 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -15,6 +15,7 @@ use 5.008;
 use strict;
 use warnings;
 use bytes (); # only for bytes::length
+use Plack::Util;
 use PublicInbox::Config;
 use PublicInbox::Hval;
 use URI::Escape qw(uri_unescape);
@@ -154,6 +155,7 @@ sub preload {
 		eval "require $_;";
 	}
 	if (ref($self)) {
+		$self->cgit;
 		$self->stylesheets_prepare($_) for ('', '../', '../../');
 	}
 }
@@ -188,7 +190,9 @@ sub invalid_inbox ($$) {
 	# generation and link things intended for nntp:// to https?://,
 	# so try to infer links and redirect them to the appropriate
 	# list URL.
-	$www->news_www->call($ctx->{env});
+	my $env = $ctx->{env};
+	my $res = $www->news_www->call($env);
+	$res->[0] == 404 ? $www->cgit->call($env) : $res;
 }
 
 # returns undef if valid, array ref response if invalid
@@ -467,6 +471,20 @@ sub news_www {
 	}
 }
 
+sub cgit {
+	my ($self) = @_;
+	$self->{cgit} ||= do {
+		my $pi_config = $self->{pi_config};
+
+		if (defined($pi_config->{'publicinbox.cgitrc'})) {
+			require PublicInbox::Cgit;
+			PublicInbox::Cgit->new($pi_config);
+		} else {
+			Plack::Util::inline_object(call => sub { r404() });
+		}
+	}
+}
+
 sub get_attach {
 	my ($ctx, $idx, $fn) = @_;
 	require PublicInbox::WwwAttach;
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 11/13] qspawn: wire up RLIMIT_* handling to limiters
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (9 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 10/13] www: wire up cgit as a 404 handler if cgitrc is configured Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 12/13] cgit: use a dedicated named limiter Eric Wong
  2019-03-12  4:00 ` [PATCH 13/13] spawn: require soft and hard vals for RLIMIT_* params Eric Wong
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

This allows users to configure RLIMIT_{CORE,CPU,DATA} using
our "limiter" config directive when spawning external processes.
---
 Documentation/public-inbox-config.pod | 17 +++++++++++
 lib/PublicInbox/Config.pm             |  4 ++-
 lib/PublicInbox/Qspawn.pm             | 41 +++++++++++++++++++++++++--
 lib/PublicInbox/Spawn.pm              |  3 +-
 4 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index 9647e4a..dae6998 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -228,12 +228,29 @@ large inboxes, it makes sense to put large inboxes on a named
 limiter with a low max value; while smaller inboxes can use
 the default limiter.
 
+C<RLIMIT_*> keys may be set to enforce resource limits for
+a particular limiter.
+
 =over 8
 
 =item publicinboxlimiter.<name>.max
 
 The maximum number of parallel processes for the given limiter.
 
+=item publicinboxlimiter.<name>.rlimitCore
+
+=item publicinboxlimiter.<name>.rlimitCPU
+
+=item publicinboxlimiter.<name>.rlimitData
+
+The maximum core size, CPU time, or data size processes run with the
+given limiter will use.  This may be comma-separated to distinguish
+soft and hard limits.  The word "INFINITY" is accepted as the
+RLIM_INFINITY constant (if supported by your OS).
+
+See L<setrlimit(2)> for more info on the behavior of RLIMIT_CORE,
+RLIMIT_CPU, and RLIMIT_DATA for you operating system.
+
 =back
 
 =head3 EXAMPLE WITH NAMED LIMITERS
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 15664c6..0f7485f 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -133,7 +133,9 @@ sub limiter {
 	$self->{-limiters}->{$name} ||= do {
 		require PublicInbox::Qspawn;
 		my $max = $self->{"publicinboxlimiter.$name.max"};
-		PublicInbox::Qspawn::Limiter->new($max);
+		my $limiter = PublicInbox::Qspawn::Limiter->new($max);
+		$limiter->setup_rlimit($name, $self);
+		$limiter;
 	};
 }
 
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index 509a441..79cdae7 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -43,10 +43,18 @@ sub new ($$$;) {
 sub _do_spawn {
 	my ($self, $cb) = @_;
 	my $err;
+	my ($cmd, $env, $opts) = @{$self->{args}};
+	my %opts = %{$opts || {}};
+	my $limiter = $self->{limiter};
+	foreach my $k (PublicInbox::Spawn::RLIMITS()) {
+		if (defined(my $rlimit = $limiter->{$k})) {
+			$opts{$k} = $rlimit;
+		}
+	}
 
-	($self->{rpipe}, $self->{pid}) = popen_rd(@{$self->{args}});
+	($self->{rpipe}, $self->{pid}) = popen_rd($cmd, $env, \%opts);
 	if (defined $self->{pid}) {
-		$self->{limiter}->{running}++;
+		$limiter->{running}++;
 	} else {
 		$self->{err} = $!;
 	}
@@ -251,9 +259,38 @@ sub new {
 		max => $max || 32,
 		running => 0,
 		run_queue => [],
+		# RLIMIT_CPU => undef,
+		# RLIMIT_DATA => undef,
+		# RLIMIT_CORE => undef,
 	}, $class;
 }
 
+sub setup_rlimit {
+	my ($self, $name, $config) = @_;
+	foreach my $rlim (PublicInbox::Spawn::RLIMITS()) {
+		my $k = lc($rlim);
+		$k =~ tr/_//d;
+		$k = "publicinboxlimiter.$name.$k";
+		defined(my $v = $config->{$k}) or next;
+		my @rlimit = split(/\s*,\s*/, $v);
+		if (scalar(@rlimit) == 1) {
+			push @rlimit, $rlimit[0];
+		} elsif (scalar(@rlimit) != 2) {
+			warn "could not parse $k: $v\n";
+		}
+		eval { require BSD::Resource };
+		if ($@) {
+			warn "BSD::Resource missing for $rlim";
+			next;
+		}
+		foreach my $i (0..$#rlimit) {
+			next if $rlimit[$i] ne 'INFINITY';
+			$rlimit[$i] = BSD::Resource::RLIM_INFINITY();
+		}
+		$self->{$rlim} = \@rlimit;
+	}
+}
+
 # captures everything into a buffer and executes a callback when done
 package PublicInbox::Qspawn::Qx;
 use strict;
diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index 202cfca..fd98160 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -18,6 +18,7 @@ use Symbol qw(gensym);
 use IO::Handle;
 use PublicInbox::ProcessPipe;
 our @EXPORT_OK = qw/which spawn popen_rd/;
+sub RLIMITS () { qw(RLIMIT_CPU RLIMIT_CORE RLIMIT_DATA) }
 
 my $vfork_spawn = <<'VFORK_SPAWN';
 #include <sys/types.h>
@@ -202,7 +203,7 @@ sub spawn ($;$$) {
 	my $err = $opts->{2} || 2;
 	my $rlim = [];
 
-	foreach my $l (qw(RLIMIT_CPU RLIMIT_CORE RLIMIT_DATA)) {
+	foreach my $l (RLIMITS()) {
 		defined(my $v = $opts->{$l}) or next;
 		my ($soft, $hard);
 		if (ref($v)) {
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 12/13] cgit: use a dedicated named limiter
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (10 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 11/13] qspawn: wire up RLIMIT_* handling to limiters Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  2019-03-12  4:00 ` [PATCH 13/13] spawn: require soft and hard vals for RLIMIT_* params Eric Wong
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

I mainly need this to enforce RLIMIT_CPU (and RLIMIT_CORE)
when requests come which generate giant, unrealistic diffs.

Per-coderepo limiters may be added in the future.  But for now,
I need to prevent cgit from monopolizing resources on my dinky
server.
---
 Documentation/public-inbox-config.pod | 4 ++++
 lib/PublicInbox/Cgit.pm               | 3 ++-
 lib/PublicInbox/Config.pm             | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/public-inbox-config.pod b/Documentation/public-inbox-config.pod
index dae6998..d6ecd08 100644
--- a/Documentation/public-inbox-config.pod
+++ b/Documentation/public-inbox-config.pod
@@ -231,6 +231,10 @@ the default limiter.
 C<RLIMIT_*> keys may be set to enforce resource limits for
 a particular limiter.
 
+Default named-limiters are prefixed with "-".  Currently,
+the "-cgit" named limiter is reserved for instances spawning
+cgit via C<publicinbox.cgitrc>
+
 =over 8
 
 =item publicinboxlimiter.<name>.max
diff --git a/lib/PublicInbox/Cgit.pm b/lib/PublicInbox/Cgit.pm
index 3d1a0d5..9ba9e14 100644
--- a/lib/PublicInbox/Cgit.pm
+++ b/lib/PublicInbox/Cgit.pm
@@ -77,7 +77,8 @@ sub call {
 
 	my $rdr = input_prepare($env) or return r(500);
 	my $qsp = PublicInbox::Qspawn->new($self->{cmd}, $cgi_env, $rdr);
-	$qsp->psgi_return($env, undef, sub {
+	my $limiter = $self->{pi_config}->limiter('-cgit');
+	$qsp->psgi_return($env, $limiter, sub {
 		my ($r, $bref) = @_;
 		my $res = parse_cgi_headers($r, $bref) or return; # incomplete
 		$res;
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 0f7485f..06e6215 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -132,7 +132,7 @@ sub limiter {
 	my ($self, $name) = @_;
 	$self->{-limiters}->{$name} ||= do {
 		require PublicInbox::Qspawn;
-		my $max = $self->{"publicinboxlimiter.$name.max"};
+		my $max = $self->{"publicinboxlimiter.$name.max"} || 1;
 		my $limiter = PublicInbox::Qspawn::Limiter->new($max);
 		$limiter->setup_rlimit($name, $self);
 		$limiter;
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 13/13] spawn: require soft and hard vals for RLIMIT_* params
  2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
                   ` (11 preceding siblings ...)
  2019-03-12  4:00 ` [PATCH 12/13] cgit: use a dedicated named limiter Eric Wong
@ 2019-03-12  4:00 ` Eric Wong
  12 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2019-03-12  4:00 UTC (permalink / raw)
  To: meta

Our high-level config already treats single limits as a
soft==hard limit for limiters; so stop handling that redundant
in the low-level spawn() sub.
---
 lib/PublicInbox/Spawn.pm | 8 +-------
 t/spawn.t                | 3 ++-
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm
index fd98160..7b0f3bd 100644
--- a/lib/PublicInbox/Spawn.pm
+++ b/lib/PublicInbox/Spawn.pm
@@ -205,18 +205,12 @@ sub spawn ($;$$) {
 
 	foreach my $l (RLIMITS()) {
 		defined(my $v = $opts->{$l}) or next;
-		my ($soft, $hard);
-		if (ref($v)) {
-			($soft, $hard) = @$v;
-		} else {
-			$soft = $hard = $v;
-		}
 		my $r = eval "require BSD::Resource; BSD::Resource::$l();";
 		unless (defined $r) {
 			warn "$l undefined by BSD::Resource: $@\n";
 			next;
 		}
-		push @$rlim, $r, $soft, $hard;
+		push @$rlim, $r, @$v;
 	}
 	my $pid = pi_fork_exec($in, $out, $err, $f, $cmd, \@env, $rlim);
 	$pid < 0 ? undef : $pid;
diff --git a/t/spawn.t b/t/spawn.t
index 5abedc9..8840428 100644
--- a/t/spawn.t
+++ b/t/spawn.t
@@ -100,7 +100,8 @@ SKIP: {
 	my ($r, $w);
 	pipe($r, $w) or die "pipe: $!";
 	my $cmd = ['sh', '-c', 'while true; do :; done'];
-	my $opt = { RLIMIT_CPU => [ 1, 1 ], RLIMIT_CORE => 0, 1 => fileno($w) };
+	my $fd = fileno($w);
+	my $opt = { RLIMIT_CPU => [ 1, 1 ], RLIMIT_CORE => [ 0, 0 ], 1 => $fd };
 	my $pid = spawn($cmd, undef, $opt);
 	close $w or die "close(w): $!";
 	my $rset = '';
-- 
EW


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, back to index

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-12  4:00 [PATCH 00/13] support parsing cgitrc and spawning cgit Eric Wong
2019-03-12  4:00 ` [PATCH 01/13] git: add "commit_title" method Eric Wong
2019-03-12  4:00 ` [PATCH 02/13] viewvcs: preliminary support for showing non-blobs Eric Wong
2019-03-12  4:00 ` [PATCH 03/13] viewvcs: match 8000-byte lookup for git Eric Wong
2019-03-12  4:00 ` [PATCH 04/13] spawn: support RLIMIT_CPU, RLIMIT_DATA and RLIMIT_CORE Eric Wong
2019-03-12  4:00 ` [PATCH 05/13] support publicinbox.cgitrc directive Eric Wong
2019-03-12  4:00 ` [PATCH 06/13] githttpbackend: move more psgi.input handling into subroutine Eric Wong
2019-03-12  4:00 ` [PATCH 07/13] githttpbackend: check for other errors and relax CRLF check Eric Wong
2019-03-12  4:00 ` [PATCH 08/13] spawn: support absolute paths Eric Wong
2019-03-12  4:00 ` [PATCH 09/13] cgit: support running cgit as a standalone CGI Eric Wong
2019-03-12  4:00 ` [PATCH 10/13] www: wire up cgit as a 404 handler if cgitrc is configured Eric Wong
2019-03-12  4:00 ` [PATCH 11/13] qspawn: wire up RLIMIT_* handling to limiters Eric Wong
2019-03-12  4:00 ` [PATCH 12/13] cgit: use a dedicated named limiter Eric Wong
2019-03-12  4:00 ` [PATCH 13/13] spawn: require soft and hard vals for RLIMIT_* params Eric Wong

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox