user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* [PATCH 00/14] convert solver to use pi-httpd.async
@ 2019-01-27  4:03 Eric Wong
  2019-01-27  4:03 ` [PATCH 01/14] httpd/async: remove needless sysread wrapper Eric Wong
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

Much of the groundwork for this was laid in the now-abandoned
"repobrowse" branch.  The goal was to improves fairness as we no
longer wait synchronously on git (apply|update-index|ls-files)
processes and can requests for other clients.

The end result was slightly (2-3%?) slower with all the
callbacks, but reducing "git apply" invocations by relying on
pathnames (instead of stdin) made the end result ~20% faster for
a large (64) patch series.

Email::Simple (via Email::MIME/PublicInbox::MIME) remains a
performance bottleneck, as it does a lot of unnecessary header
parsing and hash-table populating we don't care about; but I'm
not sure if I'll have time to address that.

Eric Wong (14):
  httpd/async: remove needless sysread wrapper
  qspawn: implement psgi_return and use it for githttpbackend
  qspawn|getlinebody: support streaming filters
  qspawn|httpd/async: improve and fix out-of-date comments
  httpd/async: stop running command if client disconnects
  qspawn: implement psgi_qx
  t/qspawn.t: psgi_qx stderr test
  view: swap CRLF for LF in HTML output
  solver: rewrite to use Qspawn->psgi_qx and pi-httpd.async
  solver: hold patches in temporary directory
  solver: reduce "git apply" invocations
  qspawn: decode $? for user-friendliness
  viewvcs: do not show final error message twice
  solver: crank up max patches to 9999

 lib/PublicInbox/GetlineBody.pm    |  16 +-
 lib/PublicInbox/Git.pm            |   2 +-
 lib/PublicInbox/GitHTTPBackend.pm |  64 +---
 lib/PublicInbox/HTTPD/Async.pm    |  27 +-
 lib/PublicInbox/Qspawn.pm         | 143 +++++++-
 lib/PublicInbox/SolverGit.pm      | 532 +++++++++++++++++-------------
 lib/PublicInbox/View.pm           |   4 +
 lib/PublicInbox/ViewVCS.pm        |  50 ++-
 t/qspawn.t                        |  12 +-
 t/solver_git.t                    |  22 +-
 10 files changed, 543 insertions(+), 329 deletions(-)

-- 
EW


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 01/14] httpd/async: remove needless sysread wrapper
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 02/14] qspawn: implement psgi_return and use it for githttpbackend Eric Wong
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

We don't appear to be using it anywhere
---
 lib/PublicInbox/HTTPD/Async.pm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index 842aaf6..a8488b6 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -66,7 +66,6 @@ sub async_pass {
 sub event_read { $_[0]->{cb}->(@_) }
 sub event_hup { $_[0]->{cb}->(@_) }
 sub event_err { $_[0]->{cb}->(@_) }
-sub sysread { shift->{sock}->sysread(@_) }
 
 sub close {
 	my $self = shift;
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 02/14] qspawn: implement psgi_return and use it for githttpbackend
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
  2019-01-27  4:03 ` [PATCH 01/14] httpd/async: remove needless sysread wrapper Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 03/14] qspawn|getlinebody: support streaming filters Eric Wong
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

Was: ("repobrowse: port patch generation over to qspawn")

We'll be using it for githttpbackend and maybe other things.
---
 lib/PublicInbox/GitHTTPBackend.pm | 64 +++----------------------------
 lib/PublicInbox/Qspawn.pm         | 58 ++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 59 deletions(-)

diff --git a/lib/PublicInbox/GitHTTPBackend.pm b/lib/PublicInbox/GitHTTPBackend.pm
index 54ccfa0..ab43a00 100644
--- a/lib/PublicInbox/GitHTTPBackend.pm
+++ b/lib/PublicInbox/GitHTTPBackend.pm
@@ -200,69 +200,15 @@ sub serve_smart {
 		$env{$name} = $val if defined $val;
 	}
 	my $limiter = $git->{-httpbackend_limiter} || $default_limiter;
-	my $git_dir = $git->{git_dir};
 	$env{GIT_HTTP_EXPORT_ALL} = '1';
-	$env{PATH_TRANSLATED} = "$git_dir/$path";
+	$env{PATH_TRANSLATED} = "$git->{git_dir}/$path";
 	my $rdr = { 0 => fileno($in) };
 	my $qsp = PublicInbox::Qspawn->new([qw(git http-backend)], \%env, $rdr);
-	my ($fh, $rpipe);
-	my $end = sub {
-		if (my $err = $qsp->finish) {
-			err($env, "git http-backend ($git_dir): $err");
-		}
-		$fh->close if $fh; # async-only
-	};
-
-	# Danga::Socket users, we queue up the read_enable callback to
-	# fire after pending writes are complete:
-	my $buf = '';
-	my $rd_hdr = sub {
-		my $r = sysread($rpipe, $buf, 1024, length($buf));
-		return if !defined($r) && ($!{EINTR} || $!{EAGAIN});
-		return r(500, 'http-backend error') unless $r;
-		$r = parse_cgi_headers(\$buf) or return; # incomplete headers
+	$qsp->psgi_return($env, $limiter, sub {
+		my ($r, $bref) = @_;
+		$r = parse_cgi_headers($bref) or return; # incomplete headers
 		$r->[0] == 403 ? serve_dumb($env, $git, $path) : $r;
-	};
-	my $res;
-	my $async = $env->{'pi-httpd.async'}; # XXX unstable API
-	my $cb = sub {
-		my $r = $rd_hdr->() or return;
-		$rd_hdr = undef;
-		if (scalar(@$r) == 3) { # error:
-			if ($async) {
-				$async->close; # calls rpipe->close
-			} else {
-				$rpipe->close;
-				$end->();
-			}
-			$res->($r);
-		} elsif ($async) {
-			$fh = $res->($r);
-			$async->async_pass($env->{'psgix.io'}, $fh, \$buf);
-		} else { # for synchronous PSGI servers
-			require PublicInbox::GetlineBody;
-			$r->[2] = PublicInbox::GetlineBody->new($rpipe, $end,
-								$buf);
-			$res->($r);
-		}
-	};
-	sub {
-		($res) = @_;
-
-		# hopefully this doesn't break any middlewares,
-		# holding the input here is a waste of FDs and memory
-		$env->{'psgi.input'} = undef;
-
-		$qsp->start($limiter, sub { # may run later, much later...
-			($rpipe) = @_;
-			$in = undef;
-			if ($async) {
-				$async = $async->($rpipe, $cb, $end);
-			} else { # generic PSGI
-				$cb->() while $rd_hdr;
-			}
-		});
-	};
+	});
 }
 
 sub input_to_file {
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index 3500f8a..b80dac1 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -9,6 +9,7 @@ package PublicInbox::Qspawn;
 use strict;
 use warnings;
 use PublicInbox::Spawn qw(popen_rd);
+my $def_limiter;
 
 sub new ($$$;) {
 	my ($class, $cmd, $env, $opt) = @_;
@@ -59,6 +60,63 @@ sub start {
 	}
 }
 
+sub psgi_return {
+	my ($self, $env, $limiter, $parse_hdr) = @_;
+	my ($fh, $rpipe);
+	my $end = sub {
+		if (my $err = $self->finish) {
+			$err = join(' ', @{$self->{args}->[0]}).": $err\n";
+			$env->{'psgi.errors'}->print($err);
+		}
+		$fh->close if $fh; # async-only
+	};
+
+	# Danga::Socket users, we queue up the read_enable callback to
+	# fire after pending writes are complete:
+	my $buf = '';
+	my $rd_hdr = sub {
+		my $r = sysread($rpipe, $buf, 1024, length($buf));
+		return if !defined($r) && ($!{EINTR} || $!{EAGAIN});
+		$parse_hdr->($r, \$buf);
+	};
+	my $res;
+	my $async = $env->{'pi-httpd.async'};
+	my $cb = sub {
+		my $r = $rd_hdr->() or return;
+		$rd_hdr = undef;
+		if (scalar(@$r) == 3) { # error
+			if ($async) {
+				$async->close; # calls rpipe->close
+			} else {
+				$rpipe->close;
+				$end->();
+			}
+			$res->($r);
+		} elsif ($async) {
+			$fh = $res->($r); # scalar @$r == 2
+			$async->async_pass($env->{'psgix.io'}, $fh, \$buf);
+		} else { # for synchronous PSGI servers
+			require PublicInbox::GetlineBody;
+			$r->[2] = PublicInbox::GetlineBody->new($rpipe, $end,
+								$buf);
+			$res->($r);
+		}
+	};
+	$limiter ||= $def_limiter ||= PublicInbox::Qspawn::Limiter->new(32);
+	sub {
+		($res) = @_;
+		$self->start($limiter, sub { # may run later, much later...
+			($rpipe) = @_;
+			if ($async) {
+			# PublicInbox::HTTPD::Async->new($rpipe, $cb, $end)
+				$async = $async->($rpipe, $cb, $end);
+			} else { # generic PSGI
+				$cb->() while $rd_hdr;
+			}
+		});
+	};
+}
+
 package PublicInbox::Qspawn::Limiter;
 use strict;
 use warnings;
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 03/14] qspawn|getlinebody: support streaming filters
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
  2019-01-27  4:03 ` [PATCH 01/14] httpd/async: remove needless sysread wrapper Eric Wong
  2019-01-27  4:03 ` [PATCH 02/14] qspawn: implement psgi_return and use it for githttpbackend Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 04/14] qspawn|httpd/async: improve and fix out-of-date comments Eric Wong
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

This is intended for wrapping "git show" and "git diff"
processes in the future and to prevent it from monopolizing
callers.

This will us to better handle backpressure from gigantic
commits.
---
 lib/PublicInbox/GetlineBody.pm | 16 +++++++++++++---
 lib/PublicInbox/Qspawn.pm      | 21 +++++++++++++++++++--
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/GetlineBody.pm b/lib/PublicInbox/GetlineBody.pm
index ea07f3d..0a922fd 100644
--- a/lib/PublicInbox/GetlineBody.pm
+++ b/lib/PublicInbox/GetlineBody.pm
@@ -13,8 +13,13 @@ use strict;
 use warnings;
 
 sub new {
-	my ($class, $rpipe, $end, $buf) = @_;
-	bless { rpipe => $rpipe, end => $end, buf => $buf }, $class;
+	my ($class, $rpipe, $end, $buf, $filter) = @_;
+	bless {
+		rpipe => $rpipe,
+		end => $end,
+		buf => $buf,
+		filter => $filter || 0,
+	}, $class;
 }
 
 # close should always be called after getline returns undef,
@@ -24,8 +29,13 @@ sub DESTROY { $_[0]->close }
 
 sub getline {
 	my ($self) = @_;
+	my $filter = $self->{filter};
+	return if $filter == -1; # last call was EOF
+
 	my $buf = delete $self->{buf}; # initial buffer
-	defined $buf ? $buf : $self->{rpipe}->getline;
+	$buf = $self->{rpipe}->getline unless defined $buf;
+	$self->{filter} = -1 unless defined $buf; # set EOF for next call
+	$filter ? $filter->($buf) : $buf;
 }
 
 sub close {
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index b80dac1..3247cd0 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -9,6 +9,7 @@ package PublicInbox::Qspawn;
 use strict;
 use warnings;
 use PublicInbox::Spawn qw(popen_rd);
+require Plack::Util;
 my $def_limiter;
 
 sub new ($$$;) {
@@ -60,11 +61,25 @@ sub start {
 	}
 }
 
+# create a filter for "push"-based streaming PSGI writes used by HTTPD::Async
+sub filter_fh ($$) {
+	my ($fh, $filter) = @_;
+	Plack::Util::inline_object(
+		close => sub {
+			$fh->write($filter->(undef));
+			$fh->close;
+		},
+		write => sub {
+			$fh->write($filter->($_[0]));
+		});
+}
+
 sub psgi_return {
 	my ($self, $env, $limiter, $parse_hdr) = @_;
 	my ($fh, $rpipe);
 	my $end = sub {
-		if (my $err = $self->finish) {
+		my $err = $self->finish;
+		if ($err && !$env->{'qspawn.quiet'}) {
 			$err = join(' ', @{$self->{args}->[0]}).": $err\n";
 			$env->{'psgi.errors'}->print($err);
 		}
@@ -84,6 +99,7 @@ sub psgi_return {
 	my $cb = sub {
 		my $r = $rd_hdr->() or return;
 		$rd_hdr = undef;
+		my $filter = delete $env->{'qspawn.filter'};
 		if (scalar(@$r) == 3) { # error
 			if ($async) {
 				$async->close; # calls rpipe->close
@@ -94,11 +110,12 @@ sub psgi_return {
 			$res->($r);
 		} elsif ($async) {
 			$fh = $res->($r); # scalar @$r == 2
+			$fh = filter_fh($fh, $filter) if $filter;
 			$async->async_pass($env->{'psgix.io'}, $fh, \$buf);
 		} else { # for synchronous PSGI servers
 			require PublicInbox::GetlineBody;
 			$r->[2] = PublicInbox::GetlineBody->new($rpipe, $end,
-								$buf);
+								$buf, $filter);
 			$res->($r);
 		}
 	};
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 04/14] qspawn|httpd/async: improve and fix out-of-date comments
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (2 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 03/14] qspawn|getlinebody: support streaming filters Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 05/14] httpd/async: stop running command if client disconnects Eric Wong
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

---
 lib/PublicInbox/HTTPD/Async.pm | 1 +
 lib/PublicInbox/Qspawn.pm      | 4 +---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index a8488b6..a9a9573 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -23,6 +23,7 @@ sub new {
 	$self;
 }
 
+# fires after pending writes are complete:
 sub restart_read_cb ($) {
 	my ($self) = @_;
 	sub { $self->watch_read(1) }
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index 3247cd0..96fbf38 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -86,8 +86,6 @@ sub psgi_return {
 		$fh->close if $fh; # async-only
 	};
 
-	# Danga::Socket users, we queue up the read_enable callback to
-	# fire after pending writes are complete:
 	my $buf = '';
 	my $rd_hdr = sub {
 		my $r = sysread($rpipe, $buf, 1024, length($buf));
@@ -102,7 +100,7 @@ sub psgi_return {
 		my $filter = delete $env->{'qspawn.filter'};
 		if (scalar(@$r) == 3) { # error
 			if ($async) {
-				$async->close; # calls rpipe->close
+				$async->close; # calls rpipe->close and $end
 			} else {
 				$rpipe->close;
 				$end->();
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 05/14] httpd/async: stop running command if client disconnects
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (3 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 04/14] qspawn|httpd/async: improve and fix out-of-date comments Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 06/14] qspawn: implement psgi_qx Eric Wong
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

If an HTTP client disconnects while we're piping the output of a
process to them, break the pipe of the process to reclaim
resources as soon as possible.
---
 lib/PublicInbox/HTTPD/Async.pm | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index a9a9573..a1f7551 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -36,14 +36,16 @@ sub main_cb ($$$) {
 		my $r = sysread($self->{sock}, $$bref, 8192);
 		if ($r) {
 			$fh->write($$bref);
-			return if $http->{closed};
-			if ($http->{write_buf_size}) {
-				$self->watch_read(0);
-				$http->write(restart_read_cb($self));
+			unless ($http->{closed}) { # Danga::Socket sets this
+				if ($http->{write_buf_size}) {
+					$self->watch_read(0);
+					$http->write(restart_read_cb($self));
+				}
+				# stay in watch_read, but let other clients
+				# get some work done, too.
+				return;
 			}
-			# stay in watch_read, but let other clients
-			# get some work done, too.
-			return;
+			# fall through to close below...
 		} elsif (!defined $r) {
 			return if $!{EAGAIN} || $!{EINTR};
 		}
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 06/14] qspawn: implement psgi_qx
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (4 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 05/14] httpd/async: stop running command if client disconnects Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 07/14] t/qspawn.t: psgi_qx stderr test Eric Wong
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

This new asynchronous API, will allow us to take
advantage of non-blocking I/O from even small commands;
as those may still need to wait for slow operations.
---
 lib/PublicInbox/Qspawn.pm | 89 ++++++++++++++++++++++++++++++++-------
 1 file changed, 74 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index 96fbf38..6859a8a 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -61,6 +61,48 @@ sub start {
 	}
 }
 
+sub _psgi_finish ($$) {
+	my ($self, $env) = @_;
+	my $err = $self->finish;
+	if ($err && !$env->{'qspawn.quiet'}) {
+		$err = join(' ', @{$self->{args}->[0]}).": $err\n";
+		$env->{'psgi.errors'}->print($err);
+	}
+}
+
+sub psgi_qx {
+	my ($self, $env, $limiter, $qx_cb) = @_;
+	my $qx = PublicInbox::Qspawn::Qx->new;
+	my $end = sub {
+		_psgi_finish($self, $env);
+		eval { $qx_cb->($qx) };
+		$qx = undef;
+	};
+	my $rpipe;
+	my $async = $env->{'pi-httpd.async'};
+	my $cb = sub {
+		my $r = sysread($rpipe, my $buf, 8192);
+		if ($async) {
+			$async->async_pass($env->{'psgix.io'}, $qx, \$buf);
+		} elsif (defined $r) {
+			$r ? $qx->write($buf) : $end->();
+		} else {
+			return if $!{EAGAIN} || $!{EINTR}; # loop again
+			$end->();
+		}
+	};
+	$limiter ||= $def_limiter ||= PublicInbox::Qspawn::Limiter->new(32);
+	$self->start($limiter, sub { # may run later, much later...
+		($rpipe) = @_;
+		if ($async) {
+		# PublicInbox::HTTPD::Async->new($rpipe, $cb, $end)
+			$async = $async->($rpipe, $cb, $end);
+		} else { # generic PSGI
+			$cb->() while $qx;
+		}
+	});
+}
+
 # create a filter for "push"-based streaming PSGI writes used by HTTPD::Async
 sub filter_fh ($$) {
 	my ($fh, $filter) = @_;
@@ -78,11 +120,7 @@ sub psgi_return {
 	my ($self, $env, $limiter, $parse_hdr) = @_;
 	my ($fh, $rpipe);
 	my $end = sub {
-		my $err = $self->finish;
-		if ($err && !$env->{'qspawn.quiet'}) {
-			$err = join(' ', @{$self->{args}->[0]}).": $err\n";
-			$env->{'psgi.errors'}->print($err);
-		}
+		_psgi_finish($self, $env);
 		$fh->close if $fh; # async-only
 	};
 
@@ -92,7 +130,7 @@ sub psgi_return {
 		return if !defined($r) && ($!{EINTR} || $!{EAGAIN});
 		$parse_hdr->($r, \$buf);
 	};
-	my $res;
+	my $res = delete $env->{'qspawn.response'};
 	my $async = $env->{'pi-httpd.async'};
 	my $cb = sub {
 		my $r = $rd_hdr->() or return;
@@ -118,17 +156,21 @@ sub psgi_return {
 		}
 	};
 	$limiter ||= $def_limiter ||= PublicInbox::Qspawn::Limiter->new(32);
+	my $start_cb = sub { # may run later, much later...
+		($rpipe) = @_;
+		if ($async) {
+			# PublicInbox::HTTPD::Async->new($rpipe, $cb, $end)
+			$async = $async->($rpipe, $cb, $end);
+		} else { # generic PSGI
+			$cb->() while $rd_hdr;
+		}
+	};
+
+	return $self->start($limiter, $start_cb) if $res;
+
 	sub {
 		($res) = @_;
-		$self->start($limiter, sub { # may run later, much later...
-			($rpipe) = @_;
-			if ($async) {
-			# PublicInbox::HTTPD::Async->new($rpipe, $cb, $end)
-				$async = $async->($rpipe, $cb, $end);
-			} else { # generic PSGI
-				$cb->() while $rd_hdr;
-			}
-		});
+		$self->start($limiter, $start_cb);
 	};
 }
 
@@ -146,4 +188,21 @@ sub new {
 	}, $class;
 }
 
+# captures everything into a buffer and executes a callback when done
+package PublicInbox::Qspawn::Qx;
+use strict;
+use warnings;
+
+sub new {
+	my ($class) = @_;
+	my $buf = '';
+	bless \$buf, $class;
+}
+
+# called by PublicInbox::HTTPD::Async ($fh->write)
+sub write {
+	${$_[0]} .= $_[1];
+	undef;
+}
+
 1;
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 07/14] t/qspawn.t: psgi_qx stderr test
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (5 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 06/14] qspawn: implement psgi_qx Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 08/14] view: swap CRLF for LF in HTML output Eric Wong
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

---
 t/qspawn.t | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/t/qspawn.t b/t/qspawn.t
index 170e4d7..745ec4d 100644
--- a/t/qspawn.t
+++ b/t/qspawn.t
@@ -1,8 +1,16 @@
-# Copyright (C) 2016-2018 all contributors <meta@public-inbox.org>
+# Copyright (C) 2016-2019 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use Test::More;
 use_ok 'PublicInbox::Qspawn';
 
+{
+	my $cmd = [qw(sh -c), 'echo >&2 err; echo out'];
+	my $qsp = PublicInbox::Qspawn->new($cmd, {}, { 2 => 1 });
+	my $res;
+	$qsp->psgi_qx({}, undef, sub { $res = ${$_[0]} });
+	is($res, "err\nout\n", 'captured stderr and stdout');
+}
+
 my $limiter = PublicInbox::Qspawn::Limiter->new(1);
 {
 	my $x = PublicInbox::Qspawn->new([qw(true)]);
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 08/14] view: swap CRLF for LF in HTML output
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (6 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 07/14] t/qspawn.t: psgi_qx stderr test Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 09/14] solver: rewrite to use Qspawn->psgi_qx and pi-httpd.async Eric Wong
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

It makes no difference to browsers aside from saving a few
bytes; and this means we won't have to worry about extra
'%0D' showing up in links to solver.
---
 lib/PublicInbox/View.pm | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 41a45b0..5aaa72b 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -557,6 +557,10 @@ sub add_text_body {
 
 	return attach_link($upfx, $ct, $p, $fn) unless defined $s;
 
+	# makes no difference to browsers, and don't screw up filename
+	# link generation in diffs with the extra '%0D'
+	$s =~ s/\r\n/\n/sg;
+
 	my ($diff, $spfx);
 	if ($s =~ /^(?:diff|---|\+{3}) /ms) {
 		$diff = [];
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 09/14] solver: rewrite to use Qspawn->psgi_qx and pi-httpd.async
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (7 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 08/14] view: swap CRLF for LF in HTML output Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 10/14] solver: hold patches in temporary directory Eric Wong
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

The psgi_qx routine in the now-abandoned "repobrowse" branch
allows us to break down blob-solving at each process execution
point.  It reuses the Qspawn facility for git-http-backend(1),
allowing us to limit parallel subprocesses independently of Perl
worker count.

This is actually a 2-3% slower a fully-synchronous execution;
but it is fair to other clients as it won't monopolize the server
for hundreds of milliseconds (or even seconds) at a time.
---
 lib/PublicInbox/HTTPD/Async.pm |   9 +
 lib/PublicInbox/SolverGit.pm   | 467 ++++++++++++++++++---------------
 lib/PublicInbox/ViewVCS.pm     |  51 ++--
 t/solver_git.t                 |  22 +-
 4 files changed, 313 insertions(+), 236 deletions(-)

diff --git a/lib/PublicInbox/HTTPD/Async.pm b/lib/PublicInbox/HTTPD/Async.pm
index a1f7551..a647f10 100644
--- a/lib/PublicInbox/HTTPD/Async.pm
+++ b/lib/PublicInbox/HTTPD/Async.pm
@@ -14,6 +14,15 @@ require PublicInbox::EvCleanup;
 
 sub new {
 	my ($class, $io, $cb, $cleanup) = @_;
+
+	# no $io? call $cb at the top of the next event loop to
+	# avoid recursion:
+	unless (defined($io)) {
+		PublicInbox::EvCleanup::asap($cb) if $cb;
+		PublicInbox::EvCleanup::next_tick($cleanup) if $cleanup;
+		return;
+	}
+
 	my $self = fields::new($class);
 	IO::Handle::blocking($io, 0);
 	$self->SUPER::new($io);
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 53a6262..a7a9a0a 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -15,23 +15,41 @@ use Fcntl qw(SEEK_SET);
 use PublicInbox::Git qw(git_unquote git_quote);
 use PublicInbox::Spawn qw(spawn popen_rd);
 use PublicInbox::MsgIter qw(msg_iter msg_part_text);
+use PublicInbox::Qspawn;
 use URI::Escape qw(uri_escape_utf8);
 
+# di = diff info / a hashref with information about a diff ($di):
+# {
+#	oid_a => abbreviated pre-image oid,
+#	oid_b => abbreviated post-image oid,
+#	tmp => anonymous file handle with the diff,
+#	hdr_lines => arrayref of various header lines for mode information
+#	mode_a => original mode of oid_a (string, not integer),
+#	ibx => PublicInbox::Inbox object containing the diff
+#	smsg => PublicInbox::SearchMsg object containing diff
+#	path_a => pre-image path
+#	path_b => post-image path
+# }
+
 # don't bother if somebody sends us a patch with these path components,
 # it's junk at best, an attack attempt at worse:
 my %bad_component = map { $_ => 1 } ('', '.', '..');
 
-sub new {
-	my ($class, $gits, $inboxes) = @_;
-	bless {
-		gits => $gits,
-		inboxes => $inboxes,
-	}, $class;
+sub dbg ($$) {
+	print { $_[0]->{out} } $_[1], "\n" or ERR($_[0], "print(dbg): $!");
+}
+
+sub ERR ($$) {
+	my ($self, $err) = @_;
+	print { $self->{out} } $err, "\n";
+	my $ucb = delete($self->{user_cb});
+	eval { $ucb->($err) } if $ucb;
+	die $err;
 }
 
 # look for existing blobs already in git repos
-sub solve_existing ($$$) {
-	my ($self, $out, $want) = @_;
+sub solve_existing ($$) {
+	my ($self, $want) = @_;
 	my $oid_b = $want->{oid_b};
 	my @ambiguous; # Array of [ git, $oids]
 	foreach my $git (@{$self->{gits}}) {
@@ -50,25 +68,13 @@ sub solve_existing ($$$) {
 		# TODO: do something with the ambiguous array?
 		# push @ambiguous, [ $git, @oids ];
 
-		print $out "`$oid_b' ambiguous in ",
-				join("\n", $git->pub_urls), "\n",
-				join('', map { "$_ blob\n" } @oids), "\n";
+		dbg($self, "`$oid_b' ambiguous in " .
+				join("\n\t", $git->pub_urls) . "\n" .
+				join('', map { "$_ blob\n" } @oids));
 	}
 	scalar(@ambiguous) ? \@ambiguous : undef;
 }
 
-# returns a hashref with information about a diff ($di):
-# {
-#	oid_a => abbreviated pre-image oid,
-#	oid_b => abbreviated post-image oid,
-#	tmp => anonymous file handle with the diff,
-#	hdr_lines => arrayref of various header lines for mode information
-#	mode_a => original mode of oid_a (string, not integer),
-#	ibx => PublicInbox::Inbox object containing the diff
-#	smsg => PublicInbox::SearchMsg object containing diff
-#	path_a => pre-image path
-#	path_b => post-image path
-# }
 sub extract_diff ($$$$) {
 	my ($p, $re, $ibx, $smsg) = @_;
 	my ($part) = @$p; # ignore $depth and @idx;
@@ -182,11 +188,51 @@ sub find_extract_diff ($$$) {
 	}
 }
 
+sub prepare_index ($) {
+	my ($self) = @_;
+	my $patches = $self->{patches};
+	$self->{nr} = 0;
+	$self->{tot} = scalar @$patches;
+
+	my $di = $patches->[0] or die 'no patches';
+	my $oid_a = $di->{oid_a} or die '{oid_a} unset';
+	my $existing = $self->{found}->{$oid_a};
+
+	# no index creation for added files
+	$oid_a =~ /\A0+\z/ and return next_step($self);
+
+	die "BUG: $oid_a not not found" unless $existing;
+
+	my $oid_full = $existing->[1];
+	my $path_a = $di->{path_a} or die "BUG: path_a missing for $oid_full";
+	my $mode_a = $di->{mode_a} || extract_old_mode($di);
+
+	open my $in, '+>', undef or die "open: $!";
+	print $in "$mode_a $oid_full\t$path_a\0" or die "print: $!";
+	$in->flush or die "flush: $!";
+	sysseek($in, 0, 0) or die "seek: $!";
+
+	dbg($self, 'preparing index');
+	my $rdr = { 0 => fileno($in) };
+	my $cmd = [ qw(git -C), $self->{wt_dir},
+			qw(update-index -z --index-info) ];
+	my $qsp = PublicInbox::Qspawn->new($cmd, undef, $rdr);
+	$qsp->psgi_qx($self->{psgi_env}, undef, sub {
+		my ($bref) = @_;
+		if (my $err = $qsp->{err}) {
+			ERR($self, "git update-index error: $err");
+		}
+		dbg($self, "index prepared:\n" .
+			"$mode_a $oid_full\t" . git_quote($path_a));
+		next_step($self); # onto do_git_apply
+	});
+}
+
 # pure Perl "git init"
 sub do_git_init_wt ($) {
 	my ($self) = @_;
 	my $wt = File::Temp->newdir('solver.wt-XXXXXXXX', TMPDIR => 1);
-	my $dir = $wt->dirname;
+	my $dir = $self->{wt_dir} = $wt->dirname;
 
 	foreach ('', qw(objects refs objects/info refs/heads)) {
 		mkdir("$dir/.git/$_") or die "mkdir $_: $!";
@@ -211,7 +257,9 @@ EOF
 	print($fh (map { "$_->{git_dir}/objects\n" } @{$self->{gits}})) or
 		die "print $f: $!";
 	close $fh or die "close: $f: $!";
-	$wt;
+	my $wt_git = $self->{wt_git} = PublicInbox::Git->new("$dir/.git");
+	$wt_git->{-wt} = $wt;
+	prepare_index($self);
 }
 
 sub extract_old_mode ($) {
@@ -222,232 +270,227 @@ sub extract_old_mode ($) {
 	'100644';
 }
 
-sub reap ($$) {
-	my ($pid, $msg) = @_;
-	waitpid($pid, 0) == $pid or die "waitpid($msg): $!";
-	$? == 0 or die "$msg failed: $?";
+sub do_step ($) {
+	my ($self) = @_;
+	eval {
+		# step 1: resolve blobs to patches in the todo queue
+		if (my $want = pop @{$self->{todo}}) {
+			# this populates {patches} and {todo}
+			resolve_patch($self, $want);
+
+		# step 2: then we instantiate a working tree once
+		# the todo queue is finally empty:
+		} elsif (!defined($self->{wt_git})) {
+			do_git_init_wt($self);
+
+		# step 3: apply each patch in the stack
+		} elsif (scalar @{$self->{patches}}) {
+			do_git_apply($self);
+
+		# step 4: execute the user-supplied callback with
+		# our result: (which may be undef)
+		# Other steps may call user_cb to terminate prematurely
+		# on error
+		} elsif (my $ucb = delete($self->{user_cb})) {
+			$ucb->($self->{found}->{$self->{oid_want}});
+		} else {
+			die 'about to call user_cb twice'; # Oops :x
+		}
+	}; # eval
+	my $err = $@;
+	if ($err) {
+		$err =~ s/^\s*Exception:\s*//; # bad word to show users :P
+		dbg($self, "E: $err");
+		my $ucb = delete($self->{user_cb});
+		eval { $ucb->($err) } if $ucb;
+	}
 }
 
-sub prepare_index ($$$$) {
-	my ($out, $wt_dir, $existing, $di) = @_;
-	my $oid_full = $existing->[1];
-	my ($r, $w);
-	my $path_a = $di->{path_a} or die "BUG: path_a missing for $oid_full";
-	my $mode_a = $di->{mode_a} || extract_old_mode($di);
-
-	# unlike git-apply(1), this only gets called once in a patch
-	# series and happens too quickly to be worth making async:
-	pipe($r, $w) or die "pipe: $!";
-	my $rdr = { 0 => fileno($r) };
-	my $pid = spawn([qw(git -C), $wt_dir,
-			 qw(update-index -z --index-info)], undef, $rdr);
-	close $r or die "close pipe(r): $!";
-	print $w "$mode_a $oid_full\t$path_a\0" or die "print update-index: $!";
-
-	close $w or die "close update-index: $!";
-	reap($pid, 'update-index -z --index-info');
-
-	print $out "index prepared:\n",
-		"$mode_a $oid_full\t", git_quote($path_a), "\n";
+sub step_cb ($) {
+	my ($self) = @_;
+	sub { do_step($self) };
 }
 
-sub do_apply_begin ($$$) {
-	my ($out, $wt_dir, $di) = @_;
-
-	my $tmp = delete $di->{tmp} or die "BUG: no tmp ", di_url($di);
-	$tmp->flush or die "tmp->flush failed: $!";
-	$out->flush or die "err->flush failed: $!";
-	sysseek($tmp, 0, SEEK_SET) or die "sysseek(tmp) failed: $!";
-
-	defined(my $err_fd = fileno($out)) or die "fileno(out): $!";
-	my $rdr = { 0 => fileno($tmp), 1 => $err_fd, 2 => $err_fd };
-
-	# we need --ignore-whitespace because some patches are CRLF
-	my $cmd = [ qw(git -C), $wt_dir,
-	            qw(apply --cached --ignore-whitespace
-		       --whitespace=warn --verbose) ];
-	spawn($cmd, undef, $rdr);
+sub next_step ($) {
+	my ($self) = @_;
+	# if outside of public-inbox-httpd, caller is expected to be
+	# looping step_cb, anyways
+	my $async = $self->{psgi_env}->{'pi-httpd.async'} or return;
+	# PublicInbox::HTTPD::Async->new
+	$async->(undef, step_cb($self));
 }
 
-sub do_apply_continue ($$) {
-	my ($wt_dir, $apply_pid) = @_;
-	reap($apply_pid, 'apply');
-	popen_rd([qw(git -C), $wt_dir, qw(ls-files -s -z)]);
+sub mark_found ($$$) {
+	my ($self, $oid, $found_info) = @_;
+	$self->{found}->{$oid} = $found_info;
 }
 
-sub do_apply_end ($$$$) {
-	my ($out, $wt_git, $rd, $di) = @_;
+sub parse_ls_files ($$$$) {
+	my ($self, $qsp, $bref, $di) = @_;
+	if (my $err = $qsp->{err}) {
+		die "git ls-files error: $err";
+	}
 
-	local $/ = "\0";
-	defined(my $line = <$rd>) or die "failed to read ls-files: $!";
-	chomp $line or die "no trailing \\0 in [$line] from ls-files";
+	my ($line, @extra) = split(/\0/, $$bref);
+	scalar(@extra) and die "BUG: extra files in index: <",
+				join('> <', @extra), ">";
 
 	my ($info, $file) = split(/\t/, $line, 2);
 	my ($mode_b, $oid_b_full, $stage) = split(/ /, $info);
+	if ($file ne $di->{path_b}) {
+		die
+"BUG: index mismatch: file=$file != path_b=$di->{path_b}";
+	}
 
-	defined($line = <$rd>) and die "extra files in index: $line";
-	close $rd or die "close ls-files: $?";
-
-	$file eq $di->{path_b} or
-		die "index mismatch: file=$file != path_b=$di->{path_b}";
-
+	my $wt_git = $self->{wt_git} or die 'no git working tree';
 	my (undef, undef, $size) = $wt_git->check($oid_b_full);
+	defined($size) or die "check $oid_b_full failed";
 
-	defined($size) or die "failed to read_size from $oid_b_full";
-
-	print $out "$mode_b $oid_b_full\t$file\n";
-	[ $wt_git, $oid_b_full, 'blob', $size, $di ];
+	dbg($self, "index at:\n$mode_b $oid_b_full\t$file");
+	my $created = [ $wt_git, $oid_b_full, 'blob', $size, $di ];
+	mark_found($self, $di->{oid_b}, $created);
+	next_step($self); # onto the next patch
 }
 
-sub di_url ($) {
-	my ($di) = @_;
-	# note: we don't pass the PSGI env here, different inboxes
-	# can have different HTTP_HOST on the same instance.
-	my $url = $di->{ibx}->base_url;
-	my $mid = $di->{smsg}->{mid};
-	defined($url) ? "$url$mid/" : "<$mid>";
+sub start_ls_files ($$) {
+	my ($self, $di) = @_;
+	my $cmd = [qw(git -C), $self->{wt_dir}, qw(ls-files -s -z)];
+	my $qsp = PublicInbox::Qspawn->new($cmd);
+	$qsp->psgi_qx($self->{psgi_env}, undef, sub {
+		my ($bref) = @_;
+		eval { parse_ls_files($self, $qsp, $bref, $di) };
+		ERR($self, $@) if $@;
+	});
 }
 
-# reconstruct the oid_b blob using patches we found:
-sub apply_patches_cb ($$$$$) {
-	my ($self, $out, $found, $patches, $oid_b) = @_;
-
-	my $tot = scalar(@$patches) or return sub {
-		print $out "no patch(es) for $oid_b\n";
-		undef;
-	};
-
-	my $wt = do_git_init_wt($self);
-	my $wt_dir = $wt->dirname;
-	my $wt_git = PublicInbox::Git->new("$wt_dir/.git");
-	$wt_git->{-wt} = $wt;
-
-	my $cur = 0;
-	my ($apply_pid, $rd, $di);
-
-	# returns an empty string if in progress, undef if not found,
-	# or the final [ ::Git, oid_full, type, size, $di ] arrayref
-	# if found
-	sub {
-		if ($rd) {
-			$found->{$di->{oid_b}} =
-					do_apply_end($out, $wt_git, $rd, $di);
-			$rd = undef;
-			# continue to shift @$patches
-		} elsif ($apply_pid) {
-			$rd = do_apply_continue($wt_dir, $apply_pid);
-			$apply_pid = undef;
-			return ''; # $rd => do_apply_ned
-		}
+sub do_git_apply ($) {
+	my ($self) = @_;
 
-		# may return undef here
-		$di = shift @$patches or return $found->{$oid_b};
+	my $di = shift @{$self->{patches}} or die 'empty {patches}';
+	my $tmp = delete $di->{tmp} or die 'no tmp ', di_url($self, $di);
+	$tmp->flush or die "tmp->flush failed: $!";
+	sysseek($tmp, 0, SEEK_SET) or die "sysseek(tmp) failed: $!";
 
-		my $i = ++$cur;
-		my $oid_a = $di->{oid_a};
-		my $existing = $found->{$oid_a};
-		my $empty_oid = $oid_a =~ /\A0+\z/;
+	my $i = ++$self->{nr};
+	dbg($self, "\napplying [$i/$self->{tot}] " . di_url($self, $di) .
+		"\n" . join('', @{$di->{hdr_lines}}));
 
-		if ($empty_oid && $i != 1) {
-			die "empty oid at [$i/$tot] ", di_url($di);
-		}
-		if (!$existing && !$empty_oid) {
-			die "missing $oid_a at [$i/$tot] ", di_url($di);
+	# we need --ignore-whitespace because some patches are CRLF
+	my $cmd = [ qw(git -C), $self->{wt_dir},
+	            qw(apply --cached --ignore-whitespace
+		       --whitespace=warn --verbose) ];
+	my $rdr = { 0 => fileno($tmp), 2 => 1 };
+	my $qsp = PublicInbox::Qspawn->new($cmd, undef, $rdr);
+	$qsp->psgi_qx($self->{psgi_env}, undef, sub {
+		my ($bref) = @_;
+		close $tmp;
+		dbg($self, $$bref);
+		if (my $err = $qsp->{err}) {
+			ERR($self, "git apply error: $err");
 		}
+		eval { start_ls_files($self, $di) };
+		ERR($self, $@) if $@;
+	});
+}
 
-		# prepare the worktree for patch application:
-		if ($i == 1 && $existing) {
-			prepare_index($out, $wt_dir, $existing, $di);
-		}
+sub di_url ($$) {
+	my ($self, $di) = @_;
+	# note: we don't pass the PSGI env unconditionally, here,
+	# different inboxes can have different HTTP_HOST on the same instance.
+	my $ibx = $di->{ibx};
+	my $env = $self->{psgi_env} if $ibx eq $self->{inboxes}->[0];
+	my $url = $ibx->base_url($env);
+	my $mid = $di->{smsg}->{mid};
+	defined($url) ? "$url$mid/" : "<$mid>";
+}
 
-		print $out "\napplying [$i/$tot] ", di_url($di), "\n",
-			   join('', @{$di->{hdr_lines}}), "\n"
-			or die "print \$out failed: $!";
+sub resolve_patch ($$) {
+	my ($self, $want) = @_;
 
-		# begin the patch application patch!
-		$apply_pid = do_apply_begin($out, $wt_dir, $di);
-		# next call to this callback will call do_apply_continue
-		'';
+	if (scalar(@{$self->{patches}}) > $self->{max_patch}) {
+		die "Aborting, too many steps to $self->{oid_want}";
 	}
-}
 
-# recreate $oid_b
-# Returns an array ref: [ ::Git object, oid_full, type, size, di ]
-# or undef if nothing was found.
-#
-# TODO: complete the migration of this and ViewVCS into an evented
-# model for fairness
-sub solve ($$$$) {
-	my ($self, $out, $oid_b, $hints) = @_;
+	# see if we can find the blob in an existing git repo:
+	my $cur_want = $want->{oid_b};
+	if (my $existing = solve_existing($self, $want)) {
+		dbg($self, "found $cur_want in " .
+			join("\n", $existing->[0]->pub_urls));
 
-	# should we even get here? Probably not, but somebody
-	# could be manually typing URLs:
-	return if $oid_b =~ /\A0+\z/;
-
-	my $req = { %$hints, oid_b => $oid_b };
-	my @todo = ($req);
-	my $found = {}; # { abbrev => [ ::Git, oid_full, type, size, $di ] }
-	my $patches = []; # [ array of $di hashes ]
-	my $max = $self->{max_patches} || 200;
-	my $apply_cb;
-	my $cb = sub {
-		my $want = pop @todo;
-		unless ($want) {
-			$apply_cb ||= apply_patches_cb($self, $out, $found,
-			                               $patches, $oid_b);
-			return $apply_cb->();
-		}
-
-		if (scalar(@$patches) > $max) {
-			print $out "Aborting, too many steps to $oid_b\n";
+		if ($cur_want eq $self->{oid_want}) { # all done!
+			eval { delete($self->{user_cb})->($existing) };
+			die "E: $@" if $@;
 			return;
 		}
-		# see if we can find the blob in an existing git repo:
-		my $want_oid = $want->{oid_b};
-		if (my $existing = solve_existing($self, $out, $want)) {
-			print $out "found $want_oid in ",
-				join("\n", $existing->[0]->pub_urls), "\n";
-
-			return $existing if $want_oid eq $oid_b; # DONE!
-			$found->{$want_oid} = $existing;
-			return ''; # ok, one blob resolved, more to go?
-		}
-
-		# scan through inboxes to look for emails which results in
-		# the oid we want:
-		my $di;
-		foreach my $ibx (@{$self->{inboxes}}) {
-			$di = find_extract_diff($self, $ibx, $want) or next;
+		mark_found($self, $cur_want, $existing);
+		return next_step($self); # onto patch application
+	}
 
-			unshift @$patches, $di;
-			print $out "found $want_oid in ",di_url($di),"\n";
+	# scan through inboxes to look for emails which results in
+	# the oid we want:
+	my $di;
+	foreach my $ibx (@{$self->{inboxes}}) {
+		$di = find_extract_diff($self, $ibx, $want) or next;
 
-			# good, we can find a path to the oid we $want, now
-			# lets see if we need to apply more patches:
-			my $src = $di->{oid_a};
+		unshift @{$self->{patches}}, $di;
+		dbg($self, "found $cur_want in ".di_url($self, $di));
 
-			last if $src =~ /\A0+\z/;
+		# good, we can find a path to the oid we $want, now
+		# lets see if we need to apply more patches:
+		my $src = $di->{oid_a};
 
+		unless ($src =~ /\A0+\z/) {
 			# we have to solve it using another oid, fine:
 			my $job = { oid_b => $src, path_b => $di->{path_a} };
-			push @todo, $job;
-			last; # onto the next @todo item
+			push @{$self->{todo}}, $job;
 		}
-		unless ($di) {
-			print $out "$want_oid could not be found\n";
-			return;
-		}
-		''; # continue onto next @todo item;
-	};
+		return next_step($self); # onto the next todo item
+	}
+	dbg($self, "could not find $cur_want");
+	eval { delete($self->{user_cb})->(undef) }; # not found! :<
+	die "E: $@" if $@;
+}
 
-	while (1) {
-		my $ret = eval { $cb->() };
-		unless (defined($ret)) {
-			print $out "E: $@\n" if $@;
-			return;
-		}
-		return $ret if ref($ret);
-		# $ret == ''; so continue looping here
+# this API is designed to avoid creating self-referential structures;
+# so user_cb never references the SolverGit object
+sub new {
+	my ($class, $ibx, $user_cb) = @_;
+
+	bless {
+		gits => $ibx->{-repo_objs},
+		user_cb => $user_cb,
+		max_patch => 100,
+
+		# TODO: config option for searching related inboxes
+		inboxes => [ $ibx ],
+	}, $class;
+}
+
+# recreate $oid_want using $hints
+# Calls {user_cb} with: [ ::Git object, oid_full, type, size, di (diff_info) ]
+# with found object, or undef if nothing was found
+# Calls {user_cb} with a string error on fatal errors
+sub solve ($$$$$) {
+	my ($self, $env, $out, $oid_want, $hints) = @_;
+
+	# should we even get here? Probably not, but somebody
+	# could be manually typing URLs:
+	return (delete $self->{user_cb})->(undef) if $oid_want =~ /\A0+\z/;
+
+	$self->{oid_want} = $oid_want;
+	$self->{out} = $out;
+	$self->{psgi_env} = $env;
+	$self->{todo} = [ { %$hints, oid_b => $oid_want } ];
+	$self->{patches} = []; # [ $di, $di, ... ]
+	$self->{found} = {}; # { abbr => [ ::Git, oid, type, size, $di ] }
+
+	dbg($self, "solving $oid_want ...");
+	my $step_cb = step_cb($self);
+	if (my $async = $env->{'pi-httpd.async'}) {
+		# PublicInbox::HTTPD::Async->new
+		$async->(undef, $step_cb);
+	} else {
+		$step_cb->() while $self->{user_cb};
 	}
 }
 
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 4a3896d..fa76086 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -27,37 +27,33 @@ my $enc_utf8 = find_encoding('UTF-8');
 
 sub html_page ($$$) {
 	my ($ctx, $code, $strref) = @_;
+	my $wcb = delete $ctx->{-wcb};
 	$ctx->{-upfx} = '../../'; # from "/$INBOX/$OID/s/"
-	PublicInbox::WwwStream->response($ctx, $code, sub {
+	my $res = PublicInbox::WwwStream->response($ctx, $code, sub {
 		my ($nr, undef) =  @_;
 		$nr == 1 ? $$strref : undef;
 	});
+	$wcb->($res);
 }
 
-sub show ($$;$) {
-	my ($ctx, $oid_b, $fn) = @_;
-	my $ibx = $ctx->{-inbox};
-	my $inboxes = [ $ibx ];
-	my $solver = PublicInbox::SolverGit->new($ibx->{-repo_objs}, $inboxes);
-	my $qp = $ctx->{qp};
-	my $hints = {};
-	while (my ($from, $to) = each %QP_MAP) {
-		defined(my $v = $qp->{$from}) or next;
-		$hints->{$to} = $v;
-	}
-
-	open my $log, '+>', undef or die "open: $!";
-	my $res = $solver->solve($log, $oid_b, $hints);
+sub solve_result {
+	my ($ctx, $res, $log, $hints, $fn) = @_;
 
-	seek($log, 0, 0) or die "seek: $!";
+	unless (seek($log, 0, 0)) {
+		$ctx->{env}->{'psgi.errors'}->print("seek(log): $!\n");
+		return html_page($ctx, 500, \'seek error');
+	}
 	$log = do { local $/; <$log> };
 
+	my $ref = ref($res);
+	$log .= $res unless $ref;
 	my $l = PublicInbox::Linkify->new;
 	$l->linkify_1($log);
 	$log = '<pre>debug log:</pre><hr /><pre>' .
 		$l->linkify_2(ascii_html($log)) . '</pre>';
 
 	$res or return html_page($ctx, 404, \$log);
+	$ref eq 'ARRAY' or return html_page($ctx, 500, \$log);
 
 	my ($git, $oid, $type, $size, $di) = @$res;
 	if ($size > $max_size) {
@@ -78,7 +74,7 @@ sub show ($$;$) {
 	if ($fn) {
 		my $h = [ 'Content-Length', $size, 'Content-Type' ];
 		push(@$h, ($binary ? 'application/octet-stream' : 'text/plain'));
-		return [ 200, $h, [ $$blob ]];
+		return delete($ctx->{-wcb})->([200, $h, [ $$blob ]]);
 	}
 
 	my $path = to_filename($di->{path_b} || $hints->{path_b} || 'blob');
@@ -107,4 +103,25 @@ sub show ($$;$) {
 	html_page($ctx, 200, \$log);
 }
 
+sub show ($$;$) {
+	my ($ctx, $oid_b, $fn) = @_;
+	my $qp = $ctx->{qp};
+	my $hints = {};
+	while (my ($from, $to) = each %QP_MAP) {
+		defined(my $v = $qp->{$from}) or next;
+		$hints->{$to} = $v;
+	}
+
+	open my $log, '+>', undef or die "open: $!";
+	my $solver = PublicInbox::SolverGit->new($ctx->{-inbox}, sub {
+		solve_result($ctx, $_[0], $log, $hints, $fn);
+	});
+
+	# PSGI server will call this and give us a callback
+	sub {
+		$ctx->{-wcb} = $_[0]; # HTTP write callback
+		$solver->solve($ctx->{env}, $log, $oid_b, $hints);
+	};
+}
+
 1;
diff --git a/t/solver_git.t b/t/solver_git.t
index fe322ea..197a003 100644
--- a/t/solver_git.t
+++ b/t/solver_git.t
@@ -40,10 +40,12 @@ sub deliver_patch ($) {
 
 deliver_patch('t/solve/0001-simple-mod.patch');
 
-my $gits = [ PublicInbox::Git->new($git_dir) ];
-my $solver = PublicInbox::SolverGit->new($gits, [ $ibx ]);
+$ibx->{-repo_objs} = [ PublicInbox::Git->new($git_dir) ];
+my $res;
+my $solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
 open my $log, '+>>', "$mainrepo/solve.log" or die "open: $!";
-my $res = $solver->solve($log, '69df7d5', {});
+my $psgi_env = { 'psgi.url_scheme' => 'http', HTTP_HOST => 'example.com' };
+$solver->solve($psgi_env, $log, '69df7d5', {});
 ok($res, 'solved a blob!');
 my $wt_git = $res->[0];
 is(ref($wt_git), 'PublicInbox::Git', 'got a git object for the blob');
@@ -62,20 +64,24 @@ if (0) { # TODO: check this?
 	diag $z;
 }
 
+$solver = undef;
 $res = undef;
 my $wt_git_dir = $wt_git->{git_dir};
 $wt_git = undef;
 ok(!-d $wt_git_dir, 'no references to WT held');
 
-$res = $solver->solve($log, '0'x40, {});
+$solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
+$solver->solve($psgi_env, $log, '0'x40, {});
 is($res, undef, 'no error on z40');
 
 my $git_v2_20_1_tag = '7a95a1cd084cb665c5c2586a415e42df0213af74';
-$res = $solver->solve($log, $git_v2_20_1_tag, {});
+$solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
+$solver->solve($psgi_env, $log, $git_v2_20_1_tag, {});
 is($res, undef, 'no error on a tag not in our repo');
 
 deliver_patch('t/solve/0002-rename-with-modifications.patch');
-$res = $solver->solve($log, '0a92431', {});
+$solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
+$solver->solve($psgi_env, $log, '0a92431', {});
 ok($res, 'resolved without hints');
 
 my $hints = {
@@ -83,7 +89,9 @@ my $hints = {
 	path_a => 'HACKING',
 	path_b => 'CONTRIBUTING'
 };
-my $hinted = $solver->solve($log, '0a92431', $hints);
+$solver = PublicInbox::SolverGit->new($ibx, sub { $res = $_[0] });
+$solver->solve($psgi_env, $log, '0a92431', $hints);
+my $hinted = $res;
 # don't compare ::Git objects:
 shift @$res; shift @$hinted;
 is_deeply($res, $hinted, 'hints work (or did not hurt :P');
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 10/14] solver: hold patches in temporary directory
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (8 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 09/14] solver: rewrite to use Qspawn->psgi_qx and pi-httpd.async Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 11/14] solver: reduce "git apply" invocations Eric Wong
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

We can avoid bumping up RLIMIT_NOFILE too much by storing
patches in a temporary directory.  And we can share this
top-level directory with our temporary git repository.

Since we no longer rely on a working-tree for git, we are free
to rearrange the layout and avoid relying on the ".git"
convention and relying on "git -C" for chdir.

This may also ease porting public-inbox to older systems
where git does not support "-C" for chdir.
---
 lib/PublicInbox/Git.pm       |  2 +-
 lib/PublicInbox/SolverGit.pm | 57 +++++++++++++++++++-----------------
 2 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index a0b934a..3ad0811 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -50,7 +50,7 @@ sub new {
 	my ($class, $git_dir) = @_;
 	my @st;
 	$st[7] = $st[10] = 0;
-	# may contain {-wt} field (working-tree (File::Temp::Dir))
+	# may contain {-tmp} field for File::Temp::Dir
 	bless { git_dir => $git_dir, st => \@st }, $class
 }
 
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index a7a9a0a..e307202 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -214,9 +214,8 @@ sub prepare_index ($) {
 
 	dbg($self, 'preparing index');
 	my $rdr = { 0 => fileno($in) };
-	my $cmd = [ qw(git -C), $self->{wt_dir},
-			qw(update-index -z --index-info) ];
-	my $qsp = PublicInbox::Qspawn->new($cmd, undef, $rdr);
+	my $cmd = [ qw(git update-index -z --index-info) ];
+	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env}, $rdr);
 	$qsp->psgi_qx($self->{psgi_env}, undef, sub {
 		my ($bref) = @_;
 		if (my $err = $qsp->{err}) {
@@ -229,16 +228,16 @@ sub prepare_index ($) {
 }
 
 # pure Perl "git init"
-sub do_git_init_wt ($) {
+sub do_git_init ($) {
 	my ($self) = @_;
-	my $wt = File::Temp->newdir('solver.wt-XXXXXXXX', TMPDIR => 1);
-	my $dir = $self->{wt_dir} = $wt->dirname;
+	my $dir = $self->{tmp}->dirname;
+	my $git_dir = "$dir/git";
 
 	foreach ('', qw(objects refs objects/info refs/heads)) {
-		mkdir("$dir/.git/$_") or die "mkdir $_: $!";
+		mkdir("$git_dir/$_") or die "mkdir $_: $!";
 	}
-	open my $fh, '>', "$dir/.git/config" or die "open .git/config: $!";
-	print $fh <<'EOF' or die "print .git/config $!";
+	open my $fh, '>', "$git_dir/config" or die "open git/config: $!";
+	print $fh <<'EOF' or die "print git/config $!";
 [core]
 	repositoryFormatVersion = 0
 	filemode = true
@@ -246,19 +245,23 @@ sub do_git_init_wt ($) {
 	fsyncObjectfiles = false
 	logAllRefUpdates = false
 EOF
-	close $fh or die "close .git/config: $!";
+	close $fh or die "close git/config: $!";
 
-	open $fh, '>', "$dir/.git/HEAD" or die "open .git/HEAD: $!";
-	print $fh "ref: refs/heads/master\n" or die "print .git/HEAD: $!";
-	close $fh or die "close .git/HEAD: $!";
+	open $fh, '>', "$git_dir/HEAD" or die "open git/HEAD: $!";
+	print $fh "ref: refs/heads/master\n" or die "print git/HEAD: $!";
+	close $fh or die "close git/HEAD: $!";
 
-	my $f = '.git/objects/info/alternates';
-	open $fh, '>', "$dir/$f" or die "open: $f: $!";
+	my $f = 'objects/info/alternates';
+	open $fh, '>', "$git_dir/$f" or die "open: $f: $!";
 	print($fh (map { "$_->{git_dir}/objects\n" } @{$self->{gits}})) or
 		die "print $f: $!";
 	close $fh or die "close: $f: $!";
-	my $wt_git = $self->{wt_git} = PublicInbox::Git->new("$dir/.git");
-	$wt_git->{-wt} = $wt;
+	my $tmp_git = $self->{tmp_git} = PublicInbox::Git->new($git_dir);
+	$tmp_git->{-tmp} = $self->{tmp};
+	$self->{git_env} = {
+		GIT_DIR => $git_dir,
+		GIT_INDEX_FILE => "$git_dir/index",
+	};
 	prepare_index($self);
 }
 
@@ -280,8 +283,8 @@ sub do_step ($) {
 
 		# step 2: then we instantiate a working tree once
 		# the todo queue is finally empty:
-		} elsif (!defined($self->{wt_git})) {
-			do_git_init_wt($self);
+		} elsif (!defined($self->{tmp_git})) {
+			do_git_init($self);
 
 		# step 3: apply each patch in the stack
 		} elsif (scalar @{$self->{patches}}) {
@@ -342,20 +345,20 @@ sub parse_ls_files ($$$$) {
 "BUG: index mismatch: file=$file != path_b=$di->{path_b}";
 	}
 
-	my $wt_git = $self->{wt_git} or die 'no git working tree';
-	my (undef, undef, $size) = $wt_git->check($oid_b_full);
+	my $tmp_git = $self->{tmp_git} or die 'no git working tree';
+	my (undef, undef, $size) = $tmp_git->check($oid_b_full);
 	defined($size) or die "check $oid_b_full failed";
 
 	dbg($self, "index at:\n$mode_b $oid_b_full\t$file");
-	my $created = [ $wt_git, $oid_b_full, 'blob', $size, $di ];
+	my $created = [ $tmp_git, $oid_b_full, 'blob', $size, $di ];
 	mark_found($self, $di->{oid_b}, $created);
 	next_step($self); # onto the next patch
 }
 
 sub start_ls_files ($$) {
 	my ($self, $di) = @_;
-	my $cmd = [qw(git -C), $self->{wt_dir}, qw(ls-files -s -z)];
-	my $qsp = PublicInbox::Qspawn->new($cmd);
+	my $cmd = [qw(git ls-files -s -z)];
+	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env});
 	$qsp->psgi_qx($self->{psgi_env}, undef, sub {
 		my ($bref) = @_;
 		eval { parse_ls_files($self, $qsp, $bref, $di) };
@@ -376,11 +379,10 @@ sub do_git_apply ($) {
 		"\n" . join('', @{$di->{hdr_lines}}));
 
 	# we need --ignore-whitespace because some patches are CRLF
-	my $cmd = [ qw(git -C), $self->{wt_dir},
-	            qw(apply --cached --ignore-whitespace
+	my $cmd = [ qw(git apply --cached --ignore-whitespace
 		       --whitespace=warn --verbose) ];
 	my $rdr = { 0 => fileno($tmp), 2 => 1 };
-	my $qsp = PublicInbox::Qspawn->new($cmd, undef, $rdr);
+	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env}, $rdr);
 	$qsp->psgi_qx($self->{psgi_env}, undef, sub {
 		my ($bref) = @_;
 		close $tmp;
@@ -483,6 +485,7 @@ sub solve ($$$$$) {
 	$self->{todo} = [ { %$hints, oid_b => $oid_want } ];
 	$self->{patches} = []; # [ $di, $di, ... ]
 	$self->{found} = {}; # { abbr => [ ::Git, oid, type, size, $di ] }
+	$self->{tmp} = File::Temp->newdir('solver.tmp-XXXXXXXX', TMPDIR => 1);
 
 	dbg($self, "solving $oid_want ...");
 	my $step_cb = step_cb($self);
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 11/14] solver: reduce "git apply" invocations
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (9 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 10/14] solver: hold patches in temporary directory Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 12/14] qspawn: decode $? for user-friendliness Eric Wong
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

"git apply" is capable of applying multiple patches in one
invocation, so give it multiple patches on the command-line
now that we no longer rely on anonymous file handles to hold
patches.

This cuts down a 64-patch series on git@vger from ~1s to ~800ms
with vfork spawn enabled using Inline::C.
---
 lib/PublicInbox/SolverGit.pm | 61 +++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index e307202..dfc28d2 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -18,6 +18,12 @@ use PublicInbox::MsgIter qw(msg_iter msg_part_text);
 use PublicInbox::Qspawn;
 use URI::Escape qw(uri_escape_utf8);
 
+# POSIX requires _POSIX_ARG_MAX >= 4096, and xargs is required to
+# subtract 2048 bytes.  We also don't factor in environment variable
+# headroom into this.
+use POSIX qw(sysconf _SC_ARG_MAX);
+my $ARG_SIZE_MAX = (sysconf(_SC_ARG_MAX) || 4096) - 2048;
+
 # di = diff info / a hashref with information about a diff ($di):
 # {
 #	oid_a => abbreviated pre-image oid,
@@ -75,8 +81,8 @@ sub solve_existing ($$) {
 	scalar(@ambiguous) ? \@ambiguous : undef;
 }
 
-sub extract_diff ($$$$) {
-	my ($p, $re, $ibx, $smsg) = @_;
+sub extract_diff ($$$$$) {
+	my ($self, $p, $re, $ibx, $smsg) = @_;
 	my ($part) = @$p; # ignore $depth and @idx;
 	my $hdr_lines; # diff --git a/... b/...
 	my $tmp;
@@ -103,9 +109,11 @@ sub extract_diff ($$$$) {
 				}
 			}
 
+
 			# start writing the diff out to a tempfile
-			open($tmp, '+>', undef) or die "open(tmp): $!";
-			$di->{tmp} = $tmp;
+			my $pn = ++$self->{tot};
+			open($tmp, '>', $self->{tmp}->dirname . "/$pn") or
+							die "open(tmp): $!";
 
 			push @$hdr_lines, $l;
 			$di->{hdr_lines} = $hdr_lines;
@@ -115,7 +123,7 @@ sub extract_diff ($$$$) {
 			$di->{ibx} = $ibx;
 			$di->{smsg} = $smsg;
 		} elsif ($l =~ m!\Adiff --git ("?a/.+) ("?b/.+)$!) {
-			return $di if $tmp; # got our blob, done!
+			last if $tmp; # got our blob, done!
 
 			my ($path_a, $path_b) = ($1, $2);
 
@@ -145,7 +153,9 @@ sub extract_diff ($$$$) {
 			}
 		}
 	}
-	$tmp ? $di : undef;
+	return undef unless $tmp;
+	close $tmp or die "close(tmp): $!";
+	$di;
 }
 
 sub path_searchable ($) { defined($_[0]) && $_[0] =~ m!\A[\w/\. \-]+\z! }
@@ -182,7 +192,7 @@ sub find_extract_diff ($$$) {
 	foreach my $smsg (@$msgs) {
 		$ibx->smsg_mime($smsg) or next;
 		msg_iter(delete($smsg->{mime}), sub {
-			$di ||= extract_diff($_[0], $re, $ibx, $smsg);
+			$di ||= extract_diff($self, $_[0], $re, $ibx, $smsg);
 		});
 		return $di if $di;
 	}
@@ -192,7 +202,6 @@ sub prepare_index ($) {
 	my ($self) = @_;
 	my $patches = $self->{patches};
 	$self->{nr} = 0;
-	$self->{tot} = scalar @$patches;
 
 	my $di = $patches->[0] or die 'no patches';
 	my $oid_a = $di->{oid_a} or die '{oid_a} unset';
@@ -368,24 +377,31 @@ sub start_ls_files ($$) {
 
 sub do_git_apply ($) {
 	my ($self) = @_;
-
-	my $di = shift @{$self->{patches}} or die 'empty {patches}';
-	my $tmp = delete $di->{tmp} or die 'no tmp ', di_url($self, $di);
-	$tmp->flush or die "tmp->flush failed: $!";
-	sysseek($tmp, 0, SEEK_SET) or die "sysseek(tmp) failed: $!";
-
-	my $i = ++$self->{nr};
-	dbg($self, "\napplying [$i/$self->{tot}] " . di_url($self, $di) .
-		"\n" . join('', @{$di->{hdr_lines}}));
+	my $dn = $self->{tmp}->dirname;
+	my $patches = $self->{patches};
 
 	# we need --ignore-whitespace because some patches are CRLF
-	my $cmd = [ qw(git apply --cached --ignore-whitespace
-		       --whitespace=warn --verbose) ];
-	my $rdr = { 0 => fileno($tmp), 2 => 1 };
-	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env}, $rdr);
+	my @cmd = qw(git apply --cached --ignore-whitespace
+			--whitespace=warn --verbose);
+	my $len = length(join(' ', @cmd));
+	my $total = $self->{tot};
+	my $di; # keep track of the last one for "git ls-files"
+
+	do {
+		my $i = ++$self->{nr};
+		$di = shift @$patches;
+		dbg($self, "\napplying [$i/$total] " . di_url($self, $di) .
+			"\n" . join('', @{$di->{hdr_lines}}));
+		my $pn = $total + 1 - $i;
+		my $path = "$dn/$pn";
+		$len += length($path) + 1;
+		push @cmd, $path;
+	} while (@$patches && $len < $ARG_SIZE_MAX);
+
+	my $rdr = { 2 => 1 };
+	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env}, $rdr);
 	$qsp->psgi_qx($self->{psgi_env}, undef, sub {
 		my ($bref) = @_;
-		close $tmp;
 		dbg($self, $$bref);
 		if (my $err = $qsp->{err}) {
 			ERR($self, "git apply error: $err");
@@ -481,6 +497,7 @@ sub solve ($$$$$) {
 
 	$self->{oid_want} = $oid_want;
 	$self->{out} = $out;
+	$self->{tot} = 0;
 	$self->{psgi_env} = $env;
 	$self->{todo} = [ { %$hints, oid_b => $oid_want } ];
 	$self->{patches} = []; # [ $di, $di, ... ]
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 12/14] qspawn: decode $? for user-friendliness
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (10 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 11/14] solver: reduce "git apply" invocations Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 13/14] viewvcs: do not show final error message twice Eric Wong
  2019-01-27  4:03 ` [PATCH 14/14] solver: crank up max patches to 9999 Eric Wong
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

The raw value of $? isn't very useful, generally.
---
 lib/PublicInbox/Qspawn.pm | 11 ++++++++++-
 t/qspawn.t                |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index 6859a8a..913fac8 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -30,13 +30,22 @@ sub _do_spawn {
 	$cb->($self->{rpipe});
 }
 
+sub child_err ($) {
+	my ($child_error) = @_; # typically $?
+	my $exitstatus = ($child_error >> 8) or return;
+	my $sig = $child_error & 127;
+	my $msg = "exit status=$exitstatus";
+	$msg .= " signal=$sig" if $sig;
+	$msg;
+}
+
 sub finish ($) {
 	my ($self) = @_;
 	my $limiter = $self->{limiter};
 	my $running;
 	if (delete $self->{rpipe}) {
 		my $pid = delete $self->{pid};
-		$self->{err} = $pid == waitpid($pid, 0) ? $? :
+		$self->{err} = $pid == waitpid($pid, 0) ? child_err($?) :
 				"PID:$pid still running?";
 		$running = --$limiter->{running};
 	}
diff --git a/t/qspawn.t b/t/qspawn.t
index 745ec4d..ab6e375 100644
--- a/t/qspawn.t
+++ b/t/qspawn.t
@@ -31,7 +31,7 @@ my $limiter = PublicInbox::Qspawn::Limiter->new(1);
 		my ($rpipe) = @_;
 		is(0, sysread($rpipe, my $buf, 1), 'read zero bytes from false');
 		my $err = $x->finish;
-		is($err, 256, 'error on finish');
+		ok($err, 'error on finish');
 		$run = 1;
 	});
 	is($run, 1, 'callback ran alright');
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 13/14] viewvcs: do not show final error message twice
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (11 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 12/14] qspawn: decode $? for user-friendliness Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  2019-01-27  4:03 ` [PATCH 14/14] solver: crank up max patches to 9999 Eric Wong
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

SolverGit::ERR already writes the exception to the debug
log before calling {user_cb}, so there's no need for viewvcs
to append it.
---
 lib/PublicInbox/ViewVCS.pm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index fa76086..5de37ee 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -46,7 +46,6 @@ sub solve_result {
 	$log = do { local $/; <$log> };
 
 	my $ref = ref($res);
-	$log .= $res unless $ref;
 	my $l = PublicInbox::Linkify->new;
 	$l->linkify_1($log);
 	$log = '<pre>debug log:</pre><hr /><pre>' .
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 14/14] solver: crank up max patches to 9999
  2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
                   ` (12 preceding siblings ...)
  2019-01-27  4:03 ` [PATCH 13/14] viewvcs: do not show final error message twice Eric Wong
@ 2019-01-27  4:03 ` Eric Wong
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2019-01-27  4:03 UTC (permalink / raw)
  To: meta

Might as well, since the only constraint is filesystem space
for temporary files for public-inbox-httpd users.

-httpd can fairly share work across clients with our use of
psgi_qx; and there's a recent patch series in git@vger with 64
patches in sequence.
---
 lib/PublicInbox/SolverGit.pm | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index dfc28d2..39acbe4 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -24,6 +24,14 @@ use URI::Escape qw(uri_escape_utf8);
 use POSIX qw(sysconf _SC_ARG_MAX);
 my $ARG_SIZE_MAX = (sysconf(_SC_ARG_MAX) || 4096) - 2048;
 
+# By default, "git format-patch" generates filenames with a four-digit
+# prefix, so that means 9999 patch series are OK, right? :>
+# Maybe we can make this configurable, main concern is disk space overhead
+# for uncompressed patch fragments.  Aside from space, public-inbox-httpd
+# is otherwise unaffected by having many patches, here, as it can share
+# work fairly.  Other PSGI servers may have trouble, though.
+my $MAX_PATCH = 9999;
+
 # di = diff info / a hashref with information about a diff ($di):
 # {
 #	oid_a => abbreviated pre-image oid,
@@ -425,7 +433,7 @@ sub di_url ($$) {
 sub resolve_patch ($$) {
 	my ($self, $want) = @_;
 
-	if (scalar(@{$self->{patches}}) > $self->{max_patch}) {
+	if (scalar(@{$self->{patches}}) > $MAX_PATCH) {
 		die "Aborting, too many steps to $self->{oid_want}";
 	}
 
@@ -477,7 +485,6 @@ sub new {
 	bless {
 		gits => $ibx->{-repo_objs},
 		user_cb => $user_cb,
-		max_patch => 100,
 
 		# TODO: config option for searching related inboxes
 		inboxes => [ $ibx ],
-- 
EW


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-01-27  4:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-27  4:03 [PATCH 00/14] convert solver to use pi-httpd.async Eric Wong
2019-01-27  4:03 ` [PATCH 01/14] httpd/async: remove needless sysread wrapper Eric Wong
2019-01-27  4:03 ` [PATCH 02/14] qspawn: implement psgi_return and use it for githttpbackend Eric Wong
2019-01-27  4:03 ` [PATCH 03/14] qspawn|getlinebody: support streaming filters Eric Wong
2019-01-27  4:03 ` [PATCH 04/14] qspawn|httpd/async: improve and fix out-of-date comments Eric Wong
2019-01-27  4:03 ` [PATCH 05/14] httpd/async: stop running command if client disconnects Eric Wong
2019-01-27  4:03 ` [PATCH 06/14] qspawn: implement psgi_qx Eric Wong
2019-01-27  4:03 ` [PATCH 07/14] t/qspawn.t: psgi_qx stderr test Eric Wong
2019-01-27  4:03 ` [PATCH 08/14] view: swap CRLF for LF in HTML output Eric Wong
2019-01-27  4:03 ` [PATCH 09/14] solver: rewrite to use Qspawn->psgi_qx and pi-httpd.async Eric Wong
2019-01-27  4:03 ` [PATCH 10/14] solver: hold patches in temporary directory Eric Wong
2019-01-27  4:03 ` [PATCH 11/14] solver: reduce "git apply" invocations Eric Wong
2019-01-27  4:03 ` [PATCH 12/14] qspawn: decode $? for user-friendliness Eric Wong
2019-01-27  4:03 ` [PATCH 13/14] viewvcs: do not show final error message twice Eric Wong
2019-01-27  4:03 ` [PATCH 14/14] solver: crank up max patches to 9999 Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).