user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 1/4] www: use a dedicated limiter for blob solver
Date: Mon, 11 Mar 2024 19:40:09 +0000
Message-ID: <20240311194012.1266143-2-e@80x24.org>
In-Reply-To: <20240311194012.1266143-1-e@80x24.org>

Wrap the entire solver command chain with a dedicated limiter.
The normal limiter is designed for longer-lived commands, or ones
which serve a single HTTP request (e.g. git-http-backend or
cgit), and is not effective for the short-lived, memory- and
CPU-intensive commands used by the solver.

Each overall solver request is both memory- and CPU-intensive:
it spawns several short-lived git processes(*) in addition to a
longer-lived `git cat-file --batch' process.

Thus, running parallel solvers from a single -netd/-httpd worker
(the daemons already run multiple workers in parallel) results
in excessive parallelism that is memory- and CPU-bound (not
network-bound), and it cascades into slowdowns for simpler
memory/CPU-bound requests.  Parallel solvers were also
responsible for zombies appearing more frequently and living
longer, since the event loop was too saturated to reap them.

We'll also return 503 when too many solver requests are queued
(more than 128), since each queued request must hold onto an FD
for the client's HTTP(S) socket.

(*) git (update-index|apply|ls-files) are all short-lived and
    run by the solver
---
 lib/PublicInbox/SolverGit.pm | 15 ++++++-----
 lib/PublicInbox/ViewVCS.pm   | 48 +++++++++++++++++++++++++++++-------
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 4e79f750..296e7d17 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -256,6 +256,12 @@ sub update_index_result ($$) {
 	next_step($self); # onto do_git_apply
 }
 
+sub qsp_qx ($$$) {
+	my ($self, $qsp, $cb) = @_;
+	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
+	$qsp->psgi_qx($self->{psgi_env}, $self->{limiter}, $cb, $self);
+}
+
 sub prepare_index ($) {
 	my ($self) = @_;
 	my $patches = $self->{patches};
@@ -284,9 +290,8 @@ sub prepare_index ($) {
 	my $cmd = [ qw(git update-index -z --index-info) ];
 	my $qsp = PublicInbox::Qspawn->new($cmd, $self->{git_env}, $rdr);
 	$path_a = git_quote($path_a);
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
 	$self->{-msg} = "index prepared:\n$mode_a $oid_full\t$path_a";
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&update_index_result, $self);
+	qsp_qx $self, $qsp, \&update_index_result;
 }
 
 # pure Perl "git init"
@@ -465,8 +470,7 @@ sub apply_result ($$) { # qx_cb
 	my @cmd = qw(git ls-files -s -z);
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env});
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&ls_files_result, $self);
+	qsp_qx $self, $qsp, \&ls_files_result;
 }
 
 sub do_git_apply ($) {
@@ -495,8 +499,7 @@ sub do_git_apply ($) {
 	my $opt = { 2 => 1, -C => _tmp($self)->dirname, quiet => 1 };
 	my $qsp = PublicInbox::Qspawn->new(\@cmd, $self->{git_env}, $opt);
 	$self->{-cur_di} = $di;
-	$qsp->{qsp_err} = \($self->{-qsp_err} = '');
-	$qsp->psgi_qx($self->{psgi_env}, undef, \&apply_result, $self);
+	qsp_qx $self, $qsp, \&apply_result;
 }
 
 sub di_url ($$) {
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 61329db6..790b9a2c 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -49,6 +49,10 @@ my %GIT_MODE = (
 	'160000' => 'g', # commit (gitlink)
 );
 
+# TODO: not fork safe, but we don't fork w/o exec in PublicInbox::WWW
+my (@solver_q, $solver_lim);
+my $solver_nr = 0;
+
 sub html_page ($$;@) {
 	my ($ctx, $code) = @_[0, 1];
 	my $wcb = delete $ctx->{-wcb};
@@ -614,26 +618,52 @@ sub show_blob { # git->cat_async callback
 		'</code></pre></td></tr></table>'.dbg_log($ctx), @def);
 }
 
-# GET /$INBOX/$GIT_OBJECT_ID/s/
-# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
-sub show ($$;$) {
-	my ($ctx, $oid_b, $fn) = @_;
-	my $hints = $ctx->{hints} = {};
+sub start_solver ($) {
+	my ($ctx) = @_;
 	while (my ($from, $to) = each %QP_MAP) {
 		my $v = $ctx->{qp}->{$from} // next;
-		$hints->{$to} = $v if $v ne '';
+		$ctx->{hints}->{$to} = $v if $v ne '';
 	}
-	$ctx->{fn} = $fn;
-	$ctx->{-tmp} = File::Temp->newdir("solver.$oid_b-XXXX", TMPDIR => 1);
+	$ctx->{-next_solver} = PublicInbox::OnDestroy->new($$, \&next_solver);
+	++$solver_nr;
+	$ctx->{-tmp} = File::Temp->newdir("solver.$ctx->{oid_b}-XXXX",
+						TMPDIR => 1);
 	$ctx->{lh} or open $ctx->{lh}, '+>>', "$ctx->{-tmp}/solve.log";
 	my $solver = PublicInbox::SolverGit->new($ctx->{ibx},
 						\&solve_result, $ctx);
+	$solver->{limiter} = $solver_lim;
 	$solver->{gits} //= [ $ctx->{git} ];
 	$solver->{tmp} = $ctx->{-tmp}; # share tmpdir
 	# PSGI server will call this immediately and give us a callback (-wcb)
+	$solver->solve(@$ctx{qw(env lh oid_b hints)});
+}
+
+# run the next solver job when done and DESTROY-ed
+sub next_solver {
+	--$solver_nr;
+	# XXX FIXME: client may've disconnected if it waited a long while
+	start_solver(shift(@solver_q) // return);
+}
+
+sub may_start_solver ($) {
+	my ($ctx) = @_;
+	$solver_lim //= $ctx->{www}->{pi_cfg}->limiter('codeblob');
+	if ($solver_nr >= $solver_lim->{max}) {
+		@solver_q > 128 ? html_page($ctx, 503, 'too busy')
+				: push(@solver_q, $ctx);
+	} else {
+		start_solver($ctx);
+	}
+}
+
+# GET /$INBOX/$GIT_OBJECT_ID/s/
+# GET /$INBOX/$GIT_OBJECT_ID/s/$FILENAME
+sub show ($$;$) {
+	my ($ctx, $oid_b, $fn) = @_;
+	@$ctx{qw(oid_b fn)} = ($oid_b, $fn);
 	sub {
 		$ctx->{-wcb} = $_[0]; # HTTP write callback
-		$solver->solve($ctx->{env}, $ctx->{lh}, $oid_b, $hints);
+		may_start_solver $ctx;
 	};
 }
 


Thread overview: 5+ messages
2024-03-11 19:40 [PATCH 0/4] memory reductions for WWW + solver Eric Wong
2024-03-11 19:40 ` Eric Wong [this message]
2024-03-11 19:40 ` [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs Eric Wong
2024-03-11 19:40 ` [PATCH 3/4] doc: tuning: note reduced fragmentation w/ jemalloc Eric Wong
2024-03-11 19:40 ` [PATCH 4/4] codesearch: deduplicate $git->{nick} field Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git