From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF shortcircuit=no autolearn=ham autolearn_force=no version=3.4.6 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id A5C721F47C; Wed, 4 Jan 2023 03:49:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org; s=selector1; t=1672804173; bh=UNiCW2Sr1K9z4pfOUZtSZEuA3IdOOP57KtOA//N4b6I=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=bS3aaqcKEgYYb4KwqKXTm/ObcU6lraxRHMb4OiHRKPR7yhZo7xD7J3Pa8G3oTU8ul HC94oYP5PYJBMIzLL2JTdUlJVM4in9934LfzJ2Y0h++OUE4CtKvtAStHky/ribZafs FdRZZvp9CxkavSOIDA1iCGGiicQWRnvfHNPphilo= Date: Wed, 4 Jan 2023 03:49:34 +0000 From: Eric Wong To: Chris Brannon Cc: meta@public-inbox.org Subject: [PATCH] git: fix asynchronous batching for deep pipelines Message-ID: <20230104034934.M147841@dcvr> References: <875ye5m1wo.fsf@the-brannons.com> <20221221122102.M600156@dcvr> <871qosna30.fsf@the-brannons.com> <20221221194855.GA5179@dcvr> <87mt7glc23.fsf@the-brannons.com> <20221221211114.M412849@dcvr> <87edssl7u0.fsf@the-brannons.com> <20221221232210.M865188@dcvr> <877cykl387.fsf@the-brannons.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <877cykl387.fsf@the-brannons.com> List-Id: Chris Brannon wrote: > Eric Wong writes: > > > Also, can I assume just running 'public-inbox-index' on any > > freshly-cloned inbox also fails for you independently of > > -convert? > > Yes it does. I double-checked to be sure. > > Thanks for the fix! If you come up with something more permanent that > you want me to test, please ask. --------8<------- Subject: [PATCH] git: fix asynchronous batching for deep pipelines ...By using non-blocking pipe writes. This avoids problems for musl (and other libc) where getdelim(3) used by `git cat-file --batch*' uses a smaller input buffer than glibc or FreeBSD libc. My key mistake was our check against MAX_INFLIGHT is only useful for the initial batch of requests. It is not useful for subsequent requests since git will drain the pipe at unpredictable rates due to libc differences. To fix this problem, I initially tried to drain the read pipe as long as readable data was pending. However, reading git output without giving git more work would also limit parallelism opportunities since we don't want git to sit idle, either. This change ensures we keep both pipes reasonably full to reduce stalls and maximize parallelism between git and public-inbox. While the limit set a few weeks ago in commit 56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21) remains in place, any higher or lower limit will work. It may be worth it to use an even lower limit to improve interactivity w.r.t. Ctrl-C interrupts. I've tested the pre-56e6e587745c and even higher values on an Alpine VM in the GCC Farm Reported-by: Chris Brannon Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/ --- lib/PublicInbox/Git.pm | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index a1af776b..86b80a4e 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -145,6 +145,7 @@ sub _bidi_pipe { fcntl($out_w, 1031, 4096); fcntl($in_r, 1031, 4096) if $batch eq '--batch-check'; } + $out_w->blocking(0); $self->{$out} = $out_w; $self->{$in} = $in_r; } @@ -203,7 +204,9 @@ sub cat_async_retry ($$) { for (my $i = 0; $i < @$inflight; $i += 3) { $buf .= "$inflight->[$i]\n"; } + $self->{out}->blocking(1); # brand new pipe, should never block print { $self->{out} } $buf or $self->fail("write error: $!"); + $self->{out}->blocking(0); my $req = shift @$inflight; unshift(@$inflight, \$req); # \$ref to indicate retried @@ -305,13 +308,27 @@ sub check_async_begin ($) { $self->{inflight_c} = []; } +sub write_all { + my ($self, $out, $buf, $read_step, $inflight) = @_; + $read_step->($self, $inflight) while @$inflight >= MAX_INFLIGHT; + do { + my $w = syswrite($out, $buf); + if (defined $w) { + return if $w == length($buf); + warn "chop: $w"; + substr($buf, 0, $w, ''); # sv_chop + } elsif ($! != EAGAIN) { + $self->fail("write: $!"); + } else { warn "E: $!" } + $read_step->($self, $inflight); + } while (1); +} + sub check_async ($$$$) { my ($self, $oid, $cb, $arg) = @_; my $inflight_c = $self->{inflight_c} // check_async_begin($self); - while (scalar(@$inflight_c) >= MAX_INFLIGHT) { - check_async_step($self, $inflight_c); - } - print { $self->{out_c} } $oid, "\n" or $self->fail("write error: $!"); + write_all($self, $self->{out_c}, $oid."\n", + \&check_async_step, $inflight_c); push(@$inflight_c, $oid, $cb, $arg); } @@ -496,10 +513,7 @@ sub cat_async_begin { sub cat_async ($$$;$) { my ($self, $oid, $cb, $arg) = @_; my $inflight = $self->{inflight} // cat_async_begin($self); - while (scalar(@$inflight) >= MAX_INFLIGHT) { - cat_async_step($self, $inflight); - } - print { $self->{out} } $oid, "\n" or $self->fail("write error: $!"); + write_all($self, $self->{out}, $oid."\n", \&cat_async_step, $inflight); push(@$inflight, $oid, $cb, $arg); }