user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* [PATCH] git: calculate MAX_INFLIGHT properly in Perl
@ 2023-09-29  2:44  7% Eric Wong
  0 siblings, 0 replies; 6+ results
From: Eric Wong @ 2023-09-29  2:44 UTC (permalink / raw)
  To: meta

Unlike C, Perl automatically converts quotients to double-precision
floating point even with UV/IV numerators and denominators.  So
force the intermediate quotient to be an integer before
multiplying it by the size of each inflight array element.

This bug was inconsequential for all platforms since d4ba8828ab23f278
(git: fix asynchronous batching for deep pipelines, 2023-01-04)
and inconsequential on most (or all?) Linux even before that due
to the larger 4096-byte PIPE_BUF on Linux.
---
 lib/PublicInbox/Git.pm | 4 ++--
 t/git.t                | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 764e38d7..53887e3d 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -35,10 +35,10 @@ my @MODIFIED_DATE = qw[for-each-ref --sort=-committerdate
 			--format=%(committerdate:raw) --count=1];
 
 # 512: POSIX PIPE_BUF minimum (see pipe(7))
-# 3: @$inflight is flattened [ $OID, $cb, $arg ]
 # 65: SHA-256 hex size + "\n" in preparation for git using non-SHA1
+# 3: @$inflight is flattened [ $OID, $cb, $arg ]
 use constant {
-	MAX_INFLIGHT => 512 * 3 / (65 + length('contents ')),
+	MAX_INFLIGHT => int(512 / (65 + length('contents '))) * 3,
 	BATCH_CMD_VER => v2.36.0, # git 2.36+
 };
 
diff --git a/t/git.t b/t/git.t
index dfa5eab2..bde6d35b 100644
--- a/t/git.t
+++ b/t/git.t
@@ -8,6 +8,8 @@ my ($dir, $for_destroy) = tmpdir();
 use PublicInbox::Import;
 use POSIX qw(strftime);
 use PublicInbox::Git;
+is(PublicInbox::Git::MAX_INFLIGHT,
+	int(PublicInbox::Git::MAX_INFLIGHT), 'MAX_INFLIGHT is an integer');
 
 {
 	PublicInbox::Import::init_bare($dir, 'master');

^ permalink raw reply related	[relevance 7%]

* Re: [PATCH 1/2] git: use --batch-command in git 2.36+ to save processes
  @ 2023-01-27  8:14  7%   ` Eric Wong
  0 siblings, 0 replies; 6+ results
From: Eric Wong @ 2023-01-27  8:14 UTC (permalink / raw)
  To: meta

t/imapd.t on my slower 32-bit machine fails without the
following fix.  I guess prefetch wasn't being tested on my
faster workstation?

Also, the prior use of `print { $git->{out} }' is a potential (but
unlikely) bug since commit d4ba8828ab23f278
(git: fix asynchronous batching for deep pipelines, 2023-01-04)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 1c77b4ce..a9db8ad7 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -239,6 +239,16 @@ sub cat_async_retry ($$) {
 	cat_async_step($self, $inflight); # take one step
 }
 
+# returns true if prefetch is successful
+sub async_prefetch {
+	my ($self, $oid, $cb, $arg) = @_;
+	my $inflight = $self->{inflight} or return;
+	return if @$inflight;
+	substr($oid, 0, 0) = 'contents ' if $self->{-bc};
+	write_all($self, $self->{out}, "$oid\n", \&cat_async_step, $inflight);
+	push(@$inflight, $oid, $cb, $arg);
+}
+
 sub cat_async_step ($$) {
 	my ($self, $inflight) = @_;
 	die 'BUG: inflight empty or odd' if scalar(@$inflight) < 3;
diff --git a/lib/PublicInbox/GitAsyncCat.pm b/lib/PublicInbox/GitAsyncCat.pm
index 1f86aad0..49a3c602 100644
--- a/lib/PublicInbox/GitAsyncCat.pm
+++ b/lib/PublicInbox/GitAsyncCat.pm
@@ -98,12 +98,8 @@ sub ibx_async_prefetch {
 			$oid .= " $git->{git_dir}\n";
 			return $GCF2C->gcf2_async(\$oid, $cb, $arg); # true
 		}
-	} elsif ($git->{async_cat} && (my $inflight = $git->{inflight})) {
-		if (!@$inflight) {
-			print { $git->{out} } $oid, "\n" or
-						$git->fail("write error: $!");
-			return push(@$inflight, $oid, $cb, $arg);
-		}
+	} elsif ($git->{async_cat}) {
+		return $git->async_prefetch($oid, $cb, $arg);
 	}
 	undef;
 }

^ permalink raw reply related	[relevance 7%]

* Re: [PATCH] git: write_all: remove leftover debug messages
  2023-01-05  1:44  7%                     ` [PATCH] git: write_all: remove leftover debug messages Eric Wong
@ 2023-01-05  7:32  0%                       ` Chris Brannon
  0 siblings, 0 replies; 6+ results
From: Chris Brannon @ 2023-01-05  7:32 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Subject: [PATCH] git: write_all: remove leftover debug messages
>
> I used these messages during development to verify Alpine was
> triggering the intended codepaths.  They're no longer necessary
> and just noise at this point.
>
> Reported-by: Chris Brannon <chris@the-brannons.com>
> Tested-by: Chris Brannon <chris@the-brannons.com>
> Fixes: d4ba8828ab23 ("git: fix asynchronous batching for deep pipelines")
> ---
>  lib/PublicInbox/Git.pm | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Thanks!
-- Chris

^ permalink raw reply	[relevance 0%]

* [PATCH] git: write_all: remove leftover debug messages
  2023-01-05  1:08  9%                   ` Chris Brannon
@ 2023-01-05  1:44  7%                     ` Eric Wong
  2023-01-05  7:32  0%                       ` Chris Brannon
  0 siblings, 1 reply; 6+ results
From: Eric Wong @ 2023-01-05  1:44 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> The conversion seems to have been successful, but it logged very many of
> the following:
> E: Resource temporarily unavailable at
> /usr/share/perl5/vendor_perl/PublicInbox/Git.pm line 320.
> 
> I'm sorry, I didn't count how many.  It seemed like multiple screenfuls
> at least.

Oops, yeah.   Just leftover debug messages that never triggered
on my FreeBSD and Debian systems.  Will push this out:

---------8<----------
Subject: [PATCH] git: write_all: remove leftover debug messages

I used these messages during development to verify Alpine was
triggering the intended codepaths.  They're no longer necessary
and just noise at this point.

Reported-by: Chris Brannon <chris@the-brannons.com>
Fixes: d4ba8828ab23 ("git: fix asynchronous batching for deep pipelines")
---
 lib/PublicInbox/Git.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 86b80a4e..c7f82ba2 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -315,11 +315,10 @@ sub write_all {
 		my $w = syswrite($out, $buf);
 		if (defined $w) {
 			return if $w == length($buf);
-			warn "chop: $w";
 			substr($buf, 0, $w, ''); # sv_chop
 		} elsif ($! != EAGAIN) {
 			$self->fail("write: $!");
-		} else { warn "E: $!" }
+		}
 		$read_step->($self, $inflight);
 	} while (1);
 }

^ permalink raw reply related	[relevance 7%]

* Re: [PATCH] git: fix asynchronous batching for deep pipelines
  2023-01-04  3:49 14%                 ` [PATCH] git: fix asynchronous batching for deep pipelines Eric Wong
@ 2023-01-05  1:08  9%                   ` Chris Brannon
  2023-01-05  1:44  7%                     ` [PATCH] git: write_all: remove leftover debug messages Eric Wong
  0 siblings, 1 reply; 6+ results
From: Chris Brannon @ 2023-01-05  1:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

>
> --------8<-------
> Subject: [PATCH] git: fix asynchronous batching for deep pipelines
>
> ...By using non-blocking pipe writes.  This avoids problems for
> musl (and other libc) where getdelim(3) used by `git cat-file --batch*'
> uses a smaller input buffer than glibc or FreeBSD libc.
>
> My key mistake was our check against MAX_INFLIGHT is only useful
> for the initial batch of requests.  It is not useful for
> subsequent requests since git will drain the pipe at
> unpredictable rates due to libc differences.
>
> To fix this problem, I initially tried to drain the read pipe
> as long as readable data was pending.  However, reading git
> output without giving git more work would also limit parallelism
> opportunities since we don't want git to sit idle, either.  This
> change ensures we keep both pipes reasonably full to reduce
> stalls and maximize parallelism between git and public-inbox.
>
> While the limit set a few weeks ago in commit
> 56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21)
> remains in place, any higher or lower limit will work.  It may
> be worth it to use an even lower limit to improve interactivity
> w.r.t. Ctrl-C interrupts.
>
> I've tested the pre-56e6e587745c and even higher values on an
> Alpine VM in the GCC Farm <https://cfarm.tetaneutral.net>
>
> Reported-by: Chris Brannon <chris@the-brannons.com>
> Tested-by: Chris Brannon <chris@the-brannons.com>
> Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
*SNIP*

Thanks!

The conversion seems to have been successful, but it logged very many of
the following:
E: Resource temporarily unavailable at
/usr/share/perl5/vendor_perl/PublicInbox/Git.pm line 320.

I'm sorry, I didn't count how many.  It seemed like multiple screenfuls
at least.

-- Chris

^ permalink raw reply	[relevance 9%]

* [PATCH] git: fix asynchronous batching for deep pipelines
  @ 2023-01-04  3:49 14%                 ` Eric Wong
  2023-01-05  1:08  9%                   ` Chris Brannon
  0 siblings, 1 reply; 6+ results
From: Eric Wong @ 2023-01-04  3:49 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > Also, can I assume just running 'public-inbox-index' on any
> > freshly-cloned inbox also fails for you independently of
> > -convert?
> 
> Yes it does.  I double-checked to be sure.
> 
> Thanks for the fix!  If you come up with something more permanent that
> you want me to test, please ask.



--------8<-------
Subject: [PATCH] git: fix asynchronous batching for deep pipelines

...By using non-blocking pipe writes.  This avoids problems for
musl (and other libc) where getdelim(3) used by `git cat-file --batch*'
uses a smaller input buffer than glibc or FreeBSD libc.

My key mistake was our check against MAX_INFLIGHT is only useful
for the initial batch of requests.  It is not useful for
subsequent requests since git will drain the pipe at
unpredictable rates due to libc differences.

To fix this problem, I initially tried to drain the read pipe
as long as readable data was pending.  However, reading git
output without giving git more work would also limit parallelism
opportunities since we don't want git to sit idle, either.  This
change ensures we keep both pipes reasonably full to reduce
stalls and maximize parallelism between git and public-inbox.

While the limit set a few weeks ago in commit
56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21)
remains in place, any higher or lower limit will work.  It may
be worth it to use an even lower limit to improve interactivity
w.r.t. Ctrl-C interrupts.

I've tested the pre-56e6e587745c and even higher values on an
Alpine VM in the GCC Farm <https://cfarm.tetaneutral.net>

Reported-by: Chris Brannon <chris@the-brannons.com>
Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
---
 lib/PublicInbox/Git.pm | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index a1af776b..86b80a4e 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -145,6 +145,7 @@ sub _bidi_pipe {
 		fcntl($out_w, 1031, 4096);
 		fcntl($in_r, 1031, 4096) if $batch eq '--batch-check';
 	}
+	$out_w->blocking(0);
 	$self->{$out} = $out_w;
 	$self->{$in} = $in_r;
 }
@@ -203,7 +204,9 @@ sub cat_async_retry ($$) {
 	for (my $i = 0; $i < @$inflight; $i += 3) {
 		$buf .= "$inflight->[$i]\n";
 	}
+	$self->{out}->blocking(1); # brand new pipe, should never block
 	print { $self->{out} } $buf or $self->fail("write error: $!");
+	$self->{out}->blocking(0);
 	my $req = shift @$inflight;
 	unshift(@$inflight, \$req); # \$ref to indicate retried
 
@@ -305,13 +308,27 @@ sub check_async_begin ($) {
 	$self->{inflight_c} = [];
 }
 
+sub write_all {
+	my ($self, $out, $buf, $read_step, $inflight) = @_;
+	$read_step->($self, $inflight) while @$inflight >= MAX_INFLIGHT;
+	do {
+		my $w = syswrite($out, $buf);
+		if (defined $w) {
+			return if $w == length($buf);
+			warn "chop: $w";
+			substr($buf, 0, $w, ''); # sv_chop
+		} elsif ($! != EAGAIN) {
+			$self->fail("write: $!");
+		} else { warn "E: $!" }
+		$read_step->($self, $inflight);
+	} while (1);
+}
+
 sub check_async ($$$$) {
 	my ($self, $oid, $cb, $arg) = @_;
 	my $inflight_c = $self->{inflight_c} // check_async_begin($self);
-	while (scalar(@$inflight_c) >= MAX_INFLIGHT) {
-		check_async_step($self, $inflight_c);
-	}
-	print { $self->{out_c} } $oid, "\n" or $self->fail("write error: $!");
+	write_all($self, $self->{out_c}, $oid."\n",
+		\&check_async_step, $inflight_c);
 	push(@$inflight_c, $oid, $cb, $arg);
 }
 
@@ -496,10 +513,7 @@ sub cat_async_begin {
 sub cat_async ($$$;$) {
 	my ($self, $oid, $cb, $arg) = @_;
 	my $inflight = $self->{inflight} // cat_async_begin($self);
-	while (scalar(@$inflight) >= MAX_INFLIGHT) {
-		cat_async_step($self, $inflight);
-	}
-	print { $self->{out} } $oid, "\n" or $self->fail("write error: $!");
+	write_all($self, $self->{out}, $oid."\n", \&cat_async_step, $inflight);
 	push(@$inflight, $oid, $cb, $arg);
 }
 

^ permalink raw reply related	[relevance 14%]

Results 1-6 of 6 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2022-12-21 11:28     public-inbox-convert hangs on systems using musl libc Chris Brannon
2022-12-21 12:21     ` Eric Wong
2022-12-21 13:46       ` Chris Brannon
2022-12-21 19:48         ` Eric Wong
2022-12-21 20:46           ` Chris Brannon
2022-12-21 21:11             ` Eric Wong
2022-12-21 22:17               ` Chris Brannon
2022-12-21 23:22                 ` [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum Eric Wong
2022-12-21 23:57                   ` Chris Brannon
2023-01-04  3:49 14%                 ` [PATCH] git: fix asynchronous batching for deep pipelines Eric Wong
2023-01-05  1:08  9%                   ` Chris Brannon
2023-01-05  1:44  7%                     ` [PATCH] git: write_all: remove leftover debug messages Eric Wong
2023-01-05  7:32  0%                       ` Chris Brannon
2023-01-26  9:32     [PATCH 0/2] use `git cat-file --batch-command' if available Eric Wong
2023-01-26  9:32     ` [PATCH 1/2] git: use --batch-command in git 2.36+ to save processes Eric Wong
2023-01-27  8:14  7%   ` Eric Wong
2023-09-29  2:44  7% [PATCH] git: calculate MAX_INFLIGHT properly in Perl Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).