user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* public-inbox-convert hangs on systems using musl libc
@ 2022-12-21 11:28 Chris Brannon
  2022-12-21 12:21 ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Brannon @ 2022-12-21 11:28 UTC (permalink / raw)
  To: meta

I can reproduce it reliably on both Alpine Linux and the musl variant of
Void Linux.  Just clone a mirror of public-inbox.org/meta and try and
convert it:

$ git clone --mirror https://public-inbox.org/meta
$ public-inbox-convert meta metanew

If I pass --no-index, the process is successful.

This is a deadlock of some sort, because during the hang, git cat-file
is blocked on write and other things are blocked on read.

That's as far as I have gotten with debugging it.

-- Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: public-inbox-convert hangs on systems using musl libc
  2022-12-21 11:28 public-inbox-convert hangs on systems using musl libc Chris Brannon
@ 2022-12-21 12:21 ` Eric Wong
  2022-12-21 13:46   ` Chris Brannon
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2022-12-21 12:21 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> I can reproduce it reliably on both Alpine Linux and the musl variant of
> Void Linux.  Just clone a mirror of public-inbox.org/meta and try and
> convert it:
> 
> $ git clone --mirror https://public-inbox.org/meta
> $ public-inbox-convert meta metanew
> 
> If I pass --no-index, the process is successful.
> 
> This is a deadlock of some sort, because during the hang, git cat-file
> is blocked on write and other things are blocked on read.

I've requested some packages w/ cfarm admins to test with.

Do you know which pipes are which?  "lsof -p $PID +E" can help
with connectivity checking, as can script/dtas-graph in
https://80x24.org/dtas.git if you have Graph::Easy

What's curious is all the code paths should be independent of
stdio buffering, so I wouldn't think libc differences would
matter...

I also just did the above meta conversion on FreeBSD w/o problems.

> That's as far as I have gotten with debugging it.

Some shots in the dark:

1. force read pipe on our end to be non-blocking

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 882a9a4a..b239ceb9 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -143,6 +143,7 @@ sub _bidi_pipe {
 		fcntl($out_w, 1031, 4096);
 		fcntl($in_r, 1031, 4096) if $batch eq '--batch-check';
 	}
+	$in_r->blocking(0);
 	$self->{$out} = $out_w;
 	$self->{$in} = $in_r;
 }

2. Tweak $PIPE_BUFSIZ and/or MAX_INFLIGHT to smaller values.  e.g.

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 882a9a4a..ec40edd7 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -23,13 +23,12 @@ use Carp qw(croak carp);
 use Digest::SHA ();
 use PublicInbox::DS qw(dwaitpid);
 our @EXPORT_OK = qw(git_unquote git_quote);
-our $PIPE_BUFSIZ = 65536; # Linux default
+our $PIPE_BUFSIZ = 4096; # Linux default
 our $in_cleanup;
 our $RDTIMEO = 60_000; # milliseconds
 our $async_warn; # true in read-only daemons
 
-use constant MAX_INFLIGHT => (POSIX::PIPE_BUF * 3) /
-	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
+use constant MAX_INFLIGHT => 4;
 
 my %GIT_ESC = (
 	a => "\a",


MAX_INFLIGHT could go down to 1 outside of t/git.t, I think...
But my sleep deprived mind isn't finding anything that jumps out...

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: public-inbox-convert hangs on systems using musl libc
  2022-12-21 12:21 ` Eric Wong
@ 2022-12-21 13:46   ` Chris Brannon
  2022-12-21 19:48     ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Brannon @ 2022-12-21 13:46 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

[-- Attachment #1: Type: text/plain, Size: 1650 bytes --]

Eric Wong <e@80x24.org> writes:

> Do you know which pipes are which?  "lsof -p $PID +E" can help
> with connectivity checking, as can script/dtas-graph in
> https://80x24.org/dtas.git if you have Graph::Easy

Yes.  I'm attaching my lsof output and a typescript.

The processes of interest here are 4849 public-inbox-convert and 4879
git cat-file.
PID 4849's FD 11 is the write end of a pipe, with 4879's stdin as the
read end.
PID 4849's FD 12 is the read end of a pipe, with 4879's stdout as the
write end.  At the point of the hang, 4849 is trying to write a SHA1 to
FD 11, while 4879 is writing an email message to its stdout.

> Some shots in the dark:
>
> 2. Tweak $PIPE_BUFSIZ and/or MAX_INFLIGHT to smaller values.  e.g.
>
> diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
> index 882a9a4a..ec40edd7 100644
> --- a/lib/PublicInbox/Git.pm
> +++ b/lib/PublicInbox/Git.pm
> @@ -23,13 +23,12 @@ use Carp qw(croak carp);
>  use Digest::SHA ();
>  use PublicInbox::DS qw(dwaitpid);
>  our @EXPORT_OK = qw(git_unquote git_quote);
> -our $PIPE_BUFSIZ = 65536; # Linux default
> +our $PIPE_BUFSIZ = 4096; # Linux default
>  our $in_cleanup;
>  our $RDTIMEO = 60_000; # milliseconds
>  our $async_warn; # true in read-only daemons
>  
> -use constant MAX_INFLIGHT => (POSIX::PIPE_BUF * 3) /
> -	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
> +use constant MAX_INFLIGHT => 4;
>  
>  my %GIT_ESC = (
>  	a => "\a",
>

This right here seems to have fixed it, when testing locally.

PS.  Thank you for that lsof command.  I've never used lsof in that way;
I'll have to add that to my *nix debugging toolbelt.

-- Chris


[-- Attachment #2: lsofout.txt --]
[-- Type: text/plain, Size: 4867 bytes --]

COMMAND    PID  USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
screen    4028 chris   11u   CHR    5,2      0t0       86 /dev/ptmx ->/dev/pts/7 2214,mksh,0u 2214,mksh,1u 2214,mksh,2u 4849,public-in,0u 4849,public-in,1u 4849,public-in,2u 4874,shard[0],0u 4874,shard[0],1u 4874,shard[0],2u 4879,git,2u
public-in 4849 chris  cwd    DIR   0,42      412    17211 /files/cmb-files/repos
public-in 4849 chris  rtd    DIR   0,26       28       34 /
public-in 4849 chris  txt    REG   0,26    14144   239824 /usr/bin/perl
public-in 4849 chris  mem    REG   0,26  2969912   227085 /usr/lib/libstdc++.so.6.0.30
public-in 4849 chris  mem    REG   0,26   100328   564900 /usr/lib/libz.so.1.2.13
public-in 4849 chris  mem    REG   0,26    30760   399826 /usr/lib/libuuid.so.1.3.0
public-in 4849 chris  mem    REG   0,26   121208   226756 /usr/lib/libgcc_s.so.1
public-in 4849 chris  mem    REG   0,26  2234600   563318 /usr/lib/libxapian.so.30.12.2
public-in 4849 chris  mem    REG   0,26   500184   219769 /usr/lib/perl5/vendor_perl/auto/Search/Xapian/Xapian.so
public-in 4849 chris  mem    REG   0,26   132928   239703 /usr/lib/perl5/core_perl/auto/Compress/Raw/Zlib/Zlib.so
public-in 4849 chris  mem    REG   0,26  1351488   347561 /usr/lib/libsqlite3.so.0.8.6
public-in 4849 chris  mem    REG   0,26   180112   239928 /usr/lib/perl5/vendor_perl/auto/DBD/SQLite/SQLite.so
public-in 4849 chris  mem    REG   0,26   138264   240017 /usr/lib/perl5/vendor_perl/auto/DBI/DBI.so
public-in 4849 chris  mem    REG   0,26     2310     4758 /usr/share/zoneinfo/PST8PDT
public-in 4849 chris  mem    REG   0,26    18168   239686 /usr/lib/perl5/core_perl/auto/MIME/Base64/Base64.so
public-in 4849 chris  mem    REG   0,26   657216   239665 /usr/lib/perl5/core_perl/auto/re/re.so
public-in 4849 chris  mem    REG   0,26   116640   239674 /usr/lib/perl5/core_perl/auto/Storable/Storable.so
public-in 4849 chris  mem    REG   0,26    46920   239702 /usr/lib/perl5/core_perl/auto/Encode/Encode.so
public-in 4849 chris  mem    REG   0,26    39608   218906 /usr/lib/perl5/vendor_perl/auto/Email/Address/XS/XS.so
public-in 4849 chris  mem    REG   0,26    46920   239675 /usr/lib/perl5/core_perl/auto/Socket/Socket.so
public-in 4849 chris  mem    REG   0,26    59128   239687 /usr/lib/perl5/core_perl/auto/List/Util/Util.so
public-in 4849 chris  mem    REG   0,26    46840   238715 /usr/lib/perl5/core_perl/auto/Digest/SHA/SHA.so
public-in 4849 chris  mem    REG   0,26    30456   239671 /usr/lib/perl5/core_perl/auto/Time/HiRes/HiRes.so
public-in 4849 chris  mem    REG   0,26    18168   239744 /usr/lib/perl5/core_perl/auto/Cwd/Cwd.so
public-in 4849 chris  mem    REG   0,26    30536   239695 /usr/lib/perl5/core_perl/auto/File/Glob/Glob.so
public-in 4849 chris  mem    REG   0,26    22264   239689 /usr/lib/perl5/core_perl/auto/IO/IO.so
public-in 4849 chris  mem    REG   0,26   108360   239682 /usr/lib/perl5/core_perl/auto/POSIX/POSIX.so
public-in 4849 chris  mem    REG   0,26    22344   239697 /usr/lib/perl5/core_perl/auto/Fcntl/Fcntl.so
public-in 4849 chris  mem    REG   0,26  3789544   239902 /usr/lib/libperl.so.5.36.0
public-in 4849 chris  mem    REG   0,26  4251720   228062 /usr/lib/libc.so
public-in 4849 chris    0u   CHR  136,7      0t0       10 /dev/pts/7 4028,screen,11u
public-in 4849 chris    1u   CHR  136,7      0t0       10 /dev/pts/7 4028,screen,11u
public-in 4849 chris    2u   CHR  136,7      0t0       10 /dev/pts/7 4028,screen,11u
public-in 4849 chris    3w   CHR    1,3      0t0        4 /dev/null
public-in 4849 chris    4uW  REG   0,42        0  2074958 /files/cmb-files/repos/metanew/inbox.lock
public-in 4849 chris    5ur  REG   0,42    24576  2074742 /files/cmb-files/repos/metanew/msgmap.sqlite3
public-in 4849 chris    6w  FIFO   0,12      0t0 26905684 pipe 4874,shard[0],5r
public-in 4849 chris    7r  FIFO   0,12      0t0 26905685 pipe 4874,shard[0],8w
public-in 4849 chris    8ur  REG   0,42    69632  2074960 /files/cmb-files/repos/metanew/xap15/over.sqlite3
public-in 4849 chris    9u   REG   0,49   461301   410306 /tmp/PerlIO_LaapMm (deleted)
public-in 4849 chris   10u   REG   0,42    24576  2074986 /files/cmb-files/repos/metanew/mm_tmp-4849-i60n
public-in 4849 chris   11w  FIFO   0,12      0t0 26905179 pipe 4879,git,0r
public-in 4849 chris   12r  FIFO   0,12      0t0 26905180 pipe 4879,git,1w
public-in 4849 chris   13u   REG   0,42    16928  2074743 /files/cmb-files/repos/metanew/msgmap.sqlite3-journal
public-in 4849 chris   14u   REG   0,42    66176  2074961 /files/cmb-files/repos/metanew/xap15/over.sqlite3-journal
shard[0]  4874 chris    5r  FIFO   0,12      0t0 26905684 pipe 4849,public-in,6w
shard[0]  4874 chris    8w  FIFO   0,12      0t0 26905685 pipe 4849,public-in,7r
git       4879 chris    0r  FIFO   0,12      0t0 26905179 pipe 4849,public-in,11w
git       4879 chris    1w  FIFO   0,12      0t0 26905180 pipe 4849,public-in,12r

[-- Attachment #3: typescript --]
[-- Type: text/plain, Size: 610 bytes --]

Script started on 2022-12-21 05:12:15-08:00 [TERM="screen.linux" TTY="/dev/pts/10" COLUMNS="80" LINES="30"]
chris@beast:/home/chris $ doas strace -p 4849
\rdoas (chris@beast) password: 
strace: Process 4849 attached
write(11, "4187293ba385ec204df84742b7197f4d"..., 41^Cstrace: Process 4849 detached
 <detached ...>
Interrupt
chris@beast:/home/chris $ doas strace -p 4879
strace: Process 4879 attached
write(1, "Return-Path: <e@80x24.org>\nX-Spa"..., 3793^Cstrace: Process 4879 detached
 <detached ...>
Interrupt
chris@beast:/home/chris $ exit

Script done on 2022-12-21 05:12:43-08:00 [COMMAND_EXIT_CODE="130"]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: public-inbox-convert hangs on systems using musl libc
  2022-12-21 13:46   ` Chris Brannon
@ 2022-12-21 19:48     ` Eric Wong
  2022-12-21 20:46       ` Chris Brannon
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2022-12-21 19:48 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > Do you know which pipes are which?  "lsof -p $PID +E" can help
> > with connectivity checking, as can script/dtas-graph in
> > https://80x24.org/dtas.git if you have Graph::Easy
> 
> Yes.  I'm attaching my lsof output and a typescript.
> 
> The processes of interest here are 4849 public-inbox-convert and 4879
> git cat-file.
> PID 4849's FD 11 is the write end of a pipe, with 4879's stdin as the
> read end.
> PID 4849's FD 12 is the read end of a pipe, with 4879's stdout as the
> write end.  At the point of the hang, 4849 is trying to write a SHA1 to
> FD 11, while 4879 is writing an email message to its stdout.

OK.  That is strange, because the current values are sized
conservatively for Linux (which has larger-than-required
PIPE_BUF size).

> > Some shots in the dark:
> >
> > 2. Tweak $PIPE_BUFSIZ and/or MAX_INFLIGHT to smaller values.  e.g.
> >
> > diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
> > index 882a9a4a..ec40edd7 100644
> > --- a/lib/PublicInbox/Git.pm
> > +++ b/lib/PublicInbox/Git.pm
> > @@ -23,13 +23,12 @@ use Carp qw(croak carp);
> >  use Digest::SHA ();
> >  use PublicInbox::DS qw(dwaitpid);
> >  our @EXPORT_OK = qw(git_unquote git_quote);
> > -our $PIPE_BUFSIZ = 65536; # Linux default
> > +our $PIPE_BUFSIZ = 4096; # Linux default
> >  our $in_cleanup;
> >  our $RDTIMEO = 60_000; # milliseconds
> >  our $async_warn; # true in read-only daemons
> >  
> > -use constant MAX_INFLIGHT => (POSIX::PIPE_BUF * 3) /
> > -	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
> > +use constant MAX_INFLIGHT => 4;
> 
> This right here seems to have fixed it, when testing locally.

Are you able to isolate whether $PIPE_BUFSIZ or MAX_INFLIGHT on
it's own fixes it?

And can you confirm the ->blocking(0) change had no effect on
it's own?

Capping MAX_INFLIGHT to a smaller value is probably fine (maybe
10 can work).

The old MAX_INFLIGHT value ((512 * 3)/65 = 23) is actually
extremely conservative for Linux, since the smallest possible
PIPE_BUF is 4096 (so (4096 * 3)/65 = 189), but giant values
don't help (and hurt interruptibility).

However, making $PIPE_BUFSIZ smaller would increase syscalls
made and hurt performance on systems with expensive syscalls.
So I want to keep $PIPE_BUFSIZ as big as reasonable.  Setting
`$PIPE_BUFSIZ = 1024 ** 2' ought to work on a system with giant
pipes, even; but the default Linux pipe size is 65536 unless
it's low on memory.

I'm honestly at a loss on how to explain what went wrong for you
because the existing values are OS-independent.

I also do wonder if you've hit a kernel bug under low memory
conditions.  I've certainly encountered problems with TCP
traffic hanging processes due to memory compaction.

> PS.  Thank you for that lsof command.  I've never used lsof in that way;
> I'll have to add that to my *nix debugging toolbelt.

np :>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: public-inbox-convert hangs on systems using musl libc
  2022-12-21 19:48     ` Eric Wong
@ 2022-12-21 20:46       ` Chris Brannon
  2022-12-21 21:11         ` Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Brannon @ 2022-12-21 20:46 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Are you able to isolate whether $PIPE_BUFSIZ or MAX_INFLIGHT on
> it's own fixes it?

MAX_INFLIGHT is sufficient.  In fact, $PIPE_BUFSIZ had no effect at all
when I tried just now.

> And can you confirm the ->blocking(0) change had no effect on
> it's own?

That was the first thing I tried, and it had no effect.

> Capping MAX_INFLIGHT to a smaller value is probably fine (maybe
> 10 can work).

I tried with 10 just now, and that was fine also.

> I also do wonder if you've hit a kernel bug under low memory
> conditions.  I've certainly encountered problems with TCP
> traffic hanging processes due to memory compaction.

I doubt it.  I looked during a deadlock, and I had 2.66 GiB available.

-- Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: public-inbox-convert hangs on systems using musl libc
  2022-12-21 20:46       ` Chris Brannon
@ 2022-12-21 21:11         ` Eric Wong
  2022-12-21 22:17           ` Chris Brannon
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2022-12-21 21:11 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > Are you able to isolate whether $PIPE_BUFSIZ or MAX_INFLIGHT on
> > it's own fixes it?
> 
> MAX_INFLIGHT is sufficient.  In fact, $PIPE_BUFSIZ had no effect at all
> when I tried just now.

OK, good to know.  Thanks so far for your other testing and
reports.

> > Capping MAX_INFLIGHT to a smaller value is probably fine (maybe
> > 10 can work).
> 
> I tried with 10 just now, and that was fine also.

Oops, so POSIX::PIPE_BUF is actually 4096 on Linux.
(`perl -MPOSIX -E 'say POSIX::PIPE_BUF'` shows 4096 on the
Alpine cfarm machine)

But I'm still confused as to why the original code didn't
work on Linux since the pipe can't be smaller than 4096...

The following changes MAX_INFLIGHT to 23 for all platforms (and
explains things better).  I also wonder if you can find a
threshold for when things start failing (e.g. hardcoding 24
should fail on FreeBSD)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 882a9a4a..a1af776b 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -28,8 +28,10 @@ our $in_cleanup;
 our $RDTIMEO = 60_000; # milliseconds
 our $async_warn; # true in read-only daemons
 
-use constant MAX_INFLIGHT => (POSIX::PIPE_BUF * 3) /
-	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
+# 512: POSIX PIPE_BUF minimum (see pipe(7))
+# 3: @$inflight is flattened [ $OID, $cb, $arg ]
+# 65: SHA-256 hex size + "\n" in preparation for git using non-SHA1
+use constant MAX_INFLIGHT => 512 * 3 / 65;
 
 my %GIT_ESC = (
 	a => "\a",

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: public-inbox-convert hangs on systems using musl libc
  2022-12-21 21:11         ` Eric Wong
@ 2022-12-21 22:17           ` Chris Brannon
  2022-12-21 23:22             ` [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Brannon @ 2022-12-21 22:17 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> The following changes MAX_INFLIGHT to 23 for all platforms (and
> explains things better).  I also wonder if you can find a
> threshold for when things start failing (e.g. hardcoding 24
> should fail on FreeBSD)

I did a binary search.  On my Linux system with musl, deadlocks start at
MAX_INFLIGHT >= 115.

-- Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum
  2022-12-21 22:17           ` Chris Brannon
@ 2022-12-21 23:22             ` Eric Wong
  2022-12-21 23:57               ` Chris Brannon
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2022-12-21 23:22 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > The following changes MAX_INFLIGHT to 23 for all platforms (and
> > explains things better).  I also wonder if you can find a
> > threshold for when things start failing (e.g. hardcoding 24
> > should fail on FreeBSD)
> 
> I did a binary search.  On my Linux system with musl, deadlocks start at
> MAX_INFLIGHT >= 115.

Thanks.  I'll push the following out and intend to keep it at <= 23
going forward.  However, it still worries me greatly that I have
no explanation to why MAX_INFLIGHT >= 115 fails.  This stuff is
my bread and butter, really...

I'll wait on a cfarm admins to test.

Also, can I assume just running 'public-inbox-index' on any
freshly-cloned inbox also fails for you independently of
-convert?

------------8<----------
Subject: [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum

This ensures we get consistent pipelining behavior across
platforms.  Furthermore, a smaller value is probably more
reasonable since "git cat-file" can usually outpace indexing and
lower values allow us to react to user interaction (e.g. Ctrl-C)
more quickly.

The previous value based on Linux PIPE_BUF (4096) allowed a
value of 189 which worked fine on non-musl Linux systems, but
failed on musl-based Void and Alpine Linux.  Mysteriously, this
works on musl up to a value of 114 and starts locking up at 115.
The reason for this failure is currently unexplained and will
hopefully be discovered soon.

Regardless, capping the value to 23 based on the universal
PIPE_BUF minimum (512) seems reasonable, anyways.

Reported-by: Chris Brannon <chris@the-brannons.com>
Tested-by: Chris Brannon <chris@the-brannons.com>
Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
---
 lib/PublicInbox/Git.pm | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 882a9a4a..a1af776b 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -28,8 +28,10 @@ our $in_cleanup;
 our $RDTIMEO = 60_000; # milliseconds
 our $async_warn; # true in read-only daemons
 
-use constant MAX_INFLIGHT => (POSIX::PIPE_BUF * 3) /
-	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
+# 512: POSIX PIPE_BUF minimum (see pipe(7))
+# 3: @$inflight is flattened [ $OID, $cb, $arg ]
+# 65: SHA-256 hex size + "\n" in preparation for git using non-SHA1
+use constant MAX_INFLIGHT => 512 * 3 / 65;
 
 my %GIT_ESC = (
 	a => "\a",

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum
  2022-12-21 23:22             ` [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum Eric Wong
@ 2022-12-21 23:57               ` Chris Brannon
  2023-01-04  3:49                 ` [PATCH] git: fix asynchronous batching for deep pipelines Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Brannon @ 2022-12-21 23:57 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Also, can I assume just running 'public-inbox-index' on any
> freshly-cloned inbox also fails for you independently of
> -convert?

Yes it does.  I double-checked to be sure.

Thanks for the fix!  If you come up with something more permanent that
you want me to test, please ask.

-- Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] git: fix asynchronous batching for deep pipelines
  2022-12-21 23:57               ` Chris Brannon
@ 2023-01-04  3:49                 ` Eric Wong
  2023-01-05  1:08                   ` Chris Brannon
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2023-01-04  3:49 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > Also, can I assume just running 'public-inbox-index' on any
> > freshly-cloned inbox also fails for you independently of
> > -convert?
> 
> Yes it does.  I double-checked to be sure.
> 
> Thanks for the fix!  If you come up with something more permanent that
> you want me to test, please ask.



--------8<-------
Subject: [PATCH] git: fix asynchronous batching for deep pipelines

...By using non-blocking pipe writes.  This avoids problems for
musl (and other libc) where getdelim(3) used by `git cat-file --batch*'
uses a smaller input buffer than glibc or FreeBSD libc.

My key mistake was our check against MAX_INFLIGHT is only useful
for the initial batch of requests.  It is not useful for
subsequent requests since git will drain the pipe at
unpredictable rates due to libc differences.

To fix this problem, I initially tried to drain the read pipe
as long as readable data was pending.  However, reading git
output without giving git more work would also limit parallelism
opportunities since we don't want git to sit idle, either.  This
change ensures we keep both pipes reasonably full to reduce
stalls and maximize parallelism between git and public-inbox.

While the limit set a few weeks ago in commit
56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21)
remains in place, any higher or lower limit will work.  It may
be worth it to use an even lower limit to improve interactivity
w.r.t. Ctrl-C interrupts.

I've tested the pre-56e6e587745c and even higher values on an
Alpine VM in the GCC Farm <https://cfarm.tetaneutral.net>

Reported-by: Chris Brannon <chris@the-brannons.com>
Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
---
 lib/PublicInbox/Git.pm | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index a1af776b..86b80a4e 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -145,6 +145,7 @@ sub _bidi_pipe {
 		fcntl($out_w, 1031, 4096);
 		fcntl($in_r, 1031, 4096) if $batch eq '--batch-check';
 	}
+	$out_w->blocking(0);
 	$self->{$out} = $out_w;
 	$self->{$in} = $in_r;
 }
@@ -203,7 +204,9 @@ sub cat_async_retry ($$) {
 	for (my $i = 0; $i < @$inflight; $i += 3) {
 		$buf .= "$inflight->[$i]\n";
 	}
+	$self->{out}->blocking(1); # brand new pipe, should never block
 	print { $self->{out} } $buf or $self->fail("write error: $!");
+	$self->{out}->blocking(0);
 	my $req = shift @$inflight;
 	unshift(@$inflight, \$req); # \$ref to indicate retried
 
@@ -305,13 +308,27 @@ sub check_async_begin ($) {
 	$self->{inflight_c} = [];
 }
 
+sub write_all {
+	my ($self, $out, $buf, $read_step, $inflight) = @_;
+	$read_step->($self, $inflight) while @$inflight >= MAX_INFLIGHT;
+	do {
+		my $w = syswrite($out, $buf);
+		if (defined $w) {
+			return if $w == length($buf);
+			warn "chop: $w";
+			substr($buf, 0, $w, ''); # sv_chop
+		} elsif ($! != EAGAIN) {
+			$self->fail("write: $!");
+		} else { warn "E: $!" }
+		$read_step->($self, $inflight);
+	} while (1);
+}
+
 sub check_async ($$$$) {
 	my ($self, $oid, $cb, $arg) = @_;
 	my $inflight_c = $self->{inflight_c} // check_async_begin($self);
-	while (scalar(@$inflight_c) >= MAX_INFLIGHT) {
-		check_async_step($self, $inflight_c);
-	}
-	print { $self->{out_c} } $oid, "\n" or $self->fail("write error: $!");
+	write_all($self, $self->{out_c}, $oid."\n",
+		\&check_async_step, $inflight_c);
 	push(@$inflight_c, $oid, $cb, $arg);
 }
 
@@ -496,10 +513,7 @@ sub cat_async_begin {
 sub cat_async ($$$;$) {
 	my ($self, $oid, $cb, $arg) = @_;
 	my $inflight = $self->{inflight} // cat_async_begin($self);
-	while (scalar(@$inflight) >= MAX_INFLIGHT) {
-		cat_async_step($self, $inflight);
-	}
-	print { $self->{out} } $oid, "\n" or $self->fail("write error: $!");
+	write_all($self, $self->{out}, $oid."\n", \&cat_async_step, $inflight);
 	push(@$inflight, $oid, $cb, $arg);
 }
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] git: fix asynchronous batching for deep pipelines
  2023-01-04  3:49                 ` [PATCH] git: fix asynchronous batching for deep pipelines Eric Wong
@ 2023-01-05  1:08                   ` Chris Brannon
  2023-01-05  1:44                     ` [PATCH] git: write_all: remove leftover debug messages Eric Wong
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Brannon @ 2023-01-05  1:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

>
> --------8<-------
> Subject: [PATCH] git: fix asynchronous batching for deep pipelines
>
> ...By using non-blocking pipe writes.  This avoids problems for
> musl (and other libc) where getdelim(3) used by `git cat-file --batch*'
> uses a smaller input buffer than glibc or FreeBSD libc.
>
> My key mistake was our check against MAX_INFLIGHT is only useful
> for the initial batch of requests.  It is not useful for
> subsequent requests since git will drain the pipe at
> unpredictable rates due to libc differences.
>
> To fix this problem, I initially tried to drain the read pipe
> as long as readable data was pending.  However, reading git
> output without giving git more work would also limit parallelism
> opportunities since we don't want git to sit idle, either.  This
> change ensures we keep both pipes reasonably full to reduce
> stalls and maximize parallelism between git and public-inbox.
>
> While the limit set a few weeks ago in commit
> 56e6e587745c (git: cap MAX_INFLIGHT value to POSIX minimum, 2022-12-21)
> remains in place, any higher or lower limit will work.  It may
> be worth it to use an even lower limit to improve interactivity
> w.r.t. Ctrl-C interrupts.
>
> I've tested the pre-56e6e587745c and even higher values on an
> Alpine VM in the GCC Farm <https://cfarm.tetaneutral.net>
>
> Reported-by: Chris Brannon <chris@the-brannons.com>
> Tested-by: Chris Brannon <chris@the-brannons.com>
> Link: https://public-inbox.org/meta/87edssl7u0.fsf@the-brannons.com/T/
*SNIP*

Thanks!

The conversion seems to have been successful, but it logged very many of
the following:
E: Resource temporarily unavailable at
/usr/share/perl5/vendor_perl/PublicInbox/Git.pm line 320.

I'm sorry, I didn't count how many.  It seemed like multiple screenfuls
at least.

-- Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] git: write_all: remove leftover debug messages
  2023-01-05  1:08                   ` Chris Brannon
@ 2023-01-05  1:44                     ` Eric Wong
  2023-01-05  7:32                       ` Chris Brannon
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Wong @ 2023-01-05  1:44 UTC (permalink / raw)
  To: Chris Brannon; +Cc: meta

Chris Brannon <chris@the-brannons.com> wrote:
> The conversion seems to have been successful, but it logged very many of
> the following:
> E: Resource temporarily unavailable at
> /usr/share/perl5/vendor_perl/PublicInbox/Git.pm line 320.
> 
> I'm sorry, I didn't count how many.  It seemed like multiple screenfuls
> at least.

Oops, yeah.   Just leftover debug messages that never triggered
on my FreeBSD and Debian systems.  Will push this out:

---------8<----------
Subject: [PATCH] git: write_all: remove leftover debug messages

I used these messages during development to verify Alpine was
triggering the intended codepaths.  They're no longer necessary
and just noise at this point.

Reported-by: Chris Brannon <chris@the-brannons.com>
Fixes: d4ba8828ab23 ("git: fix asynchronous batching for deep pipelines")
---
 lib/PublicInbox/Git.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 86b80a4e..c7f82ba2 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -315,11 +315,10 @@ sub write_all {
 		my $w = syswrite($out, $buf);
 		if (defined $w) {
 			return if $w == length($buf);
-			warn "chop: $w";
 			substr($buf, 0, $w, ''); # sv_chop
 		} elsif ($! != EAGAIN) {
 			$self->fail("write: $!");
-		} else { warn "E: $!" }
+		}
 		$read_step->($self, $inflight);
 	} while (1);
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] git: write_all: remove leftover debug messages
  2023-01-05  1:44                     ` [PATCH] git: write_all: remove leftover debug messages Eric Wong
@ 2023-01-05  7:32                       ` Chris Brannon
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Brannon @ 2023-01-05  7:32 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Subject: [PATCH] git: write_all: remove leftover debug messages
>
> I used these messages during development to verify Alpine was
> triggering the intended codepaths.  They're no longer necessary
> and just noise at this point.
>
> Reported-by: Chris Brannon <chris@the-brannons.com>
> Tested-by: Chris Brannon <chris@the-brannons.com>
> Fixes: d4ba8828ab23 ("git: fix asynchronous batching for deep pipelines")
> ---
>  lib/PublicInbox/Git.pm | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Thanks!
-- Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-01-05  7:32 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-21 11:28 public-inbox-convert hangs on systems using musl libc Chris Brannon
2022-12-21 12:21 ` Eric Wong
2022-12-21 13:46   ` Chris Brannon
2022-12-21 19:48     ` Eric Wong
2022-12-21 20:46       ` Chris Brannon
2022-12-21 21:11         ` Eric Wong
2022-12-21 22:17           ` Chris Brannon
2022-12-21 23:22             ` [PATCH] git: cap MAX_INFLIGHT value to POSIX minimum Eric Wong
2022-12-21 23:57               ` Chris Brannon
2023-01-04  3:49                 ` [PATCH] git: fix asynchronous batching for deep pipelines Eric Wong
2023-01-05  1:08                   ` Chris Brannon
2023-01-05  1:44                     ` [PATCH] git: write_all: remove leftover debug messages Eric Wong
2023-01-05  7:32                       ` Chris Brannon

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).