Date: Wed, 21 Dec 2022 19:48:55 +0000
From: Eric Wong
To: Chris Brannon
Cc: meta@public-inbox.org
Subject: Re: public-inbox-convert hangs on systems using musl libc
Message-ID: <20221221194855.GA5179@dcvr>
References: <875ye5m1wo.fsf@the-brannons.com>
 <20221221122102.M600156@dcvr>
 <871qosna30.fsf@the-brannons.com>
In-Reply-To: <871qosna30.fsf@the-brannons.com>

Chris Brannon wrote:
> Eric Wong writes:
>
> > Do you know which pipes are which?  "lsof -p $PID +E" can help
> > with connectivity checking, as can script/dtas-graph in
> > https://80x24.org/dtas.git if you have Graph::Easy
>
> Yes.  I'm attaching my lsof output and a typescript.
>
> The processes of interest here are 4849 public-inbox-convert and 4879
> git cat-file.
> PID 4849's FD 11 is the write end of a pipe, with 4879's stdin as the
> read end.
> PID 4849's FD 12 is the read end of a pipe, with 4879's stdout as the
> write end.  At the point of the hang, 4849 is trying to write a SHA1 to
> FD 11, while 4879 is writing an email message to its stdout.

OK.
That is strange, because the current values are sized
conservatively for Linux (which has a larger-than-required
PIPE_BUF size).

> > Some shots in the dark:
> >
> > 2. Tweak $PIPE_BUFSIZ and/or MAX_INFLIGHT to smaller values.  e.g.
> >
> > diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
> > index 882a9a4a..ec40edd7 100644
> > --- a/lib/PublicInbox/Git.pm
> > +++ b/lib/PublicInbox/Git.pm
> > @@ -23,13 +23,12 @@ use Carp qw(croak carp);
> >  use Digest::SHA ();
> >  use PublicInbox::DS qw(dwaitpid);
> >  our @EXPORT_OK = qw(git_unquote git_quote);
> > -our $PIPE_BUFSIZ = 65536; # Linux default
> > +our $PIPE_BUFSIZ = 4096; # Linux default
> >  our $in_cleanup;
> >  our $RDTIMEO = 60_000; # milliseconds
> >  our $async_warn; # true in read-only daemons
> >
> > -use constant MAX_INFLIGHT => (POSIX::PIPE_BUF * 3) /
> > -	65; # SHA-256 hex size + "\n" in preparation for git using non-SHA1
> > +use constant MAX_INFLIGHT => 4;
>
> This right here seems to have fixed it, when testing locally.

Are you able to isolate whether $PIPE_BUFSIZ or MAX_INFLIGHT on
its own fixes it?  And can you confirm the ->blocking(0) change
had no effect on its own?

Capping MAX_INFLIGHT to a smaller value is probably fine
(maybe 10 can work).  The old MAX_INFLIGHT value
((512 * 3)/65 = 23) is actually extremely conservative for
Linux, since the smallest possible PIPE_BUF is 4096
(so (4096 * 3)/65 = 189), but giant values don't help
(and hurt interruptibility).

However, making $PIPE_BUFSIZ smaller would increase the number
of syscalls made and hurt performance on systems with expensive
syscalls.  So I want to keep $PIPE_BUFSIZ as big as reasonable.

Setting `$PIPE_BUFSIZ = 1024 ** 2' ought to work even on a system
with giant pipes; but the default Linux pipe size is 65536
unless it's low on memory.

I'm honestly at a loss on how to explain what went wrong for you
because the existing values are OS-independent.

I also do wonder if you've hit a kernel bug under low memory
conditions.
I've certainly encountered problems with TCP traffic hanging
processes due to memory compaction.

> PS. Thank you for that lsof command.  I've never used lsof in that way;
> I'll have to add that to my *nix debugging toolbelt.

np :>
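
For reference, the MAX_INFLIGHT arithmetic above can be reproduced
with a few lines of Perl.  This is only a sketch: max_inflight() is a
hypothetical helper written for illustration and is not part of
PublicInbox::Git; the 65-byte request size (SHA-256 hex plus "\n") and
the factor of 3 are taken from the quoted diff.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper, not part of PublicInbox::Git: how many
# object-ID lines fit in three PIPE_BUF-sized chunks.
sub max_inflight {
	my ($pipe_buf, $oid_line_len) = @_;
	return int(($pipe_buf * 3) / $oid_line_len);
}

my $oid_line_len = 65; # SHA-256 hex size (64) + "\n"

# 512 is the value used in the "old MAX_INFLIGHT" calculation
printf "PIPE_BUF=512:  %d\n", max_inflight(512, $oid_line_len);

# 4096 is the PIPE_BUF value cited for Linux
printf "PIPE_BUF=4096: %d\n", max_inflight(4096, $oid_line_len);

# whatever this system's libc reports
my $sys = POSIX::PIPE_BUF();
printf "PIPE_BUF=%d (this system): %d\n", $sys,
	max_inflight($sys, $oid_line_len);
```

Running it on the affected musl system and on a glibc system would
show whether POSIX::PIPE_BUF (and thus the computed limit) differs
between the two.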