From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS6315 166.70.0.0/16 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1 Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 5D66E1F597; Mon, 30 Jul 2018 11:57:36 +0000 (UTC) Received: from in01.mta.xmission.com ([166.70.13.51]) by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1fk6nv-00034e-Eh; Mon, 30 Jul 2018 05:57:35 -0600 Received: from [97.119.167.31] (helo=x220.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1fk6nu-0003gH-OM; Mon, 30 Jul 2018 05:57:35 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Eric Wong Cc: meta@public-inbox.org References: <87y3dtl3f6.fsf@xmission.com> <20180730054646.gaylszho35yclqjo@dcvr> Date: Mon, 30 Jul 2018 06:57:26 -0500 In-Reply-To: <20180730054646.gaylszho35yclqjo@dcvr> (Eric Wong's message of "Mon, 30 Jul 2018 05:46:46 +0000") Message-ID: <87y3dtj5qx.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1fk6nu-0003gH-OM;;;mid=<87y3dtj5qx.fsf@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/wEjra/st+sxaANru7OpFrCIODTZBl1a4= X-SA-Exim-Connect-IP: 97.119.167.31 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [RFC][PATCH] ProcessPipe.pm: Use read not sysread X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) List-Id: Eric Wong writes: > "Eric W. Biederman" wrote: >> >> While playing with git fast export I discovered that mixing <> and >> read would give inconsistent results. I tracked the issue down to >> using sysread in ProcessPipe instead of plain read. >> >> If it is desirable to use readline I can't see how using sysread >> can work as readline to be efficient needs to use buffered I/O. > > Agreed. > >> Signed-off-by: "Eric W. Biederman" >> --- >> >> Am I missing something or was this a fundamental bug from the beginning? > > I recall it being intentional; but it's subtle and I should've > at least documented it :x > > Right now, none of the ProcessPipe users mix readline and read > calls; and I've found sysread slightly faster with lots of data. > > The cases where I parse fast-export output uses the wantarray > return from popen which bypasses ProcessPipe. But I'm not > opposed to your patch, though. The bandwidth-hungry > GitHTTPBackend/Qspawn case already uses the untied wantarray > values and calls sysread directly. And off the top of my head, > that's the only expensive data transfer in the PSGI interface. That explains why I could not see how something this broken could have survived. The had missed wantarray distinction that makes the ProcessPipe code optional. It makes sense. Either manage the waitpid yourself or let the helper do it. > But yes, very subtle and I think I'll take your patch to > simplify things since I don't think there's noticeable > performance regressions. Thank you. > Probably something like the following is not worth it; but > right now, uncommenting the "die" won't trigger failures in > the current code base: Yes. Good for illustration. But if the users can and do call sysread on their own that seems much less subtle, and much less error prone. Eric > > diff --git a/lib/PublicInbox/ProcessPipe.pm b/lib/PublicInbox/ProcessPipe.pm > index 7bb6dde..37e4859 100644 > --- a/lib/PublicInbox/ProcessPipe.pm > +++ b/lib/PublicInbox/ProcessPipe.pm > @@ -11,9 +11,20 @@ sub TIEHANDLE { > bless { pid => $pid, fh => $fh }, $class; > } > > -sub READ { sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0) } > +sub READ { > + if ($_[0]->{did_readline}) { > + # die "Using buffered read\n"; > + read($_[0]->{fh}, $_[1], $_[2], $_[3] || 0); > + } > + else { > + sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0); > + } > +} > > -sub READLINE { readline($_[0]->{fh}) } > +sub READLINE { > + $_[0]->{did_readline} = 1; > + readline($_[0]->{fh}); > +} > > sub CLOSE { > my $fh = delete($_[0]->{fh});