user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: meta@public-inbox.org
Subject: Re: [RFC][PATCH] ProcessPipe.pm: Use read not sysread
Date: Mon, 30 Jul 2018 05:46:46 +0000	[thread overview]
Message-ID: <20180730054646.gaylszho35yclqjo@dcvr> (raw)
In-Reply-To: <87y3dtl3f6.fsf@xmission.com>

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> While playing with git fast export I discovered that mixing <> and
> read would give inconsistent results.  I tracked the issue down to
> using sysread in ProcessPipe instead of plain read.
> 
> If it is desirable to use readline I can't see how using sysread
> can work as readline to be efficient needs to use buffered I/O.

Agreed.

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> Am I missing something or was this a fundamental bug from the beginning?

I recall it being intentional; but it's subtle and I should've
at least documented it :x

Right now, none of the ProcessPipe users mix readline and read
calls; and I've found sysread slightly faster with lots of data.

The cases where I parse fast-export output uses the wantarray
return from popen which bypasses ProcessPipe.  But I'm not
opposed to your patch, though.  The bandwidth-hungry
GitHTTPBackend/Qspawn case already uses the untied wantarray
values and calls sysread directly.  And off the top of my head,
that's the only expensive data transfer in the PSGI interface.

But yes, very subtle and I think I'll take your patch to
simplify things since I don't think there's noticeable
performance regressions.

Probably something like the following is not worth it; but
right now, uncommenting the "die" won't trigger failures in
the current code base:

diff --git a/lib/PublicInbox/ProcessPipe.pm b/lib/PublicInbox/ProcessPipe.pm
index 7bb6dde..37e4859 100644
--- a/lib/PublicInbox/ProcessPipe.pm
+++ b/lib/PublicInbox/ProcessPipe.pm
@@ -11,9 +11,20 @@ sub TIEHANDLE {
 	bless { pid => $pid, fh => $fh }, $class;
 }
 
-sub READ { sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0) }
+sub READ {
+	if ($_[0]->{did_readline}) {
+		# die "Using buffered read\n";
+		read($_[0]->{fh}, $_[1], $_[2], $_[3] || 0);
+	}
+	else {
+		sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0);
+	}
+}
 
-sub READLINE { readline($_[0]->{fh}) }
+sub READLINE {
+	$_[0]->{did_readline} = 1;
+	readline($_[0]->{fh});
+}
 
 sub CLOSE {
 	my $fh = delete($_[0]->{fh});

  reply	other threads:[~2018-07-30  5:46 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-30  5:04 [RFC][PATCH] ProcessPipe.pm: Use read not sysread Eric W. Biederman
2018-07-30  5:46 ` Eric Wong [this message]
2018-07-30 11:57   ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180730054646.gaylszho35yclqjo@dcvr \
    --to=e@80x24.org \
    --cc=ebiederm@xmission.com \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).