user/dev discussion of public-inbox itself
 help / Atom feed
* [RFC][PATCH] ProcessPipe.pm: Use read not sysread
@ 2018-07-30  5:04 ebiederm
  2018-07-30  5:46 ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: ebiederm @ 2018-07-30  5:04 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


While playing with git fast export I discovered that mixing <> and
read would give inconsistent results.  I tracked the issue down to
using sysread in ProcessPipe instead of plain read.

If it is desirable to use readline I can't see how using sysread
can work as readline to be efficient needs to use buffered I/O.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

Am I missing something or was this a fundamental bug from the beginning?

 lib/PublicInbox/ProcessPipe.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/ProcessPipe.pm b/lib/PublicInbox/ProcessPipe.pm
index 7bb6ddee40b1..2769e064ca21 100644
--- a/lib/PublicInbox/ProcessPipe.pm
+++ b/lib/PublicInbox/ProcessPipe.pm
@@ -11,7 +11,7 @@ sub TIEHANDLE {
 	bless { pid => $pid, fh => $fh }, $class;
 }
 
-sub READ { sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0) }
+sub READ { read($_[0]->{fh}, $_[1], $_[2], $_[3] || 0) }
 
 sub READLINE { readline($_[0]->{fh}) }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC][PATCH] ProcessPipe.pm: Use read not sysread
  2018-07-30  5:04 [RFC][PATCH] ProcessPipe.pm: Use read not sysread ebiederm
@ 2018-07-30  5:46 ` Eric Wong
  2018-07-30 11:57   ` ebiederm
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2018-07-30  5:46 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> While playing with git fast export I discovered that mixing <> and
> read would give inconsistent results.  I tracked the issue down to
> using sysread in ProcessPipe instead of plain read.
> 
> If it is desirable to use readline I can't see how using sysread
> can work as readline to be efficient needs to use buffered I/O.

Agreed.

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> Am I missing something or was this a fundamental bug from the beginning?

I recall it being intentional; but it's subtle and I should've
at least documented it :x

Right now, none of the ProcessPipe users mix readline and read
calls; and I've found sysread slightly faster with lots of data.

The cases where I parse fast-export output uses the wantarray
return from popen which bypasses ProcessPipe.  But I'm not
opposed to your patch, though.  The bandwidth-hungry
GitHTTPBackend/Qspawn case already uses the untied wantarray
values and calls sysread directly.  And off the top of my head,
that's the only expensive data transfer in the PSGI interface.

But yes, very subtle and I think I'll take your patch to
simplify things since I don't think there's noticeable
performance regressions.

Probably something like the following is not worth it; but
right now, uncommenting the "die" won't trigger failures in
the current code base:

diff --git a/lib/PublicInbox/ProcessPipe.pm b/lib/PublicInbox/ProcessPipe.pm
index 7bb6dde..37e4859 100644
--- a/lib/PublicInbox/ProcessPipe.pm
+++ b/lib/PublicInbox/ProcessPipe.pm
@@ -11,9 +11,20 @@ sub TIEHANDLE {
 	bless { pid => $pid, fh => $fh }, $class;
 }
 
-sub READ { sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0) }
+sub READ {
+	if ($_[0]->{did_readline}) {
+		# die "Using buffered read\n";
+		read($_[0]->{fh}, $_[1], $_[2], $_[3] || 0);
+	}
+	else {
+		sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0);
+	}
+}
 
-sub READLINE { readline($_[0]->{fh}) }
+sub READLINE {
+	$_[0]->{did_readline} = 1;
+	readline($_[0]->{fh});
+}
 
 sub CLOSE {
 	my $fh = delete($_[0]->{fh});

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC][PATCH] ProcessPipe.pm: Use read not sysread
  2018-07-30  5:46 ` Eric Wong
@ 2018-07-30 11:57   ` ebiederm
  0 siblings, 0 replies; 3+ messages in thread
From: ebiederm @ 2018-07-30 11:57 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 
>> While playing with git fast export I discovered that mixing <> and
>> read would give inconsistent results.  I tracked the issue down to
>> using sysread in ProcessPipe instead of plain read.
>> 
>> If it is desirable to use readline I can't see how using sysread
>> can work as readline to be efficient needs to use buffered I/O.
>
> Agreed.
>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> 
>> Am I missing something or was this a fundamental bug from the beginning?
>
> I recall it being intentional; but it's subtle and I should've
> at least documented it :x
>
> Right now, none of the ProcessPipe users mix readline and read
> calls; and I've found sysread slightly faster with lots of data.
>
> The cases where I parse fast-export output uses the wantarray
> return from popen which bypasses ProcessPipe.  But I'm not
> opposed to your patch, though.  The bandwidth-hungry
> GitHTTPBackend/Qspawn case already uses the untied wantarray
> values and calls sysread directly.  And off the top of my head,
> that's the only expensive data transfer in the PSGI interface.

That explains why I could not see how something this broken could
have survived.  The had missed wantarray distinction that makes
the ProcessPipe code optional.  It makes sense.  Either manage
the waitpid yourself or let the helper do it.

> But yes, very subtle and I think I'll take your patch to
> simplify things since I don't think there's noticeable
> performance regressions.

Thank you.

> Probably something like the following is not worth it; but
> right now, uncommenting the "die" won't trigger failures in
> the current code base:

Yes.  Good for illustration.  But if the users can and do call sysread
on their own that seems much less subtle, and much less error prone.

Eric

>
> diff --git a/lib/PublicInbox/ProcessPipe.pm b/lib/PublicInbox/ProcessPipe.pm
> index 7bb6dde..37e4859 100644
> --- a/lib/PublicInbox/ProcessPipe.pm
> +++ b/lib/PublicInbox/ProcessPipe.pm
> @@ -11,9 +11,20 @@ sub TIEHANDLE {
>  	bless { pid => $pid, fh => $fh }, $class;
>  }
>  
> -sub READ { sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0) }
> +sub READ {
> +	if ($_[0]->{did_readline}) {
> +		# die "Using buffered read\n";
> +		read($_[0]->{fh}, $_[1], $_[2], $_[3] || 0);
> +	}
> +	else {
> +		sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0);
> +	}
> +}
>  
> -sub READLINE { readline($_[0]->{fh}) }
> +sub READLINE {
> +	$_[0]->{did_readline} = 1;
> +	readline($_[0]->{fh});
> +}
>  
>  sub CLOSE {
>  	my $fh = delete($_[0]->{fh});

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-30  5:04 [RFC][PATCH] ProcessPipe.pm: Use read not sysread ebiederm
2018-07-30  5:46 ` Eric Wong
2018-07-30 11:57   ` ebiederm

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox