user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* Re: [PATCH 4/4] wwwstatic: wire up buffer bypass for -httpd
  2020-01-24  9:43  5% ` [PATCH 4/4] wwwstatic: wire up buffer bypass for -httpd Eric Wong
  2020-01-25 19:27  7%   ` Eric Wong
@ 2020-01-25 19:34  7%   ` Eric Wong
  1 sibling, 0 replies; 4+ results
From: Eric Wong @ 2020-01-25 19:34 UTC (permalink / raw)
  To: meta

Eric Wong <e@yhbt.net> wrote:
> diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
> index e1f536f3..174f4752 100644
> --- a/lib/PublicInbox/WwwStatic.pm
> +++ b/lib/PublicInbox/WwwStatic.pm
> @@ -51,7 +51,24 @@ sub r ($;$) {
>  	  [ $msg ] ]
>  }
>  
> -sub prepare_range {
> +sub getline_response ($$$$$) {
> +	my ($env, $in, $off, $len, $path) = @_;
> +	my $r = bless {}, __PACKAGE__;
> +	if ($env->{'pi-httpd.async'}) { # public-inbox-httpd-only mode
> +		$env->{'psgix.no-compress'} = 1; # do not chunk response
> +		%$r = ( bypass => [$in, $off, $len, $env->{'psgix.io'}] );
> +	} else {
> +		%$r = ( in => $in, off => $off, len => $len, path => $path );
> +	}
> +	$r;
> +}
> +
> +sub getline_generic ($$$$) {
> +	my ($in, $off, $len, $p) = @_;
> +	bless { in => $in, off => $off, len => $len, path => $p }, __PACKAGE__;
> +}

Oops, getline_generic isn't used anywhere, since I inlined it
into getline_response.  Going to squash the following in:

diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index d75c0076..60a71d8d 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -63,11 +63,6 @@ sub getline_response ($$$$$) {
 	$r;
 }
 
-sub getline_generic ($$$$) {
-	my ($in, $off, $len, $p) = @_;
-	bless { in => $in, off => $off, len => $len, path => $p }, __PACKAGE__;
-}
-
 sub setup_range {
 	my ($env, $in, $h, $beg, $end, $size) = @_;
 	my $code = 200;

^ permalink raw reply related	[relevance 7%]

* Re: [PATCH 4/4] wwwstatic: wire up buffer bypass for -httpd
  2020-01-24  9:43  5% ` [PATCH 4/4] wwwstatic: wire up buffer bypass for -httpd Eric Wong
@ 2020-01-25 19:27  7%   ` Eric Wong
  2020-01-25 19:34  7%   ` Eric Wong
  1 sibling, 0 replies; 4+ results
From: Eric Wong @ 2020-01-25 19:27 UTC (permalink / raw)
  To: meta

Eric Wong <e@yhbt.net> wrote:
> diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
> index e1f536f3..174f4752 100644
> --- a/lib/PublicInbox/WwwStatic.pm
> +++ b/lib/PublicInbox/WwwStatic.pm

> @@ -131,6 +145,7 @@ sub response ($$$;$) {
>  			return r(500);
>  		}
>  		return r(404) unless -f $in;
> +		binmode $in, ':unix'; # reduce syscalls for read() >8192 bytes
>  	}
>  	my $size = -s _; # bare "_" reuses "struct stat" from "-f" above
>  	my $mtime = time2str((stat(_))[9]);

That binmode would also lead to leaks.  Will squash to workaround
current/old Perls: http://nntp.perl.org/group/perl.perl5.porters/256935

diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index 174f4752..d75c0076 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -145,7 +145,6 @@ sub response ($$$;$) {
 			return r(500);
 		}
 		return r(404) unless -f $in;
-		binmode $in, ':unix'; # reduce syscalls for read() >8192 bytes
 	}
 	my $size = -s _; # bare "_" reuses "struct stat" from "-f" above
 	my $mtime = time2str((stat(_))[9]);

^ permalink raw reply related	[relevance 7%]

* [PATCH 4/4] wwwstatic: wire up buffer bypass for -httpd
  2020-01-24  9:43  6% [PATCH 0/4] -httpd static file improvements Eric Wong
@ 2020-01-24  9:43  5% ` Eric Wong
  2020-01-25 19:27  7%   ` Eric Wong
  2020-01-25 19:34  7%   ` Eric Wong
  0 siblings, 2 replies; 4+ results
From: Eric Wong @ 2020-01-24  9:43 UTC (permalink / raw)
  To: meta

This prevents public-inbox-httpd from buffering ->getline
results from a static file into another temporary file when
writing to slow clients.  Instead we inject the static file
ref with offsets and length directly into the {wbuf} queue.

It took me a while to decide to go this route, some
rejected ideas:

1. Using Plack::Util::set_io_path and having PublicInbox::HTTP
   serve the result directly.  This is compatible with what
   some other PSGI servers do using sendfile.  However, neither
   Starman or Twiggy currently use sendfile for partial responses.

2. Parsing the Content-Range response header for offsets and
   lengths to use with set_io_path for partial responses.

These rejected ideas required increasing the complexity of HTTP
response writing in PublicInbox::HTTP in the common, non-static
file cases.  Instead, we made minor changes to the colder write
buffering path of PublicInbox::DS and leave the hot paths
untouched.

We still support generic PSGI servers via ->getline.  However,
since we don't know the characteristics of other PSGI servers,
we no longer do a 64K initial read in an attempt to negotiate a
larger TCP window.
---
 lib/PublicInbox/WwwStatic.pm | 54 +++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/WwwStatic.pm b/lib/PublicInbox/WwwStatic.pm
index e1f536f3..174f4752 100644
--- a/lib/PublicInbox/WwwStatic.pm
+++ b/lib/PublicInbox/WwwStatic.pm
@@ -51,7 +51,24 @@ sub r ($;$) {
 	  [ $msg ] ]
 }
 
-sub prepare_range {
+sub getline_response ($$$$$) {
+	my ($env, $in, $off, $len, $path) = @_;
+	my $r = bless {}, __PACKAGE__;
+	if ($env->{'pi-httpd.async'}) { # public-inbox-httpd-only mode
+		$env->{'psgix.no-compress'} = 1; # do not chunk response
+		%$r = ( bypass => [$in, $off, $len, $env->{'psgix.io'}] );
+	} else {
+		%$r = ( in => $in, off => $off, len => $len, path => $path );
+	}
+	$r;
+}
+
+sub getline_generic ($$$$) {
+	my ($in, $off, $len, $p) = @_;
+	bless { in => $in, off => $off, len => $len, path => $p }, __PACKAGE__;
+}
+
+sub setup_range {
 	my ($env, $in, $h, $beg, $end, $size) = @_;
 	my $code = 200;
 	my $len = $size;
@@ -81,9 +98,6 @@ sub prepare_range {
 		if ($len <= 0) {
 			$code = 416;
 		} else {
-			if ($in) {
-				sysseek($in, $beg, SEEK_SET) or return r(500);
-			}
 			push @$h, qw(Accept-Ranges bytes Content-Range);
 			push @$h, "bytes $beg-$end/$size";
 
@@ -95,7 +109,7 @@ sub prepare_range {
 		push @$h, 'Content-Range', "bytes */$size";
 		return [ 416, $h, [] ];
 	}
-	($code, $len);
+	($code, $beg, $len);
 }
 
 # returns a PSGI arrayref response iff .gz and non-.gz mtimes match
@@ -131,6 +145,7 @@ sub response ($$$;$) {
 			return r(500);
 		}
 		return r(404) unless -f $in;
+		binmode $in, ':unix'; # reduce syscalls for read() >8192 bytes
 	}
 	my $size = -s _; # bare "_" reuses "struct stat" from "-f" above
 	my $mtime = time2str((stat(_))[9]);
@@ -142,30 +157,37 @@ sub response ($$$;$) {
 	my $len = $size;
 	my $code = 200;
 	push @$h, 'Content-Type', $type;
+	my $off = 0;
 	if (($env->{HTTP_RANGE} || '') =~ /\bbytes=([0-9]*)-([0-9]*)\z/) {
-		($code, $len) = prepare_range($env, $in, $h, $1, $2, $size);
+		($code, $off, $len) = setup_range($env, $in, $h, $1, $2, $size);
 		return $code if ref($code);
 	}
 	push @$h, 'Content-Length', $len, 'Last-Modified', $mtime;
-	my $body = $in ? bless {
-		initial_rd => 65536,
-		len => $len,
-		in => $in,
-		path => $path,
-		env => $env,
-	}, __PACKAGE__ : [];
-	[ $code, $h, $body ];
+	[ $code, $h, $in ? getline_response($env, $in, $off, $len, $path) : [] ]
 }
 
 # called by PSGI servers on each response chunk:
 sub getline {
 	my ($self) = @_;
+
+	# avoid buffering, by becoming the buffer! (public-inbox-httpd)
+	if (my $tmpio = delete $self->{bypass}) {
+		my $http = pop @$tmpio; # PublicInbox::HTTP
+		push @{$http->{wbuf}}, $tmpio; # [ $in, $off, $len ]
+		$http->flush_write;
+		return; # undef, EOF
+	}
+
+	# generic PSGI runs this:
 	my $len = $self->{len} or return; # undef, tells server we're done
-	my $n = delete($self->{initial_rd}) // 8192;
+	my $n = 8192;
 	$n = $len if $len < $n;
-	my $r = sysread($self->{in}, my $buf, $n);
+	seek($self->{in}, $self->{off}, SEEK_SET) or
+			die "seek ($self->{path}): $!";
+	my $r = read($self->{in}, my $buf, $n);
 	if (defined $r && $r > 0) { # success!
 		$self->{len} = $len - $r;
+		$self->{off} += $r;
 		return $buf;
 	}
 	my $m = defined $r ? "EOF with $len bytes left" : "read error: $!";

^ permalink raw reply related	[relevance 5%]

* [PATCH 0/4] -httpd static file improvements
@ 2020-01-24  9:43  6% Eric Wong
  2020-01-24  9:43  5% ` [PATCH 4/4] wwwstatic: wire up buffer bypass for -httpd Eric Wong
  0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2020-01-24  9:43 UTC (permalink / raw)
  To: meta

Serving large static files to slow clients could lead to
public-inbox-httpd buffering data already in static files again
into a temporary file.

This was inefficient, and solving it in a generic way could
actually break other PSGI servers since their sendfile
optimizations don't handle 206 (partial content) responses
correctly.

So we make minor changes to the way PublicInbox::DS handles
write buffers and inject static files, offsets, and length
limits directly into the {wbuf} queue.

Eric Wong (4):
  http: eliminate short-lived cyclic ref for psgix.io
  wwwstatic: offload error handling to PSGI server
  ds: tmpio: store offsets per-buffer
  wwwstatic: wire up buffer bypass for -httpd

 lib/PublicInbox/DS.pm        | 36 ++++++++++----------
 lib/PublicInbox/HTTP.pm      |  7 ++--
 lib/PublicInbox/WwwStatic.pm | 65 +++++++++++++++++++++---------------
 3 files changed, 61 insertions(+), 47 deletions(-)

^ permalink raw reply	[relevance 6%]

Results 1-4 of 4 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2020-01-24  9:43  6% [PATCH 0/4] -httpd static file improvements Eric Wong
2020-01-24  9:43  5% ` [PATCH 4/4] wwwstatic: wire up buffer bypass for -httpd Eric Wong
2020-01-25 19:27  7%   ` Eric Wong
2020-01-25 19:34  7%   ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).