user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH 12/43] qspawn: learn to gzip streaming responses
Date: Sun,  5 Jul 2020 23:27:28 +0000	[thread overview]
Message-ID: <20200705232759.3161-13-e@yhbt.net> (raw)
In-Reply-To: <20200705232759.3161-1-e@yhbt.net>

This will allow us to gzip responses generated by cgit
and any other CGI programs or long-lived streaming
responses we may spawn.
---
 lib/PublicInbox/GzipFilter.pm | 16 ++++++++++++++++
 lib/PublicInbox/Qspawn.pm     |  6 ++++--
 t/httpd-corner.psgi           |  7 +++++++
 t/httpd-corner.t              |  9 ++++++++-
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/GzipFilter.pm b/lib/PublicInbox/GzipFilter.pm
index 0fbb4476a..0a6c56a5d 100644
--- a/lib/PublicInbox/GzipFilter.pm
+++ b/lib/PublicInbox/GzipFilter.pm
@@ -32,6 +32,22 @@ sub gzf_maybe ($$) {
 	bless { gz => $gz }, __PACKAGE__;
 }
 
+sub qsp_maybe ($$) {
+	my ($res_hdr, $env) = @_;
+	return if ($env->{HTTP_ACCEPT_ENCODING} // '') !~ /\bgzip\b/;
+	my $hdr = join("\n", @$res_hdr);
+	return if $hdr !~ m!^Content-Type\n
+				(?:(?:text/(?:html|plain))|
+				application/atom\+xml)\b!ixsm;
+	return if $hdr =~ m!^Content-Encoding\ngzip\n!smi;
+	return if $hdr =~ m!^Content-Length\n[0-9]+\n!smi;
+	return if $hdr =~ m!^Transfer-Encoding\n!smi;
+	# in case Plack::Middleware::Deflater is loaded:
+	return if $env->{'plack.skip-deflater'}++;
+	push @$res_hdr, @GZIP_HDRS;
+	bless {}, __PACKAGE__;
+}
+
 sub gzip_or_die () {
 	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
 	$err == Z_OK or die "Deflate->new failed: $err";
diff --git a/lib/PublicInbox/Qspawn.pm b/lib/PublicInbox/Qspawn.pm
index d395a10b3..88b6d390a 100644
--- a/lib/PublicInbox/Qspawn.pm
+++ b/lib/PublicInbox/Qspawn.pm
@@ -25,8 +25,8 @@
 
 package PublicInbox::Qspawn;
 use strict;
-use warnings;
 use PublicInbox::Spawn qw(popen_rd);
+use PublicInbox::GzipFilter;
 
 # n.b.: we get EAGAIN with public-inbox-httpd, and EINTR on other PSGI servers
 use Errno qw(EAGAIN EINTR);
@@ -255,7 +255,9 @@ sub psgi_return_init_cb {
 	my ($self) = @_;
 	my $r = rd_hdr($self) or return;
 	my $env = $self->{psgi_env};
-	my $filter = delete $env->{'qspawn.filter'};
+	my $filter = delete $env->{'qspawn.filter'} //
+		PublicInbox::GzipFilter::qsp_maybe($r->[1], $env);
+
 	my $wcb = delete $env->{'qspawn.wcb'};
 	my $async = delete $self->{async};
 	if (scalar(@$r) == 3) { # error
diff --git a/t/httpd-corner.psgi b/t/httpd-corner.psgi
index 446296200..cb41cfa05 100644
--- a/t/httpd-corner.psgi
+++ b/t/httpd-corner.psgi
@@ -94,6 +94,13 @@ my $app = sub {
 		return $qsp->psgi_return($env, undef, sub {
 			[ 200, [ qw(Content-Type application/octet-stream)]]
 		});
+	} elsif ($path eq '/psgi-return-compressible') {
+		require PublicInbox::Qspawn;
+		my $cmd = [qw(echo goodbye world)];
+		my $qsp = PublicInbox::Qspawn->new($cmd);
+		return $qsp->psgi_return($env, undef, sub {
+			[200, [qw(Content-Type text/plain)]]
+		});
 	} elsif ($path eq '/psgi-return-enoent') {
 		require PublicInbox::Qspawn;
 		my $cmd = [ 'this-better-not-exist-in-PATH'.rand ];
diff --git a/t/httpd-corner.t b/t/httpd-corner.t
index 681486550..514672a1b 100644
--- a/t/httpd-corner.t
+++ b/t/httpd-corner.t
@@ -340,11 +340,18 @@ SKIP: {
 	is($n, 30 * 1024 * 1024, 'got expected output from curl');
 	is($non_zero, 0, 'read all zeros');
 
-	require_mods(@zmods, 2);
+	require_mods(@zmods, 4);
 	my $buf = xqx([$curl, '-sS', "$base/psgi-return-gzip"]);
 	is($?, 0, 'curl succesful');
 	IO::Uncompress::Gunzip::gunzip(\$buf => \(my $out));
 	is($out, "hello world\n");
+	my $curl_rdr = { 2 => \(my $curl_err = '') };
+	$buf = xqx([$curl, qw(-sSv --compressed),
+			"$base/psgi-return-compressible"], undef, $curl_rdr);
+	is($?, 0, 'curl --compressed successful');
+	is($buf, "goodbye world\n", 'gzipped response as expected');
+	like($curl_err, qr/\bContent-Encoding: gzip\b/,
+		'curl got gzipped response');
 }
 
 {

  parent reply	other threads:[~2020-07-05 23:28 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
2020-07-05 23:27 ` [PATCH 01/43] gzipfilter: minor cleanups Eric Wong
2020-07-05 23:27 ` [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware Eric Wong
2020-07-05 23:27 ` [PATCH 03/43] www*stream: gzip ->getline responses Eric Wong
2020-07-05 23:27 ` [PATCH 04/43] wwwtext: gzip text/plain responses, as well Eric Wong
2020-07-05 23:27 ` [PATCH 05/43] wwwtext: switch to html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 06/43] www: need: use WwwStream::html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 07/43] wwwlisting: use GzipFilter for HTML Eric Wong
2020-07-05 23:27 ` [PATCH 08/43] gzipfilter: replace Compress::Raw::Deflate usages Eric Wong
2020-07-05 23:27 ` [PATCH 09/43] {gzip,noop}filter: ->zmore returns undef, always Eric Wong
2020-07-05 23:27 ` [PATCH 10/43] mbox: remove html_oneshot import Eric Wong
2020-07-05 23:27 ` [PATCH 11/43] wwwstatic: support gzipped directory listings Eric Wong
2020-07-05 23:27 ` Eric Wong [this message]
2020-07-05 23:27 ` [PATCH 13/43] stop auto-loading Plack::Middleware::Deflater Eric Wong
2020-07-05 23:27 ` [PATCH 14/43] mboxgz: do asynchronous git blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 15/43] mboxgz: reduce hash depth Eric Wong
2020-07-05 23:27 ` [PATCH 16/43] mbox: async blob fetch for "single message" raw mboxrd Eric Wong
2020-07-05 23:27 ` [PATCH 17/43] wwwatomstream: simplify feed_update callers Eric Wong
2020-07-05 23:27 ` [PATCH 18/43] wwwatomstream: use PublicInbox::Inbox->modified for feed_updated Eric Wong
2020-07-05 23:27 ` [PATCH 19/43] wwwatomstream: reuse $ctx as $self Eric Wong
2020-07-05 23:27 ` [PATCH 20/43] xt/httpd-async-stream: allow more options Eric Wong
2020-07-05 23:27 ` [PATCH 21/43] wwwatomstream: support async blob fetch Eric Wong
2020-07-05 23:27 ` [PATCH 22/43] wwwstream: reduce object graph depth Eric Wong
2020-07-05 23:27 ` [PATCH 23/43] wwwstream: reduce blob fetch paths for ->getline Eric Wong
2020-07-05 23:27 ` [PATCH 24/43] www: start making gzipfilter the parent response class Eric Wong
2020-07-05 23:27 ` [PATCH 25/43] remove unused/redundant zlib-related imports Eric Wong
2020-07-05 23:27 ` [PATCH 26/43] wwwstream: use parent.pm and no warnings Eric Wong
2020-07-05 23:27 ` [PATCH 27/43] wwwstream: subclass off GzipFilter Eric Wong
2020-07-05 23:27 ` [PATCH 28/43] view: make /$INBOX/$MSGID/ permalink async Eric Wong
2020-07-05 23:27 ` [PATCH 29/43] view: /$INBOX/$MSGID/t/ reads blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 30/43] view: update /$INBOX/$MSGID/T/ to be async Eric Wong
2020-07-05 23:27 ` [PATCH 31/43] feed: generate_i: eliminate pointless loop Eric Wong
2020-07-05 23:27 ` [PATCH 32/43] feed: /$INBOX/new.html fetches blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 33/43] ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs Eric Wong
2020-07-05 23:27 ` [PATCH 34/43] view: eml_entry: reduce parameters Eric Wong
2020-07-05 23:27 ` [PATCH 35/43] view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case Eric Wong
2020-07-05 23:27 ` [PATCH 36/43] wwwstream: eliminate ::response, use html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 37/43] www: update internal docs Eric Wong
2020-07-05 23:27 ` [PATCH 38/43] view: simplify eml_entry callers further Eric Wong
2020-07-05 23:27 ` [PATCH 39/43] wwwtext: simplify gzf_maybe use Eric Wong
2020-07-05 23:27 ` [PATCH 40/43] wwwattach: support async blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 41/43] gzipfilter: drop HTTP connection on bugs or data corruption Eric Wong
2020-07-05 23:27 ` [PATCH 42/43] daemon: warn on missing blobs Eric Wong
2020-07-05 23:27 ` [PATCH 43/43] gzipfilter: check http->{forward} for client disconnects Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200705232759.3161-13-e@yhbt.net \
    --to=e@yhbt.net \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).