user/dev discussion of public-inbox itself
From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH 04/43] wwwtext: gzip text/plain responses, as well
Date: Sun,  5 Jul 2020 23:27:20 +0000
Message-ID: <20200705232759.3161-5-e@yhbt.net>
In-Reply-To: <20200705232759.3161-1-e@yhbt.net>

Most of our plain-text responses are config files
big enough to warrant compression.
---
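Illustration only, not part of the commit: a minimal standalone sketch
of the idea above, assuming core IO::Compress::Gzip in place of
PublicInbox::GzipFilter. The helper name maybe_gzip_body and the sample
config text are hypothetical, and the Accept-Encoding check is
deliberately simplistic (it ignores q-values).

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper (not in public-inbox): gzip $txt only for 200
# responses when the client advertises gzip, and return a PSGI-style
# [ code, headers, body ] triplet.
sub maybe_gzip_body {
	my ($env, $code, $txt) = @_;
	my @hdr = ('Content-Type', 'text/plain');
	my $body = $txt;
	my $ae = $env->{HTTP_ACCEPT_ENCODING} // '';
	if ($code == 200 && $ae =~ /\bgzip\b/) {
		my $zbuf;
		gzip(\$txt => \$zbuf) or die "gzip failed: $GzipError";
		$body = $zbuf;
		push @hdr, 'Content-Encoding', 'gzip';
	}
	# Content-Length has to describe the bytes actually sent
	push @hdr, 'Content-Length', length($body);
	[ $code, \@hdr, [ $body ] ];
}

# Repetitive config-like text (assumed sample data) compresses well
my $txt = "[publicinbox \"test\"]\n\taddress = test-public\@example.com\n" x 50;
my $plain = maybe_gzip_body({}, 200, $txt);
my $gz = maybe_gzip_body({ HTTP_ACCEPT_ENCODING => 'gzip' }, 200, $txt);

# Decompress the gzipped body again to confirm nothing was lost
gunzip(\($gz->[2][0]) => \(my $out)) or die "gunzip failed: $GunzipError";
print "round trip ok\n" if $out eq $txt;
```

As in the patch, Content-Length is computed from the final body rather
than from the original text, since the two differ once compression
kicks in.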
 lib/PublicInbox/WwwText.pm | 17 ++++++++++++++---
 t/psgi_text.t              | 33 ++++++++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/WwwText.pm b/lib/PublicInbox/WwwText.pm
index b23a415e4..508005fba 100644
--- a/lib/PublicInbox/WwwText.pm
+++ b/lib/PublicInbox/WwwText.pm
@@ -10,6 +10,8 @@ use PublicInbox::Linkify;
 use PublicInbox::WwwStream;
 use PublicInbox::Hval qw(ascii_html);
 use URI::Escape qw(uri_escape_utf8);
+use PublicInbox::GzipFilter qw(gzf_maybe);
+use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
 our $QP_URL = 'https://xapian.org/docs/queryparser.html';
 our $WIKI_URL = 'https://en.wikipedia.org/wiki';
 my $hl = eval {
@@ -35,14 +37,23 @@ sub get_text {
 		$code = 404;
 		$txt = "404 Not Found ($key)\n";
 	}
+	my $env = $ctx->{env};
 	if ($raw) {
-		$hdr->[3] = bytes::length($txt);
-		return [ $code, $hdr, [ $txt ] ]
+		my $body;
+		if (my $gzf = $code == 200 ? gzf_maybe($hdr, $env) : undef) {
+			my $zbuf = $gzf->translate($txt);
+			undef $txt;
+			$body = [ $zbuf .= $gzf->translate(undef) ];
+		} else {
+			$body = [ $txt ];
+		}
+		$hdr->[3] = bytes::length($body->[0]);
+		return [ $code, $hdr, $body ]
 	}
 
 	# enforce trailing slash for "wget -r" compatibility
 	if (!$have_tslash && $code == 200) {
-		my $url = $ctx->{-inbox}->base_url($ctx->{env});
+		my $url = $ctx->{-inbox}->base_url($env);
 		$url .= "_/text/$key/";
 
 		return [ 302, [ 'Content-Type', 'text/plain',
diff --git a/t/psgi_text.t b/t/psgi_text.t
index 833bcaba7..9867feaa4 100644
--- a/t/psgi_text.t
+++ b/t/psgi_text.t
@@ -10,7 +10,7 @@ my $maindir = "$tmpdir/main.git";
 my $addr = 'test-public@example.com';
 my $cfgpfx = "publicinbox.test";
 my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape Plack::Builder);
-require_mods(@mods);
+require_mods(@mods, 'IO::Uncompress::Gunzip');
 use_ok $_ foreach @mods;
 use PublicInbox::Import;
 use PublicInbox::Git;
@@ -26,17 +26,36 @@ my $www = PublicInbox::WWW->new($config);
 
 test_psgi(sub { $www->call(@_) }, sub {
 	my ($cb) = @_;
-	my $res;
-	$res = $cb->(GET('/test/_/text/help/'));
-	like($res->content, qr!<title>public-inbox help.*</title>!,
-		'default help');
-	$res = $cb->(GET('/test/_/text/config/raw'));
+	my $gunzipped;
+	my $req = GET('/test/_/text/help/');
+	my $res = $cb->($req);
+	my $content = $res->content;
+	like($content, qr!<title>public-inbox help.*</title>!, 'default help');
+	$req->header('Accept-Encoding' => 'gzip');
+	$res = $cb->($req);
+	is($res->header('Content-Encoding'), 'gzip', 'got gzip encoding');
+	is($res->header('Content-Type'), 'text/html; charset=UTF-8',
+		'got gzipped HTML');
+	IO::Uncompress::Gunzip::gunzip(\($res->content) => \$gunzipped);
+	is($gunzipped, $content, 'gzipped content is correct');
+
+	$req = GET('/test/_/text/config/raw');
+	$res = $cb->($req);
+	$content = $res->content;
+	my $olen = $res->header('Content-Length');
 	my $f = "$tmpdir/cfg";
 	open my $fh, '>', $f or die;
-	print $fh $res->content or die;
+	print $fh $content or die;
 	close $fh or die;
 	my $cfg = PublicInbox::Config->new($f);
 	is($cfg->{"$cfgpfx.address"}, $addr, 'got expected address in config');
+
+	$req->header('Accept-Encoding' => 'gzip');
+	$res = $cb->($req);
+	is($res->header('Content-Encoding'), 'gzip', 'got gzip encoding');
+	ok($res->header('Content-Length') < $olen, 'gzipped config is smaller');
+	IO::Uncompress::Gunzip::gunzip(\($res->content) => \$gunzipped);
+	is($gunzipped, $content, 'gzipped config content is correct');
 });
 
 done_testing();
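
Illustration only, not part of the patch: the raw branch above calls
->translate($txt) and then ->translate(undef), which appears to feed
data to the compressor and then finish the stream. The sketch below
shows that general deflate-then-flush pattern using Compress::Raw::Zlib
directly. It is an assumption about the shape of the technique, not the
actual PublicInbox::GzipFilter code, and the chunk contents are made up.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib qw(Z_OK Z_FINISH WANT_GZIP);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# WANT_GZIP selects an RFC 1952 (gzip) stream; AppendOutput makes each
# call append to the output buffer instead of overwriting it.
my ($d, $err) = Compress::Raw::Zlib::Deflate->new(
	-WindowBits => WANT_GZIP,
	-AppendOutput => 1,
);
die "deflate init failed: $err" unless $err == Z_OK;

my @chunks = ("first chunk\n", "second chunk\n");
my $zbuf = '';
for my $chunk (@chunks) {
	# may emit little or nothing until zlib has buffered enough input
	$err = $d->deflate($chunk, $zbuf);
	die "deflate failed: $err" unless $err == Z_OK;
}

# Z_FINISH writes any buffered data plus the gzip trailer
$err = $d->flush($zbuf, Z_FINISH);
die "flush failed: $err" unless $err == Z_OK;

# Verify the result is a complete gzip stream
gunzip(\$zbuf => \(my $round_trip)) or die "gunzip failed: $GunzipError";
print $round_trip;
```

Skipping the final flush would leave the stream truncated, which is why
the body is only usable after both steps have run.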

Thread overview: 44+ messages
2020-07-05 23:27 [PATCH 00/43] www: async git cat-file w/ -httpd Eric Wong
2020-07-05 23:27 ` [PATCH 01/43] gzipfilter: minor cleanups Eric Wong
2020-07-05 23:27 ` [PATCH 02/43] wwwstream: oneshot: perform gzip without middleware Eric Wong
2020-07-05 23:27 ` [PATCH 03/43] www*stream: gzip ->getline responses Eric Wong
2020-07-05 23:27 ` [PATCH 04/43] wwwtext: gzip text/plain responses, as well Eric Wong [this message]
2020-07-05 23:27 ` [PATCH 05/43] wwwtext: switch to html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 06/43] www: need: use WwwStream::html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 07/43] wwwlisting: use GzipFilter for HTML Eric Wong
2020-07-05 23:27 ` [PATCH 08/43] gzipfilter: replace Compress::Raw::Deflate usages Eric Wong
2020-07-05 23:27 ` [PATCH 09/43] {gzip,noop}filter: ->zmore returns undef, always Eric Wong
2020-07-05 23:27 ` [PATCH 10/43] mbox: remove html_oneshot import Eric Wong
2020-07-05 23:27 ` [PATCH 11/43] wwwstatic: support gzipped directory listings Eric Wong
2020-07-05 23:27 ` [PATCH 12/43] qspawn: learn to gzip streaming responses Eric Wong
2020-07-05 23:27 ` [PATCH 13/43] stop auto-loading Plack::Middleware::Deflater Eric Wong
2020-07-05 23:27 ` [PATCH 14/43] mboxgz: do asynchronous git blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 15/43] mboxgz: reduce hash depth Eric Wong
2020-07-05 23:27 ` [PATCH 16/43] mbox: async blob fetch for "single message" raw mboxrd Eric Wong
2020-07-05 23:27 ` [PATCH 17/43] wwwatomstream: simplify feed_update callers Eric Wong
2020-07-05 23:27 ` [PATCH 18/43] wwwatomstream: use PublicInbox::Inbox->modified for feed_updated Eric Wong
2020-07-05 23:27 ` [PATCH 19/43] wwwatomstream: reuse $ctx as $self Eric Wong
2020-07-05 23:27 ` [PATCH 20/43] xt/httpd-async-stream: allow more options Eric Wong
2020-07-05 23:27 ` [PATCH 21/43] wwwatomstream: support async blob fetch Eric Wong
2020-07-05 23:27 ` [PATCH 22/43] wwwstream: reduce object graph depth Eric Wong
2020-07-05 23:27 ` [PATCH 23/43] wwwstream: reduce blob fetch paths for ->getline Eric Wong
2020-07-05 23:27 ` [PATCH 24/43] www: start making gzipfilter the parent response class Eric Wong
2020-07-05 23:27 ` [PATCH 25/43] remove unused/redundant zlib-related imports Eric Wong
2020-07-05 23:27 ` [PATCH 26/43] wwwstream: use parent.pm and no warnings Eric Wong
2020-07-05 23:27 ` [PATCH 27/43] wwwstream: subclass off GzipFilter Eric Wong
2020-07-05 23:27 ` [PATCH 28/43] view: make /$INBOX/$MSGID/ permalink async Eric Wong
2020-07-05 23:27 ` [PATCH 29/43] view: /$INBOX/$MSGID/t/ reads blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 30/43] view: update /$INBOX/$MSGID/T/ to be async Eric Wong
2020-07-05 23:27 ` [PATCH 31/43] feed: generate_i: eliminate pointless loop Eric Wong
2020-07-05 23:27 ` [PATCH 32/43] feed: /$INBOX/new.html fetches blobs asynchronously Eric Wong
2020-07-05 23:27 ` [PATCH 33/43] ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs Eric Wong
2020-07-05 23:27 ` [PATCH 34/43] view: eml_entry: reduce parameters Eric Wong
2020-07-05 23:27 ` [PATCH 35/43] view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case Eric Wong
2020-07-05 23:27 ` [PATCH 36/43] wwwstream: eliminate ::response, use html_oneshot Eric Wong
2020-07-05 23:27 ` [PATCH 37/43] www: update internal docs Eric Wong
2020-07-05 23:27 ` [PATCH 38/43] view: simplify eml_entry callers further Eric Wong
2020-07-05 23:27 ` [PATCH 39/43] wwwtext: simplify gzf_maybe use Eric Wong
2020-07-05 23:27 ` [PATCH 40/43] wwwattach: support async blob retrievals Eric Wong
2020-07-05 23:27 ` [PATCH 41/43] gzipfilter: drop HTTP connection on bugs or data corruption Eric Wong
2020-07-05 23:27 ` [PATCH 42/43] daemon: warn on missing blobs Eric Wong
2020-07-05 23:27 ` [PATCH 43/43] gzipfilter: check http->{forward} for client disconnects Eric Wong
