user/dev discussion of public-inbox itself
* [PATCH 0/3] start tidying up gzip-related code
@ 2019-11-16  2:34 Eric Wong
  2019-11-16  2:34 ` [PATCH 1/3] mbox: unused mid_clean import Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-16  2:34 UTC
  To: meta

Starting with the mbox.gz stuff, first.  Getting rid of
Plack::Middleware::Deflater is a long-term goal since we can
take advantage of doing gzip during HTML/XML rendering to
reduce memory usage.

Eric Wong (3):
  mbox: unused mid_clean import
  mbox: split mboxgz out into a separate file
  mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip

 MANIFEST                  |  1 +
 lib/PublicInbox/Mbox.pm   | 68 +++----------------------------------
 lib/PublicInbox/MboxGz.pm | 71 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+), 64 deletions(-)
 create mode 100644 lib/PublicInbox/MboxGz.pm



* [PATCH 1/3] mbox: unused mid_clean import
  2019-11-16  2:34 [PATCH 0/3] start tidying up gzip-related code Eric Wong
@ 2019-11-16  2:34 ` Eric Wong
  2019-11-16  2:34 ` [PATCH 2/3] mbox: split mboxgz out into a separate file Eric Wong
  2019-11-16  2:34 ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-16  2:34 UTC
  To: meta

We're gradually phasing mid_clean out (in favor of mids()).
---
 lib/PublicInbox/Mbox.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 67b671f5..9e808c09 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -10,7 +10,7 @@
 package PublicInbox::Mbox;
 use strict;
 use warnings;
-use PublicInbox::MID qw/mid_clean mid_escape/;
+use PublicInbox::MID qw/mid_escape/;
 use PublicInbox::Hval qw/to_filename/;
 use Email::Simple;
 use Email::MIME::Encode;


* [PATCH 2/3] mbox: split mboxgz out into a separate file
  2019-11-16  2:34 [PATCH 0/3] start tidying up gzip-related code Eric Wong
  2019-11-16  2:34 ` [PATCH 1/3] mbox: unused mid_clean import Eric Wong
@ 2019-11-16  2:34 ` Eric Wong
  2019-11-16  2:34 ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-16  2:34 UTC
  To: meta

It'll make using Compress::Raw::Zlib easier, since we
can use that and import constants more easily.
---
 MANIFEST                  |  1 +
 lib/PublicInbox/Mbox.pm   | 64 ++-------------------------------------
 lib/PublicInbox/MboxGz.pm | 64 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 62 deletions(-)
 create mode 100644 lib/PublicInbox/MboxGz.pm

diff --git a/MANIFEST b/MANIFEST
index ef8538b4..689d3d4e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -119,6 +119,7 @@ lib/PublicInbox/MDA.pm
 lib/PublicInbox/MID.pm
 lib/PublicInbox/MIME.pm
 lib/PublicInbox/Mbox.pm
+lib/PublicInbox/MboxGz.pm
 lib/PublicInbox/MsgIter.pm
 lib/PublicInbox/MsgTime.pm
 lib/PublicInbox/Msgmap.pm
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 9e808c09..42ed8c5d 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -136,7 +136,7 @@ sub msg_body ($) {
 
 sub thread_mbox {
 	my ($ctx, $over, $sfx) = @_;
-	eval { require IO::Compress::Gzip };
+	eval { require PublicInbox::MboxGz };
 	return sub { need_gzip(@_) } if $@;
 	my $mid = $ctx->{mid};
 	my $msgs = $over->get_thread($mid, {});
@@ -196,7 +196,7 @@ sub mbox_all_ids {
 sub mbox_all {
 	my ($ctx, $query) = @_;
 
-	eval { require IO::Compress::Gzip };
+	eval { require PublicInbox::MboxGz };
 	return sub { need_gzip(@_) } if $@;
 	return mbox_all_ids($ctx) if $query eq '';
 	my $opts = { mset => 2 };
@@ -239,63 +239,3 @@ EOF
 }
 
 1;
-
-package PublicInbox::MboxGz;
-use strict;
-use warnings;
-use PublicInbox::Hval qw/to_filename/;
-
-sub new {
-	my ($class, $ctx, $cb) = @_;
-	my $buf = '';
-	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	bless {
-		buf => \$buf,
-		gz => IO::Compress::Gzip->new(\$buf, Time => 0),
-		cb => $cb,
-		ctx => $ctx,
-	}, $class;
-}
-
-sub response {
-	my ($class, $ctx, $cb, $fn) = @_;
-	my $body = $class->new($ctx, $cb);
-	# http://www.iana.org/assignments/media-types/application/gzip
-	my @h = qw(Content-Type application/gzip);
-	if ($fn) {
-		$fn = to_filename($fn);
-		push @h, 'Content-Disposition', "inline; filename=$fn.mbox.gz";
-	}
-	[ 200, \@h, $body ];
-}
-
-# called by Plack::Util::foreach or similar
-sub getline {
-	my ($self) = @_;
-	my $ctx = $self->{ctx} or return;
-	my $gz = $self->{gz};
-	while (my $smsg = $self->{cb}->()) {
-		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
-		my $h = Email::Simple->new($mref)->header_obj;
-		$gz->write(PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}));
-		$gz->write(PublicInbox::Mbox::msg_body($$mref));
-
-		my $bref = $self->{buf};
-		if (length($$bref) >= 8192) {
-			my $ret = $$bref; # copy :<
-			${$self->{buf}} = '';
-			return $ret;
-		}
-
-		# be fair to other clients on public-inbox-httpd:
-		return '';
-	}
-	delete($self->{gz})->close;
-	# signal that we're done and can return undef next call:
-	delete $self->{ctx};
-	${delete $self->{buf}};
-}
-
-sub close {} # noop
-
-1;
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
new file mode 100644
index 00000000..2919ad6a
--- /dev/null
+++ b/lib/PublicInbox/MboxGz.pm
@@ -0,0 +1,64 @@
+# Copyright (C) 2015-2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+package PublicInbox::MboxGz;
+use strict;
+use warnings;
+use Email::Simple;
+use PublicInbox::Hval qw/to_filename/;
+use PublicInbox::Mbox;
+use IO::Compress::Gzip;
+
+sub new {
+	my ($class, $ctx, $cb) = @_;
+	my $buf = '';
+	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
+	bless {
+		buf => \$buf,
+		gz => IO::Compress::Gzip->new(\$buf, Time => 0),
+		cb => $cb,
+		ctx => $ctx,
+	}, $class;
+}
+
+sub response {
+	my ($class, $ctx, $cb, $fn) = @_;
+	my $body = $class->new($ctx, $cb);
+	# http://www.iana.org/assignments/media-types/application/gzip
+	my @h = qw(Content-Type application/gzip);
+	if ($fn) {
+		$fn = to_filename($fn);
+		push @h, 'Content-Disposition', "inline; filename=$fn.mbox.gz";
+	}
+	[ 200, \@h, $body ];
+}
+
+# called by Plack::Util::foreach or similar
+sub getline {
+	my ($self) = @_;
+	my $ctx = $self->{ctx} or return;
+	my $gz = $self->{gz};
+	while (my $smsg = $self->{cb}->()) {
+		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
+		my $h = Email::Simple->new($mref)->header_obj;
+		$gz->write(PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}));
+		$gz->write(PublicInbox::Mbox::msg_body($$mref));
+
+		my $bref = $self->{buf};
+		if (length($$bref) >= 8192) {
+			my $ret = $$bref; # copy :<
+			${$self->{buf}} = '';
+			return $ret;
+		}
+
+		# be fair to other clients on public-inbox-httpd:
+		return '';
+	}
+	delete($self->{gz})->close;
+	# signal that we're done and can return undef next call:
+	delete $self->{ctx};
+	${delete $self->{buf}};
+}
+
+sub close {} # noop
+
+1;


* [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
  2019-11-16  2:34 [PATCH 0/3] start tidying up gzip-related code Eric Wong
  2019-11-16  2:34 ` [PATCH 1/3] mbox: unused mid_clean import Eric Wong
  2019-11-16  2:34 ` [PATCH 2/3] mbox: split mboxgz out into a separate file Eric Wong
@ 2019-11-16  2:34 ` Eric Wong
  2019-11-19 13:57   ` SZEDER Gábor
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2019-11-16  2:34 UTC
  To: meta

IO::Compress::Gzip is a wrapper around Compress::Raw::Zlib,
anyways, and being able to easily detach buffers to return them
via ->getline is nice.  This results in a 1-2% performance
improvement when fetching giant mboxes.
---
 lib/PublicInbox/Mbox.pm   |  2 +-
 lib/PublicInbox/MboxGz.pm | 41 +++++++++++++++++++++++----------------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 42ed8c5d..42cedd15 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -231,7 +231,7 @@ sub need_gzip {
 	my $title = 'gzipped mbox not available';
 	$fh->write(<<EOF);
 <html><head><title>$title</title><body><pre>$title
-The administrator needs to install the IO::Compress::Gzip Perl module
+The administrator needs to install the Compress::Raw::Zlib Perl module
 to support gzipped mboxes.
 <a href="../">Return to index</a></pre></body></html>
 EOF
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index 2919ad6a..2a55447f 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -7,17 +7,15 @@ use Email::Simple;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Mbox;
 use IO::Compress::Gzip;
+use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
-	my $buf = '';
 	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	bless {
-		buf => \$buf,
-		gz => IO::Compress::Gzip->new(\$buf, Time => 0),
-		cb => $cb,
-		ctx => $ctx,
-	}, $class;
+	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
+	$err == Z_OK or die "Deflate->new failed: $err";
+	bless { gz => $gz, cb => $cb, ctx => $ctx }, $class;
 }
 
 sub response {
@@ -32,31 +30,40 @@ sub response {
 	[ 200, \@h, $body ];
 }
 
+sub gzip_fail ($$) {
+	my ($ctx, $err) = @_;
+	$ctx->{env}->{'psgi.errors'}->print("deflate failed: $err\n");
+	'';
+}
+
 # called by Plack::Util::foreach or similar
 sub getline {
 	my ($self) = @_;
 	my $ctx = $self->{ctx} or return;
 	my $gz = $self->{gz};
+	my $buf = delete($self->{buf});
 	while (my $smsg = $self->{cb}->()) {
 		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
 		my $h = Email::Simple->new($mref)->header_obj;
-		$gz->write(PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}));
-		$gz->write(PublicInbox::Mbox::msg_body($$mref));
 
-		my $bref = $self->{buf};
-		if (length($$bref) >= 8192) {
-			my $ret = $$bref; # copy :<
-			${$self->{buf}} = '';
-			return $ret;
-		}
+		my $err = $gz->deflate(
+			PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}),
+		        $buf);
+		return gzip_fail($ctx, $err) if $err != Z_OK;
+
+		$err = $gz->deflate(PublicInbox::Mbox::msg_body($$mref), $buf);
+		return gzip_fail($ctx, $err) if $err != Z_OK;
+
+		return $buf if length($buf) >= 8192;
 
 		# be fair to other clients on public-inbox-httpd:
+		$self->{buf} = $buf;
 		return '';
 	}
-	delete($self->{gz})->close;
 	# signal that we're done and can return undef next call:
 	delete $self->{ctx};
-	${delete $self->{buf}};
+	my $err = $gz->flush($buf, Z_FINISH);
+	$err == Z_OK ? $buf : gzip_fail($ctx, $err);
 }
 
 sub close {} # noop
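
The buffer handling in the patch above can be tried outside of
public-inbox with a short standalone sketch.  It is only an
illustration under stated assumptions: gz_chunker() is a
hypothetical helper that is not part of public-inbox, and unlike
->getline it keeps looping until a threshold is reached instead of
returning '' after every message.  The Compress::Raw::Zlib calls
and options are the same ones the patch uses.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib qw(Z_FINISH Z_OK);

# -WindowBits => 15 + 16 asks zlib for a gzip header and trailer;
# -AppendOutput => 1 makes deflate() and flush() append to the
# output scalar instead of overwriting it.
my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);

# Hypothetical helper: takes an array ref of input strings and a
# minimum output size, returns a closure.  Each call to the closure
# yields the next piece of gzip output, and undef once finished.
sub gz_chunker {
	my ($chunks, $min) = @_;
	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
	$err == Z_OK or die "Deflate->new failed: $err";
	return sub {
		return unless $gz; # stream already finished
		my $buf = ''; # fresh buffer, handed to the caller as-is
		while (defined(my $in = shift @$chunks)) {
			$err = $gz->deflate($in, $buf);
			$err == Z_OK or die "deflate failed: $err";
			return $buf if length($buf) >= $min;
		}
		# input exhausted: emit remaining data and the gzip trailer
		$err = $gz->flush($buf, Z_FINISH);
		$err == Z_OK or die "flush failed: $err";
		undef $gz;
		$buf;
	};
}
```

A caller would concatenate (or write out) every defined value the
closure returns; the result is a single gzip stream that gunzip(1)
or a Compress::Raw::Zlib::Inflate object created with
-WindowBits => 15 + 16 can decode.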


* Re: [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
  2019-11-16  2:34 ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
@ 2019-11-19 13:57   ` SZEDER Gábor
  2019-11-19 20:12     ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: SZEDER Gábor @ 2019-11-19 13:57 UTC
  To: Eric Wong; +Cc: meta

Hi,

On Sat, Nov 16, 2019 at 02:34:39AM +0000, Eric Wong wrote:
> IO::Compress::Gzip is a wrapper around Compress::Raw::Zlib,
> anyways, and being able to easily detach buffers to return them
> via ->getline is nice.  This results in a 1-2% performance
> improvement when fetching giant mboxes.

I've just stumbled upon an issue that I suspect to be related to this
patch series (or maybe just a strange coincidence...).

When trying to download a mbox.gz with 'wget' I get a "501 Not
Implemented", e.g.:

  $ wget https://public-inbox.org/meta/20191116023439.32410-1-e@80x24.org/t.mbox.gz
  --2019-11-19 14:53:37--  https://public-inbox.org/meta/20191116023439.32410-1-e@80x24.org/t.mbox.gz
  Resolving public-inbox.org (public-inbox.org)... 64.71.152.64, 2600:3c01::f03c:91ff:fe96:f5d6
  Connecting to public-inbox.org (public-inbox.org)|64.71.152.64|:443... connected.
  HTTP request sent, awaiting response... 501 Not Implemented
  2019-11-19 14:53:38 ERROR 501: Not Implemented.

When I try to do that with Firefox, I get:

  gzipped mbox not available
  The administrator needs to install the Compress::Raw::Zlib Perl module
  to support gzipped mboxes.
  Return to index


Thanks,
Gábor



* Re: [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
  2019-11-19 13:57   ` SZEDER Gábor
@ 2019-11-19 20:12     ` Eric Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2019-11-19 20:12 UTC
  To: SZEDER Gábor; +Cc: meta

SZEDER Gábor <szeder.dev@gmail.com> wrote:
> I've just stumbled upon an issue that I suspect to be related to this
> patch series (or maybe just a strange coincidence...).
> 
> When trying to download a mbox.gz with 'wget' I get a "501 Not
> Implemented", e.g.:

Thanks, fixed now.  It's a bug in the build/install since
PublicInbox/MboxGz.pm was not installed (being a new file).

I made commit 4c20de0694d06ff3a5f963d7f51d509319060b50
("Makefile.PL: add dependency on MANIFEST contents") to
avoid that bug, but apparently it wasn't enough...
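
For context on the symptom: thread_mbox and mbox_all wrap
"require PublicInbox::MboxGz" in an eval and fall back to
need_gzip() when it fails, so a module file missing from the
install produces the same "gzipped mbox not available" page as a
missing Compress::Raw::Zlib.  One way to catch that class of
problem after an install is sketched below.  This is only an
illustration: missing_modules() is a hypothetical helper, not part
of public-inbox, and it is unrelated to how the Makefile.PL commit
mentioned above works.

```perl
use strict;
use warnings;

# Hypothetical helper: given MANIFEST lines and a list of include
# directories, return the "lib/*.pm" entries (relative to lib/)
# which cannot be found in any of those directories.
sub missing_modules {
	my ($manifest_lines, $inc_dirs) = @_;
	my @missing;
	for my $line (@$manifest_lines) {
		# a MANIFEST entry is a path, optionally followed by
		# whitespace and a comment
		my ($path) = split(' ', $line);
		next unless defined($path) && $path =~ m!\Alib/(.+\.pm)\z!;
		my $rel = $1; # e.g. PublicInbox/MboxGz.pm
		next if grep { -f "$_/$rel" } @$inc_dirs;
		push @missing, $rel;
	}
	@missing;
}
```

Feeding it the lines of MANIFEST and the directory the modules
were installed to (or \@INC) would list files such as
PublicInbox/MboxGz.pm which exist in the source tree but never
made it into the installation.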


