user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* Re: [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
  2019-11-19 13:57  7%   ` SZEDER Gábor
@ 2019-11-19 20:12  7%     ` Eric Wong
  0 siblings, 0 replies; 5+ results
From: Eric Wong @ 2019-11-19 20:12 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: meta

SZEDER Gábor <szeder.dev@gmail.com> wrote:
> I've just stumbled upon an issue that I suspect to be related to this
> patch series (or maybe just a strange coincidence...).
> 
> When trying to download a mbox.gz with 'wget' I get a "501 Not
> Implemented", e.g.:

Thanks, fixed now.  It's a bug in the build/install since
PublicInbox/MboxGz.pm was not installed (being a new file).

I made commit 4c20de0694d06ff3a5f963d7f51d509319060b50
("Makefile.PL: add dependency on MANIFEST contents") to
avoid that bug, but apparently it wasn't enough...

^ permalink raw reply	[relevance 7%]

* Re: [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
  2019-11-16  2:34  5% ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
@ 2019-11-19 13:57  7%   ` SZEDER Gábor
  2019-11-19 20:12  7%     ` Eric Wong
  0 siblings, 1 reply; 5+ results
From: SZEDER Gábor @ 2019-11-19 13:57 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Hi,

On Sat, Nov 16, 2019 at 02:34:39AM +0000, Eric Wong wrote:
> IO::Compress::Gzip is a wrapper around Compress::Raw::Zlib,
> anyways, and being able to easily detach buffers to return them
> via ->getline is nice.  This results in a 1-2% performance
> improvement when fetching giant mboxes.

I've just stumbled upon an issue that I suspect to be related to this
patch series (or maybe just a strange coincidence...).

When trying to download a mbox.gz with 'wget' I get a "501 Not
Implemented", e.g.:

  $ wget https://public-inbox.org/meta/20191116023439.32410-1-e@80x24.org/t.mbox.gz
  --2019-11-19 14:53:37--  https://public-inbox.org/meta/20191116023439.32410-1-e@80x24.org/t.mbox.gz
  Resolving public-inbox.org (public-inbox.org)... 64.71.152.64, 2600:3c01::f03c:91ff:fe96:f5d6
  Connecting to public-inbox.org (public-inbox.org)|64.71.152.64|:443... connected.
  HTTP request sent, awaiting response... 501 Not Implemented
  2019-11-19 14:53:38 ERROR 501: Not Implemented.

When I try to do that with Firefox, I get:

  gzipped mbox not available
  The administrator needs to install the Compress::Raw::Zlib Perl module
  to support gzipped mboxes.
  Return to index


Thanks,
Gábor


^ permalink raw reply	[relevance 7%]

* Re: message threading in the web UI
  @ 2019-11-16 14:40  5%   ` Ralph Siemsen
  0 siblings, 0 replies; 5+ results
From: Ralph Siemsen @ 2019-11-16 14:40 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Fri, Nov 15, 2019 at 11:52:35PM +0000, Eric Wong wrote:
>Ralph Siemsen <ralph.siemsen@linaro.org> wrote:
>> The way that threads are displayed seems to differ between the "frontpage"
>> (eg. https://public-inbox.org/meta/) versus how they are shown in the
>> "Thread Overview" (at the bottom of the page when viewing a specific
>> thread).
>>
>> On the frontpage, it seems only a subset of messages are displayed (maybe
>> only the direct replies?). The only indication that there are additional
>> hidden messages is the "+" sign, e.g. "(5+ messages)".
>
>Yes, the frontpage only grabs the 200 most recent messages
>(a "time window").

Ah, I see. Makes sense from a programmers point of view of course. 
However as an end user, based on what is shown in the UI, it looks like 
each thread is complete -- the number of messages in each thread varies 
(not artificially capped at say 5 messages per thread). When the thread 
consists of a patch series, the whole series seems to be displayed 
(likely because they are all posted together at the same time).

It is not very obvious that some additional messages (such as replies to 
the individual patches in a series) have been hidden. I only discovered 
it when I used my browser "Find in Page", and it did not find all the 
occurrences (but using "search" box on PI did find them).

>Getting the full count ("M") can be expensive since it needs an
>SQLite COUNT query for every thread...

Understood.

>Perhaps as an interim solution is to change the "+" for
>something more obvious:
>
>"5 or more messages"
>"5 messages in current time window"
>">= 5 messages"
>"at least 5 messages"

While those are a bit more obvious than just a "+" sign, to me at least, 
they are all ambiguous. The number itself is not really helping, since 
it is just a count of the number of lines that are displayed immediately 
below. In other words, the count does not add any information.

So perhaps just remove it entirely? For example:

[PATCH 0/3] start tidying up gzip-related code
 2019-11-16  2:34 UTC
` [PATCH 1/3] mbox: unused mid_clean import
` [PATCH 2/3] mbox: split mboxgz out into a separate file
` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
` <additional replies>

Note the extra line at the end, containing a link to the full thread. 
This essentially replaces the "+" with a more obvious indication that
messages have been omitted from the summary.

>I'm not sure how feasable that is since frontpage performance is
>critical and I spent a fair amount of time making it perform
>acceptably for giant inboxes like LKML.
>
>(Optionally) supporting Cache::FastMmap or other caching
>mechanisms may help, but that's still extra install overhead and
>initial load performance still needs to be taken into account.

Caching the whole page, esp for LKML and the like, certainly makes 
sense, but that is a bigger task for sure.

Thanks for your reply!
-Ralph


^ permalink raw reply	[relevance 5%]

* [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip
  2019-11-16  2:34  5% [PATCH 0/3] start tidying up gzip-related code Eric Wong
@ 2019-11-16  2:34  5% ` Eric Wong
  2019-11-19 13:57  7%   ` SZEDER Gábor
  0 siblings, 1 reply; 5+ results
From: Eric Wong @ 2019-11-16  2:34 UTC (permalink / raw)
  To: meta

IO::Compress::Gzip is a wrapper around Compress::Raw::Zlib,
anyways, and being able to easily detach buffers to return them
via ->getline is nice.  This results in a 1-2% performance
improvement when fetching giant mboxes.
---
 lib/PublicInbox/Mbox.pm   |  2 +-
 lib/PublicInbox/MboxGz.pm | 41 +++++++++++++++++++++++----------------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 42ed8c5d..42cedd15 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -231,7 +231,7 @@ sub need_gzip {
 	my $title = 'gzipped mbox not available';
 	$fh->write(<<EOF);
 <html><head><title>$title</title><body><pre>$title
-The administrator needs to install the IO::Compress::Gzip Perl module
+The administrator needs to install the Compress::Raw::Zlib Perl module
 to support gzipped mboxes.
 <a href="../">Return to index</a></pre></body></html>
 EOF
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index 2919ad6a..2a55447f 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -7,17 +7,15 @@ use Email::Simple;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Mbox;
 use IO::Compress::Gzip;
+use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
+my %OPT = (-WindowBits => 15 + 16, -AppendOutput => 1);
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
-	my $buf = '';
 	$ctx->{base_url} = $ctx->{-inbox}->base_url($ctx->{env});
-	bless {
-		buf => \$buf,
-		gz => IO::Compress::Gzip->new(\$buf, Time => 0),
-		cb => $cb,
-		ctx => $ctx,
-	}, $class;
+	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(%OPT);
+	$err == Z_OK or die "Deflate->new failed: $err";
+	bless { gz => $gz, cb => $cb, ctx => $ctx }, $class;
 }
 
 sub response {
@@ -32,31 +30,40 @@ sub response {
 	[ 200, \@h, $body ];
 }
 
+sub gzip_fail ($$) {
+	my ($ctx, $err) = @_;
+	$ctx->{env}->{'psgi.errors'}->print("deflate failed: $err\n");
+	'';
+}
+
 # called by Plack::Util::foreach or similar
 sub getline {
 	my ($self) = @_;
 	my $ctx = $self->{ctx} or return;
 	my $gz = $self->{gz};
+	my $buf = delete($self->{buf});
 	while (my $smsg = $self->{cb}->()) {
 		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
 		my $h = Email::Simple->new($mref)->header_obj;
-		$gz->write(PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}));
-		$gz->write(PublicInbox::Mbox::msg_body($$mref));
 
-		my $bref = $self->{buf};
-		if (length($$bref) >= 8192) {
-			my $ret = $$bref; # copy :<
-			${$self->{buf}} = '';
-			return $ret;
-		}
+		my $err = $gz->deflate(
+			PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}),
+		        $buf);
+		return gzip_fail($ctx, $err) if $err != Z_OK;
+
+		$err = $gz->deflate(PublicInbox::Mbox::msg_body($$mref), $buf);
+		return gzip_fail($ctx, $err) if $err != Z_OK;
+
+		return $buf if length($buf) >= 8192;
 
 		# be fair to other clients on public-inbox-httpd:
+		$self->{buf} = $buf;
 		return '';
 	}
-	delete($self->{gz})->close;
 	# signal that we're done and can return undef next call:
 	delete $self->{ctx};
-	${delete $self->{buf}};
+	my $err = $gz->flush($buf, Z_FINISH);
+	$err == Z_OK ? $buf : gzip_fail($ctx, $err);
 }
 
 sub close {} # noop

^ permalink raw reply related	[relevance 5%]

* [PATCH 0/3] start tidying up gzip-related code
@ 2019-11-16  2:34  5% Eric Wong
  2019-11-16  2:34  5% ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
  0 siblings, 1 reply; 5+ results
From: Eric Wong @ 2019-11-16  2:34 UTC (permalink / raw)
  To: meta

Starting with the mbox.gz stuff, first.  Gettig rid of
Plack::Middleware::Deflater is a long-term goal since we can
take advantage of doing gzip during HTML/XML rendering to
reduce memory usage.

Eric Wong (3):
  mbox: unused mid_clean import
  mbox: split mboxgz out into a separate file
  mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip

 MANIFEST                  |  1 +
 lib/PublicInbox/Mbox.pm   | 68 +++----------------------------------
 lib/PublicInbox/MboxGz.pm | 71 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+), 64 deletions(-)
 create mode 100644 lib/PublicInbox/MboxGz.pm


^ permalink raw reply	[relevance 5%]

Results 1-5 of 5 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2019-11-15 21:28     message threading in the web UI Ralph Siemsen
2019-11-15 23:52     ` Eric Wong
2019-11-16 14:40  5%   ` Ralph Siemsen
2019-11-16  2:34  5% [PATCH 0/3] start tidying up gzip-related code Eric Wong
2019-11-16  2:34  5% ` [PATCH 3/3] mboxgz: use Compress::Raw::Zlib instead of IO::Compress::Gzip Eric Wong
2019-11-19 13:57  7%   ` SZEDER Gábor
2019-11-19 20:12  7%     ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).