user/dev discussion of public-inbox itself
 help / color / Atom feed
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [RFC 4/4] nntp: reduce memory overhead of zlib
Date: Fri,  5 Jul 2019 22:53:39 +0000
Message-ID: <20190705225339.5698-5-e@80x24.org> (raw)
In-Reply-To: <20190705225339.5698-1-e@80x24.org>

Using Z_FULL_FLUSH at the right places in our event loop, it
appears we can share a single zlib deflate context across ALL
clients in a process.

The zlib deflate context is the biggest factor in per-client
memory use, so being able to share that across many clients
results in a large memory savings.

With 10K idle-but-did-something NNTP clients connected to a
single process on a 64-bit system, TLS+DEFLATE used around
1.8 GB of RSS before this change.  It now uses around 300 MB.
TLS via IO::Socket::SSL alone uses <200MB in the same situation,
so the actual memory reduction is over 10x.

This makes compression less efficient and bandwidth increases
around 45% in informal testing, but it's far better than no
compression at all.  It's likely around the same level of
compression gzip gives on the HTTP side.

Security implications with TLS?  I don't know, but I don't
really care, either...  public-inbox-nntpd doesn't support
authentication and it's up to the client to enable compression.
It's not too different than Varnish caching gzipped responses
on the HTTP side and having responses go to multiple HTTPS
clients.
---
 lib/PublicInbox/DS.pm          |  4 ++-
 lib/PublicInbox/NNTP.pm        |  7 ++++
 lib/PublicInbox/NNTPdeflate.pm | 59 +++++++++++++++++++++-------------
 t/nntpd-validate.t             |  8 ++++-
 4 files changed, 54 insertions(+), 24 deletions(-)

diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 586c47cd..b16b1896 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -553,7 +553,9 @@ sub msg_more ($$) {
             return 0;
         }
     }
-    $self->write(\($_[1]));
+
+    # don't redispatch into NNTPdeflate::write
+    PublicInbox::DS::write($self, \($_[1]));
 }
 
 sub epwait ($$) {
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index d6f315ba..895858b7 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -642,6 +642,11 @@ sub long_response ($$) {
 		} elsif ($more) { # $self->{wbuf}:
 			update_idle_time($self);
 
+			# COMPRESS users all share the same DEFLATE context.
+			# Flush it here to ensure clients don't see
+			# each other's data
+			$self->zflush;
+
 			# no recursion, schedule another call ASAP
 			# but only after all pending writes are done
 			my $wbuf = $self->{wbuf} ||= [];
@@ -925,6 +930,8 @@ sub cmd_compress ($$) {
 	undef
 }
 
+sub zflush {} # overridden by NNTPdeflate
+
 sub cmd_xpath ($$) {
 	my ($self, $mid) = @_;
 	return r501 unless $mid =~ /\A<(.+)>\z/;
diff --git a/lib/PublicInbox/NNTPdeflate.pm b/lib/PublicInbox/NNTPdeflate.pm
index 66210bfa..78da2a58 100644
--- a/lib/PublicInbox/NNTPdeflate.pm
+++ b/lib/PublicInbox/NNTPdeflate.pm
@@ -2,13 +2,18 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 
 # RFC 8054 NNTP COMPRESS DEFLATE implementation
-# Warning, enabling compression for C10K NNTP clients is rather
-# expensive in terms of memory use.
 #
 # RSS usage for 10K idle-but-did-something NNTP clients on 64-bit:
-#   TLS + DEFLATE :  1.8 GB  (MemLevel=9, 1.2 GB with MemLevel=8)
-#   TLS only      :  <200MB
-#   plain         :   <50MB
+#   TLS + DEFLATE[a] :  1.8 GB  (MemLevel=9, 1.2 GB with MemLevel=8)
+#   TLS + DEFLATE[b] :  ~300MB
+#   TLS only         :  <200MB
+#   plain            :   <50MB
+#
+# [a] - initial implementation using per-client Deflate contexts and buffer
+#
+# [b] - memory-optimized implementation using a global deflate context.
+#       It's less efficient in terms of compression, but way more
+#       efficient in terms of server memory usage.
 package PublicInbox::NNTPdeflate;
 use strict;
 use warnings;
@@ -23,11 +28,11 @@ my %IN_OPT = (
 	-AppendOutput => 1,
 );
 
-my %OUT_OPT = (
+# global deflate context and buffer
+my $zbuf = \(my $buf = '');
+my $zout = Compress::Raw::Zlib::Deflate->new(
 	# nnrpd (INN) and Compress::Raw::Zlib favor MemLevel=9,
-	# but the zlib C library and git use MemLevel=8
-	# as the default.  Using 8 drops our memory use with 10K
-	# TLS clients from 1.8 GB to 1.2 GB, but...
+	# but the zlib C library and git use MemLevel=8 as the default.
 	# FIXME: sometimes clients fail with 8, so we use 9
 	# -MemLevel => 9,
 
@@ -43,7 +48,6 @@ sub enable {
 	unlock_hash(%$self);
 	bless $self, $class;
 	$self->{zin} = [ Compress::Raw::Zlib::Inflate->new(%IN_OPT), '' ];
-	$self->{zout} = [ Compress::Raw::Zlib::Deflate->new(%OUT_OPT), '' ];
 }
 
 # overrides PublicInbox::NNTP::compressed
@@ -74,31 +78,42 @@ sub do_read ($$$$) {
 # override PublicInbox::DS::msg_more
 sub msg_more ($$) {
 	my $self = $_[0];
-	my $zout = $self->{zout};
 
 	# $_[1] may be a reference or not for ->deflate
-	my $err = $zout->[0]->deflate($_[1], $zout->[1]);
+	my $err = $zout->deflate($_[1], $zbuf);
 	$err == Z_OK or die "->deflate failed $err";
 	1;
 }
 
-# SUPER is PublicInbox::DS::write, so $_[1] may be a reference or not
+sub zflush ($) {
+	my ($self) = @_;
+
+	my $deflated = $zbuf;
+	$zbuf = \(my $next = '');
+
+	my $err = $zout->flush($deflated, Z_FULL_FLUSH);
+	$err == Z_OK or die "->flush failed $err";
+
+	# We can still let the lower socket layer do buffering:
+	PublicInbox::DS::msg_more($self, $$deflated);
+}
+
+# compatible with PublicInbox::DS::write, so $_[1] may be a reference or not
 sub write ($$) {
 	my $self = $_[0];
-	return $self->SUPER::write($_[1]) if ref($_[1]) eq 'CODE';
-	my $zout = $self->{zout};
-	my $deflated = pop @$zout;
+	return PublicInbox::DS::write($self, $_[1]) if ref($_[1]) eq 'CODE';
+
+	my $deflated = $zbuf;
+	$zbuf = \(my $next = '');
 
 	# $_[1] may be a reference or not for ->deflate
-	my $err = $zout->[0]->deflate($_[1], $deflated);
+	my $err = $zout->deflate($_[1], $deflated);
 	$err == Z_OK or die "->deflate failed $err";
-	$err = $zout->[0]->flush($deflated, Z_PARTIAL_FLUSH);
+	$err = $zout->flush($deflated, Z_FULL_FLUSH);
 	$err == Z_OK or die "->flush failed $err";
 
-	# PublicInbox::DS::write puts partial writes into another buffer,
-	# so we can prepare the next deflate buffer:
-	$zout->[1] = '';
-	$self->SUPER::write(\$deflated);
+	# We can still let the socket layer do buffering:
+	PublicInbox::DS::write($self, $deflated);
 }
 
 1;
diff --git a/t/nntpd-validate.t b/t/nntpd-validate.t
index 1a325105..532ef729 100644
--- a/t/nntpd-validate.t
+++ b/t/nntpd-validate.t
@@ -112,11 +112,17 @@ sub do_get_all {
 			}
 		}
 	}
+
+	# hacky bytes_read thing added to Net::NNTP for testing:
+	my $bytes_read = '';
+	if ($nntp->can('bytes_read')) {
+		$bytes_read .= ' '.$nntp->bytes_read.'b';
+	}
 	my $q = $nntp->quit;
 	print STDERR "# quit failed: ".$nntp->code."\n" if !$q;
 	my $elapsed = sprintf('%0.3f', clock_gettime(CLOCK_MONOTONIC) - $t0);
 	my $res = $dig->hexdigest;
-	print STDERR "# $desc - $res (${elapsed}s)\n";
+	print STDERR "# $desc - $res (${elapsed}s)$bytes_read\n";
 	$res;
 }
 my @tests = ([]);
-- 
EW


  parent reply index

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-05 22:53 [PATCH 0/4] implement NNTP COMPRESS per RFC 8054 Eric Wong
2019-07-05 22:53 ` [PATCH 1/4] nntp: use msg_more as a method Eric Wong
2019-07-05 22:53 ` [PATCH 2/4] nntp: move LINE_MAX constant to the top Eric Wong
2019-07-05 22:53 ` [PATCH 3/4] nntp: support COMPRESS DEFLATE per RFC 8054 Eric Wong
2019-07-05 22:53 ` Eric Wong [this message]
2019-07-07  7:08 ` [PATCH 5/4] nntp: improve error reporting for COMPRESS Eric Wong

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190705225339.5698-5-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox