user/dev discussion of public-inbox itself
* [PATCH 0/4] http + mbox: tiny optimizations
@ 2016-06-25  0:45 Eric Wong
  2016-06-25  0:45 ` [PATCH 1/4] http: always yield on getline/body Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2016-06-25  0:45 UTC
  To: meta

For the gigantic $INBOX/all.mbox.gz response, this series seems to
improve throughput slightly, from roughly 290K/s to roughly 330K/s,
when fetching from a ~750MB aggressively-packed inbox.

Eric Wong (4):
      http: always yield on getline/body
      evcleanup: micro-optimize asap function
      mbox: reduce small packets for gzipped mboxes
      http: cork chunked responses for small savings

 lib/PublicInbox/EvCleanup.pm | 42 +++++++++++++++++++++++++++++++++---------
 lib/PublicInbox/HTTP.pm      | 14 ++++++--------
 lib/PublicInbox/Mbox.pm      | 23 ++++++++++-------------
 3 files changed, 49 insertions(+), 30 deletions(-)


* [PATCH 1/4] http: always yield on getline/body
  2016-06-25  0:45 [PATCH 0/4] http + mbox: tiny optimizations Eric Wong
@ 2016-06-25  0:45 ` Eric Wong
  2016-06-25  0:45 ` [PATCH 2/4] evcleanup: micro-optimize asap function Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2016-06-25  0:45 UTC
  To: meta

Yield to the event loop after every getline call to maximize
fairness: a large response, such as a download of the entire
mbox, should not monopolize the process.
---
 lib/PublicInbox/HTTP.pm | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index 800b240..c141fc8 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -16,7 +16,6 @@ use Fcntl qw(:seek);
 use Plack::HTTPParser qw(parse_http_request); # XS or pure Perl
 use HTTP::Status qw(status_message);
 use HTTP::Date qw(time2str);
-use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
 use Scalar::Util qw(weaken);
 use IO::File;
 use constant {
@@ -26,8 +25,6 @@ use constant {
 	CHUNK_MAX_HDR => 256,
 };
 
-sub now () { clock_gettime(CLOCK_MONOTONIC) }
-
 # FIXME: duplicated code with NNTP.pm, layering violation
 my $WEAKEN = {}; # string(inbox) -> inbox
 my $weakt;
@@ -270,17 +267,15 @@ sub getline_response {
 		my $forward = $self->{forward};
 		# limit our own running time for fairness with other
 		# clients and to avoid buffering too much:
-		my $end = now() + 0.1;
 		while ($forward && defined(my $buf = $forward->getline)) {
 			$write->($buf);
 			last if $self->{closed};
 			if ($self->{write_buf_size}) {
 				$self->write($self->{pull});
-				return;
-			} elsif (now() > $end) {
+			} else {
 				PublicInbox::EvCleanup::asap($self->{pull});
-				return;
 			}
+			return;
 		}
 		$self->{forward} = $self->{pull} = undef;
 		$forward->close if $forward; # avoid recursion
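
To illustrate the fairness argument, here is a minimal, self-contained
sketch of "one getline per turn" scheduling.  It does not use
Danga::Socket or any public-inbox code; @runq, @sent and make_pull are
hypothetical names standing in for the event loop's queue, the client
sockets and the pull callback.

```perl
use strict;
use warnings;

my @runq;   # hypothetical stand-in for the event loop's "run ASAP" queue
my @sent;   # records the order in which lines were "written"

# Build a puller that writes at most one line per invocation and then
# re-queues itself, mirroring the unconditional return in the loop above.
sub make_pull {
	my ($name, @lines) = @_;
	my $pull;
	$pull = sub {
		defined(my $buf = shift @lines) or return; # body exhausted
		push @sent, "$name:$buf";
		push @runq, $pull; # yield: let other clients run first
	};
	$pull;
}

push @runq, make_pull('A', 1 .. 3), make_pull('B', 1 .. 2);
while (my $cb = shift @runq) { $cb->() }
print join(' ', @sent), "\n"; # prints "A:1 B:1 A:2 B:2 A:3"
```

The two responses interleave line by line, so the longer one cannot
delay the shorter one by more than a single getline call.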


* [PATCH 2/4] evcleanup: micro-optimize asap function
  2016-06-25  0:45 [PATCH 0/4] http + mbox: tiny optimizations Eric Wong
  2016-06-25  0:45 ` [PATCH 1/4] http: always yield on getline/body Eric Wong
@ 2016-06-25  0:45 ` Eric Wong
  2016-06-25  0:45 ` [PATCH 3/4] mbox: reduce small packets for gzipped mboxes Eric Wong
  2016-06-25  0:45 ` [PATCH 4/4] http: cork chunked responses for small savings Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2016-06-25  0:45 UTC
  To: meta

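Instead of relying on a timer with an immediate (zero-delay)
callback, watch the write end of a pipe for writability.
Nothing is ever written to the pipe, so it is always writable
and the callback always fires once the watch is armed.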
---
 lib/PublicInbox/EvCleanup.pm | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/EvCleanup.pm b/lib/PublicInbox/EvCleanup.pm
index 5efb093..61837b8 100644
--- a/lib/PublicInbox/EvCleanup.pm
+++ b/lib/PublicInbox/EvCleanup.pm
@@ -5,32 +5,56 @@
 package PublicInbox::EvCleanup;
 use strict;
 use warnings;
+use base qw(Danga::Socket);
+use fields qw(rd);
+my $singleton;
+my $asapq = [ [], undef ];
+my $laterq = [ [], undef ];
 
-my $asapq = { queue => [], timer => undef };
-my $laterq = { queue => [], timer => undef };
+sub once_init () {
+	my $self = fields::new('PublicInbox::EvCleanup');
+	my ($r, $w);
+	pipe($r, $w) or die "pipe: $!";
+	$self->SUPER::new($w);
+	$self->{rd} = $r; # never read, since we never write..
+	$self;
+}
 
 sub _run_all ($) {
 	my ($q) = @_;
 
-	my $run = $q->{queue};
-	$q->{queue} = [];
-	$q->{timer} = undef;
+	my $run = $q->[0];
+	$q->[0] = [];
+	$q->[1] = undef;
 	$_->() foreach @$run;
 }
 
 sub _run_asap () { _run_all($asapq) }
 sub _run_later () { _run_all($laterq) }
 
+# Called by Danga::Socket
+sub event_write {
+	my ($self) = @_;
+	$self->watch_write(0);
+	_run_asap();
+}
+
+sub _asap_timer () {
+	$singleton ||= once_init();
+	$singleton->watch_write(1);
+	1;
+}
+
 sub asap ($) {
 	my ($cb) = @_;
-	push @{$asapq->{queue}}, $cb;
-	$asapq->{timer} ||= Danga::Socket->AddTimer(0, *_run_asap);
+	push @{$asapq->[0]}, $cb;
+	$asapq->[1] ||= _asap_timer();
 }
 
 sub later ($) {
 	my ($cb) = @_;
-	push @{$laterq->{queue}}, $cb;
-	$laterq->{timer} ||= Danga::Socket->AddTimer(60, *_run_later);
+	push @{$laterq->[0]}, $cb;
+	$laterq->[1] ||= Danga::Socket->AddTimer(60, *_run_later);
 }
 
 END {
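
The trick relies on the write end of an unused pipe always polling as
writable.  A minimal sketch of that property, using the core
IO::Select module instead of Danga::Socket (an assumption made only to
keep the example self-contained):

```perl
use strict;
use warnings;
use IO::Select;

# Keep the read end open but never read from it, as in once_init above.
pipe(my $r, my $w) or die "pipe: $!";

# Nothing has been written, so the pipe buffer is empty and the write
# end polls as writable right away, even with a zero timeout.
my @ready = IO::Select->new($w)->can_write(0);
print scalar(@ready) ? "writable\n" : "not writable\n";
```

An event loop that watches $w for writability will therefore invoke
its write callback without waiting, which is what makes the pipe
usable as an "as soon as possible" trigger.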


* [PATCH 3/4] mbox: reduce small packets for gzipped mboxes
  2016-06-25  0:45 [PATCH 0/4] http + mbox: tiny optimizations Eric Wong
  2016-06-25  0:45 ` [PATCH 1/4] http: always yield on getline/body Eric Wong
  2016-06-25  0:45 ` [PATCH 2/4] evcleanup: micro-optimize asap function Eric Wong
@ 2016-06-25  0:45 ` Eric Wong
  2016-06-25  0:45 ` [PATCH 4/4] http: cork chunked responses for small savings Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2016-06-25  0:45 UTC
  To: meta

We want to avoid sending 10- or 20-byte gzip headers as
separate TCP packets, both to reduce syscalls and to avoid
wasting bandwidth.
---
 lib/PublicInbox/Mbox.pm | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 63ec605..1c97f95 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -110,7 +110,7 @@ use warnings;
 
 sub new {
 	my ($class, $ctx, $cb) = @_;
-	my $buf;
+	my $buf = '';
 	bless {
 		buf => \$buf,
 		gz => IO::Compress::Gzip->new(\$buf, Time => 0),
@@ -121,19 +121,11 @@ sub new {
 	}, $class;
 }
 
-sub _flush_buf {
-	my ($self) = @_;
-	my $ret = $self->{buf};
-	$ret = $$ret;
-	${$self->{buf}} = undef;
-	$ret;
-}
-
 # called by Plack::Util::foreach or similar
 sub getline {
 	my ($self) = @_;
+	my $ctx = $self->{ctx} or return;
 	my $res;
-	my $ctx = $self->{ctx};
 	my $ibx = $ctx->{-inbox};
 	my $gz = $self->{gz};
 	do {
@@ -141,8 +133,12 @@ sub getline {
 			my $msg = eval { $ibx->msg_by_mid($smsg->mid) } or next;
 			$msg = Email::Simple->new($msg);
 			$gz->write(PublicInbox::Mbox::msg_str($ctx, $msg));
-			my $ret = _flush_buf($self);
-			return $ret if $ret;
+			my $bref = $self->{buf};
+			if (length($$bref) >= 8192) {
+				my $ret = $$bref; # copy :<
+				${$self->{buf}} = '';
+				return $ret;
+			}
 		}
 		$res = $self->{cb}->($self->{opts});
 		$self->{msgs} = $res->{msgs};
@@ -150,7 +146,8 @@ sub getline {
 		$self->{opts}->{offset} += $res;
 	} while ($res);
 	$gz->close;
-	_flush_buf($self);
+	delete $self->{ctx};
+	${delete $self->{buf}};
 }
 
 sub close {} # noop
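
The buffering idea can be sketched on its own, outside of
PublicInbox::Mbox.  The example below is an illustration only: the
message text and the loop count are made up, and only the 8192-byte
threshold is taken from the patch.  It checks that handing out the
gzip buffer in large pieces still yields a valid stream.

```perl
use strict;
use warnings;
use IO::Compress::Gzip;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $buf = '';
my $gz = IO::Compress::Gzip->new(\$buf, Time => 0) or die "gzip failed";
my ($plain, @chunks) = ('');

for my $i (1 .. 2000) {
	# made-up message body, only here to feed the compressor
	my $msg = "From mboxrd\@z Thu Jan  1 00:00:00 1970\nmessage $i\n\n";
	$plain .= $msg;
	$gz->write($msg);
	next if length($buf) < 8192; # too small to be worth a packet
	push @chunks, $buf;
	$buf = '';
}
$gz->close; # flushes remaining deflate data and the gzip trailer
push @chunks, $buf;

my $joined = join('', @chunks);
gunzip(\$joined => \my $out) or die "gunzip: $GunzipError";
print $out eq $plain ? "round-trip ok\n" : "mismatch\n";
```

Every chunk except possibly the last is at least 8192 bytes, so the
small gzip header is never handed to the caller on its own.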


* [PATCH 4/4] http: cork chunked responses for small savings
  2016-06-25  0:45 [PATCH 0/4] http + mbox: tiny optimizations Eric Wong
                   ` (2 preceding siblings ...)
  2016-06-25  0:45 ` [PATCH 3/4] mbox: reduce small packets for gzipped mboxes Eric Wong
@ 2016-06-25  0:45 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2016-06-25  0:45 UTC
  To: meta

This only affects Linux users with MSG_MORE support.

We can avoid extra TCP overhead for sub-optimal chunk sizes
by using MSG_MORE even with chunk trailers under Linux.

This breaks real-time apps which require <= 200ms latency for
streaming small packets (e.g. implementing "tail -F"), but the
public-inbox WWW code does not (and will never) do such things.
---
 lib/PublicInbox/HTTP.pm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index c141fc8..e19c592 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -223,7 +223,10 @@ sub chunked_wcb ($) {
 		return if $_[0] eq '';
 		more($self, sprintf("%x\r\n", bytes::length($_[0])));
 		more($self, $_[0]);
-		$self->write("\r\n");
+
+		# use $self->write("\r\n") if you care about real-time
+		# streaming responses, public-inbox WWW does not.
+		more($self, "\r\n");
 	}
 }
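
For reference, each call to the chunked write callback puts three
pieces on the wire: the size in hexadecimal, the payload, and a
trailing CRLF.  The sketch below only builds those bytes in memory;
chunk_frame is a hypothetical helper, not part of PublicInbox::HTTP,
and no socket or MSG_MORE call is involved.

```perl
use strict;
use warnings;
use bytes ();

# Frame one body fragment as an HTTP/1.1 chunk: hexadecimal size,
# CRLF, payload, CRLF.
sub chunk_frame {
	my ($data) = @_;
	sprintf("%x\r\n", bytes::length($data)) . $data . "\r\n";
}

my $frame = chunk_frame('hello world');
(my $visible = $frame) =~ s/\r\n/<CRLF>/g; # make the CRLFs printable
print "$visible\n"; # prints "b<CRLF>hello world<CRLF>"
```

Sending the final CRLF with MSG_MORE as well lets the kernel coalesce
it with the size line of the next chunk instead of emitting a
two-byte packet.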
 



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
