user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* [PATCH 0/7] optimize V2
@ 2018-04-03 11:09 Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 1/7] t/thread-all.t: modernize test to support modern inboxes Eric Wong (Contractor, The Linux Foundation)
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

The switch to the SQLite overview DB introduced a regression in
the use of OFFSET.  While Xapian largely unaffected by the
offset for pagination, SQLite basically gets the full result and
cuts it at the offset, so using large offsets gets slower as
time goes on.

There are some good side-effects of this, too:

* Using last-hop-Received timestamps for HTML pagination ought
  to improve cacheability on the client-side and in search engines.

* Switching to using NNTP article numbers gives us race avoidance
  for NEWNEWS so duplicates won't show up while iterating
  through result batches (not that I've seen any logs this
  command being used in the wild...)

I don't believe there are any more regressions from switching
to the Xapian skeleton to a SQLite over(view) DB.  Homepage
performance seems consistent whether it's 2.8 million messages
or 2-3x that.

Now, NNTP performance should be improved across the board and
commands like XOVER/XHDR are over 20x faster than they were
in v1 with Xapian.

There's also a few cleanups and code simplifications
which should make future work easier.

Eric Wong (Contractor, The Linux Foundation) (7):
  t/thread-all.t: modernize test to support modern inboxes
  rename+rewrite test using Benchmark module
  nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster
  view: avoid offset during pagination
  mbox: remove remaining OFFSET usage in SQLite
  msgmap: replace id_batch with ids_after
  nntp: simplify the long_response API

 MANIFEST                  |   3 +-
 lib/PublicInbox/Feed.pm   |  25 +--------
 lib/PublicInbox/Inbox.pm  |   4 +-
 lib/PublicInbox/Mbox.pm   |  30 ++++++++---
 lib/PublicInbox/Msgmap.pm |  29 +++++++----
 lib/PublicInbox/NNTP.pm   | 114 ++++++++++++++++++----------------------
 lib/PublicInbox/Over.pm   |  50 ++++++++++++------
 lib/PublicInbox/Search.pm |   8 +--
 lib/PublicInbox/View.pm   |  90 +++++++++++++++++++++++---------
 t/nntpd.t                 |   1 +
 t/perf-nntpd.t            | 130 ++++++++++++++++++++++++++++++++++++++++++++++
 t/perf-threading.t        |  32 ++++++++++++
 t/psgi_v2.t               |  22 +++++++-
 t/thread-all.t            |  38 --------------
 t/v2writable.t            |  17 ++++++
 15 files changed, 403 insertions(+), 190 deletions(-)
 create mode 100644 t/perf-nntpd.t
 create mode 100644 t/perf-threading.t
 delete mode 100644 t/thread-all.t

-- 
EW


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/7] t/thread-all.t: modernize test to support modern inboxes
  2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
@ 2018-04-03 11:09 ` Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 2/7] rename+rewrite test using Benchmark module Eric Wong (Contractor, The Linux Foundation)
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

We'll be adding more tests in the same vein as this
to improve NNTP performance.
---
 t/thread-all.t | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/t/thread-all.t b/t/thread-all.t
index d4e8c1f..820fba8 100644
--- a/t/thread-all.t
+++ b/t/thread-all.t
@@ -6,32 +6,26 @@ use strict;
 use warnings;
 use Test::More;
 use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
+use PublicInbox::Inbox;
 my $pi_dir = $ENV{GIANT_PI_DIR};
 plan skip_all => "GIANT_PI_DIR not defined for $0" unless $pi_dir;
-eval { require PublicInbox::Search; };
-plan skip_all => "Xapian missing for $0" if $@;
-my $srch = eval { PublicInbox::Search->new($pi_dir) };
-plan skip_all => "$pi_dir not initialized for $0" if $@;
+my $ibx = PublicInbox::Inbox->new({ mainrepo => $pi_dir });
+my $srch = $ibx->search;
+plan skip_all => "$pi_dir not configured for search $0" unless $srch;
 
 require PublicInbox::View;
 require PublicInbox::SearchThread;
 
-my $pfx = PublicInbox::Search::xpfx('thread');
-my $opts = { limit => 1000000, asc => 1 };
 my $t0 = clock_gettime(CLOCK_MONOTONIC);
 my $elapsed;
-
-my $sres = $srch->_do_enquire(undef, $opts);
-$elapsed = clock_gettime(CLOCK_MONOTONIC) - $t0;
-diag "enquire: $elapsed";
-
-$t0 = clock_gettime(CLOCK_MONOTONIC);
-my $msgs = PublicInbox::View::load_results($srch, $sres);
+my $msgs = $srch->{over_ro}->recent({limit => 200000});
+my $n =	scalar(@$msgs);
+ok($n, 'got some messages');
 $elapsed = clock_gettime(CLOCK_MONOTONIC) - $t0;
-diag "load_results $elapsed";
+diag "enquire: $elapsed for $n";
 
 $t0 = clock_gettime(CLOCK_MONOTONIC);
-PublicInbox::View::thread_results($msgs);
+PublicInbox::View::thread_results({-inbox => $ibx}, $msgs);
 $elapsed = clock_gettime(CLOCK_MONOTONIC) - $t0;
 diag "thread_results $elapsed";
 
-- 
EW


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/7] rename+rewrite test using Benchmark module
  2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 1/7] t/thread-all.t: modernize test to support modern inboxes Eric Wong (Contractor, The Linux Foundation)
@ 2018-04-03 11:09 ` Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 3/7] nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster Eric Wong (Contractor, The Linux Foundation)
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

There'll be more performance-related tests in the future.
---
 MANIFEST           |  2 +-
 t/perf-threading.t | 32 ++++++++++++++++++++++++++++++++
 t/thread-all.t     | 32 --------------------------------
 3 files changed, 33 insertions(+), 33 deletions(-)
 create mode 100644 t/perf-threading.t
 delete mode 100644 t/thread-all.t

diff --git a/MANIFEST b/MANIFEST
index 5fd8acf..4e79a4c 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -173,6 +173,7 @@ t/msgmap.t
 t/nntp.t
 t/nntpd.t
 t/over.t
+t/perf-threading.t
 t/plack.t
 t/precheck.t
 t/psgi_attach.t
@@ -186,7 +187,6 @@ t/search-thr-index.t
 t/search.t
 t/spamcheck_spamc.t
 t/spawn.t
-t/thread-all.t
 t/thread-cycle.t
 t/time.t
 t/utf8.mbox
diff --git a/t/perf-threading.t b/t/perf-threading.t
new file mode 100644
index 0000000..15779c9
--- /dev/null
+++ b/t/perf-threading.t
@@ -0,0 +1,32 @@
+# Copyright (C) 2016-2018 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# real-world testing of search threading
+use strict;
+use warnings;
+use Test::More;
+use Benchmark qw(:all);
+use PublicInbox::Inbox;
+my $pi_dir = $ENV{GIANT_PI_DIR};
+plan skip_all => "GIANT_PI_DIR not defined for $0" unless $pi_dir;
+my $ibx = PublicInbox::Inbox->new({ mainrepo => $pi_dir });
+eval { require PublicInbox::Search };
+my $srch = $ibx->search;
+plan skip_all => "$pi_dir not configured for search $0 $@" unless $srch;
+
+require PublicInbox::View;
+
+my $msgs;
+my $elapsed = timeit(1, sub {
+	$msgs = $srch->{over_ro}->recent({limit => 200000});
+});
+my $n = scalar(@$msgs);
+ok($n, 'got some messages');
+diag "enquire: ".timestr($elapsed)." for $n";
+
+$elapsed = timeit(1, sub {
+	PublicInbox::View::thread_results({-inbox => $ibx}, $msgs);
+});
+diag "thread_results ".timestr($elapsed);
+
+done_testing();
diff --git a/t/thread-all.t b/t/thread-all.t
deleted file mode 100644
index 820fba8..0000000
--- a/t/thread-all.t
+++ /dev/null
@@ -1,32 +0,0 @@
-# Copyright (C) 2016-2018 all contributors <meta@public-inbox.org>
-# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-#
-# real-world testing of search threading
-use strict;
-use warnings;
-use Test::More;
-use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
-use PublicInbox::Inbox;
-my $pi_dir = $ENV{GIANT_PI_DIR};
-plan skip_all => "GIANT_PI_DIR not defined for $0" unless $pi_dir;
-my $ibx = PublicInbox::Inbox->new({ mainrepo => $pi_dir });
-my $srch = $ibx->search;
-plan skip_all => "$pi_dir not configured for search $0" unless $srch;
-
-require PublicInbox::View;
-require PublicInbox::SearchThread;
-
-my $t0 = clock_gettime(CLOCK_MONOTONIC);
-my $elapsed;
-my $msgs = $srch->{over_ro}->recent({limit => 200000});
-my $n =	scalar(@$msgs);
-ok($n, 'got some messages');
-$elapsed = clock_gettime(CLOCK_MONOTONIC) - $t0;
-diag "enquire: $elapsed for $n";
-
-$t0 = clock_gettime(CLOCK_MONOTONIC);
-PublicInbox::View::thread_results({-inbox => $ibx}, $msgs);
-$elapsed = clock_gettime(CLOCK_MONOTONIC) - $t0;
-diag "thread_results $elapsed";
-
-done_testing();
-- 
EW


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/7] nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster
  2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 1/7] t/thread-all.t: modernize test to support modern inboxes Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 2/7] rename+rewrite test using Benchmark module Eric Wong (Contractor, The Linux Foundation)
@ 2018-04-03 11:09 ` Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 4/7] view: avoid offset during pagination Eric Wong (Contractor, The Linux Foundation)
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

While SQLite is faster than Xapian for some queries we
use, it sucks at handling OFFSET.  Fortunately, we do
not need offsets when retrieving sorted results and
can bake it into the query.

For inbox.comp.version-control.git (v1 Xapian),
XOVER and XHDR are over 20x faster.
---
 MANIFEST                  |   1 +
 lib/PublicInbox/NNTP.pm   |  39 +++++++-------
 lib/PublicInbox/Over.pm   |  15 +++---
 lib/PublicInbox/Search.pm |   4 +-
 t/perf-nntpd.t            | 130 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 159 insertions(+), 30 deletions(-)
 create mode 100644 t/perf-nntpd.t

diff --git a/MANIFEST b/MANIFEST
index 4e79a4c..2dad988 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -173,6 +173,7 @@ t/msgmap.t
 t/nntp.t
 t/nntpd.t
 t/over.t
+t/perf-nntpd.t
 t/perf-threading.t
 t/plack.t
 t/precheck.t
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 48ab7fc..ff6d895 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -331,20 +331,20 @@ sub cmd_newnews ($$$$;$$) {
 	};
 	return '.' unless @srch;
 
-	my $opts = { limit => 1000, offset => 0 };
+	my $prev = 0;
 	long_response($self, 0, long_response_limit, sub {
 		my ($i) = @_;
 		my $srch = $srch[0];
-		my $msgs = $srch->query_ts($ts, $opts);
-		if (my $nr = scalar @$msgs) {
+		my $msgs = $srch->query_ts($ts, $prev);
+		if (scalar @$msgs) {
 			more($self, '<' .
 				join(">\r\n<", map { $_->mid } @$msgs ).
 				'>');
-			$opts->{offset} += $nr;
+			$prev = $msgs->[-1]->{num};
 		} else {
 			shift @srch;
 			if (@srch) { # continue onto next newsgroup
-				$opts->{offset} = 0;
+				$prev = 0;
 			} else { # break out of the long response.
 				$$i = long_response_limit;
 			}
@@ -582,7 +582,7 @@ sub long_response ($$$$) {
 	$self->{long_res} = sub {
 		# limit our own running time for fairness with other
 		# clients and to avoid buffering too much:
-		my $lim = 100;
+		my $lim = $end == long_response_limit ? 1 : 100;
 
 		my $err;
 		do {
@@ -710,20 +710,19 @@ sub hdr_searchmsg ($$$$) {
 		return $r unless ref $r;
 		my ($beg, $end) = @$r;
 		more($self, $xhdr ? r221 : r225);
-		my $off = 0;
-		long_response($self, $beg, $end, sub {
+		my $cur = $beg;
+		long_response($self, 0, long_response_limit, sub {
 			my ($i) = @_;
-			my $msgs = $srch->query_xover($beg, $end, $off);
-			my $nr = scalar @$msgs or return;
-			$off += $nr;
+			my $msgs = $srch->query_xover($cur, $end);
+			my $nr = scalar @$msgs or
+					return ($$i = long_response_limit);
 			my $tmp = '';
 			foreach my $s (@$msgs) {
 				$tmp .= $s->num . ' ' . $s->$field . "\r\n";
 			}
 			utf8::encode($tmp);
 			do_more($self, $tmp);
-			# -1 to adjust for implicit increment in long_response
-			$$i = $nr ? $$i + $nr - 1 : long_response_limit;
+			$cur = $msgs->[-1]->{num} + 1;
 		});
 	}
 }
@@ -848,20 +847,18 @@ sub cmd_xover ($;$) {
 	my ($beg, $end) = @$r;
 	more($self, "224 Overview information follows for $beg to $end");
 	my $srch = $self->{ng}->search;
-	my $off = 0;
-	long_response($self, $beg, $end, sub {
+	my $cur = $beg;
+	long_response($self, 0, long_response_limit, sub {
 		my ($i) = @_;
-		my $msgs = $srch->query_xover($beg, $end, $off);
-		my $nr = scalar @$msgs or return;
-		$off += $nr;
+		my $msgs = $srch->query_xover($cur, $end);
+		my $nr = scalar @$msgs or return ($$i = long_response_limit);
 
 		# OVERVIEW.FMT
 		more($self, join("\r\n", map {
 			over_line($_->{num}, $_);
 			} @$msgs));
-
-		# -1 to adjust for implicit increment in long_response
-		$$i = $nr ? $$i + $nr - 1 : long_response_limit;
+		$cur = $msgs->[-1]->{num} + 1;
+		1;
 	});
 }
 
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 3d285ac..a7fd131 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -51,25 +51,26 @@ sub do_get {
 	my $dbh = $self->connect;
 	my $lim = (($opts->{limit} || 0) + 0) || 1000;
 	my $off = (($opts->{offset} || 0) + 0) || 0;
-	$sql .= "LIMIT $lim OFFSET $off";
+	$sql .= "LIMIT $lim";
+	$sql .= " OFFSET $off" if $off > 0;
 	my $msgs = $dbh->selectall_arrayref($sql, { Slice => {} }, @args);
 	load_from_row($_) for @$msgs;
 	$msgs
 }
 
 sub query_xover {
-	my ($self, $beg, $end, $off) = @_;
-	do_get($self, <<'', { offset => $off }, $beg, $end);
+	my ($self, $beg, $end) = @_;
+	do_get($self, <<'', {}, $beg, $end);
 SELECT * FROM over WHERE num >= ? AND num <= ?
 ORDER BY num ASC
 
 }
 
 sub query_ts {
-	my ($self, $ts, $opts) = @_;
-	do_get($self, <<'', $opts, $ts);
-SELECT * FROM over WHERE num > 0 AND ts >= ?
-ORDER BY ts ASC
+	my ($self, $ts, $prev) = @_;
+	do_get($self, <<'', {}, $ts, $prev);
+SELECT num,ddd FROM over WHERE ts >= ? AND num > ?
+ORDER BY num ASC
 
 }
 
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 84c0a22..f7fdf85 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -291,8 +291,8 @@ sub query_xover {
 }
 
 sub query_ts {
-	my ($self, $ts, $offset) = @_;
-	$self->{over_ro}->query_ts($ts, $offset);
+	my ($self, $ts, $prev) = @_;
+	$self->{over_ro}->query_ts($ts, $prev);
 }
 
 sub first_smsg_by_mid {
diff --git a/t/perf-nntpd.t b/t/perf-nntpd.t
new file mode 100644
index 0000000..4987f98
--- /dev/null
+++ b/t/perf-nntpd.t
@@ -0,0 +1,130 @@
+# Copyright (C) 2018 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use Benchmark qw(:all);
+use PublicInbox::Inbox;
+use File::Temp qw/tempdir/;
+use POSIX qw(dup2);
+use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD);
+use Net::NNTP;
+my $pi_dir = $ENV{GIANT_PI_DIR};
+plan skip_all => "GIANT_PI_DIR not defined for $0" unless $pi_dir;
+eval { require PublicInbox::Search };
+my ($host_port, $group, %opts, $s, $pid);
+END {
+	if ($s) {
+		$s->print("QUIT\r\n");
+		$s->getline;
+		$s = undef;
+	}
+	kill 'TERM', $pid if defined $pid;
+};
+
+if (($ENV{NNTP_TEST_URL} || '') =~ m!\Anntp://([^/]+)/([^/]+)\z!) {
+	($host_port, $group) = ($1, $2);
+	$host_port .= ":119" unless index($host_port, ':') > 0;
+} else {
+	$group = 'inbox.test.perf.nntpd';
+	my $ibx = { mainrepo => $pi_dir, newsgroup => $group };
+	$ibx = PublicInbox::Inbox->new($ibx);
+	my $nntpd = 'blib/script/public-inbox-nntpd';
+	my $tmpdir = tempdir('perf-nntpd-XXXXXX', TMPDIR => 1, CLEANUP => 1);
+
+	my $pi_config = "$tmpdir/config";
+	{
+		open my $fh, '>', $pi_config or die "open($pi_config): $!";
+		print $fh <<"" or die "print $pi_config: $!";
+[publicinbox "test"]
+	newsgroup = $group
+	mainrepo = $pi_dir
+	address = test\@example.com
+
+		close $fh or die "close($pi_config): $!";
+	}
+
+	%opts = (
+		LocalAddr => '127.0.0.1',
+		ReuseAddr => 1,
+		Proto => 'tcp',
+		Listen => 1024,
+	);
+	my $sock = IO::Socket::INET->new(%opts);
+
+	ok($sock, 'sock created');
+	$! = 0;
+	$pid = fork;
+	if ($pid == 0) {
+		# pretend to be systemd
+		my $fl = fcntl($sock, F_GETFD, 0);
+		dup2(fileno($sock), 3) or die "dup2 failed: $!\n";
+		dup2(1, 2) or die "dup2 failed: $!\n";
+		fcntl($sock, F_SETFD, $fl &= ~FD_CLOEXEC);
+		$ENV{LISTEN_PID} = $$;
+		$ENV{LISTEN_FDS} = 1;
+		$ENV{PI_CONFIG} = $pi_config;
+		exec $nntpd, '-W0';
+		die "FAIL: $!\n";
+	}
+	ok(defined $pid, 'forked nntpd process successfully');
+	$host_port = $sock->sockhost . ':' . $sock->sockport;
+}
+%opts = (
+	PeerAddr => $host_port,
+	Proto => 'tcp',
+	Timeout => 1,
+);
+$s = IO::Socket::INET->new(%opts);
+$s->autoflush(1);
+my $buf = $s->getline;
+is($buf, "201 server ready - post via email\r\n", 'got greeting');
+ok($s->print("GROUP $group\r\n"), 'changed group');
+$buf = $s->getline;
+my ($tot, $min, $max) = ($buf =~ /\A211 (\d+) (\d+) (\d+) /);
+ok($tot && $min && $max, 'got GROUP response');
+my $nr = $max - $min;
+my $nmax = 50000;
+my $nmin = $max - $nmax;
+$nmin = $min if $nmin < $min;
+my $res;
+my $spec = "$nmin-$max";
+my $n;
+
+sub read_until_dot ($) {
+	my $n = 0;
+	do {
+		$buf = $s->getline;
+		++$n
+	} until $buf eq ".\r\n";
+	$n;
+}
+
+my $t = timeit(1, sub {
+	$s->print("XOVER $spec\r\n");
+	$n = read_until_dot($s);
+});
+diag 'xover took: ' . timestr($t) . " for $n";
+
+$t = timeit(1, sub {
+	$s->print("HDR From $spec\r\n");
+	$n = read_until_dot($s);
+
+});
+diag "XHDR From ". timestr($t) . " for $n";
+
+my $date = $ENV{NEWNEWS_DATE};
+unless ($date) {
+	my (undef, undef, undef, $d, $m, $y) = gmtime(time - 30 * 86400);
+	$date = sprintf('%04u%02u%02u', $y + 1900, $m, $d);
+	diag "NEWNEWS_DATE undefined, using $date";
+}
+$t = timeit(1, sub {
+	$s->print("NEWNEWS * $date 000000 GMT\r\n");
+	$n = read_until_dot($s);
+});
+diag 'newnews took: ' . timestr($t) . " for $n";
+
+done_testing();
+
+1;
-- 
EW


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/7] view: avoid offset during pagination
  2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
                   ` (2 preceding siblings ...)
  2018-04-03 11:09 ` [PATCH 3/7] nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster Eric Wong (Contractor, The Linux Foundation)
@ 2018-04-03 11:09 ` Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 5/7] mbox: remove remaining OFFSET usage in SQLite Eric Wong (Contractor, The Linux Foundation)
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

OFFSET in SQLite gets painful to deal with.  Instead,
rely on timestamps (from Received:) for pagination.
This also sets us up for more precise Date searching
in case we want it.
---
 lib/PublicInbox/Feed.pm  | 25 ++------------
 lib/PublicInbox/Inbox.pm |  4 +--
 lib/PublicInbox/Over.pm  | 24 ++++++++++---
 lib/PublicInbox/View.pm  | 90 ++++++++++++++++++++++++++++++++++--------------
 4 files changed, 89 insertions(+), 54 deletions(-)

diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index ff20d7f..5cb044b 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -75,7 +75,7 @@ sub new_html {
 			my $more = scalar @$msgs;
 			return PublicInbox::View::index_entry($m, $ctx, $more);
 		}
-		new_html_footer($ctx);
+		PublicInbox::View::pagination_footer($ctx, './new.html');
 	});
 }
 
@@ -85,21 +85,6 @@ sub _no_thread () {
 	[404, ['Content-Type', 'text/plain'], ["No feed found for thread\n"]];
 }
 
-sub new_html_footer {
-	my ($ctx) = @_;
-	my $qp = delete $ctx->{qp} or return;
-	my $latest = '';
-	my $next = delete $ctx->{next_page} || '';
-	if ($next) {
-		$next = qq!<a\nhref="?$next"\nrel=next>next</a>!;
-	}
-	if (!$qp) {
-		$latest = qq! <a\nhref='./new.html'>latest</a>!;
-		$next ||= '    ';
-	}
-	"<hr><pre>page: $next$latest</pre>";
-}
-
 sub recent_msgs {
 	my ($ctx) = @_;
 	my $ibx = $ctx->{-inbox};
@@ -110,13 +95,7 @@ sub recent_msgs {
 		die "BUG: unsupported inbox version: $v\n";
 	}
 	if (my $srch = $ibx->search) {
-		my $o = $qp ? $qp->{o} : 0;
-		$o += 0;
-		$o = 0 if $o < 0;
-		my $msgs = $ibx->recent({ limit => $max, offset => $o });
-		my $next = $o + $max;
-		$ctx->{next_page} = "o=$next" if scalar(@$msgs) == $max;
-		return $msgs;
+		return PublicInbox::View::paginate_recent($ctx);
 	}
 
 	my $hex = '[a-f0-9]';
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 142b5c8..0ea18b4 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -318,8 +318,8 @@ sub msg_by_mid ($$;$) {
 }
 
 sub recent {
-	my ($self, $opts) = @_;
-	search($self)->query('', $opts);
+	my ($self, $opts, $after, $before) = @_;
+	search($self)->{over_ro}->recent($opts, $after, $before);
 }
 
 1;
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index a7fd131..b230d44 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -109,10 +109,26 @@ SELECT COUNT(num) $cond
 }
 
 sub recent {
-	my ($self, $opts) = @_;
-	my $msgs = do_get($self, <<'', $opts);
-SELECT * FROM over WHERE num > 0
-ORDER BY ts DESC
+	my ($self, $opts, $after, $before) = @_;
+	my ($s, @v);
+	if (defined($before)) {
+		if (defined($after)) {
+			$s = 'num > 0 AND ts >= ? AND ts <= ? ORDER BY ts DESC';
+			@v = ($after, $before);
+		} else {
+			$s = 'num > 0 AND ts <= ? ORDER BY ts DESC';
+			@v = ($before);
+		}
+	} else {
+		if (defined($after)) {
+			$s = 'num > 0 AND ts >= ? ORDER BY ts ASC';
+			@v = ($after);
+		} else {
+			$s = 'num > 0 ORDER BY ts DESC';
+		}
+	}
+	my $msgs = do_get($self, <<"", $opts, @v);
+SELECT * FROM over WHERE $s
 
 	return $msgs unless wantarray;
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index cad90a7..cbed916 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -15,6 +15,7 @@ use PublicInbox::Address;
 use PublicInbox::WwwStream;
 use PublicInbox::Reply;
 require POSIX;
+use Time::Local qw(timegm);
 
 use constant INDENT => '  ';
 use constant TCHILD => '` ';
@@ -1032,43 +1033,82 @@ sub dump_topics {
 	200;
 }
 
+sub ts2str ($) {
+	my ($ts) = @_;
+	POSIX::strftime('%Y%m%d%H%M%S', gmtime($ts));
+}
+
+sub str2ts ($) {
+	my ($yyyy, $mon, $dd, $hh, $mm, $ss) = unpack('A4A2A2A2A2A2', $_[0]);
+	timegm($ss, $mm, $hh, $dd, $mon - 1, $yyyy);
+}
+
+sub pagination_footer ($$) {
+	my ($ctx, $latest) = @_;
+	delete $ctx->{qp} or return;
+	my $next = $ctx->{next_page} || '';
+	my $prev = $ctx->{prev_page} || '';
+	if ($prev) {
+		$next = $next ? "$next " : '     ';
+		$prev .= qq! <a\nhref='$latest'>latest</a>!;
+	}
+	"<hr><pre>page: $next$prev</pre>";
+}
+
 sub index_nav { # callback for WwwStream
 	my (undef, $ctx) = @_;
-	delete $ctx->{qp} or return;
-	my ($next, $prev);
-	$next = $prev = '    ';
-	my $latest = '';
+	pagination_footer($ctx, '.')
+}
+
+sub paginate_recent ($) {
+	my ($ctx) = @_;
+	my $t = $ctx->{qp}->{t} || '';
+	my $lim = 200; # this is our window
+	my $opts = { limit => $lim };
+	my ($after, $before);
+
+	# Xapian uses '..' but '-' is perhaps friendier to URL linkifiers
+	# if only $after exists "YYYYMMDD.." because "." could be skipped
+	# if interpreted as an end-of-sentence
+	$t =~ s/\A(\d{8,14})-// and $after = str2ts($1);
+	$t =~ /\A(\d{8,14})\z/ and $before = str2ts($1);
 
-	my $next_o = $ctx->{-next_o};
-	if ($next_o) {
-		$next = qq!<a\nhref="?o=$next_o"\nrel=next>next</a>!;
+	my $ibx = $ctx->{-inbox};
+	my $msgs = $ibx->recent($opts, $after, $before);
+	my $nr = scalar @$msgs;
+	if ($nr < $lim && defined($after)) {
+		$after = $before = undef;
+		$msgs = $ibx->recent($opts);
+		$nr = scalar @$msgs;
 	}
-	if (my $cur_o = $ctx->{-cur_o}) {
-		$latest = qq! <a\nhref=.>latest</a>!;
-
-		my $o = $cur_o - ($next_o - $cur_o);
-		if ($o > 0) {
-			$prev = qq!<a\nhref="?o=$o"\nrel=prev>prev</a>!;
-		} elsif ($o == 0) {
-			$prev = qq!<a\nhref=.\nrel=prev>prev</a>!;
+	my $more = $nr == $lim;
+	my ($newest, $oldest);
+	if ($nr) {
+		$newest = $msgs->[0]->{ts};
+		$oldest = $msgs->[-1]->{ts};
+		# if we only had $after, our SQL query in ->recent ordered
+		if ($newest < $oldest) {
+			($oldest, $newest) = ($newest, $oldest);
+			$more = 0 if defined($after) && $after < $oldest;
 		}
 	}
-	"<hr><pre>page: $next $prev$latest</pre>";
+	if (defined($oldest) && $more) {
+		my $s = ts2str($oldest);
+		$ctx->{next_page} = qq!<a\nhref="?t=$s"\nrel=next>next</a>!;
+	}
+	if (defined($newest) && (defined($before) || defined($after))) {
+		my $s = ts2str($newest);
+		$ctx->{prev_page} = qq!<a\nhref="?t=$s-"\nrel=prev>prev</a>!;
+	}
+	$msgs;
 }
 
 sub index_topics {
 	my ($ctx) = @_;
-	my ($off) = (($ctx->{qp}->{o} || '0') =~ /(\d+)/);
-
-	$ctx->{order} = [];
-	my $srch = $ctx->{srch};
-	my $msgs = $ctx->{-inbox}->recent({offset => $off, limit => 200 });
-	my $nr = scalar @$msgs;
-	if ($nr) {
+	my $msgs = paginate_recent($ctx);
+	if (@$msgs) {
 		walk_thread(thread_results($ctx, $msgs), $ctx, *acc_topic);
 	}
-	$ctx->{-next_o} = $off + $nr;
-	$ctx->{-cur_o} = $off;
 	PublicInbox::WwwStream->response($ctx, dump_topics($ctx), *index_nav);
 }
 
-- 
EW


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/7] mbox: remove remaining OFFSET usage in SQLite
  2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
                   ` (3 preceding siblings ...)
  2018-04-03 11:09 ` [PATCH 4/7] view: avoid offset during pagination Eric Wong (Contractor, The Linux Foundation)
@ 2018-04-03 11:09 ` Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 6/7] msgmap: replace id_batch with ids_after Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 7/7] nntp: simplify the long_response API Eric Wong (Contractor, The Linux Foundation)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

We can use id_batch in the common case to speed up full mbox
retrievals.  Gigantic msets are still a problem, but will
be fixed in future commits.
---
 lib/PublicInbox/Mbox.pm   | 37 +++++++++++++++++++++++++++++++------
 lib/PublicInbox/Over.pm   | 13 ++++++-------
 lib/PublicInbox/Search.pm |  4 ++--
 t/psgi_v2.t               | 22 +++++++++++++++++++++-
 4 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 05de6be..0be1968 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -138,8 +138,12 @@ sub thread_mbox {
 	my ($ctx, $srch, $sfx) = @_;
 	eval { require IO::Compress::Gzip };
 	return sub { need_gzip(@_) } if $@;
-
-	my $cb = sub { $srch->get_thread($ctx->{mid}, @_) };
+	my $prev = 0;
+	my $cb = sub {
+		my $msgs = $srch->get_thread($ctx->{mid}, $prev);
+		$prev = $msgs->[-1]->{num} if scalar(@$msgs);
+		$msgs;
+	};
 	PublicInbox::MboxGz->response($ctx, $cb);
 }
 
@@ -160,7 +164,25 @@ sub mbox_all {
 
 	eval { require IO::Compress::Gzip };
 	return sub { need_gzip(@_) } if $@;
-	my $cb = sub { $ctx->{srch}->query($query, @_) };
+	if ($query eq '') {
+		my $prev = 0;
+		my $msgs = [];
+		my $cb = sub {
+			$ctx->{-inbox}->mm->id_batch($prev, sub {
+				$msgs = $_[0];
+			});
+			$prev = $msgs->[-1] if @$msgs;
+			$msgs;
+		};
+		return PublicInbox::MboxGz->response($ctx, $cb, 'all');
+	}
+	my $opts = { offset => 0 };
+	my $srch = $ctx->{srch};
+	my $cb = sub { # called by MboxGz->getline
+		my $msgs = $srch->query($query, $opts);
+		$opts->{offset} += scalar @$msgs;
+		$msgs;
+	};
 	PublicInbox::MboxGz->response($ctx, $cb, 'results-'.$query);
 }
 
@@ -192,7 +214,6 @@ sub new {
 		cb => $cb,
 		ctx => $ctx,
 		msgs => [],
-		opts => { offset => 0 },
 	}, $class;
 }
 
@@ -223,6 +244,10 @@ sub getline {
 	do {
 		# work on existing result set
 		while (defined(my $smsg = shift @$msgs)) {
+			# id_batch may return integers
+			ref($smsg) or
+				$smsg = $ctx->{srch}->{over_ro}->get_art($smsg);
+
 			my $msg = eval { $ibx->msg_by_smsg($smsg) } or next;
 			$msg = Email::Simple->new($msg);
 			$gz->write(PublicInbox::Mbox::msg_str($ctx, $msg,
@@ -247,10 +272,10 @@ sub getline {
 		}
 
 		# refill result set
-		$msgs = $self->{msgs} = $self->{cb}->($self->{opts});
-		$self->{opts}->{offset} += scalar @$msgs;
+		$msgs = $self->{msgs} = $self->{cb}->();
 	} while (@$msgs);
 	$gz->close;
+	# signal that we're done and can return undef next call:
 	delete $self->{ctx};
 	${delete $self->{buf}};
 }
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index b230d44..0bd6008 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -50,9 +50,7 @@ sub do_get {
 	my ($self, $sql, $opts, @args) = @_;
 	my $dbh = $self->connect;
 	my $lim = (($opts->{limit} || 0) + 0) || 1000;
-	my $off = (($opts->{offset} || 0) + 0) || 0;
 	$sql .= "LIMIT $lim";
-	$sql .= " OFFSET $off" if $off > 0;
 	my $msgs = $dbh->selectall_arrayref($sql, { Slice => {} }, @args);
 	load_from_row($_) for @$msgs;
 	$msgs
@@ -77,7 +75,7 @@ ORDER BY num ASC
 sub nothing () { wantarray ? (0, []) : [] };
 
 sub get_thread {
-	my ($self, $mid, $opts) = @_;
+	my ($self, $mid, $prev) = @_;
 	my $dbh = $self->connect;
 
 	my $id = $dbh->selectrow_array(<<'', undef, $mid);
@@ -96,13 +94,14 @@ SELECT tid,sid FROM over WHERE num = ? LIMIT 1
 
 	defined $tid or return nothing; # $sid may be undef
 
-	my $cond = 'FROM over WHERE (tid = ? OR sid = ?) AND num > 0';
-	my $msgs = do_get($self, <<"", $opts, $tid, $sid);
-SELECT * $cond ORDER BY ts ASC
+	$prev ||= 0;
+	my $cond = 'FROM over WHERE (tid = ? OR sid = ?) AND num > ?';
+	my $msgs = do_get($self, <<"", {}, $tid, $sid, $prev);
+SELECT * $cond ORDER BY num ASC
 
 	return $msgs unless wantarray;
 
-	my $nr = $dbh->selectrow_array(<<"", undef, $tid, $sid);
+	my $nr = $dbh->selectrow_array(<<"", undef, $tid, $sid, $prev);
 SELECT COUNT(num) $cond
 
 	($nr, $msgs);
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index f7fdf85..eca2b0f 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -179,8 +179,8 @@ sub query {
 }
 
 sub get_thread {
-	my ($self, $mid, $opts) = @_;
-	$self->{over_ro}->get_thread($mid, $opts);
+	my ($self, $mid, $prev) = @_;
+	$self->{over_ro}->get_thread($mid, $prev);
 }
 
 sub retry_reopen {
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index 31c4178..aa3279c 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -125,8 +125,28 @@ test_psgi(sub { $www->call(@_) }, sub {
 		like($out, qr/^hello world$/m, 'got first in t.mbox.gz');
 		like($out, qr/^hello world!$/m, 'got second in t.mbox.gz');
 		like($out, qr/^hello ghosts$/m, 'got third in t.mbox.gz');
-		@from_ = ($raw =~ m/^From /mg);
+		@from_ = ($out =~ m/^From /mg);
 		is(scalar(@from_), 3, 'three From_ lines in t.mbox.gz');
+
+		# search interface
+		$res = $cb->(POST('/v2test/?q=m:a-mid@b&x=m'));
+		$in = $res->content;
+		$status = IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+		like($out, qr/^hello world$/m, 'got first in mbox POST');
+		like($out, qr/^hello world!$/m, 'got second in mbox POST');
+		like($out, qr/^hello ghosts$/m, 'got third in mbox POST');
+		@from_ = ($out =~ m/^From /mg);
+		is(scalar(@from_), 3, 'three From_ lines in mbox POST');
+
+		# all.mbox.gz interface
+		$res = $cb->(GET('/v2test/all.mbox.gz'));
+		$in = $res->content;
+		$status = IO::Uncompress::Gunzip::gunzip(\$in => \$out);
+		like($out, qr/^hello world$/m, 'got first in all.mbox');
+		like($out, qr/^hello world!$/m, 'got second in all.mbox');
+		like($out, qr/^hello ghosts$/m, 'got third in all.mbox');
+		@from_ = ($out =~ m/^From /mg);
+		is(scalar(@from_), 3, 'three From_ lines in all.mbox');
 	};
 
 	local $SIG{__WARN__} = 'DEFAULT';
-- 
EW


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 6/7] msgmap: replace id_batch with ids_after
  2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
                   ` (4 preceding siblings ...)
  2018-04-03 11:09 ` [PATCH 5/7] mbox: remove remaining OFFSET usage in SQLite Eric Wong (Contractor, The Linux Foundation)
@ 2018-04-03 11:09 ` Eric Wong (Contractor, The Linux Foundation)
  2018-04-03 11:09 ` [PATCH 7/7] nntp: simplify the long_response API Eric Wong (Contractor, The Linux Foundation)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

id_batch had a an overly complicated interface, replace it
with id_batch which is simpler and takes advantage of
selectcol_arrayref in DBI.  This allows simplification of
callers and the diffstat agrees with me.
---
 lib/PublicInbox/Mbox.pm   | 11 ++---------
 lib/PublicInbox/Msgmap.pm | 19 ++++++++-----------
 lib/PublicInbox/NNTP.pm   | 12 ++++--------
 t/nntpd.t                 |  1 +
 t/v2writable.t            |  7 +++++++
 5 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 0be1968..c66ccaa 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -166,14 +166,7 @@ sub mbox_all {
 	return sub { need_gzip(@_) } if $@;
 	if ($query eq '') {
 		my $prev = 0;
-		my $msgs = [];
-		my $cb = sub {
-			$ctx->{-inbox}->mm->id_batch($prev, sub {
-				$msgs = $_[0];
-			});
-			$prev = $msgs->[-1] if @$msgs;
-			$msgs;
-		};
+		my $cb = sub { $ctx->{-inbox}->mm->ids_after(\$prev) };
 		return PublicInbox::MboxGz->response($ctx, $cb, 'all');
 	}
 	my $opts = { offset => 0 };
@@ -244,7 +237,7 @@ sub getline {
 	do {
 		# work on existing result set
 		while (defined(my $smsg = shift @$msgs)) {
-			# id_batch may return integers
+			# ids_after may return integers
 			ref($smsg) or
 				$smsg = $ctx->{srch}->{over_ro}->get_art($smsg);
 
diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index dea9573..26565d4 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -186,17 +186,14 @@ sub create_tables {
 }
 
 # used by NNTP.pm
-sub id_batch {
-	my ($self, $num, $cb) = @_;
-	my $dbh = $self->{dbh};
-	my $sth = $dbh->prepare('SELECT num FROM msgmap WHERE num > ? '.
-				'ORDER BY num ASC LIMIT 1000');
-	$sth->execute($num);
-	my $ary = $sth->fetchall_arrayref;
-	@$ary = map { $_->[0] } @$ary;
-	my $nr = scalar @$ary;
-	$cb->($ary) if $nr;
-	$nr;
+sub ids_after {
+	my ($self, $num) = @_;
+	my $ids = $self->{dbh}->selectcol_arrayref(<<'', undef, $$num);
+SELECT num FROM msgmap WHERE num > ?
+ORDER BY num ASC LIMIT 1000
+
+	$$num = $ids->[-1] if @$ids;
+	$ids;
 }
 
 # only used for mapping external serial numbers (e.g. articles from gmane)
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index ff6d895..b91cda1 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -34,7 +34,6 @@ my $LIST_HEADERS = join("\r\n", @OVERVIEW,
 			qw(:bytes :lines Xref To Cc)) . "\r\n";
 
 # disable commands with easy DoS potential:
-# LISTGROUP could get pretty bad, too...
 my %DISABLED; # = map { $_ => 1 } qw(xover list_overview_fmt newnews xhdr);
 
 my $EXPMAP; # fd -> [ idle_time, $self ]
@@ -225,15 +224,12 @@ sub cmd_listgroup ($;$) {
 	}
 
 	$self->{ng} or return '412 no newsgroup selected';
+	my $n = 0;
 	long_response($self, 0, long_response_limit, sub {
 		my ($i) = @_;
-		my $nr = $self->{ng}->mm->id_batch($$i, sub {
-			my ($ary) = @_;
-			more($self, join("\r\n", @$ary));
-		});
-
-		# -1 to adjust for implicit increment in long_response
-		$$i = $nr ? $$i + $nr - 1 : long_response_limit;
+		my $ary = $self->{ng}->mm->ids_after(\$n);
+		scalar @$ary or return ($$i = long_response_limit);
+		more($self, join("\r\n", @$ary));
 	});
 }
 
diff --git a/t/nntpd.t b/t/nntpd.t
index de781d7..c6e34ed 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -123,6 +123,7 @@ EOF
 	my $list = $n->list;
 	is_deeply($list, { $group => [ qw(1 1 n) ] }, 'LIST works');
 	is_deeply([$n->group($group)], [ qw(0 1 1), $group ], 'GROUP works');
+	is_deeply($n->listgroup($group), [1], 'listgroup OK');
 
 	%opts = (
 		PeerAddr => $host_port,
diff --git a/t/v2writable.t b/t/v2writable.t
index 1e8e404..6294735 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -189,6 +189,13 @@ EOF
 		is($nn{$mid}++, 0, "MID is unique in NEWNEWS");
 	}
 	is_deeply([sort keys %nn], [sort keys %uniq]);
+
+	my %lg;
+	foreach my $num (@{$n->listgroup($group)}) {
+		is($lg{$num}++, 0, "num is unique in LISTGROUP");
+	}
+	is_deeply([sort keys %lg], [sort keys %$x],
+		'XOVER and LISTGROUPS return the same article numbers');
 };
 {
 	local $ENV{NPROC} = 2;
-- 
EW


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 7/7] nntp: simplify the long_response API
  2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
                   ` (5 preceding siblings ...)
  2018-04-03 11:09 ` [PATCH 6/7] msgmap: replace id_batch with ids_after Eric Wong (Contractor, The Linux Foundation)
@ 2018-04-03 11:09 ` Eric Wong (Contractor, The Linux Foundation)
  6 siblings, 0 replies; 8+ messages in thread
From: Eric Wong (Contractor, The Linux Foundation) @ 2018-04-03 11:09 UTC (permalink / raw)
  To: meta

We we worked around the default range/termination conditions of
long_response in many cases to reduce calls to SQLite or Xapian.
So continue that trend and become more like the PSGI API
which doesn't force callers to specify an article range or
work inside a loop.
---
 lib/PublicInbox/Msgmap.pm | 12 +++++++
 lib/PublicInbox/NNTP.pm   | 83 ++++++++++++++++++++++-------------------------
 t/v2writable.t            | 10 ++++++
 3 files changed, 61 insertions(+), 44 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index 26565d4..c6a7315 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -196,6 +196,18 @@ ORDER BY num ASC LIMIT 1000
 	$ids;
 }
 
+sub msg_range {
+	my ($self, $beg, $end) = @_;
+	my $dbh = $self->{dbh};
+	my $attr = { Columns => [] };
+	my $mids = $dbh->selectall_arrayref(<<'', $attr, $$beg, $end);
+SELECT num,mid FROM msgmap WHERE num >= ? AND num <= ?
+ORDER BY num ASC
+
+	$$beg = $mids->[-1]->[0] + 1 if @$mids;
+	$mids
+}
+
 # only used for mapping external serial numbers (e.g. articles from gmane)
 # see scripts/xhdr-num2mid or PublicInbox::Filter::RubyLang for usage
 sub mid_set {
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index b91cda1..e517935 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -225,11 +225,11 @@ sub cmd_listgroup ($;$) {
 
 	$self->{ng} or return '412 no newsgroup selected';
 	my $n = 0;
-	long_response($self, 0, long_response_limit, sub {
-		my ($i) = @_;
+	long_response($self, sub {
 		my $ary = $self->{ng}->mm->ids_after(\$n);
-		scalar @$ary or return ($$i = long_response_limit);
+		scalar @$ary or return;
 		more($self, join("\r\n", @$ary));
+		1;
 	});
 }
 
@@ -328,8 +328,7 @@ sub cmd_newnews ($$$$;$$) {
 	return '.' unless @srch;
 
 	my $prev = 0;
-	long_response($self, 0, long_response_limit, sub {
-		my ($i) = @_;
+	long_response($self, sub {
 		my $srch = $srch[0];
 		my $msgs = $srch->query_ts($ts, $prev);
 		if (scalar @$msgs) {
@@ -341,8 +340,9 @@ sub cmd_newnews ($$$$;$$) {
 			shift @srch;
 			if (@srch) { # continue onto next newsgroup
 				$prev = 0;
+				return 1;
 			} else { # break out of the long response.
-				$$i = long_response_limit;
+				return;
 			}
 		}
 	});
@@ -564,8 +564,8 @@ sub get_range ($$) {
 	[ $beg, $end ];
 }
 
-sub long_response ($$$$) {
-	my ($self, $beg, $end, $cb) = @_;
+sub long_response ($$) {
+	my ($self, $cb) = @_;
 	die "BUG: nested long response" if $self->{long_res};
 
 	my $fd = $self->{fd};
@@ -576,23 +576,14 @@ sub long_response ($$$$) {
 	$self->watch_read(0);
 	my $t0 = now();
 	$self->{long_res} = sub {
-		# limit our own running time for fairness with other
-		# clients and to avoid buffering too much:
-		my $lim = $end == long_response_limit ? 1 : 100;
-
-		my $err;
-		do {
-			eval { $cb->(\$beg) };
-		} until (($err = $@) || $self->{closed} ||
-			 ++$beg > $end || !--$lim || $self->{write_buf_size});
-
-		if ($err || $self->{closed}) {
+		my $more = eval { $cb->() };
+		if ($@ || $self->{closed}) {
 			$self->{long_res} = undef;
 
-			if ($err) {
+			if ($@) {
 				err($self,
 				    "%s during long response[$fd] - %0.6f",
-				    $err, now() - $t0);
+				    $@, now() - $t0);
 			}
 			if ($self->{closed}) {
 				out($self, " deferred[$fd] aborted - %0.6f",
@@ -601,7 +592,7 @@ sub long_response ($$$$) {
 				update_idle_time($self);
 				$self->watch_read(1);
 			}
-		} elsif (!$lim || $self->{write_buf_size}) {
+		} elsif ($more) { # $self->{write_buf_size}:
 			# no recursion, schedule another call ASAP
 			# but only after all pending writes are done
 			update_idle_time($self);
@@ -633,10 +624,13 @@ sub hdr_message_id ($$$) { # optimize XHDR Message-ID [range] for slrnpull.
 		my $mm = $self->{ng}->mm;
 		my ($beg, $end) = @$r;
 		more($self, $xhdr ? r221 : r225);
-		long_response($self, $beg, $end, sub {
-			my ($i) = @_;
-			my $mid = $mm->mid_for($$i);
-			more($self, "$$i <$mid>") if defined $mid;
+		long_response($self, sub {
+			my $r = $mm->msg_range(\$beg, $end);
+			@$r or return;
+			more($self, join("\r\n", map {
+				"$_->[0] <$_->[1]>"
+			} @$r));
+			1;
 		});
 	}
 }
@@ -676,10 +670,16 @@ sub hdr_xref ($$$) { # optimize XHDR Xref [range] for rtin
 		my $mm = $ng->mm;
 		my ($beg, $end) = @$r;
 		more($self, $xhdr ? r221 : r225);
-		long_response($self, $beg, $end, sub {
-			my ($i) = @_;
-			my $mid = $mm->mid_for($$i);
-			more($self, "$$i ".xref($ng, $$i)) if defined $mid;
+		long_response($self, sub {
+			my $r = $mm->msg_range(\$beg, $end);
+			@$r or return;
+			more($self, join("\r\n", map {
+				# TODO: use $_->[1] (mid) to fill
+				# Xref: from other inboxes
+				my $num = $_->[0];
+				"$num ".xref($ng, $num);
+			} @$r));
+			1;
 		});
 	}
 }
@@ -707,11 +707,9 @@ sub hdr_searchmsg ($$$$) {
 		my ($beg, $end) = @$r;
 		more($self, $xhdr ? r221 : r225);
 		my $cur = $beg;
-		long_response($self, 0, long_response_limit, sub {
-			my ($i) = @_;
+		long_response($self, sub {
 			my $msgs = $srch->query_xover($cur, $end);
-			my $nr = scalar @$msgs or
-					return ($$i = long_response_limit);
+			my $nr = scalar @$msgs or return;
 			my $tmp = '';
 			foreach my $s (@$msgs) {
 				$tmp .= $s->num . ' ' . $s->$field . "\r\n";
@@ -792,12 +790,11 @@ sub cmd_xrover ($;$) {
 	my $mm = $ng->mm;
 	my $srch = $ng->search;
 	more($self, '224 Overview information follows');
-	long_response($self, $beg, $end, sub {
-		my ($i) = @_;
-		my $num = $$i;
-		my $h = search_header_for($srch, $num, 'references');
-		defined $h or return;
-		more($self, "$num $h");
+
+	long_response($self, sub {
+		my $h = search_header_for($srch, $beg, 'references');
+		more($self, "$beg $h") if defined($h);
+		$beg++ < $end;
 	});
 }
 
@@ -844,17 +841,15 @@ sub cmd_xover ($;$) {
 	more($self, "224 Overview information follows for $beg to $end");
 	my $srch = $self->{ng}->search;
 	my $cur = $beg;
-	long_response($self, 0, long_response_limit, sub {
-		my ($i) = @_;
+	long_response($self, sub {
 		my $msgs = $srch->query_xover($cur, $end);
-		my $nr = scalar @$msgs or return ($$i = long_response_limit);
+		my $nr = scalar @$msgs or return;
 
 		# OVERVIEW.FMT
 		more($self, join("\r\n", map {
 			over_line($_->{num}, $_);
 			} @$msgs));
 		$cur = $msgs->[-1]->{num} + 1;
-		1;
 	});
 }
 
diff --git a/t/v2writable.t b/t/v2writable.t
index 6294735..2f83977 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -105,6 +105,7 @@ if ('ensure git configs are correct') {
 
 {
 	$mime->header_set('Message-Id', '<abcde@1>', '<abcde@2>');
+	$mime->header_set('References', '<zz-mid@b>');
 	ok($im->add($mime), 'message with multiple Message-ID');
 	$im->done;
 	my @found;
@@ -196,6 +197,15 @@ EOF
 	}
 	is_deeply([sort keys %lg], [sort keys %$x],
 		'XOVER and LISTGROUPS return the same article numbers');
+
+	my $xref = $n->xhdr('Xref', '1-');
+	is_deeply([sort keys %lg], [sort keys %$xref], 'Xref range OK');
+
+	my $mids = $n->xhdr('Message-ID', '1-');
+	is_deeply([sort keys %lg], [sort keys %$xref], 'Message-ID range OK');
+
+	my $rover = $n->xrover('1-');
+	is_deeply([sort keys %lg], [sort keys %$rover], 'XROVER range OK');
 };
 {
 	local $ENV{NPROC} = 2;
-- 
EW


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-04-03 11:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-03 11:09 [PATCH 0/7] optimize V2 Eric Wong (Contractor, The Linux Foundation)
2018-04-03 11:09 ` [PATCH 1/7] t/thread-all.t: modernize test to support modern inboxes Eric Wong (Contractor, The Linux Foundation)
2018-04-03 11:09 ` [PATCH 2/7] rename+rewrite test using Benchmark module Eric Wong (Contractor, The Linux Foundation)
2018-04-03 11:09 ` [PATCH 3/7] nntp: make XOVER, XHDR, OVER, HDR and NEWNEWS faster Eric Wong (Contractor, The Linux Foundation)
2018-04-03 11:09 ` [PATCH 4/7] view: avoid offset during pagination Eric Wong (Contractor, The Linux Foundation)
2018-04-03 11:09 ` [PATCH 5/7] mbox: remove remaining OFFSET usage in SQLite Eric Wong (Contractor, The Linux Foundation)
2018-04-03 11:09 ` [PATCH 6/7] msgmap: replace id_batch with ids_after Eric Wong (Contractor, The Linux Foundation)
2018-04-03 11:09 ` [PATCH 7/7] nntp: simplify the long_response API Eric Wong (Contractor, The Linux Foundation)

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).