user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 4/5] newswww: use ->ALL to avoid O(n) inbox scan
Date: Fri,  4 Dec 2020 22:03:48 +0000
Message-ID: <20201204220349.4408-5-e@80x24.org>
In-Reply-To: <20201204220349.4408-1-e@80x24.org>

We can avoid doing a Message-ID lookup on every single inbox
by querying the over.sqlite3 DB of ->ALL instead.  This mimics
NNTP behavior and picks the first message indexed, though
redirecting to /all/$MESSAGE_ID/ could be done instead.

With the current lore.kernel.org set of inboxes (~140), this
provides a 10-40% speedup depending on inbox ordering.
---
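The difference between the two lookup strategies described above can
be sketched roughly as follows.  This is only an illustration using
plain hashes; @inboxes, %all, url_by_scan and url_by_all are
hypothetical stand-ins and not part of public-inbox, and the combined
%all index merely plays the role that ->ALL and its over.sqlite3 DB
play in the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins (not public-inbox APIs): every inbox has its
# own Message-ID index, and %all is one combined index mapping a
# Message-ID to the names of the inboxes holding it, in indexing order.
my @inboxes = (
	{ name => 'git', url => 'https://example.com/git/',
	  mids => { 'a@x' => 1 } },
	{ name => 'meta', url => 'https://example.com/meta/',
	  mids => { 'a@x' => 1, 'b@x' => 1 } },
);
my %by_name = map { $_->{name} => $_ } @inboxes;
my %all;
for my $ibx (@inboxes) {
	push @{$all{$_}}, $ibx->{name} for sort keys %{$ibx->{mids}};
}

# slow path: one lookup per configured inbox, O(n) in the inbox count
sub url_by_scan {
	my ($mid) = @_;
	for my $ibx (@inboxes) {
		return $ibx->{url} . "$mid/" if $ibx->{mids}->{$mid};
	}
	undef;
}

# fast path: a single lookup in the combined index, then map the
# first known inbox name back to its object
sub url_by_all {
	my ($mid) = @_;
	for my $name (@{$all{$mid} // []}) {
		my $ibx = $by_name{$name} // next;
		return $ibx->{url} . "$mid/";
	}
	undef;
}

print url_by_all('a@x'), "\n";
```

Both subs return the same URL here; the fast path simply does not
have to touch every inbox to find it.
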
 lib/PublicInbox/Config.pm  |  4 ++--
 lib/PublicInbox/NewsWWW.pm | 30 +++++++++++++++++++++++-------
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 9b9d5c19..ba0ead6e 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -33,6 +33,7 @@ sub new {
 	$self->{-by_list_id} = {};
 	$self->{-by_name} = {};
 	$self->{-by_newsgroup} = {};
+	$self->{-by_eidx_key} = {};
 	$self->{-no_obfuscate} = {};
 	$self->{-limiters} = {};
 	$self->{-code_repos} = {}; # nick => PublicInbox::Git object
@@ -476,8 +477,7 @@ EOF
 			push @$repo_objs, $repo if $repo;
 		}
 	}
-
-	$ibx
+	$self->{-by_eidx_key}->{$ibx->eidx_key} = $ibx;
 }
 
 sub _fill_ei ($$) {
diff --git a/lib/PublicInbox/NewsWWW.pm b/lib/PublicInbox/NewsWWW.pm
index 6bed0103..ade8dfd1 100644
--- a/lib/PublicInbox/NewsWWW.pm
+++ b/lib/PublicInbox/NewsWWW.pm
@@ -63,7 +63,6 @@ sub call {
 		return redirect($code, $url);
 	}
 
-	my $res;
 	my @try = (join('/', @parts));
 
 	# trailing slash is in the rest of our WWW, so maybe some users
@@ -72,13 +71,30 @@ sub call {
 		pop @parts;
 		push @try, join('/', @parts);
 	}
-
-	foreach my $mid (@try) {
-		my $arg = [ $mid ];
-		$pi_config->each_inbox(\&try_inbox, $arg);
-		defined($res = $arg->[1]) and last;
+	my $ALL = $pi_config->ALL;
+	if (my $over = $ALL ? $ALL->over : undef) {
+		my $by_eidx_key = $pi_config->{-by_eidx_key};
+		for my $mid (@try) {
+			my ($id, $prev);
+			while (my $x = $over->next_by_mid($mid, \$id, \$prev)) {
+				my $xr3 = $over->get_xref3($x->{num});
+				for (@$xr3) {
+					s/:[0-9]+:$x->{blob}\z// or next;
+					my $ibx = $by_eidx_key->{$_} // next;
+					my $url = $ibx->base_url or next;
+					$url .= mid_escape($mid) . '/';
+					return redirect(302, $url);
+				}
+			}
+		}
+	} else { # slow path, scan every inbox
+		for my $mid (@try) {
+			my $arg = [ $mid ]; # [1] => result
+			$pi_config->each_inbox(\&try_inbox, $arg);
+			return $arg->[1] if $arg->[1];
+		}
 	}
-	$res || [ 404, [qw(Content-Type text/plain)], ["404 Not Found\n"] ];
+	[ 404, [qw(Content-Type text/plain)], ["404 Not Found\n"] ];
 }
 
 1;
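
The inner loop of the fast path above strips a ":<number>:<blob>"
suffix from each xref3 entry to recover the key used for the
-by_eidx_key lookup.  A standalone sketch of that step is shown
below; the "<eidx_key>:<number>:<blob>" layout is an assumption
inferred from the regexp in the patch, and eidx_key_for is a
hypothetical helper name, not a public-inbox function.

```perl
use strict;
use warnings;

# Assumed entry layout, inferred from the regexp in the patch:
#   "<eidx_key>:<article number>:<blob OID>"
# Returns the eidx_key, or undef if the entry is for another blob.
sub eidx_key_for {
	my ($entry, $blob) = @_;
	# \Q..\E quotes the OID; only the trailing ":num:blob" is removed,
	# so colons inside the key itself are preserved
	$entry =~ s/:[0-9]+:\Q$blob\E\z// ? $entry : undef;
}

print eidx_key_for('inbox.comp.test:42:deadbeef', 'deadbeef'), "\n";
```

Anchoring the match with \z means an entry whose blob differs is
skipped rather than partially rewritten, which matches the "or next"
in the patch.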

Thread overview: 6+ messages
2020-12-04 22:03 [PATCH 0/5] more ->ALL usage Eric Wong
2020-12-04 22:03 ` [PATCH 1/5] nntp: xref_by_tc: simplify slightly Eric Wong
2020-12-04 22:03 ` [PATCH 2/5] nntp: small speed up for multi-line responses Eric Wong
2020-12-04 22:03 ` [PATCH 3/5] search: remove mdocid export Eric Wong
2020-12-04 22:03 ` Eric Wong [this message]
2020-12-04 22:03 ` [PATCH 5/5] extmsg: use ->ALL for "global" MID lookups Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git