From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 4/5] newswww: use ->ALL to avoid O(n) inbox scan
Date: Fri, 4 Dec 2020 22:03:48 +0000
Message-ID: <20201204220349.4408-5-e@80x24.org>
In-Reply-To: <20201204220349.4408-1-e@80x24.org>
We can avoid a Message-ID lookup on every single inbox by
querying the over.sqlite3 DB of ->ALL instead. This mimics NNTP
behavior and picks the first message indexed, though redirecting
to /all/$MESSAGE_ID/ could be done instead.

With the current lore.kernel.org set of inboxes (~140), this
provides a 10-40% speedup depending on inbox ordering.
---
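As a rough illustration of why the speedup depends on inbox ordering, the
sketch below contrasts a per-inbox scan with a single lookup in a combined
index. It is not public-inbox code: the `@inboxes` list, the `%all` hash,
and both subs are hypothetical stand-ins for the real inbox objects and the
over.sqlite3 DB behind ->ALL.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: each "inbox" knows a set of Message-IDs.
my @inboxes = map {
	+{ name => "inbox$_", mids => { "msg$_\@example.com" => 1 } }
} 1..5;

# Slow path: probe each inbox in turn until one knows the Message-ID.
# The cost depends on where the matching inbox sits in the list.
sub scan_every_inbox {
	my ($mid) = @_;
	my $probes = 0;
	for my $ibx (@inboxes) {
		$probes++;
		return ($ibx->{name}, $probes) if $ibx->{mids}->{$mid};
	}
	(undef, $probes);
}

# Fast path: one combined index built ahead of time; the first inbox
# to index a Message-ID wins, later ones do not overwrite it.
my %all;
for my $ibx (@inboxes) {
	$all{$_} //= $ibx->{name} for keys %{$ibx->{mids}};
}

sub lookup_all {
	my ($mid) = @_;
	($all{$mid}, 1); # always a single probe
}

my ($name, $probes) = scan_every_inbox('msg5@example.com');
print "scan: $name after $probes probes\n";
($name, $probes) = lookup_all('msg5@example.com');
print "ALL:  $name after $probes probe\n";
```

A Message-ID held by the last inbox costs five probes on the slow path and
one on the fast path; one held by the first inbox costs one probe either way.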
lib/PublicInbox/Config.pm | 4 ++--
lib/PublicInbox/NewsWWW.pm | 30 +++++++++++++++++++++++-------
2 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 9b9d5c19..ba0ead6e 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -33,6 +33,7 @@ sub new {
 	$self->{-by_list_id} = {};
 	$self->{-by_name} = {};
 	$self->{-by_newsgroup} = {};
+	$self->{-by_eidx_key} = {};
 	$self->{-no_obfuscate} = {};
 	$self->{-limiters} = {};
 	$self->{-code_repos} = {}; # nick => PublicInbox::Git object
@@ -476,8 +477,7 @@ EOF
 			push @$repo_objs, $repo if $repo;
 		}
 	}
-
-	$ibx
+	$self->{-by_eidx_key}->{$ibx->eidx_key} = $ibx;
 }
 sub _fill_ei ($$) {
diff --git a/lib/PublicInbox/NewsWWW.pm b/lib/PublicInbox/NewsWWW.pm
index 6bed0103..ade8dfd1 100644
--- a/lib/PublicInbox/NewsWWW.pm
+++ b/lib/PublicInbox/NewsWWW.pm
@@ -63,7 +63,6 @@ sub call {
 		return redirect($code, $url);
 	}
-	my $res;
 	my @try = (join('/', @parts));
 	# trailing slash is in the rest of our WWW, so maybe some users
@@ -72,13 +71,30 @@ sub call {
 		pop @parts;
 		push @try, join('/', @parts);
 	}
-
-	foreach my $mid (@try) {
-		my $arg = [ $mid ];
-		$pi_config->each_inbox(\&try_inbox, $arg);
-		defined($res = $arg->[1]) and last;
+	my $ALL = $pi_config->ALL;
+	if (my $over = $ALL ? $ALL->over : undef) {
+		my $by_eidx_key = $pi_config->{-by_eidx_key};
+		for my $mid (@try) {
+			my ($id, $prev);
+			while (my $x = $over->next_by_mid($mid, \$id, \$prev)) {
+				my $xr3 = $over->get_xref3($x->{num});
+				for (@$xr3) {
+					s/:[0-9]+:$x->{blob}\z// or next;
+					my $ibx = $by_eidx_key->{$_} // next;
+					my $url = $ibx->base_url or next;
+					$url .= mid_escape($mid) . '/';
+					return redirect(302, $url);
+				}
+			}
+		}
+	} else { # slow path, scan every inbox
+		for my $mid (@try) {
+			my $arg = [ $mid ]; # [1] => result
+			$pi_config->each_inbox(\&try_inbox, $arg);
+			return $arg->[1] if $arg->[1];
+		}
 	}
-	$res || [ 404, [qw(Content-Type text/plain)], ["404 Not Found\n"] ];
+	[ 404, [qw(Content-Type text/plain)], ["404 Not Found\n"] ];
 }
 1;
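The inner loop of the fast path can be tried in isolation with the sketch
below. It assumes, based only on the substitution in the patch, that each
xref3 entry is a string of the form "EIDX_KEY:ARTNUM:BLOB". The helper
names (`eidx_key_for`, `first_base_url`), the paths, the blob ID, and the
URL are all hypothetical, and plain hashes stand in for the inbox objects.

```perl
use strict;
use warnings;

# Hypothetical inboxes, keyed the way the patch keys -by_eidx_key.
my %by_eidx_key = (
	'/path/to/meta' => { base_url => 'https://example.com/meta/' },
);

# Strip the ":ARTNUM:BLOB" suffix from one xref3 entry and return the
# remaining eidx_key, or undef if the entry refers to a different blob.
sub eidx_key_for {
	my ($xref3, $blob) = @_;
	$xref3 =~ s/:[0-9]+:\Q$blob\E\z// ? $xref3 : undef;
}

# Return the base URL of the first entry that maps to a known inbox.
sub first_base_url {
	my ($by_eidx_key, $blob, @xr3) = @_;
	for my $entry (@xr3) {
		my $key = eidx_key_for($entry, $blob) // next;
		my $ibx = $by_eidx_key->{$key} // next; # not configured here
		return $ibx->{base_url};
	}
	undef;
}

my @xr3 = ('/path/to/gone:3:deadbeef', '/path/to/meta:12:deadbeef');
print first_base_url(\%by_eidx_key, 'deadbeef', @xr3), "\n";
```

Entries whose key has no configured inbox are skipped rather than treated
as errors, which matches the `// next` in the patch.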
Thread overview: 6+ messages
2020-12-04 22:03 [PATCH 0/5] more ->ALL usage Eric Wong
2020-12-04 22:03 ` [PATCH 1/5] nntp: xref_by_tc: simplify slightly Eric Wong
2020-12-04 22:03 ` [PATCH 2/5] nntp: small speed up for multi-line responses Eric Wong
2020-12-04 22:03 ` [PATCH 3/5] search: remove mdocid export Eric Wong
2020-12-04 22:03 ` [PATCH 4/5] newswww: use ->ALL to avoid O(n) inbox scan Eric Wong [this message]
2020-12-04 22:03 ` [PATCH 5/5] extmsg: use ->ALL for "global" MID lookups Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git