user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 5/5] extmsg: use ->ALL for "global" MID lookups
Date: Fri,  4 Dec 2020 22:03:49 +0000
Message-ID: <20201204220349.4408-6-e@80x24.org>
In-Reply-To: <20201204220349.4408-1-e@80x24.org>

As with NewsWWW and NNTP, we can use ->ALL to completely
avoid SQLite/Xapian lookups across hundreds or thousands
of individual inboxes.
---
 lib/PublicInbox/ExtMsg.pm | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

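The first loop in ext_msg_ALL maps each xref3 entry of a matching message back to an inbox key. Judging from the `s/:[0-9]+:$x->{blob}\z//` substitution, each entry is assumed here to look like `EIDX_KEY:NUM:BLOB`. Below is a minimal standalone sketch of just that step, not part of the patch; `inbox_keys_for_blob` and the sample keys are hypothetical, and the `-by_eidx_key` and `base_url` checks from the patch are left out:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the xref3 loop in ext_msg_ALL.
# Assumes each xref3 entry looks like "EIDX_KEY:NUM:BLOB".
# Returns inbox keys holding $blob, excluding $cur_key, each key once,
# in first-seen order.
sub inbox_keys_for_blob {
	my ($xr3, $blob, $cur_key) = @_;
	my %seen = ($cur_key => 1); # never report the inbox being viewed
	my @found;
	for my $k (@$xr3) {
		# copy, then strip ":NUM:BLOB"; skip entries for other blobs
		(my $key = $k) =~ s/:[0-9]+:\Q$blob\E\z// or next;
		push @found, $key unless $seen{$key}++;
	}
	return \@found;
}
```

Seeding %seen with the current key, as the patch does, keeps the inbox being viewed out of the results and reports every other inbox once, even if it holds several copies of the same blob.
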
diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm
index 03faf3a1..2a0a3e46 100644
--- a/lib/PublicInbox/ExtMsg.pm
+++ b/lib/PublicInbox/ExtMsg.pm
@@ -103,9 +103,37 @@ sub ext_msg_step {
 	}
 }
 
+sub ext_msg_ALL ($) {
+	my ($ctx) = @_;
+	my $ALL = $ctx->{www}->{pi_config}->ALL or return;
+	my $by_eidx_key = $ctx->{www}->{pi_config}->{-by_eidx_key};
+	my $cur_key = $ctx->{-inbox}->eidx_key;
+	my %seen = ($cur_key => 1);
+	my ($id, $prev);
+	while (my $x = $ALL->over->next_by_mid($ctx->{mid}, \$id, \$prev)) {
+		my $xr3 = $ALL->over->get_xref3($x->{num});
+		for my $k (@$xr3) {
+			$k =~ s/:[0-9]+:$x->{blob}\z// or next;
+			next if $k eq $cur_key;
+			my $ibx = $by_eidx_key->{$k} // next;
+			my $url = $ibx->base_url or next;
+			push(@{$ctx->{found}}, $ibx) unless $seen{$k}++;
+		}
+	}
+	return exact($ctx) if $ctx->{found};
+
+	# fall back to partial MID matching
+	for my $ibxish ($ctx->{-inbox}, $ALL) {
+		my $mids = search_partial($ibxish, $ctx->{mid}) or next;
+		push @{$ctx->{partial}}, [ $ibxish, $mids ];
+		last if ($ctx->{n_partial} += scalar(@$mids)) >= PARTIAL_MAX;
+	}
+	partial_response($ctx);
+}
+
 sub ext_msg {
 	my ($ctx) = @_;
-	sub {
+	ext_msg_ALL($ctx) // sub {
 		$ctx->{-wcb} = $_[0]; # HTTP server write callback
 
 		if ($ctx->{env}->{'pi-httpd.async'}) {
@@ -159,7 +187,7 @@ sub finalize_exact {
 	finalize_partial($ctx);
 }
 
-sub finalize_partial {
+sub partial_response ($) {
 	my ($ctx) = @_;
 	my $mid = $ctx->{mid};
 	my $code = 404;
@@ -192,9 +220,11 @@ sub finalize_partial {
 	$ctx->{-html_tip} = $s .= '</pre>';
 	$ctx->{-title_html} = $title;
 	$ctx->{-upfx} = '../';
-	$ctx->{-wcb}->(html_oneshot($ctx, $code));
+	html_oneshot($ctx, $code);
 }
 
+sub finalize_partial ($) { $_[0]->{-wcb}->(partial_response($_[0])) }
+
 sub ext_urls {
 	my ($ctx, $mid, $href, $html) = @_;
 

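When no exact match is found, ext_msg_ALL searches the current inbox first and then ->ALL for partial Message-ID matches, stopping once the running total in `n_partial` reaches PARTIAL_MAX. The sketch below reproduces that loop shape with a hypothetical `find_partial` standing in for search_partial (plain prefix matching, an assumption; the real function may match differently) and a hypothetical PARTIAL_MAX of 3 so the cut-off is easy to see:

```perl
use strict;
use warnings;

# Hypothetical cap, kept small for illustration; the real PARTIAL_MAX
# in ExtMsg.pm may differ.
use constant PARTIAL_MAX => 3;

# Hypothetical stand-in for search_partial: prefix match over a list of
# Message-IDs; returns an array ref of matches, or undef if none.
sub find_partial {
	my ($source, $prefix) = @_;
	my @m = grep { index($_, $prefix) == 0 } @{$source->{mids}};
	return @m ? \@m : undef;
}

# Same loop shape as the fallback in ext_msg_ALL: try each source in
# order, record [ source, matches ], and stop searching further sources
# once the running total reaches PARTIAL_MAX.
sub collect_partial {
	my ($ctx, @sources) = @_;
	for my $src (@sources) {
		my $mids = find_partial($src, $ctx->{mid}) or next;
		push @{$ctx->{partial}}, [ $src, $mids ];
		last if ($ctx->{n_partial} += scalar(@$mids)) >= PARTIAL_MAX;
	}
	return $ctx;
}
```

Because the check runs after each push, the cap stops later sources from being searched but does not trim the list that crossed it, so the total can exceed PARTIAL_MAX.
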

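The refactor makes finalize_partial a thin wrapper: partial_response now returns the response from html_oneshot instead of writing it, so ext_msg can return that response directly when ext_msg_ALL produces one, and only falls back (via `//`) to the delayed-response coderef when ext_msg_ALL returns undef. The sketch below shows the same PSGI pattern with hypothetical names (`build_response`, `fast_path`, `finalize`, `app`) and a plain-text body; it assumes html_oneshot returns a PSGI `[ $code, \@headers, \@body ]` array ref:

```perl
use strict;
use warnings;

# Hypothetical stand-in for partial_response: builds and RETURNS a PSGI
# response ([ status, headers, body ]) instead of writing it.
sub build_response {
	my ($ctx) = @_;
	return [ 404, [ 'Content-Type' => 'text/plain' ],
		[ "no match for <$ctx->{mid}>\n" ] ];
}

# Hypothetical stand-in for ext_msg_ALL: undef means "cannot answer here".
sub fast_path {
	my ($ctx) = @_;
	$ctx->{have_all} or return;
	return build_response($ctx);
}

# Same shape as the new finalize_partial: hand the response to the
# server's write callback.
sub finalize { $_[0]->{-wcb}->(build_response($_[0])) }

# Same shape as ext_msg: an immediate array ref when the fast path
# answers, otherwise a delayed-response code ref.
sub app {
	my ($ctx) = @_;
	return fast_path($ctx) // sub {
		$ctx->{-wcb} = $_[0]; # responder supplied by the server
		finalize($ctx);
	};
}
```

Returning the array ref directly lets the ->ALL path answer synchronously, while the per-inbox scan keeps its streaming coderef.
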
Thread overview: 6+ messages
2020-12-04 22:03 [PATCH 0/5] more ->ALL usage Eric Wong
2020-12-04 22:03 ` [PATCH 1/5] nntp: xref_by_tc: simplify slightly Eric Wong
2020-12-04 22:03 ` [PATCH 2/5] nntp: small speed up for multi-line responses Eric Wong
2020-12-04 22:03 ` [PATCH 3/5] search: remove mdocid export Eric Wong
2020-12-04 22:03 ` [PATCH 4/5] newswww: use ->ALL to avoid O(n) inbox scan Eric Wong
2020-12-04 22:03 ` Eric Wong [this message]

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
