user/dev discussion of public-inbox itself
* [PATCH] search: do not iterate through entire termlist
From: Eric Wong @ 2015-08-28  0:57 UTC (permalink / raw)
  To: meta

A document may have many terms, so blindly iterating through the
entire termlist hurts performance.  Unfortunately, we can't rely on
the order of the termlist just yet, either, so for now we must
restart the search for each prefix until we're ready to bump schema
versions.
---
 lib/PublicInbox/SearchMsg.pm | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/SearchMsg.pm b/lib/PublicInbox/SearchMsg.pm
index a9f3180..4ad8a0c 100644
--- a/lib/PublicInbox/SearchMsg.pm
+++ b/lib/PublicInbox/SearchMsg.pm
@@ -110,7 +110,6 @@ sub references_sorted {
 sub ensure_metadata {
 	my ($self) = @_;
 	my $doc = $self->{doc};
-	my $i = $doc->termlist_begin;
 	my $end = $doc->termlist_end;
 
 	unless (defined $PFX2TERM_RE) {
@@ -118,12 +117,17 @@ sub ensure_metadata {
 		$PFX2TERM_RE = qr/\A($or)/;
 	}
 
-	for (; $i != $end; $i->inc) {
-		my $val = $i->get_termname;
+	while (my ($pfx, $field) = each %PublicInbox::Search::PFX2TERM_RMAP) {
+		# ideally we'd move this out of the loop:
+		my $i = $doc->termlist_begin;
 
-		if ($val =~ s/$PFX2TERM_RE//o) {
-			my $field = $PublicInbox::Search::PFX2TERM_RMAP{$1};
-			$self->{$field} = $val;
+		$i->skip_to($pfx);
+		if ($i != $end) {
+			my $val = $i->get_termname;
+
+			if ($val =~ s/$PFX2TERM_RE//o) {
+				$self->{$field} = $val;
+			}
 		}
 	}
 }
-- 
EW
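
For readers without a Xapian database at hand, the idea can be
sketched in plain Perl.  This is only an illustration under stated
assumptions, not code from the patch: the termlist is modeled as a
bytewise-sorted array, skip_to() here is a hypothetical stand-in for
the iterator method of the same name (it finds the first term greater
than or equal to the target), and %pfx2field is a made-up map standing
in for %PublicInbox::Search::PFX2TERM_RMAP.  Unlike the patch, the
sketch strips the specific prefix being looked up rather than using a
combined regexp; that is a choice made for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical prefix => field map, standing in for
# %PublicInbox::Search::PFX2TERM_RMAP (the real contents may differ).
my %pfx2field = (Q => 'mid', XPATH => 'path');

# Return the index of the first term >= $target in a sorted array,
# or scalar(@$terms) if there is none.  This only mimics what
# skip_to does on a termlist iterator; it is not Xapian code.
sub skip_to {
	my ($terms, $target) = @_;
	my ($lo, $hi) = (0, scalar @$terms);
	while ($lo < $hi) {
		my $mid = int(($lo + $hi) / 2);
		if ($terms->[$mid] lt $target) {
			$lo = $mid + 1;
		} else {
			$hi = $mid;
		}
	}
	$lo;
}

# One lookup per known prefix instead of one regexp match per term.
sub extract_metadata {
	my ($terms) = @_; # must be sorted bytewise
	my %meta;
	while (my ($pfx, $field) = each %pfx2field) {
		# restart from the beginning for every prefix
		my $i = skip_to($terms, $pfx);
		next if $i == @$terms;
		my $val = $terms->[$i];
		# the first term >= $pfx may belong to a different prefix
		$meta{$field} = $val if $val =~ s/\A\Q$pfx\E//;
	}
	\%meta;
}

my @terms = sort qw(Qfoo@example.com XPATHa/b/c hello world zebra);
my $meta = extract_metadata(\@terms);
print "$_=$meta->{$_}\n" for sort keys %$meta;
```

With many body terms and only a handful of prefixes, the number of
lookups depends on the number of prefixes rather than on the size of
the termlist, which is the point of the change above.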



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
