user/dev discussion of public-inbox itself
* [PATCH v2 1/3] SearchIdx.pm: Make indexing search positions optional
  2018-07-18 16:52  6%         ` [PATCH v2 1/3] Making the search indexes optional Eric W. Biederman
@ 2018-07-18 16:53  4%           ` Eric W. Biederman
  0 siblings, 0 replies; 4+ results
From: Eric W. Biederman @ 2018-07-18 16:53 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

About half the size of the Xapian search index turns out to be search
positions.  The search positions are only used in a very narrow set of
queries.  Make the search positions optional so people don't need to
pay the cost of queries they will never make.

This also makes public-inbox more approachable for light hacking as
generating all of the indexes is time-consuming.

The way this is done is to add a method to SearchIdx called index_text
that wraps the call of the term generator method index_text.  The new
index_text method takes care of calling both index_text and
increase_termpos (the two functions that are responsible for position
data).

Then index_users, index_diff_inc, index_old_diff_fn, index_diff, and
index_body are made proper methods that call the new index_text.
Callers of the new index_text are slightly simplified as they no
longer need to call increase_termpos themselves.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/SearchIdx.pm | 94 +++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 45 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 0e0796c12c12..b19618c71508 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -51,6 +51,7 @@ sub new {
 	my $git_dir = $mainrepo;
 	my ($altid, $git);
 	my $version = 1;
+	my $indexlevel = 'full';
 	if (ref $ibx) {
 		$mainrepo = $ibx->{mainrepo};
 		$altid = $ibx->{altid};
@@ -72,6 +73,7 @@ sub new {
 		git => $ibx->git,
 		-altid => $altid,
 		version => $version,
+		indexlevel => $indexlevel,
 	}, $class;
 	$ibx->umask_prepare;
 	if ($version == 1) {
@@ -118,34 +120,42 @@ sub add_val ($$$) {
 	$doc->add_value($col, $num);
 }
 
+sub index_text ($$$$)
+{
+	my ($self, $field, $n, $text) = @_;
+	my $tg = $self->term_generator;
+
+	if ($self->{indexlevel} eq 'full') {
+		$tg->index_text($field, $n, $text);
+		$tg->increase_termpos;
+	} else {
+		$tg->index_text_without_positions($field, $n, $text);
+	}
+}
+
 sub index_users ($$) {
-	my ($tg, $smsg) = @_;
+	my ($self, $smsg) = @_;
 
 	my $from = $smsg->from;
 	my $to = $smsg->to;
 	my $cc = $smsg->cc;
 
-	$tg->index_text($from, 1, 'A'); # A - author
-	$tg->increase_termpos;
-	$tg->index_text($to, 1, 'XTO') if $to ne '';
-	$tg->increase_termpos;
-	$tg->index_text($cc, 1, 'XCC') if $cc ne '';
-	$tg->increase_termpos;
+	$self->index_text($from, 1, 'A'); # A - author
+	$self->index_text($to, 1, 'XTO') if $to ne '';
+	$self->index_text($cc, 1, 'XCC') if $cc ne '';
 }
 
 sub index_diff_inc ($$$$) {
-	my ($tg, $text, $pfx, $xnq) = @_;
+	my ($self, $text, $pfx, $xnq) = @_;
 	if (@$xnq) {
-		$tg->index_text(join("\n", @$xnq), 1, 'XNQ');
-		$tg->increase_termpos;
+		$self->index_text(join("\n", @$xnq), 1, 'XNQ');
 		@$xnq = ();
 	}
-	$tg->index_text($text, 1, $pfx);
-	$tg->increase_termpos;
+	$self->index_text($text, 1, $pfx);
 }
 
 sub index_old_diff_fn {
-	my ($tg, $seen, $fa, $fb, $xnq) = @_;
+	my ($self, $seen, $fa, $fb, $xnq) = @_;
 
 	# no renames or space support for traditional diffs,
 	# find the number of leading common paths to strip:
@@ -156,7 +166,7 @@ sub index_old_diff_fn {
 		$fb = join('/', @fb);
 		if ($fa eq $fb) {
 			unless ($seen->{$fa}++) {
-				index_diff_inc($tg, $fa, 'XDFN', $xnq);
+				$self->index_diff_inc($fa, 'XDFN', $xnq);
 			}
 			return 1;
 		}
@@ -167,22 +177,22 @@ sub index_old_diff_fn {
 }
 
 sub index_diff ($$$) {
-	my ($tg, $lines, $doc) = @_;
+	my ($self, $lines, $doc) = @_;
 	my %seen;
 	my $in_diff;
 	my @xnq;
 	my $xnq = \@xnq;
 	foreach (@$lines) {
 		if ($in_diff && s/^ //) { # diff context
-			index_diff_inc($tg, $_, 'XDFCTX', $xnq);
+			$self->index_diff_inc($_, 'XDFCTX', $xnq);
 		} elsif (/^-- $/) { # email signature begins
 			$in_diff = undef;
 		} elsif (m!^diff --git ("?a/.+) ("?b/.+)\z!) {
 			my ($fa, $fb) = ($1, $2);
 			my $fn = (split('/', git_unquote($fa), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$fn = (split('/', git_unquote($fb), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		# traditional diff:
 		} elsif (m/^diff -(.+) (\S+) (\S+)$/) {
@@ -190,26 +200,26 @@ sub index_diff ($$$) {
 			push @xnq, $_;
 			# only support unified:
 			next unless $opt =~ /[uU]/;
-			$in_diff = index_old_diff_fn($tg, \%seen, $fa, $fb,
+			$in_diff = $self->index_old_diff_fn(\%seen, $fa, $fb,
 							$xnq);
 		} elsif (m!^--- ("?a/.+)!) {
 			my $fn = (split('/', git_unquote($1), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		} elsif (m!^\+\+\+ ("?b/.+)!)  {
 			my $fn = (split('/', git_unquote($1), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		} elsif (/^--- (\S+)/) {
 			$in_diff = $1;
 			push @xnq, $_;
 		} elsif (defined $in_diff && /^\+\+\+ (\S+)/) {
-			$in_diff = index_old_diff_fn($tg, \%seen, $in_diff, $1,
+			$in_diff = $self->index_old_diff_fn(\%seen, $in_diff, $1,
 							$xnq);
 		} elsif ($in_diff && s/^\+//) { # diff added
-			index_diff_inc($tg, $_, 'XDFB', $xnq);
+			$self->index_diff_inc($_, 'XDFB', $xnq);
 		} elsif ($in_diff && s/^-//) { # diff removed
-			index_diff_inc($tg, $_, 'XDFA', $xnq);
+			$self->index_diff_inc($_, 'XDFA', $xnq);
 		} elsif (m!^index ([a-f0-9]+)\.\.([a-f0-9]+)!) {
 			my ($ba, $bb) = ($1, $2);
 			index_git_blob_id($doc, 'XDFPRE', $ba);
@@ -219,7 +229,7 @@ sub index_diff ($$$) {
 			# traditional diff w/o -p
 		} elsif (/^@@ (?:\S+) (?:\S+) @@\s*(\S+.*)$/) {
 			# hunk header context
-			index_diff_inc($tg, $1, 'XDFHH', $xnq);
+			$self->index_diff_inc($1, 'XDFHH', $xnq);
 		# ignore the following lines:
 		} elsif (/^(?:dis)similarity index/ ||
 				/^(?:old|new) mode/ ||
@@ -238,25 +248,23 @@ sub index_diff ($$$) {
 		}
 	}
 
-	$tg->index_text(join("\n", @xnq), 1, 'XNQ');
-	$tg->increase_termpos;
+	$self->index_text(join("\n", @xnq), 1, 'XNQ');
 }
 
 sub index_body ($$$) {
-	my ($tg, $lines, $doc) = @_;
+	my ($self, $lines, $doc) = @_;
 	my $txt = join("\n", @$lines);
 	if ($doc) {
 		# does it look like a diff?
 		if ($txt =~ /^(?:diff|---|\+\+\+) /ms) {
 			$txt = undef;
-			index_diff($tg, $lines, $doc);
+			$self->index_diff($lines, $doc);
 		} else {
-			$tg->index_text($txt, 1, 'XNQ');
+			$self->index_text($txt, 1, 'XNQ');
 		}
 	} else {
-		$tg->index_text($txt, 0, 'XQUOT');
+		$self->index_text($txt, 0, 'XQUOT');
 	}
-	$tg->increase_termpos;
 	@$lines = ();
 }
 
@@ -284,18 +292,15 @@ sub add_message {
 		my $tg = $self->term_generator;
 
 		$tg->set_document($doc);
-		$tg->index_text($subj, 1, 'S') if $subj;
-		$tg->increase_termpos;
-
-		index_users($tg, $smsg);
+		$self->index_text($subj, 1, 'S') if $subj;
+		$self->index_users($smsg);
 
 		msg_iter($mime, sub {
 			my ($part, $depth, @idx) = @{$_[0]};
 			my $ct = $part->content_type || 'text/plain';
 			my $fn = $part->filename;
 			if (defined $fn && $fn ne '') {
-				$tg->index_text($fn, 1, 'XFN');
-				$tg->increase_termpos;
+				$self->index_text($fn, 1, 'XFN');
 			}
 
 			return if $ct =~ m!\btext/x?html\b!i;
@@ -318,27 +323,26 @@ sub add_message {
 			my @lines = split(/\n/, $body);
 			while (defined(my $l = shift @lines)) {
 				if ($l =~ /^>/) {
-					index_body($tg, \@orig, $doc) if @orig;
+					$self->index_body(\@orig, $doc) if @orig;
 					push @quot, $l;
 				} else {
-					index_body($tg, \@quot, 0) if @quot;
+					$self->index_body(\@quot, 0) if @quot;
 					push @orig, $l;
 				}
 			}
-			index_body($tg, \@quot, 0) if @quot;
-			index_body($tg, \@orig, $doc) if @orig;
+			$self->index_body(\@quot, 0) if @quot;
+			$self->index_body(\@orig, $doc) if @orig;
 		});
 
 		foreach my $mid (@$mids) {
-			$tg->index_text($mid, 1, 'XM');
+			$self->index_text($mid, 1, 'XM');
 
 			# because too many Message-IDs are prefixed with
 			# "Pine.LNX."...
 			if ($mid =~ /\w{12,}/) {
 				my @long = ($mid =~ /(\w{3,}+)/g);
-				$tg->index_text(join(' ', @long), 1, 'XM');
+				$self->index_text(join(' ', @long), 1, 'XM');
 			}
-			$tg->increase_termpos;
 		}
 		$smsg->{to} = $smsg->{cc} = '';
 		PublicInbox::OverIdx::parse_references($smsg, $mid0, $mids);
-- 
2.17.1
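
The wrapper described in the commit message above can be exercised
without Xapian.  What follows is a minimal sketch, not the project's
code: MockTermGenerator and MockIdx are hypothetical stand-ins that only
record which term generator methods get called, and the argument names
$text, $wdf_inc and $prefix are assumptions based on the order the
callers in the diff pass them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Xapian term generator; it records calls only.
package MockTermGenerator;
sub new { bless { calls => [] }, shift }
sub index_text {
	my $self = shift;
	push @{$self->{calls}}, ['index_text', @_];
}
sub increase_termpos {
	my $self = shift;
	push @{$self->{calls}}, ['increase_termpos'];
}
sub index_text_without_positions {
	my $self = shift;
	push @{$self->{calls}}, ['index_text_without_positions', @_];
}

# Hypothetical stand-in for SearchIdx, holding only what the wrapper needs.
package MockIdx;
sub new {
	my ($class, $indexlevel) = @_;
	bless { indexlevel => $indexlevel, tg => MockTermGenerator->new }, $class;
}
sub term_generator { $_[0]{tg} }

# Same branching as the patch: positions are recorded only at level 'full'.
sub index_text {
	my ($self, $text, $wdf_inc, $prefix) = @_;
	my $tg = $self->term_generator;
	if ($self->{indexlevel} eq 'full') {
		$tg->index_text($text, $wdf_inc, $prefix);
		$tg->increase_termpos;
	} else {
		$tg->index_text_without_positions($text, $wdf_inc, $prefix);
	}
}

package main;

# Index one author field at the given level and list the recorded calls.
sub calls_for {
	my ($level) = @_;
	my $idx = MockIdx->new($level);
	$idx->index_text('alice@example.com', 1, 'A');
	return join(',', map { $_->[0] } @{$idx->term_generator->{calls}});
}

print calls_for('full'), "\n";
print calls_for('medium'), "\n";
```

With level 'full' the mock records index_text followed by
increase_termpos; with any other level it records only
index_text_without_positions, which is the path that avoids storing
position data.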



* [PATCH v2 1/3] Making the search indexes optional
  @ 2018-07-18 16:52  6%         ` Eric W. Biederman
  2018-07-18 16:53  4%           ` [PATCH v2 1/3] SearchIdx.pm: Make indexing search positions optional Eric W. Biederman
  0 siblings, 1 reply; 4+ results
From: Eric W. Biederman @ 2018-07-18 16:52 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


This is my respin of these patches.  I have used the levels:
full, medium, basic.

I think "basic" conveys that the level is OK to run with and that you
can expect most things to work.  That reads better than "minimal",
which makes it feel like you don't know what will fail.

I have tweaked the reindex tests to run with all three levels
so that these code paths at least get exercised.

Eric W. Biederman (3):
      SearchIdx.pm: Make indexing search positions optional
      SearchIdx: Add the mechanism for making all Xapian indexing optional
      SearchIdx: Allow the amount of indexing be configured

 lib/PublicInbox/Config.pm    |   2 +-
 lib/PublicInbox/SearchIdx.pm | 256 +++++++++++++++++++++++--------------------
 t/v1reindex.t                |  43 +++++++-
 t/v2reindex.t                |  40 +++++++
 4 files changed, 220 insertions(+), 121 deletions(-)
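
One way to picture the three levels is as a small feature table.  This
is a sketch under stated assumptions, not code from the series: the
mapping below (full = Xapian with positions, medium = Xapian without
positions, basic = no Xapian indexing) is inferred from the patch
descriptions in this thread, and %LEVELS and level_features are
hypothetical names.

```perl
use strict;
use warnings;

# Assumed meaning of each level, inferred from the patch descriptions.
my %LEVELS = (
	full   => { xapian => 1, positions => 1 },
	medium => { xapian => 1, positions => 0 },
	basic  => { xapian => 0, positions => 0 },
);

# Look up the feature set for a level, rejecting unknown names early.
sub level_features {
	my ($level) = @_;
	my $f = $LEVELS{$level} or die "unknown index level: $level\n";
	return $f;
}

for my $level (qw(full medium basic)) {
	my $f = level_features($level);
	printf "%-6s xapian=%d positions=%d\n",
		$level, $f->{xapian}, $f->{positions};
}
```

Rejecting unknown level names up front keeps a typo in a configuration
value from silently falling through to a reduced index.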



* [PATCH 1/3] SearchIdx.pm: Make indexing search positions optional
  2018-07-17 23:27  7% [PATCH 0/3] Making the search indexes optional Eric W. Biederman
@ 2018-07-17 23:30  5% ` Eric W. Biederman
    1 sibling, 0 replies; 4+ results
From: Eric W. Biederman @ 2018-07-17 23:30 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

About half the size of the Xapian search index turns out to be search
positions.  The search positions are only used in a very narrow set of
queries.  Make the search positions optional so people don't need to
pay the cost of queries they will never make.

This also makes public-inbox more approachable for light hacking as
generating all of the indexes is time-consuming.

The way this is done is to add a method to SearchIdx called index_text
that wraps the call of the term generator method index_text.  The new
index_text method takes care of calling both index_text and
increase_termpos (the two functions that are responsible for position
data).

Then index_users, index_diff_inc, index_old_diff_fn, index_diff, and
index_body are made proper methods that call the new index_text.
Callers of the new index_text are slightly simplified as they no
longer need to call increase_termpos themselves.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/SearchIdx.pm | 94 +++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 45 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 0e0796c12c12..cc92c389a152 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -51,6 +51,7 @@ sub new {
 	my $git_dir = $mainrepo;
 	my ($altid, $git);
 	my $version = 1;
+	my $indexlevel = 'positions';
 	if (ref $ibx) {
 		$mainrepo = $ibx->{mainrepo};
 		$altid = $ibx->{altid};
@@ -72,6 +73,7 @@ sub new {
 		git => $ibx->git,
 		-altid => $altid,
 		version => $version,
+		indexlevel => $indexlevel,
 	}, $class;
 	$ibx->umask_prepare;
 	if ($version == 1) {
@@ -118,34 +120,42 @@ sub add_val ($$$) {
 	$doc->add_value($col, $num);
 }
 
+sub index_text ($$$$)
+{
+	my ($self, $field, $n, $text) = @_;
+	my $tg = $self->term_generator;
+
+	if ($self->{indexlevel} eq 'positions') {
+		$tg->index_text($field, $n, $text);
+		$tg->increase_termpos;
+	} else {
+		$tg->index_text_without_positions($field, $n, $text);
+	}
+}
+
 sub index_users ($$) {
-	my ($tg, $smsg) = @_;
+	my ($self, $smsg) = @_;
 
 	my $from = $smsg->from;
 	my $to = $smsg->to;
 	my $cc = $smsg->cc;
 
-	$tg->index_text($from, 1, 'A'); # A - author
-	$tg->increase_termpos;
-	$tg->index_text($to, 1, 'XTO') if $to ne '';
-	$tg->increase_termpos;
-	$tg->index_text($cc, 1, 'XCC') if $cc ne '';
-	$tg->increase_termpos;
+	$self->index_text($from, 1, 'A'); # A - author
+	$self->index_text($to, 1, 'XTO') if $to ne '';
+	$self->index_text($cc, 1, 'XCC') if $cc ne '';
 }
 
 sub index_diff_inc ($$$$) {
-	my ($tg, $text, $pfx, $xnq) = @_;
+	my ($self, $text, $pfx, $xnq) = @_;
 	if (@$xnq) {
-		$tg->index_text(join("\n", @$xnq), 1, 'XNQ');
-		$tg->increase_termpos;
+		$self->index_text(join("\n", @$xnq), 1, 'XNQ');
 		@$xnq = ();
 	}
-	$tg->index_text($text, 1, $pfx);
-	$tg->increase_termpos;
+	$self->index_text($text, 1, $pfx);
 }
 
 sub index_old_diff_fn {
-	my ($tg, $seen, $fa, $fb, $xnq) = @_;
+	my ($self, $seen, $fa, $fb, $xnq) = @_;
 
 	# no renames or space support for traditional diffs,
 	# find the number of leading common paths to strip:
@@ -156,7 +166,7 @@ sub index_old_diff_fn {
 		$fb = join('/', @fb);
 		if ($fa eq $fb) {
 			unless ($seen->{$fa}++) {
-				index_diff_inc($tg, $fa, 'XDFN', $xnq);
+				$self->index_diff_inc($fa, 'XDFN', $xnq);
 			}
 			return 1;
 		}
@@ -167,22 +177,22 @@ sub index_old_diff_fn {
 }
 
 sub index_diff ($$$) {
-	my ($tg, $lines, $doc) = @_;
+	my ($self, $lines, $doc) = @_;
 	my %seen;
 	my $in_diff;
 	my @xnq;
 	my $xnq = \@xnq;
 	foreach (@$lines) {
 		if ($in_diff && s/^ //) { # diff context
-			index_diff_inc($tg, $_, 'XDFCTX', $xnq);
+			$self->index_diff_inc($_, 'XDFCTX', $xnq);
 		} elsif (/^-- $/) { # email signature begins
 			$in_diff = undef;
 		} elsif (m!^diff --git ("?a/.+) ("?b/.+)\z!) {
 			my ($fa, $fb) = ($1, $2);
 			my $fn = (split('/', git_unquote($fa), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$fn = (split('/', git_unquote($fb), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		# traditional diff:
 		} elsif (m/^diff -(.+) (\S+) (\S+)$/) {
@@ -190,26 +200,26 @@ sub index_diff ($$$) {
 			push @xnq, $_;
 			# only support unified:
 			next unless $opt =~ /[uU]/;
-			$in_diff = index_old_diff_fn($tg, \%seen, $fa, $fb,
+			$in_diff = $self->index_old_diff_fn(\%seen, $fa, $fb,
 							$xnq);
 		} elsif (m!^--- ("?a/.+)!) {
 			my $fn = (split('/', git_unquote($1), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		} elsif (m!^\+\+\+ ("?b/.+)!)  {
 			my $fn = (split('/', git_unquote($1), 2))[1];
-			$seen{$fn}++ or index_diff_inc($tg, $fn, 'XDFN', $xnq);
+			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		} elsif (/^--- (\S+)/) {
 			$in_diff = $1;
 			push @xnq, $_;
 		} elsif (defined $in_diff && /^\+\+\+ (\S+)/) {
-			$in_diff = index_old_diff_fn($tg, \%seen, $in_diff, $1,
+			$in_diff = $self->index_old_diff_fn(\%seen, $in_diff, $1,
 							$xnq);
 		} elsif ($in_diff && s/^\+//) { # diff added
-			index_diff_inc($tg, $_, 'XDFB', $xnq);
+			$self->index_diff_inc($_, 'XDFB', $xnq);
 		} elsif ($in_diff && s/^-//) { # diff removed
-			index_diff_inc($tg, $_, 'XDFA', $xnq);
+			$self->index_diff_inc($_, 'XDFA', $xnq);
 		} elsif (m!^index ([a-f0-9]+)\.\.([a-f0-9]+)!) {
 			my ($ba, $bb) = ($1, $2);
 			index_git_blob_id($doc, 'XDFPRE', $ba);
@@ -219,7 +229,7 @@ sub index_diff ($$$) {
 			# traditional diff w/o -p
 		} elsif (/^@@ (?:\S+) (?:\S+) @@\s*(\S+.*)$/) {
 			# hunk header context
-			index_diff_inc($tg, $1, 'XDFHH', $xnq);
+			$self->index_diff_inc($1, 'XDFHH', $xnq);
 		# ignore the following lines:
 		} elsif (/^(?:dis)similarity index/ ||
 				/^(?:old|new) mode/ ||
@@ -238,25 +248,23 @@ sub index_diff ($$$) {
 		}
 	}
 
-	$tg->index_text(join("\n", @xnq), 1, 'XNQ');
-	$tg->increase_termpos;
+	$self->index_text(join("\n", @xnq), 1, 'XNQ');
 }
 
 sub index_body ($$$) {
-	my ($tg, $lines, $doc) = @_;
+	my ($self, $lines, $doc) = @_;
 	my $txt = join("\n", @$lines);
 	if ($doc) {
 		# does it look like a diff?
 		if ($txt =~ /^(?:diff|---|\+\+\+) /ms) {
 			$txt = undef;
-			index_diff($tg, $lines, $doc);
+			$self->index_diff($lines, $doc);
 		} else {
-			$tg->index_text($txt, 1, 'XNQ');
+			$self->index_text($txt, 1, 'XNQ');
 		}
 	} else {
-		$tg->index_text($txt, 0, 'XQUOT');
+		$self->index_text($txt, 0, 'XQUOT');
 	}
-	$tg->increase_termpos;
 	@$lines = ();
 }
 
@@ -284,18 +292,15 @@ sub add_message {
 		my $tg = $self->term_generator;
 
 		$tg->set_document($doc);
-		$tg->index_text($subj, 1, 'S') if $subj;
-		$tg->increase_termpos;
-
-		index_users($tg, $smsg);
+		$self->index_text($subj, 1, 'S') if $subj;
+		$self->index_users($smsg);
 
 		msg_iter($mime, sub {
 			my ($part, $depth, @idx) = @{$_[0]};
 			my $ct = $part->content_type || 'text/plain';
 			my $fn = $part->filename;
 			if (defined $fn && $fn ne '') {
-				$tg->index_text($fn, 1, 'XFN');
-				$tg->increase_termpos;
+				$self->index_text($fn, 1, 'XFN');
 			}
 
 			return if $ct =~ m!\btext/x?html\b!i;
@@ -318,27 +323,26 @@ sub add_message {
 			my @lines = split(/\n/, $body);
 			while (defined(my $l = shift @lines)) {
 				if ($l =~ /^>/) {
-					index_body($tg, \@orig, $doc) if @orig;
+					$self->index_body(\@orig, $doc) if @orig;
 					push @quot, $l;
 				} else {
-					index_body($tg, \@quot, 0) if @quot;
+					$self->index_body(\@quot, 0) if @quot;
 					push @orig, $l;
 				}
 			}
-			index_body($tg, \@quot, 0) if @quot;
-			index_body($tg, \@orig, $doc) if @orig;
+			$self->index_body(\@quot, 0) if @quot;
+			$self->index_body(\@orig, $doc) if @orig;
 		});
 
 		foreach my $mid (@$mids) {
-			$tg->index_text($mid, 1, 'XM');
+			$self->index_text($mid, 1, 'XM');
 
 			# because too many Message-IDs are prefixed with
 			# "Pine.LNX."...
 			if ($mid =~ /\w{12,}/) {
 				my @long = ($mid =~ /(\w{3,}+)/g);
-				$tg->index_text(join(' ', @long), 1, 'XM');
+				$self->index_text(join(' ', @long), 1, 'XM');
 			}
-			$tg->increase_termpos;
 		}
 		$smsg->{to} = $smsg->{cc} = '';
 		PublicInbox::OverIdx::parse_references($smsg, $mid0, $mids);
-- 
2.17.1



* [PATCH 0/3] Making the search indexes optional
@ 2018-07-17 23:27  7% Eric W. Biederman
  2018-07-17 23:30  5% ` [PATCH 1/3] SearchIdx.pm: Make indexing search positions optional Eric W. Biederman
    0 siblings, 2 replies; 4+ results
From: Eric W. Biederman @ 2018-07-17 23:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Here is the code to make the Xapian search indexes optional.

The first patch makes the term position database optional.
The second patch makes anything in Xapian optional.
Finally the last patch adds a config option.

At the end of the day it all looks simple and straightforward, so I feel
good about the code.  At the very least it looks like a good starting
point.

What this code does not do is make the Xapian code modules optional, as
that is more involved and there is not much reward for it.  With a
little cleverness in moving code around, that is probably possible in a
follow-up change.

Eric W. Biederman (3):
      SearchIdx.pm: Make indexing search positions optional
      SearchIdx: Add the mechanism for making all Xapian indexing optional
      SearchIdx: Allow the amount of indexing be configured

 lib/PublicInbox/Config.pm    |   2 +-
 lib/PublicInbox/SearchIdx.pm | 255 +++++++++++++++++++++++--------------------
 2 files changed, 137 insertions(+), 120 deletions(-)




2018-07-17 23:27  7% [PATCH 0/3] Making the search indexes optional Eric W. Biederman
2018-07-17 23:30  5% ` [PATCH 1/3] SearchIdx.pm: Make indexing search positions optional Eric W. Biederman
2018-07-17 23:30     ` [PATCH 3/3] SearchIdx: Allow the amount of indexing be configured Eric W. Biederman
2018-07-18 10:22       ` Eric Wong
2018-07-18 16:00         ` Eric W. Biederman
2018-07-18 16:31           ` Eric Wong
2018-07-18 16:52  6%         ` [PATCH v2 1/3] Making the search indexes optional Eric W. Biederman
2018-07-18 16:53  4%           ` [PATCH v2 1/3] SearchIdx.pm: Make indexing search positions optional Eric W. Biederman

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
