user/dev discussion of public-inbox itself
* [PATCH 0/6] searchidx: minor fix and some cleanups
@ 2020-01-03  8:45 Eric Wong
  2020-01-03  8:45 ` [PATCH 1/6] searchidx: index_diff: allow /^$/ line as diff context Eric Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Eric Wong @ 2020-01-03  8:45 UTC
  To: meta

1/6 is a real bug fix, 4/6 is a nice simplification,
and the rest make the code easier to read when I haven't looked
at the indexing code in a while.

Xapian indexing is still expensive, though...

Eric Wong (6):
  searchidx: index_diff: allow /^$/ line as diff context
  searchidx: split off index_xapian for msg_iter
  searchidx: add_message: fix and make use of prototypes
  searchidx: simplify quote-splitting in index_body
  searchidx: index_text: use Xapian parameter names
  searchidx: remove_message: pedantic fix for v1

 lib/PublicInbox/SearchIdx.pm | 177 ++++++++++++++++-------------------
 1 file changed, 83 insertions(+), 94 deletions(-)



* [PATCH 1/6] searchidx: index_diff: allow /^$/ line as diff context
  2020-01-03  8:45 [PATCH 0/6] searchidx: minor fix and some cleanups Eric Wong
@ 2020-01-03  8:45 ` Eric Wong
  2020-01-03  8:45 ` [PATCH 2/6] searchidx: split off index_xapian for msg_iter Eric Wong
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2020-01-03  8:45 UTC
  To: meta

As discovered by solver bug hunting, "git apply" also handles
the case where blank lines w/o leading space are treated as diff
context, apparently because GNU diff once did it:

https://public-inbox.org/git/b507b465f7831612b9d9fc643e3e5218b64e5bfa/s/
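
To illustrate the behavior described above, here is a minimal
standalone sketch.  diff_context_lines is a hypothetical helper,
not part of SearchIdx.pm; it only assumes that a hunk header
starts a diff and that "-- " (an email signature) ends it:

```perl
use strict;
use warnings;

# Hypothetical helper (not from SearchIdx.pm): collect the context
# lines of a unified diff, treating a bare empty line as an empty
# context line instead of as the end of the diff.
sub diff_context_lines {
	my (@lines) = @_;
	my ($in_diff, @ctx);
	foreach my $l (@lines) {
		if ($l =~ /^@@ /) {
			$in_diff = 1; # hunk header: we are inside a diff
		} elsif ($l =~ /^-- $/) {
			$in_diff = undef; # email signature ends the diff
		} elsif ($in_diff && $l =~ s/^ //) {
			push @ctx, $l; # normal context line
		} elsif ($in_diff && $l eq '') {
			push @ctx, ''; # leading space was stripped; stay in diff
		}
	}
	@ctx;
}

my @ctx = diff_context_lines('@@ -1,3 +1,3 @@', ' a', '', '-b', '+c', ' d');
print scalar(@ctx), " context lines\n";
```

The line after the blank one is still counted as context, which
is the point of the fix.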
---
 lib/PublicInbox/SearchIdx.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 21ab8119..4cfbc4aa 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -255,7 +255,9 @@ sub index_diff ($$$) {
 				/^Binary files .* differ/) {
 			push @xnq, $_;
 		} elsif ($_ eq '') {
-			$in_diff = undef;
+			# possible to be in diff context, some mail may be
+			# stripped by MUA or even GNU diff(1).  "git apply"
+			# treats a bare "\n" as diff context, too
 		} else {
 			push @xnq, $_;
 			warn "non-diff line: $_\n" if DEBUG && $_ ne '';


* [PATCH 2/6] searchidx: split off index_xapian for msg_iter
  2020-01-03  8:45 [PATCH 0/6] searchidx: minor fix and some cleanups Eric Wong
  2020-01-03  8:45 ` [PATCH 1/6] searchidx: index_diff: allow /^$/ line as diff context Eric Wong
@ 2020-01-03  8:45 ` Eric Wong
  2020-01-03  8:46 ` [PATCH 3/6] searchidx: add_message: fix and make use of prototypes Eric Wong
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2020-01-03  8:45 UTC
  To: meta

This ought to save some memory, but it's probably lost in the
noise given the cost of indexing.  Regardless, it still reduces
the indentation level and makes future changes easier to read.
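
The calling convention can be sketched on its own, roughly as
follows.  each_part is a hypothetical stand-in for msg_iter, and
count_lines is an illustrative callback; only the shape of the
arguments (an arrayref for the part, an arrayref for the caller's
state) is taken from the patch below:

```perl
use strict;
use warnings;

# Hypothetical stand-in for msg_iter: calls
# $cb->([ $part, $depth ], $arg) once per part.
sub each_part {
	my ($parts, $cb, $arg) = @_;
	$cb->([ $_, 1 ], $arg) for @$parts;
}

# Named callback: state arrives via $_[1] instead of being
# captured by an anonymous sub created on every call.
sub count_lines { # each_part callback
	my ($part, $depth) = @{$_[0]};
	my ($state) = @{$_[1]};
	$state->{lines} += ($part =~ tr/\n//);
}

my %state = (lines => 0);
each_part([ "a\nb\n", "c\n" ], \&count_lines, [ \%state ]);
print "$state{lines}\n";
```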
---
 lib/PublicInbox/SearchIdx.pm | 54 +++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 4cfbc4aa..5065974c 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -285,6 +285,33 @@ sub index_body ($$$) {
 	@$lines = ();
 }
 
+sub index_xapian { # msg_iter callback
+	my ($part, $depth, @idx) = @{$_[0]};
+	my ($self, $doc) = @{$_[1]};
+	my $ct = $part->content_type || 'text/plain';
+	my $fn = $part->filename;
+	if (defined $fn && $fn ne '') {
+		$self->index_text($fn, 1, 'XFN');
+	}
+
+	my ($s, undef) = msg_part_text($part, $ct);
+	defined $s or return;
+
+	my (@orig, @quot);
+	my @lines = split(/\n/, $s);
+	while (defined(my $l = shift @lines)) {
+		if ($l =~ /^>/) {
+			$self->index_body(\@orig, $doc) if @orig;
+			push @quot, $l;
+		} else {
+			$self->index_body(\@quot, 0) if @quot;
+			push @orig, $l;
+		}
+	}
+	$self->index_body(\@quot, 0) if @quot;
+	$self->index_body(\@orig, $doc) if @orig;
+}
+
 sub add_xapian ($$$$$) {
 	my ($self, $mime, $num, $oid, $mids, $mid0) = @_;
 	my $smsg = PublicInbox::SearchMsg->new($mime);
@@ -303,32 +330,7 @@ sub add_xapian ($$$$$) {
 	$self->index_text($subj, 1, 'S') if $subj;
 	$self->index_users($smsg);
 
-	msg_iter($mime, sub {
-		my ($part, $depth, @idx) = @{$_[0]};
-		my $ct = $part->content_type || 'text/plain';
-		my $fn = $part->filename;
-		if (defined $fn && $fn ne '') {
-			$self->index_text($fn, 1, 'XFN');
-		}
-
-		my ($s, undef) = msg_part_text($part, $ct);
-		defined $s or return;
-
-		my (@orig, @quot);
-		my @lines = split(/\n/, $s);
-		while (defined(my $l = shift @lines)) {
-			if ($l =~ /^>/) {
-				$self->index_body(\@orig, $doc) if @orig;
-				push @quot, $l;
-			} else {
-				$self->index_body(\@quot, 0) if @quot;
-				push @orig, $l;
-			}
-		}
-		$self->index_body(\@quot, 0) if @quot;
-		$self->index_body(\@orig, $doc) if @orig;
-	});
-
+	msg_iter($mime, \&index_xapian, [ $self, $doc ]);
 	foreach my $mid (@$mids) {
 		$self->index_text($mid, 1, 'XM');
 


* [PATCH 3/6] searchidx: add_message: fix and make use of prototypes
  2020-01-03  8:45 [PATCH 0/6] searchidx: minor fix and some cleanups Eric Wong
  2020-01-03  8:45 ` [PATCH 1/6] searchidx: index_diff: allow /^$/ line as diff context Eric Wong
  2020-01-03  8:45 ` [PATCH 2/6] searchidx: split off index_xapian for msg_iter Eric Wong
@ 2020-01-03  8:46 ` Eric Wong
  2020-01-03  8:46 ` [PATCH 4/6] searchidx: simplify quote-splitting in index_body Eric Wong
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2020-01-03  8:46 UTC
  To: meta

Procedural function calls allow prototype checking, and
our add_message prototype was totally wrong to begin with.
Convert most of the "$self->index_*" calls to "index_*($self".

While we're at it, use "//=" to avoid some "unless" statements.
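
A small standalone sketch of both points, using hypothetical subs
(add3, defined_or, logical_or) that are not in SearchIdx.pm.  The
string eval is only there so the compile-time error can be caught
and printed in a demo; method calls such as "$obj->add3(...)" are
not checked against prototypes at all:

```perl
use strict;
use warnings;

sub add3 ($$$) { # prototype: exactly three scalar arguments
	my ($x, $y, $z) = @_;
	$x + $y + $z;
}

sub defined_or { my ($v) = @_; $v //= 42; $v } # assigns only if undef
sub logical_or { my ($v) = @_; $v ||= 42; $v } # assigns on any false value

# a procedural call with the wrong argument count fails to compile
my $ok = eval 'add3(1, 2); 1';
print "rejected at compile time: $@" unless $ok;

print add3(1, 2, 3), "\n";
print defined_or(0), ' ', logical_or(0), "\n";
```

"//=" keeps a defined-but-false value such as 0, which matters
for things like article numbers, whereas "||=" would replace it.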
---
 lib/PublicInbox/SearchIdx.pm | 121 +++++++++++++++++------------------
 1 file changed, 59 insertions(+), 62 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 5065974c..62e836e0 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -133,10 +133,19 @@ sub add_val ($$$) {
 	$doc->add_value($col, $num);
 }
 
-sub index_text ($$$$)
-{
+sub term_generator ($) { # write-only
+	my ($self) = @_;
+
+	$self->{term_generator} //= do {
+		my $tg = $X->{TermGenerator}->new;
+		$tg->set_stemmer($self->stemmer);
+		$tg;
+	}
+}
+
+sub index_text ($$$$) {
 	my ($self, $field, $n, $text) = @_;
-	my $tg = $self->term_generator;
+	my $tg = term_generator($self);
 
 	if ($self->{indexlevel} eq 'full') {
 		$tg->index_text($field, $n, $text);
@@ -153,18 +162,18 @@ sub index_users ($$) {
 	my $to = $smsg->to;
 	my $cc = $smsg->cc;
 
-	$self->index_text($from, 1, 'A'); # A - author
-	$self->index_text($to, 1, 'XTO') if $to ne '';
-	$self->index_text($cc, 1, 'XCC') if $cc ne '';
+	index_text($self, $from, 1, 'A'); # A - author
+	index_text($self, $to, 1, 'XTO') if $to ne '';
+	index_text($self, $cc, 1, 'XCC') if $cc ne '';
 }
 
 sub index_diff_inc ($$$$) {
 	my ($self, $text, $pfx, $xnq) = @_;
 	if (@$xnq) {
-		$self->index_text(join("\n", @$xnq), 1, 'XNQ');
+		index_text($self, join("\n", @$xnq), 1, 'XNQ');
 		@$xnq = ();
 	}
-	$self->index_text($text, 1, $pfx);
+	index_text($self, $text, 1, $pfx);
 }
 
 sub index_old_diff_fn {
@@ -179,7 +188,7 @@ sub index_old_diff_fn {
 		$fb = join('/', @fb);
 		if ($fa eq $fb) {
 			unless ($seen->{$fa}++) {
-				$self->index_diff_inc($fa, 'XDFN', $xnq);
+				index_diff_inc($self, $fa, 'XDFN', $xnq);
 			}
 			return 1;
 		}
@@ -197,15 +206,15 @@ sub index_diff ($$$) {
 	my $xnq = \@xnq;
 	foreach (@$lines) {
 		if ($in_diff && s/^ //) { # diff context
-			$self->index_diff_inc($_, 'XDFCTX', $xnq);
+			index_diff_inc($self, $_, 'XDFCTX', $xnq);
 		} elsif (/^-- $/) { # email signature begins
 			$in_diff = undef;
 		} elsif (m!^diff --git ("?a/.+) ("?b/.+)\z!) {
 			my ($fa, $fb) = ($1, $2);
 			my $fn = (split('/', git_unquote($fa), 2))[1];
-			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
+			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);
 			$fn = (split('/', git_unquote($fb), 2))[1];
-			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
+			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		# traditional diff:
 		} elsif (m/^diff -(.+) (\S+) (\S+)$/) {
@@ -213,28 +222,28 @@ sub index_diff ($$$) {
 			push @xnq, $_;
 			# only support unified:
 			next unless $opt =~ /[uU]/;
-			$in_diff = $self->index_old_diff_fn(\%seen, $fa, $fb,
+			$in_diff = index_old_diff_fn($self, \%seen, $fa, $fb,
 							$xnq);
 		} elsif (m!^--- ("?a/.+)!) {
 			my $fn = $1;
 			$fn = (split('/', git_unquote($fn), 2))[1];
-			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
+			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		} elsif (m!^\+\+\+ ("?b/.+)!)  {
 			my $fn = $1;
 			$fn = (split('/', git_unquote($fn), 2))[1];
-			$seen{$fn}++ or $self->index_diff_inc($fn, 'XDFN', $xnq);
+			$seen{$fn}++ or index_diff_inc($self, $fn, 'XDFN', $xnq);
 			$in_diff = 1;
 		} elsif (/^--- (\S+)/) {
 			$in_diff = $1;
 			push @xnq, $_;
 		} elsif (defined $in_diff && /^\+\+\+ (\S+)/) {
-			$in_diff = $self->index_old_diff_fn(\%seen, $in_diff, $1,
-							$xnq);
+			$in_diff = index_old_diff_fn($self, \%seen, $in_diff,
+							$1, $xnq);
 		} elsif ($in_diff && s/^\+//) { # diff added
-			$self->index_diff_inc($_, 'XDFB', $xnq);
+			index_diff_inc($self, $_, 'XDFB', $xnq);
 		} elsif ($in_diff && s/^-//) { # diff removed
-			$self->index_diff_inc($_, 'XDFA', $xnq);
+			index_diff_inc($self, $_, 'XDFA', $xnq);
 		} elsif (m!^index ([a-f0-9]+)\.\.([a-f0-9]+)!) {
 			my ($ba, $bb) = ($1, $2);
 			index_git_blob_id($doc, 'XDFPRE', $ba);
@@ -244,7 +253,7 @@ sub index_diff ($$$) {
 			# traditional diff w/o -p
 		} elsif (/^@@ (?:\S+) (?:\S+) @@\s*(\S+.*)$/) {
 			# hunk header context
-			$self->index_diff_inc($1, 'XDFHH', $xnq);
+			index_diff_inc($self, $1, 'XDFHH', $xnq);
 		# ignore the following lines:
 		} elsif (/^(?:dis)similarity index/ ||
 				/^(?:old|new) mode/ ||
@@ -265,7 +274,7 @@ sub index_diff ($$$) {
 		}
 	}
 
-	$self->index_text(join("\n", @xnq), 1, 'XNQ');
+	index_text($self, join("\n", @xnq), 1, 'XNQ');
 }
 
 sub index_body ($$$) {
@@ -275,12 +284,12 @@ sub index_body ($$$) {
 		# does it look like a diff?
 		if ($txt =~ /^(?:diff|---|\+\+\+) /ms) {
 			$txt = undef;
-			$self->index_diff($lines, $doc);
+			index_diff($self, $lines, $doc);
 		} else {
-			$self->index_text($txt, 1, 'XNQ');
+			index_text($self, $txt, 1, 'XNQ');
 		}
 	} else {
-		$self->index_text($txt, 0, 'XQUOT');
+		index_text($self, $txt, 0, 'XQUOT');
 	}
 	@$lines = ();
 }
@@ -291,7 +300,7 @@ sub index_xapian { # msg_iter callback
 	my $ct = $part->content_type || 'text/plain';
 	my $fn = $part->filename;
 	if (defined $fn && $fn ne '') {
-		$self->index_text($fn, 1, 'XFN');
+		index_text($self, $fn, 1, 'XFN');
 	}
 
 	my ($s, undef) = msg_part_text($part, $ct);
@@ -301,18 +310,18 @@ sub index_xapian { # msg_iter callback
 	my @lines = split(/\n/, $s);
 	while (defined(my $l = shift @lines)) {
 		if ($l =~ /^>/) {
-			$self->index_body(\@orig, $doc) if @orig;
+			index_body($self, \@orig, $doc) if @orig;
 			push @quot, $l;
 		} else {
-			$self->index_body(\@quot, 0) if @quot;
+			index_body($self, \@quot, 0) if @quot;
 			push @orig, $l;
 		}
 	}
-	$self->index_body(\@quot, 0) if @quot;
-	$self->index_body(\@orig, $doc) if @orig;
+	index_body($self, \@quot, 0) if @quot;
+	index_body($self, \@orig, $doc) if @orig;
 }
 
-sub add_xapian ($$$$$) {
+sub add_xapian ($$$$$$) {
 	my ($self, $mime, $num, $oid, $mids, $mid0) = @_;
 	my $smsg = PublicInbox::SearchMsg->new($mime);
 	my $doc = $X->{Document}->new;
@@ -324,21 +333,21 @@ sub add_xapian ($$$$$) {
 	my $dt = strftime('%Y%m%d%H%M%S', @ds);
 	add_val($doc, PublicInbox::Search::DT(), $dt);
 
-	my $tg = $self->term_generator;
+	my $tg = term_generator($self);
 
 	$tg->set_document($doc);
-	$self->index_text($subj, 1, 'S') if $subj;
-	$self->index_users($smsg);
+	index_text($self, $subj, 1, 'S') if $subj;
+	index_users($self, $smsg);
 
 	msg_iter($mime, \&index_xapian, [ $self, $doc ]);
 	foreach my $mid (@$mids) {
-		$self->index_text($mid, 1, 'XM');
+		index_text($self, $mid, 1, 'XM');
 
 		# because too many Message-IDs are prefixed with
 		# "Pine.LNX."...
 		if ($mid =~ /\w{12,}/) {
 			my @long = ($mid =~ /(\w{3,}+)/g);
-			$self->index_text(join(' ', @long), 1, 'XM');
+			index_text($self, join(' ', @long), 1, 'XM');
 		}
 	}
 	$smsg->{to} = $smsg->{cc} = '';
@@ -359,18 +368,27 @@ sub add_xapian ($$$$$) {
 	$self->{xdb}->replace_document($num, $doc);
 }
 
+sub _msgmap_init ($) {
+	my ($self) = @_;
+	die "BUG: _msgmap_init is only for v1\n" if $self->{version} != 1;
+	$self->{mm} //= eval {
+		require PublicInbox::Msgmap;
+		PublicInbox::Msgmap->new($self->{inboxdir}, 1);
+	};
+}
+
 sub add_message {
 	# mime = Email::MIME object
 	my ($self, $mime, $bytes, $num, $oid, $mid0) = @_;
 	my $mids = mids_for_index($mime->header_obj);
-	$mid0 = $mids->[0] unless defined $mid0; # v1 compatibility
-	unless (defined $num) { # v1
-		$self->_msgmap_init;
-		$num = index_mm($self, $mime);
-	}
+	$mid0 //= $mids->[0]; # v1 compatibility
+	$num //= do { # v1
+		_msgmap_init($self);
+		index_mm($self, $mime);
+	};
 	eval {
 		if (need_xapian($self)) {
-			$self->add_xapian($mime, $num, $oid, $mids, $mid0)
+			add_xapian($self, $mime, $num, $oid, $mids, $mid0);
 		}
 		if (my $over = $self->{over}) {
 			$over->add_overview($mime, $bytes, $num, $oid, $mid0);
@@ -468,18 +486,6 @@ sub remove_by_oid {
 	scalar(@delete);
 }
 
-sub term_generator { # write-only
-	my ($self) = @_;
-
-	my $tg = $self->{term_generator};
-	return $tg if $tg;
-
-	$tg = $X->{TermGenerator}->new;
-	$tg->set_stemmer($self->stemmer);
-
-	$self->{term_generator} = $tg;
-}
-
 sub index_git_blob_id {
 	my ($doc, $pfx, $objid) = @_;
 
@@ -617,15 +623,6 @@ sub read_log {
 	$batch_cb->($nr, $latest, $newest);
 }
 
-sub _msgmap_init {
-	my ($self) = @_;
-	die "BUG: _msgmap_init is only for v1\n" if $self->{version} != 1;
-	$self->{mm} ||= eval {
-		require PublicInbox::Msgmap;
-		PublicInbox::Msgmap->new($self->{inboxdir}, 1);
-	};
-}
-
 sub _git_log {
 	my ($self, $opts, $range) = @_;
 	my $git = $self->{git};


* [PATCH 4/6] searchidx: simplify quote-splitting in index_body
  2020-01-03  8:45 [PATCH 0/6] searchidx: minor fix and some cleanups Eric Wong
                   ` (2 preceding siblings ...)
  2020-01-03  8:46 ` [PATCH 3/6] searchidx: add_message: fix and make use of prototypes Eric Wong
@ 2020-01-03  8:46 ` Eric Wong
  2020-01-03  8:46 ` [PATCH 5/6] searchidx: index_text: use Xapian parameter names Eric Wong
  2020-01-03  8:46 ` [PATCH 6/6] searchidx: remove_message: pedantic fix for v1 Eric Wong
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2020-01-03  8:46 UTC
  To: meta

We now use the same regexp View::add_text_body uses.
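
For readers unfamiliar with split() and capturing groups, a
minimal sketch of how that regexp separates quoted from unquoted
blocks.  split_quoted is a hypothetical wrapper, and the grep
that drops empty sections is an addition for the demo, not
something the patch does:

```perl
use strict;
use warnings;

# Because the pattern is wrapped in a capturing group, split()
# returns the quoted blocks (the separators) along with the text
# between them.
sub split_quoted {
	my ($s) = @_;
	grep { $_ ne '' } split(/((?:^>[^\n]*\n)+)/sm, $s);
}

my $body = "Hi,\n> old line 1\n> old line 2\nmy reply\n> more quote\n";
foreach my $sec (split_quoted($body)) {
	my $kind = $sec =~ /\A>/ ? 'quoted' : 'unquoted';
	my $nr = ($sec =~ tr/\n//);
	print "$kind: $nr line(s)\n";
}
```

Consecutive ">" lines stay together in one section, so each
block can be handed to the indexer in a single call.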
---
 lib/PublicInbox/SearchIdx.pm | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 62e836e0..47537ed4 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -199,12 +199,12 @@ sub index_old_diff_fn {
 }
 
 sub index_diff ($$$) {
-	my ($self, $lines, $doc) = @_;
+	my ($self, $txt, $doc) = @_;
 	my %seen;
 	my $in_diff;
 	my @xnq;
 	my $xnq = \@xnq;
-	foreach (@$lines) {
+	foreach (split(/\n/, $txt)) {
 		if ($in_diff && s/^ //) { # diff context
 			index_diff_inc($self, $_, 'XDFCTX', $xnq);
 		} elsif (/^-- $/) { # email signature begins
@@ -278,20 +278,17 @@ sub index_diff ($$$) {
 }
 
 sub index_body ($$$) {
-	my ($self, $lines, $doc) = @_;
-	my $txt = join("\n", @$lines);
+	my ($self, $txt, $doc) = @_;
 	if ($doc) {
 		# does it look like a diff?
 		if ($txt =~ /^(?:diff|---|\+\+\+) /ms) {
-			$txt = undef;
-			index_diff($self, $lines, $doc);
+			index_diff($self, $txt, $doc);
 		} else {
 			index_text($self, $txt, 1, 'XNQ');
 		}
 	} else {
 		index_text($self, $txt, 0, 'XQUOT');
 	}
-	@$lines = ();
 }
 
 sub index_xapian { # msg_iter callback
@@ -306,19 +303,10 @@ sub index_xapian { # msg_iter callback
 	my ($s, undef) = msg_part_text($part, $ct);
 	defined $s or return;
 
-	my (@orig, @quot);
-	my @lines = split(/\n/, $s);
-	while (defined(my $l = shift @lines)) {
-		if ($l =~ /^>/) {
-			index_body($self, \@orig, $doc) if @orig;
-			push @quot, $l;
-		} else {
-			index_body($self, \@quot, 0) if @quot;
-			push @orig, $l;
-		}
-	}
-	index_body($self, \@quot, 0) if @quot;
-	index_body($self, \@orig, $doc) if @orig;
+	# split off quoted and unquoted blocks:
+	my @sections = split(/((?:^>[^\n]*\n)+)/sm, $s);
+	$part = $s = undef;
+	index_body($self, $_, /\A>/ ? 0 : $doc) for @sections;
 }
 
 sub add_xapian ($$$$$$) {


* [PATCH 5/6] searchidx: index_text: use Xapian parameter names
  2020-01-03  8:45 [PATCH 0/6] searchidx: minor fix and some cleanups Eric Wong
                   ` (3 preceding siblings ...)
  2020-01-03  8:46 ` [PATCH 4/6] searchidx: simplify quote-splitting in index_body Eric Wong
@ 2020-01-03  8:46 ` Eric Wong
  2020-01-03  8:46 ` [PATCH 6/6] searchidx: remove_message: pedantic fix for v1 Eric Wong
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2020-01-03  8:46 UTC
  To: meta

Use the parameter names from the Search::Xapian::TermGenerator
manpage for our local variables instead of confusing names...
---
 lib/PublicInbox/SearchIdx.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 47537ed4..ca1457fd 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -144,14 +144,14 @@ sub term_generator ($) { # write-only
 }
 
 sub index_text ($$$$) {
-	my ($self, $field, $n, $text) = @_;
-	my $tg = term_generator($self);
+	my ($self, $text, $wdf_inc, $prefix) = @_;
+	my $tg = term_generator($self); # man Search::Xapian::TermGenerator
 
 	if ($self->{indexlevel} eq 'full') {
-		$tg->index_text($field, $n, $text);
+		$tg->index_text($text, $wdf_inc, $prefix);
 		$tg->increase_termpos;
 	} else {
-		$tg->index_text_without_positions($field, $n, $text);
+		$tg->index_text_without_positions($text, $wdf_inc, $prefix);
 	}
 }
 


* [PATCH 6/6] searchidx: remove_message: pedantic fix for v1
  2020-01-03  8:45 [PATCH 0/6] searchidx: minor fix and some cleanups Eric Wong
                   ` (4 preceding siblings ...)
  2020-01-03  8:46 ` [PATCH 5/6] searchidx: index_text: use Xapian parameter names Eric Wong
@ 2020-01-03  8:46 ` Eric Wong
  5 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2020-01-03  8:46 UTC
  To: meta

It shouldn't be possible for v1 inboxes to have multiple matches
for a given Message-ID, so the sub would only get called once,
but strange things could happen in 2112 :>
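
A minimal sketch of why "+=" is the safer choice if the callback
ever does run more than once.  each_batch is a hypothetical
stand-in for batch_do, and count_deleted is illustrative only:

```perl
use strict;
use warnings;

# Hypothetical stand-in for batch_do: invoke $cb once per batch
sub each_batch {
	my ($batches, $cb) = @_;
	$cb->($_) for @$batches;
}

sub count_deleted {
	my ($batches) = @_;
	my $nr = 0;
	each_batch($batches, sub {
		my ($ids) = @_;
		$nr += scalar @$ids; # "=" would keep only the last batch
	});
	$nr;
}

print count_deleted([ [ 1, 2 ], [ 3 ] ]), "\n";
```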
---
 lib/PublicInbox/SearchIdx.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index ca1457fd..0d983aab 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -433,7 +433,7 @@ sub remove_message {
 		batch_do($self, 'Q' . $mid, sub {
 			my ($ids) = @_;
 			$db->delete_document($_) for @$ids;
-			$nr = scalar @$ids;
+			$nr += scalar @$ids;
 		});
 	};
 	if ($@) {
