user/dev discussion of public-inbox itself
 help / Atom feed
* [PATCH 00/13]: Incremental index fixes
@ 2018-08-01 16:41 ebiederm
  2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: ebiederm @ 2018-08-01 16:41 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Eric,

Recently I noticed some weird indexing on one of my mailboxes.  So I dug
in and made the index_sync tests more robust and found several corner
cases where the current code can be problematic.  What follows are the
test cases and fixes for those cases.

The big one is when a message with the highest message number is
deleted.  This can cause message numbers to be reassigned so I
explicitly track the highest message number assigned.

Very confusing when reading mail there is a case in SearchIdx
that has been assigning message numbers in the wrong order.  Depending
on your time range that can be very confusing as old messages show up
before newer ones.

Finally in v2 deleted messages have not been being deleted from the
msgmap.  Which while great for keeping message numbers from going
backwards it means things still show up that shouldn't.

Eric W. Biederman (13):
      Import.pm: Don't assume {in} and {out} always exist
      t/v1reindex.t: Isolate the test cases
      t/v2reindex.t: Isolate the test cases more
      t/v[12]reindex.t: Place expected second in Xapian tests
      t/v[12]reindex.t: Test that the resulting msgmap is as expected
      t/v[12]reindex.t: Test incremental indexing works
      SearchIdx.pm: Always assign numbers backwards during incremental indexing
      Msgmap.pm: Track the largest value of num ever assigned
      t/v[12]reindex.t Verify num_highwater
      t/v[12]reindex.t: Verify the num highwater is as expected
      SearchIdx,V2Writeable: Update num_highwater on optimized deletes
      V2Writeable.pm: Ensure that a found message number is in the msgmap
      V2Writeable.pm: In unindex_oid delete the message from msgmap

 lib/PublicInbox/Import.pm     |   1 +
 lib/PublicInbox/Msgmap.pm     |  23 ++-
 lib/PublicInbox/SearchIdx.pm  |  30 ++--
 lib/PublicInbox/V2Writable.pm |  20 ++-
 t/v1reindex.t                 | 359 ++++++++++++++++++++++++++++++++++++------
 t/v2reindex.t                 | 335 +++++++++++++++++++++++++++++++++++----
 6 files changed, 666 insertions(+), 102 deletions(-)


Eric



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 02/13] t/v1reindex.t: Isolate the test cases Eric W. Biederman
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

While working on one of the tests I did:
	my $im = PublicInbox::V2Writable->new($ibx, 1);
	my $im0 = $im->importer();
	$im->add($mime);

Which resulted in a warning of the use of an undefined value from
atfork_child, and the test failing nastily.  Inspection of the code
reveals this can happen anytime gfi_start has not been called.

So just fix atfork_child to skip closing file descriptors that have
not yet been setup.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/Import.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 4e3b4c551797..bfa7a8053297 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -451,6 +451,7 @@ sub done {
 sub atfork_child {
 	my ($self) = @_;
 	foreach my $f (qw(in out)) {
+		next unless defined($self->{$f});
 		close $self->{$f} or die "failed to close import[$f]: $!\n";
 	}
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 02/13] t/v1reindex.t: Isolate the test cases
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
  2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 03/13] t/v2reindex.t: Isolate the test cases more Eric W. Biederman
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

While inspecting the tests I realized that because we have been
reusing variables there can be a memory between one test case and
another.  Add scopes and local variables to prevent an unintended
memory between one test and another.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 111 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 66 insertions(+), 45 deletions(-)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 75380f0f5f22..fdffdaeee892 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -22,7 +22,6 @@ my $ibx_config = {
 	-primary_address => 'test@example.com',
 	indexlevel => 'full',
 };
-my $ibx = PublicInbox::Inbox->new($ibx_config);
 my $mime = PublicInbox::MIME->create(
 	header => [
 		From => 'a@example.com',
@@ -32,55 +31,72 @@ my $mime = PublicInbox::MIME->create(
 	],
 	body => "hello world\n",
 );
-my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
-foreach my $i (1..10) {
-	$mime->header_set('Message-Id', "<$i\@example.com>");
-	ok($im->add($mime), "message $i added");
-	if ($i == 4) {
-		$im->remove($mime);
+my $minmax;
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	foreach my $i (1..10) {
+		$mime->header_set('Message-Id', "<$i\@example.com>");
+		ok($im->add($mime), "message $i added");
+		if ($i == 4) {
+			$im->remove($mime);
+		}
 	}
-}
 
-if ('test remove later') {
-	$mime->header_set('Message-Id', "<5\@example.com>");
-	$im->remove($mime);
-}
+	if ('test remove later') {
+		$mime->header_set('Message-Id', "<5\@example.com>");
+		$im->remove($mime);
+	}
 
-$im->done;
-my $rw = PublicInbox::SearchIdx->new($ibx, 1);
-eval { $rw->index_sync() };
-is($@, '', 'no error from indexing');
+	$im->done;
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	eval { $rw->index_sync() };
+	is($@, '', 'no error from indexing');
 
-my $minmax = [ $ibx->mm->minmax ];
-ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
-is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+	$minmax = [ $ibx->mm->minmax ];
+	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
+	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+}
 
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
-eval { $rw->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	eval { $rw->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
+}
 
 my $xap = "$mainrepo/public-inbox/xapian".PublicInbox::Search::SCHEMA_VERSION();
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed');
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 
-eval { $rw->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
-ok(-d $xap, 'Xapian directories recreated');
+	eval { $rw->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
+	ok(-d $xap, 'Xapian directories recreated');
 
-delete $ibx->{mm};
-is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	delete $ibx->{mm};
+	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+}
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is(scalar(@warn), 0, 'no warnings from reindexing');
@@ -93,11 +109,13 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -110,13 +128,14 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'medium';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'medium';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -131,13 +150,14 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'basic';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'basic';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -152,13 +172,14 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 # upgrade existing basic to medium
 # note: changing indexlevels is not yet supported in v2,
 # and may not be without more effort
-$ibx_config->{indexlevel} = 'medium';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 # no removals
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexleve} = 'medium';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync };
 	is($@, '', 'no error from indexing');
 	is_deeply(\@warn, [], 'no warnings');
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 03/13] t/v2reindex.t: Isolate the test cases more
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
  2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 02/13] t/v1reindex.t: Isolate the test cases Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests Eric W. Biederman
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

While inspecting the tests I realized that because we have been
reusing variables there can be a memory between one test case and
another.  Add scopes and local variables to prevent an unintended
memory between one test cases.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v2reindex.t | 85 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 54 insertions(+), 31 deletions(-)

diff --git a/t/v2reindex.t b/t/v2reindex.t
index 1543309cf1ee..4d06c6ce6ea6 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -21,7 +21,6 @@ my $ibx_config = {
 	-primary_address => 'test@example.com',
 	indexlevel => 'full',
 };
-my $ibx = PublicInbox::Inbox->new($ibx_config);
 my $mime = PublicInbox::MIME->create(
 	header => [
 		From => 'a@example.com',
@@ -32,39 +31,57 @@ my $mime = PublicInbox::MIME->create(
 	body => "hello world\n",
 );
 local $ENV{NPROC} = 2;
-my $im = PublicInbox::V2Writable->new($ibx, 1);
-foreach my $i (1..10) {
-	$mime->header_set('Message-Id', "<$i\@example.com>");
-	ok($im->add($mime), "message $i added");
-	if ($i == 4) {
+my $minmax;
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	foreach my $i (1..10) {
+		$mime->header_set('Message-Id', "<$i\@example.com>");
+		ok($im->add($mime), "message $i added");
+		if ($i == 4) {
+			$im->remove($mime);
+		}
+	}
+
+	if ('test remove later') {
+		$mime->header_set('Message-Id', "<5\@example.com>");
 		$im->remove($mime);
 	}
-}
 
-if ('test remove later') {
-	$mime->header_set('Message-Id', "<5\@example.com>");
-	$im->remove($mime);
+	$im->done;
+	$minmax = [ $ibx->mm->minmax ];
+	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
+	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
 }
 
-$im->done;
-my $minmax = [ $ibx->mm->minmax ];
-ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
-is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	eval { $im->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
 
-eval { $im->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
+	delete $ibx->{mm};
+	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+}
 
 my $xap = "$mainrepo/xap".PublicInbox::Search::SCHEMA_VERSION();
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed');
-eval { $im->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
-ok(-d $xap, 'Xapian directories recreated');
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	eval { $im->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
+	ok(-d $xap, 'Xapian directories recreated');
 
-delete $ibx->{mm};
-is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	delete $ibx->{mm};
+	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+}
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
@@ -72,6 +89,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is(scalar(@warn), 0, 'no warnings from reindexing');
@@ -88,6 +108,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -103,13 +126,13 @@ ok(!-d $xap, 'Xapian directories removed again');
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'medium';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$im = PublicInbox::V2Writable->new($ibx);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'medium';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -135,13 +158,13 @@ $im = PublicInbox::V2Writable->new($ibx);
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'basic';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$im = PublicInbox::V2Writable->new($ibx);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'basic';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (2 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 03/13] t/v2reindex.t: Isolate the test cases more Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected Eric W. Biederman
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Place the expected value second in is and isnt tests because when
these tests fail they report the second value as the expected value.

A report saying got 0 expected 8 'no Xapian search results' can be confusing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 6 +++---
 t/v2reindex.t | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index fdffdaeee892..de4fafda5ae9 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -144,7 +144,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('hello world', {mset=>1});
-	isnt(0, $mset->size, 'got Xapian search results');
+	isnt($mset->size, 0, 'got Xapian search results');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -166,7 +166,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
-	is(0, $mset->size, "no Xapian search results");
+	is($mset->size, 0, "no Xapian search results");
 }
 
 # upgrade existing basic to medium
@@ -184,7 +184,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is($@, '', 'no error from indexing');
 	is_deeply(\@warn, [], 'no warnings');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
-	isnt(0, $mset->size, 'search OK after basic -> medium');
+	isnt($mset->size, 0, 'search OK after basic -> medium');
 }
 
 done_testing();
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 4d06c6ce6ea6..67d8be787a78 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -119,7 +119,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('"hello world"', {mset=>1});
-	isnt(0, $mset->size, "phrase search succeeds on indexlevel=full");
+	isnt($mset->size, 0, "phrase search succeeds on indexlevel=full");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 }
 
@@ -146,11 +146,11 @@ ok(!-d $xap, 'Xapian directories removed again');
 		# phrase searches still work
 		delete $ibx->{search};
 		my $mset = $ibx->search->query('"hello world"', {mset=>1});
-		is(0, $mset->size, 'phrase search does not work on medium');
+		is($mset->size, 0, 'phrase search does not work on medium');
 	}
 
 	my $mset = $ibx->search->query('hello world', {mset=>1});
-	isnt(0, $mset->size, "normal search works on indexlevel=medium");
+	isnt($mset->size, 0, "normal search works on indexlevel=medium");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{full} > $sizes{medium}, 'medium is smaller than full');
 }
@@ -173,7 +173,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('hello', {mset=>1});
-	is(0, $mset->size, "search fails on indexlevel='basic'");
+	is($mset->size, 0, "search fails on indexlevel='basic'");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{medium} > $sizes{basic}, 'basic is smaller than medium');
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (3 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works Eric W. Biederman
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Deeply inspect the entire message map in the reindexing tests
as the actual message order is significant and can result
in surprises.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 35 +++++++++++++++++++++++++++++++++++
 t/v2reindex.t | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index de4fafda5ae9..877e70390910 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -32,6 +32,7 @@ my $mime = PublicInbox::MIME->create(
 	body => "hello world\n",
 );
 my $minmax;
+my $msgmap;
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
@@ -57,6 +58,19 @@ my $minmax;
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+
+	my ($min, $max) = @$minmax;
+	$msgmap = $ibx->mm->msg_range(\$min, $max);
+	is_deeply($msgmap, [
+			  [1, '1@example.com' ],
+			  [2, '2@example.com' ],
+			  [3, '3@example.com' ],
+			  [6, '6@example.com' ],
+			  [7, '7@example.com' ],
+			  [8, '8@example.com' ],
+			  [9, '9@example.com' ],
+			  [10, '10@example.com' ],
+		  ], 'msgmap as expected');
 }
 
 {
@@ -67,6 +81,9 @@ my $minmax;
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing');
 	$im->done;
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 my $xap = "$mainrepo/public-inbox/xapian".PublicInbox::Search::SCHEMA_VERSION();
@@ -85,6 +102,9 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -104,6 +124,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -123,6 +146,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = @$minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -145,6 +171,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'got Xapian search results');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -167,6 +196,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	is($mset->size, 0, "no Xapian search results");
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 # upgrade existing basic to medium
@@ -185,6 +217,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply(\@warn, [], 'no warnings');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'search OK after basic -> medium');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 done_testing();
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 67d8be787a78..31e61d5b2578 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -32,6 +32,7 @@ my $mime = PublicInbox::MIME->create(
 );
 local $ENV{NPROC} = 2;
 my $minmax;
+my $msgmap;
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
@@ -53,6 +54,19 @@ my $minmax;
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+
+	my ($min, $max) = @$minmax;
+	$msgmap = $ibx->mm->msg_range(\$min, $max);
+	is_deeply($msgmap, [
+			  [1, '1@example.com' ],
+			  [2, '2@example.com' ],
+			  [3, '3@example.com' ],
+			  [6, '6@example.com' ],
+			  [7, '7@example.com' ],
+			  [8, '8@example.com' ],
+			  [9, '9@example.com' ],
+			  [10, '10@example.com' ],
+		  ], 'msgmap as expected');
 }
 
 {
@@ -65,6 +79,9 @@ my $minmax;
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 my $xap = "$mainrepo/xap".PublicInbox::Search::SCHEMA_VERSION();
@@ -81,6 +98,9 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
@@ -99,6 +119,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 my %sizes;
@@ -121,6 +144,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my $mset = $ibx->search->query('"hello world"', {mset=>1});
 	isnt($mset->size, 0, "phrase search succeeds on indexlevel=full");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
@@ -153,6 +179,10 @@ ok(!-d $xap, 'Xapian directories removed again');
 	isnt($mset->size, 0, "normal search works on indexlevel=medium");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{full} > $sizes{medium}, 'medium is smaller than full');
+
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
@@ -176,6 +206,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is($mset->size, 0, "search fails on indexlevel='basic'");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{medium} > $sizes{basic}, 'basic is smaller than medium');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 done_testing();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (4 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing Eric W. Biederman
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Capture interesting commits of the test repository in mark variables.

Use those marks to build interesting scenarios where index_sync proceeds
as if those marks are the heads of the repositor.  Use this capability to
test what happens when adds and deletes are mixed within a repository.

Be sad because things don't yet work as they should.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 194 +++++++++++++++++++++++++++++++++++++++++++++++++
 t/v2reindex.t | 195 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 389 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 877e70390910..8e78aa761333 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -33,6 +33,7 @@ my $mime = PublicInbox::MIME->create(
 );
 my $minmax;
 my $msgmap;
+my ($mark1, $mark2, $mark3, $mark4);
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
@@ -41,13 +42,17 @@ my $msgmap;
 		$mime->header_set('Message-Id', "<$i\@example.com>");
 		ok($im->add($mime), "message $i added");
 		if ($i == 4) {
+			$mark1 = $im->get_mark($im->{tip});
 			$im->remove($mime);
+			$mark2 = $im->get_mark($im->{tip});
 		}
 	}
 
 	if ('test remove later') {
+		$mark3 = $im->get_mark($im->{tip});
 		$mime->header_set('Message-Id', "<5\@example.com>");
 		$im->remove($mime);
+		$mark4 = $im->get_mark($im->{tip});
 	}
 
 	$im->done;
@@ -222,4 +227,193 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
+# An incremental indexing test
+ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark1 4 simple additions in the same index_sync
+	eval { $rw->index_sync({ref => $mark1}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 4, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [4, '4@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark2 A delete separated form and add in the same index_sync
+	eval { $rw->index_sync({ref => $mark2}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark3 adds following the delete at mark2
+	eval { $rw->index_sync({ref => $mark3}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark4 A delete of an older message
+	eval { $rw->index_sync({ref => $mark4}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
+
+# Another incremental indexing test
+ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark2 an add and it's delete in the same index_sync
+	eval { $rw->index_sync({ref => $mark2}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark3 adds following the delete at mark2
+	eval { $rw->index_sync({ref => $mark3}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark4 A delete of an older message
+	eval { $rw->index_sync({ref => $mark4}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
+
 done_testing();
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 31e61d5b2578..ce13a07c39c0 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -33,21 +33,27 @@ my $mime = PublicInbox::MIME->create(
 local $ENV{NPROC} = 2;
 my $minmax;
 my $msgmap;
+my ($mark1, $mark2, $mark3, $mark4);
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
 	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	my $im0 = $im->importer();
 	foreach my $i (1..10) {
 		$mime->header_set('Message-Id', "<$i\@example.com>");
 		ok($im->add($mime), "message $i added");
 		if ($i == 4) {
+			$mark1 = $im0->get_mark($im0->{tip});
 			$im->remove($mime);
+			$mark2 = $im0->get_mark($im0->{tip});
 		}
 	}
 
 	if ('test remove later') {
+		$mark3 = $im0->get_mark($im0->{tip});
 		$mime->header_set('Message-Id', "<5\@example.com>");
 		$im->remove($mime);
+		$mark4 = $im0->get_mark($im0->{tip});
 	}
 
 	$im->done;
@@ -211,4 +217,193 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
+
+# An incremental indexing test
+ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark1 4 simple additions in the same index_sync
+	$ibx->{ref_head} = $mark1;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 4, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [4, '4@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark2 A delete separated from an add in the same index_sync
+	$ibx->{ref_head} = $mark2;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark3 adds following the delete at mark2
+	$ibx->{ref_head} = $mark3;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark4 A delete of an older message
+	$ibx->{ref_head} = $mark4;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
+
+# Another incremental indexing test
+ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark2 an add and it's delete in the same index_sync
+	$ibx->{ref_head} = $mark2;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark3 adds following the delete at mark2
+	$ibx->{ref_head} = $mark3;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark4 A delete of an older message
+	$ibx->{ref_head} = $mark4;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
 done_testing();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (5 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

When walking messages newest to oldest, assigning the larger numbers
before smaller numbers ensures older messages get smaller numbers.
This leads to the possibility of a msgmap that can be regenerated when
needed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/SearchIdx.pm | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 1d259a8642a3..ac821ac00c6a 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -619,23 +619,28 @@ sub _git_log {
 	my ($self, $range) = @_;
 	my $git = $self->{git};
 
+	# Count the new files so they can be added newest to oldest
+	# and still have numbers increasing from oldest to newest
+	my $fcount = 0;
+	# can't use 'rev-list --count' if we use --diff-filter
+	my $fh = $git->popen(qw(log --pretty=tformat:%h
+			     --no-notes --no-color --no-renames
+			     --diff-filter=AM), $range);
+	++$fcount while <$fh>;
+	my (undef, $max) = $self->{mm}->minmax;
+
 	if (index($range, '..') < 0) {
-		my $regen_max = 0;
-		# can't use 'rev-list --count' if we use --diff-filter
-		my $fh = $git->popen(qw(log --pretty=tformat:%h
-				--no-notes --no-color --no-renames
-				--diff-filter=AM), $range);
-		++$regen_max while <$fh>;
-		my (undef, $max) = $self->{mm}->minmax;
-
-		if ($max && $max == $regen_max) {
+		if ($max && $max == $fcount) {
 			# fix up old bugs in full indexes which caused messages to
 			# not appear in Msgmap
 			$self->{regen_up} = $max;
 		} else {
 			# normal regen is for for fresh data
-			$self->{regen_down} = $regen_max;
+			$self->{regen_down} = $fcount;
 		}
+	} else {
+		# Give oldest messages the smallest numbers
+		$self->{regen_down} = $max + $fcount;
 	}
 
 	$git->popen(qw/log --no-notes --no-color --no-renames
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (6 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-02  3:00   ` Eric Wong
  2018-08-01 16:43 ` [PATCH 09/13] t/v[12]reindex.t Verify num_highwater Eric W. Biederman
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Today the only thing that prevents public-inbox not reusing the
message numbers of deleted messages is the sqlite autoincrement magic
and that only works part of the time.  The new incremental indexing
test has revealed areas where today public-inbox does try to reuse
numbers of deleted messages.

Reusing the message numbers of existing messages is a problem because
if a client ever sees messages that are subsequently deleted the
client will not see the new messages with their old numbers.

In practice this is difficult to trigger because it requires the most
recently added message to be removed and have the removal show up in a
separate pull request.  Still it can happen and it should be handled.

Instead of infering the highset number ever used by finding the maximum
number in the message map, track the largest number ever assigned directly.

Update Msgmap to track this value and update the indexers to use this
value.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/Msgmap.pm     | 23 +++++++++++++++++++++--
 lib/PublicInbox/SearchIdx.pm  |  8 ++++----
 lib/PublicInbox/V2Writable.pm |  4 ++--
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index fdc71e46e214..b79e4cde30b2 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -51,6 +51,10 @@ sub new_file {
 		$dbh->begin_work;
 		$self->created_at(time) unless $self->created_at;
 		$dbh->commit;
+
+		my (undef, $max) = $self->minmax();
+		$max ||= 0;
+		$self->num_highwater($max);
 	}
 	$self;
 }
@@ -107,6 +111,17 @@ sub created_at {
 	$self->meta_accessor('created_at', $second);
 }
 
+sub num_highwater {
+	my ($self, $num) = @_;
+	my $high = $self->{num_highwater} ||=
+	    $self->meta_accessor('num_highwater');
+	if (defined($num) && (!defined($high) || ($num > $high))) {
+		$self->{num_highwater} = $num;
+		$self->meta_accessor('num_highwater', $num);
+	}
+	$self->{num_highwater};
+}
+
 sub mid_insert {
 	my ($self, $mid) = @_;
 	my $dbh = $self->{dbh};
@@ -114,7 +129,9 @@ sub mid_insert {
 INSERT OR IGNORE INTO msgmap (mid) VALUES (?)
 
 	return if $sth->execute($mid) == 0;
-	$dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	my $num = $dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	$self->num_highwater($num) unless !defined($num);
+	$num;
 }
 
 sub mid_for {
@@ -213,7 +230,9 @@ sub mid_set {
 		$self->{dbh}->prepare(
 			'INSERT OR IGNORE INTO msgmap (num,mid) VALUES (?,?)');
 	};
-	$sth->execute($num, $mid);
+	my $result = $sth->execute($num, $mid);
+	$self->num_highwater($num) if (defined($result) && $result == 1);
+	$result;
 }
 
 sub DESTROY {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index ac821ac00c6a..2532c8dfd10d 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -627,20 +627,20 @@ sub _git_log {
 			     --no-notes --no-color --no-renames
 			     --diff-filter=AM), $range);
 	++$fcount while <$fh>;
-	my (undef, $max) = $self->{mm}->minmax;
+	my $high = $self->{mm}->num_highwater;
 
 	if (index($range, '..') < 0) {
-		if ($max && $max == $fcount) {
+		if ($high && $high == $fcount) {
 			# fix up old bugs in full indexes which caused messages to
 			# not appear in Msgmap
-			$self->{regen_up} = $max;
+			$self->{regen_up} = $high;
 		} else {
 			# normal regen is for for fresh data
 			$self->{regen_down} = $fcount;
 		}
 	} else {
 		# Give oldest messages the smallest numbers
-		$self->{regen_down} = $max + $fcount;
+		$self->{regen_down} = $high + $fcount;
 	}
 
 	$git->popen(qw/log --no-notes --no-color --no-renames
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 934640eb672c..c450980c8f51 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -879,9 +879,9 @@ sub index_sync {
 	my $mm_tmp = $self->{mm}->tmp_clone;
 	my $ranges = $opts->{reindex} ? [] : $self->last_commits($epoch_max);
 
-	my ($min, $max) = $mm_tmp->minmax;
+	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
-	$$regen += $max if $max;
+	$$regen += $high if $high;
 	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 09/13] t/v[12]reindex.t Verify num_highwater
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (7 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected Eric W. Biederman
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 7 +++++++
 t/v2reindex.t | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 8e78aa761333..876c9db3441a 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -246,6 +246,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 4, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -269,6 +270,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -291,6 +293,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -319,6 +322,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -352,6 +356,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -374,6 +379,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -402,6 +408,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
diff --git a/t/v2reindex.t b/t/v2reindex.t
index ce13a07c39c0..3d217aff3c72 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -237,6 +237,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 4, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -260,6 +261,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -282,6 +284,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -310,6 +313,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -343,6 +347,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -365,6 +370,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -393,6 +399,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (8 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 09/13] t/v[12]reindex.t Verify num_highwater Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes Eric W. Biederman
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Instrument the tests to verify the highwater num highwater mark is
where it is expected.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 10 ++++++++++
 t/v2reindex.t |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 876c9db3441a..8be95149723c 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -63,6 +63,7 @@ my ($mark1, $mark2, $mark3, $mark4);
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = @$minmax;
 	$msgmap = $ibx->mm->msg_range(\$min, $max);
@@ -87,6 +88,8 @@ my ($mark1, $mark2, $mark3, $mark4);
 	is($@, '', 'no error from reindexing');
 	$im->done;
 
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
+
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
@@ -107,6 +110,7 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -129,6 +133,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -151,6 +156,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = @$minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -174,6 +180,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'got Xapian search results');
 
@@ -199,6 +206,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	is($mset->size, 0, "no Xapian search results");
 
@@ -223,6 +231,8 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'search OK after basic -> medium');
 
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
+
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 3d217aff3c72..a5454a22abf5 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -60,6 +60,7 @@ my ($mark1, $mark2, $mark3, $mark4);
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = @$minmax;
 	$msgmap = $ibx->mm->msg_range(\$min, $max);
@@ -85,6 +86,7 @@ my ($mark1, $mark2, $mark3, $mark4);
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -104,6 +106,7 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -125,6 +128,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -147,6 +151,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->query('"hello world"', {mset=>1});
 	isnt($mset->size, 0, "phrase search succeeds on indexlevel=full");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
@@ -172,6 +177,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	if (0) {
 		# not sure why, but Xapian seems to fallback to terms and
@@ -208,6 +214,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->query('hello', {mset=>1});
 	is($mset->size, 0, "search fails on indexlevel='basic'");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (9 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap Eric W. Biederman
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

When performing an incremental index update with index_sync if a message is seen
to be both added and deleted update the num_highwater mark even though the
message is not otherwise indexed.

This ensures index_sync generates the same msgmap no matter which commit
it stops at during incremental syncs.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/SearchIdx.pm  |  3 ++-
 lib/PublicInbox/V2Writable.pm | 10 ++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 2532c8dfd10d..54f82aa8ee4e 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -583,7 +583,8 @@ sub read_log {
 			my $blob = $1;
 			if (delete $D{$blob}) {
 				if (defined $self->{regen_down}) {
-					$self->{regen_down}--;
+					my $num = $self->{regen_down}--;
+					$self->{mm}->num_highwater($num);
 				}
 				next;
 			}
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index c450980c8f51..92d2672c78c4 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -658,7 +658,7 @@ sub mark_deleted {
 }
 
 sub reindex_oid {
-	my ($self, $mm_tmp, $D, $git, $oid, $regen) = @_;
+	my ($self, $mm_tmp, $D, $git, $oid, $regen, $reindex) = @_;
 	my $len;
 	my $msgref = $git->cat_file($oid, \$len);
 	my $mime = PublicInbox::MIME->new($$msgref);
@@ -700,7 +700,8 @@ sub reindex_oid {
 
 	if (!defined($mid0) || $del) {
 		if (!defined($mid0) && $del) { # expected for deletes
-			$$regen--;
+			$num = $$regen--;
+			$self->{mm}->num_highwater($num) unless $reindex;
 			return
 		}
 
@@ -877,7 +878,8 @@ sub index_sync {
 	return unless defined $latest;
 	$self->idx_init; # acquire lock
 	my $mm_tmp = $self->{mm}->tmp_clone;
-	my $ranges = $opts->{reindex} ? [] : $self->last_commits($epoch_max);
+	my $reindex = $opts->{reindex};
+	my $ranges = $reindex ? [] : $self->last_commits($epoch_max);
 
 	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
@@ -903,7 +905,7 @@ sub index_sync {
 				chomp($cmt = $_);
 			} elsif (/\A:\d{6} 100644 $x40 ($x40) [AM]\tm$/o) {
 				$self->reindex_oid($mm_tmp, $D, $git, $1,
-						$regen);
+						$regen, $reindex);
 			} elsif (/\A:\d{6} 100644 $x40 ($x40) [AM]\td$/o) {
 				$self->mark_deleted($D, $git, $1);
 			}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (10 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap Eric W. Biederman
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

The lookup to see if a num has already been assigned to a message
happens in a temporary copy of message map.  It is possible that the
number has been removed from the current message map.  The
unindex/reindex after a history rewrite triggered by a purge should be
one such case.  Therefore add the number to the msgmap in case it is
not currently present.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/V2Writable.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 92d2672c78c4..4dd14331a78f 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -676,6 +676,7 @@ sub reindex_oid {
 		if (defined $n && $n > $num) {
 			$mid0 = $mid;
 			$num = $n;
+			$self->{mm}->mid_set($num, $mid0);
 		}
 	}
 	if (!defined($mid0) && $regen && !$del) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (11 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  12 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Now that we track the num highwater mark it is safe to remove messages
from msgmap that have been previously allocated.  Removing even the
highest numbered article will no longer cause new message numbers to
move backwards.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/V2Writable.pm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 4dd14331a78f..0396d9f58e55 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -842,7 +842,10 @@ sub unindex_oid {
 			warn "BUG: multiple articles linked to $oid\n",
 				join(',',sort keys %gone), "\n";
 		}
-		$self->{unindexed}->{$_}++ foreach keys %gone;
+		foreach my $num (keys %gone) {
+			$self->{unindexed}->{$_}++;
+			$self->{mm}->num_delete($num);
+		}
 		$self->unindex_oid_remote($oid, $mid);
 	}
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
@ 2018-08-02  3:00   ` Eric Wong
  2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
  2018-08-02 12:08     ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned ebiederm
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Wong @ 2018-08-02  3:00 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> --- a/lib/PublicInbox/Msgmap.pm
> +++ b/lib/PublicInbox/Msgmap.pm
> @@ -51,6 +51,10 @@ sub new_file {
>  		$dbh->begin_work;
>  		$self->created_at(time) unless $self->created_at;
>  		$dbh->commit;
> +
> +		my (undef, $max) = $self->minmax();
> +		$max ||= 0;
> +		$self->num_highwater($max);

I think the calls to minmax and num_highwater should be in the
transaction right above it.  I can squash locally if that's the
only thing...

Everything else looks good, so far; thanks.


While testing this, it looks like I introduced a bug to
indexlevel=basic which broke incremental indexing when I made it
possible to upgrade to (medium|full).  Patch coming for that in
a bit...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02  3:00   ` Eric Wong
@ 2018-08-02  3:44     ` Eric Wong
  2018-08-02 12:25       ` ebiederm
  2018-08-02 12:08     ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned ebiederm
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Wong @ 2018-08-02  3:44 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

I wrote:
> While testing this, it looks like I introduced a bug to
> indexlevel=basic which broke incremental indexing when I made it
> possible to upgrade to (medium|full).  Patch coming for that in
> a bit...

Eep, I think there's deeper problems with indexlevel=basic and
incremental updates, since it's still doing lookups against
Xapian for deletes...

This is my work-in-progress to stop full-reindexing, at least.

----8<-----
Subject: [PATCH] searchidx: support incremental indexing on indexlevel=basic

---
 lib/PublicInbox/SearchIdx.pm | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 54f82aa..5cac08f 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -668,6 +668,15 @@ sub need_update ($$$) {
 	($n eq '' || $n > 0);
 }
 
+sub _last_x_commit {
+	my ($self) = @_;
+	if ($self->{indexlevel} =~ $xapianlevels) {
+		$self->{xdb}->get_metadata('last_commit');
+	} else {
+		$self->{mm}->last_commit || '';
+	}
+}
+
 # indexes all unindexed messages (v1 only)
 sub _index_sync {
 	my ($self, $opts) = @_;
@@ -682,7 +691,7 @@ sub _index_sync {
 	do {
 		$xlog = undef;
 		$mkey = 'last_commit';
-		$last_commit = $xdb->get_metadata('last_commit');
+		$last_commit = $self->_last_x_commit;
 		$lx = $last_commit;
 		if ($reindex) {
 			$lx = '';
@@ -707,7 +716,7 @@ sub _index_sync {
 		$xlog = _git_log($self, $range);
 
 		$xdb = $self->begin_txn_lazy;
-	} while ($xdb->get_metadata('last_commit') ne $last_commit);
+	} while ($self->_last_x_commit ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
 	my $cb = sub {
-- 
EW

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-02  3:00   ` Eric Wong
  2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
@ 2018-08-02 12:08     ` ebiederm
  1 sibling, 0 replies; 20+ messages in thread
From: ebiederm @ 2018-08-02 12:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> --- a/lib/PublicInbox/Msgmap.pm
>> +++ b/lib/PublicInbox/Msgmap.pm
>> @@ -51,6 +51,10 @@ sub new_file {
>>  		$dbh->begin_work;
>>  		$self->created_at(time) unless $self->created_at;
>>  		$dbh->commit;
>> +
>> +		my (undef, $max) = $self->minmax();
>> +		$max ||= 0;
>> +		$self->num_highwater($max);
>
> I think the calls to minmax and num_highwater should be in the
> transaction right above it.  I can squash locally if that's the
> only thing...

Sounds good.  With autocommit being on by default and repository locking
happening I was not at all certain where transaction boundaries need to
go (or if they are really needed).  But by it's very nature
num_highwater needs to be in the same transaction as whatever adds a new
higher number into msgmap.

> Everything else looks good, so far; thanks.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
@ 2018-08-02 12:25       ` ebiederm
  2018-08-02 17:12         ` ebiederm
  0 siblings, 1 reply; 20+ messages in thread
From: ebiederm @ 2018-08-02 12:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> I wrote:
>> While testing this, it looks like I introduced a bug to
>> indexlevel=basic which broke incremental indexing when I made it
>> possible to upgrade to (medium|full).  Patch coming for that in
>> a bit...
>
> Eep, I think there's deeper problems with indexlevel=basic and
> incremental updates, since it's still doing lookups against
> Xapian for deletes...
>
> This is my work-in-progress to stop full-reindexing, at least.

So I don't think your change is quite right.

If the repo has only lived with indexlevel == 'basic' then last_commit
should always match the output of xdb->get_metadata('last_commit') as
it will always been undef or ''.

In fact I don't see any code that is updating
$xdb->get_metadata('last_commit');

Am I wrong or can we just always use mm->last_commit for this logic?

Eric

> ----8<-----
> Subject: [PATCH] searchidx: support incremental indexing on indexlevel=basic
>
> ---
>  lib/PublicInbox/SearchIdx.pm | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
> index 54f82aa..5cac08f 100644
> --- a/lib/PublicInbox/SearchIdx.pm
> +++ b/lib/PublicInbox/SearchIdx.pm
> @@ -668,6 +668,15 @@ sub need_update ($$$) {
>  	($n eq '' || $n > 0);
>  }
>  
> +sub _last_x_commit {
> +	my ($self) = @_;
> +	if ($self->{indexlevel} =~ $xapianlevels) {
> +		$self->{xdb}->get_metadata('last_commit');
> +	} else {
> +		$self->{mm}->last_commit || '';
> +	}
> +}
> +
>  # indexes all unindexed messages (v1 only)
>  sub _index_sync {
>  	my ($self, $opts) = @_;
> @@ -682,7 +691,7 @@ sub _index_sync {
>  	do {
>  		$xlog = undef;
>  		$mkey = 'last_commit';
> -		$last_commit = $xdb->get_metadata('last_commit');
> +		$last_commit = $self->_last_x_commit;
>  		$lx = $last_commit;
>  		if ($reindex) {
>  			$lx = '';
> @@ -707,7 +716,7 @@ sub _index_sync {
>  		$xlog = _git_log($self, $range);
>  
>  		$xdb = $self->begin_txn_lazy;
> -	} while ($xdb->get_metadata('last_commit') ne $last_commit);
> +	} while ($self->_last_x_commit ne $last_commit);
>  
>  	my $dbh = $mm->{dbh} if $mm;
>  	my $cb = sub {

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02 12:25       ` ebiederm
@ 2018-08-02 17:12         ` ebiederm
  2018-08-02 18:15           ` ebiederm
  0 siblings, 1 reply; 20+ messages in thread
From: ebiederm @ 2018-08-02 17:12 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

ebiederm@xmission.com (Eric W. Biederman) writes:

> Eric Wong <e@80x24.org> writes:
>
>> I wrote:
>>> While testing this, it looks like I introduced a bug to
>>> indexlevel=basic which broke incremental indexing when I made it
>>> possible to upgrade to (medium|full).  Patch coming for that in
>>> a bit...
>>
>> Eep, I think there's deeper problems with indexlevel=basic and
>> incremental updates, since it's still doing lookups against
>> Xapian for deletes...
>>
>> This is my work-in-progress to stop full-reindexing, at least.
>
> So I don't think your change is quite right.
>
> If the repo has only lived with indexlevel == 'basic' then last_commit
> should always match the output of xdb->get_metadata('last_commit') as
> it will always been undef or ''.
>
> In fact I don't see any code that is updating
> $xdb->get_metadata('last_commit');
>
> Am I wrong or can we just always use mm->last_commit for this logic?

Yes.  I was wrong.  I had overlooked the get_metadata($mkey) and
set_metadata($mkey) calls.   From just below but those are all
with $mkey == 'last_commit'  So 'last_commit' does get updated.

I think we want something more like below.  Where _last_x_commit
just computes the last_commit value to use, using both values of
last_commit.

I am not 100% about the loop.  Perhaps that check to see if either
value of last_commit changes?

I definitely don't think when we already have one set of logic
for substituting in last_commit we should be adding in another.
That gets confusing very fast.

Eric


diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 198f39b96e6c..130e5a41c5e7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -669,6 +669,23 @@ sub need_update ($$$) {
 	($n eq '' || $n > 0);
 }
 
+sub _last_x_commit
+{
+	my ($self) = @_;
+        my $lm = $mm->last_commit || '';
+	my $lx = '';
+	if ($self->{indexlevel} ~= $xapianlevels) {
+		$lx = $self->{xdb}->get_metadata('last_commit') || '';
+        } else {
+		$lx = $lm;
+	}
+	# Use last_commit from msgmap if it is older or unset
+	if (!$lm || ($lx && $lx && is_ancestor($git, $lm, $lx))) {
+		$lx = $lm;
+	}
+	$lx;
+}
+
 # indexes all unindexed messages (v1 only)
 sub _index_sync {
 	my ($self, $opts) = @_;
@@ -682,19 +699,8 @@ sub _index_sync {
 	my $mm = _msgmap_init($self);
 	do {
 		$xlog = undef;
-		$mkey = 'last_commit';
-		$last_commit = $xdb->get_metadata('last_commit');
-		$lx = $last_commit;
-		if ($reindex) {
-			$lx = '';
-			$mkey = undef if $last_commit ne '';
-		}
-
-		# use last_commit from msgmap if it is older or unset
-		my $lm = $mm->last_commit || '';
-		if (!$lm || ($lm && $lx && is_ancestor($git, $lm, $lx))) {
-			$lx = $lm;
-		}
+		$last_commit = $self->_last_x_commit;
+		$lx = !$reindex ? $last_commit : '';
 
 		$self->{over}->rollback_lazy;
 		$self->{over}->disconnect;
@@ -708,7 +714,7 @@ sub _index_sync {
 		$xlog = _git_log($self, $range);
 
 		$xdb = $self->begin_txn_lazy;
-	} while ($xdb->get_metadata('last_commit') ne $last_commit);
+	} while ($self->_last_x_commit ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
 	my $cb = sub {
@@ -722,7 +728,7 @@ sub _index_sync {
 			}
 			$dbh->commit;
 		}
-		if ($mkey && $newest && $self->{indexlevel} =~ $xapianlevels) {
+		if ($newest && $self->{indexlevel} =~ $xapianlevels) {
 			my $cur = $xdb->get_metadata($mkey);
 			if (need_update($self, $cur, $newest)) {
 				$xdb->set_metadata($mkey, $newest);


>> ----8<-----
>> Subject: [PATCH] searchidx: support incremental indexing on indexlevel=basic
>>
>> ---
>>  lib/PublicInbox/SearchIdx.pm | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
>> index 54f82aa..5cac08f 100644
>> --- a/lib/PublicInbox/SearchIdx.pm
>> +++ b/lib/PublicInbox/SearchIdx.pm
>> @@ -668,6 +668,15 @@ sub need_update ($$$) {
>>  	($n eq '' || $n > 0);
>>  }
>>  
>> +sub _last_x_commit {
>> +	my ($self) = @_;
>> +	if ($self->{indexlevel} =~ $xapianlevels) {
>> +		$self->{xdb}->get_metadata('last_commit');
>> +	} else {
>> +		$self->{mm}->last_commit || '';
>> +	}
>> +}
>> +
>>  # indexes all unindexed messages (v1 only)
>>  sub _index_sync {
>>  	my ($self, $opts) = @_;
>> @@ -682,7 +691,7 @@ sub _index_sync {
>>  	do {
>>  		$xlog = undef;
>>  		$mkey = 'last_commit';
>> -		$last_commit = $xdb->get_metadata('last_commit');
>> +		$last_commit = $self->_last_x_commit;
>>  		$lx = $last_commit;
>>  		if ($reindex) {
>>  			$lx = '';
>> @@ -707,7 +716,7 @@ sub _index_sync {
>>  		$xlog = _git_log($self, $range);
>>  
>>  		$xdb = $self->begin_txn_lazy;
>> -	} while ($xdb->get_metadata('last_commit') ne $last_commit);
>> +	} while ($self->_last_x_commit ne $last_commit);
>>  
>>  	my $dbh = $mm->{dbh} if $mm;
>>  	my $cb = sub {


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02 17:12         ` ebiederm
@ 2018-08-02 18:15           ` ebiederm
  0 siblings, 0 replies; 20+ messages in thread
From: ebiederm @ 2018-08-02 18:15 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

ebiederm@xmission.com (Eric W. Biederman) writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Eric Wong <e@80x24.org> writes:
>>
>>> I wrote:
>>>> While testing this, it looks like I introduced a bug to
>>>> indexlevel=basic which broke incremental indexing when I made it
>>>> possible to upgrade to (medium|full).  Patch coming for that in
>>>> a bit...
>>>
>>> Eep, I think there's deeper problems with indexlevel=basic and
>>> incremental updates, since it's still doing lookups against
>>> Xapian for deletes...
>>>
>>> This is my work-in-progress to stop full-reindexing, at least.
>>
>> So I don't think your change is quite right.
>>
>> If the repo has only lived with indexlevel == 'basic' then last_commit
>> should always match the output of xdb->get_metadata('last_commit') as
>> it will always been undef or ''.
>>
>> In fact I don't see any code that is updating
>> $xdb->get_metadata('last_commit');
>>
>> Am I wrong or can we just always use mm->last_commit for this logic?
>
> Yes.  I was wrong.  I had overlooked the get_metadata($mkey) and
> set_metadata($mkey) calls.   From just below but those are all
> with $mkey == 'last_commit'  So 'last_commit' does get updated.
>
> I think we want something more like below.  Where _last_x_commit
> just computes the last_commit value to use, using both values of
> last_commit.
>
> I am not 100% about the loop.  Perhaps that check to see if either
> value of last_commit changes?
>
> I definitely don't think when we already have one set of logic
> for substituting in last_commit we should be adding in another.
> That gets confusing very fast.
>
Make that something like this where we remove the $mkey variable 
and all of it's special semantics with regards to reindexing entirely.

Eric


diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 198f39b96e6c..8b12da0436a4 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -669,6 +669,23 @@ sub need_update ($$$) {
 	($n eq '' || $n > 0);
 }
 
+sub _last_x_commit
+{
+	my ($self) = @_;
+        my $lm = $mm->last_commit || '';
+	my $lx = '';
+	if ($self->{indexlevel} ~= $xapianlevels) {
+		$lx = $self->{xdb}->get_metadata('last_commit') || '';
+        } else {
+		$lx = $lm;
+	}
+	# Use last_commit from msgmap if it is older or unset
+	if (!$lm || ($lx && $lx && is_ancestor($git, $lm, $lx))) {
+		$lx = $lm;
+	}
+	$lx;
+}
+
 # indexes all unindexed messages (v1 only)
 sub _index_sync {
 	my ($self, $opts) = @_;
@@ -682,19 +699,8 @@ sub _index_sync {
 	my $mm = _msgmap_init($self);
 	do {
 		$xlog = undef;
-		$mkey = 'last_commit';
-		$last_commit = $xdb->get_metadata('last_commit');
-		$lx = $last_commit;
-		if ($reindex) {
-			$lx = '';
-			$mkey = undef if $last_commit ne '';
-		}
-
-		# use last_commit from msgmap if it is older or unset
-		my $lm = $mm->last_commit || '';
-		if (!$lm || ($lm && $lx && is_ancestor($git, $lm, $lx))) {
-			$lx = $lm;
-		}
+		$last_commit = $self->_last_x_commit;
+		$lx = !$reindex ? $last_commit : '';
 
 		$self->{over}->rollback_lazy;
 		$self->{over}->disconnect;
@@ -708,7 +714,7 @@ sub _index_sync {
 		$xlog = _git_log($self, $range);
 
 		$xdb = $self->begin_txn_lazy;
-	} while ($xdb->get_metadata('last_commit') ne $last_commit);
+	} while ($self->_last_x_commit ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
 	my $cb = sub {
@@ -722,10 +728,10 @@ sub _index_sync {
 			}
 			$dbh->commit;
 		}
-		if ($mkey && $newest && $self->{indexlevel} =~ $xapianlevels) {
-			my $cur = $xdb->get_metadata($mkey);
+		if ($newest && $self->{indexlevel} =~ $xapianlevels) {
+			my $cur = $xdb->get_metadata('last_commit');
 			if (need_update($self, $cur, $newest)) {
-				$xdb->set_metadata($mkey, $newest);
+				$xdb->set_metadata('last_commit', $newest);
 			}
 		}
 		$self->commit_txn_lazy;

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, back to index

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
2018-08-01 16:43 ` [PATCH 02/13] t/v1reindex.t: Isolate the test cases Eric W. Biederman
2018-08-01 16:43 ` [PATCH 03/13] t/v2reindex.t: Isolate the test cases more Eric W. Biederman
2018-08-01 16:43 ` [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests Eric W. Biederman
2018-08-01 16:43 ` [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected Eric W. Biederman
2018-08-01 16:43 ` [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works Eric W. Biederman
2018-08-01 16:43 ` [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing Eric W. Biederman
2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
2018-08-02  3:00   ` Eric Wong
2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
2018-08-02 12:25       ` ebiederm
2018-08-02 17:12         ` ebiederm
2018-08-02 18:15           ` ebiederm
2018-08-02 12:08     ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned ebiederm
2018-08-01 16:43 ` [PATCH 09/13] t/v[12]reindex.t Verify num_highwater Eric W. Biederman
2018-08-01 16:43 ` [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected Eric W. Biederman
2018-08-01 16:43 ` [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes Eric W. Biederman
2018-08-01 16:43 ` [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap Eric W. Biederman
2018-08-01 16:43 ` [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap Eric W. Biederman

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox