user/dev discussion of public-inbox itself
 help / color / Atom feed
* [PATCH 00/13]: Incremental index fixes
@ 2018-08-01 16:41 ebiederm
  2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
                   ` (12 more replies)
  0 siblings, 13 replies; 24+ messages in thread
From: ebiederm @ 2018-08-01 16:41 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Eric,

Recently I noticed some weird indexing on one of my mailboxes.  So I dug
in and made the index_sync tests more robust and found several corner
cases where the current code can be problematic.  What follows are the
test cases and fixes for those cases.

The big one is when a message with the highest message number is
deleted.  This can cause message numbers to be reassigned so I
explicitly track the highest message number assigned.

Very confusing when reading mail there is a case in SearchIdx
that has been assigning message numbers in the wrong order.  Depending
on your time range that can be very confusing as old messages show up
before newer ones.

Finally in v2 deleted messages have not been being deleted from the
msgmap.  Which while great for keeping message numbers from going
backwards it means things still show up that shouldn't.

Eric W. Biederman (13):
      Import.pm: Don't assume {in} and {out} always exist
      t/v1reindex.t: Isolate the test cases
      t/v2reindex.t: Isolate the test cases more
      t/v[12]reindex.t: Place expected second in Xapian tests
      t/v[12]reindex.t: Test that the resulting msgmap is as expected
      t/v[12]reindex.t: Test incremental indexing works
      SearchIdx.pm: Always assign numbers backwards during incremental indexing
      Msgmap.pm: Track the largest value of num ever assigned
      t/v[12]reindex.t Verify num_highwater
      t/v[12]reindex.t: Verify the num highwater is as expected
      SearchIdx,V2Writeable: Update num_highwater on optimized deletes
      V2Writeable.pm: Ensure that a found message number is in the msgmap
      V2Writeable.pm: In unindex_oid delete the message from msgmap

 lib/PublicInbox/Import.pm     |   1 +
 lib/PublicInbox/Msgmap.pm     |  23 ++-
 lib/PublicInbox/SearchIdx.pm  |  30 ++--
 lib/PublicInbox/V2Writable.pm |  20 ++-
 t/v1reindex.t                 | 359 ++++++++++++++++++++++++++++++++++++------
 t/v2reindex.t                 | 335 +++++++++++++++++++++++++++++++++++----
 6 files changed, 666 insertions(+), 102 deletions(-)


Eric



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 02/13] t/v1reindex.t: Isolate the test cases Eric W. Biederman
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

While working on one of the tests I did:
	my $im = PublicInbox::V2Writable->new($ibx, 1);
	my $im0 = $im->importer();
	$im->add($mime);

Which resulted in a warning of the use of an undefined value from
atfork_child, and the test failing nastily.  Inspection of the code
reveals this can happen anytime gfi_start has not been called.

So just fix atfork_child to skip closing file descriptors that have
not yet been setup.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/Import.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 4e3b4c551797..bfa7a8053297 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -451,6 +451,7 @@ sub done {
 sub atfork_child {
 	my ($self) = @_;
 	foreach my $f (qw(in out)) {
+		next unless defined($self->{$f});
 		close $self->{$f} or die "failed to close import[$f]: $!\n";
 	}
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 02/13] t/v1reindex.t: Isolate the test cases
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
  2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 03/13] t/v2reindex.t: Isolate the test cases more Eric W. Biederman
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

While inspecting the tests I realized that because we have been
reusing variables there can be a memory between one test case and
another.  Add scopes and local variables to prevent an unintended
memory between one test and another.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 111 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 66 insertions(+), 45 deletions(-)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 75380f0f5f22..fdffdaeee892 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -22,7 +22,6 @@ my $ibx_config = {
 	-primary_address => 'test@example.com',
 	indexlevel => 'full',
 };
-my $ibx = PublicInbox::Inbox->new($ibx_config);
 my $mime = PublicInbox::MIME->create(
 	header => [
 		From => 'a@example.com',
@@ -32,55 +31,72 @@ my $mime = PublicInbox::MIME->create(
 	],
 	body => "hello world\n",
 );
-my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
-foreach my $i (1..10) {
-	$mime->header_set('Message-Id', "<$i\@example.com>");
-	ok($im->add($mime), "message $i added");
-	if ($i == 4) {
-		$im->remove($mime);
+my $minmax;
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	foreach my $i (1..10) {
+		$mime->header_set('Message-Id', "<$i\@example.com>");
+		ok($im->add($mime), "message $i added");
+		if ($i == 4) {
+			$im->remove($mime);
+		}
 	}
-}
 
-if ('test remove later') {
-	$mime->header_set('Message-Id', "<5\@example.com>");
-	$im->remove($mime);
-}
+	if ('test remove later') {
+		$mime->header_set('Message-Id', "<5\@example.com>");
+		$im->remove($mime);
+	}
 
-$im->done;
-my $rw = PublicInbox::SearchIdx->new($ibx, 1);
-eval { $rw->index_sync() };
-is($@, '', 'no error from indexing');
+	$im->done;
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	eval { $rw->index_sync() };
+	is($@, '', 'no error from indexing');
 
-my $minmax = [ $ibx->mm->minmax ];
-ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
-is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+	$minmax = [ $ibx->mm->minmax ];
+	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
+	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+}
 
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
-eval { $rw->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	eval { $rw->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
+}
 
 my $xap = "$mainrepo/public-inbox/xapian".PublicInbox::Search::SCHEMA_VERSION();
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed');
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 
-eval { $rw->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
-ok(-d $xap, 'Xapian directories recreated');
+	eval { $rw->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
+	ok(-d $xap, 'Xapian directories recreated');
 
-delete $ibx->{mm};
-is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	delete $ibx->{mm};
+	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+}
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is(scalar(@warn), 0, 'no warnings from reindexing');
@@ -93,11 +109,13 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -110,13 +128,14 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'medium';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'medium';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -131,13 +150,14 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'basic';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'basic';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -152,13 +172,14 @@ $rw = PublicInbox::SearchIdx->new($ibx, 1);
 # upgrade existing basic to medium
 # note: changing indexlevels is not yet supported in v2,
 # and may not be without more effort
-$ibx_config->{indexlevel} = 'medium';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$rw = PublicInbox::SearchIdx->new($ibx, 1);
 # no removals
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexleve} = 'medium';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
 	eval { $rw->index_sync };
 	is($@, '', 'no error from indexing');
 	is_deeply(\@warn, [], 'no warnings');
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 03/13] t/v2reindex.t: Isolate the test cases more
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
  2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 02/13] t/v1reindex.t: Isolate the test cases Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests Eric W. Biederman
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

While inspecting the tests I realized that because we have been
reusing variables there can be a memory between one test case and
another.  Add scopes and local variables to prevent an unintended
memory between one test cases.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v2reindex.t | 85 ++++++++++++++++++++++++++++++++-------------------
 1 file changed, 54 insertions(+), 31 deletions(-)

diff --git a/t/v2reindex.t b/t/v2reindex.t
index 1543309cf1ee..4d06c6ce6ea6 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -21,7 +21,6 @@ my $ibx_config = {
 	-primary_address => 'test@example.com',
 	indexlevel => 'full',
 };
-my $ibx = PublicInbox::Inbox->new($ibx_config);
 my $mime = PublicInbox::MIME->create(
 	header => [
 		From => 'a@example.com',
@@ -32,39 +31,57 @@ my $mime = PublicInbox::MIME->create(
 	body => "hello world\n",
 );
 local $ENV{NPROC} = 2;
-my $im = PublicInbox::V2Writable->new($ibx, 1);
-foreach my $i (1..10) {
-	$mime->header_set('Message-Id', "<$i\@example.com>");
-	ok($im->add($mime), "message $i added");
-	if ($i == 4) {
+my $minmax;
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	foreach my $i (1..10) {
+		$mime->header_set('Message-Id', "<$i\@example.com>");
+		ok($im->add($mime), "message $i added");
+		if ($i == 4) {
+			$im->remove($mime);
+		}
+	}
+
+	if ('test remove later') {
+		$mime->header_set('Message-Id', "<5\@example.com>");
 		$im->remove($mime);
 	}
-}
 
-if ('test remove later') {
-	$mime->header_set('Message-Id', "<5\@example.com>");
-	$im->remove($mime);
+	$im->done;
+	$minmax = [ $ibx->mm->minmax ];
+	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
+	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
 }
 
-$im->done;
-my $minmax = [ $ibx->mm->minmax ];
-ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
-is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	eval { $im->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
 
-eval { $im->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
+	delete $ibx->{mm};
+	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+}
 
 my $xap = "$mainrepo/xap".PublicInbox::Search::SCHEMA_VERSION();
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed');
-eval { $im->index_sync({reindex => 1}) };
-is($@, '', 'no error from reindexing');
-$im->done;
-ok(-d $xap, 'Xapian directories recreated');
+{
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	eval { $im->index_sync({reindex => 1}) };
+	is($@, '', 'no error from reindexing');
+	$im->done;
+	ok(-d $xap, 'Xapian directories recreated');
 
-delete $ibx->{mm};
-is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	delete $ibx->{mm};
+	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+}
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
@@ -72,6 +89,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is(scalar(@warn), 0, 'no warnings from reindexing');
@@ -88,6 +108,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx, 1);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -103,13 +126,13 @@ ok(!-d $xap, 'Xapian directories removed again');
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'medium';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$im = PublicInbox::V2Writable->new($ibx);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'medium';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
@@ -135,13 +158,13 @@ $im = PublicInbox::V2Writable->new($ibx);
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
 remove_tree($xap);
 ok(!-d $xap, 'Xapian directories removed again');
-
-$ibx_config->{indexlevel} = 'basic';
-$ibx = PublicInbox::Inbox->new($ibx_config);
-$im = PublicInbox::V2Writable->new($ibx);
 {
 	my @warn;
 	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	$config{indexlevel} = 'basic';
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::V2Writable->new($ibx);
 	eval { $im->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing without msgmap');
 	is_deeply(\@warn, [], 'no warnings');
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (2 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 03/13] t/v2reindex.t: Isolate the test cases more Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected Eric W. Biederman
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Place the expected value second in is and isnt tests because when
these tests fail they report the second value as the expected value.

A report saying got 0 expected 8 'no Xapian search results' can be confusing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 6 +++---
 t/v2reindex.t | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index fdffdaeee892..de4fafda5ae9 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -144,7 +144,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('hello world', {mset=>1});
-	isnt(0, $mset->size, 'got Xapian search results');
+	isnt($mset->size, 0, 'got Xapian search results');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -166,7 +166,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
-	is(0, $mset->size, "no Xapian search results");
+	is($mset->size, 0, "no Xapian search results");
 }
 
 # upgrade existing basic to medium
@@ -184,7 +184,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is($@, '', 'no error from indexing');
 	is_deeply(\@warn, [], 'no warnings');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
-	isnt(0, $mset->size, 'search OK after basic -> medium');
+	isnt($mset->size, 0, 'search OK after basic -> medium');
 }
 
 done_testing();
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 4d06c6ce6ea6..67d8be787a78 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -119,7 +119,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('"hello world"', {mset=>1});
-	isnt(0, $mset->size, "phrase search succeeds on indexlevel=full");
+	isnt($mset->size, 0, "phrase search succeeds on indexlevel=full");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 }
 
@@ -146,11 +146,11 @@ ok(!-d $xap, 'Xapian directories removed again');
 		# phrase searches still work
 		delete $ibx->{search};
 		my $mset = $ibx->search->query('"hello world"', {mset=>1});
-		is(0, $mset->size, 'phrase search does not work on medium');
+		is($mset->size, 0, 'phrase search does not work on medium');
 	}
 
 	my $mset = $ibx->search->query('hello world', {mset=>1});
-	isnt(0, $mset->size, "normal search works on indexlevel=medium");
+	isnt($mset->size, 0, "normal search works on indexlevel=medium");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{full} > $sizes{medium}, 'medium is smaller than full');
 }
@@ -173,7 +173,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('hello', {mset=>1});
-	is(0, $mset->size, "search fails on indexlevel='basic'");
+	is($mset->size, 0, "search fails on indexlevel='basic'");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{medium} > $sizes{basic}, 'basic is smaller than medium');
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (3 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works Eric W. Biederman
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Deeply inspect the entire message map in the reindexing tests
as the actual message order is significant and can result
in surprises.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 35 +++++++++++++++++++++++++++++++++++
 t/v2reindex.t | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index de4fafda5ae9..877e70390910 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -32,6 +32,7 @@ my $mime = PublicInbox::MIME->create(
 	body => "hello world\n",
 );
 my $minmax;
+my $msgmap;
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
@@ -57,6 +58,19 @@ my $minmax;
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+
+	my ($min, $max) = @$minmax;
+	$msgmap = $ibx->mm->msg_range(\$min, $max);
+	is_deeply($msgmap, [
+			  [1, '1@example.com' ],
+			  [2, '2@example.com' ],
+			  [3, '3@example.com' ],
+			  [6, '6@example.com' ],
+			  [7, '7@example.com' ],
+			  [8, '8@example.com' ],
+			  [9, '9@example.com' ],
+			  [10, '10@example.com' ],
+		  ], 'msgmap as expected');
 }
 
 {
@@ -67,6 +81,9 @@ my $minmax;
 	eval { $rw->index_sync({reindex => 1}) };
 	is($@, '', 'no error from reindexing');
 	$im->done;
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 my $xap = "$mainrepo/public-inbox/xapian".PublicInbox::Search::SCHEMA_VERSION();
@@ -85,6 +102,9 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -104,6 +124,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -123,6 +146,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = @$minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -145,6 +171,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'got Xapian search results');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
@@ -167,6 +196,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	is($mset->size, 0, "no Xapian search results");
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 # upgrade existing basic to medium
@@ -185,6 +217,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply(\@warn, [], 'no warnings');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'search OK after basic -> medium');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 done_testing();
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 67d8be787a78..31e61d5b2578 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -32,6 +32,7 @@ my $mime = PublicInbox::MIME->create(
 );
 local $ENV{NPROC} = 2;
 my $minmax;
+my $msgmap;
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
@@ -53,6 +54,19 @@ my $minmax;
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+
+	my ($min, $max) = @$minmax;
+	$msgmap = $ibx->mm->msg_range(\$min, $max);
+	is_deeply($msgmap, [
+			  [1, '1@example.com' ],
+			  [2, '2@example.com' ],
+			  [3, '3@example.com' ],
+			  [6, '6@example.com' ],
+			  [7, '7@example.com' ],
+			  [8, '8@example.com' ],
+			  [9, '9@example.com' ],
+			  [10, '10@example.com' ],
+		  ], 'msgmap as expected');
 }
 
 {
@@ -65,6 +79,9 @@ my $minmax;
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 my $xap = "$mainrepo/xap".PublicInbox::Search::SCHEMA_VERSION();
@@ -81,6 +98,9 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
@@ -99,6 +119,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 my %sizes;
@@ -121,6 +144,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my $mset = $ibx->search->query('"hello world"', {mset=>1});
 	isnt($mset->size, 0, "phrase search succeeds on indexlevel=full");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
@@ -153,6 +179,10 @@ ok(!-d $xap, 'Xapian directories removed again');
 	isnt($mset->size, 0, "normal search works on indexlevel=medium");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{full} > $sizes{medium}, 'medium is smaller than full');
+
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
@@ -176,6 +206,9 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is($mset->size, 0, "search fails on indexlevel='basic'");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
 	ok($sizes{medium} > $sizes{basic}, 'basic is smaller than medium');
+
+	my ($min, $max) = $ibx->mm->minmax;
+	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
 done_testing();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (4 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing Eric W. Biederman
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Capture interesting commits of the test repository in mark variables.

Use those marks to build interesting scenarios where index_sync proceeds
as if those marks are the heads of the repositor.  Use this capability to
test what happens when adds and deletes are mixed within a repository.

Be sad because things don't yet work as they should.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 194 +++++++++++++++++++++++++++++++++++++++++++++++++
 t/v2reindex.t | 195 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 389 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 877e70390910..8e78aa761333 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -33,6 +33,7 @@ my $mime = PublicInbox::MIME->create(
 );
 my $minmax;
 my $msgmap;
+my ($mark1, $mark2, $mark3, $mark4);
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
@@ -41,13 +42,17 @@ my $msgmap;
 		$mime->header_set('Message-Id', "<$i\@example.com>");
 		ok($im->add($mime), "message $i added");
 		if ($i == 4) {
+			$mark1 = $im->get_mark($im->{tip});
 			$im->remove($mime);
+			$mark2 = $im->get_mark($im->{tip});
 		}
 	}
 
 	if ('test remove later') {
+		$mark3 = $im->get_mark($im->{tip});
 		$mime->header_set('Message-Id', "<5\@example.com>");
 		$im->remove($mime);
+		$mark4 = $im->get_mark($im->{tip});
 	}
 
 	$im->done;
@@ -222,4 +227,193 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
+# An incremental indexing test
+ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark1 4 simple additions in the same index_sync
+	eval { $rw->index_sync({ref => $mark1}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 4, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [4, '4@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark2 A delete separated form and add in the same index_sync
+	eval { $rw->index_sync({ref => $mark2}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark3 adds following the delete at mark2
+	eval { $rw->index_sync({ref => $mark3}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark4 A delete of an older message
+	eval { $rw->index_sync({ref => $mark4}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
+
+# Another incremental indexing test
+ok(unlink "$mainrepo/public-inbox/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark2 an add and it's delete in the same index_sync
+	eval { $rw->index_sync({ref => $mark2}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark3 adds following the delete at mark2
+	eval { $rw->index_sync({ref => $mark3}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
+	my $rw = PublicInbox::SearchIdx->new($ibx, 1);
+	# mark4 A delete of an older message
+	eval { $rw->index_sync({ref => $mark4}) };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
+
 done_testing();
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 31e61d5b2578..ce13a07c39c0 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -33,21 +33,27 @@ my $mime = PublicInbox::MIME->create(
 local $ENV{NPROC} = 2;
 my $minmax;
 my $msgmap;
+my ($mark1, $mark2, $mark3, $mark4);
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
 	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	my $im0 = $im->importer();
 	foreach my $i (1..10) {
 		$mime->header_set('Message-Id', "<$i\@example.com>");
 		ok($im->add($mime), "message $i added");
 		if ($i == 4) {
+			$mark1 = $im0->get_mark($im0->{tip});
 			$im->remove($mime);
+			$mark2 = $im0->get_mark($im0->{tip});
 		}
 	}
 
 	if ('test remove later') {
+		$mark3 = $im0->get_mark($im0->{tip});
 		$mime->header_set('Message-Id', "<5\@example.com>");
 		$im->remove($mime);
+		$mark4 = $im0->get_mark($im0->{tip});
 	}
 
 	$im->done;
@@ -211,4 +217,193 @@ ok(!-d $xap, 'Xapian directories removed again');
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
 
+
+# An incremental indexing test
+ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark1 4 simple additions in the same index_sync
+	$ibx->{ref_head} = $mark1;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 4, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [4, '4@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark2 A delete separated from an add in the same index_sync
+	$ibx->{ref_head} = $mark2;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark3 adds following the delete at mark2
+	$ibx->{ref_head} = $mark3;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark4 A delete of an older message
+	$ibx->{ref_head} = $mark4;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
+
+# Another incremental indexing test
+ok(unlink "$mainrepo/msgmap.sqlite3", 'remove msgmap');
+remove_tree($xap);
+ok(!-d $xap, 'Xapian directories removed again');
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark2 an add and it's delete in the same index_sync
+	$ibx->{ref_head} = $mark2;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 3, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark3 adds following the delete at mark2
+	$ibx->{ref_head} = $mark3;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [5, '5@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+{
+	my @warn;
+	local $SIG{__WARN__} = sub { push @warn, @_ };
+	my %config = %$ibx_config;
+	my $ibx = PublicInbox::Inbox->new(\%config);
+	# mark4 A delete of an older message
+	$ibx->{ref_head} = $mark4;
+	my $im = PublicInbox::V2Writable->new($ibx);
+	eval { $im->index_sync() };
+	is($@, '', 'no error from reindexing without msgmap');
+	is_deeply(\@warn, [], 'no warnings');
+	$im->done;
+	my ($min, $max) = $ibx->mm->minmax;
+	is($min, 1, 'min as expected');
+	is($max, 10, 'max as expected');
+	is_deeply($ibx->mm->msg_range(\$min, $max),
+		  [
+		   [1, '1@example.com' ],
+		   [2, '2@example.com' ],
+		   [3, '3@example.com' ],
+		   [6, '6@example.com' ],
+		   [7, '7@example.com' ],
+		   [8, '8@example.com' ],
+		   [9, '9@example.com' ],
+		   [10, '10@example.com' ],
+		  ], 'msgmap as expected' );
+}
+
 done_testing();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (5 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

When walking messages newest to oldest, assigning the larger numbers
before smaller numbers ensures older messages get smaller numbers.
This leads to the possibility of a msgmap that can be regenerated when
needed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/SearchIdx.pm | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 1d259a8642a3..ac821ac00c6a 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -619,23 +619,28 @@ sub _git_log {
 	my ($self, $range) = @_;
 	my $git = $self->{git};
 
+	# Count the new files so they can be added newest to oldest
+	# and still have numbers increasing from oldest to newest
+	my $fcount = 0;
+	# can't use 'rev-list --count' if we use --diff-filter
+	my $fh = $git->popen(qw(log --pretty=tformat:%h
+			     --no-notes --no-color --no-renames
+			     --diff-filter=AM), $range);
+	++$fcount while <$fh>;
+	my (undef, $max) = $self->{mm}->minmax;
+
 	if (index($range, '..') < 0) {
-		my $regen_max = 0;
-		# can't use 'rev-list --count' if we use --diff-filter
-		my $fh = $git->popen(qw(log --pretty=tformat:%h
-				--no-notes --no-color --no-renames
-				--diff-filter=AM), $range);
-		++$regen_max while <$fh>;
-		my (undef, $max) = $self->{mm}->minmax;
-
-		if ($max && $max == $regen_max) {
+		if ($max && $max == $fcount) {
 			# fix up old bugs in full indexes which caused messages to
 			# not appear in Msgmap
 			$self->{regen_up} = $max;
 		} else {
 			# normal regen is for for fresh data
-			$self->{regen_down} = $regen_max;
+			$self->{regen_down} = $fcount;
 		}
+	} else {
+		# Give oldest messages the smallest numbers
+		$self->{regen_down} = $max + $fcount;
 	}
 
 	$git->popen(qw/log --no-notes --no-color --no-renames
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (6 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-02  3:00   ` Eric Wong
  2018-08-01 16:43 ` [PATCH 09/13] t/v[12]reindex.t Verify num_highwater Eric W. Biederman
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Today the only thing that prevents public-inbox not reusing the
message numbers of deleted messages is the sqlite autoincrement magic
and that only works part of the time.  The new incremental indexing
test has revealed areas where today public-inbox does try to reuse
numbers of deleted messages.

Reusing the message numbers of existing messages is a problem because
if a client ever sees messages that are subsequently deleted the
client will not see the new messages with their old numbers.

In practice this is difficult to trigger because it requires the most
recently added message to be removed and have the removal show up in a
separate pull request.  Still it can happen and it should be handled.

Instead of infering the highset number ever used by finding the maximum
number in the message map, track the largest number ever assigned directly.

Update Msgmap to track this value and update the indexers to use this
value.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/Msgmap.pm     | 23 +++++++++++++++++++++--
 lib/PublicInbox/SearchIdx.pm  |  8 ++++----
 lib/PublicInbox/V2Writable.pm |  4 ++--
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index fdc71e46e214..b79e4cde30b2 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -51,6 +51,10 @@ sub new_file {
 		$dbh->begin_work;
 		$self->created_at(time) unless $self->created_at;
 		$dbh->commit;
+
+		my (undef, $max) = $self->minmax();
+		$max ||= 0;
+		$self->num_highwater($max);
 	}
 	$self;
 }
@@ -107,6 +111,17 @@ sub created_at {
 	$self->meta_accessor('created_at', $second);
 }
 
+sub num_highwater {
+	my ($self, $num) = @_;
+	my $high = $self->{num_highwater} ||=
+	    $self->meta_accessor('num_highwater');
+	if (defined($num) && (!defined($high) || ($num > $high))) {
+		$self->{num_highwater} = $num;
+		$self->meta_accessor('num_highwater', $num);
+	}
+	$self->{num_highwater};
+}
+
 sub mid_insert {
 	my ($self, $mid) = @_;
 	my $dbh = $self->{dbh};
@@ -114,7 +129,9 @@ sub mid_insert {
 INSERT OR IGNORE INTO msgmap (mid) VALUES (?)
 
 	return if $sth->execute($mid) == 0;
-	$dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	my $num = $dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	$self->num_highwater($num) unless !defined($num);
+	$num;
 }
 
 sub mid_for {
@@ -213,7 +230,9 @@ sub mid_set {
 		$self->{dbh}->prepare(
 			'INSERT OR IGNORE INTO msgmap (num,mid) VALUES (?,?)');
 	};
-	$sth->execute($num, $mid);
+	my $result = $sth->execute($num, $mid);
+	$self->num_highwater($num) if (defined($result) && $result == 1);
+	$result;
 }
 
 sub DESTROY {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index ac821ac00c6a..2532c8dfd10d 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -627,20 +627,20 @@ sub _git_log {
 			     --no-notes --no-color --no-renames
 			     --diff-filter=AM), $range);
 	++$fcount while <$fh>;
-	my (undef, $max) = $self->{mm}->minmax;
+	my $high = $self->{mm}->num_highwater;
 
 	if (index($range, '..') < 0) {
-		if ($max && $max == $fcount) {
+		if ($high && $high == $fcount) {
 			# fix up old bugs in full indexes which caused messages to
 			# not appear in Msgmap
-			$self->{regen_up} = $max;
+			$self->{regen_up} = $high;
 		} else {
 			# normal regen is for for fresh data
 			$self->{regen_down} = $fcount;
 		}
 	} else {
 		# Give oldest messages the smallest numbers
-		$self->{regen_down} = $max + $fcount;
+		$self->{regen_down} = $high + $fcount;
 	}
 
 	$git->popen(qw/log --no-notes --no-color --no-renames
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 934640eb672c..c450980c8f51 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -879,9 +879,9 @@ sub index_sync {
 	my $mm_tmp = $self->{mm}->tmp_clone;
 	my $ranges = $opts->{reindex} ? [] : $self->last_commits($epoch_max);
 
-	my ($min, $max) = $mm_tmp->minmax;
+	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
-	$$regen += $max if $max;
+	$$regen += $high if $high;
 	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 09/13] t/v[12]reindex.t Verify num_highwater
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (7 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected Eric W. Biederman
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 7 +++++++
 t/v2reindex.t | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 8e78aa761333..876c9db3441a 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -246,6 +246,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 4, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -269,6 +270,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -291,6 +293,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -319,6 +322,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -352,6 +356,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -374,6 +379,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -402,6 +408,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
diff --git a/t/v2reindex.t b/t/v2reindex.t
index ce13a07c39c0..3d217aff3c72 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -237,6 +237,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 4, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -260,6 +261,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -282,6 +284,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -310,6 +313,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -343,6 +347,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 3, 'max as expected');
+	is($ibx->mm->num_highwater, 4, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -365,6 +370,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
@@ -393,6 +399,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my ($min, $max) = $ibx->mm->minmax;
 	is($min, 1, 'min as expected');
 	is($max, 10, 'max as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	is_deeply($ibx->mm->msg_range(\$min, $max),
 		  [
 		   [1, '1@example.com' ],
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (8 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 09/13] t/v[12]reindex.t Verify num_highwater Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes Eric W. Biederman
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Instrument the tests to verify the highwater num highwater mark is
where it is expected.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/v1reindex.t | 10 ++++++++++
 t/v2reindex.t |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/t/v1reindex.t b/t/v1reindex.t
index 876c9db3441a..8be95149723c 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -63,6 +63,7 @@ my ($mark1, $mark2, $mark3, $mark4);
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = @$minmax;
 	$msgmap = $ibx->mm->msg_range(\$min, $max);
@@ -87,6 +88,8 @@ my ($mark1, $mark2, $mark3, $mark4);
 	is($@, '', 'no error from reindexing');
 	$im->done;
 
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
+
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
@@ -107,6 +110,7 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -129,6 +133,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -151,6 +156,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = @$minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -174,6 +180,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'got Xapian search results');
 
@@ -199,6 +206,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	is($mset->size, 0, "no Xapian search results");
 
@@ -223,6 +231,8 @@ ok(!-d $xap, 'Xapian directories removed again');
 	my $mset = $ibx->search->reopen->query('hello world', {mset=>1});
 	isnt($mset->size, 0, 'search OK after basic -> medium');
 
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
+
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
 }
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 3d217aff3c72..a5454a22abf5 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -60,6 +60,7 @@ my ($mark1, $mark2, $mark3, $mark4);
 	$minmax = [ $ibx->mm->minmax ];
 	ok(defined $minmax->[0] && defined $minmax->[1], 'minmax defined');
 	is_deeply($minmax, [ 1, 10 ], 'minmax as expected');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = @$minmax;
 	$msgmap = $ibx->mm->msg_range(\$min, $max);
@@ -85,6 +86,7 @@ my ($mark1, $mark2, $mark3, $mark4);
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -104,6 +106,7 @@ ok(!-d $xap, 'Xapian directories removed');
 
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -125,6 +128,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	my ($min, $max) = $ibx->mm->minmax;
 	is_deeply($ibx->mm->msg_range(\$min, $max), $msgmap, 'msgmap unchanged');
@@ -147,6 +151,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->query('"hello world"', {mset=>1});
 	isnt($mset->size, 0, "phrase search succeeds on indexlevel=full");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
@@ -172,6 +177,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 
 	if (0) {
 		# not sure why, but Xapian seems to fallback to terms and
@@ -208,6 +214,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	ok(-d $xap, 'Xapian directories recreated');
 	delete $ibx->{mm};
 	is_deeply([ $ibx->mm->minmax ], $minmax, 'minmax unchanged');
+	is($ibx->mm->num_highwater, 10, 'num_highwater as expected');
 	my $mset = $ibx->search->query('hello', {mset=>1});
 	is($mset->size, 0, "search fails on indexlevel='basic'");
 	for (<"$xap/*/*">) { $sizes{$ibx->{indexlevel}} += -s _ if -f $_ }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (9 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap Eric W. Biederman
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

When performing an incremental index update with index_sync if a message is seen
to be both added and deleted update the num_highwater mark even though the
message is not otherwise indexed.

This ensures index_sync generates the same msgmap no matter which commit
it stops at during incremental syncs.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/SearchIdx.pm  |  3 ++-
 lib/PublicInbox/V2Writable.pm | 10 ++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 2532c8dfd10d..54f82aa8ee4e 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -583,7 +583,8 @@ sub read_log {
 			my $blob = $1;
 			if (delete $D{$blob}) {
 				if (defined $self->{regen_down}) {
-					$self->{regen_down}--;
+					my $num = $self->{regen_down}--;
+					$self->{mm}->num_highwater($num);
 				}
 				next;
 			}
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index c450980c8f51..92d2672c78c4 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -658,7 +658,7 @@ sub mark_deleted {
 }
 
 sub reindex_oid {
-	my ($self, $mm_tmp, $D, $git, $oid, $regen) = @_;
+	my ($self, $mm_tmp, $D, $git, $oid, $regen, $reindex) = @_;
 	my $len;
 	my $msgref = $git->cat_file($oid, \$len);
 	my $mime = PublicInbox::MIME->new($$msgref);
@@ -700,7 +700,8 @@ sub reindex_oid {
 
 	if (!defined($mid0) || $del) {
 		if (!defined($mid0) && $del) { # expected for deletes
-			$$regen--;
+			$num = $$regen--;
+			$self->{mm}->num_highwater($num) unless $reindex;
 			return
 		}
 
@@ -877,7 +878,8 @@ sub index_sync {
 	return unless defined $latest;
 	$self->idx_init; # acquire lock
 	my $mm_tmp = $self->{mm}->tmp_clone;
-	my $ranges = $opts->{reindex} ? [] : $self->last_commits($epoch_max);
+	my $reindex = $opts->{reindex};
+	my $ranges = $reindex ? [] : $self->last_commits($epoch_max);
 
 	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
@@ -903,7 +905,7 @@ sub index_sync {
 				chomp($cmt = $_);
 			} elsif (/\A:\d{6} 100644 $x40 ($x40) [AM]\tm$/o) {
 				$self->reindex_oid($mm_tmp, $D, $git, $1,
-						$regen);
+						$regen, $reindex);
 			} elsif (/\A:\d{6} 100644 $x40 ($x40) [AM]\td$/o) {
 				$self->mark_deleted($D, $git, $1);
 			}
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (10 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  2018-08-01 16:43 ` [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap Eric W. Biederman
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

The lookup to see if a num has already been assigned to a message
happens in a temporary copy of message map.  It is possible that the
number has been removed from the current message map.  The
unindex/reindex after a history rewrite triggered by a purge should be
one such case.  Therefore add the number to the msgmap in case it is
not currently present.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/V2Writable.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 92d2672c78c4..4dd14331a78f 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -676,6 +676,7 @@ sub reindex_oid {
 		if (defined $n && $n > $num) {
 			$mid0 = $mid;
 			$num = $n;
+			$self->{mm}->mid_set($num, $mid0);
 		}
 	}
 	if (!defined($mid0) && $regen && !$del) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap
  2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
                   ` (11 preceding siblings ...)
  2018-08-01 16:43 ` [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap Eric W. Biederman
@ 2018-08-01 16:43 ` Eric W. Biederman
  12 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Now that we track the num highwater mark it is safe to remove messages
from msgmap that have been previously allocated.  Removing even the
highest numbered article will no longer cause new message numbers to
move backwards.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/V2Writable.pm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 4dd14331a78f..0396d9f58e55 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -842,7 +842,10 @@ sub unindex_oid {
 			warn "BUG: multiple articles linked to $oid\n",
 				join(',',sort keys %gone), "\n";
 		}
-		$self->{unindexed}->{$_}++ foreach keys %gone;
+		foreach my $num (keys %gone) {
+			$self->{unindexed}->{$_}++;
+			$self->{mm}->num_delete($num);
+		}
 		$self->unindex_oid_remote($oid, $mid);
 	}
 }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
@ 2018-08-02  3:00   ` Eric Wong
  2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
  2018-08-02 12:08     ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned ebiederm
  0 siblings, 2 replies; 24+ messages in thread
From: Eric Wong @ 2018-08-02  3:00 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> --- a/lib/PublicInbox/Msgmap.pm
> +++ b/lib/PublicInbox/Msgmap.pm
> @@ -51,6 +51,10 @@ sub new_file {
>  		$dbh->begin_work;
>  		$self->created_at(time) unless $self->created_at;
>  		$dbh->commit;
> +
> +		my (undef, $max) = $self->minmax();
> +		$max ||= 0;
> +		$self->num_highwater($max);

I think the calls to minmax and num_highwater should be in the
transaction right above it.  I can squash locally if that's the
only thing...

Everything else looks good, so far; thanks.


While testing this, it looks like I introduced a bug to
indexlevel=basic which broke incremental indexing when I made it
possible to upgrade to (medium|full).  Patch coming for that in
a bit...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02  3:00   ` Eric Wong
@ 2018-08-02  3:44     ` Eric Wong
  2018-08-02 12:25       ` ebiederm
  2019-05-14  2:04       ` [PATCH 0/3] incremental index fixes for indexlevel=basic Eric Wong
  2018-08-02 12:08     ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned ebiederm
  1 sibling, 2 replies; 24+ messages in thread
From: Eric Wong @ 2018-08-02  3:44 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

I wrote:
> While testing this, it looks like I introduced a bug to
> indexlevel=basic which broke incremental indexing when I made it
> possible to upgrade to (medium|full).  Patch coming for that in
> a bit...

Eep, I think there's deeper problems with indexlevel=basic and
incremental updates, since it's still doing lookups against
Xapian for deletes...

This is my work-in-progress to stop full-reindexing, at least.

----8<-----
Subject: [PATCH] searchidx: support incremental indexing on indexlevel=basic

---
 lib/PublicInbox/SearchIdx.pm | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 54f82aa..5cac08f 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -668,6 +668,15 @@ sub need_update ($$$) {
 	($n eq '' || $n > 0);
 }
 
+sub _last_x_commit {
+	my ($self) = @_;
+	if ($self->{indexlevel} =~ $xapianlevels) {
+		$self->{xdb}->get_metadata('last_commit');
+	} else {
+		$self->{mm}->last_commit || '';
+	}
+}
+
 # indexes all unindexed messages (v1 only)
 sub _index_sync {
 	my ($self, $opts) = @_;
@@ -682,7 +691,7 @@ sub _index_sync {
 	do {
 		$xlog = undef;
 		$mkey = 'last_commit';
-		$last_commit = $xdb->get_metadata('last_commit');
+		$last_commit = $self->_last_x_commit;
 		$lx = $last_commit;
 		if ($reindex) {
 			$lx = '';
@@ -707,7 +716,7 @@ sub _index_sync {
 		$xlog = _git_log($self, $range);
 
 		$xdb = $self->begin_txn_lazy;
-	} while ($xdb->get_metadata('last_commit') ne $last_commit);
+	} while ($self->_last_x_commit ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
 	my $cb = sub {
-- 
EW

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-02  3:00   ` Eric Wong
  2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
@ 2018-08-02 12:08     ` ebiederm
  1 sibling, 0 replies; 24+ messages in thread
From: ebiederm @ 2018-08-02 12:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> --- a/lib/PublicInbox/Msgmap.pm
>> +++ b/lib/PublicInbox/Msgmap.pm
>> @@ -51,6 +51,10 @@ sub new_file {
>>  		$dbh->begin_work;
>>  		$self->created_at(time) unless $self->created_at;
>>  		$dbh->commit;
>> +
>> +		my (undef, $max) = $self->minmax();
>> +		$max ||= 0;
>> +		$self->num_highwater($max);
>
> I think the calls to minmax and num_highwater should be in the
> transaction right above it.  I can squash locally if that's the
> only thing...

Sounds good.  With autocommit being on by default and repository locking
happening I was not at all certain where transaction boundaries need to
go (or if they are really needed).  But by it's very nature
num_highwater needs to be in the same transaction as whatever adds a new
higher number into msgmap.

> Everything else looks good, so far; thanks.

Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
@ 2018-08-02 12:25       ` ebiederm
  2018-08-02 17:12         ` ebiederm
  2019-05-14  2:04       ` [PATCH 0/3] incremental index fixes for indexlevel=basic Eric Wong
  1 sibling, 1 reply; 24+ messages in thread
From: ebiederm @ 2018-08-02 12:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> I wrote:
>> While testing this, it looks like I introduced a bug to
>> indexlevel=basic which broke incremental indexing when I made it
>> possible to upgrade to (medium|full).  Patch coming for that in
>> a bit...
>
> Eep, I think there's deeper problems with indexlevel=basic and
> incremental updates, since it's still doing lookups against
> Xapian for deletes...
>
> This is my work-in-progress to stop full-reindexing, at least.

So I don't think your change is quite right.

If the repo has only lived with indexlevel == 'basic' then last_commit
should always match the output of xdb->get_metadata('last_commit') as
it will always been undef or ''.

In fact I don't see any code that is updating
$xdb->get_metadata('last_commit');

Am I wrong or can we just always use mm->last_commit for this logic?

Eric

> ----8<-----
> Subject: [PATCH] searchidx: support incremental indexing on indexlevel=basic
>
> ---
>  lib/PublicInbox/SearchIdx.pm | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
> index 54f82aa..5cac08f 100644
> --- a/lib/PublicInbox/SearchIdx.pm
> +++ b/lib/PublicInbox/SearchIdx.pm
> @@ -668,6 +668,15 @@ sub need_update ($$$) {
>  	($n eq '' || $n > 0);
>  }
>  
> +sub _last_x_commit {
> +	my ($self) = @_;
> +	if ($self->{indexlevel} =~ $xapianlevels) {
> +		$self->{xdb}->get_metadata('last_commit');
> +	} else {
> +		$self->{mm}->last_commit || '';
> +	}
> +}
> +
>  # indexes all unindexed messages (v1 only)
>  sub _index_sync {
>  	my ($self, $opts) = @_;
> @@ -682,7 +691,7 @@ sub _index_sync {
>  	do {
>  		$xlog = undef;
>  		$mkey = 'last_commit';
> -		$last_commit = $xdb->get_metadata('last_commit');
> +		$last_commit = $self->_last_x_commit;
>  		$lx = $last_commit;
>  		if ($reindex) {
>  			$lx = '';
> @@ -707,7 +716,7 @@ sub _index_sync {
>  		$xlog = _git_log($self, $range);
>  
>  		$xdb = $self->begin_txn_lazy;
> -	} while ($xdb->get_metadata('last_commit') ne $last_commit);
> +	} while ($self->_last_x_commit ne $last_commit);
>  
>  	my $dbh = $mm->{dbh} if $mm;
>  	my $cb = sub {

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02 12:25       ` ebiederm
@ 2018-08-02 17:12         ` ebiederm
  2018-08-02 18:15           ` ebiederm
  0 siblings, 1 reply; 24+ messages in thread
From: ebiederm @ 2018-08-02 17:12 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

ebiederm@xmission.com (Eric W. Biederman) writes:

> Eric Wong <e@80x24.org> writes:
>
>> I wrote:
>>> While testing this, it looks like I introduced a bug to
>>> indexlevel=basic which broke incremental indexing when I made it
>>> possible to upgrade to (medium|full).  Patch coming for that in
>>> a bit...
>>
>> Eep, I think there's deeper problems with indexlevel=basic and
>> incremental updates, since it's still doing lookups against
>> Xapian for deletes...
>>
>> This is my work-in-progress to stop full-reindexing, at least.
>
> So I don't think your change is quite right.
>
> If the repo has only lived with indexlevel == 'basic' then last_commit
> should always match the output of xdb->get_metadata('last_commit') as
> it will always been undef or ''.
>
> In fact I don't see any code that is updating
> $xdb->get_metadata('last_commit');
>
> Am I wrong or can we just always use mm->last_commit for this logic?

Yes.  I was wrong.  I had overlooked the get_metadata($mkey) and
set_metadata($mkey) calls.   From just below but those are all
with $mkey == 'last_commit'  So 'last_commit' does get updated.

I think we want something more like below.  Where _last_x_commit
just computes the last_commit value to use, using both values of
last_commit.

I am not 100% about the loop.  Perhaps that check to see if either
value of last_commit changes?

I definitely don't think when we already have one set of logic
for substituting in last_commit we should be adding in another.
That gets confusing very fast.

Eric


diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 198f39b96e6c..130e5a41c5e7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -669,6 +669,23 @@ sub need_update ($$$) {
 	($n eq '' || $n > 0);
 }
 
+sub _last_x_commit
+{
+	my ($self) = @_;
+        my $lm = $mm->last_commit || '';
+	my $lx = '';
+	if ($self->{indexlevel} ~= $xapianlevels) {
+		$lx = $self->{xdb}->get_metadata('last_commit') || '';
+        } else {
+		$lx = $lm;
+	}
+	# Use last_commit from msgmap if it is older or unset
+	if (!$lm || ($lx && $lx && is_ancestor($git, $lm, $lx))) {
+		$lx = $lm;
+	}
+	$lx;
+}
+
 # indexes all unindexed messages (v1 only)
 sub _index_sync {
 	my ($self, $opts) = @_;
@@ -682,19 +699,8 @@ sub _index_sync {
 	my $mm = _msgmap_init($self);
 	do {
 		$xlog = undef;
-		$mkey = 'last_commit';
-		$last_commit = $xdb->get_metadata('last_commit');
-		$lx = $last_commit;
-		if ($reindex) {
-			$lx = '';
-			$mkey = undef if $last_commit ne '';
-		}
-
-		# use last_commit from msgmap if it is older or unset
-		my $lm = $mm->last_commit || '';
-		if (!$lm || ($lm && $lx && is_ancestor($git, $lm, $lx))) {
-			$lx = $lm;
-		}
+		$last_commit = $self->_last_x_commit;
+		$lx = !$reindex ? $last_commit : '';
 
 		$self->{over}->rollback_lazy;
 		$self->{over}->disconnect;
@@ -708,7 +714,7 @@ sub _index_sync {
 		$xlog = _git_log($self, $range);
 
 		$xdb = $self->begin_txn_lazy;
-	} while ($xdb->get_metadata('last_commit') ne $last_commit);
+	} while ($self->_last_x_commit ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
 	my $cb = sub {
@@ -722,7 +728,7 @@ sub _index_sync {
 			}
 			$dbh->commit;
 		}
-		if ($mkey && $newest && $self->{indexlevel} =~ $xapianlevels) {
+		if ($newest && $self->{indexlevel} =~ $xapianlevels) {
 			my $cur = $xdb->get_metadata($mkey);
 			if (need_update($self, $cur, $newest)) {
 				$xdb->set_metadata($mkey, $newest);


>> ----8<-----
>> Subject: [PATCH] searchidx: support incremental indexing on indexlevel=basic
>>
>> ---
>>  lib/PublicInbox/SearchIdx.pm | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
>> index 54f82aa..5cac08f 100644
>> --- a/lib/PublicInbox/SearchIdx.pm
>> +++ b/lib/PublicInbox/SearchIdx.pm
>> @@ -668,6 +668,15 @@ sub need_update ($$$) {
>>  	($n eq '' || $n > 0);
>>  }
>>  
>> +sub _last_x_commit {
>> +	my ($self) = @_;
>> +	if ($self->{indexlevel} =~ $xapianlevels) {
>> +		$self->{xdb}->get_metadata('last_commit');
>> +	} else {
>> +		$self->{mm}->last_commit || '';
>> +	}
>> +}
>> +
>>  # indexes all unindexed messages (v1 only)
>>  sub _index_sync {
>>  	my ($self, $opts) = @_;
>> @@ -682,7 +691,7 @@ sub _index_sync {
>>  	do {
>>  		$xlog = undef;
>>  		$mkey = 'last_commit';
>> -		$last_commit = $xdb->get_metadata('last_commit');
>> +		$last_commit = $self->_last_x_commit;
>>  		$lx = $last_commit;
>>  		if ($reindex) {
>>  			$lx = '';
>> @@ -707,7 +716,7 @@ sub _index_sync {
>>  		$xlog = _git_log($self, $range);
>>  
>>  		$xdb = $self->begin_txn_lazy;
>> -	} while ($xdb->get_metadata('last_commit') ne $last_commit);
>> +	} while ($self->_last_x_commit ne $last_commit);
>>  
>>  	my $dbh = $mm->{dbh} if $mm;
>>  	my $cb = sub {


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [WIP] searchidx: support incremental indexing on indexlevel=basic
  2018-08-02 17:12         ` ebiederm
@ 2018-08-02 18:15           ` ebiederm
  0 siblings, 0 replies; 24+ messages in thread
From: ebiederm @ 2018-08-02 18:15 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

ebiederm@xmission.com (Eric W. Biederman) writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Eric Wong <e@80x24.org> writes:
>>
>>> I wrote:
>>>> While testing this, it looks like I introduced a bug to
>>>> indexlevel=basic which broke incremental indexing when I made it
>>>> possible to upgrade to (medium|full).  Patch coming for that in
>>>> a bit...
>>>
>>> Eep, I think there's deeper problems with indexlevel=basic and
>>> incremental updates, since it's still doing lookups against
>>> Xapian for deletes...
>>>
>>> This is my work-in-progress to stop full-reindexing, at least.
>>
>> So I don't think your change is quite right.
>>
>> If the repo has only lived with indexlevel == 'basic' then last_commit
>> should always match the output of xdb->get_metadata('last_commit') as
>> it will always been undef or ''.
>>
>> In fact I don't see any code that is updating
>> $xdb->get_metadata('last_commit');
>>
>> Am I wrong or can we just always use mm->last_commit for this logic?
>
> Yes.  I was wrong.  I had overlooked the get_metadata($mkey) and
> set_metadata($mkey) calls.   From just below but those are all
> with $mkey == 'last_commit'  So 'last_commit' does get updated.
>
> I think we want something more like below.  Where _last_x_commit
> just computes the last_commit value to use, using both values of
> last_commit.
>
> I am not 100% about the loop.  Perhaps that check to see if either
> value of last_commit changes?
>
> I definitely don't think when we already have one set of logic
> for substituting in last_commit we should be adding in another.
> That gets confusing very fast.
>
Make that something like this where we remove the $mkey variable 
and all of it's special semantics with regards to reindexing entirely.

Eric


diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 198f39b96e6c..8b12da0436a4 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -669,6 +669,23 @@ sub need_update ($$$) {
 	($n eq '' || $n > 0);
 }
 
+sub _last_x_commit
+{
+	my ($self) = @_;
+        my $lm = $mm->last_commit || '';
+	my $lx = '';
+	if ($self->{indexlevel} ~= $xapianlevels) {
+		$lx = $self->{xdb}->get_metadata('last_commit') || '';
+        } else {
+		$lx = $lm;
+	}
+	# Use last_commit from msgmap if it is older or unset
+	if (!$lm || ($lx && $lx && is_ancestor($git, $lm, $lx))) {
+		$lx = $lm;
+	}
+	$lx;
+}
+
 # indexes all unindexed messages (v1 only)
 sub _index_sync {
 	my ($self, $opts) = @_;
@@ -682,19 +699,8 @@ sub _index_sync {
 	my $mm = _msgmap_init($self);
 	do {
 		$xlog = undef;
-		$mkey = 'last_commit';
-		$last_commit = $xdb->get_metadata('last_commit');
-		$lx = $last_commit;
-		if ($reindex) {
-			$lx = '';
-			$mkey = undef if $last_commit ne '';
-		}
-
-		# use last_commit from msgmap if it is older or unset
-		my $lm = $mm->last_commit || '';
-		if (!$lm || ($lm && $lx && is_ancestor($git, $lm, $lx))) {
-			$lx = $lm;
-		}
+		$last_commit = $self->_last_x_commit;
+		$lx = !$reindex ? $last_commit : '';
 
 		$self->{over}->rollback_lazy;
 		$self->{over}->disconnect;
@@ -708,7 +714,7 @@ sub _index_sync {
 		$xlog = _git_log($self, $range);
 
 		$xdb = $self->begin_txn_lazy;
-	} while ($xdb->get_metadata('last_commit') ne $last_commit);
+	} while ($self->_last_x_commit ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
 	my $cb = sub {
@@ -722,10 +728,10 @@ sub _index_sync {
 			}
 			$dbh->commit;
 		}
-		if ($mkey && $newest && $self->{indexlevel} =~ $xapianlevels) {
-			my $cur = $xdb->get_metadata($mkey);
+		if ($newest && $self->{indexlevel} =~ $xapianlevels) {
+			my $cur = $xdb->get_metadata('last_commit');
 			if (need_update($self, $cur, $newest)) {
-				$xdb->set_metadata($mkey, $newest);
+				$xdb->set_metadata('last_commit', $newest);
 			}
 		}
 		$self->commit_txn_lazy;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/3] incremental index fixes for indexlevel=basic
  2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
  2018-08-02 12:25       ` ebiederm
@ 2019-05-14  2:04       ` Eric Wong
  2019-05-14  2:04         ` [PATCH 1/3] v1writable: new wrapper which is closer to v2writable Eric Wong
                           ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Eric Wong @ 2019-05-14  2:04 UTC (permalink / raw)
  To: meta; +Cc: Eric W. Biederman

Resurrecting an old issue from last year I completely forgot
about :x  Thanks to Eric Biederman for the cleanup to 3/3
to simplify the v1 indexing logic.

The first two are prepatory patches which make tests
easier-to-write; and I have plans to cleanup and cull
redundancies in tests because they take too long.

Eric Wong (3):
  v1writable: new wrapper which is closer to v2writable
  v2writable: allow setting nproc via creat options
  searchidx: fix incremental index with indexlevel=basic on v1

 MANIFEST                      |   2 +
 lib/PublicInbox/Import.pm     |  15 +++-
 lib/PublicInbox/OverIdx.pm    |   9 ++-
 lib/PublicInbox/SearchIdx.pm  |  68 +++++++++++-------
 lib/PublicInbox/V1Writable.pm |  34 +++++++++
 lib/PublicInbox/V2Writable.pm |  19 ++++--
 script/public-inbox-index     |   3 +-
 script/public-inbox-init      |  23 +++----
 t/indexlevels-mirror.t        | 125 ++++++++++++++++++++++++++++++++++
 t/purge.t                     |   3 +-
 t/v2reindex.t                 |   3 +-
 t/v2writable.t                |   8 +--
 12 files changed, 252 insertions(+), 60 deletions(-)
 create mode 100644 lib/PublicInbox/V1Writable.pm
 create mode 100644 t/indexlevels-mirror.t

-- 
EW

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] v1writable: new wrapper which is closer to v2writable
  2019-05-14  2:04       ` [PATCH 0/3] incremental index fixes for indexlevel=basic Eric Wong
@ 2019-05-14  2:04         ` Eric Wong
  2019-05-14  2:04         ` [PATCH 2/3] v2writable: allow setting nproc via creat options Eric Wong
  2019-05-14  2:04         ` [PATCH 3/3] searchidx: fix incremental index with indexlevel=basic on v1 Eric Wong
  2 siblings, 0 replies; 24+ messages in thread
From: Eric Wong @ 2019-05-14  2:04 UTC (permalink / raw)
  To: meta; +Cc: Eric W. Biederman

Import initialization is a little strange from history, but we
also can't change it too much because it's technically a public
API which external code may rely on...

And we may need to support v1 repos indefinitely.  This should
make it easier to write tests for both formats.
---
 MANIFEST                      |  1 +
 lib/PublicInbox/Import.pm     | 15 ++++++++++++++-
 lib/PublicInbox/V1Writable.pm | 34 ++++++++++++++++++++++++++++++++++
 lib/PublicInbox/V2Writable.pm |  6 +-----
 script/public-inbox-init      | 23 ++++++++++-------------
 5 files changed, 60 insertions(+), 19 deletions(-)
 create mode 100644 lib/PublicInbox/V1Writable.pm

diff --git a/MANIFEST b/MANIFEST
index 29cea10..28300e0 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -125,6 +125,7 @@ lib/PublicInbox/SpawnPP.pm
 lib/PublicInbox/Syscall.pm
 lib/PublicInbox/Unsubscribe.pm
 lib/PublicInbox/UserContent.pm
+lib/PublicInbox/V1Writable.pm
 lib/PublicInbox/V2Writable.pm
 lib/PublicInbox/View.pm
 lib/PublicInbox/ViewDiff.pm
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index c775575..12abf39 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -18,12 +18,15 @@ use PublicInbox::MDA;
 use POSIX qw(strftime);
 
 sub new {
+	# we can't change arg order, this is documented in POD
+	# and external projects may rely on it:
 	my ($class, $git, $name, $email, $ibx) = @_;
 	my $ref = 'refs/heads/master';
 	if ($ibx) {
 		$ref = $ibx->{ref_head} || 'refs/heads/master';
 		$name ||= $ibx->{name};
 		$email ||= $ibx->{-primary_address};
+		$git ||= $ibx->git;
 	}
 	bless {
 		git => $git,
@@ -433,6 +436,16 @@ sub run_die ($;$$) {
 	$? == 0 or die join(' ', @$cmd) . " failed: $?\n";
 }
 
+sub init_bare {
+	my ($dir) = @_;
+	my @cmd = (qw(git init --bare -q), $dir);
+	run_die(\@cmd);
+	# set a reasonable default:
+	@cmd = (qw/git config/, "--file=$dir/config",
+		'repack.writeBitmaps', 'true');
+	run_die(\@cmd);
+}
+
 sub done {
 	my ($self) = @_;
 	my $w = delete $self->{out} or return;
@@ -586,7 +599,7 @@ __END__
 
 =head1 NAME
 
-PublicInbox::Import - message importer for public-inbox
+PublicInbox::Import - message importer for public-inbox v1 inboxes
 
 =head1 VERSION
 
diff --git a/lib/PublicInbox/V1Writable.pm b/lib/PublicInbox/V1Writable.pm
new file mode 100644
index 0000000..6ca5db4
--- /dev/null
+++ b/lib/PublicInbox/V1Writable.pm
@@ -0,0 +1,34 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+# This interface wraps PublicInbox::Import and makes it closer
+# to V2Writable
+# Used to write to V1 inboxes (see L<public-inbox-v1-format(5)>).
+package PublicInbox::V1Writable;
+use strict;
+use warnings;
+use base qw(PublicInbox::Import);
+use PublicInbox::InboxWritable;
+
+sub new {
+	my ($class, $ibx, $creat) = @_;
+	my $dir = $ibx->{mainrepo} or die "no mainrepo in inbox\n";
+	unless (-d $dir) {
+		if ($creat) {
+			PublicInbox::Import::init_bare($dir);
+		} else {
+			die "$dir does not exist\n";
+		}
+	}
+	$ibx = PublicInbox::InboxWritable->new($ibx);
+	$class->SUPER::new(undef, undef, undef, $ibx);
+}
+
+sub init_inbox {
+	my ($self, $partitions, $skip_epoch, $skip_artnum) = @_;
+	# TODO: honor skip_artnum
+	my $dir = $self->{-inbox}->{mainrepo} or die "no mainrepo in inbox\n";
+	PublicInbox::Import::init_bare($dir);
+}
+
+1;
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 87e8f3e..b92d8d2 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -517,11 +517,7 @@ sub fill_alternates ($$) {
 	my $all = "$self->{-inbox}->{mainrepo}/all.git";
 	my @cmd;
 	unless (-d $all) {
-		@cmd = (qw(git init --bare -q), $all);
-		PublicInbox::Import::run_die(\@cmd);
-		@cmd = (qw/git config/, "--file=$all/config",
-				'repack.writeBitmaps', 'true');
-		PublicInbox::Import::run_die(\@cmd);
+		PublicInbox::Import::init_bare($all);
 	}
 	@cmd = (qw/git config/, "--file=$pfx/$epoch.git/config",
 			'include.path', '../../all.git/config');
diff --git a/script/public-inbox-init b/script/public-inbox-init
index 39f7497..8bb7845 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -8,6 +8,7 @@ use warnings;
 my $usage = "public-inbox-init NAME REPO_DIR HTTP_URL ADDRESS [ADDRESS..]";
 use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev/;
 use PublicInbox::Config;
+use PublicInbox::Inbox;
 use File::Temp qw/tempfile/;
 use File::Basename qw/dirname/;
 use File::Path qw/mkpath/;
@@ -103,23 +104,19 @@ if ($version == 1 && defined $skip) {
 	die "--skip is only supported for -V2 repos\n";
 }
 
+my $ibx = PublicInbox::Inbox->new({
+	mainrepo => $mainrepo,
+	name => $name,
+	version => $version,
+	-primary_address => $address[0],
+});
+
 if ($version >= 2) {
 	require PublicInbox::V2Writable;
-	require PublicInbox::Inbox;
-	my $ibx = {
-		mainrepo => $mainrepo,
-		name => $name,
-		version => $version,
-		-primary_address => $address[0],
-	};
-	$ibx = PublicInbox::Inbox->new($ibx);
 	PublicInbox::V2Writable->new($ibx, 1)->init_inbox(0, $skip);
 } elsif ($version == 1) {
-	x(qw(git init -q --bare), $mainrepo);
-
-	# set a reasonable default:
-	x(qw/git config/, "--file=$mainrepo/config",
-		'repack.writeBitmaps', 'true');
+	require PublicInbox::V1Writable;
+	PublicInbox::V1Writable->new($ibx, 1)->init_inbox(0, $skip);
 } else {
 	die "Unsupported -V/--version: $version\n";
 }
-- 
EW


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/3] v2writable: allow setting nproc via creat options
  2019-05-14  2:04       ` [PATCH 0/3] incremental index fixes for indexlevel=basic Eric Wong
  2019-05-14  2:04         ` [PATCH 1/3] v1writable: new wrapper which is closer to v2writable Eric Wong
@ 2019-05-14  2:04         ` Eric Wong
  2019-05-14  2:04         ` [PATCH 3/3] searchidx: fix incremental index with indexlevel=basic on v1 Eric Wong
  2 siblings, 0 replies; 24+ messages in thread
From: Eric Wong @ 2019-05-14  2:04 UTC (permalink / raw)
  To: meta; +Cc: Eric W. Biederman

Avoiding reliance on environment variables is a bit cleaner
for writing tests
---
 lib/PublicInbox/V2Writable.pm | 13 +++++++++++--
 script/public-inbox-index     |  3 +--
 t/purge.t                     |  3 +--
 t/v2reindex.t                 |  3 +--
 t/v2writable.t                |  8 ++------
 5 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index b92d8d2..afcac4d 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -24,7 +24,14 @@ use IO::Handle;
 my $PACKING_FACTOR = 0.4;
 
 # assume 2 cores if GNU nproc(1) is not available
-sub nproc_parts () {
+sub nproc_parts ($) {
+	my ($creat_opt) = @_;
+	if (ref($creat_opt) eq 'HASH') {
+		if (defined(my $n = $creat_opt->{nproc})) {
+			return $n
+		}
+	}
+
 	my $n = int($ENV{NPROC} || `nproc 2>/dev/null` || 2);
 	# subtract for the main process and git-fast-import
 	$n -= 1;
@@ -52,6 +59,8 @@ sub count_partitions ($) {
 }
 
 sub new {
+	# $creat may be any true value, or 0/undef.  A hashref is true,
+	# and $creat->{nproc} may be set to an integer
 	my ($class, $v2ibx, $creat) = @_;
 	my $dir = $v2ibx->{mainrepo} or die "no mainrepo in inbox\n";
 	unless (-d $dir) {
@@ -80,7 +89,7 @@ sub new {
 		rotate_bytes => int((1024 * 1024 * 1024) / $PACKING_FACTOR),
 		last_commit => [], # git repo -> commit
 	};
-	$self->{partitions} = count_partitions($self) || nproc_parts();
+	$self->{partitions} = count_partitions($self) || nproc_parts($creat);
 	bless $self, $class;
 }
 
diff --git a/script/public-inbox-index b/script/public-inbox-index
index 2f810a5..b353093 100755
--- a/script/public-inbox-index
+++ b/script/public-inbox-index
@@ -71,8 +71,7 @@ sub index_dir {
 		eval { require PublicInbox::V2Writable };
 		die "v2 requirements not met: $@\n" if $@;
 		my $v2w = eval {
-			$jobs and local $ENV{NPROC} = $jobs;
-			PublicInbox::V2Writable->new($repo);
+			PublicInbox::V2Writable->new($repo, {nproc=>$jobs});
 		};
 		if (defined $jobs) {
 			if ($jobs == 0) {
diff --git a/t/purge.t b/t/purge.t
index b518c26..574935e 100644
--- a/t/purge.t
+++ b/t/purge.t
@@ -35,12 +35,11 @@ Hello World
 
 EOF
 
-local $ENV{NPROC} = '1';
 my $cfgfile = "$tmpdir/config";
 local $ENV{PI_CONFIG} = $cfgfile;
 open my $cfg_fh, '>', $cfgfile or die "open: $!";
 
-my $v2w = PublicInbox::V2Writable->new($ibx, 1);
+my $v2w = PublicInbox::V2Writable->new($ibx, {nproc => 1});
 my $mime = PublicInbox::MIME->new($raw);
 ok($v2w->add($mime), 'add message to be purged');
 $v2w->done;
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 8a3071b..c416629 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -39,14 +39,13 @@ my $mime = PublicInbox::MIME->create(
 	],
 	body => $agpl,
 );
-local $ENV{NPROC} = 2;
 my $minmax;
 my $msgmap;
 my ($mark1, $mark2, $mark3, $mark4);
 {
 	my %config = %$ibx_config;
 	my $ibx = PublicInbox::Inbox->new(\%config);
-	my $im = PublicInbox::V2Writable->new($ibx, 1);
+	my $im = PublicInbox::V2Writable->new($ibx, {nproc => 1});
 	my $im0 = $im->importer();
 	foreach my $i (1..10) {
 		$mime->header_set('Message-Id', "<$i\@example.com>");
diff --git a/t/v2writable.t b/t/v2writable.t
index f8ef415..7511015 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -33,10 +33,7 @@ my $mime = PublicInbox::MIME->create(
 	body => "hello world\n",
 );
 
-my $im = eval {
-	local $ENV{NPROC} = '1';
-	PublicInbox::V2Writable->new($ibx, 1);
-};
+my $im = PublicInbox::V2Writable->new($ibx, {nproc => 1});
 is($im->{partitions}, 1, 'one partition when forced');
 ok($im->add($mime), 'ordinary message added');
 foreach my $f ("$mainrepo/msgmap.sqlite3",
@@ -201,11 +198,10 @@ EOF
 	is_deeply([sort keys %lg], [sort keys %$rover], 'XROVER range OK');
 };
 {
-	local $ENV{NPROC} = 2;
 	my @log = qw(log --no-decorate --no-abbrev --no-notes --no-color);
 	my @before = $git0->qx(@log, qw(--pretty=oneline));
 	my $before = $git0->qx(@log, qw(--pretty=raw --raw -r));
-	$im = PublicInbox::V2Writable->new($ibx, 1);
+	$im = PublicInbox::V2Writable->new($ibx, {nproc => 2});
 	is($im->{partitions}, 1, 'detected single partition from previous');
 	my $smsg = $im->remove($mime, 'test removal');
 	$im->done;
-- 
EW


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/3] searchidx: fix incremental index with indexlevel=basic on v1
  2019-05-14  2:04       ` [PATCH 0/3] incremental index fixes for indexlevel=basic Eric Wong
  2019-05-14  2:04         ` [PATCH 1/3] v1writable: new wrapper which is closer to v2writable Eric Wong
  2019-05-14  2:04         ` [PATCH 2/3] v2writable: allow setting nproc via creat options Eric Wong
@ 2019-05-14  2:04         ` Eric Wong
  2 siblings, 0 replies; 24+ messages in thread
From: Eric Wong @ 2019-05-14  2:04 UTC (permalink / raw)
  To: meta; +Cc: Eric W. Biederman

We were reindexing the full history every invocation of -index
when Xapian was not used because we were incorrectly relying on
'last_commit' metadata stored in Xapian.

Rewrite the indexing logic to be less confusing while we're
at it, since we rely on `git merge-base --is-ancestor' nowadays.

Furthermore, we need to handle message removals from the
overview index correctly when Xapian is not in use.

Co-authored-by: Eric W. Biederman <ebiederm@xmission.com>
---
 MANIFEST                     |   1 +
 lib/PublicInbox/OverIdx.pm   |   9 ++-
 lib/PublicInbox/SearchIdx.pm |  68 +++++++++++--------
 t/indexlevels-mirror.t       | 125 +++++++++++++++++++++++++++++++++++
 4 files changed, 176 insertions(+), 27 deletions(-)
 create mode 100644 t/indexlevels-mirror.t

diff --git a/MANIFEST b/MANIFEST
index 28300e0..1da40a9 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -198,6 +198,7 @@ t/httpd.t
 t/hval.t
 t/import.t
 t/inbox.t
+t/indexlevels-mirror.t
 t/init.t
 t/linkify.t
 t/main-bin/spamc
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index cc9bd7d..bb3068d 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -317,14 +317,21 @@ sub delete_articles {
 	$self->delete_by_num($_) foreach @$nums;
 }
 
+# returns number of removed messages
+# $oid may be undef to match only on $mid
 sub remove_oid {
 	my ($self, $oid, $mid) = @_;
+	my $nr = 0;
 	$self->begin_lazy;
 	each_by_mid($self, $mid, ['ddd'], sub {
 		my ($smsg) = @_;
-		$self->delete_by_num($smsg->{num}) if $smsg->{blob} eq $oid;
+		if (!defined($oid) || $smsg->{blob} eq $oid) {
+			$self->delete_by_num($smsg->{num});
+			$nr++;
+		}
 		1;
 	});
+	$nr;
 }
 
 sub create_tables {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index db0495b..1b86f72 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -365,6 +365,7 @@ sub find_doc_ids {
 	($db->postlist_begin($termval), $db->postlist_end($termval));
 }
 
+# v1 only
 sub batch_do {
 	my ($self, $termval, $cb) = @_;
 	my $batch_size = 1000; # don't let @ids grow too large to avoid OOM
@@ -379,25 +380,33 @@ sub batch_do {
 	}
 }
 
+# v1 only, where $mid is unique
 sub remove_message {
 	my ($self, $mid) = @_;
 	my $db = $self->{xdb};
-	my $called;
 	$mid = mid_clean($mid);
-	my $over = $self->{over};
 
+	if (my $over = $self->{over}) {
+		my $nr = eval { $over->remove_oid(undef, $mid) };
+		if ($@) {
+			warn "failed to remove <$mid> from overview: $@\n";
+		} elsif ($nr == 0) {
+			warn "<$mid> missing for removal from overview\n";
+		}
+	}
+	return if $self->{indexlevel} !~ $xapianlevels;
+	my $nr = 0;
 	eval {
 		batch_do($self, 'Q' . $mid, sub {
 			my ($ids) = @_;
 			$db->delete_document($_) for @$ids;
-			$over->delete_articles($ids) if $over;
-			$called = 1;
+			$nr = scalar @$ids;
 		});
 	};
 	if ($@) {
-		warn "failed to remove message <$mid>: $@\n";
-	} elsif (!$called) {
-		warn "cannot remove non-existent <$mid>\n";
+		warn "failed to remove <$mid> from Xapian: $@\n";
+	} elsif ($nr == 0) {
+		warn "<$mid> missing for removal from Xapian\n";
 	}
 }
 
@@ -648,12 +657,30 @@ sub need_update ($$$) {
 	($n eq '' || $n > 0);
 }
 
+# The last git commit we indexed with Xapian or SQLite (msgmap)
+# This needs to account for cases where Xapian or SQLite is
+# out-of-date with respect to the other.
+sub _last_x_commit {
+	my ($self, $mm) = @_;
+	my $lm = $mm->last_commit || '';
+	my $lx = '';
+	if ($self->{indexlevel} =~ $xapianlevels) {
+		$lx = $self->{xdb}->get_metadata('last_commit') || '';
+	} else {
+		$lx = $lm;
+	}
+	# Use last_commit from msgmap if it is older or unset
+	if (!$lm || ($lx && $lx && is_ancestor($self->{git}, $lm, $lx))) {
+		$lx = $lm;
+	}
+	$lx;
+}
+
 # indexes all unindexed messages (v1 only)
 sub _index_sync {
 	my ($self, $opts) = @_;
 	my $tip = $opts->{ref} || 'HEAD';
-	my $reindex = $opts->{reindex};
-	my ($mkey, $last_commit, $lx, $xlog);
+	my ($last_commit, $lx, $xlog);
 	my $git = $self->{git};
 	$git->batch_prepare;
 
@@ -661,19 +688,8 @@ sub _index_sync {
 	my $mm = _msgmap_init($self);
 	do {
 		$xlog = undef;
-		$mkey = 'last_commit';
-		$last_commit = $xdb->get_metadata('last_commit');
-		$lx = $last_commit;
-		if ($reindex) {
-			$lx = '';
-			$mkey = undef if $last_commit ne '';
-		}
-
-		# use last_commit from msgmap if it is older or unset
-		my $lm = $mm->last_commit || '';
-		if (!$lm || ($lm && $lx && is_ancestor($git, $lm, $lx))) {
-			$lx = $lm;
-		}
+		$last_commit = _last_x_commit($self, $mm);
+		$lx = $opts->{reindex} ? '' : $last_commit;
 
 		$self->{over}->rollback_lazy;
 		$self->{over}->disconnect;
@@ -687,7 +703,7 @@ sub _index_sync {
 		$xlog = _git_log($self, $range);
 
 		$xdb = $self->begin_txn_lazy;
-	} while ($xdb->get_metadata('last_commit') ne $last_commit);
+	} while (_last_x_commit($self, $mm) ne $last_commit);
 
 	my $dbh = $mm->{dbh} if $mm;
 	my $cb = sub {
@@ -701,10 +717,10 @@ sub _index_sync {
 			}
 			$dbh->commit;
 		}
-		if ($mkey && $newest && $self->{indexlevel} =~ $xapianlevels) {
-			my $cur = $xdb->get_metadata($mkey);
+		if ($newest && $self->{indexlevel} =~ $xapianlevels) {
+			my $cur = $xdb->get_metadata('last_commit');
 			if (need_update($self, $cur, $newest)) {
-				$xdb->set_metadata($mkey, $newest);
+				$xdb->set_metadata('last_commit', $newest);
 			}
 		}
 		$self->commit_txn_lazy;
diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t
new file mode 100644
index 0000000..e25b827
--- /dev/null
+++ b/t/indexlevels-mirror.t
@@ -0,0 +1,125 @@
+# Copyright (C) 2019 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use PublicInbox::MIME;
+use PublicInbox::Inbox;
+use File::Temp qw/tempdir/;
+require './t/common.perl';
+require_git(2.6);
+my $this = (split('/', __FILE__))[-1];
+
+# TODO: remove Search::Xapian as a requirement for basic
+foreach my $mod (qw(DBD::SQLite Search::Xapian)) {
+	eval "require $mod";
+	plan skip_all => "$mod missing for $this" if $@;
+}
+
+my $path = 'blib/script';
+my $index = "$path/public-inbox-index";
+
+my $mime = PublicInbox::MIME->create(
+	header => [
+		From => 'a@example.com',
+		To => 'test@example.com',
+		Subject => 'this is a subject',
+		Date => 'Fri, 02 Oct 1993 00:00:00 +0000',
+	],
+	body => "hello world\n",
+);
+
+sub import_index_incremental {
+	my ($v, $level) = @_;
+	my $tmpdir = tempdir("pi-$this-tmp-XXXXXX", TMPDIR => 1, CLEANUP => 1);
+	my $ibx = PublicInbox::Inbox->new({
+		mainrepo => "$tmpdir/testbox",
+		name => "$this-$v",
+		version => $v,
+		-primary_address => 'test@example.com',
+		indexlevel => $level,
+	});
+	my $cls = "PublicInbox::V${v}Writable";
+	use_ok $cls;
+	my $im = $cls->new($ibx, {nproc=>1});
+	$mime->header_set('Message-ID', '<m@1>');
+	ok($im->add($mime), 'first message added');
+	$im->done;
+
+	# index master (required for v1)
+	is(system($index, $ibx->{mainrepo}), 0, 'index master OK');
+	my $ro_master = PublicInbox::Inbox->new({mainrepo => $ibx->{mainrepo}});
+	my ($nr, $msgs) = $ro_master->recent;
+	is($nr, 1, 'only one message in master, so far');
+	is($msgs->[0]->{mid}, 'm@1', 'first message in master indexed');
+
+	# clone
+	my @cmd = (qw(git clone --mirror -q));
+	my $mirror = "$tmpdir/mirror-$v";
+	if ($v == 1) {
+		push @cmd, $ibx->{mainrepo}, $mirror;
+	} else {
+		push @cmd, "$ibx->{mainrepo}/git/0.git", "$mirror/git/0.git";
+	}
+	my $fetch_dir = $cmd[-1];
+	is(system(@cmd), 0, "v$v clone OK");
+
+	# inbox init
+	local $ENV{PI_CONFIG} = "$tmpdir/.picfg";
+	@cmd = ("$path/public-inbox-init", '-L', $level,
+		'mirror', $mirror, '//example.com/test', 'test@example.com');
+	push @cmd, '-V2' if $v == 2;
+	is(system(@cmd), 0, "v$v init OK");
+
+	# index mirror
+	is(system($index, $mirror), 0, "v$v index mirror OK");
+
+	# read-only access
+	my $ro_mirror = PublicInbox::Inbox->new({mainrepo => $mirror});
+	($nr, $msgs) = $ro_mirror->recent;
+	is($nr, 1, 'only one message, so far');
+	is($msgs->[0]->{mid}, 'm@1', 'read first message');
+
+	# update master
+	$mime->header_set('Message-ID', '<m@2>');
+	ok($im->add($mime), '2nd message added');
+	$im->done;
+
+	# mirror updates
+	is(system('git', "--git-dir=$fetch_dir", qw(fetch -q)), 0, 'fetch OK');
+	is(system($index, $mirror), 0, "v$v index mirror again OK");
+	($nr, $msgs) = $ro_mirror->recent;
+	is($nr, 2, '2nd message seen in mirror');
+	is_deeply([sort { $a cmp $b } map { $_->{mid} } @$msgs],
+		['m@1','m@2'], 'got both messages in mirror');
+
+	# incremental index master (required for v1)
+	is(system($index, $ibx->{mainrepo}), 0, 'index master OK');
+	($nr, $msgs) = $ro_master->recent;
+	is($nr, 2, '2nd message seen in master');
+	is_deeply([sort { $a cmp $b } map { $_->{mid} } @$msgs],
+		['m@1','m@2'], 'got both messages in master');
+
+	# remove message from master
+	ok($im->remove($mime), '2nd message removed');
+	$im->done;
+
+	# sync the mirror
+	is(system('git', "--git-dir=$fetch_dir", qw(fetch -q)), 0, 'fetch OK');
+	is(system($index, $mirror), 0, "v$v index mirror again OK");
+	($nr, $msgs) = $ro_mirror->recent;
+	is($nr, 1, '2nd message gone from mirror');
+	is_deeply([map { $_->{mid} } @$msgs], ['m@1'],
+		'message unavailable in mirror');
+}
+
+# we can probably cull some other tests and put full/medium tests, here
+for my $level (qw(basic)) {
+	for my $v (1..2) {
+		subtest("v$v indexlevel=$level" => sub {
+			import_index_incremental($v, $level);
+		})
+	}
+}
+
+done_testing();
-- 
EW


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, back to index

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes ebiederm
2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
2018-08-01 16:43 ` [PATCH 02/13] t/v1reindex.t: Isolate the test cases Eric W. Biederman
2018-08-01 16:43 ` [PATCH 03/13] t/v2reindex.t: Isolate the test cases more Eric W. Biederman
2018-08-01 16:43 ` [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests Eric W. Biederman
2018-08-01 16:43 ` [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected Eric W. Biederman
2018-08-01 16:43 ` [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works Eric W. Biederman
2018-08-01 16:43 ` [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing Eric W. Biederman
2018-08-01 16:43 ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
2018-08-02  3:00   ` Eric Wong
2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
2018-08-02 12:25       ` ebiederm
2018-08-02 17:12         ` ebiederm
2018-08-02 18:15           ` ebiederm
2019-05-14  2:04       ` [PATCH 0/3] incremental index fixes for indexlevel=basic Eric Wong
2019-05-14  2:04         ` [PATCH 1/3] v1writable: new wrapper which is closer to v2writable Eric Wong
2019-05-14  2:04         ` [PATCH 2/3] v2writable: allow setting nproc via creat options Eric Wong
2019-05-14  2:04         ` [PATCH 3/3] searchidx: fix incremental index with indexlevel=basic on v1 Eric Wong
2018-08-02 12:08     ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned ebiederm
2018-08-01 16:43 ` [PATCH 09/13] t/v[12]reindex.t Verify num_highwater Eric W. Biederman
2018-08-01 16:43 ` [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected Eric W. Biederman
2018-08-01 16:43 ` [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes Eric W. Biederman
2018-08-01 16:43 ` [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap Eric W. Biederman
2018-08-01 16:43 ` [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap Eric W. Biederman

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.org/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox