user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* Re: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-02  3:00  7%   ` Eric Wong
@ 2018-08-02 12:08  7%     ` Eric W. Biederman
  0 siblings, 0 replies; 4+ results
From: Eric W. Biederman @ 2018-08-02 12:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> --- a/lib/PublicInbox/Msgmap.pm
>> +++ b/lib/PublicInbox/Msgmap.pm
>> @@ -51,6 +51,10 @@ sub new_file {
>>  		$dbh->begin_work;
>>  		$self->created_at(time) unless $self->created_at;
>>  		$dbh->commit;
>> +
>> +		my (undef, $max) = $self->minmax();
>> +		$max ||= 0;
>> +		$self->num_highwater($max);
>
> I think the calls to minmax and num_highwater should be in the
> transaction right above it.  I can squash locally if that's the
> only thing...

Sounds good.  With autocommit being on by default and repository locking
happening I was not at all certain where transaction boundaries need to
go (or if they are really needed).  But by it's very nature
num_highwater needs to be in the same transaction as whatever adds a new
higher number into msgmap.

> Everything else looks good, so far; thanks.

Eric

^ permalink raw reply	[relevance 7%]

* Re: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-01 16:43  5% ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
@ 2018-08-02  3:00  7%   ` Eric Wong
  2018-08-02 12:08  7%     ` Eric W. Biederman
  0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2018-08-02  3:00 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> --- a/lib/PublicInbox/Msgmap.pm
> +++ b/lib/PublicInbox/Msgmap.pm
> @@ -51,6 +51,10 @@ sub new_file {
>  		$dbh->begin_work;
>  		$self->created_at(time) unless $self->created_at;
>  		$dbh->commit;
> +
> +		my (undef, $max) = $self->minmax();
> +		$max ||= 0;
> +		$self->num_highwater($max);

I think the calls to minmax and num_highwater should be in the
transaction right above it.  I can squash locally if that's the
only thing...

Everything else looks good, so far; thanks.


While testing this, it looks like I introduced a bug to
indexlevel=basic which broke incremental indexing when I made it
possible to upgrade to (medium|full).  Patch coming for that in
a bit...

^ permalink raw reply	[relevance 7%]

* [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
  2018-08-01 16:41  5% [PATCH 00/13]: Incremental index fixes Eric W. Biederman
@ 2018-08-01 16:43  5% ` Eric W. Biederman
  2018-08-02  3:00  7%   ` Eric Wong
  0 siblings, 1 reply; 4+ results
From: Eric W. Biederman @ 2018-08-01 16:43 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta, Eric W. Biederman

Today the only thing that prevents public-inbox not reusing the
message numbers of deleted messages is the sqlite autoincrement magic
and that only works part of the time.  The new incremental indexing
test has revealed areas where today public-inbox does try to reuse
numbers of deleted messages.

Reusing the message numbers of existing messages is a problem because
if a client ever sees messages that are subsequently deleted the
client will not see the new messages with their old numbers.

In practice this is difficult to trigger because it requires the most
recently added message to be removed and have the removal show up in a
separate pull request.  Still it can happen and it should be handled.

Instead of infering the highset number ever used by finding the maximum
number in the message map, track the largest number ever assigned directly.

Update Msgmap to track this value and update the indexers to use this
value.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/Msgmap.pm     | 23 +++++++++++++++++++++--
 lib/PublicInbox/SearchIdx.pm  |  8 ++++----
 lib/PublicInbox/V2Writable.pm |  4 ++--
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index fdc71e46e214..b79e4cde30b2 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -51,6 +51,10 @@ sub new_file {
 		$dbh->begin_work;
 		$self->created_at(time) unless $self->created_at;
 		$dbh->commit;
+
+		my (undef, $max) = $self->minmax();
+		$max ||= 0;
+		$self->num_highwater($max);
 	}
 	$self;
 }
@@ -107,6 +111,17 @@ sub created_at {
 	$self->meta_accessor('created_at', $second);
 }
 
+sub num_highwater {
+	my ($self, $num) = @_;
+	my $high = $self->{num_highwater} ||=
+	    $self->meta_accessor('num_highwater');
+	if (defined($num) && (!defined($high) || ($num > $high))) {
+		$self->{num_highwater} = $num;
+		$self->meta_accessor('num_highwater', $num);
+	}
+	$self->{num_highwater};
+}
+
 sub mid_insert {
 	my ($self, $mid) = @_;
 	my $dbh = $self->{dbh};
@@ -114,7 +129,9 @@ sub mid_insert {
 INSERT OR IGNORE INTO msgmap (mid) VALUES (?)
 
 	return if $sth->execute($mid) == 0;
-	$dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	my $num = $dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	$self->num_highwater($num) unless !defined($num);
+	$num;
 }
 
 sub mid_for {
@@ -213,7 +230,9 @@ sub mid_set {
 		$self->{dbh}->prepare(
 			'INSERT OR IGNORE INTO msgmap (num,mid) VALUES (?,?)');
 	};
-	$sth->execute($num, $mid);
+	my $result = $sth->execute($num, $mid);
+	$self->num_highwater($num) if (defined($result) && $result == 1);
+	$result;
 }
 
 sub DESTROY {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index ac821ac00c6a..2532c8dfd10d 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -627,20 +627,20 @@ sub _git_log {
 			     --no-notes --no-color --no-renames
 			     --diff-filter=AM), $range);
 	++$fcount while <$fh>;
-	my (undef, $max) = $self->{mm}->minmax;
+	my $high = $self->{mm}->num_highwater;
 
 	if (index($range, '..') < 0) {
-		if ($max && $max == $fcount) {
+		if ($high && $high == $fcount) {
 			# fix up old bugs in full indexes which caused messages to
 			# not appear in Msgmap
-			$self->{regen_up} = $max;
+			$self->{regen_up} = $high;
 		} else {
 			# normal regen is for for fresh data
 			$self->{regen_down} = $fcount;
 		}
 	} else {
 		# Give oldest messages the smallest numbers
-		$self->{regen_down} = $max + $fcount;
+		$self->{regen_down} = $high + $fcount;
 	}
 
 	$git->popen(qw/log --no-notes --no-color --no-renames
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 934640eb672c..c450980c8f51 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -879,9 +879,9 @@ sub index_sync {
 	my $mm_tmp = $self->{mm}->tmp_clone;
 	my $ranges = $opts->{reindex} ? [] : $self->last_commits($epoch_max);
 
-	my ($min, $max) = $mm_tmp->minmax;
+	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
-	$$regen += $max if $max;
+	$$regen += $high if $high;
 	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
-- 
2.17.1


^ permalink raw reply related	[relevance 5%]

* [PATCH 00/13]: Incremental index fixes
@ 2018-08-01 16:41  5% Eric W. Biederman
  2018-08-01 16:43  5% ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
  0 siblings, 1 reply; 4+ results
From: Eric W. Biederman @ 2018-08-01 16:41 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


Eric,

Recently I noticed some weird indexing on one of my mailboxes.  So I dug
in and made the index_sync tests more robust and found several corner
cases where the current code can be problematic.  What follows are the
test cases and fixes for those cases.

The big one is when a message with the highest message number is
deleted.  This can cause message numbers to be reassigned so I
explicitly track the highest message number assigned.

Very confusing when reading mail there is a case in SearchIdx
that has been assigning message numbers in the wrong order.  Depending
on your time range that can be very confusing as old messages show up
before newer ones.

Finally in v2 deleted messages have not been being deleted from the
msgmap.  Which while great for keeping message numbers from going
backwards it means things still show up that shouldn't.

Eric W. Biederman (13):
      Import.pm: Don't assume {in} and {out} always exist
      t/v1reindex.t: Isolate the test cases
      t/v2reindex.t: Isolate the test cases more
      t/v[12]reindex.t: Place expected second in Xapian tests
      t/v[12]reindex.t: Test that the resulting msgmap is as expected
      t/v[12]reindex.t: Test incremental indexing works
      SearchIdx.pm: Always assign numbers backwards during incremental indexing
      Msgmap.pm: Track the largest value of num ever assigned
      t/v[12]reindex.t Verify num_highwater
      t/v[12]reindex.t: Verify the num highwater is as expected
      SearchIdx,V2Writeable: Update num_highwater on optimized deletes
      V2Writeable.pm: Ensure that a found message number is in the msgmap
      V2Writeable.pm: In unindex_oid delete the message from msgmap

 lib/PublicInbox/Import.pm     |   1 +
 lib/PublicInbox/Msgmap.pm     |  23 ++-
 lib/PublicInbox/SearchIdx.pm  |  30 ++--
 lib/PublicInbox/V2Writable.pm |  20 ++-
 t/v1reindex.t                 | 359 ++++++++++++++++++++++++++++++++++++------
 t/v2reindex.t                 | 335 +++++++++++++++++++++++++++++++++++----
 6 files changed, 666 insertions(+), 102 deletions(-)


Eric



^ permalink raw reply	[relevance 5%]

Results 1-4 of 4 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2018-08-01 16:41  5% [PATCH 00/13]: Incremental index fixes Eric W. Biederman
2018-08-01 16:43  5% ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
2018-08-02  3:00  7%   ` Eric Wong
2018-08-02 12:08  7%     ` Eric W. Biederman

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).