user/dev discussion of public-inbox itself
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org, "Eric W. Biederman" <ebiederm@xmission.com>
Subject: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
Date: Wed,  1 Aug 2018 11:43:39 -0500
Message-ID: <20180801164344.7911-8-ebiederm@xmission.com>
In-Reply-To: <878t5qkpis.fsf@xmission.com>

Today the only thing that prevents public-inbox from reusing the
message numbers of deleted messages is the SQLite autoincrement magic,
and that only works part of the time.  The new incremental indexing
test has revealed places where public-inbox does try to reuse the
numbers of deleted messages.

Reusing the message numbers of deleted messages is a problem because
if a client ever sees messages that are subsequently deleted, the
client will not see new messages that are assigned those old numbers.
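The following sketch shows the failure concretely. It assumes a client that remembers the highest article number it has fetched and later asks only for numbers above that. The `%map` hash and the `next_num_from_max` helper are hypothetical stand-ins for Msgmap and are not public-inbox code:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical stand-in for Msgmap: article number => Message-ID.
my %map = (1 => 'a@example', 2 => 'b@example', 3 => 'c@example');

# Old behaviour (modelled): infer the next number from the current maximum.
sub next_num_from_max {
	my ($m) = @_;
	my $max = max(keys %$m) // 0;
	return $max + 1;
}

# The client has already seen everything up to number 3.
my $client_last_seen = 3;

# The most recent message is deleted before the next pull...
delete $map{3};

# ...so the next new message is handed number 3 again.
my $num = next_num_from_max(\%map);
$map{$num} = 'd@example';

# The client only asks for numbers above what it has seen, so it misses it.
my @fetched = grep { $_ > $client_last_seen } sort { $a <=> $b } keys %map;
printf "new message got %d; client fetched %d message(s)\n",
	$num, scalar @fetched;
```

Here the new message silently takes over number 3, which the client believes it has already seen.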

In practice this is difficult to trigger because it requires the most
recently added message to be removed, and the removal to show up in a
separate pull request.  Still, it can happen, and it should be handled.

Instead of inferring the highest number ever used by finding the maximum
number in the message map, track the largest number ever assigned directly.
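The sketch below models the high-water idea in memory. The `HighwaterMap` package and its `add`/`remove` methods are hypothetical. Its `num_highwater` only ever raises the stored value, mirroring the rule in the Msgmap.pm hunk, but it is a simplified model and not the SQLite-backed implementation:

```perl
use strict;
use warnings;

# Hypothetical in-memory model of a message map that records the
# largest number ever assigned instead of recomputing it from the map.
package HighwaterMap;

sub new {
	my ($class) = @_;
	return bless { nums => {}, highwater => 0 }, $class;
}

# Raise the high-water mark when given a larger number; never lower it.
sub num_highwater {
	my ($self, $num) = @_;
	$self->{highwater} = $num
		if defined($num) && $num > $self->{highwater};
	return $self->{highwater};
}

# Assign the next number above the high-water mark.
sub add {
	my ($self, $mid) = @_;
	my $num = $self->num_highwater() + 1;
	$self->{nums}{$num} = $mid;
	$self->num_highwater($num);
	return $num;
}

# Deleting a message deliberately leaves the high-water mark untouched.
sub remove {
	my ($self, $num) = @_;
	delete $self->{nums}{$num};
}

package main;

my $mm = HighwaterMap->new;
$mm->add($_) for qw(a@example b@example c@example);
$mm->remove(3);                       # drop the most recent message
my $num = $mm->add('d@example');
print "new message got $num\n";       # prints "new message got 4"
```

Deleting number 3 no longer lowers the next number handed out, so a client that has seen 3 will still fetch the new message as 4.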

Update Msgmap to track this value and update the indexers to use this
value.
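In SearchIdx.pm, the indexer seeds a countdown from the high-water mark (`$self->{regen_down} = $high + $fcount`). The sketch below assumes that the indexer walks `git log` newest-first and hands out numbers by post-decrementing that counter. Under that assumption, a batch of new messages lands just above the high-water mark, and the oldest message gets the smallest number. The Message-IDs and values are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sketch: number a batch of new commits so that the
# oldest one lands just above the high-water mark.
my $high = 4;                                         # largest number ever assigned
my @newest_first = qw(g@example f@example e@example); # order like `git log`
my $fcount = scalar @newest_first;

my $regen_down = $high + $fcount;    # start the countdown at 4 + 3
my %assigned;
for my $mid (@newest_first) {
	# Count down as we walk back in time.
	$assigned{$mid} = $regen_down--;
}

printf "%s => %d\n", $_, $assigned{$_} for sort keys %assigned;
```

This prints `e@example => 5`, `f@example => 6` and `g@example => 7`. The numbers never drop back to or below 4, even if message 4 has since been deleted.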

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/Msgmap.pm     | 23 +++++++++++++++++++++--
 lib/PublicInbox/SearchIdx.pm  |  8 ++++----
 lib/PublicInbox/V2Writable.pm |  4 ++--
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index fdc71e46e214..b79e4cde30b2 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -51,6 +51,10 @@ sub new_file {
 		$dbh->begin_work;
 		$self->created_at(time) unless $self->created_at;
 		$dbh->commit;
+
+		my (undef, $max) = $self->minmax();
+		$max ||= 0;
+		$self->num_highwater($max);
 	}
 	$self;
 }
@@ -107,6 +111,17 @@ sub created_at {
 	$self->meta_accessor('created_at', $second);
 }
 
+sub num_highwater {
+	my ($self, $num) = @_;
+	my $high = $self->{num_highwater} ||=
+	    $self->meta_accessor('num_highwater');
+	if (defined($num) && (!defined($high) || ($num > $high))) {
+		$self->{num_highwater} = $num;
+		$self->meta_accessor('num_highwater', $num);
+	}
+	$self->{num_highwater};
+}
+
 sub mid_insert {
 	my ($self, $mid) = @_;
 	my $dbh = $self->{dbh};
@@ -114,7 +129,9 @@ sub mid_insert {
 INSERT OR IGNORE INTO msgmap (mid) VALUES (?)
 
 	return if $sth->execute($mid) == 0;
-	$dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	my $num = $dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	$self->num_highwater($num) if defined($num);
+	$num;
 }
 
 sub mid_for {
@@ -213,7 +230,9 @@ sub mid_set {
 		$self->{dbh}->prepare(
 			'INSERT OR IGNORE INTO msgmap (num,mid) VALUES (?,?)');
 	};
-	$sth->execute($num, $mid);
+	my $result = $sth->execute($num, $mid);
+	$self->num_highwater($num) if (defined($result) && $result == 1);
+	$result;
 }
 
 sub DESTROY {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index ac821ac00c6a..2532c8dfd10d 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -627,20 +627,20 @@ sub _git_log {
 			     --no-notes --no-color --no-renames
 			     --diff-filter=AM), $range);
 	++$fcount while <$fh>;
-	my (undef, $max) = $self->{mm}->minmax;
+	my $high = $self->{mm}->num_highwater;
 
 	if (index($range, '..') < 0) {
-		if ($max && $max == $fcount) {
+		if ($high && $high == $fcount) {
 			# fix up old bugs in full indexes which caused messages to
 			# not appear in Msgmap
-			$self->{regen_up} = $max;
+			$self->{regen_up} = $high;
 		} else {
 			# normal regen is for for fresh data
 			$self->{regen_down} = $fcount;
 		}
 	} else {
 		# Give oldest messages the smallest numbers
-		$self->{regen_down} = $max + $fcount;
+		$self->{regen_down} = $high + $fcount;
 	}
 
 	$git->popen(qw/log --no-notes --no-color --no-renames
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 934640eb672c..c450980c8f51 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -879,9 +879,9 @@ sub index_sync {
 	my $mm_tmp = $self->{mm}->tmp_clone;
 	my $ranges = $opts->{reindex} ? [] : $self->last_commits($epoch_max);
 
-	my ($min, $max) = $mm_tmp->minmax;
+	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
-	$$regen += $max if $max;
+	$$regen += $high if $high;
 	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
-- 
2.17.1



Thread overview: 24+ messages
2018-08-01 16:41 [PATCH 00/13]: Incremental index fixes Eric W. Biederman
2018-08-01 16:43 ` [PATCH 01/13] Import.pm: Don't assume {in} and {out} always exist Eric W. Biederman
2018-08-01 16:43 ` [PATCH 02/13] t/v1reindex.t: Isolate the test cases Eric W. Biederman
2018-08-01 16:43 ` [PATCH 03/13] t/v2reindex.t: Isolate the test cases more Eric W. Biederman
2018-08-01 16:43 ` [PATCH 04/13] t/v[12]reindex.t: Place expected second in Xapian tests Eric W. Biederman
2018-08-01 16:43 ` [PATCH 05/13] t/v[12]reindex.t: Test that the resulting msgmap is as expected Eric W. Biederman
2018-08-01 16:43 ` [PATCH 06/13] t/v[12]reindex.t: Test incremental indexing works Eric W. Biederman
2018-08-01 16:43 ` [PATCH 07/13] SearchIdx.pm: Always assign numbers backwards during incremental indexing Eric W. Biederman
2018-08-01 16:43 ` Eric W. Biederman [this message]
2018-08-02  3:00   ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric Wong
2018-08-02  3:44     ` [WIP] searchidx: support incremental indexing on indexlevel=basic Eric Wong
2018-08-02 12:25       ` Eric W. Biederman
2018-08-02 17:12         ` Eric W. Biederman
2018-08-02 18:15           ` Eric W. Biederman
2019-05-14  2:04       ` [PATCH 0/3] incremental index fixes for indexlevel=basic Eric Wong
2019-05-14  2:04         ` [PATCH 1/3] v1writable: new wrapper which is closer to v2writable Eric Wong
2019-05-14  2:04         ` [PATCH 2/3] v2writable: allow setting nproc via creat options Eric Wong
2019-05-14  2:04         ` [PATCH 3/3] searchidx: fix incremental index with indexlevel=basic on v1 Eric Wong
2018-08-02 12:08     ` [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned Eric W. Biederman
2018-08-01 16:43 ` [PATCH 09/13] t/v[12]reindex.t Verify num_highwater Eric W. Biederman
2018-08-01 16:43 ` [PATCH 10/13] t/v[12]reindex.t: Verify the num highwater is as expected Eric W. Biederman
2018-08-01 16:43 ` [PATCH 11/13] SearchIdx,V2Writeable: Update num_highwater on optimized deletes Eric W. Biederman
2018-08-01 16:43 ` [PATCH 12/13] V2Writeable.pm: Ensure that a found message number is in the msgmap Eric W. Biederman
2018-08-01 16:43 ` [PATCH 13/13] V2Writeable.pm: In unindex_oid delete the message from msgmap Eric W. Biederman

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
