From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN: AS6315 166.70.0.0/16
X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00,
	RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_PASS
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1
Received: from out01.mta.xmission.com (out01.mta.xmission.com [166.70.13.231])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id 3121A208EE;
	Wed, 1 Aug 2018 16:45:54 +0000 (UTC)
Received: from in01.mta.xmission.com ([166.70.13.51])
	by out01.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
	(Exim 4.87)
	(envelope-from )
	id 1fkuG1-0002JK-H6; Wed, 01 Aug 2018 10:45:53 -0600
Received: from [97.119.167.31] (helo=x220.int.ebiederm.org)
	by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_CBC_SHA256:128)
	(Exim 4.87)
	(envelope-from )
	id 1fkuG0-0001mm-KW; Wed, 01 Aug 2018 10:45:53 -0600
From: "Eric W. Biederman"
To: Eric Wong
Cc: meta@public-inbox.org,
	"Eric W. Biederman"
Date: Wed, 1 Aug 2018 11:43:39 -0500
Message-Id: <20180801164344.7911-8-ebiederm@xmission.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <878t5qkpis.fsf@xmission.com>
References: <878t5qkpis.fsf@xmission.com>
X-XM-SPF: eid=1fkuG0-0001mm-KW;;;mid=<20180801164344.7911-8-ebiederm@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral
X-XM-AID: U2FsdGVkX1+LGflChrOH+IQB5T/Io1BlMVxxWY1SNn8=
X-SA-Exim-Connect-IP: 97.119.167.31
X-SA-Exim-Mail-From: ebiederm@xmission.com
Subject: [PATCH 08/13] Msgmap.pm: Track the largest value of num ever assigned
X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600)
X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com)
List-Id:

Today the only thing that prevents public-inbox from reusing the message
numbers of deleted messages is the sqlite autoincrement magic, and that
only works part of the time. The new incremental indexing test has
revealed areas where public-inbox today does try to reuse the numbers
of deleted messages.

Reusing the message numbers of deleted messages is a problem because
if a client ever sees messages that are subsequently deleted, the
client will not see the new messages that are given their old numbers.

In practice this is difficult to trigger because it requires the most
recently added message to be removed and the removal to show up in a
separate pull request. Still, it can happen and it should be handled.

Instead of inferring the highest number ever used by finding the
maximum number in the message map, track the largest number ever
assigned directly. Update Msgmap to track this value and update the
indexers to use this value.

Signed-off-by: "Eric W. Biederman"
---
 lib/PublicInbox/Msgmap.pm     | 23 +++++++++++++++++++++--
 lib/PublicInbox/SearchIdx.pm  |  8 ++++----
 lib/PublicInbox/V2Writable.pm |  4 ++--
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/Msgmap.pm b/lib/PublicInbox/Msgmap.pm
index fdc71e46e214..b79e4cde30b2 100644
--- a/lib/PublicInbox/Msgmap.pm
+++ b/lib/PublicInbox/Msgmap.pm
@@ -51,6 +51,10 @@ sub new_file {
 		$dbh->begin_work;
 		$self->created_at(time) unless $self->created_at;
 		$dbh->commit;
+
+		my (undef, $max) = $self->minmax();
+		$max ||= 0;
+		$self->num_highwater($max);
 	}
 	$self;
 }
@@ -107,6 +111,17 @@ sub created_at {
 	$self->meta_accessor('created_at', $second);
 }
 
+sub num_highwater {
+	my ($self, $num) = @_;
+	my $high = $self->{num_highwater} ||=
+	    $self->meta_accessor('num_highwater');
+	if (defined($num) && (!defined($high) || ($num > $high))) {
+		$self->{num_highwater} = $num;
+		$self->meta_accessor('num_highwater', $num);
+	}
+	$self->{num_highwater};
+}
+
 sub mid_insert {
 	my ($self, $mid) = @_;
 	my $dbh = $self->{dbh};
@@ -114,7 +129,9 @@ sub mid_insert {
 INSERT OR IGNORE INTO msgmap (mid) VALUES (?)
 
 	return if $sth->execute($mid) == 0;
-	$dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	my $num = $dbh->last_insert_id(undef, undef, 'msgmap', 'num');
+	$self->num_highwater($num) unless !defined($num);
+	$num;
 }
 
 sub mid_for {
@@ -213,7 +230,9 @@ sub mid_set {
 		$self->{dbh}->prepare(
 			'INSERT OR IGNORE INTO msgmap (num,mid) VALUES (?,?)');
 	};
-	$sth->execute($num, $mid);
+	my $result = $sth->execute($num, $mid);
+	$self->num_highwater($num) if (defined($result) && $result == 1);
+	$result;
 }
 
 sub DESTROY {
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index ac821ac00c6a..2532c8dfd10d 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -627,20 +627,20 @@ sub _git_log {
 				--no-notes --no-color --no-renames
 				--diff-filter=AM), $range);
 	++$fcount while <$fh>;
-	my (undef, $max) = $self->{mm}->minmax;
+	my $high = $self->{mm}->num_highwater;
 
 	if (index($range, '..') < 0) {
-		if ($max && $max == $fcount) {
+		if ($high && $high == $fcount) {
 			# fix up old bugs in full indexes which caused messages to
 			# not appear in Msgmap
-			$self->{regen_up} = $max;
+			$self->{regen_up} = $high;
 		} else {
 			# normal regen is for for fresh data
 			$self->{regen_down} = $fcount;
 		}
 	} else {
 		# Give oldest messages the smallest numbers
-		$self->{regen_down} = $max + $fcount;
+		$self->{regen_down} = $high + $fcount;
 	}
 
 	$git->popen(qw/log --no-notes --no-color --no-renames
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 934640eb672c..c450980c8f51 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -879,9 +879,9 @@ sub index_sync {
 	my $mm_tmp = $self->{mm}->tmp_clone;
 	my $ranges = $opts->{reindex} ? [] : $self->last_commits($epoch_max);
 
-	my ($min, $max) = $mm_tmp->minmax;
+	my $high = $self->{mm}->num_highwater();
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
-	$$regen += $max if $max;
+	$$regen += $high if $high;
 	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
-- 
2.17.1
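
For illustration only, here is a minimal sketch of the difference between
inferring the next number from the current maximum and tracking a
high-water mark, as described in the commit message above. ToyMsgmap and
its methods (max_num, insert, delete_num) are hypothetical in-memory
stand-ins, not the PublicInbox::Msgmap API; only the name num_highwater
is borrowed from the patch, and the real code persists the value through
meta_accessor rather than in a hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a message map (assumption, not
# the real PublicInbox::Msgmap, which is backed by SQLite).
package ToyMsgmap;

sub new { bless { num2mid => {}, highwater => 0 }, shift }

# Largest number currently present, or undef when the map is empty.
# This is what inferring from the table contents would yield.
sub max_num {
	my ($self) = @_;
	my ($max) = sort { $b <=> $a } keys %{$self->{num2mid}};
	$max;
}

# Raise the stored mark if $num exceeds it; always return the mark.
sub num_highwater {
	my ($self, $num) = @_;
	$self->{highwater} = $num
		if defined($num) && $num > $self->{highwater};
	$self->{highwater};
}

# Assign the next number above the high-water mark, never below it.
sub insert {
	my ($self, $mid) = @_;
	my $num = $self->num_highwater + 1;
	$self->{num2mid}->{$num} = $mid;
	$self->num_highwater($num);
	$num;
}

# Deleting a message leaves the high-water mark untouched.
sub delete_num { delete $_[0]->{num2mid}->{$_[1]} }

package main;

my $mm = ToyMsgmap->new;
$mm->insert("msg$_\@example.com") for 1..3;
$mm->delete_num(3);	# remove the most recently added message

# Inferring from the current maximum would hand out 3 again.
my $inferred = ($mm->max_num // 0) + 1;

# Tracking the largest number ever assigned skips the deleted slot.
my $tracked = $mm->insert('msg4@example.com');

print "inferred=$inferred tracked=$tracked\n";
```

In this toy run the inferred value is 3 (a reuse of the deleted number)
while the tracked insert receives 4, which is the behaviour the patch
aims for when the newest message has been removed.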