From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id E6CBC1F915;
	Sat, 14 Jul 2018 00:46:01 +0000 (UTC)
Date: Sat, 14 Jul 2018 00:46:01 +0000
From: Eric Wong
To: "Eric W. Biederman"
Cc: meta@public-inbox.org
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch
Message-ID: <20180714004601.x2xlmdxv5ahfqtwz@dcvr>
References: <87k1q1bky6.fsf@xmission.com>
	<20180712014715.dn5aouayoa3uejp4@dcvr>
	<87k1q07dyc.fsf@xmission.com>
	<20180712230946.mqv3yjw4aabf7xrf@dcvr.yhbt.net>
	<878t6f1ch7.fsf@xmission.com>
	<20180713220259.GA27845@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20180713220259.GA27845@dcvr>
List-Id:

Eric Wong wrote:
> "Eric W. Biederman" wrote:
> > Eric Wong writes:
> > > "Eric W. Biederman" wrote:
> > >> Then I am going to report a probable bug. In V2 in public-inbox-index
> > >> I can not find a path from finding a 'd' file and a call to unindex. V1
> > >> unindexes deleted files. Rebased heads for purges call unindex. I
> > >> don't see that for ordinary d files though.
> > >
> > > It shouldn't need to call unindex because they never get indexed
> > > on rebuilds. V2 indexing walks history backwards (normal "git log"
> > > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs
> > > as it encounters them.
> > >
> > > v1 needed to unindex because it used "git log --reverse" to walk
> > > forward in history.
> >
> > This assumes that you see them in the same git pull. I would think
> > ideally anything that is going to be deleted that quickly you can just
> > skip archiving.
> >
> > What is the time window of you expecting 'd' messages to appear?
>
> Ah, this is definitely a bug when using incremental fetch + -index.
> Right now, it only warns on unseen entries in $D but won't reach
> beyond the current "git log" window.

The following should fix it, thanks for the bug report.

-------8<-------
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch

The normal behavior is to prevent the deleted messages from
being indexed in the first place. However, when fetching
incrementally via git; public-inbox-index needs to account for
deleted files which were created outside of the most recent
fetch/reindexing window.

Reported-by: Eric W. Biederman
---
 lib/PublicInbox/V2Writable.pm | 20 ++++++++++----------
 t/v2mirror.t                  | 28 +++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 412eb6a..934640e 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -653,7 +653,7 @@ sub mark_deleted {
 	my $mids = mids($mime->header_obj);
 	my $cid = content_id($mime);
 	foreach my $mid (@$mids) {
-		$D->{"$mid\0$cid"} = 1;
+		$D->{"$mid\0$cid"} = $oid;
 	}
 }
 
@@ -671,7 +671,7 @@ sub reindex_oid {
 	my $num = -1;
 	my $del = 0;
 	foreach my $mid (@$mids) {
-		$del += (delete $D->{"$mid\0$cid"} || 0);
+		$del += delete($D->{"$mid\0$cid"}) ? 1 : 0;
 		my $n = $mm_tmp->num_for($mid);
 		if (defined $n && $n > $num) {
 			$mid0 = $mid;
@@ -882,7 +882,7 @@ sub index_sync {
 	my ($min, $max) = $mm_tmp->minmax;
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
 	$$regen += $max if $max;
-	my $D = {};
+	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
 
@@ -912,13 +912,13 @@ sub index_sync {
 		delete $self->{reindex_pipe};
 		$self->update_last_commit($git, $i, $cmt) if defined $cmt;
 	}
-	my @d = sort keys %$D;
-	if (@d) {
-		warn "BUG: ", scalar(@d)," unseen deleted messages marked\n";
-		foreach (@d) {
-			my ($mid, undef) = split(/\0/, $_, 2);
-			warn "<$mid>\n";
-		}
+
+	# unindex is required for leftovers if "deletes" affect messages
+	# in a previous fetch+index window:
+	if (scalar keys %$D) {
+		my $git = $self->{-inbox}->git;
+		$self->unindex_oid($git, $_) for values %$D;
+		$git->cleanup;
 	}
 	$self->done;
 }
diff --git a/t/v2mirror.t b/t/v2mirror.t
index c0c329c..f95ad0f 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -182,7 +182,33 @@ is($mibx->git->check($to_purge), undef, 'unindex+prune successful in mirror');
 	is_deeply(\@warn, [], 'no warnings from index_sync after purge');
 }
 
-$v2w->done;
+# deletes happen in a different fetch window
+{
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 1, '1@example.com visible in mirror');
+	$mime->header_set('Message-ID', '<1@example.com>');
+	$mime->header_set('Subject', 'subject = 1');
+	ok($v2w->remove($mime), 'removed <1@example.com> from source');
+	$v2w->done;
+	fetch_each_epoch();
+
+	open my $err, '+>', "$tmpdir/index-err" or die "open: $!";
+	my $ipid = fork;
+	if ($ipid == 0) {
+		dup2(fileno($err), 2) or die "dup2 failed: $!";
+		exec("$script-index", "$tmpdir/m");
+		die "exec fail: $!";
+	}
+	ok($ipid, 'running index');
+	is(waitpid($ipid, 0), $ipid, 'index done');
+	is($?, 0, 'no error from index');
+	ok(seek($err, 0, 0), 'rewound stderr');
+	$err = eval { local $/; <$err> };
+	is($err, '', 'no errors reported by index');
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 0, '1@example.com no longer visible in mirror');
+}
+
 ok(kill('TERM', $pid), 'killed httpd');
 $pid = undef;
 waitpid(-1, 0);
-- 
EW
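
For readers following the thread, here is a minimal Perl sketch of the
bookkeeping discussed in this message. It is not public-inbox code:
walk_window, the window layout, and %indexed are hypothetical names used
only for illustration, and a plain hash stands in for the real index.
It assumes each fetch window is walked newest-first, as "git log" does,
and shows why leftover entries in a $D-like hash need an unindex step
when the deleted message was indexed in an earlier window.

```perl
#!/usr/bin/perl
# Hypothetical sketch, not public-inbox code: models how a newest-first
# history walk tracks deletions in a hash shaped like the patched $D.
use strict;
use warnings;

# $window: array ref of [action, key, oid] entries, newest first.
#   'd' marks a deletion, 'm' marks a message blob being added.
# $indexed: hash ref standing in for the search index (key => oid).
# Returns the number of deletions whose message was not in this window.
sub walk_window {
	my ($window, $indexed) = @_;
	my %D; # key => oid, same shape as "$mid\0$cid" => $oid
	foreach my $ent (@$window) {
		my ($action, $key, $oid) = @$ent;
		if ($action eq 'd') {
			$D{$key} = $oid; # remember the deletion
		} elsif (exists $D{$key}) {
			delete $D{$key}; # deleted later in this window:
			next;            # never index it at all
		} else {
			$indexed->{$key} = $oid;
		}
	}
	# Leftovers refer to messages indexed in an earlier window, so they
	# must be removed from the index (a simplification of unindexing
	# by oid).
	my $leftover = scalar keys %D;
	delete $indexed->{$_} for keys %D;
	return $leftover;
}

my %indexed;
my $key = "1\@example.com\0cid";

# First fetch window: only the add is visible.
walk_window([ [ 'm', $key, 'abc123' ] ], \%indexed);

# Second fetch window: only the delete is visible.
my $leftover = walk_window([ [ 'd', $key, 'abc123' ] ], \%indexed);

print "leftover=$leftover indexed=", scalar(keys %indexed), "\n";
```

Run as written, this prints "leftover=1 indexed=0": the delete arrives
in a later window than the add, so the entry is still in %D after the
walk and has to be removed from the index explicitly.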