* [PATCH] v2writable: unindex deleted messages after incremental fetch
From: Eric Wong @ 2018-07-14 0:46 UTC
To: Eric W. Biederman; +Cc: meta
Eric Wong <e@80x24.org> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > Eric Wong <e@80x24.org> writes:
> > > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > >> Then I am going to report a probable bug. In V2 in public-inbox-index
> > >> I can not find a path from finding a 'd' file and a call to unindex. V1
> > >> unindexes deleted files. Rebased heads for purges call unindex. I
> > >> don't see that for ordinary d files though.
> > >
> > > It shouldn't need to call unindex because they never get indexed
> > > on rebuilds. V2 indexing walks history backwards (normal "git log"
> > > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs
> > > as it encounters them.
> > >
> > > v1 needed to unindex because it used "git log --reverse" to walk
> > > forward in history.
> >
> > This assumes that you see them in the same git pull. I would think
> > ideally anything that is going to be deleted that quickly you can just
> > skip archiving.
> >
> > What is the time window of you expecting 'd' messages to appear?
>
> Ah, this is definitely a bug when using incremental fetch + -index.
> Right now, it only warns on unseen entries in $D but won't reach
> beyond the current "git log" window.
The following should fix it; thanks for the bug report.
-------8<-------
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch
The normal behavior is to prevent the deleted messages from
being indexed in the first place. However, when fetching
incrementally via git, public-inbox-index needs to account for
deleted files which were created outside of the most recent
fetch/reindexing window.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
---
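Note (not part of the commit): the behavior described above can be
illustrated with a minimal, self-contained sketch. It is not the
V2Writable code. The subs index_window and unindex_leftovers, the
in-memory %index, and the sample oids are hypothetical stand-ins; only
the "$mid\0$cid" => $oid layout of the deletion hash is taken from the
patch. Each window is listed newest-first, as a plain "git log" would
yield it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for one fetch+index window: a list of history
# entries, newest first. 'm' adds a message, 'd' records a deletion.
# Returns the deletions that matched nothing inside this window.
sub index_window {
	my ($index, $window) = @_;
	my %D; # "$mid\0$cid" => $oid, same layout as $D in the patch
	for my $ent (@$window) {
		my ($type, $mid, $cid, $oid) = @$ent;
		my $key = "$mid\0$cid";
		if ($type eq 'd') {
			$D{$key} = $oid; # remember the deletion
		} elsif (delete $D{$key}) {
			next; # deleted later in history: never index it
		} else {
			$index->{$key} = $oid;
		}
	}
	return \%D;
}

# Leftovers refer to messages indexed in an earlier window, so they
# are removed from the index instead of only being warned about.
# (Assumption: removal by key; the patch goes through unindex_oid.)
sub unindex_leftovers {
	my ($index, $D) = @_;
	delete $index->{$_} for keys %$D;
}

my %index;
# Window 1: the message arrives and gets indexed.
my $D = index_window(\%index, [ [ 'm', '1@example.com', 'cid1', 'aaaa' ] ]);
unindex_leftovers(\%index, $D);
# Window 2: only the deletion is visible in the new "git log" range.
$D = index_window(\%index, [ [ 'd', '1@example.com', 'cid1', 'bbbb' ] ]);
unindex_leftovers(\%index, $D);
print scalar(keys %index), " message(s) indexed\n";
```

Without the unindex_leftovers step, the second window would leave the
message from the first window in the index, which is the case
reported in this thread.
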
lib/PublicInbox/V2Writable.pm | 20 ++++++++++----------
t/v2mirror.t | 28 +++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 412eb6a..934640e 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -653,7 +653,7 @@ sub mark_deleted {
my $mids = mids($mime->header_obj);
my $cid = content_id($mime);
foreach my $mid (@$mids) {
- $D->{"$mid\0$cid"} = 1;
+ $D->{"$mid\0$cid"} = $oid;
}
}
@@ -671,7 +671,7 @@ sub reindex_oid {
my $num = -1;
my $del = 0;
foreach my $mid (@$mids) {
- $del += (delete $D->{"$mid\0$cid"} || 0);
+ $del += delete($D->{"$mid\0$cid"}) ? 1 : 0;
my $n = $mm_tmp->num_for($mid);
if (defined $n && $n > $num) {
$mid0 = $mid;
@@ -882,7 +882,7 @@ sub index_sync {
my ($min, $max) = $mm_tmp->minmax;
my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
$$regen += $max if $max;
- my $D = {};
+ my $D = {}; # "$mid\0$cid" => $oid
my @cmd = qw(log --raw -r --pretty=tformat:%H
--no-notes --no-color --no-abbrev --no-renames);
@@ -912,13 +912,13 @@ sub index_sync {
delete $self->{reindex_pipe};
$self->update_last_commit($git, $i, $cmt) if defined $cmt;
}
- my @d = sort keys %$D;
- if (@d) {
- warn "BUG: ", scalar(@d)," unseen deleted messages marked\n";
- foreach (@d) {
- my ($mid, undef) = split(/\0/, $_, 2);
- warn "<$mid>\n";
- }
+
+ # unindex is required for leftovers if "deletes" affect messages
+ # in a previous fetch+index window:
+ if (scalar keys %$D) {
+ my $git = $self->{-inbox}->git;
+ $self->unindex_oid($git, $_) for values %$D;
+ $git->cleanup;
}
$self->done;
}
diff --git a/t/v2mirror.t b/t/v2mirror.t
index c0c329c..f95ad0f 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -182,7 +182,33 @@ is($mibx->git->check($to_purge), undef, 'unindex+prune successful in mirror');
is_deeply(\@warn, [], 'no warnings from index_sync after purge');
}
-$v2w->done;
+# deletes happen in a different fetch window
+{
+ $mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+ is(scalar($mset->items), 1, '1@example.com visible in mirror');
+ $mime->header_set('Message-ID', '<1@example.com>');
+ $mime->header_set('Subject', 'subject = 1');
+ ok($v2w->remove($mime), 'removed <1@example.com> from source');
+ $v2w->done;
+ fetch_each_epoch();
+
+ open my $err, '+>', "$tmpdir/index-err" or die "open: $!";
+ my $ipid = fork;
+ if ($ipid == 0) {
+ dup2(fileno($err), 2) or die "dup2 failed: $!";
+ exec("$script-index", "$tmpdir/m");
+ die "exec fail: $!";
+ }
+ ok($ipid, 'running index');
+ is(waitpid($ipid, 0), $ipid, 'index done');
+ is($?, 0, 'no error from index');
+ ok(seek($err, 0, 0), 'rewound stderr');
+ $err = eval { local $/; <$err> };
+ is($err, '', 'no errors reported by index');
+ $mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+ is(scalar($mset->items), 0, '1@example.com no longer visible in mirror');
+}
+
ok(kill('TERM', $pid), 'killed httpd');
$pid = undef;
waitpid(-1, 0);
--
EW
2018-07-11 20:01 Q: V2 format Eric W. Biederman
2018-07-12 1:47 ` Eric Wong
2018-07-12 13:58 ` Eric W. Biederman
2018-07-12 23:09 ` Eric Wong
2018-07-13 13:39 ` Eric W. Biederman
2018-07-13 22:02 ` bug: v2 deletes on incremental fetch [was: Q: V2 format] Eric Wong
2018-07-14 0:46 14% ` [PATCH] v2writable: unindex deleted messages after incremental fetch Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git