user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: meta@public-inbox.org
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch
Date: Sat, 14 Jul 2018 00:46:01 +0000
Message-ID: <20180714004601.x2xlmdxv5ahfqtwz@dcvr>
In-Reply-To: <20180713220259.GA27845@dcvr>

Eric Wong <e@80x24.org> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > Eric Wong <e@80x24.org> writes:
> > > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > >> Then I am going to report a probable bug.  In V2 in public-inbox-index
> > >> I can not find a path from finding a 'd' file and a call to unindex.  V1
> > >> unindexes deleted files.  Rebased heads for purges call unindex.  I
> > >> don't see that for ordinary d files though.
> > >
> > > It shouldn't need to call unindex because they never get indexed
> > > on rebuilds.  V2 indexing walks history backwards (normal "git log"
> > > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs
> > > as it encounters them.
> > >
> > > v1 needed to unindex because it used "git log --reverse" to walk
> > > forward in history.
> > 
> > This assumes that you see them in the same git pull.  I would think
> > ideally anything that is going to be deleted that quickly you can just
> > skip archiving.
> > 
> > What is the time window of you expecting 'd' messages to appear?
> 
> Ah, this is definitely a bug when using incremental fetch + -index.
> Right now, it only warns on unseen entries in $D but won't reach
> beyond the current "git log" window.

The following should fix it, thanks for the bug report.
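
For illustration only, the interaction between the backwards walk and
the fetch window can be sketched in Python.  This is not the Perl code
in V2Writable.pm: index_window(), the (op, mid, cid, oid) tuples and
the dict standing in for the search index are all hypothetical
simplifications.  The "pending" dict plays the role of $D.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]              # (Message-ID, content_id)
Entry = Tuple[str, str, str, str]  # (op, mid, cid, oid); op is "m" or "d"


def index_window(commits_newest_first: Iterable[List[Entry]],
                 index: Dict[Key, str]) -> List[str]:
    """Index one fetch window, walking newest to oldest like "git log".

    Returns the oids of deletions whose message was not seen in this
    window; those messages are removed from `index` as well.
    """
    pending: Dict[Key, str] = {}  # stands in for $D: key => oid
    for commit in commits_newest_first:
        for op, mid, cid, oid in commit:
            key = (mid, cid)
            if op == "d":
                pending[key] = oid      # remember the deletion
            elif key in pending:
                del pending[key]        # deleted later in this window: skip
            else:
                index[key] = oid        # ordinary message: index it
    # Leftovers refer to messages indexed by an earlier window.
    leftovers = list(pending.values())
    for key in pending:
        index.pop(key, None)            # the "unindex" step
    return leftovers
```

When the 'm' and its later 'd' fall in the same window, the message is
never indexed and nothing is left over.  When the 'd' arrives in a
later window, its oid comes back as a leftover and the entry is
dropped, which is the case the patch below handles via unindex_oid.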

-------8<-------
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch

The normal behavior is to prevent deleted messages from being
indexed in the first place.  However, when fetching
incrementally via git, public-inbox-index needs to account for
deleted files which were created outside of the most recent
fetch/reindexing window.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/PublicInbox/V2Writable.pm | 20 ++++++++++----------
 t/v2mirror.t                  | 28 +++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 412eb6a..934640e 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -653,7 +653,7 @@ sub mark_deleted {
 	my $mids = mids($mime->header_obj);
 	my $cid = content_id($mime);
 	foreach my $mid (@$mids) {
-		$D->{"$mid\0$cid"} = 1;
+		$D->{"$mid\0$cid"} = $oid;
 	}
 }
 
@@ -671,7 +671,7 @@ sub reindex_oid {
 	my $num = -1;
 	my $del = 0;
 	foreach my $mid (@$mids) {
-		$del += (delete $D->{"$mid\0$cid"} || 0);
+		$del += delete($D->{"$mid\0$cid"}) ? 1 : 0;
 		my $n = $mm_tmp->num_for($mid);
 		if (defined $n && $n > $num) {
 			$mid0 = $mid;
@@ -882,7 +882,7 @@ sub index_sync {
 	my ($min, $max) = $mm_tmp->minmax;
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
 	$$regen += $max if $max;
-	my $D = {};
+	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
 
@@ -912,13 +912,13 @@ sub index_sync {
 		delete $self->{reindex_pipe};
 		$self->update_last_commit($git, $i, $cmt) if defined $cmt;
 	}
-	my @d = sort keys %$D;
-	if (@d) {
-		warn "BUG: ", scalar(@d)," unseen deleted messages marked\n";
-		foreach (@d) {
-			my ($mid, undef) = split(/\0/, $_, 2);
-			warn "<$mid>\n";
-		}
+
+	# unindex is required for leftovers if "deletes" affect messages
+	# in a previous fetch+index window:
+	if (scalar keys %$D) {
+		my $git = $self->{-inbox}->git;
+		$self->unindex_oid($git, $_) for values %$D;
+		$git->cleanup;
 	}
 	$self->done;
 }
diff --git a/t/v2mirror.t b/t/v2mirror.t
index c0c329c..f95ad0f 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -182,7 +182,33 @@ is($mibx->git->check($to_purge), undef, 'unindex+prune successful in mirror');
 	is_deeply(\@warn, [], 'no warnings from index_sync after purge');
 }
 
-$v2w->done;
+# deletes happen in a different fetch window
+{
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 1, '1@example.com visible in mirror');
+	$mime->header_set('Message-ID', '<1@example.com>');
+	$mime->header_set('Subject', 'subject = 1');
+	ok($v2w->remove($mime), 'removed <1@example.com> from source');
+	$v2w->done;
+	fetch_each_epoch();
+
+	open my $err, '+>', "$tmpdir/index-err" or die "open: $!";
+	my $ipid = fork;
+	if ($ipid == 0) {
+		dup2(fileno($err), 2) or die "dup2 failed: $!";
+		exec("$script-index", "$tmpdir/m");
+		die "exec fail: $!";
+	}
+	ok($ipid, 'running index');
+	is(waitpid($ipid, 0), $ipid, 'index done');
+	is($?, 0, 'no error from index');
+	ok(seek($err, 0, 0), 'rewound stderr');
+	$err = eval { local $/; <$err> };
+	is($err, '', 'no errors reported by index');
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 0, '1@example.com no longer visible in mirror');
+}
+
 ok(kill('TERM', $pid), 'killed httpd');
 $pid = undef;
 waitpid(-1, 0);
-- 
EW
