user/dev discussion of public-inbox itself
* [PATCH] v2writable: unindex deleted messages after incremental fetch
From: Eric Wong @ 2018-07-14  0:46 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > Eric Wong <e@80x24.org> writes:
> > > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > >> Then I am going to report a probable bug.  In V2 in public-inbox-index
> > >> I can not find a path from finding a 'd' file and a call to unindex.  V1
> > >> unindexes deleted files.  Rebased heads for purges call unindex.  I
> > >> don't see that for ordinary d files though.
> > >
> > > It shouldn't need to call unindex because they never get indexed
> > > on rebuilds.  V2 indexing walks history backwards (normal "git log"
> > > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs
> > > as it encounters them.
> > >
> > > v1 needed to unindex because it used "git log --reverse" to walk
> > > forward in history.
> > 
> > This assumes that you see them in the same git pull.  I would think
> > ideally anything that is going to be deleted that quickly you can just
> > skip archiving.
> > 
> > What is the time window of you expecting 'd' messages to appear?
> 
> Ah, this is definitely a bug when using incremental fetch + -index.
> Right now, it only warns on unseen entries in $D but won't reach
> beyond the current "git log" window.

The following should fix it, thanks for the bug report.

-------8<-------
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch

The normal behavior is to prevent the deleted messages from
being indexed in the first place.  However, when fetching
incrementally via git, public-inbox-index needs to account for
deleted files which were created outside of the most recent
fetch/reindexing window.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/PublicInbox/V2Writable.pm | 20 ++++++++++----------
 t/v2mirror.t                  | 28 +++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 412eb6a..934640e 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -653,7 +653,7 @@ sub mark_deleted {
 	my $mids = mids($mime->header_obj);
 	my $cid = content_id($mime);
 	foreach my $mid (@$mids) {
-		$D->{"$mid\0$cid"} = 1;
+		$D->{"$mid\0$cid"} = $oid;
 	}
 }
 
@@ -671,7 +671,7 @@ sub reindex_oid {
 	my $num = -1;
 	my $del = 0;
 	foreach my $mid (@$mids) {
-		$del += (delete $D->{"$mid\0$cid"} || 0);
+		$del += delete($D->{"$mid\0$cid"}) ? 1 : 0;
 		my $n = $mm_tmp->num_for($mid);
 		if (defined $n && $n > $num) {
 			$mid0 = $mid;
@@ -882,7 +882,7 @@ sub index_sync {
 	my ($min, $max) = $mm_tmp->minmax;
 	my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
 	$$regen += $max if $max;
-	my $D = {};
+	my $D = {}; # "$mid\0$cid" => $oid
 	my @cmd = qw(log --raw -r --pretty=tformat:%H
 			--no-notes --no-color --no-abbrev --no-renames);
 
@@ -912,13 +912,13 @@ sub index_sync {
 		delete $self->{reindex_pipe};
 		$self->update_last_commit($git, $i, $cmt) if defined $cmt;
 	}
-	my @d = sort keys %$D;
-	if (@d) {
-		warn "BUG: ", scalar(@d)," unseen deleted messages marked\n";
-		foreach (@d) {
-			my ($mid, undef) = split(/\0/, $_, 2);
-			warn "<$mid>\n";
-		}
+
+	# unindex is required for leftovers if "deletes" affect messages
+	# in a previous fetch+index window:
+	if (scalar keys %$D) {
+		my $git = $self->{-inbox}->git;
+		$self->unindex_oid($git, $_) for values %$D;
+		$git->cleanup;
 	}
 	$self->done;
 }
diff --git a/t/v2mirror.t b/t/v2mirror.t
index c0c329c..f95ad0f 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -182,7 +182,33 @@ is($mibx->git->check($to_purge), undef, 'unindex+prune successful in mirror');
 	is_deeply(\@warn, [], 'no warnings from index_sync after purge');
 }
 
-$v2w->done;
+# deletes happen in a different fetch window
+{
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 1, '1@example.com visible in mirror');
+	$mime->header_set('Message-ID', '<1@example.com>');
+	$mime->header_set('Subject', 'subject = 1');
+	ok($v2w->remove($mime), 'removed <1@example.com> from source');
+	$v2w->done;
+	fetch_each_epoch();
+
+	open my $err, '+>', "$tmpdir/index-err" or die "open: $!";
+	my $ipid = fork;
+	if ($ipid == 0) {
+		dup2(fileno($err), 2) or die "dup2 failed: $!";
+		exec("$script-index", "$tmpdir/m");
+		die "exec fail: $!";
+	}
+	ok($ipid, 'running index');
+	is(waitpid($ipid, 0), $ipid, 'index done');
+	is($?, 0, 'no error from index');
+	ok(seek($err, 0, 0), 'rewound stderr');
+	$err = eval { local $/; <$err> };
+	is($err, '', 'no errors reported by index');
+	$mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+	is(scalar($mset->items), 0, '1@example.com no longer visible in mirror');
+}
+
 ok(kill('TERM', $pid), 'killed httpd');
 $pid = undef;
 waitpid(-1, 0);
-- 
EW
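
Editorial sketch (not part of the patch): the standalone Perl below
illustrates the leftover-delete handling that the index_sync hunk
introduces.  It is a simplified model under stated assumptions, not the
V2Writable implementation: sync_window(), the %index hash and the tuple
layout of each history entry are hypothetical names invented for
illustration, and the real code calls unindex_oid on the OID of the 'd'
blob rather than deleting a hash key directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of one fetch+index window.  $window lists entries
# newest-first, the order "git log" (without --reverse) reports them.
# Each entry is [ $type, $oid, $mid, $cid ]: 'm' adds a message and
# 'd' records a delete.  Returns the number of leftover deletes.
sub sync_window {
	my ($index, $window) = @_;
	my $D = {}; # "$mid\0$cid" => $oid of the 'd' blob
	foreach my $ent (@$window) {
		my ($type, $oid, $mid, $cid) = @$ent;
		my $key = "$mid\0$cid";
		if ($type eq 'd') {
			$D->{$key} = $oid; # remember the delete
		} elsif (delete $D->{$key}) {
			next; # deleted later in history: never index it
		} else {
			$index->{$key} = $oid;
		}
	}
	# Anything still in $D refers to a message indexed in an earlier
	# window, so it has to be removed from the index explicitly.
	my $leftover = scalar keys %$D;
	delete $index->{$_} for keys %$D;
	$leftover;
}

my %index;
# window 1: the message arrives and is indexed
sync_window(\%index, [ [ 'm', 'a1', '1@example.com', 'c1' ] ]);
# window 2: only the delete is visible to this "git log" range
my $leftover = sync_window(\%index, [ [ 'd', 'd1', '1@example.com', 'c1' ] ]);
printf "indexed=%d leftover=%d\n", scalar(keys %index), $leftover;
```

When the delete and the message fall in the same window, the entry is
skipped during the walk and nothing is left in $D.  Only the
cross-window case reaches the cleanup loop, which is the case the old
code merely warned about.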

2018-07-11 20:01     Q: V2 format Eric W. Biederman
2018-07-12  1:47     ` Eric Wong
2018-07-12 13:58       ` Eric W. Biederman
2018-07-12 23:09         ` Eric Wong
2018-07-13 13:39           ` Eric W. Biederman
2018-07-13 22:02             ` bug: v2 deletes on incremental fetch [was: Q: V2 format] Eric Wong
2018-07-14  0:46 14%           ` [PATCH] v2writable: unindex deleted messages after incremental fetch Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
