From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 4/4] v2writable: do not modify DBs while iterating for ->remove
Date: Wed, 4 Apr 2018 21:25:00 +0000
Message-ID: <20180404212500.1859-5-e@80x24.org>
In-Reply-To: <20180404212500.1859-1-e@80x24.org>
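Xapian may become unhappy if a DB is modified during iteration:

  nntp://news.gmane.org/20180228004400.GU12724@survex.com

So collect matching messages in a hash from within the
each_smsg_by_mid callback, and only perform the deletions once
iteration is done.  While we are at it, warn if more than one
matching article is found for a single Message-ID.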
---
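For readers unfamiliar with the pattern, here is a minimal
standalone sketch of "collect during iteration, modify afterwards".
It is not the V2Writable code: %index, each_by_mid and remove_mid
are hypothetical stand-ins for the search DB, its iterator and
the removal routine, used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a search index: article number => Message-ID
my %index = (1 => 'a@example', 2 => 'b@example', 3 => 'a@example');

# Hypothetical iterator: calls $cb for every article matching $mid,
# and stops early if the callback returns false.
sub each_by_mid {
	my ($mid, $cb) = @_;
	foreach my $num (sort { $a <=> $b } keys %index) {
		next unless $index{$num} eq $mid;
		$cb->($num) or last;
	}
}

sub remove_mid {
	my ($mid) = @_;
	my %gone;
	# Phase 1: only record what should go away; %index is untouched
	each_by_mid($mid, sub {
		my ($num) = @_;
		$gone{$num} = 1;
		1; # continue
	});
	# Phase 2: iteration is over, so modifying the index is safe now
	my @nums = sort { $a <=> $b } keys %gone;
	delete @index{@nums};
	@nums;
}

my @removed = remove_mid('a@example');
print "removed: @removed\n";
```

The callback never deletes anything itself; all modification
happens after each_by_mid returns, which is the same ordering
this patch introduces via %gone.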
lib/PublicInbox/V2Writable.pm | 46 +++++++++++++++++++++++++------------------
1 file changed, 27 insertions(+), 19 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 5b4d9c0..74953d3 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -256,6 +256,7 @@ sub remove_internal {
my $mark;
foreach my $mid (@$mids) {
+ my %gone;
$srch->reopen->each_smsg_by_mid($mid, sub {
my ($smsg) = @_;
$smsg->load_expand;
@@ -267,28 +268,35 @@ sub remove_internal {
my $orig = $$msg;
my $cur = PublicInbox::MIME->new($msg);
if (content_id($cur) eq $cid) {
- $mm->num_delete($smsg->num);
- # $removed should only be set once assuming
- # no bugs in our deduplication code:
- $removed = $smsg;
- $removed->{mime} = $cur;
- my $oid = $smsg->{blob};
- if ($purge) {
- $purge->{$oid} = 1;
- } else {
- ($mark, undef) =
- $im->remove(\$orig, $cmt_msg);
- }
- $orig = undef;
- $removed->num; # memoize this for callers
-
- foreach my $idx (@$parts) {
- $idx->remote_remove($oid, $mid);
- }
- $self->{over}->remove_oid($oid, $mid);
+ $smsg->{mime} = $cur;
+ $gone{$smsg->num} = [ $smsg, \$orig ];
}
1; # continue
});
+ my $n = scalar keys %gone;
+ next unless $n;
+ if ($n > 1) {
+ warn "BUG: multiple articles linked to <$mid>\n",
+ join(',', sort keys %gone), "\n";
+ }
+ foreach my $num (keys %gone) {
+ my ($smsg, $orig) = @{$gone{$num}};
+ $mm->num_delete($num);
+ # $removed should only be set once assuming
+ # no bugs in our deduplication code:
+ $removed = $smsg;
+ my $oid = $smsg->{blob};
+ if ($purge) {
+ $purge->{$oid} = 1;
+ } else {
+ ($mark, undef) = $im->remove($orig, $cmt_msg);
+ }
+ $orig = undef;
+ foreach my $idx (@$parts) {
+ $idx->remote_remove($oid, $mid);
+ }
+ $self->{over}->remove_oid($oid, $mid);
+ }
$self->barrier;
}
--
EW
Thread overview (5+ messages):
2018-04-04 21:24 [PATCH 0/4] incremental indexing support for mirrors Eric Wong (Contractor, The Linux Foundation)
2018-04-04 21:24 ` [PATCH 1/4] init: s/GIT_DIR/REPO_DIR/ in usage Eric Wong (Contractor, The Linux Foundation)
2018-04-04 21:24 ` [PATCH 2/4] import: rewrite less history during purge Eric Wong (Contractor, The Linux Foundation)
2018-04-04 21:24 ` [PATCH 3/4] v2: support incremental indexing + purge Eric Wong (Contractor, The Linux Foundation)
2018-04-04 21:25 ` Eric Wong (Contractor, The Linux Foundation) [this message]