From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 69A571F404;
	Mon, 26 Mar 2018 20:08:26 +0000 (UTC)
Date: Mon, 26 Mar 2018 20:08:26 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: [PATCH 08/13] v2writable: support reindexing Xapian
Message-ID: <20180326200826.GA2165@80x24.org>
References: <20180322094015.14422-1-e@80x24.org>
 <20180322094015.14422-9-e@80x24.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20180322094015.14422-9-e@80x24.org>
List-Id:

wrote:
> --- a/lib/PublicInbox/SearchIdx.pm
> +++ b/lib/PublicInbox/SearchIdx.pm
> @@ -369,10 +369,12 @@ sub add_message {
>  			}
>  		}
>  
> +		$self->delete_article($num) if defined $num; # for reindexing
>  		if ($skel) {
>  			push @values, $mids, $xpath, $data;
>  			$skel->index_skeleton(\@values);
>  			$doc->add_boolean_term('Q' . $_) foreach @$mids;
> +			$doc->add_boolean_term('XNUM' . $num) if defined $num;
>  			$doc_id = $self->{xdb}->add_document($doc);
>  		} else {
>  			$doc_id = link_and_save($self, $doc, $mids, $refs,
> @@ -421,6 +423,16 @@ sub remove_message {
>  	}
>  }
>  
> +sub delete_article {
> +	my ($self, $num) = @_;
> +	my $ndel = 0;
> +	batch_do($self, 'XNUM' . $num, sub {
> +		my ($ids) = @_;
> +		$ndel += scalar @$ids;
> +		$self->{xdb}->delete_document($_) for @$ids;
> +	});
> +}

I will need to investigate further, but I must be missing
something: there are increases in Xapian DB size which don't
seem to get recovered by xapian-compact...

> diff --git a/lib/PublicInbox/SearchIdxSkeleton.pm b/lib/PublicInbox/SearchIdxSkeleton.pm
> index 78a1730..4f15816 100644
> --- a/lib/PublicInbox/SearchIdxSkeleton.pm
> +++ b/lib/PublicInbox/SearchIdxSkeleton.pm
> @@ -134,6 +134,7 @@ sub index_skeleton_real ($$) {
>  	$smsg->load_from_data($doc_data);
>  	my $num = $values->[PublicInbox::Search::NUM];
>  	my @refs = ($smsg->references =~ /<([^>]+)>/g);
> +	$self->delete_article($num) if defined $num; # for reindexing
>  	$self->link_and_save($doc, $mids, \@refs, $num, $xpath);
>  }
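
For illustration only, the delete-before-add step that the quoted
patch uses for reindexing can be sketched without Xapian.  This is
a toy model under stated assumptions, not the public-inbox
implementation: ToyDB, ids_for_term, doc_count and index_article
are hypothetical names, and returning the deletion count from
delete_article is an assumption of the sketch (the quoted hunk
does not show what the real sub returns).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a writable Xapian database.
# It models only what the sketch needs: documents carrying terms.
package ToyDB;

sub new {
	my ($class) = @_;
	return bless { docs => {}, next_id => 1 }, $class;
}

# Store a document (given as a list of terms) and return its new ID.
sub add_document {
	my ($self, @terms) = @_;
	my $id = $self->{next_id}++;
	$self->{docs}->{$id} = { map { $_ => 1 } @terms };
	return $id;
}

sub delete_document {
	my ($self, $id) = @_;
	delete $self->{docs}->{$id};
}

# Return the IDs of every stored document carrying $term.
sub ids_for_term {
	my ($self, $term) = @_;
	my @ids = grep { $self->{docs}->{$_}->{$term} } keys %{$self->{docs}};
	return sort { $a <=> $b } @ids;
}

sub doc_count {
	my ($self) = @_;
	return scalar keys %{$self->{docs}};
}

package main;

# Remove every document tagged with this article number.
# Returning the count is an assumption made for this sketch.
sub delete_article {
	my ($db, $num) = @_;
	my @ids = $db->ids_for_term('XNUM' . $num);
	$db->delete_document($_) for @ids;
	return scalar @ids;
}

# Index (or reindex) an article: drop any stale copy, then add a
# fresh document tagged with XNUM<num> and one Q<mid> term per
# Message-ID.
sub index_article {
	my ($db, $num, @mids) = @_;
	my $ndel = delete_article($db, $num);
	my $id = $db->add_document('XNUM' . $num, map { 'Q' . $_ } @mids);
	return ($id, $ndel);
}

my $db = ToyDB->new;
my ($id1, $ndel1) = index_article($db, 1, 'a@example.com');
my ($id2, $ndel2) = index_article($db, 1, 'a@example.com');
printf "first: id=%d ndel=%d; second: id=%d ndel=%d; docs=%d\n",
	$id1, $ndel1, $id2, $ndel2, $db->doc_count;
```

In this toy model the second call removes the stale document
before adding the new one, so the store still holds exactly one
document for article 1.  Logging a count like $ndel during a
reindex would be one way to check whether old documents are
really being deleted.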