From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS6315 166.70.0.0/16 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1 Received: from out02.mta.xmission.com (out02.mta.xmission.com [166.70.13.232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id BB8E61F597; Thu, 2 Aug 2018 17:12:35 +0000 (UTC) Received: from in01.mta.xmission.com ([166.70.13.51]) by out02.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1flH9O-0001pD-Ny; Thu, 02 Aug 2018 11:12:34 -0600 Received: from [97.119.167.31] (helo=x220.xmission.com) by in01.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1flH9O-00051y-2K; Thu, 02 Aug 2018 11:12:34 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Eric Wong Cc: meta@public-inbox.org References: <878t5qkpis.fsf@xmission.com> <20180801164344.7911-8-ebiederm@xmission.com> <20180802030022.ly2tvbobs7tkfmn3@whir> <20180802034404.cnvfqlgvynamnc6n@whir> <87wot9hs5a.fsf@xmission.com> Date: Thu, 02 Aug 2018 12:12:29 -0500 In-Reply-To: <87wot9hs5a.fsf@xmission.com> (Eric W. Biederman's message of "Thu, 02 Aug 2018 07:25:37 -0500") Message-ID: <87h8kcitfm.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1flH9O-00051y-2K;;;mid=<87h8kcitfm.fsf@xmission.com>;;;hst=in01.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/pk2SW74Lo2U5XrUQDycYrWtxd5dj6jz0= X-SA-Exim-Connect-IP: 97.119.167.31 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [WIP] searchidx: support incremental indexing on indexlevel=basic X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) List-Id: ebiederm@xmission.com (Eric W. Biederman) writes: > Eric Wong writes: > >> I wrote: >>> While testing this, it looks like I introduced a bug to >>> indexlevel=basic which broke incremental indexing when I made it >>> possible to upgrade to (medium|full). Patch coming for that in >>> a bit... >> >> Eep, I think there's deeper problems with indexlevel=basic and >> incremental updates, since it's still doing lookups against >> Xapian for deletes... >> >> This is my work-in-progress to stop full-reindexing, at least. > > So I don't think your change is quite right. > > If the repo has only lived with indexlevel == 'basic' then last_commit > should always match the output of xdb->get_metadata('last_commit') as > it will always been undef or ''. > > In fact I don't see any code that is updating > $xdb->get_metadata('last_commit'); > > Am I wrong or can we just always use mm->last_commit for this logic? Yes. I was wrong. I had overlooked the get_metadata($mkey) and set_metadata($mkey) calls. From just below but those are all with $mkey == 'last_commit' So 'last_commit' does get updated. I think we want something more like below. Where _last_x_commit just computes the last_commit value to use, using both values of last_commit. I am not 100% about the loop. Perhaps that check to see if either value of last_commit changes? I definitely don't think when we already have one set of logic for substituting in last_commit we should be adding in another. That gets confusing very fast. Eric diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 198f39b96e6c..130e5a41c5e7 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/PublicInbox/SearchIdx.pm @@ -669,6 +669,23 @@ sub need_update ($$$) { ($n eq '' || $n > 0); } +sub _last_x_commit +{ + my ($self) = @_; + my $lm = $mm->last_commit || ''; + my $lx = ''; + if ($self->{indexlevel} ~= $xapianlevels) { + $lx = $self->{xdb}->get_metadata('last_commit') || ''; + } else { + $lx = $lm; + } + # Use last_commit from msgmap if it is older or unset + if (!$lm || ($lx && $lx && is_ancestor($git, $lm, $lx))) { + $lx = $lm; + } + $lx; +} + # indexes all unindexed messages (v1 only) sub _index_sync { my ($self, $opts) = @_; @@ -682,19 +699,8 @@ sub _index_sync { my $mm = _msgmap_init($self); do { $xlog = undef; - $mkey = 'last_commit'; - $last_commit = $xdb->get_metadata('last_commit'); - $lx = $last_commit; - if ($reindex) { - $lx = ''; - $mkey = undef if $last_commit ne ''; - } - - # use last_commit from msgmap if it is older or unset - my $lm = $mm->last_commit || ''; - if (!$lm || ($lm && $lx && is_ancestor($git, $lm, $lx))) { - $lx = $lm; - } + $last_commit = $self->_last_x_commit; + $lx = !$reindex ? $last_commit : ''; $self->{over}->rollback_lazy; $self->{over}->disconnect; @@ -708,7 +714,7 @@ sub _index_sync { $xlog = _git_log($self, $range); $xdb = $self->begin_txn_lazy; - } while ($xdb->get_metadata('last_commit') ne $last_commit); + } while ($self->_last_x_commit ne $last_commit); my $dbh = $mm->{dbh} if $mm; my $cb = sub { @@ -722,7 +728,7 @@ sub _index_sync { } $dbh->commit; } - if ($mkey && $newest && $self->{indexlevel} =~ $xapianlevels) { + if ($newest && $self->{indexlevel} =~ $xapianlevels) { my $cur = $xdb->get_metadata($mkey); if (need_update($self, $cur, $newest)) { $xdb->set_metadata($mkey, $newest); >> ----8<----- >> Subject: [PATCH] searchidx: support incremental indexing on indexlevel=basic >> >> --- >> lib/PublicInbox/SearchIdx.pm | 13 +++++++++++-- >> 1 file changed, 11 insertions(+), 2 deletions(-) >> >> diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm >> index 54f82aa..5cac08f 100644 >> --- a/lib/PublicInbox/SearchIdx.pm >> +++ b/lib/PublicInbox/SearchIdx.pm >> @@ -668,6 +668,15 @@ sub need_update ($$$) { >> ($n eq '' || $n > 0); >> } >> >> +sub _last_x_commit { >> + my ($self) = @_; >> + if ($self->{indexlevel} =~ $xapianlevels) { >> + $self->{xdb}->get_metadata('last_commit'); >> + } else { >> + $self->{mm}->last_commit || ''; >> + } >> +} >> + >> # indexes all unindexed messages (v1 only) >> sub _index_sync { >> my ($self, $opts) = @_; >> @@ -682,7 +691,7 @@ sub _index_sync { >> do { >> $xlog = undef; >> $mkey = 'last_commit'; >> - $last_commit = $xdb->get_metadata('last_commit'); >> + $last_commit = $self->_last_x_commit; >> $lx = $last_commit; >> if ($reindex) { >> $lx = ''; >> @@ -707,7 +716,7 @@ sub _index_sync { >> $xlog = _git_log($self, $range); >> >> $xdb = $self->begin_txn_lazy; >> - } while ($xdb->get_metadata('last_commit') ne $last_commit); >> + } while ($self->_last_x_commit ne $last_commit); >> >> my $dbh = $mm->{dbh} if $mm; >> my $cb = sub {