From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS6315 166.70.0.0/16 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00, RCVD_IN_DNSWL_LOW,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1 Received: from out03.mta.xmission.com (out03.mta.xmission.com [166.70.13.233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id E6E591F597; Thu, 2 Aug 2018 18:16:10 +0000 (UTC) Received: from in02.mta.xmission.com ([166.70.13.52]) by out03.mta.xmission.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1flI8v-0003q7-Ja; Thu, 02 Aug 2018 12:16:09 -0600 Received: from [97.119.167.31] (helo=x220.xmission.com) by in02.mta.xmission.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.87) (envelope-from ) id 1flI8g-0000ef-89; Thu, 02 Aug 2018 12:16:09 -0600 From: ebiederm@xmission.com (Eric W. Biederman) To: Eric Wong Cc: meta@public-inbox.org References: <878t5qkpis.fsf@xmission.com> <20180801164344.7911-8-ebiederm@xmission.com> <20180802030022.ly2tvbobs7tkfmn3@whir> <20180802034404.cnvfqlgvynamnc6n@whir> <87wot9hs5a.fsf@xmission.com> <87h8kcitfm.fsf@xmission.com> Date: Thu, 02 Aug 2018 13:15:49 -0500 In-Reply-To: <87h8kcitfm.fsf@xmission.com> (Eric W. Biederman's message of "Thu, 02 Aug 2018 12:12:29 -0500") Message-ID: <87bmakiqi2.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1flI8g-0000ef-89;;;mid=<87bmakiqi2.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=97.119.167.31;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1+QrzHkffRvBYQeDcDZy7FWcY7nZG+GLU0= X-SA-Exim-Connect-IP: 97.119.167.31 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [WIP] searchidx: support incremental indexing on indexlevel=basic X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) List-Id: ebiederm@xmission.com (Eric W. Biederman) writes: > ebiederm@xmission.com (Eric W. Biederman) writes: > >> Eric Wong writes: >> >>> I wrote: >>>> While testing this, it looks like I introduced a bug to >>>> indexlevel=basic which broke incremental indexing when I made it >>>> possible to upgrade to (medium|full). Patch coming for that in >>>> a bit... >>> >>> Eep, I think there's deeper problems with indexlevel=basic and >>> incremental updates, since it's still doing lookups against >>> Xapian for deletes... >>> >>> This is my work-in-progress to stop full-reindexing, at least. >> >> So I don't think your change is quite right. >> >> If the repo has only lived with indexlevel == 'basic' then last_commit >> should always match the output of xdb->get_metadata('last_commit') as >> it will always been undef or ''. >> >> In fact I don't see any code that is updating >> $xdb->get_metadata('last_commit'); >> >> Am I wrong or can we just always use mm->last_commit for this logic? > > Yes. I was wrong. I had overlooked the get_metadata($mkey) and > set_metadata($mkey) calls. From just below but those are all > with $mkey == 'last_commit' So 'last_commit' does get updated. > > I think we want something more like below. Where _last_x_commit > just computes the last_commit value to use, using both values of > last_commit. > > I am not 100% about the loop. Perhaps that check to see if either > value of last_commit changes? > > I definitely don't think when we already have one set of logic > for substituting in last_commit we should be adding in another. > That gets confusing very fast. > Make that something like this where we remove the $mkey variable and all of it's special semantics with regards to reindexing entirely. Eric diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm index 198f39b96e6c..8b12da0436a4 100644 --- a/lib/PublicInbox/SearchIdx.pm +++ b/lib/PublicInbox/SearchIdx.pm @@ -669,6 +669,23 @@ sub need_update ($$$) { ($n eq '' || $n > 0); } +sub _last_x_commit +{ + my ($self) = @_; + my $lm = $mm->last_commit || ''; + my $lx = ''; + if ($self->{indexlevel} ~= $xapianlevels) { + $lx = $self->{xdb}->get_metadata('last_commit') || ''; + } else { + $lx = $lm; + } + # Use last_commit from msgmap if it is older or unset + if (!$lm || ($lx && $lx && is_ancestor($git, $lm, $lx))) { + $lx = $lm; + } + $lx; +} + # indexes all unindexed messages (v1 only) sub _index_sync { my ($self, $opts) = @_; @@ -682,19 +699,8 @@ sub _index_sync { my $mm = _msgmap_init($self); do { $xlog = undef; - $mkey = 'last_commit'; - $last_commit = $xdb->get_metadata('last_commit'); - $lx = $last_commit; - if ($reindex) { - $lx = ''; - $mkey = undef if $last_commit ne ''; - } - - # use last_commit from msgmap if it is older or unset - my $lm = $mm->last_commit || ''; - if (!$lm || ($lm && $lx && is_ancestor($git, $lm, $lx))) { - $lx = $lm; - } + $last_commit = $self->_last_x_commit; + $lx = !$reindex ? $last_commit : ''; $self->{over}->rollback_lazy; $self->{over}->disconnect; @@ -708,7 +714,7 @@ sub _index_sync { $xlog = _git_log($self, $range); $xdb = $self->begin_txn_lazy; - } while ($xdb->get_metadata('last_commit') ne $last_commit); + } while ($self->_last_x_commit ne $last_commit); my $dbh = $mm->{dbh} if $mm; my $cb = sub { @@ -722,10 +728,10 @@ sub _index_sync { } $dbh->commit; } - if ($mkey && $newest && $self->{indexlevel} =~ $xapianlevels) { - my $cur = $xdb->get_metadata($mkey); + if ($newest && $self->{indexlevel} =~ $xapianlevels) { + my $cur = $xdb->get_metadata('last_commit'); if (need_update($self, $cur, $newest)) { - $xdb->set_metadata($mkey, $newest); + $xdb->set_metadata('last_commit', $newest); } } $self->commit_txn_lazy;