Date: Wed, 25 Sep 2019 22:45:00 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: Re: Git-only operation mode
Message-ID: <20190925224500.GA28628@dcvr>
References: <20190925182431.GA4628@chatter.i7.local>
 <20190925194503.GA21501@dcvr>
 <20190925195838.GB4628@chatter.i7.local>
In-Reply-To: <20190925195838.GB4628@chatter.i7.local>

Konstantin Ryabitsev wrote:
> On Wed, Sep 25, 2019 at 07:45:03PM +0000, Eric Wong wrote:
> > > Is there a way to run just the archiver component of public-inbox --
> > > just writing to git repos without any of the indexing/frontend bits? One of the
> > > idle conversations I had with vger.kernel.org folks was to see if we can
> > > shift the source of truth archive generation to happen at their end. We
> > > would then clone repositories from them and provide the frontend/search bits
> > > on lore.kernel.org. From my cursory looking, it would seem that the
> > > watch/delivery tools always expect to be taking care of xapian/indexing, but
> > > I think being able to decouple git bits from search/frontend bits would be a
> > > useful mode of operation.
> >
> > v1 was git-only (that led to scalability problems from big trees).
> > v2 needs SQLite to do dedupe with indexlevel=basic, but not Xapian,
> > anymore. We could get rid of dedupe for v2, but I'm not sure it's
> > worth it...
>
> Needing sqlite is not a big deal -- compared to the size of the repos,
> that's reasonably small (e.g. all of lkml git trees are 8.2GB, while
> msgmap.sqlite3 is 600MB).

Right, it'll also need xap15/over.sqlite*, but that's still not
too big.

> Is there an easy way to exclude xapian indexes from being generated during
> watch/mda runs then?

	public-inbox-init --indexlevel=basic

Or set publicinbox.$INBOX_NAME.indexlevel=basic in the config
file after the fact.

You should also be able to remove any non-SQLite files from
xap15 after the fact if you already generated them (but I
haven't tested that).

I started working on a public-inbox-init manpage the other day,
but still need to finish it...

> A follow-up to that -- is running "public-inbox-index" on the repository
> after it's been updated enough to update the xapian db? It would be easy to
> do so as part of the grok-pull post-update hook.

Yes, on a fresh clone. You'll need to change indexlevel to
medium or full if it was set up using basic.

I haven't figured out how to use a grok-pull post-update hook to
run index on my clone of erol, since there are multiple epochs
per inbox to deal with.
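Changing publicinbox.$INBOX_NAME.indexlevel after the fact, as discussed in this thread, can be scripted. What follows is a minimal, untested sketch, assuming the public-inbox config is a git-config-format file at ~/.public-inbox/config (overridable via PI_CONFIG), so that `git config --file` can edit it. The helper names set_indexlevel and get_indexlevel are hypothetical and not part of public-inbox.

```shell
#!/bin/sh
# Hypothetical helpers (not part of public-inbox) for reading and
# changing an inbox's indexlevel in the public-inbox config file.
# Assumption: the config is git-config format at ~/.public-inbox/config,
# overridable via the PI_CONFIG environment variable.
set -e

pi_config="${PI_CONFIG:-$HOME/.public-inbox/config}"

# set_indexlevel NAME LEVEL
# Writes publicinbox.NAME.indexlevel; LEVEL must be basic, medium or full.
set_indexlevel() {
	case "$2" in
	basic|medium|full) ;;
	*) echo "unknown indexlevel: $2" >&2; return 1 ;;
	esac
	git config --file "$pi_config" "publicinbox.$1.indexlevel" "$2"
}

# get_indexlevel NAME
# Prints the configured level, or nothing if it is unset.
get_indexlevel() {
	git config --file "$pi_config" --get "publicinbox.$1.indexlevel" || true
}
```

For example, the archiver side could keep `set_indexlevel lkml basic`, while a frontend clone would run `set_indexlevel lkml medium` and then `public-inbox-index /path/to/lkml` (the inbox name and path are placeholders). This does not address the multiple-epochs question raised above for a grok-pull post-update hook.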