From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 600851F9FD;
	Tue, 2 Mar 2021 09:28:02 +0000 (UTC)
Date: Tue, 2 Mar 2021 09:28:02 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: lei: per-message keywords and externals
Message-ID: <20210302092802.GA19386@dcvr>
References: <20210224204950.GA2076@dcvr> <20210226092648.GA30618@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20210226092648.GA30618@dcvr>
List-Id:

Eric Wong wrote:
> Eric Wong wrote:
> > Something I've been pondering for a bit is how to handle
> > keywords (Seen, Important, Replied, ...) for messages stored in
> > externals.
> >
> > I want "kw:" prefix to be a usable search term, like:
> >
> > 	lei q something interesting kw:seen
> > 	lei q something interesting NOT kw:seen
> >
> > This is no problem for imported messages in ~/.local/share/lei/store.
> > All the keyword info is stored in line with the rest of the
> > Xapian index data.
> >
> > But, I also don't want to be wasting users' space by duplicating
> > index data if they're already hosting inboxes for public
> > consumption. So, it's looking like parsing out kw: ourselves
> > and do extra filtering on our end when externals are in play is
> > going to be a requirement...
>
> Something I considered a few weeks ago, but decided against, but
> am again coming around to is indexing just the overview header
> info in lei/store.
> In other words:
>
> 	$sto->set_eml($eml->header_obj, @kw)
>
> instead of:
>
> 	$sto->set_eml($eml, @kw)
>
> > Or, just don't support searching using "kw:" with externals, for
> > now; but still stash keywords somewhere when writing to
> > traditional mail stores.
>
> Maybe it'll be another instance of LeiStore in a separate dir
> for external keywords: ~/.local/share/lei/xkw-store

I'm leaning that way. For deduplication purposes (that is:
merging keywords from cross-posted messages), OID will be
indexed as a boolean term for repeat lookups (along with
Message-ID).

I'm not 100% sure if I want this to be SQLite or Xapian, yet.
Leaning towards Xapian since that would give us more flexibility
w.r.t. keyword searches and also let us do filtering on common
headers without too much cost.
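
To make the deduplication idea concrete, here is a rough sketch of
merging keywords for cross-posted messages by exact-match lookup on
OID and Message-ID, plus the kind of extra "kw:" filtering that would
happen on our end for externals. It is plain Python with in-memory
dicts, not lei code and not Xapian; XkwStore, add_keywords,
keywords_for and filter_by_kw are all hypothetical names, and treating
two blobs with the same Message-ID as the same message is a
simplifying assumption.

```python
class XkwStore:
    """Hypothetical in-memory stand-in for an external-keyword index."""

    def __init__(self):
        self._docs = []    # one keyword set per logical message
        self._by_oid = {}  # blob OID -> index into self._docs
        self._by_mid = {}  # Message-ID -> index into self._docs

    def _find(self, oid=None, mid=None):
        # Exact-match lookup (like a boolean term): OID first, then Message-ID.
        idx = self._by_oid.get(oid)
        if idx is None:
            idx = self._by_mid.get(mid)
        return idx

    def add_keywords(self, oid, mid, keywords):
        # Reuse the existing entry if either identifier is already known,
        # so keywords from cross-posted copies end up in one set.
        idx = self._find(oid, mid)
        if idx is None:
            idx = len(self._docs)
            self._docs.append(set())
        self._docs[idx].update(k.lower() for k in keywords)
        # setdefault never rebinds an identifier that is already indexed.
        self._by_oid.setdefault(oid, idx)
        self._by_mid.setdefault(mid, idx)

    def keywords_for(self, oid=None, mid=None):
        # Return a copy so callers cannot mutate the stored set.
        idx = self._find(oid, mid)
        return set() if idx is None else set(self._docs[idx])


def filter_by_kw(messages, store, kw, negate=False):
    """Keep (oid, mid) pairs that carry kw, or that lack it if negate is true."""
    kw = kw.lower()
    return [(oid, mid) for oid, mid in messages
            if (kw in store.keywords_for(oid, mid)) != negate]


store = XkwStore()
# Two copies of one cross-posted message: different OIDs, same Message-ID.
store.add_keywords("0a1b2c", "<m1@example>", ["Seen"])
store.add_keywords("3d4e5f", "<m1@example>", ["Flagged"])
print(sorted(store.keywords_for(oid="3d4e5f")))  # ['flagged', 'seen']
```

With negate=True, filter_by_kw plays the role of "NOT kw:seen" over
results that came back from an external. A real store would also need
to decide what to do when an OID and a Message-ID point at two
different existing entries; this sketch just keeps the OID match.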