From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
    shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
    by dcvr.yhbt.net (Postfix) with ESMTP id 5F45B1FF9C
    for ; Tue, 27 Oct 2020 07:54:53 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/52] detached external index: mostly
Date: Tue, 27 Oct 2020 07:54:01 +0000
Message-Id: <20201027075453.19163-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

...and mostly wired up for WWW, but requires manual config
editing atm.  Needs docs and tests, and IMAP support.  This
will also form the basis of a mairix workalike client.

Not sure about the usability aspects, but I think this can
replace the need for per-inbox Xapian DBs and save a truckload
of disk space (and more importantly: cache space).

Per-inbox over.sqlite3 remains required for compatibility
with NNTP/IMAP and existing WWW code.

I don't know if the command-line tool is going to be called
public-inbox-eindex or public-inbox-extindex, but probably the
latter...  "xindex" could be confusing, and "eindex" rhymes with
"reindex" which could also be confusing.  But I'm even more
easily confused than usual these days :x

Performance isn't great: it took 30+ hours to index my mirror
of lore on a SATA SSD, but the entire index is <200GB due to
deduplication between cross posts.

-compact isn't working with these indices, yet, but will
sometime...

More changes on the way, still trying to fix my brain and get
through this year...

Eric Wong (52):
  doc/standards: add RFCs for URL schemes
  search: hoist out _xdb_sharded for v2 inboxes
  extsearch: start mocking out
  searchidx: expose INDEXLEVELS as `our'
  v2writable: add git method
  v2writable: make OO calls to last_commit-related methods
  search: xdb_sharded: make this a public method for ExtSearch
  searchidx: introduce "xref3" concept
  v2writable: prepare initialization for external indices
  v2writable: hoist out write_alternates
  searchidxshard: allow msgref to be undef
  v2writable: idx_shard: simplify callers
  v2writable: count_shards: allow working without {ibx}
  overidx: introduce changes for external index
  v2: some changes for ExtSearchIdx compatibility
  inboxwritable: eidx_key for external index
  v2writable: rename remaining "remote" terminology
  v2writable: checkpoint: account for lack of {mm}
  extsearchidx: initial implementation
  searchidx: index eidx_key as a boolean term
  searchidx: xref3 delete support
  searchidxshard: special init for eidx
  searchidx: put {ibx} into $sync state
  searchidx: log2stack: simplify callers
  v2writable: more generic sync setup code
  v2writable: allow OO method references
  v2writable: rename {v2w} field to {self}
  v2writable: make *last_commits and sync_prepare OO methods
  v2writable: move size check init to sync_prepare
  extsearchidx: more compatibility with V2Writable callers
  v2writable: reduce scope of epoch-aware code
  extsearchidx: remove {unindex_range} field
  v2writable: pass oid to uindex_oid
  extsearchidx: sync unit updates
  searchidx: export prepare_stack
  extsearchidx: sync updates
  searchidx: reduce inbox-dependency, wrap ->with_umask
  searchidx: favor $sync->{ibx} (over $self->{ibx})
  Makefile.PL: do not build manpage if POD is missing
  script: add preliminary eindex implementation
  index: eindex wiring
  over: store xref3 data in over.sqlite3
  searchidx: remove xref3 support for Xapian
  t/extsearch.t: verify results and xref3 ordering
  t/v2writable: remove pointless ->barrier call
  extsearch: wire up smsg_eml
  extsearchidx: handle edits
  extsearch: wire up remaining Inbox-like methods for WWW
  searchidx: ignore exceptions from ->remove_term
  extsearchidx: set current_info in warning callbacks
  extsearchidx: support --batch-size checkpoints
  searchidxshard: make warnings with eidx_key less confusing

 Documentation/standards.perl      |   3 +
 MANIFEST                          |   4 +
 Makefile.PL                       |  16 +-
 lib/PublicInbox/Config.pm         |  12 +
 lib/PublicInbox/ExtSearch.pm      |  69 +++++
 lib/PublicInbox/ExtSearchIdx.pm   | 404 ++++++++++++++++++++++++++++++
 lib/PublicInbox/Inbox.pm          |  53 ++--
 lib/PublicInbox/InboxWritable.pm  |  23 ++
 lib/PublicInbox/Over.pm           |  19 ++
 lib/PublicInbox/OverIdx.pm        | 122 ++++++++-
 lib/PublicInbox/Search.pm         |  62 ++---
 lib/PublicInbox/SearchIdx.pm      | 135 +++++++---
 lib/PublicInbox/SearchIdxShard.pm |  77 +++++-
 lib/PublicInbox/V2Writable.pm     | 310 ++++++++++++-----------
 lib/PublicInbox/WWW.pm            |   3 +-
 lib/PublicInbox/Xapcmd.pm         |   2 +-
 script/public-inbox-eindex        |  43 ++++
 script/public-inbox-index         |   3 +-
 t/extsearch.t                     |  75 ++++++
 t/over.t                          |  24 ++
 t/search.t                        |   2 -
 t/v2writable.t                    |   3 +-
 22 files changed, 1204 insertions(+), 260 deletions(-)
 create mode 100644 lib/PublicInbox/ExtSearch.pm
 create mode 100644 lib/PublicInbox/ExtSearchIdx.pm
 create mode 100644 script/public-inbox-eindex
 create mode 100644 t/extsearch.t
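
To illustrate why deduplication between cross posts can shrink a
shared index, here is a toy sketch in Python.  It is not the
ExtSearchIdx code: the class ToyExtIndex, the content_key helper and
the choice of SHA-256 over the raw message bytes are all assumptions
made for illustration, and the layout of the real xref3 data is not
described in this message.

```python
import hashlib
from collections import defaultdict


def content_key(raw: bytes) -> str:
    # Hypothetical dedup key: a hash of the raw message bytes.
    # (Assumption for this sketch; the real key may be computed differently.)
    return hashlib.sha256(raw).hexdigest()


class ToyExtIndex:
    """One shared index for many inboxes; each distinct message is stored once."""

    def __init__(self):
        self.docs = {}                 # content key -> message bytes
        self.xrefs = defaultdict(set)  # content key -> {(inbox, article number)}

    def add(self, inbox: str, num: int, raw: bytes) -> bool:
        """Record that `inbox` holds this message; return True if newly indexed."""
        key = content_key(raw)
        is_new = key not in self.docs
        if is_new:
            self.docs[key] = raw       # the expensive indexing step happens once
        self.xrefs[key].add((inbox, num))
        return is_new

    def remove(self, inbox: str, num: int, raw: bytes) -> None:
        """Drop one inbox's reference; forget the message when none remain."""
        key = content_key(raw)
        self.xrefs[key].discard((inbox, num))
        if not self.xrefs[key]:
            del self.xrefs[key]
            self.docs.pop(key, None)
```

With this shape, a message cross-posted to N inboxes costs one stored
document plus N small references, instead of N full copies in N
per-inbox databases.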