From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/23] indexing: --skip-docdata + speedups
Date: Thu, 20 Aug 2020 20:24:34 +0000
Message-Id: <20200820202457.21042-1-e@yhbt.net>

Some miscellaneous help and cleanup things, too.

Document data is no longer read from Xapian by read-only daemons;
that data is redundant given over.sqlite3 always exists.  This
should improve page cache hit rates for over.sqlite3 by a small
bit.  Being able to mass load a bunch of rows from SQLite speeds
up the default search summary view by ~10%, too.

The --skip-docdata option to -init and -index can avoid writing
Xapian document data, saving ~1.5% in Xapian space overhead (and
associated I/O and page cache overheads).  It breaks rollbacks to
old versions, though, so it won't be the default.

Eric Wong (23):
  doc: note -compact and -xcpdb are rarely used
  admin: progress shows the inbox being indexed
  compact: support --help/-? and perform lazy loading
  init: support --help and -?
  init: support --newsgroup option
  init: drop -N alias for --skip-artnum
  search: v2: ensure shards are numerically sorted
  xapcmd: simplify {reindex} parameter passing
  www: reduce long-lived PublicInbox::Search references
  search: improve comments around constants
  search: export mdocid subroutine
  searchquery: split off from searchview
  search: make qparse_new an internal function
  smsg: reduce utf8::decode call sites
  searchview: use over.sqlite3 instead of Xapian docdata
  searchview: speed up search summary by ~10%
  searchview: convert nested and Atom display to over.sqlite3
  extmsg: avoid using Xapian docdata
  mbox: avoid Xapian docdata in search results
  smsg: remove from_mitem
  t/nntpd-v2: set PI_TEST_VERSION=2 properly
  init+index: support --skip-docdata for Xapian
  search: add mset_to_artnums method

 Documentation/public-inbox-compact.pod |   5 +
 Documentation/public-inbox-config.pod  |   2 +-
 Documentation/public-inbox-index.pod   |   8 ++
 Documentation/public-inbox-init.pod    |  37 +++++++-
 Documentation/public-inbox-xcpdb.pod   |   3 +
 MANIFEST                               |   1 +
 lib/PublicInbox/Admin.pm               |  15 ++-
 lib/PublicInbox/ExtMsg.pm              |  19 ++--
 lib/PublicInbox/IMAP.pm                |   5 +-
 lib/PublicInbox/Inbox.pm               |  11 ++-
 lib/PublicInbox/Mbox.pm                |  24 +++--
 lib/PublicInbox/Over.pm                |  28 +++---
 lib/PublicInbox/Search.pm              | 125 ++++++++++++++-----------
 lib/PublicInbox/SearchIdx.pm           |  35 +++++--
 lib/PublicInbox/SearchIdxShard.pm      |   2 +-
 lib/PublicInbox/SearchQuery.pm         |  53 +++++++++++
 lib/PublicInbox/SearchView.pm          |  90 ++++--------------
 lib/PublicInbox/Smsg.pm                |  10 +-
 lib/PublicInbox/Xapcmd.pm              |  20 ++--
 script/public-inbox-compact            |  39 ++++++--
 script/public-inbox-convert            |   3 +-
 script/public-inbox-index              |   7 +-
 script/public-inbox-init               | 101 +++++++++++++-------
 t/imapd.t                              |   6 +-
 t/inbox_idle.t                         |   2 +-
 t/index-git-times.t                    |  11 ++-
 t/init.t                               |  17 +++-
 t/nntpd-v2.t                           |   2 +-
 t/nntpd.t                              |   9 +-
 t/search.t                             |  34 ++++---
 30 files changed, 437 insertions(+), 287 deletions(-)
 create mode 100644 lib/PublicInbox/SearchQuery.pm
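
As a rough illustration of why mass loading rows from SQLite can help the search summary view, the Python sketch below fetches a page of hits with one query instead of one query per hit. This is not the code from this series (which is Perl): the table name `msg`, its columns, and the helper names are all hypothetical and do not reflect the actual over.sqlite3 schema.

```python
import sqlite3


def load_one_by_one(conn, nums):
    """One SELECT per article number: simple, but one round trip per hit."""
    rows = []
    for num in nums:
        row = conn.execute(
            "SELECT num, subject FROM msg WHERE num = ?", (num,)
        ).fetchone()
        if row is not None:
            rows.append(row)
    return rows


def load_batched(conn, nums):
    """One SELECT for the whole batch, returned in the order of `nums`.

    `nums` is assumed to be a short list (one page of search results),
    well under SQLite's limit on bound parameters.
    """
    if not nums:
        return []
    placeholders = ",".join("?" * len(nums))  # e.g. "?,?,?"
    found = {
        row[0]: row
        for row in conn.execute(
            f"SELECT num, subject FROM msg WHERE num IN ({placeholders})",
            list(nums),
        )
    }
    # SQLite does not return IN (...) matches in argument order, so
    # restore the caller's (e.g. relevance-ranked) order here.
    return [found[n] for n in nums if n in found]


def demo_db():
    """Build a small in-memory table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE msg (num INTEGER PRIMARY KEY, subject TEXT)")
    conn.executemany(
        "INSERT INTO msg VALUES (?, ?)",
        [(i, f"subject {i}") for i in range(1, 11)],
    )
    return conn
```

Both helpers return the same rows for the same input; the batched one just prepares and steps a single statement, and missing article numbers are silently skipped in either case.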