From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 998081F424;
	Wed, 18 Apr 2018 09:13:18 +0000 (UTC)
From: "Eric Wong (Contractor, The Linux Foundation)"
To: meta@public-inbox.org
Cc: "Eric Wong (Contractor, The Linux Foundation)"
Subject: [PATCH 00/12] better dedupe, contiguous article numbers
Date: Wed, 18 Apr 2018 09:13:04 +0000
Message-Id: <20180418091316.29114-1-e@80x24.org>
List-Id:

Hopefully the final round of patches, most notably [7/12]
improving deduplication and [9/12] to avoid causing difficulties
for NNTP readers on v1 inboxes.

And a few more bugfixes along the way...

Eric Wong (Contractor, The Linux Foundation) (12):
  feed: respect feedmax, again
  v1: remove articles from overview DB
  compact: do not merge v2 repos by default
  v2writable: reduce partititions by one
  search: preserve References in Xapian smsg for x=t view
  v2: generate better Message-IDs for duplicates
  v2: improve deduplication checks
  import: cat_blob drops leading 'From ' lines like Inbox
  searchidx: regenerate and avoid article number gaps on full index
  extmsg: remove expensive git path checks
  use %H consistently to disable abbreviations
  searchidx: increase term positions for all text terms

 MANIFEST                      |   2 +
 lib/PublicInbox/ContentId.pm  |  67 ++++++++++--
 lib/PublicInbox/ExtMsg.pm     |  39 ++-----
 lib/PublicInbox/Feed.pm       |   4 +-
 lib/PublicInbox/Import.pm     |  21 ++--
 lib/PublicInbox/Inbox.pm      |   5 -
 lib/PublicInbox/Msgmap.pm     |   8 +-
 lib/PublicInbox/SearchIdx.pm  | 191 ++++++++++++++++++++++------------
 lib/PublicInbox/V2Writable.pm |  69 +++++++-----
 lib/PublicInbox/View.pm       |   7 +-
 script/public-inbox-compact   |  15 ++-
 scripts/dupe-finder           |  54 ++++++++++
 t/content_id.t                |  10 ++
 t/convert-compact.t           |   2 -
 t/psgi_v2.t                   |  11 +-
 t/search.t                    |   9 +-
 t/v1-add-remove-add.t         |  45 ++++++++
 t/v2writable.t                |   5 +-
 18 files changed, 391 insertions(+), 173 deletions(-)
 create mode 100644 scripts/dupe-finder
 create mode 100644 t/v1-add-remove-add.t

--
EW