From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
  shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
  by dcvr.yhbt.net (Postfix) with ESMTP id E51131F576
  for ; Thu, 15 Feb 2018 11:08:44 +0000 (UTC)
From: "Eric Wong (Contractor, The Linux Foundation)"
To: meta@public-inbox.org
Subject: [WIP 0/17] initial v2 work based on one-file tree
Date: Thu, 15 Feb 2018 11:08:23 +0000
Message-Id: <20180215110840.30413-1-e@80x24.org>
In-Reply-To: <20180215105509.GA22409@dcvr>
References: <20180215105509.GA22409@dcvr>
List-Id:

The basic idea is to outsource deduplication to Xapian and use
git as dumb storage.  This yields huge dividends in object
traversal based on preliminary tests:

  https://public-inbox.org/meta/20180209205140.GA11047@dcvr/

Additionally, insertion time no longer degrades from the giant
tree objects that plagued the initial v1 design.

There are also a couple of small fixes along the way to make it
tolerate some crap in older archives.

The search indexer and content-based deduplication still need to
be worked on.

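As a rough illustration of the split described above (this is not the code in this series), the sketch below keeps an append-only list standing in for git and a dict standing in for the Xapian index. Everything in it is an assumption made for illustration: the `Archive` class, the `content_id` function, the `DEDUP_HEADERS` list, and the choice of SHA-256. The actual content-id scheme is still to be worked out.

```python
import hashlib
from email import message_from_bytes

# Hypothetical choice of headers to hash; trace headers such as
# Received: are left out so redelivered copies hash the same.
DEDUP_HEADERS = ("Message-ID", "From", "To", "Subject", "Date")


def content_id(raw: bytes) -> str:
    """Hash selected headers plus the decoded body of one message."""
    msg = message_from_bytes(raw)
    h = hashlib.sha256()
    for name in DEDUP_HEADERS:
        value = msg.get(name, "")
        h.update(f"{name}: {value}\n".encode("utf-8", "replace"))
    # get_payload(decode=True) returns None for multipart messages
    h.update(msg.get_payload(decode=True) or b"")
    return h.hexdigest()


class Archive:
    """Toy model: 'store' is dumb storage, 'index' answers dedup queries."""

    def __init__(self):
        self.store = []  # stand-in for git: append-only, never searched
        self.index = {}  # stand-in for Xapian: content id -> store position

    def add(self, raw: bytes):
        """Append the message unless the index already knows its content id.

        Returns the store position, or None for a duplicate.
        """
        cid = content_id(raw)
        if cid in self.index:
            return None
        self.store.append(raw)
        self.index[cid] = len(self.store) - 1
        return self.index[cid]
```

The point of the split is that the storage side never has to be walked to answer "have we seen this already?"; only the index is consulted on insert.
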
Eric Wong (Contractor, The Linux Foundation) (17):
  AUTHORS: add The Linux Foundation
  watch_maildir: allow '-' in mail filename
  scripts/import_vger_from_mbox: relax From_ line match slightly
  import: stop writing legacy ssoma.index by default
  import: begin supporting this without ssoma.lock
  import: initial handling for v2
  t/import: test for last_object_id insertion
  content_id: add test case
  searchmsg: add mid_mime import for _extract_mid
  scripts/import_vger_from_mbox: support --dry-run option
  import: APIs to support v2 use
  search: free up 'Q' prefix for a real unique identifier
  searchidx: fix comment around next_thread_id
  address: extract more characters from email addresses
  import: pass "raw" dates to git-fast-import(1)
  scripts/import_vger_from_mbox: use v2 layout for import
  import: quiet down warnings from bogus From: lines