From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 426E81F619
	for ; Fri, 20 Mar 2020 08:18:21 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 0/9] preserve time and date of initial commit
Date: Fri, 20 Mar 2020 08:18:12 +0000
Message-Id: <20200320081821.21715-1-e@yhbt.net>
In-Reply-To: <20200305051310.GA26952@dcvr>
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

For messages lacking Date and/or Received headers, search queries
for "d:YYYYMMDD..YYYYMMDD" ranges can be unreliable in mirrors, as
can the $INBOX_URL/?t=$TIMESTAMP query, which only hits SQLite.

Yes, this ended up being a lot of work to deal with corner-case
messages (most of which are probably spam), but there are also a
lot of internal cleanups which made the end result easier to
follow, I think...

The main fix is actually in 1/9, but it's gross.  Patch 2/9 fixes
a small window where a race can happen and cause searches to be
off by a minute.  Patches 3-8 clean up the mess left in 1 and 2.
Finally, patch 9 fixes the corner-case-of-corner-cases for dealing
with multi-MID messages, which require a one-off queue to store
the git commit/author times instead of overloading msgmap.

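To illustrate the idea behind the main fix, here is a minimal sketch
of the header-or-fallback lookup.  It is written in Python for
brevity (public-inbox itself is Perl), and the function name
`time_or_fallback` and its arguments are hypothetical, not the
project's API.  The point is only that a fallback taken from the git
commit is stable across mirrors, whereas the time at which a mirror
happens to index the message is not.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def time_or_fallback(raw, header, fallback):
    """Return epoch seconds parsed from `header`, else `fallback`.

    `fallback` stands in for a time read from the git commit that
    stored the message (an assumption of this sketch).
    """
    value = message_from_string(raw).get(header)
    if value is None:
        return fallback
    if header.lower() == "received":
        # In a Received header, the date follows the last ';'.
        value = value.rsplit(";", 1)[-1]
    try:
        dt = parsedate_to_datetime(value.strip())
    except (TypeError, ValueError):
        # Unparseable dates are treated like missing ones.
        return fallback
    if dt.tzinfo is None:
        # No usable zone information: interpret the value as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

Calling `time_or_fallback(raw, "Date", git_time)` on a message
without a Date header returns `git_time` unchanged, so every mirror
indexing the same commit arrives at the same value.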
Eric Wong (9):
  index: use git commit times on missing Date/Received
  v2writable: preserve timestamps from import if generated
  rename PublicInbox::SearchMsg => PublicInbox::Smsg
  smsg: to_doc_data: use existing fields
  overidx: parse_references: less error-prone args
  *idx: pass $smsg in more places instead of many args
  v2: pass smsg in more places
  *idx: pass smsg in even more places
  v2: SDBM-based multi Message-ID queue

 Documentation/mknews.perl                   |  4 +-
 Documentation/technical/data_structures.txt |  4 +-
 MANIFEST                                    |  4 +-
 lib/PublicInbox/ExtMsg.pm                   |  3 +-
 lib/PublicInbox/Feed.pm                     |  4 +-
 lib/PublicInbox/Import.pm                   | 19 +++--
 lib/PublicInbox/Inbox.pm                    |  2 +-
 lib/PublicInbox/Mbox.pm                     |  4 +-
 lib/PublicInbox/MsgTime.pm                  | 12 +--
 lib/PublicInbox/MultiMidQueue.pm            | 57 +++++++++++++
 lib/PublicInbox/NNTP.pm                     | 14 ++--
 lib/PublicInbox/Over.pm                     |  8 +-
 lib/PublicInbox/OverIdx.pm                  | 30 +++----
 lib/PublicInbox/Search.pm                   |  8 +-
 lib/PublicInbox/SearchIdx.pm                | 68 ++++++++++-----
 lib/PublicInbox/SearchIdxShard.pm           | 26 ++++--
 lib/PublicInbox/SearchView.pm               |  8 +-
 lib/PublicInbox/{SearchMsg.pm => Smsg.pm}   | 19 +++--
 lib/PublicInbox/SolverGit.pm                |  2 +-
 lib/PublicInbox/V2Writable.pm               | 84 ++++++++++++-------
 lib/PublicInbox/View.pm                     |  2 +-
 t/import.t                                  | 14 ++--
 t/index-git-times.t                         | 93 +++++++++++++++++++++
 t/multi-mid.t                               | 27 +++++-
 t/search-thr-index.t                        | 17 +++-
 t/thread-cycle.t                            |  2 +-
 26 files changed, 386 insertions(+), 149 deletions(-)
 create mode 100644 lib/PublicInbox/MultiMidQueue.pm
 rename lib/PublicInbox/{SearchMsg.pm => Smsg.pm} (92%)
 create mode 100644 t/index-git-times.t