From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 540F31F86C;
	Mon, 30 Nov 2020 19:42:01 +0000 (UTC)
Date: Mon, 30 Nov 2020 19:42:01 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: xref3 + NNTP problems...
Message-ID: <20201130194201.GA6687@dcvr>
References: <20201127095254.21624-1-e@80x24.org>
	<20201127095254.21624-13-e@80x24.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20201127095254.21624-13-e@80x24.org>
List-Id:

Eric Wong wrote:
> Subject: Re: [PATCH 12/12] nntp: xref: use ->ALL extindex if available
>
> Getting Xref for cross-posted messages is an O(n) operation
> where `n' is the number of newsgroups on the server.  This works
> acceptably when there are dozens of groups, but would be
> unacceptable when there's tens of thousands of newsgroups.
>
> With ~140 newsgroups, a lore.kernel.org mirror already handles
> "XHDR Xref $MESSAGE_ID" requests around 30% faster after
> creating the xref3.idx_nntp index.

xref3, being based on ContentHash, is great for internal use: it
ensures we don't archive redundant messages, but also don't get
fooled by reused Message-IDs (because users can and do reuse
Message-IDs for completely different messages).

But there are also cases where content-matching fails for
messages that ought to match, because of trailers added by
mailing lists (e.g. unsubscribe and other list info).

So every message posted to linux-mtd@lists.infradead.org

  nntp://rskvuqcfnfizkjg6h5jvovwb3wkikzcwskf54lfpymus6mxrzw67b5ad.onion/org.infradead.lists.linux-mtd

fails to match its cross-posts for Xref:, which is bad, too...

I have an idea to make it less bad, maybe it'll work...
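
To illustrate the trailer problem, here is a rough Python sketch.
It is NOT the actual ContentHash code; toy_content_hash(), the
sample subject/body and the footer text are all made up for this
example.  It only shows that hashing the body as-is gives two
different digests for the "same" message once a list appends its
footer, so a content-based lookup treats them as unrelated:

```python
import hashlib


def toy_content_hash(subject: str, body: str) -> str:
    """Hypothetical stand-in for a content hash: SHA-256 over subject + body."""
    h = hashlib.sha256()
    for part in (subject, body):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so field boundaries stay unambiguous
    return h.hexdigest()


# Made-up message, as the sender wrote it
subject = "[PATCH] example: fix something"
body = "This patch fixes something.\n"

# Made-up footer of the kind a list manager appends to each message
trailer = (
    "\n______________________________________________________\n"
    "Example discussion mailing list\n"
    "To unsubscribe, see the list info page\n"
)

direct_copy = toy_content_hash(subject, body)           # copy without footer
list_copy = toy_content_hash(subject, body + trailer)   # copy with footer

# Same Message-ID in practice, but the content hashes disagree,
# so the two copies are not recognized as cross-posts.
copies_match = direct_copy == list_copy
print("copies match:", copies_match)
```

Any scheme that compares whole bodies byte-for-byte runs into this
with footer-adding lists, whatever the hash function is.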