From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	URIBL_RED shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 4AE7E1F9FC;
	Wed, 17 Mar 2021 18:18:43 +0000 (UTC)
Date: Wed, 17 Mar 2021 20:18:43 +0200
From: Eric Wong
To: workflows@vger.kernel.org, meta@public-inbox.org
Subject: Re: WIP: searching all of lore
Message-ID: <20210317181843.GA9180@dcvr>
References: <20201126194543.GA30337@dcvr>
	<20210317071116.GA8121@dcvr>
	<20210317132723.xx4klonordhsb6ve@chatter.i7.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20210317132723.xx4klonordhsb6ve@chatter.i7.local>
List-Id:

Konstantin Ryabitsev wrote:
> Looking good! I noticed that it doesn't "uniquify" the results. E.g. searching
> for "lists.linux.dev" (just some uncommon wording I could think of) returns
> multiple hits for the same message sent to multiple lists:
>
> https://yhbt.net/lore/all/?q=lists.linux.dev
>
> Is that intentional, or can this be tweaked to show a single result for the
> same message-id?

Not really. At least for the summary search results, it makes
no sense:

  https://public-inbox.org/meta/20210317181408.9124-1-e@80x24.org/

The underlying cause that can be seen in
https://yhbt.net/lore/all/20210316102311.182375-1-gregkh@linuxfoundation.org/
is the Mailman-added signature for one of the posts.

I've been considering adding a "diff view" to more easily pick
out differences between messages with identical Message-ID with
subtly different content, but it could be expensive for PSGI...

I will probably prototype it in lei, first.
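
For illustration only, here is a minimal sketch of what a "diff view" over
messages sharing a Message-ID might compute. It is not how public-inbox or
lei implements anything: it is written in Python with the standard library,
the function names group_by_message_id and diff_duplicates are hypothetical,
and it assumes every copy is a single-part text/plain message passed in as
a (label, raw text) pair.

```python
import difflib
from collections import defaultdict
from email import message_from_string
from email.policy import default


def group_by_message_id(raw_messages):
    """Map each Message-ID to the (label, body lines) of every copy seen.

    raw_messages is an iterable of (label, raw RFC 5322 text) pairs; the
    label is a hypothetical name for where the copy came from (e.g. a list).
    """
    groups = defaultdict(list)
    for label, raw in raw_messages:
        msg = message_from_string(raw, policy=default)
        mid = (msg["Message-ID"] or "").strip()
        body = msg.get_content()  # assumption: single-part text/plain
        groups[mid].append((label, body.splitlines()))
    return groups


def diff_duplicates(raw_messages):
    """Yield unified-diff lines between the first copy of a message and
    each later copy with the same Message-ID whose body differs."""
    for copies in group_by_message_id(raw_messages).values():
        base_label, base_lines = copies[0]
        for label, lines in copies[1:]:
            if lines != base_lines:
                yield from difflib.unified_diff(
                    base_lines, lines,
                    fromfile=base_label, tofile=label, lineterm="",
                )


if __name__ == "__main__":
    # Made-up sample data: the second copy carries a list-added footer.
    a = "Message-ID: <1@example>\nSubject: t\n\nhello\n"
    b = "Message-ID: <1@example>\nSubject: t\n\nhello\n-- \nlist footer\n"
    for line in diff_duplicates([("list-a", a), ("list-b", b)]):
        print(line)
```

Diffing only copies whose bodies actually differ keeps the common case
(identical copies) cheap, which matters if the comparison were ever done at
request time rather than offline.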