Date: Tue, 1 Oct 2019 22:00:02 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: Re: [PATCH] TODO: add item for searching based on git-patch-id(1)
Message-ID: <20191001220002.GA21797@dcvr>
References: <20191001033747.37354-1-e@80x24.org> <20191001210009.GA4232@pure.paranoia.local>
In-Reply-To: <20191001210009.GA4232@pure.paranoia.local>

Konstantin Ryabitsev wrote:
> On Tue, Oct 01, 2019 at 03:37:47AM +0000, Eric Wong wrote:
> > +* support searching based on `git-patch-id --stable` to improve
> > +  bidirectional mapping of commits <=> emails
>
> It would be handy, but a word of caution -- because it strips
> whitespace, git-patch-id is not great for languages with syntactic
> indentation, like Python. For example, the following two patches
> generate the same patch-id, but one is actually malicious:

Good point.  Makefiles also fall into that category; I wonder
which other languages are whitespace-sensitive?

> So, I wouldn't use git-patch-id as a mechanism to look up patches,
> except as an auxiliary one.

It's usable for 99% of patches for the kernel, though.
But right, dfpost:$BLOB_ID matches should take precedence,
and we can use a lower weight for the patch-id in Xapian.

The bigger question is the cost in time to reindex...

And ultimately, I wonder if dfpost:$BLOB_ID + s:$COMMIT_TITLE
is good enough, too...

I think I need to dig out something I abandoned years ago for indexing coderepos and refactor that to be less space-intensive now.
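
To make the whitespace concern from earlier in the thread concrete, here
is a minimal sketch of a whitespace-blind diff hash.  It is NOT git's
actual patch-id algorithm, only a simplified stand-in that drops all
whitespace and skips hunk headers before hashing.  The file name
(auth.py), the functions in the diffs, and whitespace_blind_id() itself
are made up for illustration.

```python
import hashlib


def whitespace_blind_id(diff_text: str) -> str:
    """Hash a diff with all whitespace removed.

    Simplified stand-in for the idea behind git-patch-id, not the
    real algorithm.
    """
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        # Skip metadata that differs between otherwise-identical patches.
        if line.startswith(("diff --git", "index ", "@@")):
            continue
        # Drop every whitespace character, including indentation.
        h.update("".join(line.split()).encode())
    return h.hexdigest()


# Hypothetical patch: grant() stays inside the is_admin check.
BENIGN = """\
--- a/auth.py
+++ b/auth.py
@@ -1,3 +1,4 @@
 def check(user):
     if user.is_admin:
         log(user)
+        grant(user)
"""

# Same patch with the added line dedented: grant() now runs for everyone.
MALICIOUS = """\
--- a/auth.py
+++ b/auth.py
@@ -1,3 +1,4 @@
 def check(user):
     if user.is_admin:
         log(user)
+    grant(user)
"""

same_id = whitespace_blind_id(BENIGN) == whitespace_blind_id(MALICIOUS)
print("texts differ:", BENIGN != MALICIOUS, "/ ids match:", same_id)
```

The two diffs differ only in the indentation of the added line, so the
hash cannot tell them apart.  That is why a patch-id hit would probably
only be safe as a lower-weight, auxiliary signal next to dfpost: matches.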