From: Elijah Newren
Date: Mon, 8 Feb 2021 15:52:45 -0800
Subject: Re: [PATCH 3/3] diffcore-rename: guide inexact rename detection based on basenames
To: Junio C Hamano
Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan, Taylor Blau, Jeff King

On Mon, Feb 8, 2021 at 3:43 PM Junio C Hamano wrote:
>
> Elijah Newren writes:
>
> > I'm sorry, but I'm not following you. As best I can tell, you seem to
> > be suggesting that if we were to use a higher similarity bar for
> > checking same-basename files, that such a difference would end up not
> > accelerating the diffcore-rename algorithm at all?
>
> No.
> If we assume we use the minimum similarity threshold in the
> new middle step that considers only the files that were moved across
> directories without changing their names, and the last "full matrix"
> step sees a src that did *not* pair with a dst of the same name in a
> different directory surviving, we know that the pair would not be
> similar enough (because we are using the same "minimum similarity"
> in the middle step and the full matrix step) without comparing them
> again. But if we used higher similarity in the middle step, the
> fact that such a src/dst pair survived the middle step without
> producing a match only means that the pair was not similar enough
> with a raised bar used in the middle, and the full-matrix step needs
> to consider the possibility that they may still be similar enough
> when using the "minimum similarity" used for all the other pairs.
>
> And because I was assuming that requiring higher similarity in the
> middle step would be a prudent thing to do to avoid false matches
> that discard better matches elsewhere, my conclusion was that it
> would not be a useful optimization to do in the final full-matrix
> step to see if a pair is something that was a candidate in the
> middle step but did not match well enough (because the fact that the
> pair did not compare well enough with higher bar does not mean it
> would not compare well to pass the lower "minimum" bar).

Ah, gotcha! Thanks for clarifying. Yes, yet another reason to not
even try to avoid "redoing" the O(N) spanhash comparisons.
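
To make the argument above concrete, here is a rough sketch in Python.
It is not the diffcore-rename code: every name in it is hypothetical,
and similarity scores are assumed to be plain integer percentages. It
only illustrates why a failed same-basename comparison lets the
full-matrix step skip the pair when both steps share one threshold,
and why that shortcut stops being valid once the middle step uses a
raised bar.

```python
def passes(score, threshold):
    """A src/dst pair is a rename candidate if its score meets the bar."""
    return score >= threshold


def may_skip_in_full_matrix(basename_bar, minimum_bar):
    """Can a pair that failed the basename step be skipped later?

    Failing the basename step only proves score < basename_bar. That
    implies score < minimum_bar only when basename_bar <= minimum_bar.
    """
    return basename_bar <= minimum_bar


def detect(score, basename_bar, minimum_bar):
    """Return the step that pairs src with dst, or None if neither does.

    Hypothetical two-step flow: same-basename check first, then the
    full-matrix check at the ordinary minimum similarity.
    """
    if passes(score, basename_bar):
        return "basename step"
    if may_skip_in_full_matrix(basename_bar, minimum_bar):
        # Same (or lower) bar already failed; comparing again is pointless.
        return None
    if passes(score, minimum_bar):
        return "full matrix"
    return None


if __name__ == "__main__":
    # Same bar in both steps: a middle-step failure is final.
    print(detect(40, basename_bar=50, minimum_bar=50))
    # Raised bar in the middle step: the pair must be compared again.
    print(detect(60, basename_bar=75, minimum_bar=50))
```

With a raised bar of 75 and a minimum of 50 (both made-up numbers), a
pair scoring 60 is rejected by the middle step but still has to be
considered by the full matrix, which is the case described above.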