git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Denton Liu <liu.denton@gmail.com>
Subject: Re: [PATCH v3 00/13] ci: include a Visual Studio build & test in our Azure Pipeline
Date: Tue, 8 Oct 2019 14:46:34 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1910081423250.46@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <xmqqzhicnfmr.fsf@gitster-ct.c.googlers.com>

Hi Junio,

On Tue, 8 Oct 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> I didn't quite understand this part, though.
> >>
> >>     The default creation factor is 60 (roughly speaking, it wants 60% of
> >>     the lines to match between two patches, otherwise it considers the
> >>     patches to be unrelated).
> >>
> >> Would the updated creation factor used which is 95 (roughly
> >> speaking) want 95% of the lines to match between two patches?
> >>
> >> That would make the matching logic even pickier and reject more
> >> paring, so I must be reading the statement wrong X-<.
> >
> > No, I must have written the opposite of what I tried to say, is all.
>
> So, cfactor of 60 means at most 60% is allowed to differ and the
> two patches are still considered to be related, while 95 means only
> 5% needs to be common?  That would make more sense to me.

Okay, I not only wrote the opposite of what I wanted to say, I also
misremembered.

When `range-diff` tries to determine matching pairs of patches, it
builds an `(m+n)x(m+n)` cost matrix, where `m` is the number of patches
in the first commit range and `n` is the number of patches in the second
one.

Why not `m x n`? Well, that's the obvious matrix, and that's what it
starts with, essentially assigning the number of lines of the diff
between the diffs as "cost".

But then `git range-diff` extends the cost matrix to allow for _all_ of
the `m` patches to be considered deleted, and _all_ of the `n` patches
to be added. As cost, it cannot use a "diff of diffs" because there is
no second diff. So it uses the number of lines of the one diff it has,
multiplied by the creation factor interpreted as a percentage.

The naive creation factor would be 100%, which is (almost) as if we
assumed an empty diff for the missing diff. But that would make the
range-diff too eager to dismiss rewrites, as experience obviously showed
(not my experience, but Thomas Rast's, who came up with `tbdiff` after
all): the diff of diffs includes a diff header, for example.

The interpretation I offered (although I inverted what I wanted to say)
is similar in spirit to that metric (which is not actually a metric, I
believe, because I expect it to violate the triangle inequality) is
obviously inaccurate: the number of lines of the diff of diffs does not
say anything about the number of matching lines, quite to the contrary,
it correlates somewhat to the number of non-matching lines.

So a better interpretation would have been:

	The default creation factor is 60 (roughly speaking, it wants at
	most 60% of the diffs' lines to differ, otherwise it considers
	them not to be a match.

This is still inaccurate, but at least it gets the idea of the
range-diff across.

Of course, I will never be able to amend the commit message in
GitGitGadget anyway, as I have merged that PR already.

Ciao,
Dscho

  reply	other threads:[~2019-10-08 12:46 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-26  8:30 [PATCH 00/13] ci: include a Visual Studio build & test in our Azure Pipeline Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 01/13] push: do not pretend to return `int` from `die_push_simple()` Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 02/13] msvc: avoid using minus operator on unsigned types Johannes Schindelin via GitGitGadget
2019-09-26 17:20   ` Denton Liu
2019-09-26 21:01     ` Johannes Schindelin
2019-09-26 23:57       ` Denton Liu
2019-09-30  9:50         ` Johannes Schindelin
2019-09-26  8:30 ` [PATCH 03/13] winansi: use FLEX_ARRAY to avoid compiler warning Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 04/13] compat/win32/path-utils.h: add #include guards Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 05/13] msvc: ignore some libraries when linking Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 06/13] msvc: handle DEVELOPER=1 Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 07/13] msvc: work around a bug in GetEnvironmentVariable() Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 08/13] vcxproj: only copy `git-remote-http.exe` once it was built Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 09/13] vcxproj: include more generated files Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 10/13] test-tool run-command: learn to run (parts of) the testsuite Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 11/13] tests: let --immediate and --write-junit-xml play well together Johannes Schindelin via GitGitGadget
2019-09-28 22:22   ` Junio C Hamano
2019-09-30  9:52     ` Johannes Schindelin
2019-09-26  8:30 ` [PATCH 12/13] ci: really use shallow clones on Azure Pipelines Johannes Schindelin via GitGitGadget
2019-09-26  8:30 ` [PATCH 13/13] ci: also build and test with MS Visual Studio " Johannes Schindelin via GitGitGadget
2019-09-30  9:55 ` [PATCH v2 00/13] ci: include a Visual Studio build & test in our Azure Pipeline Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 02/13] msvc: avoid using minus operator on unsigned types Johannes Schindelin via GitGitGadget
2019-10-03 22:44     ` Junio C Hamano
2019-10-04  9:55       ` Johannes Schindelin
2019-10-04 17:09         ` Johannes Sixt
2019-10-04 21:24           ` Johannes Schindelin
2019-10-06  0:02             ` Junio C Hamano
2019-10-06 10:53               ` Johannes Sixt
2019-10-08 12:04                 ` Johannes Schindelin
2019-10-08 21:13                   ` Johannes Sixt
2019-09-30  9:55   ` [PATCH v2 01/13] push: do not pretend to return `int` from `die_push_simple()` Johannes Schindelin via GitGitGadget
2019-10-03 22:37     ` Junio C Hamano
2019-10-04  9:36       ` Johannes Schindelin
2019-09-30  9:55   ` [PATCH v2 03/13] winansi: use FLEX_ARRAY to avoid compiler warning Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 05/13] msvc: ignore some libraries when linking Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 04/13] compat/win32/path-utils.h: add #include guards Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 06/13] msvc: handle DEVELOPER=1 Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 07/13] msvc: work around a bug in GetEnvironmentVariable() Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 09/13] vcxproj: include more generated files Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 08/13] vcxproj: only copy `git-remote-http.exe` once it was built Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 10/13] test-tool run-command: learn to run (parts of) the testsuite Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 11/13] tests: let --immediate and --write-junit-xml play well together Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 12/13] ci: really use shallow clones on Azure Pipelines Johannes Schindelin via GitGitGadget
2019-09-30  9:55   ` [PATCH v2 13/13] ci: also build and test with MS Visual Studio " Johannes Schindelin via GitGitGadget
2019-10-04 15:09   ` [PATCH v3 00/13] ci: include a Visual Studio build & test in our Azure Pipeline Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 01/13] push: do not pretend to return `int` from `die_push_simple()` Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 02/13] msvc: avoid using minus operator on unsigned types Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 03/13] winansi: use FLEX_ARRAY to avoid compiler warning Johannes Schindelin via GitGitGadget
2019-10-07 19:16       ` Alban Gruin
2019-10-04 15:09     ` [PATCH v3 04/13] compat/win32/path-utils.h: add #include guards Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 05/13] msvc: ignore some libraries when linking Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 07/13] msvc: work around a bug in GetEnvironmentVariable() Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 06/13] msvc: handle DEVELOPER=1 Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 08/13] vcxproj: only copy `git-remote-http.exe` once it was built Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 09/13] vcxproj: include more generated files Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 10/13] test-tool run-command: learn to run (parts of) the testsuite Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 11/13] tests: let --immediate and --write-junit-xml play well together Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 12/13] ci: really use shallow clones on Azure Pipelines Johannes Schindelin via GitGitGadget
2019-10-04 15:09     ` [PATCH v3 13/13] ci: also build and test with MS Visual Studio " Johannes Schindelin via GitGitGadget
2019-10-06  0:19     ` [PATCH v3 00/13] ci: include a Visual Studio build & test in our Azure Pipeline Junio C Hamano
2019-10-06 10:45       ` Johannes Schindelin
2019-10-06 20:38         ` Johannes Schindelin
2019-10-07  1:14           ` Junio C Hamano
2019-10-07 21:51             ` Johannes Schindelin
2019-10-08  2:19               ` Junio C Hamano
2019-10-08 12:46                 ` Johannes Schindelin [this message]
2019-10-09 13:57                   ` Philip Oakley
2019-10-10  9:03                     ` Johannes Schindelin
2019-10-10 10:12                       ` Philip Oakley
2019-10-07  0:59         ` Junio C Hamano
2019-10-07 16:08           ` Thomas Gummerer
2019-10-11 22:06             ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1910081423250.46@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=liu.denton@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).