git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [WIP PATCH/RFC] Use a higher range-diff --creation-factor for format-patch
@ 2019-03-15 16:09 Ævar Arnfjörð Bjarmason
  2019-03-15 19:00 ` Eric Sunshine
  0 siblings, 1 reply; 5+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-15 16:09 UTC (permalink / raw)
  To: Git Mailing list; +Cc: Johannes Schindelin, Eric Sunshine

I just submittted
https://public-inbox.org/git/20190315155959.12390-1-avarab@gmail.com/
and for 8/8 had to tweak the creation factor to 80% due to having added
a large comment. Maybe something like the below makes more sense for
format-patch?

Also, the "Algorithm" section of git-range-diff describes how we'll try
to find a percentage similarity in the *diff*, but in this case I'm
fairly sure that e.g. a creation factor of 50 would do if it also
considered the commit message.

Maybe I'm just wrong and it does that already, but assuming I'm right in
my reading and it doesn't, was that ever considered? That should result
in fewer false "not the same patch" positives, but maybe if patches are
split up it'll screw things up in other ways.

diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
index 1af85d404f5..67a4881a20f 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -261,6 +261,10 @@ material (this may change in the future).
 	between the previous and current series of patches by adjusting the
 	creation/deletion cost fudge factor. See linkgit:git-range-diff[1])
 	for details.
++
+Defaults to 90, whereas the linkgit:git-range-diff[1] default is
+60. It's assumed that you're submitting a new patch series & that we
+should try harder than normal to find similarities.

 --notes[=<ref>]::
 	Append the notes (see linkgit:git-notes[1]) for the commit
diff --git a/builtin/log.c b/builtin/log.c
index ab859f59041..ff340130826 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1843,7 +1843,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 	}

 	if (creation_factor < 0)
-		creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT;
+		creation_factor = RANGE_DIFF_CREATION_FACTOR_FORMAT_PATCH_DEFAULT;
 	else if (!rdiff_prev)
 		die(_("--creation-factor requires --range-diff"));

diff --git a/range-diff.h b/range-diff.h
index 08a50b6e98f..634112396d3 100644
--- a/range-diff.h
+++ b/range-diff.h
@@ -4,6 +4,7 @@
 #include "diff.h"

 #define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60
+#define RANGE_DIFF_CREATION_FACTOR_FORMAT_PATCH_DEFAULT 90

 /*
  * Compare series of commmits in RANGE1 and RANGE2, and emit to the

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [WIP PATCH/RFC] Use a higher range-diff --creation-factor for format-patch
  2019-03-15 16:09 [WIP PATCH/RFC] Use a higher range-diff --creation-factor for format-patch Ævar Arnfjörð Bjarmason
@ 2019-03-15 19:00 ` Eric Sunshine
  2019-03-15 19:21   ` Ævar Arnfjörð Bjarmason
  2019-03-18  5:23   ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Sunshine @ 2019-03-15 19:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing list, Johannes Schindelin

On Fri, Mar 15, 2019 at 12:09 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
> @@ -261,6 +261,10 @@ material (this may change in the future).
> +Defaults to 90, whereas the linkgit:git-range-diff[1] default is
> +60. It's assumed that you're submitting a new patch series & that we
> +should try harder than normal to find similarities.

My understanding was that the primary use-case of git-range-diff
itself (which existed before the --range-diff option was added to
git-format-patch) was to generate a "range diff" for a cover letter of
a re-rolled series. So, I'm confused about why this tweaks the default
value of one command but not the other.

> diff --git a/range-diff.h b/range-diff.h
> @@ -4,6 +4,7 @@
>  #define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60
> +#define RANGE_DIFF_CREATION_FACTOR_FORMAT_PATCH_DEFAULT 90

The point of introducing RANGE_DIFF_CREATION_FACTOR_DEFAULT in the
first place was to ensure that the default creation-factor didn't get
out-of-sync between git-range-diff and git-format-patch., Thus,
introducing this sort of inconsistency between the two would likely
lead to confusion on the part of users. After all, --range-diff was
added to git-format-patch merely as a convenience over having to run
git-range-diff separately and copy/pasting its output into a cover
letter generated by git-format-patch. If the two programs default to
different values, then that "convenience equality" breaks down.

So, I'm fairly negative on this change. However, that doesn't mean I
would oppose tweaking the value shared between the two programs (and
also the default used by GitGitGadget, if it specifies it manually),
though I defer to Dscho in that regard.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [WIP PATCH/RFC] Use a higher range-diff --creation-factor for format-patch
  2019-03-15 19:00 ` Eric Sunshine
@ 2019-03-15 19:21   ` Ævar Arnfjörð Bjarmason
  2019-03-21 11:22     ` Johannes Schindelin
  2019-03-18  5:23   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-15 19:21 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git Mailing list, Johannes Schindelin


On Fri, Mar 15 2019, Eric Sunshine wrote:

> On Fri, Mar 15, 2019 at 12:09 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
>> @@ -261,6 +261,10 @@ material (this may change in the future).
>> +Defaults to 90, whereas the linkgit:git-range-diff[1] default is
>> +60. It's assumed that you're submitting a new patch series & that we
>> +should try harder than normal to find similarities.
>
> My understanding was that the primary use-case of git-range-diff
> itself (which existed before the --range-diff option was added to
> git-format-patch) was to generate a "range diff" for a cover letter of
> a re-rolled series. So, I'm confused about why this tweaks the default
> value of one command but not the other.
>
>> diff --git a/range-diff.h b/range-diff.h
>> @@ -4,6 +4,7 @@
>>  #define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60
>> +#define RANGE_DIFF_CREATION_FACTOR_FORMAT_PATCH_DEFAULT 90
>
> The point of introducing RANGE_DIFF_CREATION_FACTOR_DEFAULT in the
> first place was to ensure that the default creation-factor didn't get
> out-of-sync between git-range-diff and git-format-patch., Thus,
> introducing this sort of inconsistency between the two would likely
> lead to confusion on the part of users. After all, --range-diff was
> added to git-format-patch merely as a convenience over having to run
> git-range-diff separately and copy/pasting its output into a cover
> letter generated by git-format-patch. If the two programs default to
> different values, then that "convenience equality" breaks down.
>
> So, I'm fairly negative on this change. However, that doesn't mean I
> would oppose tweaking the value shared between the two programs (and
> also the default used by GitGitGadget, if it specifies it manually),
> though I defer to Dscho in that regard.

I think that was the intention initially, but at least I'm now using
range-diff as a general comparison tool of different non-ff-branches,
e.g. the force-pushes to "pu".

I'd expect the tool in general to be used like that, whereas with
format-patch it's safe to say we're dealing with a re-roll of the same
thing.

Hence the hypothesis that for format-patch we can be more aggressive
about finding similarities.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [WIP PATCH/RFC] Use a higher range-diff --creation-factor for format-patch
  2019-03-15 19:00 ` Eric Sunshine
  2019-03-15 19:21   ` Ævar Arnfjörð Bjarmason
@ 2019-03-18  5:23   ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2019-03-18  5:23 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing list,
	Johannes Schindelin

Eric Sunshine <sunshine@sunshineco.com> writes:

> So, I'm fairly negative on this change. However, that doesn't mean I
> would oppose tweaking the value shared between the two programs (and
> also the default used by GitGitGadget, if it specifies it manually),
> though I defer to Dscho in that regard.

I do not think of a good reason why the command should behave
differently only when run from inside format-patch, and if we were
changing anything, perhaps raising it a bit for all may make sense.
I have yet to see range-diff getting confused and matching wrong
patches but often seee it being too conservative to match two
iterations of the same patch after a moderate amount of update.

I find myself passing "--creation-factor 99" or some absurdly
looking value when accepting a rerolled series.  The most recent was
the updated 'clean-up merge message with scissors' from Denton
[*1*].


[Footnote]

*1* <6716d28a5187c44c1d90f5ce840c44441f62352c.1552275703.git.liu.denton@gmail.com>
and <08426189b5a29b376581eb0172e52222ab22387a.1552817044.git.liu.denton@gmail.com>
do not line up without a high creation factor.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [WIP PATCH/RFC] Use a higher range-diff --creation-factor for format-patch
  2019-03-15 19:21   ` Ævar Arnfjörð Bjarmason
@ 2019-03-21 11:22     ` Johannes Schindelin
  0 siblings, 0 replies; 5+ messages in thread
From: Johannes Schindelin @ 2019-03-21 11:22 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eric Sunshine, Git Mailing list

[-- Attachment #1: Type: text/plain, Size: 3579 bytes --]

Hi Ævar and Eric,

On Fri, 15 Mar 2019, Ævar Arnfjörð Bjarmason wrote:

>
> On Fri, Mar 15 2019, Eric Sunshine wrote:
>
> > On Fri, Mar 15, 2019 at 12:09 PM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >> diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
> >> @@ -261,6 +261,10 @@ material (this may change in the future).
> >> +Defaults to 90, whereas the linkgit:git-range-diff[1] default is
> >> +60. It's assumed that you're submitting a new patch series & that we
> >> +should try harder than normal to find similarities.
> >
> > My understanding was that the primary use-case of git-range-diff
> > itself (which existed before the --range-diff option was added to
> > git-format-patch) was to generate a "range diff" for a cover letter of
> > a re-rolled series. So, I'm confused about why this tweaks the default
> > value of one command but not the other.
> >
> >> diff --git a/range-diff.h b/range-diff.h
> >> @@ -4,6 +4,7 @@
> >>  #define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60
> >> +#define RANGE_DIFF_CREATION_FACTOR_FORMAT_PATCH_DEFAULT 90
> >
> > The point of introducing RANGE_DIFF_CREATION_FACTOR_DEFAULT in the
> > first place was to ensure that the default creation-factor didn't get
> > out-of-sync between git-range-diff and git-format-patch., Thus,
> > introducing this sort of inconsistency between the two would likely
> > lead to confusion on the part of users. After all, --range-diff was
> > added to git-format-patch merely as a convenience over having to run
> > git-range-diff separately and copy/pasting its output into a cover
> > letter generated by git-format-patch. If the two programs default to
> > different values, then that "convenience equality" breaks down.
> >
> > So, I'm fairly negative on this change. However, that doesn't mean I
> > would oppose tweaking the value shared between the two programs (and
> > also the default used by GitGitGadget, if it specifies it manually),
> > though I defer to Dscho in that regard.

GitGitGadget does not specify the creation factor manually.

> I think that was the intention initially, but at least I'm now using
> range-diff as a general comparison tool of different non-ff-branches,
> e.g. the force-pushes to "pu".

Interesting ;-)

> I'd expect the tool in general to be used like that, whereas with
> format-patch it's safe to say we're dealing with a re-roll of the same
> thing.
>
> Hence the hypothesis that for format-patch we can be more aggressive
> about finding similarities.

I do agree that `format-patch`'s `--range-diff` is certainly exclusively
used for comparing different iterations of the same patch series. As such,
I do agree with Ævar that it makes sense to have a *different* default for
the creation factor.

Having said that, I did notice while porting `tbdiff` to C that it would
be a neat idea to put more weight behind the differences of the commit
message, and maybe even use a *different* measure on the commit message,
too. Personally, I would try to use a variation of the word diff
(maybe something that reflects my experience that it is common to change
the case in the oneline, to add a sentence here and there, or to delete a
sentence, but not so much to replace entire sentences), to account for the
fact that the commit message rarely changes substantially between
iterations.

So I guess there is a lot of room for improvement here: code (read: diff)
changes simply are not the same as commit message changes.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-03-21 11:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-15 16:09 [WIP PATCH/RFC] Use a higher range-diff --creation-factor for format-patch Ævar Arnfjörð Bjarmason
2019-03-15 19:00 ` Eric Sunshine
2019-03-15 19:21   ` Ævar Arnfjörð Bjarmason
2019-03-21 11:22     ` Johannes Schindelin
2019-03-18  5:23   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).