git@vger.kernel.org mailing list mirror (one of many)
* Git has two ways to count modified lines
@ 2022-03-16 18:08 Laurent Lyaudet
  2022-04-02 16:49 ` Laurent Lyaudet
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Lyaudet @ 2022-03-16 18:08 UTC
  To: git

Hello,

I check the number of lines modified by my commits by hand,
and today I got a surprising count:

1) I committed and got the following count:
laurent@laurent-GL73-8SD:~/ReposGit/flow$ git commit
[master c068911] Task CU-21ph8h7 add buttons in PresenceList: "Tout
déplier", "Tout replier", for this the useState for showDetail
(boolean) in Presence is replaced by a useState for showDetails (Set
of ids) in PresenceList; correction missing tr between thead and ths
in PresenceList.
 2 files changed, 88 insertions(+), 48 deletions(-)
 rewrite src/apps/logs/components/PresenceList.js (61%)

2) I checked the diff by hand, and it doesn't match.
I checked on GitHub, and it agrees with my manual count.

3) I tried to get the same answer again on the command line.
laurent@laurent-GL73-8SD:~/ReposGit/flow$ git log -1 --shortstat
commit c068911547bddbf7bfc4ddc7a68ee8482421ed5c (HEAD -> master, origin/master, origin/HEAD)
Author: Laurent Lyaudet <laurent.lyaudet@gmail.com>
Date:   Wed Mar 16 18:40:25 2022 +0100

    Task CU-21ph8h7 add buttons in PresenceList: "Tout déplier", "Tout replier",
    for this the useState for showDetail (boolean) in Presence is replaced
    by a useState for showDetails (Set of ids) in PresenceList;
    correction missing tr between thead and ths in PresenceList.

 2 files changed, 71 insertions(+), 31 deletions(-)

How come Git has two ways to count modified lines?
How can I make Git output the same numbers again as just after the commit?
How can I check how it computed these numbers in the first place?

Thanks, best regards,
    Laurent Lyaudet

* Re: Git has two ways to count modified lines
  2022-03-16 18:08 Git has two ways to count modified lines Laurent Lyaudet
@ 2022-04-02 16:49 ` Laurent Lyaudet
  2022-04-02 21:55   ` René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Lyaudet @ 2022-04-02 16:49 UTC
  To: git

Hello,

I thought my email was sent to the right mailbox; at least that is what
this website says:
https://git-scm.com/community
> General questions or comments for the Git community can be sent to the mailing list by using the email address git@vger.kernel.org.
Moreover, this website is cited in the README here:
https://github.com/git/git/blob/master/README.md
> Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
Is there a problem with my questions that would explain why I got no answer?

I have found a partial explanation for the difference in counts:
>  2 files changed, 88 insertions(+), 48 deletions(-)
>  rewrite src/apps/logs/components/PresenceList.js (61%)
When committing, a file considered a "rewrite" is counted as m
lines deleted and n lines added if it had m lines before and n lines
after, even if the diff is much smaller.
Hence I have answered my own question:
> How can I check how it computed these numbers in the first place?
But the other two questions remain:
> How come Git has two ways to count modified lines?
i.e. what is (or was) the purpose of this rewrite counting when it was coded?
> How can I make Git output the same numbers again as just after the commit?

Thanks, best regards,
    Laurent Lyaudet

* Re: Git has two ways to count modified lines
  2022-04-02 16:49 ` Laurent Lyaudet
@ 2022-04-02 21:55   ` René Scharfe
       [not found]     ` <xmqqh779u72a.fsf@gitster.g>
  2022-04-05 15:57     ` Laurent Lyaudet
  0 siblings, 2 replies; 7+ messages in thread
From: René Scharfe @ 2022-04-02 21:55 UTC
  To: Laurent Lyaudet, git; +Cc: Junio C Hamano, Eckhard S. Maaß, Elijah Newren

On 02.04.22 at 18:49, Laurent Lyaudet wrote:
> On Wed, 16 Mar 2022 at 19:08, Laurent Lyaudet
> <laurent.lyaudet@gmail.com> wrote:
>>
> I thought my email was sent to the right mailbox; at least that is what
> this website says:
> https://git-scm.com/community

Sure it was.

>> General questions or comments for the Git community can be sent to
>> the mailing list by using the email address git@vger.kernel.org.
> Moreover, this website is cited in the README here:
> https://github.com/git/git/blob/master/README.md
>> Many Git online resources are accessible from https://git-scm.com/
>> including full documentation and Git related tools.
> Is there a problem with my questions that would explain why I got no
> answer?

Not really, don't worry.

> I have found a partial explanation for the difference in counts:
>>  2 files changed, 88 insertions(+), 48 deletions(-)
>>  rewrite src/apps/logs/components/PresenceList.js (61%)
> When committing, a file considered a "rewrite" is counted as m
> lines deleted and n lines added if it had m lines before and n lines
> after, even if the diff is much smaller.

Git stores the file contents before and after your change.  It doesn't
store any diff, but calculates them as needed, e.g. for the commit
confirmation message, or when you run git diff.  So in that sense there
is no "the diff".  The difference between two stored states can be
represented in many ways.
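
To make the point about snapshots concrete, here is a minimal sketch that builds a throwaway repository and prints the raw commit object.  The repository, the file name "file" and the demo identity are assumptions made only for this illustration.

```sh
#!/bin/sh
# Sketch: a commit object names a tree (a snapshot) and a parent, not a diff.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
# Throwaway identity so the commits work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

seq 101 200 >file
git add file
git commit -q -m initial
seq 200 300 >file
git add file
git commit -q -m second

# The raw commit object: tree, parent, author, committer and message.
commit_object=$(git cat-file -p HEAD)
printf '%s\n' "$commit_object"

# Both versions of the file are stored in full as blobs.
old_lines=$(git cat-file -p HEAD~1:file | wc -l)
new_lines=$(git cat-file -p HEAD:file | wc -l)
echo "old blob: $old_lines lines, new blob: $new_lines lines"
```

Any diff or diffstat shown for this commit is computed from those two blobs at the time the command runs.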

> Hence I have answered my own question:
>> How can I check how it computed these numbers in the first place?

The option --break-rewrites controls rewrite detection.  Check out its
description in the documentation of git diff to see how to use it.

> But the other two questions remain:
>> How come Git has two ways to count modified lines?
> i.e. what is (or was) the purpose of this rewrite counting when it was coded?

Rewrite detection is meant to improve the diff of a file whose content
was replaced with something very different.  Instead of lots of hunks
containing lines that add and remove unrelated stuff, separated by empty
lines etc. that the diff algorithm matches between the sides even though
they are also unrelated, a rewrite diff removes all the old lines en
bloc and then adds all the new ones, which is easier to read in that
case.
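
A rough way to see the two representations side by side is to compare the patch headers with and without the option.  This is only a sketch on an assumed throwaway repository; the file contents are chosen so that almost nothing is shared between the two versions.

```sh
#!/bin/sh
# Sketch: the same change shown as a normal patch and as a rewrite patch.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
# Throwaway identity so the commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

seq 101 200 >file
git add file
git commit -q -m initial
seq 200 300 >file   # replace nearly all of the content

# Keep everything up to and including the first hunk header.
plain_header=$(git diff | sed -n '1,/^@@/p')
rewrite_header=$(git diff --break-rewrites | sed -n '1,/^@@/p')

printf 'plain:\n%s\n\n' "$plain_header"
printf 'with --break-rewrites:\n%s\n' "$rewrite_header"
```

When the change is detected as a rewrite, the second header carries a "dissimilarity index" line and the hunk that follows removes the old content in one block before adding the new content.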

>> How can I make Git output the same numbers again as just after the commit?

git show --stat --break-rewrites
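
For reference, a small sketch of that command next to the default one, run against an assumed throwaway commit (the repository and the file name "file" exist only for this illustration):

```sh
#!/bin/sh
# Sketch: replay the totals for an existing commit with and without -B.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
# Throwaway identity so the commits work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

seq 101 200 >file
git add file
git commit -q -m initial
seq 200 300 >file
git add file
git commit -q -m second

# --format= suppresses the commit header so only the totals remain.
default_totals=$(git show --shortstat --format= HEAD)
rewrite_totals=$(git show --shortstat --format= --break-rewrites HEAD)

echo "default:          $default_totals"
echo "--break-rewrites: $rewrite_totals"
```

With --break-rewrites a file detected as a rewrite is counted as all old lines deleted plus all new lines added, so the totals are larger than the default ones.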

But I have a question for the list as well: Why is break_opt (the
diff_options member for --break-rewrites) enabled for git commit by
default?  I ask because the last commit that mentioned it, dc6b1d92ca
(wt-status: use settings from git_diff_ui_config, 2018-05-04), claimed
it would turn it off, if I read it correctly.

René

* Re: Git has two ways to count modified lines
       [not found]     ` <xmqqh779u72a.fsf@gitster.g>
@ 2022-04-04 21:08       ` René Scharfe
  2022-04-05  1:58         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2022-04-04 21:08 UTC
  To: Junio C Hamano; +Cc: Laurent Lyaudet, git, Eckhard S. Maaß, Elijah Newren

On 04.04.22 at 00:37, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
>
>> Git stores the file contents before and after your change.  It doesn't
>> store any diff, but calculates them as needed, e.g. for the commit
>> confirmation message, or when you run git diff.  So in that sense there
>> is no "the diff".  The difference between two stored states can be
>> represented in many ways.
>
> That's the crucial point to answer this question.  Even without
> break/rewrite transformation, depending on the choice of the diff
> algorithm, the same change may be shown as different patch (with
> different line count, obviously).
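
The effect of the diff algorithm can be tried out with a sketch like the one below.  The six-line file is an assumption made up for this illustration: it has a single unique line X that moves from the top to the bottom, which should give an algorithm that anchors on unique lines a reason to pick a different alignment than Myers.  The script prints both results so they can be compared; I have not recorded the expected numbers here.

```sh
#!/bin/sh
# Sketch: the same change counted under two diff algorithms.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
# Throwaway identity so the commit works without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Old version: unique line X first, then five identical lines.
printf '%s\n' X a a a a a >file
git add file
git commit -q -m initial
# New version: the unique line X moves to the end.
printf '%s\n' a a a a a X >file

# --numstat prints "<added> <deleted> <path>" for each file.
myers=$(git diff --numstat --diff-algorithm=myers)
patience=$(git diff --numstat --diff-algorithm=patience)

echo "myers:    $myers"
echo "patience: $patience"
```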
>
>> But I have a question to the list as well: Why is break_opt (the
>> diff_options member for --break-rewrites) enabled for git commit by
>> default?
>
> If it is, that is a bug, I would say.  Don't we initialize the
> member to "-1" at around diff.c:4580 though?

Yes, but it's set to 0 in print_commit_summary() at sequencer.c:1330.
That line was introduced by 3eb2a15eb3 (builtin-commit: make summary
output consistent with status, 2007-12-16).
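
To find out which behaviour a given Git build has, one can compare the summary printed by git commit with the totals replayed by git show.  This is a sketch on an assumed throwaway repository; the variable names are made up for this illustration.

```sh
#!/bin/sh
# Sketch: does the commit summary match the plain or the rewrite diffstat?
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q .
# Throwaway identity so the commits work without any user configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

seq 101 200 >file
git add file
git commit -q -m initial
seq 200 300 >file
git add file

# The totals line printed right after committing.
summary=$(git commit -m second | grep 'file changed')
# The totals replayed afterwards, without and with rewrite detection.
plain=$(git show --shortstat --format= HEAD | grep 'file changed')
rewrite=$(git show --shortstat --format= --break-rewrites HEAD | grep 'file changed')

if [ "$summary" = "$plain" ]; then
	verdict='commit summary matches the plain diffstat'
elif [ "$summary" = "$rewrite" ]; then
	verdict='commit summary matches the --break-rewrites diffstat'
else
	verdict='commit summary matches neither'
fi
echo "$verdict"
```

A build that still sets break_opt in print_commit_summary() should report the second case; with the patch below applied it should report the first.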

--- >8 ----
Subject: [PATCH] commit, sequencer: turn off break_opt for commit summary

dc6b1d92ca (wt-status: use settings from git_diff_ui_config, 2018-05-04)
disabled diffopt.break_opt for diffstats shown by git status and in
commit templates.  For git status there isn't even a way to enable it.
Make the commit summary (shown after the commit) consistent by disabling
it there as well.

Reported-by: Laurent Lyaudet <laurent.lyaudet@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 sequencer.c               |  1 -
 t/t7524-commit-summary.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100755 t/t7524-commit-summary.sh

diff --git a/sequencer.c b/sequencer.c
index a1bb39383d..85a17d45bd 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1327,7 +1327,6 @@ void print_commit_summary(struct repository *r,
 	get_commit_format(format.buf, &rev);
 	rev.always_show_header = 0;
 	rev.diffopt.detect_rename = DIFF_DETECT_RENAME;
-	rev.diffopt.break_opt = 0;
 	diff_setup_done(&rev.diffopt);

 	refs = get_main_ref_store(the_repository);
diff --git a/t/t7524-commit-summary.sh b/t/t7524-commit-summary.sh
new file mode 100755
index 0000000000..47b2f1dc22
--- /dev/null
+++ b/t/t7524-commit-summary.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description='git commit summary'
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_seq 101 200 >file &&
+	git add file &&
+	git commit -m initial &&
+	git tag initial
+'
+
+test_expect_success 'commit summary ignores rewrites' '
+	git reset --hard initial &&
+	test_seq 200 300 >file &&
+
+	git diff --stat >diffstat &&
+	git diff --stat --break-rewrites >diffstatrewrite &&
+
+	# make sure this scenario is a detectable rewrite
+	! test_cmp_bin diffstat diffstatrewrite &&
+
+	git add file &&
+	git commit -m second >actual &&
+
+	grep "1 file" <actual >actual.total &&
+	grep "1 file" <diffstat >diffstat.total &&
+	test_cmp diffstat.total actual.total
+'
+
+test_done
--
2.35.1


* Re: Git has two ways to count modified lines
  2022-04-04 21:08       ` René Scharfe
@ 2022-04-05  1:58         ` Ævar Arnfjörð Bjarmason
  2022-04-05 15:57           ` René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-04-05  1:58 UTC
  To: René Scharfe
  Cc: Junio C Hamano, Laurent Lyaudet, git, Eckhard S. Maaß,
	Elijah Newren


On Mon, Apr 04 2022, René Scharfe wrote:

> diff --git a/sequencer.c b/sequencer.c
> index a1bb39383d..85a17d45bd 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -1327,7 +1327,6 @@ void print_commit_summary(struct repository *r,
>  	get_commit_format(format.buf, &rev);
>  	rev.always_show_header = 0;
>  	rev.diffopt.detect_rename = DIFF_DETECT_RENAME;
> -	rev.diffopt.break_opt = 0;
>  	diff_setup_done(&rev.diffopt);
>
>  	refs = get_main_ref_store(the_repository);
> diff --git a/t/t7524-commit-summary.sh b/t/t7524-commit-summary.sh
> new file mode 100755
> index 0000000000..47b2f1dc22
> --- /dev/null
> +++ b/t/t7524-commit-summary.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +
> +test_description='git commit summary'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +	test_seq 101 200 >file &&
> +	git add file &&
> +	git commit -m initial &&
> +	git tag initial
> +'
> +
> +test_expect_success 'commit summary ignores rewrites' '
> +	git reset --hard initial &&

A leftover debugging aid? You can also use test_commit earlier:
	
	diff --git a/t/t7524-commit-summary.sh b/t/t7524-commit-summary.sh
	index 47b2f1dc22a..60027e86ccd 100755
	--- a/t/t7524-commit-summary.sh
	+++ b/t/t7524-commit-summary.sh
	@@ -4,14 +4,10 @@ test_description='git commit summary'
	 . ./test-lib.sh
	 
	 test_expect_success 'setup' '
	-	test_seq 101 200 >file &&
	-	git add file &&
	-	git commit -m initial &&
	-	git tag initial
	+	test_commit initial file "$(test_seq 101 200)"
	 '
	 
	 test_expect_success 'commit summary ignores rewrites' '
	-	git reset --hard initial &&
	 	test_seq 200 300 >file &&
	 
	 	git diff --stat >diffstat &&


> +	test_seq 200 300 >file &&
> +
> +	git diff --stat >diffstat &&
> +	git diff --stat --break-rewrites >diffstatrewrite &&
> +
> +	# make sure this scenario is a detectable rewrite
> +	! test_cmp_bin diffstat diffstatrewrite &&

Is this really binary? I removed the ! and tried test_cmp, and it's just
a diffstat.

Elsewhere in the test suite we test_cmp this output, would be
clearer/easier to read to do the same here if possible.

> +
> +	git add file &&
> +	git commit -m second >actual &&
> +
> +	grep "1 file" <actual >actual.total &&
> +	grep "1 file" <diffstat >diffstat.total &&
> +	test_cmp diffstat.total actual.total
> +'
> +
> +test_done


* Re: Git has two ways to count modified lines
  2022-04-02 21:55   ` René Scharfe
       [not found]     ` <xmqqh779u72a.fsf@gitster.g>
@ 2022-04-05 15:57     ` Laurent Lyaudet
  1 sibling, 0 replies; 7+ messages in thread
From: Laurent Lyaudet @ 2022-04-05 15:57 UTC
  To: René Scharfe
  Cc: git, Junio C Hamano, Eckhard S. Maaß, Elijah Newren

On Sat, 2 Apr 2022 at 23:55, René Scharfe <l.s.r@web.de> wrote:
>
> On 02.04.22 at 18:49, Laurent Lyaudet wrote:
> > On Wed, 16 Mar 2022 at 19:08, Laurent Lyaudet
> The option --break-rewrites controls rewrite detection.  Check out its
> description in the documentation of git diff to see how to use it.
Hello everyone,
Thanks René, I checked and now I understand how it works :)
> > But the other two questions remain:
> >> How come Git has two ways to count modified lines?
> > i.e. what is (or was) the purpose of this rewrite counting when it was coded?
>
> Rewrite detection is meant to improve the diff of a file whose content
> was replaced with something very different.  Instead of lots of hunks
> containing lines that add and remove unrelated stuff, separated by empty
> lines etc. that the diff algorithm matches between the sides even though
> they are also unrelated, a rewrite diff removes all the old lines en
> bloc and then adds all the new ones, which is easier to read in that
> case.
It makes sense to produce diffs that are simpler to understand.
In my experience, humans are better than computers at knowing when
to "break-rewrites" (or at seeing that a line was moved, etc.).
I have dreamed about hinting the diff tools for many years,
but I'm more used to simply accepting what the computer produces as its
diff(s) and having my own analysis that does not match.
Maybe for Git 10.0, someone will add metadata in commits for
human-optimized diffs :)
For applying patches it is useless, but for understanding code changes it helps.
At least, it is a good mental exercise.
Just in case someone cares, in my notebook I note the following counts:
- non empty and not space only created lines
- empty or space only created lines
- modified lines (non space created or deleted character)
- space modified lines (but I should split this category between space
presentation and space-related bug fixes; yes, you can fix a bug by
changing only spaces, most of the time in strings)
- moved lines :
   - as is
   - with space modification
   - with non-space modification
- non empty and not space only deleted lines
- empty or space only deleted lines
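
A few of the categories above (created and deleted lines, split into blank and non-blank) can be approximated mechanically from a unified diff.  The sketch below runs on a small made-up diff and is only an assumption about how such a count could be automated; modified and moved lines would need real analysis, and a removed line whose content itself starts with "-- " would be mistaken for a file header.

```sh
#!/bin/sh
# Sketch: count blank and non-blank added/deleted lines in a unified diff.
set -eu

# A made-up diff used only for this illustration.
diff_text=$(printf '%s\n' \
	'--- a/file' \
	'+++ b/file' \
	'@@ -1,3 +1,5 @@' \
	' keep' \
	'-old line' \
	'+new line' \
	'+' \
	'+   ' \
	' tail')

# Output order: added non-blank, added blank, deleted non-blank, deleted blank.
counts=$(printf '%s\n' "$diff_text" | awk '
	/^(\+\+\+|---) /   { next }               # skip the file headers
	/^\+[[:space:]]*$/ { add_blank++; next }  # added, empty or spaces only
	/^\+/              { add_text++; next }   # added, has visible content
	/^-[[:space:]]*$/  { del_blank++; next }  # deleted, empty or spaces only
	/^-/               { del_text++; next }   # deleted, has visible content
	END { printf "%d %d %d %d\n", add_text, add_blank, del_text, del_blank }
')
echo "$counts"
```

To use it on a real commit, the made-up text could be replaced by the output of git show for that commit.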

> >> How can I make Git output the same numbers again as just after the commit?
>
> git show --stat --break-rewrites
Thanks :)

Best regards,
    Laurent Lyaudet

* Re: Git has two ways to count modified lines
  2022-04-05  1:58         ` Ævar Arnfjörð Bjarmason
@ 2022-04-05 15:57           ` René Scharfe
  0 siblings, 0 replies; 7+ messages in thread
From: René Scharfe @ 2022-04-05 15:57 UTC
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Laurent Lyaudet, git, Eckhard S. Maaß,
	Elijah Newren

On 05.04.22 at 03:58, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Apr 04 2022, René Scharfe wrote:
>
>> diff --git a/sequencer.c b/sequencer.c
>> index a1bb39383d..85a17d45bd 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -1327,7 +1327,6 @@ void print_commit_summary(struct repository *r,
>>  	get_commit_format(format.buf, &rev);
>>  	rev.always_show_header = 0;
>>  	rev.diffopt.detect_rename = DIFF_DETECT_RENAME;
>> -	rev.diffopt.break_opt = 0;
>>  	diff_setup_done(&rev.diffopt);
>>
>>  	refs = get_main_ref_store(the_repository);
>> diff --git a/t/t7524-commit-summary.sh b/t/t7524-commit-summary.sh
>> new file mode 100755
>> index 0000000000..47b2f1dc22
>> --- /dev/null
>> +++ b/t/t7524-commit-summary.sh
>> @@ -0,0 +1,31 @@
>> +#!/bin/sh
>> +
>> +test_description='git commit summary'
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'setup' '
>> +	test_seq 101 200 >file &&
>> +	git add file &&
>> +	git commit -m initial &&
>> +	git tag initial
>> +'
>> +
>> +test_expect_success 'commit summary ignores rewrites' '
>> +	git reset --hard initial &&
>
> A leftover debugging aid?

No, I expect all tests in that file will need to reset the state and
didn't want to make an exception just for the first one.  It might be a
case of YAGNI, but I put the reset in intentionally.

> You can also use test_commit earlier:
>
> 	diff --git a/t/t7524-commit-summary.sh b/t/t7524-commit-summary.sh
> 	index 47b2f1dc22a..60027e86ccd 100755
> 	--- a/t/t7524-commit-summary.sh
> 	+++ b/t/t7524-commit-summary.sh
> 	@@ -4,14 +4,10 @@ test_description='git commit summary'
> 	 . ./test-lib.sh
>
> 	 test_expect_success 'setup' '
> 	-	test_seq 101 200 >file &&
> 	-	git add file &&
> 	-	git commit -m initial &&
> 	-	git tag initial
> 	+	test_commit initial file "$(test_seq 101 200)"

Nice.  It would ignore test_seq errors, though, because the exit status
of a command substitution used as an argument is discarded.  Probably not
worth worrying too much about.
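
The concern about ignored errors comes from how the shell treats command substitution.  Here is a minimal sketch in plain sh; failing_seq and consume are hypothetical stand-ins, not functions from the test library.

```sh
#!/bin/sh
# Sketch: a failing command substitution is invisible when used as an argument.
set -eu

# Hypothetical helper that prints something and then fails.
failing_seq () {
	echo 1
	return 1
}

# Hypothetical command that receives the output as an argument.
consume () {
	printf 'got: %s\n' "$1"
}

# Used as an argument: only the exit status of consume is seen.
if consume "$(failing_seq)"; then
	as_argument=success
else
	as_argument=failure
fi

# Used in a plain assignment: the status of the substitution is seen.
if content=$(failing_seq); then
	as_assignment=success
else
	as_assignment=failure
fi

echo "as argument: $as_argument, as assignment: $as_assignment"
```

Writing the content to a file first, as the original setup test does, keeps the failure inside the && chain.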

> 	 '
>
> 	 test_expect_success 'commit summary ignores rewrites' '
> 	-	git reset --hard initial &&
> 	 	test_seq 200 300 >file &&
>
> 	 	git diff --stat >diffstat &&
>
>
>> +	test_seq 200 300 >file &&
>> +
>> +	git diff --stat >diffstat &&
>> +	git diff --stat --break-rewrites >diffstatrewrite &&
>> +
>> +	# make sure this scenario is a detectable rewrite
>> +	! test_cmp_bin diffstat diffstatrewrite &&
>
> Is this really binary? I removed the ! and tried test_cmp, and it's just
> a diffstat.
>
> Elsewhere in the test suite we test_cmp this output, would be
> clearer/easier to read to do the same here if possible.

The required result is one bit (same content or not?).  That sanity
check should not waste cycles calculating and printing a diff of the
diffstats.  I only want to make sure they are different.
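
Outside the test suite the same distinction looks roughly like this.  As far as I know test_cmp_bin is a thin wrapper around cmp and test_cmp runs diff -u by default, so the sketch uses those two tools directly; the file contents are made up for this illustration.

```sh
#!/bin/sh
# Sketch: a yes/no comparison with cmp versus a printed diff with diff -u.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Two made-up diffstat files that differ.
printf ' file | 199 ++++----\n' >diffstat
printf ' file | 201 ++++----\n' >diffstatrewrite

# cmp -s prints nothing; its exit status is the one-bit answer.
if cmp -s diffstat diffstatrewrite; then
	same=yes
else
	same=no
fi

# diff -u gives the same answer but also computes and prints the differences.
if diff -u diffstat diffstatrewrite >diff.out; then
	diff_same=yes
else
	diff_same=no
fi

echo "cmp says same=$same, diff says same=$diff_same"
```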

>
>> +
>> +	git add file &&
>> +	git commit -m second >actual &&
>> +
>> +	grep "1 file" <actual >actual.total &&
>> +	grep "1 file" <diffstat >diffstat.total &&
>> +	test_cmp diffstat.total actual.total
>> +'
>> +
>> +test_done
>
