git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [git 2.26] stat counts reported by commit and log are different
@ 2020-04-09  2:04 Norbert Kiesel
  2020-04-09 13:59 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Norbert Kiesel @ 2020-04-09  2:04 UTC (permalink / raw)
  To: Git Mailing List

Hi,

I just noticed that the numbers reported are different:

% git commit -m 'major code cleanup for SCIMPatchOperationUtil'
[nkiesel/nextrelease/SCIM_2 d4db6f6d83f] major code cleanup for
SCIMPatchOperationUtil
 1 file changed, 2106 insertions(+), 3367 deletions(-)
 rewrite utils/SCIMPatchOperationUtil.java (74%)
% git log --stat -1
commit d4db6f6d83f (HEAD -> nkiesel/nextrelease/SCIM_2)
Author: Norbert Kiesel <nkiesel@metricstream.com>
Date:   Wed Apr 8 18:49:27 2020 -0700

    major code cleanup for SCIMPatchOperationUtil

 utils/SCIMPatchOperationUtil.java | 3807
+++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------------------------------------------------------
 1 file changed, 1273 insertions(+), 2534 deletions(-)
% wc -l utils/SCIMPatchOperationUtil.java
2106 utils/SCIMPatchOperationUtil.java

As you can see, the number of lines in the current file match the
number of insertions reported by commit.  I suspect some "automated
line ending conversion magic" is at fault here (the file has "\r\n"
for my Windows-impaired co-developers but I modified it on a Linux
system). `git config --get core.autocrlf` on my machine returns
`input`.  However, I think it would still be better to always report
the same numbers, no?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [git 2.26] stat counts reported by commit and log are different
  2020-04-09  2:04 [git 2.26] stat counts reported by commit and log are different Norbert Kiesel
@ 2020-04-09 13:59 ` Jeff King
  2020-04-09 21:58   ` Norbert Kiesel
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2020-04-09 13:59 UTC (permalink / raw)
  To: Norbert Kiesel; +Cc: Git Mailing List

On Wed, Apr 08, 2020 at 07:04:02PM -0700, Norbert Kiesel wrote:

> % git commit -m 'major code cleanup for SCIMPatchOperationUtil'
> [nkiesel/nextrelease/SCIM_2 d4db6f6d83f] major code cleanup for
> SCIMPatchOperationUtil
>  1 file changed, 2106 insertions(+), 3367 deletions(-)
>  rewrite utils/SCIMPatchOperationUtil.java (74%)

The commit summary diff is done with the break-detection option on. You
can tell it kicked in here because of the "rewrite" line. What that
means is that the changes to the file were so extensive that Git decided
the file had been totally rewritten, and broke it into a separate
add/delete. That has two implications:

  - we'd consider it a candidate for rename detection; that didn't
    happen here because there were no other files added or deleted to
    serve as source/dest candidates

  - the diff will be reported as a complete removal of the old content
    and the addition of the new, _even for lines that were the same_

> % git log --stat -1
> commit d4db6f6d83f (HEAD -> nkiesel/nextrelease/SCIM_2)
> Author: Norbert Kiesel <nkiesel@metricstream.com>
> Date:   Wed Apr 8 18:49:27 2020 -0700
> 
>     major code cleanup for SCIMPatchOperationUtil
> 
>  utils/SCIMPatchOperationUtil.java | 3807
> +++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------------------------------------------------------
>  1 file changed, 1273 insertions(+), 2534 deletions(-)

But in git-log break-detection isn't on by default. Try adding "-B" and
you will get the other numbers.

Try also adding "-p" to see the whole-diff in action.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [git 2.26] stat counts reported by commit and log are different
  2020-04-09 13:59 ` Jeff King
@ 2020-04-09 21:58   ` Norbert Kiesel
  2020-04-09 22:47     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Norbert Kiesel @ 2020-04-09 21:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

Thanks for the explanation. I still wonder why break-detection is by
default enabled for commit but disabled for log.  Is there any
rationale for this?

On Thu, Apr 9, 2020 at 7:00 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Apr 08, 2020 at 07:04:02PM -0700, Norbert Kiesel wrote:
>
> > % git commit -m 'major code cleanup for SCIMPatchOperationUtil'
> > [nkiesel/nextrelease/SCIM_2 d4db6f6d83f] major code cleanup for
> > SCIMPatchOperationUtil
> >  1 file changed, 2106 insertions(+), 3367 deletions(-)
> >  rewrite utils/SCIMPatchOperationUtil.java (74%)
>
> The commit summary diff is done with the break-detection option on. You
> can tell it kicked in here because of the "rewrite" line. What that
> means is that the changes to the file were so extensive that Git decided
> the file had been totally rewritten, and broke it into a separate
> add/delete. That has two implications:
>
>   - we'd consider it a candidate for rename detection; that didn't
>     happen here because there were no other files added or deleted to
>     serve as source/dest candidates
>
>   - the diff will be reported as a complete removal of the old content
>     and the addition of the new, _even for lines that were the same_
>
> > % git log --stat -1
> > commit d4db6f6d83f (HEAD -> nkiesel/nextrelease/SCIM_2)
> > Author: Norbert Kiesel <nkiesel@metricstream.com>
> > Date:   Wed Apr 8 18:49:27 2020 -0700
> >
> >     major code cleanup for SCIMPatchOperationUtil
> >
> >  utils/SCIMPatchOperationUtil.java | 3807
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------------------------------------------------------
> >  1 file changed, 1273 insertions(+), 2534 deletions(-)
>
> But in git-log break-detection isn't on by default. Try adding "-B" and
> you will get the other numbers.
>
> Try also adding "-p" to see the whole-diff in action.
>
> -Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [git 2.26] stat counts reported by commit and log are different
  2020-04-09 21:58   ` Norbert Kiesel
@ 2020-04-09 22:47     ` Jeff King
  2020-04-09 22:55       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2020-04-09 22:47 UTC (permalink / raw)
  To: Norbert Kiesel; +Cc: Git Mailing List

On Thu, Apr 09, 2020 at 02:58:25PM -0700, Norbert Kiesel wrote:

> Thanks for the explanation. I still wonder why break-detection is by
> default enabled for commit but disabled for log.  Is there any
> rationale for this?

It's enabled for commit, because it was enable for git-status, via
3eb2a15eb3 (builtin-commit: make summary output consistent with status,
2007-12-16). It was enabled for status because it makes rename detection
more accurate; see the discussion at the end of this thread:

  https://lore.kernel.org/git/20071121171235.GA32233@sigill.intra.peff.net/

That tells us why it _is_ enable for commit, but not why it's not for
log. Traditionally rename detection was _not_ enabled by default there.
Because it can be expensive, we were quicker to enable it for
single-diff commands like commit (as opposed to log, which is doing a
bunch of diffs).

Much later, in 5404c116aa (diff: activate diff.renames by default,
2016-02-25), we turned on renames by default for git-log. But as far as
I recall, nobody gave much thought to turning on break detection at the
same time.

It might make sense to do so (and/or to make it possible to enable it by
config like we did for years with diff.renames). But it definitely is
way more expensive. For a tree-only diff, we don't usually have to look
at most blobs at all. Even with rename detection, we only have to look
at inexact rename candidates. But break detection must look at every
single modified file (timings are on git.git):

  [no renames at all, pretty fast]
  $ time git log --no-renames --raw >/dev/null
  real	0m2.231s
  user	0m2.203s
  sys	0m0.028s

  [adding in renames isn't very expensive]
  $ time git log -M --raw >/dev/null
  real	0m2.284s
  user	0m2.224s
  sys	0m0.060s

  [but break detection is]
  $ git log -B -M --raw >/dev/null
  real	0m33.377s
  user	0m32.942s
  sys	0m0.424s

A more fair comparison would actually generate content-level diffs,
which need to look in the blobs, like:

  $ time git log -M -p >/dev/null
  real	0m23.287s
  user	0m22.854s
  sys	0m0.432s

  $ time git log -B -M -p >/dev/null
  real	0m49.763s
  user	0m49.282s
  sys	0m0.480s

So not quite as bad percentage-wise, but still pretty expensive. And for
not a huge benefit. There are ~261 impacted commits. You can see a
recent example with:

  git show -B -M --stat --summary ce6521e44

where we find that most of builtin/fmt-merge-msg.c was moved to
fmt-merge-msg.c. It's nice, but it's expensive enough that it probably
shouldn't be the default.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [git 2.26] stat counts reported by commit and log are different
  2020-04-09 22:47     ` Jeff King
@ 2020-04-09 22:55       ` Junio C Hamano
  2020-04-09 23:12         ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2020-04-09 22:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Norbert Kiesel, Git Mailing List

Jeff King <peff@peff.net> writes:

> It might make sense to do so (and/or to make it possible to enable it by
> config like we did for years with diff.renames). But it definitely is
> way more expensive.
> ...
> So not quite as bad percentage-wise, but still pretty expensive. And for
> not a huge benefit. There are ~261 impacted commits. You can see a
> recent example with:
>
>   git show -B -M --stat --summary ce6521e44
>
> where we find that most of builtin/fmt-merge-msg.c was moved to
> fmt-merge-msg.c. It's nice, but it's expensive enough that it probably
> shouldn't be the default.

Not only that, it can cost correctness-wise.  Until this

  https://public-inbox.org/git/xmqqegqaahnh.fsf@gitster.dls.corp.google.com/

gets corrected, it is not advisable to enable -B and -M at the same
time.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [git 2.26] stat counts reported by commit and log are different
  2020-04-09 22:55       ` Junio C Hamano
@ 2020-04-09 23:12         ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2020-04-09 23:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Norbert Kiesel, Git Mailing List

On Thu, Apr 09, 2020 at 03:55:30PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It might make sense to do so (and/or to make it possible to enable it by
> > config like we did for years with diff.renames). But it definitely is
> > way more expensive.
> > ...
> > So not quite as bad percentage-wise, but still pretty expensive. And for
> > not a huge benefit. There are ~261 impacted commits. You can see a
> > recent example with:
> >
> >   git show -B -M --stat --summary ce6521e44
> >
> > where we find that most of builtin/fmt-merge-msg.c was moved to
> > fmt-merge-msg.c. It's nice, but it's expensive enough that it probably
> > shouldn't be the default.
> 
> Not only that, it can cost correctness-wise.  Until this
> 
>   https://public-inbox.org/git/xmqqegqaahnh.fsf@gitster.dls.corp.google.com/
> 
> gets corrected, it is not advisable to enable -B and -M at the same
> time.

Ah, I forgot about that. I think the resulting diff might still be
useful for a human to read, but yeah, it cannot be applied (though even
there, it would be nice for it to make a little more clear to humans
what's happening with the destination file).

I wonder if we ought to remove -B from its use in status/commit, but I
guess there we are never generating a diff to apply, but rather one to
help humans understand what happened.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-04-09 23:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-09  2:04 [git 2.26] stat counts reported by commit and log are different Norbert Kiesel
2020-04-09 13:59 ` Jeff King
2020-04-09 21:58   ` Norbert Kiesel
2020-04-09 22:47     ` Jeff King
2020-04-09 22:55       ` Junio C Hamano
2020-04-09 23:12         ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).