git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Norbert Kiesel <nkiesel@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [git 2.26] stat counts reported by commit and log are different
Date: Thu, 9 Apr 2020 18:47:47 -0400	[thread overview]
Message-ID: <20200409224747.GA4173825@coredump.intra.peff.net> (raw)
In-Reply-To: <CAM+g_NuZ3pKAd80+HoR8-_0=N9wV28L-yyb1VhJhTbYH+RS0og@mail.gmail.com>

On Thu, Apr 09, 2020 at 02:58:25PM -0700, Norbert Kiesel wrote:

> Thanks for the explanation. I still wonder why break-detection is by
> default enabled for commit but disabled for log.  Is there any
> rationale for this?

It's enabled for commit, because it was enable for git-status, via
3eb2a15eb3 (builtin-commit: make summary output consistent with status,
2007-12-16). It was enabled for status because it makes rename detection
more accurate; see the discussion at the end of this thread:

  https://lore.kernel.org/git/20071121171235.GA32233@sigill.intra.peff.net/

That tells us why it _is_ enable for commit, but not why it's not for
log. Traditionally rename detection was _not_ enabled by default there.
Because it can be expensive, we were quicker to enable it for
single-diff commands like commit (as opposed to log, which is doing a
bunch of diffs).

Much later, in 5404c116aa (diff: activate diff.renames by default,
2016-02-25), we turned on renames by default for git-log. But as far as
I recall, nobody gave much thought to turning on break detection at the
same time.

It might make sense to do so (and/or to make it possible to enable it by
config like we did for years with diff.renames). But it definitely is
way more expensive. For a tree-only diff, we don't usually have to look
at most blobs at all. Even with rename detection, we only have to look
at inexact rename candidates. But break detection must look at every
single modified file (timings are on git.git):

  [no renames at all, pretty fast]
  $ time git log --no-renames --raw >/dev/null
  real	0m2.231s
  user	0m2.203s
  sys	0m0.028s

  [adding in renames isn't very expensive]
  $ time git log -M --raw >/dev/null
  real	0m2.284s
  user	0m2.224s
  sys	0m0.060s

  [but break detection is]
  $ git log -B -M --raw >/dev/null
  real	0m33.377s
  user	0m32.942s
  sys	0m0.424s

A more fair comparison would actually generate content-level diffs,
which need to look in the blobs, like:

  $ time git log -M -p >/dev/null
  real	0m23.287s
  user	0m22.854s
  sys	0m0.432s

  $ time git log -B -M -p >/dev/null
  real	0m49.763s
  user	0m49.282s
  sys	0m0.480s

So not quite as bad percentage-wise, but still pretty expensive. And for
not a huge benefit. There are ~261 impacted commits. You can see a
recent example with:

  git show -B -M --stat --summary ce6521e44

where we find that most of builtin/fmt-merge-msg.c was moved to
fmt-merge-msg.c. It's nice, but it's expensive enough that it probably
shouldn't be the default.

-Peff

  reply	other threads:[~2020-04-09 22:47 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-09  2:04 [git 2.26] stat counts reported by commit and log are different Norbert Kiesel
2020-04-09 13:59 ` Jeff King
2020-04-09 21:58   ` Norbert Kiesel
2020-04-09 22:47     ` Jeff King [this message]
2020-04-09 22:55       ` Junio C Hamano
2020-04-09 23:12         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200409224747.GA4173825@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=nkiesel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).