git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	Thomas Guyot-Sionnest <tguyot@gmail.com>,
	git@vger.kernel.org, dermoth@aei.ca
Subject: Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
Date: Mon, 21 Sep 2020 18:20:21 -0400
Message-ID: <20200921222021.GA3533110@coredump.intra.peff.net>
In-Reply-To: <xmqqft7aer3a.fsf@gitster.c.googlers.com>

On Mon, Sep 21, 2020 at 02:51:21PM -0700, Junio C Hamano wrote:

> > This is the direction I was getting at in my earlier emails, except that
> > I imagined that first conditional could be checking:
> >
> >   if (!one->oid_valid || !two->oid_valid)
> >
> > but I was surprised to see that diff_fill_oid_info() does not set
> > oid_valid. Is that a bug?
> 
> I do not think so.  oid_valid refers to the state during the
> collection phase (those who called diff_addremove() etc.) and
> updating it in diff_fill_oid_info() would lose information.  Maybe
> nobody looks at the bit at this late in the processing chain these
> days, in which case we can start flipping the bit there, but I
> offhand do not know what consequences such a change would trigger.

We use the flag to determine whether we need to compute the oid from
scratch. So I would think the current code causes us to compute the oid
multiple times in many cases. For example, with this patch:

diff --git a/diff.c b/diff.c
index ee8e8189e9..8363abab5b 100644
--- a/diff.c
+++ b/diff.c
@@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
 				die_errno("stat '%s'", one->path);
 			if (index_path(istate, &one->oid, one->path, &st, 0))
 				die("cannot hash %s", one->path);
+			warning("computed oid of %s as %s",
+				one->path, oid_to_hex(&one->oid));
 		}
 	}
 	else

I get (because diff.c is dirty in my working tree due to the patch):

  $ ./git diff --stat -p
  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
   diff.c | 2 ++
   1 file changed, 2 insertions(+)
  
  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
  diff --git a/diff.c b/diff.c
  index ee8e8189e9..8363abab5b 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
   				die_errno("stat '%s'", one->path);
   			if (index_path(istate, &one->oid, one->path, &st, 0))
   				die("cannot hash %s", one->path);
  +			warning("computed oid of %s as %s",
  +				one->path, oid_to_hex(&one->oid));
   		}
   	}
   	else

even though we already know the oid in the second call, so it's wasted
work. I agree that other code could be depending on oid_valid in a weird
way, but IMHO that code is probably wrong to do so. But it may not be
worth digging into, if nobody has complained about the waste.
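
To make the waste concrete, here is a minimal standalone sketch of the
caching I have in mind. It is not git's code: struct filespec,
fill_oid_info(), toy_hash() and the hash_calls counter are all
hypothetical stand-ins (toy_hash() plays the role of index_path()), and
it assumes nobody else depends on the flag staying unset:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct diff_filespec; not git's real layout. */
struct filespec {
	const char *data;  /* file contents, NUL-terminated for simplicity */
	unsigned long oid; /* toy "object id" */
	int oid_valid;     /* nonzero once oid has been filled in */
};

/* Counts how often we actually hash, to make repeated work visible. */
static int hash_calls;

/* Toy hash (djb2) standing in for index_path(); not a real object hash. */
static unsigned long toy_hash(const char *s)
{
	unsigned long h = 5381;

	hash_calls++;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Fill in the oid, but only if an earlier call has not already done so. */
static void fill_oid_info(struct filespec *one)
{
	assert(one != NULL);
	if (one->oid_valid)
		return; /* already known: skip the rehash */
	one->oid = toy_hash(one->data);
	one->oid_valid = 1; /* remember the result for later callers */
}
```

With the flag flipped on the first call, a second fill_oid_info() on the
same filespec returns early and hash_calls stays at 1. Without that
assignment we would hash twice, which is what the two warnings above
show.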

> > I also imagined that we'd have to determine right then whether the
> > contents are actually different or not with a memcmp(), to avoid
> > emitting a "0 changes" line, but we do handle that case within the
> > "!same_contents" conditional. See the comment starting with "Omit
> > diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore
> > uninteresting modifications, 2020-08-20).
> 
> Yes, we are essentially on the same page---same_contents bit is
> merely an optimization to decide cheaply when we do not have to do
> xdl, but the codepath that does the xdl must be prepared to deal
> with the "we thought they are different, but after all they turn out
> to be equivalent" case.  Therefore false positive to declare two
> different things as same cannot be tolerated, but false negative to
> declare two things that are the same as !same_contents is fine.
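
As a rough illustration of that asymmetry, here is a standalone sketch
(again not git's code). struct blob, count_changes() and
want_stat_line() are hypothetical names, count_changes() stands in for
the xdl pass, toy ids are assumed to be collision-free, and mode changes
are ignored entirely:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one side of a diff pair; not git's struct. */
struct blob {
	const char *data;
	size_t size;
	unsigned long oid; /* toy id, assumed collision-free */
	int oid_valid;     /* zero when the id is unknown */
};

/*
 * Cheap check: may only answer "same" when that is certain.
 * Answering "not same" for identical data is allowed.
 */
static int same_contents(const struct blob *a, const struct blob *b)
{
	return a->oid_valid && b->oid_valid && a->oid == b->oid;
}

/* Expensive path standing in for xdl: returns 0 for identical data. */
static int count_changes(const struct blob *a, const struct blob *b)
{
	if (a->size == b->size && !memcmp(a->data, b->data, a->size))
		return 0;
	return 1; /* placeholder for a real line count */
}

/* Returns 1 if a stat line should be shown, 0 if the pair is omitted. */
static int want_stat_line(const struct blob *a, const struct blob *b)
{
	assert(a != NULL && b != NULL);
	if (same_contents(a, b))
		return 0; /* certain match: skip the expensive pass */
	/* The slow path must cope with "turned out equal after all". */
	return count_changes(a, b) != 0;
}
```

Two blobs with identical data but unknown ids fail the cheap check, yet
want_stat_line() still returns 0 because the slow path notices they
match. The reverse mistake (claiming "same" for different data) would
silently drop a real change, which is why only that direction is
unacceptable.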

I thought it might matter on "maint", where we do not have 1cf3d5db9b.
I.e., I expected that:

  echo foo >a
  echo foo >b
  git diff --no-index --stat a b

might switch from no output to having a line like:

  a => b | 0

But we don't even get to builtin_diffstat() there; we throw out the pair
in diffcore_skip_stat_unmatch(). If you get past that with something
like a mode change:

  chmod +x b
  git diff --no-index --stat a b

then that does generate the "0" stat line, but it does so both before
and after the proposed change. The same thing happens outside of
no-index mode:

  git init
  echo foo >file
  git add .
  git commit -am no-bit
  chmod +x file
  git commit -am exec-bit
  git show --stat

will give you:

   file | 0

I'm not sure if that's the desired behavior or not, but at any rate
fixing this builtin_diffstat() conditional won't change it either way. :)

-Peff
