From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Taylor Blau <me@ttaylorr.com>,
Thomas Guyot-Sionnest <tguyot@gmail.com>,
git@vger.kernel.org, dermoth@aei.ca
Subject: Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
Date: Mon, 21 Sep 2020 18:20:21 -0400
Message-ID: <20200921222021.GA3533110@coredump.intra.peff.net>
In-Reply-To: <xmqqft7aer3a.fsf@gitster.c.googlers.com>
On Mon, Sep 21, 2020 at 02:51:21PM -0700, Junio C Hamano wrote:
> > This is the direction I was getting at in my earlier emails, except that
> > I imagined that first conditional could be checking:
> >
> > if (!one->oid_valid || !two->oid_valid)
> >
> > but I was surprised to see that diff_fill_oid_info() does not set
> > oid_valid. Is that a bug?
>
> I do not think so. oid_valid refers to the state during the
> collection phase (those who called diff_addremove() etc.) and
> updating it in diff_fill_oid_info() would lose information. Maybe
> nobody looks at the bit at this late in the processing chain these
> days, in which case we can start flipping the bit there, but I
> offhand do not know what consequences such a change would trigger.
We use the flag to determine whether we need to compute the oid from
scratch. So I would think the current code causes us to compute the oid
multiple times in many cases. For example, with this patch:
diff --git a/diff.c b/diff.c
index ee8e8189e9..8363abab5b 100644
--- a/diff.c
+++ b/diff.c
@@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
 				die_errno("stat '%s'", one->path);
 			if (index_path(istate, &one->oid, one->path, &st, 0))
 				die("cannot hash %s", one->path);
+			warning("computed oid of %s as %s",
+				one->path, oid_to_hex(&one->oid));
 		}
 	}
 	else
I get (because diff.c is dirty in my working tree due to the patch):
  $ ./git diff --stat -p
  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
   diff.c | 2 ++
   1 file changed, 2 insertions(+)

  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
  diff --git a/diff.c b/diff.c
  index ee8e8189e9..8363abab5b 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
   				die_errno("stat '%s'", one->path);
   			if (index_path(istate, &one->oid, one->path, &st, 0))
   				die("cannot hash %s", one->path);
  +			warning("computed oid of %s as %s",
  +				one->path, oid_to_hex(&one->oid));
   		}
   	}
   	else
even though we already know the oid in the second call, so it's wasted
work. I agree that other code could be depending on oid_valid in a weird
way, but IMHO that code is probably wrong to do so. But it may not be
worth digging into, if nobody has complained about the waste.
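To make the wasted work concrete, here is a minimal sketch of the two
behaviors. It is not git's code: struct filespec, toy_hash(), the
hash_computations counter and both fill_hash_*() functions are made up
for illustration, and the FNV-1a hash only stands in for real object
hashing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct diff_filespec; not git's real layout. */
struct filespec {
	const char *data;
	size_t len;
	uint32_t hash;   /* stand-in for the oid */
	int hash_valid;  /* stand-in for oid_valid */
};

/* Counts how often the expensive hashing step runs. */
int hash_computations;

/* FNV-1a, used only as a cheap substitute for hashing an object. */
uint32_t toy_hash(const char *data, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;
	}
	hash_computations++;
	return h;
}

/* Fills in the hash but never records that it did so. */
void fill_hash_no_cache(struct filespec *spec)
{
	if (!spec->hash_valid)
		spec->hash = toy_hash(spec->data, spec->len);
}

/* Fills in the hash and flips the flag so later callers reuse it. */
void fill_hash_cached(struct filespec *spec)
{
	if (!spec->hash_valid) {
		spec->hash = toy_hash(spec->data, spec->len);
		spec->hash_valid = 1;
	}
}
```

Calling fill_hash_no_cache() twice on the same filespec hashes the
contents twice, which matches the two warning() lines above, whereas
fill_hash_cached() hashes only on the first call. Whether flipping the
real oid_valid bit is safe for other callers is exactly the open
question, and this sketch does not answer it.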
> > I also imagined that we'd have to determine right then whether the
> > contents are actually different or not with a memcmp(), to avoid
> > emitting a "0 changes" line, but we do handle that case within the
> > "!same_contents" conditional. See the comment starting with "Omit
> > diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore
> > uninteresting modifications, 2020-08-20).
>
> Yes, we are essentially on the same page---same_contents bit is
> merely an optimization to decide cheaply when we do not have to do
> xdl, but the codepath that does the xdl must be prepared to deal
> with the "we thought they are different, but after all they turn out
> to be equivalent" case. Therefore false positive to declare two
> different things as same cannot be tolerated, but false negative to
> declare two things that are the same as !same_contents is fine.
I thought it may matter on "maint", where we do not have 1cf3d5db9b.
I.e., I expected:

  echo foo >a
  echo foo >b
  git diff --no-index --stat a b

might switch from no output to having a line like:

  a => b | 0

But we don't even get to builtin_diffstat() there. We throw out the pair
in diffcore_skip_stat_unmatch(). Likewise, if you get past that with
something like a mode change:

  chmod +x b
  git diff --no-index --stat a b

then that does generate the "0" stat line. But it does so both before
and after the proposed change. The same thing happens outside of
no-index mode:

  git init
  echo foo >file
  git add .
  git commit -am no-bit
  chmod +x file
  git commit -am exec-bit
  git show --stat

will give you:

  file | 0

I'm not sure if that's the desired behavior or not, but at any rate
fixing this builtin_diffstat() conditional won't change it either way. :)
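
For reference, the contract Junio describes for the same_contents bit
(a cheap check that may wrongly say "different" but must never wrongly
say "same", backed by a slow path that copes with equal contents) can
be sketched roughly as below. All names here (struct blob,
cheap_same_contents(), contents_differ(), count_changes(),
slow_compares) are hypothetical, the memcmp() stands in for the xdl
run, and the sketch assumes distinct contents never share an id.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical blob with an optional precomputed id; not git's types. */
struct blob {
	const char *data;
	size_t len;
	unsigned id;   /* stand-in for an oid */
	int id_valid;  /* nonzero when id can be trusted */
};

/* Counts how often the expensive full comparison runs. */
int slow_compares;

/*
 * Cheap check. It may return 0 for equal contents (when an id is not
 * known), but under the no-collision assumption it never returns 1
 * for contents that differ.
 */
int cheap_same_contents(const struct blob *a, const struct blob *b)
{
	return a->id_valid && b->id_valid && a->id == b->id;
}

/* Slow path: a full byte comparison, standing in for the xdl run. */
int contents_differ(const struct blob *a, const struct blob *b)
{
	slow_compares++;
	return a->len != b->len || memcmp(a->data, b->data, a->len) != 0;
}

/*
 * Returns 1 if the pair has content changes, 0 otherwise. The slow
 * path must handle "we thought they differ, but they are equal".
 */
int count_changes(const struct blob *a, const struct blob *b)
{
	if (cheap_same_contents(a, b))
		return 0; /* skip the expensive comparison entirely */
	return contents_differ(a, b) ? 1 : 0;
}
```

With two blobs holding "foo\n" and no valid ids, the cheap check says
"not known to be the same", the slow path runs once, and the result is
still 0 changes. That is the tolerated false negative; the opposite
mistake would silently drop a real change.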
-Peff