From: Junio C Hamano <firstname.lastname@example.org> To: Jeff King <email@example.com> Cc: Taylor Blau <firstname.lastname@example.org>, Thomas Guyot-Sionnest <email@example.com>, firstname.lastname@example.org, email@example.com Subject: Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat Date: Mon, 21 Sep 2020 14:51:21 -0700 [thread overview] Message-ID: <firstname.lastname@example.org> (raw) In-Reply-To: <20200921192630.GA2399334@coredump.intra.peff.net> (Jeff King's message of "Mon, 21 Sep 2020 15:26:30 -0400") Jeff King <email@example.com> writes: > For diffstat, though, it seems like a waste of time; we don't care what > the object hash is. I.e., if we were to do this: > > diff --git a/diff.c b/diff.c > index 16eeaf6645..1934af29a5 100644 > --- a/diff.c > +++ b/diff.c > @@ -4564,9 +4564,6 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o, > if (o->prefix_length) > strip_prefix(o->prefix_length, &name, &other); > > - diff_fill_oid_info(p->one, o->repo->index); > - diff_fill_oid_info(p->two, o->repo->index); > - > builtin_diffstat(name, other, p->one, p->two, > diffstat, o, p); > } > > then everything seems to work fine _except_ a "git diff --stat > --no-index", exactly because it hits this "same_contents" check we've > been discussing. And once that is fixed properly (to handle any case > where we have no oid, not just when the stdin flag is set), then perhaps > it is worth doing. > Or perhaps not. Even if we have to memcmp sometimes in > builtin_diffstat(), it would be faster than computing the individual > hashes. But it may not be measurably so, and it would be no difference > for the common case of filespecs for which we do know the oid for free. > I also suspect we'd need to be a little smarter about combined formats > (e.g., "--stat --patch" might as well compute the oid as early as > possible, since we'll need it eventually for the patch; but we'd hit the > call in builtin_diffstat() before the one in run_diff()). > >> But there are paths for which oid cannot be computed without >> destroying their contents. Such paths are marked by the function >> with null_oid. > > I'm not clear how computing the oid destroys the contents. We have them > in an in-memory buffer at this point, don't we? So we _could_ generate > an oid even for stdin, like this: Yes, yes yes. That is the "best" (which later retracted) approach I suggested in the same message, but it would end up filling a real-looking object name for working tree side of diff-files, which has a far larger consequence we need to think about and consumes more brain cycles than warranted here, I would think. >> Summarizing the above, I think the second best fix is this (which >> means that the posted patch is the third best): >> >> /* >> * diff_fill_oid_info() marked one/two->oid with null_oid >> * for a path whose oid is not available. Disable early >> * return optimization for them. >> */ >> if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) >> same_contents = 0; /* could be different */ >> else if (oideq(&one->oid, &two->oid)) >> same_contents = 1; /* definitely the same */ >> else >> same_contents = 0; /* definitely different */ > > This is the direction I was getting at in my earlier emails, except that > I imagined that first conditional could be checking: > > if (!one->oid_valid || !two->oid_valid) > > but I was surprised to see that diff_fill_oid_info() does not set > oid_valid. Is that a bug? I do not think so. oid_valid refers to the state during the collection phase (those who called diff_addremove() etc.) and updating it in diff_fill_oid_info() would lose information. Maybe nobody looks at the bit at this late in the processing chain these days, in which case we can start flipping the bit there, but I offhand do not know what consequences such a change would trigger. > I also imagined that we'd have to determine right then whether the > contents are actually different or not with a memcmp(), to avoid > emitting a "0 changes" line, but we do handle that case within the > "!same_contents" conditional. See the comment starting with "Omit > diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore > uninteresting modifications, 2020-08-20). Yes, we are essentially on the same page---same_contents bit is merely an optimization to decide cheaply when we do not have to do xdl, but the codepath that does the xdl must be prepared to deal with the "we thought they are different, but after all they turn out to be equivalent" case. Therefore false positive to declare two different things as same cannot be tolerated, but false negative to declare two things that are the same as !same_contents is fine.
next prev parent reply other threads:[~2020-09-21 21:51 UTC|newest] Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest 2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest 2020-09-18 14:46 ` Taylor Blau 2020-09-18 15:10 ` Thomas Guyot-Sionnest 2020-09-18 17:37 ` Jeff King 2020-09-18 18:00 ` Thomas Guyot-Sionnest 2020-09-20 4:53 ` Thomas Guyot 2020-09-18 17:27 ` Jeff King 2020-09-18 17:52 ` Thomas Guyot-Sionnest 2020-09-18 18:06 ` Junio C Hamano 2020-09-23 19:16 ` Johannes Schindelin 2020-09-23 19:23 ` Junio C Hamano 2020-09-23 20:44 ` Johannes Schindelin 2020-09-24 4:49 ` Thomas Guyot 2020-09-24 5:24 ` [PATCH v3] " Thomas Guyot-Sionnest 2020-09-24 7:41 ` [PATCH v4] " Thomas Guyot-Sionnest 2020-09-24 6:40 ` [PATCH 1/2] " Junio C Hamano 2020-09-24 7:13 ` Thomas Guyot 2020-09-24 17:19 ` Junio C Hamano 2020-09-24 17:38 ` Junio C Hamano 2020-09-23 15:05 ` Johannes Schindelin 2020-09-20 13:09 ` [PATCH v2] " Thomas Guyot-Sionnest 2020-09-20 15:39 ` Taylor Blau 2020-09-20 16:38 ` Thomas Guyot 2020-09-20 19:11 ` Junio C Hamano 2020-09-20 20:08 ` Junio C Hamano 2020-09-20 20:36 ` Junio C Hamano 2020-09-20 22:15 ` Junio C Hamano 2020-09-21 19:26 ` Jeff King 2020-09-21 21:51 ` Junio C Hamano [this message] 2020-09-21 22:20 ` Jeff King 2020-09-21 22:37 ` Junio C Hamano 2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest 2020-09-18 14:36 ` Taylor Blau 2020-09-18 16:34 ` Thomas Guyot-Sionnest 2020-09-18 17:19 ` Jeff King 2020-09-18 17:21 ` Jeff King 2020-09-18 17:39 ` Thomas Guyot-Sionnest 2020-09-18 17:48 ` Junio C Hamano 2020-09-18 18:02 ` Jeff King 2020-09-20 12:54 ` Thomas Guyot 2020-09-21 19:31 ` Jeff King 2020-09-21 20:14 ` Junio C Hamano 2020-09-18 17:58 ` Taylor Blau 2020-09-18 18:05 ` Jeff King 2020-09-18 17:20 ` Jeff King 2020-09-18 18:00 ` Taylor Blau 2020-09-18 21:56 ` brian m. carlson 2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano 2020-09-18 18:24 ` Thomas Guyot-Sionnest
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).