From: Junio C Hamano <firstname.lastname@example.org> To: Taylor Blau <email@example.com> Cc: Thomas Guyot-Sionnest <firstname.lastname@example.org>, email@example.com, firstname.lastname@example.org, email@example.com Subject: Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat Date: Sun, 20 Sep 2020 12:11:20 -0700 [thread overview] Message-ID: <firstname.lastname@example.org> (raw) In-Reply-To: <20200920153915.GB2726066@nand.local> (Taylor Blau's message of "Sun, 20 Sep 2020 11:39:15 -0400") Taylor Blau <email@example.com> writes: > On Sun, Sep 20, 2020 at 09:09:46AM -0400, Thomas Guyot-Sionnest wrote: >> In builtin_diffstat(), when both files are coming from "stdin" (which >> could be better described as the file's content being written directly >> into the file object), oideq() compares two null hashes and ignores the >> actual differences for the statistics. >> >> This patch checks if is_stdin flag is set on both sides and compare >> contents directly. >> >> Signed-off-by: Thomas Guyot-Sionnest <firstname.lastname@example.org> >> --- >> Range-diff: >> 1: 479c2835fc ! 1: 1f25713d44 diff: Fix modified lines stats with --stat and --numstat >> @@ -20,8 +20,12 @@ >> } >> >> - same_contents = oideq(&one->oid, &two->oid); >> ++ /* What is_stdin really means is that the file's content is only >> ++ * in the filespec's buffer and its oid is zero. We can't compare >> ++ * oid's if both are null and we can just diff the buffers */ >> + if (one->is_stdin && two->is_stdin) >> -+ same_contents = !strcmp(one->data, two->data); >> ++ same_contents = (one->size == two->size ? >> ++ !memcmp(one->data, two->data, one->size) : 0); >> + else >> + same_contents = oideq(&one->oid, &two->oid); > > After reading your explanation in , this version makes more sense to > me. These oid fields are prepared by calling diff_fill_oid_info(), and even for paths that are dirty (hence no "free" oid available from index or tree entry), an appropriate oid is computed. But there are paths for which oid cannot be computed without destroying their contents. Such paths are marked by the function with null_oid. It happens to be that stdin is the only class of paths that are treated that way _right_ _now_, but future code may support different kind of paths that share the same trait. When we want to know "is comparing the oid sufficient?", we shouldn't inspect the is_stdin flag ourselves in a caller of diff_fill_oid_info(), because the helper _is_ responsible for knowing what kind of paths are special, and signals that "assume this would not be equal to anything else" by giving null_oid back. The caller should use the info left by diff_fill_oid_info(), namely, "even if the oid on both sides are the same, if it is null_oid, then we know diff_fill_oid_info() didn't actually compute the oid, and we need to compare the blob ourselves". And there is no point in doing memcmp() here, I think. The same_contents() check is done as an optimization to avoid xdl. Even if the two sides were thought to be different at the oid level, xdl comparison may find that there is no difference after all (e.g. think of whitespace ignoring comparison), so we should assume and rely on that the downstream code MUST BE prepared to handle false negatives (i.e. same_contents says they are different, but they actually produce no diffstat). Running memcmp() over contents in potentially a large buffer to find that they are different, and then have xdl process that large buffer again, would be a waste. Summarizing the above, I think the second best fix is this (which means that the posted patch is the third best): /* * diff_fill_oid_info() marked one/two->oid with null_oid * for a path whose oid is not available. Disable early * return optimization for them. */ if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid)) same_contents = 0; /* could be different */ else if (oideq(&one->oid, &two->oid)) same_contents = 1; /* definitely the same */ else same_contents = 0; /* definitely different */ But I suspect that the best fix is to teach diff_fill_oid_info() to hash the in-memory data to compute the oid, instead of punting and filling the oid field with null_oid. If function builtin_diffstat() is allowed to look at the contents and run memcmp() here, the 'data' field should have been filled and valid when diff_fill_oid_info() looked at it already. The "best" fix will have wider consequences, so we may not want to jump to it right away without careful consideration. For example, the "best" fix will fix another bug. The 'index' header shows a NULL object name in normal "diff --patch" output for these paths in the current code, which means they cannot be used with "apply --3way". That way, this codepath does not have to know anything about the "null means we don't know" convention. Try: $ (cat COPYING; echo) >RENAMING $ git diff --no-index COPYING - <RENAMING | grep '^index ' index 536e55524d..0000000000 100644 and notice that the stdin side has a null object name in the current code. I think we will show the right object name if we fix the diff_fill_oid_info(). Thanks.
next prev parent reply other threads:[~2020-09-20 19:11 UTC|newest] Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest 2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest 2020-09-18 14:46 ` Taylor Blau 2020-09-18 15:10 ` Thomas Guyot-Sionnest 2020-09-18 17:37 ` Jeff King 2020-09-18 18:00 ` Thomas Guyot-Sionnest 2020-09-20 4:53 ` Thomas Guyot 2020-09-18 17:27 ` Jeff King 2020-09-18 17:52 ` Thomas Guyot-Sionnest 2020-09-18 18:06 ` Junio C Hamano 2020-09-23 19:16 ` Johannes Schindelin 2020-09-23 19:23 ` Junio C Hamano 2020-09-23 20:44 ` Johannes Schindelin 2020-09-24 4:49 ` Thomas Guyot 2020-09-24 5:24 ` [PATCH v3] " Thomas Guyot-Sionnest 2020-09-24 7:41 ` [PATCH v4] " Thomas Guyot-Sionnest 2020-09-24 6:40 ` [PATCH 1/2] " Junio C Hamano 2020-09-24 7:13 ` Thomas Guyot 2020-09-24 17:19 ` Junio C Hamano 2020-09-24 17:38 ` Junio C Hamano 2020-09-23 15:05 ` Johannes Schindelin 2020-09-20 13:09 ` [PATCH v2] " Thomas Guyot-Sionnest 2020-09-20 15:39 ` Taylor Blau 2020-09-20 16:38 ` Thomas Guyot 2020-09-20 19:11 ` Junio C Hamano [this message] 2020-09-20 20:08 ` Junio C Hamano 2020-09-20 20:36 ` Junio C Hamano 2020-09-20 22:15 ` Junio C Hamano 2020-09-21 19:26 ` Jeff King 2020-09-21 21:51 ` Junio C Hamano 2020-09-21 22:20 ` Jeff King 2020-09-21 22:37 ` Junio C Hamano 2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest 2020-09-18 14:36 ` Taylor Blau 2020-09-18 16:34 ` Thomas Guyot-Sionnest 2020-09-18 17:19 ` Jeff King 2020-09-18 17:21 ` Jeff King 2020-09-18 17:39 ` Thomas Guyot-Sionnest 2020-09-18 17:48 ` Junio C Hamano 2020-09-18 18:02 ` Jeff King 2020-09-20 12:54 ` Thomas Guyot 2020-09-21 19:31 ` Jeff King 2020-09-21 20:14 ` Junio C Hamano 2020-09-18 17:58 ` Taylor Blau 2020-09-18 18:05 ` Jeff King 2020-09-18 17:20 ` Jeff King 2020-09-18 18:00 ` Taylor Blau 2020-09-18 21:56 ` brian m. carlson 2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano 2020-09-18 18:24 ` Thomas Guyot-Sionnest
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).