From: Thomas Guyot <email@example.com> To: Taylor Blau <firstname.lastname@example.org> Cc: email@example.com, Thomas Guyot-Sionnest <firstname.lastname@example.org>, Jeff King <email@example.com> Subject: Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Date: Sun, 20 Sep 2020 00:53:15 -0400 [thread overview] Message-ID: <firstname.lastname@example.org> (raw) In-Reply-To: <CALqVohfQZu=itUyfU7nubJpgBETh2q7W1TVx=c2E32ey2cFZkA@mail.gmail.com> Hi... Added Jeff as he got involved later and comments below are relevant to his questions. On 2020-09-18 11:10, Thomas Guyot-Sionnest wrote: > On Fri, 18 Sep 2020 at 10:46, Taylor Blau <email@example.com> wrote: >> >> - Why do we have to do this at all all the way up in >> 'builtin_diffstat'? I would expect these to contain the right >> OIDs by the time they are given back to us from the call to >> 'diff_fill_oid_info' in 'run_diffstat'. >> >> So, my last point is the most important of the three. I'd expect >> something more along the lines of: >> >> 1. diff_fill_oid_info resolve the link to the pipe, and >> 2. index_path handles the resolved fd. >> >> ...but it looks like that is already what it's doing? I'm confused why >> this doesn't work as-is. > > So the idea is to checksum the data and write a valid oid. I'll see if > I can figure that out. Thanks for the hint though else I would likely > have gone with a buffer and memcmp. Your solution seems cleaner, and > there is a few other uses of oideq's that look dubious at best with > the case of null oids / buffered data so it's definitely a better > approach. > After looking further at the code I understand your point, although pipes can only ever be read once, so even if we do that we would have to buffer on first read. It appears the files are first read by diff_populate_filespec() - builtin_diffstat isn't even called if the files match (even for two pipes). Jeff, on your suggestion to compare size, the size is set even if data is null. Files in-tree appears to be mmapped on demand for reads. diff_fill_oid_info explicitly resets oids for is_stdin and return, and if the file's been read already and it's a pipe, we would *have* to have buffered the data already so I don't really see what else we can do besides memcmp() (technically we should be able to tell if the files have been modified at this point but apparently that information isn't transmitted to builtin_diffstat - it's assumed and I won't make complex change for that odd case of diffing two pipes. That's what I have now: /* What is_stdin really means is that the file's content is only * in the filespec's buffer and its oid is zero. We can't compare * oid's if both are null and we can just diff the buffers */ if (one->is_stdin && two->is_stdin) same_contents = (one->size == two->size ? !memcmp(one->data, two->data, one->size) : 0); else same_contents = oideq(&one->oid, &two->oid); Even when we implement the --literally switch, considering we can't guarantee a single read per file, for now I'd keep using the is_stdin flag as an indication of in-memory data, and we'll have to read in all pipes we diff (like earlier patch). It could be a concern if we --literally diff a whole subtree of large pipes. In that case the only fix I can see is to reorder the operations to generate the stats on each file diff (or at least keep the diffs around for the stats pass). Regards, Thomas
next prev parent reply other threads:[~2020-09-20 5:02 UTC|newest] Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest 2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest 2020-09-18 14:46 ` Taylor Blau 2020-09-18 15:10 ` Thomas Guyot-Sionnest 2020-09-18 17:37 ` Jeff King 2020-09-18 18:00 ` Thomas Guyot-Sionnest 2020-09-20 4:53 ` Thomas Guyot [this message] 2020-09-18 17:27 ` Jeff King 2020-09-18 17:52 ` Thomas Guyot-Sionnest 2020-09-18 18:06 ` Junio C Hamano 2020-09-23 19:16 ` Johannes Schindelin 2020-09-23 19:23 ` Junio C Hamano 2020-09-23 20:44 ` Johannes Schindelin 2020-09-24 4:49 ` Thomas Guyot 2020-09-24 5:24 ` [PATCH v3] " Thomas Guyot-Sionnest 2020-09-24 7:41 ` [PATCH v4] " Thomas Guyot-Sionnest 2020-09-24 6:40 ` [PATCH 1/2] " Junio C Hamano 2020-09-24 7:13 ` Thomas Guyot 2020-09-24 17:19 ` Junio C Hamano 2020-09-24 17:38 ` Junio C Hamano 2020-09-23 15:05 ` Johannes Schindelin 2020-09-20 13:09 ` [PATCH v2] " Thomas Guyot-Sionnest 2020-09-20 15:39 ` Taylor Blau 2020-09-20 16:38 ` Thomas Guyot 2020-09-20 19:11 ` Junio C Hamano 2020-09-20 20:08 ` Junio C Hamano 2020-09-20 20:36 ` Junio C Hamano 2020-09-20 22:15 ` Junio C Hamano 2020-09-21 19:26 ` Jeff King 2020-09-21 21:51 ` Junio C Hamano 2020-09-21 22:20 ` Jeff King 2020-09-21 22:37 ` Junio C Hamano 2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest 2020-09-18 14:36 ` Taylor Blau 2020-09-18 16:34 ` Thomas Guyot-Sionnest 2020-09-18 17:19 ` Jeff King 2020-09-18 17:21 ` Jeff King 2020-09-18 17:39 ` Thomas Guyot-Sionnest 2020-09-18 17:48 ` Junio C Hamano 2020-09-18 18:02 ` Jeff King 2020-09-20 12:54 ` Thomas Guyot 2020-09-21 19:31 ` Jeff King 2020-09-21 20:14 ` Junio C Hamano 2020-09-18 17:58 ` Taylor Blau 2020-09-18 18:05 ` Jeff King 2020-09-18 17:20 ` Jeff King 2020-09-18 18:00 ` Taylor Blau 2020-09-18 21:56 ` brian m. carlson 2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano 2020-09-18 18:24 ` Thomas Guyot-Sionnest
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).