git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thomas Guyot <tguyot@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Thomas Guyot-Sionnest <dermoth@aei.ca>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
Date: Sun, 20 Sep 2020 00:53:15 -0400	[thread overview]
Message-ID: <f4c4cb48-f4b5-3d4d-066d-b94e961dcbb5@gmail.com> (raw)
In-Reply-To: <CALqVohfQZu=itUyfU7nubJpgBETh2q7W1TVx=c2E32ey2cFZkA@mail.gmail.com>

Hi... Added Jeff as he got involved later and comments below are
relevant to his questions.

On 2020-09-18 11:10, Thomas Guyot-Sionnest wrote:
> On Fri, 18 Sep 2020 at 10:46, Taylor Blau <me@ttaylorr.com> wrote:
>>
>>   - Why do we have to do this at all all the way up in
>>     'builtin_diffstat'? I would expect these to contain the right
>>     OIDs by the time they are given back to us from the call to
>>     'diff_fill_oid_info' in 'run_diffstat'.
>>
>> So, my last point is the most important of the three. I'd expect
>> something more along the lines of:
>>
>>   1. diff_fill_oid_info resolve the link to the pipe, and
>>   2. index_path handles the resolved fd.
>>
>> ...but it looks like that is already what it's doing? I'm confused why
>> this doesn't work as-is.
> 
> So the idea is to checksum the data and write a valid oid. I'll see if
> I can figure that out. Thanks for the hint though else I would likely
> have gone with a buffer and memcmp. Your solution seems cleaner, and
> there is a few other uses of oideq's that look dubious at best with
> the case of null oids / buffered data so it's definitely a better
> approach.
> 

After looking further at the code I understand your point, although
pipes can only ever be read once, so even if we do that we would have to
buffer on first read. It appears the files are first read by
diff_populate_filespec() - builtin_diffstat isn't even called if the
files match (even for two pipes).

Jeff, on your suggestion to compare size, the size is set even if data
is null. Files in-tree appears to be mmapped on demand for reads.

diff_fill_oid_info explicitly resets oids for is_stdin and return, and
if the file's been read already and it's a pipe, we would *have* to have
buffered the data already so I don't really see what else we can do
besides memcmp() (technically we should be able to tell if the files
have been modified at this point but apparently that information isn't
transmitted to builtin_diffstat - it's assumed and I won't make complex
change for that odd case of diffing two pipes. That's what I have now:

    /* What is_stdin really means is that the file's content is only
     * in the filespec's buffer and its oid is zero. We can't compare
     * oid's if both are null and we can just diff the buffers */
    if (one->is_stdin && two->is_stdin)
        same_contents = (one->size == two->size ?
            !memcmp(one->data, two->data, one->size) : 0);
    else
        same_contents = oideq(&one->oid, &two->oid);


Even when we implement the --literally switch, considering we can't
guarantee a single read per file, for now I'd keep using the is_stdin
flag as an indication of in-memory data, and we'll have to read in all
pipes we diff (like earlier patch). It could be a concern if we
--literally diff a whole subtree of large pipes. In that case the only
fix I can see is to reorder the operations to generate the stats on each
file diff (or at least keep the diffs around for the stats pass).


Regards,

Thomas



  parent reply	other threads:[~2020-09-20  5:02 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest
2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
2020-09-18 14:46   ` Taylor Blau
2020-09-18 15:10     ` Thomas Guyot-Sionnest
2020-09-18 17:37       ` Jeff King
2020-09-18 18:00         ` Thomas Guyot-Sionnest
2020-09-20  4:53       ` Thomas Guyot [this message]
2020-09-18 17:27   ` Jeff King
2020-09-18 17:52     ` Thomas Guyot-Sionnest
2020-09-18 18:06       ` Junio C Hamano
2020-09-23 19:16         ` Johannes Schindelin
2020-09-23 19:23           ` Junio C Hamano
2020-09-23 20:44             ` Johannes Schindelin
2020-09-24  4:49               ` Thomas Guyot
2020-09-24  5:24                 ` [PATCH v3] " Thomas Guyot-Sionnest
2020-09-24  7:41                   ` [PATCH v4] " Thomas Guyot-Sionnest
2020-09-24  6:40                 ` [PATCH 1/2] " Junio C Hamano
2020-09-24  7:13                   ` Thomas Guyot
2020-09-24 17:19                     ` Junio C Hamano
2020-09-24 17:38                       ` Junio C Hamano
2020-09-23 15:05     ` Johannes Schindelin
2020-09-20 13:09   ` [PATCH v2] " Thomas Guyot-Sionnest
2020-09-20 15:39     ` Taylor Blau
2020-09-20 16:38       ` Thomas Guyot
2020-09-20 19:11       ` Junio C Hamano
2020-09-20 20:08         ` Junio C Hamano
2020-09-20 20:36         ` Junio C Hamano
2020-09-20 22:15           ` Junio C Hamano
2020-09-21 19:26         ` Jeff King
2020-09-21 21:51           ` Junio C Hamano
2020-09-21 22:20             ` Jeff King
2020-09-21 22:37               ` Junio C Hamano
2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest
2020-09-18 14:36   ` Taylor Blau
2020-09-18 16:34     ` Thomas Guyot-Sionnest
2020-09-18 17:19       ` Jeff King
2020-09-18 17:21         ` Jeff King
2020-09-18 17:39         ` Thomas Guyot-Sionnest
2020-09-18 17:48         ` Junio C Hamano
2020-09-18 18:02           ` Jeff King
2020-09-20 12:54             ` Thomas Guyot
2020-09-21 19:31               ` Jeff King
2020-09-21 20:14                 ` Junio C Hamano
2020-09-18 17:58       ` Taylor Blau
2020-09-18 18:05         ` Jeff King
2020-09-18 17:20     ` Jeff King
2020-09-18 18:00       ` Taylor Blau
2020-09-18 21:56   ` brian m. carlson
2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano
2020-09-18 18:24   ` Thomas Guyot-Sionnest

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f4c4cb48-f4b5-3d4d-066d-b94e961dcbb5@gmail.com \
    --to=tguyot@gmail.com \
    --cc=dermoth@aei.ca \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).