git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jack Bates <bk874k@nottheoilrig.com>
Cc: git@vger.kernel.org, Jack Bates <jack@nottheoilrig.com>
Subject: Re: [PATCH] diff: fix up SHA-1 abbreviations outside of repository
Date: Mon, 5 Dec 2016 01:55:23 -0500	[thread overview]
Message-ID: <20161205065523.yspqt34p3dp5g5fk@sigill.intra.peff.net> (raw)
In-Reply-To: <20161204194747.7100-1-jack@nottheoilrig.com>

On Sun, Dec 04, 2016 at 12:47:47PM -0700, Jack Bates wrote:

> The three cases where "git diff" operates outside of a repository are 1)
> when we run it outside of a repository, 2) when one of the files we're
> comparing is outside of the repository we're in, and 3) the --no-index
> option. Commit 4f03666 ("diff: handle sha1 abbreviations outside of
> repository", 2016-10-20) only worked in the first case.

You didn't define "worked" here. From looking at your patch, it looks
like we look look in the object database for abbreviations in the
--no-index case, but you think we shouldn't.

I'm not sure I agree. The "--no-index" option asks git not to treat the
arguments as pathspecs, but instead as two filesystem files to diff.
But should it ignore the repository entirely? One use case is to just
ask for the diff of two files:

  git diff --no-index foo bar

which may or may not be tracked in the repository. For abbreviations, I
doubt that people would care much, but see below.

> diff --git a/builtin/diff.c b/builtin/diff.c
> index 7f91f6d..ec7c432 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -342,9 +342,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  		       "--no-index" : "[--no-index]");
>  
>  	}
> -	if (no_index)
> +	if (no_index) {
>  		/* If this is a no-index diff, just run it and exit there. */
> +		startup_info->have_repository = 0;
>  		diff_no_index(&rev, argc, argv);
> +	}

This reset of have_repository would affect more than just abbreviations.
We may also look at the repository for attributes, config, etc.  For
instance, right now:

  echo "*.pdf diff=pdf" >.git/info/attributes
  git config diff.pdf.textconv pdftotext
  git diff --no-index --textconv foo.pdf bar.pdf

will show a text diff of the two files, but wouldn't after your patch.

(I actually think even needing to say --textconv is actually a bug;
normally the diff porcelain defaults to --textconv, but that setup is
skipped in the no-index case).

> +To check that we don'\''t, create an blob with a SHA-1 that starts with
> +0000. (Outside of a repository, SHA-1s are all zeros.) Then make an
> +abbreviation and check that Git doesn'\''t lengthen it.

That's not always true. If we actually diff the file contents, the sha1s
are correct (which you can see in the default --patch output). It's only
in the --raw case that the sha1 is all zeroes.

I'm not 100% sure that isn't a bug.  In a normal git diff, we can show
the sha1s without actually looking at the file content, because we get
them from either the containing tree or the index. In a --no-index diff,
we create the diff_filespec structs without a valid sha1. But we can't
get away from reading the files eventually. The magic happens in
diffcore_skip_stat_unmatch(), which actually does a series of checks,
culminating in a size-check and then a comparison of the contents.

I suppose it _is_ faster than computing the actual sha1, because we can
sometimes show "modified" by only looking at the size, or the first few
bytes. But any time two files are identical, we pay the full cost. So if
you're comparing two hierarchies, say, like:

  git diff --raw one/ two/

it's probably not much more expensive to compare with the full sha1s,
because we're already reading the entire contents of every file that's
the same.

So I dunno. It kind of surprised me that anybody was using "--no-index
--raw" in the first place, so maybe there is some use case I'm missing.

-Peff

  reply	other threads:[~2016-12-05  6:55 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-04 19:47 [PATCH] diff: fix up SHA-1 abbreviations outside of repository Jack Bates
2016-12-05  6:55 ` Jeff King [this message]
2016-12-05  7:19 ` Junio C Hamano
2016-12-05  7:26   ` Jeff King
2016-12-05 17:45     ` Jack Bates
2016-12-05 22:43     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161205065523.yspqt34p3dp5g5fk@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=bk874k@nottheoilrig.com \
    --cc=git@vger.kernel.org \
    --cc=jack@nottheoilrig.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).