git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [RFC] git checkout $tree -- $path always rewrites files
Date: Fri, 7 Nov 2014 03:13:24 -0500	[thread overview]
Message-ID: <20141107081324.GA19845@peff.net> (raw)

I noticed that "git checkout $tree -- $path" will _always_ unlink and
write a new copy of each matching path, even if they are up-to-date with
the index and the content in $tree is the same.

Here's a simple reproduction:

    # some repo with a file in it
    git init
    echo content >foo && git add foo && git commit -m foo
    
    # give the file a known timestamp (it doesn't matter what)
    perl -e 'utime(123, 123, @ARGV)' foo
    git update-index --refresh
    
    # now check out an identical copy
    git checkout HEAD -- foo
    
    # and check whether it was updated
    perl -le 'print ((stat)[9]) for @ARGV' foo

which yields a modern timestamp, showing that the file was rewritten.

If you use

    git checkout -- foo

instead to checkout from the index, that works fine (the final line
prints 123). Similarly, if you load the new index and checkout
separately, like:

    git read-tree -m $tree
    git checkout -- foo

that also works. Of course it is not quite the same; read-tree loads the
whole index from $tree, whereas checkout would only touch a subset of
the entries. And that's the culprit; checkout does not use unpack-trees
to only touch the entries that need updating. It uses read_tree_recursive
with a pathspec, and just replaces any index entries that match.

Below is a patch which makes it do what I expect by silently copying
flags and stat-data from the old entry, but I feel like it is going in
the wrong direction. Besides causing a few tests to fail (which I
suspect is because I implemented it at the wrong layer), it really feels
like this should be using unpack-trees or something similar.

After all, that is what "git reset $tree -- foo" does, and it gets this
case right (in fact, you can replace the read-tree above with it). It
looks like it actually does a pathspec-limited index-diff rather than
unpack-trees, but the end effect is the same (and while checkout could
just pass "update" to unpack-trees, we need to handle checking out
directly from the index anyway, so I do not think it is a big deal to
keep the index update as a separate step).

Is there a reason that we don't use this diff technique for checkout?

Anyway, here is the (wrong) patch I came up with initially.

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29b0f6d..802e556 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -273,19 +273,13 @@ static int checkout_paths(const struct checkout_opts *opts,
 		ce->ce_flags &= ~CE_MATCHED;
 		if (!opts->ignore_skipworktree && ce_skip_worktree(ce))
 			continue;
-		if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
-			/*
-			 * "git checkout tree-ish -- path", but this entry
-			 * is in the original index; it will not be checked
-			 * out to the working tree and it does not matter
-			 * if pathspec matched this entry.  We will not do
-			 * anything to this entry at all.
-			 */
-			continue;
+
 		/*
-		 * Either this entry came from the tree-ish we are
-		 * checking the paths out of, or we are checking out
-		 * of the index.
+		 * If we are checking out from a tree-ish, we already
+		 * did our pathspec matching.  However, since we then
+		 * drop the CE_UPDATE flag from any paths that do not
+		 * need updating, we don't know which ones were not
+		 * mentioned and which ones were simply up-to-date.
 		 *
 		 * If it comes from the tree-ish, we already know it
 		 * matches the pathspec and could just stamp
diff --git a/read-cache.c b/read-cache.c
index 8f3e9eb..fb2240b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -962,8 +962,13 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
-		if (!new_only)
+		if (!new_only) {
+			struct cache_entry *old = istate->cache[pos];
+			if (!is_null_sha1(ce->sha1) &&
+			    !hashcmp(old->sha1, ce->sha1))
+				copy_cache_entry(ce, old);
 			replace_index_entry(istate, pos, ce);
+		}
 		return 0;
 	}
 	pos = -pos-1;

             reply	other threads:[~2014-11-07  8:13 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-07  8:13 Jeff King [this message]
2014-11-07  8:38 ` [RFC] git checkout $tree -- $path always rewrites files Jeff King
2014-11-07 10:13   ` Duy Nguyen
2014-11-07 16:51     ` Junio C Hamano
2014-11-07 19:15     ` Jeff King
2014-11-07 17:14 ` Junio C Hamano
2014-11-07 19:17   ` Jeff King
     [not found]     ` <CANiSa6hufp=80TaesNpo1CxCbwVq3LPXvYaUSbcmzPE5pj_GGw@mail.gmail.com>
2014-11-08  7:10       ` Martin von Zweigbergk
     [not found]         ` <CAPc5daWdzrHr8Rdksr3HycMRQu0=Ji7h=BPYjzZj7MH6Ko0VgQ@mail.gmail.com>
2014-11-08  8:03           ` Martin von Zweigbergk
2014-11-08  8:30           ` Jeff King
2014-11-08  8:45             ` Jeff King
2014-11-09 18:37               ` Junio C Hamano
2014-11-08 16:19             ` Martin von Zweigbergk
2014-11-09  9:42               ` Jeff King
2014-11-09 17:21             ` Junio C Hamano
2014-11-13 18:30               ` Jeff King
2014-11-13 19:15                 ` Junio C Hamano
2014-11-13 19:26                   ` Jeff King
2014-11-13 20:03                     ` Jeff King
2014-11-13 21:18                       ` Junio C Hamano
2014-11-13 21:37                         ` Jeff King
2014-11-14  5:44               ` David Aguilar
2014-11-14 19:27                 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141107081324.GA19845@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).