git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Alex Riesen <raa.lkml@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: GIT 1.6.0-rc1
Date: Tue, 29 Jul 2008 01:13:44 -0700	[thread overview]
Message-ID: <7v4p69jefb.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: <7vr69dky93.fsf@gitster.siamese.dyndns.org> (Junio C. Hamano's message of "Mon, 28 Jul 2008 23:20:08 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> Alex Riesen <raa.lkml@gmail.com> writes:
>
>> Junio C Hamano, Mon, Jul 28, 2008 08:44:52 +0200:
>>> Alex Riesen <raa.lkml@gmail.com> writes:
>>> 
>>> > t2103-update-index-ignore-missing.sh still broken on Windows because
>>> > of stat.st_size being 0 for directories there.
>>> 
>>> I recall we did not reach a useful conclusion of that discussion.
>>
>> Why isn't the proposed patch useful? (and would it be possible to make
>> the answer out of plain, small and short words?)
>
> Can you answer out of plain, small and short words why the proposed patch
> is correct without unwanted side effects, not just "this seems to solve
> the particular issue for me but I don't know if it has unintended side
> effects"?

Ok, I took a deeper look at the codepaths involved.  Although it does work
around the issue, I do not think your patch alone is the "correct" one in
the longer term.

It needs a bit of explanation, and the explanation won't be exactly
"plain, small and short", unfortunately.

-- >8 --
[PATCH] Teach gitlinks to ie_modified() and ce_modified_check_fs()

The ie_modified() function is the workhorse for refresh_cache_entry(),
i.e. checking if an index entry that is stat-dirty actually has changes.

After running quicker check to compare cached stat information with
results from the latest lstat(2) to answer "has modification" early, the
code goes on to check if there really is a change by comparing the staged
data with what is on the filesystem by asking ce_modified_check_fs().
However, this function always said "no change" for any gitlinks that has a
directory at the corresponding path.  This made ie_modified() to miss
actual changes in the subproject.

The patch fixes this first by modifying an existing short-circuit logic
before calling the ce_modified_check_fs() function.  It knows that for any
filesystem entity to which ie_match_stat() says its data has changed, if
its cached size is nonzero then the contents cannot match, which is a
correct optimization only for blob objects.  We teach gitlink objects to
this special case, as we already know that any gitlink that
ie_match_stat() says is modified is indeed modified at this point in the
codepath.

With the above change, we could leave ce_modified_check_fs() broken, but
it also futureproofs the code by teaching it to use ce_compare_gitlink(),
instead of assuming (incorrectly) that any directory is unchanged.

Originally noticed by Alex Riesen on Cygwin.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 read-cache.c |   27 ++++++++++++++++++++++-----
 1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index b5916b0..a940f8d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -152,7 +152,7 @@ static int ce_modified_check_fs(struct cache_entry *ce, struct stat *st)
 		break;
 	case S_IFDIR:
 		if (S_ISGITLINK(ce->ce_mode))
-			return 0;
+			return ce_compare_gitlink(ce) ? DATA_CHANGED : 0;
 	default:
 		return TYPE_CHANGED;
 	}
@@ -192,6 +192,7 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 			changed |= TYPE_CHANGED;
 		break;
 	case S_IFGITLINK:
+		/* We ignore most of the st_xxx fields for gitlinks */
 		if (!S_ISDIR(st->st_mode))
 			changed |= TYPE_CHANGED;
 		else if (ce_compare_gitlink(ce))
@@ -298,11 +299,22 @@ int ie_modified(const struct index_state *istate,
 	if (changed & (MODE_CHANGED | TYPE_CHANGED))
 		return changed;
 
-	/* Immediately after read-tree or update-index --cacheinfo,
-	 * the length field is zero.  For other cases the ce_size
-	 * should match the SHA1 recorded in the index entry.
+	/*
+	 * Immediately after read-tree or update-index --cacheinfo,
+	 * the length field is zero, as we have never even read the
+	 * lstat(2) information once, and we cannot trust DATA_CHANGED
+	 * returned by ie_match_stat() which in turn was returned by
+	 * ce_match_stat_basic() to signal that the filesize of the
+	 * blob changed.  We have to actually go to the filesystem to
+	 * see if the contents match, and if so, should answer "unchanged".
+	 *
+	 * The logic does not apply to gitlinks, as ce_match_stat_basic()
+	 * already has checked the actual HEAD from the filesystem in the
+	 * subproject.  If ie_match_stat() already said it is different,
+	 * then we know it is.
 	 */
-	if ((changed & DATA_CHANGED) && ce->ce_size != 0)
+	if ((changed & DATA_CHANGED) &&
+	    (S_ISGITLINK(ce->ce_mode) || ce->ce_size != 0))
 		return changed;
 
 	changed_fs = ce_modified_check_fs(ce, st);
@@ -1331,6 +1343,11 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
 	 * falsely clean entry due to touch-update-touch race, so we leave
 	 * everything else as they are.  We are called for entries whose
 	 * ce_mtime match the index file mtime.
+	 *
+	 * Note that this actually does not do much for gitlinks, for
+	 * which ce_match_stat_basic() always goes to the actual
+	 * contents.  The caller checks with is_racy_timestamp() which
+	 * always says "no" for gitlinks, so we are not called for them ;-)
 	 */
 	struct stat st;
 

  reply	other threads:[~2008-07-29  8:14 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-28  0:09 GIT 1.6.0-rc1 Junio C Hamano
2008-07-28  0:44 ` Petr Baudis
2008-07-28  6:38 ` Alex Riesen
2008-07-28  6:44   ` Junio C Hamano
2008-07-28 21:37     ` Alex Riesen
2008-07-29  6:20       ` Junio C Hamano
2008-07-29  8:13         ` Junio C Hamano [this message]
2008-07-29  8:36           ` Junio C Hamano
2008-07-29 21:17             ` Alex Riesen
2008-07-29 22:03               ` Junio C Hamano
2008-07-30  6:10                 ` Alex Riesen
2008-07-30  6:15                   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v4p69jefb.fsf@gitster.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=raa.lkml@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).