git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* BUG in git diff-index
@ 2016-03-31 12:39 Andy Lowry
  2016-03-31 13:57 ` Carlos Martín Nieto
  2016-03-31 14:05 ` Jeff King
  0 siblings, 2 replies; 13+ messages in thread
From: Andy Lowry @ 2016-03-31 12:39 UTC (permalink / raw)
  To: git

Following transcript illustrates what I believe to be a bug in git diff-
index. The session used a git built from latest source, located in 
/tmp/git/git.

1. New repo, create empty file A, commit changes.
2. touch A
3. git diff-index reports A has changed, and reports bogus destination 
SHA
4. This is stable behavior until next step
5. git diff correctly reports no changes
6. git diff-index now also reports nothing

My understanding is that git diff-index should care only about content 
and file mode, not modification time.

===========================================================
andy@wiki:/tmp$ git/git init xxx
warning: templates not found /home/andy/share/git-core/templates
Initialized empty Git repository in /tmp/xxx/.git/
andy@wiki:/tmp$ cd xxx
andy@wiki:/tmp/xxx$ touch A
andy@wiki:/tmp/xxx$ ../git/git add A
andy@wiki:/tmp/xxx$ ../git/git commit -m initial
[master (root-commit) 370c3ac] initial
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 A
andy@wiki:/tmp/xxx$ touch A
andy@wiki:/tmp/xxx$ ../git/git diff-index HEAD
:100644 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 
0000000000000000000000000000000000000000 M    A
andy@wiki:/tmp/xxx$ ../git/git diff-index HEAD
:100644 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 
0000000000000000000000000000000000000000 M    A
andy@wiki:/tmp/xxx$ ../git/git diff
andy@wiki:/tmp/xxx$ ../git/git diff-index HEAD
andy@wiki:/tmp/xxx$ 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG in git diff-index
  2016-03-31 12:39 BUG in git diff-index Andy Lowry
@ 2016-03-31 13:57 ` Carlos Martín Nieto
  2016-03-31 14:05 ` Jeff King
  1 sibling, 0 replies; 13+ messages in thread
From: Carlos Martín Nieto @ 2016-03-31 13:57 UTC (permalink / raw)
  To: Andy Lowry; +Cc: git

On Thu, 2016-03-31 at 12:39 +0000, Andy Lowry wrote:
> Following transcript illustrates what I believe to be a bug in git
> diff-
> index. The session used a git built from latest source, located in 
> /tmp/git/git.
> 
> 1. New repo, create empty file A, commit changes.
> 2. touch A
> 3. git diff-index reports A has changed, and reports bogus
> destination 
> SHA
> 4. This is stable behavior until next step

This is expected and matches the documentation. See the bit starting
with

    OPERATING MODES
           You can choose whether you want to trust the index file entirely (using the --cached flag) or ask the diff
           logic to show any files that don’t match the stat state as being "tentatively changed". Both of these
           operations are very useful indeed.

The next two sections describe what you are seeing. The default is non-
cached mode which also shows files which don't match the stat data in
the index (which you've changed by touching the file).

Cheers,
   cmn

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG in git diff-index
  2016-03-31 12:39 BUG in git diff-index Andy Lowry
  2016-03-31 13:57 ` Carlos Martín Nieto
@ 2016-03-31 14:05 ` Jeff King
  2016-03-31 14:12   ` Andy Lowry
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff King @ 2016-03-31 14:05 UTC (permalink / raw)
  To: Andy Lowry; +Cc: git

On Thu, Mar 31, 2016 at 12:39:23PM +0000, Andy Lowry wrote:

> Following transcript illustrates what I believe to be a bug in git diff-
> index. The session used a git built from latest source, located in 
> /tmp/git/git.
> 
> 1. New repo, create empty file A, commit changes.
> 2. touch A
> 3. git diff-index reports A has changed, and reports bogus destination 
> SHA
> 4. This is stable behavior until next step
> 5. git diff correctly reports no changes
> 6. git diff-index now also reports nothing
> 
> My understanding is that git diff-index should care only about content 
> and file mode, not modification time.

This is working as designed (though I agree it is a little confusing).
From "git help diff-index":

       These commands all compare two sets of things; what is compared differs:

       git-diff-index <tree-ish>
           compares the <tree-ish> and the files on the filesystem.

       git-diff-index --cached <tree-ish>
           compares the <tree-ish> and the index.

       git-diff-tree [-r] <tree-ish-1> <tree-ish-2> [<pattern>...]
           compares the trees named by the two arguments.

       git-diff-files [<pattern>...]
           compares the index and the files on the filesystem.

Your invocation triggers the first, though it is not a true comparison
of what is on the filesystem, but rather a tree/index comparison, taking
into account the filesystem values. The all-zeroes sha1 indicates that
the index entry is not up to date with what is in the filesystem, but we
don't actually read the file contents to refresh the entry.

Back when diff-index was written, it was generally assumed that scripts
would refresh the index as their first operation, and then proceed to do
one or more operations like diff-index, which would rely on the refresh
from the first step.

Running the porcelain "git diff" does refresh the index, which is why
your step 6 shows no diff.

If you want a pure tree-to-index comparison, use --cached (this will
also be slightly faster, as it does not have to stat the working tree at
all).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* BUG in git diff-index
  2016-03-31 14:05 ` Jeff King
@ 2016-03-31 14:12   ` Andy Lowry
  2016-03-31 14:27     ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lowry @ 2016-03-31 14:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Thanks, very helpful.

What I'm actually after is a tree-to-filesystem comparison, regardless
of index. I've currently got a "diff" thrown in as a "work-around"
before "diff-index", but  now I understand it's not a workaround at
all. If there's a better way to achieve what I'm after, I'd appreciate
a tip. Otherwise I'll just change the comments explaining why there's
a "diff" in my script.

andy

> &gt; 5. git diff correctly reports no changes &gt; 6. git diff-index now also reports nothing This is working as designed (though I agree it is a little confusing). From "git help diff-index": These commands all compare two sets of things; what is compared differs: git-diff-index  compares the  and the files on the filesystem. git-diff-index --cached  compares the  and the index. git-diff-tree [-r]   [...] compares the trees named by the two arguments. git-diff-files [...] compares the index and the files on the filesystem. Your invocation triggers the first, though it is not a true comparison of what is on the filesystem, but rather a tree/index comparison, taking into account the filesystem values. The all-zeroes sha1 indicates that the index entry is not up to date with what is in the
  filesystem, but we don't actually read the file contents to refresh the entry. Back when diff-index was written, it was generally assumed that scripts would refresh the index as their first operation, and then proceed to do one or more operations like diff-index, which would rely on the refresh from the first step. Running the porcelain "git diff" does refresh the index, which is why your step 6 shows no diff. If you want a pure tree-to-index comparison, use --cached (this will also be slightly faster, as it does not have to stat the working tree at all). -Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG in git diff-index
  2016-03-31 14:12   ` Andy Lowry
@ 2016-03-31 14:27     ` Jeff King
  2016-03-31 19:30       ` Andy Lowry
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2016-03-31 14:27 UTC (permalink / raw)
  To: Andy Lowry; +Cc: git

On Thu, Mar 31, 2016 at 10:12:07AM -0400, Andy Lowry wrote:

> What I'm actually after is a tree-to-filesystem comparison, regardless
> of index. I've currently got a "diff" thrown in as a "work-around"
> before "diff-index", but  now I understand it's not a workaround at
> all. If there's a better way to achieve what I'm after, I'd appreciate
> a tip. Otherwise I'll just change the comments explaining why there's
> a "diff" in my script.

If your workaround is just to refresh the index, then you can do "git
update-index --refresh", rather than diff.

I don't think there is a plumbing command to do a direct
filesystem-to-tree comparison without having an index at all. "git diff
<treeish>" claims in the documentation to do so, but besides not being
plumbing, I think it is really just doing the same thing as diff-index,
under the hood. The index is a pretty fundamental part of git's view of
the working tree.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG in git diff-index
  2016-03-31 14:27     ` Jeff King
@ 2016-03-31 19:30       ` Andy Lowry
  2016-03-31 20:39         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lowry @ 2016-03-31 19:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git

OK, great. I think the update-index command is what I need.

If you'll indulge me, I'll describe my use-case in detail, and if you 
see anything screwy about it, I'd appreciate feedback. But don't feel 
obligated - you've been a great help already.

This is all about publishing updates to a static web site hosted as a 
gh-pages branch on a github repo.

Our master branch has everything that goes into building the site, and 
we use subtree push to update gh-pages with the embedded subtree that 
contains the generated site.

I'm creating a bash script that does the publishing, and as a general 
rule, we want to publish only from the master branch, and only if it's 
clean and up-to-date. And we also want to make sure that the embedded 
site reflects the current source content.

It's in that last requirement where diff-index comes in. The script runs 
a site build, and then should fail the policy check if that results in 
any changes to the working tree (including the embedded site)/. /I don't 
want the script to change the index in any way (e.g. so that unintended 
changes revealed by this policy check are less likely to be accidentally 
commited).

If I understand correctly, the update-index operation you indicated will 
not change index membership at all, but will simply resync the index 
members with actual working tree files.

So I think now that the script should do "update-index --refresh" 
followed by "diff-index --quiet HEAD". Sound correct?

Andy

On 3/31/2016 10:27 AM, Jeff King wrote:
> On Thu, Mar 31, 2016 at 10:12:07AM -0400, Andy Lowry wrote:
>
>> What I'm actually after is a tree-to-filesystem comparison, regardless
>> of index. I've currently got a "diff" thrown in as a "work-around"
>> before "diff-index", but  now I understand it's not a workaround at
>> all. If there's a better way to achieve what I'm after, I'd appreciate
>> a tip. Otherwise I'll just change the comments explaining why there's
>> a "diff" in my script.
> If your workaround is just to refresh the index, then you can do "git
> update-index --refresh", rather than diff.
>
> I don't think there is a plumbing command to do a direct
> filesystem-to-tree comparison without having an index at all. "git diff
> <treeish>" claims in the documentation to do so, but besides not being
> plumbing, I think it is really just doing the same thing as diff-index,
> under the hood. The index is a pretty fundamental part of git's view of
> the working tree.
>
> -Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG in git diff-index
  2016-03-31 19:30       ` Andy Lowry
@ 2016-03-31 20:39         ` Junio C Hamano
  2017-09-26 19:46           ` Marc Herbert
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-03-31 20:39 UTC (permalink / raw)
  To: Andy Lowry; +Cc: Jeff King, git

Andy Lowry <andy.work@nglowry.com> writes:

> So I think now that the script should do "update-index --refresh"
> followed by "diff-index --quiet HEAD". Sound correct?

Yes.  That has always been one of the kosher ways for any script to
make sure that the files in the working tree that are tracked have
not been modified relative to HEAD (assuming that the index matches
HEAD).  If you are fuzzy about that assumption, you would also do
"diff-index --quiet --cached HEAD" to ensure it, making the whole
thing:

	update-index --refresh
        diff-index --quiet --cached HEAD && diff-index --quiet HEAD

Our scripts traditionally do the equivalent in a slightly different
way.  require_clean_work_tree() in git-sh-setup makes sure that (1)
your working tree files match what is in your index and that (2)
your index matches the HEAD, i.e.

	update-index --refresh
        diff-files --quiet && diff-index --cached --quiet HEAD

They are equivalent in that H==I && H==W (yours) mean H==I==W, while
I==W && H==I (ours) also mean H==I==W.  Two diff-index would require
you to open the tree object of the HEAD twice, so our version may be
more efficient but you probably wouldn't be able to measure the
difference.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG in git diff-index
  2016-03-31 20:39         ` Junio C Hamano
@ 2017-09-26 19:46           ` Marc Herbert
  2017-09-26 20:11             ` Eric Wong
  2017-09-27 21:06             ` Wrong -dirty suffix set by setlocalversion " Marc Herbert
  0 siblings, 2 replies; 13+ messages in thread
From: Marc Herbert @ 2017-09-26 19:46 UTC (permalink / raw)
  To: Junio C Hamano, Andy Lowry
  Cc: Jeff King, git, Christian Kujau, josh, michael.w.mason,
	linux-kernel

On 31/03/16 13:39, Junio C Hamano wrote:
> Andy Lowry <andy.work@nglowry.com> writes:
> 
>> So I think now that the script should do "update-index --refresh"
>> followed by "diff-index --quiet HEAD". Sound correct?
> 
> Yes.  That has always been one of the kosher ways for any script to
> make sure that the files in the working tree that are tracked have
> not been modified relative to HEAD (assuming that the index matches
> HEAD).  

Too bad kernel/scripts/setlocalversion didn't get the memo:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdf2bc632ebc9ef51
> scripts/setlocalversion on write-protected source tree (2013)
> I don't see how removing "git update-index" could do any harm.

This causes a spurious "-dirty" suffix when building from a directory copy
(as Mike learned the hard way)

  ----

PS: I used NNTP and http://dir.gmane.org/gmane.comp.version-control.git
to quickly find this old thread (what could we do without NNTP?). Then
I googled for a web archive of this thread and Google could only find
this one: http://git.661346.n2.nabble.com/BUG-in-git-diff-index-tt7652105.html#none
Is there a robots.txt to block indexing on
https://public-inbox.org/git/1459432667.2124.2.camel@dwim.me ?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG in git diff-index
  2017-09-26 19:46           ` Marc Herbert
@ 2017-09-26 20:11             ` Eric Wong
  2017-09-26 23:27               ` Google indexing https://public-inbox.org/git (was: BUG in git diff-index) Marc Herbert
  2017-09-27 21:06             ` Wrong -dirty suffix set by setlocalversion " Marc Herbert
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Wong @ 2017-09-26 20:11 UTC (permalink / raw)
  To: Marc Herbert
  Cc: Junio C Hamano, Andy Lowry, Jeff King, git, Christian Kujau, josh,
	michael.w.mason, linux-kernel

Marc Herbert <Marc.Herbert@intel.com> wrote:
> PS: I used NNTP and http://dir.gmane.org/gmane.comp.version-control.git
> to quickly find this old thread (what could we do without NNTP?). Then
> I googled for a web archive of this thread and Google could only find
> this one: http://git.661346.n2.nabble.com/BUG-in-git-diff-index-tt7652105.html#none
> Is there a robots.txt to block indexing on
> https://public-inbox.org/git/1459432667.2124.2.camel@dwim.me ?

There's no blocks on public-inbox.org and I'm completely against
any sort of blocking/throttling.  Maybe there's too many pages
to index?  Or the Message-IDs in URLs are too ugly/scary?  Not
sure what to do about that...

Anyways, I just put up a robots.txt with Crawl-Delay: 1, since I
seem to recall crawlers use a more conservative delay by default:

==> https://public-inbox.org/robots.txt <==
User-Agent: *
Crawl-Delay: 1


I don't know much about SEO other than keeping a site up and
responsive; so perhaps there's more to be done about getting
things indexed...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Google indexing https://public-inbox.org/git (was: BUG in git diff-index)
  2017-09-26 20:11             ` Eric Wong
@ 2017-09-26 23:27               ` Marc Herbert
  0 siblings, 0 replies; 13+ messages in thread
From: Marc Herbert @ 2017-09-26 23:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

[Reduced Cc: and change Subject:]

On 26/09/17 13:11, Eric Wong wrote:
> There's no blocks on public-inbox.org and I'm completely against
> any sort of blocking/throttling.  Maybe there's too many pages
> to index?  Or the Message-IDs in URLs are too ugly/scary?  Not
> sure what to do about that...
> 
> Anyways, I just put up a robots.txt with Crawl-Delay: 1, since I
> seem to recall crawlers use a more conservative delay by default:

Not sure what made the difference: Google can now easily find this
thread on both https://public-inbox.org/git and on spinics too.

Same change observed for another, unrelated thread on this list.

Nevermind.

"Unsearchable are his judgments, and his ways past finding out"

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Wrong -dirty suffix set by setlocalversion (was: BUG in git diff-index)
  2017-09-26 19:46           ` Marc Herbert
  2017-09-26 20:11             ` Eric Wong
@ 2017-09-27 21:06             ` Marc Herbert
  2018-05-24 23:03               ` Mike Mason
  1 sibling, 1 reply; 13+ messages in thread
From: Marc Herbert @ 2017-09-27 21:06 UTC (permalink / raw)
  To: Junio C Hamano, Andy Lowry
  Cc: Jeff King, git, Christian Kujau, josh, michael.w.mason,
	linux-kernel, linux-kbuild

+ linux-kbuild list which is not in the output of:
  ./scripts/get_maintainer.pl -f scripts/setlocalversion 
... but seems relevant anyway.

On 31/03/16 13:39, Junio C Hamano wrote:
> Andy Lowry <andy.work@nglowry.com> writes:
> 
>> So I think now that the script should do "update-index --refresh"
>> followed by "diff-index --quiet HEAD". Sound correct?
> 
> Yes.  That has always been one of the kosher ways for any script to
> make sure that the files in the working tree that are tracked have
> not been modified relative to HEAD (assuming that the index matches
> HEAD).  

Too bad kernel/scripts/setlocalversion didn't get the memo:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdf2bc632ebc9ef51
> scripts/setlocalversion on write-protected source tree (2013)
> I don't see how removing "git update-index" could do any harm.

This causes a spurious "-dirty" suffix when building from a directory copy
(as Mike learned the hard way)

[...]

https://public-inbox.org/git/1459432667.2124.2.camel@dwim.me 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Wrong -dirty suffix set by setlocalversion (was: BUG in git diff-index)
  2017-09-27 21:06             ` Wrong -dirty suffix set by setlocalversion " Marc Herbert
@ 2018-05-24 23:03               ` Mike Mason
  2018-05-25  3:50                 ` Marc Herbert
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Mason @ 2018-05-24 23:03 UTC (permalink / raw)
  To: marc.herbert
  Cc: andy.work, git, gitster, josh, linux-kbuild, linux-kernel, lists,
	michael.w.mason, peff, nico-linuxsetlocalversion

How about something like this? It ignores attributes that should have no
bearing on whether the kernel is considered dirty. Copied trees with no other
changes would no longer be marked with -dirty. Plus it works on read-only
media since no index updating is required.

Would this also be considered kosher, at least for the purposes of
setlocalversion?

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index 71f39410691b..9da4c5e83285 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -73,8 +73,10 @@ scm_version()
 			printf -- '-svn%s' "`git svn find-rev $head`"
 		fi
 
-		# Check for uncommitted changes
-		if git diff-index --name-only HEAD | grep -qv "^scripts/package"; then
+		# Check for uncommitted changes. Only check mtime and size.
+       # Ignore insequential ctime, uid, gid and inode differences.
+		if git -c "core.checkstat=minimal" diff-index --name-only HEAD | \
+				grep -qv "^scripts/package"; then
 			printf '%s' -dirty
 		fi
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Wrong -dirty suffix set by setlocalversion (was: BUG in git diff-index)
  2018-05-24 23:03               ` Mike Mason
@ 2018-05-25  3:50                 ` Marc Herbert
  0 siblings, 0 replies; 13+ messages in thread
From: Marc Herbert @ 2018-05-25  3:50 UTC (permalink / raw)
  To: Mike Mason
  Cc: andy.work, git, gitster, josh, linux-kbuild, linux-kernel, lists,
	peff, nico-linuxsetlocalversion

On 24/05/2018 16:03, Mike Mason wrote:

> diff --git a/scripts/setlocalversion b/scripts/setlocalversion
> index 71f39410691b..9da4c5e83285 100755
> --- a/scripts/setlocalversion
> +++ b/scripts/setlocalversion
> @@ -73,8 +73,10 @@ scm_version()
>  			printf -- '-svn%s' "`git svn find-rev $head`"
>  		fi
>  
> -		# Check for uncommitted changes
> -		if git diff-index --name-only HEAD | grep -qv "^scripts/package"; then
> +		# Check for uncommitted changes. Only check mtime and size.
> +       # Ignore insequential ctime, uid, gid and inode differences.
> +		if git -c "core.checkstat=minimal" diff-index --name-only HEAD | \
> +				grep -qv "^scripts/package"; then
>  			printf '%s' -dirty
>  		fi

FWIW:

Reported-by: Marc.Herbert@intel.com
Reviewed-by: Marc.Herbert@intel.com  (assuming a future and decent commit message)
Tested-by: Marc.Herbert@intel.com


So the real use case is making a copy of a whole tree before building.
Typical in automated builds, old example:
https://groups.google.com/a/chromium.org/d/msg/chromium-os-dev/zxOa0OLWFkw/N_Sb7EZOBwAJ 

Here's a more complex but faster and more transparent way to test Mike's fix
than copying an entire tree:

# Make sure you start from a clean state
git describe --dirty      # must not -dirty

make prepare

# Simulate a copy of the tree but with just one file
rsync --perms --times  README   README.mtime_backup
rm  README
rsync --perms --times  README.mtime_backup   README
stat  README  README.mtime_backup 

# Demo the BUG fixed by Mike
./scripts/setlocalversion # -dirty BUG! because spurious inode ctime difference
git diff-index  HEAD
git describe --dirty      # not -dirty
./scripts/setlocalversion # not -dirty any more cause describe refreshed index

# Make sure mtime still causes -dirty with AND without Mike's fix
touch README
./scripts/setlocalversion # -dirty

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-05-25  3:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-31 12:39 BUG in git diff-index Andy Lowry
2016-03-31 13:57 ` Carlos Martín Nieto
2016-03-31 14:05 ` Jeff King
2016-03-31 14:12   ` Andy Lowry
2016-03-31 14:27     ` Jeff King
2016-03-31 19:30       ` Andy Lowry
2016-03-31 20:39         ` Junio C Hamano
2017-09-26 19:46           ` Marc Herbert
2017-09-26 20:11             ` Eric Wong
2017-09-26 23:27               ` Google indexing https://public-inbox.org/git (was: BUG in git diff-index) Marc Herbert
2017-09-27 21:06             ` Wrong -dirty suffix set by setlocalversion " Marc Herbert
2018-05-24 23:03               ` Mike Mason
2018-05-25  3:50                 ` Marc Herbert

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).