git@vger.kernel.org mailing list mirror (one of many)
* Weird behaviour of git diff-index in container
@ 2022-05-09 22:42 Timo Funke
  2022-05-09 23:18 ` Junio C Hamano
  2022-05-10  2:42 ` Jonathan Nieder
  0 siblings, 2 replies; 4+ messages in thread
From: Timo Funke @ 2022-05-09 22:42 UTC (permalink / raw)
  To: git@vger.kernel.org

What did you do before the bug happened? (Steps to reproduce your issue)

mkdir test
cd test
git init
touch test
git add test
git commit -m 'init'
podman run --rm -it -v `pwd`:/git:z --entrypoint sh docker.io/alpine
> container# apk add git
> container# cd /git
> container# git diff-index --quiet HEAD -- ; echo $?
1
> container# git diff-index --quiet HEAD -- ; echo $?
1
> container# git status
On branch master
nothing to commit, working tree clean
> container# git diff-index --quiet HEAD -- ; echo $?
0


What did you expect to happen? (Expected behavior)
`git diff-index --quiet HEAD -- ; echo $?` should return `0`
even without executing `git status`.

What happened instead? (Actual behavior)
Without executing `git status` `git diff-index --quiet HEAD -- ; echo $?`
will repeatedly print `1`.

What's different between what you expected and what actually happened?
It is odd that `git diff-index --quiet HEAD -- ; echo $?` prints
different results depending on whether `git status` was executed.

Anything else you want to add: Perhaps this has to do with running git in a container?


[System Info]
git version:
git version 2.34.2
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 4.18.0-348.20.1.el8_5.x86_64 #1 SMP Thu Mar 10 20:59:28 UTC 2022 x86_64
compiler info: gnuc: 10.3
libc info: no libc information available
$SHELL (typically, interactive shell): <unset>


[Enabled Hooks]
not run from a git repository - no hooks to show


* Re: Weird behaviour of git diff-index in container
  2022-05-09 22:42 Weird behaviour of git diff-index in container Timo Funke
@ 2022-05-09 23:18 ` Junio C Hamano
  2022-05-10  2:42 ` Jonathan Nieder
  1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2022-05-09 23:18 UTC (permalink / raw)
  To: Timo Funke; +Cc: git@vger.kernel.org

Timo Funke <timoses@msn.com> writes:

>> container# git diff-index --quiet HEAD -- ; echo $?
> 1
>> container# git status
> On branch master
> nothing to commit, working tree clean
>> container# git diff-index --quiet HEAD -- ; echo $?
> 0

This is unfortunately very much expected and, doubly unfortunately,
not very well documented.  Patches to update the documentation are
very welcome, but such a patch cannot be written in a void, so let me
explain what is going on.

To quickly detect paths that have not been modified, Git uses a
mechanism called "cached stat data" in the index.  Among the cached
stat data is the timestamp of the last modification of each file.
When Git checks a file and finds that its contents have not been
modified, it records the file timestamp observed at the time of that
check.  The next time somebody asks "please compute 'git diff'", Git
can notice that the timestamp of the working tree file hasn't changed
and say "no, there is no change" without looking at the contents.

Now, when the file on the filesystem is "touched" in a way that
updates its timestamp without changing the contents (so that, without
the above optimization, diff would have said "no change"), the cached
stat data no longer matches and Git will think there is a change in
the file.
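
A minimal sketch of this effect, assuming a throwaway repository
created with mktemp (the file name, committer identity and timestamp
are made up for illustration).  Only the mtime changes, never the
contents, and no container is involved:

```shell
#!/bin/sh
# Hypothetical demo repository; all names below are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
: >file
git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

git diff-index --quiet HEAD --; before=$?   # cached stat data still matches

# Change only the timestamp; the contents stay empty.
touch -t 200101010000 file
git diff-index --quiet HEAD --; after=$?    # cached stat data is now stale

echo "before touch: $before, after touch: $after"
```

On a typical setup this should report 0 before the touch and 1 after
it, even though the file contents never changed.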

There are two levels of Git subcommands.  Porcelain commands, like
"git diff", are end-user facing and are optimized more for usability
than performance.  "git diff --quiet HEAD --" in the above scenario
WILL notice that there is no change in the contents after all and
exit with 0 (unless the diff.autoRefreshIndex configuration is set to
false).  They do so by refreshing the "cached stat data"
automatically before using it; that operation is called "refreshing
the index" (hence the name of the configuration variable that
disables it).
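
To compare the two levels side by side, here is a sketch under
similar assumptions (temporary repository, made-up file name).  It
passes diff.autoRefreshIndex=true explicitly so that a differing
user configuration does not affect the Porcelain result:

```shell
#!/bin/sh
# Hypothetical demo repository; all names below are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
: >file
git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Make the cached stat data stale without changing the contents.
touch -t 200101010000 file

# Plumbing: trusts the cached stat data as-is.
git diff-index --quiet HEAD --; plumbing=$?

# Porcelain: looks at the contents when the stat data does not match.
git -c diff.autoRefreshIndex=true diff --quiet HEAD --; porcelain=$?

echo "diff-index: $plumbing, diff: $porcelain"
```

The plumbing command is run first on purpose, so that its result
cannot be influenced by whatever the Porcelain command does to the
index.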

On the other hand, plumbing commands, like "git diff-files" and "git
diff-index", are designed to be used in scripts, possibly many times
in a row, and do not want to pay the cost of refreshing the index
every time before working.  The correct way to use them in a
repository whose current state you do not know is to first "refresh
the index" by running the command to do so, e.g. "git update-index
--refresh", before doing anything else.
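
As a sketch of that advice (again in a temporary repository with
made-up names), refreshing the index first makes the plumbing command
report the expected result:

```shell
#!/bin/sh
# Hypothetical demo repository; all names below are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
: >file
git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Make the cached stat data stale without changing the contents.
touch -t 200101010000 file
git diff-index --quiet HEAD --; stale=$?

# Refresh the index, then ask again.
git update-index -q --refresh
git diff-index --quiet HEAD --; refreshed=$?

echo "stale index: $stale, refreshed index: $refreshed"
```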

If you were to run "git diff-files" and "git diff-index HEAD" in a
row in order to compute what "git status" would give you, for
example, you neither need nor want to pay the cost of refreshing
the index twice.  You run "git update-index --refresh" once, and
then run "git diff-files".  Doing so does not change the contents
of the working tree files, so you do not have to refresh the index
again after that, before running "git diff-index HEAD".  That is why
these plumbing commands do not refresh the index themselves.  They
expect you to have refreshed the index before you call them.
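
The refresh-once pattern could be wrapped up roughly as follows.  The
helper name worktree_is_clean is hypothetical, and the --cached flag
on diff-index is a choice made for this sketch so that the second
command compares HEAD against the index only:

```shell
#!/bin/sh
# worktree_is_clean is a hypothetical helper, not a Git command.
# It refreshes the index once, then runs two plumbing commands.
worktree_is_clean() {
	git update-index -q --refresh >/dev/null 2>&1 || :
	git diff-files --quiet &&                # index vs. working tree
	git diff-index --cached --quiet HEAD --  # HEAD vs. index
}

# Illustrative usage in a throwaway repository.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
: >file
git add file
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

touch -t 200101010000 file      # timestamp-only change
worktree_is_clean; clean=$?

echo change >file               # real content change
worktree_is_clean; dirty=$?

echo "after touch: $clean, after edit: $dirty"
```

A timestamp-only change should count as clean here, while a real
edit should not.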

"git status" is one of the commands (as a Porcelain) that refreshes
the index automatically, so it is very much understandable that the
same "diff-index --quiet" behaves differently after running it once
and until you touch/smudge the working tree files.



* Re: Weird behaviour of git diff-index in container
  2022-05-09 22:42 Weird behaviour of git diff-index in container Timo Funke
  2022-05-09 23:18 ` Junio C Hamano
@ 2022-05-10  2:42 ` Jonathan Nieder
  2022-05-10 16:47   ` Junio C Hamano
  1 sibling, 1 reply; 4+ messages in thread
From: Jonathan Nieder @ 2022-05-10  2:42 UTC (permalink / raw)
  To: Timo Funke; +Cc: git@vger.kernel.org

Hi!

Timo Funke wrote:

> podman run --rm -it -v `pwd`:/git:z --entrypoint sh docker.io/alpine
> > container# apk add git
> > container# cd /git
> > container# git diff-index --quiet HEAD -- ; echo $?
> 1
> > container# git diff-index --quiet HEAD -- ; echo $?
> 1
> > container# git status
> On branch master
> nothing to commit, working tree clean
> > container# git diff-index --quiet HEAD -- ; echo $?
> 0
>
>
> What did you expect to happen? (Expected behavior)
> `git diff-index --quiet HEAD -- ; echo $?` should return `0`
> even without executing `git status`.
>
> What happened instead? (Actual behavior)
> Without executing `git status` `git diff-index --quiet HEAD -- ; echo $?`
> will repeatedly print `1`.
>
> What's different between what you expected and what actually happened?
> It is odd that `git diff-index --quiet HEAD -- ; echo $?` prints
> different results depending on whether `git status` was executed.

I love this example.  Thanks for writing.

I checked "git help diff-index" to see whether it describes this
pitfall, and I didn't see an explanation.  So at the very least you
have uncovered a documentation bug.

The difference between diff-index and status here is a difference
between "porcelain" (user-facing) commands and "plumbing"
(script-facing) commands.  In Git's index file there is stat(2)
information for each file; if that stat(2) information matches the
corresponding file in the working directory then we know it hasn't
been modified relative to what is in the index.  If the stat(2)
information differs from the working copy, on the other hand, the
behavior depends on whether the command being run is porcelain or
plumbing:

 - plumbing commands assume that the script author has run "git
   update-index --refresh -q" first to update the stat(2) information
   if the file hasn't changed.  This allows efficient scripts to
   refresh the index once and then run multiple commands that rely on
   the result of that:

	# Refresh once, then reuse the result ("${revs[@]}" is a bash array).
	git update-index --refresh -q || :
	for rev in "${revs[@]}"
	do
		if git diff-index --quiet "$rev" --
		then
			: # ... do something ...
		fi
	done

 - porcelain commands such as "git status" implicitly refresh the
   index before doing anything else.  This allows them to produce the
   expected result even if the repository is a copy made using "cp -a"
   or has been transferred across machines on a USB stick.

Some places I expected to find an explanation of this:

- documentation for the "git diff-index" command ("git help
  diff-index").  It does not mention this behavior.

- documentation for the "git diff" command ("git help diff").  It also
  doesn't mention this.  That's particularly surprising because it
  would be a great place to document the diff.autoRefreshIndex setting
  that affects this behavior of the "git diff" command (described in
  Documentation/config/diff.txt).

- the Git user manual (Documentation/user-manual.txt).  It describes
  "git update-index --refresh" but very briefly.  It doesn't describe
  the above scripting pattern.

- Git's command-line conventions ("git help cli").  No mention.

- overview of plumbing and porcelain commands ("man git").  No
  mention.

- the Git scripting manual ("git help core-tutorial").  It describes
  "git update-index --refresh" after a "cp -a" but not its use in
  scripts.

- the history of Git's contrib/examples/.  This contains many examples
  of the above scripting pattern but is not very discoverable.

So there are many opportunities for someone to document this better.
If you'd be interested in pursuing that, I'd be happy to provide some
pointers.

Thanks,
Jonathan


* Re: Weird behaviour of git diff-index in container
  2022-05-10  2:42 ` Jonathan Nieder
@ 2022-05-10 16:47   ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2022-05-10 16:47 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Timo Funke, git@vger.kernel.org

Jonathan Nieder <jrnieder@gmail.com> writes:

> I love this example.  Thanks for writing.

I guess our mails crossed ;-)

> Some places I expected to find an explanation of this:
>
> - documentation for the "git diff-index" command ("git help
>   diff-index").  It does not mention this behavior.

Yes, diff-index and diff-files should at least have a pointer to
"update-index --refresh".  Ideally they should share a write-up
based on what both of us covered in these responses.

> - documentation for the "git diff" command ("git help diff").  It also
>   doesn't mention this.  That's particularly surprising because it
>   would be a great place to document the diff.autoRefreshIndex setting
>   that affects this behavior of the "git diff" command (described in
>   Documentation/config/diff.txt).

And the autorefreshindex documentation is a tad stale (it is on by
default these days) and does not say why you would want it.  I do
not mind config/diff.txt having it, but that should eventually refer
to the same page that is designed to help the readers of the
diff-index and diff-files documentation.

I do not think the missing info belongs anywhere else, but
stepping back a bit, it may help to have a write-up on scripting
using the plumbing commands in general, not limited to the "diff-*"
family of commands.  I actually am torn a bit, as we have long
neglected to give matching improvements to plumbing commands when we
add shiny new toys to commands at the Porcelain level, so Git may
have grown much more hostile to scripters over the years X-<.


