git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git filter bug
@ 2022-06-10 22:19 Udoff, Marc
  2022-06-11  7:43 ` Johannes Sixt
  0 siblings, 1 reply; 6+ messages in thread
From: Udoff, Marc @ 2022-06-10 22:19 UTC (permalink / raw)
  To: git@vger.kernel.org; +Cc: Shupak, Vitaly

Hi,

I believe there is a bug in git status that happens when a file changes but the filtered version of the file does not. Correctly, git diff does not show anything as different and git commit believes there is nothing to commit.

Reproducer:
$ git init
$ touch bar
$ git add bar
$ git commit -am 'Bar'
[main (root-commit) dd12b3e] Bar
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 bar
$ echo -en '\n[filter "noat"]\n     clean = grep -v "@"\n' >> .git/config
$ cat .git/config 
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true

[filter "noat"]
     clean = grep -v "@"
$ echo -en 'abc\n@def\nghi\n' > bar 
$ cat bar 
abc
@def
ghi
$ echo "* filter=noat" > .gitattributes
$ git commit -am 'No at bar'
[main e81ee3b] No at bar
2 files changed, 3 insertions(+)
create mode 100644 .gitattributes
$ git show HEAD:bar
abc
ghi
$ echo "@another line" >> bar # Add another @ which will be filtered. touch doesn't cause this bug
$ git status --porcelain
M bar
$ git diff # no output as there is no diff
$ git commit -am "I did not update"
On branch main
nothing to commit, working tree clean


While this reproducer is a bit contrived, the real world examples are with Jupyter notebooks filtering output, so I expect this is a somewhat common occurrence. I think it also may be the same as https://stackoverflow.com/questions/62641222/how-to-make-git-status-consider-the-clean-filter.

Git version: 2.35.1
OS: Linux


Note: I also logged this here, but I believe this mailing list is the correct place to raise the issue: https://github.com/gitgitgadget/git/issues/1256

Thanks,
Marc


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git filter bug
  2022-06-10 22:19 git filter bug Udoff, Marc
@ 2022-06-11  7:43 ` Johannes Sixt
  2022-06-13 17:29   ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Sixt @ 2022-06-11  7:43 UTC (permalink / raw)
  To: Udoff, Marc; +Cc: Shupak, Vitaly, git@vger.kernel.org

Am 11.06.22 um 00:19 schrieb Udoff, Marc:
> Hi,
> 
> I believe there is a bug in git status that happens when a file
> changes but the filtered version of the file does not. Correctly,
> git diff does not show anything as different and git commit believes
> there is nothing to commit.
> 
> Reproducer:
> $ git init
> $ touch bar
> $ git add bar
> $ git commit -am 'Bar'
> [main (root-commit) dd12b3e] Bar
> 1 file changed, 0 insertions(+), 0 deletions(-)
> create mode 100644 bar
> $ echo -en '\n[filter "noat"]\n     clean = grep -v "@"\n' >> .git/config
> $ cat .git/config 
> [core]
>         repositoryformatversion = 0
>         filemode = true
>         bare = false
>         logallrefupdates = true
> 
> [filter "noat"]
>      clean = grep -v "@"
> $ echo -en 'abc\n@def\nghi\n' > bar 
> $ cat bar 
> abc
> @def
> ghi
> $ echo "* filter=noat" > .gitattributes
> $ git commit -am 'No at bar'
> [main e81ee3b] No at bar
> 2 files changed, 3 insertions(+)
> create mode 100644 .gitattributes
> $ git show HEAD:bar
> abc
> ghi
> $ echo "@another line" >> bar # Add another @ which will be filtered. touch doesn't cause this bug
> $ git status --porcelain
> M bar

git status does not compute differences; it only looks at the stat
information, and that is by design for performance reasons. So, IMO,
this is working as designed and not a bug.

-- Hannes

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git filter bug
  2022-06-11  7:43 ` Johannes Sixt
@ 2022-06-13 17:29   ` Junio C Hamano
  2022-06-13 21:15     ` Johannes Sixt
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2022-06-13 17:29 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Udoff, Marc, Shupak, Vitaly, git@vger.kernel.org

Johannes Sixt <j6t@kdbg.org> writes:

> git status does not compute differences; it only looks at the stat
> information, and that is by design for performance reasons. So, IMO,
> this is working as designed and not a bug.

Hmph, is that true?  I thought "git status" did an equivalent of
diff.autoRefreshIndex just like other commands like "git diff" at
the Porcelain level.

Is this more like the commonly seen "after you futzed the attributes
to affect normalization, "--renormalize" is needed to force the
index to match the cleaned version of working tree under the new
clean filter rules", I wonder?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git filter bug
  2022-06-13 17:29   ` Junio C Hamano
@ 2022-06-13 21:15     ` Johannes Sixt
  2022-06-14 19:11       ` Shupak, Vitaly
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Sixt @ 2022-06-13 21:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Udoff, Marc, Shupak, Vitaly, git@vger.kernel.org

Am 13.06.22 um 19:29 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> git status does not compute differences; it only looks at the stat
>> information, and that is by design for performance reasons. So, IMO,
>> this is working as designed and not a bug.
> 
> Hmph, is that true?  I thought "git status" did an equivalent of
> diff.autoRefreshIndex just like other commands like "git diff" at
> the Porcelain level.

Is it true? I don't know; you tell me ;) git status certainly does
autoRefreshIndex, but is that based on a diff computation? I thought git
status looks only at stat information.

> Is this more like the commonly seen "after you futzed the attributes
> to affect normalization, "--renormalize" is needed to force the
> index to match the cleaned version of working tree under the new
> clean filter rules", I wonder?

Not in this case. The modified file that git status reports happens long
after git commit -a has already applied the new filter.

-- Hannes

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: git filter bug
  2022-06-13 21:15     ` Johannes Sixt
@ 2022-06-14 19:11       ` Shupak, Vitaly
  2022-06-23 12:13         ` Udoff, Marc
  0 siblings, 1 reply; 6+ messages in thread
From: Shupak, Vitaly @ 2022-06-14 19:11 UTC (permalink / raw)
  To: Johannes Sixt, Junio C Hamano; +Cc: Udoff, Marc, git@vger.kernel.org

Here's the behavior that I observe:
- If the mtime of the normal file changes from what's in the index but the content doesn't change, "git status" updates the index with the latest timestamp of the file.
- If the filtered file changes, but the size stays the same, git status also triggers an index update.
- BUT if the size of the filtered file changes, then the index does NOT get updated and the file appears modified on every git status run until you explicitly run "git add <filename>" again. This is true even if the post-clean filter content is the same as what's currently in the index.
- The clean filter runs on every "git status" call anyway, so this behavior does not appear to be an optimization.

So if a file is modified such that the post-clean filter content is the same as what's in the index, "git status" will show the file as modified only if the file size has also changed. It seems that perhaps "git status" is comparing file sizes before applying the clean filter to see if the index entry needs to be refreshed? 

Vitaly

-----Original Message-----
From: Johannes Sixt <j6t@kdbg.org> 
Sent: Monday, June 13, 2022 5:15 PM
To: Junio C Hamano <gitster@pobox.com>
Cc: Udoff, Marc <Marc.Udoff@deshaw.com>; Shupak, Vitaly <Vitaly.Shupak@deshaw.com>; git@vger.kernel.org
Subject: Re: git filter bug

This message was sent by an external party.


Am 13.06.22 um 19:29 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
>
>> git status does not compute differences; it only looks at the stat 
>> information, and that is by design for performance reasons. So, IMO, 
>> this is working as designed and not a bug.
>
> Hmph, is that true?  I thought "git status" did an equivalent of 
> diff.autoRefreshIndex just like other commands like "git diff" at the 
> Porcelain level.

Is it true? I don't know; you tell me ;) git status certainly does autoRefreshIndex, but is that based on a diff computation? I thought git status looks only at stat information.

> Is this more like the commonly seen "after you futzed the attributes 
> to affect normalization, "--renormalize" is needed to force the index 
> to match the cleaned version of working tree under the new clean 
> filter rules", I wonder?

Not in this case. The modified file that git status reports happens long after git commit -a has already applied the new filter.

-- Hannes

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: git filter bug
  2022-06-14 19:11       ` Shupak, Vitaly
@ 2022-06-23 12:13         ` Udoff, Marc
  0 siblings, 0 replies; 6+ messages in thread
From: Udoff, Marc @ 2022-06-23 12:13 UTC (permalink / raw)
  To: Shupak, Vitaly, Johannes Sixt, Junio C Hamano; +Cc: git@vger.kernel.org

Based on the conversation below, I'm not quite sure if there is consensus this is a bug and not a feature. If this is a bug, what are the next steps to getting it fixed? If this is something that someone with minimal codebase experience can fix, we'd be happy to submit a PR if you could give us a few pointers.

Thanks!

-----Original Message-----
From: Shupak, Vitaly <Vitaly.Shupak@deshaw.com> 
Sent: Tuesday, June 14, 2022 3:12 PM
To: Johannes Sixt <j6t@kdbg.org>; Junio C Hamano <gitster@pobox.com>
Cc: Udoff, Marc <Marc.Udoff@deshaw.com>; git@vger.kernel.org
Subject: RE: git filter bug

Here's the behavior that I observe:
- If the mtime of the normal file changes from what's in the index but the content doesn't change, "git status" updates the index with the latest timestamp of the file.
- If the filtered file changes, but the size stays the same, git status also triggers an index update.
- BUT if the size of the filtered file changes, then the index does NOT get updated and the file appears modified on every git status run until you explicitly run "git add <filename>" again. This is true even if the post-clean filter content is the same as what's currently in the index.
- The clean filter runs on every "git status" call anyway, so this behavior does not appear to be an optimization.

So if a file is modified such that the post-clean filter content is the same as what's in the index, "git status" will show the file as modified only if the file size has also changed. It seems that perhaps "git status" is comparing file sizes before applying the clean filter to see if the index entry needs to be refreshed? 

Vitaly

-----Original Message-----
From: Johannes Sixt <j6t@kdbg.org>
Sent: Monday, June 13, 2022 5:15 PM
To: Junio C Hamano <gitster@pobox.com>
Cc: Udoff, Marc <Marc.Udoff@deshaw.com>; Shupak, Vitaly <Vitaly.Shupak@deshaw.com>; git@vger.kernel.org
Subject: Re: git filter bug

This message was sent by an external party.


Am 13.06.22 um 19:29 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
>
>> git status does not compute differences; it only looks at the stat 
>> information, and that is by design for performance reasons. So, IMO, 
>> this is working as designed and not a bug.
>
> Hmph, is that true?  I thought "git status" did an equivalent of 
> diff.autoRefreshIndex just like other commands like "git diff" at the 
> Porcelain level.

Is it true? I don't know; you tell me ;) git status certainly does autoRefreshIndex, but is that based on a diff computation? I thought git status looks only at stat information.

> Is this more like the commonly seen "after you futzed the attributes 
> to affect normalization, "--renormalize" is needed to force the index 
> to match the cleaned version of working tree under the new clean 
> filter rules", I wonder?

Not in this case. The modified file that git status reports happens long after git commit -a has already applied the new filter.

-- Hannes

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-06-23 12:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10 22:19 git filter bug Udoff, Marc
2022-06-11  7:43 ` Johannes Sixt
2022-06-13 17:29   ` Junio C Hamano
2022-06-13 21:15     ` Johannes Sixt
2022-06-14 19:11       ` Shupak, Vitaly
2022-06-23 12:13         ` Udoff, Marc

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).