git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Duplicate safecrlf warning for racily clean index entry
@ 2018-02-20 13:42 Matt McCutchen
  2018-02-21  7:53 ` Torsten Bögershausen
  2018-02-21 13:47 ` Matt McCutchen
  0 siblings, 2 replies; 5+ messages in thread
From: Matt McCutchen @ 2018-02-20 13:42 UTC (permalink / raw)
  To: git

I noticed that if a file subject to a safecrlf warning is added to the
index in the same second that it is created, resulting in a "racily
clean" index entry, then a subsequent "git add" command prints another
safecrlf warning.  I reproduced this on the current "next"
(499d7c4f91).  The procedure:

$ git init
$ git config core.autocrlf true
$ echo foo >file1 && git add file1 && git add file1
warning: LF will be replaced by CRLF in file1.
The file will have its original line endings in your working directory.
warning: LF will be replaced by CRLF in file1.
The file will have its original line endings in your working directory.
$ echo bar >file2 && sleep 1 && git add file2 && git add file2
warning: LF will be replaced by CRLF in file2.
The file will have its original line endings in your working directory.

This came up when I ran the test suite for Braid on Windows
(https://github.com/cristibalan/braid/issues/77).

The phenomenon actually seems to be more general: touching the file
causes the next "git add" to print a safecrlf warning, suggesting that
the warning occurs whenever the index entry is dirty.  One could argue
that a new warning is reasonable after touching the file, but it seems
clear that "racy cleanliness" is an implementation detail that
shouldn't have user-visible nondeterministic effects.

In either case, if "git update-index --refresh" (or "git status") is
run before "git add", then "git add" does not print the warning.  On
the other hand, if line endings in the working tree file are changed,
then git shows the file as having an unstaged change, even though the
content that would be added to the index after CRLF conversion is
identical.  So it seems that git remembers the pre-conversion file
content and uses it for "git update-index --refresh" and would just
need to use it for "git add" as well.

Thoughts about the proposed change?  Does someone want to work on it or
give me a pointer to where to get started?

Thanks,
Matt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Duplicate safecrlf warning for racily clean index entry
  2018-02-20 13:42 Duplicate safecrlf warning for racily clean index entry Matt McCutchen
@ 2018-02-21  7:53 ` Torsten Bögershausen
  2018-02-21 13:57   ` Matt McCutchen
  2018-02-21 13:47 ` Matt McCutchen
  1 sibling, 1 reply; 5+ messages in thread
From: Torsten Bögershausen @ 2018-02-21  7:53 UTC (permalink / raw)
  To: Matt McCutchen; +Cc: git

On Tue, Feb 20, 2018 at 08:42:26AM -0500, Matt McCutchen wrote:
> I noticed that if a file subject to a safecrlf warning is added to the
> index in the same second that it is created, resulting in a "racily
> clean" index entry, then a subsequent "git add" command prints another
> safecrlf warning.  I reproduced this on the current "next"
> (499d7c4f91).  The procedure:
> 
> $ git init
> $ git config core.autocrlf true
> $ echo foo >file1 && git add file1 && git add file1
> warning: LF will be replaced by CRLF in file1.
> The file will have its original line endings in your working directory.
> warning: LF will be replaced by CRLF in file1.
> The file will have its original line endings in your working directory.
> $ echo bar >file2 && sleep 1 && git add file2 && git add file2
> warning: LF will be replaced by CRLF in file2.
> The file will have its original line endings in your working directory.
> 
> This came up when I ran the test suite for Braid on Windows
> (https://github.com/cristibalan/braid/issues/77).

I think a .gitattributes file could/should be used. I'll answer
there seperatly.

> 
> The phenomenon actually seems to be more general: touching the file
> causes the next "git add" to print a safecrlf warning, suggesting that
> the warning occurs whenever the index entry is dirty.  One could argue
> that a new warning is reasonable after touching the file, but it seems
> clear that "racy cleanliness" is an implementation detail that
> shouldn't have user-visible nondeterministic effects.
> 
> In either case, if "git update-index --refresh" (or "git status") is
> run before "git add", then "git add" does not print the warning.  On
> the other hand, if line endings in the working tree file are changed,
> then git shows the file as having an unstaged change, even though the
> content that would be added to the index after CRLF conversion is
> identical.  So it seems that git remembers the pre-conversion file
> content and uses it for "git update-index --refresh" and would just
> need to use it for "git add" as well.
> 
> Thoughts about the proposed change?  Does someone want to work on it or
> give me a pointer to where to get started?

Good analyzes, thanks for that.

I don't hava a pointer, but what should happen ?
2 warnings for 2 "git add" should be OK, I think.

1 warning is part of the optimization, that Git does to handle
hundrets and thousands of files efficciently.

Is the 1/2 warning  real live problem  ?

> 
> Thanks,
> Matt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Duplicate safecrlf warning for racily clean index entry
  2018-02-20 13:42 Duplicate safecrlf warning for racily clean index entry Matt McCutchen
  2018-02-21  7:53 ` Torsten Bögershausen
@ 2018-02-21 13:47 ` Matt McCutchen
  2018-02-21 17:20   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Matt McCutchen @ 2018-02-21 13:47 UTC (permalink / raw)
  To: git

On Tue, 2018-02-20 at 08:42 -0500, Matt McCutchen wrote:
> In either case, if "git update-index --refresh" (or "git status") is
> run before "git add", then "git add" does not print the warning.  On
> the other hand, if line endings in the working tree file are changed,
> then git shows the file as having an unstaged change, even though the
> content that would be added to the index after CRLF conversion is
> identical.  So it seems that git remembers the pre-conversion file
> content and uses it for "git update-index --refresh" and would just
> need to use it for "git add" as well.

On further testing, this analysis is wrong.  What I was seeing is that
if the size of the working tree file has changed, git reports an
unstaged change.  (I suppose that reporting an unstaged change in this
case without checking whether the post-conversion content has changed
may be an important optimization.)  If the line endings are changed
without changing the size or post-conversion content, then no unstaged
change is reported.  It does not appear that git saves the pre-
conversion content.

Thus, if it were possible to create a file that doesn't need a safecrlf
warning, add it to the index, and then modify it so that it does need a
safecrlf warning without changing the size or post-conversion content,
we would have a bug where no warning is shown in the case where "git
status" is run before the second "git add".  I believe this bug can't
occur in the particular case of CRLF conversion without other filters
because the file that doesn't need a safecrlf warning has a unique
minimum (LF) or maximum (CRLF) size, though I presume it could occur
with custom filters.  My proposal would then be that "git add" should
not show a safecrlf warning if the size and post-conversion content
haven't changed; it would merely bring "git add" to parity with the
potential bug in the "git status" case.

Matt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Duplicate safecrlf warning for racily clean index entry
  2018-02-21  7:53 ` Torsten Bögershausen
@ 2018-02-21 13:57   ` Matt McCutchen
  0 siblings, 0 replies; 5+ messages in thread
From: Matt McCutchen @ 2018-02-21 13:57 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

On Wed, 2018-02-21 at 08:53 +0100, Torsten Bögershausen wrote:
> I don't hava a pointer, but what should happen ?
> 2 warnings for 2 "git add" should be OK, I think.
> 
> 1 warning is part of the optimization, that Git does to handle
> hundrets and thousands of files efficciently.
> 
> Is the 1/2 warning  real live problem  ?

As I've suggested, my opinion is that the nondeterministic second
warning can result in significant user confusion and should be avoided.
 (If it were always shown, I'd be less concerned.)  We'll see what the
decision-makers think.

Matt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Duplicate safecrlf warning for racily clean index entry
  2018-02-21 13:47 ` Matt McCutchen
@ 2018-02-21 17:20   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2018-02-21 17:20 UTC (permalink / raw)
  To: Matt McCutchen; +Cc: git

Matt McCutchen <matt@mattmccutchen.net> writes:

> ... may be an important optimization.)  If the line endings are changed
> without changing the size or post-conversion content, then no unstaged
> change is reported.  It does not appear that git saves the pre-
> conversion content.

Correct.  The cached-stat information is meant to be compared with
the size on the filesystem.  Based on your observation, it seems
that what you are seeing is not specific to safe-crlf thing.

If you reconfigure anything that affects the checkout conversion
codepath, including the "smudge" filter, an entry in the index that
used to be up-to-date will still have cached-stat info like
timestamp and size that match the on-disk file, even though if you
_were_ to check it out afresh out of the index, the reconfigured
checkout codepath may produce different file contents on-disk.  A
consequence of this is that you may cause Git to still say that the
path is clean, even though it is no longer true.

There is no single "right" solution out of this situation, as it
depends on the reason why you made such a reconfiguration in the
first place.

 - If the reason is because you found that what is stored in the
   index is correct but their contents are checked out incorrectly
   (e.g. both the index and the working tree files end their lines
   with LF, but you want your working tree files to be converted to
   CRLF, and you futzed with .gitattributes or core.crlf), then you
   would want to "correct" it by checking them out, bypassing the
   "if the cached-stat information says we already have the matching
   contents on disk, do not write the file out" optimization.

 - If the reason is the other way around, then you would want to
   "correct" the indexed contents by rehashing what you have on
   disk.  Perhaps a recently added "git add --renormalize" is what
   you are looking for.




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-02-21 17:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-20 13:42 Duplicate safecrlf warning for racily clean index entry Matt McCutchen
2018-02-21  7:53 ` Torsten Bögershausen
2018-02-21 13:57   ` Matt McCutchen
2018-02-21 13:47 ` Matt McCutchen
2018-02-21 17:20   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).