git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Line ending normalization doesn't work as expected
@ 2017-10-03 15:00 Robert Dailey
  2017-10-03 16:26 ` Torsten Bögershausen
  0 siblings, 1 reply; 34+ messages in thread
From: Robert Dailey @ 2017-10-03 15:00 UTC (permalink / raw)
  To: Git

I'm on Windows using Git for Windows v2.13.1. Following github's
recommended process for fixing line endings after adding a
`.gitattributes` file[1], I run the following:

$ rm .git/index && git reset

Once I run `git status`, I see that no files have changed. Note that I
know for a fact in my repository, files were committed using CRLF line
endings (the files in question are C# code files, and no
.gitattributes was present at the time).

I also tried this:

$ git rm -r --cached . && git reset --hard

However, again `git status` shows no working copy modifications. The
one thing that *did* work (and I tried this on accident actually) is:

$ git rm -r --cached . && git add .

This properly showed all files in my index with line ending
modifications (I ran `git diff --cached -R` to be sure; the output
shows `^M` at the end of each line in the diff in this case). Note
that my global git config has `core.autocrlf` set to `false`, but I
also tried the top 2 commands above with it set to `true` but it made
no difference.

So my question is: Why do the top 2 commands not work, but the third
one does? To me this all feels like magic / nondeterministic, so I'm
hoping someone here knows what is going on and can explain the logic
of it. Also if this is a git config issue, I'm not sure what it could
be. Note my `.gitattributes` just has this in it:

* text=auto

Thanks in advance.


[1]: https://help.github.com/articles/dealing-with-line-endings/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-03 15:00 Line ending normalization doesn't work as expected Robert Dailey
@ 2017-10-03 16:26 ` Torsten Bögershausen
  2017-10-03 17:23   ` Robert Dailey
  0 siblings, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2017-10-03 16:26 UTC (permalink / raw)
  To: Robert Dailey, Git

On 2017-10-03 17:00, Robert Dailey wrote:
> I'm on Windows using Git for Windows v2.13.1. Following github's
> recommended process for fixing line endings after adding a
> `.gitattributes` file[1], I run the following:
> 
> $ rm .git/index && git reset
> 
> Once I run `git status`, I see that no files have changed. Note that I
> know for a fact in my repository, files were committed using CRLF line
> endings (the files in question are C# code files, and no
> .gitattributes was present at the time).
> 
> I also tried this:
> 
> $ git rm -r --cached . && git reset --hard
> 
> However, again `git status` shows no working copy modifications. The
> one thing that *did* work (and I tried this on accident actually) is:
> 
> $ git rm -r --cached . && git add .
> 
> This properly showed all files in my index with line ending
> modifications (I ran `git diff --cached -R` to be sure; the output
> shows `^M` at the end of each line in the diff in this case). Note
> that my global git config has `core.autocrlf` set to `false`, but I
> also tried the top 2 commands above with it set to `true` but it made
> no difference.
> 
> So my question is: Why do the top 2 commands not work, but the third
> one does? To me this all feels like magic / nondeterministic, so I'm
> hoping someone here knows what is going on and can explain the logic
> of it. Also if this is a git config issue, I'm not sure what it could
> be. Note my `.gitattributes` just has this in it:

The short version is, that the instructions on Github are outdated.
This is the official procedure (since 2016, Git v2.12 or so)
But it should work even with older version of Git.

$ echo "* text=auto" >.gitattributes
$ git read-tree --empty   # Clean index, force re-scan of working directory
$ git add .
$ git status        # Show files that will be normalized
$ git commit -m "Introduce end-of-line normalization"


Could you open an issue on Github ?
(Or can someone @github fix this ?)

> 
> * text=auto
> 
> Thanks in advance.
> 
> 
> [1]: https://help.github.com/articles/dealing-with-line-endings/
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-03 16:26 ` Torsten Bögershausen
@ 2017-10-03 17:23   ` Robert Dailey
  2017-10-03 19:19     ` Torsten Bögershausen
  0 siblings, 1 reply; 34+ messages in thread
From: Robert Dailey @ 2017-10-03 17:23 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git

On Tue, Oct 3, 2017 at 11:26 AM, Torsten Bögershausen <tboegi@web.de> wrote:
> The short version is, that the instructions on Github are outdated.
> This is the official procedure (since 2016, Git v2.12 or so)
> But it should work even with older version of Git.
>
> $ echo "* text=auto" >.gitattributes
> $ git read-tree --empty   # Clean index, force re-scan of working directory
> $ git add .
> $ git status        # Show files that will be normalized
> $ git commit -m "Introduce end-of-line normalization"

Is the way I did it that worked also a valid solution? Or did it only
work accidentally? Again the command I ran that worked is:

$ git rm -r --cached . && git add .

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-03 17:23   ` Robert Dailey
@ 2017-10-03 19:19     ` Torsten Bögershausen
  2017-10-04  2:00       ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2017-10-03 19:19 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git

On 2017-10-03 19:23, Robert Dailey wrote:
> On Tue, Oct 3, 2017 at 11:26 AM, Torsten Bögershausen <tboegi@web.de> wrote:
>> The short version is, that the instructions on Github are outdated.
>> This is the official procedure (since 2016, Git v2.12 or so)
>> But it should work even with older version of Git.
>>
>> $ echo "* text=auto" >.gitattributes
>> $ git read-tree --empty   # Clean index, force re-scan of working directory
>> $ git add .
>> $ git status        # Show files that will be normalized
>> $ git commit -m "Introduce end-of-line normalization"
> 
> Is the way I did it that worked also a valid solution? Or did it only
> work accidentally? Again the command I ran that worked is:
> 
> $ git rm -r --cached . && git add .

(Both should work)

To be honest, from the documentation, I can't figure out the difference between
$ git read-tree --empty
and
$ git rm -r --cached .

Does anybody remember the discussion, why we ended up with read-tree ?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-03 19:19     ` Torsten Bögershausen
@ 2017-10-04  2:00       ` Junio C Hamano
  2017-10-04 16:26         ` Robert Dailey
  2018-02-15 15:24         ` Line ending normalization doesn't work as expected Robert Dailey
  0 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2017-10-04  2:00 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Robert Dailey, Git

Torsten Bögershausen <tboegi@web.de> writes:

>> $ git rm -r --cached . && git add .
>
> (Both should work)
>
> To be honest, from the documentation, I can't figure out the difference between
> $ git read-tree --empty
> and
> $ git rm -r --cached .
>
> Does anybody remember the discussion, why we ended up with read-tree ?

We used to use neither, and considered it fine to "rm .git/index" if
you wanted to empty the on-disk index file in the old world.  In the
modern world, folks want you to avoid touching filesystem directly
and instead want you to use Git tools, and the above are two obvious
ways to do so.

"git read-tree" (without any parameter, i.e. "read these 0 trees and
populate the index with it") and its modern and preferred synonym
"git read-tree --empty" (i.e. "I am giving 0 trees and I know the
sole effect of this command is to empty the index.") are more direct
ways to express "I want the index emptied" between the two.

The other one, "git rm -r --cached .", in the end gives you the same
state because it tells Git to "iterate over all the entries in the
index, find the ones that match pathspec '.', and remove them from
the index.".  It is not wrong per-se, but conceptually it is a bit
roundabout way to say that "I want the index emptied", I would
think.

I wouldn't be surprised if the "rm -r --cached ." were a lot slower,
due to the overhead of having to do the pathspec filtering that ends
up to be a no-op, but there shouldn't be a difference in the end
result.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-04  2:00       ` Junio C Hamano
@ 2017-10-04 16:26         ` Robert Dailey
  2017-10-04 16:59           ` Jonathan Nieder
  2017-10-04 21:17           ` Torsten Bögershausen
  2018-02-15 15:24         ` Line ending normalization doesn't work as expected Robert Dailey
  1 sibling, 2 replies; 34+ messages in thread
From: Robert Dailey @ 2017-10-04 16:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, Git

On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
>>> $ git rm -r --cached . && git add .
>>
>> (Both should work)
>>
>> To be honest, from the documentation, I can't figure out the difference between
>> $ git read-tree --empty
>> and
>> $ git rm -r --cached .
>>
>> Does anybody remember the discussion, why we ended up with read-tree ?
>
> We used to use neither, and considered it fine to "rm .git/index" if
> you wanted to empty the on-disk index file in the old world.  In the
> modern world, folks want you to avoid touching filesystem directly
> and instead want you to use Git tools, and the above are two obvious
> ways to do so.
>
> "git read-tree" (without any parameter, i.e. "read these 0 trees and
> populate the index with it") and its modern and preferred synonym
> "git read-tree --empty" (i.e. "I am giving 0 trees and I know the
> sole effect of this command is to empty the index.") are more direct
> ways to express "I want the index emptied" between the two.
>
> The other one, "git rm -r --cached .", in the end gives you the same
> state because it tells Git to "iterate over all the entries in the
> index, find the ones that match pathspec '.', and remove them from
> the index.".  It is not wrong per-se, but conceptually it is a bit
> roundabout way to say that "I want the index emptied", I would
> think.
>
> I wouldn't be surprised if the "rm -r --cached ." were a lot slower,
> due to the overhead of having to do the pathspec filtering that ends
> up to be a no-op, but there shouldn't be a difference in the end
> result.

You guys are obviously worlds ahead of me on the internals of things,
but from my perspective I like to avoid the "plumbing" commands as
much as I can. Even if I used them, if I have to tell the rest of my
team "this is the way to do it", they're going to give me dirty looks
if I ask them to run things like this that make no sense to them.
That's the argument I have to deal with when it comes to Git's
usability within the team I manage. So based on this, I'd favor the
`git rm -r --cached` approach because this is the more common result
you see in Google, and also makes a little more sense from a high
level of thought perspective. However, this is just my personal
opinion. `read-tree --empty` is far less self explanatory IMHO.

Also let's not forget the second part of the command chain that
results in the different behavior. In one case, I use `git add` which
results in proper line ending normalization. In the other case, I do
`git reset --hard` which does *NOT* result in the line endings
normalized (`git status` shows no results). In both cases, I'm still
doing `git rm -r --cached`, so I am doubtful that is the root cause
for the line ending normalization piece. I'm still trying to
understand why both give different results (root cause) and also get
an understanding of what the correct (modern) solution is for line
ending normalization (not necessarily which is the right way to
clear/delete the index, which is really AFAIK just a means to this end
and an implementation detail of sorts for this specific task).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-04 16:26         ` Robert Dailey
@ 2017-10-04 16:59           ` Jonathan Nieder
  2017-10-04 18:03             ` Robert Dailey
  2017-10-05  1:31             ` Junio C Hamano
  2017-10-04 21:17           ` Torsten Bögershausen
  1 sibling, 2 replies; 34+ messages in thread
From: Jonathan Nieder @ 2017-10-04 16:59 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Junio C Hamano, Torsten Bögershausen, Git

Hi Robert,

Robert Dailey wrote:

> You guys are obviously worlds ahead of me on the internals of things,
> but from my perspective I like to avoid the "plumbing" commands as
> much as I can.

I suspect what we are dancing around is the need for some command like

	git checkout --renormalize .

which would shorten the sequence to

	git checkout --renormalize .
	git status; # Show files that will be normalized
	git commit; # Commit the result

What do you think?  Would you be interested in writing a patch for it?
("No" is as always an acceptable answer.)

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-04 16:59           ` Jonathan Nieder
@ 2017-10-04 18:03             ` Robert Dailey
  2017-10-05  1:31             ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Robert Dailey @ 2017-10-04 18:03 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, Torsten Bögershausen, Git

On Wed, Oct 4, 2017 at 11:59 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi Robert,
>
> Robert Dailey wrote:
>
>> You guys are obviously worlds ahead of me on the internals of things,
>> but from my perspective I like to avoid the "plumbing" commands as
>> much as I can.
>
> I suspect what we are dancing around is the need for some command like
>
>         git checkout --renormalize .
>
> which would shorten the sequence to
>
>         git checkout --renormalize .
>         git status; # Show files that will be normalized
>         git commit; # Commit the result
>
> What do you think?  Would you be interested in writing a patch for it?
> ("No" is as always an acceptable answer.)

I wish I could, but ultimately I'd probably not be able to do it. I
rarely have time to do recreational coding outside of work these days.

That aside, for now I want to know the proper & recommended method to
renormalize line endings using existing commands. Additionally I also
am interested in knowing why only 1 of the 3 solutions I tried (In my
OP) worked but the others didn't. My short term goal is just to get
educated a bit. There's so much conflicting and variable information
on this topic on Google. It makes it difficult to find the one true
path, especially since Git evolves and improves over time and
information usually becomes stale.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-04 16:26         ` Robert Dailey
  2017-10-04 16:59           ` Jonathan Nieder
@ 2017-10-04 21:17           ` Torsten Bögershausen
  2017-10-05  1:38             ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2017-10-04 21:17 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Junio C Hamano, Git

On Wed, Oct 04, 2017 at 11:26:55AM -0500, Robert Dailey wrote:
> On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Torsten Bögershausen <tboegi@web.de> writes:
> >
> >>> $ git rm -r --cached . && git add .
> >>
> >> (Both should work)
> >>
> >> To be honest, from the documentation, I can't figure out the difference between
> >> $ git read-tree --empty
> >> and
> >> $ git rm -r --cached .
> >>
> >> Does anybody remember the discussion, why we ended up with read-tree ?
> >
> > We used to use neither, and considered it fine to "rm .git/index" if
> > you wanted to empty the on-disk index file in the old world.  In the
> > modern world, folks want you to avoid touching filesystem directly
> > and instead want you to use Git tools, and the above are two obvious
> > ways to do so.
> >
> > "git read-tree" (without any parameter, i.e. "read these 0 trees and
> > populate the index with it") and its modern and preferred synonym
> > "git read-tree --empty" (i.e. "I am giving 0 trees and I know the
> > sole effect of this command is to empty the index.") are more direct
> > ways to express "I want the index emptied" between the two.
> >
> > The other one, "git rm -r --cached .", in the end gives you the same
> > state because it tells Git to "iterate over all the entries in the
> > index, find the ones that match pathspec '.', and remove them from
> > the index.".  It is not wrong per-se, but conceptually it is a bit
> > roundabout way to say that "I want the index emptied", I would
> > think.
> >
> > I wouldn't be surprised if the "rm -r --cached ." were a lot slower,
> > due to the overhead of having to do the pathspec filtering that ends
> > up to be a no-op, but there shouldn't be a difference in the end
> > result.
> 
> You guys are obviously worlds ahead of me on the internals of things,
> but from my perspective I like to avoid the "plumbing" commands as
> much as I can. Even if I used them, if I have to tell the rest of my
> team "this is the way to do it", they're going to give me dirty looks
> if I ask them to run things like this that make no sense to them.
> That's the argument I have to deal with when it comes to Git's
> usability within the team I manage. So based on this, I'd favor the
> `git rm -r --cached` approach because this is the more common result
> you see in Google, and also makes a little more sense from a high
> level of thought perspective. However, this is just my personal
> opinion. `read-tree --empty` is far less self explanatory IMHO.
> 
> Also let's not forget the second part of the command chain that
> results in the different behavior. In one case, I use `git add` which
> results in proper line ending normalization. In the other case, I do
> `git reset --hard` which does *NOT* result in the line endings
> normalized (`git status` shows no results). In both cases, I'm still
> doing `git rm -r --cached`, so I am doubtful that is the root cause
> for the line ending normalization piece. I'm still trying to
> understand why both give different results (root cause) and also get
> an understanding of what the correct (modern) solution is for line
> ending normalization (not necessarily which is the right way to
> clear/delete the index, which is really AFAIK just a means to this end
> and an implementation detail of sorts for this specific task).

Hopefully I am able to give a useful answer.

"git reset --hard" works like a hammer
and may destroy work that has been done,
in our case the cleaning of the index,
which is needed for normalization since Git 2.10 (or so)

Back to the question:
One solution, which you can tell your team, is this one:
$ git rm -r --cached . && git add .

And as Junio pointed out, this may be slower than needed.
And we don't want "slow" solutions in the official documentation ;-)

Whatever you find on search engines may get stale after a while,
so that we appreciate direct questions here.

(And I will open an issue on Github the next days)

The background is that the CRLF handling in Git changed over the years,
and one effect is that "git reset" is not "allowed" any more.

For the interested reader:
https://github.com/git-for-windows/git/issues/954


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-04 16:59           ` Jonathan Nieder
  2017-10-04 18:03             ` Robert Dailey
@ 2017-10-05  1:31             ` Junio C Hamano
  2017-10-05  1:46               ` Jonathan Nieder
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2017-10-05  1:31 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Robert Dailey, Torsten Bögershausen, Git

Jonathan Nieder <jrnieder@gmail.com> writes:

> I suspect what we are dancing around is the need for some command like
>
> 	git checkout --renormalize .
>
> which would shorten the sequence to
>
> 	git checkout --renormalize .
> 	git status; # Show files that will be normalized
> 	git commit; # Commit the result
>
> What do you think?  Would you be interested in writing a patch for it?
> ("No" is as always an acceptable answer.)

I actually think what is being requested is the opposite, i.e. "the
object registered in the index have wrong line endings, and the
safe-crlf is getting in the way to prevent me from correcting by
hashing the working tree contents again to register contents with
corrected line endings, even with 'git add .'".

So I would understand if your suggestion were for

	git checkin --renormalize .

but not "git checkout".  And it probably is more familiar to lay
people if we spelled that as "git add --renormalize ." ;-)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-04 21:17           ` Torsten Bögershausen
@ 2017-10-05  1:38             ` Junio C Hamano
  2017-10-05  3:31               ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2017-10-05  1:38 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Robert Dailey, Git

Torsten Bögershausen <tboegi@web.de> writes:

> One solution, which you can tell your team, is this one:
> $ git rm -r --cached . && git add .

Both this and its "git read-tree --empty" cousin share a grave
issue.  The "git add ." step would mean that before doing these
commands, your working tree must be truly clean, i.e. the paths
in the filesystem known to the index must match what is in the
index (modulo the line-ending gotcha you are trying to correct), 
*AND* there must be *NO* untracked paths you do not want to add
in the working tree.

That is a reason why we should solve it differently.  Perhaps adding
a new option "git add --rehash" to tell Git "Hey, you may think some
paths in the index and in the working tree are identical and no need
to re-register, but you are WRONG.  For each path in the index,
remove it and then register the object by hashing the contents from
the filesystem afresh!" would be the best way to go.  That will not
pick up untracked paths left in the filesystem, and does not limit
our solution to the "eol normalization is screwey" issue by not
calling the option "renormalize" or any other words that imply "why"
we are hashing again anew.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-05  1:31             ` Junio C Hamano
@ 2017-10-05  1:46               ` Jonathan Nieder
  0 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2017-10-05  1:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robert Dailey, Torsten Bögershausen, Git

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> 	git checkout --renormalize .
>> 	git status; # Show files that will be normalized
>> 	git commit; # Commit the result
>>
>> What do you think?  Would you be interested in writing a patch for it?
>> ("No" is as always an acceptable answer.)
>
> I actually think what is being requested is the opposite, i.e. "the
> object registered in the index have wrong line endings, and the
> safe-crlf is getting in the way to prevent me from correcting by
> hashing the working tree contents again to register contents with
> corrected line endings, even with 'git add .'".
>
> So I would understand if your suggestion were for
>
> 	git checkin --renormalize .
>
> but not "git checkout".  And it probably is more familiar to lay
> people if we spelled that as "git add --renormalize ." ;-)

Good catch.  You understood correctly --- "git add --renormalize" is
the feature that I think is being hinted at here.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-05  1:38             ` Junio C Hamano
@ 2017-10-05  3:31               ` Junio C Hamano
  2017-10-05 21:42                 ` Torsten Bögershausen
  2017-10-16 16:49                 ` [PATCH v1 1/1] Introduce git add --renormalize tboegi
  0 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2017-10-05  3:31 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Robert Dailey, Git

Junio C Hamano <gitster@pobox.com> writes:

> Both this and its "git read-tree --empty" cousin share a grave
> issue.  The "git add ." step would mean that before doing these
> commands, your working tree must be truly clean, i.e. the paths
> in the filesystem known to the index must match what is in the
> index (modulo the line-ending gotcha you are trying to correct), 
> *AND* there must be *NO* untracked paths you do not want to add
> in the working tree.
>
> That is a reason why we should solve it differently.  Perhaps adding
> a new option "git add --rehash" to tell Git "Hey, you may think some
> paths in the index and in the working tree are identical and no need
> to re-register, but you are WRONG.  For each path in the index,
> remove it and then register the object by hashing the contents from
> the filesystem afresh!" would be the best way to go.

Here is just to illustrate the direction I was heading to in the
above.  This is not even compile tested and I won't guarantee what
corner cases there are, though.

In a true production code, we shouldn't be using string-list with
two loops, but I just didn't want to spend more braincycles worrying
about removing from the list and then adding to it, both inside a
single loop that iterates over it in a mere illustration patch.

The second loop uses a simple "remove then add", but I think it
should rather be a "mark ce that it will _never_ match anything on
the working tree" followed by "add_file_to_cache()".  Currently we
do not have the "mark ce that it never matches" operation that lets
us bypass the comparison with the current cache entry (with safecrlf
thing that interferes), but we can afford to use a (in-core only)
bit in the ce flags word to represent this and plumb it through.
That way, we will still preserve the executable bit from the
original entry, hopefully ;-)


 builtin/add.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/builtin/add.c b/builtin/add.c
index 5d5773d5cd..264f84dbe7 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = {
 };
 static int patch_interactive, add_interactive, edit_interactive;
 static int take_worktree_changes;
+static int rehash;
 
 struct update_callback_data {
 	int flags;
@@ -121,6 +122,41 @@ int add_files_to_cache(const char *prefix,
 	return !!data.add_errors;
 }
 
+static int rehash_tracked_files(const char *prefix, const struct pathspec *pathspec,
+				int flags)
+{
+	struct string_list paths = STRING_LIST_INIT_DUP;
+	struct string_list_item *path;
+	int i, retval = 0;
+
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+
+		if (ce_stage(ce))
+			continue; /* do not touch unmerged paths */
+		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
+			continue; /* do not touch non blobs */
+		if (pathspec && !ce_path_match(ce, pathspec, NULL))
+			continue;
+		string_list_append(&paths, ce->name);
+	}
+
+	for_each_string_list_item(path, &paths) {
+		/*
+		 * Having a blob contaminated with CR will trigger the
+		 * safe-crlf kludge, avoidance of which is the primary
+		 * thing this helper function exists.  Remove it and
+		 * then re-add it.  Note that this may lose executable
+		 * bit on a filesystem that lacks it.
+		 */
+		remove_file_from_cache(path->string);
+		add_file_to_cache(path->string, flags);
+	}
+
+	string_list_clear(&paths, 0);
+	return retval;
+}
+
 static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec, int prefix)
 {
 	char *seen;
@@ -274,6 +310,7 @@ static struct option builtin_add_options[] = {
 	OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")),
 	OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files")),
 	OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")),
+	OPT_BOOL(0, "rehash", &rehash, N_("really update tracked files")),
 	OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
 	OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
 	{ OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
@@ -498,7 +535,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	plug_bulk_checkin();
 
-	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
+	if (rehash)
+		exit_status |= rehash_tracked_files(prefix, &pathspec, flags);
+	else
+		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-05  3:31               ` Junio C Hamano
@ 2017-10-05 21:42                 ` Torsten Bögershausen
  2017-10-06  0:33                   ` Junio C Hamano
  2017-10-16 16:49                 ` [PATCH v1 1/1] Introduce git add --renormalize tboegi
  1 sibling, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2017-10-05 21:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robert Dailey, Git

 
>  builtin/add.c | 42 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/add.c b/builtin/add.c
> index 5d5773d5cd..264f84dbe7 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = {
>  };
>  static int patch_interactive, add_interactive, edit_interactive;
>  static int take_worktree_changes;
> +static int rehash;
>  
>  struct update_callback_data {
>  	int flags;
> @@ -121,6 +122,41 @@ int add_files_to_cache(const char *prefix,
>  	return !!data.add_errors;
>  }
>  
> +static int rehash_tracked_files(const char *prefix, const struct pathspec *pathspec,
> +				int flags)
> +{
> +	struct string_list paths = STRING_LIST_INIT_DUP;
> +	struct string_list_item *path;
> +	int i, retval = 0;
> +
> +	for (i = 0; i < active_nr; i++) {
> +		struct cache_entry *ce = active_cache[i];
> +
> +		if (ce_stage(ce))
> +			continue; /* do not touch unmerged paths */
> +		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> +			continue; /* do not touch non blobs */
> +		if (pathspec && !ce_path_match(ce, pathspec, NULL))
> +			continue;
> +		string_list_append(&paths, ce->name);
> +	}
> +
> +	for_each_string_list_item(path, &paths) {
> +		/*
> +		 * Having a blob contaminated with CR will trigger the
> +		 * safe-crlf kludge, avoidance of which is the primary
> +		 * thing this helper function exists.  Remove it and
> +		 * then re-add it.  Note that this may lose executable
> +		 * bit on a filesystem that lacks it.
> +		 */
> +		remove_file_from_cache(path->string);
> +		add_file_to_cache(path->string, flags);
> +	}
> +
> +	string_list_clear(&paths, 0);
> +	return retval;
> +}
> +
>  static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec, int prefix)
>  {
>  	char *seen;
> @@ -274,6 +310,7 @@ static struct option builtin_add_options[] = {
>  	OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")),
>  	OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files")),
>  	OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")),
> +	OPT_BOOL(0, "rehash", &rehash, N_("really update tracked files")),
>  	OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
>  	OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
>  	{ OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
> @@ -498,7 +535,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  
>  	plug_bulk_checkin();
>  
> -	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
> +	if (rehash)
> +		exit_status |= rehash_tracked_files(prefix, &pathspec, flags);
> +	else
> +		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
>  
>  	if (add_new_files)
>  		exit_status |= add_files(&dir, flags);

That looks like a nice one.
Before we put this into stone:
Does it make sense to say "renormalize" instead of "rehash" ?
(That term does exist already for merge.
 And rehash is more a technical term,  rather then a user-point-of-view explanation)
 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-05 21:42                 ` Torsten Bögershausen
@ 2017-10-06  0:33                   ` Junio C Hamano
  2017-10-06 17:58                     ` Torsten Bögershausen
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2017-10-06  0:33 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Robert Dailey, Git

Torsten Bögershausen <tboegi@web.de> writes:

> Before we put this into stone:
> Does it make sense to say "renormalize" instead of "rehash" ?
> (That term does exist already for merge.
>  And rehash is more a technical term,  rather then a user-point-of-view explanation)

I do not mind "renormalize" at all.

As to the toy patch, I think it needs to (at least by default) turn
off the add_new_files codepath, and be allowed to work without any
pathspec (in which case all tracked paths should be renormalized).

And we really shouldn't do the "rm && add", which would not work
well on platforms where filesystem without executing-bit support is
prevalent.  This new feature is primarily needed on platforms where
CRLF line endings are used, and unfortunately these two sets of
platforms overlap quite a bit X-<.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-06  0:33                   ` Junio C Hamano
@ 2017-10-06 17:58                     ` Torsten Bögershausen
  0 siblings, 0 replies; 34+ messages in thread
From: Torsten Bögershausen @ 2017-10-06 17:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robert Dailey, Git

On Fri, Oct 06, 2017 at 09:33:31AM +0900, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
> > Before we put this into stone:
> > Does it make sense to say "renormalize" instead of "rehash" ?
> > (That term does exist already for merge.
> >  And rehash is more a technical term,  rather then a user-point-of-view explanation)
> 
> I do not mind "renormalize" at all.
> 
> As to the toy patch, I think it needs to (at least by default) turn
> off the add_new_files codepath, and be allowed to work without any
> pathspec (in which case all tracked paths should be renormalized).
> 

OK, then I will pick up your patch in a couple of days/weeks, and push it further then
(Documentation, test cases, other ?)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v1 1/1] Introduce git add --renormalize .
  2017-10-05  3:31               ` Junio C Hamano
  2017-10-05 21:42                 ` Torsten Bögershausen
@ 2017-10-16 16:49                 ` tboegi
  2017-10-16 17:34                   ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: tboegi @ 2017-10-16 16:49 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Make it safer to normalize the line endings in a repository:
Files that had been commited with CRLF will be commited with LF.
(Unless core.autorclf and .gitattributes specify that Git
 should not do line ending conversions)

The old way to normalize a repo was like this:
 # Make sure that there are not untracked files
 $ echo "* text=auto" >.gitattributes
 $ git read-tree --empty
 $ git add .
 $ git commit -m "Introduce end-of-line normalization"

The new method is one step shorter, more intuitive and does not
add untracked files:
 $ echo "* text=auto" >.gitattributes
 $ git add --renormalize .
 $ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 Documentation/git-add.txt       |  8 +++++++-
 Documentation/gitattributes.txt |  3 +--
 builtin/add.c                   | 27 +++++++++++++++++++++++++--
 cache.h                         |  1 +
 convert.c                       |  1 +
 environment.c                   |  1 +
 read-cache.c                    | 24 ++++++++++++++----------
 t/t0025-crlf-renormalize.sh     | 30 ++++++++++++++++++++++++++++++
 8 files changed, 80 insertions(+), 15 deletions(-)
 create mode 100755 t/t0025-crlf-renormalize.sh

diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index f4169fb1ec..b6e431903d 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 [verse]
 'git add' [--verbose | -v] [--dry-run | -n] [--force | -f] [--interactive | -i] [--patch | -p]
 	  [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]]
-	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing]
+	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] [--renormalize]
 	  [--chmod=(+|-)x] [--] [<pathspec>...]
 
 DESCRIPTION
@@ -172,6 +172,12 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
 	warning (e.g., if you are manually performing operations on
 	submodules).
 
+--renormalize::
+	Normalizes the line endings from CRLF to LF of tracked files.
+	This applies to files which are either "text" or "text=auto"
+	in .gitattributes (or core.autocrlf is true or input)
+        --renormalize implies -u
+
 --chmod=(+|-)x::
 	Override the executable bit of the added files.  The executable
 	bit is only changed in the index, the files on disk are left
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4c68bc19d5..071dec2bc4 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -232,8 +232,7 @@ From a clean working directory:
 
 -------------------------------------------------
 $ echo "* text=auto" >.gitattributes
-$ git read-tree --empty   # Clean index, force re-scan of working directory
-$ git add .
+$ git add --renormalize .
 $ git status        # Show files that will be normalized
 $ git commit -m "Introduce end-of-line normalization"
 -------------------------------------------------
diff --git a/builtin/add.c b/builtin/add.c
index a648cf4c56..ee8e756fdc 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -123,6 +123,25 @@ int add_files_to_cache(const char *prefix,
 	return !!data.add_errors;
 }
 
+static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
+{
+	int i, retval = 0;
+
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+
+		if (ce_stage(ce))
+			continue; /* do not touch unmerged paths */
+		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
+			continue; /* do not touch non blobs */
+		if (pathspec && !ce_path_match(ce, pathspec, NULL))
+			continue;
+		retval |= add_file_to_cache(ce->name, flags);
+	}
+
+	return retval;
+}
+
 static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec, int prefix)
 {
 	char *seen;
@@ -276,6 +295,7 @@ static struct option builtin_add_options[] = {
 	OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")),
 	OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files")),
 	OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")),
+	OPT_BOOL(0, "renormalize", &add_renormalize, N_("renormalize EOL of tracked files (implies -u)")),
 	OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
 	OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
 	{ OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
@@ -406,7 +426,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			  chmod_arg[1] != 'x' || chmod_arg[2]))
 		die(_("--chmod param '%s' must be either -x or +x"), chmod_arg);
 
-	add_new_files = !take_worktree_changes && !refresh_only;
+	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
@@ -500,7 +520,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	plug_bulk_checkin();
 
-	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
+	if (add_renormalize)
+		exit_status |= renormalize_tracked_files(&pathspec, flags);
+	else
+		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
diff --git a/cache.h b/cache.h
index a916bc79e3..d4e8f86696 100644
--- a/cache.h
+++ b/cache.h
@@ -489,6 +489,7 @@ extern const char *get_git_namespace(void);
 extern const char *strip_namespace(const char *namespaced_ref);
 extern const char *get_super_prefix(void);
 extern const char *get_git_work_tree(void);
+extern int add_renormalize;
 
 /*
  * Return true if the given path is a git directory; note that this _just_
diff --git a/convert.c b/convert.c
index a09935cb81..70a37b0942 100644
--- a/convert.c
+++ b/convert.c
@@ -290,6 +290,7 @@ static int crlf_to_git(const struct index_state *istate,
 		 * cherry-pick.
 		 */
 		if ((checksafe != SAFE_CRLF_RENORMALIZE) &&
+		    !add_renormalize &&
 		    has_cr_in_index(istate, path))
 			convert_crlf_into_lf = 0;
 	}
diff --git a/environment.c b/environment.c
index f1f934b6fd..16fbce1123 100644
--- a/environment.c
+++ b/environment.c
@@ -87,6 +87,7 @@ int auto_comment_line_char;
 /* Parallel index stat data preload? */
 int core_preload_index = 1;
 
+int add_renormalize;
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
  * ensure that they do not modify the untracked cache when reading it.
diff --git a/read-cache.c b/read-cache.c
index b211c57af6..3388c5053f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -630,7 +630,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 {
 	int size, namelen, was_same;
 	mode_t st_mode = st->st_mode;
-	struct cache_entry *ce, *alias;
+	struct cache_entry *ce, *alias = NULL;
 	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_RACY_IS_DIRTY;
 	int verbose = flags & (ADD_CACHE_VERBOSE | ADD_CACHE_PRETEND);
 	int pretend = flags & ADD_CACHE_PRETEND;
@@ -677,16 +677,20 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	if (ignore_case) {
 		adjust_dirname_case(istate, ce->name);
 	}
+	if (!add_renormalize) {
+		alias = index_file_exists(istate, ce->name,
+					  ce_namelen(ce), ignore_case);
+		if (alias &&
+		    !ce_stage(alias) &&
+		    !ie_match_stat(istate, alias, st, ce_option)) {
+			/* Nothing changed, really */
+			if (!S_ISGITLINK(alias->ce_mode))
+				ce_mark_uptodate(alias);
+			alias->ce_flags |= CE_ADDED;
 
-	alias = index_file_exists(istate, ce->name, ce_namelen(ce), ignore_case);
-	if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, st, ce_option)) {
-		/* Nothing changed, really */
-		if (!S_ISGITLINK(alias->ce_mode))
-			ce_mark_uptodate(alias);
-		alias->ce_flags |= CE_ADDED;
-
-		free(ce);
-		return 0;
+			free(ce);
+			return 0;
+		}
 	}
 	if (!intent_only) {
 		if (index_path(&ce->oid, path, st, HASH_WRITE_OBJECT)) {
diff --git a/t/t0025-crlf-renormalize.sh b/t/t0025-crlf-renormalize.sh
new file mode 100755
index 0000000000..fb1ed631d2
--- /dev/null
+++ b/t/t0025-crlf-renormalize.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='CRLF renormalization'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	git config core.autocrlf false &&
+	printf "LINEONE\nLINETWO\nLINETHREE"     >LF.txt &&
+	printf "LINEONE\r\nLINETWO\r\nLINETHREE" >CRLF.txt &&
+	printf "LINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF.txt &&
+	git add . &&
+	git commit -m initial
+'
+
+test_expect_success 'renormalize CRLF in repo' '
+	echo "*.txt text=auto" >.gitattributes &&
+	git add --renormalize "*.txt" &&
+cat >expect <<EOF &&
+i/lf w/crlf attr/text=auto CRLF.txt
+i/lf w/lf attr/text=auto LF.txt
+i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
+EOF
+	git ls-files --eol |
+	sed -e "s/	/ /g" -e "s/  */ /g" |
+	sort >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.14.1.729.g59c0ea183a


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 1/1] Introduce git add --renormalize .
  2017-10-16 16:49                 ` [PATCH v1 1/1] Introduce git add --renormalize tboegi
@ 2017-10-16 17:34                   ` Junio C Hamano
  2017-10-30 16:29                     ` [PATCH v2 " tboegi
  2017-11-16 16:38                     ` [PATCH v3 " tboegi
  0 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2017-10-16 17:34 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> Make it safer to normalize the line endings in a repository:
> Files that had been commited with CRLF will be commited with LF.
> (Unless core.autorclf and .gitattributes specify that Git
>  should not do line ending conversions)

A few issues I saw after a quick read:

 - The log message tells us old and new ways, but does not make it
   clear why users are encouraged to use the new way at all.  You
   didn't make the implementation of "add --renormalize" to just
   start "git add" without calling read_cache() and letting all
   files added new to the index (which is how the old way worked) to
   give a sugarcoated equivalent of the old way for a reason, and
   that should be desribed in the log.

 - An ugly global variable is introduced instead of passing
   necessary information through the callchain properly, but the
   title does not say PATCH/RFC.

 - The documentation makes it sound as if this new feature is _only_
   about CRLF vs LF.  SHouldn't this equally apply after the user
   changes .gitattributes settings that govern the "clean" side of
   the filter and makes what is in the index "unclean"?

The second point is a showstopper from maintainability's point of
view, but none of the above should be insurmojntable.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/1] Introduce git add --renormalize .
  2017-10-16 17:34                   ` Junio C Hamano
@ 2017-10-30 16:29                     ` tboegi
  2017-11-07  5:50                       ` Junio C Hamano
  2017-11-16 16:38                     ` [PATCH v3 " tboegi
  1 sibling, 1 reply; 34+ messages in thread
From: tboegi @ 2017-10-30 16:29 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Make it safer to normalize the line endings in a repository:
Files that had been commited with CRLF will be commited with LF.

The old way to normalize a repo was like this:
 # Make sure that there are not untracked files
 $ echo "* text=auto" >.gitattributes
 $ git read-tree --empty
 $ git add .
 $ git commit -m "Introduce end-of-line normalization"

The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.

The new "add ..renormalize" does not add untracked files:
 $ echo "* text=auto" >.gitattributes
 $ git add --renormalize .
 $ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".

While add it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.

Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---

Second version:
- Removed the global flag
- Make clearer that the clean filters may need renormalization
- commit message improved

Documentation/git-add.txt       |  8 +++++++-
 Documentation/gitattributes.txt |  6 ++++--
 builtin/add.c                   | 28 ++++++++++++++++++++++++++--
 cache.h                         |  1 +
 read-cache.c                    | 30 +++++++++++++++++++-----------
 sha1_file.c                     | 16 ++++++++++++++--
 t/t0025-crlf-renormalize.sh     | 30 ++++++++++++++++++++++++++++++
 7 files changed, 101 insertions(+), 18 deletions(-)
 create mode 100755 t/t0025-crlf-renormalize.sh

diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index b700beaff5..09a08ce4c1 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 [verse]
 'git add' [--verbose | -v] [--dry-run | -n] [--force | -f] [--interactive | -i] [--patch | -p]
 	  [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]]
-	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing]
+	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] [--renormalize]
 	  [--chmod=(+|-)x] [--] [<pathspec>...]
 
 DESCRIPTION
@@ -175,6 +175,12 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
 	warning (e.g., if you are manually performing operations on
 	submodules).
 
+--renormalize::
+	Normalizes the line endings from CRLF to LF of tracked files.
+	This applies to files which are either "text" or "text=auto"
+	in .gitattributes (or core.autocrlf is true or input)
+	--renormalize implies -u
+
 --chmod=(+|-)x::
 	Override the executable bit of the added files.  The executable
 	bit is only changed in the index, the files on disk are left
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4c68bc19d5..30687de81a 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -232,8 +232,7 @@ From a clean working directory:
 
 -------------------------------------------------
 $ echo "* text=auto" >.gitattributes
-$ git read-tree --empty   # Clean index, force re-scan of working directory
-$ git add .
+$ git add --renormalize .
 $ git status        # Show files that will be normalized
 $ git commit -m "Introduce end-of-line normalization"
 -------------------------------------------------
@@ -328,6 +327,9 @@ You can declare that a filter turns a content that by itself is unusable
 into a usable content by setting the filter.<driver>.required configuration
 variable to `true`.
 
+Note: Whenever the clean filter is changed, the repo should be renormalized:
+$ git add --renormalize .
+
 For example, in .gitattributes, you would assign the `filter`
 attribute for paths.
 
diff --git a/builtin/add.c b/builtin/add.c
index a648cf4c56..c42b50f857 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = {
 };
 static int patch_interactive, add_interactive, edit_interactive;
 static int take_worktree_changes;
+static int add_renormalize;
 
 struct update_callback_data {
 	int flags;
@@ -123,6 +124,25 @@ int add_files_to_cache(const char *prefix,
 	return !!data.add_errors;
 }
 
+static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
+{
+	int i, retval = 0;
+
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+
+		if (ce_stage(ce))
+			continue; /* do not touch unmerged paths */
+		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
+			continue; /* do not touch non blobs */
+		if (pathspec && !ce_path_match(ce, pathspec, NULL))
+			continue;
+		retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
+	}
+
+	return retval;
+}
+
 static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec, int prefix)
 {
 	char *seen;
@@ -276,6 +296,7 @@ static struct option builtin_add_options[] = {
 	OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")),
 	OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files")),
 	OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")),
+	OPT_BOOL(0, "renormalize", &add_renormalize, N_("renormalize EOL of tracked files (implies -u)")),
 	OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
 	OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
 	{ OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
@@ -406,7 +427,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			  chmod_arg[1] != 'x' || chmod_arg[2]))
 		die(_("--chmod param '%s' must be either -x or +x"), chmod_arg);
 
-	add_new_files = !take_worktree_changes && !refresh_only;
+	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
@@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	plug_bulk_checkin();
 
-	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
+	if (add_renormalize)
+		exit_status |= renormalize_tracked_files(&pathspec, flags);
+	else
+		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
diff --git a/cache.h b/cache.h
index 6440e2bf21..3e63a74113 100644
--- a/cache.h
+++ b/cache.h
@@ -686,6 +686,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_RENORMALIZE  4
 extern int index_fd(struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(struct object_id *oid, const char *path, struct stat *st, unsigned flags);
 
diff --git a/read-cache.c b/read-cache.c
index 65f4fe8375..8acd3fcb93 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -631,13 +631,17 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 {
 	int size, namelen, was_same;
 	mode_t st_mode = st->st_mode;
-	struct cache_entry *ce, *alias;
+	struct cache_entry *ce, *alias = NULL;
 	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_RACY_IS_DIRTY;
 	int verbose = flags & (ADD_CACHE_VERBOSE | ADD_CACHE_PRETEND);
 	int pretend = flags & ADD_CACHE_PRETEND;
 	int intent_only = flags & ADD_CACHE_INTENT;
 	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
 			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
+	int newflags = HASH_WRITE_OBJECT;
+
+	if (flags & HASH_RENORMALIZE)
+		newflags |= HASH_RENORMALIZE;
 
 	if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode))
 		return error("%s: can only add regular files, symbolic links or git-directories", path);
@@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	if (ignore_case) {
 		adjust_dirname_case(istate, ce->name);
 	}
+	if (!(flags & HASH_RENORMALIZE)) {
+		alias = index_file_exists(istate, ce->name,
+					  ce_namelen(ce), ignore_case);
+		if (alias &&
+		    !ce_stage(alias) &&
+		    !ie_match_stat(istate, alias, st, ce_option)) {
+			/* Nothing changed, really */
+			if (!S_ISGITLINK(alias->ce_mode))
+				ce_mark_uptodate(alias);
+			alias->ce_flags |= CE_ADDED;
 
-	alias = index_file_exists(istate, ce->name, ce_namelen(ce), ignore_case);
-	if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, st, ce_option)) {
-		/* Nothing changed, really */
-		if (!S_ISGITLINK(alias->ce_mode))
-			ce_mark_uptodate(alias);
-		alias->ce_flags |= CE_ADDED;
-
-		free(ce);
-		return 0;
+			free(ce);
+			return 0;
+		}
 	}
 	if (!intent_only) {
-		if (index_path(&ce->oid, path, st, HASH_WRITE_OBJECT)) {
+		if (index_path(&ce->oid, path, st, newflags)) {
 			free(ce);
 			return error("unable to index file %s", path);
 		}
diff --git a/sha1_file.c b/sha1_file.c
index 10c3a0083d..15abb184c2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const unsigned char *sha1)
 	return NULL;
 }
 
+
+static enum safe_crlf get_safe_crlf(unsigned flags)
+{
+	if (flags & HASH_RENORMALIZE)
+		return SAFE_CRLF_RENORMALIZE;
+	else if (flags & HASH_WRITE_OBJECT)
+		return safe_crlf;
+	else
+		return SAFE_CRLF_FALSE;
+}
+
+
 int mkdir_in_gitdir(const char *path)
 {
 	if (mkdir(path, 0777)) {
@@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
 		if (convert_to_git(&the_index, path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+				   get_safe_crlf(flags))) {
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -1714,7 +1726,7 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 	assert(would_convert_to_git_filter_fd(path));
 
 	convert_to_git_filter_fd(&the_index, path, fd, &sbuf,
-				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
+				 get_safe_crlf(flags));
 
 	if (write_object)
 		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
diff --git a/t/t0025-crlf-renormalize.sh b/t/t0025-crlf-renormalize.sh
new file mode 100755
index 0000000000..fb1ed631d2
--- /dev/null
+++ b/t/t0025-crlf-renormalize.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='CRLF renormalization'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	git config core.autocrlf false &&
+	printf "LINEONE\nLINETWO\nLINETHREE"     >LF.txt &&
+	printf "LINEONE\r\nLINETWO\r\nLINETHREE" >CRLF.txt &&
+	printf "LINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF.txt &&
+	git add . &&
+	git commit -m initial
+'
+
+test_expect_success 'renormalize CRLF in repo' '
+	echo "*.txt text=auto" >.gitattributes &&
+	git add --renormalize "*.txt" &&
+cat >expect <<EOF &&
+i/lf w/crlf attr/text=auto CRLF.txt
+i/lf w/lf attr/text=auto LF.txt
+i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
+EOF
+	git ls-files --eol |
+	sed -e "s/	/ /g" -e "s/  */ /g" |
+	sort >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.14.1.729.g59c0ea183a


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] Introduce git add --renormalize .
  2017-10-30 16:29                     ` [PATCH v2 " tboegi
@ 2017-11-07  5:50                       ` Junio C Hamano
  2017-11-07 17:26                         ` Torsten Bögershausen
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2017-11-07  5:50 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> Make it safer to normalize the line endings in a repository:
> Files that had been commited with CRLF will be commited with LF.
> ...
> Helped-By: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---

Nicely explained.

> @@ -175,6 +175,12 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
>  	warning (e.g., if you are manually performing operations on
>  	submodules).
>  
> +--renormalize::
> +	Normalizes the line endings from CRLF to LF of tracked files.
> +	This applies to files which are either "text" or "text=auto"
> +	in .gitattributes (or core.autocrlf is true or input)
> +	--renormalize implies -u
> +

OK.

> +static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
> +{
> +	int i, retval = 0;
> +
> +	for (i = 0; i < active_nr; i++) {
> +		struct cache_entry *ce = active_cache[i];
> +
> +		if (ce_stage(ce))
> +			continue; /* do not touch unmerged paths */
> +		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> +			continue; /* do not touch non blobs */
> +		if (pathspec && !ce_path_match(ce, pathspec, NULL))
> +			continue;
> +		retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);

We are removing the entry and then adding the same entry under the
same name, and iteration over the active_cache[] from 0 through
active_nr should be safe, I guess.

> ...
> -	add_new_files = !take_worktree_changes && !refresh_only;
> +	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;

If renormalize is given, we will *not* take new files, good.

> @@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  
>  	plug_bulk_checkin();
>  
> -	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
> +	if (add_renormalize)
> +		exit_status |= renormalize_tracked_files(&pathspec, flags);
> +	else
> +		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
>  
>  	if (add_new_files)
>  		exit_status |= add_files(&dir, flags);

OK.

> ...
>  	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
>  			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
> +	int newflags = HASH_WRITE_OBJECT;
> +
> +	if (flags & HASH_RENORMALIZE)
> +		newflags |= HASH_RENORMALIZE;
> ...
> @@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
>  	if (ignore_case) {
>  		adjust_dirname_case(istate, ce->name);
>  	}
> +	if (!(flags & HASH_RENORMALIZE)) {
> +		alias = index_file_exists(istate, ce->name,
> +					  ce_namelen(ce), ignore_case);
> +		if (alias &&
> +		    !ce_stage(alias) &&
> +		    !ie_match_stat(istate, alias, st, ce_option)) {
> +			/* Nothing changed, really */
> +			if (!S_ISGITLINK(alias->ce_mode))
> +				ce_mark_uptodate(alias);
> +			alias->ce_flags |= CE_ADDED;

OK, so RENORMALIZE option forces the code to skip the "does the path
exist already?  maybe we can do without adding it?" check.

>  	if (!intent_only) {
> -		if (index_path(&ce->oid, path, st, HASH_WRITE_OBJECT)) {
> +		if (index_path(&ce->oid, path, st, newflags)) {

And then we do hash it.  We later do add_index_entry() on this thing
and we have OK_TO_REPLACE bit in the add_option, so we are good to go.

> diff --git a/sha1_file.c b/sha1_file.c
> index 10c3a0083d..15abb184c2 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const unsigned char *sha1)
>  	return NULL;
>  }
>  
> +
> +static enum safe_crlf get_safe_crlf(unsigned flags)
> +{
> +	if (flags & HASH_RENORMALIZE)
> +		return SAFE_CRLF_RENORMALIZE;
> +	else if (flags & HASH_WRITE_OBJECT)
> +		return safe_crlf;
> +	else
> +		return SAFE_CRLF_FALSE;
> +}
> +
> +
>  int mkdir_in_gitdir(const char *path)
>  {
>  	if (mkdir(path, 0777)) {
> @@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
>  	if ((type == OBJ_BLOB) && path) {
>  		struct strbuf nbuf = STRBUF_INIT;
>  		if (convert_to_git(&the_index, path, buf, size, &nbuf,
> -				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
> +				   get_safe_crlf(flags))) {
>  			buf = strbuf_detach(&nbuf, &size);
>  			re_allocated = 1;
>  		}
> @@ -1714,7 +1726,7 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
>  	assert(would_convert_to_git_filter_fd(path));
>  
>  	convert_to_git_filter_fd(&the_index, path, fd, &sbuf,
> -				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
> +				 get_safe_crlf(flags));
>  
>  	if (write_object)
>  		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),

OK.  We used to force CRLF_FALSE when we are not writing it out; now
we have three choices, and a new helper helps us isolating the logic
to make that choice.

> diff --git a/t/t0025-crlf-renormalize.sh b/t/t0025-crlf-renormalize.sh
> new file mode 100755
> index 0000000000..fb1ed631d2
> --- /dev/null
> +++ b/t/t0025-crlf-renormalize.sh
> @@ -0,0 +1,30 @@
> +#!/bin/sh
> +
> +test_description='CRLF renormalization'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> +	git config core.autocrlf false &&
> +	printf "LINEONE\nLINETWO\nLINETHREE"     >LF.txt &&
> +	printf "LINEONE\r\nLINETWO\r\nLINETHREE" >CRLF.txt &&
> +	printf "LINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF.txt &&

Did you mean to make all these files end with an incomplete line?  I
think it does not hurt but it is misleading---the reader would try
to see if the incomplete lines are significant and necessary part of
the test, which is not, and would end up wasting time, no?

> +	git add . &&
> +	git commit -m initial
> +'
> +
> +test_expect_success 'renormalize CRLF in repo' '
> +	echo "*.txt text=auto" >.gitattributes &&
> +	git add --renormalize "*.txt" &&
> +cat >expect <<EOF &&
> +i/lf w/crlf attr/text=auto CRLF.txt
> +i/lf w/lf attr/text=auto LF.txt
> +i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
> +EOF

Perhaps use the <<-\EOF pattern?

I'd suggest squashing this in (or I can do so myself if there is no
other change needed).

Thanks.  Looks mostly good.

 t/t0025-crlf-renormalize.sh | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/t/t0025-crlf-renormalize.sh b/t/t0025-crlf-renormalize.sh
index fb1ed631d2..ea4f82ea27 100755
--- a/t/t0025-crlf-renormalize.sh
+++ b/t/t0025-crlf-renormalize.sh
@@ -6,9 +6,9 @@ test_description='CRLF renormalization'
 
 test_expect_success setup '
 	git config core.autocrlf false &&
-	printf "LINEONE\nLINETWO\nLINETHREE"     >LF.txt &&
-	printf "LINEONE\r\nLINETWO\r\nLINETHREE" >CRLF.txt &&
-	printf "LINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF.txt &&
+	printf "LINEONE\nLINETWO\nLINETHREE\n"       >LF.txt &&
+	printf "LINEONE\r\nLINETWO\r\nLINETHREE\r\n" >CRLF.txt &&
+	printf "LINEONE\r\nLINETWO\nLINETHREE\n"     >CRLF_mix_LF.txt &&
 	git add . &&
 	git commit -m initial
 '
@@ -16,11 +16,11 @@ test_expect_success setup '
 test_expect_success 'renormalize CRLF in repo' '
 	echo "*.txt text=auto" >.gitattributes &&
 	git add --renormalize "*.txt" &&
-cat >expect <<EOF &&
-i/lf w/crlf attr/text=auto CRLF.txt
-i/lf w/lf attr/text=auto LF.txt
-i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
-EOF
+	cat >expect <<-\EOF &&
+	i/lf w/crlf attr/text=auto CRLF.txt
+	i/lf w/lf attr/text=auto LF.txt
+	i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
+	EOF
 	git ls-files --eol |
 	sed -e "s/	/ /g" -e "s/  */ /g" |
 	sort >actual &&

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] Introduce git add --renormalize .
  2017-11-07  5:50                       ` Junio C Hamano
@ 2017-11-07 17:26                         ` Torsten Bögershausen
  2017-11-08  0:37                           ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2017-11-07 17:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[snip]
> 
> > @@ -175,6 +175,12 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
> >  	warning (e.g., if you are manually performing operations on
> >  	submodules).
> >  
> > +--renormalize::
> > +	Normalizes the line endings from CRLF to LF of tracked files.
> > +	This applies to files which are either "text" or "text=auto"
> > +	in .gitattributes (or core.autocrlf is true or input)
> > +	--renormalize implies -u
> > +
> 
> OK.

I think the fact, that clean filters are re-run, and re-evaluated
in case they are changed, should be made more clear here.
I don't know how to explain it better that CRLF conversion and/or filters are
re-applied, this is an attempt:


--renormalize::
	Normalizes the line endings from CRLF to LF of tracked files,
	if the .gitattributes or core.autocrlf say so.
	Additionally the clean and ident filters, if any, are re-run.
	--renormalize implies -u





> 
> > +static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
> > +{
> > +	int i, retval = 0;
> > +
> > +	for (i = 0; i < active_nr; i++) {
> > +		struct cache_entry *ce = active_cache[i];
> > +
> > +		if (ce_stage(ce))
> > +			continue; /* do not touch unmerged paths */
> > +		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> > +			continue; /* do not touch non blobs */
> > +		if (pathspec && !ce_path_match(ce, pathspec, NULL))
> > +			continue;
> > +		retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
> 
> We are removing the entry and then adding the same entry under the
> same name, and iteration over the active_cache[] from 0 through
> active_nr should be safe, I guess.
> 
> > ...
> > -	add_new_files = !take_worktree_changes && !refresh_only;
> > +	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
> 
> If renormalize is given, we will *not* take new files, good.
> 
> > @@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
> >  
> >  	plug_bulk_checkin();
> >  
> > -	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
> > +	if (add_renormalize)
> > +		exit_status |= renormalize_tracked_files(&pathspec, flags);
> > +	else
> > +		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
> >  
> >  	if (add_new_files)
> >  		exit_status |= add_files(&dir, flags);
> 
> OK.
> 
> > ...
> >  	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
> >  			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
> > +	int newflags = HASH_WRITE_OBJECT;
> > +
> > +	if (flags & HASH_RENORMALIZE)
> > +		newflags |= HASH_RENORMALIZE;
> > ...
> > @@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
> >  	if (ignore_case) {
> >  		adjust_dirname_case(istate, ce->name);
> >  	}
> > +	if (!(flags & HASH_RENORMALIZE)) {
> > +		alias = index_file_exists(istate, ce->name,
> > +					  ce_namelen(ce), ignore_case);
> > +		if (alias &&
> > +		    !ce_stage(alias) &&
> > +		    !ie_match_stat(istate, alias, st, ce_option)) {
> > +			/* Nothing changed, really */
> > +			if (!S_ISGITLINK(alias->ce_mode))
> > +				ce_mark_uptodate(alias);
> > +			alias->ce_flags |= CE_ADDED;
> 
> OK, so RENORMALIZE option forces the code to skip the "does the path
> exist already?  maybe we can do without adding it?" check.
> 
> >  	if (!intent_only) {
> > -		if (index_path(&ce->oid, path, st, HASH_WRITE_OBJECT)) {
> > +		if (index_path(&ce->oid, path, st, newflags)) {
> 
> And then we do hash it.  We later do add_index_entry() on this thing
> and we have OK_TO_REPLACE bit in the add_option, so we are good to go.
> 
> > diff --git a/sha1_file.c b/sha1_file.c
> > index 10c3a0083d..15abb184c2 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const unsigned char *sha1)
> >  	return NULL;
> >  }
> >  
> > +
> > +static enum safe_crlf get_safe_crlf(unsigned flags)
> > +{
> > +	if (flags & HASH_RENORMALIZE)
> > +		return SAFE_CRLF_RENORMALIZE;
> > +	else if (flags & HASH_WRITE_OBJECT)
> > +		return safe_crlf;
> > +	else
> > +		return SAFE_CRLF_FALSE;
> > +}
> > +
> > +
> >  int mkdir_in_gitdir(const char *path)
> >  {
> >  	if (mkdir(path, 0777)) {
> > @@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
> >  	if ((type == OBJ_BLOB) && path) {
> >  		struct strbuf nbuf = STRBUF_INIT;
> >  		if (convert_to_git(&the_index, path, buf, size, &nbuf,
> > -				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
> > +				   get_safe_crlf(flags))) {
> >  			buf = strbuf_detach(&nbuf, &size);
> >  			re_allocated = 1;
> >  		}
> > @@ -1714,7 +1726,7 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
> >  	assert(would_convert_to_git_filter_fd(path));
> >  
> >  	convert_to_git_filter_fd(&the_index, path, fd, &sbuf,
> > -				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
> > +				 get_safe_crlf(flags));
> >  
> >  	if (write_object)
> >  		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
> 
> OK.  We used to force CRLF_FALSE when we are not writing it out; now
> we have three choices, and a new helper helps us isolating the logic
> to make that choice.
> 
> > diff --git a/t/t0025-crlf-renormalize.sh b/t/t0025-crlf-renormalize.sh
> > new file mode 100755
> > index 0000000000..fb1ed631d2
> > --- /dev/null
> > +++ b/t/t0025-crlf-renormalize.sh
> > @@ -0,0 +1,30 @@
> > +#!/bin/sh
> > +
> > +test_description='CRLF renormalization'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success setup '
> > +	git config core.autocrlf false &&
> > +	printf "LINEONE\nLINETWO\nLINETHREE"     >LF.txt &&
> > +	printf "LINEONE\r\nLINETWO\r\nLINETHREE" >CRLF.txt &&
> > +	printf "LINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF.txt &&
> 
> Did you mean to make all these files end with an incomplete line?  I
> think it does not hurt but it is misleading---the reader would try
> to see if the incomplete lines are significant and necessary part of
> the test, which is not, and would end up wasting time, no?
> 
> > +	git add . &&
> > +	git commit -m initial
> > +'
> > +
> > +test_expect_success 'renormalize CRLF in repo' '
> > +	echo "*.txt text=auto" >.gitattributes &&
> > +	git add --renormalize "*.txt" &&
> > +cat >expect <<EOF &&
> > +i/lf w/crlf attr/text=auto CRLF.txt
> > +i/lf w/lf attr/text=auto LF.txt
> > +i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
> > +EOF
> 
> Perhaps use the <<-\EOF pattern?
> 
> I'd suggest squashing this in (or I can do so myself if there is no
> other change needed).
> 
> Thanks.  Looks mostly good.

Thanks for review and proposed fixes and help.

[snip the TC. Adding line endings is good)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] Introduce git add --renormalize .
  2017-11-07 17:26                         ` Torsten Bögershausen
@ 2017-11-08  0:37                           ` Junio C Hamano
  2017-11-09 18:47                             ` Torsten Bögershausen
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2017-11-08  0:37 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Torsten Bögershausen <tboegi@web.de> writes:

>> > +--renormalize::
>> > +	Normalizes the line endings from CRLF to LF of tracked files.
>> > +	This applies to files which are either "text" or "text=auto"
>> > +	in .gitattributes (or core.autocrlf is true or input)
>> > +	--renormalize implies -u
>> > +
>> 
>> OK.
>
> I think the fact, that clean filters are re-run, and re-evaluated
> in case they are changed, should be made more clear here.
> I don't know how to explain it better that CRLF conversion and/or filters are
> re-applied, this is an attempt:
>
>
> --renormalize::
> 	Normalizes the line endings from CRLF to LF of tracked files,
> 	if the .gitattributes or core.autocrlf say so.
> 	Additionally the clean and ident filters, if any, are re-run.
> 	--renormalize implies -u

That is certainly better.  Do we have an end-user facing phrase to
collectively call everything the "convert_to_git()" processing does?
When I talk casually about it, I'd call it the "clean" process (as
opposed to the "smudge" process) as a term that includes all the
things that Git does to massage contents in the working tree to
in-repository representation.

If we had such a term in Documentation/glossary-contents.txt, we
could even say

	Add contents of all paths to the index by freshly applying
	the "clean" process, even to the ones Git may think are
	unmodified in the working tree since they were added the
	last time (based on the file timestamps etc.).  This is
	often useful after updating settings like `core.autocrlf` in
	the `.git/config` file and the `text` attributes in the
	`.gitattributes` file to correct the index entries that
	records lines with CRLF to use LF instead, or changing what
	the `clean` filter does.  This option implies `-u`.

The point is to express that the CRLF/LF is a consequence (even
though it may be the most prominent one from end-users' point of
view) of a larger processing.

> [snip the TC. Adding line endings is good)

What is TC in this context?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] Introduce git add --renormalize .
  2017-11-08  0:37                           ` Junio C Hamano
@ 2017-11-09 18:47                             ` Torsten Bögershausen
  2017-11-10  0:22                               ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2017-11-09 18:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[]
> 
> If we had such a term in Documentation/glossary-contents.txt, we
> could even say
> 
> 	Add contents of all paths to the index by freshly applying
> 	the "clean" process, even to the ones Git may think are
> 	unmodified in the working tree since they were added the
> 	last time (based on the file timestamps etc.).  This is
> 	often useful after updating settings like `core.autocrlf` in
> 	the `.git/config` file and the `text` attributes in the
> 	`.gitattributes` file to correct the index entries that
> 	records lines with CRLF to use LF instead, or changing what
> 	the `clean` filter does.  This option implies `-u`.
> 
> The point is to express that the CRLF/LF is a consequence (even
> though it may be the most prominent one from end-users' point of
> view) of a larger processing.

Here is a somwhat shorter description:

Apply the "clean" process freshly to all tracked files.
This is useful after changing `core.autocrlf` or the `text`
attributes in the `.gitattributes` file because
Git may not consider these files as changed.
Correct the files that had been commited with CRLF,
they will from now on have LF instead.
Re-run what the `clean` filter does.
This option implies `-u`.


> 
> > [snip the TC. Adding line endings is good)
> 
> What is TC in this context?

Sorry for confusion: TC means test case.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] Introduce git add --renormalize .
  2017-11-09 18:47                             ` Torsten Bögershausen
@ 2017-11-10  0:22                               ` Junio C Hamano
  2017-11-12 20:08                                 ` Torsten Bögershausen
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2017-11-10  0:22 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Torsten Bögershausen <tboegi@web.de> writes:

> Here is a somwhat shorter description:
>
> Apply the "clean" process freshly to all tracked files.
> This is useful after changing `core.autocrlf` or the `text`
> attributes in the `.gitattributes` file because
> Git may not consider these files as changed.

I think it is OK to omit .git/config for brevity (I am assuming that
your justification is because you thought it was obvious it is a
configuration variable); but then it is equally obvious (if not
more) that `text` attribute comes from .gitattributes (notice we do
not mention core.autocrlf is a configuration variable in the above,
but we do say `text` is an attribute) so it can also be omitted for
brevity.

> Correct the files that had been commited with CRLF,
> they will from now on have LF instead.

Reading this as a single sentence immediately after the above
paragraph leaves me feel confused.  First of all, this would not
happen unless the user corrects core.autocrlf/text like described
above.  In fact, updating these settings is done as in order to do
that correction.  So I'd say it should not be split.

> Re-run what the `clean` filter does.

This again looks out of place just like the previous sentence.  In
fact, provided if "the clean process" is understood by the end user,
this is redundant.

> This option implies `-u`.

Taking these altogether, perhaps

    Apply the "clean" process freshly to all tracked files to
    forcibly add them again to the index.  This is useful after
    changing `core.autocrlf` configuration or the `text` attribute
    in order to correct files added with wrong CRLF/LF line endings.
    This option implies `-u`.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/1] Introduce git add --renormalize .
  2017-11-10  0:22                               ` Junio C Hamano
@ 2017-11-12 20:08                                 ` Torsten Bögershausen
  0 siblings, 0 replies; 34+ messages in thread
From: Torsten Bögershausen @ 2017-11-12 20:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Nov 10, 2017 at 09:22:10AM +0900, Junio C Hamano wrote:

> 
> Taking these altogether, perhaps
> 
>     Apply the "clean" process freshly to all tracked files to
>     forcibly add them again to the index.  This is useful after
>     changing `core.autocrlf` configuration or the `text` attribute
>     in order to correct files added with wrong CRLF/LF line endings.
>     This option implies `-u`.
> 
> Thanks.

OK with the text - I can send a new version the next days.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 1/1] Introduce git add --renormalize .
  2017-10-16 17:34                   ` Junio C Hamano
  2017-10-30 16:29                     ` [PATCH v2 " tboegi
@ 2017-11-16 16:38                     ` tboegi
  2017-11-17  1:24                       ` Junio C Hamano
  2017-11-17 20:44                       ` Eric Sunshine
  1 sibling, 2 replies; 34+ messages in thread
From: tboegi @ 2017-11-16 16:38 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Make it safer to normalize the line endings in a repository:
Files that had been commited with CRLF will be commited with LF.

The old way to normalize a repo was like this:
 # Make sure that there are not untracked files
 $ echo "* text=auto" >.gitattributes
 $ git read-tree --empty
 $ git add .
 $ git commit -m "Introduce end-of-line normalization"

The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.

The new "add ..renormalize" does not add untracked files:
 $ echo "* text=auto" >.gitattributes
 $ git add --renormalize .
 $ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".

While add it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.

Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---

Changes since V2:
  Add line endings in t0025
  Use the <<-\EOF pattern
  Improve the documentation for "git add --renormalize"
  

Documentation/git-add.txt       |  9 ++++++++-
 Documentation/gitattributes.txt |  6 ++++--
 builtin/add.c                   | 28 ++++++++++++++++++++++++++--
 cache.h                         |  1 +
 read-cache.c                    | 30 +++++++++++++++++++-----------
 sha1_file.c                     | 16 ++++++++++++++--
 t/t0025-crlf-renormalize.sh     | 30 ++++++++++++++++++++++++++++++
 7 files changed, 102 insertions(+), 18 deletions(-)
 create mode 100755 t/t0025-crlf-renormalize.sh

diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
index b700beaff5..d50fa339dc 100644
--- a/Documentation/git-add.txt
+++ b/Documentation/git-add.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 [verse]
 'git add' [--verbose | -v] [--dry-run | -n] [--force | -f] [--interactive | -i] [--patch | -p]
 	  [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]]
-	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing]
+	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] [--renormalize]
 	  [--chmod=(+|-)x] [--] [<pathspec>...]
 
 DESCRIPTION
@@ -175,6 +175,13 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
 	warning (e.g., if you are manually performing operations on
 	submodules).
 
+--renormalize::
+	Apply the "clean" process freshly to all tracked files to
+	forcibly add them again to the index.  This is useful after
+	changing `core.autocrlf` configuration or the `text` attribute
+	in order to correct files added with wrong CRLF/LF line endings.
+	This option implies `-u`.
+
 --chmod=(+|-)x::
 	Override the executable bit of the added files.  The executable
 	bit is only changed in the index, the files on disk are left
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4c68bc19d5..30687de81a 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -232,8 +232,7 @@ From a clean working directory:
 
 -------------------------------------------------
 $ echo "* text=auto" >.gitattributes
-$ git read-tree --empty   # Clean index, force re-scan of working directory
-$ git add .
+$ git add --renormalize .
 $ git status        # Show files that will be normalized
 $ git commit -m "Introduce end-of-line normalization"
 -------------------------------------------------
@@ -328,6 +327,9 @@ You can declare that a filter turns a content that by itself is unusable
 into a usable content by setting the filter.<driver>.required configuration
 variable to `true`.
 
+Note: Whenever the clean filter is changed, the repo should be renormalized:
+$ git add --renormalize .
+
 For example, in .gitattributes, you would assign the `filter`
 attribute for paths.
 
diff --git a/builtin/add.c b/builtin/add.c
index a648cf4c56..c42b50f857 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = {
 };
 static int patch_interactive, add_interactive, edit_interactive;
 static int take_worktree_changes;
+static int add_renormalize;
 
 struct update_callback_data {
 	int flags;
@@ -123,6 +124,25 @@ int add_files_to_cache(const char *prefix,
 	return !!data.add_errors;
 }
 
+static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
+{
+	int i, retval = 0;
+
+	for (i = 0; i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+
+		if (ce_stage(ce))
+			continue; /* do not touch unmerged paths */
+		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
+			continue; /* do not touch non blobs */
+		if (pathspec && !ce_path_match(ce, pathspec, NULL))
+			continue;
+		retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
+	}
+
+	return retval;
+}
+
 static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec, int prefix)
 {
 	char *seen;
@@ -276,6 +296,7 @@ static struct option builtin_add_options[] = {
 	OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")),
 	OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files")),
 	OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")),
+	OPT_BOOL(0, "renormalize", &add_renormalize, N_("renormalize EOL of tracked files (implies -u)")),
 	OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
 	OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
 	{ OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
@@ -406,7 +427,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			  chmod_arg[1] != 'x' || chmod_arg[2]))
 		die(_("--chmod param '%s' must be either -x or +x"), chmod_arg);
 
-	add_new_files = !take_worktree_changes && !refresh_only;
+	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
 	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
 
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
@@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	plug_bulk_checkin();
 
-	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
+	if (add_renormalize)
+		exit_status |= renormalize_tracked_files(&pathspec, flags);
+	else
+		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
diff --git a/cache.h b/cache.h
index 6440e2bf21..3e63a74113 100644
--- a/cache.h
+++ b/cache.h
@@ -686,6 +686,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_RENORMALIZE  4
 extern int index_fd(struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(struct object_id *oid, const char *path, struct stat *st, unsigned flags);
 
diff --git a/read-cache.c b/read-cache.c
index 65f4fe8375..8acd3fcb93 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -631,13 +631,17 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 {
 	int size, namelen, was_same;
 	mode_t st_mode = st->st_mode;
-	struct cache_entry *ce, *alias;
+	struct cache_entry *ce, *alias = NULL;
 	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_RACY_IS_DIRTY;
 	int verbose = flags & (ADD_CACHE_VERBOSE | ADD_CACHE_PRETEND);
 	int pretend = flags & ADD_CACHE_PRETEND;
 	int intent_only = flags & ADD_CACHE_INTENT;
 	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
 			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
+	int newflags = HASH_WRITE_OBJECT;
+
+	if (flags & HASH_RENORMALIZE)
+		newflags |= HASH_RENORMALIZE;
 
 	if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode))
 		return error("%s: can only add regular files, symbolic links or git-directories", path);
@@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	if (ignore_case) {
 		adjust_dirname_case(istate, ce->name);
 	}
+	if (!(flags & HASH_RENORMALIZE)) {
+		alias = index_file_exists(istate, ce->name,
+					  ce_namelen(ce), ignore_case);
+		if (alias &&
+		    !ce_stage(alias) &&
+		    !ie_match_stat(istate, alias, st, ce_option)) {
+			/* Nothing changed, really */
+			if (!S_ISGITLINK(alias->ce_mode))
+				ce_mark_uptodate(alias);
+			alias->ce_flags |= CE_ADDED;
 
-	alias = index_file_exists(istate, ce->name, ce_namelen(ce), ignore_case);
-	if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, st, ce_option)) {
-		/* Nothing changed, really */
-		if (!S_ISGITLINK(alias->ce_mode))
-			ce_mark_uptodate(alias);
-		alias->ce_flags |= CE_ADDED;
-
-		free(ce);
-		return 0;
+			free(ce);
+			return 0;
+		}
 	}
 	if (!intent_only) {
-		if (index_path(&ce->oid, path, st, HASH_WRITE_OBJECT)) {
+		if (index_path(&ce->oid, path, st, newflags)) {
 			free(ce);
 			return error("unable to index file %s", path);
 		}
diff --git a/sha1_file.c b/sha1_file.c
index 10c3a0083d..15abb184c2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const unsigned char *sha1)
 	return NULL;
 }
 
+
+static enum safe_crlf get_safe_crlf(unsigned flags)
+{
+	if (flags & HASH_RENORMALIZE)
+		return SAFE_CRLF_RENORMALIZE;
+	else if (flags & HASH_WRITE_OBJECT)
+		return safe_crlf;
+	else
+		return SAFE_CRLF_FALSE;
+}
+
+
 int mkdir_in_gitdir(const char *path)
 {
 	if (mkdir(path, 0777)) {
@@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
 		if (convert_to_git(&the_index, path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+				   get_safe_crlf(flags))) {
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -1714,7 +1726,7 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 	assert(would_convert_to_git_filter_fd(path));
 
 	convert_to_git_filter_fd(&the_index, path, fd, &sbuf,
-				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
+				 get_safe_crlf(flags));
 
 	if (write_object)
 		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
diff --git a/t/t0025-crlf-renormalize.sh b/t/t0025-crlf-renormalize.sh
new file mode 100755
index 0000000000..50d8f3a0f3
--- /dev/null
+++ b/t/t0025-crlf-renormalize.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+test_description='CRLF renormalization'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	git config core.autocrlf false &&
+	printf "LINEONE\nLINETWO\nLINETHREE\n"        >LF.txt &&
+	printf "LINEONE\r\nLINETWO\r\nLINETHREE\r\n"  >CRLF.txt &&
+	printf "LINEONE\r\nLINETWO\nLINETHREE\n"      >CRLF_mix_LF.txt &&
+	git add . &&
+	git commit -m initial
+'
+
+test_expect_success 'renormalize CRLF in repo' '
+	echo "*.txt text=auto" >.gitattributes &&
+	git add --renormalize "*.txt" &&
+cat >expect <<-\EOF &&
+	i/lf w/crlf attr/text=auto CRLF.txt
+	i/lf w/lf attr/text=auto LF.txt
+	i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
+EOF
+	git ls-files --eol |
+	sed -e "s/	/ /g" -e "s/  */ /g" |
+	sort >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.15.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] Introduce git add --renormalize .
  2017-11-16 16:38                     ` [PATCH v3 " tboegi
@ 2017-11-17  1:24                       ` Junio C Hamano
  2017-11-17 20:44                       ` Eric Sunshine
  1 sibling, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2017-11-17  1:24 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

I'll retitle this to

	Subject: add: introduce "--renormalize"

and will queue with s/$old/$new/ that you'll see below.

> From: Torsten Bögershausen <tboegi@web.de>
>
> Make it safer to normalize the line endings in a repository:

s/:$/./;

> Files that had been commited with CRLF will be commited with LF.
>
> The old way to normalize a repo was like this:
>  # Make sure that there are not untracked files
>  $ echo "* text=auto" >.gitattributes
>  $ git read-tree --empty
>  $ git add .
>  $ git commit -m "Introduce end-of-line normalization"
>
> The user must make sure that there are no untracked files,
> otherwise they would have been added and tracked from now on.
>
> The new "add ..renormalize" does not add untracked files:

s/\.\./--/;

>  $ echo "* text=auto" >.gitattributes
>  $ git add --renormalize .
>  $ git commit -m "Introduce end-of-line normalization"
>
> Note that "git add --renormalize <pathspec>" is the short form for
> "git add -u --renormalize <pathspec>".
>
> While add it, document that the same renormalization may be needed,

s/add it/at it/;

> whenever a clean filter is added or changed.
>
> Helped-By: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---

Thanks.

>
> Changes since V2:
>   Add line endings in t0025
>   Use the <<-\EOF pattern
>   Improve the documentation for "git add --renormalize"
>   
>
> Documentation/git-add.txt       |  9 ++++++++-
>  Documentation/gitattributes.txt |  6 ++++--
>  builtin/add.c                   | 28 ++++++++++++++++++++++++++--
>  cache.h                         |  1 +
>  read-cache.c                    | 30 +++++++++++++++++++-----------
>  sha1_file.c                     | 16 ++++++++++++++--
>  t/t0025-crlf-renormalize.sh     | 30 ++++++++++++++++++++++++++++++
>  7 files changed, 102 insertions(+), 18 deletions(-)
>  create mode 100755 t/t0025-crlf-renormalize.sh
>
> diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
> index b700beaff5..d50fa339dc 100644
> --- a/Documentation/git-add.txt
> +++ b/Documentation/git-add.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  [verse]
>  'git add' [--verbose | -v] [--dry-run | -n] [--force | -f] [--interactive | -i] [--patch | -p]
>  	  [--edit | -e] [--[no-]all | --[no-]ignore-removal | [--update | -u]]
> -	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing]
> +	  [--intent-to-add | -N] [--refresh] [--ignore-errors] [--ignore-missing] [--renormalize]
>  	  [--chmod=(+|-)x] [--] [<pathspec>...]
>  
>  DESCRIPTION
> @@ -175,6 +175,13 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
>  	warning (e.g., if you are manually performing operations on
>  	submodules).
>  
> +--renormalize::
> +	Apply the "clean" process freshly to all tracked files to
> +	forcibly add them again to the index.  This is useful after
> +	changing `core.autocrlf` configuration or the `text` attribute
> +	in order to correct files added with wrong CRLF/LF line endings.
> +	This option implies `-u`.
> +
>  --chmod=(+|-)x::
>  	Override the executable bit of the added files.  The executable
>  	bit is only changed in the index, the files on disk are left
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 4c68bc19d5..30687de81a 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -232,8 +232,7 @@ From a clean working directory:
>  
>  -------------------------------------------------
>  $ echo "* text=auto" >.gitattributes
> -$ git read-tree --empty   # Clean index, force re-scan of working directory
> -$ git add .
> +$ git add --renormalize .
>  $ git status        # Show files that will be normalized
>  $ git commit -m "Introduce end-of-line normalization"
>  -------------------------------------------------
> @@ -328,6 +327,9 @@ You can declare that a filter turns a content that by itself is unusable
>  into a usable content by setting the filter.<driver>.required configuration
>  variable to `true`.
>  
> +Note: Whenever the clean filter is changed, the repo should be renormalized:
> +$ git add --renormalize .
> +
>  For example, in .gitattributes, you would assign the `filter`
>  attribute for paths.
>  
> diff --git a/builtin/add.c b/builtin/add.c
> index a648cf4c56..c42b50f857 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -26,6 +26,7 @@ static const char * const builtin_add_usage[] = {
>  };
>  static int patch_interactive, add_interactive, edit_interactive;
>  static int take_worktree_changes;
> +static int add_renormalize;
>  
>  struct update_callback_data {
>  	int flags;
> @@ -123,6 +124,25 @@ int add_files_to_cache(const char *prefix,
>  	return !!data.add_errors;
>  }
>  
> +static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
> +{
> +	int i, retval = 0;
> +
> +	for (i = 0; i < active_nr; i++) {
> +		struct cache_entry *ce = active_cache[i];
> +
> +		if (ce_stage(ce))
> +			continue; /* do not touch unmerged paths */
> +		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
> +			continue; /* do not touch non blobs */
> +		if (pathspec && !ce_path_match(ce, pathspec, NULL))
> +			continue;
> +		retval |= add_file_to_cache(ce->name, flags | HASH_RENORMALIZE);
> +	}
> +
> +	return retval;
> +}
> +
>  static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec, int prefix)
>  {
>  	char *seen;
> @@ -276,6 +296,7 @@ static struct option builtin_add_options[] = {
>  	OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")),
>  	OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files")),
>  	OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")),
> +	OPT_BOOL(0, "renormalize", &add_renormalize, N_("renormalize EOL of tracked files (implies -u)")),
>  	OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
>  	OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
>  	{ OPTION_CALLBACK, 0, "ignore-removal", &addremove_explicit,
> @@ -406,7 +427,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  			  chmod_arg[1] != 'x' || chmod_arg[2]))
>  		die(_("--chmod param '%s' must be either -x or +x"), chmod_arg);
>  
> -	add_new_files = !take_worktree_changes && !refresh_only;
> +	add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
>  	require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
>  
>  	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
> @@ -500,7 +521,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  
>  	plug_bulk_checkin();
>  
> -	exit_status |= add_files_to_cache(prefix, &pathspec, flags);
> +	if (add_renormalize)
> +		exit_status |= renormalize_tracked_files(&pathspec, flags);
> +	else
> +		exit_status |= add_files_to_cache(prefix, &pathspec, flags);
>  
>  	if (add_new_files)
>  		exit_status |= add_files(&dir, flags);
> diff --git a/cache.h b/cache.h
> index 6440e2bf21..3e63a74113 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -686,6 +686,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
>  
>  #define HASH_WRITE_OBJECT 1
>  #define HASH_FORMAT_CHECK 2
> +#define HASH_RENORMALIZE  4
>  extern int index_fd(struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
>  extern int index_path(struct object_id *oid, const char *path, struct stat *st, unsigned flags);
>  
> diff --git a/read-cache.c b/read-cache.c
> index 65f4fe8375..8acd3fcb93 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -631,13 +631,17 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
>  {
>  	int size, namelen, was_same;
>  	mode_t st_mode = st->st_mode;
> -	struct cache_entry *ce, *alias;
> +	struct cache_entry *ce, *alias = NULL;
>  	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_RACY_IS_DIRTY;
>  	int verbose = flags & (ADD_CACHE_VERBOSE | ADD_CACHE_PRETEND);
>  	int pretend = flags & ADD_CACHE_PRETEND;
>  	int intent_only = flags & ADD_CACHE_INTENT;
>  	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
>  			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
> +	int newflags = HASH_WRITE_OBJECT;
> +
> +	if (flags & HASH_RENORMALIZE)
> +		newflags |= HASH_RENORMALIZE;
>  
>  	if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode))
>  		return error("%s: can only add regular files, symbolic links or git-directories", path);
> @@ -678,19 +682,23 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
>  	if (ignore_case) {
>  		adjust_dirname_case(istate, ce->name);
>  	}
> +	if (!(flags & HASH_RENORMALIZE)) {
> +		alias = index_file_exists(istate, ce->name,
> +					  ce_namelen(ce), ignore_case);
> +		if (alias &&
> +		    !ce_stage(alias) &&
> +		    !ie_match_stat(istate, alias, st, ce_option)) {
> +			/* Nothing changed, really */
> +			if (!S_ISGITLINK(alias->ce_mode))
> +				ce_mark_uptodate(alias);
> +			alias->ce_flags |= CE_ADDED;
>  
> -	alias = index_file_exists(istate, ce->name, ce_namelen(ce), ignore_case);
> -	if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, st, ce_option)) {
> -		/* Nothing changed, really */
> -		if (!S_ISGITLINK(alias->ce_mode))
> -			ce_mark_uptodate(alias);
> -		alias->ce_flags |= CE_ADDED;
> -
> -		free(ce);
> -		return 0;
> +			free(ce);
> +			return 0;
> +		}
>  	}
>  	if (!intent_only) {
> -		if (index_path(&ce->oid, path, st, HASH_WRITE_OBJECT)) {
> +		if (index_path(&ce->oid, path, st, newflags)) {
>  			free(ce);
>  			return error("unable to index file %s", path);
>  		}
> diff --git a/sha1_file.c b/sha1_file.c
> index 10c3a0083d..15abb184c2 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -74,6 +74,18 @@ static struct cached_object *find_cached_object(const unsigned char *sha1)
>  	return NULL;
>  }
>  
> +
> +static enum safe_crlf get_safe_crlf(unsigned flags)
> +{
> +	if (flags & HASH_RENORMALIZE)
> +		return SAFE_CRLF_RENORMALIZE;
> +	else if (flags & HASH_WRITE_OBJECT)
> +		return safe_crlf;
> +	else
> +		return SAFE_CRLF_FALSE;
> +}
> +
> +
>  int mkdir_in_gitdir(const char *path)
>  {
>  	if (mkdir(path, 0777)) {
> @@ -1680,7 +1692,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
>  	if ((type == OBJ_BLOB) && path) {
>  		struct strbuf nbuf = STRBUF_INIT;
>  		if (convert_to_git(&the_index, path, buf, size, &nbuf,
> -				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
> +				   get_safe_crlf(flags))) {
>  			buf = strbuf_detach(&nbuf, &size);
>  			re_allocated = 1;
>  		}
> @@ -1714,7 +1726,7 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
>  	assert(would_convert_to_git_filter_fd(path));
>  
>  	convert_to_git_filter_fd(&the_index, path, fd, &sbuf,
> -				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
> +				 get_safe_crlf(flags));
>  
>  	if (write_object)
>  		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
> diff --git a/t/t0025-crlf-renormalize.sh b/t/t0025-crlf-renormalize.sh
> new file mode 100755
> index 0000000000..50d8f3a0f3
> --- /dev/null
> +++ b/t/t0025-crlf-renormalize.sh
> @@ -0,0 +1,30 @@
> +#!/bin/sh
> +
> +test_description='CRLF renormalization'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> +	git config core.autocrlf false &&
> +	printf "LINEONE\nLINETWO\nLINETHREE\n"        >LF.txt &&
> +	printf "LINEONE\r\nLINETWO\r\nLINETHREE\r\n"  >CRLF.txt &&
> +	printf "LINEONE\r\nLINETWO\nLINETHREE\n"      >CRLF_mix_LF.txt &&
> +	git add . &&
> +	git commit -m initial
> +'
> +
> +test_expect_success 'renormalize CRLF in repo' '
> +	echo "*.txt text=auto" >.gitattributes &&
> +	git add --renormalize "*.txt" &&
> +cat >expect <<-\EOF &&
> +	i/lf w/crlf attr/text=auto CRLF.txt
> +	i/lf w/lf attr/text=auto LF.txt
> +	i/lf w/mixed attr/text=auto CRLF_mix_LF.txt
> +EOF
> +	git ls-files --eol |
> +	sed -e "s/	/ /g" -e "s/  */ /g" |
> +	sort >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] Introduce git add --renormalize .
  2017-11-16 16:38                     ` [PATCH v3 " tboegi
  2017-11-17  1:24                       ` Junio C Hamano
@ 2017-11-17 20:44                       ` Eric Sunshine
  2017-11-18  1:47                         ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Eric Sunshine @ 2017-11-17 20:44 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git List

On Thu, Nov 16, 2017 at 11:38 AM,  <tboegi@web.de> wrote:
> Make it safer to normalize the line endings in a repository:
> Files that had been commited with CRLF will be commited with LF.
>
> The old way to normalize a repo was like this:
>  # Make sure that there are not untracked files
>  $ echo "* text=auto" >.gitattributes
>  $ git read-tree --empty
>  $ git add .
>  $ git commit -m "Introduce end-of-line normalization"
>
> The user must make sure that there are no untracked files,
> otherwise they would have been added and tracked from now on.
>
> The new "add ..renormalize" does not add untracked files:
>  $ echo "* text=auto" >.gitattributes
>  $ git add --renormalize .
>  $ git commit -m "Introduce end-of-line normalization"
>
> Note that "git add --renormalize <pathspec>" is the short form for
> "git add -u --renormalize <pathspec>".
>
> While add it, document that the same renormalization may be needed,
> whenever a clean filter is added or changed.

Forgive me for chiming in so late, but as a newcomer to this topic,
the high-level choice made by this patch feels a bit questionable. I
understand that, for people familiar with the "old way" of normalizing
files, git-add might seems like the right place to house this
functionality (and perhaps that's true from an implementation angle?),
but as one coming to this topic with no existing bias about
implementation or the "old way", git-add feels like an odd choice.
This sort of normalization (emptying the index, potentially modifying
files, repopulating the index) seems too high-level for git-add.

I _could_ understand if this functionality lived in, say, a new
command git-attr:

    SYNOPSIS

    git attr renormalize [--no-commit | [-m <msg>]] pathname...
    git attr check [-a | --all | attr…] [--] pathname…
    git attr check --stdin [-z] [-a | --all | attr…]

    DESCRIPTION

    'git attr renormalize'

    Apply the "clean" process ... and commit the results. This is
    useful after changing `core.autocrlf` ... etc. With '-m', uses
    <msg> as the commit message, else launches the editor. Use
    '--no-commit' to skip the automatic commit.

The 'git attr check' command subsumes the role of existing
git-check-attr. One could envision git-attr growing additional
subcommands to edit .gitattributes, much like git-config edits
.gitconfig.

(I have since read the thread in which Junio's suggested[1] that
git-add could house this functionality, but it still feels too
high-level.)

More below...

> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---
> diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
> @@ -175,6 +175,13 @@ for "git add --no-all <pathspec>...", i.e. ignored removed files.
> +--renormalize::
> +       Apply the "clean" process freshly to all tracked files to

This is the only time "clean" appears in git-add documentation. Every
newcomer to git learns about git-add very early on, but "clean
process" is a fairly advanced topic, unlikely to be on a newcomer's
radar. The term "renormalize" also feels out of place in git-add
documentation. If I was a newcomer reading git-add documentation, I
think I'd be left pretty well clueless by this description. At the
very least, perhaps add links to git-attributes and 'core.autocflf'
configuration.

> +       forcibly add them again to the index.  This is useful after
> +       changing `core.autocrlf` configuration or the `text` attribute
> +       in order to correct files added with wrong CRLF/LF line endings.
> +       This option implies `-u`.

[1]: https://public-inbox.org/git/xmqqbmlm9y94.fsf@gitster.mtv.corp.google.com/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 1/1] Introduce git add --renormalize .
  2017-11-17 20:44                       ` Eric Sunshine
@ 2017-11-18  1:47                         ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2017-11-18  1:47 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Torsten Bögershausen, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

> I _could_ understand if this functionality lived in, say, a new
> command git-attr:
> ...
> (I have since read the thread in which Junio's suggested[1] that
> git-add could house this functionality, but it still feels too
> high-level.)

If this were part of a hypothetical "git attr" command, I would have
a hard time understanding it.  I would view the "attr" as attributes
attached to each path, telling various things about the path, one of
which may be how contents are handled between Git and the file on
the filesystem.  In other words, "attr" is _not_ contents; it is a
set of attributes attached to path that house the contents.

What we are correcting here at this point in the expected usecase is
not an attribute.  We are correting the contents, and between the
two potential sources of "the right version" of the contents, the
user tells that the filesystem is the right one, so "add" is used
to correct mistakes in the index based on what is in the filesystem
(if we were correcting the other way, like Jonathan said in the
message in the thread you read, "checkout" would be used to correct
the filesystem version using what is in the index).



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2017-10-04  2:00       ` Junio C Hamano
  2017-10-04 16:26         ` Robert Dailey
@ 2018-02-15 15:24         ` Robert Dailey
  2018-02-15 19:16           ` Junio C Hamano
  2018-02-16 16:34           ` Torsten Bögershausen
  1 sibling, 2 replies; 34+ messages in thread
From: Robert Dailey @ 2018-02-15 15:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, Git

On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
>>> $ git rm -r --cached . && git add .
>>
>> (Both should work)
>>
>> To be honest, from the documentation, I can't figure out the difference between
>> $ git read-tree --empty
>> and
>> $ git rm -r --cached .
>>
>> Does anybody remember the discussion, why we ended up with read-tree ?
>
> We used to use neither, and considered it fine to "rm .git/index" if
> you wanted to empty the on-disk index file in the old world.  In the
> modern world, folks want you to avoid touching filesystem directly
> and instead want you to use Git tools, and the above are two obvious
> ways to do so.
>
> "git read-tree" (without any parameter, i.e. "read these 0 trees and
> populate the index with it") and its modern and preferred synonym
> "git read-tree --empty" (i.e. "I am giving 0 trees and I know the
> sole effect of this command is to empty the index.") are more direct
> ways to express "I want the index emptied" between the two.
>
> The other one, "git rm -r --cached .", in the end gives you the same
> state because it tells Git to "iterate over all the entries in the
> index, find the ones that match pathspec '.', and remove them from
> the index.".  It is not wrong per-se, but conceptually it is a bit
> roundabout way to say that "I want the index emptied", I would
> think.
>
> I wouldn't be surprised if the "rm -r --cached ." were a lot slower,
> due to the overhead of having to do the pathspec filtering that ends
> up to be a no-op, but there shouldn't be a difference in the end
> result.

Sorry to bring this old thread back to life, but I did notice that
this causes file modes to reset back to 644 (from 755) on Windows
version of Git. Is there a way to `$ git read-tree --empty && git add
.` without mucking with file permissions?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2018-02-15 15:24         ` Line ending normalization doesn't work as expected Robert Dailey
@ 2018-02-15 19:16           ` Junio C Hamano
  2018-02-15 21:47             ` Robert Dailey
  2018-02-16 16:34           ` Torsten Bögershausen
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2018-02-15 19:16 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Torsten Bögershausen, Git

Robert Dailey <rcdailey.lists@gmail.com> writes:

> On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Torsten Bögershausen <tboegi@web.de> writes:
>>
>>>> $ git rm -r --cached . && git add .
>>>
>>> (Both should work)
>>>
>>> To be honest, from the documentation, I can't figure out the difference between
>>> $ git read-tree --empty
>>> and
>>> $ git rm -r --cached .
>>>
>>> Does anybody remember the discussion, why we ended up with read-tree ?
>> ...
>
> Sorry to bring this old thread back to life, but I did notice that
> this causes file modes to reset back to 644 (from 755) on Windows
> version of Git. Is there a way to `$ git read-tree --empty && git add
> .` without mucking with file permissions?

I think the message you are referring to is a tangent that discusses
how it was done in the old world, with issues that come from the
fact that with such an approach the paths are first removed from the
index and then added afresh to the index, which can lose cases and
executable bits when working on a filesystem that does not retain
enough information.

The way in the new world is to use "add --renormalize" which was
added at 9472935d ("add: introduce "--renormalize"", 2017-11-16), I
think.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2018-02-15 19:16           ` Junio C Hamano
@ 2018-02-15 21:47             ` Robert Dailey
  0 siblings, 0 replies; 34+ messages in thread
From: Robert Dailey @ 2018-02-15 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, Git

On Thu, Feb 15, 2018 at 1:16 PM, Junio C Hamano <gitster@pobox.com> wrote:
> I think the message you are referring to is a tangent that discusses
> how it was done in the old world, with issues that come from the
> fact that with such an approach the paths are first removed from the
> index and then added afresh to the index, which can lose cases and
> executable bits when working on a filesystem that does not retain
> enough information.
>
> The way in the new world is to use "add --renormalize" which was
> added at 9472935d ("add: introduce "--renormalize"", 2017-11-16), I
> think.

Oh I didn't realize someone actually did it. If so, that's awesome.
Thanks Junio!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2018-02-15 15:24         ` Line ending normalization doesn't work as expected Robert Dailey
  2018-02-15 19:16           ` Junio C Hamano
@ 2018-02-16 16:34           ` Torsten Bögershausen
  2018-02-16 17:19             ` Robert Dailey
  1 sibling, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2018-02-16 16:34 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Junio C Hamano, Git

On Thu, Feb 15, 2018 at 09:24:40AM -0600, Robert Dailey wrote:
> On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano <gitster@pobox.com> wrote:

[]
> 
> Sorry to bring this old thread back to life, but I did notice that
> this causes file modes to reset back to 644 (from 755) on Windows
> version of Git. Is there a way to `$ git read-tree --empty && git add
> .` without mucking with file permissions?

No problem with the delay, under the time we had the chance to improve Git:

>Git 2.16 Release Notes
>======================
>[]
>* "git add --renormalize ." is a new and safer way to record the fact
>   that you are correcting the end-of-line convention and other
>   "convert_to_git()" glitches in the in-repository data.

Could you upgrade to Git 2.16.1 (or higher, just take the latest)
and try with
git add --renormalize .
?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Line ending normalization doesn't work as expected
  2018-02-16 16:34           ` Torsten Bögershausen
@ 2018-02-16 17:19             ` Robert Dailey
  0 siblings, 0 replies; 34+ messages in thread
From: Robert Dailey @ 2018-02-16 17:19 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Junio C Hamano, Git

On Fri, Feb 16, 2018 at 10:34 AM, Torsten Bögershausen <tboegi@web.de> wrote:
> On Thu, Feb 15, 2018 at 09:24:40AM -0600, Robert Dailey wrote:
>> On Tue, Oct 3, 2017 at 9:00 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> []
>>
>> Sorry to bring this old thread back to life, but I did notice that
>> this causes file modes to reset back to 644 (from 755) on Windows
>> version of Git. Is there a way to `$ git read-tree --empty && git add
>> .` without mucking with file permissions?
>
> No problem with the delay, under the time we had the chance to improve Git:
>
>>Git 2.16 Release Notes
>>======================
>>[]
>>* "git add --renormalize ." is a new and safer way to record the fact
>>   that you are correcting the end-of-line convention and other
>>   "convert_to_git()" glitches in the in-repository data.
>
> Could you upgrade to Git 2.16.1 (or higher, just take the latest)
> and try with
> git add --renormalize .
> ?

Thanks for the response. Unfortunately I've deliberately been stuck on
v2.13 because of [a bug in Git for Windows][1] that hasn't yet been
resolved (it's a bug *somewhere*, not sure if it's git related or
not). Curly braces in aliases are being stripped which makes newer
releases unusable for me. I'll try upgrading on a different machine
and see if renormalize works for the case of binary files with file
modes set to 755.

[1]: https://github.com/git-for-windows/git/issues/1220

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-02-16 17:19 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-03 15:00 Line ending normalization doesn't work as expected Robert Dailey
2017-10-03 16:26 ` Torsten Bögershausen
2017-10-03 17:23   ` Robert Dailey
2017-10-03 19:19     ` Torsten Bögershausen
2017-10-04  2:00       ` Junio C Hamano
2017-10-04 16:26         ` Robert Dailey
2017-10-04 16:59           ` Jonathan Nieder
2017-10-04 18:03             ` Robert Dailey
2017-10-05  1:31             ` Junio C Hamano
2017-10-05  1:46               ` Jonathan Nieder
2017-10-04 21:17           ` Torsten Bögershausen
2017-10-05  1:38             ` Junio C Hamano
2017-10-05  3:31               ` Junio C Hamano
2017-10-05 21:42                 ` Torsten Bögershausen
2017-10-06  0:33                   ` Junio C Hamano
2017-10-06 17:58                     ` Torsten Bögershausen
2017-10-16 16:49                 ` [PATCH v1 1/1] Introduce git add --renormalize tboegi
2017-10-16 17:34                   ` Junio C Hamano
2017-10-30 16:29                     ` [PATCH v2 " tboegi
2017-11-07  5:50                       ` Junio C Hamano
2017-11-07 17:26                         ` Torsten Bögershausen
2017-11-08  0:37                           ` Junio C Hamano
2017-11-09 18:47                             ` Torsten Bögershausen
2017-11-10  0:22                               ` Junio C Hamano
2017-11-12 20:08                                 ` Torsten Bögershausen
2017-11-16 16:38                     ` [PATCH v3 " tboegi
2017-11-17  1:24                       ` Junio C Hamano
2017-11-17 20:44                       ` Eric Sunshine
2017-11-18  1:47                         ` Junio C Hamano
2018-02-15 15:24         ` Line ending normalization doesn't work as expected Robert Dailey
2018-02-15 19:16           ` Junio C Hamano
2018-02-15 21:47             ` Robert Dailey
2018-02-16 16:34           ` Torsten Bögershausen
2018-02-16 17:19             ` Robert Dailey

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).