git@vger.kernel.org mailing list mirror (one of many)
* Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo
From: Philip, Bevan @ 2022-05-31 14:24 UTC
  To: git@vger.kernel.org

Hello all,

I've experienced an odd bug/limitation with `git add --renormalize`, requiring me to run the command twice on a specific file. Here is a bug report.

What did you do before the bug happened? (Steps to reproduce your issue)

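#!/bin/bash -x
# Create a file whose lines end in \r\r\n, and an attributes file marking *.bdf as text
printf "Test\\r\\r\\nTest Another Line\\r\\r\\nFinal Line\\r\\r\\n\\r\\r\\n" > git.bdf
printf "* text=auto\\n*.bdf text" > .gitattributes
mkdir test1
cd test1
git init
# Commit the file before any attributes exist (initial state: \r\r\n in the object)
cp ../git.bdf .
git add .
git status
git commit -m "Add file git.bdf"
# Add the attributes and renormalize (second state: \r\n in the object)
cp ../.gitattributes .
git add .gitattributes
git add --renormalize .
git status
git commit -m "Renormalize git.bdf"
# A second renormalize finds nothing to change
git add --renormalize .
git status
# Restore the file from the repository and renormalize once more
# (third state: \n in the object)
rm git.bdf
git restore .
git add --renormalize .
git status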

What did you expect to happen? (Expected behavior)
Only needing to renormalize the file once.

What happened instead? (Actual behavior)
Renormalize the file once, then renormalize again after deleting the file that is checked out on disk and restoring it from the object stored within the Git repo.

What's different between what you expected and what actually happened?
Needed to run the renormalize step again, after deleting the file checked out on disk and restoring the file from the object stored within the Git repo.

Anything else you want to add:
This only occurs for files with \r\r\n line endings (and possibly also ending the file with \r\r\n\r\n)

The file is in three states:
- Initial state: \r\r\n line endings within Git object
- Initial renormalization state: \r\n line endings within Git object
- Second renormalization state: \n line endings within Git object

Happens on both Windows and Linux (replicated on a fresh install of Git for Windows within Windows Sandbox). Also tested with the `next` branch on Linux.
System info is for a Windows build where it does happen.

Directory and file names should be irrelevant.

We encountered this naturally, with some files within an SVN repo we're migrating.

[System Info]
git version:
git version 2.36.1.windows.1
cpu: x86_64
built from commit: e2ff68a2d1426758c78d023f863bfa1e03cbc768
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Windows 10.0 19043
compiler info: gnuc: 11.3
libc info: no libc information available
$SHELL (typically, interactive shell): <unset>

Thanks,
Bevan
This communication contains information which is confidential and may also be privileged. It is for the exclusive use of the intended recipient(s). If you are not the intended recipient(s), please note that any distribution, copying, or use of this communication or the information in it, is strictly prohibited. If you have received this communication in error please notify us by e-mail and then delete the e-mail and any copies of it.
Software AG (UK) Limited Registered in England & Wales 1310740 - http://www.softwareag.com/uk


* Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo
From: Philip Oakley @ 2022-05-31 21:11 UTC
  To: Philip, Bevan, git@vger.kernel.org

On 31/05/2022 15:24, Philip, Bevan wrote:
> Hello all,
>
> I've experienced an odd bug/limitation with `git add --renormalize`, requiring me to run the command twice on a specific file. Here is a bug report.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> #!/bin/bash -x
> printf "Test\\r\\r\\nTest Another Line\\r\\r\\nFinal Line\\r\\r\\n\\r\\r\\n" > git.bdf
> printf "* text=auto\\n*.bdf text" > .gitattributes
> mkdir test1
> cd test1
> git init
> cp ../git.bdf .
> git add .
> git status
> git commit -m "Add file git.bdf"
> cp ../.gitattributes .
> git add .gitattributes
> git add --renormalize .
> git status
> git commit -m "Renormalize git.bdf"
> git add --renormalize .
> git status
> rm git.bdf
> git restore .
> git add --renormalize .
> git status
>
> What did you expect to happen? (Expected behavior)
> Only needing to renormalize the file once.

That sounds like an obvious expectation, ...
> What happened instead? (Actual behavior)
> Renormalize the file once, then renormalize again after deleting the file that is checked out on disk and restoring it from the object stored within the Git repo.
>
> What's different between what you expected and what actually happened?
> Needed to run the renormalize step again, after deleting the file checked out on disk and restoring the file from the object stored within the Git repo.
>
> Anything else you want to add:
> This only occurs for files with \r\r\n line endings (and possibly also ending the file with \r\r\n\r\n)

... however, if I remember the design discussion correctly,
normalisation was decided to be just the conversion of the Windows-style
EOL (`\r\n`) to the Linux/*nix-style EOL (`\n`); any other characters
(UTF-8 / ASCII bytes) were to be left unchanged, including random '\r'
characters. So in that respect I think it is working as initially designed.
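The rule described above can be modelled in a few lines of shell. This is a minimal sketch, not Git's own code: it assumes the clean step for a `text` file deletes a CR only when the very next byte is an LF, and copies every other byte, including a lone CR, unchanged. `strip_cr_before_lf` and `hexbytes` are hypothetical helper names. Feeding the rule its own output reproduces the three states listed in the report, which shows that one pass is not idempotent for `\r\r\n` input.

```shell
#!/bin/sh
# Toy model of the clean-side rule (an approximation, not Git's code): a CR
# is deleted only when the next byte is an LF; every other byte, including a
# lone CR, is copied unchanged.
cr=$(printf '\r')

strip_cr_before_lf() {
    # sed matches each line without its trailing LF, so a CR at the end of
    # the pattern space is exactly a CR that was followed by an LF. Assumes
    # the input ends with an LF, as the file in the report does.
    sed "s/${cr}\$//"
}

# Print the bytes read from stdin as one hex string, e.g. "610a" for "a\n".
hexbytes() {
    od -An -v -tx1 | tr -d ' \n'
}

# One line of the reported file, after zero, one and two passes.
state0=$(printf 'Test\r\r\n' | hexbytes)
state1=$(printf 'Test\r\r\n' | strip_cr_before_lf | hexbytes)
state2=$(printf 'Test\r\r\n' | strip_cr_before_lf | strip_cr_before_lf | hexbytes)

echo "initial:        $state0"   # 546573740d0d0a  (Test \r \r \n)
echo "after 1 pass:   $state1"   # 546573740d0a    (Test \r \n)
echo "after 2 passes: $state2"   # 546573740a      (Test \n)
```

Each pass removes at most one CR per line, so under this model a run of n CRs before an LF needs n passes before only the LF is left.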

> The file is in three states:
> - Initial state: \r\r\n line endings within Git object
> - Initial renormalization state: \r\n line endings within Git object
> - Second renormalization state: \n line endings within Git object
>
> Happens on both Windows and Linux (replicated on a fresh install of Git for Windows within Windows Sandbox). Additionally, tested with `next` trunk on Linux.
> System info is for a Windows build where it does happen.
>
> Directory, and file names should be irrelevant.
>
> We encountered this naturally, with some files within a SVN repo we're migrating.

Do you have any information on how the mixed EOL styles (extra \r etc) 
came about?
Should those extra \r characters also be separate EOLs? (and how to 
decide..?)
Are the docs missing anything that would have helped clarify the issue 
earlier?
>
> [System Info]
> git version:
> git version 2.36.1.windows.1
> cpu: x86_64
> built from commit: e2ff68a2d1426758c78d023f863bfa1e03cbc768
> sizeof-long: 4
> sizeof-size_t: 8
> shell-path: /bin/sh
> feature: fsmonitor--daemon
> uname: Windows 10.0 19043
> compiler info: gnuc: 11.3
> libc info: no libc information available
> $SHELL (typically, interactive shell): <unset>
>
>
--
Philip


* RE: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo
From: Philip, Bevan @ 2022-06-01 10:07 UTC
  To: Philip Oakley, git@vger.kernel.org

Hey Philip,

Thanks for the response!

> ... however, if I remember the design discussion correctly, normalisation was decided to be just the conversion of the Windows style EOL = `\r\n` to the Linux/*nix style EOL =`\n`, and any other characters
> (utf8 / ascii bytes) were to be unchanged, including random '\r'
> characters. So in that respect I think it is working as initially designed.

This makes sense.

> Do you have any information on how the mixed EOL styles (extra \r etc) came about?

I wish I knew how this file came about, but the people who put these files in our VCS have long since left. I suspect a broken generation tool.

> Should those extra \r characters also be separate EOLs? (and how to
> decide..?)

Most tooling I use seems to do this, but I agree that this is an ambiguous topic.
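One way to frame the "how to decide" question is by how many line endings a run like `\r\r\n` represents. Treating every extra CR as its own line ending is what the `sed` command suggested later in this thread does. The other reading treats the whole run of CRs in front of an LF as a single line ending, which reaches the report's final `\n` state in one pass and is idempotent. A minimal sketch of that second reading, assuming the input ends with an LF; `collapse_cr_run` and `hexbytes` are hypothetical helper names, not Git features:

```shell
#!/bin/sh
# Hypothetical pre-processing filter (not part of Git): treat any run of CRs
# sitting directly before an LF as part of one line ending.
cr=$(printf '\r')

collapse_cr_run() {
    # sed removes each line's LF before matching, so CRs at the end of the
    # pattern space are exactly the CRs that preceded that LF. "${cr}*" also
    # matches zero CRs, in which case the line is left alone.
    sed "s/${cr}*\$//"
}

# Print the bytes read from stdin as one hex string.
hexbytes() {
    od -An -v -tx1 | tr -d ' \n'
}

once=$(printf 'Test\r\r\nLine\r\n' | collapse_cr_run | hexbytes)
twice=$(printf 'Test\r\r\nLine\r\n' | collapse_cr_run | collapse_cr_run | hexbytes)
echo "$once"   # 546573740a4c696e650a, i.e. "Test\nLine\n"
[ "$once" = "$twice" ] && echo "idempotent"
```

Which reading is right depends on how the files were produced, which is exactly the ambiguity discussed here.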

> Are the docs missing anything that would have helped clarify the issue earlier?

A brief note on the limitations of renormalization might have proven helpful. In particular, the bit that tripped me up was the need to remove the files and restore them from the Git repository itself; it wasn't obvious to me that this would have any impact on renormalization. A note that only \r\n is converted to \n would also have been useful.

Thanks,
Bevan


-----Original Message-----
From: Philip Oakley <philipoakley@iee.email>
Sent: 31 May 2022 22:12
To: Philip, Bevan <Bevan.Philip@softwareag.com>; git@vger.kernel.org
Subject: Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo



* Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo
From: Philip Oakley @ 2022-06-03 13:14 UTC
  To: Philip, Bevan, git@vger.kernel.org

On 01/06/2022 11:07, Philip, Bevan wrote:
> Hey Philip,
>
> Thanks for the response!
>
>> ... however, if I remember the design discussion correctly, normalisation was decided to be just the conversion of the Windows style EOL = `\r\n` to the Linux/*nix style EOL =`\n`, and any other characters
>> (utf8 / ascii bytes) were to be unchanged, including random '\r'
>> characters. So in that respect I think it is working as initially designed.
> This makes sense.
>
>> Do you have any information on how the mixed EOL styles (extra \r etc) came about?
> I wish I knew how this file came about, but the people that put these files in our VCS have long left. I suspect some broken generation tool.

I vaguely remember tales that early Macs used \r as their EOL character,
so it may have been that.
>
>> Should those extra \r characters also be separate EOLs? (and how to
>> decide..?)
> Most tooling I use seems to do this, but I agree that this is an ambiguous topic.
maybe an extra `sed` invocation changing all the \r to \n in such cases!
>
>> Are the docs missing anything that would have helped clarify the issue earlier?
> A brief note on the limitations of renormalization might have proven helpful
I may add that to my to-do list (though it's a bit long and
aspirational ;-)

>  - in particular, the bit that tripped me up was the requirement to remove and restore the files from the Git repository itself.
I think it's just a checkout followed by an `add` of the renormalised
files, `git add --renormalize .` (not forgetting the all-important dot),
though some may describe the checkout as the files being 'removed'.
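A toy model may help explain why the restore step matters. It assumes, rather than proves, three things: that `git add --renormalize` re-runs the clean step on the working-tree copy of each tracked file (which is how the `git add` documentation describes the option); that the clean step deletes a CR only when it is directly followed by an LF; and that checkout writes this blob back unchanged, which is plausible here because every LF in the blob already has a CR in front of it. The temporary files and the `clean`, `renormalize` and `restore` functions are hypothetical stand-ins, not Git itself.

```shell
#!/bin/sh
# Toy model (not Git itself): one file on disk, one "index" copy of it.
cr=$(printf '\r')
clean() { sed "s/${cr}\$//"; }                 # assumed CR-before-LF clean step
hexbytes() { od -An -v -tx1 | tr -d ' \n'; }   # bytes as one hex string

dir=$(mktemp -d)
worktree="$dir/worktree"
index="$dir/index"

printf 'Test\r\r\n' > "$worktree"   # bytes on disk, as in the report
cp "$worktree" "$index"             # first commit, before .gitattributes existed

renormalize() { clean < "$worktree" > "$index"; }   # re-clean the working-tree copy
restore()     { cp "$index" "$worktree"; }          # checkout writes the blob back as-is

renormalize; first=$(hexbytes < "$index")    # \r\r\n -> \r\n in the index
renormalize; second=$(hexbytes < "$index")   # same input from disk, same result
restore;     renormalize
third=$(hexbytes < "$index")                 # disk now holds \r\n, index gets \n

rm -r "$dir"
echo "$first $second $third"
```

Under these assumptions the second `git add --renormalize .` in the report changes nothing because it re-cleans the same `\r\r\n` bytes from disk; only after the restore does the working tree hold `\r\n`, which the clean step then reduces to `\n`.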

I did notice (when cross-checking a few points) that there is also a
`merge.renormalize` config option, which makes sure that you get the
required re-normalisation when branches are merged (check the man
pages).

>  It wasn't obvious to me that this would have any impact on renormalization. Additionally, a note about the restriction on converting only \r\n to \n might also have proven useful.

OK.

PS: in-line replies are preferred on the list.
>
> Thanks,
> Bevan
>
>
> -----Original Message-----
> From: Philip Oakley <philipoakley@iee.email>
> Sent: 31 May 2022 22:12
> To: Philip, Bevan <Bevan.Philip@softwareag.com>; git@vger.kernel.org
> Subject: Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo



* Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo
From: Philip Oakley @ 2022-06-05 13:55 UTC
  To: Philip, Bevan, git@vger.kernel.org

On 03/06/2022 14:14, Philip Oakley wrote:
> On 01/06/2022 11:07, Philip, Bevan wrote:
>> Hey Philip,
>>
>> Thanks for the response!
>>
>>> ... however, if I remember the design discussion correctly, normalisation was decided to be just the conversion of the Windows style EOL = `\r\n` to the Linux/*nix style EOL =`\n`, and any other characters
>>> (utf8 / ascii bytes) were to be unchanged, including random '\r'
>>> characters. So in that respect I think it is working as initially designed.
>> This makes sense.
>>
>>> Do you have any information on how the mixed EOL styles (extra \r etc) came about?
>> I wish I knew how this file came about, but the people that put these files in our VCS have long left. I suspect some broken generation tool.
> I vaguely remember tales that early Macs use \r as their EOL character,
> so may have been that.
>>> Should those extra \r characters also be separate EOLs? (and how to
>>> decide..?)
>> Most tooling I use seems to do this, but I agree that this is an ambiguous topic.
> maybe an extra `sed` invocation changing all the \r to \n in such cases!

It looks like StackOverflow has an answer
https://stackoverflow.com/a/42914886/717355
with an all-at-once conversion filter using sed (with explanation!):

    $ sed -i 's/\r/\n/g; s/\n$//'

I believe it's idempotent (great word to know ;-)

>>> Are the docs missing anything that would have helped clarify the issue earlier?
>> A brief note on the limitations of renormalization might have proven helpful
> I'll maybe add that to my list of todo's (though it's a bit long and
> aspirational;-)
>
>>  - in particular, the bit that tripped me up was the requirement to remove and restore the files from the Git repository itself.
> I think it's just a checkout and then an `add` of the renormalised files
> `git add --renormalize . ` (not forgetting the all important `dot`),
> though some may have termed the checkout as the files being 'removed'.
>
> I did notice (when cross checking a few points) that there is also a
> `merge.renormalize` config option that will then make sure that when
> branches are merged you get the required re-normalisation (check the man
> pages ..).
>
>>  It wasn't obvious to me that this would have any impact on renormalization. Additionally, a note about the restriction on converting only \r\n to \n might also have proven useful.
> OK.
>
> PS, in-line replies preferred on the list.
>> Thanks,
>> Bevan
>>
>>
>> -----Original Message-----
>> From: Philip Oakley <philipoakley@iee.email>
>> Sent: 31 May 2022 22:12
>> To: Philip, Bevan <Bevan.Philip@softwareag.com>; git@vger.kernel.org
>> Subject: Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo


Thread overview: 5+ messages
2022-05-31 14:24 Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo Philip, Bevan
2022-05-31 21:11 ` Philip Oakley
2022-06-01 10:07   ` Philip, Bevan
2022-06-03 13:14     ` Philip Oakley
2022-06-05 13:55       ` Philip Oakley
