git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`
@ 2017-08-01 18:24 Anthony Sottile
  2017-08-01 20:47 ` Torsten Bögershausen
  0 siblings, 1 reply; 36+ messages in thread
From: Anthony Sottile @ 2017-08-01 18:24 UTC (permalink / raw)
  To: Git Mailing List

Here's my minimal reproduction -- it's slightly far-fetched in that it
involves *committing crlf* and
then using `autocrlf=true` (commit lf, check out crlf).

```
#!/bin/bash
set -ex

rm -rf foo
git init foo
cd foo

# Commit crlf into repository
git config --local core.autocrlf false
python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
git add foo
git commit -m "Initial commit with crlf"

# Change whitespace mode to autocrlf, "commit lf, checkout crlf"
git config --local core.autocrlf true
python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'

# Generate a patch, check it out, restore it
git diff --ignore-submodules --binary --no-color --no-ext-diff > patch
python3 -c 'print(open("patch", "rb").read())'
git checkout -- .
# I expect this to succeed, it fails
git apply patch
```

And here's the output:

```
+ rm -rf foo
+ git init foo
Initialized empty Git repository in /tmp/foo/.git/
+ cd foo
+ git config --local core.autocrlf false
+ python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
+ git add foo
+ git commit -m 'Initial commit with crlf'
[master (root-commit) 02d3246] Initial commit with crlf
 1 file changed, 2 insertions(+)
 create mode 100644 foo
+ git config --local core.autocrlf true
+ python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'
+ git diff --ignore-submodules --binary --no-color --no-ext-diff
+ python3 -c 'print(open("patch", "rb").read())'
b'diff --git a/foo b/foo\nindex bd956ea..fbf7936 100644\n---
a/foo\n+++ b/foo\n@@ -1,2 +1,5 @@\n 1\r\n 2\r\n+\r\n+\r\n+\r\n'
+ git checkout -- .
+ git apply patch
patch:8: trailing whitespace.

patch:9: trailing whitespace.

patch:10: trailing whitespace.

error: patch failed: foo:1
error: foo: patch does not apply
```

I also tried with `git apply --ignore-whitespace`, but this causes the
line endings of the existing contents to be changed to *lf* (there may
be two bugs here?)

Thanks,

Anthony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`
  2017-08-01 18:24 core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Anthony Sottile
@ 2017-08-01 20:47 ` Torsten Bögershausen
  2017-08-01 20:58   ` Anthony Sottile
  0 siblings, 1 reply; 36+ messages in thread
From: Torsten Bögershausen @ 2017-08-01 20:47 UTC (permalink / raw)
  To: Anthony Sottile, Git Mailing List



On 08/01/2017 08:24 PM, Anthony Sottile wrote:
> Here's my minimal reproduction -- it's slightly far-fetched in that it
> involves*committing crlf*  and
> then using `autocrlf=true` (commit lf, check out crlf).
>
> ```
> #!/bin/bash
> set -ex
>
> rm -rf foo
> git init foo
> cd foo
>
> # Commit crlf into repository
> git config --local core.autocrlf false
> python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
> git add foo
> git commit -m "Initial commit with crlf"
>
> # Change whitespace mode to autocrlf, "commit lf, checkout crlf"
> git config --local core.autocrlf true
> python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'
>
> # Generate a patch, check it out, restore it
> git diff --ignore-submodules --binary --no-color --no-ext-diff > patch
> python3 -c 'print(open("patch", "rb").read())'
> git checkout -- .
> # I expect this to succeed, it fails
> git apply patch
> ```
>
> And here's the output:
>
> ```
> + rm -rf foo
> + git init foo
> Initialized empty Git repository in/tmp/foo/.git/
> + cd foo
> + git config --local core.autocrlf false
> + python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
> + git add foo
> + git commit -m 'Initial commit with crlf'
> [master (root-commit) 02d3246] Initial commit with crlf
>   1 file changed, 2 insertions(+)
>   create mode 100644 foo
> + git config --local core.autocrlf true
> + python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'
> + git diff --ignore-submodules --binary --no-color --no-ext-diff
> + python3 -c 'print(open("patch", "rb").read())'
> b'diff --git a/foo b/foo\nindex bd956ea..fbf7936 100644\n---
> a/foo\n+++ b/foo\n@@ -1,2 +1,5 @@\n 1\r\n 2\r\n+\r\n+\r\n+\r\n'
> + git checkout -- .
> + git apply patch
> patch:8: trailing whitespace.
>
> patch:9: trailing whitespace.
>
> patch:10: trailing whitespace.
>
> error: patch failed: foo:1
> error: foo: patch does not apply
> ```
>
> I also tried with `git apply --ignore-whitespace`, but this causes the
> line endings of the existing contents to be changed to*lf*  (there may
> be two bugs here?)
>
> Thanks,
>
> Anthony


I can reproduce you test case here.

The line

git apply patch

would succeed, if you temporally (for the runtime of the apply command) set
core.autocrlf to false:

git -c core.autocrlf=false apply patch

So this seems to be a bug (in a corner case ?):

Typically repos which had been commited with CRLF should be normalized,
which means that the CRLF in the repo are replaced by LF.
So you test script is a corner case, for which Git has not been designed,
It seems as if "git apply" gets things wrong here.
Especially, as the '\r' is not a whitespace as a white space. but part
of the line ending.
So in my understanding the "--ignore-whitespace" option shouldn't affect
the line endings at all.

Fixes are possible, does anyone have a clue, why the '\r' is handled
like this in apply ?

And out of interest: is this a real life problem ?



  



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`
  2017-08-01 20:47 ` Torsten Bögershausen
@ 2017-08-01 20:58   ` Anthony Sottile
  2017-08-02 15:44     ` Torsten Bögershausen
  0 siblings, 1 reply; 36+ messages in thread
From: Anthony Sottile @ 2017-08-01 20:58 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git Mailing List

Here's where I'm hitting the problem described:
https://github.com/pre-commit/pre-commit/issues/570

Note that `git -c core.autocrlf=false` apply patch fixes this
situation, but breaks others.

Here's a testcase where `git -c core.autocrlf=false apply patch`
causes a *different* patch failure:

```
#!/bin/bash
set -ex

rm -rf foo
git init foo
cd foo

git config --local core.autocrlf true

# Commit lf into repository
python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
git add foo
python3 -c 'open("foo", "wb").write(b"3\n4\n")'

# Generate a patch, check it out, restore it
git diff --ignore-submodules --binary --no-color --no-ext-diff > patch
python3 -c 'print(open("patch", "rb").read())'
git checkout -- .
git -c core.autocrlf=false apply patch
```

output:

```
+ rm -rf foo
+ git init foo
Initialized empty Git repository in /tmp/foo/.git/
+ cd foo
+ git config --local core.autocrlf true
+ python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
+ git add foo
+ python3 -c 'open("foo", "wb").write(b"3\n4\n")'
+ git diff --ignore-submodules --binary --no-color --no-ext-diff
warning: LF will be replaced by CRLF in foo.
The file will have its original line endings in your working directory.
+ python3 -c 'print(open("patch", "rb").read())'
b'diff --git a/foo b/foo\nindex 1191247..b944734 100644\n---
a/foo\n+++ b/foo\n@@ -1,2 +1,2 @@\n-1\n-2\n+3\n+4\n'
+ git checkout -- .
+ git -c core.autocrlf=false apply patch
error: patch failed: foo:1
```

My current workaround is:
- try `git apply patch`
- try `git -c core.autocrlf=false apply patch`

which seems to work pretty well.

Anthony


On Tue, Aug 1, 2017 at 1:47 PM, Torsten Bögershausen <tboegi@web.de> wrote:
>
>
> On 08/01/2017 08:24 PM, Anthony Sottile wrote:
>>
>> Here's my minimal reproduction -- it's slightly far-fetched in that it
>> involves*committing crlf*  and
>>
>> then using `autocrlf=true` (commit lf, check out crlf).
>>
>> ```
>> #!/bin/bash
>> set -ex
>>
>> rm -rf foo
>> git init foo
>> cd foo
>>
>> # Commit crlf into repository
>> git config --local core.autocrlf false
>> python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
>> git add foo
>> git commit -m "Initial commit with crlf"
>>
>> # Change whitespace mode to autocrlf, "commit lf, checkout crlf"
>> git config --local core.autocrlf true
>> python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'
>>
>> # Generate a patch, check it out, restore it
>> git diff --ignore-submodules --binary --no-color --no-ext-diff > patch
>> python3 -c 'print(open("patch", "rb").read())'
>> git checkout -- .
>> # I expect this to succeed, it fails
>> git apply patch
>> ```
>>
>> And here's the output:
>>
>> ```
>> + rm -rf foo
>> + git init foo
>> Initialized empty Git repository in/tmp/foo/.git/
>> + cd foo
>> + git config --local core.autocrlf false
>> + python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n")'
>> + git add foo
>> + git commit -m 'Initial commit with crlf'
>> [master (root-commit) 02d3246] Initial commit with crlf
>>   1 file changed, 2 insertions(+)
>>   create mode 100644 foo
>> + git config --local core.autocrlf true
>> + python3 -c 'open("foo", "wb").write(b"1\r\n2\r\n\r\n\r\n\r\n")'
>> + git diff --ignore-submodules --binary --no-color --no-ext-diff
>> + python3 -c 'print(open("patch", "rb").read())'
>> b'diff --git a/foo b/foo\nindex bd956ea..fbf7936 100644\n---
>> a/foo\n+++ b/foo\n@@ -1,2 +1,5 @@\n 1\r\n 2\r\n+\r\n+\r\n+\r\n'
>> + git checkout -- .
>> + git apply patch
>> patch:8: trailing whitespace.
>>
>> patch:9: trailing whitespace.
>>
>> patch:10: trailing whitespace.
>>
>> error: patch failed: foo:1
>> error: foo: patch does not apply
>> ```
>>
>> I also tried with `git apply --ignore-whitespace`, but this causes the
>> line endings of the existing contents to be changed to*lf*  (there may
>> be two bugs here?)
>>
>> Thanks,
>>
>> Anthony
>
>
>
> I can reproduce you test case here.
>
> The line
>
> git apply patch
>
> would succeed, if you temporally (for the runtime of the apply command) set
> core.autocrlf to false:
>
> git -c core.autocrlf=false apply patch
>
> So this seems to be a bug (in a corner case ?):
>
> Typically repos which had been commited with CRLF should be normalized,
> which means that the CRLF in the repo are replaced by LF.
> So you test script is a corner case, for which Git has not been designed,
> It seems as if "git apply" gets things wrong here.
> Especially, as the '\r' is not a whitespace as a white space. but part
> of the line ending.
> So in my understanding the "--ignore-whitespace" option shouldn't affect
> the line endings at all.
>
> Fixes are possible, does anyone have a clue, why the '\r' is handled
> like this in apply ?
>
> And out of interest: is this a real life problem ?
>
>
>
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`
  2017-08-01 20:58   ` Anthony Sottile
@ 2017-08-02 15:44     ` Torsten Bögershausen
  2017-08-02 20:42       ` [PATCH v1 1/1] correct apply for files commited with CRLF tboegi
  2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
  0 siblings, 2 replies; 36+ messages in thread
From: Torsten Bögershausen @ 2017-08-02 15:44 UTC (permalink / raw)
  To: Anthony Sottile; +Cc: Git Mailing List



On 08/01/2017 10:58 PM, Anthony Sottile wrote:
> Here's where I'm hitting the problem described:
> https://github.com/pre-commit/pre-commit/issues/570
>
> Note that `git -c core.autocrlf=false` apply patch fixes this
> situation, but breaks others.

[]
I wasn't thinking of that - and thanks for the detailed report.
I seems as there are 3 things to be done:
- Make a workaround in your scripts/tools. It seems as if that is 
already done.
- Fix Git.
   My very first investigation shows that a patch like this could fix 
the problem:

diff --git a/apply.c b/apply.c
index f2d599141d..66b8387360 100644
--- a/apply.c
+++ b/apply.c
@@ -2278,6 +2278,8 @@ static int read_old_data(struct stat *st, const 
char *path, struct strbuf *buf)
         case S_IFREG:
                 if (strbuf_read_file(buf, path, st->st_size) != 
st->st_size)
                         return error(_("unable to open or read %s"), path);
+               if (would_convert_to_git(&the_index, path))
+                       read_cache();
                 convert_to_git(&the_index, path, buf->buf, buf->len, 
buf, 0);
                 return 0;
         default:
----------------
   I will probably do some more investigations if this is the core 
problem, so it will take some days or weeks.

- Convince people to normalize their repos and convert all CRLF in the 
repo into LF.
    This may take even longer.



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v1 1/1] correct apply for files commited with CRLF
  2017-08-02 15:44     ` Torsten Bögershausen
@ 2017-08-02 20:42       ` tboegi
  2017-08-02 21:13         ` Junio C Hamano
  2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
  1 sibling, 1 reply; 36+ messages in thread
From: tboegi @ 2017-08-02 20:42 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

git apply does not find the source lines when files have CRLF in the index
and core.autocrlf is true:
These files should not get the CRLF converted to LF. Because cmd_apply()
does not load the index, this does not work, CRLF are converted into LF
and apply fails.

Fix this in the spirit of commit a08feb8ef0b6,
"correct blame for files commited with CRLF" by loading the index.

As an optimization, skip read_cache() when no conversion is specified
for this path.

Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 apply.c         |  2 ++
 t/t0020-crlf.sh | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/apply.c b/apply.c
index f2d599141d..66b8387360 100644
--- a/apply.c
+++ b/apply.c
@@ -2278,6 +2278,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
+		if (would_convert_to_git(&the_index, path))
+			read_cache();
 		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
 		return 0;
 	default:
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index 71350e0657..6611f8a6f6 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -386,4 +386,16 @@ test_expect_success 'New CRLF file gets LF in repo' '
 	test_cmp alllf alllf2
 '
 
+test_expect_success 'CRLF in repo, apply with autocrlf=true' '
+	git config core.autocrlf false &&
+	printf "1\r\n2\r\n" >crlf &&
+	git add crlf &&
+	git commit -m "commit crlf with crlf" &&
+	git config core.autocrlf true &&
+	printf "1\r\n2\r\n\r\n\r\n\r\n" >crlf &&
+	git diff >patch &&
+	git checkout -- . &&
+	git apply patch
+'
+
 test_done
-- 
2.13.2.533.ge0aaa1b


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`
  2017-08-02 15:44     ` Torsten Bögershausen
  2017-08-02 20:42       ` [PATCH v1 1/1] correct apply for files commited with CRLF tboegi
@ 2017-08-02 20:58       ` Junio C Hamano
  2017-08-12  5:45         ` Torsten Bögershausen
                           ` (4 more replies)
  1 sibling, 5 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-02 20:58 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Anthony Sottile, Git Mailing List

Torsten Bögershausen <tboegi@web.de> writes:

>   My very first investigation shows that a patch like this could fix
> the problem:
>
> diff --git a/apply.c b/apply.c
> index f2d599141d..66b8387360 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -2278,6 +2278,8 @@ static int read_old_data(struct stat *st, const
> char *path, struct strbuf *buf)
>         case S_IFREG:
>                 if (strbuf_read_file(buf, path, st->st_size) !=
> st->st_size)
>                         return error(_("unable to open or read %s"), path);
> +               if (would_convert_to_git(&the_index, path))
> +                       read_cache();
>                 convert_to_git(&the_index, path, buf->buf, buf->len,
> buf, 0);

Sorry, but it is unclear why this is a "fix" to me.  We may not even
be doing "git apply --index" or "git apply --cached" that care about
the state recorded in the index, but just the plain vanilla "git apply",
which could even be outside any repository.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/1] correct apply for files commited with CRLF
  2017-08-02 20:42       ` [PATCH v1 1/1] correct apply for files commited with CRLF tboegi
@ 2017-08-02 21:13         ` Junio C Hamano
  2017-08-04 17:31           ` Torsten Bögershausen
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2017-08-02 21:13 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> git apply does not find the source lines when files have CRLF in the index
> and core.autocrlf is true:
> These files should not get the CRLF converted to LF. Because cmd_apply()
> does not load the index, this does not work, CRLF are converted into LF
> and apply fails.
>
> Fix this in the spirit of commit a08feb8ef0b6,
> "correct blame for files commited with CRLF" by loading the index.
>
> As an optimization, skip read_cache() when no conversion is specified
> for this path.

What happens when the input to apply is a patch to create a new file
and the patch uses CRLF line endings?  In such a case, shouldn't
core.autocrlf=true cause the patch to be converted to LF and then
succeed in applying?  An extra read_cache() would not help as there
must be no existing index entry for the path in such a case.

If you did "rm .git/index" after you did the "git checkout -- ." in
your test patch, "git apply" ought to succeed, as it is working as
a substitute for "patch -p1" and should not be consulting the index
at all.

I cannot shake this feeling that it is convert_to_git() that needs
to be fixed so that it does not need to refer to the current
contents in the index.  Why does it even need to look at the index?
If it is for the "safe CRLF" thing (which I would think is misguided),
shouldn't it be checking the original file in the working tree, not
the index?  After all, that is what we are patching, not the contents
stored in the index.

>
> Reported-by: Anthony Sottile <asottile@umich.edu>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---
>  apply.c         |  2 ++
>  t/t0020-crlf.sh | 12 ++++++++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/apply.c b/apply.c
> index f2d599141d..66b8387360 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -2278,6 +2278,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
>  	case S_IFREG:
>  		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
>  			return error(_("unable to open or read %s"), path);
> +		if (would_convert_to_git(&the_index, path))
> +			read_cache();
>  		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
>  		return 0;
>  	default:
> diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
> index 71350e0657..6611f8a6f6 100755
> --- a/t/t0020-crlf.sh
> +++ b/t/t0020-crlf.sh
> @@ -386,4 +386,16 @@ test_expect_success 'New CRLF file gets LF in repo' '
>  	test_cmp alllf alllf2
>  '
>  
> +test_expect_success 'CRLF in repo, apply with autocrlf=true' '
> +	git config core.autocrlf false &&
> +	printf "1\r\n2\r\n" >crlf &&
> +	git add crlf &&
> +	git commit -m "commit crlf with crlf" &&
> +	git config core.autocrlf true &&
> +	printf "1\r\n2\r\n\r\n\r\n\r\n" >crlf &&
> +	git diff >patch &&
> +	git checkout -- . &&
> +	git apply patch
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/1] correct apply for files commited with CRLF
  2017-08-02 21:13         ` Junio C Hamano
@ 2017-08-04 17:31           ` Torsten Bögershausen
  2017-08-04 17:57             ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Torsten Bögershausen @ 2017-08-04 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, asottile



On 08/02/2017 11:13 PM, Junio C Hamano wrote:
> tboegi@web.de writes:
>
>> From: Torsten Bögershausen <tboegi@web.de>
>>
>> git apply does not find the source lines when files have CRLF in the index
>> and core.autocrlf is true:
>> These files should not get the CRLF converted to LF. Because cmd_apply()
>> does not load the index, this does not work, CRLF are converted into LF
>> and apply fails.
>>
>> Fix this in the spirit of commit a08feb8ef0b6,
>> "correct blame for files commited with CRLF" by loading the index.
>>
>> As an optimization, skip read_cache() when no conversion is specified
>> for this path.
> What happens when the input to apply is a patch to create a new file
> and the patch uses CRLF line endings?  In such a case, shouldn't
> core.autocrlf=true cause the patch to be converted to LF and then
> succeed in applying?  An extra read_cache() would not help as there
> must be no existing index entry for the path in such a case.
>
> If you did "rm .git/index" after you did the "git checkout -- ." in
> your test patch, "git apply" ought to succeed, as it is working as
> a substitute for "patch -p1" and should not be consulting the index
> at all.
>
> I cannot shake this feeling that it is convert_to_git() that needs
> to be fixed so that it does not need to refer to the current
> contents in the index.  Why does it even need to look at the index?
> If it is for the "safe CRLF" thing (which I would think is misguided),
> shouldn't it be checking the original file in the working tree, not
> the index?  After all, that is what we are patching, not the contents
> stored in the index.
Thanks for the review.
My understanding is that there are different things possible with `git 
apply`:
working on files in the index ("cached") files and "worktree" files.

If we work on files in the index, the line endings must match for
the patch to apply, the patch/apply machinery is not forgiving
mismatching line endings. At least not by default.
There is one exception: Use "blank-at-eol" to declare the CR of
the CRLF as a whitespace, and then tell git apply to ignore white space:
`git apply --ignore-whitespace`
My feeling is that this is not self-explaining, but this is a different 
story.

Back to the fix, the read_old_data() from below works on the working tree,
yes, but after convert_to_git().
And that is why we need the index, to fix this very case.

appy.c:
static int load_patch_target(struct apply_state *state,
                  struct strbuf *buf,
                  const struct cache_entry *ce,
                  struct stat *st,
                  const char *name,
                  unsigned expected_mode)
{
     if (state->cached || state->check_index) {
         if (read_file_or_gitlink(ce, buf))
             return error(_("failed to read %s"), name);
     } else if (name) {
         if (S_ISGITLINK(expected_mode)) {
             if (ce)
                 return read_file_or_gitlink(ce, buf);
             else
                 return SUBMODULE_PATCH_WITHOUT_INDEX;
         } else if (has_symlink_leading_path(name, strlen(name))) {
             return error(_("reading from '%s' beyond a symbolic link"), 
name);
         } else {
             if (read_old_data(st, name, buf))
                 return error(_("failed to read %s"), name);
         }
     }
     return 0;
}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/1] correct apply for files commited with CRLF
  2017-08-04 17:31           ` Torsten Bögershausen
@ 2017-08-04 17:57             ` Junio C Hamano
  2017-08-04 19:26               ` Junio C Hamano
  0 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2017-08-04 17:57 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, asottile

Torsten Bögershausen <tboegi@web.de> writes:

> Back to the fix, the read_old_data() from below works on the working tree,
> yes, but after convert_to_git().
> And that is why we need the index, to fix this very case.

But "git apply" (without "--cached" or "--index") is to work on the
working tree file only.  The target file of the patch that is going
to be modified does not even have to be in the index, i.e. it is
perfectly fine to:

 (1) create a file F, commit, then modify that file;

 (2) take two patches out of that repository you did (1);

 (3) go to a Git repository that does not yet know about file F;

 (4) run "git apply" to process the first patch you took in (2),
     which will create file F; then

 (5) run "git apply" to process the second patch on top, which will
     modify file F.

Step (4) will probably not cause too much issue, as the only thing
we make sure is "Because the patch creates F, F does not exist in
the directory", which would be the case.

Now, when trying to apply the second patch at step (5), we may need
to adjust for the broken line-ending convention, but you do not have
any entry in the index for F to rely on.

That is why I am saying convert_to_git() that checks the index
content is a wrong thing to use in this codepath.  

It is OK to use it for "git add" and friends, as the intent of the
(I'd say "broken") "safe CRLF" mechanism is to take a hint from the
"previous" state to see if CRLF in the "new" content should be
munged, and the "previous" in the context of running "git add" _is_
what is in the index.

The "previous" in the context of running "git apply" (which does not
look at the index) is _not_ what is in the index, on the other hand.
Instead of "struct index_state *", convert_to_git() needs to be
fixed to take something else that can be queried for the "previous"
version to use as a hint, if "safe CRLF" wants to work correctly.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/1] correct apply for files commited with CRLF
  2017-08-04 17:57             ` Junio C Hamano
@ 2017-08-04 19:26               ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-04 19:26 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, asottile

Junio C Hamano <gitster@pobox.com> writes:

> The "previous" in the context of running "git apply" (which does not
> look at the index) is _not_ what is in the index, on the other hand.
> Instead of "struct index_state *", convert_to_git() needs to be
> fixed to take something else that can be queried for the "previous"
> version to use as a hint, if "safe CRLF" wants to work correctly.

I left it unsaid by mistake, but I think the right thing to use as
the "previous" to take hint from in the context of "git apply" is
what is in the working tree, i.e. the result of applying patch in
step (4) to create a file F in the sample scenario.  

While applying patch in step (5), convert_to_git() should "imagine"
adding the file F currently in the working tree (i.e. the result of
step (4)) to the index---if the resulting object in the index would
have CR, then the safe CRLF logic should refrain from doing CRLF->LF
conversion.  And it should do so without actually adding neither the
preimage or the postimage to the index, of course.

When we are doing "git apply --index", then we _require_ that the
indexed contents and what is in the working tree matches before
applying the patch, so it is perfectly fine to let convert_to_git()
to look at the current index---that is the "previous" one we want to
take hint from while using the safe CRLF logic.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`
  2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
@ 2017-08-12  5:45         ` Torsten Bögershausen
  2017-08-12  5:53           ` Torsten Bögershausen
  2017-08-12 14:56         ` [PATCH/RFC] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Torsten Bögershausen @ 2017-08-12  5:45 UTC (permalink / raw)
  To: tboegi; +Cc: Anthony Sottile, Git Mailing List

Test from mutt

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch`
  2017-08-12  5:45         ` Torsten Bögershausen
@ 2017-08-12  5:53           ` Torsten Bögershausen
  0 siblings, 0 replies; 36+ messages in thread
From: Torsten Bögershausen @ 2017-08-12  5:53 UTC (permalink / raw)
  To: Anthony Sottile, Git Mailing List


>I left it unsaid by mistake, but I think the right thing to use as
>the "previous" to take hint from in the context of "git apply" is
>what is in the working tree, i.e. the result of applying patch in
>step (4) to create a file F in the sample scenario.

>While applying patch in step (5), convert_to_git() should "imagine"
>adding the file F currently in the working tree (i.e. the result of
>step (4)) to the index---if the resulting object in the index would
>have CR, then the safe CRLF logic should refrain from doing CRLF->LF
>conversion.  And it should do so without actually adding neither the
>preimage or the postimage to the index, of course.

(Sorry for the test mail)

What I wanted to say is that this long explanation convinced me to write
a patch and send it out the next days,

>When we are doing "git apply --index", then we _require_ that the
>indexed contents and what is in the working tree matches before
>applying the patch, so it is perfectly fine to let convert_to_git()
>to look at the current index---that is the "previous" one we want to
>take hint from while using the safe CRLF logic.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH/RFC] convert: Add SAFE_CRLF_KEEP_CRLF
  2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
  2017-08-12  5:45         ` Torsten Bögershausen
@ 2017-08-12 14:56         ` tboegi
  2017-08-12 14:56         ` [PATCH/RFC] File commited with CRLF should roundtrip diff and apply tboegi
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: tboegi @ 2017-08-12 14:56 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When convert_to_git() is called, the caller may want to keep CRLF
to be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter,
skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 convert.c | 10 ++++++----
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH/RFC] File commited with CRLF should roundtrip diff and apply
  2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
  2017-08-12  5:45         ` Torsten Bögershausen
  2017-08-12 14:56         ` [PATCH/RFC] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
@ 2017-08-12 14:56         ` tboegi
  2017-08-14 17:37           ` Junio C Hamano
  2017-08-13  8:51         ` [PATCH/RFC 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
  2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
  4 siblings, 1 reply; 36+ messages in thread
From: tboegi @ 2017-08-12 14:56 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When a file had been commited with CRLF and core.autocrlf is true,
the following does not roundtrip, `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when commited with CRLF, the line endings should not be converted.

Analyze the patch if there is any context line with CRLF,
or if any line with CRLF is to be removed.

If yes, the new flag has_crlf is set in "struct patch", and two things
will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

Thanks to Junio C Hamano, his input became the base for t4140.

Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 apply.c               | 37 ++++++++++++++++++++++++++++---------
 apply.h               |  4 ++++
 t/t4140-apply-CRLF.sh | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+), 9 deletions(-)
 create mode 100755 t/t4140-apply-CRLF.sh

diff --git a/apply.c b/apply.c
index f2d599141d..63455cd65f 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int has_crlf:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/* Check if the patch has context lines with CRLF or
+   the patch wants to remove lines with CRLF */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->has_crlf = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state,
 			if (!deleted && !added)
 				leading++;
 			trailing++;
+			check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action == correct_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
+			check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2282,10 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, const char *path, struct strbuf *buf, int flags)
 {
+	enum safe_crlf safe_crlf = flags & APPLY_FLAGS_CR_AT_EOL ?
+		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE;
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2294,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+		convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
@@ -3385,7 +3401,8 @@ static int load_patch_target(struct apply_state *state,
 			     const struct cache_entry *ce,
 			     struct stat *st,
 			     const char *name,
-			     unsigned expected_mode)
+			     unsigned expected_mode,
+			     int flags)
 {
 	if (state->cached || state->check_index) {
 		if (read_file_or_gitlink(ce, buf))
@@ -3399,7 +3416,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf))
+			if (read_old_data(st, name, buf, flags))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3423,6 +3440,7 @@ static int load_preimage(struct apply_state *state,
 	char *img;
 	struct patch *previous;
 	int status;
+	int flags = patch->has_crlf ? APPLY_FLAGS_CR_AT_EOL : 0;
 
 	previous = previous_patch(state, patch, &status);
 	if (status)
@@ -3433,7 +3451,7 @@ static int load_preimage(struct apply_state *state,
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
 		status = load_patch_target(state, &buf, ce, st,
-					   patch->old_name, patch->old_mode);
+					   patch->old_name, patch->old_mode, flags);
 		if (status < 0)
 			return status;
 		else if (status == SUBMODULE_PATCH_WITHOUT_INDEX) {
@@ -3493,7 +3511,8 @@ static int three_way_merge(struct image *image,
  */
 static int load_current(struct apply_state *state,
 			struct image *image,
-			struct patch *patch)
+			struct patch *patch,
+			int flags)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int status, pos;
@@ -3520,7 +3539,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode);
+	status = load_patch_target(state, &buf, ce, &st, name, mode, flags);
 	if (status < 0)
 		return status;
 	else if (status)
@@ -3571,7 +3590,8 @@ static int try_threeway(struct apply_state *state,
 
 	/* our_oid is ours */
 	if (patch->is_new) {
-		if (load_current(state, &tmp_image, patch))
+		int flags = 0;
+		if (load_current(state, &tmp_image, patch, flags))
 			return error(_("cannot read the current contents of '%s'"),
 				     patch->new_name);
 	} else {
@@ -3617,7 +3637,6 @@ static int apply_data(struct apply_state *state, struct patch *patch,
 		      struct stat *st, const struct cache_entry *ce)
 {
 	struct image image;
-
 	if (load_preimage(state, &image, patch, st, ce) < 0)
 		return -1;
 
diff --git a/apply.h b/apply.h
index b3d6783d55..192140280f 100644
--- a/apply.h
+++ b/apply.h
@@ -33,9 +33,13 @@ enum apply_verbosity {
 #define APPLY_SYMLINK_GOES_AWAY 01
 #define APPLY_SYMLINK_IN_RESULT 02
 
+
+#define APPLY_FLAGS_CR_AT_EOL   (1<<0)
+
 struct apply_state {
 	const char *prefix;
 	int prefix_length;
+	int flags;
 
 	/* These are lock_file related */
 	struct lock_file *lock_file;
diff --git a/t/t4140-apply-CRLF.sh b/t/t4140-apply-CRLF.sh
new file mode 100755
index 0000000000..fd528daabd
--- /dev/null
+++ b/t/t4140-apply-CRLF.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='CRLF diff and apply'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	mkdir upstream &&
+	(
+		cd upstream &&
+		git init &&
+		git config core.autocrlf false &&
+		>.gitignore &&
+		git add . &&
+		git commit -m gitignore &&
+		printf "F1\r\n" >FileCRLF &&
+		git add . &&
+		git commit -m initial &&
+		git diff HEAD^1 HEAD -- >../patch1
+	) &&
+	git config core.autocrlf true
+'
+
+test_expect_success 'apply patches Replace lines' '
+	(
+		cd upstream &&
+		printf "F11\r\nF12\r\n" >FileCRLF &&
+		git diff  >../patch2Replace
+	) &&
+	git apply patch1 &&
+	git apply patch2Replace
+'
+
+test_expect_success 'apply patches Add lines' '
+	(
+		cd upstream &&
+		git checkout FileCRLF &&
+		printf "F2\r\n" >>FileCRLF &&
+		git diff  >../patch2Add
+	) &&
+	rm -f FileCRLF &&
+	git apply patch1 &&
+	git apply patch2Add
+'
+
+test_done
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH/RFC 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
  2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
                           ` (2 preceding siblings ...)
  2017-08-12 14:56         ` [PATCH/RFC] File commited with CRLF should roundtrip diff and apply tboegi
@ 2017-08-13  8:51         ` tboegi
  2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
  4 siblings, 0 replies; 36+ messages in thread
From: tboegi @ 2017-08-13  8:51 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When convert_to_git() is called, the caller may want to keep CRLF
to be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter,
skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
convert.c | 10 ++++++----
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
                           ` (3 preceding siblings ...)
  2017-08-13  8:51         ` [PATCH/RFC 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
@ 2017-08-13  8:51         ` tboegi
  2017-08-16 18:28           ` Junio C Hamano
                             ` (4 more replies)
  4 siblings, 5 replies; 36+ messages in thread
From: tboegi @ 2017-08-13  8:51 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When a file had been commited with CRLF and core.autocrlf is true,
the following does not roundtrip, `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when commited with CRLF, the line endings should not be converted.

Analyze the patch if there is any context line with CRLF,
or if any line with CRLF is to be removed.

If yes, the new flag has_crlf is set in "struct patch", and two things
will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

Thanks to Junio C Hamano, his input became the base for t4140.

Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---


The last version did not pass t4124, fix this.



apply.c                  | 37 ++++++++++++++++++++++++++++---------
 apply.h                  |  4 ++++
 t/t4124-apply-ws-rule.sh |  3 +--
 t/t4140-apply-CRLF.sh    | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 79 insertions(+), 11 deletions(-)
 create mode 100755 t/t4140-apply-CRLF.sh

diff --git a/apply.c b/apply.c
index f2d599141d..63455cd65f 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int has_crlf:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/* Check if the patch has context lines with CRLF or
+   the patch wants to remove lines with CRLF */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->has_crlf = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state,
 			if (!deleted && !added)
 				leading++;
 			trailing++;
+			check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action == correct_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
+			check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2282,10 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, const char *path, struct strbuf *buf, int flags)
 {
+	enum safe_crlf safe_crlf = flags & APPLY_FLAGS_CR_AT_EOL ?
+		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE;
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2294,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+		convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
@@ -3385,7 +3401,8 @@ static int load_patch_target(struct apply_state *state,
 			     const struct cache_entry *ce,
 			     struct stat *st,
 			     const char *name,
-			     unsigned expected_mode)
+			     unsigned expected_mode,
+			     int flags)
 {
 	if (state->cached || state->check_index) {
 		if (read_file_or_gitlink(ce, buf))
@@ -3399,7 +3416,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf))
+			if (read_old_data(st, name, buf, flags))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3423,6 +3440,7 @@ static int load_preimage(struct apply_state *state,
 	char *img;
 	struct patch *previous;
 	int status;
+	int flags = patch->has_crlf ? APPLY_FLAGS_CR_AT_EOL : 0;
 
 	previous = previous_patch(state, patch, &status);
 	if (status)
@@ -3433,7 +3451,7 @@ static int load_preimage(struct apply_state *state,
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
 		status = load_patch_target(state, &buf, ce, st,
-					   patch->old_name, patch->old_mode);
+					   patch->old_name, patch->old_mode, flags);
 		if (status < 0)
 			return status;
 		else if (status == SUBMODULE_PATCH_WITHOUT_INDEX) {
@@ -3493,7 +3511,8 @@ static int three_way_merge(struct image *image,
  */
 static int load_current(struct apply_state *state,
 			struct image *image,
-			struct patch *patch)
+			struct patch *patch,
+			int flags)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int status, pos;
@@ -3520,7 +3539,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode);
+	status = load_patch_target(state, &buf, ce, &st, name, mode, flags);
 	if (status < 0)
 		return status;
 	else if (status)
@@ -3571,7 +3590,8 @@ static int try_threeway(struct apply_state *state,
 
 	/* our_oid is ours */
 	if (patch->is_new) {
-		if (load_current(state, &tmp_image, patch))
+		int flags = 0;
+		if (load_current(state, &tmp_image, patch, flags))
 			return error(_("cannot read the current contents of '%s'"),
 				     patch->new_name);
 	} else {
@@ -3617,7 +3637,6 @@ static int apply_data(struct apply_state *state, struct patch *patch,
 		      struct stat *st, const struct cache_entry *ce)
 {
 	struct image image;
-
 	if (load_preimage(state, &image, patch, st, ce) < 0)
 		return -1;
 
diff --git a/apply.h b/apply.h
index b3d6783d55..192140280f 100644
--- a/apply.h
+++ b/apply.h
@@ -33,9 +33,13 @@ enum apply_verbosity {
 #define APPLY_SYMLINK_GOES_AWAY 01
 #define APPLY_SYMLINK_IN_RESULT 02
 
+
+#define APPLY_FLAGS_CR_AT_EOL   (1<<0)
+
 struct apply_state {
 	const char *prefix;
 	int prefix_length;
+	int flags;
 
 	/* These are lock_file related */
 	struct lock_file *lock_file;
diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index d350065f25..4da6e6e894 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -475,11 +475,10 @@ test_expect_success 'same, but with CR-LF line endings && cr-at-eol unset' '
 	cp one save-one &&
 	printf "                 \r\n" >>one &&
 	git add one &&
-	cp one expect &&
 	printf "d\r\n" >>one &&
+	cp one expect &&
 	git diff -- one >patch &&
 	mv save-one one &&
-	echo d >>expect &&
 
 	git apply --ignore-space-change --whitespace=fix patch &&
 	test_cmp one expect
diff --git a/t/t4140-apply-CRLF.sh b/t/t4140-apply-CRLF.sh
new file mode 100755
index 0000000000..fd528daabd
--- /dev/null
+++ b/t/t4140-apply-CRLF.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='CRLF diff and apply'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	mkdir upstream &&
+	(
+		cd upstream &&
+		git init &&
+		git config core.autocrlf false &&
+		>.gitignore &&
+		git add . &&
+		git commit -m gitignore &&
+		printf "F1\r\n" >FileCRLF &&
+		git add . &&
+		git commit -m initial &&
+		git diff HEAD^1 HEAD -- >../patch1
+	) &&
+	git config core.autocrlf true
+'
+
+test_expect_success 'apply patches Replace lines' '
+	(
+		cd upstream &&
+		printf "F11\r\nF12\r\n" >FileCRLF &&
+		git diff  >../patch2Replace
+	) &&
+	git apply patch1 &&
+	git apply patch2Replace
+'
+
+test_expect_success 'apply patches Add lines' '
+	(
+		cd upstream &&
+		git checkout FileCRLF &&
+		printf "F2\r\n" >>FileCRLF &&
+		git diff  >../patch2Add
+	) &&
+	rm -f FileCRLF &&
+	git apply patch1 &&
+	git apply patch2Add
+'
+
+test_done
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH/RFC] File commited with CRLF should roundtrip diff and apply
  2017-08-12 14:56         ` [PATCH/RFC] File commited with CRLF should roundtrip diff and apply tboegi
@ 2017-08-14 17:37           ` Junio C Hamano
  2017-08-17  6:06             ` [PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
                               ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-14 17:37 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> When a file had been commited with CRLF and core.autocrlf is true,
> the following does not roundtrip, `git apply` fails:
>
> printf "Added line\r\n" >>file &&
> git diff >patch &&
> git checkout -- . &&
> git apply patch

Should this tweak be in effect when we are paying attention to the
contents of the index?  Or does it matter only when we are doing
"git apply" without either "--index" or "--cache"?  

The fact that the new flag that is passed from load_preimage() down
to read_old_data(), the latter of which is only about the "behave as
a better 'patch -p1', ignoring the index" mode, hints that this
should not affect anything when we are paying attention to the
index, but then calling the load_patch_target() helper with a very
generic CR_AT_EOL flag looks a bit misleading, by making it appear
as if the caller is telling all three cases to do the funny CR
business.  I wonder if we instead want to pass the "patch" structure
down the callchain and have _only_ read_old_data() pay attention to
the has_crlf bit in it---that way, it becomes more obvious what is
going on.

I also notice that read_old_data() still passes &the_index to
convert_to_git().  Can we fix convert_to_git() further to allow NULL
as the istate parameter when we tell it not to look at the index ( I
am reading your passing SAFE_CRLF_KEEP and SAFE_CRLF_FALSE as a way
for the caller to tell that the caller knows what it wants already
and does not want covnert_to_git() to look at the index)?

With or without CR in the patch, I do not see any reason for the
codepath from read_old_data() down to the convert API to look at the
index, ever, as read_old_data() is called only when (!state->cached
&& !state->check_index), i.e. when we are working as a better 'patch
-p1' without even having to know that we are in a Git repository, by
load_patch_target().

> diff --git a/apply.c b/apply.c
> index f2d599141d..63455cd65f 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -220,6 +220,7 @@ struct patch {
>  	unsigned int recount:1;
>  	unsigned int conflicted_threeway:1;
>  	unsigned int direct_to_threeway:1;
> +	unsigned int has_crlf:1;
>  	struct fragment *fragments;
>  	char *result;
>  	size_t resultsize;
> @@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state,
>  	record_ws_error(state, result, line + 1, len - 2, state->linenr);
>  }
>  
> +/* Check if the patch has context lines with CRLF or
> +   the patch wants to remove lines with CRLF */

Style.

> +static void check_old_for_crlf(struct patch *patch, const char *line, int len)
> +{
> +	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
> +		patch->ws_rule |= WS_CR_AT_EOL;
> +		patch->has_crlf = 1;
> +	}
> +}

I am wondering where the discrepancy between the names (old crlf
here, as opposed to "struct patch" that says "has_crlf" implying it
does not care which side between old and new has one) comes from.

Is the intention that you only care about what is expected of the
preimage?  

> @@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state,
>  			if (!deleted && !added)
>  				leading++;
>  			trailing++;
> +			check_old_for_crlf(patch, line, len);
>  			if (!state->apply_in_reverse &&
>  			    state->ws_error_action == correct_ws_error)
>  				check_whitespace(state, line, len, patch->ws_rule);
>  			break;

If check_old is about "we care about the lines that are supposed to
match what currently exist in the target file to be patched", then
the call to it also must pay attention to apply_in_reverse.  What
appears on '+' lines in a patch will become the "expected old
contents the patched target is supposed to have" when we are running
"apply -R".

>  		case '-':
> +			check_old_for_crlf(patch, line, len);
>  			if (state->apply_in_reverse &&
>  			    state->ws_error_action != nowarn_ws_error)
>  				check_whitespace(state, line, len, patch->ws_rule);

Likewise.

Thanks for working on this; unlike the previous one I commented, I
think this one I can sort of see how this is an approach to fix it.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
@ 2017-08-16 18:28           ` Junio C Hamano
  2017-08-16 18:28           ` [PATCH/FIXUP 3/2] apply: remove unused member apply_state.flags Junio C Hamano
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-16 18:28 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

I'll be sending a few patches that apply on top of applying these
two patches to show what I meant in my previous review comments.
The net change to apply.c, when you combine your 2/2 with these,
would become like the attached, which I think makes more sense.

Instead of queuing a squashed result, I thought it may help to send
them as incremental fixes with their own justification.

diff --git a/apply.c b/apply.c
index f2d599141d..c06f7014a2 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int crlf_in_old:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/*
+ * Check if the patch has context lines with CRLF or
+ * the patch wants to remove lines with CRLF.
+ */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->crlf_in_old = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1726,14 @@ static int parse_fragment(struct apply_state *state,
 			if (!deleted && !added)
 				leading++;
 			trailing++;
+			check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action == correct_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
+			if (!state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -1725,6 +1742,8 @@ static int parse_fragment(struct apply_state *state,
 			trailing = 0;
 			break;
 		case '+':
+			if (state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2287,12 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, const char *path, struct strbuf *buf,
+			 struct patch *patch)
 {
+	enum safe_crlf safe_crlf = (patch->crlf_in_old
+				    ? SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE);
+
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2301,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+		convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
@@ -3384,6 +3407,7 @@ static int load_patch_target(struct apply_state *state,
 			     struct strbuf *buf,
 			     const struct cache_entry *ce,
 			     struct stat *st,
+			     struct patch *patch,
 			     const char *name,
 			     unsigned expected_mode)
 {
@@ -3399,7 +3423,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf))
+			if (read_old_data(st, name, buf, patch))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3432,7 +3456,7 @@ static int load_preimage(struct apply_state *state,
 		/* We have a patched copy in memory; use that. */
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
-		status = load_patch_target(state, &buf, ce, st,
+		status = load_patch_target(state, &buf, ce, st, patch,
 					   patch->old_name, patch->old_mode);
 		if (status < 0)
 			return status;
@@ -3520,7 +3544,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode);
+	status = load_patch_target(state, &buf, ce, &st, patch, name, mode);
 	if (status < 0)
 		return status;
 	else if (status)

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH/FIXUP 3/2] apply: remove unused member apply_state.flags
  2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
  2017-08-16 18:28           ` Junio C Hamano
@ 2017-08-16 18:28           ` Junio C Hamano
  2017-08-16 18:29           ` [PATCH/FIXUP 4/2] apply: only pay attention to CRLF in the preimage Junio C Hamano
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-16 18:28 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

The previous step added the "flags" member to apply_state, but it is
never used.  Remove it and move the bit assignment macro to apply.c
as that is just a private implementation detail.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 apply.c | 2 ++
 apply.h | 4 ----
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/apply.c b/apply.c
index 63455cd65f..7663e63df7 100644
--- a/apply.c
+++ b/apply.c
@@ -2282,6 +2282,8 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
+#define APPLY_FLAGS_CR_AT_EOL   (1<<0)
+
 static int read_old_data(struct stat *st, const char *path, struct strbuf *buf, int flags)
 {
 	enum safe_crlf safe_crlf = flags & APPLY_FLAGS_CR_AT_EOL ?
diff --git a/apply.h b/apply.h
index 192140280f..b3d6783d55 100644
--- a/apply.h
+++ b/apply.h
@@ -33,13 +33,9 @@ enum apply_verbosity {
 #define APPLY_SYMLINK_GOES_AWAY 01
 #define APPLY_SYMLINK_IN_RESULT 02
 
-
-#define APPLY_FLAGS_CR_AT_EOL   (1<<0)
-
 struct apply_state {
 	const char *prefix;
 	int prefix_length;
-	int flags;
 
 	/* These are lock_file related */
 	struct lock_file *lock_file;
-- 
2.14.1-331-g7631d96230


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH/FIXUP 4/2] apply: only pay attention to CRLF in the preimage
  2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
  2017-08-16 18:28           ` Junio C Hamano
  2017-08-16 18:28           ` [PATCH/FIXUP 3/2] apply: remove unused member apply_state.flags Junio C Hamano
@ 2017-08-16 18:29           ` Junio C Hamano
  2017-08-16 18:30           ` [PATCH/FIXUP 5/2] apply: localize the CRLF business to read_old_data() Junio C Hamano
  2017-08-16 18:34           ` [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case Junio C Hamano
  4 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-16 18:29 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

The newly added "patch.has_crlf" member wants to indicate if the
incoming patch expects any CRLF line in the patch target, and
parse_fragment() implements that logic for "git apply".

Rename the member to "patch.crlf_in_old" to clarify what it means,
and fix the logic in parse_fragment() so that it also works correctly
when running "git apply -R", where '+' lines correspond to the patch
target.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 There also is an obvious style fix for comment, but I didn't bother
 splitting it out to a separate step.

 apply.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/apply.c b/apply.c
index 7663e63df7..995973da3d 100644
--- a/apply.c
+++ b/apply.c
@@ -220,7 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
-	unsigned int has_crlf:1;
+	unsigned int crlf_in_old:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1663,13 +1663,15 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
-/* Check if the patch has context lines with CRLF or
-   the patch wants to remove lines with CRLF */
+/*
+ * Check if the patch has context lines with CRLF or
+ * the patch wants to remove lines with CRLF.
+ */
 static void check_old_for_crlf(struct patch *patch, const char *line, int len)
 {
 	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
 		patch->ws_rule |= WS_CR_AT_EOL;
-		patch->has_crlf = 1;
+		patch->crlf_in_old = 1;
 	}
 }
 
@@ -1730,7 +1732,8 @@ static int parse_fragment(struct apply_state *state,
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
-			check_old_for_crlf(patch, line, len);
+			if (!state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -1739,6 +1742,8 @@ static int parse_fragment(struct apply_state *state,
 			trailing = 0;
 			break;
 		case '+':
+			if (state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -3442,7 +3447,7 @@ static int load_preimage(struct apply_state *state,
 	char *img;
 	struct patch *previous;
 	int status;
-	int flags = patch->has_crlf ? APPLY_FLAGS_CR_AT_EOL : 0;
+	int flags = patch->crlf_in_old ? APPLY_FLAGS_CR_AT_EOL : 0;
 
 	previous = previous_patch(state, patch, &status);
 	if (status)

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH/FIXUP 5/2] apply: localize the CRLF business to read_old_data()
  2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
                             ` (2 preceding siblings ...)
  2017-08-16 18:29           ` [PATCH/FIXUP 4/2] apply: only pay attention to CRLF in the preimage Junio C Hamano
@ 2017-08-16 18:30           ` Junio C Hamano
  2017-08-16 18:34           ` [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case Junio C Hamano
  4 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-16 18:30 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

Previous changes passed a new APPLY_FLAGS_CR_AT_EOL option down from
load_preimage() to read_old_data(), because the last function in
that callchain needs to decide how its call to convert_to_git()
function is made on the data read from the working tree.

The load_preimage() function and its direct callees, however, are
not limited to the case where the patch is applied to the data in
the working tree (i.e. "git apply" that is working as a better
"patch -p1"), unlike read_old_data(), which deals only with the
patch target in the working tree.  They are also responsible for
driving "git apply --cached" and "git apply --index", both of which
take the current index contents into account and do not need the new
special-casing of CRLF.  Exposing APPLY_FLAGS_CR_AT_EOL bit to them
is misleading.

Instead, just pass the "struct patch" down the same callchain, and
have read_old_data() look at its crlf_in_old member to make the
necessary decision.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 This is what I care about the most in these fix-ups.

 apply.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/apply.c b/apply.c
index 995973da3d..c06f7014a2 100644
--- a/apply.c
+++ b/apply.c
@@ -2287,12 +2287,12 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-#define APPLY_FLAGS_CR_AT_EOL   (1<<0)
-
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf, int flags)
+static int read_old_data(struct stat *st, const char *path, struct strbuf *buf,
+			 struct patch *patch)
 {
-	enum safe_crlf safe_crlf = flags & APPLY_FLAGS_CR_AT_EOL ?
-		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE;
+	enum safe_crlf safe_crlf = (patch->crlf_in_old
+				    ? SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE);
+
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -3407,9 +3407,9 @@ static int load_patch_target(struct apply_state *state,
 			     struct strbuf *buf,
 			     const struct cache_entry *ce,
 			     struct stat *st,
+			     struct patch *patch,
 			     const char *name,
-			     unsigned expected_mode,
-			     int flags)
+			     unsigned expected_mode)
 {
 	if (state->cached || state->check_index) {
 		if (read_file_or_gitlink(ce, buf))
@@ -3423,7 +3423,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf, flags))
+			if (read_old_data(st, name, buf, patch))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3447,7 +3447,6 @@ static int load_preimage(struct apply_state *state,
 	char *img;
 	struct patch *previous;
 	int status;
-	int flags = patch->crlf_in_old ? APPLY_FLAGS_CR_AT_EOL : 0;
 
 	previous = previous_patch(state, patch, &status);
 	if (status)
@@ -3457,8 +3456,8 @@ static int load_preimage(struct apply_state *state,
 		/* We have a patched copy in memory; use that. */
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
-		status = load_patch_target(state, &buf, ce, st,
-					   patch->old_name, patch->old_mode, flags);
+		status = load_patch_target(state, &buf, ce, st, patch,
+					   patch->old_name, patch->old_mode);
 		if (status < 0)
 			return status;
 		else if (status == SUBMODULE_PATCH_WITHOUT_INDEX) {
@@ -3518,8 +3517,7 @@ static int three_way_merge(struct image *image,
  */
 static int load_current(struct apply_state *state,
 			struct image *image,
-			struct patch *patch,
-			int flags)
+			struct patch *patch)
 {
 	struct strbuf buf = STRBUF_INIT;
 	int status, pos;
@@ -3546,7 +3544,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode, flags);
+	status = load_patch_target(state, &buf, ce, &st, patch, name, mode);
 	if (status < 0)
 		return status;
 	else if (status)
@@ -3597,8 +3595,7 @@ static int try_threeway(struct apply_state *state,
 
 	/* our_oid is ours */
 	if (patch->is_new) {
-		int flags = 0;
-		if (load_current(state, &tmp_image, patch, flags))
+		if (load_current(state, &tmp_image, patch))
 			return error(_("cannot read the current contents of '%s'"),
 				     patch->new_name);
 	} else {
@@ -3644,6 +3641,7 @@ static int apply_data(struct apply_state *state, struct patch *patch,
 		      struct stat *st, const struct cache_entry *ce)
 {
 	struct image image;
+
 	if (load_preimage(state, &image, patch, st, ce) < 0)
 		return -1;
 
-- 
2.14.1-331-g7631d96230



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
  2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
                             ` (3 preceding siblings ...)
  2017-08-16 18:30           ` [PATCH/FIXUP 5/2] apply: localize the CRLF business to read_old_data() Junio C Hamano
@ 2017-08-16 18:34           ` Junio C Hamano
  2017-08-17  6:24             ` Torsten Bögershausen
  4 siblings, 1 reply; 36+ messages in thread
From: Junio C Hamano @ 2017-08-16 18:34 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

With the previous fixes to CRLF handling in place, read_old_data()
knows what it wants convert_to_git() to do with respect to CRLF.  In
fact, this codepath is about applying a patch to a file in the
filesystem, which may not exist in the index, or may exist but may
not match what is recorded in the index, or in the extreme case, we
may not even be in a Git repository.  If convert_to_git() peeked at
the index while doing its work, it *would* be a bug.

Pass NULL instead of &the_index to the function to make sure we
catch future bugs to clarify this.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * NOTE NOTE NOTE: This is not a part of the "squashed diff" I sent
   earlier, and with this applied, you will see failure in t0020.

   The breakage is because convert_to_git(), with your [PATCH 1/2],
   is not yet prepared to be told "there is no need for safe-crlf
   processing, so do not even look at the index".  You either need
   to invent yet another flag similar to SAFE_CRLF_KEEP_CRLF that is
   different from SAFE_CRLF_FALSE to tell the machinery never to
   look at the index, or fix SAFE_CRLF_FALSE itself so that the
   index is not checked when the caller knows safe-crlf processing
   is not needed.

 apply.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/apply.c b/apply.c
index c06f7014a2..ad58cd1c77 100644
--- a/apply.c
+++ b/apply.c
@@ -2301,7 +2301,15 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf,
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf);
+		/*
+		 * "git apply" without "--index/--cached" should never look
+		 * at the index; the target file may not have been added to
+		 * the index yet, and we may not even be in any Git repository.
+		 * Pass NULL to convert_to_git() to stress this; the function
+		 * should never look at the index when explicit crlf option
+		 * is given.
+		 */
+		convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
-- 
2.14.1-331-g7631d96230


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
  2017-08-14 17:37           ` Junio C Hamano
@ 2017-08-17  6:06             ` tboegi
  2017-08-17  6:06             ` [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply tboegi
                               ` (4 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: tboegi @ 2017-08-17  6:06 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When convert_to_git() is called, the caller may want to keep CRLF
to be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter,
skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 convert.c | 10 ++++++----
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-14 17:37           ` Junio C Hamano
  2017-08-17  6:06             ` [PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
@ 2017-08-17  6:06             ` tboegi
  2017-08-17  6:37               ` Junio C Hamano
  2017-08-17 21:43             ` [PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
                               ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: tboegi @ 2017-08-17  6:06 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When a file had been commited with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when commited with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cache` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this (ignore the $ at the begin of the line):

$ diff --git a/one b/one
$ index 533790e..c30dea8 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1,2 @@
$  a\r
$ +b\r

And for b) it looks like this:

$ diff --git a/one b/one
$ index 533790e..485540d 100644
$ --- a/one
$ +++ b/one
$ @@ -1 +1 @@
$ -a\r
$ +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag has_crlf is set in "struct patch"
and two things will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

Thanks to Junio C Hamano, his input became the base for the changes in t4124.
One test case is split up into 3:
- Detect the " a\r" line in the patch
- Detect the "-a\r" line in the patch
- Use LF in repo and CLRF in the worktree. (*)

* This one proves that convert_to_git(&the_index,...) still needs to pass
the &index, otherwise Git will crash.

Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 apply.c                  | 28 +++++++++++++++++++++++-----
 t/t4124-apply-ws-rule.sh | 33 +++++++++++++++++++++++++++------
 2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..bebb176099 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int has_crlf:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,17 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/* Check if the patch has context lines with CRLF or
+   the patch wants to remove lines with CRLF */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->has_crlf = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1724,13 @@ static int parse_fragment(struct apply_state *state,
 			if (!deleted && !added)
 				leading++;
 			trailing++;
+			check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action == correct_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
+			check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2282,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+			 const char *path, struct strbuf *buf)
 {
+	enum safe_crlf safe_crlf = patch->has_crlf ?
+		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_FALSE;
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2295,7 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+		convert_to_git(&the_index, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
@@ -3384,6 +3401,7 @@ static int load_patch_target(struct apply_state *state,
 			     struct strbuf *buf,
 			     const struct cache_entry *ce,
 			     struct stat *st,
+			     struct patch *patch,
 			     const char *name,
 			     unsigned expected_mode)
 {
@@ -3399,7 +3417,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf))
+			if (read_old_data(st, patch, name, buf))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3432,7 +3450,7 @@ static int load_preimage(struct apply_state *state,
 		/* We have a patched copy in memory; use that. */
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
-		status = load_patch_target(state, &buf, ce, st,
+		status = load_patch_target(state, &buf, ce, st, patch,
 					   patch->old_name, patch->old_mode);
 		if (status < 0)
 			return status;
@@ -3520,7 +3538,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode);
+	status = load_patch_target(state, &buf, ce, &st, patch, name, mode);
 	if (status < 0)
 		return status;
 	else if (status)
diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index d350065f25..4fc27c51f7 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -467,21 +467,42 @@ test_expect_success 'same, but with CR-LF line endings && cr-at-eol set' '
 	test_cmp one expect
 '
 
-test_expect_success 'same, but with CR-LF line endings && cr-at-eol unset' '
+test_expect_success 'CR-LF line endings && add line && text=auto' '
 	git config --unset core.whitespace &&
 	printf "a\r\n" >one &&
+	cp one save-one &&
+	git add one &&
 	printf "b\r\n" >>one &&
-	printf "c\r\n" >>one &&
+	cp one expect &&
+	git diff -- one >patch &&
+	mv save-one one &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
+
+test_expect_success 'CR-LF line endings && change line && text=auto' '
+	printf "a\r\n" >one &&
 	cp one save-one &&
-	printf "                 \r\n" >>one &&
 	git add one &&
+	printf "b\r\n" >one &&
 	cp one expect &&
-	printf "d\r\n" >>one &&
 	git diff -- one >patch &&
 	mv save-one one &&
-	echo d >>expect &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
 
-	git apply --ignore-space-change --whitespace=fix patch &&
+test_expect_success 'LF in repo, CRLF in worktree && change line && text=auto' '
+	printf "a\n" >one &&
+	git add one &&
+	printf "b\r\n" >one &&
+	git diff -- one >patch &&
+	printf "a\r\n" >one &&
+	echo "one text=auto" >.gitattributes &&
+	git -c core.eol=CRLF apply patch &&
+	printf "b\r\n" >expect &&
 	test_cmp one expect
 '
 
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
  2017-08-16 18:34           ` [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case Junio C Hamano
@ 2017-08-17  6:24             ` Torsten Bögershausen
  2017-08-17  7:06               ` Junio C Hamano
  2017-08-17  7:12               ` Junio C Hamano
  0 siblings, 2 replies; 36+ messages in thread
From: Torsten Bögershausen @ 2017-08-17  6:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, asottile

On Wed, Aug 16, 2017 at 11:34:45AM -0700, Junio C Hamano wrote:
> With the previous fixes to CRLF handling in place, read_old_data()
> knows what it wants convert_to_git() to do with respect to CRLF.  In
> fact, this codepath is about applying a patch to a file in the
> filesystem, which may not exist in the index, or may exist but may
> not match what is recorded in the index, or in the extreme case, we
> may not even be in a Git repository.  If convert_to_git() peeked at
> the index while doing its work, it *would* be a bug.
> 
> Pass NULL instead of &the_index to the function to make sure we
> catch future bugs to clarify this.

Thanks for the work, and now our emails crossed.

Calling convert_to_git(NULL,...) makes Git crash today, see t4124, my
latest version, "LF in repo, CRLF in working tree)
Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
like SAFE_CRLF_KEEP_CRLF does.
The combination of convert_to_git(NULL,...,SAFE_CRLF_KEEP_CRLF) is OK,
but all others must supply an &index.

At a very first glance the end result may look like this:
- Take my changes in convert.[ch]
- Take your changes/commit in apply.c (except the NULL, see above)
- Take my changes in t4124.

I don't have time to look at this today or tomorrow,
please give a hint if you are working further.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-17  6:06             ` [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply tboegi
@ 2017-08-17  6:37               ` Junio C Hamano
  0 siblings, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-17  6:37 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

tboegi@web.de writes:

> Analyze the patch if there is a) any context line with CRLF,
> or b) if any line with CRLF is to be removed.
> Thanks to Junio C Hamano, his input became the base for the changes in t4124.
> One test case is split up into 3:
> - Detect the " a\r" line in the patch
> - Detect the "-a\r" line in the patch
> - Use LF in repo and CLRF in the worktree. (*)
>
> * This one proves that convert_to_git(&the_index,...) still needs to pass
> the &index, otherwise Git will crash.

I do not understand why you think it proves anything like that.
Forget about "in repo" when you think about "git apply" without
"--index/--cache".  There is *NO* role the index should play in that
mode.

"Otherwise Git will crash" is true, because convert_to_git() tries
to dereference the istate it is given to see if there is CR in the
blob that happens to be in the index at the path.

But step back and *think*.  It only proves that convert_to_git() you
have and/or the way read_old_data() you updated calls it after
applying these two patches are still broken.

The "blob that happens to be in the index at the path" does *NOT*
have anything to do with the file in the working tree that you are
patching.  Such an index entry may not exist (and the code would see
that there is 0 CRs and 0 LFs---so what?), or it may have a blob
full of CRLF, or it may have a blob full of CR not LF, or any random
thing that has NO relation to the file you are patching.  Why should
that random blob (which may not even exist---we are discussing "git
apply" without "--index/--cache" here) affect the outcome?  If it
does not affect the outcome, why should convert_to_git() even look
at it?

It shouldn't be calling down to "has_cr_in_index(istate, path)" that
is called from crlf_to_git() in the first place.  The check for CR
was done in the caller of convert_to_git(), i.e. read_old_data(),
long before convert_to_git() is called, and the caller knows if it
is (or it is not) dealing with data that needs crlf conversion at
that point, based on the contents of the file being patched (which,
again, does not have any relation to the blob that may or may not
happen to be in the index).  

Your updated caller is already telling convert_to_git() codepath
when it needs CRLF and it refrains from peeking in the index with
SAFE_CRLF_KEEP_CRLF flag.  The bug still in the code after applying
these two patches is that the other choice, i.e. SAFE_CRLF_FALSE,
that is passed from read_old_data() to convert_to_git() does *not*
prevent the latter from peeking into the in-core index.  

And that is why "Otherwise Git will crash".

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
  2017-08-17  6:24             ` Torsten Bögershausen
@ 2017-08-17  7:06               ` Junio C Hamano
  2017-08-17  7:12               ` Junio C Hamano
  1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-17  7:06 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, asottile

Torsten Bögershausen <tboegi@web.de> writes:

> I don't have time to look at this today or tomorrow,
> please give a hint if you are working further.

It is past my bedtime, and generally I prefer not to touch topics
that I know other people are willing to look into, especially when
I know those "other people" are well informed and capable.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
  2017-08-17  6:24             ` Torsten Bögershausen
  2017-08-17  7:06               ` Junio C Hamano
@ 2017-08-17  7:12               ` Junio C Hamano
  2017-08-17  8:24                 ` Torsten Bögershausen
  2017-08-17 17:07                 ` Junio C Hamano
  1 sibling, 2 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-17  7:12 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, asottile

Torsten Bögershausen <tboegi@web.de> writes:

> Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
> like SAFE_CRLF_KEEP_CRLF does.

My preference is not to use NULL as any hint.  Instead, the "flag"
parameter we already pass to convert_to_git(), just like the updated
read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
not disturb existing CRLF without looking at the istate, should be
used to tell convert_to_git() to do the opposite, but do so without
looking at the istate.

Perhaps SAFE_CRLF_FALSE should be such a flag.  Or perhaps we need
to invent another flag.  I dunno.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
  2017-08-17  7:12               ` Junio C Hamano
@ 2017-08-17  8:24                 ` Torsten Bögershausen
  2017-08-17 17:07                 ` Junio C Hamano
  1 sibling, 0 replies; 36+ messages in thread
From: Torsten Bögershausen @ 2017-08-17  8:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, asottile

On Thu, Aug 17, 2017 at 12:12:36AM -0700, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
> > Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
> > like SAFE_CRLF_KEEP_CRLF does.
> 
> My preference is not to use NULL as any hint.  Instead, the "flag"
> parameter we already pass to convert_to_git(), just like the updated
> read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
> not disturb existing CRLF without looking at the istate, should be
> used to tell convert_to_git() to do the opposite, but do so without
> looking at the istate.
> 
> Perhaps SAFE_CRLF_FALSE should be such a flag.  Or perhaps we need
> to invent another flag.  I dunno.

OK, message taken, in short:
I will come up with a new series in a couple of days - 
thanks for the input.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case
  2017-08-17  7:12               ` Junio C Hamano
  2017-08-17  8:24                 ` Torsten Bögershausen
@ 2017-08-17 17:07                 ` Junio C Hamano
  1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-17 17:07 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, asottile

Junio C Hamano <gitster@pobox.com> writes:

> Torsten Bögershausen <tboegi@web.de> writes:
>
>> Unless we re-define the meaning of "NULL" into "don't do CRLF conversions,
>> like SAFE_CRLF_KEEP_CRLF does.
>
> My preference is not to use NULL as any hint.  Instead, the "flag"
> parameter we already pass to convert_to_git(), just like the updated
> read_old_data() uses SAFE_CRLF_KEEP_CRLF to tell it that it should
> not disturb existing CRLF without looking at the istate, should be
> used to tell convert_to_git() to do the opposite, but do so without
> looking at the istate.
>
> Perhaps SAFE_CRLF_FALSE should be such a flag.  Or perhaps we need
> to invent another flag.  I dunno.

I grepped for SAFE_CRLF_FALSE and found only two callers that
explicitly pass it down the callchain, both of which essentially
says "if we are writing the object out, use core.safecrlf, but if we
are merely hashing, do SAFE_CRLF_FALSE thing".  

I think their use case is quite similar to the codepath we are
discussing---they have a data that come from the outside world, and
they know the index entry that happens to be at the path has nothing
to do with the data they are asking convert_to_git() to massage
(i.e. it is _wrong_ if the contents of the blob that happens to be
in the index at the path affected the outcome of the conversion).

So I think the fix to convert_to_git() can just use SAFE_CRLF_FALSE
as such an instruction that tells the function not do the "safe
crlf" thing, which looks at the contents in the index and decide if
the CRLFs in the contents being converted should be turned into LFs.
And because the function is told not to look at the index, it should
be made safe to pass istate=NULL.  There does not seem to be a need
to invent another flag.

Thanks.




^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
  2017-08-14 17:37           ` Junio C Hamano
  2017-08-17  6:06             ` [PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
  2017-08-17  6:06             ` [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply tboegi
@ 2017-08-17 21:43             ` tboegi
  2017-08-17 21:43             ` [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply tboegi
                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: tboegi @ 2017-08-17 21:43 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When convert_to_git() is called, the caller may want to keep CRLF
to be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter,
skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 convert.c | 10 ++++++----
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-14 17:37           ` Junio C Hamano
                               ` (2 preceding siblings ...)
  2017-08-17 21:43             ` [PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
@ 2017-08-17 21:43             ` tboegi
  2017-08-17 22:29               ` Junio C Hamano
  2017-08-17 22:35               ` Junio C Hamano
  2017-08-19 11:27             ` [PATCH v4 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
  2017-08-19 11:28             ` [PATCH v4 2/2] File commited with CRLF should roundtrip diff and apply tboegi
  5 siblings, 2 replies; 36+ messages in thread
From: tboegi @ 2017-08-17 21:43 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When a file had been commited with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when commited with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cache` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this (ignore the * at the begin of the line):

* diff --git a/one b/one
* index 533790e..c30dea8 100644
* --- a/one
* +++ b/one
* @@ -1 +1,2 @@
*  a\r
* +b\r

And for b) it looks like this:

* diff --git a/one b/one
* index 533790e..485540d 100644
* --- a/one
* +++ b/one
* @@ -1 +1 @@
* -a\r
* +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag crlf_in_old is set in "struct patch"
and two things will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

Thanks to Junio C Hamano, his input became the base for the changes in t4124.
One test case is split up into 3:
- Detect the " a\r" line in the patch
- Detect the "-a\r" line in the patch
- Use LF in repo and CLRF in the worktree.

Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
Changes since v2:
- Manually integrated all code changes from Junio
  (Thanks, I hope that I didn't miss something)
- Having examples of "git diff" in the commit message confuses "git apply",
  so that all examples for git diff have a '*' at the beginnig of the line
  (V2 used '$' which is typically an example for a shell script)
- The official version to apply the CRLF-rules without having an index is
  SAFE_CRLF_RENORMALIZE, that is already working today.
- Now we have convert_to_git(NULL, ..., safe_crlf) with
  enum safe_crlf safe_crlf = patch->crlf_in_old ?
          SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;

apply.c                  | 40 +++++++++++++++++++++++++++++++++++-----
 t/t4124-apply-ws-rule.sh | 33 +++++++++++++++++++++++++++------
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..691f47c783 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int crlf_in_old:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/*
+ * Check if the patch has context lines with CRLF or
+ * the patch wants to remove lines with CRLF.
+ */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->crlf_in_old = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1726,15 @@ static int parse_fragment(struct apply_state *state,
 			if (!deleted && !added)
 				leading++;
 			trailing++;
+			if (!state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action == correct_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
+			if (!state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2286,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+			 const char *path, struct strbuf *buf)
 {
+	enum safe_crlf safe_crlf = patch->crlf_in_old ?
+		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2299,15 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+	       /*
+		* "git apply" without "--index/--cached" should never look
+		* at the index; the target file may not have been added to
+		* the index yet, and we may not even be in any Git repository.
+		* Pass NULL to convert_to_git() to stress this; the function
+		* should never look at the index when explicit crlf option
+		* is given.
+		*/
+		convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
@@ -3384,6 +3413,7 @@ static int load_patch_target(struct apply_state *state,
 			     struct strbuf *buf,
 			     const struct cache_entry *ce,
 			     struct stat *st,
+			     struct patch *patch,
 			     const char *name,
 			     unsigned expected_mode)
 {
@@ -3399,7 +3429,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf))
+			if (read_old_data(st, patch, name, buf))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3432,7 +3462,7 @@ static int load_preimage(struct apply_state *state,
 		/* We have a patched copy in memory; use that. */
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
-		status = load_patch_target(state, &buf, ce, st,
+		status = load_patch_target(state, &buf, ce, st, patch,
 					   patch->old_name, patch->old_mode);
 		if (status < 0)
 			return status;
@@ -3520,7 +3550,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode);
+	status = load_patch_target(state, &buf, ce, &st, patch, name, mode);
 	if (status < 0)
 		return status;
 	else if (status)
diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index d350065f25..4fc27c51f7 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -467,21 +467,42 @@ test_expect_success 'same, but with CR-LF line endings && cr-at-eol set' '
 	test_cmp one expect
 '
 
-test_expect_success 'same, but with CR-LF line endings && cr-at-eol unset' '
+test_expect_success 'CR-LF line endings && add line && text=auto' '
 	git config --unset core.whitespace &&
 	printf "a\r\n" >one &&
+	cp one save-one &&
+	git add one &&
 	printf "b\r\n" >>one &&
-	printf "c\r\n" >>one &&
+	cp one expect &&
+	git diff -- one >patch &&
+	mv save-one one &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
+
+test_expect_success 'CR-LF line endings && change line && text=auto' '
+	printf "a\r\n" >one &&
 	cp one save-one &&
-	printf "                 \r\n" >>one &&
 	git add one &&
+	printf "b\r\n" >one &&
 	cp one expect &&
-	printf "d\r\n" >>one &&
 	git diff -- one >patch &&
 	mv save-one one &&
-	echo d >>expect &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
 
-	git apply --ignore-space-change --whitespace=fix patch &&
+test_expect_success 'LF in repo, CRLF in worktree && change line && text=auto' '
+	printf "a\n" >one &&
+	git add one &&
+	printf "b\r\n" >one &&
+	git diff -- one >patch &&
+	printf "a\r\n" >one &&
+	echo "one text=auto" >.gitattributes &&
+	git -c core.eol=CRLF apply patch &&
+	printf "b\r\n" >expect &&
 	test_cmp one expect
 '
 
-- 
2.14.1.145.gb3622a4ee9


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-17 21:43             ` [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply tboegi
@ 2017-08-17 22:29               ` Junio C Hamano
  2017-08-17 22:35               ` Junio C Hamano
  1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-17 22:29 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

tboegi@web.de writes:

> Changes since v2:
> - Manually integrated all code changes from Junio
>   (Thanks, I hope that I didn't miss something)

I suspect that "apply -R makes '+' preimage" change is not here.

> - Having examples of "git diff" in the commit message confuses "git apply",
>   so that all examples for git diff have a '*' at the beginnig of the line
>   (V2 used '$' which is typically an example for a shell script)

Just FYI we tend to just indent them further, just like any
displayed material in the proposed log message.

> - The official version to apply the CRLF-rules without having an index is
>   SAFE_CRLF_RENORMALIZE, that is already working today.

Ah, good find.  I forgot about that thing you added some time ago.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-17 21:43             ` [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply tboegi
  2017-08-17 22:29               ` Junio C Hamano
@ 2017-08-17 22:35               ` Junio C Hamano
  1 sibling, 0 replies; 36+ messages in thread
From: Junio C Hamano @ 2017-08-17 22:35 UTC (permalink / raw)
  To: tboegi; +Cc: git, asottile

tboegi@web.de writes:

> @@ -1712,11 +1726,15 @@ static int parse_fragment(struct apply_state *state,
>  			if (!deleted && !added)
>  				leading++;
>  			trailing++;
> +			if (!state->apply_in_reverse)
> +				check_old_for_crlf(patch, line, len);
>  			if (!state->apply_in_reverse &&
>  			    state->ws_error_action == correct_ws_error)
>  				check_whitespace(state, line, len, patch->ws_rule);
>  			break;

This one is wrong.  You are looking at " " (common context) and you
should unconditionally call check_old for them.


>  		case '-':
> +			if (!state->apply_in_reverse)
> +				check_old_for_crlf(patch, line, len);

This is correct.

There is "case '+':" below here you did not touch.  There should be

		case '+':
	+		if (state->apply_in_reverse)
	+			check_old_for_crlf(...);

there.  Note that we call check_old() only when applying in reverse.

> @@ -2268,8 +2286,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
>  		add, pluses, del, minuses);
>  }
>  
> -static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
> +static int read_old_data(struct stat *st, struct patch *patch,
> +			 const char *path, struct strbuf *buf)

The order of argument to have the patch structure earlier is
different from my version; what we see here looks much better to me.

>  {
> +	enum safe_crlf safe_crlf = patch->crlf_in_old ?
> +		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
>  	switch (st->st_mode & S_IFMT) {
>  	case S_IFLNK:
>  		if (strbuf_readlink(buf, path, st->st_size) < 0)
> @@ -2278,7 +2299,15 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
>  	case S_IFREG:
>  		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
>  			return error(_("unable to open or read %s"), path);
> -		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
> +	       /*
> +		* "git apply" without "--index/--cached" should never look
> +		* at the index; the target file may not have been added to
> +		* the index yet, and we may not even be in any Git repository.
> +		* Pass NULL to convert_to_git() to stress this; the function
> +		* should never look at the index when explicit crlf option
> +		* is given.
> +		*/
> +		convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);

This comment is somewhat strangly indented.  I thought opening "/*"
alighs with the usual tab stop.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 1/2] convert: Add SAFE_CRLF_KEEP_CRLF
  2017-08-14 17:37           ` Junio C Hamano
                               ` (3 preceding siblings ...)
  2017-08-17 21:43             ` [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply tboegi
@ 2017-08-19 11:27             ` tboegi
  2017-08-19 11:28             ` [PATCH v4 2/2] File commited with CRLF should roundtrip diff and apply tboegi
  5 siblings, 0 replies; 36+ messages in thread
From: tboegi @ 2017-08-19 11:27 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When convert_to_git() is called, the caller may want to keep CRLF
to be kept as CRLF (and not converted into LF).

This will be used in the next commit, when apply works with files that have
CRLF and patches are applied onto these files.

Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.

Prepare convert_to_git() to be able to run the clean filter,
skip the CRLF conversion and run the ident filter.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 convert.c | 10 ++++++----
 convert.h |  3 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b3..040123b4fe 100644
--- a/convert.c
+++ b/convert.c
@@ -1104,10 +1104,12 @@ int convert_to_git(const struct index_state *istate,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
-	if (ret && dst) {
-		src = dst->buf;
-		len = dst->len;
+	if (checksafe != SAFE_CRLF_KEEP_CRLF) {
+		ret |= crlf_to_git(istate, path, src, len, dst, ca.crlf_action, checksafe);
+		if (ret && dst) {
+			src = dst->buf;
+			len = dst->len;
+		}
 	}
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
diff --git a/convert.h b/convert.h
index cecf59d1aa..cabd5ed6dd 100644
--- a/convert.h
+++ b/convert.h
@@ -10,7 +10,8 @@ enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
 	SAFE_CRLF_WARN = 2,
-	SAFE_CRLF_RENORMALIZE = 3
+	SAFE_CRLF_RENORMALIZE = 3,
+	SAFE_CRLF_KEEP_CRLF = 4
 };
 
 extern enum safe_crlf safe_crlf;
-- 
2.14.0.rc1.15.gd40c2d4e85.dirty


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 2/2] File commited with CRLF should roundtrip diff and apply
  2017-08-14 17:37           ` Junio C Hamano
                               ` (4 preceding siblings ...)
  2017-08-19 11:27             ` [PATCH v4 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
@ 2017-08-19 11:28             ` tboegi
  5 siblings, 0 replies; 36+ messages in thread
From: tboegi @ 2017-08-19 11:28 UTC (permalink / raw)
  To: git, asottile; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

When a file had been commited with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when commited with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cache` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this:

 diff --git a/one b/one
 index 533790e..c30dea8 100644
 --- a/one
 +++ b/one
 @@ -1 +1,2 @@
  a\r
 +b\r

And for b) it looks like this:

 diff --git a/one b/one
 index 533790e..485540d 100644
 --- a/one
 +++ b/one
 @@ -1 +1 @@
 -a\r
 +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag crlf_in_old is set in "struct patch"
and two things will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

While at there, make clear that read_old_data() in apply.c
knows what it wants convert_to_git() to do with respect to CRLF.  In
fact, this codepath is about applying a patch to a file in the
filesystem, which may not exist in the index, or may exist but may
not match what is recorded in the index, or in the extreme case, we
may not even be in a Git repository.  If convert_to_git() peeked at
the index while doing its work, it *would* be a bug.

Pass NULL instead of &the_index to convert_to_git() to make sure we
catch future bugs to clarify this.

Update the test in t4124: split one test case into 3:
- Detect the " a\r" line in the patch
- Detect the "-a\r" line in the patch
- Use LF in repo and CLRF in the worktree.

Reported-by: Anthony Sottile <asottile@umich.edu>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---

Changes since v3:
- took apply.c from junio/tb/apply-with-crlf
- Remove the leading asterix in the commit message, at the place
  where the "git diff" is cited.
- Mention "Pass NULL instead of &the_index to convert_to_git()"

apply.c                  | 41 ++++++++++++++++++++++++++++++++++++-----
 t/t4124-apply-ws-rule.sh | 33 +++++++++++++++++++++++++++------
 2 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..66c68f193a 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int crlf_in_old:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/*
+ * Check if the patch has context lines with CRLF or
+ * the patch wants to remove lines with CRLF.
+ */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->crlf_in_old = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1726,14 @@ static int parse_fragment(struct apply_state *state,
 			if (!deleted && !added)
 				leading++;
 			trailing++;
+			check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action == correct_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
+			if (!state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -1725,6 +1742,8 @@ static int parse_fragment(struct apply_state *state,
 			trailing = 0;
 			break;
 		case '+':
+			if (state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2287,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+			 const char *path, struct strbuf *buf)
 {
+	enum safe_crlf safe_crlf = patch->crlf_in_old ?
+		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2300,15 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+		/*
+		 * "git apply" without "--index/--cached" should never look
+		 * at the index; the target file may not have been added to
+		 * the index yet, and we may not even be in any Git repository.
+		 * Pass NULL to convert_to_git() to stress this; the function
+		 * should never look at the index when explicit crlf option
+		 * is given.
+		 */
+		convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
@@ -3384,6 +3414,7 @@ static int load_patch_target(struct apply_state *state,
 			     struct strbuf *buf,
 			     const struct cache_entry *ce,
 			     struct stat *st,
+			     struct patch *patch,
 			     const char *name,
 			     unsigned expected_mode)
 {
@@ -3399,7 +3430,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf))
+			if (read_old_data(st, patch, name, buf))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3432,7 +3463,7 @@ static int load_preimage(struct apply_state *state,
 		/* We have a patched copy in memory; use that. */
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
-		status = load_patch_target(state, &buf, ce, st,
+		status = load_patch_target(state, &buf, ce, st, patch,
 					   patch->old_name, patch->old_mode);
 		if (status < 0)
 			return status;
@@ -3520,7 +3551,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode);
+	status = load_patch_target(state, &buf, ce, &st, patch, name, mode);
 	if (status < 0)
 		return status;
 	else if (status)
diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index d350065f25..4fc27c51f7 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -467,21 +467,42 @@ test_expect_success 'same, but with CR-LF line endings && cr-at-eol set' '
 	test_cmp one expect
 '
 
-test_expect_success 'same, but with CR-LF line endings && cr-at-eol unset' '
+test_expect_success 'CR-LF line endings && add line && text=auto' '
 	git config --unset core.whitespace &&
 	printf "a\r\n" >one &&
+	cp one save-one &&
+	git add one &&
 	printf "b\r\n" >>one &&
-	printf "c\r\n" >>one &&
+	cp one expect &&
+	git diff -- one >patch &&
+	mv save-one one &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
+
+test_expect_success 'CR-LF line endings && change line && text=auto' '
+	printf "a\r\n" >one &&
 	cp one save-one &&
-	printf "                 \r\n" >>one &&
 	git add one &&
+	printf "b\r\n" >one &&
 	cp one expect &&
-	printf "d\r\n" >>one &&
 	git diff -- one >patch &&
 	mv save-one one &&
-	echo d >>expect &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
 
-	git apply --ignore-space-change --whitespace=fix patch &&
+test_expect_success 'LF in repo, CRLF in worktree && change line && text=auto' '
+	printf "a\n" >one &&
+	git add one &&
+	printf "b\r\n" >one &&
+	git diff -- one >patch &&
+	printf "a\r\n" >one &&
+	echo "one text=auto" >.gitattributes &&
+	git -c core.eol=CRLF apply patch &&
+	printf "b\r\n" >expect &&
 	test_cmp one expect
 '
 
-- 
2.14.0.rc1.15.gd40c2d4e85.dirty


^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-08-19 11:28 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-01 18:24 core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Anthony Sottile
2017-08-01 20:47 ` Torsten Bögershausen
2017-08-01 20:58   ` Anthony Sottile
2017-08-02 15:44     ` Torsten Bögershausen
2017-08-02 20:42       ` [PATCH v1 1/1] correct apply for files commited with CRLF tboegi
2017-08-02 21:13         ` Junio C Hamano
2017-08-04 17:31           ` Torsten Bögershausen
2017-08-04 17:57             ` Junio C Hamano
2017-08-04 19:26               ` Junio C Hamano
2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
2017-08-12  5:45         ` Torsten Bögershausen
2017-08-12  5:53           ` Torsten Bögershausen
2017-08-12 14:56         ` [PATCH/RFC] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-12 14:56         ` [PATCH/RFC] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-14 17:37           ` Junio C Hamano
2017-08-17  6:06             ` [PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-17  6:06             ` [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-17  6:37               ` Junio C Hamano
2017-08-17 21:43             ` [PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-17 21:43             ` [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-17 22:29               ` Junio C Hamano
2017-08-17 22:35               ` Junio C Hamano
2017-08-19 11:27             ` [PATCH v4 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-19 11:28             ` [PATCH v4 2/2] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-13  8:51         ` [PATCH/RFC 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-16 18:28           ` Junio C Hamano
2017-08-16 18:28           ` [PATCH/FIXUP 3/2] apply: remove unused member apply_state.flags Junio C Hamano
2017-08-16 18:29           ` [PATCH/FIXUP 4/2] apply: only pay attention to CRLF in the preimage Junio C Hamano
2017-08-16 18:30           ` [PATCH/FIXUP 5/2] apply: localize the CRLF business to read_old_data() Junio C Hamano
2017-08-16 18:34           ` [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case Junio C Hamano
2017-08-17  6:24             ` Torsten Bögershausen
2017-08-17  7:06               ` Junio C Hamano
2017-08-17  7:12               ` Junio C Hamano
2017-08-17  8:24                 ` Torsten Bögershausen
2017-08-17 17:07                 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).