git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
@ 2014-09-23  1:09 Laszlo Ersek
  2014-09-23 18:54 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23  1:09 UTC (permalink / raw)
  To: gitster, git, jordan.l.justen, matt.fleming, lersek

The edk2 (EFI Development Kit II) project at
<https://github.com/tianocore/edk2/> uses CRLF line endings.

The following small reproducer demonstrates how gitdiff_verify_name()
breaks when it meets the usual git patches workflow in combination with
CRLF line endings:

1. Prepare the test repo:

  mkdir testdir
  cd testdir
  git init
  git config core.whitespace cr-at-eol
  git config am.keepcr true
  touch f1
  git add f1
  git commit -m 'initial import'

2. In the contributor role, write a patch that creates a new file
(adhering to the CRLF convention), submit it, then clean up:

  git checkout -b branch1 master
  echo 'hello world' | unix2dos >f2
  git add f2
  git commit -m 'add f2'
  git format-patch master..branch1
  git send-email 0001-add-f2.patch
  # send it to yourself -- make sure it goes through your MTA
  git clean -fdx

3. In the reviewer / tester / maintainer role, save the patch from your
email client to a local file. Assume that your email client does not
corrupt the patch when saving it. (Thunderbird doesn't corrupt it, for
example.) Once saved, try to apply the patch email to a new branch.

  git checkout -b branch2 master
  git am /path/to/the/saved/file

  -> Applying: add f2
  -> fatal: git apply: bad git-diff - expected /dev/null on line 9

This happens because am.keepcr==true keeps the CRLFs intact (as it should
in fact), but then "/dev/null\r" in the diff header trips up
gitdiff_verify_name().

Fix it by reusing the is_dev_null() helper function, which in effect
changes the condition from

  memcmp("/dev/null", line, 9) || line[9] != '\n'

to

  memcmp("/dev/null", line, 9) || !isspace(line[9])

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---

Notes:
    I'm not subscribed to the list; please keep me CC'd. Thanks.

 builtin/apply.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 6b7c764..a9c6a08 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -955,7 +955,7 @@ static char *gitdiff_verify_name(const char *line, int isnull, char *orig_name,
 	}
 	else {
 		/* expect "/dev/null" */
-		if (memcmp("/dev/null", line, 9) || line[9] != '\n')
+		if (!is_dev_null(line))
 			die(_("git apply: bad git-diff - expected /dev/null on line %d"), linenr);
 		return NULL;
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23  1:09 [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r" Laszlo Ersek
@ 2014-09-23 18:54 ` Junio C Hamano
  2014-09-23 19:31   ` Laszlo Ersek
  2014-09-23 20:17   ` Junio C Hamano
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 18:54 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Laszlo Ersek <lersek@redhat.com> writes:

>   git format-patch master..branch1

The output from this has these (excerpt from "od -xc" output):

0000360       f   2  \n  \n   d   i   f   f       -   -   g   i   t
           6620    0a32    640a    6669    2066    2d2d    6967    2074
0000400   a   /   f   2       b   /   f   2  \n   n   e   w       f   i
           2f61    3266    6220    662f    0a32    656e    2077    6966
0000420   l   e       m   o   d   e       1   0   0   6   4   4  \n   i
           656c    6d20    646f    2065    3031    3630    3434    690a
0000440   n   d   e   x       0   0   0   0   0   0   0   .   .   f   3
           646e    7865    3020    3030    3030    3030    2e2e    3366
0000460   5   d   3   e   6  \n   -   -   -       /   d   e   v   /   n
           6435    6533    0a36    2d2d    202d    642f    7665    6e2f
0000500   u   l   l  \n   +   +   +       b   /   f   2  \n   @   @
           6c75    0a6c    2b2b    202b    2f62    3266    400a    2040
0000520   -   0   ,   0       +   1       @   @  \n   +   h   e   l   l
           302d    302c    2b20    2031    4040    2b0a    6568    6c6c
0000540   o       w   o   r   l   d  \r  \n   -   -      \n   2   .   1
           206f    6f77    6c72    0d64    2d0a    202d    320a    312e

The structural parts of the diff, including "--- /dev/null" line,
are all terminated by "\n" (as they should be), and the only CR
appears in the message is at the end of "+hello world" line.

So I do not think apply should need to loosen its sanity check and
take a random whitespace after the "/dev/null" as a valid "this is a
creation event for the path" marker (e.g. "--- /dev/null whoa"?).

is_dev_null() is used to in the fallback code path that parses
traditional patch output (e.g. GNU diff) which throws random cruft
(e.g. timestamp) after the /dev/null marker, e.g.

    $ diff -u /dev/null f2
    --- /dev/null   2014-09-17 18:22:57.995111003 -0700
    +++ f2  2014-09-23 11:37:09.000000000 -0700
    @@ -0,0 +1 @@
    +hello world

and we'd be hesitant to allow that kind of looseness for Git patches
where we know we end the line after the "/dev/null" marker.

> 3. In the reviewer / tester / maintainer role, save the patch from your
> email client to a local file. Assume that your email client does not
> corrupt the patch when saving it.

Perhaps compare this saved file with the output from the above
format-patch to see where things got broken?

SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
turn out that what you are trying to do might be an equilvalent of

	git format-patch ... |
        # first lose all \r\n
        dos2unix | 
	# then make everything \r\n
        unix2dos |
        # and apply
        git am

which is not workable in the first place.  I dunno.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 18:54 ` Junio C Hamano
@ 2014-09-23 19:31   ` Laszlo Ersek
  2014-09-23 19:56     ` Junio C Hamano
  2014-09-23 20:02     ` Junio C Hamano
  2014-09-23 20:17   ` Junio C Hamano
  1 sibling, 2 replies; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming

On 09/23/14 20:54, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>>   git format-patch master..branch1
> 
> The output from this has these (excerpt from "od -xc" output):
> 
> 0000360       f   2  \n  \n   d   i   f   f       -   -   g   i   t
>            6620    0a32    640a    6669    2066    2d2d    6967    2074
> 0000400   a   /   f   2       b   /   f   2  \n   n   e   w       f   i
>            2f61    3266    6220    662f    0a32    656e    2077    6966
> 0000420   l   e       m   o   d   e       1   0   0   6   4   4  \n   i
>            656c    6d20    646f    2065    3031    3630    3434    690a
> 0000440   n   d   e   x       0   0   0   0   0   0   0   .   .   f   3
>            646e    7865    3020    3030    3030    3030    2e2e    3366
> 0000460   5   d   3   e   6  \n   -   -   -       /   d   e   v   /   n
>            6435    6533    0a36    2d2d    202d    642f    7665    6e2f
> 0000500   u   l   l  \n   +   +   +       b   /   f   2  \n   @   @
>            6c75    0a6c    2b2b    202b    2f62    3266    400a    2040
> 0000520   -   0   ,   0       +   1       @   @  \n   +   h   e   l   l
>            302d    302c    2b20    2031    4040    2b0a    6568    6c6c
> 0000540   o       w   o   r   l   d  \r  \n   -   -      \n   2   .   1
>            206f    6f77    6c72    0d64    2d0a    202d    320a    312e
> 
> The structural parts of the diff, including "--- /dev/null" line,
> are all terminated by "\n" (as they should be), and the only CR
> appears in the message is at the end of "+hello world" line.

That's right -- until the patch email goes through an MTA that turns all
line endings into CRLF. (Did you email the patch to yourself as
requested in the reproducer?)

Such CRLFs are normally transparent because git-am strips them. The
keepcr=true setting preserves them, but not only for the source code
lines (where it's the right thing to do): it also preserves them in the
git diff header lines. Which is not a problem in general, *except* when
said header line includes /dev/null.

> So I do not think apply should need to loosen its sanity check and
> take a random whitespace after the "/dev/null" as a valid "this is a
> creation event for the path" marker (e.g. "--- /dev/null whoa"?).
> 
> is_dev_null() is used to in the fallback code path that parses
> traditional patch output (e.g. GNU diff) which throws random cruft
> (e.g. timestamp) after the /dev/null marker, e.g.
> 
>     $ diff -u /dev/null f2
>     --- /dev/null   2014-09-17 18:22:57.995111003 -0700
>     +++ f2  2014-09-23 11:37:09.000000000 -0700
>     @@ -0,0 +1 @@
>     +hello world
> 
> and we'd be hesitant to allow that kind of looseness for Git patches
> where we know we end the line after the "/dev/null" marker.

Okay, let's say I only relax the check in question to accept "\r\n" in
addition to the current "\n". Will you take that?

>> 3. In the reviewer / tester / maintainer role, save the patch from your
>> email client to a local file. Assume that your email client does not
>> corrupt the patch when saving it.
> 
> Perhaps compare this saved file with the output from the above
> format-patch to see where things got broken?
> 
> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
> turn out that what you are trying to do might be an equilvalent of
> 
> 	git format-patch ... |
>         # first lose all \r\n
>         dos2unix | 
> 	# then make everything \r\n
>         unix2dos |
>         # and apply
>         git am
> 
> which is not workable in the first place.  I dunno.

I agree with your analysis. It is indeed the MTA (or the MUA, not sure)
that turns all line endings into uniform CRLFs -- it is a requirement in
RFC 2822 / 822bis.

http://cr.yp.to/docs/smtplf.html
http://www.rfc-editor.org/rfc/rfc2822.txt

> 2.3. Body
>
>    The body of a message is simply lines of US-ASCII characters.  The
>    only two limitations on the body are as follows:
>
>    - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>      independently in the body.

But why is this situation "not workable"? The same happens with *all*
patches that people mail around, it's just not visible to them, because
git-am strips all CRs indiscriminately.

People who are forced to work with CRLF repositories don't have this
luxury with git-am, and bump into the /dev/null problem all the time.

What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?

Another question I had about gitdiff_verify_name() -- what ensures there
that the memcmp(), with the fixed size of 9 bytes, won't fall off the
end of the line? Some of the outer caller functions verify the line
length before their comparisons, but I don't see any length checks in
gitdiff_verify_name() for /dev/null specifically.

Thank you,
Laszlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 19:31   ` Laszlo Ersek
@ 2014-09-23 19:56     ` Junio C Hamano
  2014-09-23 20:33       ` Laszlo Ersek
  2014-09-23 20:02     ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 19:56 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Laszlo Ersek <lersek@redhat.com> writes:

> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?

I thought we agreed that what you are doing is not workable in the
first place, no?

I suspect one way to handle "In this project, the files that are
checked out must be with CRLF line endings no matter what the
platform is" might be to use the line ending attributes to force
that while keeping the in-repository data with LF line endings.  The
diff output (format-patch output is just one of them) comes from
comparing the in-repository representation, so you won't have \r\n
that will be stripped via MTA in it, "apply" and "am" will apply the
patch without having to worry about \r\n, _and_ the line ending
attributes would end the lines in your in-working-tree files with
CRLF that way.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 19:31   ` Laszlo Ersek
  2014-09-23 19:56     ` Junio C Hamano
@ 2014-09-23 20:02     ` Junio C Hamano
  2014-09-23 20:32       ` Laszlo Ersek
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:02 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Laszlo Ersek <lersek@redhat.com> writes:

> On 09/23/14 20:54, Junio C Hamano wrote:
> ...
>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>> turn out that what you are trying to do might be an equilvalent of
>> 
>> 	git format-patch ... |
>>         # first lose all \r\n
>>         dos2unix | 
>> 	# then make everything \r\n
>>         unix2dos |
>>         # and apply
>>         git am
>> 
>> which is not workable in the first place.  I dunno.
>
> I agree with your analysis. It is indeed the MTA...
>>    - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>>      independently in the body.
>
> But why is this situation "not workable"? The same happens with *all*
> patches that people mail around, it's just not visible to them, because
> git-am strips all CRs indiscriminately.

It is not "git am" or "git apply" that "strips all CRs
indiscriminately".  I just tried to apply 0001-add-f2 without
letting your MTA/MUA corrupt it on "master" branch in the repository
you prepared that patch from, i.e.

	git checkout master^0 ;# go back
        git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
        git diff branch ;# nothing

> Another question I had about gitdiff_verify_name() -- what ensures there
> that the memcmp(), with the fixed size of 9 bytes,...

That may be worth fixing.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 18:54 ` Junio C Hamano
  2014-09-23 19:31   ` Laszlo Ersek
@ 2014-09-23 20:17   ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:17 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Junio C Hamano <gitster@pobox.com> writes:

> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
> turn out that what you are trying to do might be an equilvalent of
>
> 	git format-patch ... |
>         # first lose all \r\n
>         dos2unix | 
> 	# then make everything \r\n
>         unix2dos |
>         # and apply
>         git am
>
> which is not workable in the first place.  I dunno.

This is a tangent, but if the problem were slightly different, I
would be more sympathetic.  For example

    A popular MUA, when asked to write out a message in the mbox
    format, ends _all_ the lines in its output with CRLF, whether
    the original was sent as LF-only or CRLF, and there is no way to
    convince it to use LF-only.  "git apply" fails to grok such an
    input.

could be, if the use of such an MUA is very prevalent, a common
problem worth working around.

But then I would suspect that the workaround for such a case may not
be "accept /dev/null\n and /dev/null\r\n equally".  It is likely
that the right workaround for such a case would be to "turn all \r\n
into \n before processing".

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 20:02     ` Junio C Hamano
@ 2014-09-23 20:32       ` Laszlo Ersek
  2014-09-23 20:35         ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming

On 09/23/14 22:02, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 09/23/14 20:54, Junio C Hamano wrote:
>> ...
>>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>>> turn out that what you are trying to do might be an equilvalent of
>>>
>>> 	git format-patch ... |
>>>         # first lose all \r\n
>>>         dos2unix | 
>>> 	# then make everything \r\n
>>>         unix2dos |
>>>         # and apply
>>>         git am
>>>
>>> which is not workable in the first place.  I dunno.
>>
>> I agree with your analysis. It is indeed the MTA...
>>>    - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>>>      independently in the body.
>>
>> But why is this situation "not workable"? The same happens with *all*
>> patches that people mail around, it's just not visible to them, because
>> git-am strips all CRs indiscriminately.
> 
> It is not "git am" or "git apply" that "strips all CRs
> indiscriminately".  I just tried to apply 0001-add-f2 without
> letting your MTA/MUA corrupt it on "master" branch in the repository
> you prepared that patch from, i.e.
> 
> 	git checkout master^0 ;# go back
>         git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
>         git diff branch ;# nothing

When you did this, was am.keepcr=true in effect?

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 19:56     ` Junio C Hamano
@ 2014-09-23 20:33       ` Laszlo Ersek
  2014-09-23 20:40         ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming

On 09/23/14 21:56, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
> 
> I thought we agreed that what you are doing is not workable in the
> first place, no?
> 
> I suspect one way to handle "In this project, the files that are
> checked out must be with CRLF line endings no matter what the
> platform is" might be to use the line ending attributes to force
> that while keeping the in-repository data with LF line endings.  The
> diff output (format-patch output is just one of them) comes from
> comparing the in-repository representation, so you won't have \r\n
> that will be stripped via MTA in it, "apply" and "am" will apply the
> patch without having to worry about \r\n, _and_ the line ending
> attributes would end the lines in your in-working-tree files with
> CRLF that way.

This would be a perfect solution if the git repository was not a mirror
of a Subversion repository that contains all files with embedded CRLFs.

Anyway I accept defeat, thanks for your time.

Laszlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 20:32       ` Laszlo Ersek
@ 2014-09-23 20:35         ` Junio C Hamano
  2014-09-23 20:49           ` Laszlo Ersek
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:35 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Laszlo Ersek <lersek@redhat.com> writes:

> On 09/23/14 22:02, Junio C Hamano wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> On 09/23/14 20:54, Junio C Hamano wrote:
>>> ...
>>>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>>>> turn out that what you are trying to do might be an equilvalent of
>>>>
>>>> 	git format-patch ... |
>>>>         # first lose all \r\n
>>>>         dos2unix | 
>>>> 	# then make everything \r\n
>>>>         unix2dos |
>>>>         # and apply
>>>>         git am
>>>>
>>>> which is not workable in the first place.  I dunno.
>>>
>>> I agree with your analysis. It is indeed the MTA...
>>>>    - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>>>>      independently in the body.
>>>
>>> But why is this situation "not workable"? The same happens with *all*
>>> patches that people mail around, it's just not visible to them, because
>>> git-am strips all CRs indiscriminately.
>> 
>> It is not "git am" or "git apply" that "strips all CRs
>> indiscriminately".  I just tried to apply 0001-add-f2 without
>> letting your MTA/MUA corrupt it on "master" branch in the repository
>> you prepared that patch from, i.e.
>> 
>> 	git checkout master^0 ;# go back
>>         git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
>>         git diff branch ;# nothing
>
> When you did this, was am.keepcr=true in effect?

I actually briefly scratched my head but realized when I saw it work
"as expected" without me passing "--keep-cr" to "am" myself.

But I did that experiment in the repository created by following
your reproduction recipe, in which it had these:

  git config core.whitespace cr-at-eol
  git config am.keepcr true

so yes I had keepcr set.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 20:33       ` Laszlo Ersek
@ 2014-09-23 20:40         ` Junio C Hamano
  2014-09-23 20:57           ` Laszlo Ersek
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:40 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Laszlo Ersek <lersek@redhat.com> writes:

> On 09/23/14 21:56, Junio C Hamano wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>> 
>>> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
>> 
>> I thought we agreed that what you are doing is not workable in the
>> first place, no?
>> 
>> I suspect one way to handle "In this project, the files that are
>> checked out must be with CRLF line endings no matter what the
>> platform is" might be to use the line ending attributes to force
>> that while keeping the in-repository data with LF line endings.  The
>> diff output (format-patch output is just one of them) comes from
>> comparing the in-repository representation, so you won't have \r\n
>> that will be stripped via MTA in it, "apply" and "am" will apply the
>> patch without having to worry about \r\n, _and_ the line ending
>> attributes would end the lines in your in-working-tree files with
>> CRLF that way.
>
> This would be a perfect solution if the git repository was not a mirror
> of a Subversion repository that contains all files with embedded CRLFs.

Yikes.

> Anyway I accept defeat, thanks for your time.

I do not consider that a "defeat".  It is just I do not think it is
the right solution for the problem you are having to butcher "apply".
You'd need to find some other way to solve it, and other people on
the list may be able to offer a solution neither of us thought of in
this thread.

Perhaps those who are on Windows have more experience in situations
like yours?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 20:35         ` Junio C Hamano
@ 2014-09-23 20:49           ` Laszlo Ersek
  2014-09-23 21:35             ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming

On 09/23/14 22:35, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 09/23/14 22:02, Junio C Hamano wrote:
>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>
>>>> On 09/23/14 20:54, Junio C Hamano wrote:
>>>> ...
>>>>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>>>>> turn out that what you are trying to do might be an equilvalent of
>>>>>
>>>>> 	git format-patch ... |
>>>>>         # first lose all \r\n
>>>>>         dos2unix | 
>>>>> 	# then make everything \r\n
>>>>>         unix2dos |
>>>>>         # and apply
>>>>>         git am
>>>>>
>>>>> which is not workable in the first place.  I dunno.
>>>>
>>>> I agree with your analysis. It is indeed the MTA...
>>>>>    - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>>>>>      independently in the body.
>>>>
>>>> But why is this situation "not workable"? The same happens with *all*
>>>> patches that people mail around, it's just not visible to them, because
>>>> git-am strips all CRs indiscriminately.
>>>
>>> It is not "git am" or "git apply" that "strips all CRs
>>> indiscriminately".  I just tried to apply 0001-add-f2 without
>>> letting your MTA/MUA corrupt it on "master" branch in the repository
>>> you prepared that patch from, i.e.
>>>
>>> 	git checkout master^0 ;# go back
>>>         git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
>>>         git diff branch ;# nothing
>>
>> When you did this, was am.keepcr=true in effect?
> 
> I actually briefly scratched my head but realized when I saw it work
> "as expected" without me passing "--keep-cr" to "am" myself.
> 
> But I did that experiment in the repository created by following
> your reproduction recipe, in which it had these:
> 
>   git config core.whitespace cr-at-eol
>   git config am.keepcr true
> 
> so yes I had keepcr set.

Thank you for confirming, I expected so.

Because in this case the test doesn't refute my claim that "git-am
strips all CRs indiscriminately".

Git-am *does* strip all CRs indiscriminately, undoing the CRs that the
email servers / clients introduce. Your above test worked out because
you prevented git-am from stripping the CRs, with the keepcr=true
setting. If you turn that off, then either your git am command won't
succeed (because it will run into context conflicts due to different
line endings -- although not in this example), or the final git-diff
will report differences.

In summary:
- the email infrastructure turns all line terminators into CRLFs
- git-am strips these by default, from source code lines and from git
  diff header lines alike,
- this is fine for repos that store files with \n terminators,
- not fine for repos with embedded \r\n terminators -- the default
  stripping behavior of git-am breaks the source code in that case
  (runs into conflicts with existing files, and creates new files with
  wrong line endings)
- if you set am.keepcr=true, then the source code remains intact (no
  conflicts for existing files), but new files cannot be created,
  because the /dev/null\r header lines are rejected.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 20:40         ` Junio C Hamano
@ 2014-09-23 20:57           ` Laszlo Ersek
  0 siblings, 0 replies; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming

On 09/23/14 22:40, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 09/23/14 21:56, Junio C Hamano wrote:
>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>
>>>> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
>>>
>>> I thought we agreed that what you are doing is not workable in the
>>> first place, no?
>>>
>>> I suspect one way to handle "In this project, the files that are
>>> checked out must be with CRLF line endings no matter what the
>>> platform is" might be to use the line ending attributes to force
>>> that while keeping the in-repository data with LF line endings.  The
>>> diff output (format-patch output is just one of them) comes from
>>> comparing the in-repository representation, so you won't have \r\n
>>> that will be stripped via MTA in it, "apply" and "am" will apply the
>>> patch without having to worry about \r\n, _and_ the line ending
>>> attributes would end the lines in your in-working-tree files with
>>> CRLF that way.
>>
>> This would be a perfect solution if the git repository was not a mirror
>> of a Subversion repository that contains all files with embedded CRLFs.
> 
> Yikes.
> 
>> Anyway I accept defeat, thanks for your time.
> 
> I do not consider that a "defeat".  It is just I do not think it is
> the right solution for the problem you are having to butcher "apply".
> You'd need to find some other way to solve it, and other people on
> the list may be able to offer a solution neither of us thought of in
> this thread.
> 
> Perhaps those who are on Windows have more experience in situations
> like yours?

I'm not on Windows, "obviously". :)

The overwhelming majority of the EDK II developers use windows, and
connect directly to subversion. They work with CRLF line endings "natively".

Jordan (CC'd) operates a robot that mirrors SVN commits to the git repo
on github, with "git svn". "git svn rebase --use-log-author" fetches the
new SVN commits to the robots local git clone, and then "git push" (as
usual) pushes them to github. This is being done so that people knowing
git don't lose their sanity, trying to work with SVN.

The process works very well, up to a point (git-loving people clone the
github mirror, and submit patches with git-format-patch / git send-email
to edk2-devel). The problem is when you want to apply patches with
git-am from the list -- the CRLFs are a mess. Hence my thread starter here.

We can get around this by maintaining personal forks on github, pushing
our patches there too in parallel with the email postings. People can
then fetch directly, and avoid git-am altogether. But this is very
cumbersome -- you need a github account, you need an edk2 fork on
github, others need to add your repo as a remote, etc etc, while the
review occurs anyway in our MUAs.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 20:49           ` Laszlo Ersek
@ 2014-09-23 21:35             ` Junio C Hamano
  2014-09-24 12:56               ` Laszlo Ersek
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 21:35 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Laszlo Ersek <lersek@redhat.com> writes:

> In summary:

This is not entirely correct, though.  But it suggests an avenue for
a possible enhancement.

> - the email infrastructure turns all line terminators into CRLFs

Yes, but that is within MTAs and is expected to be invisible at MUA
level.  Typically mailbox files are stored in platform's native
format (e.g. I see LF endings in my mailbox), be it CRLF of LF.

The important thing to note here is that use of text/plain for
patches, if you want to have distinction between CRLF and LF in your
payload, is not designed to work over e-mails.

> - git-am strips these by default, from source code lines and from git
>   diff header lines alike,

On a platform whose native line endings are LF, the stripping may
not even happen (i.e. "am/apply" may not see CRLF in the mailbox in
the first place).

On a platform whose mbox has CRLF, or if you "unix2dos" a mbox on a
platform whose mbox has LF to manufacture a mbox with CRLF line
endings, you would have an opposite issue.  Information is lost in
MTAs while your e-mail is in transit and you cannot tell patch to
which paths had CRLF line endings and patch to which paths had LF
line endings.  If all files you are tracking uniformly use CRLF, you
can assume that everything had CRLF but it is still an assumption
and that is a special case that is not particularly interesting.

Applying such a CRLF patch with your "/dev/null\r\n" patch applied
to "am" will _add_ unwanted CR before LF to files that are meant to
use LF line endings.

Now if we accept that this issue is coming from lossy nature of
transporting patches via e-mails, we would realize that the right
place to solve this is not in "apply"'s parsing of structural part
of the "diff" output (e.g. "diff --git ...", "rename from ..." or
"--- filename"), but the payload part (i.e. " " followed by context,
"-" followed by removed and "+" followed by added).  Removal of CR
by "am -> mailsplit -> mailinfo -> apply" callchain is not losing
any information, as the input does not have useful information to
let us answer "are the lines in this path supposed to end with
CRLF?" in the first place; "/dev/null\r" patch is barking up a wrong
tree.

Our line-endings infrastructure (e.g. core.autocrlf configuration
variable, `eol` attribute) is all geared toward supporting cross
platform projects in that the BCP is to use LF line endings as the
canonical line endings for in-repository data and convert it to CRLF
for working tree files when necessary.  We do not have direct
support (other than declaring contents for paths as "binary" aka "no
conversion") to use CRLF in in-repository data (and that is partly
deliberate).

But it is conceivable to enhance the attribute system to allow us to
mark certain paths (or "all paths", which is a trivial special case)
as using CRLF line endings in in-repository and in-working-tree.  It
could be setting `eol` attribute to `both-crlf` or something.

Then "am -> mailsplit -> mailinfo -> apply" chain could:

 * "mailsplit" and "mailinfo" does not have to do anything special,
   other than stripping CR and make sure "apply" only sees LF
   endings;

 * "apply" is taught a new option "--fix-mta-corruption" which
   causes it to pay attention to the `eol` attribute set to
   `both-crlf`, and when it reads a patch

	diff --git a/one b/one
        --- one
        +++ one
        @@ -4,6 +4,6 @@
         common1
	 common2
        -old1
        -old2
        +new1
        +new2
         common3
         common4

   and notices that path "one" is marked as such, it pretends as if
   the input were

	diff --git a/one b/one
        --- one
        +++ one
        @@ -4,6 +4,6 @@
         common1^M
	 common2^M
        -old1^M
        -old2^M
        +new1^M
        +new2^M
         common3^M
         common4^M

   to compensate for possible breakage during the mail transit.

 * "am" is taught to pass "--fix-mta-corruption" to "apply" perhaps
   by default.

Because code paths that internally run "git am", e.g. "git rebase",
work on data that is *not* corrupt by mta (we generate diff and
apply ourselves), these places need to tell "am" not to use the
"--fix-mta-corruption" when running "apply".

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-23 21:35             ` Junio C Hamano
@ 2014-09-24 12:56               ` Laszlo Ersek
  2014-09-24 17:55                 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-24 12:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming

On 09/23/14 23:35, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> [...]

> The important thing to note here is that use of text/plain for
> patches, if you want to have distinction between CRLF and LF in your
> payload, is not designed to work over e-mails.

That's good to know, thanks!

> Now if we accept that this issue is coming from lossy nature of
> transporting patches via e-mails, we would realize that the right
> place to solve this is not in "apply"'s parsing of structural part
> of the "diff" output (e.g. "diff --git ...", "rename from ..." or
> "--- filename"), but the payload part (i.e. " " followed by context,
> "-" followed by removed and "+" followed by added).

I can agree with this, yes.

> Removal of CR
> by "am -> mailsplit -> mailinfo -> apply" callchain is not losing
> any information, as the input does not have useful information to
> let us answer "are the lines in this path supposed to end with
> CRLF?" in the first place; "/dev/null\r" patch is barking up a wrong
> tree.

OK.

> Our line-endings infrastructure (e.g. core.autocrlf configuration
> variable, `eol` attribute) is all geared toward supporting cross
> platform projects in that the BCP is to use LF line endings as the
> canonical line endings for in-repository data and convert it to CRLF
> for working tree files when necessary.  We do not have direct
> support (other than declaring contents for paths as "binary" aka "no
> conversion") to use CRLF in in-repository data (and that is partly
> deliberate).

I see. The edk2 "mirror-of-svn" git repo is then somewhat "incompatible"
with the original design.

> But it is conceivable to enhance the attribute system to allow us to
> mark certain paths (or "all paths", which is a trivial special case)
> as using CRLF line endings in in-repository and in-working-tree.  It
> could be setting `eol` attribute to `both-crlf` or something.
> 
> Then "am -> mailsplit -> mailinfo -> apply" chain could:
> 
>  * "mailsplit" and "mailinfo" does not have to do anything special,
>    other than stripping CR and make sure "apply" only sees LF
>    endings;
> 
>  * "apply" is taught a new option "--fix-mta-corruption" which
>    causes it to pay attention to the `eol` attribute set to
>    `both-crlf`, and when it reads a patch
> 
> 	diff --git a/one b/one
>         --- one
>         +++ one
>         @@ -4,6 +4,6 @@
>          common1
> 	 common2
>         -old1
>         -old2
>         +new1
>         +new2
>          common3
>          common4
> 
>    and notices that path "one" is marked as such, it pretends as if
>    the input were
> 
> 	diff --git a/one b/one
>         --- one
>         +++ one
>         @@ -4,6 +4,6 @@
>          common1^M
> 	 common2^M
>         -old1^M
>         -old2^M
>         +new1^M
>         +new2^M
>          common3^M
>          common4^M
> 
>    to compensate for possible breakage during the mail transit.
> 
>  * "am" is taught to pass "--fix-mta-corruption" to "apply" perhaps
>    by default.
> 
> Because code paths that internally run "git am", e.g. "git rebase",
> work on data that is *not* corrupt by mta (we generate diff and
> apply ourselves), these places need to tell "am" not to use the
> "--fix-mta-corruption" when running "apply".

Thank you for taking the time to describe this. It does look like the
by-the-book solution.

Obviously, I can't implement it myself -- first, I have no experience
with the git codebase, second, I have no time nor mandate to get
familiar with it and to make a serious effort to improve it in this
direction. (IOW "git" is a tool for my job, not my job.) The one-liner
patch and this discussion is all I can manage -- I thought I'd take a
stab at it myself rather than just "complain".

If someone finds the time to implement and document this feature, a
small part of the community will be very grateful. (Not much of a
compensation for a corner case like this, admittedly.)

Thanks,
Laszlo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
  2014-09-24 12:56               ` Laszlo Ersek
@ 2014-09-24 17:55                 ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-09-24 17:55 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming

Laszlo Ersek <lersek@redhat.com> writes:

> Thank you for taking the time to describe this. It does look like the
> by-the-book solution.
>
> Obviously, I can't implement it myself -- first, I have no experience
> with the git codebase, ...

Oh, I wasn't expecting that anyway ;-).

The reason I outlined a possible approach was to ask you to sanity
check the envisioned outcome (i.e. "if somebody made effort to
implement it, would the end result help your workflow?") and to give
hints to contributors who are looking for things to work on ;-)

> If someone finds the time to implement and document this feature, a
> small part of the community will be very grateful. (Not much of a
> compensation for a corner case like this, admittedly.)

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-09-24 17:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-23  1:09 [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r" Laszlo Ersek
2014-09-23 18:54 ` Junio C Hamano
2014-09-23 19:31   ` Laszlo Ersek
2014-09-23 19:56     ` Junio C Hamano
2014-09-23 20:33       ` Laszlo Ersek
2014-09-23 20:40         ` Junio C Hamano
2014-09-23 20:57           ` Laszlo Ersek
2014-09-23 20:02     ` Junio C Hamano
2014-09-23 20:32       ` Laszlo Ersek
2014-09-23 20:35         ` Junio C Hamano
2014-09-23 20:49           ` Laszlo Ersek
2014-09-23 21:35             ` Junio C Hamano
2014-09-24 12:56               ` Laszlo Ersek
2014-09-24 17:55                 ` Junio C Hamano
2014-09-23 20:17   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).