* [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
@ 2014-09-23 1:09 Laszlo Ersek
2014-09-23 18:54 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 1:09 UTC (permalink / raw)
To: gitster, git, jordan.l.justen, matt.fleming, lersek
The edk2 (EFI Development Kit II) project at
<https://github.com/tianocore/edk2/> uses CRLF line endings.
The following small reproducer demonstrates how gitdiff_verify_name()
breaks when it meets the usual git patches workflow in combination with
CRLF line endings:
1. Prepare the test repo:
mkdir testdir
cd testdir
git init
git config core.whitespace cr-at-eol
git config am.keepcr true
touch f1
git add f1
git commit -m 'initial import'
2. In the contributor role, write a patch that creates a new file
(adhering to the CRLF convention), submit it, then clean up:
git checkout -b branch1 master
echo 'hello world' | unix2dos >f2
git add f2
git commit -m 'add f2'
git format-patch master..branch1
git send-email 0001-add-f2.patch
# send it to yourself -- make sure it goes through your MTA
git clean -fdx
3. In the reviewer / tester / maintainer role, save the patch from your
email client to a local file. Assume that your email client does not
corrupt the patch when saving it. (Thunderbird doesn't corrupt it, for
example.) Once saved, try to apply the patch email to a new branch.
git checkout -b branch2 master
git am /path/to/the/saved/file
-> Applying: add f2
-> fatal: git apply: bad git-diff - expected /dev/null on line 9
This happens because am.keepcr==true keeps the CRLFs intact (as it should
in fact), but then "/dev/null\r" in the diff header trips up
gitdiff_verify_name().
Fix it by reusing the is_dev_null() helper function, which in effect
changes the condition from
memcmp("/dev/null", line, 9) || line[9] != '\n'
to
memcmp("/dev/null", line, 9) || !isspace(line[9])
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
Notes:
I'm not subscribed to the list; please keep me CC'd. Thanks.
builtin/apply.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/builtin/apply.c b/builtin/apply.c
index 6b7c764..a9c6a08 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -955,7 +955,7 @@ static char *gitdiff_verify_name(const char *line, int isnull, char *orig_name,
}
else {
/* expect "/dev/null" */
- if (memcmp("/dev/null", line, 9) || line[9] != '\n')
+ if (!is_dev_null(line))
die(_("git apply: bad git-diff - expected /dev/null on line %d"), linenr);
return NULL;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 1:09 [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r" Laszlo Ersek
@ 2014-09-23 18:54 ` Junio C Hamano
2014-09-23 19:31 ` Laszlo Ersek
2014-09-23 20:17 ` Junio C Hamano
0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 18:54 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Laszlo Ersek <lersek@redhat.com> writes:
> git format-patch master..branch1
The output from this has these (excerpt from "od -xc" output):
0000360 f 2 \n \n d i f f - - g i t
6620 0a32 640a 6669 2066 2d2d 6967 2074
0000400 a / f 2 b / f 2 \n n e w f i
2f61 3266 6220 662f 0a32 656e 2077 6966
0000420 l e m o d e 1 0 0 6 4 4 \n i
656c 6d20 646f 2065 3031 3630 3434 690a
0000440 n d e x 0 0 0 0 0 0 0 . . f 3
646e 7865 3020 3030 3030 3030 2e2e 3366
0000460 5 d 3 e 6 \n - - - / d e v / n
6435 6533 0a36 2d2d 202d 642f 7665 6e2f
0000500 u l l \n + + + b / f 2 \n @ @
6c75 0a6c 2b2b 202b 2f62 3266 400a 2040
0000520 - 0 , 0 + 1 @ @ \n + h e l l
302d 302c 2b20 2031 4040 2b0a 6568 6c6c
0000540 o w o r l d \r \n - - \n 2 . 1
206f 6f77 6c72 0d64 2d0a 202d 320a 312e
The structural parts of the diff, including "--- /dev/null" line,
are all terminated by "\n" (as they should be), and the only CR
appears in the message is at the end of "+hello world" line.
So I do not think apply should need to loosen its sanity check and
take a random whitespace after the "/dev/null" as a valid "this is a
creation event for the path" marker (e.g. "--- /dev/null whoa"?).
is_dev_null() is used to in the fallback code path that parses
traditional patch output (e.g. GNU diff) which throws random cruft
(e.g. timestamp) after the /dev/null marker, e.g.
$ diff -u /dev/null f2
--- /dev/null 2014-09-17 18:22:57.995111003 -0700
+++ f2 2014-09-23 11:37:09.000000000 -0700
@@ -0,0 +1 @@
+hello world
and we'd be hesitant to allow that kind of looseness for Git patches
where we know we end the line after the "/dev/null" marker.
> 3. In the reviewer / tester / maintainer role, save the patch from your
> email client to a local file. Assume that your email client does not
> corrupt the patch when saving it.
Perhaps compare this saved file with the output from the above
format-patch to see where things got broken?
SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
turn out that what you are trying to do might be an equilvalent of
git format-patch ... |
# first lose all \r\n
dos2unix |
# then make everything \r\n
unix2dos |
# and apply
git am
which is not workable in the first place. I dunno.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 18:54 ` Junio C Hamano
@ 2014-09-23 19:31 ` Laszlo Ersek
2014-09-23 19:56 ` Junio C Hamano
2014-09-23 20:02 ` Junio C Hamano
2014-09-23 20:17 ` Junio C Hamano
1 sibling, 2 replies; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 19:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming
On 09/23/14 20:54, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
>
>> git format-patch master..branch1
>
> The output from this has these (excerpt from "od -xc" output):
>
> 0000360 f 2 \n \n d i f f - - g i t
> 6620 0a32 640a 6669 2066 2d2d 6967 2074
> 0000400 a / f 2 b / f 2 \n n e w f i
> 2f61 3266 6220 662f 0a32 656e 2077 6966
> 0000420 l e m o d e 1 0 0 6 4 4 \n i
> 656c 6d20 646f 2065 3031 3630 3434 690a
> 0000440 n d e x 0 0 0 0 0 0 0 . . f 3
> 646e 7865 3020 3030 3030 3030 2e2e 3366
> 0000460 5 d 3 e 6 \n - - - / d e v / n
> 6435 6533 0a36 2d2d 202d 642f 7665 6e2f
> 0000500 u l l \n + + + b / f 2 \n @ @
> 6c75 0a6c 2b2b 202b 2f62 3266 400a 2040
> 0000520 - 0 , 0 + 1 @ @ \n + h e l l
> 302d 302c 2b20 2031 4040 2b0a 6568 6c6c
> 0000540 o w o r l d \r \n - - \n 2 . 1
> 206f 6f77 6c72 0d64 2d0a 202d 320a 312e
>
> The structural parts of the diff, including "--- /dev/null" line,
> are all terminated by "\n" (as they should be), and the only CR
> appears in the message is at the end of "+hello world" line.
That's right -- until the patch email goes through an MTA that turns all
line endings into CRLF. (Did you email the patch to yourself as
requested in the reproducer?)
Such CRLFs are normally transparent because git-am strips them. The
keepcr=true setting preserves them, but not only for the source code
lines (where it's the right thing to do): it also preserves them in the
git diff header lines. Which is not a problem in general, *except* when
said header line includes /dev/null.
> So I do not think apply should need to loosen its sanity check and
> take a random whitespace after the "/dev/null" as a valid "this is a
> creation event for the path" marker (e.g. "--- /dev/null whoa"?).
>
> is_dev_null() is used to in the fallback code path that parses
> traditional patch output (e.g. GNU diff) which throws random cruft
> (e.g. timestamp) after the /dev/null marker, e.g.
>
> $ diff -u /dev/null f2
> --- /dev/null 2014-09-17 18:22:57.995111003 -0700
> +++ f2 2014-09-23 11:37:09.000000000 -0700
> @@ -0,0 +1 @@
> +hello world
>
> and we'd be hesitant to allow that kind of looseness for Git patches
> where we know we end the line after the "/dev/null" marker.
Okay, let's say I only relax the check in question to accept "\r\n" in
addition to the current "\n". Will you take that?
>> 3. In the reviewer / tester / maintainer role, save the patch from your
>> email client to a local file. Assume that your email client does not
>> corrupt the patch when saving it.
>
> Perhaps compare this saved file with the output from the above
> format-patch to see where things got broken?
>
> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
> turn out that what you are trying to do might be an equilvalent of
>
> git format-patch ... |
> # first lose all \r\n
> dos2unix |
> # then make everything \r\n
> unix2dos |
> # and apply
> git am
>
> which is not workable in the first place. I dunno.
I agree with your analysis. It is indeed the MTA (or the MUA, not sure)
that turns all line endings into uniform CRLFs -- it is a requirement in
RFC 2822 / 822bis.
http://cr.yp.to/docs/smtplf.html
http://www.rfc-editor.org/rfc/rfc2822.txt
> 2.3. Body
>
> The body of a message is simply lines of US-ASCII characters. The
> only two limitations on the body are as follows:
>
> - CR and LF MUST only occur together as CRLF; they MUST NOT appear
> independently in the body.
But why is this situation "not workable"? The same happens with *all*
patches that people mail around, it's just not visible to them, because
git-am strips all CRs indiscriminately.
People who are forced to work with CRLF repositories don't have this
luxury with git-am, and bump into the /dev/null problem all the time.
What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
Another question I had about gitdiff_verify_name() -- what ensures there
that the memcmp(), with the fixed size of 9 bytes, won't fall off the
end of the line? Some of the outer caller functions verify the line
length before their comparisons, but I don't see any length checks in
gitdiff_verify_name() for /dev/null specifically.
Thank you,
Laszlo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 19:31 ` Laszlo Ersek
@ 2014-09-23 19:56 ` Junio C Hamano
2014-09-23 20:33 ` Laszlo Ersek
2014-09-23 20:02 ` Junio C Hamano
1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 19:56 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Laszlo Ersek <lersek@redhat.com> writes:
> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
I thought we agreed that what you are doing is not workable in the
first place, no?
I suspect one way to handle "In this project, the files that are
checked out must be with CRLF line endings no matter what the
platform is" might be to use the line ending attributes to force
that while keeping the in-repository data with LF line endings. The
diff output (format-patch output is just one of them) comes from
comparing the in-repository representation, so you won't have \r\n
that will be stripped via MTA in it, "apply" and "am" will apply the
patch without having to worry about \r\n, _and_ the line ending
attributes would end the lines in your in-working-tree files with
CRLF that way.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 19:31 ` Laszlo Ersek
2014-09-23 19:56 ` Junio C Hamano
@ 2014-09-23 20:02 ` Junio C Hamano
2014-09-23 20:32 ` Laszlo Ersek
1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:02 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Laszlo Ersek <lersek@redhat.com> writes:
> On 09/23/14 20:54, Junio C Hamano wrote:
> ...
>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>> turn out that what you are trying to do might be an equilvalent of
>>
>> git format-patch ... |
>> # first lose all \r\n
>> dos2unix |
>> # then make everything \r\n
>> unix2dos |
>> # and apply
>> git am
>>
>> which is not workable in the first place. I dunno.
>
> I agree with your analysis. It is indeed the MTA...
>> - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>> independently in the body.
>
> But why is this situation "not workable"? The same happens with *all*
> patches that people mail around, it's just not visible to them, because
> git-am strips all CRs indiscriminately.
It is not "git am" or "git apply" that "strips all CRs
indiscriminately". I just tried to apply 0001-add-f2 without
letting your MTA/MUA corrupt it on "master" branch in the repository
you prepared that patch from, i.e.
git checkout master^0 ;# go back
git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
git diff branch ;# nothing
> Another question I had about gitdiff_verify_name() -- what ensures there
> that the memcmp(), with the fixed size of 9 bytes,...
That may be worth fixing.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 18:54 ` Junio C Hamano
2014-09-23 19:31 ` Laszlo Ersek
@ 2014-09-23 20:17 ` Junio C Hamano
1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:17 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Junio C Hamano <gitster@pobox.com> writes:
> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
> turn out that what you are trying to do might be an equilvalent of
>
> git format-patch ... |
> # first lose all \r\n
> dos2unix |
> # then make everything \r\n
> unix2dos |
> # and apply
> git am
>
> which is not workable in the first place. I dunno.
This is a tangent, but if the problem were slightly different, I
would be more sympathetic. For example
A popular MUA, when asked to write out a message in the mbox
format, ends _all_ the lines in its output with CRLF, whether
the original was sent as LF-only or CRLF, and there is no way to
convince it to use LF-only. "git apply" fails to grok such an
input.
could be, if the use of such an MUA is very prevalent, a common
problem worth working around.
But then I would suspect that the workaround for such a case may not
be "accept /dev/null\n and /dev/null\r\n equally". It is likely
that the right workaround for such a case would be to "turn all \r\n
into \n before processing".
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 20:02 ` Junio C Hamano
@ 2014-09-23 20:32 ` Laszlo Ersek
2014-09-23 20:35 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming
On 09/23/14 22:02, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
>
>> On 09/23/14 20:54, Junio C Hamano wrote:
>> ...
>>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>>> turn out that what you are trying to do might be an equilvalent of
>>>
>>> git format-patch ... |
>>> # first lose all \r\n
>>> dos2unix |
>>> # then make everything \r\n
>>> unix2dos |
>>> # and apply
>>> git am
>>>
>>> which is not workable in the first place. I dunno.
>>
>> I agree with your analysis. It is indeed the MTA...
>>> - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>>> independently in the body.
>>
>> But why is this situation "not workable"? The same happens with *all*
>> patches that people mail around, it's just not visible to them, because
>> git-am strips all CRs indiscriminately.
>
> It is not "git am" or "git apply" that "strips all CRs
> indiscriminately". I just tried to apply 0001-add-f2 without
> letting your MTA/MUA corrupt it on "master" branch in the repository
> you prepared that patch from, i.e.
>
> git checkout master^0 ;# go back
> git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
> git diff branch ;# nothing
When you did this, was am.keepcr=true in effect?
Thanks
Laszlo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 19:56 ` Junio C Hamano
@ 2014-09-23 20:33 ` Laszlo Ersek
2014-09-23 20:40 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming
On 09/23/14 21:56, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
>
>> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
>
> I thought we agreed that what you are doing is not workable in the
> first place, no?
>
> I suspect one way to handle "In this project, the files that are
> checked out must be with CRLF line endings no matter what the
> platform is" might be to use the line ending attributes to force
> that while keeping the in-repository data with LF line endings. The
> diff output (format-patch output is just one of them) comes from
> comparing the in-repository representation, so you won't have \r\n
> that will be stripped via MTA in it, "apply" and "am" will apply the
> patch without having to worry about \r\n, _and_ the line ending
> attributes would end the lines in your in-working-tree files with
> CRLF that way.
This would be a perfect solution if the git repository was not a mirror
of a Subversion repository that contains all files with embedded CRLFs.
Anyway I accept defeat, thanks for your time.
Laszlo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 20:32 ` Laszlo Ersek
@ 2014-09-23 20:35 ` Junio C Hamano
2014-09-23 20:49 ` Laszlo Ersek
0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:35 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Laszlo Ersek <lersek@redhat.com> writes:
> On 09/23/14 22:02, Junio C Hamano wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>>
>>> On 09/23/14 20:54, Junio C Hamano wrote:
>>> ...
>>>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>>>> turn out that what you are trying to do might be an equilvalent of
>>>>
>>>> git format-patch ... |
>>>> # first lose all \r\n
>>>> dos2unix |
>>>> # then make everything \r\n
>>>> unix2dos |
>>>> # and apply
>>>> git am
>>>>
>>>> which is not workable in the first place. I dunno.
>>>
>>> I agree with your analysis. It is indeed the MTA...
>>>> - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>>>> independently in the body.
>>>
>>> But why is this situation "not workable"? The same happens with *all*
>>> patches that people mail around, it's just not visible to them, because
>>> git-am strips all CRs indiscriminately.
>>
>> It is not "git am" or "git apply" that "strips all CRs
>> indiscriminately". I just tried to apply 0001-add-f2 without
>> letting your MTA/MUA corrupt it on "master" branch in the repository
>> you prepared that patch from, i.e.
>>
>> git checkout master^0 ;# go back
>> git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
>> git diff branch ;# nothing
>
> When you did this, was am.keepcr=true in effect?
I actually briefly scratched my head but realized when I saw it work
"as expected" without me passing "--keep-cr" to "am" myself.
But I did that experiment in the repository created by following
your reproduction recipe, in which it had these:
git config core.whitespace cr-at-eol
git config am.keepcr true
so yes I had keepcr set.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 20:33 ` Laszlo Ersek
@ 2014-09-23 20:40 ` Junio C Hamano
2014-09-23 20:57 ` Laszlo Ersek
0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 20:40 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Laszlo Ersek <lersek@redhat.com> writes:
> On 09/23/14 21:56, Junio C Hamano wrote:
>> Laszlo Ersek <lersek@redhat.com> writes:
>>
>>> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
>>
>> I thought we agreed that what you are doing is not workable in the
>> first place, no?
>>
>> I suspect one way to handle "In this project, the files that are
>> checked out must be with CRLF line endings no matter what the
>> platform is" might be to use the line ending attributes to force
>> that while keeping the in-repository data with LF line endings. The
>> diff output (format-patch output is just one of them) comes from
>> comparing the in-repository representation, so you won't have \r\n
>> that will be stripped via MTA in it, "apply" and "am" will apply the
>> patch without having to worry about \r\n, _and_ the line ending
>> attributes would end the lines in your in-working-tree files with
>> CRLF that way.
>
> This would be a perfect solution if the git repository was not a mirror
> of a Subversion repository that contains all files with embedded CRLFs.
Yikes.
> Anyway I accept defeat, thanks for your time.
I do not consider that a "defeat". It is just I do not think it is
the right solution for the problem you are having to butcher "apply".
You'd need to find some other way to solve it, and other people on
the list may be able to offer a solution neither of us thought of in
this thread.
Perhaps those who are on Windows have more experience in situations
like yours?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 20:35 ` Junio C Hamano
@ 2014-09-23 20:49 ` Laszlo Ersek
2014-09-23 21:35 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming
On 09/23/14 22:35, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
>
>> On 09/23/14 22:02, Junio C Hamano wrote:
>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>
>>>> On 09/23/14 20:54, Junio C Hamano wrote:
>>>> ...
>>>>> SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
>>>>> turn out that what you are trying to do might be an equilvalent of
>>>>>
>>>>> git format-patch ... |
>>>>> # first lose all \r\n
>>>>> dos2unix |
>>>>> # then make everything \r\n
>>>>> unix2dos |
>>>>> # and apply
>>>>> git am
>>>>>
>>>>> which is not workable in the first place. I dunno.
>>>>
>>>> I agree with your analysis. It is indeed the MTA...
>>>>> - CR and LF MUST only occur together as CRLF; they MUST NOT appear
>>>>> independently in the body.
>>>>
>>>> But why is this situation "not workable"? The same happens with *all*
>>>> patches that people mail around, it's just not visible to them, because
>>>> git-am strips all CRs indiscriminately.
>>>
>>> It is not "git am" or "git apply" that "strips all CRs
>>> indiscriminately". I just tried to apply 0001-add-f2 without
>>> letting your MTA/MUA corrupt it on "master" branch in the repository
>>> you prepared that patch from, i.e.
>>>
>>> git checkout master^0 ;# go back
>>> git am 0001-add-f2* ;# apply that "+hello world\r\n" patch
>>> git diff branch ;# nothing
>>
>> When you did this, was am.keepcr=true in effect?
>
> I actually briefly scratched my head but realized when I saw it work
> "as expected" without me passing "--keep-cr" to "am" myself.
>
> But I did that experiment in the repository created by following
> your reproduction recipe, in which it had these:
>
> git config core.whitespace cr-at-eol
> git config am.keepcr true
>
> so yes I had keepcr set.
Thank you for confirming, I expected so.
Because in this case the test doesn't refute my claim that "git-am
strips all CRs indiscriminately".
Git-am *does* strip all CRs indiscriminately, undoing the CRs that the
email servers / clients introduce. Your above test worked out because
you prevented git-am from stripping the CRs, with the keepcr=true
setting. If you turn that off, then either your git am command won't
succeed (because it will run into context conflicts due to different
line endings -- although not in this example), or the final git-diff
will report differences.
In summary:
- the email infrastructure turns all line terminators into CRLFs
- git-am strips these by default, from source code lines and from git
diff header lines alike,
- this is fine for repos that store files with \n terminators,
- not fine for repos with embedded \r\n terminators -- the default
stripping behavior of git-am breaks the source code in that case
(runs into conflicts with existing files, and creates new files with
wrong line endings)
- if you set am.keepcr=true, then the source code remains intact (no
conflicts for existing files), but new files cannot be created,
because the /dev/null\r header lines are rejected.
Thanks
Laszlo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 20:40 ` Junio C Hamano
@ 2014-09-23 20:57 ` Laszlo Ersek
0 siblings, 0 replies; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-23 20:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming
On 09/23/14 22:40, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
>
>> On 09/23/14 21:56, Junio C Hamano wrote:
>>> Laszlo Ersek <lersek@redhat.com> writes:
>>>
>>>> What do you think about accepting only "/dev/null\n" and "/dev/null\r\n"?
>>>
>>> I thought we agreed that what you are doing is not workable in the
>>> first place, no?
>>>
>>> I suspect one way to handle "In this project, the files that are
>>> checked out must be with CRLF line endings no matter what the
>>> platform is" might be to use the line ending attributes to force
>>> that while keeping the in-repository data with LF line endings. The
>>> diff output (format-patch output is just one of them) comes from
>>> comparing the in-repository representation, so you won't have \r\n
>>> that will be stripped via MTA in it, "apply" and "am" will apply the
>>> patch without having to worry about \r\n, _and_ the line ending
>>> attributes would end the lines in your in-working-tree files with
>>> CRLF that way.
>>
>> This would be a perfect solution if the git repository was not a mirror
>> of a Subversion repository that contains all files with embedded CRLFs.
>
> Yikes.
>
>> Anyway I accept defeat, thanks for your time.
>
> I do not consider that a "defeat". It is just I do not think it is
> the right solution for the problem you are having to butcher "apply".
> You'd need to find some other way to solve it, and other people on
> the list may be able to offer a solution neither of us thought of in
> this thread.
>
> Perhaps those who are on Windows have more experience in situations
> like yours?
I'm not on Windows, "obviously". :)
The overwhelming majority of the EDK II developers use windows, and
connect directly to subversion. They work with CRLF line endings "natively".
Jordan (CC'd) operates a robot that mirrors SVN commits to the git repo
on github, with "git svn". "git svn rebase --use-log-author" fetches the
new SVN commits to the robots local git clone, and then "git push" (as
usual) pushes them to github. This is being done so that people knowing
git don't lose their sanity, trying to work with SVN.
The process works very well, up to a point (git-loving people clone the
github mirror, and submit patches with git-format-patch / git send-email
to edk2-devel). The problem is when you want to apply patches with
git-am from the list -- the CRLFs are a mess. Hence my thread starter here.
We can get around this by maintaining personal forks on github, pushing
our patches there too in parallel with the email postings. People can
then fetch directly, and avoid git-am altogether. But this is very
cumbersome -- you need a github account, you need an edk2 fork on
github, others need to add your repo as a remote, etc etc, while the
review occurs anyway in our MUAs.
Thanks
Laszlo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 20:49 ` Laszlo Ersek
@ 2014-09-23 21:35 ` Junio C Hamano
2014-09-24 12:56 ` Laszlo Ersek
0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2014-09-23 21:35 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Laszlo Ersek <lersek@redhat.com> writes:
> In summary:
This is not entirely correct, though. But it suggests an avenue for
a possible enhancement.
> - the email infrastructure turns all line terminators into CRLFs
Yes, but that is within MTAs and is expected to be invisible at MUA
level. Typically mailbox files are stored in platform's native
format (e.g. I see LF endings in my mailbox), be it CRLF of LF.
The important thing to note here is that use of text/plain for
patches, if you want to have distinction between CRLF and LF in your
payload, is not designed to work over e-mails.
> - git-am strips these by default, from source code lines and from git
> diff header lines alike,
On a platform whose native line endings are LF, the stripping may
not even happen (i.e. "am/apply" may not see CRLF in the mailbox in
the first place).
On a platform whose mbox has CRLF, or if you "unix2dos" a mbox on a
platform whose mbox has LF to manufacture a mbox with CRLF line
endings, you would have an opposite issue. Information is lost in
MTAs while your e-mail is in transit and you cannot tell patch to
which paths had CRLF line endings and patch to which paths had LF
line endings. If all files you are tracking uniformly use CRLF, you
can assume that everything had CRLF but it is still an assumption
and that is a special case that is not particularly interesting.
Applying such a CRLF patch with your "/dev/null\r\n" patch applied
to "am" will _add_ unwanted CR before LF to files that are meant to
use LF line endings.
Now if we accept that this issue is coming from lossy nature of
transporting patches via e-mails, we would realize that the right
place to solve this is not in "apply"'s parsing of structural part
of the "diff" output (e.g. "diff --git ...", "rename from ..." or
"--- filename"), but the payload part (i.e. " " followed by context,
"-" followed by removed and "+" followed by added). Removal of CR
by "am -> mailsplit -> mailinfo -> apply" callchain is not losing
any information, as the input does not have useful information to
let us answer "are the lines in this path supposed to end with
CRLF?" in the first place; "/dev/null\r" patch is barking up a wrong
tree.
Our line-endings infrastructure (e.g. core.autocrlf configuration
variable, `eol` attribute) is all geared toward supporting cross
platform projects in that the BCP is to use LF line endings as the
canonical line endings for in-repository data and convert it to CRLF
for working tree files when necessary. We do not have direct
support (other than declaring contents for paths as "binary" aka "no
conversion") to use CRLF in in-repository data (and that is partly
deliberate).
But it is conceivable to enhance the attribute system to allow us to
mark certain paths (or "all paths", which is a trivial special case)
as using CRLF line endings in in-repository and in-working-tree. It
could be setting `eol` attribute to `both-crlf` or something.
Then "am -> mailsplit -> mailinfo -> apply" chain could:
* "mailsplit" and "mailinfo" does not have to do anything special,
other than stripping CR and make sure "apply" only sees LF
endings;
* "apply" is taught a new option "--fix-mta-corruption" which
causes it to pay attention to the `eol` attribute set to
`both-crlf`, and when it reads a patch
diff --git a/one b/one
--- one
+++ one
@@ -4,6 +4,6 @@
common1
common2
-old1
-old2
+new1
+new2
common3
common4
and notices that path "one" is marked as such, it pretends as if
the input were
diff --git a/one b/one
--- one
+++ one
@@ -4,6 +4,6 @@
common1^M
common2^M
-old1^M
-old2^M
+new1^M
+new2^M
common3^M
common4^M
to compensate for possible breakage during the mail transit.
* "am" is taught to pass "--fix-mta-corruption" to "apply" perhaps
by default.
Because code paths that internally run "git am", e.g. "git rebase",
work on data that is *not* corrupt by mta (we generate diff and
apply ourselves), these places need to tell "am" not to use the
"--fix-mta-corruption" when running "apply".
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-23 21:35 ` Junio C Hamano
@ 2014-09-24 12:56 ` Laszlo Ersek
2014-09-24 17:55 ` Junio C Hamano
0 siblings, 1 reply; 15+ messages in thread
From: Laszlo Ersek @ 2014-09-24 12:56 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, jordan.l.justen, matt.fleming
On 09/23/14 23:35, Junio C Hamano wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
>
>> [...]
> The important thing to note here is that use of text/plain for
> patches, if you want to have distinction between CRLF and LF in your
> payload, is not designed to work over e-mails.
That's good to know, thanks!
> Now if we accept that this issue is coming from lossy nature of
> transporting patches via e-mails, we would realize that the right
> place to solve this is not in "apply"'s parsing of structural part
> of the "diff" output (e.g. "diff --git ...", "rename from ..." or
> "--- filename"), but the payload part (i.e. " " followed by context,
> "-" followed by removed and "+" followed by added).
I can agree with this, yes.
> Removal of CR
> by "am -> mailsplit -> mailinfo -> apply" callchain is not losing
> any information, as the input does not have useful information to
> let us answer "are the lines in this path supposed to end with
> CRLF?" in the first place; "/dev/null\r" patch is barking up a wrong
> tree.
OK.
> Our line-endings infrastructure (e.g. core.autocrlf configuration
> variable, `eol` attribute) is all geared toward supporting cross
> platform projects in that the BCP is to use LF line endings as the
> canonical line endings for in-repository data and convert it to CRLF
> for working tree files when necessary. We do not have direct
> support (other than declaring contents for paths as "binary" aka "no
> conversion") to use CRLF in in-repository data (and that is partly
> deliberate).
I see. The edk2 "mirror-of-svn" git repo is then somewhat "incompatible"
with the original design.
> But it is conceivable to enhance the attribute system to allow us to
> mark certain paths (or "all paths", which is a trivial special case)
> as using CRLF line endings in in-repository and in-working-tree. It
> could be setting `eol` attribute to `both-crlf` or something.
>
> Then "am -> mailsplit -> mailinfo -> apply" chain could:
>
> * "mailsplit" and "mailinfo" does not have to do anything special,
> other than stripping CR and make sure "apply" only sees LF
> endings;
>
> * "apply" is taught a new option "--fix-mta-corruption" which
> causes it to pay attention to the `eol` attribute set to
> `both-crlf`, and when it reads a patch
>
> diff --git a/one b/one
> --- one
> +++ one
> @@ -4,6 +4,6 @@
> common1
> common2
> -old1
> -old2
> +new1
> +new2
> common3
> common4
>
> and notices that path "one" is marked as such, it pretends as if
> the input were
>
> diff --git a/one b/one
> --- one
> +++ one
> @@ -4,6 +4,6 @@
> common1^M
> common2^M
> -old1^M
> -old2^M
> +new1^M
> +new2^M
> common3^M
> common4^M
>
> to compensate for possible breakage during the mail transit.
>
> * "am" is taught to pass "--fix-mta-corruption" to "apply" perhaps
> by default.
>
> Because code paths that internally run "git am", e.g. "git rebase",
> work on data that is *not* corrupt by mta (we generate diff and
> apply ourselves), these places need to tell "am" not to use the
> "--fix-mta-corruption" when running "apply".
Thank you for taking the time to describe this. It does look like the
by-the-book solution.
Obviously, I can't implement it myself -- first, I have no experience
with the git codebase, second, I have no time nor mandate to get
familiar with it and to make a serious effort to improve it in this
direction. (IOW "git" is a tool for my job, not my job.) The one-liner
patch and this discussion is all I can manage -- I thought I'd take a
stab at it myself rather than just "complain".
If someone finds the time to implement and document this feature, a
small part of the community will be very grateful. (Not much of a
compensation for a corner case like this, admittedly.)
Thanks,
Laszlo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r"
2014-09-24 12:56 ` Laszlo Ersek
@ 2014-09-24 17:55 ` Junio C Hamano
0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2014-09-24 17:55 UTC (permalink / raw)
To: Laszlo Ersek; +Cc: git, jordan.l.justen, matt.fleming
Laszlo Ersek <lersek@redhat.com> writes:
> Thank you for taking the time to describe this. It does look like the
> by-the-book solution.
>
> Obviously, I can't implement it myself -- first, I have no experience
> with the git codebase, ...
Oh, I wasn't expecting that anyway ;-).
The reason I outlined a possible approach was to ask you to sanity
check the envisioned outcome (i.e. "if somebody made effort to
implement it, would the end result help your workflow?") and to give
hints to contributors who are looking for things to work on ;-)
> If someone finds the time to implement and document this feature, a
> small part of the community will be very grateful. (Not much of a
> compensation for a corner case like this, admittedly.)
Thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-09-24 17:55 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-23 1:09 [PATCH for-maint] apply: gitdiff_verify_name(): accept "/dev/null\r" Laszlo Ersek
2014-09-23 18:54 ` Junio C Hamano
2014-09-23 19:31 ` Laszlo Ersek
2014-09-23 19:56 ` Junio C Hamano
2014-09-23 20:33 ` Laszlo Ersek
2014-09-23 20:40 ` Junio C Hamano
2014-09-23 20:57 ` Laszlo Ersek
2014-09-23 20:02 ` Junio C Hamano
2014-09-23 20:32 ` Laszlo Ersek
2014-09-23 20:35 ` Junio C Hamano
2014-09-23 20:49 ` Laszlo Ersek
2014-09-23 21:35 ` Junio C Hamano
2014-09-24 12:56 ` Laszlo Ersek
2014-09-24 17:55 ` Junio C Hamano
2014-09-23 20:17 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).