From: "Torsten Bögershausen" <tboegi@web.de>
To: "Jakub Narębski" <jnareb@gmail.com>, git@vger.kernel.org
Subject: Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows
Date: Sun, 2 Apr 2017 06:34:26 +0200
Message-ID: <fde70c33-72ed-4035-2a55-2184fa9ac553@web.de>
In-Reply-To: <bbd60ab1-1309-6b1e-9b7f-09764bab5ccd@gmail.com>

On 2017-03-31 21:44, Jakub Narębski wrote:
> W dniu 31.03.2017 o 14:38, Torsten Bögershausen pisze:
>> On 30.03.17 21:35, Jakub Narębski wrote:
>>> Hello,
>>>
>>> Recently I had to work on a project which uses legacy 8-bit encoding
>>> (namely cp1250 encoding) instead of utf-8 for text files (LaTeX
>>> documents). My terminal, that is Git Bash from Git for Windows is set
>>> up for utf-8.
>>>
>>> I wanted for "git diff" and friends to return something sane on said
>>> utf-8 terminal, instead of mojibake. There is 'encoding'
>>> gitattribute... but it works only for GUI ('git gui', that is).
>>>
>>> Therefore I have (ab)used textconv facility to convert from cp1250 of
>>> file encoding to utf-8 encoding of console.
>>>
>>> I have set the following in .gitattributes file:
>>>
>>> ## LaTeX documents in cp1250 encoding
>>> *.tex text diff=mylatex
>>>
>>> The 'mylatex' driver is defined as:
>>>
>>> [diff "mylatex"]
>>> xfuncname = "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
>>> wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
>>> textconv = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t utf-8
>>> cachetextconv = true
>>>
>>> And everything would be all right... if not for the fact that Git
>>> appends a spurious ^M to added lines in the `git diff` output. Files use
>>> the CRLF end-of-line convention (the native MS Windows one).
>>>
>>> $ git diff test.tex
>>> diff --git a/test.tex b/test.tex
>>> index 029646e..250ab16 100644
>>> --- a/test.tex
>>> +++ b/test.tex
>>> @@ -1,4 +1,4 @@
>>> -\documentclass{article}
>>> +\documentclass{mwart}^M
>>>
>>> \usepackage[cp1250]{inputenc}
>>> \usepackage{polski}
>>>
>>> What gives? Why is this ^M tacked on the end of added lines,
>>> while it is not present in deleted lines, nor in context lines?
>>>
>>> Puzzled.
>>>
>>> P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; pity
>>> that it doesn't support the `encoding` attribute in core, together
>>> with an `i18n.outputEncoding` setting.
>>
>> Is there a chance you could give us a recipe for reproducing it?
>> A complete test script, perhaps?
>> (I don't want to speculate whether the invocation of iconv is the problem,
>> where stdout is not in "binary mode", or whatever this is called under Windows)
>
> I'm sorry, I thought I had posted the whole recipe, but I missed some
> details in the above description of the case.
>
> First, files are stored on filesystem using CRLF eol (DOS end-of-line
> convention). Due to `core.autocrlf` they are converted to LF in blobs,
> that is in the index and in the repository.
>
> Second, a textconv filter that preserves end-of-line characters needs
> to be configured. I have used `iconv`, but I suspect that the problem
> would also happen with `cat`.
>
> In the .gitattributes file, or .git/info/attributes add, for example:
>
> *.tex text diff=myconv
>
> In the .git/config configure the textconv filter, for example:
>
> [diff "myconv"]
> textconv = iconv.exe -f cp1250 -t utf-8
>
> Create a file which filename matches the attribute line, and which
> uses CRLF end of line convention, and add it to Git (adding it to
> the index):
>
> $ printf "foo\r\n" >foo.tex
> $ git add foo.tex
>
> Modify file (also with CRLF):
>
> $ printf "bar\r\n" >foo.tex
>
> Check the difference:
>
> $ git diff foo.tex
>
> HTH
>
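
The steps quoted above can be replayed in one go. What follows is only a
sketch under assumptions that are not in the recipe: `cat` stands in for
iconv (as suggested above), core.autocrlf is assumed to be "true", and the
repro() name and the scratch directory are made up for illustration.

```shell
#!/bin/sh
# Sketch only: replays the quoted recipe in a scratch repository.
# Assumptions: `cat` replaces iconv as the textconv filter,
# core.autocrlf=true, and repro() is a hypothetical helper name.
repro () {
	dir=$(mktemp -d) && cd "$dir" &&
	git init -q . &&
	git config core.autocrlf true &&
	git config diff.myconv.textconv cat &&
	echo "*.tex text diff=myconv" >.gitattributes &&
	# Add a CRLF file to the index, then modify it (also with CRLF).
	printf "foo\r\n" >foo.tex &&
	git add foo.tex &&
	printf "bar\r\n" >foo.tex &&
	# Turn any carriage return in the diff output into a visible Q.
	git diff foo.tex | tr "\015" Q
}

# Run in a subshell so the caller's working directory is untouched.
if command -v git >/dev/null 2>&1
then
	(repro)
fi
```

A Q at the end of the "+" line would correspond to the ^M seen in the
original report.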
There seems to be a bug in Git when it comes to "git diff".
Before we feed the content of the working tree into the
diff machinery, a call to convert_to_git() should be made.
But it seems that something is missing there: in the test below,
the expected "+fox" becomes "+foxQ" (the Q is the CR made
visible by tr).

#!/bin/sh

test_description='CRLF with diff filter'

. ./test-lib.sh

test_expect_success 'setup' '
	git config core.autocrlf input &&
	printf "foo\r\n" >foo.tex &&
	git add foo.tex &&
	echo >.gitattributes &&
	git checkout -b master &&
	git add .gitattributes &&
	git commit -m "Add foo.tex" &&
	cat >.git/config <<-\EOF
	[diff "myconv"]
	textconv = sed -e "s/f/g/"
	EOF
'

test_expect_success 'check EOL in diff' '
	printf "fox\r\n" >foo.tex &&
	cat >expect <<-\EOF &&
	diff --git a/foo.tex b/foo.tex
	index 257cc56..88c2893 100644
	--- a/foo.tex
	+++ b/foo.tex
	@@ -1 +1 @@
	-foo
	+fox
	EOF
	git diff foo.tex | tr "\015" Q >actual &&
	test_cmp expect actual
'

test_done
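
To make the "+fox" versus "+foxQ" difference concrete without a
repository, here is a small stand-alone sketch. The helper names show_cr
and crlf_to_lf are hypothetical, and the sed call is only a rough
stand-in for the CRLF-to-LF normalization expected from convert_to_git(),
not Git's actual implementation.

```shell
#!/bin/sh
# Hypothetical helpers; neither is part of Git or its test suite.

# Make carriage returns visible as Q, like the tr call in the test.
show_cr () {
	tr "\015" Q
}

# Rough stand-in for the CRLF->LF normalization expected from
# convert_to_git(): drop a CR only when it ends a line.
crlf_to_lf () {
	sed -e "s/$(printf '\015')\$//"
}

# Raw working tree content: the CR survives and shows up as Q.
printf "fox\r\n" | show_cr

# Normalized content: what the diff machinery was expected to see.
printf "fox\r\n" | crlf_to_lf | show_cr
```

The first command prints "foxQ" and the second prints "fox", which
matches the "actual" and "expect" sides of the failing test.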