git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Jakub Narębski" <jnareb@gmail.com>
To: "Torsten Bögershausen" <tboegi@web.de>, git@vger.kernel.org
Subject: Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows
Date: Fri, 31 Mar 2017 21:44:15 +0200	[thread overview]
Message-ID: <bbd60ab1-1309-6b1e-9b7f-09764bab5ccd@gmail.com> (raw)
In-Reply-To: <264c72d0-9558-fa0d-e5ee-eaca894538be@web.de>

W dniu 31.03.2017 o 14:38, Torsten Bögershausen pisze:
> On 30.03.17 21:35, Jakub Narębski wrote:
>> Hello,
>>
>> Recently I had to work on a project which uses legacy 8-bit encoding
>> (namely cp1250 encoding) instead of utf-8 for text files (LaTeX
>> documents).  My terminal, that is Git Bash from Git for Windows is set
>> up for utf-8.
>>
>> I wanted for "git diff" and friends to return something sane on said
>> utf-8 terminal, instead of mojibake.  There is 'encoding'
>> gitattribute... but it works only for GUI ('git gui', that is).
>>
>> Therefore I have (ab)used textconv facility to convert from cp1250 of
>> file encoding to utf-8 encoding of console.
>>
>> I have set the following in .gitattributes file:
>>
>>   ## LaTeX documents in cp1250 encoding
>>   *.tex text diff=mylatex
>>
>> The 'mylatex' driver is defined as:
>>
>>   [diff "mylatex"]
>>         xfuncname = "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
>>         wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
>>         textconv  = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t utf-8
>>         cachetextconv = true
>>
>> And everything would be all right... if not the fact that Git appends
>> spurious ^M to added lines in the `git diff` output.  Files use CRLF
>> end-of-line convention (the native MS Windows one).
>>
>>   $ git diff test.tex
>>   diff --git a/test.tex b/test.tex
>>   index 029646e..250ab16 100644
>>   --- a/test.tex
>>   +++ b/test.tex
>>   @@ -1,4 +1,4 @@
>>   -\documentclass{article}
>>   +\documentclass{mwart}^M
>>   
>>    \usepackage[cp1250]{inputenc}
>>    \usepackage{polski}
>>
>> What gives?  Why there is this ^M tacked on the end of added lines,
>> while it is not present in deleted lines, nor in content lines?
>>
>> Puzzled.
>>
>> P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; pity
>> that it doesn't supports in core `encoding` attribute together with
>> having `i18n.outputEncoding`.
>
> Is there a chance to give us a receipt how to reproduce it?
> A complete test script or ?
> (I don't want to speculate, if the invocation of iconv is the problem,
>  where stdout is not in "binary mode", or however this is called under Windows)

I'm sorry, I though I posted whole recipe, but I missed some details
in the above description of the case.

First, files are stored on filesystem using CRLF eol (DOS end-of-line
convention).  Due to `core.autocrlf` they are converted to LF in blobs,
that is in the index and in the repository.

Second, a textconv with filter preserving end-of-line needs to be
configured.  I have used `iconv`, but I suspect that the problem would
happen also for `cat`.

In the .gitattributes file, or .git/info/attributes add, for example:

  *.tex text diff=myconv

In the .git/config configure the textconv filter, for example:

  [diff "myconv"]
         textconv  = iconv.exe -f cp1250 -t utf-8

Create a file which filename matches the attribute line, and which
uses CRLF end of line convention, and add it to Git (adding it to
the index):

  $ printf "foo\r\n" >foo.tex
  $ git add foo.tex

Modify file (also with CRLF):

  $ printf "bar\r\n" >foo.tex

Check the difference

  $ git diff foo.tex

HTH
-- 
Jakub Narębski


  reply	other threads:[~2017-03-31 19:44 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-30 19:35 [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows Jakub Narębski
2017-03-30 20:00 ` Jeff King
2017-03-31 13:24   ` Jakub Narębski
2017-04-01  6:08     ` Jeff King
2017-04-01 18:31       ` Jakub Narębski
2017-04-02  7:45         ` Jeff King
2017-04-02 11:40           ` Jakub Narębski
2017-03-31 12:38 ` Torsten Bögershausen
2017-03-31 19:44   ` Jakub Narębski [this message]
2017-04-02  4:34     ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bbd60ab1-1309-6b1e-9b7f-09764bab5ccd@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).