git@vger.kernel.org mailing list mirror (one of many)
* Re: Script for handling UTF-16 files
       [not found] <608a349b-cc71-4cba-9197-3783049e9f47@googlegroups.com>
@ 2013-04-10 18:59 ` Karsten Blees
  2013-04-11 19:11   ` Ken Ismert
  0 siblings, 1 reply; 2+ messages in thread
From: Karsten Blees @ 2013-04-10 18:59 UTC
  To: Ken Ismert; +Cc: msysgit, git

On 10.04.2013 01:47, Ken Ismert wrote:
> 
> I bumped into the UTF-16 display problem with Git Extensions running on top of msysGit. After lots of searching and experimenting, I came up with a solution that works for me.
> 
> Note: Please see questions below.
> 
> This method is for MSysGit 1.8.1, and is tested on Windows XP. I use Git Extensions 2.44, but since the changes are at the Git level, they should work for Git Gui as well. Steps:

There has been a discussion about handling UTF-16 on the git ML a while back, see http://thread.gmane.org/gmane.comp.version-control.git/159708

As suggested there, I would try a clean/smudge filter, i.e. store UTF-16 files as UTF-8 in the repository and convert them back to UTF-16 on checkout. That way git can treat your UTF-16 files as text in most cases: you can merge them, git-grep works, and gitattributes work (eol-conversion, ident-replacement, built-in diff patterns...).
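
A minimal sketch of what such a filter could look like, in Python. The script name utf16filter.py, the filter name "utf16" and the *.txt pattern are all hypothetical, and the sketch assumes the work-tree files should be UTF-16LE with a BOM, as is usual on Windows:

```python
#!/usr/bin/env python3
"""Hypothetical clean/smudge filter: UTF-8 in the repository, UTF-16 in the work tree.

Assumed wiring (all names are illustrative):
  .gitattributes:  *.txt filter=utf16
  git config filter.utf16.clean  "python utf16filter.py clean"
  git config filter.utf16.smudge "python utf16filter.py smudge"
"""
import codecs
import sys


def clean(data: bytes) -> bytes:
    """Work tree -> repository: decode UTF-16, emit UTF-8.

    The "utf-16" codec uses the BOM to pick the byte order and strips it;
    without a BOM it assumes the platform's native byte order.
    """
    return data.decode("utf-16").encode("utf-8")


def smudge(data: bytes) -> bytes:
    """Repository -> work tree: decode UTF-8, emit UTF-16LE with a BOM."""
    return codecs.BOM_UTF16_LE + data.decode("utf-8").encode("utf-16-le")


# Git pipes the file content through stdin/stdout, so use the binary streams.
# Only act when called with exactly one mode argument.
if __name__ == "__main__" and sys.argv[1:] in (["clean"], ["smudge"]):
    convert = clean if sys.argv[1] == "clean" else smudge
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read()))
```

Both directions fail with a decoding error on input that is not valid UTF-16 or UTF-8, so the filter should only be attached to files that are known to be UTF-16 in the work tree.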

If you use a textconv filter, UTF-16 content will be treated as binary by most git operations.

There's also an 'encoding' attribute and a 'gui.encoding' setting which in theory should solve your issue (i.e. specify encoding of files for display by GUI tools). I don't know if Git Extensions supports that, or whether it's supposed to work for binary files at all.

> 3) Modify the global ~/Git/etc/gitconfig or your local ~/.git/config file, and add these lines:
> 
>     [diff "astextutf16"]
>         textconv = astextutf16

Why not simply "textconv = iconv -f utf-16 -t utf-8", without the extra script?

> c) I had success with iconv, but is there any built-in UTF-16 to UTF-8 converter that ships with msysGit?

There are ready-to-use UTF-conversion functions in the codebase, but these are not accessible as a git command or built-in filter.

> As a quick fix, how hard would it be to add a 'utf16' diff filter, similar to cpp or csharp? Or is this simply the wrong place to put in a work-around?

As described above, I think a diff filter is not the right tool for the job. The only universal format for text content that works reasonably well with established text-based technologies (merge algorithms, regex etc.) is UTF-8. If we want to benefit from these technologies, git should store text files as UTF-8 and convert from / to platform-specific formats on checkin / checkout or for display.

Bye,
Karsten

-- 
-- 
*** Please reply-to-all at all times ***
*** (do not pretend to know who is subscribed and who is not) ***
*** Please avoid top-posting. ***
The msysGit Wiki is here: https://github.com/msysgit/msysgit/wiki - Github accounts are free.

You received this message because you are subscribed to the Google
Groups "msysGit" group.
To post to this group, send email to msysgit@googlegroups.com
To unsubscribe from this group, send email to
msysgit+unsubscribe@googlegroups.com
For more options, and view previous threads, visit this group at
http://groups.google.com/group/msysgit?hl=en_US?hl=en

--- 
You received this message because you are subscribed to the Google Groups "msysGit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to msysgit+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

* Re: Script for handling UTF-16 files
  2013-04-10 18:59 ` Script for handling UTF-16 files Karsten Blees
@ 2013-04-11 19:11   ` Ken Ismert
  0 siblings, 0 replies; 2+ messages in thread
From: Ken Ismert @ 2013-04-11 19:11 UTC
  To: msysgit; +Cc: Ken Ismert, git


Karsten, 

> There's also an 'encoding' attribute and a 'gui.encoding' setting which in
> theory should solve your issue (i.e. specify encoding of files for display
> by GUI tools). I don't know if Git Extensions supports that, or whether it's
> supposed to work for binary files at all.
>

I played around with this, and neither 'encoding' nor 'gui.encoding' solves 
my problem in Git Extensions or Git Gui.

According to the man page, I would put this entry in my .gitattributes:
*.txt encoding=utf-8

This sounds rather tautological -- "Why yes, I do want to display my UTF-16 
file as text!"

More to the point, since Git *doesn't know* the original encoding of a 
'non-standard' Unicode file, how can it possibly know how to convert it to 
UTF-8?

So, 'encoding' and 'gui.encoding' seem to be poorly conceived, and 
certainly aren't helpful for the Windows UTF-16 problem.

-Ken




