* Handling text files encoded in little-endian UTF-16 with BOM
From: Mateusz Loskot @ 2019-07-05 11:35 UTC
To: git
Hi,
Using git version 2.22.0.windows.1
I have a repository with a number of .txt files encoded in
little-endian UTF-16 with BOM.
What are the best practices and the recommended configuration for
managing such files with Git, to avoid unexpected re-encoding to
UTF-8 or other encodings?
Currently, there is a .gitattributes file with entries like
resource/*.txt working-tree-encoding=UTF-16LE-BOM -text
Despite that, some team members have noticed that the files
occasionally get re-encoded to UTF-8. It is unknown which actual
steps lead to that. BTW, there are a few Git clients in use:
git in Git Bash, VSCode, and Fork.
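Since the exact steps are unknown, one hypothetical way to at least spot affected files is to list which `resource/*.txt` files no longer start with a UTF-16LE byte order mark (FF FE). The Python sketch below does only that; the `classify` and `scan` names and the default glob are assumptions chosen to match the `.gitattributes` pattern above. A missing BOM suggests, but does not prove, that a file was rewritten in another encoding.

```python
import codecs
from pathlib import Path


def classify(head: bytes) -> str:
    """Name the byte order mark (if any) found at the start of a file."""
    # Compare against the known BOM byte sequences from the codecs module.
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE with BOM"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE with BOM"
    return "no BOM"


def scan(root: str, pattern: str = "resource/*.txt") -> dict:
    """Map each regular file matching `pattern` under `root` to its BOM kind."""
    results = {}
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            results[str(path)] = classify(f.read(4))
    return results


if __name__ == "__main__":
    # Hypothetical usage from the repository root: report only suspicious files.
    for name, kind in scan(".").items():
        if kind != "UTF-16LE with BOM":
            print(f"{name}: {kind}")
```

Running it before and after an operation in each client (a checkout in Git Bash, a save in VSCode, a commit in Fork) could help narrow down which step drops the BOM.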
What bothers me in the .gitattributes is this `-text` attribute.
Is the use of `working-tree-encoding` and `-text` together a
valid combination at all?
The documentation at https://git-scm.com/docs/gitattributes
does not seem to touch on that.
I'd appreciate any suggestions on handling those UTF-16LE-BOM files.
Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
* Re: Handling text files encoded in little-endian UTF-16 with BOM
From: Torsten Bögershausen @ 2019-07-05 16:25 UTC
To: Mateusz Loskot; +Cc: git
On Fri, Jul 05, 2019 at 01:35:13PM +0200, Mateusz Loskot wrote:
> Hi,
>
> Using git version 2.22.0.windows.1
>
> I have a repository with a number of .txt files encoded in
> little-endian UTF-16 with BOM.
>
> What are the best practices and the recommended configuration for
> managing such files with Git, to avoid unexpected re-encoding to
> UTF-8 or other encodings?
>
> Currently, there is a .gitattributes file with entries like
>
> resource/*.txt working-tree-encoding=UTF-16LE-BOM -text
>
> Despite that, some team members have noticed that the files
> occasionally get re-encoded to UTF-8. It is unknown which actual
> steps lead to that. BTW, there are a few Git clients in use:
> git in Git Bash, VSCode, and Fork.
If possible, I don't want to comment on this kind of
"sometimes something happens on someone's computer" thing.
A little bit more information could be helpful.
>
> What bothers me in the .gitattributes is this `-text` attribute.
>
> Is the use of `working-tree-encoding` and `-text` together a
> valid combination at all?
Yes, it means that the content is re-encoded between the repo, where Git stores it as UTF-8,
and the working tree (that is what you want).
And the "-text" means "leave the line endings (LF or CRLF) as they are, don't convert them".
In that sense you can call it a legal combination, but maybe not a recommended one.
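A rough Python model of the `-text` case may make this concrete: with `working-tree-encoding=UTF-16LE-BOM -text`, check-in re-encodes the bytes from UTF-16LE-with-BOM to UTF-8 and passes CR/LF through untouched, and check-out reverses only the encoding. This is a simplified illustration under those assumptions, not Git's implementation; the function names are hypothetical.

```python
import codecs

# UTF-16 little-endian byte order mark: b"\xff\xfe"
BOM_LE = codecs.BOM_UTF16_LE


def checkin_minus_text(worktree: bytes) -> bytes:
    """Model of check-in with working-tree-encoding=UTF-16LE-BOM -text:
    strip the BOM, decode UTF-16LE, store as UTF-8; CR/LF left as they are."""
    if not worktree.startswith(BOM_LE):
        raise ValueError("working-tree file does not start with a UTF-16LE BOM")
    text = worktree[len(BOM_LE):].decode("utf-16-le")
    return text.encode("utf-8")  # no line-ending conversion at all


def checkout_minus_text(blob: bytes) -> bytes:
    """Reverse direction: UTF-8 blob back to UTF-16LE with BOM, EOLs as stored."""
    return BOM_LE + blob.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    original = BOM_LE + "key=value\r\nother=1\r\n".encode("utf-16-le")
    blob = checkin_minus_text(original)
    print(blob)                                   # b'key=value\r\nother=1\r\n'
    print(checkout_minus_text(blob) == original)  # True
```

Note that the CRLF pairs end up in the repository verbatim, so whatever line endings a contributor's editor writes are what gets committed.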
>
> The documentation at https://git-scm.com/docs/gitattributes
> does not seem to touch on that.
>
> I'd appreciate any suggestions on handling those UTF-16LE-BOM files.
>
My suggestion would be to use the "text" attribute:
resource/*.txt working-tree-encoding=UTF-16LE-BOM text
And depending on your application: do the resource files need a specific line ending?
Then use either
resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=lf
or
resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=crlf
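The `text` variant can be modelled the same way. The Python below is a simplified sketch, not Git's code: the function names are hypothetical, the `eol` argument takes "lf" or "crlf" to mirror the attribute, and it assumes the repository copy holds LF line endings, which is what the `text` attribute arranges on check-in. In this model, `eol=crlf` expands only lone LFs on checkout, and `eol=lf` writes the line endings as stored.

```python
import codecs
import re

BOM_LE = codecs.BOM_UTF16_LE  # b"\xff\xfe"


def checkin_text(worktree: bytes) -> bytes:
    """Model of check-in with working-tree-encoding=UTF-16LE-BOM text:
    drop the BOM, decode UTF-16LE, normalize CRLF to LF, store as UTF-8."""
    if not worktree.startswith(BOM_LE):
        raise ValueError("working-tree file does not start with a UTF-16LE BOM")
    text = worktree[len(BOM_LE):].decode("utf-16-le")
    return text.replace("\r\n", "\n").encode("utf-8")


def checkout_text(blob: bytes, eol: str) -> bytes:
    """Model of check-out with text eol=lf or text eol=crlf;
    the UTF-8 blob is expected to hold LF line endings."""
    text = blob.decode("utf-8")
    if eol == "crlf":
        # Expand only lone LFs; pairs that are already CRLF are left alone.
        text = re.sub(r"(?<!\r)\n", "\r\n", text)
    elif eol != "lf":
        raise ValueError("eol must be 'lf' or 'crlf'")
    # Re-encode for the working tree: BOM first, then UTF-16LE code units.
    return BOM_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    blob = b"key=value\nother=1\n"          # what the repository stores
    worktree = checkout_text(blob, "crlf")  # what the editor sees
    print(worktree[:2])                     # b'\xff\xfe'
    print(checkin_text(worktree) == blob)   # True
```

Compared with `-text`, the repository copy no longer depends on which editor or client wrote the file, since CRLF is normalized away on every check-in.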
I hope that helps a little bit.
> Best regards,
> --
> Mateusz Loskot, http://mateusz.loskot.net
* Re: Handling text files encoded in little-endian UTF-16 with BOM
From: Mateusz Loskot @ 2019-07-05 18:52 UTC
To: git
On Fri, 5 Jul 2019 at 18:25, Torsten Bögershausen <tboegi@web.de> wrote:
> On Fri, Jul 05, 2019 at 01:35:13PM +0200, Mateusz Loskot wrote:
> >
> > Using git version 2.22.0.windows.1
> >
> > I have a repository with a number of .txt files encoded in
> > little-endian UTF-16 with BOM.
> >
> > What are the best practices and the recommended configuration for
> > managing such files with Git, to avoid unexpected re-encoding to
> > UTF-8 or other encodings?
> >
> > Currently, there is a .gitattributes file with entries like
> >
> > resource/*.txt working-tree-encoding=UTF-16LE-BOM -text
> >
> > Despite that, some team members have noticed that the files
> > occasionally get re-encoded to UTF-8. It is unknown which actual
> > steps lead to that. BTW, there are a few Git clients in use:
> > git in Git Bash, VSCode, and Fork.
>
> If possible, I don't want to comment on this kind of
> "sometimes something happens on someone's computer" thing.
Perfectly understood.
> A little bit more information could be helpful.
If there were more, I'd have provided it.
> > What bothers me in the .gitattributes is this `-text` attribute.
> >
> > Is the use of `working-tree-encoding` and `-text` together a
> > valid combination at all?
>
> Yes, it means that the content is re-encoded between the repo, where Git stores it as UTF-8,
> and the working tree (that is what you want).
> And the "-text" means "leave the line endings (LF or CRLF) as they are, don't convert them".
That's quite a useful insight. I had understood "-text" to mean that the content is
not text, but binary.
> In that sense you can call it a legal combination, but maybe not a recommended one.
Right.
> > The documentation at https://git-scm.com/docs/gitattributes
> > does not seem to touch on that.
> >
> > I'd appreciate any suggestions on handling those UTF-16LE-BOM files.
> >
>
> My suggestion would be to use the "text" attribute:
> resource/*.txt working-tree-encoding=UTF-16LE-BOM text
>
> And depending on your application: do the resource files need a specific line ending?
I need CRLF.
> Then use either
> resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=lf
> or
> resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=crlf
This is very helpful. Thanks!
--
Mateusz Loskot, http://mateusz.loskot.net