git@vger.kernel.org mailing list mirror (one of many)
* help request: unable to merge UTF-16-LE "text" file
@ 2022-04-19 19:36 Kevin Long
  2022-04-19 21:50 ` brian m. carlson
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Kevin Long @ 2022-04-19 19:36 UTC
  To: git

Greetings,

I've been struggling to merge branches because of a UTF-16-LE (with BOM?) file.

Windows 11 / git version 2.35.3.windows.1

The problem file is a .sln file (Visual Studio "solution"). Edited in
both branches. It is a "text" file, but is encoded as such:

FacilityMaster.sln: Unicode text, UTF-16, little-endian text, with
CRLF line terminators

I have tried several "working-tree-encoding" settings in
.gitattributes in my local working directory, to no avail yet:

*.sln working-tree-encoding=UTF-16-LE eol=CRLF, results in:
error: failed to encode 'FacilityMaster.sln' from UTF-16-LE to UTF-8
warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)

*.sln working-tree-encoding=UTF-16 eol=CRLF, results in:
warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)

*.sln working-tree-encoding=UTF-16-LE-BOM eol=CRLF
error: failed to encode 'FacilityMaster.sln' from UTF-16-LE-BOM to UTF-8
warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)


Hoping for some suggestions. I've also tried to save the file as UTF-8
in both branches, commit, then merge, but still that did not work. I
just want to merge it like a normal source code file.


Thank you,

Kevin Long

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-19 19:36 help request: unable to merge UTF-16-LE "text" file Kevin Long
@ 2022-04-19 21:50 ` brian m. carlson
  2022-04-20 17:25 ` Erik Cervin Edin
  2022-04-22 18:42 ` Plato Kiorpelidis
  2 siblings, 0 replies; 10+ messages in thread
From: brian m. carlson @ 2022-04-19 21:50 UTC
  To: Kevin Long; +Cc: git

On 2022-04-19 at 19:36:19, Kevin Long wrote:
> Greetings,
> 
> I've been struggling to merge branches because of a UTF-16-LE (with BOM?) file.
> 
> Windows 11 / git version 2.35.3.windows.1
> 
> The problem file is a .sln file (Visual Studio "solution"). Edited in
> both branches. It is a "text" file, but is encoded as such:
> 
> FacilityMaster.sln: Unicode text, UTF-16, little-endian text, with
> CRLF line terminators

Git does not consider files using UTF-16 to be text because they contain
NUL bytes.  In some sense they do represent textual content, but Git
considers them to be binary.
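A minimal sketch of that check, assuming a POSIX shell with `iconv`, `head -c` and `od` available. It only approximates Git's heuristic, assumed here to be "a NUL byte somewhere in the first 8000 bytes"; the helper `looks_binary` and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: approximates Git's "is this binary?" test by looking for a
# NUL byte near the start of the file. Helper and file names are hypothetical.
cd "$(mktemp -d)" || exit 1

looks_binary() {
    # od -tx1 prints each byte as two hex digits; "00" is a NUL byte.
    head -c 8000 "$1" | od -An -tx1 | grep -qw 00
}

# The same line of text, once as UTF-8 and once as UTF-16LE.
printf 'Project\r\n' > utf8.txt
iconv -f UTF-8 -t UTF-16LE utf8.txt > utf16le.txt

for f in utf8.txt utf16le.txt; do
    if looks_binary "$f"; then
        echo "$f: binary (contains NUL bytes)"
    else
        echo "$f: text"
    fi
done
```

Every ASCII character encoded as UTF-16LE is followed by a 0x00 byte, so even a file containing nothing but plain English text trips the check.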

> I have tried several "working-tree-encoding" settings in
> .gitattributes in my local working directory, to no avail yet:
> 
> *.sln working-tree-encoding=UTF-16-LE eol=CRLF, results in:
> error: failed to encode 'FacilityMaster.sln' from UTF-16-LE to UTF-8
> warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)
> 
> *.sln working-tree-encoding=UTF-16 eol=CRLF, results in:
> warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)
> 
> *.sln working-tree-encoding=UTF-16-LE-BOM eol=CRLF
> error: failed to encode 'FacilityMaster.sln' from UTF-16-LE-BOM to UTF-8
> warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)

The proper encoding name you want here is "UTF-16LE-BOM", with no hyphen
between "16" and "LE".  Many Windows programs use a non-standard encoding
in which the text _must_ be little-endian _and_ start with a BOM.  (The
standard encoding UTF-16LE is always little-endian but omits the BOM;
UTF-16 may use either byte order, must start with a BOM if it is
little-endian, and may carry one in either case.)
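To make the naming concrete, here is a sketch that builds both layouts by hand, assuming `iconv` and `od` are available (the file names are hypothetical). The Windows-style "UTF-16LE-BOM" content is simply UTF-16LE preceded by the two bytes FF FE.

```shell
#!/bin/sh
# Sketch: build UTF-16LE and "UTF-16LE-BOM" versions of the same text.
cd "$(mktemp -d)" || exit 1

printf 'Project\r\n' > utf8.txt

# Standard UTF-16LE: little-endian, no BOM.
iconv -f UTF-8 -t UTF-16LE utf8.txt > le.txt

# Windows-style UTF-16LE-BOM: the same bytes with FF FE in front.
{ printf '\377\376'; cat le.txt; } > le-bom.txt

# Show the first four bytes of each file.
for f in le.txt le-bom.txt; do
    printf '%s:' "$f"
    head -c 4 "$f" | od -An -tx1
done
```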

That will result in the file being stored as UTF-8 in the repository and
converted to this non-standard Windows encoding on checkout.  However,
if you have already checked the file in without an appropriate
working-tree-encoding, you should run `git add --renormalize .` and then
commit.  You'll need to do that (or merge in a commit that does that) on
every branch you want to work with.
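Here is a minimal sketch of those steps in a throwaway repository. It assumes a Git build with `working-tree-encoding` support and `iconv` on the PATH; the repository, file name and identity are hypothetical. The script first commits a UTF-16LE file that has a BOM and no attributes, so Git stores the raw bytes as a binary blob. It then adds the attribute and renormalizes.

```shell
#!/bin/sh
# Sketch in a throwaway repository; names and identity are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"

# A UTF-16LE file with BOM and CRLF endings, as Visual Studio writes it.
{ printf '\377\376'; printf 'Project\r\nEndProject\r\n' | iconv -f UTF-8 -t UTF-16LE; } > Example.sln

# Committed without attributes, Git stores the raw UTF-16 bytes (binary).
git add Example.sln
git commit -q -m "Add solution as UTF-16"

# Declare the working-tree encoding (spelled UTF-16LE-BOM)...
printf '*.sln text working-tree-encoding=UTF-16LE-BOM eol=crlf\n' > .gitattributes
git add .gitattributes

# ...then re-run the clean conversion on tracked files so the index
# stores UTF-8 with LF endings, and commit the result.
git add --renormalize .
git commit -q -m "Store Example.sln as UTF-8"

# The stored blob is now plain UTF-8 text without NUL bytes.
git cat-file blob HEAD:Example.sln
```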

> Hoping for some suggestions. I've also tried to save the file as UTF-8
> in both branches, commit, then merge, but still that did not work. I
> just want to merge it like a normal source code file.

However, in order for the merge to work, both branches must have the
file checked in correctly.  That is, both master and the branch from
which you're merging need to have the file as UTF-8 in the repository.
If you make the working-tree-encoding changes above (or the switch to
UTF-8) on only one of those branches, then the other one will still have
the binary blob, and merging won't be possible.

If you can keep it as UTF-8, that's ideal.  It should definitely work if
both sides have UTF-8 files.  If you still see a message about binary
files, then it could well be that something didn't get saved properly as
UTF-8, or that these really aren't text files and that they contain
binary contents.
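One way to test for both failure modes before committing is sketched below, assuming `iconv` and `od` are available. The helper `check_utf8_text` and the file names are hypothetical. The check rejects a file if it contains NUL bytes, which is what makes Git treat it as binary. It also rejects a file that is not valid UTF-8.

```shell
#!/bin/sh
# Sketch: is this file NUL-free, valid UTF-8? (hypothetical helper)
check_utf8_text() {
    # A NUL byte is what makes Git treat the content as binary.
    if od -An -tx1 "$1" | grep -qw 00; then
        echo "$1: contains NUL bytes (UTF-16 or binary?)"
        return 1
    fi
    # Converting UTF-8 to UTF-8 fails on any invalid byte sequence.
    if ! iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        echo "$1: not valid UTF-8"
        return 1
    fi
    echo "$1: UTF-8 text"
}

cd "$(mktemp -d)" || exit 1
printf 'Project\r\n' > good.sln
printf 'Project\r\n' | iconv -f UTF-8 -t UTF-16LE > utf16.sln
printf 'caf\351\r\n' > latin1.sln    # Latin-1 e-acute (0xE9): invalid UTF-8

for f in good.sln utf16.sln latin1.sln; do
    check_utf8_text "$f" || echo "  -> re-save $f as UTF-8"
done
```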
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-19 19:36 help request: unable to merge UTF-16-LE "text" file Kevin Long
  2022-04-19 21:50 ` brian m. carlson
@ 2022-04-20 17:25 ` Erik Cervin Edin
  2022-04-20 17:38   ` Kevin Long
  2022-04-20 17:50   ` Junio C Hamano
  2022-04-22 18:42 ` Plato Kiorpelidis
  2 siblings, 2 replies; 10+ messages in thread
From: Erik Cervin Edin @ 2022-04-20 17:25 UTC
  To: Kevin Long; +Cc: git

On Wed, Apr 20, 2022 at 6:29 PM Kevin Long <kevinlong206@gmail.com> wrote:
>
> The problem file is a .sln file (Visual Studio "solution"). Edited in
> both branches. It is a "text" file, but is encoded as such:

Can you convert it to utf-8 in both branches and then merge?
  iconv.exe -f utf-16le -t utf-8 foo.sln > tmp.sln && mv -f tmp.sln foo.sln
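Here is a sketch of a variant of that conversion. When the file starts with a BOM, `-f UTF-16` lets iconv take the byte order from the BOM and drop it. By contrast, `-f utf-16le` treats the BOM as an ordinary character and typically keeps it as a UTF-8 BOM (EF BB BF). This assumes glibc or GNU libiconv behaviour and is worth checking on your platform; `Example.sln` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: convert a UTF-16 file with BOM to UTF-8 in place (hypothetical name).
set -e
cd "$(mktemp -d)"

# Stand-in for a Visual Studio file: BOM (FF FE) + UTF-16LE, CRLF endings.
{ printf '\377\376'; printf 'Project\r\n' | iconv -f UTF-8 -t UTF-16LE; } > Example.sln

# "UTF-16" takes the byte order from the BOM and drops the BOM itself.
iconv -f UTF-16 -t UTF-8 Example.sln > tmp.sln
mv -f tmp.sln Example.sln

# The file now starts with plain ASCII, with no BOM and no NUL bytes.
head -c 3 Example.sln | od -An -tx1
```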

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-20 17:25 ` Erik Cervin Edin
@ 2022-04-20 17:38   ` Kevin Long
  2022-04-20 17:50   ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Kevin Long @ 2022-04-20 17:38 UTC
  To: Erik Cervin Edin; +Cc: git

Thanks, I actually did try that without success. I'm wondering if
there is an "ancestor" object that is still binary or something, deep
rooted in the tree.

Converting it once and for all to utf-8 would be great, as that will
work fine with Visual Studio.  Not sure why the original committer
chose UTF-16.

On Wed, Apr 20, 2022 at 10:26 AM Erik Cervin Edin <erik@cervined.in> wrote:
>
> On Wed, Apr 20, 2022 at 6:29 PM Kevin Long <kevinlong206@gmail.com> wrote:
> >
> > The problem file is a .sln file (Visual Studio "solution"). Edited in
> > both branches. It is a "text" file, but is encoded as such:
>
> Can you convert it to utf-8 in both branches and then merge?
>   iconv.exe -f utf-16le -t utf-8 foo.sln > tmp.sln && mv -f tmp.sln foo.sln

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-20 17:25 ` Erik Cervin Edin
  2022-04-20 17:38   ` Kevin Long
@ 2022-04-20 17:50   ` Junio C Hamano
  2022-04-20 17:55     ` Erik Cervin Edin
  2022-04-24  1:57     ` Chris Torek
  1 sibling, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2022-04-20 17:50 UTC
  To: Erik Cervin Edin; +Cc: Kevin Long, git

Erik Cervin Edin <erik@cervined.in> writes:

> On Wed, Apr 20, 2022 at 6:29 PM Kevin Long <kevinlong206@gmail.com> wrote:
>>
>> The problem file is a .sln file (Visual Studio "solution"). Edited in
>> both branches. It is a "text" file, but is encoded as such:
>
> Can you convert it to utf-8 in both branches and then merge?
>   iconv.exe -f utf-16le -t utf-8 foo.sln > tmp.sln && mv -f tmp.sln foo.sln

For that to work, it is likely that you'd need to convert not just
the tips of two branches getting merged, but also the merge base
commit, so that all three trees involved in the 3-way merge are in
the same text encoding.
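Here is a sketch for checking that, assuming it runs inside the repository. It finds the merge base with `git merge-base`. For each of the three commits, it then reports whether the stored blob still contains NUL bytes, that is, whether Git still sees it as binary (e.g. UTF-16). The helper `show_encodings` and the branch name `feature` in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: for ours, theirs and their merge base, report whether the blob
# stored for PATH looks binary (contains NUL bytes). Hypothetical helper.
show_encodings() {
    ours=$1 theirs=$2 path=$3
    base=$(git merge-base "$ours" "$theirs") || return 1
    for rev in "$ours" "$theirs" "$base"; do
        if ! git cat-file -e "$rev:$path" 2>/dev/null; then
            echo "$rev:$path missing"
            continue
        fi
        if git cat-file blob "$rev:$path" | od -An -tx1 | grep -qw 00; then
            echo "$rev:$path binary (NUL bytes, e.g. UTF-16)"
        else
            echo "$rev:$path text"
        fi
    done
}
```

For example: `show_encodings master feature FacilityMaster.sln`. If the last line reports "binary" while the two tips report "text", the merge base is the culprit.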

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-20 17:50   ` Junio C Hamano
@ 2022-04-20 17:55     ` Erik Cervin Edin
  2022-04-20 18:03       ` Erik Cervin Edin
  2022-04-24  1:57     ` Chris Torek
  1 sibling, 1 reply; 10+ messages in thread
From: Erik Cervin Edin @ 2022-04-20 17:55 UTC
  To: Junio C Hamano; +Cc: Kevin Long, git

On Wed, Apr 20, 2022 at 7:50 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> For that to work, it is likely that you'd need to convert not just
> the tips of two branches getting merged, but also the merge base
> commit, so that all three trees involved in the 3-way merge are in
> the same text encoding.

Hmm... right, the common ancestor would still be in the old encoding :/

On Wed, Apr 20, 2022 at 7:38 PM Kevin Long <kevinlong206@gmail.com> wrote:
>
> Thanks, I actually did try that without success. I'm wondering if
> there is an "ancestor" object that is still binary or something, deep
> rooted in the tree.

Yes, as Junio correctly pointed out.

> Converting it once and for all to utf-8 would be great, as that will
> work fine with Visual Studio.  Not sure why the original committer
> chose UTF-16.

Visual Studio probably saved it as utf-16le with BOM :(

> I have tried several "working-tree-encoding" settings in
> .gitattributes in my local working directory, to no avail yet:
>
> *.sln working-tree-encoding=UTF-16-LE eol=CRLF, results in:
> error: failed to encode 'FacilityMaster.sln' from UTF-16-LE to UTF-8
> warning: Cannot merge binary files: FacilityMaster.sln (HEAD vs. master)

Could the issue be that it should be UTF-16LE (without hyphen)?

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-20 17:55     ` Erik Cervin Edin
@ 2022-04-20 18:03       ` Erik Cervin Edin
  0 siblings, 0 replies; 10+ messages in thread
From: Erik Cervin Edin @ 2022-04-20 18:03 UTC
  To: Junio C Hamano; +Cc: Kevin Long, git

On Wed, Apr 20, 2022 at 6:29 PM Kevin Long <kevinlong206@gmail.com> wrote:
>
> Hoping for some suggestions. I've also tried to save the file as UTF-8
> in both branches, commit, then merge, but still that did not work. I
> just want to merge it like a normal source code file.

Solution files tend to be of the "for machines only" variety.
In my experience, the changes are often just a bunch of csproj GUIDs
flipping back and forth.
Consider just taking one of the two versions and fixing up the solution
file by hand?
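If you go that way, here is a sketch of taking one side wholesale while the merge is stopped on the binary conflict. The helper `resolve_with_side` and the names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: while a merge is stopped on a conflicted binary file, take one
# side's version of it and stage it. Hypothetical helper.
resolve_with_side() {
    side=$1 path=$2    # side is "ours" or "theirs"
    case $side in
        ours|theirs) ;;
        *) echo "usage: resolve_with_side ours|theirs PATH" >&2; return 2 ;;
    esac
    # Replace the working-tree file with that side's stage from the index.
    git checkout "--$side" -- "$path" || return 1
    # Staging it marks the conflict as resolved; manual fix-ups can be
    # made and re-staged before committing.
    git add -- "$path"
}
```

For example, after `git merge feature` stops with "Cannot merge binary files", run `resolve_with_side theirs FacilityMaster.sln`. Then repair the solution file in Visual Studio, `git add` it again, and `git commit`.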

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-19 19:36 help request: unable to merge UTF-16-LE "text" file Kevin Long
  2022-04-19 21:50 ` brian m. carlson
  2022-04-20 17:25 ` Erik Cervin Edin
@ 2022-04-22 18:42 ` Plato Kiorpelidis
  2 siblings, 0 replies; 10+ messages in thread
From: Plato Kiorpelidis @ 2022-04-22 18:42 UTC
  To: Kevin Long; +Cc: git

On 22/04/19 12:36PM, Kevin Long wrote:
> Greetings,

Hey Kevin,

> [..]

It would also help if you created a minimal reproducible example[1] and uploaded it to a
popular Git hosting service like GitHub/GitLab, so we can try to perform the
merge. This way, you will probably get more precise directions.

[1]: https://en.wikipedia.org/wiki/Minimal_reproducible_example

Thanks,
Plato

* Re: help request: unable to merge UTF-16-LE "text" file
  2022-04-20 17:50   ` Junio C Hamano
  2022-04-20 17:55     ` Erik Cervin Edin
@ 2022-04-24  1:57     ` Chris Torek
  1 sibling, 0 replies; 10+ messages in thread
From: Chris Torek @ 2022-04-24  1:57 UTC
  To: Junio C Hamano; +Cc: Erik Cervin Edin, Kevin Long, Git List

On Fri, Apr 22, 2022 at 10:27 AM Junio C Hamano <gitster@pobox.com> wrote:
> For that to work, it is likely that you'd need to convert not just
> the tips of two branches getting merged, but also the merge base
> commit, so that all three trees involved in the 3-way merge are in
> the same text encoding.

The old merge-recursive strategy has `-X renormalize` (as in
`git merge -X renormalize`), which I believe would do this for you.
(I see code in merge-ort for this as well, but have no handy means to
test it myself.)

Chris

* Re: help request: unable to merge UTF-16-LE "text" file
@ 2024-01-12 20:34 Michael Litwak
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Litwak @ 2024-01-12 20:34 UTC
  To: kioplato@gmail.com; +Cc: git@vger.kernel.org, kevinlong206@gmail.com


This is in reply to the message at 
https://lore.kernel.org/all/20220422184211.5z67sxrgq2zm3tvd@compass/#t

Sent via MS Outlook - hope this reaches the online thread.

------------------------------------------------------------------------

On Windows, if you do

    git add

from an ordinary Command Prompt, it will fail to call iconv.exe to perform
the necessary text conversion.  That is, your UTF-16LE-with-BOM file will not be
properly converted by Git to UTF-8 for its internal storage, leading to
subsequent encoding errors.

So, in addition to setting the working-tree-encoding of the file to
UTF-16LE-BOM in .gitattributes before adding the file, be sure to run:

    git add

from a Git Bash console when adding such files.
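Whichever shell is used, you can check afterwards how the file actually landed in the index. The sketch below assumes the output format of `git ls-files --eol`, whose first column describes the indexed content. That column reads `i/-text` when Git treats the content as binary (e.g. still UTF-16), and `i/lf` or `i/crlf` when it was stored as text. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: print how Git stored PATH in the index (hypothetical helper).
index_eol_info() {
    # First column of `git ls-files --eol`: i/-text means the indexed
    # content is treated as binary; i/lf or i/crlf means it is text.
    git ls-files --eol -- "$1" | awk '{ print $1 }'
}
```

For example, `index_eol_info FacilityMaster.sln` should print `i/lf` (or `i/crlf`) once the conversion to UTF-8 has taken effect.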

- Michael


