git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Adrián Gimeno Balaguer" <adrigibal@gmail.com>
To: sandals@crustytoothpaste.net, git@vger.kernel.org
Subject: Re: git-rebase is ignoring working-tree-encoding
Date: Sun, 4 Nov 2018 17:37:09 +0100	[thread overview]
Message-ID: <CADN+U_Nw5wCyK1SPRgsxzFbJ-KKnOV2Ub8YA3_a80SZYwKC5FQ@mail.gmail.com> (raw)
In-Reply-To: <20181104154744.GI731755@genre.crustytoothpaste.net>

El dom., 4 nov. 2018 a las 16:48, brian m. carlson
(<sandals@crustytoothpaste.net>) escribió:
> Do things work for you if you write this as "UTF-16LE"?  When you use
> working-tree-encoding, the file is stored internally as UTF-8, but it's
> serialized to the specified encoding when written out.

When I use UTF-16LE or UTF-16BE, then I can't commit or view diffs of
specified files, as Git prohibites BOM existance in these cases,
showing an error when attempting to commit. But BOM must also exist
for the project. I even experimented for fixing this issue within
Git's source. It turns out that Git is following an Unicode rule that
says that BOM is not permitted when declaring exact UTF-16BE/UTF-16LE
MIME (and UTF-32 variants) encoding types:

https://github.com/git/git/blob/master/utf8.h#L87

> Asking for "UTF-16" is ambiguous: there are two endiannesses, and so as
> long as you get a BOM in the output, either one is an acceptable option.
> Which one you get is dependent on what the underlying code thinks is the
> default, and traditionally for Unix systems and Unix tools that's been
> big-endian.  If you want a particular endianness, you should specify it.

I wrote a "counterpart" easy fix which instead only prohibites BOM for
the opposite endianness (for example if
working-tree-encoding=UTF-16LE, then finding an UTF-16BE BOM in the
file would cause Git to signal the error right before committing,
diffing, etc.). That way user would be encouraged to modify the file's
encoding to match the one specified in working-tree-encoding before
allowing these actions, therefore preventing Git from encoding to the
wrong endianness after file is written out. With few repository tests,
this new behaviour worked as expected. But then I realized this
solution would perhaps be unacceptable for Git's source code as it
would violate that Unicode standard. Anyways, here is a PR in my Git
fork with the changes I did, for reference:

https://github.com/AdRiAnIlloO/git/pull/1

Ah this point, the solution I came with recently for my project was
writing some code in Shell to fix the endianness of the re-encoded
files to UTF-16BE after the Git's write out process (or a "working
tree refresh" in my own words), within the same script that I use to
pack assets including the localization files.

> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



-- 
Adrián

  reply	other threads:[~2018-11-04 16:37 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-02  2:30 git-rebase is ignoring working-tree-encoding Adrián Gimeno Balaguer
2018-11-04 15:47 ` brian m. carlson
2018-11-04 16:37   ` Adrián Gimeno Balaguer [this message]
2018-11-04 18:38     ` brian m. carlson
2018-11-04 17:07 ` Torsten Bögershausen
2018-11-05  4:24   ` Adrián Gimeno Balaguer
2018-11-05 18:10     ` Torsten Bögershausen
2018-11-06 20:16       ` Torsten Bögershausen
2018-11-07  4:38         ` Adrián Gimeno Balaguer
2018-11-08 17:02           ` Torsten Bögershausen
2018-12-26  0:56             ` Alexandre Grigoriev
2018-12-26 19:25               ` brian m. carlson
2018-12-27  2:52                 ` Alexandre Grigoriev
2018-12-27 14:45                   ` Torsten Bögershausen
2018-12-23 14:46   ` Alexandre Grigoriev
2018-12-29 11:09 ` [PATCH/RFC v1 1/1] Support working-tree-encoding "UTF-16LE-BOM" tboegi
     [not found]   ` <CADN+U_OccLuLN7_0rjikDgLT+Zvt8hka-=xsnVVLJORjYzP78Q@mail.gmail.com>
2018-12-29 15:48     ` Adrián Gimeno Balaguer
2018-12-29 17:54       ` Philip Oakley
2019-01-20 16:43 ` [PATCH v2 " tboegi
2019-01-22 20:13   ` Junio C Hamano
2019-01-30 15:01 ` [PATCH v3 " tboegi
2019-01-30 15:24   ` Jason Pyeron
2019-01-30 17:49     ` Torsten Bögershausen
2019-03-06  5:23 ` [PATCH v1 1/1] gitattributes.txt: fix typo tboegi
2019-03-07  0:24   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADN+U_Nw5wCyK1SPRgsxzFbJ-KKnOV2Ub8YA3_a80SZYwKC5FQ@mail.gmail.com \
    --to=adrigibal@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).