git@vger.kernel.org mailing list mirror (one of many)
From: Rich Felker <dalias@libc.org>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Kevin Daudt <git@lists.ikke.info>,
	git@vger.kernel.org, larsxschneider@gmail.com
Subject: Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)
Date: Fri, 8 Feb 2019 01:04:03 -0500
Message-ID: <20190208060403.GA29788@brightrain.aerifal.cx>
In-Reply-To: <20190208001705.GC11927@genre.crustytoothpaste.net>

On Fri, Feb 08, 2019 at 12:17:05AM +0000, brian m. carlson wrote:
> [Please skip using Reply-To and instead use Mail-Followup-To so that
> responses also go to the list.]
> 
> On Thu, Feb 07, 2019 at 10:59:35PM +0100, Kevin Daudt wrote:
> > I'm trying to get the git test suite passing on Alpine Linux, which is
> > based on musl libc.
> > 
> > All tests in t0028-working-tree-encoding.sh are currently failing,
> > because musl iconv does not support stateful output of UTF-16/32 (e.g.,
> > it does not output a BOM), while git is expecting that to be present:
> > 
> > > hint: The file 'test.utf16' is missing a byte order mark (BOM). Please
> > > use UTF-16BE or UTF-16LE (depending on the byte order) as
> > > working-tree-encoding.
> > > fatal: BOM is required in 'test.utf16' if encoded as utf-16
> > 
> > Because adding the file to git fails, all the other tests fail as well
> > as they expect the file to be present in the repository.
> > 
> > Any idea how to get around this?
> 
> I think musl needs to patch their libc. RFC 2781 says that if there's no
> BOM in UTF-16, then "the text SHOULD be interpreted as being
> big-endian."
> 
> Unfortunately for all of us, many Windows-based programs have chosen to
> ignore that advice (technically, it's only a SHOULD) and interpret it as
> little-endian instead. Git can't safely assume anything about the
> endianness of a UTF-16 stream that doesn't contain a BOM. Technically,
> since the RFC doesn't specify a MUST requirement, musl can't, either.
> 
> Even if Git were to produce a BOM to work around this issue, then we'd
> still have the problem that any program using musl will write data in
> UTF-16 without a BOM. Moreover, because musl, in violation of the RFC,
> doesn't read and process BOMs, someone using little-endian UTF-16 (with
> a proper BOM) with musl and Git will have their data corrupted,
> according to my reading of the musl website.

That information is outdated and someone from our side should update
it; since 1.1.19, musl treats "UTF-16" input as having ambiguous
endianness, determined by the BOM and defaulting to big-endian if
there is no BOM. Output, however, is always big-endian, so that
processes conforming to the Unicode SHOULD clause will interpret it
correctly.
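
To illustrate the input and output rules described above, here is a
minimal Python sketch. It is not musl's code and does not call iconv;
the function names decode_utf16 and encode_utf16_be_no_bom are
hypothetical, and Python's explicit-endianness codecs stand in for the
conversion logic.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode "UTF-16" input: honor a BOM if present, else assume big-endian.

    Hypothetical helper mirroring the behavior described for musl >= 1.1.19.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # Little-endian BOM (FF FE): strip it and decode as UTF-16LE.
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        # Big-endian BOM (FE FF): strip it and decode as UTF-16BE.
        return data[2:].decode("utf-16-be")
    # No BOM: fall back to big-endian, per the SHOULD clause.
    return data.decode("utf-16-be")


def encode_utf16_be_no_bom(text: str) -> bytes:
    """Encode as big-endian UTF-16 without emitting a BOM.

    Hypothetical helper mirroring the described "UTF-16" output behavior.
    """
    return text.encode("utf-16-be")
```

A reader that follows the SHOULD clause (big-endian when no BOM is
present) recovers the original text from encode_utf16_be_no_bom output;
a reader that assumes little-endian does not.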

The portable way to get little-endian output with a BOM is to open a
conversion descriptor for "UTF-16LE" (which should not add any BOM)
and write a BOM manually.
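
The same idea can be sketched in Python, assuming its "utf-16-le"
codec plays the role of a "UTF-16LE" conversion descriptor (it emits no
BOM of its own). The function name encode_utf16le_with_bom is
hypothetical; a C program would do the equivalent with iconv_open and
a two-byte write before the converted data.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Return little-endian UTF-16 bytes preceded by a manually written BOM.

    Hypothetical helper: the explicit-endianness codec adds no BOM,
    so the BOM bytes (FF FE) are prepended by hand.
    """
    body = text.encode("utf-16-le")      # no BOM is produced here
    return codecs.BOM_UTF16_LE + body    # write the BOM manually
```

Because the byte order is named explicitly and the BOM is written by
the caller, the result does not depend on what a given "UTF-16"
converter chooses to emit.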

In any case, this test seems mainly relevant to Windows users wanting
to store source files in UTF-16LE with BOM. This doesn't really make
sense to do on a Linux/musl system, so I'm not sure any action is
needed here from either side.

Rich

Thread overview: 30+ messages
2019-02-07 21:59 t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Kevin Daudt
2019-02-08  0:17 ` brian m. carlson
2019-02-08  6:04   ` Rich Felker [this message]
2019-02-08 11:45     ` brian m. carlson
2019-02-08 11:55       ` Kevin Daudt
2019-02-08 13:51         ` brian m. carlson
2019-02-08 17:50           ` Junio C Hamano
2019-02-08 20:23             ` Kevin Daudt
2019-02-08 20:42               ` brian m. carlson
2019-02-08 23:12                 ` Junio C Hamano
2019-02-09  0:24                   ` brian m. carlson
2019-02-09 14:57                 ` Kevin Daudt
2019-02-09 20:08                   ` [PATCH] utf8: handle systems that don't write BOM for UTF-16 brian m. carlson
2019-02-10  1:45                     ` Eric Sunshine
2019-02-10 18:14                       ` brian m. carlson
2019-02-10  8:04                     ` Torsten Bögershausen
2019-02-10 18:55                       ` brian m. carlson
2019-02-11 17:14                         ` Junio C Hamano
2019-02-11  0:23                     ` [PATCH v2] " brian m. carlson
2019-02-11  1:16                       ` Eric Sunshine
2019-02-11  1:20                         ` brian m. carlson
2019-02-11  1:26                     ` [PATCH v3] " brian m. carlson
2019-02-11 21:43                       ` Kevin Daudt
2019-02-11 23:58                         ` brian m. carlson
2019-02-12  0:31                           ` Junio C Hamano
2019-02-12  0:53                             ` brian m. carlson
2019-02-12  2:43                               ` Junio C Hamano
2019-02-12  0:52                     ` [PATCH v4] " brian m. carlson
2019-02-08 16:13         ` t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Rich Felker
2019-02-09  8:09     ` Torsten Bögershausen
