git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Junio C Hamano <gitster@pobox.com>, Rich Felker <dalias@libc.org>,
	git@vger.kernel.org, larsxschneider@gmail.com
Subject: Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)
Date: Fri, 8 Feb 2019 20:42:19 +0000	[thread overview]
Message-ID: <20190208204219.GF11927@genre.crustytoothpaste.net> (raw)
In-Reply-To: <20190208202336.GA5284@alpha>

[-- Attachment #1: Type: text/plain, Size: 3398 bytes --]

On Fri, Feb 08, 2019 at 09:23:36PM +0100, Kevin Daudt wrote:
> Firstly, the tests expect iconv -t UTF-16 to output a BOM, which it
> indeed does not do on Alpine. Secondly, git itself also expects the BOM
> to be present when the encoding is set to UTF-16, otherwise it will
> complain.

Yeah, we definitely want to require a BOM for UTF-16. As previously
mentioned, it isn't safe for us to assume big-endian when it's missing.

> I tried change the test to manually inject a BOM to the file (and
> setting iconv to UTF-16LE / UTF16-BE, which lets the first test go
> through, but test 3 then fails, because git itself output the file
> without BOM, presumably because it's passed through iconv.
> 
> So I'm not sure if it's a matter of just fixing the tests.

I think something like the following will likely work in this scenario:

------ %< ---------
From: "brian m. carlson" <sandals@crustytoothpaste.net>
Date: Fri, 8 Feb 2019 12:58:11 +0000
Subject: [PATCH] WIP: utf8: handle missing musl UTF-16 BOM

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t0028-working-tree-encoding.sh | 20 ++++++++++++++++++--
 utf8.c                           |  4 ++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index e58ecbfc44..ff02d03bad 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -6,6 +6,22 @@ test_description='working-tree-encoding conversion via gitattributes'
 
 GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
 
+test_lazy_prereq NO_BOM '
+	printf abc | iconv -f UTF-8 -t UTF-16 &&
+	test $(wc -c) = 6
+'
+
+write_utf16 () {
+	test_have_prereq NO_BOM && printf '\xfe\xff'
+	iconv -f UTF-8 -t UTF-16
+
+}
+
+write_utf32 () {
+	test_have_prereq NO_BOM && printf '\x00\x00\xfe\xff'
+	iconv -f UTF-8 -t UTF-32
+}
+
 test_expect_success 'setup test files' '
 	git config core.eol lf &&
 
@@ -13,8 +29,8 @@ test_expect_success 'setup test files' '
 	echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes &&
 	echo "*.utf16lebom text working-tree-encoding=UTF-16LE-BOM" >>.gitattributes &&
 	printf "$text" >test.utf8.raw &&
-	printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw &&
-	printf "$text" | iconv -f UTF-8 -t UTF-32 >test.utf32.raw &&
+	printf "$text" | write_utf16 >test.utf16.raw &&
+	printf "$text" | write_utf32 >test.utf32.raw &&
 	printf "\377\376"                         >test.utf16lebom.raw &&
 	printf "$text" | iconv -f UTF-8 -t UTF-32LE >>test.utf16lebom.raw &&
 
diff --git a/utf8.c b/utf8.c
index 83824dc2f4..4aa69cd65b 100644
--- a/utf8.c
+++ b/utf8.c
@@ -568,6 +568,10 @@ char *reencode_string_len(const char *in, size_t insz,
 		bom_str = utf16_be_bom;
 		bom_len = sizeof(utf16_be_bom);
 		out_encoding = "UTF-16BE";
+	} else if (same_utf_encoding("UTF-16", out_encoding)) {
+		bom_str = utf16_le_bom;
+		bom_len = sizeof(utf16_le_bom);
+		out_encoding = "UTF-16LE";
 	}
 
 	conv = iconv_open(out_encoding, in_encoding);
------ %< ---------

This passes for me on glibc, but only on a little-endian system. If this
works for musl folks, then I'll add a config option for those people who
have UTF-16 without BOM.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

  reply	other threads:[~2019-02-08 20:42 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-07 21:59 t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Kevin Daudt
2019-02-08  0:17 ` brian m. carlson
2019-02-08  6:04   ` Rich Felker
2019-02-08 11:45     ` brian m. carlson
2019-02-08 11:55       ` Kevin Daudt
2019-02-08 13:51         ` brian m. carlson
2019-02-08 17:50           ` Junio C Hamano
2019-02-08 20:23             ` Kevin Daudt
2019-02-08 20:42               ` brian m. carlson [this message]
2019-02-08 23:12                 ` Junio C Hamano
2019-02-09  0:24                   ` brian m. carlson
2019-02-09 14:57                 ` Kevin Daudt
2019-02-09 20:08                   ` [PATCH] utf8: handle systems that don't write BOM for UTF-16 brian m. carlson
2019-02-10  1:45                     ` Eric Sunshine
2019-02-10 18:14                       ` brian m. carlson
2019-02-10  8:04                     ` Torsten Bögershausen
2019-02-10 18:55                       ` brian m. carlson
2019-02-11 17:14                         ` Junio C Hamano
2019-02-11  0:23                     ` [PATCH v2] " brian m. carlson
2019-02-11  1:16                       ` Eric Sunshine
2019-02-11  1:20                         ` brian m. carlson
2019-02-11  1:26                     ` [PATCH v3] " brian m. carlson
2019-02-11 21:43                       ` Kevin Daudt
2019-02-11 23:58                         ` brian m. carlson
2019-02-12  0:31                           ` Junio C Hamano
2019-02-12  0:53                             ` brian m. carlson
2019-02-12  2:43                               ` Junio C Hamano
2019-02-12  0:52                     ` [PATCH v4] " brian m. carlson
2019-02-08 16:13         ` t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Rich Felker
2019-02-09  8:09     ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190208204219.GF11927@genre.crustytoothpaste.net \
    --to=sandals@crustytoothpaste.net \
    --cc=dalias@libc.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).