From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Junio C Hamano <gitster@pobox.com>, Rich Felker <dalias@libc.org>,
git@vger.kernel.org, larsxschneider@gmail.com
Subject: Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)
Date: Fri, 8 Feb 2019 20:42:19 +0000
Message-ID: <20190208204219.GF11927@genre.crustytoothpaste.net>
In-Reply-To: <20190208202336.GA5284@alpha>
On Fri, Feb 08, 2019 at 09:23:36PM +0100, Kevin Daudt wrote:
> Firstly, the tests expect iconv -t UTF-16 to output a BOM, which it
> indeed does not do on Alpine. Secondly, git itself also expects the BOM
> to be present when the encoding is set to UTF-16, otherwise it will
> complain.
Yeah, we definitely want to require a BOM for UTF-16. As previously
mentioned, it isn't safe for us to assume big-endian when it's missing.
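To make the ambiguity concrete: the bytes 00 61 are U+0061 ("a") when read big-endian but U+6100 when read little-endian, so without a BOM there is nothing to decide from. A minimal sketch of a reader that only accepts UTF-16 whose byte order it can actually determine; utf16_detect_order and its enum are hypothetical names for illustration, not git's utf8.c API:

```c
#include <assert.h>
#include <stddef.h>

/* Byte order of a UTF-16 buffer, as far as its leading bytes reveal it. */
enum utf16_order {
	UTF16_ORDER_UNKNOWN, /* no BOM: refuse rather than guess */
	UTF16_ORDER_BE,      /* starts with FE FF */
	UTF16_ORDER_LE       /* starts with FF FE */
};

/*
 * Hypothetical helper: look for a byte order mark at the start of buf.
 * It deliberately does not fall back to big-endian when no BOM is
 * present; callers should treat UTF16_ORDER_UNKNOWN as an error.
 */
enum utf16_order utf16_detect_order(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (len < 2)
		return UTF16_ORDER_UNKNOWN;
	if (buf[0] == 0xFE && buf[1] == 0xFF)
		return UTF16_ORDER_BE;
	if (buf[0] == 0xFF && buf[1] == 0xFE)
		return UTF16_ORDER_LE;
	return UTF16_ORDER_UNKNOWN;
}
```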
> I tried changing the test to manually inject a BOM into the file (and
> setting iconv to UTF-16LE / UTF-16BE), which lets the first test go
> through, but test 3 then fails because git itself outputs the file
> without a BOM, presumably because it's passed through iconv.
>
> So I'm not sure if it's just a matter of fixing the tests.
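Whether a given iconv writes a BOM for plain "UTF-16" can be checked directly: "abc" is three BMP characters, so the output is 6 bytes without a BOM and 8 bytes with one. A minimal C sketch of such a probe, assuming a POSIX iconv (iconv_open/iconv/iconv_close); probe_utf16_output_len is a hypothetical name:

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical probe: convert "abc" from UTF-8 to plain "UTF-16" with the
 * system iconv and return the number of bytes written, or -1 on failure.
 * Three BMP characters need 6 bytes, so 8 means a 2-byte BOM was added;
 * in that case the BOM bytes are copied to bom (if bom is non-NULL).
 */
int probe_utf16_output_len(unsigned char bom[2])
{
	char in[] = "abc";
	char out[32];
	char *inp = in, *outp = out;
	size_t inleft = sizeof(in) - 1; /* exclude the terminating NUL */
	size_t outleft = sizeof(out);
	iconv_t cd = iconv_open("UTF-16", "UTF-8");
	int len;

	if (cd == (iconv_t)-1)
		return -1;
	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
		iconv_close(cd);
		return -1;
	}
	/* Flush any reset sequence; nothing is expected for UTF-16. */
	iconv(cd, NULL, NULL, &outp, &outleft);
	iconv_close(cd);

	assert(outleft <= sizeof(out));
	len = (int)(sizeof(out) - outleft);
	if (bom && len == 8) {
		bom[0] = (unsigned char)out[0];
		bom[1] = (unsigned char)out[1];
	}
	return len;
}
```

Per Kevin's report, this would return 6 on Alpine; an iconv that adds a BOM returns 8, with either FE FF or FF FE in bom.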
I think something like the following should work in this scenario:
------ %< ---------
From: "brian m. carlson" <sandals@crustytoothpaste.net>
Date: Fri, 8 Feb 2019 12:58:11 +0000
Subject: [PATCH] WIP: utf8: handle missing musl UTF-16 BOM
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
t/t0028-working-tree-encoding.sh | 20 ++++++++++++++++++--
utf8.c | 4 ++++
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index e58ecbfc44..ff02d03bad 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -6,6 +6,22 @@ test_description='working-tree-encoding conversion via gitattributes'
GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
+test_lazy_prereq NO_BOM '
+ count=$(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) &&
+ test $count = 6
+'
+
+write_utf16 () {
+ test_have_prereq NO_BOM && printf '\376\377'
+ iconv -f UTF-8 -t UTF-16
+
+}
+
+write_utf32 () {
+ test_have_prereq NO_BOM && printf '\0\0\376\377'
+ iconv -f UTF-8 -t UTF-32
+}
+
test_expect_success 'setup test files' '
git config core.eol lf &&
@@ -13,8 +29,8 @@ test_expect_success 'setup test files' '
echo "*.utf16 text working-tree-encoding=utf-16" >.gitattributes &&
echo "*.utf16lebom text working-tree-encoding=UTF-16LE-BOM" >>.gitattributes &&
printf "$text" >test.utf8.raw &&
- printf "$text" | iconv -f UTF-8 -t UTF-16 >test.utf16.raw &&
- printf "$text" | iconv -f UTF-8 -t UTF-32 >test.utf32.raw &&
+ printf "$text" | write_utf16 >test.utf16.raw &&
+ printf "$text" | write_utf32 >test.utf32.raw &&
printf "\377\376" >test.utf16lebom.raw &&
printf "$text" | iconv -f UTF-8 -t UTF-32LE >>test.utf16lebom.raw &&
diff --git a/utf8.c b/utf8.c
index 83824dc2f4..4aa69cd65b 100644
--- a/utf8.c
+++ b/utf8.c
@@ -568,6 +568,10 @@ char *reencode_string_len(const char *in, size_t insz,
bom_str = utf16_be_bom;
bom_len = sizeof(utf16_be_bom);
out_encoding = "UTF-16BE";
+ } else if (same_utf_encoding("UTF-16", out_encoding)) {
+ bom_str = utf16_le_bom;
+ bom_len = sizeof(utf16_le_bom);
+ out_encoding = "UTF-16LE";
}
conv = iconv_open(out_encoding, in_encoding);
------ %< ---------
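This passes for me on glibc, but only on a little-endian system: the utf8.c
hunk always writes UTF-16LE with a little-endian BOM, while glibc's iconv,
which produces the expected test files, writes UTF-16 in the machine's
native byte order. If this works for musl folks, then I'll add a config
option for people whose iconv writes UTF-16 without a BOM.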
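The little-endian caveat can be illustrated without iconv. The sketch below writes a BOM followed by text in an explicit byte order, which is the idea behind the utf8.c hunk's switch to UTF-16LE. Assuming glibc's iconv encodes plain "UTF-16" in host byte order with a matching BOM, its output equals the fixed little-endian bytes only when host_order() reports little-endian. All names here are hypothetical, and only BMP code units are handled (no surrogate pairs):

```c
#include <assert.h>
#include <stddef.h>

/* Explicit byte orders for the output (hypothetical enum). */
enum utf16_endian { UTF16_BE, UTF16_LE };

/* Store one 16-bit code unit at p in the given byte order. */
static void put_u16(unsigned char *p, unsigned int v, enum utf16_endian order)
{
	if (order == UTF16_BE) {
		p[0] = (unsigned char)(v >> 8);
		p[1] = (unsigned char)(v & 0xFF);
	} else {
		p[0] = (unsigned char)(v & 0xFF);
		p[1] = (unsigned char)(v >> 8);
	}
}

/*
 * Write U+FEFF as a BOM, then the BMP code units, all in the same order.
 * Returns the bytes written, or 0 if out cannot hold 2 + 2 * n bytes.
 */
size_t utf16_write_with_bom(const unsigned short *units, size_t n,
			    enum utf16_endian order,
			    unsigned char *out, size_t outsz)
{
	size_t i;

	assert(out != NULL && (units != NULL || n == 0));
	if (outsz < 2 + 2 * n)
		return 0;
	put_u16(out, 0xFEFF, order);
	for (i = 0; i < n; i++)
		put_u16(out + 2 + 2 * i, units[i], order);
	return 2 + 2 * n;
}

/* Byte order of the machine we are running on. */
enum utf16_endian host_order(void)
{
	const unsigned short one = 1;
	return *(const unsigned char *)&one == 1 ? UTF16_LE : UTF16_BE;
}
```

For "abc", UTF16_LE yields FF FE 61 00 62 00 63 00 and UTF16_BE yields FE FF 00 61 00 62 00 63; the two differ from the first byte, which is why a fixed choice matches a host-order encoder on only one kind of machine.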
--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204