From: Tim Ruehsen <tim.ruehsen@gmx.de>
To: Bruno Haible <bruno@clisp.org>, bug-gnulib@gnu.org
Subject: Re: u8_strconv_to_locale() misbehaves on OSX (Travis CI runner)
Date: Thu, 08 Feb 2018 20:22:02 +0100
Message-ID: <1518117722.2515.13.camel@gmx.de>
In-Reply-To: <2057454.6PEsBuzmPS@omega>
Hi Bruno,
thanks for your answer. After thinking about the ambiguous output of
locale_charset(), this might be an explanation:
Libidn2 (and the tests) use the libunistring installed from Homebrew, while
my direct call to locale_charset() goes to the gnulib copy.
So my build correctly says UTF-8, but the Homebrew libunistring has
been built on some unknown (OSX ?) system with its own version of
locale_charset(), which returns ASCII. I said I get '?' for characters > 255,
but I didn't verify that. It may well be characters > 127.
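To tell the two cases apart without gdb, something like the following could
be run in the CI job. This is only a sketch: translit_to() is a made-up helper
name, it calls iconv directly instead of going through libunistring, and it
assumes the installed iconv accepts the //TRANSLIT suffix.

```c
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of gnulib or libunistring):
   convert the UTF-8 string `utf8` to `charset` with //TRANSLIT appended,
   storing a NUL-terminated result in `out`.
   Returns 0 on success, -1 on any failure. */
static int translit_to(const char *charset, const char *utf8,
                       char *out, size_t outsize)
{
    char to[64];
    if (outsize == 0)
        return -1;
    snprintf(to, sizeof to, "%s//TRANSLIT", charset);

    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;

    char *in = (char *) utf8;       /* iconv() wants a non-const pointer */
    size_t inleft = strlen(utf8);
    char *dst = out;
    size_t outleft = outsize - 1;   /* keep room for the trailing NUL */

    size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
    if (rc != (size_t) -1)
        rc = iconv(cd, NULL, NULL, &dst, &outleft);   /* flush state */
    iconv_close(cd);
    if (rc == (size_t) -1)
        return -1;

    *dst = '\0';
    return 0;
}
```

Calling it on "bücher.de" and on a character above U+00FF, once with "ASCII"
and once with "ISO-8859-1", should show which target charset reproduces the
output I see. What the replacement looks like depends on the iconv
implementation, so the result is only comparable on the same machine.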
The bad thing is, I only experience this on a Travis CI build and so
can't use gdb for single stepping.
But an option is to build libunistring from sources in the CI and
link/test with that.
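Another cheap check that needs no debugger is to print what the C library
itself reports inside the CI job. The sketch below uses setlocale() and
nl_langinfo(CODESET). The helper names are made up, and whether the Homebrew
locale_charset() post-processes that value differently is an assumption I
have not checked.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <strings.h>

/* Hypothetical helper: does this codeset name denote UTF-8? */
static int is_utf8_codeset(const char *name)
{
    return name != NULL
           && (strcasecmp(name, "UTF-8") == 0
               || strcasecmp(name, "UTF8") == 0);
}

/* Hypothetical helper: print the locale and codeset the C library
   reports for the current environment (LANG, LC_ALL, ...). */
static void report_codeset(void)
{
    const char *loc = setlocale(LC_ALL, "");
    const char *cs = nl_langinfo(CODESET);

    printf("locale=%s codeset=%s utf8=%d\n",
           loc != NULL ? loc : "(setlocale failed)",
           cs, is_utf8_codeset(cs));
}
```

Calling report_codeset() at the start of a test program would at least show
whether the CI environment itself is set up for UTF-8.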
Regards, Tim
On Thursday, 08.02.2018, at 18:05 +0100, Bruno Haible wrote:
> Hi Tim,
>
> > locale_charset() returns with "UTF-8".
>
> That is as it should be on Mac OS X.
>
> > u8_strconv_to_locale() and u8_strconv_from_locale() seem not to work as
> > expected:
> >
> >
> > One problem seems to be that u8_strconv_to_locale() outputs decomposed
> > characters, e.g. u8_strconv_to_locale(bücher.de) returns b"ucher.de.
> >
> > Hex/u32:
> >
> > Result: U+0062 U+0022 U+0075 U+0063 U+0068 U+0065 U+0072 U+002e U+0064 U+0065
> >
> > Expected: U+0062 U+00fc U+0063 U+0068 U+0065 U+0072 U+002e U+0064 U+0065
>
> This would indicate that locale_charset() returns "ASCII".
> What happens then is that u8_strconv_to_locale invokes
> u8_strconv_to_encoding, which invokes mem_iconveha with transliterate=true,
> which appends '//TRANSLIT' when invoking iconv_open. So you get the
> transliteration, e.g. from 'ü' to '"u'.
>
> > The second problem is that characters beyond 255 are translated into ?
> > (U+003f).
>
> This would indicate that locale_charset() returns "ISO-8859-1". The
> question marks then come from the transliteration, again.
>
> > Do you have any hints how to fix these problems ?
>
> I would compile without -O and with -ggdb, then single-step through the
> code, paying particular attention to the value of locale_charset() and to
> the arguments of iconv_open().
>
> > I would expect u8_strconv_to_locale() to work in a defined manner on
> > UTF-8 locales
>
> That's certainly how it is intended to be.
>
> Bruno
>
>