bug-gnulib@gnu.org mirror (unofficial)
* u8_strconv_to_locale() misbehaves on OSX (Travis CI runner)
@ 2018-02-08 15:59 Tim Rühsen
  2018-02-08 17:05 ` Bruno Haible
  0 siblings, 1 reply; 4+ messages in thread
From: Tim Rühsen @ 2018-02-08 15:59 UTC (permalink / raw)
  To: bug-gnulib



I am trying to find out why the to_unicode tests of libidn2 have been
failing for a few months.

It happens on the OSX Travis CI runner. All the information I have is:

$ clang --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

locale_charset() returns "UTF-8".

u8_strconv_to_locale() and u8_strconv_from_locale() do not seem to work as
expected:


One problem seems to be that u8_strconv_to_locale() outputs decomposed
characters, e.g. u8_strconv_to_locale(bücher.de) returns b"ucher.de.

Hex/u32:

Result: U+0062 U+0022 U+0075 U+0063 U+0068 U+0065 U+0072 U+002e U+0064
U+0065

Expected: U+0062 U+00fc U+0063 U+0068 U+0065 U+0072 U+002e U+0064 U+0065
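
A code-point dump like the one above can be produced with a few lines of C. The following is only a minimal sketch, not code from libidn2 or gnulib: the helper names decode_utf8 and dump_code_points are made up for illustration, and the decoder assumes well-formed UTF-8 input.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode one UTF-8 sequence starting at s, store the code point in *cp and
   return the number of bytes consumed.  ASSUMPTION: the input is well-formed;
   this sketch does not validate continuation bytes or overlong forms.  */
static size_t
decode_utf8 (const unsigned char *s, uint32_t *cp)
{
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  if ((s[0] & 0xe0) == 0xc0)
    {
      *cp = ((uint32_t) (s[0] & 0x1f) << 6) | (uint32_t) (s[1] & 0x3f);
      return 2;
    }
  if ((s[0] & 0xf0) == 0xe0)
    {
      *cp = ((uint32_t) (s[0] & 0x0f) << 12)
            | ((uint32_t) (s[1] & 0x3f) << 6)
            | (uint32_t) (s[2] & 0x3f);
      return 3;
    }
  *cp = ((uint32_t) (s[0] & 0x07) << 18)
        | ((uint32_t) (s[1] & 0x3f) << 12)
        | ((uint32_t) (s[2] & 0x3f) << 6)
        | (uint32_t) (s[3] & 0x3f);
  return 4;
}

/* Write "U+xxxx U+xxxx ..." for the UTF-8 string s into buf and return buf.
   Output that does not fit into bufsize bytes is cut off.  */
static char *
dump_code_points (const char *s, char *buf, size_t bufsize)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t used = 0;

  if (bufsize > 0)
    buf[0] = '\0';
  while (*p != '\0' && used < bufsize)
    {
      uint32_t cp;
      int n;

      p += decode_utf8 (p, &cp);
      n = snprintf (buf + used, bufsize - used, "%sU+%04x",
                    used > 0 ? " " : "", (unsigned int) cp);
      if (n < 0 || (size_t) n >= bufsize - used)
        break;
      used += (size_t) n;
    }
  return buf;
}
```

Calling dump_code_points() on the string returned by u8_strconv_to_locale() and on the expected string gives two lines that can be compared directly. Note that the canonical decomposition of U+00FC would be U+0075 U+0308, whereas the result above contains U+0022 U+0075, i.e. an ASCII double quote followed by 'u'.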



The second problem is that characters beyond 255 are translated into ?
(U+003f).


Do you have any hints on how to fix these problems?
I would expect u8_strconv_to_locale() to work in a defined manner on
UTF-8 locales, but maybe I am wrong. I could apply a normalization step
in the test itself, but I am not sure whether that is the correct solution.

For problem 2 I see no solution right now.


With Best Regards, Tim





* Re: u8_strconv_to_locale() misbehaves on OSX (Travis CI runner)
  2018-02-08 15:59 u8_strconv_to_locale() misbehaves on OSX (Travis CI runner) Tim Rühsen
@ 2018-02-08 17:05 ` Bruno Haible
  2018-02-08 19:22   ` Tim Ruehsen
  0 siblings, 1 reply; 4+ messages in thread
From: Bruno Haible @ 2018-02-08 17:05 UTC (permalink / raw)
  To: bug-gnulib; +Cc: Tim Rühsen

Hi Tim,

> locale_charset() returns with "UTF-8".

That is as it should be on Mac OS X.

> u8_strconv_to_locale() and u8_strconv_from_locale() seem not to work as
> expected:
> 
> 
> One problem seems to be that u8_strconv_to_locale() outputs decomposed
> characters, e.g. u8_strconv_to_locale(bücher.de) returns b"ucher.de.
> 
> Hex/u32:
> 
> Result: U+0062 U+0022 U+0075 U+0063 U+0068 U+0065 U+0072 U+002e U+0064
> U+0065)
> 
> Expected: U+0062 U+00fc U+0063 U+0068 U+0065 U+0072 U+002e U+0064 U+0065

This would indicate that locale_charset() returns "ASCII".
What happens then is this: u8_strconv_to_locale invokes
u8_strconv_to_encoding, which invokes mem_iconveha with transliterate=true,
which in turn appends '//TRANSLIT' when invoking iconv_open. So you get the
transliteration, e.g. from 'ü' to '"u'.
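
The effect of the '//TRANSLIT' suffix can be tried in isolation with plain iconv(). This is a minimal sketch under stated assumptions, not the gnulib code path itself: the function name utf8_to_charset_translit is made up for illustration, and the system's iconv is assumed to accept the '//TRANSLIT' suffix (GNU libiconv and glibc do). What a given implementation emits for 'ü' differs: it may be '"u', '?', or an error.

```c
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert the UTF-8 string src to to_charset, asking iconv to transliterate
   characters that the target charset cannot represent.
   Returns a malloc()ed NUL-terminated string, or NULL on any failure.
   ASSUMPTION: the iconv implementation understands "//TRANSLIT".  */
static char *
utf8_to_charset_translit (const char *src, const char *to_charset)
{
  char tocode[64];
  int n = snprintf (tocode, sizeof tocode, "%s//TRANSLIT", to_charset);
  if (n < 0 || (size_t) n >= sizeof tocode)
    return NULL;

  /* These are the two iconv_open() arguments worth checking in a debugger.  */
  iconv_t cd = iconv_open (tocode, "UTF-8");
  if (cd == (iconv_t) -1)
    return NULL;

  size_t inleft = strlen (src);
  /* Generous bound: a transliteration can be longer than its source.  */
  size_t outsize = 4 * inleft + 16;
  char *out = malloc (outsize + 1);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) src;
  char *outp = out;
  size_t outleft = outsize;
  size_t r = iconv (cd, &inp, &inleft, &outp, &outleft);
  if (r != (size_t) -1)
    r = iconv (cd, NULL, NULL, &outp, &outleft);   /* flush shift state */
  iconv_close (cd);

  if (r == (size_t) -1)
    {
      free (out);
      return NULL;
    }
  *outp = '\0';
  return out;
}
```

Calling utf8_to_charset_translit ("b\xc3\xbc" "cher.de", "ASCII") and comparing the result with a call that passes "UTF-8" as the target shows whether the transliteration seen in the test could come from an ASCII target charset.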

> The second problem is that characters beyond 255 are translated into ?
> (U+003f).

This would indicate that locale_charset() returns "ISO-8859-1". The
question marks then come from the transliteration, again.

> Do you have any hints how to fix these problems ?

I would compile without -O and with -ggdb, then single-step through the code,
paying particular attention to the value of locale_charset() and to
the arguments of iconv_open().

> I would expect u8_strconv_to_locale() to work in a defined manner on
> UTF-8 locales

That's certainly how it is intended to be.

Bruno




* Re: u8_strconv_to_locale() misbehaves on OSX (Travis CI runner)
  2018-02-08 17:05 ` Bruno Haible
@ 2018-02-08 19:22   ` Tim Ruehsen
  2024-02-23 18:52     ` Bruno Haible
  0 siblings, 1 reply; 4+ messages in thread
From: Tim Ruehsen @ 2018-02-08 19:22 UTC (permalink / raw)
  To: Bruno Haible, bug-gnulib

Hi Bruno,

Thanks for your answer. After thinking about the ambiguous output of
locale_charset(), this might be an explanation:

Libidn2 (and its tests) use the libunistring installed from Homebrew, while
my direct call to locale_charset() goes to the gnulib copy.

So my build correctly says UTF-8, but the Homebrew libunistring has
been built on some unknown (OSX?) system with its own version of
locale_charset() returning ASCII. I said I get ? for characters > 255,
but I did not verify that. Maybe it is characters > 127.
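
The 127 versus 255 question can be settled by checking where the first replaced character falls. As a small sketch (the enum and function names are made up for illustration): ISO-8859-1 covers U+0000..U+00FF and ASCII covers U+0000..U+007F, so a target charset that has to transliterate U+00FC cannot be ISO-8859-1.

```c
#include <stdint.h>

/* The narrowest of the two charsets discussed in this thread that can
   represent a given code point without transliteration.  */
enum charset_fit
{
  FITS_ASCII,      /* U+0000..U+007F */
  FITS_LATIN1,     /* U+0080..U+00FF, representable in ISO-8859-1 only */
  FITS_NEITHER     /* above U+00FF */
};

static enum charset_fit
narrowest_fit (uint32_t cp)
{
  if (cp <= 0x7f)
    return FITS_ASCII;
  if (cp <= 0xff)
    return FITS_LATIN1;
  return FITS_NEITHER;
}
```

Here narrowest_fit (0xfc) yields FITS_LATIN1: 'ü' would survive a conversion to ISO-8859-1 unchanged, so seeing it rewritten points at ASCII as the target.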

The bad thing is, I only experience this on a Travis CI build and so
can't use gdb for single stepping.
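
Without a debugger, one option is to log what the C library itself reports in the CI environment. The sketch below uses the POSIX calls setlocale() and nl_langinfo(CODESET); the function name report_codeset is made up for illustration. This is not gnulib's locale_charset() and does not show what a prebuilt libunistring computes internally. It only shows whether the runner's environment selects a UTF-8 locale at all.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Activate the locale from the environment, log what libc reports,
   and return the codeset name (owned by libc, do not free).  */
static const char *
report_codeset (FILE *log)
{
  const char *loc = setlocale (LC_ALL, "");
  const char *codeset = nl_langinfo (CODESET);

  fprintf (log, "setlocale(LC_ALL, \"\") returned: %s\n",
           loc != NULL ? loc : "(failed)");
  fprintf (log, "nl_langinfo(CODESET) returned:  %s\n", codeset);
  return codeset;
}
```

Calling report_codeset (stderr) at the start of the failing test puts both values into the Travis log.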

But an option is to build libunistring from sources in the CI and
link/test with that.

Regards, Tim





* Re: u8_strconv_to_locale() misbehaves on OSX (Travis CI runner)
  2018-02-08 19:22   ` Tim Ruehsen
@ 2024-02-23 18:52     ` Bruno Haible
  0 siblings, 0 replies; 4+ messages in thread
From: Bruno Haible @ 2024-02-23 18:52 UTC (permalink / raw)
  To: bug-gnulib, Tim Ruehsen

Tim Ruehsen wrote on 2018-02-08
in <https://lists.gnu.org/archive/html/bug-gnulib/2018-02/msg00044.html>:
> my direct call to locale_charset() is from gnulib.
> 
> So my build correctly says UTF-8, but the Homebrew libunistring has
> been built on some unknown (OSX ?) system with their own version of
> locale_charset() returning ASCII.

That's clearly bogus. But I can't debug builds from distros that each
have their own idiosyncratic build system.

Bruno





