git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/1] gettext(windows): always use UTF-8
@ 2019-06-27  8:44 Johannes Schindelin via GitGitGadget
  2019-06-27  8:44 ` [PATCH 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
  2019-07-03 20:46 ` [PATCH v2 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 9+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-06-27  8:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The main issue we work around here is that Windows does not have a UTF-8
"code page".

Side note: there is actually a code page for UTF-8: 65001 (see
https://docs.microsoft.com/en-us/windows/desktop/Intl/code-page-identifiers).
However, when experimenting with it, we ran into a multitude of issues in
the Git for Windows project, ranging from various problems with Windows'
default console to miscounted file writes. While these issues may have been
mitigated in recent Windows 10 versions, older ones (in particular, Windows
7) still seem to have most of them, and Git for Windows specifically still
supports even Windows Vista. So from a practical point of view, there is no
UTF-8 code page.

Karsten Blees (1):
  gettext: always use UTF-8 on native Windows

 gettext.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-217%2Fdscho%2Fgettext-force-utf-8-on-windows-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-217/dscho/gettext-force-utf-8-on-windows-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/217
-- 
gitgitgadget


* [PATCH 1/1] gettext: always use UTF-8 on native Windows
  2019-06-27  8:44 [PATCH 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
@ 2019-06-27  8:44 ` Karsten Blees via GitGitGadget
  2019-07-03 11:26   ` Johannes Schindelin
  2019-07-03 20:46 ` [PATCH v2 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 9+ messages in thread
From: Karsten Blees via GitGitGadget @ 2019-06-27  8:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Karsten Blees

From: Karsten Blees <blees@dcon.de>

Git on native Windows exclusively uses UTF-8 for console output (both with
MinTTY and native Console windows). Gettext uses setlocale() to determine
the output encoding for translated text; however, MSVCRT's setlocale()
doesn't support UTF-8. As a result, translated text is encoded in the system
encoding (GetACP()), and non-ASCII chars are mangled in console output.

Use gettext's bind_textdomain_codeset() to force the encoding to UTF-8 on
native Windows.

In this developers' setup, HAVE_LIBCHARSET_H is apparently defined, but
we *really* want to override the locale_charset() here.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 gettext.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gettext.c b/gettext.c
index d4021d690c..d8423e5c41 100644
--- a/gettext.c
+++ b/gettext.c
@@ -12,7 +12,9 @@
 #ifndef NO_GETTEXT
 #	include <locale.h>
 #	include <libintl.h>
-#	ifdef HAVE_LIBCHARSET_H
+#	ifdef GIT_WINDOWS_NATIVE
+#		define locale_charset() "UTF-8"
+#	elif defined HAVE_LIBCHARSET_H
 #		include <libcharset.h>
 #	else
 #		include <langinfo.h>
-- 
gitgitgadget


* Re: [PATCH 1/1] gettext: always use UTF-8 on native Windows
  2019-06-27  8:44 ` [PATCH 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
@ 2019-07-03 11:26   ` Johannes Schindelin
  2019-07-03 18:31     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Schindelin @ 2019-07-03 11:26 UTC (permalink / raw)
  To: Karsten Blees via GitGitGadget; +Cc: git, Junio C Hamano, Karsten Blees

Hi,

On Thu, 27 Jun 2019, Karsten Blees via GitGitGadget wrote:

> diff --git a/gettext.c b/gettext.c
> index d4021d690c..d8423e5c41 100644
> --- a/gettext.c
> +++ b/gettext.c
> @@ -12,7 +12,9 @@
>  #ifndef NO_GETTEXT
>  #	include <locale.h>
>  #	include <libintl.h>
> -#	ifdef HAVE_LIBCHARSET_H
> +#	ifdef GIT_WINDOWS_NATIVE
> +#		define locale_charset() "UTF-8"
> +#	elif defined HAVE_LIBCHARSET_H
>  #		include <libcharset.h>
>  #	else
>  #		include <langinfo.h>

Sadly, this has a really unfortunate interaction with ab/no-kwset: the
latter patch series contains test cases that rely on being able to
use `LC_ALL=C` to prevent Git from assuming UTF-8 encoding.

I have this tentative patch queued up on Git for Windows' `shears/pu`
branch (i.e. the ever-green branch that continuously rebases all of Git
for Windows' patch thicket on top of `pu`):
https://github.com/git-for-windows/git/commit/e561446d

For your convenience:

-- snip --
diff --git a/gettext.c b/gettext.c
index 7da80db453c4..35d2c1218db2 100644
--- a/gettext.c
+++ b/gettext.c
@@ -13,7 +13,23 @@
 #	include <locale.h>
 #	include <libintl.h>
 #	ifdef GIT_WINDOWS_NATIVE
-#		define locale_charset() "UTF-8"
+
+static const char *locale_charset(void)
+{
+	const char *env = getenv("LC_ALL"), *dot;
+
+	if (!env || !*env)
+		env = getenv("LC_CTYPE");
+	if (!env || !*env)
+		env = getenv("LANG");
+
+	if (!env)
+		return "UTF-8";
+
+	dot = strchr(env, '.');
+	return !dot ? env : dot + 1;
+}
+
 #	elif defined HAVE_LIBCHARSET_H
 #		include <libcharset.h>
 #	else
-- snap --
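
To see how this lookup interacts with `LC_ALL=C`, here is a minimal
standalone sketch of the same logic with the environment lookups factored
out into parameters, so it can be exercised without touching the real
environment. The names `pick_locale_name()`, `charset_from_locale_name()`
and `locale_charset_demo()` are hypothetical and exist only for this
illustration; they are not part of Git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: pick the first non-empty value among LC_ALL,
 * LC_CTYPE and LANG, passed in as plain strings. If all three are
 * unset or empty, whatever LANG holds (NULL or "") is returned.
 */
const char *pick_locale_name(const char *lc_all, const char *lc_ctype,
			     const char *lang)
{
	const char *env = lc_all;

	if (!env || !*env)
		env = lc_ctype;
	if (!env || !*env)
		env = lang;
	return env;
}

/*
 * Hypothetical helper: derive the charset from a locale name such as
 * "en_US.UTF-8". Without any locale name, default to UTF-8; without a
 * ".codeset" suffix, return the name itself (e.g. "C").
 */
const char *charset_from_locale_name(const char *name)
{
	const char *dot;

	if (!name)
		return "UTF-8";

	dot = strchr(name, '.');
	return dot ? dot + 1 : name;
}

/* Usage: the cases that matter for this thread. */
void locale_charset_demo(void)
{
	/* LC_ALL=C wins over LANG and yields "C", not UTF-8. */
	assert(!strcmp(charset_from_locale_name(
		pick_locale_name("C", NULL, "en_US.UTF-8")), "C"));
	/* Nothing set at all: fall back to UTF-8. */
	assert(!strcmp(charset_from_locale_name(
		pick_locale_name(NULL, NULL, NULL)), "UTF-8"));
	/* Empty LC_ALL and LC_CTYPE are skipped in favor of LANG. */
	assert(!strcmp(charset_from_locale_name(
		pick_locale_name("", "", "de_DE.ISO-8859-1")), "ISO-8859-1"));
}
```

Note that a locale name without a dot is returned as-is, which is what
lets `LC_ALL=C` opt out of UTF-8 in this scheme.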

Junio, please hold off from advancing `kb/windows-force-utf8` until this
is resolved.

Also: does that diff look okay? Or would you rather want to avoid having
that function defined in that #if...#endif block?

Ciao,
Dscho


* Re: [PATCH 1/1] gettext: always use UTF-8 on native Windows
  2019-07-03 11:26   ` Johannes Schindelin
@ 2019-07-03 18:31     ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2019-07-03 18:31 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Karsten Blees via GitGitGadget, git, Karsten Blees

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Junio, please hold off from advancing `kb/windows-force-utf8` until this
> is resolved.

OK, will do.


* [PATCH v2 0/1] gettext(windows): always use UTF-8
  2019-06-27  8:44 [PATCH 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
  2019-06-27  8:44 ` [PATCH 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
@ 2019-07-03 20:46 ` Johannes Schindelin via GitGitGadget
  2019-07-03 20:46   ` [PATCH v2 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
  1 sibling, 1 reply; 9+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-07-03 20:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

The main issue we work around here is that Windows does not have a UTF-8
"code page".

Side note: there is actually a code page for UTF-8: 65001 (see
https://docs.microsoft.com/en-us/windows/desktop/Intl/code-page-identifiers).
However, when experimenting with it, we ran into a multitude of issues in
the Git for Windows project, ranging from various problems with Windows'
default console to miscounted file writes. While these issues may have been
mitigated in recent Windows 10 versions, older ones (in particular, Windows
7) still seem to have most of them, and Git for Windows specifically still
supports even Windows Vista. So from a practical point of view, there is no
UTF-8 code page.

Changes since v1:

 * The LC_ALL=C method used by ab/no-kwset to prevent Git from assuming
   UTF-8-encoded input is now supported.
 * The commit message was enhanced and revamped.

Karsten Blees (1):
  gettext: always use UTF-8 on native Windows

 gettext.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)


base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-217%2Fdscho%2Fgettext-force-utf-8-on-windows-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-217/dscho/gettext-force-utf-8-on-windows-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/217

Range-diff vs v1:

 1:  ff37a2646a ! 1:  2d2253faef gettext: always use UTF-8 on native Windows
     @@ -2,17 +2,34 @@
      
          gettext: always use UTF-8 on native Windows
      
     -    Git on native Windows exclusively uses UTF-8 for console output (both with
     -    MinTTY and native Console windows). Gettext uses setlocale() to determine
     -    the output encoding for translated text; however, MSVCRT's setlocale()
     -    doesn't support UTF-8. As a result, translated text is encoded in the system
     -    encoding (GetACP()), and non-ASCII chars are mangled in console output.
     +    On native Windows, Git exclusively uses UTF-8 for console output (both
     +    with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
     +    determine the output encoding for translated text; however, MSVCRT's
     +    `setlocale()` does not support UTF-8. As a result, translated text is
     +    encoded in the system encoding (as per `GetACP()`), and non-ASCII chars are
     +    mangled in console output.
      
     -    Use gettext's bind_textdomain_codeset() to force the encoding to UTF-8 on
     -    native Windows.
     +    Side note: There is actually a code page for UTF-8: 65001. In practice,
     +    it does not work as expected at least on Windows 7, though, so we cannot
     +    use it in Git. Besides, if we overrode the code page, any process
     +    spawned from Git would inherit that code page (as opposed to the code
     +    page configured for the current user), which would quite possibly break
     +    e.g. diff or merge helpers. So we really cannot override the code page.
      
     -    In this developers' setup, HAVE_LIBCHARSET_H is apparently defined, but
     -    we *really* want to override the locale_charset() here.
     +    In `init_gettext_charset()`, Git calls gettext's
     +    `bind_textdomain_codeset()` with the character set obtained via
     +    `locale_charset()`; Let's override that latter function to force the
     +    encoding to UTF-8 on native Windows.
     +
     +    In Git for Windows' SDK, there is a `libcharset.h` and therefore we
     +    define `HAVE_LIBCHARSET_H` in the MINGW-specific section in
     +    `config.mak.uname`, therefore we need to add the override before that
     +    conditionally-compiled code block.
     +
     +    Rather than simply defining `locale_charset()` to return the string
     +    `"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
     +    `ab/no-kwset` patch series, for example, needs to have a way to prevent
     +    Git from expecting UTF-8-encoded input.
      
          Signed-off-by: Karsten Blees <blees@dcon.de>
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
     @@ -26,7 +43,23 @@
       #	include <libintl.h>
      -#	ifdef HAVE_LIBCHARSET_H
      +#	ifdef GIT_WINDOWS_NATIVE
     -+#		define locale_charset() "UTF-8"
     ++
     ++static const char *locale_charset(void)
     ++{
     ++	const char *env = getenv("LC_ALL"), *dot;
     ++
     ++	if (!env || !*env)
     ++		env = getenv("LC_CTYPE");
     ++	if (!env || !*env)
     ++		env = getenv("LANG");
     ++
     ++	if (!env)
     ++		return "UTF-8";
     ++
     ++	dot = strchr(env, '.');
     ++	return !dot ? env : dot + 1;
     ++}
     ++
      +#	elif defined HAVE_LIBCHARSET_H
       #		include <libcharset.h>
       #	else

-- 
gitgitgadget


* [PATCH v2 1/1] gettext: always use UTF-8 on native Windows
  2019-07-03 20:46 ` [PATCH v2 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
@ 2019-07-03 20:46   ` Karsten Blees via GitGitGadget
  2019-07-04 22:53     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 9+ messages in thread
From: Karsten Blees via GitGitGadget @ 2019-07-03 20:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Karsten Blees

From: Karsten Blees <blees@dcon.de>

On native Windows, Git exclusively uses UTF-8 for console output (both
with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
determine the output encoding for translated text; however, MSVCRT's
`setlocale()` does not support UTF-8. As a result, translated text is
encoded in the system encoding (as per `GetACP()`), and non-ASCII chars are
mangled in console output.

Side note: There is actually a code page for UTF-8: 65001. In practice,
it does not work as expected at least on Windows 7, though, so we cannot
use it in Git. Besides, if we overrode the code page, any process
spawned from Git would inherit that code page (as opposed to the code
page configured for the current user), which would quite possibly break
e.g. diff or merge helpers. So we really cannot override the code page.

In `init_gettext_charset()`, Git calls gettext's
`bind_textdomain_codeset()` with the character set obtained via
`locale_charset()`; Let's override that latter function to force the
encoding to UTF-8 on native Windows.

In Git for Windows' SDK, there is a `libcharset.h` and therefore we
define `HAVE_LIBCHARSET_H` in the MINGW-specific section in
`config.mak.uname`, therefore we need to add the override before that
conditionally-compiled code block.

Rather than simply defining `locale_charset()` to return the string
`"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
`ab/no-kwset` patch series, for example, needs to have a way to prevent
Git from expecting UTF-8-encoded input.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 gettext.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gettext.c b/gettext.c
index d4021d690c..3f2aca5c3b 100644
--- a/gettext.c
+++ b/gettext.c
@@ -12,7 +12,25 @@
 #ifndef NO_GETTEXT
 #	include <locale.h>
 #	include <libintl.h>
-#	ifdef HAVE_LIBCHARSET_H
+#	ifdef GIT_WINDOWS_NATIVE
+
+static const char *locale_charset(void)
+{
+	const char *env = getenv("LC_ALL"), *dot;
+
+	if (!env || !*env)
+		env = getenv("LC_CTYPE");
+	if (!env || !*env)
+		env = getenv("LANG");
+
+	if (!env)
+		return "UTF-8";
+
+	dot = strchr(env, '.');
+	return !dot ? env : dot + 1;
+}
+
+#	elif defined HAVE_LIBCHARSET_H
 #		include <libcharset.h>
 #	else
 #		include <langinfo.h>
-- 
gitgitgadget


* Re: [PATCH v2 1/1] gettext: always use UTF-8 on native Windows
  2019-07-03 20:46   ` [PATCH v2 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
@ 2019-07-04 22:53     ` Ævar Arnfjörð Bjarmason
  2019-07-08 12:57       ` Johannes Schindelin
  2019-07-08 18:30       ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-07-04 22:53 UTC (permalink / raw)
  To: Karsten Blees via GitGitGadget
  Cc: git, Junio C Hamano, Karsten Blees, Johannes Schindelin


On Wed, Jul 03 2019, Karsten Blees via GitGitGadget wrote:

> From: Karsten Blees <blees@dcon.de>
>
> On native Windows, Git exclusively uses UTF-8 for console output (both
> with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
> determine the output encoding for translated text; however, MSVCRT's
> `setlocale()` does not support UTF-8. As a result, translated text is
> encoded in the system encoding (as per `GetACP()`), and non-ASCII chars are
> mangled in console output.
>
> Side note: There is actually a code page for UTF-8: 65001. In practice,
> it does not work as expected at least on Windows 7, though, so we cannot
> use it in Git. Besides, if we overrode the code page, any process
> spawned from Git would inherit that code page (as opposed to the code
> page configured for the current user), which would quite possibly break
> e.g. diff or merge helpers. So we really cannot override the code page.
>
> In `init_gettext_charset()`, Git calls gettext's
> `bind_textdomain_codeset()` with the character set obtained via
> `locale_charset()`; Let's override that latter function to force the
> encoding to UTF-8 on native Windows.
>
> In Git for Windows' SDK, there is a `libcharset.h` and therefore we
> define `HAVE_LIBCHARSET_H` in the MINGW-specific section in
> `config.mak.uname`, therefore we need to add the override before that
> conditionally-compiled code block.
>
> Rather than simply defining `locale_charset()` to return the string
> `"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
> `ab/no-kwset` patch series, for example, needs to have a way to prevent
> Git from expecting UTF-8-encoded input.

It's not just the ab/no-kwset I have cooking (but happy to have this
take that into account), but also anything grep-like is usually much
faster with LC_ALL=C. Isn't that also the case on Windows? Setting
locales affects a large variety of libc functions and third party
libraries (e.g. PCRE via us setting "use UTF-8" under locale).
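
As a small illustration of how the locale changes what a libc function
does (not of the speed difference), the following sketch counts the bytes
that `isalpha()` accepts after switching `LC_CTYPE`. The helper name
`count_alpha_bytes()` is made up for this example, and whether a
non-"C" locale such as `de_DE.ISO-8859-1` is installed is an assumption
about the system it runs on.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the bytes in buf that isalpha() accepts
 * after switching LC_CTYPE to locale_name. Returns -1 if that locale
 * is not available. Changes the process-wide locale as a side effect.
 */
long count_alpha_bytes(const char *locale_name,
		       const unsigned char *buf, size_t len)
{
	long count = 0;
	size_t i;

	assert(locale_name && buf);
	if (!setlocale(LC_CTYPE, locale_name))
		return -1;

	for (i = 0; i < len; i++)
		if (isalpha(buf[i]))
			count++;
	return count;
}
```

In the "C" locale only the ASCII letters count, so for the three bytes
`'a'`, `0xE9`, `'z'` the result is 2; in a Latin-1 locale, where 0xE9 is
a letter, the same call may count all three.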

> Signed-off-by: Karsten Blees <blees@dcon.de>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  gettext.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gettext.c b/gettext.c
> index d4021d690c..3f2aca5c3b 100644
> --- a/gettext.c
> +++ b/gettext.c
> @@ -12,7 +12,25 @@
>  #ifndef NO_GETTEXT
>  #	include <locale.h>
>  #	include <libintl.h>
> -#	ifdef HAVE_LIBCHARSET_H
> +#	ifdef GIT_WINDOWS_NATIVE
> +
> +static const char *locale_charset(void)
> +{
> +	const char *env = getenv("LC_ALL"), *dot;
> +
> +	if (!env || !*env)
> +		env = getenv("LC_CTYPE");
> +	if (!env || !*env)
> +		env = getenv("LANG");
> +
> +	if (!env)
> +		return "UTF-8";
> +
> +	dot = strchr(env, '.');
> +	return !dot ? env : dot + 1;
> +}
> +
> +#	elif defined HAVE_LIBCHARSET_H
>  #		include <libcharset.h>
>  #	else
>  #		include <langinfo.h>

I'll take it on faith that this is what the locale_charset() should look
like.

I wonder if it wouldn't be better to always compile this function, and
just have init_gettext_charset() switch between the two. We've moved
more towards that sort of thing (e.g. with pthreads). I.e. prefer
redundant compilation to ifdefing platform-only code (which then only
gets compiled there). See "HAVE_THREADS" in the code.

It looks to me that with this patch the HAVE_LIBCHARSET_H docs in
"Makefile" become wrong. Shouldn't those be updated too?

We also still pass -DHAVE_LIBCHARSET_H to every file we compile, only to
never use it under GIT_WINDOWS_NATIVE, but perhaps fixing that isn't
possible with GIT_WINDOWS_NATIVE being a macro, and perhaps I've again
gotten the "native" vs. "mingw" etc. relationship wrong in my head and
the HAVE_LIBCHARSET_H docs are fine.

It just seems wrong that we have both the configure script &
config.mak.uname look for / declare that we have libcharset.h, only to
at this late point not use libcharset.h at all. Couldn't we just know if
GIT_WINDOWS_NATIVE will be true earlier & move that check up, so it &
HAVE_LIBCHARSET_H can be mutually exclusive (with accompanying #error if
we have both)?


* Re: [PATCH v2 1/1] gettext: always use UTF-8 on native Windows
  2019-07-04 22:53     ` Ævar Arnfjörð Bjarmason
@ 2019-07-08 12:57       ` Johannes Schindelin
  2019-07-08 18:30       ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2019-07-08 12:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Karsten Blees via GitGitGadget, git, Junio C Hamano, Karsten Blees


Hi Ævar,

On Fri, 5 Jul 2019, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Jul 03 2019, Karsten Blees via GitGitGadget wrote:
>
> > From: Karsten Blees <blees@dcon.de>
> >
> > On native Windows, Git exclusively uses UTF-8 for console output (both
> > with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
> > determine the output encoding for translated text; however, MSVCRT's
> > `setlocale()` does not support UTF-8. As a result, translated text is
> > encoded in the system encoding (as per `GetACP()`), and non-ASCII chars are
> > mangled in console output.
> >
> > Side note: There is actually a code page for UTF-8: 65001. In practice,
> > it does not work as expected at least on Windows 7, though, so we cannot
> > use it in Git. Besides, if we overrode the code page, any process
> > spawned from Git would inherit that code page (as opposed to the code
> > page configured for the current user), which would quite possibly break
> > e.g. diff or merge helpers. So we really cannot override the code page.
> >
> > In `init_gettext_charset()`, Git calls gettext's
> > `bind_textdomain_codeset()` with the character set obtained via
> > `locale_charset()`; Let's override that latter function to force the
> > encoding to UTF-8 on native Windows.
> >
> > In Git for Windows' SDK, there is a `libcharset.h` and therefore we
> > define `HAVE_LIBCHARSET_H` in the MINGW-specific section in
> > `config.mak.uname`, therefore we need to add the override before that
> > conditionally-compiled code block.
> >
> > Rather than simply defining `locale_charset()` to return the string
> > `"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
> > `ab/no-kwset` patch series, for example, needs to have a way to prevent
> > Git from expecting UTF-8-encoded input.
>
> It's not just the ab/no-kwset I have cooking (but happy to have this
> take that into account), but also anything grep-like is usually much
> faster with LC_ALL=C. Isn't that also the case on Windows?

Probably. I have never tested this.

> Setting locales affects a large variety of libc functions and third
> party libraries (e.g. PCRE via us setting "use UTF-8" under locale).

Yes, but as I mentioned in the commit message, setting locales in MINGW
programs is murky at best. There is the idea of gettext, and there is the
idea of Windows, and they are likely a bit different from one another.

> > Signed-off-by: Karsten Blees <blees@dcon.de>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  gettext.c | 20 +++++++++++++++++++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gettext.c b/gettext.c
> > index d4021d690c..3f2aca5c3b 100644
> > --- a/gettext.c
> > +++ b/gettext.c
> > @@ -12,7 +12,25 @@
> >  #ifndef NO_GETTEXT
> >  #	include <locale.h>
> >  #	include <libintl.h>
> > -#	ifdef HAVE_LIBCHARSET_H
> > +#	ifdef GIT_WINDOWS_NATIVE
> > +
> > +static const char *locale_charset(void)
> > +{
> > +	const char *env = getenv("LC_ALL"), *dot;
> > +
> > +	if (!env || !*env)
> > +		env = getenv("LC_CTYPE");
> > +	if (!env || !*env)
> > +		env = getenv("LANG");
> > +
> > +	if (!env)
> > +		return "UTF-8";
> > +
> > +	dot = strchr(env, '.');
> > +	return !dot ? env : dot + 1;
> > +}
> > +
> > +#	elif defined HAVE_LIBCHARSET_H
> >  #		include <libcharset.h>
> >  #	else
> >  #		include <langinfo.h>
>
> I'll take it on faith that this is what the locale_charset() should look
> like.

I copy/edited that code from a later code block in `is_utf8_locale()` that
is also conditional (under `NO_GETTEXT`, hence no attempt to refactor it,
as that would make the code even less readable).

So I am fairly confident that the code is _correct_.

Whether it is elegant, I cannot really say. It strikes me as ugly, in
those indented `#ifdef..#endif` guards, yet I did not find a way to make
it less ugly.

> I wonder if it wouldn't be better to always compile this function, and
> just have init_gettext_charset() switch between the two.

Based on what? If the switch is a compile time switch, then this function
must be under the same compile time guard, otherwise GCC will complain
about an unused static function.

> We've moved more towards that sort of thing (e.g. with pthreads). I.e.
> prefer redundant compilation to ifdefing platform-only code (which then
> only gets compiled there). See "HAVE_THREADS" in the code.

How does that even avoid complaints by GCC about dead code?

> It looks to me that with this patch the HAVE_LIBCHARSET_H docs in
> "Makefile" become wrong. Shouldn't those be updated too?

That comment says:

# Define HAVE_LIBCHARSET_H if you haven't set NO_GETTEXT and you can't
# trust the langinfo.h's nl_langinfo(CODESET) function to return the
# current character set. [...]

I think it still applies.

> We also still pass -DHAVE_LIBCHARSET_H to every file we compile, only to
> never use it under GIT_WINDOWS_NATIVE, but perhaps fixing that isn't
> possible with GIT_WINDOWS_NATIVE being a macro, and perhaps I've again
> gotten the "native" vs. "mingw" etc. relationship wrong in my head and
> the HAVE_LIBCHARSET_H docs are fine.

MinGW is a really old, outdated thing. These days, mingw-w64 is the rage
(it even supports building 64-bit binaries, would you believe that?
</sarcasm>). And neither is "native", strictly, although it is as native
as you can get with GCC.

And as you say, the macro thing makes it hard/impossible to decide in
Makefile whether we want to pass HAVE_LIBCHARSET_H or not. So we should do
it independently of whether we're on Windows or not.

> It just seems wrong that we have both the configure script &
> config.mak.uname look for / declare that we have libcharset.h, only to
> at this late point not use libcharset.h at all. Couldn't we just know if
> GIT_WINDOWS_NATIVE will be true earlier & move that check up, so it &
> HAVE_LIBCHARSET_H can be mutually exclusive (with accompanying #error if
> we have both)?

I don't think that this is wrong, as it is correct in pretty much all
circumstances except Git for Windows. And even in Git for Windows it is
correct: we do have a `libcharset.h`. We just can't use it to determine
the current locale.

Ciao,
Dscho


* Re: [PATCH v2 1/1] gettext: always use UTF-8 on native Windows
  2019-07-04 22:53     ` Ævar Arnfjörð Bjarmason
  2019-07-08 12:57       ` Johannes Schindelin
@ 2019-07-08 18:30       ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2019-07-08 18:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Karsten Blees via GitGitGadget, git, Karsten Blees, Johannes Schindelin

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> -#	ifdef HAVE_LIBCHARSET_H
>> +#	ifdef GIT_WINDOWS_NATIVE
>> + ... new windows-only code ...
>> +#	elif defined HAVE_LIBCHARSET_H
>>  #		include <libcharset.h>
>>  #	else
>>  #		include <langinfo.h>
> ...
> It looks to me that with this patch the HAVE_LIBCHARSET_H docs in
> "Makefile" become wrong. Shouldn't those be updated too?

I do not think this change has much to do with HAVE_LIBCHARSET_H; it
inserts "regardless of what we have been doing, do this new thing
only and always on windows (presumably '... because libcharset would
not be useful on that platform')".  

Existing users of HAVE_LIBCHARSET_H and existing non-windows users
that did not use HAVE_LIBCHARSET_H are not affected, and whatever
Makefile documents the macro as still applies to them.

> I wonder if it wouldn't be better to always compile this function, and
> just have init_gettext_charset() switch between the two. We've moved
> more towards that sort of thing (e.g. with pthreads). I.e. prefer
> redundant compilation to ifdefing platform-only code (which then only
> gets compiled there). See "HAVE_THREADS" in the code.

OK, so init_gettext_charset() is the only caller of locale_charset()
in our codebase, and we supply our own locale_charset() if we do not
have <libcharset.h>, either with nl_langinfo(), or with the code
introduced by the patch in question for windows.  Your suggestion is
to add a block of #ifdef cascade in init_gettext_charset() to call
locale_charset(), nl_langinfo(), or the windows-only code (perhaps
inlined right there)?

I am not sure.  We'd need the conditional inclusion of header files
depending on HAVE_LIBCHARSET_H etc. anyway, so...
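
For reference, the "always compile, switch in one place" idea discussed
above could look roughly like the sketch below. Everything named `DEMO_*`
or `demo_*`, as well as `charset_from_env()` and
`charset_from_platform()`, is hypothetical and only stands in for the
real macros and functions; in particular, `charset_from_platform()`
returns a fixed placeholder instead of calling into libcharset or
nl_langinfo().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * DEMO_WINDOWS_NATIVE is a hypothetical stand-in for a platform macro
 * such as GIT_WINDOWS_NATIVE. It is turned into a constant that is
 * always defined, so it can be tested with a plain C condition.
 */
#ifdef DEMO_WINDOWS_NATIVE
#define DEMO_CHARSET_FROM_ENV 1
#else
#define DEMO_CHARSET_FROM_ENV 0
#endif

/* Always compiled: charset derived from LC_ALL, LC_CTYPE or LANG. */
static const char *charset_from_env(void)
{
	const char *env = getenv("LC_ALL"), *dot;

	if (!env || !*env)
		env = getenv("LC_CTYPE");
	if (!env || !*env)
		env = getenv("LANG");
	if (!env)
		return "UTF-8";

	dot = strchr(env, '.');
	return dot ? dot + 1 : env;
}

/* Always compiled: placeholder for what the platform would report. */
static const char *charset_from_platform(void)
{
	return "US-ASCII"; /* fixed value, for illustration only */
}

/*
 * Both helpers are referenced here, so neither is an unused static
 * function; the compiler can still drop the branch that the constant
 * condition rules out.
 */
const char *demo_locale_charset(void)
{
	const char *charset = DEMO_CHARSET_FROM_ENV ?
		charset_from_env() : charset_from_platform();

	assert(charset);
	return charset;
}
```

This sidesteps the header question raised above: in the real code the
two non-Windows variants need `<libcharset.h>` or `<langinfo.h>`, and
those includes would still have to be conditional.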
