From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/1] gettext(windows): always use UTF-8
Date: Wed, 03 Jul 2019 13:46:03 -0700 (PDT)
Message-ID: <pull.217.v2.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.217.git.gitgitgadget@gmail.com>

The main issue we work around here is that Windows does not have a UTF-8
"code page".

Side note: there is actually a code page for UTF-8: 65001 (see
https://docs.microsoft.com/en-us/windows/desktop/Intl/code-page-identifiers).
However, when experimenting with it, we ran into a multitude of issues in
the Git for Windows project, ranging from various problems with Windows'
default console to miscounted file writes. While these issues may have been
mitigated in recent Windows 10 versions, older ones (in particular, Windows
7) still seem to have most of them, and Git for Windows specifically still
supports even Windows Vista. So from a practical point of view, there is no
UTF-8 code page.

Changes since v1:

 * The LC_ALL=C method used by ab/no-kwset to prevent Git from assuming
   UTF-8-encoded input is now supported.
 * The commit message was enhanced and revamped.
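
To illustrate the first item, here is a minimal, standalone sketch of the
lookup that the v2 override performs. The helper names charset_from_locale()
and charset_from_env() are hypothetical and exist only for this example; the
patch itself implements the logic in a single locale_charset() function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): extract the charset from a
 * locale name such as "de_DE.UTF-8". This mirrors the strchr() logic of the
 * patch: without a '.', the whole value is returned, so "C" stays "C".
 */
static const char *charset_from_locale(const char *locale)
{
	const char *dot;

	if (!locale)
		return "UTF-8"; /* nothing configured: default to UTF-8 */

	dot = strchr(locale, '.');
	return !dot ? locale : dot + 1;
}

/* Same lookup order as the patch: LC_ALL, then LC_CTYPE, then LANG. */
static const char *charset_from_env(void)
{
	const char *env = getenv("LC_ALL");

	if (!env || !*env)
		env = getenv("LC_CTYPE");
	if (!env || !*env)
		env = getenv("LANG");

	return charset_from_locale(env);
}
```

With LC_ALL=C, charset_from_env() yields "C", so callers can still opt out of
UTF-8. With only LANG=de_DE.UTF-8 set, it yields "UTF-8", and with none of the
three variables set it falls back to "UTF-8" as well.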

Karsten Blees (1):
  gettext: always use UTF-8 on native Windows

 gettext.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)


base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-217%2Fdscho%2Fgettext-force-utf-8-on-windows-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-217/dscho/gettext-force-utf-8-on-windows-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/217

Range-diff vs v1:

 1:  ff37a2646a ! 1:  2d2253faef gettext: always use UTF-8 on native Windows
     @@ -2,17 +2,34 @@
      
          gettext: always use UTF-8 on native Windows
      
     -    Git on native Windows exclusively uses UTF-8 for console output (both with
     -    MinTTY and native Console windows). Gettext uses setlocale() to determine
     -    the output encoding for translated text, however, MSVCRT's setlocale()
     -    doesn't support UTF-8. As a result, translated text is encoded in system
     -    encoding (GetACP()), and non-ASCII chars are mangled in console output.
     +    On native Windows, Git exclusively uses UTF-8 for console output (both
     +    with MinTTY and native Win32 Console). Gettext uses `setlocale()` to
     +    determine the output encoding for translated text, however, MSVCRT's
     +    `setlocale()` does not support UTF-8. As a result, translated text is
     +    encoded in system encoding (as per `GetACP()`), and non-ASCII chars are
     +    mangled in console output.
      
     -    Use gettext's bind_textdomain_codeset() to force the encoding to UTF-8 on
     -    native Windows.
     +    Side note: There is actually a code page for UTF-8: 65001. In practice,
     +    it does not work as expected at least on Windows 7, though, so we cannot
     +    use it in Git. Besides, if we overrode the code page, any process
     +    spawned from Git would inherit that code page (as opposed to the code
     +    page configured for the current user), which would quite possibly break
     +    e.g. diff or merge helpers. So we really cannot override the code page.
      
     -    In this developers' setup, HAVE_LIBCHARSET_H is apparently defined, but
     -    we *really* want to override the locale_charset() here.
     +    In `init_gettext_charset()`, Git calls gettext's
     +    `bind_textdomain_codeset()` with the character set obtained via
     +    `locale_charset()`; let's override that latter function to force the
     +    encoding to UTF-8 on native Windows.
     +
     +    Git for Windows' SDK ships a `libcharset.h`, so we define
     +    `HAVE_LIBCHARSET_H` in the MINGW-specific section in
     +    `config.mak.uname`; hence we need to add the override before that
     +    conditionally-compiled code block.
     +
     +    Rather than simply defining `locale_charset()` to return the string
     +    `"UTF-8"`, though, we are careful not to break `LC_ALL=C`: the
     +    `ab/no-kwset` patch series, for example, needs to have a way to prevent
     +    Git from expecting UTF-8-encoded input.
      
          Signed-off-by: Karsten Blees <blees@dcon.de>
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
     @@ -26,7 +43,23 @@
       #	include <libintl.h>
      -#	ifdef HAVE_LIBCHARSET_H
      +#	ifdef GIT_WINDOWS_NATIVE
     -+#		define locale_charset() "UTF-8"
     ++
     ++static const char *locale_charset(void)
     ++{
     ++	const char *env = getenv("LC_ALL"), *dot;
     ++
     ++	if (!env || !*env)
     ++		env = getenv("LC_CTYPE");
     ++	if (!env || !*env)
     ++		env = getenv("LANG");
     ++
     ++	if (!env)
     ++		return "UTF-8";
     ++
     ++	dot = strchr(env, '.');
     ++	return !dot ? env : dot + 1;
     ++}
     ++
      +#	elif defined HAVE_LIBCHARSET_H
       #		include <libcharset.h>
       #	else
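
For context, the commit message above says that the result of
locale_charset() is handed to gettext's bind_textdomain_codeset(). A rough
sketch of that step follows; force_output_charset() is a hypothetical wrapper
for illustration only, not Git's actual init_gettext_charset().

```c
#include <libintl.h>

/*
 * Hypothetical helper, not Git's actual code: ask gettext to return
 * translations for `domain` encoded in `charset`, regardless of what
 * setlocale() reports.
 *
 * Returns the codeset now in effect for the domain. The result may be NULL,
 * e.g. on allocation failure, or with libc implementations that do not
 * track a per-domain codeset.
 */
static const char *force_output_charset(const char *domain, const char *charset)
{
	return bind_textdomain_codeset(domain, charset);
}
```

A caller would use it as force_output_charset("git", "UTF-8"), passing
whatever charset the override resolved. Depending on the platform, linking may
require -lintl.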

-- 
gitgitgadget

Thread overview: 9+ messages
2019-06-27  8:44 [PATCH 0/1] gettext(windows): always use UTF-8 Johannes Schindelin via GitGitGadget
2019-06-27  8:44 ` [PATCH 1/1] gettext: always use UTF-8 on native Windows Karsten Blees via GitGitGadget
2019-07-03 11:26   ` Johannes Schindelin
2019-07-03 18:31     ` Junio C Hamano
2019-07-03 20:46 ` Johannes Schindelin via GitGitGadget [this message]
2019-07-03 20:46   ` [PATCH v2 " Karsten Blees via GitGitGadget
2019-07-04 22:53     ` Ævar Arnfjörð Bjarmason
2019-07-08 12:57       ` Johannes Schindelin
2019-07-08 18:30       ` Junio C Hamano
