git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Johannes Sixt <j.sixt@viscovery.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Thomas Haller <thom311@gmail.com>, Git List <git@vger.kernel.org>
Subject: Re: [PATCH ] t4210-log-i18n: spell encoding name "UTF-8" correctly
Date: Mon, 25 Feb 2013 10:19:17 -0500
Message-ID: <20130225151916.GA7725@sigill.intra.peff.net>
In-Reply-To: <512B22DE.9070603@viscovery.net>

On Mon, Feb 25, 2013 at 09:37:50AM +0100, Johannes Sixt wrote:

> From: Johannes Sixt <j6t@kdbg.org>
> 
> iconv on Windows does not know the encoding name "utf8", and does not
> re-encode log messages when this name is given. Request "UTF-8" encoding.
> 
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> ---
>  I'm not sure whether I'm right to say that "UTF-8" is the correct
>  spelling. Anyway, 'iconv -l' on my old Linux box lists "UTF8", but on
>  Windows it does not.

UTF-8 is correct according to:

  https://en.wikipedia.org/wiki/Utf8#Official_name_and_variants

>  A more correct fix would probably be to use is_encoding_utf8() in more
>  places, but it's outside my time budget look after it.

Yeah, I wonder if this is a symptom of a deeper issue, which is that
utf-8 has many synonyms, and we would prefer to canonicalize the
encoding name before generating an object to avoid inconsistencies (of
course we cannot do so for every imaginable encoding, but utf-8 is a
pretty obvious one we handle already). We _should_ be generating commits
with no encoding header at all for utf-8, though.

And indeed, it looks like that is the case. commit_tree_extended has:

    /* Not having i18n.commitencoding is the same as having utf-8 */
    encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);

    [...]

    if (!encoding_is_utf8)
            strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);


which makes me think that this first hunk...

> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
> index 52a7472..b1956e2 100755
> --- a/t/t4210-log-i18n.sh
> +++ b/t/t4210-log-i18n.sh
> @@ -15,7 +15,7 @@ test_expect_success 'create commits in different encodings' '
>  	t${utf8_e}st
>  	EOF
>  	git add msg &&
> -	git -c i18n.commitencoding=utf8 commit -F msg &&
> +	git -c i18n.commitencoding=UTF-8 commit -F msg &&
>  	cat >msg <<-EOF &&
>  	latin1

...should be a no-op; the utf8 there should never be seen by anybody but
git. Can you confirm that is the case?
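
To make the no-op reasoning concrete, here is a minimal standalone sketch,
not git's actual code. The names encoding_is_utf8_sketch() and
write_encoding_header() are hypothetical, and the sketch assumes the UTF-8
check treats a missing name, "utf-8", and "utf8" (in any case) as
equivalent. Under that assumption, both spellings in the hunk above produce
a commit with no encoding header, so the spelling never reaches iconv:

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for is_encoding_utf8(); assumes NULL means UTF-8. */
static int encoding_is_utf8_sketch(const char *name)
{
	if (!name)
		return 1;
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}

/*
 * Hypothetical helper: write the "encoding" header line into buf only
 * when the encoding is not some spelling of UTF-8. Returns the number
 * of characters written, or 0 when the header is omitted.
 */
static int write_encoding_header(char *buf, size_t len, const char *encoding)
{
	if (len)
		buf[0] = '\0';
	if (encoding_is_utf8_sketch(encoding))
		return 0;
	return snprintf(buf, len, "encoding %s\n", encoding);
}
```

Calling write_encoding_header() with "utf8", "UTF-8", or NULL leaves the
buffer empty, while "ISO-8859-1" yields the header line.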

> @@ -30,7 +30,7 @@ test_expect_success 'log --grep searches in log output encoding (utf8)' '
>  	latin1
>  	utf8
>  	EOF
> -	git log --encoding=utf8 --format=%s --grep=$utf8_e >actual &&
> +	git log --encoding=UTF-8 --format=%s --grep=$utf8_e >actual &&
>  	test_cmp expect actual
>  '

This one will feed it to iconv, though, because the latin1 commit will
need to be re-encoded. I think the simplest thing would just be:

diff --git a/utf8.c b/utf8.c
index 1087870..8d42b50 100644
--- a/utf8.c
+++ b/utf8.c
@@ -507,6 +507,17 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
 
 	if (!in_encoding)
 		return NULL;
+
+	/*
+	 * Some platforms do not have the variously spelled variants of
+	 * UTF-8, so let us feed iconv the most official spelling, which
+	 * should hopefully be accepted everywhere.
+	 */
+	if (is_encoding_utf8(in_encoding))
+		in_encoding = "UTF-8";
+	if (is_encoding_utf8(out_encoding))
+		out_encoding = "UTF-8";
+
 	conv = iconv_open(out_encoding, in_encoding);
 	if (conv == (iconv_t) -1)
 		return NULL;

Does that fix the tests for you? It's a larger change, but I think it
makes git friendlier all around for people on Windows.
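
For anybody who wants to try the idea outside of git, here is a rough
standalone sketch of the same normalization. It is not the patch itself:
reencode_sketch() and is_utf8_name() are hypothetical names, the output
buffer is sized with a crude 4x bound rather than grown on demand, and it
assumes a POSIX iconv whose iconv() takes "char **" for the input pointer
(as glibc does; some platforms declare it "const char **").

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical check: accept "utf-8" and "utf8" in any case. */
static int is_utf8_name(const char *name)
{
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}

/*
 * Re-encode the NUL-terminated string `in` from in_enc to out_enc.
 * Returns a malloc'd NUL-terminated string, or NULL on failure.
 */
static char *reencode_sketch(const char *in, const char *out_enc,
			     const char *in_enc)
{
	iconv_t conv;
	size_t in_left = strlen(in);
	size_t out_size = in_left * 4 + 1; /* crude bound for this sketch */
	size_t out_left = out_size - 1;
	char *out = malloc(out_size);
	char *in_p = (char *)in;
	char *out_p = out;

	if (!out)
		return NULL;

	/* Feed iconv the official spelling, whatever the caller used. */
	if (is_utf8_name(in_enc))
		in_enc = "UTF-8";
	if (is_utf8_name(out_enc))
		out_enc = "UTF-8";

	conv = iconv_open(out_enc, in_enc);
	if (conv == (iconv_t) -1) {
		free(out);
		return NULL;
	}
	if (iconv(conv, &in_p, &in_left, &out_p, &out_left) == (size_t) -1 ||
	    iconv(conv, NULL, NULL, &out_p, &out_left) == (size_t) -1) {
		iconv_close(conv);
		free(out);
		return NULL;
	}
	iconv_close(conv);
	*out_p = '\0';
	return out;
}
```

With this, reencode_sketch("t\xe9st", "utf8", "ISO-8859-1") asks iconv for
"UTF-8" rather than "utf8", which is the spelling the Windows iconv is
reported to understand in this thread.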

-Peff
