From: "Torsten Bögershausen" <tboegi@web.de>
To: Jeff King <peff@peff.net>
Cc: Johannes Sixt <j.sixt@viscovery.net>,
	Junio C Hamano <gitster@pobox.com>,
	Thomas Haller <thom311@gmail.com>, Git List <git@vger.kernel.org>
Subject: Re: [PATCH ] t4210-log-i18n: spell encoding name "UTF-8" correctly
Date: Mon, 25 Feb 2013 22:00:46 +0100
Message-ID: <512BD0FE.5040108@web.de>
In-Reply-To: <20130225151916.GA7725@sigill.intra.peff.net>

On 25.02.13 16:19, Jeff King wrote:
> On Mon, Feb 25, 2013 at 09:37:50AM +0100, Johannes Sixt wrote:
> 
>> From: Johannes Sixt <j6t@kdbg.org>
>>
>> iconv on Windows does not know the encoding name "utf8", and does not
>> re-encode log messages when this name is given. Request "UTF-8" encoding.
>>
>> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
>> ---
>>  I'm not sure whether I'm right to say that "UTF-8" is the correct
>>  spelling. Anyway, 'iconv -l' on my old Linux box lists "UTF8", but on
>>  Windows it does not.
> 
> UTF-8 is correct according to:
> 
>   https://en.wikipedia.org/wiki/Utf8#Official_name_and_variants
> 
>>  A more correct fix would probably be to use is_encoding_utf8() in more
>>  places, but it's outside my time budget to look after it.
> 
> Yeah, I wonder if this is a symptom of a deeper issue, which is that
> utf-8 has many synonyms, and we would prefer to canonicalize the
> encoding name before generating an object to avoid inconsistencies (of
> course we cannot do so for every imaginable encoding, but utf-8 is a
> pretty obvious one we handle already). We _should_ be generating commits
> with no encoding header at all for utf-8, though.
> 
> And indeed, it looks like that is the case. commit_tree_extended has:
> 
>     /* Not having i18n.commitencoding is the same as having utf-8 */
>     encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
> 
>     [...]
> 
>     if (!encoding_is_utf8)
>             strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
> 
> 
> which makes me think that this first hunk...
> 
>> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
>> index 52a7472..b1956e2 100755
>> --- a/t/t4210-log-i18n.sh
>> +++ b/t/t4210-log-i18n.sh
>> @@ -15,7 +15,7 @@ test_expect_success 'create commits in different encodings' '
>>  	t${utf8_e}st
>>  	EOF
>>  	git add msg &&
>> -	git -c i18n.commitencoding=utf8 commit -F msg &&
>> +	git -c i18n.commitencoding=UTF-8 commit -F msg &&
>>  	cat >msg <<-EOF &&
>>  	latin1
> 
> ...should be a no-op; the utf8 there should never be seen by anybody but
> git. Can you confirm that is the case?
> 
>> @@ -30,7 +30,7 @@ test_expect_success 'log --grep searches in log output encoding (utf8)' '
>>  	latin1
>>  	utf8
>>  	EOF
>> -	git log --encoding=utf8 --format=%s --grep=$utf8_e >actual &&
>> +	git log --encoding=UTF-8 --format=%s --grep=$utf8_e >actual &&
>>  	test_cmp expect actual
>>  '
> 
> This one will feed it to iconv, though, because the latin1 commit will
> need to be re-encoded. I think the simplest thing would just be:
> 
> diff --git a/utf8.c b/utf8.c
> index 1087870..8d42b50 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -507,6 +507,17 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
>  
>  	if (!in_encoding)
>  		return NULL;
> +
> +	/*
> +	 * Some platforms do not have the variously spelled variants of
> +	 * UTF-8, so let us feed iconv the most official spelling, which
> +	 * should hopefully be accepted everywhere.
> +	 */
> +	if (is_encoding_utf8(in_encoding))
> +		in_encoding = "UTF-8";
> +	if (is_encoding_utf8(out_encoding))
> +		out_encoding = "UTF-8";
> +
>  	conv = iconv_open(out_encoding, in_encoding);
>  	if (conv == (iconv_t) -1)
>  		return NULL;
> 
> Does that fix the tests for you? It's a larger change, but I think it
> makes git friendlier all around for people on Windows.
> 
> -Peff
> --
 
Thanks, I'm OK with your version.

I also ran the new t4210 on Cygwin, and it passed.
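
For readers following along, the idea in the quoted utf8.c hunk (hand iconv
one canonical spelling instead of whatever the user typed) can be sketched
in isolation. This is only an illustration under stated assumptions:
looks_like_utf8() and canonical_encoding() are hypothetical names, and
looks_like_utf8() is a simplified stand-in for git's is_encoding_utf8(),
not its actual implementation.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string equality, written out to stay portable. */
static int equals_ignore_case(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	/* Equal only if both strings ended at the same position. */
	return *a == *b;
}

/*
 * Hypothetical stand-in for is_encoding_utf8(). Assumption: only the
 * spellings "utf-8" and "utf8" (any case) count, and a missing name
 * is treated as UTF-8.
 */
static int looks_like_utf8(const char *name)
{
	if (!name)
		return 1;
	return equals_ignore_case(name, "utf-8") ||
	       equals_ignore_case(name, "utf8");
}

/* Map every accepted UTF-8 spelling to "UTF-8"; pass others through. */
static const char *canonical_encoding(const char *name)
{
	return looks_like_utf8(name) ? "UTF-8" : name;
}
```

A caller would then pass canonical_encoding(out_encoding) and
canonical_encoding(in_encoding) to iconv_open(). Note one difference from
the quoted hunk: there, a NULL in_encoding returns early before the check,
whereas this sketch maps NULL to "UTF-8".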
