git@vger.kernel.org mailing list mirror (one of many)
* fatal: cannot convert from utf8 to UTF-8
@ 2012-10-19  0:03 Cristian Tibirna
  2012-10-19  5:50 ` Junio C Hamano
From: Cristian Tibirna @ 2012-10-19  0:03 UTC
  To: git


This error:

fatal: cannot convert from utf8 to UTF-8

occurred in two distinct situations in our work group with git binaries at 
version 1.7.7 or older: once during a commit, the other time during a rebase. 
Both occurrences are 100% reproducible. But the commit that gives the error 
during a rebase doesn't do so in a cherry-pick.

This is in part our fault: during the standardisation of our git environment, 
we (re)enforced UTF-8 encodings by setting "i18n.commitEncoding" and 
"i18n.logOutputEncoding" to "utf8".

It is the "i18n.logOutputEncoding = utf8" that *sometimes* triggers the error 
above.

I know "utf8" is not an accepted name ("UTF-8" or "utf-8" should be used, 
according to IANA standards), but we have mitigating circumstances in the 
fact that most things dealing with encodings accept the erroneous name. 
That includes at least iconv(1) and python(1). Thus we were unaware that a 
distinction existed and, as self-respecting lazy typers, we preferred the 
(one keystroke) shorter version.

I wonder if it should be expected that git accepts these name variants ("utf8" 
and "UTF8") as valid and equivalent to the standard ones.

Of course it is very easy for us to work around the error, since setting 
"i18n.logOutputEncoding = utf-8" or removing it altogether from the git config 
file makes the error go away. It's only that these kinds of things are bound 
to happen, and for a good proportion of git users the error might be quite 
opaque, difficult to fix and, in drastic (user ignorance-induced) cases, a 
showstopper.

Thanks for listening.

-- 
Cristian Tibirna				(418) 656-2131 / 4340
  Laval University - Québec, CAN ... http://www.giref.ulaval.ca/~ctibirna
  Research professional - GIREF ... ctibirna@giref.ulaval.ca

* Re: fatal: cannot convert from utf8 to UTF-8
  2012-10-19  0:03 fatal: cannot convert from utf8 to UTF-8 Cristian Tibirna
@ 2012-10-19  5:50 ` Junio C Hamano
From: Junio C Hamano @ 2012-10-19  5:50 UTC
  To: Cristian Tibirna; +Cc: git

Cristian Tibirna <ctibirna@giref.ulaval.ca> writes:

> This error:
>
> fatal: cannot convert from utf8 to UTF-8
> ...
> This is in part our fault: during the standardisation of our git environment, 
> we (re)enforced UTF-8 encodings by setting "i18n.commitEncoding" and 
> "i18n.logOutputEncoding" to "utf8".
> ...
> I know "utf8" is not an accepted denomination ("UTF-8" or "utf-8" should be 
> used, according to IANA standards),...

Perhaps like this.

-- >8 --
Subject: [PATCH] reencode_string(): introduce and use same_encoding()

Callers of reencode_string(), which re-encodes a string from one
encoding to another, each used an ad-hoc way to bypass the case where
the input and the output encodings are the same.  Some did strcmp(),
some did strcasecmp(), and yet others, when converting to UTF-8, used
is_encoding_utf8().

Introduce a same_encoding() helper function to make these callers
use the same logic.  Notably, is_encoding_utf8() has a work-around
for a common misconfiguration that uses "utf8" to name the UTF-8
encoding; that name does not match "UTF-8", so strcasecmp() alone
would not consider the two the same.  Make use of it in this helper
function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 builtin/mailinfo.c | 2 +-
 notes.c            | 2 +-
 pretty.c           | 2 +-
 sequencer.c        | 2 +-
 utf8.c             | 7 +++++++
 utf8.h             | 1 +
 6 files changed, 12 insertions(+), 4 deletions(-)

diff --git c/builtin/mailinfo.c w/builtin/mailinfo.c
index da23140..e4e39d6 100644
--- c/builtin/mailinfo.c
+++ w/builtin/mailinfo.c
@@ -483,7 +483,7 @@ static void convert_to_utf8(struct strbuf *line, const char *charset)
 
 	if (!charset || !*charset)
 		return;
-	if (!strcasecmp(metainfo_charset, charset))
+	if (same_encoding(metainfo_charset, charset))
 		return;
 	out = reencode_string(line->buf, metainfo_charset, charset);
 	if (!out)
diff --git c/notes.c w/notes.c
index bc454e1..ee8f01f 100644
--- c/notes.c
+++ w/notes.c
@@ -1231,7 +1231,7 @@ static void format_note(struct notes_tree *t, const unsigned char *object_sha1,
 	}
 
 	if (output_encoding && *output_encoding &&
-			strcmp(utf8, output_encoding)) {
+	    !is_encoding_utf8(output_encoding)) {
 		char *reencoded = reencode_string(msg, output_encoding, utf8);
 		if (reencoded) {
 			free(msg);
diff --git c/pretty.c w/pretty.c
index 8b1ea9f..e87fe9f 100644
--- c/pretty.c
+++ w/pretty.c
@@ -504,7 +504,7 @@ char *logmsg_reencode(const struct commit *commit,
 		return NULL;
 	encoding = get_header(commit, "encoding");
 	use_encoding = encoding ? encoding : utf8;
-	if (!strcmp(use_encoding, output_encoding))
+	if (same_encoding(use_encoding, output_encoding))
 		if (encoding) /* we'll strip encoding header later */
 			out = xstrdup(commit->buffer);
 		else
diff --git c/sequencer.c w/sequencer.c
index e3723d2..73c396b 100644
--- c/sequencer.c
+++ w/sequencer.c
@@ -60,7 +60,7 @@ static int get_message(struct commit *commit, struct commit_message *out)
 
 	out->reencoded_message = NULL;
 	out->message = commit->buffer;
-	if (strcmp(encoding, git_commit_encoding))
+	if (!same_encoding(encoding, git_commit_encoding))
 		out->reencoded_message = reencode_string(commit->buffer,
 					git_commit_encoding, encoding);
 	if (out->reencoded_message)
diff --git c/utf8.c w/utf8.c
index a544f15..6a52834 100644
--- c/utf8.c
+++ w/utf8.c
@@ -423,6 +423,13 @@ int is_encoding_utf8(const char *name)
 	return 0;
 }
 
+int same_encoding(const char *src, const char *dst)
+{
+	if (is_encoding_utf8(src) && is_encoding_utf8(dst))
+		return 1;
+	return !strcasecmp(src, dst);
+}
+
 /*
  * Given a buffer and its encoding, return it re-encoded
  * with iconv.  If the conversion fails, returns NULL.
diff --git c/utf8.h w/utf8.h
index 3c0ae76..93ef600 100644
--- c/utf8.h
+++ w/utf8.h
@@ -7,6 +7,7 @@ int utf8_width(const char **start, size_t *remainder_p);
 int utf8_strwidth(const char *string);
 int is_utf8(const char *text);
 int is_encoding_utf8(const char *name);
+int same_encoding(const char *, const char *);
 
 int strbuf_add_wrapped_text(struct strbuf *buf,
 		const char *text, int indent, int indent2, int width);
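
For readers who want to try the comparison outside of git, the helper in
the patch can be approximated by the standalone sketch below. It is not
git's code: sketch_is_utf8_name() is a hypothetical stand-in for
is_encoding_utf8() (whose body is not shown in the diff context above), and
the NULL handling is an assumption made only so that the sketch is safe to
call with any arguments.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Hypothetical stand-in for git's is_encoding_utf8(); it accepts the
 * standard name and the common "utf8" misspelling, case-insensitively.
 * Treating a NULL name as "not UTF-8" is an assumption of this sketch.
 */
int sketch_is_utf8_name(const char *name)
{
	if (!name)
		return 0;
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}

/*
 * Same shape as same_encoding() in the patch: two names that both mean
 * UTF-8 are equal; otherwise fall back to a case-insensitive compare.
 * The NULL guard is an addition of this sketch.
 */
int sketch_same_encoding(const char *src, const char *dst)
{
	if (sketch_is_utf8_name(src) && sketch_is_utf8_name(dst))
		return 1;
	if (!src || !dst)
		return 0;
	return !strcasecmp(src, dst);
}
```

With these definitions, sketch_same_encoding("utf8", "UTF-8") returns 1,
whereas a bare strcasecmp("utf8", "UTF-8") returns non-zero because of the
missing hyphen, which is the mismatch the commit message describes.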
