[PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful
@ 2012-02-21 14:24 Nguyễn Thái Ngọc Duy
  2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t3900-i18n-commit.sh |    4 ++--
 t/t3900/UTF-16.txt     |  Bin 0 -> 18 bytes
 2 files changed, 2 insertions(+), 2 deletions(-)
 create mode 100644 t/t3900/UTF-16.txt

diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh
index d48a7c0..a9e5662 100755
--- a/t/t3900-i18n-commit.sh
+++ b/t/t3900-i18n-commit.sh
@@ -34,9 +34,9 @@ test_expect_success 'no encoding header for base case' '
 	test z = "z$E"
 '
 
-test_expect_failure 'UTF-16 refused because of NULs' '
+test_expect_success 'UTF-16 refused because of NULs' '
 	echo UTF-16 >F &&
-	git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt
+	test_must_fail git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt
 '
 
 
diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8d0945b8e0a734ced8948da29ed9f8c65e3ec775
GIT binary patch
literal 18
VcmezW&xIi$409P$8B!Ry7yv#b1kV5f

literal 0
HcmV?d00001

-- 
1.7.8.36.g69ee2


* [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings
  2012-02-21 14:24 [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful Nguyễn Thái Ngọc Duy
@ 2012-02-21 14:24 ` Nguyễn Thái Ngọc Duy
  2012-02-21 14:53   ` Nguyen Thai Ngoc Duy
  2012-02-21 18:21   ` Jeff King
  2012-02-21 14:24 ` [PATCH 3/4] utf8: die if failed to re-encoding Nguyễn Thái Ngọc Duy
  2012-02-21 14:24 ` [PATCH 4/4] Only re-encode certain parts in commit object, not the whole Nguyễn Thái Ngọc Duy
  2 siblings, 2 replies; 12+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

We rely on ASCII everywhere. For example, we print "\n" directly
without conversion. If the output encoding is incompatible with ASCII,
the end result is a mix of that encoding and ASCII. Do not do that.

In theory we could convert everything to UTF-8 as an intermediate
medium, do all the processing, then convert the final output to the
desired encoding. But that is a lot of work (unless we have a
pager-like converter) with little real use. Users can just pipe
everything to iconv instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 It seems half of the encodings that "iconv -l" lists do not pass the
 ascii_superset_encoding() test. I just assume they are either exotic
 or duplicate names.

 pretty.c |    7 +++++++
 utf8.c   |   15 +++++++++++++++
 utf8.h   |    1 +
 3 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/pretty.c b/pretty.c
index 8688b8f..5c433a2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -493,12 +493,19 @@ char *logmsg_reencode(const struct commit *commit,
 		      const char *output_encoding)
 {
 	static const char *utf8 = "UTF-8";
+	static const char *last_output_encoding = NULL;
 	const char *use_encoding;
 	char *encoding;
 	char *out;
 
 	if (!*output_encoding)
 		return NULL;
+	if (last_output_encoding != output_encoding) {
+		if (!ascii_superset_encoding(output_encoding))
+			die("encoding %s is not a superset of ASCII.",
+			    output_encoding);
+		last_output_encoding = output_encoding;
+	}
 	encoding = get_header(commit, "encoding");
 	use_encoding = encoding ? encoding : utf8;
 	if (!strcmp(use_encoding, output_encoding))
diff --git a/utf8.c b/utf8.c
index 8acbc66..def93ee 100644
--- a/utf8.c
+++ b/utf8.c
@@ -482,3 +482,18 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
 	return out;
 }
 #endif
+
+int ascii_superset_encoding(const char *encoding)
+{
+	const char *sample = " !\"#$%&'()*+,-./0123456789:;<=>?@"
+		"ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
+		"abcdefghijklmnopqrstuvwxyz{|}~\n";
+	char *output;
+	int ret;
+	if (!encoding)
+		return 1;
+	output = reencode_string(sample, encoding, "US-ASCII");
+	ret = !output || !strcmp(sample, output);
+	free(output);
+	return ret;
+}
diff --git a/utf8.h b/utf8.h
index 81f2c82..75bc128 100644
--- a/utf8.h
+++ b/utf8.h
@@ -12,6 +12,7 @@ int strbuf_add_wrapped_text(struct strbuf *buf,
 		const char *text, int indent, int indent2, int width);
 int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
 			     int indent, int indent2, int width);
+int ascii_superset_encoding(const char *encoding);
 
 #ifndef NO_ICONV
 char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding);
-- 
1.7.8.36.g69ee2


* [PATCH 3/4] utf8: die if failed to re-encoding
  2012-02-21 14:24 [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful Nguyễn Thái Ngọc Duy
  2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy
@ 2012-02-21 14:24 ` Nguyễn Thái Ngọc Duy
  2012-02-21 17:36   ` Junio C Hamano
  2012-02-21 14:24 ` [PATCH 4/4] Only re-encode certain parts in commit object, not the whole Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 12+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

A NULL return value from reencode_string() means "no conversion
needed", which is not quite true when iconv_open() fails
(conv == (iconv_t) -1). Die in that case instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t4201-shortlog.sh |    2 +-
 utf8.c              |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 6872ba1..d445665 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -27,7 +27,7 @@ test_expect_success 'setup' '
 		tr 1234 "\360\235\204\236")" a1 &&
 
 	# now fsck up the utf8
-	git config i18n.commitencoding non-utf-8 &&
+	git config i18n.commitencoding viscii &&
 	echo 4 >a1 &&
 	git commit --quiet -m "$(
 		echo "This is a very, very long first line for the commit message to see if it is wrapped correctly" |
diff --git a/utf8.c b/utf8.c
index def93ee..f918e9e 100644
--- a/utf8.c
+++ b/utf8.c
@@ -444,7 +444,7 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
 		return NULL;
 	conv = iconv_open(out_encoding, in_encoding);
 	if (conv == (iconv_t) -1)
-		return NULL;
+		die("failed to convert from %s to %s", in_encoding, out_encoding);
 	insz = strlen(in);
 	outsz = insz;
 	outalloc = outsz + 1; /* for terminating NUL */
-- 
1.7.8.36.g69ee2


* [PATCH 4/4] Only re-encode certain parts in commit object, not the whole
  2012-02-21 14:24 [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful Nguyễn Thái Ngọc Duy
  2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy
  2012-02-21 14:24 ` [PATCH 3/4] utf8: die if failed to re-encoding Nguyễn Thái Ngọc Duy
@ 2012-02-21 14:24 ` Nguyễn Thái Ngọc Duy
  2012-02-21 18:25   ` Jeff King
  2 siblings, 1 reply; 12+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC
  To: git; +Cc: Nguyễn Thái Ngọc Duy

A commit object has its own format, which happens to be ASCII, but it
is not really subject to re-encoding.

There are only four areas that may be re-encoded: the author line, the
committer line, the mergetag lines and the commit body.  The encoding
of tags embedded in mergetag lines is not decided by the commit
encoding, so leave them out and treat them as binary.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 pretty.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 57 insertions(+), 1 deletions(-)

diff --git a/pretty.c b/pretty.c
index 5c433a2..6ccc091 100644
--- a/pretty.c
+++ b/pretty.c
@@ -489,6 +489,62 @@ static char *replace_encoding_header(char *buf, const char *encoding)
 	return strbuf_detach(&tmp, NULL);
 }
 
+/*
+ * Re-encode author, committer and commit body only, leaving the rest
+ * in ASCII (or whatever encoding the mergetag lines are in)
+ * regardless of the output encoding. We assume the commit is
+ * well-formed, so there is no validation.
+ */
+static char *reencode_commit(const char *buffer,
+			     const char *out_enc, const char *in_enc)
+{
+	struct strbuf out = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	char *reencoded, *s, *e;
+
+	strbuf_addstr(&buf, buffer);
+
+	s = strstr(buf.buf, "\nauthor ");
+	assert(s != NULL);
+	s += 8;			/* "\nauthor " */
+	strbuf_add(&out, buf.buf, s - buf.buf);
+	e = strchr(s, '\n');
+	*e = '\0';
+	reencoded = reencode_string(s, out_enc, in_enc);
+	if (reencoded && strchr(reencoded, '\n'))
+		die("your chosen encoding produces \\n out of nowhere?");
+	strbuf_addstr(&out, reencoded ? reencoded : s);
+	free(reencoded);
+
+	strbuf_addstr(&out, "\ncommitter ");
+	assert(!strncmp(e + 1, "committer ", 10));
+	s = e + 11;		/* "\ncommitter " */
+	e = strchr(s, '\n');
+	*e = '\0';
+	reencoded = reencode_string(s, out_enc, in_enc);
+	if (reencoded && strchr(reencoded, '\n'))
+		die("your chosen encoding produces \\n out of nowhere?");
+	strbuf_addstr(&out, reencoded ? reencoded : s);
+	free(reencoded);
+	*e = '\n';
+
+	s = e;
+	e = strstr(s, "\n\n");
+	if (e) {
+		e += 2;		/* "\n\n" */
+		strbuf_add(&out, s, e - s);
+
+		s = e;
+		reencoded = reencode_string(s, out_enc, in_enc);
+		strbuf_addstr(&out, reencoded ? reencoded : s);
+		free(reencoded);
+	} else
+		strbuf_addstr(&out, s);
+
+	strbuf_release(&buf);
+	return strbuf_detach(&out, NULL);
+}
+
 char *logmsg_reencode(const struct commit *commit,
 		      const char *output_encoding)
 {
@@ -514,7 +570,7 @@ char *logmsg_reencode(const struct commit *commit,
 		else
 			return NULL; /* nothing to do */
 	else
-		out = reencode_string(commit->buffer,
+		out = reencode_commit(commit->buffer,
 				      output_encoding, use_encoding);
 	if (out)
 		out = replace_encoding_header(out, output_encoding);
-- 
1.7.8.36.g69ee2


* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings
  2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy
@ 2012-02-21 14:53   ` Nguyen Thai Ngoc Duy
  2012-02-21 18:21   ` Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-02-21 14:53 UTC
  To: git

2012/2/21 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
> @@ -482,3 +482,18 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
>        return out;
>  }
>  #endif
> +
> +int ascii_superset_encoding(const char *encoding)
> +{
> +       const char *sample = " !\"#$%&'()*+,-./0123456789:;<=>?@"
> +               "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
> +               "abcdefghijklmnopqrstuvwxyz{|}~\n";
> +       char *output;
> +       int ret;
> +       if (!encoding)
> +               return 1;
> +       output = reencode_string(sample, encoding, "US-ASCII");
> +       ret = !output || !strcmp(sample, output);
> +       free(output);
> +       return ret;
> +}

Side note about this function: it was written to ban all
ASCII-incompatible charsets from entering commit objects. Mixing
charsets in the same buffer without a clear boundary does not sound
healthy. Plus, ident.c will silently drop '\n', '<' and '>' in
author/committer. If a hypothetical charset happens to place a letter
at those code points, the letter will be dropped. But meh..
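
To make the dropped-letter worry concrete, here is a minimal sketch in
C. It is not git's ident.c; drop_ident_specials() is a hypothetical
helper that only mimics the behaviour described above (skipping the
bytes 0x0A, 0x3C and 0x3E) so the effect on a name can be seen in
isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT git's ident.c: copy `in` to `out`, skipping
 * the bytes 0x0A ('\n'), 0x3C ('<') and 0x3E ('>'). `out` must have
 * room for strlen(in) + 1 bytes. Returns the number of bytes dropped.
 */
size_t drop_ident_specials(const char *in, char *out)
{
	size_t dropped = 0;

	assert(in != NULL && out != NULL);
	for (; *in; in++) {
		if (*in == '\n' || *in == '<' || *in == '>')
			dropped++;	/* byte is silently lost */
		else
			*out++ = *in;
	}
	*out = '\0';
	return dropped;
}
```

If a charset stored some letter at byte 0x3C, a three-byte name with
that letter in the middle would come out two bytes long, with no
warning to the user.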
-- 
Duy


* Re: [PATCH 3/4] utf8: die if failed to re-encoding
  2012-02-21 14:24 ` [PATCH 3/4] utf8: die if failed to re-encoding Nguyễn Thái Ngọc Duy
@ 2012-02-21 17:36   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2012-02-21 17:36 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Return value NULL in this case means "no conversion needed", which is
> not quite true when conv == -1.

Doing this only when producing new commits to avoid spreading damage might
be a good idea.

But utf8.c::reencode_string() is sufficiently deep in the call-chains to
make me suspect that the codepaths this change affects are not limited to
creation ones.  If this also forbids readers from resurrecting salvageable
bits while reading (imagine your commit had "encoding viscii" but your log
message was in English, except only your name had letters outside ASCII
that I cannot locally convert to utf-8 for viewing), I do not think it is
an acceptable change.
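
A reader-side fallback along these lines might look roughly like the
sketch below. It is only an illustration in plain C with iconv, not
code from git: reencode_or_keep() and its `converted` flag are
hypothetical names, and the output buffer size is a generous guess
rather than a computed bound.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader-side helper (not part of git): try to convert
 * `in` from `from` to `to`. On any failure, fall back to a copy of the
 * original bytes so the caller can still show the salvageable parts.
 * *converted is set to 1 on success and 0 on fallback. The caller
 * frees the result; NULL is returned only if malloc() fails.
 */
char *reencode_or_keep(const char *in, const char *to, const char *from,
		       int *converted)
{
	size_t inleft, outleft, outsz, rc;
	char *out, *inp, *outp;
	iconv_t cd;

	assert(in && to && from && converted);
	*converted = 0;
	inleft = strlen(in);

	cd = iconv_open(to, from);
	if (cd != (iconv_t)-1) {
		outsz = inleft * 4 + 16;	/* assumed worst case */
		out = malloc(outsz + 1);
		if (!out) {
			iconv_close(cd);
			return NULL;
		}
		inp = (char *)in;	/* iconv() does not modify the input */
		outp = out;
		outleft = outsz;
		rc = iconv(cd, &inp, &inleft, &outp, &outleft);
		if (rc != (size_t)-1)	/* flush any pending shift state */
			rc = iconv(cd, NULL, NULL, &outp, &outleft);
		iconv_close(cd);
		if (rc != (size_t)-1) {
			*outp = '\0';
			*converted = 1;
			return out;
		}
		free(out);
	}

	/* Unknown encoding or unconvertible input: keep the bytes as-is. */
	out = malloc(strlen(in) + 1);
	if (out)
		strcpy(out, in);
	return out;
}
```

A creation-side caller could treat converted == 0 as fatal, while a
reader could print the raw bytes and carry on.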


* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings
  2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy
  2012-02-21 14:53   ` Nguyen Thai Ngoc Duy
@ 2012-02-21 18:21   ` Jeff King
  2012-02-22  2:17     ` Nguyen Thai Ngoc Duy
  2012-02-23 11:25     ` Peter Krefting
  1 sibling, 2 replies; 12+ messages in thread
From: Jeff King @ 2012-02-21 18:21 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Tue, Feb 21, 2012 at 09:24:50PM +0700, Nguyen Thai Ngoc Duy wrote:

> We rely on ASCII everywhere. We print "\n" directly without conversion
> for example. The end result would be a mix of some encoding and ASCII
> if they are incompatible. Do not do that.
> 
> In theory we could convert everything to utf-8 as intermediate medium,
> process process process, then convert final output to the desired
> encoding. But that's a lot of work (unless we have a pager-like
> converter) with little real use. Users can just pipe everything to
> iconv instead.

I'm not sure why we bother checking this. Using non-ASCII-superset
encodings is broken, yes, but are people actually doing that? I assume
that the common one is utf-16, and anybody using it will experience
severe breakage immediately. So are people actually doing this? Are
there actually encodings that will cause subtle breakage that we want to
catch?

-Peff


* Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole
  2012-02-21 14:24 ` [PATCH 4/4] Only re-encode certain parts in commit object, not the whole Nguyễn Thái Ngọc Duy
@ 2012-02-21 18:25   ` Jeff King
  2012-02-22  2:01     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2012-02-21 18:25 UTC
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote:

> Commit object has its own format, which happens to be in ascii, but
> not really subject to re-encoding.
> 
> There are only four areas that may be re-encoded: author line,
> committer line, mergetag lines and commit body.  Encoding of tags
> embedded in mergetag lines is not decided by commit encoding, so leave
> it out and consider it binary.

Is this worth the effort? Yes, re-encoding the ASCII bits of the commit
object is unnecessary. But do we actually handle encodings that are not
ASCII supersets? IOW, I could see the point if this is making it
possible to hold utf-16 names and messages in your commits (though why
you would want to do so is beyond me...). But my understanding is that
this is horribly broken anyway by other parts of the code. And even
looking at your code below:

> +static char *reencode_commit(const char *buffer,
> +			     const char *out_enc, const char *in_enc)
> +{
> +	struct strbuf out = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	char *reencoded, *s, *e;
> +
> +	strbuf_addstr(&buf, buffer);
> +
> +	s = strstr(buf.buf, "\nauthor ");
> +	assert(s != NULL);

Wouldn't this assert trigger in the presence of encodings which
contain ASCII NUL (e.g., wide encodings like utf-16)?
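
The underlying behaviour can be sketched in a few lines of C. The
buffer below is a made-up example, not a real commit: the header is
ASCII and the name "Duy" is written by hand as UTF-16LE, so every
letter is followed by a 0x00 byte. Which call in the patch would trip
first depends on where the NUL lands; this only shows that C string
functions stop at it while length-aware ones do not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up sample: ASCII header, name in UTF-16LE (letter, then 0x00). */
const char sample_commit[] = "\nauthor D\0u\0y\0\ncommitter X";
const size_t sample_commit_len = sizeof(sample_commit) - 1;

/* End-of-line lookup the way a NUL-terminated string API sees it. */
const char *eol_cstring(const char *s)
{
	assert(s != NULL);
	return strchr(s, '\n');		/* stops at the first NUL byte */
}

/* End-of-line lookup over an explicit byte count. */
const char *eol_bytes(const char *s, size_t len)
{
	assert(s != NULL);
	return memchr(s, '\n', len);	/* NUL bytes are just data here */
}
```

Starting from the byte after "\nauthor " (offset 8), eol_cstring()
finds no newline at all, while eol_bytes() finds the one that ends the
author line.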

Is there an encoding you have in mind which would be helped by this?

-Peff


* Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole
  2012-02-21 18:25   ` Jeff King
@ 2012-02-22  2:01     ` Nguyen Thai Ngoc Duy
  2012-02-22  3:14       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-02-22  2:01 UTC
  To: Jeff King; +Cc: git

2012/2/22 Jeff King <peff@peff.net>:
> On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> Commit object has its own format, which happens to be in ascii, but
>> not really subject to re-encoding.
>>
>> There are only four areas that may be re-encoded: author line,
>> committer line, mergetag lines and commit body.  Encoding of tags
>> embedded in mergetag lines is not decided by commit encoding, so leave
>> it out and consider it binary.
>
> Is this worth the effort? Yes, re-encoding the ASCII bits of the commit
> object is unnecessary. But do we actually handle encodings that are not
> ASCII supersets? IOW, I could see the point if this is making it
> possible to hold utf-16 names and messages in your commits (though why
> you would want to do so is beyond me...). But my understanding is that
> this is horribly broken anyway by other parts of the code. And even
> looking at your code below:

No, utf-16 and friends are out of the question. 617 of the 1168
encodings supported by iconv translate chars 10,32-126 to something
else, and some of those do not generate NUL. I suppose none of these
are actually used nowadays. Looking again, some don't even
successfully translate the given input. No, it's probably not worth
the effort.
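
For anyone who wants to repeat that kind of survey for a single
encoding, a standalone check could look roughly like this. It is a
sketch against plain iconv rather than git's reencode_string(), and
ascii_bytes_survive() is a hypothetical name. Unlike the function in
the patch, it reports a failed conversion separately (-1) and compares
lengths, since wide encodings put NUL bytes in the output.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/*
 * Returns 1 if converting chars 32-126 plus "\n" from US-ASCII to
 * `encoding` leaves the bytes unchanged, 0 if the bytes change, and
 * -1 if iconv cannot do the conversion at all.
 */
int ascii_bytes_survive(const char *encoding)
{
	static const char sample[] =
		" !\"#$%&'()*+,-./0123456789:;<=>?@"
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
		"abcdefghijklmnopqrstuvwxyz{|}~\n";
	char out[8 * sizeof(sample)];
	char *inp = (char *)sample;	/* iconv() does not modify the input */
	char *outp = out;
	size_t inleft = sizeof(sample) - 1;
	size_t outleft = sizeof(out);
	size_t rc;
	iconv_t cd;

	assert(encoding != NULL);
	cd = iconv_open(encoding, "US-ASCII");
	if (cd == (iconv_t)-1)
		return -1;
	rc = iconv(cd, &inp, &inleft, &outp, &outleft);
	if (rc != (size_t)-1)		/* flush any pending shift state */
		rc = iconv(cd, NULL, NULL, &outp, &outleft);
	iconv_close(cd);
	if (rc == (size_t)-1)
		return -1;
	/* Compare by length first: the output may contain NUL bytes. */
	if ((size_t)(outp - out) != sizeof(sample) - 1)
		return 0;
	return !memcmp(out, sample, sizeof(sample) - 1);
}
```

Which names are accepted, and how many of them fail, depends on the
local iconv implementation, so the counts will differ between systems.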
-- 
Duy


* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings
  2012-02-21 18:21   ` Jeff King
@ 2012-02-22  2:17     ` Nguyen Thai Ngoc Duy
  2012-02-23 11:25     ` Peter Krefting
  1 sibling, 0 replies; 12+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-02-22  2:17 UTC
  To: Jeff King; +Cc: git

2012/2/22 Jeff King <peff@peff.net>:
> On Tue, Feb 21, 2012 at 09:24:50PM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> We rely on ASCII everywhere. We print "\n" directly without conversion
>> for example. The end result would be a mix of some encoding and ASCII
>> if they are incompatible. Do not do that.
>>
>> In theory we could convert everything to utf-8 as intermediate medium,
>> process process process, then convert final output to the desired
>> encoding. But that's a lot of work (unless we have a pager-like
>> converter) with little real use. Users can just pipe everything to
>> iconv instead.
>
> I'm not sure why we bother checking this. Using non-ASCII-superset
> encodings is broken, yes, but are people actually doing that? I assume
> that the common one is utf-16, and anybody using it will experience
> severe breakage immediately. So are people actually doing this? Are
> there actually encodings that will cause subtle breakage that we want to
> catch?

I did :-) once actually. But that's a good point: using an unsuitable
encoding leads to garbage output, but there is no subtle breakage. It'd
be nicer to say "your encoding is not supported" than to throw garbage,
but again probably no one did it but me, and I don't feel like doing
it again.
-- 
Duy


* Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole
  2012-02-22  2:01     ` Nguyen Thai Ngoc Duy
@ 2012-02-22  3:14       ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2012-02-22  3:14 UTC
  To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, git

By the way, zj/term-columns topic has already graduated to 'master', so if
you are still interested in your earlier nd/columns topic, it would be a
good time to re-roll it.

No hurries, but pointing it out just in case you forgot.


* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings
  2012-02-21 18:21   ` Jeff King
  2012-02-22  2:17     ` Nguyen Thai Ngoc Duy
@ 2012-02-23 11:25     ` Peter Krefting
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Krefting @ 2012-02-23 11:25 UTC
  To: Git Mailing List; +Cc: Jeff King, Nguyễn Thái Ngọc Duy

Jeff King:

> I'm not sure why we bother checking this. Using non-ASCII-superset
> encodings is broken, yes, but are people actually doing that?
[...]
> Are there actually encodings that will cause subtle breakage that we want 
> to catch?

Shift-JIS could be a problem; if implemented to the letter it would convert 
0x5C to a Yen character and 0x7E to an overline. Otherwise I expect it to be 
a problem only with ISO 646 encodings, especially the ones that replace "@" 
with something else [1].

Also any ISO 2022 seven-bit encoding (ISO-2022-{CN,JP,KR}) could cause 
problems, especially if there is any preprocessing done on the string that 
does not respect its state-shifting (most 0x21--0x7E characters can be lead 
and trail bytes in their multi-byte modes).
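
The state-shifting problem can be sketched in C as follows. The sample 
bytes are hand-written and assumed to be valid ISO-2022-JP: ESC $ B 
enters two-byte mode, the pair 0x21 0x3C is one two-byte character, and 
ESC ( B returns to ASCII. The function names are hypothetical, and the 
stateful scanner only knows the four escape sequences listed in its 
comment.

```c
#include <assert.h>
#include <stddef.h>

/* "a<", then one two-byte character with trail byte 0x3C, then "b". */
const unsigned char iso2022jp_sample[] = {
	'a', '<', 0x1B, '$', 'B', 0x21, 0x3C, 0x1B, '(', 'B', 'b'
};

/* Count every 0x3C byte, ignoring the shift state. */
size_t count_lt_naive(const unsigned char *buf, size_t len)
{
	size_t i, n = 0;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] == '<')
			n++;
	return n;
}

/*
 * Count 0x3C only while in single-byte mode. Recognises ESC $ @ and
 * ESC $ B (enter two-byte mode) and ESC ( B and ESC ( J (leave it).
 */
size_t count_lt_stateful(const unsigned char *buf, size_t len)
{
	size_t i = 0, n = 0;
	int two_byte = 0;

	assert(buf != NULL);
	while (i < len) {
		if (buf[i] == 0x1B && i + 2 < len) {
			if (buf[i + 1] == '$' &&
			    (buf[i + 2] == '@' || buf[i + 2] == 'B')) {
				two_byte = 1;
				i += 3;
				continue;
			}
			if (buf[i + 1] == '(' &&
			    (buf[i + 2] == 'B' || buf[i + 2] == 'J')) {
				two_byte = 0;
				i += 3;
				continue;
			}
		}
		if (two_byte) {
			i += 2;		/* skip lead and trail byte together */
			continue;
		}
		if (buf[i] == '<')
			n++;
		i++;
	}
	return n;
}
```

On the sample, the naive scan sees two "<" bytes while the stateful one 
sees only the real one, so any byte-wise preprocessing would cut the 
two-byte character in half.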

-- 
\\// Peter - http://www.softwolves.pp.se/

  [1] Trying to send Internet e-mail from a system using the extended
      Swedish seven-bit encoding, where 0x40 mapped to "É", could
      sometimes be a challenge.
