git@vger.kernel.org mailing list mirror (one of many)
From: Max Kirillov <max@max630.net>
To: Eric Sunshine <sunshine@sunshineco.com>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: Max Kirillov <max@max630.net>, git@vger.kernel.org
Subject: [PATCH v2] utf8.c: print warning about iconv errors
Date: Sat, 15 Aug 2015 00:55:34 +0300
Message-ID: <1439589334-32318-1-git-send-email-max@max630.net>
In-Reply-To: <1433624551-20730-1-git-send-email-max@max630.net>

If reencoding text from one encoding to another fails, the original
version is used instead. Currently no warning is printed when reencoding
fails, so the returned data may be incorrect without the user being
aware of it.

Print a warning when conversion fails.

Also add a test script to assert that the warning is actually printed and
that the output is left unchanged, as expected.

Signed-off-by: Max Kirillov <max@max630.net>
---
Changes since v1:
* rebase onto recent changes
* add handling of runtime errors
* add a test
* do not limit the number of warnings; it is not worth complicating the code
* noticed that an incomplete UTF-8 sequence in the input is silently treated
  as latin1, so mark the test case as expect_failure. This is quite
  surprising; it would be nice if somebody tried it in various environments
As far as I could grep, all uses of the reencoding happen
only for printing, so it is probably not that important.
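
For anyone who wants to poke at the iconv behaviour outside of git, here
is a minimal standalone sketch of the same idea: keep errno intact across
cleanup, warn on failure, and fall back to a verbatim copy. It is not the
code from this patch. The names try_reencode() and reencode_or_copy() are
hypothetical, it uses fprintf() instead of git's warning(), and it uses a
fixed-size output buffer instead of the E2BIG growth loop in
reencode_string_iconv().

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not git's reencode_string_len(): convert the
 * NUL-terminated string `in` from in_enc to out_enc. Returns a malloc'd
 * string, or NULL with errno left as set by the failing call.
 */
static char *try_reencode(const char *in, const char *out_enc,
			  const char *in_enc)
{
	size_t inleft, outsz, outleft;
	char *out, *outp, *inp = (char *)in;
	iconv_t conv;

	assert(in && out_enc && in_enc);
	inleft = strlen(in);
	outsz = outleft = inleft * 4 + 16; /* fixed size; no E2BIG retry here */

	conv = iconv_open(out_enc, in_enc);
	if (conv == (iconv_t) -1)
		return NULL; /* errno set by iconv_open() */

	out = outp = malloc(outsz + 1);
	if (!out) {
		iconv_close(conv);
		errno = ENOMEM;
		return NULL;
	}
	/* stateful encodings would also need a final flush call; omitted */
	if (iconv(conv, &inp, &inleft, &outp, &outleft) == (size_t) -1) {
		int saved_errno = errno; /* cleanup below may clobber errno */
		free(out);
		iconv_close(conv);
		errno = saved_errno;
		return NULL;
	}
	*outp = '\0';
	iconv_close(conv);
	return out;
}

/* Hypothetical wrapper: warn on failure and return a verbatim copy. */
static char *reencode_or_copy(const char *in, const char *out_enc,
			      const char *in_enc)
{
	char *out = try_reencode(in, out_enc, in_enc);
	char *copy;

	if (out)
		return out;
	if (errno == EILSEQ || errno == EINVAL)
		fprintf(stderr, "warning: cannot convert from %s to %s, "
			"falling back to verbatim copy\n", in_enc, out_enc);
	else
		fprintf(stderr, "warning: conversion from %s to %s failed: %s, "
			"falling back to verbatim copy\n",
			in_enc, out_enc, strerror(errno));
	copy = malloc(strlen(in) + 1);
	if (copy)
		strcpy(copy, in);
	return copy;
}
```

Calling reencode_or_copy("\303\244", "ISO-8859-1", "UTF-8") should take the
success path, and an unknown encoding name should take the fallback path.
What happens for "\304\201" (does not fit into latin1) or for the
incomplete sequence "\303\244\304" may depend on the iconv implementation,
which is exactly the part that would be nice to have tried in various
environments.
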
 t/t3911-show-reencode.sh | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 utf8.c                   | 24 +++++++++++++++++++++++-
 utf8.h                   |  7 ++-----
 3 files changed, 71 insertions(+), 6 deletions(-)
 create mode 100755 t/t3911-show-reencode.sh

diff --git a/t/t3911-show-reencode.sh b/t/t3911-show-reencode.sh
new file mode 100755
index 0000000..061d820
--- /dev/null
+++ b/t/t3911-show-reencode.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='reencoding'
+
+. ./test-lib.sh
+
+printf '\304\201\n' >a_macron_utf8
+printf '\303\244\n' >a_diaeresis_utf8
+printf '\303\244\304\n' >incomplete_utf8
+printf '\344\n' >a_diaeresis_latin1
+
+test_expect_success 'setup' '
+	git commit --allow-empty -F a_diaeresis_utf8 &&
+	git tag latin1_utf8 &&
+	git commit --allow-empty -F a_macron_utf8 &&
+	git tag extended_utf8 &&
+	git commit --allow-empty -F incomplete_utf8 &&
+	git tag invalid_utf8
+'
+
+test_expect_success 'encoding to latin1' '
+	git log --encoding=latin1 --pretty=format:%B -1 latin1_utf8 >out 2>err &&
+	test_must_be_empty err &&
+	test_cmp out a_diaeresis_latin1
+'
+
+test_expect_success 'unknown encoding' '
+	git log --encoding=no-encoding --pretty=format:%B -1 latin1_utf8 >out 2>err &&
+	grep -q "not supported" err &&
+	test_cmp out a_diaeresis_utf8
+'
+
+# apparently incomplete UTF-8 byte sequences are silently treated as latin1
+test_expect_failure 'incomplete utf8' '
+	git log --encoding=latin1 --pretty=format:%B -1 invalid_utf8 >out 2>err &&
+	grep -q "Invalid input" err &&
+	test_cmp out incomplete_utf8
+'
+
+test_expect_success 'does not fit into latin1' '
+	git log --encoding=latin1 --pretty=format:%B -1 extended_utf8 >out 2>err &&
+	grep -q "Invalid input" err &&
+	test_cmp out a_macron_utf8
+'
+
+test_done
diff --git a/utf8.c b/utf8.c
index 28e6d76..d284bb0 100644
--- a/utf8.c
+++ b/utf8.c
@@ -465,7 +465,9 @@ char *reencode_string_iconv(const char *in, size_t insz, iconv_t conv, int *outs
 		if (cnt == (size_t) -1) {
 			size_t sofar;
 			if (errno != E2BIG) {
+				int failure_errno = errno;
 				free(out);
+				errno = failure_errno;
 				return NULL;
 			}
 			/* insz has remaining number of bytes.
@@ -513,14 +515,34 @@ char *reencode_string_len(const char *in, int insz,
 		if (is_encoding_utf8(out_encoding))
 			out_encoding = "UTF-8";
 		conv = iconv_open(out_encoding, in_encoding);
-		if (conv == (iconv_t) -1)
+		if (conv == (iconv_t) -1) {
+			if (errno == EINVAL)
+				warning("Conversion from %s to %s not supported, falling back to verbatim copy", in_encoding, out_encoding);
+			else
+				warning("Conversion from %s to %s failed: %s, falling back to verbatim copy", in_encoding, out_encoding, strerror(errno));
 			return NULL;
+		}
 	}
 
 	out = reencode_string_iconv(in, insz, conv, outsz);
+	if (out == NULL) {
+		if (errno == EILSEQ || errno == EINVAL)
+			warning("Invalid input for conversion from %s to %s, falling back to verbatim copy", in_encoding, out_encoding);
+		else
+			warning("Conversion from %s to %s failed: %s, falling back to verbatim copy", in_encoding, out_encoding, strerror(errno));
+	}
 	iconv_close(conv);
 	return out;
 }
+#else
+char *reencode_string_len(const char *in, int insz,
+			  const char *out_encoding, const char *in_encoding,
+			  int *outsz)
+{
+	if (!same_encoding(in_encoding, out_encoding))
+		warning("Iconv support is disabled at compile time. It is likely that\nincorrect data will be printed or stored in repository.\nConsider using other build for this task.");
+	return NULL;
+}
 #endif
 
 /*
diff --git a/utf8.h b/utf8.h
index 5a9e94b..c72998b 100644
--- a/utf8.h
+++ b/utf8.h
@@ -26,15 +26,12 @@ void strbuf_utf8_replace(struct strbuf *sb, int pos, int width,
 #ifndef NO_ICONV
 char *reencode_string_iconv(const char *in, size_t insz,
 			    iconv_t conv, int *outsz);
+#endif
+
 char *reencode_string_len(const char *in, int insz,
 			  const char *out_encoding,
 			  const char *in_encoding,
 			  int *outsz);
-#else
-static inline char *reencode_string_len(const char *a, int b,
-					const char *c, const char *d, int *e)
-{ if (e) *e = 0; return NULL; }
-#endif
 
 static inline char *reencode_string(const char *in,
 				    const char *out_encoding,
-- 
2.3.4.2801.g3d0809b
