git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: lars.schneider@autodesk.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, tboegi@web.de, j6t@kdbg.org,
	sunshine@sunshineco.com, peff@peff.net,
	ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
	pclouds@gmail.com, avarab@gmail.com,
	Lars Schneider <larsxschneider@gmail.com>
Subject: [PATCH v13 00/10] convert: add support for different encodings
Date: Sun, 15 Apr 2018 20:16:00 +0200	[thread overview]
Message-ID: <20180415181610.1612-1-lars.schneider@autodesk.com> (raw)

From: Lars Schneider <larsxschneider@gmail.com>

Hi,

Patches 1-6,9 are preparation and helper functions.
Patch 7,8,10 are the actual change.

This series is based on v2.16.0 and Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13).

The series can be rebased without conflicts on top of v2.17.0:
https://github.com/larsxschneider/git/tree/encoding-2.17

Changes since v12:

* commit message improvement (Torsten)
* prevent undefined memcpy behavior in has_bom_prefix (Avar)
* improve error message: true/false are no valid working-tree-encodings (Torsten)
* fix crash in same_encoding() if only one argument is NULL (this bug
  was already present before this series, Eric)

Thanks,
Lars

  RFC: https://public-inbox.org/git/BDB9B884-6D17-4BE3-A83C-F67E2AFA2B46@gmail.com/
   v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schneider@autodesk.com/
   v2: https://public-inbox.org/git/20171229152222.39680-1-lars.schneider@autodesk.com/
   v3: https://public-inbox.org/git/20180106004808.77513-1-lars.schneider@autodesk.com/
   v4: https://public-inbox.org/git/20180120152418.52859-1-lars.schneider@autodesk.com/
   v5: https://public-inbox.org/git/20180129201855.9182-1-tboegi@web.de/
   v6: https://public-inbox.org/git/20180209132830.55385-1-lars.schneider@autodesk.com/
   v7: https://public-inbox.org/git/20180215152711.158-1-lars.schneider@autodesk.com/
   v8: https://public-inbox.org/git/20180224162801.98860-1-lars.schneider@autodesk.com/
   v9: https://public-inbox.org/git/20180304201418.60958-1-lars.schneider@autodesk.com/
  v10: https://public-inbox.org/git/20180307173026.30058-1-lars.schneider@autodesk.com/
  v11: https://public-inbox.org/git/20180309173536.62012-1-lars.schneider@autodesk.com/
  v12: https://public-inbox.org/git/20180315225746.18119-1-lars.schneider@autodesk.com/

Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/3aa98e6975
Checkout: git fetch https://github.com/larsxschneider/git encoding-v13 && git checkout 3aa98e6975


### Interdiff (v12..v13):

diff --git a/convert.c b/convert.c
index 2a002af66d..1ae6301629 100644
--- a/convert.c
+++ b/convert.c
@@ -1222,7 +1222,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
 		return NULL;

 	if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
-		die(_("working-tree-encoding attribute requires a value"));
+		die(_("true/false are no valid working-tree-encodings"));
 	}

 	/* Don't encode to the default encoding */
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 884f0878b1..12b8eb963a 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -152,7 +152,7 @@ test_expect_success 'check unsupported encodings' '
 	echo "*.set text working-tree-encoding" >.gitattributes &&
 	printf "set" >t.set &&
 	test_must_fail git add t.set 2>err.out &&
-	test_i18ngrep "working-tree-encoding attribute requires a value" err.out &&
+	test_i18ngrep "true/false are no valid working-tree-encodings" err.out &&

 	echo "*.unset text -working-tree-encoding" >.gitattributes &&
 	printf "unset" >t.unset &&
diff --git a/utf8.c b/utf8.c
index 2d8821d36e..25d366d6b3 100644
--- a/utf8.c
+++ b/utf8.c
@@ -428,8 +428,12 @@ int is_encoding_utf8(const char *name)

 int same_encoding(const char *src, const char *dst)
 {
-	if (is_encoding_utf8(src) && is_encoding_utf8(dst))
-		return 1;
+	static const char utf8[] = "UTF-8";
+
+	if (!src)
+		src = utf8;
+	if (!dst)
+		dst = utf8;
 	if (same_utf_encoding(src, dst))
 		return 1;
 	return !strcasecmp(src, dst);
@@ -559,7 +563,7 @@ char *reencode_string_len(const char *in, int insz,
 static int has_bom_prefix(const char *data, size_t len,
 			  const char *bom, size_t bom_len)
 {
-	return (len >= bom_len) && !memcmp(data, bom, bom_len);
+	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
 }

 static const char utf16_be_bom[] = {0xFE, 0xFF};


### Patches

Lars Schneider (10):
  strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
  strbuf: add xstrdup_toupper()
  strbuf: add a case insensitive starts_with()
  utf8: teach same_encoding() alternative UTF encoding names
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: add function to detect a missing UTF-16/32 BOM
  convert: add 'working-tree-encoding' attribute
  convert: check for detectable errors in UTF encodings
  convert: add tracing for 'working-tree-encoding' attribute
  convert: add round trip check based on 'core.checkRoundtripEncoding'

 Documentation/config.txt         |   6 +
 Documentation/gitattributes.txt  |  88 +++++++++++++
 config.c                         |   5 +
 convert.c                        | 276 ++++++++++++++++++++++++++++++++++++++-
 convert.h                        |   2 +
 environment.c                    |   1 +
 git-compat-util.h                |   1 +
 sha1_file.c                      |   2 +-
 strbuf.c                         |  22 +++-
 strbuf.h                         |   1 +
 t/t0028-working-tree-encoding.sh | 245 ++++++++++++++++++++++++++++++++++
 utf8.c                           |  65 ++++++++-
 utf8.h                           |  28 ++++
 13 files changed, 737 insertions(+), 5 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh


base-commit: 8a2f0888555ce46ac87452b194dec5cb66fb1417
--
2.16.2


             reply	other threads:[~2018-04-15 18:16 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-15 18:16 lars.schneider [this message]
2018-04-15 18:16 ` [PATCH v13 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-04-15 18:16 ` [PATCH v13 02/10] strbuf: add xstrdup_toupper() lars.schneider
2018-04-15 18:16 ` [PATCH v13 03/10] strbuf: add a case insensitive starts_with() lars.schneider
2018-04-15 18:16 ` [PATCH v13 04/10] utf8: teach same_encoding() alternative UTF encoding names lars.schneider
2018-04-15 18:16 ` [PATCH v13 05/10] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-04-15 18:16 ` [PATCH v13 06/10] utf8: add function to detect a missing " lars.schneider
2018-04-15 18:16 ` [PATCH v13 07/10] convert: add 'working-tree-encoding' attribute lars.schneider
2018-04-15 18:16 ` [PATCH v13 08/10] convert: check for detectable errors in UTF encodings lars.schneider
2018-04-15 18:16 ` [PATCH v13 09/10] convert: add tracing for 'working-tree-encoding' attribute lars.schneider
2018-04-15 18:16 ` [PATCH v13 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180415181610.1612-1-lars.schneider@autodesk.com \
    --to=lars.schneider@autodesk.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=larsxschneider@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).