git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v7 0/7] convert: add support for different encodings
From: lars.schneider @ 2018-02-15 15:27 UTC
  To: git
  Cc: gitster, tboegi, j6t, sunshine, peff, ramsay, Johannes.Schindelin,
	Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Hi,

Patches 1-4, 6 are preparation and helper functions.
Patches 5 and 7 are the actual change.

This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is already
in master.

Changes since v6:

* use consistent casing for core.checkRoundtripEncoding (Junio)
* fix gibberish in commit message (Junio)
* improve documentation (Torsten)
* improve advice messages (Torsten)


Thanks,
Lars

  RFC: https://public-inbox.org/git/BDB9B884-6D17-4BE3-A83C-F67E2AFA2B46@gmail.com/
   v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schneider@autodesk.com/
   v2: https://public-inbox.org/git/20171229152222.39680-1-lars.schneider@autodesk.com/
   v3: https://public-inbox.org/git/20180106004808.77513-1-lars.schneider@autodesk.com/
   v4: https://public-inbox.org/git/20180120152418.52859-1-lars.schneider@autodesk.com/
   v5: https://public-inbox.org/git/20180129201855.9182-1-tboegi@web.de/
   v6: https://public-inbox.org/git/20180209132830.55385-1-lars.schneider@autodesk.com/


Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/2b94bec353
Checkout: git fetch https://github.com/larsxschneider/git encoding-v7 && git checkout 2b94bec353


### Interdiff (v6..v7):

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index ea5a9509c6..10cb37795d 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -291,19 +291,20 @@ the content is reencoded back to the specified encoding.
 Please note that using the `working-tree-encoding` attribute may have a
 number of pitfalls:

-- Git clients that do not support the `working-tree-encoding` attribute
-  will checkout the respective files UTF-8 encoded and not in the
-  expected encoding. Consequently, these files will appear different
-  which typically causes trouble. This is in particular the case for
-  older Git versions and alternative Git implementations such as JGit
-  or libgit2 (as of February 2018).
+- Third party Git implementations that do not support the
+  `working-tree-encoding` attribute will checkout the respective files
+  UTF-8 encoded and not in the expected encoding. Consequently, these
+  files will appear different which typically causes trouble. This is
+  in particular the case for older Git versions and alternative Git
+  implementations such as JGit or libgit2 (as of February 2018).

 - Reencoding content to non-UTF encodings can cause errors as the
   conversion might not be UTF-8 round trip safe. If you suspect your
-  encoding to not be round trip safe, then add it to `core.checkRoundtripEncoding`
-  to make Git check the round trip encoding (see linkgit:git-config[1]).
-  SHIFT-JIS (Japanese character set) is known to have round trip issues
-  with UTF-8 and is checked by default.
+  encoding to not be round trip safe, then add it to
+  `core.checkRoundtripEncoding` to make Git check the round trip
+  encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
+  set) is known to have round trip issues with UTF-8 and is checked by
+  default.

 - Reencoding content requires resources that might slow down certain
   Git operations (e.g 'git checkout' or 'git add').
@@ -327,7 +328,7 @@ explicitly define the line endings with `eol` if the `working-tree-encoding`
 attribute is used to avoid ambiguity.

 ------------------------
-*.proj 		working-tree-encoding=UTF-16LE text eol=CRLF
+*.proj 		text working-tree-encoding=UTF-16LE eol=CRLF
 ------------------------

 You can get a list of all available encodings on your platform with the
diff --git a/convert.c b/convert.c
index 71dffc7167..398cd9cf7b 100644
--- a/convert.c
+++ b/convert.c
@@ -352,29 +352,29 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,

 	if (has_prohibited_utf_bom(enc->name, src, src_len)) {
 		const char *error_msg = _(
-			"BOM is prohibited for '%s' if encoded as %s");
+			"BOM is prohibited in '%s' if encoded as %s");
+		/*
+		 * This advise is shown for UTF-??BE and UTF-??LE encodings.
+		 * We truncate the encoding name to 6 chars with %.6s to cut
+		 * off the last two "byte order" characters.
+		 */
 		const char *advise_msg = _(
-			"You told Git to treat '%s' as %s. A byte order mark "
-			"(BOM) is prohibited with this encoding. Either use "
-			"%.6s as working tree encoding or remove the BOM from the "
-			"file.");
-
-		advise(advise_msg, path, enc->name, enc->name, enc->name);
+			"The file '%s' contains a byte order mark (BOM). "
+			"Please use %.6s as working-tree-encoding.");
+		advise(advise_msg, path, enc->name);
 		if (conv_flags & CONV_WRITE_OBJECT)
 			die(error_msg, path, enc->name);
 		else
 			error(error_msg, path, enc->name);

-
 	} else if (is_missing_required_utf_bom(enc->name, src, src_len)) {
 		const char *error_msg = _(
-			"BOM is required for '%s' if encoded as %s");
+			"BOM is required in '%s' if encoded as %s");
 		const char *advise_msg = _(
-			"You told Git to treat '%s' as %s. A byte order mark "
-			"(BOM) is required with this encoding. Either use "
-			"%sBE/%sLE as working tree encoding or add a BOM to the "
-			"file.");
-		advise(advise_msg, path, enc->name, enc->name, enc->name);
+			"The file '%s' is missing a byte order mark (BOM). "
+			"Please use %sBE or %sLE (depending on the byte order) "
+			"as working-tree-encoding.");
+		advise(advise_msg, path, enc->name, enc->name);
 		if (conv_flags & CONV_WRITE_OBJECT)
 			die(error_msg, path, enc->name);
 		else
@@ -405,7 +405,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
 	 * Unicode aims to be a superset of all other character encodings.
 	 * However, certain encodings (e.g. SHIFT-JIS) are known to have round
 	 * trip issues [2]. Check the round trip conversion for all encodings
-	 * listed in core.checkRoundTripEncoding.
+	 * listed in core.checkRoundtripEncoding.
 	 *
 	 * The round trip check is only performed if content is written to Git.
 	 * This ensures that no information is lost during conversion to/from
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 5dcdd5f899..e4717402a5 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -221,10 +221,10 @@ test_expect_success 'check roundtrip encoding' '
 	git reset &&

 	# ... unless we overwrite the Git config!
-	test_config core.checkRoundTripEncoding "garbage" &&
+	test_config core.checkRoundtripEncoding "garbage" &&
 	! GIT_TRACE=1 git add .gitattributes roundtrip.shift 2>&1 >/dev/null |
 		grep "Checking roundtrip encoding for SHIFT-JIS" &&
-	test_unconfig core.checkRoundTripEncoding &&
+	test_unconfig core.checkRoundtripEncoding &&
 	git reset &&

 	# UTF-16 encoded files should not be round-trip checked by default...
@@ -233,14 +233,14 @@ test_expect_success 'check roundtrip encoding' '
 	git reset &&

 	# ... unless we tell Git to check it!
-	test_config_global core.checkRoundTripEncoding "UTF-16, UTF-32" &&
+	test_config_global core.checkRoundtripEncoding "UTF-16, UTF-32" &&
 	GIT_TRACE=1 git add roundtrip.utf16 2>&1 >/dev/null |
 		grep "Checking roundtrip encoding for UTF-16" &&
 	git reset &&

 	# ... unless we tell Git to check it!
 	# (here we also check that the casing of the encoding is irrelevant)
-	test_config_global core.checkRoundTripEncoding "UTF-32, utf-16" &&
+	test_config_global core.checkRoundtripEncoding "UTF-32, utf-16" &&
 	GIT_TRACE=1 git add roundtrip.utf16 2>&1 >/dev/null |
 		grep "Checking roundtrip encoding for UTF-16" &&
 	git reset &&
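
The new comment in convert.c explains that `%.6s` cuts the two "byte order" characters off a UTF-??BE or UTF-??LE encoding name. A minimal standalone sketch of that printf precision behaviour follows. The helper name `sketch_strip_byte_order` is hypothetical and not part of this series, and the sketch assumes the name is one of the eight-character UTF-16/32 BE/LE forms.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the series): copy at most the first
 * six characters of an encoding name into `out`. For "UTF-16LE" or
 * "UTF-32BE" this drops the trailing byte order suffix, which is what
 * the "%.6s" conversion in the advice message relies on.
 */
static void sketch_strip_byte_order(const char *enc, char *out, size_t out_len)
{
	/* A precision on %s limits how many bytes of the string are written. */
	snprintf(out, out_len, "%.6s", enc);
}
```

Calling it with "UTF-16LE" and a 16-byte buffer leaves "UTF-16" in the buffer. A name shorter than six characters is copied unchanged, so the truncation is only meaningful for the BE/LE variants.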


### Patches

Lars Schneider (7):
  strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
  strbuf: add xstrdup_toupper()
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: add function to detect a missing UTF-16/32 BOM
  convert: add 'working-tree-encoding' attribute
  convert: add tracing for 'working-tree-encoding' attribute
  convert: add round trip check based on 'core.checkRoundtripEncoding'

 Documentation/config.txt         |   6 +
 Documentation/gitattributes.txt  |  74 +++++++++++
 config.c                         |   5 +
 convert.c                        | 256 ++++++++++++++++++++++++++++++++++++++-
 convert.h                        |   2 +
 environment.c                    |   1 +
 sha1_file.c                      |   2 +-
 strbuf.c                         |  13 +-
 strbuf.h                         |   1 +
 t/t0028-working-tree-encoding.sh | 253 ++++++++++++++++++++++++++++++++++++++
 utf8.c                           |  37 ++++++
 utf8.h                           |  25 ++++
 12 files changed, 672 insertions(+), 3 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh
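
For readers who have not followed the earlier rounds, the two BOM rules behind the error messages in the interdiff (a BOM is prohibited for UTF-??BE and UTF-??LE, and required for plain UTF-16 and UTF-32) can be sketched roughly as below. This is an illustration only, not the code from patches 3 and 4: the `sketch_*` names are hypothetical, and it assumes the encoding name has already been upper-cased.

```c
#include <stddef.h>
#include <string.h>

/* Byte order marks for UTF-16 and UTF-32, big and little endian. */
static const char bom16_be[] = { '\xFE', '\xFF' };
static const char bom16_le[] = { '\xFF', '\xFE' };
static const char bom32_be[] = { '\0', '\0', '\xFE', '\xFF' };
static const char bom32_le[] = { '\xFF', '\xFE', '\0', '\0' };

/* Return 1 if the buffer begins with the given byte sequence. */
static int sketch_starts_with(const char *data, size_t len,
			      const char *bom, size_t bom_len)
{
	return data && len >= bom_len && !memcmp(data, bom, bom_len);
}

static int sketch_has_bom16(const char *data, size_t len)
{
	return sketch_starts_with(data, len, bom16_be, sizeof(bom16_be)) ||
	       sketch_starts_with(data, len, bom16_le, sizeof(bom16_le));
}

static int sketch_has_bom32(const char *data, size_t len)
{
	return sketch_starts_with(data, len, bom32_be, sizeof(bom32_be)) ||
	       sketch_starts_with(data, len, bom32_le, sizeof(bom32_le));
}

/* An explicit byte order in the name means the content must not carry a BOM. */
int sketch_has_prohibited_bom(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16BE") || !strcmp(enc, "UTF-16LE"))
		return sketch_has_bom16(data, len);
	if (!strcmp(enc, "UTF-32BE") || !strcmp(enc, "UTF-32LE"))
		return sketch_has_bom32(data, len);
	return 0;
}

/* Without a byte order in the name, the BOM is the only way to tell it. */
int sketch_is_missing_required_bom(const char *enc, const char *data, size_t len)
{
	if (!strcmp(enc, "UTF-16"))
		return !sketch_has_bom16(data, len);
	if (!strcmp(enc, "UTF-32"))
		return !sketch_has_bom32(data, len);
	return 0;
}
```

Encodings outside the UTF-16/32 family fall through both checks, so neither the error nor the advice applies to them in this sketch.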


base-commit: 8a2f0888555ce46ac87452b194dec5cb66fb1417
--
2.16.1


Thread overview: 41+ messages
2018-02-15 15:27 [PATCH v7 0/7] convert: add support for different encodings lars.schneider
2018-02-15 15:27 ` [PATCH v7 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-02-16 12:55   ` Ævar Arnfjörð Bjarmason
2018-02-16 18:45     ` Jeff King
2018-02-16 19:30       ` Junio C Hamano
2018-02-15 15:27 ` [PATCH v7 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-02-15 15:27 ` [PATCH v7 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-02-15 15:27 ` [PATCH v7 4/7] utf8: add function to detect a missing " lars.schneider
2018-02-15 15:27 ` [PATCH v7 5/7] convert: add 'working-tree-encoding' attribute lars.schneider
2018-02-15 15:27 ` [PATCH v7 6/7] convert: add tracing for " lars.schneider
2018-02-15 15:27 ` [PATCH v7 7/7] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-02-15 20:03 ` [PATCH v7 0/7] convert: add support for different encodings Junio C Hamano
2018-02-15 22:09   ` Jeff King
2018-02-16 18:55     ` Junio C Hamano
2018-02-16 19:25       ` Jeff King
2018-02-16 19:27         ` Jeff King
2018-02-16 19:41           ` Junio C Hamano
2018-02-21 18:06       ` Lars Schneider
2018-02-16 14:42   ` Lars Schneider
2018-02-16 16:58     ` Torsten Bögershausen
2018-02-22 20:00       ` Lars Schneider
2018-02-22 20:12         ` Jeff King
2018-02-23 16:35         ` Junio C Hamano
2018-02-23 20:11           ` Junio C Hamano
2018-02-24 15:18             ` Lars Schneider
2018-02-26  1:44               ` Jeff King
2018-02-26 17:35                 ` Torsten Bögershausen
2018-02-26 20:46                   ` Jeff King
2018-02-27 21:05                     ` Torsten Bögershausen
2018-02-27 21:25                       ` Jeff King
2018-02-27 21:55                         ` Junio C Hamano
2018-02-27 21:58                           ` Jeff King
2018-02-27 22:10                             ` Junio C Hamano
2018-02-27 22:20                               ` Jeff King
2018-02-28  8:20                         ` Torsten Bögershausen
2018-02-28 13:21                           ` Jeff King
2018-02-28 17:42                             ` Junio C Hamano
2018-03-01  7:49                               ` Jeff King
2018-03-04 10:16                             ` Torsten Bögershausen
2018-02-28 20:46                         ` Lars Schneider
2018-02-16 19:04     ` Junio C Hamano
