From: lars.schneider@autodesk.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, tboegi@web.de, j6t@kdbg.org,
sunshine@sunshineco.com, peff@peff.net,
ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
Lars Schneider <larsxschneider@gmail.com>
Subject: [PATCH v7 0/7] convert: add support for different encodings
Date: Thu, 15 Feb 2018 16:27:04 +0100
Message-ID: <20180215152711.158-1-lars.schneider@autodesk.com>
From: Lars Schneider <larsxschneider@gmail.com>
Hi,
Patches 1-4 and 6 are preparation and helper functions.
Patches 5 and 7 are the actual change.
This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13), which is already
in master.
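Patch 7 adds a round trip check: content is converted from the working-tree encoding to UTF-8 and back, and the result must match the original bytes (see the comment in `encode_to_git()` in the interdiff). The sketch below illustrates that idea with POSIX iconv(3). It is not Git's code; the helper names `convert_bytes()` and `round_trip_ok()` are hypothetical, and the output buffer bound is an assumption chosen for the UTF and SHIFT-JIS conversions discussed here.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert len bytes at src from encoding `from`
 * to encoding `to` with iconv(3). Returns a malloc'd buffer (caller
 * frees) and sets *out_len, or returns NULL if the conversion fails.
 */
static char *convert_bytes(const char *to, const char *from,
			   const char *src, size_t len, size_t *out_len)
{
	/*
	 * Assumed output bound, generous for the UTF and SHIFT-JIS
	 * conversions discussed here; running out of room (E2BIG)
	 * simply counts as a failed conversion.
	 */
	size_t cap = 4 * len + 16, in_left = len, out_left = cap;
	char *in_ptr = (char *)src, *out, *out_ptr;
	iconv_t cd = iconv_open(to, from);

	if (cd == (iconv_t)-1)
		return NULL; /* encoding pair not supported on this platform */
	out = malloc(cap);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}
	out_ptr = out;
	/*
	 * Invalid or incomplete input makes iconv() return (size_t)-1;
	 * the second call flushes any pending shift state.
	 */
	if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1 ||
	    iconv(cd, NULL, NULL, &out_ptr, &out_left) == (size_t)-1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}
	iconv_close(cd);
	*out_len = cap - out_left;
	return out;
}

/*
 * Hypothetical round trip check: working-tree encoding -> UTF-8 -> back.
 * Returns 1 only if the bytes come back unchanged.
 */
static int round_trip_ok(const char *enc, const char *src, size_t len)
{
	size_t utf8_len, back_len;
	char *utf8, *back;
	int ok;

	utf8 = convert_bytes("UTF-8", enc, src, len, &utf8_len);
	if (!utf8)
		return 0;
	back = convert_bytes(enc, "UTF-8", utf8, utf8_len, &back_len);
	free(utf8);
	if (!back)
		return 0;
	ok = back_len == len && memcmp(back, src, len) == 0;
	free(back);
	return ok;
}
```

For example, the four UTF-16LE bytes for "hi" should survive the trip, while an odd-length UTF-16LE buffer fails because iconv(3) reports an incomplete sequence. Git itself runs the check only when content is written to Git and only for the encodings selected by `core.checkRoundtripEncoding`; this sketch models neither condition.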
Changes since v6:
* use consistent casing for core.checkRoundtripEncoding (Junio)
* fix gibberish in commit message (Junio)
* improve documentation (Torsten)
* improve advice messages (Torsten)
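The reworded advice for a prohibited BOM relies on a printf precision: `%.6s` prints at most six characters of its string argument, so `UTF-16LE` is shown as `UTF-16`. A tiny illustration (the function name is hypothetical, not part of Git):

```c
#include <stdio.h>

/*
 * Hypothetical helper: format an encoding name the way the "%.6s"
 * conversion in the advice message does. The ".6" precision keeps at
 * most 6 characters, dropping the "BE"/"LE" suffix of "UTF-16LE".
 * Returns what snprintf() returns: the length of the formatted text.
 */
static int truncate_encoding_name(char *buf, size_t size, const char *enc)
{
	return snprintf(buf, size, "%.6s", enc);
}
```

The fixed width assumes the dashed spelling; an undashed name such as `UTF16LE` would be cut to `UTF16L`.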
Thanks,
Lars
RFC: https://public-inbox.org/git/BDB9B884-6D17-4BE3-A83C-F67E2AFA2B46@gmail.com/
v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schneider@autodesk.com/
v2: https://public-inbox.org/git/20171229152222.39680-1-lars.schneider@autodesk.com/
v3: https://public-inbox.org/git/20180106004808.77513-1-lars.schneider@autodesk.com/
v4: https://public-inbox.org/git/20180120152418.52859-1-lars.schneider@autodesk.com/
v5: https://public-inbox.org/git/20180129201855.9182-1-tboegi@web.de/
v6: https://public-inbox.org/git/20180209132830.55385-1-lars.schneider@autodesk.com/
Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/2b94bec353
Checkout: git fetch https://github.com/larsxschneider/git encoding-v7 && git checkout 2b94bec353
### Interdiff (v6..v7):
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index ea5a9509c6..10cb37795d 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -291,19 +291,20 @@ the content is reencoded back to the specified encoding.
Please note that using the `working-tree-encoding` attribute may have a
number of pitfalls:
-- Git clients that do not support the `working-tree-encoding` attribute
- will checkout the respective files UTF-8 encoded and not in the
- expected encoding. Consequently, these files will appear different
- which typically causes trouble. This is in particular the case for
- older Git versions and alternative Git implementations such as JGit
- or libgit2 (as of February 2018).
+- Third party Git implementations that do not support the
+ `working-tree-encoding` attribute will checkout the respective files
+ UTF-8 encoded and not in the expected encoding. Consequently, these
+ files will appear different which typically causes trouble. This is
+ in particular the case for older Git versions and alternative Git
+ implementations such as JGit or libgit2 (as of February 2018).
- Reencoding content to non-UTF encodings can cause errors as the
conversion might not be UTF-8 round trip safe. If you suspect your
- encoding to not be round trip safe, then add it to `core.checkRoundtripEncoding`
- to make Git check the round trip encoding (see linkgit:git-config[1]).
- SHIFT-JIS (Japanese character set) is known to have round trip issues
- with UTF-8 and is checked by default.
+ encoding to not be round trip safe, then add it to
+ `core.checkRoundtripEncoding` to make Git check the round trip
+ encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
+ set) is known to have round trip issues with UTF-8 and is checked by
+ default.
- Reencoding content requires resources that might slow down certain
Git operations (e.g 'git checkout' or 'git add').
@@ -327,7 +328,7 @@ explicitly define the line endings with `eol` if the `working-tree-encoding`
attribute is used to avoid ambiguity.
------------------------
-*.proj working-tree-encoding=UTF-16LE text eol=CRLF
+*.proj text working-tree-encoding=UTF-16LE eol=CRLF
------------------------
You can get a list of all available encodings on your platform with the
diff --git a/convert.c b/convert.c
index 71dffc7167..398cd9cf7b 100644
--- a/convert.c
+++ b/convert.c
@@ -352,29 +352,29 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
if (has_prohibited_utf_bom(enc->name, src, src_len)) {
const char *error_msg = _(
- "BOM is prohibited for '%s' if encoded as %s");
+ "BOM is prohibited in '%s' if encoded as %s");
+ /*
+ * This advise is shown for UTF-??BE and UTF-??LE encodings.
+ * We truncate the encoding name to 6 chars with %.6s to cut
+ * off the last two "byte order" characters.
+ */
const char *advise_msg = _(
- "You told Git to treat '%s' as %s. A byte order mark "
- "(BOM) is prohibited with this encoding. Either use "
- "%.6s as working tree encoding or remove the BOM from the "
- "file.");
-
- advise(advise_msg, path, enc->name, enc->name, enc->name);
+ "The file '%s' contains a byte order mark (BOM). "
+ "Please use %.6s as working-tree-encoding.");
+ advise(advise_msg, path, enc->name);
if (conv_flags & CONV_WRITE_OBJECT)
die(error_msg, path, enc->name);
else
error(error_msg, path, enc->name);
-
} else if (is_missing_required_utf_bom(enc->name, src, src_len)) {
const char *error_msg = _(
- "BOM is required for '%s' if encoded as %s");
+ "BOM is required in '%s' if encoded as %s");
const char *advise_msg = _(
- "You told Git to treat '%s' as %s. A byte order mark "
- "(BOM) is required with this encoding. Either use "
- "%sBE/%sLE as working tree encoding or add a BOM to the "
- "file.");
- advise(advise_msg, path, enc->name, enc->name, enc->name);
+ "The file '%s' is missing a byte order mark (BOM). "
+ "Please use %sBE or %sLE (depending on the byte order) "
+ "as working-tree-encoding.");
+ advise(advise_msg, path, enc->name, enc->name);
if (conv_flags & CONV_WRITE_OBJECT)
die(error_msg, path, enc->name);
else
@@ -405,7 +405,7 @@ static int encode_to_git(const char *path, const char *src, size_t src_len,
* Unicode aims to be a superset of all other character encodings.
* However, certain encodings (e.g. SHIFT-JIS) are known to have round
* trip issues [2]. Check the round trip conversion for all encodings
- * listed in core.checkRoundTripEncoding.
+ * listed in core.checkRoundtripEncoding.
*
* The round trip check is only performed if content is written to Git.
* This ensures that no information is lost during conversion to/from
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 5dcdd5f899..e4717402a5 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -221,10 +221,10 @@ test_expect_success 'check roundtrip encoding' '
git reset &&
# ... unless we overwrite the Git config!
- test_config core.checkRoundTripEncoding "garbage" &&
+ test_config core.checkRoundtripEncoding "garbage" &&
! GIT_TRACE=1 git add .gitattributes roundtrip.shift 2>&1 >/dev/null |
grep "Checking roundtrip encoding for SHIFT-JIS" &&
- test_unconfig core.checkRoundTripEncoding &&
+ test_unconfig core.checkRoundtripEncoding &&
git reset &&
# UTF-16 encoded files should not be round-trip checked by default...
@@ -233,14 +233,14 @@ test_expect_success 'check roundtrip encoding' '
git reset &&
# ... unless we tell Git to check it!
- test_config_global core.checkRoundTripEncoding "UTF-16, UTF-32" &&
+ test_config_global core.checkRoundtripEncoding "UTF-16, UTF-32" &&
GIT_TRACE=1 git add roundtrip.utf16 2>&1 >/dev/null |
grep "Checking roundtrip encoding for UTF-16" &&
git reset &&
# ... unless we tell Git to check it!
# (here we also check that the casing of the encoding is irrelevant)
- test_config_global core.checkRoundTripEncoding "UTF-32, utf-16" &&
+ test_config_global core.checkRoundtripEncoding "UTF-32, utf-16" &&
GIT_TRACE=1 git add roundtrip.utf16 2>&1 >/dev/null |
grep "Checking roundtrip encoding for UTF-16" &&
git reset &&
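The two branches of `encode_to_git()` in the interdiff encode a simple rule: a BOM is prohibited when the encoding name already fixes the byte order (`UTF-16BE`, `UTF-16LE`, `UTF-32BE`, `UTF-32LE`) and required when it does not (`UTF-16`, `UTF-32`). Git implements this in `utf8.c` (patches 3 and 4); the following is an independent sketch of the rule, assuming exact encoding names compared without regard to ASCII case, with hypothetical function names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Byte order marks as they appear at the start of the data. */
static const unsigned char bom_utf16_be[] = { 0xFE, 0xFF };
static const unsigned char bom_utf16_le[] = { 0xFF, 0xFE };
static const unsigned char bom_utf32_be[] = { 0x00, 0x00, 0xFE, 0xFF };
static const unsigned char bom_utf32_le[] = { 0xFF, 0xFE, 0x00, 0x00 };

/* ASCII case-insensitive string equality. */
static int equals_nocase(const char *a, const char *b)
{
	for (; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
	return *a == *b;
}

static int starts_with_bytes(const char *data, size_t len,
			     const unsigned char *prefix, size_t prefix_len)
{
	return len >= prefix_len && memcmp(data, prefix, prefix_len) == 0;
}

static int has_utf16_bom(const char *data, size_t len)
{
	return starts_with_bytes(data, len, bom_utf16_be, sizeof(bom_utf16_be)) ||
	       starts_with_bytes(data, len, bom_utf16_le, sizeof(bom_utf16_le));
}

static int has_utf32_bom(const char *data, size_t len)
{
	return starts_with_bytes(data, len, bom_utf32_be, sizeof(bom_utf32_be)) ||
	       starts_with_bytes(data, len, bom_utf32_le, sizeof(bom_utf32_le));
}

/* Hypothetical: the name fixes the byte order, so a BOM is an error. */
static int sketch_bom_prohibited(const char *enc, const char *data, size_t len)
{
	if (equals_nocase(enc, "UTF-16BE") || equals_nocase(enc, "UTF-16LE"))
		return has_utf16_bom(data, len);
	if (equals_nocase(enc, "UTF-32BE") || equals_nocase(enc, "UTF-32LE"))
		return has_utf32_bom(data, len);
	return 0;
}

/* Hypothetical: the name leaves the byte order open, so a BOM is needed. */
static int sketch_bom_missing(const char *enc, const char *data, size_t len)
{
	if (equals_nocase(enc, "UTF-16"))
		return !has_utf16_bom(data, len);
	if (equals_nocase(enc, "UTF-32"))
		return !has_utf32_bom(data, len);
	return 0;
}
```

In the interdiff, a positive result is fatal (`die()`) only when `conv_flags` contains `CONV_WRITE_OBJECT`; otherwise it is reported with `error()` together with an advice message.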
### Patches
Lars Schneider (7):
strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
strbuf: add xstrdup_toupper()
utf8: add function to detect prohibited UTF-16/32 BOM
utf8: add function to detect a missing UTF-16/32 BOM
convert: add 'working-tree-encoding' attribute
convert: add tracing for 'working-tree-encoding' attribute
convert: add round trip check based on 'core.checkRoundtripEncoding'
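The tests in the interdiff treat `core.checkRoundtripEncoding` as a comma-separated list (`"UTF-16, UTF-32"`) whose entries match regardless of case (`"UTF-32, utf-16"`), and setting it to `"garbage"` stops the default SHIFT-JIS check, so the configured value replaces the default. Below is a sketch of such a membership test, assuming entries are split on commas with surrounding blanks ignored; `encoding_listed()` is a hypothetical name and Git's own parsing may differ in details.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive comparison of s[0..len) with NUL-terminated enc. */
static int token_equals_nocase(const char *s, size_t len, const char *enc)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (!enc[i] ||
		    tolower((unsigned char)s[i]) != tolower((unsigned char)enc[i]))
			return 0;
	}
	return enc[len] == '\0';
}

/*
 * Hypothetical: return 1 if `enc` appears in the comma-separated `list`
 * (e.g. "UTF-32, utf-16"), ignoring ASCII case and blanks around entries.
 */
static int encoding_listed(const char *list, const char *enc)
{
	while (*list) {
		const char *start, *end;
		size_t len;

		/* Skip leading blanks of this entry. */
		while (*list == ' ' || *list == '\t')
			list++;
		start = list;
		end = strchr(start, ',');
		if (!end)
			end = start + strlen(start);
		/* The next entry begins after the comma (or at the NUL). */
		list = *end ? end + 1 : end;

		/* Trim trailing blanks, then compare the entry. */
		len = (size_t)(end - start);
		while (len && (start[len - 1] == ' ' || start[len - 1] == '\t'))
			len--;
		if (len && token_equals_nocase(start, len, enc))
			return 1;
	}
	return 0;
}
```

Matching whole entries (rather than substrings) keeps `UTF-16` from accidentally selecting `UTF-16LE`.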
Documentation/config.txt | 6 +
Documentation/gitattributes.txt | 74 +++++++++++
config.c | 5 +
convert.c | 256 ++++++++++++++++++++++++++++++++++++++-
convert.h | 2 +
environment.c | 1 +
sha1_file.c | 2 +-
strbuf.c | 13 +-
strbuf.h | 1 +
t/t0028-working-tree-encoding.sh | 253 ++++++++++++++++++++++++++++++++++++++
utf8.c | 37 ++++++
utf8.h | 25 ++++
12 files changed, 672 insertions(+), 3 deletions(-)
create mode 100755 t/t0028-working-tree-encoding.sh
base-commit: 8a2f0888555ce46ac87452b194dec5cb66fb1417
--
2.16.1