From: lars.schneider@autodesk.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, tboegi@web.de, j6t@kdbg.org,
sunshine@sunshineco.com, peff@peff.net,
ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
pclouds@gmail.com, avarab@gmail.com,
Lars Schneider <larsxschneider@gmail.com>
Subject: [PATCH v13 00/10] convert: add support for different encodings
Date: Sun, 15 Apr 2018 20:16:00 +0200
Message-ID: <20180415181610.1612-1-lars.schneider@autodesk.com>
From: Lars Schneider <larsxschneider@gmail.com>
Hi,
Patches 1-6 and 9 are preparation and helper functions.
Patches 7, 8, and 10 are the actual change.
This series is based on v2.16.0 and Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13).
The series can be rebased without conflicts on top of v2.17.0:
https://github.com/larsxschneider/git/tree/encoding-2.17
Changes since v12:
* commit message improvement (Torsten)
* prevent undefined memcmp behavior in has_bom_prefix() (Avar)
* improve error message: true/false are no valid working-tree-encodings (Torsten)
* fix crash in same_encoding() if only one argument is NULL (this bug
was already present before this series, Eric)
Thanks,
Lars
RFC: https://public-inbox.org/git/BDB9B884-6D17-4BE3-A83C-F67E2AFA2B46@gmail.com/
v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schneider@autodesk.com/
v2: https://public-inbox.org/git/20171229152222.39680-1-lars.schneider@autodesk.com/
v3: https://public-inbox.org/git/20180106004808.77513-1-lars.schneider@autodesk.com/
v4: https://public-inbox.org/git/20180120152418.52859-1-lars.schneider@autodesk.com/
v5: https://public-inbox.org/git/20180129201855.9182-1-tboegi@web.de/
v6: https://public-inbox.org/git/20180209132830.55385-1-lars.schneider@autodesk.com/
v7: https://public-inbox.org/git/20180215152711.158-1-lars.schneider@autodesk.com/
v8: https://public-inbox.org/git/20180224162801.98860-1-lars.schneider@autodesk.com/
v9: https://public-inbox.org/git/20180304201418.60958-1-lars.schneider@autodesk.com/
v10: https://public-inbox.org/git/20180307173026.30058-1-lars.schneider@autodesk.com/
v11: https://public-inbox.org/git/20180309173536.62012-1-lars.schneider@autodesk.com/
v12: https://public-inbox.org/git/20180315225746.18119-1-lars.schneider@autodesk.com/
Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/3aa98e6975
Checkout: git fetch https://github.com/larsxschneider/git encoding-v13 && git checkout 3aa98e6975
### Interdiff (v12..v13):
diff --git a/convert.c b/convert.c
index 2a002af66d..1ae6301629 100644
--- a/convert.c
+++ b/convert.c
@@ -1222,7 +1222,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
 		return NULL;
 
 	if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
-		die(_("working-tree-encoding attribute requires a value"));
+		die(_("true/false are no valid working-tree-encodings"));
 	}
 
 	/* Don't encode to the default encoding */
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 884f0878b1..12b8eb963a 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -152,7 +152,7 @@ test_expect_success 'check unsupported encodings' '
 	echo "*.set text working-tree-encoding" >.gitattributes &&
 	printf "set" >t.set &&
 	test_must_fail git add t.set 2>err.out &&
-	test_i18ngrep "working-tree-encoding attribute requires a value" err.out &&
+	test_i18ngrep "true/false are no valid working-tree-encodings" err.out &&
 
 	echo "*.unset text -working-tree-encoding" >.gitattributes &&
 	printf "unset" >t.unset &&
diff --git a/utf8.c b/utf8.c
index 2d8821d36e..25d366d6b3 100644
--- a/utf8.c
+++ b/utf8.c
@@ -428,8 +428,12 @@ int is_encoding_utf8(const char *name)
 
 int same_encoding(const char *src, const char *dst)
 {
-	if (is_encoding_utf8(src) && is_encoding_utf8(dst))
-		return 1;
+	static const char utf8[] = "UTF-8";
+
+	if (!src)
+		src = utf8;
+	if (!dst)
+		dst = utf8;
 	if (same_utf_encoding(src, dst))
 		return 1;
 	return !strcasecmp(src, dst);
@@ -559,7 +563,7 @@ char *reencode_string_len(const char *in, int insz,
 static int has_bom_prefix(const char *data, size_t len,
 			  const char *bom, size_t bom_len)
 {
-	return (len >= bom_len) && !memcmp(data, bom, bom_len);
+	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
 }
 
 static const char utf16_be_bom[] = {0xFE, 0xFF};
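
For readers who want to try the two utf8.c changes outside of the git tree, here is a minimal standalone sketch. It is not the code from the series: the `_sketch` names are made up for illustration, and `same_utf_encoding_sketch()` is only my assumption of what a "UTF-8" vs. "utf8" name comparison could look like. The NULL defaulting and the pointer guards mirror the interdiff above.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical stand-in for same_utf_encoding(): treat "UTF-16" and
 * "utf16" (with or without the dash, any case) as the same name.
 * This is an assumption for illustration, not the helper from the series.
 */
static int same_utf_encoding_sketch(const char *src, const char *dst)
{
	if (!strncasecmp(src, "utf", 3) && !strncasecmp(dst, "utf", 3)) {
		/* src[3] or dst[3] may be '\0'; the offsets stay in bounds */
		const char *s = src + (src[3] == '-' ? 4 : 3);
		const char *d = dst + (dst[3] == '-' ? 4 : 3);
		return !strcasecmp(s, d);
	}
	return 0;
}

/*
 * A NULL encoding name is treated as "UTF-8", so a single NULL
 * argument is never passed on to the string functions.
 */
static int same_encoding_sketch(const char *src, const char *dst)
{
	static const char utf8[] = "UTF-8";

	if (!src)
		src = utf8;
	if (!dst)
		dst = utf8;
	if (same_utf_encoding_sketch(src, dst))
		return 1;
	return !strcasecmp(src, dst);
}

/*
 * Check both pointers before memcmp(): passing NULL to memcmp() is
 * undefined behavior, even when the length is zero.
 */
static int has_bom_prefix_sketch(const char *data, size_t len,
				 const char *bom, size_t bom_len)
{
	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
}
```

With this sketch, `same_encoding_sketch(NULL, "utf8")` reports a match and `same_encoding_sketch("ISO-8859-1", NULL)` does not, while `has_bom_prefix_sketch(NULL, 0, bom, 2)` returns 0 without ever reaching memcmp().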
### Patches
Lars Schneider (10):
  strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
  strbuf: add xstrdup_toupper()
  strbuf: add a case insensitive starts_with()
  utf8: teach same_encoding() alternative UTF encoding names
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: add function to detect a missing UTF-16/32 BOM
  convert: add 'working-tree-encoding' attribute
  convert: check for detectable errors in UTF encodings
  convert: add tracing for 'working-tree-encoding' attribute
  convert: add round trip check based on 'core.checkRoundtripEncoding'

 Documentation/config.txt         |   6 +
 Documentation/gitattributes.txt  |  88 +++++++++++++
 config.c                         |   5 +
 convert.c                        | 276 ++++++++++++++++++++++++++++++++++++++-
 convert.h                        |   2 +
 environment.c                    |   1 +
 git-compat-util.h                |   1 +
 sha1_file.c                      |   2 +-
 strbuf.c                         |  22 +++-
 strbuf.h                         |   1 +
 t/t0028-working-tree-encoding.sh | 245 ++++++++++++++++++++++++++++++++++
 utf8.c                           |  65 ++++++++-
 utf8.h                           |  28 ++++
 13 files changed, 737 insertions(+), 5 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh
base-commit: 8a2f0888555ce46ac87452b194dec5cb66fb1417
--
2.16.2