From: "Torsten Bögershausen" <tboegi@web.de>
To: lars.schneider@autodesk.com
Cc: git@vger.kernel.org, gitster@pobox.com, j6t@kdbg.org,
sunshine@sunshineco.com, peff@peff.net,
ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
Lars Schneider <larsxschneider@gmail.com>
Subject: Re: [PATCH v3 0/7] convert: add support for different encodings
Date: Sun, 7 Jan 2018 10:38:15 +0100 [thread overview]
Message-ID: <20180107093815.GA7442@tor.lan> (raw)
In-Reply-To: <20180106004808.77513-1-lars.schneider@autodesk.com>
On Sat, Jan 06, 2018 at 01:48:01AM +0100, lars.schneider@autodesk.com wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> Hi,
>
> Patches 1-5 and 6 are helper functions and preparation.
> Patch 6 is the actual change.
>
> I am still torn between "checkout-encoding" and "working-tree-encoding"
> as attribute name. I am happy to hear arguments for/against one or the
> other.
checkout-encoding is probably misleading, as it is even the checkin-encoding.
What is wrong with working-tree-encoding ?
I think the 2 "-".
What was wrong with workingtree-encoding ?
Or
workdir-encoding ?
>
> Changes since v2:
>
> * Added Torsten's crlfsave refactoring patch (patch 5)
> @Torsten: I tried to make the commit message more clean, added
> some comments to and renamed conv_flags_eol to
> global_conv_flags_eol.
>
> * Improved documentation and commit message (Torsten)
Good, thanks.
>
> * Removed unnecessary NUL assignment in xstrdup_tolower() (Torsten)
>
> * Set "git config core.eol lf" to made the test run on Windows (Dscho)
>
> * Made BOM arrays static (Ramsay)
Some comments:
I would like to have the CRLF conversion a little bit more strict -
many users tend to set core.autocrlf=true or write "* text=auto"
in the .gitattributes.
Reading all the effort about BOM markers and UTF-16LE, I think there
should ne some effort to make the line endings round trip.
Therefore I changed convert.c to demand that the "text" attribute
is set to enable CRLF conversions.
(If I had submitted the patch, I would have demanded
"text eol=lf" or "text eol=crlf", but the test case t0028 indicates
that there is a demand to produce line endings as configured in core.eol)
Anyway, I rebased it onto git.git/master, changed the docu, and pushed it to
https://github.com/tboegi/git/tree/180107-0935-For-lars-schneider-encode-V3B
Here is a inter-diff against your version:
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 1bc03e69c..b8d9f91c8 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -281,7 +281,7 @@ interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front ends do not
visualize the content.
-In these cases you can teach Git the encoding of a file in the working
+In these cases you can tell Git the encoding of a file in the working
directory with the `checkout-encoding` attribute. If a file with this
attributes is added to Git, then Git reencodes the content from the
specified encoding to UTF-8 and stores the result in its internal data
@@ -308,17 +308,20 @@ Use the `checkout-encoding` attribute only if you cannot store a file in
UTF-8 encoding and if you want Git to be able to process the content as
text.
+Note that when `checkout-encoding` is defined, by default the line
+endings are not converted. `text=auto` and core.autocrlf are ignored.
+Set the `text` attribute to enable CRLF conversions.
+
Use the following attributes if your '*.txt' files are UTF-16 encoded
-with byte order mark (BOM) and you want Git to perform automatic line
-ending conversion based on your platform.
+with byte order mark (BOM).
------------------------
-*.txt text checkout-encoding=UTF-16
+*.txt checkout-encoding=UTF-16
------------------------
Use the following attributes if your '*.txt' files are UTF-16 little
-endian encoded without BOM and you want Git to use Windows line endings
-in the working directory.
+endian encoded without BOM and you want Git to use LF in the repo and
+CRLF in the working directory.
------------------------
*.txt checkout-encoding=UTF-16LE text eol=CRLF
diff --git a/convert.c b/convert.c
index 13f766d2a..1e29f515e 100644
--- a/convert.c
+++ b/convert.c
@@ -221,18 +221,27 @@ static void check_global_conv_flags_eol(const char *path, enum crlf_action crlf_
}
}
static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
@@ -432,7 +441,7 @@ static int crlf_to_git(const struct index_state *istate,
* cherry-pick.
*/
if ((!(conv_flags & CONV_EOL_RENORMALIZE)) &&
- has_cr_in_index(istate, path))
+ has_crlf_in_index(istate, path))
convert_crlf_into_lf = 0;
}
if (((conv_flags & CONV_EOL_RNDTRP_WARN) ||
@@ -1214,9 +1223,28 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
ca->crlf_action = git_path_check_crlf(ccheck + 0);
ca->ident = git_path_check_ident(ccheck + 1);
ca->drv = git_path_check_convert(ccheck + 2);
+ ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
if (ca->crlf_action != CRLF_BINARY) {
enum eol eol_attr = git_path_check_eol(ccheck + 3);
- if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
+ if (ca->checkout_encoding) {
+ enum crlf_action crlf_action = CRLF_BINARY;
+ /*
+ * encoded files don't use auto.
+ * 'text' must be specified to
+ * do crlf conversions
+ */
+ if (ca->crlf_action == CRLF_TEXT) {
+ if (eol_attr == EOL_LF)
+ crlf_action = CRLF_TEXT_INPUT;
+ else if (eol_attr == EOL_CRLF)
+ crlf_action = CRLF_TEXT_CRLF;
+ else if (text_eol_is_crlf())
+ crlf_action = CRLF_TEXT_CRLF;
+ else
+ crlf_action = CRLF_TEXT_INPUT;
+ }
+ ca->crlf_action = crlf_action;
+ } else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
ca->crlf_action = CRLF_AUTO_INPUT;
else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF)
ca->crlf_action = CRLF_AUTO_CRLF;
@@ -1225,11 +1253,11 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
else if (eol_attr == EOL_CRLF)
ca->crlf_action = CRLF_TEXT_CRLF;
}
- ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
} else {
ca->drv = NULL;
ca->crlf_action = CRLF_UNDEFINED;
ca->ident = 0;
+ ca->checkout_encoding = NULL;
}
/* Save attr and make a decision for action */
next prev parent reply other threads:[~2018-01-07 9:38 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-06 0:48 [PATCH v3 0/7] convert: add support for different encodings lars.schneider
2018-01-06 0:48 ` [PATCH v3 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-01-06 0:48 ` [PATCH v3 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-01-06 0:48 ` [PATCH v3 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-01-06 0:48 ` [PATCH v3 4/7] utf8: add function to detect a missing " lars.schneider
2018-01-06 0:48 ` [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags lars.schneider
2018-01-08 21:28 ` Junio C Hamano
2018-01-08 22:47 ` Lars Schneider
2018-01-08 23:17 ` Junio C Hamano
2018-01-09 6:20 ` Torsten Bögershausen
2018-01-06 0:48 ` [PATCH v3 6/7] convert: add support for 'checkout-encoding' attribute lars.schneider
2018-01-06 0:48 ` [PATCH v3 7/7] convert: add tracing for checkout-encoding lars.schneider
2018-01-07 9:38 ` Torsten Bögershausen [this message]
2018-01-08 14:38 ` [PATCH v3 0/7] convert: add support for different encodings Lars Schneider
2018-01-08 18:08 ` Torsten Bögershausen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180107093815.GA7442@tor.lan \
--to=tboegi@web.de \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=j6t@kdbg.org \
--cc=lars.schneider@autodesk.com \
--cc=larsxschneider@gmail.com \
--cc=peff@peff.net \
--cc=ramsay@ramsayjones.plus.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).