git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: "Lars Schneider" <lars.schneider@autodesk.com>,
	"Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Johannes Sixt" <j6t@kdbg.org>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Jeff King" <peff@peff.net>,
	"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
	Johannes.Schindelin@gmx.de, "Patrick Lühne" <patrick@luehne.de>
Subject: Re: [PATCH v3 0/7] convert: add support for different encodings
Date: Mon, 8 Jan 2018 15:38:48 +0100	[thread overview]
Message-ID: <B51940E6-95AB-424A-AF62-0018E9934279@gmail.com> (raw)
In-Reply-To: <20180107093815.GA7442@tor.lan>


> On 07 Jan 2018, at 10:38, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> On Sat, Jan 06, 2018 at 01:48:01AM +0100, lars.schneider@autodesk.com wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> Hi,
>> 
>> Patches 1-5 and 6 are helper functions and preparation.
>> Patch 6 is the actual change.
>> 
>> I am still torn between "checkout-encoding" and "working-tree-encoding"
>> as attribute name. I am happy to hear arguments for/against one or the
>> other.
> 
> checkout-encoding is probably misleading, as it is even the checkin-encoding.

Yeah, I start to think the same.


> What is wrong with working-tree-encoding ?
> I think the 2 "-".
> 
> What was wrong with workingtree-encoding ?

Yeah, the two dashes are a minor annoyance.

However, consider this:

$ git grep 'working tree' -- '*.txt' | wc -l
     570

$ git grep 'working-tree' -- '*.txt' | wc -l
       6

$ git grep 'workingtree' -- '*.txt' | wc -l
       0


$ git grep 'working tree' -- po | wc -l
     704

$ git grep 'working-tree' -- po | wc -l
       0

$ git grep 'workingtree' -- po | wc -l
       0

I think "working tree" is a pretty established term that
endusers might be able to understand. Therefore, I would
like to go with "working-tree-encoding" as it was written
that way at least 6 times in the Git tree before.

Would that work for you?


> Or
> workdir-encoding ?

Although I like the shortness, the term "workdir" might already 
be occupied [1]. Could that cause confusion?

[1] 4f01748d51 (contrib/workdir: add a simple script to create a working directory, 2007-03-27)


>> 
>> * Removed unnecessary NUL assignment in xstrdup_tolower() (Torsten)
>> 
>> * Set "git config core.eol lf" to made the test run on Windows (Dscho)
>> 
>> * Made BOM arrays static (Ramsay)
> 
> 
> Some comments:
> 
> I would like to have the CRLF conversion a little bit more strict -
> many users tend to set core.autocrlf=true or write "* text=auto"
> in the .gitattributes.
> Reading all the effort about BOM markers and UTF-16LE, I think there
> should ne some effort to make the line endings round trip.
> Therefore I changed convert.c to demand that the "text" attribute
> is set to enable CRLF conversions.
> (If I had submitted the patch, I would have demanded
> "text eol=lf" or "text eol=crlf", but the test case t0028 indicates
> that there is a demand to produce line endings as configured in core.eol)

But wouldn't that be inconvenient for the users? E.g. if I add a UTF-16
file on Windows with CRLF then it would be nice if Git would automatically
convert the line endings to LF on Linux, no?

IOW: Why should we handle text files that have a defined checkout-encoding
differently compared to UTF-8 encoded text files? Wouldn't that be unexpected
to the user?

Thanks,
Lars



> 
> Anyway, I rebased it onto git.git/master, changed the docu, and pushed it to
> https://github.com/tboegi/git/tree/180107-0935-For-lars-schneider-encode-V3B
> 
> Here is a inter-diff against your version:
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 1bc03e69c..b8d9f91c8 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -281,7 +281,7 @@ interpreted as binary and consequently built-in Git text processing
>  tools (e.g. 'git diff') as well as most Git web front ends do not
>  visualize the content.
> 
> -In these cases you can teach Git the encoding of a file in the working
> +In these cases you can tell Git the encoding of a file in the working

Oops. I meant to change that already. Thanks!

>  directory with the `checkout-encoding` attribute. If a file with this
>  attributes is added to Git, then Git reencodes the content from the
>  specified encoding to UTF-8 and stores the result in its internal data
> @@ -308,17 +308,20 @@ Use the `checkout-encoding` attribute only if you cannot store a file in
>  UTF-8 encoding and if you want Git to be able to process the content as
>  text.
> 
> +Note that when `checkout-encoding` is defined, by default the line
> +endings are not converted. `text=auto` and core.autocrlf are ignored.
> +Set the `text` attribute to enable CRLF conversions.
> +
>  Use the following attributes if your '*.txt' files are UTF-16 encoded
> -with byte order mark (BOM) and you want Git to perform automatic line
> -ending conversion based on your platform.
> +with byte order mark (BOM).
> 
>  ------------------------
> -*.txt		text checkout-encoding=UTF-16
> +*.txt		checkout-encoding=UTF-16
>  ------------------------
> 
>  Use the following attributes if your '*.txt' files are UTF-16 little
> -endian encoded without BOM and you want Git to use Windows line endings
> -in the working directory.
> +endian encoded without BOM and you want Git to use LF in the repo and
> +CRLF in the working directory.
> 
>  ------------------------
>  *.txt 		checkout-encoding=UTF-16LE text eol=CRLF
> diff --git a/convert.c b/convert.c
> index 13f766d2a..1e29f515e 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -221,18 +221,27 @@ static void check_global_conv_flags_eol(const char *path, enum crlf_action crlf_
>  	}
>  }
> 
> 
>  static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
> @@ -432,7 +441,7 @@ static int crlf_to_git(const struct index_state *istate,
>  		 * cherry-pick.
>  		 */
>  		if ((!(conv_flags & CONV_EOL_RENORMALIZE)) &&
> -		    has_cr_in_index(istate, path))
> +		    has_crlf_in_index(istate, path))
>  			convert_crlf_into_lf = 0;
>  	}
>  	if (((conv_flags & CONV_EOL_RNDTRP_WARN) ||
> @@ -1214,9 +1223,28 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
>  			ca->crlf_action = git_path_check_crlf(ccheck + 0);
>  		ca->ident = git_path_check_ident(ccheck + 1);
>  		ca->drv = git_path_check_convert(ccheck + 2);
> +		ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
>  		if (ca->crlf_action != CRLF_BINARY) {
>  			enum eol eol_attr = git_path_check_eol(ccheck + 3);
> -			if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
> +			if (ca->checkout_encoding) {
> +				enum crlf_action crlf_action = CRLF_BINARY;
> +				/*
> +				 * encoded files don't use auto.
> +				 * 'text' must be specified to
> +				 * do crlf conversions
> +				 */
> +				if (ca->crlf_action == CRLF_TEXT) {
> +					if (eol_attr == EOL_LF)
> +						crlf_action = CRLF_TEXT_INPUT;
> +					else if (eol_attr == EOL_CRLF)
> +						crlf_action = CRLF_TEXT_CRLF;
> +					else if (text_eol_is_crlf())
> +						crlf_action = CRLF_TEXT_CRLF;
> +					else
> +						crlf_action = CRLF_TEXT_INPUT;
> +				}
> +				ca->crlf_action = crlf_action;
> +			} else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
>  				ca->crlf_action = CRLF_AUTO_INPUT;
>  			else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF)
>  				ca->crlf_action = CRLF_AUTO_CRLF;
> @@ -1225,11 +1253,11 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
>  			else if (eol_attr == EOL_CRLF)
>  				ca->crlf_action = CRLF_TEXT_CRLF;
>  		}
> -		ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
>  	} else {
>  		ca->drv = NULL;
>  		ca->crlf_action = CRLF_UNDEFINED;
>  		ca->ident = 0;
> +		ca->checkout_encoding = NULL;
>  	}
> 
>  	/* Save attr and make a decision for action */
> 
> 


  reply	other threads:[~2018-01-08 14:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-06  0:48 [PATCH v3 0/7] convert: add support for different encodings lars.schneider
2018-01-06  0:48 ` [PATCH v3 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-01-06  0:48 ` [PATCH v3 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-01-06  0:48 ` [PATCH v3 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-01-06  0:48 ` [PATCH v3 4/7] utf8: add function to detect a missing " lars.schneider
2018-01-06  0:48 ` [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags lars.schneider
2018-01-08 21:28   ` Junio C Hamano
2018-01-08 22:47     ` Lars Schneider
2018-01-08 23:17       ` Junio C Hamano
2018-01-09  6:20       ` Torsten Bögershausen
2018-01-06  0:48 ` [PATCH v3 6/7] convert: add support for 'checkout-encoding' attribute lars.schneider
2018-01-06  0:48 ` [PATCH v3 7/7] convert: add tracing for checkout-encoding lars.schneider
2018-01-07  9:38 ` [PATCH v3 0/7] convert: add support for different encodings Torsten Bögershausen
2018-01-08 14:38   ` Lars Schneider [this message]
2018-01-08 18:08     ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B51940E6-95AB-424A-AF62-0018E9934279@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=lars.schneider@autodesk.com \
    --cc=patrick@luehne.de \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).