git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v2] autocrlf: Make it work also for un-normalized repositories
@ 2010-05-11 22:37 Finn Arne Gangstad
  2010-05-12  6:16 ` Dmitry Potapov
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Finn Arne Gangstad @ 2010-05-11 22:37 UTC
  To: git, msysgit
  Cc: Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov,
	Johannes Schindelin

Previously, autocrlf would only work well for normalized
repositories. Any text files that contained CRLF in the repository
would cause problems, and would be modified when handled with
core.autocrlf set.

Change autocrlf to not do any conversions to files that in the
repository already contain a CR. git with autocrlf set will never
create such a file, or change a LF only file to contain CRs, so the
(new) assumption is that if a file contains a CR, it is intentional,
and autocrlf should not change that.

The following sequence should now always be a NOP even with autocrlf
set (assuming a clean working directory):

git checkout <something>
touch *
git add -A .    (will add nothing)
git commit      (nothing to commit)

Previously this would break for any text file containing a CR.

Some of you may have been following Eyvind's excellent thread about
trying to make end-of-line translation in git a bit smoother.

I decided to attack the problem from a different angle: Is it possible
to make autocrlf behave non-destructively for all the previous problem cases?

Stealing the problem from Eyvind's initial mail (paraphrased and
summarized a bit):

1. Setting autocrlf globally is a pain since autocrlf does not work well
   with CRLF in the repo
2. Setting it in individual repos is hard since you do it "too late"
   (the clone will get it wrong)
3. If someone checks in a file with CRLF later, you get into problems again
4. If a repository once has contained CRLF, you can't tell autocrlf
   at which commit everything is sane again
5. autocrlf does needless work if you know that all your users want
   the same EOL style.

I believe that this patch makes autocrlf a safe (and good) default
setting for Windows, and this solves problems 1-4 (it solves 2 by being
set by default, which is early enough for clone).

I implemented it by looking for CR characters in the index, and
aborting any conversion attempt if one is found.
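
As a rough illustration of that check (a standalone sketch, not the
patch itself): the decision comes down to scanning the content recorded
in the index for a CR byte. The names blob_has_cr and
should_convert_to_git are invented for this example, and the index
lookup that the real patch performs is left out; the caller is assumed
to pass in the indexed content directly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the buffer contains any CR byte. */
static int blob_has_cr(const void *data, size_t size)
{
	return memchr(data, '\r', size) != NULL;
}

/*
 * Hypothetical decision function: given the content currently recorded
 * in the index for a path (NULL if the path is not in the index), say
 * whether CRLF->LF conversion may be applied when staging the file.
 */
static int should_convert_to_git(const void *indexed, size_t size)
{
	if (!indexed)
		return 1;	/* not in the index: usual autocrlf rules apply */
	return !blob_has_cr(indexed, size);
}
```

A path with no index entry falls through to the normal autocrlf
behaviour, which is what lets a newly added CRLF file still be stored
with LF.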

Signed-off-by: Finn Arne Gangstad <finnag@pvv.org>
---

Changes since v1:

Only check for CRs in the index if we're in CRLF_GUESS mode. If any
crlf attributes are set, we assume the attributes are set correctly
and do not scan for CRs.

Added tests. Note that I only ran these on Linux, so I'm not 100% sure
they are written portably.

I have talked a bit to Eyvind about his eol-patch, so this is
rewritten slightly to be easier to merge with his changes.


 convert.c       |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 t/t0020-crlf.sh |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/convert.c b/convert.c
index 4f8fcb7..3458bca 100644
--- a/convert.c
+++ b/convert.c
@@ -120,6 +120,43 @@ static void check_safe_crlf(const char *path, int action,
 	}
 }
 
+static int has_cr_in_index(const char *path)
+{
+	int pos, len;
+	unsigned long sz;
+	enum object_type type;
+	void *data;
+	int has_cr;
+	struct index_state *istate = &the_index;
+
+	len = strlen(path);
+	pos = index_name_pos(istate, path, len);
+	if (pos < 0) {
+		/*
+		 * We might be in the middle of a merge, in which
+		 * case we would read stage #2 (ours).
+		 */
+		int i;
+		for (i = -pos - 1;
+		     (pos < 0 && i < istate->cache_nr &&
+		      !strcmp(istate->cache[i]->name, path));
+		     i++)
+			if (ce_stage(istate->cache[i]) == 2)
+				pos = i;
+	}
+	if (pos < 0)
+		return 0;
+	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return 0;
+	}
+
+	has_cr = memchr(data, '\r', sz) != NULL;
+	free(data);
+	return has_cr;
+}
+
 static int crlf_to_git(const char *path, const char *src, size_t len,
                        struct strbuf *buf, int action, enum safe_crlf checksafe)
 {
@@ -145,6 +182,13 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 		 */
 		if (is_binary(len, &stats))
 			return 0;
+
+		/*
+		 * If the file in the index has any CR in it, do not convert.
+		 * This is the new safer autocrlf handling.
+		 */
+		if (has_cr_in_index(path))
+			return 0;
 	}
 
 	check_safe_crlf(path, action, &stats, checksafe);
@@ -203,6 +247,11 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 		return 0;
 
 	if (action == CRLF_GUESS) {
+		/* If we have any CR or CRLF line endings, we do not touch it */
+		/* This is the new safer autocrlf-handling */
+		if (stats.cr > 0 || stats.crlf > 0)
+			return 0;
+		
 		/* If we have any bare CR characters, we're not going to touch it */
 		if (stats.cr != stats.crlf)
 			return 0;
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index c3e7e32..f425302 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -453,5 +453,59 @@ test_expect_success 'invalid .gitattributes (must not crash)' '
 	git diff
 
 '
+# Some more tests here to add new autocrlf functionality.
+# We want to have a known state here, so start a bit from scratch
+
+test_expect_success 'setting up for new autocrlf tests' '
+
+	git config core.autocrlf false &&
+	git config core.safecrlf false &&
+	rm -rf .????* * &&
+ 	for w in I am all LF; do echo $w; done >alllf &&
+	for w in Oh here is CRLFQ in text; do echo $w; done | q_to_cr >mixed &&
+	for w in I am all CRLF; do echo $w; done | append_cr >allcrlf &&
+	git add -A . &&
+	git commit -m "alllf, allcrlf and mixed only" &&
+	git tag -a -m "message" autocrlf-checkpoint
+'
+
+test_expect_success 'setting autocrlf gives empty diff' '
+
+	git config core.autocrlf true && 
+	touch * &&
+	git diff --exit-code
+'
+
+test_expect_success 'files are clean after checkout' '
+	rm * &&
+	git checkout -f &&
+	git diff --exit-code
+'
+
+cr_to_Q_no_NL () {
+    tr '\015' Q | tr -d '\012'
+}
+
+test_expect_success 'LF only file gets CRLF with autocrlf' '
+	[ $(cr_to_Q_no_NL < alllf) = "IQamQallQLFQ" ]
+'
+
+test_expect_success 'Mixed file is still mixed with autocrlf' '
+	[ $(cr_to_Q_no_NL < mixed) = "OhhereisCRLFQintext" ]
+'
+
+test_expect_success 'CRLF only file has CRLF with autocrlf' '
+	[ $(cr_to_Q_no_NL < allcrlf) = "IQamQallQCRLFQ" ]
+'
+
+test_expect_success 'New CRLF file gets LF in repo' '
+	tr -d "\015" < alllf | append_cr > alllf2 &&
+	git add alllf2 &&
+	git commit -m "alllf2 added" &&
+	git config core.autocrlf false &&
+	rm * &&
+	git checkout -f &&
+	test_cmp alllf alllf2 
+'
 
 test_done
-- 
1.7.1.1.g653e8

* Re: [PATCH v2] autocrlf: Make it work also for un-normalized repositories
  2010-05-11 22:37 [PATCH v2] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
@ 2010-05-12  6:16 ` Dmitry Potapov
  2010-05-12  6:33   ` Eyvind Bernhardsen
  2010-05-12  9:26 ` Hnerik Grubbström
  2010-05-23  9:30 ` Clemens Buchacher
  2 siblings, 1 reply; 5+ messages in thread
From: Dmitry Potapov @ 2010-05-12  6:16 UTC
  To: Finn Arne Gangstad
  Cc: git, msysgit, Eyvind Bernhardsen, Junio C Hamano,
	Johannes Schindelin

On Wed, May 12, 2010 at 12:37:57AM +0200, Finn Arne Gangstad wrote:
> @@ -203,6 +247,11 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
>  		return 0;
>  
>  	if (action == CRLF_GUESS) {
> +		/* If we have any CR or CRLF line endings, we do not touch it */
> +		/* This is the new safer autocrlf-handling */
> +		if (stats.cr > 0 || stats.crlf > 0)
> +			return 0;
> +		
>  		/* If we have any bare CR characters, we're not going to touch it */
>  		if (stats.cr != stats.crlf)
>  			return 0;

If there is no CR then there is no CRLF and certainly no bare CR
characters. So, all above checks can be replaced with one:

		if (stats.cr > 0)
			return 0;

Other than that, I really like your patch.
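
To spell the argument out, here is a small standalone sketch. The
struct and function names are invented for illustration, and it assumes
(as the stats in convert.c appear to) that cr counts every CR byte and
crlf counts only CRs immediately followed by LF, so crlf can never
exceed cr.

```c
#include <stddef.h>

struct eol_stats {
	unsigned cr;	/* every CR byte, whether or not an LF follows */
	unsigned crlf;	/* CR bytes immediately followed by LF */
};

/* Hypothetical counter mirroring the assumption stated above. */
static struct eol_stats count_eol(const char *buf, size_t len)
{
	struct eol_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] != '\r')
			continue;
		st.cr++;
		if (i + 1 < len && buf[i + 1] == '\n')
			st.crlf++;
	}
	return st;
}

/* The two tests as written in the patch. */
static int skip_two_checks(struct eol_stats st)
{
	if (st.cr > 0 || st.crlf > 0)
		return 1;
	if (st.cr != st.crlf)
		return 1;
	return 0;
}

/* The single test suggested above. */
static int skip_one_check(struct eol_stats st)
{
	return st.cr > 0;
}
```

Under that assumption both functions agree for every input: crlf > 0
implies cr > 0, and cr != crlf can only hold when cr > crlf, which
again means cr > 0.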


Thanks,
Dmitry

* Re: [PATCH v2] autocrlf: Make it work also for un-normalized repositories
  2010-05-12  6:16 ` Dmitry Potapov
@ 2010-05-12  6:33   ` Eyvind Bernhardsen
  0 siblings, 0 replies; 5+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-12  6:33 UTC
  To: Dmitry Potapov
  Cc: Finn Arne Gangstad, git@vger.kernel.org, msysgit@googlegroups.com,
	Junio C Hamano, Johannes Schindelin

On 12. mai 2010, at 08.16, Dmitry Potapov <dpotapov@gmail.com> wrote:

> On Wed, May 12, 2010 at 12:37:57AM +0200, Finn Arne Gangstad wrote:
>> @@ -203,6 +247,11 @@ static int crlf_to_worktree(const char *path,  
>> const char *src, size_t len,
>>        return 0;
>>
>>    if (action == CRLF_GUESS) {
>> +        /* If we have any CR or CRLF line endings, we do not touch  
>> it */
>> +        /* This is the new safer autocrlf-handling */
>> +        if (stats.cr > 0 || stats.crlf > 0)
>> +            return 0;
>> +
>>        /* If we have any bare CR characters, we're not going to  
>> touch it */
>>        if (stats.cr != stats.crlf)
>>            return 0;
>
> If there is no CR then there is no CRLF and certainly no bare CR
> characters. So, all above checks can be replaced with one:
>
>        if (stats.cr > 0)
>            return 0;
>
> Other than that, I really like your patch.

Keeping the tests separate helps merging with my patch. The idea is to  
not do the "safe autocrlf" test when crlf=auto, so in that case the CR  
test is still needed.
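
A rough sketch of how the two tests could end up side by side after
such a merge. This is only a guess at the shape: the enum, its MODE_*
names and the function are hypothetical and do not come from either
patch.

```c
/* Hypothetical modes: guessing from core.autocrlf vs. explicit crlf=auto. */
enum crlf_mode { MODE_GUESS, MODE_AUTO };

/*
 * Returns nonzero if the LF->CRLF conversion on checkout should be
 * skipped. cr is the number of CR bytes, crlf the number of CR LF pairs.
 */
static int skip_worktree_conversion(enum crlf_mode mode,
				    unsigned cr, unsigned crlf)
{
	/* "Safe autocrlf" test: only applied when guessing. */
	if (mode == MODE_GUESS && (cr > 0 || crlf > 0))
		return 1;

	/* Bare-CR test: still needed when the test above is bypassed. */
	if (cr != crlf)
		return 1;

	return 0;
}
```

With the tests kept separate, the auto case would still convert a file
whose only CRs are part of CRLF pairs, while a file with a bare CR is
left alone in both modes.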
-- 
Eyvind

* Re: [PATCH v2] autocrlf: Make it work also for un-normalized repositories
  2010-05-11 22:37 [PATCH v2] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
  2010-05-12  6:16 ` Dmitry Potapov
@ 2010-05-12  9:26 ` Hnerik Grubbström
  2010-05-23  9:30 ` Clemens Buchacher
  2 siblings, 0 replies; 5+ messages in thread
From: Hnerik Grubbström @ 2010-05-12  9:26 UTC
  To: git

Finn Arne Gangstad <finnag <at> pvv.org> writes:

> Previously, autocrlf would only work well for normalized
> repositories. Any text files that contained CRLF in the repository
> would cause problems, and would be modified when handled with
> core.autocrlf set.
> 
> The following sequence should now always be a NOP even with autocrlf
> set (assuming a clean working directory):
> 
> git checkout <something>
> touch *
> git add -A .    (will add nothing)
> git commit      (nothing to commit)

Please note that the above problem is solved for the general case with the
"[PATCH v3 0/5] Patches to avoid reporting conversion changes." patch set.

--
/grubba

* Re: [PATCH v2] autocrlf: Make it work also for un-normalized repositories
  2010-05-11 22:37 [PATCH v2] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
  2010-05-12  6:16 ` Dmitry Potapov
  2010-05-12  9:26 ` Hnerik Grubbström
@ 2010-05-23  9:30 ` Clemens Buchacher
  2 siblings, 0 replies; 5+ messages in thread
From: Clemens Buchacher @ 2010-05-23  9:30 UTC
  To: Finn Arne Gangstad
  Cc: git, msysgit, Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov,
	Johannes Schindelin

On Wed, May 12, 2010 at 12:37:57AM +0200, Finn Arne Gangstad wrote:

> Change autocrlf to not do any conversions to files that in the
> repository already contain a CR. git with autocrlf set will never
> create such a file, or change a LF only file to contain CRs, so the
> (new) assumption is that if a file contains a CR, it is intentional,
> and autocrlf should not change that.

I think this is a good change. But it only covers the part where we
translate CRLF -> LF when staging changes. With Eyvind's patches, if
I understand correctly, it will be possible to convert files to
have LF line endings. Such files will be translated from LF -> CRLF
when adding changes.

So if the file already has LF line endings, will this cause the
same problem the other way around?

Clemens
