git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: tboegi@web.de
To: git@vger.kernel.org, asottile@umich.edu
Cc: "Torsten Bögershausen" <tboegi@web.de>
Subject: [PATCH v4 2/2] File commited with CRLF should roundtrip diff and apply
Date: Sat, 19 Aug 2017 13:28:01 +0200	[thread overview]
Message-ID: <20170819112801.18564-1-tboegi@web.de> (raw)
In-Reply-To: <xmqqa832vymw.fsf@gitster.mtv.corp.google.com>

From: Torsten Bögershausen <tboegi@web.de>

When a file had been commited with CRLF but now .gitattributes say
"* text=auto" (or core.autocrlf is true),
the following does not roundtrip, `git apply` fails:

printf "Added line\r\n" >>file &&
git diff >patch &&
git checkout -- . &&
git apply patch

Before applying the patch, the file from working tree is converted into the
index format (clean filter, CRLF conversion, ...)
Here, when commited with CRLF, the line endings should not be converted.

Note that `git apply --index` or `git apply --cache` doesn't call
convert_to_git() because the source material is already in index format.

Analyze the patch if there is a) any context line with CRLF,
or b) if any line with CRLF is to be removed.
In this case the patch file `patch` has mixed line endings, for a)
it looks like this:

 diff --git a/one b/one
 index 533790e..c30dea8 100644
 --- a/one
 +++ b/one
 @@ -1 +1,2 @@
  a\r
 +b\r

And for b) it looks like this:

 diff --git a/one b/one
 index 533790e..485540d 100644
 --- a/one
 +++ b/one
 @@ -1 +1 @@
 -a\r
 +b\r

If `git apply` detects that the patch itself has CRLF, (look at the line
" a\r" or "-a\r" above), the new flag crlf_in_old is set in "struct patch"
and two things will happen:
- read_old_data() will not convert CRLF into LF by calling
  convert_to_git(..., SAFE_CRLF_KEEP_CRLF);
- The WS_CR_AT_EOL bit is set in the "white space rule",
  CRLF are no longer treated as white space.

While at there, make clear that read_old_data() in apply.c
knows what it wants convert_to_git() to do with respect to CRLF.  In
fact, this codepath is about applying a patch to a file in the
filesystem, which may not exist in the index, or may exist but may
not match what is recorded in the index, or in the extreme case, we
may not even be in a Git repository.  If convert_to_git() peeked at
the index while doing its work, it *would* be a bug.

Pass NULL instead of &the_index to convert_to_git() to make sure we
catch future bugs to clarify this.

Update the test in t4124: split one test case into 3:
- Detect the " a\r" line in the patch
- Detect the "-a\r" line in the patch
- Use LF in repo and CLRF in the worktree.

Reported-by: Anthony Sottile <asottile@umich.edu>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---

Changes since v3:
- took apply.c from junio/tb/apply-with-crlf
- Remove the leading asterix in the commit message, at the place
  where the "git diff" is cited.
- Mention "Pass NULL instead of &the_index to convert_to_git()"

apply.c                  | 41 ++++++++++++++++++++++++++++++++++++-----
 t/t4124-apply-ws-rule.sh | 33 +++++++++++++++++++++++++++------
 2 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/apply.c b/apply.c
index f2d599141d..66c68f193a 100644
--- a/apply.c
+++ b/apply.c
@@ -220,6 +220,7 @@ struct patch {
 	unsigned int recount:1;
 	unsigned int conflicted_threeway:1;
 	unsigned int direct_to_threeway:1;
+	unsigned int crlf_in_old:1;
 	struct fragment *fragments;
 	char *result;
 	size_t resultsize;
@@ -1662,6 +1663,19 @@ static void check_whitespace(struct apply_state *state,
 	record_ws_error(state, result, line + 1, len - 2, state->linenr);
 }
 
+/*
+ * Check if the patch has context lines with CRLF or
+ * the patch wants to remove lines with CRLF.
+ */
+static void check_old_for_crlf(struct patch *patch, const char *line, int len)
+{
+	if (len >= 2 && line[len-1] == '\n' && line[len-2] == '\r') {
+		patch->ws_rule |= WS_CR_AT_EOL;
+		patch->crlf_in_old = 1;
+	}
+}
+
+
 /*
  * Parse a unified diff. Note that this really needs to parse each
  * fragment separately, since the only way to know the difference
@@ -1712,11 +1726,14 @@ static int parse_fragment(struct apply_state *state,
 			if (!deleted && !added)
 				leading++;
 			trailing++;
+			check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action == correct_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
 			break;
 		case '-':
+			if (!state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -1725,6 +1742,8 @@ static int parse_fragment(struct apply_state *state,
 			trailing = 0;
 			break;
 		case '+':
+			if (state->apply_in_reverse)
+				check_old_for_crlf(patch, line, len);
 			if (!state->apply_in_reverse &&
 			    state->ws_error_action != nowarn_ws_error)
 				check_whitespace(state, line, len, patch->ws_rule);
@@ -2268,8 +2287,11 @@ static void show_stats(struct apply_state *state, struct patch *patch)
 		add, pluses, del, minuses);
 }
 
-static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
+static int read_old_data(struct stat *st, struct patch *patch,
+			 const char *path, struct strbuf *buf)
 {
+	enum safe_crlf safe_crlf = patch->crlf_in_old ?
+		SAFE_CRLF_KEEP_CRLF : SAFE_CRLF_RENORMALIZE;
 	switch (st->st_mode & S_IFMT) {
 	case S_IFLNK:
 		if (strbuf_readlink(buf, path, st->st_size) < 0)
@@ -2278,7 +2300,15 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(&the_index, path, buf->buf, buf->len, buf, 0);
+		/*
+		 * "git apply" without "--index/--cached" should never look
+		 * at the index; the target file may not have been added to
+		 * the index yet, and we may not even be in any Git repository.
+		 * Pass NULL to convert_to_git() to stress this; the function
+		 * should never look at the index when explicit crlf option
+		 * is given.
+		 */
+		convert_to_git(NULL, path, buf->buf, buf->len, buf, safe_crlf);
 		return 0;
 	default:
 		return -1;
@@ -3384,6 +3414,7 @@ static int load_patch_target(struct apply_state *state,
 			     struct strbuf *buf,
 			     const struct cache_entry *ce,
 			     struct stat *st,
+			     struct patch *patch,
 			     const char *name,
 			     unsigned expected_mode)
 {
@@ -3399,7 +3430,7 @@ static int load_patch_target(struct apply_state *state,
 		} else if (has_symlink_leading_path(name, strlen(name))) {
 			return error(_("reading from '%s' beyond a symbolic link"), name);
 		} else {
-			if (read_old_data(st, name, buf))
+			if (read_old_data(st, patch, name, buf))
 				return error(_("failed to read %s"), name);
 		}
 	}
@@ -3432,7 +3463,7 @@ static int load_preimage(struct apply_state *state,
 		/* We have a patched copy in memory; use that. */
 		strbuf_add(&buf, previous->result, previous->resultsize);
 	} else {
-		status = load_patch_target(state, &buf, ce, st,
+		status = load_patch_target(state, &buf, ce, st, patch,
 					   patch->old_name, patch->old_mode);
 		if (status < 0)
 			return status;
@@ -3520,7 +3551,7 @@ static int load_current(struct apply_state *state,
 	if (verify_index_match(ce, &st))
 		return error(_("%s: does not match index"), name);
 
-	status = load_patch_target(state, &buf, ce, &st, name, mode);
+	status = load_patch_target(state, &buf, ce, &st, patch, name, mode);
 	if (status < 0)
 		return status;
 	else if (status)
diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index d350065f25..4fc27c51f7 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -467,21 +467,42 @@ test_expect_success 'same, but with CR-LF line endings && cr-at-eol set' '
 	test_cmp one expect
 '
 
-test_expect_success 'same, but with CR-LF line endings && cr-at-eol unset' '
+test_expect_success 'CR-LF line endings && add line && text=auto' '
 	git config --unset core.whitespace &&
 	printf "a\r\n" >one &&
+	cp one save-one &&
+	git add one &&
 	printf "b\r\n" >>one &&
-	printf "c\r\n" >>one &&
+	cp one expect &&
+	git diff -- one >patch &&
+	mv save-one one &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
+
+test_expect_success 'CR-LF line endings && change line && text=auto' '
+	printf "a\r\n" >one &&
 	cp one save-one &&
-	printf "                 \r\n" >>one &&
 	git add one &&
+	printf "b\r\n" >one &&
 	cp one expect &&
-	printf "d\r\n" >>one &&
 	git diff -- one >patch &&
 	mv save-one one &&
-	echo d >>expect &&
+	echo "one text=auto" >.gitattributes &&
+	git apply patch &&
+	test_cmp one expect
+'
 
-	git apply --ignore-space-change --whitespace=fix patch &&
+test_expect_success 'LF in repo, CRLF in worktree && change line && text=auto' '
+	printf "a\n" >one &&
+	git add one &&
+	printf "b\r\n" >one &&
+	git diff -- one >patch &&
+	printf "a\r\n" >one &&
+	echo "one text=auto" >.gitattributes &&
+	git -c core.eol=CRLF apply patch &&
+	printf "b\r\n" >expect &&
 	test_cmp one expect
 '
 
-- 
2.14.0.rc1.15.gd40c2d4e85.dirty


  parent reply	other threads:[~2017-08-19 11:28 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-01 18:24 core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Anthony Sottile
2017-08-01 20:47 ` Torsten Bögershausen
2017-08-01 20:58   ` Anthony Sottile
2017-08-02 15:44     ` Torsten Bögershausen
2017-08-02 20:42       ` [PATCH v1 1/1] correct apply for files commited with CRLF tboegi
2017-08-02 21:13         ` Junio C Hamano
2017-08-04 17:31           ` Torsten Bögershausen
2017-08-04 17:57             ` Junio C Hamano
2017-08-04 19:26               ` Junio C Hamano
2017-08-02 20:58       ` core.autocrlf=true causes `git apply` to fail on patch generated with `git diff-index HEAD --patch` Junio C Hamano
2017-08-12  5:45         ` Torsten Bögershausen
2017-08-12  5:53           ` Torsten Bögershausen
2017-08-12 14:56         ` [PATCH/RFC] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-12 14:56         ` [PATCH/RFC] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-14 17:37           ` Junio C Hamano
2017-08-17  6:06             ` [PATCH v2 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-17  6:06             ` [PATCH v2 2/2] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-17  6:37               ` Junio C Hamano
2017-08-17 21:43             ` [PATCH v3 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-17 21:43             ` [PATCH v3 2/2] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-17 22:29               ` Junio C Hamano
2017-08-17 22:35               ` Junio C Hamano
2017-08-19 11:27             ` [PATCH v4 1/2] convert: Add SAFE_CRLF_KEEP_CRLF tboegi
2017-08-19 11:28             ` tboegi [this message]
2017-08-13  8:51         ` [PATCH/RFC " tboegi
2017-08-13  8:51         ` [PATCH/RFC 2/2] File commited with CRLF should roundtrip diff and apply tboegi
2017-08-16 18:28           ` Junio C Hamano
2017-08-16 18:28           ` [PATCH/FIXUP 3/2] apply: remove unused member apply_state.flags Junio C Hamano
2017-08-16 18:29           ` [PATCH/FIXUP 4/2] apply: only pay attention to CRLF in the preimage Junio C Hamano
2017-08-16 18:30           ` [PATCH/FIXUP 5/2] apply: localize the CRLF business to read_old_data() Junio C Hamano
2017-08-16 18:34           ` [PATCH/FIXUP 6/2] apply: clarify read_old_data() is about no-index case Junio C Hamano
2017-08-17  6:24             ` Torsten Bögershausen
2017-08-17  7:06               ` Junio C Hamano
2017-08-17  7:12               ` Junio C Hamano
2017-08-17  8:24                 ` Torsten Bögershausen
2017-08-17 17:07                 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170819112801.18564-1-tboegi@web.de \
    --to=tboegi@web.de \
    --cc=asottile@umich.edu \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).