git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug report: Duplicate CRLF rewrite warnings on commit
@ 2016-05-13 13:49 Adam Dinwoodie
  2016-05-13 16:43 ` Junio C Hamano
                   ` (11 more replies)
  0 siblings, 12 replies; 53+ messages in thread
From: Adam Dinwoodie @ 2016-05-13 13:49 UTC (permalink / raw)
  To: git

If you use .gitattributes to enable CRLF->LF rewriting, then commit a
file that would have its line endings rewritten, the "CRLF will be
replaced by LF" warning is printed several times over; I'd expect it to
be printed only once.

There's a test case in t0020 -- "safecrlf: print warning only once" --
that checks the warning is only printed once when using `git add`, but
nothing that seems to check the same thing on `git commit`.  The
unnecessary multiple warnings seem to have existed since at least v1.6.0
(the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
couldn't get that to build on my current box), and I'm seeing them on my
Cygwin box's build of the current next branch (d10caa2) and on my CentOS
box's v2.8.1 release.

Example:

    $ git init
    Initialized empty Git repository in /home/Adam/test/.git/

    $ echo '* text' >.gitattributes

    $ echo 'lf-terminated line' >text

    $ git add .gitattributes text && git commit -m 'Initial commit'
    [master (root-commit) 9a18d39] Initial commit
     2 files changed, 2 insertions(+)
     create mode 100644 .gitattributes
     create mode 100644 text

    $ echo 'crlf-terminated line' | unix2dos >text

    $ git add text  # Single CRLF warning as expected
    warning: CRLF will be replaced by LF in text.
    The file will have its original line endings in your working directory.

    $ git commit -m 'CRLF'  # Should have one CRLF warning, actually get two
    warning: CRLF will be replaced by LF in text.
    The file will have its original line endings in your working directory.
    [master 4a8b1cb] CRLF
    warning: CRLF will be replaced by LF in text.
    The file will have its original line endings in your working directory.
     1 file changed, 1 insertion(+), 1 deletion(-)

(Tangentially: what's the accepted practice for submitting failing test
scripts?  I've written a short test case to add to t0020 that shows this
bugged behaviour, but I've got the vague impression from past emails
that leading with the patch email adding the failing test case is not
the expected way to do things on this list...)

Cheers,

Adam

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Bug report: Duplicate CRLF rewrite warnings on commit
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
@ 2016-05-13 16:43 ` Junio C Hamano
  2016-05-14  5:40   ` Torsten Bögershausen
  2016-05-13 18:12 ` Jeff King
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-13 16:43 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, Adam Dinwoodie

Adam Dinwoodie <adam@dinwoodie.org> writes:

> If you use .gitattributes to enable CRLF->LF rewriting, then commit a
> file that would have its line endings rewritten, the "CRLF will be
> replaced by LF" warning is printed several times over; I'd expect it to
> be printed only once.
>
> There's a test case in t0020 -- "safecrlf: print warning only once" --
> that checks the warning is only printed once when using `git add`, but
> nothing that seems to check the same thing on `git commit`.  The
> unnecessary multiple warnings seem to have existed since at least v1.6.0
> (the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
> couldn't get that to build on my current box), and I'm seeing them on my
> Cygwin box's build of the current next branch (d10caa2) and on my CentOS
> box's v2.8.1 release.

Torsten, I know you are not heavily invested in "commit" codepath,
but since you've been actively touching the CRLF area, I wonder
perhaps you might be interested in taking a look?

>
> Example:
>
>     $ git init
>     Initialized empty Git repository in /home/Adam/test/.git/
>
>     $ echo '* text' >.gitattributes
>
>     $ echo 'lf-terminated line' >text
>
>     $ git add .gitattributes text && git commit -m 'Initial commit'
>     [master (root-commit) 9a18d39] Initial commit
>      2 files changed, 2 insertions(+)
>      create mode 100644 .gitattributes
>      create mode 100644 text
>
>     $ echo 'crlf-terminated line' | unix2dos >text
>
>     $ git add text  # Single CRLF warning as expected
>     warning: CRLF will be replaced by LF in text.
>     The file will have its original line endings in your working directory.
>
>     $ git commit -m 'CRLF'  # Should have one CRLF warning, actually get two
>     warning: CRLF will be replaced by LF in text.
>     The file will have its original line endings in your working directory.
>     [master 4a8b1cb] CRLF
>     warning: CRLF will be replaced by LF in text.
>     The file will have its original line endings in your working directory.
>      1 file changed, 1 insertion(+), 1 deletion(-)
>
> (Tangentially: what's the accepted practice for submitting failing test
> scripts?  I've written a short test case to add to t0020 that shows this
> bugged behaviour, but I've got the vague impression from past emails
> that leading with the patch email adding the failing test case is not
> the expected way to do things on this list...)
>
> Cheers,
>
> Adam

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Bug report: Duplicate CRLF rewrite warnings on commit
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
  2016-05-13 16:43 ` Junio C Hamano
@ 2016-05-13 18:12 ` Jeff King
  2016-05-13 19:46   ` Junio C Hamano
  2016-05-15  6:08 ` [PATCH/RFC v1 0/1] Quickfix ?No duplicate " tboegi
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2016-05-13 18:12 UTC (permalink / raw)
  To: Adam Dinwoodie; +Cc: git

On Fri, May 13, 2016 at 02:49:53PM +0100, Adam Dinwoodie wrote:

> (Tangentially: what's the accepted practice for submitting failing test
> scripts?  I've written a short test case to add to t0020 that shows this
> bugged behaviour, but I've got the vague impression from past emails
> that leading with the patch email adding the failing test case is not
> the expected way to do things on this list...)

We don't want commits that fail the test suite, since it makes bisection
more difficult. But you can mark known bugs like:

   test_expect_failure 'git-foo should output bar' '
	...
   '

I think it's OK to submit a patch like that. Hopefully somebody picks
that up and combines it with their fix patch, but if not, then it at
least documents the failure for later generations.

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Bug report: Duplicate CRLF rewrite warnings on commit
  2016-05-13 18:12 ` Jeff King
@ 2016-05-13 19:46   ` Junio C Hamano
  2016-05-13 19:53     ` Jeff King
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-13 19:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Adam Dinwoodie, git

Jeff King <peff@peff.net> writes:

> On Fri, May 13, 2016 at 02:49:53PM +0100, Adam Dinwoodie wrote:
>
>> (Tangentially: what's the accepted practice for submitting failing test
>> scripts?  I've written a short test case to add to t0020 that shows this
>> bugged behaviour, but I've got the vague impression from past emails
>> that leading with the patch email adding the failing test case is not
>> the expected way to do things on this list...)
>
> We don't want commits that fail the test suite, since it makes bisection
> more difficult. But you can mark known bugs like:
>
>    test_expect_failure 'git-foo should output bar' '
> 	...
>    '
>
> I think it's OK to submit a patch like that. Hopefully somebody picks
> that up and combines it with their fix patch, but if not, then it at
> least documents the failure for later generations.

... but we do not want to overdo this.

An expect_failure helps when somebody is actually starting to work
on a breakage by setting a goal, but otherwise nobody would look at
them.  I do not think we know if 120+ instances of expect_failure in
t/ we already have are still all valid; some of them even might be
requesting a behaviour that we decided is a wrong expectation later.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Bug report: Duplicate CRLF rewrite warnings on commit
  2016-05-13 19:46   ` Junio C Hamano
@ 2016-05-13 19:53     ` Jeff King
  0 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2016-05-13 19:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Adam Dinwoodie, git

On Fri, May 13, 2016 at 12:46:57PM -0700, Junio C Hamano wrote:

> > We don't want commits that fail the test suite, since it makes bisection
> > more difficult. But you can mark known bugs like:
> >
> >    test_expect_failure 'git-foo should output bar' '
> > 	...
> >    '
> >
> > I think it's OK to submit a patch like that. Hopefully somebody picks
> > that up and combines it with their fix patch, but if not, then it at
> > least documents the failure for later generations.
> 
> ... but we do not want to overdo this.
> 
> An expect_failure helps when somebody is actually starting to work
> on a breakage by setting a goal, but otherwise nobody would look at
> them.  I do not think we know if 120+ instances of expect_failure in
> t/ we already have are still all valid; some of them even might be
> requesting a behaviour that we decided is a wrong expectation later.

True. I almost wrote more on this, but held back. But since you seem
interested... :)

I think that it's a really nice thing to write up your test as
test_expect_failure and post it to the list as a patch that is ready to
be applied, _even if we don't apply it_. Because that format makes it
super-easy for somebody who does want to work on it to just "git am"
your patch and focus on fixing it.

And your patch can either go at the front of their series, or they
can squash it in and acknowledge you in the commit message. But either
way it reduces the amount of work they have to do (whether they work on
the bug soon, or much later).

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Bug report: Duplicate CRLF rewrite warnings on commit
  2016-05-13 16:43 ` Junio C Hamano
@ 2016-05-14  5:40   ` Torsten Bögershausen
  2016-05-14 11:17     ` Adam Dinwoodie
  0 siblings, 1 reply; 53+ messages in thread
From: Torsten Bögershausen @ 2016-05-14  5:40 UTC (permalink / raw)
  To: Junio C Hamano, Torsten Bögershausen; +Cc: git, Adam Dinwoodie

On 13.05.16 18:43, Junio C Hamano wrote:
> Adam Dinwoodie <adam@dinwoodie.org> writes:
> 
>> If you use .gitattributes to enable CRLF->LF rewriting, then commit a
>> file that would have its line endings rewritten, the "CRLF will be
>> replaced by LF" warning is printed several times over; I'd expect it to
>> be printed only once.
>>
>> There's a test case in t0020 -- "safecrlf: print warning only once" --
>> that checks the warning is only printed once when using `git add`, but
>> nothing that seems to check the same thing on `git commit`.  The
>> unnecessary multiple warnings seem to have existed since at least v1.6.0
>> (the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
>> couldn't get that to build on my current box), and I'm seeing them on my
>> Cygwin box's build of the current next branch (d10caa2) and on my CentOS
>> box's v2.8.1 release.
> 
> Torsten, I know you are not heavily invested in "commit" codepath,
> but since you've been actively touching the CRLF area, I wonder
> perhaps you might be interested in taking a look?
> 
I have seen this double warning as well, but never digged further,
as it is most probably in the code outside convert.c

I can put it on the TODO list, and I think it would make sense to add
a TC in t0020, which is marked as test_expect_failure.
 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Bug report: Duplicate CRLF rewrite warnings on commit
  2016-05-14  5:40   ` Torsten Bögershausen
@ 2016-05-14 11:17     ` Adam Dinwoodie
  0 siblings, 0 replies; 53+ messages in thread
From: Adam Dinwoodie @ 2016-05-14 11:17 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Junio C Hamano, git

On Sat, May 14, 2016 at 07:40:11AM +0200, Torsten Bögershausen wrote:
> On 13.05.16 18:43, Junio C Hamano wrote:
> > Adam Dinwoodie <adam@dinwoodie.org> writes:
> > 
> >> If you use .gitattributes to enable CRLF->LF rewriting, then commit a
> >> file that would have its line endings rewritten, the "CRLF will be
> >> replaced by LF" warning is printed several times over; I'd expect it to
> >> be printed only once.
> >>
> >> There's a test case in t0020 -- "safecrlf: print warning only once" --
> >> that checks the warning is only printed once when using `git add`, but
> >> nothing that seems to check the same thing on `git commit`.  The
> >> unnecessary multiple warnings seem to have existed since at least v1.6.0
> >> (the warnings appear to have been added in v1.5.5 in 21e5ad5, but I
> >> couldn't get that to build on my current box), and I'm seeing them on my
> >> Cygwin box's build of the current next branch (d10caa2) and on my CentOS
> >> box's v2.8.1 release.
> > 
> > Torsten, I know you are not heavily invested in "commit" codepath,
> > but since you've been actively touching the CRLF area, I wonder
> > perhaps you might be interested in taking a look?
> > 
> I have seen this double warning as well, but never digged further,
> as it is most probably in the code outside convert.c
> 
> I can put it on the TODO list, and I think it would make sense to add
> a TC in t0020, which is marked as test_expect_failure.

Grand.  I'll submit the test patch now.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH/RFC v1 0/1] Quickfix ?No duplicate CRLF rewrite warnings on commit
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
  2016-05-13 16:43 ` Junio C Hamano
  2016-05-13 18:12 ` Jeff King
@ 2016-05-15  6:08 ` tboegi
  2016-05-15  6:08 ` [PATCH/RFC v1 1/1] No " tboegi
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-15  6:08 UTC (permalink / raw)
  To: git; +Cc: adam, Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

It may be that this patch only covers over a sympton, rather
than fixing the root cause.

Torsten Bögershausen (1):
  No duplicate CRLF rewrite warnings on commit

 diff.c           |  2 ++
 diffcore-break.c |  6 ++++--
 diffcore.h       |  1 +
 t/t0020-crlf.sh  | 14 ++++++++++++++
 4 files changed, 21 insertions(+), 2 deletions(-)

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH/RFC v1 1/1] No duplicate CRLF rewrite warnings on commit
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (2 preceding siblings ...)
  2016-05-15  6:08 ` [PATCH/RFC v1 0/1] Quickfix ?No duplicate " tboegi
@ 2016-05-15  6:08 ` tboegi
  2016-05-15  6:15   ` Eric Sunshine
  2016-05-15  6:37 ` [PATCH v1 0/3] CRLF-Handling: bug fix around ce_compare_data() tboegi
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-15  6:08 UTC (permalink / raw)
  To: git; +Cc: adam, Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

If .gitattributes are used to enable CRLF->LF rewriting,
then commiting a file that would have its line endings rewritten,
the "CRLF will be replaced by LF" warning is printed 2 times.
A user expects it to be printed only once.
The automatic rename detection by Git runs the conversion twice,
suppress the warning in the second run.

Reported-By: Adam Dinwoodie <adam@dinwoodie.org>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 diff.c           |  2 ++
 diffcore-break.c |  6 ++++--
 diffcore.h       |  1 +
 t/t0020-crlf.sh  | 14 ++++++++++++++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index d3734d3..c965afc 100644
--- a/diff.c
+++ b/diff.c
@@ -2749,6 +2749,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 	enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL
 				    ? SAFE_CRLF_WARN
 				    : safe_crlf);
+	if (flags & CHECK_IGNORE_CRLF_WARNING)
+		crlf_warn = SAFE_CRLF_FALSE;
 
 	if (!DIFF_FILE_VALID(s))
 		die("internal error: asking to populate invalid file.");
diff --git a/diffcore-break.c b/diffcore-break.c
index 5473493..4a75681 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -47,6 +47,7 @@ static int should_break(struct diff_filespec *src,
 	 */
 	unsigned long delta_size, max_size;
 	unsigned long src_copied, literal_added, src_removed;
+	unsigned flags2 = 0;
 
 	*merge_score_p = 0; /* assume no deletion --- "do not break"
 			     * is the default.
@@ -61,9 +62,10 @@ static int should_break(struct diff_filespec *src,
 	    !hashcmp(src->sha1, dst->sha1))
 		return 0; /* they are the same */
 
-	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
+	if (!strcmp(src->path, dst->path))
+		flags2 = CHECK_IGNORE_CRLF_WARNING;
+	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, flags2))
 		return 0; /* error but caught downstream */
-
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
 	if (max_size < MINIMUM_BREAK_SIZE)
 		return 0; /* we do not break too small filepair */
diff --git a/diffcore.h b/diffcore.h
index 33ea2de..91464aa 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -57,6 +57,7 @@ extern void fill_filespec(struct diff_filespec *, const unsigned char *,
 
 #define CHECK_SIZE_ONLY 1
 #define CHECK_BINARY    2
+#define CHECK_IGNORE_CRLF_WARNING    4
 extern int diff_populate_filespec(struct diff_filespec *, unsigned int);
 extern void diff_free_filespec_data(struct diff_filespec *);
 extern void diff_free_filespec_blob(struct diff_filespec *);
diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
index f94120a..f7d2f74 100755
--- a/t/t0020-crlf.sh
+++ b/t/t0020-crlf.sh
@@ -86,6 +86,20 @@ test_expect_success 'safecrlf: print warning only once' '
 	test $(git add doublewarn 2>&1 | grep "CRLF will be replaced by LF" | wc -l) = 1
 '
 
+test_expect_success 'safecrlf: print warning only once on commit' '
+
+	git config core.autocrlf input &&
+	git config core.safecrlf warn &&
+
+	for w in I am all LF; do echo $w; done >doublewarn2 &&
+	git add doublewarn2 &&
+	git commit -m "nowarn" &&
+	for w in Oh here is CRLFQ in text; do echo $w; done | q_to_cr >doublewarn2 &&
+	git add doublewarn2 2>&1 &&
+	git commit -m Message 2>&1 | grep "CRLF will be replaced by LF" >actual &&
+	echo "warning: CRLF will be replaced by LF in doublewarn2." >expected &&
+	test_cmp expected actual
+'
 
 test_expect_success 'safecrlf: git diff demotes safecrlf=true to warn' '
 	git config core.autocrlf input &&
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC v1 1/1] No duplicate CRLF rewrite warnings on commit
  2016-05-15  6:08 ` [PATCH/RFC v1 1/1] No " tboegi
@ 2016-05-15  6:15   ` Eric Sunshine
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Sunshine @ 2016-05-15  6:15 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git List, adam

On Sun, May 15, 2016 at 2:08 AM,  <tboegi@web.de> wrote:
> If .gitattributes are used to enable CRLF->LF rewriting,
> then commiting a file that would have its line endings rewritten,
> the "CRLF will be replaced by LF" warning is printed 2 times.
> A user expects it to be printed only once.
> The automatic rename detection by Git runs the conversion twice,
> suppress the warning in the second run.
>
> Reported-By: Adam Dinwoodie <adam@dinwoodie.org>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---
> diff --git a/t/t0020-crlf.sh b/t/t0020-crlf.sh
> @@ -86,6 +86,20 @@ test_expect_success 'safecrlf: print warning only once' '
> +test_expect_success 'safecrlf: print warning only once on commit' '
> +
> +       git config core.autocrlf input &&
> +       git config core.safecrlf warn &&
> +
> +       for w in I am all LF; do echo $w; done >doublewarn2 &&

I would typically say something about how you could instead use:

    test_write_lines I am all LF >doublewarn2 &&

but since you're just mimicking existing style in this script, I won't
mention it.

> +       git add doublewarn2 &&
> +       git commit -m "nowarn" &&
> +       for w in Oh here is CRLFQ in text; do echo $w; done | q_to_cr >doublewarn2 &&

Likewise; note my silence.

> +       git add doublewarn2 2>&1 &&
> +       git commit -m Message 2>&1 | grep "CRLF will be replaced by LF" >actual &&
> +       echo "warning: CRLF will be replaced by LF in doublewarn2." >expected &&
> +       test_cmp expected actual
> +'

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v1 0/3] CRLF-Handling: bug fix around ce_compare_data()
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (3 preceding siblings ...)
  2016-05-15  6:08 ` [PATCH/RFC v1 1/1] No " tboegi
@ 2016-05-15  6:37 ` tboegi
  2016-05-15  6:38 ` [PATCH v1 1/3] t6038; use crlf on all platforms tboegi
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-15  6:37 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Break up the old 10/10 series about CLRF handling into smaller
series.

1/10..4/10 are now found in tb/core-eol-fix

This series includes 3 from the 10/10 series:
09/10 t6038; use crlf on all platforms                    #now 1/3
05/10 read-cache: factor out get_sha1_from_index() helper #now 2/3
10/10 Fix for ce_compare_data(), done right.              #now 3/3

The most important patch is 3/3

The reminding 3 patches,
stream-and-early-out,
convert-unify-the-auto-handling-of-CRLF
more-safer-crlf-handling-with-text-attr
Need to be re-done, re-sorted and re-written, thanks for all the comments.

Torsten Bögershausen (3):
  t6038; use crlf on all platforms
  read-cache: factor out get_sha1_from_index() helper
  convert: ce_compare_data() checks for a sha1 of a path

 cache.h                    |  4 ++++
 convert.c                  | 33 ++++++++++++++++++++++-----------
 convert.h                  | 23 +++++++++++++++++++----
 read-cache.c               | 33 +++++++++++++++++++++------------
 sha1_file.c                | 17 +++++++++++++----
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 6 files changed, 90 insertions(+), 57 deletions(-)

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v1 1/3] t6038; use crlf on all platforms
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (4 preceding siblings ...)
  2016-05-15  6:37 ` [PATCH v1 0/3] CRLF-Handling: bug fix around ce_compare_data() tboegi
@ 2016-05-15  6:38 ` tboegi
  2016-05-15  6:42   ` Eric Sunshine
  2016-05-15  6:38 ` [PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-15  6:38 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

t6038 uses different code, dependig if NATIVE_CRLF is set ot not.
When the native line endings are LF, merge.renormalize is not tested very well.
Change the test to always use CRLF by setting core.eol=crlf.
After doing so, the test fails:

rm '.gitattributes'
rm 'control_file'
rm 'file'
rm 'inert_file'
HEAD is now at 0d9ffb6 add line from b
error: addinfo_cache failed for path 'file'
file: unmerged (cbd69ec7cd12dd0989e853923867d94c8519aa52)
file: unmerged (ad55e240aeb42e0d9a0e18d6d8b02dd82ee3e527)
file: unmerged (99b633103c15c20cebebf821133ab526b0ff90b2)
fatal: git write-tree failed to write a tree
Merging:
0d9ffb6 add line from b
virtual a
found 1 common ancestor:
1c56df1 Initial
Auto-merging file
not ok 4 - Merge addition of text=auto

This will be addressed in the next commit.
---
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 85c10b0..4dc8c1a 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
 
 test_expect_success setup '
 	git config core.autocrlf false &&
+	git config core.eol crlf &&
 
 	echo first line | append_cr >file &&
 	echo first line >control_file &&
@@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
 
 test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
 	echo "<<<<<<<" >expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected &&
-		echo ======= | append_cr >>expected
-	else
-		echo first line >>expected &&
-		echo same line >>expected &&
-		echo ======= >>expected
-	fi &&
+	echo first line | append_cr >>expected &&
+	echo same line | append_cr >>expected &&
+	echo ======= | append_cr >>expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
 	echo ">>>>>>>" >>expected &&
@@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
 	echo "<<<<<<<" >expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo ======= | append_cr >>expected &&
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected
-	else
-		echo ======= >>expected &&
-		echo first line >>expected &&
-		echo same line >>expected
-	fi &&
+	echo ======= | append_cr >>expected &&
+	echo first line | append_cr >>expected &&
+	echo same line | append_cr >>expected &&
 	echo ">>>>>>>" >>expected &&
 	git config merge.renormalize false &&
 	rm -f .gitattributes &&
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (5 preceding siblings ...)
  2016-05-15  6:38 ` [PATCH v1 1/3] t6038; use crlf on all platforms tboegi
@ 2016-05-15  6:38 ` tboegi
  2016-05-15  6:45   ` Eric Sunshine
  2016-05-15  6:38 ` [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-15  6:38 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;
 
-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (6 preceding siblings ...)
  2016-05-15  6:38 ` [PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-15  6:38 ` tboegi
  2016-05-15  6:52   ` Eric Sunshine
  2016-05-15 22:14   ` Junio C Hamano
  2016-05-15 13:02 ` [PATCH v2 0/3] CRLF-Handling: bug fix around ce_compare_data() tboegi
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 53+ messages in thread
From: tboegi @ 2016-05-15  6:38 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  1 +
 convert.c    | 33 ++++++++++++++++++++++-----------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c |  4 +++-
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/cache.h b/cache.h
index 15a2a10..a5136c0 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1  4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/convert.c b/convert.c
index f524b8d..5a0cd03 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,29 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
 {
 	unsigned long sz;
 	void *data;
 	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
-	if (!data)
+	enum object_type type;
+	if (!sha1)
+		sha1 = get_sha1_from_cache(path);
+	if (!sha1)
+		return 0;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
 		return 0;
+	}
+
 	has_cr = memchr(data, '\r', sz) != NULL;
 	free(data);
 	return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+		       const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +268,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			 * If the file in the index has any CR in it, do not convert.
 			 * This is the new safer autocrlf handling.
 			 */
-			if (has_cr_in_index(path))
+			if (has_cr_in_index(path, sha1))
 				return 0;
 		}
 	}
@@ -852,8 +860,9 @@ const char *get_convert_attr_ascii(const char *path)
 	return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+			   const char *src, size_t len,
+			   struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -874,7 +883,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -882,7 +891,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+			      const unsigned char *sha1,
+			      int fd, struct strbuf *dst,
 			      enum safe_crlf checksafe)
 {
 	struct conv_attrs ca;
@@ -894,7 +905,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+				  const char *src, size_t len,
+				  struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+				 struct strbuf *dst, enum safe_crlf checksafe)
+{
+	return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
 {
 	return convert_to_git(path, NULL, 0, NULL, 0);
 }
+static inline int would_convert_to_git_ce_sha1(const char *path,
+					       const unsigned char *sha1)
+{
+	return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-				     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+				     const unsigned char *sha1,
+				     int fd, struct strbuf *dst,
 				     enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/read-cache.c b/read-cache.c
index a3ef967..0ebc237 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_CE_HAS_SHA1;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	 */
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
-		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+		if (convert_to_git_ce_sha1(path,
+					   valid_sha1 ? sha1 : NULL,
+					   buf, size, &nbuf,
+					   write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
-	convert_to_git_filter_fd(path, fd, &sbuf,
+	convert_to_git_filter_fd(path,
+				 valid_sha1 ? sha1 : NULL,
+				 fd, &sbuf,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v1 1/3] t6038; use crlf on all platforms
  2016-05-15  6:38 ` [PATCH v1 1/3] t6038; use crlf on all platforms tboegi
@ 2016-05-15  6:42   ` Eric Sunshine
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Sunshine @ 2016-05-15  6:42 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git List

On Sun, May 15, 2016 at 2:38 AM,  <tboegi@web.de> wrote:
> t6038 uses different code, dependig if NATIVE_CRLF is set ot not.

s/dependig/depending/
s/ot/or/

> When the native line endings are LF, merge.renormalize is not tested very well.
> Change the test to always use CRLF by setting core.eol=crlf.
> After doing so, the test fails:
> [...snip...]
> This will be addressed in the next commit.

Does this mean that the below tests now fail? If so, they should be
switched to use test_expect_failure here, and then swapped back to
test_expect_success in the patch which fixes the problem.

> ---
> diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
> index 85c10b0..4dc8c1a 100755
> --- a/t/t6038-merge-text-auto.sh
> +++ b/t/t6038-merge-text-auto.sh
> @@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
>
>  test_expect_success setup '
>         git config core.autocrlf false &&
> +       git config core.eol crlf &&
>
>         echo first line | append_cr >file &&
>         echo first line >control_file &&
> @@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
>         same line
>         EOF
>
> -       if test_have_prereq NATIVE_CRLF; then
> -               append_cr <expected >expected.temp &&
> -               mv expected.temp expected
> -       fi &&
> +       append_cr <expected >expected.temp &&
> +       mv expected.temp expected &&
>         git config merge.renormalize true &&
>         git rm -fr . &&
>         rm -f .gitattributes &&
> @@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
>         same line
>         EOF
>
> -       if test_have_prereq NATIVE_CRLF; then
> -               append_cr <expected >expected.temp &&
> -               mv expected.temp expected
> -       fi &&
> +       append_cr <expected >expected.temp &&
> +       mv expected.temp expected &&
>         git config merge.renormalize true &&
>         git rm -fr . &&
>         rm -f .gitattributes &&
> @@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
>
>  test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
>         echo "<<<<<<<" >expected &&
> -       if test_have_prereq NATIVE_CRLF; then
> -               echo first line | append_cr >>expected &&
> -               echo same line | append_cr >>expected &&
> -               echo ======= | append_cr >>expected
> -       else
> -               echo first line >>expected &&
> -               echo same line >>expected &&
> -               echo ======= >>expected
> -       fi &&
> +       echo first line | append_cr >>expected &&
> +       echo same line | append_cr >>expected &&
> +       echo ======= | append_cr >>expected &&
>         echo first line | append_cr >>expected &&
>         echo same line | append_cr >>expected &&
>         echo ">>>>>>>" >>expected &&
> @@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
>         echo "<<<<<<<" >expected &&
>         echo first line | append_cr >>expected &&
>         echo same line | append_cr >>expected &&
> -       if test_have_prereq NATIVE_CRLF; then
> -               echo ======= | append_cr >>expected &&
> -               echo first line | append_cr >>expected &&
> -               echo same line | append_cr >>expected
> -       else
> -               echo ======= >>expected &&
> -               echo first line >>expected &&
> -               echo same line >>expected
> -       fi &&
> +       echo ======= | append_cr >>expected &&
> +       echo first line | append_cr >>expected &&
> +       echo same line | append_cr >>expected &&
>         echo ">>>>>>>" >>expected &&
>         git config merge.renormalize false &&
>         rm -f .gitattributes &&
> --
> 2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper
  2016-05-15  6:38 ` [PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-15  6:45   ` Eric Sunshine
  0 siblings, 0 replies; 53+ messages in thread
From: Eric Sunshine @ 2016-05-15  6:45 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git List

On Sun, May 15, 2016 at 2:38 AM,  <tboegi@web.de> wrote:
> Factor out the retrieval of the sha1 for a given path in
> read_blob_data_from_index() into the function get_sha1_from_index().
>
> This will be used in the next commit, when convert.c can do the
> analyze for "text=auto" without slurping the whole blob into memory
> at once.
>
> Add a wrapper definition get_sha1_from_cache().
>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---
> diff --git a/cache.h b/cache.h
> @@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
> +#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))

s/\s+/ /

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-15  6:38 ` [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-15  6:52   ` Eric Sunshine
  2016-05-15 22:14   ` Junio C Hamano
  1 sibling, 0 replies; 53+ messages in thread
From: Eric Sunshine @ 2016-05-15  6:52 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Git List

On Sun, May 15, 2016 at 2:38 AM,  <tboegi@web.de> wrote:
> To compare a file in working tree with the index, convert_to_git() is used,
> the the result is hashed and the hash value compared with ce->sha1.
>
> Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
> are converted or not: When a CRLF had been in the index before, CRLF in
> the working tree are not converted.
>
> While in a merge, a file name in the working tree has different blobs
> in the index with different hash values.
> Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
> the would_convert_crlf_at_commit() looks at the appropriate blob.
>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---
> diff --git a/convert.c b/convert.c
> @@ -217,21 +217,29 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
> -static int has_cr_in_index(const char *path)
> +static int has_cr_in_index(const char *path, const unsigned char *sha1)
>  {
>         unsigned long sz;
>         void *data;
>         int has_cr;
> -
> -       data = read_blob_data_from_cache(path, &sz);
> -       if (!data)
> +       enum object_type type;
> +       if (!sha1)
> +               sha1 = get_sha1_from_cache(path);
> +       if (!sha1)
> +               return 0;
> +       data = read_sha1_file(sha1, &type, &sz);
> +       if (!data || type != OBJ_BLOB) {
> +               free(data);
>                 return 0;
> +       }
> +
>         has_cr = memchr(data, '\r', sz) != NULL;
>         free(data);
>         return has_cr;
>  }

Possible rewrite which would make it harder to forget to free 'data':

    int has_cr = 0;
    ...
    data = read_sha1_file(...);
    if (data && type == OBJ_BLOB)
        has_cr = memchr(...);
    free(data);
    return has_cr;

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 0/3] CRLF-Handling: bug fix around ce_compare_data()
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (7 preceding siblings ...)
  2016-05-15  6:38 ` [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-15 13:02 ` tboegi
  2016-05-15 13:02 ` [PATCH v2 1/3] read-cache: factor out get_sha1_from_index() helper tboegi
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-15 13:02 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Changes since v1:
  - Re-order the patches
  - t6038; use crlf on all platforms
    This did actually not break anything (without old 7/10)
    Adapt the commit message
  - ce_compare_data():
    Simplify the logic around free() in has_cr_in_index(),
    Thanks Eric Sunshine

Break up the old 10/10 series about CLRF handling into smaller
series.

1/10..4/10 are now found in tb/core-eol-fix

This series includes 3 from the 10/10 series:
05/10 read-cache: factor out get_sha1_from_index() helper #now 1/3
10/10 Fix for ce_compare_data(), done right.              #now 2/3
09/10 t6038; use crlf on all platforms                    #now 3/3

The most important patch is 2/3

The reminding 3 patches,
stream-and-early-out,
convert-unify-the-auto-handling-of-CRLF
more-safer-crlf-handling-with-text-attr
Need to be re-done, re-sorted and re-written, thanks for all the comments.

Torsten Bögershausen (3):
  read-cache: factor out get_sha1_from_index() helper
  convert: ce_compare_data() checks for a sha1 of a path
  t6038; use crlf on all platforms

 cache.h                    |  4 ++++
 convert.c                  | 34 ++++++++++++++++++++++------------
 convert.h                  | 23 +++++++++++++++++++----
 read-cache.c               | 33 +++++++++++++++++++++------------
 sha1_file.c                | 17 +++++++++++++----
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 6 files changed, 90 insertions(+), 58 deletions(-)

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 1/3] read-cache: factor out get_sha1_from_index() helper
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (8 preceding siblings ...)
  2016-05-15 13:02 ` [PATCH v2 0/3] CRLF-Handling: bug fix around ce_compare_data() tboegi
@ 2016-05-15 13:02 ` tboegi
  2016-05-15 13:02 ` [PATCH v2 2/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
  2016-05-15 13:02 ` [PATCH v2 3/3] t6038; use crlf on all platforms tboegi
  11 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-15 13:02 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;
 
-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 2/3] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (9 preceding siblings ...)
  2016-05-15 13:02 ` [PATCH v2 1/3] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-15 13:02 ` tboegi
  2016-05-15 13:02 ` [PATCH v2 3/3] t6038; use crlf on all platforms tboegi
  11 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-15 13:02 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  1 +
 convert.c    | 34 ++++++++++++++++++++++------------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c |  4 +++-
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 15a2a10..a5136c0 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1  4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/convert.c b/convert.c
index f524b8d..c972d14 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
 {
 	unsigned long sz;
 	void *data;
-	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
+	int has_cr = 0;
+	enum object_type type;
+	if (!sha1)
+		sha1 = get_sha1_from_cache(path);
+	if (!sha1)
+		return 0;
+	data = read_sha1_file(sha1, &type, &sz);
 	if (!data)
 		return 0;
-	has_cr = memchr(data, '\r', sz) != NULL;
+	if (type == OBJ_BLOB)
+		has_cr = memchr(data, '\r', sz) != NULL;
+
 	free(data);
 	return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+		       const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			 * If the file in the index has any CR in it, do not convert.
 			 * This is the new safer autocrlf handling.
 			 */
-			if (has_cr_in_index(path))
+			if (has_cr_in_index(path, sha1))
 				return 0;
 		}
 	}
@@ -852,8 +859,9 @@ const char *get_convert_attr_ascii(const char *path)
 	return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+			   const char *src, size_t len,
+			   struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -882,7 +890,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+			      const unsigned char *sha1,
+			      int fd, struct strbuf *dst,
 			      enum safe_crlf checksafe)
 {
 	struct conv_attrs ca;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+				  const char *src, size_t len,
+				  struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+				 struct strbuf *dst, enum safe_crlf checksafe)
+{
+	return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
 {
 	return convert_to_git(path, NULL, 0, NULL, 0);
 }
+static inline int would_convert_to_git_ce_sha1(const char *path,
+					       const unsigned char *sha1)
+{
+	return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-				     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+				     const unsigned char *sha1,
+				     int fd, struct strbuf *dst,
 				     enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/read-cache.c b/read-cache.c
index a3ef967..0ebc237 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_CE_HAS_SHA1;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	 */
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
-		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+		if (convert_to_git_ce_sha1(path,
+					   valid_sha1 ? sha1 : NULL,
+					   buf, size, &nbuf,
+					   write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
-	convert_to_git_filter_fd(path, fd, &sbuf,
+	convert_to_git_filter_fd(path,
+				 valid_sha1 ? sha1 : NULL,
+				 fd, &sbuf,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 3/3] t6038; use crlf on all platforms
  2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
                   ` (10 preceding siblings ...)
  2016-05-15 13:02 ` [PATCH v2 2/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-15 13:02 ` tboegi
  11 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-15 13:02 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

t6038 uses different code, dependig if NATIVE_CRLF is set ot not.
When the native line endings are LF, merge.renormalize is not tested very well.
Change the test to always use CRLF by setting core.eol=crlf.
---
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 85c10b0..4dc8c1a 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
 
 test_expect_success setup '
 	git config core.autocrlf false &&
+	git config core.eol crlf &&
 
 	echo first line | append_cr >file &&
 	echo first line >control_file &&
@@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
 
 test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
 	echo "<<<<<<<" >expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected &&
-		echo ======= | append_cr >>expected
-	else
-		echo first line >>expected &&
-		echo same line >>expected &&
-		echo ======= >>expected
-	fi &&
+	echo first line | append_cr >>expected &&
+	echo same line | append_cr >>expected &&
+	echo ======= | append_cr >>expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
 	echo ">>>>>>>" >>expected &&
@@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
 	echo "<<<<<<<" >expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo ======= | append_cr >>expected &&
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected
-	else
-		echo ======= >>expected &&
-		echo first line >>expected &&
-		echo same line >>expected
-	fi &&
+	echo ======= | append_cr >>expected &&
+	echo first line | append_cr >>expected &&
+	echo same line | append_cr >>expected &&
 	echo ">>>>>>>" >>expected &&
 	git config merge.renormalize false &&
 	rm -f .gitattributes &&
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-15  6:38 ` [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
  2016-05-15  6:52   ` Eric Sunshine
@ 2016-05-15 22:14   ` Junio C Hamano
  2016-05-16 15:51     ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
                       ` (7 more replies)
  1 sibling, 8 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-15 22:14 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> -static int has_cr_in_index(const char *path)
> +static int has_cr_in_index(const char *path, const unsigned char *sha1)
>  {
>  	unsigned long sz;
>  	void *data;
>  	int has_cr;
> -
> -	data = read_blob_data_from_cache(path, &sz);
> -	if (!data)
> +	enum object_type type;
> +	if (!sha1)
> +		sha1 = get_sha1_from_cache(path);
> +	if (!sha1)
> +		return 0;
> +	data = read_sha1_file(sha1, &type, &sz);
> +	if (!data || type != OBJ_BLOB) {
> +		free(data);
>  		return 0;
> +	}
> +
>  	has_cr = memchr(data, '\r', sz) != NULL;
>  	free(data);
>  	return has_cr;
>  }

Does this really need 2/3?  Wouldn't this be equivalent to

	if (!sha1) {
        	data = read_blob_data_from_cache(path, &sz);
	} else {
        	data = read_sha1_file(sha1, &type, &sz);
	}
	if (!data || type != OBJ_BLOB) {
        	free(data);
                return 0;
	}

        has_cr = ...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data()
  2016-05-15 22:14   ` Junio C Hamano
@ 2016-05-16 15:51     ` tboegi
  2016-05-16 16:13       ` Junio C Hamano
  2016-05-16 15:51     ` [PATCH v3 1/1] " tboegi
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-16 15:51 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Changes since v2:

- Only 1 patch, the t6038 needs to go as a separate "series"

has_cr_in_index() uses either
	data = read_sha1_file(sha1, &type, &sz);
or
	data = read_blob_data_from_cache(path, &sz);

(I like v2 better, since there is a single code to get a sha object
into memory, which later will be replaced by a streaming object)
But in any case, here is v3

Torsten Bögershausen (1):
  convert: ce_compare_data() checks for a sha1 of a path

 cache.h                    |  1 +
 convert.c                  | 35 +++++++++++++++++++++--------------
 convert.h                  | 23 +++++++++++++++++++----
 read-cache.c               |  4 +++-
 sha1_file.c                | 17 +++++++++++++----

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 1/1] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-15 22:14   ` Junio C Hamano
  2016-05-16 15:51     ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
@ 2016-05-16 15:51     ` tboegi
  2016-05-30 17:00     ` [PATCH v1 0/1] t6038-merge-text-auto.sh tboegi
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-16 15:51 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  1 +
 convert.c    | 35 +++++++++++++++++++++--------------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c |  4 +++-
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..5b25462 100644
--- a/cache.h
+++ b/cache.h
@@ -604,6 +604,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1  4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/convert.c b/convert.c
index f524b8d..dc86899 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,25 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
 {
 	unsigned long sz;
 	void *data;
-	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
-	if (!data)
-		return 0;
-	has_cr = memchr(data, '\r', sz) != NULL;
+	int has_cr = 0;
+	enum object_type type = OBJ_BLOB;
+	if (sha1)
+		data = read_sha1_file(sha1, &type, &sz);
+	else
+		data = read_blob_data_from_cache(path, &sz);
+
+	if (data)
+		has_cr = memchr(data, '\r', sz) != NULL;
 	free(data);
 	return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+		       const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +264,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			 * If the file in the index has any CR in it, do not convert.
 			 * This is the new safer autocrlf handling.
 			 */
-			if (has_cr_in_index(path))
+			if (has_cr_in_index(path, sha1))
 				return 0;
 		}
 	}
@@ -852,8 +856,9 @@ const char *get_convert_attr_ascii(const char *path)
 	return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+			   const char *src, size_t len,
+			   struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -874,7 +879,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -882,7 +887,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+			      const unsigned char *sha1,
+			      int fd, struct strbuf *dst,
 			      enum safe_crlf checksafe)
 {
 	struct conv_attrs ca;
@@ -894,7 +901,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+				  const char *src, size_t len,
+				  struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+				 struct strbuf *dst, enum safe_crlf checksafe)
+{
+	return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
 {
 	return convert_to_git(path, NULL, 0, NULL, 0);
 }
+static inline int would_convert_to_git_ce_sha1(const char *path,
+					       const unsigned char *sha1)
+{
+	return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-				     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+				     const unsigned char *sha1,
+				     int fd, struct strbuf *dst,
 				     enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..4b0e3c3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_CE_HAS_SHA1;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	 */
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
-		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+		if (convert_to_git_ce_sha1(path,
+					   valid_sha1 ? sha1 : NULL,
+					   buf, size, &nbuf,
+					   write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
-	convert_to_git_filter_fd(path, fd, &sbuf,
+	convert_to_git_filter_fd(path,
+				 valid_sha1 ? sha1 : NULL,
+				 fd, &sbuf,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data()
  2016-05-16 15:51     ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
@ 2016-05-16 16:13       ` Junio C Hamano
  2016-05-17  4:08         ` Torsten Bögershausen
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-16 16:13 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> Changes since v2:
>
> - Only 1 patch, the t6038 needs to go as a separate "series"
>
> has_cr_in_index() uses either
> 	data = read_sha1_file(sha1, &type, &sz);
> or
> 	data = read_blob_data_from_cache(path, &sz);
>
> (I like v2 better, since there is a single code to get a sha object
> into memory, which later will be replaced by a streaming object)

Wait a minute, please.  I only asked the reason why you did it that
way and mentioned that the end result seemed equivalent.  "The end
result seems equivalent" does not automatically mean "therefore you
must avoid changing the code."

If you still prefer the original code, and your preference is backed
by a solid reasoning, don't change the code to a less preferrable
version.  You may have to explain what you wrote in () above clearly
in an updated log message to save other readers from asking the same
question as I asked, though.

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data()
  2016-05-16 16:13       ` Junio C Hamano
@ 2016-05-17  4:08         ` Torsten Bögershausen
  2016-05-17 16:09           ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
                             ` (6 more replies)
  0 siblings, 7 replies; 53+ messages in thread
From: Torsten Bögershausen @ 2016-05-17  4:08 UTC (permalink / raw)
  To: Junio C Hamano, tboegi; +Cc: git

On 05/16/2016 06:13 PM, Junio C Hamano wrote:
> Wait a minute, please.  I only asked the reason why you did it that
> way and mentioned that the end result seemed equivalent.  "The end
> result seems equivalent" does not automatically mean "therefore you
> must avoid changing the code."
>
> If you still prefer the original code, and your preference is backed
> by a solid reasoning, don't change the code to a less preferrable
> version.  You may have to explain what you wrote in () above clearly
> in an updated log message to save other readers from asking the same
> question as I asked, though.
>
> Thanks.
> --
>
No problem.
I will re-send v4 in some time and pull out the update of t6038 into an 
own path

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v1 1/1] t6038; use crlf on all platforms
  2016-05-17  4:08         ` Torsten Bögershausen
@ 2016-05-17 16:09           ` tboegi
  2016-05-17 18:39             ` Junio C Hamano
  2016-05-17 16:41           ` [PATCH v4 0/2] CRLF: ce_compare_data() checks for a sha1 of a path tboegi
                             ` (5 subsequent siblings)
  6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-17 16:09 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

t6038 uses different code, depending if NATIVE_CRLF is set ot not.
When the native line endings are LF, merge.renormalize is not tested very well.
Change the test to always use CRLF by setting core.eol=crlf.
---
Broke the 10/10 series into smaller pieces, this is the update of t6038
 t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 85c10b0..4dc8c1a 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
 
 test_expect_success setup '
 	git config core.autocrlf false &&
+	git config core.eol crlf &&
 
 	echo first line | append_cr >file &&
 	echo first line >control_file &&
@@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
 
 test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
 	echo "<<<<<<<" >expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected &&
-		echo ======= | append_cr >>expected
-	else
-		echo first line >>expected &&
-		echo same line >>expected &&
-		echo ======= >>expected
-	fi &&
+	echo first line | append_cr >>expected &&
+	echo same line | append_cr >>expected &&
+	echo ======= | append_cr >>expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
 	echo ">>>>>>>" >>expected &&
@@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
 	echo "<<<<<<<" >expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo ======= | append_cr >>expected &&
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected
-	else
-		echo ======= >>expected &&
-		echo first line >>expected &&
-		echo same line >>expected
-	fi &&
+	echo ======= | append_cr >>expected &&
+	echo first line | append_cr >>expected &&
+	echo same line | append_cr >>expected &&
 	echo ">>>>>>>" >>expected &&
 	git config merge.renormalize false &&
 	rm -f .gitattributes &&
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v4 0/2] CRLF: ce_compare_data() checks for a sha1 of a path
  2016-05-17  4:08         ` Torsten Bögershausen
  2016-05-17 16:09           ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
@ 2016-05-17 16:41           ` tboegi
  2016-05-17 16:41           ` [PATCH v4 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
                             ` (4 subsequent siblings)
  6 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-17 16:41 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Break up the old 10/10 series about CLRF handling into smaller
series. This is a small bugfix, when merge.renormalize is used
with core.autocrlf (and no attributes are set).
Prepare the refactoring to use the streaming interface.

Torsten Bögershausen (2):
  read-cache: factor out get_sha1_from_index() helper
  convert: ce_compare_data() checks for a sha1 of a path

 cache.h      |  4 ++++
 convert.c    | 34 ++++++++++++++++++++++------------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c | 33 +++++++++++++++++++++------------
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 79 insertions(+), 32 deletions(-)

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v4 1/2] read-cache: factor out get_sha1_from_index() helper
  2016-05-17  4:08         ` Torsten Bögershausen
  2016-05-17 16:09           ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
  2016-05-17 16:41           ` [PATCH v4 0/2] CRLF: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-17 16:41           ` tboegi
  2016-05-17 16:41           ` [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-17 16:41 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;
 
-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-17  4:08         ` Torsten Bögershausen
                             ` (2 preceding siblings ...)
  2016-05-17 16:41           ` [PATCH v4 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-17 16:41           ` tboegi
  2016-05-17 18:58             ` Junio C Hamano
  2016-05-19 14:21           ` [PATCH v5 0/2] CRLF: " tboegi
                             ` (2 subsequent siblings)
  6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-17 16:41 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

has_cr_in_index() does not use read_blob_data_from_cache() any more.
A single function to read data for a give sha value makes it easier to
refactor has_cr_in_index() to use the streaming interface.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  1 +
 convert.c    | 34 ++++++++++++++++++++++------------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c |  4 +++-
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 15a2a10..a5136c0 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1  4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/convert.c b/convert.c
index f524b8d..c972d14 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
 {
 	unsigned long sz;
 	void *data;
-	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
+	int has_cr = 0;
+	enum object_type type;
+	if (!sha1)
+		sha1 = get_sha1_from_cache(path);
+	if (!sha1)
+		return 0;
+	data = read_sha1_file(sha1, &type, &sz);
 	if (!data)
 		return 0;
-	has_cr = memchr(data, '\r', sz) != NULL;
+	if (type == OBJ_BLOB)
+		has_cr = memchr(data, '\r', sz) != NULL;
+
 	free(data);
 	return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+		       const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			 * If the file in the index has any CR in it, do not convert.
 			 * This is the new safer autocrlf handling.
 			 */
-			if (has_cr_in_index(path))
+			if (has_cr_in_index(path, sha1))
 				return 0;
 		}
 	}
@@ -852,8 +859,9 @@ const char *get_convert_attr_ascii(const char *path)
 	return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+			   const char *src, size_t len,
+			   struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -882,7 +890,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+			      const unsigned char *sha1,
+			      int fd, struct strbuf *dst,
 			      enum safe_crlf checksafe)
 {
 	struct conv_attrs ca;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+				  const char *src, size_t len,
+				  struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+				 struct strbuf *dst, enum safe_crlf checksafe)
+{
+	return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
 {
 	return convert_to_git(path, NULL, 0, NULL, 0);
 }
+static inline int would_convert_to_git_ce_sha1(const char *path,
+					       const unsigned char *sha1)
+{
+	return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-				     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+				     const unsigned char *sha1,
+				     int fd, struct strbuf *dst,
 				     enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/read-cache.c b/read-cache.c
index a3ef967..0ebc237 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_CE_HAS_SHA1;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	 */
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
-		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+		if (convert_to_git_ce_sha1(path,
+					   valid_sha1 ? sha1 : NULL,
+					   buf, size, &nbuf,
+					   write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
-	convert_to_git_filter_fd(path, fd, &sbuf,
+	convert_to_git_filter_fd(path,
+				 valid_sha1 ? sha1 : NULL,
+				 fd, &sbuf,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v1 1/1] t6038; use crlf on all platforms
  2016-05-17 16:09           ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
@ 2016-05-17 18:39             ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-17 18:39 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> t6038 uses different code, depending if NATIVE_CRLF is set ot not.
> When the native line endings are LF, merge.renormalize is not tested very well.
> Change the test to always use CRLF by setting core.eol=crlf.
> ---
> Broke the 10/10 series into smaller pieces, this is the update of t6038

Thanks.

 - Missing sign-off.

 - "... is not tested very well", which implies "with this change,
   it will be tested", is a secondary benefit.  The reader need to
   agree that, whether the platform native line ending is CRLF or
   LF, 'git merge' should behave identically on any platform, as
   long as the line ending convention used in the repository is
   explicitly set in the same way, before agreeing that this is a
   good thing to do in general.  And that is a bigger benefit, no?

 - But doesn't the same principle apply in the other direction?
   When forced to do core.eol=lf, a platform with NATIVE_CRLF should
   behave identically to how a platform without NATIVE_CRLF would?

   With this patch, we lose test coverage for core.eol=lf case,
   which used to be tested on non-NATIVE_CRLF platforms.  Isn't that
   a concern to us?

>  t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
>  1 file changed, 11 insertions(+), 26 deletions(-)
>
> diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
> index 85c10b0..4dc8c1a 100755
> --- a/t/t6038-merge-text-auto.sh
> +++ b/t/t6038-merge-text-auto.sh
> @@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
>  
>  test_expect_success setup '
>  	git config core.autocrlf false &&
> +	git config core.eol crlf &&
>  
>  	echo first line | append_cr >file &&
>  	echo first line >control_file &&
> @@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
>  	same line
>  	EOF
>  
> -	if test_have_prereq NATIVE_CRLF; then
> -		append_cr <expected >expected.temp &&
> -		mv expected.temp expected
> -	fi &&
> +	append_cr <expected >expected.temp &&
> +	mv expected.temp expected &&
>  	git config merge.renormalize true &&
>  	git rm -fr . &&
>  	rm -f .gitattributes &&
> @@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
>  	same line
>  	EOF
>  
> -	if test_have_prereq NATIVE_CRLF; then
> -		append_cr <expected >expected.temp &&
> -		mv expected.temp expected
> -	fi &&
> +	append_cr <expected >expected.temp &&
> +	mv expected.temp expected &&
>  	git config merge.renormalize true &&
>  	git rm -fr . &&
>  	rm -f .gitattributes &&
> @@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
>  
>  test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
>  	echo "<<<<<<<" >expected &&
> -	if test_have_prereq NATIVE_CRLF; then
> -		echo first line | append_cr >>expected &&
> -		echo same line | append_cr >>expected &&
> -		echo ======= | append_cr >>expected
> -	else
> -		echo first line >>expected &&
> -		echo same line >>expected &&
> -		echo ======= >>expected
> -	fi &&
> +	echo first line | append_cr >>expected &&
> +	echo same line | append_cr >>expected &&
> +	echo ======= | append_cr >>expected &&
>  	echo first line | append_cr >>expected &&
>  	echo same line | append_cr >>expected &&
>  	echo ">>>>>>>" >>expected &&
> @@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
>  	echo "<<<<<<<" >expected &&
>  	echo first line | append_cr >>expected &&
>  	echo same line | append_cr >>expected &&
> -	if test_have_prereq NATIVE_CRLF; then
> -		echo ======= | append_cr >>expected &&
> -		echo first line | append_cr >>expected &&
> -		echo same line | append_cr >>expected
> -	else
> -		echo ======= >>expected &&
> -		echo first line >>expected &&
> -		echo same line >>expected
> -	fi &&
> +	echo ======= | append_cr >>expected &&
> +	echo first line | append_cr >>expected &&
> +	echo same line | append_cr >>expected &&
>  	echo ">>>>>>>" >>expected &&
>  	git config merge.renormalize false &&
>  	rm -f .gitattributes &&

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-17 16:41           ` [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-17 18:58             ` Junio C Hamano
  2016-05-18  4:26               ` Torsten Bögershausen
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-17 18:58 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

>  #define HASH_WRITE_OBJECT 1
>  #define HASH_FORMAT_CHECK 2
> +#define HASH_CE_HAS_SHA1  4

How does one pronounce the words in this constant?  Does it make a
listener understand what this constant means?


/*
 * We need a comment around here to say what these two
 * parameters mean.  I am guessing that (1) if sha1 is not NULL,
 * path is ignored and the function inspects if it has CR; (2)
 * otherwise it checks the index entry at path and inspects if
 * it has CR.
 */
>  
> -static int has_cr_in_index(const char *path)
> +static int has_cr_in_index(const char *path, const unsigned char *sha1)
>  {

This makes me seriously wonder if it is a good idea to modify this
function like this.  (1) means this function is not about IN INDEX
at all!

Perhaps add a "static int blob_has_cr(const unsigned char *sha1)"
and call it from the real caller you added that wants to call this
butchered two-param version that has sha1 is a better solution?

> -static int crlf_to_git(const char *path, const char *src, size_t len,
> +static int crlf_to_git(const char *path, const unsigned char *sha1,
> +		       const char *src, size_t len,
>  		       struct strbuf *buf,
>  		       enum crlf_action crlf_action, enum safe_crlf checksafe)
>  {
> @@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
>  			 * If the file in the index has any CR in it, do not convert.
>  			 * This is the new safer autocrlf handling.
>  			 */
> -			if (has_cr_in_index(path))
> +			if (has_cr_in_index(path, sha1))

I think this change is too ugly.  The new "sha1" parameter is
telling us that "in order to see if the indexed version has CR, do
not look at the index, but look at the contents of this blob".

But wouldn't the result become more understandable if instead we
passed a new parameter "cr-state-for-conversion" that takes UNKNOWN,
HAS_CR, or NO_CR to this function?

In other words, what if, when the caller knows it does not care
what's in the index but wants to instead see if a different blob has
CR, we make it the caller's responsibility to figure it out by
calling blob_has_cr() before calling into this codepath and pass the
result of that check down?  When cr-state-for-conversion is UNKNOWN,
this code knows that it needs to call has_cr_in_index(path) to
figure it out itself.  Otherwise, it can and should use the
caller-supplied value without asking has_cr_in_index(path).

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-17 18:58             ` Junio C Hamano
@ 2016-05-18  4:26               ` Torsten Bögershausen
  2016-05-18 15:10                 ` Torsten Bögershausen
  0 siblings, 1 reply; 53+ messages in thread
From: Torsten Bögershausen @ 2016-05-18  4:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 05/17/2016 08:58 PM, Junio C Hamano wrote:
> tboegi@web.de writes:
>
>>   #define HASH_WRITE_OBJECT 1
>>   #define HASH_FORMAT_CHECK 2
>> +#define HASH_CE_HAS_SHA1  4
>
> How does one pronounce the words in this constant?  Does it make a
> listener understand what this constant means?
How about
HASH_USE_SHA1_FROM_CE
or
HASH_CE_HAS_VALID_SHA1


>
>
> /*
>   * We need a comment around here to say what these two
>   * parameters mean.  I am guessing that (1) if sha1 is not NULL,
>   * path is ignored and the function inspects if it has CR; (2)
>   * otherwise it checks the index entry at path and inspects if
>   * it has CR.
>   */
>>
>> -static int has_cr_in_index(const char *path)
>> +static int has_cr_in_index(const char *path, const unsigned char *sha1)
>>   {
>
> This makes me seriously wonder if it is a good idea to modify this
> function like this.  (1) means this function is not about IN INDEX
> at all!
>
> Perhaps add a "static int blob_has_cr(const unsigned char *sha1)"
> and call it from the real caller you added that wants to call this
> butchered two-param version that has sha1 is a better solution?
>
>> -static int crlf_to_git(const char *path, const char *src, size_t len,
>> +static int crlf_to_git(const char *path, const unsigned char *sha1,
>> +		       const char *src, size_t len,
>>   		       struct strbuf *buf,
>>   		       enum crlf_action crlf_action, enum safe_crlf checksafe)
>>   {
>> @@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
>>   			 * If the file in the index has any CR in it, do not convert.
>>   			 * This is the new safer autocrlf handling.
>>   			 */
>> -			if (has_cr_in_index(path))
>> +			if (has_cr_in_index(path, sha1))

>
> I think this change is too ugly.  The new "sha1" parameter is
> telling us that "in order to see if the indexed version has CR, do
> not look at the index, but look at the contents of this blob".
Thanks for some fresh eyes, I guess that
blob_has_cr(sha1)
would make most sense.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-18  4:26               ` Torsten Bögershausen
@ 2016-05-18 15:10                 ` Torsten Bögershausen
  0 siblings, 0 replies; 53+ messages in thread
From: Torsten Bögershausen @ 2016-05-18 15:10 UTC (permalink / raw)
  To: Torsten Bögershausen, Junio C Hamano; +Cc: git

On 18.05.16 06:26, Torsten Bögershausen wrote:
> On 05/17/2016 08:58 PM, Junio C Hamano wrote:
>> tboegi@web.de writes:
>>
>>>   #define HASH_WRITE_OBJECT 1
>>>   #define HASH_FORMAT_CHECK 2
>>> +#define HASH_CE_HAS_SHA1  4
>>
>> How does one pronounce the words in this constant?  Does it make a
>> listener understand what this constant means?
> How about
> HASH_USE_SHA1_FROM_CE
> or
> HASH_CE_HAS_VALID_SHA1
or, before I send a new patch,
HASH_USE_SHA_NOT_PATH 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v5 0/2] CRLF: ce_compare_data() checks for a sha1 of a path
  2016-05-17  4:08         ` Torsten Bögershausen
                             ` (3 preceding siblings ...)
  2016-05-17 16:41           ` [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-19 14:21           ` tboegi
  2016-05-19 23:10             ` Junio C Hamano
  2016-05-19 14:21           ` [PATCH v5 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
  2016-05-19 14:21           ` [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
  6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-19 14:21 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Break up the old 10/10 series about CLRF handling into smaller
series. This is a small bugfix, when merge.renormalize is used
with core.autocrlf (and no attributes are set).
Prepare the refactoring to use the streaming interface.
Changes since v4:
 - Rename #define in cache.h into HASH_USE_SHA_NOT_PATH
 - convert.c: Rename has_cr_in_index into blob_has_cr()
   Better logic when sha1 != NULL,
   Adjusted the commit message

Torsten Bögershausen (2):
  read-cache: factor out get_sha1_from_index() helper
  convert: ce_compare_data() checks for a sha1 of a path

 cache.h      |  4 ++++
 convert.c    | 34 ++++++++++++++++++++++------------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c | 33 +++++++++++++++++++++------------
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 79 insertions(+), 32 deletions(-)

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v5 1/2] read-cache: factor out get_sha1_from_index() helper
  2016-05-17  4:08         ` Torsten Bögershausen
                             ` (4 preceding siblings ...)
  2016-05-19 14:21           ` [PATCH v5 0/2] CRLF: " tboegi
@ 2016-05-19 14:21           ` tboegi
  2016-05-19 14:21           ` [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
  6 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-19 14:21 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;
 
-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-17  4:08         ` Torsten Bögershausen
                             ` (5 preceding siblings ...)
  2016-05-19 14:21           ` [PATCH v5 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-19 14:21           ` tboegi
  2016-05-19 23:03             ` Junio C Hamano
  6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-19 14:21 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

While at it, rename has_cr_in_index() into blob_has_cr()

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  1 +
 convert.c    | 34 ++++++++++++++++++++++------------
 convert.h    | 23 +++++++++++++++++++----
 read-cache.c |  4 +++-
 sha1_file.c  | 17 +++++++++++++----
 5 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 15a2a10..868599e 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/convert.c b/convert.c
index f524b8d..92ddfb1 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,26 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *sha1)
 {
 	unsigned long sz;
 	void *data;
-	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
+	int has_cr = 0;
+	enum object_type type;
+	if (!sha1)
+		return 0;
+	data = read_sha1_file(sha1, &type, &sz);
 	if (!data)
 		return 0;
-	has_cr = memchr(data, '\r', sz) != NULL;
+	if (type == OBJ_BLOB)
+		has_cr = memchr(data, '\r', sz) != NULL;
+
 	free(data);
 	return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+		       const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +265,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			 * If the file in the index has any CR in it, do not convert.
 			 * This is the new safer autocrlf handling.
 			 */
-			if (has_cr_in_index(path))
+			if (!sha1)
+				sha1 = get_sha1_from_cache(path);
+			if (blob_has_cr(sha1))
 				return 0;
 		}
 	}
@@ -852,8 +859,9 @@ const char *get_convert_attr_ascii(const char *path)
 	return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+			   const char *src, size_t len,
+			   struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -882,7 +890,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+			      const unsigned char *sha1,
+			      int fd, struct strbuf *dst,
 			      enum safe_crlf checksafe)
 {
 	struct conv_attrs ca;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+				  const char *src, size_t len,
+				  struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+				 struct strbuf *dst, enum safe_crlf checksafe)
+{
+	return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
 {
 	return convert_to_git(path, NULL, 0, NULL, 0);
 }
+static inline int would_convert_to_git_ce_sha1(const char *path,
+					       const unsigned char *sha1)
+{
+	return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-				     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+				     const unsigned char *sha1,
+				     int fd, struct strbuf *dst,
 				     enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_USE_SHA_NOT_PATH;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..1fdbfd3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	 */
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
-		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+		if (convert_to_git_ce_sha1(path,
+					   valid_sha1 ? sha1 : NULL,
+					   buf, size, &nbuf,
+					   write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
-	convert_to_git_filter_fd(path, fd, &sbuf,
+	convert_to_git_filter_fd(path,
+				 valid_sha1 ? sha1 : NULL,
+				 fd, &sbuf,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-19 14:21           ` [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-19 23:03             ` Junio C Hamano
  2016-05-20 17:12               ` [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
                                 ` (3 more replies)
  0 siblings, 4 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-19 23:03 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> +int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
> +			   const char *src, size_t len,
> +			   struct strbuf *dst, enum safe_crlf checksafe)

That's a strange name for the function, as "ce" and "sha1" gives no
hints what they are about.

If I understand correctly:

 - convert_to_git() is about converting <src, len> into <dst>, and
   "path" is not for reading the contents to be converted.  It is
   used to tell crlf_to_git() the index entry to pay attention to
   when defeating the usual conversion rules due to strange contents
   in the index (it also is used to report errors for safe-crlf
   check).

 - This one adds "sha1", and that is not about the contents to be
   converted, either.  Like "path", "sha1" is used to tell what blob
   to check when disabling the usual conversion rules.

Does the above description help in coming up with a better
description for the ce/sha1 thing?  The comment near the code that
uses them reads like so:

	/*
	 * If the file in the index has any CR in it, do not convert.
	 * This is the new safer autocrlf handling.
	 */

What is a good name to call the input to that logic?  "This
function, in addition to convert_to_git(), takes an extra parameter,
that tells it an extra piece of information 'X'"; what is X?

At the same time, the parameter "sha1" needs to be renamed to
clarify what object it is and what purpose it serves.  "sha1" alone
is an overly generic name, and it does not hint that it may not even
be given to the function, and that it doesn't have anything to do
with the contents <src, len> points at.

	Note. Perhaps you wanted _ce_sha1 suffix to tell the readers
	that it takes "an object name of the cache entry" that
	further affects the conversion?  If so the sha1 parameter
	should be renamed to match (and make it clear to readers
	that is what you meant).

The "sha1" is pretending to be the more important input for the
primary function of this function by being in early part of the
parameter list.  This may need to be rethought; we probably should
have done so as part of your previous refactoring of this file.

convert_to_git() takes the data for <path> in <src, len> and gives
result in <dst>, so these four should be its first parameters.  The
detail of the way the conversion works may be affected by additional
parameters, e.g. <checksafe> controls if extra warnings are given.

The <sha1> is to influence the conversion logic further to disable
the crlf-to-git conversion by inspecting a blob, and it tells the
function that the blob comes from an unusual place (as opposed to
the index entry at <path>).  So it should sit next to checksafe as
an auxiliary input parameter.

The logic implemented by the patch looks correct, but I'd have to
say that the result is an unmaintainable mess.  Right now, I can see
what is going on in the new code.  I am sure that in 6 months, with
poorly named helper functions and parameters, I will have hard time
recalling how the new code works.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 0/2] CRLF: ce_compare_data() checks for a sha1 of a path
  2016-05-19 14:21           ` [PATCH v5 0/2] CRLF: " tboegi
@ 2016-05-19 23:10             ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-19 23:10 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> Break up the old 10/10 series about CLRF handling into smaller
> series. This is a small bugfix, when merge.renormalize is used
> with core.autocrlf (and no attributes are set).

Is it worth protecting the fix with a new test?  Or does this flip
an existing "expect-failure" to "expect-success"?

> Prepare the refactoring to use the streaming interface.

> Changes since v4:
>  - Rename #define in cache.h into HASH_USE_SHA_NOT_PATH
>  - convert.c: Rename has_cr_in_index into blob_has_cr()
>    Better logic when sha1 != NULL,
>    Adjusted the commit message
>
> Torsten Bögershausen (2):
>   read-cache: factor out get_sha1_from_index() helper
>   convert: ce_compare_data() checks for a sha1 of a path
>
>  cache.h      |  4 ++++
>  convert.c    | 34 ++++++++++++++++++++++------------
>  convert.h    | 23 +++++++++++++++++++----
>  read-cache.c | 33 +++++++++++++++++++++------------
>  sha1_file.c  | 17 +++++++++++++----
>  5 files changed, 79 insertions(+), 32 deletions(-)

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper
  2016-05-19 23:03             ` Junio C Hamano
@ 2016-05-20 17:12               ` tboegi
  2016-05-20 17:12               ` [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-20 17:12 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;
 
-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-19 23:03             ` Junio C Hamano
  2016-05-20 17:12               ` [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-20 17:12               ` tboegi
  2016-05-20 17:46                 ` Junio C Hamano
  2016-05-21 10:01               ` [PATCH v7 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
  2016-05-21 10:01               ` [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
  3 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-20 17:12 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

To compare a file in working tree with the index, convert_to_git() is used,
the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

Forward sha1 from ce_compare_data() into convert_to_git().
All other callers use NULL, and the sha1 it is determined from path using
get_sha1_from_cache(path), this is the same handling as before.

Re-order the arguments for convert_to_git() according to their importance:
  `src`, `len` and `dst` are the place in memory, where the conversion is done
  `path` is the file name to look up the attributes.
  `sha1` is needed by the "new safer autocrlf handling".
  `checksafe` determines, if a warning is printed or an error is raised.

In the same spirit, forward the sha1 into would_convert_to_git().

While at it, rename has_cr_in_index() into blob_has_cr()

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Changes sinve v6:
 decrease the messiness with 12 %
 convert_to_git() has a re-ordered parameter list.
 Describe whats going on better in the commit msg.
 Cleanup: 0 -> SAFE_CRLF_FALSE at some places
---
 builtin/apply.c |  3 ++-
 builtin/blame.c |  2 +-
 cache.h         |  1 +
 combine-diff.c  |  4 +++-
 convert.c       | 38 +++++++++++++++++++++++++-------------
 convert.h       | 20 ++++++++++++++------
 diff.c          |  3 ++-
 dir.c           |  2 +-
 read-cache.c    |  4 +++-
 sha1_file.c     | 17 +++++++++++++----
 10 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 8e4da2e..c01654a 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -2140,7 +2140,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(path, buf->buf, buf->len, buf, 0);
+		convert_to_git(buf->buf, buf->len, buf,
+			       path, NULL, SAFE_CRLF_FALSE);
 		return 0;
 	default:
 		return -1;
diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0..4a01e20 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2377,7 +2377,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		if (strbuf_read(&buf, 0, 0) < 0)
 			die_errno("failed to read from stdin");
 	}
-	convert_to_git(path, buf.buf, buf.len, &buf, 0);
+	convert_to_git(buf.buf, buf.len, &buf, path, NULL, SAFE_CRLF_FALSE);
 	origin->file.ptr = buf.buf;
 	origin->file.size = buf.len;
 	pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);
diff --git a/cache.h b/cache.h
index 15a2a10..868599e 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/combine-diff.c b/combine-diff.c
index 0e1d4b0..cac4c81 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1053,7 +1053,9 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			if (is_file) {
 				struct strbuf buf = STRBUF_INIT;
 
-				if (convert_to_git(elem->path, result, len, &buf, safe_crlf)) {
+				if (convert_to_git(result, len, &buf,
+						   elem->path, NULL,
+						   safe_crlf)) {
 					free(result);
 					result = strbuf_detach(&buf, &len);
 					result_size = len;
diff --git a/convert.c b/convert.c
index f524b8d..a58bb26 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,26 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *sha1)
 {
 	unsigned long sz;
 	void *data;
-	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
+	int has_cr = 0;
+	enum object_type type;
+	if (!sha1)
+		return 0;
+	data = read_sha1_file(sha1, &type, &sz);
 	if (!data)
 		return 0;
-	has_cr = memchr(data, '\r', sz) != NULL;
+	if (type == OBJ_BLOB)
+		has_cr = memchr(data, '\r', sz) != NULL;
+
 	free(data);
 	return has_cr;
 }
 
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+		       const char *src, size_t len,
 		       struct strbuf *buf,
 		       enum crlf_action crlf_action, enum safe_crlf checksafe)
 {
@@ -260,7 +265,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			 * If the file in the index has any CR in it, do not convert.
 			 * This is the new safer autocrlf handling.
 			 */
-			if (has_cr_in_index(path))
+			if (!sha1)
+				sha1 = get_sha1_from_cache(path);
+			if (blob_has_cr(sha1))
 				return 0;
 		}
 	}
@@ -852,8 +859,10 @@ const char *get_convert_attr_ascii(const char *path)
 	return "";
 }
 
-int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git(const char *src, size_t len,
+		   struct strbuf *dst,
+		   const char *path, const unsigned char *sha1,
+		   enum safe_crlf checksafe)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -874,7 +883,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -882,7 +891,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
 	return ret | ident_to_git(path, src, len, dst, ca.ident);
 }
 
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+			      const unsigned char *sha1,
+			      int fd, struct strbuf *dst,
 			      enum safe_crlf checksafe)
 {
 	struct conv_attrs ca;
@@ -894,7 +905,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
@@ -949,7 +960,8 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
 		src = dst->buf;
 		len = dst->len;
 	}
-	return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE);
+	ret |= convert_to_git(src, len, dst, path, NULL, SAFE_CRLF_FALSE);
+	return ret;
 }
 
 /*****************************************************************
diff --git a/convert.h b/convert.h
index ccf436b..12fe767 100644
--- a/convert.h
+++ b/convert.h
@@ -37,19 +37,27 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
 extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git(const char *src, size_t len,
+			  struct strbuf *dst,
+			  const char *path, const unsigned char *sha1,
+			  enum safe_crlf checksafe);
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
-static inline int would_convert_to_git(const char *path)
+
+static inline int would_convert_to_git(const char *path,
+				       const unsigned char *sha1)
 {
-	return convert_to_git(path, NULL, 0, NULL, 0);
+	return convert_to_git(NULL, 0, NULL, path, sha1, SAFE_CRLF_FALSE);
 }
+
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
-				     struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+				     const unsigned char *sha1,
+				     int fd, struct strbuf *dst,
 				     enum safe_crlf checksafe);
 extern int would_convert_to_git_filter_fd(const char *path);
 
diff --git a/diff.c b/diff.c
index d3734d3..9c00973 100644
--- a/diff.c
+++ b/diff.c
@@ -2810,7 +2810,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 		/*
 		 * Convert from working tree format to canonical git format
 		 */
-		if (convert_to_git(s->path, s->data, s->size, &buf, crlf_warn)) {
+		if (convert_to_git(s->data, s->size, &buf, s->path, NULL,
+				   crlf_warn)) {
 			size_t size = 0;
 			munmap(s->data, s->size);
 			s->should_munmap = 0;
diff --git a/dir.c b/dir.c
index 656f272..5ac379d 100644
--- a/dir.c
+++ b/dir.c
@@ -713,7 +713,7 @@ static int add_excludes(const char *fname, const char *base, int baselen,
 				 (pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
 				 !ce_stage(active_cache[pos]) &&
 				 ce_uptodate(active_cache[pos]) &&
-				 !would_convert_to_git(fname))
+				 !would_convert_to_git(fname, NULL))
 				hashcpy(sha1_stat->sha1, active_cache[pos]->sha1);
 			else
 				hash_sha1_file(buf, size, "blob", sha1_stat->sha1);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_USE_SHA_NOT_PATH;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..48906b0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	 */
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
-		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+		if (convert_to_git(
+			    buf, size, &nbuf,path,
+			    valid_sha1 ? sha1 : NULL,
+			    write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
-	convert_to_git_filter_fd(path, fd, &sbuf,
+	convert_to_git_filter_fd(path,
+				 valid_sha1 ? sha1 : NULL,
+				 fd, &sbuf,
 				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
 
 	if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-20 17:12               ` [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-20 17:46                 ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-20 17:46 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> To compare a file in working tree with the index, convert_to_git() is used,
> the result is hashed and the hash value compared with ce->sha1.
>
> Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
> are converted or not: When a CRLF had been in the index before, CRLF in
> the working tree are not converted.
>
> While in a merge, a file name in the working tree has different blobs
> in the index with different hash values.
> Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
> the would_convert_crlf_at_commit() looks at the appropriate blob.
>
> Forward sha1 from ce_compare_data() into convert_to_git().
> All other callers use NULL, and the sha1 it is determined from path using
> get_sha1_from_cache(path), this is the same handling as before.
>
> Re-order the arguments for convert_to_git() according to their importance:
>   `src`, `len` and `dst` are the place in memory, where the conversion is done
>   `path` is the file name to look up the attributes.
>   `sha1` is needed by the "new safer autocrlf handling".
>   `checksafe` determines, if a warning is printed or an error is raised.
>
> In the same spirit, forward the sha1 into would_convert_to_git().
>
> While at it, rename has_cr_in_index() into blob_has_cr()
>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>


> Changes sinve v6:
>  decrease the messiness with 12 %

What does that mean????

>  convert_to_git() has a re-ordered parameter list.

That is not what I suggested, though.  <path, src, len> being
the first three would be fine, and that would be in line with
helpers ${frotz}_to_git() that are used from there.

The primary thing that made me worried was a new parameter with a
bland name "sha1" whose purpose is unclear was added near the
beginning, leading readers to confusion.

Whether you keep <path> at the beginning of move it to later
together with <sha1>, helpers like crlf_to_git() need to be updated
to match.  E.g. I would say that this

> -	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
> +	ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);

would want to be ordered more like this:

    ret |= crlf_to_git(path, src, len, dst,
                       ca.crlf_action, checksafe, index_blob_sha1);

if you choose to keep the first four intact for convert_to_git(),
that is.

>  Describe whats going on better in the commit msg.

The suggestion to rename the parameter was to allow readers of the
code to immediately know what kind of SHA1 it is.  They cannot
afford to run "git blame" every time to find the commit to read the
commit log message; for a public function like convert_to_git(),
in-code comment is also necessary.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v7 1/2] read-cache: factor out get_sha1_from_index() helper
  2016-05-19 23:03             ` Junio C Hamano
  2016-05-20 17:12               ` [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
  2016-05-20 17:12               ` [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-21 10:01               ` tboegi
  2016-05-21 10:01               ` [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
  3 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-21 10:01 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;
 
-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-19 23:03             ` Junio C Hamano
                                 ` (2 preceding siblings ...)
  2016-05-21 10:01               ` [PATCH v7 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-21 10:01               ` tboegi
  2016-05-24 18:36                 ` Junio C Hamano
  3 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-21 10:01 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

To compare a file in working tree with the index, convert_to_git() is used,
the result is hashed and the hash value compared with ce->sha1.

Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not.
The "new safer autocrlf handling" checks if CRLF had been in the index before,
and if, the CRLF in the working tree are not converted.

While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.

Add a new parameter index_blob_sha1 to convert_to_git(), and forward the
sha1 from ce_compare_data() into convert_to_git(). Other callers use NULL
for index_blob_sha1, and the sha1 is determined from path
using get_sha1_from_cache(path). This is the same handling as before.

In the same spirit, forward the sha1 into would_convert_to_git().

While at it, rename has_cr_in_index() into blob_has_cr() and replace
0 with SAFE_CRLF_FALSE.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 V6 went into the wrong direction.
 V5 -> V7: Adds parameter index_blob_sha1 to (convert_to_git() and would_convert_to_git().

 builtin/apply.c |  3 ++-
 builtin/blame.c |  2 +-
 cache.h         |  1 +
 combine-diff.c  |  3 ++-
 convert.c       | 34 ++++++++++++++++++++++------------
 convert.h       | 15 +++++++++++----
 diff.c          |  3 ++-
 dir.c           |  2 +-
 read-cache.c    |  4 +++-
 sha1_file.c     | 12 +++++++++---
 10 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 8e4da2e..0cf9a0a 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -2140,7 +2140,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(path, buf->buf, buf->len, buf, 0);
+		convert_to_git(path, buf->buf, buf->len, buf,
+			       SAFE_CRLF_FALSE, NULL);
 		return 0;
 	default:
 		return -1;
diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0..1c523b6 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2377,7 +2377,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		if (strbuf_read(&buf, 0, 0) < 0)
 			die_errno("failed to read from stdin");
 	}
-	convert_to_git(path, buf.buf, buf.len, &buf, 0);
+	convert_to_git(path, buf.buf, buf.len, &buf, SAFE_CRLF_FALSE, NULL);
 	origin->file.ptr = buf.buf;
 	origin->file.size = buf.len;
 	pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);
diff --git a/cache.h b/cache.h
index 15a2a10..868599e 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/combine-diff.c b/combine-diff.c
index 0e1d4b0..c4fa884 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1053,7 +1053,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			if (is_file) {
 				struct strbuf buf = STRBUF_INIT;
 
-				if (convert_to_git(elem->path, result, len, &buf, safe_crlf)) {
+				if (convert_to_git(elem->path, result, len,
+						   &buf, safe_crlf, NULL)) {
 					free(result);
 					result = strbuf_detach(&buf, &len);
 					result_size = len;
diff --git a/convert.c b/convert.c
index f524b8d..f0eb4ed 100644
--- a/convert.c
+++ b/convert.c
@@ -217,23 +217,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *index_blob_sha1)
 {
 	unsigned long sz;
 	void *data;
-	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
+	int has_cr = 0;
+	enum object_type type;
+	if (!index_blob_sha1)
+		return 0;
+	data = read_sha1_file(index_blob_sha1, &type, &sz);
 	if (!data)
 		return 0;
-	has_cr = memchr(data, '\r', sz) != NULL;
+	if (type == OBJ_BLOB)
+		has_cr = memchr(data, '\r', sz) != NULL;
+
 	free(data);
 	return has_cr;
 }
 
 static int crlf_to_git(const char *path, const char *src, size_t len,
 		       struct strbuf *buf,
-		       enum crlf_action crlf_action, enum safe_crlf checksafe)
+		       enum crlf_action crlf_action, enum safe_crlf checksafe,
+		       const unsigned char *index_blob_sha1)
 {
 	struct text_stat stats;
 	char *dst;
@@ -260,7 +265,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			 * If the file in the index has any CR in it, do not convert.
 			 * This is the new safer autocrlf handling.
 			 */
-			if (has_cr_in_index(path))
+			if (!index_blob_sha1)
+				index_blob_sha1 = get_sha1_from_cache(path);
+			if (blob_has_cr(index_blob_sha1))
 				return 0;
 		}
 	}
@@ -853,7 +860,8 @@ const char *get_convert_attr_ascii(const char *path)
 }
 
 int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+		   struct strbuf *dst, enum safe_crlf checksafe,
+		   const unsigned char *index_blob_sha1)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe, index_blob_sha1);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -883,7 +891,8 @@ int convert_to_git(const char *path, const char *src, size_t len,
 }
 
 void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
-			      enum safe_crlf checksafe)
+			      enum safe_crlf checksafe,
+			      const unsigned char *index_blob_sha1)
 {
 	struct conv_attrs ca;
 	convert_attrs(&ca, path);
@@ -894,7 +903,8 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
+		    checksafe, index_blob_sha1);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
@@ -949,7 +959,7 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
 		src = dst->buf;
 		len = dst->len;
 	}
-	return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE);
+	return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE, NULL);
 }
 
 /*****************************************************************
diff --git a/convert.h b/convert.h
index ccf436b..60c46b8 100644
--- a/convert.h
+++ b/convert.h
@@ -38,19 +38,26 @@ extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
 extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+			  struct strbuf *dst, enum safe_crlf checksafe,
+			  const unsigned char *index_blob_sha1);
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
-static inline int would_convert_to_git(const char *path)
+static inline int would_convert_to_git(const char *path,
+				       const unsigned char *index_blob_sha1)
 {
-	return convert_to_git(path, NULL, 0, NULL, 0);
+	return convert_to_git(path, NULL, 0, NULL, SAFE_CRLF_FALSE,
+			      index_blob_sha1);
 }
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
 extern void convert_to_git_filter_fd(const char *path, int fd,
 				     struct strbuf *dst,
-				     enum safe_crlf checksafe);
+				     enum safe_crlf checksafe,
+				     const unsigned char *index_blob_sha1);
+
 extern int would_convert_to_git_filter_fd(const char *path);
 
 /*****************************************************************
diff --git a/diff.c b/diff.c
index d3734d3..a8308e0 100644
--- a/diff.c
+++ b/diff.c
@@ -2810,7 +2810,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 		/*
 		 * Convert from working tree format to canonical git format
 		 */
-		if (convert_to_git(s->path, s->data, s->size, &buf, crlf_warn)) {
+		if (convert_to_git(s->path, s->data, s->size, &buf,
+				   crlf_warn, NULL)) {
 			size_t size = 0;
 			munmap(s->data, s->size);
 			s->should_munmap = 0;
diff --git a/dir.c b/dir.c
index 656f272..5ac379d 100644
--- a/dir.c
+++ b/dir.c
@@ -713,7 +713,7 @@ static int add_excludes(const char *fname, const char *base, int baselen,
 				 (pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
 				 !ce_stage(active_cache[pos]) &&
 				 ce_uptodate(active_cache[pos]) &&
-				 !would_convert_to_git(fname))
+				 !would_convert_to_git(fname, NULL))
 				hashcpy(sha1_stat->sha1, active_cache[pos]->sha1);
 			else
 				hash_sha1_file(buf, size, "blob", sha1_stat->sha1);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_USE_SHA_NOT_PATH;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..52e5c6f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3285,7 +3286,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
 		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+				   write_object ? safe_crlf : SAFE_CRLF_FALSE,
+				   valid_sha1 ? sha1 : NULL)) {
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3313,13 +3315,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
 	convert_to_git_filter_fd(path, fd, &sbuf,
-				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
+				 write_object ? safe_crlf : SAFE_CRLF_FALSE,
+				 valid_sha1 ? sha1 : NULL);
 
 	if (write_object)
 		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
@@ -3396,6 +3400,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3412,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path
  2016-05-21 10:01               ` [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-24 18:36                 ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-24 18:36 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> To compare a file in working tree with the index, convert_to_git() is used,
> the result is hashed and the hash value compared with ce->sha1.
>
> Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
> are converted or not.
> The "new safer autocrlf handling" checks if CRLF had been in the index before,
> and if, the CRLF in the working tree are not converted.
>
> While in a merge, a file name in the working tree has different blobs
> in the index with different hash values.
> Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
> the would_convert_crlf_at_commit() looks at the appropriate blob.
>
> Add a new parameter index_blob_sha1 to convert_to_git(), and forward the
> sha1 from ce_compare_data() into convert_to_git(). Other callers use NULL
> for index_blob_sha1, and the sha1 is determined from path
> using get_sha1_from_cache(path). This is the same handling as before.
>
> In the same spirit, forward the sha1 into would_convert_to_git().
>
> While at it, rename has_cr_in_index() into blob_has_cr() and replace
> 0 with SAFE_CRLF_FALSE.
>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---

Assuming that it is sensible to pass an extra "no, no, do not look
at the index entry at path, but look at this blob to see if the CRLF
conversion must be disabled" parameter all the down the callchain,
this round looks good.

I really hate to say this this late in the reroll cycle, but this
part of the description makes me wonder:

    > While in a merge, a file name in the working tree has
    > different blobs in the index with different hash values.
    > Forwarding ce->sha1 from ce_compare_data() into crlf_to_git()
    > makes sure the would_convert_crlf_at_commit() looks at the
    > appropriate blob.

When we need content-level merges with help from the end user, we
would need to "convert-to-git" the result of conflict resolution by
the user left in the working tree, and that convert-to-git needs to
take our original contents into account, i.e. "did we have CR in the
blob already? if so, disable the CRLF thing".

But we would always have "our" original contents at stage #2 in the
index in such a case, and would_convert_to_git() eventually calls
into read_blob_data_from_cache(), which knows to read from stage #2

Even if we were in a renaming merge conflict, where they renamed
file F to G while we kept file F as file F, the conflicted working
tree file will be made in path G, and stage #2 of the index for path
G would have our original contents we had at path F.

So it is not clear why this "no, no, look at this blob instead" is
necessary in the first place.  What problem does this solve?  Is
this a fix for something that can be easily demonstrated with a new
test case?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v1 0/1] t6038-merge-text-auto.sh
  2016-05-15 22:14   ` Junio C Hamano
  2016-05-16 15:51     ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
  2016-05-16 15:51     ` [PATCH v3 1/1] " tboegi
@ 2016-05-30 17:00     ` tboegi
  2016-05-30 18:00       ` Junio C Hamano
  2016-05-30 17:00     ` [PATCH v1 1/1] t6038: different eol for "Merge addition of text=auto" tboegi
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-30 17:00 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Split of the old 10/10 series.
This is the update of t6038, which is needed to
motivate the patch
   `convert: ce_compare_data() checks for a sha1 of a path`
on top of
   `convert: unify the "auto" handling of CRLF`

When files with different eols are merged with
merge.renormalize = true,
it is important to look at the right blob to determine if
the crlf came from the blob or are a result of a coversion.
This is a little bit of a hen-and-egg problem:
The problem comes up after the "unified auto handling".
In theory, it should have been since before:
get_sha1_from_index() says:

 * We might be in the middle of a merge, in which
 * case we would read stage #2 (ours).

This seams wrong, as in the merge we sometimes need to
look at "theirs".
(But I haven't managed to construct a TC)

t6038-merge-text-auto.sh

Torsten Bögershausen (1):
  t6038: different eol for "Merge addition of text=auto"

 t/t6038-merge-text-auto.sh | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v1 1/1] t6038: different eol for "Merge addition of text=auto"
  2016-05-15 22:14   ` Junio C Hamano
                       ` (2 preceding siblings ...)
  2016-05-30 17:00     ` [PATCH v1 0/1] t6038-merge-text-auto.sh tboegi
@ 2016-05-30 17:00     ` tboegi
  2016-06-07 15:20     ` [PATCH v2 0/3] unified auto CRLF handling, V2 tboegi
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-30 17:00 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

t6038 uses different code, depending if NATIVE_CRLF is set ot not.

Enhance the test coverage for cross-platform testing and run
"Merge addition of text=auto" with both lf and crlf as core.eol.

It is important to be run this test with crlf under Linux,
the day when the "unified auto handling will be introduced.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 t/t6038-merge-text-auto.sh | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 33b77ee..8846f5d 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -91,16 +91,13 @@ test_expect_success 'Merge after setting text=auto' '
 	compare_files expected file
 '
 
-test_expect_success 'Merge addition of text=auto' '
+test_expect_success 'Merge addition of text=auto eol=LF' '
+	git config core.eol lf &&
 	cat <<-\EOF >expected &&
 	first line
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -109,7 +106,28 @@ test_expect_success 'Merge addition of text=auto' '
 	compare_files  expected file
 '
 
+test_expect_success 'Merge addition of text=auto eol=CRLF' '
+	git config core.eol crlf &&
+	cat <<-\EOF >expected &&
+	first line
+	same line
+	EOF
+
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
+	git config merge.renormalize true &&
+	git rm -fr . &&
+	rm -f .gitattributes &&
+	git reset --hard b &&
+	echo >&2 "After git reset --hard b" &&
+	git ls-files -s --eol >&2 &&
+	git merge a &&
+	compare_files  expected file
+'
+
+
 test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
+	git config core.eol native &&
 	echo "<<<<<<<" >expected &&
 	if test_have_prereq NATIVE_CRLF; then
 		echo first line | append_cr >>expected &&
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v1 0/1] t6038-merge-text-auto.sh
  2016-05-30 17:00     ` [PATCH v1 0/1] t6038-merge-text-auto.sh tboegi
@ 2016-05-30 18:00       ` Junio C Hamano
  2016-05-30 18:48         ` Junio C Hamano
  0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-30 18:00 UTC (permalink / raw)
  To: tboegi; +Cc: git

tboegi@web.de writes:

> This is a little bit of a hen-and-egg problem:
> The problem comes up after the "unified auto handling".
> In theory, it should have been since before:
> get_sha1_from_index() says:
>
>  * We might be in the middle of a merge, in which
>  * case we would read stage #2 (ours).
>
> This seams wrong, as in the merge we sometimes need to
> look at "theirs".

The two comment you quoted is absolutely the right thing to do.
"In a merge, we sometimes need to look at 'theirs'" is like saying
"When we are dong 'git add', we need to look at what is in the
working tree".  It is total red herring.

Step back and think why we even look at what in the index in the
first place; it is to decide if we want or do not want to disable
the automatic CRLF -> LF conversion.  And think again the reason why
do we look at the index.

There may be a line with CRLF line endings in the new contents,
whether it came from a merge, cherry-pick, patch application, or
plain-simple "git add" from the working tree.  Auto-CRLF usually
says "We want CRLF turned into LF".  But the user misconfigured and
for this path the user might not want the conversion take place, in
which case you would disable the conversion.  Where do you take that
hint "the user might have misconfigured?" from?  By looking at what
the user _started_ her update from.  If the state before this "we
need to replace the blob in the index with a new contents, so we
need to hash the new contents to come up with the updated blob"
started contains CRLF already, that may be a hint--if we apply the
CRLF->LF conversion on the original, even if the "new contents" were
identical to what she already had, we would end up changing the blob
with her current configuration.  Hence we disable.

Isn't that the reasoning behind that "safe auto-crlf" thing?

The new contents getting integrated into her current state may have
CRLF, and if a merge or a cherry-pick leaves conflicts, they may be
stored in stage #3.  But paying attention to that to decide if we
want to disable Auto-CRLF conversion is simply wrong; you should
look at the CRLF in stage #3 as purely something that might need to
be converted, not something that affects the decision if it needs to
be converted, just like you view CRLF in a working tree file when
you do "git add"..

Imagine that you started from a history where somebody recorded a
text file with CRLF in the blob, unconverted.  Later the project
decided to express their text with LF to support cross-platform
development better, and sets up the Auto-CRLF.  Your user is working
near the tip of that history after the eol correction happened.  Now
she gets a pull-request of a branch that forked from an old point in
the history, before the eol correction and full of CRLF.  Yes, to
integrate the change being proposed, she needs to look at "theirs";
that's the whole point of a "merge".  Why should she revert the eol
correction her history has by getting fooled by the fact that the
update was based on a part of the history before the eol correction?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v1 0/1] t6038-merge-text-auto.sh
  2016-05-30 18:00       ` Junio C Hamano
@ 2016-05-30 18:48         ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-30 18:48 UTC (permalink / raw)
  To: tboegi; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Imagine that you started from a history where somebody recorded a
> text file with CRLF in the blob, unconverted.  Later the project
> decided to express their text with LF to support cross-platform
> development better, and sets up the Auto-CRLF.  Your user is working
> near the tip of that history after the eol correction happened.  Now
> she gets a pull-request of a branch that forked from an old point in
> the history, before the eol correction and full of CRLF.  Yes, to
> integrate the change being proposed, she needs to look at "theirs";
> that's the whole point of a "merge".  Why should she revert the eol
> correction her history has by getting fooled by the fact that the
> update was based on a part of the history before the eol correction?

Thinking aloud along this line a bit further, if you really cared
(and I don't feel very strongly, as I think "safe crlf" is an ugly
workaround that lets users keep their misconfiguration by penalizing
their day-to-day operation), you may want to think about doing a
3-way merge of "eol preference" beween all stages.

That is, if the common ancestor in stage #1 and the current version
in stage #2 both have its text in LF, and the data being merged in
stage #3 is in CRLF, you sayr "do not convert; the change being
brought in wants to have CRLF endings, while our history did not
care".  Similarly, if the common ancestor in stage #1 and the data
being merged in stage #3 both have CRLF, and your version in stage
#2 has LF, you say "We wanted to fix eol since the side branch
forked, and the side branch predates that fix, so we keep the eol
fix we did since we diverged", i.e. "Do convert".

For doing this, you may want to refactor the codepath that decides
"Auto-CRLF usually wants to turn CRLF in text to LF, but should we
disable that logic now, because the user already has CRLF in the
current one?" into a function that takes a single parameter, `path`,
and returns "Yes, do convert CRLF to LF / No, do not convert"
boolean.  Having a low level function that says "What's the blob at
this path in the index" and have the caller run that logic feels
unwieldy if we want to go that route.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 0/3] unified auto CRLF handling, V2
  2016-05-15 22:14   ` Junio C Hamano
                       ` (3 preceding siblings ...)
  2016-05-30 17:00     ` [PATCH v1 1/1] t6038: different eol for "Merge addition of text=auto" tboegi
@ 2016-06-07 15:20     ` tboegi
  2016-06-07 15:20     ` [PATCH v2 1/3] convert: unify the "auto" handling of CRLF tboegi
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

unified auto CRLF handling, V2
  1/3 is 7/10 of the old 10/10 series
  2/3 and 
  3/3 is a replacement for tb/convert-peek-in-index:
      Better commit message, added test case

All in all we are getting closer.
Most of the patches had been send & reviewed earlier,
but a critical review won't hurt.

Torsten Bögershausen (3):
  convert: unify the "auto" handling of CRLF
  read-cache: factor out get_sha1_from_index() helper
  Correct ce_compare_data() in a middle of a merge

 Documentation/config.txt        | 12 +++----
 Documentation/gitattributes.txt | 15 +++++----
 builtin/apply.c                 |  3 +-
 builtin/blame.c                 |  2 +-
 cache.h                         |  4 +++
 combine-diff.c                  |  3 +-
 convert.c                       | 65 +++++++++++++++++++++++-------------
 convert.h                       | 18 +++++++---
 diff.c                          |  3 +-
 dir.c                           |  2 +-
 read-cache.c                    | 33 +++++++++++-------
 sha1_file.c                     | 12 +++++--
 t/t0025-crlf-auto.sh            |  4 +--
 t/t0027-auto-crlf.sh            | 32 +++++++++---------
 t/t6038-merge-text-auto.sh      | 74 ++++++++++++++++++++++++-----------------
 15 files changed, 172 insertions(+), 110 deletions(-)

-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 1/3] convert: unify the "auto" handling of CRLF
  2016-05-15 22:14   ` Junio C Hamano
                       ` (4 preceding siblings ...)
  2016-06-07 15:20     ` [PATCH v2 0/3] unified auto CRLF handling, V2 tboegi
@ 2016-06-07 15:20     ` tboegi
  2016-06-07 15:20     ` [PATCH v2 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
  2016-06-07 15:20     ` [PATCH v2 3/3] Correct ce_compare_data() in a middle of a merge tboegi
  7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Before this change,
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes

would have the same effect as
$ echo "* text" >.gitattributes
$ git config core.eol crlf

Since the 'eol' attribute had higher priority than 'text=auto', this may
corrupt binary files and is not what most users expect to happen.

Make the 'eol' attribute to obey 'text=auto', and now
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
behaves the same as
$ echo "* text=auto" >.gitattributes
$ git config core.eol crlf

In other words,
$ echo "* text=auto eol=crlf" >.gitattributes
has the same effect as
$ git config core.autocrlf true

and
$ echo "* text=auto eol=lf" >.gitattributes
has the same effect as
$ git config core.autocrlf input

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 Documentation/config.txt        | 12 +++++-------
 Documentation/gitattributes.txt | 15 +++++++++------
 convert.c                       | 42 +++++++++++++++++++++--------------------
 convert.h                       |  3 ++-
 t/t0025-crlf-auto.sh            |  4 ++--
 t/t0027-auto-crlf.sh            | 32 +++++++++++++++----------------
 t/t6038-merge-text-auto.sh      | 23 ++++++++++++++--------
 7 files changed, 71 insertions(+), 60 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 53f00db..208b5de 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -405,13 +405,11 @@ file with mixed line endings would be reported by the `core.safecrlf`
 mechanism.
 
 core.autocrlf::
-	Setting this variable to "true" is almost the same as setting
-	the `text` attribute to "auto" on all files except that text
-	files are not guaranteed to be normalized: files that contain
-	`CRLF` in the repository will not be touched.  Use this
-	setting if you want to have `CRLF` line endings in your
-	working directory even though the repository does not have
-	normalized line endings.  This variable can be set to 'input',
+	Setting this variable to "true" is the same as setting
+	the `text` attribute to "auto" on all files and core.eol to "crlf".
+	Set to true if you want to have `CRLF` line endings in your
+	working directory and the repository has LF line endings.
+	This variable can be set to 'input',
 	in which case no output conversion is performed.
 
 core.symlinks::
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..d7a124b 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -115,6 +115,7 @@ text file is normalized, its line endings are converted to LF in the
 repository.  To control what line ending style is used in the working
 directory, use the `eol` attribute for a single file and the
 `core.eol` configuration variable for all text files.
+Note that `core.autocrlf` overrides `core.eol`
 
 Set::
 
@@ -130,8 +131,9 @@ Unset::
 Set to string value "auto"::
 
 	When `text` is set to "auto", the path is marked for automatic
-	end-of-line normalization.  If Git decides that the content is
-	text, its line endings are normalized to LF on checkin.
+	end-of-line conversion.  If Git decides that the content is
+	text, its line endings are converted to LF on checkin.
+	When the file has been commited with CRLF, no conversion is done.
 
 Unspecified::
 
@@ -146,7 +148,7 @@ unspecified.
 ^^^^^
 
 This attribute sets a specific line-ending style to be used in the
-working directory.  It enables end-of-line normalization without any
+working directory.  It enables end-of-line conversion without any
 content checks, effectively setting the `text` attribute.
 
 Set to string value "crlf"::
@@ -186,9 +188,10 @@ the working directory, and prevent .jpg files from being normalized
 regardless of their content.
 
 ------------------------
+*               text=auto
 *.txt		text
-*.vcproj	eol=crlf
-*.sh		eol=lf
+*.vcproj	text eol=crlf
+*.sh		text eol=lf
 *.jpg		-text
 ------------------------
 
@@ -198,7 +201,7 @@ normalization in Git.
 
 If you simply want to have CRLF line endings in your working directory
 regardless of the repository you are working with, you can set the
-config variable "core.autocrlf" without changing any attributes.
+config variable "core.autocrlf" without using any attributes.
 
 ------------------------
 [core]
diff --git a/convert.c b/convert.c
index b1614bf..67d69b5 100644
--- a/convert.c
+++ b/convert.c
@@ -176,7 +176,9 @@ static enum eol output_eol(enum crlf_action crlf_action)
 		return EOL_LF;
 	case CRLF_UNDEFINED:
 	case CRLF_AUTO_CRLF:
+		return EOL_CRLF;
 	case CRLF_AUTO_INPUT:
+		return EOL_LF;
 	case CRLF_TEXT:
 	case CRLF_AUTO:
 		/* fall through */
@@ -254,17 +256,15 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 	if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
 		if (convert_is_binary(len, &stats))
 			return 0;
-
-		if (crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
-			/*
-			 * If the file in the index has any CR in it, do not convert.
-			 * This is the new safer autocrlf handling.
-			 */
-			if (has_cr_in_index(path))
-				return 0;
-		}
+		/*
+		 * If the file in the index has any CR in it, do not convert.
+		 * This is the new safer autocrlf handling.
+		 */
+		if (checksafe == SAFE_CRLF_RENORMALIZE)
+			checksafe = SAFE_CRLF_FALSE;
+		else if (has_cr_in_index(path))
+			return 0;
 	}
-
 	check_safe_crlf(path, crlf_action, &stats, checksafe);
 
 	/* Optimization: No CRLF? Nothing to convert, regardless. */
@@ -320,12 +320,10 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 		return 0;
 
 	if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
-		if (crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
-			/* If we have any CR or CRLF line endings, we do not touch it */
-			/* This is the new safer autocrlf-handling */
-			if (stats.lonecr || stats.crlf )
-				return 0;
-		}
+		/* If we have any CR or CRLF line endings, we do not touch it */
+		/* This is the new safer autocrlf-handling */
+		if (stats.lonecr || stats.crlf )
+			return 0;
 
 		if (convert_is_binary(len, &stats))
 			return 0;
@@ -786,7 +784,11 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
 		ca->drv = git_path_check_convert(ccheck + 2);
 		if (ca->crlf_action != CRLF_BINARY) {
 			enum eol eol_attr = git_path_check_eol(ccheck + 3);
-			if (eol_attr == EOL_LF)
+			if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
+				ca->crlf_action = CRLF_AUTO_INPUT;
+			else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF)
+				ca->crlf_action = CRLF_AUTO_CRLF;
+			else if (eol_attr == EOL_LF)
 				ca->crlf_action = CRLF_TEXT_INPUT;
 			else if (eol_attr == EOL_CRLF)
 				ca->crlf_action = CRLF_TEXT_CRLF;
@@ -845,9 +847,9 @@ const char *get_convert_attr_ascii(const char *path)
 	case CRLF_AUTO:
 		return "text=auto";
 	case CRLF_AUTO_CRLF:
-		return "text=auto eol=crlf"; /* This is not supported yet */
+		return "text=auto eol=crlf";
 	case CRLF_AUTO_INPUT:
-		return "text=auto eol=lf"; /* This is not supported yet */
+		return "text=auto eol=lf";
 	}
 	return "";
 }
@@ -949,7 +951,7 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
 		src = dst->buf;
 		len = dst->len;
 	}
-	return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE);
+	return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_RENORMALIZE);
 }
 
 /*****************************************************************
diff --git a/convert.h b/convert.h
index ccf436b..81b6cdf 100644
--- a/convert.h
+++ b/convert.h
@@ -7,7 +7,8 @@
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
-	SAFE_CRLF_WARN = 2
+	SAFE_CRLF_WARN = 2,
+	SAFE_CRLF_RENORMALIZE = 4
 };
 
 extern enum safe_crlf safe_crlf;
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index c164b46..d0bee08 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -114,7 +114,7 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
 	test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
-test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'text=auto, autocrlf=true does not normalize CRLF files' '
 
 	rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
 	git config core.autocrlf true &&
@@ -126,7 +126,7 @@ test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
 	LFonlydiff=$(git diff LFonly) &&
 	CRLFonlydiff=$(git diff CRLFonly) &&
 	LFwithNULdiff=$(git diff LFwithNUL) &&
-	test -z "$LFonlydiff" -a -n "$CRLFonlydiff" -a -z "$LFwithNULdiff"
+	test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
 '
 
 test_expect_success 'text=auto, autocrlf=true does not normalize binary files' '
diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 9372589..8367d0b 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -175,8 +175,8 @@ attr_ascii () {
 	text,lf)   echo "text eol=lf" ;;
 	text,crlf) echo "text eol=crlf" ;;
 	auto,)     echo "text=auto" ;;
-	auto,lf)   echo "text eol=lf" ;;
-	auto,crlf) echo "text eol=crlf" ;;
+	auto,lf)   echo "text=auto eol=lf" ;;
+	auto,crlf) echo "text=auto eol=crlf" ;;
 	lf,)       echo "text eol=lf" ;;
 	crlf,)     echo "text eol=crlf" ;;
 	,) echo "" ;;
@@ -397,10 +397,9 @@ commit_chk_wrnNNO ""      ""      false   ""        ""        ""          ""
 commit_chk_wrnNNO ""      ""      true    LF_CRLF   ""        ""          ""          ""
 commit_chk_wrnNNO ""      ""      input   ""        ""        ""          ""          ""
 
-commit_chk_wrnNNO "auto"  ""      false   "$WILC"   "$WICL"   "$WAMIX"    ""          ""
-commit_chk_wrnNNO "auto"  ""      true    LF_CRLF   ""        LF_CRLF     ""          ""
-commit_chk_wrnNNO "auto"  ""      input   ""        CRLF_LF   CRLF_LF     ""          ""
-
+commit_chk_wrnNNO "auto"  ""      false   "$WILC"   ""        ""          ""          ""
+commit_chk_wrnNNO "auto"  ""      true    LF_CRLF   ""        ""          ""          ""
+commit_chk_wrnNNO "auto"  ""      input   ""        ""        ""          ""          ""
 for crlf in true false input
 do
 	commit_chk_wrnNNO -text ""      $crlf   ""        ""        ""          ""          ""
@@ -408,8 +407,8 @@ do
 	commit_chk_wrnNNO -text crlf    $crlf   ""        ""        ""          ""          ""
 	commit_chk_wrnNNO ""    lf      $crlf   ""       CRLF_LF    CRLF_LF      ""         CRLF_LF
 	commit_chk_wrnNNO ""    crlf    $crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
-	commit_chk_wrnNNO auto  lf    	$crlf   ""       CRLF_LF    CRLF_LF     ""          CRLF_LF
-	commit_chk_wrnNNO auto  crlf  	$crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
+	commit_chk_wrnNNO auto  lf    	$crlf   ""        ""        ""          ""          ""
+	commit_chk_wrnNNO auto  crlf  	$crlf   LF_CRLF   ""        ""          ""          ""
 	commit_chk_wrnNNO text  lf    	$crlf   ""       CRLF_LF    CRLF_LF     ""          CRLF_LF
 	commit_chk_wrnNNO text  crlf  	$crlf   LF_CRLF   ""        LF_CRLF     LF_CRLF     ""
 done
@@ -454,9 +453,9 @@ do
 	check_in_repo_NNO -text ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
 	check_in_repo_NNO -text lf     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
 	check_in_repo_NNO -text crlf   $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
-	check_in_repo_NNO auto  ""     $crlf   LF  LF    LF           LF_mix_CR  CRLF_nul
-	check_in_repo_NNO auto  lf     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
-	check_in_repo_NNO auto  crlf   $crlf   LF  LF    LF           LF_mix_CR  LF_nul
+	check_in_repo_NNO auto  ""     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+	check_in_repo_NNO auto  lf     $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
+	check_in_repo_NNO auto  crlf   $crlf   LF  CRLF  CRLF_mix_LF  LF_mix_CR  CRLF_nul
 	check_in_repo_NNO text  ""     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
 	check_in_repo_NNO text  lf     $crlf   LF  LF    LF           LF_mix_CR  LF_nul
 	check_in_repo_NNO text  crlf   $crlf   LF  LF    LF           LF_mix_CR  LF_nul
@@ -493,7 +492,8 @@ fi
 export CRLF_MIX_LF_CR MIX NL
 
 # Same handling with and without ident
-for id in "" ident
+#for id in "" ident
+for id in ""
 do
 	for ceol in lf crlf native
 	do
@@ -509,7 +509,7 @@ do
 			checkout_files text  "$id" "crlf" "$crlf" "$ceol"  CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
 			# currently the same as text, eol=XXX
 			checkout_files auto  "$id" "lf"   "$crlf" "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
-			checkout_files auto  "$id" "crlf" "$crlf" "$ceol"  CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
+			checkout_files auto  "$id" "crlf" "$crlf" "$ceol"  CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
 		done
 
 		# core.autocrlf false, different core.eol
@@ -517,7 +517,7 @@ do
 		# core.autocrlf true
 		checkout_files   ""    "$id" ""     true    "$ceol"  CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
 		# text: core.autocrlf = true overrides core.eol
-		checkout_files   auto  "$id" ""     true    "$ceol"  CRLF  CRLF  CRLF         LF_mix_CR    LF_nul
+		checkout_files   auto  "$id" ""     true    "$ceol"  CRLF  CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
 		checkout_files   text  "$id" ""     true    "$ceol"  CRLF  CRLF  CRLF         CRLF_mix_CR  CRLF_nul
 		# text: core.autocrlf = input overrides core.eol
 		checkout_files   text  "$id" ""     input   "$ceol"  LF    CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
@@ -531,8 +531,8 @@ do
 	checkout_files     text  "$id" ""     false   ""       $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
 	checkout_files     text  "$id" ""     false   native   $NL   CRLF  $MIX_CRLF_LF $MIX_LF_CR   $LFNUL
 	# auto: core.autocrlf=false and core.eol unset(or native) uses native eol
-	checkout_files     auto  "$id" ""     false   ""       $NL   CRLF  $MIX_CRLF_LF LF_mix_CR    LF_nul
-	checkout_files     auto  "$id" ""     false   native   $NL   CRLF  $MIX_CRLF_LF LF_mix_CR    LF_nul
+	checkout_files     auto  "$id" ""     false   ""       $NL   CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
+	checkout_files     auto  "$id" ""     false   native   $NL   CRLF  CRLF_mix_LF  LF_mix_CR    LF_nul
 done
 
 # Should be the last test case: remove some files from the worktree
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 85c10b0..33b77ee 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -16,6 +16,13 @@ test_description='CRLF merge conflict across text=auto change
 
 test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
 
+compare_files () {
+	tr '\015\000' QN <"$1" >"$1".expect &&
+	tr '\015\000' QN <"$2" >"$2".actual &&
+	test_cmp "$1".expect "$2".actual &&
+	rm "$1".expect "$2".actual
+}
+
 test_expect_success setup '
 	git config core.autocrlf false &&
 
@@ -30,7 +37,7 @@ test_expect_success setup '
 	git branch side &&
 
 	echo "* text=auto" >.gitattributes &&
-	touch file &&
+	echo first line >file &&
 	git add .gitattributes file &&
 	test_tick &&
 	git commit -m "normalize file" &&
@@ -81,7 +88,7 @@ test_expect_success 'Merge after setting text=auto' '
 	rm -f .gitattributes &&
 	git reset --hard a &&
 	git merge b &&
-	test_cmp expected file
+	compare_files expected file
 '
 
 test_expect_success 'Merge addition of text=auto' '
@@ -99,7 +106,7 @@ test_expect_success 'Merge addition of text=auto' '
 	rm -f .gitattributes &&
 	git reset --hard b &&
 	git merge a &&
-	test_cmp expected file
+	compare_files  expected file
 '
 
 test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
@@ -121,7 +128,7 @@ test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
 	git reset --hard a &&
 	test_must_fail git merge b &&
 	fuzz_conflict file >file.fuzzy &&
-	test_cmp expected file.fuzzy
+	compare_files expected file.fuzzy
 '
 
 test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
@@ -143,7 +150,7 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
 	git reset --hard b &&
 	test_must_fail git merge a &&
 	fuzz_conflict file >file.fuzzy &&
-	test_cmp expected file.fuzzy
+	compare_files expected file.fuzzy
 '
 
 test_expect_failure 'checkout -m after setting text=auto' '
@@ -158,7 +165,7 @@ test_expect_failure 'checkout -m after setting text=auto' '
 	git reset --hard initial &&
 	git checkout a -- . &&
 	git checkout -m b &&
-	test_cmp expected file
+	compare_files expected file
 '
 
 test_expect_failure 'checkout -m addition of text=auto' '
@@ -173,7 +180,7 @@ test_expect_failure 'checkout -m addition of text=auto' '
 	git reset --hard initial &&
 	git checkout b -- . &&
 	git checkout -m a &&
-	test_cmp expected file
+	compare_files expected file
 '
 
 test_expect_failure 'cherry-pick patch from after text=auto was added' '
@@ -187,7 +194,7 @@ test_expect_failure 'cherry-pick patch from after text=auto was added' '
 	git reset --hard b &&
 	test_must_fail git cherry-pick a >err 2>&1 &&
 	grep "[Nn]othing added" err &&
-	test_cmp expected file
+	compare_files expected file
 '
 
 test_expect_success 'Test delete/normalize conflict' '
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 2/3] read-cache: factor out get_sha1_from_index() helper
  2016-05-15 22:14   ` Junio C Hamano
                       ` (5 preceding siblings ...)
  2016-06-07 15:20     ` [PATCH v2 1/3] convert: unify the "auto" handling of CRLF tboegi
@ 2016-06-07 15:20     ` tboegi
  2016-06-07 15:20     ` [PATCH v2 3/3] Correct ce_compare_data() in a middle of a merge tboegi
  7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().

This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.

Add a wrapper definition get_sha1_from_cache().

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 cache.h      |  3 +++
 read-cache.c | 29 ++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 6049f86..322ee40 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path)  get_sha1_from_index (&the_index, (path))
 #endif
 
 enum object_type {
@@ -1050,6 +1051,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
 	return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
 }
 
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
 /*
  * This internal function is only declared here for the benefit of
  * lookup_replace_object().  Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
 
 void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
 {
-	int pos, len;
+	const unsigned char *sha1;
 	unsigned long sz;
 	enum object_type type;
 	void *data;
 
-	len = strlen(path);
-	pos = index_name_pos(istate, path, len);
+	sha1 = get_sha1_from_index(istate, path);
+	if (!sha1)
+		return NULL;
+	data = read_sha1_file(sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return NULL;
+	}
+	if (size)
+		*size = sz;
+	return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+	int pos = index_name_pos(istate, path, strlen(path));
 	if (pos < 0) {
 		/*
 		 * We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
 	}
 	if (pos < 0)
 		return NULL;
-	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
-	if (!data || type != OBJ_BLOB) {
-		free(data);
-		return NULL;
-	}
-	if (size)
-		*size = sz;
-	return data;
+	return (istate->cache[pos]->sha1);
 }
 
 void stat_validity_clear(struct stat_validity *sv)
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 3/3] Correct ce_compare_data() in a middle of a merge
  2016-05-15 22:14   ` Junio C Hamano
                       ` (6 preceding siblings ...)
  2016-06-07 15:20     ` [PATCH v2 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-06-07 15:20     ` tboegi
  7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
  To: git; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

The following didn't work as expected:
- In a middle of a merge
- merge.renormalize is true,
- .gitattributes = "* text=auto"
- core.eol = crlf

Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob
with LF "first line\nsame line\n".

The expected result of the merge is "first line\nsame line\n".

The content in the working tree is "first line\r\nsame line\r\n",
and ce_compare_data() should find that the content is clean and return 0.

Deep down crlf_to_git() is invoked, to check if CRLF are converted or not.

The "new safer autocrlf handling" calls blob_has_cr().

Instead of using the sha1 of the blob, (CRLF in this example),
the function get_sha1_from_index() is invoked.
get_sha1_from_index() decides to return "ours" when in the middle of
the merge, which is LF.

As a result, the CRLF in the worktree are converted into LF before
the comparison.
The contents of LF and CRLF don't match any more.

The problem is that ce_compare_data() has ce->sha1, but the sha1 is lost
on it's way into blob_has_cr().

Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
that blob_has_cr() looks at the appropriate blob.

Add a new parameter index_blob_sha1 to convert_to_git(), and forward the
sha1 from ce_compare_data() into convert_to_git(). Other callers use NULL
for index_blob_sha1, and the sha1 is determined from path
using get_sha1_from_cache(path). This is the same handling as before.

In the same spirit, forward the sha1 into would_convert_to_git().

While at it, rename has_cr_in_index() into blob_has_cr()
and replace 0 with SAFE_CRLF_FALSE.

Add a TC in t6038 to have a test coverage under Linux.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 builtin/apply.c            |  3 ++-
 builtin/blame.c            |  2 +-
 cache.h                    |  1 +
 combine-diff.c             |  3 ++-
 convert.c                  | 43 ++++++++++++++++++++++++++------------
 convert.h                  | 15 ++++++++++----
 diff.c                     |  3 ++-
 dir.c                      |  2 +-
 read-cache.c               |  4 +++-
 sha1_file.c                | 12 ++++++++---
 t/t6038-merge-text-auto.sh | 51 +++++++++++++++++++++++++---------------------
 11 files changed, 90 insertions(+), 49 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 8e4da2e..0cf9a0a 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -2140,7 +2140,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
 	case S_IFREG:
 		if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
 			return error(_("unable to open or read %s"), path);
-		convert_to_git(path, buf->buf, buf->len, buf, 0);
+		convert_to_git(path, buf->buf, buf->len, buf,
+			       SAFE_CRLF_FALSE, NULL);
 		return 0;
 	default:
 		return -1;
diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0..1c523b6 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2377,7 +2377,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		if (strbuf_read(&buf, 0, 0) < 0)
 			die_errno("failed to read from stdin");
 	}
-	convert_to_git(path, buf.buf, buf.len, &buf, 0);
+	convert_to_git(path, buf.buf, buf.len, &buf, SAFE_CRLF_FALSE, NULL);
 	origin->file.ptr = buf.buf;
 	origin->file.size = buf.len;
 	pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);
diff --git a/cache.h b/cache.h
index 322ee40..86f48f9 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
 
 #define HASH_WRITE_OBJECT 1
 #define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
 
diff --git a/combine-diff.c b/combine-diff.c
index 8f2313d..8b535a7 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1052,7 +1052,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
 			if (is_file) {
 				struct strbuf buf = STRBUF_INIT;
 
-				if (convert_to_git(elem->path, result, len, &buf, safe_crlf)) {
+				if (convert_to_git(elem->path, result, len,
+						   &buf, safe_crlf, NULL)) {
 					free(result);
 					result = strbuf_detach(&buf, &len);
 					result_size = len;
diff --git a/convert.c b/convert.c
index 67d69b5..802ee7c 100644
--- a/convert.c
+++ b/convert.c
@@ -219,23 +219,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
 	}
 }
 
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *index_blob_sha1)
 {
 	unsigned long sz;
 	void *data;
-	int has_cr;
-
-	data = read_blob_data_from_cache(path, &sz);
+	int has_cr = 0;
+	enum object_type type;
+	if (!index_blob_sha1)
+		return 0;
+	data = read_sha1_file(index_blob_sha1, &type, &sz);
 	if (!data)
 		return 0;
-	has_cr = memchr(data, '\r', sz) != NULL;
+	if (type == OBJ_BLOB)
+		has_cr = memchr(data, '\r', sz) != NULL;
+
 	free(data);
 	return has_cr;
 }
 
 static int crlf_to_git(const char *path, const char *src, size_t len,
 		       struct strbuf *buf,
-		       enum crlf_action crlf_action, enum safe_crlf checksafe)
+		       enum crlf_action crlf_action, enum safe_crlf checksafe,
+		       const unsigned char *index_blob_sha1)
 {
 	struct text_stat stats;
 	char *dst;
@@ -256,14 +261,23 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 	if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
 		if (convert_is_binary(len, &stats))
 			return 0;
+
 		/*
 		 * If the file in the index has any CR in it, do not convert.
 		 * This is the new safer autocrlf handling.
 		 */
 		if (checksafe == SAFE_CRLF_RENORMALIZE)
 			checksafe = SAFE_CRLF_FALSE;
-		else if (has_cr_in_index(path))
-			return 0;
+		else {
+			/*
+			 * If the file in the index has any CR in it, do not convert.
+			 * This is the new safer autocrlf handling.
+			 */
+			if (!index_blob_sha1)
+				index_blob_sha1 = get_sha1_from_cache(path);
+			if (blob_has_cr(index_blob_sha1))
+				return 0;
+		}
 	}
 	check_safe_crlf(path, crlf_action, &stats, checksafe);
 
@@ -855,7 +869,8 @@ const char *get_convert_attr_ascii(const char *path)
 }
 
 int convert_to_git(const char *path, const char *src, size_t len,
-                   struct strbuf *dst, enum safe_crlf checksafe)
+		   struct strbuf *dst, enum safe_crlf checksafe,
+		   const unsigned char *index_blob_sha1)
 {
 	int ret = 0;
 	const char *filter = NULL;
@@ -876,7 +891,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 		src = dst->buf;
 		len = dst->len;
 	}
-	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe, index_blob_sha1);
 	if (ret && dst) {
 		src = dst->buf;
 		len = dst->len;
@@ -885,7 +900,8 @@ int convert_to_git(const char *path, const char *src, size_t len,
 }
 
 void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
-			      enum safe_crlf checksafe)
+			      enum safe_crlf checksafe,
+			      const unsigned char *index_blob_sha1)
 {
 	struct conv_attrs ca;
 	convert_attrs(&ca, path);
@@ -896,7 +912,8 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
-	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
+		    checksafe, index_blob_sha1);
 	ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
 }
 
@@ -951,7 +968,7 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
 		src = dst->buf;
 		len = dst->len;
 	}
-	return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_RENORMALIZE);
+	return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_RENORMALIZE, NULL);
 }
 
 /*****************************************************************
diff --git a/convert.h b/convert.h
index 81b6cdf..8c34dd5 100644
--- a/convert.h
+++ b/convert.h
@@ -39,19 +39,26 @@ extern const char *get_convert_attr_ascii(const char *path);
 
 /* returns 1 if *dst was used */
 extern int convert_to_git(const char *path, const char *src, size_t len,
-			  struct strbuf *dst, enum safe_crlf checksafe);
+			  struct strbuf *dst, enum safe_crlf checksafe,
+			  const unsigned char *index_blob_sha1);
+
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
-static inline int would_convert_to_git(const char *path)
+static inline int would_convert_to_git(const char *path,
+				       const unsigned char *index_blob_sha1)
 {
-	return convert_to_git(path, NULL, 0, NULL, 0);
+	return convert_to_git(path, NULL, 0, NULL, SAFE_CRLF_FALSE,
+			      index_blob_sha1);
 }
+
 /* Precondition: would_convert_to_git_filter_fd(path) == true */
 extern void convert_to_git_filter_fd(const char *path, int fd,
 				     struct strbuf *dst,
-				     enum safe_crlf checksafe);
+				     enum safe_crlf checksafe,
+				     const unsigned char *index_blob_sha1);
+
 extern int would_convert_to_git_filter_fd(const char *path);
 
 /*****************************************************************
diff --git a/diff.c b/diff.c
index d3734d3..a8308e0 100644
--- a/diff.c
+++ b/diff.c
@@ -2810,7 +2810,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
 		/*
 		 * Convert from working tree format to canonical git format
 		 */
-		if (convert_to_git(s->path, s->data, s->size, &buf, crlf_warn)) {
+		if (convert_to_git(s->path, s->data, s->size, &buf,
+				   crlf_warn, NULL)) {
 			size_t size = 0;
 			munmap(s->data, s->size);
 			s->should_munmap = 0;
diff --git a/dir.c b/dir.c
index 6172b34..66f1ffb 100644
--- a/dir.c
+++ b/dir.c
@@ -712,7 +712,7 @@ static int add_excludes(const char *fname, const char *base, int baselen,
 				 (pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
 				 !ce_stage(active_cache[pos]) &&
 				 ce_uptodate(active_cache[pos]) &&
-				 !would_convert_to_git(fname))
+				 !would_convert_to_git(fname, NULL))
 				hashcpy(sha1_stat->sha1, active_cache[pos]->sha1);
 			else
 				hash_sha1_file(buf, size, "blob", sha1_stat->sha1);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 
 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_USE_SHA_NOT_PATH;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}
diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..10630f0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3273,6 +3273,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 
 	if (!type)
 		type = OBJ_BLOB;
@@ -3283,7 +3284,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	if ((type == OBJ_BLOB) && path) {
 		struct strbuf nbuf = STRBUF_INIT;
 		if (convert_to_git(path, buf, size, &nbuf,
-				   write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+				   write_object ? safe_crlf : SAFE_CRLF_FALSE,
+				   valid_sha1 ? sha1 : NULL)) {
 			buf = strbuf_detach(&nbuf, &size);
 			re_allocated = 1;
 		}
@@ -3311,13 +3313,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
 {
 	int ret;
 	const int write_object = flags & HASH_WRITE_OBJECT;
+	const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
 	struct strbuf sbuf = STRBUF_INIT;
 
 	assert(path);
 	assert(would_convert_to_git_filter_fd(path));
 
 	convert_to_git_filter_fd(path, fd, &sbuf,
-				 write_object ? safe_crlf : SAFE_CRLF_FALSE);
+				 write_object ? safe_crlf : SAFE_CRLF_FALSE,
+				 valid_sha1 ? sha1 : NULL);
 
 	if (write_object)
 		ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
@@ -3394,6 +3398,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	     enum object_type type, const char *path, unsigned flags)
 {
 	int ret;
+	const unsigned char *sha1_ce;
+	sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
 
 	/*
 	 * Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3404,7 +3410,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	else if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path)))
+		 (path && would_convert_to_git(path,sha1_ce)))
 		ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
 				 flags);
 	else
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 33b77ee..5e8d5fa 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -91,16 +91,13 @@ test_expect_success 'Merge after setting text=auto' '
 	compare_files expected file
 '
 
-test_expect_success 'Merge addition of text=auto' '
+test_expect_success 'Merge addition of text=auto eol=LF' '
+	git config core.eol lf &&
 	cat <<-\EOF >expected &&
 	first line
 	same line
 	EOF
 
-	if test_have_prereq NATIVE_CRLF; then
-		append_cr <expected >expected.temp &&
-		mv expected.temp expected
-	fi &&
 	git config merge.renormalize true &&
 	git rm -fr . &&
 	rm -f .gitattributes &&
@@ -109,17 +106,31 @@ test_expect_success 'Merge addition of text=auto' '
 	compare_files  expected file
 '
 
+test_expect_success 'Merge addition of text=auto eol=CRLF' '
+	git config core.eol crlf &&
+	cat <<-\EOF >expected &&
+	first line
+	same line
+	EOF
+
+	append_cr <expected >expected.temp &&
+	mv expected.temp expected &&
+	git config merge.renormalize true &&
+	git rm -fr . &&
+	rm -f .gitattributes &&
+	git reset --hard b &&
+	echo >&2 "After git reset --hard b" &&
+	git ls-files -s --eol >&2 &&
+	git merge a &&
+	compare_files  expected file
+'
+
 test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
+	git config core.eol native &&
 	echo "<<<<<<<" >expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected &&
-		echo ======= | append_cr >>expected
-	else
-		echo first line >>expected &&
-		echo same line >>expected &&
-		echo ======= >>expected
-	fi &&
+	echo first line >>expected &&
+	echo same line >>expected &&
+	echo ======= >>expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
 	echo ">>>>>>>" >>expected &&
@@ -135,15 +146,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
 	echo "<<<<<<<" >expected &&
 	echo first line | append_cr >>expected &&
 	echo same line | append_cr >>expected &&
-	if test_have_prereq NATIVE_CRLF; then
-		echo ======= | append_cr >>expected &&
-		echo first line | append_cr >>expected &&
-		echo same line | append_cr >>expected
-	else
-		echo ======= >>expected &&
-		echo first line >>expected &&
-		echo same line >>expected
-	fi &&
+	echo ======= >>expected &&
+	echo first line >>expected &&
+	echo same line >>expected &&
 	echo ">>>>>>>" >>expected &&
 	git config merge.renormalize false &&
 	rm -f .gitattributes &&
-- 
2.0.0.rc1.6318.g0c2c796

^ permalink raw reply related	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2016-06-07 15:15 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-13 13:49 Bug report: Duplicate CRLF rewrite warnings on commit Adam Dinwoodie
2016-05-13 16:43 ` Junio C Hamano
2016-05-14  5:40   ` Torsten Bögershausen
2016-05-14 11:17     ` Adam Dinwoodie
2016-05-13 18:12 ` Jeff King
2016-05-13 19:46   ` Junio C Hamano
2016-05-13 19:53     ` Jeff King
2016-05-15  6:08 ` [PATCH/RFC v1 0/1] Quickfix ?No duplicate " tboegi
2016-05-15  6:08 ` [PATCH/RFC v1 1/1] No " tboegi
2016-05-15  6:15   ` Eric Sunshine
2016-05-15  6:37 ` [PATCH v1 0/3] CRLF-Handling: bug fix around ce_compare_data() tboegi
2016-05-15  6:38 ` [PATCH v1 1/3] t6038; use crlf on all platforms tboegi
2016-05-15  6:42   ` Eric Sunshine
2016-05-15  6:38 ` [PATCH v1 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-15  6:45   ` Eric Sunshine
2016-05-15  6:38 ` [PATCH v1 3/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
2016-05-15  6:52   ` Eric Sunshine
2016-05-15 22:14   ` Junio C Hamano
2016-05-16 15:51     ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
2016-05-16 16:13       ` Junio C Hamano
2016-05-17  4:08         ` Torsten Bögershausen
2016-05-17 16:09           ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
2016-05-17 18:39             ` Junio C Hamano
2016-05-17 16:41           ` [PATCH v4 0/2] CRLF: ce_compare_data() checks for a sha1 of a path tboegi
2016-05-17 16:41           ` [PATCH v4 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-17 16:41           ` [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
2016-05-17 18:58             ` Junio C Hamano
2016-05-18  4:26               ` Torsten Bögershausen
2016-05-18 15:10                 ` Torsten Bögershausen
2016-05-19 14:21           ` [PATCH v5 0/2] CRLF: " tboegi
2016-05-19 23:10             ` Junio C Hamano
2016-05-19 14:21           ` [PATCH v5 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-19 14:21           ` [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
2016-05-19 23:03             ` Junio C Hamano
2016-05-20 17:12               ` [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-20 17:12               ` [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
2016-05-20 17:46                 ` Junio C Hamano
2016-05-21 10:01               ` [PATCH v7 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-21 10:01               ` [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
2016-05-24 18:36                 ` Junio C Hamano
2016-05-16 15:51     ` [PATCH v3 1/1] " tboegi
2016-05-30 17:00     ` [PATCH v1 0/1] t6038-merge-text-auto.sh tboegi
2016-05-30 18:00       ` Junio C Hamano
2016-05-30 18:48         ` Junio C Hamano
2016-05-30 17:00     ` [PATCH v1 1/1] t6038: different eol for "Merge addition of text=auto" tboegi
2016-06-07 15:20     ` [PATCH v2 0/3] unified auto CRLF handling, V2 tboegi
2016-06-07 15:20     ` [PATCH v2 1/3] convert: unify the "auto" handling of CRLF tboegi
2016-06-07 15:20     ` [PATCH v2 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
2016-06-07 15:20     ` [PATCH v2 3/3] Correct ce_compare_data() in a middle of a merge tboegi
2016-05-15 13:02 ` [PATCH v2 0/3] CRLF-Handling: bug fix around ce_compare_data() tboegi
2016-05-15 13:02 ` [PATCH v2 1/3] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-15 13:02 ` [PATCH v2 2/3] convert: ce_compare_data() checks for a sha1 of a path tboegi
2016-05-15 13:02 ` [PATCH v2 3/3] t6038; use crlf on all platforms tboegi

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).