git@vger.kernel.org mailing list mirror (one of many)
* git blame breaking on repository with CRLF files
@ 2015-08-07 16:32 Benkstein, Frank
  2015-08-08  5:58 ` Torsten Bögershausen
  0 siblings, 1 reply; 7+ messages in thread
From: Benkstein, Frank @ 2015-08-07 16:32 UTC (permalink / raw)
  To: git@vger.kernel.org

Hello,

I am working on Linux and am examining code in a git repository I do
not know much about.  I am only looking at files, not changing anything.  On
some files in the repository I get "00000000 (Not Committed Yet" for all lines
when running "git blame".  I checked with "git status", "git reset", "git
clean" that the files are indeed in the repository and unmodified.  I noticed
that this only happens with git v2.5.0.  With git v2.4.0 it looks correct, i.e.
the output has proper commit ids, author names and dates.  With "git bisect" I
tracked this down to the following commit:

 commit 4bf256d67a85bed1e175ecc2706322eafe4489ca (HEAD, refs/bisect/bad)
 Author: Torsten Bögershausen <tboegi@web.de>
 Date:   Sun May 3 18:38:01 2015 +0200

     blame: CRLF in the working tree and LF in the repo

Digging further, it seems that most files in the repository are checked in with
CRLF line endings.  In my working tree these are checked out as LF - which
seems to be the exact opposite situation of what the commit is trying to
address.  When I set "core.autocrlf" to "false" I also get the correct behavior
of "git blame" - this is a workaround as long as I do not have to actually
modify anything.

Best regards,
Frank.


* Re: git blame breaking on repository with CRLF files
  2015-08-07 16:32 git blame breaking on repository with CRLF files Benkstein, Frank
@ 2015-08-08  5:58 ` Torsten Bögershausen
  2015-08-09 20:19   ` Torsten Bögershausen
  0 siblings, 1 reply; 7+ messages in thread
From: Torsten Bögershausen @ 2015-08-08  5:58 UTC (permalink / raw)
  To: Benkstein, Frank, git@vger.kernel.org

On 2015-08-07 18.32, Benkstein, Frank wrote:
> Hello,
> 
> I am working on Linux and am examining code in a git repository I do
> not know much about.  I am only looking at files, not changing anything.  On
> some files in the repository I get "00000000 (Not Committed Yet" for all lines
> when running "git blame".  I checked with "git status", "git reset", "git
> clean" that the files are indeed in the repository and unmodified.  I noticed
> that this only happens with git v2.5.0.  With git v2.4.0 it looks correct, i.e.
> the output has proper commit ids, author names and dates.  With "git bisect" I
> tracked this down to the following commit:
> 
>  commit 4bf256d67a85bed1e175ecc2706322eafe4489ca (HEAD, refs/bisect/bad)
>  Author: Torsten Bögershausen <tboegi@web.de>
>  Date:   Sun May 3 18:38:01 2015 +0200
> 
>      blame: CRLF in the working tree and LF in the repo
> 
> Digging further, it seems that most files in the repository are checked in with
> CRLF line endings.  In my working tree these are checked out as LF
Do I understand correctly that you have files in the repo with CRLF?
And these files are checked out with LF in the working tree?
Are the files marked in .gitattributes?
Or does the file have mixed line endings?

(Unless I missed something: Git never converts CRLF to LF at checkout,
so I wonder how you ended up in this situation.)

Is there a way to reproduce it?

> - which
> seems to be the exact opposite situation of what the commit is trying to
> address.  When I set "core.autocrlf" to "false" I also get the correct behavior
> of "git blame" - this is a workaround as long as I do not have to actually
> modify anything.
> 
> Best regards,
> Frank.


* Re: git blame breaking on repository with CRLF files
  2015-08-08  5:58 ` Torsten Bögershausen
@ 2015-08-09 20:19   ` Torsten Bögershausen
  2015-08-10  8:36     ` Benkstein, Frank
  2015-08-10 18:48     ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Torsten Bögershausen @ 2015-08-09 20:19 UTC (permalink / raw)
  To: Torsten Bögershausen, Benkstein, Frank, git@vger.kernel.org

On 2015-08-08 07.58, Torsten Bögershausen wrote:
> On 2015-08-07 18.32, Benkstein, Frank wrote:
>> Hello,
>>
>> I am working on Linux and am examining code in a git repository I do
>> not know much about.  I am only looking at files, not changing anything.  On
>> some files in the repository I get "00000000 (Not Committed Yet" for all lines
>> when running "git blame".  I checked with "git status", "git reset", "git
>> clean" that the files are indeed in the repository and unmodified.  I noticed
>> that this only happens with git v2.5.0.  With git v2.4.0 it looks correct, i.e.
>> the output has proper commit ids, author names and dates.  With "git bisect" I
>> tracked this down to the following commit:
>>
>>  commit 4bf256d67a85bed1e175ecc2706322eafe4489ca (HEAD, refs/bisect/bad)
>>  Author: Torsten Bögershausen <tboegi@web.de>
>>  Date:   Sun May 3 18:38:01 2015 +0200
>>
>>      blame: CRLF in the working tree and LF in the repo
>>
>> Digging further, it seems that most files in the repository are checked in with
>> CRLF line endings.  In my working tree these are checked out as LF
> Do I understand it right that you have files in the repo with CRLF ?
> And these files are checked out with LF in the working tree ?
> Are the files marked with .gitattributes ?
> Or does the file have mixed line endings ?
> 
> (Unless I missed something: Git never strips CRLF into LF at checkout,
> so I wonder how you ended up in this situation)
> 
> Is there a way to reproduce it?
> 
Actually I could reproduce the following:
CRLF in repo, CRLF in working tree, core.autocrlf=true.

This is an old limitation (or call it a bug) that has been there for a long
time (I tested with Git v1.7.0 from 2010).

Thanks for the report, we will see if anybody is able to fix it.
I can probably contribute some test cases.

>> seems to be the exact opposite situation of what the commit is trying to
>> address.  When I set "core.autocrlf" to "false" I also get the correct behavior
>> of "git blame" - this is a workaround as long as I do not have to actually
>> modify anything.
>>
>> Best regards,
>> Frank.


* RE: git blame breaking on repository with CRLF files
  2015-08-09 20:19   ` Torsten Bögershausen
@ 2015-08-10  8:36     ` Benkstein, Frank
  2015-08-10 23:54       ` brian m. carlson
  2015-08-10 18:48     ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Benkstein, Frank @ 2015-08-10  8:36 UTC (permalink / raw)
  To: Torsten Bögershausen, git@vger.kernel.org

Hello Torsten,

Torsten Bögershausen, Sonntag, 9. August 2015 22:20:
> On 2015-08-08 07.58, Torsten Bögershausen wrote:
>> On 2015-08-07 18.32, Benkstein, Frank wrote:
>>> I am working on Linux and am examining code in a git repository I do
>>> not know much about.  I am only looking at files, not changing anything.  On
>>> some files in the repository I get "00000000 (Not Committed Yet" for all lines
>>> when running "git blame".  I checked with "git status", "git reset", "git
>>> clean" that the files are indeed in the repository and unmodified.  I noticed
>>> that this only happens with git v2.5.0.  With git v2.4.0 it looks correct, i.e.
>>> the output has proper commit ids, author names and dates.  With "git bisect" I
>>> tracked this down to the following commit:
>>>
>>>  commit 4bf256d67a85bed1e175ecc2706322eafe4489ca (HEAD, refs/bisect/bad)
>>>  Author: Torsten Bögershausen <tboegi@web.de>
>>>  Date:   Sun May 3 18:38:01 2015 +0200
>>>
>>>      blame: CRLF in the working tree and LF in the repo
>>>
>>> Digging further, it seems that most files in the repository are checked in with
>>> CRLF line endings.  In my working tree these are checked out as LF
>> Do I understand it right that you have files in the repo with CRLF ?
>> And these files are checked out with LF in the working tree ?
>> Are the files marked with .gitattributes ?
>> Or does the file have mixed line endings ?
>> 
>> (Unless I missed something: Git never strips CRLF into LF at checkout,
>> so I wonder how you ended up in this situation)

You were right.  They are CRLF in my working tree.  My editor tricked me.

>> Is there a way to reproduce it?
>> 
> Actually I could reproduce the following:
> CRLF in repo, CRLF in working tree, core.autocrlf= true.
> 
> This is an old limitation (or call it bug), which has been there for a long
> time, (I tested with Git v1.7.0 from 2010).
>
> Thanks for the report, we will see if anybody is able to fix it.
> I can probably contribute some test cases.

You are correct that it is also wrong in git v1.7.0.  However, it is correct in
v2.4.0.

Another bisect gave me this commit which was included in v2.0.1:

 commit 4d4813a52f3722854a54bab046f4abfec13ef6ae
 Author: brian m. carlson <sandals@crustytoothpaste.net>
 Date:   Sat Apr 26 23:10:40 2014 +0000

     blame: correctly handle files regardless of autocrlf

So this still looks like a regression in v2.5.0 to me.

Regards,
Frank.


* Re: git blame breaking on repository with CRLF files
  2015-08-09 20:19   ` Torsten Bögershausen
  2015-08-10  8:36     ` Benkstein, Frank
@ 2015-08-10 18:48     ` Junio C Hamano
  2015-08-10 20:22       ` Torsten Bögershausen
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2015-08-10 18:48 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Benkstein, Frank, git@vger.kernel.org

Torsten Bögershausen <tboegi@web.de> writes:

> Actually I could reproduce the following:
> CRLF in repo, CRLF in working tree, core.autocrlf= true.

What should happen in such a case?  Wouldn't autocrlf=true want to
strip CRLF down to LF?  Shouldn't it?  And if so, "blame" is correct
to say that you are changing the line endings of all your lines, as
what you _would_ commit if you were to commit the tracked files in
your working tree would be different from what is in the index, no?


* Re: git blame breaking on repository with CRLF files
  2015-08-10 18:48     ` Junio C Hamano
@ 2015-08-10 20:22       ` Torsten Bögershausen
  0 siblings, 0 replies; 7+ messages in thread
From: Torsten Bögershausen @ 2015-08-10 20:22 UTC (permalink / raw)
  To: Junio C Hamano, Torsten Bögershausen
  Cc: Benkstein, Frank, git@vger.kernel.org

On 2015-08-10 20.48, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
>> Actually I could reproduce the following:
>> CRLF in repo, CRLF in working tree, core.autocrlf= true.
> 
> What should happen in such a case?  Wouldn't autocrlf=true want to
> strip CRLF down to LF?  Shouldn't it? 

One problem is that "git status" would report a file as changed
when it had been committed with CRLF while core.autocrlf was false.

The only "change" that "git status" would trigger on would be the EOL normalization.
So if core.autocrlf were set to true later,
"git status" would report the files as changed.

Long story short:
Once committed with CRLF, the files will not be normalized in a modern git:

From convert.c:
if (crlf_action == CRLF_GUESS) {
	/*
	 * If the file in the index has any CR in it, do not convert.
	 * This is the new safer autocrlf handling.
	 */
	if (has_cr_in_index(path))
		return 0;
}
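
To make the effect of that check concrete, here is a minimal, self-contained
sketch of the rule it describes. It is not git's implementation: the real
has_cr_in_index() takes a path and looks at the indexed content, whereas the
hypothetical helpers buffer_has_cr() and normalize_for_commit() below take
plain buffers so the behaviour can be tried in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer contains any CR byte. */
int buffer_has_cr(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	return len > 0 && memchr(buf, '\r', len) != NULL;
}

/*
 * Hypothetical model of the guarded conversion. Copies `src` into `dst`
 * (which must hold at least `len` bytes) and turns CRLF into LF, unless
 * the indexed version already contains a CR, in which case the content
 * is copied unchanged. Returns the number of bytes written to `dst`.
 */
size_t normalize_for_commit(const char *indexed, size_t indexed_len,
			    const char *src, size_t len, char *dst)
{
	size_t i, out = 0;

	assert(src != NULL || len == 0);
	assert(dst != NULL || len == 0);

	if (buffer_has_cr(indexed, indexed_len)) {
		/* "Safer autocrlf": a CR in the index means do not convert. */
		if (len > 0)
			memcpy(dst, src, len);
		return len;
	}

	for (i = 0; i < len; i++) {
		/* Drop a CR only when it is immediately followed by LF. */
		if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
			continue;
		dst[out++] = src[i];
	}
	return out;
}
```

With an indexed version of "a\n", the working tree content "b\r\nc\r\n" would
be normalized to "b\nc\n"; with an indexed version of "a\r\n", it would be
left as it is.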
---------------------
commit fd6cce9e89ab5ac1125a3b5f5611048ad22379e7
Author: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Date:   Wed May 19 22:43:10 2010 +0200

    Add per-repository eol normalization

    Change the semantics of the "crlf" attribute so that it enables
    end-of-line normalization when it is set, regardless of "core.autocrlf".

    Add a new setting for "crlf": "auto", which enables end-of-line
    conversion but does not override the automatic text file detection.

    Add a new attribute "eol" with possible values "crlf" and "lf".  When
    set, this attribute enables normalization and forces git to use CRLF or
    LF line endings in the working directory, respectively.

    The line ending style to be used for normalized text files in the
    working directory is set using "core.autocrlf".  When it is set to
    "true", CRLFs are used in the working directory; when set to "input" or
    "false", LFs are used.
-----------------
So "git status" is somewhat improved, but "git blame" is not.
(My feeling/suspicion is that has_cr_in_index() should be replaced
by has_cr_in_latest_commit() to have "git status" consistent
with "git blame", but more analysis may be needed.)

A different approach could be to ignore the EOL differences
completely in "git blame" (when core.autocrlf is set and the file
is text, or when the "text" attribute is set).
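
As a rough illustration of what "ignoring the EOL differences" could mean at
the level of a single line, here is a small sketch. It does not reflect how
"git blame" compares content internally; length_without_eol() and
lines_equal_ignoring_eol() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of a line without its trailing LF or CRLF.
 * A CR is only dropped when it directly precedes the final LF.
 */
size_t length_without_eol(const char *line, size_t len)
{
	assert(line != NULL || len == 0);
	if (len > 0 && line[len - 1] == '\n') {
		len--;
		if (len > 0 && line[len - 1] == '\r')
			len--;
	}
	return len;
}

/* Returns 1 if the two lines are identical apart from LF vs. CRLF. */
int lines_equal_ignoring_eol(const char *a, size_t a_len,
			     const char *b, size_t b_len)
{
	size_t a_body = length_without_eol(a, a_len);
	size_t b_body = length_without_eol(b, b_len);

	if (a_body != b_body)
		return 0;
	return a_body == 0 || memcmp(a, b, a_body) == 0;
}
```

Under this comparison "foo\r\n" and "foo\n" count as the same line, so a file
that differs from the committed version only in its line endings would not
show every line as changed.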


> And if so, "blame" is correct
> to say that you are changing the line endings of all your lines, as
> what you _would_ commit if you were to commit the tracked files in
> your working tree would be different from what is in the index, no?


* Re: git blame breaking on repository with CRLF files
  2015-08-10  8:36     ` Benkstein, Frank
@ 2015-08-10 23:54       ` brian m. carlson
  0 siblings, 0 replies; 7+ messages in thread
From: brian m. carlson @ 2015-08-10 23:54 UTC (permalink / raw)
  To: Benkstein, Frank; +Cc: Torsten Bögershausen, git@vger.kernel.org


On Mon, Aug 10, 2015 at 08:36:35AM +0000, Benkstein, Frank wrote:
> You are correct that it is also wrong in git v1.7.0.  However, it is correct in
> v2.4.0.
> 
> Another bisect gave me this commit which was included in v2.0.1:
> 
>  commit 4d4813a52f3722854a54bab046f4abfec13ef6ae
>  Author: brian m. carlson <sandals@crustytoothpaste.net>
>  Date:   Sat Apr 26 23:10:40 2014 +0000
> 
>      blame: correctly handle files regardless of autocrlf
> 
> So this still looks like a regression in v2.5.0 to me.

This commit was reverted because it was decided that it wasn't the right
way to handle the problem and it broke other things.  The complexity of
the CRLF handling is a bit beyond me, to be honest.  I'm sure I'd
understand it better if I used it more, but I'm a Unix guy.

I stand by my earlier statement that we should improve the documentation
in this area, because it's a common source of confusion.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



Thread overview: 7+ messages
2015-08-07 16:32 git blame breaking on repository with CRLF files Benkstein, Frank
2015-08-08  5:58 ` Torsten Bögershausen
2015-08-09 20:19   ` Torsten Bögershausen
2015-08-10  8:36     ` Benkstein, Frank
2015-08-10 23:54       ` brian m. carlson
2015-08-10 18:48     ` Junio C Hamano
2015-08-10 20:22       ` Torsten Bögershausen
