git@vger.kernel.org mailing list mirror (one of many)
* crlf with git-svn driving me nuts...
@ 2008-04-16 19:10 Nigel Magnay
  2008-04-16 20:01 ` Dmitry Potapov
  2008-04-16 20:03 ` Avery Pennarun
  0 siblings, 2 replies; 19+ messages in thread
From: Nigel Magnay @ 2008-04-16 19:10 UTC (permalink / raw)
  To: git

We've got projects with a mixed userbase of windows / *nix; I'm trying
to migrate some users onto git, whilst everyone else stays happy in
their SVN repo.

However, there's one issue that has been driving me slowly insane.
This is best illustrated thusly (on windows) :

  $ git init
  $ git config core.autocrlf false

-->Create a file with some text content on a few lines
  $ notepad file.txt

  $ git add file.txt
  $ git commit -m "initial checkin"

  $ git status
# On branch master
nothing to commit (working directory clean)
--> Yarp, what I wanted

  $ git config core.autocrlf true
  $ git status

# On branch master
nothing to commit (working directory clean)
--> Yarp, still all good

--> Simulate non-change happened by an editor opening file...
  $ touch file.txt
  $ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   file.txt
#
no changes added to commit (use "git add" and/or "git commit -a")

--> Oh Noes! I wonder what it could be
  $ git diff file.txt
diff --git a/file.txt b/file.txt
index 7a2051f..31ca3a0 100644
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,3 @@
-<xml>
-       wooot
-</xml>
+<xml>
+       wooot
+</xml>

--> Huh? ...
  $ git diff -b file.txt
diff --git a/file.txt b/file.txt
index 7a2051f..31ca3a0 100644

--> Bah... don't care! get me back to the start...
  $ git reset --hard

HEAD is now at 4762c31... initial checkin

  $ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   file.txt
#
no changes added to commit (use "git add" and/or "git commit -a")

--> ARGH!
  $ git config core.autocrlf false
  $ git status
# On branch master
nothing to commit (working directory clean)

  $ git config core.autocrlf true
  $ git status
# On branch master
nothing to commit (working directory clean)

--> WtF?

Why does it think in this instance that there is a change? It's CRLF
in the repo, it's CRLF in the working tree, and the checkout in either
mode ought to be identical ??

Now this is further compounded by the fact that users then typically
do a 'CRLF->LF conversion' checkin - *BUT* this will cause
merge conflicts if another user actually made a genuine change (i.e.
the removal of CR and the change are both treated as significant).

Additional fun is caused by some editors 'touching' files that
they haven't actually modified, leading to all these 'null' changes.

This is a bigger deal for us than it ought to be, because we're
pulling changes from a windows-based svn repo, which is always CRLF.
Should I set core.autocrlf=input when doing 'git svn fetch' (and would
it pay any attention)? Also is it possible to tell the diff / merge
machinery that it ought to just ignore text file line endings when
merging ?

Sorry if some of this is stupid-user territory, but there's probably a
few people out there also looking at trying to migrate away from
Windows+SVN that are likely to hit the same things...

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: crlf with git-svn driving me nuts...
  2008-04-16 19:10 crlf with git-svn driving me nuts Nigel Magnay
@ 2008-04-16 20:01 ` Dmitry Potapov
  2008-04-16 20:20   ` Avery Pennarun
  2008-04-16 20:56   ` Martin Langhoff
  2008-04-16 20:03 ` Avery Pennarun
  1 sibling, 2 replies; 19+ messages in thread
From: Dmitry Potapov @ 2008-04-16 20:01 UTC (permalink / raw)
  To: Nigel Magnay; +Cc: git

On Wed, Apr 16, 2008 at 08:10:26PM +0100, Nigel Magnay wrote:
> We've got projects with a mixed userbase of windows / *nix; I'm trying
> to migrate some users onto git, whilst everyone else stays happy in
> their SVN repo.
> 
> However, there's one issue that has been driving me slowly insane.
> This is best illustrated thusly (on windows) :
> 
>   $ git init
>   $ git config core.autocrlf false

core.autocrlf=false is a bad choice for Windows.

> 
> -->Create a file with some text content on a few lines
>   $ notepad file.txt
> 
>   $ git add file.txt
>   $ git commit -m "initial checkin"

You added a file with the CRLF ending in the repository!
You are going to have problems now...

> 
>   $ git status
> # On branch master
> nothing to commit (working directory clean)
> --> Yarp, what I wanted
> 
>   $ git config core.autocrlf true
>   $ git status

You should not change core.autocrlf during your work, or you
are going to have some funny problems. If you really need to
change it, it should be followed by "git reset --hard".

In this case, you already have a file with the wrong ending,
so file.txt will be shown as changed now, because if you commit
it again then it will be committed with <LF>, which should have
been done in the first place.
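Dmitry's explanation can be made concrete with a toy model of the two conversions core.autocrlf applies. This is a hypothetical simplification, not Git's convert.c: the function names are made up, text detection is reduced to "contains no NUL byte", and checkout is assumed to add a CR only before bare LFs, so an already-CRLF blob is written out unchanged.

```python
import re

def to_git(data: bytes, autocrlf: str) -> bytes:
    """Toy check-in filter (work tree -> repository)."""
    if autocrlf in ("true", "input") and b"\0" not in data:
        return data.replace(b"\r\n", b"\n")
    return data

def to_worktree(blob: bytes, autocrlf: str) -> bytes:
    """Toy checkout filter (repository -> work tree)."""
    if autocrlf == "true" and b"\0" not in blob:
        # Only bare LFs gain a CR, so an already-CRLF blob is written out unchanged.
        return re.sub(rb"(?<!\r)\n", b"\r\n", blob)
    return blob

blob = b"<xml>\r\n       wooot\r\n</xml>\r\n"   # committed with core.autocrlf=false
worktree = to_worktree(blob, "true")             # what a checkout writes to disk
recommitted = to_git(worktree, "true")           # what "git add" would now store

print(worktree == blob)      # True: the bytes on disk match the blob
print(recommitted == blob)   # False: re-adding would store LF endings
```

The file on disk is byte-identical to the blob, yet the check-in filter maps it to different content, which is why Git reports it as modified once it actually looks at the file.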

> 
> # On branch master
> nothing to commit (working directory clean)
> --> Yarp, still all good
> 
> --> Simulate non-change happened by an editor opening file...
>   $ touch file.txt
>   $ git status
> # On branch master
> # Changed but not updated:
> #   (use "git add <file>..." to update what will be committed)
> #
> #       modified:   file.txt
> #
> no changes added to commit (use "git add" and/or "git commit -a")
> 
> --> Oh Noes! I wonder what it could be
>   $ git diff file.txt
> diff --git a/file.txt b/file.txt
> index 7a2051f..31ca3a0 100644
> --- a/file.txt
> +++ b/file.txt
> @@ -1,3 +1,3 @@
> -<xml>
> -       wooot
> -</xml>
> +<xml>
> +       wooot
> +</xml>
> 
> --> Huh? ...

Actually, it is

@@ -1,3 +1,3 @@
-<xml>^M
-       wooot^M
-</xml>^M
+<xml>
+       wooot
+</xml>

where ^M is <CR>
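`git diff` does emit the CRs; most terminals just don't show them (piping the output through `cat -v` displays them as `^M`). A small Python sketch, using the file from Nigel's example, that reproduces the annotated hunk above; the `show` helper is hypothetical.

```python
import difflib

blob = b"<xml>\r\n       wooot\r\n</xml>\r\n"   # content stored in the repository
filtered = blob.replace(b"\r\n", b"\n")          # what autocrlf=true would store now

def show(data: bytes) -> list[str]:
    # Make CR visible before splitting, since str.splitlines() also splits on "\r".
    return data.decode("ascii").replace("\r", "^M").splitlines()

diff = list(difflib.unified_diff(show(blob), show(filtered),
                                 "a/file.txt", "b/file.txt", lineterm=""))
print("\n".join(diff))
```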

> 
> --> WtF?
> 
> Why does it think in this instance that there is a change? It's CRLF
> in the repo, it's CRLF in the working tree, and the checkout in either
> mode ought to be identical ??

If you do not want problems, you should use core.autocrlf=true
on Windows. Then all text files will be stored in the repository
with <LF>, but they will have <CR><LF> in your work tree.
Users on *nix should set core.autocrlf=input or false, so they
will have <LF> in their work tree.
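The same toy filters (hypothetical names, NUL-byte text detection, a simplification of what Git does) show why this split works: a Windows user with core.autocrlf=true and a Unix user with core.autocrlf=input end up storing identical LF-only blobs, and only their work trees differ.

```python
import re

def to_git(data: bytes, autocrlf: str) -> bytes:
    """Toy check-in filter: CRLF -> LF for text when autocrlf is true or input."""
    if autocrlf in ("true", "input") and b"\0" not in data:
        return data.replace(b"\r\n", b"\n")
    return data

def to_worktree(blob: bytes, autocrlf: str) -> bytes:
    """Toy checkout filter: bare LF -> CRLF only when autocrlf is true."""
    if autocrlf == "true" and b"\0" not in blob:
        return re.sub(rb"(?<!\r)\n", b"\r\n", blob)
    return blob

windows_file = b"line one\r\nline two\r\n"   # saved by a Windows editor
unix_file = b"line one\nline two\n"          # saved by a Unix editor

windows_blob = to_git(windows_file, "true")  # Windows user: core.autocrlf=true
unix_blob = to_git(unix_file, "input")       # Unix user: core.autocrlf=input

print(windows_blob == unix_blob)             # True: the repository sees one format
print(to_worktree(windows_blob, "true"))     # b'line one\r\nline two\r\n'
print(to_worktree(unix_blob, "input"))       # b'line one\nline two\n'
```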

Dmitry

* Re: crlf with git-svn driving me nuts...
  2008-04-16 19:10 crlf with git-svn driving me nuts Nigel Magnay
  2008-04-16 20:01 ` Dmitry Potapov
@ 2008-04-16 20:03 ` Avery Pennarun
  1 sibling, 0 replies; 19+ messages in thread
From: Avery Pennarun @ 2008-04-16 20:03 UTC (permalink / raw)
  To: Nigel Magnay; +Cc: git

On 4/16/08, Nigel Magnay <nigel.magnay@gmail.com> wrote:
>  Why does it think in this instance that there is a change? It's CRLF
>  in the repo, it's CRLF in the working tree, and the checkout in either
>  mode ought to be identical ??

We got quite confused by this here too.  I'm pretty sure git's
autocrlf feature is buggy, as you've noticed.  Combined with that, svn
has its *own* kind of autocrlf feature (svn:eol-style property on each
file) that acts completely differently.

As an added bonus, I don't know if you've run into this yet, but
cygwin's "patch" command seems to unconditionally strip CR from
patches *before* trying to apply them at all, *even if* the target
file is CRLF, so patches just never apply to CRLF files ever.  Ha ha!

I managed to make the two systems stop stomping on each other, in our
case, by using svn:eol-style of "native" (which means when git-svn
checks out the file, it gets only LF, since it seems to always claim
to be Unix) and not using git's autocrlf at all.  However, this isn't
optimal since then Windows git users end up with LF instead of CRLF in
their files, which confuses them.

On the other hand, the conflicts and the random-newline-changing diffs
go away, as svn fixes things up at checkin time no matter how badly
they got mangled by the windows user (most commonly, they run a
program that resaves the whole file as CRLF).

Obviously a working git autocrlf feature would be better, but I
haven't looked into it closely enough to say where the problem
actually lies.

Have fun,

Avery

* Re: crlf with git-svn driving me nuts...
  2008-04-16 20:01 ` Dmitry Potapov
@ 2008-04-16 20:20   ` Avery Pennarun
  2008-04-16 20:39     ` Dmitry Potapov
  2008-04-16 20:56   ` Martin Langhoff
  1 sibling, 1 reply; 19+ messages in thread
From: Avery Pennarun @ 2008-04-16 20:20 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Nigel Magnay, git

On 4/16/08, Dmitry Potapov <dpotapov@gmail.com> wrote:
>  In this case, you already have a file with the wrong ending,
>  so file.txt will be shown as changed now, because if you commit
>  it again then it will be committed with <LF>, which should have
>  been done in the first place.
[...]
> If you do not want problems, you should use core.autocrlf=true
>  on Windows. Then all text files will be stored in the repository
>  with <LF>, but they will have <CR><LF> in your work tree.
>  Users on *nix should set core.autocrlf=input or false, so they
>  will have <LF> in their work tree.

Alas, the subject of this thread involves git-svn, and the typical
git-svn user is someone who has no way of rewriting the existing
history in their svn repositories.  Thus, files *will* be in the
repository that have the wrong line endings, and (as you noted) git
just gets totally confused in that case.

Nigel's example showed a few situations where git *thought* the file
had changed when it hadn't, and yet is incapable of checking in the
changes.

If all I had to do was checkout (thus converting everything to LF),
and then "git commit -a" to check in all the corrected files, then
git-svn would make one giant, very rude checkin to svn, and my
problems would be largely solved.  However, this does not seem to be
possible due to the problems you noted ("you are going to have
problems now").

Have fun,

Avery

* Re: crlf with git-svn driving me nuts...
  2008-04-16 20:20   ` Avery Pennarun
@ 2008-04-16 20:39     ` Dmitry Potapov
  2008-04-16 21:56       ` Nigel Magnay
       [not found]       ` <320075ff0804161447u25dfbb2bmcd36ea507224d835@mail.gmail.com>
  0 siblings, 2 replies; 19+ messages in thread
From: Dmitry Potapov @ 2008-04-16 20:39 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Nigel Magnay, git

On Wed, Apr 16, 2008 at 04:20:27PM -0400, Avery Pennarun wrote:
> On 4/16/08, Dmitry Potapov <dpotapov@gmail.com> wrote:
> >  In this case, you already have a file with the wrong ending,
> >  so file.txt will be shown as changed now, because if you commit
> >  it again then it will be committed with <LF>, which should have
> >  been done in the first place.
> [...]
> > If you do not want problems, you should use core.autocrlf=true
> >  on Windows. Then all text files will be stored in the repository
> >  with <LF>, but they will have <CR><LF> in your work tree.
> >  Users on *nix should set core.autocrlf=input or false, so they
> >  will have <LF> in their work tree.
> 
> Alas, the subject of this thread involves git-svn, and the typical
> git-svn user is someone who has no way of rewriting the existing
> history in their svn repositories.  Thus, files *will* be in the
> repository that have the wrong line endings, and (as you noted) git
> just gets totally confused in that case.

Actually, what matters is the format the files have in the _Git_ repository.
Maybe there is a problem with how git-svn imports SVN commits
into Git, but I have not encountered it.

> Nigel's example showed a few situations where git *thought* the file
> had changed when it hadn't, and yet is incapable of checking in the
> changes.

Incapable of checking in? I have not found a single example in
his mail where it was impossible. The only quirk with autocrlf
is that you need to re-checkout your work tree after changing
it. There are no other problems with it as far as I know.

Dmitry

* Re: crlf with git-svn driving me nuts...
  2008-04-16 20:01 ` Dmitry Potapov
  2008-04-16 20:20   ` Avery Pennarun
@ 2008-04-16 20:56   ` Martin Langhoff
  2008-04-16 21:02     ` Avery Pennarun
  2008-04-16 21:17     ` Dmitry Potapov
  1 sibling, 2 replies; 19+ messages in thread
From: Martin Langhoff @ 2008-04-16 20:56 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Nigel Magnay, git

On Wed, Apr 16, 2008 at 3:01 PM, Dmitry Potapov <dpotapov@gmail.com> wrote:
>  core.autocrlf=false is a bad choice for Windows.
...
>  If you do not want problems, you should use core.autocrlf=true
>  on Windows.

If you are making the above statements in general about git, I
disagree. I have used msysgit a lot with unix-newlines projects, and
it works fantastic. I am careful to work with newline-smart editors
but any half-decent editor will cope. The general hint is: avoid any
content-mangling options if possible, and git will do the right thing.

OTOH, you might be referring to git-svn on Windows, which I have no
experience with :-)

cheers,



martin
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

* Re: crlf with git-svn driving me nuts...
  2008-04-16 20:56   ` Martin Langhoff
@ 2008-04-16 21:02     ` Avery Pennarun
  2008-04-16 21:17     ` Dmitry Potapov
  1 sibling, 0 replies; 19+ messages in thread
From: Avery Pennarun @ 2008-04-16 21:02 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Dmitry Potapov, Nigel Magnay, git

On 4/16/08, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> On Wed, Apr 16, 2008 at 3:01 PM, Dmitry Potapov <dpotapov@gmail.com> wrote:
> >  If you do not want problems, you should use core.autocrlf=true
> >  on Windows.
>
> If you are making the above statements in general about git, I
>  disagree. I have used msysgit a lot with unix-newlines projects, and
>  it works fantastic. I am careful to work with newline-smart editors
>  but any half-decent editor will cope. The general hint is: avoid any
>  content-mangling options if possible, and git will do the right thing.

Various Windows IDEs (notably Delphi... and notepad :)) get confused
by non-CRLF files and either do random things to the file, fail to
compile, or "helpfully" change all the line endings back to CRLF.  I
agree that any program that does any such thing is braindead, but
unfortunately, some people are stuck with such programs.

Avery

* Re: crlf with git-svn driving me nuts...
  2008-04-16 20:56   ` Martin Langhoff
  2008-04-16 21:02     ` Avery Pennarun
@ 2008-04-16 21:17     ` Dmitry Potapov
  1 sibling, 0 replies; 19+ messages in thread
From: Dmitry Potapov @ 2008-04-16 21:17 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Nigel Magnay, git

On Wed, Apr 16, 2008 at 03:56:18PM -0500, Martin Langhoff wrote:
> On Wed, Apr 16, 2008 at 3:01 PM, Dmitry Potapov <dpotapov@gmail.com> wrote:
> >  core.autocrlf=false is a bad choice for Windows.
> ...
> >  If you do not want problems, you should use core.autocrlf=true
> >  on Windows.
> 
> If you are making the above statements in general about git, I
> disagree.

I stand corrected. It should be either core.autocrlf=true if you
like DOS endings or core.autocrlf=input if you prefer unix-newlines.
In both cases, your Git repository will have only LF, which is the
Right Thing. The only argument for core.autocrlf=false was that the
automatic heuristic may incorrectly detect some binary file as text,
and then your file will be corrupted. So, the core.safecrlf option was
introduced to warn the user if an irreversible change happens. In fact,
there are two possible causes of an irreversible change: mixed line
endings in a text file, in which case normalization is desirable and the
warning can be ignored, or (very unlikely) Git incorrectly detected your
binary file as text. Then you need to use attributes to tell Git that
this file is binary.
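A rough model of the core.safecrlf idea described above: convert a file as if checking it in, convert it back as if checking it out, and flag it when the round trip does not reproduce the original bytes. The function, its return strings, and the NUL-byte heuristic are hypothetical simplifications; Git's real check differs in detail.

```python
import re

def to_git(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")               # check-in: CRLF -> LF

def to_worktree(blob: bytes) -> bytes:
    return re.sub(rb"(?<!\r)\n", b"\r\n", blob)       # checkout: bare LF -> CRLF

def check_safecrlf(data: bytes) -> str:
    if b"\0" in data:
        return "binary"                               # heuristic: NUL byte means binary
    if to_worktree(to_git(data)) != data:
        return "irreversible"
    return "ok"

print(check_safecrlf(b"a\r\nb\r\n"))           # ok: pure CRLF round-trips
print(check_safecrlf(b"a\r\nb\n"))             # irreversible: mixed endings get normalized
print(check_safecrlf(b"\x89PNG\r\n\x1a\n"))    # irreversible: binary misdetected as text
print(check_safecrlf(b"\x89PNG\r\n\x1a\n\0"))  # binary
```

The third case is the "very unlikely" one: the 8-byte PNG signature contains CR and LF but no NUL byte, so this heuristic would treat it as text and the conversion would corrupt it.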

I have not used git-svn on Windows for some time now, because now I have
a mirror running on Linux, so I clone directly from it.

Dmitry

* Re: crlf with git-svn driving me nuts...
  2008-04-16 20:39     ` Dmitry Potapov
@ 2008-04-16 21:56       ` Nigel Magnay
       [not found]       ` <320075ff0804161447u25dfbb2bmcd36ea507224d835@mail.gmail.com>
  1 sibling, 0 replies; 19+ messages in thread
From: Nigel Magnay @ 2008-04-16 21:56 UTC (permalink / raw)
  To: git

>  > Nigel's example showed a few situations where git *thought* the file
>  > had changed when it hadn't, and yet is incapable of checking in the
>  > changes.
>
>  Incapable of checking in? I have not found a single example in
>  his mail where it was impossible. The only quirk with autocrlf
>  is that you need to re-checkout your work tree after changing
>  it. There are no other problems with it as far as I know.
>

My (initial) setting of core.autocrlf to false was because that's what
it was on all the windows clients (I know the default has now changed),
and to make it obvious in the later parts of the script that the file in
the repo had a CRLF ending, rather than having been converted to LF.
That's the situation we have, because they've all come from SVN.

The bit I really don't understand is why git thinks a file that has
just been touched has changed when it hasn't, and doing a 'git reset
--hard' actually doesn't help at all (but, bizarrely, git config
core.autocrlf false & git config core.autocrlf true *does* !). The
repo copy is CRLF, the working copy is CRLF, but git thinks it's
changed...

* Re: crlf with git-svn driving me nuts...
       [not found]         ` <20080416223739.GJ3133@dpotapov.dyndns.org>
@ 2008-04-16 23:07           ` Nigel Magnay
  2008-04-17  0:46             ` Dmitry Potapov
  2008-04-17  5:43             ` Steffen Prohaska
  0 siblings, 2 replies; 19+ messages in thread
From: Nigel Magnay @ 2008-04-16 23:07 UTC (permalink / raw)
  To: Dmitry Potapov, git

>  > The bit I really don't understand is why git thinks a file that has
>  > just been touched has changed when it hasn't,
>
>  Actually, it did change in the sense that if you try to commit this
>  file now into the repository, you will have a different file in Git!
>  So, it is more correct to say that Git did not notice this change until
>  you touch this file, because this change is indirect (autocrlf causes
>  a different interpretation of the file).
>

Okay - at the very least this behaviour is really, really confusing.
And I think there's actually a bug (it should *always* report that the
file is different), not magically after it's been touched.

But fixing that minor bug still leads to badness for the user. Doing
(on a core.autocrlf=true machine) a checkout of any revision
containing a file that is (currently) CRLF in the repository, and your
WC is *immediately* dirty. However technically correct that is, it
doesn't fit most people's user model of an SCM, because they haven't
made any modification. And if 1 person makes a change along with their
conversion, and the other 'just' does a CRLF->LF conversion, their
revisions will conflict at merge time. Blech. And because the svn is
mastered crlf (well, strictly speaking, it's ignorant of line endings)
this is gonna happen a lot.

Can't git be taught that if the WC is byte-identical to the revision
in the repository (regardless of autocrlf) then that ought not to be
regarded as a change?
Is there a way I can persuade the diff / merge mechanisms to normalise
before they operate? (e.g if core.autocrlf does lf->crlf/crlf->lf,
then an equivalent that does crlf->lf/crlf->lf before doing the merge
)?

In a perfect world I'd be able to switch all files in the repo to LF,
but that's not going to happen any time soon because of the majority
of developers, still on svn, still on windows.

* Re: crlf with git-svn driving me nuts...
  2008-04-16 23:07           ` Nigel Magnay
@ 2008-04-17  0:46             ` Dmitry Potapov
  2008-04-17  1:44               ` Avery Pennarun
  2008-04-17  7:07               ` Nigel Magnay
  2008-04-17  5:43             ` Steffen Prohaska
  1 sibling, 2 replies; 19+ messages in thread
From: Dmitry Potapov @ 2008-04-17  0:46 UTC (permalink / raw)
  To: Nigel Magnay; +Cc: git

On Thu, Apr 17, 2008 at 12:07:27AM +0100, Nigel Magnay wrote:
> >  > The bit I really don't understand is why git thinks a file that has
> >  > just been touched has changed when it hasn't,
> >
> >  Actually, it did change in the sense that if you try to commit this
> >  file now into the repository, you will have a different file in Git!
> >  So, it is more correct to say that Git did not notice this change until
> >  you touch this file, because this change is indirect (autocrlf causes
> >  a different interpretation of the file).
> >
> 
> Okay - at the very least this behaviour is really, really confusing.
> And I think there's actually a bug (it should *always* report that the
> file is different), not magically after it's been touched.

I don't think there is a simple way to correct that without penalizing
normal use cases. Usually, people do not change autocrlf during their
normal work. Besides, you can have your own input filters and they may
cause the same effect. So, Git works in the assumption that input filters
always produce the same results...

> 
> But fixing that minor bug still leads to badness for the user. Doing
> (on a core.autocrlf=true machine) a checkout of any revision
> containing a file that is (currently) CRLF in the repository, and your
> WC is *immediately* dirty. However technically correct that is, it
> doesn't fit most people's user model of an SCM, because they haven't
> made any modification.

IMHO, the only sane way is to never store CRLF in the Git repository.
You can have whatever ending you like in your work tree, but inside
of Git, LF is the actual marker of the end-of-line.

> And if 1 person makes a change along with their
> conversion, and the other 'just' does a CRLF->LF conversion,

If you imported correctly into Git, it should not have CRLF for text
files. So, there is no conversion that a user does explicitly.

> And because the svn is
> mastered crlf (well, strictly speaking, it's ignorant of line endings)
> this is gonna happen a lot.

Not really. SVN has its own setting for EOL conversion. If you have
'svn:eol-style' set to 'native' for any text file then SVN will
checkout text files according to your native EOL (you can specify
your native EOL using the --native-eol option when it is necessary).

> Can't git be taught that if the WC is byte-identical to the revision
> in the repository (regardless of autocrlf) then that ought not to be
> regarded as a change?

Why shouldn't it? If a file is different as far as the Git repository
is concerned, then it *is* a change. Git compares files byte-for-byte
_after_ applying all specified filters (and you can have your own
filters, not only autocrlf).

> Is there a way I can persuade the diff / merge mechanisms to normalise
> before they operate? (e.g if core.autocrlf does lf->crlf/crlf->lf,
> then an equivalent that does crlf->lf/crlf->lf before doing the merge
> )?

I am not sure if there is a standard option for that, but it is
certainly possible to define your own merge strategy.

> 
> In a perfect world I'd be able to switch all files in the repo to LF,
> but that's not going to happen any time soon because of the majority
> of developers, still on svn, still on windows.

Well, I don't see any problem here if everything is configured properly.
How files are stored inside and what you have in your work tree does
not have to be the same. So, storing everything inside with LF is
certainly possible. Actually, I believe it is exactly what CVS does
(unless you added a file with '-kb'), and people use CVS on Windows.
Importing files with CRLF into Git is like putting files as _binary_
in CVS.

Dmitry

* Re: crlf with git-svn driving me nuts...
  2008-04-17  0:46             ` Dmitry Potapov
@ 2008-04-17  1:44               ` Avery Pennarun
  2008-04-17  7:07               ` Nigel Magnay
  1 sibling, 0 replies; 19+ messages in thread
From: Avery Pennarun @ 2008-04-17  1:44 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Nigel Magnay, git

On 4/16/08, Dmitry Potapov <dpotapov@gmail.com> wrote:
> On Thu, Apr 17, 2008 at 12:07:27AM +0100, Nigel Magnay wrote:
>  > Okay - at the very least this behaviour is really, really confusing.
>  > And I think there's actually a bug (it should *always* report that the
>  > file is different), not magically after it's been touched.
>
> I don't think there is a simple way to correct that without penalizing
>  normal use cases. Usually, people do not change autocrlf during their
>  normal work. Besides, you can have your own input filters and they may
>  cause the same effect. So, Git works in the assumption that input filters
>  always produce the same results...

However, it doesn't check that before it marks the file as unmodified
right after checkout.  That is, the problem is hidden until the file's
mtime changes.
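This point, combined with Nigel's observation that toggling core.autocrlf off and back on makes the file look clean again, is consistent with a simple stat-cache model. The sketch below is a hypothetical simplification, not Git's index code: it assumes `git status` only examines content when the cached mtime differs, and refreshes the cached stat info whenever the filtered content matches the recorded blob.

```python
from dataclasses import dataclass

def to_git(data: bytes, autocrlf: bool) -> bytes:
    # Simplified autocrlf=true input filter; autocrlf=false stores bytes as-is.
    return data.replace(b"\r\n", b"\n") if autocrlf else data

@dataclass
class IndexEntry:
    blob: bytes   # content recorded for the path
    mtime: int    # cached stat info from the last refresh

@dataclass
class WorkFile:
    data: bytes
    mtime: int

def status(entry: IndexEntry, f: WorkFile, autocrlf: bool) -> str:
    if f.mtime == entry.mtime:
        return "clean"            # stat info matches: content is never examined
    if to_git(f.data, autocrlf) == entry.blob:
        entry.mtime = f.mtime     # content matches: refresh the cached stat info
        return "clean"
    return "modified"

crlf = b"<xml>\r\n</xml>\r\n"
entry = IndexEntry(blob=crlf, mtime=1)   # committed with core.autocrlf=false
f = WorkFile(data=crlf, mtime=1)

results = [status(entry, f, autocrlf=True)]        # clean: nothing looked at
f.mtime = 2                                        # touch file.txt
results.append(status(entry, f, autocrlf=True))    # modified
results.append(status(entry, f, autocrlf=False))   # clean, stat info refreshed
results.append(status(entry, f, autocrlf=True))    # clean again
print(results)   # ['clean', 'modified', 'clean', 'clean']
```

The third call is the "bizarre" fix from earlier in the thread: with autocrlf off, the raw bytes match the blob, so the cached stat info is refreshed, and the next check with autocrlf on never looks at the content.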

Is there a way to quickly check that every file in the repo is "sane",
i.e. the input filter is the proper inverse of the output filter and
will put each file back in the repo?  This is pretty important for
anyone designing any kind of input filter, or bugs will go undetected
until some later time when they're confusing.
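For CRLF conversion, the check asked about here could look roughly like this: write each blob out as checkout would, filter it back in, and flag any path whose content changes. The filters and `unstable_blobs` are hypothetical toy stand-ins; in a real repository the blob contents would have to be read from Git (for example with `git cat-file -p`), which is left out here.

```python
import re

def to_git(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")            # toy check-in filter

def to_worktree(blob: bytes) -> bytes:
    return re.sub(rb"(?<!\r)\n", b"\r\n", blob)    # toy checkout filter

def unstable_blobs(blobs: dict[str, bytes]) -> list[str]:
    """Paths whose blob would change after a checkout followed by a re-add."""
    return [path for path, blob in blobs.items()
            if b"\0" not in blob and to_git(to_worktree(blob)) != blob]

repo = {
    "lf.txt": b"one\ntwo\n",         # normalized: survives the round trip
    "crlf.txt": b"one\r\ntwo\r\n",   # imported with CRLF: re-adding changes it
    "logo.bin": b"\0\r\n",           # NUL byte: treated as binary, skipped
}
print(unstable_blobs(repo))   # ['crlf.txt']
```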

> If you imported correctly into Git, it should not have CRLF for text
>  files. So, there is no conversion that a user does explicitly.

Can you give a set of steps for how to import "correctly" using git-svn?

Remember that a given svn repository might have long ago been
configured to store CRLF (actually, to store files without changing
their line endings), since that is the svn default.  Also remember
that the svn:eol-style flag may be set differently on various files in
svn, and may have changed in different svn revisions over time.

Thanks,

Avery

* Re: crlf with git-svn driving me nuts...
  2008-04-16 23:07           ` Nigel Magnay
  2008-04-17  0:46             ` Dmitry Potapov
@ 2008-04-17  5:43             ` Steffen Prohaska
  1 sibling, 0 replies; 19+ messages in thread
From: Steffen Prohaska @ 2008-04-17  5:43 UTC (permalink / raw)
  To: Nigel Magnay; +Cc: Dmitry Potapov, git


On Apr 17, 2008, at 1:07 AM, Nigel Magnay wrote:

> In a perfect world I'd be able to switch all files in the repo to LF,
> but that's not going to happen any time soon because of the majority
> of developers, still on svn, still on windows.


If you want Git's autocrlf to convert to the native line endings on  
Windows and Unix, you need to convert everything to LF in the repo.   
This is what we did and now everything runs smoothly.
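A one-off conversion like this could be sketched as a script run in the working tree before a single "normalize line endings" commit. Assumptions, all hypothetical: text is detected by the absence of NUL bytes, files are rewritten in place, and the `.git` directory is skipped; the resulting diff should be reviewed before committing.

```python
import os
import tempfile

def renormalize_tree(root: str) -> list[str]:
    """Rewrite CRLF as LF in every file that looks like text; return changed paths."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git's own files
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            if b"\0" in data or b"\r\n" not in data:
                continue  # looks binary, or has no CRLF to convert
            with open(path, "wb") as fh:
                fh.write(data.replace(b"\r\n", b"\n"))
            changed.append(os.path.relpath(path, root))
    return sorted(changed)

# Usage on a throwaway directory standing in for a work tree.
with tempfile.TemporaryDirectory() as root:
    samples = {"a.txt": b"x\r\ny\r\n", "b.txt": b"x\ny\n", "c.bin": b"\0\r\n"}
    for name, data in samples.items():
        with open(os.path.join(root, name), "wb") as fh:
            fh.write(data)
    result = renormalize_tree(root)
    with open(os.path.join(root, "a.txt"), "rb") as fh:
        converted = fh.read()

print(result)      # ['a.txt']
print(converted)   # b'x\ny\n'
```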

I have no recommendation, though, how to use svn and git together.  I  
do not use git-svn.

	Steffen

* Re: crlf with git-svn driving me nuts...
  2008-04-17  0:46             ` Dmitry Potapov
  2008-04-17  1:44               ` Avery Pennarun
@ 2008-04-17  7:07               ` Nigel Magnay
  2008-04-17  9:43                 ` Dmitry Potapov
  1 sibling, 1 reply; 19+ messages in thread
From: Nigel Magnay @ 2008-04-17  7:07 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: git

On Thu, Apr 17, 2008 at 1:46 AM, Dmitry Potapov <dpotapov@gmail.com> wrote:
> On Thu, Apr 17, 2008 at 12:07:27AM +0100, Nigel Magnay wrote:
>  > >  > The bit I really don't understand is why git thinks a file that has
>  > >  > just been touched has changed when it hasn't,
>  > >
>  > >  Actually, it did change in the sense that if you try to commit this
>  > >  file now into the repository, you will have a different file in Git!
>  > >  So, it is more correct to say that Git did not notice this change until
>  > >  you touch this file, because this change is indirect (autocrlf causes
>  > >  a different interpretation of the file).
>  > >
>  >
>  > Okay - at the very least this behaviour is really, really confusing.
>  > And I think there's actually a bug (it should *always* report that the
>  > file is different), not magically after it's been touched.
>
>  I don't think there is a simple way to correct that without penalizing
>  normal use cases. Usually, people do not change autocrlf during their
>  normal work. Besides, you can have your own input filters and they may
>  cause the same effect. So, Git works in the assumption that input filters
>  always produce the same results...

This has nothing to do with changing core.autocrlf after checkout -
it's a problem with *any* repo with CRLF files, being checked out on a
core.autocrlf=true machine, which basically is any windows machine.

The current 'isDirty' check seems to be something like

isDirty = ( wc.file.mtime > someValue )
       && ( repository.file != filter(wc.file) )

I'm saying it ought to be something like

isDirty = ( wc.file.mtime > someValue )
       && ( sha1(repository.file) != sha1(wc.file) )
       && ( repository.file != filter(wc.file) )
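The two predicates above, written out as runnable Python under the same simplifications used elsewhere in this thread. `blob_sha1` follows Git's object-hashing format (SHA-1 over a `blob <size>` header, a NUL byte, then the content); the other names are hypothetical stand-ins for the pseudocode, and `stat_changed` replaces the `mtime > someValue` test.

```python
import hashlib

def blob_sha1(data: bytes) -> str:
    # Git object id of a blob: SHA-1 over b"blob <size>\0" followed by the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def to_git(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")   # simplified autocrlf=true input filter

def is_dirty_current(entry_sha: str, wc: bytes, stat_changed: bool) -> bool:
    return stat_changed and blob_sha1(to_git(wc)) != entry_sha

def is_dirty_proposed(entry_sha: str, wc: bytes, stat_changed: bool) -> bool:
    # Proposed rule: raw bytes identical to the recorded blob mean "not dirty".
    return (stat_changed
            and blob_sha1(wc) != entry_sha
            and blob_sha1(to_git(wc)) != entry_sha)

crlf = b"<xml>\r\n</xml>\r\n"
sha = blob_sha1(crlf)                                    # CRLF blob in the repository
print(is_dirty_current(sha, crlf, stat_changed=True))    # True
print(is_dirty_proposed(sha, crlf, stat_changed=True))   # False
```

Note the trade-off: under the proposed rule the file is reported clean, yet re-adding it would still store different (LF) content.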


>
>
>  >
>  > But fixing that minor bug still leads to badness for the user. Doing
>  > (on a core.autocrlf=true machine) a checkout of any revision
>  > containing a file that is (currently) CRLF in the repository, and your
>  > WC is *immediately* dirty. However technically correct that is, it
>  > doesn't fit most people's user model of an SCM, because they haven't
>  > made any modification.
>
>  IMHO, the only sane way is to never store CRLF in the Git repository.
>  You can have whatever ending you like in your work tree, but inside
>  of Git, LF is the actual marker of the end-of-line.
>

Great. I'll go and argue with the team using svn, who don't even
*notice* this issue, and try to get them to adjust the metadata on
every single file in the repository.

Then, for a bonus, I'll try the same with every OSS project that I'm
tracking with git-svn. :-(

I get that things are horribly broken if you get CRLF in your
repository. But it's unreasonable to expect the ability to bend the
rest of the world to what's convenient for me! Some of our windows
coders probably even *like* svn:eol-style=CRLF !

>
>  > And if 1 person makes a change along with their
>  > conversion, and the other 'just' does a CRLF->LF conversion,
>
>  If you imported correctly into Git, it should not have CRLF for text
>  files. So, there is no conversion that a user does explicitly.
>
>
>  > And because the svn is
>  > mastered crlf (well, strictly speaking, it's ignorant of line endings)
>  > this is gonna happen a lot.
>
>  Not really. SVN has its own setting for EOL conversion. If you have
>  'svn:eol-style' set to 'native' for any text file then SVN will
>  checkout text files according to your native EOL (you can specify
>  your native EOL using the --native-eol option when it is necessary).
>

Can I set this personally, without affecting the svn repo? If so, why
isn't git-svn doing this anyway, and can I tell it to do so?

>
>  > Can't git be taught that if the WC is byte-identical to the revision
>  > in the repository (regardless of autocrlf) then that ought not to be
>  > regarded as a change?
>
>  Why shouldn't it? If a file is different as far as the Git repository
>  is concerned, then it *is* a change. Git compares files byte-for-byte
>  _after_ applying all specified filters (and you can have your own
>  filters, not only autocrlf).
>

See above. Unchanged (on disk, byte identical) files, if touched, get
(sometimes) marked as dirty.

>
>  > Is there a way I can persuade the diff / merge mechanisms to normalise
>  > before they operate? (e.g if core.autocrlf does lf->crlf/crlf->lf,
>  > then an equivalent that does crlf->lf/crlf->lf before doing the merge
>  > )?
>
>  I am not sure if there is a standard option for that, but it is
>  certainly possible to define your own merge strategy.
>
Ok - I'll have a look into this - just a filter on each file before
merging would be sufficient. Presumably people that do things like
$Id$ expansion need something similar to avoid constant merge
conflicts..
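Nigel's idea of filtering each file before merging could look roughly like the Python sketch below. It assumes that CRLF->LF is the only normalization wanted; `normalize_eol` and `changed_sides` are hypothetical helpers, not part of Git or git-svn, and the sketch only decides which side of a three-way merge really changed rather than performing the merge itself.

```python
from typing import Tuple


def normalize_eol(data: bytes) -> bytes:
    """Hypothetical pre-merge filter: turn every CRLF pair into a bare LF."""
    return data.replace(b"\r\n", b"\n")


def changed_sides(base: bytes, ours: bytes, theirs: bytes) -> Tuple[bool, bool]:
    """Report which side of a three-way merge differs from the base once
    line endings are normalized, so an EOL-only rewrite is not a change."""
    norm_base = normalize_eol(base)
    return normalize_eol(ours) != norm_base, normalize_eol(theirs) != norm_base


# "ours" only switched LF to CRLF; "theirs" edited a line.
print(changed_sides(b"x\ny\n", b"x\r\ny\r\n", b"x\nz\n"))  # (False, True)
```

With a result of `(False, True)` a merge could simply take the other side, instead of reporting a conflict caused only by line endings.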

>
>  >
>  > In a perfect world I'd be able to switch all files in the repo to LF,
>  > but that's not going to happen any time soon because of the majority
>  > of developers, still on svn, still on windows.
>
>  Well, I don't see any problem here if everything is configured properly.
>  How files are stored inside and what you have in your work tree do
>  not have to be the same. So, storing everything inside with LF is
>  certainly possible. Actually, I believe it is exactly what CVS does
>  (unless you added a file with '-kb'), and people use CVS on Windows.
>  Importing files with CRLF into Git is like adding files as _binary_
>  in CVS.
>
>  Dmitry
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: crlf with git-svn driving me nuts...
  2008-04-17  7:07               ` Nigel Magnay
@ 2008-04-17  9:43                 ` Dmitry Potapov
  2008-04-17 10:09                   ` Nigel Magnay
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Potapov @ 2008-04-17  9:43 UTC (permalink / raw)
  To: Nigel Magnay; +Cc: git

On Thu, Apr 17, 2008 at 08:07:27AM +0100, Nigel Magnay wrote:
> 
> This has nothing to do with changing core.autocrlf after checkout -
> it's a problem with *any* repo with CRLF files, being checked out on a
> core.autocrlf=true machine, which basically is any windows machine.
> 
> The current 'isDirty' check seems to be something like
> 
> isDirty = ( wc.file.mtime > someValue ) && ( repository.file !=
> filter(wc.file) )

Basically, yes.

> 
> I'm saying it ought to be something like
> 
> isDirty = ( wc.file.mtime > someValue ) && (sha1(repository.file) !=
> sha1(wc.file) ) && ( repository.file != filter(wc.file) )

I don't think it is reasonable. Files inside of the repository and
in the work tree are not meant to be the same. What if I have $Id$ expansion
or something else? What could make sense is to add an additional check:
 && convert_to_work_tree(repository.file) != wc.file
but it should be optional, so it will not penalize those who do not need
or do not want this extra check.
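The two checks being discussed can be modelled in a few lines of Python. This is a sketch, not Git's code: `stat_changed` stands in for Git's mtime/stat comparison against the index, `to_repo` and `to_work_tree` are simplified stand-ins for the core.autocrlf=true filters, and `roundtrip_check` is the optional extra test proposed above (convert_to_work_tree(repository.file) != wc.file). All of these names are hypothetical.

```python
def to_repo(wc: bytes) -> bytes:
    """Simplified check-in filter (autocrlf=true): CRLF -> LF."""
    return wc.replace(b"\r\n", b"\n")


def to_work_tree(blob: bytes) -> bytes:
    """Simplified checkout filter: LF -> CRLF; existing CRLF pairs stay CRLF."""
    return blob.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def is_dirty(blob: bytes, wc: bytes, stat_changed: bool,
             roundtrip_check: bool = False) -> bool:
    if not stat_changed:
        return False  # stat info still matches the index: trust it
    if to_repo(wc) == blob:
        return False  # checking the file in would reproduce the stored blob
    if roundtrip_check and to_work_tree(blob) == wc:
        return False  # checking the blob out would recreate this exact file
    return True


# A blob imported with CRLF, and a touched but byte-identical work-tree file.
blob = b"<xml>\r\n\twooot\r\n</xml>\r\n"
print(is_dirty(blob, blob, stat_changed=True))                        # True
print(is_dirty(blob, blob, stat_changed=True, roundtrip_check=True))  # False
```

The first call reproduces the symptom from the start of the thread; the second shows the extra comparison clearing it, at the cost of running the checkout filter for every stat-dirty file, which is why it was suggested as optional.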

> >  >
> >  > But fixing that minor bug still leads to badness for the user. Doing
> >  > (on a core.autocrlf=true machine) a checkout of any revision
> >  > containing a file that is (currently) CRLF in the repository, and your
> >  > WC is *immediately* dirty. However technically correct that is, it
> >  > doesn't fit most people's user model of an SCM, because they haven't
> >  > made any modification.
> >
> >  IMHO, the only sane way is to never store CRLF in the Git repository.
> >  You can have whatever line ending you like in your work tree, but inside
> >  of Git, LF is the actual marker of the end-of-line.
> >
> 
> Great. I'll go and argue with the team using svn, who don't even
> *notice* this issue, and try to get them to adjust the metadata on
> every single file in the repository.

Maybe, you can teach git-svn to be smarter... I mean storing text files
in Git repo with CRLF is stupid, so, perhaps, git-svn can do a better
job converting CRLF<->LF when it exports and imports from/to SVN.

> 
> Then, for a bonus, I'll try the same with every OSS project that I'm
> tracking with git-svn. :-(
> 
> I get that things are horribly broken if you get CRLF in your
> repository. But it's unreasonable to expect the ability to bend the
> rest of the world to what's convenient for me! Some of our windows
> coders probably even *like* svn:eol-style=CRLF !

You can use Git and have CRLF in your work tree. You just need to
have autocrlf=true for that. _Inside_ of Git, only LF is the end
of line. How you store files in SVN is a separate issue for git-svn.
I guess, git-svn needs improvement in this area...

Dmitry

* Re: crlf with git-svn driving me nuts...
  2008-04-17  9:43                 ` Dmitry Potapov
@ 2008-04-17 10:09                   ` Nigel Magnay
  2008-04-17 18:53                     ` Dmitry Potapov
  0 siblings, 1 reply; 19+ messages in thread
From: Nigel Magnay @ 2008-04-17 10:09 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: git

>  >
>  > I'm saying it ought to be something like
>  >
>  > isDirty = ( wc.file.mtime > someValue ) && (sha1(repository.file) !=
>  > sha1(wc.file) ) && ( repository.file != filter(wc.file) )
>
>  I don't think it is reasonable. Files inside of the repository and
>  in the work tree are not meant to be the same. What if I have $Id$ expansion
>  or something else? What could make sense is to add an additional check:
>   && convert_to_work_tree(repository.file) != wc.file
>  but it should be optional, so it will not penalize those who do not need
>  or do not want this extra check.
>

Ah, yes - you're right (I was only thinking about check-in filters,
not check-out).

I agree it ought to be optional; I suggest it ought to be turned on
(by default) in the $Id$ expansion and the core.autocrlf=true
scenarios (i.e. when there's some filter in place).

>
>  > >  > ...
>  Maybe, you can teach git-svn to be smarter... I mean storing text files
>  in Git repo with CRLF is stupid, so, perhaps, git-svn can do a better
>  job converting CRLF<->LF when it exports and imports from/to SVN.
>

Yar - maybe there's some options there. Maybe it isn't so bad - all
svn projects probably *ought* to be using eol=native, but it isn't
default; so maybe it's just easier to coax those projects into fixing
their svn repos (but of course it's not really an issue for them, so
it might be a bit of a hard sell).

I may add some detail to the wiki docs to point this out - if I'd done
it up front to our local projects, my life would be easier!

>  ...
>  You can use Git and have CRLF in your work tree. You just need to
>  have autocrlf=true for that. _Inside_ of Git, only LF is the end
>  of line. How you store files in SVN is a separate issue for git-svn.
>  I guess, git-svn needs improvement in this area...
>

Yes, in the sense that git is primarily a *nix tool, so it treats LF
as canon and CRLF as somehow 'stupid' (I.E you could make an equally
valid argument for the reverse position, it just depends on your
perspective ;-)) ; but then again, it's only an issue because I'm now
merging in git *waaay* more often and it's uncovering a problem that
might actually be there already (modulo the fact that svn merging may
ignore line endings anyway - but I don't know because all merges there
seem to inevitably end up in conflicts anyway..).

* Re: crlf with git-svn driving me nuts...
  2008-04-17 10:09                   ` Nigel Magnay
@ 2008-04-17 18:53                     ` Dmitry Potapov
  2008-04-17 22:03                       ` Nigel Magnay
  0 siblings, 1 reply; 19+ messages in thread
From: Dmitry Potapov @ 2008-04-17 18:53 UTC (permalink / raw)
  To: Nigel Magnay; +Cc: git

On Thu, Apr 17, 2008 at 11:09:12AM +0100, Nigel Magnay wrote:
>
> Maybe it isn't so bad - all
> svn projects probably *ought* to be using eol=native, but it isn't
> default; 

If you want to have native EOL for each platform then you have to do
this conversion, but it should be applied only to text files. So,
the question is how a VCS can know which files are text and which are not.
CVS considers everything you check in as text by default. If you
want to add a binary file, you have to use the -kb flag, otherwise your file
may be damaged. People tend to be forgetful and some lose their data in
this way. So the SVN team decided to stay on the safe side and store
everything as is, because if you forget to set eol=native, you do not
lose anything and you can set eol=native later. Unfortunately, now SVN
users forget to set eol=native far too often. So, IMHO, Git's approach
based on a heuristic is much better when most of the stored files are text.
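To make "a heuristic" concrete, here is a rough Python sketch of a text/binary guess. It is loosely modelled on the kind of rule Git's convert.c used at the time (any NUL byte, or too many control characters relative to printable ones); the exact byte classes and the roughly 1-in-128 threshold here are assumptions, and `looks_binary` is a hypothetical name.

```python
# Control bytes that commonly occur in text and are therefore tolerated.
TEXT_CONTROLS = b"\t\n\r\b\x1b\x0c"


def looks_binary(data: bytes) -> bool:
    """Guess whether a file is binary (simplified heuristic, see lead-in)."""
    if b"\0" in data:
        return True  # a NUL byte almost never appears in text files
    printable = nonprintable = 0
    for byte in data:  # iterating over bytes yields ints
        if byte == 0x7F or (byte < 0x20 and byte not in TEXT_CONTROLS):
            nonprintable += 1
        else:
            printable += 1
    # Binary if there is more than about 1 odd control byte per 128 others.
    return (printable >> 7) < nonprintable


print(looks_binary(b"<xml>\r\n\twooot\r\n</xml>\r\n"))      # False
print(looks_binary(b"GIF89a\x01\x00\x01\x00\x80\x00\x00"))  # True (NUL bytes)
```

With such a guess, only files classified as text would get CRLF/LF conversion, so users no longer have to remember a per-file flag like CVS's -kb or SVN's svn:eol-style.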

> so maybe it's just easier to coax those projects into fixing
> their svn repos (but of course it's not really an issue for them, so
> it might be a bit of a hard sell).

If they care about supporting different platforms then it _is_ an issue
for them too. On the other hand, if everyone uses Windows with CRLF,
you can do that with Git too just by setting autocrlf=false.

> 
> Yes, in the sense that git is primarily a *nix tool, so it treats LF
> as canon

and perhaps even more important, it is written in C, where LF has
always been considered the EOL since the first Hello-World program was
written in C:

   printf("Hello world!\n");
-----------------------^^

So, naturally LF is considered as EOL inside of Git. Actually, CVS does
so, and even SVN does if you set eol=native.

> and CRLF as somehow 'stupid' (I.E you could make an equally
> valid argument for the reverse position, it just depends on your
> perspective ;-)) ;

There is no good technical reason to have two symbols as the end-of-line
marker instead of one. Most programs on Windows just remove CR when reading
from a file and then add it back before LF when writing it back. So,
CR is clearly redundant.
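The redundancy argument can be checked with a small Python sketch. `read_text_mode` and `write_text_mode` are hypothetical helpers that only approximate the CRLF handling of a Windows C runtime for files opened in text mode; other text-mode quirks are ignored.

```python
def read_text_mode(raw: bytes) -> bytes:
    """Drop the CR from every CRLF pair, as text-mode reads do on Windows."""
    return raw.replace(b"\r\n", b"\n")


def write_text_mode(text: bytes) -> bytes:
    """Put a CR back before every LF, as text-mode writes do on Windows."""
    return text.replace(b"\n", b"\r\n")


crlf_file = b"first\r\nsecond\r\n"
# Reading then writing gives back exactly the same bytes, so the CR
# carried no information the LF did not already carry.
print(write_text_mode(read_text_mode(crlf_file)) == crlf_file)  # True
```

The round trip is exact precisely when every LF in the file is already preceded by CR; a file containing a bare LF comes back with a CR added in front of it.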

Dmitry

* Re: crlf with git-svn driving me nuts...
  2008-04-17 18:53                     ` Dmitry Potapov
@ 2008-04-17 22:03                       ` Nigel Magnay
  2008-04-17 22:42                         ` Dmitry Potapov
  0 siblings, 1 reply; 19+ messages in thread
From: Nigel Magnay @ 2008-04-17 22:03 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: git

>  lose anything and you can set eol=native later. Unfortunately, now SVN
>  users forget to set eol=native far too often. So, IMHO, Git's approach
>  based on a heuristic is much better when most of the stored files are text.
>

I agree - since the forgetful users includes us!

>
>  > so maybe it's just easier to coax those projects into fixing
>  > their svn repos (but of course it's not really an issue for them, so
>  > it might be a bit of a hard sell).
>
>  If they care about supporting different platforms then it _is_ an issue
>  for them too. On the other hand, if everyone uses Windows with CRLF,
>  you can do that with Git too just by setting autocrlf=false.
>

Actually it seems to be less of an everyday issue - but I think it's
because the diff tools in use by programs downstream are probably
stripping CRs anyway before presenting diffs, so it all 'appears' to
be right. Certainly I've been sharing via a svn repo through Eclipse
with windows users for ages without it being a problem. Either way,
the problem in touched/untouched-files was the majority of my
confusion as I wasn't expecting to find a bug and was assuming I was
doing something wrong...

>
>  >
>  > Yes, in the sense that git is primarily a *nix tool, so it treats LF
>  > as canon
>
>  and perhaps even more important, it is written in C, where LF has
>  always been considered the EOL since the first Hello-World program was
>  written in C:
>
>    printf("Hello world!\n");
>  -----------------------^^
>
>  So, naturally LF is considered as EOL inside of Git. Actually, CVS does
>  so, and even SVN does if you set eol=native.
>
>
>  > and CRLF as somehow 'stupid' (I.E you could make an equally
>  > valid argument for the reverse position, it just depends on your
>  > perspective ;-)) ;
>
>  There is no good technical reason to have two symbols as the end-of-line
>  marker instead of one. Most programs on Windows just remove CR when reading
>  from a file and then add it back before LF when writing it back. So,
>  CR is clearly redundant.
>

Well.... Newline = LF vs CRLF (vs CR for early mac.. erk) dates to
well before C and UNIX; back into the days of baudot codes and
teletype printers that couldn't physically newline in the time taken
for 1 character to be processed; LF is meant to mean Line Feed and CR
is meant to mean "Carriage Return", so CRLF is in that sense quite
logical. But that's standards committees and backwards compatibility
for you :-/

* Re: crlf with git-svn driving me nuts...
  2008-04-17 22:03                       ` Nigel Magnay
@ 2008-04-17 22:42                         ` Dmitry Potapov
  0 siblings, 0 replies; 19+ messages in thread
From: Dmitry Potapov @ 2008-04-17 22:42 UTC (permalink / raw)
  To: Nigel Magnay; +Cc: git

On Thu, Apr 17, 2008 at 11:03:10PM +0100, Nigel Magnay wrote:
> 
> Well.... Newline = LF vs CRLF (vs CR for early mac.. erk) dates to
> well before C and UNIX; back into the days of baudot codes and
> teletype printers that couldn't physically newline in the time taken
> for 1 character to be processed; LF is meant to mean Line Feed and CR
> is meant to mean "Carriage Return", so CRLF is in that sense quite
> logical. But that's standards committees and backwards compatibility
> for you :-/

CRLF is logical from the point of view of teletype printers, but when
we speak about text files then it is more logical to consider them as
a list of lines. What particular symbol is used as line-separator does
not really matter, but IMHO it is stupid to have two symbols for that.
So, LF vs CR is a matter of preference, but CRLF is just stupid -;)
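Treating a text file as a list of lines, whatever the separator, is easy to illustrate in Python: `bytes.splitlines()` accepts LF, CRLF and CR alike. `to_lines` and `from_lines` are just illustrative helpers, not anything Git provides.

```python
from typing import List


def to_lines(data: bytes) -> List[bytes]:
    """Split into lines; LF, CRLF and CR each count as one line break."""
    return data.splitlines()


def from_lines(lines: List[bytes], sep: bytes = b"\n") -> bytes:
    """Write lines back out with one chosen terminator (LF by default)."""
    return b"".join(line + sep for line in lines)


for name, data in [("LF", b"one\ntwo\n"),
                   ("CRLF", b"one\r\ntwo\r\n"),
                   ("CR", b"one\rtwo\r")]:
    print(name, to_lines(data))  # each prints its name and [b'one', b'two']
```

Only the final serialization step has to pick a separator; the list of lines in between is the same for all three conventions.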

Dmitry

Thread overview: 19+ messages
2008-04-16 19:10 crlf with git-svn driving me nuts Nigel Magnay
2008-04-16 20:01 ` Dmitry Potapov
2008-04-16 20:20   ` Avery Pennarun
2008-04-16 20:39     ` Dmitry Potapov
2008-04-16 21:56       ` Nigel Magnay
     [not found]       ` <320075ff0804161447u25dfbb2bmcd36ea507224d835@mail.gmail.com>
     [not found]         ` <20080416223739.GJ3133@dpotapov.dyndns.org>
2008-04-16 23:07           ` Nigel Magnay
2008-04-17  0:46             ` Dmitry Potapov
2008-04-17  1:44               ` Avery Pennarun
2008-04-17  7:07               ` Nigel Magnay
2008-04-17  9:43                 ` Dmitry Potapov
2008-04-17 10:09                   ` Nigel Magnay
2008-04-17 18:53                     ` Dmitry Potapov
2008-04-17 22:03                       ` Nigel Magnay
2008-04-17 22:42                         ` Dmitry Potapov
2008-04-17  5:43             ` Steffen Prohaska
2008-04-16 20:56   ` Martin Langhoff
2008-04-16 21:02     ` Avery Pennarun
2008-04-16 21:17     ` Dmitry Potapov
2008-04-16 20:03 ` Avery Pennarun
