git@vger.kernel.org mailing list mirror (one of many)
* Not understanding with git wants to copy one file to another
From: Harry Putnam @ 2017-08-10 17:03 UTC (permalink / raw)
  To: git

I ran into a line in git commit output I had not seen before:

  #copied:     d0/etc/hosts -> misc/old-readerHOSTvcs-files/etc/hosts

So googling I learned that this might happen if git thinks the two
files are the same.

I was pretty sure they were not the same, so I checked them:

 <inside git repo>

diff d0/etc/hosts misc/old-readerHOSTvcs-files/etc/hosts

The output is a bit long but shows them being quite different.

Some 2 dozen or so lines that dramatically differ.

Here are two that are at least kind of similar but would never be seen
as the same:

< 192.168.1.43      m2.local.lan       m2       # 00-90-F5-A1-F9-E5
> 192.168.1.43    m2.local.lan        m2         # win 7

Not to mention there are quite different lines as well.

So what is going on and what should I be looking at?



* Re: Not understanding with git wants to copy one file to another
From: Stefan Beller @ 2017-08-10 17:36 UTC (permalink / raw)
  To: Harry Putnam; +Cc: git@vger.kernel.org

On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam <reader@newsguy.com> wrote:
> I ran into a line in git commit output I had not seen before:
>
>   #copied:     d0/etc/hosts -> misc/old-readerHOSTvcs-files/etc/hosts
>
> So googling I learned that this might happen if git thinks the two
> files are the same.
>
> I was pretty sure they were not the same, so I checked them:
>
>  <inside git repo>
>
> diff d0/etc/hosts misc/old-readerHOSTvcs-files/etc/hosts
>
> The output is a bit long but shows them being quite different.
>
> Some 2 dozen or so lines that dramatically differ.
>
> Here are two that are at least kind of similar but would never be seen
> as the same:
>
> < 192.168.1.43      m2.local.lan       m2       # 00-90-F5-A1-F9-E5
> > 192.168.1.43    m2.local.lan        m2         # win 7
>
> Not to mention there are quite different lines as well.
>
> So what is going on and what should I be looking at?

The diff machinery has a threshold for when it assumes
a copy/move of a file. (e.g. "A file is assumed copied when
at least 55% of lines are equal")

https://git-scm.com/docs/git-diff

See the -C and -M options.

git-status seems to use this machinery as well, but does
not expose the options?
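
To make the threshold idea concrete, here is a minimal sketch in Python.
It is an illustration only, not Git's algorithm: the names similarity and
looks_like_copy are made up for this example, the line-based ratio from
difflib stands in for Git's own similarity index (which, as far as I
understand, measures how much content two blobs share rather than counting
equal lines), and the sample hosts data is invented. The 0.5 default mirrors
the 50% default that the git-diff documentation gives for -M and -C; the 55%
figure above is just an example value.

```python
import difflib


def similarity(old_text: str, new_text: str) -> float:
    """Line-based similarity in the range 0..1 (illustrative, not Git's metric)."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return difflib.SequenceMatcher(None, old_lines, new_lines).ratio()


def looks_like_copy(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    """Report a 'copy' when the similarity reaches the threshold."""
    return similarity(old_text, new_text) >= threshold


# Invented sample data: a ten-line hosts-style file.
original = "\n".join(
    f"192.168.1.{i} host{i}.local.lan host{i}" for i in range(1, 11)
)

# The same file with three of its ten lines edited.
lines = original.splitlines()
for i in (0, 4, 8):
    lines[i] += "   # win 7"
edited = "\n".join(lines)

# A file that shares no lines with the original.
unrelated = "\n".join(f"option{i} = {i}" for i in range(10))

print(looks_like_copy(original, edited))
print(looks_like_copy(original, unrelated))
```

Two files can differ in a good number of lines and still clear such a
threshold, which is how a plain diff can look "quite different" while the
detection still pairs the files up.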


* Re: Not understanding with git wants to copy one file to another
From: Harry Putnam @ 2017-08-10 18:18 UTC (permalink / raw)
  To: git

Stefan Beller <sbeller@google.com> writes:

> On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam <reader@newsguy.com> wrote:

[...]

Harry wrote:
>> Here are two that are at least kind of similar but would never be seen
>> as the same:
>>
>> < 192.168.1.43      m2.local.lan       m2       # 00-90-F5-A1-F9-E5
>> > 192.168.1.43    m2.local.lan        m2         # win 7

 Stefan B replied:
> The diff machinery has a threshold for when it assumes
> a copy/move of a file. (e.g. "A file is assumed copied when
> at least 55% of lines are equal")
>
> https://git-scm.com/docs/git-diff
>
> See -C and -M option.
>
> git-status seems to use this machinery as well, but does
> not expose the options?

Well, now I'm even more confused.  What actually happens? Is either
file changed? Is only one file kept?

On the surface it sounds like complete anathema to what git is all
about.

However, I know a tool this sophisticated is not doing something just
outright stupid... so I must be really missing the point here.

I get the way you can make -M stricter or not... but I didn't call
git-diff to see that copy thing come up.

I called git commit.

There must be some way to set stricter guidelines for calling things
copies.

But then I must really not get it because it still seems almost silly
to consider one file a copy of another if only 55% is the same.

What am I missing?





* Re: Not understanding with git wants to copy one file to another
From: Stefan Beller @ 2017-08-10 18:47 UTC (permalink / raw)
  To: Harry Putnam; +Cc: git@vger.kernel.org

On Thu, Aug 10, 2017 at 11:18 AM, Harry Putnam <reader@newsguy.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Thu, Aug 10, 2017 at 10:03 AM, Harry Putnam <reader@newsguy.com> wrote:
>
> [...]
>
> Harry wrote:
>>> Here are two that are at least kind of similar but would never be seen
>>> as the same:
>>>
>>> < 192.168.1.43      m2.local.lan       m2       # 00-90-F5-A1-F9-E5
>>> > 192.168.1.43    m2.local.lan        m2         # win 7
>
>  Stefan B replied:
>> The diff machinery has a threshold for when it assumes
>> a copy/move of a file. (e.g. "A file is assumed copied when
>> at least 55% of lines are equal")
>>
>> https://git-scm.com/docs/git-diff
>>
>> See -C and -M option.
>>
>> git-status seems to use this machinery as well, but does
>> not expose the options?
>
> Well, now I'm even more confused.  What actually happens? Is either
> file changed? Is only one file kept?
>
> On the surface it sounds like complete anathema to what git is all
> about.
>
> However, I know a tool this sophisticated is not doing something just
> outright stupid... so must be really missing the point here.
>
> I get the way you can make -M stricter or not... but I didn't call
> git-diff to see that copy thing come up.
>
> I called git commit.

Ah. Sorry for confusing even more.
By pointing out the options for git-diff, I just wanted to point out that
such a mechanism ("rename/copy detection") exists.

The output of git-status is similar to a dry run of git-commit,
and apparently this detection is used there.

>
> There must be some way to set stricter guidelines for calling things
> copies.

Well, from Git's perspective it is really hard to tell whether it was a
copy or whether the files are only incidentally similar (because the
format/content of these files happens to follow some strict guidelines).

The user could have moved/copied a file outside of Git (instead of
git-mv, you'd use tools provided by your operating system to copy a
file). Or the user could have written a file that is similar by chance.

However that doesn't really matter, as Git tracks the content, and not
how the file evolved.

Consider the copy/move/rename detection as a heuristic, that wants
to help the user, but may be mistaken.
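
The "tracks the content" point can be sketched as follows. This assumes a
repository using the default SHA-1 object format, where a blob's id is the
SHA-1 of a small header plus the file bytes; git_blob_id and the snapshot
dict are hypothetical names for illustration, and the hosts content is
invented.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute a blob id the way Git does for SHA-1 repositories."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


hosts = b"192.168.1.43 m2.local.lan m2\n"

# A toy snapshot: path -> blob id. Only content ids are recorded;
# nothing says where a file "came from".
snapshot = {
    "d0/etc/hosts": git_blob_id(hosts),
    "misc/etc/hosts": git_blob_id(hosts),
    "misc/other": git_blob_id(hosts + b"# win 7\n"),
}

# Identical content at two paths yields the same id.
print(snapshot["d0/etc/hosts"] == snapshot["misc/etc/hosts"])
```

Since a snapshot like this has no "copied from" field, any "copied:" or
"renamed:" label has to be worked out afterwards by comparing snapshots,
and neither file is altered by that labelling.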

>
> But then I must really not get it because it still seems almost silly
> to consider one file a copy of another if only 55% is the same.
>
> What am I missing?
>

https://www.reddit.com/r/git/comments/3ogkk1/beginner_disable_rename_detection/

"Rename detection is just GUI sugar".


* Re: Not understanding with git wants to copy one file to another
From: Harry Putnam @ 2017-08-11 20:41 UTC (permalink / raw)
  To: git

Stefan Beller <sbeller@google.com> writes:


[...]

> Ah. Sorry for confusing even more.
> By pointing out the options for git-diff, I just wanted to point out that
> such a mechanism ("rename/copy detection") exists.


[...]

>> What am I missing?
>>
>
> https://www.reddit.com/r/git/comments/3ogkk1/beginner_disable_rename_detection/
>
> "Rename detection is just GUI sugar".

Thanks, there is a nice full explanation at the cited URL.

What is still a bit puzzling is that in that same commit, there are
files that are true copies of each other, just in different locations,
but nothing pops up about them in a git commit.




* Re: Not understanding with git wants to copy one file to another
From: Stefan Beller @ 2017-08-14 18:07 UTC (permalink / raw)
  To: Harry Putnam; +Cc: git@vger.kernel.org

On Fri, Aug 11, 2017 at 1:41 PM, Harry Putnam <reader@newsguy.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>
> [...]
>
>> Ah. Sorry for confusing even more.
>> By pointing out the options for git-diff, I just wanted to point out that
>> such a mechanism ("rename/copy detection") exists.
>
>
> [...]
>
>>> What am I missing?
>>>
>>
>> https://www.reddit.com/r/git/comments/3ogkk1/beginner_disable_rename_detection/
>>
>> "Rename detection is just GUI sugar".
>
> Thanks, there is a nice full explanation at the cited URL.
>
> What is still a bit puzzling is that in that same commit, there are
> files that are true copies of each other, just in different locations,
> but nothing pops up about them in a git commit.
>

To be fast, the heuristic to find renames/copies only looks at modified
files (the assumption is that each commit touches only a few files,
while the project consists of a lot of files).

For that, git-diff knows about '--find-copies-harder', which looks at
all files, even those not modified. This would point out the true
copies, I would assume.

I don't think we'd want to add the '--find-copies-harder' flag
to status or commit, as it may take some time in large projects.
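
The difference in which files are considered as copy sources can be
sketched like this. It is a simplified model, not Git's implementation: it
handles exact copies only, find_exact_copies and the path/content-id dicts
are hypothetical, and the harder flag merely imitates the documented effect
of '--find-copies-harder' (unmodified files also become candidate sources).

```python
def find_exact_copies(old: dict, new: dict, harder: bool = False) -> dict:
    """Map each added path to an existing path with identical old content.

    old and new map path -> content id for two snapshots.
    """
    added = sorted(p for p in new if p not in old)
    modified = {p for p in old if p in new and old[p] != new[p]}

    # Default: only files modified in this change are candidate sources.
    # harder=True: every file in the old snapshot is a candidate.
    sources = sorted(old) if harder else sorted(modified)

    copies = {}
    for path in added:
        for src in sources:
            if old[src] == new[path]:
                copies[path] = src
                break
    return copies


old = {"d0/etc/hosts": "A", "docs/notes.txt": "N"}
new = {
    "d0/etc/hosts": "A2",        # modified in this change
    "misc/etc/hosts": "A",       # matches the old d0/etc/hosts
    "docs/notes.txt": "N",       # untouched
    "backup/notes.txt": "N",     # true copy of an untouched file
}

print(find_exact_copies(old, new))
print(find_exact_copies(old, new, harder=True))
```

In this model the copy of the untouched docs/notes.txt is only reported in
the harder mode, at the cost of scanning every file in the old snapshot.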


* Re: Not understanding with git wants to copy one file to another
From: Junio C Hamano @ 2017-08-14 19:21 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Harry Putnam, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

> On Fri, Aug 11, 2017 at 1:41 PM, Harry Putnam <reader@newsguy.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>
>> [...]
>>
>>> Ah. Sorry for confusing even more.
>>> By pointing out the options for git-diff, I just wanted to point out that
>>> such a mechanism ("rename/copy detection") exists.
>>
>>
>> [...]
>>
>>>> What am I missing?
>>>>
>>>
>>> https://www.reddit.com/r/git/comments/3ogkk1/beginner_disable_rename_detection/
>>>
>>> "Rename detection is just GUI sugar".
>>
>> Thanks, there is a nice full explanation at the cited URL.
>>
>> What is still a bit puzzling is that in that same commit, there are
>> files that are true copies of each other, just in different locations,
>> but nothing pops up about them in a git commit.
>>
>
> To be fast, the heuristic to find renames/copies only looks at modified
> files (the assumption is that each commit touches only a few files,
> while the project consists of a lot of files).
>
> For that, git-diff knows about '--find-copies-harder', which looks at
> all files, even those not modified. This would point out the true
> copies, I would assume.
>
> I don't think we'd want to add the '--find-copies-harder' flag
> to status or commit, as it may take some time in large projects.

Yeah, thanks for helping in this discussion.

