git@vger.kernel.org mailing list mirror (one of many)
* Handling renames.
@ 2005-04-14 17:54 David Woodhouse
  2005-04-14 18:11 ` Linus Torvalds
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: David Woodhouse @ 2005-04-14 17:54 UTC (permalink / raw)
  To: git; +Cc: James Bottomley

I've been looking at tracking file revisions. One proposed solution was
to have a separate revision history for individual files, with a new
kind of 'filecommit' object which parallels the existing 'commit',
referencing a blob instead of a tree. Then trees would reference such
objects instead of referencing blobs directly.

I think that introduces a lot of redundancy though, because 99% of the
time, the revision history of the individual file is entirely
reproducible from the revision history of the tree. It's only when files
are renamed that we fall over -- and I think we can handle renames
fairly well if we just log them in the commit object. 

My 'gitfilelog.sh' script is already capable of tracking a given file
back through multiple tree commits, listing those commits where the file
in question was actually changed. It uses my patched version of
diff-tree which supports 'diff-tree <TREE_A> <TREE_B> <filename>' in order to
do this.

By storing rename information in the commit object, the script (or a
reimplementation of a similar algorithm) could know when to change the
filename it's looking for, as it goes back through the tree. That ought
to be perfectly sufficient.
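A minimal Python sketch of that walk, assuming a simplified, hypothetical model: each commit has a single parent, a flat `tree` mapping paths to blob ids, and a `renames` map (new path -> old path) recorded by the SCM. It is not gitfilelog.sh itself, only an illustration of switching the filename being followed once a rename record is reached.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commit:
    """Hypothetical commit model: one parent, a flat tree, and rename records."""
    name: str
    tree: Dict[str, str]                                   # path -> blob id
    parent: Optional["Commit"] = None
    renames: Dict[str, str] = field(default_factory=dict)  # new path -> old path


def file_log(head: Commit, path: str) -> List[str]:
    """List commits (newest first) that changed `path`, following renames."""
    log = []
    commit = head
    while commit is not None:
        # If this commit renamed the file, older commits know it by the old name.
        old_path = commit.renames.get(path, path)
        before = commit.parent.tree.get(old_path) if commit.parent else None
        if commit.tree.get(path) != before or old_path != path:
            log.append(commit.name)
        path = old_path
        commit = commit.parent
    return log


c1 = Commit("c1", {"foo.c": "a1"})
c2 = Commit("c2", {"foo.c": "a2"}, c1)
c3 = Commit("c3", {"bar.c": "b1"}, c2, {"bar.c": "foo.c"})  # rename plus edit
c4 = Commit("c4", {"bar.c": "b1", "baz.c": "z1"}, c3)       # bar.c untouched
print(file_log(c4, "bar.c"))  # ['c3', 'c2', 'c1']
```

The only extra state the walk needs is the rename map; everything else comes from comparing tree entries, which is the redundancy argument made above.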

So a commit involving a rename would look something like this...

	tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
	parent bb95843a5a0f397270819462812735ee29796fb4
	rename foo.c bar.c
	author David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
	committer David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
	Rename foo.c to bar.c and s/foo_/bar_/g

Opinions? Dissent? We'd probably need to escape the filenames in some
way -- handwave over that for now.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Handling renames.
  2005-04-14 17:54 Handling renames David Woodhouse
@ 2005-04-14 18:11 ` Linus Torvalds
  2005-04-14 19:09   ` David Woodhouse
  2005-04-14 18:12 ` Ingo Molnar
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2005-04-14 18:11 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git, James Bottomley



On Thu, 14 Apr 2005, David Woodhouse wrote:
>
> I've been looking at tracking file revisions. One proposed solution was
> to have a separate revision history for individual files, with a new
> kind of 'filecommit' object which parallels the existing 'commit',
> referencing a blob instead of a tree. Then trees would reference such
> objects instead of referencing blobs directly.

Please don't.  It's fundamentally against the git notion of "content determines
objects".
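To make "content determines objects" concrete: in git as it exists today, a blob's name is the SHA-1 of a short `blob <size>\0` header followed by the raw bytes, so equal content always gets an equal name no matter which path or commit refers to it. The on-disk details of the April 2005 code may have differed; this sketch follows the current format.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object name of a blob, computed the way current git does it."""
    header = b"blob %d\x00" % len(content)   # object type, size, NUL terminator
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the name depends only on the bytes, moving foo.c to bar.c without editing it leaves the blob id unchanged; a per-file history object would carry nothing the content does not already determine.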

It also has no relevance. A "rename" really doesn't exist in the git 
model. The git model really is about tracking data, not about tracking 
what happened to _create_ that data.

The one exception is the commit log. That's where you put the explanations 
of _why_ the data changed. And git itself doesn't care what the format is, 
apart from the git header.

So, you really need to think of git as a filesystem. You can then 
implement an SCM _on_top_of_it_, which means that your second suggestion 
is not only acceptable, it really is the _only_ way to handle this in git:

> So a commit involving a rename would look something like this...
> 
> 	tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
> 	parent bb95843a5a0f397270819462812735ee29796fb4
> 	rename foo.c bar.c
> 	author David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
> 	committer David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
> 	Rename foo.c to bar.c and s/foo_/bar_/g

Except I want that empty line in there, and I want it in the "free-form"  
section. The "rename" part really isn't part of the git header. It's not 
what git tracks, it was tracked by an SCM system on top of git.

So the git header is an "inode" in the git filesystem, and like an inode 
it has a ctime and an mtime, and pointers to the data. So as far as git is 
concerned, this part:

	tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
	parent bb95843a5a0f397270819462812735ee29796fb4
	author David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
	committer David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100

really is the filesystem "inode". The rest is whatever the filesystem user
puts into it, and git won't care.
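A small sketch of that split, assuming the layout described here: header lines up to the first empty line, then free-form text. The `rename <old> <new>` convention read by `scm_renames` is hypothetical, an SCM-level rule that git itself would never look at.

```python
from typing import List, Tuple


def split_commit(raw: str) -> Tuple[List[Tuple[str, str]], str]:
    """Split a commit into git's header ("inode") fields and the free-form body."""
    header_text, _, body = raw.partition("\n\n")  # first empty line ends the header
    headers = []
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        headers.append((key, value))
    return headers, body


def scm_renames(body: str) -> List[Tuple[str, str]]:
    """Hypothetical SCM convention: 'rename <old> <new>' lines in the log message."""
    renames = []
    for line in body.splitlines():
        if line.startswith("rename "):
            _, old, new = line.split(" ", 2)
            renames.append((old, new))
    return renames


raw = (
    "tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa\n"
    "parent bb95843a5a0f397270819462812735ee29796fb4\n"
    "author David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100\n"
    "committer David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100\n"
    "\n"
    "rename foo.c bar.c\n"
    "Rename foo.c to bar.c and s/foo_/bar_/g\n"
)
headers, body = split_commit(raw)
print([key for key, _ in headers])  # ['tree', 'parent', 'author', 'committer']
print(scm_renames(body))            # [('foo.c', 'bar.c')]
```

git only ever needs `split_commit`'s first return value; the second function lives entirely in the layer above.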

> Opinions? Dissent? We'd probably need to escape the filenames in some
> way -- handwave over that for now.

The fact that git handles arbitrary filenames (stuff starting with "." 
excepted) doesn't mean that the SCM above it needs to. Quite frankly, I 
think an SCM that handles newlines in filenames is being silly. But a 
_filesystem_ needs to not care.

There are too many messy SCM's out there that do not have a "philosophy". 
Dammit, I'm not interested in creating another one. This thing has a 
mental model, and we keep to that model.

The reason UNIX is beautiful is that it has a mental model of processes 
and files. Git has a mental model of objects and certain very very limited 
relationships. The relationships git cares about are encoded in the C 
files, the "extra crap" (like rename info) is just that - stuff that 
random scripts wrote, and that is just informational and not central to 
the model.

		Linus


* Re: Handling renames.
  2005-04-14 17:54 Handling renames David Woodhouse
  2005-04-14 18:11 ` Linus Torvalds
@ 2005-04-14 18:12 ` Ingo Molnar
  2005-04-14 18:32   ` Linus Torvalds
  2005-04-14 18:21 ` H. Peter Anvin
  2005-04-14 22:23 ` Handling renames Daniel Barkalow
  3 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2005-04-14 18:12 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git, James Bottomley


* David Woodhouse <dwmw2@infradead.org> wrote:

> I've been looking at tracking file revisions. One proposed solution 
> was to have a separate revision history for individual files, with a 
> new kind of 'filecommit' object which parallels the existing 'commit', 
> referencing a blob instead of a tree. Then trees would reference such 
> objects instead of referencing blobs directly.
> 
> I think that introduces a lot of redundancy though, because 99% of the 
> time, the revision history of the individual file is entirely 
> reproducible from the revision history of the tree. It's only when 
> files are renamed that we fall over -- and I think we can handle 
> renames fairly well if we just log them in the commit object.

how about the following structure:

    - tree_new --->
    - tree_old ---> rename_commit -> blob

the rename_commit object just contains a pointer to the file content 
blob. If a rename happens then the old tree references the rename_commit 
object (instead of the blob), and the new tree references it too. This 
way there's no need to list the rename via namespace means: if a tree 
entry points to a rename_commit object then a rename happened and the 
rename_commit object is looked up in the old tree to get the old name.

there's no redundancy caused by this method: only renames (which are 
rare) go through the rename_commit redirection. (to speed up the lookup 
the rename_commit object could cache the offset of the two names within 
their tree objects.)

	Ingo


* Re: Handling renames.
  2005-04-14 17:54 Handling renames David Woodhouse
  2005-04-14 18:11 ` Linus Torvalds
  2005-04-14 18:12 ` Ingo Molnar
@ 2005-04-14 18:21 ` H. Peter Anvin
  2005-04-14 18:48   ` Linus Torvalds
  2005-04-14 22:23 ` Handling renames Daniel Barkalow
  3 siblings, 1 reply; 28+ messages in thread
From: H. Peter Anvin @ 2005-04-14 18:21 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git, James Bottomley

David Woodhouse wrote:
> 
> Opinions? Dissent? We'd probably need to escape the filenames in some
> way -- handwave over that for now.
> 

For readability and simplicity I'd suggest using either URL-style %XX 
escapes or octal \xxx escapes for anything bytes < 33, minus the escape 
character.
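A sketch of URL-style escaping as suggested, under one reading of "minus the escape character": bytes below 33 are written as `%XX`, and the escape character `%` is itself escaped so the encoding stays reversible. Both helper names are hypothetical, and bytes of 127 and above pass through untouched.

```python
ESCAPE = ord("%")


def escape_name(name: bytes) -> bytes:
    """Encode bytes < 33 (control characters, space) and '%' itself as %XX."""
    out = bytearray()
    for b in name:
        if b < 33 or b == ESCAPE:
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)


def unescape_name(text: bytes) -> bytes:
    """Invert escape_name()."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == ESCAPE:
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


print(escape_name(b"my file\n.c"))  # b'my%20file%0A.c'
```

Ordinary names come out unchanged, which keeps the common case readable.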

Although Linus is correct in that an SCM doesn't *have* to handle this, 
it really feels like shooting for mediocracy to me.  We might as well 
design it right from the beginning.

	-hpa


* Re: Handling renames.
  2005-04-14 18:12 ` Ingo Molnar
@ 2005-04-14 18:32   ` Linus Torvalds
  2005-04-14 18:58     ` Ingo Molnar
  2005-04-14 19:21     ` David Mansfield
  0 siblings, 2 replies; 28+ messages in thread
From: Linus Torvalds @ 2005-04-14 18:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David Woodhouse, git, James Bottomley



On Thu, 14 Apr 2005, Ingo Molnar wrote:
> 
> there's no redundancy caused by this method: only renames (which are 
> rare) go through the rename_commit redirection. (to speed up the lookup 
> the rename_commit object could cache the offset of the two names within 
> their tree objects.)

Bzzt. Wrong.

ANYTHING you do with games like this will cause the "same directory 
creates different object" crap.

Git doesn't do that. There fundamentally is no history in objects, 
_except_ for the commit object. Two objects with the same name are 
identical, and that means that they are easy to share. 

Any time you break that model, you break the whole point of git. Don't do 
it. You'll be very very sorry if you ever do, because it breaks the clean 
separation of "time" and "space". I guarantee you that your merges will 
become _harder_ rather than easier.

What you can do at an SCM level, is that if you want to track renames, you
track them as a separate commit altogether. Ie if you notice a rename, you
first commit the rename (and you can _see_ it's a rename, since the object
didn't change, and the sha1 stayed the same, which in git-speak means that
it is the same object, ie that _is_ a rename as far as git is concerned),
and then you create the "this is the data that changed" as a _second_
commit.
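A sketch of noticing such a pure rename by comparing two flat trees (path -> object id, a simplification of real nested trees): a path that disappeared and a path that appeared with the same id are paired up. As Ingo points out in his reply, identical content at several paths makes the pairing ambiguous; this sketch simply takes the first candidate in sorted order.

```python
from typing import Dict, List, Tuple


def detect_renames(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair removed and added paths whose object ids are identical."""
    removed: Dict[str, List[str]] = {}
    for path in sorted(old):
        if path not in new:
            removed.setdefault(old[path], []).append(path)
    renames = []
    for path in sorted(new):
        if path in old:
            continue
        candidates = removed.get(new[path])
        if candidates:  # same object id means same content
            renames.append((candidates.pop(0), path))
    return renames


old = {"foo.c": "aaa", "Makefile": "mmm"}
new = {"bar.c": "aaa", "Makefile": "mmm", "new.c": "nnn"}
print(detect_renames(old, new))  # [('foo.c', 'bar.c')]
```

A rename combined with an edit is not found this way, which is exactly why the rename goes in its own commit before the content change.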

But don't make it a new kind of commit. It's just a regular commit, 
dammit. No new abstractions. 

Trust me, it's worth it to follow the rules. You don't start making up new 
concepts for every new thing you track. Next you'll want "tag objects". 
That's a totally idiotic idea. What you do is you tag things at a higher 
level than git ever is, and git will _never_ have to know about tag 
objects. 

Some "higher level" thing can add its own rules _on_top_ of git rules. The
same way we have normal applications having their _own_ rules on top of
the kernel. You do abstraction in layers, but for this to work, the base 
you build on top of had better be damn solid, and not have any ugly 
special cases.

		Linus


* Re: Handling renames.
  2005-04-14 18:21 ` H. Peter Anvin
@ 2005-04-14 18:48   ` Linus Torvalds
  2005-04-14 18:49     ` H. Peter Anvin
  2005-04-14 19:22     ` Zach Welch
  0 siblings, 2 replies; 28+ messages in thread
From: Linus Torvalds @ 2005-04-14 18:48 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: David Woodhouse, git, James Bottomley



On Thu, 14 Apr 2005, H. Peter Anvin wrote:
> 
> Although Linus is correct in that an SCM doesn't *have* to handle this, 
> it really feels like shooting for mediocracy to me.  We might as well 
> design it right from the beginning.

No. git is not an SCM. it's a filesystem designed to _host_ an SCM, and 
that _is_ doing it right from the beginning.

Keep the abstractions clean. Do _not_ get confused into thinking that git 
is an SCM. If you think of it that way, you'll end up with crap you can't 
think about.

And at a filesystem layer, "rename" already exists. It's moving an object 
to a new name in a tree. git already does that very well, thank you very 
much.

But a filesystem rename is _not_ the same thing as an SCM rename.  An SCM 
rename is built on top of a filesystem rename, but it has its own issues 
that may or may not make sense for the filesystem.

		Linus


* Re: Handling renames.
  2005-04-14 18:48   ` Linus Torvalds
@ 2005-04-14 18:49     ` H. Peter Anvin
  2005-04-14 19:22     ` Zach Welch
  1 sibling, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2005-04-14 18:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, git, James Bottomley

Linus Torvalds wrote:
> 
> On Thu, 14 Apr 2005, H. Peter Anvin wrote:
> 
>>Although Linus is correct in that an SCM doesn't *have* to handle this, 
>>it really feels like shooting for mediocracy to me.  We might as well 
>>design it right from the beginning.
> 
> No. git is not an SCM. it's a filesystem designed to _host_ an SCM, and 
> that _is_ doing it right from the beginning.
> 
> Keep the abstractions clean. Do _not_ get confused into thinking that git 
> is an SCM. If you think of it that way, you'll end up with crap you can't 
> think about.
> 
> And at a filesystem layer, "rename" already exists. It's moving an object 
> to a new name in a tree. git already does that very well, thank you very 
> much.
> 
> But a filesystem rename is _not_ the same thing as an SCM rename.  An SCM 
> rename is built on top of a filesystem rename, but it has its own issues 
> that may or may not make sense for the filesystem.
> 

I wasn't referring to git per se, I was referring to the hosted SCM.

	-hpa


* Re: Handling renames.
  2005-04-14 18:32   ` Linus Torvalds
@ 2005-04-14 18:58     ` Ingo Molnar
  2005-04-14 19:20       ` David Woodhouse
  2005-04-14 19:21     ` David Mansfield
  1 sibling, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2005-04-14 18:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, git, James Bottomley


* Linus Torvalds <torvalds@osdl.org> wrote:

> [...] Ie if you notice a rename, you first commit the rename (and you 
> can _see_ it's a rename, since the object didn't change, and the sha1 
> stayed the same, which in git-speak means that it is the same object, 
> ie that _is_ a rename as far as git is concerned), and then you create 
> the "this is the data that changed" as a _second_ commit.

ok, i accept your point of not putting this into such a low level as the 
object abstraction. Was a bad idea.

but i dont think the above would be enough: there can be renames of 
objects that have the same sha1 hash as other objects in the same tree, 
and developers want to track individual objects, regardless of whether 
other files share the same content. So some formal operation would be 
needed to signal renames - e.g. to embed it in the commit object, per
David's suggestion.

The thing i tried to avoid was to list long filenames in the commit 
(because of the tree hierarchy we'd need to do tree-absolute pathnames 
or something like that, and escape things, and do lookups - duplicating 
a VFS which is quite bad) - it would be better to identify the rename 
source and target via its tree object hash and its offset within that 
tree. Such information could be embedded in the commit object just fine.  
Something like:

me bb95843a5a0f397270819462812735ee29796fb4
tree 1756b578489f93999ded68ae347bef7d6063101c
parent 9f02d4d233223462d3f6217b5837b786e6286ba4
author
committer
rename 39021759c903a943a33a28cfbd5070d36d851581 15234 9f02d4d233223462d3f6217b5837b786e6286ba4 16163

?

	Ingo


* Re: Handling renames.
  2005-04-14 18:11 ` Linus Torvalds
@ 2005-04-14 19:09   ` David Woodhouse
  0 siblings, 0 replies; 28+ messages in thread
From: David Woodhouse @ 2005-04-14 19:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, James Bottomley

On Thu, 2005-04-14 at 11:11 -0700, Linus Torvalds wrote:
> So, you really need to think of git as a filesystem. You can then 
> implement an SCM _on_top_of_it_, which means that your second suggestion 
> is not only acceptable, it really is the _only_ way to handle this in git:
> 
> > So a commit involving a rename would look something like this...
> > 
> > 	tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
> > 	parent bb95843a5a0f397270819462812735ee29796fb4
> > 	rename foo.c bar.c
> > 	author David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
> > 	committer David Woodhouse <dwmw2@hades.cambridge.redhat.com> 1113499881 +0100
> > 	Rename foo.c to bar.c and s/foo_/bar_/g
> 
> Except I want that empty line in there, and I want it in the "free-form"  
> section. The "rename" part really isn't part of the git header. It's not 
> what git tracks, it was tracked by an SCM system on top of git.

Note that not only may you have a _set_ of renames, but you'll also have
a _different_ set of renames for each parent. Consider the
representation of a merge where a file was called 'foo' in one parent,
'bar' in the other, and we called it 'foobar' in the resulting tree.

That's the main reason I wanted the renames in with the parent
information -- so it's <parent><rename><rename><...><parent><rename>...
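A sketch of parsing that interleaved layout, where each `rename` line applies to the most recent `parent` line. This is only the format proposed here, not anything git defines; if the lines moved into the free-form part of the commit instead, the parsing would be the same. The all-ones and all-twos parent ids are placeholders.

```python
from typing import Dict, List, Tuple


def parse_parent_renames(lines: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group 'rename <old> <new>' lines under the preceding 'parent <sha1>' line."""
    renames: Dict[str, List[Tuple[str, str]]] = {}
    current = None
    for line in lines:
        key, _, value = line.partition(" ")
        if key == "parent":
            current = value
            renames[current] = []
        elif key == "rename":
            if current is None:
                raise ValueError("rename line before any parent line")
            old, new = value.split(" ")
            renames[current].append((old, new))
    return renames


# A merge where the file was 'foo' in one parent and 'bar' in the other.
merge = [
    "parent 1111111111111111111111111111111111111111",
    "rename foo foobar",
    "parent 2222222222222222222222222222222222222222",
    "rename bar foobar",
]
print(parse_parent_renames(merge))
```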

I see your point though and I can't be bothered to argue for the sake of
the slight efficiency benefit we might gain from doing it that way. The
implementation details really aren't that interesting right now.

Let us assume, however, that we have this information somehow stored in
each commit object. It's perfectly sufficient from the POV of the 
'git revtool' which I've been poking at; is it good enough for merges?

Consider a simple case: A branch is taken, file foo.c is renamed to
bar.c, and now we're trying to merge that branch back into the head,
which has moved on. 

We can't just take 'bar.c' as a new file -- we have to track it all the
way back to its inception, and notice that it actually shares a common
ancestor with 'foo.c' in the other parent of the merge.

How feasible, and how computationally expensive, is that task going to
be? Especially given that there may be _many_ new files that we need to
attempt to tie up with their partners, across many potential renames. 

One option for optimising this, if we really need to, might be to track
the file back to its _first_ ancestor and use that as an identification.
The SCM could store that identifier in the blob itself, or we could
consider it an 'inode number' and store it in git's tree objects.
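A sketch of the "first ancestor as identity" idea, assuming each side of the merge can supply its current paths and the renames it made since the common ancestor, in chronological order. Files are then matched across the merge by the name they had at that ancestor. The function names are hypothetical, and the sketch ignores a new file later reusing a name that was renamed away.

```python
from typing import Dict, List, Tuple


def origin_names(paths: List[str], renames: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each current path to its name at the common ancestor."""
    back: Dict[str, str] = {}      # current name -> original name
    for old, new in renames:       # chronological (old, new) pairs
        back[new] = back.pop(old, old)
    return {path: back.get(path, path) for path in paths}


def pair_for_merge(paths_a, renames_a, paths_b, renames_b) -> Dict[str, str]:
    """Pair each path on side A with the path on side B sharing its origin."""
    origin_a = origin_names(paths_a, renames_a)
    by_origin_b = {orig: path for path, orig in origin_names(paths_b, renames_b).items()}
    return {path: by_origin_b[orig] for path, orig in origin_a.items() if orig in by_origin_b}


# Branch renamed foo.c to bar.c; the head kept the old name.
pairs = pair_for_merge(["bar.c", "Makefile"], [("foo.c", "bar.c")],
                       ["foo.c", "Makefile"], [])
print(pairs)  # {'bar.c': 'foo.c', 'Makefile': 'Makefile'}
```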

If we can avoid that, however, it would be nice. How feasible is the
merge going to be without it?

-- 
dwmw2




* Re: Handling renames.
  2005-04-14 18:58     ` Ingo Molnar
@ 2005-04-14 19:20       ` David Woodhouse
  0 siblings, 0 replies; 28+ messages in thread
From: David Woodhouse @ 2005-04-14 19:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, git, James Bottomley

On Thu, 2005-04-14 at 20:58 +0200, Ingo Molnar wrote:
> The thing i tried to avoid was to list long filenames in the commit 
> (because of the tree hierarchy we'd need to do tree-absolute pathnames 
> or something like that, and escape things, and do lookups - duplicating 
> a VFS which is quite bad) - it would be better to identify the rename 
> source and target via its tree object hash and its offset within that 
> tree. Such information could be embedded in the commit object just fine.  
> Something like:

Actually I'm not sure that's true. Let's consider the two main users of
this information.

Firstly, because it's what I've been playing with: to list a given
file's revision history, I currently work with its filename -- walk the
commit objects, inspecting the tree and selecting those commits where
the file has changed. If my filename is 'fs/jffs2/inode.c' then I can
immediately skip over a commit where the 'fs' entry in the top-level
tree is identical to that in the parent, or I can skip a commit where
the 'jffs2' entry in the 'fs' subtree is identical to the parent... it's
all done on filename, and the {parent, entry} tuple wouldn't help much
here; I'd probably have to convert back to a filename anyway.
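A sketch of that path-directed comparison, assuming a hypothetical nested `Tree` type whose entries are either subtrees or blob ids. It walks one path component at a time and returns as soon as both sides hold the same object id, which is what lets an unchanged 'fs' or 'jffs2' entry be skipped without looking inside it.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Tree:
    """Hypothetical tree object: its own id plus named entries."""
    oid: str
    entries: Dict[str, Union["Tree", str]]  # name -> subtree or blob id


Entry = Optional[Union[Tree, str]]


def entry_id(entry: Entry) -> Optional[str]:
    return entry.oid if isinstance(entry, Tree) else entry


def path_changed(old: Tree, new: Tree, path: str) -> bool:
    """True if the object at `path` differs between `old` and `new`."""
    a: Entry = old
    b: Entry = new
    for component in path.split("/"):
        if entry_id(a) == entry_id(b):
            return False  # identical (sub)tree: nothing below it changed
        a = a.entries.get(component) if isinstance(a, Tree) else None
        b = b.entries.get(component) if isinstance(b, Tree) else None
    return entry_id(a) != entry_id(b)


jffs2 = Tree("j1", {"inode.c": "b1", "file.c": "f1"})
root_old = Tree("r1", {"fs": Tree("fs1", {"jffs2": jffs2}), "Makefile": "m1"})
root_new = Tree("r2", {"fs": root_old.entries["fs"], "Makefile": "m2"})
print(path_changed(root_old, root_new, "fs/jffs2/inode.c"))  # False, 'fs' is shared
```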

Secondly, there's merges. I've paid less attention to these (see mail 5
minutes ago) but I think they'd end up operating on the rename
information in a very similar way. To find a common ancestor for a given
file, we want to track its name as it changed during history; at that
point it's all string compares.

-- 
dwmw2




* Re: Handling renames.
  2005-04-14 18:32   ` Linus Torvalds
  2005-04-14 18:58     ` Ingo Molnar
@ 2005-04-14 19:21     ` David Mansfield
  1 sibling, 0 replies; 28+ messages in thread
From: David Mansfield @ 2005-04-14 19:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, David Woodhouse, git, James Bottomley

Linus Torvalds wrote:
> 
> On Thu, 14 Apr 2005, Ingo Molnar wrote:
> 
>>there's no redundancy caused by this method: only renames (which are 
>>rare) go through the rename_commit redirection. (to speed up the lookup 
>>the rename_commit object could cache the offset of the two names within 
>>their tree objects.)
> 
> 

> 
> Some "higher level" thing can add its own rules _on_top_ of git rules. The
> same way we have normal applications having their _own_ rules on top of
> the kernel. You do abstraction in layers, but for this to work, the base 
> you build on top of had better be damn solid, and not have any ugly 
> special cases.
> 

Maybe you (or the group) should standardize on a way to 'extend' the 
commit 'object' in terms of:

the layer1 (git) header for commit object is defined as such-and-such
the layer2 (scm or other) header for commit object is defined as 
such-and-such

Much the way network protocols stack on top of each other.  If a 
standard way of stacking is defined, then it could be much cleaner for 
future implementors to understand a 'new' stacking protocol, and it will 
make the scm-level extensions easier to discuss in terms of their own
'layer'.

David


* Re: Handling renames.
  2005-04-14 18:48   ` Linus Torvalds
  2005-04-14 18:49     ` H. Peter Anvin
@ 2005-04-14 19:22     ` Zach Welch
  2005-04-14 19:40       ` Andrew Timberlake-Newell
  1 sibling, 1 reply; 28+ messages in thread
From: Zach Welch @ 2005-04-14 19:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, David Woodhouse, git, James Bottomley

Linus Torvalds wrote:
> 
> On Thu, 14 Apr 2005, H. Peter Anvin wrote:
> 
>> Although Linus is correct in that an SCM doesn't *have* to handle 
>> this, it really feels like shooting for mediocracy to me.  We might
>>  as well design it right from the beginning.
> 
> 
> No. git is not an SCM. it's a filesystem designed to _host_ an SCM, 
> and that _is_ doing it right from the beginning.

I imagine quite a few folks expect something not entirely unlike an SCM
to emerge from these current efforts. Moreover, Petr's 'git' scripts
wrap your "filesystem" plumbing to that very end.

To avoid confusion, I think it would be better to distinguish the two
layers, perhaps by calling the low-level plumbing... 'gitfs', of course.

Cheers,

Zach Welch
Superlucidity Services


* RE: Handling renames.
  2005-04-14 19:22     ` Zach Welch
@ 2005-04-14 19:40       ` Andrew Timberlake-Newell
  2005-04-14 20:42         ` Naming the SCM (was Re: Handling renames.) Steven Cole
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Timberlake-Newell @ 2005-04-14 19:40 UTC (permalink / raw)
  To: git; +Cc: 'Zach Welch', 'Linus Torvalds'

Zach Welch pontificated:
> I imagine quite a few folks expect something not entirely unlike an SCM
> to emerge from these current efforts. Moreover, Petr's 'git' scripts
> wrap your "filesystem" plumbing to that very end.
> 
> To avoid confusion, I think it would be better to distinguish the two
> layers, perhaps by calling the low-level plumbing... 'gitfs', of course.

Or perhaps to come up with a name (or at least nickname) for the SCM.

GitMaster?



* Naming the SCM (was Re: Handling renames.)
  2005-04-14 19:40       ` Andrew Timberlake-Newell
@ 2005-04-14 20:42         ` Steven Cole
  2005-04-14 20:53           ` Petr Baudis
  2005-04-14 23:17           ` Peter Williams
  0 siblings, 2 replies; 28+ messages in thread
From: Steven Cole @ 2005-04-14 20:42 UTC (permalink / raw)
  To: Andrew Timberlake-Newell
  Cc: git, 'Zach Welch', 'Linus Torvalds'

On Thursday 14 April 2005 01:40 pm, Andrew Timberlake-Newell wrote:
> Zach Welch pontificated:
> > I imagine quite a few folks expect something not entirely unlike an SCM
> > to emerge from these current efforts. Moreover, Petr's 'git' scripts
> > wrap your "filesystem" plumbing to that very end.
> > 
> > To avoid confusion, I think it would be better to distinguish the two
> > layers, perhaps by calling the low-level plumbing... 'gitfs', of course.
> 
> Or perhaps to come up with a name (or at least nickname) for the SCM.
> 
> GitMaster?
> 

Cogito.  "Git inside" can be the first slogan.

Differentiating the SCM built on top of git from git itself is probably worthwhile
to avoid confusion.  Other SCMs may be developed later, built on git, and these
can come up with their own clever names.

Steven


* Re: Naming the SCM (was Re: Handling renames.)
  2005-04-14 20:42         ` Naming the SCM (was Re: Handling renames.) Steven Cole
@ 2005-04-14 20:53           ` Petr Baudis
  2005-04-14 20:58             ` H. Peter Anvin
  2005-04-14 23:17           ` Peter Williams
  1 sibling, 1 reply; 28+ messages in thread
From: Petr Baudis @ 2005-04-14 20:53 UTC (permalink / raw)
  To: Steven Cole
  Cc: Andrew Timberlake-Newell, git, 'Zach Welch',
	'Linus Torvalds'

Dear diary, on Thu, Apr 14, 2005 at 10:42:16PM CEST, I got a letter
where Steven Cole <elenstev@mesatop.com> told me that...
> On Thursday 14 April 2005 01:40 pm, Andrew Timberlake-Newell wrote:
> > Zach Welch pontificated:
> > > I imagine quite a few folks expect something not entirely unlike an SCM
> > > to emerge from these current efforts. Moreover, Petr's 'git' scripts
> > > wrap your "filesystem" plumbing to that very end.
> > > 
> > > To avoid confusion, I think it would be better to distinguish the two
> > > layers, perhaps by calling the low-level plumbing... 'gitfs', of course.
> > 
> > Or perhaps to come up with a name (or at least nickname) for the SCM.
> > 
> > GitMaster?
> > 
> 
> Cogito.  "Git inside" can be the first slogan.

What about tig?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Naming the SCM (was Re: Handling renames.)
  2005-04-14 20:53           ` Petr Baudis
@ 2005-04-14 20:58             ` H. Peter Anvin
  2005-04-14 21:01               ` Petr Baudis
  0 siblings, 1 reply; 28+ messages in thread
From: H. Peter Anvin @ 2005-04-14 20:58 UTC (permalink / raw)
  To: Petr Baudis
  Cc: Steven Cole, Andrew Timberlake-Newell, git, 'Zach Welch',
	'Linus Torvalds'

Petr Baudis wrote:

>>Cogito.  "Git inside" can be the first slogan.
> 
> What about tig?

I like "Cogito"; it's a real name, plus it'd be a good use for the 
otherwise-pretty-useless two-letter combination "cg".

	-hpa



* Re: Re: Naming the SCM (was Re: Handling renames.)
  2005-04-14 20:58             ` H. Peter Anvin
@ 2005-04-14 21:01               ` Petr Baudis
  0 siblings, 0 replies; 28+ messages in thread
From: Petr Baudis @ 2005-04-14 21:01 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Cole, Andrew Timberlake-Newell, git, 'Zach Welch',
	'Linus Torvalds'

Dear diary, on Thu, Apr 14, 2005 at 10:58:52PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> Petr Baudis wrote:
> 
> >>Cogito.  "Git inside" can be the first slogan.
> >
> >What about tig?
> 
> I like "Cogito"; it's a real name, plus it'd be a good use for the 
> otherwise-pretty-useless two-letter combination "cg".

Duh, believe me or not but I completely missed the "Cogito" part of
Steven's mail. Of course, I like it too.

I'll commit my poor man's git-merge-in-separate-tree and finally get
some sleep. I promise.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Handling renames.
  2005-04-14 17:54 Handling renames David Woodhouse
                   ` (2 preceding siblings ...)
  2005-04-14 18:21 ` H. Peter Anvin
@ 2005-04-14 22:23 ` Daniel Barkalow
  2005-04-14 22:46   ` David Woodhouse
  3 siblings, 1 reply; 28+ messages in thread
From: Daniel Barkalow @ 2005-04-14 22:23 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git, James Bottomley

On Thu, 14 Apr 2005, David Woodhouse wrote:

> Opinions? Dissent? We'd probably need to escape the filenames in some
> way -- handwave over that for now.

I personally think renames are a minor thing that doesn't happen
much. What actually happens, in my opinion, is that some chunk of a file
is moved to a different, possibly new, file. If this is supported (as
something that the SCM notices), then a rename is just a special case
where the moved chunk is a whole file.

I think that it should be possible to identify and tag "big
enough" deletions and insertions, and compare them to find moves, where a
further change may be applied in the middle if two chunks are "very
similar" but not the same.
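A rough sketch of detecting moved chunks with Python's standard `difflib` (a stand-in chosen for illustration, not any git algorithm): diff the old and new line lists, collect the deleted and inserted runs, then look for sufficiently long identical stretches between them. The `min_lines` threshold stands in for "big enough"; the fuzzy "very similar" matching is not attempted here.

```python
import difflib
from typing import List, Tuple


def find_moves(old_lines: List[str], new_lines: List[str],
               min_lines: int = 3) -> List[Tuple[int, int, int]]:
    """Return (old_start, new_start, length) for runs deleted in one place
    and inserted verbatim in another."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append((i1, old_lines[i1:i2]))
        if tag in ("insert", "replace"):
            inserted.append((j1, new_lines[j1:j2]))
    moves = []
    for old_start, gone in deleted:
        for new_start, added in inserted:
            pair = difflib.SequenceMatcher(None, gone, added, autojunk=False)
            for i, j, size in pair.get_matching_blocks():
                if size >= min_lines:
                    moves.append((old_start + i, new_start + j, size))
    return moves


old = ["a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"]
new = ["a1", "a2", "a3", "c1", "c2", "c3", "b1", "b2", "b3"]
print(find_moves(old, new))  # [(6, 3, 3)]
```

When two blocks swap places, which of them the diff reports as "moved" is arbitrary; a rename is the degenerate case where the whole file is one moved chunk.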

On the other hand, I think that the SCM will need to cache its
understanding of what a commit did in order to give reasonable
performance for operations like "annotate", and it may be advantageous to
distribute things from this cache, since the committer might want to tell
the system something that it didn't guess.

At some point, I'm going to argue for core support for "back pointers",
where a file can be created which is "about" some other file(s), and
someone looking for files "about" a particular file can find them without
searching the entire database. I think this will turn out to be important
for a variety of cases where some later participant wants to say something
about an existing file without changing the content of the file.

	-Daniel
*This .sig left intentionally blank*



* Re: Handling renames.
  2005-04-14 22:23 ` Handling renames Daniel Barkalow
@ 2005-04-14 22:46   ` David Woodhouse
  0 siblings, 0 replies; 28+ messages in thread
From: David Woodhouse @ 2005-04-14 22:46 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, James Bottomley

On Thu, 2005-04-14 at 18:23 -0400, Daniel Barkalow wrote:
> I personally think renames are a minor thing that doesn't happen
> much. What actually happens, in my opinion, is that some chunk of a
> file is moved to a different, possibly new, file. If this is supported
> (as something that the SCM notices), then a rename is just a special
> case where the moved chunk is a whole file.

Certainly we'd discussed the possibility that the 'rename' field may
contain more than one destination, or more than one source filename.
This could happen when a file is split into two, or when two files are
merged into one, for example.

-- 
dwmw2




* Re: Naming the SCM (was Re: Handling renames.)
  2005-04-14 20:42         ` Naming the SCM (was Re: Handling renames.) Steven Cole
  2005-04-14 20:53           ` Petr Baudis
@ 2005-04-14 23:17           ` Peter Williams
  1 sibling, 0 replies; 28+ messages in thread
From: Peter Williams @ 2005-04-14 23:17 UTC (permalink / raw)
  To: Steven Cole
  Cc: Andrew Timberlake-Newell, git, 'Zach Welch',
	'Linus Torvalds'

Steven Cole wrote:
> On Thursday 14 April 2005 01:40 pm, Andrew Timberlake-Newell wrote:
> 
>>Zach Welch pontificated:
>>
>>>I imagine quite a few folks expect something not entirely unlike an SCM
>>>to emerge from these current efforts. Moreover, Petr's 'git' scripts
>>>wrap your "filesystem" plumbing to that very end.
>>>
>>>To avoid confusion, I think it would be better to distinguish the two
>>>layers, perhaps by calling the low-level plumbing... 'gitfs', of course.
>>
>>Or perhaps to come up with a name (or at least nickname) for the SCM.
>>
>>GitMaster?
>>
> 
> 
> Cogito.  "Git inside" can be the first slogan.
> 
> Differentiating the SCM built on top of git from git itself is probably worthwhile
> to avoid confusion.  Other SCMs may be developed later, built on git, and these
> can come up with their own clever names.

And the logo could be a dove which, as everybody knows, coos.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

* Re: Handling renames.
@ 2005-04-15 13:37 linux
  2005-04-15 13:53 ` David Woodhouse
  0 siblings, 1 reply; 28+ messages in thread
From: linux @ 2005-04-15 13:37 UTC (permalink / raw)
  To: dwmw2; +Cc: git

> One option for optimising this, if we really need to, might be to track
> the file back to its _first_ ancestor and use that as an identification.
> The SCM could store that identifier in the blob itself, or we could
> consider it an 'inode number' and store it in git's tree objects.

This suggestion (and this whole discussion about renames) has issues
with file copies, which form a branch in the revision history.  If I
copy foo.c to foo2.c (or fs/ext2/ to fs/ext3/), then the oldest ancestor
isn't a "unique inode number".
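To make the objection concrete, here is a toy model. It assumes each path records the single path it was renamed or copied from, in a hypothetical `origin` map; this is not a git structure. After a copy, two live files trace back to the same first ancestor, so that ancestor cannot serve as a unique "inode number".

```python
def first_ancestor(path: str, origin: dict[str, str]) -> str:
    """Follow origin links (new path -> path it came from) back to the
    oldest known path."""
    seen = set()
    while path in origin:
        if path in seen:  # guard against a malformed, cyclic map
            raise ValueError(f"cycle at {path}")
        seen.add(path)
        path = origin[path]
    return path


origin = {"foo2.c": "foo.c"}  # foo2.c was created by copying foo.c
ids = {f: first_ancestor(f, origin) for f in ["foo.c", "foo2.c"]}
print(ids)  # {'foo.c': 'foo.c', 'foo2.c': 'foo.c'}
```

Both files end up with the same identifier. This is exactly the collision a copy (or a copied directory such as fs/ext2/ to fs/ext3/) produces.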

I've written a lot of programs by debugging hello.c.

Thinking about this can give you all sorts of exciting merge possibilities.

If branch1 renames a.c to b.c, and branch2 patches a.c, it seems obvious
that the patch should be merged into b.c.  But what if branch1 copies a.c
to b.c?
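One way to see the ambiguity: a sketch that assumes branch1's changes are summarized as a hypothetical list of (kind, src, dst) records. After a rename, a patch to a.c from branch2 has exactly one place to go. After a copy, both a.c and b.c still carry the content, and the merge has to pick one or ask.

```python
def patch_targets(path: str, moves: list[tuple[str, str, str]]) -> list[str]:
    """Return the paths on branch1 where a patch made to `path` on branch2
    could apply. `moves` holds (kind, src, dst), kind 'rename' or 'copy'."""
    targets = [path]
    for kind, src, dst in moves:
        if src not in targets:
            continue
        if kind == "rename":
            # The old path is gone; the content now lives at dst.
            targets = [dst if t == src else t for t in targets]
        elif kind == "copy":
            # The original and the copy both still carry the content.
            targets.append(dst)
        else:
            raise ValueError(f"unknown move kind: {kind}")
    return targets


print(patch_targets("a.c", [("rename", "a.c", "b.c")]))  # ['b.c']
print(patch_targets("a.c", [("copy", "a.c", "b.c")]))    # ['a.c', 'b.c']
```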

* Re: Handling renames.
  2005-04-15 13:37 linux
@ 2005-04-15 13:53 ` David Woodhouse
  0 siblings, 0 replies; 28+ messages in thread
From: David Woodhouse @ 2005-04-15 13:53 UTC (permalink / raw)
  To: linux; +Cc: git

On Fri, 2005-04-15 at 13:37 +0000, linux@horizon.com wrote:
> > One option for optimising this, if we really need to, might be to track
> > the file back to its _first_ ancestor and use that as an identification.
> > The SCM could store that identifier in the blob itself, or we could
> > consider it an 'inode number' and store it in git's tree objects.
> 
> This suggestion (and this whole discussion about renames) has issues
> with file copies, which form a branch in the revision history.  If I
> copy foo.c to foo2.c (or fs/ext2/ to fs/ext3/), then the oldest ancestor
> isn't a "unique inode number".

That's why I prefer the option of simply annotating the moves. They
don't need to be just renames -- it can cover the cases where files are
split up or merged into one, to indicate where the history of the given
_data_ is coming from.
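A sketch of how move annotations like these could drive a per-file history walk. It assumes commits arrive newest first as hypothetical (commit_id, moves, changed) tuples. Here `moves` maps each destination path to the list of paths its data came from: one source for a plain rename, several when files were merged into one. A split is then just several destinations listing the same source. None of these names come from git.

```python
def file_history(path, commits):
    """Walk commits newest first, yielding (commit_id, paths) for each commit
    that touched a path the tracked data lives under at that point.

    commits: iterable of (commit_id, moves, changed), where
      moves   maps dst path -> list of src paths (annotated in that commit)
      changed is the set of paths whose content changed in that commit.
    """
    tracked = {path}
    for commit_id, moves, changed in commits:
        touched = tracked & (changed | set(moves))
        if touched:
            yield commit_id, sorted(touched)
        # Before this commit, the data lived under the move sources instead.
        previous = set()
        for p in tracked:
            previous.update(moves.get(p, [p]))
        tracked = previous


commits = [
    ("c4", {}, {"bar.c"}),
    ("c3", {"bar.c": ["foo.c", "util.c"]}, {"bar.c"}),  # two files merged
    ("c2", {}, {"other.c"}),
    ("c1", {}, {"foo.c"}),
    ("c0", {}, {"util.c"}),
]
print(list(file_history("bar.c", commits)))
# [('c4', ['bar.c']), ('c3', ['bar.c']), ('c1', ['foo.c']), ('c0', ['util.c'])]
```

The unrelated commit c2 is skipped. Once the walk passes c3, it follows both files that fed into bar.c.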

-- 
dwmw2


* Handling renames.
  2005-10-22  0:37 ` Petr Baudis
@ 2005-10-22  0:47   ` Petr Baudis
  2005-10-22  1:28     ` Linus Torvalds
  0 siblings, 1 reply; 28+ messages in thread
From: Petr Baudis @ 2005-10-22  0:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

  (Having Apr 14 flashbacks? Good memory!)

Dear diary, on Sat, Oct 22, 2005 at 02:37:33AM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> told me that...
> But my main concern is - will it be possible to do the rename detection
> here as well? Using --dense instead of explicit diff-tree calls in
> cg-log would be a nice optimization, but I was about to add support for
> optional following of renames for cg-log <filename>. That's really
> pretty useful to have; every time I hit Junio's big scripts rename
> I keep repeating that to myself. ;-) Now that core GIT has the comfort
> of per-file history, I only hope that it will start to annoy you as well.

  After all, this might be as good a time to bring this up as any. Heavy
post-1.0 material follows:

  How to track renames? I believe the situation has changed in the last
half a year.  GIT really is a full-fledged SCM by now (at least its
major part code-wise), and I hope it is becoming obvious that we need
to track renames. I decided to skip the whole discussion of why,
because we already _do_ concern ourselves with renames - that's what
the cool diff -M gadget does. And people are starting to want to use
it for all kinds of history-digging stuff (not just for nice diffs
between trees).

  So the problem is whether we should make this explicit. diff -M is
only a heuristic and it can go wrong, while experience with other SCMs
shows that people actually don't mind telling their SCM about renames
explicitly - no more than they mind telling it about adds and removals.
So the user is willing to tell us exactly what happened, and it would be
foolish to throw that away and insist on guessing.  Besides, guessing
(and redoing it every time we go through the history) is fundamentally
slow, orders of magnitude slower
than just a tree diff.
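For a feel of what "guessing" involves, here is a toy similarity-based detector. It assumes deleted and added files are given as {path: content} dicts. It is not git's diff -M implementation: it scores line-level similarity with Python's difflib against a hypothetical 0.5 threshold. It does show both complaints above. Every deleted file is compared with every added file, and a threshold can pair the wrong files.

```python
import difflib


def guess_renames(deleted: dict[str, str], added: dict[str, str],
                  threshold: float = 0.5) -> dict[str, str]:
    """Pair deleted paths with added paths whose contents look similar.

    Returns {old_path: new_path}. Cost grows with
    len(deleted) * len(added) content comparisons.
    """
    renames = {}
    taken = set()
    for old, old_text in sorted(deleted.items()):
        best, best_score = None, threshold
        for new, new_text in sorted(added.items()):
            if new in taken:
                continue
            # Ratio of matching lines: 1.0 identical, 0.0 nothing in common.
            score = difflib.SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()).ratio()
            if score >= best_score:
                best, best_score = new, score
        if best is not None:
            renames[old] = best
            taken.add(best)
    return renames


deleted = {"git-pull-script": "fetch\nmerge\nreport\ncleanup\n"}
added = {"git-pull.sh": "fetch\nmerge\nreport\nexit\n", "README": "docs\nonly\n"}
print(guess_renames(deleted, added))  # {'git-pull-script': 'git-pull.sh'}
```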


  If I convince you that it is worth tracking the renames explicitly,
"how" is already a minor question. One idea of mine was to add an "edge"
object describing the edge between two trees (that keeps it usable for
people who use GIT for really weird things and perhaps do not use
commits at all; an edge between two commits would instead leave room to
track other cool stuff at the edge, if we think of some later):

	trees c53f757133bb84a2d87e901c49207e9b7c48e1a6 6bc7aa4f652d0ef49108d9e30a7ea7fbf8e44639
	rename git-pull-script\0git-pull.sh\0
	copy somefile1\0somefile2\0
	rewrite anotherfile1\0anotherfile2\0

  (The "rewrite" line might be controversial. And you might want to
merge this with the delta objects during packing, or do something
similarly clever.)

  Then you could e.g. pass the edge object ID as the second ID on the
parent line:

	parent 99977bd5fdeabbd0608a70e9411c243007ec4ea2 edgeobjectid
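A sketch of a reader for the proposed format. It assumes the exact layout above: a `trees` line with two IDs, then `rename`/`copy`/`rewrite` records whose two paths are each NUL-terminated, with each record ending in a newline. It also assumes the optional edge ID is the third word of a `parent` line. The function names and return shapes are hypothetical; git has no such object type. Decoding paths as UTF-8 is a simplification, since git paths are really bytes.

```python
def parse_edge(data: bytes):
    """Parse a hypothetical edge object:
        trees <tree-a> <tree-b>\\n
        <kind> <src>\\0<dst>\\0\\n    (kind: rename, copy or rewrite), repeated
    NUL-terminated paths may contain spaces or newlines."""
    header, sep, rest = data.partition(b"\n")
    fields = header.split(b" ")
    if not sep or len(fields) != 3 or fields[0] != b"trees":
        raise ValueError("missing 'trees' line")
    trees = (fields[1].decode(), fields[2].decode())
    records = []
    pos = 0
    while pos < len(rest):
        space = rest.index(b" ", pos)
        kind = rest[pos:space].decode()
        if kind not in ("rename", "copy", "rewrite"):
            raise ValueError(f"unknown record kind {kind!r}")
        nul1 = rest.index(b"\0", space + 1)
        nul2 = rest.index(b"\0", nul1 + 1)
        if rest[nul2 + 1:nul2 + 2] != b"\n":
            raise ValueError("record not terminated by newline")
        src, dst = rest[space + 1:nul1], rest[nul1 + 1:nul2]
        records.append((kind, src.decode(), dst.decode()))
        pos = nul2 + 2  # skip the final NUL and the newline
    return trees, records


def parse_parent(line: str):
    """Split 'parent <commit> [<edge-id>]' into (commit, edge_id or None)."""
    parts = line.split()
    if parts[:1] != ["parent"] or len(parts) not in (2, 3):
        raise ValueError(f"bad parent line: {line!r}")
    return parts[1], (parts[2] if len(parts) == 3 else None)


obj = (b"trees c53f7571 6bc7aa4f\n"  # abbreviated tree IDs
       b"rename git-pull-script\0git-pull.sh\0\n"
       b"copy somefile1\0somefile2\0\n"
       b"rewrite anotherfile1\0anotherfile2\0\n")
print(parse_edge(obj))
print(parse_parent("parent 99977bd5fdeabbd0608a70e9411c243007ec4ea2 edgeobjectid"))
```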

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.

* Re: Handling renames.
  2005-10-22  0:47   ` Handling renames Petr Baudis
@ 2005-10-22  1:28     ` Linus Torvalds
  2005-10-22  1:51       ` Petr Baudis
  0 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2005-10-22  1:28 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Git Mailing List



On Sat, 22 Oct 2005, Petr Baudis wrote:
> 
>   How to track renames? I believe the situation has changed in the last
> half a year.

I disagree.

Every single thing that said that renames were a bad idea to track when 
git started is still equally true.

>   If I convince you that it is worth tracking the renames explicitly,
> "how" is already a minor question.

Never. I'm 100% convinced that tracking renames is WRONG WRONG WRONG.

You can follow renames _afterwards_. 

Git tracks contents. And I think we've proven that figuring out renames 
after-the-fact from those contents is not only doable, but very well 
supported already.

I'm convinced that git handles renames better than any other SCM ever. 
Exactly because we figure it out when it matters.
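As an illustration of recovering renames from contents alone, consider the cheapest case: an exact rename, where one path disappears and another appears with an identical blob ID. The sketch assumes trees are flattened into hypothetical {path: blob_id} dicts. Inexact renames need a similarity heuristic on top of this. Nothing is stored: the pairing is computed when someone asks.

```python
def exact_renames(old_tree: dict[str, str],
                  new_tree: dict[str, str]) -> dict[str, str]:
    """Match paths removed from old_tree with paths added in new_tree that
    carry the same blob ID. Returns {old_path: new_path}."""
    removed = {p: b for p, b in old_tree.items() if p not in new_tree}
    added = {p: b for p, b in new_tree.items() if p not in old_tree}
    # Index added paths by content; one blob may appear under several paths.
    by_blob: dict[str, list[str]] = {}
    for path, blob in sorted(added.items()):
        by_blob.setdefault(blob, []).append(path)
    renames = {}
    for path, blob in sorted(removed.items()):
        candidates = by_blob.get(blob)
        if candidates:
            renames[path] = candidates.pop(0)
    return renames


old = {"git-pull-script": "e1f2", "Makefile": "aa01"}
new = {"git-pull.sh": "e1f2", "Makefile": "aa02"}
print(exact_renames(old, new))  # {'git-pull-script': 'git-pull.sh'}
```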

		Linus

* Re: Handling renames.
  2005-10-22  1:28     ` Linus Torvalds
@ 2005-10-22  1:51       ` Petr Baudis
  2005-10-22  2:10         ` Junio C Hamano
  2005-10-22  3:23         ` Linus Torvalds
  0 siblings, 2 replies; 28+ messages in thread
From: Petr Baudis @ 2005-10-22  1:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

> Every single thing that said that renames were a bad idea to track
> when git started is still equally true.

It'd be good to clarify whether we are discussing if tracking renames
is a good or a bad idea, or whether having the user explicitly specify
renames is better than figuring them out automagically. Your first
comment would indicate the former, but the rest of your reply the
latter.

> You can follow renames _afterwards_. 

I can - crudely, but what's the point if the user is dying to give me
the information?

> Git tracks contents. And I think we've proven that figuring out renames 
> after-the-fact from those contents is not only doable, but very well 
> supported already.

It's unreliable and it's slow (well, perhaps I should get some numbers
to back that up, but given how it is done I take it for granted). That
does not sound "very well" to me.

> I'm convinced that git handles renames better than any other SCM ever. 
> Exactly because we figure it out when it matters.

It matters at least every time you show per-file history and every time
you merge across the rename. I think both can be pretty common once
you have done a rename. That means you have to make an expensive guess
every time you hit that, and the guess can be wrong, in which case
there is no way around it and you lose.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.

* Re: Handling renames.
  2005-10-22  1:51       ` Petr Baudis
@ 2005-10-22  2:10         ` Junio C Hamano
  2005-10-22  2:49           ` Petr Baudis
  2005-10-22  3:23         ` Linus Torvalds
  1 sibling, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2005-10-22  2:10 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis <pasky@suse.cz> writes:

>> I'm convinced that git handles renames better than any other SCM ever. 
>> Exactly because we figure it out when it matters.
>
> It matters at least every time you show per-file history and every time
> you merge across the rename. I think both can be pretty common once
> you have done a rename. That means you have to make an expensive guess
> every time you hit that, and the guess can be wrong, in which case
> there is no way around it and you lose.

I think it is OK for the higher level layer (like your
single-file-history follower) to use what you outlined with
"edges", as either a hint/request from the user and/or a cache
of what the expensive and unreliable thing figured out.

I however do not necessarily think adding the "edges"
information as an optional second item on "parent " line in a
commit is a good idea.  Rather, treating the set of "edges" just
like we treat the grafts feels more appropriate to me.  IOW,
create .git/info/edges and keep your edge information there.
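Junio does not specify a format for .git/info/edges. The sketch below assumes a grafts-like layout: one whitespace-separated line per edge, `<commit> <parent> <edge-object-id>`, with blank lines and `#` comments ignored. The format and the `read_edges` helper are hypothetical.

```python
def read_edges(text: str) -> dict[tuple[str, str], str]:
    """Parse a hypothetical .git/info/edges file into
    {(commit, parent): edge_object_id}."""
    edges = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        commit, parent, edge = fields
        edges[(commit, parent)] = edge
    return edges


sample = """# commit parent edge
c1 p1 e1
c2 p2 e2
"""
edges = read_edges(sample)
print(edges.get(("c1", "p1")))  # e1
```

The trade-off is that a file under .git/info, like grafts, is per-repository data that fetch and push do not transfer.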

* Re: Handling renames.
  2005-10-22  2:10         ` Junio C Hamano
@ 2005-10-22  2:49           ` Petr Baudis
  0 siblings, 0 replies; 28+ messages in thread
From: Petr Baudis @ 2005-10-22  2:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Dear diary, on Sat, Oct 22, 2005 at 04:10:59AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> I think it is OK for the higher level layer (like your
> single-file-history follower) to use what you outlined with
> "edges", as either a hint/request from the user and/or a cache
> of what the expensive and unreliable thing figured out.
> 
> I however do not necessarily think adding the "edges"
> information as an optional second item on "parent " line in a
> commit is a good idea.

Well, it's obviously a non-starter if it doesn't become a core part of
GIT. ;-) That won't stop me from doing it in Cogito, though...

> Rather, treating the set of "edges" just
> like we treat the grafts feels more appropriate to me.  IOW,
> create .git/info/edges and keep your edge information there.

The thing is, you aren't supposed to have a lot of grafts; they are
expected to be rare. OTOH, renames aren't really all that rare, and as
the history grows, you'll accumulate a lot of them. Besides, it's not
just a cache - if the user manually recorded the rename, it's important
historical information, as important as e.g. the commit message. It
should also be distributed with the rest of the history data, not be a
per-repository thing. (Actually, this does not mix with the cache at
all - that should be a separate thing, quite possibly in info/.)


But if GIT won't do this, I will have to do this in some Cogito-specific
way. My preliminary plan is appending something like this to the commit
message:

	---
	!%$rename foo\x20bar baz\nquux

with !%$ being some arbitrary magic literal; cg-log and friends would
filter these lines out. Also, all Cogito commits adding/removing files
but not renaming any would have a '!%$norenames' line in the commit
message, so that I know I don't have to run the expensive heuristics because no
rename/copy happened.
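A sketch of writing and reading such a trailer. The escaping is an assumption: the example line only shows \x20 for a space and \n for a newline, so this sketch also escapes the backslash as \\ to make the round trip unambiguous. `add_markers` and `read_markers` are hypothetical helpers, not Cogito code.

```python
MAGIC = "!%$"  # the arbitrary marker prefix from the proposal


def escape_path(path: str) -> str:
    # Assumed scheme: backslash first, then space and newline, so an
    # escaped path contains no separators and no raw newlines.
    return path.replace("\\", "\\\\").replace(" ", "\\x20").replace("\n", "\\n")


def unescape_path(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        if text.startswith("\\\\", i):
            out.append("\\")
            i += 2
        elif text.startswith("\\x20", i):
            out.append(" ")
            i += 4
        elif text.startswith("\\n", i):
            out.append("\n")
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def add_markers(message: str, renames: list[tuple[str, str]]) -> str:
    """Append the magic block. Call with an empty list only for commits that
    add or remove files without renaming any ('!%$norenames')."""
    lines = [f"{MAGIC}rename {escape_path(s)} {escape_path(d)}" for s, d in renames]
    if not lines:
        lines = [f"{MAGIC}norenames"]
    return message.rstrip("\n") + "\n\n---\n" + "\n".join(lines) + "\n"


def read_markers(message: str):
    """Return (text cg-log would show, list of renames, norenames flag)."""
    shown, renames, norenames = [], [], False
    for line in message.splitlines():
        if line.startswith(MAGIC + "rename "):
            src, dst = line[len(MAGIC) + len("rename "):].split(" ")
            renames.append((unescape_path(src), unescape_path(dst)))
        elif line == MAGIC + "norenames":
            norenames = True
        else:
            shown.append(line)
    if renames or norenames:
        # Drop the '---' separator and blank line left by add_markers.
        while shown and shown[-1] in ("---", ""):
            shown.pop()
    return "\n".join(shown), renames, norenames


msg = add_markers("Rename the pull script", [("foo bar", "baz\nquux")])
print(msg.splitlines()[-1])  # !%$rename foo\x20bar baz\nquux
print(read_markers(msg)[1])  # [('foo bar', 'baz\nquux')]
```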

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
VI has two modes: the one in which it beeps and the one in which
it doesn't.

* Re: Handling renames.
  2005-10-22  1:51       ` Petr Baudis
  2005-10-22  2:10         ` Junio C Hamano
@ 2005-10-22  3:23         ` Linus Torvalds
  1 sibling, 0 replies; 28+ messages in thread
From: Linus Torvalds @ 2005-10-22  3:23 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Git Mailing List



On Sat, 22 Oct 2005, Petr Baudis wrote:
> 
> > You can follow renames _afterwards_. 
> 
> I can - crudely, but what's the point if the user is dying to give me
> the information?

No the user is NOT.

The fact is, users have not a frigging clue when a rename happens. 

I told you before, I'll tell you again: if you depend on users telling you 
about renames, you'll get it wrong. You'll get it wrong quite often, in 
fact.

This is not something I'm going to discuss again. Go back to all the same 
arguments from 6 months ago. I was right then, I'm right now.

		Linus

end of thread, other threads:[~2005-10-22  3:23 UTC | newest]

Thread overview: 28+ messages
2005-04-14 17:54 Handling renames David Woodhouse
2005-04-14 18:11 ` Linus Torvalds
2005-04-14 19:09   ` David Woodhouse
2005-04-14 18:12 ` Ingo Molnar
2005-04-14 18:32   ` Linus Torvalds
2005-04-14 18:58     ` Ingo Molnar
2005-04-14 19:20       ` David Woodhouse
2005-04-14 19:21     ` David Mansfield
2005-04-14 18:21 ` H. Peter Anvin
2005-04-14 18:48   ` Linus Torvalds
2005-04-14 18:49     ` H. Peter Anvin
2005-04-14 19:22     ` Zach Welch
2005-04-14 19:40       ` Andrew Timberlake-Newell
2005-04-14 20:42         ` Naming the SCM (was Re: Handling renames.) Steven Cole
2005-04-14 20:53           ` Petr Baudis
2005-04-14 20:58             ` H. Peter Anvin
2005-04-14 21:01               ` Petr Baudis
2005-04-14 23:17           ` Peter Williams
2005-04-14 22:23 ` Handling renames Daniel Barkalow
2005-04-14 22:46   ` David Woodhouse
  -- strict thread matches above, loose matches on Subject: below --
2005-04-15 13:37 linux
2005-04-15 13:53 ` David Woodhouse
2005-10-21 23:40 git-rev-list: add "--dense" flag Linus Torvalds
2005-10-22  0:37 ` Petr Baudis
2005-10-22  0:47   ` Handling renames Petr Baudis
2005-10-22  1:28     ` Linus Torvalds
2005-10-22  1:51       ` Petr Baudis
2005-10-22  2:10         ` Junio C Hamano
2005-10-22  2:49           ` Petr Baudis
2005-10-22  3:23         ` Linus Torvalds

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
