git@vger.kernel.org mailing list mirror (one of many)
* git and symlinks as tracked content
@ 2005-05-03 18:33 Kay Sievers
  2005-05-03 19:02 ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Kay Sievers @ 2005-05-03 18:33 UTC
  To: git

Is there a sane model to make git aware of tracking symlinks in the
repository? In the bk udev tree we've had a test sysfs-tree with a lot
of symlinks in it.

Where can we store the link-target? In its own blob-object or directly
in the tree-object?

What would an exported "patch" with symlinks as content look like?

Thanks,
Kay



* Re: git and symlinks as tracked content
  2005-05-03 18:33 git and symlinks as tracked content Kay Sievers
@ 2005-05-03 19:02 ` Linus Torvalds
  2005-05-03 19:10   ` Morten Welinder
                     ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Linus Torvalds @ 2005-05-03 19:02 UTC
  To: Kay Sievers; +Cc: git



On Tue, 3 May 2005, Kay Sievers wrote:
>
> Is there a sane model to make git aware of tracking symlinks in the
> repository? In the bk udev tree we've had a test sysfs-tree with a lot
> of symlinks in it.
> 
> Where can we store the link-target? In its own blob-object or directly
> in the tree-object?

I'd suggest you create a blob object with the symlink name, and then in
the tree you point to that blob, but with the S_IFLNK value in the mode
field (0120000).

So you have

 - directories: S_IFDIR (0040000) point to "tree" objects for contents
 - symlinks: S_IFLNK (0120000) point to "blob" objects
 - executables: S_IFREG | 0755 (0100755) point to "blob" objects
 - regular files: S_IFREG | 0644 (0100644) point to "blob" objects

which seems very sane and regular. 
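
The mapping above can be sketched in C roughly as follows. This is only an illustration: the MODE_* names and both helper functions are made up for this note, and only the octal values come from the list above.

```c
#include <stddef.h>

/* Octal values as quoted in the list above; the MODE_* names are invented. */
#define MODE_IFMT  0170000u	/* mask that selects the file-type bits */
#define MODE_IFDIR 0040000u	/* directory */
#define MODE_IFREG 0100000u	/* regular file (0644 or 0755 below it) */
#define MODE_IFLNK 0120000u	/* symlink */

/* Hypothetical helper: which object type does a tree entry's mode imply? */
static const char *entry_object_type(unsigned int mode)
{
	switch (mode & MODE_IFMT) {
	case MODE_IFDIR:
		return "tree";	/* contents are listed in another tree object */
	case MODE_IFLNK:
		return "blob";	/* blob holds the link target */
	case MODE_IFREG:
		return "blob";	/* blob holds the file contents */
	default:
		return NULL;	/* device nodes etc. are not covered here */
	}
}

/* Hypothetical helper: symlinks differ from files only in the mode field. */
static int entry_is_symlink(unsigned int mode)
{
	return (mode & MODE_IFMT) == MODE_IFLNK;
}
```

With this, entry_object_type(0120000) and entry_object_type(0100755) both name a blob, and only the mode tells the checkout side whether to call symlink() or write a file.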

Now, I also have a plan for device nodes, but that one is so ugly that I'm 
a bit ashamed of it. That one does:

 - S_IFCHR/S_IFBLK (0020000 or 0060000), with the 20-byte SHA1 not being a 
   SHA1 at all, but just the major:minor numbers in some nice binary 
   encoding. Probably: two network byte order 32-bit values, with twelve 
   bytes of some non-zero signature (the SHA1 of all zeroes should be 
   avoided, so the signature really should be something other than just 
   twelve bytes of zero).

That should cover most of it.
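
As a rough sketch of that encoding: the field order (major, then minor, then signature), the signature bytes, and all function names below are assumptions made for illustration; the thread fixes none of them.

```c
#include <stdint.h>
#include <string.h>

#define DEV_ID_LEN 20

/* Made-up 12-byte non-zero signature; no actual value was proposed. */
static const unsigned char dev_signature[12] = {
	'G', 'I', 'T', '-', 'D', 'E', 'V', '-', 'N', 'O', 'D', 'E'
};

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *in)
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Fill the 20-byte "SHA1" slot with major:minor plus the signature. */
static void encode_dev_id(unsigned char id[DEV_ID_LEN], uint32_t maj, uint32_t min)
{
	put_be32(id, maj);
	put_be32(id + 4, min);
	memcpy(id + 8, dev_signature, sizeof(dev_signature));
}

/* Returns 0 and fills maj/min if the signature matches, -1 otherwise. */
static int decode_dev_id(const unsigned char id[DEV_ID_LEN], uint32_t *maj, uint32_t *min)
{
	if (memcmp(id + 8, dev_signature, sizeof(dev_signature)) != 0)
		return -1;
	*maj = get_be32(id);
	*min = get_be32(id + 4);
	return 0;
}
```

The signature check is what lets a reader tell such a slot apart from a real hash; it also shows why the idea is ugly, since the 20 bytes no longer name any object in the store.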

> What would an exported "patch" with symlinks as content look like?

The easiest way is to make this exactly the same as the "executable bit". 
A symlink is just a normal blob, it just has a "symlink mode" instead of 
"0755" or "0644" mode.

When you think of it that way, the "patch" ends up falling out very 
naturally, I think. It would look like

	New file: filename (Mode: 0120000)
	--- /dev/null
	+++ filename
	@@ 0,0 1,1
	+symlink-value

(or something, you get the idea).

		Linus


* Re: git and symlinks as tracked content
  2005-05-03 19:02 ` Linus Torvalds
@ 2005-05-03 19:10   ` Morten Welinder
  2005-05-03 19:50   ` H. Peter Anvin
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Morten Welinder @ 2005-05-03 19:10 UTC
  To: Linus Torvalds; +Cc: Kay Sievers, git

Something in the patching food chain will also need to know how to turn
regular files into symlinks (and vice versa), in the same way that we ought
to have that for directories right now.

Morten


* Re: git and symlinks as tracked content
  2005-05-03 19:02 ` Linus Torvalds
  2005-05-03 19:10   ` Morten Welinder
@ 2005-05-03 19:50   ` H. Peter Anvin
  2005-05-03 19:57   ` Andreas Gal
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2005-05-03 19:50 UTC
  To: Linus Torvalds; +Cc: Kay Sievers, git

Linus Torvalds wrote:
> 
> So you have
> 
>  - directories: S_IFDIR (0040000) point to "tree" objects for contents
>  - symlinks: S_IFLNK (0120000) point to "blob" objects
>  - executables: S_IFREG | 0755 (0100755) point to "blob" objects
>  - regular files: S_IFREG | 0644 (0100644) point to "blob" objects
> 
> which seems very sane and regular. 
> 

One thing about using a hierarchy of "tree" objects... as far as I 
understand today, it's possible for "git" to represent a limited 
scattering of files underneath the root, such as keeping one's 
configuration files underneath one's home directory.  Scanning the whole 
home directory to check in (or worse, out) files would suck.

On the other hand, having a single "tree" object for a large project 
that would have to be constantly updated would suck, too.

This is certainly *not* mutually exclusive; it's mostly a matter of 
making sure that, if scaffolding directory objects are necessary, 
they can be automatically added/created and aren't exhaustively 
searched for uncontrolled objects.

> Now, I also have a plan for device nodes, but that one is so ugly that I'm 
> a bit ashamed of it. That one does:
> 
>  - S_IFCHR/S_IFBLK (0020000 or 0060000), with the 20-byte SHA1 not being a 
>    SHA1 at all, but just the major:minor numbers in some nice binary 
>    encoding. Probably: two network byte order 32-bit values, with twelve 
>    bytes of some non-zero signature (the SHA1 of all zeroes should be 
>    avoided, so the signature really should be something other than just 
>    twelve bytes of zero).
> 

OK, that's ugly.  I'm impressed.  :)

	-hpa


* Re: git and symlinks as tracked content
  2005-05-03 19:02 ` Linus Torvalds
  2005-05-03 19:10   ` Morten Welinder
  2005-05-03 19:50   ` H. Peter Anvin
@ 2005-05-03 19:57   ` Andreas Gal
  2005-05-03 20:05     ` Linus Torvalds
  2005-05-03 20:23   ` Junio C Hamano
  2005-05-04 22:35   ` Kay Sievers
  4 siblings, 1 reply; 29+ messages in thread
From: Andreas Gal @ 2005-05-03 19:57 UTC
  To: Linus Torvalds; +Cc: Kay Sievers, git


>  - S_IFCHR/S_IFBLK (0020000 or 0060000), with the 20-byte SHA1 not being a 
>    SHA1 at all, but just the major:minor numbers in some nice binary 
>    encoding. Probably: two network byte order 32-bit values, with twelve 
>    bytes of some non-zero signature (the SHA1 of all zeroes should be 
>    avoided, so the signature really should be something other than just 
>    twelve bytes of zero).

Yuck. That's really ugly. Right now all files have a uniform touch to them. 
For every hash you can locate the file, determine its type/tag, unpack it, 
and check the SHA1 hash. The proposal above breaks all that. Why not just 
introduce a new object type "dev" and put major minor in there. It 
will still always hash to the same SHA1 hash value, but fits much better in the 
overall design. 

Andreas


* Re: git and symlinks as tracked content
  2005-05-03 19:57   ` Andreas Gal
@ 2005-05-03 20:05     ` Linus Torvalds
  2005-05-03 20:09       ` Kay Sievers
  2005-05-03 21:30       ` Junio C Hamano
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2005-05-03 20:05 UTC
  To: Andreas Gal; +Cc: Kay Sievers, git



On Tue, 3 May 2005, Andreas Gal wrote:
> 
> Yuck. That's really ugly. Right now all files have a uniform touch to them. 
> For every hash you can locate the file, determine its type/tag, unpack it, 
> and check the SHA1 hash. The proposal above breaks all that. Why not just 
> introduce a new object type "dev" and put major minor in there. It 
> will still always hash to the same SHA1 hash value, but fits much better in the 
> overall design. 

Hey, I don't personally care that much. I don't see anybody using 
character device nodes in the kernel tree, and I don't think most SCM's 
support stuff like that anyway ;)

If you want to make it a blob (and have a use for it), go wild. 

		Linus


* Re: git and symlinks as tracked content
  2005-05-03 20:05     ` Linus Torvalds
@ 2005-05-03 20:09       ` Kay Sievers
  2005-05-03 21:30       ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Kay Sievers @ 2005-05-03 20:09 UTC
  To: Linus Torvalds; +Cc: Andreas Gal, git

On Tue, 2005-05-03 at 13:05 -0700, Linus Torvalds wrote:
> 
> On Tue, 3 May 2005, Andreas Gal wrote:
> > 
> > Yuck. That's really ugly. Right now all files have a uniform touch to them. 
> > For every hash you can locate the file, determine its type/tag, unpack it, 
> > and check the SHA1 hash. The proposal above breaks all that. Why not just 
> > introduce a new object type "dev" and put major minor in there. It 
> > will still always hash to the same SHA1 hash value, but fits much better in the 
> > overall design. 
> 
> Hey, I don't personally care that much. I don't see anybody using 
> character device nodes in the kernel tree, and I don't think most SCM's 
> support stuff like that anyway ;)

Well, you need to be root to create device nodes; that is not a usual
requirement for an SCM checkout. :)

Kay



* Re: git and symlinks as tracked content
  2005-05-03 19:02 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2005-05-03 19:57   ` Andreas Gal
@ 2005-05-03 20:23   ` Junio C Hamano
  2005-05-04 22:35   ` Kay Sievers
  4 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2005-05-03 20:23 UTC
  To: Linus Torvalds; +Cc: Kay Sievers, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Tue, 3 May 2005, Kay Sievers wrote:
>> Where can we store the link-target? In its own blob-object or directly
>> in the tree-object?

LT>  - directories: S_IFDIR (0040000) point to "tree" objects for contents
LT>  - symlinks: S_IFLNK (0120000) point to "blob" objects
LT>  - executables: S_IFREG | 0755 (0100755) point to "blob" objects
LT>  - regular files: S_IFREG | 0644 (0100644) point to "blob" objects

These and the device nodes you mention would work naturally on
the cache side and I generally like this idea.  You need to
update checkout-cache to (attempt to) create the right kind of
file, but that is about it.

On the diff side, things are a bit more interesting.  Both the
git-diff-tree-helper engine (diff.c) and git-apply-patch-script
need to be told about these new types of objects, or at least
tightened up to ignore them until they know how to support them.

LT> When you think of it that way, the "patch" ends up falling out very 
LT> naturally, I think. It would look like

LT> 	New file: filename (Mode: 0120000)
LT> 	--- /dev/null
LT> 	+++ filename
LT> 	@@ 0,0 1,1
LT> 	+symlink-value

LT> (or something, you get the idea).

I've always wanted to have this from the normal "diff -r"
output, but we have to be careful.  You do not want to
accidentally feed the normal patch that kind of output.  How
about doing something like this?

GIT: filename (mode:120000)
 --- /dev/null
 +++ filename
 @@ -0,0 +1 @@
 +symlink-value

GIT: filename (mode:120000)
 --- filename
 +++ /dev/null
 @@ -1 +0,0 @@
 -symlink-value

GIT: filename (mode:120000->120000)
 --- filename
 +++ filename
 @@ -1 +1 @@
 -old-symlink-value
 +new-symlink-value

That is, to indent them to keep patch from noticing them [*1*].

About the device nodes, the diffed contents would be major and
minor in decimal notation, and the real filesystem permission
bits and ownerships (e.g. changing /dev/audio from 0600 to 0660
or from root:root to root:audio).  I do not know if we would
want owner/group in symbolic or numeric yet.

GIT: filename (mode:0020000->0020000)
 --- dev/audio
 +++ dev/audio
 @@ -1,5 +1,5 @@
  major=14
  minor=4
  owner=root
 -group=root
 -perm=0600
 +group=audio
 +perm=0660

[Footnote]

*1* A careful but not careful enough reader would wonder if the use
of "--- /dev/null" or "+++ /dev/null" to represent an addition
and a deletion may hamper managing the device node "/dev/null",
but this is not a problem.  Such a device node managed by GIT
will appear as "--- dev/null" or "+++ dev/null", without the
leading slash.



* Re: git and symlinks as tracked content
  2005-05-03 20:05     ` Linus Torvalds
  2005-05-03 20:09       ` Kay Sievers
@ 2005-05-03 21:30       ` Junio C Hamano
  2005-05-03 21:51         ` Andreas Gal
  2005-05-03 22:56         ` git and symlinks as tracked content H. Peter Anvin
  1 sibling, 2 replies; 29+ messages in thread
From: Junio C Hamano @ 2005-05-03 21:30 UTC
  To: Linus Torvalds; +Cc: Andreas Gal, Kay Sievers, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Tue, 3 May 2005, Andreas Gal wrote:

>> Yuck. That's really ugly. Right now all files have a uniform
>> touch to them.  For every hash you can locate the file,
>> determine its type/tag, unpack it, and check the SHA1
>> hash. The proposal above breaks all that. Why not just
>> introduce a new object type "dev" and put major minor in
>> there. It will still always hash to the same SHA1 hash value,
>> but fits much better in the overall design.

LT> Hey, I don't personally care that much. I don't see anybody using 
LT> character device nodes in the kernel tree, and I don't think most SCM's 
LT> support stuff like that anyway ;)

LT> If you want to make it a blob (and have a use for it), go wild. 

Introducing "dev" type, as Andreas suggests, is wrong.  This
should be done in the same way as you suggested for the
symlink case.  Store a blob object with those chrdev or blkdev
modes whose contents are of form:

    major=14
    minor=4
    owner=root
    group=audio
    perm=0660

This would impact the diff side least, and for the cache side it
does not matter in storing and merging.  checkout-cache still
needs to know about this.



* Re: git and symlinks as tracked content
  2005-05-03 21:30       ` Junio C Hamano
@ 2005-05-03 21:51         ` Andreas Gal
  2005-05-03 22:44           ` Junio C Hamano
  2005-05-03 22:56         ` git and symlinks as tracked content H. Peter Anvin
  1 sibling, 1 reply; 29+ messages in thread
From: Andreas Gal @ 2005-05-03 21:51 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Kay Sievers, git


Whether you use an explicit "dev" type or an implicit "dev" type that 
calls itself "blob" and uses a magic mode flag to tell checkout that it 
needs special treatment doesn't make a difference (whatever you 
prefer, really). I was only trying to make the point that hashes should remain 
hashes and not become a placeholder for minors/majors. However, as 
somebody already suggested, the entire issue is probably moot. When was the last 
time you tried to version control /dev? ;)

Andreas

On Tue, 3 May 2005, Junio C Hamano wrote:

> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> On Tue, 3 May 2005, Andreas Gal wrote:
> 
> >> Yuck. That's really ugly. Right now all files have a uniform
> >> touch to them.  For every hash you can locate the file,
> >> determine its type/tag, unpack it, and check the SHA1
> >> hash. The proposal above breaks all that. Why not just
> >> introduce a new object type "dev" and put major minor in
> >> there. It will still always hash to the same SHA1 hash value,
> >> but fits much better in the overall design.
> 
> LT> Hey, I don't personally care that much. I don't see anybody using 
> LT> character device nodes in the kernel tree, and I don't think most SCM's 
> LT> support stuff like that anyway ;)
> 
> LT> If you want to make it a blob (and have a use for it), go wild. 
> 
> Introducing "dev" type, as Andreas suggests, is wrong.  This
> should be done in the same way as you suggested for the
> symlink case.  Store a blob object with those chrdev or blkdev
> modes whose contents are of form:
> 
>     major=14
>     minor=4
>     owner=root
>     group=audio
>     perm=0660
> 
> This would impact the diff side least, and for the cache side it
> does not matter in storing and merging.  checkout-cache still
> needs to know about this.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* Re: git and symlinks as tracked content
  2005-05-03 21:51         ` Andreas Gal
@ 2005-05-03 22:44           ` Junio C Hamano
  2005-05-04  0:39             ` Sym-links, b/c-special files, pipes, ... Scope Creep Brian O'Mahoney
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2005-05-03 22:44 UTC
  To: Andreas Gal; +Cc: Linus Torvalds, Kay Sievers, git

>>>>> "AG" == Andreas Gal <gal@uci.edu> writes:

AG> Whether you use an explicit "dev" type or an implicit "dev"
AG> type that calls itself "blob" and uses a magic mode flag to
AG> tell checkout that it needs special treatment doesn't make a
AG> difference.

True.  The use of word "wrong" in my message was _wrong_.  But
my gut feeling is that the code that has to deal with the dev
and symlink stuff would be simpler if we just stick to the blob
type.

AG> When was the last time you tried to version control /dev? ;)

Tried?  Never.  Wished?  A number of times.  It's just that there
is no such SCM that does this natively, so I keep "ls -l /dev"
output under CVS control as a rough approximation.



* Re: git and symlinks as tracked content
  2005-05-03 21:30       ` Junio C Hamano
  2005-05-03 21:51         ` Andreas Gal
@ 2005-05-03 22:56         ` H. Peter Anvin
  2005-05-03 23:16           ` Junio C Hamano
  2005-05-04 15:48           ` David A. Wheeler
  1 sibling, 2 replies; 29+ messages in thread
From: H. Peter Anvin @ 2005-05-03 22:56 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Andreas Gal, Kay Sievers, git

Junio C Hamano wrote:
> 
> Introducing "dev" type, as Andreas suggests, is wrong.  This
> should be done in the same way as you suggested for the
> symlink case.  Store a blob object with those chrdev or blkdev
> modes whose contents are of form:
> 
>     major=14
>     minor=4
>     owner=root
>     group=audio
>     perm=0660
> 
> This would impact the diff side least, and for the cache side it
> does not matter in storing and merging.  checkout-cache still
> needs to know about this.
> 

Owner and permissions are part of the tree object, and apply to all file 
types.  The only thing equivalent to file data is the major,minor; 
storing it as a comma-separated decimal ASCII string is probably the 
cleanest, i.e. for your example:

14,4
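
Reading such a blob back could be sketched as below. The function name is hypothetical, and the strictness (exactly two decimal fields, nothing after them, not even a newline) is an assumption of this sketch rather than anything settled in the thread.

```c
#include <stdio.h>

/*
 * Hypothetical parser for a "major,minor" device blob such as "14,4".
 * Returns 0 on success and -1 if the text is not exactly two decimal
 * fields separated by a comma.
 */
static int parse_dev_blob(const char *text, unsigned int *maj, unsigned int *min)
{
	char trailing;

	/* A third conversion succeeding means there was trailing junk. */
	if (sscanf(text, "%u,%u%c", maj, min, &trailing) != 2)
		return -1;
	return 0;
}
```

A checkout tool would then hand the two numbers to mknod(), which is where the root requirement mentioned earlier in the thread comes in.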

	-hpa


* Re: git and symlinks as tracked content
  2005-05-03 22:56         ` git and symlinks as tracked content H. Peter Anvin
@ 2005-05-03 23:16           ` Junio C Hamano
  2005-05-03 23:18             ` H. Peter Anvin
  2005-05-04 15:48           ` David A. Wheeler
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2005-05-03 23:16 UTC
  To: H. Peter Anvin; +Cc: Linus Torvalds, Andreas Gal, Kay Sievers, git

>>>>> "HPA" == H Peter Anvin <hpa@zytor.com> writes:

HPA> Owner and permissions are part of the tree object, and apply to all
HPA> file types.

Huh?  I am confused...  Do you mean tree object should be
changed to record these?  That would make the existing in-cache
merging of files, which GIT was built for, quite interesting...

Well, doing device nodes _is_ a tangent, so let's drop this
discussion.




* Re: git and symlinks as tracked content
  2005-05-03 23:16           ` Junio C Hamano
@ 2005-05-03 23:18             ` H. Peter Anvin
  2005-05-03 23:42               ` Linus Torvalds
  2005-05-03 23:42               ` Junio C Hamano
  0 siblings, 2 replies; 29+ messages in thread
From: H. Peter Anvin @ 2005-05-03 23:18 UTC
  To: Junio C Hamano; +Cc: Linus Torvalds, Andreas Gal, Kay Sievers, git

Junio C Hamano wrote:
>>>>>>"HPA" == H Peter Anvin <hpa@zytor.com> writes:
> 
> 
> HPA> Owner and permissions are part of the tree object, and apply to all
> HPA> file types.
> 
> Huh?  I am confused...  Do you mean tree object should be
> changed to record these?  That would make the existing in-cache
> merging of files, which GIT was built for, quite interesting...
> 
> Well, doing device nodes _is_ a tangent, so let's drop this
> discussion.
> 

No, the tree object *ALREADY* records these.

BLOB: A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else.  There is no signature or any other verification
of the data, so while the object is consistent (it _is_ indexed by its
sha1 hash, so the data itself is certainly correct), it has absolutely
no other attributes.  No name associations, no permissions.  It is
purely a blob of data (ie normally "file contents").

TREE: The next hierarchical object type is the "tree" object.  A tree
object is a list of permission/name/blob data, sorted by name.  In other
words the tree object is uniquely determined by the set contents, and so
two separate but identical trees will always share the exact same
object.
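
To illustrate why sorting makes the tree object canonical, here is a minimal sketch. The struct layout and function names are invented for this note, and ordering by plain strcmp() is an assumption; git's real ordering rules may differ in detail.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one tree entry. */
struct tree_entry {
	unsigned int mode;		/* e.g. 0100644, 0100755, 0040000 */
	const char *name;		/* path component */
	unsigned char sha1[20];		/* object the entry points at */
};

/* qsort() comparator: order entries by name, byte by byte. */
static int cmp_entry_name(const void *a, const void *b)
{
	const struct tree_entry *ea = a;
	const struct tree_entry *eb = b;

	return strcmp(ea->name, eb->name);
}

/*
 * Put the entries into by-name order before serializing them, so the
 * same set of entries always yields the same bytes and thus the same hash.
 */
static void sort_tree_entries(struct tree_entry *entries, size_t n)
{
	qsort(entries, n, sizeof(*entries), cmp_entry_name);
}
```

Two callers that collect the same entries in different orders end up with identical arrays after sorting, which is the property the description above relies on.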

	-hpa


* Re: git and symlinks as tracked content
  2005-05-03 23:18             ` H. Peter Anvin
@ 2005-05-03 23:42               ` Linus Torvalds
  2005-05-03 23:42               ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2005-05-03 23:42 UTC
  To: H. Peter Anvin; +Cc: Junio C Hamano, Andreas Gal, Kay Sievers, git



On Tue, 3 May 2005, H. Peter Anvin wrote:
> 
> No, the tree object *ALREADY* records these.

Not ownership.

Yes, the permissions are there, but if you actually want to track 
ownership (or things like "mtime" etc), you really do have to track it 
outside the tree object.

Also, right now git will actually ignore most of the permission bits too.  
We can change that, and make it a dynamic setting somewhere (some flag in
a ".git/settings" file or something), but it does boil down to the fact
that a software development tree tracker wants different things than
something that tracks system settings.

For example, generating different trees just because different users had
different umask settings clearly didn't work out. Which means that right
now git really only tracks the "owner execute" bit of the permissions, and
always resets the other bits to 0755 or 0644 depending on that _one_ bit.
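
That rule can be written down in a couple of lines. The function name below is made up for this sketch, which is meant as an illustration of the rule as described, not as a copy of git's source.

```c
/*
 * Hypothetical helper: collapse a regular file's permission bits to
 * 0755 or 0644 based only on the owner-execute bit (0100), and tag the
 * result as a regular file (0100000, i.e. S_IFREG).
 */
static unsigned int normalize_file_mode(unsigned int mode)
{
	return 0100000u | ((mode & 0100u) ? 0755u : 0644u);
}
```

So a file checked in as 0664 under one umask and as 0644 under another both record 0100644, and the two trees stay identical.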

And similarly, tracking actual uid/gid information would _really_ not work 
for a distributed kernel source management system, so that's not even in 
the tree.

So if you want to track system files, right now "raw git" is _not_ the way 
to do it. You'd want something else. 

Of course, that's actually true largely even of normal /dev contents.  
That's why we've moved towards udev, and having things like device
permissions and ownership not be "filesystem attributes", but really
_rules_ in a udev database. So the fact that git doesn't track them isn't
necessarily a problem for /dev - since modern /dev really wants to track
them at a higher level _anyway_ (and you'd use git to track the _rules_,
not the ownership things themselves).

But if you want to track other system directories with git, you'd
probably need to either (a) do serious surgery on git itself, or (probably
preferable) (b) track the extra things you want "manually" using a file
(that is tracked in git) that describes the ownership and permission data.

Whether git is really suitable for tracking non-source projects is
obviously debatable. It's not what it was designed for, and it _may_ be 
able to do so partly just by luck.

			Linus


* Re: git and symlinks as tracked content
  2005-05-03 23:18             ` H. Peter Anvin
  2005-05-03 23:42               ` Linus Torvalds
@ 2005-05-03 23:42               ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2005-05-03 23:42 UTC
  To: H. Peter Anvin; +Cc: Linus Torvalds, Andreas Gal, Kay Sievers, git

>>>>> "HPA" == H Peter Anvin <hpa@zytor.com> writes:
HPA> Junio C Hamano wrote:

HPA> Owner and permissions are part of the tree object, and apply to
HPA> all file types.

>> Huh?  I am confused...  Do you mean tree object should be
>> changed to record these?  That would make the existing in-cache
>> merging of files, which GIT was built for, quite interesting...

HPA> No, the tree object *ALREADY* records these.

As you quoted (and before I uttered my previous confusion I did
look at the code in write-tree.c which I thought to match this
description) ...

HPA> TREE: The next hierarchical object type is the "tree" object.  A tree
HPA> object is a list of permission/name/blob data, sorted by name.  In other
HPA> words the tree object is uniquely determined by the set contents, and so
HPA> two separate but identical trees will always share the exact same
HPA> object.

... it records permission (but not in the 0660 vs 0600 sense ---
it just records executable bit for file blobs and the treeness
by recording S_IFDIR), name and SHA1.  There is no owner or
group information recorded there [*1*].

I am afraid I am missing something in my reading of write-tree.c.

Quite confused...

[Footnote]

*1* Nor should there be.  Otherwise comparing two identical
trees representing the same set of files becomes meaningless.

The reason why I placed this information in my hypothetical
representation of device nodes is exactly that.  To record owner
and group information is meaningless and harmful for the purpose
of version controlling the source files but it matters _if_ we
wanted to maintain device nodes in GIT.  Since it matters only
for those things, it would be preferable to have it as part of
the data that describes the object (i.e. device nodes), not part
of the data that contains the object (i.e. tree).  And I thought
GIT tree object is already doing the right thing by not
recording them.




* Sym-links, b/c-special files, pipes, ... Scope Creep
  2005-05-03 22:44           ` Junio C Hamano
@ 2005-05-04  0:39             ` Brian O'Mahoney
  0 siblings, 0 replies; 29+ messages in thread
From: Brian O'Mahoney @ 2005-05-04  0:39 UTC
  To: Junio C Hamano; +Cc: git

Caution, let us all carefully understand the Source-Code/
Configuration Management issue.

I for one will be very happy if we get a really good distributed
SCM out of this.

Brian


* Re: git and symlinks as tracked content
  2005-05-03 22:56         ` git and symlinks as tracked content H. Peter Anvin
  2005-05-03 23:16           ` Junio C Hamano
@ 2005-05-04 15:48           ` David A. Wheeler
  2005-05-04 23:03             ` Daniel Barkalow
  1 sibling, 1 reply; 29+ messages in thread
From: David A. Wheeler @ 2005-05-04 15:48 UTC
  To: H. Peter Anvin
  Cc: Junio C Hamano, Linus Torvalds, Andreas Gal, Kay Sievers, git

Linus Torvalds wrote:
 >Also, right now git will actually ignore most of the permission bits too.
 >We can change that, and make it a dynamic setting somewhere (some flag in
 >a ".git/settings" file or something), but it does boil down to the fact
 >that a software development tree tracker wants different things than
 >something that tracks system settings.
...
 >So if you want to track system files, right now "raw git" is _not_ the way
 >to do it. You'd want something else.
...
 >But if you want to track other system directories with git, you'd
 >probably need to either (a) do serious surgery on git itself, or (probably
 >preferable) (b) track the extra things you want "manually" using a file
 >(that is tracked in git) that describes the ownership and permission data.
 >
 >Whether git is really suitable for tracking non-source projects is
 >obviously debatable. It's not what it was designed for, and it _may_ be
 >able to do so partly just by luck.

I suspect there's a 95% point which is easily achieved, &
beyond that it's not clear it's worth it.

I recall seeing several source code directories that actually use symlinks
in their source, and thus would want them preserved by the SCM.
(Not arguing that's the BEST plan, merely an observation).
As this discussion has noted, it wouldn't be hard to add symlink
support to git, and it WOULD be helpful for its primary purpose as SCM support.

Once you're there, it wouldn't be hard to add logic to add options to
(1) record the REAL permission bits, (2) record "." files, and
(3) recover the permission bits.  That would be enough to
store & recover in a distributed way a single person's home directory.
THAT might be darn useful, for those of us who float between
different systems & would like to use a single system for multiple purposes.
That's clearly beyond the scope of a typical SCM, but since
it's easy to get there, that'd make sense.

I'm ambivalent about supporting dev, uid/gid, and mtime, and how
it should be done; that may be beyond the "worth it" step.

--- David A. Wheeler



* Re: git and symlinks as tracked content
  2005-05-03 19:02 ` Linus Torvalds
                     ` (3 preceding siblings ...)
  2005-05-03 20:23   ` Junio C Hamano
@ 2005-05-04 22:35   ` Kay Sievers
  2005-05-04 23:16     ` Junio C Hamano
  4 siblings, 1 reply; 29+ messages in thread
From: Kay Sievers @ 2005-05-04 22:35 UTC
  To: Linus Torvalds; +Cc: git

On Tue, May 03, 2005 at 12:02:33PM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 3 May 2005, Kay Sievers wrote:
> >
> > Is there a sane model to make git aware of tracking symlinks in the
> > repository? In the bk udev tree we've had a test sysfs-tree with a lot
> > of symlinks in it.
> > 
> > Where can we store the link-target? In its own blob-object or directly
> > in the tree-object?
> 
> I'd suggest you create a blob object with the symlink name, and then in
> the tree you point to that blob, but with the S_IFLNK value in the mode
> field (0120000).
> 
> So you have
> 
>  - directories: S_IFDIR (0040000) point to "tree" objects for contents
>  - symlinks: S_IFLNK (0120000) point to "blob" objects
>  - executables: S_IFREG | 0755 (0100755) point to "blob" objects
>  - regular files: S_IFREG | 0644 (0100644) point to "blob" objects
> 
> which seems very sane and regular. 

Here is a first attempt that is able to track symlinks. Please take a look;
if something like this is acceptable, I will finish it.

Thanks,
Kay

--- a/check-files.c
+++ b/check-files.c
@@ -28,8 +28,8 @@ static void check_file(const char *path)
 		die("preparing to update existing file '%s' not in cache", path);
 	ce = active_cache[pos];
 
-	if (fstat(fd, &st) < 0)
-		die("fstat(%s): %s", path, strerror(errno));
+	if (lstat(path, &st) < 0)
+		die("lstat(%s): %s", path, strerror(errno));
 
 	changed = cache_match_stat(ce, &st);
 	if (changed)
--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -72,23 +72,41 @@ static int write_entry(struct cache_entr
 	unsigned long size;
 	long wrote;
 	char type[20];
+	unsigned int mode;
 
 	new = read_sha1_file(ce->sha1, type, &size);
 	if (!new || strcmp(type, "blob")) {
 		return error("checkout-cache: unable to read sha1 file of %s (%s)",
 			path, sha1_to_hex(ce->sha1));
 	}
-	fd = create_file(path, ntohl(ce->ce_mode));
-	if (fd < 0) {
+	mode = ntohl(ce->ce_mode);
+	if (S_ISLNK(mode)) {
+		char target[1024];
+		if (size >= sizeof(target)) {
+			free(new);
+			return error("checkout-cache: link target too long for %s", path);
+		}
+		memcpy(target, new, size);
+		target[size] = '\0';
+		if (symlink(target, path)) {
+			free(new);
+			return error("checkout-cache: unable to create link %s (%s)",
+				path, strerror(errno));
+		}
+		free(new);
+	} else {
+		fd = create_file(path, mode);
+		if (fd < 0) {
+			free(new);
+			return error("checkout-cache: unable to create file %s (%s)",
+				path, strerror(errno));
+		}
+		wrote = write(fd, new, size);
+		close(fd);
 		free(new);
-		return error("checkout-cache: unable to create %s (%s)",
-			path, strerror(errno));
+		if (wrote != size)
+			return error("checkout-cache: unable to write %s", path);
 	}
-	wrote = write(fd, new, size);
-	close(fd);
-	free(new);
-	if (wrote != size)
-		return error("checkout-cache: unable to write %s", path);
 	return 0;
 }
 
@@ -101,7 +115,7 @@ static int checkout_entry(struct cache_e
 	memcpy(path, base_dir, len);
 	strcpy(path + len, ce->name);
 
-	if (!stat(path, &st)) {
+	if (!lstat(path, &st)) {
 		unsigned changed = cache_match_stat(ce, &st);
 		if (!changed)
 			return 0;
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -24,7 +24,7 @@ static int get_stat_data(struct cache_en
 		static unsigned char no_sha1[20];
 		int changed;
 		struct stat st;
-		if (stat(ce->name, &st) < 0)
+		if (lstat(ce->name, &st) < 0)
 			return -1;
 		changed = cache_match_stat(ce, &st);
 		if (changed) {
--- a/ls-files.c
+++ b/ls-files.c
@@ -199,7 +199,7 @@ static void show_files(void)
 			struct stat st;
 			if (excluded(ce->name) != show_ignored)
 				continue;
-			if (!stat(ce->name, &st))
+			if (!lstat(ce->name, &st))
 				continue;
 			printf("%s%c", ce->name, line_terminator);
 		}
--- a/tree.c
+++ b/tree.c
@@ -13,7 +13,10 @@ static int read_one_entry(unsigned char 
 
 	memset(ce, 0, size);
 
-	ce->ce_mode = create_ce_mode(mode);
+	if (mode & S_IFMT)
+		ce->ce_mode = htonl(mode);
+	else
+		ce->ce_mode = create_ce_mode(mode);
 	ce->ce_flags = create_ce_flags(baselen + len, stage);
 	memcpy(ce->name, base, baselen);
 	memcpy(ce->name + baselen, pathname, len+1);
--- a/update-cache.c
+++ b/update-cache.c
@@ -58,30 +58,37 @@ static int add_file_to_cache(char *path)
 	struct stat st;
 	int fd;
 
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
+	if (lstat(path, &st) < 0) {
 		if (errno == ENOENT || errno == ENOTDIR) {
 			if (allow_remove)
 				return remove_file_from_cache(path);
 		}
 		return -1;
 	}
-	if (fstat(fd, &st) < 0) {
-		close(fd);
-		return -1;
-	}
 	namelen = strlen(path);
 	size = cache_entry_size(namelen);
 	ce = xmalloc(size);
 	memset(ce, 0, size);
 	memcpy(ce->name, path, namelen);
 	fill_stat_cache_info(ce, &st);
-	ce->ce_mode = create_ce_mode(st.st_mode);
+	if (S_ISREG(st.st_mode)) {
+		fd = open(path, O_RDONLY);
+		if (fd < 0)
+			return -1;
+		ce->ce_mode = create_ce_mode(st.st_mode);
+		if (index_fd(ce->sha1, fd, &st) < 0)
+			return -1;
+	} else if (S_ISLNK(st.st_mode)) {
+		unsigned int len;
+		char target[1024];
+		ce->ce_mode = htonl(S_IFLNK);
+		len = readlink(path, target, sizeof(target));
+		if (len == -1 || len+1 > sizeof(target))
+			return -1;
+		if (write_sha1_file(target, len, "blob", ce->sha1))
+			return -1;
+	}
 	ce->ce_flags = htons(namelen);
-
-	if (index_fd(ce->sha1, fd, &st) < 0)
-		return -1;
-
 	return add_cache_entry(ce, allow_add);
 }
 
@@ -137,7 +144,7 @@ static struct cache_entry *refresh_entry
 	struct cache_entry *updated;
 	int changed, size;
 
-	if (stat(ce->name, &st) < 0)
+	if (lstat(ce->name, &st) < 0)
 		return ERR_PTR(-errno);
 
 	changed = cache_match_stat(ce, &st);
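
One detail worth illustrating from the write_entry() hunk above: the symlink branch copies the blob into a fixed `char target[1024]`, so a blob of 1024 bytes or more would run past the buffer. The following is a minimal sketch, not part of the patch, of sizing the buffer from the blob length instead. The helper name `make_symlink_from_blob` and its error conventions are assumptions made for illustration only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): create a symlink at `path`
 * whose target is the `size` bytes at `data`. Blob contents are not
 * NUL-terminated, so a terminated copy is made first.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int make_symlink_from_blob(const char *path, const void *data, size_t size)
{
	char *target;
	int ret, saved_errno;

	assert(path != NULL && data != NULL);

	/* Empty targets and embedded NULs cannot be valid link targets. */
	if (size == 0 || memchr(data, '\0', size)) {
		errno = EINVAL;
		return -1;
	}

	target = malloc(size + 1);	/* sized from the blob, not a fixed 1024 */
	if (!target)
		return -1;
	memcpy(target, data, size);
	target[size] = '\0';

	ret = symlink(target, path);
	saved_errno = errno;		/* keep symlink()'s errno across free() */
	free(target);
	errno = saved_errno;
	return ret;
}
```

A caller in the style of the patch would pass the buffer and size returned by read_sha1_file() and report strerror(errno) on failure. Whether very long targets should be rejected earlier is a separate policy question that this sketch leaves open.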



* Re: git and symlinks as tracked content
  2005-05-04 15:48           ` David A. Wheeler
@ 2005-05-04 23:03             ` Daniel Barkalow
  2005-05-05  6:09               ` Alan Chandler
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Barkalow @ 2005-05-04 23:03 UTC (permalink / raw
  To: David A. Wheeler
  Cc: H. Peter Anvin, Junio C Hamano, Linus Torvalds, Andreas Gal,
	Kay Sievers, git

On Wed, 4 May 2005, David A. Wheeler wrote:

> Once you're there, it wouldn't be hard to add logic to add options to
> (1) record the REAL permission bits, (2) record "." files, and
> (3) recover the permission bits.  That would be enough to
> store & recover in a distributed way a single person's home directory.
> THAT might be darn useful, for those of us who float between
> different systems & would like to use a single system for multiple purposes.
> That's clearly beyond the scope of a typical SCM, but since
> it's easy to get there, that'd make sense.

The status quo with respect to the permissions is actually the correct
thing for an SCM, because you want to generate the corresponding tree for
a different user (e.g., with the other user's umask applied, etc.), not
the same tree.

This is a situation in which doing 90% of one thing, and then supporting
90% of something else separately is best. What you really want is to have
a "directory" object type that stores the exact permissions, and the
uid/gid, and even xattr stuff. Then you use those for distributing your
home directory, but not for distributing source trees, where that stuff is
useless and somewhat wrong. You could probably have the same kind of
commit objects, although you still need some way of figuring out what kind
of object is desired for the directories in a commit.

(on the other hand, it might make sense for git to handle files starting
with '.', and only skip .git).

	-Daniel
*This .sig left intentionally blank*



* Re: git and symlinks as tracked content
  2005-05-04 22:35   ` Kay Sievers
@ 2005-05-04 23:16     ` Junio C Hamano
  2005-05-05  1:20       ` Kay Sievers
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2005-05-04 23:16 UTC (permalink / raw
  To: Kay Sievers; +Cc: Linus Torvalds, git

It seems to follow the original suggestion by Linus and looks
good.  Some comments:

 * It continues to assume that S_IFREG, S_IFDIR and S_IFLNK have
   the same bit pattern everywhere.  In the same spirit as we
   store mode bits in network byte order, it may be a good time
   to introduce something like this:

   -#define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
   -#define create_ce_mode(mode) htonl(S_IFREG | ce_permissions(mode))
   +#define CE_IFREG  0100000
   +#define CE_IFDIR  0040000
   +#define CE_IFLNK  0120000
   +#define CE_IFMASK 0770000
   +
   +#define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644) /* REG only */ 
   +#define create_ce_mode(mode) htonl(S_ISREG(mode) ?
   +				   (CE_IFREG | ce_permissions(mode)) :
   +				   S_ISLNK(mode) ?
   +				   CE_IFLNK :
   +				   0) /* what would we do for unknowns? */

 * read-cache.c:cache_match_stat() needs to know about the
   object type.  It was allowed to assume that anything thrown
   at it was a file, but not anymore.  How about something like
   this:

     int cache_match_stat(struct cache_entry 
     {
            unsigned int changed = 0;

    +       switch (ntohl(ce->ce_mode) & CE_IFMASK) {
    +       case CE_IFREG:
    +               changed |= !S_ISREG(st->st_mode) ? TYPE_CHANGED : 0;
    +               break;
    +       case CE_IFLNK:
    +               changed |= !S_ISLNK(st->st_mode) ? TYPE_CHANGED : 0;
    +               break;
    +       default:
    +               die("internal error: ce_mode is %o", ntohl(ce->ce_mode));
    +       }

    (in cache.h) 
     #define INODE_CHANGED   0x0010
     #define DATA_CHANGED    0x0020
    +#define TYPE_CHANGED    0x0040

  * update-cache.c:refresh_entry() needs to know that if the
    type of the path changed, it would never match:

            /*
    -        * If the mode has changed, there's no point in trying
    +        * If the mode or type has changed, there's no point in trying
             * to refresh the entry - it's not going to match
             */
    -       if (changed & MODE_CHANGED)
    +       if (changed & (MODE_CHANGED | TYPE_CHANGED))
                    return ERR_PTR(-EINVAL);

            if (compare_data(ce, st.st_size))

  * (this is just a minor nit).  Since you have st here,
    st.st_size can be used to see how big a buffer you need to
    prepare for readlink() here:

    +               unsigned int len;
    +               char target[1024];
    +               ce->ce_mode = htonl(S_IFLNK);
    +               len = readlink(path, target, sizeof(target));
    +               if (len == -1 || len+1 > sizeof(target))
    +                       return -1;

  * Probably diff.c needs to be made aware of this change.
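
The first three points above can be sketched together as a standalone fragment. This is only an illustration under stated assumptions: the `CE_*` constants and `TYPE_CHANGED` are taken from the suggestions above, while the helper names `canon_mode` and `type_changed` are hypothetical, and the sketch returns `TYPE_CHANGED` for unknown types where the suggestion above calls die().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>

/* Fixed type bits for stored modes, independent of the host's S_IF* values. */
#define CE_IFREG  0100000
#define CE_IFDIR  0040000
#define CE_IFLNK  0120000
#define CE_IFMASK 0770000

#define TYPE_CHANGED 0x0040

/*
 * Hypothetical helper: map a host st_mode to the canonical mode, in host
 * byte order (the caller would still apply htonl()). Returns 0 for file
 * types the cache does not track.
 */
static unsigned int canon_mode(mode_t mode)
{
	if (S_ISREG(mode))
		return CE_IFREG | ((mode & 0100) ? 0755 : 0644);
	if (S_ISLNK(mode))
		return CE_IFLNK;
	return 0;
}

/*
 * Hypothetical helper: report TYPE_CHANGED when the object on disk is no
 * longer the kind of object recorded in the canonical mode.
 */
static unsigned int type_changed(unsigned int ce_mode, mode_t st_mode)
{
	/* A canonical mode carries only type and permission bits. */
	assert((ce_mode & ~(CE_IFMASK | 0777u)) == 0);

	switch (ce_mode & CE_IFMASK) {
	case CE_IFREG:
		return S_ISREG(st_mode) ? 0 : TYPE_CHANGED;
	case CE_IFLNK:
		return S_ISLNK(st_mode) ? 0 : TYPE_CHANGED;
	default:
		return TYPE_CHANGED;	/* unknown type: never a match */
	}
}
```

With helpers like these, a refresh path only needs to test `changed & (MODE_CHANGED | TYPE_CHANGED)` before deciding that a content comparison is pointless.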



* Re: git and symlinks as tracked content
  2005-05-04 23:16     ` Junio C Hamano
@ 2005-05-05  1:20       ` Kay Sievers
  2005-05-05  2:13         ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Kay Sievers @ 2005-05-05  1:20 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Allow symlinks to be stored and tracked in the repository. A symlink is stored
the same way as a regular file, but with the appropriate mode bits set.
The symlink target is stored in its own blob object. This will hopefully
make our udev repository fully functional. :)

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
---

On Wed, May 04, 2005 at 04:16:22PM -0700, Junio C Hamano wrote:
> It seems to follow the original suggestion by Linus and looks
> good.  Some comments:
>
>  * It continues to assume that S_IFREG, S_IFDIR and S_IFLNK have
>    the same bit pattern everywhere.  In the same spirit as we
>    store mode bits in network byte order, it may be a good time
>    to introduce something like this:
...

>  * read-cache.c:cache_match_stat() needs to know about the
>    object type.  It was allowed to assume that anything thrown
>    at it was a file, but not anymore.  How about something like
>    this:

Both included and updated.

>   * (this is just a minor nit).  Since you have st here,
>     st.st_size can be used to see how big a buffer you need to
>     prepare for readlink() here:

Sounds nice, but is this reliable? I seem to remember some exotic filesystems
reporting bogus values here.
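
One way to sidestep the question is to treat st.st_size only as a starting guess and let the readlink() return value decide. A minimal sketch of that idea follows; the helper name `read_link_target` is hypothetical and not part of the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the target of the symlink at `path` into a
 * malloc'd, NUL-terminated buffer. `hint` is a starting size (for example
 * st.st_size from lstat()); it is not trusted, and the buffer grows until
 * readlink() returns fewer bytes than it was offered.
 * Returns NULL on error; on success stores the target length in *len_out.
 */
static char *read_link_target(const char *path, size_t hint, size_t *len_out)
{
	size_t cap = hint ? hint + 1 : 64;
	char *buf = NULL;

	assert(path != NULL && len_out != NULL);

	for (;;) {
		char *tmp = realloc(buf, cap);
		ssize_t n;

		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;

		n = readlink(path, buf, cap);
		if (n < 0) {
			free(buf);
			return NULL;
		}
		if ((size_t)n < cap) {	/* fits, with room for the terminator */
			buf[n] = '\0';
			*len_out = (size_t)n;
			return buf;
		}
		cap *= 2;		/* possibly truncated: retry with more room */
	}
}
```

If the hint is accurate the loop runs once; if a filesystem reports zero or a wrong size, the cost is a few extra readlink() calls rather than a truncated target.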

>   * Probably diff.c needs to be made aware of this change.

You already did this. :) Very nice.

Thanks,
Kay

--- a/cache.h
+++ b/cache.h
@@ -86,8 +86,19 @@ struct cache_entry {
 #define ce_size(ce) cache_entry_size(ce_namelen(ce))
 #define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
 
+#define CE_IFREG  0100000
+#define CE_IFDIR  0040000
+#define CE_IFLNK  0120000
+#define CE_IFMASK 0770000
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
-#define create_ce_mode(mode) htonl(S_IFREG | ce_permissions(mode))
+static inline unsigned int create_ce_mode(unsigned int mode)
+{
+	if (S_ISREG(mode))
+		return htonl(S_IFREG | ce_permissions(mode));
+	if (S_ISLNK(mode))
+		return htonl(CE_IFLNK);
+	return 0;
+}
 
 #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
@@ -124,6 +135,7 @@ extern int index_fd(unsigned char *sha1,
 #define MODE_CHANGED    0x0008
 #define INODE_CHANGED   0x0010
 #define DATA_CHANGED    0x0020
+#define TYPE_CHANGED    0x0040
 
 /* Return a statically allocated filename matching the sha1 signature */
 extern char *sha1_file_name(const unsigned char *sha1);
--- a/check-files.c
+++ b/check-files.c
@@ -28,8 +28,8 @@ static void check_file(const char *path)
 		die("preparing to update existing file '%s' not in cache", path);
 	ce = active_cache[pos];
 
-	if (fstat(fd, &st) < 0)
-		die("fstat(%s): %s", path, strerror(errno));
+	if (lstat(path, &st) < 0)
+		die("lstat(%s): %s", path, strerror(errno));
 
 	changed = cache_match_stat(ce, &st);
 	if (changed)
--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -72,23 +72,37 @@ static int write_entry(struct cache_entr
 	unsigned long size;
 	long wrote;
 	char type[20];
+	unsigned int mode;
 
 	new = read_sha1_file(ce->sha1, type, &size);
 	if (!new || strcmp(type, "blob")) {
 		return error("checkout-cache: unable to read sha1 file of %s (%s)",
 			path, sha1_to_hex(ce->sha1));
 	}
-	fd = create_file(path, ntohl(ce->ce_mode));
-	if (fd < 0) {
+	mode = ntohl(ce->ce_mode);
+	if (S_ISLNK(mode)) {
+		char target[1024];
+		memcpy(target, new, size);
+		target[size] = '\0';
+		if (symlink(target, path)) {
+			free(new);
+			return error("checkout-cache: unable to create link %s (%s)",
+				path, strerror(errno));
+		}
+		free(new);
+	} else {
+		fd = create_file(path, mode);
+		if (fd < 0) {
+			free(new);
+			return error("checkout-cache: unable to create file %s (%s)",
+				path, strerror(errno));
+		}
+		wrote = write(fd, new, size);
+		close(fd);
 		free(new);
-		return error("checkout-cache: unable to create %s (%s)",
-			path, strerror(errno));
+		if (wrote != size)
+			return error("checkout-cache: unable to write %s", path);
 	}
-	wrote = write(fd, new, size);
-	close(fd);
-	free(new);
-	if (wrote != size)
-		return error("checkout-cache: unable to write %s", path);
 	return 0;
 }
 
@@ -101,7 +115,7 @@ static int checkout_entry(struct cache_e
 	memcpy(path, base_dir, len);
 	strcpy(path + len, ce->name);
 
-	if (!stat(path, &st)) {
+	if (!lstat(path, &st)) {
 		unsigned changed = cache_match_stat(ce, &st);
 		if (!changed)
 			return 0;
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -24,7 +24,7 @@ static int get_stat_data(struct cache_en
 		static unsigned char no_sha1[20];
 		int changed;
 		struct stat st;
-		if (stat(ce->name, &st) < 0)
+		if (lstat(ce->name, &st) < 0)
 			return -1;
 		changed = cache_match_stat(ce, &st);
 		if (changed) {
--- a/diff.c
+++ b/diff.c
@@ -165,7 +165,7 @@ static void prepare_temp_file(const char
 		}
 		strcpy(temp->hex, sha1_to_hex(null_sha1));
 		sprintf(temp->mode, "%06o",
-			S_IFREG |ce_permissions(st.st_mode));
+			S_IFREG | ce_permissions(st.st_mode));
 	}
 	else {
 		int fd;
--- a/ls-files.c
+++ b/ls-files.c
@@ -199,7 +199,7 @@ static void show_files(void)
 			struct stat st;
 			if (excluded(ce->name) != show_ignored)
 				continue;
-			if (!stat(ce->name, &st))
+			if (!lstat(ce->name, &st))
 				continue;
 			printf("%s%c", ce->name, line_terminator);
 		}
--- a/read-cache.c
+++ b/read-cache.c
@@ -13,6 +13,16 @@ int cache_match_stat(struct cache_entry 
 {
 	unsigned int changed = 0;
 
+	switch (ntohl(ce->ce_mode) & CE_IFMASK) {
+	case CE_IFREG:
+		changed |= !S_ISREG(st->st_mode) ? TYPE_CHANGED : 0;
+		break;
+	case CE_IFLNK:
+		changed |= !S_ISLNK(st->st_mode) ? TYPE_CHANGED : 0;
+		break;
+	default:
+		die("internal error: ce_mode is %o", ntohl(ce->ce_mode));
+	}
 	if (ce->ce_mtime.sec != htonl(st->st_mtime))
 		changed |= MTIME_CHANGED;
 	if (ce->ce_ctime.sec != htonl(st->st_ctime))
--- a/tree.c
+++ b/tree.c
@@ -13,7 +13,10 @@ static int read_one_entry(unsigned char 
 
 	memset(ce, 0, size);
 
-	ce->ce_mode = create_ce_mode(mode);
+	if (mode & S_IFMT)
+		ce->ce_mode = htonl(mode);
+	else
+		ce->ce_mode = create_ce_mode(mode);
 	ce->ce_flags = create_ce_flags(baselen + len, stage);
 	memcpy(ce->name, base, baselen);
 	memcpy(ce->name + baselen, pathname, len+1);
--- a/update-cache.c
+++ b/update-cache.c
@@ -58,30 +58,37 @@ static int add_file_to_cache(char *path)
 	struct stat st;
 	int fd;
 
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
+	if (lstat(path, &st) < 0) {
 		if (errno == ENOENT || errno == ENOTDIR) {
 			if (allow_remove)
 				return remove_file_from_cache(path);
 		}
 		return -1;
 	}
-	if (fstat(fd, &st) < 0) {
-		close(fd);
-		return -1;
-	}
 	namelen = strlen(path);
 	size = cache_entry_size(namelen);
 	ce = xmalloc(size);
 	memset(ce, 0, size);
 	memcpy(ce->name, path, namelen);
 	fill_stat_cache_info(ce, &st);
-	ce->ce_mode = create_ce_mode(st.st_mode);
+	if (S_ISREG(st.st_mode)) {
+		fd = open(path, O_RDONLY);
+		if (fd < 0)
+			return -1;
+		ce->ce_mode = create_ce_mode(st.st_mode);
+		if (index_fd(ce->sha1, fd, &st) < 0)
+			return -1;
+	} else if (S_ISLNK(st.st_mode)) {
+		unsigned int len;
+		char target[1024];
+		ce->ce_mode = htonl(S_IFLNK);
+		len = readlink(path, target, sizeof(target));
+		if (len == -1 || len+1 > sizeof(target))
+			return -1;
+		if (write_sha1_file(target, len, "blob", ce->sha1))
+			return -1;
+	}
 	ce->ce_flags = htons(namelen);
-
-	if (index_fd(ce->sha1, fd, &st) < 0)
-		return -1;
-
 	return add_cache_entry(ce, allow_add);
 }
 
@@ -137,7 +144,7 @@ static struct cache_entry *refresh_entry
 	struct cache_entry *updated;
 	int changed, size;
 
-	if (stat(ce->name, &st) < 0)
+	if (lstat(ce->name, &st) < 0)
 		return ERR_PTR(-errno);
 
 	changed = cache_match_stat(ce, &st);
@@ -145,10 +152,10 @@ static struct cache_entry *refresh_entry
 		return ce;
 
 	/*
-	 * If the mode has changed, there's no point in trying
+	 * If the mode or type has changed, there's no point in trying
 	 * to refresh the entry - it's not going to match
 	 */
-	if (changed & MODE_CHANGED)
+	if (changed & (MODE_CHANGED | TYPE_CHANGED))
 		return ERR_PTR(-EINVAL);
 
 	if (compare_data(ce, st.st_size))



* Re: git and symlinks as tracked content
  2005-05-05  1:20       ` Kay Sievers
@ 2005-05-05  2:13         ` Junio C Hamano
  2005-05-05 12:38           ` Kay Sievers
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2005-05-05  2:13 UTC (permalink / raw
  To: Kay Sievers; +Cc: Linus Torvalds, git

>>>>> "KS" == Kay Sievers <kay.sievers@vrfy.org> writes:

>> * It continues to assume that S_IFREG, S_IFDIR and S_IFLNK have
>> the same bit pattern everywhere....

>> * read-cache.c:cache_match_stat() ...

KS> Both included and updated.

The second one, yes, but the first one is "not really".  If you
are going to do this:

KS> +#define CE_IFREG  0100000
KS> +#define CE_IFDIR  0040000
KS> ...
KS> +#define CE_IFMASK 0770000
 
then you need to touch these things:

KS> +	mode = ntohl(ce->ce_mode);
KS> +	if (S_ISLNK(mode)) {

Here mode encodes type in CE_ format, so S_ISLNK() is bad.

KS> @@ -165,7 +165,7 @@ static void prepare_temp_file(const char
KS>  		}
KS>  		strcpy(temp->hex, sha1_to_hex(null_sha1));
KS>  		sprintf(temp->mode, "%06o",
KS> -			S_IFREG |ce_permissions(st.st_mode));
KS> +			S_IFREG | ce_permissions(st.st_mode));
KS>  	}

Likewise here, although this is my bad.  I did not know if you
are going to take CE_ type suggestion so I left it as it was.

There are more.  "grep 'S_I[SF]' *.[ch] */*.[ch]" would tell us
most if not all.  We probably would want to have CE_ISLNK() and
friends, parallel to S_ISLNK() and friends if we go this route.
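
For illustration, such predicates might look like the sketch below. The `CE_IS*` macros follow the naming proposed above but their definitions are an assumption, and `ce_checkout_action` is a hypothetical helper showing how a checkout path could branch on them instead of on S_ISLNK().

```c
#include <assert.h>

/* Canonical type bits as proposed earlier in the thread. */
#define CE_IFREG  0100000
#define CE_IFDIR  0040000
#define CE_IFLNK  0120000
#define CE_IFMASK 0770000

/*
 * Hypothetical predicates on a canonical mode that has already been
 * converted to host byte order, e.g. ntohl(ce->ce_mode).
 */
#define CE_ISREG(m) (((m) & CE_IFMASK) == CE_IFREG)
#define CE_ISDIR(m) (((m) & CE_IFMASK) == CE_IFDIR)
#define CE_ISLNK(m) (((m) & CE_IFMASK) == CE_IFLNK)

enum ce_action { CE_WRITE_FILE, CE_MAKE_SYMLINK, CE_UNSUPPORTED };

/* Hypothetical helper: choose what checkout should do for a stored mode. */
static enum ce_action ce_checkout_action(unsigned int mode)
{
	/* A canonical mode carries only type and permission bits. */
	assert((mode & ~(CE_IFMASK | 0777u)) == 0);

	if (CE_ISREG(mode))
		return CE_WRITE_FILE;
	if (CE_ISLNK(mode))
		return CE_MAKE_SYMLINK;
	return CE_UNSUPPORTED;
}
```

The point of the separate macros is that the stored mode is never handed to the host's S_IS*() tests, so the result does not depend on how a given system numbers its file types.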

Does POSIX or something similar have anything nice to say, so that we do
not have to worry about this?  Or are the stat type bits really different
on different Unixen?  I used to do porting for a living across a
dozen or so different Unixen a long time ago and I should know the
answer to this kind of thing by heart, but I do not anymore X-<.



* Re: git and symlinks as tracked content
  2005-05-04 23:03             ` Daniel Barkalow
@ 2005-05-05  6:09               ` Alan Chandler
  2005-05-05  9:51                 ` read-only git repositories David Lang
  2005-05-05 21:23                 ` git and symlinks as tracked content Daniel Barkalow
  0 siblings, 2 replies; 29+ messages in thread
From: Alan Chandler @ 2005-05-05  6:09 UTC (permalink / raw
  To: git

On Thursday 05 May 2005 00:03, Daniel Barkalow wrote:

> (on the other hand, it might make sense for git to handle files starting
> with '.', and only skip .git).

definitely only as an option.  I envisage checking out (maybe anonymously) 
from svn or other repositories and then using git locally to manage my own 
development.  It would be preferable for the .git repository not to be 
"polluted" with the svn pristine trees etc.


-- 
Alan Chandler
http://www.chandlerfamily.org.uk


* read-only git repositories
  2005-05-05  6:09               ` Alan Chandler
@ 2005-05-05  9:51                 ` David Lang
  2005-05-05 12:39                   ` Sean
  2005-05-06  3:01                   ` read-only git repositories (ancient history) David A. Wheeler
  2005-05-05 21:23                 ` git and symlinks as tracked content Daniel Barkalow
  1 sibling, 2 replies; 29+ messages in thread
From: David Lang @ 2005-05-05  9:51 UTC (permalink / raw
  To: git

given that git already treats everything in the object storage as being 
fixed, it occurred to me that there may be value in making it so that git 
can make use of more than one pool of storage.

possible uses of this would be to have a bunch of data on read-only media 
(say the 3G+ kernel history on a DVD), having a pruned local object store 
with automated fetching from elsewhere if the object isn't found locally, 
or marking the object store that you plan on sharing with the world as 
read-only (with your changed objects going into a secondary store) so that 
you don't pollute it accidentally (this could also cut down on the storage 
requirements).

there are probably other uses and it seems like a fairly small 
modification to add a hook to use if the object isn't found initially that 
I thought I'd mention it to the group.
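
A rough sketch of what such a fallback lookup could look like, purely as an illustration of the idea: the helper name `find_object`, the ordered list of pool directories, and the "xx/rest-of-hex" file layout inside each pool are all assumptions made for this example, not a description of existing git code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: search an ordered list of object directories for
 * the object named by the 40-character hex string `hex`, assuming each
 * pool stores objects as "<pool>/<first 2 hex chars>/<remaining 38>".
 * On success, writes the path into `out` and returns the index of the
 * pool that has the object. Returns -1 if no pool has it, in which case
 * the contents of `out` are unspecified.
 */
static int find_object(const char *const *pools, int npools,
		       const char *hex, char *out, size_t outlen)
{
	int i;

	assert(hex != NULL && strlen(hex) == 40);

	for (i = 0; i < npools; i++) {
		struct stat st;
		int n = snprintf(out, outlen, "%s/%.2s/%s",
				 pools[i], hex, hex + 2);

		if (n < 0 || (size_t)n >= outlen)
			continue;	/* path does not fit: skip this pool */
		if (stat(out, &st) == 0 && S_ISREG(st.st_mode))
			return i;	/* earlier pools take precedence */
	}
	return -1;
}
```

Listing the writable local store first and a read-only pool (a DVD mount, a shared store) after it would give the behaviour described above: new objects go to the first pool, and lookups fall through to the others only when the object is missing locally.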

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare


* Re: git and symlinks as tracked content
  2005-05-05  2:13         ` Junio C Hamano
@ 2005-05-05 12:38           ` Kay Sievers
  0 siblings, 0 replies; 29+ messages in thread
From: Kay Sievers @ 2005-05-05 12:38 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Allow symlinks to be stored and tracked in the repository. A symlink is stored
the same way as a regular file, only with the appropriate mode bits set.
The symlink target is therefore stored in a blob object.
This will hopefully make our udev repository fully functional. :)

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
---

On Wed, May 04, 2005 at 07:13:43PM -0700, Junio C Hamano wrote:
> >>>>> "KS" == Kay Sievers <kay.sievers@vrfy.org> writes:

> I did not know if you are going to take CE_ type suggestion so 
> I left it as it was.
> 
> There are more.  "grep 'S_I[SF]' *.[ch] */*.[ch]" would tell us
> most if not all.  We probably would want to have CE_ISLNK() and
> friends, parallel to S_ISLNK() and friends if we go this route.

Hmm, how about this?

Thanks,
Kay

--- a/cache.h
+++ b/cache.h
@@ -87,7 +87,14 @@ struct cache_entry {
 #define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
 
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
-#define create_ce_mode(mode) htonl(S_IFREG | ce_permissions(mode))
+static inline unsigned int create_ce_mode(unsigned int mode)
+{
+	if (S_ISREG(mode))
+		return htonl(S_IFREG | ce_permissions(mode));
+	if (S_ISLNK(mode))
+		return htonl(S_IFLNK);
+	return htonl(mode);
+}
 
 #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
@@ -124,6 +131,7 @@ extern int index_fd(unsigned char *sha1,
 #define MODE_CHANGED    0x0008
 #define INODE_CHANGED   0x0010
 #define DATA_CHANGED    0x0020
+#define TYPE_CHANGED    0x0040
 
 /* Return a statically allocated filename matching the sha1 signature */
 extern char *sha1_file_name(const unsigned char *sha1);
--- a/check-files.c
+++ b/check-files.c
@@ -28,8 +28,8 @@ static void check_file(const char *path)
 		die("preparing to update existing file '%s' not in cache", path);
 	ce = active_cache[pos];
 
-	if (fstat(fd, &st) < 0)
-		die("fstat(%s): %s", path, strerror(errno));
+	if (lstat(path, &st) < 0)
+		die("lstat(%s): %s", path, strerror(errno));
 
 	changed = cache_match_stat(ce, &st);
 	if (changed)
--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -72,23 +72,41 @@ static int write_entry(struct cache_entr
 	unsigned long size;
 	long wrote;
 	char type[20];
+	char target[1024];
 
 	new = read_sha1_file(ce->sha1, type, &size);
 	if (!new || strcmp(type, "blob")) {
 		return error("checkout-cache: unable to read sha1 file of %s (%s)",
 			path, sha1_to_hex(ce->sha1));
 	}
-	fd = create_file(path, ntohl(ce->ce_mode));
-	if (fd < 0) {
+	switch (ntohl(ce->ce_mode) & S_IFMT) {
+	case S_IFREG:
+		fd = create_file(path, ntohl(ce->ce_mode));
+		if (fd < 0) {
+			free(new);
+			return error("checkout-cache: unable to create file %s (%s)",
+				path, strerror(errno));
+		}
+		wrote = write(fd, new, size);
+		close(fd);
+		free(new);
+		if (wrote != size)
+			return error("checkout-cache: unable to write file %s", path);
+		break;
+	case S_IFLNK:
+		memcpy(target, new, size);
+		target[size] = '\0';
+		if (symlink(target, path)) {
+			free(new);
+			return error("checkout-cache: unable to create symlink %s (%s)",
+				path, strerror(errno));
+		}
+		free(new);
+		break;
+	default:
 		free(new);
-		return error("checkout-cache: unable to create %s (%s)",
-			path, strerror(errno));
+		return error("checkout-cache: unknown file mode for %s", path);
 	}
-	wrote = write(fd, new, size);
-	close(fd);
-	free(new);
-	if (wrote != size)
-		return error("checkout-cache: unable to write %s", path);
 	return 0;
 }
 
@@ -101,7 +119,7 @@ static int checkout_entry(struct cache_e
 	memcpy(path, base_dir, len);
 	strcpy(path + len, ce->name);
 
-	if (!stat(path, &st)) {
+	if (!lstat(path, &st)) {
 		unsigned changed = cache_match_stat(ce, &st);
 		if (!changed)
 			return 0;
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -24,7 +24,7 @@ static int get_stat_data(struct cache_en
 		static unsigned char no_sha1[20];
 		int changed;
 		struct stat st;
-		if (stat(ce->name, &st) < 0)
+		if (lstat(ce->name, &st) < 0)
 			return -1;
 		changed = cache_match_stat(ce, &st);
 		if (changed) {
--- a/ls-files.c
+++ b/ls-files.c
@@ -199,7 +199,7 @@ static void show_files(void)
 			struct stat st;
 			if (excluded(ce->name) != show_ignored)
 				continue;
-			if (!stat(ce->name, &st))
+			if (!lstat(ce->name, &st))
 				continue;
 			printf("%s%c", ce->name, line_terminator);
 		}
--- a/read-cache.c
+++ b/read-cache.c
@@ -13,6 +13,16 @@ int cache_match_stat(struct cache_entry 
 {
 	unsigned int changed = 0;
 
+	switch (ntohl(ce->ce_mode) & S_IFMT) {
+	case S_IFREG:
+		changed |= !S_ISREG(st->st_mode) ? TYPE_CHANGED : 0;
+		break;
+	case S_IFLNK:
+		changed |= !S_ISLNK(st->st_mode) ? TYPE_CHANGED : 0;
+		break;
+	default:
+		die("internal error: ce_mode is %o", ntohl(ce->ce_mode));
+	}
 	if (ce->ce_mtime.sec != htonl(st->st_mtime))
 		changed |= MTIME_CHANGED;
 	if (ce->ce_ctime.sec != htonl(st->st_ctime))
--- a/update-cache.c
+++ b/update-cache.c
@@ -57,19 +57,16 @@ static int add_file_to_cache(char *path)
 	struct cache_entry *ce;
 	struct stat st;
 	int fd;
+	unsigned int len;
+	char target[1024];
 
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
+	if (lstat(path, &st) < 0) {
 		if (errno == ENOENT || errno == ENOTDIR) {
 			if (allow_remove)
 				return remove_file_from_cache(path);
 		}
 		return -1;
 	}
-	if (fstat(fd, &st) < 0) {
-		close(fd);
-		return -1;
-	}
 	namelen = strlen(path);
 	size = cache_entry_size(namelen);
 	ce = xmalloc(size);
@@ -78,10 +75,24 @@ static int add_file_to_cache(char *path)
 	fill_stat_cache_info(ce, &st);
 	ce->ce_mode = create_ce_mode(st.st_mode);
 	ce->ce_flags = htons(namelen);
-
-	if (index_fd(ce->sha1, fd, &st) < 0)
+	switch (st.st_mode & S_IFMT) {
+	case S_IFREG:
+		fd = open(path, O_RDONLY);
+		if (fd < 0)
+			return -1;
+		if (index_fd(ce->sha1, fd, &st) < 0)
+			return -1;
+		break;
+	case S_IFLNK:
+		len = readlink(path, target, sizeof(target));
+		if (len == -1 || len+1 > sizeof(target))
+			return -1;
+		if (write_sha1_file(target, len, "blob", ce->sha1))
+			return -1;
+		break;
+	default:
 		return -1;
-
+	}
 	return add_cache_entry(ce, allow_add);
 }
 
@@ -137,7 +148,7 @@ static struct cache_entry *refresh_entry
 	struct cache_entry *updated;
 	int changed, size;
 
-	if (stat(ce->name, &st) < 0)
+	if (lstat(ce->name, &st) < 0)
 		return ERR_PTR(-errno);
 
 	changed = cache_match_stat(ce, &st);
@@ -145,10 +156,10 @@ static struct cache_entry *refresh_entry
 		return ce;
 
 	/*
-	 * If the mode has changed, there's no point in trying
+	 * If the mode or type has changed, there's no point in trying
 	 * to refresh the entry - it's not going to match
 	 */
-	if (changed & MODE_CHANGED)
+	if (changed & (MODE_CHANGED | TYPE_CHANGED))
 		return ERR_PTR(-EINVAL);
 
 	if (compare_data(ce, st.st_size))



* Re: read-only git repositories
  2005-05-05  9:51                 ` read-only git repositories David Lang
@ 2005-05-05 12:39                   ` Sean
  2005-05-06  3:01                   ` read-only git repositories (ancient history) David A. Wheeler
  1 sibling, 0 replies; 29+ messages in thread
From: Sean @ 2005-05-05 12:39 UTC (permalink / raw
  To: David Lang; +Cc: git

On Thu, May 5, 2005 5:51 am, David Lang said:

> there are probably other uses and it seems like a fairly small
> modification to add a hook to use if the object isn't found initially
> that I thought I'd mention it to the group.
>
David,

Great idea!  This seems like an option that naturally falls out of the git
design.  You're right that there are lots of uses for it too; another
would be to keep all local changes in an isolated object store for backup
etc.

Sean




* Re: git and symlinks as tracked content
  2005-05-05  6:09               ` Alan Chandler
  2005-05-05  9:51                 ` read-only git repositories David Lang
@ 2005-05-05 21:23                 ` Daniel Barkalow
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Barkalow @ 2005-05-05 21:23 UTC (permalink / raw
  To: Alan Chandler; +Cc: git

On Thu, 5 May 2005, Alan Chandler wrote:

> On Thursday 05 May 2005 00:03, Daniel Barkalow wrote:
> 
> > (on the other hand, it might make sense for git to handle files starting
> > with '.', and only skip .git).
> 
> definitely only as an option.  I envisage checking out (maybe anonymously) 
> from svn or other repositories and then using git locally to manage my own 
> development.  It would be preferable for the .git repository not to be 
> "polluted" with the svn pristine trees etc.

It wouldn't touch them at all unless you specifically added them. The
present situation is that git ignores files starting with "." even if you
specifically add them.

	-Daniel
*This .sig left intentionally blank*



* Re: read-only git repositories (ancient history)
  2005-05-05  9:51                 ` read-only git repositories David Lang
  2005-05-05 12:39                   ` Sean
@ 2005-05-06  3:01                   ` David A. Wheeler
  1 sibling, 0 replies; 29+ messages in thread
From: David A. Wheeler @ 2005-05-06  3:01 UTC (permalink / raw
  To: David Lang; +Cc: git

David Lang wrote:
> given that git already treats everything in the object storage as being 
> fixed, it occurred to me that there may be value in making it so that git 
> can make use of more than one pool of storage
...
> there are probably other uses and it seems like a fairly small 
> modification to add a hook to use if the object isn't found initially 
> that I thought I'd mention it to the group.

Reasonable.  Another use would be to have a repository with
"ancient history" (e.g., Linux pre-2.6) that isn't normally
loaded or looked at, but COULD be looked at if you added
that repository.  For that use, though, you'd need a way to
record "the parent of X is Y" since the information creating
connections BETWEEN the repositories might not be stored in
the later repository itself (see the discussions about Linux kernel
history recreation).

--- David A. Wheeler



Thread overview: 29+ messages
2005-05-03 18:33 git and symlinks as tracked content Kay Sievers
2005-05-03 19:02 ` Linus Torvalds
2005-05-03 19:10   ` Morten Welinder
2005-05-03 19:50   ` H. Peter Anvin
2005-05-03 19:57   ` Andreas Gal
2005-05-03 20:05     ` Linus Torvalds
2005-05-03 20:09       ` Kay Sievers
2005-05-03 21:30       ` Junio C Hamano
2005-05-03 21:51         ` Andreas Gal
2005-05-03 22:44           ` Junio C Hamano
2005-05-04  0:39             ` Sym-links, b/c-special files, pipes, ... Scope Creep Brian O'Mahoney
2005-05-03 22:56         ` git and symlinks as tracked content H. Peter Anvin
2005-05-03 23:16           ` Junio C Hamano
2005-05-03 23:18             ` H. Peter Anvin
2005-05-03 23:42               ` Linus Torvalds
2005-05-03 23:42               ` Junio C Hamano
2005-05-04 15:48           ` David A. Wheeler
2005-05-04 23:03             ` Daniel Barkalow
2005-05-05  6:09               ` Alan Chandler
2005-05-05  9:51                 ` read-only git repositories David Lang
2005-05-05 12:39                   ` Sean
2005-05-06  3:01                   ` read-only git repositories (ancient history) David A. Wheeler
2005-05-05 21:23                 ` git and symlinks as tracked content Daniel Barkalow
2005-05-03 20:23   ` Junio C Hamano
2005-05-04 22:35   ` Kay Sievers
2005-05-04 23:16     ` Junio C Hamano
2005-05-05  1:20       ` Kay Sievers
2005-05-05  2:13         ` Junio C Hamano
2005-05-05 12:38           ` Kay Sievers
