git@vger.kernel.org mailing list mirror (one of many)
* Using GIT to store /etc (Or: How to make GIT store all file permission bits)
@ 2006-12-10 13:40 Kyle Moffett
  2006-12-10 14:49 ` Jeff Garzik
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Kyle Moffett @ 2006-12-10 13:40 UTC (permalink / raw)
  To: git

I've recently become somewhat interested in the idea of using GIT to  
store the contents of various folders in /etc.  However after a bit  
of playing with this, I discovered that GIT doesn't actually preserve  
all permission bits since that would cause problems with the more  
traditional software development model.  I'm curious if anyone has  
done this before; and if so, how they went about handling the  
permissions and ownership issues.

I spent a little time looking over how GIT stores and compares  
permission bits; trying to figure out if it's possible to patch in a  
new configuration variable or two; say "preserve_all_perms" and  
"preserve_owner", or maybe even "save_acls".  It looks like standard  
permission preservation is fairly basic; you would just need to patch  
a few routines which alter the permissions read in from disk or  
compare them with ones from the database.  On the other hand, it  
would appear that preserving ownership or full POSIX ACLs might be a  
bit of a challenge.

Thanks for your insight and advice!

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett
@ 2006-12-10 14:49 ` Jeff Garzik
  2006-12-10 15:30   ` Jakub Narebski
  2006-12-10 15:06 ` Santi Béjar
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Jeff Garzik @ 2006-12-10 14:49 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: git

Kyle Moffett wrote:
> I've recently become somewhat interested in the idea of using GIT to 
> store the contents of various folders in /etc.  However after a bit of 
> playing with this, I discovered that GIT doesn't actually preserve all 
> permission bits since that would cause problems with the more 
> traditional software development model.  I'm curious if anyone has done 
> this before; and if so, how they went about handling the permissions and 
> ownership issues.
> 
> I spent a little time looking over how GIT stores and compares 
> permission bits; trying to figure out if it's possible to patch in a new 
> configuration variable or two; say "preserve_all_perms" and 
> "preserve_owner", or maybe even "save_acls".  It looks like standard 
> permission preservation is fairly basic; you would just need to patch a 
> few routines which alter the permissions read in from disk or compare 
> them with ones from the database.  On the other hand, it would appear 
> that preserving ownership or full POSIX ACLs might be a bit of a challenge.

It's a great idea, something I would like to do, and something I've 
suggested before.  You could dig through the mailing list archives, if 
you're motivated.

I actively use git to version, store and distribute an exim mail 
configuration across six servers.  So far my solution has been a 'fix 
perms' script, or using the file perm checking capabilities of cfengine.
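
A 'fix perms' script of the kind Jeff describes might look roughly like the Python sketch below. It is not Jeff's actual script: the manifest format (`<octal-mode> <owner> <group> <path>`, one entry per line) and the function names are hypothetical. It assumes a POSIX system and a manifest that lists only regular files and directories.

```python
import grp
import os
import pwd


def parse_manifest_line(line):
    """Parse one hypothetical manifest line: '<octal-mode> <owner> <group> <path>'."""
    mode_s, owner, group, path = line.rstrip("\n").split(" ", 3)
    return int(mode_s, 8), owner, group, path


def fix_perms(manifest_lines, root="/etc", dry_run=False):
    """Apply the recorded owner, group and mode to each listed path under root.

    Returns a list of (path, mode, uid, gid) actions so the caller can log them.
    """
    actions = []
    for line in manifest_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        mode, owner, group, rel = parse_manifest_line(line)
        uid = pwd.getpwnam(owner).pw_uid
        gid = grp.getgrnam(group).gr_gid
        path = os.path.join(root, rel)
        actions.append((path, mode, uid, gid))
        if not dry_run:
            # chown first: changing the owner can clear setuid/setgid bits,
            # so the chmod afterwards puts them back.
            os.chown(path, uid, gid)
            os.chmod(path, mode)
    return actions
```

Running it with `dry_run=True` after a checkout shows what would change without touching the files.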

But it would be a lot better if git natively cared about ownership and 
permissions (presumably via an option).

	Jeff




* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett
  2006-12-10 14:49 ` Jeff Garzik
@ 2006-12-10 15:06 ` Santi Béjar
  2006-12-10 17:46   ` Kyle Moffett
  2007-01-10  1:39   ` David Lang
  2006-12-11 10:50 ` Nikolai Weibull
  2006-12-12  3:45 ` Daniel Barkalow
  3 siblings, 2 replies; 34+ messages in thread
From: Santi Béjar @ 2006-12-10 15:06 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: git

On 12/10/06, Kyle Moffett <mrmacman_g4@mac.com> wrote:
> I've recently become somewhat interested in the idea of using GIT to
> store the contents of various folders in /etc.  However after a bit
> of playing with this, I discovered that GIT doesn't actually preserve
> all permission bits since that would cause problems with the more
> traditional software development model.  I'm curious if anyone has
> done this before; and if so, how they went about handling the
> permissions and ownership issues.
>
> I spent a little time looking over how GIT stores and compares
> permission bits; trying to figure out if it's possible to patch in a
> new configuration variable or two; say "preserve_all_perms" and
> "preserve_owner", or maybe even "save_acls".  It looks like standard
> permission preservation is fairly basic; you would just need to patch
> a few routines which alter the permissions read in from disk or
> compare them with ones from the database.  On the other hand, it
> would appear that preserving ownership or full POSIX ACLs might be a
> bit of a challenge.
>
> Thanks for your insight and advice!

I have not used it, but you could try:

http://www.isisetup.ch/

that uses git as a backend.



* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 14:49 ` Jeff Garzik
@ 2006-12-10 15:30   ` Jakub Narebski
  2006-12-10 18:10     ` Kyle Moffett
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2006-12-10 15:30 UTC (permalink / raw)
  To: git

Jeff Garzik wrote:

> Kyle Moffett wrote:
>>
>> I've recently become somewhat interested in the idea of using GIT to 
>> store the contents of various folders in /etc.  However after a bit of 
>> playing with this, I discovered that GIT doesn't actually preserve all 
>> permission bits since that would cause problems with the more 
>> traditional software development model.  I'm curious if anyone has done 
>> this before; and if so, how they went about handling the permissions and 
>> ownership issues.
>> 
>> I spent a little time looking over how GIT stores and compares 
>> permission bits; trying to figure out if it's possible to patch in a new 
>> configuration variable or two; say "preserve_all_perms" and 
>> "preserve_owner", or maybe even "save_acls".  It looks like standard 
>> permission preservation is fairly basic; you would just need to patch a 
>> few routines which alter the permissions read in from disk or compare 
>> them with ones from the database.  On the other hand, it would appear 
>> that preserving ownership or full POSIX ACLs might be a bit of a challenge.
> 
> It's a great idea, something I would like to do, and something I've 
> suggested before.  You could dig through the mailing list archives, if 
> you're motivated.
> 
> I actively use git to version, store and distribute an exim mail 
> configuration across six servers.  So far my solution has been a 'fix 
> perms' script, or using the file perm checking capabilities of cfengine.

A 'fix perms' script run from a checkout hook is the best idea, I think.
 
> But it would be a lot better if git natively cared about ownership and 
> permissions (presumably via an option).

There is currently no place for ownership and extended attributes in
the tree object; and even full POSIX permissions might be a challenge,
because, for example, the 'is socket' mode bit (unused for ordinary
files) is now used for experimental commit-in-tree submodule support.
And given Linus's stance that git is a "content tracker"...

In the loooong thread "VCS comparison table" there was some talk
about using git (or any SCM) to manage /etc. Check out:

 * Message-ID: <Pine.LNX.4.64.0610220926170.3962@g5.osdl.org>
   http://permalink.gmane.org/gmane.comp.version-control.git/29765
 * Message-ID: <20061023051932.GA8625@evofed.localdomain>
   http://marc.theaimsgroup.com/?i=<20061023051932.GA8625@evofed.localdomain>

(and other messages in this subthread).
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 15:06 ` Santi Béjar
@ 2006-12-10 17:46   ` Kyle Moffett
  2006-12-10 18:10     ` Jakub Narebski
  2007-01-10  1:39   ` David Lang
  1 sibling, 1 reply; 34+ messages in thread
From: Kyle Moffett @ 2006-12-10 17:46 UTC (permalink / raw)
  To: Santi Béjar, Jeff Garzik; +Cc: git

> On 12/10/06, Kyle Moffett <mrmacman_g4@mac.com> wrote:
>> I've recently become somewhat interested in the idea of using GIT  
>> to store the contents of various folders in /etc.  However after a  
>> bit of playing with this, I discovered that GIT doesn't actually  
>> preserve all permission bits since that would cause problems with  
>> the more traditional software development model.  I'm curious if  
>> anyone has done this before; and if so, how they went about  
>> handling the permissions and ownership issues.
>>
>> I spent a little time looking over how GIT stores and compares  
>> permission bits; trying to figure out if it's possible to patch in  
>> a new configuration variable or two; say "preserve_all_perms" and  
>> "preserve_owner", or maybe even "save_acls".  It looks like  
>> standard permission preservation is fairly basic; you would just  
>> need to patch a few routines which alter the permissions read in  
>> from disk or compare them with ones from the database.  On the  
>> other hand, it would appear that preserving ownership or full  
>> POSIX ACLs might be a bit of a challenge.

On Dec 10, 2006, at 10:06:14, Santi Béjar wrote:
> I have not used it, but you could try:
>
> http://www.isisetup.ch/
>
> that uses git as a backend.

Wow, umm, that's actually really interesting for me, given that I'm  
most interested in these sorts of things on Debian.  I can't find  
much documentation on their site; the tools look vaguely immature but  
I haven't really had much time to look at it yet.

On Dec 10, 2006, at 09:49:50, Jeff Garzik wrote:
> It's a great idea, something I would like to do, and something I've  
> suggested before.  You could dig through the mailing list archives,  
> if you're motivated.

I have been digging through the archives; I was just holding out hope  
that somebody else on the list had already halfway beat me to the  
punch.  Guess not :-D

> I actively use git to version, store and distribute an exim mail  
> configuration across six servers.  So far my solution has been a  
> 'fix perms' script, or using the file perm checking capabilities of  
> cfengine.
>
> But it would be a lot better if git natively cared about ownership  
> and permissions (presumably via an option).

I was thinking about a standard config option in the GIT config file,
so that users could have a personal default and repositories could
specify it locally.

I started tinkering but quickly discovered that permissions handling  
in general in GIT seems to be a mess; there's about 4 different tiers  
where permissions data is manipulated in various formats.  Some  
places use network-endian 16-bit values, there's a couple functions  
which do different truncations to 644 or 755 format.  There are 2  
functions which canonicalize the file mode based on symlink or  
directory status, each in subtly different ways.
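
For reference, the collapse being described can be modelled in a few lines. This is a Python re-creation, not git's C code, and assumes the behaviour of git's own canonicalization: a regular file keeps only the owner-execute test (recorded as 100755 or 100644), while symlinks and directories get fixed modes. It shows why /etc/shadow at 0600 is recorded as plain 100644. The 0160000 value is the one git later settled on for submodule entries.

```python
import stat

# Mode values git writes into tree entries (octal).
GIT_MODE_TREE = 0o040000      # directory
GIT_MODE_BLOB = 0o100644      # regular file
GIT_MODE_EXEC = 0o100755      # executable regular file
GIT_MODE_SYMLINK = 0o120000   # symbolic link
GIT_MODE_GITLINK = 0o160000   # submodule commit (S_IFDIR | S_IFLNK)


def canonical_git_mode(st_mode):
    """Collapse a full st_mode into the few values git records."""
    if stat.S_ISLNK(st_mode):
        return GIT_MODE_SYMLINK
    if stat.S_ISDIR(st_mode):
        return GIT_MODE_TREE
    if stat.S_ISREG(st_mode):
        # Only the owner-execute bit survives; group/other bits,
        # setuid/setgid and sticky are all dropped.
        return GIT_MODE_EXEC if st_mode & stat.S_IXUSR else GIT_MODE_BLOB
    raise ValueError("git does not track this file type: %o" % st_mode)
```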

I'm slowly sorting through things but if I could get a few pointers  
from someone intimately familiar with the code that would be most  
appreciated:  I'd like to try to add new entries to tree objects  
which older versions of GIT would ignore but which newer versions of  
GIT would use to store ACL or extended-attribute data.

The simplest solution which admittedly breaks the ability of older  
GITs to read the data from a file with attributes (ignoring the ext-attrs
themselves) is to create a new "file-with-extended-attributes"
object which contains a binary concatenation (with length bytes and  
attribute names and such) of the file and its extended attributes.   
That breaks the old GIT assumption that permission and security data  
is part of the directory not the file, but it's more in-line with the  
way extended attributes are attached to the inodes in the filesystem  
(although that doesn't really matter IMO).
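
A minimal sketch of that "binary concatenation" object, assuming a made-up layout built from 4-byte big-endian length prefixes; nothing like this exists in git. Attributes are sorted by name so that identical inputs always serialize to identical bytes, which matters in a content-addressed store where the object name is a hash of these bytes.

```python
import struct


def pack_file_with_xattrs(content, xattrs):
    """Serialize file bytes plus extended attributes into one hypothetical blob.

    Layout (every length is a 4-byte big-endian integer):
        <content-len><content><attr-count>
        then, per attribute in name order: <name-len><name><value-len><value>
    """
    out = [struct.pack(">I", len(content)), content, struct.pack(">I", len(xattrs))]
    for name in sorted(xattrs):  # deterministic order => deterministic hash
        key = name.encode("utf-8")
        value = xattrs[name]
        out += [struct.pack(">I", len(key)), key, struct.pack(">I", len(value)), value]
    return b"".join(out)


def unpack_file_with_xattrs(data):
    """Inverse of pack_file_with_xattrs; rejects truncated or padded input."""
    pos = 0

    def take(n):
        nonlocal pos
        chunk = data[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated object")
        pos += n
        return chunk

    def take_len():
        return struct.unpack(">I", take(4))[0]

    content = take(take_len())
    xattrs = {}
    for _ in range(take_len()):
        name = take(take_len()).decode("utf-8")
        xattrs[name] = take(take_len())
    if pos != len(data):
        raise ValueError("trailing bytes")
    return content, xattrs
```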

Alternatively I might be able to add a new entry to each tree object
with invalid file mode bits (i.e. neither a directory, a file, nor a
symlink), or perhaps an entry with an empty name, which points to a
new "extended attribute table".  That table could either map from
(entry, attribute) => (data) or from (entry) =>
((attribute,data),(attribute,data),[...]), depending on which would
be more efficient.  It's essential that the overhead for non-ext-attr
repositories is O(1) and ideally the overhead for a bunch of files
with the same ext-attr is O(size-of-ext-attr) +
O(number-of-files-with-that-attr), although that may vary depending
on implementation.
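
One way to reach those overhead targets is to deduplicate whole attribute sets by hash, so each distinct set is stored once and each file carries only a reference. The table layout and names below are invented for illustration. A tree with no extended attributes produces an empty table that can be omitted entirely, and N files sharing one ACL cost one copy plus N references.

```python
import hashlib


def _encode_attrs(attrs):
    """Deterministic length-prefixed encoding of one attribute set."""
    parts = []
    for key, value in sorted(attrs.items()):
        k = key.encode("utf-8")
        parts.append(len(k).to_bytes(4, "big") + k + len(value).to_bytes(4, "big") + value)
    return b"".join(parts)


def build_xattr_table(entries):
    """Hypothetical per-tree extended-attribute table.

    entries maps file name -> {attribute name: bytes value}.
    Returns (sets, refs): sets maps SHA-1 hex -> encoded attribute set,
    refs maps file name -> SHA-1 hex of that file's attribute set.
    """
    sets = {}
    refs = {}
    for name, attrs in entries.items():
        if not attrs:
            continue  # files without xattrs add nothing to the table
        encoded = _encode_attrs(attrs)
        digest = hashlib.sha1(encoded).hexdigest()
        sets.setdefault(digest, encoded)  # each distinct set stored once
        refs[name] = digest
    return sets, refs
```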

Advice, opinions, problems, and "this-has-no-chance-of-ever-even-remotely-working"
are all useful and welcome!

Cheers,
Kyle Moffett


* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 15:30   ` Jakub Narebski
@ 2006-12-10 18:10     ` Kyle Moffett
  2006-12-10 18:18       ` Jakub Narebski
  2006-12-10 18:26       ` Jakub Narebski
  0 siblings, 2 replies; 34+ messages in thread
From: Kyle Moffett @ 2006-12-10 18:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Dec 10, 2006, at 10:30:00, Jakub Narebski wrote:
> Jeff Garzik wrote:
>> I actively use git to version, store and distribute an exim mail  
>> configuration across six servers.  So far my solution has been a  
>> 'fix perms' script, or using the file perm checking capabilities  
>> of cfengine.
>
> A 'fix perms' script run from a checkout hook is the best idea, I think.

Hmm, unfortunately that has problems with security-related race  
conditions when used directly for /etc.  Think about what happens  
with "/etc/shadow" in that case, for example.  (/etc/.git is of  
course 0700)  I'm sure there are others where non-root daemons get  
unhappy when they get an inotify event and their config files have  
suddenly become root:root:0600.  I also want to be able to "cd /etc  
&& git status" to see what changed after running "apt-get update" or  
maybe fiddling in SWAT or webmin, so a makefile which installs into
/etc won't quite solve it either.  It would also be nice to see when
things change the permissions on files in /etc, or even bind-mount an  
append-only volume over /etc/.git/objects to provide additional data  
security.
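
The race described here comes from writing a file with default permissions and fixing them afterwards. A common way around it, sketched in Python as an assumption rather than anything git does, is to create a temporary file in the same directory that only its creator can read (`tempfile.mkstemp` uses mode 0600), set ownership and mode on the open descriptor, and then rename() it over the target. Readers see either the old file or the complete new one, already carrying its final permissions.

```python
import os
import tempfile


def install_file(path, data, mode, uid=None, gid=None):
    """Atomically replace path with data, already carrying its final owner and mode."""
    directory = os.path.dirname(path) or "."
    # mkstemp creates the file with mode 0600, readable only by its creator.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".install-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
            if uid is not None or gid is not None:
                # chown before chmod: changing owner can clear setuid/setgid bits.
                os.fchown(f.fileno(),
                          -1 if uid is None else uid,
                          -1 if gid is None else gid)
            os.fchmod(f.fileno(), mode)
        # rename() within one filesystem is atomic on POSIX.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```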

>> But it would be a lot better if git natively cared about ownership  
>> and  permissions (presumably via an option).
>
> There is currently no place for ownership and extended attributes  
> in the tree object; and even full POSIX permissions might be  
> challenge because for example currently unused 'is socket'  
> permission bit is used for experimental commit-in-tree submodule  
> support.

What about doing something crazy like "is socket" && "is directory"  
&& "is symlink"?  Or something else that old GIT versions would  
ignore and new GIT versions could do something useful with?  Perhaps  
like I mentioned in an earlier email, the new data could be stored as  
part of a modified "file" object.  Alternatively could a directory  
have a file named with an empty string with bogus mode bits which  
points to an extended-attributes-tree object?

> And given Linus stance that git is "content tracker"...

Extended attributes are content too!  This includes things like  
icons, security labels (think unclassified/confidential/secret/top-secret/etc),
ACLs, summaries, and other metadata.  Content tracker
purists could also just ignore the new default-off config options and  
be perfectly happy with status-quo. :-D

> In the loooong thread "VCS comparison table" there was some talk  
> about using git (or any SCM) to manage /etc. Check out:
>
>  * Message-ID: <Pine.LNX.4.64.0610220926170.3962@g5.osdl.org>
>    http://permalink.gmane.org/gmane.comp.version-control.git/29765
>  * Message-ID: <20061023051932.GA8625@evofed.localdomain>
>    http://marc.theaimsgroup.com/?i=<20061023051932.GA8625@evofed.localdomain>
>
> (and other messages in this subthread).

I have, and while it's interesting material, that thread produced no
real patches :-D.  I'd like to introduce some new config options to
control the new code: "preserve_full_perms", "preserve_posix_acls",  
"preserve_security_labels", and "preserve_user_xattrs" which default  
to false but when set modify GIT's behavior to store, retrieve, and  
compare additional data.
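
Resolving such options could follow the personal-default scheme: a repository-local value overrides the user's global one, and an unset option means false. The dictionaries stand in for parsed config files and the key name is hypothetical. Note that git config variable names may only contain alphanumerics and '-', so a name like "preserve_full_perms" would need to become something like `core.preserveFullPerms`. The boolean spellings mirror those git config accepts.

```python
TRUE_WORDS = ("true", "yes", "on", "1")
FALSE_WORDS = ("false", "no", "off", "0")


def as_bool(value):
    """Interpret a config value as a boolean, git-style."""
    word = str(value).strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError("not a boolean: %r" % (value,))


def resolve_option(name, global_cfg, repo_cfg, default=False):
    """Repository setting wins, then the user's global default, then `default`."""
    for cfg in (repo_cfg, global_cfg):
        if name in cfg:
            return as_bool(cfg[name])
    return default
```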

If you have any suggestions on how to store the data such that old  
GIT ignores it I'm all ears :-D.

Cheers,
Kyle Moffett


* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 17:46   ` Kyle Moffett
@ 2006-12-10 18:10     ` Jakub Narebski
  0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2006-12-10 18:10 UTC (permalink / raw)
  To: git

Kyle Moffett wrote:

> The simplest solution which admittedly breaks the ability of older  
> GITs to read the data from a file with attributes (ignoring the ext-attrs
> themselves) is to create a new "file-with-extended-attributes"
> object which contains a binary concatenation (with length bytes and  
> attribute names and such) of the file and its extended attributes.   
> That breaks the old GIT assumption that permission and security data  
> is part of the directory not the file, but it's more in-line with the  
> way extended attributes are attached to the inodes in the filesystem  
> (although that doesn't really matter IMO).

This contradicts git's philosophy of "tracking contents".

> Alternatively I might be able to add a new entry to each tree object  
> with invalid file mode bits (i.e. neither a directory, a
> file, nor a symlink), or perhaps an entry with an empty name, which  
> points to a new "extended attribute table".  That table could either  
> map from (entry, attribute) => (data) or from (entry) =>  
> ((attribute,data),(attribute,data),[...]), depending on which would  
> be more efficient.  It's essential that the overhead for non-ext-attr  
> repositories is O(1) and ideally the overhead for a bunch of files  
> with the same ext-attr is O(size-of-ext-attr) +
> O(number-of-files-with-that-attr), although that may vary depending on implementation.

Wouldn't it be better to add another field in the tree object, that
instead of storing "(filemode, link to contents, name)" it would
store "(filemode, link to extended attributes, link to contents, name)"
where "filemode" is the mode of a file, of which git uses only a few bits
(is a directory, is a symlink, is a file, is an executable file),
and "link to" is the sha1 of the appropriate blob (or tree) object? Extended
attributes could be stored in a new type of object, or just in a blob
object. Well, you'd have to extend the index in a similar way (and add
a way to store extended attributes for directories in the index; now it only
stores info about files).
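
To make the compatibility problem concrete: a git tree entry is encoded as `<octal mode> <name>\0<20-byte binary SHA-1>`, and a tree object's name is the SHA-1 of `tree <length>\0` followed by its entries. The sketch encodes that real layout alongside a hypothetical four-field variant with an extra SHA-1 for an extended-attribute object. Old git would take the first SHA-1 as the content pointer and then try to parse the second as the start of the next entry, which is the break mentioned above. Entries are assumed to arrive already in git's sort order.

```python
import hashlib


def encode_tree_entry(mode, name, sha1_hex):
    """Git's tree-entry layout: '<octal mode> <name>\\0<20-byte binary SHA-1>'."""
    return b"%o %s\x00" % (mode, name.encode("utf-8")) + bytes.fromhex(sha1_hex)


def encode_extended_entry(mode, name, xattr_sha1_hex, sha1_hex):
    """Hypothetical 4-field variant: extended-attribute SHA-1, then content SHA-1."""
    return (b"%o %s\x00" % (mode, name.encode("utf-8"))
            + bytes.fromhex(xattr_sha1_hex)
            + bytes.fromhex(sha1_hex))


def tree_object_id(entries):
    """SHA-1 of a tree object: header 'tree <len>\\0' followed by the entries."""
    body = b"".join(entries)
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()
```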

This of course breaks backwards compatibility...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 18:10     ` Kyle Moffett
@ 2006-12-10 18:18       ` Jakub Narebski
  2006-12-10 18:26       ` Jakub Narebski
  1 sibling, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2006-12-10 18:18 UTC (permalink / raw)
  To: git

Kyle Moffett wrote:

> On Dec 10, 2006, at 10:30:00, Jakub Narebski wrote:
>> Jeff Garzik wrote:
>>>
>>> I actively use git to version, store and distribute an exim mail  
>>> configuration across six servers.  So far my solution has been a  
>>> 'fix perms' script, or using the file perm checking capabilities  
>>> of cfengine.
>>
>> A 'fix perms' script run from a checkout hook is the best idea, I think.
> 
> Hmm, unfortunately that has problems with security-related race  
> conditions when used directly for /etc.  Think about what happens  
> with "/etc/shadow" in that case, for example.  (/etc/.git is of  
> course 0700)  I'm sure there are others where non-root daemons get  
> unhappy when they get an inotify event and their config files have  
> suddenly become root:root:0600.  I also want to be able to "cd /etc  
> && git status" to see what changed after running "apt-get update" or  
> maybe fiddling in SWAT or webmin, so a makefile which installs into
> /etc won't quite solve it either.  It would also be nice to see when
> things change the permissions on files in /etc, or even bind-mount an  
> append-only volume over /etc/.git/objects to provide additional data  
> security.

The idea is to not store /etc in git directly, but use import/export
scripts, which for example saves permissions and ownership in some
file also tracked by git on import, and restores correct permissions
on export. That is what I remember from this discussion. This of course
means that you would have to write your own porcelain...
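
The import half of such a script can be small. This hypothetical sketch records mode, owner and group for everything under a tree (skipping .git and symlinks) as text lines sorted by path, suitable for committing as an ordinary tracked file. The line format and function names are assumptions, not an existing tool.

```python
import grp
import os
import pwd
import stat


def _name(lookup, ident):
    """Map a uid/gid to a name, falling back to the number if it is unknown."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def snapshot_metadata(root):
    """Return manifest lines '<octal-mode> <owner> <group> <relpath>', sorted by path."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # never record the repository itself
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISLNK(st.st_mode):
                continue  # symlink mode bits carry no meaning on Linux
            rel = os.path.relpath(full, root)
            line = "%04o %s %s %s" % (
                stat.S_IMODE(st.st_mode),
                _name(pwd.getpwuid, st.st_uid),
                _name(grp.getgrgid, st.st_gid),
                rel,
            )
            entries.append((rel, line))
    return [line for _, line in sorted(entries)]
```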

What about IsiSetup, mentioned in another email?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 18:10     ` Kyle Moffett
  2006-12-10 18:18       ` Jakub Narebski
@ 2006-12-10 18:26       ` Jakub Narebski
  2006-12-10 18:35         ` Kyle Moffett
  1 sibling, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2006-12-10 18:26 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: git

Kyle Moffett wrote:
> On Dec 10, 2006, at 10:30:00, Jakub Narebski wrote:
>> Jeff Garzik wrote:
>>>
>>> I actively use git to version, store and distribute an exim mail  
>>> configuration across six servers.  So far my solution has been a  
>>> 'fix perms' script, or using the file perm checking capabilities  
>>> of cfengine.
>>
>> A 'fix perms' script run from a checkout hook is the best idea, I think.
> 
> Hmm, unfortunately that has problems with security-related race  
> conditions when used directly for /etc.  Think about what happens  
> with "/etc/shadow" in that case, for example.  (/etc/.git is of  
> course 0700)  I'm sure there are others where non-root daemons get  
> unhappy when they get an inotify event and their config files have  
> suddenly become root:root:0600.  I also want to be able to "cd /etc  
> && git status" to see what changed after running "apt-get update" or  
> maybe fiddling in SWAT or webmin, so a makefile which installs into
> /etc won't quite solve it either.  It would also be nice to see when
> things change the permissions on files in /etc, or even bind-mount an  
> append-only volume over /etc/.git/objects to provide additional data  
> security.

The idea is to not store /etc in git directly, but use import/export
scripts, which for example saves permissions and ownership in some
file also tracked by git on import, and restores correct permissions
on export. That is what I remember from this discussion. This of course
means that you would have to write your own porcelain...

What about IsiSetup, mentioned in another email?

-- 
Jakub Narebski
Warsaw, Poland


* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 18:26       ` Jakub Narebski
@ 2006-12-10 18:35         ` Kyle Moffett
  2006-12-11 10:39           ` Andreas Ericsson
  0 siblings, 1 reply; 34+ messages in thread
From: Kyle Moffett @ 2006-12-10 18:35 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Dec 10, 2006, at 13:26:32, Jakub Narebski wrote:
> The idea is to not store /etc in git directly, but use import/export
> scripts, which for example saves permissions and ownership
> in some file also tracked by git on import, and restores correct  
> permissions on export. That is what I remember from this  
> discussion. This of course means that you would have to write your  
> own porcelain...
>
> What about mentioned in other email IsiSetup?

The real problem I have with that is you literally have to duplicate  
all sorts of functionality.  I want to run "foo-status" in /etc and  
get something useful, but if /etc is not a git directory in and of  
itself then you have to duplicate most of "git-status" anyways.  And  
the same applies to all the other commands.  From what I can see of  
IsiSetup the tools for checking out, merging, modifying, cloning, etc  
are all much more limited and immature than the ones available  
through GIT/cogito, and I would be loath to discard all that extra
functionality and duplicate a few thousand lines of code in the name  
of "concept purity".

GIT already has _some_ idea about file permissions, it just discards  
most of the data before writing to disk.  Of course, adding POSIX  
ACLs and user-extended-attributes requires a new data format, but  
those are very similar to filesystem permissions; they differ only in  
amount of data stored, not in purpose.

Import/export scripts literally require wrapping every single GIT  
command with a script that changes directory a few times, reads from  
a different checked-out tree, and permutes some extended-attribute  
data slightly before storing it in the underlying GIT tree.  Even  
without adding any new functionality whatsoever that doubles the  
amount of code just for finding your repository and checking
command-line arguments, and that's a crazy trade-off to make in any situation.

Cheers,
Kyle Moffett


* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 18:35         ` Kyle Moffett
@ 2006-12-11 10:39           ` Andreas Ericsson
  2006-12-11 10:55             ` Jeff Garzik
  2006-12-11 12:13             ` Josef Weidendorfer
  0 siblings, 2 replies; 34+ messages in thread
From: Andreas Ericsson @ 2006-12-11 10:39 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Jakub Narebski, git

Kyle Moffett wrote:
> On Dec 10, 2006, at 13:26:32, Jakub Narebski wrote:
>> The idea is to not store /etc in git directly, but use import/export 
>> scripts, which for example saves permissions and ownership in some 
>> file also tracked by git on import, and restores correct permissions 
>> on export. That is what I remember from this discussion. This of 
>> course means that you would have to write your own porcelain...
>>
>> What about mentioned in other email IsiSetup?
> 
> The real problem I have with that is you literally have to duplicate all 
> sorts of functionality.  I want to run "foo-status" in /etc and get 
> something useful, but if /etc is not a git directory in and of itself 
> then you have to duplicate most of "git-status" anyways.

Make /etc/.git a symlink to where you store your repo and go to the 
other directory when you want to *restore* configuration. The only "own 
porcelain" you need to write is a simple program that understands "save" 
and "restore" (or some such) and tucks away the meta-data in a file 
somewhere inside the git tree. If you make it in the format

octal-mode path/to/file

you can even get decently human-readable permission diffs, which will 
most likely be prettier and easier to read than anything git currently has.
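
The claim about readable diffs is easy to check. Assuming the metadata lives in a tracked file (called .perms here purely for illustration) in the `octal-mode path/to/file` format, a permission change shows up as an ordinary one-line change. Python's difflib stands in for git diff in this sketch.

```python
import difflib


def manifest(modes):
    """Render {path: mode} as sorted '<octal-mode> <path>' lines."""
    return ["%04o %s" % (mode, path) for path, mode in sorted(modes.items())]


def permission_diff(old_modes, new_modes):
    """Unified diff of two manifests, shaped like a diff of the tracked file."""
    return list(difflib.unified_diff(manifest(old_modes), manifest(new_modes),
                                     "a/.perms", "b/.perms", lineterm=""))
```

For example, tightening /etc/shadow from 0644 to 0640 yields `-0644 shadow` and `+0640 shadow` and nothing else of substance.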

> 
> GIT already has _some_ idea about file permissions, it just discards 
> most of the data before writing to disk.   Of course, adding POSIX ACLs
> and user-extended-attributes requires a new data format, but those are 
> very similar to filesystem permissions; they differ only in amount of 
> data stored, not in purpose.
> 

The amount of data stored is the issue here. The current implementation 
(which works just fine and does The Right Thing(tm) for code-repos) only 
stores what it has to and uses the spare bits to do other things.

> Import/export scripts literally require wrapping every single GIT 
> command with a script that changes directory a few times, reads from a 
> different checked-out tree, and permutes some extended-attribute data 
> slightly before storing it in the underlying GIT tree.  Even without 
> adding any new functionality whatsoever that doubles the amount of code 
> just for finding your repository and checking command-line arguments, 
> and that's a crazy trade-off to make in any situation.
> 

GIT_DIR=/some/where/else/.git git log -p

Why would you want to read from a different checked-out tree? 
Non-committed data is "changes", committed data is "HEAD" (or 
commit-ish) and marked data is "index". I see no reason whatsoever for
a second checked-out tree.
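
Wrapping git for this layout needs little more than the environment variable shown above. A minimal Python sketch with a hypothetical repository path: when GIT_DIR is set and neither GIT_WORK_TREE nor core.worktree is, git treats the current directory as the top of the working tree, so the wrapper only has to run git from /etc.

```python
import os
import subprocess


def build_git_command(args, git_dir, work_tree):
    """Return (argv, env, cwd) for running git against a repository stored elsewhere."""
    env = dict(os.environ)
    env["GIT_DIR"] = git_dir
    return ["git"] + list(args), env, work_tree


def run_git(args, git_dir="/var/lib/etc-git/.git", work_tree="/etc"):
    """Run git with GIT_DIR pointing outside the tree (default path is hypothetical)."""
    argv, env, cwd = build_git_command(args, git_dir, work_tree)
    return subprocess.run(argv, env=env, cwd=cwd, check=True)
```

Usage would be `run_git(["log", "-p"])`, the equivalent of the one-liner above.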

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett
  2006-12-10 14:49 ` Jeff Garzik
  2006-12-10 15:06 ` Santi Béjar
@ 2006-12-11 10:50 ` Nikolai Weibull
  2006-12-12  3:45 ` Daniel Barkalow
  3 siblings, 0 replies; 34+ messages in thread
From: Nikolai Weibull @ 2006-12-11 10:50 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: git

On 12/10/06, Kyle Moffett <mrmacman_g4@mac.com> wrote:
> I've recently become somewhat interested in the idea of using GIT to
> store the contents of various folders in /etc.  However after a bit
> of playing with this, I discovered that GIT doesn't actually preserve
> all permission bits since that would cause problems with the more
> traditional software development model.  I'm curious if anyone has
> done this before; and if so, how they went about handling the
> permissions and ownership issues.

I keep the files I want to track in a separate folder that I track
with Git and use a Makefile for updating /etc.  I basically have a
rule for checking for differences between the tracked folder and /etc
and a rule for installing changed files (with the correct
permissions).  It works, but it does require some "Makefile magic" to
work right (or the way /I/ want it anyway).
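
The two Makefile rules described here can be approximated in Python as below. The function names and the `modes` mapping (relative path to octal mode) are assumptions. It copies first and sets the mode afterwards, so for files such as /etc/shadow a hardened version would write to a temporary file and rename it into place.

```python
import filecmp
import os
import shutil


def changed_files(src_root, dst_root):
    """Relative paths under src_root whose contents differ from, or are missing in, dst_root."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip the repository
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            # shallow=False compares contents, not just size and mtime
            if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
                changed.append(rel)
    return sorted(changed)


def install_changed(src_root, dst_root, modes):
    """Copy each changed file and set the mode recorded in `modes` (default 0644)."""
    for rel in changed_files(src_root, dst_root):
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copyfile(os.path.join(src_root, rel), dst)
        os.chmod(dst, modes.get(rel, 0o644))
```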



* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-11 10:39           ` Andreas Ericsson
@ 2006-12-11 10:55             ` Jeff Garzik
  2006-12-11 12:13             ` Josef Weidendorfer
  1 sibling, 0 replies; 34+ messages in thread
From: Jeff Garzik @ 2006-12-11 10:55 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Andreas Ericsson, Jakub Narebski, git

Another option is to have a process that stores your configs in git, and 
script an export from git to rpm|deb.  Packaging systems make it even 
easier to go between config versions.

	Jeff




* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-11 10:39           ` Andreas Ericsson
  2006-12-11 10:55             ` Jeff Garzik
@ 2006-12-11 12:13             ` Josef Weidendorfer
  2006-12-11 13:33               ` Johannes Schindelin
  1 sibling, 1 reply; 34+ messages in thread
From: Josef Weidendorfer @ 2006-12-11 12:13 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Kyle Moffett, Jakub Narebski, git

On Monday 11 December 2006 11:39, Andreas Ericsson wrote:
> > Import/export scripts literally require wrapping every single GIT 
> > command with a script that changes directory a few times, reads from a 
> > different checked-out tree, and permutes some extended-attribute data 
> > slightly before storing it in the underlying GIT tree.  Even without 
> > adding any new functionality whatsoever that doubles the amount of code 
> > just for finding your repository and checking command-line arguments, 
> > and that's a crazy trade-off to make in any situation.
> > 
> 
> GIT_DIR=/some/where/else/.git git log -p

Doing this every time you want to run a git command *is* a lot of time
wasted for typing.

The .gitlink proposal would come in handy here: you have a simple
file instead of .git/, which links to the real repository.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-11 12:13             ` Josef Weidendorfer
@ 2006-12-11 13:33               ` Johannes Schindelin
  2006-12-11 15:07                 ` Josef Weidendorfer
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2006-12-11 13:33 UTC (permalink / raw)
  To: Josef Weidendorfer; +Cc: Andreas Ericsson, Kyle Moffett, Jakub Narebski, git

Hi,

On Mon, 11 Dec 2006, Josef Weidendorfer wrote:

> On Monday 11 December 2006 11:39, Andreas Ericsson wrote:
> > > Import/export scripts literally require wrapping every single GIT 
> > > command with a script that changes directory a few times, reads from a 
> > > different checked-out tree, and permutes some extended-attribute data 
> > > slightly before storing it in the underlying GIT tree.  Even without 
> > > adding any new functionality whatsoever that doubles the amount of code 
> > > just for finding your repository and checking command-line arguments, 
> > > and that's a crazy trade-off to make in any situation.
> > > 
> > 
> > GIT_DIR=/some/where/else/.git git log -p
> 
> Doing this every time you want to run a git command *is* a lot of time
> wasted for typing.
> 
> The .gitlink proposal would come in handy here: you have a simple
> file instead of .git/, which links to the real repository.

I beg your pardon; I'm just joining in. Why is a symbolic link for .git 
unacceptable?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-11 13:33               ` Johannes Schindelin
@ 2006-12-11 15:07                 ` Josef Weidendorfer
  0 siblings, 0 replies; 34+ messages in thread
From: Josef Weidendorfer @ 2006-12-11 15:07 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Andreas Ericsson, Kyle Moffett, Jakub Narebski, git

On Monday 11 December 2006 14:33, Johannes Schindelin wrote:
> On Mon, 11 Dec 2006, Josef Weidendorfer wrote:
> 
> > On Monday 11 December 2006 11:39, Andreas Ericsson wrote:
> > > > Import/export scripts literally require wrapping every single GIT 
> > > > command with a script that changes directory a few times, reads from a 
> > > > different checked-out tree, and permutes some extended-attribute data 
> > > > slightly before storing it in the underlying GIT tree.  Even without 
> > > > adding any new functionality whatsoever that doubles the amount of code 
> > > > just for finding your repository and checking command-line arguments, 
> > > > and that's a crazy trade-off to make in any situation.
> > > > 
> > > 
> > > GIT_DIR=/some/where/else/.git git log -p
> > 
> > Doing this every time you want to run a git command *is* a lot of time
> > wasted for typing.
> > 
> > The .gitlink proposal would come in handy here: you have a simple
> > file instead of .git/, which links to the real repository.
> 
> I beg your pardon; I'm just joining in. Why is a symbolic link for .git 
> unacceptable?

You are totally right.

The .gitlink thing is tailored to allow submodule support later. It includes
some smart searching for the git repository to allow moving the checkout in
some limits without breaking the link to the repository.

Aside from this, the proposal is more flexible in that you can specify not
only GIT_DIR (or the GIT_DIR_HINT to trigger smart search), but also
GIT_INDEX_FILE and GIT_HEAD_FILE, which allows different checkouts
(with different index state and HEAD) for the same repo easily.

Which is not needed in this case.
So, sorry for the noise ;-)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett
                   ` (2 preceding siblings ...)
  2006-12-11 10:50 ` Nikolai Weibull
@ 2006-12-12  3:45 ` Daniel Barkalow
  2006-12-12 13:49   ` Kyle Moffett
  3 siblings, 1 reply; 34+ messages in thread
From: Daniel Barkalow @ 2006-12-12  3:45 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: git

On Sun, 10 Dec 2006, Kyle Moffett wrote:

> I've recently become somewhat interested in the idea of using GIT to store the
> contents of various folders in /etc.  However after a bit of playing with
> this, I discovered that GIT doesn't actually preserve all permission bits
> since that would cause problems with the more traditional software development
> model.  I'm curious if anyone has done this before; and if so, how they went
> about handling the permissions and ownership issues.
> 
> I spent a little time looking over how GIT stores and compares permission
> bits; trying to figure out if it's possible to patch in a new configuration
> variable or two; say "preserve_all_perms" and "preserve_owner", or maybe even
> "save_acls".  It looks like standard permission preservation is fairly basic;
> you would just need to patch a few routines which alter the permissions read
> in from disk or compare them with ones from the database.  On the other hand,
> it would appear that preserving ownership or full POSIX ACLs might be a bit of
> a challenge.

The first thing you'd want to do is correct the fact that the index 
doesn't keep full permissions. We decided long ago that we don't want to 
track more than 0100, but we're discarding the rest between the filesystem 
and the index, rather than between the index and the tree. (This is weird 
of us, since we keep gid and uid in the index, as changedness heuristics, 
but don't keep permissions; of course, we'd have to apply umask to the 
index when we check it out to sync what we expect to be there with what 
has actually been created.)
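To make the 0100 point concrete, here is a small Python model of how a regular file's mode collapses on its way into the index. It assumes the rule used in git's C sources (owner-execute set gives 100755, otherwise 100644); it is an illustration, not git's code.

```python
import stat


def git_index_mode(st_mode: int) -> int:
    """Mode git records for a working-tree entry, given the st_mode from lstat()."""
    if stat.S_ISLNK(st_mode):
        return 0o120000                      # symlinks carry no permission bits at all
    if stat.S_ISREG(st_mode):
        # Only the owner-execute bit (0100) is consulted; every other bit is dropped.
        return 0o100755 if st_mode & 0o100 else 0o100644
    raise ValueError("only regular files and symlinks are modelled here")
```

So a 0600 file and a 0644 file end up recorded identically, which is exactly the information loss that matters for /etc.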

I think that would be the only change needed to the index and 
index/working directory connection, although it might be necessary to 
support longer values for uid/gid/etc, since they'd be important data now.

Note that git only stores content, not incidental information. But a lot 
of information which is incidental in a source tree is content in /etc. 
This implies that /etc and working/linux-2.6 are fundamentally different 
sorts of things, because different aspects of them are content.

I'd suggest a new object type for a directory with permissions, ACLs, and 
so forth. It should probably use symbolic owner and group, too. My guess 
is that you'll want to use "commit"s, the new object type, and "blob"s. 
Everything that uses trees would need to have a version that uses the new 
type. But I think that you generally want different behavior anyway, so 
that's not a major issue.

	-Daniel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-12  3:45 ` Daniel Barkalow
@ 2006-12-12 13:49   ` Kyle Moffett
  2006-12-12 15:53     ` Andy Parkins
  2006-12-13 18:10     ` Using GIT to store /etc (Or: How to make GIT store all file permission bits) Daniel Barkalow
  0 siblings, 2 replies; 34+ messages in thread
From: Kyle Moffett @ 2006-12-12 13:49 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

On Dec 11, 2006, at 22:45:25, Daniel Barkalow wrote:
> The first thing you'd want to do is correct the fact that the index  
> doesn't keep full permissions. We decided long ago that we don't  
> want to track more than 0100, but we're discarding the rest between  
> the filesystem and the index, rather than between the index and the  
> tree. (This is weird of us, since we keep gid and uid in the index,  
> as changedness heuristics, but don't keep permissions; of course,  
> we'd have to apply umask to the index when we check it out to sync  
> what we expect to be there with what has actually been created.)
>
> I think that would be the only change needed to the index and
> index/working directory connection, although it might be necessary to
> support longer values for uid/gid/etc, since they'd be important
> data now.

Hmm, ok.  It would seem to be a reasonable requirement that if you  
want to change any of the "preserve_*_attributes" config options you  
need to blow away and recreate your index, no?  I would probably  
change the underlying index format pretty completely and stick a new  
version tag inside it.

> Note that git only stores content, not incidental information. But  
> a lot of information which is incidental in a source tree is  
> content in /etc. This implies that /etc and working/linux-2.6 are  
> fundamentally different sorts of things, because different aspects  
> of them are content.

Ahh, I hadn't thought of it that way before but that makes a lot of  
sense.  Thanks!

> I'd suggest a new object type for a directory with permissions,  
> ACLs, and so forth. It should probably use symbolic owner and  
> group, too. My guess is that you'll want to use "commit"s, the new  
> object type, and "blob"s. Everything that uses trees would need to  
> have a version that uses the new type. But I think that you  
> generally want different behavior anyway, so that's not a major issue.

Ok, seems straightforward enough.  One other thing that crossed my  
mind was figuring out how to handle hardlinks.  The simplest solution  
would be to add an extra layer of indirection between the "file  
inode" and the "file data".  Instead of your directory pointing to a  
"file-data" blob and "file-attributes" object, it would point to an  
"file-inode" object with embedded attribute data and a pointer to the  
file contents blob.
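A minimal Python sketch of that extra layer of indirection, assuming git-style content addressing. The "inode" object type, its JSON encoding, and the `link_id` field are all hypothetical. `link_id` is needed because, under pure content addressing, two separate files with identical data and attributes would otherwise collapse into one inode object and look hardlinked.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def object_id(kind: str, payload: bytes) -> str:
    # git-style: SHA-1 over "<type> <size>" plus a NUL byte, then the payload
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


@dataclass(frozen=True)
class Inode:
    """Hypothetical 'file-inode' object: attributes plus a pointer to the contents blob."""
    link_id: int   # hypothetical: keeps separate-but-identical files apart
    mode: int      # full permission bits, not just 0100
    owner: str     # symbolic owner and group, as suggested earlier in the thread
    group: str
    blob: str      # id of the file-contents blob

    def oid(self) -> str:
        return object_id("inode", json.dumps(asdict(self), sort_keys=True).encode())


def directory(entries: dict[str, Inode]) -> dict[str, str]:
    """A directory maps names to inode ids; two names sharing an id are hardlinks."""
    return {name: inode.oid() for name, inode in sorted(entries.items())}
```

Both hardlinked names resolve to one inode object, while an independent copy with the same contents shares only the blob.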

I remember reading some discussions from the early days of GIT about  
how that was considered and discarded because the extra overhead  
wouldn't give any real tangible benefit.  On the other hand for  
something like /etc the added benefits of tracking extended  
attributes and hardlinks might outweigh the cost of a bunch of extra  
objects in the database.  A bit of care with the construction of the  
index file should make it sufficiently efficient for day-to-day usage.

If you're interested in some random musings about using GIT concepts  
to version whole filesystems (think checkpointing your disk drive and  
instantly restoring when you screw up), read on below, otherwise  
don't bother.

Cheers,
Kyle Moffett

<Random Tangential Off-the-Wall Thought Experiment>

NOTE: This probably belongs in its own thread but it's such a
random, undeveloped, and off-the-wall concept that I threw it in here  
just for kicks.

Consider combining extensions like those described above with something like
the Ext3 block-allocation, inode-management and journalling code to
produce a "versioned filesystem".  With the exponential growth of
storage density over the last several years we've gotten to the point
where we can store many, many hours of extremely realistic video and audio
on your average small-computer drive.  Versioning your home
directory, or even your entire computer, even with fairly steady
modifications to multimedia files, installation of software programs,
etc., doesn't seem like such an impossible undertaking anymore.

One predefined inode would contain a list of tags/heads and their  
current hashes.  Mount the filesystem with a "tag=$TAG" option to  
specify the initial tree object used for the root directory (with  
syscalls to navigate the history).  Allocate an inode per-mount to  
represent any changes from the last commit.

For efficiency purposes (no need to revision the entire system when I  
commit a change in my home directory) add a "subtree" object type  
which can specify either a particular hash or a symbolic tag/head  
name as a pseudo sub-mountpoint.  Trap traversal of the sub-mountpoint
node to mount the filesystem with "tag=$SUBTAG" on the sub-mountpoint,
expiring it some time after the last traversal.

The only remaining issue would be properly navigating through the  
history, preserving or discarding changes.  Since the kernel could  
easily manage copy-on-write semantics for underlying disk blocks you  
wouldn't need a separate "working copy" except where it's modified  
from the original, and discarding changes is as simple as unlinking  
any files referenced by the per-mount delta inode.

Committing changes would get tricky: you would need to hot-remap
memory-mapped pages read-only while you checksum and store them.  The
next write attempt would then separate the page from the freshly-committed
on-disk version.  It would also need a mechanism for applications
to "trap" the commit so they could make databases consistent, with  
the ability for root or the mountpoint owner to commit without  
waiting for synchronization.  Only needs to synchronize files  
belonging to the new commit.  Merges would be managed from userspace,  
as long as there is a way to browse through objects by hash given  
sufficient permissions.

Make sure it's really easy to make a new atomic commit and/or reset  
to a known state every time the computer is rebooted (whether soft-rebooted
or via crash/powerkill).  With journalling and the write-once nature of
GIT it would be trivial to never require an fsck run.
Also needs a way to move data between filesystems.  Makes LVM largely  
irrelevant; it doesn't matter how many disks you have if they're all  
treated as a shared storage pool for your GITfs data.  Make sure it's  
possible to archive data onto slower disks/media and purge older  
commits from the archive (missing parent commit references are  
tolerable in many situations).  Needs a way to notice hash collisions  
and take action to avoid them.

</Random Tangential Off-the-Wall Thought Experiment>

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-12 13:49   ` Kyle Moffett
@ 2006-12-12 15:53     ` Andy Parkins
  2006-12-12 22:49       ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm
  2006-12-13 18:10     ` Using GIT to store /etc (Or: How to make GIT store all file permission bits) Daniel Barkalow
  1 sibling, 1 reply; 34+ messages in thread
From: Andy Parkins @ 2006-12-12 15:53 UTC (permalink / raw)
  To: git

On Tuesday 2006 December 12 13:49, Kyle Moffett wrote:

> Hmm, ok.  It would seem to be a reasonable requirement that if you
> want to change any of the "preserve_*_attributes" config options you
> need to blow away and recreate your index, no?  I would probably
> change the underlying index format pretty completely and stick a new
> version tag inside it.

I wonder if git's skill at managing content is the answer?  Rather than mess 
around with git's internals, the index, or the object database; how about 
simply having a pre-commit script that writes out a file that looks like:

-rw-r--r--  andyp andyp CHANGES
-rw-r--r--  andyp andyp COPYING
-rw-rw-r--  andyp andyp CREDITS
-rw-r--r--  andyp andyp Configure
-rw-rw-r--  andyp andyp Makefile
-rw-r--r--  andyp andyp README

If /that/ file were stored in the repository and you had a script that could 
read that file and apply the permissions after a checkout you'd have what you 
want.

If the permissions of a file changed but the content didn't, then 
this ".gitpermissions" file would have changed content but the file itself 
would remain the same.  If the content changed but not the permissions 
then ".gitpermissions" would be untouched.

Assuming that you're allowed to mess with the index in pre-commit (I haven't 
checked), one half of it can be automatic.  I suppose you could also plead 
for a post-checkout hook to apply those permissions and the whole lot would 
be transparent.
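A minimal Python sketch of the two halves Andy describes: one function a pre-commit hook could call to regenerate `.gitpermissions`, and one a post-checkout step could call to re-apply it. The octal mode column (instead of the `ls -l` style above) and the `mode owner group path` line layout are assumptions made to keep parsing trivial.

```python
import grp
import os
import pwd
import shutil
import stat
from pathlib import Path

MANIFEST = ".gitpermissions"


def _name(lookup, ident: int) -> str:
    try:
        return lookup(ident)[0]  # pw_name / gr_name
    except KeyError:
        return str(ident)        # no passwd/group entry: fall back to the number


def write_manifest(root: Path) -> str:
    """Record 'mode owner group path' for every regular file under root."""
    lines = []
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root)
        if p.is_symlink() or not p.is_file() or rel.parts[0] in (".git", MANIFEST):
            continue
        st = p.stat()
        owner, group = _name(pwd.getpwuid, st.st_uid), _name(grp.getgrgid, st.st_gid)
        lines.append(f"{stat.S_IMODE(st.st_mode):04o} {owner} {group} {rel.as_posix()}")
    text = "".join(line + "\n" for line in lines)
    (root / MANIFEST).write_text(text)
    return text


def apply_manifest(root: Path, set_owner: bool = False) -> None:
    """Re-apply the recorded permissions (and, when running as root, ownership)."""
    for line in (root / MANIFEST).read_text().splitlines():
        if not line:
            continue
        mode, owner, group, rel = line.split(" ", 3)  # the path may contain spaces
        os.chmod(root / rel, int(mode, 8))
        if set_owner:
            shutil.chown(root / rel, owner, group)
```

Because the manifest is just another tracked file, a permission-only change shows up as a one-line diff in it, as described above.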



Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Using git as a general backup mechanism (was Re: Using GIT to store /etc)
  2006-12-12 15:53     ` Andy Parkins
@ 2006-12-12 22:49       ` Steven Grimm
  2006-12-12 22:57         ` Johannes Schindelin
                           ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Steven Grimm @ 2006-12-12 22:49 UTC (permalink / raw)
  To: git

This discussion reminds me of a use of git I've had in the back of my 
head to try out for a while. Right now I'm doing my local snapshot 
backups using the rsync-with-hard-links scheme 
(http://www.mikerubel.org/computers/rsync_snapshots/ if you're not 
familiar with it). This is nice in that the contents of files that don't 
change are only stored once on the backup disk. But it is less than 
optimal in that a file that changes even a little bit is stored from 
scratch.

What would be great for this would be to store each day's backup as a 
git revision; with a periodic repack, this would be much more 
space-efficient than the rsync hard links.

The problem is that while that would give me a very efficient backup 
scheme, the repository would still grow over time. In rsync land, I 
solve the disk space issue by keeping two weeks' worth of daily 
snapshots, then six months' worth of weekly snapshots, then two years' 
worth of monthly snapshots; files that change daily have a constant 
number of revisions stored in my backups, and older files drop off the 
backup disk as they age.
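The rotation above can be written as a keep/discard rule per dated snapshot. A Python sketch, where treating Sunday snapshots as the "weekly" ones, the 1st of the month as the "monthly" ones, and 183/730 days as six months/two years are all assumptions:

```python
from datetime import date


def keep_snapshot(snap: date, today: date) -> bool:
    """Decide whether a dated snapshot survives a daily/weekly/monthly retention rule."""
    age = (today - snap).days
    if age < 0:
        return False                 # snapshot from the future: ignore
    if age <= 14:
        return True                  # two weeks of dailies
    if age <= 183:
        return snap.weekday() == 6   # about six months of Sunday snapshots
    if age <= 730:
        return snap.day == 1         # about two years of first-of-month snapshots
    return False
```

With git, the hard part is not this rule but dropping the discarded revisions from the history, which is what the rest of the message is about.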

Given that there's no way (or is there?) to delete revisions from the 
*beginning* of a git revision history, right now it seems like the only 
approach that comes close is to give up on the "daily then weekly then 
monthly" thing -- probably fine given the space savings of delta 
compression -- and periodically make shallow clones of the backup 
repository that fetch all but the first N revisions; once a shallow 
clone is made, the original gets deleted and the clone is the new backup 
repo.

But it would sure be more efficient to be able to "shallow-ize" an 
existing repository. That would be useful for things other than backups, 
too, e.g. the recent request for some way to track just the current 
version of the kernel code rather than its revision history. If there 
were a shallowize command, you could do something like "git pull; git 
shallowize --depth 1" to track the latest revision without keeping the 
history locally.

Anyone think that sounds like an interesting thing to explore?

-Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc)
  2006-12-12 22:49       ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm
@ 2006-12-12 22:57         ` Johannes Schindelin
  2006-12-12 23:06           ` Steven Grimm
  2006-12-12 23:15         ` Martin Langhoff
  2006-12-12 23:43         ` Using git as a general backup mechanism Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2006-12-12 22:57 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git

Hi,

On Tue, 12 Dec 2006, Steven Grimm wrote:

> If there were a shallowize command, you could do something like "git 
> pull; git shallowize --depth 1" to track the latest revision without 
> keeping the history locally.

Almost!

$ git pull --depth 1

Though it needs a server _and_ a client supporting shallow clones, which 
support is brewed in "next" right now.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc)
  2006-12-12 22:57         ` Johannes Schindelin
@ 2006-12-12 23:06           ` Steven Grimm
  2006-12-13  0:01             ` Johannes Schindelin
  0 siblings, 1 reply; 34+ messages in thread
From: Steven Grimm @ 2006-12-12 23:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
> $ git pull --depth 1
>
> Though it needs a server _and_ a client supporting shallow clones, which 
> support is brewed in "next" right now.
>   

Will that actually discard old revisions that are already stored locally?

-Steve

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc)
  2006-12-12 22:49       ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm
  2006-12-12 22:57         ` Johannes Schindelin
@ 2006-12-12 23:15         ` Martin Langhoff
  2006-12-12 23:23           ` Martin Langhoff
  2006-12-12 23:43         ` Using git as a general backup mechanism Junio C Hamano
  2 siblings, 1 reply; 34+ messages in thread
From: Martin Langhoff @ 2006-12-12 23:15 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git

Steven,

I've been thinking myself of writing a pdumpfs lookalike that uses git
internally. Sounds like you've got one already ;-)

In terms of getting rid of old history, have you considered moving a
graft point "forward" in time, and running git-repack -a -d? With your
history being (mostly?) linear this could be a workable scheme, but I
don't have much practice with using grafts.
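A small Python sketch of the graft step. It assumes the `info/grafts` format of one commit id per line followed by its (fake) parent ids, so a line holding a single id makes that commit look parentless. The helper name is hypothetical.

```python
import re
from pathlib import Path

SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def graft_as_root(git_dir: Path, commit: str) -> str:
    """Append a graft line that makes `commit` appear to have no parents."""
    if not SHA1_HEX.fullmatch(commit):
        raise ValueError(f"expected a full 40-hex commit id, got {commit!r}")
    grafts = git_dir / "info" / "grafts"
    grafts.parent.mkdir(parents=True, exist_ok=True)
    with grafts.open("a") as f:
        f.write(commit + "\n")
    return grafts.read_text()
```

Running `git repack -a -d` afterwards would then be expected to leave the hidden history out of the new pack; trying it on a copy of the repository first seems wise.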

cheers,



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc)
  2006-12-12 23:15         ` Martin Langhoff
@ 2006-12-12 23:23           ` Martin Langhoff
  0 siblings, 0 replies; 34+ messages in thread
From: Martin Langhoff @ 2006-12-12 23:23 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git

On 12/13/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> I've been thinking myself of writing a pdumpfs lookalike that uses git
> internally. Sounds like you've got one already ;-)

Actually - what I was considering was mixing the "daily commit" with
GITFS ;-) http://www.sfgoth.com/~mitch/linux/gitfs/

Are your scripts published anywhere?

cheers,



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using git as a general backup mechanism
  2006-12-12 22:49       ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm
  2006-12-12 22:57         ` Johannes Schindelin
  2006-12-12 23:15         ` Martin Langhoff
@ 2006-12-12 23:43         ` Junio C Hamano
  2006-12-14 23:33           ` Steven Grimm
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2006-12-12 23:43 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git

Steven Grimm <koreth@midwinter.com> writes:

> What would be great for this would be to store each day's backup as a
> git revision; with a periodic repack, this would be much more
> space-efficient than the rsync hard links.
>
> The problem is that while that would give me a very efficient backup
> scheme, the repository would still grow over time. In rsync land, I
> solve the disk space issue by keeping two weeks' worth of daily
> snapshots, then six months' worth of weekly snapshots, then two years'
> worth of monthly snapshots; files that change daily have a constant
> number of revisions stored in my backups, and older files drop off the
> backup disk as they age.

Why not use N independent branches?  I'd illustrate only with
two levels below, but you could:

 (0) make a full tree snapshot.  Store the commit in 'daily'
     branch as its tip.

 (1) A new day comes.  Create an empty branch 'daily' if you
     do not already have one.  Make a full tree snapshot, and
     create a parentless commit for the day if the 'daily'
     branch did not exist, or make it a child of the 'daily'
     commit from the previous day if the branch existed.

 (2) End of week comes.  Create an empty branch 'weekly' if you
     do not already have one.  Make a full tree snapshot, and
     create a parentless commit for the week if the 'weekly'
     branch did not exist, or make it a child of the 'weekly'
     commit from the last week.  Discard 'lastweek' branch if
     you have one, and rename 'daily' branch to 'lastweek'.

At the end of month, you can rename 'weekly' to 'lastmonth'; if
you discard previous 'lastmonth' at this point, you essentially
made files older than two months drop off the backup disk.  You
can add more hierarchy with longer period to extend the scheme
ad infinitum.
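A toy Python model of the branch rotation above, to show which snapshots stay reachable. Branches are plain lists standing in for linear chains of commits; the class and method names are made up for the illustration, and nothing here touches a real repository.

```python
class BackupBranches:
    """Toy model: each branch is a list of snapshot labels, oldest first."""

    def __init__(self) -> None:
        self.branches: dict[str, list[str]] = {}

    def _commit(self, branch: str, snapshot: str) -> None:
        # A new branch starts with a parentless commit; otherwise the
        # snapshot becomes a child of the branch's current tip.
        self.branches.setdefault(branch, []).append(snapshot)

    def _rename(self, old: str, new: str) -> None:
        self.branches.pop(new, None)           # discard the previous 'new'
        if old in self.branches:
            self.branches[new] = self.branches.pop(old)

    def new_day(self, snapshot: str) -> None:
        self._commit("daily", snapshot)        # step (1)

    def end_of_week(self, snapshot: str) -> None:
        self._commit("weekly", snapshot)       # step (2)
        self._rename("daily", "lastweek")

    def end_of_month(self) -> None:
        self._rename("weekly", "lastmonth")

    def reachable(self) -> set[str]:
        """Snapshots still referenced by some branch; everything else could be pruned."""
        return {snap for chain in self.branches.values() for snap in chain}
```

Dropping a branch only makes its commits unreachable; blobs shared with surviving snapshots stay, so pruning reclaims just the space unique to the discarded period.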

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc)
  2006-12-12 23:06           ` Steven Grimm
@ 2006-12-13  0:01             ` Johannes Schindelin
  0 siblings, 0 replies; 34+ messages in thread
From: Johannes Schindelin @ 2006-12-13  0:01 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git

Hi,

On Tue, 12 Dec 2006, Steven Grimm wrote:

> Johannes Schindelin wrote:
> > $ git pull --depth 1
> > 
> > Though it needs a server _and_ a client supporting shallow clones, 
> > which support is brewed in "next" right now.
> 
> Will that actually discard old revisions that are already stored 
> locally?

No. A pull should _never_ lose anything from the repository. However, if 
some objects become no-longer reachable (and at the moment it looks like 
we cut off history, even if we should not need to), they can be pruned from
the repo.

Hth,
Dscho

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-12 13:49   ` Kyle Moffett
  2006-12-12 15:53     ` Andy Parkins
@ 2006-12-13 18:10     ` Daniel Barkalow
  2006-12-14  5:06       ` Chris Riddoch
  1 sibling, 1 reply; 34+ messages in thread
From: Daniel Barkalow @ 2006-12-13 18:10 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: git

On Tue, 12 Dec 2006, Kyle Moffett wrote:

> Hmm, ok.  It would seem to be a reasonable requirement that if you want to
> change any of the "preserve_*_attributes" config options you need to blow away
> and recreate your index, no?  I would probably change the underlying index
> format pretty completely and stick a new version tag inside it.

You should be able to promote an insufficient-version index to a 
new-version index that needs to be refreshed for every entry. (And then
update-index would take care of the necessary rewrite-everything in the 
normal way). But I suspect that the right thing is to require that the 
repository be created with a "commits-include-directories-not-trees" flag, 
and this means that you always use the extra-detailed index, and the 
options only affect what information is filtered out in transit between 
the directory object and the index. Having more information in the index 
is merely a potential waste of space, not a correctness issue (we have 
extra information for trees in the index now, remember); it just means 
that there are more things that will cause git to reread the file, rather 
than declaring it unchanged with a stat().

For that matter, it may be best for the directory objects to record what 
information in them is real, and keep the "what's content" mask in the 
index as well. If it changes over the history of a repository, you want to 
correctly interpret the historical commits.
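One way to picture the "what's content" mask: a hypothetical Python sketch in which each directory entry records which attribute groups it really carries, and the repository's content mask decides what survives the trip into the index. The attribute groups and the fallback to 0644/0755 are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from enum import IntFlag
from typing import Optional


class Attr(IntFlag):
    MODE = 1    # full permission bits
    OWNER = 2   # symbolic owner and group
    ACL = 4     # POSIX ACL text


@dataclass(frozen=True)
class Entry:
    mode: int
    owner: Optional[str]
    group: Optional[str]
    acl: Optional[str]
    recorded: Attr   # which attributes this directory entry says are real


def to_index(entry: Entry, content: Attr) -> Entry:
    """Keep only attributes that are both recorded and considered content."""
    keep = entry.recorded & content
    mode = entry.mode if Attr.MODE in keep else (0o755 if entry.mode & 0o100 else 0o644)
    return replace(
        entry,
        mode=mode,
        owner=entry.owner if Attr.OWNER in keep else None,
        group=entry.group if Attr.OWNER in keep else None,
        acl=entry.acl if Attr.ACL in keep else None,
        recorded=keep,
    )
```

Because `recorded` travels with each entry, a commit made before the mask changed is still interpreted by what it actually recorded.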

> Ok, seems straightforward enough.  One other thing that crossed my mind was
> figuring out how to handle hardlinks.  The simplest solution would be to add
> an extra layer of indirection between the "file inode" and the "file data".
> Instead of your directory pointing to a "file-data" blob and "file-attributes"
> object, it would point to a "file-inode" object with embedded attribute data
> and a pointer to the file contents blob.
>
> I remember reading some discussions from the early days of GIT about how that
> was considered and discarded because the extra overhead wouldn't give any real
> tangible benefit.  On the other hand for something like /etc the added
> benefits of tracking extended attributes and hardlinks might outweigh the cost
> of a bunch of extra objects in the database.  A bit of care with the
> construction of the index file should make it sufficiently efficient for
> day-to-day usage.

I was thinking this could be internal to the directory object, but you 
probably want to support hardlinks shared between dentries in different 
directory objects, so you're probably right that this makes sense. 

Alternatively, you could use a single "directory" object for the whole 
state (including subdirectories), making hardlinks out of the object 
clearly impossible, or you could use some scheme for sharing 
sub-"directory" objects that would imply that hardlinks are within an 
object (the hard part here is finding things when their locations aren't 
predictable by name).

	-Daniel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-13 18:10     ` Using GIT to store /etc (Or: How to make GIT store all file permission bits) Daniel Barkalow
@ 2006-12-14  5:06       ` Chris Riddoch
  0 siblings, 0 replies; 34+ messages in thread
From: Chris Riddoch @ 2006-12-14  5:06 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Kyle Moffett, git

So, I've been making little repositories for appropriately related
stuff.  For example, I have a repository for my ~/.bashrc,
~/.bash_profile, ~/.bash_completions/*, and such.

I recall Linus's post in the "VCS Comparison Table" thread, and after
thinking about it, I decided the best thing to do would be to have a
couple extra files tracked in the repository, alongside other data.

I use a backup shell script to copy things from my system to the
repository, and then I run getfacl on it all to write out all the
details to a 'facl' file in my repository.  Then I can make a commit.

Then there's a restore shell script to copy things back to my system,
and restore ownership and permissions with setfacl.

I store the backup and restore scripts in the repository.  Paths are
currently hard-coded.  I'm sure there's a more flexible way to do
this, though I'd need some means of representing the correspondence
between content in the repository and files in my filesystem.
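A Python sketch of the backup/restore pair described above, assuming the `getfacl`/`setfacl` tools from the acl package and a repository that mirrors absolute paths (for example a hypothetical `/home/chris/.bashrc` stored at `<repo>/home/chris/.bashrc`). That mirror layout is one possible answer to the correspondence problem, not the actual scripts.

```python
import shutil
import subprocess
from pathlib import Path


def mirror_path(repo: Path, system_path: Path) -> Path:
    """Location of a system file inside the repository: /etc/hosts -> <repo>/etc/hosts."""
    return repo / system_path.relative_to(system_path.anchor)


def backup(paths: list[Path], repo: Path) -> None:
    for p in paths:
        dst = mirror_path(repo, p)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(p, dst)
    # -p keeps absolute names so that setfacl --restore knows where to apply them
    with open(repo / "facl", "w") as out:
        subprocess.run(["getfacl", "-p", *map(str, paths)], stdout=out, check=True)


def restore(paths: list[Path], repo: Path) -> None:
    for p in paths:
        shutil.copy2(mirror_path(repo, p), p)
    # Re-apply owners, modes and ACLs recorded by getfacl (normally needs root).
    subprocess.run(["setfacl", f"--restore={repo / 'facl'}"], check=True)
```

Deriving the repository path from the absolute system path removes the need to hard-code each correspondence.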


On 12/13/06, Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Tue, 12 Dec 2006, Kyle Moffett wrote:
>
> > Hmm, ok.  It would seem to be a reasonable requirement that if you want to
> > change any of the "preserve_*_attributes" config options you need to blow away
> > and recreate your index, no?  I would probably change the underlying index
> > format pretty completely and stick a new version tag inside it.
>
> You should be able to promote an insufficient-version index to a
> new-version index that needs to be refreshed for every entry. (And then
> update-index would take care of the necessary rewrite-everything in the
> normal way). But I suspect that the right thing is to require that the
> repository be created with a "commits-include-directories-not-trees" flag,
> and this means that you always use the extra-detailed index, and the
> options only affect what information is filtered out in transit between
> the directory object and the index. Having more information in the index
> is merely a potential waste of space, not a correctness issue (we have
> extra information for trees in the index now, remember); it just means
> that there are more things that will cause git to reread the file, rather
> than declaring it unchanged with a stat().
>
> For that matter, it may be best for the directory objects to record what
> information in them is real, and keep the "what's content" mask in the
> index as well. If it changes over the history of a repository, you want to
> correctly interpret the historical commits.
>
> > Ok, seems straightforward enough.  One other thing that crossed my mind
> > was figuring out how to handle hardlinks.  The simplest solution would be
> > to add an extra layer of indirection between the "file inode" and the
> > "file data".  Instead of your directory pointing to a "file-data" blob
> > and a "file-attributes" object, it would point to a "file-inode" object
> > with embedded attribute data and a pointer to the file contents blob.
> >
> > I remember reading some discussions from the early days of GIT about how
> > that was considered and discarded because the extra overhead wouldn't
> > give any real tangible benefit.  On the other hand for something like
> > /etc the added benefits of tracking extended attributes and hardlinks
> > might outweigh the cost of a bunch of extra objects in the database.  A
> > bit of care with the construction of the index file should make it
> > sufficiently efficient for day-to-day usage.
>
> I was thinking this could be internal to the directory object, but you
> probably want to support hardlinks shared between dentries in different
> directory objects, so you're probably right that this makes sense.
>
> Alternatively, you could use a single "directory" object for the whole
> state (including subdirectories), making hardlinks out of the object
> clearly impossible, or you could use some scheme for sharing
> sub-"directory" objects that would imply that hardlinks are within an
> object (the hard part here is finding things when their locations aren't
> predictable by name).
>
> 	-Daniel
> *This .sig left intentionally blank*

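A rough Python model of the "file-inode" indirection discussed above follows. The `Inode` record, its fields, and the `link_group` number are hypothetical. Only `blob_id` follows git's real blob hashing, which is SHA-1 over a `blob <size>` header, a NUL byte, and then the data. Because objects are content-addressed, two unrelated files with identical data and attributes would otherwise hash to the same inode object and look like hardlinks. To prevent that, the sketch tags each on-disk inode with a per-snapshot `link_group`. The `same_content` function illustrates Daniel's idea of a mask that says which recorded attributes count as content.

```python
import hashlib
from dataclasses import dataclass


def blob_id(data: bytes) -> str:
    """Git's blob id: SHA-1 over b'blob <size>\\0' followed by the data."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


@dataclass(frozen=True)
class Inode:
    """Hypothetical 'file-inode' object: attributes plus a pointer to the data blob."""
    blob: str           # blob_id of the file contents
    mode: int
    uid: int
    gid: int
    link_group: int     # hypothetical per-snapshot tag, one per on-disk inode
    xattrs: tuple = ()  # sorted (name, value) pairs

    def object_id(self) -> str:
        # Toy serialization; this is NOT how git hashes any real object type.
        text = repr((self.blob, self.mode, self.uid, self.gid,
                     self.link_group, self.xattrs))
        return hashlib.sha1(text.encode()).hexdigest()


def hardlink_groups(directory):
    """directory maps name -> inode object id; shared ids mean hardlinked names."""
    groups = {}
    for name, inode_id in directory.items():
        groups.setdefault(inode_id, []).append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Which recorded attributes count as "content"; ownership and xattrs left out here.
CONTENT_MASK = frozenset({"blob", "mode"})


def same_content(a, b, mask=CONTENT_MASK):
    """Compare two inode objects only on the attributes the mask marks as real."""
    return all(getattr(a, name) == getattr(b, name) for name in mask)
```

Keeping the mask outside the stored objects means the objects always record everything. Changing which attributes count as content then only changes how commits are compared, not what history contains.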

-- 
epistemological humility

* Re: Using git as a general backup mechanism
  2006-12-12 23:43         ` Using git as a general backup mechanism Junio C Hamano
@ 2006-12-14 23:33           ` Steven Grimm
  2006-12-15  0:33             ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Steven Grimm @ 2006-12-14 23:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
>  (2) End of week comes.  Create an empty branch 'weekly' if you
>      do not already have one.  Make a full tree snapshot, and
>      create a parentless commit for the week if the 'weekly'
>      branch did not exist, or make it a child of the 'weekly'
>      commit from the last week.  Discard 'lastweek' branch if
>      you have one, and rename 'daily' branch to 'lastweek'.

That sounds like it'd work, but doesn't it imply that the history of a 
given file in the backups is not continuous? That is, an old copy of a 
file on the "weekly" branch doesn't have any kind of ancestor 
relationship with the same file on the "daily" branch? While that's 
obviously no different than the current git-less situation where there's 
no notion of ancestry at all, it'd be neat if this backup scheme could 
actually track long-term changes to individual files.

I wonder if rebasing can get me what I want. Something like:

(1) Make a new branch from the latest daily. Commit a full tree
    snapshot to the new branch. (Each branch has exactly one commit.)

(2) To expire a daily backup, rebase the second-oldest daily branch,
    which will initially be a child of the oldest daily branch, under
    the latest weekly branch instead. Delete the oldest daily branch.
    I believe the right commands here would be:

    git-rebase -s recursive -s ours --onto latest-weekly \
               oldest-daily second-oldest-daily
    git-branch -D oldest-daily

    (Not sure about the double "-s", but I want it to detect renames
    where possible and never flag any conflicts.)

(3) At the end of the week, instead of expiring the oldest daily
    branch, rename it to indicate that it's now a weekly snapshot.
    (That will implicitly do the first part of step 2, since the
    next daily branch in line will already be a descendant of the
    newly renamed branch.)

    Repeat step 2, rebasing against the latest monthly branch,
    to expire the oldest weekly.

(4) To expire an old monthly, rebase the second-oldest monthly branch
    under the initial empty revision, then delete the oldest monthly.
    This is basically step 2 again, but rebasing under a fixed starting
    point.

(5) Run git-prune to expire the objects in the deleted branches, then
    git-repack -a -d to delta-compress everything.

That's a bit convoluted, admittedly, and probably a perversion of 
everything pure about the branch system, but would it work? The big 
thing I'm not sure about here is whether, after doing my rebase and 
delete in step 2, the objects from the oldest daily will actually be 
removed by git-prune. They should be unreachable at that point, I think.
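The following is a toy reachability model of steps (2) and (5). It uses plain dicts for branch tips and commit parents, and the function names are hypothetical. Trees, merges, and reflogs are ignored; in current git, reflog entries also keep commits from being pruned until they expire. Under these assumptions, rebasing only the second-oldest daily leaves the later daily branches descending from the old commits, so the oldest daily stays reachable. Every later daily has to be rewritten as well, which is similar to the cascade Junio describes in his reply.

```python
def reachable(branches, parents):
    """All commits reachable from any branch tip.

    branches: branch name -> commit id
    parents:  commit id -> list of parent commit ids
    """
    seen, stack = set(), list(branches.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def rewrite(commit, parents, mapping):
    """Return the id `commit` has once ancestors are replaced according to `mapping`.

    A rewritten commit gets a new id (the old id plus a prime); trees are
    not modeled.  Results are memoized in `mapping`.
    """
    if commit not in mapping:
        old = parents[commit]
        new = [rewrite(p, parents, mapping) for p in old]
        if new == old:
            mapping[commit] = commit          # nothing underneath changed
        else:
            mapping[commit] = commit + "'"
            parents[commit + "'"] = new
    return mapping[commit]


def expire(branches, parents, branch, replacement, rebase):
    """Delete `branch` and rebase the named branches off its commit onto `replacement`."""
    mapping = {branches.pop(branch): replacement}
    for name in rebase:
        branches[name] = rewrite(branches[name], parents, mapping)
```

For example, consider dailies D1 <- D2 <- D3 on top of weekly W. Expiring D1 while rebasing only D2 leaves D1 reachable through D3 -> D2 -> D1. Rebasing both D2 and D3 makes it unreachable.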


* Re: Using git as a general backup mechanism
  2006-12-14 23:33           ` Steven Grimm
@ 2006-12-15  0:33             ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2006-12-15  0:33 UTC (permalink / raw)
  To: Steven Grimm; +Cc: git

Steven Grimm <koreth@midwinter.com> writes:

> Junio C Hamano wrote:
>>  (2) End of week comes.  Create an empty branch 'weekly' if you
>>      do not already have one.  Make a full tree snapshot, and
>>      create a parentless commit for the week if the 'weekly'
>>      branch did not exist, or make it a child of the 'weekly'
>>      commit from the last week.  Discard 'lastweek' branch if
>>      you have one, and rename 'daily' branch to 'lastweek'.
>
> That sounds like it'd work, but doesn't it imply that the history of a
> given file in the backups is not continuous? That is, an old copy of a
> file on the "weekly" branch doesn't have any kind of ancestor
> relationship with the same file on the "daily" branch? While that's
> obviously no different than the current git-less situation where
> there's no notion of ancestry at all, it'd be neat if this backup
> scheme could actually track long-term changes to individual files.

You can keep them connected by rewriting the history of a bounded
number of commits.  When you start a new week, you would make
the Monday commit a child of the tip of the weekly branch that
represents the latest weekly snapshot.  Then on Friday, the
history would show the 5 commits during the week and behind that
would be a sequence of commits with one-per-week granularity.
When you rotate the week's daily log out and the commit for
Monday is based on the weekly history you are going to toss out,
you may need to rebase that week's daily log branch.

Let's say your policy is to keep the daily log for at least one
week and enough end-of-week weekly logs.  Let's say it is
week #2 right now.

                        Aooo... (week #2 daily)
                       /|
                ooooooB |  (week #1 daily)
               /        |
     o--------o---------C (end-of-week weekly log)

The first commit in this week's daily log (A) would have two
parents: last commit from daily log of week #1 (B), and the
latest commit on the end-of-week weekly log (C).  Most likely, B
and C would have exactly the same tree.  That way, you would
have at least 7 days of daily log; at the end of this week you
would have close to 14 days but "keeping at least one week" is
satisfied.

When starting the 3rd week, you will discard the 1st week's log;
you would need to rewrite 7 days' worth of commits from week #2,
because the first commit of week #2 should now only have one
parent (C), and you would forget the commit on the last day of
week #1 as its parent (B).  This cascades through the 7 commits
you made during week #2.  You are not changing any trees, so this
should be quite efficient.

Then the first daily commit of 3rd week would have two parents,
the commit at the end of week #2 daily branch (D), and a new
commit (E) at the tip of the end-of-week log.  Again, D and E
would have the identical trees.

                                o...... (week #3 daily)
                               /|
                        Aooo..D |  (week #2 daily)
                        |       |
 (week #1 daily - gone) |       |
                        |       |
     o--------o---------C-------E (end-of-week weekly log)
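The following is a toy Python model of the rewrite described above for the start of week #3. Commit ids here are a truncated SHA-1 over the tree name and parent list. This is a stand-in for git's real commit format, and the function names are hypothetical. Dropping B from the first commit of week #2 changes its id, so every later commit in the chain gets a new id too, while every tree stays the same.

```python
import hashlib


def make_commit(store, tree, parents):
    """Store a commit and return its id.

    The id is a truncated SHA-1 over the tree name and parent ids: a toy
    stand-in for git's commit format that keeps the key property that
    changing a parent changes the id.
    """
    cid = hashlib.sha1(repr((tree, tuple(parents))).encode()).hexdigest()[:12]
    store[cid] = (tree, list(parents))
    return cid


def drop_parent(store, chain, dropped):
    """Rewrite a linear chain (oldest first) so no commit lists `dropped` as a parent.

    Trees are reused untouched; only parent lists change, but a new parent id
    forces a new id for every later commit in the chain.
    """
    renamed, new_chain = {}, []
    for cid in chain:
        tree, parents = store[cid]
        new_parents = [renamed.get(p, p) for p in parents if p != dropped]
        renamed[cid] = make_commit(store, tree, new_parents)
        new_chain.append(renamed[cid])
    return new_chain
```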

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2006-12-10 15:06 ` Santi Béjar
  2006-12-10 17:46   ` Kyle Moffett
@ 2007-01-10  1:39   ` David Lang
  2007-01-10  2:30     ` Shawn O. Pearce
  1 sibling, 1 reply; 34+ messages in thread
From: David Lang @ 2007-01-10  1:39 UTC (permalink / raw)
  To: git

I want to have a tripwire-like system checking the files to make sure that they
haven't changed unexpectedly. The program I'm looking at notices inode as well
as timestamp and content changes.

When you check out a file from git, will it rewrite/overwrite a file that hasn't
changed, or will it realize there is no change and leave it as-is?

Does this answer change if there is a trigger on checkout (to change permissions
or otherwise manipulate the file)?

David Lang

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2007-01-10  1:39   ` David Lang
@ 2007-01-10  2:30     ` Shawn O. Pearce
  2007-01-10 18:34       ` David Lang
  0 siblings, 1 reply; 34+ messages in thread
From: Shawn O. Pearce @ 2007-01-10  2:30 UTC (permalink / raw)
  To: David Lang; +Cc: git

David Lang <david.lang@digitalinsight.com> wrote:
> I want to have a tripwire-like system checking the files to make sure that
> they haven't changed unexpectedly. The program I'm looking at notices inode
> as well as timestamp and content changes.
> 
> When you check out a file from git, will it rewrite/overwrite a file that
> hasn't changed, or will it realize there is no change and leave it as-is?

If the stat data is current it will leave it as-is.  You can force
the index to refresh with `git update-index --refresh` or by running
git status.
 
> Does this answer change if there is a trigger on checkout (to change
> permissions or otherwise manipulate the file)?

Only if the trigger does something in addition, like forcibly
overwriting files.  But we don't have a checkout trigger.  So there's
no trigger.

-- 
Shawn.

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2007-01-10  2:30     ` Shawn O. Pearce
@ 2007-01-10 18:34       ` David Lang
  2007-01-12  0:55         ` Shawn O. Pearce
  0 siblings, 1 reply; 34+ messages in thread
From: David Lang @ 2007-01-10 18:34 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

On Tue, 9 Jan 2007, Shawn O. Pearce wrote:

> David Lang <david.lang@digitalinsight.com> wrote:
>> I want to have a tripwire-like system checking the files to make sure that
>> they haven't changed unexpectedly. The program I'm looking at notices inode
>> as well as timestamp and content changes.
>>
>> When you check out a file from git, will it rewrite/overwrite a file that
>> hasn't changed, or will it realize there is no change and leave it as-is?
>
> If the stat data is current it will leave it as-is.  You can force
> the index to refresh with `git update-index --refresh` or by running
> git status.

I was looking at checkout, not checkin, so I don't understand how the index is
involved here.

>> Does this answer change if there is a trigger on checkout (to change
>> permissions or otherwise manipulate the file)?
>
> Only if the trigger does something in addition, like forcibly
> overwriting files.  But we don't have a checkout trigger.  So there's
> no trigger.

We don't have a checkout trigger? I thought that what Linus had suggested for
permissions was to have a script triggered on checkin that stored the
permissions of the files, and a script triggered on checkout that set the
permissions from the stored file.

If there isn't a checkout trigger, how would the permissions ever get set?

In my particular case I'd like to have the checkin run a script that produces a
'generic' version of each file, and the checkout run a script that converts the
generic version into the host-specific version. I already have a script that
does this work (and (ab)uses ssh to propagate the generic version to other hosts
and create the host-specific versions there), but I was interested in using git
to add better version control to the generic versions of the files (I currently
use RCS on each box to version control the host-specific versions).
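David's existing conversion script isn't shown. As a purely hypothetical sketch of the two directions he describes, the Python below swaps host-specific values for `@NAME@` placeholders and back. The placeholder syntax and the per-host dict of values are assumptions. A value is only replaced when it is not directly adjacent to a letter, digit, or underscore, so `10.0.0.1` inside `10.0.0.12` is left alone. Round-tripping only works when each value appears solely where it really is host-specific.

```python
import re


def generalize(text, host_values):
    """Turn a host-specific file into the generic form using @NAME@ placeholders.

    host_values maps a (hypothetical) placeholder name to this host's value.
    Longer values are replaced first so one value containing another is
    handled whole.
    """
    for name, value in sorted(host_values.items(), key=lambda kv: -len(kv[1])):
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        text = re.sub(pattern, lambda m, n=name: "@%s@" % n, text)
    return text


def specialize(text, host_values):
    """Expand @NAME@ placeholders into this host's values; unknown names are kept."""
    return re.sub(r"@([A-Z0-9_]+)@",
                  lambda m: host_values.get(m.group(1), m.group(0)),
                  text)
```

The generic form is what would be committed. Each host would run `specialize` with its own values after checkout.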

David Lang

* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
  2007-01-10 18:34       ` David Lang
@ 2007-01-12  0:55         ` Shawn O. Pearce
  0 siblings, 0 replies; 34+ messages in thread
From: Shawn O. Pearce @ 2007-01-12  0:55 UTC (permalink / raw)
  To: David Lang; +Cc: git

David Lang <david.lang@digitalinsight.com> wrote:
> On Tue, 9 Jan 2007, Shawn O. Pearce wrote:
> >If the stat data is current it will leave it as-is.  You can force
> >the index to refresh with `git update-index --refresh` or by running
> >git status.
> 
> I was looking at checkout, not checkin, so I don't understand how the
> index is involved here.

During checkout we use the index to help us decide if a file needs
to be updated with new content or can be left as-is.  It's a cache of
what version each file is at, and it relies on the file stat data
(dev, inode, modification date, etc.) to tell us whether the file has
been modified or was last created by Git.  If Git was the one that
last modified the file and the version stored in the index matches
the version needed during the checkout, the file is left alone.
But if anything differs then the file gets overwritten.
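The following is a simplified Python model of the decision Shawn describes. The index caches stat data for each path along with the blob id Git last wrote there. Checkout rewrites the file unless both still match. The `StatInfo` fields and function names are hypothetical. Real git does more than this. When stat data is stale, it can compare the file content, which is what `git update-index --refresh` does. It also refuses to overwrite local modifications unless forced.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class StatInfo:
    """Stat fields cached per path; a subset of what git's index actually stores."""
    dev: int
    ino: int
    mtime_ns: int
    ctime_ns: int
    size: int
    mode: int

    @classmethod
    def of(cls, path):
        st = os.lstat(path)
        return cls(st.st_dev, st.st_ino, st.st_mtime_ns, st.st_ctime_ns,
                   st.st_size, st.st_mode)


def needs_write(path, index, wanted_blob):
    """Return True if checkout has to write `path` with the blob `wanted_blob`.

    index maps path -> (StatInfo recorded when checkout last wrote the file,
    id of the blob it wrote).  Following Shawn's description, the file is
    left alone only if both the version and the stat data still match.
    """
    entry = index.get(path)
    if entry is None or not os.path.lexists(path):
        return True
    recorded, blob = entry
    if blob != wanted_blob:
        return True
    return StatInfo.of(path) != recorded
```

In this model, a file that Git wrote and nobody touched since keeps its inode and timestamps across a checkout that wants the same version. That is the property a tripwire-style checker would care about.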
 
> >>Does this answer change if there is a trigger on checkout (to change
> >>permissions or otherwise manipulate the file)?
> >
> >Only if the trigger does something in addition, like forcibly
> >overwriting files.  But we don't have a checkout trigger.  So there's
> >no trigger.
> 
> We don't have a checkout trigger?

No.

> I thought that what Linus had suggested 
> for permissions was to have a script triggered on checkin that stored the 
> permissions of the files, and a script triggered on checkout that set the 
> permissions from the stored file.

Yes.  It is what he suggested.

> If there isn't a checkout trigger, how would the permissions ever get set?

Someone needs to implement support for a post-checkout trigger.  _Then_
a checkout trigger could perform this action.

> In my particular case I'd like to have the checkin run a script that
> produces a 'generic' version of each file,

You may be able to do that in the pre-commit hook by updating the index.

-- 
Shawn.

end of thread

Thread overview: 34+ messages
2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett
2006-12-10 14:49 ` Jeff Garzik
2006-12-10 15:30   ` Jakub Narebski
2006-12-10 18:10     ` Kyle Moffett
2006-12-10 18:18       ` Jakub Narebski
2006-12-10 18:26       ` Jakub Narebski
2006-12-10 18:35         ` Kyle Moffett
2006-12-11 10:39           ` Andreas Ericsson
2006-12-11 10:55             ` Jeff Garzik
2006-12-11 12:13             ` Josef Weidendorfer
2006-12-11 13:33               ` Johannes Schindelin
2006-12-11 15:07                 ` Josef Weidendorfer
2006-12-10 15:06 ` Santi Béjar
2006-12-10 17:46   ` Kyle Moffett
2006-12-10 18:10     ` Jakub Narebski
2007-01-10  1:39   ` David Lang
2007-01-10  2:30     ` Shawn O. Pearce
2007-01-10 18:34       ` David Lang
2007-01-12  0:55         ` Shawn O. Pearce
2006-12-11 10:50 ` Nikolai Weibull
2006-12-12  3:45 ` Daniel Barkalow
2006-12-12 13:49   ` Kyle Moffett
2006-12-12 15:53     ` Andy Parkins
2006-12-12 22:49       ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm
2006-12-12 22:57         ` Johannes Schindelin
2006-12-12 23:06           ` Steven Grimm
2006-12-13  0:01             ` Johannes Schindelin
2006-12-12 23:15         ` Martin Langhoff
2006-12-12 23:23           ` Martin Langhoff
2006-12-12 23:43         ` Using git as a general backup mechanism Junio C Hamano
2006-12-14 23:33           ` Steven Grimm
2006-12-15  0:33             ` Junio C Hamano
2006-12-13 18:10     ` Using GIT to store /etc (Or: How to make GIT store all file permission bits) Daniel Barkalow
2006-12-14  5:06       ` Chris Riddoch
