From: Kyle Moffett <mrmacman_g4@mac.com>
To: Daniel Barkalow <barkalow@iabervon.org>
Cc: git@vger.kernel.org
Subject: Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits)
Date: Tue, 12 Dec 2006 08:49:26 -0500
Message-ID: <8900B938-1360-4A67-AB15-C9E84255107B@mac.com>
In-Reply-To: <Pine.LNX.4.64.0612111837210.20138@iabervon.org>
On Dec 11, 2006, at 22:45:25, Daniel Barkalow wrote:
> The first thing you'd want to do is correct the fact that the index
> doesn't keep full permissions. We decided long ago that we don't
> want to track more than 0100, but we're discarding the rest between
> the filesystem and the index, rather than between the index and the
> tree. (This is weird of us, since we keep gid and uid in the index,
> as changedness heuristics, but don't keep permissions; of course,
> we'd have to apply umask to the index when we check it out to sync
> what we expect to be there with what has actually been created.)
>
> I think that would be the only change needed to the index and index/
> working directory connection, although it might be necessary to
> support longer values for uid/gid/etc, since they'd be important
> data now.
Hmm, ok. It would seem to be a reasonable requirement that if you
want to change any of the "preserve_*_attributes" config options you
need to blow away and recreate your index, no? I would probably
change the underlying index format pretty completely and stick a new
version tag inside it.
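
A minimal sketch of the version-tag check, assuming the existing index header layout (a 4-byte "DIRC" signature, a 32-bit version, and a 32-bit entry count, all big-endian) and assuming version 2 for the existing format. EXTENDED_VERSION and both helper functions are hypothetical names used only for illustration:

```python
import struct

INDEX_SIGNATURE = b"DIRC"   # signature at the start of git's index file
CURRENT_VERSION = 2         # assumed version of the existing index format
EXTENDED_VERSION = 99       # hypothetical tag for an attribute-preserving format


def make_header(version, entry_count):
    """Pack the 12-byte header: signature, version, entry count (big-endian)."""
    return struct.pack(">4sII", INDEX_SIGNATURE, version, entry_count)


def needs_rebuild(header, wanted_version):
    """Return True if the index must be recreated for the wanted format."""
    signature, version, _entries = struct.unpack(">4sII", header[:12])
    if signature != INDEX_SIGNATURE:
        raise ValueError("not an index file")
    return version != wanted_version


old = make_header(CURRENT_VERSION, 0)
print(needs_rebuild(old, EXTENDED_VERSION))  # True: format changed, rebuild
```

Flipping a "preserve_*_attributes" option would then amount to changing the wanted version, and a mismatch forces the index to be recreated.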
> Note that git only stores content, not incidental information. But
> a lot of information which is incidental in a source tree is
> content in /etc. This implies that /etc and working/linux-2.6 are
> fundamentally different sorts of things, because different aspects
> of them are content.
Ahh, I hadn't thought of it that way before but that makes a lot of
sense. Thanks!
> I'd suggest a new object type for a directory with permissions,
> ACLs, and so forth. It should probably use symbolic owner and
> group, too. My guess is that you'll want to use "commit"s, the new
> object type, and "blob"s. Everything that uses trees would need to
> have a version that uses the new type. But I think that you
> generally want different behavior anyway, so that's not a major issue.
Ok, seems straightforward enough. One other thing that crossed my
mind was figuring out how to handle hardlinks. The simplest solution
would be to add an extra layer of indirection between the "file
inode" and the "file data". Instead of your directory pointing to a
"file-data" blob and "file-attributes" object, it would point to a
"file-inode" object with embedded attribute data and a pointer to the
file contents blob.
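
A rough sketch of that indirection, assuming a git-style content-addressed store (object name = SHA-1 of a "type length\0" header plus payload). The in-memory `store`, the "inode" object type, and its JSON encoding are all hypothetical and only illustrate the shape of the idea:

```python
import hashlib
import json

store = {}  # hypothetical in-memory object database: object name -> bytes


def put(kind, payload):
    """Store an object git-style: hash the header plus payload."""
    data = f"{kind} {len(payload)}\0".encode() + payload
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data
    return oid


def put_inode(blob_oid, mode, owner, group):
    """Hypothetical "file-inode" object: attributes plus a pointer to the blob."""
    body = json.dumps(
        {"blob": blob_oid, "mode": mode, "owner": owner, "group": group},
        sort_keys=True,
    ).encode()
    return put("inode", body)


blob = put("blob", b"nameserver 192.0.2.1\n")
inode = put_inode(blob, 0o644, "root", "root")

# Two directory entries naming the same inode object model a hardlink.
directory = {"resolv.conf": inode, "resolv.conf.link": inode}

# Same contents with different attributes: shared blob, distinct inode.
other = put_inode(blob, 0o600, "root", "root")
```

One caveat with this naive version: two unrelated files with identical contents and attributes would hash to the same inode object and so look like hardlinks, so a real design would presumably need some extra link identifier in the inode.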
I remember reading some discussions from the early days of GIT about
how that was considered and discarded because the extra overhead
wouldn't give any real tangible benefit. On the other hand, for
something like /etc the added benefits of tracking extended
attributes and hardlinks might outweigh the cost of a bunch of extra
objects in the database. A bit of care with the construction of the
index file should make it sufficiently efficient for day-to-day usage.
If you're interested in some random musings about using GIT concepts
to version whole filesystems (think checkpointing your disk drive and
instantly restoring when you screw up), read on below, otherwise
don't bother.
Cheers,
Kyle Moffett
<Random Tangential Off-the-Wall Thought Experiment>
NOTE: This probably belongs in its own thread but it's such a
random, undeveloped, and off-the-wall concept that I threw it in here
just for kicks.
Imagine combining extensions like those described above with something
like the Ext3 block-allocation, inode-management and journalling code to
produce a "versioned filesystem". With the exponential growth of
storage density over the last several years we've gotten to the point
where we can store many, many hours of extremely realistic video and
audio on your average small-computer drive. Versioning your home
directory, or even your entire computer, even with fairly steady
modifications to multimedia files, installation of software programs,
etc, doesn't seem like such an impossible undertaking anymore.
One predefined inode would contain a list of tags/heads and their
current hashes. Mount the filesystem with a "tag=$TAG" option to
specify the initial tree object used for the root directory (with
syscalls to navigate the history). Allocate an inode per-mount to
represent any changes from the last commit.
For efficiency purposes (no need to revision the entire system when I
commit a change in my home directory) add a "subtree" object type
which can specify either a particular hash or a symbolic tag/head
name as a pseudo sub-mountpoint. Trap traversal of the sub-
mountpoint node to mount the filesystem with "tag=$SUBTAG" on the sub-
mountpoint, expiring it some time after the last traversal.
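
A small sketch of how a "subtree" reference might be resolved, assuming 40-character hex hashes and a heads table like the one kept in the predefined inode. The table contents and the `resolve` helper are hypothetical:

```python
import re

HASH_RE = re.compile(r"[0-9a-f]{40}")

# Hypothetical contents of the predefined "heads" inode: name -> commit hash.
heads = {
    "system": "1" * 40,
    "home-kyle": "2" * 40,
}


def resolve(ref, heads):
    """Resolve a subtree reference: either a literal hash or a head name."""
    if HASH_RE.fullmatch(ref):
        return ref        # pinned to one exact version
    return heads[ref]     # follows whatever the head currently points to


pinned = resolve("3" * 40, heads)
floating = resolve("home-kyle", heads)

# A commit in the home directory only moves that head; the parent tree,
# which names the subtree symbolically, does not need a new revision.
heads["home-kyle"] = "4" * 40
moved = resolve("home-kyle", heads)
```

A head name that itself looks like a 40-character hex string would be ambiguous here, so a real format would want an explicit flag for which kind of reference is stored.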
The only remaining issue would be properly navigating through the
history, preserving or discarding changes. Since the kernel could
easily manage copy-on-write semantics for underlying disk blocks you
wouldn't need a separate "working copy" except where it's modified
from the original, and discarding changes is as simple as unlinking
any files referenced by the per-mount delta inode.
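
The per-mount delta can be sketched roughly as an overlay, assuming whole-file granularity instead of disk blocks and ignoring deletions. The `Mount` class and its methods are hypothetical names for illustration:

```python
class Mount:
    """Overlay of uncommitted changes on top of a read-only committed tree."""

    def __init__(self, committed):
        self.committed = committed  # path -> bytes, never modified here
        self.delta = {}             # stands in for the per-mount delta inode

    def read(self, path):
        # Modified files come from the delta, everything else from the commit.
        if path in self.delta:
            return self.delta[path]
        return self.committed[path]

    def write(self, path, data):
        # Copy-on-write: the committed version is left untouched.
        self.delta[path] = data

    def discard(self):
        # Dropping all changes is just unlinking what the delta references.
        self.delta.clear()


tree = {"/etc/hostname": b"old-name\n"}
mount = Mount(tree)
mount.write("/etc/hostname", b"new-name\n")
changed = mount.read("/etc/hostname")
mount.discard()
restored = mount.read("/etc/hostname")
```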
Committing changes would get tricky: you would need to hot-remap
memory-mapped pages read-only while you checksum and store them. The
next write attempt would then separate the page from the freshly-
committed on-disk version. Would need a mechanism for applications
to "trap" the commit so they could make databases consistent, with
the ability for root or the mountpoint owner to commit without
waiting for synchronization. Only needs to synchronize files
belonging to the new commit. Merges would be managed from userspace,
as long as there is a way to browse through objects by hash given
sufficient permissions.
Make sure it's really easy to make a new atomic commit and/or reset
to a known state every time the computer is rebooted (whether soft-
rebooted or via crash/powerkill). With journalling and the write-
once nature of GIT it would be trivial to never require an fsck run.
Also needs a way to move data between filesystems. Makes LVM largely
irrelevant; it doesn't matter how many disks you have if they're all
treated as a shared storage pool for your GITfs data. Make sure it's
possible to archive data onto slower disks/media and purge older
commits from the archive (missing parent commit references are
tolerable in many situations). Needs a way to notice hash collisions
and take action to avoid them.
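
Noticing a collision could look something like this sketch: before storing an object, compare it byte for byte with whatever already lives under that name. The `store_object` helper and `HashCollision` exception are hypothetical, and what action to take afterwards is left open:

```python
import hashlib


class HashCollision(Exception):
    """Two different byte strings produced the same object name."""


def sha1_name(data):
    return hashlib.sha1(data).hexdigest()


def store_object(db, data, hash_fn=sha1_name):
    """Store data under its hash, refusing to overwrite different content."""
    oid = hash_fn(data)
    existing = db.get(oid)
    if existing is not None and existing != data:
        raise HashCollision(oid)
    db[oid] = data
    return oid


db = {}
first = store_object(db, b"hello\n")
again = store_object(db, b"hello\n")  # same content, same name, no error
```

The comparison costs a read of the existing object on every duplicate write, which is the price of not trusting the hash alone.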
</Random Tangential Off-the-Wall Thought Experiment>
Cheers,
Kyle Moffett