From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Julian Phillips" <julian@quantumfyre.co.uk>
Cc: "Git Mailing List" <git@vger.kernel.org>
Subject: Re: Git's database structure
Date: Tue, 4 Sep 2007 13:30:30 -0400
Message-ID: <9e4733910709041030ye912369nd574a5f78d3f521b@mail.gmail.com>
In-Reply-To: <Pine.LNX.4.64.0709041816340.29009@reaper.quantumfyre.co.uk>
On 9/4/07, Julian Phillips <julian@quantumfyre.co.uk> wrote:
> On Tue, 4 Sep 2007, Jon Smirl wrote:
>
> > Let's back up a little bit from "Calculating tree node". What are the
> > elements of git's data structures?
> >
> > Right now we have an index structure (tree nodes) integrated into a
> > base table. Integrating indexing into the data is not normally done in
> > a database. Doing a normalization analysis like this may expose flaws
> > in the way the data is structured. Of course we may also decide to
> > leave everything the way it is.
> >
> > What about the special status of a rename? In the current model we
> > effectively have three tables.
> >
> > commit - a set of all SHAs in the commit, previous commit, comment, author, etc
> > blob - a file, permissions, etc.
> > file names - name, SHA
> >
> > The file name table is encoded as an index and it has been
> > intermingled with the commit table.
> >
> > Looking at this from a set theory angle brings up the question, do we
> > really have three tables and file names are an independent variable
> > from the blobs, or should file names be an attribute of the blob?
>
> There isn't a one-to-one mapping of file names to blobs. The blob only
> describes the contents of the file. In the extreme case you could have
> one blob for every single file in your tree. For example:
>
> # git ls-tree -r HEAD
> 100644 blob 05303ef858aeeb01ca40590dd6fe65928096ee6c bar/foo
> 100644 blob 05303ef858aeeb01ca40590dd6fe65928096ee6c foo
> 100644 blob 05303ef858aeeb01ca40590dd6fe65928096ee6c foo2
> 100644 blob 05303ef858aeeb01ca40590dd6fe65928096ee6c foo3
> 100644 blob 05303ef858aeeb01ca40590dd6fe65928096ee6c foo4
> 100644 blob 05303ef858aeeb01ca40590dd6fe65928096ee6c foo5
> 100644 blob 05303ef858aeeb01ca40590dd6fe65928096ee6c foo6
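
The listing above follows from how a blob ID is derived: it is computed from the file contents alone, so the path never enters into it. A minimal sketch, assuming the SHA-1 object format (a "blob <size>" header, a NUL byte, then the contents); the paths and contents in `tree` are made up for illustration:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # The hashed header carries only the object type and size, never a path.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working tree: three paths with identical contents.
tree = {
    "bar/foo": b"hello\n",
    "foo": b"hello\n",
    "foo2": b"hello\n",
}

ids = {path: git_blob_id(data) for path, data in tree.items()}
distinct_blobs = set(ids.values())

for path, blob_id in sorted(ids.items()):
    print("100644 blob", blob_id, path)
print(len(distinct_blobs), "distinct blob(s) for", len(ids), "paths")
```

Every path maps to the same ID, which is why one blob can sit under any number of names.
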

Both schemes support aliasing. In the flat scheme you would create a
second blob that contains the file and the aliased path name. When
the blobs are delta-compressed, the second copy of the file contents
disappears.

I'm not proposing a change to the data being stored in git; this is a
proposal to consider the impact of how that data has been normalized
in the data store.

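
To make the comparison concrete, here is a rough sketch of the two normalizations as plain Python dictionaries. It is an illustration only, not git's on-disk format: the helper names, the sample paths, and the way the flat scheme keys a blob by path plus contents are all assumptions made for the example.

```python
import hashlib


def sha(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


content = b"same contents\n"
aliases = ("foo", "foo2")

# Scheme A: path names live in their own table, one row per alias,
# and every row points at the single stored blob.
blobs_a = {sha(content): content}
paths_a = [(path, sha(content)) for path in aliases]

# Scheme B ("flat"): the path is an attribute of the blob, so each
# alias becomes its own blob holding the path and the contents.
# Keying on path + contents is an assumption of this sketch.
blobs_b = {
    sha(path.encode() + b"\x00" + content): (path, content)
    for path in aliases
}


def files_a() -> dict:
    """Rebuild {path: contents} by joining the path table to the blobs."""
    return {path: blobs_a[key] for path, key in paths_a}


def files_b() -> dict:
    """Rebuild {path: contents} by reading the attribute off each blob."""
    return {path: data for path, data in blobs_b.values()}


print(files_a() == files_b())      # same information either way
print(len(blobs_a), len(blobs_b))  # stored blobs per scheme
```

Both layouts reconstruct the same path-to-contents mapping. The difference is that the flat scheme stores the contents once per alias and would have to rely on delta compression to get the duplicate back out.
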
> > How this gets structured in the db is an independent question about
> > how renames get detected on a commit. The current scheme for detecting
> > renames by comparing diffs is working fine. The question is, once we
> > detect a rename how should it be stored?
> >
> > Ignoring the performance impacts and looking at the problem from the
> > set theory view point, should:
> > the pathnames be in their own table with a row for each alias
> > the pathnames be stored as an attribute of the blob
> >
> > Both of these are the same information, we're just looking at how
> > things are normalized.
> >
> >
>
> --
> Julian
>
> ---
> "You shouldn't make my toaster angry."
> -- Household security explained in "Johnny Quest"
>
--
Jon Smirl
jonsmirl@gmail.com