From: Mike Hommey <mh@glandium.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Git's database structure
Date: Tue, 4 Sep 2007 20:04:29 +0200
Message-ID: <20070904180429.GA626@glandium.org>
In-Reply-To: <9e4733910709041044r71264346n341d178565dd0521@mail.gmail.com>
On Tue, Sep 04, 2007 at 01:44:47PM -0400, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 9/4/07, Junio C Hamano <gitster@pobox.com> wrote:
> > "Jon Smirl" <jonsmirl@gmail.com> writes:
> >
> > > Another way of looking at the problem,
> > >
> > > Let's build a full-text index for git. You put a string into the index
> > > and it returns the SHAs of all the file nodes that contain the string.
> > > How do I recover the path names of these SHAs?
> >
> > That question does not make much sense without specifying "which
> > commit's path you are talking about".
> >
> > If you want to encode such "contextual information" in addition
> > to "contents", you could do so, but you essentially need to
> > record commit + pathname + mode bits + contents as "blob" and
> > hash that to come up with a name.
>
> I left the details out of the full-text example to make it more
> obvious that we can't recover the path names.
>
> Doing this type of analysis may point out that even more fields are
> missing from the blob table such as commit id.
>
> The current data store design is not very flexible. Databases solved
> the flexibility problem long ago. I'm just wondering if we should
> steal some good ideas out of the database world and apply them to git.
> Ten years from now we may have 100GB git databases and really wish we
> had more flexible ways of querying them.
>
> The reason databases don't encode the fields into the index is that
> you can only have a single index on the table if you do that.
> Databases do sometimes duplicate the field in both the index and the
> table. Databases also have the property that indexes are just a cache
> and can be dropped at any time.
The big difference between a database and git is that a database is a
general-purpose tool. git has a much more restricted scope. As such, it
doesn't need *that much* flexibility.
Mike
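
To make the quoted point about path names concrete, here is a minimal
sketch in Python. It assumes only the well-known blob naming scheme
(SHA-1 over a "blob <size>\0" header followed by the contents). The
helper names git_blob_id and paths_by_blob and the sample snapshot are
made up for illustration and are not part of git. Because the blob name
depends on contents alone, a path can only be recovered relative to one
particular tree, which is the "which commit" question raised above.

```python
import hashlib
from collections import defaultdict


def git_blob_id(contents: bytes) -> str:
    """Name a blob from its contents only: SHA-1 of 'blob <size>' + NUL + contents."""
    header = b"blob %d\0" % len(contents)
    return hashlib.sha1(header + contents).hexdigest()


def paths_by_blob(tree: dict[str, bytes]) -> dict[str, list[str]]:
    """Hypothetical reverse index for ONE snapshot: blob id -> paths holding it.

    Like a database index, this is a derived cache: it can be dropped and
    rebuilt from the snapshot at any time.
    """
    index = defaultdict(list)
    for path, contents in sorted(tree.items()):
        index[git_blob_id(contents)].append(path)
    return dict(index)


# Illustrative snapshot (assumed data): two paths share identical contents.
snapshot = {
    "COPYING": b"license text\n",
    "doc/COPYING": b"license text\n",
    "README": b"hello\n",
}

reverse = paths_by_blob(snapshot)

# One blob id, two paths: the id alone cannot say which path was meant.
print(reverse[git_blob_id(b"license text\n")])
```

A full-text index that returns blob ids would need a reverse map like
this per commit (or a walk over that commit's trees) to turn ids back
into path names.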