git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Mike Hommey <mh@glandium.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Git's database structure
Date: Tue, 4 Sep 2007 20:04:29 +0200	[thread overview]
Message-ID: <20070904180429.GA626@glandium.org> (raw)
In-Reply-To: <9e4733910709041044r71264346n341d178565dd0521@mail.gmail.com>

On Tue, Sep 04, 2007 at 01:44:47PM -0400, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 9/4/07, Junio C Hamano <gitster@pobox.com> wrote:
> > "Jon Smirl" <jonsmirl@gmail.com> writes:
> >
> > > Another way of looking at the problem,
> > >
> > > Let's build a full-text index for git. You put a string into the index
> > > and it returns the SHAs of all the file nodes that contain the string.
> > > How do I recover the path names of these SHAs?
> >
> > That question does not make much sense without specifying "which
> > commit's path you are talking about".
> >
> > If you want to encode such "contextual information" in addition
> > to "contents", you could do so, but you essentially need to
> > record commit + pathname + mode bits + contents as "blob" and
> > hash that to come up with a name.
> 
> I left the details out of the full-text example to make it more
> obvious that we can't recover the path names.
> 
> Doing this type of analysis may point out that even more fields are
> missing from the blob table such as commit id.
> 
> The current data store design is not very flexible. Databases solved
> the flexibility problem long ago. I'm just wondering if we should
> steal some good ideas out of the database world and apply them to git.
> Ten years from now we may have 100GB git databases and really wish we
> had more flexible ways of querying them.
> 
> The reason databases don't encode the fields into the index is that
> you can only have a single index on the table if you do that.
> Databases do sometimes duplicate the field in both the index and the
> table. Databases also have the property that indexes are just a cache
> and can be dropped at any time.

The big difference between a database and git is that a database is a
general purpose tool. git has a much more restricted scope. As such, it
doesn't need *that much* flexibility.

Mike

  reply	other threads:[~2007-09-04 18:06 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-04 15:23 Git's database structure Jon Smirl
2007-09-04 15:55 ` Andreas Ericsson
2007-09-04 16:07   ` Mike Hommey
2007-09-04 16:10     ` Andreas Ericsson
2007-09-04 16:19   ` Jon Smirl
2007-09-04 16:29     ` Andreas Ericsson
2007-09-04 17:09     ` Jeff King
2007-09-04 20:17     ` David Tweed
2007-09-04 17:21   ` Junio C Hamano
2007-09-04 16:28 ` Jon Smirl
2007-09-04 16:31   ` Andreas Ericsson
2007-09-04 16:47     ` Jon Smirl
2007-09-04 16:51       ` Andreas Ericsson
2007-09-04 17:25   ` Junio C Hamano
2007-09-04 17:44     ` Jon Smirl
2007-09-04 18:04       ` Mike Hommey [this message]
2007-09-04 19:44         ` Reece Dunn
2007-09-04 18:06       ` Junio C Hamano
2007-09-04 21:25       ` Theodore Tso
2007-09-04 21:54         ` Jon Smirl
2007-09-05  7:18           ` Andreas Ericsson
2007-09-05 13:41             ` Jon Smirl
2007-09-05 14:51               ` Andreas Ericsson
2007-09-05 15:37                 ` Jon Smirl
2007-09-05 15:54                   ` Julian Phillips
2007-09-05 16:12                     ` Jon Smirl
2007-09-05 17:31                       ` Julian Phillips
2007-09-06  1:27                         ` Kyle Moffett
2007-09-05 17:39                       ` Mike Hommey
2007-09-06  8:49                       ` Andreas Ericsson
2007-09-06  9:09                         ` Junio C Hamano
2007-09-06 11:03                           ` Wincent Colaiuta
2007-09-06 12:56                       ` Johannes Schindelin
2007-09-06 18:14                         ` Steven Grimm
2007-09-07  0:33                       ` Martin Langhoff
2007-09-05 19:52               ` Andy Parkins
2007-09-04 17:19 ` Julian Phillips
2007-09-04 17:30   ` Jon Smirl
2007-09-04 18:51     ` Andreas Ericsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070904180429.GA626@glandium.org \
    --to=mh@glandium.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonsmirl@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).