git@vger.kernel.org mailing list mirror (one of many)
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Arnaud Bertrand <xda@abalgo.com>
Cc: git@vger.kernel.org
Subject: Re: Possible improvement in DB structure
Date: Mon, 23 Dec 2019 19:09:50 +0000
Message-ID: <20191223190950.GA6240@camp.crustytoothpaste.net>
In-Reply-To: <CAEW0o+gwbNyDqmiouFzO16LsRUfcAnSwj9K77oGe5hi=EVMB=w@mail.gmail.com>


On 2019-12-23 at 13:00:46, Arnaud Bertrand wrote:
> Hello,
> 
> According to my understanding, git has only 3 kinds of objects:
> (excluding the packed version)
> - the blobs
> - the trees
> - the commits

There are also tags.

> Today to parse all objects of the same type, it is necessary to parse
> all the objects and test them one by one.

This isn't a behavior we often want.  Can you say more about why you
want to do this?

> Maybe due to my limited knowledge of git, I don't see any advantage
> to putting everything together.
> By splitting the objects directory, the gain in performance could be
> significant, the scripts simpler, and the representation clearer.

Oftentimes, we want to look up an item that we would refer to as a
tree-ish.  That means that any tag, commit, or tree can be used in this
case and it will automatically be resolved to the appropriate tree.
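
As a rough illustration of that resolution, here is a minimal Python
sketch. It is not git's implementation: the `objects` mapping and the
`peel_to_tree` name are hypothetical, and each object is modelled only
as a (type, target) pair, where a tag points at another object and a
commit points at its tree.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory object store: id -> (type, id of the object it points to).
ObjectStore = Dict[str, Tuple[str, Optional[str]]]


def peel_to_tree(oid: str, objects: ObjectStore) -> str:
    """Follow tag -> commit -> tree links until a tree is reached."""
    seen = set()
    while True:
        if oid in seen:
            raise ValueError(f"cycle while peeling {oid}")
        seen.add(oid)
        obj_type, target = objects[oid]
        if obj_type == "tree":
            return oid
        if obj_type in ("tag", "commit") and target is not None:
            # The type is only known after the object has been looked up.
            oid = target
            continue
        raise ValueError(f"{oid} is a {obj_type}, not a tree-ish")


objects: ObjectStore = {
    "t1": ("tree", None),
    "c1": ("commit", "t1"),
    "v1": ("tag", "c1"),
    "b1": ("blob", None),
}
resolved = [peel_to_tree(name, objects) for name in ("v1", "c1", "t1")]
```

In this toy model the caller never states which type it is passing in,
so every step is a lookup by id alone.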

Currently, we can look for any loose object, and then look for any
packed object, which is a limited number of lookups (at most, the number
of packs plus one).  Your proposal would have us look up at most the
number of packs plus six.
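
To make the counting concrete, here is a small Python sketch that
counts probes for an object of unknown type. It is a toy model, not
git's code: stores are plain dicts, the function names are
hypothetical, and the split layout (one loose store per object type) is
only an assumption about how the proposal might look. It does not
attempt to reproduce the exact figure quoted above.

```python
from typing import Dict, List, Optional, Tuple

Store = Dict[str, bytes]
OBJECT_TYPES = ("blob", "tree", "commit", "tag")


def lookup_unified(oid: str, loose: Store, packs: List[Store]) -> Tuple[Optional[bytes], int]:
    """Current layout: one loose store, then each pack in turn."""
    probes = 1
    if oid in loose:
        return loose[oid], probes
    for pack in packs:
        probes += 1
        if oid in pack:
            return pack[oid], probes
    return None, probes


def lookup_split(oid: str, loose_by_type: Dict[str, Store], packs: List[Store]) -> Tuple[Optional[bytes], int]:
    """Assumed split layout: one loose store per type, then each pack."""
    probes = 0
    for obj_type in OBJECT_TYPES:
        probes += 1
        store = loose_by_type.get(obj_type, {})
        if oid in store:
            return store[oid], probes
    for pack in packs:
        probes += 1
        if oid in pack:
            return pack[oid], probes
    return None, probes


packs: List[Store] = [{}, {}]
# Worst case: the object is missing, so every store is probed.
_, unified_probes = lookup_unified("deadbeef", {}, packs)
_, split_probes = lookup_split("deadbeef", {}, packs)
```

With two packs, the unified layout needs at most three probes, while
this particular split model needs one probe per type before it even
reaches the packs.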

In addition, we sometimes know that we need to look up an object, but
don't know its type.  We would incur additional costs in this case as
well.

I'm not sure that we would gain a lot other than conceptual tidiness,
but we would incur additional performance costs.  We can currently
distinguish the types of all of these objects by simply reading the
object header, which on a 64-bit system cannot exceed 28 bytes.  We
already do this in some cases, such as `git cat-file --batch`.
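
A minimal Python sketch of reading just the header of a loose object
follows. It assumes the loose-object format of a zlib-compressed
"<type> <size>" string, a NUL byte, and then the content. The helper
names are hypothetical. The 28-byte bound presumably comes from the
longest type name (`commit`, 6 bytes), a space, up to 20 decimal digits
for a 64-bit size, and the NUL.

```python
import zlib
from typing import Tuple

MAX_HEADER = 28  # bound quoted in the message for a 64-bit system


def encode_loose_object(obj_type: str, payload: bytes) -> bytes:
    """Build the compressed bytes of a loose object (for this demo only)."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return zlib.compress(header + payload)


def read_object_header(compressed: bytes) -> Tuple[str, int]:
    """Inflate at most MAX_HEADER bytes and parse the type and size."""
    inflater = zlib.decompressobj()
    head = inflater.decompress(compressed, MAX_HEADER)
    nul = head.index(b"\0")  # raises ValueError if no header terminator is found
    obj_type, size = head[:nul].split(b" ")
    return obj_type.decode("ascii"), int(size)


small = encode_loose_object("blob", b"hello\n")
large = encode_loose_object("blob", b"x" * 100_000)
headers = [read_object_header(small), read_object_header(large)]
```

The large object is never fully inflated here; only the first few
bytes of output are produced before the type and size are known.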
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


Thread overview: 4+ messages
2019-12-23 13:00 Possible improvement in DB structure Arnaud Bertrand
2019-12-23 19:09 ` brian m. carlson [this message]
2019-12-23 20:46   ` Arnaud Bertrand
2019-12-23 21:41     ` Jonathan Nieder
