git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Shawn Pearce <spearce@spearce.org>,
	David Michael Barr <b@rr-dav.id.au>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [RFC] pack-objects: compression level for non-blobs
Date: Tue, 1 Jan 2013 11:15:58 +0700	[thread overview]
Message-ID: <CACsJy8CygfaM+Ee6rURFB-cP2khO8URGDJMG2f3mqg0ebYz+8Q@mail.gmail.com> (raw)
In-Reply-To: <CAJo=hJtjtpiPVd6Koy9q5je7s7A4EyDa-CptJNCnHLSLgd9W7g@mail.gmail.com>

On Tue, Jan 1, 2013 at 1:06 AM, Shawn Pearce <spearce@spearce.org> wrote:
>>   3. Dropping the "commits" file and just using the pack-*.idx as the
>>      index. The problem is that it is sparse in the commit space. So
>>      just naively storing 40 bytes per entry is going to waste a lot of
>>      space. If we had a separate index as in (1) above, that could be
>>      dropped to (say) 4 bytes of offset per object. But still, right now
>>      the commits file for linux-2.6 is about 7.2M (20 bytes times ~376K
>>      commits). There are almost 3 million total objects, so even storing
>>      4 bytes per object is going to be worse.
>
> Fix pack-objects to behave the way JGit does, cluster commits first in
> the pack stream. Now you have a dense space of commits. If I remember
> right this has a tiny positive improvement for most rev-list
> operations with very little downside.

I was going to suggest a similar thing. The current state of C Git's
pack writing is not bad. We mix commits and tags together, but tags
are few usually. Once we get the upper and lower bound, in terms of
object position in the pack, of the commit+tag region, we could reduce
the waste significantly. That is if you sort the cache by the object
order in the pack.
-- 
Duy

  reply	other threads:[~2013-01-01  4:16 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-26  6:25 [RFC] pack-objects: compression level for non-blobs David Michael Barr
2012-11-26 12:35 ` David Michael Barr
2012-12-29  0:41 ` Jeff King
2012-12-29  4:34   ` Nguyen Thai Ngoc Duy
2012-12-29  5:07     ` Jeff King
2012-12-29  5:25       ` Nguyen Thai Ngoc Duy
2012-12-29  5:27         ` Jeff King
2012-12-29  9:05           ` Jeff King
2012-12-29  9:48             ` Jeff King
2012-12-30 12:05           ` Jeff King
2012-12-30 12:53             ` Nguyen Thai Ngoc Duy
2012-12-30 21:31               ` Jeff King
2012-12-31 18:06                 ` Shawn Pearce
2013-01-01  4:15                   ` Duy Nguyen [this message]
2013-01-01 12:10                     ` Duy Nguyen
2013-01-01 17:17                       ` Shawn Pearce
2013-01-01 23:47                         ` Junio C Hamano
2013-01-02  2:23                         ` Duy Nguyen
2013-01-01 20:02                       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACsJy8CygfaM+Ee6rURFB-cP2khO8URGDJMG2f3mqg0ebYz+8Q@mail.gmail.com \
    --to=pclouds@gmail.com \
    --cc=b@rr-dav.id.au \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).