git@vger.kernel.org mailing list mirror (one of many)
From: Nicolas Pitre <nico@cam.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Roman Shaposhnik <rvs@Sun.COM>,
	git@vger.kernel.org
Subject: Re: Achieving efficient storage of weirdly structured repos
Date: Fri, 04 Apr 2008 09:11:34 -0400 (EDT)
Message-ID: <alpine.LFD.1.00.0804040844470.2947@xanadu.home>
In-Reply-To: <m3tziita2y.fsf@localhost.localdomain>

On Thu, 3 Apr 2008, Jakub Narebski wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Thu, 3 Apr 2008, Roman Shaposhnik wrote:
> >> 
> >> The last item (trees) also seems to take the most space and the most 
> >> reasonable explanation that I can offer is that NetBeans repository has 
> >> a really weird structure where they have approximately 700 (yes, seven 
> >> hundred!) top-level subdirectories there. They are clearly 
> >> Submodules-shy, but that's another issue that I will need to address 
> >> with them.
> > 
> > Trees taking the biggest amount of space is not unheard of, and it may 
> > also be that the name heuristics (for finding good packing partners) could 
> > be failing, which would result in a much bigger pack than necessary. 
> > 
> > So if you already did an aggressive repack like the above, I'd happily 
> > take a look at whether maybe it's bad heuristics for finding tree objects 
> > to pair up for delta-compression. Do you have a place where you can put 
> > that repo for people to clone and look at? 
> 
> Hmmm... I wonder if this would be the case that speeds up
> development of pack v4.

Not really.  Pack v4 won't magically shrink a repository to less than 
half the pack v3 size.

I think we're simply facing the same situation as with the initial GCC 
repository, which was 3GB because of ill-suited repacking parameters 
and shrank to 300MB or so once they were fixed.

> If I remember correctly, one of the bigger changes
> was the way trees were represented in the pack; the biggest improvement
> was for trees.

Yes, but that was less about size than about access speed, gained by 
not deflating them. The pack v4 tree representation would certainly 
help, of course, but I suspect that simply repacking with more 
aggressive window/depth arguments would be even more effective in this 
case.
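
To make "more aggressive window/depth arguments" concrete, here is a 
minimal sketch of such a repack driven from Python. The helper names 
and the default of 250 for both values are illustrative assumptions, 
not figures taken from this thread; -a, -d, -f, --window and --depth 
are the usual git-repack(1) options.

```python
import subprocess

def build_repack_cmd(window=250, depth=250):
    """Build the argv for a full repack that recomputes all deltas.

    The default window/depth values are assumptions for illustration;
    tune them to the repository and the memory available.
    """
    return [
        "git", "repack",
        "-a",  # pack everything reachable into a single pack
        "-d",  # drop the packs made redundant by the new one
        "-f",  # do not reuse existing deltas, recompute them
        f"--window={window}",  # candidates considered per object
        f"--depth={depth}",    # maximum delta chain length
    ]

def repack(repo_path, window=250, depth=250):
    """Run the repack inside repo_path (hypothetical helper)."""
    subprocess.run(build_repack_cmd(window, depth), cwd=repo_path, check=True)
```

A larger window gives the delta search more candidates per object at 
the cost of repacking time and memory; a larger depth allows longer 
delta chains, which can slow down access to deeply chained objects.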

> One of the bigger hindrances, as I understand it, in developing pack v4
> was the fact that it didn't offer that much of an improvement in typical
> cases for the work needed... but perhaps "your" repository would be
> a good showcase for pack v4.

The biggest hindrance for pack v4 is actually the lack of native 
runtime tree walking; abstracting both tree object formats properly and 
optimally has not been looked at yet.

Speed is the primary goal for pack v4.  The fact that it also provides a 
10% pack reduction is merely a side effect.  But without native tree 
walking we must recreate the legacy tree format on the fly each time a 
tree object is loaded, which dwarfs any improvements pack v4 is aiming 
for (yes, it is still a little bit faster than pack v3 nevertheless, but 
not yet significantly enough to overcome the incompatibility costs).
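
For reference, the legacy tree format mentioned above is a flat byte 
string of "<octal mode> <name>\0<20-byte SHA-1>" entries. The sketch 
below only shows that legacy encoding, i.e. what has to be regenerated 
on each load; it says nothing about the pack v4 representation, and 
the function names are hypothetical.

```python
import hashlib

TREE_MODE = 0o40000  # mode git uses for a subdirectory entry

def encode_legacy_tree(entries):
    """Serialize (mode, name, sha1) tuples into the legacy tree body.

    name is bytes and sha1 is the raw 20-byte object id.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Directories sort as if their name ended with "/".
        return name + b"/" if mode == TREE_MODE else name

    out = bytearray()
    for mode, name, sha1 in sorted(entries, key=sort_key):
        out += b"%o %s\0" % (mode, name) + sha1
    return bytes(out)

def decode_legacy_tree(data):
    """Parse a legacy tree body back into (mode, name, sha1) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        nul = data.index(b"\0", space)
        mode = int(data[pos:space], 8)
        name = data[space + 1:nul]
        sha1 = data[nul + 1:nul + 21]
        entries.append((mode, name, sha1))
        pos = nul + 21
    return entries

def tree_id(body):
    """Object id of a tree: SHA-1 over the "tree <size>\\0" header plus body."""
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()
```

Every consumer that expects this byte layout forces the full 
serialization above whenever a tree is loaded, which is the per-load 
cost that native tree walking would avoid.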


Nicolas (who wishes he was still a student with plenty of hacking time)
