From: Nicolas Pitre <nico@cam.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Roman Shaposhnik <rvs@Sun.COM>,
git@vger.kernel.org
Subject: Re: Achieving efficient storage of weirdly structured repos
Date: Fri, 04 Apr 2008 09:11:34 -0400 (EDT)
Message-ID: <alpine.LFD.1.00.0804040844470.2947@xanadu.home>
In-Reply-To: <m3tziita2y.fsf@localhost.localdomain>
On Thu, 3 Apr 2008, Jakub Narebski wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > On Thu, 3 Apr 2008, Roman Shaposhnik wrote:
> >>
> >> The last item (trees) also seems to take the most space and the most
> >> reasonable explanation that I can offer is that NetBeans repository has
> >> a really weird structure where they have approximately 700 (yes, seven
> >> hundred!) top-level subdirectories there. They are clearly
> >> Submodules-shy, but that's another issue that I will need to address
> >> with them.
> >
> > Trees taking the biggest amount of space is not unheard of, and it may
> > also be that the name heuristics (for finding good packing partners) could
> > be failing, which would result in a much bigger pack than necessary.
> >
> > So if you already did an aggressive repack like the above, I'd happily
> > take a look at whether maybe it's bad heuristics for finding tree objects
> > to pair up for delta-compression. Do you have a place where you can put
> > that repo for people to clone and look at?
>
> Hmmm... I wonder if this would be the case that speeds up
> development of pack v4.

Not really. Pack v4 won't magically shrink a repository to less than
half the pack v3 size.

I think we're simply facing the same situation as with the initial GCC
repository which shrank from 3GB down to 300MB or so due to misfitted
repacking parameters.

> If I remember correctly, one of the bigger changes
> was the way trees were represented in the pack; the biggest improvement
> was for trees.

Yes, but that was less about size than about access speed, gained by
not deflating them. The pack v4 tree representation would certainly
help, of course, but I suspect that simply repacking with more
aggressive window/depth arguments would be even more effective in this
case.

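
To illustrate why the window argument matters, here is a toy model in
Python. It is not git's actual delta search: the tree layout, the cost
function and the object ordering are all assumptions made up for the
illustration, and depth is not modelled at all. Each object may only be
delta-compressed against the `window` objects that precede it in the
list, so when similar trees end up further apart than the window, no
good delta base is ever found.

```python
# Toy model of a delta window; NOT git's real algorithm or data layout.
from typing import FrozenSet, List, Tuple

# A hypothetical tree: a set of (family, entry index, entry version).
Tree = FrozenSet[Tuple[int, int, int]]


def make_tree(family: int, revision: int, entries: int = 20) -> Tree:
    """Build a tree in which `revision` entries differ from revision 0."""
    return frozenset(
        (family, j, 1 if j < revision else 0) for j in range(entries)
    )


def delta_cost(base: Tree, target: Tree) -> int:
    """Crude stand-in for delta size: entries found in only one tree."""
    return len(base ^ target)


def packed_cost(objects: List[Tree], window: int) -> int:
    """Sum the cheapest encoding of each object given a search window."""
    total = 0
    for i, obj in enumerate(objects):
        full = len(obj)  # cost of storing the object without a delta
        candidates = objects[max(0, i - window):i]
        best = min((delta_cost(b, obj) for b in candidates), default=full)
        total += min(full, best)
    return total


# Assumed ordering: 8 unrelated tree families interleaved, so successive
# revisions of the same family sit exactly 8 positions apart.
objects = [make_tree(f, r) for r in range(5) for f in range(8)]

small = packed_cost(objects, window=7)  # the similar tree is out of reach
large = packed_cost(objects, window=8)  # the similar tree is in the window
print(small, large)
```

In this contrived ordering the cost drops from 800 to 224 as soon as the
window is wide enough to reach the previous revision. With real
repositories the window and depth are set through the --window and
--depth options of git-repack; suitable values depend on the repository.
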
> One of the bigger hindrances, as I understand it, in developing pack v4
> was the fact that it didn't offer that much of an improvement in typical
> cases for the work needed... but perhaps "your" repository would be
> a good showcase for pack v4.

The biggest hindrance for pack v4 is actually the lack of native
runtime tree walking; abstracting both tree object formats properly and
optimally has not been looked at yet.

Speed is the primary goal for pack v4. The fact that it also provides a
10% pack size reduction is only a side effect. But without native tree
walking we must recreate the legacy tree format on the fly each time a
tree object is loaded, which eats up most of the improvement pack v4 is
aiming for (yes, it is still a little bit faster than pack v3, but not
yet significantly enough to overcome the incompatibility costs).

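
As a rough sketch of what "recreating the legacy tree format on the
fly" involves: the legacy format stores each entry as the octal mode, a
space, the name, a NUL byte and the 20-byte binary SHA-1. The
table-based input below (a path table and a SHA-1 table referenced by
index) is a hypothetical simplification chosen for the example, not the
actual pack v4 layout, and the function names are made up.

```python
import hashlib
from typing import List, Sequence, Tuple

# Hypothetical compact representation: each tree entry is a pair of
# indexes into a shared path table and a shared SHA-1 table.
PathTable = Sequence[Tuple[int, bytes]]  # (mode, name)
Sha1Table = Sequence[bytes]              # 20-byte binary SHA-1 values


def legacy_tree_bytes(
    entries: List[Tuple[int, int]],
    path_table: PathTable,
    sha1_table: Sha1Table,
) -> bytes:
    """Expand indexed entries into the legacy tree object body.

    Entries are assumed to already be in canonical tree order.
    """
    out = bytearray()
    for path_idx, sha1_idx in entries:
        mode, name = path_table[path_idx]
        out += f"{mode:o} ".encode("ascii")  # octal mode, no leading zero
        out += name + b"\x00"
        out += sha1_table[sha1_idx]
    return bytes(out)


def tree_object_id(body: bytes) -> str:
    """SHA-1 of the object header ("tree <size>" plus NUL) and the body."""
    header = f"tree {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Example data, invented for the illustration.
path_table = [(0o100644, b"README"), (0o40000, b"src")]
sha1_table = [bytes(20), b"\x01" * 20]
body = legacy_tree_bytes([(0, 0), (1, 1)], path_table, sha1_table)
print(len(body), tree_object_id(body))
```

Every consumer that expects the legacy byte layout forces this kind of
expansion on each tree load, which is the cost that walking the compact
form natively would avoid.
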
Nicolas (who wishes he was still a student with plenty of hacking time)