From: Linus Torvalds <torvalds@osdl.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Figured out how to get Mozilla into git
Date: Fri, 9 Jun 2006 08:01:56 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0606090745390.5498@g5.osdl.org>
In-Reply-To: <e6b798$td3$1@sea.gmane.org>



On Fri, 9 Jun 2006, Jakub Narebski wrote:
> Jon Smirl wrote:
> 
> >> git-repack -a -d but it OOMs on my 2GB+2GBswap machine :(
> > 
> > We are all having problems getting this to run on 32 bit machines with
> > the 3-4GB process size limitations.
> 
> Is that expected (for 10GB repository if I remember correctly), or is there
> some way to avoid this OOM?

Well, to some degree, the VM limitations are inevitable with huge packs.

The original idea for packs was to avoid making one huge pack, partly 
because it was expected to be really really slow to generate (so 
incremental repacking was a much better strategy), but partly simply 
because trying to map one huge pack is really hard to do.

For various reasons, we ended up mostly using a single pack most of the 
time: it's the most efficient model when the project is reasonably sized, 
and it turns out that with the delta re-use, repacking even moderately 
large projects like the kernel doesn't actually take all that long.

But the fact that we ended up mostly using a single pack for the kernel, 
for example, doesn't mean that the fundamental reasons that git supports 
multiple packs would somehow have gone away. At some point, the project 
gets large enough that one single pack simply isn't reasonable.

So a single 2GB pack is already very much pushing it. It's really really 
hard to map in a 2GB file on a 32-bit platform: your VM is usually 
fragmented enough that it simply isn't practical. In fact, I think the 
limit for _practical_ usage of single packs is probably somewhere in the 
half-gig region, unless you just have 64-bit machines.
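
To illustrate the fragmentation point, here is a minimal sketch in Python. Every number in it is an assumption made up for illustration (a 3 GiB user address space and three existing mappings at arbitrary positions); it does not describe any real process layout. It only shows that the total free address space can exceed 2 GiB while no single contiguous gap is large enough to hold one 2 GiB mapping.

```python
GIB = 1 << 30
MIB = 1 << 20


def largest_free_gap(space_size, mappings):
    """Return the size of the largest contiguous unmapped range.

    mappings is a list of (start, length) pairs, assumed non-overlapping.
    """
    largest = 0
    cursor = 0  # end of the last mapping seen so far
    for start, length in sorted(mappings):
        largest = max(largest, start - cursor)
        cursor = max(cursor, start + length)
    return max(largest, space_size - cursor)


# Hypothetical layout: positions and sizes are illustrative only.
space = 3 * GIB
mappings = [
    (128 * MIB, 64 * MIB),         # program text and heap
    (1 * GIB, 256 * MIB),          # shared libraries
    (3 * GIB - 8 * MIB, 8 * MIB),  # stack
]

total_free = space - sum(length for _, length in mappings)
gap = largest_free_gap(space, mappings)
pack_size = 2 * GIB

print("total free:", total_free // MIB, "MiB")
print("largest contiguous gap:", gap // MIB, "MiB")
print("2 GiB pack can be mapped in one piece:", gap >= pack_size)
```

With this made-up layout there is more than 2 GiB free in total, but the largest hole is smaller than the pack, so a single contiguous mapping would fail.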

And yes, I realize that the "single pack" thing has actually ended up 
becoming a fact for cloning, for example. Originally, cloning would unpack 
on the receiving end, and leave the repacking to happen there, but that 
obviously sucked. So now when we clone, we always get a single pack. That 
can absolutely be a problem.

I don't know what the right solution is. Single packs _are_ very useful, 
especially after a clone. So it's possible that we should just make the 
pack-reading code be able to map partial packs. But the point is that 
there are certainly ways we can fix this - it's not _really_ fundamental.

It's going to complicate it a bit (damn, how I hate 32-bit VM 
limitations), but the good news is that the whole git model of "everything 
is an individual object" means that it's a very _local_ decision: it will 
probably be painful to re-do some of the pack reading code and have an LRU 
of pack _fragments_ instead of an LRU of packs, but it's only going to 
affect a small part of git, and everything else will never even see it.
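
As a rough sketch of what an "LRU of pack fragments" could look like, here is a small Python example. It is not git's code, and the class name PackWindows, its parameters, and the window sizes are all hypothetical. It assumes the idea is to map fixed-size windows of one large file on demand and to unmap the least recently used window when too many are open, so that callers just ask for a byte range and never see the windows.

```python
import mmap
import os
from collections import OrderedDict


class PackWindows:
    """LRU cache of fixed-size read-only mmap windows over one large file.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, path, window_size=256 * mmap.ALLOCATIONGRANULARITY,
                 max_windows=4):
        # mmap offsets must be multiples of the allocation granularity.
        if window_size <= 0 or window_size % mmap.ALLOCATIONGRANULARITY:
            raise ValueError("window_size must be a positive multiple of "
                             "mmap.ALLOCATIONGRANULARITY")
        if max_windows < 1:
            raise ValueError("max_windows must be at least 1")
        self.fd = os.open(path, os.O_RDONLY)
        self.size = os.fstat(self.fd).st_size
        self.window_size = window_size
        self.max_windows = max_windows
        self.windows = OrderedDict()  # window index -> mmap, oldest first

    def _window(self, index):
        win = self.windows.get(index)
        if win is not None:
            self.windows.move_to_end(index)  # mark as most recently used
            return win
        if len(self.windows) >= self.max_windows:
            _, oldest = self.windows.popitem(last=False)
            oldest.close()  # unmap the least recently used window
        start = index * self.window_size
        length = min(self.window_size, self.size - start)
        win = mmap.mmap(self.fd, length, access=mmap.ACCESS_READ, offset=start)
        self.windows[index] = win
        return win

    def read(self, offset, length):
        """Return length bytes starting at offset, crossing windows as needed."""
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read outside of file")
        out = bytearray()
        while length > 0:
            index, within = divmod(offset, self.window_size)
            chunk = self._window(index)[within:within + length]
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

    def close(self):
        for win in self.windows.values():
            win.close()
        self.windows.clear()
        os.close(self.fd)
```

A caller would do something like PackWindows(path).read(offset, length) and get back plain bytes. At most max_windows * window_size bytes of address space are in use at any time, regardless of how big the file is, which is the "local decision" property: only the reader has to know about windows.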

So large packs are not really a fundamental problem, but right now we have 
some practical issues with them.

(It's not _just_ packs: running out of memory is also because of 
git-rev-list --objects being pretty memory hungry. I've improved the 
memory usage several times by over 50%, but people keep trying larger 
projects. It used to be that I considered the kernel a large history, now 
we're talking about things that have ten times the number of objects).

Martin - do you have some place to make that big mozilla repo available? 
It would be a good test-case.. 

			Linus
