git@vger.kernel.org mailing list mirror (one of many)
* limit memory usage on large repositories
@ 2013-07-10 22:27 Matt Schoen
  2013-07-27  3:48 ` Jeff King
  0 siblings, 1 reply; 2+ messages in thread
From: Matt Schoen @ 2013-07-10 22:27 UTC
  To: git

Hi there,

I've been using git for some time now, and host my remote bare
repositories on my shared hosting account at Dreamhost.com.  As a
protective feature on their shared host setup, they enact a policy
that kills processes that consume too much memory.  This happens to
git sometimes.

By "sometimes" I mean on large repos (>~500MB), when performing
operations like git gc and git fsck and, most annoyingly, when doing a
clone.  It seems to happen in the pack phase, but I can't be sure
exactly.

I've messed around with the config options like pack.threads and
pack.packSizeLimit, and basically anything on the git config manpage that
mentions memory.  I limit all of these things to 1 or 0 or 1m when
applicable, just to be sure. To be honest, I really don't know what
I'm doing ;)

Oddly enough, I'm having trouble reproducing my issue with anything
but git fsck.  Clones were failing in the past, but after a successful
git gc, everything seems to be ok(?)

Anyway, I'd like some advice on what settings limit memory usage, and
exactly how to determine what the memory usage will be with certain
values.

Thanks!

* Re: limit memory usage on large repositories
  2013-07-10 22:27 limit memory usage on large repositories Matt Schoen
@ 2013-07-27  3:48 ` Jeff King
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff King @ 2013-07-27  3:48 UTC
  To: Matt Schoen; +Cc: git

On Wed, Jul 10, 2013 at 05:27:57PM -0500, Matt Schoen wrote:

> I've been using git for some time now, and host my remote bare
> repositories on my shared hosting account at Dreamhost.com.  As a
> protective feature on their shared host setup, they enact a policy
> that kills processes that consume too much memory.  This happens to
> git sometimes.
> 
> By "sometimes" I mean on large repos (>~500MB), when performing
> operations like git gc and git fsck and, most annoyingly, when doing a
> clone.  It seems to happen in the pack phase, but I can't be sure
> exactly.

Do you know how they measure the memory? One of the problems we've had
at GitHub in measuring git's memory usage is that git will mmap the
fairly large packfiles. This can bloat the RSS of the git process. At
the same time, not counting the map is not quite right, either; it is
memory the process is using, but it could stand to give up some of it if
other processes needed it (and that giving up is managed by the kernel,
not by git). So you end up in a situation where you may have a large RSS
precisely _because_ there is no memory pressure on the system, which
leaves the kernel free to leave the mmap'd pages in RAM.
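
One way to see how much of a process's RSS is file-backed (which is
where mmap'd packfiles are counted) is to read /proc on Linux. This is
only a sketch: rss_breakdown is a hypothetical helper name, the RssAnon
and RssFile fields are assumed to be present (Linux 4.5 or later), and
nothing here tells you which figure the host's process killer looks at.

```shell
# Print the resident-memory breakdown for a process (Linux only).
#   VmRSS   = total resident set size
#   RssAnon = resident anonymous pages (heap, stack)
#   RssFile = resident file-backed pages, including mmap'd packfiles
rss_breakdown() {
    pid=$1
    grep -E '^(VmRSS|RssAnon|RssFile):' "/proc/$pid/status"
}
```

Running `rss_breakdown <pid>` against a git process while it is
packing should show whether the bulk of its RSS is file-backed or
anonymous.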

You can reduce the amount of memory you map at once with
core.packedGitWindowSize.
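
As a rough sketch of what that could look like for a bare repository:
limit_pack_mmap is a hypothetical helper name, and the 32m and 256m
values are illustrative assumptions rather than recommendations.
core.packedGitLimit is a related setting that caps the total amount of
packfile data mapped at once.

```shell
# Hypothetical helper: shrink how much packfile data git maps at once.
# The sizes below are assumptions chosen for illustration only.
limit_pack_mmap() {
    repo=$1
    # Size of each mmap window into a packfile.
    git -C "$repo" config core.packedGitWindowSize 32m
    # Upper bound on packfile bytes mapped simultaneously.
    git -C "$repo" config core.packedGitLimit 256m
}
```

It would be called as `limit_pack_mmap /path/to/repo.git`. Smaller
windows mean more map and unmap calls, so this trades some speed for a
smaller mapped footprint.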

> I've messed around with the config options like pack.threads and
> pack.packSizeLimit, and basically anything on the git config manpage that
> mentions memory.  I limit all of these things to 1 or 0 or 1m when
> applicable, just to be sure. To be honest, I really don't know what
> I'm doing ;)

I assume you did pack.deltaCacheSize, which can take a fair bit of
memory during packing (or cloning).

Packing itself takes up a lot, as I think we keep the whole window's
worth of objects in memory at one time (pack.window, so 10 by default).
If you have large objects, that can spike your memory usage for a moment
as we keep several versions of the large object in memory at once.
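
The packing-related settings mentioned in this thread could be set
together along these lines. This is a sketch under assumptions:
limit_pack_memory is a hypothetical helper name and every value is
illustrative. One thing to watch is that, per the git-config
documentation, 0 does not mean "as small as possible" for several of
these: pack.threads=0 auto-detects the CPU count, and 0 for
pack.windowMemory or pack.deltaCacheSize means no limit.

```shell
# Hypothetical helper: bound memory used by git pack-objects.
# All values are assumptions for illustration, not tuned numbers.
limit_pack_memory() {
    repo=$1
    # One delta-search thread; the window memory cap applies per thread.
    git -C "$repo" config pack.threads 1
    # Cap memory used for the delta window (0 would mean unlimited).
    git -C "$repo" config pack.windowMemory 64m
    # Cap the cache of computed deltas (0 would mean unlimited).
    git -C "$repo" config pack.deltaCacheSize 16m
}
```

Usage would be `limit_pack_memory /path/to/repo.git`, followed by a
`git gc` to see whether the process now stays under the host's limit.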

If you have such large objects that don't delta well, you can unset the
"delta" gitattribute for them (i.e., "-delta") so that git doesn't even
try them.

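In gitattributes, delta compression is turned off for a path by
unsetting the "delta" attribute, written `-delta`. A minimal sketch for
a bare repository, which has no worktree .gitattributes and so uses
info/attributes inside the repository directory: disable_delta_for is a
hypothetical helper name and `*.iso` is only an example pattern.

```shell
# Hypothetical helper: tell git not to attempt deltas for a path pattern.
# Assumes $1 is a bare repository, so attributes live in $1/info/attributes.
disable_delta_for() {
    repo=$1
    pattern=$2
    mkdir -p "$repo/info"
    # "-delta" unsets the delta attribute for matching paths.
    printf '%s -delta\n' "$pattern" >> "$repo/info/attributes"
}
```

For example, `disable_delta_for /path/to/repo.git '*.iso'` appends the
line `*.iso -delta`. Existing packs keep their deltas until the objects
are repacked.
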
> Oddly enough, I'm having trouble reproducing my issue with anything
> but git fsck.  Clones were failing in the past, but after a successful
> git gc, everything seems to be ok(?)

Memory usage for clone should improve after a gc, as we will mostly be
reusing deltas from disk instead of trying to find new ones between
packs.

-Peff
