From: Jeff King <peff@peff.net>
To: Matt Schoen <mtschoen@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: limit memory usage on large repositories
Date: Fri, 26 Jul 2013 23:48:43 -0400
Message-ID: <20130727034843.GA20846@sigill.intra.peff.net>
In-Reply-To: <CAJj9RsTjp7j7Ew2pSttKRAZfZ6fLt9jL+Q_vmHQCi16FBBbK=w@mail.gmail.com>

On Wed, Jul 10, 2013 at 05:27:57PM -0500, Matt Schoen wrote:

> I've been using git for some time now, and host my remote bare
> repositories on my shared hosting account at Dreamhost.com.  As a
> protective feature on their shared host setup, they enact a policy
> that kills processes that consume too much memory.  This happens to
> git sometimes.
> 
> By "sometimes" I mean on large repos (>~500MB), when performing
> operations like git gc and git fsck and, most annoyingly, when doing a
> clone.  It seems to happen in the pack phase, but I can't be sure
> exactly.

Do you know how they measure the memory? One of the problems we've had
at GitHub in measuring git's memory usage is that git will mmap the
fairly large packfiles. This can bloat the RSS of the git process. At
the same time, not counting the map is not quite right, either; it is
memory the process is using, but it could stand to give up some of it if
other processes needed it (and that giving up is managed by the kernel,
not by git). So you end up in a situation where you may have a large RSS
precisely _because_ there is no memory pressure on the system, which
leaves the kernel free to leave the mmap'd pages in RAM.

You can reduce the amount of memory you map at once with
core.packedGitWindowSize.
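
As a rough sketch of that setting, the commands below try it out on a
throwaway bare repository (point `repo` at the real repository to apply
it there). The sizes are illustrative assumptions, not recommendations,
and core.packedGitLimit is an addition that the message above does not
mention: it is the companion setting that caps the total amount mapped
from packfiles at once.

```shell
# Scratch bare repository so the sketch can be run safely as-is.
repo=$(mktemp -d)
git init --bare -q "$repo"

# Bytes of a packfile mapped by a single mmap operation (illustrative value).
git -C "$repo" config core.packedGitWindowSize 32m

# Total bytes mapped from packfiles at any one time (illustrative value).
git -C "$repo" config core.packedGitLimit 128m

# Show the values as git parses them, in bytes.
git -C "$repo" config --int core.packedGitWindowSize
git -C "$repo" config --int core.packedGitLimit
```

Smaller windows mean more map/unmap calls, so this trades some speed for
a lower peak in mapped memory.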

> I've messed around with the config options like pack.threads and
> pack.sizeLimit, and basically anything on the git config manpage that
> mentions memory.  I limit all of these things to 1 or 0 or 1m when
> applicable, just to be sure. To be honest, I really don't know what
> I'm doing ;)

I assume you set pack.deltaCacheSize, which can take a fair bit of
memory during packing (or cloning).

Packing itself takes up a lot of memory, as I think we keep the whole
window's worth of objects in memory at one time (pack.window, so 10
objects by default). If you have large objects, that can spike your
memory usage for a moment as we keep several versions of the large
object in memory at once.
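
The packing-related knobs can be sketched the same way, again on a
throwaway bare repository and with values picked only for illustration.
pack.windowMemory is not mentioned in the message; it is a real git
setting that bounds the delta window by bytes per thread rather than by
object count, which is why it is included here as an assumption about
what might help with large objects.

```shell
# Scratch bare repository so the sketch can be run safely as-is.
repo=$(mktemp -d)
git init --bare -q "$repo"

# Cap the cache of computed deltas held before they are written out.
git -C "$repo" config pack.deltaCacheSize 16m

# Cap the memory each thread may use for its delta window.
git -C "$repo" config pack.windowMemory 64m

# One thread, so there is only one window's worth of objects in memory.
git -C "$repo" config pack.threads 1

# List what was set.
git -C "$repo" config --get-regexp '^pack\.'
```

With a byte limit in place, the window shrinks automatically when the
objects in it are large, instead of always holding pack.window of them.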

If you have such large objects that don't delta well, you can use the
"-delta" gitattribute (that is, unset the "delta" attribute for those
paths) so that git doesn't even try them.
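
A minimal sketch of turning delta compression off for some paths,
assuming the repository is bare (as in the original report), in which
case the attributes go in $GIT_DIR/info/attributes because there is no
working tree to hold a .gitattributes file. The patterns `*.iso` and
`*.mp4` and the sample path `disk.iso` are hypothetical; substitute
whatever large, poorly-compressing files the repository actually holds.

```shell
# Scratch bare repository so the sketch can be run safely as-is.
repo=$(mktemp -d)
git init --bare -q "$repo"

# In a bare repository, attributes are read from info/attributes.
mkdir -p "$repo/info"
printf '%s\n' '*.iso -delta' '*.mp4 -delta' >> "$repo/info/attributes"

# Ask git how it resolves the delta attribute for a sample path.
git -C "$repo" check-attr delta -- disk.iso
```

In a non-bare repository the same two pattern lines can live in a
.gitattributes file at the top of the working tree instead.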

> Oddly enough, I'm having trouble reproducing my issue with anything
> but git fsck.  Clones were failing in the past, but after a successful
> git gc, everything seems to be ok(?)

Memory usage for clone should improve after a gc, as we will mostly be
reusing deltas from disk instead of trying to find new ones between
packs.

-Peff