git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Colin Stolley" <cstolley@runbox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH] packfile.c: speed up loading lots of packfiles.
Date: Tue, 3 Dec 2019 17:17:30 -0500	[thread overview]
Message-ID: <20191203221730.GA28419@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqq7e3d75vk.fsf@gitster-ct.c.googlers.com>

On Tue, Dec 03, 2019 at 08:04:15AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Good catch. The issue is that we only add entries to the hashmap in
> > prepare_packed_git(), but they may be added to the pack list by other
> > callers of install_packed_git(). It probably makes sense to just push
> > the hashmap maintenance down into that function, like below.
> 
> Makes sense to me.
> 
> Let me locally squash your fix in and credit you with helped-by
> footer in the amended log message.  Strictly speaking, this may
> invalidate the perf numbers, but I do not think the scenario p5303
> sets up alone is all that interesting anyway---if you have 10,000
> packs, not just registering them (which is improved with this patch)
> but using objects from them would be slower than necessary X-<.

Thanks, that sounds good.

I actually re-checked the perf numbers (mostly to make sure I didn't
screw anything up) and got similar results.

I agree that 10,000 packs is ridiculous, but we do see it (and worse)
occasionally from people pushing in a loop before our scheduled
maintenance kicks in.  And it's quadratic, so if you hit 30k packs, it's
another factor of 9 worse. It makes even diagnosing the repository
pretty painful. :)

Also a fun fact: Linux actually has a limit on the number of
simultaneous mmaps that a process can have, which defaults to ~64k. But
if you have if you have 32k packs, then we'll map both the packs and the
idx files. Plus whatever you need for mapping the binary, libraries,
etc, plus any maps opened by malloc() for large requests.

I have occasionally run into this trying to repack some very
out-of-control cases (the magic fix is to double the limit with `sysctl
-w vm.max_map_count=131060`, if you are curious). I also wondered if
this might be made worse by the recent change to drop
release_pack_memory(). But I ran into it even before that change,
because zlib calls malloc() directly. We're also pretty aggressive about
dying when mmap() returns an error (rather than closing packs and trying
again).

I think Git _could_ be handle this more gracefully by just trying to
keep fewer packs open at one time (the way we similarly try not to use
up all of the file descriptors). But I have a hard time caring too much,
since it's such a ridiculous situation in the first place. Bumping the
limits is an easy operational fix.

-Peff

  parent reply	other threads:[~2019-12-03 22:17 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-27 22:24 [PATCH] packfile.c: speed up loading lots of packfiles Colin Stolley
2019-11-28  0:42 ` hashmap vs khash? " Eric Wong
2019-11-30 17:36   ` Junio C Hamano
2019-12-02 14:39   ` Jeff King
2019-12-02 17:40 ` SZEDER Gábor
2019-12-02 19:42   ` Jeff King
2019-12-03  6:17     ` Taylor Blau
2019-12-03 15:34       ` Jeff King
2019-12-03 16:04     ` Junio C Hamano
2019-12-03 17:33       ` Colin Stolley
2019-12-03 22:18         ` Jeff King
2019-12-04 18:15           ` Junio C Hamano
2019-12-03 22:17       ` Jeff King [this message]
2019-12-04  4:23         ` Jonathan Nieder
2019-12-03  6:19 ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191203221730.GA28419@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=cstolley@runbox.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).