git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: "Iucha, Florin" <Florin.Iucha@amd.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: High locking contention during repack?
Date: Wed, 12 Dec 2018 06:24:10 -0500	[thread overview]
Message-ID: <20181212112409.GB30673@sigill.intra.peff.net> (raw)
In-Reply-To: <SN1PR12MB23840AFE62E41D908A40D1B095A70@SN1PR12MB2384.namprd12.prod.outlook.com>

On Wed, Dec 12, 2018 at 03:01:47AM +0000, Iucha, Florin wrote:

> I am running “git-repack  -A -d -f -F --window=250 --depth=250” on a
> Git repository converted using git-svn.

Sort of tangential to your question, but:

  - Using "-F" already implies "-f", so you don't need both. That said,
    "-F" probably just wastes CPU unless you have changed your zlib
    compression settings since the last pack. (Whereas "-f" is useful,
    if you're willing to pay the CPU tradeoff.)

  - Using --depth=250 does not actually decrease the packfile size very
    much, and results in a packfile which is more expensive for
    subsequent processes to use. Some experimentation showed that
    --depth=50 is a sweet spot, and that is the default for both normal
    "git gc" and "git gc --aggressive" these days.

    See 07e7dbf0db (gc: default aggressive depth to 50, 2016-08-11) for
    more discussion.

> The system is a 16 core / 32 thread Threadripper with 128GB of RAM and
> NVMe storage. The repack starts strong, with 32 threads but it fairly
> quickly gets to 99% done and the number of threads drops to 4 then 3
> then 2. However, running “dstat 5” I see lots of “sys” time without
> any IO time (the network traffic you see is caused by SSH).

This sounds mostly normal and expected. The parallel part of a repack is
the delta search, which is not infinitely parallelizable. Each worker
thread is given a "chunk" of objects, and it uses a sliding window to
search for delta pairs through that chunk. You don't want a chunk that
approaches the window size, since at every chunk boundary you're missing
delta possibilities.

The initial chunk size is about 1/nr_threads of the total list size
(i.e., we portion out all of the work up front). Then, when a thread
finishes, we take work from the thread with the most work remaining
and split it between the two. However, at some point the chunks
approach their minimum size, and we stop dividing. So the number of
active threads will drop, eventually to 1, and you'll be waiting on it
to finish that final chunk.

So that's all working as planned. Having high sys load does seem a bit
odd. Most of the effort should be going to reading the mmap'd data from
disk, zlib-inflating it and computing a fingerprint, and then comparing
the fingerprints. So that would mostly be user time.

> Running a strace on the running git-repack process shows only these:
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
> 
> Any idea on how to debug this? I have ran git-repack under gdb, but it seems to spin on builtin/repack.c line 409.

The heavy lifting here is done by the pack-objects child process, not
git-repack itself. Try running with GIT_TRACE=1 in the environment to
see the exact invocation. But I'd expect that timing and debugging
something like:

  git pack-objects --all --no-reuse-delta --delta-base-offset --stdout \
    </dev/null >/dev/null

should produce interesting results.

The SIGALRM loop you see above is likely just the progress meter's
once-per-second timer (the worker threads just update an int, and
roughly once per second we display its current value).

-Peff
