git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "W. David Jarvis" <william.d.jarvis@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: Reducing CPU load on git server
Date: Wed, 31 Aug 2016 02:02:41 -0400	[thread overview]
Message-ID: <20160831060240.xg6axpqnlclys7ic@sigill.intra.peff.net> (raw)
In-Reply-To: <CAFMAO9zfMtF6Vc+Uyt+UdgOVf4hzOqMN3o8QLf8pRHx3_U4DPQ@mail.gmail.com>

On Mon, Aug 29, 2016 at 03:41:59PM -0700, W. David Jarvis wrote:

> We have an open support thread with the GHE support folks, but the
> only feedback we've gotten so far on this subject is that they believe
> our CPU load is being driven by the quantity of fetch operations (an
> average of 16 fetch requests per second during normal business hours,
> so 4 requests per second per subscriber box). About 4,000 fetch
> requests on our main repository per day.

Hmm. That might well be, but I'd have to see the numbers to say more. At
this point I think we're exhausting what is useful to talk about on the
Git list.  I'll point GHE Support at this thread, which might help your
conversation with them (and they might pull me in behind the scenes).

I'll try to answer any Git-specific questions here, though.

> > None of those operations is triggered by client fetches. You'll see
> > "rev-list" for a variety of operations, so that's hard to pinpoint. But
> > I'm surprised that "prune" is a common one for you. It is run as part of
> > the post-push, but I happen to know that the version that ships on GHE
> > is optimized to use bitmaps, and to avoid doing any work when there are
> > no loose objects that need pruning in the first place.
> 
> Would regular force-pushing trigger prune operations? Our engineering
> body loves to force-push.

No, it shouldn't make a difference. For stock git, "prune" will only be
run occasionally as part of "git gc". On GitHub Enterprise, every push
kicks off a "sync" job that moves objects from a specific fork into
storage shared by all of the related forks. So GHE will run prune more
often than stock git would, but force-pushing wouldn't have any effect
on that.

There are also some custom patches to optimize prune on GHE, so it
shouldn't generally be very expensive. Unless perhaps for some reason
the reachability bitmaps on your repository aren't performing very well.

You could try something like comparing:

  time git rev-list --objects --all >/dev/null

and

  time git rev-list --objects --all --use-bitmap-index >/dev/null

on your server. The second should be a lot faster. If it's not, that may
be an indication that Git could be doing a better job of selecting
bitmap commits (that code is not GitHub-specific at all).

> >> I might be misunderstanding this, but if the subscriber is already "up
> >> to date" modulo a single updated ref tip, then this problem shouldn't
> >> occur, right? Concretely: if ref A is built off of ref B, and the
> >> subscriber already has B when it requests A, that shouldn't cause this
> >> behavior, but it would cause this behavior if the subscriber didn't
> >> have B when it requested A.
> >
> > Correct. So this shouldn't be a thing you are running into now, but it's
> > something that might be made worse if you switch to fetching only single
> > refs.
> 
> But in theory if we were always up-to-date (since we'd always fetch
> any updated ref) we wouldn't run into this problem? We could have a
> cron job to ensure that we run a full git fetch every once in a while
> but I'd hope that if this was written properly we'd almost always have
> the most recent commit for any dependency ref.

It's a little more complicated than that. What you're really going for
is letting git reuse on-disk deltas when serving fetches. But depending
on when the last repack was run, we might be cobbling together the fetch
from multiple packs on disk, in which case there will still be some
delta search. In my experience that's not _generally_ a big deal,
though. Small fetches don't have that many deltas to find.

> Our primary repository is fairly busy. It has about 1/3 the commits of
> Linux and about 1/3 the refs, but seems otherwise to be on the same
> scale. And, of course, it both hasn't been around for as long as Linux
> has and has been experiencing exponential growth, which means its
> current activity is higher than it has ever been before -- might put
> it on a similar scale to Linux's current activity.

Most of the work for repacking scales with the number of total objects
(not quite linearly, though).  For torvalds/linux (and its forks),
that's around 16 million objects. You might try "git count-objects -v"
on your server for comparison (but do it in the "network.git" directory,
as that's the shared object storage).

-Peff

  reply	other threads:[~2016-08-31  6:02 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-28 19:42 Reducing CPU load on git server W. David Jarvis
2016-08-28 21:20 ` Jakub Narębski
2016-08-28 23:18   ` W. David Jarvis
2016-08-29  5:47 ` Jeff King
2016-08-29 10:46   ` Jakub Narębski
2016-08-29 17:18     ` Jeff King
2016-08-29 19:16   ` W. David Jarvis
2016-08-29 21:31     ` Jeff King
2016-08-29 22:41       ` W. David Jarvis
2016-08-31  6:02         ` Jeff King [this message]
2016-08-30 10:46       ` git blame <directory> [was: Reducing CPU load on git server] Jakub Narębski
2016-08-31  5:42         ` Jeff King
2016-08-31  7:28           ` Dennis Kaarsemaker
2016-08-29 20:14 ` Reducing CPU load on git server Ævar Arnfjörð Bjarmason
2016-08-29 20:57   ` W. David Jarvis
2016-08-29 21:31     ` Dennis Kaarsemaker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160831060240.xg6axpqnlclys7ic@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=william.d.jarvis@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).