From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: upload-pack is slow with lots of refs
Date: Wed, 3 Oct 2012 22:16:56 +0200
Message-ID: <CACBZZX4Grya=FbL9XEh_EK6KVsFZYWCuHveV2QevcBwr+iYTMQ@mail.gmail.com>
In-Reply-To: <20121003180324.GB27446@sigill.intra.peff.net>

On Wed, Oct 3, 2012 at 8:03 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Oct 03, 2012 at 02:36:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I'm creating a system where a lot of remotes constantly fetch from a
>> central repository for deployment purposes, but I've noticed that even
>> with a remote.$name.fetch configuration to only get certain refs, a
>> "git fetch" will still call git-upload-pack, which will provide a list
>> of all references.
>>
>> This is being done against a repository with tens of thousands of refs
>> (it has a tag for each deployment), so it ends up burning a lot of CPU
>> time on the uploader/receiver side.
>
> Where is the CPU being burned? Are your refs packed (that's a huge
> savings)? What are the refs like? Are they .have refs from an alternates
> repository, or real refs? Are they pointing to commits or tag objects?
>
> What version of git are you using? In the past year or so, I've made
> several tweaks to speed up large numbers of refs, including:
>
> - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note
>   that this only helps if they are being pulled in by an alternates
>   repo. And even then, it only helps if they are mostly duplicates;
>   distinct ones are still O(n^2).
>
> - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
>   a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
>   Both in v1.7.11. I think there is still a potential quadratic loop
>   in mark_complete().
>
> - 90108a2 (upload-pack: avoid parsing tag destinations)
>   926f1dd (upload-pack: avoid parsing objects during ref advertisement)
>   Both in v1.7.10. Note that tag objects are more expensive to
>   advertise than commits, because we have to load and peel them.
>
> Even with those patches, though, I found that it took something like 2s
> to advertise 100,000 refs.
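
The "are your refs packed / are they tag objects" questions can be
answered from the repository's packed-refs file. Below is a minimal
sketch; the function name summarize_packed_refs is made up for
illustration, and it assumes the packed-refs layout of this era: an
optional "# pack-refs with: ..." header, one "<sha> <refname>" line per
ref, and a "^<sha>" line after each ref that pack-refs peeled. With
the "peeled" trait of this era, those "^" lines should roughly
correspond to annotated tags under refs/tags/, so treat the second
number as an approximation.

```shell
#!/bin/sh
# summarize_packed_refs FILE (hypothetical helper): count the ref
# entries in a packed-refs file and the "^<sha>" peeled lines, which
# follow refs that point at (peeled) annotated tag objects.
summarize_packed_refs () {
	awk '
		/^#/  { next }              # "# pack-refs with: ..." header line
		/^\^/ { peeled++; next }    # "^<sha>": peeled value of the previous ref
		      { refs++ }            # "<sha> <refname>" entry
		END   { printf "refs=%d peeled=%d\n", refs, peeled }
	' "$1"
}
```

For example, "summarize_packed_refs .git/packed-refs". This only looks
at packed refs; "git for-each-ref --format='%(objecttype)' | sort |
uniq -c" gives a per-type breakdown that covers loose refs as well.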

I can't provide all the details right now (I don't have access to that
machine at the moment), but briefly:
* Both the client and the server run git 1.7.8.
* The repository has around 50k refs. They're "real" refs; almost all
  of them (say, all but 0.5k-1k) are annotated tags, and the rest are
  branches.
* >99% of them are packed; a weekly cronjob packs them all up. There
  were a few newly pushed branches and tags outside of the
* I tried "echo -n | git upload-pack <repo>" both on that 50k-ref
  repository and on a repository with <100 refs. On a 24-core box the
  former took ~1-2s to run and the latter ~500ms.
* Running git-upload-pack under GNU parallel on the same 24-core box,
  I managed around 20 runs/s against the 50k-ref repository and 40/s
  against the 100-ref one.
* A co-worker who was working on this today tried it on 1.7.12 and
  found the same performance characteristics.
* I tried to profile it by building with "gcc -pg" and running
  "echo -n | ./git-upload-pack <repo>", but that doesn't produce a
  profile, presumably because the process exits unsuccessfully.
  Maybe someone here knows offhand what mock data I could feed
  git-upload-pack so that it just lists the refs and exits cleanly,
  or, better yet, does some of the extra work it would do during a
  real fetch (I suppose I could just do a fetch, but I wanted to do
  this from a locally compiled checkout).

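On the mock-data question: upload-pack reads pkt-line framed input,
where each packet is prefixed by its total length (payload plus the
4-byte prefix) as four hex digits, and "0000" is a flush packet. A
client that wants nothing sends just a flush after the advertisement,
so I believe feeding upload-pack "0000" instead of empty input should
make it list the refs and exit cleanly; that is an untested
assumption. The helpers below (pkt_line and pkt_flush are made-up
names) only illustrate the framing.

```shell
#!/bin/sh
# pkt_line PAYLOAD (hypothetical helper): print PAYLOAD framed as a
# pkt-line, prefixed by its total length (payload + 4-byte header) in
# four lowercase hex digits. Assumes an ASCII payload, so that ${#1}
# equals the byte count.
pkt_line () {
	printf '%04x%s' $(( ${#1} + 4 )) "$1"
}

# pkt_flush (hypothetical helper): the flush packet, length "0000"
# with no payload.
pkt_flush () {
	printf '0000'
}
```

For example, "pkt_flush | ./git-upload-pack <repo> >/dev/null" should
exercise just the ref advertisement, which is also a natural thing to
run under GNU parallel or a profiler.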
>> Has there been any work on extending the protocol so that the client
>> tells the server what refs it's interested in?
>
> I don't think so. It would be hard to do in a backwards-compatible way,
> because the advertisement is the first thing the server says, before it
> has negotiated any capabilities with the client at all.

I suppose that, at least for the ssh protocol, we could just do:

    ssh server "(git upload-pack <repo> --refs=* || git upload-pack <repo>)"

and something similar with HTTP headers, but that of course still
leaves the git:// protocol unsolved.
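
Note that --refs is only a proposal here, not an existing upload-pack
option. A server honoring it would filter the advertisement before
sending it, so the client never pays for the 50k tags it doesn't want.
That is unlike remote.$name.fetch, which only narrows things after the
full list has arrived. Below is a minimal sketch of that filtering on
"git show-ref" style output; filter_refs is a made-up name, and a real
implementation would live in upload-pack's C code.

```shell
#!/bin/sh
# filter_refs PATTERN (hypothetical helper): read "<sha> <refname>"
# lines (space- or tab-separated, as printed by "git show-ref" or
# "git ls-remote") on stdin and print only those whose refname matches
# the shell glob PATTERN. In a case pattern "*" also matches "/".
filter_refs () {
	while read -r sha name; do
		case $name in
		$1) printf '%s %s\n' "$sha" "$name" ;;  # unquoted: glob match
		esac
	done
}
```

For example, "git show-ref | filter_refs 'refs/heads/*'" prints only
the branches.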