From: Martin Fick <mfick@codeaurora.org>
To: Jeff King <peff@peff.net>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
git discussion list <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)
Date: Tue, 22 May 2012 16:19:30 -0600 [thread overview]
Message-ID: <201205221619.31738.mfick@codeaurora.org> (raw)
In-Reply-To: <20120522182157.GB20305@sigill.intra.peff.net>
On Tuesday, May 22, 2012 12:21:57 pm Jeff King wrote:
> On Mon, May 21, 2012 at 11:51:16PM -0600, Martin Fick
wrote:
> AFAIK, ref advertisement scales linearly. Which is
> probably not acceptable over a slow link, and we could
> do better.
Right, although it is not just a slow link issue. Linear or
not, this problem does not scale, it needs to be based on N,
the number of refs which need updating, not N the number of
refs in the repo.
I gave the following numbers to Junio and Shawn recently and
I figure it is probably worth mentioning them here to
perhaps give others some insights into just how bad this
problem can be...
Android consists of approximately 400 projects, and if
anyone tags their builds regularly, that means that they
will be tagging 400 projects per build. We have something
like 10K tags on average across 400 projects, so when we do
a simple No Op sync just to see if all 400 projects are up
to date, this yields about 200MB of data over the wire (4M
refs)!!!
That 200MB is using a -j4 on repo sync and it chews up about
1.5 cores on a high end server to serve that 200MB for over
1min. Now imagine a build bot needing to build about 25
images in parallel all just doing a No Op sync to see if
they are up to date! That's 25 * 200MB = 5GB data and 25 *
1.5 cores ~= 36 cores. That means our high end 24 core
server falls over all for a No Op.
As you can imagine, we really would like to see this
eliminated from our workflows. If we want to check 400
projects to see if they are up to date, it should be 400
refs, not 400 * 10K,
-Martin
--
Employee of Qualcomm Innovation Center, Inc. which is a
member of Code Aurora Forum
next prev parent reply other threads:[~2012-05-22 22:19 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-21 8:13 remove_duplicates() in builtin/fetch-pack.c is O(N^2) Michael Haggerty
2012-05-21 9:09 ` Junio C Hamano
2012-05-21 9:42 ` demerphq
2012-05-21 17:45 ` Jeff King
2012-05-21 22:14 ` Jeff King
2012-05-21 22:17 ` [PATCH 1/5] fetch-pack: sort incoming heads Jeff King
2012-05-22 20:08 ` Junio C Hamano
2012-05-22 20:23 ` Jeff King
2012-05-24 6:04 ` Jeff King
2012-05-21 22:17 ` [PATCH 2/5] fetch-pack: avoid quadratic behavior in remove_duplicates Jeff King
2012-05-21 22:19 ` [PATCH 3/5] add sorting infrastructure for list refs Jeff King
2012-05-21 22:19 ` [PATCH 4/5] fetch-pack: sort the list of incoming refs Jeff King
2012-05-21 22:23 ` [PATCH 5/5] fetch-pack: avoid quadratic loop in filter_refs Jeff King
2012-05-22 20:16 ` Junio C Hamano
2012-05-21 23:52 ` remove_duplicates() in builtin/fetch-pack.c is O(N^2) Jeff King
2012-05-22 0:07 ` Jeff King
2012-05-22 3:59 ` Michael Haggerty
2012-05-22 4:11 ` Jeff King
2012-05-22 7:15 ` Michael Haggerty
2012-05-22 7:37 ` Jeff King
2012-05-22 13:28 ` Michael Haggerty
2012-05-22 17:33 ` Jeff King
2012-05-24 12:05 ` Michael Haggerty
2012-05-25 0:17 ` Martin Fick
2012-05-25 0:39 ` Jeff King
2012-05-25 0:54 ` Martin Fick
2012-05-25 1:04 ` Jeff King
2012-05-25 1:32 ` Martin Fick
2012-05-25 6:50 ` Jeff King
2012-05-22 12:18 ` Nguyen Thai Ngoc Duy
2012-05-22 13:35 ` Michael Haggerty
2012-05-22 17:01 ` Jeff King
2012-05-22 17:35 ` Junio C Hamano
2012-05-22 17:46 ` Jeff King
2012-05-24 4:54 ` Michael Haggerty
2012-05-23 1:20 ` Nguyen Thai Ngoc Duy
2012-05-22 16:16 ` Junio C Hamano
2012-05-21 18:15 ` Martin Fick
2012-05-21 19:41 ` Jeff King
2012-05-21 22:08 ` Junio C Hamano
2012-05-21 22:24 ` Jeff King
2012-05-22 5:51 ` Martin Fick
2012-05-22 18:21 ` Jeff King
2012-05-22 22:19 ` Martin Fick [this message]
2012-05-22 23:23 ` Junio C Hamano
2012-05-23 0:46 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201205221619.31738.mfick@codeaurora.org \
--to=mfick@codeaurora.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).