From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: Martin Fick <mfick@codeaurora.org>,
git discussion list <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)
Date: Thu, 24 May 2012 14:05:16 +0200 [thread overview]
Message-ID: <4FBE23FC.5070405@alum.mit.edu> (raw)
In-Reply-To: <20120522173355.GC11600@sigill.intra.peff.net>
On 05/22/2012 07:33 PM, Jeff King wrote:
> On Tue, May 22, 2012 at 03:28:32PM +0200, Michael Haggerty wrote:
>
>>> When I try it with both 'next' and v1.7.10, I see that the latter is
>>> much faster. I did my tests with a warm cache, but the interesting
>>> number is the CPU time, which is quite different.
>>
>> I cannot reproduce anything as big as the performance regression that
>> you see. I did find a regression 9.5 s -> 10.1 s caused by
>>
>> 5fa044184 find_containing_dir(): use strbuf in implementation of this
>> function
>>
>> It is fixed by the patch that I just sent to the mailing list [1],
>> which sizes the strbuf in that function to strlen(refname) instead of
>> PATH_MAX. Since your experiments suggest that the performance
>> regression is related to the size of the repository contents, it
>> could be that the same test produces more memory pressure on your
>> system and therefore a larger effect. Please try the patch and tell
>> me if it fixes the problem for you.
>
> That patch drops about a second off of the slow case, but the main
> problem still remains. Just to be sure we are both doing the exact same
> thing, here is a complete reproduction recipe:
>
> GIT=/path/to/your/git/checkout
> RAILS=/path/to/unpacked/rails.git
>
> cd $GIT&&
> git checkout 432ad41e60cedb87ceec446ab034d46a53f5f9d8^&&
> make&&
>
> cd $RAILS&&
> time $GIT/bin-wrappers/git fetch . refs/*:refs/*&&
>
> cd $GIT&&
> git checkout 432ad41e60cedb87ceec446ab034d46a53f5f9d8&&
> make&&
>
> cd $RAILS&&
> time $GIT/bin-wrappers/git fetch . refs/*:refs/*
>
> produces:
>
> [before]
> real 0m9.128s
> user 0m9.369s
> sys 0m0.976s
>
> [after]
> real 0m15.926s
> user 0m16.181s
> sys 0m0.984s
>
> I don't think memory pressure is involved. The git process maxes out at
> ~300M, and this machine has 7.5G available.
>
> I wonder why we are getting different results. Could it be compilation
> options? As I mentioned, I compile with -O0, but I got similar results
> with -O3.
Thanks for the idiot-proof reproduction steps. I don't know what I was
doing differently last time, but now I can reproduce your slowdown.
The thing that triggers the problem is a reference directory
("refs/remotes/8514/pull/") with 3810 sub-namespaces
("refs/remotes/8514/pull/1091", "refs/remotes/8514/pull/1092", etc),
each subnamespace containing only one or two references. It wouldn't be
a problem having so many *references* in a namespace, because references
are just added to the end of the directory unsorted, and typically only
sorted once after all of them have been added. But every time that a
sub-namespace is accessed, the namespace has to be searched to see if
that sub-namespace already exists. The searching requires the namespace
to be sorted. So, for example, when adding the following sequence of
references:
1. refs/remotes/8514/pull/1091/head
2. refs/remotes/8514/pull/1091/merge
3. refs/remotes/8514/pull/1092/head
4. refs/remotes/8514/pull/1092/merge
5. refs/remotes/8514/pull/1093/head
6. refs/remotes/8514/pull/1093/merge
At step 1, sub-namespace "refs/remotes/8514/pull/1091/" is added to
"refs/remotes/8514/pull/". This makes the code think that
"refs/remotes/8514/pull/" is unsorted.
Step 2 is not problematic; the new references is added to
"refs/remotes/8514/pull/1091/", but adding a reference doesn't require
the ref_dir to be sorted.
At step 3, namespace "refs/remotes/8514/pull/" is first checked to see
if sub-namespace "refs/remotes/8514/pull/1092/" already exists. This
search requires namespace "refs/remotes/8514/pull/" to be sorted because
step 1 caused it to be considered unsorted.
Again at step 5, namespace "refs/remotes/8514/pull/" needs to be sorted,
and so on every time a subnamespace is added.
I will submit a patch shortly.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
next prev parent reply other threads:[~2012-05-24 12:12 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-21 8:13 remove_duplicates() in builtin/fetch-pack.c is O(N^2) Michael Haggerty
2012-05-21 9:09 ` Junio C Hamano
2012-05-21 9:42 ` demerphq
2012-05-21 17:45 ` Jeff King
2012-05-21 22:14 ` Jeff King
2012-05-21 22:17 ` [PATCH 1/5] fetch-pack: sort incoming heads Jeff King
2012-05-22 20:08 ` Junio C Hamano
2012-05-22 20:23 ` Jeff King
2012-05-24 6:04 ` Jeff King
2012-05-21 22:17 ` [PATCH 2/5] fetch-pack: avoid quadratic behavior in remove_duplicates Jeff King
2012-05-21 22:19 ` [PATCH 3/5] add sorting infrastructure for list refs Jeff King
2012-05-21 22:19 ` [PATCH 4/5] fetch-pack: sort the list of incoming refs Jeff King
2012-05-21 22:23 ` [PATCH 5/5] fetch-pack: avoid quadratic loop in filter_refs Jeff King
2012-05-22 20:16 ` Junio C Hamano
2012-05-21 23:52 ` remove_duplicates() in builtin/fetch-pack.c is O(N^2) Jeff King
2012-05-22 0:07 ` Jeff King
2012-05-22 3:59 ` Michael Haggerty
2012-05-22 4:11 ` Jeff King
2012-05-22 7:15 ` Michael Haggerty
2012-05-22 7:37 ` Jeff King
2012-05-22 13:28 ` Michael Haggerty
2012-05-22 17:33 ` Jeff King
2012-05-24 12:05 ` Michael Haggerty [this message]
2012-05-25 0:17 ` Martin Fick
2012-05-25 0:39 ` Jeff King
2012-05-25 0:54 ` Martin Fick
2012-05-25 1:04 ` Jeff King
2012-05-25 1:32 ` Martin Fick
2012-05-25 6:50 ` Jeff King
2012-05-22 12:18 ` Nguyen Thai Ngoc Duy
2012-05-22 13:35 ` Michael Haggerty
2012-05-22 17:01 ` Jeff King
2012-05-22 17:35 ` Junio C Hamano
2012-05-22 17:46 ` Jeff King
2012-05-24 4:54 ` Michael Haggerty
2012-05-23 1:20 ` Nguyen Thai Ngoc Duy
2012-05-22 16:16 ` Junio C Hamano
2012-05-21 18:15 ` Martin Fick
2012-05-21 19:41 ` Jeff King
2012-05-21 22:08 ` Junio C Hamano
2012-05-21 22:24 ` Jeff King
2012-05-22 5:51 ` Martin Fick
2012-05-22 18:21 ` Jeff King
2012-05-22 22:19 ` Martin Fick
2012-05-22 23:23 ` Junio C Hamano
2012-05-23 0:46 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FBE23FC.5070405@alum.mit.edu \
--to=mhagger@alum.mit.edu \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mfick@codeaurora.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).