From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff Hostetler <git@jeffhostetler.com>, Jeff King <peff@peff.net>,
Git Mailing List <git@vger.kernel.org>,
Ben Peart <peartben@gmail.com>,
Jameson Miller <jameson.miller81@gmail.com>
Subject: Re: Question about the ahead-behind computation and status
Date: Fri, 15 Dec 2017 14:40:28 -0500 [thread overview]
Message-ID: <5fd76b6f-15d0-b8ca-710b-d6289a63b9b4@gmail.com> (raw)
In-Reply-To: <xmqqa7yjrghd.fsf@gitster.mtv.corp.google.com>
On 12/15/2017 1:30 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> The biggest reason for the 20 seconds is not just the number of
>> commits in the ahead/behind but how many commits are walked (including
>> common to both branches) before paint_down_to_common() breaks its
>> while loop due to queue_has_nonstale().
> Hmm, queue_has_nonstale() looks to see if any element is not STALE
> (where the definition of STALE is "known to be a common ancestor")
> by potentially checking all elements in the queue. I wonder if we
> have an opportunity for a trivial optimization? When the caller
> knows that it dug one level and added the parents that are not
> stale, it does not have to ask queue_has_nonstale() if there is any
> non stale element, for example.
I thought this, too, but my tracing did not show significant time spent
in this method. 99% of the time is spent unzipping and parsing commits.
If this was taking too long, then we could track a minimum timestamp for
a commit that entered the queue in a non-stale state, but this will
delay the termination condition a bit since commits can be marked stale
after they enter the queue.
> What do you exactly mean by "not just the number of commits in the
> ahead/behind"? Aren't the number of these commits pretty much
> proportional to the number of commits we need to paint down to
> common ancestors? Is the latter a lot larger than the former
> (i.e. are we somehow not stopping when we _could_ notice that we
> can with better information)?
>
>
With the wide history, there is actually a large set of commits that are
in the common history but have newer commit times than the oldest commit
in only one history. Consider the following ASCII art:
A
|
1
|
2
|
3
|\
4 B
|\|
5 C
|\|
6 D
|\|
.
.
.
|\|
N Z
|/
0
Between A and B, A is ahead by commits {A,1,2,3,4,5,6,...,N}. Meanwhile,
commits B,C,D,...,Z are in the common history, but still must be walked.
Now imagine these two sets are actually much MUCH wider (thousands of
commits that are pairwise non-ancestors). This causes the number of
walked commits to be larger than just the number of commits in the
symmetric difference of the histories.
Unfortunately, generation numbers are not the only optimization needed
to make this call be sub-100ms. A graph data structure that lists the
edges between commits would prevent the time spent in zlib decompressing
and parsing commits. I'm working on investigating how such a data
structure (and corresponding file format) could integrate in the
commit-walking code.
Thanks,
-Stolee
prev parent reply other threads:[~2017-12-15 19:40 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-14 21:49 Question about the ahead-behind computation and status Jeff Hostetler
2017-12-15 10:08 ` Jeff King
2017-12-15 15:08 ` Jeff Hostetler
2017-12-15 15:43 ` Derrick Stolee
2017-12-15 18:30 ` Junio C Hamano
2017-12-15 19:40 ` Derrick Stolee [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5fd76b6f-15d0-b8ca-710b-d6289a63b9b4@gmail.com \
--to=stolee@gmail.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jameson.miller81@gmail.com \
--cc=peartben@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).