Re: Question about the ahead-behind computation and status

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff Hostetler <git@jeffhostetler.com>, Jeff King <peff@peff.net>,
	Git Mailing List <git@vger.kernel.org>,
	Ben Peart <peartben@gmail.com>,
	Jameson Miller <jameson.miller81@gmail.com>
Subject: Re: Question about the ahead-behind computation and status
Date: Fri, 15 Dec 2017 14:40:28 -0500	[thread overview]
Message-ID: <5fd76b6f-15d0-b8ca-710b-d6289a63b9b4@gmail.com> (raw)
In-Reply-To: <xmqqa7yjrghd.fsf@gitster.mtv.corp.google.com>

On 12/15/2017 1:30 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> The biggest reason for the 20 seconds is not just the number of
>> commits in the ahead/behind but how many commits are walked (including
>> common to both branches) before paint_down_to_common() breaks its
>> while loop due to queue_has_nonstale().
> Hmm, queue_has_nonstale() looks to see if any element is not STALE
> (where the definition of STALE is "known to be a common ancestor")
> by potentially checking all elements in the queue.  I wonder if we
> have an opportunity for a trivial optimization?  When the caller
> knows that it dug one level and added the parents that are not
> stale, it does not have to ask queue_has_nonstale() if there is any
> non stale element, for example.

I thought this, too, but my tracing did not show significant time spent 
in this method. 99% of the time is spent unzipping and parsing commits.

If this was taking too long, then we could track a minimum timestamp for 
a commit that entered the queue in a non-stale state, but this will 
delay the termination condition a bit since commits can be marked stale 
after they enter the queue.

> What do you exactly mean by "not just the number of commits in the
> ahead/behind"?  Aren't the number of these commits pretty much
> proportional to the number of commits we need to paint down to
> common ancestors?  Is the latter a lot larger than the former
> (i.e. are we somehow not stopping when we _could_ notice that we
> can with better information)?
>
>

With the wide history, there is actually a large set of commits that are 
in the common history but have newer commit times than the oldest commit 
in only one history. Consider the following ASCII art:

   A
   |
   1
   |
   2
   |
   3
   |\
   4 B
   |\|
   5 C
   |\|
   6 D
   |\|
    .
    .
    .
   |\|
   N Z
   |/
   0

Between A and B, A is ahead by commits {A,1,2,3,4,5,6,...,N}. Meanwhile, 
commits B,C,D,...,Z are in the common history, but still must be walked.

Now imagine these two sets are actually much MUCH wider (thousands of 
commits that are pairwise non-ancestors). This causes the number of 
walked commits to be larger than just the number of commits in the 
symmetric difference of the histories.

Unfortunately, generation numbers are not the only optimization needed 
to make this call be sub-100ms. A graph data structure that lists the 
edges between commits would prevent the time spent in zlib decompressing 
and parsing commits. I'm working on investigating how such a data 
structure (and corresponding file format) could integrate in the 
commit-walking code.

Thanks,
-Stolee

     prev parent reply	other threads:[~2017-12-15 19:40 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-14 21:49 Question about the ahead-behind computation and status Jeff Hostetler
2017-12-15 10:08 ` Jeff King
2017-12-15 15:08   ` Jeff Hostetler
2017-12-15 15:43     ` Derrick Stolee
2017-12-15 18:30       ` Junio C Hamano
2017-12-15 19:40         ` Derrick Stolee [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5fd76b6f-15d0-b8ca-710b-d6289a63b9b4@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jameson.miller81@gmail.com \
    --cc=peartben@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).