git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>,
	me@ttaylorr.com, peff@peff.net,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 1/3] commit-reach: use one walk in remove_redundant()
Date: Sun, 31 Jan 2021 22:55:52 -0500	[thread overview]
Message-ID: <5d1e909e-b2e2-afda-d8ee-a3c32d25b01d@gmail.com> (raw)
In-Reply-To: <xmqqft2gonuh.fsf@gitster.c.googlers.com>

On 1/31/2021 3:25 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>>> So, the inner loop makes sure we won't revisit STALE parent, but
>>> keep digging parents we haven't seen, and stop when the generation
>>> is old enough.  What happens when there is no generation number
>>> computed yet, I wonder...  We'll keep getting infinity and dig all
>>> the way down to root?
>>
>> If we are on commits that have no generation number yet, then we
>> will walk until reaching commits in the commit-graph file that have
>> a computed generation (or in the heuristic case, when we have reached
>> all but one of the commits).
>>
>> In the case of the commit-graph, all commits will have generation
>> number "infinity". In such a case, perhaps the old algorithm _is_
>> the best we can do, at least for now.
> 
> Hmph, I am afraid that such is life X-<.
> 
>> One way to ensure we do not regress from the current behavior
>> would be to condition the new algorithm with
>>
>> 	if (generation_numbers_enabled(the_repository))
>> 		new_algorithm();
>> 	else
>> 		old_algorithm();
>>
>> much like in repo_is_descendant_of().
>>
>> Is that a good plan?
> 
> It would certainly avoid one particular form of regression, so it is
> better than nothing.
> 
> But at the same time, we'd probably want to encourage people to
> enable and maintain generation numbers for the majority of commits
> in their repository, but unless you "gc" twice a day or something,
> you'd inevitably have a mixture, say, all commits that are more than
> two weeks old are covered by commit-graph, but more recent ones are
> not yet enumerated, and you have to traverse at runtime.
> 
> And the performance characteristics we would care the most in the
> longer term is to make sure that we perform well in such a mixed
> environment for the parts of the history that are not old enough.

You're right, that in the mixed case of "all of these input commits
have infinite generation number" is the worst possible case for
this algorithm. However, if even a single commit has a finite
generation number, then this new algorithm should out-perform the
one using paint_down_to_common() in all cases.

I tested the painful case of

	git merge-base --independent 2ecedd756908 d2360a398f0b

in the Linux kernel repository with different commit-graph files.
I found that the new algorithm performed worse than the old
algorithm until there were fewer than 2,500 commits reachable
from these two but not in the commit-graph file. That's probably
too small of a gap to expect of a typical user.

I will modify my check

	if (generation_numbers_enabled(r)) {
		int i;

		/*
		 * If we have a single commit with finite generation
		 * number, then the _with_gen algorithm is preferred.
		 */
		for (i = 0; i < cnt; i++) {
			if (commit_graph_generation(array[i]) < GENERATION_NUMBER_INFINITY)
				return remove_redundant_with_gen(r, array, cnt);
		}
	}

	return remove_redundant_no_gen(r, array, cnt);

> Many things can be sped up by precomputing and storing the result in
> the commit-graph file and that is not all that interesting or
> surprising part of the story, I would think.  Rather, we want to
> ensure that we do not perform on the youngest part of the history
> any worse---that way, people will have strong incentive to enable
> commit-graph, as things will work superbly for older parts of the
> history, while not doing any worse than the original system for the
> newest parts of the history.

I'm definitely biased to a case where there is an up-to-date
commit-graph file, since I care most about server-side performance
or users with background maintenance enabled (and computing the
commit-graph frequently). You are right to call out that that is
not always the case.
 
> There was a side thread where somebody wished if they can remove
> support for all the codepaths that do not use commit-graph, but
> would this be an example of how such a wish is impractical, I have
> to wonder?

Some cases might allow de-duplication, such as
sort_in_topological_order(), that would have roughly the same
performance, give or take some overhead. It takes time to check
and see if that is the case, however.

Thanks,
-Stolee

  reply	other threads:[~2021-02-01  3:56 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-28 16:24 [PATCH 0/3] Speed up remove_redundant() Derrick Stolee via GitGitGadget
2021-01-28 16:24 ` [PATCH 1/3] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
2021-01-28 20:51   ` Junio C Hamano
2021-01-29 17:11     ` René Scharfe
2021-01-31  3:52       ` Derrick Stolee
2021-01-31 10:20         ` René Scharfe
2021-01-31  3:59     ` Derrick Stolee
2021-01-31 20:13       ` Derrick Stolee
2021-01-31 20:25       ` Junio C Hamano
2021-02-01  3:55         ` Derrick Stolee [this message]
2021-01-29 17:10   ` René Scharfe
2021-01-28 16:24 ` [PATCH 2/3] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
2021-01-28 16:24 ` [PATCH 3/3] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
2021-01-28 20:20 ` [PATCH 0/3] Speed up remove_redundant() Junio C Hamano
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2021-02-01 12:47   ` [PATCH v2 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
2021-02-01 19:51     ` Junio C Hamano
2021-02-01 12:47   ` [PATCH v2 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-01 16:12     ` René Scharfe.
2021-02-01 16:31       ` Derrick Stolee
2021-02-01 12:47   ` [PATCH v2 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
2021-02-01 12:47   ` [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-01 20:05     ` Junio C Hamano
2021-02-01 21:02       ` Derrick Stolee
2021-02-01 12:47   ` [PATCH v2 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget
2021-02-03 15:59     ` Taylor Blau
2021-02-01 15:48   ` [PATCH v2 0/5] Speed up remove_redundant() Derrick Stolee
2021-02-18 23:25     ` Junio C Hamano
2021-02-19 12:17       ` Derrick Stolee
2021-02-20  3:32         ` Junio C Hamano
2021-02-19 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d1e909e-b2e2-afda-d8ee-a3c32d25b01d@gmail.com \
    --to=stolee@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).