git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* merge-base --is-ancestor A B is unreasonably slow with unrelated history B
@ 2018-01-09 15:17 Ævar Arnfjörð Bjarmason
  2018-01-09 19:13 ` Junio C Hamano
  2018-01-09 19:30 ` Derrick Stolee
  0 siblings, 2 replies; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-09 15:17 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Thomas Rast, Junio C Hamano, Jeff King, Johannes Schindelin

This is a pathological case I don't have time to dig into right now:

    git branch -D orphan;
    git checkout --orphan orphan &&
    git reset --hard &&
    touch foo &&
    git add foo &&
    git commit -m"foo" &&
    time git merge-base --is-ancestor master orphan

This takes around 5 seconds on linux.git to return 1. Which is around
the same time it takes to run current master against the first commit in
linux.git:

    git merge-base --is-ancestor 1da177e4c3f4 master

This is obviously a pathological case, but maybe we should work slightly
harder on the RHS of and discover that it itself is an orphan commit.

I ran into this while writing a hook where we'd like to do:

    git diff $master...topic

Or not, depending on if the topic is an orphan or just something
recently branched off, figured I could use --is-ancestor as on
optimization, and then discovered it's not much of an optimization.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B
  2018-01-09 15:17 merge-base --is-ancestor A B is unreasonably slow with unrelated history B Ævar Arnfjörð Bjarmason
@ 2018-01-09 19:13 ` Junio C Hamano
  2018-01-09 19:30 ` Derrick Stolee
  1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2018-01-09 19:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Thomas Rast, Jeff King, Johannes Schindelin

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> This is obviously a pathological case, but maybe we should work slightly
> harder on the RHS of and discover that it itself is an orphan commit.

In order to discover a commit is an orphan, you'd need to prove not
just that it does not reach the main part of the history (which is
cheap---its parenthood network would be quite limited and traversing
all of it is not that expensive) but the other way around, i.e. the
main part of the history would not reach it.

Do you have a cheaper way to do the latter than a full traveral of
the main history?  If not, then the cost similar to "git log master"
is expected.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B
  2018-01-09 15:17 merge-base --is-ancestor A B is unreasonably slow with unrelated history B Ævar Arnfjörð Bjarmason
  2018-01-09 19:13 ` Junio C Hamano
@ 2018-01-09 19:30 ` Derrick Stolee
  2018-01-10 14:01   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 4+ messages in thread
From: Derrick Stolee @ 2018-01-09 19:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Git Mailing List
  Cc: Thomas Rast, Junio C Hamano, Jeff King, Johannes Schindelin

On 1/9/2018 10:17 AM, Ævar Arnfjörð Bjarmason wrote:
> This is a pathological case I don't have time to dig into right now:
>
>      git branch -D orphan;
>      git checkout --orphan orphan &&
>      git reset --hard &&
>      touch foo &&
>      git add foo &&
>      git commit -m"foo" &&
>      time git merge-base --is-ancestor master orphan
>
> This takes around 5 seconds on linux.git to return 1. Which is around
> the same time it takes to run current master against the first commit in
> linux.git:
>
>      git merge-base --is-ancestor 1da177e4c3f4 master
>
> This is obviously a pathological case, but maybe we should work slightly
> harder on the RHS of and discover that it itself is an orphan commit.
>
> I ran into this while writing a hook where we'd like to do:
>
>      git diff $master...topic
>
> Or not, depending on if the topic is an orphan or just something
> recently branched off, figured I could use --is-ancestor as on
> optimization, and then discovered it's not much of an optimization.

Ævar,

This is the same performance problem that we are trying to work around 
with Jeff's "Add --no-ahead-behind to status" patch [1]. For commits 
that are far apart, many commits need to be parsed. I think the right 
solution is to create a serialized commit graph that stores the 
adjacency information of the commits and can create commit structs 
quickly. This requires storing the commit id, commit date, parents, and 
root tree id to satisfy the needs of parse_commit_gently(). Once the 
framework for this data is constructed, it is simple to add generation 
numbers to that data and start consuming them in other algorithms (by 
adding the field to 'struct commit').

I'm working on such a patch right now, but it will be a few weeks before 
I'm ready.

Thanks,
-Stolee

[1] v5 of --no-ahead-behind 
https://public-inbox.org/git/20180109185018.69164-1-git@jeffhostetler.com/T/#t

[2] v4 of --no-ahead-behind 
https://public-inbox.org/git/nycvar.QRO.7.76.6.1801091744540.37@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz/T/#t



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B
  2018-01-09 19:30 ` Derrick Stolee
@ 2018-01-10 14:01   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-10 14:01 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git Mailing List, Thomas Rast, Junio C Hamano, Jeff King,
	Johannes Schindelin


On Tue, Jan 09 2018, Derrick Stolee jotted:

> On 1/9/2018 10:17 AM, Ævar Arnfjörð Bjarmason wrote:
>> This is a pathological case I don't have time to dig into right now:
>>
>>      git branch -D orphan;
>>      git checkout --orphan orphan &&
>>      git reset --hard &&
>>      touch foo &&
>>      git add foo &&
>>      git commit -m"foo" &&
>>      time git merge-base --is-ancestor master orphan
>>
>> This takes around 5 seconds on linux.git to return 1. Which is around
>> the same time it takes to run current master against the first commit in
>> linux.git:
>>
>>      git merge-base --is-ancestor 1da177e4c3f4 master
>>
>> This is obviously a pathological case, but maybe we should work slightly
>> harder on the RHS of and discover that it itself is an orphan commit.
>>
>> I ran into this while writing a hook where we'd like to do:
>>
>>      git diff $master...topic
>>
>> Or not, depending on if the topic is an orphan or just something
>> recently branched off, figured I could use --is-ancestor as on
>> optimization, and then discovered it's not much of an optimization.
>
> Ævar,
>
> This is the same performance problem that we are trying to work around
> with Jeff's "Add --no-ahead-behind to status" patch [1]. For commits
> that are far apart, many commits need to be parsed. I think the right
> solution is to create a serialized commit graph that stores the
> adjacency information of the commits and can create commit structs
> quickly. This requires storing the commit id, commit date, parents,
> and root tree id to satisfy the needs of parse_commit_gently(). Once
> the framework for this data is constructed, it is simple to add
> generation numbers to that data and start consuming them in other
> algorithms (by adding the field to 'struct commit').
>
> I'm working on such a patch right now, but it will be a few weeks
> before I'm ready.

Thanks, yes of course I forgot about not being able to rely on
timestamps at all, otherwise this check wouldn't need commit traversal
at all, we could simply note:

 1. A is older than B
 2. B is an orphan commit
 3. Therefore it's impossible that A is an ancestor of B

But we don't enforce any of this in the DAG, so now we need to do a full
traversal to make sure this B committed in 2018 doesn't precede some
commit added in 2005.

This has come up before related to various traversal optimizations,
e.g. [tag|branch] --contains.

I don't know the details of what you have in mind, but have you
considered a solution which involves either computing that all commits
in the repo have a timestamp later than their parents, or alternatively
declaring via some config mechanism that that's true of all (or a subset
of) commits?

E.g. for centrally hosted repos a pre-receive hook could be added to
assert that all incoming commits must have monotonically increasing
timestamps compared to preceding commits, and if that wasn't the case we
could declare that e.g. all commits added after 2019 would have that
property (after ensuring no commits had dates in the future).

It sounds like a subset of what you have in mind, so maybe it's
useless. However for many cases such as this one it should be faster to
almost instantaneously consult such config instead of some side-data we
need to maintain, but it's probably not worth the complexity.


> [1] v5 of --no-ahead-behind
> https://public-inbox.org/git/20180109185018.69164-1-git@jeffhostetler.com/T/#t
>
> [2] v4 of --no-ahead-behind
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1801091744540.37@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz/T/#t

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-01-10 14:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-09 15:17 merge-base --is-ancestor A B is unreasonably slow with unrelated history B Ævar Arnfjörð Bjarmason
2018-01-09 19:13 ` Junio C Hamano
2018-01-09 19:30 ` Derrick Stolee
2018-01-10 14:01   ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).