* merge-base --is-ancestor A B is unreasonably slow with unrelated history B
@ 2018-01-09 15:17 Ævar Arnfjörð Bjarmason
2018-01-09 19:13 ` Junio C Hamano
2018-01-09 19:30 ` Derrick Stolee
0 siblings, 2 replies; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-09 15:17 UTC (permalink / raw)
To: Git Mailing List
Cc: Thomas Rast, Junio C Hamano, Jeff King, Johannes Schindelin
This is a pathological case I don't have time to dig into right now:
git branch -D orphan;
git checkout --orphan orphan &&
git reset --hard &&
touch foo &&
git add foo &&
git commit -m"foo" &&
time git merge-base --is-ancestor master orphan
This takes around 5 seconds on linux.git to return 1. Which is around
the same time it takes to run current master against the first commit in
linux.git:
git merge-base --is-ancestor 1da177e4c3f4 master
This is obviously a pathological case, but maybe we should work slightly
harder on the RHS of and discover that it itself is an orphan commit.
I ran into this while writing a hook where we'd like to do:
git diff $master...topic
Or not, depending on if the topic is an orphan or just something
recently branched off, figured I could use --is-ancestor as on
optimization, and then discovered it's not much of an optimization.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B
2018-01-09 15:17 merge-base --is-ancestor A B is unreasonably slow with unrelated history B Ævar Arnfjörð Bjarmason
@ 2018-01-09 19:13 ` Junio C Hamano
2018-01-09 19:30 ` Derrick Stolee
1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2018-01-09 19:13 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Git Mailing List, Thomas Rast, Jeff King, Johannes Schindelin
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> This is obviously a pathological case, but maybe we should work slightly
> harder on the RHS of and discover that it itself is an orphan commit.
In order to discover a commit is an orphan, you'd need to prove not
just that it does not reach the main part of the history (which is
cheap---its parenthood network would be quite limited and traversing
all of it is not that expensive) but the other way around, i.e. the
main part of the history would not reach it.
Do you have a cheaper way to do the latter than a full traveral of
the main history? If not, then the cost similar to "git log master"
is expected.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B
2018-01-09 15:17 merge-base --is-ancestor A B is unreasonably slow with unrelated history B Ævar Arnfjörð Bjarmason
2018-01-09 19:13 ` Junio C Hamano
@ 2018-01-09 19:30 ` Derrick Stolee
2018-01-10 14:01 ` Ævar Arnfjörð Bjarmason
1 sibling, 1 reply; 4+ messages in thread
From: Derrick Stolee @ 2018-01-09 19:30 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, Git Mailing List
Cc: Thomas Rast, Junio C Hamano, Jeff King, Johannes Schindelin
On 1/9/2018 10:17 AM, Ævar Arnfjörð Bjarmason wrote:
> This is a pathological case I don't have time to dig into right now:
>
> git branch -D orphan;
> git checkout --orphan orphan &&
> git reset --hard &&
> touch foo &&
> git add foo &&
> git commit -m"foo" &&
> time git merge-base --is-ancestor master orphan
>
> This takes around 5 seconds on linux.git to return 1. Which is around
> the same time it takes to run current master against the first commit in
> linux.git:
>
> git merge-base --is-ancestor 1da177e4c3f4 master
>
> This is obviously a pathological case, but maybe we should work slightly
> harder on the RHS of and discover that it itself is an orphan commit.
>
> I ran into this while writing a hook where we'd like to do:
>
> git diff $master...topic
>
> Or not, depending on if the topic is an orphan or just something
> recently branched off, figured I could use --is-ancestor as on
> optimization, and then discovered it's not much of an optimization.
Ævar,
This is the same performance problem that we are trying to work around
with Jeff's "Add --no-ahead-behind to status" patch [1]. For commits
that are far apart, many commits need to be parsed. I think the right
solution is to create a serialized commit graph that stores the
adjacency information of the commits and can create commit structs
quickly. This requires storing the commit id, commit date, parents, and
root tree id to satisfy the needs of parse_commit_gently(). Once the
framework for this data is constructed, it is simple to add generation
numbers to that data and start consuming them in other algorithms (by
adding the field to 'struct commit').
I'm working on such a patch right now, but it will be a few weeks before
I'm ready.
Thanks,
-Stolee
[1] v5 of --no-ahead-behind
https://public-inbox.org/git/20180109185018.69164-1-git@jeffhostetler.com/T/#t
[2] v4 of --no-ahead-behind
https://public-inbox.org/git/nycvar.QRO.7.76.6.1801091744540.37@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz/T/#t
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B
2018-01-09 19:30 ` Derrick Stolee
@ 2018-01-10 14:01 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-10 14:01 UTC (permalink / raw)
To: Derrick Stolee
Cc: Git Mailing List, Thomas Rast, Junio C Hamano, Jeff King,
Johannes Schindelin
On Tue, Jan 09 2018, Derrick Stolee jotted:
> On 1/9/2018 10:17 AM, Ævar Arnfjörð Bjarmason wrote:
>> This is a pathological case I don't have time to dig into right now:
>>
>> git branch -D orphan;
>> git checkout --orphan orphan &&
>> git reset --hard &&
>> touch foo &&
>> git add foo &&
>> git commit -m"foo" &&
>> time git merge-base --is-ancestor master orphan
>>
>> This takes around 5 seconds on linux.git to return 1. Which is around
>> the same time it takes to run current master against the first commit in
>> linux.git:
>>
>> git merge-base --is-ancestor 1da177e4c3f4 master
>>
>> This is obviously a pathological case, but maybe we should work slightly
>> harder on the RHS of and discover that it itself is an orphan commit.
>>
>> I ran into this while writing a hook where we'd like to do:
>>
>> git diff $master...topic
>>
>> Or not, depending on if the topic is an orphan or just something
>> recently branched off, figured I could use --is-ancestor as on
>> optimization, and then discovered it's not much of an optimization.
>
> Ævar,
>
> This is the same performance problem that we are trying to work around
> with Jeff's "Add --no-ahead-behind to status" patch [1]. For commits
> that are far apart, many commits need to be parsed. I think the right
> solution is to create a serialized commit graph that stores the
> adjacency information of the commits and can create commit structs
> quickly. This requires storing the commit id, commit date, parents,
> and root tree id to satisfy the needs of parse_commit_gently(). Once
> the framework for this data is constructed, it is simple to add
> generation numbers to that data and start consuming them in other
> algorithms (by adding the field to 'struct commit').
>
> I'm working on such a patch right now, but it will be a few weeks
> before I'm ready.
Thanks, yes of course I forgot about not being able to rely on
timestamps at all, otherwise this check wouldn't need commit traversal
at all, we could simply note:
1. A is older than B
2. B is an orphan commit
3. Therefore it's impossible that A is an ancestor of B
But we don't enforce any of this in the DAG, so now we need to do a full
traversal to make sure this B committed in 2018 doesn't precede some
commit added in 2005.
This has come up before related to various traversal optimizations,
e.g. [tag|branch] --contains.
I don't know the details of what you have in mind, but have you
considered a solution which involves either computing that all commits
in the repo have a timestamp later than their parents, or alternatively
declaring via some config mechanism that that's true of all (or a subset
of) commits?
E.g. for centrally hosted repos a pre-receive hook could be added to
assert that all incoming commits must have monotonically increasing
timestamps compared to preceding commits, and if that wasn't the case we
could declare that e.g. all commits added after 2019 would have that
property (after ensuring no commits had dates in the future).
It sounds like a subset of what you have in mind, so maybe it's
useless. However for many cases such as this one it should be faster to
almost instantaneously consult such config instead of some side-data we
need to maintain, but it's probably not worth the complexity.
> [1] v5 of --no-ahead-behind
> https://public-inbox.org/git/20180109185018.69164-1-git@jeffhostetler.com/T/#t
>
> [2] v4 of --no-ahead-behind
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1801091744540.37@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz/T/#t
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-01-10 14:01 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-09 15:17 merge-base --is-ancestor A B is unreasonably slow with unrelated history B Ævar Arnfjörð Bjarmason
2018-01-09 19:13 ` Junio C Hamano
2018-01-09 19:30 ` Derrick Stolee
2018-01-10 14:01 ` Ævar Arnfjörð Bjarmason
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).