git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Thomas Rast <tr@thomasrast.ch>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B
Date: Wed, 10 Jan 2018 15:01:22 +0100	[thread overview]
Message-ID: <87o9m1ak3x.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <2aa20617-6b1d-f5a4-d6e1-250a3ea8f5be@gmail.com>


On Tue, Jan 09 2018, Derrick Stolee jotted:

> On 1/9/2018 10:17 AM, Ævar Arnfjörð Bjarmason wrote:
>> This is a pathological case I don't have time to dig into right now:
>>
>>      git branch -D orphan;
>>      git checkout --orphan orphan &&
>>      git reset --hard &&
>>      touch foo &&
>>      git add foo &&
>>      git commit -m"foo" &&
>>      time git merge-base --is-ancestor master orphan
>>
>> This takes around 5 seconds on linux.git to return 1. Which is around
>> the same time it takes to run current master against the first commit in
>> linux.git:
>>
>>      git merge-base --is-ancestor 1da177e4c3f4 master
>>
>> This is obviously a pathological case, but maybe we should work slightly
>> harder on the RHS of and discover that it itself is an orphan commit.
>>
>> I ran into this while writing a hook where we'd like to do:
>>
>>      git diff $master...topic
>>
>> Or not, depending on if the topic is an orphan or just something
>> recently branched off, figured I could use --is-ancestor as on
>> optimization, and then discovered it's not much of an optimization.
>
> Ævar,
>
> This is the same performance problem that we are trying to work around
> with Jeff's "Add --no-ahead-behind to status" patch [1]. For commits
> that are far apart, many commits need to be parsed. I think the right
> solution is to create a serialized commit graph that stores the
> adjacency information of the commits and can create commit structs
> quickly. This requires storing the commit id, commit date, parents,
> and root tree id to satisfy the needs of parse_commit_gently(). Once
> the framework for this data is constructed, it is simple to add
> generation numbers to that data and start consuming them in other
> algorithms (by adding the field to 'struct commit').
>
> I'm working on such a patch right now, but it will be a few weeks
> before I'm ready.

Thanks, yes of course I forgot about not being able to rely on
timestamps at all, otherwise this check wouldn't need commit traversal
at all, we could simply note:

 1. A is older than B
 2. B is an orphan commit
 3. Therefore it's impossible that A is an ancestor of B

But we don't enforce any of this in the DAG, so now we need to do a full
traversal to make sure this B committed in 2018 doesn't precede some
commit added in 2005.

This has come up before related to various traversal optimizations,
e.g. [tag|branch] --contains.

I don't know the details of what you have in mind, but have you
considered a solution which involves either computing that all commits
in the repo have a timestamp later than their parents, or alternatively
declaring via some config mechanism that that's true of all (or a subset
of) commits?

E.g. for centrally hosted repos a pre-receive hook could be added to
assert that all incoming commits must have monotonically increasing
timestamps compared to preceding commits, and if that wasn't the case we
could declare that e.g. all commits added after 2019 would have that
property (after ensuring no commits had dates in the future).

It sounds like a subset of what you have in mind, so maybe it's
useless. However for many cases such as this one it should be faster to
almost instantaneously consult such config instead of some side-data we
need to maintain, but it's probably not worth the complexity.


> [1] v5 of --no-ahead-behind
> https://public-inbox.org/git/20180109185018.69164-1-git@jeffhostetler.com/T/#t
>
> [2] v4 of --no-ahead-behind
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1801091744540.37@ZVAVAG-6OXH6DA.rhebcr.pbec.zvpebfbsg.pbz/T/#t

      reply	other threads:[~2018-01-10 14:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-09 15:17 merge-base --is-ancestor A B is unreasonably slow with unrelated history B Ævar Arnfjörð Bjarmason
2018-01-09 19:13 ` Junio C Hamano
2018-01-09 19:30 ` Derrick Stolee
2018-01-10 14:01   ` Ævar Arnfjörð Bjarmason [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o9m1ak3x.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).