git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Michael J Gruber <git@drmicha.warpmail.net>,
	git discussion list <git@vger.kernel.org>
Subject: Re: Our merge bases sometimes suck
Date: Fri, 13 Jun 2014 12:13:50 +0200	[thread overview]
Message-ID: <539ACEDE.8040401@alum.mit.edu> (raw)
In-Reply-To: <539AC690.6000906@drmicha.warpmail.net>

On 06/13/2014 11:38 AM, Michael J Gruber wrote:
> Michael Haggerty venit, vidit, dixit 13.06.2014 00:12:
>> I've been thinking a lot about merge bases lately and think I have
>> discovered something interesting.
>>
>> tl;dr:
>>
>> When two branches have multiple merge bases,
>>
>>     git merge-base $master $branch
>>
>> picks one merge base more or less arbitrarily.  Here I describe a
>> criterion for picking a "best" merge base.  When the best merge base
>> is used, the diff output by
>>
>>     git diff $master...$branch
>>
>> becomes shorter and simpler--sometimes dramatically so.  I have
>> quantified the improvement by analyzing historical merges from the Git
>> project (see attached image).  Best merge bases might also help reduce
>> conflicts when conducting merges.
>>
> 
> Everytime I looked at our merge base code, my head started spinning. So
> it's admirable you even started this endeavor :)

I haven't looked at the code :-)  My email came only out of reading docs
and experimenting.

> We use merge bases for several things:
> 
> - merging
> - resolving "A...B" rev notation (rev-list and friends), aka symmetric
> difference
> - left/right selection/annotation of commits (rev-list/log)
> - resolving "diff A...B", which may be a handy notation, but is horribly
> misleading because it is a very unsymmetric form of diff
> 
> Out of these four, you seemingly picked the one as pivoting example
> which is most different from any symmetric notion of "...". But in fact,
> "merging" is quite unsymmetric, too, because merging branch into master
> is quite different from the other way round.
> 
> This is certainly obvious to you, bit I thought I point it out for the
> convenience of other readers: You are after optimizing the merge base
> choice for an unsymmetric use case, which covers "diff A...B" as well as
> "merge B into A".
> 
> In these two cases, choosing different merge bases can lead to different
> results. The other two basically use all merge bases anyways.

Correct, the scope of my proposal is for cases when there are multiple
merge bases but one has to be selected.

> [...]
>> I propose that the best merge base is the merge base "candidate" that
>> minimizes the number of non-merge commits that are in
>>
>>     git rev-list --no-merges $candidate..$branch
>>
>> but are already in master:
>>
>>     git rev-list --no-merges $master
>>
>> Since a non-merge commit should embody a single logical change,
>> counting non-merge commits is in some sense counting changes [2].
>>
>> We can put this criterion in simpler form.  Because every candidate is
>> a merge-base,
>>
>>     git rev-list --no-merges $candidate..$branch
>>
>> necessarily includes *all* of the non-merge commits that are on branch
>> but not on master. 
> 
> This is actually true for every merge base candidate (i.e. every commit
> which is both on master and branch).
> 
> [These things are really much easier in set language, A..B being B
> \setminus A, being B intersected with the complement of A etc.]
> 
>> This is a fixed number of commits, the same for
>> every candidate.
> 
> "This" refers to "branch minus master" (commits on branch but not on
> master), which is defined indepently of candidate.
> 
>>  It *additionally* includes the commits that are on
>> master but not yet in branch.  This second number varies from one
>> candidate to another.
> 
> "master minus branch" is independent of candidate also.
> 
> The second number is rather those commits on candidate..branch which are
> not "branch minus master", i.e. "branch minus candidate" minus "branch
> minus master", which is "branch intersected with master minus
> candidate", i.e. those commits on candidate..branch which are also on
> master. [Again this is true for every merge base candidate, not just
> every merge base.]
> 
>>  So if we minimize the number of commits in this
>> output, is is the same as minimizing the number of unwanted commits.
> 
> Now I understand why this statement is true :)

Thanks for the translation to set notation.  It is indeed easier to work
with.

>> Therefore, to get the best merge base, all we have to do is pick the
>> merge base that minimizes
>>
>>     git rev-list --count --no-merges $candidate..$branch
>>
>> There can be ties, but in practice they are rare enough that it is
>> probably not worth worrying about them.
>>
>>
>> Symmetry; generalization to more than two branches
>> ==================================================
>>
>> Interestingly, minimizing the last criterion is the same as maximizing
>>
>>     git rev-list --count --no-merges $candidate
>>
>> because there is a fixed number of commits in
>>
>>     git rev-list --no-merges $branch
>>
>> , and each of those commits is in exactly one of the two counts above.
> 
> That's a cute observation, which in the example above is wrong on first
> glance, but true on second: Visually/subjectively, one easily misses
> commits B and C when counting "X..master", which is not just the commits
> "extending master beyond X", etc.
> 
>> This formulation shows that the best merge base for computing
>>
>>     git diff $master...$branch
>>
>> is also the best merge base for computing
>>
>>     git diff $branch...$master
>>
>> ; i.e., the best merge base is symmetric in its arguments.  It also
>> shows that the concept of "best merge base" can trivially be
>> generalized to more than two branches.
> 
> That symmetry is really surprising, but the argument is convincingly
> correct. Alas, that count back to the roots can be really expensive.

I mentioned the formulation

    git rev-list --count --no-merges $candidate

not because it is necessarily the most efficient way to find the best
merge base, but because it is theoretically useful.  For the
implementation, any of the formulations can be used, because they are
all equivalent.

> Given that the arguments in the previous argument show that everything
> applies to merge base candidates as well, not just merge bases, I'm
> wondering whether we can weave in the weighing (which one is best) with
> our existing merge base algorithm or maybe an alternative algorithm
> using generation numbers or such.
> [...]

That's an interesting idea.  I haven't looked at that code at all, so if
you want to get involved it would be awesome!

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2014-06-13 10:13 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-12 22:12 Our merge bases sometimes suck Michael Haggerty
2014-06-13  9:38 ` Michael J Gruber
2014-06-13 10:13   ` Michael Haggerty [this message]
2014-06-13 15:52   ` Jakub Narębski
2014-06-13 22:14     ` Michael Haggerty
2014-06-13 22:35     ` Junio C Hamano
2014-06-17 15:08 ` Junio C Hamano
2014-06-17 15:44   ` Michael Haggerty
2014-06-20  6:53     ` Junio C Hamano
2014-06-20  8:53       ` Michael Haggerty
2014-06-20 21:17       ` Nico Williams
2014-06-23 11:43         ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=539ACEDE.8040401@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@drmicha.warpmail.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).