git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFD] should all merge bases be equal?
@ 2016-10-17 22:28 Junio C Hamano
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
  2017-02-09 14:44 ` [RFD] should all merge bases be equal? Michael Haggerty
  0 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-17 22:28 UTC (permalink / raw)
  To: git

People can see how fast the usual merges I see everyday are by
looking at output from

    $ git log --first-parent --format='%ct %s' master..pu

and noticing the seconds since epoch.  Most of the days, these are
recreated directly on top of 'master' from scratch, and they get
timestamps that are very close to each other (or the same), meaning
that I am getting multiple merges per second.

Being accustomed how fast my merges go, there is one merge that
hiccups for me every once in a few days: merging back from 'master'
to 'next'.  This happens after having multiple topics (that by
definition have to be the ones that were already merged to 'next'
some time ago) to 'master', and 'master' may also have its own
commit (e.g. update to "RelNotes") and merges of side branches that
were not in 'next' (e.g. merge from submaintainers like i18n, etc.)

The reason why this merge is slow is because it typically have many
merge bases.  For example, today's tip of 'next' is:

    commit 6021889cc14df07d4366829367d2c4a11d1eaa56
    Merge: 4868def05e 659889482a
    Author: Junio C Hamano <gitster@pobox.com>
    Date:   Mon Oct 17 14:02:05 2016 -0700

        Sync with master

        * master:
          Tenth batch for 2.11
          l10n: de.po: translate 260 new messages
          l10n: de.po: fix translation of autostash
          l10n: ru.po: update Russian translation

which is a merge that has 12 merge bases:

    $ git merge-base --all 4868def05e 659889482a | git name-rev --stdin
    3cdd5d19178a54d2e51b5098d43b57571241d0ab (ko/master)
    641c900b2c3a8c3d385eb353b3801a5f49ddbb47 (js/reset-usage)
    30cfe72d37ed8c174cae43923769730a94549dae (rs/pretty-format-color-doc-fix)
    654311bf6ee0fbf530c088271c2c76d46f31f82d (da/mergetool-diff-order)
    72710165c932edb2b8552aef7aef2f357dde4746 (sb/submodule-config-doc-drop-path)
    842a516cb02a53cf0291ff67ed6f8517966345c0 (js/regexec-buf)
    62fe0eb4804c297486a1d421a4f893865fcbc911 (jk/quarantine-received-objects)
    a94bb683970a111b467a36590ca36e52754ad504 (rs/cocci)
    e8c42cb9ce6a566aad797cc6c5bc1279d608d819 (jk/ref-symlink-loop)
    22d3b8de1b625813faec6f3d6ffe66124837b78b (jk/clone-copy-alternates-fix)
    7431596ab1f05a13adb93b44108f27cfd6578a31 (nd/commit-p-doc)
    5275c3081c2b2c6166a2fc6b253a3acb20f8ae89 (dt/http-empty-auth)

The tip of each topic that was merged recently to 'master' since
'master' was last merged to 'next' becomes a valid merge-base by
design of the workflow.  We merge a topic to 'master' whose tip
has been already in 'next' for a while, so the tip of 'next' before
merging 'master' back is a descendant of the tips of these topics,
and the tip of 'master' before I make this merge has just become a
descendant of the tips of these topics.  That makes them common
ancestors between 'master' and 'next'.

But for the purpose of figuring out the 3-way merge result, I
suspect that they are pretty much useless common ancestor to use as
the merge base.  The first one in the list, the old 'master' that
was merged the last time to 'next', would be a lot more useful one.

And of course, if I do this:

    $ git checkout next
    $ git merge master ;# this is slow due to the 12-base above
    $ git checkout HEAD^ ;# detach at the previous state
    $ git merge-recursive ko/master -- HEAD master

the merge is instantaneous.  I'd get only what truly happened
uniquely on 'master', e.g. RelNotes update and i18n merge.

I am wondering if it is worth trying to optimize it by taking
advantage of the specific workflow (i.e. give me an option to use
when merging 'master' back to 'next') and allows me to exclude the
tips of these topic branches.  Would I be making the result open to
the criss-cross merge gotchas the "recursive merge" machinery was
designed to prevent in the first place by doing so?  Offhand, I do
not think that would be the case.

Assuming that it is a good idea, there is another question of how to
tell the more meaningful merge bases (ko/master in this case) out of
the less useful ones (all the others).  I think it would be
sufficient to reject commits that are not on the first-parent chain
of neither branch being merged.  Among the 12 merge bases, ko/master
is the only one that appears on the first-parent chain from 'master'
being merged (it does not appear on the first-parent chain from
'next').  All others were topic tips that by definition were merged
as second parent to integration branches ('next' and later 'master').
The place to do this would be a new option to 'merge-base'; instead
of "--all", perhaps "--major" option gives only the major merge bases
(with the definition of 'major' being the above heuristics), and then
"git merge-recursive" would learn "-Xmajor-base-only" strategy option,
or something along that line.

Thoughts?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 0/7] Rejecting useless merge bases
  2016-10-17 22:28 [RFD] should all merge bases be equal? Junio C Hamano
@ 2016-10-19  4:23 ` Junio C Hamano
  2016-10-19  4:23   ` [PATCH 1/7] commit: simplify fastpath of merge-base computation Junio C Hamano
                     ` (7 more replies)
  2017-02-09 14:44 ` [RFD] should all merge bases be equal? Michael Haggerty
  1 sibling, 8 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

This is a continuation of

    http://public-inbox.org/git/xmqqmvi2sj8f.fsf@gitster.mtv.corp.google.com

In a workflow where topic branches are first merged to the 'next'
integration branch to be tested before getting merged down to the
'master' integration branch to be consumed by the end users, merging
the 'master' branch back to the 'next' happens after topics graduate
to 'master' and release notes entries are written for them.

Git finds many merge bases between 'master' and 'next' while
creating this merge.  In addition to the tip of 'master' back when
we made such a merge back from 'master' to 'next' was made the last
time, which is the most reasonable merge base to explain the
histories of both branches, all the tips of topic branches that
graduated recently are merge bases.  Because these tips of topic
branches were already in 'next', the tip of 'next' reaches them, and
because they just graduated to 'master', the tip of 'master' reaches
them, too.  And these topics are independent most of the time (that
is the point of employing the topic-branch workflow), so they cannot
be reduced.

The merge-recursive machinery is very inefficient to compute this
merge, because it needs to create pointless virtual merge-base
commits across these many merge bases.  Conceptually, the point
where the histories of 'master' and 'next' diverged was the tip of
'master' back when we created such a merge back from 'master' to
'next' the last time, and in practice that is the only merge base
that matters.

The series allows us to ignore these tips of topics, which are
uninteresting merge bases, when running "git merge".  The example
merge with 12 merge bases:

    git checkout 4868def05e && git merge 659889482a

in our history takes about 1.22-1.33 seconds without the series,
while running the latter "git merge" with the "--fp-base-only"
option takes about 0.54-0.59 seconds.

Junio C Hamano (7):
  commit: simplify fastpath of merge-base
  sha1_name: remove ONELINE_SEEN bit
  merge-base: stop moving commits around in remove_redundant()
  merge-base: expose get_merge_bases_many_0() a bit more
  merge-base: mark bases that are on first-parent chain
  merge-base: limit the output to bases that are on first-parent chain
  merge: allow to use only the fp-only merge bases

 Documentation/git-merge-base.txt |  8 +++-
 Documentation/merge-options.txt  |  9 +++++
 builtin/merge-base.c             | 10 +++--
 builtin/merge.c                  | 15 ++++++--
 commit.c                         | 81 ++++++++++++++++++++++++----------------
 commit.h                         |  7 +++-
 object.h                         |  3 +-
 sha1_name.c                      | 10 ++---
 8 files changed, 94 insertions(+), 49 deletions(-)

-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/7] commit: simplify fastpath of merge-base computation
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
@ 2016-10-19  4:23   ` Junio C Hamano
  2016-10-19  4:23   ` [PATCH 2/7] sha1_name: remove ONELINE_SEEN bit Junio C Hamano
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

The get_merge_bases_many*() family of functions first call
merge_bases_many() to find all possible merge bases between a single
commit "one" and a set of other commits "twos[]".  Because this
typically involves traversing the commit graph, which uses the
object flags on the commits involved, the function needs to clear
the flags before it returns.

Except for one special case.  If "one" is exactly one of the "twos",
"one" is the merge base and merge_bases_many() returns a list of
merge bases with a single element, "one", on it, without smudging
the object flags of the commits at all.

We used to use a loop over "twos[]" array to see if this fast-path
would have kicked in in merge_bases_many(), in which case we can
return without cleaning the bits at all.  We can do the same by
inspecting the result in a more direct way, which is conceptually
cleaner.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index aada266f9a..6266c0380c 100644
--- a/commit.c
+++ b/commit.c
@@ -955,10 +955,17 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one,
 	int cnt, i;
 
 	result = merge_bases_many(one, n, twos);
-	for (i = 0; i < n; i++) {
-		if (one == twos[i])
-			return result;
-	}
+
+	/*
+	 * The fast-path of 'one' being the merge-base; there is no
+	 * need to clean the object flags in this case.
+	 */
+	if (result && !result->next &&
+	    result->item == one &&
+	    !(one->object.flags & RESULT))
+		return result;
+
+	/* If we didn't get any, or there is only one, we are done */
 	if (!result || !result->next) {
 		if (cleanup) {
 			clear_commit_marks(one, all_flags);
-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/7] sha1_name: remove ONELINE_SEEN bit
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
  2016-10-19  4:23   ` [PATCH 1/7] commit: simplify fastpath of merge-base computation Junio C Hamano
@ 2016-10-19  4:23   ` Junio C Hamano
  2016-10-19  4:23   ` [PATCH 3/7] merge-base: stop moving commits around in remove_redundant() Junio C Hamano
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

28a4d94044 ("object name: introduce ':/<oneline prefix>' notation",
2007-02-24) started using its own bit from object->flags to mark
commits used while parsing the ":/token" extended SHA-1 syntax to
name a commit temporarily, and this was kept even when f7bff00314
("sha1_name.c: fix parsing of ":/token" syntax", 2010-08-02) found
and fixed a bug in its implementation.

The use of that flag bit, however, is limited to a single function,
get_sha1_oneline(), which first sets it for the commits sitting at
the tips of refs, uses the bit to avoid duplicate traversal while
walking the history, and then cleans the bit from all commits it
walked.

Which is exactly what the general-purpose TMP_MARK bit meant to be
used for isolated case was invented for.  Replace ONELINE_SEEN with
TMP_MARK and retire the former.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object.h    |  1 -
 sha1_name.c | 10 ++++------
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/object.h b/object.h
index f8b644263f..f8e218eccd 100644
--- a/object.h
+++ b/object.h
@@ -37,7 +37,6 @@ struct object_array {
  * bundle.c:                               16
  * http-push.c:                            16-----19
  * commit.c:                               16-----19
- * sha1_name.c:                                     20
  */
 #define FLAG_BITS  27
 
diff --git a/sha1_name.c b/sha1_name.c
index ca7ddd6f2c..fa0e6701a3 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -7,6 +7,7 @@
 #include "refs.h"
 #include "remote.h"
 #include "dir.h"
+#include "revision.h"
 
 static int get_sha1_oneline(const char *, unsigned char *, struct commit_list *);
 
@@ -855,9 +856,6 @@ static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned l
  * For future extension, all other sequences beginning with ':/!' are reserved.
  */
 
-/* Remember to update object flag allocation in object.h */
-#define ONELINE_SEEN (1u<<20)
-
 static int handle_one_ref(const char *path, const struct object_id *oid,
 			  int flag, void *cb_data)
 {
@@ -899,7 +897,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 		return -1;
 
 	for (l = list; l; l = l->next) {
-		l->item->object.flags |= ONELINE_SEEN;
+		l->item->object.flags |= TMP_MARK;
 		commit_list_insert(l->item, &backup);
 	}
 	while (list) {
@@ -907,7 +905,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 		struct commit *commit;
 		int matches;
 
-		commit = pop_most_recent_commit(&list, ONELINE_SEEN);
+		commit = pop_most_recent_commit(&list, TMP_MARK);
 		if (!parse_object(commit->object.oid.hash))
 			continue;
 		buf = get_commit_buffer(commit, NULL);
@@ -924,7 +922,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1,
 	regfree(&regex);
 	free_commit_list(list);
 	for (l = backup; l; l = l->next)
-		clear_commit_marks(l->item, ONELINE_SEEN);
+		clear_commit_marks(l->item, TMP_MARK);
 	free_commit_list(backup);
 	return found ? 0 : -1;
 }
-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/7] merge-base: stop moving commits around in remove_redundant()
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
  2016-10-19  4:23   ` [PATCH 1/7] commit: simplify fastpath of merge-base computation Junio C Hamano
  2016-10-19  4:23   ` [PATCH 2/7] sha1_name: remove ONELINE_SEEN bit Junio C Hamano
@ 2016-10-19  4:23   ` Junio C Hamano
  2016-10-19  4:23   ` [PATCH 4/7] merge-base: expose get_merge_bases_many_0() a bit more Junio C Hamano
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

The merge-base computation is performed in two steps.  First,
paint_down_to_common() traverses the history to find all possible
merge bases (and more), and then remove_redundant() checks the
result and culls the ones that can be reached from another commit
in the result.

The latter received an array of commits, and then returned the same
array after reordering the elements in it, moving the surviving ones
to the front and returning the number of surviving ones.

This arrangement works, but it makes it cumbersome for the callers
when they want to see the array's contents intact (e.g. the caller
may want to keep an additional per-commit data in an independent
array that parallels the array of commits).

Stop moving commits around in the array, and instead mark the ones
that are not merge bases with the STALE bit in their object->flags
bitword.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/commit.c b/commit.c
index 6266c0380c..59bd18e67c 100644
--- a/commit.c
+++ b/commit.c
@@ -888,11 +888,11 @@ struct commit_list *get_octopus_merge_bases(struct commit_list *in)
 	return ret;
 }
 
-static int remove_redundant(struct commit **array, int cnt)
+static void mark_redundant(struct commit **array, int cnt)
 {
 	/*
 	 * Some commit in the array may be an ancestor of
-	 * another commit.  Move such commit to the end of
+	 * another commit.  Mark such commit as STALE in
 	 * the array, and return the number of commits that
 	 * are independent from each other.
 	 */
@@ -930,18 +930,16 @@ static int remove_redundant(struct commit **array, int cnt)
 		free_commit_list(common);
 	}
 
-	/* Now collect the result */
-	COPY_ARRAY(work, array, cnt);
-	for (i = filled = 0; i < cnt; i++)
-		if (!redundant[i])
-			array[filled++] = work[i];
-	for (j = filled, i = 0; i < cnt; i++)
+	/* Mark the result */
+	for (i = 0; i < cnt; i++)
 		if (redundant[i])
-			array[j++] = work[i];
+			array[i]->object.flags |= STALE;
+		else
+			array[i]->object.flags &= ~STALE;
+
 	free(work);
 	free(redundant);
 	free(filled_index);
-	return filled;
 }
 
 static struct commit_list *get_merge_bases_many_0(struct commit *one,
@@ -984,10 +982,13 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one,
 	clear_commit_marks(one, all_flags);
 	clear_commit_marks_many(n, twos, all_flags);
 
-	cnt = remove_redundant(rslt, cnt);
+	mark_redundant(rslt, cnt);
 	result = NULL;
 	for (i = 0; i < cnt; i++)
-		commit_list_insert_by_date(rslt[i], &result);
+		if (!(rslt[i]->object.flags & STALE))
+			commit_list_insert_by_date(rslt[i], &result);
+		else
+			rslt[i]->object.flags &= ~STALE;
 	free(rslt);
 	return result;
 }
@@ -1086,9 +1087,12 @@ struct commit_list *reduce_heads(struct commit_list *heads)
 			p->item->object.flags &= ~STALE;
 		}
 	}
-	num_head = remove_redundant(array, num_head);
+	mark_redundant(array, num_head);
 	for (i = 0; i < num_head; i++)
-		tail = &commit_list_insert(array[i], tail)->next;
+		if (!(array[i]->object.flags & STALE))
+			tail = &commit_list_insert(array[i], tail)->next;
+		else
+			array[i]->object.flags &= ~STALE;
 	return result;
 }
 
-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/7] merge-base: expose get_merge_bases_many_0() a bit more
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
                     ` (2 preceding siblings ...)
  2016-10-19  4:23   ` [PATCH 3/7] merge-base: stop moving commits around in remove_redundant() Junio C Hamano
@ 2016-10-19  4:23   ` Junio C Hamano
  2016-10-19  4:23   ` [PATCH 5/7] merge-base: mark bases that are on first-parent chain Junio C Hamano
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

"git merge-base" names its internal workhorse helper function
"get_merge_bases_many_0()", which takes one "can we get away without
clearing the object->flags bits because we know we are the last
caller?" parameter.  Make the parameter into a flags word to make it
extensible and rename it to get_merge_bases_opt().  Use it to turn
get_merge_bases_many_dirty() wrapper into a C-preprocessor macro.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 19 ++++++-------------
 commit.h |  6 ++++--
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 59bd18e67c..92d23b1082 100644
--- a/commit.c
+++ b/commit.c
@@ -942,15 +942,15 @@ static void mark_redundant(struct commit **array, int cnt)
 	free(filled_index);
 }
 
-static struct commit_list *get_merge_bases_many_0(struct commit *one,
-						  int n,
-						  struct commit **twos,
-						  int cleanup)
+struct commit_list *get_merge_bases_opt(struct commit *one,
+					int n, struct commit **twos,
+					unsigned flags)
 {
 	struct commit_list *list;
 	struct commit **rslt;
 	struct commit_list *result;
 	int cnt, i;
+	int cleanup = !!(flags & MB_POSTCLEAN);
 
 	result = merge_bases_many(one, n, twos);
 
@@ -997,19 +997,12 @@ struct commit_list *get_merge_bases_many(struct commit *one,
 					 int n,
 					 struct commit **twos)
 {
-	return get_merge_bases_many_0(one, n, twos, 1);
-}
-
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-					       int n,
-					       struct commit **twos)
-{
-	return get_merge_bases_many_0(one, n, twos, 0);
+	return get_merge_bases_opt(one, n, twos, 0);
 }
 
 struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
 {
-	return get_merge_bases_many_0(one, 1, &two, 1);
+	return get_merge_bases_opt(one, 1, &two, MB_POSTCLEAN);
 }
 
 /*
diff --git a/commit.h b/commit.h
index 32e1a113e5..557f2814b7 100644
--- a/commit.h
+++ b/commit.h
@@ -253,8 +253,10 @@ extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit *r
 extern struct commit_list *get_merge_bases_many(struct commit *one, int n, struct commit **twos);
 extern struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
-/* To be used only when object flags after this call no longer matter */
-extern struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos);
+#define MB_POSTCLEAN 01
+extern struct commit_list *get_merge_bases_opt(struct commit *one, int n, struct commit **twos, unsigned flags);
+
+#define get_merge_bases_many_dirty(one, n, twos) get_merge_bases_opt((one),(n),(twos),MB_POSTCLEAN)
 
 /* largest positive number a signed 32-bit integer can contain */
 #define INFINITE_DEPTH 0x7fffffff
-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/7] merge-base: mark bases that are on first-parent chain
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
                     ` (3 preceding siblings ...)
  2016-10-19  4:23   ` [PATCH 4/7] merge-base: expose get_merge_bases_many_0() a bit more Junio C Hamano
@ 2016-10-19  4:23   ` Junio C Hamano
  2016-10-19 17:42     ` Junio C Hamano
  2016-10-19  4:23   ` [PATCH 6/7] merge-base: limit the output to " Junio C Hamano
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

In a workflow where topic branches are first merged to the 'next'
integration branch to be tested before getting merged down to the
'master' integration branch to be consumed by the end users, merging
the 'master' branch back to the 'next' happens after topics graduate
to 'master' and release notes entries are written for them.

Git finds many merge bases between 'master' and 'next' while
creating this merge.  In addition to the tip of 'master' back when
we made such a merge back from 'master' to 'next' was made the last
time, which is the most reasonable merge base to explain the
histories of both branches, all the tips of topic branches that
graduated recently are merge bases.  Because these tips of topic
branches were already in 'next', the tip of 'next' reaches them, and
because they just graduated to 'master', the tip of 'master' reaches
them, too.  And these topics are independent most of the time (that
is the point of employing the topic-branch workflow), so they cannot
be reduced.

The merge-recursive machinery is very inefficient to compute this
merge, because it needs to create pointless virtual merge-base
commits across these many merge bases.  Conceptually, the point
where the histories of 'master' and 'next' diverged was the tip of
'master' back when we created such a merge back from 'master' to
'next' the last time, and in practice that is the only merge base
that matters.

Update the logic in paint_down_to_common() so that it can mark if a
merge-base commit was reached by following the first-parent chain of
either side of the merge to find such commit, and give an option to
get_merge_bases_opt() to return only merge-base commits that are on
the first-parent chain, so that a later step of this series can
introduce a command line option to enable it and avoid feeding
useless merge bases to the merge machinery.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 17 +++++++++++++++--
 commit.h |  1 +
 object.h |  2 +-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 92d23b1082..8e90531b05 100644
--- a/commit.c
+++ b/commit.c
@@ -765,8 +765,9 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 #define PARENT2		(1u<<17)
 #define STALE		(1u<<18)
 #define RESULT		(1u<<19)
+#define FPCHAIN		(1u<<20)
 
-static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
+static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT | FPCHAIN);
 
 static int queue_has_nonstale(struct prio_queue *queue)
 {
@@ -802,6 +803,7 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc
 		struct commit *commit = prio_queue_get(&queue);
 		struct commit_list *parents;
 		int flags;
+		int nth_parent = 0;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
 		if (flags == (PARENT1 | PARENT2)) {
@@ -816,11 +818,14 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc
 		while (parents) {
 			struct commit *p = parents->item;
 			parents = parents->next;
+			nth_parent++;
 			if ((p->object.flags & flags) == flags)
 				continue;
 			if (parse_commit(p))
 				return NULL;
 			p->object.flags |= flags;
+			if (nth_parent == 1)
+				p->object.flags |= FPCHAIN;
 			prio_queue_put(&queue, p);
 		}
 	}
@@ -951,6 +956,8 @@ struct commit_list *get_merge_bases_opt(struct commit *one,
 	struct commit_list *result;
 	int cnt, i;
 	int cleanup = !!(flags & MB_POSTCLEAN);
+	int fpchain = !!(flags & MB_FPCHAIN);
+	char *on_fpchain;
 
 	result = merge_bases_many(one, n, twos);
 
@@ -975,21 +982,27 @@ struct commit_list *get_merge_bases_opt(struct commit *one,
 	/* There are more than one */
 	cnt = commit_list_count(result);
 	rslt = xcalloc(cnt, sizeof(*rslt));
+	on_fpchain = xcalloc(cnt, sizeof(*on_fpchain));
 	for (list = result, i = 0; list; list = list->next)
 		rslt[i++] = list->item;
 	free_commit_list(result);
 
+	for (i = 0; i < cnt; i++)
+		on_fpchain[i] = !!(rslt[i]->object.flags & FPCHAIN);
+
 	clear_commit_marks(one, all_flags);
 	clear_commit_marks_many(n, twos, all_flags);
 
 	mark_redundant(rslt, cnt);
 	result = NULL;
 	for (i = 0; i < cnt; i++)
-		if (!(rslt[i]->object.flags & STALE))
+		if (!(rslt[i]->object.flags & STALE) &&
+		    (!fpchain || on_fpchain[i]))
 			commit_list_insert_by_date(rslt[i], &result);
 		else
 			rslt[i]->object.flags &= ~STALE;
 	free(rslt);
+	free(on_fpchain);
 	return result;
 }
 
diff --git a/commit.h b/commit.h
index 557f2814b7..ec6e09a985 100644
--- a/commit.h
+++ b/commit.h
@@ -254,6 +254,7 @@ extern struct commit_list *get_merge_bases_many(struct commit *one, int n, struc
 extern struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 #define MB_POSTCLEAN 01
+#define MB_FPCHAIN 02
 extern struct commit_list *get_merge_bases_opt(struct commit *one, int n, struct commit **twos, unsigned flags);
 
 #define get_merge_bases_many_dirty(one, n, twos) get_merge_bases_opt((one),(n),(twos),MB_POSTCLEAN)
diff --git a/object.h b/object.h
index f8e218eccd..7e6729158a 100644
--- a/object.h
+++ b/object.h
@@ -36,7 +36,7 @@ struct object_array {
  * bisect.c:                               16
  * bundle.c:                               16
  * http-push.c:                            16-----19
- * commit.c:                               16-----19
+ * commit.c:                               16-------20
  */
 #define FLAG_BITS  27
 
-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/7] merge-base: limit the output to bases that are on first-parent chain
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
                     ` (4 preceding siblings ...)
  2016-10-19  4:23   ` [PATCH 5/7] merge-base: mark bases that are on first-parent chain Junio C Hamano
@ 2016-10-19  4:23   ` Junio C Hamano
  2016-10-19  4:23   ` [PATCH 7/7] merge: allow to use only the fp-only merge bases Junio C Hamano
  2016-10-19 21:34   ` [PATCH 0/7] Rejecting useless " Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

A new option "--fp-only" limits the output to the merge bases that
are on the first-parent chain from the branches being merged.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-merge-base.txt |  8 +++++++-
 builtin/merge-base.c             | 10 +++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-merge-base.txt b/Documentation/git-merge-base.txt
index 808426faac..f99653f138 100644
--- a/Documentation/git-merge-base.txt
+++ b/Documentation/git-merge-base.txt
@@ -9,7 +9,7 @@ git-merge-base - Find as good common ancestors as possible for a merge
 SYNOPSIS
 --------
 [verse]
-'git merge-base' [-a|--all] <commit> <commit>...
+'git merge-base' [-a|--all] [--fp-only] <commit> <commit>...
 'git merge-base' [-a|--all] --octopus <commit>...
 'git merge-base' --is-ancestor <commit> <commit>
 'git merge-base' --independent <commit>...
@@ -72,6 +72,12 @@ OPTIONS
 --all::
 	Output all merge bases for the commits, instead of just one.
 
+--fp-only::
+	Limit the output to merge bases that are on the first-parent
+	chain from none of the commits given on the command line.
+
+
+
 DISCUSSION
 ----------
 
diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index c0d1822eb3..42febb20ff 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -6,11 +6,12 @@
 #include "revision.h"
 #include "parse-options.h"
 
-static int show_merge_base(struct commit **rev, int rev_nr, int show_all)
+static int show_merge_base(struct commit **rev, int rev_nr, int show_all, int fp_only)
 {
 	struct commit_list *result;
+	unsigned flags = fp_only ? MB_FPCHAIN : 0;
 
-	result = get_merge_bases_many_dirty(rev[0], rev_nr - 1, rev + 1);
+	result = get_merge_bases_opt(rev[0], rev_nr - 1, rev + 1, flags);
 
 	if (!result)
 		return 1;
@@ -208,10 +209,13 @@ int cmd_merge_base(int argc, const char **argv, const char *prefix)
 	struct commit **rev;
 	int rev_nr = 0;
 	int show_all = 0;
+	int fp_only = 0;
 	int cmdmode = 0;
 
 	struct option options[] = {
 		OPT_BOOL('a', "all", &show_all, N_("output all common ancestors")),
+		OPT_BOOL(0, "fp-only", &fp_only,
+			 N_("limit to bases that are on first-parent chain")),
 		OPT_CMDMODE(0, "octopus", &cmdmode,
 			    N_("find ancestors for a single n-way merge"), 'o'),
 		OPT_CMDMODE(0, "independent", &cmdmode,
@@ -255,5 +259,5 @@ int cmd_merge_base(int argc, const char **argv, const char *prefix)
 	ALLOC_ARRAY(rev, argc);
 	while (argc-- > 0)
 		rev[rev_nr++] = get_commit_reference(*argv++);
-	return show_merge_base(rev, rev_nr, show_all);
+	return show_merge_base(rev, rev_nr, show_all, fp_only);
 }
-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/7] merge: allow to use only the fp-only merge bases
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
                     ` (5 preceding siblings ...)
  2016-10-19  4:23   ` [PATCH 6/7] merge-base: limit the output to " Junio C Hamano
@ 2016-10-19  4:23   ` Junio C Hamano
  2016-10-19 21:34   ` [PATCH 0/7] Rejecting useless " Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19  4:23 UTC (permalink / raw)
  To: git

Teach "git merge" a new option "--fp-base-only" that tells it to
consider only merge bases that are on the first-parent chain.

This speeds up back-merges needed in the topic branch workflow.  The
merge of 'master' back into 'next' used as an example in the RFD
article <xmqqmvi2sj8f.fsf@gitster.mtv.corp.google.com>, i.e.

    git checkout 4868def05e && git merge 659889482a

in our history takes about 1.22-1.33 seconds without the series,
while running the latter "git merge" with the "--fp-base-only"
option takes about 0.54-0.59 seconds.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/merge-options.txt |  9 +++++++++
 builtin/merge.c                 | 15 ++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt
index 5b4a62e936..7927f069e4 100644
--- a/Documentation/merge-options.txt
+++ b/Documentation/merge-options.txt
@@ -116,6 +116,15 @@ ifndef::git-pull[]
 	Note that not all merge strategies may support progress
 	reporting.
 
+--fp-base-only::
+	Instead of using all merge bases when computing the
+	three-way merge result, use only the merge bases on the
+	first-parent chain of the commits being merged.  This
+	experimental feature is meant to be used when merging an
+	older integration branch back to a newer integration branch
+	in the topic-branch workflow.
+
+
 endif::git-pull[]
 
 --allow-unrelated-histories::
diff --git a/builtin/merge.c b/builtin/merge.c
index 0ae099f746..a38b878e61 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -88,6 +88,8 @@ enum ff_type {
 
 static enum ff_type fast_forward = FF_ALLOW;
 
+static int fp_base_only;
+
 static int option_parse_message(const struct option *opt,
 				const char *arg, int unset)
 {
@@ -210,6 +212,8 @@ static struct option builtin_merge_options[] = {
 	{ OPTION_SET_INT, 0, "ff-only", &fast_forward, NULL,
 		N_("abort if fast-forward is not possible"),
 		PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, FF_ONLY },
+	OPT_BOOL(0, "fp-base-only", &fp_base_only,
+		 N_("use only merge bases on first-parent chain")),
 	OPT_RERERE_AUTOUPDATE(&allow_rerere_auto),
 	OPT_BOOL(0, "verify-signatures", &verify_signatures,
 		N_("verify that the named commit has a valid GPG signature")),
@@ -1340,9 +1344,14 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 
 	if (!remoteheads)
 		; /* already up-to-date */
-	else if (!remoteheads->next)
-		common = get_merge_bases(head_commit, remoteheads->item);
-	else {
+	else if (!remoteheads->next) {
+		unsigned flags = MB_POSTCLEAN;
+		if (fp_base_only)
+			flags |= MB_FPCHAIN;
+		common = get_merge_bases_opt(head_commit,
+					     1, &remoteheads->item,
+					     flags);
+	} else {
 		struct commit_list *list = remoteheads;
 		commit_list_insert(head_commit, &list);
 		common = get_octopus_merge_bases(list);
-- 
2.10.1-631-gb2c64dcf30


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/7] merge-base: mark bases that are on first-parent chain
  2016-10-19  4:23   ` [PATCH 5/7] merge-base: mark bases that are on first-parent chain Junio C Hamano
@ 2016-10-19 17:42     ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19 17:42 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> In a workflow where topic branches are first merged to the 'next'
> integration branch to be tested before getting merged down to the
> 'master' integration branch to be consumed by the end users, merging
> the 'master' branch back to the 'next' happens after topics graduate
> to 'master' and release notes entries are written for them.
>
> Git finds many merge bases between 'master' and 'next' while
> creating this merge.  In addition to the tip of 'master' back when
> we made such a merge back from 'master' to 'next' was made the last
> time, which is the most reasonable merge base to explain the
> histories of both branches, all the tips of topic branches that
> graduated recently are merge bases.  Because these tips of topic
> branches were already in 'next', the tip of 'next' reaches them, and
> because they just graduated to 'master', the tip of 'master' reaches
> them, too.  And these topics are independent most of the time (that
> is the point of employing the topic-branch workflow), so they cannot
> be reduced.

The idea here is to exclude tips of topic branches as "obviously
less useful as merge bases than the mainline commit".  Aside from
the fact that the approach is purely a heuristic that heavily relies
on convention aka "topic branch workflow", there is a caveat (or
more that I haven't thought of yet).  

One is that there may be no merge base found that is on the first
parent chain with this particular patch.  When any of the topics
that just graduated to 'master' was forked off of 'master' after the
latest merge from 'master' to 'next' was made, then the merge base
on the first parent chain, i.e. the commit on 'master' that was
merged to 'next' the last time, will be an ancestor of the tip of
that recently forked topic, which is another common ancestor between
'master' and 'next' being merged (hence removed as redundant).  We
will be left with only the merge bases that are off first-parent
chain, and I think "git merge" will say "no related histories; I
won't merge them".

I do not know if that is a problem in practice, but if it turns out
to be, it probably can be corrected by updating the way how the
paint_down_to_common() function works.  It still stops traversal,
even with this patch, when a commit is found to be common and place
it to the "potential merge bases" list, but instead we could keep
digging until we hit a commit that is common between PARENT1 and
PARENT2 and also is on the first-parent chain, without declaring
that we found one result.

But that probably ends up being the same as ignoring all side
branches from merges and finding merge bases only paying attention
to the first-parent chain.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/7] Rejecting useless merge bases
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
                     ` (6 preceding siblings ...)
  2016-10-19  4:23   ` [PATCH 7/7] merge: allow to use only the fp-only merge bases Junio C Hamano
@ 2016-10-19 21:34   ` Junio C Hamano
  7 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-19 21:34 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> This is a continuation of
>
>     http://public-inbox.org/git/xmqqmvi2sj8f.fsf@gitster.mtv.corp.google.com

I scanned all two-parent merges in git.git leading to 'next' and
'pu' and reproduced them with or without the new --fp-base-only
option.  I also did the same experiment in linux.git, but only for
the latest 10,000 merges.

This exercise yielded some interesting numbers.

Among the total 10,995 merges, 10,487 merges had only one merge base
and that merge base was on the first-parent chain, i.e. the result
of reproduction is the same with or without "--fp-base-only".

There were 65 merges with multiple merge-bases but all of these
merge bases were on the first-parent chain, i.e. again the result is
the same with or without "--fp-base-only".

The remaining 443 merges had merge bases that are not on the
first-parent chain.  These merges, when recreated with
"--fp-base-only", would use different/reduced set of merge bases to
drive merge-recursive machinery and could produce different results.

Among these 443, "git merge" with or without the "--fp-base-only"
option successfully auto-resolved and produced the same result as
recorded in the real history for 214 of them.  They stopped in
conflicts but "git ls-files -s" output in their conflicted states
were the same, i.e. they left the identical conflicts, for 221 of
them.

The remaining 8 merges were auto-resolved the same way with or
without the "--fp-base-only" option, but they were different from
the merge in the real history (i.e. the real history had an evil
merge there to adjust for non-textual conflicts).

The most important numbers was 0.  There was no merges whose
reproduction with and without "--fp-base-only" produced different
results.


For linux.git the numbers are:

 - total merges looked at:                         10,000
 - single merge base:                               9,705
 - multi merge base all on the first parent chain:    123
 - fp-base-only eligible:                             172
 - cleanly resolved the same way w/ or w/o fp-only:   118
 - conflicted the same way w/ or w/o fp-only:          54
 - evil merges among fp-base-only eligible merges:      0
 - fp-base-only mismerges:                              0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFD] should all merge bases be equal?
  2016-10-17 22:28 [RFD] should all merge bases be equal? Junio C Hamano
  2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
@ 2017-02-09 14:44 ` Michael Haggerty
  2017-02-09 16:57   ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Haggerty @ 2017-02-09 14:44 UTC (permalink / raw)
  To: Junio C Hamano, git

On 10/18/2016 12:28 AM, Junio C Hamano wrote:
> [...]
> Being accustomed how fast my merges go, there is one merge that
> hiccups for me every once in a few days: merging back from 'master'
> to 'next'.  [...]
> 
> The reason why this merge is slow is because it typically have many
> merge bases.  [...]

I overlooked this topic until just now :-(

I spent a lot of time looking at merge bases a couple of years ago [1],
originally motivated by the crappy diffs you get from

    git diff master...branch

when the merge base is chosen poorly. In that email I include a lot of
data and suggest a different heuristic, namely to define the "best"
merge base $M to be the one that minimizes the number of non-merge
commits between $M and either of the branch tips (it doesn't matter
which one you choose); i.e., the one that minimizes

    git rev-list --count --no-merges $M..$TIP

. The idea is that a merge base that is "closer" content-wise to the
tips will probably yield smaller diffs. I would expect that merge base
also to yield simpler merges, though I didn't test that. Relying on the
number of commits (rather than some other measure of how much the
content has been changed) is only a heuristic, but it seems to work well
and it can be implemented pretty cheaply.

We actually use an algorithm like the one I described at GitHub, though
it is implemented as a script rather than ever having been integrated
into git. And (for no particular reason) we include merge commits in the
commit count (it doesn't make much difference whether merges are
included or excluded).

Your idea to look at the first-parent histories of the two branch tips
is an interesting one and has the nice theoretical property that it is
based on the DAG topology rather than a count of commits. I'd be very
curious to see how the sizes of asymmetric diffs differ between your
method and mine, because for me smaller and more readable diffs are one
of the main benefits of better merge bases.

I would worry a bit that your proposed algorithm won't perform as well
for people who use less disciplined workflows than git.git or the Linux
kernel. For example, many people merge a lot more frequently and
chaotically, maybe even with the parents reversed from the canonical order.

Anyway, I mostly wanted to remind you of the earlier discussion of this
topic. There's a lot more information there.

Michael

[1] http://public-inbox.org/git/539A25BF.4060501@alum.mit.edu/


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFD] should all merge bases be equal?
  2017-02-09 14:44 ` [RFD] should all merge bases be equal? Michael Haggerty
@ 2017-02-09 16:57   ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2017-02-09 16:57 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Anyway, I mostly wanted to remind you of the earlier discussion of this
> topic. There's a lot more information there.
>
> Michael
>
> [1] http://public-inbox.org/git/539A25BF.4060501@alum.mit.edu/

Your http://public-inbox.org/git/53A3F67A.80501@alum.mit.edu/ in the
thread appears to me the best place to continue exploring.

Thanks for the link.  

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-02-09 16:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-17 22:28 [RFD] should all merge bases be equal? Junio C Hamano
2016-10-19  4:23 ` [PATCH 0/7] Rejecting useless merge bases Junio C Hamano
2016-10-19  4:23   ` [PATCH 1/7] commit: simplify fastpath of merge-base computation Junio C Hamano
2016-10-19  4:23   ` [PATCH 2/7] sha1_name: remove ONELINE_SEEN bit Junio C Hamano
2016-10-19  4:23   ` [PATCH 3/7] merge-base: stop moving commits around in remove_redundant() Junio C Hamano
2016-10-19  4:23   ` [PATCH 4/7] merge-base: expose get_merge_bases_many_0() a bit more Junio C Hamano
2016-10-19  4:23   ` [PATCH 5/7] merge-base: mark bases that are on first-parent chain Junio C Hamano
2016-10-19 17:42     ` Junio C Hamano
2016-10-19  4:23   ` [PATCH 6/7] merge-base: limit the output to " Junio C Hamano
2016-10-19  4:23   ` [PATCH 7/7] merge: allow to use only the fp-only merge bases Junio C Hamano
2016-10-19 21:34   ` [PATCH 0/7] Rejecting useless " Junio C Hamano
2017-02-09 14:44 ` [RFD] should all merge bases be equal? Michael Haggerty
2017-02-09 16:57   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).