git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 1/4] subtree: refactor split of a commit into standalone method
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
@ 2018-09-28 18:35 ` Strain, Roger L
  2018-09-28 18:35 ` [PATCH 2/4] subtree: make --ignore-joins pay attention to adds Strain, Roger L
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Strain, Roger L @ 2018-09-28 18:35 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Junio C Hamano, Stephen R Guglielmo,
	David A . Greene, Matthieu Moy, Stephen R Guglielmo, Dave Ware,
	David Aguilar

In a particularly complex repo, subtree split was not creating
compatible splits for pushing back to a separate repo. Addressing
one of the issues requires recursive handling of parent commits
that were not initially considered by the algorithm. This commit
makes no functional changes, but relocates the code to be called
recursively into a new method to simplify comparisons of later
commits.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 78 ++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 36 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index d3f39a862..2cd7b345b 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -598,6 +598,47 @@ ensure_valid_ref_format () {
 		die "'$1' does not look like a ref"
 }
 
+process_split_commit () {
+	local rev="$1"
+	local parents="$2"
+	revcount=$(($revcount + 1))
+	progress "$revcount/$revmax ($createcount)"
+	debug "Processing commit: $rev"
+	exists=$(cache_get "$rev")
+	if test -n "$exists"
+	then
+		debug "  prior: $exists"
+		return
+	fi
+	createcount=$(($createcount + 1))
+	debug "  parents: $parents"
+	newparents=$(cache_get $parents)
+	debug "  newparents: $newparents"
+
+	tree=$(subtree_for_commit "$rev" "$dir")
+	debug "  tree is: $tree"
+
+	check_parents $parents
+
+	# ugly.  is there no better way to tell if this is a subtree
+	# vs. a mainline commit?  Does it matter?
+	if test -z "$tree"
+	then
+		set_notree "$rev"
+		if test -n "$newparents"
+		then
+			cache_set "$rev" "$rev"
+		fi
+		return
+	fi
+
+	newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
+	debug "  newrev is: $newrev"
+	cache_set "$rev" "$newrev"
+	cache_set latest_new "$newrev"
+	cache_set latest_old "$rev"
+}
+
 cmd_add () {
 	if test -e "$dir"
 	then
@@ -706,42 +747,7 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		revcount=$(($revcount + 1))
-		progress "$revcount/$revmax ($createcount)"
-		debug "Processing commit: $rev"
-		exists=$(cache_get "$rev")
-		if test -n "$exists"
-		then
-			debug "  prior: $exists"
-			continue
-		fi
-		createcount=$(($createcount + 1))
-		debug "  parents: $parents"
-		newparents=$(cache_get $parents)
-		debug "  newparents: $newparents"
-
-		tree=$(subtree_for_commit "$rev" "$dir")
-		debug "  tree is: $tree"
-
-		check_parents $parents
-
-		# ugly.  is there no better way to tell if this is a subtree
-		# vs. a mainline commit?  Does it matter?
-		if test -z "$tree"
-		then
-			set_notree "$rev"
-			if test -n "$newparents"
-			then
-				cache_set "$rev" "$rev"
-			fi
-			continue
-		fi
-
-		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
-		debug "  newrev is: $newrev"
-		cache_set "$rev" "$newrev"
-		cache_set latest_new "$newrev"
-		cache_set latest_old "$rev"
+		process_split_commit "$rev" "$parents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new)
-- 
2.19.0.windows.1
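
Editorial sketch, not part of the patch: the one mechanical detail in this
refactor is that each `continue` in the old loop body becomes `return` once
the body lives in a function. A minimal illustration, assuming nothing about
git-subtree itself; `process_item`, `seen`, and `processed` are hypothetical
names used only here.

```shell
#!/bin/sh
# Hypothetical stand-in for process_split_commit: skip items that were
# already handled, otherwise record them.
seen=""
processed=""

process_item () {
	item="$1"
	case " $seen " in
	*" $item "*)
		# in the loop body this was "continue"; in a function it is "return"
		return
		;;
	esac
	seen="$seen $item"
	processed="$processed $item"
}

# The loop now only dispatches; all per-item logic lives in the function,
# so the function can also be called from elsewhere (for example recursively).
for item in a b a c b
do
	process_item "$item"
done

echo "processed:$processed"	# prints "processed: a b c"
```

Unlike this sketch, the real loop runs on the right-hand side of a pipe, which
is why git-subtree.sh keeps its state in cache files rather than in shell
variables.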



* [PATCH 2/4] subtree: make --ignore-joins pay attention to adds
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
  2018-09-28 18:35 ` [PATCH 1/4] subtree: refactor split of a commit into standalone method Strain, Roger L
@ 2018-09-28 18:35 ` Strain, Roger L
  2018-09-28 18:35 ` [PATCH 3/4] subtree: use commits before rejoins for splits Strain, Roger L
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Strain, Roger L @ 2018-09-28 18:35 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Junio C Hamano, Stephen R Guglielmo,
	David A . Greene, Matthieu Moy, Stephen R Guglielmo, Dave Ware,
	David Aguilar

Changes the behavior of --ignore-joins to always consider a subtree add
commit, and ignore only splits and squashes.

The --ignore-joins option is documented to ignore prior --rejoin commits.
However, it additionally ignored subtree add commits generated when a
subtree was initially added to a repo.

Due to the logic which determines whether a commit is a mainline commit
or a subtree commit (namely, the presence or absence of content in the
subtree prefix) this causes commits before the initial add to appear to
be part of the subtree. An --ignore-joins split would therefore consider
those commits part of the subtree history and include them at the
beginning of the synthetic history, causing the resulting hashes to be
incorrect for all later commits.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 2cd7b345b..d8861f306 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -340,7 +340,12 @@ find_existing_splits () {
 	revs="$2"
 	main=
 	sub=
-	git log --grep="^git-subtree-dir: $dir/*\$" \
+	local grep_format="^git-subtree-dir: $dir/*\$"
+	if test -n "$ignore_joins"
+	then
+		grep_format="^Add '$dir/' from commit '"
+	fi
+	git log --grep="$grep_format" \
 		--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |
 	while read a b junk
 	do
@@ -730,12 +735,7 @@ cmd_split () {
 		done
 	fi
 
-	if test -n "$ignore_joins"
-	then
-		unrevs=
-	else
-		unrevs="$(find_existing_splits "$dir" "$revs")"
-	fi
+	unrevs="$(find_existing_splits "$dir" "$revs")"
 
 	# We can't restrict rev-list to only $dir here, because some of our
 	# parents have the $dir contents the root, and those won't match.
-- 
2.19.0.windows.1
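
Editorial sketch, not part of the patch: the effect of the two --grep patterns
can be tried in a throwaway repository. The commit messages below only imitate
what git-subtree writes; the directory name `sub` and the hashes inside the
messages are made up for illustration.

```shell
#!/bin/sh
# Build a scratch repo with one ordinary commit, one imitation "add"
# commit and one imitation "rejoin" commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

dir=sub
git commit -q --allow-empty -m "mainline work before the subtree existed"
git commit -q --allow-empty -m "Add '$dir/' from commit '1111111'" \
	-m "git-subtree-dir: $dir"
git commit -q --allow-empty -m "Split '$dir/' into commit '2222222'" \
	-m "git-subtree-dir: $dir"

# Pattern used without --ignore-joins: every commit carrying the trailer.
all_joins=$(git log --format=%s --grep="^git-subtree-dir: $dir/*\$")
# Pattern used with --ignore-joins after this patch: only the add commit.
adds_only=$(git log --format=%s --grep="^Add '$dir/' from commit '")

echo "default pattern matches:"
echo "$all_joins"
echo "--ignore-joins pattern matches:"
echo "$adds_only"
```

With the second pattern the add commit still bounds the history walk, so the
commits before it are no longer mistaken for subtree commits.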



* [PATCH 3/4] subtree: use commits before rejoins for splits
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
  2018-09-28 18:35 ` [PATCH 1/4] subtree: refactor split of a commit into standalone method Strain, Roger L
  2018-09-28 18:35 ` [PATCH 2/4] subtree: make --ignore-joins pay attention to adds Strain, Roger L
@ 2018-09-28 18:35 ` Strain, Roger L
  2018-09-28 18:35 ` [PATCH 4/4] subtree: improve decision on merges kept in split Strain, Roger L
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Strain, Roger L @ 2018-09-28 18:35 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Junio C Hamano, Stephen R Guglielmo,
	David A . Greene, Matthieu Moy, Stephen R Guglielmo, Dave Ware,
	David Aguilar

Adds recursive evaluation of parent commits which were not part of the
initial commit list when performing a split.

Split expects all relevant commits to be reachable from the target commit
but not reachable from any previous rejoins. However, a branch could be
based on a commit prior to a rejoin, then later merged back into the
current code. In this case, a parent to the commit will not be present in
the initial list of commits, triggering an "incorrect order" warning.

Previous behavior was to consider that commit to have no parent, creating
an original commit containing all subtree content. This commit is not
present in an existing subtree commit graph, changing commit hashes and
making pushing to a subtree repo impossible.

New behavior will recursively check these unexpected parent commits to
track them back to either an earlier rejoin, or a true original commit.
The generated synthetic commits will properly match previously-generated
commits, allowing successful pushing to a prior subtree repo.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index d8861f306..23dd04cbe 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -231,12 +231,14 @@ cache_miss () {
 }
 
 check_parents () {
-	missed=$(cache_miss "$@")
+	missed=$(cache_miss "$1")
+	local indent=$(($2 + 1))
 	for miss in $missed
 	do
 		if ! test -r "$cachedir/notree/$miss"
 		then
 			debug "  incorrect order: $miss"
+			process_split_commit "$miss" "" "$indent"
 		fi
 	done
 }
@@ -606,8 +608,20 @@ ensure_valid_ref_format () {
 process_split_commit () {
 	local rev="$1"
 	local parents="$2"
-	revcount=$(($revcount + 1))
-	progress "$revcount/$revmax ($createcount)"
+	local indent=$3
+
+	if test $indent -eq 0
+	then
+		revcount=$(($revcount + 1))
+	else
+		# processing commit without normal parent information;
+		# fetch from repo
+		parents=$(git show -s --pretty=%P "$rev")
+		extracount=$(($extracount + 1))
+	fi
+
+	progress "$revcount/$revmax ($createcount) [$extracount]"
+
 	debug "Processing commit: $rev"
 	exists=$(cache_get "$rev")
 	if test -n "$exists"
@@ -617,14 +631,13 @@ process_split_commit () {
 	fi
 	createcount=$(($createcount + 1))
 	debug "  parents: $parents"
+	check_parents "$parents" "$indent"
 	newparents=$(cache_get $parents)
 	debug "  newparents: $newparents"
 
 	tree=$(subtree_for_commit "$rev" "$dir")
 	debug "  tree is: $tree"
 
-	check_parents $parents
-
 	# ugly.  is there no better way to tell if this is a subtree
 	# vs. a mainline commit?  Does it matter?
 	if test -z "$tree"
@@ -744,10 +757,11 @@ cmd_split () {
 	revmax=$(eval "$grl" | wc -l)
 	revcount=0
 	createcount=0
+	extracount=0
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		process_split_commit "$rev" "$parents" 0
 	done || exit $?
 
 	latest_new=$(cache_get latest_new)
-- 
2.19.0.windows.1
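
Editorial sketch, not part of the patch: the situation described above can be
reproduced without git-subtree. The scratch repository below has a side branch
that forks before a commit we merely pretend is a rejoin and merges back after
it. The variable names (`rejoin`, `revs`, `missing`) are hypothetical.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)
git branch side
git commit -q --allow-empty -m "pretend rejoin"
rejoin=$(git rev-parse HEAD)
git checkout -q side
git commit -q --allow-empty -m "side work"
git checkout -q -
git merge -q --no-ff -m "merge side" side

# Commits the walk visits: reachable from HEAD but not from the rejoin.
revs=" $(git rev-list HEAD "^$rejoin" | tr '\n' ' ')"

# Collect parents that are neither visited nor the rejoin itself.
missing=""
while read -r rev parents
do
	for p in $parents
	do
		test "$p" = "$rejoin" && continue
		case "$revs" in
		*" $p "*) ;;
		*) missing="$missing $p" ;;
		esac
	done
done <<EOF
$(git rev-list --topo-order --reverse --parents HEAD "^$rejoin")
EOF

echo "parents outside the walk:$missing"
```

Here the only parent outside the walk is the "base" commit, which is the kind
of commit the patch now processes recursively instead of treating its child as
parentless.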



* [PATCH 4/4] subtree: improve decision on merges kept in split
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
                   ` (2 preceding siblings ...)
  2018-09-28 18:35 ` [PATCH 3/4] subtree: use commits before rejoins for splits Strain, Roger L
@ 2018-09-28 18:35 ` Strain, Roger L
  2018-10-11 19:46 ` [PATCH v2 0/4] Multiple subtree split fixes regarding complex repos Roger Strain
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Strain, Roger L @ 2018-09-28 18:35 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Junio C Hamano, Stephen R Guglielmo,
	David A . Greene, Matthieu Moy, Stephen R Guglielmo, Dave Ware,
	David Aguilar

When multiple identical parents are detected for a commit being considered
for copying, explicitly check whether one is the common merge base between
the commits. If so, the other commit can be used as the identical parent;
if not, a merge must be performed to maintain history.

In some situations two parents of a merge commit may appear to both have
identical subtree content with each other and the current commit. However,
those parents can potentially come from different commit graphs.

Previous behavior would simply select one of the identical parents to
serve as the replacement for this commit, based on the order in which they
were processed.

New behavior compares the merge base between the commits to determine if
a new merge commit is necessary to maintain history despite the identical
content.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 23dd04cbe..1c157dbd9 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -541,6 +541,7 @@ copy_or_skip () {
 	nonidentical=
 	p=
 	gotparents=
+	copycommit=
 	for parent in $newparents
 	do
 		ptree=$(toptree_for_commit $parent) || exit $?
@@ -548,7 +549,24 @@ copy_or_skip () {
 		if test "$ptree" = "$tree"
 		then
 			# an identical parent could be used in place of this rev.
-			identical="$parent"
+			if test -n "$identical"
+			then
+				# if a previous identical parent was found, check whether
+				# one is already an ancestor of the other
+				mergebase=$(git merge-base $identical $parent)
+				if test "$identical" = "$mergebase"
+				then
+					# current identical commit is an ancestor of parent
+					identical="$parent"
+				elif test "$parent" != "$mergebase"
+				then
+					# no common history; commit must be copied
+					copycommit=1
+				fi
+			else
+				# first identical parent detected
+				identical="$parent"
+			fi
 		else
 			nonidentical="$parent"
 		fi
@@ -571,7 +589,6 @@ copy_or_skip () {
 		fi
 	done
 
-	copycommit=
 	if test -n "$identical" && test -n "$nonidentical"
 	then
 		extras=$(git rev-list --count $identical..$nonidentical)
-- 
2.19.0.windows.1
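
Editorial sketch, not part of the patch: the three-way decision the new code
makes for two identical parents, pulled out into a standalone function. The
function name `choose_identical` and its "use"/"copy" output are hypothetical;
only `git merge-base` is taken from the patch.

```shell
#!/bin/sh
# Prints "use <sha>" when one parent is an ancestor of the other,
# or "copy" when the merge has to be kept.
choose_identical () {
	first="$1"
	second="$2"
	mergebase=$(git merge-base "$first" "$second")
	if test "$first" = "$mergebase"
	then
		echo "use $second"	# first is an ancestor of second
	elif test "$second" = "$mergebase"
	then
		echo "use $first"	# second is an ancestor of first
	else
		echo "copy"		# diverged or unrelated histories
	fi
}

# Scratch repo: a is an ancestor of b; c is an unrelated root commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "a"; a=$(git rev-parse HEAD)
git commit -q --allow-empty -m "b"; b=$(git rev-parse HEAD)
git checkout -q --orphan other
git commit -q --allow-empty -m "unrelated root"; c=$(git rev-parse HEAD)

choose_identical "$a" "$b"
choose_identical "$b" "$a"
choose_identical "$b" "$c"
```

The first two calls pick `b` regardless of argument order; the third reports
"copy" because `git merge-base` finds no common ancestor and prints nothing.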



* [PATCH 0/4] Multiple subtree split fixes regarding complex repos
@ 2018-09-28 18:56 Strain, Roger L
  2018-09-28 18:35 ` [PATCH 1/4] subtree: refactor split of a commit into standalone method Strain, Roger L
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Strain, Roger L @ 2018-09-28 18:56 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Junio C Hamano, Stephen R Guglielmo,
	David A . Greene, Matthieu Moy, Stephen R Guglielmo, Dave Ware,
	David Aguilar

We recently (about eight months ago) transitioned to git source control systems for several very large, very complex systems. We brought over several active versions requiring maintenance updates, and also set up several subtree repos to manage code shared between the systems. Recently, we attempted to push updates back to those subtrees and encountered errors. I believe I have identified and corrected the errors we found in our repos, and would like to contribute those fixes back.

Commands to demonstrate both failures using the current version of the subtree script are here:
https://gist.github.com/FoxFireX/1b794384612b7fd5e7cd157cff96269e

Short summary of three problems involved:
1. Split using rejoins fails in some cases where a commit has a parent which was a parent commit further upstream from a rejoin, causing a new initial commit to be created, which is not related to the original subtree commits.
2. Split using rejoins fails to generate a merge commit which may have triaged the previous problem, but instead elected to use only the parent which is not connected to the original subtree commits. (This may occur when the commit and both parents all share the same subtree hash.)
3. Split ignoring joins also ignores the original add commit, which causes content prior to the add to be considered part of the subtree graph, changing the commit hashes so it is not connected to the original subtree commits.

The following commits address each problem individually, along with a single commit that makes no functional change but performs a small refactor of the existing code. Hopefully that will make reviewing it a simpler task. This is my first attempt at submitting a patch back, so apologies if I've made any errors in the process.

Strain, Roger L (4):
  subtree: refactor split of a commit into standalone method
  subtree: make --ignore-joins pay attention to adds
  subtree: use commits before rejoins for splits
  subtree: improve decision on merges kept in split

 contrib/subtree/git-subtree.sh | 129 +++++++++++++++++++++------------
 1 file changed, 83 insertions(+), 46 deletions(-)

-- 
2.19.0.windows.1



* [PATCH v2 0/4] Multiple subtree split fixes regarding complex repos
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
                   ` (3 preceding siblings ...)
  2018-09-28 18:35 ` [PATCH 4/4] subtree: improve decision on merges kept in split Strain, Roger L
@ 2018-10-11 19:46 ` Roger Strain
  2018-10-12  7:35   ` Junio C Hamano
  2018-10-11 19:46 ` [PATCH v2 1/4] subtree: refactor split of a commit into standalone method Roger Strain
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 11+ messages in thread
From: Roger Strain @ 2018-10-11 19:46 UTC (permalink / raw)
  To: git

After doing some testing at scale, determined that one call was taking too long; replaced that with an alternate call which returns the same data significantly faster.

Also, if anyone has any other feedback on these I'd really love to hear it. It's working better for us (as in, it actually generates a compatible tree version to version) but still isn't perfect, and I'm not sure perfect is achievable, but I want to make sure this doesn't break things for anyone else.

Changes since v1:
diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 1c157dbd9..7dd643998 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -633,7 +633,7 @@ process_split_commit () {
        else
                # processing commit without normal parent information;
                # fetch from repo
-               parents=$(git show -s --pretty=%P "$rev")
+               parents=$(git log --pretty=%P -n 1 "$rev")
                extracount=$(($extracount + 1))
        fi

Strain, Roger L (4):
  subtree: refactor split of a commit into standalone method
  subtree: make --ignore-joins pay attention to adds
  subtree: use commits before rejoins for splits
  subtree: improve decision on merges kept in split

 contrib/subtree/git-subtree.sh | 129 +++++++++++++++++++++------------
 1 file changed, 83 insertions(+), 46 deletions(-)

-- 
2.19.1



* [PATCH v2 1/4] subtree: refactor split of a commit into standalone method
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
                   ` (4 preceding siblings ...)
  2018-10-11 19:46 ` [PATCH v2 0/4] Multiple subtree split fixes regarding complex repos Roger Strain
@ 2018-10-11 19:46 ` Roger Strain
  2018-10-11 19:46 ` [PATCH v2 2/4] subtree: make --ignore-joins pay attention to adds Roger Strain
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Roger Strain @ 2018-10-11 19:46 UTC (permalink / raw)
  To: git

From: "Strain, Roger L" <roger.strain@swri.org>

In a particularly complex repo, subtree split was not creating
compatible splits for pushing back to a separate repo. Addressing
one of the issues requires recursive handling of parent commits
that were not initially considered by the algorithm. This commit
makes no functional changes, but relocates the code to be called
recursively into a new method to simplify comparisons of later
commits.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 78 ++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 36 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index d3f39a862..2cd7b345b 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -598,6 +598,47 @@ ensure_valid_ref_format () {
 		die "'$1' does not look like a ref"
 }
 
+process_split_commit () {
+	local rev="$1"
+	local parents="$2"
+	revcount=$(($revcount + 1))
+	progress "$revcount/$revmax ($createcount)"
+	debug "Processing commit: $rev"
+	exists=$(cache_get "$rev")
+	if test -n "$exists"
+	then
+		debug "  prior: $exists"
+		return
+	fi
+	createcount=$(($createcount + 1))
+	debug "  parents: $parents"
+	newparents=$(cache_get $parents)
+	debug "  newparents: $newparents"
+
+	tree=$(subtree_for_commit "$rev" "$dir")
+	debug "  tree is: $tree"
+
+	check_parents $parents
+
+	# ugly.  is there no better way to tell if this is a subtree
+	# vs. a mainline commit?  Does it matter?
+	if test -z "$tree"
+	then
+		set_notree "$rev"
+		if test -n "$newparents"
+		then
+			cache_set "$rev" "$rev"
+		fi
+		return
+	fi
+
+	newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
+	debug "  newrev is: $newrev"
+	cache_set "$rev" "$newrev"
+	cache_set latest_new "$newrev"
+	cache_set latest_old "$rev"
+}
+
 cmd_add () {
 	if test -e "$dir"
 	then
@@ -706,42 +747,7 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		revcount=$(($revcount + 1))
-		progress "$revcount/$revmax ($createcount)"
-		debug "Processing commit: $rev"
-		exists=$(cache_get "$rev")
-		if test -n "$exists"
-		then
-			debug "  prior: $exists"
-			continue
-		fi
-		createcount=$(($createcount + 1))
-		debug "  parents: $parents"
-		newparents=$(cache_get $parents)
-		debug "  newparents: $newparents"
-
-		tree=$(subtree_for_commit "$rev" "$dir")
-		debug "  tree is: $tree"
-
-		check_parents $parents
-
-		# ugly.  is there no better way to tell if this is a subtree
-		# vs. a mainline commit?  Does it matter?
-		if test -z "$tree"
-		then
-			set_notree "$rev"
-			if test -n "$newparents"
-			then
-				cache_set "$rev" "$rev"
-			fi
-			continue
-		fi
-
-		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
-		debug "  newrev is: $newrev"
-		cache_set "$rev" "$newrev"
-		cache_set latest_new "$newrev"
-		cache_set latest_old "$rev"
+		process_split_commit "$rev" "$parents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new)
-- 
2.19.1



* [PATCH v2 2/4] subtree: make --ignore-joins pay attention to adds
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
                   ` (5 preceding siblings ...)
  2018-10-11 19:46 ` [PATCH v2 1/4] subtree: refactor split of a commit into standalone method Roger Strain
@ 2018-10-11 19:46 ` Roger Strain
  2018-10-11 19:46 ` [PATCH v2 3/4] subtree: use commits before rejoins for splits Roger Strain
  2018-10-11 19:46 ` [PATCH v2 4/4] subtree: improve decision on merges kept in split Roger Strain
  8 siblings, 0 replies; 11+ messages in thread
From: Roger Strain @ 2018-10-11 19:46 UTC (permalink / raw)
  To: git

From: "Strain, Roger L" <roger.strain@swri.org>

Changes the behavior of --ignore-joins to always consider a subtree add
commit, and ignore only splits and squashes.

The --ignore-joins option is documented to ignore prior --rejoin commits.
However, it additionally ignored subtree add commits generated when a
subtree was initially added to a repo.

Due to the logic which determines whether a commit is a mainline commit
or a subtree commit (namely, the presence or absence of content in the
subtree prefix) this causes commits before the initial add to appear to
be part of the subtree. An --ignore-joins split would therefore consider
those commits part of the subtree history and include them at the
beginning of the synthetic history, causing the resulting hashes to be
incorrect for all later commits.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 2cd7b345b..d8861f306 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -340,7 +340,12 @@ find_existing_splits () {
 	revs="$2"
 	main=
 	sub=
-	git log --grep="^git-subtree-dir: $dir/*\$" \
+	local grep_format="^git-subtree-dir: $dir/*\$"
+	if test -n "$ignore_joins"
+	then
+		grep_format="^Add '$dir/' from commit '"
+	fi
+	git log --grep="$grep_format" \
 		--no-show-signature --pretty=format:'START %H%n%s%n%n%b%nEND%n' $revs |
 	while read a b junk
 	do
@@ -730,12 +735,7 @@ cmd_split () {
 		done
 	fi
 
-	if test -n "$ignore_joins"
-	then
-		unrevs=
-	else
-		unrevs="$(find_existing_splits "$dir" "$revs")"
-	fi
+	unrevs="$(find_existing_splits "$dir" "$revs")"
 
 	# We can't restrict rev-list to only $dir here, because some of our
 	# parents have the $dir contents the root, and those won't match.
-- 
2.19.1



* [PATCH v2 3/4] subtree: use commits before rejoins for splits
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
                   ` (6 preceding siblings ...)
  2018-10-11 19:46 ` [PATCH v2 2/4] subtree: make --ignore-joins pay attention to adds Roger Strain
@ 2018-10-11 19:46 ` Roger Strain
  2018-10-11 19:46 ` [PATCH v2 4/4] subtree: improve decision on merges kept in split Roger Strain
  8 siblings, 0 replies; 11+ messages in thread
From: Roger Strain @ 2018-10-11 19:46 UTC (permalink / raw)
  To: git

From: "Strain, Roger L" <roger.strain@swri.org>

Adds recursive evaluation of parent commits which were not part of the
initial commit list when performing a split.

Split expects all relevant commits to be reachable from the target commit
but not reachable from any previous rejoins. However, a branch could be
based on a commit prior to a rejoin, then later merged back into the
current code. In this case, a parent to the commit will not be present in
the initial list of commits, triggering an "incorrect order" warning.

Previous behavior was to consider that commit to have no parent, creating
an original commit containing all subtree content. This commit is not
present in an existing subtree commit graph, changing commit hashes and
making pushing to a subtree repo impossible.

New behavior will recursively check these unexpected parent commits to
track them back to either an earlier rejoin, or a true original commit.
The generated synthetic commits will properly match previously-generated
commits, allowing successful pushing to a prior subtree repo.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index d8861f306..eef4199ae 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -231,12 +231,14 @@ cache_miss () {
 }
 
 check_parents () {
-	missed=$(cache_miss "$@")
+	missed=$(cache_miss "$1")
+	local indent=$(($2 + 1))
 	for miss in $missed
 	do
 		if ! test -r "$cachedir/notree/$miss"
 		then
 			debug "  incorrect order: $miss"
+			process_split_commit "$miss" "" "$indent"
 		fi
 	done
 }
@@ -606,8 +608,20 @@ ensure_valid_ref_format () {
 process_split_commit () {
 	local rev="$1"
 	local parents="$2"
-	revcount=$(($revcount + 1))
-	progress "$revcount/$revmax ($createcount)"
+	local indent=$3
+
+	if test $indent -eq 0
+	then
+		revcount=$(($revcount + 1))
+	else
+		# processing commit without normal parent information;
+		# fetch from repo
+		parents=$(git log --pretty=%P -n 1 "$rev")
+		extracount=$(($extracount + 1))
+	fi
+
+	progress "$revcount/$revmax ($createcount) [$extracount]"
+
 	debug "Processing commit: $rev"
 	exists=$(cache_get "$rev")
 	if test -n "$exists"
@@ -617,14 +631,13 @@ process_split_commit () {
 	fi
 	createcount=$(($createcount + 1))
 	debug "  parents: $parents"
+	check_parents "$parents" "$indent"
 	newparents=$(cache_get $parents)
 	debug "  newparents: $newparents"
 
 	tree=$(subtree_for_commit "$rev" "$dir")
 	debug "  tree is: $tree"
 
-	check_parents $parents
-
 	# ugly.  is there no better way to tell if this is a subtree
 	# vs. a mainline commit?  Does it matter?
 	if test -z "$tree"
@@ -744,10 +757,11 @@ cmd_split () {
 	revmax=$(eval "$grl" | wc -l)
 	revcount=0
 	createcount=0
+	extracount=0
 	eval "$grl" |
 	while read rev parents
 	do
-		process_split_commit "$rev" "$parents"
+		process_split_commit "$rev" "$parents" 0
 	done || exit $?
 
 	latest_new=$(cache_get latest_new)
-- 
2.19.1



* [PATCH v2 4/4] subtree: improve decision on merges kept in split
  2018-09-28 18:56 [PATCH 0/4] Multiple subtree split fixes regarding complex repos Strain, Roger L
                   ` (7 preceding siblings ...)
  2018-10-11 19:46 ` [PATCH v2 3/4] subtree: use commits before rejoins for splits Roger Strain
@ 2018-10-11 19:46 ` Roger Strain
  8 siblings, 0 replies; 11+ messages in thread
From: Roger Strain @ 2018-10-11 19:46 UTC (permalink / raw)
  To: git

From: "Strain, Roger L" <roger.strain@swri.org>

When multiple identical parents are detected for a commit being considered
for copying, explicitly check whether one is the common merge base between
the commits. If so, the other commit can be used as the identical parent;
if not, a merge must be performed to maintain history.

In some situations two parents of a merge commit may appear to both have
identical subtree content with each other and the current commit. However,
those parents can potentially come from different commit graphs.

Previous behavior would simply select one of the identical parents to
serve as the replacement for this commit, based on the order in which they
were processed.

New behavior compares the merge base between the commits to determine if
a new merge commit is necessary to maintain history despite the identical
content.

Signed-off-by: Strain, Roger L <roger.strain@swri.org>
---
 contrib/subtree/git-subtree.sh | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index eef4199ae..7dd643998 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -541,6 +541,7 @@ copy_or_skip () {
 	nonidentical=
 	p=
 	gotparents=
+	copycommit=
 	for parent in $newparents
 	do
 		ptree=$(toptree_for_commit $parent) || exit $?
@@ -548,7 +549,24 @@ copy_or_skip () {
 		if test "$ptree" = "$tree"
 		then
 			# an identical parent could be used in place of this rev.
-			identical="$parent"
+			if test -n "$identical"
+			then
+				# if a previous identical parent was found, check whether
+				# one is already an ancestor of the other
+				mergebase=$(git merge-base $identical $parent)
+				if test "$identical" = "$mergebase"
+				then
+					# current identical commit is an ancestor of parent
+					identical="$parent"
+				elif test "$parent" != "$mergebase"
+				then
+					# no common history; commit must be copied
+					copycommit=1
+				fi
+			else
+				# first identical parent detected
+				identical="$parent"
+			fi
 		else
 			nonidentical="$parent"
 		fi
@@ -571,7 +589,6 @@ copy_or_skip () {
 		fi
 	done
 
-	copycommit=
 	if test -n "$identical" && test -n "$nonidentical"
 	then
 		extras=$(git rev-list --count $identical..$nonidentical)
-- 
2.19.1



* Re: [PATCH v2 0/4] Multiple subtree split fixes regarding complex repos
  2018-10-11 19:46 ` [PATCH v2 0/4] Multiple subtree split fixes regarding complex repos Roger Strain
@ 2018-10-12  7:35   ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2018-10-12  7:35 UTC (permalink / raw)
  To: Roger Strain; +Cc: git

Roger Strain <rstrain@swri.org> writes:

> After doing some testing at scale, determined that one call was
> taking too long; replaced that with an alternate call which
> returns the same data significantly faster.

Curious where the time goes.  Do you know?

> Also, if anyone has any other feedback on these I'd really love to
> hear it. It's working better for us (as in, it actually generates

The previous one is already in 'next'; please make it incremental
with explanation as to why "show -s" is worse than "log -1" (but see
below).

>                 # processing commit without normal parent information;
>                 # fetch from repo
> -               parents=$(git show -s --pretty=%P "$rev")
> +               parents=$(git log --pretty=%P -n 1 "$rev")

If you want to learn the parents of a given commit:

	$ git help revisions

says

       <rev>^@, e.g. HEAD^@
           A suffix ^ followed by an at sign is the same as listing all parents of <rev>
           (meaning, include anything reachable from its parents, but not the commit
           itself).

so

		parents=$(git rev-parse "$rev^@")

ought to be the most efficient way to do this, I suspect.
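
Editorial sketch, not part of the original message: the three spellings
discussed in this thread can be compared on a scratch merge commit. This only
checks that they name the same parents; it says nothing about their relative
speed. The variable names are hypothetical.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a two-parent merge commit.
git commit -q --allow-empty -m "base"
git branch side
git commit -q --allow-empty -m "main work"
git checkout -q side
git commit -q --allow-empty -m "side work"
git checkout -q -
git merge -q --no-ff -m "merge side" side
rev=$(git rev-parse HEAD)

via_show=$(git show -s --pretty=%P "$rev")	# v1 of the series
via_log=$(git log --pretty=%P -n 1 "$rev")	# v2 of the series
via_revparse=$(git rev-parse "$rev^@")		# suggestion above

echo "show:      $via_show"
echo "log:       $via_log"
# rev-parse prints one parent per line; unquoted expansion joins them.
echo "rev-parse:" $via_revparse
```

The first two forms print the parents space-separated on one line, while
`git rev-parse "$rev^@"` prints one per line, so a caller that quotes the
result needs to account for the newlines.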

