git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [BUG?] 'git subtree split' replicates unrelated mainline merges into subtree?
@ 2021-01-07  1:54 Johan Herland
  2021-01-08 18:54 ` Strain, Roger L.
  0 siblings, 1 reply; 2+ messages in thread
From: Johan Herland @ 2021-01-07  1:54 UTC (permalink / raw)
  To: git; +Cc: Roger Strain, Avery Pennarun, Johan Herland

Hi,

I'm observing an issue with 'git subtree split' after adding a subtree
(with 'git subtree add') in a topic branch, and then merging the topic
branch back into the main branch (where other/unrelated changes have
accrued in the meantime). The patch below adds a failing t7900-subtree
test case which reproduces the issue.

Here is the repo history just prior to running 'git subtree split'

    $ git log --oneline --graph --decorate
    *   1355f01 (HEAD -> master) merge should be skipped on subtree
    |\  
    | *   adc8ecf (otherbranch) Add 'sub dir/' from commit '5280958b2f997c3ce7bff7192cceb19f55b45cd9'
    | |\  
    | | * 5280958 sub1
    * | a966436 other_unrelated
    |/  
    * 9b6e8f6 main1

The only commit that touches the subtree is the one created by
'git subtree add' (adc8ecf), hence I would expect 'git subtree split'
to NOT synthesize any new history, but simply return the last commit
in the subtree history (5280958)

However, 'git subtree split' ends up synthesizing a history that
replicates the master history since the start of the topic branch
(which consists of entirely unrelated commits) from the superproject
(aka. mainline) into the synthesized subtree history (57cbde8 and its
left-hand parent in this graph):

    $ git log --oneline --graph --all
    *   1355f01 (HEAD -> master) merge should be skipped on subtree
    |\  
    | *   adc8ecf (otherbranch) Add 'sub dir/' from commit '5280958b2f997c3ce7bff7192cceb19f55b45cd9'
    | |\  
    | | | * 57cbde8 (subproj-br) merge should be skipped on subtree
    | |_|/| 
    |/| |/  
    | | * 5280958 sub1
    * | a966436 other_unrelated
    |/  
    * 9b6e8f6 main1

I've been trying to understand how the subtree cache (mis)behaves in
this case. The cache is initially seeded from find_existing_splits(),
which finds these lines the adc8ecf commit message:

    git-subtree-mainline: 9b6e8f677b700a00e9f1715e2624bf5ed756dc85
    git-subtree-split: 5280958b2f997c3ce7bff7192cceb19f55b45cd9

and adds these corresponding entries to the cache:

    9b6e8f6 -> 5280958
    5280958 -> 5280958

In other words, the cache starts out claiming that 5280958 is the
equivalent subtree commit for the 9b6e8f6 mainline commit.
However, in my naive understanding this does not make sense, as
9b6e8f6 _precedes_ the subtree addition, and has no content in
the relevant subdir.

When process_split_commit() proceeds to tackle each subsequent commit
in order, it adds the following cache entries:

    a966436 -> a966436 (notree flag set)
    adc8ecf -> 5280958 (correct, AFAICS)
    1355f01 -> 57cbde8 (synthesizing a merge of a966436 and 5280958)

AFAICS, the problem originates from the combination of:
 - the first mainline commit 9b6e8f6 mapping to subtree commit 5280958
 - the subsequent mainline commit a966436 mapping to _itself_

I suspect that if both had pointed to the same cache state (either
both pointing to 5280958, or both having set the notree flag, the
latter makes more sense, IMHO), then we would skip synthesizing the
incorrect history. However I suspect I've overlooked something as I
have yet to find a simple tweak that does not break other test cases.


Stay safe,
...Johan


---
 contrib/subtree/t/t7900-subtree.sh | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index 57ff4b25c1..ef1fcaa3c0 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -971,6 +971,34 @@ test_expect_success 'push split to subproj' '
 	)
 '
 
+# If a subtree is added to a topic branch, which is then merged back to master
+# (where other, unrelated commits have been committed in meantime, a later
+# subtree split should not generate any new history (as nothing has touched
+# the sub dir since it was added)
+
+next_test
+test_expect_success 'split a subtree across mainline history with merge' '
+	subtree_test_create_repo "$subtree_test_count" &&
+	subtree_test_create_repo "$subtree_test_count/sub proj" &&
+	test_create_commit "$subtree_test_count" main1 &&
+	test_create_commit "$subtree_test_count/sub proj" sub1 &&
+	sub_head=$(cd "$subtree_test_count/sub proj"; git rev-parse HEAD) &&
+	(
+		cd "$subtree_test_count" &&
+		git checkout -b otherbranch &&
+		git fetch ./"sub proj" master &&
+		git subtree add --prefix="sub dir" FETCH_HEAD &&
+		git checkout master
+	) &&
+	test_create_commit "$subtree_test_count" other_unrelated &&
+	(
+		cd "$subtree_test_count" &&
+		git merge -m "merge should be skipped on subtree" otherbranch &&
+		git subtree split --prefix="sub dir" --branch subproj-br -d &&
+		check_equal "$(git rev-parse subproj-br)" "$sub_head"
+	)
+'
+
 #
 # This test covers 2 cases in subtree split copy_or_skip code
 # 1) Merges where one parent is a superset of the changes of the other
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [BUG?] 'git subtree split' replicates unrelated mainline merges into subtree?
  2021-01-07  1:54 [BUG?] 'git subtree split' replicates unrelated mainline merges into subtree? Johan Herland
@ 2021-01-08 18:54 ` Strain, Roger L.
  0 siblings, 0 replies; 2+ messages in thread
From: Strain, Roger L. @ 2021-01-08 18:54 UTC (permalink / raw)
  To: Johan Herland, git@vger.kernel.org; +Cc: Avery Pennarun

(Apologies for message formatting, fighting Outlook)

> I've been trying to understand how the subtree cache (mis)behaves in
> this case. The cache is initially seeded from find_existing_splits(),
> which finds these lines the adc8ecf commit message:
> 
>     git-subtree-mainline: 9b6e8f677b700a00e9f1715e2624bf5ed756dc85
>     git-subtree-split: 5280958b2f997c3ce7bff7192cceb19f55b45cd9
> 
> and adds these corresponding entries to the cache:
> 
>     9b6e8f6 -> 5280958
>     5280958 -> 5280958
> 
> In other words, the cache starts out claiming that 5280958 is the
> equivalent subtree commit for the 9b6e8f6 mainline commit.
> However, in my naive understanding this does not make sense, as
> 9b6e8f6 _precedes_ the subtree addition, and has no content in
> the relevant subdir.


I think you've identified the exact problem right here. In the normal
split/rejoin commits, the mainline commit *prior to* the merge commit
does, in fact, represent the same subtree state as the subtree commit
which is also merged at that point. But in the case of an add, that's
not true, and I'm actually a little surprised the same commit message
markers are generated.

What really should be captured by the initial cache seeding is that
the add merge commit *itself* has the same subtree content. However,
that can't be determined at the time the merge commit is created, as
the hash of that merge commit is determined by the commit message
itself.

I think a possible solution to this would be modifying the initial cache
process to isolate the Add commits and handle them differently.
Rather than using the hashes in the commit message, it should map
the merge commit itself to the subtree-split commit, and either do
nothing with the subtree-mainline commit hash, or explicitly set it
to notree.

However, this will complicate the logic of building the initial cache,
as it currently only cares about the existence of those simple lines,
and adds the mappings, erroneously as you have noted in this case.

It might also be worth changing the commit message provided for
adds so it no longer generates the incorrect assertion that the mainline
commit is identical subtree-wise, but even with that change, support
for correctly handling existing commits would still be ideal.

Currently, we're using a local version of the subtree script based on
some changes laid out here: https://github.com/gitgitgadget/git/pull/493
I'm hopeful that changeset will eventually land here, as it helps with
several complex issues in our repositories. I bring that up also because
it introduces some additional tools for managin the initial cache,
allowing manual mapping of one commit to another. That version might
allow some level of testing on whether this idea would correct the
problem described.

-- 
Roger Strain

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-01-08 19:14 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-07  1:54 [BUG?] 'git subtree split' replicates unrelated mainline merges into subtree? Johan Herland
2021-01-08 18:54 ` Strain, Roger L.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).