git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Potential bug with octopus merges and symlinks
@ 2021-12-02  3:04 Michael McClimon
  2021-12-04 17:34 ` Elijah Newren
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Michael McClimon @ 2021-12-02  3:04 UTC (permalink / raw)
  To: git

I am having a problem with octopus merges when symbolic links are involved.
I'm not completely convinced this is a bug in git, but I'm also not convinced
that it's _not_. (If it's not, my apologies; I often find it very hard to
track down the answers to complicated git questions on the internet.)

There is a minimal reproducer available at
https://github.com/mmcclimon/git-merge-problem-demo. Fetch all the branches
there. The main branch contains a directory (dir1) with a single file
(file.txt), plus a symlink (dir2), which links to dir1. branch1 replaces this
symlink with a copy of the files that were linked to. (This was accomplished
with: rm dir2; cp -r dir1 dir2.) branch2 and branch3 do not touch this
directory at all.

Merging these three branches fails:

$ git merge branch1 branch2 branch3
Fast-forwarding to: branch1
Trying simple merge with branch2
Simple merge did not work, trying automatic merge.
Trying simple merge with branch3
error: Entry 'dir2/file.txt' not uptodate. Cannot merge.
Merge with strategy octopus failed.

The order here matters! Here is every permutation (1 here is the symlink
change) to git merge; only the first two fail, all the others work.

1 2 3   FAIL
1 3 2   FAIL
2 1 3   PASS
2 3 1   PASS
3 1 2   PASS
3 2 1   PASS
1 2     PASS
2 1     PASS
2 3     PASS
3 2     PASS
1 3     PASS
3 1     PASS

I tracked this down as best I could (I am not a C programmer), by adding 
`set -x` to my copy of git-octopus-merge.sh. I determined that the line is
failing is toward the very bottom of the main loop there: 
    git read-tree -u -m --aggressive  $common $MRT $SHA1

I had a read of the index-format.txt and then the git read-tree docs, and I
confess that I don't _totally_ follow the latter. But, I wonder if maybe
--aggressive is perhaps not aggressive enough if the parts of the index that
store symbolic link data aren't taken into account, when it's changed between
two versions and the blobs involved aren't actually different.

I recognize that replacing a symlink with real files under the same name is a
weird thing to do, but this is a pared-down example of a thing that happens
occasionally at $work, where we regularly octopus-merge 15+ heads into
development branches. I would also be happy to know, if this isn't a bug, if
there's some other workaround! The only way I know to fix this case is to
merge the symlink change, then rebase all in-flight branches against it. That
is often extremely time-consuming and tedious, though, and requires a human to
be involved (all the other merging is done by CI.)

Thanks!
Michael


Please review the rest of the bug report below.
You can delete any lines you don't wish to share.

[System Info]
git version:
git version 2.34.1
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Darwin 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64 x86_64
compiler info: clang: 13.0.0 (clang-1300.0.29.3)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/zsh


[Enabled Hooks]


-- 
Michael McClimon
michael@mcclimon.org


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Potential bug with octopus merges and symlinks
  2021-12-02  3:04 Potential bug with octopus merges and symlinks Michael McClimon
@ 2021-12-04 17:34 ` Elijah Newren
  2021-12-06 16:53 ` Martin von Zweigbergk
  2021-12-06 18:13 ` Junio C Hamano
  2 siblings, 0 replies; 4+ messages in thread
From: Elijah Newren @ 2021-12-04 17:34 UTC (permalink / raw)
  To: Michael McClimon; +Cc: Git Mailing List

On Sat, Dec 4, 2021 at 5:47 AM Michael McClimon <michael@mcclimon.org> wrote:
>
> I am having a problem with octopus merges when symbolic links are involved.
> I'm not completely convinced this is a bug in git, but I'm also not convinced
> that it's _not_. (If it's not, my apologies; I often find it very hard to
> track down the answers to complicated git questions on the internet.)
>
> There is a minimal reproducer available at
> https://github.com/mmcclimon/git-merge-problem-demo. Fetch all the branches
> there. The main branch contains a directory (dir1) with a single file
> (file.txt), plus a symlink (dir2), which links to dir1. branch1 replaces this
> symlink with a copy of the files that were linked to. (This was accomplished
> with: rm dir2; cp -r dir1 dir2.) branch2 and branch3 do not touch this
> directory at all.
>
> Merging these three branches fails:
>
> $ git merge branch1 branch2 branch3
> Fast-forwarding to: branch1
> Trying simple merge with branch2
> Simple merge did not work, trying automatic merge.
> Trying simple merge with branch3
> error: Entry 'dir2/file.txt' not uptodate. Cannot merge.
> Merge with strategy octopus failed.
>
> The order here matters! Here is every permutation (1 here is the symlink
> change) to git merge; only the first two fail, all the others work.
>
> 1 2 3   FAIL
> 1 3 2   FAIL
> 2 1 3   PASS
> 2 3 1   PASS
> 3 1 2   PASS
> 3 2 1   PASS
> 1 2     PASS
> 2 1     PASS
> 2 3     PASS
> 3 2     PASS
> 1 3     PASS
> 3 1     PASS
>
> I tracked this down as best I could (I am not a C programmer), by adding
> `set -x` to my copy of git-octopus-merge.sh. I determined that the line is
> failing is toward the very bottom of the main loop there:
>     git read-tree -u -m --aggressive  $common $MRT $SHA1
>
> I had a read of the index-format.txt and then the git read-tree docs, and I
> confess that I don't _totally_ follow the latter. But, I wonder if maybe
> --aggressive is perhaps not aggressive enough if the parts of the index that
> store symbolic link data aren't taken into account, when it's changed between
> two versions and the blobs involved aren't actually different.
>
> I recognize that replacing a symlink with real files under the same name is a
> weird thing to do, but this is a pared-down example of a thing that happens
> occasionally at $work, where we regularly octopus-merge 15+ heads into
> development branches. I would also be happy to know, if this isn't a bug, if
> there's some other workaround! The only way I know to fix this case is to
> merge the symlink change, then rebase all in-flight branches against it. That
> is often extremely time-consuming and tedious, though, and requires a human to
> be involved (all the other merging is done by CI.)

Yeah, I'm not too surprised that there are cases like this that trip up
read-tree and octopus.  There might be a fix in unpack-trees.c, but
it'd take some digging to verify if that's true or it's just a design
limitation.

However, I can give you a workaround, based on my nebulous ideas for
making an octopus-based-on-ort strategy (oort?).  Basically, just do a
sequence of individual merges with ort (which since v2.34 is the default
non-octopus merge strategy), then take the resulting tree and recraft it
into an octopus merge.  So, something like this:

  git merge --no-ff --no-edit branch1
  git merge --no-ff --no-edit branch2
  git merge --no-ff --no-edit branch3

  NEW_TREE=$(git write-tree)

  NEW_COMMIT=$(git commit-tree $NEW_TREE \
                   -p HEAD~3 -p HEAD~2^2 -p HEAD~1^2 -p HEAD^2 \
                   -m "Merge branches 'branch1', 'branch2', and 'branch3'")

  git reset $NEW_COMMIT

Which at least works with the git-merge-problem-demo repository example
you provided.

Note that the reason for the flags above are:
  --no-ff: ensure merge commits are created, so our -p flags to
           commit-tree are correct
  --no-edit: we'll later replace all the commit messages anyway

Also, if you're using an older version of git, the recursive strategy
would be fine too for the steps above; the main advantage of ort would
be that the intermediate steps could be done in memory if this were
implemented as a real builtin merge strategy rather than just a sequence
of shell commands.  The in-memory advantage might be good enough reason
to take this nebulous idea of mine and create the 'oort' strategy at
some point in the future, but for now the sequence of shell commands
should give you what you need.


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Potential bug with octopus merges and symlinks
  2021-12-02  3:04 Potential bug with octopus merges and symlinks Michael McClimon
  2021-12-04 17:34 ` Elijah Newren
@ 2021-12-06 16:53 ` Martin von Zweigbergk
  2021-12-06 18:13 ` Junio C Hamano
  2 siblings, 0 replies; 4+ messages in thread
From: Martin von Zweigbergk @ 2021-12-06 16:53 UTC (permalink / raw)
  To: Michael McClimon; +Cc: git

On Sat, Dec 4, 2021 at 5:20 AM Michael McClimon <michael@mcclimon.org> wrote:
>
> There is a minimal reproducer available at
> https://github.com/mmcclimon/git-merge-problem-demo. Fetch all the branches
> there. The main branch contains a directory (dir1) with a single file
> (file.txt), plus a symlink (dir2), which links to dir1. branch1 replaces this
> symlink with a copy of the files that were linked to. (This was accomplished
> with: rm dir2; cp -r dir1 dir2.) branch2 and branch3 do not touch this
> directory at all.
>
> Merging these three branches fails:
>
> $ git merge branch1 branch2 branch3
> Fast-forwarding to: branch1
> Trying simple merge with branch2
> Simple merge did not work, trying automatic merge.
> Trying simple merge with branch3
> error: Entry 'dir2/file.txt' not uptodate. Cannot merge.
> Merge with strategy octopus failed.
>
> The order here matters! Here is every permutation (1 here is the symlink
> change) to git merge; only the first two fail, all the others work.
>
> 1 2 3   FAIL
> 1 3 2   FAIL
> 2 1 3   PASS
> 2 3 1   PASS
> 3 1 2   PASS
> 3 2 1   PASS
> 1 2     PASS
> 2 1     PASS
> 2 3     PASS
> 3 2     PASS
> 1 3     PASS
> 3 1     PASS

I'm not a Git contributor (since ~10 years ago) but I was curious and
tried your repro myself. Thanks for the very simple instructions.
Maybe it's useful to see what the error is when you use the octopus
strategy to merge only two commits:

```
$ git checkout origin/branch1
$ git merge -s octopus origin/branch2
Trying really trivial in-index merge...
error: Merge requires file-level merging
Nope.
Merge with strategy octopus failed.
```

That "Merge requires file-level merging" is surprising but maybe the
"really trivial" is the explanation :) My *guess* is the answer is
that the octopus strategy is really old and should be rewritten using
the new "ort" strategy (or maybe be integrated into it). I'm afraid I
can't think of a workaround for you (other than trying different
permutations until it works).

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Potential bug with octopus merges and symlinks
  2021-12-02  3:04 Potential bug with octopus merges and symlinks Michael McClimon
  2021-12-04 17:34 ` Elijah Newren
  2021-12-06 16:53 ` Martin von Zweigbergk
@ 2021-12-06 18:13 ` Junio C Hamano
  2 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2021-12-06 18:13 UTC (permalink / raw)
  To: Michael McClimon; +Cc: git

Michael McClimon <michael@mcclimon.org> writes:

> The order here matters! Here is every permutation (1 here is the symlink
> change) to git merge; only the first two fail, all the others work.

I am not an active Octopus contributor, but was the original
inventor.  And it is not surprising that the order matters, because
I designed it that way ;-)

In the original design, octopus was to be used only to create the
simplest forms of merges without any conflicts, because octopus
merges make it almost impossible to examine the result of the merge
with comparison between one of the parent and the result (you have
to remember that there was no "diff --cc" invented yet, let alone
"diff --remerge-diff", back then to help us).  Aborting the merge at
the first conflict when merging the remote heads one by one is a way
to make sure that the result is a simple "combine unrelated work
into one" merge.  This does make the order matter somewhat.  When
more than one remote heads touch the same path, the order in which
they are merged into base does make a difference.

To be more strict for the goal of limiting octopus to the simplest
forms of merges without any conflicts, we should have insisted that
the set of paths these remote heads want to modify should not
overlap and that would have made the order totally not to matter,
but "abort at the first conflict" was easier to implement and
practical.

And later, before the command got part of the git suite, the rule
was further loosened to allow a conflicting merge when merging the
last remote head and let the user hand-resolve.  This makes the
order matter even more.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-06 18:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-02  3:04 Potential bug with octopus merges and symlinks Michael McClimon
2021-12-04 17:34 ` Elijah Newren
2021-12-06 16:53 ` Martin von Zweigbergk
2021-12-06 18:13 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).