Opinions on changing add/add conflict resolution?

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Opinions on changing add/add conflict resolution?
@ 2018-03-12 18:32 Elijah Newren
  2018-03-12 18:47 ` Jonathan Nieder
  2018-03-12 22:19 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 22+ messages in thread
From: Elijah Newren @ 2018-03-12 18:32 UTC (permalink / raw)
  To: Git Mailing List

Hi everyone,

I'd like to change add/add conflict resolution.  Currently when such a
conflict occurs (say at ${path}), git unconditionally does a two-way
merge of the two files and sticks the result in the working tree at
${path}.

I would like to make it instead first check whether the two files are
similar; if they are, then do the two-way merge, but if they're not,
then instead write the two files out to separate paths (${path}~HEAD
and ${path}~$MERGE, while making sure that ${path} is removed from the
working copy).

Thoughts?

I have a patch series[1] with more details and other changes, but
wanted to especially get feedback on this issue even from folks that
didn't have enough time to read the patches or even the cover letter.

Thanks,
Elijah

[1] https://public-inbox.org/git/20180305171125.22331-1-newren@gmail.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 18:32 Opinions on changing add/add conflict resolution? Elijah Newren
@ 2018-03-12 18:47 ` Jonathan Nieder
  2018-03-12 21:26   ` Elijah Newren
  2018-03-12 22:19 ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2018-03-12 18:47 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

Hi,

Elijah Newren wrote:

> Hi everyone,
>
> I'd like to change add/add conflict resolution.  Currently when such a
> conflict occurs (say at ${path}), git unconditionally does a two-way
> merge of the two files and sticks the result in the working tree at
> ${path}.
>
> I would like to make it instead first check whether the two files are
> similar; if they are, then do the two-way merge, but if they're not,
> then instead write the two files out to separate paths (${path}~HEAD
> and ${path}~$MERGE, while making sure that ${path} is removed from the
> working copy).
>
> Thoughts?

My immediate reaction is that it seems inconsistent with the rest of
merge behavior.  Why would add/add behave this way but edit/edit not
behave this way?

Would this behavior be configurable or unconditional?  I suspect I
would want it turned off in my own use.

On the other hand, in the case of wild difference between the two
files, skipping the two-way merge and just writing one of the versions
to the worktree (like we do for binary files) sounds like something I
would like in my own use.

Can you add something more about the motivation to the commit message?
E.g. is this about performance, interaction with some tools, to
support some particular workflow, etc?

Thanks and hope that helps,
Jonathan

> I have a patch series[1] with more details and other changes, but
> wanted to especially get feedback on this issue even from folks that
> didn't have enough time to read the patches or even the cover letter.
>
> Thanks,
> Elijah
>
> [1] https://public-inbox.org/git/20180305171125.22331-1-newren@gmail.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 18:47 ` Jonathan Nieder
@ 2018-03-12 21:26   ` Elijah Newren
  2018-03-12 21:35     ` Jonathan Nieder
  2018-03-13  5:30     ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Elijah Newren @ 2018-03-12 21:26 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List

Hi,

Cool, thanks for taking a look!

On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> My immediate reaction is that it seems inconsistent with the rest of
> merge behavior.  Why would add/add behave this way but edit/edit not
> behave this way?

Fair enough.  I have two separate reasons for believing that edit/edit
IS different than add/add; further, the current behavior for add/add
is inconsistent with the rest of merge behavior on a different axis.
I think it's helpful to get the motivation for my changes before
trying to explain why those are different and before delving into the
other inconsistency, so I'll add the explanation of this claim at the
end of this email.

> Would this behavior be configurable or unconditional?  I suspect I
> would want it turned off in my own use.
>
> On the other hand, in the case of wild difference between the two
> files, skipping the two-way merge and just writing one of the versions
> to the worktree (like we do for binary files) sounds like something I
> would like in my own use.

I think you just said the exact opposite thing in these last two
paragraphs; that you wouldn't want my proposed behavior and that you'd
want it.  I suspect that may mean that I misunderstood something you
said here.  Could you clarify?

> Can you add something more about the motivation to the commit message?
> E.g. is this about performance, interaction with some tools, to
> support some particular workflow, etc?

To be honest, I'm a little unsure how without even more excessive and
repetitive wording across commits.  Let me attempt here, and maybe you
can suggest how to change my commit messages?

  * When files are wildly dissimilar -- as you mentioned -- it'd be
easier for users to resolve conflicts if we wrote files out to
separate paths instead of two-way merging them.
  * There is a weird inconsistency between handling of add/add,
rename/add, and rename/rename(2to1).  I want this inconsistency fixed.
  * There is a significant performance optimization possible for
rename detection in git merge if we remove the above inconsistency and
treat these conflict types the same.  (Actually, we only need them to
be the same in the special case where a renamed file is unmodified on
the un-renamed side of history, but I don't want to special case that
situation because it sounds like a recipe for inconsistent results).
  * If we insist on these conflict types being handled differently
because there really is some important distinction between them, then
I'm almost certain that build_fake_ancestor() and it's usage in
am/rebase is broken/wrong.  This is because the usage of
build_fake_ancestor(), as currently implemented, will cause
mis-detection of one conflict type as another.

> Thanks and hope that helps,
> Jonathan

As promised above, here are my reasons for believing that edit/edit IS
fundamentally different than add/add for the behavior considered here,
as well as my explanation of the weird inconsistency add/add currently
has with the rest of merge behavior:

==Reason 1==
edit/edit means that ${path} existed in the merge base as well as both
sides.  It is more likely than not that ${path} on each side is
similar to the merge base and thus similar (though less so) to each
other.  On the other hand, add/add means ${path} did NOT exist in the
merge base.  Thus, we cannot use the same reason to believe they are
similar.  The only reason we have to assume similarity is the
filename, which, while a useful indicator, gives us a weird
inconsistency within git for handling pathname collisions -- see
"Weird inconsistency" below.

==Reason 2==
In merges, git does rename detection, but not copy or break detection.
In particular, break detection is what would allow us to find out that
the edited files are not similar to the original.  add/add conflicts
automatically come to us "pre-broken" (is there a better term for
knowing that two files are not paired?), though it is possible that
they just happen to be similar and should be paired/joined.

==Follow on to reason 2==
Not doing break detection means we get e.g. rename/add-source cases
wrong, so there are valid arguments for saying we should add break
detection.  However, it would be computationally expensive and would
require a fair amount of work re-thinking all the cases in
merge-recursive to make sure we get rename-breaks right (there are
FIXMEs and comments and testcases I added years ago to help someone
along the path if they want to try).  Since rename/add-source just
isn't a conflict type that occurs much (if it all) in real world
repositories, the motivation to implement it is somewhat low.

==Weird inconsistency git currently exhibits==
There are three types of conflicts representing two (possibly
unrelated) files colliding at the same path: add/add, rename/add, and
rename/rename(2to1).  add/add does the two-way merge of the colliding
files, and the other two conflict types write the two colliding files
out to separate paths.  I think the motivation behind the current
behavior was that add/add was written assuming filename-match implies
similarity, and the other two were written assuming filename-match
didn't imply anything and the files should be assumed dissimilar.
That seems weird to me.  I think all three should behave the same
(especially when the renamed file was unmodified on the un-renamed
side of history, and likely whenever there is no content conflicts for
the renamed file).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 21:26   ` Elijah Newren
@ 2018-03-12 21:35     ` Jonathan Nieder
  2018-03-12 23:08       ` Hilco Wijbenga
  2018-03-13  0:38       ` Elijah Newren
  2018-03-13  5:30     ` Junio C Hamano
  1 sibling, 2 replies; 22+ messages in thread
From: Jonathan Nieder @ 2018-03-12 21:35 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

Hi again,

Elijah Newren wrote:
> On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> Would this behavior be configurable or unconditional?  I suspect I
>> would want it turned off in my own use.
>>
>> On the other hand, in the case of wild difference between the two
>> files, skipping the two-way merge and just writing one of the versions
>> to the worktree (like we do for binary files) sounds like something I
>> would like in my own use.
>
> I think you just said the exact opposite thing in these last two
> paragraphs; that you wouldn't want my proposed behavior and that you'd
> want it.  I suspect that may mean that I misunderstood something you
> said here.  Could you clarify?

Sorry for the lack of clarity.  My understanding was that the proposed
behavior was to write two files:

	${path}~HEAD
	${path}~MERGE

My proposal is instead to write one file:

	${path}

with the content that would have gone to ${path}~HEAD.  This is what
already happens when trying to merge binary files.

[...]
>> Can you add something more about the motivation to the commit message?
>> E.g. is this about performance, interaction with some tools, to
>> support some particular workflow, etc?
>
> To be honest, I'm a little unsure how without even more excessive and
> repetitive wording across commits.

Simplest way IMHO is to just put the rationale in patch 5/5. :)  In
other words, explain the rationale for the end-user facing change in the
same patch that changes the end-user facing behavior.

>                                     Let me attempt here, and maybe you
> can suggest how to change my commit messages?
>
>   * When files are wildly dissimilar -- as you mentioned -- it'd be
> easier for users to resolve conflicts if we wrote files out to
> separate paths instead of two-way merging them.

Today what we do (in both the wildly-dissimilar case and the
less-dissimilar case) is write one proposed resolution to the worktree
and put the competing versions in the index.  Tools like "git
mergetool" are then able to pull the competing versions out of the
index to allow showing them at the same time.

My bias is that I've used VCSes before that wrote multiple competing
files to the worktree and I have been happier with my experience
resolving conflicts in git.  E.g. at any step I can run a build to try
out the current proposed resolution, and there's less of a chance of
accidentally commiting a ~HEAD file.

[...]
> There are three types of conflicts representing two (possibly
> unrelated) files colliding at the same path: add/add, rename/add, and
> rename/rename(2to1).  add/add does the two-way merge of the colliding
> files, and the other two conflict types write the two colliding files
> out to separate paths.

Interesting.  I would be tempted to resolve this inconsistency the
other way: by doing a half-hearted two-way merge (e.g. by picking one
of the two versions of the colliding file) and marking the path as
conflicted in the index.  That way it's more similar to edit/edit,
too.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 21:35     ` Jonathan Nieder
@ 2018-03-12 23:08       ` Hilco Wijbenga
  2018-03-12 23:14         ` Jonathan Nieder
  2018-03-13  0:38       ` Elijah Newren
  1 sibling, 1 reply; 22+ messages in thread
From: Hilco Wijbenga @ 2018-03-12 23:08 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Elijah Newren, Jonathan Nieder,
	Ævar Arnfjörð Bjarmason

On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Interesting.  I would be tempted to resolve this inconsistency the
> other way: by doing a half-hearted two-way merge (e.g. by picking one
> of the two versions of the colliding file) and marking the path as
> conflicted in the index.  That way it's more similar to edit/edit,
> too.

If work is going to be done in this area, would it be possible to
include making auto-merging (in general) optional? Preferably,
configurable by file (or glob) but I'd already be happy with a global
setting to opt out.

I keep running into bugs caused by Git's automatic merging. (And I
don't see how this could be improved without teaching Git the
specifics of various file types.) It's especially hard when rebasing
large numbers of commits. The bug is introduced early on but I will
not notice anything is wrong until late in the rebase.

(Since I seem to keep asking for things that turn out to already have
been implemented ... if this is already possible please just point me
to the right setting and consider me a happy camper. :-) )

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 23:08       ` Hilco Wijbenga
@ 2018-03-12 23:14         ` Jonathan Nieder
  0 siblings, 0 replies; 22+ messages in thread
From: Jonathan Nieder @ 2018-03-12 23:14 UTC (permalink / raw)
  To: Hilco Wijbenga
  Cc: Git Mailing List, Elijah Newren,
	Ævar Arnfjörð Bjarmason

Hi,

Hilco Wijbenga wrote:
> On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> Interesting.  I would be tempted to resolve this inconsistency the
>> other way: by doing a half-hearted two-way merge (e.g. by picking one
>> of the two versions of the colliding file) and marking the path as
>> conflicted in the index.  That way it's more similar to edit/edit,
>> too.
>
> If work is going to be done in this area, would it be possible to
> include making auto-merging (in general) optional? Preferably,
> configurable by file (or glob) but I'd already be happy with a global
> setting to opt out.

Have you experimented with the 'merge' attribute (see "git help
attributes")?  E.g. you can put

 * -merge

in .gitattributes or .git/info/attributes.

If that helps, then a patch adding a pointer to the most helpful place
(maybe git-merge.txt?) would be very welcome.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 21:35     ` Jonathan Nieder
  2018-03-12 23:08       ` Hilco Wijbenga
@ 2018-03-13  0:38       ` Elijah Newren
  2018-03-13 17:22         ` Elijah Newren
  1 sibling, 1 reply; 22+ messages in thread
From: Elijah Newren @ 2018-03-13  0:38 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List

I'm worried that my attempt to extract add/add from the rest of the
discussion with rename/add and rename/rename resulted in a false sense
of simplification.  Trying to handle all the edge and corner cases and
remain consistent sometimes gets a little hairy.  Anyway...

On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Elijah Newren wrote:
>> On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>

> Sorry for the lack of clarity.  My understanding was that the proposed
> behavior was to write two files:
>
>         ${path}~HEAD
>         ${path}~MERGE

(Just to be clear: The proposed behavior was to do that only when the
colliding files were dissimilar, and otherwise two-way merging.)

> My proposal is instead to write one file:
>
>         ${path}
>
> with the content that would have gone to ${path}~HEAD.  This is what
> already happens when trying to merge binary files.

Thanks for clarifying.

I feel the analogy to merging binary files breaks down here in more
than one way:

1)

Merging binary files is more complex than you suggest here.  In
particular, when creating a virtual merge base, the resolution chosen
is not the version of the file from HEAD, but the version of the file
from the merge base.  Nasty problems would result otherwise for the
recursive case.

If we tried to match how merging binary files behaved, we run into the
problem that the colliding file conflict case has no common version of
the file from a merge base.  So the same strategy just doesn't apply.
The closest would be outright deleting both versions of the file for
the virtual merge base and punting to the outer merge to deal with it.
That might be okay, but seems really odd to me.  I feel like it'd
unnecessarily increase the number of conflicts users will see, though
maybe it wouldn't be horrible if this was only done when the files
were considered dissimilar anyway.

2)

merging binary files only has 3 versions of the file to store at a
single $path in the index, which fit naturally in stages 1-3;
rename/add and rename/rename(2to1) have 4 and 6, respectively.  Having
three versions of a file from a 3-way merge makes fairly intuitive
sense.  Have 4 or 6 versions of a file at a path from a 3-way merge is
inherently more complex.  I think that just using the version from
HEAD risks users resolving things incorrectly much more likely than in
the case of a binary merge.

> [...]
>>> Can you add something more about the motivation to the commit message?
>>> E.g. is this about performance, interaction with some tools, to
>>> support some particular workflow, etc?
>>
>> To be honest, I'm a little unsure how without even more excessive and
>> repetitive wording across commits.
>
> Simplest way IMHO is to just put the rationale in patch 5/5. :)  In
> other words, explain the rationale for the end-user facing change in the
> same patch that changes the end-user facing behavior.

Patches 3, 4, and 5 all change end-user facing behavior -- one patch
per conflict type to make the three types behave the same (the first
two patches add more testcases, and write a common function all three
types can use).  The rationale for the sets of changes is largely the
same, and is somewhat lengthy.  Should I repeat the rationale in full
in all three places?

>>                                     Let me attempt here, and maybe you
>> can suggest how to change my commit messages?
>>
>>   * When files are wildly dissimilar -- as you mentioned -- it'd be
>> easier for users to resolve conflicts if we wrote files out to
>> separate paths instead of two-way merging them.
>
> Today what we do (in both the wildly-dissimilar case and the
> less-dissimilar case) is write one proposed resolution to the worktree
> and put the competing versions in the index.  Tools like "git
> mergetool" are then able to pull the competing versions out of the
> index to allow showing them at the same time.
>
> My bias is that I've used VCSes before that wrote multiple competing
> files to the worktree and I have been happier with my experience
> resolving conflicts in git.  E.g. at any step I can run a build to try
> out the current proposed resolution, and there's less of a chance of
> accidentally commiting a ~HEAD file.
>
> [...]
>> There are three types of conflicts representing two (possibly
>> unrelated) files colliding at the same path: add/add, rename/add, and
>> rename/rename(2to1).  add/add does the two-way merge of the colliding
>> files, and the other two conflict types write the two colliding files
>> out to separate paths.
>
> Interesting.  I would be tempted to resolve this inconsistency the
> other way: by doing a half-hearted two-way merge (e.g. by picking one
> of the two versions of the colliding file) and marking the path as
> conflicted in the index.  That way it's more similar to edit/edit,
> too.

Your proposal is underspecified; there are more complicated cases
where your wording could have different meanings.

I also had a backup plan was to propose making rename/add and
rename/rename(2to1) behave more like add/add (instead of making the
behavior for all three a blend of their current behaviors), but the
problem is I have two backup proposals to choose between.  And it
sounds like you're adding a third, which follows your ~HEAD strategy
from above.  Let me see if I can explain why your proposal is
underspecified:

For rename/rename(2to1) we have up to 3 sets of conflicts.  Let's say
we renamed A->C in HEAD and B->C on the other side.  We first need to
do a three-way content merge on the first C (with the other two
versions of A), then we need to do a three-way content merge on the
second C (with the other two versions of B).  This gives us two
different C's, both with possible conflict markers, that want to be
stored at the same pathname.  Let me call these two versions of C, C1
and C2.  C1 and C2 may or may not be similar.  Resolving the fact that
both want to be stored at C is what gives us potentially a third set
of conflicts.

What's your resolution strategy here, for each combination of cases
(has conflict markers or not; is similar to the other C or not;
working on a virtual merge base or not)?

For reference there are two possible backup strategies I could propose here:

Backup Proposal #1: act stricly like add/add:
Do a two-way merge of C1 and C2.  If we end up with nested conflict
markers, oh well.  Good luck, user.

Backup Proposal #2: act like add/add, but only if C1 and C2 don't have
conflict markers.
If C1 or C2 have conflict markers, behave as currently -- record C1
and C2 to different paths.
If C1 and C2 do not have conflict markers, two-way merge them and
record in the worktree at C.

My question for your counter-proposal:
Do you record C1 or C2 as C?  Or do you record the version of C from
HEAD as C?  And what do you pick when creating a virtual merge base?

Problems I see regardless of the choice in your counter-proposal:
  * If you choose C from HEAD, you are ignoring 3 other versions of
"C" (as opposed to just one other version when merging a binary file);
will the user recognize that and know how to look at all four
versions?
  * If you choose C1 or C2, the user will see conflict markers; will
the user recognize that this is only a _subset_ of the conflicts, and
that there's a lot of content that was completely ignored?
  * There is no "merge base of C1 and C2" to use for the virtual merge
base.  What will you use?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13  0:38       ` Elijah Newren
@ 2018-03-13 17:22         ` Elijah Newren
  0 siblings, 0 replies; 22+ messages in thread
From: Elijah Newren @ 2018-03-13 17:22 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List

On Mon, Mar 12, 2018 at 5:38 PM, Elijah Newren <newren@gmail.com> wrote:
> I feel the analogy to merging binary files breaks down here in more
> than one way:

Actually, after mulling this over, I'm going to retract the "more
than" piece of this sentence.  I'm trying to figure out how to retract
more, but have only figured out one piece.  In particular...

> 1)
>
> Merging binary files is more complex than you suggest here.  In
> particular, when creating a virtual merge base, the resolution chosen
> is not the version of the file from HEAD, but the version of the file
> from the merge base.  Nasty problems would result otherwise for the
> recursive case.
>
> If we tried to match how merging binary files behaved, we run into the
> problem that the colliding file conflict case has no common version of
> the file from a merge base.  So the same strategy just doesn't apply.
> The closest would be outright deleting both versions of the file for
> the virtual merge base and punting to the outer merge to deal with it.
> That might be okay, but seems really odd to me.  I feel like it'd
> unnecessarily increase the number of conflicts users will see, though
> maybe it wouldn't be horrible if this was only done when the files
> were considered dissimilar anyway.

Thinking about this more, it really isn't that weird at all.  The code
is already set up to use null_oid as the "original" version when
creating a virtual merge base, going back to the content from a
recursive merge base is a strategy used in multiple places in
recursive cases, and null is precisely the version from the recursive
merge base.  If two added files differ wildly, I don't think using
null would increase the number of conflicts appreciably, if at all.
So, the analogy to merging binary files holds just fine from this
angle.

So, if we could figure out how to handle the higher numbers of paths
for e.g. the rename/rename cases, then perhaps this is a strategy that
could work.

>> Interesting.  I would be tempted to resolve this inconsistency the
>> other way: by doing a half-hearted two-way merge (e.g. by picking one
>> of the two versions of the colliding file) and marking the path as
>> conflicted in the index.  That way it's more similar to edit/edit,
>> too.
>
> Your proposal is underspecified; there are more complicated cases
> where your wording could have different meanings.
>
<snip>
> My question for your counter-proposal:
> Do you record C1 or C2 as C?  Or do you record the version of C from
> HEAD as C?  And what do you pick when creating a virtual merge base?
>
> Problems I see regardless of the choice in your counter-proposal:
>   * If you choose C from HEAD, you are ignoring 3 other versions of
> "C" (as opposed to just one other version when merging a binary file);
> will the user recognize that and know how to look at all four
> versions?
>   * If you choose C1 or C2, the user will see conflict markers; will
> the user recognize that this is only a _subset_ of the conflicts, and
> that there's a lot of content that was completely ignored?

>   * There is no "merge base of C1 and C2" to use for the virtual merge
> base.  What will you use?

For this last question, the answer is "null", and it works just fine.
I don't yet have good answers to the other questions, though.  If
someone else does, I'd love to hear it.

Oh, and by the way Jonathan, thanks very much for your feedback and
alternative ideas for consideration.  You gave me more angles to try
to think about this problem from.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 21:26   ` Elijah Newren
  2018-03-12 21:35     ` Jonathan Nieder
@ 2018-03-13  5:30     ` Junio C Hamano
  2018-03-13 18:21       ` Elijah Newren
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2018-03-13  5:30 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jonathan Nieder, Git Mailing List

While I do not think it is a bad idea to add an optional way to write the
contents of conflicted stages out to separate files, I do not think it is a
good idea to change what happens to add/add conflict by default.

Two integrators picking up a same patch that adds a file separately
and allowing them to diverge before they are merged should not be
all that surprising, just like two integrators picking up a same patch
that changes an existing path and letting them evolve independently,
so from that point of view, I do not think there is fundamental difference
between edit/edit vs add/add and rename/rename conflict. The latter
certainly would be much rarer, but that is because edit happens a lot
more often than add.

There certainly are cases where conflicts is easier to resolve when
the merged tip versions are unrelated or diverged too vastly, with or
without a common merge base. As I already agreed to, it would be
useful in such a case to have an option to write the conflicted stages
to separate files to let an external merge tool to examine them and
help you resolve them. But that is not limited to resolving vast
differece between contents involved in an add/add conflict, is it?
The same tool (which can be driven as a mergetool backend) would
be useful in reconciling the difference between contents involved in an
edit/edit conflict, no?

If anything, if rename/rename behaves differently by always writing
the result out to separate files, it is that codepath whose behaviour
should be fixed to match how add/add conflicts are dealt with.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13  5:30     ` Junio C Hamano
@ 2018-03-13 18:21       ` Elijah Newren
  2018-03-13 22:26         ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Elijah Newren @ 2018-03-13 18:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Git Mailing List

On Mon, Mar 12, 2018 at 10:30 PM, Junio C Hamano <gitster@pobox.com> wrote:
> While I do not think it is a bad idea to add an optional way to write the
> contents of conflicted stages out to separate files, I do not think it is a
> good idea to change what happens to add/add conflict by default.

<snip some other good arguments>

> If anything, if rename/rename behaves differently by always writing
> the result out to separate files, it is that codepath whose behaviour
> should be fixed to match how add/add conflicts are dealt with.

Cool, thanks for the context.  I'm happy to go down this path, but
there is one question I'd like your opinion on: what if the
intermediate content merges have conflicts themselves?  If that
question isn't clear, let me be more precise...

Let's say A is renamed to C in HEAD, and both sides modified.  We thus
need to do a 3-way content merge of C from HEAD with the two A's.
Let's call the result C1.  Further, let's say that B is renamed to C
in $MERGE_BRANCH, and both sides modified.  Again we need a 3-way
content merge for that other C; let's call the result C2.  If neither
C1 nor C2 have content conflicts, then making rename/rename(2to1) act
like add/add is easy: just do a two-way merge of C1 and C2.  But what
if C1 or C2 have conflicts?  Doing a two-way merge in that case could
result in nested conflict markers.  Is that okay, or would we want to
continue the old behavior of writing C1 and C2 out to separate files
for that special case only?

I'm inclined to special case content conflicts in the intermediate
merges and handle them by writing out to separate files, but I was the
one who suggested sometimes using separate files for add/add
conflicts; I'm curious if perhaps my strong aversion to nested
conflicts is also off base.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13 18:21       ` Elijah Newren
@ 2018-03-13 22:26         ` Junio C Hamano
  2018-03-13 22:42           ` Elijah Newren
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2018-03-13 22:26 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jonathan Nieder, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> Cool, thanks for the context.  I'm happy to go down this path, but
> there is one question I'd like your opinion on: what if the
> intermediate content merges have conflicts themselves?  If that
> question isn't clear, let me be more precise...

I think you answered this yourself after (re)discovering the virtual
ancestor merge in the recursive strategy, and no longer need my
input here ;-)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13 22:26         ` Junio C Hamano
@ 2018-03-13 22:42           ` Elijah Newren
  2018-03-13 22:52             ` Junio C Hamano
  2018-03-13 22:56             ` Jonathan Nieder
  0 siblings, 2 replies; 22+ messages in thread
From: Elijah Newren @ 2018-03-13 22:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Git Mailing List

On Tue, Mar 13, 2018 at 3:26 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Elijah Newren <newren@gmail.com> writes:
>
>> Cool, thanks for the context.  I'm happy to go down this path, but
>> there is one question I'd like your opinion on: what if the
>> intermediate content merges have conflicts themselves?  If that
>> question isn't clear, let me be more precise...
>
> I think you answered this yourself after (re)discovering the virtual
> ancestor merge in the recursive strategy, and no longer need my
> input here ;-)

The question about what to put into the index was another issue, and
it's good to hear that you seem to approve of my logic on that one.
Thanks.  :-)

However, my question here about what to write to the working tree for
a rename/rename(2to1) conflict in one particular corner case still
remains.  Should a two-way merge be performed even if it may result in
nested sets of conflict markers, or is that a sufficiently bad outcome
for the user that it's the one case we do want to write colliding
files out to different temporary paths?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13 22:42           ` Elijah Newren
@ 2018-03-13 22:52             ` Junio C Hamano
  2018-03-13 23:04               ` Elijah Newren
  2018-03-13 22:56             ` Jonathan Nieder
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2018-03-13 22:52 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jonathan Nieder, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> However, my question here about what to write to the working tree for
> a rename/rename(2to1) conflict in one particular corner case still
> remains.

Hmph, is it a bad idea to model this after what recursive merge
strategy does?  I think what is written out from that codepath to
the working tree has the nested conflict markers (with a bit of
tweak to the marker length, IIRC) in it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13 22:52             ` Junio C Hamano
@ 2018-03-13 23:04               ` Elijah Newren
  0 siblings, 0 replies; 22+ messages in thread
From: Elijah Newren @ 2018-03-13 23:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Git Mailing List

On Tue, Mar 13, 2018 at 3:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Elijah Newren <newren@gmail.com> writes:
>
>> However, my question here about what to write to the working tree for
>> a rename/rename(2to1) conflict in one particular corner case still
>> remains.
>
> Hmph, is it a bad idea to model this after what recursive merge
> strategy does?  I think what is written out from that codepath to
> the working tree has the nested conflict markers (with a bit of
> tweak to the marker length, IIRC) in it.

Oh, that's cool; I didn't know that.  It looks like that was
introduced in commit d694a17986 ("ll-merge: use a longer conflict
marker for internal merge", 2016-04-14).  That seems like a good idea;
I'll go with that.  Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13 22:42           ` Elijah Newren
  2018-03-13 22:52             ` Junio C Hamano
@ 2018-03-13 22:56             ` Jonathan Nieder
  2018-03-13 23:14               ` Elijah Newren
  1 sibling, 1 reply; 22+ messages in thread
From: Jonathan Nieder @ 2018-03-13 22:56 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Junio C Hamano, Git Mailing List

Hi,

Elijah Newren wrote:

> However, my question here about what to write to the working tree for
> a rename/rename(2to1) conflict in one particular corner case still
> remains.  Should a two-way merge be performed even if it may result in
> nested sets of conflict markers, or is that a sufficiently bad outcome
> for the user that it's the one case we do want to write colliding
> files out to different temporary paths?

Nested conflict markers only happen in the conflictstyle=diff3 case, I
would think.

merge-recursive writes them already.  I've often wished that it would
use a union merge strategy when building the common ancestor to avoid
the nested conflicts that rerere doesn't understand.  But anyway,
that's an orthogonal issue: in the rename/rename context, it should be
fine to write nested conflict markers since that's consistent with
what merge-recursive already does.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13 22:56             ` Jonathan Nieder
@ 2018-03-13 23:14               ` Elijah Newren
  2018-03-13 23:30                 ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Elijah Newren @ 2018-03-13 23:14 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, Git Mailing List

On Tue, Mar 13, 2018 at 3:56 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Elijah Newren wrote:
>
>> However, my question here about what to write to the working tree for
>> a rename/rename(2to1) conflict in one particular corner case still
>> remains.  Should a two-way merge be performed even if it may result in
>> nested sets of conflict markers, or is that a sufficiently bad outcome
>> for the user that it's the one case we do want to write colliding
>> files out to different temporary paths?
>
> Nested conflict markers only happen in the conflictstyle=diff3 case, I
> would think.

Currently, yes.  To be clear, though, this change would make it
possible even when there is no recursive merge being done and when
conflictstyle=merge.

> merge-recursive writes them already.  I've often wished that it would
> use a union merge strategy when building the common ancestor to avoid
> the nested conflicts that rerere doesn't understand.  But anyway,
> that's an orthogonal issue: in the rename/rename context, it should be
> fine to write nested conflict markers since that's consistent with
> what merge-recursive already does.

Cool, sounds like we're now all on the same page.

Someone in the future might hate us if they use conflictstyle=diff3,
and have a recursive merge, and have a rename/rename(2to1) conflict in
the virtual merge base with nested conflicts, and that resulting file
is also involved in a separate rename/rename(2to1) conflict in the
outer merge that has its own nested conflicts; we'd have up to four
levels of nested conflicts.  But for now, I'm going to write that off
as a crazy thought (or dismiss it as nigh impossible to reason about
regardless of what we did to help them) and just proceed ahead.  :-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13 23:14               ` Elijah Newren
@ 2018-03-13 23:30                 ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2018-03-13 23:30 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jonathan Nieder, Git Mailing List

On Tue, Mar 13, 2018 at 4:14 PM, Elijah Newren <newren@gmail.com> wrote:
>
> Someone in the future might hate us if they use conflictstyle=diff3,
> and have a recursive merge,

FWIW I already always use diff3 style and see the nested markers,
and I already hate it, so you are no worse off ;-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-12 18:32 Opinions on changing add/add conflict resolution? Elijah Newren
  2018-03-12 18:47 ` Jonathan Nieder
@ 2018-03-12 22:19 ` Ævar Arnfjörð Bjarmason
       [not found]   ` <CABPp-BHDOimDoLxWxS=BDOBkm6CUTrXTzD16=TSkWGN-HOiU2g@mail.gmail.com>
  1 sibling, 1 reply; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-12 22:19 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List


On Mon, Mar 12 2018, Elijah Newren jotted:

> Hi everyone,
>
> I'd like to change add/add conflict resolution.  Currently when such a
> conflict occurs (say at ${path}), git unconditionally does a two-way
> merge of the two files and sticks the result in the working tree at
> ${path}.
>
> I would like to make it instead first check whether the two files are
> similar; if they are, then do the two-way merge, but if they're not,
> then instead write the two files out to separate paths (${path}~HEAD
> and ${path}~$MERGE, while making sure that ${path} is removed from the
> working copy).
>
> Thoughts?
>
> I have a patch series[1] with more details and other changes, but
> wanted to especially get feedback on this issue even from folks that
> didn't have enough time to read the patches or even the cover letter.

Does this mean that e.g. in this case of merging two files, one
containing "foo" and one containing "bar":

    (
        rm -rf /tmp/test.git &&
        git init /tmp/test.git &&
        cd /tmp/test.git &&
        echo foo >README &&
        git add README &&
        git commit -mfoo &&
        git checkout --orphan trunk &&
        git reset --hard &&
        echo bar >README &&
        git add README &&
        git commit -mbar &&
        git merge --allow-unrelated-histories master;
        cat README
    )

That instead of getting:

    <<<<<<< HEAD
    bar
    =======
    foo
    >>>>>>> master

I'd now get these split into different files?

I'm assuming by similarity you're talking about the same heuristic we
apply for git diff -M, i.e. if "moving" a file would consider it
removed/added instead of moved you'd want two files instead of the
two-way merge.

I don't mind this being a configurable option if you want it, but I
don't think it should be on by default, reasons:

 1) There's lots of cases where we totally screw up the "is this
    similar?" check, in particular with small files.

    E.g. let's say you have a config file like 'fs-path "/tmp/git"' and
    in two branches you change that to 'fs-path "/opt/git"' and 'fs-path
    "/var/git"'. The rename detection will think this these have nothing
    to do with each other since they share no common lines, but to a
    human reader they're really similar, and would make sense in the
    context of resolving a bigger merge where /{opt,var}/git changes are
    conflicting.

    This is not some theoretical concern, there's lots of things that
    e.g. use small 5-10 line config files to configure some app that
    because of some combo of indentation changes and changing a couple
    of lines will make git's rename detection totally give up, but to a
    human reader they're 95% the same.

 2) This will play havoc with already established merge tools on top of
    git which a lot of users use instead of manually resolving these in
    vi or whatever.

    If we made this the default they'd need to to deal with this new
    state, and even if it's not the default we'll have some confused
    users wondering why Emacs Ediff or whatever isn't showing the right
    thing because it isn't supporting this yet.

So actually, given that last point in #2 I'm slightly negative on the
whole thing, but maybe splitting it into some new format existing tools
don't understand is compelling enough to justify the downstream breakage.

I don't think we've ever documented the format we leave the tree in
after a failed merge as equivalent to plumbing, but for the purposes of
tools that build on top of git it really is.

^ permalink raw reply	[flat|nested] 22+ messages in thread

[parent not found: <CABPp-BHDOimDoLxWxS=BDOBkm6CUTrXTzD16=TSkWGN-HOiU2g@mail.gmail.com>]

* Fwd: Opinions on changing add/add conflict resolution?
       [not found]   ` <CABPp-BHDOimDoLxWxS=BDOBkm6CUTrXTzD16=TSkWGN-HOiU2g@mail.gmail.com>
@ 2018-03-13  2:53     ` Elijah Newren
  2018-03-13 22:12       ` Junio C Hamano
  2018-03-13  9:59     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 22+ messages in thread
From: Elijah Newren @ 2018-03-13  2:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

[Re-sending because this computer happened to have plain-text mode
turned off for some weird reason, and thus the email bounced]

Hi,

On Mon, Mar 12, 2018 at 3:19 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Does this mean that e.g. in this case of merging two files, one
> containing "foo" and one containing "bar":
>
>     (
>         rm -rf /tmp/test.git &&
>         git init /tmp/test.git &&
>         cd /tmp/test.git &&
>         echo foo >README &&
>         git add README &&
>         git commit -mfoo &&
>         git checkout --orphan trunk &&
>         git reset --hard &&
>         echo bar >README &&
>         git add README &&
>         git commit -mbar &&
>         git merge --allow-unrelated-histories master;
>         cat README
>     )
>
> That instead of getting:
>
>     <<<<<<< HEAD
>     bar
>     =======
>     foo
>     >>>>>>> master
>
> I'd now get these split into different files?

As currently implemented, yes.  However, I was more concerned the idea
of handling files differently based on whether or not they were
similar, rather than on what the precise definition of "similar" is
for this context.

As far as the definition of similarity goes, estimate_similarity() is
currently used by rename detection to compare files recorded at
different pathnames.  By contrast, in this context, we are comparing
two files which were recorded with the same pathname.  That suggests
the heuristic could be a bit different and use more than just
estimate_similarity().  (e.g. "We consider these files similar IF more
than 50% of the lines match OR both files are less than 2K.")

> I don't mind this being a configurable option if you want it, but I
> don't think it should be on by default, reasons:

I don't think a configurable option makes sense, at least not for my
purposes.  Having rename/rename conflicts be "safely" mis-detected as
rename/add or add/add, and having rename/add conflicts be "safely"
mis-detected as add/add is my overriding concern.  Thus, making these
three conflict types behave consistently is what I need.  Options
would make that more difficult for me, and would thus feel like a step
backwards.

git am/rebase has been doing such mis-detections for years (almost
since the "dawn" of git time), but it feels really broken to me
because the conflict types aren't handled consistently.  (The facts
that (a) I'm the only one that has added rename/add testcases to
git.git, (b) that I've added all but one of the rename/rename(2to1)
testcases to the git.git testsuite, and (c) that rename/add has had
multiple bugs for many years, all combine to suggest to me that folks
just don't hit those conflict types in practice and thus that they
just aren't noticing this breakage -- yet.)

I also want to allow such mis-detections for cherry-picks and merges
because of the significant (nearly order-of-magnitude in some cases)
performance improvements I can get in rename detection if it's
allowed.

>  1) There's lots of cases where we totally screw up the "is this
>     similar?" check, in particular with small files.
>
>     E.g. let's say you have a config file like 'fs-path "/tmp/git"' and
>     in two branches you change that to 'fs-path "/opt/git"' and 'fs-path
>     "/var/git"'. The rename detection will think this these have nothing
>     to do with each other since they share no common lines, but to a
>     human reader they're really similar, and would make sense in the
>     context of resolving a bigger merge where /{opt,var}/git changes are
>     conflicting.
>
>     This is not some theoretical concern, there's lots of things that
>     e.g. use small 5-10 line config files to configure some app that
>     because of some combo of indentation changes and changing a couple
>     of lines will make git's rename detection totally give up, but to a
>     human reader they're 95% the same.

Fair enough.  The small files case could potentially be handled by
just changing the similarity metric for these conflict types, as noted
above.  If it's a small file, that might be the easiest way for a user
to deal with it too.

I'm not sure I see the problem with the bigger files, though.  If you
have bigger files with less than 50% of the lines matching, then
you'll essentially end up with a great big conflict block with one
file on one side and the other file on the other side, which doesn't
seem that different to me than having them be in two separate files.
In fact, separate files seems easier to deal with because then the
user can run e.g. 'git diff --no-index --color-words FILE1 FILE2',
something that they can't do when it's in one file.  That has bothered
me more than once, and made me wish they were just in separate files.

>  2) This will play havoc with already established merge tools on top of
>     git which a lot of users use instead of manually resolving these in
>     vi or whatever.
>
>     If we made this the default they'd need to to deal with this new
>     state, and even if it's not the default we'll have some confused
>     users wondering why Emacs Ediff or whatever isn't showing the right
>     thing because it isn't supporting this yet.
>
> So actually, given that last point in #2 I'm slightly negative on the
> whole thing, but maybe splitting it into some new format existing tools
> don't understand is compelling enough to justify the downstream breakage.

To me, this is a bigger concern.  We have changed conflict resolutions
in various ways at various times over the years, so I don't think the
output should be considered fixed in stone, but I am very sympathetic
to arguments that this particular change is too painful.

There is a possible way to allow for a transition period, though.
Junio has already requested that some of the changes I'm working on be
done as a new merge strategy that is a rewrite of merge-recursive
(https://public-inbox.org/git/xmqqk1ydkbx0.fsf@gitster.mtv.corp.google.com/).
It would be a little unfortunate to not be able to use the new merge
strategy as an exact drop-in replacement of the current recursive
merge strategy, but it would provide a way for people to get some time
to migrate other tools.

However, I'm far more concerned with the three collision conflict
types having consistent behavior than I am with changing add/add
conflict handling.  And if your two concerns or Jonathan's concern
would prevent you from wanting to use the new merge strategy (and
possibly prevent it from becoming the default in the future), I'd much
rather just modify rename/add and rename/rename to behave like
add/add.  Would that be more to your liking?

Elijah

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Fwd: Opinions on changing add/add conflict resolution?
  2018-03-13  2:53     ` Fwd: " Elijah Newren
@ 2018-03-13 22:12       ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2018-03-13 22:12 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> As currently implemented, yes.  However, I was more concerned the idea
> of handling files differently based on whether or not they were
> similar, rather than on what the precise definition of "similar" is
> for this context.
>
> As far as the definition of similarity goes, estimate_similarity() is
> currently used by rename detection to compare files recorded at
> different pathnames.  By contrast, in this context, we are comparing
> two files which were recorded with the same pathname.  That suggests
> the heuristic could be a bit different and use more than just
> estimate_similarity().  (e.g. "We consider these files similar IF more
> than 50% of the lines match OR both files are less than 2K.")

Yeah, I think there is similar difference between similarity score
that is used by diffcore-rename and dissimilarity score that is used
by diffcore-break exactly for that reason.  If you start from a
100-line original file and grow it to 100-line one by adding 900
lines, as long as you kept the original 100 lines, it is easier to
view the change within the same path as a continued development,
instead of saying they are so dissimilar.

In any case, I think the way the stage #2 and stage #3 (i.e. ours
and theirs) contents are externalized during a conflicted mergy
operation should be consistent across edit/edit and add/add
conflict, so if we are adding a new way to write out extra temporary
files out of these higher stage index entries, it should be made
applicable not only to add/add conflict (i.e. there shounld't be a
code that says "oh, let's see, this lacks stage #1, so do this
different thing").

Personally, I think it is best to leave it all outside of the core
and make "git mergetool" to be responsible for the job of
externalizing higher stage index entries to temporary working tree
files.  They already need to do so in order to work with external
tools that do not read directly from our index file anyway, no?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
       [not found]   ` <CABPp-BHDOimDoLxWxS=BDOBkm6CUTrXTzD16=TSkWGN-HOiU2g@mail.gmail.com>
  2018-03-13  2:53     ` Fwd: " Elijah Newren
@ 2018-03-13  9:59     ` Ævar Arnfjörð Bjarmason
  2018-03-13 17:09       ` Elijah Newren
  1 sibling, 1 reply; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-13  9:59 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List


On Tue, Mar 13 2018, Elijah Newren jotted:

> Hi,
>
> On Mon, Mar 12, 2018 at 3:19 PM, Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> wrote:
>
>>
>> Does this mean that e.g. in this case of merging two files, one
>> containing "foo" and one containing "bar":
>>
>>     (
>>         rm -rf /tmp/test.git &&
>>         git init /tmp/test.git &&
>>         cd /tmp/test.git &&
>>         echo foo >README &&
>>         git add README &&
>>         git commit -mfoo &&
>>         git checkout --orphan trunk &&
>>         git reset --hard &&
>>         echo bar >README &&
>>         git add README &&
>>         git commit -mbar &&
>>         git merge --allow-unrelated-histories master;
>>         cat README
>>     )
>>
>> That instead of getting:
>>
>>     <<<<<<< HEAD
>>     bar
>>     =======
>>     foo
>>     >>>>>>> master
>>
>> I'd now get these split into different files?
>>
>
> As currently implemented, yes.  However, I was more concerned the idea of
> handling files differently based on whether or not they were similar,
> rather than on what the precise definition of "similar" is for this context.
>
> As far as the definition of similarity goes, estimate_similarity() is
> currently used by rename detection to compare files recorded at different
> pathnames.  By contrast, in this context, we are comparing two files which
> were recorded with the same pathname.  That suggests the heuristic could be
> a bit different and use more than just estimate_similarity().  (e.g. "We
> consider these files similar IF >50% of the lines match OR both files are
> less than 2K.")

Yeah that's fair enough, we could definitely tweak a dedicated heuristic
for just this purpose to do the right thing most of the time, although
really having completely unrelated 10-line README files use the old
two-way merge would, I guess, also be confusing to users who want this
feature.

>> I don't mind this being a configurable option if you want it, but I
>> don't think it should be on by default, reasons:
>>
>
> I don't think a configurable option makes sense, at least not for my
> purposes.  Having rename/rename conflicts be "safely" mis-detected as
> rename/add or add/add, and having rename/add conflicts be "safely"
> mis-detected as add/add is my overriding concern.  Thus, making these three
> conflict types behave consistently is what I need.  Options would make that
> more difficult for me, and would thus feel like a step backwards.

I think there's three types of users who interact with the current
format:

 1) The user using no special tool (e.g. editing with nano) and runs
    into a conflict, maybe this helps these users the most.

 2) The user who's also using no special tool (e.g. editing with nano)
    but just prefers to get in-file conflict markers.

 3) The user using a tool to solve the conflict, who doesn't care if
    we'd spew out our current format, this new formar, or some XML file
    with both versions or whatevery, they're using some other UI.

I don't think we should make some change that breaks existing tools for
users of #3 by default.

>  1) There's lots of cases where we totally screw up the "is this
>>     similar?" check, in particular with small files.
>>
>>     E.g. let's say you have a config file like 'fs-path "/tmp/git"' and
>>     in two branches you change that to 'fs-path "/opt/git"' and 'fs-path
>>     "/var/git"'. The rename detection will think this these have nothing
>>     to do with each other since they share no common lines, but to a
>>     human reader they're really similar, and would make sense in the
>>     context of resolving a bigger merge where /{opt,var}/git changes are
>>     conflicting.
>>
>>     This is not some theoretical concern, there's lots of things that
>>     e.g. use small 5-10 line config files to configure some app that
>>     because of some combo of indentation changes and changing a couple
>>     of lines will make git's rename detection totally give up, but to a
>>     human reader they're 95% the same.
>>
>
> Fair enough.  The small files case could potentially be handled by just
> changing the similarity metric for these conflict types, as noted above.
> If it's a small file, that might be the easiest way for a user to deal with
> it too.

Or it might be easier for the user to deal with even big conflicts with
in-file conflict markers, see use-case #2 & #3 above.

> I'm not sure I see the problem with the bigger files, though.  If you have
> bigger files with less than 50% of the lines matching, then you'll
> essentially end up with a great big conflict block with one file on one
> side and the other file on the other side, which doesn't seem that
> different to me than having them be in two separate files.  In fact,
> separate files seems easier to deal with because then the user can run e.g.
> 'git diff --no-index --color-words FILE1 FILE2', something that they can't
> do when it's in one file.  That has bothered me more than once, and made me
> wish they were just in separate files.

Right, but this assumes they user isn't using a merge-tool as noted in
#3.

>>  2) This will play havoc with already established merge tools on top of
>>     git which a lot of users use instead of manually resolving these in
>>     vi or whatever.
>>
>>     If we made this the default they'd need to to deal with this new
>>     state, and even if it's not the default we'll have some confused
>>     users wondering why Emacs Ediff or whatever isn't showing the right
>>     thing because it isn't supporting this yet.
>>
>> So actually, given that last point in #2 I'm slightly negative on the
>> whole thing, but maybe splitting it into some new format existing tools
>> don't understand is compelling enough to justify the downstream breakage.
>>
>
> To me, this is a bigger concern.  We have changed conflict resolutions in
> various ways at various times over the years, so I don't think the output
> should be considered fixed in stone, but I am very sympathetic to arguments
> that this particular change is too painful.

I think we've changed around a lot of the details, and certainly how the
conflict is resolved. By "format" I mean that we've been emitting a file
with conflict markers, which third-party tools know how to parse
(e.g. ediff). That's a relatively stable format that we (with minor
changes) also share with cvs, svn, or plain patch(1) etc.

> [...]
> However, I'm far more concerned with the three collision conflict types
> having consistent behavior than I am with changing add/add conflict
> handling.  And if your two concerns or Jonathan's concern would prevent you
> from wanting to use the new merge strategy (and possibly prevent it from
> becoming the default in the future), I'd much rather just modify rename/add
> and rename/rename to behave like add/add.  Would that be more to your
> liking?

I don't really object to these changes, I don't know enough about this
area, I skimmed your patches and 90% of it is over my head (although I
could keep digging...).

I'm just chiming in because it seems to me from upthread that you're
purely focusing on the use-case of the user of git who's using git at
the abstraction of using a dumb editor that doesn't do anything to help
them with conflict markers to resolve conflicts.

I think that in those cases if you have two unrelated 3k line files that
collide splitting them out is certainly more helpful.

But there's also the use-case of existing tools (e.g. ediff) essentially
using this as a plumbing format, and we have to keep those tools in
mind, because to them it really doesn't matter what the format is, but
it *does* matter that they can understand it, but if we took these
patches as-is e.g. ediff wouldn't even show these conflicting files,
correct?

Also another caveat: Since these are new side-files what happens when
you're in a conflict and run `git clean -dxf`, are these new files wiped
away, or are they somehow known to the index?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Opinions on changing add/add conflict resolution?
  2018-03-13  9:59     ` Ævar Arnfjörð Bjarmason
@ 2018-03-13 17:09       ` Elijah Newren
  0 siblings, 0 replies; 22+ messages in thread
From: Elijah Newren @ 2018-03-13 17:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Tue, Mar 13, 2018 at 2:59 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Tue, Mar 13 2018, Elijah Newren jotted:

>> However, I'm far more concerned with the three collision conflict types
>> having consistent behavior than I am with changing add/add conflict
>> handling.  And if your two concerns or Jonathan's concern would prevent you
>> from wanting to use the new merge strategy (and possibly prevent it from
>> becoming the default in the future), I'd much rather just modify rename/add
>> and rename/rename to behave like add/add.  Would that be more to your
>> liking?
>
> I don't really object to these changes, I don't know enough about this
> area, I skimmed your patches and 90% of it is over my head (although I
> could keep digging...).
>
> I'm just chiming in because it seems to me from upthread that you're
> purely focusing on the use-case of the user of git who's using git at
> the abstraction of using a dumb editor that doesn't do anything to help
> them with conflict markers to resolve conflicts.

Yeah, that's totally why I started this separate thread; I was worried
the original would push away folks who weren't familiar enough with
merge-recursive or were just overwhelmed by all the different changes
and rationale, but I really wanted to get feedback on at least the
piece that people were likely to hit in practice and would understand.
Thanks for doing so; your, and Jonathan's, and Junio's comments have
provided some good context.

> Also another caveat: Since these are new side-files what happens when
> you're in a conflict and run `git clean -dxf`, are these new files wiped
> away, or are they somehow known to the index?

A git clean would wipe them out...and if that scares you (and I can
totally see how it might), then rest assured that the situation is a
whole lot worse: this isn't limited to rename/add and
rename/rename(2to1) conflicts.  There are several paths in
merge-recursive.c that call unique_path() to get a temporary filename
that is in no way tracked in the index but which is used for storing
content relevant to the merge.  These include directory/file
conflicts, untracked files being in the way of putting a renamed file
where it belongs, untracked files being in the way of a modify/delete,
untracked files being in the way of simple adds on the other side of
history, rename/rename(1to2)/add/add, and maybe others.  In all cases,
a git clean is going to wipe out the files that were written to
different temporary locations.

My rewrite I'm trying to find time to work on would get rid of the
code paths involving untracked files being in the way of other stuff,
but would do nothing for the other cases.  I would love to get rid of
all of them, but don't have a clue how to do so.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-03-13 23:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-12 18:32 Opinions on changing add/add conflict resolution? Elijah Newren
2018-03-12 18:47 ` Jonathan Nieder
2018-03-12 21:26   ` Elijah Newren
2018-03-12 21:35     ` Jonathan Nieder
2018-03-12 23:08       ` Hilco Wijbenga
2018-03-12 23:14         ` Jonathan Nieder
2018-03-13  0:38       ` Elijah Newren
2018-03-13 17:22         ` Elijah Newren
2018-03-13  5:30     ` Junio C Hamano
2018-03-13 18:21       ` Elijah Newren
2018-03-13 22:26         ` Junio C Hamano
2018-03-13 22:42           ` Elijah Newren
2018-03-13 22:52             ` Junio C Hamano
2018-03-13 23:04               ` Elijah Newren
2018-03-13 22:56             ` Jonathan Nieder
2018-03-13 23:14               ` Elijah Newren
2018-03-13 23:30                 ` Junio C Hamano
2018-03-12 22:19 ` Ævar Arnfjörð Bjarmason
     [not found]   ` <CABPp-BHDOimDoLxWxS=BDOBkm6CUTrXTzD16=TSkWGN-HOiU2g@mail.gmail.com>
2018-03-13  2:53     ` Fwd: " Elijah Newren
2018-03-13 22:12       ` Junio C Hamano
2018-03-13  9:59     ` Ævar Arnfjörð Bjarmason
2018-03-13 17:09       ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).