git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git merge-tree: bug report and some feature requests
@ 2018-01-21  3:00 Josh Bleecher Snyder
  2018-01-21  8:14 ` Ævar Arnfjörð Bjarmason
  2018-01-22 17:58 ` Elijah Newren
  0 siblings, 2 replies; 7+ messages in thread
From: Josh Bleecher Snyder @ 2018-01-21  3:00 UTC (permalink / raw)
  To: git

Hi, all.

I'm experimenting with some new porcelain for interactive rebase. One
goal is to leave the work tree untouched for most operations. It looks
to me like 'git merge-tree' may be the right plumbing command for
doing the merge part of the pick work of the todo list, one commit at
a time. If I'm wrong about this, I'd love pointers; what follows may
still be interesting anyway.

I've encountered some bumps with 'git merge-tree'. A bug report and
some feature requests follow. Apologies for the long email.


1. Bug

When a binary file containing NUL is added on only one side, the
resulting patch is malformed. Reproduction script:

mkdir test
cd test
git init .
touch shared
git add shared
git commit -m "base"
git checkout -b left
echo "left" > x
printf '\1\0\1\n' > binary
git add x binary
git commit -m "left"
git checkout master
git checkout -b right
echo "right" > x
git add x
git commit -m "right"
git merge-tree master right left
git merge-tree master right left | xxd
cd ..
rm -rf test

The merge-tree results I get with 2.15.1 are:

added in remote
  their  100644 ddc50ce55647db1421b18aa33417442e29f63d2f binary
@@ -0,0 +1 @@
+added in both
  our    100644 c376d892e8b105bd712d06ec5162b5f31ce949c3 x
  their  100644 45cf141ba67d59203f02a54f03162f3fcef57830 x
@@ -1 +1,5 @@
+<<<<<<< .our
 right
+=======
+left
+>>>>>>> .their

00000000: 6164 6465 6420 696e 2072 656d 6f74 650a  added in remote.
00000010: 2020 7468 6569 7220 2031 3030 3634 3420    their  100644
00000020: 6464 6335 3063 6535 3536 3437 6462 3134  ddc50ce55647db14
00000030: 3231 6231 3861 6133 3334 3137 3434 3265  21b18aa33417442e
00000040: 3239 6636 3364 3266 2062 696e 6172 790a  29f63d2f binary.
00000050: 4040 202d 302c 3020 2b31 2040 400a 2b01  @@ -0,0 +1 @@.+.
00000060: 6164 6465 6420 696e 2062 6f74 680a 2020  added in both.
00000070: 6f75 7220 2020 2031 3030 3634 3420 6333  our    100644 c3
00000080: 3736 6438 3932 6538 6231 3035 6264 3731  76d892e8b105bd71
00000090: 3264 3036 6563 3531 3632 6235 6633 3163  2d06ec5162b5f31c
000000a0: 6539 3439 6333 2078 0a20 2074 6865 6972  e949c3 x.  their
000000b0: 2020 3130 3036 3434 2034 3563 6631 3431    100644 45cf141
000000c0: 6261 3637 6435 3932 3033 6630 3261 3534  ba67d59203f02a54
000000d0: 6630 3331 3632 6633 6663 6566 3537 3833  f03162f3fcef5783
000000e0: 3020 780a 4040 202d 3120 2b31 2c35 2040  0 x.@@ -1 +1,5 @
000000f0: 400a 2b3c 3c3c 3c3c 3c3c 202e 6f75 720a  @.+<<<<<<< .our.
00000100: 2072 6967 6874 0a2b 3d3d 3d3d 3d3d 3d0a   right.+=======.
00000110: 2b6c 6566 740a 2b3e 3e3e 3e3e 3e3e 202e  +left.+>>>>>>> .
00000120: 7468 6569 720a                           their.

Note that the "added in both" explanation appears to be part of the
diff for binary. The diff line should be '\1\0\1\n', but it is only
'\1', obviously suggesting a C string operation gone awry.

I haven't checked whether regular 'git diff' operations contain a
similar bug. (The NUL would have to be pretty far into the file, to
confuse the binary file detection heuristic, but that is possible.)

I don't see any particularly good work-arounds. Looking for all
possible explanations as a trigger to detect a malformed patch runs
into false positives with the explanation "merged", which occurs in
regular code.


2. Feature suggestion

Related to the bug, may I suggest a flag to omit unnecessary patches?
For "added in remote" and "deleted in remote", I don't actually need
the patch--I can grab the blob contents from the SHA myself if needed.
These cases need special handling anyway (to create/delete the file),
so the (often large) patch doesn't add much anyway. This would provide
a workaround for the bug.


3. Feature suggestion

There's no direct indication of whether any given file's merge
succeeded. Currently I sniff for merge conflicts by looking for
"+<<<<<<< .our", which feels like an ugly kludge. Could we provide an
explicit indicator? (And maybe also one for binary vs text
processing?)

Note that binary file merge conflicts don't generate patches with
three-way merge markers but instead say "warning: Cannot merge binary
files: binary (.our vs. .their)". Looking for this case even further
complicates the output parser.


4. API suggestion

Here's what I really want 'git merge-tree' to output. :)

If the merge had no conflicts, write the resulting tree to the object
database and give me its sha. I can always diff that tree against
branch1's tree if I want to see what has changed.

If the merge had conflicts, write the "as merged as possible" tree to
the object database and give me its sha, and then also give me the
three-way merge diff output for all conflicts, as a regular patch
against that tree, using full path names and shas. (Alternatively,
maybe better, give me a second sha for a tree containing all the
three-way merge diff patches applied, which I can diff against the
first tree to find the conflict patches.)

I'm not sure what to do about binary merge conflicts, since they
aren't representable with three-way markers. Maybe just list them,
with the two conflicting blob ids?



Thanks very much,
-josh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git merge-tree: bug report and some feature requests
  2018-01-21  3:00 git merge-tree: bug report and some feature requests Josh Bleecher Snyder
@ 2018-01-21  8:14 ` Ævar Arnfjörð Bjarmason
  2018-01-22 17:58 ` Elijah Newren
  1 sibling, 0 replies; 7+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-21  8:14 UTC (permalink / raw)
  To: Josh Bleecher Snyder; +Cc: git


On Sun, Jan 21 2018, Josh Bleecher Snyder jotted:

> 3. Feature suggestion
>
> There's no direct indication of whether any given file's merge
> succeeded. Currently I sniff for merge conflicts by looking for
> "+<<<<<<< .our", which feels like an ugly kludge. Could we provide an
> explicit indicator? (And maybe also one for binary vs text
> processing?)
>
> Note that binary file merge conflicts don't generate patches with
> three-way merge markers but instead say "warning: Cannot merge binary
> files: binary (.our vs. .their)". Looking for this case even further
> complicates the output parser.

I thought I had a way to do this, but looking back in my logs I find
that I was just using:

    git merge-tree $(git merge-base A B) A B | grep -e '^\+======='; echo $?

I.e. a variation of what you're doing, which as you note won't work for
binary files.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git merge-tree: bug report and some feature requests
  2018-01-21  3:00 git merge-tree: bug report and some feature requests Josh Bleecher Snyder
  2018-01-21  8:14 ` Ævar Arnfjörð Bjarmason
@ 2018-01-22 17:58 ` Elijah Newren
  2018-01-23  7:08   ` Josh Bleecher Snyder
  1 sibling, 1 reply; 7+ messages in thread
From: Elijah Newren @ 2018-01-22 17:58 UTC (permalink / raw)
  To: Josh Bleecher Snyder; +Cc: Git Mailing List

On Sat, Jan 20, 2018 at 7:00 PM, Josh Bleecher Snyder
<josharian@gmail.com> wrote:
> Hi, all.
>
> I'm experimenting with some new porcelain for interactive rebase. One
> goal is to leave the work tree untouched for most operations. It looks
> to me like 'git merge-tree' may be the right plumbing command for
> doing the merge part of the pick work of the todo list, one commit at
> a time. If I'm wrong about this, I'd love pointers; what follows may
> still be interesting anyway.

I don't have a concrete alternative (yet?) but here are some pointers
to two alternate merge-without-touching-working-tree possibilities, if
your current route doesn't pan out as well as you like:

I posted some patches last year to make merge-recursive.c be able to
do merges without touching the working tree.  Adding a few flags would
then enable it for any of 'merge', 'cherry-pick', 'am', or
'rebase'...though for unsuccessful merges, there's a clear question of
what/how conflicts should be reported to the user.  That probably
depends a fair amount on the precise use-case.

Although that series was placed on the backburner due to the immediate
driver of the feature going away, I'm still interested in such a
change, though I think it would fall out as a nice side effect of
implementing Junio's proposed ideal-world-merge-recursive rewrite[1].
I have started looking into that[2], but no guarantees about how
quickly I'll find time to finish or even whether I will.

[1] https://public-inbox.org/git/xmqqd147kpdm.fsf@gitster.mtv.corp.google.com
[2] https://github.com/newren/git/blob/ort/ort-cover-letter contains
overview of ideas and notes to myself about what I was hoping to
accomplish; currently it doesn't even compile or do anything

> 4. API suggestion
>
> Here's what I really want 'git merge-tree' to output. :)
...
> If the merge had conflicts, write the "as merged as possible" tree to

You'd need to define "as merged as possible" more carefully, because I
thought you meant a tree containing all the three-way merge conflict
markers and such being present in the "resolved" file, but from your
parenthetical note below it appears you think that is a different tree
that would also be useful to diff against the first one.  That leaves
me wondering what the first tree is. (Is it just the tree where for
each path, if that path had no conflicts associated with it then it's
the merge-resolved-file, and otherwise it's the file contents from the
merge-base?).

Both of these trees are actually rather non-trivial to define.  The
wording above isn't actually sufficient, because content conflicts
aren't the only kind of conflict.  More on that below.

There is already a bunch of code in merge-recursive.c to create a
forcibly-merged-accepting-conflict-markers-in-the-resolution and
record it as a tree (this is used for creating virtual merge bases in
the recursive case, namely when there isn't a single merge-base for
the two branches you are merging).  It might be reusable for what you
want here, but it's not immediately clear whether all the things it
does are appropriate; someone would have to consider the non-content
(path-based) conflicts carefully.

> the object database and give me its sha, and then also give me the
> three-way merge diff output for all conflicts, as a regular patch
> against that tree, using full path names and shas. (Alternatively,
> maybe better, give me a second sha for a tree containing all the
> three-way merge diff patches applied, which I can diff against the
> first tree to find the conflict patches.)

As far as I can tell, you're assuming that it's possible with two
trees that are crafted "just right", that you can tell where the merge
conflicts are, with binary files being your only difficulty.  Content
conflicts aren't the only type that exist; there are also path-based
conflicts.  These type of conflicts also make it difficult to know how
the two trees you are requesting should even be created.

For example, if there is a modify/delete conflict, how can that be
determined from just two trees?  If the first tree has the base
version of the file, then the second tree either has a file at the
same position or it doesn't.  Neither case looks like a conflict, but
the original merge had one.  You need more information.  The exact
same thing can be said for rename/delete conflicts.

Similarly, rename/add (one side renames an existing file to some new
path (say, "new_path"), and the other adds a brand new file at
"new_path), or rename/rename(2to1) (each side renames a different file
to the same location), won't be detectable just by diffing two trees.
These are often handled by moving both files to some other location,
so there's no way to record in a tree that there was a conflict.

rename/rename(1to2) is similar, but instead of two different original
files being renamed to the same thing, this is one file being renamed
differently on different sides of history.

I know that several of the examples above involved rename detection,
which git-merge-trees won't even do, but that means you're even more
likely to face the modify/delete conflict cases.  And our list still
isn't done, either:

Directory/file conflicts (one side puts a directory of the same name
that the other side adds as a file) will also cause problems.

Finally, directory rename detection (currently in pu under review)
adds a few "implicit dir rename" conflict types (renames of multiple
directories would cause multiple files to be renamed to the same
location, or an existing file/dir being in the way of one or more
path(s) getting implicitly renamed).  This means that the number of
types of non-textual conflicts might also grow in the future so it may
be unwise to try to special case existing exceptions with a bag of
clever workarounds.


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git merge-tree: bug report and some feature requests
  2018-01-22 17:58 ` Elijah Newren
@ 2018-01-23  7:08   ` Josh Bleecher Snyder
  2018-01-23 11:52     ` Edward Thomson
  0 siblings, 1 reply; 7+ messages in thread
From: Josh Bleecher Snyder @ 2018-01-23  7:08 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

>> I'm experimenting with some new porcelain for interactive rebase. One
>> goal is to leave the work tree untouched for most operations. It looks
>> to me like 'git merge-tree' may be the right plumbing command for
>> doing the merge part of the pick work of the todo list, one commit at
>> a time. If I'm wrong about this, I'd love pointers; what follows may
>> still be interesting anyway.
>
> I don't have a concrete alternative (yet?) but here are some pointers
> to two alternate merge-without-touching-working-tree possibilities, if
> your current route doesn't pan out as well as you like:
>
> I posted some patches last year to make merge-recursive.c be able to
> do merges without touching the working tree.  Adding a few flags would
> then enable it for any of 'merge', 'cherry-pick', 'am', or
> 'rebase'...though for unsuccessful merges, there's a clear question of
> what/how conflicts should be reported to the user.  That probably
> depends a fair amount on the precise use-case.
>
> Although that series was placed on the backburner due to the immediate
> driver of the feature going away, I'm still interested in such a
> change, though I think it would fall out as a nice side effect of
> implementing Junio's proposed ideal-world-merge-recursive rewrite[1].
> I have started looking into that[2], but no guarantees about how
> quickly I'll find time to finish or even whether I will.
>
> [1] https://public-inbox.org/git/xmqqd147kpdm.fsf@gitster.mtv.corp.google.com
> [2] https://github.com/newren/git/blob/ort/ort-cover-letter contains
> overview of ideas and notes to myself about what I was hoping to
> accomplish; currently it doesn't even compile or do anything

Thanks for the pointer. That does seem promising.

And yes, I see now that serialization of conflicts is decidedly
challenging. More on that below.


>> 4. API suggestion
>>
>> Here's what I really want 'git merge-tree' to output. :)
> ...
>> If the merge had conflicts, write the "as merged as possible" tree to
>
> You'd need to define "as merged as possible" more carefully, because I
> thought you meant a tree containing all the three-way merge conflict
> markers and such being present in the "resolved" file, but from your
> parenthetical note below it appears you think that is a different tree
> that would also be useful to diff against the first one.  That leaves
> me wondering what the first tree is. (Is it just the tree where for
> each path, if that path had no conflicts associated with it then it's
> the merge-resolved-file, and otherwise it's the file contents from the
> merge-base?).

FWIW, the parenthetical suggestion was indeed what I had in mind. But
non-content conflicts appear to make that a non-starter. Or at least
woefully incomplete.


> Both of these trees are actually rather non-trivial to define.  The
> wording above isn't actually sufficient, because content conflicts
> aren't the only kind of conflict.  More on that below.
>
> There is already a bunch of code in merge-recursive.c to create a
> forcibly-merged-accepting-conflict-markers-in-the-resolution and
> record it as a tree (this is used for creating virtual merge bases in
> the recursive case, namely when there isn't a single merge-base for
> the two branches you are merging).  It might be reusable for what you
> want here, but it's not immediately clear whether all the things it
> does are appropriate; someone would have to consider the non-content
> (path-based) conflicts carefully.

Ack. I assume this is also the code that generates the existing 'git
merge-tree' patches, which includes conflict markers.


>> the object database and give me its sha, and then also give me the
>> three-way merge diff output for all conflicts, as a regular patch
>> against that tree, using full path names and shas. (Alternatively,
>> maybe better, give me a second sha for a tree containing all the
>> three-way merge diff patches applied, which I can diff against the
>> first tree to find the conflict patches.)
>
> As far as I can tell, you're assuming that it's possible with two
> trees that are crafted "just right", that you can tell where the merge
> conflicts are, with binary files being your only difficulty.  Content
> conflicts aren't the only type that exist; there are also path-based
> conflicts.  These type of conflicts also make it difficult to know how
> the two trees you are requesting should even be created.
>
> For example, if there is a modify/delete conflict, how can that be
> determined from just two trees?  If the first tree has the base
> version of the file, then the second tree either has a file at the
> same position or it doesn't.  Neither case looks like a conflict, but
> the original merge had one.  You need more information.  The exact
> same thing can be said for rename/delete conflicts.
>
> Similarly, rename/add (one side renames an existing file to some new
> path (say, "new_path"), and the other adds a brand new file at
> "new_path), or rename/rename(2to1) (each side renames a different file
> to the same location), won't be detectable just by diffing two trees.
> These are often handled by moving both files to some other location,
> so there's no way to record in a tree that there was a conflict.
>
> rename/rename(1to2) is similar, but instead of two different original
> files being renamed to the same thing, this is one file being renamed
> differently on different sides of history.
>
> I know that several of the examples above involved rename detection,
> which git-merge-trees won't even do, but that means you're even more
> likely to face the modify/delete conflict cases.  And our list still
> isn't done, either:
>
> Directory/file conflicts (one side puts a directory of the same name
> that the other side adds as a file) will also cause problems.

Me: If the world were simple, we could build it this simple way!

You: The world isn't simple.

Me: ...drat. Thanks.


I've been looking at libgit2's handling of this. It appears the
closest analog is:

* Call git_merge_trees:
https://libgit2.github.com/libgit2/#HEAD/group/merge/git_merge_trees
* Call git_index_conflict_iterator_new on the resulting *git_index:
https://libgit2.github.com/libgit2/#HEAD/group/index/git_index_conflict_iterator_new
* Use git_index_conflict_next to inspect a conflict:
https://libgit2.github.com/libgit2/#HEAD/group/index/git_index_conflict_next
* The conflict provides three git_index_entrys:
https://libgit2.github.com/libgit2/#HEAD/type/git_index_entry

Looking over your list above, at a minimum, libgit2 might not have a
particularly good way to represent submodule/file or
submodule/directory conflicts, because is-a-submodule is defined
external to a git_index_entry.

(Incidentally, looking at
https://github.com/git/git/blob/master/Documentation/technical/index-format.txt,
it appears that there are also possibly symlink/gitlink X
file/dir/submodule conflicts? Ugh.)

libgit2 gets to avoid the bother of serialization and deserialization,
but it seems the problem of conflict categorization still remains. See
my next comments.



> Finally, directory rename detection (currently in pu under review)
> adds a few "implicit dir rename" conflict types (renames of multiple
> directories would cause multiple files to be renamed to the same
> location, or an existing file/dir being in the way of one or more
> path(s) getting implicitly renamed).  This means that the number of
> types of non-textual conflicts might also grow in the future so it may
> be unwise to try to special case existing exceptions with a bag of
> clever workarounds.

There's tension here.

Cataloging or special-casing all possible conflict types does seem
unwise because of the sheer number of kinds of conflicts.

But the alternative appears to be punting entirely, as libgit2 does,
and merely providing something akin to three index entries. This which
leaves it unclear what exactly the conflict was, at which point any
user (read: porcelain developer) will end up having to recreate some
merge logic to figure out what went wrong. And if merge-tree starts
doing rename detection, the user might then have to emulate that as
well. Given that, you may as well catalog the many kinds of conflicts
and report which one occurred. And if there were a list of all
possible kinds of conflicts, it'd help the user write correct code,
because they're not going to naively fail to consider--as I did--the
many kinds of non-content-based conflicts.


I admit, I'm somewhat puzzled about where to go from here, both from
git/libgit2's perspective and the perspective of my immediate needs.
For myself, I probably have the luxury of bailing on anything but the
simplest, content-based conflicts and sending the (end) user back to
regular interactive rebase, if I can find a way to implement that
reliably and without inordinate difficulty. The first case I checked
doesn't look good: For dir/file merge conflicts, 'git merge-tree'
merely reports some "added in remote" and "added in local"
explanations, meaning I'd have to add path conflict detection myself.

I'll continue to ponder. Thanks for the enlightening email.

-josh


P.S. Is it expected/known that 'git merge --abort' of a
merge-in-progress involving a dir/file conflict generates a mildly
incomprehensible error in addition to aborting the merge?

$ git merge --abort
error: 'df' appears as both a file and as a directory
error: df: cannot drop to stage #0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git merge-tree: bug report and some feature requests
  2018-01-23  7:08   ` Josh Bleecher Snyder
@ 2018-01-23 11:52     ` Edward Thomson
  2018-01-24 18:46       ` Josh Bleecher Snyder
  2018-01-24 19:39       ` Elijah Newren
  0 siblings, 2 replies; 7+ messages in thread
From: Edward Thomson @ 2018-01-23 11:52 UTC (permalink / raw)
  To: Josh Bleecher Snyder; +Cc: Elijah Newren, Git Mailing List

On Tue, Jan 23, 2018 at 7:08 AM, Josh Bleecher Snyder
<josharian@gmail.com> wrote:
> Looking over your list above, at a minimum, libgit2 might not have a
> particularly good way to represent submodule/file or
> submodule/directory conflicts, because is-a-submodule is defined
> external to a git_index_entry.

libgit2 should detect submodule/file and submodule/directory
conflicts.

While, yes, some metadata about the submodule is located outside the
index, you can look at the mode to determine that this _is_ a
submodule.  You should be able to reliably detect submodule/file
conflicts because one will be a regular or executable file (mode
0100644 or 0100755), while the other entry at the same path will
be a gitlink (mode 0160000).

Similarly, submodule/directory conflict detection works just like
regular file/directory conflict detection.  If some path `foo` is
a submodule (or a regular file) then some path `foo/bar` existing
in the other side of the merge causes a conflict.

> Cataloging or special-casing all possible conflict types does seem
> unwise because of the sheer number of kinds of conflicts.
>
> But the alternative appears to be punting entirely, as libgit2 does,
> and merely providing something akin to three index entries.

Indeed, when I added merge to libgit2, we put the higher-level conflict
analysis into application code because there was not much interest in it
at the time.  I've been meaning to add this to `git_status` in libgit2,
but it's not been a high priority.

> This which
> leaves it unclear what exactly the conflict was, at which point any
> user (read: porcelain developer) will end up having to recreate some
> merge logic to figure out what went wrong. And if merge-tree starts
> doing rename detection, the user might then have to emulate that as
> well.

That's not a good idea.  Trying to figure out what merge did would be
painful at best, and likely impossible, since a rename conflict is
recorded in the main index without any way to piece it together.  eg:

100644 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 1       bar.c
100644 cafec4fecafec4fecafec4fecafec4fecafec4fe 2       bar.c
100644 c4cc188a892898e13927dc4a02e7f68814b874b2 1       foo.c
100644 71f5af150b25e3aaaad2d67ff46759311401036f 2       foo.c
100644 351cfbdd55d656edd2c5c995aae3caafb9ec11fa 3       rename1.c
100644 e407c7d138fb457674c3b114fcf47748169ab0c5 3       rename2.c

This is the main index that results when bar.c has a rename/edit
conflict, and foo.c also has a rename/edit conflict.  One was renamed
to rename1.c and the other to rename2.c.  Trying to determine which is
which _after the fact_ would be regrettable.  Especially since rename
detection is not static - you would need to know the thresholds that
were configured at the time merge was performed and try to replay
the rename detection with that.

libgit2 records a `NAME` section in the index that pairs the rename
detection decisions that it performed so that you can analyze them
and display them after the fact.

-ed

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git merge-tree: bug report and some feature requests
  2018-01-23 11:52     ` Edward Thomson
@ 2018-01-24 18:46       ` Josh Bleecher Snyder
  2018-01-24 19:39       ` Elijah Newren
  1 sibling, 0 replies; 7+ messages in thread
From: Josh Bleecher Snyder @ 2018-01-24 18:46 UTC (permalink / raw)
  To: Edward Thomson; +Cc: Elijah Newren, Git Mailing List

Thanks, Ed. I think I'll pursue the libgit2 route; sounds promising.


>> But the alternative appears to be punting entirely, as libgit2 does,
>> and merely providing something akin to three index entries.
>
> Indeed, when I added merge to libgit2, we put the higher-level conflict
> analysis into application code because there was not much interest in it
> at the time.  I've been meaning to add this to `git_status` in libgit2,
> but it's not been a high priority.

Is your conflict analysis application code public? I might be game to
do some of the legwork to get it into libgit2's git_status (although
I'm probably not the right person to do the API design). At a minimum,
it would be helpful as a reference, as I'm probably about to recreate
some subset of it myself.


-josh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: git merge-tree: bug report and some feature requests
  2018-01-23 11:52     ` Edward Thomson
  2018-01-24 18:46       ` Josh Bleecher Snyder
@ 2018-01-24 19:39       ` Elijah Newren
  1 sibling, 0 replies; 7+ messages in thread
From: Elijah Newren @ 2018-01-24 19:39 UTC (permalink / raw)
  To: Edward Thomson; +Cc: Josh Bleecher Snyder, Git Mailing List

On Tue, Jan 23, 2018 at 3:52 AM, Edward Thomson
<ethomson@edwardthomson.com> wrote:
> Indeed, when I added merge to libgit2, we put the higher-level conflict
> analysis into application code because there was not much interest in it
> at the time.  I've been meaning to add this to `git_status` in libgit2,
> but it's not been a high priority.

That sounds pretty cool.  I'm actually dealing with a similar problem
in git itself with my replacement merge strategy; since I try to
perform the merge in-core, and then only try to update the index and
working copy if there'd be no loss of information (no overwriting of
dirty or untracked content), there's a delay between when a conflict
is detected and when I can report it, meaning that I have to store
information about the conflict somehow, and then report later.  So
that means I have to go through the trouble of storing this
information internally, which naturally raises the question of whether
I want to provide this information to the user in some fashion other
than just simple "CONFLICT: ..." messages printed out to the user.

>> This which
>> leaves it unclear what exactly the conflict was, at which point any
>> user (read: porcelain developer) will end up having to recreate some
>> merge logic to figure out what went wrong. And if merge-tree starts
>> doing rename detection, the user might then have to emulate that as
>> well.
>
> That's not a good idea.  Trying to figure out what merge did would be
> painful at best, and likely impossible, since a rename conflict is
> recorded in the main index without any way to piece it together.  eg:
>
> 100644 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 1       bar.c
> 100644 cafec4fecafec4fecafec4fecafec4fecafec4fe 2       bar.c
> 100644 c4cc188a892898e13927dc4a02e7f68814b874b2 1       foo.c
> 100644 71f5af150b25e3aaaad2d67ff46759311401036f 2       foo.c
> 100644 351cfbdd55d656edd2c5c995aae3caafb9ec11fa 3       rename1.c
> 100644 e407c7d138fb457674c3b114fcf47748169ab0c5 3       rename2.c
>
> This is the main index that results when bar.c has a rename/edit
> conflict, and foo.c also has a rename/edit conflict.  One was renamed
> to rename1.c and the other to rename2.c.  Trying to determine which is
> which _after the fact_ would be regrettable.  Especially since rename
> detection is not static - you would need to know the thresholds that
> were configured at the time merge was performed and try to replay
> the rename detection with that.
>
> libgit2 records a `NAME` section in the index that pairs the rename
> detection decisions that it performed so that you can analyze them
> and display them after the fact.

Ooh, I like that idea.  Could we do the same in git itself?  Having
not messed with the index format at all, what commands do users use to
view this supplementary info?

Also, if you have this NAME section, you could make other commands
smarter.  For example, if you had a normal content conflict on some
file without rename detection being involved, you'd expect the index
to look something like

100644 deadbeefdeadbeefdeadbeefdeadbeefdeadbeef 1       file.c
100644 cafec4fecafec4fecafec4fecafec4fecafec4fe 2       file.c
100644 ba5eba11ba5eba11ba5eba11ba5eba11ba5eba11 3       file.c

and then when you do a 'git add file.c' you expect all the higher
stages to go away, i.e.

100644 5ca1ab1e5ca1ab1e5ca1ab1e5ca1ab1e5ca1ab1e 0       file.c

With your NAME section trick, could one expect 'git add rename1.c' to
delete not only the higher stage for rename1.c but also the two for
whichever of bar.c or foo.c is relevant?

Elijah

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-01-24 19:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-21  3:00 git merge-tree: bug report and some feature requests Josh Bleecher Snyder
2018-01-21  8:14 ` Ævar Arnfjörð Bjarmason
2018-01-22 17:58 ` Elijah Newren
2018-01-23  7:08   ` Josh Bleecher Snyder
2018-01-23 11:52     ` Edward Thomson
2018-01-24 18:46       ` Josh Bleecher Snyder
2018-01-24 19:39       ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).