git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Josh Bleecher Snyder <josharian@gmail.com>
To: git@vger.kernel.org
Subject: git merge-tree: bug report and some feature requests
Date: Sat, 20 Jan 2018 19:00:06 -0800	[thread overview]
Message-ID: <CAFAcib-2fxiVxtVWcbvafY3-Br7Y70HMiHFZoT0VfK6JU0Dp9A@mail.gmail.com> (raw)

Hi, all.

I'm experimenting with some new porcelain for interactive rebase. One
goal is to leave the work tree untouched for most operations. It looks
to me like 'git merge-tree' may be the right plumbing command for
doing the merge part of the pick work of the todo list, one commit at
a time. If I'm wrong about this, I'd love pointers; what follows may
still be interesting anyway.

I've encountered some bumps with 'git merge-tree'. A bug report and
some feature requests follow. Apologies for the long email.


1. Bug

When a binary file containing NUL is added on only one side, the
resulting patch is malformed. Reproduction script:

mkdir test
cd test
git init .
touch shared
git add shared
git commit -m "base"
git checkout -b left
echo "left" > x
printf '\1\0\1\n' > binary
git add x binary
git commit -m "left"
git checkout master
git checkout -b right
echo "right" > x
git add x
git commit -m "right"
git merge-tree master right left
git merge-tree master right left | xxd
cd ..
rm -rf test

The merge-tree results I get with 2.15.1 are:

added in remote
  their  100644 ddc50ce55647db1421b18aa33417442e29f63d2f binary
@@ -0,0 +1 @@
+added in both
  our    100644 c376d892e8b105bd712d06ec5162b5f31ce949c3 x
  their  100644 45cf141ba67d59203f02a54f03162f3fcef57830 x
@@ -1 +1,5 @@
+<<<<<<< .our
 right
+=======
+left
+>>>>>>> .their

00000000: 6164 6465 6420 696e 2072 656d 6f74 650a  added in remote.
00000010: 2020 7468 6569 7220 2031 3030 3634 3420    their  100644
00000020: 6464 6335 3063 6535 3536 3437 6462 3134  ddc50ce55647db14
00000030: 3231 6231 3861 6133 3334 3137 3434 3265  21b18aa33417442e
00000040: 3239 6636 3364 3266 2062 696e 6172 790a  29f63d2f binary.
00000050: 4040 202d 302c 3020 2b31 2040 400a 2b01  @@ -0,0 +1 @@.+.
00000060: 6164 6465 6420 696e 2062 6f74 680a 2020  added in both.
00000070: 6f75 7220 2020 2031 3030 3634 3420 6333  our    100644 c3
00000080: 3736 6438 3932 6538 6231 3035 6264 3731  76d892e8b105bd71
00000090: 3264 3036 6563 3531 3632 6235 6633 3163  2d06ec5162b5f31c
000000a0: 6539 3439 6333 2078 0a20 2074 6865 6972  e949c3 x.  their
000000b0: 2020 3130 3036 3434 2034 3563 6631 3431    100644 45cf141
000000c0: 6261 3637 6435 3932 3033 6630 3261 3534  ba67d59203f02a54
000000d0: 6630 3331 3632 6633 6663 6566 3537 3833  f03162f3fcef5783
000000e0: 3020 780a 4040 202d 3120 2b31 2c35 2040  0 x.@@ -1 +1,5 @
000000f0: 400a 2b3c 3c3c 3c3c 3c3c 202e 6f75 720a  @.+<<<<<<< .our.
00000100: 2072 6967 6874 0a2b 3d3d 3d3d 3d3d 3d0a   right.+=======.
00000110: 2b6c 6566 740a 2b3e 3e3e 3e3e 3e3e 202e  +left.+>>>>>>> .
00000120: 7468 6569 720a                           their.

Note that the "added in both" explanation appears to be part of the
diff for binary. The diff line should be '\1\0\1\n', but it is only
'\1', obviously suggesting a C string operation gone awry.

I haven't checked whether regular 'git diff' operations contain a
similar bug. (The NUL would have to be pretty far into the file, to
confuse the binary file detection heuristic, but that is possible.)

I don't see any particularly good work-arounds. Looking for all
possible explanations as a trigger to detect a malformed patch runs
into false positives with the explanation "merged", which occurs in
regular code.


2. Feature suggestion

Related to the bug, may I suggest a flag to omit unnecessary patches?
For "added in remote" and "deleted in remote", I don't actually need
the patch--I can grab the blob contents from the SHA myself if needed.
These cases need special handling anyway (to create/delete the file),
so the (often large) patch doesn't add much anyway. This would provide
a workaround for the bug.


3. Feature suggestion

There's no direct indication of whether any given file's merge
succeeded. Currently I sniff for merge conflicts by looking for
"+<<<<<<< .our", which feels like an ugly kludge. Could we provide an
explicit indicator? (And maybe also one for binary vs text
processing?)

Note that binary file merge conflicts don't generate patches with
three-way merge markers but instead say "warning: Cannot merge binary
files: binary (.our vs. .their)". Looking for this case even further
complicates the output parser.


4. API suggestion

Here's what I really want 'git merge-tree' to output. :)

If the merge had no conflicts, write the resulting tree to the object
database and give me its sha. I can always diff that tree against
branch1's tree if I want to see what has changed.

If the merge had conflicts, write the "as merged as possible" tree to
the object database and give me its sha, and then also give me the
three-way merge diff output for all conflicts, as a regular patch
against that tree, using full path names and shas. (Alternatively,
maybe better, give me a second sha for a tree containing all the
three-way merge diff patches applied, which I can diff against the
first tree to find the conflict patches.)

I'm not sure what to do about binary merge conflicts, since they
aren't representable with three-way markers. Maybe just list them,
with the two conflicting blob ids?



Thanks very much,
-josh

             reply	other threads:[~2018-01-21  3:00 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-21  3:00 Josh Bleecher Snyder [this message]
2018-01-21  8:14 ` git merge-tree: bug report and some feature requests Ævar Arnfjörð Bjarmason
2018-01-22 17:58 ` Elijah Newren
2018-01-23  7:08   ` Josh Bleecher Snyder
2018-01-23 11:52     ` Edward Thomson
2018-01-24 18:46       ` Josh Bleecher Snyder
2018-01-24 19:39       ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFAcib-2fxiVxtVWcbvafY3-Br7Y70HMiHFZoT0VfK6JU0Dp9A@mail.gmail.com \
    --to=josharian@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).