[RFC] Git rerere and non-conflicting changes during conflict resolution

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [RFC] Git rerere and non-conflicting changes during conflict resolution
@ 2017-07-25 15:09 Raman Gupta
  2017-07-25 17:52 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Raman Gupta @ 2017-07-25 15:09 UTC (permalink / raw)
  To: git

I had an interesting situation today: resolving a merge conflict
required modification in other files that were not themselves conflicting.

I just realized that rerere does not remember any changes to these
additional files -- only changes to the conflicting files. This makes
the end result of rerere obviously incorrect in this situation.

So my questions are:

1) Is this a known limitation or is there a reason rerere works in
this manner?

1b) If it is a limitation/bug, what would be needed to fix it? With
some guidance, I might be able to submit a patch...

2) In the meantime, is there a way I can identify these cases, without
which I cannot really trust rerere is doing the right thing?

Regards,
Raman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 15:09 [RFC] Git rerere and non-conflicting changes during conflict resolution Raman Gupta
@ 2017-07-25 17:52 ` Jeff King
  2017-07-25 19:54   ` Raman Gupta
  2017-07-25 20:26   ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2017-07-25 17:52 UTC (permalink / raw)
  To: Raman Gupta; +Cc: git

On Tue, Jul 25, 2017 at 11:09:55AM -0400, Raman Gupta wrote:

> I had an interesting situation today: resolving a merge conflict
> required modification in other files that were not themselves conflicting.
> 
> I just realized that rerere does not remember any changes to these
> additional files -- only changes to the conflicting files. This makes
> the end result of rerere obviously incorrect in this situation.
> 
> So my questions are:
> 
> 1) Is this a known limitation or is there a reason rerere works in
> this manner?

Yes, it's known. Rerere works by storing a mapping of conflicted hunks
to their resolutions. If there's no conflicted hunk, I'm not sure how
we'd decide what to feed into the mapping to see if there is some
content to be replaced.

That said, I'm far from an expert on how rerere works. Junio might have
ideas on how we could handle this better. But I do note that for
repeated integration runs (like we do for topics in git.git, as they get
merged to "pu", then "next", then "master"), he keeps non-conflict
fixups in a separate commit which gets squashed into the merge
automatically. See

  https://github.com/git/git/blob/todo/Reintegrate#L185-L191

> 1b) If it is a limitation/bug, what would be needed to fix it? With
> some guidance, I might be able to submit a patch...

As far as I know, something like the Reintegrate script above is the
state of the art. IMHO it would be useful if something similar were
integrated into rerere, but I'm not sure exactly how it would know when
to trigger.

> 2) In the meantime, is there a way I can identify these cases, without
> which I cannot really trust rerere is doing the right thing?

I do think it would be useful if rerere could look at a merge result and
say "OK, I've recorded these bits, but there are other lines that are
not part of either parent and which are not part of a conflict". That
gives you a warning that such lines need to be part of a fixup (rather
than you being surprised when you redo the merge later and have to
rework the fixup).

But I don't think even then you can ever trust rerere fully.
Fundamentally you're applying some changes from one merge into another
context. There may be new sites that also need fixing up, and the tool
has no way to know. So you should treat a rerere-helped merge as any
other merge: assume it's a good starting point but use other tools (like
the compiler or automated tests) to confirm that the result is sensible.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 17:52 ` Jeff King
@ 2017-07-25 19:54   ` Raman Gupta
  2017-07-25 20:25     ` Jeff King
  2017-07-25 20:26   ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Raman Gupta @ 2017-07-25 19:54 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 25/07/17 01:52 PM, Jeff King wrote:
> On Tue, Jul 25, 2017 at 11:09:55AM -0400, Raman Gupta wrote:
> 
>> I had an interesting situation today: resolving a merge conflict
>> required modification in other files that were not themselves conflicting.
>>
>> I just realized that rerere does not remember any changes to these
>> additional files -- only changes to the conflicting files. This makes
>> the end result of rerere obviously incorrect in this situation.
>>
>> So my questions are:
>>
>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
> 
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.
> 
> That said, I'm far from an expert on how rerere works. Junio might have
> ideas on how we could handle this better. But I do note that for
> repeated integration runs (like we do for topics in git.git, as they get
> merged to "pu", then "next", then "master"), he keeps non-conflict
> fixups in a separate commit which gets squashed into the merge
> automatically. See
> 
>   https://github.com/git/git/blob/todo/Reintegrate#L185-L191

Seems relatively simple to me, at least conceptually.

1) Store the state of the index after the merge.

2) After conflict resolution is complete (i.e. user executes "git
commit"), diff index @ step 1 with commit.

3) Assume that all changes in that diff are related to conflict
resolution (as they should be), and save that diff to the rerere cache.

I could be missing something fundamental here though...

>> 1b) If it is a limitation/bug, what would be needed to fix it? With
>> some guidance, I might be able to submit a patch...
> 
> As far as I know, something like the Reintegrate script above is the
> state of the art. IMHO it would be useful if something similar were
> integrated into rerere, but I'm not sure exactly how it would know when
> to trigger.

I've seen the Reintegrate script before. It is very specific to the
git.git workflow. I think it makes sense to expose this particular
capability in git proper, given that rerere itself is exposed. Plus
that could actually simplify the Reintegrate script a bit.

>> 2) In the meantime, is there a way I can identify these cases, without
>> which I cannot really trust rerere is doing the right thing?
> 
> I do think it would be useful if rerere could look at a merge result and
> say "OK, I've recorded these bits, but there are other lines that are
> not part of either parent and which are not part of a conflict". That
> gives you a warning that such lines need to be part of a fixup (rather
> than you being surprised when you redo the merge later and have to
> rework the fixup).

Agreed.

> But I don't think even then you can ever trust rerere fully.
> Fundamentally you're applying some changes from one merge into another
> context. There may be new sites that also need fixing up, and the tool
> has no way to know. So you should treat a rerere-helped merge as any
> other merge: assume it's a good starting point but use other tools (like
> the compiler or automated tests) to confirm that the result is sensible.

While I agree that every case cannot be handled, and this type of
validation is still necessary, I believe git should cover the cases
which it is proper and possible to cover.

> -Peff
> 

Regards,
Raman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 19:54   ` Raman Gupta
@ 2017-07-25 20:25     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2017-07-25 20:25 UTC (permalink / raw)
  To: Raman Gupta; +Cc: git

On Tue, Jul 25, 2017 at 03:54:32PM -0400, Raman Gupta wrote:

> > That said, I'm far from an expert on how rerere works. Junio might have
> > ideas on how we could handle this better. But I do note that for
> > repeated integration runs (like we do for topics in git.git, as they get
> > merged to "pu", then "next", then "master"), he keeps non-conflict
> > fixups in a separate commit which gets squashed into the merge
> > automatically. See
> > 
> >   https://github.com/git/git/blob/todo/Reintegrate#L185-L191
> 
> Seems relatively simple to me, at least conceptually.
> 
> 1) Store the state of the index after the merge.
> 
> 2) After conflict resolution is complete (i.e. user executes "git
> commit"), diff index @ step 1 with commit.
> 
> 3) Assume that all changes in that diff are related to conflict
> resolution (as they should be), and save that diff to the rerere cache.
> 
> I could be missing something fundamental here though...

That describes the "save" step. How do we do the apply step? We later do
a merge and see that there are some conflicts. We compute a hash for
each conflict, and see if it has a resolution in the database.

But how do we decide which hashes to look up for the parts that are
outside of a conflict?

In theory you want those fixups to "ride along" with the other
conflicts. But which fixups ride along with which conflicts? You really
want some way to uniquely identify a particular merge. But of course you
can't use the content, because the whole point of rerere is that you're
doing a _similar_ merge in another context. And that's why Reintegrate
ties it to the branch being merged. That gives us the unique identifier.
But rerere normally does not concern itself with that.

I think you could mark the fixups as potential ride-alongs for each
conflict that's resolved. And then when we apply a resolution, tell the
user "by the way, you might want these fixups, too" (or maybe even try
to apply them automatically). That will suggest fixups that may not be
relevant anymore, but if the next "save" includes a particular
resolution but _not_ a potential fixup, then that fixup can be culled
from its list.

There's still a corner case that doesn't handle, though: you might need
fixups even if you have no conflicts at all.

So we may be stumbling towards an answer, but it still sounds kind of
hacky and fragile.

> > As far as I know, something like the Reintegrate script above is the
> > state of the art. IMHO it would be useful if something similar were
> > integrated into rerere, but I'm not sure exactly how it would know when
> > to trigger.
> 
> I've seen the Reintegrate script before. It is very specific to the
> git.git workflow. I think it makes sense to expose this particular
> capability in git proper, given that rerere itself is exposed. Plus
> that could actually simplify the Reintegrate script a bit.

Right, sorry if I wasn't clear; my "something similar" meant only the
fixup-applying lines I linked to.

> > But I don't think even then you can ever trust rerere fully.
> > Fundamentally you're applying some changes from one merge into another
> > context. There may be new sites that also need fixing up, and the tool
> > has no way to know. So you should treat a rerere-helped merge as any
> > other merge: assume it's a good starting point but use other tools (like
> > the compiler or automated tests) to confirm that the result is sensible.
> 
> While I agree that every case cannot be handled, and this type of
> validation is still necessary, I believe git should cover the cases
> which it is proper and possible to cover.

Yes, I agree that some help is usually better than none (although we
need to be careful that any improvement is honest about its limitations
for any given case, and doesn't end up tricking the user into a false
sense of security).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 17:52 ` Jeff King
  2017-07-25 19:54   ` Raman Gupta
@ 2017-07-25 20:26   ` Junio C Hamano
  2017-07-25 20:40     ` Junio C Hamano
  2017-07-25 20:58     ` Jeff King
  1 sibling, 2 replies; 10+ messages in thread
From: Junio C Hamano @ 2017-07-25 20:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Raman Gupta, git

Jeff King <peff@peff.net> writes:

>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
>
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.

Correct.  

This is not even a limitation but is outside the scope of rerere.
Let's understand that first.

A semantic conflict that requires an evil merge that touches a file
that is not involved in any textual conflict during a merge will
happen even if there is *NO* textual merge conflict.  

Imagine that there is a global variable 'xyzzy' used in many places
in the code, and then a side branch forks from the mainline.  The
mainline renames the variable to 'frotz' in the entire codebase,
while the side branch adds one more place that the variable is used
under its original name.  Then you merge these two branches.  This
will textually merge cleanly if the place the side branch adds a new
mention of 'xyzzy' is textually far from any block of text in the
common ancestor that has been updated on the mainline while these
two branches diverged.

"git checkout mainline && git merge side" will cleanly automerge,
yet the result is not correct.  The new mention of 'xyzzy' added by
the merge needs to be corrected to 'frotz'.

Now we take that as the baseline and further imagine that during the
time these two branches diverged, the mainline also updated some
documentation of something totally unrelated to 'xyzzy' vs 'frotz'
variable.  Perhaps README was updated, or something.  The side
branch also updated the same file in a different way.  This time,
the changes to this same file may result in textual conflict.

"git checkout mainline && git merge side" will result in a conflict,
whose resolution may be recorded by rerere for that file.  It should
be crystal clear that this conflict does *not* have anything to do
with the semantic conflict between 'xyzzy' vs 'frotz'.  

The realization we must draw from the above observation is that what
the "merge-fix" machinery in the Reintegrate script you cited in
your message tries to help, which is the semantic conflict,
fundamentally cannot be tied to any textual merge conflict that may
(or may not) happen.  That is what makes the issue outside the scope
of rerere.

The above is not to say that the need to record and replay such evil
merges to solve semantic conflict does not exist.  Far from it.  It
is just clarifying that it is a wrong approach to try to "teach"
rerere to somehow handle that case as well.

If we wanted to port the "merge-fix" logic, and I do wish it would
happen some day, the update belongs to "git merge".

You were too kind to call the "merge-fix" logic in Reintegrate "the
state of the art", but I am not happy about its limitation.  Here
are the things I wish to have in an ideal version of the "merge-fix"
logic, which does not exist yet:

 * There is a database of "to be cherry-picked" commits, keyed by a
   pair of branch names.  That is, given branches A and B, the
   database will return 0 or more commits that can be cherry-picked.
   The order of branch names in the pair is immaterial, i.e. asking
   the database for cherry-pickable commits keyed by <A, B> and
   keyed by <B, A> will yield the identical set of commits.

 * When merging commit X to commit Y, "git merge" in the ideal world
   does the following:

   - It first does what it currently does, i.e. three-way merge with
     the merge strategy and applying rerere for re-application of an
     earlier resolution of textual conflicts.

   - Then, it looks up the database to find the keys <A, B> where
     A is in X but not in Y, and B is not in X but in Y.
     These commits are cherry-picked and squashed into the result of
     the above.

The intent is that a pair <A, B> represents the mainline and side
branch in the above example, where A renamed 'xyzzy' to 'frotz' and
B added new reference to 'xyzzy'.  And the cherry-pickable commit
found in the database is to tweak the 'xyzzy' B adds into 'frotz'.

I said A and B in the above are branch names, but in the ideal
world, they can be commit object names (possibly in the middle of a
branch), as long as we can reliable update the database's keys every
time "git rebase" etc. rewrites commits.

To populate the database, we'd need a reverse.

 * When merging branch B into branch A (or the other way around) for
   the first time, "git merge" would do what it currently does.

 * The user then concludes the merge to resolve *ONLY* the textual
   conflict, and commit the result.  It is important that no
   additional evil merge to correct for semantic conflicts is done
   in this step.  Note that if the auto-merge cleanly succeeds, this
   step may not even exist.

 * Then the user makes another commit to correct for semantic
   conflicts (aka "merge-fix").

 * Then the user tells Git that semantic conflicts were resolved and
   need to be recorded (just like running "git rerere" manually,
   before "git commit" automatically does it for them these days).
   This will result in the following:

   - The database is updated so that key <A, B> yields the
     "merge-fix" commit;

   - The head is detached at the tip of branch A before the merge;

   - "git merge B" is done again, which _should_ reproduce the state
     immediately after the user committed the "merge-fix";

   - The tip of branch A is reset to the result of the above.

The merge-fix logic in Reintegrate is a poor-man's emulation of the
above ideal.  A value its database yields is not a set of commits,
but a single commit, and instead of getting keyed by a pair of
branch names, the database is keyed by a single branch name
(i.e. recording "I had trouble when merging this branch" without
saying "... to the integration branch that already had this other
branch"), so the look-up does not have to do "A is in X but not in
Y, and B is not in X but in Y".  

It is still usable but the database need to be reorganized every
time the order of topics merged to 'pu' is changed.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 20:26   ` Junio C Hamano
@ 2017-07-25 20:40     ` Junio C Hamano
  2017-07-25 20:58     ` Jeff King
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2017-07-25 20:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Raman Gupta, git

Junio C Hamano <gitster@pobox.com> writes:

> To populate the database, we'd need a reverse.
> ...
>  * Then the user tells Git that semantic conflicts were resolved and
>    need to be recorded (just like running "git rerere" manually,
>    before "git commit" automatically does it for them these days).
>    This will result in the following:
>
>    - The database is updated so that key <A, B> yields the
>      "merge-fix" commit;
> ...

I probably should have been aiming for stars, as I were outlining
the ideal merge-fix logic.  The key <A, B> is merely a default, and
the worst one at that.  There should be a way for the user to tell
which exact pair of commits (i.e. another side branch that was
merged earlier to the mainline A that renamed 'xyzzy' to 'frotz'
wholesale, and the exact commit on the side branch B that added an
extra mention of 'xyzzy').  

If the logic can figure out what these two commits are without
user's help, mechanically by only looking at the merge-fix commit,
that would be even better.  But I do not believe in miracles, so...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 20:26   ` Junio C Hamano
  2017-07-25 20:40     ` Junio C Hamano
@ 2017-07-25 20:58     ` Jeff King
  2017-07-26  7:14       ` Junio C Hamano
  2017-07-26  8:06       ` Junio C Hamano
  1 sibling, 2 replies; 10+ messages in thread
From: Jeff King @ 2017-07-25 20:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Raman Gupta, git

On Tue, Jul 25, 2017 at 01:26:34PM -0700, Junio C Hamano wrote:

> This is not even a limitation but is outside the scope of rerere.
> Let's understand that first.
> [...]
> If we wanted to port the "merge-fix" logic, and I do wish it would
> happen some day, the update belongs to "git merge".

Looks like this crossed with my latest email. Overall I agree with you.

I almost said the same thing about scope initially, but I decided it
doesn't really matter. From the user's perspective there may be a tool X
that replays bits of a previous merge result. And that task can be
subdivided into replaying conflict resolution and replaying merge-fixes.

From the user's perspective, calling X "rerere" would probably be OK[1].
But from an implementation perspective (and to keep the existing
plumbing available and unchanged), it probably makes sense to call it
something else, and have it run both rerere and a new plumbing command
to do the merge-fix work (or call it nothing, and assume that users will
either touch the plumbing directly or will use "git merge" to trigger
both).

So if you want to shut down immediately the idea that this would be
bolted onto rerere, I can support that. There are ways of doing it that
would make sense to combine with rerere (like the "tying fixups to
conflict resolution" sketch I gave in the other email), but I agree they
will end up fundamentally hacky (because of the exact "you may not even
have textual conflicts" I mentioned).

The only part I'd disagree with above is that this belongs to git-merge.
I think it should be its own plumbing tool that merge calls alongside
rerere. ;)

>    - Then, it looks up the database to find the keys <A, B> where
>      A is in X but not in Y, and B is not in X but in Y.
>      These commits are cherry-picked and squashed into the result of
>      the above.

I think this is the crux of it. I mentioned in my other email that what
we really want is some way to say "this is roughly the same merge".
The Reintegrate script does it with the topic branch name and an
implicit "merging up to an integration branch".

Not having thought too hard about it yet, this containing relationship
seems like the right direction. I guess you'd do the lookup by computing
the merge-base M of <X,Y> (which we already know anyway), walking M..X
and looking for any entries which mention those commits (in either A or
B slots of the entry), and then similarly narrowing it according to
M..Y.

Hrm. That doesn't quite work, though. Because if your <A,B> are the
merge, then merging a topic to next will get an "A" that is a merge
commit from next. But that commit will never end up in master. What's
causing the conflict is really some "A" that is in the history between
the merge base and "A" (but we don't know which).

So you'd almost have to do an intersection of the left side of "$(git
merge-base A B)..A" with what's in X and Y (with respect to their merge
base). Err, maybe vice versa. But the point is that we're looking for
overlapping set unions, I think, not the presence of particular tips.

> I said A and B in the above are branch names, but in the ideal
> world, they can be commit object names (possibly in the middle of a
> branch), as long as we can reliable update the database's keys every
> time "git rebase" etc. rewrites commits.

What if instead of commit hashes we used patch ids?

There's one trick there, which is that merges don't have a well-defined
commit id. We could use its actual commit id in that case. That would
work OK in practice for a workflow like git.git's, because the merge
commits are never rewritten. But it would fall down if people do mixed
rebases and merges on their topic branches.

> To populate the database, we'd need a reverse.
> 
>  * When merging branch B into branch A (or the other way around) for
>    the first time, "git merge" would do what it currently does.
> 
>  * The user then concludes the merge to resolve *ONLY* the textual
>    conflict, and commit the result.  It is important that no
>    additional evil merge to correct for semantic conflicts is done
>    in this step.  Note that if the auto-merge cleanly succeeds, this
>    step may not even exist.
> 
>  * Then the user makes another commit to correct for semantic
>    conflicts (aka "merge-fix").

I think it's asking a lot for users to handle the textual conflicts and
semantic ones separately. It would be nice if we could tell them apart
automatically (and I think we can based on what isn't part of the
conflict resolution).

That still ends up with one giant "fixup" commit. But I don't know how
else you'd do it. I could make several commits, but we still don't know
how to attribute them to anything but the mass <A,B> merge. We don't
know which commits were responsible for which fixups (and I wouldn't
want to ask the user to figure it out), so the best we can do is apply
them all.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 20:58     ` Jeff King
@ 2017-07-26  7:14       ` Junio C Hamano
  2017-07-27 21:01         ` Junio C Hamano
  2017-07-26  8:06       ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2017-07-26  7:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Raman Gupta, git

Jeff King <peff@peff.net> writes:

> From the user's perspective, calling X "rerere" would probably be OK[1].
> But from an implementation perspective (and to keep the existing
> plumbing available and unchanged), it probably makes sense to call it
> something else, and have it run both rerere and a new plumbing command
> to do the merge-fix work (or call it nothing, and assume that users will
> either touch the plumbing directly or will use "git merge" to trigger
> both).
> ...
> I think it should be its own plumbing tool that merge calls alongside
> rerere. ;)

As long as we use the database keyed with <A,B> and take the merge
base into account, "git am" and "git cherry-pick" would not be able
to use the merge-fix machinery, so in that sense, calling X "rerere"
would not be OK, but I agree with your general sentiment about the
UI visible to the end users.  Just like "rerere" started with a
small step to avoid automating things too much and then later became
almost invisible for normal cases because we managed to automate it
reasonably well and integrate it into mergy operations, we may be
able to do the same for merge-fix machinery.  My "this belongs to
'merge'" is primarily coming from it---it might be reusable in other
mergy operations with some fundamental changes, but I envision that
the primarly and only user of that X would initially be 'merge'.

> Not having thought too hard about it yet, this containing relationship
> seems like the right direction. I guess you'd do the lookup by computing
> the merge-base M of <X,Y> (which we already know anyway), walking M..X
> and looking for any entries which mention those commits (in either A or
> B slots of the entry), and then similarly narrowing it according to
> M..Y.

Yes, every time I look at the Reintegrate script, I try to think of
an efficient implementation but do not find anything better than the
left-right walk over X...Y range, so that is my conclusion after
having thought about it very hard as well ;-)

> What if instead of commit hashes we used patch ids?

Now you may be onto something.  While we aim at the ultimately automated
ideal that would catch the maximal cases, for the earlier 'xyzzy
turns into frotz' example, the minimal cue to identify one side of
the pair that keys the "change this new instance of xyzzy into
frotz, too" merge-fix is a hunk like this:

    -const char *xyzzy;
    +const char *frotz;

It does not matter what other changes also appear in the same
commit, and my original "branch name" is way too broad, and my
previous "ideal" which is "a single problematic commit" is still
broader than necessary.  Well, patch-id identifies the entire change
contained in a single commit, so it is still too broad, but if we
can narrow it down to a single hunk like that, perhaps we can use it
as one side of the key.

And the other side of the key is naturally a hunk like this:

    +	printf("%s\n", xyzzy);

i.e. a new use of xyzzy appears where it didn't exist before.  And
when we merge two branches, one of which has one half of the key
(i.e. "const char *xyzzy;" turned into "const char *frotz"), and the
other has the other half of the key (i.e. "printf xyzzy" is added),
then we'd apply a patch that tells us to do this:

    -	printf("%s\n", xyzzy);
    +	printf("%s\n", frotz);

i.e. that patch would be the value in the database keyed by the pair
of two previous hunks.

> I think it's asking a lot for users to handle the textual conflicts and
> semantic ones separately. It would be nice if we could tell them apart
> automatically (and I think we can based on what isn't part of the
> conflict resolution).

After thinking about this a bit more, I realized that it is possible
to mechanically sift a human generated resolution that has both
textual conflict resolution (which will be handled by "rerere") and
semantic one (which needs the merge-fix machinery) into two, without
requiring the user to make two separate commits.

The key trick is that "rerere" is capable of recording and replaying
semantic conflict resolution made to a file that happens to have
textual conflict just fine.  Because rerere database records the
preimage (i.e. with conflict markers) and postimage (i.e. the final
resolution for the file) as an entire file contents, it can do 3-way
merge for parts of the same file that is away from any conflicted
region.

If you tell contrib/rerere-train.sh to learn from "pu^{/^Merge
branch 'bp/fsmonitor' into pu}", you'll see what I mean.

  $ git show "pu^{/^Merge branch 'bp/fsmonitor' into pu}" compat/bswap.h

shows the result of such resolution.  Early part of the combined
diff shown by the above command comes from textual conflict
resolution, but there is a new implementation of inline version of
get_be64 that has become necessary but did not exist in either of
the branches getting merged, which is shown as an evil merge.

So the way to automatically sift an existing merge into textual
conflict resolution (i.e. automatable with "rerere") and semantic
evil merge-fix is to first try to recreate the merge and enumerate
the paths that get conflicts.  Then resolve these paths by grabbing
the resolved result for these paths that get textual conflicts out
of the original merge.  That gives us the textual part (we can run
"git rerere" at this point to store the resolution away to the
rerere database---which is essentially what contrib/rerere-train
does).

There will be difference between the result of the above and the
original merge commit.  The paths that are different are all outside
these conflicted paths, and they are what we want in the second
commit, i.e. the semantic evil merge-fix, which will be the value
for the merge-fix database.

So that part is easily automatable.  

It is much harder to come up with a way to index into the merge-fix
database.  We can "punt" and use the same index scheme as the
Reintegrate script (i.e. key it with the name of the branch being
merged, which can be read from the original merge) as the initial
approximation, and that would already be much better than what
Reintegrate script currently does, in that the recording part is
much more automated (in the workflow I use the Reintegrate script,
the side that records a merge-fix initially is entirely manual).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-25 20:58     ` Jeff King
  2017-07-26  7:14       ` Junio C Hamano
@ 2017-07-26  8:06       ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2017-07-26  8:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Raman Gupta, git

Jeff King <peff@peff.net> writes:

> Hrm. That doesn't quite work, though. Because if your <A,B> are the
> merge, then merging a topic to next will get an "A" that is a merge
> commit from next. But that commit will never end up in master. What's
> causing the conflict is really some "A" that is in the history between
> the merge base and "A" (but we don't know which).

There may be a misunderstanding.  When I said the key <A,B> is a
pair of branch names, I didn't mean 'A' to be the name of an
integration branch (e.g. 'pu') and 'B' to be the name of a topic.
Rather, both 'A' and 'B' are the names of topic branches.  

IOW, instead of having refs/merge-fix/sd/branch-copy that says "I
know when I merge sd/branch-copy to pu or jch, there is a semantic
conflict with some unnamed topic that is likely to be already in
there", i.e. keying with only a single topic name, the ideal I
presented would say 'sd/branch-copy and mh/packed-ref-store topics
are both by themselves OK, but when merged together, the end result
of textual merge needs to be further fixed up by cherry-picking this
change', by keying a change with a pair of topic names,
sd/branch-copy (which introduces a new method in the ref backend
vtable) and mh/packed-ref-store (which adds a new ref backend).  The
latter does not know the need for the new method, and the former
does not know the need to implement its new method in a new backend,
so a merge needs a trivial implementation of the new method added to
the new backend, which is what refs/merge-fix/sd/branch-copy does.

And better yet, instead of A=sd/branch-copy B=mh/packed-ref-store,
we could point at the exact commit on each of these branches that
introduce the semantic conflict.  I would probably pick these two

  A=52d59cc6 ("branch: add a --copy (-c) option to go with --move (-m)", 2017-06-18)
  B=67be7c5a ("packed-backend: new module for handling packed references", 2017-06-23)

so when we are on commit X that has A but not B, and are trying to
merge branch Y that has B but not A, we want the merge-fix to kick
in.  Walking "rev-list --left-right X...Y" and noticing A and B in
the output would be a way to notice it.

[footnote]

*1* https://github.com/gitster/git/ should mirror these refs in the
    refs/merge-fix/ hierarchymentioned in the body of this article.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
  2017-07-26  7:14       ` Junio C Hamano
@ 2017-07-27 21:01         ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2017-07-27 21:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Raman Gupta, git

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> From the user's perspective, calling X "rerere" would probably be OK[1].
>> But from an implementation perspective (and to keep the existing
>> plumbing available and unchanged), it probably makes sense to call it
>> something else, and have it run both rerere and a new plumbing command
>> to do the merge-fix work (or call it nothing, and assume that users will
>> either touch the plumbing directly or will use "git merge" to trigger
>> both).
>> ...
>> I think it should be its own plumbing tool that merge calls alongside
>> rerere. ;)
>
> As long as we use the database keyed with <A,B> and take the merge
> base into account, "git am" and "git cherry-pick" would not be able
> to use the merge-fix machinery, so in that sense, calling X "rerere"
> would not be OK, but I agree with your general sentiment about the
> UI visible to the end users.

Actually, I guess "cherry-pick" could use it if we think hard and
long enough and come up with an ideal scheme to compute the index
into the merge-fix database.

Imagine this topology:

       A---o---o---...        topic #1
      /
 o---o---o---...              mainline
      \
       o---B---o---C---...    topic #2

where topic #1 renames 'xyzzy' to 'frotz' at commit A, and topic #2
adds a new mention of 'xyzzy' in file F at commit B and another in
file E at commit C.

In the ideal world, we would have two merge-fix database entries,
one that turns 'xyzzy' in file F to 'frotz' that is keyed by the
pair of commits <A,B>, and the other that does the same in file E
that is keyed by <A,C>.  When merging the topic #1 and the topic #2
together, or when merging the topic #2 to a mainline that already
has merged the topic #1, the merge-fix machinery notices that one
side has A but not B nor C, and the other side has B and C but not
A, and finds these two merge-fixes and applies on top of the textual
merge.

If we are cherry-picking C to something that already has A, then, we
should be able to notice that the history that receives the cherry-pick
has A but not C, and C, which is being picked, does not have A, and
decide that merge-fix <A,C> is relevant.

If we do this purely with commit object name, it will still not work
if we cherry-pick A to mainline and then we cherry-pick C.  The
mainline may hae change from A but does not have the exact commit A.

Which brings us back to your earlier idea to use something like
patch-id to identify these individual changes.  I am not sure how we
can structure the merge-fix database so that we can efficiently find
which "changes" are already on a branch.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-07-27 21:01 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-25 15:09 [RFC] Git rerere and non-conflicting changes during conflict resolution Raman Gupta
2017-07-25 17:52 ` Jeff King
2017-07-25 19:54   ` Raman Gupta
2017-07-25 20:25     ` Jeff King
2017-07-25 20:26   ` Junio C Hamano
2017-07-25 20:40     ` Junio C Hamano
2017-07-25 20:58     ` Jeff King
2017-07-26  7:14       ` Junio C Hamano
2017-07-27 21:01         ` Junio C Hamano
2017-07-26  8:06       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).