* [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Raman Gupta @ 2017-07-25 15:09 UTC
To: git

I had an interesting situation today: resolving a merge conflict
required modification in other files that were not themselves conflicting.

I just realized that rerere does not remember any changes to these
additional files -- only changes to the conflicting files. This makes
the end result of rerere obviously incorrect in this situation.

So my questions are:

1) Is this a known limitation or is there a reason rerere works in
this manner?

1b) If it is a limitation/bug, what would be needed to fix it? With
some guidance, I might be able to submit a patch...

2) In the meantime, is there a way I can identify these cases, without
which I cannot really trust rerere is doing the right thing?

Regards,
Raman

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Jeff King @ 2017-07-25 17:52 UTC
To: Raman Gupta; +Cc: git

On Tue, Jul 25, 2017 at 11:09:55AM -0400, Raman Gupta wrote:

> I had an interesting situation today: resolving a merge conflict
> required modification in other files that were not themselves conflicting.
>
> I just realized that rerere does not remember any changes to these
> additional files -- only changes to the conflicting files. This makes
> the end result of rerere obviously incorrect in this situation.
>
> So my questions are:
>
> 1) Is this a known limitation or is there a reason rerere works in
> this manner?

Yes, it's known. Rerere works by storing a mapping of conflicted hunks
to their resolutions. If there's no conflicted hunk, I'm not sure how
we'd decide what to feed into the mapping to see if there is some
content to be replaced.

That said, I'm far from an expert on how rerere works. Junio might have
ideas on how we could handle this better. But I do note that for
repeated integration runs (like we do for topics in git.git, as they get
merged to "pu", then "next", then "master"), he keeps non-conflict
fixups in a separate commit which gets squashed into the merge
automatically. See

  https://github.com/git/git/blob/todo/Reintegrate#L185-L191

> 1b) If it is a limitation/bug, what would be needed to fix it? With
> some guidance, I might be able to submit a patch...

As far as I know, something like the Reintegrate script above is the
state of the art. IMHO it would be useful if something similar were
integrated into rerere, but I'm not sure exactly how it would know when
to trigger.

> 2) In the meantime, is there a way I can identify these cases, without
> which I cannot really trust rerere is doing the right thing?

I do think it would be useful if rerere could look at a merge result and
say "OK, I've recorded these bits, but there are other lines that are
not part of either parent and which are not part of a conflict". That
gives you a warning that such lines need to be part of a fixup (rather
than you being surprised when you redo the merge later and have to
rework the fixup).

But I don't think even then you can ever trust rerere fully.
Fundamentally you're applying some changes from one merge into another
context. There may be new sites that also need fixing up, and the tool
has no way to know. So you should treat a rerere-helped merge as any
other merge: assume it's a good starting point but use other tools (like
the compiler or automated tests) to confirm that the result is sensible.

-Peff

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Raman Gupta @ 2017-07-25 19:54 UTC
To: Jeff King; +Cc: git

On 25/07/17 01:52 PM, Jeff King wrote:
> On Tue, Jul 25, 2017 at 11:09:55AM -0400, Raman Gupta wrote:
>
>> I had an interesting situation today: resolving a merge conflict
>> required modification in other files that were not themselves conflicting.
>>
>> I just realized that rerere does not remember any changes to these
>> additional files -- only changes to the conflicting files. This makes
>> the end result of rerere obviously incorrect in this situation.
>>
>> So my questions are:
>>
>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
>
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.
>
> That said, I'm far from an expert on how rerere works. Junio might have
> ideas on how we could handle this better. But I do note that for
> repeated integration runs (like we do for topics in git.git, as they get
> merged to "pu", then "next", then "master"), he keeps non-conflict
> fixups in a separate commit which gets squashed into the merge
> automatically. See
>
>   https://github.com/git/git/blob/todo/Reintegrate#L185-L191

Seems relatively simple to me, at least conceptually.

1) Store the state of the index after the merge.

2) After conflict resolution is complete (i.e. user executes "git
commit"), diff index @ step 1 with commit.

3) Assume that all changes in that diff are related to conflict
resolution (as they should be), and save that diff to the rerere cache.

I could be missing something fundamental here though...

>> 1b) If it is a limitation/bug, what would be needed to fix it? With
>> some guidance, I might be able to submit a patch...
>
> As far as I know, something like the Reintegrate script above is the
> state of the art. IMHO it would be useful if something similar were
> integrated into rerere, but I'm not sure exactly how it would know when
> to trigger.

I've seen the Reintegrate script before. It is very specific to the
git.git workflow. I think it makes sense to expose this particular
capability in git proper, given that rerere itself is exposed. Plus
that could actually simplify the Reintegrate script a bit.

>> 2) In the meantime, is there a way I can identify these cases, without
>> which I cannot really trust rerere is doing the right thing?
>
> I do think it would be useful if rerere could look at a merge result and
> say "OK, I've recorded these bits, but there are other lines that are
> not part of either parent and which are not part of a conflict". That
> gives you a warning that such lines need to be part of a fixup (rather
> than you being surprised when you redo the merge later and have to
> rework the fixup).

Agreed.

> But I don't think even then you can ever trust rerere fully.
> Fundamentally you're applying some changes from one merge into another
> context. There may be new sites that also need fixing up, and the tool
> has no way to know. So you should treat a rerere-helped merge as any
> other merge: assume it's a good starting point but use other tools (like
> the compiler or automated tests) to confirm that the result is sensible.

While I agree that every case cannot be handled, and this type of
validation is still necessary, I believe git should cover the cases
which it is proper and possible to cover.

> -Peff
>

Regards,
Raman

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Jeff King @ 2017-07-25 20:25 UTC
To: Raman Gupta; +Cc: git

On Tue, Jul 25, 2017 at 03:54:32PM -0400, Raman Gupta wrote:

> > That said, I'm far from an expert on how rerere works. Junio might have
> > ideas on how we could handle this better. But I do note that for
> > repeated integration runs (like we do for topics in git.git, as they get
> > merged to "pu", then "next", then "master"), he keeps non-conflict
> > fixups in a separate commit which gets squashed into the merge
> > automatically. See
> >
> >   https://github.com/git/git/blob/todo/Reintegrate#L185-L191
>
> Seems relatively simple to me, at least conceptually.
>
> 1) Store the state of the index after the merge.
>
> 2) After conflict resolution is complete (i.e. user executes "git
> commit"), diff index @ step 1 with commit.
>
> 3) Assume that all changes in that diff are related to conflict
> resolution (as they should be), and save that diff to the rerere cache.
>
> I could be missing something fundamental here though...

That describes the "save" step. How do we do the apply step?

We later do a merge and see that there are some conflicts. We compute a
hash for each conflict, and see if it has a resolution in the database.
But how do we decide which hashes to look up for the parts that are
outside of a conflict?

In theory you want those fixups to "ride along" with the other
conflicts. But which fixups ride along with which conflicts? You really
want some way to uniquely identify a particular merge. But of course you
can't use the content, because the whole point of rerere is that you're
doing a _similar_ merge in another context.

And that's why Reintegrate ties it to the branch being merged. That
gives us the unique identifier. But rerere normally does not concern
itself with that.

I think you could mark the fixups as potential ride-alongs for each
conflict that's resolved. And then when we apply a resolution, tell the
user "by the way, you might want these fixups, too" (or maybe even try
to apply them automatically). That will suggest fixups that may not be
relevant anymore, but if the next "save" includes a particular
resolution but _not_ a potential fixup, then that fixup can be culled
from its list.

There's still a corner case that doesn't handle, though: you might need
fixups even if you have no conflicts at all.

So we may be stumbling towards an answer, but it still sounds kind of
hacky and fragile.

> > As far as I know, something like the Reintegrate script above is the
> > state of the art. IMHO it would be useful if something similar were
> > integrated into rerere, but I'm not sure exactly how it would know when
> > to trigger.
>
> I've seen the Reintegrate script before. It is very specific to the
> git.git workflow. I think it makes sense to expose this particular
> capability in git proper, given that rerere itself is exposed. Plus
> that could actually simplify the Reintegrate script a bit.

Right, sorry if I wasn't clear; my "something similar" meant only the
fixup-applying lines I linked to.

> > But I don't think even then you can ever trust rerere fully.
> > Fundamentally you're applying some changes from one merge into another
> > context. There may be new sites that also need fixing up, and the tool
> > has no way to know. So you should treat a rerere-helped merge as any
> > other merge: assume it's a good starting point but use other tools (like
> > the compiler or automated tests) to confirm that the result is sensible.
>
> While I agree that every case cannot be handled, and this type of
> validation is still necessary, I believe git should cover the cases
> which it is proper and possible to cover.

Yes, I agree that some help is usually better than none (although we
need to be careful that any improvement is honest about its limitations
for any given case, and doesn't end up tricking the user into a false
sense of security).

-Peff

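A minimal sketch of the "ride-along" bookkeeping described in the message
above, written in Python only to make the save, suggest and cull steps
concrete. Nothing here is rerere's actual storage or API: the class name,
the method names, and the string ids for conflicts and fixups are all
hypothetical, and culling by set intersection is one assumed reading of
"that fixup can be culled from its list".

```python
class RideAlongStore:
    """Hypothetical store: conflict id -> set of candidate fixup ids."""

    def __init__(self):
        self.candidates = {}

    def save(self, conflict_ids, fixup_ids):
        """Record one finished merge: which conflicts it resolved and which
        non-conflict fixups it also contained."""
        fixups = set(fixup_ids)
        for cid in conflict_ids:
            if cid in self.candidates:
                # The conflict was seen before: cull candidates that did
                # not ride along this time (assumed culling rule).
                self.candidates[cid] &= fixups
            else:
                self.candidates[cid] = set(fixups)

    def suggest(self, conflict_ids):
        """Return fixups that might be wanted for the given conflicts."""
        suggested = set()
        for cid in conflict_ids:
            suggested |= self.candidates.get(cid, set())
        return suggested


store = RideAlongStore()
store.save(["c1", "c2"], ["f1", "f2"])  # first merge: both fixups ride along
store.save(["c1"], ["f1"])              # later merge: f2 was not needed with c1
```

Calling `store.suggest([])` returns an empty set, which is exactly the
corner case raised above: a merge with no textual conflicts has nothing to
look up, so no fixup is ever suggested for it.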
* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Junio C Hamano @ 2017-07-25 20:26 UTC
To: Jeff King; +Cc: Raman Gupta, git

Jeff King <peff@peff.net> writes:

>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
>
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.

Correct. This is not even a limitation but is outside the scope of
rerere. Let's understand that first.

A semantic conflict that requires an evil merge that touches a file
that is not involved in any textual conflict during a merge will
happen even if there is *NO* textual merge conflict.

Imagine that there is a global variable 'xyzzy' used in many places
in the code, and then a side branch forks from the mainline. The
mainline renames the variable to 'frotz' in the entire codebase,
while the side branch adds one more place that the variable is used
under its original name. Then you merge these two branches.

This will textually merge cleanly if the place the side branch adds
a new mention of 'xyzzy' is textually far from any block of text in
the common ancestor that has been updated on the mainline while
these two branches diverged. "git checkout mainline && git merge
side" will cleanly automerge, yet the result is not correct. The
new mention of 'xyzzy' added by the merge needs to be corrected to
'frotz'.

Now we take that as the baseline and further imagine that during the
time these two branches diverged, the mainline also updated some
documentation of something totally unrelated to 'xyzzy' vs 'frotz'
variable. Perhaps README was updated, or something. The side
branch also updated the same file in a different way. This time,
the changes to this same file may result in textual conflict. "git
checkout mainline && git merge side" will result in a conflict,
whose resolution may be recorded by rerere for that file.

It should be crystal clear that this conflict does *not* have
anything to do with the semantic conflict between 'xyzzy' vs
'frotz'.

The realization we must draw from the above observation is that what
the "merge-fix" machinery in the Reintegrate script you cited in
your message tries to help, which is the semantic conflict,
fundamentally cannot be tied to any textual merge conflict that may
(or may not) happen. That is what makes the issue outside the scope
of rerere.

The above is not to say that the need to record and replay such evil
merges to solve semantic conflict does not exist. Far from it. It
is just clarifying that it is a wrong approach to try to "teach"
rerere to somehow handle that case as well.

If we wanted to port the "merge-fix" logic, and I do wish it would
happen some day, the update belongs to "git merge".

You were too kind to call the "merge-fix" logic in Reintegrate "the
state of the art", but I am not happy about its limitation. Here
are the things I wish to have in an ideal version of the "merge-fix"
logic, which does not exist yet:

 * There is a database of "to be cherry-picked" commits, keyed by a
   pair of branch names. That is, given branches A and B, the
   database will return 0 or more commits that can be cherry-picked.
   The order of branch names in the pair is immaterial, i.e. asking
   the database for cherry-pickable commits keyed by <A, B> and
   keyed by <B, A> will yield the identical set of commits.

 * When merging commit X to commit Y, "git merge" in the ideal world
   does the following:

   - It first does what it currently does, i.e. three-way merge with
     the merge strategy and applying rerere for re-application of
     an earlier resolution of textual conflicts.

   - Then, it looks up the database to find the keys <A, B> where
     A is in X but not in Y, and B is not in X but in Y.
     These commits are cherry-picked and squashed into the result of
     the above.

The intent is that a pair <A, B> represents the mainline and side
branch in the above example, where A renamed 'xyzzy' to 'frotz' and
B added new reference to 'xyzzy'. And the cherry-pickable commit
found in the database is to tweak the 'xyzzy' B adds into 'frotz'.

I said A and B in the above are branch names, but in the ideal
world, they can be commit object names (possibly in the middle of a
branch), as long as we can reliably update the database's keys every
time "git rebase" etc. rewrites commits.

To populate the database, we'd need a reverse.

 * When merging branch B into branch A (or the other way around) for
   the first time, "git merge" would do what it currently does.

 * The user then concludes the merge to resolve *ONLY* the textual
   conflict, and commits the result. It is important that no
   additional evil merge to correct for semantic conflicts is done
   in this step. Note that if the auto-merge cleanly succeeds, this
   step may not even exist.

 * Then the user makes another commit to correct for semantic
   conflicts (aka "merge-fix").

 * Then the user tells Git that semantic conflicts were resolved and
   need to be recorded (just like running "git rerere" manually,
   before "git commit" automatically does it for them these days).
   This will result in the following:

   - The database is updated so that key <A, B> yields the
     "merge-fix" commit;

   - The head is detached at the tip of branch A before the merge;

   - "git merge B" is done again, which _should_ reproduce the
     state immediately after the user committed the "merge-fix";

   - The tip of branch A is reset to the result of the above.

The merge-fix logic in Reintegrate is a poor-man's emulation of the
above ideal. A value its database yields is not a set of commits,
but a single commit, and instead of getting keyed by a pair of
branch names, the database is keyed by a single branch name
(i.e. recording "I had trouble when merging this branch" without
saying "... to the integration branch that already had this other
branch"), so the look-up does not have to do "A is in X but not in
Y, and B is not in X but in Y". It is still usable but the database
needs to be reorganized every time the order of topics merged to 'pu'
is changed.

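The lookup rule in the message above ("A is in X but not in Y, and B is
not in X but in Y", with the key order immaterial) can be modelled in a
few lines of Python. This is only a sketch under stated assumptions: the
class `MergeFixDB`, its methods, and the representation of a history as a
plain set of names are all hypothetical, and no such database exists in
Git.

```python
class MergeFixDB:
    """Hypothetical merge-fix database keyed by an unordered pair <A, B>."""

    def __init__(self):
        self._fixes = {}  # frozenset({A, B}) -> list of fix commit names

    def record(self, a, b, fix_commit):
        if a == b:
            raise ValueError("a merge-fix needs two distinct sides")
        # frozenset makes <A, B> and <B, A> the same key.
        self._fixes.setdefault(frozenset((a, b)), []).append(fix_commit)

    def lookup(self, x_history, y_history):
        """Return fixes whose key has one side only in X and the other
        side only in Y. Histories are sets of names reachable from X, Y."""
        only_x = x_history - y_history
        only_y = y_history - x_history
        found = []
        for key, fixes in self._fixes.items():
            a, b = tuple(key)
            if (a in only_x and b in only_y) or (a in only_y and b in only_x):
                found.extend(fixes)
        return found


db = MergeFixDB()
# "rename" stands for the side that renamed xyzzy to frotz,
# "new-use" for the side that added a new mention of xyzzy.
db.record("rename", "new-use", "fix-xyzzy-to-frotz")

mainline = {"base", "rename"}
side = {"base", "new-use"}
fixes_for_merge = db.lookup(mainline, side)
```

If one side already contains both halves of the key (for example an
integration branch that merged both topics earlier), neither half is
exclusive to the other side and the lookup returns nothing, which matches
the intent that the fix was applied when the two first met.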
* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Junio C Hamano @ 2017-07-25 20:40 UTC
To: Jeff King; +Cc: Raman Gupta, git

Junio C Hamano <gitster@pobox.com> writes:

> To populate the database, we'd need a reverse.
> ...
> * Then the user tells Git that semantic conflicts were resolved and
>   need to be recorded (just like running "git rerere" manually,
>   before "git commit" automatically does it for them these days).
>   This will result in the following:
>
>   - The database is updated so that key <A, B> yields the
>     "merge-fix" commit;
> ...

I probably should have been aiming for the stars, as I was outlining
the ideal merge-fix logic. The key <A, B> is merely a default, and
the worst one at that. There should be a way for the user to specify
the exact pair of commits (i.e. the commit on another side branch,
merged earlier to the mainline A, that renamed 'xyzzy' to 'frotz'
wholesale, and the exact commit on the side branch B that added an
extra mention of 'xyzzy').

If the logic can figure out what these two commits are without
user's help, mechanically by only looking at the merge-fix commit,
that would be even better. But I do not believe in miracles, so...

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Jeff King @ 2017-07-25 20:58 UTC
To: Junio C Hamano; +Cc: Raman Gupta, git

On Tue, Jul 25, 2017 at 01:26:34PM -0700, Junio C Hamano wrote:

> This is not even a limitation but is outside the scope of rerere.
> Let's understand that first.
> [...]
> If we wanted to port the "merge-fix" logic, and I do wish it would
> happen some day, the update belongs to "git merge".

Looks like this crossed with my latest email. Overall I agree with you.

I almost said the same thing about scope initially, but I decided it
doesn't really matter. From the user's perspective there may be a tool X
that replays bits of a previous merge result. And that task can be
subdivided into replaying conflict resolution and replaying merge-fixes.

From the user's perspective, calling X "rerere" would probably be OK[1].
But from an implementation perspective (and to keep the existing
plumbing available and unchanged), it probably makes sense to call it
something else, and have it run both rerere and a new plumbing command
to do the merge-fix work (or call it nothing, and assume that users will
either touch the plumbing directly or will use "git merge" to trigger
both).

So if you want to shut down immediately the idea that this would be
bolted onto rerere, I can support that. There are ways of doing it that
would make sense to combine with rerere (like the "tying fixups to
conflict resolution" sketch I gave in the other email), but I agree they
will end up fundamentally hacky (because of the exact "you may not even
have textual conflicts" I mentioned).

The only part I'd disagree with above is that this belongs to git-merge.
I think it should be its own plumbing tool that merge calls alongside
rerere. ;)

>    - Then, it looks up the database to find the keys <A, B> where
>      A is in X but not in Y, and B is not in X but in Y.
>      These commits are cherry-picked and squashed into the result of
>      the above.

I think this is the crux of it. I mentioned in my other email that what
we really want is some way to say "this is roughly the same merge". The
Reintegrate script does it with the topic branch name and an implicit
"merging up to an integration branch".

Not having thought too hard about it yet, this containing relationship
seems like the right direction. I guess you'd do the lookup by computing
the merge-base M of <X,Y> (which we already know anyway), walking M..X
and looking for any entries which mention those commits (in either A or
B slots of the entry), and then similarly narrowing it according to
M..Y.

Hrm. That doesn't quite work, though. Because if your <A,B> are the
merge, then merging a topic to next will get an "A" that is a merge
commit from next. But that commit will never end up in master. What's
causing the conflict is really some "A" that is in the history between
the merge base and "A" (but we don't know which).

So you'd almost have to do an intersection of the left side of
"$(git merge-base A B)..A" with what's in X and Y (with respect to their
merge base). Err, maybe vice versa. But the point is that we're looking
for overlapping set unions, I think, not the presence of particular
tips.

> I said A and B in the above are branch names, but in the ideal
> world, they can be commit object names (possibly in the middle of a
> branch), as long as we can reliably update the database's keys every
> time "git rebase" etc. rewrites commits.

What if instead of commit hashes we used patch ids? There's one trick
there, which is that merges don't have a well-defined patch id. We
could use its actual commit id in that case. That would work OK in
practice for a workflow like git.git's, because the merge commits are
never rewritten. But it would fall down if people do mixed rebases and
merges on their topic branches.

> To populate the database, we'd need a reverse.
>
>  * When merging branch B into branch A (or the other way around) for
>    the first time, "git merge" would do what it currently does.
>
>  * The user then concludes the merge to resolve *ONLY* the textual
>    conflict, and commits the result. It is important that no
>    additional evil merge to correct for semantic conflicts is done
>    in this step. Note that if the auto-merge cleanly succeeds, this
>    step may not even exist.
>
>  * Then the user makes another commit to correct for semantic
>    conflicts (aka "merge-fix").

I think it's asking a lot for users to handle the textual conflicts and
semantic ones separately. It would be nice if we could tell them apart
automatically (and I think we can based on what isn't part of the
conflict resolution).

That still ends up with one giant "fixup" commit. But I don't know how
else you'd do it. I could make several commits, but we still don't know
how to attribute them to anything but the mass <A,B> merge. We don't
know which commits were responsible for which fixups (and I wouldn't
want to ask the user to figure it out), so the best we can do is apply
them all.

-Peff

* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution
From: Junio C Hamano @ 2017-07-26 7:14 UTC
To: Jeff King; +Cc: Raman Gupta, git

Jeff King <peff@peff.net> writes:

> From the user's perspective, calling X "rerere" would probably be OK[1].
> But from an implementation perspective (and to keep the existing
> plumbing available and unchanged), it probably makes sense to call it
> something else, and have it run both rerere and a new plumbing command
> to do the merge-fix work (or call it nothing, and assume that users will
> either touch the plumbing directly or will use "git merge" to trigger
> both).
> ...
> I think it should be its own plumbing tool that merge calls alongside
> rerere. ;)

As long as we use the database keyed with <A,B> and take the merge
base into account, "git am" and "git cherry-pick" would not be able
to use the merge-fix machinery, so in that sense, calling X "rerere"
would not be OK, but I agree with your general sentiment about the
UI visible to the end users.

Just like "rerere" started with a small step to avoid automating
things too much and then later became almost invisible for normal
cases because we managed to automate it reasonably well and
integrate it into mergy operations, we may be able to do the same
for merge-fix machinery. My "this belongs to 'merge'" is primarily
coming from it---it might be reusable in other mergy operations with
some fundamental changes, but I envision that the primary and only
user of that X would initially be 'merge'.

> Not having thought too hard about it yet, this containing relationship
> seems like the right direction. I guess you'd do the lookup by computing
> the merge-base M of <X,Y> (which we already know anyway), walking M..X
> and looking for any entries which mention those commits (in either A or
> B slots of the entry), and then similarly narrowing it according to
> M..Y.

Yes, every time I look at the Reintegrate script, I try to think of
an efficient implementation but do not find anything better than the
left-right walk over X...Y range, so that is my conclusion after
having thought about it very hard as well ;-)

> What if instead of commit hashes we used patch ids?

Now you may be onto something. While we aim at the ultimately
automated ideal that would catch the maximal cases, for the earlier
'xyzzy turns into frotz' example, the minimal cue to identify one
side of the pair that keys the "change this new instance of xyzzy
into frotz, too" merge-fix is a hunk like this:

    -const char *xyzzy;
    +const char *frotz;

It does not matter what other changes also appear in the same
commit, and my original "branch name" is way too broad, and my
previous "ideal" which is "a single problematic commit" is still
broader than necessary. Well, patch-id identifies the entire change
contained in a single commit, so it is still too broad, but if we
can narrow it down to a single hunk like that, perhaps we can use it
as one side of the key.

And the other side of the key is naturally a hunk like this:

    +	printf("%s\n", xyzzy);

i.e. a new use of xyzzy appears where it didn't exist before.

And when we merge two branches, one of which has one half of the key
(i.e. "const char *xyzzy;" turned into "const char *frotz"), and the
other has the other half of the key (i.e. "printf xyzzy" is added),
then we'd apply a patch that tells us to do this:

    -	printf("%s\n", xyzzy);
    +	printf("%s\n", frotz);

i.e. that patch would be the value in the database keyed by the pair
of two previous hunks.

> I think it's asking a lot for users to handle the textual conflicts and
> semantic ones separately. It would be nice if we could tell them apart
> automatically (and I think we can based on what isn't part of the
> conflict resolution).

After thinking about this a bit more, I realized that it is possible
to mechanically sift a human generated resolution that has both
textual conflict resolution (which will be handled by "rerere") and
semantic one (which needs the merge-fix machinery) into two, without
requiring the user to make two separate commits.

The key trick is that "rerere" is capable of recording and replaying
semantic conflict resolution made to a file that happens to have
textual conflict just fine. Because rerere database records the
preimage (i.e. with conflict markers) and postimage (i.e. the final
resolution for the file) as an entire file contents, it can do 3-way
merge for parts of the same file that are away from any conflicted
region.

If you tell contrib/rerere-train.sh to learn from "pu^{/^Merge
branch 'bp/fsmonitor' into pu}", you'll see what I mean.

    $ git show "pu^{/^Merge branch 'bp/fsmonitor' into pu}" compat/bswap.h

shows the result of such resolution. Early part of the combined
diff shown by the above command comes from textual conflict
resolution, but there is a new implementation of inline version of
get_be64 that has become necessary but did not exist in either of
the branches getting merged, which is shown as an evil merge.

So the way to automatically sift an existing merge into textual
conflict resolution (i.e. automatable with "rerere") and semantic
evil merge-fix is to first try to recreate the merge and enumerate
the paths that get conflicts. Then resolve these paths by grabbing
the resolved result for these paths that get textual conflicts out
of the original merge. That gives us the textual part (we can run
"git rerere" at this point to store the resolution away to the
rerere database---which is essentially what contrib/rerere-train
does).

There will be difference between the result of the above and the
original merge commit. The paths that are different are all outside
these conflicted paths, and they are what we want in the second
commit, i.e. the semantic evil merge-fix, which will be the value
for the merge-fix database.

So that part is easily automatable.

It is much harder to come up with a way to index into the merge-fix
database. We can "punt" and use the same index scheme as the
Reintegrate script (i.e. key it with the name of the branch being
merged, which can be read from the original merge) as the initial
approximation, and that would already be much better than what
Reintegrate script currently does, in that the recording part is
much more automated (in the workflow I use the Reintegrate script,
the side that records a merge-fix initially is entirely manual).

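The sifting procedure described in the message above can be sketched
with trees modelled as plain Python dicts from path to file content.
This is a toy model, not how Git stores or compares trees: the function
name `sift_merge` and its three arguments are hypothetical, and a real
implementation would re-run the merge and read paths from the original
merge commit rather than take dicts as input.

```python
def sift_merge(automerge, conflicted_paths, original_merge):
    """Split an existing merge into a textual part and a merge-fix part.

    automerge:        path -> content after re-creating the merge; the
                      conflicted paths still hold conflict markers.
    conflicted_paths: paths that got textual conflicts in that re-run.
    original_merge:   path -> content in the merge the human committed.
    """
    # Step 1: resolve only the conflicted paths, by taking their final
    # content from the original merge. This is the part rerere can learn,
    # including any semantic tweaks made inside those same files.
    textual = dict(automerge)
    for path in conflicted_paths:
        if path in original_merge:
            textual[path] = original_merge[path]
        else:
            textual.pop(path, None)  # the resolution deleted the file

    # Step 2: whatever still differs from the original merge lies outside
    # the conflicted paths; that difference is the merge-fix.
    # A value of None means the merge-fix removes the path.
    merge_fix = {}
    for path in set(textual) | set(original_merge):
        if textual.get(path) != original_merge.get(path):
            merge_fix[path] = original_merge.get(path)
    return textual, merge_fix


automerge = {
    "README": "<<<<<<< ours\nold\n=======\nnew\n>>>>>>> theirs\n",
    "vars.h": "const char *frotz;\n",
    "main.c": 'printf("%s\\n", xyzzy);\n',
}
original = {
    "README": "merged text\n",
    "vars.h": "const char *frotz;\n",
    "main.c": 'printf("%s\\n", frotz);\n',
}
textual, merge_fix = sift_merge(automerge, {"README"}, original)
```

In this example `merge_fix` ends up holding only `main.c`, the file that
never conflicted textually but still had to change from 'xyzzy' to
'frotz'.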
* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution 2017-07-26 7:14 ` Junio C Hamano @ 2017-07-27 21:01 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2017-07-27 21:01 UTC (permalink / raw) To: Jeff King; +Cc: Raman Gupta, git Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > >> From the user's perspective, calling X "rerere" would probably be OK[1]. >> But from an implementation perspective (and to keep the existing >> plumbing available and unchanged), it probably makes sense to call it >> something else, and have it run both rerere and a new plumbing command >> to do the merge-fix work (or call it nothing, and assume that users will >> either touch the plumbing directly or will use "git merge" to trigger >> both). >> ... >> I think it should be its own plumbing tool that merge calls alongside >> rerere. ;) > > As long as we use the database keyed with <A,B> and take the merge > base into account, "git am" and "git cherry-pick" would not be able > to use the merge-fix machinery, so in that sense, calling X "rerere" > would not be OK, but I agree with your general sentiment about the > UI visible to the end users. Actually, I guess "cherry-pick" could use it if we think hard and long enough and come up with an ideal scheme to compute the index into the merge-fix database. Imagine this topology: A---o---o---... topic #1 / o---o---o---... mainline \ o---B---o---C---... topic #2 where topic #1 renames 'xyzzy' to 'frotz' at commit A, and topic #2 adds a new mention of 'xyzzy' in file F at commit B and another in file E at commit C. In the ideal world, we would have two merge-fix database entries, one that turns 'xyzzy' in file F to 'frotz' that is keyed by the pair of commits <A,B>, and the other that does the same in file E that is keyed by <A,C>. 
When merging the topic #1 and the topic #2 together, or when merging the topic #2 to a mainline that already has merged the topic #1, the merge-fix machinery notices that one side has A but not B nor C, and the other side has B and C but not A, and finds these two merge-fixes and applies them on top of the textual merge.

If we are cherry-picking C to something that already has A, then we should be able to notice that the history that receives the cherry-pick has A but not C, and C, which is being picked, does not have A, and decide that merge-fix <A,C> is relevant.

If we do this purely with commit object names, it will still not work if we cherry-pick A to mainline and then we cherry-pick C. The mainline may have the change from A but does not have the exact commit A.

Which brings us back to your earlier idea to use something like patch-id to identify these individual changes. I am not sure how we can structure the merge-fix database so that we can efficiently find which "changes" are already on a branch.

^ permalink raw reply [flat|nested] 10+ messages in thread
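The lookup rule in this message (a fix keyed by a pair <A,B> is relevant when one side has A but not B and the other has B but not A) can be sketched as follows. This is a toy model under stated assumptions, not an existing Git facility: the database is a plain dictionary keyed by pairs of commit names, each side is given as a set of commit names, and the function name `relevant_fixes` is hypothetical.

```python
def relevant_fixes(db, receiving, incoming):
    """Return the (key, fix) entries of db that apply to this merge or cherry-pick.

    db:        dict mapping (a, b) commit-name pairs to a merge-fix (assumed layout)
    receiving: set of commit names in the history being merged into
    incoming:  set of commit names being brought in (for a cherry-pick,
               just the commit being picked)
    """
    def only_on(name, side, other):
        return name in side and name not in other

    hits = []
    for (a, b), fix in sorted(db.items()):
        forward = only_on(a, receiving, incoming) and only_on(b, incoming, receiving)
        backward = only_on(a, incoming, receiving) and only_on(b, receiving, incoming)
        if forward or backward:
            hits.append(((a, b), fix))
    return hits


# The two entries from the topology above.
db = {
    ("A", "B"): "file F: xyzzy -> frotz",
    ("A", "C"): "file E: xyzzy -> frotz",
}

merge_hits = relevant_fixes(db, {"A"}, {"B", "C"})    # merging topic #2 into a history with A
pick_hits = relevant_fixes(db, {"A"}, {"C"})          # cherry-picking C onto a history with A
rewritten_hits = relevant_fixes(db, {"A'"}, {"C"})    # A was itself cherry-picked as A'
```

The last call returns an empty list, which is the limitation noted above: matching on commit object names alone misses a cherry-picked copy of A, which is why an identifier along the lines of patch-id comes up.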
* Re: [RFC] Git rerere and non-conflicting changes during conflict resolution 2017-07-25 20:58 ` Jeff King 2017-07-26 7:14 ` Junio C Hamano @ 2017-07-26 8:06 ` Junio C Hamano 1 sibling, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2017-07-26 8:06 UTC (permalink / raw) To: Jeff King; +Cc: Raman Gupta, git Jeff King <peff@peff.net> writes: > Hrm. That doesn't quite work, though. Because if your <A,B> are the > merge, then merging a topic to next will get an "A" that is a merge > commit from next. But that commit will never end up in master. What's > causing the conflict is really some "A" that is in the history between > the merge base and "A" (but we don't know which). There may be a misunderstanding. When I said the key <A,B> is a pair of branch names, I didn't mean 'A' to be the name of an integration branch (e.g. 'pu') and 'B' to be the name of a topic. Rather, both 'A' and 'B' are the names of topic branches. IOW, instead of having refs/merge-fix/sd/branch-copy that says "I know when I merge sd/branch-copy to pu or jch, there is a semantic conflict with some unnamed topic that is likely to be already in there", i.e. keying with only a single topic name, the ideal I presented would say 'sd/branch-copy and mh/packed-ref-store topics are both by themselves OK, but when merged together, the end result of textual merge needs to be further fixed up by cherry-picking this change', by keying a change with a pair of topic names, sd/branch-copy (which introduces a new method in the ref backend vtable) and mh/packed-ref-store (which adds a new ref backend). The latter does not know the need for the new method, and the former does not know the need to implement its new method in a new backend, so a merge needs a trivial implementation of the new method added to the new backend, which is what refs/merge-fix/sd/branch-copy does. 
And better yet, instead of A=sd/branch-copy B=mh/packed-ref-store, we could point at the exact commit on each of these branches that introduces the semantic conflict. I would probably pick these two

A=52d59cc6 ("branch: add a --copy (-c) option to go with --move (-m)", 2017-06-18)
B=67be7c5a ("packed-backend: new module for handling packed references", 2017-06-23)

so when we are on commit X that has A but not B, and are trying to merge branch Y that has B but not A, we want the merge-fix to kick in. Walking "rev-list --left-right X...Y" and noticing A and B in the output would be a way to notice it.

[footnote]

*1* https://github.com/gitster/git/ should mirror these refs in the refs/merge-fix/ hierarchy mentioned in the body of this article.

^ permalink raw reply [flat|nested] 10+ messages in thread
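The check suggested at the end of this message can be sketched as a small parser over the output of "git rev-list --left-right X...Y", which prefixes each commit with "<" (reachable only from X) or ">" (reachable only from Y). The sketch takes the output as a string instead of running Git, matches abbreviated names by prefix, and uses made-up object names; the function names are hypothetical.

```python
def sides_from_left_right(output):
    """Split `git rev-list --left-right X...Y` output into (left, right) name sets."""
    left, right = set(), set()
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        marker, rest = line[0], line[1:].split()
        if not rest:
            continue
        if marker == "<":
            left.add(rest[0])
        elif marker == ">":
            right.add(rest[0])
    return left, right


def has_commit(names, abbrev):
    """True if any full name in `names` starts with the abbreviated name."""
    return any(name.startswith(abbrev) for name in names)


def fix_applies(output, a, b):
    """A fix keyed <a,b> applies when a and b sit on opposite sides of the range."""
    left, right = sides_from_left_right(output)
    return (has_commit(left, a) and has_commit(right, b)) or (
        has_commit(right, a) and has_commit(left, b)
    )


# Hypothetical output; these object names are invented for illustration.
sample = "<1111aaaa0000\n<3333cccc0000\n>2222bbbb0000\n"
applies = fix_applies(sample, "1111aaaa", "2222bbbb")
same_side = fix_applies(sample, "1111aaaa", "3333cccc")
```

Like the lookup by commit name discussed earlier in the thread, this only notices the exact commits A and B, not rewritten copies of them.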
end of thread, other threads:[~2017-07-27 21:01 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-07-25 15:09 [RFC] Git rerere and non-conflicting changes during conflict resolution Raman Gupta 2017-07-25 17:52 ` Jeff King 2017-07-25 19:54 ` Raman Gupta 2017-07-25 20:25 ` Jeff King 2017-07-25 20:26 ` Junio C Hamano 2017-07-25 20:40 ` Junio C Hamano 2017-07-25 20:58 ` Jeff King 2017-07-26 7:14 ` Junio C Hamano 2017-07-27 21:01 ` Junio C Hamano 2017-07-26 8:06 ` Junio C Hamano