Current state / standard advice for rebasing merges without information loss/re-entry?

git@vger.kernel.org mailing list mirror (one of many)

* Current state / standard advice for rebasing merges without information loss/re-entry?
@ 2022-04-18 11:56 Tao Klerks
  2022-04-18 14:26 ` Philip Oakley
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Tao Klerks @ 2022-04-18 11:56 UTC (permalink / raw)
  To: git

Hi folks,

The discussion around Edmundo Carmona Antoranz's recent "git replay"
proposal ([1]) led me down a rabbit-hole reminding me I really don't
understand where we stand with rebasing merges, and I don't think I'm
alone.

I understand the standard advice at the moment to be something like:
---
Use a recent git client, use the '--rebase-merges' option (avoid the
--preserve-merges option if you find it), and re-resolve any textual
and/or semantic conflicts manually (possibly using rerere if you know
what you're doing).
---
Is this correct?

This current state/advice seems... suboptimal, at best, because it
ignores any information encoded in the original merge commit, as
clearly documented in the help. It will often result in you having to
resolve conflicts that you already resolved, *where nothing relevant
to that merge/commit has changed in your rebase*. If you have rerere,
and you know what you are doing, and you were the one that performed
the merge, in this repo, then maybe you're ok; similarly if it's a
clean merge of course.
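
For illustration, the advice above could be scripted roughly as
follows. This is only a minimal sketch: the helper names
rebase_merges_command and run_rebase are hypothetical, and the
upstream name passed in is whatever you are rebasing onto. The
--rebase-merges option and the rerere.enabled / rerere.autoUpdate
settings are existing git features.

```python
import subprocess
from typing import List


def rebase_merges_command(upstream: str, use_rerere: bool = True) -> List[str]:
    """Build the git invocation that rebases onto `upstream` and recreates merges."""
    cmd = ["git"]
    if use_rerere:
        # Let rerere record resolutions and replay ones it has already seen.
        cmd += ["-c", "rerere.enabled=true", "-c", "rerere.autoUpdate=true"]
    cmd += ["rebase", "--rebase-merges", upstream]
    return cmd


def run_rebase(upstream: str) -> int:
    """Run the rebase in the current repository and return git's exit code."""
    # A non-zero exit code usually means git stopped for manual conflict resolution.
    return subprocess.run(rebase_merges_command(upstream)).returncode
```

Note that rerere can only replay resolutions it has already recorded
in this repository, which is exactly the limitation described above.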

Elijah Newren describes this problem/opportunity quite carefully in
[2], and mentions a bunch of WIP that I have a hard time getting my
head around.

Similarly, Sergey Organov refers to a thread/discussion four years ago
[3], largely involving a debate around two implementations (his and
that of Phillip Wood?) that are largely theoretically-equivalent (in a
majority of cases), with a lovely explanation of the theory behind the
proposal by Igor Djordjevic / Buga [4], but that discussion appears to
have dried up; I can't tell whether anything came of it, even if only
a manually-usable "rebase a merge" script.

Finally, Martin von Zweigbergk mentions his git-like VCS [5] which
stores conflict data in some kinds of commit as part of a general
"working state is always committable and auto-committed"
state-management strategy; I may be misunderstanding something, but I
*think* the resulting conflict-resolution information ends up being
reusable in a manner theoretically equivalent to the strategy
described by Buga as referenced above.

These kinds of discussions frequently seem to feature git experts
saying "I have a script for my version of this problem" (Elijah,
Junio, Johannes Schindelin, ...), or even "I have a VCS for this
problem" :), but I seem to be too stupid or impatient to dig
through/understand whether or when these things will work for a
regular joe and how to use them.

The temptation, obviously(?), is to write a "rebase a merge" script to
do something like Sergey Organov's V2 proposal referenced above... but
it feels like I'd be spending a bunch of time and ultimately just
making things worse for the community, rather than better - helping
myself based on my (very limited, but still above average)
understanding of merge mechanics, in a way that leaves the general
public message / status just as unsatisfactory/unhelpful.

Does anyone have an existing simpler answer? Ideally I'm looking for
something like:
---
* When you have a merge in your history, and you are rebasing, follow
steps XXXXXX, involving this publicly available gist, or contrib
script, or experimental flag, and it will probably do what you want.
If there is a (new) conflict when rebasing the merge commit, you can
expect conflicts to be presented as YYYYY, because rebasing a merge in
this "informed" way can fundamentally involve multiple different
steps/phases of conflict resolution - rebase conflicts vs merge
conflicts.
* Something like this will likely be introduced as a new rebase option
in a future release, something like "--reapply-merges", or
"--rebase-merges-better", because it will always require the user to
understand that the three-way conflicts presented as part of such an
"informed" merge rebase are subtly different to regular rebase or
merge conflicts.
---

Is it possible to get that sort of simplistic message for this complex topic?

My apologies if this request is a duplicate - obviously a pointer to
some sort of existing summary would be perfect.

Thanks,
Tao

[1]: https://lore.kernel.org/git/20220413164336.101390-1-eantoranz@gmail.com/
[2]: https://lore.kernel.org/git/CABPp-BE=H-OcvGNJKm2zTvV3jEcUV0L=6W76ctpwOewZg56FKg@mail.gmail.com/
[3]: https://public-inbox.org/git/87r2oxe3o1.fsf@javad.com/
[4]: https://public-inbox.org/git/a0cc88d2-bfed-ce7b-1b3f-3c447d2b32da@gmail.com/
[5]: https://github.com/martinvonz/jj

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 11:56 Current state / standard advice for rebasing merges without information loss/re-entry? Tao Klerks
@ 2022-04-18 14:26 ` Philip Oakley
  2022-04-18 15:48   ` Junio C Hamano
  2022-04-18 16:47 ` Sergey Organov
  2022-04-19  4:24 ` Martin von Zweigbergk
  2 siblings, 1 reply; 14+ messages in thread
From: Philip Oakley @ 2022-04-18 14:26 UTC (permalink / raw)
  To: Tao Klerks, git

A few personal ramblings/comments..
On 18/04/2022 12:56, Tao Klerks wrote:
> Hi folks,
>
> The discussion around Edmundo Carmona Antoranz's recent "git replay"
> proposal ([1]) led me down a rabbit-hole reminding me I really don't
> understand where we stand with rebasing merges, and I don't think I'm
> alone.
My understanding was that 'preserve' may not have retained the expected
history structure. Rebasing merges does better against those history
expectations. Neither addresses the merge resolutions.
>
> I understand the standard advice at the moment to be something like:
> ---
> Use a recent git client, use the '--rebase-merges' option (avoid the
> --preserve-merges option if you find it), and re-resolve any textual
> and/or semantic conflicts manually (possibly using rerere if you know
> what you're doing).

The rerere man page is still magic for me. The UX here could be
improved. (also, could the rerere-train be focussed on each merge?)
> ---
> Is this correct?
>
> This current state/advice seems... suboptimal, at best, because it
> ignores any information encoded in the original merge commit, as
> clearly documented in the help. It will often result in you having to
> resolve conflicts that you already resolved, *where nothing relevant
> to that merge/commit has changed in your rebase*. If you have rerere,
> and you know what you are doing, and you were the one that performed
> the merge, in this repo, then maybe you're ok; similarly if it's a
> clean merge of course.
>
> Elijah Newren describes this problem/opportunity quite carefully in
> [2], and mentions a bunch of WIP that I have a hard time getting my
> head around.
>
> Similarly, Sergey Organov refers to a thread/discussion four years ago
> [3], largely involving a debate around two implementations (his and
> that of Phillip Wood?) that are largely theoretically-equivalent (in a
> majority of cases), with a lovely explanation of the theory behind the
> proposal by Igor Djordjevic / Buga [4], but that discussion appears to
> have dried up; I can't tell whether anything came of it, even if only
> a manually-usable "rebase a merge" script.
>
> Finally, Martin von Zweigbergk mentions his git-like VCS [5] which
> stores conflict data in some kinds of commit as part of a general
> "working state is always committable and auto-committed"
> state-management strategy; I may be misunderstanding something, but I
> *think* the resulting conflict-resolution information ends up being
> reusable in a manner theoretically equivalent to the strategy
> described by Buga as referenced above.
>
> These kinds of discussions frequently seem to feature git experts
> saying "I have a script for my version of this problem" (Elijah,
> Junio, Johannes Schindelin, ...), or even "I have a VCS for this
> problem" :), but I seem to be too stupid or impatient to dig
> through/understand whether or when these things will work for a
> regular joe and how to use them.
>
> The temptation, obviously(?), is to write a "rebase a merge" script to
> do something like Sergey Organov's V2 proposal referenced above... but
> it feels like I'd be spending a bunch of time and ultimately just
> making things worse for the community, rather than better - helping
> myself based on my (very limited, but still above average)
> understanding of merge mechanics, in a way that leaves the general
> public message / status just as unsatisfactory/unhelpful.
>
> Does anyone have an existing simpler answer? Ideally I'm looking for
> something like:

I believe there is a paper that highlights that even diffs aren't
unique. So I don't expect merges to be resolvable in the general case.
It's why we have software engineers ;-)

We also have to distinguish between interactive and automatic rebase,
and how much information could be provided to the user about the
previous merges (in the instruction sheet), and during resolution of a
current merge conflict compared to the prior conflict. The interactive
rebase could allow early resolution guidance from the users (e.g.
highlighting likely semantic conflicts which shouldn't be auto-resolved,
which ones should use rerere, etc.).
> ---
> * When you have a merge in your history, and you are rebasing, follow
> steps XXXXXX, involving this publicly available gist, or contrib
> script, or experimental flag, and it will probably do what you want.
> If there is a (new) conflict when rebasing the merge commit, you can
> expect conflicts to be presented as YYYYY, because rebasing a merge in
> this "informed" way can fundamentally involve multiple different
> steps/phases of conflict resolution - rebase conflicts vs merge
> conflicts.
> * Something like this will likely be introduced as a new rebase option
> in a future release, something like "--reapply-merges", or
> "--rebase-merges-better", because it will always require the user to
> understand that the three-way conflicts presented as part of such an
> "informed" merge rebase are subtly different to regular rebase or
> merge conflicts.
> ---
>
> Is it possible to get that sort of simplistic message for this complex topic?
>
> My apologies if this request is a duplicate - obviously a pointer to
> some sort of existing summary would be perfect.
>
> Thanks,
> Tao
>
> [1]: https://lore.kernel.org/git/20220413164336.101390-1-eantoranz@gmail.com/
> [2]: https://lore.kernel.org/git/CABPp-BE=H-OcvGNJKm2zTvV3jEcUV0L=6W76ctpwOewZg56FKg@mail.gmail.com/
> [3]: https://public-inbox.org/git/87r2oxe3o1.fsf@javad.com/
> [4]: https://public-inbox.org/git/a0cc88d2-bfed-ce7b-1b3f-3c447d2b32da@gmail.com/
> [5]: https://github.com/martinvonz/jj
(I'm away 3 days, hence early comments)
--
Philip


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 14:26 ` Philip Oakley
@ 2022-04-18 15:48   ` Junio C Hamano
  2022-04-18 16:28     ` Philip Oakley
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2022-04-18 15:48 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Tao Klerks, git

Philip Oakley <philipoakley@iee.email> writes:

> The rerere man page is still magic for me. The UX here could be
> improved. (also, could the rerere-train be focussed on each merge?)

I am curious to see a clarification on the question in parentheses.

>> These kinds of discussions frequently seem to feature git experts
>> saying "I have a script for my version of this problem" (Elijah,
>> Junio, Johannes Schindelin, ...), or even "I have a VCS for this
>> problem" :), but I seem to be too stupid or impatient to dig
>> through/understand whether or when these things will work for a
>> regular joe and how to use them.

You shouldn't take that to mean "there already is a script to
satisfy _my_ needs and no improvement is needed"; read it as "we
have a real need that cannot wait for improvements in this area, so
(unfortunately) we have built our workflow around some scripts".
We can use these scripts to learn the workflows that are not yet
directly supported by existing tools like rebase, but other than
that, their presence is not a sign that should discourage you from
improving the standard tools; quite the opposite.


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 15:48   ` Junio C Hamano
@ 2022-04-18 16:28     ` Philip Oakley
  2022-04-18 16:41       ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Philip Oakley @ 2022-04-18 16:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tao Klerks, git

On 18/04/2022 16:48, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.email> writes:
>
>> The rerere man page is still magic for me. The UX here could be
>> improved. (also, could the rerere-train be focussed on each merge?)
> I am curious to see a clarification on the question in parentheses.
>
It was the feeling that rerere-train currently (IIRC) will parse a
whole set of commits & merges to create the rerere database and then try
to apply all the potential resolutions when called upon.

Thus for the 'replay' scenario, it could be that the database is
partitioned and prioritised so that first it applies the resolutions for
that particular merge, then considers previous resolutions, and finally
starts using resolutions that occur later in the series being rebased.

There is also the possibility that the rerere database is updated after
each commit resolution (and especially as merges pass by) so that the
'prior' resolutions are up to date with any of the current semantic
changes, rather than being outdated so could be applied first (i.e. two
rerere changes being applied to the merge..).

So, essentially, it's taking a small part of the rerere-train at each
step in the replay, so that it's more focussed.

Philip

(this all assumes my mental model of the rerere magic is roughly correct ;-)


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 16:28     ` Philip Oakley
@ 2022-04-18 16:41       ` Junio C Hamano
  2022-04-19 15:32         ` Martin von Zweigbergk
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2022-04-18 16:41 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Tao Klerks, git

Philip Oakley <philipoakley@iee.email> writes:

> So, essentially, it's taking a small part of the rerere-train at each
> step in the replay, so that it's more focussed.

As the rerere database is designed to be an O(1) hashtable, having
knowledge of how many other merge conflicts are to be resolved
shouldn't affect the time you need to find the relevant record
to use to help you resolve the conflict you currently see.

That reminds me of one topic.  I often wondered if it were a mistake
that I didn't make the rerere database easily transferable across
repositories (just like "stash cannot be transported via fetch" which
is being worked on recently).  As long as a mergy history that will
need to be recreated later gets transferred to a new repository, it
can be used to "train" the rerere database in the new repository, so
it probably is a much lower priority.

The "git rerere" command, on the other hand, may be in desperate need
of learning a "train" subcommand to officially support it (and
deprecate "contrib/rerere-train.sh").  Especially given that we now
can do the necessary "trial merges" in core, without touching the
working tree or the index, thanks to the "ort" merge backend.

The size of such a project may be appropriate for GSoC (if done the
same way as the script, smudging HEAD, index and the working tree),
or may exceed what is reasonable for GSoC (if done all in-core using
ort machinery).
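
To illustrate the hashtable and "train" ideas above, here is a toy
model only. It is not git's actual on-disk format or conflict
normalization; the names conflict_id and ToyRerere are hypothetical,
and a conflict is reduced to a pair of strings for the two sides.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple

Conflict = Tuple[str, str]  # the two conflicting sides of one hunk


def conflict_id(conflict: Conflict) -> str:
    """Hash a conflict so that swapping its two sides gives the same key."""
    first, second = sorted(conflict)
    payload = first.encode() + b"\0" + second.encode()
    return hashlib.sha1(payload).hexdigest()


class ToyRerere:
    """Dictionary-backed store: lookup cost does not grow with recorded conflicts."""

    def __init__(self) -> None:
        self._db: Dict[str, str] = {}

    def record(self, conflict: Conflict, resolution: str) -> None:
        self._db[conflict_id(conflict)] = resolution

    def lookup(self, conflict: Conflict) -> Optional[str]:
        # Returns None when this conflict has never been resolved before.
        return self._db.get(conflict_id(conflict))

    def train(self, history: Iterable[Tuple[Conflict, str]]) -> None:
        """Prime the store from (conflict, resolution) pairs found in old merges."""
        for conflict, resolution in history:
            self.record(conflict, resolution)
```

In this model, "training" a fresh repository just means replaying the
conflicts and resolutions recoverable from the existing merge history.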


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 11:56 Current state / standard advice for rebasing merges without information loss/re-entry? Tao Klerks
  2022-04-18 14:26 ` Philip Oakley
@ 2022-04-18 16:47 ` Sergey Organov
  2022-04-19 15:24   ` Martin von Zweigbergk
  2022-04-19  4:24 ` Martin von Zweigbergk
  2 siblings, 1 reply; 14+ messages in thread
From: Sergey Organov @ 2022-04-18 16:47 UTC (permalink / raw)
  To: Tao Klerks; +Cc: git

Tao Klerks <tao@klerks.biz> writes:

> Hi folks,
>
> The discussion around Edmundo Carmona Antoranz's recent "git replay"
> proposal ([1]) led me down a rabbit-hole reminding me I really don't
> understand where we stand with rebasing merges, and I don't think I'm
> alone.

Neither do I. The status quo seems to be sub-optimal, or worse. I,
personally, still use a 2-step merge workflow, see below.

>
> I understand the standard advice at the moment to be something like:
> ---
> Use a recent git client, use the '--rebase-merges' option (avoid the
> --preserve-merges option if you find it), and re-resolve any textual
> and/or semantic conflicts manually (possibly using rerere if you know
> what you're doing).
> ---
> Is this correct?
>
> This current state/advice seems... suboptimal, at best, because it
> ignores any information encoded in the original merge commit, as
> clearly documented in the help. It will often result in you having to
> resolve conflicts that you already resolved, *where nothing relevant
> to that merge/commit has changed in your rebase*.

This is IMHO the less important of the 2 drawbacks of this method. The
more important one is that it silently drops user changes, which is a
major deficiency that, e.g., forces me to split my merges into 2
commits: the merge itself (along with formal conflict resolutions) and
the semantic fixes to the merge needed by the project. This is a
constant headache.

[...]

The above deficiency was the main reason for the:

> Similarly, Sergey Organov refers to a thread/discussion four years ago
> [3], largely involving a debate around two implementations (his and
> that of Phillip Wood?) that are largely theoretically-equivalent (in a
> majority of cases), with a lovely explanation of the theory behind the
> proposal by Igor Djordjevic / Buga [4], but that discussion appears to
> have dried up; I can't tell whether anything came of it, even if only
> a manually-usable "rebase a merge" script.

I still hope rebase will finally start to rebase *all* commits, at least
by default, rather than trying to re-create (some of) them out of thin
air.

I'd love to implement that myself, but unfortunately it won't happen any
time soon, sorry.

> Finally, Martin von Zweigbergk mentions his git-like VCS [5] which
> stores conflict data in some kinds of commit as part of a general
> "working state is always committable and auto-committed"
> state-management strategy; I may be misunderstanding something, but I
> *think* the resulting conflict-resolution information ends up being
> reusable in a manner theoretically equivalent to the strategy
> described by Buga as referenced above.

I still think that Git got it right by *not* storing things like that
(e.g., renaming paths / moving contents), so I'd still propose to
*rebase* merge *commits* as *content*, without any additional info being
used, if at all possible. As I wrote in the aforementioned discussion,
we should not confuse "merge-the-process" and "merge-the-result". It's
the latter, the commit, that should be rebased no matter what
particular process has been used to get to this commit, in accordance
with general Git philosophy.

Besides, merge algorithms themselves are subject to change, so a merge
performed 2 years ago might end up being rather different when attempted
with a new algorithm today, rendering information stored from an old
algorithm useless.

That said, I'm not opposed to storing/using additional merge
meta-information in general, but it should be an *option* rather than a
requirement, to only improve otherwise reliable content rebasing
algorithms.

Thanks,
-- Sergey Organov


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 11:56 Current state / standard advice for rebasing merges without information loss/re-entry? Tao Klerks
  2022-04-18 14:26 ` Philip Oakley
  2022-04-18 16:47 ` Sergey Organov
@ 2022-04-19  4:24 ` Martin von Zweigbergk
  2022-04-19  9:49   ` Tao Klerks
  2 siblings, 1 reply; 14+ messages in thread
From: Martin von Zweigbergk @ 2022-04-19  4:24 UTC (permalink / raw)
  To: Tao Klerks; +Cc: git

On Mon, Apr 18, 2022 at 9:52 AM Tao Klerks <tao@klerks.biz> wrote:
>
> Hi folks,
>
> The discussion around Edmundo Carmona Antoranz's recent "git replay"
> proposal ([1]) led me down a rabbit-hole reminding me I really don't
> understand where we stand with rebasing merges, and I don't think I'm
> alone.
>
> I understand the standard advice at the moment to be something like:
> ---
> Use a recent git client, use the '--rebase-merges' option (avoid the
> --preserve-merges option if you find it), and re-resolve any textual
> and/or semantic conflicts manually (possibly using rerere if you know
> what you're doing).
> ---
> Is this correct?
>
> This current state/advice seems... suboptimal, at best, because it
> ignores any information encoded in the original merge commit, as
> clearly documented in the help. It will often result in you having to
> resolve conflicts that you already resolved, *where nothing relevant
> to that merge/commit has changed in your rebase*. If you have rerere,
> and you know what you are doing, and you were the one that performed
> the merge, in this repo, then maybe you're ok; similarly if it's a
> clean merge of course.
>
> Elijah Newren describes this problem/opportunity quite carefully in
> [2], and mentions a bunch of WIP that I have a hard time getting my
> head around.
>
> Similarly, Sergey Organov refers to a thread/discussion four years ago
> [3], largely involving a debate around two implementations (his and
> that of Phillip Wood?) that are largely theoretically-equivalent (in a
> majority of cases), with a lovely explanation of the theory behind the
> proposal by Igor Djordjevic / Buga [4], but that discussion appears to
> have dried up; I can't tell whether anything came of it, even if only
> a manually-usable "rebase a merge" script.
>
> Finally, Martin von Zweigbergk mentions his git-like VCS [5] which
> stores conflict data in some kinds of commit as part of a general
> "working state is always committable and auto-committed"
> state-management strategy;

Just so there's no misunderstanding, the "auto-committed working copy"
idea is not a requirement for storing conflict objects in trees.

> I may be misunderstanding something, but I
> *think* the resulting conflict-resolution information ends up being
> reusable in a manner theoretically equivalent to the strategy
> described by Buga as referenced above.

I think it's more similar to what Elijah suggested, actually. For
example, my VCS lets you rebase a merge commit (the "evil" part of it)
even if you don't rebase all its ancestors. Consider this case:

  X
 /
A---B---C
 \       \
  D---E---F

If you now want to rebase E onto X, and then F onto E' and C, then
Elijah's suggestion (and what my VCS does) will work correctly. If I
understood Sergey's proposal, on the other hand, the utility merge
would bring in the changes from D as well. Or, put another way, that
algorithm is only useful for rebasing "internal" merges, where the
merge commit is being rebased along with both (all of) its legs
(again, if I understood it correctly). With the "rebase changes
compared to auto-merged parents" idea, you can even change the number
of parents of a commit as you rebase it.
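
As a rough illustration of the "rebase changes compared to auto-merged
parents" idea, consider the toy model below. It is a sketch under
strong assumptions, not Elijah's patches or my VCS's implementation:
trees are plain dicts of path to content, merging is done per whole
file, and the names merge3 and rebase_commit are hypothetical.

```python
from typing import Dict

Tree = Dict[str, str]  # toy snapshot: path -> content
CONFLICT = "<<conflict>>"


def merge3(base: Tree, ours: Tree, theirs: Tree) -> Tree:
    """File-level three-way merge: keep whichever side changed, mark conflicts."""
    result: Tree = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            merged = o          # same on both sides, or only ours changed
        elif o == b:
            merged = t          # only theirs changed
        else:
            merged = CONFLICT   # both changed, differently
        if merged is not None:  # None means the file is deleted
            result[path] = merged
    return result


def rebase_commit(old_auto: Tree, new_auto: Tree, commit: Tree) -> Tree:
    """Replay the change 'old_auto -> commit' on top of new_auto."""
    return merge3(old_auto, new_auto, commit)


# The history from the diagram above, one file per commit.
A = {"a.txt": "A"}
B = {**A, "b.txt": "B"}
C = {**B, "c.txt": "C"}
D = {**A, "d.txt": "D"}
E = {**D, "e.txt": "E"}
X = {**A, "x.txt": "X"}
# F merges C and E and also carries a manual ("evil") change.
F = {**merge3(A, C, E), "fix.txt": "semantic fix"}

# Rebase E onto X: for a single parent, the auto-merge is just that parent.
E_new = rebase_commit(D, X, E)
# Rebase F onto E' and C: diff against the old auto-merge, apply to the new one.
F_new = rebase_commit(merge3(A, C, E), merge3(A, C, E_new), F)
```

In this toy run F_new keeps the manual fix but does not bring D's
change back in, even though D itself was never rebased.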




* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-19  4:24 ` Martin von Zweigbergk
@ 2022-04-19  9:49   ` Tao Klerks
  2022-04-19 15:10     ` Martin von Zweigbergk
  0 siblings, 1 reply; 14+ messages in thread
From: Tao Klerks @ 2022-04-19  9:49 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: git

On Tue, Apr 19, 2022 at 6:25 AM Martin von Zweigbergk
<martinvonz@gmail.com> wrote:
>
> Consider this case:
>
>   X
>  /
> A---B---C
>  \       \
>   D---E---F
>
> If you now want to rebase E onto X, and then F onto E' and C, then
> Elijah's suggestion (and what my VCS does) will work correctly. If I
> understood Sergey's proposal, on the other hand, the utility merge
> would bring in the changes from D as well. Or, put another way, that
> algorithm is only useful for rebasing "internal" merges, where the
> merge commit is being rebased along with both (all of) its legs
> (again, if I understood it correctly).
>

FWIW, I don't believe this to be the case. If you rebase E onto X, the
way the "D side" of the merge will be resolved, on X, will be as a
combination of "Addition of X" and "Removal of D" onto the previous E
commit state. The secret sauce in Sergey's approach is the application
of a patch representing the "inverted change" to the "D arm" of the
merge base in the original merge vs the "new D arm" (which happens to
no longer contain D and has X instead - I just have no better way to
refer to it).

I haven't understood or explored Elijah's suggestion (or your
implementation), but based on your description, it sounds like they
end up being equivalent in result, but maybe present any conflicts
differently (as a different patch applying to a different base). I
expect cleanly rebased merges to come out the same, and the same
situations/scenarios to lead to clean merges vs conflicts, but the
presentation of conflicts to likely look different.

That said, I haven't tested them both, I was just hoping for a
"current state of existing merge information reuse for merge-rebasing
users" summary, and it doesn't look like there's one available so far.


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-19  9:49   ` Tao Klerks
@ 2022-04-19 15:10     ` Martin von Zweigbergk
  0 siblings, 0 replies; 14+ messages in thread
From: Martin von Zweigbergk @ 2022-04-19 15:10 UTC (permalink / raw)
  To: Tao Klerks; +Cc: git

On Tue, Apr 19, 2022 at 2:50 AM Tao Klerks <tao@klerks.biz> wrote:
>
> On Tue, Apr 19, 2022 at 6:25 AM Martin von Zweigbergk
> <martinvonz@gmail.com> wrote:
> >
> > Consider this case:
> >
> >   X
> >  /
> > A---B---C
> >  \       \
> >   D---E---F
> >
> > If you now want to rebase E onto X, and then F onto E' and C, then
> > Elijah's suggestion (and what my VCS does) will work correctly. If I
> > understood Sergey's proposal, on the other hand, the utility merge
> > would bring in the changes from D as well. Or, put another way, that
> > algorithm is only useful for rebasing "internal" merges, where the
> > merge commit is being rebased along with both (all of) its legs
> > (again, if I understood it correctly).
> >
>
> FWIW, I don't believe this to be the case. If you rebase E onto X, the
> way the "D side" of the merge will be resolved, on X, will be as a
> combination of "Addition of X" and "Removal of D" onto the previous E
> commit state. The secret sauce in Sergey's approach is the application
> of a patch representing the "inverted change" to the "D arm" of the
> merge base in the original merge vs the "new D arm" (which happens to
> no longer contain D and has X instead - I just have no better way to
> refer to it).

I see, it applies a reversed D onto E before creating the utility
merges. That makes sense.

> I haven't understood or explored Elijah's suggestion (or your
> implementation), but based on your description, it sounds like they
> end up being equivalent in result, but maybe present any conflicts
> differently (as a different patch applying to a different base). I
> expect cleanly rebased merges to come out the same, and the same
> situations/scenarios to lead to clean merges vs conflicts, but the
> presentation of conflicts to likely look different.

Yes, pretty much. I expect there would be some minor differences but I
can't think of an example.


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 16:47 ` Sergey Organov
@ 2022-04-19 15:24   ` Martin von Zweigbergk
  2022-04-19 18:17     ` Sergey Organov
  0 siblings, 1 reply; 14+ messages in thread
From: Martin von Zweigbergk @ 2022-04-19 15:24 UTC (permalink / raw)
  To: Sergey Organov; +Cc: Tao Klerks, git

On Tue, Apr 19, 2022 at 5:25 AM Sergey Organov <sorganov@gmail.com> wrote:
>
> Tao Klerks <tao@klerks.biz> writes:
>
> > Finally, Martin von Zweigbergk mentions his git-like VCS [5] which
> > stores conflict data in some kinds of commit as part of a general
> > "working state is always committable and auto-committed"
> > state-management strategy; I may be misunderstanding something, but I
> > *think* the resulting conflict-resolution information ends up being
> > reusable in a manner theoretically equivalent to the strategy
> > described by Buga as referenced above.
>
> I still think that Git got it right by *not* storing things like that
> (e.g., renaming paths / moving contents),

My VCS doesn't store that either. Maybe you're thinking of Darcs or
Pijul? [1] explains what my VCS stores. FYI, [2] explains other
benefits of first-class conflicts; being able to rebase merge commits
is much less important than the other benefits, IMO (but it's still
important).

> so I'd still propose to
> *rebase* merge *commits* as *content*, without any additional info being
> used, if at all possible.

Rebasing is about applying changes from some commit onto some other
commit, as I'm sure you know. What Elijah and I are proposing is to
consider the changes in the commit to be relative to the auto-merged
parents (regardless of the number of parents - auto-merging a single
parent commit just yields that commit), although I don't think Elijah
phrased it that way.
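
A minimal sketch of that idea on toy data, not taken from Git or from any
real tool: trees are plain dicts of path -> content, `merge3` is a
hypothetical file-level three-way merge, and `rebase_merge` is an
illustrative name. It also assumes the merge base of the two sides stays
the same after one side is rebased. The merge commit's "change" is the
difference between the auto-merge of its parents and the recorded merge
tree, and that change is replayed onto the auto-merge of the new parents.

```python
from typing import Dict, Optional, Tuple, Union

# A toy tree maps path -> content; a conflicted path holds a tagged tuple.
Conflict = Tuple[str, Optional[str], Optional[str]]
Tree = Dict[str, Union[str, Conflict]]


def merge3(base: Tree, ours: Tree, theirs: Tree) -> Tree:
    """File-level three-way merge (hypothetical helper, not Git's algorithm)."""
    merged: Tree = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            picked = o            # both sides agree
        elif o == b:
            picked = t            # only "theirs" changed the path
        elif t == b:
            picked = o            # only "ours" changed the path
        else:
            picked = ("conflict", o, t)  # both changed it differently
        if picked is not None:    # None means the path is absent
            merged[path] = picked
    return merged


def rebase_merge(old_auto: Tree, old_merge: Tree, new_auto: Tree) -> Tree:
    """Replay (old_auto -> old_merge) on top of new_auto."""
    return merge3(old_auto, new_auto, old_merge)


base = {"f": "1", "g": "1"}
side_a = {"f": "A", "g": "1"}
side_b = {"f": "B", "g": "1"}
old_auto = merge3(base, side_a, side_b)      # "f" conflicts here
old_merge = {"f": "AB", "g": "1"}            # the user resolved "f" by hand

side_a_rebased = {"f": "A", "g": "2"}        # new upstream touched only "g"
new_auto = merge3(base, side_a_rebased, side_b)
rebased = rebase_merge(old_auto, old_merge, new_auto)
print(rebased)
```

In this toy case the hand-made resolution of "f" carries over without
being re-entered, because the conflict in the new auto-merge is identical
to the one in the old auto-merge, while the unrelated upstream change to
"g" is picked up.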

> As I wrote in the aforementioned discussion,
> we should not confuse "merge-the-process" and "merge-the-result". It's
> the latter, the commit, that should be rebased no matter what
> particular process has been used to get to this commit, in accordance
> with general Git philosophy.
>
> Besides, merge algorithms themselves are subject to change, so a merge
> performed 2 years ago might end up being rather different when attempted
> with a new algorithm today, rendering information stored from an old
> algorithm useless.

I agree with all of that.

[1] https://github.com/martinvonz/jj/blob/main/docs/technical/conflicts.md
[2] https://github.com/martinvonz/jj/blob/main/docs/conflicts.md


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-18 16:41       ` Junio C Hamano
@ 2022-04-19 15:32         ` Martin von Zweigbergk
  2022-04-20  5:43           ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Martin von Zweigbergk @ 2022-04-19 15:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Philip Oakley, Tao Klerks, git

On Tue, Apr 19, 2022 at 6:57 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Philip Oakley <philipoakley@iee.email> writes:
>
> > So, essentially, it's taking a small part of the rerere-train at each
> > step in the replay, so that it's more focussed.
>
> That reminds me of one topic.

And it reminds me of a discussion about first-class conflicts vs
rerere I had recently [1] (Philip's email hasn't been delivered to me
yet). As I wrote there, I think most of rerere's use cases can be
fulfilled by first-class conflicts. I understand that it would be a
huge project (much more than appropriate for GSoC :)) to add such
support to Git. I just want to make sure the project is aware of the
idea.

[1] https://github.com/martinvonz/jj/issues/175#issuecomment-1079831788


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-19 15:24   ` Martin von Zweigbergk
@ 2022-04-19 18:17     ` Sergey Organov
  0 siblings, 0 replies; 14+ messages in thread
From: Sergey Organov @ 2022-04-19 18:17 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Tao Klerks, git

Martin von Zweigbergk <martinvonz@gmail.com> writes:

> On Tue, Apr 19, 2022 at 5:25 AM Sergey Organov <sorganov@gmail.com> wrote:
>>

[...]

>> so I'd still propose to
>> *rebase* merge *commits* as *content*, without any additional info being
>> used, if at all possible.
>
> Rebasing is about applying changes from some commit onto some other
> commit, as I'm sure you know.

Yep.

> What Elijah and I are proposing is to
> consider the changes in the commit to be relative to the auto-merged
> parents (regardless of the number of parents - auto-merging a single
> parent commit just yields that commit), although I don't think Elijah
> phrased it that way.

I admit I didn't put enough thought into this new (to me) idea, but I
can't immediately see advantages of this method. Suppose, for the sake
of the argument, that the merge commit in question has been created
without any use of an auto-merge (whatever it actually means) in the
first place. What's then the reason to consider it to be a diff with
respect to an auto-merge? What advantages would it bring?

Then, do we need to be able to reproduce that exact auto-merge in 2
years from now for the method to work reliably? If so, isn't it a
problem, as we seem to agree that merge algorithms are subject to change
over time?

Essentially, this method apparently still puts a result of particular
procedure at the root of the method, again mixing merge-a-process with
merge-commit-the-result, that to me looks fundamentally flawed. I still
think that at its core Git should remain indifferent to the way a commit
has been created, be it merge or non-merge.

OTOH, the method of rebasing merge commits I've described long ago has
no assumptions about procedures involved in creation of the commit to be
rebased, nor does it need any notion of conflicts being involved in the
process, if any. It simply doesn't care, exactly the same way current
rebase doesn't care, when it rebases non-merge commits, if they were
created, say, using conflicting cherry-picks. What it cares about is
preserving the content by properly applying the recorded changes to the
new base. This property of the method I've suggested makes me believe it
is the best candidate for the core functionality, on top of which other
usable features could evolve.

Anyway, the choice is up to whoever gets time and desire to implement
it, and, not being that guy for now, I'm only looking forward to any
suitable solution for reliable rebasing of merge commits.

Thanks,
-- Sergey Organov


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-19 15:32         ` Martin von Zweigbergk
@ 2022-04-20  5:43           ` Junio C Hamano
  2022-04-20 23:54             ` Martin von Zweigbergk
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2022-04-20  5:43 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Philip Oakley, Tao Klerks, git

Martin von Zweigbergk <martinvonz@gmail.com> writes:

> On Tue, Apr 19, 2022 at 6:57 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Philip Oakley <philipoakley@iee.email> writes:
>>
>> > So, essentially, it's taking a small part of the rerere-train at each
>> > step in the replay, so that it's more focussed.
>>
>> That reminds me of one topic.
>
> And it reminds me of a discussion about first-class conflicts vs
> rerere I had recently [1] (Philip's email hasn't been delivered to me
> yet). As I wrote there, I think most of rerere's use cases can be
> fulfilled by first-class conflicts. I understand that it would be a
> huge project (much more than appropriate for GSoC :)) to add such
> support to Git. I just want to make sure the project is aware of the
> idea.
>
> [1] https://github.com/martinvonz/jj/issues/175#issuecomment-1079831788

I saw that before, but neither of these two "use cases" solves a
problem relevant to what I have to do often.  It may be a case where
you have a hammer while rerere is a screwdriver, perhaps?  Each is
useful in its own ways and is good at different applications.

Rebuilding of 'seen' multiple times every day may superficially be
similar to "test merge" case you mention there, but the desired end
result from keeping multiple topics in master..seen chain, and have
selected ones (not necessarily in the order in 'master..seen')
graduate while keeping others and rebuilding 'seen' with them never
involves artificially linearized history in the end, and that is an
explicit goal---to avoid the last-minute rebasing to the upstream,
which can introduce unnecessary bugs.

When I merge topics from 'seen' to 'next', I first reorder the
topics so that these topics that are planned to be merged to 'next'
come directly on top of the tree that matches 'next' in the
'master..seen' chain, so that the exact state planned to be in
'next' in the next iteration appears in 'seen' and can be tested.  The
merge of these topics to 'next' happens in the next integration
iteration after this preparatory step passes.  It is the same way
when topics that have been cooking in 'next' are (first planned to
and then actually) merged to 'master'.  There is no "final last
minute" rebase involved.

Another thing that I didn't quite see in your "I see rebase as
replaying the change between parent and child" is how different
order of merging is handled.  It often happens that topic A and
topic B have funny interactions, and the resolution rerere records
when I first merge topic A to 'seen' and then topic B (at which time
the conflict we are interested in happens) is later cleanly reused
if topic B turns out to go first long before topic A graduates.
When such a reordering happens, topic B will be merged first
(without causing the conflict between topics A and B), then topic A
is merged.  Dealing with such a reordering of topics was an explicit
goal of 'rerere' and it works reasonably well, but it is not clear
how [1] you cited above handles such a use case.
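
A toy sketch of why a resolution keyed on the conflict itself can survive
such a reordering. This is not rerere's code: `conflict_id` is a
hypothetical helper, and real rerere normalizes and hashes conflicts in
its own way. The only point illustrated is that putting the two sides in
a canonical order before hashing gives the same key whichever topic was
merged first.

```python
import hashlib


def conflict_id(side_one: str, side_two: str) -> str:
    """Key a conflict by its two sides, ignoring which side is 'ours'."""
    first, second = sorted([side_one, side_two])  # canonical order
    digest = hashlib.sha1()
    for part in (first, second):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") != ("a", "bc")
    return digest.hexdigest()


# Topic A merged first, conflict appears when merging topic B ...
a_then_b = conflict_id("limit = 10\n", "limit = 20\n")
# ... or topic B merged first, conflict appears when merging topic A.
b_then_a = conflict_id("limit = 20\n", "limit = 10\n")
print(a_then_b == b_then_a)
```

With such a key, a resolution recorded under one merge order can be
looked up again under the other order.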

Most importantly, at the philosophical level, in order to allow
earlier mistakes to be corrected later, Git tries to avoid casting
heuristic decisions in immutable objects when possible.

Not recording "in this commit, parent and child trees rename path A
to B, combine some contents of path C and D to create a new path E"
and instead computing renames when we actually compare these two
trees, is an example of the application of the philosophy.  It
allows rename detection heuristics at the runtime to improve over
time and a commit you made 5 years ago will be shown better with the
improved rename detection logic.  We do avoid recomputing the same
information over and over again by having long-lived cache data
structures like commit-graph, but they are kept out of the central
data structure and can be reproduced.
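
A rough sketch of the "compute renames when comparing two trees" idea,
under loud assumptions: trees are toy dicts of path -> content,
`detect_renames` is a hypothetical function, and the similarity score
comes from Python's difflib rather than from Git's actual rename
detection. Nothing about the rename is stored in either tree, so the
heuristic (here just a threshold) can change later without touching the
recorded data.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def detect_renames(
    old: Dict[str, str], new: Dict[str, str], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Pair deleted paths with added paths whose contents look similar."""
    deleted = [path for path in old if path not in new]
    added = [path for path in new if path not in old]
    renames: List[Tuple[str, str, float]] = []
    for src in sorted(deleted):
        best_path, best_score = None, threshold
        for dst in sorted(added):
            score = SequenceMatcher(None, old[src], new[dst]).ratio()
            if score >= best_score:
                best_path, best_score = dst, score
        if best_path is not None:
            renames.append((src, best_path, best_score))
            added.remove(best_path)  # each added path is used at most once
    return renames


old_tree = {"A": "alpha\nbeta\ngamma\n"}
new_tree = {"B": "alpha\nbeta\ngamma\ndelta\n"}

loose = detect_renames(old_tree, new_tree)                  # default threshold
strict = detect_renames(old_tree, new_tree, threshold=0.9)  # stricter heuristic
print([(src, dst) for src, dst, _ in loose], strict)
```

The same pair of trees yields a rename under one setting and none under
the other, which is only possible because the answer is computed at
comparison time.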

Keeping the rerere database outside the commit object is another
application of the same philosophy.  There needs to be a clear way to
nuke an earlier recorded resolution that was faulty without having to
rewrite the history, and having it outside the commit object is a
must, and having the database in .git/rr-cache/ is one possible
implementation to achieve that goal.
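
A toy model of that separation, with hypothetical names (`Commit`,
`ResolutionCache`) and an in-memory dict standing in for the on-disk
database; it is not how rerere is implemented. In Git itself the
corresponding operation is `git rerere forget <pathspec>`. The sketch
only shows that dropping a faulty resolution leaves the immutable
history untouched.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Immutable history object; it holds no conflict resolutions."""
    tree: str
    parents: Tuple[str, ...]


class ResolutionCache:
    """Side store for resolutions, kept apart from the commits."""

    def __init__(self) -> None:
        self._by_conflict: Dict[str, str] = {}

    def record(self, conflict_key: str, resolution: str) -> None:
        self._by_conflict[conflict_key] = resolution

    def lookup(self, conflict_key: str) -> Optional[str]:
        return self._by_conflict.get(conflict_key)

    def forget(self, conflict_key: str) -> None:
        self._by_conflict.pop(conflict_key, None)


history = (Commit(tree="t1", parents=()), Commit(tree="t2", parents=("c1",)))
snapshot = tuple(history)

cache = ResolutionCache()
cache.record("conflict-1", "faulty resolution")
cache.forget("conflict-1")          # nuke the bad entry
print(cache.lookup("conflict-1"), history == snapshot)
```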

Thanks.


* Re: Current state / standard advice for rebasing merges without information loss/re-entry?
  2022-04-20  5:43           ` Junio C Hamano
@ 2022-04-20 23:54             ` Martin von Zweigbergk
  0 siblings, 0 replies; 14+ messages in thread
From: Martin von Zweigbergk @ 2022-04-20 23:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Philip Oakley, Tao Klerks, git

On Tue, Apr 19, 2022 at 10:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin von Zweigbergk <martinvonz@gmail.com> writes:
>
> > On Tue, Apr 19, 2022 at 6:57 AM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Philip Oakley <philipoakley@iee.email> writes:
> >>
> >> > So, essentially, it's taking a small part of the rerere-train at each
> >> > step in the replay, so that it's more focussed.
> >>
> >> That reminds me of one topic.
> >
> > And it reminds me of a discussion about first-class conflicts vs
> > rerere I had recently [1] (Philip's email hasn't been delivered to me
> > yet). As I wrote there, I think most of rerere's use cases can be
> > fulfilled by first-class conflicts. I understand that it would be a
> > huge project (much more than appropriate for GSoC :)) to add such
> > support to Git. I just want to make sure the project is aware of the
> > idea.
> >
> > [1] https://github.com/martinvonz/jj/issues/175#issuecomment-1079831788
>
> I saw that before, but neither of these two "use cases" solves a
> problem relevant to what I have to do often.  It may be a case where
> you have a hammer while rerere is a screwdriver, perhaps?  Each is
> useful in its own ways and is good at different applications.

Yes, that's probably true. I understand that there are scenarios that
rerere helps with that first-class conflicts (at least the way I
implemented them) do not.

> Rebuilding of 'seen' multiple times every day may superficially be
> similar to "test merge" case you mention there, but the desired end
> result from keeping multiple topics in master..seen chain, and have
> selected ones (not necessarily in the order in 'master..seen')
> graduate while keeping others and rebuilding 'seen' with them never
> involves artificially linearized history in the end, and that is an
> explicit goal---to avoid the last-minute rebasing to the upstream,
> which can introduce unnecessary bugs.
>
> When I merge topics from 'seen' to 'next', I first reorder the
> topics so that these topics that are planned to be merged to 'next'
> come directly on top of the tree that matches 'next' in the
> 'master..seen' chain, so that the exact state planned to be in
> 'next' in the next iteration appears in 'seen' and can be tested.  The
> merge of these topics to 'next' happens in the next integration
> iteration after this preparatory step passes.  It is the same way
> when topics that have been cooking in 'next' are (first planned to
> and then actually) merged to 'master'.  There is no "final last
> minute" rebase involved.

Thanks for explaining it in such detail. I'm afraid I still don't
understand how it's related to first-class conflicts vs rerere (I've
read the text at least 5 times).

> Another thing that I didn't quite see in your "I see rebase as
> replaying the change between parent and child" is how different
> order of merging is handled.  It often happens that topic A and
> topic B have funny interactions, and the resolution rerere records
> when I first merge topic A to 'seen' and then topic B (at which time
> the conflict we are interested in happens) is later cleanly reused
> if topic B turns out to go first long before topic A graduates.
> When such a reordering happens, topic B will be merged first
> (without causing the conflict between topics A and B), then topic A
> is merged.  Dealing with such a reordering of topics was an explicit
> goal of 'rerere' and it works reasonably well, but it is not clear
> how [1] you cited above handles such a use case.

Good point! That's not a use case I had considered. To make sure I
understand you correctly, the reordering you're talking about is
something like the difference between the following two graphs
(children on top, not on the right).

  N
  |\
  M |
 /| |
X Y Z

  P
 /|
| O
| |\
X Y Z

The problem (for my tool) here is that commit N contains resolutions
for conflicts between X and Z *and* between Y and Z, so when the
merges are done in the opposite order, you'll want to put some of the
conflict resolutions from M in O and some in P. There are commands for
moving changes (including conflict resolutions) between commits, so
you could use that here, but rerere is way smoother since it's
automatic.

> Most importantly, at the philosophical level, in order to allow
> earlier mistakes to be corrected later, Git tries to avoid casting
> heuristic decisions in immutable objects when possible.
>
> Not recording "in this commit, parent and child trees rename path A
> to B, combine some contents of path C and D to create a new path E"
> and instead computing renames when we actually compare these two
> trees, is an example of the application of the philosophy.  It
> allows rename detection heuristics at the runtime to improve over
> time and a commit you made 5 years ago will be shown better with the
> improved rename detection logic.  We do avoid recomputing the same
> information over and over again by having long-lived cache data
> structures like commit-graph, but they are kept out of the central
> data structure and can be reproduced.
>
> Keeping the rerere database outside the commit object is another
> application of the same philosophy.  There needs to be a clear way to
> nuke an earlier recorded resolution that was faulty without having to
> rewrite the history, and having it outside the commit object is a
> must, and having the database in .git/rr-cache/ is one possible
> implementation to achieve that goal.

I agree with all of that. I guess there's some implication about
first-class conflicts vs rerere here too? Is the concern that if you
leave some conflict unresolved for years, it might be that the tool
now could have actually resolved that conflict instead of marking it
as a conflict in a file? So by not being forced to redo the merge, you
are instead trying to resolve an auto-resolvable conflict. Yes, that
is a problem, but it seems very small. I'm probably missing a more
serious problem.



Thread overview: 14+ messages
2022-04-18 11:56 Current state / standard advice for rebasing merges without information loss/re-entry? Tao Klerks
2022-04-18 14:26 ` Philip Oakley
2022-04-18 15:48   ` Junio C Hamano
2022-04-18 16:28     ` Philip Oakley
2022-04-18 16:41       ` Junio C Hamano
2022-04-19 15:32         ` Martin von Zweigbergk
2022-04-20  5:43           ` Junio C Hamano
2022-04-20 23:54             ` Martin von Zweigbergk
2022-04-18 16:47 ` Sergey Organov
2022-04-19 15:24   ` Martin von Zweigbergk
2022-04-19 18:17     ` Sergey Organov
2022-04-19  4:24 ` Martin von Zweigbergk
2022-04-19  9:49   ` Tao Klerks
2022-04-19 15:10     ` Martin von Zweigbergk
