Bring together merge and rebase

git@vger.kernel.org mailing list mirror (one of many)

* Bring together merge and rebase
@ 2017-12-23  6:10 Carl Baldwin
  2017-12-23 18:59 ` Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-23  6:10 UTC (permalink / raw)
  To: Git Mailing List

The big contention among git users is whether to rebase or to merge
changes [2][3] while iterating. I used to firmly believe that merging
was the way to go and rebase was harmful. More recently, I have worked
in some environments where I saw rebase used very effectively while
iterating on changes and I relaxed my stance a lot. Now, I'm on the
fence. I appreciate the strengths and weaknesses of both approaches. I
waffle between the two depending on the situation, the tools being
used, and I guess, to some extent, my mood.

I think what git needs is something brand new that brings the two
together and has all of the advantages of both approaches. Let me
explain what I've got in mind...

I've been calling this proposal `git replay` or `git replace` but I'd
like to hear other suggestions for what to name it. It works like
rebase except with one very important difference. Instead of orphaning
the original commit, it keeps a pointer to it in the commit just like
a `parent` entry but calls it `replaces` instead to distinguish it
from regular history. In the resulting commit history, following
`parent` pointers shows exactly the same history as if the commit had
been rebased. Meanwhile, the history of iterating on the change itself
is available by following `replaces` pointers. The new commit replaces
the old one but keeps it around to record how the change evolved.
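
A minimal sketch of what such a commit could look like, assuming the
pointer is stored as an extra `replaces` header right after the
`committer` line. The header name comes from the proposal; no released
git writes or reads it, so the object is assembled by hand here. The
repository, identity and file name are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one >file && git add file && git commit -q -m "first draft"
old=$(git rev-parse HEAD)

# Rewrite the change the way "git commit --amend" does today.
echo two >file && git add file && git commit -q --amend -m "second draft"

# Re-create the amended commit with a hypothetical "replaces" header
# (an assumption of this sketch) inserted after the committer line.
new=$(git cat-file commit HEAD |
    awk -v old="$old" '
        /^$/ { body = 1 }
        { print }
        !body && /^committer / { print "replaces " old }' |
    git hash-object -t commit -w --stdin)
git update-ref HEAD "$new"

git cat-file commit "$new"   # raw object, including the extra header
git log --format='%h %s'     # parent history shows only the new commit
```

Following `parent` pointers from the result shows only the second
draft; the first draft is named solely by the extra header.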

The git history now has two dimensions. The first shows a cleaned up
history where fix ups and code review feedback have been rolled into
the original changes and changes can possibly be ordered in a nice
linear progression that is much easier to understand. The second
drills into the history of a change. There is no loss and you don't
change history in a way that will cause problems for others who have
the older commits.

Replay handles collaboration between multiple authors on a single
change. This is difficult and prone to accidental loss when using
rebase and it results in a complex history when done with merge. With
replay, collaborators could merge while collaborating on a single
change and a record of each one's contributions can be preserved.
Attempting this level of collaboration caused me many headaches when I
worked with the gerrit workflow (which in many ways, I like a lot).

I blogged about this proposal earlier this year when I first thought
of it [1]. I got busy and didn't think about it for a while. Now with
a little time off of work, I've come back to revisit it. The blog
entry has a few examples showing how it works and how the history will
look in a few examples. Take a look.

Various git commands will have to learn how to handle this kind of
history. For example, things like fetch, push, gc, and others that
move history around and clean out orphaned history should treat
anything reachable through `replaces` pointers as precious. Log and
related history commands may need new switches to traverse the history
differently in different situations. Bisect is an interesting one. I
tend to think that bisect should prefer the regular commit history but
have the ability to drill into the change history if necessary.
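
One way the second dimension could be traversed, sketched with today's
plumbing and assuming a hypothetical `replaces` header stored after
the `committer` line. `amend_replacing` and `change_history` are
illustrative helper names, not git commands.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# amend_replacing MESSAGE: amend HEAD, then record the previous HEAD in
# a hypothetical "replaces" header. Any "replaces" header carried over
# by the amend is dropped so each version names only its predecessor.
amend_replacing() {
    prev=$(git rev-parse HEAD)
    git commit -q --amend --allow-empty -m "$1"
    new=$(git cat-file commit HEAD |
        awk -v old="$prev" '
            /^$/ { body = 1 }
            !body && /^replaces / { next }
            { print }
            !body && /^committer / { print "replaces " old }' |
        git hash-object -t commit -w --stdin)
    git update-ref HEAD "$new"
}

# change_history COMMIT: print COMMIT and every older version of the
# change, following "replaces" instead of "parent".
change_history() {
    c=$1
    while [ -n "$c" ]; do
        git show -s --format='%h %s' "$c"
        c=$(git cat-file commit "$c" |
            sed -n '/^$/q; s/^replaces //p' | head -n 1)
    done
}

git commit -q --allow-empty -m "change, v1"
amend_replacing "change, v2"
amend_replacing "change, v3"

versions=$(change_history HEAD)
printf '%s\n' "$versions"
```

Regular history still contains one commit, while `change_history`
lists all three versions of the change.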

In my opinion, this proposal would bring together rebase and merge in
a powerful way and could end the contention. Thanks for your
consideration.

Carl Baldwin

[1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/
[2] https://git-scm.com/book/en/v2/Git-Branching-Rebasing
[3] http://changelog.complete.org/archives/586-rebase-considered-harmful

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Bring together merge and rebase
  2017-12-23  6:10 Bring together merge and rebase Carl Baldwin
@ 2017-12-23 18:59 ` Ævar Arnfjörð Bjarmason
  2017-12-23 21:01   ` Carl Baldwin
  2017-12-23 22:30   ` Johannes Schindelin
  2017-12-25  3:52 ` Theodore Ts'o
  2017-12-26  4:08 ` Mike Hommey
  2 siblings, 2 replies; 44+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-12-23 18:59 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Git Mailing List


On Sat, Dec 23 2017, Carl Baldwin jotted:

> The big contention among git users is whether to rebase or to merge
> changes [2][3] while iterating. I used to firmly believe that merging
> was the way to go and rebase was harmful. More recently, I have worked
> in some environments where I saw rebase used very effectively while
> iterating on changes and I relaxed my stance a lot. Now, I'm on the
> fence. I appreciate the strengths and weaknesses of both approaches. I
> waffle between the two depending on the situation, the tools being
> used, and I guess, to some extent, my mood.
>
> I think what git needs is something brand new that brings the two
> together and has all of the advantages of both approaches. Let me
> explain what I've got in mind...
>
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.
>
> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.
>
> Replay handles collaboration between multiple authors on a single
> change. This is difficult and prone to accidental loss when using
> rebase and it results in a complex history when done with merge. With
> replay, collaborators could merge while collaborating on a single
> change and a record of each one's contributions can be preserved.
> Attempting this level of collaboration caused me many headaches when I
> worked with the gerrit workflow (which in many ways, I like a lot).
>
> I blogged about this proposal earlier this year when I first thought
> of it [1]. I got busy and didn't think about it for a while. Now with
> a little time off of work, I've come back to revisit it. The blog
> entry has a few examples showing how it works and how the history will
> look in a few examples. Take a look.
>
> Various git commands will have to learn how to handle this kind of
> history. For example, things like fetch, push, gc, and others that
> move history around and clean out orphaned history should treat
> anything reachable through `replaces` pointers as precious. Log and
> related history commands may need new switches to traverse the history
> differently in different situations. Bisect is an interesting one. I
> tend to think that bisect should prefer the regular commit history but
> have the ability to drill into the change history if necessary.
>
> In my opinion, this proposal would bring together rebase and merge in
> a powerful way and could end the contention. Thanks for your
> consideration.
>
> Carl Baldwin
>
> [1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/
> [2] https://git-scm.com/book/en/v2/Git-Branching-Rebasing
> [3] http://changelog.complete.org/archives/586-rebase-considered-harmful

I think this is a worthwhile thing to implement, there are certainly
use-cases where you'd like to have your cake & eat it too as it were,
i.e. have a nice rebased history in "git log", but also have the "raw"
history for all the reasons the fossil people like to talk about, or for
some compliance reasons.

But I don't see why you think this needs a new "replaces" parent pointer
orthogonal to parent pointers, i.e. something that would need to be a
new field in the commit object (I may have misread the proposal, it's
not heavy on technical details).

Consider a merge use case like this:

          A---B---C topic
         /         \
    D---E---F---G---H master

Here we worked on a topic with commits A,B & C, maybe we regret not
squashing B into A, but it gives us the "raw" history. Instead we might
rebase it like this:

          A+B---C topic
         /
    G---H master

Now we can push "topic" to master, but as you've noted this loses the
raw history, but now consider doing this instead:

          A---B---C   A2+B2---C2 topic
         /         \ /
    D---E---F---G---G master

I.e. you could have started working on commit A/B/C, now you "git
replace" them (which would be some fancy rebase alias), and what it'll
do is create a merge commit that entirely resolves the conflict so that
the tree is equivalent to what "master" was already at. Then you rewrite
them and re-apply them on top.

If you run "git log" it will already ignore A,B,C unless you specify
--full-history, so git already knows to ignore these sort of side
histories that result in no changes on the branch they got merged
into. I don't know about bisect, but if it's not doing something similar
already it would be easy to make it do so.

You could even add a new field to the commit object of A2+B2 & C2 which
would be one or more of "replaces <sha1 of A/B/C>", commit objects
support adding arbitrary new fields without anything breaking.

But most importantly, while I think this gives you the same things from
a UX level, it doesn't need any changes to fetch, push, gc or whatever,
since it's all stuff we support today, someone just needs to hack
"rebase" to create this sort of no-op merge commit to take advantage of
it.


* Re: Bring together merge and rebase
  2017-12-23 18:59 ` Ævar Arnfjörð Bjarmason
@ 2017-12-23 21:01   ` Carl Baldwin
  2017-12-23 22:09     ` Ævar Arnfjörð Bjarmason
                       ` (2 more replies)
  2017-12-23 22:30   ` Johannes Schindelin
  1 sibling, 3 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-23 21:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Sat, Dec 23, 2017 at 07:59:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
> I think this is a worthwhile thing to implement, there are certainly
> use-cases where you'd like to have your cake & eat it too as it were,
> i.e. have a nice rebased history in "git log", but also have the "raw"
> history for all the reasons the fossil people like to talk about, or for
> some compliance reasons.

Thank you kindly for your reply. I do think we can have the cake and eat
it too in this case. At a high level, what you describe above is what
I'm after. I'm sorry if I left something out or was unclear. I hoped to
keep my original post brief. Maybe it was too brief to be useful.
However, I'd like to follow up and be understood.

> But I don't see why you think this needs a new "replaces" parent pointer
> orthogonal to parent pointers, i.e. something that would need to be a
> new field in the commit object (I may have misread the proposal, it's
> not heavy on technical details).

Just to clarify, I am proposing a new "replaces" pointer in the commit
object. Imagine starting with rebase exactly as it works today. This new
field would be inserted into any new commit created by a rebase command
to reference the original commit on which it was based. Though, I'm not
sure if it would be better to change the behavior of the existing rebase
command, provide a switch or config option to turn it on, or provide a
new command entirely (e.g. git replay or git replace) to avoid
compatibility issues with the existing rebase.

I imagine that a "git commit --amend" would also insert a "replaces"
reference to the original commit but I failed to mention that in my
original post. The amend use case is similar to adding a fixup commit
and then doing a squash in interactive mode.

> Consider a merge use case like this:
> 
>           A---B---C topic
>          /         \
>     D---E---F---G---H master

This is a bit different than the use cases that I've had in mind. You
show that the topic has already merged to master. I have imagined this
proposal being useful before the topic becomes a part of the master
branch. I'm thinking in the context of something like a github pull
request under active development and review or a gerrit review. So, at
this point, we still look like this:

          A---B---C topic
         /
    D---E---F---G

> Here we worked on a topic with commits A,B & C, maybe we regret not
> squashing B into A, but it gives us the "raw" history. Instead we might
> rebase it like this:
> 
>           A+B---C topic
>          /
>     G---H master

Since H already merged the topic, I'm not sure what the A+B and C
commits are doing.

At the point where I have C and G above, let's say I regret not having
squashed A and B as you suggested. My proposal would end up as I draw
below where the primes are the new versions of the commits (A' is A+B).
Bear with me, I'm not sure of the best way to draw this in ASCII. It has
that orthogonal dimension that makes the ASCII drawings a little more
complex: (I left out the parent of A' which is still E)

       A--B---C
        \ |    \                    <- "replaces" rather than "parent"
         -A'----C' topic
         /
    D---E---F---G master

We can continue by actually changing the base. All of these commits are
kept, I just drop them from the drawings to avoid getting too complex.

                A'--C'
                 \   \              <- "replaces" rather than "parent"
                  A"--C" topic
                 /
    D---E---F---G master

Normal git log operations would ignore them by default. When finally
merging to master, it ends up very simple (by default) but the history
is still there to support archaeological operations.

    D---E---F---G---A"--C" master

> Now we can push "topic" to master, but as you've noted this loses the
> raw history, but now consider doing this instead:
> 
>           A---B---C   A2+B2---C2 topic
>          /         \ /
>     D---E---F---G---G master

There are two Gs in this drawing. Should the second be H? Sorry, I'm
just trying to understand the use case you're describing and I don't
understand it yet which makes it difficult to comment on the rest of
your reply.

> I.e. you could have started working on commit A/B/C, now you "git
> replace" them (which would be some fancy rebase alias), and what it'll
> do is create a merge commit that entirely resolves the conflict so that
> the tree is equivalent to what "master" was already at. Then you rewrite
> them and re-apply them on top.
> 
> If you run "git log" it will already ignore A,B,C unless you specify
> --full-history, so git already knows to ignore these sort of side
> histories that result in no changes on the branch they got merged
> into. I don't know about bisect, but if it's not doing something similar
> already it would be easy to make it do so.

I haven't had the need to use --full-history much. Let me see if I can
play around with it to see if I can figure out how to use it in a way
that gives me what I'm after.

> You could even add a new field to the commit object of A2+B2 & C2 which
> would be one or more of "replaces <sha1 of A/B/C>", commit objects
> support adding arbitrary new fields without anything breaking.
> 
> But most importantly, while I think this gives you the same things from
> a UX level, it doesn't need any changes to fetch, push, gc or whatever,
> since it's all stuff we support today, someone just needs to hack
> "rebase" to create this sort of no-op merge commit to take advantage of
> it.

Avoiding changes would be very nice. I'm not convinced yet that it can
be done but maybe when I understand your counter proposal, it will
become clearer.

Thank you,
Carl Baldwin


* Re: Bring together merge and rebase
  2017-12-23 21:01   ` Carl Baldwin
@ 2017-12-23 22:09     ` Ævar Arnfjörð Bjarmason
  2017-12-26  0:16       ` Carl Baldwin
  2017-12-23 22:19     ` Randall S. Becker
  2017-12-23 23:01     ` Johannes Schindelin
  2 siblings, 1 reply; 44+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-12-23 22:09 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Git Mailing List


On Sat, Dec 23 2017, Carl Baldwin jotted:

> On Sat, Dec 23, 2017 at 07:59:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> I think this is a worthwhile thing to implement, there are certainly
>> use-cases where you'd like to have your cake & eat it too as it were,
>> i.e. have a nice rebased history in "git log", but also have the "raw"
>> history for all the reasons the fossil people like to talk about, or for
>> some compliance reasons.
>
> Thank you kindly for your reply. I do think we can have the cake and eat
> it too in this case. At a high level, what you describe above is what
> I'm after. I'm sorry if I left something out or was unclear. I hoped to
> keep my original post brief. Maybe it was too brief to be useful.
> However, I'd like to follow up and be understood.
>
>> But I don't see why you think this needs a new "replaces" parent pointer
>> orthogonal to parent pointers, i.e. something that would need to be a
>> new field in the commit object (I may have misread the proposal, it's
>> not heavy on technical details).
>
> Just to clarify, I am proposing a new "replaces" pointer in the commit
> object. Imagine starting with rebase exactly as it works today. This new
> field would be inserted into any new commit created by a rebase command
> to reference the original commit on which it was based. Though, I'm not
> sure if it would be better to change the behavior of the existing rebase
> command, provide a switch or config option to turn it on, or provide a
> new command entirely (e.g. git replay or git replace) to avoid
> compatibility issues with the existing rebase.

Yeah that sounds fine, I thought you meant that this "replaces" field
would replace the "parent" field, which would require some rather deep
incompatible changes to all git clients.

But then I don't get why you think fetch/push/gc would need to be
altered. If it's because you thought that adding arbitrary *new* fields
to the commit object would require changes to those, that's not the case.

> I imagine that a "git commit --amend" would also insert a "replaces"
> reference to the original commit but I failed to mention that in my
> original post. The amend use case is similar to adding a fixup commit
> and then doing a squash in interactive mode.
>
>> Consider a merge use case like this:
>>
>>           A---B---C topic
>>          /         \
>>     D---E---F---G---H master
>
> This is a bit different than the use cases that I've had in mind. You
> show that the topic has already merged to master. I have imagined this
> proposal being useful before the topic becomes a part of the master
> branch. I'm thinking in the context of something like a github pull
> request under active development and review or a gerrit review. So, at
> this point, we still look like this:
>
>           A---B---C topic
>          /
>     D---E---F---G

Right, I'm just mentioning this for context, i.e. "if you only used
git-merge".

>> Here we worked on a topic with commits A,B & C, maybe we regret not
>> squashing B into A, but it gives us the "raw" history. Instead we might
>> rebase it like this:
>>
>>           A+B---C topic
>>          /
>>     G---H master
>
> Since H already merged the topic, I'm not sure what the A+B and C
> commits are doing.

This means that master is at commit H, but your newly rebased topic is
at C, i.e. master has no new commits so you could `git push origin
C:master` without -f.
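
The "without -f" condition is a fast-forward check: the remote master
has to be an ancestor of what gets pushed. A minimal sketch using a
throwaway bare repository as `origin`; the directory names and the
empty commits are made up for illustration.

```shell
set -e
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare origin.git
git init -q clone && cd clone
git symbolic-ref HEAD refs/heads/master
git remote add origin ../origin.git

git commit -q --allow-empty -m H
git push -q origin master

# The rebased topic sits directly on top of H.
git checkout -q -b topic
git commit -q --allow-empty -m "A+B"
git commit -q --allow-empty -m C

# Exit status 0 means origin/master is contained in topic, so the
# push below is a plain fast-forward and needs no -f.
pushed=no
if git merge-base --is-ancestor origin/master topic; then
    git push -q origin topic:master
    pushed=yes
fi
echo "pushed: $pushed"
```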

> At the point where I have C and G above, let's say I regret not having
> squashed A and B as you suggested. My proposal would end up as I draw
> below where the primes are the new versions of the commits (A' is A+B).
> Bear with me, I'm not sure of the best way to draw this in ASCII. It has
> that orthogonal dimension that makes the ASCII drawings a little more
> complex: (I left out the parent of A' which is still E)
>
>        A--B---C
>         \ |    \                    <- "replaces" rather than "parent"
>          -A'----C' topic
>          /
>     D---E---F---G master
>
> We can continue by actually changing the base. All of these commits are
> kept, I just drop them from the drawings to avoid getting too complex.
>
>                 A'--C'
>                  \   \              <- "replaces" rather than "parent"
>                   A"--C" topic
>                  /
>     D---E---F---G master
>
> Normal git log operations would ignore them by default. When finally
> merging to master, it ends up very simple (by default) but the history
> is still there to support archaeological operations.
>
>     D---E---F---G---A"--C" master
>
>> Now we can push "topic" to master, but as you've noted this loses the
>> raw history, but now consider doing this instead:
>>
>>           A---B---C   A2+B2---C2 topic
>>          /         \ /
>>     D---E---F---G---G master
>
> There are two Gs in this drawing. Should the second be H? Sorry, I'm
> just trying to understand the use case you're describing and I don't
> understand it yet which makes it difficult to comment on the rest of
> your reply.

Yes this is very confusing, sorry for not clarifying this.

What the letters in *this* diagram actually mean is that they're unique
ids for commits that all resolve to the same tree when given to:

    git rev-parse $commit^{tree}

I.e. you'd merge C into the G commit, and you'd end up with a commit
that would give you the exact same tree, see "ours" under "MERGE
STRATEGIES" in git-merge(1).

You can try to create one of these with:

    (
        rm -rf /tmp/testgit &&
        git clone git@github.com:antirez/rax.git /tmp/testgit &&
        cd /tmp/testgit &&
        git checkout -b wip-rebase master &&
        for f in foo bar baz; do
            echo $f >$f &&
            git add $f &&
            git commit -m"$f"
        done &&
        git checkout master &&
        git merge --no-edit -s ours wip-rebase &&
        git rev-parse origin/master^{tree} &&
        git rev-parse HEAD^{tree}
    )

Note that the output of the two rev-parse commands is the same,
i.e. I've created a bunch of content on a side branch and merged it in,
but due to "-s ours" the end result is exactly the same as if it had
never been merged as far as the content of the tree at HEAD goes.

But I see now that I was wrong/misremembering about --full-history. In
this case if you just run "git log" you'd get those foo/bar/baz changes,
however, if you run:

    git log -- foo

You get nothing, but run:

    git log --full-history -- foo

And you get that no-op merge.
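
The same behaviour can be reproduced without cloning anything. A
minimal local sketch, with a throwaway repository and identity made up
for illustration:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m initial
before=$(git rev-parse 'HEAD^{tree}')

# Side branch with three commits, one new file each.
git checkout -q -b wip-rebase
for f in foo bar baz; do
    echo "$f" >"$f" && git add "$f" && git commit -q -m "$f"
done

# Merge it into master while keeping master's tree untouched.
git checkout -q master
git merge -q --no-edit -s ours wip-rebase
after=$(git rev-parse 'HEAD^{tree}')

# Path-limited log: the default simplification follows only the parent
# the merge is identical to; --full-history walks both sides.
default_log=$(git log --format=%s -- foo)
full_log=$(git log --full-history --format=%s -- foo)

echo "tree before merge: $before"
echo "tree after merge:  $after"
echo "default log for foo: [$default_log]"
echo "full history for foo:"
printf '%s\n' "$full_log"
```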

But in any case, regardless of what the history simplification does
*now*, I was writing under the assumption (see my comment about
push/fetch/gc above) that you were suggesting some deep changes in
how git's object model works.

Instead, if I understand what you're actually trying to do, it could
also be done as:

 1) Just add a new replaces <sha1> field to new commit objects

 2) Make git-rebase know how to write those, e.g. add two of those
    pointing to A & B when it squashes them into AB.

 3) Write a history traversal mechanism similar to --full-history
    that'll ignore any commits on branches that yield no changes, or
    only those whose commits are referenced by this "replaces" field.

You'd then end up with:

 A) A way to "stash" these commits in the permanent history

 B) ... that wouldn't be visible in "git log" by default

 C) Would require no underlying changes to the commit model, i.e. it
    would work with all past & future git clients, if they didn't know
    about the "replaces" field they'd just show more verbose history.
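
Steps 1 to 3 can be approximated with today's plumbing. This is only a
sketch under stated assumptions: the `replaces` header is written by
hand rather than by git-rebase, `list_replaced` is an illustrative
helper name, and the "traversal" is a plain filter over `git rev-list`
that hides only the commits named by a `replaces` header.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m base
git checkout -q -b topic
echo a >a && git add a && git commit -q -m A
echo b >a && git add a && git commit -q -m B
A=$(git rev-parse HEAD~1) B=$(git rev-parse HEAD)

# Keep the raw commits reachable through a no-op merge, then re-apply
# the squashed change on top of it.
git checkout -q master
git merge -q --no-edit -s ours topic
git checkout -q "$B" -- a
git commit -q -m "A+B"

# Steps 1 and 2: add one hypothetical "replaces" header per squashed commit.
squashed=$(git cat-file commit HEAD |
    awk -v a="$A" -v b="$B" '
        /^$/ { body = 1 }
        { print }
        !body && /^committer / { print "replaces " a; print "replaces " b }' |
    git hash-object -t commit -w --stdin)
git update-ref HEAD "$squashed"

# Step 3: collect every id named by a "replaces" header ...
list_replaced() {
    git rev-list HEAD | while read -r c; do
        git cat-file commit "$c" | sed -n '/^$/q; s/^replaces //p'
    done
}
replaced=$(mktemp)
list_replaced >"$replaced"

# ... and hide those commits from the listing.
visible=$(git rev-list HEAD | grep -v -x -F -f "$replaced")
subjects=$(git show -s --format=%s $visible)
printf '%s\n' "$subjects"
```

A client that ignores the header still sees A and B through the merge,
which is the "more verbose history" fallback described in C).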

>> I.e. you could have started working on commit A/B/C, now you "git
>> replace" them (which would be some fancy rebase alias), and what it'll
>> do is create a merge commit that entirely resolves the conflict so that
>> the tree is equivalent to what "master" was already at. Then you rewrite
>> them and re-apply them on top.
>>
>> If you run "git log" it will already ignore A,B,C unless you specify
>> --full-history, so git already knows to ignore these sort of side
>> histories that result in no changes on the branch they got merged
>> into. I don't know about bisect, but if it's not doing something similar
>> already it would be easy to make it do so.
>
> I haven't had the need to use --full-history much. Let me see if I can
> play around with it to see if I can figure out how to use it in a way
> that gives me what I'm after.
>
>> You could even add a new field to the commit object of A2+B2 & C2 which
>> would be one or more of "replaces <sha1 of A/B/C>", commit objects
>> support adding arbitrary new fields without anything breaking.
>>
>> But most importantly, while I think this gives you the same things from
>> a UX level, it doesn't need any changes to fetch, push, gc or whatever,
>> since it's all stuff we support today, someone just needs to hack
>> "rebase" to create this sort of no-op merge commit to take advantage of
>> it.
>
> Avoiding changes would be very nice. I'm not convinced yet that it can
> be done but maybe when I understand your counter proposal, it will
> become clearer.
>
> Thank you,
> Carl Baldwin


* RE: Bring together merge and rebase
  2017-12-23 21:01   ` Carl Baldwin
  2017-12-23 22:09     ` Ævar Arnfjörð Bjarmason
@ 2017-12-23 22:19     ` Randall S. Becker
  2017-12-25 20:05       ` Carl Baldwin
  2017-12-23 23:01     ` Johannes Schindelin
  2 siblings, 1 reply; 44+ messages in thread
From: Randall S. Becker @ 2017-12-23 22:19 UTC (permalink / raw)
  To: 'Carl Baldwin',
	'Ævar Arnfjörð Bjarmason'
  Cc: 'Git Mailing List'

On December 23, 2017 4:02 PM, Carl Baldwin wrote:
> On Sat, Dec 23, 2017 at 07:59:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > I think this is a worthwhile thing to implement, there are certainly
> > use-cases where you'd like to have your cake & eat it too as it were,
> > i.e. have a nice rebased history in "git log", but also have the "raw"
> > history for all the reasons the fossil people like to talk about, or
> > for some compliance reasons.
> 
> Thank you kindly for your reply. I do think we can have the cake and eat it
> too in this case. At a high level, what you describe above is what I'm after.
> I'm sorry if I left something out or was unclear. I hoped to keep my original
> post brief. Maybe it was too brief to be useful.
> However, I'd like to follow up and be understood.
> 
> > But I don't see why you think this needs a new "replaces" parent
> > pointer orthogonal to parent pointers, i.e. something that would need
> > to be a new field in the commit object (I may have misread the
> > proposal, it's not heavy on technical details).
> 
> Just to clarify, I am proposing a new "replaces" pointer in the commit object.
> Imagine starting with rebase exactly as it works today. This new field would
> be inserted into any new commit created by a rebase command to reference
> the original commit on which it was based. Though, I'm not sure if it would
> be better to change the behavior of the existing rebase command, provide a
> switch or config option to turn it on, or provide a new command entirely (e.g.
> git replay or git replace) to avoid compatibility issues with the existing rebase.
> 
> I imagine that a "git commit --amend" would also insert a "replaces"
> reference to the original commit but I failed to mention that in my original
> post. The amend use case is similar to adding a fixup commit and then doing
> a squash in interactive mode.
> 
> > Consider a merge use case like this:
> >
> >           A---B---C topic
> >          /         \
> >     D---E---F---G---H master
> 
> This is a bit different than the use cases that I've had in mind. You show that
> the topic has already merged to master. I have imagined this proposal being
> useful before the topic becomes a part of the master branch. I'm thinking in
> the context of something like a github pull request under active development
> and review or a gerrit review. So, at this point, we still look like this:
> 
>           A---B---C topic
>          /
>     D---E---F---G
> 
> > Here we worked on a topic with commits A,B & C, maybe we regret not
> > squashing B into A, but it gives us the "raw" history. Instead we
> > might rebase it like this:
> >
> >           A+B---C topic
> >          /
> >     G---H master
> 
> Since H already merged the topic, I'm not sure what the A+B and C commits
> are doing.
> 
> At the point where I have C and G above, let's say I regret not having
> squashed A and B as you suggested. My proposal would end up as I draw
> below where the primes are the new versions of the commits (A' is A+B).
> Bear with me, I'm not sure of the best way to draw this in ASCII. It has that
> orthogonal dimension that makes the ASCII drawings a little more
> complex: (I left out the parent of A' which is still E)
> 
>        A--B---C
>         \ |    \                    <- "replaces" rather than "parent"
>          -A'----C' topic
>          /
>     D---E---F---G master
> 
> We can continue by actually changing the base. All of these commits are
> kept, I just drop them from the drawings to avoid getting too complex.
> 
>                 A'--C'
>                  \   \              <- "replaces" rather than "parent"
>                   A"--C" topic
>                  /
>     D---E---F---G master
> 
> Normal git log operations would ignore them by default. When finally
> merging to master, it ends up very simple (by default) but the history is still
> there to support archaeological operations.
> 
>     D---E---F---G---A"--C" master
> 
> > Now we can push "topic" to master, but as you've noted this loses the
> > raw history, but now consider doing this instead:
> >
> >           A---B---C   A2+B2---C2 topic
> >          /         \ /
> >     D---E---F---G---G master
> 
> There are two Gs in this drawing. Should the second be H? Sorry, I'm just
> trying to understand the use case you're describing and I don't
> understand it yet which makes it difficult to comment on the rest of your
> reply.
> 
> > I.e. you could have started working on commit A/B/C, now you "git
> > replace" them (which would be some fancy rebase alias), and what it'll
> > do is create a merge commit that entirely resolves the conflict so
> > that the tree is equivalent to what "master" was already at. Then you
> > rewrite them and re-apply them on top.
> >
> > If you run "git log" it will already ignore A,B,C unless you specify
> > --full-history, so git already knows to ignore these sort of side
> > histories that result in no changes on the branch they got merged
> > into. I don't know about bisect, but if it's not doing something
> > similar already it would be easy to make it do so.
> 
> I haven't had the need to use --full-history much. Let me see if I can play
> around with it to see if I can figure out how to use it in a way that gives me
> what I'm after.
> 
> > You could even add a new field to the commit object of A2+B2 & C2
> > which would be one or more of "replaces <sha1 of A/B/C>", commit
> > objects support adding arbitrary new fields without anything breaking.
> >
> > But most importantly, while I think this gives you the same things
> > from a UX level, it doesn't need any changes to fetch, push, gc or
> > whatever, since it's all stuff we support today, someone just needs to
> > hack "rebase" to create this sort of no-op merge commit to take
> > advantage of it.
> 
> Avoiding changes would be very nice. I'm not convinced yet that it can be
> done but maybe when I understand your counter proposal, it will become
> clearer.

No matter how this plays out, let's please make very sure to provide sufficient user documentation so that those of us who have to explain the differences to users have a decent reference. Even now, explaining rebase vs. merge is difficult enough for people new to git to choose which to use when (sometimes pummeling is involved to get the point across 😉 ), even though it should be intuitive to most of us. I am predicting that adding this capability is going to further confuse the *new* user community a little. Entirely out of enlightened self-interest, I am offering to help document (edits/contributions/whatever) this once we get to that point in development.

Something else to consider is how (or if) this capability is going to be presented in front-ends and in Cloud services. GitK is a given, of course. I'm still impatiently waiting for worktree support from some other front-ends.

Cheers,
Randall

-- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000)
-- In my real life, I talk too much.





* Re: Bring together merge and rebase
  2017-12-23 18:59 ` Ævar Arnfjörð Bjarmason
  2017-12-23 21:01   ` Carl Baldwin
@ 2017-12-23 22:30   ` Johannes Schindelin
  1 sibling, 0 replies; 44+ messages in thread
From: Johannes Schindelin @ 2017-12-23 22:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Carl Baldwin, Git Mailing List


Hi Ævar,

On Sat, 23 Dec 2017, Ævar Arnfjörð Bjarmason wrote:

> On Sat, Dec 23 2017, Carl Baldwin jotted:
> 
> > The big contention among git users is whether to rebase or to merge
> > changes [2][3] while iterating. I used to firmly believe that merging
> > was the way to go and rebase was harmful. More recently, I have worked
> > in some environments where I saw rebase used very effectively while
> > iterating on changes and I relaxed my stance a lot. Now, I'm on the
> > fence. I appreciate the strengths and weaknesses of both approaches. I
> > waffle between the two depending on the situation, the tools being
> > used, and I guess, to some extent, my mood.
> >
> > I think what git needs is something brand new that brings the two
> > together and has all of the advantages of both approaches. Let me
> > explain what I've got in mind...
> >
> > I've been calling this proposal `git replay` or `git replace` but I'd
> > like to hear other suggestions for what to name it. It works like
> > rebase except with one very important difference. Instead of orphaning
> > the original commit, it keeps a pointer to it in the commit just like
> > a `parent` entry but calls it `replaces` instead to distinguish it
> > from regular history. In the resulting commit history, following
> > `parent` pointers shows exactly the same history as if the commit had
> > been rebased. Meanwhile, the history of iterating on the change itself
> > is available by following `replaces` pointers. The new commit replaces
> > the old one but keeps it around to record how the change evolved.
> >
> > The git history now has two dimensions. The first shows a cleaned up
> > history where fix ups and code review feedback have been rolled into
> > the original changes and changes can possibly be ordered in a nice
> > linear progression that is much easier to understand. The second
> > drills into the history of a change. There is no loss and you don't
> > change history in a way that will cause problems for others who have
> > the older commits.
> >
> > Replay handles collaboration between multiple authors on a single
> > change. This is difficult and prone to accidental loss when using
> > rebase and it results in a complex history when done with merge. With
> > replay, collaborators could merge while collaborating on a single
> > change and a record of each one's contributions can be preserved.
> > Attempting this level of collaboration caused me many headaches when I
> > worked with the gerrit workflow (which in many ways, I like a lot).
> >
> > I blogged about this proposal earlier this year when I first thought
> > of it [1]. I got busy and didn't think about it for a while. Now with
> > a little time off of work, I've come back to revisit it. The blog
> > entry has a few examples showing how it works and how the history will
> > look in a few examples. Take a look.
> >
> > Various git commands will have to learn how to handle this kind of
> > history. For example, things like fetch, push, gc, and others that
> > move history around and clean out orphaned history should treat
> > anything reachable through `replaces` pointers as precious. Log and
> > related history commands may need new switches to traverse the history
> > differently in different situations. Bisect is an interesting one. I
> > tend to think that bisect should prefer the regular commit history but
> > have the ability to drill into the change history if necessary.
> >
> > In my opinion, this proposal would bring together rebase and merge in
> > a powerful way and could end the contention. Thanks for your
> > consideration.
> >
> > Carl Baldwin
> >
> > [1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/ [2]
> > https://git-scm.com/book/en/v2/Git-Branching-Rebasing [3]
> > http://changelog.complete.org/archives/586-rebase-considered-harmful
> 
> I think this is a worthwhile thing to implement, there are certainly
> use-cases where you'd like to have your cake & eat it too as it were,
> i.e. have a nice rebased history in "git log", but also have the "raw"
> history for all the reasons the fossil people like to talk about, or for
> some compliance reasons.
> 
> But I don't see why you think this needs a new "replaces" parent pointer
> orthogonal to parent pointers, i.e. something that would need to be a
> new field in the commit object (I may have misread the proposal, it's
> not heavy on technical details).
> 
> Consider a merge use case like this:
> 
>           A---B---C topic
>          /         \
>     D---E---F---G---H master
> 
> Here we worked on a topic with commits A,B & C, maybe we regret not
> squashing B into A, but it gives us the "raw" history. Instead we might
> rebase it like this:
> 
>           A+B---C topic
>          /
>     G---H master
> 
> Now we can push "topic" to master, but as you've noted this loses the
> raw history, but now consider doing this instead:
> 
>           A---B---C   A2+B2---C2 topic
>          /         \ /
>     D---E---F---G---G master
> 
> I.e. you could have started working on commit A/B/C, now you "git
> replace" them (which would be some fancy rebase alias), and what it'll
> do is create a merge commit that entirely resolves the conflict so that
> the tree is equivalent to what "master" was already at. Then you rewrite
> them and re-apply them on top.

1) you just described the "merging rebase" that I have been using in Git
for Windows for *quite* a while (five years or so):

https://github.com/git-for-windows/build-extra/blob/af9cff5005/shears.sh#L12-L18

Just look for commits in https://github.com/git-for-windows/git/commits
whose oneline begins with "Start the merging-rebase".

2) you do not resolve merge conflicts here, as there may not be any.
Instead, you use the "ours" merge strategy.

> If you run "git log" it will already ignore A,B,C unless you specify
> --full-history, so git already knows to ignore these sort of side
> histories that result in no changes on the branch they got merged
> into. I don't know about bisect, but if it's not doing something similar
> already it would be easy to make it do so.

Sadly, it is not as easy as that. When you call "git log", you often want
to know *when* a change was introduced originally. In this case, you would
*not* want A, B nor C ignored, but you would really want to dig into that
history that is ignored by default.

In general, the technique you described (and that I described years before
you, and have employed for years, too, so I actually already have experience
with its pros and cons) works, but leaves quite a bit to be desired.

For example, when anybody asks me "when was XYZ fixed in Git for Windows?"
it is not enough to run `git blame` and then `git name-rev` on the commit
identified by `git blame`: this would be your A2, and there could be any
number of previous iterations *of the same patch*. What I do in this case
is to search for the matching oneline in the full history, from the end.
This is a costly, and not very automatable operation (as there have been
rewordings at times, in particular when the oneline contained a typo).

> You could even add a new field to the commit object of A2+B2 & C2 which
> would be one or more of "replaces <sha1 of A/B/C>", commit objects
> support adding arbitrary new fields without anything breaking.

This is a very fragile way of doing things because you cannot fix rebases
done in the past. Those commits won't have that header, and you cannot put
it there after the fact, not even manually identifying the mapping.

Besides, there are plenty of scenarios when you do not actually want a
merging-rebase, e.g. when you develop a patch series and have to iterate
it over a dozen times until it is finally accepted into core Git. In this
instance, you may want to retain the iterations' commit histories during
the time of the development, but you probably won't need it any longer
after the patches have been integrated into a released version.

Baking those names into the commit object would kind of cause broken links
in such a scenario.

BTW it gets a lot more complicated when you think about

1) fixups and squashes, and

2) the often much more interesting question: *with what commit* was this
one replaced?

> But most importantly, while I think this gives you the same things from
> a UX level, it doesn't need any changes to fetch, push, gc or whatever,
> since it's all stuff we support today, someone just needs to hack
> "rebase" to create this sort of no-op merge commit to take advantage of
> it.

I already did that. You can use the above-linked shears.sh script to
perform such a merging-rebase.

However, my experience is that the lack of UX in Git's tools does hurt at
times; it is incorrect to say that log or blame already support this use
case (see the discussion above).

The most important part would be to record the mapping between old and new
commits; the `post-rewrite` hook would pose a natural way to implement
this: it gets called after a successful rebase, receiving a stream of
lines via stdin of the form `<old-commit> <new-commit>` (listing multiple
lines for the same <new-commit> if there were fixups and/or squashes).
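As a rough illustration of consuming that stream, here is a minimal Python sketch. The function name `parse_post_rewrite` and the abbreviated commit IDs in `sample` are made up for the example; only the `<old-commit> <new-commit>` line format comes from the description above.

```python
from collections import defaultdict


def parse_post_rewrite(lines):
    """Group old commit IDs by the new commit that replaced them."""
    replaced_by = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        old, new = fields[0], fields[1]
        replaced_by[new].append(old)
    return dict(replaced_by)


# Hypothetical input: two commits squashed into one, plus a plain rewrite.
sample = [
    "aaaa111 cccc333\n",
    "bbbb222 cccc333\n",
    "dddd444 eeee555\n",
]
mapping = parse_post_rewrite(sample)
for new, olds in mapping.items():
    print(new, "replaces", ", ".join(olds))
```

In a real hook the lines would come from `sys.stdin` rather than from a hard-coded list.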

This points to another command that absolutely would need patching: `git
pull` (or more correctly: `git merge`). Why, you ask? If you use that
technique (and we do, as I pointed out earlier, in Git for Windows),
contributors *will* start to send you contributions *based on previous
iterations*. If you integrate them ("pull"), the merge may very well
succeed. But the next time you rebase, those commits will not be ordered
correctly, as they are considered older than the patches on which they are
based (which have been rebased in the meantime already).

In Git for Windows, I do have these problems, and they are even worse
there because instead of a linearizing rebase, I recreate the branch
structure instead (see https://github.com/git/git/pull/447 for my upcoming
attempt to implement this functionality directly in core Git, in proper,
performant and portable C). So the base commits are *really* important
information. And I had to work harder in the past to find out what the
newer iteration of the base commit really is.

BTW I would *strongly* suggest using notes instead of a new commit
header. You absolutely do not want to impose this on any user who does not
want it. And notes can be added for already-completed rebases. Even
manually, if need be.
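To make the notes idea a little more concrete, here is a small Python sketch that only builds the `git notes` invocations for a list of old/new pairs without running them. The notes ref name `rewrites`, the `replaces <old>` note text, and the abbreviated commit IDs are all assumptions chosen for illustration, not an existing convention.

```python
def notes_commands(pairs, ref="rewrites"):
    """Build one `git notes append` command per (old, new) commit pair."""
    commands = []
    for old, new in pairs:
        commands.append(
            ["git", "notes", "--ref=" + ref, "append",
             "-m", "replaces " + old, new]
        )
    return commands


# Hypothetical mapping: two old commits were squashed into one new commit.
pairs = [("aaaa111", "cccc333"), ("bbbb222", "cccc333")]
commands = notes_commands(pairs)
for argv in commands:
    print(" ".join(argv))
```

Because the notes live in their own ref, they can be added after the fact for rebases that have already happened, and users who do not want them never have to fetch that ref.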

Ciao,
Johannes


* Re: Bring together merge and rebase
  2017-12-23 21:01   ` Carl Baldwin
  2017-12-23 22:09     ` Ævar Arnfjörð Bjarmason
  2017-12-23 22:19     ` Randall S. Becker
@ 2017-12-23 23:01     ` Johannes Schindelin
  2017-12-24 14:13       ` Alexei Lozovsky
                         ` (2 more replies)
  2 siblings, 3 replies; 44+ messages in thread
From: Johannes Schindelin @ 2017-12-23 23:01 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

Hi Carl,

On Sat, 23 Dec 2017, Carl Baldwin wrote:

> I imagine that a "git commit --amend" would also insert a "replaces"
> reference to the original commit but I failed to mention that in my
> original post.

And cherry-pick, too, of course.

Both of these examples hint at a rather huge urge of some users to turn
this feature off because the referenced commits may very well be
throw-away commits in their case, making the newly-recorded information
completely undesired.

Example: I am working on a topic branch. In the middle, I see a typo. I
commit a fix, continue to work on the topic branch. Later, I cherry-pick
that commit to a separate topic branch because I really don't think that
those two topics are related. Now I definitely do not want a reference of
the cherry-picked commit to the original one: the latter will never be
pushed to a public repository, and gc'ed in a few weeks.

Of course, that is only my wish; other users in similar situations may
want that information. This demonstrates that you would be better served with
an opt-in feature that uses notes rather than a baked-in commit header.

Ciao,
Johannes


* Re: Bring together merge and rebase
  2017-12-23 23:01     ` Johannes Schindelin
@ 2017-12-24 14:13       ` Alexei Lozovsky
  2018-01-04 15:44         ` Johannes Schindelin
  2017-12-25 23:43       ` Carl Baldwin
  2018-01-04 19:49       ` Martin Fick
  2 siblings, 1 reply; 44+ messages in thread
From: Alexei Lozovsky @ 2017-12-24 14:13 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Carl Baldwin, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Dec 24, 2017, at 01:01, Johannes Schindelin wrote:
> 
> Hi Carl,
> 
> On Sat, 23 Dec 2017, Carl Baldwin wrote:
> 
>> I imagine that a "git commit --amend" would also insert a "replaces"
>> reference to the original commit but I failed to mention that in my
>> original post.
> 
> And cherry-pick, too, of course.

Why would it? In my mind, cherry-picking does not 'replace' or 'refine'
commits, it copies them into other, unrelated branches (usually something
like stable branches maintained separately from the mainline). If anything,
cherry-pick could add a separate "cherry-picked from" reference which may
be useful, I guess, for conflict resolution if two branches with the same
commit are merged.

> Of course, that is only my wish, other users in similar situations may
> want that information. Demonstrating that you would be better served with
> an opt-in feature that uses notes rather than a baked-in commit header.

Using notes also makes it possible to test and evaluate this new feature without
any changes to core git, using it as an extension at first.


* Re: Bring together merge and rebase
  2017-12-23  6:10 Bring together merge and rebase Carl Baldwin
  2017-12-23 18:59 ` Ævar Arnfjörð Bjarmason
@ 2017-12-25  3:52 ` Theodore Ts'o
  2017-12-26  1:16   ` Carl Baldwin
  2017-12-27  4:35   ` Carl Baldwin
  2017-12-26  4:08 ` Mike Hommey
  2 siblings, 2 replies; 44+ messages in thread
From: Theodore Ts'o @ 2017-12-25  3:52 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Git Mailing List

On Fri, Dec 22, 2017 at 11:10:19PM -0700, Carl Baldwin wrote:
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.

As a suggestion, before diving into the technical details of your
proposal, it might be useful to consider the usage scenario you are
targeting.  Things like "git rebase" and "git merge" and your
proposed "git replace/replay" are *mechanisms*.

But how they fit into a particular workflow is much more important
from a design perspective, given that there are many different git
workflows, used by different projects and by different developers
within a particular project.

For example, rebase gets used in many different ways, and many of the
debates when people talk about "git rebase" being evil generally
presuppose a particular workflow that the advocate has in mind.
If someone is using git rebase or git commit --amend before git
commits have ever been pushed out to a public repository, or to anyone
else, that's a very different case from one where they have been visible
elsewhere.  Even the most strident advocates of "you must never rewrite
a commit and all history must be preserved" generally don't insist that
every single edit must be preserved on the theory that "all history is
valuable".

> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.

If your goal is to preserve the history of the change, one of the
problems with any git-centric solution is that you generally lose the
code review feedback and the discussions that are involved with a
commit.  Just simply preserving the different versions of the commits
is going to lose a huge amount of the context that makes the history
valuable.

So for example, I would claim that if *that* is your goal, a better
solution is to use Gerrit, so that all of the different versions of
the commits are preserved along with the line-by-line comments and
discussions that were part of the code review.  In that model, each
commit has something like this in the commit trailer:

Change-Id: I8d89b33683274451bcd6bfbaf75bce98

You can then cut and paste the Change-Id into the Gerrit user
interface, and see the different commits and, more importantly, the
discussion surrounding each change.

If the complaint about Gerrit is that it's not a core part of Git, the
challenge is (a) how to carry the code review comments in the git
repository, and (b) do so in a way that doesn't bloat the core
repository, since most of the time, you *don't* want or need to keep a
local copy of all of the code review comments going back since the
beginning of the project.

-------------

Here's another potential use case.  The stable kernels (e.g., 3.18.y,
4.4.y, 4.9.y, etc.) have cherry picks from the upstream kernel,
and this is handled by putting in the commit body something like this:

    [ Upstream commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe ]

----

And here's yet another use case.  For internal Google kernel
development, we maintain a kernel that has a large number of patches
on top of a kernel version.  When we backport an upstream fix (say,
one that first appeared in the 4.12 version of the upstream kernel),
we include a line in the commit body that looks like this:

Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5

This is useful, because when we switch to use a newer upstream kernel,
we need to make sure we can account for all patches that were built on
top of the 3xx kernel (which might have been using 4.10, for the sake
of argument), to the 4xx kernel series (which might be using 4.15 ---
the version numbers have been changed to protect the innocent).  This
means going through each and every patch that was on top of the 3xx
kernel, and if it has a line such as "Upstream-4.12-SHA1", we know
that it will already be included in a 4.15 based kernel, so we don't
need to worry about carrying that patch forward.

In other cases, we might decide that the patch is no longer needed.
It could be because the patch has already been included upstream, in
which case we might check in a commit with an empty patch body, but
whose header contains something like this in the 4xx kernel:

Origin-3xx-SHA1: fe546bdfc46a92255ebbaa908dc3a942bc422faa
Upstream-Dropped-4.11-SHA1: d90dc0ae7c264735bfc5ac354c44ce2e

Or we could decide that the commit is no longer needed ---
perhaps because the relevant subsystem was completely rewritten and
the functionality was added in a different way.  Then we might
just have an empty commit with an explanation of why the commit is no
longer needed and the commit body would have the metadata:

Origin-Dropped-3xx-SHA1: 26f49fcbb45e4bc18ad5b52dc93c3afe

Or perhaps the commit is still needed, and for various reasons the
commit was never upstreamed; perhaps because it's only useful for
Google-specific hardware, or the patch was rejected upstream.  Then we
will have a cherry-pick that would include in the body:

Origin-3xx-SHA1: 8f3b6df74b9b4ec3ab615effb984c1b5

(Note: all commits that are added in the rebase workflow, even the
empty commits that just have the Origin-Dropped-3xx-SHA1 or
Upstream-Dropped-4.11-SHA1 headers, are patch reviewed through Gerrit,
so we have an audited, second-engineer review to make sure each commit
in the 3xx kernel that Google had been carrying had the correct
disposition when rebasing to the 4xx kernel.)
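The dispositions described above can be sketched as a small classifier over trailer keys. This is only an illustration in Python: the function name `disposition` and the label strings are invented here, and the real tooling may well apply different or additional rules.

```python
def disposition(trailer_keys):
    """Guess what happened to a 3xx patch from the trailer keys on its 4xx commit."""
    keys = set(trailer_keys)
    # Check the more specific "Origin-Dropped-" prefix before plain "Origin-".
    if any(k.startswith("Origin-Dropped-") for k in keys):
        return "dropped: no longer needed"
    if any(k.startswith("Upstream-Dropped-") for k in keys):
        return "dropped: already upstream"
    if any(k.startswith("Origin-") for k in keys):
        return "carried forward"
    return "unknown"


print(disposition(["Origin-3xx-SHA1", "Upstream-Dropped-4.11-SHA1"]))
print(disposition(["Origin-Dropped-3xx-SHA1"]))
print(disposition(["Origin-3xx-SHA1"]))
```

A dashboard could then count commits per disposition to show how far each subsystem has progressed.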

The point is that for this much more complex, real-world workflow, we
need much *more* metadata than a simple "Replaces" metadata.  (And we
also have other metadata --- for example, we have a "Tested: " trailer
that explains how to test the commit, or which unit test can and
should be used to test this commit, combined with a link to the test
log in our automated unit tester that has the test run, and a
"Rebase-Tested-4xx: " trailer that might just have the URL to the test
log when the commit was rebased since the testing instructions in the
Tested: trailer are still relevant.)

And since this metadata is not really needed by the core git
machinery, we just use text trailers in the commit body; it's not hard
to write code which parses this out of the git commit.
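For instance, a minimal Python sketch of such a parser might look like the following. It assumes the trailers sit in the last paragraph of the commit message as `Key: value` lines; the function name `parse_trailers` and the sample subject line are hypothetical, while the trailer keys and values are the ones quoted above.

```python
import re

# A trailer line is "Key: value"; keys may contain letters, digits, dots and hyphens.
TRAILER_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9.-]*):\s*(\S.*)$")


def parse_trailers(message):
    """Return (key, value) pairs found in the last paragraph of a commit message."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match.group(1), match.group(2).strip()))
    return trailers


message = """Example subject line (hypothetical)

Some explanation of the change.

Origin-3xx-SHA1: fe546bdfc46a92255ebbaa908dc3a942bc422faa
Upstream-Dropped-4.11-SHA1: d90dc0ae7c264735bfc5ac354c44ce2e
"""
trailers = parse_trailers(message)
for key, value in trailers:
    print(key, "=>", value)
```

Keeping the metadata as plain text means older versions of git need no changes; only the tools that care about the trailers have to understand them.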

> Various git commands will have to learn how to handle this kind of
> history. For example, things like fetch, push, gc, and others that
> move history around and clean out orphaned history should treat
> anything reachable through `replaces` pointers as precious. Log and
> related history commands may need new switches to traverse the history
> differently in different situations.

I'd encourage you to think very hard about how exactly "git log" and
"gitk" might actually deal with these links.  In the Google kernel
development use cases, we use different repos for the 3xx and 4xx
kernels.  It would be possible to make hot links for the
Origin-3xx-SHA1: trailers, but you couldn't do it using gitk.  It
would actually have to be a completely new tool.  (And we do have new
tools, most especially a dashboard so we can keep track of how many
commits in the 3xx kernel still have to be rebased to the 4xx kernel,
or can be confirmed to be in the upstream kernel, or can be confirmed
to be dropped.  We have a *large* number of patches that we carry, so
it's a multi-month effort involving a large number of engineers
working together to do a kernel rebase operation from a 4.x upstream
kernel to a 4.y upstream kernel.  So having a dashboard is useful
because we can see whether a particular subsystem team is ahead or
behind the curve in terms of handling those commits which are their
responsibility.)

My experience, from seeing these much more complex use cases ---
starting with something as simple as the Linux Kernel Stable Kernel
Series, and extending to something much more complex such as the
workflow that is used to support a Google Kernel Rebase, is that using
just a simple extra "Replaces" pointer in the commit header is not
nearly expressive enough.  And, if you make it a core part of the
commit data structure, there are all sorts of compatibility headaches
with older versions of git that wouldn't know about it.  And if it
then turns out it's not sufficient for the more complex workflows
*anyway*, maybe adding a new "replace" pointer in the core git data
structures isn't worth it.  It might be that just keeping such things
as trailers in the commit body might be the better way to go.

Cheers,

						- Ted


* Re: Bring together merge and rebase
  2017-12-23 22:19     ` Randall S. Becker
@ 2017-12-25 20:05       ` Carl Baldwin
  0 siblings, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-25 20:05 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Ævar Arnfjörð Bjarmason',
	'Git Mailing List'

On Sat, Dec 23, 2017 at 05:19:35PM -0500, Randall S. Becker wrote:
> No matter how this plays out, let's please make very sure to provide
> sufficient user documentation so that those of us who have to explain
> the differences to users have a decent reference. Even now, explaining
> rebase vs. merge is difficult enough for people new to git to choose
> which to use when (sometimes pummeling is involved to get the point
> across 😉 ), even though it should be intuitive to most of us. I am
> predicting that adding this capability is going to further confuse the
> *new* user community a little. Entirely out of enlightened
> self-interest, I am offering to help document
> (edits/contributions/whatever) this once we get to that point in
> development.

I agree. I have a feeling that it may take a while for this to play out.
This has been on my mind for a while and I think there will be some more
discussion before anything gets started.

Carl

> Something else to consider is how (or if) this capability is going to
> be presented in front-ends and in Cloud services. GitK is a given, of
> course. I'm still impatiently waiting for worktree support from some
> other front-ends.

It all takes time. :)

> Cheers,
> Randall
> 
> -- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000)
> -- In my real life, I talk too much.


* Re: Bring together merge and rebase
  2017-12-23 23:01     ` Johannes Schindelin
  2017-12-24 14:13       ` Alexei Lozovsky
@ 2017-12-25 23:43       ` Carl Baldwin
  2017-12-26  0:01         ` Randall S. Becker
  2018-01-04 19:49       ` Martin Fick
  2 siblings, 1 reply; 44+ messages in thread
From: Carl Baldwin @ 2017-12-25 23:43 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Sun, Dec 24, 2017 at 12:01:38AM +0100, Johannes Schindelin wrote:
> Hi Carl,
> 
> On Sat, 23 Dec 2017, Carl Baldwin wrote:
> 
> > I imagine that a "git commit --amend" would also insert a "replaces"
> > reference to the original commit but I failed to mention that in my
> > original post.
> 
> And cherry-pick, too, of course.

This brings up a good point. I do think this can be applied to
cherry-pick, but as someone else pointed out, the name "replaces"
doesn't seem right in the context of a cherry-pick. So, maybe "replaces"
is not the right name. I'm open to suggestions.

It occurs to me now that the reason that I want a separate, orthogonal
history dimension is that a "replaces" reference does not imply that the
referenced commit is pulled in with all of its history like a "parent"
reference does. It isn't creating a merge commit. It means that the
referenced commit is derived from the other one and, at least in the
context of this branch's main history, renders it obsolete. Given this
definition, I think it applies to a cherry-pick.

> Both of these examples hint at a rather huge urge of some users to turn
> this feature off because the referenced commits may very well be
> throw-away commits in their case, making the newly-recorded information
> completely undesired.

I certainly don't want to make it difficult to get rid of throw-away
commits.

The workflows I'm interested in are mostly around iterating on what will
end up looking like a single commit in the final history. I'm imagining
posting a change, (or changes) somewhere to be reviewed by others.
Others submit feedback and I continue iterating given the feedback. If
certain intermediate throw-away commits have only been seen locally by
the author, they could be squashed into a single minimal new update.

I'm diving deeper into these workflows in my reply to Theodore. To avoid
fragmenting my ideas too much, I'll take the details over to that reply.
I hope to finish that soon.

Carl


* RE: Bring together merge and rebase
  2017-12-25 23:43       ` Carl Baldwin
@ 2017-12-26  0:01         ` Randall S. Becker
  0 siblings, 0 replies; 44+ messages in thread
From: Randall S. Becker @ 2017-12-26  0:01 UTC (permalink / raw)
  To: 'Carl Baldwin', 'Johannes Schindelin'
  Cc: 'Ævar Arnfjörð Bjarmason',
	'Git Mailing List'

On December 25, 2017 6:44 PM Carl Baldwin wrote:
> On Sun, Dec 24, 2017 at 12:01:38AM +0100, Johannes Schindelin wrote:
> > On Sat, 23 Dec 2017, Carl Baldwin wrote:
> > > I imagine that a "git commit --amend" would also insert a "replaces"
> > > reference to the original commit but I failed to mention that in my
> > > original post.
> >
> > And cherry-pick, too, of course.
> 
> This brings up a good point. I do think this can be applied to cherry-pick, but
> as someone else pointed out, the name "replaces"
> doesn't seem right in the context of a cherry-pick. So, maybe "replaces"
> is not the right name. I'm open to suggestions.

Just an off the wall suggestion: what about "stitch" or "suture" since this is now way beyond a band-aid solution (sorry 😉 , but only a little). I was thinking along the lines of "blend" but that seems less graphic and doesn't apply to cherry-picking.

Holiday Cheers,
Randall

-- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(211288444200000000)
-- In my real life, I talk too much.





* Re: Bring together merge and rebase
  2017-12-23 22:09     ` Ævar Arnfjörð Bjarmason
@ 2017-12-26  0:16       ` Carl Baldwin
  2017-12-26  1:28         ` Jacob Keller
  2017-12-26 17:49         ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-26  0:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Sat, Dec 23, 2017 at 11:09:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> But I don't see why you think this needs a new "replaces" parent
> >> pointer orthogonal to parent pointers, i.e. something that would
> >> need to be a new field in the commit object (I may have misread the
> >> proposal, it's not heavy on technical details).
> >
> > Just to clarify, I am proposing a new "replaces" pointer in the commit
> > object. Imagine starting with rebase exactly as it works today. This new
> > field would be inserted into any new commit created by a rebase command
> > to reference the original commit on which it was based. Though, I'm not
> > sure if it would be better to change the behavior of the existing rebase
> > command, provide a switch or config option to turn it on, or provide a
> > new command entirely (e.g. git replay or git replace) to avoid
> > compatibility issues with the existing rebase.
> 
> Yeah that sounds fine, I thought you meant that this "replaces" field
> would replace the "parent" field, which would require some rather deep
> incompatible changes to all git clients.
> 
> But then I don't get why you think fetch/pull/gc would need to be
> altered, if it's because you thought that adding arbitrary *new* fields
> to the commit object would require changes to those that's not the case.

Thank you again for your reply. Following is the kind of commit that I
would like to create.

    tree fcce2f309177c7da9c795448a3e392a137434cf1
    parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9
    replaces e8aa79baf6aef573da930a385e4db915187d5187
    author Carl Baldwin <carl@ecbaldwin.net> 1514057225 -0700
    committer Carl Baldwin <carl@ecbaldwin.net> 1514058444 -0700

What will happen if I create this today? I assumed git would just choke
on it but I'm not certain. It has been a long time since I attempted to
get into the internals of git.

Even if core git code does not simply choke on it, I would like push and
pull to follow these pointers and transfer the history behind them. I
assumed that git would not do this today. I would also like gc to
preserve e8aa79baf6 as if it were referenced by a parent pointer so that
it doesn't purge it from the history.
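
As a rough illustration of what "following these pointers" would involve,
here is a minimal shell sketch that pulls the proposed header out of a raw
commit. It assumes the header is spelled "replaces" as in the example
above; stock git defines no such header, and the function name
replaces_of is made up for this sketch.

```shell
# Print the object ids named by "replaces" headers in a raw commit
# object read on stdin. Parsing stops at the first blank line, which
# separates the headers from the commit message.
replaces_of() {
    awk '
        /^$/ { exit }                   # blank line ends the headers
        $1 == "replaces" { print $2 }   # hypothetical header from this thread
    '
}

# Example input: the commit shown above, followed by a message body.
printf '%s\n' \
    'tree fcce2f309177c7da9c795448a3e392a137434cf1' \
    'parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9' \
    'replaces e8aa79baf6aef573da930a385e4db915187d5187' \
    'author Carl Baldwin <carl@ecbaldwin.net> 1514057225 -0700' \
    'committer Carl Baldwin <carl@ecbaldwin.net> 1514058444 -0700' \
    '' \
    'replaces is also a word in this message body' |
    replaces_of
```

In a real repository the input could come from "git cat-file commit <id>",
which prints the raw commit object. Whether git accepts a commit with this
header in this position is exactly the open question above.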

I'm currently thinking of an example of the workflow that I'm after in
response to Theodore Ts'o's message from yesterday. Stay tuned, I hope
it makes it clearer why I want it this way.

[snip]

> Instead, if I understand what you're actually trying to do, it could
> also be done as:
> 
>  1) Just add a new replaces <sha1> field to new commit objects
> 
>  2) Make git-rebase know how to write those, e.g. add two of those
>     pointing to A & B when it squashes them into AB.
> 
>  3) Write a history traversal mechanism similar to --full-history
>     that'll ignore any commits on branches that yield no changes, or
>     only those whose commits are referenced by this "replaces" field.
> 
> You'd then end up with:
> 
>  A) A way to "stash" these commits in the permanent history
> 
>  B) ... that wouldn't be visible in "git log" by default
> 
>  C) Would require no underlying changes to the commit model, i.e. it
>     would work with all past & future git clients, if they didn't know
>     about the "replaces" field they'd just show more verbose history.

I get this point. I don't underestimate how difficult making such a
change to the core model is. I know there are older clients which cannot
simply be updated. There are also alternate implementations (e.g. jgit)
that also need to be considered. This is the thing I worry about the
most. I think at the very least, this new feature will have to be an
opt-in feature for teams who can easily ensure a minimum version of git
will be used. Maybe the core.repositoryformatversion config or something
like that would have to play into it. There may also be some minimal
amount that could be backported to older clients to at least avoid
choking on new repos (I know this doesn't guarantee older clients will
be updated). Just throwing a few ideas out.

I want to be sure that the implications have been explored before giving
up and doing something external to git.

Carl

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Bring together merge and rebase
  2017-12-25  3:52 ` Theodore Ts'o
@ 2017-12-26  1:16   ` Carl Baldwin
  2017-12-26  1:47     ` Jacob Keller
                       ` (2 more replies)
  2017-12-27  4:35   ` Carl Baldwin
  1 sibling, 3 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-26  1:16 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Git Mailing List

On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o wrote:
> As a suggestion, before diving into the technical details of your
> proposal, it might be useful to consider the usage scenario you are
> targeting.  Things like "git rebase" and "git merge" and your
> proposed "git replace/replay" are *mechanisms*.
> 
> But how they fit into a particular workflow is much more important
> from a design perspective, given that there are many different git
> workflows which are used by different projects, and by different
> developers within a particular project.
> 
> For example, rebase gets used in many different ways, and many of the
> debates when people talk about "git rebase" being evil generally
> presuppose a particular workflow that the advocate has in mind.
> If someone is using git rebase or git commit --amend before git
> commits have ever been pushed out to a public repository, or to anyone
> else, that's a very different case from one where they have been
> visible elsewhere.  Even the most strident "you must never rewrite a
> commit and all history must be preserved" advocates generally don't
> insist that every single edit must be preserved on the theory that
> "all history is valuable".
> 
> > The git history now has two dimensions. The first shows a cleaned up
> > history where fix ups and code review feedback have been rolled into
> > the original changes and changes can possibly be ordered in a nice
> > linear progression that is much easier to understand. The second
> > drills into the history of a change. There is no loss and you don't
> > change history in a way that will cause problems for others who have
> > the older commits.
> 
> If your goal is to preserve the history of the change, one of the
> problems with any git-centric solution is that you generally lose the
> code review feedback and the discussions that are involved with a
> commit.  Just simply preserving the different versions of the commits
> is going to lose a huge amount of the context that makes the history
> valuable.
> 
> So for example, I would claim that if *that* is your goal, a better
> solution is to use Gerrit, so that all of the different versions of
> the commits are preserved along with the line-by-line comments and
> discussions that were part of the code review.  In that model, each
> commit has something like this in the commit trailer:
> 
> Change-Id: I8d89b33683274451bcd6bfbaf75bce98

Thank you for your reply. I agree that discussing the workflows is very
valuable and I certainly haven't done that justice yet.

Gerrit is the tool that got me thinking about my proposal in the first
place. I spent a few years developing and doing a significant number of
code reviews using it. I've since changed to an environment where I no
longer have it. It turns out that "a better solution is to use Gerrit"
is not helpful to me now because it isn't up to me. Gerrit is not nearly
as ubiquitous as git itself.

In my opinion, Gerrit has shown us the power of the "change". As you
point out, it introduced the change-id embedded into the commit message
and uses it to track a change's progress as a "review." I think these
are powerful concepts and Gerrit did a nice job with them. I guess one
of my goals with my proposal here is to formalize the "change" idea so
that any git-based tool understands it and can interoperate. This is why
I want it in the core git commit object and I want push, pull, gc, and
other commands to understand it.

At this point, you might wonder why I'm not proposing to simply add a
"change-id" to the commit object. The short answer is that the
"change-id" Gerrit uses in the commit messages cannot stand on its own.
It depends on data stored on the server which maintains a relationship
of commits to a review number and a linear ordering of commits within
the review (hopefully I'm not oversimplifying this). The "replaces"
reference is an attempt to make something which can stand on its own. I
don't think we need to solve the problem of where to keep comments at
this point.

An unbroken chain of "replaces" references obviates the need for the
change id in the commit message. From any given commit in the chain, we
can follow the references to the first commit which started the review.
However, the chain is even more useful because it is not limited to a
linear progression of revisions. Let me try to explain how this can
solve some of the most common issues I ran into with the rebase type
workflow.
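
To make the chain idea concrete, here is a small sketch of walking
"replaces" links back to the commit that started a change. It is only a
model: the links file stands in for the object store, its
"<commit> <commit-it-replaces>" line format is an assumption made for this
example, and first_revision is a made-up name.

```shell
# Walk "replaces" links from a commit back to the revision that started
# the change. Only the first link of each commit is followed, so a
# merged change with several "replaces" entries is not fully explored.
first_revision() {
    fr_links=$1
    fr_cur=$2
    # awk prints the replaced commit and exits 0, or exits 1 when the
    # current commit replaces nothing, which ends the loop.
    while fr_next=$(awk -v c="$fr_cur" '
            $1 == c { print $2; found = 1; exit }
            END { exit !found }
        ' "$fr_links")
    do
        fr_cur=$fr_next
    done
    printf '%s\n' "$fr_cur"
}

# Example: rev3 replaces rev2, which replaces rev1.
links=$(mktemp)
printf '%s\n' 'rev3 rev2' 'rev2 rev1' > "$links"
first_revision "$links" rev3    # prints rev1
rm -f "$links"
```

The first commit in the chain plays the role that the change id plays in
Gerrit: every later revision leads back to it.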

Look at what happens in a rebase type workflow in any of the following
scenarios. All of these came up regularly in my time with Gerrit.

    1. Make a quick edit through the web UI then later work on the
       change again in your local clone. It is easy to forget to pull
       down the change made through the UI before starting to work on it
       again. If that happens, the change made through the UI will
       almost certainly be clobbered.

    2. You or someone else creates a second change that is dependent on
       yours and works on it while yours is still evolving. If the
       second change gets rebased with an older copy of the base change
       and then posted back up for review, newer work in the base change
       has just been clobbered.

    3. As a reviewer, you decide the best way to explain how you'd like
       to see something done differently is to make the quick change
       yourself and push it up. If the author fails to fetch what you
       pushed before continuing onto something else, it gets clobbered.

    4. You want to collaborate on a single change with someone else in
       any way and for whatever reason. As soon as that change starts
       hitting multiple work spaces, there are synchronization issues
       that currently take careful manual intervention.

These kinds of scenarios usually end up being used as arguments against
a rebase-based workflow. On the other hand, with a chain of "replaces"
references, these scenarios end up branching the change. This is where
it will be useful for my local git command to be able to pull down the
upstream state, understand what branched, help me merge, and then push
the result upstream. I really think this brings the power and benefits
of branching and merging to the rebase workflow.
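
A branched change is easy to spot in such a model: it is any commit that
more than one newer commit claims to replace. The sketch below assumes a
hypothetical links file with one "<commit> <commit-it-replaces>" pair per
line standing in for the object store; branched_changes is a made-up
name, not an existing git command.

```shell
# Print every commit that is replaced by more than one successor,
# i.e. every point where a change has branched.
branched_changes() {
    awk '
        { successors[$2]++ }
        END { for (c in successors) if (successors[c] > 1) print c }
    ' "$1" | sort
}

# Example: revision a was amended locally (b1) and through a web UI (b2).
links=$(mktemp)
printf '%s\n' 'b1 a' 'b2 a' 'c1 b1' > "$links"
branched_changes "$links"    # prints a
rm -f "$links"
```

Scenarios 1 through 4 above would all show up this way, as two successors
of the same revision, instead of one silently clobbering the other.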

Anyway, now I am compelled to use github which is also a fine tool and I
appreciate all of the work that has gone into it. About 80% of the time,
I rebase and force push to my branch to update a pull request. I've come
to like the end product of the rebase workflow. However, github doesn't
excel at this approach. For one, it doesn't preserve older revisions
which were already reviewed, which makes it difficult for reviewers to
pick up where they left off the last time. If it preserved them, as
gerrit does, the reviewer could compare a new revision with the most
recent older revision they reviewed to see just what has been addressed
since then.

I think it would be great if git standardized the way that revisions of
"changes" are preserved in the repository as commits. Preserving the
comments attached to each commit could be left up to each of the tools,
in my opinion. Or, at least left to a later discussion.

The other 20% of the time, I revert to branching and merging only. This
helps reviewers with the problem of picking up where they left off. It
is also an absolute necessity if another developer is going to be
collaborating with me on the pr or basing a dependent pr on it. However,
it leaves the history in a more complex state in the end. Github offers
"squash merge" and "rebase merge" to help out with this but these don't
give me the control I want over what goes into each change when I want
to end up with more than one. They also cause problems if there is a
dependent pr.

After a pull request in github has progressed, it can be difficult for a
new reviewer to jump on. They may want to review a large pr commit by
commit from the bottom up instead of trying to tackle the entire thing
at once. However, if fixups have been made in later commits, they will
be looking at the old stuff first, seeing all the bugs and issues that
have already been addressed.

> You can then cut and paste the Change-Id into the Gerrit user
> interface, and see the different commits, more important, the
> discussion surrounding each change.
> 
> 
> If the complaint about Gerrit is that it's not a core part of Git, the
> challenge is (a) how to carry the code review comments in the git
> repository, and (b) do so in a way that it doesn't bloat the core
> repository, since most of the time, you *don't* want or need to keep a
> local copy of all of the code review comments going back since the
> beginning of the project.
> 
> -------------

I need to give your kernel patching use cases some thought. I once
designed a process to do similar patching against a different project so
I think I know what you're getting at. I just need a little more time to
think about it. Hopefully, I'll have a little more time to post another
reply.

Carl

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Bring together merge and rebase
  2017-12-26  0:16       ` Carl Baldwin
@ 2017-12-26  1:28         ` Jacob Keller
  2017-12-26 23:30           ` Igor Djordjevic
  2017-12-26 17:49         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 44+ messages in thread
From: Jacob Keller @ 2017-12-26  1:28 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Mon, Dec 25, 2017 at 4:16 PM, Carl Baldwin <carl@ecbaldwin.net> wrote:
> On Sat, Dec 23, 2017 at 11:09:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> But I don't see why you think this needs a new "replaces" parent
>> >> pointer orthogonal to parent pointers, i.e. something that would
>> >> need to be a new field in the commit object (I may have misread the
>> >> proposal, it's not heavy on technical details).
>> >
>> > Just to clarify, I am proposing a new "replaces" pointer in the commit
>> > object. Imagine starting with rebase exactly as it works today. This new
>> > field would be inserted into any new commit created by a rebase command
>> > to reference the original commit on which it was based. Though, I'm not
>> > sure if it would be better to change the behavior of the existing rebase
>> > command, provide a switch or config option to turn it on, or provide a
>> > new command entirely (e.g. git replay or git replace) to avoid
>> > compatibility issues with the existing rebase.
>>
>> Yeah that sounds fine, I thought you meant that this "replaces" field
>> would replace the "parent" field, which would require some rather deep
>> incompatible changes to all git clients.
>>
>> But then I don't get why you think fetch/pull/gc would need to be
>> altered, if it's because you thought that adding arbitrary *new* fields
>> to the commit object would require changes to those that's not the case.
>
> Thank you again for your reply. Following is the kind of commit that I
> would like to create.
>
>     tree fcce2f309177c7da9c795448a3e392a137434cf1
>     parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9
>     replaces e8aa79baf6aef573da930a385e4db915187d5187
>     author Carl Baldwin <carl@ecbaldwin.net> 1514057225 -0700
>     committer Carl Baldwin <carl@ecbaldwin.net> 1514058444 -0700
>
> What will happen if I create this today? I assumed git would just choke
> on it but I'm not certain. It has been a long time since I attempted to
> get into the internals of git.
>
> Even if core git code does not simply choke on it, I would like push and
> pull to follow these pointers and transfer the history behind them. I
> assumed that git would not do this today. I would also like gc to
> preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> it doesn't purge it from the history.
>
> I'm currently thinking of an example of the workflow that I'm after in
> response to Theodore Ts'o's message from yesterday. Stay tuned, I hope
> it makes it clearer why I want it this way.
>
> [snip]
>
>> Instead, if I understand what you're actually trying to do, it could
>> also be done as:
>>
>>  1) Just add a new replaces <sha1> field to new commit objects
>>
>>  2) Make git-rebase know how to write those, e.g. add two of those
>>     pointing to A & B when it squashes them into AB.
>>
>>  3) Write a history traversal mechanism similar to --full-history
>>     that'll ignore any commits on branches that yield no changes, or
>>     only those whose commits are referenced by this "replaces" field.
>>
>> You'd then end up with:
>>
>>  A) A way to "stash" these commits in the permanent history
>>
>>  B) ... that wouldn't be visible in "git log" by default
>>
>>  C) Would require no underlying changes to the commit model, i.e. it
>>     would work with all past & future git clients, if they didn't know
>>     about the "replaces" field they'd just show more verbose history.
>
> I get this point. I don't underestimate how difficult making such a
> change to the core model is. I know there are older clients which cannot
> simply be updated. There are also alternate implementations (e.g. jgit)
> that also need to be considered. This is the thing I worry about the
> most. I think at the very least, this new feature will have to be an
> opt-in feature for teams who can easily ensure a minimum version of git
> will be used. Maybe the core.repositoryformatversion config or something
> like that would have to play into it. There may also be some minimal
> amount that could be backported to older clients to at least avoid
> choking on new repos (I know this doesn't guarantee older clients will
> be updated). Just throwing a few ideas out.
>
> I want to be sure that the implications have been explored before giving
> up and doing something external to git.
>
> Carl

What about some way to take the reflog and turn it into a commit-based
linkage and export that? Rather than tying it into the individual
commit history, keep track of it outside the commit, possibly via
something like notes, or some other mechanism.

This also ties into work done by Josh Triplett on git series [1] and
some previous mail discussions that I remember. He had some mechanism
for tracking series history which works ok, but can cause problems you
mentioned when simply adding a second parent commit.

I tend to think some mechanism to store both patch/commit history and
review based comments would be a very useful thing to integrate so
that multiple platforms had a more generic way of sharing things such
as line-based commentary, and so on. It could even be some sort of
unformatted method to at least get the mechanism of "how to share this
among clients" to be stable across tools, even if each review tool
made its own format (thus we don't lock the *type* of review comments
in stone).

I definitely think that storing just the history of commits isn't as
valuable without storing the comments made in the review process.

-Jake

[1] https://github.com/git-series/git-series

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Bring together merge and rebase
  2017-12-26  1:16   ` Carl Baldwin
@ 2017-12-26  1:47     ` Jacob Keller
  2017-12-26  6:02       ` Carl Baldwin
  2017-12-26 18:04     ` Theodore Ts'o
  2018-01-04 19:54     ` Martin Fick
  2 siblings, 1 reply; 44+ messages in thread
From: Jacob Keller @ 2017-12-26  1:47 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Theodore Ts'o, Git Mailing List

On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin <carl@ecbaldwin.net> wrote:
> Anyway, now I am compelled to use github which is also a fine tool and I
> appreciate all of the work that has gone into it. About 80% of the time,
> I rebase and force push to my branch to update a pull request. I've come
> to like the end product of the rebase workflow. However, github doesn't
> excel at this approach. For one, it doesn't preserve older revisions
> which were already reviewed, which makes it difficult for reviewers to
> pick up where they left off the last time. If it preserved them, as
> gerrit does, the reviewer could compare a new revision with the most
> recent older revision they reviewed to see just what has been addressed
> since then.

A bit of a tangent here, but a thought I didn't wanna lose: In the
general case where a patch was rebased and the original parent pointer
was changed, it is actually quite hard to show a diff of what changed
between versions.

The best I've found is to do something like a 4-way --cc merge diff,
which mostly works, but has a few awkward cases, and ends up usually
showing double ++ and -- notation.

Just something I've thought about a fair bit, trying to come up with
some good way to show "what changed between A1 and A2, but ignore all
changes between parent P1 and P2 which you don't care that much about
in this context."

Thanks,
Jake

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Bring together merge and rebase
  2017-12-23  6:10 Bring together merge and rebase Carl Baldwin
  2017-12-23 18:59 ` Ævar Arnfjörð Bjarmason
  2017-12-25  3:52 ` Theodore Ts'o
@ 2017-12-26  4:08 ` Mike Hommey
  2017-12-27  2:44   ` Carl Baldwin
  2 siblings, 1 reply; 44+ messages in thread
From: Mike Hommey @ 2017-12-26  4:08 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Git Mailing List

On Fri, Dec 22, 2017 at 11:10:19PM -0700, Carl Baldwin wrote:
> The big contention among git users is whether to rebase or to merge
> changes [2][3] while iterating. I used to firmly believe that merging
> was the way to go and rebase was harmful. More recently, I have worked
> in some environments where I saw rebase used very effectively while
> iterating on changes and I relaxed my stance a lot. Now, I'm on the
> fence. I appreciate the strengths and weaknesses of both approaches. I
> waffle between the two depending on the situation, the tools being
> used, and I guess, to some extent, my mood.
> 
> I think what git needs is something brand new that brings the two
> together and has all of the advantages of both approaches. Let me
> explain what I've got in mind...
> 
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.
> 
> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.
> 
> Replay handles collaboration between multiple authors on a single
> change. This is difficult and prone to accidental loss when using
> rebase and it results in a complex history when done with merge. With
> replay, collaborators could merge while collaborating on a single
> change and a record of each one's contributions can be preserved.
> Attempting this level of collaboration caused me many headaches when I
> worked with the gerrit workflow (which in many ways, I like a lot).
> 
> I blogged about this proposal earlier this year when I first thought
> of it [1]. I got busy and didn't think about it for a while. Now with
> a little time off of work, I've come back to revisit it. The blog
> entry has a few examples showing how it works and how the history will
> look in a few examples. Take a look.
> 
> Various git commands will have to learn how to handle this kind of
> history. For example, things like fetch, push, gc, and others that
> move history around and clean out orphaned history should treat
> anything reachable through `replaces` pointers as precious. Log and
> related history commands may need new switches to traverse the history
> differently in different situations. Bisect is an interesting one. I
> tend to think that bisect should prefer the regular commit history but
> have the ability to drill into the change history if necessary.
> 
> In my opinion, this proposal would bring together rebase and merge in
> a powerful way and could end the contention. Thanks for your
> consideration.

FWIW, your proposal has a lot in common with (but is not quite equivalent
to) mercurial's obsolescence markers and changeset evolution features.

Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Bring together merge and rebase
  2017-12-26  1:47     ` Jacob Keller
@ 2017-12-26  6:02       ` Carl Baldwin
  2017-12-26  8:40         ` Jacob Keller
  0 siblings, 1 reply; 44+ messages in thread
From: Carl Baldwin @ 2017-12-26  6:02 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Theodore Ts'o, Git Mailing List

On Mon, Dec 25, 2017 at 05:47:55PM -0800, Jacob Keller wrote:
> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin <carl@ecbaldwin.net> wrote:
> > Anyway, now I am compelled to use github which is also a fine tool and I
> > appreciate all of the work that has gone into it. About 80% of the time,
> > I rebase and force push to my branch to update a pull request. I've come
> > to like the end product of the rebase workflow. However, github doesn't
> > excel at this approach. For one, it doesn't preserve older revisions
> > which were already reviewed, which makes it difficult for reviewers to
> > pick up where they left off the last time. If it preserved them, as
> > gerrit does, the reviewer could compare a new revision with the most
> > recent older revision they reviewed to see just what has been addressed
> > since then.
> 
> A bit of a tangent here, but a thought I didn't wanna lose: In the
> general case where a patch was rebased and the original parent pointer
> was changed, it is actually quite hard to show a diff of what changed
> between versions.
>
> The best I've found is to do something like a 4-way --cc merge diff,
> which mostly works, but has a few awkward cases, and ends up usually
> showing double ++ and -- notation.
>
> Just something I've thought about a fair bit, trying to come up with
> some good way to show "what changed between A1 and A2, but ignore all
> changes between parent P1 and P2 which you don't care that much about
> in this context."

I ran into this all the time with gerrit. I wrote a script that you'd
run on a working copy (with no local changes). I'd fetch and checkout
the latest patchset that I want to review (say, for example, it's patchset
5) from gerrit. Then, say I wanted to compare it with patch set 3 which
has a different parent. I'd run this from the top level of my working
copy.

    compare-to-previous-patchset 3

It would fetch patch set 3 from gerrit, rebase it to the same parent as
the current patch set on a detached HEAD and then git diff it with the
current patch set. If there were conflicts, it would just commit the
conflict markers to the commit. There is no attempt to resolve the
conflicts. The script was crude but it helped me out many times and it
was nice to be able to review how conflicts were resolved when those
came up.

Carl

PS In case you're curious, here's my script...

#!/bin/bash

remote=gerrit
previous_patchset=$1; shift

# Assumes we're sitting on the latest patch set.
new_patch_set_id=$(git rev-parse HEAD)

branch=$(git branch | awk '/^\*/ {print$2}')
[ "$branch" = "(no" ] && branch=

# set user, host, port, and project from git config
eval $(echo "$(git config remote.$remote.url)" |
       sed 's,ssh://\(.*\)@\(.*\):\([[:digit:]]*\)/\(.*\).git,user=\1 host=\2 p<

gerrit() {
    ssh $user@$host -p $port gerrit ${1+"$@"}
}

# Grabs a bunch of information from gerrit about the current patch
eval $(gerrit query --current-patch-set $new_patch_set_id |
    awk '
        BEGIN {mode="main"}
        / currentPatchSet:/ { mode="currentPatchSet" }
        / ref:/ { printf "new_patch_ref=%s\n", $2 }
        / number:/ {
            if (mode=="main") {
                printf "review_num=%s\n", $2
            }
            if (mode=="currentPatchSet") {
                printf "new_patchset=%s\n", $2
            }
        }
    ')

# Fetch the old patch set
old_patch_ref=${new_patch_ref%$new_patchset}$previous_patchset
git fetch $remote $old_patch_ref && git checkout FETCH_HEAD

# Rebase the old patch set to the parent of the new patch set.
if ! git rebase HEAD^ --onto ${new_patch_set_id}^
then
    git diff --name-only --diff-filter=U -z | xargs -0 git add
    git rebase --continue
fi

previous_patchset_rebased=$(git rev-parse HEAD)

# Go back to the new patch set and diff it against the rebased old one.
if [ "$branch" ]
then
    git checkout $branch
else
    git checkout $new_patch_set_id
fi
git diff $previous_patchset_rebased

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Bring together merge and rebase
  2017-12-26  6:02       ` Carl Baldwin
@ 2017-12-26  8:40         ` Jacob Keller
  2018-01-04 19:19           ` Martin Fick
  0 siblings, 1 reply; 44+ messages in thread
From: Jacob Keller @ 2017-12-26  8:40 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Theodore Ts'o, Git Mailing List

On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin <carl@ecbaldwin.net> wrote:
> On Mon, Dec 25, 2017 at 05:47:55PM -0800, Jacob Keller wrote:
>> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin <carl@ecbaldwin.net> wrote:
>> > Anyway, now I am compelled to use github which is also a fine tool and I
>> > appreciate all of the work that has gone into it. About 80% of the time,
>> > I rebase and force push to my branch to update a pull request. I've come
>> > to like the end product of the rebase workflow. However, github doesn't
>> > excel at this approach. For one, it doesn't preserve older revisions
>> > which were already reviewed, which makes it difficult for reviewers to
>> > pick up where they left off the last time. If it preserved them, as
>> > gerrit does, the reviewer could compare a new revision with the most
>> > recent older revision they reviewed to see just what has been addressed
>> > since then.
>>
>> A bit of a tangent here, but a thought I didn't wanna lose: In the
>> general case where a patch was rebased and the original parent pointer
>> was changed, it is actually quite hard to show a diff of what changed
>> between versions.
>>
>> The best I've found is to do something like a 4-way --cc merge diff,
>> which mostly works, but has a few awkward cases, and ends up usually
>> showing double ++ and -- notation.
>>
>> Just something I've thought about a fair bit, trying to come up with
>> some good way to show "what changed between A1 and A2, but ignore all
>> changes between parent P1 and P2 which you don't care that much about
>> in this context."
>
> I ran into this all the time with gerrit. I wrote a script that you'd
> run on a working copy (with no local changes). I'd fetch and checkout
> the latest patchset that I want to review (say, for example, it's patchset
> 5) from gerrit. Then, say I wanted to compare it with patch set 3 which
> has a different parent. I'd run this from the top level of my working
> copy.
>
>     compare-to-previous-patchset 3
>
> It would fetch patch set 3 from gerrit, rebase it to the same parent as
> the current patch set on a detached HEAD and then git diff it with the
> current patch set. If there were conflicts, it would just commit the
> conflict markers to the commit. There is no attempt to resolve the
> conflicts. The script was crude but it helped me out many times and it
> was nice to be able to review how conflicts were resolved when those
> came up.
>
> Carl
>

Interesting. That could work fairly well. I usually do something along
the lines of:

    git diff patch-new patch-old patch-base-new patch-base-old --cc

which produces a combined diff format patch which usually works ok.

My biggest gripes are that the gerrit web interface doesn't itself do
something like this (and jgit does not appear to be able to generate
combined diffs at all!)

> PS In case you're curious, here's my script...
>
> #!/bin/bash
>
> remote=gerrit
> previous_patchset=$1; shift
>
> # Assumes we're sitting on the latest patch set.
> new_patch_set_id=$(git rev-parse HEAD)
>
> branch=$(git branch | awk '/^\*/ {print$2}')
> [ "$branch" = "(no" ] && branch=
>
> # set user, host, port, and project from git config
> eval $(echo "$(git config remote.$remote.url)" |
>        sed 's,ssh://\(.*\)@\(.*\):\([[:digit:]]*\)/\(.*\).git,user=\1 host=\2 p<
>
> gerrit() {
>     ssh $user@$host -p $port gerrit ${1+"$@"}
> }
>
> # Grabs a bunch of information from gerrit about the current patch
> eval $(gerrit query --current-patch-set $new_patch_set_id |
>     awk '
>         BEGIN {mode="main"}
>         / currentPatchSet:/ { mode="currentPatchSet" }
>         / ref:/ { printf "new_patch_ref=%s\n", $2 }
>         / number:/ {
>             if (mode=="main") {
>                 printf "review_num=%s\n", $2
>             }
>             if (mode=="currentPatchSet") {
>                 printf "new_patchset=%s\n", $2
>             }
>         }
>     ')
>
> # Fetch the old patch set
> old_patch_ref=${new_patch_ref%$new_patchset}$previous_patchset
> git fetch $remote $old_patch_ref && git checkout FETCH_HEAD
>
> # Rebase the old patch set to the parent of the new patch set.
> if ! git rebase HEAD^ --onto ${new_patch_set_id}^
> then
>     git diff --name-only --diff-filter=U -z | xargs -0 git add
>     git rebase --continue
> fi
>
> previous_patchset_rebased=$(git rev-parse HEAD)
>
> # Go back to the new patch set and diff it against the rebased old one.
> if [ "$branch" ]
> then
>     git checkout $branch
> else
>     git checkout $new_patch_set_id
> fi
> git diff $previous_patchset_rebased

One thing you might do is have it create a temporary worktree in order
to avoid problems with being in the local checkout.
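
A minimal sketch of that idea, assuming a git recent enough to have
`git worktree remove`. The function name `in_temp_worktree` is made up
for illustration and is not part of the script above:

```bash
#!/bin/bash
# Run a command inside a throwaway detached worktree at the given commit,
# then remove the worktree again. The caller's own checkout is never
# touched. Assumes a git that provides `git worktree remove`.
in_temp_worktree() {
    local commit=$1; shift
    local dir status=0

    dir=$(mktemp -d) || return 1
    # `git worktree add` accepts an existing empty directory as its path.
    if ! git worktree add --detach "$dir" "$commit"; then
        rmdir "$dir"
        return 1
    fi

    # Run the caller's command in a subshell so the cd does not leak out.
    (cd "$dir" && "$@") || status=$?

    git worktree remove --force "$dir"
    return $status
}
```

The rebase-and-diff steps of the script could then be wrapped in a
function and run as `in_temp_worktree FETCH_HEAD that_function`, so a
failed rebase never leaves the main checkout in a half-finished state.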

Thanks,
Jake

* Re: Bring together merge and rebase
  2017-12-26  0:16       ` Carl Baldwin
  2017-12-26  1:28         ` Jacob Keller
@ 2017-12-26 17:49         ` Ævar Arnfjörð Bjarmason
  2017-12-26 19:44           ` Carl Baldwin
  1 sibling, 1 reply; 44+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-12-26 17:49 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Git Mailing List


On Tue, Dec 26 2017, Carl Baldwin jotted:

> On Sat, Dec 23, 2017 at 11:09:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> But I don't see why you think this needs a new "replaces" parent
>> >> pointer orthogonal to parent pointers, i.e. something that would
>> >> need to be a new field in the commit object (I may have misread the
>> >> proposal, it's not heavy on technical details).
>> >
>> > Just to clarify, I am proposing a new "replaces" pointer in the commit
>> > object. Imagine starting with rebase exactly as it works today. This new
>> > field would be inserted into any new commit created by a rebase command
>> > to reference the original commit on which it was based. Though, I'm not
>> > sure if it would be better to change the behavior of the existing rebase
>> > command, provide a switch or config option to turn it on, or provide a
>> > new command entirely (e.g. git replay or git replace) to avoid
>> > compatibility issues with the existing rebase.
>>
>> Yeah that sounds fine, I thought you meant that this "replaces" field
>> would replace the "parent" field, which would require some rather deep
>> incompatible changes to all git clients.
>>
>> But then I don't get why you think fetch/pull/gc would need to be
>> altered, if it's because you thought that adding arbitrary *new* fields
>> to the commit object would require changes to those that's not the case.
>
> Thank you again for your reply. Following is the kind of commit that I
> would like to create.
>
>     tree fcce2f309177c7da9c795448a3e392a137434cf1
>     parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9
>     replaces e8aa79baf6aef573da930a385e4db915187d5187
>     author Carl Baldwin <carl@ecbaldwin.net> 1514057225 -0700
>     committer Carl Baldwin <carl@ecbaldwin.net> 1514058444 -0700
>
> What will happen if I create this today? I assumed git would just choke
> on it but I'm not certain. It has been a long time since I attempted to
> get into the internals of git.

New headers should be added after existing headers, but other than that
it won't choke on it. See 4b2bced559 when the encoding header was added,
this also passes most tests:

    diff --git a/commit.c b/commit.c
    index cab8d4455b..cd2bafbaa0 100644
    --- a/commit.c
    +++ b/commit.c
    @@ -1565,6 +1565,8 @@ int commit_tree_extended(const char *msg, size_t msg_len,
            if (!encoding_is_utf8)
                    strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);

    +       strbuf_addf(&buffer, "replaces 0000000000000000000000000000000000000000\n");
    +
            while (extra) {
                    add_extra_header(&buffer, extra);
                    extra = extra->next;

Only "most" since of course this changes the sha1 of every commit git
creates from what you get now.
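
As a rough sketch of how a tool could read such a header back, assuming
the hypothetical `replaces` line is written after the existing headers,
something like the following would do. The function name
`commit_header` is invented for this example:

```bash
#!/bin/bash
# Print the value(s) of one header from a raw commit object on stdin.
# Headers end at the first blank line; the commit message follows it,
# so text in the message that looks like a header is never reported.
commit_header() {
    local name=$1
    awk -v name="$name" '
        /^$/ { exit }                          # blank line: headers are over
        index($0, name " ") == 1 {             # line starts with "<name> "
            print substr($0, length(name) + 2) # everything after the space
        }
    '
}
```

It would be fed from `git cat-file commit <sha>`, for example
`git cat-file commit HEAD | commit_header replaces`. A commit without
the header simply produces no output.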

> Even if core git code does not simply choke on it, I would like push and
> pull to follow these pointers and transfer the history behind them. I
> assumed that git would not do this today. I would also like gc to
> preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> it doesn't purge it from the history.

It won't pay any attention to them if "replaces" is something entirely
new, what I was pointing out in my earlier reply is that you can simply
*also* create the parent pointers to these no-op merge commits that hide
away the previous history the "replaces" headers will be referencing.

The reason to do that is 100% backwards compatibility, and only
needing to make minor UI changes to have this feature (to e.g. history
walking), as opposed to needing to hack everything that now follows
"parent" or constructs a commit graph.

> I'm currently thinking of an example of the workflow that I'm after in
> response to Theodore Ts'o's message from yesterday. Stay tuned, I hope
> it makes it clearer why I want it this way.
>
> [snip]
>
>> Instead, if I understand what you're actually trying to do, it could
>> also be done as:
>>
>>  1) Just add a new replaces <sha1> field to new commit objects
>>
>>  2) Make git-rebase know how to write those, e.g. add two of those
>>     pointing to A & B when it squashes them into AB.
>>
>>  3) Write a history traversal mechanism similar to --full-history
>>     that'll ignore any commits on branches that yield no changes, or
>>     only those whose commits are referenced by this "replaces" field.
>>
>> You'd then end up with:
>>
>>  A) A way to "stash" these commits in the permanent history
>>
>>  B) ... that wouldn't be visible in "git log" by default
>>
>>  C) Would require no underlying changes to the commit model, i.e. it
>>     would work with all past & future git clients, if they didn't know
>>     about the "replaces" field they'd just show more verbose history.
>
> I get this point. I don't underestimate how difficult making such a
> change to the core model is. I know there are older clients which cannot
> simply be updated. There are also alternate implementations (e.g. jgit)
> that also need to be considered. This is the thing I worry about the
> most. I think at the very least, this new feature will have to be an
> opt-in feature for teams who can easily ensure a minimum version of git
> will be used. Maybe the core.repositoryformatversion config or something
> like that would have to play into it. There may also be some minimal
> amount that could be backported to older clients to at least avoid
> choking on new repos (I know this doesn't guarantee older clients will
> be updated). Just throwing a few ideas out.

Sure, it could be opt in, be a new format etc. But you haven't explained
why you think a feature like this would need to rely on an entirely new
parent structure and side-DAG, as opposed to just the more minor changes
I'm pointing out above, and which I think will give you what you need
from a UX level.

> I want to be sure that the implications have been explored before giving
> up and doing something external to git.

* Re: Bring together merge and rebase
  2017-12-26  1:16   ` Carl Baldwin
  2017-12-26  1:47     ` Jacob Keller
@ 2017-12-26 18:04     ` Theodore Ts'o
  2017-12-26 20:31       ` Carl Baldwin
  2018-01-04 19:54     ` Martin Fick
  2 siblings, 1 reply; 44+ messages in thread
From: Theodore Ts'o @ 2017-12-26 18:04 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Git Mailing List

On Mon, Dec 25, 2017 at 06:16:40PM -0700, Carl Baldwin wrote:
> At this point, you might wonder why I'm not proposing to simply add a
> "change-id" to the commit object. The short answer is that the
> "change-id" Gerrit uses in the commit messages cannot stand on its own.
> It depends on data stored on the server which maintains a relationship
> of commits to a review number and a linear ordering of commits within
> the review (hopefully I'm not over simplifying this). The "replaces"
> reference is an attempt to make something which can stand on its own. I
> don't think we need to solve the problem of where to keep comments at
> this point.

I strongly disagree, and one way to see that is by doing a real-life
experiment.  Take a look at a gerrit change, which in my experience can
have up to ten or twelve revisions, and strip out the comments, so all
you get to look at is the half-dozen or more revisions.  How useful is
it *really*?  How does it get used in practice?  What development
problem does it help to solve?

And when you say that it is a bug that the Gerrit Change-Id does not
stand alone, consider that it can also be a *feature*.  If you keep
all of this in the main repo, the number of commits can easily grow by
an order of magnitude.  And these are commits that you have to keep
forever, which means it slows down every subsequent git clone, git gc
operation, git tag --contains search, etc.

So what are the benefits, and what are the costs?  If the benefits
were huge, then perhaps it would be worthwhile.  But if you lose a
huge amount of the value because you are missing the *why* between the
half-dozen to dozen past revisions of the commit, then is it really
worth it to adopt that particular workflow?

It seems to me your argument is contrasting a "replaces" pointer
versus the github PR.  But compared to the Gerrit solution, I don't
think the "replaces" pointer proposal is as robust or as featureful.
Also, please keep in mind that just because it's in core git doesn't
guarantee that Github will support it.  As far as I know github has
zero support for notes, for example.

						- Ted

* Re: Bring together merge and rebase
  2017-12-26 17:49         ` Ævar Arnfjörð Bjarmason
@ 2017-12-26 19:44           ` Carl Baldwin
  2017-12-26 20:19             ` Paul Smith
  0 siblings, 1 reply; 44+ messages in thread
From: Carl Baldwin @ 2017-12-26 19:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Tue, Dec 26, 2017 at 06:49:56PM +0100, Ævar Arnfjörð Bjarmason wrote:
> New headers should be added after existing headers, but other than
> that it won't choke on it. See 4b2bced559 when the encoding header was
> added, this also passes most tests:
> 
>     diff --git a/commit.c b/commit.c
>     index cab8d4455b..cd2bafbaa0 100644
>     --- a/commit.c
>     +++ b/commit.c
>     @@ -1565,6 +1565,8 @@ int commit_tree_extended(const char *msg, size_t msg_len,
>             if (!encoding_is_utf8)
>                     strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
> 
>     +       strbuf_addf(&buffer, "replaces 0000000000000000000000000000000000000000\n");
>     +
>             while (extra) {
>                     add_extra_header(&buffer, extra);
>                     extra = extra->next;
> 
> Only "most" since of course this changes the sha1 of every commit git
> creates from what you get now.
> 
> > Even if core git code does not simply choke on it, I would like push and
> > pull to follow these pointers and transfer the history behind them. I
> > assumed that git would not do this today. I would also like gc to
> > preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> > it doesn't purge it from the history.
> 
> It won't pay any attention to them if "replaces" is something entirely
> new, what I was pointing out in my earlier reply is that you can simply
> *also* create the parent pointers to these no-op merge commits that hide
> away the previous history the "replaces" headers will be referencing.
> 
> The reason to do that is 100% backwards compatibility, and only
> needing to make minor UI changes to have this feature (to e.g. history
> walking), as opposed to needing to hack everything that now follows
> "parent" or constructs a commit graph.

Thank you for clarifying this. I have learned something.

> Sure, it could be opt in, be a new format etc. But you haven't
> explained why you think a feature like this would need to rely on an
> entirely new parent structure and side-DAG, as opposed to just the
> more minor changes I'm pointing out above, and which I think will give
> you what you need from a UX level.

I have not wrapped my head around it enough to convince myself that it
gives what I'm after. Let me spend a little more time with it to get a
feel for it.

Carl

* Re: Bring together merge and rebase
  2017-12-26 19:44           ` Carl Baldwin
@ 2017-12-26 20:19             ` Paul Smith
  2017-12-26 21:07               ` Carl Baldwin
  0 siblings, 1 reply; 44+ messages in thread
From: Paul Smith @ 2017-12-26 20:19 UTC (permalink / raw)
  To: Carl Baldwin, Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

On Tue, 2017-12-26 at 12:44 -0700, Carl Baldwin wrote:
> > Sure, it could be opt in, be a new format etc. But you haven't
> > explained why you think a feature like this would need to rely on
> > an entirely new parent structure and side-DAG, as opposed to just
> > the more minor changes I'm pointing out above, and which I think
> > will give you what you need from a UX level.
> 
> I have not wrapped my head around it enough to convince myself that
> it gives what I'm after. Let me spend a little more time with it to
> get a feel for it.

As someone working in an environment where we do a lot of rebasing and
very little merging, I read these proposals with interest.  I'm not
convinced that we would switch to using a "replaces"-type feature, but
I'm pretty sure that the "null-merge and rebase" trick described
previously would not be something we're interested in using.

Although "git log" doesn't follow these merges (unless requested), all
the graphical tools that are used to display history WOULD show all
those branches.  In a "replaces"-type environment I think the point is
that we would not want to see them (certainly not by default) as they
would be used mainly for deeper spelunking, but since they just seem
like normal merges I don't see any way to turn them off.

If "replaces" was a separate capability then it could be treated
differently by history browsing tools, and shown or not shown as
desired.  For example, a commit that had a "replaces" element could be
selected somehow and you could expand that set of commits that were
replaced, or something like that.

* Re: Bring together merge and rebase
  2017-12-26 18:04     ` Theodore Ts'o
@ 2017-12-26 20:31       ` Carl Baldwin
  2018-01-04 20:06         ` Martin Fick
  0 siblings, 1 reply; 44+ messages in thread
From: Carl Baldwin @ 2017-12-26 20:31 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Git Mailing List

On Tue, Dec 26, 2017 at 01:04:36PM -0500, Theodore Ts'o wrote:
> On Mon, Dec 25, 2017 at 06:16:40PM -0700, Carl Baldwin wrote:
> > At this point, you might wonder why I'm not proposing to simply add a
> > "change-id" to the commit object. The short answer is that the
> > "change-id" Gerrit uses in the commit messages cannot stand on its own.
> > It depends on data stored on the server which maintains a relationship
> > of commits to a review number and a linear ordering of commits within
> > the review (hopefully I'm not over simplifying this). The "replaces"
> > reference is an attempt to make something which can stand on its own. I
> > don't think we need to solve the problem of where to keep comments at
> > this point.
> 
> I strongly disagree, and one way to see that is by doing a real-life
> experiment.  Take a look at a gerrit change, which in my experience can
> have up to ten or twelve revisions, and strip out the comments, so all
> you get to look at is the half-dozen or more revisions.  How useful is
> it *really*?  How does it get used in practice?  What development
> problem does it help to solve?

I didn't mean to imply that we need to get along without the comments. I
was only pointing out that gerrit, github, other code review UIs have
already figured out how to store comments anchored to specific revisions
of files in the repository. I'm suggesting that we let them continue to
do that part while we take the first step of specifying how the
intermediate revisions are kept.

If the various code review servers adopted this then we'd have a client
side which could push up revisions for review to any of them. In
addition, they'd all get the collaborative functionality that I
described in my reply to your previous message.

What we get with this proposal is if I push up a review and that review
is changed by someone (maybe even me) outside of my original workspace,
my client gives me the tools to detect it and merge with it. If I try to
push over (clobber) that work then I get an error that the remote cannot
be fast-forwarded and I'm forced to fetch it and merge it.

I get this while using the rebase methodology I've grown to enjoy having
since using gerrit and I end up with a mainline history that looks
exactly the way I want it to.

> And when you say that it is a bug that the Gerrit Change-Id does not
> stand alone, consider that it can also be a *feature*.  If you keep
> all of this in the main repo, the number of commits can easily grow by
> an order of magnitude.  And these are commits that you have to keep
> forever, which means it slows down every subsequent git clone, git gc
> operation, git tag --contains search, etc.

I didn't say it was a bug; just that it is at odds with what I'm hoping
to do.

I agree that the number of commits in the repository will go up.
However, I think there will be ways to mitigate the costs.

The commits are not in the mainline history. So, I wouldn't expect a git
tag --contains or most other commands that traverse history to consider
them at all.

It could be possible to make the default git clone skip them all and
only fetch them on demand for specific changes.

> So what are the benefits, and what are the costs?  If the benefits
> were huge, then perhaps it would be worthwhile.  But if you lose a
> huge amount of the value because you are missing the *why* between the
> half-dozen to dozen past revisions of the commit, then is it really
> worth it to adopt that particular workflow?
> 
> It seems to me your argument is contrasting a "replaces" pointer
> versus the github PR.  But compared to the Gerrit solution, I don't
> think the "replaces" pointer proposal is as robust or as featureful.
> Also, please keep in mind that just because it's in core git doesn't
> guarantee that Github will support it.  As far as I know github has
> zero support for notes, for example.

What I propose is that gerrit and github could end up more robust,
featureful, and interoperable if they had this feature to build from.

With gerrit specifically, adopting this feature would make the "change"
concept richer than it is now because it could supersede the change-id
in the commit message and allow a change to evolve in a distributed
non-linear way with protection against clobbering work.

I have no intention to disparage either tool. I love them both. They've
both made my career better in different ways. I know there is no
guarantee that github, gerrit, or any other tool will do anything to
adopt this. But, I'm hoping they are reading this thread and that they
recognize how this feature can make them a little bit better and jump in
and help. I know it is a lot to hope for but I think it could be great
if it happened.

Carl

* Re: Bring together merge and rebase
  2017-12-26 20:19             ` Paul Smith
@ 2017-12-26 21:07               ` Carl Baldwin
  0 siblings, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-26 21:07 UTC (permalink / raw)
  To: Paul Smith; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Tue, Dec 26, 2017 at 03:19:02PM -0500, Paul Smith wrote:
> As someone working in an environment where we do a lot of rebasing and
> very little merging, I read these proposals with interest.  I'm not
> convinced that we would switch to using a "replaces"-type feature, but
> I'm pretty sure that the "null-merge and rebase" trick described
> previously would not be something we're interested in using.

In the near term, maybe. I'm still working with it to be sure I
understand it right.

> Although "git log" doesn't follow these merges (unless requested), all
> the graphical tools that are used to display history WOULD show all
> those branches.  In a "replaces"-type environment I think the point is
> that we would not want to see them (certainly not by default) as they
> would be used mainly for deeper spelunking, but since they just seem
> like normal merges I don't see any way to turn them off.

You've touched on some of my concerns with the null-merge approach. I
want the end result to be as clean as possible which I think is what
lures many to the rebase methodology in the first place.

> If "replaces" was a separate capability then it could be treated
> differently by history browsing tools, and shown or not shown as
> desired.  For example, a commit that had a "replaces" element could be
> selected somehow and you could expand that set of commits that were
> replaced, or something like that.

Exactly!

Carl

* Re: Bring together merge and rebase
  2017-12-26  1:28         ` Jacob Keller
@ 2017-12-26 23:30           ` Igor Djordjevic
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Djordjevic @ 2017-12-26 23:30 UTC (permalink / raw)
  To: Jacob Keller, Carl Baldwin
  Cc: Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Git Mailing List

Very interesting topic, just this one part I wanted to comment on:

On 26/12/2017 02:28, Jacob Keller wrote:
> 
> What about some way to take the reflog and turn it into a commit-based
> linkage and export that? Rather than tying it into the individual
> commit history, keep track of it outside the commit, possibly via
> something like notes, or some other mechanism.

This seems like the most useful approach, perhaps not touching the reflog
per se, but having some kind of "cherry-picked commits source" log
(where rebasing is a subset of cherry-picking): what Johannes
mentioned, a mapping between "old" and "new" commits. Notes might
fit in nicely, but I'm not competent to comment on that at the
moment.

For me, the most interesting use case is not even tied to code review
(thus no review comments to think about), but a situation where one
might be rebasing a set of downstream patches on top of an updated
upstream. A bug might slip through due to some upstream changes, even
where there are no conflicts and the test suite is executed regularly
(the test revealing the bug might not have been added yet).

In that situation, instead of just going back in "regular" history 
(single dimension) and eventually finding the offending (rebased) 
commit (its N-th rebased version, that is), it might be great to 
actually keep drilling down the "rebase history" now (second 
dimension), finding the exact rebase iteration / rebased commit 
version where the error first appeared.
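
To make that second dimension concrete, here is a rough sketch that
assumes commits carried the hypothetical `replaces` header discussed in
this thread (nothing in Git writes it today). The function name
`replaces_chain` is made up, and it only follows the first `replaces`
header of each commit:

```bash
#!/bin/bash
# Print a commit followed by each earlier iteration it replaces, newest
# first, by following the first (hypothetical) "replaces" header found
# in each commit object. Stops at a commit that has no such header.
replaces_chain() {
    local sha=$1
    while [ -n "$sha" ]; do
        printf '%s\n' "$sha"
        # Only look at the header block, which ends at the first blank line.
        sha=$(git cat-file commit "$sha" |
              awk '/^$/ { exit }
                   index($0, "replaces ") == 1 { print $2; exit }')
    done
}
```

Each listed iteration could then be checked out and tested in turn to
find the rebase iteration where the error first appeared.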

Carl, you described this well in your document[1], and Johannes 
provided a valuable first-hand experience[2] from working around the 
very same native Git limitation for years, mentioning using (fragile, 
costly and not very automatible) rebased commits message search to 
drill down the second dimension (rebase iterations), which seems to 
be the only possible approach at the moment, with "vanilla" Git, at 
least.

So this might be a much more interesting case, if the code review one is
less appropriate because review comments are also relevant to
commit rebase iterations (and should then be stored somewhere, too,
related to the corresponding commits, so as not to lose context).

Regards, Buga

p.s. "Merging rebase" and "shears.sh" script[3] seem to be orthogonal 
to this - really great on their own in improving rebase itself and 
making it smarter and much more powerful and useful, where I guess 
they would benefit from native Git "cherry-picked (rebased) commits 
iterations tracking" (old/source <> new/destination commit mapping), 
too, as would other Git tools.

[1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/
[2] https://public-inbox.org/git/20171226040843.h7o6txkrp6zlv7u5@glandium.org/T/#m2e5079488bed2968d4ea52a10051a06c06ff61e0
[3] https://github.com/git-for-windows/build-extra/blob/af9cff5005/shears.sh#L12-L18

* Re: Bring together merge and rebase
  2017-12-26  4:08 ` Mike Hommey
@ 2017-12-27  2:44   ` Carl Baldwin
  0 siblings, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-27  2:44 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Git Mailing List

On Tue, Dec 26, 2017 at 01:08:45PM +0900, Mike Hommey wrote:
> FWIW, your proposal has a lot in common with (but is not quite
> equivalent to) mercurial's obsolescence markers and changeset evolution
> features.

I've had experience with mercurial but not since about 2009. After
reading up a little bit on this changeset evolution feature, it looks
very much like what I'm proposing. Obsolescence markers look a lot like
replaces references except, as illustrated by this blog [1], they point
the other way! Hence, the illustrations confused me for a moment. It
seems more natural to embed the reference in the new commit pointing at
the old. That said, the illustrated direction of the arrows doesn't
really affect the usefulness of the idea.

His third example (#3-working-with-other-people) appears to be the kind
of collaboration that I'm trying to describe here. To quote the blog:

  In git or vanilla (no extension) mercurial, you would have to figure
  out that b’ and b” are two new versions of b and merge them. Changeset
  evolution detects that situation, marks b’ and b” as being divergent.
  It then suggests automatic resolution with a merge and preserves
  history.

This is the kind of thing that I had to deal with manually in gerrit. I
hadn't seen this feature in mercurial but I'm glad to know now there is
a precedent for it.

Carl

[1] https://blog.laurentcharignon.com/post/2016-02-02-changeset-evolution/

* Re: Bring together merge and rebase
  2017-12-25  3:52 ` Theodore Ts'o
  2017-12-26  1:16   ` Carl Baldwin
@ 2017-12-27  4:35   ` Carl Baldwin
  2017-12-27 13:35     ` Alexei Lozovsky
  1 sibling, 1 reply; 44+ messages in thread
From: Carl Baldwin @ 2017-12-27  4:35 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Git Mailing List

On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o wrote:
> Here's another potential use case.  The stable kernels (e.g., 3.18.y,
> 4.4.y, 4.9.y, etc.) have cherry picks from the upstream kernel,
> and this is handled by putting in the commit body something like this:
> 
>     [ Upstream commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe ]

I think replaces could apply to cherry picks like this too. The more I
think about it, I actually think that replaces isn't a bad name for it
in the cherry pick context. When you cherry pick a commit, you create a
new commit that is derived from it and stands in for or replaces it in
the new context. It is a stretch but I don't think it is that bad.

You can tell that it is a cherry pick because the referenced commit's
history is not reachable in the current context.

Though we could consider some different names like "derivedfrom",
"obsoletes", "succeeds", "supersedes", "supplants".

> ----
> 
> And here's yet another use case.  For internal Google kernel
> development, we maintain a kernel that has a large number of patches
> on top of a kernel version.  When we backport an upstream fix (say,
> one that first appeared in the 4.12 version of the upstream kernel),
> we include a line in the commit body that looks like this:
> 
> Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5
> 
> This is useful, because when we switch to use a newer upstream kernel,
> we need to make sure we can account for all patches that were built on
> top of the 3xx kernel (which might have been using 4.10, for the sake
> of argument), to the 4xx kernel series (which might be using 4.15 ---
> the version numbers have been changed to protect the innocent).  This
> means going through each and every patch that was on top of the 3xx
> kernel, and if it has a line such as "Upstream-4.12-SHA1", we know
> that it will already be included in a 4.15 based kernel, so we don't
> need to worry about carrying that patch forward.

Are 3xx and 4xx internal version numbers? If I understand correctly, in
your example, 3xx is the heavily patched internal kernel based on 4.10
and 4xx is the internal patched version of 4.15. I think I'm following
so far.

Let's say that you used a "replaces" reference instead of your
"Upstream-4.12-SHA1" reference. The only piece of metadata that is
missing is the "4.12" of your string. However, you could replicate this
with some set arithmetic. If the sha1 referred to by "replaces" exists
in the set of commits reachable from 4.15 then you've answered the same
question.
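
A rough sketch of that set arithmetic, under assumptions: the input
format (one "patch replaced" pair per line), the function name
`classify_replaced`, and the labels it prints are all invented for
illustration, and the `replaces` header itself is still hypothetical.

```bash
#!/bin/bash
# Read "<patch sha> <replaced sha>" pairs on stdin. For each pair, report
# whether the replaced commit is a member of the set of commits listed in
# the file given as $1 (one sha per line, e.g. the output of git rev-list).
classify_replaced() {
    local reachable_file=$1 patch replaced
    while read -r patch replaced; do
        # -F fixed string, -x whole-line match, -q quiet: a membership test.
        if grep -Fxq -- "$replaced" "$reachable_file"; then
            echo "$patch upstream"   # already contained in the new base
        else
            echo "$patch carry"      # not reachable, still needs carrying
        fi
    done
}
```

The reachable set could come from something like
`git rev-list v4.15 > reachable.txt` (the tag name is an assumption
here). For a single commit, `git merge-base --is-ancestor <sha> v4.15`
answers the same membership question directly.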

> In other cases, we might decide that the patch is no longer needed.
> It could be because the patch has already been included upstream, in
> which case we might check in a commit with an empty patch body, but
> whose header contains something like this in the 4xx kernel:
> 
> Origin-3xx-SHA1: fe546bdfc46a92255ebbaa908dc3a942bc422faa
> Upstream-Dropped-4.11-SHA1: d90dc0ae7c264735bfc5ac354c44ce2e

So, the first reference is the old commit that patched the 3xx series?
What is the second reference? What is "4.11" indicating? Is that the
patch that was included in the upstream kernel that obsoleted your 3xx
patch?

If I understood that correctly, you could use a "replaces" reference for
the first line and the second line would still have to be included as a
separate header in your commit message? Does this mean "replaces" is not
useful in your case? I don't think so.
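
For the part that stays in the commit message, here is a rough sketch
of reading such lines back out. The regular expression, the function
name and the example subject line are assumptions made for
illustration; the two trailer lines are the ones quoted above. Git
itself ships `git interpret-trailers --parse` for the same job.

```python
import re

# A trailer line is assumed to look like "Key: value".
TRAILER = re.compile(r"^([A-Za-z0-9][A-Za-z0-9.-]*):\s*(\S.*)$")


def parse_trailers(message):
    """Return {key: value} from the last paragraph of a commit message.

    The last paragraph only counts as a trailer block when every line in
    it matches "Key: value"; otherwise an empty dict is returned.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}
    trailers = {}
    for line in paragraphs[-1].splitlines():
        match = TRAILER.match(line.strip())
        if not match:
            return {}
        trailers[match.group(1)] = match.group(2).strip()
    return trailers


message = """Drop foo workaround

The fix is already included upstream.

Origin-3xx-SHA1: fe546bdfc46a92255ebbaa908dc3a942bc422faa
Upstream-Dropped-4.11-SHA1: d90dc0ae7c264735bfc5ac354c44ce2e
"""
trailers = parse_trailers(message)
print(trailers["Origin-3xx-SHA1"])
```

With a "replaces" reference carrying the first value, only the second
key would still need to live in the message body.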

> Or we could decide that the commit is no longer no longer needed ---

no longer no longer needed? Is this a double negative indicating that it
is needed again? Or, is it a mistake?

> perhaps because the relevant subsystem was completely rewritten and
> the functionality was added in a different way.  Then we might have
> just have an empty commit with an explanation of why the commit is no
> longer needed and the commit body would have the metadata:
> 
> Origin-Dropped-3xx-SHA1: 26f49fcbb45e4bc18ad5b52dc93c3afe

The metadata in this reference indicates that it was dropped since 3xx.
Doesn't the empty body (and maybe a commit message saying dropping a
patch) indicate this if a "replaces" pointer were used instead? The
3xx part of the metadata could be derived again by set arithmetic.

> Or perhaps the commit is still needed, and for various reasons the
> commit was never upstreamed; perhaps because it's only useful for
> Google-specific hardware, or the patch was rejected upstream.  The we
> will have a cherry-pick that would include in the body:
> 
> Origin-3xx-SHA1: 8f3b6df74b9b4ec3ab615effb984c1b5

Replaces reference and set arithmetic.

> (Note: all commits that are added in the rebase workflow, even the
> empty commits that just have the Origin-Dropped-3xx-SHA1 or
> Upstream-Droped-4.11-SHA1 headers, are patch reviewed through Gerrit,
> so we have an audited, second-engineer review to make sure each commit
> in the 3xx kernel that Google had been carrying had the correct
> disposition when rebasing to the 4xx kernel.)

This is great! I designed a strikingly similar workflow for local
patches to Openstack Neutron about four years ago. Each time we moved
forward to a new version of upstream, we went through a very similar
process. I don't have access to those scripts any longer but here is
what I recall.

With each new upstream version, we'd generate a list of commits we
created locally using git. I recall it being a fairly simple set
difference between the upstream tag and our downstream tag. I wrote
scripts that would take each of them and proposed them as new gerrit
reviews against the new upstream. I made sure to keep the same gerrit
change-id as the old one in the new review. Gerrit allowed this because
it was a new branch for each new revision and gerrit allows the same
change-id in different reviews as long as the branch is different. I'm
not sure if you kept the same change-id or not but it was very useful to
me. I could see the history of the patch applied to various upstream
versions with the click of a link in gerrit.

My automated script would cherry pick but wouldn't attempt to resolve
any conflicts. Instead, it would commit the conflict markers exactly as
they occurred and flag the review as conflicted. A human would have to
come along and recreate the cherry pick in their own workspace to
resolve the conflicts. Then they'd post the result to the same gerrit
review as the second patch set. This way, the resolution of the
conflicts could easily be reviewed in the tool.

We'd run our CI against each and every one also to be sure that we
weren't breaking it.

Sometimes, patches were rendered obsolete by something upstream. In this
case, we would close the gerrit review without merging indicating that
the patch was dropped and the reason.

For patches that we proposed upstream, we'd use the same gerrit id in
the upstream review. This helped us tie them together and identify them
as equivalent patches.

When all of the gerrit reviews for all of the patches were merged, we'd
merge the result to master. I recall doing something special for this
final merge but I don't recall exactly what it was. Maybe it was to use
the "theirs" strategy or something like that.

I remember having a script called "delinearize" which would actually
find the minimum chain of preceding patches that had to come before a
given one in order for it to rebase cleanly to the upstream base. Most
of the time, the patches rebased cleanly to the upstream. This meant
that they really didn't depend on any of the patches that came before
them. This was very useful because it allowed us humans to switch
between a bunch of mostly independent gerrit reviews for the newly
cherry-picked patches and do a lot of things in parallel. We could merge
them in any order as long as they went through the entire review
process.
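
Since the original script is gone, the following is only a guess at
the idea, not a reconstruction. It assumes that a patch needs an
earlier one whenever both touch the same file, which is a crude
stand-in for "does not rebase cleanly without it". The patch names,
file names and the `minimal_chain` function are all hypothetical.

```python
def minimal_chain(patches, target):
    """Return the earlier patches `target` transitively depends on, in order.

    `patches` is an ordered list of (name, files_touched) pairs. A patch
    is assumed to depend on an earlier patch when they share a file.
    """
    index = {name: i for i, (name, _) in enumerate(patches)}
    needed = set()
    stack = [index[target]]
    while stack:
        i = stack.pop()
        files = patches[i][1]
        # Only patches that come before patch i can be prerequisites.
        for j in range(i):
            if j not in needed and files & patches[j][1]:
                needed.add(j)
                stack.append(j)
    return [patches[j][0] for j in sorted(needed)]


patches = [
    ("p1", {"agent.py"}),
    ("p2", {"db.py"}),
    ("p3", {"agent.py", "api.py"}),
    ("p4", {"api.py"}),
    ("p5", {"docs.txt"}),
]
print(minimal_chain(patches, "p4"))  # p4 needs p3 (api.py), which needs p1
print(minimal_chain(patches, "p5"))  # independent of everything before it
```

A patch with an empty chain can be reviewed and merged on its own,
which is what made the parallel work possible.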

> The point is that for this much more complex, real-world workflow, we
> need much *more* metadata than a simple "Replaces" metadata.  (And we
> also have other metadata --- for example, we have a "Tested: " trailer
> that explains how to test the commit, or which unit test can and
> should be used to test this commit, combined with a link to the test
> log in our automated unit tester that has the test run, and a
> "Rebase-Tested-4xx: " trailer that might just have the URL to the test
> log when the commit was rebased since the testing instructions in the
> Tested: trailer is still relevant.)

So far, I haven't seen that it is that much more complex. I've actually
had experience doing practically the same thing. Yes, it was a complex
process but we didn't need much more than the gerrit-id in the reviews
and an external dashboard listing the patches out with a link to each
review. Even the dashboard was pretty much obsolete once the whole
process was declared done for a given upstream revision.

Your "Tested" trailer sounds completely orthogonal. I'm not sure that
showing a need for other orthogonal metadata is necessarily a good
argument against my proposal. It doesn't seem relevant.

> And since this metadata is not really needed by the core git
> machinery, we just use text trailers in the commit body; it's not hard
> to write code which parses this out of the git commit.
> 
> > Various git commands will have to learn how to handle this kind of
> > history. For example, things like fetch, push, gc, and others that
> > move history around and clean out orphaned history should treat
> > anything reachable through `replaces` pointers as precious. Log and
> > related history commands may need new switches to traverse the history
> > differently in different situations.
> 
> I'd encourage you to think very hard about how exactly "git log" and
> "gitk" might actually deal with these links.  In the Google kernel
> development use cases, we use different repos for the 3xx and 4xx
> kernels.  It would be possible to make hot links for the
> Original-3xx-SHA1: trailers, but you couldn't do it using gitk.  It
> would actually have to be a completely new tool.  (And we do have new
> tools, most especially a dashboard so we can keep track of how many
> commits in the 3xx kernel still have to be rebased to the 4xx kernel,
> or can be confirmed to be in the upstream kernel, or can be confirmed
> to be dropped.  We have a *large* number of patches that we carry, so
> it's a multi-month effort involving a large number of engineers
> working together to do a kernel rebase operation from a 4.x upstream
> kernel to a 4.y upstream kernel.  So having a dashboard is useful
> because we can see whether a particular subsystem team is ahead or
> behind the curve in terms of handling those commits which are their
> responsibility.)
> 
> My experience, from seeing these much more complex use cases ---
> starting with something as simple as the Linux Kernel Stable Kernel
> Series, and extending to something much more complex such as the
> workflow that is used to support a Google Kernel Rebase, is that using
> just a simple extra "Replaces" pointer in the commit header is not
> nearly expressive enough.  And, if you make it a core part of the
> commit data structure, there are all sorts of compatibility headaches
> with older versions of git that wouldn't know about it.  And if it

The more I think about this, the less I worry. Be sure that you're using 

> then turns out it's not sufficient more the more complex workflows
> *anyway*, maybe adding a new "replace" pointer in the core git data
> structures isn't worth it.  It might be that just keeping such things
> as trailers in the commit body might be the better way to go.

It doesn't need to be everything to everyone to be useful. I hope to
show in this thread that it is useful enough to be a compelling addition
to git. I think I've also shown that it could be used as a part of your
more complex workflow. Maybe even a bigger part of it than you had
realized.

Carl

* Re: Bring together merge and rebase
  2017-12-27  4:35   ` Carl Baldwin
@ 2017-12-27 13:35     ` Alexei Lozovsky
  2017-12-28  5:23       ` Carl Baldwin
  0 siblings, 1 reply; 44+ messages in thread
From: Alexei Lozovsky @ 2017-12-27 13:35 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Theodore Ts'o, Git Mailing List

On Dec 27, 2017, at 06:35, Carl Baldwin <carl@ecbaldwin.net> wrote:
> 
> On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o wrote:
>> 
>> My experience, from seeing these much more complex use cases ---
>> starting with something as simple as the Linux Kernel Stable Kernel
>> Series, and extending to something much more complex such as the
>> workflow that is used to support a Google Kernel Rebase, is that using
>> just a simple extra "Replaces" pointer in the commit header is not
>> nearly expressive enough.  And, if you make it a core part of the
>> commit data structure, there are all sorts of compatibility headaches
>> with older versions of git that wouldn't know about it.  And if it
> 
> The more I think about this, the less I worry. Be sure that you're using 
> 
>> then turns out it's not sufficient more the more complex workflows
>> *anyway*, maybe adding a new "replace" pointer in the core git data
>> structures isn't worth it.  It might be that just keeping such things
>> as trailers in the commit body might be the better way to go.
> 
> It doesn't need to be everything to everyone to be useful. I hope to
> show in this thread that it is useful enough to be a compelling addition
> to git. I think I've also shown that it could be used as a part of your
> more complex workflow. Maybe even a bigger part of it than you had
> realized.

I think the reasoning behind Theo's words is that it would be better to
first implement the commit relationship tracking as an add-in which uses
commit messages for data storage, then evaluate its usefulness when it's
actually available (including extensions to gitk and stuff to support the
new metadata), and then it could be moved into core git data structures,
when it has proven itself useful. It's not a trivial feature which warrants
immediate addition to git and its design can change when faced with real-
world use-cases, so it would be bad for compatibility to rush its addition.
Storage location for metadata seems to be an implementation detail which
could be technically changed more or less easily. But it's much easier to
ignore a trailer in commit message in the favor of a commit header field
than to replace a deprecated commit header field with a better one, which
could cause massive headache for all git repositories in the world.

* Re: Bring together merge and rebase
  2017-12-27 13:35     ` Alexei Lozovsky
@ 2017-12-28  5:23       ` Carl Baldwin
  0 siblings, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2017-12-28  5:23 UTC (permalink / raw)
  To: Alexei Lozovsky; +Cc: Theodore Ts'o, Git Mailing List

On Wed, Dec 27, 2017 at 03:35:58PM +0200, Alexei Lozovsky wrote:
> I think the reasoning behind Theo's words is that it would be better
> to first implement the commit relationship tracking as an add-in which
> uses commit messages for data storage, then evaluate its usefulness
> when it's actually available (including extensions to gitk and stuff
> to support the new metadata), and then it could be moved into core git
> data structures, when it has proven itself useful. It's not a trivial
> feature which warrants immediate addition to git and its design can
> change when faced with real-world use-cases, so it would be bad for
> compatibility to rush its addition. Storage location for metadata
> seems to be an implementation detail which could be technically
> changed more or less easily. But it's much easier to ignore a trailer
> in commit message in the favor of a commit header field than to
> replace a deprecated commit header field with a better one, which
> could cause massive headache for all git repositories in the world.

Yeah, this is a point that everyone is eager to make instead of really
trying to understand what I'm trying to do and offering constructive
suggestions. It's not that I'm not listening. I'm not really concerned
about headers vs trailers or the aesthetics of the whole thing as much as
I'm concerned about how the server / client interaction will be. I worry
that anything that I come up with that isn't implemented in the regular
git core push and fetch will end up being awkward or end up needing to
reimplement a lot of what's already in git. But, maybe it just needs a
little more thought. Let me try to think through it...

Imagine John posts a new change up for review to a review server. The
current master points at commit A and so he grabs it and drafts his
first proposal, B1.

    digraph history {
        B1 -> A
    }

Soon after posting, he notices a couple of simple errors and uses the
web UI to correct them. This creates B2. (Dashed edges are replaces
references).

    digraph history {
        B1 -> A
        B2 -> A
        B2 -> B1 [ style="dashed"; ]
    }

Anna reviews B2 and finds a small nit. She asks John if she can just fix
it and push up a new review. He agrees. She pushes up B3.

    digraph history {
        B1 -> A
        B2 -> A
        B3 -> A
        B2 -> B1 [ style="dashed"; ]
        B3 -> B2 [ style="dashed"; ]
    }

John goes back to his workspace and does a little more work on B. He
creates the fourth revision, B4 but since he didn't update his workspace
with the other two most recent revisions, his new revision is derived
from B1.

    digraph history {
        B1 -> A
        B2 -> A
        B3 -> A
        B4 -> A
        B2 -> B1 [ style="dashed"; ]
        B3 -> B2 [ style="dashed"; ]
        B4 -> B1 [ style="dashed"; ]
    }

John then pushes to the server. I imagined that would be a command
similar to what gerrit does.

    git push codereview refs/for/master

At this point, I want a couple of things to happen. First, the server
should be able to match the new revision to the change by following the
replaces references to the commits it already has. Then it should
recognize that this is not a fast forward update to the change and
reject it on those grounds.
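
A rough sketch of that server-side check, using the B1..B4 graph from
above. The function names and return strings are hypothetical; the
only inputs are the dashed "replaces" edges and the revisions the
server already holds for the change.

```python
def predecessors(replaces, revision):
    """Return every revision reachable from `revision` via replaces pointers."""
    seen = set()
    stack = list(replaces.get(revision, []))
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(replaces.get(rev, []))
    return seen


def check_push(replaces, server_revisions, current_head, new_revision):
    """Classify a pushed revision relative to a change the server knows."""
    older = predecessors(replaces, new_revision)
    if not older & server_revisions:
        return "new change"      # no replaces pointer lands on this change
    if current_head in older:
        return "fast-forward"    # builds on the latest revision
    return "rejected"            # derived from a stale revision


# Dashed edges from the graphs above: B2 -> B1, B3 -> B2, B4 -> B1.
replaces = {"B2": ["B1"], "B3": ["B2"], "B4": ["B1"]}
server_revisions = {"B1", "B2", "B3"}

print(check_push(replaces, server_revisions, "B3", "B4"))
```

If John later merges B3 into his work and pushes a revision that
replaces both B3 and B4, the same check classifies it as a fast
forward.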

After that, John needs to be able to fetch B2 and B3 so that his local
client can perform a merge. I guess John needs to know what change he's
trying to fetch. In this case, he needs to fetch both B2 and B3 in order
to get the full history graph of the change. The problem I see here is that
today's git fetch would see B2 and B3 as unrelated branches. There could
be any number of them to fetch. So, how does he ask for everything
related to the change? Does he do a wild card or something?

    git fetch codereview refs/changes/123/*

Or does he just fetch all refs (this could be many on a busy review
server)? Or do we need to do something out of band to discover the list
of references that need to be fetched?

I've been thinking out loud a bit. I guess this could be a path forward.
I guess to make gc happy, I've got to keep around a ref pointing at each
new revision so that it doesn't get garbage collected.

Carl

* Re: Bring together merge and rebase
  2017-12-24 14:13       ` Alexei Lozovsky
@ 2018-01-04 15:44         ` Johannes Schindelin
  0 siblings, 0 replies; 44+ messages in thread
From: Johannes Schindelin @ 2018-01-04 15:44 UTC (permalink / raw)
  To: Alexei Lozovsky
  Cc: Carl Baldwin, Ævar Arnfjörð Bjarmason,
	Git Mailing List

Hi,

On Sun, 24 Dec 2017, Alexei Lozovsky wrote:

> On Dec 24, 2017, at 01:01, Johannes Schindelin wrote:
> > 
> > Hi Carl,
> > 
> > On Sat, 23 Dec 2017, Carl Baldwin wrote:
> > 
> >> I imagine that a "git commit --amend" would also insert a "replaces"
> >> reference to the original commit but I failed to mention that in my
> >> original post.
> > 
> > And cherry-pick, too, of course.
> 
> Why would it?

Because that's the command you use if you perform an interactive rebase
"manually". Or if you need to split a topic branch into two.

Ciao,
Johannes

* Re: Bring together merge and rebase
  2017-12-26  8:40         ` Jacob Keller
@ 2018-01-04 19:19           ` Martin Fick
  2018-01-05  0:31             ` Martin Fick
  2018-01-05  5:09             ` Carl Baldwin
  0 siblings, 2 replies; 44+ messages in thread
From: Martin Fick @ 2018-01-04 19:19 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Carl Baldwin, Theodore Ts'o, Git Mailing List

On Tuesday, December 26, 2017 12:40:26 AM Jacob Keller 
wrote:
> On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin 
<carl@ecbaldwin.net> wrote:
> >> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin 
<carl@ecbaldwin.net> wrote:
> >> A bit of a tangent here, but a thought I didn't wanna
> >> lose: In the general case where a patch was rebased
> >> and the original parent pointer was changed, it is
> >> actually quite hard to show a diff of what changed
> >> between versions.
> 
> My biggest gripes are that the gerrit web interface
> doesn't itself do something like this (and jgit does not
> appear to be able to generate combined diffs at all!)

I believe it now does; a presentation was given at the 
Gerrit User summit in London describing this work.  It would 
indeed be great if git could do this also!

-Martin 



-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


* Re: Bring together merge and rebase
  2017-12-23 23:01     ` Johannes Schindelin
  2017-12-24 14:13       ` Alexei Lozovsky
  2017-12-25 23:43       ` Carl Baldwin
@ 2018-01-04 19:49       ` Martin Fick
  2 siblings, 0 replies; 44+ messages in thread
From: Martin Fick @ 2018-01-04 19:49 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Carl Baldwin, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sunday, December 24, 2017 12:01:38 AM Johannes Schindelin 
wrote:
> Hi Carl,
> 
> On Sat, 23 Dec 2017, Carl Baldwin wrote:
> > I imagine that a "git commit --amend" would also insert
> > a "replaces" reference to the original commit but I
> > failed to mention that in my original post.
> 
> And cherry-pick, too, of course.
> 
> Both of these examples hint at a rather huge urge of some
> users to turn this feature off because the referenced
> commits may very well be throw-away commits in their
> case, making the newly-recorded information completely
> undesired.
> 
> Example: I am working on a topic branch. In the middle, I
> see a typo. I commit a fix, continue to work on the topic
> branch. Later, I cherry-pick that commit to a separate
> topic branch because I really don't think that those two
> topics are related. Now I definitely do not want a
> reference of the cherry-picked commit to the original
> one: the latter will never be pushed to a public
> repository, and gc'ed in a few weeks.
> 
> Of course, that is only my wish, other users in similar
> situations may want that information. Demonstrating that
> you would be better served with an opt-in feature that
> uses notes rather than a baked-in commit header.

I think what you are highlighting is not when to track this, 
but rather when to share this tracking.  In my local repo, I 
would definitely want to know that I cherry-picked this from 
elsewhere, it helps me understand what I have done later 
when I look back at old commits and branches that need to 
potentially be thrown away.  But I agree you may not want to 
share these publicly.

I am not sure what the right formula is, for when to share 
these pointers publicly, but it seems like it might be that 
whenever you push something, it should push along any 
references to amended commits that were publicly available 
already.  I am not sure how to track that, but I suspect it 
is a subset of the union of commits you have fetched, and 
commits you have pushed (i.e. you got it from elsewhere, or 
you created it and already shared it with others)?  Maybe it 
should also include any commits reachable by advertisements 
to places you are pushing to (in case it got shared some 
other way)?
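
One way to read that formula as set arithmetic is sketched below. It
is an illustration only: the function and variable names are
hypothetical, and how git would actually record what has been fetched
or pushed is left open.

```python
def shareable_predecessors(predecessor_of, outgoing, fetched, pushed,
                           advertised=frozenset()):
    """Pick the predecessor pointers that are safe to publish with a push.

    A pointer is kept only when the commit it names is already public:
    it was fetched, previously pushed, or advertised by the remote.
    """
    public = set(fetched) | set(pushed) | set(advertised)
    return {
        commit: old
        for commit, old in predecessor_of.items()
        if commit in outgoing and old in public
    }


predecessor_of = {
    "c2": "c1",   # c1 was pushed for review earlier, then amended into c2
    "t2": "t1",   # t1 is a throw-away local commit, cherry-picked into t2
}
print(shareable_predecessors(predecessor_of, outgoing={"c2", "t2"},
                             fetched=set(), pushed={"c1"}))
```

The pointer from t2 to the throw-away commit stays in the local repo,
while the pointer to the already shared c1 travels with the push.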

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

* Re: Bring together merge and rebase
  2017-12-26  1:16   ` Carl Baldwin
  2017-12-26  1:47     ` Jacob Keller
  2017-12-26 18:04     ` Theodore Ts'o
@ 2018-01-04 19:54     ` Martin Fick
  2018-01-05  4:08       ` Carl Baldwin
  2018-01-05 20:14       ` Junio C Hamano
  2 siblings, 2 replies; 44+ messages in thread
From: Martin Fick @ 2018-01-04 19:54 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Theodore Ts'o, Git Mailing List

On Monday, December 25, 2017 06:16:40 PM Carl Baldwin wrote:
> On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o 
wrote:
> Look at what happens in a rebase type workflow in any of
> the following scenarios. All of these came up regularly
> in my time with Gerrit.
> 
>     1. Make a quick edit through the web UI then later
> work on the change again in your local clone. It is easy
> to forget to pull down the change made through the UI
> before starting to work on it again. If that happens, the
> change made through the UI will almost certainly be
> clobbered.
> 
>     2. You or someone else creates a second change that is
> dependent on yours and works on it while yours is still
> evolving. If the second change gets rebased with an older
> copy of the base change and then posted back up for
> review, newer work in the base change has just been
> clobbered.
> 
>     3. As a reviewer, you decide the best way to explain
> how you'd like to see something done differently is to
> make the quick change yourself and push it up. If the
> author fails to fetch what you pushed before continuing
> onto something else, it gets clobbered.
> 
>     4. You want to collaborate on a single change with
> someone else in any way and for whatever reason. As soon
> as that change starts hitting multiple work spaces, there
> are synchronization issues that currently take careful
> manual intervention.

These scenarios seem to come up most for me at Gerrit hack-
a-thons where we collaborate a lot in short time spans on 
changes.  We (the Gerrit maintainers) too have wanted and 
sometimes discussed ways to track the relation of "amended" 
commits (which is generally what Gerrit patchsets are).  We 
also concluded that some sort of parent commit pointer was 
needed, although parent is somewhat the wrong term since 
that already means something in git.  Rather, maybe some 
"predecessor" type of term would be better, maybe 
"antecedent", but "amended-commit" pointer might be best?

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


* Re: Bring together merge and rebase
  2017-12-26 20:31       ` Carl Baldwin
@ 2018-01-04 20:06         ` Martin Fick
  2018-01-05  5:06           ` Carl Baldwin
  0 siblings, 1 reply; 44+ messages in thread
From: Martin Fick @ 2018-01-04 20:06 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Theodore Ts'o, Git Mailing List

On Tuesday, December 26, 2017 01:31:55 PM Carl Baldwin 
wrote:
...
> What I propose is that gerrit and github could end up more
> robust, featureful, and interoperable if they had this
> feature to build from.

I agree (assuming we come up with a well defined feature)

> With gerrit specifically, adopting this feature would make
> the "change" concept richer than it is now because it
> could supersede the change-id in the commit message and
> allow a change to evolve in a distributed non-linear way
> with protection against clobbering work.

We (the Gerrit maintainers) would like changes to be able to 
evolve non-linearly so that we can eventually support 
distributed Gerrit reviews, and the amended-commit pointer 
is one way I have thought to resolve this.

> I have no intention to disparage either tool. I love them
> both. They've both made my career better in different
> ways. I know there is no guarantee that github, gerrit,
> or any other tool will do anything to adopt this. But,
> I'm hoping they are reading this thread and that they
> recognize how this feature can make them a little bit
> better and jump in and help. I know it is a lot to hope
> for but I think it could be great if it happened.

We (the Gerrit maintainers) do recognize it, and I am glad 
that someone is pushing for solutions in this space.  I am 
not sure what the right solution is, and how to modify 
workflows to deal better with this.  I do think that starting 
by making your local repo track pointers to amended-commits, 
likely with various git hooks and notes (as also proposed by 
Johannes Schindelin), would be a good start.   With that in 
place, then you can attack various specific workflows.

If you want to then attack the Gerrit workflow, it would be 
good if you could prevent pushing new patchsets that are 
amended versions of patchsets that are out of date.  While 
it would be great if Gerrit could reject such pushes, I 
wonder if to start, git could detect it and prevent the push 
in this situation?  Could a git push hook analyze the ref 
advertisements and figure this out (all the patchsets are in 
the advertisement)?  Can a git hook look at the ref 
advertisement?
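
Whether a hook can see the advertisement is the open question here.
The sketch below sidesteps it by assuming the advertised refs are
already available as a mapping from ref name to commit id, and relies
on Gerrit's refs/changes/<nn>/<change>/<patchset> naming. The function
names and the short commit ids are hypothetical.

```python
import re

CHANGE_REF = re.compile(r"^refs/changes/\d+/(\d+)/(\d+)$")


def latest_patchsets(advertised):
    """Map change number -> (patchset number, sha) of the newest patchset."""
    latest = {}
    for ref, sha in advertised.items():
        match = CHANGE_REF.match(ref)
        if not match:
            continue
        change, patchset = int(match.group(1)), int(match.group(2))
        if change not in latest or patchset > latest[change][0]:
            latest[change] = (patchset, sha)
    return latest


def is_stale_amend(advertised, amended_sha):
    """True if amended_sha is an advertised patchset that is not the newest."""
    latest = latest_patchsets(advertised)
    for ref, sha in advertised.items():
        match = CHANGE_REF.match(ref)
        if match and sha == amended_sha:
            return latest[int(match.group(1))][1] != amended_sha
    return False  # unknown commit: nothing to compare against


advertised = {
    "refs/heads/master": "aaa",
    "refs/changes/23/123/1": "b1",
    "refs/changes/23/123/2": "b2",
    "refs/changes/23/123/3": "b3",
}
print(is_stale_amend(advertised, "b1"))  # amended from patchset 1, but 3 exists
print(is_stale_amend(advertised, "b3"))
```

The missing input is the amended-commit pointer itself, which is why
tracking it locally comes first.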

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

* Re: Bring together merge and rebase
  2018-01-04 19:19           ` Martin Fick
@ 2018-01-05  0:31             ` Martin Fick
  2018-01-05  5:09             ` Carl Baldwin
  1 sibling, 0 replies; 44+ messages in thread
From: Martin Fick @ 2018-01-05  0:31 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Carl Baldwin, Theodore Ts'o, Git Mailing List

> On Jan 4, 2018 11:19 AM, "Martin Fick" 
<mfick@codeaurora.org> wrote:
> > On Tuesday, December 26, 2017 12:40:26 AM Jacob Keller
> > 
> > wrote:
> > > On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin
> > 
> > <carl@ecbaldwin.net> wrote:
> > > >> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin
> > 
> > <carl@ecbaldwin.net> wrote:
> > > >> A bit of a tangent here, but a thought I didn't
> > > >> wanna
> > > >> lose: In the general case where a patch was rebased
> > > >> and the original parent pointer was changed, it is
> > > >> actually quite hard to show a diff of what changed
> > > >> between versions.
> > > 
> > > My biggest gripes are that the gerrit web interface
> > > doesn't itself do something like this (and jgit does
> > > not
> > > appear to be able to generate combined diffs at all!)
> > 
> > I believe it now does, a presentation was given at the
> > Gerrit User summit in London describing this work.  It
> > would indeed be great if git could do this also!


On Thursday, January 04, 2018 04:02:40 PM Jacob Keller 
wrote:
> Any chance slides or a recording was posted anywhere? I'm
> quite interested in this topic.

Slides and video + transcript here:

https://gerrit.googlesource.com/summit/2017/+/master/sessions/new-in-2.15.md

Watch the part after the backend improvements,

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation


* Re: Bring together merge and rebase
  2018-01-04 19:54     ` Martin Fick
@ 2018-01-05  4:08       ` Carl Baldwin
  2018-01-05 20:14       ` Junio C Hamano
  1 sibling, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2018-01-05  4:08 UTC (permalink / raw)
  To: Martin Fick; +Cc: Theodore Ts'o, Git Mailing List

On Thu, Jan 04, 2018 at 12:54:00PM -0700, Martin Fick wrote:
> On Monday, December 25, 2017 06:16:40 PM Carl Baldwin wrote:
> > On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o 
> wrote:
> > Look at what happens in a rebase type workflow in any of
> > the following scenarios. All of these came up regularly
> > in my time with Gerrit.
> > 
> >     1. Make a quick edit through the web UI then later
> > work on the change again in your local clone. It is easy
> > to forget to pull down the change made through the UI
> > before starting to work on it again. If that happens, the
> > change made through the UI will almost certainly be
> > clobbered.
> > 
> >     2. You or someone else creates a second change that is
> > dependent on yours and works on it while yours is still
> > evolving. If the second change gets rebased with an older
> > copy of the base change and then posted back up for
> > review, newer work in the base change has just been
> > clobbered.
> > 
> >     3. As a reviewer, you decide the best way to explain
> > how you'd like to see something done differently is to
> > make the quick change yourself and push it up. If the
> > author fails to fetch what you pushed before continuing
> > onto something else, it gets clobbered.
> > 
> >     4. You want to collaborate on a single change with
> > someone else in any way and for whatever reason. As soon
> > as that change starts hitting multiple work spaces, there
> > are synchronization issues that currently take careful
> > manual intervention.
> 
> These scenarios seem to come up most for me at Gerrit hack-
> a-thons where we collaborate a lot in short time spans on 
> changes.  We (the Gerrit maintainers) too have wanted and 
> sometimes discussed ways to track the relation of "amended" 
> commits (which is generally what Gerrit patchsets are).  We 
> also concluded that some sort of parent commit pointer was 
> needed, although parent is somewhat the wrong term since 
> that already means something in git.  Rather, maybe some 
> "predecessor" type of term would be better, maybe 
> "antecedent", but "amended-commit" pointer might be best?

I like "replaces" as I have proposed or "supersedes". "predecessor" also
seems pretty good. I may add that to my list of favorites.

Carl

* Re: Bring together merge and rebase
  2018-01-04 20:06         ` Martin Fick
@ 2018-01-05  5:06           ` Carl Baldwin
  0 siblings, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2018-01-05  5:06 UTC (permalink / raw)
  To: Martin Fick; +Cc: Theodore Ts'o, Git Mailing List

On Thu, Jan 04, 2018 at 01:06:27PM -0700, Martin Fick wrote:
> On Tuesday, December 26, 2017 01:31:55 PM Carl Baldwin 
> wrote:
> ...
> > What I propose is that gerrit and github could end up more
> > robust, featureful, and interoperable if they had this
> > feature to build from.
> 
> I agree (assuming we come up with a well defined feature)
> 
> > With gerrit specifically, adopting this feature would make
> > the "change" concept richer than it is now because it
> > could supersede the change-id in the commit message and
> > allow a change to evolve in a distributed non-linear way
> > with protection against clobbering work.
> 
> We (the Gerrit maintainers) would like changes to be able to 
> evolve non-linearly so that we can eventually support 
> distributed Gerrit reviews, and the amended-commit pointer 
> is one way I have thought to resolve this.

I really think that keeping these references is the key to doing this.

> > I have no intention to disparage either tool. I love them
> > both. They've both made my career better in different
> > ways. I know there is no guarantee that github, gerrit,
> > or any other tool will do anything to adopt this. But,
> > I'm hoping they are reading this thread and that they
> > recognize how this feature can make them a little bit
> > better and jump in and help. I know it is a lot to hope
> > for but I think it could be great if it happened.
> 
> We (the Gerrit maintainers) do recognize it, and I am glad 
> that someone is pushing for solutions in this space.  I am 
> not sure what the right solution is, and how to modify 
> workflows to deal better with this.  I do think that starting 
> by making your local repo track pointers to amended-commits, 
> likely with various git hooks and notes (as also proposed by 
> Johannes Schindelin), would be a good start.   With that in 
> place, then you can attack various specific workflows.

I have started a prototype that I will use to demonstrate this. I hope
to have something in a couple of weeks. I do have a day job also, so it
will be slow going. One idea that I had was to put my own server with
special hooks in it in front of gerrit to illustrate how collaboration
on a gerrit change, or even a chain of them, can be made safe. It would
act as a middle man between my client and the gerrit server. I'd just
have to change the remote reference on my client to demonstrate.

> If you want to then attack the Gerrit workflow, it would be 
> good if you could prevent pushing new patchsets that are 
> amended versions of patchsets that are out of date.  While 
> it would be great if Gerrit could reject such pushes, I 
> wonder if to start, git could detect it and prevent the push 
> in this situation?  Could a git push hook analyze the ref 
> advertisements and figure this out (all the patchsets are in 
> the advertisement)?  Can a git hook look at the ref 
> advertisement?

I'll think about this. At the least, the hook would have to look at the
server to see if there are new revisions. It would be difficult to close
race conditions that occur because the client will always be using
potentially out of date information even if it just went and pulled down
the latest stuff. I think I still like my middle man idea better as a
short term proof of concept.

Preventing pushing amended/rebased versions of out of date changes is
simple. Follow the "predecessor" references until you hit a known
commit. If that commit is the latest revision of the change then it is
up to date. If that commit is not the latest revision, then it is out of
date. Reject it. This is what I plan to illustrate in my middle man
server.

If you traverse the entire graph of predecessors without finding a known
commit, then you have a new change. (In fact, the changeset id in the
commit message in a gerrit change seems unnecessary at this point). It
gets a little more complicated when you think about combining/squashing
changes (resulting in two or more "predecessor" references from a single
commit) or dividing a change into multiple but it works.
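
A minimal sketch of the check described above, in Python. It is not
taken from any prototype: the `predecessors` mapping (commit id to the
list of commits it replaces) and the `latest_of` mapping (each commit
the server knows to the latest revision of its change) are hypothetical
stand-ins for whatever the real data model turns out to be.

```python
from collections import deque


def classify_push(incoming, predecessors, latest_of):
    """Classify an incoming commit as 'new', 'up-to-date' or 'out-of-date'.

    predecessors: dict mapping a commit id to the ids it replaces
                  (hypothetical "predecessor" references).
    latest_of:    dict mapping every commit id the server already knows
                  to the latest revision id of the change it belongs to.
    """
    reached = set()  # known commits at which the walk stopped
    seen = set()
    queue = deque(predecessors.get(incoming, []))
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in latest_of:
            # Known commit: stop walking this branch of the graph.
            reached.add(commit)
            continue
        queue.extend(predecessors.get(commit, []))

    if not reached:
        return "new"  # no known commit anywhere in the graph
    if all(latest_of[c] == c for c in reached):
        return "up-to-date"
    return "out-of-date"  # built on a superseded revision: reject


# The server knows B1 and B2; B2 is the latest revision of the change.
latest_of = {"B1": "B2", "B2": "B2"}
predecessors = {
    "B3": ["B2"],  # amended from the latest revision
    "C1": ["B1"],  # amended from a stale revision
    "D2": ["D1"],  # amended after a rebase that was never pushed
    "D1": ["B2"],
}
results = {c: classify_push(c, predecessors, latest_of)
           for c in ("B3", "C1", "D2", "N1")}
```

A squashed commit has several predecessors, so the sketch treats it as
up to date only if every known commit it reaches is a latest revision.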

The harder part is the push/pull interaction between client and server.
When you go to push your amended update to a patchset, you want git to
send along any other new commits to complete the predecessor graph on
the server side. For example, you might rebase your commit and then
amend it to fix something. Personally, I'd like the rebase and the amend
to both be kept separately.
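
As a rough illustration, the set of extra commits a push would need to
carry can be computed by walking predecessors until reaching something
the server already has. The function name and both arguments below are
assumptions made for the sketch; this is not how git negotiates a push
today.

```python
def commits_to_send(tip, predecessors, server_has):
    """Return the commits the server lacks, starting from tip.

    predecessors: dict mapping a commit id to the ids it replaces.
    server_has:   set of commit ids already present on the server.
    """
    missing, stack, seen = [], [tip], set()
    while stack:
        commit = stack.pop()
        if commit in seen or commit in server_has:
            continue
        seen.add(commit)
        missing.append(commit)
        stack.extend(predecessors.get(commit, []))
    return missing


# Rebase first, then amend: both rewrites are kept separately.
predecessors = {"amended": ["rebased"], "rebased": ["original"]}
to_send = commits_to_send("amended", predecessors, {"original"})
```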

Similarly, when you've just had a push rejected because you're out of
date, you want to be able to easily pull down the commits you're missing
so that you can merge locally and try to push again.

You also don't want gc to garbage collect the intermediate commits. I
think gerrit uses many references internally in the git repo to "pin"
older revisions in the repository so that they don't appear orphaned. I
think I'm going to have to do something similar in my prototype.
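
One way to pin a commit is to point a ref at it, since gc does not
prune objects reachable from a ref. The sketch below only builds the
`git update-ref` command line; the `refs/predecessors` namespace is a
made-up name for illustration, not something git or gerrit defines.

```python
def pin_command(commit_id, namespace="refs/predecessors"):
    """Build a `git update-ref` invocation that pins commit_id.

    The namespace is hypothetical; any ref outside the usual branch
    and tag namespaces would keep the commit reachable.
    """
    return ["git", "update-ref", f"{namespace}/{commit_id}", commit_id]


cmd = pin_command("73875fc2b3934e45b4b9a94eb57ca8cd")
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`
inside a repository that contains the commit.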

If you think about it, this is all very much like what git already does
with its commit history and branches. If you stick to a strict
branch/merge model and don't rewrite commits, then it is unnecessary.
However, for those that do rewrite commits (such as anyone using the
gerrit workflow), this is a way to bring that power to them.

I'd like to point out the RECOVERING FROM UPSTREAM REBASE section of the
git-rebase man page. If we have the graph of "predecessor" references
for a change, it could be used to automatically recover from the cases
described in this section much like regular branching and merging.
Rewriting changes would no longer be something to consider "a bad idea"
for these reasons.

Carl

* Re: Bring together merge and rebase
  2018-01-04 19:19           ` Martin Fick
  2018-01-05  0:31             ` Martin Fick
@ 2018-01-05  5:09             ` Carl Baldwin
  2018-01-05  5:20               ` Carl Baldwin
  1 sibling, 1 reply; 44+ messages in thread
From: Carl Baldwin @ 2018-01-05  5:09 UTC (permalink / raw)
  To: Martin Fick; +Cc: Jacob Keller, Theodore Ts'o, Git Mailing List

On Thu, Jan 04, 2018 at 12:19:34PM -0700, Martin Fick wrote:
> On Tuesday, December 26, 2017 12:40:26 AM Jacob Keller 
> wrote:
> > On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin 
> <carl@ecbaldwin.net> wrote:
> > >> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin 
> <carl@ecbaldwin.net> wrote:
> > >> A bit of a tangent here, but a thought I didn't wanna
> > >> lose: In the general case where a patch was rebased
> > >> and the original parent pointer was changed, it is
> > >> actually quite hard to show a diff of what changed
> > >> between versions.
> > 
> > My biggest gripes are that the gerrit web interface
> > doesn't itself do something like this (and jgit does not
> > appear to be able to generate combined diffs at all!)
> 
> I believe it now does, a presentation was given at the 
> Gerrit User summit in London describing this work.  It would 
> indeed be great if git could do this also!

This would be very cool. I've wanted to tackle this for a long time. I
think I even filed an issue with gerrit about this years ago.

Carl

* Re: Bring together merge and rebase
  2018-01-05  5:09             ` Carl Baldwin
@ 2018-01-05  5:20               ` Carl Baldwin
  0 siblings, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2018-01-05  5:20 UTC (permalink / raw)
  To: Martin Fick; +Cc: Jacob Keller, Theodore Ts'o, Git Mailing List

On Thu, Jan 04, 2018 at 10:09:19PM -0700, Carl Baldwin wrote:
> This would be very cool. I've wanted to tackle this for a long time. I
> think I even filed an issue with gerrit about this years ago.

Yep, it turned out that it was a duplicate but I described what I did to
work around it.

https://bugs.chromium.org/p/gerrit/issues/detail?id=2375

* Re: Bring together merge and rebase
  2018-01-04 19:54     ` Martin Fick
  2018-01-05  4:08       ` Carl Baldwin
@ 2018-01-05 20:14       ` Junio C Hamano
  2018-01-06 17:29         ` Carl Baldwin
  1 sibling, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2018-01-05 20:14 UTC (permalink / raw)
  To: Martin Fick; +Cc: Carl Baldwin, Theodore Ts'o, Git Mailing List

Martin Fick <mfick@codeaurora.org> writes:

> These scenarios seem to come up most for me at Gerrit hack-
> a-thons where we collaborate a lot in short time spans on 
> changes.  We (the Gerrit maintainers) too have wanted and 
> sometimes discussed ways to track the relation of "amended" 
> commits (which is generally what Gerrit patchsets are).  We 
> also concluded that some sort of parent commit pointer was 
> needed, although parent is somewhat the wrong term since 
> that already means something in git.  Rather, maybe some 
> "predecessor" type of term would be better, maybe 
> "antecedent", but "amended-commit" pointer might be best?

In general, I agree that you would want a richer set of "relationship"
than mere "predecessor" or "related", but I do not think "amended"
is sufficient.  I certainly do not think a "pointer" embedded in a
commit object is a good idea, either (a new commit object header is
out of the question, but I doubt it is a good idea to make a pointer
back to an existing commit as a part of the log message).

You may have had a set of n patches A1, A2, ..., An, that turned
into m patches X1, X2, ..., Xm, after refactoring.  During the work,
it may have turned out that some things the original tried to do were
not sensible and were dropped, while some other things were added in
the final series.

When n==m==1, "amended" pointer from X1 to A1 may allow you to
answer "Is this the first attempt?  If this is refined, what did the
earlier one look like?" when given X1, but you would also want to
answer a related question "This was a good start, but did the effort
result in a refined patch, and if so what is it?" when given A1, and
"amended" pointer won't help at all.  Needless to say, the "pointer"
approach breaks down when !(n==m==1).

* Re: Bring together merge and rebase
  2018-01-05 20:14       ` Junio C Hamano
@ 2018-01-06 17:29         ` Carl Baldwin
  2018-01-06 17:32           ` Carl Baldwin
  2018-01-06 21:38           ` Theodore Ts'o
  0 siblings, 2 replies; 44+ messages in thread
From: Carl Baldwin @ 2018-01-06 17:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Fick, Theodore Ts'o, Git Mailing List

On Fri, Jan 05, 2018 at 12:14:28PM -0800, Junio C Hamano wrote:
> Martin Fick <mfick@codeaurora.org> writes:
> 
> > These scenarios seem to come up most for me at Gerrit hack-
> > a-thons where we collaborate a lot in short time spans on 
> > changes.  We (the Gerrit maintainers) too have wanted and 
> > sometimes discussed ways to track the relation of "amended" 
> > commits (which is generally what Gerrit patchsets are).  We 
> > also concluded that some sort of parent commit pointer was 
> > needed, although parent is somewhat the wrong term since 
> > that already means something in git.  Rather, maybe some 
> > "predecessor" type of term would be better, maybe 
> > "antecedent", but "amended-commit" pointer might be best?
> 
> In general, I agree that you would want a richer set of "relationship"
> than mere "predecessor" or "related", but I do not think "amended"
> is sufficient.  I certainly do not think a "pointer" embedded in a
> commit object is a good idea, either (a new commit object header is

To me, this is roughly equivalent to saying that parent pointers
embedded in a commit object is a good idea because we want a richer
relationship than mere "parent". Look how much we've done with this
simple relationship. Similarly, the new relationship that I'm proposing
handles much more than the simple m==n==1 case. Read below for more
detail.

> out of the question, but I doubt it is a good idea to make a pointer
> back to an existing commit as a part of the log message).
> 
> You may have had a set of n patches A1, A2, ..., An, that turned
> into m patches X1, X2, ..., Xm, after refactoring.  During the work,
> it may have turned out that some things the original tried to do were
> not sensible and were dropped, while some other things were added in
> the final series.
> 
> When n==m==1, "amended" pointer from X1 to A1 may allow you to
> answer "Is this the first attempt?  If this is refined, what did the
> earlier one look like?" when given X1, but you would also want to
> answer a related question "This was a good start, but did the effort
> result in a refined patch, and if so what is it?" when given A1, and
> "amended" pointer won't help at all.  Needless to say, the "pointer"
> approach breaks down when !(n==m==1).

It doesn't break down. It merely presents more sophisticated situations
that may be more work for the tool to help out with. This is where I
think a prototype will help see these situations and develop the tool to
manage them.

When each of n commits is amended or rebased trivially into m==n new
commits then each change is represented by a distinct graph of
predecessors that can be followed independently of others. With rebase,
this is accomplished by using only "pick" in interactive mode or not
using interactive mode at all (and no autosquash).

The more sophisticated cases can be broken down into two operations that
change the number of resulting commits.

  1. Squashing two commits together ("fixup", "squash"). In this case,
     the resulting commit will have two or more pointers. This clearly
     shows that multiple changes converged into one at this point.

  2. Splitting a single commit into multiple new commits ("edit"). In
     this case, the graph shows multiple new commits pointing to the
     same predecessor. In my experience, this is less common. It also is
     a little more challenging to think about the tool managing
     divergent work but I think it is possible.

The end result is m commits where m can be any positive number (even,
coincidentally, n). However, the graph of amended commits still tells
the story quite well. Even if commits are reordered, the graphs can
still be useful. The predecessor graph is independent of the parent
graph which makes up normal git commit history so it isn't inherently
bad that the order of commits was changed.
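
To make the two operations concrete, here is a small sketch that reads
them back out of a predecessor graph. The `predecessors` mapping is a
hypothetical in-memory stand-in for the proposed references. Note that
the graph alone cannot tell a deliberate split from divergent work,
since both show up as several commits sharing one predecessor.

```python
from collections import defaultdict


def classify_rewrites(predecessors):
    """Find squashes and splits in a predecessor graph.

    predecessors: dict mapping a commit id to the ids it replaces.
    Returns (squashes, splits):
      squashes maps a new commit to its two or more predecessors;
      splits maps an old commit to its two or more successors.
    """
    successors = defaultdict(list)
    for commit, preds in predecessors.items():
        for pred in preds:
            successors[pred].append(commit)

    squashes = {c: p for c, p in predecessors.items() if len(p) > 1}
    splits = {p: s for p, s in successors.items() if len(s) > 1}
    return squashes, splits


# A1 and A2 were squashed into X1; A3 was split into X2 and X3.
predecessors = {"X1": ["A1", "A2"], "X2": ["A3"], "X3": ["A3"]}
squashes, splits = classify_rewrites(predecessors)
```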

We can dream up some very interesting graphs. Sure, as we do
increasingly more complicated history rewriting, it is going to be
increasingly more difficult for the tool to help out. I'm not really
deterred by this at this point. I want to experiment and work it out
with a prototype.

My primary objective personally is to detect where work on a single
change has diverged by working on it from more than one workspace,
whether it's multiple people chipping in or just me. Merely having the
ability to reject an update that clobbers divergent work is a big win.
No more silent corruption of work.
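
A sketch of what that detection could look like, assuming the commits
of one change and their predecessor references are available as plain
Python data (both names are hypothetical). It also assumes no commit in
the change was deliberately split, since a split would look the same.

```python
def change_heads(commits, predecessors):
    """Return the revisions of a change that nothing else replaces."""
    superseded = {p for c in commits for p in predecessors.get(c, [])}
    return sorted(c for c in commits if c not in superseded)


def has_diverged(commits, predecessors):
    """A change has diverged when it has more than one head."""
    return len(change_heads(commits, predecessors)) > 1


# Two workspaces each amended v1 without seeing the other's work.
diverged = has_diverged(
    {"v1", "v2a", "v2b"},
    {"v2a": ["v1"], "v2b": ["v1"]},
)
# A single line of amendments has exactly one head.
linear = has_diverged(
    {"v1", "v2", "v3"},
    {"v2": ["v1"], "v3": ["v2"]},
)
```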

My secondary objective is to develop a tool to help get the divergent
work back on track. I believe that in the majority of common cases, this
tool can be successful in either finding an automatic way to bring the
divergent work back into a new revision of the change or present the
user with conflicts to resolve that end up being much easier than what
I've had to do in past experience with rebase workflows.

Carl

* Re: Bring together merge and rebase
  2018-01-06 17:29         ` Carl Baldwin
@ 2018-01-06 17:32           ` Carl Baldwin
  2018-01-06 21:38           ` Theodore Ts'o
  1 sibling, 0 replies; 44+ messages in thread
From: Carl Baldwin @ 2018-01-06 17:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Fick, Theodore Ts'o, Git Mailing List

On Sat, Jan 06, 2018 at 10:29:19AM -0700, Carl Baldwin wrote:
> To me, this is roughly equivalent to saying that parent pointers
> embedded in a commit object is a good idea because we want a richer
> relationship than mere "parent". Look how much we've done with this
> simple relationship. Similarly, the new relationship that I'm
> proposing handles much more than the simple m==n==1 case. Read below
> for more detail.

Of course, I meant to say "is not a good idea" in the above paragraph.
Please pardon my error.

* Re: Bring together merge and rebase
  2018-01-06 17:29         ` Carl Baldwin
  2018-01-06 17:32           ` Carl Baldwin
@ 2018-01-06 21:38           ` Theodore Ts'o
  1 sibling, 0 replies; 44+ messages in thread
From: Theodore Ts'o @ 2018-01-06 21:38 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Junio C Hamano, Martin Fick, Git Mailing List

On Sat, Jan 06, 2018 at 10:29:21AM -0700, Carl Baldwin wrote:
> > When n==m==1, "amended" pointer from X1 to A1 may allow you to
> > answer "Is this the first attempt?  If this is refined, what did the
> > earlier one look like?" when given X1, but you would also want to
> > answer a related question "This was a good start, but did the effort
> > result in a refined patch, and if so what is it?" when given A1, and
> > "amended" pointer won't help at all.  Needless to say, the "pointer"
> > approach breaks down when !(n==m==1).
> 
> It doesn't break down. It merely presents more sophisticated situations
> that may be more work for the tool to help out with. This is where I
> think a prototype will help see these situations and develop the tool to
> manage them.

That's another way of saying "break down".

And if the goal is a prototype, may I gently suggest that the way
forward is trailers in the commit body, à la:

	Change-Id: I0b793feac9664bcc8935d8ec04ca16d5

or

	Upstream-4.15-SHA1: 73875fc2b3934e45b4b9a94eb57ca8cd

Making changes in the commit header is complex, and has all *sorts* of
forward and backwards compatibility challenges, especially when it's
not clear what the proper data model should be.
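
For a prototype, reading such trailers back takes only a few lines. The
sketch below is a deliberately simplified parser, not git's own trailer
logic (`git interpret-trailers --parse` is more lenient), and the
`Predecessor` trailer name and its value are invented for illustration.

```python
def parse_trailers(message):
    """Return (key, value) pairs from the last paragraph of a message.

    Simplification: the last paragraph counts as a trailer block only
    if every line looks like "Key: value" with no spaces in the key.
    """
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return []  # subject line only, no trailer block
    trailers = []
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if not sep or not key or " " in key:
            return []  # ordinary prose, not a trailer block
        trailers.append((key, value.strip()))
    return trailers


message = """Fix frobnication

Longer description of the change.

Change-Id: I0b793feac9664bcc8935d8ec04ca16d5
Predecessor: 0123456789abcdef0123456789abcdef
"""
trailers = parse_trailers(message)
```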

Cheers,

						 -Ted

end of thread, other threads:[~2018-01-06 21:38 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
2017-12-23  6:10 Bring together merge and rebase Carl Baldwin
2017-12-23 18:59 ` Ævar Arnfjörð Bjarmason
2017-12-23 21:01   ` Carl Baldwin
2017-12-23 22:09     ` Ævar Arnfjörð Bjarmason
2017-12-26  0:16       ` Carl Baldwin
2017-12-26  1:28         ` Jacob Keller
2017-12-26 23:30           ` Igor Djordjevic
2017-12-26 17:49         ` Ævar Arnfjörð Bjarmason
2017-12-26 19:44           ` Carl Baldwin
2017-12-26 20:19             ` Paul Smith
2017-12-26 21:07               ` Carl Baldwin
2017-12-23 22:19     ` Randall S. Becker
2017-12-25 20:05       ` Carl Baldwin
2017-12-23 23:01     ` Johannes Schindelin
2017-12-24 14:13       ` Alexei Lozovsky
2018-01-04 15:44         ` Johannes Schindelin
2017-12-25 23:43       ` Carl Baldwin
2017-12-26  0:01         ` Randall S. Becker
2018-01-04 19:49       ` Martin Fick
2017-12-23 22:30   ` Johannes Schindelin
2017-12-25  3:52 ` Theodore Ts'o
2017-12-26  1:16   ` Carl Baldwin
2017-12-26  1:47     ` Jacob Keller
2017-12-26  6:02       ` Carl Baldwin
2017-12-26  8:40         ` Jacob Keller
2018-01-04 19:19           ` Martin Fick
2018-01-05  0:31             ` Martin Fick
2018-01-05  5:09             ` Carl Baldwin
2018-01-05  5:20               ` Carl Baldwin
2017-12-26 18:04     ` Theodore Ts'o
2017-12-26 20:31       ` Carl Baldwin
2018-01-04 20:06         ` Martin Fick
2018-01-05  5:06           ` Carl Baldwin
2018-01-04 19:54     ` Martin Fick
2018-01-05  4:08       ` Carl Baldwin
2018-01-05 20:14       ` Junio C Hamano
2018-01-06 17:29         ` Carl Baldwin
2018-01-06 17:32           ` Carl Baldwin
2018-01-06 21:38           ` Theodore Ts'o
2017-12-27  4:35   ` Carl Baldwin
2017-12-27 13:35     ` Alexei Lozovsky
2017-12-28  5:23       ` Carl Baldwin
2017-12-26  4:08 ` Mike Hommey
2017-12-27  2:44   ` Carl Baldwin
