A workflow for local patch maintenance

git@vger.kernel.org mailing list mirror (one of many)

* A workflow for local patch maintenance
@ 2013-10-08 18:12 Tony Finch
  2013-10-10  1:33 ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Tony Finch @ 2013-10-08 18:12 UTC
  To: git

This is a copy of an article I published at
http://fanf.livejournal.com/128282.html
I'm sending a copy here because I'm interested to know what other ways
there might be of handling this situation.

--

We often need to patch the software that we run in order to fix bugs
quickly rather than wait for an official release, or to add functionality
that we need. In many cases we have to maintain a locally-developed patch
for a significant length of time, across multiple upstream releases,
either because it is not yet ready for incorporation into a stable
upstream version, or because it is too specific to our setup so will not
be suitable for passing upstream without significant extra work.

I have been experimenting with a git workflow in which I have a feature
branch per patch. (Usually there is only one patch for each change we
make.) To move them on to a new feature release, I tag the feature branch
heads (to preserve history), rebase them onto the new release version, and
octopus merge them to create a new deployment version. This is rather
unsatisfactory, because there is a lot of tedious per-branch work, and I
would prefer to have branches recording the development of our patches
rather than a series of tags.
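
A minimal sketch of the per-branch procedure just described, run in a
throwaway repository. The branch names (fix-a, fix-b, deploy-1.1), tag
names, and file contents are made up for illustration; only the sequence
of tag, rebase, and octopus merge comes from the description above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"

# Two upstream releases (stand-ins for real upstream history).
echo base > core.txt; git add core.txt; git commit -q -m "release 1.0"; git tag v1.0
echo more >> core.txt; git commit -q -am "release 1.1"; git tag v1.1

# One feature branch per local patch, each forked from v1.0.
for p in fix-a fix-b; do
    git checkout -q -b "$p" v1.0
    echo "$p" > "$p.txt"; git add "$p.txt"; git commit -q -m "local patch: $p"
done

# Move to v1.1: tag each old head to preserve history, then rebase it.
for p in fix-a fix-b; do
    git tag "$p-on-1.0" "$p"
    git rebase -q v1.1 "$p"
done

# Octopus-merge the rebased branches to create the new deployment version.
git checkout -q -b deploy-1.1 v1.1
git merge -q -m "deploy 1.1 with local patches" fix-a fix-b
```

The two loops are the tedious per-branch work: every patch branch needs
its own tag and its own rebase on each upstream release.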

Here is a git workflow suggested by Ian Jackson which I am trying out
instead. I don't yet have much experience with it; I am writing it down
now as a form of documentation.

There are three branches:

upstream, which is where public releases live
working, which is where development happens
deployment, which is what we run

Which branch corresponds to upstream may change over time, for instance
when we move from one stable version to the next one.

The working branch exists on the developer's workstation and is not
normally published. There might be multiple working branches for
work-in-progress. They get rebased a lot.

Starting from an upstream version, a working branch will have a number of
mature patches. The developer works on top of these in
commit-early-commit-often mode, without worrying about order of changes or
cleanliness. Every so often we use git rebase --interactive to tidy up the
patch set. Often we'll use the "squash" command to combine new commits
with the mature patches that they amend. Sometimes the working branch is
rebased onto a new upstream version.
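
One possible way to do the squashing step, sketched in a throwaway
repository: record the amendment with git commit --fixup and let
git rebase --interactive --autosquash arrange the todo list. Using
--fixup/--autosquash is an assumption about tooling, not something the
workflow requires, and GIT_SEQUENCE_EDITOR=true is only there so the
example runs without opening an editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "release 1.0"

# Working branch: two mature patches on top of upstream.
git checkout -q -b working
echo a1 > a.txt; git add a.txt; git commit -q -m "patch A"
echo b1 > b.txt; git add b.txt; git commit -q -m "patch B"

# Commit-early-commit-often: a later amendment to patch A (HEAD~1),
# recorded as a "fixup! patch A" commit on top.
echo a2 >> a.txt
git commit -q -a --fixup=HEAD~1

# Tidy up: --autosquash moves the fixup next to patch A and folds it in.
# Accepting the generated todo list unchanged stands in for manual editing.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash upstream
```

Afterwards the working branch again consists of two patches, with the
amendment folded into patch A.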

When the working branch is ready, we use the commands below to update the
deployment branch. The aim is to make it look like updates from the
working branch are repeatedly merged into the deployment branch. This is
so that we can push updated versions of the patch set to a server without
having to use --force, and pulling updates into a checked out version is
just a fast-forward. However, this isn't a normal merge, since the tree at
the head of deployment always matches the most recent good version of
working. (This is similar to what stg publish does.) Diagrammatically:

     |
    1.1
     | \
     |  `A---B-- 1.1-patched
     |    \       |
     |     \      |
     |      `C-- 1.1-revised
     |            |
    2.0           |
     | \          |
     |  `-C--D-- 2.0-patched
     |            |
    3.1           |
     | \          |
     |  `-C--E-- 3.1-patched
     |            |
  upstream        |
              deployment

The horizontal-ish lines are different rebased versions of the patch set.
Letters represent patches and numbers represent version tags. The tags on
the deployment branch are for the install scripts so I probably won't need
one on every update.

Ideally we would be able to do this with the following commands:

    $ git checkout deployment
    $ git merge -s theirs working

However, there is an "ours" merge strategy but no "theirs" merge
strategy. Johannes Sixt described how to simulate git merge -s theirs in a
post to the git mailing list in 2010:
http://article.gmane.org/gmane.comp.version-control.git/163631
So the commands are:

    $ git checkout deployment
    $ git merge --no-commit -s ours working
    $ git read-tree -m -u working
    $ git commit -m "Update to $(git describe working)"

Mark Wooding suggested the following more plumbing-based version, which,
unlike the above, does not involve switching to the deployment branch.
Note that git update-ref takes a full ref name, so the branch has to be
spelled refs/heads/deployment.

    $ d=$(git rev-parse deployment)
    $ w=$(git rev-parse working)
    $ c=$(echo "Update to $(git describe working)" |
          git commit-tree -p "$d" -p "$w" "$w^{tree}")
    $ git update-ref refs/heads/deployment "$c" "$d"
    $ unset c d w
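
A small self-contained check of the properties this update is meant to
have, run in a throwaway repository with made-up contents. It uses the
plumbing variant with the full ref name, and an annotated tag so that
git describe has something to describe. After the update, the new
deployment commit should have the old deployment head and working as its
parents, and exactly the tree of working.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "release 1.0"
git tag -a -m "release 1.0" v1.0

# First version of the patch set; deployment starts out as a copy of it.
git checkout -q -b working
echo a1 > a.txt; git add a.txt; git commit -q -m "patch A"
git branch deployment

# Rewrite the patch set (a stand-in for an interactive rebase).
echo a2 > a.txt; git commit -q -a --amend -m "patch A, revised"

# Plumbing update: a new commit whose parents are the old deployment head
# and working, and whose tree is exactly the tree of working.
d=$(git rev-parse deployment)
w=$(git rev-parse working)
c=$(echo "Update to $(git describe working)" |
    git commit-tree -p "$d" -p "$w" "$w^{tree}")
git update-ref refs/heads/deployment "$c" "$d"

# The old head is an ancestor, so a push needs no --force, and the trees
# of deployment and working are identical.
git merge-base --is-ancestor "$d" deployment
git diff --quiet deployment working
```

Because the old head is the first parent, git log --first-parent
deployment lists the successive deployed versions.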

Tony.
-- 
f.anthony.n.finch  <dot@dotat.at>  http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first.
Rough, becoming slight or moderate. Showers, rain at first. Moderate or good,
occasionally poor at first.


* Re: A workflow for local patch maintenance
  2013-10-08 18:12 A workflow for local patch maintenance Tony Finch
@ 2013-10-10  1:33 ` Jeff King
  2013-10-10 16:53   ` Tony Finch
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2013-10-10  1:33 UTC
  To: Tony Finch; +Cc: git

On Tue, Oct 08, 2013 at 07:12:22PM +0100, Tony Finch wrote:

> We often need to patch the software that we run in order to fix bugs
> quickly rather than wait for an official release, or to add functionality
> that we need. In many cases we have to maintain a locally-developed patch
> for a significant length of time, across multiple upstream releases,
> either because it is not yet ready for incorporation into a stable
> upstream version, or because it is too specific to our setup so will not
> be suitable for passing upstream without significant extra work.

Do you need to keep the modifications you make on top of upstream as a
nice, clean series of rebased patches? If not, then you can avoid the
repeated rebasing, and just use a more traditional topic-branch
workflow. Treat modifications from upstream as just another topic.

For example, start with some version (let's say 1.0) of the upstream
software as your "master" branch. If it's kept in git, build on
upstream's git history.  If all you get are tarballs, create an
"upstream" branch with v1.0, and fork "master" from it.

Build on master as you would if it were your own. Fork topic branches,
develop the topics, test them, and then merge them back to "master" when
they're ready (or do development straight on master, or whatever
workflow you're accustomed to).

When v1.1 of the upstream software comes out, create a "merge-upstream"
topic branch from the tip of your "master". If upstream is in git, just
"git merge v1.1" from upstream. If not, then check out your pristine
"upstream" branch (which should still be sitting at the v1.0 commit),
and build a v1.1 commit on top of it. And then "git merge upstream" to
pick up the new changes.
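
For the tarball case, a rough sketch of what building a v1.1 commit on
the pristine branch could look like. The package name "pkg", the tarball
names, and the file contents are invented; the example creates its own
stand-in tarballs so that it can run anywhere. Clearing the tree with
git rm before unpacking is one way (an assumption, not the only way) to
make sure files deleted upstream are recorded as deleted.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Build two stand-in release tarballs for a hypothetical package "pkg".
mkdir pkg-1.0 pkg-1.1
echo one > pkg-1.0/core.txt; echo old > pkg-1.0/gone.txt
echo two > pkg-1.1/core.txt; echo new > pkg-1.1/added.txt
tar -cf pkg-1.0.tar pkg-1.0; tar -cf pkg-1.1.tar pkg-1.1
rm -r pkg-1.0 pkg-1.1

mkdir repo && cd repo
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"

# v1.0: first import on the pristine "upstream" branch.
tar -xf ../pkg-1.0.tar --strip-components=1
git add -A .
git commit -q -m "Import upstream 1.0"; git tag v1.0

# Local work happens on master, forked from the import.
git checkout -q -b master
echo local > local.txt; git add local.txt; git commit -q -m "local change"

# v1.1: replace the pristine tree wholesale, then commit the result.
git checkout -q upstream
git rm -q -r .
tar -xf ../pkg-1.1.tar --strip-components=1
git add -A .
git commit -q -m "Import upstream 1.1"; git tag v1.1

# Pick up the new release on the local branch.
git checkout -q master
git merge -q -m "Merge upstream v1.1" upstream
```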

Test your merge-upstream topic in isolation, and when you think it's
ready, merge it into master and deploy.
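
When upstream is in git, the whole cycle might look like the following
sketch. It runs in a throwaway repository where a local "upstream"
branch plays the role of the real upstream history; the branch name
"topic" and the file contents are invented, while "master" and
"merge-upstream" are the names used above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "upstream 1.0"; git tag v1.0

# Local development: master forks from v1.0, topics merge back into it.
git checkout -q -b master v1.0
git checkout -q -b topic
echo local > local.txt; git add local.txt; git commit -q -m "local topic"
git checkout -q master
git merge -q --no-ff -m "Merge topic" topic

# Meanwhile upstream releases v1.1.
git checkout -q upstream
echo more >> core.txt; git commit -q -am "upstream 1.1"; git tag v1.1

# Treat the new release as just another topic.
git checkout -q -b merge-upstream master
git merge -q -m "Merge upstream v1.1" v1.1

# ... build and test the merge-upstream branch here ...

# When it looks good, merge it into master and deploy master.
git checkout -q master
git merge -q --no-ff -m "Merge merge-upstream" merge-upstream
```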

The most difficult part is the merge of upstream into the topic branch.
But git's 3-way merge tends to do a pretty good job (e.g., if you
contributed your patches upstream, then there should be no conflict).
You can also break up the work by keeping the "merge" topic running for
a long time, and merging as often as possible from upstream. That breaks
the conflict resolution into smaller chunks, and lets you do it closer
to when the conflicting patches were actually made, when they are
hopefully closer in your mind. And you don't have to worry about having
a broken intermediate result, because you're not deploying it; you're
just keeping the topic up to date until you're ready to test it.

You can also try git-imerge, which can make big merges a little more
manageable (though it can also make them harder sometimes...):

  https://github.com/mhagger/git-imerge

The history for such a repository might look like:

       o--o--B--o--o--C  <-- upstream branch
      /       \        \
     o--o---o--o--o--o--D  <-- upstream-merge branch
    /      /        /    \
   A--o---E--o--o--F--o---G <-- master branch
    \    / \      /
     o--o   o----o  <-- topic branches

where:

  - A is the v1.0 commit you start at

  - B and C are milestones where you merged upstream into your
    upstream-merge topic branch. These could be releases (like v1.1), or
    they could just be random spots where you felt like merging to keep
    things up to date. It depends how you want to break up the conflict
    resolution.

  - D is a state of the upstream-merge branch that you test to make sure
    the merge happened OK

  - E and F are merges of regular topic branches (i.e., the patches you
    are working on locally). Note that we also merge those up to the
    upstream-merge branch, so that we can resolve early any conflicts
    between what's happening on master and what's happening upstream.

  - G is the merge of D into the master branch, after we have decided
    it's good to deploy

This all assumes that "master" is your known-good state that you deploy
or ship. If you prefer to have a "deploy" or "maint" branch for
hotfixes, you can do that too.

Hope that helps,
-Peff


* Re: A workflow for local patch maintenance
  2013-10-10  1:33 ` Jeff King
@ 2013-10-10 16:53   ` Tony Finch
  2013-10-10 17:36     ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Tony Finch @ 2013-10-10 16:53 UTC
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
>
> Do you need to keep the modifications you make on top of upstream as a
> nice, clean series of rebased patches? If not, then you can avoid the
> repeated rebasing, and just use a more traditional topic-branch
> workflow. Treat modifications from upstream as just another topic.

Thanks for the suggestion!

Our aim is to get as many patches into the upstream version as we can,
which is why my starting point is a clean rebased patch series. I am also
thinking that this will help me to know when a patch can be dropped from
the series because upstream have incorporated something like it. If
upstream works like git upstream (incorporating patches verbatim after
they pass review) then git can handle this automatically, but if the patch
gets re-worked it might be easier for me to drop it when rebasing rather
than resolve conflicts. I'm also thinking that for packages which we
update relatively infrequently, having a clean patch series makes it
easier to review whether they are all still necessary when updating. But
perhaps I am too wedded to manual patch management...
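
A small illustration of the "git can handle this automatically" case,
in a throwaway repository with invented contents. Upstream's verbatim
acceptance of patch A is simulated with git cherry-pick. By default
git rebase skips commits whose patch is already present upstream, so
only patch B should remain in the series afterwards.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "release 1.0"

# Local patch series: A and B on top of release 1.0.
git checkout -q -b working
echo a > a.txt; git add a.txt; git commit -q -m "patch A"
a=$(git rev-parse HEAD)
echo b > b.txt; git add b.txt; git commit -q -m "patch B"

# Upstream moves on and takes patch A verbatim.
git checkout -q upstream
echo more >> core.txt; git commit -q -am "release 1.1"
git cherry-pick "$a" >/dev/null

# Rebasing the series drops patch A, since the same change is upstream.
git rebase -q upstream working
remaining=$(git log --format=%s upstream..working)
echo "$remaining"
```

A patch that upstream re-worked would not be recognised this way and
would have to be dropped or resolved by hand during the rebase.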

Tony.
-- 
f.anthony.n.finch  <dot@dotat.at>  http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first.
Rough, becoming slight or moderate. Showers, rain at first. Moderate or good,
occasionally poor at first.


* Re: A workflow for local patch maintenance
  2013-10-10 16:53   ` Tony Finch
@ 2013-10-10 17:36     ` Jeff King
  2013-10-10 19:18       ` Jonathan Nieder
  2013-10-11 13:22       ` Stephen Bash
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2013-10-10 17:36 UTC
  To: Tony Finch; +Cc: git

On Thu, Oct 10, 2013 at 05:53:57PM +0100, Tony Finch wrote:

> Our aim is to get as many patches into the upstream version as we can,
> which is why my starting point is a clean rebased patch series. I am also
> thinking that this will help me to know when a patch can be dropped from
> the series because upstream have incorporated something like it. If
> upstream works like git upstream (incorporating patches verbatim after
> they pass review) then git can handle this automatically, but if the patch
> gets re-worked it might be easier for me to drop it when rebasing rather
> than resolve conflicts. I'm also thinking that for packages which we
> update relatively infrequently, having a clean patch series makes it
> easier to review whether they are all still necessary when updating. But
> perhaps I am too wedded to manual patch management...

I am in a similar situation to you. At GitHub, we run more-or-less stock
git on our backend, but often make bug-fixes or enhancements that are
intended for upstream, but which we want to start using before the next
release.

We used to keep the patches as a series, and rebase them on newer versions
of git from time to time. These days we use the workflow I described
earlier. The specific things we wanted to fix were:

  1. It was a giant pain to work on or modify a patch series.

  2. It did not scale well beyond one person handling the patches
     and rebasing. Now people more or less work on our forked repository
     as they would normally, and don't have to care; merging from
     upstream is just another feature (that happens to bring in a ton of
     commits :) ).

  3. The pain in doing the big rebase-test-deploy cycle meant that we
     often delayed it, keeping us several versions behind upstream.
     This is bad not only for the end product (you aren't getting other
     bugfixes from upstream as quickly), but also because the longer you
     wait to rebase or merge, the more painful it generally is.

That being said, there are some new downsides, as you noted:

  1. Resolving conflicts between your version and the reworked upstream
     version can be a pain.

  2. If your local development does not happen in a clean series, it can
     be hard to create a clean series for upstream, and/or revert in
     favor of upstream when necessary.

I don't have silver bullets for either, unfortunately. To mitigate
problem 1, I will sometimes revert a local topic before doing the
upstream merge, if I know it has been reworked. You can do this right
before merging. Or, as soon as you see that upstream is taking a
reworked version, you can revert what you have locally and apply the
upstream fix. The latter has the advantage of doing it much closer to
the actual development time, so handling any irregularities is easier.
But it is not always a possibility if upstream's reworking involved
building on other changes that you do not want to grab. :)

For problem 2, it helps if you can do development with topic branches as
git.git does, with aggressive rebasing while a topic is in development,
and then merging to master once it is mature.  Then at least your "git
log --first-parent" view of "master" shows you which topics you made,
and the topics themselves are relatively clean. Sometimes you end up
needing to make changes to a topic after it is mature, and the ordering
is not exactly what you would send upstream (e.g., for a pure patch
series going to upstream, you would not fix the bug on top, you would
squash the fix into an earlier commit). I mostly just handle this
manually, and it doesn't come up too often (and in many cases, by the
time you have the "bugfix on top" locally, upstream has also already
applied the original patches, and they want it as a bugfix on top, too).
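
To illustrate the first-parent view, here is a sketch in a throwaway
repository with two invented topics (topic-a, topic-b), each merged
into master with --no-ff so that every topic shows up as one merge
commit on master's first-parent chain.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "upstream 1.0"

# Develop two topics and merge each one into master when it is mature.
for t in topic-a topic-b; do
    git checkout -q -b "$t" master
    echo 1 > "$t.txt"; git add "$t.txt"; git commit -q -m "$t: first step"
    echo 2 >> "$t.txt"; git commit -q -am "$t: second step"
    git checkout -q master
    git merge -q --no-ff -m "Merge $t" "$t"
done

# One line per topic, newest first; the individual commits stay hidden.
topics=$(git log --first-parent --merges --format=%s master)
echo "$topics"
```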

We can no longer easily say "this is stock git, with patches X-Y-Z on
top" (we can get the set of commits we have that git does not have, of
course, but there's no easy way to say "these ones aren't relevant
anymore"). But we've found it's not all _that_ important. With a
rebasing strategy, you really want to know so that you don't
accidentally drop a patch that is necessary. But with a merge strategy,
you cannot accidentally drop a patch (you might botch the conflict
resolution, of course, but that is a bit harder).
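
The "set of commits we have that git does not have" can be listed with
git log upstream..master. git cherry additionally marks the ones whose
patch is already upstream verbatim. The sketch below uses a throwaway
repository with invented patches, a linear master for simplicity, and
git cherry-pick to simulate upstream applying patch A unchanged. A
re-worked patch would still be listed with "+", which is the part that
has no easy answer.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "release 1.0"

# Two local patches on master.
git checkout -q -b master
echo a > a.txt; git add a.txt; git commit -q -m "local patch A"
a=$(git rev-parse HEAD)
echo b > b.txt; git add b.txt; git commit -q -m "local patch B"

# Upstream moves on and applies patch A verbatim.
git checkout -q upstream
echo more >> core.txt; git commit -q -am "release 1.1"
git cherry-pick "$a" >/dev/null

# Everything we have that upstream does not (by commit identity): A and B.
git log --oneline upstream..master

# Same set, with "-" for patches already upstream and "+" for the rest.
git cherry -v upstream master
```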

-Peff


* Re: A workflow for local patch maintenance
  2013-10-10 17:36     ` Jeff King
@ 2013-10-10 19:18       ` Jonathan Nieder
  2013-10-11 13:22       ` Stephen Bash
  1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Nieder @ 2013-10-10 19:18 UTC
  To: Tony Finch; +Cc: Jeff King, git

Jeff King wrote:

>   3. The pain in doing the big rebase-test-deploy cycle meant that we
>      often delayed it, keeping us several versions behind upstream.
>      This is bad not only for the end product (you aren't getting other
>      bugfixes from upstream as quickly), but also because the longer you
>      wait to rebase or merge, the more painful it generally is.
>
> That being said, there are some new downsides, as you noted:
>
>   1. Resolving conflicts between your version and the reworked upstream
>      version can be a pain.
>
>   2. If your local development does not happen in a clean series, it can
>      be hard to create a clean series for upstream, and/or revert in
>      favor of upstream when necessary.

That suggests a possible hybrid approach: use a normal merge-heavy
workflow day to day, but occasionally clean up, for example by
rebasing against upstream.

That doesn't address the question of "how do I preserve old versions
of my patchset after a rebase", though.

The msysgit project uses a script called merging-rebase.sh[1] to
keep their patches current on top of the shifting target of git's
"next".  It's similar to your "merge -s theirs" approach.  It has some
problems (once you get past the current version of the patch stack,
history mining is complicated by all the old versions of the patch
stack) but for their day-to-day development it works ok.

There is an interesting approach that involves only merging and never
rebasing, while still being able to create a presentable patch series
when you're done.  The idea is to keep each patch meant for upstream
consumption in a separate (specially named) branch, with tracked files
like ".topmsg" containing its commit message, dependencies, and other
metadata.  There is a tool called 'tg' (TopGit) for working with this
kind of repo[2].  The Hurd uses it for their binutils and glibc
patches.

Another tool for maintaining a public patch stack, this time using a
"quilt"-style workflow instead of aggressively using native git
commands, is guilt[3], used for example to maintain the ext4 patch
queue.

In practice I tend to find all these too formal, and just keep one
branch that moves forward and is never rebased and a separate branch
that is constantly rebased with commits explaining all my changes to
the upstream code.  E.g., see [4].  This probably only works when the
patch stack is not very large.

Jonathan

[1] https://github.com/msysgit/msysgit/blob/master/share/msysGit/merging-rebase.sh
[2] https://github.com/greenrd/topgit#readme
[3] http://repo.or.cz/w/guilt.git
[4] git://repo.or.cz/xz/debian.git

    Here the constantly-rebased branch is not even published, since
    it is easy to re-create by applying the patches.

    The constantly-advancing branch is "master", which consists of
    patched upstream source + extra metadata in the debian/
    subdirectory.

    The constantly-rebased branch can be revived by applying the
    patches from debian/diff/ to the "upstream" branch.
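
    A generic sketch of that revive-from-patches idea, in a throwaway
    repository. It assumes the patches are stored in the mailbox format
    written by git format-patch, so that git am can apply them; the
    actual layout of debian/diff/ is not checked here, and plain diffs
    would need git apply instead. The branch names "patched" and
    "revived" are invented.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "release 1.0"

# The constantly-rebased branch: upstream plus two explanatory commits.
git checkout -q -b patched
echo a > a.txt; git add a.txt; git commit -q -m "patch A"
echo b > b.txt; git add b.txt; git commit -q -m "patch B"

# Export the stack as numbered patch files (kept outside the work tree here).
patches=$(mktemp -d)
git format-patch -q -o "$patches" upstream..patched

# Later: re-create the branch from upstream plus the stored patches.
git checkout -q -b revived upstream
git am -q "$patches"/*.patch

# The revived branch has the same content as the original one.
git diff --quiet patched revived
```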


* Re: A workflow for local patch maintenance
  2013-10-10 17:36     ` Jeff King
  2013-10-10 19:18       ` Jonathan Nieder
@ 2013-10-11 13:22       ` Stephen Bash
  2013-10-11 15:16         ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Stephen Bash @ 2013-10-11 13:22 UTC
  To: Jeff King; +Cc: git, Tony Finch

----- Original Message -----
> From: "Jeff King" <peff@peff.net>
> Sent: Thursday, October 10, 2013 1:36:28 PM
> Subject: Re: A workflow for local patch maintenance
> 
> ... snip ...
>
> That being said, there are some new downsides, as you noted:
> 
>   1. Resolving conflicts between your version and the reworked
>   upstream version can be a pain.
> 
> ... snip ...
> 
> To mitigate problem 1, I will sometimes revert a local topic before
> doing the upstream merge, if I know it has been reworked.

Peff (slightly off topic) - A coworker of mine actually ran into this
problem earlier this week.  Is there a recommended way to revert a merged
topic branch?  I assume it's essentially reverting each commit introduced
by the branch, but is there a convenient invocation of revert? (easy to
remember and hard to screw up)

Thanks,
Stephen


* Re: A workflow for local patch maintenance
  2013-10-11 13:22       ` Stephen Bash
@ 2013-10-11 15:16         ` Jeff King
  2013-10-11 15:30           ` Stephen Bash
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2013-10-11 15:16 UTC
  To: Stephen Bash; +Cc: git, Tony Finch

On Fri, Oct 11, 2013 at 09:22:28AM -0400, Stephen Bash wrote:

> > To mitigate problem 1, I will sometimes revert a local topic before
> > doing the upstream merge, if I know it has been reworked.
> 
> Peff (slightly off topic) - A coworker of mine actually ran into this
> problem earlier this week.  Is there a recommended way to revert a merged
> topic branch?  I assume it's essentially reverting each commit introduced
> by the branch, but is there a convenient invocation of revert? (easy to
> remember and hard to screw up)

If you merged the whole topic in at once, then you can use "git revert
-m 1 $merge_commit" to undo the merge. If it came in individual pieces,
then you have to revert each one individually (though if it was a series
of merges, you can in theory revert each merge in reverse order).
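
A sketch of the single-merge case in a throwaway repository; the topic
and file names are invented. "-m 1" tells git revert to keep the first
parent's side (master as it was before the merge) and undo what the
merge brought in.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"; git config user.email "example@example.invalid"
echo base > core.txt; git add core.txt; git commit -q -m "upstream 1.0"

# A two-commit topic, merged into master in one go.
git checkout -q -b topic
echo 1 > topic.txt; git add topic.txt; git commit -q -m "topic: first step"
echo 2 >> topic.txt; git commit -q -am "topic: second step"
git checkout -q master
echo other > other.txt; git add other.txt; git commit -q -m "unrelated work"
git merge -q --no-ff -m "Merge topic" topic
merge_commit=$(git rev-parse HEAD)

# Undo everything the merge brought in with a single new commit.
git revert --no-edit -m 1 "$merge_commit" >/dev/null
```

One caveat: the topic's commits remain in master's history, so merging
the same topic again later does not bring its changes back unless the
revert commit is itself reverted first.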

-Peff


* Re: A workflow for local patch maintenance
  2013-10-11 15:16         ` Jeff King
@ 2013-10-11 15:30           ` Stephen Bash
  0 siblings, 0 replies; 8+ messages in thread
From: Stephen Bash @ 2013-10-11 15:30 UTC
  To: Jeff King; +Cc: git, Tony Finch

----- Original Message -----
> From: "Jeff King" <peff@peff.net>
> To: "Stephen Bash" <bash@genarts.com>
> Cc: git@vger.kernel.org, "Tony Finch" <dot@dotat.at>
> Sent: Friday, October 11, 2013 11:16:14 AM
> Subject: Re: A workflow for local patch maintenance
> 
> On Fri, Oct 11, 2013 at 09:22:28AM -0400, Stephen Bash wrote:
> 
> > > To mitigate problem 1, I will sometimes revert a local topic
> > > before doing the upstream merge, if I know it has been reworked.
> > 
> > Peff (slightly off topic) - A coworker of mine actually ran into
> > this problem earlier this week.  Is there a recommended way to revert
> > a merged topic branch?  I assume it's essentially reverting each
> > commit introduced by the branch, but is there a convenient
> > invocation of revert?  (easy to remember and hard to screw up)
> 
> If you merged the whole topic in at once, then you can use "git revert
> -m 1 $merge_commit" to undo the merge. If it came in individual
> pieces, then you have to revert each one individually (though if it
> was a series of merges, you can in theory revert each merge in reverse
> order).

Thanks for the pointer.  That got me to the right place on the revert
manpage, and there I found the link to howto/revert-a-faulty-merge.txt
which was extremely helpful.

Thanks!
Stephen

