git@vger.kernel.org mailing list mirror (one of many)
* Using git to bridge two svn repositories: a success story
@ 2007-04-20  4:12 Steven Grimm
From: Steven Grimm @ 2007-04-20  4:12 UTC
  To: git

I complain too much of the time on this list, so here's a success story 
I can share for a change. I just used git to merge two separate svn 
repositories: the official repo for an open-source program and an 
internal repo with our locally-modified version of the same program. The 
local copy has been tracking the official one off and on over time; it 
has a bunch of changes that were contributed back to the official code 
base at various points, other changes that weren't, and some directory 
layout changes to accommodate our internal build system.

We had fallen fairly far behind the official version, so yesterday I 
decided to bring us up to date. It was not a trivial merge: several of 
our changes had been applied to different branches in the official svn 
repository, which had been merged back into its trunk at various 
points. In many cases local change A appeared before remote change B in 
our history but after it in the official repo, since they committed our 
change after the other one.

Obviously svn is nowhere near adequate to the task of normalizing these 
two code bases. So I used git instead, and it worked out great. 
Specifically, here's what I did, minus a few false starts:

1. Made two git-svn repositories, one based on our local code base and 
one based on the official svn repository.

2. Created a git repository and pulled from both of the git-svn repos. 
(I know I could have done this with one repo instead of three, but I 
wanted to make sure I could easily blow away one of the parts of this 
and start over.)
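A minimal sketch of steps 1 and 2. In the real setup the two inputs would come from `git svn clone` against the two svn URLs; here `ours-svn` and `official-svn` are hypothetical stand-in repositories so the script runs without svn. The combined repository adds both as remotes and fetches from them, so either import can be thrown away and rebuilt on its own.

```shell
#!/bin/sh
# Sketch of steps 1-2: two import repositories feeding one combined repository.
# Assumption: ours-svn and official-svn are hypothetical stand-ins; the real
# ones would be created with "git svn clone <svn-url> <dir>".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)

# Stand-in for the git-svn import of our internal repository (src/ layout).
git init -q "$WORK/ours-svn"
cd "$WORK/ours-svn"
mkdir src
echo 'int main(void) { return 0; }' > src/main.c
git add src
git commit -q -m 'local r1'

# Stand-in for the git-svn import of the official repository (server/ layout).
git init -q "$WORK/official-svn"
cd "$WORK/official-svn"
mkdir server
echo 'int main(void) { return 0; }' > server/main.c
git add server
git commit -q -m 'official r1'

# The combined repository only fetches from the imports, so either import
# can be deleted and recreated without losing integration work.
git init -q "$WORK/combined"
cd "$WORK/combined"
git remote add ours "$WORK/ours-svn"
git remote add official "$WORK/official-svn"
git fetch -q ours
git fetch -q official
git branch -r
```

A `git pull` from each would also work, but it merges straight into the current branch; fetching keeps both imports visible as separate remote-tracking branches until you decide what to merge.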

3. Added a couple of .git/info/grafts entries for places where I knew 
the original project had merged branches back into trunk, but where 
git-svn hadn't detected the merge. Probably not git-svn's fault, given 
how brittle merging is in svn and the fact that a couple of the merges 
were split across multiple svn revisions.
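Each line of `.git/info/grafts` names a commit followed by every parent git should pretend it has. Current git deprecates the grafts file in favor of `git replace --graft`, which records the same fix-up as a replace ref. The sketch below uses a made-up three-commit history (branch names and messages are hypothetical) and adds the missing second parent to a trunk commit that was really an svn merge.

```shell
#!/bin/sh
# Sketch: recording an svn merge that git-svn imported as a one-parent commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"

echo base > file.txt
git add file.txt
git commit -q -m 'trunk r1'
BASE=$(git rev-parse HEAD)

# An svn branch that was later merged back into trunk.
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m 'branch r2'
BRANCH_TIP=$(git rev-parse HEAD)

# The trunk side of the svn merge, imported with only one parent.
git checkout -q -b trunk "$BASE"
echo feature > feature.txt
git add feature.txt
git commit -q -m 'trunk r3: merged feature branch'
MERGE=$(git rev-parse HEAD)

# 2007-era fix-up, one line per commit, "<commit> <parent> <parent>...":
#   echo "$MERGE $BASE $BRANCH_TIP" >> .git/info/grafts
# Equivalent in current git, stored as a replace ref instead:
git replace --graft "$MERGE" "$BASE" "$BRANCH_TIP"

# History now shows trunk r3 as a merge with two parents.
git rev-list --parents -n 1 trunk
```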

4. Found an early point in our history when we had a nearly unmodified 
copy of the distribution as it was at the time, and created a branch 
from that revision.

5. Renamed the files from our layout back to the distribution's. (I'll 
talk more about this below.)

6. Did a baseless merge (the two histories share no common ancestor) 
with the corresponding revision of the distribution's history. Resolved 
the conflicts, which weren't too severe thanks to step 4.
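Steps 4 through 6 on a toy repository, where the two imports are stand-in branches named `ours` and `official` (hypothetical names). Git 2.9 and later refuse to merge histories with no common ancestor unless `--allow-unrelated-histories` is given; git at the time of this message did the baseless merge without that flag.

```shell
#!/bin/sh
# Sketch of steps 4-6: branch from an early local revision, rename into the
# distribution's layout, then merge the unrelated official history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"

# Stand-in for our import: an early, nearly unmodified revision in src/.
mkdir src
printf 'shared line 1\nshared line 2\n' > src/main.c
git add src
git commit -q -m 'ours: early import'
git branch ours

# Stand-in for the official import: an unrelated root commit in server/.
git checkout -q --orphan official
git rm -r -q --cached .
rm -rf src
mkdir server
printf 'shared line 1\nshared line 2\n' > server/main.c
git add server
git commit -q -m 'official: matching release'

# Step 4: the integration branch starts at the early local revision.
git checkout -q -b integration ours
# Step 5: rename our layout back to the distribution's.
git mv src server
git commit -q -m 'Rename src/ back to server/'
# Step 6: baseless merge with the corresponding official revision.
git merge -q --allow-unrelated-histories -m 'Baseless merge with official' official
```

Because the two sides added identical content at the same path, this toy merge is clean; in the real case the conflicts are whatever drifted between the two copies, which is why picking a close starting point in step 4 matters.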

7. Walked through the revision history on both sides merging into my 
integration branch. I was more cautious about this than I probably 
needed to be (though more on that below too); my approach was to merge 
up to a particular change on our side that I knew we'd contributed 
upstream, then merge up to the corresponding revision on the official 
side, repeat until done. In cases where our stuff had been integrated 
into a branch in the official repo, I followed that branch rather than 
trunk for the most part. I ended up walking three branches plus trunk.

8. Once I had merged the last of our local changes, I merged the head of 
the official trunk into my integration branch, picking up a bunch of 
official revs in one step.
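The alternating merges of steps 7 and 8 amount to a loop over hand-picked checkpoint pairs. This sketch assumes hypothetical branches `ours` and `official` that share a base; in practice each pair would be a local commit that was contributed upstream and the official commit containing it. `--no-ff` forces a merge commit every time, which produces the ladder shape in gitk, and `set -e` stops the loop at the first conflict so it can be resolved by hand.

```shell
#!/bin/sh
# Sketch of steps 7-8: alternate merges from both sides up to paired checkpoints.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"

printf 'local\n' > local.txt
printf 'official\n' > official.txt
git add .
git commit -q -m 'common base'
git branch ours
git branch official
git branch integration

# Two commits on each side, touching different files so nothing conflicts.
git checkout -q ours
for n in 1 2; do
  echo "local change $n" >> local.txt
  git commit -q -am "ours: change $n"
done
git checkout -q official
for n in 1 2; do
  echo "official change $n" >> official.txt
  git commit -q -am "official: change $n"
done

# Hypothetical checkpoint pairs: "<our revision> <matching official revision>".
git checkout -q integration
for pair in 'ours~1 official~1' 'ours official'; do
  set -- $pair
  git merge -q --no-ff -m "Merge ours up to $1" "$1"
  git merge -q --no-ff -m "Merge official up to $2" "$2"
done
git log --oneline --graph
```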

9. Renamed everything back to our naming conventions.

This was something of an iterative process. I worked incrementally at 
first mainly to limit the amount of conflict resolution at any one 
step, and to make sure that each of our contributions had in fact been 
merged correctly. (I wrote most of the code we contributed, so I was 
able to tell quickly if it looked right.)

The gitk display for this repo looks like a ladder; nearly every 
revision of my integration branch is a merge.

Now, about those renames. The major change in structure was to rename 
the source directory from "server" in the official repository to "src", 
which our build system expects. So before I did any merges, I committed 
a revision on my integration branch where I did "git mv src server" 
(along with a couple of other similar renames) so there'd be an explicit 
rename-only revision for git's rename detection to use to apply changes 
to the right files.

Unfortunately, that broke down as soon as I got to a contribution of 
ours that added a new file. I merged the contribution on our side (where 
everything lives in src/), and it correctly applied the modifications to 
the existing files in server/ thanks to the renames in the history. But 
the new file was created in src/. I didn't notice the file missing
from server/ at first, and merged the revision from the official repo 
that created the same file there. The new file was identical on both 
sides, so I didn't think it was odd that there wasn't a conflict, and 
proceeded to the next rev. It was only after several more revisions 
merged from both sides that I noticed the server/ copy of the file was 
missing changes I'd sworn I'd just merged from our side. Naturally all 
our local changes were getting successfully applied to the copy in src/ 
and all the changes from the official repo were showing up in the 
server/ copy.

So I ended up resetting back to the first revision that created a new 
file in src/, and making sure I stopped at each revision that introduced 
a new file there so I could commit an extra revision after the merge to 
manually rename it into server/. Then the subsequent merge with the 
revision that created the file in server/ would correctly flag any 
differences between the two versions as conflicts, and I could go 
through and do the right thing with them. There were only three or four 
such cases so it wasn't too much extra work.
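The workaround above, reduced to a toy history with hypothetical branch names and file contents: our side adds `new.c` under `src/`, the official side adds its own version under `server/`. After merging our side, anything that landed under `src/` is moved into `server/` in its own commit, so the next merge from the official side reports an add/add conflict instead of silently keeping two copies. Newer git versions have directory rename detection (the `merge.directoryRenames` setting), which addresses this case; the sketch turns it off to mimic the behavior described here.

```shell
#!/bin/sh
# Sketch: move files that a merge added under the old directory name.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"

mkdir src
echo 'main' > src/main.c
git add src
git commit -q -m 'ours: base'
git branch ours

# The integration branch uses the official layout; the toy upstream forks from it.
git checkout -q -b integration
git mv src server
git commit -q -m 'Rename src/ to server/'
git branch official

# Our contribution adds a new file in our layout...
git checkout -q ours
echo 'local version' > src/new.c
git add src/new.c
git commit -q -m 'ours: add new.c'

# ...and upstream committed its own version of it in theirs.
git checkout -q official
echo 'official version' > server/new.c
git add server/new.c
git commit -q -m 'official: add new.c'

# Merge our side with directory rename detection off (the 2007 behavior).
git checkout -q integration
git -c merge.directoryRenames=false merge -q -m 'Merge ours' ours

# The fix: move anything that landed in src/ into server/ in its own commit.
for f in $(git ls-files src); do
  dest="server/${f#src/}"
  mkdir -p "$(dirname "$dest")"
  git mv "$f" "$dest"
done
git diff --cached --quiet || git commit -q -m 'Move files added under src/ into server/'

# Now the official merge sees both versions of server/new.c and conflicts.
if git merge -q -m 'Merge official' official; then
  echo 'merged cleanly'
else
  git diff --name-only --diff-filter=U
fi
```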

The only other glitch I ran into was missing one merge from the official 
svn repository when I created my grafts file. That caused me to get a 
bunch of repeat conflicts when I merged the subsequent svn trunk 
revision. But I immediately realized what was happening there; I reset, 
added the missing merge to my grafts file, and did the merge again, and 
the repeat conflicts went away.

Aside from those two minor things, it was a painless exercise, and now I 
have a reasonably coherent (if a bit convoluted) combined history of the 
two versions of the code base without the svn repositories on either 
side being aware of each other. I plan to keep all these git 
repositories around so I can quickly integrate subsequent changes from 
both sides.

So, kudos all around. Without git this would have been a much more 
time-consuming and error-prone exercise!

-Steve


* Re: Using git to bridge two svn repositories: a success story
From: Junio C Hamano @ 2007-04-20  7:18 UTC
  To: Steven Grimm; +Cc: git

Steven Grimm <koreth@midwinter.com> writes:

> ..., so here's a success
> story I can share for a change. I just used git to merge two separate
> svn repositories: the official repo for an open-source program and an
> internal repo with our locally-modified version of the same
> program.

Nice story.  Thanks for sharing.

> Now, about those renames. The major change in structure was to rename
> the source directory from "server" in the official repository to "src"
> which our build system expects. So before I did any merges, I
> committed a revision where I did "git mv src server" (along with a
> couple other similar renames) so there'd be an explicit rename-only
> revision for git's rename detection to use to apply changes to the
> right files.

In the work you did in the story, your rename from server to src
was indeed an atomic action YOU wanted to have, which was done
to match the two tree structures.  It was your project, not git,
that did not want the name upstream uses.  So it makes perfect
sense to have that rename-only commit.  But if you do it only
because you think it would help later merges, don't.

I do not know who started this myth, but "rename only commit"
does not help rename detection in merges AT ALL, as rename
detection is not done step-by-step, but between the common ancestor
and the tip of each branch.

A "rename-only commit" does help if you are following history
with "git log -p -M", where rename detection logic compares
trees stepwise.  I think somebody confused this with the rename
detection done by the merge machinery to start the myth.  Please
do not spread it any further.
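Junio's point can be checked on a toy repository (file and branch names are hypothetical): one side moves `server/` to `src/` and edits the file in the same commit, with no rename-only commit, while the other side edits a different line at the old path. The merge compares the merge base with each tip as whole trees, pairs `server/app.c` with `src/app.c`, and applies the upstream edit to the renamed file. `git log -M`, by contrast, compares each commit with its parent, which is where keeping a rename apart from heavy edits can help.

```shell
#!/bin/sh
# Sketch: merge-time rename detection without a rename-only commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"

mkdir server
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > server/app.c
git add server
git commit -q -m 'base'
git branch upstream

# Our side: rename and edit in a single commit.
git checkout -q -b ours
git mv server src
sed 's/^2$/two (ours)/' src/app.c > app.tmp
mv app.tmp src/app.c
git commit -q -am 'Move server/ to src/ and edit line 2'

# Upstream: edit a different line at the old path.
git checkout -q upstream
sed 's/^9$/nine (upstream)/' server/app.c > app.tmp
mv app.tmp server/app.c
git commit -q -am 'Edit line 9'

# The merge pairs server/app.c with src/app.c by comparing the merge base
# with each branch tip, so the upstream edit lands in src/app.c.
git checkout -q ours
git merge -q -m 'Merge upstream' upstream

# Stepwise rename detection, as used when following history.
git log -M --name-status --oneline 'HEAD^1'
```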

