From: Stephen Bash <bash@genarts.com>
To: Andrew Sayers <andrew-git@pileofstuff.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>, Jeff King <peff@peff.net>,
git@vger.kernel.org, Sverre Rabbelier <srabbelier@gmail.com>,
Dmitry Ivankov <divanorama@gmail.com>,
Ramkumar Ramachandra <artagnon@gmail.com>,
Sam Vilain <sam@vilain.net>, David Barr <davidbarr@google.com>
Subject: Re: Approaches to SVN to Git conversion
Date: Tue, 06 Mar 2012 09:36:31 -0500 (EST)
Message-ID: <9130e486-21bd-4c8c-9647-b627dbc1e5c6@mail>
In-Reply-To: <4F554BE4.5010401@pileofstuff.org>
----- Original Message -----
> From: "Andrew Sayers" <andrew-git@pileofstuff.org>
> Sent: Monday, March 5, 2012 6:27:32 PM
> Subject: Re: Approaches to SVN to Git conversion
>
> > My current thinking (and this is very much open for discussion) is
> > that as long as the SVN properties are available (especially the
> > copyfrom information) Git has just as much information (if not more)
> > to reconstruct the SVN history as SVN does. (And going through our
> > messy history I haven't found any counterpoint to this yet)
>
> I agree that git can be taught a superset of the information in SVN,
> but you'll need absolutely all SVN properties available...
I'm pretty sure Jonathan won't be happy with anything less ;)
> I wrote my SVN exporter based on SVN dumps for three reasons - I
> figured people switching from SVN would be more comfortable
> customising a solution that only used technologies they understood, I
> figured it might be useful to Mercurial or Bazaar some day if it was
> DVCS-neutral, and I have to use SVN for my day job so I'm more
> interested in getting a good migration story today than a great one
> tomorrow.
The multiple systems argument is a good one.
> > my %branch_spec = ( '/trunk/projname' => 'master',
> >                     '/branches/*/projname' => '/refs/heads/*' );
> > my %tag_spec = ( '/tags/*/projname' => '/refs/tags/*' );
> >
> > Now I know this simple mapping will fail as I get further in our
> > history -- in particular we have one branch that came from:
> >
> > svn cp $SVN_REPO/trunk/ $SVN_REPO/foo # OOPS! not in branches!
> > svn mv $SVN_REPO/foo $SVN_REPO/branches/foo
> >
> > It's then up to the user to modify the branch
> > map to something that accounts for this behavior:
> >
> > my %branch_spec = ( '/trunk/projname' => 'master',
> >                     '/branches/*/projname' => '/refs/heads/*',
> >                     '/foo' => '/refs/heads/foo' );
> > my %tag_spec = ( '/tags/*/projname' => '/refs/tags/*' );
>
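For anyone following along, here is a rough sketch of how a map like the one quoted above could be applied to SVN paths. It is written in Python rather than Perl purely for illustration, and it is not taken from my actual conversion scripts: the function names (compile_spec, map_path) and the rule that each '*' matches exactly one path component are assumptions.

```python
import re

# Hypothetical spec, copied from the quoted example.
BRANCH_SPEC = {
    "/trunk/projname": "master",
    "/branches/*/projname": "/refs/heads/*",
    "/foo": "/refs/heads/foo",
}

def compile_spec(spec):
    """Turn each SVN path pattern into a regex; '*' matches one path component."""
    compiled = []
    for pattern, ref in spec.items():
        parts = [r"([^/]+)" if p == "*" else re.escape(p)
                 for p in pattern.split("/")]
        # The trailing optional group captures the path inside the branch.
        regex = re.compile("/".join(parts) + r"(?:/(.*))?$")
        compiled.append((regex, ref))
    return compiled

def map_path(path, compiled):
    """Return (git_ref, path_within_branch), or None if no pattern matches."""
    for regex, ref in compiled:
        m = regex.match(path)
        if m:
            *stars, rest = m.groups()
            # Substitute each captured component for a '*' in the ref, in order.
            for s in stars:
                ref = ref.replace("*", s, 1)
            return ref, rest or ""
    return None
```

With this, map_path("/branches/rel-1.0/projname/README", compile_spec(BRANCH_SPEC)) yields the ref '/refs/heads/rel-1.0' and the in-branch path 'README', while a path outside every pattern returns None and is left for the user to classify.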
> I started with an approach like you describe, but as you say it winds
> up in a mess of special cases. A friend pointed me to Perl's catalyst
> repository[2], which is a wonderful haven of every mad SVN thing ever
> dreamt up. That got me playing with more general heuristics, and
> while writing this e-mail I think I've finally nailed it. What do you
> say to defining SVN branches like this:
>
> A directory is a branch if...
> 1. it is not a subdirectory of an existing branch; and
> 2. either:
> 2a. it is in a list of branches specified by the user, or
> 2b. it is copied from a (subdirectory of a) branch
I think I started with a very similar set of rules... Looking at my
code now I'm having a hard time summarizing them, probably because
they evolved with the code, so what started simple morphed into
something pretty complicated.

I guess as long as the user has the option to say "no, don't treat
this copy as a branch" (or equivalently the Git side of things has a
way to say "ignore this branch") these rules would be okay. But at
that point we're back to a list of exceptions -- really we're arguing
white-list vs black-list...

For our conversion I started with a black-list and eventually
switched to a white-list (one that still required a few manual edits
before manipulating the Git history). So take that single data point
for what it's worth.
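To make the comparison concrete, here is a minimal Python sketch of the quoted rules with the opt-out I'm asking for bolted on. It is an illustration under assumptions, not either of our implementations: the parameter names user_branches (rule 2a, the white-list) and ignored (the black-list of copies that must never become branches) are hypothetical.

```python
def is_within(path, root):
    """True if path equals root or lies beneath it."""
    return path == root or path.startswith(root.rstrip("/") + "/")

def is_branch(path, copyfrom, branches, user_branches=(), ignored=()):
    """Apply the quoted rules to one SVN directory.

    path          -- SVN directory being classified
    copyfrom      -- SVN copyfrom path, or None if the directory was not copied
    branches      -- set of paths already classified as branches
    user_branches -- hypothetical white-list of branch paths (rule 2a)
    ignored       -- hypothetical black-list: copies never treated as branches
    """
    if path in ignored:
        return False
    # Rule 1: not a subdirectory of an existing branch.
    if any(is_within(path, b) and path != b for b in branches):
        return False
    # Rule 2a: listed by the user.
    if path in user_branches:
        return True
    # Rule 2b: copied from a branch, or from a subdirectory of one.
    return copyfrom is not None and any(is_within(copyfrom, b) for b in branches)
```

The 'svn cp trunk foo' accident from earlier in the thread is picked up by rule 2b without any user input; the black-list is only needed when such a copy should not become a branch.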
> > > Once the format is defined, git import is fairly straightforward.
> > > Proof-of-concept code to follow, but it's really just a wrapper
> > > around git-commit-tree, git-mktag etc. I wrote this in Perl
> > > thinking it would relate somehow to git-svn, but eventually
> > > realised it didn't and that a few hundred calls to (plumbing)
> > > processes per second isn't so good for performance. The only
> > > interesting part of the problem is how to tackle SVN tags. I went
> > > for an ambitious approach, making normal tags where possible and
> > > downgrading them to lightweight tags when necessary. This does
> > > involve managing something that is effectively a branch in
> > > refs/tags/, but what else is an SVN tag but a branch in the wrong
> > > namespace?
> >
> > I don't understand how "normal" and "lightweight" apply in this
> > situation? ... In the case of actual content changes in a tag's
> > life, I think it's up to the user to decide between three options:
> >
> > 1) only retain the last SVN tag
> > 2) tag using the git-svn-style 'tagname@rev' for all but the last
> > 3) Do (2), but move older tags to some hidden namespace
> > (refs/hidden/tags or the like)
> >
> > ... In the bidirectional case things get murky (maybe always tag
> > with tagname@rev and hope for tab completion?).
>
> I didn't explain this particularly well, as it's based largely on the
> vague desire to make update work some day. Imagine the user does
> this:
>
> * git svn-pull # get tags/foo, a candidate for an annotated tag
> ... time passes ...
> * git svn-pull # tags/foo has now been updated in another revision
>
> If we create an annotated tag in step 1, what do we do in step 2? You
> can't make the tag object the parent of a new revision, so you need to
> do something unpleasant. The solution I proposed was to convert the
> tag message to a commit message (i.e. pretend a lightweight tag had
> been created all along), then add another commit on top of it and make
> a lightweight tag from the new commit (i.e. treat it like a branch).
> In retrospect that's far too much magic without user involvement - a
> better solution would be to give the user this option along with the
> ones you outlined, and let git-config remember their preference if
> they want.
Okay, that's what I thought you meant. I had classified it as a
bidirectional problem, but strictly speaking it isn't: it shows up
whenever Git keeps being updated from SVN, and only a one-time
migration avoids it. If you want to continue to update Git from SVN
there are two cases to consider:
1) Each Git repository *only* talks to SVN
2) The Git repository is cloned for further use
(So the chain is something like SVN->Git->Git)
In (1) your lightweight tag solution is probably okay (but I'm pretty
sure creating/deleting annotated tags would behave the same way,
because no one else sees the Git tag object).

In (2) I think there would still be a tag conflict when the upstream
Git repo replaces a lightweight tag and the downstream repo attempts
to fetch it. I don't know what the fetch/pull machinery does when
there's a lightweight tag conflict (I'm guessing it either bails out
or keeps the local one?).

Case (2) motivates me to say always generate (annotated?) tags named
tagname@rev so there can be no conflicts. In that case the only
difference I see is whether we create an empty Git commit carrying
the tag message plus a lightweight tag, or tag the original commit
with an annotated tag (I think it's fairly obvious I'm a fan of the
latter).
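A small Python sketch of the naming scheme, to show why the suffix avoids conflicts. This only computes ref names; it does not create tags, and the function name tag_refs and the always_suffix switch are hypothetical.

```python
def tag_refs(tag_revisions, always_suffix=True):
    """Map each (tag, SVN revision) pair to a Git tag ref name.

    tag_revisions -- dict of tag name -> list of SVN revisions that wrote the tag
    always_suffix -- True: every ref is 'tagname@rev', so no ref is ever replaced;
                     False: option (2) from the quote, only the newest is bare
    """
    refs = {}
    for tag, revs in tag_revisions.items():
        revs = sorted(revs)
        for rev in revs:
            newest = rev == revs[-1]
            name = tag if (newest and not always_suffix) else f"{tag}@{rev}"
            refs[(tag, rev)] = f"refs/tags/{name}"
    return refs
```

With always_suffix=True a later SVN update to tags/foo only ever adds a new ref. With always_suffix=False the bare refs/tags/foo has to move on every update, which is exactly the replacement a downstream clone would trip over in case (2).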
> [1] http://en.wikipedia.org/wiki/Full_employment_theorem
> [2] http://dev.catalyst.perl.org/repos/bast/
Thanks,
Stephen