git@vger.kernel.org mailing list mirror (one of many)
From: Andrew Sayers <andrew-git@pileofstuff.org>
To: Nathan Gray <n8gray@n8gray.org>
Cc: Stephen Bash <bash@genarts.com>,
	Jonathan Nieder <jrnieder@gmail.com>, Jeff King <peff@peff.net>,
	git@vger.kernel.org, Sverre Rabbelier <srabbelier@gmail.com>,
	Dmitry Ivankov <divanorama@gmail.com>,
	Ramkumar Ramachandra <artagnon@gmail.com>,
	Sam Vilain <sam@vilain.net>, David Barr <davidbarr@google.com>
Subject: Re: Approaches to SVN to Git conversion
Date: Tue, 06 Mar 2012 22:34:35 +0000
Message-ID: <4F5690FB.9060800@pileofstuff.org>
In-Reply-To: <CA+7g9Jwb=7wH7R3=ShhOGMdHXWmq4ZahocpaEuJdf+yBfCpA8A@mail.gmail.com>

I've now added a bit of documentation and uploaded my code to GitHub:
https://github.com/andrew-sayers/Proof-of-concept-History-Converter

I haven't attached it here because the code isn't at a stage where it
would be useful to review line-by-line.  Comments are welcome if you
really want to, though :)

svn-branch-export.pl makes heavy use of SVN::Dump.  If speed is
important to you, you may want to get the latest version from GitHub:
https://github.com/book/SVN-Dump/ - many thanks to Philippe Bruhat for
accepting my performance patch so quickly.

Here are some particular gripes I have with the code I've uploaded:

git-branch-import.pl gets the revision number by parsing out the
"git-svn-id" in commit messages - as I mentioned earlier, I started off
thinking this script would be closely related to git-svn somehow.  In
hindsight it would be better to read revision numbers from the marks
file exported by git-fast-import.
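For comparison, here is a rough sketch of both approaches in Python (used only for illustration; the real scripts are Perl).  It assumes the marks file is the one written by `git fast-import --export-marks=<file>`, which holds one `:<mark> <object-name>` line per marked object, and it assumes a hypothetical exporter convention of using each SVN revision number as the mark of the commit created for that revision.  The current scripts don't necessarily do that.

```python
import re
from typing import Dict, Optional

# `git fast-import --export-marks=FILE` writes one ":<mark> <object-name>" per line.
MARK_LINE = re.compile(r"^:(\d+) ([0-9a-f]{40,64})$")

# git-svn adds a trailer of the form "git-svn-id: <url>@<rev> <repo-uuid>".
GIT_SVN_ID = re.compile(r"^git-svn-id: \S+@(\d+) \S+$", re.MULTILINE)


def revision_from_git_svn_id(message: str) -> Optional[int]:
    """Current approach: scrape the SVN revision out of a commit message."""
    m = GIT_SVN_ID.search(message)
    return int(m.group(1)) if m else None


def revisions_from_marks(marks_text: str) -> Dict[int, str]:
    """Alternative: map SVN revision -> git object name using a marks file.

    ASSUMPTION (hypothetical convention): the exporter used each SVN revision
    number as the mark of the commit it created, and marked nothing else.
    """
    revs: Dict[int, str] = {}
    for line in marks_text.splitlines():
        m = MARK_LINE.match(line.strip())
        if m:
            revs[int(m.group(1))] = m.group(2)
    return revs
```

The marks-file version doesn't depend on commit messages at all, which is part of what makes it attractive for a tool that isn't tied to git-svn.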

The Branch History Format has some git-specific settings in the setup
section.  I didn't think about this in much detail while writing it, but
DVCS-neutrality would be better served by turning these into
command-line options.

As mentioned before, branch detection in svn-branch-export.pl is rather
muddled, as my understanding of the problem evolved significantly while
writing it.

svn-branch-export.pl half-heartedly uses a configure/make/make install
analogy to describe its behaviour - I'm increasingly sure this is
gimmicky and awful, rather than a neat explanatory trick.

svn-branch-export.pl exposes a lot of config values (e.g. "log_style")
that just bulk up the implementation and create space for bugs to creep
in without adding much actual value.  They should be removed.

On 06/03/12 19:29, Nathan Gray wrote:
<snip>
> 
> The problem of specifying and detecting branches is a major problem in
> my upcoming conversion.  We've got toplevel trunk/branches/tags
> directories but underneath "branches" it's a free-for-all:
> 
> /branches/codenameA/{projectA,projectB,projectC}
> /branches/codenameB   (actually a branch of projectA)
> /branches/developers/joe/frobnicator-experiment (also a branch of projectA)
> 
> Clearly there's no simple regex that's going to capture this, so I'm
> reduced to listing every branch of projectA, which is tedious and
> error-prone.  However, what *would* work fabulously well for me is
> "marker file" detection.  Every copy of projectA has a certain file at
> its root.  Let's call it "markerFile.txt".  What I'd really love is a
> way to say:

This is quite close to the implementation I've got.  The SVN exporter
runs in two stages:

In the first stage, the script treats any non-blacklisted file as a
marker file, but only looks for trunk branches.  It looks all through
the history, traces back through the copyfroms, and tries to find the
original directory associated with the file.  Usually it decides that
the only branch without a copyfrom is /trunk.  Searching just for trunks
with this weak heuristic makes it much easier to hand-verify the result.
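A minimal Python sketch of the copyfrom tracing in the first stage might look like the code below.  It assumes the dump has already been reduced to a list of directory copies.  The `Copy` record and `trace_origin` are hypothetical names, not the script's real data structures.  The blacklist and the final choice of which directory counts as the trunk are left out.

```python
from typing import List, NamedTuple, Tuple


class Copy(NamedTuple):
    rev: int      # revision that committed the copy
    dest: str     # copy destination, e.g. "branches/b"
    src: str      # copyfrom path
    src_rev: int  # copyfrom revision (always earlier than rev in a real dump)


def _under(path: str, directory: str) -> bool:
    return path == directory or path.startswith(directory + "/")


def trace_origin(path: str, rev: int, copies: List[Copy]) -> Tuple[str, int]:
    """Follow copyfroms back until `path` no longer lies under a copied directory."""
    while True:
        # Copies that already existed at `rev` and cover `path`.
        hits = [c for c in copies if c.rev <= rev and _under(path, c.dest)]
        if not hits:
            return path, rev
        # Prefer the most recent copy, then the most specific destination.
        c = max(hits, key=lambda h: (h.rev, len(h.dest)))
        if c.src_rev >= c.rev:
            raise ValueError(f"malformed copy record: {c}")
        path = c.src + path[len(c.dest):]
        rev = c.src_rev
```

Each step moves to a strictly earlier revision, so the loop terminates.  A path that isn't under any copy is its own origin, which is how a directory like /trunk ends up as the only candidate without a copyfrom.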

In the second stage, the script looks through the history again, tracing
the copies of known branches in a slightly less clever way than
described in my previous e-mail.  There's no need for marker files this
time round, as we just assume any `svn cp /trunk
/directory/not/within/a/branch` is a new branch.  In my experiments this
has been a pretty solid way of detecting branches without too much human
input - I might be missing something (or have mis-explained something),
but I'd be interested to hear examples of where this would go wrong.
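The second-stage rule can be sketched in a few lines of Python.  This is a simplified illustration with several assumptions.  Copies are hypothetical `Copy` records.  Only copies of a whole known branch count, so a copy of a subdirectory such as /trunk/projectA is ignored here.  The output strings only mimic the "In rN, create branch ..." descriptions in this mail and are not the actual Branch History Format syntax.

```python
from typing import List, NamedTuple, Sequence


class Copy(NamedTuple):
    rev: int      # revision that committed the copy
    dest: str     # copy destination
    src: str      # copyfrom path
    src_rev: int  # copyfrom revision


def _within(path: str, branch: str) -> bool:
    return path == branch or path.startswith(branch + "/")


def detect_branches(copies: Sequence[Copy], trunks=("trunk",)) -> List[str]:
    """Treat a copy of a known branch to a path outside every known branch as a new branch."""
    branches = set(trunks)  # seeded with the trunks found in the first stage
    events: List[str] = []
    for c in sorted(copies, key=lambda c: c.rev):
        if c.src in branches and not any(_within(c.dest, b) for b in branches):
            branches.add(c.dest)
            events.append(f'In r{c.rev}, create branch "{c.dest}" from "{c.src}" r{c.src_rev}')
    return events
```

A copy from /trunk to /trunk/lib is skipped because the destination is inside a known branch.  A copy of /branches/codenameB to /branches/developers/joe/... is picked up because codenameB became a known branch earlier in the history.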
Having said that, here's a dodgy example I'd like to pre-emptively defend:

	mkdir tronk
	svn add tronk
	svn ci -m "Created trunk" # r1
	svn cp tronk trunk
	svn ci -m "D'oh" # r2
	svn rm tronk
	touch trunk/markerFile.txt
	svn add trunk/markerFile.txt
	svn ci -m "Double d'oh!" # r3

You could argue that the correct branch history description for the
above would be:

	In r3, create branch "trunk"

In other words, ignore everything that happened before the marker file
was created.  However, I would argue the following representation is
more correct:

	In r1, create branch "tronk"
	In r2, create branch "trunk" from "tronk" r1
	In r3, delete branch "tronk"

The branch history format supports the "delete branch" command (remove
the branch entirely) as well as the more common "deactivate branch"
(keep the branch but don't accept any new commits) specifically to deal
with this sort of weirdness.  Creating a branch then deleting it keeps
the r1 revision log intact as part of the "trunk" branch, without
leaving any useless branches lying around.
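To make the difference between the two commands concrete, here is a hypothetical Python model of applying the tronk/trunk description above.  The tuple encoding of commands is invented for illustration and is not the real file format.  Deleting "tronk" removes the branch itself, while "trunk" still records that it was created from tronk r1.  In git, deleting a branch ref doesn't remove commits that are still reachable from another branch, so the r1 commit survives as an ancestor of trunk.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Branch:
    created: int                              # revision the branch appeared in
    parent: Optional[Tuple[str, int]] = None  # (branch, revision) it was copied from
    active: bool = True                       # False once deactivated


def apply_history(commands) -> Dict[str, Branch]:
    """Apply hypothetical (op, rev, name[, parent]) tuples in order."""
    branches: Dict[str, Branch] = {}
    for op, rev, name, *rest in commands:
        if op == "create":
            branches[name] = Branch(created=rev, parent=rest[0] if rest else None)
        elif op == "deactivate":
            branches[name].active = False  # keep the branch, accept no new commits
        elif op == "delete":
            del branches[name]             # remove the branch entirely
        else:
            raise ValueError(f"unknown command: {op!r}")
    return branches


tronk_example = apply_history([
    ("create", 1, "tronk"),
    ("create", 2, "trunk", ("tronk", 1)),
    ("delete", 3, "tronk"),
])
# Only "trunk" remains, and it still records tronk r1 as its parent.
```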

	- Andrew
