From: Sam Vilain <sam@vilain.net>
To: Stephen Bash <bash@genarts.com>
Cc: Nathan Gray <n8gray@n8gray.org>,
Andrew Sayers <andrew-git@pileofstuff.org>,
Jonathan Nieder <jrnieder@gmail.com>, Jeff King <peff@peff.net>,
git@vger.kernel.org, Sverre Rabbelier <srabbelier@gmail.com>,
Dmitry Ivankov <divanorama@gmail.com>,
Ramkumar Ramachandra <artagnon@gmail.com>,
David Barr <davidbarr@google.com>
Subject: Re: [spf:guess] Re: Approaches to SVN to Git conversion (was: Re: [RFC] "Remote helper for Subversion" project)
Date: Tue, 06 Mar 2012 15:59:27 -0800
Message-ID: <4F56A4DF.8060807@vilain.net>
In-Reply-To: <ab5eb5a7-a446-4dc3-b8e8-e3f7ec306452@mail>
On 3/6/12 12:35 PM, Stephen Bash wrote:
>> The problem of specifying and detecting branches is a major problem in
>> my upcoming conversion. We've got toplevel trunk/branches/tags
>> directories but underneath "branches" it's a free-for-all:
>>
>> /branches/codenameA/{projectA,projectB,projectC}
>> /branches/codenameB (actually a branch of projectA)
>> /branches/developers/joe/frobnicator-experiment (also a branch of
>> projectA)
>>
>> Clearly there's no simple regex that's going to capture this, so I'm
>> reduced to listing every branch of projectA, which is tedious and
>> error-prone. However, what *would* work fabulously well for me is
>> "marker file" detection. Every copy of projectA has a certain file at
its root. Let's call it "markerFile.txt". What I'd really love is a
>> way to say:
>>
>> my %branch_markers = ('/branches/**/markerFile.txt' =>
>> '/refs/heads/**');
>
> Ooo... I like it. I hadn't hit on this idea yet, but it certainly is a very helpful heuristic. I doubt I'd have any sort of demo code for you in the near future, but it's definitely an idea to roll into the mix.
What I did for the Perl Perforce conversion was make this a multi-step
process: first, the heuristic goes through and detects branches and
merge parents; then you do the actual export. If, however, the
heuristic gets it wrong, you can manually override the branch
detection for a particular revision, which invalidates all of the
_automatic_ decisions made for later revisions the next time you run it.
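To make the invalidation rule concrete, here is a minimal sketch in Python. It is not how git-p4raw stores things (that lives in Postgres); the class, method names and the one-decision-per-revision simplification are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BranchDecisions:
    """Branch decisions keyed by revision; all names here are hypothetical."""

    # revision number -> (branch path, "auto" or "manual")
    by_rev: dict = field(default_factory=dict)

    def record_auto(self, rev, branch):
        # A heuristic result never overwrites a manual decision.
        if self.by_rev.get(rev, (None, None))[1] != "manual":
            self.by_rev[rev] = (branch, "auto")

    def override(self, rev, branch):
        """Pin `rev` manually and discard automatic decisions after it."""
        self.by_rev[rev] = (branch, "manual")
        stale = sorted(r for r, (_, source) in self.by_rev.items()
                       if r > rev and source == "auto")
        for r in stale:
            del self.by_rev[r]
        # The caller re-runs the heuristic for these revisions.
        return stale


decisions = BranchDecisions()
for rev in (1, 2, 3, 4):
    decisions.record_auto(rev, "/branches/codenameA/projectA")
invalidated = decisions.override(2, "/branches/codenameB")
```

Manual decisions for later revisions survive the override; only the automatic ones are thrown away and recomputed.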
Even with all of the information in Postgres, much of the hard work
pushed into the Postgres engine, and Postgres tuned for OLAP, this
detection step was the slowest part of the operation for a 30,000-odd
revision Perforce repository.
The manual input is extremely useful for bespoke conversions; there will
always be warts in the history and no heuristic is perfect (even if you
can supply your own set of expressions, a way to override it for just
one revision is handy).
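For the marker-file idea quoted above, a user-supplied heuristic could be as small as the following Python sketch. The function name, its parameters and the sample paths are assumptions for illustration only; it assumes you already have the list of file paths present at a given revision.

```python
import posixpath


def detect_branches(paths, marker="markerFile.txt", prefix="/branches/"):
    """Map each directory under `prefix` that holds `marker` to a ref name."""
    refs = {}
    for path in paths:
        directory, name = posixpath.split(path)
        if name == marker and directory.startswith(prefix):
            # Everything after the prefix becomes the branch name.
            refs[directory] = "refs/heads/" + directory[len(prefix):]
    return refs


found = detect_branches([
    "/branches/codenameA/projectA/markerFile.txt",
    "/branches/codenameA/projectB/README",
    "/branches/codenameB/markerFile.txt",
    "/branches/developers/joe/frobnicator-experiment/markerFile.txt",
    "/trunk/markerFile.txt",
])
```

Here `found` holds three entries, one per copy of projectA under /branches; the trunk copy is ignored because it is outside the prefix.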
To recap, the steps in git-p4raw are:
* load metadata (git-p4raw load ; git-p4raw check)
* load blobs (git-p4raw export-blobs)
* find project roots (git-p4raw find-branches)
Project root decisions can be overridden. In git-p4raw this was done
through a DB insert, but all it consisted of was inserting (revision,
branch) tuples into the appropriate table, so a front-end would be
trivial. As you suggest, a custom heuristic is also an option, but the
most flexible solution is simply being able to override the decisions
made for a particular revision.
* detect project merges (also done by git-p4raw find-branches)
Detecting merge parents used a heuristic based on the per-file
integration records and a computation based on an internal diff-tree
which produced a list of files that would have needed resolving. This
one I actually used enough to bother implementing a front-end for:
git-p4raw graft REV PARENT PARENT
Where 'PARENT' could be another project root (revision/branch location),
or it could be a git commit ID (for the inevitable occasion where you
need to manually graft on some history). This interface allows you to
do several things:
1. mark a merge which was not recorded correctly in history
2. un-mark a merge which was detected/recorded incorrectly
3. skip bad sections of history, for instance by squashing merges
which happened over several commits (SVN and Perforce, of course,
support insane piecemeal merging which git does not allow)
* the actual fast-import exporter.
git-p4raw export-commits 1..5000
There was also an important reverse operation:
git-p4raw unexport-commits 2500
This moved all of the exported refs backwards and deleted any which
didn't exist at revision 2500.
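The rewind can be pictured with the following Python sketch. It assumes a per-ref list of (revision, commit) updates recorded during export; that data layout, the function name and the sample values are hypothetical, not git-p4raw's actual schema.

```python
def rewind_refs(ref_history, target_rev):
    """Return where each ref pointed as of `target_rev`.

    Refs whose first update is later than `target_rev` are left out,
    which corresponds to deleting them.
    """
    rewound = {}
    for ref, updates in ref_history.items():
        earlier = [(rev, commit) for rev, commit in updates
                   if rev <= target_rev]
        if earlier:
            # The most recent update at or before the target wins.
            rewound[ref] = max(earlier, key=lambda update: update[0])[1]
    return rewound


history = {
    "refs/heads/master": [(100, "aaa"), (2400, "bbb"), (4000, "ccc")],
    "refs/heads/late": [(3000, "ddd")],
}
at_2500 = rewind_refs(history, 2500)
```

With this sample history, rewinding to 2500 moves master back to the commit exported for revision 2400 and drops the branch that only appeared at revision 3000.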
Once the data has been mined, the actual exporting can proceed very
fast. E.g., on my laptop I could easily top 300 commits per second,
which makes for a nice export/examine/rewind/adjust cycle.
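The "adjust" part of that cycle is where the graft overrides come in. As a rough Python sketch, assuming detected parents and manual grafts are both kept as revision-to-parent-list mappings (the names and sample data are made up for illustration), a graft simply replaces whatever the heuristic found:

```python
def effective_parents(rev, detected, grafts):
    """Parents to export for `rev`; a manual graft replaces detection."""
    if rev in grafts:
        return list(grafts[rev])
    return list(detected.get(rev, []))


# Parents found by the merge heuristic (hypothetical revision labels).
detected = {10: ["r9"], 11: ["r10", "r7"], 12: ["r11"]}

# Manual grafts: un-mark the merge at 11, and mark a merge at 12 whose
# second parent is given as a git commit ID instead of a revision.
grafts = {11: ["r10"], 12: ["r11", "deadbeef"]}

parents = {rev: effective_parents(rev, detected, grafts)
           for rev in (10, 11, 12)}
```

Revision 10 keeps its detected parent, 11 loses the wrongly detected merge parent, and 12 gains one.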
For more information,
git clone git://github.com/samv/git-p4raw
cd git-p4raw
perldoc git-p4raw
The "Game plan." section of the POD is particularly relevant. Remember
that SVN is very similar to Perforce in virtually all of its design
details so this tool, its database schema, and implementation are all
very relevant to the design of the new svn-fe importer.
Sam