git@vger.kernel.org mailing list mirror (one of many)
From: Andrew Sayers <andrew-git@pileofstuff.org>
To: David Barr <davidbarr@google.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>, Jeff King <peff@peff.net>,
	git@vger.kernel.org, Sverre Rabbelier <srabbelier@gmail.com>,
	Dmitry Ivankov <divanorama@gmail.com>,
	Ramkumar Ramachandra <artagnon@gmail.com>,
	Sam Vilain <sam@vilain.net>, Stephen Bash <bash@genarts.com>
Subject: Re: [RFC] "Remote helper for Subversion" project
Date: Sun, 04 Mar 2012 13:36:41 +0000
Message-ID: <4F536FE9.1050000@pileofstuff.org>
In-Reply-To: <CAFfmPPPs0FRbT-i+ZwBLNSca330Eo7thjNxDt3hJf0yUATthtQ@mail.gmail.com>

Hi guys,

I made a few little git contributions a couple of years back, before
being swallowed up by my old job.  I've become quite interested in
svn->git translation since starting a new job, but I'm delurking earlier
than planned so please bear with me as all I have right now is an
interesting proof of concept and the possibility of some free time some
day.  If things go well, I hope to help out with the
branching-and-merging part of the problem, so I'll describe what I've
done and how it might affect SoC projects.  Apologies in advance for
hijacking the thread :)

I mentioned proof-of-concept code for the work I've done - I'd like to
get confirmation from my new employer before putting it all online, so
hopefully I can make it available in a few days.

While researching the problem, I found Stephen Bash's original
proposal[1] and snerp-vortex[2] quite inspiring, but wasn't able to find
any details on SoC-related work in the branching-and-merging department
- hopefully the following isn't just a retread of ideas developed since
then.  I've concentrated on importing from SVN so far, but have kept an
eye on incremental updates and half an eye on bidirectional operation in
the hope of being useful there some day.

It seems to me the "svn export" and "git import" steps make most sense
as two unrelated projects.  Snerp-vortex and Stephen's scripts both cut
the history import problem at that point, as do svn-fe/git-fast-import
with code import.  Exporting SVN history is a messy and sometimes
project-specific job, so allowing a project to concentrate on that part
makes it possible for SVN experts to use all their skills without having
to learn git plumbing before they make their first commit (much respect
to Stephen for managing that feat BTW).


I've written a proof-of-concept history converter that can be split into
three parts: a format for describing SVN history; a large, often messy
Perl program that writes files in that format; and a small Perl program
that reads the format and translates it into git.  With hindsight, Perl
is the right language for the SVN exporter, but the git importer would
have been better written in C.


Personally, I think SVN export will always need a strong manual
component to get the best results, so I've put quite a bit of work into
designing a good SVN history format.  Like git-fast-import, it's an
ASCII format designed for both human and machine consumption:

In r1, create branch "trunk"
In r10, create branch "branches/foo" as "foo" from "trunk" r9
In r12, create tag "tags/version1" as "version1" from "trunk" r11
In r12, deactivate "tags/version1"
# blank lines and lines beginning with a '#' are ignored
In r15, merge "branches/foo" r14 into "trunk"
In r15, delete "branches/foo"

The above has been designed as an abstract representation of SVN
history, with as little git-specific content as possible.  This turns
out to make the problem a bit easier to think about, a bit easier to
implement, and potentially useful to other DVCSs some day.  I wanted to
enable svn-merge2git[3]-style extraction of merge info from logs, but
found that even a message as clear as "merge r123 from trunk" could mean
"cherry-pick revision 123", "merge everything up to revision 123",
"merge everything up to the last revision that touched trunk before
123", "merge everything up to 132", or any number of other things.  As
such, the format is designed to allow copious comments and ease of
bulk-editing with regexps and a powerful editor.
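
To illustrate why log messages need a human in the loop, here is a
minimal Python sketch (not the proof-of-concept code mentioned above)
that spots merge-like log messages and writes a commented-out candidate
line in the history format for a person to review.  The regexp, the
function name and the "REVIEW" marker are all assumptions made up for
this example; only the "In rN, merge ..." line shape comes from the
format shown above.

```python
import re

# Hypothetical pattern: "merge"/"merged", then a revision like r123,
# then "from <path>".  Real log messages vary far more than this.
MERGE_HINT = re.compile(
    r"\bmerged?\b.*?\br(\d+)\b.*?\bfrom\s+(\S+)", re.IGNORECASE
)


def candidate_line(rev, target, log_message):
    """Return a commented-out merge entry for review, or None.

    The entry is emitted as a comment because the log message alone
    cannot say whether r123 means a cherry-pick, a full merge up to
    that revision, or something else entirely.
    """
    match = MERGE_HINT.search(log_message)
    if match is None:
        return None
    source_rev, source = match.group(1), match.group(2)
    return (
        f'# REVIEW r{rev}: "{log_message.strip()}"\n'
        f'# In r{rev}, merge "{source}" r{source_rev} into "{target}"'
    )


if __name__ == "__main__":
    print(candidate_line(15, "branches/foo", "merge r123 from trunk"))
```

Because lines beginning with '#' are ignored, the candidate does nothing
until someone checks it against the repository and uncomments it.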

The format feels relatively complete to me, and in any case it is not a
good candidate for an SoC project because it's all about experience and
nothing to do with raw talent.
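
As a rough idea of how little machinery the format needs on the reading
side, here is a minimal Python sketch of a parser.  It covers only the
line shapes that appear in the sample above; the real format may well
have more.  The dictionary keys (action, rev, kind, path, name, source,
source_rev) are names invented for this example.

```python
import re

SAMPLE = '''\
In r1, create branch "trunk"
In r10, create branch "branches/foo" as "foo" from "trunk" r9
In r12, create tag "tags/version1" as "version1" from "trunk" r11
In r12, deactivate "tags/version1"
# blank lines and lines beginning with a '#' are ignored
In r15, merge "branches/foo" r14 into "trunk"
In r15, delete "branches/foo"
'''

# One pattern per line shape seen in the sample (an assumption, not a spec).
PATTERNS = [
    ("create", re.compile(
        r'^In r(?P<rev>\d+), create (?P<kind>branch|tag) "(?P<path>[^"]+)"'
        r'(?: as "(?P<name>[^"]+)")?'
        r'(?: from "(?P<source>[^"]+)" r(?P<source_rev>\d+))?$')),
    ("deactivate", re.compile(
        r'^In r(?P<rev>\d+), deactivate "(?P<path>[^"]+)"$')),
    ("merge", re.compile(
        r'^In r(?P<rev>\d+), merge "(?P<source>[^"]+)" r(?P<source_rev>\d+)'
        r' into "(?P<path>[^"]+)"$')),
    ("delete", re.compile(
        r'^In r(?P<rev>\d+), delete "(?P<path>[^"]+)"$')),
]


def parse_history(text):
    """Turn history-format text into a list of action dictionaries."""
    actions = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no action
        for action, pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                fields = {k: v for k, v in match.groupdict().items()
                          if v is not None}
                for key in ("rev", "source_rev"):
                    if key in fields:
                        fields[key] = int(fields[key])
                actions.append({"action": action, **fields})
                break
        else:
            raise ValueError(f"line {number}: unrecognised entry: {line!r}")
    return actions


if __name__ == "__main__":
    for entry in parse_history(SAMPLE):
        print(entry)
```

Rejecting unrecognised lines outright, rather than skipping them, seems
the safer choice for a file that is expected to be edited by hand.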


Once the format is defined, git import is fairly straightforward.
Proof-of-concept code to follow, but it's really just a wrapper around
git-commit-tree, git-mktag etc.  I wrote this in Perl thinking it would
relate somehow to git-svn, but eventually realised it didn't and that a
few hundred calls to (plumbing) processes per second isn't so good for
performance.  The only interesting part of the problem is how to tackle
SVN tags.  I went for an ambitious approach, making normal tags where
possible and downgrading them to lightweight tags when necessary.  This
does involve managing something that is effectively a branch in
refs/tags/, but what else is an SVN tag but a branch in the wrong namespace?
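
To show what "a wrapper around git-commit-tree" amounts to, here is a
minimal Python sketch (the proof of concept itself is Perl and is not
reproduced here).  It builds the command line for one commit and, in a
second function, runs it.  The function names and the repo parameter are
assumptions for this example; the options used (a tree, repeated -p for
parents, -m for the message) are standard git-commit-tree usage.

```python
import subprocess


def commit_tree_argv(tree, parents, message):
    """Build the argv for `git commit-tree` without running it.

    A merge such as 'In r15, merge "branches/foo" r14 into "trunk"'
    simply becomes a commit with two -p parents.
    """
    argv = ["git", "commit-tree", tree]
    for parent in parents:
        argv += ["-p", parent]
    argv += ["-m", message]
    return argv


def run_commit_tree(tree, parents, message, repo="."):
    """Run the command inside `repo` and return the new commit id.

    Every call spawns a fresh process, which is the per-commit cost
    described above.
    """
    result = subprocess.run(
        commit_tree_argv(tree, parents, message),
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Placeholder ids; a real importer would pass object names here.
    print(commit_tree_argv("TREE_ID", ["TRUNK_TIP", "FOO_R14"],
                           "Merge branches/foo r14 into trunk"))
```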

I'm afraid I don't know enough about SoC to say whether rewriting git
import in C would make a good project - on the one hand, it's a smallish
bit of work that would make a student learn a lot of good git code, on
the other hand it would mostly just involve fixing up a bit of ugly Perl
with little chance to show any creativity.


SVN export is much more complicated, and has taken most of my time.
Although there's no way to tell automatically which directories are
branches, I realised you can detect trunks automatically about 90% of
the time by looking for where files/directories are first created, and
can detect non-trunk branches at least 99% of the time once you know the
trunks.  Trunk detection is normally just a convenience, but can be a
lifesaver when importing from a sufficiently messy repository.  There's
a lot you can do with merge detection, but between svn:mergeinfo,
svnmerge.py[4], svk[5] and log messages I realise I've only scratched
the surface.  In many ways, this is a classic Perl problem - take a
bunch of messy textual input, string it all together and make some neat
textual output.  The code I've got right now seems fine for anything up
to about 20,000 commits, but eats way too much memory for huge repos.
Depending on how much time I get, I might have tackled that problem by
the time I put this code online.
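
For a flavour of the trunk-then-branches idea, here is a deliberately
naive Python sketch.  It is not the exporter's actual heuristic, which
is not described in detail here.  It assumes the input is a list of
directory additions in revision order, each with an optional copy
source, and it guesses that a trunk is a directory that was created from
scratch and later used as a copy source, while a branch (or tag) is any
copy of an already-known trunk or branch.  All names are invented for
this example.

```python
def guess_trunks_and_branches(events):
    """Guess trunks and branches from directory-add events.

    `events` is an iterable of (rev, path, copyfrom) tuples in revision
    order, where copyfrom is None for a directory created from scratch.
    Returns (trunks, branches): a set of trunk paths and a dict mapping
    each branch path to (source path, revision of the copy).
    """
    created_fresh = set()
    copies = []
    for rev, path, copyfrom in events:
        if copyfrom is None:
            created_fresh.add(path)
        else:
            copies.append((rev, path, copyfrom))

    # Assumption: content that starts life without a copy and is later
    # copied elsewhere is a trunk.
    trunks = {src for _, _, src in copies if src in created_fresh}

    # Once trunks are known, anything copied from a known line of
    # development is treated as a branch (tags included).
    branches = {}
    known = set(trunks)
    for rev, dest, src in copies:
        if src in known:
            branches[dest] = (src, rev)
            known.add(dest)
    return trunks, branches


if __name__ == "__main__":
    events = [
        (1, "trunk", None),
        (1, "trunk/src", None),
        (2, "branches", None),
        (10, "branches/foo", "trunk"),
        (12, "tags/version1", "trunk"),
        (20, "branches/bar", "branches/foo"),
    ]
    print(guess_trunks_and_branches(events))
```

A sketch like this misses trunks that are never copied and is fooled by
copies of subdirectories, which is the kind of mess that makes manual
review of the output necessary.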

Assuming I have enough time to work on SVN export, I would be loath to
put it on anyone else in the near future.  I could see endless
optimisations being spun off once the code is mature, but right now it's
an idiosyncratic collection of untested mostly-working bits that need
serious attention before I'd be happy warping a young mind on it.


I hope this information dump explains where I am and how I can help.  As
I say, I can't yet promise how much (if any) time I'll get to work on
this in future, but I hope to help with the history translation process
if circumstances allow.  More importantly to this thread, I hope this
gives some ideas about SoC projects and places (not) to put attention.

	- Andrew Sayers

[1] http://comments.gmane.org/gmane.comp.version-control.git/158940
[2] https://github.com/rcaputo/snerp-vortex
[3] http://repo.or.cz/w/svn-merge2git.git
[4] http://www.orcaware.com/svn/wiki/Svnmerge.py
[5] http://search.cpan.org/dist/SVK/
