From: Jeff King <peff@peff.net>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: paul.mattke@s4m.com, git@vger.kernel.org
Subject: Re: Transform log message during migration svn -> git (using git-svn)
Date: Tue, 20 Jun 2017 10:51:01 -0400
Message-ID: <20170620145101.6mjikvmxegw5pbwj@sigill.intra.peff.net>
In-Reply-To: <BDA6D349-0DF9-49F3-B2B6-0B1AE3BBD052@gmail.com>

On Tue, Jun 20, 2017 at 02:46:22PM +0200, Lars Schneider wrote:
>
> > On 20 Jun 2017, at 14:32, <paul.mattke@s4m.com> <paul.mattke@s4m.com> wrote:
> >
> > Well this is a possibility, of course. Our problem is that our SVN
> > repository contains about 220.000 revisions currently. As a colleague of
> > mine said that the command you suggest might take about 4 seconds per
> > revision, it would take about 10 days to do this for our whole repository.
> > So of course it could save a lot of time generally if such operation could
> > be done immediately during git-svn.
>
> Your colleague is most likely correct. I suggested it as this is a one time
> operation and therefore still somewhat practical from my point of view.

I didn't follow this whole thread, but I happened to see this bit. I
think the command in question is:

  git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'

I know filter-branch is slow, but a msg-filter should be relatively
fast. I'd be surprised at 4 seconds per revision (the main cost is
kicking off a new perl process per revision). It's more like 120/sec on
my machine.
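
For scale, here is a back-of-the-envelope sketch of what those two rates
mean for 220,000 revisions. It is plain integer shell arithmetic, not a
measurement; the variable names are only for illustration, and the rates
are the ones quoted in this thread.

```shell
# Rough run-time estimates for the whole history (integer arithmetic).
revs=220000

# total seconds at 4 seconds per revision
slow_secs=$((revs * 4))
# total seconds at 120 revisions per second
fast_secs=$((revs / 120))

echo "4 s/rev:   about $((slow_secs / 86400)) days"
echo "120 rev/s: about $((fast_secs / 60)) minutes"
```

So the difference between the two estimates is roughly days versus half
an hour.
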

However, I think the fastest way would be to do it with fast-export,
where you can just tweak the stream as it flows through:

  # set up a new repo to hold the results; we won't bother
  # copying the blobs, so just point at the current repo as an
  # alternate.
  git init fixed-repo
  echo "../../../.git/objects" >fixed-repo/.git/objects/info/alternates

  git fast-export --no-data --all |
  perl -ne '
    # look for "data" chunks which contain the commit message
    if (/^data (\d+)/) {
      read STDIN, my $buf, $1;
      $buf =~ s/^T(\d+)/#$1/;
      print "data ", length($buf), "\n";
      print $buf;
    } else {
      print;
    }
  ' |
  git -C fixed-repo fast-import

That runs at about 3600 commits/sec on my machine.

Most of that time goes to doing a tree diff on each commit. Technically
that is not required for your use case, but I don't think there's a way
to get fast-export to skip that (and it's an inherent part of the
fast-import stream). It's probably fast enough, but it's possible that
a specialized tool like BFG repo cleaner[1] could do better (I don't
know offhand if it handles commit message rewrites or not).

-Peff

[1] https://rtyley.github.io/bfg-repo-cleaner/