git@vger.kernel.org mailing list mirror (one of many)
* Transform log message during migration svn -> git (using git-svn)
@ 2017-06-20  7:32 paul.mattke
  2017-06-20  9:32 ` Lars Schneider
  0 siblings, 1 reply; 6+ messages in thread
From: paul.mattke @ 2017-06-20  7:32 UTC (permalink / raw)
  To: git


Hi there,

this is not really a bug report but rather a feature request
(unless I have overlooked an existing feature like this):

We want to migrate our SVN repository to Git and will of course be using
git-svn for that. Currently in SVN, all our commit log messages start with
either:

123456 (a number, representing the Bug Id in our old legacy bug tracker)

or

T123456 (a number, but prefixed with T, referring to a TFS item in this case)

During conversion to Git, we want to replace the T in such log messages with
a #, so commits referring to a TFS item will start with #123456 in the future.
We don’t care about log messages which do not start with a T; only the
TXXXXXX messages need to be transformed here.

I guess an operation like this is currently not possible with git-svn, is
it? So it would be nice if a feature could be implemented that lets the
user specify some kind of script file, for example, which transforms the
log message in any way we want.

Paul Mattke
Software Developer
-------------------------------------------------
Arvato Systems S4M GmbH
Am Coloneum 3
50829 Köln
 
Phone: +49 221 28555-443
Fax: +49 221 28555-210
E-Mail: paul.mattke@s4m.com
www.s4m.arvato-systems.com



* Re: Transform log message during migration svn -> git (using git-svn)
  2017-06-20  7:32 Transform log message during migration svn -> git (using git-svn) paul.mattke
@ 2017-06-20  9:32 ` Lars Schneider
  2017-06-20 12:32   ` AW: " paul.mattke
  0 siblings, 1 reply; 6+ messages in thread
From: Lars Schneider @ 2017-06-20  9:32 UTC (permalink / raw)
  To: paul.mattke; +Cc: git


> On 20 Jun 2017, at 09:32, paul.mattke@s4m.com wrote:
> 
> Hi there,
> 
> this is not really a bug report but rather a feature request
> (unless I have overlooked an existing feature like this):
> 
> We want to migrate our SVN repository to Git and will of course be using
> git-svn for that. Currently in SVN, all our commit log messages start with
> either:
> 
> 123456 (a number, representing the Bug Id in our old legacy bug tracker)
> 
> or
> 
> T123456 (a number, but prefixed with T, referring to a TFS item in this case)
> 
> During conversion to Git, we want to replace the T in such log messages with
> a #, so commits referring to a TFS item will start with #123456 in the future.
> We don’t care about log messages which do not start with a T; only the
> TXXXXXX messages need to be transformed here.
> 
> I guess an operation like this is currently not possible with git-svn, is
> it? So it would be nice if a feature could be implemented that lets the
> user specify some kind of script file, for example, which transforms the
> log message in any way we want.

You can migrate your repo from SVN to Git as is. Afterwards you can
fix up the commit messages with the following command:

git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'

(this might take a while on a large repo)

- Lars


* AW: Transform log message during migration svn -> git (using git-svn)
  2017-06-20  9:32 ` Lars Schneider
@ 2017-06-20 12:32   ` paul.mattke
  2017-06-20 12:46     ` Lars Schneider
  2017-06-20 18:50     ` Andreas Heiduk
  0 siblings, 2 replies; 6+ messages in thread
From: paul.mattke @ 2017-06-20 12:32 UTC (permalink / raw)
  To: larsxschneider; +Cc: git


Well, this is a possibility, of course. Our problem is that our SVN
repository currently contains about 220,000 revisions. A colleague of
mine estimated that the command you suggest might take about 4 seconds per
revision, so it would take about 10 days to run over our whole repository.
It would therefore save a lot of time if such an operation could be done
directly during the git-svn import.

Paul Mattke
Software Developer
-------------------------------------------------
Arvato Systems S4M GmbH
Am Coloneum 3
50829 Köln
 
Phone: +49 221 28555-443
Fax: +49 221 28555-210
E-Mail: paul.mattke@s4m.com
www.s4m.arvato-systems.com



* Re: Transform log message during migration svn -> git (using git-svn)
  2017-06-20 12:32   ` AW: " paul.mattke
@ 2017-06-20 12:46     ` Lars Schneider
  2017-06-20 14:51       ` Jeff King
  2017-06-20 18:50     ` Andreas Heiduk
  1 sibling, 1 reply; 6+ messages in thread
From: Lars Schneider @ 2017-06-20 12:46 UTC (permalink / raw)
  To: paul.mattke; +Cc: git


> On 20 Jun 2017, at 14:32, <paul.mattke@s4m.com> <paul.mattke@s4m.com> wrote:
> 
> Well, this is a possibility, of course. Our problem is that our SVN
> repository currently contains about 220,000 revisions. A colleague of
> mine estimated that the command you suggest might take about 4 seconds per
> revision, so it would take about 10 days to run over our whole repository.
> It would therefore save a lot of time if such an operation could be done
> directly during the git-svn import.

Your colleague is most likely correct. I suggested it because this is a
one-time operation and therefore still somewhat practical from my point of view.

If you don't like the solution then you need to change the git-svn code.
Probably here somewhere (I am not familiar with this code):
https://github.com/git/git/blob/master/git-svn.perl#L1836

- Lars

PS: Please don't top post on this mailing list :-)
https://en.wikipedia.org/wiki/Posting_style#Top-posting




* Re: Transform log message during migration svn -> git (using git-svn)
  2017-06-20 12:46     ` Lars Schneider
@ 2017-06-20 14:51       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-06-20 14:51 UTC (permalink / raw)
  To: Lars Schneider; +Cc: paul.mattke, git

On Tue, Jun 20, 2017 at 02:46:22PM +0200, Lars Schneider wrote:

> 
> > On 20 Jun 2017, at 14:32, <paul.mattke@s4m.com> <paul.mattke@s4m.com> wrote:
> > 
> > Well, this is a possibility, of course. Our problem is that our SVN
> > repository currently contains about 220,000 revisions. A colleague of
> > mine estimated that the command you suggest might take about 4 seconds per
> > revision, so it would take about 10 days to run over our whole repository.
> > It would therefore save a lot of time if such an operation could be done
> > directly during the git-svn import.
> 
> Your colleague is most likely correct. I suggested it because this is a
> one-time operation and therefore still somewhat practical from my point of view.

I didn't follow this whole thread, but I happened to see this bit. I
think the command in question is:

  git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'

I know filter-branch is slow, but a msg-filter should be relatively
fast.  I'd be surprised at 4 seconds per revision (the main cost is
kicking off a new perl process per revision). It's more like 120/sec on
my machine.
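
To put the two figures side by side, here is a rough back-of-the-envelope
calculation in shell integer arithmetic. It only uses the numbers quoted in
this thread (about 220,000 revisions, 4 seconds per revision, and 120
revisions per second); the variable names are made up, and real throughput
will vary by machine.

```shell
revisions=220000

# Estimate quoted above: 4 seconds per revision.
slow_seconds=$((revisions * 4))
slow_days=$((slow_seconds / 86400))      # 86400 seconds per day

# Rate reported above: about 120 revisions per second.
fast_seconds=$((revisions / 120))
fast_minutes=$((fast_seconds / 60))

echo "at 4 s/revision:    ${slow_seconds} s, about ${slow_days} days"
echo "at 120 revisions/s: ${fast_seconds} s, about ${fast_minutes} minutes"
```

So the same rewrite would be on the order of half an hour rather than ten
days if the per-revision cost is closer to the second figure.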

However, I think the fastest way would be to do it with fast-export,
where you can just tweak the stream as it flows through:

  # set up a new repo to hold the results; we won't bother
  # copying the blobs, so just point at the current repo as an
  # alternate.
  git init fixed-repo
  echo "../../../.git/objects" >fixed-repo/.git/objects/info/alternates

  git fast-export --no-data --all |
  perl -ne '
	# look for "data" chunks which contain the commit message
	if (/^data (\d+)/) {
		read STDIN, my $buf, $1;
		$buf =~ s/^T(\d+)/#$1/;
		print "data ", length($buf), "\n";
		print $buf;
	} else {
		print;
	}
  ' |
  git -C fixed-repo fast-import

That runs at about 3600 commits/sec on my machine.

Most of that time goes to doing a tree diff on each commit. Technically
that is not required for your use case, but I don't think there's a way
to get fast-export to skip that (and it's an inherent part of the
fast-import stream). It's probably fast enough, but it's possible that
a specialized tool like BFG repo cleaner[1] could do better (I don't
know offhand if it handles commit message rewrites or not).

-Peff

[1] https://rtyley.github.io/bfg-repo-cleaner/


* Re: Transform log message during migration svn -> git (using git-svn)
  2017-06-20 12:32   ` AW: " paul.mattke
  2017-06-20 12:46     ` Lars Schneider
@ 2017-06-20 18:50     ` Andreas Heiduk
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Heiduk @ 2017-06-20 18:50 UTC (permalink / raw)
  To: paul.mattke, larsxschneider; +Cc: git


On 20.06.2017 at 14:32, paul.mattke@s4m.com wrote:
> Well, this is a possibility, of course. Our problem is that our SVN
> repository currently contains about 220,000 revisions. A colleague of
> mine estimated that the command you suggest might take about 4 seconds per
> revision, so it would take about 10 days to run over our whole repository.
> It would therefore save a lot of time if such an operation could be done
> directly during the git-svn import.

My data point is this: a "git filter-branch" run with ~2000 revisions
took several hours on a Windows 7 box, which is roughly the same
per-revision cost as your number. A comparable run on a Linux box took
only about 10 minutes.

So if your benchmark was done on Windows, you might want to repeat it on Linux.
