git@vger.kernel.org mailing list mirror (one of many)
* Merging (joining/stiching/rewriting) history of "unrelated" git repositories
@ 2019-05-15 14:52 Piotr Krukowiecki
  2019-05-15 15:25 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 8+ messages in thread
From: Piotr Krukowiecki @ 2019-05-15 14:52 UTC (permalink / raw)
  To: git

Hello,

I'm migrating two repositories from svn. I already did svn->git
migration (git-svn clone) and now have two git repositories.

I would like to merge them into one git repository, and to merge their
history as well - branches and tags.

The reason is that the svn repositories in fact represent one
"project" - you had to download both of them, they are not useful
separately. Tags were applied to both repositories, and the list of
branches is almost identical for both.

So right now I have:

    - projectA:
       master: r1, r4, r5, r7
       branch1: r10, r11, r13
    - projectB:
       master: r2, r3, r6
       branch1: r12, r14

The content of projectA and projectB is different (let's say projectA
is in subfolder A and projectB is in subfolder B). So revisions on
projectA branches have only A folder, and revisions on projectB
branches have only B folder.

But I would like to have:

    - projectAB:
       master: r1', r2', r3', r4', r5', r6', r7'
       branch1: r10', r11', r12', r13', r14'

Where all revisions have content from both projects. For example, the
r5' should have the "A" folder content the same as r5, but also should
have "B" folder content the same as in r3 (because r3 was the last
commit to projectB (date-wise) before commit r5 to projectA).
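
The interleaving described above can be sketched in Python. This is only
a minimal model, assuming each branch is linear and every commit carries
a comparable date (here the svn revision number stands in for the date).
The `Commit` tuple and the `weave` function are hypothetical names used
for illustration, not part of any existing tool.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    name: str     # original revision name, e.g. "r5"
    project: str  # "A" or "B"
    date: int     # stand-in for the commit date


def weave(a_commits, b_commits):
    """Interleave two linear histories by date.

    Returns (new_name, a_state, b_state) tuples, where a_state and b_state
    name the original commit whose tree fills folder A and folder B.
    A state of None means that project has no commit yet at that point.
    """
    last = {"A": None, "B": None}
    woven = []
    for commit in sorted(a_commits + b_commits, key=lambda c: c.date):
        last[commit.project] = commit.name
        woven.append((commit.name + "'", last["A"], last["B"]))
    return woven


master_a = [Commit(f"r{n}", "A", n) for n in (1, 4, 5, 7)]
master_b = [Commit(f"r{n}", "B", n) for n in (2, 3, 6)]

for new_name, a_state, b_state in weave(master_a, master_b):
    print(new_name, "takes A from", a_state, "and B from", b_state)
```

The same function would have to be run once per branch name, and it does
nothing about merges.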

There's additional difficulty of handling merges...


Any suggestions on what's the best way to do it?


Currently I'm testing the join-git-repos.py script
(https://github.com/mbitsnbites/git-tools/blob/master/join-git-repos.py)
but it's slow, memory inefficient and handles the "master" branch only...


Thanks,

-- 
Piotr Krukowiecki


* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
  2019-05-15 14:52 Merging (joining/stiching/rewriting) history of "unrelated" git repositories Piotr Krukowiecki
@ 2019-05-15 15:25 ` Ævar Arnfjörð Bjarmason
  2019-05-15 20:33   ` Elijah Newren
  2019-05-16  6:10   ` Piotr Krukowiecki
  0 siblings, 2 replies; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-15 15:25 UTC (permalink / raw)
  To: Piotr Krukowiecki; +Cc: git


On Wed, May 15 2019, Piotr Krukowiecki wrote:

[...]

You might be able to use https://github.com/newren/git-filter-repo

But I'd say try something even more stupid first:

 1. Migrate repo A to Git
 2. Migrate repo B to Git
 3. "git subtree add" B's history to A
 4. "git rebase" the history to linear-ize it

At this point you'll have A's history first, then B. Then run some
script to date order the commits, and just "git cherry-pick" those in
the order desired in a loop to a fresh history.

Maybe that sort of stupidity will wreck your merges etc., so you might
need less stupid methods :)
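
The "date order the commits, then cherry-pick in a loop" step could look
roughly like the Python sketch below. It assumes the input is text with
one "<unix-timestamp> <hash>" pair per line, such as what
`git log --format='%ct %H'` prints. The function names and the sample
hashes are made up for illustration, and the commands are only printed,
not executed.

```python
def date_ordered_hashes(log_output):
    """Parse '<unix-timestamp> <hash>' lines and return hashes, oldest first."""
    entries = []
    for line in log_output.splitlines():
        line = line.strip()
        if not line:
            continue
        timestamp, commit_hash = line.split()
        entries.append((int(timestamp), commit_hash))
    # sort() is stable, so commits with equal timestamps keep their input order
    entries.sort(key=lambda entry: entry[0])
    return [commit_hash for _, commit_hash in entries]


def cherry_pick_commands(hashes):
    """Build one 'git cherry-pick' argument list per commit."""
    return [["git", "cherry-pick", commit_hash] for commit_hash in hashes]


# Hypothetical sample input; real input would come from the linearized history.
sample_log = """\
1557900000 aaa111
1557903600 bbb222
1557901800 ccc333
"""

for command in cherry_pick_commands(date_ordered_hashes(sample_log)):
    print(" ".join(command))
```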


* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
  2019-05-15 15:25 ` Ævar Arnfjörð Bjarmason
@ 2019-05-15 20:33   ` Elijah Newren
  2019-05-16  6:38     ` Piotr Krukowiecki
  2019-05-20 13:54     ` Jakub Narebski
  2019-05-16  6:10   ` Piotr Krukowiecki
  1 sibling, 2 replies; 8+ messages in thread
From: Elijah Newren @ 2019-05-15 20:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Piotr Krukowiecki, Git Mailing List

On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, May 15 2019, Piotr Krukowiecki wrote:
>
[...]
>
> You might be able to use https://github.com/newren/git-filter-repo

Splicing repos is an interesting case, but unless the history is
linear and the branch/tag names exactly match and you are simply
weaving commits together based on timestamp within the same
branch/tag, then I don't know what algorithm should be used to weave
them together.  There are lots of choices, and "correct" may be very
usecase-specific.

That said, filter-repo was designed to be usable as a library and has
a few simple examples of such usage, including one of splicing some
trivial repos together.  (See
https://github.com/newren/git-filter-repo/blob/master/t/t9391/splice_repos.py
and https://github.com/newren/git-filter-repo/blob/master/t/t9391-filter-repo-lib-usage.sh#L90-L121)
As noted there, fast-export's handling of merges (it diffs against the
first parent) makes splicing commits into the second (or third) parent
history of a merge problematic, as new files introduced in such
locations would by default appear to get deleted by the merge unless
additional work is done to also insert the files there.  My example was
meant as a simple testcase that should be easy to inspect by others, so
it just worked with very short linear histories.
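
The first-parent problem can be illustrated with a toy model in Python.
This is not filter-repo or fast-export code; it only assumes that a
merge commit is stored as a set of changes relative to its first parent.
The trees, paths and the `apply_changes` helper are hypothetical.

```python
def apply_changes(tree, changes):
    """Return a new tree (path -> content) with changes applied.

    A content of None means the path is deleted.
    """
    result = dict(tree)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


# Tree of the merge's first parent: no project B commit was spliced in here.
first_parent = {"A/main.c": "v2"}

# Tree of the second parent after a project B commit was spliced into
# that side of the history.
second_parent = {"A/main.c": "v1", "A/util.c": "v1", "B/lib.c": "v1"}

# The merge as originally exported: changes relative to the first parent
# only. It was written before B/lib.c existed on either side.
merge_changes = {"A/util.c": "v1"}

merged = apply_changes(first_parent, merge_changes)
lost = sorted(set(second_parent) - set(merged))
print("present in second parent but missing after the merge:", lost)
```

Seen from the second parent, the merge appears to delete B/lib.c, so the
spliced files would also have to be added to the merge's own changes.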

Somewhat interestingly, a search on others having tried to solve this
same problem turned up
https://github.com/j5int/jbosstools-gitmigration, which apparently is
based on git_fast_filter, which is the predecessor of filter-repo.
Perhaps that tool would be useful to you as-is, though they apparently
do ignore merges.

If folks have a good idea for a weaving algorithm that appears
generally useful rather than usecase-specific, then I may be
interested in coding it up as a more general example of using
filter-repo as a library.  But every time I've thought about it before
it just sounded too hairy and too usecase specific so I've just punted
on it.

> But I'd say try something even more stupid first:


* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
  2019-05-15 15:25 ` Ævar Arnfjörð Bjarmason
  2019-05-15 20:33   ` Elijah Newren
@ 2019-05-16  6:10   ` Piotr Krukowiecki
  1 sibling, 0 replies; 8+ messages in thread
From: Piotr Krukowiecki @ 2019-05-16  6:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Wed, May 15, 2019 at 5:25 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Wed, May 15 2019, Piotr Krukowiecki wrote:
>
> > Hello,
> >
> > I'm migrating two repositories from svn. I already did svn->git
> > migration (git-svn clone) and now have two git repositories.
> >
> > I would like to merge them into 1 git repository, but to merge also
> > history - branches and tags.
> >
[...]
> > There's additional difficulty of handling merges...
> >
> You might be able to use https://github.com/newren/git-filter-repo
>
> But I'd say try something even more stupid first:
>
>  1. Migrate repo A to Git
>  2. Migrate repo B to Git
>  3. "git subtree add" B's history to A
>  4. "git rebase" the history to linear-ize it
>
> At this point you'll have A's history first, then B. Then run some
> script to date order the commits, and just "git cherry-pick" those in
> the order desired in a loop to a fresh history.
>
> Maybe that sort of stupidity will wreck your merges etc., so you might
> need less stupid methods :)

I think neither git-filter-repo nor subtree+rebase handles
branches/merges well :(


-- 
Piotr Krukowiecki


* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
  2019-05-15 20:33   ` Elijah Newren
@ 2019-05-16  6:38     ` Piotr Krukowiecki
  2019-05-17 13:08       ` Piotr Krukowiecki
  2019-05-20 13:54     ` Jakub Narebski
  1 sibling, 1 reply; 8+ messages in thread
From: Piotr Krukowiecki @ 2019-05-16  6:38 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Wed, May 15, 2019 at 10:34 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> > On Wed, May 15 2019, Piotr Krukowiecki wrote:
> >
[...]
> >
> > You might be able to use https://github.com/newren/git-filter-repo
>
> Splicing repos is an interesting case, but unless the history is
> linear and the branch/tag names exactly match and you are simply
> weaving commits together based on timestamp within the same
> branch/tag, then I don't know what algorithm should be used to weave
> them together.  There are lots of choices, and "correct" may be very
> usecase-specific.
>
> That said, filter-repo was designed to be usable as a library and has
> a few simple examples of such usage, including one of splicing some
> trivial repos together.  (See
> https://github.com/newren/git-filter-repo/blob/master/t/t9391/splice_repos.py
> and https://github.com/newren/git-filter-repo/blob/master/t/t9391-filter-repo-lib-usage.sh#L90-L121)
>  As noted there, fast-export's diff against first parent handling
> makes splicing commits into the second (or third) parent history of a
> merge problematic as new files introduced in such locations would by
> default appear to get deleted by the merge unless additional work is
> done to also insert the files there.  My example was meant as a simple
> testcase that should be easy to inspect by others, so it just worked
> with very short linear histories.
>
> Somewhat interestingly, a search on others having tried to solve this
> same problem turned up
> https://github.com/j5int/jbosstools-gitmigration, which apparently is
> based on git_fast_filter, which is the predecessor of filter-repo.
> Perhaps that tool would be useful to you as-is, though they apparently
> do ignore merges.
>
> If folks have a good idea for a weaving algorithm that appears
> generally useful rather than usecase-specific, then I may be
> interested in coding it up as a more general example of using
> filter-repo as a library.  But every time I've thought about it before
> it just sounded too hairy and too usecase specific so I've just punted
> on it.

At first I thought that joining history with branch/merge support
should be simple, but in fact it is not. At least not for a git repo.
Now I think it is impossible.

It should be possible for svn, or for a git repo migrated from svn which
still has the git-svn-id string as part of the commit message, because
then you know which branch any commit belongs to.

For example:

projectA:
- ra1 - ra3 - ra6 - ra9 * ra11       (master)
      |                 |
      - ra4 - ra7 - ra8 -            (branchX)

projectB:
- rb2 - rb5 - rb10 -                 (master)


Merged project AB should look like this:

project AB:
- ra1 - rb2 - ra3 - rb5 - ra6 - ra9 - rb10 * ra11       (master)
            |                              |
            - ra4 - ra7 - ra8 --------------            (branchX)

Because you know that rb5 was on branch "master" (trunk) then you know
it should be applied to the same branch in projectA, but not to branch
"branchX". This information is lost in git:

projectA (git):
- ra1 - ra3 - ra6 - ra9 * ra11       (master)
      |                 |
      - ra4 - ra7 - ra8 -

You do not know which parent of ra11 represented the main "master"
branch. Should rb5 be added between ra4 and ra7, or between ra3
and ra6?
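
As a rough sketch of how the branch could be recovered when the
git-svn-id line is still present, the Python below parses that line out
of a commit message. It assumes the usual "git-svn-id: <URL>@<revision>
<UUID>" layout; the repository URL, the UUID and the `svn_origin`
function name are invented for the example.

```python
import re

# Matches a trailer line of the form: git-svn-id: <URL>@<revision> <UUID>
GIT_SVN_ID = re.compile(
    r"^git-svn-id: (?P<url>\S+)@(?P<rev>\d+) (?P<uuid>\S+)$",
    re.MULTILINE,
)


def svn_origin(message, repo_root):
    """Return (branch_path, revision) for a commit message, or None.

    repo_root is the svn URL of the project; whatever follows it in the
    git-svn-id URL is taken to be the branch path (e.g. "trunk").
    """
    match = GIT_SVN_ID.search(message)
    if match is None:
        return None
    url = match.group("url")
    if not url.startswith(repo_root):
        return None
    branch_path = url[len(repo_root):].strip("/")
    return branch_path, int(match.group("rev"))


# Hypothetical commit message as git-svn might have written it.
message = (
    "Fix build\n"
    "\n"
    "git-svn-id: https://svn.example.org/projectA/branches/branchX@7 "
    "0123abcd-0000-0000-0000-000000000000\n"
)

print(svn_origin(message, "https://svn.example.org/projectA"))
```

With the branch path known for every commit of both projects, commits
could be grouped per branch before weaving them by date.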


Does it make sense?


Maybe I should look for a way to first prepare a "merged" svn repo
projectAB. Maybe there's a tool which can do it. And then migrate it
to git.

-- 
Piotr Krukowiecki


* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
  2019-05-16  6:38     ` Piotr Krukowiecki
@ 2019-05-17 13:08       ` Piotr Krukowiecki
  0 siblings, 0 replies; 8+ messages in thread
From: Piotr Krukowiecki @ 2019-05-17 13:08 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Thu, May 16, 2019 at 8:38 AM Piotr Krukowiecki
<piotr.krukowiecki@gmail.com> wrote:
>
> On Wed, May 15, 2019 at 10:34 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> > >
> > > On Wed, May 15 2019, Piotr Krukowiecki wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm migrating two repositories from svn. I already did svn->git
> > > > migration (git-svn clone) and now have two git repositories.
> > > >
> > > > I would like to merge them into 1 git repository, but to merge also
> > > > history - branches and tags.
[...]
> > > You might be able to use https://github.com/newren/git-filter-repo
> >
> > Splicing repos is an interesting case, but unless the history is
> > linear and the branch/tag names exactly match and you are simply
> > weaving commits together based on timestamp within the same
> > branch/tag, then I don't know what algorithm should be used to weave
> > them together.  There are lots of choices, and "correct" may be very
> > usecase-specific.
> >
> > That said, filter-repo was designed to be usable as a library and has
> > a few simple examples of such usage, including one of splicing some
> > trivial repos together.  (See
> > https://github.com/newren/git-filter-repo/blob/master/t/t9391/splice_repos.py
> > and https://github.com/newren/git-filter-repo/blob/master/t/t9391-filter-repo-lib-usage.sh#L90-L121)

I'll try writing a script using filter-repo. I looked at
splice_repos.py and the fast-export/fast-import format and it looks
promising / relatively simple.


> > Somewhat interestingly, a search on others having tried to solve this
> > same problem turned up
> > https://github.com/j5int/jbosstools-gitmigration, which apparently is
> > based on git_fast_filter, which is the predecessor of filter-repo.
> > Perhaps that tool would be useful to you as-is, though they apparently
> > do ignore merges.

I tried using it but it didn't work. I don't remember what the problem was, though.


> Maybe I should look for a way to first prepare a "merged" svn repo
> projectAB. Maybe there's a tool which can do it. And then migrate it
> to git.

I tried this: I used svnadmin dump + svndumpfilter + svnadmin load to
remodel the svn repository and "splice" the history (by rewriting the
2nd project's paths so that they appear to be added under the 1st
project's path). But it didn't work. The resulting svn repository had
incorrect history (from my point of view).

I also looked at reposurgeon. Maybe it could do the splicing, but I
gave up on learning it; the documentation isn't very helpful and there
aren't enough examples on the internet...


-- 
Piotr Krukowiecki


* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
  2019-05-15 20:33   ` Elijah Newren
  2019-05-16  6:38     ` Piotr Krukowiecki
@ 2019-05-20 13:54     ` Jakub Narebski
  2019-05-21  7:53       ` Piotr Krukowiecki
  1 sibling, 1 reply; 8+ messages in thread
From: Jakub Narebski @ 2019-05-20 13:54 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ævar Arnfjörð Bjarmason, Piotr Krukowiecki,
	Git Mailing List

Elijah Newren <newren@gmail.com> writes:
> On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Wed, May 15 2019, Piotr Krukowiecki wrote:
>>>
[...]
>> You might be able to use https://github.com/newren/git-filter-repo
[...]

> Somewhat interestingly, a search on others having tried to solve this
> same problem turned up
> https://github.com/j5int/jbosstools-gitmigration, which apparently is
> based on git_fast_filter, which is the predecessor of filter-repo.
> Perhaps that tool would be useful to you as-is, though they apparently
> do ignore merges.

There is also the reposurgeon tool; though its main purported purpose is
to aid migrating from one version control system to another, it can also
be used to edit repositories (utilizing a fast-import stream).

  https://gitlab.com/esr/reposurgeon
  http://www.catb.org/~esr/reposurgeon/

Hope that helps,
--
Jakub Narębski


* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
  2019-05-20 13:54     ` Jakub Narebski
@ 2019-05-21  7:53       ` Piotr Krukowiecki
  0 siblings, 0 replies; 8+ messages in thread
From: Piotr Krukowiecki @ 2019-05-21  7:53 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Elijah Newren, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Mon, May 20, 2019 at 3:54 PM Jakub Narebski <jnareb@gmail.com> wrote:
> There is also reposurgeon tool; though its main purported purpose is to
> aid migrating from one version control system to another, it can also be
> used to edit repositories (utilizing fast-import stream).
>
>   https://gitlab.com/esr/reposurgeon
>   http://www.catb.org/~esr/reposurgeon/

If only there was real documentation for it...


[resending in plain text...]

-- 
Piotr Krukowiecki

