* Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Piotr Krukowiecki @ 2019-05-15 14:52 UTC
To: git

Hello,

I'm migrating two repositories from svn. I have already done the
svn->git migration (git-svn clone) and now have two git repositories.

I would like to merge them into one git repository, and also merge
their history: branches and tags.

The reason is that the svn repositories in fact represent one
"project": you had to download both of them, and they are not useful
separately. Tags were applied to both repositories, and the list of
branches is almost identical for both.

So right now I have:

- projectA:
    master:  r1, r4, r5, r7
    branch1: r10, r11, r13
- projectB:
    master:  r2, r3, r6
    branch1: r12, r14

The content of projectA and projectB is different (let's say projectA
is in subfolder A and projectB is in subfolder B). So revisions on
projectA branches have only the A folder, and revisions on projectB
branches have only the B folder.

But I would like to have:

- projectAB:
    master:  r1', r2', r3', r4', r5', r6', r7'
    branch1: r10', r11', r12', r13', r14'

where all revisions have content from both projects. For example,
r5' should have the same "A" folder content as r5, but should also
have the same "B" folder content as r3 (because r3 was the last
commit to projectB, date-wise, before commit r5 to projectA).

There's the additional difficulty of handling merges...

Any suggestions on what's the best way to do it?

Currently I'm testing the join-git-repos.py script
(https://github.com/mbitsnbites/git-tools/blob/master/join-git-repos.py)
but it's slow, memory inefficient and handles the "master" branch only...

Thanks,
--
Piotr Krukowiecki
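The interleaving described in this message can be sketched as a date-ordered merge of two per-branch histories. This is only an illustration of the desired result for a single linear branch, not a working migration tool: the function name `weave` is made up here, revision numbers stand in for commit dates, and merges are not handled at all.

```python
import heapq


def weave(history_a, history_b):
    """Interleave two linear histories by date.

    Each history is a list of (date, rev) tuples already sorted by date.
    Returns a list of (rev, a_snapshot, b_snapshot): the revision that
    triggers a combined commit, plus the latest revision of each project
    whose folder content that combined commit should carry (None if the
    project has no commit yet at that point).
    """
    tagged_a = [(date, "A", rev) for date, rev in history_a]
    tagged_b = [(date, "B", rev) for date, rev in history_b]
    latest = {"A": None, "B": None}
    woven = []
    # heapq.merge lazily merges the two already-sorted sequences.
    for _date, project, rev in heapq.merge(tagged_a, tagged_b):
        latest[project] = rev
        woven.append((rev, latest["A"], latest["B"]))
    return woven


# Assumption: the revision number is used as the "date" (r1 is oldest).
master_a = [(1, "r1"), (4, "r4"), (5, "r5"), (7, "r7")]
master_b = [(2, "r2"), (3, "r3"), (6, "r6")]

for rev, a, b in weave(master_a, master_b):
    print(f"{rev}': folder A as in {a}, folder B as in {b}")
```

For the master branch above, the entry for r5' pairs folder A from r5 with folder B from r3, which matches the example given in the message.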
* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Ævar Arnfjörð Bjarmason @ 2019-05-15 15:25 UTC
To: Piotr Krukowiecki; +Cc: git

On Wed, May 15 2019, Piotr Krukowiecki wrote:

> I'm migrating two repositories from svn. I already did svn->git
> migration (git-svn clone) and now have two git repositories.
>
> I would like to merge them into 1 git repository, but to merge also
> history - branches and tags.
> [...]
> There's additional difficulty of handling merges...
>
> Any suggestions on what's the best way to do it?
>
> Currently I'm testing join-git-repos.py script
> (https://github.com/mbitsnbites/git-tools/blob/master/join-git-repos.py)
> but it's slow, memory inefficient and handles "master" branch only...

You might be able to use https://github.com/newren/git-filter-repo

But I'd say try something even more stupid first:

 1. Migrate repo A to Git
 2. Migrate repo B to Git
 3. "git subtree add" B's history to A
 4. "git rebase" the history to linear-ize it

At this point you'll have A's history first, then B. Then run some
script to date order the commits, and just "git cherry-pick" those in
the order desired in a loop to a fresh history.

Maybe that sort of stupidity will wreck your merges etc., so you might
need less stupid methods :)
* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Elijah Newren @ 2019-05-15 20:33 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: Piotr Krukowiecki, Git Mailing List

On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, May 15 2019, Piotr Krukowiecki wrote:
>
> > I would like to merge them into 1 git repository, but to merge also
> > history - branches and tags.
> > [...]
> > There's additional difficulty of handling merges...
> >
> > Any suggestions on what's the best way to do it?
> > [...]
>
> You might be able to use https://github.com/newren/git-filter-repo

Splicing repos is an interesting case, but unless the history is
linear, the branch/tag names match exactly, and you are simply
weaving commits together based on timestamp within the same
branch/tag, I don't know what algorithm should be used to weave
them together. There are lots of choices, and "correct" may be very
usecase-specific.

That said, filter-repo was designed to be usable as a library and has
a few simple examples of such usage, including one of splicing some
trivial repos together. (See
https://github.com/newren/git-filter-repo/blob/master/t/t9391/splice_repos.py
and https://github.com/newren/git-filter-repo/blob/master/t/t9391-filter-repo-lib-usage.sh#L90-L121)
As noted there, fast-export's diff-against-first-parent handling
makes splicing commits into the second (or third) parent history of a
merge problematic: new files introduced in such locations would by
default appear to get deleted by the merge unless additional work is
done to also insert the files there. My example was meant as a simple
testcase that should be easy for others to inspect, so it just worked
with very short linear histories.

Somewhat interestingly, a search for others who have tried to solve
this same problem turned up
https://github.com/j5int/jbosstools-gitmigration, which apparently is
based on git_fast_filter, which is the predecessor of filter-repo.
Perhaps that tool would be useful to you as-is, though they apparently
do ignore merges.

If folks have a good idea for a weaving algorithm that appears
generally useful rather than usecase-specific, then I may be
interested in coding it up as a more general example of using
filter-repo as a library. But every time I've thought about it before
it just sounded too hairy and too usecase-specific, so I've just punted
on it.

> But I'd say try something even more stupid first:
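The first-parent problem mentioned in this message can be illustrated with a toy model. The sketch below does not use filter-repo or git at all: trees are plain dictionaries, `apply_changes` is a made-up helper, and the file names are hypothetical. It only mimics the idea that a merge commit in a fast-export stream lists its file changes relative to its first parent.

```python
def apply_changes(parent_tree, changes):
    """Build a commit's tree from its first parent's tree plus a change list.

    `changes` maps path -> new content, or path -> None to delete the path.
    """
    tree = dict(parent_tree)
    for path, content in changes.items():
        if content is None:
            tree.pop(path, None)
        else:
            tree[path] = content
    return tree


# Hypothetical trees of the two parents of a merge in projectA.
first_parent = {"A/main.c": "v2"}                       # mainline
second_parent = {"A/main.c": "v1", "A/util.c": "v1"}    # side branch

# The merge as originally exported: changes relative to the first parent.
merge_changes = {"A/util.c": "v1"}

# Now splice a projectB commit (adding B/lib.c) into the side branch only.
spliced_second_parent = {**second_parent, "B/lib.c": "v1"}

# Replaying the unchanged merge on top of the first parent loses B/lib.c,
# so relative to the spliced side branch the merge appears to delete it.
merged = apply_changes(first_parent, merge_changes)
print(sorted(merged))

# The "additional work": also list the spliced file in the merge itself.
fixed = apply_changes(first_parent, {**merge_changes, "B/lib.c": "v1"})
print(sorted(fixed))
```

The first result lacks B/lib.c even though one parent contains it, and the second shows the kind of extra change a splicing script would have to add to each affected merge.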
* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Piotr Krukowiecki @ 2019-05-16 6:38 UTC
To: Elijah Newren; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Wed, May 15, 2019 at 10:34 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> > On Wed, May 15 2019, Piotr Krukowiecki wrote:
> >
> > > I would like to merge them into 1 git repository, but to merge also
> > > history - branches and tags.
> > > [...]
> >
> > You might be able to use https://github.com/newren/git-filter-repo
>
> Splicing repos is an interesting case, but unless the history is
> linear and the branch/tag names exactly match and you are simplify
> weaving commits together based on timestamp within the same
> branch/tag, then I don't know what algorithm should be used to weave
> them together. There are lots of choices, and "correct" may be very
> usecase-specific.
> [...]
> If folks have a good idea for a weaving algorithm that appears
> generally useful rather than usecase-specific, then I may be
> interested in coding it up as a more general example of using
> filter-repo as a library. But every time I've thought about it before
> it just sounded too hairy and too usecase specific so I've just punted
> on it.

At first I thought that joining history with branch/merge support
should be simple, but in fact it is not, at least for a git repo. Now
I think it is impossible.

It should be possible for svn, or for a git repo migrated from svn
which still has the git-svn-id string as part of the commit message,
because then you know which branch any commit belongs to. For example:

projectA:
  - ra1 - ra3 - ra6 - ra9 * ra11 (master)
           |              |
           - ra4 - ra7 - ra8 - (branchX)

projectB:
  - rb2 - rb5 - rb10 - (master)

Merged project AB should look like this:

project AB:
  - ra1 - rb2 - ra3 - rb5 - ra6 - ra9 - rb10 * ra11 (master)
                 |                           |
                 - ra4 - ra7 - ra8 -------------- (branch)

Because you know that rb5 was on branch "master" (trunk), you know it
should be applied to the same branch in projectA, but not to branch
"branchX".

This information is lost in git:

projectA (git):
  - ra1 - ra3 - ra6 - ra9 * ra11 (master)
           |              |
           - ra4 - ra7 - ra8 -

You do not know which parent of ra11 represented the main "master"
branch. Should rb5 be added between ra4 and ra7, or between ra3 and
ra6?

Does it make sense?

Maybe I should look for a way to first prepare a "merged" svn repo
projectAB. Maybe there's a tool which can do it. And then migrate it
to svn.

--
Piotr Krukowiecki
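The point about the git-svn-id string can be sketched as follows. git-svn appends a line of the form `git-svn-id: <URL>@<revision> <repository-UUID>` to each commit message, so the svn branch of a commit can be read back from the URL. The function name `svn_origin`, the example URLs and the trunk/branches/tags layout are assumptions made for this illustration; a repository with a different layout would need different path handling.

```python
import re

# Matches the line git-svn appends: "git-svn-id: <URL>@<rev> <UUID>"
SVN_ID = re.compile(r"^git-svn-id:\s+(\S+)@(\d+)\s+(\S+)\s*$", re.MULTILINE)


def svn_origin(message):
    """Return (branch, revision) for a git-svn commit message, or None."""
    match = SVN_ID.search(message)
    if match is None:
        return None
    url, revision = match.group(1), int(match.group(2))
    # Assumed standard layout: .../trunk, .../branches/<name>, .../tags/<name>
    parts = url.rstrip("/").split("/")
    if parts[-1] == "trunk":
        branch = "trunk"
    elif len(parts) >= 2 and parts[-2] in ("branches", "tags"):
        branch = f"{parts[-2]}/{parts[-1]}"
    else:
        branch = url  # unknown layout: fall back to the full URL
    return branch, revision


# Hypothetical commit messages as git-svn would have written them.
on_trunk = (
    "Fix build\n\n"
    "git-svn-id: https://svn.example.org/projectB/trunk@5 "
    "00000000-0000-0000-0000-000000000000\n"
)
on_branch = (
    "Add feature\n\n"
    "git-svn-id: https://svn.example.org/projectA/branches/branchX@7 "
    "00000000-0000-0000-0000-000000000000\n"
)

print(svn_origin(on_trunk))
print(svn_origin(on_branch))
```

With this information, a weaving script could restrict itself to interleaving commits that come from the same svn branch in both projects.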
* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Piotr Krukowiecki @ 2019-05-17 13:08 UTC
To: Elijah Newren; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Thu, May 16, 2019 at 8:38 AM Piotr Krukowiecki
<piotr.krukowiecki@gmail.com> wrote:
>
> On Wed, May 15, 2019 at 10:34 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> > >
> > > On Wed, May 15 2019, Piotr Krukowiecki wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm migrating two repositories from svn. I already did svn->git
> > > > migration (git-svn clone) and now have two git repositories.
> > > >
> > > > I would like to merge them into 1 git repository, but to merge also
> > > > history - branches and tags.
[...]
> > > You might be able to use https://github.com/newren/git-filter-repo
> >
> > Splicing repos is an interesting case, but unless the history is
> > linear and the branch/tag names exactly match and you are simplify
> > weaving commits together based on timestamp within the same
> > branch/tag, then I don't know what algorithm should be used to weave
> > them together. There are lots of choices, and "correct" may be very
> > usecase-specific.
> >
> > That said, filter-repo was designed to be usable as a library and has
> > a few simple examples of such usage, including one of splicing some
> > trivial repos together. (See
> > https://github.com/newren/git-filter-repo/blob/master/t/t9391/splice_repos.py
> > and https://github.com/newren/git-filter-repo/blob/master/t/t9391-filter-repo-lib-usage.sh#L90-L121)

I'll try writing a script using filter-repo. I looked at
splice_repos.py and the fast-export/fast-import format, and it looks
promising and relatively simple.

> > Somewhat interestingly, a search on others having tried to solve this
> > same problem turned up
> > https://github.com/j5int/jbosstools-gitmigration, which apparently is
> > based on git_fast_filter, which is the predecessor of filter-repo.
> > Perhaps that tool would be useful to you as-is, though they apparently
> > do ignore merges.

I tried using it but it didn't work. I don't remember what the problem
was, though.

> Maybe I should look for a way to first prepare "merged" svn repo
> projectAB. Maybe there's a tool which can do it. And then migrate it
> to svn.

I tried this: using svnadmin dump + svndumpfilter + svnadmin load to
remodel the svn repository and "splice" the history (by rewriting the
2nd project's paths so they appear to be added under the 1st project's
path). But it didn't work. The resulting svn repository had incorrect
history (from my point of view).

I also looked at reposurgeon. Maybe it could do the splicing, but I
gave up on learning it: the documentation isn't very helpful and there
aren't enough examples on the internet...

--
Piotr Krukowiecki
* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Jakub Narebski @ 2019-05-20 13:54 UTC
To: Elijah Newren; +Cc: Ævar Arnfjörð Bjarmason, Piotr Krukowiecki, Git Mailing List

Elijah Newren <newren@gmail.com> writes:
> On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Wed, May 15 2019, Piotr Krukowiecki wrote:
>>>
>>> I'm migrating two repositories from svn. I already did svn->git
>>> migration (git-svn clone) and now have two git repositories.
>>>
>>> I would like to merge them into 1 git repository, but to merge also
>>> history - branches and tags.
>>> [...]
>>> Any suggestions on what's the best way to do it?
>>>
>>> Currently I'm testing join-git-repos.py script
>>> (https://github.com/mbitsnbites/git-tools/blob/master/join-git-repos.py)
>>> but it's slow, memory inefficient and handles "master" branch only...
>>
>> You might be able to use https://github.com/newren/git-filter-repo
[...]
> Somewhat interestingly, a search on others having tried to solve this
> same problem turned up
> https://github.com/j5int/jbosstools-gitmigration, which apparently is
> based on git_fast_filter, which is the predecessor of filter-repo.
> Perhaps that tool would be useful to you as-is, though they apparently
> do ignore merges.

There is also the reposurgeon tool; though its main purported purpose
is to aid migrating from one version control system to another, it can
also be used to edit repositories (utilizing a fast-import stream).

https://gitlab.com/esr/reposurgeon
http://www.catb.org/~esr/reposurgeon/

Hope that helps,
--
Jakub Narębski
* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Piotr Krukowiecki @ 2019-05-21 7:53 UTC
To: Jakub Narebski; +Cc: Elijah Newren, Ævar Arnfjörð Bjarmason, Git Mailing List

On Mon, May 20, 2019 at 3:54 PM Jakub Narebski <jnareb@gmail.com> wrote:

> There is also reposurgeon tool; though its main purported purpose is to
> aid migrating from one version control system to another, it can also be
> used to edit repositories (utilizing fast-import stream).
>
> https://gitlab.com/esr/reposurgeon
> http://www.catb.org/~esr/reposurgeon/

If only there was real documentation for it...

[resending in plain text...]

--
Piotr Krukowiecki
* Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
From: Piotr Krukowiecki @ 2019-05-16 6:10 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: git

On Wed, May 15, 2019 at 5:25 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, May 15 2019, Piotr Krukowiecki wrote:
>
> > Hello,
> >
> > I'm migrating two repositories from svn. I already did svn->git
> > migration (git-svn clone) and now have two git repositories.
> >
> > I would like to merge them into 1 git repository, but to merge also
> > history - branches and tags.
> >
[...]
> > There's additional difficulty of handling merges...
>
> You might be able to use https://github.com/newren/git-filter-repo
>
> But I'd say try something even more stupid first:
>
> 1. Migrate repo A to Git
> 2. Migrate repo B to Git
> 3. "git subtree add" B's history to A
> 4. "git rebase" the history to linear-ize it
>
> At this point you'll have A's history first, then B. Then run some
> script to date order the commits, and just "git cherry-pick" those in
> the order desired in a loop to a fresh history.
>
> Maybe that sort of stupidity will wreck your merges etc., so you might
> need less stupid methods :)

I think both git-filter-repo and the subtree+rebase do not handle
branches/merges well :(

--
Piotr Krukowiecki
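The "date order the commits, then cherry-pick in a loop" step quoted above could look roughly like this. It is a sketch under several assumptions: the linearized history has been listed with `git log --reverse --format='%H %at'` (commit hash and author timestamp), the author date is the date to sort by, and the shortened hashes in `sample` are invented. The code only builds the list of commands; it does not run git.

```python
def parse_log(text):
    """Parse lines of "<sha> <unix-timestamp>" into (timestamp, sha) pairs."""
    commits = []
    for line in text.splitlines():
        if line.strip():
            sha, stamp = line.split()
            commits.append((int(stamp), sha))
    return commits


def cherry_pick_plan(log_text):
    """Return one `git cherry-pick` argv per commit, oldest first.

    sorted() is stable, so commits with equal timestamps keep the order
    in which they appeared in the log output.
    """
    ordered = sorted(parse_log(log_text), key=lambda commit: commit[0])
    return [["git", "cherry-pick", sha] for _stamp, sha in ordered]


# Hypothetical log of the linearized branch: A's commits first, then B's.
sample = """\
aaaa1 100
aaaa4 400
bbbb2 200
bbbb3 300
"""

for argv in cherry_pick_plan(sample):
    print(" ".join(argv))
```

Each argv could then be passed to `subprocess.run(argv, check=True)` while a fresh branch is checked out. As the reply above points out, this approach flattens the history, so branches and merges are not preserved.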
end of thread (newest message: 2019-05-21 7:53 UTC)

Thread overview: 8+ messages

2019-05-15 14:52 Merging (joining/stiching/rewriting) history of "unrelated" git repositories Piotr Krukowiecki
2019-05-15 15:25 ` Ævar Arnfjörð Bjarmason
2019-05-15 20:33   ` Elijah Newren
2019-05-16  6:38     ` Piotr Krukowiecki
2019-05-17 13:08       ` Piotr Krukowiecki
2019-05-20 13:54     ` Jakub Narebski
2019-05-21  7:53       ` Piotr Krukowiecki
2019-05-16  6:10   ` Piotr Krukowiecki