git@vger.kernel.org mailing list mirror (one of many)
From: Elijah Newren <newren@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Piotr Krukowiecki <piotr.krukowiecki@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Merging (joining/stiching/rewriting) history of "unrelated" git repositories
Date: Wed, 15 May 2019 13:33:57 -0700
Message-ID: <CABPp-BGycoHEMN27Z9rAccT5yVRf3N50o4sc3wo8uE_HLR9QbA@mail.gmail.com>
In-Reply-To: <874l5vwxhw.fsf@evledraar.gmail.com>

On Wed, May 15, 2019 at 8:30 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, May 15 2019, Piotr Krukowiecki wrote:
>
> > Hello,
> >
> > I'm migrating two repositories from svn. I already did svn->git
> > migration (git-svn clone) and now have two git repositories.
> >
> > I would like to merge them into one git repository, merging their
> > history as well: branches and tags.
> >
> > The reason is that the svn repositories in fact represent one
> > "project": you had to download both of them, as they are not useful
> > separately. Tags were applied to both repositories, and the list of
> > branches is almost identical for both.
> >
> > So right now I have:
> >
> >     - projectA:
> >        master: r1, r4, r5, r7
> >        branch1: r10, r11, r13
> >     - projectB:
> >        master: r2, r3, r6
> >        branch1: r12, r14
> >
> > The content of projectA and projectB is different (let's say projectA
> > is in subfolder A and projectB is in subfolder B). So revisions on
> > projectA branches have only the A folder, and revisions on projectB
> > branches have only the B folder.
> >
> > But I would like to have:
> >
> >     - projectAB:
> >        master: r1', r2', r3', r4', r5', r6', r7'
> >        branch1: r10', r11', r12', r13', r14'
> >
> > Where all revisions have content from both projects. For example,
> > r5' should have the same "A" folder content as r5, but should also
> > have the same "B" folder content as r3 (because r3 was the last
> > commit to projectB (date-wise) before commit r5 to projectA).
> >
> > There's the additional difficulty of handling merges...
> >
> >
> > Any suggestions on what's the best way to do it?
> >
> >
> > Currently I'm testing the join-git-repos.py script
> > (https://github.com/mbitsnbites/git-tools/blob/master/join-git-repos.py)
> > but it's slow, memory-inefficient, and only handles the "master" branch...
> >
> >
> > Thanks,
>
> You might be able to use https://github.com/newren/git-filter-repo

Splicing repos is an interesting case, but unless the history is
linear, the branch/tag names match exactly, and you are simply
weaving commits together by timestamp within the same branch/tag, I
don't know what algorithm should be used to weave them together.
There are lots of choices, and "correct" may be very
use-case-specific.
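For the simple case described above (linear histories, matching branch names, weaving by timestamp within one branch), the weave can be sketched as a toy in-memory model. This is not filter-repo's API: the `Commit` record and the `weave` function are hypothetical, each commit is assumed to carry a full snapshot of its own project's files, and timestamps are assumed comparable across both repositories (here simply the svn revision numbers from the question). Each woven commit gets the newest snapshot seen so far from every project, placed under that project's subdirectory.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Dict, List


@dataclass
class Commit:
    """Hypothetical in-memory stand-in for a commit (not a git or filter-repo type)."""
    name: str              # e.g. "r5"
    timestamp: int         # commit date; here simply the svn revision number
    subdir: str            # project it belongs to: "A" or "B"
    files: Dict[str, str]  # full snapshot of that project's files


def weave(*histories: List[Commit]) -> List[dict]:
    """Interleave linear histories of one branch by timestamp.

    Each input list must already be sorted by timestamp. Every woven
    commit gets the newest snapshot seen so far from each project,
    placed under that project's subdirectory.
    """
    latest: Dict[str, Dict[str, str]] = {}
    woven = []
    # heapq.merge lazily merges the already-sorted inputs into one sorted stream.
    for commit in merge(*histories, key=lambda c: c.timestamp):
        latest[commit.subdir] = commit.files
        tree = {
            f"{subdir}/{path}": content
            for subdir, files in sorted(latest.items())
            for path, content in files.items()
        }
        woven.append({"name": commit.name + "'", "tree": tree})
    return woven


# The "master" branches from the question, using revision numbers as dates.
project_a = [Commit(f"r{n}", n, "A", {"a.txt": f"A@r{n}"}) for n in (1, 4, 5, 7)]
project_b = [Commit(f"r{n}", n, "B", {"b.txt": f"B@r{n}"}) for n in (2, 3, 6)]
master = weave(project_a, project_b)

for commit in master:
    print(commit["name"], commit["tree"])
```

Here r5' ends up with A/a.txt from r5 and B/b.txt from r3, as in the question's example. Calling `weave` once per branch name that exists in both repositories would treat branch1 the same way; merges and mismatched branch names are outside what this sketch handles.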

That said, filter-repo was designed to be usable as a library and has
a few simple examples of such usage, including one of splicing some
trivial repos together.  (See
https://github.com/newren/git-filter-repo/blob/master/t/t9391/splice_repos.py
and https://github.com/newren/git-filter-repo/blob/master/t/t9391-filter-repo-lib-usage.sh#L90-L121)
As noted there, fast-export writes each commit as a diff against its
first parent, which makes splicing commits into the second (or third)
parent history of a merge problematic: by default, new files
introduced in such locations would appear to be deleted by the merge
unless additional work is done to also insert them there.  My example
was meant as a simple testcase that should be easy for others to
inspect, so it only handles very short linear histories.
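The first-parent problem can be made concrete with a toy model. It assumes, per the description above, that a merge is replayed as its first parent's tree plus the file changes recorded against that parent; trees are plain dicts, and `replay` and `extra_changes` are hypothetical helpers, not part of fast-export or filter-repo. A file spliced into the second-parent history is absent from the replayed merge, which relative to the second parent looks like a deletion; emitting explicit modify entries for such files (the "additional work" mentioned above) carries it through.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, str]                     # path -> file content
Change = Tuple[str, str, Optional[str]]   # ("M", path, content) or ("D", path, None)


def replay(first_parent: Tree, changes: List[Change]) -> Tree:
    """Rebuild a commit's tree as its first parent's tree plus recorded changes."""
    tree = dict(first_parent)
    for op, path, content in changes:
        if op == "M":
            tree[path] = content
        elif op == "D":
            tree.pop(path, None)
    return tree


def extra_changes(first_parent: Tree, spliced: Tree) -> List[Change]:
    """Explicit 'M' entries for spliced-in files the first parent lacks."""
    return [("M", path, content)
            for path, content in sorted(spliced.items())
            if first_parent.get(path) != content]


# Original merge: the side branch (second parent) added A/side.txt, so the
# changes recorded against the first parent mention only that file.
first_parent = {"A/main.txt": "v1"}
second_parent = {"A/main.txt": "v1", "A/side.txt": "side"}
recorded = [("M", "A/side.txt", "side")]

# Splicing adds B/extra.txt somewhere in the second parent's history only.
spliced_files = {"B/extra.txt": "from B"}
new_second_parent = {**second_parent, **spliced_files}

naive = replay(first_parent, recorded)
lost = sorted(set(new_second_parent) - set(naive))  # files the merge silently drops
fixed = replay(first_parent, recorded + extra_changes(first_parent, spliced_files))

print("dropped by naive replay:", lost)
print("fixed merge tree:", sorted(fixed))
```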

Somewhat interestingly, a search for others who have tried to solve
this same problem turned up
https://github.com/j5int/jbosstools-gitmigration, which is apparently
based on git_fast_filter, the predecessor of filter-repo.  Perhaps
that tool would be useful to you as-is, though it apparently ignores
merges.

If folks have a good idea for a weaving algorithm that appears
generally useful rather than use-case-specific, then I may be
interested in coding it up as a more general example of using
filter-repo as a library.  But every time I've thought about it
before, it has sounded too hairy and too use-case-specific, so I've
punted on it.

> But I'd say try something even more stupid first:


Thread overview: 8+ messages
2019-05-15 14:52 Merging (joining/stiching/rewriting) history of "unrelated" git repositories Piotr Krukowiecki
2019-05-15 15:25 ` Ævar Arnfjörð Bjarmason
2019-05-15 20:33   ` Elijah Newren [this message]
2019-05-16  6:38     ` Piotr Krukowiecki
2019-05-17 13:08       ` Piotr Krukowiecki
2019-05-20 13:54     ` Jakub Narebski
2019-05-21  7:53       ` Piotr Krukowiecki
2019-05-16  6:10   ` Piotr Krukowiecki
