Submodule, subtree, or something else?

git@vger.kernel.org mailing list mirror (one of many)

* Submodule, subtree, or something else?
@ 2015-08-21 22:47 Jānis Rukšāns
  2015-08-22  0:07 ` Stefan Beller
  0 siblings, 1 reply; 6+ messages in thread
From: Jānis Rukšāns @ 2015-08-21 22:47 UTC
  To: git

Hello,

First of all, I apologise for the wall of text that follows; obviously I
am bad at this.

My $DAYJOB is switching from Subversion to Git, primarily because of
its distributed nature (we are scattered all across the globe), and the
ease of branching and merging.  One issue that has popped up is how to
manage code shared between multiple projects.

Our SVN setup used a shared repository for all projects, either using
externals for shared code, or, more often than not, simply merging the
code between projects as needed.  Ignoring the fact that merging with
SVN is somewhat cumbersome, overall it has worked quite well for us,
especially when combined with git-svn.

For external libraries that rarely change, submodules appear to be the
obvious choice when using Git.  On the other hand, I've found them
somewhat cumbersome to use, and subtree merging (either using git
subtree, or directly with git merge -s subtree) is closer to what we
were doing in SVN.  A major drawback of submodules in my opinion is the
inability to make a full clone from an existing one without having
access to the central repository, which is something I have to do from
time to time.
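For reference, the "git merge -s subtree" approach mentioned above can be sketched roughly as follows. This is a minimal sketch in a throwaway directory, not a recommended setup: the repository names (lib, main), the liba/ prefix and the file contents are hypothetical. The one-time import uses the "-s ours" merge plus "read-tree --prefix" recipe; later library changes are merged with the subtree strategy, which works out by itself that the library's top level corresponds to liba/.

```sh
#!/bin/sh
# Sketch of a subtree-merge workflow. Names (lib, main, liba/) are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Stand-alone library repository.
git init -q lib
(cd lib && echo v1 > lib.c && echo api > lib.h &&
 git add . && git commit -q -m "N: lib v1")

# Main project.
git init -q main
cd main
echo app > README
git add README && git commit -q -m "A: main project"

# One-time import: record the library history as a second parent
# ("-s ours" keeps our tree unchanged), then read its tree into liba/.
git fetch -q ../lib HEAD
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=liba/ -u FETCH_HEAD
git commit -q -m "M1: import liba"

# New work happens in the library repository...
(cd ../lib && echo v2 > lib.c && git commit -q -am "O: lib v2")

# ...and is merged in later; the subtree strategy shifts the library's
# trees (and the merge base) down into liba/ before merging.
git fetch -q ../lib HEAD
git merge -q -s subtree -m "M2: update liba" FETCH_HEAD

cat liba/lib.c    # v2
```

Each update shows up as a merge commit (M1, M2, ...) in the main project, matching the "merge -s subtree" history drawn below.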

For internal libraries, the situation is even less clear.  For many of
these libraries, most of the development happens within the context of a
single project, with commits to the main project being interleaved with
commits to the subproject(s), resulting in histories resembling:

 (using git submodule)

   A---B---S1---S2---C---S3
          ,´   ,´       ,´
     N---O----P----Q---R

 (using git subtree with --rejoin)

   A---B---N---O---M1---M2---Q---C---R---M3
                  /    /                /
             N'--O'---P--------Q'------R'

 (using merge -s subtree)

   A---B---M1---M2---C---M3
          /    /        /
     N---O----P----Q---R

where A, B and C are changes to the main project, N, O, P, Q and R are
changes to library code (N', O', Q' and R' being the copies created by
git subtree split), and Sn and Mn are submodule updates and merge
commits, respectively.

From what I have gathered, submodules have issues with branching and
merging, so unless I'm mistaken, submodules are kinda out of the
question.  Of the remaining two options, merging directly results in a
nicer history, but requires making all changes to the library repo first
(although I am quite sure that a similar effect can be achieved with
plumbing, similarly to how git subtree split works), and is harder to
use than git subtree.  Also, all three options can result in the main
project history being cluttered with extra commits.

Lastly, there is a particularly painful 3rd-party library that has an
enormous amount of local modifications that are never going to make it
upstream (essentially making it a fork); project-specific changes that
are required for one project but would break others; separate language
bindings that access the internals (often requiring bug fixes to be made
to both simultaneously); and, if that wasn't enough, it *requires*
several source files to be modified for each individual project that
uses it.  It's a complete mess, but we're stuck with it for the existing
projects, as switching to an alternative would be too time-consuming.

To sum up, I'm looking for something that would let us share code
between multiple projects and allow for:

1) separate histories with relatively easy branching and merging

2) a distributed workflow without having to set up multiple repositories
everywhere (e.g. work <-> home <-> laptop)

3) working on the shared code from within a project that uses it

4) inspection of the complete history

5) modifications that are not shared with other projects

and would not result in lots of clutter in the history.

Repository size is somewhat less of an issue, because each submodule has
to be checked out anyway.

Submodules let you have #3, and #1, #2 and #5 to a point, after which it
becomes a pain.  git subtree allows #1, #2, #3 and #4, and #5 with some
pain (?), but results in duplicate commits.  Using subtree merge
strategy directly gives everything except #3, but is harder to use than
submodules or subtree.

Are there any other options besides these three for sharing (or in some
cases, not sharing) common code between projects using Git, that would
address the above points better?  Or, alternatively, ways to work around
the drawbacks of the existing tools?

Finally, I would be grateful for any suggestions on how to better handle
the messy case described above.

Thanks,
Jānis


* Re: Submodule, subtree, or something else?
  2015-08-21 22:47 Submodule, subtree, or something else? Jānis Rukšāns
@ 2015-08-22  0:07 ` Stefan Beller
  2015-08-23 14:11   ` Jānis Rukšāns
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Beller @ 2015-08-22  0:07 UTC
  To: Jānis Rukšāns; +Cc: git@vger.kernel.org

On Fri, Aug 21, 2015 at 3:47 PM, Jānis Rukšāns <janis.ruksans@gmail.com> wrote:
> For external libraries that rarely change, submodules appear to be the
> obvious choice when using Git.  On the other hand, I've found them
> somewhat cumbersome to use, and subtree merging (either using git
> subtree, or directly with git merge -s subtree) is closer to what we
> were doing in SVN.  A major drawback of submodules in my opinion is the
> inability to make a full clone from an existing one without having
> access to the central repository, which is something I have to do from
> time to time.

Can you elaborate on that a bit more?
git clone --recurse-submodules should do that no matter which remote
you contact?




* Re: Submodule, subtree, or something else?
  2015-08-22  0:07 ` Stefan Beller
@ 2015-08-23 14:11   ` Jānis Rukšāns
  2015-08-24 16:51     ` Stefan Beller
       [not found]     ` <CAK6hiNiBD+DUdNq0c2DY9LWg2PCgE56SpbBip8BNNmHTsEttuQ@mail.gmail.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Jānis Rukšāns @ 2015-08-23 14:11 UTC
  To: Stefan Beller; +Cc: git@vger.kernel.org

On Fri, 2015-08-21 at 17:07 -0700, Stefan Beller wrote:
> On Fri, Aug 21, 2015 at 3:47 PM, Jānis Rukšāns <janis.ruksans@gmail.com> wrote:
> > 
> > A major drawback of submodules in my opinion is the
> > inability to make a full clone from an existing one without having
> > access to the central repository, which is something I have to do from
> > time to time.
> 
> Can you elaborate on that a bit more?
> git clone --recurse-submodules should do that no matter which remote
> you contact?

I mean that if I have cloned a repository with submodules, cloning that
repository with --recurse-submodules will either access the "central
server" if absolute URLs are used, or require additional clones for
each submodule.  For example:

git clone --recursive http://somewhere/projectA.git
git clone --recursive file://$(pwd)/projectA projectA.tmp

The second command will cause the submodules to be downloaded again, or
expect them to be found in $(pwd).

Or am I mistaken, or doing something wrong?
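One common workaround for the absolute-URL case, sketched below under assumed names, is to clone the superproject without --recursive, run git submodule init (which copies the URLs from .gitmodules into .git/config), override the URL there, and only then run git submodule update. In the sketch, lib stands in for the central library repository (moved away to simulate it being unreachable), main.work for the working copy at work and main.home for the new clone; all of these names are hypothetical. The -c protocol.file.allow=always option is only there because the demo uses file:// URLs, which Git 2.38.1 and later refuse for submodule clones by default.

```sh
#!/bin/sh
# Sketch: take a submodule from a working copy instead of the URL in
# .gitmodules. All names (lib, main-src, main.work, main.home, liba) are
# hypothetical; file:// stands in for ssh.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

# "Central" library repository with one commit.
git init -q lib
(cd lib && echo v1 > lib.c && git add lib.c && git commit -q -m "lib v1")

# Superproject whose .gitmodules records an absolute URL.
git init -q main-src
cd main-src
echo app > README
git config -f .gitmodules submodule.liba.path liba
git config -f .gitmodules submodule.liba.url "file://$root/lib"
git update-index --add --cacheinfo 160000,"$(git -C ../lib rev-parse HEAD)",liba
git add README .gitmodules
git commit -q -m "main with liba"
cd ..

# Working copy at work, cloned while the central server is reachable.
git -c protocol.file.allow=always clone -q --recursive \
    "file://$root/main-src" main.work

# Simulate the central server being unreachable.
mv lib lib.offline

# At home: clone the working copy, then point liba at it before updating.
git clone -q "file://$root/main.work" main.home
cd main.home
git submodule init
git config submodule.liba.url "file://$root/main.work/liba"
git -c protocol.file.allow=always submodule update
cat liba/lib.c    # v1
```

The override lives only in the new clone's .git/config, so .gitmodules (and everyone else's clones) keep the original URL.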


* Re: Submodule, subtree, or something else?
  2015-08-23 14:11   ` Jānis Rukšāns
@ 2015-08-24 16:51     ` Stefan Beller
  2015-08-24 17:53       ` Jānis Rukšāns
       [not found]     ` <CAK6hiNiBD+DUdNq0c2DY9LWg2PCgE56SpbBip8BNNmHTsEttuQ@mail.gmail.com>
  1 sibling, 1 reply; 6+ messages in thread
From: Stefan Beller @ 2015-08-24 16:51 UTC
  To: Jānis Rukšāns; +Cc: git@vger.kernel.org

On Sun, Aug 23, 2015 at 7:11 AM, Jānis Rukšāns <janis.ruksans@gmail.com> wrote:
> On Fri, 2015-08-21 at 17:07 -0700, Stefan Beller wrote:
>> On Fri, Aug 21, 2015 at 3:47 PM, Jānis Rukšāns <janis.ruksans@gmail.com> wrote:
>> >
>> > A major drawback of submodules in my opinion is the
>> > inability to make a full clone from an existing one without having
>> > access to the central repository, which is something I have to do from
>> > time to time.
>>
>> Can you elaborate on that a bit more?
>> git clone --recurse-submodules should do that no matter which remote
>> you contact?
>
> I mean that if I have cloned a repository with submodules, cloning that
> repository with --recurse-submodules will either access the "central
> server" if absolute URLs are used, or requires additional clones for
> each submodule.  For example
>
> git clone --recursive http://somewhere/projectA.git
> git clone --recursive file://$(pwd)/projectA projectA.tmp
>
> The second command will cause the submodules to be downloaded again, or
> expect them to be found in $(pwd).

IIUC, the second command will look up the submodules in $(pwd), but if they
are not there they are skipped, so all of the existing submodules are cloned.
Why do you need more submodules in the tmp clone than in $(pwd)/projectA
would be my next question. But I see your point now.



>
> Or am I mistaken, or doing something wrong?
>


* Re: Submodule, subtree, or something else?
       [not found]     ` <CAK6hiNiBD+DUdNq0c2DY9LWg2PCgE56SpbBip8BNNmHTsEttuQ@mail.gmail.com>
@ 2015-08-24 17:12       ` Jānis Rukšāns
  0 siblings, 0 replies; 6+ messages in thread
From: Jānis Rukšāns @ 2015-08-24 17:12 UTC
  To: Cox, Michael; +Cc: git@vger.kernel.org

On Sun, 2015-08-23 at 17:13 -0600, Cox, Michael wrote:
> You might want to take a look at how the Boost (boost.org) project
> uses submodules.  They use submodules for each library.  I know they
> use relative paths in their .gitmodules file to avoid the problem
> you're referring to regarding "git clone --recurse-submodules".

Thanks!  I had a look at their setup, and they are using ../libx.git for
submodules, which unfortunately breaks when cloning from another
"working copy":

$ git clone --recursive file:///tmp/gittest/repo.a/main.git main.work
Cloning into 'main.work'...
<snip>
Submodule 'liba' (file:///tmp/gittest/repo.a/liba.git) registered for path 'liba'
Cloning into 'liba'...
<snip>
Submodule path 'liba': checked out '6a0ef37c03a7068328956dcb8a08bc39f280edfc'

$ git clone --recursive file://$(pwd)/main.work main.home
Cloning into 'main.home'...
<snip>
Submodule 'liba' (file:///tmp/gittest/work/liba.git) registered for path 'liba'
Cloning into 'liba'...
fatal: '/tmp/gittest/work/liba.git' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Clone of 'file:///tmp/gittest/work/liba.git' into submodule path 'liba' failed

After some trial and error I managed to get what I wanted to achieve by
using ./liba as the submodule URL (no .git suffix!), and creating a file
named liba in /tmp/gittest/repo.a/main.git (i.e. the bare "origin" repo)
with a single line in it:

gitdir: ../liba.git

However, I'm not sure whether this is the right thing to do, or even advisable.
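The layout described above can be reproduced in a throwaway directory roughly as follows. This is a sketch under assumptions: repo.a, main.git and liba.git follow the example in this message, while lib, main-src, main.work, main.home and the file contents are made up, and -c protocol.file.allow=always is only needed because Git 2.38.1 and later refuse file:// submodule clones by default. It relies on "./liba" being resolved relative to the superproject's remote URL, and on git following a "gitdir:" file where it expects a repository when serving a clone.

```sh
#!/bin/sh
# Reproduces the "./liba + gitdir file" layout. repo.a, main.git and
# liba.git mirror the example; lib, main-src, main.work and main.home
# are hypothetical helper names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)
cd "$root"

# Library: one commit, published as the bare repo.a/liba.git.
git init -q lib
(cd lib && echo v1 > lib.c && git add lib.c && git commit -q -m "lib v1")
mkdir repo.a
git clone -q --bare lib repo.a/liba.git

# Superproject: gitlink for liba with the URL "./liba" (no .git suffix),
# published as the bare repo.a/main.git.
git init -q main-src
cd main-src
echo app > README
git config -f .gitmodules submodule.liba.path liba
git config -f .gitmodules submodule.liba.url ./liba
git update-index --add --cacheinfo 160000,"$(git -C ../lib rev-parse HEAD)",liba
git add README .gitmodules
git commit -q -m "main with liba"
cd ..
git clone -q --bare main-src repo.a/main.git

# "./liba" resolves to repo.a/main.git/liba, so place a gitfile there
# pointing (relative to its own directory) at the real library repo.
echo "gitdir: ../liba.git" > repo.a/main.git/liba

# Clone from the bare "origin", then clone that working copy; in the
# second clone "./liba" resolves to main.work/liba, the checked-out submodule.
git -c protocol.file.allow=always clone -q --recursive \
    "file://$root/repo.a/main.git" main.work
git -c protocol.file.allow=always clone -q --recursive \
    "file://$root/main.work" main.home
cat main.home/liba/lib.c    # v1
```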


* Re: Submodule, subtree, or something else?
  2015-08-24 16:51     ` Stefan Beller
@ 2015-08-24 17:53       ` Jānis Rukšāns
  0 siblings, 0 replies; 6+ messages in thread
From: Jānis Rukšāns @ 2015-08-24 17:53 UTC
  To: Stefan Beller; +Cc: git@vger.kernel.org

On Mon, 2015-08-24 at 09:51 -0700, Stefan Beller wrote:
> IIUC, the second command will lookup the submodules in $(pwd), but if
> they are not there they are skipped, so all of the existing submodules
> are cloned.
> Why do you need more submodules in the tmp clone than in
> $(pwd)/projectA would be my next question. But I see your point now.

The $(pwd) was just an example to illustrate my point.  The actual use
case is that I would be hacking on something at work, notice that it is
already late and I have to catch the last bus home, yet I don't want to
postpone whatever I was working on until the next day.  So I would do
git commit -a -m "[WIP] Stuff, finish at home" to save my work so far,
go home, and clone / fetch it over ssh.

Another important factor is that a lot of our code can be meaningfully
tested only on the actual hardware, and is built in a VM.  Quite often
getting things right involves many iterations of hack hack hack, git
commit --amend, fetch && reset --hard in the VM, build, test, repeat.
Being able to clone / fetch directly from the copy I am working on makes
it a lot easier.
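The VM side of that loop can be sketched as below. It is a sketch under assumptions: "work" stands in for the working copy and "vm" for the clone inside the build VM (in practice the VM would fetch over ssh), and both names are hypothetical. What it illustrates is that the default fetch refspec (+refs/heads/*:refs/remotes/origin/*) is forced, so a commit rewritten with --amend is still fetched, and reset --hard @{upstream} then moves the VM's branch and files onto it.

```sh
#!/bin/sh
# Sketch of the amend / fetch && reset --hard loop. "work" and "vm" are
# hypothetical local stand-ins for the working copy and the build VM.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# First attempt in the working copy, then clone it into the "VM".
git init -q work
(cd work && echo try1 > code.c && git add code.c &&
 git commit -q -m "[WIP] Stuff")
git clone -q work vm

# Next iteration: hack, then rewrite the WIP commit.
(cd work && echo try2 > code.c && git commit -q -a --amend --no-edit)

# In the VM: the forced default refspec fetches the rewritten commit even
# though it does not descend from what the VM has checked out...
cd vm
git fetch -q origin
# ...and reset --hard moves the local branch and its files onto it.
git reset -q --hard '@{upstream}'
cat code.c    # try2
```

If the project uses submodules, reset --hard does not update their checkouts by default, so a git submodule update --init --recursive would typically follow the reset.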

As I wrote in the other e-mail, I managed to achieve the desired result
by using ./<submodule> (without .git suffix) as the submodule URL, and
creating a file named <submodule> in the bare repo with
'gitdir: ../<submodule>.git' as its contents, but I'm not sure whether
it is a good idea or not.

Jānis

