A design for subrepositories

git@vger.kernel.org mailing list mirror (one of many)

* A design for subrepositories
@ 2012-10-13 13:33 Lauri Alanko
  2012-10-13 17:30 ` Junio C Hamano
  2012-10-13 21:20 ` perryh
  0 siblings, 2 replies; 17+ messages in thread
From: Lauri Alanko @ 2012-10-13 13:33 UTC (permalink / raw)
  To: git

Hello.

I intend to work on a "subrepository" tool for git, but before I  
embark on the actual programming, I thought to first invite comments  
on the general design.

Some background first. I know that there are several existing  
approaches already for managing nested repositories, but none of them  
quite seems to fit my purposes. My primary goal is to use git for home  
directory backup and mirroring, while the home directory itself may of  
course contain repositories.

Git-subtree doesn't quite fit the bill. It allows merging a subtree  
into a larger tree and then again splitting it out for exporting, but  
this is tedious. More importantly, a merged tree gets branched along  
with the containing tree, whereas I want to have subrepositories  
precisely because the subtrees need to be branched independently of  
the container.

Submodules are a bit closer to what I want, but they have clearly been  
designed for a different purpose: a repository with submodules is only  
supposed to collate existing repositories, not act as a source for  
them. So they aren't really faithful to the distributed nature of git:  
there's no easy way to completely clone a repository and its submodules.

Moreover, submodules have some other annoyances like not supporting  
bare repositories and checking out the submodules in detached heads.

Now, in other circumstances I might just patch git-submodule to add  
the features I want, but it turns out that it is written in shell. I  
know that is a git tradition, but I'm going to get a bit religious  
here: anything longer than a screenful shouldn't be written in shell,  
and I'm certainly not going to add more lines to an already overlong  
script. Hence I'm going to write a separate tool using something a bit  
more... structured. Probably Python with Dulwich.

So here are some preliminary thoughts on how the tool should work.

* Repository layout

Every subrepository has a unique identifier. The heads of  
subrepository <subname> are simply stored as heads in a subdirectory  
of the main repository: e.g.  
refs/heads/subrepos/<subname>/<branchname>. Likewise for tags:  
refs/tags/subrepos/<subname>/<tagname>.

Rationale: if we had fully independent repositories under the main  
repository directory, like what git-submodule uses, there would be no  
easy way to enumerate all the existing subrepositories to copy them.  
Since the only thing we can directly list from a remote repository are  
references, it makes sense to store the subrepositories just as a  
bunch of them.
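
To illustrate the point about enumeration, here is a minimal Python
sketch of the naming scheme. It is not part of any existing tool: the
function names are made up for this example, and it assumes that
<subname> never contains a slash.

```python
def subrepo_ref(kind, subname, name):
    """Ref in the main repository that holds a subrepo branch or tag.

    kind is "heads" or "tags"; subname is assumed to contain no "/".
    """
    return f"refs/{kind}/subrepos/{subname}/{name}"


def split_subrepo_ref(ref):
    """Inverse of subrepo_ref: (kind, subname, name), or None."""
    for kind in ("heads", "tags"):
        prefix = f"refs/{kind}/subrepos/"
        if ref.startswith(prefix):
            subname, sep, name = ref[len(prefix):].partition("/")
            if sep and name:
                return kind, subname, name
    return None


def list_subrepos(refs):
    """Enumerate subrepo names given nothing but a listing of ref names."""
    found = {parsed[1] for parsed in map(split_subrepo_ref, refs) if parsed}
    return sorted(found)
```

The input to list_subrepos could be the ref names reported by
something like "git ls-remote", which is exactly the information a
remote repository already exposes.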

The reason for storing the subrepo references under refs/heads/ and  
refs/tags/ (instead of, say, refs/subrepos/) is simply that this way  
everything is directly compatible with standard git tools: one can do  
a normal git clone/push/pull for mirroring and backup purposes without  
any need for special tools. You only need tools once you operate on a  
working tree.

* Tree layout

A tree can mount references of subrepositories. There are two  
components to a mount: a gitlink under <path> to a particular commit  
of a subrepo, and an entry in .gitrepos. This is very similar to how  
git-submodule works.

The entry in .gitrepos specifies two things: the name of the  
subrepository mounted under <path>, and the active branch in that  
mount at the time of commit. So .gitrepos would look like this:

[mount "<path>"]
    subrepo = <subname>
    branch = <branchname>

Rationale: by storing the active branch name we can cater for the very  
common case where we check out a gitlink pointing to the current head  
of the branch. Then, when we check out the subrepository at the mount  
point, we can adjust HEAD to point to the correct branch.
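
A rough sketch of how a tool might read such a file, assuming only
the two keys shown above and a simplified subset of git's config
syntax (no escapes, no continuation lines). The name parse_gitrepos
is hypothetical.

```python
import re

# Matches a section header of the form: [mount "<path>"]
_MOUNT = re.compile(r'^\[mount "(?P<path>[^"]+)"\]$')


def parse_gitrepos(text):
    """Return {path: {"subrepo": ..., "branch": ...}} from .gitrepos text."""
    mounts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            m = _MOUNT.match(line)
            # Unknown sections are skipped rather than guessed at.
            current = mounts.setdefault(m.group("path"), {}) if m else None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return mounts
```

A real implementation would more likely delegate to "git config -f
.gitrepos" than parse the file by hand.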

By mapping from a path to a subrepository (instead of the other
way around, as git-submodule does), we can have multiple mount points for
the same subrepository, presumably with different active branches.
Sometimes we want to have separate working trees for various branches,  
and it's good to be able to store this configuration in the containing  
tree.

* Working tree layout

When a tree containing mount points is checked out, a repository is  
created at each of those mount points. For every <path> specified in  
.gitrepos with subrepo <subname> and active branch <branchname>, and a  
gitlink in <path> pointing to <commit>, we do the following:

- Create a repository under <path>/.git

- Add the object store of the containing repository to  
<path>/.git/objects/info/alternates

- Pull (just copy, really) the containing repository's references to  
the subrepository as follows:

  - refs/heads/subrepos/<subname>/* -> refs/heads/*
  - refs/tags/subrepos/<subname>/* -> refs/tags/*
  - refs/remotes/<remote>/subrepos/<subname>/* -> refs/remotes/<remote>/*

- If now in the subrepository refs/heads/<branchname> points to  
<commit>, set HEAD as a symref to it. Otherwise set a detached HEAD  
directly to <commit>.

- Check out HEAD in the subrepository.
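
The ref copying and the HEAD decision in the steps above can be
sketched in Python as follows. This only models the bookkeeping on
plain dictionaries (ref name -> commit id) and does not touch any
repository; translate_refs and choose_head are names invented for
the example.

```python
def translate_refs(container_refs, subname):
    """Map the containing repository's refs for one subrepo to local refs."""
    sub_refs = {}
    for ref, commit in container_refs.items():
        parts = ref.split("/")
        # refs/{heads,tags}/subrepos/<subname>/<name...>
        if (len(parts) > 4 and parts[0] == "refs"
                and parts[1] in ("heads", "tags")
                and parts[2] == "subrepos" and parts[3] == subname):
            sub_refs["/".join(parts[:2] + parts[4:])] = commit
        # refs/remotes/<remote>/subrepos/<subname>/<name...>
        elif (len(parts) > 5 and parts[:2] == ["refs", "remotes"]
                and parts[3] == "subrepos" and parts[4] == subname):
            sub_refs["/".join(parts[:3] + parts[5:])] = commit
    return sub_refs


def choose_head(sub_refs, branch, commit):
    """Symref to the active branch if it matches the gitlink, else detached."""
    ref = f"refs/heads/{branch}"
    if branch and sub_refs.get(ref) == commit:
        return ("symref", ref)
    return ("detached", commit)
```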

Rationale: it was a tempting idea to make refs/heads and refs/tags
symlinks pointing directly to the correct subdirectories in the containing
repository, and likewise make objects/ directly a symlink to the  
containing repository's object store. However, this is not really  
feasible due to packed-refs, and it would make symlinks a requirement,  
something that git tries to avoid. (Of course "directory symrefs"  
would be a simple addition to the core.)

More importantly, a symlink to the object store would break git-gc.  
Also, it would be ugly to have ref manipulations under the mount point  
directly affect the refs in the containing repository. It's better  
that none of the changes under the mount point affect the containing  
repository in any way before an explicit add and check-in. At this  
point the refs are pulled back in the reverse direction.

* Basic commands

** git subrepo add <path> [<subname>]

Add a subrepository to the containing repository, or add the changes  
in a subrepository to the index.

If <path> is not yet found in .gitrepos, <subname> must be specified.  
Otherwise <subname> is looked up from .gitrepos.

The command performs the following:

- Add or update the gitlink to the index: git add <path>
- Add or change an entry in .gitrepos, setting mount.<path>.subrepo
to <subname> and mount.<path>.branch to the active branch under <path>
(if any).
- git add .gitrepos

** git subrepo checkin [-f] [<path>...]

Update the subrepo references in the containing repository to the  
references in the mount points. This is meant to be run as a  
pre-commit hook with no arguments.

If no paths are given, <path>... defaults to every mount path in  
.gitrepos that has been changed in the index. For each <path> mounting  
<subname>, perform the following:

- git fetch [-f] <path> refs/heads/*:refs/heads/subrepos/<subname>/*
- git fetch [-f] <path> refs/tags/*:refs/tags/subrepos/<subname>/*

If [-f] is given, it is passed to git fetch.

The operation can fail in the unlikely case that there are multiple  
mount points for the same subrepository, and a branch has diverged  
between those mount points.
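
As a sketch, the two fetch invocations for one mount point could be
assembled like this. The function only builds argument lists and
never runs git; checkin_commands is a made-up name, and the wildcard
on the destination side of each refspec is what I would expect git to
require.

```python
def checkin_commands(path, subname, force=False):
    """Build the argv lists for checking in one mount point."""
    flags = ["-f"] if force else []
    return [
        ["git", "fetch", *flags, path,
         f"refs/{kind}/*:refs/{kind}/subrepos/{subname}/*"]
        for kind in ("heads", "tags")
    ]
```

Without -f, I would expect fetch to refuse a non-fast-forward branch
update, which is how the divergence case described above would show
up.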

Note: after this operation, any new objects that were added under the  
mount point are now duplicated in the containing repository. A git gc  
in the containing repository followed by a git gc in the mount point  
should remove the now-redundant objects from the mount point.

Note: the default paths overlook the corner case where we have modified
the head of a non-active branch under the mount point, but the active
branch (and hence the commit in the gitlink) has remained unchanged.
I don't know if there's a reasonable way to make "git subrepo add"  
somehow stage even these kinds of changes.

** git subrepo checkout [<path>...]

Check out the subrepositories at mount points <path>..., or at all the  
mount points if none are specified. This is meant to be run as a  
post-checkout hook with no arguments.

This is described above in "Working tree layout". If this is not an  
initial checkout, then the first two steps are skipped and just the  
refs and working tree are updated.

** git subrepo mv <path> <path>

Move a mount point: git mv the actual directory and adjust the path in  
.gitrepos and possibly the relative path in  
<path>/.git/objects/info/alternates. (An absolute path would fix the  
latter, but then we couldn't move the entire containing repository.  
This is the lesser evil, IMHO.)
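
For illustration, the relative alternates entry for a mount point
could be computed as below. This assumes (as I understand the
repository layout documentation) that relative entries in
objects/info/alternates are resolved against the objects directory
itself; alternates_entry is a hypothetical helper.

```python
import posixpath


def alternates_entry(mount_path):
    """Relative path from <mount>/.git/objects to the container's objects.

    mount_path is relative to the top of the containing working tree.
    A dummy root is used so the result does not depend on where the
    containing repository actually lives.
    """
    root = "/"
    container_objects = posixpath.join(root, ".git", "objects")
    mount_objects = posixpath.join(root, mount_path, ".git", "objects")
    return posixpath.relpath(container_objects, mount_objects)
```

After "git subrepo mv sub a/b" the entry would need to change from
alternates_entry("sub") to alternates_entry("a/b"), since the mount
point is now one level deeper.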

Gripe: why doesn't git support arbitrary metadata for tree entries?  
Then we wouldn't need to worry about syncing various path attributes  
that are stored in separate files, but a simple git mv could  
automatically move everything associated with the path.

** git subrepo rm <path>

Remove the mount point and its entry in .gitrepos.

* A variant design

The above design is straightforward to implement, but it has a bit of  
an ad-hoc feel in that we have these magic commands that transfer refs  
between the containing repository and the mount points. But there are  
already standard tools for transferring refs: push and pull/fetch. It  
would be more "git-like" to use these directly, and make the  
containing repository be simply a remote for the mount point. We need  
a special remote for this purpose: git-remote-subrepo gives a "view"  
of the refs of a particular subrepo within the ref tree of the  
containing repository. It just makes the following translations for  
push and fetch:

subrepo://<URL>/<subname> refs/heads/<branchname>
-> <URL> heads/subrepos/<subname>/<branchname>

subrepo://<URL>/<subname> refs/tags/<tagname>
-> <URL> tags/subrepos/<subname>/<tagname>

subrepo://<URL>/<subname>/<remote> refs/heads/<branchname>
-> <URL> remotes/<remote>/heads/subrepos/<subname>/<branchname>

Then subrepo://<containingrepo>/<subname> is set as the origin in the  
mount point, so one can just do a normal git push to push the changes  
to the containing repository. Likewise, for all the remotes in the  
containing repository, a remote with the same name is created under  
the mount point with the url  
subrepo://<containingrepo>/<subname>/<remote>. Or it can be set to  
directly access the actual remote:  
subrepo://<url-of-remote>/<subname>. It's a matter of taste.
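
A sketch of the translation such a helper would perform for the two
simplest cases (heads and tags of the containing repository). It
assumes that <subname> is the last path component of the URL and
contains no slash, and it writes the target with a full refs/ prefix
to match the repository layout described earlier. The per-remote
variant is left out, and translate is an invented name.

```python
def translate(subrepo_url, ref):
    """Map (subrepo://<URL>/<subname>, ref) to (<URL>, ref in container)."""
    scheme = "subrepo://"
    if not subrepo_url.startswith(scheme):
        raise ValueError("not a subrepo URL: " + subrepo_url)
    container_url, sep, subname = subrepo_url[len(scheme):].rpartition("/")
    if not sep or not container_url or not subname:
        raise ValueError("expected subrepo://<URL>/<subname>")
    for kind in ("heads", "tags"):
        prefix = f"refs/{kind}/"
        if ref.startswith(prefix):
            name = ref[len(prefix):]
            return container_url, f"refs/{kind}/subrepos/{subname}/{name}"
    raise ValueError("unsupported ref: " + ref)
```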

The problem with explicit pushing to the containing repository is that  
then changes to the refs happen completely independently of changes to  
the gitlinks, and ideally these should be synchronized in a single  
commit. So I'm not quite sure if the additional complexity of a remote  
helper is warranted.

I hope I managed to make some sense of what this is about. Questions  
and comments are appreciated.

Cheers,

Lauri


* Re: A design for subrepositories
  2012-10-13 13:33 A design for subrepositories Lauri Alanko
@ 2012-10-13 17:30 ` Junio C Hamano
  2012-10-13 21:23   ` Lauri Alanko
  2012-10-13 21:20 ` perryh
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2012-10-13 17:30 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

"Lauri Alanko" <la@iki.fi> writes:

> I intend to work on a "subrepository" tool for git, but before I
> embark on the actual programming, I thought to first invite comments
> on the general design.
>
> Some background first. I know that there are several existing
> approaches already for managing nested repositories, but none of them
> quite seems to fit my purposes. My primary goal is to use git for home
> directory backup and mirroring, while the home directory itself may of
> course contain repositories.
> ...
> Submodules are a bit closer to what I want, but they have clearly been
> designed for a different purpose: a repository with submodules is only
> supposed to collate existing repositories, not act as a source for
> them.

I have a repository that covers my home directory and some of its
subdirectories have their own repositories.

I had my home directory and its subdirectories before Git ever
existed, and I made my home directory and these subdirectories into
separate, nested Git repositories fairly early after I started
managing them with Git---way before submodules were invented.  Now
the subdirectory repositories are bound as submodules of the top
level directory just fine.

I push these out for safekeeping purposes, all of my machines get
their copies from here, and some submodules are not cloned to work
machines (they house data of private nature).  They are used just
like you are expected to use submodules. In fact, this is pretty
much vanilla use case of submodules, I think.

They _all_ originate from under my home directory, not "collating
existing repositories" at all.

Have you considered how you can _extend_ submodules support to
support your use case better?  I think that would be a much more
useful approach, as you are likely to get help from other people who
do use submodules.


* Re: A design for subrepositories
  2012-10-13 17:30 ` Junio C Hamano
@ 2012-10-13 21:23   ` Lauri Alanko
  2012-10-14  4:36     ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Lauri Alanko @ 2012-10-13 21:23 UTC (permalink / raw)
  To: git

Quoting "Junio C Hamano" <gitster@pobox.com>:
> Now
> the subdirectory repositories are bound as submodules of the top
> level directory just fine.

This is indeed possible, but with some serious caveats.

Firstly, if you simply do "git submodule add ./foo" (the obligatory  
"./" being quite an unobvious pitfall), you get something quite  
fragile, since now we have submodule.foo.url = ./foo. If the  
submodules ever get reorganized and foo is moved to ./bar, then it is  
impossible to check out older versions or alternate branches, since  
the submodule is no longer where it is expected to be at the origin.

A more robust solution is to use submodule.foo.url =
./.git/modules/foo, since the logical name of a module doesn't change.
This seems quite kludgy, though, and this cannot be how git-submodule  
is supposed to be used.

But still, "git submodule update" only looks at the modules in the  
currently checked-out tree. If we have other branches or old tags that  
refer to other submodules, there's no simple way to fetch those, too.  
And there is not even such a concept as a bare repository with modules.

So git-submodule is fundamentally a tool to attach repositories into a  
tree, not to attach repositories into a repository. That's why it's  
not really fit for my purposes.

The core problem is that to clone an entire repository and all its  
submodules, there needs to be a way to list them all remotely. But the
git protocol doesn't allow us to simply list the subdirectories under
.git/modules. Still, there are several ways to do this:

* Just read .gitmodules in every ref and find by brute force every  
submodule referred to even by a single ref. This doesn't really scale.

* Maintain a list of all the submodules in a repository. This would  
have to be in a separate metadata branch, and would get rather hairy  
when we need to merge from a remote that has added other submodules.

* Represent the submodules as refs instead of independent  
repositories. This is my proposal for subrepositories.
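
To make the first option concrete, here is a minimal Python sketch of
the brute-force scan. It takes the .gitmodules contents per ref as
plain text (obtained, say, via "git show <ref>:.gitmodules") and only
recognizes the section headers; both function names are made up.

```python
import re

# Matches a section header of the form: [submodule "<name>"]
_SUBMODULE = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')


def submodule_names(gitmodules_text):
    """Names of the submodules declared in one .gitmodules file."""
    names = set()
    for line in gitmodules_text.splitlines():
        m = _SUBMODULE.match(line.strip())
        if m:
            names.add(m.group("name"))
    return names


def all_submodules(gitmodules_by_ref):
    """Union over every ref; the work grows with the number of refs."""
    result = set()
    for text in gitmodules_by_ref.values():
        result |= submodule_names(text)
    return sorted(result)
```

Every branch and tag contributes one file to read, which is why this
approach gets slower as refs accumulate.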

However, I feel that all of these are too drastic changes to make in  
git-submodule, given that it is already well-established.

The minor problems, like lack of active branch tracking and multiple  
mount points of a module, could in principle be fixed in  
git-submodule. But again, I have no fondness for complex shell  
programming. Perhaps it was justified when the only interface to git's  
functionality were the command-line tools, but nowadays there are  
various ways to manipulate git repositories from real programming  
languages through real libraries (libgit2, dulwich, etc), and I prefer  
to use those, so I don't really have any motivation to touch  
git-submodule.

Lauri


* Re: A design for subrepositories
  2012-10-13 21:23   ` Lauri Alanko
@ 2012-10-14  4:36     ` Junio C Hamano
  2012-10-14 10:19       ` Lauri Alanko
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2012-10-14  4:36 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

"Lauri Alanko" <la@iki.fi> writes:

> Firstly, if you simply do "git submodule add ./foo" (the obligatory
> "./" being quite an unobvious pitfall), you get something quite
> fragile, since now we have submodule.foo.url = ./foo. If the
> submodules ever get reorganized and foo is moved to ./bar, then it is
> impossible to check out older versions or alternate branches, since
> the submodule is no longer where it is expected to be at the origin.

Isn't that exactly what the "module name" vs "module path" mapping
in .gitmodules file is meant to address?

> But still, "git submodule update" only looks at the modules in the
> currently checked-out tree. If we have other branches or old tags that
> refer to other submodules, there's no simple way to fetch those, too.

Didn't I already suggest you to think about how you can improve
existing "git submodule" to suit your use case better?


* Re: A design for subrepositories
  2012-10-14  4:36     ` Junio C Hamano
@ 2012-10-14 10:19       ` Lauri Alanko
  2012-10-14 13:28         ` Jens Lehmann
  0 siblings, 1 reply; 17+ messages in thread
From: Lauri Alanko @ 2012-10-14 10:19 UTC (permalink / raw)
  To: git

Quoting "Junio C Hamano" <gitster@pobox.com>:

>> If the
>> submodules ever get reorganized and foo is moved to ./bar, then it is
>> impossible to check out older versions or alternate branches, since
>> the submodule is no longer where it is expected to be at the origin.
>
> Isn't that exactly what the "module name" vs "module path" mapping
> in .gitmodules file is meant to address?

Yes, and as I showed after the part you quoted, it is possible to  
refer to a module by name, although it looks like such a hack that I  
can't imagine it's currently something that git-submodule is intended  
to support.

>> But still, "git submodule update" only looks at the modules in the
>> currently checked-out tree. If we have other branches or old tags that
>> refer to other submodules, there's no simple way to fetch those, too.

> Didn't I already suggest you to think about how you can improve
> existing "git submodule" to suit your use case better?

Yes, and I listed three possible ways. Two of them seem technically  
unattractive, whereas one of them (submodules as ref directories)  
seems like a huge change that could introduce incompatibilities. That  
is why a separate tool seems like a cleaner choice.

If you want enhancements to git-submodule, at least deign to comment  
on the issues above.

There is actually a fourth alternative: extend the git protocol so  
that a remote repository could be queried for its list of submodules.  
But this seems particularly icky: git is at its core such a low-level  
framework. Nested repositories are such a high-level concept that  
something is wrong if the core needs specialized support for it. The  
ref directories approach, on the other hand, is completely transparent  
to standard tools.

Lauri


* Re: A design for subrepositories
  2012-10-14 10:19       ` Lauri Alanko
@ 2012-10-14 13:28         ` Jens Lehmann
  2012-10-14 15:27           ` Lauri Alanko
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Lehmann @ 2012-10-14 13:28 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

On 14.10.2012 12:19, Lauri Alanko wrote:
> Quoting "Junio C Hamano" <gitster@pobox.com>:
> 
>>> If the
>>> submodules ever get reorganized and foo is moved to ./bar, then it is
>>> impossible to check out older versions or alternate branches, since
>>> the submodule is no longer where it is expected to be at the origin.
>>
>> Isn't that exactly what the "module name" vs "module path" mapping
>> in .gitmodules file is meant to address?
> 
> Yes, and as I showed after the part you quoted, it is possible to
> refer to a module by name, although it looks like such a hack that I
> can't imagine it's currently something that git-submodule is intended
> to support.

Your initial statement is not correct. It is possible to check out older
versions or alternate branches (at least since we moved the .git directory
into the .git directory of the superproject). So no improvement gained
here by your proposal (although I concede that the current user experience
is suboptimal until my recursive submodule update work hits mainline).

>>> But still, "git submodule update" only looks at the modules in the
>>> currently checked-out tree. If we have other branches or old tags that
>>> refer to other submodules, there's no simple way to fetch those, too.

Did you notice that "git fetch" fetches all those submodules too which
have been updated in the commits fetched for the superproject, no matter
on what branch they are on?

>> Didn't I already suggest you to think about how you can improve
>> existing "git submodule" to suit your use case better?
> 
> Yes, and I listed three possible ways. Two of them seem technically
> unattractive, whereas one of them (submodules as ref directories)
> seems like a huge change that could introduce incompatibilities. That
> is why a separate tool seems like a cleaner choice.

What's wrong with making git clone all submodules together with the
superproject (when the user said he wants to update all submodules on
clone too by setting a - still to be added - config option)? That's my
plan to make automagic recursive submodule cloning work and it would
clone all submodules seen in the history of the superproject to
.git/modules so they could easily be checked out later (and those
present in the HEAD of the superproject will be checked out immediately
like "git clone --recurse-submodules" does right now). We're not there
yet, but that's how I believe that should work.

> There is actually a fourth alternative: extend the git protocol so
> that a remote repository could be queried for its list of submodules.

That information is contained in the different versions of the .gitmodules
file, so no need to extend anything here.

I saw nothing in your proposal which couldn't be handled by submodules,
and for every issue there already have been proposals on how to do that.
So adding another tool doesn't make any sense here. But you are welcome
to help us improve the submodule script (and some core commands too)
to make submodules cover your use case too.


* Re: A design for subrepositories
  2012-10-14 13:28         ` Jens Lehmann
@ 2012-10-14 15:27           ` Lauri Alanko
  2012-10-14 16:10             ` Jens Lehmann
                               ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Lauri Alanko @ 2012-10-14 15:27 UTC (permalink / raw)
  To: git

Quoting "Jens Lehmann" <Jens.Lehmann@web.de>:

>>>> If the
>>>> submodules ever get reorganized and foo is moved to ./bar, then it is
>>>> impossible to check out older versions or alternate branches, since
>>>> the submodule is no longer where it is expected to be at the origin.

> Your initial statement is not correct.

Please elaborate. My initial statement was about "git submodule add  
./foo", and this is what I get:

la@bq:~/tmp$ git --version
git version 1.8.0.rc2.2.gfc364c7
la@bq:~/tmp$ git init super
Initialized empty Git repository in /home/la/tmp/super/.git/
la@bq:~/tmp$ cd super
la@bq:~/tmp/super$ echo foo > foo
la@bq:~/tmp/super$ git add foo
la@bq:~/tmp/super$ git ci -m foo
[master (root-commit) a0dd543] foo
  1 file changed, 1 insertion(+)
  create mode 100644 foo
la@bq:~/tmp/super$ git init sub
Initialized empty Git repository in /home/la/tmp/super/sub/.git/
la@bq:~/tmp/super$ cd sub
la@bq:~/tmp/super/sub$ echo bar > bar
la@bq:~/tmp/super/sub$ git add bar
la@bq:~/tmp/super/sub$ git ci -m bar
[master (root-commit) a6ee6d6] bar
  1 file changed, 1 insertion(+)
  create mode 100644 bar
la@bq:~/tmp/super/sub$ cd ..
la@bq:~/tmp/super$ git submodule add ./sub
Adding existing repo at 'sub' to the index
la@bq:~/tmp/super$ git ci -m sub
[master cb289e8] sub
  2 files changed, 4 insertions(+)
  create mode 100644 .gitmodules
  create mode 160000 sub
la@bq:~/tmp/super$ git branch old
la@bq:~/tmp/super$ git mv sub movedsub
fatal: source directory is empty, source=sub, destination=movedsub
la@bq:~/tmp/super$ mv sub movedsub
la@bq:~/tmp/super$ git rm sub
rm 'sub'
la@bq:~/tmp/super$ git add movedsub
la@bq:~/tmp/super$ git config -f .gitmodules submodule.sub.path movedsub
la@bq:~/tmp/super$ git config -f .gitmodules submodule.sub.url ./movedsub
la@bq:~/tmp/super$ git ci -am movedsub
[master 5598bc0] movedsub
  2 files changed, 2 insertions(+), 2 deletions(-)
  rename sub => movedsub (100%)
la@bq:~/tmp/super$ cd ..
la@bq:~/tmp$ git clone super superc
Cloning into 'superc'...
done.
la@bq:~/tmp$ cd superc
la@bq:~/tmp/superc$ git co old
Branch old set up to track remote branch old from origin.
Switched to a new branch 'old'
la@bq:~/tmp/superc$ git submodule update --init
Submodule 'sub' (/home/la/tmp/super/sub) registered for path 'sub'
fatal: repository '/home/la/tmp/super/sub' does not exist
Clone of '/home/la/tmp/super/sub' into submodule path 'sub' failed

So a normal relative path in .gitmodules to inside the tree is  
fragile, since the location of the submodule can change.

> Did you notice that "git fetch" fetches all those submodules too which
> have been updated in the commits fetched for the superproject, no matter
> on what branch they are on?

No. This would be great, but this is what I get:

la@bq:~/tmp$ git init super
Initialized empty Git repository in /home/la/tmp/super/.git/
la@bq:~/tmp$ cd super
la@bq:~/tmp/super$ echo foo > foo
la@bq:~/tmp/super$ git add foo
la@bq:~/tmp/super$ git ci -m foo
[master (root-commit) 0f207c9] foo
  1 file changed, 1 insertion(+)
  create mode 100644 foo
la@bq:~/tmp/super$ git branch nosubs
la@bq:~/tmp/super$ git init sub
Initialized empty Git repository in /home/la/tmp/super/sub/.git/
la@bq:~/tmp/super$ cd sub
la@bq:~/tmp/super/sub$ echo bar > bar
la@bq:~/tmp/super/sub$ git add bar
la@bq:~/tmp/super/sub$ git ci -m bar
[master (root-commit) 180c6c9] bar
  1 file changed, 1 insertion(+)
  create mode 100644 bar
la@bq:~/tmp/super/sub$ cd ..
la@bq:~/tmp/super$ git submodule add ./sub
Adding existing repo at 'sub' to the index
la@bq:~/tmp/super$ git ci -m sub
[master 16cff18] sub
  2 files changed, 4 insertions(+)
  create mode 100644 .gitmodules
  create mode 160000 sub
la@bq:~/tmp/super$ cd ..
la@bq:~/tmp$ git clone super superc
Cloning into 'superc'...
done.
la@bq:~/tmp$ cd superc
la@bq:~/tmp/superc$ git submodule update --init
Submodule 'sub' (/home/la/tmp/super/sub) registered for path 'sub'
Cloning into 'sub'...
done.
Submodule path 'sub': checked out '180c6c979289f4e25525003673e51d0e39dab8f6'
la@bq:~/tmp/superc$ cd ../super/sub
la@bq:~/tmp/super/sub$ echo baz >> bar
la@bq:~/tmp/super/sub$ git ci -am baz
[master 652c8b3] baz
  1 file changed, 1 insertion(+)
la@bq:~/tmp/super/sub$ cd ..
la@bq:~/tmp/super$ git ci -am subbaz
[master c7c3bfc] subbaz
  1 file changed, 1 insertion(+), 1 deletion(-)
la@bq:~/tmp/super$ cd ../superc
la@bq:~/tmp/superc$ git co nosubs
warning: unable to rmdir sub: Directory not empty
Branch nosubs set up to track remote branch nosubs from origin.
Switched to a new branch 'nosubs'
la@bq:~/tmp/superc$ git fetch --recurse-submodules=yes
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (2/2), done.
 From /home/la/tmp/super
    16cff18..c7c3bfc  master     -> origin/master
la@bq:~/tmp/superc$ git co master
Switched to branch 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
la@bq:~/tmp/superc$ git fetch --recurse-submodules=yes
Fetching submodule sub
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
 From /home/la/tmp/super/sub
    180c6c9..652c8b3  master     -> origin/master

So I had to checkout master in order to fetch the updates to the  
submodule used by master.

> What's wrong with making git clone all submodules together with the
> superproject (when the user said he wants to update all submodules on
> clone too by setting a - still to be added - config option)?

Depends on how it's done. In a previous mail I just considered various  
ways to do it. If I see correctly, your choice is to read .gitmodules  
from every branch and every tag to find the total set of submodules  
used by the repository. As I said already, that is certainly possible,  
but it's just not very scalable, if fetch operations slow down  
linearly in the number of tags.

But no matter the technical issues, it seems that you at least have  
the _intention_ to eventually support self-contained, HEAD-independent  
repository collections. That already is valuable information, thanks.

Lauri


* Re: A design for subrepositories
  2012-10-14 15:27           ` Lauri Alanko
@ 2012-10-14 16:10             ` Jens Lehmann
  2012-10-14 16:15             ` Jens Lehmann
  2012-10-14 16:25             ` Jens Lehmann
  2 siblings, 0 replies; 17+ messages in thread
From: Jens Lehmann @ 2012-10-14 16:10 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

On 14.10.2012 17:27, Lauri Alanko wrote:
> Quoting "Jens Lehmann" <Jens.Lehmann@web.de>:
> 
>>>>> If the
>>>>> submodules ever get reorganized and foo is moved to ./bar, then it is
>>>>> impossible to check out older versions or alternate branches, since
>>>>> the submodule is no longer where it is expected to be at the origin.
> 
>> Your initial statement is not correct.
> 
> Please elaborate. My initial statement was about "git submodule add
> ./foo", and this is what I get:
> 
> la@bq:~/tmp$ git --version
> git version 1.8.0.rc2.2.gfc364c7
> la@bq:~/tmp$ git init super
> Initialized empty Git repository in /home/la/tmp/super/.git/
> la@bq:~/tmp$ cd super
> la@bq:~/tmp/super$ echo foo > foo
> la@bq:~/tmp/super$ git add foo
> la@bq:~/tmp/super$ git ci -m foo
> [master (root-commit) a0dd543] foo
>  1 file changed, 1 insertion(+)
>  create mode 100644 foo
> la@bq:~/tmp/super$ git init sub
> Initialized empty Git repository in /home/la/tmp/super/sub/.git/
> la@bq:~/tmp/super$ cd sub
> la@bq:~/tmp/super/sub$ echo bar > bar
> la@bq:~/tmp/super/sub$ git add bar
> la@bq:~/tmp/super/sub$ git ci -m bar
> [master (root-commit) a6ee6d6] bar
>  1 file changed, 1 insertion(+)
>  create mode 100644 bar
> la@bq:~/tmp/super/sub$ cd ..
> la@bq:~/tmp/super$ git submodule add ./sub
> Adding existing repo at 'sub' to the index
> la@bq:~/tmp/super$ git ci -m sub
> [master cb289e8] sub
>  2 files changed, 4 insertions(+)
>  create mode 100644 .gitmodules
>  create mode 160000 sub
> la@bq:~/tmp/super$ git branch old
> la@bq:~/tmp/super$ git mv sub movedsub
> fatal: source directory is empty, source=sub, destination=movedsub

This error here indicates that we didn't teach git to properly move
a submodule yet. It is one of my next goals to make "git [submodule]
mv sub movedsub" do the right thing here. To do these steps manually
you'll additionally have to do the following before moving the
submodule (because after moving it the relative paths will be broken):

$ HASH=$(cd sub; git rev-parse HEAD)

> la@bq:~/tmp/super$ mv sub movedsub

Currently it is better to remove the submodule here, as recreating it
with a "git submodule update" later will get the relative paths right.

> la@bq:~/tmp/super$ git rm sub
> rm 'sub'
> la@bq:~/tmp/super$ git add movedsub

And to git this adds a completely different submodule (as its name
is not "sub"), which breaks your expectation. To do what you intended
use this line instead:

$ git update-index --add --cacheinfo 160000 $HASH movedsub

(With the "--name" option currently in the "next" branch of Junio's
repo a "git submodule add --name sub movedsub" should do the job.
Until then a bit more magic is necessary.)
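
The manual steps discussed here can be collected into one small helper. This is only a sketch under stated assumptions: the function name move_submodule_entry is made up, it must be run from the top level of the superproject, and it moves the work tree as in the quoted session. Where the work tree is only a checkout whose repository sits under .git/modules, removing it and letting "git submodule update" recreate it may be the safer choice, as noted above.

```shell
# Hypothetical helper: move the gitlink for submodule NAME from OLD to NEW
# by hand, keeping the submodule's name so history still refers to it.
move_submodule_entry() {
    old=$1 new=$2 name=$3

    # Remember which commit the gitlink points at before anything moves.
    hash=$(cd "$old" && git rev-parse HEAD) || return 1

    # Move the work tree, then swap the index entries.
    mv "$old" "$new" || return 1
    git update-index --force-remove "$old" || return 1
    git update-index --add --cacheinfo 160000 "$hash" "$new" || return 1

    # Only the recorded path changes; the section name stays the same.
    git config -f .gitmodules "submodule.$name.path" "$new" || return 1
    git add .gitmodules
}
```

For the session above this would be called as "move_submodule_entry sub movedsub sub", followed by a commit.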

> la@bq:~/tmp/super$ git config -f .gitmodules submodule.sub.path movedsub
> la@bq:~/tmp/super$ git config -f .gitmodules submodule.sub.url ./movedsub
> la@bq:~/tmp/super$ git ci -am movedsub
> [master 5598bc0] movedsub
>  2 files changed, 2 insertions(+), 2 deletions(-)
>  rename sub => movedsub (100%)
> la@bq:~/tmp/super$ cd ..
> la@bq:~/tmp$ git clone super superc
> Cloning into 'superc'...
> done.
> la@bq:~/tmp$ cd superc
> la@bq:~/tmp/superc$ git co old
> Branch old set up to track remote branch old from origin.
> Switched to a new branch 'old'
> la@bq:~/tmp/superc$ git submodule update --init
> Submodule 'sub' (/home/la/tmp/super/sub) registered for path 'sub'
> fatal: repository '/home/la/tmp/super/sub' does not exist
> Clone of '/home/la/tmp/super/sub' into submodule path 'sub' failed

And that fails because to be able to clone a submodule it has to be
pushed into its own repo first, so it can be cloned from there somewhere
else. After doing that this will work.
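
A minimal sketch of that "push it into its own repo first" step, assuming the submodule still lives in its original directory inside the superproject. The function name publish_submodule and the destination argument are made up for illustration; the git commands themselves are standard.

```shell
# Hypothetical helper: give the submodule at PATH its own bare repository
# at DEST and record that location as the URL of submodule NAME.
# Run it from the top level of the superproject.
publish_submodule() {
    path=$1 name=$2 dest=$3

    # A bare clone is something other clones can fetch from.
    git clone --quiet --bare "$path" "$dest" || return 1

    # Point .gitmodules at the published repository.
    git config -f .gitmodules "submodule.$name.url" "$dest"
}
```

After committing the updated .gitmodules, a fresh clone of the superproject can resolve the submodule from DEST instead of from a path inside the original work tree.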

> So a normal relative path in .gitmodules to inside the tree is fragile, since the location of the submodule can change.

As I said, the current user experience is suboptimal. The test case
'submodule update properly revives a moved submodule' in t7406 shows
what has to be done with current git to properly move a submodule,
which is way too much to remember for a regular git user.

* Re: A design for subrepositories
  2012-10-14 15:27           ` Lauri Alanko
  2012-10-14 16:10             ` Jens Lehmann
@ 2012-10-14 16:15             ` Jens Lehmann
  2012-10-14 16:25             ` Jens Lehmann
  2 siblings, 0 replies; 17+ messages in thread
From: Jens Lehmann @ 2012-10-14 16:15 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

On 14.10.2012 17:27, Lauri Alanko wrote:
> Quoting "Jens Lehmann" <Jens.Lehmann@web.de>:
>> What's wrong with making git clone all submodules together with the
>> superproject (when the user said he wants to update all submodules on
>> clone too by setting a - still to be added - config option)?
> 
> Depends on how it's done. In a previous mail I just considered various ways to do it. If I see correctly, your choice is to read .gitmodules from every branch and every tag to find the total set of submodules used by the repository. As I said already, that is certainly possible, but it's just not very scalable, if fetch operations slow down linearly in the number of tags.

Currently "git fetch" checks all newly fetched commits for changes in
gitlinks too, so that would just add another file to that. And as a
fetch is pretty much linear in the number of newly fetched commits
anyway, its performance impact should be minimal.
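
To make the idea of "checking for changes in gitlinks" concrete, here is a rough sketch that lists the submodule paths whose gitlink differs between two commits. It is not how "git fetch" is implemented: it compares only the two endpoints instead of walking every fetched commit, and the function name changed_gitlinks is invented for this example.

```shell
# Hypothetical helper: print the paths of gitlinks (mode 160000) that were
# added, removed, or changed between commits OLD and NEW.
changed_gitlinks() {
    # Raw diff lines look like:
    #   :<oldmode> <newmode> <oldsha> <newsha> <status><TAB><path>
    git diff-tree -r --raw "$1" "$2" |
    awk -F'\t' '{
        split($1, f, " ")
        if (f[1] == ":160000" || f[2] == "160000") print $2
    }'
}
```

Called as "changed_gitlinks ORIG_HEAD HEAD" or with the old and new value of a remote-tracking ref, it yields the candidates for a submodule fetch.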

* Re: A design for subrepositories
  2012-10-14 15:27           ` Lauri Alanko
  2012-10-14 16:10             ` Jens Lehmann
  2012-10-14 16:15             ` Jens Lehmann
@ 2012-10-14 16:25             ` Jens Lehmann
  2012-10-14 18:04               ` Junio C Hamano
  2012-10-14 22:59               ` A design for subrepositories Lauri Alanko
  2 siblings, 2 replies; 17+ messages in thread
From: Jens Lehmann @ 2012-10-14 16:25 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

On 14.10.2012 17:27, Lauri Alanko wrote:
> Quoting "Jens Lehmann" <Jens.Lehmann@web.de>:
>> Did you notice that "git fetch" fetches all those submodules too which
>> have been updated in the commits fetched for the superproject, no matter
>> on what branch they are on?
> 
> No. This would be great, but this is what I get:
> 
> la@bq:~/tmp$ git init super
> Initialized empty Git repository in /home/la/tmp/super/.git/
> la@bq:~/tmp$ cd super
> la@bq:~/tmp/super$ echo foo > foo
> la@bq:~/tmp/super$ git add foo
> la@bq:~/tmp/super$ git ci -m foo
> [master (root-commit) 0f207c9] foo
>  1 file changed, 1 insertion(+)
>  create mode 100644 foo
> la@bq:~/tmp/super$ git branch nosubs
> la@bq:~/tmp/super$ git init sub
> Initialized empty Git repository in /home/la/tmp/super/sub/.git/
> la@bq:~/tmp/super$ cd sub
> la@bq:~/tmp/super/sub$ echo bar > bar
> la@bq:~/tmp/super/sub$ git add bar
> la@bq:~/tmp/super/sub$ git ci -m bar
> [master (root-commit) 180c6c9] bar
>  1 file changed, 1 insertion(+)
>  create mode 100644 bar
> la@bq:~/tmp/super/sub$ cd ..
> la@bq:~/tmp/super$ git submodule add ./sub
> Adding existing repo at 'sub' to the index
> la@bq:~/tmp/super$ git ci -m sub
> [master 16cff18] sub
>  2 files changed, 4 insertions(+)
>  create mode 100644 .gitmodules
>  create mode 160000 sub
> la@bq:~/tmp/super$ cd ..
> la@bq:~/tmp$ git clone super superc
> Cloning into 'superc'...
> done.
> la@bq:~/tmp$ cd superc
> la@bq:~/tmp/superc$ git submodule update --init
> Submodule 'sub' (/home/la/tmp/super/sub) registered for path 'sub'
> Cloning into 'sub'...
> done.
> Submodule path 'sub': checked out '180c6c979289f4e25525003673e51d0e39dab8f6'
> la@bq:~/tmp/superc$ cd ../super/sub
> la@bq:~/tmp/super/sub$ echo baz >> bar
> la@bq:~/tmp/super/sub$ git ci -am baz
> [master 652c8b3] baz
>  1 file changed, 1 insertion(+)
> la@bq:~/tmp/super/sub$ cd ..
> la@bq:~/tmp/super$ git ci -am subbaz
> [master c7c3bfc] subbaz
>  1 file changed, 1 insertion(+), 1 deletion(-)
> la@bq:~/tmp/super$ cd ../superc
> la@bq:~/tmp/superc$ git co nosubs
> warning: unable to rmdir sub: Directory not empty
> Branch nosubs set up to track remote branch nosubs from origin.
> Switched to a new branch 'nosubs'
> la@bq:~/tmp/superc$ git fetch --recurse-submodules=yes
> remote: Counting objects: 3, done.
> remote: Compressing objects: 100% (2/2), done.
> remote: Total 2 (delta 1), reused 0 (delta 0)
> Unpacking objects: 100% (2/2), done.
> From /home/la/tmp/super
>    16cff18..c7c3bfc  master     -> origin/master
> la@bq:~/tmp/superc$ git co master
> Switched to branch 'master'
> Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
> la@bq:~/tmp/superc$ git fetch --recurse-submodules=yes
> Fetching submodule sub
> remote: Counting objects: 5, done.
> remote: Total 3 (delta 0), reused 0 (delta 0)
> Unpacking objects: 100% (3/3), done.
> From /home/la/tmp/super/sub
>    180c6c9..652c8b3  master     -> origin/master
> 
> So I had to checkout master in order to fetch the updates to the submodule used by master.

Yes, when you switch to a branch which hasn't got that submodule at all
that is the case (as currently the .gitmodules found in the work tree is
used to do the path -> name mapping). The culprit is that "git fetch" does
not yet examine the .gitmodules file of the commit it finds a submodule
change in, but uses the one currently found inside the work tree. But I'll
have to tackle that soon too, as that also poses a problem when the submodule
was moved. So "no matter what branch they are on" is not always correct
at the moment ;-)
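
As an illustration of reading the path -> name mapping from a commit instead of from the work tree, here is a small sketch. It relies on the "--blob" option of "git config", which is available in current git but was not part of the git used in this thread; the function name submodule_name_at is made up.

```shell
# Hypothetical helper: print the name of the submodule that COMMIT's own
# .gitmodules file records for PATH (nothing if there is no such entry).
submodule_name_at() {
    commit=$1 path=$2
    git config --blob "$commit:.gitmodules" \
        --get-regexp '^submodule\..*\.path$' 2>/dev/null |
    while read -r key value; do
        if [ "$value" = "$path" ]; then
            name=${key#submodule.}
            printf '%s\n' "${name%.path}"
        fi
    done
}
```

With this, "submodule_name_at origin/master sub" answers the question for a branch that is not checked out, which is the case that fails in the session above.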

Again, the user experience is currently suboptimal.

* Re: A design for subrepositories
  2012-10-14 16:25             ` Jens Lehmann
@ 2012-10-14 18:04               ` Junio C Hamano
  2012-10-14 19:32                 ` Jens Lehmann
  2012-10-19  0:31                 ` A design for distributed submodules Lauri Alanko
  2012-10-14 22:59               ` A design for subrepositories Lauri Alanko
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2012-10-14 18:04 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Lauri Alanko, git

Jens Lehmann <Jens.Lehmann@web.de> writes:

> Again, the user experience is currently suboptimal.

You mentioned multiple things in your responses that you are
planning to address, but I am wondering if the first step before
doing anything else is to have a list of known-to-be-suboptimal
things and publish it somewhere other people can find it.  Then
Lauri or others may be able to help code the design of the approach to
address them for items you already have designs for, and they may
even be able to help designing the approach for the ones you don't.

More importantly, they do not have to waste time coming up with
incompatible tools.  Adding "works in this scenario that is
different from those other slightly different tools" to the mix of
third-party tool set would fragment and confuse the user base
("which one of 47 different tools, all of which are incomplete,
should I use?") and dilute developer attention.  They all at some
point want to interact with the core side, and without an overall
consistent design and coordination, some of their demand on the core
side would end up being incompatible.

The "just let .gitmodules record which branch is of interest,
without checking out a specific commit bound to the superproject
tree and using as a base for diff" (aka floating submodule) could be
one of the items on the list, for example; to support it, we should
not have to throw the entire "git submodule" with the bathwater.

Thanks.

* Re: A design for subrepositories
  2012-10-14 18:04               ` Junio C Hamano
@ 2012-10-14 19:32                 ` Jens Lehmann
  2012-10-19  0:31                 ` A design for distributed submodules Lauri Alanko
  1 sibling, 0 replies; 17+ messages in thread
From: Jens Lehmann @ 2012-10-14 19:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lauri Alanko, git

On 14.10.2012 20:04, Junio C Hamano wrote:
> Jens Lehmann <Jens.Lehmann@web.de> writes:
> 
>> Again, the user experience is currently suboptimal.
> 
> You mentioned multiple things in your responses that you are
> planning to address, but I am wondering if the first step before
> doing anything else is to have a list of known-to-be-suboptimal
> things and publish it somewhere other people can find it.  Then
> Lauri or others may be able to help code the design of the approach to
> address them for items you already have designs for, and they may
> even be able to help designing the approach for the ones you don't.

I'm keeping such a list in the "Issues still to be tackled in this
repo" section of the Wiki page of my github repo:
   https://github.com/jlehmann/git-submod-enhancements/wiki

Currently that's just a collection of things to do and bugs to fix,
but if people are interested I'm willing to add descriptions of the
solutions I have in mind for those topics.

> More importantly, they do not have to waste time coming up with
> incompatible tools.  Adding "works in this scenario that is
> different from those other slightly different tools" to the mix of
> third-party tool set would fragment and confuse the user base
> ("which one of 47 different tools, all of which are incomplete,
> should I use?") and dilute developer attention.  They all at some
> point want to interact with the core side, and without an overall
> consistent design and coordination, some of their demand on the core
> side would end up being incompatible.
> 
> The "just let .gitmodules record which branch is of interest,
> without checking out a specific commit bound to the superproject
> tree and using as a base for diff" (aka floating submodule) could be
> one of the items on the list, for example; to support it, we should
> not have to throw the entire "git submodule" with the bathwater.

Yup, that's also on that list under "always tip" mode.

* A design for distributed submodules
  2012-10-14 18:04               ` Junio C Hamano
  2012-10-14 19:32                 ` Jens Lehmann
@ 2012-10-19  0:31                 ` Lauri Alanko
  2012-10-19 20:09                   ` Jens Lehmann
  1 sibling, 1 reply; 17+ messages in thread
From: Lauri Alanko @ 2012-10-19  0:31 UTC (permalink / raw)
  To: git

I think I finally agree that it's best to develop submodules further
rather than introduce a new tool for the functionality I require. Here
are some explicit proposals for submodules so we can at least establish
agreement on what should be done. These are in order of decreasing
importance (to me).

* Upstreamless submodules

If there is no 'url' key defined for a submodule in .gitmodules, there is
no "authoritative upstream" for it. When a recursive
fetch/pull/clone/push is performed on a remote superproject, its
upstreamless submodules are also fetched/pulled/cloned/pushed directly
from/to the submodule repositories under the superproject .git/modules.
If this is the first time that remote's submodules are accessed, that
remote is initialized for the local submodules: the submodule of the
remote superproject becomes a remote of the local submodule, and is
given the same name as the remote of the superproject.

So, suppose we have a superproject with .gitmodules:

[submodule "sub"]
	path = sub

which is hosted at repositories at URL1 and URL2. Then we do:

git clone --recursive URL1 super
cd super
git remote add other URL2
git fetch --recursive URL2

Now .git/modules/sub/config has:

[remote "origin"]
	url = URL1/.git/modules/sub
[remote "other"]
	url = URL2/.git/modules/sub

So the effect is similar to just setting the submodule's url as
".git/modules/sub", except that:

  - it hides the implementation detail of the exact location of the
    submodule repository from the publicly visible configuration file

  - it also works with bare remotes (where the actual remote submodule
    location would be URL/modules/sub)

  - it allows multiple simultaneous superproject remotes (where
    git-submodule currently always resolves relative urls against
    branch.$branch.remote with no option to fetch from a different
    remote).
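
The URL derivation proposed above can be written down in a few lines. This is a sketch of the proposal only, not of anything git does today; the function name submodule_remote_url and its "bare" argument are invented for the example.

```shell
# Hypothetical helper: given the URL of a superproject remote and a
# submodule name, print where the proposal would look for the submodule's
# repository. Pass "bare" as third argument for a bare superproject.
submodule_remote_url() {
    super_url=${1%/}   # tolerate a trailing slash
    name=$2
    if [ "$3" = bare ]; then
        printf '%s/modules/%s\n' "$super_url" "$name"
    else
        printf '%s/.git/modules/%s\n' "$super_url" "$name"
    fi
}
```

In the URL1/URL2 example, the submodule's remote "other" would then be set up with something like: git --git-dir=.git/modules/sub remote add other "$(submodule_remote_url URL2 sub)".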

* Submodule discovery across all refs

This is what Jens already mentioned. If we fetch multiple refs of a
remote superproject, we also need to fetch _all_ of the submodules
referenced by _any_ of the refs, not just the ones in the currently
active branch. Finding the complete list of submodules probably has to
be implemented by reading .gitmodules in all of the (updated) refs,
which is a bit ugly, but not too bad.

* Recording the active branch of a submodule

When a submodule is added, its active branch should be stored in
.gitmodules as submodule.$sub.branch. Then, when the submodule is
checked out, and the head of that branch is the same as the commit in
the gitlink (i.e. the superproject tree is "current"), then that branch
is set as the active branch in the checked-out submodule working tree.
Otherwise, a detached head is used.
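
The checkout rule just described could look roughly like this. It is a sketch of the proposed behaviour, assuming the branch name has already been read from submodule.$sub.branch and the gitlink commit from the superproject tree; the function name checkout_submodule_head is made up.

```shell
# Hypothetical helper: in the submodule at PATH, check out BRANCH if its
# tip equals the commit recorded in the gitlink, else detach at GITLINK.
checkout_submodule_head() {
    path=$1 branch=$2 gitlink=$3
    (
        cd "$path" || exit 1
        # Empty if the branch does not exist in the submodule.
        tip=$(git rev-parse --verify -q "refs/heads/$branch^{commit}")
        if [ "$tip" = "$gitlink" ]; then
            git checkout -q "$branch"
        else
            git checkout -q --detach "$gitlink"
        fi
    )
}
```

The subshell keeps the caller's working directory unchanged.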

* Multiple working trees for a submodule

A superproject may have multiple paths for the same submodule,
presumably for different commits. This is for cases where the
superproject is a snapshot of a developer's directory hierarchy, and the
developer is simultaneously working on multiple branches of a submodule
and it is convenient to have separate working trees for each of them.

This is a bit hard to express with the current .gitmodules format, since
paths are attributes of repository ids instead of vice versa. I'd
introduce an alternative section format where you can say:

[mount "path1"]
   module = sub
   branch = master

[mount "path2"]
   module = sub
   branch = topic

Implementing this is a bit intricate, since we need to use the
git-new-workdir method to create multiple working directories that share
the same refs, config, and object store, but have separate HEAD and
index. I think this is a problem with the repository layout: the
non-workdir-specific bits should all be in a single directory so that a
single symlink would be enough.
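
The proposed [mount] sections need no new parser, since "git config -f" reads arbitrary sections. The following sketch lists them; the section format is the hypothetical one from this proposal, and the function name list_mounts is invented for the example.

```shell
# Hypothetical helper: print "path<TAB>module<TAB>branch" for every
# [mount "<path>"] section in FILE (default: .gitmodules).
list_mounts() {
    file=${1:-.gitmodules}
    git config -f "$file" --get-regexp '^mount\..*\.module$' |
    while read -r key module; do
        path=${key#mount.}
        path=${path%.module}
        # The branch key is optional; an empty field means it is unset.
        branch=$(git config -f "$file" "mount.$path.branch" </dev/null)
        printf '%s\t%s\t%s\n' "$path" "$module" "$branch"
    done
}
```

For the two sections shown above this prints one line for path1 with branch master and one for path2 with branch topic.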

Obviously, I'm willing to implement the above functionalities since I
need them. However, I think I'm going to work in Dulwich (which doesn't
currently have any submodule support): a Python API is currently more
important to me than a command-line tool, and the git.git codebase
doesn't look like a very attractive place to contribute anyway. No
offense, it's just not to my tastes.

So the main reason I'd like to reach some tentative agreement about the
details of the proposal is to ensure that _once_ someone finally
implements this kind of functionality in git.git, it will use the same
configuration format and same conventions, so that it will be compatible
with my code. The compatibility between different tools is after all the
main reason for doing this stuff as an extension to submodules instead
of something completely different.

Lauri

* Re: A design for distributed submodules
  2012-10-19  0:31                 ` A design for distributed submodules Lauri Alanko
@ 2012-10-19 20:09                   ` Jens Lehmann
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Lehmann @ 2012-10-19 20:09 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

On 19.10.2012 02:31, Lauri Alanko wrote:
> I think I finally agree that it's best to develop submodules further
> rather than introduce a new tool for the functionality I require. Here
> are some explicit proposals for submodules so we can at least establish
> agreement on what should be done. These are in order of decreasing
> importance (to me).

Good to hear that!

> * Upstreamless submodules
> 
> If there is no 'url' key defined for a submodule in .gitmodules, there is
> no "authoritative upstream" for it. When a recursive
> fetch/pull/clone/push is performed on a remote superproject, its
> upstreamless submodules are also fetched/pulled/cloned/pushed directly
> from/to the submodule repositories under the superproject .git/modules.
> If this is the first time that remote's submodules are accessed, that
> remote is initialized for the local submodules: the submodule of the
> remote superproject becomes a remote of the local submodule, and is
> given the same name as the remote of the superproject.
> 
> So, suppose we have a superproject with .gitmodules:
> 
> [submodule "sub"]
>     path = sub
> 
> which is hosted at repositories at URL1 and URL2. Then we do:
> 
> git clone --recursive URL1 super
> cd super
> git remote add other URL2
> git fetch --recursive URL2
> 
> Now .git/modules/sub/config has:
> 
> [remote "origin"]
>     url = URL1/.git/modules/sub
> [remote "other"]
>     url = URL2/.git/modules/sub

So you want to automatically propagate the new superproject remote
"other" into the submodules?

> So the effect is similar to just setting the submodule's url as
> ".git/modules/sub", except that:
> 
>  - it hides the implementation detail of the exact location of the
>    submodule repository from the publicly visible configuration file
> 
>  - it also works with bare remotes (where the actual remote submodule
>    location would be URL/modules/sub)
> 
>  - it allows multiple simultaneous superproject remotes (where
>    git-submodule currently always resolves relative urls against
>    branch.$branch.remote with no option to fetch from a different
>    remote).

Maybe it's too late on a Friday evening in my timezone, but currently
I can't wrap my mind around what you have in mind here ... will try
again later.

> * Submodule discovery across all refs
> 
> This is what Jens already mentioned. If we fetch multiple refs of a
> remote superproject, we also need to fetch _all_ of the submodules
> referenced by _any_ of the refs, not just the ones in the currently
> active branch.

That is how things already work now (and it is done in an optimized
way because we only do a fetch in a submodule when the referenced
commit isn't already present locally). But the current limitation
is that only populated submodules are updated (we do a "git fetch"
inside the submodules work tree), so e.g. currently we can't follow
renames. We should also do a fetch for submodules which aren't
checked out but whose repo is found in .git/modules/<name>.
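
A rough sketch of that last idea: fetch in every submodule repository stored under .git/modules, whether or not it is checked out. The function name fetch_stored_submodules is made up, and the sketch assumes submodule names without slashes (so each repository is a direct child of .git/modules) and that each stored repository already has its remote configured.

```shell
# Hypothetical helper: run "git fetch" in each submodule repository kept
# under .git/modules of the superproject in the current directory.
fetch_stored_submodules() {
    for gitdir in .git/modules/*; do
        # Skip anything that does not look like a git directory.
        [ -f "$gitdir/HEAD" ] || continue
        git --git-dir="$gitdir" fetch --quiet ||
            echo "fetch failed in $gitdir" >&2
    done
}
```

Unlike a fetch run inside the submodule's work tree, this does not depend on the submodule being populated.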

> Finding the complete list of submodules probably has to
> be implemented by reading .gitmodules in all of the (updated) refs,
> which is a bit ugly, but not too bad.

Yes, this will be necessary to get the correct path -> name mapping
for submodules which aren't found in the work tree (e.g. because
they are renamed). (I will also need to peek into another commit's
.gitmodules file to make the recursive checkout work for appearing
submodules for the same reason)

> * Recording the active branch of a submodule
> 
> When a submodule is added, its active branch should be stored in
> .gitmodules as submodule.$sub.branch. Then, when the submodule is
> checked out, and the head of that branch is the same as the commit in
> the gitlink (i.e. the superproject tree is "current"), then that branch
> is set as the active branch in the checked-out submodule working tree.
> Otherwise, a detached head is used.

We had some discussions about a "floating" submodule model where the
submodules follow the tip of a branch configured in .gitmodules. That
looked similar to what you have in mind, except that the tip of that
branch is always used.

> * Multiple working trees for a submodule
> 
> A superproject may have multiple paths for the same submodule,
> presumably for different commits. This is for cases where the
> superproject is a snapshot of a developer's directory hierarchy, and the
> developer is simultaneously working on multiple branches of a submodule
> and it is convenient to have separate working trees for each of them.
> 
> This is a bit hard to express with the current .gitmodules format, since
> paths are attributes of repository ids instead of vice versa. I'd
> introduce an alternative section format where you can say:
> 
> [mount "path1"]
>   module = sub
>   branch = master
> 
> [mount "path2"]
>   module = sub
>   branch = topic
> 
> Implementing this is a bit intricate, since we need to use the
> git-new-workdir method to create multiple working directories that share
> the same refs, config, and object store, but have separate HEAD and
> index. I think this is a problem with the repository layout: the
> non-workdir-specific bits should all be in a single directory so that a
> single symlink would be enough.

I'm not sure how well that'll work. E.g. what happens if the user
configures the URL of "path1" to something else? It looks to me like
having the same repo copied under different .git/modules/<name> would
be a more robust approach, even though it wastes some disk space.

> Obviously, I'm willing to implement the above functionalities since I
> need them. However, I think I'm going to work in Dulwich (which doesn't
> currently have any submodule support): a Python API is currently more
> important to me than a command-line tool, and the git.git codebase
> doesn't look like a very attractive place to contribute anyway. No
> offense, it's just not to my tastes.
> 
> So the main reason I'd like to reach some tentative agreement about the
> details of the proposal is to ensure that _once_ someone finally
> implements this kind of functionality in git.git, it will use the same
> configuration format and same conventions, so that it will be compatible
> with my code. The compatibility between different tools is after all the
> main reason for doing this stuff as an extension to submodules instead
> of something completely different.

Fair enough. But I fear unless we code the same functionality in both
worlds at about the same time the assumption that it will be done in
the future in the git core in the same way you expect may fail.

Having said that: I expect to implement peeking into another commit's
.gitmodules to read the config next after I finished the rm and mv for
submodules (and intend to use it for doing a fetch first), so maybe we
can start with that?

* Re: A design for subrepositories
  2012-10-14 16:25             ` Jens Lehmann
  2012-10-14 18:04               ` Junio C Hamano
@ 2012-10-14 22:59               ` Lauri Alanko
  2012-10-15 17:10                 ` Jens Lehmann
  1 sibling, 1 reply; 17+ messages in thread
From: Lauri Alanko @ 2012-10-14 22:59 UTC (permalink / raw)
  To: git

>> la@bq:~/tmp/super$ git mv sub movedsub
>> fatal: source directory is empty, source=sub, destination=movedsub
>
> This error here indicates that we didn't teach git to properly move
> a submodule yet. It is one of my next goals to make "git [submodule]
> mv sub movedsub" do the right thing here.

I'll digress here a bit: I'm not really fond of the idea of adding
special-purpose support into the core git commands. It just makes them
messier, and there will always be other tools that won't be supported by
git directly. I'd much rather see an mv-hook that arbitrary extensions
could use to update metadata associated with a tree entry.

Indeed, one of the reasons a separate tool seemed attractive to me was
that that way I could be sure that the tool was a high-level utility
that was completely implemented on top of basic low-level git
operations. The fact that git's submodule support manifests as bits and
pieces in various parts of the core seems a bit worrisome to me.

(Moreover, it's confusing to the user. I read the git-submodule man page
and thought that that described all the available submodule operations.
Only now did I find out that clone and fetch also have built-in
submodule functionality.)

>> la@bq:~/tmp/super$ mv sub movedsub
>
> Currently it is better to remove the submodule here, as recreating it
> with a "git submodule update" later will get the relative paths right.

This was a bit of a special case, as this was the original directory
where we did "git init sub" and "git submodule add ./sub". So "sub"
actually contains the real repository, not a gitlink to
.git/modules/sub. Arguably "git submodule add" should move the local
submodule's repository there.

>> la@bq:~/tmp/super$ git rm sub
>> rm 'sub'
>> la@bq:~/tmp/super$ git add movedsub
>
> And to git this adds a completely different submodule (as its name
> is not "sub"), which breaks your expectation.

Submodule? This is just a normal git add, not git submodule add. I
thought this just adds to the index a gitlink with the head revision in
movedsub, which is the same as the head revision was in sub, so it's
detected as a move of a gitlink.

> To do what you intended
> use this line instead:
>
> $ git update-index --add --cacheinfo 160000 $HASH movedsub

Doesn't this do exactly the same thing as "git add" for a directory
containing a repository?

>> la@bq:~/tmp/superc$ git submodule update --init
>> Submodule 'sub' (/home/la/tmp/super/sub) registered for path 'sub'
>> fatal: repository '/home/la/tmp/super/sub' does not exist
>> Clone of '/home/la/tmp/super/sub' into submodule path 'sub' failed
>
> And that fails because to be able to clone a submodule it has to be
> pushed into its own repo first, so it can be cloned from there somewhere
> else. After doing that this will work.

Sorry, but I can't get this to work. To me it seems that when fetching
submodules from the origin, submodule.sub.url has to point to the actual
location of the repository, and if this is outdated or missing, the
fetch won't work.

It would make sense that if the url is missing, the submodule repo
inside origin's .git/modules would be used, but this doesn't seem to be
the case currently.

> Currently "git fetch" checks all newly fetched commits for changes in
> gitlinks too, so that would just add another file to that.

I only now realized that it is indeed enough to check .gitmodules only
in the _updated_ refs. The older refs are interested in their submodules
only up to a certain commit, and even if those submodules have been
updated upstream, we won't be interested in them until we have trees
with gitlinks pointing to the newer revisions.

So it turns out that my main technical argument against git-submodule's
potential scalability was false, and it is indeed feasible to make it
support all the features I require.

However, "always tip" mode would break this, since then even non-updated
branches might be interested in upstream changes to a submodule.

Anyway, I am a bit surprised to hear of such active development for
git-submodule. It's pretty old now (the shell script says 2007), and I
thought that if it were to ever support the kind of basic functionality
I require, it would do so already.

How soon do you envision support for bare repositories with submodules?

Lauri

* Re: A design for subrepositories
  2012-10-14 22:59               ` A design for subrepositories Lauri Alanko
@ 2012-10-15 17:10                 ` Jens Lehmann
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Lehmann @ 2012-10-15 17:10 UTC (permalink / raw)
  To: Lauri Alanko; +Cc: git

On 15.10.2012 00:59, Lauri Alanko wrote:
>>> la@bq:~/tmp/super$ git mv sub movedsub
>>> fatal: source directory is empty, source=sub, destination=movedsub
>>
>> This error here indicates that we didn't teach git to properly move
>> a submodule yet. It is one of my next goals to make "git [submodule]
>> mv sub movedsub" do the right thing here.
> 
> I'll digress here a bit: I'm not really fond of the idea of adding
> special-purpose support into the core git commands. It just makes them
> messier, and there will always be other tools that won't be supported by
> git directly. I'd much rather see an mv-hook that arbitrary extensions
> could use to update metadata associated with a tree entry.

One third of the participants of the GitSurvey2010 stated that they are
using submodules (e.g. more than gitattributes), so adding some support
for them into the core doesn't look that unwarranted to me. And believe
me, without putting support in there the user experience will stay
suboptimal.

> Indeed, one of the reasons a separate tool seemed attractive to me was
> that that way I could be sure that the tool was a high-level utility
> that was completely implemented on top of basic low-level git
> operations. The fact that git's submodule support manifests as bits and
> pieces in various parts of the core seems a bit worrisome to me.

I see it the other way around: The fact that submodules were
only accessible via the submodule script and not integrated into the
core made a lot of people (e.g. the Jenkins Git plugin we are using at
work) code around that. That wouldn't have been necessary if I had
finished my submodule update work at that time.

And e.g. you can't forget to add changes inside the submodule anymore
since diff and status learned to show those changes. And we still have
mis-merges at my dayjob due to not updated submodules, which will go
away the moment merge learns to update all submodules without merge
conflicts. And so on.

> (Moreover, it's confusing to the user. I read the git-submodule man page
> and thought that that described all the available submodule operations.
> Only now did I find out that clone and fetch also have built-in
> submodule functionality.)

Then the man page might need some overhaul. Care to take a look?

>>> la@bq:~/tmp/super$ mv sub movedsub
>>
>> Currently it is better to remove the submodule here, as recreating it
>> with a "git submodule update" later will get the relative paths right.
> 
> This was a bit of a special case, as this was the original directory
> where we did "git init sub" and "git submodule add ./sub". So "sub"
> actually contains the real repository, not a gitlink to
> .git/modules/sub. Arguably "git submodule add" should move the local
> submodule's repository there.

That sounds like a good idea.

>>> la@bq:~/tmp/super$ git rm sub
>>> rm 'sub'
>>> la@bq:~/tmp/super$ git add movedsub
>>
>> And to git this adds a completely different submodule (as its name
>> is not "sub"), which breaks your expectation.
> 
> Submodule? This is just a normal git add, not git submodule add. I
> thought this just adds to the index a gitlink with the head revision in
> movedsub, which is the same as the head revision was in sub, so it's
> detected as a move of a gitlink.

You're free to use simple gitlinks, but then you can't expect existing
and coming goodies - like git being able to move them around in the
work tree - to work all by themselves, because they are only possible
with submodule support.

>> To do what you intended
>> use this line instead:
>>
>> $ git update-index --add --cacheinfo 160000 $HASH movedsub
> 
> Doesn't this do exactly the same thing as "git add" for a directory
> containing a repository?

In my test case "git add movedsub" silently does nothing, as my
directory is empty. So I need the update-index here.
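
A minimal sketch of that difference, assuming a throwaway setup: the
temporary directory, the "demo" identity and the empty commit are made up
for illustration and are not taken from the session quoted above. It records
a gitlink (mode 160000) for an empty directory by writing the index entry
directly.

```shell
#!/bin/sh
# Sketch: record a gitlink for an empty directory with update-index.
# All paths, names and the identity below are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# A throwaway repository whose only job is to provide a commit hash.
git init -q sub
(cd sub && git commit -q --allow-empty -m "initial commit")
HASH=$(cd sub && git rev-parse HEAD)

# The superproject, with an empty directory where the submodule would live.
git init -q super
cd super
mkdir movedsub

# "git add movedsub" finds nothing to add in an empty directory, so write
# the gitlink entry (mode 160000) into the index directly.
git update-index --add --cacheinfo 160000 "$HASH" movedsub

# Show the resulting index entry: mode, object name, stage and path.
entry=$(git ls-files -s movedsub)
echo "$entry"
```

The commit named by $HASH does not have to exist in the superproject's
object store; the index entry only stores the name of the commit.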

>>> la@bq:~/tmp/superc$ git submodule update --init
>>> Submodule 'sub' (/home/la/tmp/super/sub) registered for path 'sub'
>>> fatal: repository '/home/la/tmp/super/sub' does not exist
>>> Clone of '/home/la/tmp/super/sub' into submodule path 'sub' failed
>>
>> And that fails because to be able to clone a submodule it has to be
>> pushed into its own repo first, so it can be cloned from there somewhere
>> else. After doing that this will work.
> 
> Sorry, but I can't get this to work. To me it seems that when fetching
> submodules from the origin, submodule.sub.url has to point to the actual
> location of the repository, and if this is outdated or missing, the
> fetch won't work.
> 
> It would make sense that if the url is missing, the submodule repo
> inside origin's .git/modules would be used, but this doesn't seem to be
> the case currently.

No, it isn't. Patches welcome ;-)
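
The workflow that does work today can be sketched as follows, assuming the
submodule is first published to a repository of its own. The temporary
paths, the "demo" identity and the bare repository name sub.git are
hypothetical. The protocol.file.allow setting is only there because newer
git versions refuse to clone submodules from local paths by default; it is
not part of the session quoted above.

```shell
#!/bin/sh
# Sketch: publish the submodule to its own repository, then clone the
# superproject and initialise the submodule from that published location.
# All paths and names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# The submodule's history, plus a bare copy standing in for "its own repo".
git init -q sub
(cd sub && git commit -q --allow-empty -m "initial commit")
git clone -q --bare sub sub.git

# The superproject records the published location as the submodule URL.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$tmp/sub.git" sub
git commit -q -m "add sub"
cd ..

# A fresh clone can now fetch the submodule, because the URL stored in
# .gitmodules points at a repository that actually exists.
git clone -q super superc
cd superc
git -c protocol.file.allow=always submodule update --init

orig_head=$(cd "$tmp/sub" && git rev-parse HEAD)
cloned_head=$(cd "$tmp/superc/sub" && git rev-parse HEAD)
echo "original: $orig_head"
echo "cloned:   $cloned_head"
```

If submodule.sub.url pointed at a path that no longer exists, as in the
failing session quoted above, the last "submodule update --init" would fail
in the same way.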

> Anyway, I am a bit surprised to hear of such active development for
> git-submodule. It's pretty old now (the shell script says 2007), and I
> thought that if it were to ever support the kind of basic functionality
> I require, it would do so already.

So much to do, so little time.

> How soon do you envision support for bare repositories with submodules?

I'm not sure what you mean by that and what your use case is, but I'll
be happy to discuss design issues with you. But as that is not my itch
I don't expect to be working on that soon, as my next main goal is to
get recursive checkout working (currently I'm removing the obstacles
I find in my way towards that, but there is still quite some work to
do until I get there).


* Re: A design for subrepositories
  2012-10-13 13:33 A design for subrepositories Lauri Alanko
  2012-10-13 17:30 ` Junio C Hamano
@ 2012-10-13 21:20 ` perryh
  1 sibling, 0 replies; 17+ messages in thread
From: perryh @ 2012-10-13 21:20 UTC (permalink / raw)
  To: la; +Cc: git

"Lauri Alanko" <la@iki.fi> wrote:

> I'm going to get a bit religious here:
> anything longer than a screenful shouldn't be written in shell ...

Whence cometh this religion?  I've heard of a modularity principle
wherein no one function, in any language, ought to be longer than a
page, but what's special about shell that warrants such a further
restriction?

BTW, to adherents of the mentioned religion, this:
http://www.freebsd.org/cgi/cvsweb.cgi/ports/ports-mgmt/portmaster/files/Attic/portmaster.sh.in?rev=2.32;content-type=text/plain
-- at just under 3600 lines -- is likely one of the greater heresies
around :)


end of thread: 2012-10-19 20:09 UTC

Thread overview: 17+ messages
2012-10-13 13:33 A design for subrepositories Lauri Alanko
2012-10-13 17:30 ` Junio C Hamano
2012-10-13 21:23   ` Lauri Alanko
2012-10-14  4:36     ` Junio C Hamano
2012-10-14 10:19       ` Lauri Alanko
2012-10-14 13:28         ` Jens Lehmann
2012-10-14 15:27           ` Lauri Alanko
2012-10-14 16:10             ` Jens Lehmann
2012-10-14 16:15             ` Jens Lehmann
2012-10-14 16:25             ` Jens Lehmann
2012-10-14 18:04               ` Junio C Hamano
2012-10-14 19:32                 ` Jens Lehmann
2012-10-19  0:31                 ` A design for distributed submodules Lauri Alanko
2012-10-19 20:09                   ` Jens Lehmann
2012-10-14 22:59               ` A design for subrepositories Lauri Alanko
2012-10-15 17:10                 ` Jens Lehmann
2012-10-13 21:20 ` perryh
