git@vger.kernel.org mailing list mirror (one of many)
* [RFD/PATCH] submodule doc: describe where we can configure them
@ 2016-05-03 23:26 Stefan Beller
From: Stefan Beller @ 2016-05-03 23:26 UTC
  To: jrnieder; +Cc: git, Stefan Beller

This is similar to the gitignore document, but doesn't mirror
the current situation. It is rather meant to start a discussion for
the right approach for mirroring repositories with submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

 Jonathan, is this something you had in mind?

 Documentation/git-submodule.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 13adebf..b5559e5 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -59,6 +59,22 @@ instead of treating the other project as a submodule. Directories
 that come from both projects can be cloned and checked out as a whole
 if you choose to go that route.
 
+Submodule operations can be configured using the following mechanisms
+(from highest to lowest precedence):
+
+ * the command line for those commands that support taking submodule specs.
+
+ * the configuration file `$GIT_DIR/config`.
+
+ * the configuration file `config` found in the `refs/submodule/config` branch.
+   This can be used to override the upstream configuration in the `.gitmodules`
+   file without changing the history of the project.
+   A useful option here is overriding the base URL against which relative
+   URLs are resolved, when mirroring only part of a larger set of submodules.
+
+ * the `.gitmodules` file inside the repository. A project usually includes this
+   file to suggest defaults for the upstream collection of repositories.
+
 COMMANDS
 --------
 add::
-- 
2.8.0.rc4.10.geb92688.dirty
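The precedence list in the patch amounts to a layered lookup: the first layer that defines a key wins, and a layer that says nothing defers to the next one. A minimal Python sketch, assuming each layer has already been parsed into a flat dict of `submodule.<name>.<key>` settings; the `resolve` helper and the layer labels are illustrative only, not Git internals:

```python
# Precedence order proposed in the patch, highest first. The labels are
# descriptive names for this sketch, not identifiers used by Git.
LAYERS = ["command line", "$GIT_DIR/config", "refs/submodule/config", ".gitmodules"]


def resolve(key, layers):
    """Return (value, layer) from the highest-precedence layer defining `key`.

    `layers` maps a layer label to a dict of already-parsed settings;
    missing layers are treated as empty. Returns (None, None) if no
    layer defines the key.
    """
    for layer in LAYERS:
        settings = layers.get(layer, {})
        if key in settings:
            return settings[key], layer
    return None, None


layers = {
    ".gitmodules": {"submodule.plugins.url": "../plugins/download-commands"},
    "refs/submodule/config": {"submodule.plugins.url": "https://mirror.example/plugins"},
}
print(resolve("submodule.plugins.url", layers))
# -> ('https://mirror.example/plugins', 'refs/submodule/config')
```

A `$GIT_DIR/config` entry for the same key would win over both, and an unknown key yields `(None, None)`.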

* Re: [RFD/PATCH] submodule doc: describe where we can configure them
@ 2016-05-03 23:56 ` Jonathan Nieder
From: Jonathan Nieder @ 2016-05-03 23:56 UTC
  To: Stefan Beller; +Cc: git, Jens Lehmann, Heiko Voigt

Stefan Beller wrote:

> This is similar to the gitignore document, but doesn't mirror
> the current situation. It is rather meant to start a discussion for
> the right approach for mirroring repositories with submodules.

Ooh.

[...]
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -59,6 +59,22 @@ instead of treating the other project as a submodule. Directories
>  that come from both projects can be cloned and checked out as a whole
>  if you choose to go that route.
>  
> +Submodule operations can be configured using the following mechanisms
> +(from highest to lowest precedence):
> +
> + * the command line for those commands that support taking submodule specs.
> +
> + * the configuration file `$GIT_DIR/config`.
> +
> + * the configuration file `config` found in the `refs/submodule/config` branch.
> +   This can be used to override the upstream configuration in the `.gitmodules`
> +   file without changing the history of the project.
> +   A useful option here is overriding the base URL against which relative
> +   URLs are resolved, when mirroring only part of a larger set of submodules.
> +
> + * the `.gitmodules` file inside the repository. A project usually includes this
> +   file to suggest defaults for the upstream collection of repositories.

(This documentation probably belongs in gitmodules(5) --- then,
git-submodule(1) could focus on command-line usage and point there for
configuration information.)

There are two aspects of this to be separated: what governs the behavior
of commands running locally, and where we get information about
submodules from a remote repository.

Local commands
--------------
The original submodule design was that local commands rely on
information from .git/config, and that information gets copied there
from .gitmodules when a submodule is initialized.  That way, a local
user can specify their preferred mirror or other options using some
straightforward 'git config' commands.

As a side effect, the settings in .git/config tell git which submodules
to pay attention to (which submodules were initialized).

When .gitmodules changes, the settings in .git/config are left alone,
since the end user *might* have manually set something up and we don't
want to trample on it.

This design is somewhat problematic for a few reasons:

- When I want to stop paying attention to a particular submodule and
  start paying attention to it again later, all my local settings are
  gone.

- When upstream adds a new submodule, I have to do the same manual
  work to change the options for that new submodule.

- When upstream changes submodule options (perhaps to fix a URL
  typo), I don't get those updates.

A fix is to use settings from .git/config when present and fall back
to .gitmodules when not.  I believe the submodule code has been slowly
moving in that direction for new features.  Perhaps we can do so for
existing features (like submodule.*.url) too.
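The third problem and the proposed fix can be shown with the two lookup strategies side by side. A minimal sketch, assuming both files are already parsed into dicts; `init_copy` and `lookup_with_fallback` are hypothetical helpers, not Git functions, and the URLs are made up:

```python
def init_copy(local_config, gitmodules, name):
    """Original design: 'git submodule init' copies the URL into .git/config
    once; later .gitmodules changes are not propagated."""
    key = f"submodule.{name}.url"
    local_config.setdefault(key, gitmodules[key])


def lookup_with_fallback(local_config, gitmodules, key):
    """Proposed design: a .git/config setting wins if present, otherwise
    the current .gitmodules value is used."""
    return local_config.get(key, gitmodules.get(key))


key = "submodule.lib.url"
gitmodules = {key: "https://example.com/libb"}  # upstream ships a typo

snapshot = {}
init_copy(snapshot, gitmodules, "lib")  # copied at init time

gitmodules[key] = "https://example.com/lib"  # upstream fixes the typo

print(snapshot[key])  # https://example.com/libb (stale copy)
print(lookup_with_fallback({}, gitmodules, key))  # https://example.com/lib
mirror = {key: "https://mirror.example/lib"}
print(lookup_with_fallback(mirror, gitmodules, key))  # local override wins
```

With the fallback, a user who never set the URL locally picks up the upstream fix, while a user who did set it keeps their own value.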

An alternative would have been to introduce a .git/info/submodules
file that overrides settings from .gitmodules, analogous to
.git/info/exclude overriding .gitignore and .git/info/attributes
overriding .gitattributes.  We are already using .git/config for
this so that doesn't seem necessary.

Remote repositories
-------------------
The .gitmodules file has some odd properties as a place to put
configuration:

- it is versioned.  There is no way to change URLs in an old version
  of .gitmodules retroactively when a URL has changed.

- it is controlled by whoever writes history.  There is no way for me
  to change the URLs in my mirror of https://gerrit.googlesource.com/gerrit
  to match my mirror's different filesystem layout without producing
  my own history that diverges from the commits I am mirroring.

When the URLs in .gitmodules are relative URLs, this means that if
I mirror a superproject, I have to mirror all its submodules, too,
with the same layout.  It's not so easy for me to publish my copy
of the parent project and the one subproject I made changes in --- I
have to mirror everything.  In particular, this means I can't mirror
https://gerrit.googlesource.com/gerrit to github.

When the URLs in .gitmodules are absolute URLs, this means that if
I mirror a superproject, I cannot ask people consuming my mirror to
use my mirrors of child projects, too.  I cannot publish my copy of
the parent project and the one subproject I made changes in and
expect people to be able to "git clone --recurse-submodules" the
result successfully.

It is as though refs were stored in a .gitrefs file, with all the
attendant disadvantages, instead of being a separate component of
the repository that a particular repository owner can manipulate
without changing history.

To fix this, we could allow additional .gitmodules settings to be put
in another ref (perhaps something like "refs/repository/config" to allow
sharing additional repository-specific configuration in other files
within the same tree --- e.g., branch descriptions).  The semantics:

* If there is a gitmodules file in refs/repository/config in the
  repository I clone, then the submodule settings from it are stored
  locally somewhere that overrides .gitmodules.  Perhaps
  .git/info/<remotename>/gitmodules?

* Later fetches from the remote would also update this gitmodules
  file.

* Settings from this gitmodules file can be overridden locally
  using 'git config' until an explicit "git submodule sync" is run to
  override the local configuration.

What do you think?

If two different remotes provide conflicting values for a setting
in their gitmodules files, git would error out and ask the user
to intervene with a tie-breaking "git config" setting.
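These semantics, including the conflict rule, can be sketched as follows. This assumes the per-remote files (the suggested .git/info/<remotename>/gitmodules) are already parsed into dicts; `effective_setting` and `SubmoduleConfigConflict` are hypothetical names, and the "git submodule sync" step is not modeled:

```python
class SubmoduleConfigConflict(Exception):
    """Remotes disagree on a setting and no local tie-breaker exists."""


def effective_setting(key, local_config, remote_overrides, gitmodules):
    """Resolve `key` under the proposed refs/repository/config semantics.

    Order: local 'git config' > per-remote gitmodules files > .gitmodules.
    `remote_overrides` maps a remote name to the parsed contents of its
    (hypothetical) .git/info/<remotename>/gitmodules file.
    """
    if key in local_config:
        return local_config[key]  # an explicit local setting breaks any tie
    values = {remote: published[key]
              for remote, published in remote_overrides.items()
              if key in published}
    if len(set(values.values())) > 1:
        raise SubmoduleConfigConflict(
            f"{key}: remotes disagree: {sorted(values.items())}; "
            "choose one with 'git config'"
        )
    if values:
        return next(iter(values.values()))
    return gitmodules.get(key)


key = "submodule.baz.url"
remotes = {"origin": {key: "https://b.example/baz"},
           "upstream": {key: "https://a.example/baz"}}
try:
    effective_setting(key, {}, remotes, {})
except SubmoduleConfigConflict as err:
    print("conflict:", err)
print(effective_setting(key, {key: "https://a.example/baz"}, remotes, {}))
```

Remotes that agree on a value do not trigger an error; only differing values require the tie-breaking local setting.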

Thanks,
Jonathan

* Re: [RFD/PATCH] submodule doc: describe where we can configure them
@ 2016-05-04  0:59   ` Stefan Beller
From: Stefan Beller @ 2016-05-04  0:59 UTC
  To: Jonathan Nieder; +Cc: git@vger.kernel.org, Jens Lehmann, Heiko Voigt

On Tue, May 3, 2016 at 4:56 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Stefan Beller wrote:
>
>> This is similar to the gitignore document, but doesn't mirror
>> the current situation. It is rather meant to start a discussion for
>> the right approach for mirroring repositories with submodules.
>
> Ooh.

Thanks for writing such a detailed answer. :)

>
> [...]
>> --- a/Documentation/git-submodule.txt
>> +++ b/Documentation/git-submodule.txt
>> @@ -59,6 +59,22 @@ instead of treating the other project as a submodule. Directories
>>  that come from both projects can be cloned and checked out as a whole
>>  if you choose to go that route.
>>
>> +Submodule operations can be configured using the following mechanisms
>> +(from highest to lowest precedence):
>> +
>> + * the command line for those commands that support taking submodule specs.
>> +
>> + * the configuration file `$GIT_DIR/config`.
>> +
>> + * the configuration file `config` found in the `refs/submodule/config` branch.
>> +   This can be used to override the upstream configuration in the `.gitmodules`
>> +   file without changing the history of the project.
>> +   A useful option here is overriding the base URL against which relative
>> +   URLs are resolved, when mirroring only part of a larger set of submodules.
>> +
>> + * the `.gitmodules` file inside the repository. A project usually includes this
>> +   file to suggest defaults for the upstream collection of repositories.
>
> (This documentation probably belongs in gitmodules(5) --- then,
> git-submodule(1) could focus on command-line usage and point there for
> configuration information.)

That makes sense!

>
> There are two aspects of this to be separated: what governs the behavior
> of commands running locally, and where we get information about
> submodules from a remote repository.

On first reading, this also seems to contain some "historical context".

>
> Local commands
> --------------
> The original submodule design was that local commands rely on
> information from .git/config, and that information gets copied there
> from .gitmodules when a submodule is initialized.  That way, a local
> user can specify their preferred mirror or other options using some
> straightforward 'git config' commands.
>
> As a side effect, the settings in .git/config tell git which submodules
> to pay attention to (which submodules were initialized).
>
> When .gitmodules changes, the settings in .git/config are left alone,
> since the end user *might* have manually set something up and we don't
> want to trample on it.
>
> This design is somewhat problematic for a few reasons:
>
> - When I want to stop paying attention to a particular submodule and
>   start paying attention to it again later, all my local settings are
>   gone.
>
> - When upstream adds a new submodule, I have to do the same manual
>   work to change the options for that new submodule.
>
> - When upstream changes submodule options (perhaps to fix a URL
>   typo), I don't get those updates.
>
> A fix is to use settings from .git/config when present and fall back
> to .gitmodules when not.  I believe the submodule code has been slowly
> moving in that direction for new features.  Perhaps we can do so for
> existing features (like submodule.*.url) too.
>
> An alternative would have been to introduce a .git/info/submodules
> file that overrides settings from .gitmodules, analogous to
> .git/info/exclude overriding .gitignore and .git/info/attributes
> overriding .gitattributes.  We are already using .git/config for
> this so that doesn't seem necessary.

I wonder whether it would nevertheless be a worthwhile goal to eventually
move the information about submodules to .git/info/submodules, as that
would bring consistency across different features of Git.

>
> Remote repositories
> -------------------
> The .gitmodules file has some odd properties as a place to put
> configuration:
>
> - it is versioned.  There is no way to change URLs in an old version
>   of .gitmodules retroactively when a URL has changed.

I would not call having a versioned place odd. Suppose your build process
is updated and the new build process produces new intermediate files. You
would eventually add these files to .gitignore, but when building old
revisions with the new build chain you would be surprised by all the
untracked files being displayed.
Another example: some test helper files were recently moved in git.git.
When you check out an older version of Git, you see a lot of test-* files
in your worktree, although they were ignored at another revision.

That paragraph got longer than expected, but I just wanted to say that
being versioned can be either good or bad.

>
> - it is controlled by whoever writes history.  There is no way for me
>   to change the URLs in my mirror of https://gerrit.googlesource.com/gerrit
>   to match my mirror's different filesystem layout without producing
>   my own history that diverges from the commits I am mirroring.

To come up with an analogy to ignored files:
If I build a project with a different build system, I may see untracked
files because they are not ignored by the .gitignore file.

I then have a way of ignoring them nevertheless in .git/info/exclude.
Sharing this information beyond this repository is hard, though, but
that has not been considered a needed feature so far?

>
> When the URLs in .gitmodules are relative URLs, this means that if
> I mirror a superproject, I have to mirror all its submodules, too,
> with the same layout.  It's not so easy for me to publish my copy
> of the parent project and the one subproject I made changes in --- I
> have to mirror everything.  In particular, this means I can't mirror
> https://gerrit.googlesource.com/gerrit to github.

That is because repository URLs work differently on these two hosts.
googlesource.com allows URLs that are nested one level deeper, e.g. Gerrit
references "../plugins/download-commands", so that the remote URL becomes
https://gerrit.googlesource.com/plugins/download-commands

On GitHub we cannot create another level of nesting, as its naming follows
the owner/name scheme.
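To make the difference concrete: a relative submodule URL is resolved against the superproject's remote URL, with each leading "../" dropping one path component. The sketch below is a simplification (Git's real resolution handles more URL forms than scheme://host/path); `resolve_relative` is a hypothetical helper:

```python
def resolve_relative(super_url, rel):
    """Resolve a relative submodule URL against the superproject's URL.

    Simplified: assumes super_url looks like scheme://host/path and only
    handles leading '../' and './' segments.
    """
    base = super_url.rstrip("/")
    while True:
        if rel.startswith("../"):
            if base.count("/") < 3:  # only scheme://host is left
                raise ValueError(f"{rel!r} climbs above the host of {super_url!r}")
            base = base.rsplit("/", 1)[0]  # drop the last path component
            rel = rel[3:]
        elif rel.startswith("./"):
            rel = rel[2:]
        else:
            return f"{base}/{rel}"


print(resolve_relative("https://gerrit.googlesource.com/gerrit",
                       "../plugins/download-commands"))
# -> https://gerrit.googlesource.com/plugins/download-commands
print(resolve_relative("https://github.com/owner/gerrit",
                       "../plugins/download-commands"))
# -> https://github.com/owner/plugins/download-commands
#    (three path levels, which is not an owner/name repository on GitHub)
```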

>
> When the URLs in .gitmodules are absolute URLs, this means that if
> I mirror a superproject, I cannot ask people consuming my mirror to
> use my mirrors of child projects, too.  I cannot publish my copy of
> the parent project and the one subproject I made changes in and
> expect people to be able to "git clone --recurse-submodules" the
> result successfully.


>
> It is as though refs were stored in a .gitrefs file, with all the
> attendant disadvantages, instead of being a separate component of
> the repository that a particular repository owner can manipulate
> without changing history.
>
> To fix this, we could allow additional .gitmodules settings to be put
> in another ref (perhaps something like "refs/repository/config" to allow
> sharing additional repository-specific configuration in other files
> within the same tree --- e.g., branch descriptions).  The semantics:
>
> * If there is a gitmodules file in refs/repository/config in the
>   repository I clone, then the submodule settings from it are stored
>   locally somewhere that overrides .gitmodules.  Perhaps
>   .git/info/<remotename>/gitmodules?
>
> * Later fetches from the remote would also update this gitmodules
>   file.
>
> * Settings from this gitmodules file can be overridden locally
>   using 'git config' until an explicit "git submodule sync" is run to
>   override the local configuration.
>
> What do you think?
>
> If two different remotes provide conflicting values for a setting
> in their gitmodules files, git would error out and ask the user
> to intervene with a tie-breaking "git config" setting.

Let's look at an example with C mirroring from B, who mirrors from A.

The user who clones the superproject from C may want to obtain submodules
from either C or B or A. All this can be configured in
the refs/repository/config value of C, but in case it is not configured in C,
it may fall back to the same branch from B. When and how would B get
that branch?

Thanks for writing out this detailed brain dump :)
Stefan

>
> Thanks,
> Jonathan

* Re: [RFD/PATCH] submodule doc: describe where we can configure them
@ 2016-05-04 15:01     ` Heiko Voigt
From: Heiko Voigt @ 2016-05-04 15:01 UTC
  To: Stefan Beller; +Cc: Jonathan Nieder, git@vger.kernel.org, Jens Lehmann

On Tue, May 03, 2016 at 05:59:58PM -0700, Stefan Beller wrote:
> On Tue, May 3, 2016 at 4:56 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> > Stefan Beller wrote:
> >
> >> This is similar to the gitignore document, but doesn't mirror
> >> the current situation. It is rather meant to start a discussion for
> >> the right approach for mirroring repositories with submodules.
> >
> > Ooh.
> 
> Thanks for writing such a detailed answer. :)

BTW, here is a pointer to the discussion (and what I wrote down) about
this from back in 2014:

https://github.com/jlehmann/git-submod-enhancements/wiki/Ideas#special-ref-overriding-gitmodules-values

> > To fix this, we could allow additional .gitmodules settings to be put
> > in another ref (perhaps something like "refs/repository/config" to allow
> > sharing additional repository-specific configuration in other files
> > within the same tree --- e.g., branch descriptions).  The semantics:
> >
> > * If there is a gitmodules file in refs/repository/config in the
> >   repository I clone, then the submodule settings from it are stored
> >   locally somewhere that overrides .gitmodules.  Perhaps
> >   .git/info/<remotename>/gitmodules?
> >
> > * Later fetches from the remote would also update this gitmodules
> >   file.
> >
> > * Settings from this gitmodules file can be overridden locally
> >   using 'git config' until an explicit "git submodule sync" is run to
> >   override the local configuration.
> >
> > What do you think?
> >
> > If two different remotes provide conflicting values for a setting
> > in their gitmodules files, git would error out and ask the user
> > to intervene with a tie-breaking "git config" setting.
> 
> Let's look at an example with C mirroring from B, who mirrors from A.
> 
> The user who clones the superproject from C may want to obtain submodules
> from either C or B or A. All this can be configured in
> the refs/repository/config value of C, but in case it is not configured in C,
> it may fall back to the same branch from B. When and how would B get
> that branch?

I think B has to set up that branch on its own when it starts to mirror
A and uses different submodule URLs or other configs.

Jonathan, you suggested copying the content from a remote to
.git/info/<remotename>/gitmodules locally. How would one get it to the
remote side? It seems to me as if we would need to implement additional
infrastructure to do this. Would it not be simpler if we just kept it on
a ref on the local side as well? We already have the infrastructure to
read those values from a ref. We only would need to add something to
write them. Then a simple push, which could be aliased in form of a
git-submodule subcommand, suffices to get the values to the remote.

That also solves issues when people clone from their working copy.

I would like to think a little bit further about the conflict situation
when two remotes provide values. Configuring this looks to me like
a nightmare for users. Maybe there is some sort of elegant solution?
E.g. we could use the values from remote A during a fetch from A, the
ones from B during a fetch from B, and no values from a special ref in
case there is no remote operation involved. Since the main goal is to
support forking of submodules, isn't there always a remote operation
involved?
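The per-remote idea avoids a tie-breaking setting altogether: only the values of whichever remote is being fetched from apply. A minimal sketch, assuming each remote's special-ref contents are already parsed into a dict; `url_for_fetch` is a hypothetical helper and the URLs are made up:

```python
def url_for_fetch(name, remote, remote_overrides, gitmodules):
    """Choose a submodule URL following the per-remote idea.

    Only the special-ref values published by `remote` are consulted, so
    values from different remotes never compete. With no remote operation
    involved (remote is None), only .gitmodules is used.
    """
    key = f"submodule.{name}.url"
    if remote is not None:
        published = remote_overrides.get(remote, {})
        if key in published:
            return published[key]
    return gitmodules[key]


gitmodules = {"submodule.baz.url": "../baz"}
overrides = {"B": {"submodule.baz.url": "https://b.example/baz"}}
print(url_for_fetch("baz", "B", overrides, gitmodules))   # https://b.example/baz
print(url_for_fetch("baz", "A", overrides, gitmodules))   # ../baz
print(url_for_fetch("baz", None, overrides, gitmodules))  # ../baz
```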

My suggested scheme above does not solve the currently quite typical use
case where you might 'git fetch' without submodules first and then do
the submodule fetches during a 'git submodule update'. On the other hand
in the 'ideal future world' where submodules behave like "normal files" the
fetch will be done during the superproject fetch so in that case we
could solve such conflicts.

One thing we could keep in mind is to only allow certain values in
such special refs, e.g. only the ones needed to support the fork
workflow. BTW, do we actually need to change any values other than the
URL? Additionally, we would ignore other values that are more related to
the overall project structure, e.g. submodule.<name>.ignore.
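Restricting what such a special ref may carry could be as simple as an allowlist applied when its contents are read. A minimal sketch, assuming that only submodule.<name>.url is permitted; `ALLOWED_KEYS` and `filter_special_ref` are hypothetical names:

```python
ALLOWED_KEYS = {"url"}  # assumption: only submodule.<name>.url may be overridden


def filter_special_ref(settings):
    """Split special-ref settings into allowed and ignored entries.

    Only submodule.<name>.<key> entries whose <key> is in ALLOWED_KEYS are
    kept. Submodule names may contain dots, so the key is split off at the
    last dot.
    """
    kept, ignored = {}, []
    for full_key, value in settings.items():
        section, _, key = full_key.rpartition(".")
        if section.startswith("submodule.") and key in ALLOWED_KEYS:
            kept[full_key] = value
        else:
            ignored.append(full_key)
    return kept, ignored


kept, ignored = filter_special_ref({
    "submodule.baz.url": "https://b.example/baz",
    "submodule.baz.ignore": "all",  # project-structure setting: not allowed
})
print(kept)     # {'submodule.baz.url': 'https://b.example/baz'}
print(ignored)  # ['submodule.baz.ignore']
```

Returning the ignored keys lets a caller warn about entries it dropped instead of discarding them silently.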

OK, after writing this it really feels like a lot of special-casing. I would
not really call it elegant. At the same time, limiting these special refs
to one special use case (forking) might help us keep the user
interface[1] simpler and conflict-free in the long run. Not sure. What
do you think?

Cheers Heiko

[1] Which is not the simplest already.

* Re: [RFD/PATCH] submodule doc: describe where we can configure them
@ 2016-05-04 20:48 ` Junio C Hamano
From: Junio C Hamano @ 2016-05-04 20:48 UTC
  To: Stefan Beller; +Cc: jrnieder, git

Stefan Beller <sbeller@google.com> writes:

> This is similar to the gitignore document, but doesn't mirror
> the current situation. It is rather meant to start a discussion for
> the right approach for mirroring repositories with submodules.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
>  Jonathan, is this something you had in mind?
>
>  Documentation/git-submodule.txt | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index 13adebf..b5559e5 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -59,6 +59,22 @@ instead of treating the other project as a submodule. Directories
>  that come from both projects can be cloned and checked out as a whole
>  if you choose to go that route.
>  
> +Submodule operations can be configured using the following mechanisms
> +(from highest to lowest precedence):
> +
> + * the command line for those commands that support taking submodule specs.

Sorry, but have we introduced <submodule spec> as a Git lingo?  What
does it mean?

> +
> + * the configuration file `$GIT_DIR/config`.
> +
> + * the configuration file `config` found in the `refs/submodule/config` branch.
> +   This can be used to override the upstream configuration in the `.gitmodules`
> +   file without changing the history of the project.
> +   A useful option here is overriding the base URL against which relative
> +   URLs are resolved, when mirroring only part of a larger set of submodules.

This smells like something server side people may come up with; how
would an end user with a usual "repository with working tree" layout
futz with this thing?  Can it even be checked out, or would we have
a UI similar to "notes"?

> + * the `.gitmodules` file inside the repository. A project usually includes this
> +   file to suggest defaults for the upstream collection of repositories.
> +
>  COMMANDS
>  --------
>  add::

* Re: [RFD/PATCH] submodule doc: describe where we can configure them
@ 2016-05-04 20:50       ` Stefan Beller
From: Stefan Beller @ 2016-05-04 20:50 UTC
  To: Heiko Voigt; +Cc: Jonathan Nieder, git@vger.kernel.org, Jens Lehmann

On Wed, May 4, 2016 at 8:01 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> On Tue, May 03, 2016 at 05:59:58PM -0700, Stefan Beller wrote:
>> On Tue, May 3, 2016 at 4:56 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> > Stefan Beller wrote:
>> >
>> >> This is similar to the gitignore document, but doesn't mirror
>> >> the current situation. It is rather meant to start a discussion for
>> >> the right approach for mirroring repositories with submodules.
>> >
>> > Ooh.
>>
>> Thanks for writing such a detailed answer. :)
>
> BTW, here is a pointer to the discussion (and what I wrote down) about
> this from back in 2014:
>
> https://github.com/jlehmann/git-submod-enhancements/wiki/Ideas#special-ref-overriding-gitmodules-values

Thanks for pointing at the prior discussion!
Although not much has happened since then, code-wise?

>
>> > To fix this, we could allow additional .gitmodules settings to be put
>> > in another ref (perhaps something like "refs/repository/config" to allow
>> > sharing additional repository-specific configuration in other files
>> > within the same tree --- e.g., branch descriptions).  The semantics:
>> >
>> > * If there is a gitmodules file in refs/repository/config in the
>> >   repository I clone, then the submodule settings from it are stored
>> >   locally somewhere that overrides .gitmodules.  Perhaps
>> >   .git/info/<remotename>/gitmodules?
>> >
>> > * Later fetches from the remote would also update this gitmodules
>> >   file.
>> >
>> > * Settings from this gitmodules file can be overridden locally
>> >   using 'git config' until an explicit "git submodule sync" is run to
>> >   override the local configuration.
>> >
>> > What do you think?
>> >
>> > If two different remotes provide conflicting values for a setting
>> > in their gitmodules files, git would error out and ask the user
>> > to intervene with a tie-breaking "git config" setting.
>>
>> Let's look at an example with C mirroring from B, who mirrors from A.
>>
>> The user who clones the superproject from C may want to obtain submodules
>> from either C or B or A. All this can be configured in
>> the refs/repository/config value of C, but in case it is not configured in C,
>> it may fall back to the same branch from B. When and how would B get
>> that branch?
>
> I think B has to set up that branch on its own when it starts to mirror
> A and uses different submodule URLs or other configs.
>
> Jonathan, you suggested copying the content from a remote to
> .git/info/<remotename>/gitmodules locally. How would one get it to the
> remote side? It seems to me as if we would need to implement additional
> infrastructure to do this. Would it not be simpler if we just kept it on
> a ref on the local side as well? We already have the infrastructure to
> read those values from a ref. We only would need to add something to
> write them. Then a simple push, which could be aliased in form of a
> git-submodule subcommand, suffices to get the values to the remote.

That is a good idea!

>
> That also solves issues when people clone from their working copy.
>
> I would like to think a little bit further about the conflict situation
> when two remotes provide values. Configuring this looks to me like
> a nightmare for users. Maybe there is some sort of elegant solution?
> E.g. we could use the values from remote A during a fetch from A, the
> ones from B during a fetch from B, and no values from a special ref in
> case there is no remote operation involved. Since the main goal is to
> support forking of submodules, isn't there always a remote operation
> involved?

Here is what I imagine:
When B mirrors from A, B sets up this special ref for its repository,
e.g. refs/meta/submodule-B, and has a symbolic ref pointing at it
(e.g. SUBMODULE_CONFIG pointing at refs/meta/submodule-B). The tree of
that branch contains a .gitmodules file which
sets up
  "submodule.baz.url = http://B/baz"
  "submodule.relativeBase = http://A"

That way anyone cloning from B would get
the superproject and the submodule baz from B while the
rest of the submodules are found at A.

When C mirrors from B, C adds another branch, refs/meta/submodule-C,
which can either be a fork of refs/meta/submodule-B with some changes on
top of it, or add a reference to refs/meta/submodule-B, i.e. the
configuration would be:

  "submodule.baseConfig = refs/meta/submodule-B"
  "submodule.foo.url = ssh://C/foo"

and SUBMODULE_CONFIG would point at refs/meta/submodule-C.

When cloning from C, the user would get

 * the superproject from C
 * submodule foo from C
 * submodule baz from B
 * all other submodules from A

Because C's branch inherits from B's, there are no conflicting values.
C could, for example, simply override submodule.baseConfig.
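The lookup described here can be modeled as a walk along the submodule.baseConfig chain. A sketch, assuming each refs/meta/submodule-* branch is parsed into a dict and that every submodule not overridden anywhere uses the relative URL "../<name>"; submodule.baseConfig and submodule.relativeBase are the keys proposed in this message, not existing Git settings, and `submodule_url` is a hypothetical helper:

```python
def submodule_url(name, start, branches):
    """Resolve submodule `name` by walking submodule.baseConfig links.

    `branches` maps a ref (e.g. refs/meta/submodule-C) to its parsed
    .gitmodules settings; `start` is the ref SUBMODULE_CONFIG points at.
    The first ref in the chain that sets submodule.<name>.url wins.
    Otherwise the first submodule.relativeBase seen is used, assuming
    (for this sketch) that the submodule's relative URL is "../<name>".
    """
    key = f"submodule.{name}.url"
    relative_base = None
    seen = set()
    ref = start
    while ref is not None:
        if ref in seen:
            raise ValueError(f"cycle in submodule.baseConfig at {ref}")
        seen.add(ref)
        cfg = branches[ref]
        if key in cfg:
            return cfg[key]
        if relative_base is None:
            relative_base = cfg.get("submodule.relativeBase")
        ref = cfg.get("submodule.baseConfig")
    if relative_base is None:
        raise KeyError(f"no URL for submodule {name!r}")
    return f"{relative_base}/{name}"


branches = {
    "refs/meta/submodule-B": {"submodule.baz.url": "http://B/baz",
                              "submodule.relativeBase": "http://A"},
    "refs/meta/submodule-C": {"submodule.baseConfig": "refs/meta/submodule-B",
                              "submodule.foo.url": "ssh://C/foo"},
}
for name in ("foo", "baz", "bar"):
    print(name, submodule_url(name, "refs/meta/submodule-C", branches))
# foo ssh://C/foo
# baz http://B/baz
# bar http://A/bar
```

This reproduces the four cases listed above: foo from C, baz from B, and everything else from A.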

>
> My suggested scheme above does not solve the currently quite typical use
> case where you might 'git fetch' without submodules first and then do
> the submodule fetches during a 'git submodule update'. On the other hand
> in the 'ideal future world' where submodules behave like "normal files" the
> fetch will be done during the superproject fetch so in that case we
> could solve such conflicts.
>
> One thing we could keep in mind is to only allow certain values in
> such special refs, e.g. only the ones needed to support the fork
> workflow. BTW, do we actually need to change any values other than the
> URL? Additionally, we would ignore other values that are more related to
> the overall project structure, e.g. submodule.<name>.ignore.

Maybe we eventually want a dedicated protocol field.
A, B and C may have different defaults for which protocol they use,
e.g. ssh at kernel.org, but http in a corporate mirror, because http is
the only protocol not blocked by the firewall. So I could imagine that a
complete mirror of submodules with relative URLs only wants to replace
ssh with http.
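For comparison, Git already has a client-side mechanism for this kind of rewriting, url.<base>.insteadOf, where the longest matching prefix wins. A protocol field published by a mirror could plausibly follow the same rule. A minimal sketch, in which the rules table and host names are hypothetical stand-ins for such a field:

```python
def rewrite_url(url, rules):
    """Rewrite `url` using the longest matching prefix in `rules`.

    `rules` maps a prefix to its replacement, like the longest-match rule
    Git documents for url.<base>.insteadOf. Unmatched URLs are unchanged.
    """
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return url
    prefix = max(matches, key=len)
    return rules[prefix] + url[len(prefix):]


# Hypothetical corporate mirror where only http passes the firewall.
rules = {"ssh://upstream.example/": "http://mirror.corp.example/"}
print(rewrite_url("ssh://upstream.example/pub/git.git", rules))
# -> http://mirror.corp.example/pub/git.git
```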

>
> OK, after writing this it really feels like a lot of special-casing. I would
> not really call it elegant. At the same time, limiting these special refs
> to one special use case (forking) might help us keep the user
> interface[1] simpler and conflict-free in the long run. Not sure. What
> do you think?



>
> Cheers Heiko
>
> [1] Which is not the simplest already.

* Re: [RFD/PATCH] submodule doc: describe where we can configure them
@ 2016-05-04 21:13   ` Junio C Hamano
From: Junio C Hamano @ 2016-05-04 21:13 UTC
  To: Jonathan Nieder; +Cc: Stefan Beller, git, Jens Lehmann, Heiko Voigt

Jonathan Nieder <jrnieder@gmail.com> writes:

> This design is somewhat problematic for a few reasons:
>
> - When I want to stop paying attention to a particular submodule and
>   start paying attention to it again later, all my local settings are
>   gone.

True; "[submodule "foo"] enabled = no" may also be a way to fix it
without throwing the baby out with the bathwater, though.

> - When upstream adds a new submodule, I have to do the same manual
>   work to change the options for that new submodule.

Because a new module is not automatically "init"ed by default?

Isn't "config only" vs "config with gitmodules fallback" orthogonal
to that issue?

> - When upstream changes submodule options (perhaps to fix a URL
>   typo), I don't get those updates.

True.

> A fix is to use settings from .git/config when present and fall back
> to .gitmodules when not.  

How would that fix the "upstream updated"?

I think an alternative suggested in an ancient time had a more
elaborate scheme:

 * Use .git/config as the authoritative source, but record
   sufficient information to detect the case and cope with it when
   an entry in .gitmodules changes (details below).

 * When seeing a new .gitmodules entry, either by "git pull" or even
   "git checkout other-branch", copy that to .git/config (just like
   what "git submodule init" does).  It would be a policy decision
   whether to enable it automatically or not.  If the policy is "no
   autoinit", then "module.<name>.inited = no" may also have to be
   added to .git/config at this point.

   Record contents of the entry in .gitmodules to the corresponding
   .git/config entry as "seen".

 * When the entry in .gitmodules for a submodule known to
   .git/config is different from what has been "seen", offer the
   user a chance to update the corresponding .git/config entry, and
   append it to the "seen" set of .gitmodules variants so that the
   user will not be bugged with "we see .gitmodules entry for module
   <foo> is different from anything you have ever seen; do you want
   to make corresponding changes to the module entry in your
   .git/config" again.

which would handle all of the above, and without using anything from
.gitmodules before the user has a chance to vet it.


* Re: Re: [RFD/PATCH] submodule doc: describe where we can configure them
  2016-05-04 20:50       ` Stefan Beller
@ 2016-05-08 21:54         ` Heiko Voigt
  2016-05-09 17:32           ` Stefan Beller
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Voigt @ 2016-05-08 21:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Nieder, git@vger.kernel.org, Jens Lehmann

Hi,

On Wed, May 04, 2016 at 01:50:24PM -0700, Stefan Beller wrote:
> On Wed, May 4, 2016 at 8:01 AM, Heiko Voigt <hvoigt@hvoigt.net> wrote:
> > On Tue, May 03, 2016 at 05:59:58PM -0700, Stefan Beller wrote:
> >> On Tue, May 3, 2016 at 4:56 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> >> > Stefan Beller wrote:
> >> >
> >> >> This is similar to the gitignore document, but doesn't mirror
> >> >> the current situation. It is rather meant to start a discussion for
> >> >> the right approach for mirroring repositories with submodules.
> >> >
> >> > Ooh.
> >>
> >> Thanks for writing such a detailed answer. :)
> >
> > BTW, here is a pointer to the discussion (and what I wrote down) about
> > this from back in 2014:
> >
> > https://github.com/jlehmann/git-submod-enhancements/wiki/Ideas#special-ref-overriding-gitmodules-values
> 
> Thanks for pointing at the prior discussion!
> Although not much happened since then (code wise)?

Yes, IIRC nothing happened code-wise. We got as far as a rough
consensus, but nobody actually stepped in to scratch that itch.

> > Jonathan you suggested to copy the content from a remote to
> > .git/info/<remotename>/gitmodules locally. How would one get it to the
> > remote side? It seems to me as if we would need to implement additional
> > infrastructure to do this. Would it not be simpler if we just kept it on
> > a ref on the local side as well? We already have the infrastructure to
> > read those values from a ref. We only would need to add something to
> > write them. Then a simple push, which could be aliased in form of a
> > git-submodule subcommand, suffices to get the values to the remote.
> 
> That is a good idea!

Thanks.

> > That also solves issues when people clone from their working copy.
> >
> > I would like to think a little bit further about the conflict situation
> > when two remotes are providing values. Configuring this looks to me like
> > a nightmare for users. Maybe there is some sort of elegant solution?
> > E.g. like we use the values from remote A during a fetch from A, the
> > ones from B during a fetch from B and no values from a special ref in
> > case there is no remote operation involved. Since the main goal is to
> > support forking of submodules isn't there always a remote operation
> > involved?
> 
> Here is what I imagine:
> When B mirrors from A, B sets up this special ref for its repository,
> e.g. refs/meta/submodule-B, and has a symbolic ref pointing at it
> (e.g. SUBMODULE_CONFIG pointing at refs/meta/submodule-B), whose tree
> contains a .gitmodules file which sets up
>   "submodule.baz.url = http://B/baz"
>   "submodule.relativeBase = http://A"
> 
> That way anyone cloning from B would get
> the superproject and the submodule baz from B while the
> rest of the submodules are found at A.

This sounds sensible. But the conflict I had in mind is a
different one. E.g. project A has a submodule B. And now A has a remote
1 where you publish and maybe another remote 2 where someone else (a
colleague?) publishes. Which configuration do you use? Here the two
remotes are independent instead of subsequent forks. In this case my
solution would be to use the configuration branch from 1 for B when
interacting with 1. I have not completely checked whether we always
have a remote at hand for such a resolution.

> When C mirrors from A, they add another branch refs/meta/submodule-C,
> which can either be a fork of refs/meta/submodule-B with some changes on
> top of it, or it can add a reference to refs/meta/submodule-B, i.e. the
> configuration would be:
> 
>   "submodule.baseConfig = refs/meta/submodule-B"
>   "submodule.foo.url = ssh://C/foo"
> 
> and SUBMODULE_CONFIG would point at refs/meta/submodule-C.
> 
> When cloning from C, the user would get
> 
>  * the superproject from C
>  * submodule foo from C
>  * submodule baz from B
>  * all other submodules from A
> 
> By the inheriting property of the branch of B there are no conflicting values.
> C could just overwrite submodule.baseConfig for example.

So that means in the default case we create a chain of all previous
forks embedded in the repository database. I am not saying that this is
necessarily a bad thing but I feel that it is a new property which we
should think about. It helps because users will get updated values from
sources that are in the chain. On the other hand it adds a lot of
dependencies which are points of failure in case a remote disappears. I
am undecided on this. I would prefer if we could let people play with it
a little (maybe on pu?) and then decide if there are practical pitfalls
with this.
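
To make the chain concrete, here is a minimal Python sketch of how a client might resolve submodule URLs by walking the submodule.baseConfig links from Stefan's example quoted above. Both submodule.baseConfig and submodule.relativeBase are proposals from this thread, not existing git keys. Each special ref is assumed to hold a flat key/value config, and relativeBase is assumed to stand in for the directory that a "../<name>" URL in .gitmodules normally points to.

```python
class MissingRef(Exception):
    """A ref named in the (hypothetical) baseConfig chain is not available."""


def lookup(refs, start, key):
    """Find `key`, falling back along hypothetical submodule.baseConfig links."""
    visited = set()
    ref = start
    while ref is not None:
        if ref in visited:
            raise ValueError(f"submodule.baseConfig cycle at {ref}")
        visited.add(ref)
        if ref not in refs:
            # The risk raised above: a vanished upstream breaks the lookup.
            raise MissingRef(ref)
        config = refs[ref]
        if key in config:
            return config[key]
        ref = config.get("submodule.baseConfig")
    return None


def submodule_url(refs, start, gitmodules, name):
    """Explicit URL from the chain, else the .gitmodules URL rebased."""
    url = lookup(refs, start, f"submodule.{name}.url")
    if url is not None:
        return url
    url = gitmodules[f"submodule.{name}.url"]
    base = lookup(refs, start, "submodule.relativeBase")
    if base is not None and url.startswith("../"):
        # Assumed semantics: relativeBase replaces the directory that
        # "../<name>" would otherwise resolve into.
        return base.rstrip("/") + "/" + url[len("../"):]
    return url


refs = {
    "refs/meta/submodule-B": {
        "submodule.baz.url": "http://B/baz",
        "submodule.relativeBase": "http://A",
    },
    "refs/meta/submodule-C": {
        "submodule.baseConfig": "refs/meta/submodule-B",
        "submodule.foo.url": "ssh://C/foo",
    },
}
gitmodules = {f"submodule.{n}.url": f"../{n}" for n in ("foo", "baz", "bar")}

for name in ("foo", "baz", "bar"):
    print(name, submodule_url(refs, "refs/meta/submodule-C", gitmodules, name))
# foo ssh://C/foo
# baz http://B/baz
# bar http://A/bar
```

If refs/meta/submodule-B is removed from `refs`, foo still resolves but the lookup for baz raises MissingRef. That is the point-of-failure concern described above.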

> > My suggested scheme above does not solve the currently quite typical use
> > case where you might 'git fetch' without submodules first and then do
> > the submodule fetches during a 'git submodule update'. On the other hand
> > in the 'ideal future world' where submodules behave like "normal files" the
> > fetch will be done during the superproject fetch so in that case we
> > could solve such conflicts.
> >
> > The main thing which we could keep in mind is that we only allow certain
> > values in such special refs. E.g. only the ones needed to support the
> > fork workflow. BTW, do we actually need to change other values than the
> > URL? Additionally we ignore other values that are more related to the
> > overall project structure, e.g. submodule.<name>.ignore.
> 
> Maybe we want to have a dedicated protocol field, eventually.
> A,B,C may have different standards on what they use by default,
> e.g. use ssh at kernel.org, but http in a corporate mirror because http is
> the only protocol not blocked by the firewall. So I could imagine that a
> complete mirror of submodules with relative URLs wants to replace only
> ssh with http.

By this you mean 'submodule.relativeBase' that you introduced above,
right? Or something similar. I would still consider these values
URL'ish. But my question was more geared towards this direction: Are
there other values than the ones used to assemble the URL that make
sense to share?

E.g.: Someone might want to fork a repository and might want to change
the default set of submodules that are populated with 'git submodule
update --init'. Is this something we should allow via these special refs
or is this actually changing the project structure and should also be
reflected in project history? IMO the latter is the case.

Only things like the technical organisation (like the place where a
repository can be found) justify being kept outside of the repository IMO.
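
A minimal Python sketch of what restricting special refs to location settings could look like. The allowlist is an assumption drawn from this discussion: submodule.<name>.url may be overridden outside history, along with the proposed (not existing) submodule.relativeBase. Structural settings such as submodule.<name>.ignore or submodule.<name>.update are ignored.

```python
import re

# Hypothetical allowlist for a special config ref: only settings that say
# where a repository lives. submodule.relativeBase is a proposal from this
# thread, not an existing git key. Submodule names may contain dots.
ALLOWED = [
    re.compile(r"^submodule\..+\.url$"),
    re.compile(r"^submodule\.relativeBase$"),
]


def filter_special_ref(config):
    """Split a special-ref config into accepted and ignored keys."""
    kept, ignored = {}, []
    for key, value in config.items():
        if any(pattern.match(key) for pattern in ALLOWED):
            kept[key] = value
        else:
            # Structural settings belong in .gitmodules, i.e. in history.
            ignored.append(key)
    return kept, ignored


kept, ignored = filter_special_ref({
    "submodule.baz.url": "http://B/baz",
    "submodule.relativeBase": "http://A",
    "submodule.baz.ignore": "all",
    "submodule.baz.update": "none",
})
print(kept)     # {'submodule.baz.url': 'http://B/baz', 'submodule.relativeBase': 'http://A'}
print(ignored)  # ['submodule.baz.ignore', 'submodule.baz.update']
```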

A repository without submodules does have one collection of remote
repository URLs. To me, adding proper fork support seems to be the switch
from one collection for one repository to many collections for many
repositories. Since this one collection is already outside of the
superproject it makes sense to do the same for the submodules. So my
question reformulated could be: Are there more values we currently keep
inside the repository for submodules that actually belong outside? A
good indication could be that they are already outside in the
superproject.

I did not find any flaw in these statements yet, but maybe I am
oversimplifying?

Cheers Heiko


* Re: Re: [RFD/PATCH] submodule doc: describe where we can configure them
  2016-05-04 21:13   ` Junio C Hamano
@ 2016-05-08 22:01     ` Heiko Voigt
  2016-05-09 16:19       ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Voigt @ 2016-05-08 22:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Stefan Beller, git, Jens Lehmann

Hi,

On Wed, May 04, 2016 at 02:13:47PM -0700, Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
> 
> > This design is somewhat problematic for a few reasons:
> >
> > - When I want to stop paying attention to a particular submodule and
> >   start paying attention to it again later, all my local settings are
> >   gone.
> 
> True; "[submodule "foo"] enabled = no" may also be a way to fix it
> without throwing the baby out with the bathwater, though.

IMO we already have this. With

	git config submodule.<name>.update none
	rm -rf <path>
	mkdir <path>

we remove a submodule from the working copy and make 'git submodule
update ...' skip it. Maybe we should add this 'light' operation
as an option to 'git submodule deinit' in the long run?

> > - When upstream adds a new submodule, I have to do the same manual
> >   work to change the options for that new submodule.
> 
> Because a new module is not automatically "init"ed by default?
> 
> Isn't "config only" vs "config with gitmodules fallback" orthogonal
> to that issue?

What do you mean by "orthogonal to that issue"? AFAICS a .gitmodules
fallback does not have that issue. Actually I would see it more like:
.gitmodules is the default and .git/config a possibility to override.
Viewed this way, with .gitmodules used directly by default, a user
does not have any issues when upstream changes submodule
configurations.
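
A minimal Python sketch of this view. It assumes both files have already been parsed into plain dicts of keys, which is not how git itself exposes them. collections.ChainMap looks a key up in .git/config first and falls back to .gitmodules, so an upstream URL fix shows through until the user overrides that key locally. The example.com URLs are placeholders.

```python
from collections import ChainMap


def effective_settings(git_config, gitmodules):
    """Local .git/config values win; .gitmodules supplies the defaults."""
    return ChainMap(git_config, gitmodules)


gitmodules = {"submodule.foo.url": "https://example.com/fo"}  # upstream typo
git_config = {}                                               # nothing overridden
settings = effective_settings(git_config, gitmodules)

gitmodules["submodule.foo.url"] = "https://example.com/foo"  # upstream fix
print(settings["submodule.foo.url"])  # https://example.com/foo

git_config["submodule.foo.url"] = "ssh://mirror/foo"  # explicit local override
print(settings["submodule.foo.url"])  # ssh://mirror/foo
```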

Or are we talking about subsequent forks from upstreams? Like C forked
from B and B from A... Then forget what I said.

> > - When upstream changes submodule options (perhaps to fix a URL
> >   typo), I don't get those updates.
> 
> True.

I would say it depends on what your default view is. See above.

> > A fix is to use settings from .git/config when present and fall back
> > to .gitmodules when not.  
> 
> How would that fix the "upstream updated"?

When .gitmodules is the default source, upstream updates are picked
up automatically.

> I think an alternative suggested in an ancient time had a more
> elaborate scheme:
> 
>  * Use .git/config as the authoritative source, but record
>    sufficient information to detect the case and cope with it when
>    an entry in .gitmodules changes (details below).
> 
>  * When seeing a new .gitmodules entry, either by "git pull" or even
>    "git checkout other-branch", copy that to .git/config (just like
>    what "git submodule init" does).  It would be a policy decision
>    whether to enable it automatically or not.  If the policy is "no
>    autoinit", then "module.<name>.inited = no" may also have to be
>    added to .git/config at this point.
> 
>    Record contents of the entry in .gitmodules to the corresponding
>    .git/config entry as "seen".
> 
>  * When the entry in .gitmodules for a submodule known to
>    .git/config is different from what has been "seen", offer the
>    user a chance to update the corresponding .git/config entry, and
>    append it to the "seen" set of .gitmodules variants so that the
>    user will not be bugged with "we see .gitmodules entry for module
>    <foo> is different from anything you have ever seen; do you want
>    to make corresponding changes to the module entry in your
>    .git/config" again.
> 
> which would handle all of the above, and without using anything from
> .gitmodules before the user has a chance to vet it.

I can see that for some users it might be important not to pull every
submodule that upstream decides they should have. On the other hand: Is
it really a decision a user can/should make during a pull or a checkout?
I would be annoyed by it, since it interrupts me from the thing I really
want to do and would mostly just choose some default (like always yes or
always no) depending on what is important to me (e.g. faster checkout or
complete repository). So IMO it is more sensible if we just give the
user some default to configure and then use that instead of asking
questions in a situation where the user is not ready to answer them.

And when the user has set his defaults we can actually try to deduce such
decisions directly from .gitmodules and do not need to store anything in
.git/config as long as the user goes with the defaults.
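
A minimal Python sketch of that idea, with hypothetical names throughout. The user configures one default and stores only explicit deviations. The decision for any submodule listed in .gitmodules, including one added upstream later, is then derived without prompting.

```python
def should_populate(name, explicit_choices, default_populate):
    """Decide without prompting whether submodule `name` gets populated.

    explicit_choices -- hypothetical per-submodule decisions the user stored
    default_populate -- hypothetical single user-level default
    """
    # Only deviations from the default ever need to be recorded.
    return explicit_choices.get(name, default_populate)


names_in_gitmodules = ["libfoo", "libbar", "docs"]  # e.g. after a pull
explicit = {"docs": False}  # the user opted out of this one submodule

print([n for n in names_in_gitmodules if should_populate(n, explicit, True)])
# ['libfoo', 'libbar']
```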

Cheers Heiko


* Re: [RFD/PATCH] submodule doc: describe where we can configure them
  2016-05-08 22:01     ` Heiko Voigt
@ 2016-05-09 16:19       ` Junio C Hamano
  2016-05-11 15:50         ` Heiko Voigt
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-05-09 16:19 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Jonathan Nieder, Stefan Beller, git, Jens Lehmann

Heiko Voigt <hvoigt@hvoigt.net> writes:

>> > - When upstream adds a new submodule, I have to do the same manual
>> >   work to change the options for that new submodule.
>> 
>> Because a new module is not automatically "init"ed by default?
>> 
>> Isn't "config only" vs "config with gitmodules fallback" orthogonal
>> to that issue?
>
> What do you mean by "orthogonal to that issue"? AFAICS a .gitmodules
> fallback does not have that issue.
>
> Actually I would see it more like:
> .gitmodules is the default and .git/config a possibility to override.

The way I read Jonathan's "I have to do the same manual..." above is:

  Back when I cloned, the upstream had one submodule A.  I didn't like
  some aspect of the configuration for that submodule so I did a
  customization in [submodule "A"] section of .git/config for it.

  Now the upstream added another submodule B.  I want a tweak similar
  to what I did to A applied to this one, but that would mean I need
  to edit the entry in .git/config copied by "init" from .gitmodules.

I do not see how the difference between ".git/config is the only source
of truth" or ".git/config overrides what is in .gitmodules" would
matter to the above scenario.


* Re: Re: [RFD/PATCH] submodule doc: describe where we can configure them
  2016-05-08 21:54         ` Heiko Voigt
@ 2016-05-09 17:32           ` Stefan Beller
  2016-05-11 16:54             ` Heiko Voigt
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Beller @ 2016-05-09 17:32 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: Jonathan Nieder, git@vger.kernel.org, Jens Lehmann

>> Here is what I imagine:
>> When B mirrors from A, B sets up this special ref for its repository,
>> e.g. refs/meta/submodule-B, and has a symbolic ref pointing at it
>> (e.g. SUBMODULE_CONFIG pointing at refs/meta/submodule-B), whose tree
>> contains a .gitmodules file which sets up
>>   "submodule.baz.url = http://B/baz"
>>   "submodule.relativeBase = http://A"
>>
>> That way anyone cloning from B would get
>> the superproject and the submodule baz from B while the
>> rest of the submodules are found at A.
>
> This sounds sensible. But the conflict I had in mind is a
> different one. E.g. project A has a submodule B. And now A has a remote
> 1 where you publish and maybe another remote 2 where someone else (a
> colleague?) publishes. Which configuration do you use? Here the two
> remotes are independent instead of subsequent forks. In this case my
> solution would be to use the configuration branch from 1 for B when
> interacting with 1. I have not completely checked whether we always
> have a remote at hand for such a resolution.

I think it is the responsibility of the pusher to make sure the
configuration is sane.
So if I were to push to remote 2 and you push to remote 1, we'd both configure
the special branch of our superprojects for these remotes for that submodule.

If the superproject has relative URLs for the submodule, all we have to do is
unset (or overwrite) the submodule.baseConfig.

>
>> When C mirrors from A, they add another branch refs/meta/submodule-C,
>> which can either be a fork of refs/meta/submodule-B with some changes on
>> top of it, or it can add a reference to refs/meta/submodule-B, i.e. the
>> configuration would be:
>>
>>   "submodule.baseConfig = refs/meta/submodule-B"
>>   "submodule.foo.url = ssh://C/foo"
>>
>> and SUBMODULE_CONFIG would point at refs/meta/submodule-C.
>>
>> When cloning from C, the user would get
>>
>>  * the superproject from C
>>  * submodule foo from C
>>  * submodule baz from B
>>  * all other submodules from A
>>
>> By the inheriting property of the branch of B there are no conflicting values.
>> C could just overwrite submodule.baseConfig for example.
>
> So that means in the default case we create a chain of all previous
> forks embedded in the repository database.

Not necessarily. I was just pointing out that this was possible. The
intermediate party could decide that their upstream is too unreliable
and not point to it.

This would incur the cost of having to clone all submodules and
overwrite the absolute URLs. For the relative URLs this would just
work as of now.

All I wanted with that example was to offer the flexibility of not
having to clone all the submodules, so that I can fork a mega-project
with 100s of submodules, maybe fiddle with just one of them, and then
publish that.

> I am not saying that this is
> necessarily a bad thing but I feel that it is a new property which we
> should think about. It helps because users will get updated values from
> sources that are in the chain. On the other hand it adds a lot of
> dependencies which are points of failure in case a remote disappears. I
> am undecided on this. I would prefer if we could let people play with it
> a little (maybe on pu?) and then decide if there are practical pitfalls
> with this.
>
>> > My suggested scheme above does not solve the currently quite typical use
>> > case where you might 'git fetch' without submodules first and then do
>> > the submodule fetches during a 'git submodule update'. On the other hand
>> > in the 'ideal future world' where submodules behave like "normal files" the
>> > fetch will be done during the superproject fetch so in that case we
>> > could solve such conflicts.
>> >
>> > The main thing which we could keep in mind is that we only allow certain
>> > values in such special refs. E.g. only the ones needed to support the
>> > fork workflow. BTW, do we actually need to change other values than the
>> > URL? Additionally we ignore other values that are more related to the
>> > overall project structure, e.g. submodule.<name>.ignore.
>>
>> Maybe we want to have a dedicated protocol field, eventually.
>> A,B,C may have different standards on what they use by default,
>> e.g. use ssh at kernel.org, but http in a corporate mirror because http is
>> the only protocol not blocked by the firewall. So I could imagine that a
>> complete mirror of submodules with relative URLs wants to replace only
>> ssh with http.
>
> By this you mean 'submodule.relativeBase' that you introduced above,
> right? Or something similar. I would still consider these values
> URL'ish. But my question was more geared towards this direction: Are
> there other values than the ones used to assemble the URL that make
> sense to share?
>
> E.g.: Someone might want to fork a repository and might want to change
> the default set of submodules that are populated with 'git submodule
> update --init'. Is this something we should allow via these special refs
> or is this actually changing the project structure and should also be
> reflected in project history? IMO the latter is the case.

That sounds reasonable.

>
> Only things like the technical organisation (like the place where a
> repository can be found) justify being kept outside of the repository IMO.
>
> A repository without submodules does have one collection of remote
> repository URLs. To me, adding proper fork support seems to be the switch
> from one collection for one repository to many collections for many
> repositories. Since this one collection is already outside of the
> superproject it makes sense to do the same for the submodules. So my
> question reformulated could be: Are there more values we currently keep
> inside the repository for submodules that actually belong outside? A
> good indication could be that they are already outside in the
> superproject.
>
> I did not find any flaw in these statements yet, but maybe I am
> oversimplifying?

They sound right to me.

>
> Cheers Heiko

Thanks for the discussion :)
Stefan


* Re: [RFD/PATCH] submodule doc: describe where we can configure them
  2016-05-09 16:19       ` Junio C Hamano
@ 2016-05-11 15:50         ` Heiko Voigt
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Voigt @ 2016-05-11 15:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Stefan Beller, git, Jens Lehmann

On Mon, May 09, 2016 at 09:19:44AM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
> 
> >> > - When upstream adds a new submodule, I have to do the same manual
> >> >   work to change the options for that new submodule.
> >> 
> >> Because a new module is not automatically "init"ed by default?
> >> 
> >> Isn't "config only" vs "config with gitmodules fallback" orthogonal
> >> to that issue?
> >
> > What do you mean by "orthogonal to that issue"? AFAICS a .gitmodules
> > fallback does not have that issue.
> >
> > Actually I would see it more like:
> > .gitmodules is the default and .git/config a possibility to override.
> 
> The way I read Jonathan's "I have to do the same manual..." above is:
> 
>   Back when I cloned, the upstream had one submodule A.  I didn't like
>   some aspect of the configuration for that submodule so I did a
>   customization in [submodule "A"] section of .git/config for it.
> 
>   Now the upstream added another submodule B.  I want a tweak similar
>   to what I did to A applied to this one, but that would mean I need
>   to edit the entry in .git/config copied by "init" from .gitmodules.
> 
> I do not see how the difference between ".git/config is the only source
> of truth" or ".git/config overrides what is in .gitmodules" would
> matter to the above scenario.

I see; with that explanation your comment makes sense to me. So what we
are talking about here is the wish to configure some general user-set
settings that are applied to a group of/all submodules.

Thinking about it: Maybe sticking configurations to the submodule
groups, which Stefan Beller introduced in a different topic, could be a
direction we can go for such needs.
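
One way to picture group-level settings is a three-step lookup, sketched here in Python. The order and the dict-based representation of groups are assumptions for illustration, not the design of the submodule groups topic. The order is: explicit per-submodule setting, then a user setting attached to one of the submodule's groups, then the .gitmodules default. The values "checkout" and "rebase" are existing submodule.<name>.update modes. The example covers Junio's scenario: a tweak made once for the group also applies to submodule B when upstream adds it.

```python
def setting(name, key, local, groups_of, group_settings, gitmodules):
    """Resolve `key` for submodule `name` (hypothetical lookup order).

    1. explicit per-submodule setting in the local config
    2. a user setting attached to one of the submodule's groups
    3. the upstream .gitmodules default
    """
    if key in local.get(name, {}):
        return local[name][key]
    for group in sorted(groups_of.get(name, ())):  # deterministic order
        if key in group_settings.get(group, {}):
            return group_settings[group][key]
    return gitmodules.get(name, {}).get(key)


gitmodules = {"A": {"update": "checkout"}, "B": {"update": "checkout"}}
groups_of = {"A": {"libs"}, "B": {"libs"}}        # B was added upstream later
group_settings = {"libs": {"update": "rebase"}}   # the user's one-time tweak

print(setting("A", "update", {}, groups_of, group_settings, gitmodules))  # rebase
print(setting("B", "update", {}, groups_of, group_settings, gitmodules))  # rebase
```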

Cheers Heiko


* Re: Re: [RFD/PATCH] submodule doc: describe where we can configure them
  2016-05-09 17:32           ` Stefan Beller
@ 2016-05-11 16:54             ` Heiko Voigt
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Voigt @ 2016-05-11 16:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Nieder, git@vger.kernel.org, Jens Lehmann

On Mon, May 09, 2016 at 10:32:50AM -0700, Stefan Beller wrote:
> >> Here is what I imagine:
> >> When B mirrors from A, B sets up this special ref for its repository,
> >> e.g. refs/meta/submodule-B, and has a symbolic ref pointing at it
> >> (e.g. SUBMODULE_CONFIG pointing at refs/meta/submodule-B), whose tree
> >> contains a .gitmodules file which sets up
> >>   "submodule.baz.url = http://B/baz"
> >>   "submodule.relativeBase = http://A"
> >>
> >> That way anyone cloning from B would get
> >> the superproject and the submodule baz from B while the
> >> rest of the submodules are found at A.
> >
> > This sounds sensible. But the conflict I had in mind is a
> > different one. E.g. project A has a submodule B. And now A has a remote
> > 1 where you publish and maybe another remote 2 where someone else (a
> > colleague?) publishes. Which configuration do you use? Here the two
> > remotes are independent instead of subsequent forks. In this case my
> > solution would be to use the configuration branch from 1 for B when
> > interacting with 1. I have not completely checked whether we always
> > have a remote at hand for such a resolution.
> 
> I think it is the responsibility of the pusher to make sure the
> configuration is sane.
> So if I were to push to remote 2 and you push to remote 1, we'd both configure
> the special branch of our superprojects for these remotes for that submodule.
> 
> If the superproject has relative URLs for the submodule, all we have to do is
> unset (or overwrite) the submodule.baseConfig.

What if (because we work together) you and I both have both remotes in
our local repositories? We only push to our private remotes but fetch from
both. Since we work together we also forked the same submodule B and
have different URL configurations for it. I push to B1 and you to B2.
Now we both have two special branches (one from B1 and one from B2) in
our local repositories, since on either of our private remotes there is
one special branch.

Which values are valid now? I see you are advocating for a symbolic ref
SUBMODULE_CONFIG that points to a single special branch in charge, but
maybe we can avoid that. In this case there actually is no real
conflict, since we can just add both remotes B1, B2 to the submodule B.
Which one is used is a choice of the user during push.

For submodule.relativeBase we could try a similar solution and just add
all remotes that can be constructed with the different configurations.
Probably under the same name as in the superproject.

So if we limit ourselves to allowing only URL'ish (actually remote'ish is
probably a better term) values, we can avoid conflict resolution and
just add/use them all. If we limit ourselves to the fork use case, and my
hypothesis that we only need to allow remote'ish values in these special
branches for it is true, we can actually keep it quite simple and have
no conflict resolution at all, I think (and realize now).
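
A minimal Python sketch of "add/use them all". It assumes each superproject remote's special ref has already been fetched into a dict of flat key/value settings, which is a hypothetical layout. Every explicit submodule URL, and every URL that can be built from the proposed submodule.relativeBase, becomes a submodule remote named after the superproject remote it came from. relativeBase is assumed to stand in for the directory that a "../<name>" URL normally points to. The host names are placeholders, and the `git -C B remote add` commands are only printed, not run.

```python
def submodule_remotes(special_refs, gitmodules_url, name):
    """Map each superproject remote to a URL for submodule `name`.

    special_refs   -- superproject remote name -> settings from that remote's
                      special ref (hypothetical layout)
    gitmodules_url -- the URL recorded for `name` in .gitmodules
    """
    remotes = {}
    for remote, config in sorted(special_refs.items()):
        url = config.get(f"submodule.{name}.url")
        base = config.get("submodule.relativeBase")
        if url is None and base is not None and gitmodules_url.startswith("../"):
            # Assumed semantics: relativeBase replaces the directory that
            # "../<name>" would otherwise resolve into.
            url = base.rstrip("/") + "/" + gitmodules_url[len("../"):]
        if url is not None:
            remotes[remote] = url  # reuse the superproject's remote name
    return remotes


special_refs = {
    "remote1": {"submodule.B.url": "ssh://host1.example/B"},
    "remote2": {"submodule.B.url": "ssh://host2.example/B"},
    "mirror": {"submodule.relativeBase": "http://mirror.example"},
}
for remote, url in submodule_remotes(special_refs, "../B", "B").items():
    # Commands a tool might run for the submodule at path B; printed only.
    print("git", "-C", "B", "remote", "add", remote, url)
```

No remote wins over another here: the user picks one when pushing, which is why no conflict resolution is needed.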

What do you think?

> >> When C mirrors from A, they add another branch refs/meta/submodule-C,
> >> which can either be a fork of refs/meta/submodule-B with some changes on
> >> top of it, or it can add a reference to refs/meta/submodule-B, i.e. the
> >> configuration would be:
> >>
> >>   "submodule.baseConfig = refs/meta/submodule-B"
> >>   "submodule.foo.url = ssh://C/foo"
> >>
> >> and SUBMODULE_CONFIG would point at refs/meta/submodule-C.
> >>
> >> When cloning from C, the user would get
> >>
> >>  * the superproject from C
> >>  * submodule foo from C
> >>  * submodule baz from B
> >>  * all other submodules from A
> >>
> >> By the inheriting property of the branch of B there are no conflicting values.
> >> C could just overwrite submodule.baseConfig for example.
> >
> > So that means in the default case we create a chain of all previous
> > forks embedded in the repository database.
> 
> Not necessarily. I was just pointing out that this was possible. The
> intermediate party could decide that their upstream is too unreliable
> and not point to it.
>
> This would incur the cost of having to clone all submodules and
> overwrite the absolute URLs. For the relative URLs this would just
> work as of now.
>
> All I wanted with that example was to offer the flexibility of not
> having to clone all the submodules, so that I can fork a mega-project
> with 100s of submodules, maybe fiddle with just one of them, and then
> publish that.

Do you mean 'not having to fork all the submodules' here? Since 'without
cloning' is already possible, no?

I am assuming you meant fork. So submodule.relativeBase is meant to
solve that right? You set it and all relative submodule URLs that are
not configured otherwise relate to it.

My point was about the chaining with submodule.baseConfig. That is not
necessary to support partial forks of just a few submodules.

Actually, while thinking about submodule.relativeBase now, I found it
might be nice to extend it a little. Imagine someone wants to fork a set of
submodules and specify a relativeBase for them, and then someone else
forking again wants to do that with another set of submodules. I imagine
subsequent forks are quite common in git (as in the kernel's workflow).
Maybe we can extend this scheme a little bit and allow setting
submodule.relativeBase for groups of submodules somehow?

> > I am not saying that this is
> > necessarily a bad thing but I feel that it is a new property which we
> > should think about. It helps because users will get updated values from
> > sources that are in the chain. On the other hand it adds a lot of
> > dependencies which are points of failure in case a remote disappears. I
> > am undecided on this. I would prefer if we could let people play with it
> > a little (maybe on pu?) and then decide if there are practical pitfalls
> > with this.
> >
[...]

> >
> > Only things like the technical organisation (like the place where a
> > repository can be found) justify being kept outside of the repository IMO.
> >
> > A repository without submodules does have one collection of remote
> > repository URLs. To me, adding proper fork support seems to be the switch
> > from one collection for one repository to many collections for many
> > repositories. Since this one collection is already outside of the
> > superproject it makes sense to do the same for the submodules. So my
> > question reformulated could be: Are there more values we currently keep
> > inside the repository for submodules that actually belong outside? A
> > good indication could be that they are already outside in the
> > superproject.
> >
> > I did not find any flaw in these statements yet, but maybe I am
> > oversimplifying?
> 
> They sound right to me.

Great. Then my simplification suggestion above should work as well.

> Thanks for the discussion :)

Thank you for caring about this topic! I think this is quite important
work to make forking submodules almost as simple as forking their
superprojects. I am happy to continue this discussion and
bounce ideas off of each other :)

Cheers Heiko


end of thread, other threads:[~2016-05-11 16:54 UTC | newest]

Thread overview: 13+ messages
2016-05-03 23:26 [RFD/PATCH] submodule doc: describe where we can configure them Stefan Beller
2016-05-03 23:56 ` Jonathan Nieder
2016-05-04  0:59   ` Stefan Beller
2016-05-04 15:01     ` Heiko Voigt
2016-05-04 20:50       ` Stefan Beller
2016-05-08 21:54         ` Heiko Voigt
2016-05-09 17:32           ` Stefan Beller
2016-05-11 16:54             ` Heiko Voigt
2016-05-04 21:13   ` Junio C Hamano
2016-05-08 22:01     ` Heiko Voigt
2016-05-09 16:19       ` Junio C Hamano
2016-05-11 15:50         ` Heiko Voigt
2016-05-04 20:48 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).