git@vger.kernel.org mailing list mirror (one of many)
* [RFC-PATCHv2] submodules: add a background story
@ 2017-02-09  2:08 Stefan Beller
  2017-02-09 23:32 ` Junio C Hamano
  2017-02-14  0:39 ` Brandon Williams
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Beller @ 2017-02-09  2:08 UTC (permalink / raw)
  Cc: git, bmwill, Stefan Beller

Just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), we'd like to provide some background
on submodules, which is not specific to the `submodule` command, but
elaborates on the background and its intended usage.

Add gitsubmodules(7), that explains the states, structure and usage of
submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

This would replace the last patch of sb/submodule-doc, though it's still
RFC. In this revision I took care of the technical details (i.e. proper
formatting, spelling) and only slightly reworded the text.

The main issue persists; see bottom of the patch:

  SAMPLE WORKFLOWS (RFC/TODO)
  ---------------------------
  
  Do we need
  
  * an opinionated way to check for a specific state of a submodule
  * (submodule helper to be plumbing?)
  * expose the design mistake of having the (name->path) mapping inside the
    working tree, i.e. never remove a name from the submodule config even when
    the submodule doesn't exist any more.
    
Any opinion on these would be welcome!
Thanks,
Stefan

 Documentation/Makefile          |   1 +
 Documentation/git-submodule.txt |  36 ++------
 Documentation/gitsubmodules.txt | 194 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 200 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b43d66eae6..325c4735a7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 4a4cede144..d38aa2d53a 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 -----------
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule. Directories
-that come from both projects can be cloned and checked out as a whole
-if you choose to go that route.
+For more information about submodules, see linkgit:gitsubmodules[5]
 
 COMMANDS
 --------
@@ -420,6 +390,10 @@ This file should be formatted in the same way as `$GIT_DIR/config`. The key
 to each submodule url is "submodule.$name.url".  See linkgit:gitmodules[5]
 for details.
 
+SEE ALSO
+--------
+linkgit:gitsubmodules[1], linkgit:gitmodules[1].
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
new file mode 100644
index 0000000000..3369d55ae9
--- /dev/null
+++ b/Documentation/gitsubmodules.txt
@@ -0,0 +1,194 @@
+gitsubmodules(7)
+================
+
+NAME
+----
+gitsubmodules - information about submodules
+
+SYNOPSIS
+--------
+$GIT_DIR/config, .gitmodules
+
+------------------
+git submodule
+------------------
+
+DESCRIPTION
+-----------
+
+A submodule allows you to keep another Git repository in a subdirectory
+of your repository. The other repository has its own history, which does not
+interfere with the history of the current repository. This can be used to
+have external dependencies such as third party libraries for example.
+
+Submodules are composed from a so-called `gitlink` tree entry
+in the main repository that refers to a particular commit object
+within the inner repository that is completely separate.
+A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
+root of the source tree assigns a logical name to the submodule and
+describes the default URL the submodule shall be cloned from.
+The logical name can be used for overriding this URL within your
+local repository configuration (see 'submodule init').
+
+Submodules are not to be confused with remotes, which are other
+repositories of the same project; submodules are meant for
+different projects you would like to make part of your source tree,
+while the history of the two projects still stays completely
+independent and you cannot modify the contents of the submodule
+from within the main project.
+If you want to merge the project histories and want to treat the
+aggregated whole as a single project from then on, you may want to
+add a remote for the other project and use the 'subtree' merge strategy,
+instead of treating the other project as a submodule. Directories
+that come from both projects can be cloned and checked out as a whole
+if you choose to go that route.
+
+When cloning or pulling a repository containing submodules however,
+the submodules will not be checked out by default; You need to instruct
+'clone' to recurse into submodules. The 'init' and 'update' subcommands
+of 'git submodule' will maintain submodules checked out and at an
+appropriate revision in your working tree.
+
+WHEN TO USE
+-----------
+
+Submodules, repositories inside other repositories,
+can be used for different use cases:
+
+* To have finer grained access control.
+  The design principles of Git do not allow for partial repositories to be
+  checked out or transferred. A repository is the smallest unit that a user
+  can be given access to. Submodules are separate repositories, such that
+  you can restrict access to parts of your project via the use of submodules.
+
+* To decouple Git histories.
+  Decoupling histories has different benefits.
+
+** When you want to use a (third party) library tied to a specific version.
+   Using submodules for a library allows you to have a clean history for
+   your own project and only updating the library in the submodule when needed.
+
+** In its current form Git scales up poorly for very large repositories that
+   change a lot, as the history grows very large. For that you may want to look
+   at shallow clone, sparse checkout or git-lfs.
+   However you can also use submodules to e.g. hold large binary assets
+   and these repositories are then shallowly cloned such that you do not
+   have a large history locally.
+
+STATES
+------
+
+When working with submodules, you can think of them as in a state machine.
+So each submodule can be in a different state, the following indicators are used:
+
+* the existence of the setting of 'submodule.<name>.url' in the
+  superprojects configuration
+* the existence of the submodules working tree within the
+  working tree of the superproject
+* the existence of the submodules git directory within the superprojects
+  git directory at $GIT_DIR/modules/<name> or within the submodules working
+  tree
+
+      State      URL config        working tree     git dir
+      -----------------------------------------------------
+      uninitialized    no               no           no
+      initialized     yes               no           no
+      populated       yes              yes          yes
+      depopulated     yes               no          yes
+      deinitialized    no               no          yes
+      uninteresting    no              yes          yes
+
+      invalid          no              yes           no
+      invalid         yes              yes           no
+      -----------------------------------------------------
+
+The first six states can be reached by normal git usage, the latter two are
+only shown for completeness to show all possible eight states with 3 binary
+indicators. The states in detail:
+
+uninitialized::
+The uninitialized state is the default state if neither
+'--recurse-submodules' nor '--recursive' is given. An empty directory will be put in
+the working tree as a place holder, such that you are reminded of the
+existence of the submodule.
+---
+To transition into the initialized state
+you can use 'git submodule init', which copies the presets from the
+.gitmodules file into the config.
+
+initialized::
+Users transitioned from the uninitialized state to this state via
+'git submodule init', which preset the URL configuration. As these URLs
+may not be desired in certain scenarios, this state allows to change the
+URLs.  For example in a corporate environment you may want to run
+
+    sed -i s/example.org/$internal-mirror/ .git/config
++
+before proceeding to populate the submodules.
+
+populated::
+In the populated state you have the submodule fully available, i.e. both the
+git directory and the working tree exist. In this state you can work
+with the submodule, just like with any other repository.
+
+depopulated::
+In this state you still have the git directory around, but the working tree
+is gone.  For example when the superproject checks out a revision that doesn't
+have the submodule, the state may change to depopulated.
+
+deinitialized::
+The git directory is still there, but the user is no longer interested in the
+submodule as indicated by the missing URL configuration.
+
+invalid::
+When there is no git directory for a submodule, then there is something
+seriously wrong with the submodule.
+
+INNER WORKINGS
+--------------
+
+Generally a submodule can be considered its own autonomous repository,
+that has a worktree and a git directory at split places.
+
+The superproject only records the commit sha1 in its tree, such that
+any other information, e.g. where to obtain a copy from, is not recorded
+in the core data structures of Git. The porcelain layer of Git however
+makes use of the .gitmodules file that gives strong hints where and how
+to obtain a copy of the submodules git repository from.
+
+On the location of the git directory
+------------------------------------
+
+Since v1.7.7 of Git, the git directory of submodules is stored inside the
+superprojects git directory at $GIT_DIR/modules/<submodule-name>
+This location allows for the working tree to be non existent while keeping
+the history around. So we can use git-rm on a submodule without losing
+information that may only be local.
+
+In the future we may see git-checkout that can checkout submodules and
+revisions that do not contain the submodule can still be checked out without
+having to drop the submodules git directory.
+
+It is also possible to imagine a future in which a bare repository still
+contains its submodules inside the modules sub directory, such that you can
+get a full clone including submodules from that bare repository, the URLs
+as configured or given in the .gitmodules would only be used as a backup.
+
+SAMPLE WORKFLOWS (RFC/TODO)
+---------------------------
+
+Do we need
+
+* an opinionated way to check for a specific state of a submodule
+* (submodule helper to be plumbing?)
+* expose the design mistake of having the (name->path) mapping inside the
+  working tree, i.e. never remove a name from the submodule config even when
+  the submodule doesn't exist any more.
+
+SEE ALSO
+--------
+linkgit:git-submodule[1], linkgit:gitmodules[1].
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
2.12.0.rc0.1.g018cb5e6f4



* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-09  2:08 [RFC-PATCHv2] submodules: add a background story Stefan Beller
@ 2017-02-09 23:32 ` Junio C Hamano
  2017-02-14 21:46   ` Stefan Beller
  2017-02-14  0:39 ` Brandon Williams
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2017-02-09 23:32 UTC (permalink / raw)
  To: Stefan Beller

Stefan Beller <sbeller@google.com> writes:

> Just like gitmodules(5), gitattributes(5), gitcredentials(7),
> gitnamespaces(7), gittutorial(7), we'd like to provide some background
> on submodules, which is not specific to the `submodule` command, but
> elaborates on the background and its intended usage.
>
> Add gitsubmodules(7), that explains the states, structure and usage of
> submodules.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
> This would replace the last patch of  sb/submodule-doc, though it's still
> RFC. In this revision I took care of the technical details (i.e. proper
> formatting, spelling), and only slight rewording of the text.
>
> The main issue persists; see bottom of the patch:
>
>   SAMPLE WORKFLOWS (RFC/TODO)
>   ---------------------------
>   
>   Do we need
>   
>   * an opinionated way to check for a specific state of a submodule
>   * (submodule helper to be plumbing?)
>   * expose the design mistake of having the (name->path) mapping inside the
>     working tree, i.e. never remove a name from the submodule config even when
>     the submodule doesn't exist any more.

I am not sure about the last item.  

Are you talking about a case where submodule comes and goes (think:
"git checkout v1.0" that would make submodules added since that
version disappear)?  .gitmodules that is checked out would not have
any entry, but .git/config needs to record the end-user preference
for the module, so that the user can do "git checkout -" to come
back, no?  IOW .git/config that mentions all the submodules the user
ever showed interest in is not a design mistake, so you must be
talking about something else, but I do not know what it is.

> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index 4a4cede144..d38aa2d53a 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -24,37 +24,7 @@ DESCRIPTION
>  -----------
>  Inspects, updates and manages submodules.
>  
> -A submodule allows you to keep another Git repository in a subdirectory
> ...
> -if you choose to go that route.
> +For more information about submodules, see linkgit:gitsubmodules[5]

OK.

> @@ -420,6 +390,10 @@ This file should be formatted in the same way as `$GIT_DIR/config`. The key
>  to each submodule url is "submodule.$name.url".  See linkgit:gitmodules[5]
>  for details.
>  
> +SEE ALSO
> +--------
> +linkgit:gitsubmodules[1], linkgit:gitmodules[1].

Are they both in section (1)?  I think the former (concepts) belongs
to section 7 and the latter (file formats) belongs to section 5.

> diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
> new file mode 100644
> index 0000000000..3369d55ae9
> --- /dev/null
> +++ b/Documentation/gitsubmodules.txt
> @@ -0,0 +1,194 @@
> +gitsubmodules(7)
> +================
> +
> +NAME
> +----
> +gitsubmodules - information about submodules
> +
> +SYNOPSIS
> +--------
> +$GIT_DIR/config, .gitmodules
> +
> +------------------
> +git submodule
> +------------------
> +
> +DESCRIPTION
> +-----------
> +
> +A submodule allows you to keep another Git repository in a subdirectory
> +...
> +When cloning or pulling a repository containing submodules however,
> +the submodules will not be checked out by default; You need to instruct
> +'clone' to recurse into submodules. The 'init' and 'update' subcommands

I think this is not "You need to", but rather "You can, if you want
to have each and every submodule."

> +of 'git submodule' will maintain submodules checked out and at an
> +appropriate revision in your working tree.
> +
> +WHEN TO USE
> +-----------
> +
> +Submodules, repositories inside other repositories,
> +can be used for different use cases:
> +
> +* To have finer grained access control.
> +  The design principles of Git do not allow for partial repositories to be
> +  checked out or transferred. A repository is the smallest unit that a user
> +  can be given access to. Submodules are separate repositories, such that
> +  you can restrict access to parts of your project via the use of submodules.
> +
> +* To decouple Git histories.
> +  Decoupling histories has different benefits.
> +
> +** When you want to use a (third party) library tied to a specific version.
> +   Using submodules for a library allows you to have a clean history for
> +   your own project and only updating the library in the submodule when needed.

I somehow do not see this as decoupling; it is keeping what is
originally separate separate, isn't it?

> +** In its current form Git scales up poorly for very large repositories that
> +   change a lot, as the history grows very large. For that you may want to look
> +   at shallow clone, sparse checkout or git-lfs.
> +   However you can also use submodules to e.g. hold large binary assets
> +   and these repositories are then shallowly cloned such that you do not
> +   have a large history locally.

In other words, a better way to list these may be 

 1. using another project that stands on its own.

 2. artificially split a (logically single) project into multiple
    repositories and tying them back together.

The access control and performance reasons are subclasses of 2.
IOW, if Git had per-path ACL and infinite scaling, you wouldn't be
splitting your project into submodules for 2.  You would still want
to use somebody else's project by binding it as a subproject, instead
of merging its history into yours.

> +When working with submodules, you can think of them as in a state machine.
> +So each submodule can be in a different state, the following indicators are used:
> +
> +* the existence of the setting of 'submodule.<name>.url' in the
> +  superprojects configuration
> +* the existence of the submodules working tree within the
> +  working tree of the superproject
> +* the existence of the submodules git directory within the superprojects
> +  git directory at $GIT_DIR/modules/<name> or within the submodules working
> +  tree
> +
> +      State      URL config        working tree     git dir
> +      -----------------------------------------------------
> +      uninitialized    no               no           no
> +      initialized     yes               no           no
> +      populated       yes              yes          yes
> +      depopulated     yes               no          yes
> +      deinitialized    no               no          yes
> +      uninteresting    no              yes          yes
> +
> +      invalid          no              yes           no
> +      invalid         yes              yes           no

I do not have strong opinions on these labels; are submodule folks
happy with the above vocabulary?

"uninteresting" is not explained in the below?

> ...
> +SEE ALSO
> +--------
> +linkgit:git-submodule[1], linkgit:gitmodules[1].

Ditto.


* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-09  2:08 [RFC-PATCHv2] submodules: add a background story Stefan Beller
  2017-02-09 23:32 ` Junio C Hamano
@ 2017-02-14  0:39 ` Brandon Williams
  1 sibling, 0 replies; 10+ messages in thread
From: Brandon Williams @ 2017-02-14  0:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On 02/08, Stefan Beller wrote:
> +STATES
> +------
> +
> +When working with submodules, you can think of them as in a state machine.
> +So each submodule can be in a different state, the following indicators are used:
> +
> +* the existence of the setting of 'submodule.<name>.url' in the
> +  superprojects configuration
> +* the existence of the submodules working tree within the
> +  working tree of the superproject
> +* the existence of the submodules git directory within the superprojects
> +  git directory at $GIT_DIR/modules/<name> or within the submodules working
> +  tree
> +
> +      State      URL config        working tree     git dir
> +      -----------------------------------------------------
> +      uninitialized    no               no           no
> +      initialized     yes               no           no
> +      populated       yes              yes          yes
> +      depopulated     yes               no          yes
> +      deinitialized    no               no          yes
> +      uninteresting    no              yes          yes
> +
> +      invalid          no              yes           no
> +      invalid         yes              yes           no
> +      -----------------------------------------------------
> +
> +The first six states can be reached by normal git usage, the latter two are
> +only shown for completeness to show all possible eight states with 3 binary
> +indicators. The states in detail:
> +
> +uninitialized::
> +The uninitialized state is the default state if no
> +'--recurse-submodules' / '--recursive'. An empty directory will be put in
> +the working tree as a place holder, such that you are reminded of the
> +existence of the submodule.
> +---
> +To transition into the initialized state
> +you can use 'git submodule init', which copies the presets from the
> +.gitmodules file into the config.
> +
> +initialized::
> +Users transitioned from the uninitialized state to this state via
> +'git submodule init', which preset the URL configuration. As these URLs
> +may not be desired in certain scenarios, this state allows to change the
> +URLs.  For example in a corporate environment you may want to run
> +
> +    sed -i s/example.org/$internal-mirror/ .git/config
> ++

Maybe we can try to brainstorm and come up with some clearer terminology
while we are at it.  I was trying to think about the "initialized" state
and I may be the only one but it seems unclear what "initialized" means.
I mean I already have all the information about a submodule in the
.gitmodules file, isn't it already initialized then?   Maybe this state
would be better named "(in)active" as a module that is interesting to a
user is "active"?
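
For illustration, the quoted table can be read as a plain lookup on the
three binary indicators. What follows is a minimal sketch in Python that
uses the labels from the patch as posted; the names `STATES` and
`classify` are made up for this example and are not part of Git.

```python
# Keys are (URL configured, working tree exists, git dir exists),
# copied row by row from the table in the patch. Hypothetical helper,
# not a Git API.
STATES = {
    (False, False, False): "uninitialized",
    (True,  False, False): "initialized",
    (True,  True,  True):  "populated",
    (True,  False, True):  "depopulated",
    (False, False, True):  "deinitialized",
    (False, True,  True):  "uninteresting",
    (False, True,  False): "invalid",
    (True,  True,  False): "invalid",
}


def classify(url_configured: bool, has_worktree: bool, has_git_dir: bool) -> str:
    """Return the state label for one submodule."""
    return STATES[(url_configured, has_worktree, has_git_dir)]


if __name__ == "__main__":
    # A submodule whose working tree is gone but whose git dir remains.
    print(classify(True, False, True))
```

Renaming "initialized" to "active" would only change the labels in the
dictionary; the three indicators stay the same.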

-- 
Brandon Williams


* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-09 23:32 ` Junio C Hamano
@ 2017-02-14 21:46   ` Stefan Beller
  2017-02-14 21:56     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Beller @ 2017-02-14 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Brandon Williams

Sorry for dropping the ball here, I was stressed out a bit.

On Thu, Feb 9, 2017 at 3:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>   Do we need
>>
>>   * an opinionated way to check for a specific state of a submodule
>>   * (submodule helper to be plumbing?)
>>   * expose the design mistake of having the (name->path) mapping inside the
>>     working tree, i.e. never remove a name from the submodule config even when
>>     the submodule doesn't exist any more.
>
> I am not sure about the last item.
>
> Are you talking about a case where submodule comes and goes (think:
> "git checkout v1.0" that would make submodules added since that
> version disappear)?  .gitmodules that is checked out would not have
> any entry, but .git/config needs to record the end-user preference
> for the module, so that the user can do "git checkout -" to come
> back, no?

That is perfectly legit and I agree that is good design.

>  IOW .git/config that mentions all the submodule the user
> ever showed interests in is not a design mistake, so you must be
> talking about something else, but I do not know what it is.

I mean that we
(1) have a gitmodules file tracked in git that includes the name.
The "tracking some information inside the version control to
help the very version control system" is also not bad. The bad part
is that the name *must not be changed* and
 * we do not tell people about it in the docs
 * we happily make commits that change the name of a submodule
(2) name the submodule by path by default

See
https://public-inbox.org/git/7e54658a-dcb2-64a7-3c67-0c4fa221b2fb@gmail.com/

    > Oh, I see. You did not just rename the path, but also the name
    > in the .gitmodules?

    I wasn't even aware that the submodule name was something different from
    the path because the name is by default set to be the path to it.

You could blame this specific instance on the user, but I would rather blame
it on Git, as such questions come up once in a while on the mailing list.

If we were to redesign the .gitmodules file, we might have it as

    [submodule "path"]
        url = git://example.org
        branch = .
        ...

and the "path -> name/UID" mapping would be inside $GIT_DIR.
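
To make the name/path distinction in today's format concrete, here is a
rough sketch in Python. It assumes a made-up .gitmodules content
(`SAMPLE`) in which one submodule was moved after it was added, and it
uses a deliberately simplified parser (no escapes, no quoting rules), so
it is not a substitute for Git's own config parsing. All function names
are hypothetical.

```python
import re

# Made-up example content; the first submodule was added as "lib/foo"
# and later moved to "vendor/foo", so its name no longer equals its path.
SAMPLE = """\
[submodule "lib/foo"]
    path = vendor/foo
    url = git://example.org/foo.git
[submodule "bar"]
    path = bar
    url = git://example.org/bar.git
"""


def parse_gitmodules(text):
    """Return {name: {key: value}} for each [submodule "<name>"] section."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = re.fullmatch(r'\[submodule "(.*)"\]', line)
        if header:
            current = modules.setdefault(header.group(1), {})
        elif line.startswith("["):
            current = None  # some other section; ignore its keys
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


def moved_submodules(modules):
    """Names whose recorded path differs from the name."""
    return sorted(n for n, cfg in modules.items() if cfg.get("path") != n)


if __name__ == "__main__":
    print(moved_submodules(parse_gitmodules(SAMPLE)))
```

On a real checkout, `git config -f .gitmodules --get-regexp` is probably
the more robust way to read these keys.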

>
> Are they both in section (1)?  I think the former (concepts) belongs
> to section 7 and the latter (file formats) belongs to section 5.

oops. Will fix.

>
>> diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
>> new file mode 100644
>> index 0000000000..3369d55ae9
>> --- /dev/null
>> +++ b/Documentation/gitsubmodules.txt
>> @@ -0,0 +1,194 @@
>> +gitsubmodules(7)
>> +================
>> +
>> +NAME
>> +----
>> +gitsubmodules - information about submodules
>> +
>> +SYNOPSIS
>> +--------
>> +$GIT_DIR/config, .gitmodules
>> +
>> +------------------
>> +git submodule
>> +------------------
>> +
>> +DESCRIPTION
>> +-----------
>> +
>> +A submodule allows you to keep another Git repository in a subdirectory
>> +...
>> +When cloning or pulling a repository containing submodules however,
>> +the submodules will not be checked out by default; You need to instruct
>> +'clone' to recurse into submodules. The 'init' and 'update' subcommands
>
> I think this is not "You need to", but rather "You can, if you want
> to have each and every submodule."

OK. In this man page for submodules I assumed an implicit
"[if you want these submodules to be there, then] you have to/need to ...".

But I'll tone it down so that it doesn't rely on implicit assumptions.

>> +
>> +** When you want to use a (third party) library tied to a specific version.
>> +   Using submodules for a library allows you to have a clean history for
>> +   your own project and only updating the library in the submodule when needed.
>
> I somehow do not see this as decoupling; it is keeping what is
> originally separate separate, isn't it?

ok I'll reword that to say keeping separate things separate.

>
>> +** In its current form Git scales up poorly for very large repositories that
>> +   change a lot, as the history grows very large. For that you may want to look
>> +   at shallow clone, sparse checkout or git-lfs.
>> +   However you can also use submodules to e.g. hold large binary assets
>> +   and these repositories are then shallowly cloned such that you do not
>> +   have a large history locally.
>
> In other words, a better way to list these may be
>
>  1. using another project that stands on its own.
>
>  2. artificially split a (logically single) project into multiple
>     repositories and tying them back together.
>
> The access control and performance reasons are subclasses of 2.
> IOW, if Git had per-path ACL and infinite scaling, you wouldn't be
> splitting your project into submodules for 2.  You would still want
> to use somebody else's project by binding it as a subproject, instead
> of merging its history into yours.

Looking at the big picture with a logical view is better indeed.

>
>> +When working with submodules, you can think of them as in a state machine.
>> +So each submodule can be in a different state, the following indicators are used:
>> +
>> +* the existence of the setting of 'submodule.<name>.url' in the
>> +  superprojects configuration
>> +* the existence of the submodules working tree within the
>> +  working tree of the superproject
>> +* the existence of the submodules git directory within the superprojects
>> +  git directory at $GIT_DIR/modules/<name> or within the submodules working
>> +  tree
>> +
>> +      State      URL config        working tree     git dir
>> +      -----------------------------------------------------
>> +      uninitialized    no               no           no
>> +      initialized     yes               no           no
>> +      populated       yes              yes          yes
>> +      depopulated     yes               no          yes
>> +      deinitialized    no               no          yes
>> +      uninteresting    no              yes          yes
>> +
>> +      invalid          no              yes           no
>> +      invalid         yes              yes           no
>
> I do not have strong opinions on these labels; are submodule folks
> happy with the above vocabulary?

Brandon suggested (in)active instead of (un)initialized, which is better as
it decouples the current process from the actual states. Once we reintroduce
[1], then the user would not need to run "init" (whether it is
'git submodule init' or implicit as e.g. 'git submodule update --init') any
more, but the selection of active submodules would be done via config.

[1] https://public-inbox.org/git/20161110203428.30512-35-sbeller@google.com/

>
> "uninteresting" is not explained in the below?

will fix.

>
>> ...
>> +SEE ALSO
>> +--------
>> +linkgit:git-submodule[1], linkgit:gitmodules[1].
>
> Ditto.


* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-14 21:46   ` Stefan Beller
@ 2017-02-14 21:56     ` Junio C Hamano
  2017-02-14 22:10       ` Stefan Beller
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2017-02-14 21:56 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Brandon Williams

Stefan Beller <sbeller@google.com> writes:

> If we were to redesign the .gitmodules file, we might have it as
>
>     [submodule "path"]
>         url = git://example.org
>         branch = .
>         ...
>
> and the "path -> name/UID" mapping would be inside $GIT_DIR.

I am not sure how you are going to keep track of that mapping,
though.  If .gitmodules file does not have a way to tell that what
used to be at "path" in its v1.0 is now at "htap" (instead the above
seems to assume there will just be an entry for [submodule "htap"]
in the newer version, without anything that links the old one with
the new one), how would the mapping inside $GIT_DIR know?  Don't
forget that name was introduced as the identity because we cannot
assume that URL for a single project will never change.

I fully agree that our documentation and user education should
stress that names must be unique and immutable throughout the
history of a superproject, though.
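
The argument can be sketched in a few lines of Python. This is only an
illustration under the assumption that each revision is reduced to a
name->path mapping (today's format) or to a bare set of paths (the
path-keyed redesign); the helper names and the "foo" submodule name are
invented for the example.

```python
def git_dir_for(name):
    """Location described in the patch: $GIT_DIR/modules/<name>."""
    return f"modules/{name}"


def renamed_paths(old, new):
    """With name-keyed sections, a move shows up as the same name
    pointing at a different path."""
    return [
        (name, old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    ]


def path_keyed_diff(old_paths, new_paths):
    """With path-keyed sections, all we can see is one path vanishing
    and another appearing; nothing links the two."""
    return old_paths - new_paths, new_paths - old_paths


if __name__ == "__main__":
    v1 = {"foo": "path"}  # hypothetical superproject at v1.0
    v2 = {"foo": "htap"}  # same submodule after the move
    print(renamed_paths(v1, v2), git_dir_for("foo"))
    print(path_keyed_diff({"path"}, {"htap"}))
```

In the name-keyed case the git dir stays at the same place across the
move because the name did not change; in the path-keyed case some extra
source of identity would be needed.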




* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-14 21:56     ` Junio C Hamano
@ 2017-02-14 22:10       ` Stefan Beller
  2017-02-14 22:17         ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Beller @ 2017-02-14 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Brandon Williams

On Tue, Feb 14, 2017 at 1:56 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> If we were to redesign the .gitmodules file, we might have it as
>>
>>     [submodule "path"]
>>         url = git://example.org
>>         branch = .
>>         ...
>>
>> and the "path -> name/UID" mapping would be inside $GIT_DIR.
>
> I am not sure how you are going to keep track of that mapping,
> though.  If .gitmodules file does not have a way to tell that what
> used to be at "path" in its v1.0 is now at "htap" (instead the above
> seems to assume there will just be an entry for [submodule "htap"]
> in the newer version, without anything that links the old one with
> the new one), how would the mapping inside $GIT_DIR know?

It depends. Maybe git-mv could have rewritten the internal mapping
as well.

Maybe it would work similarly to rename detection, using a Bloom filter
that includes all recorded sha1s at a given path; we could then take the
sha1 from a given path and check, for each absorbed submodule git dir,
whether that commit belongs to that repo.

I did not quite think it through, but I was pointing out this is brittle.
I guess a quick way would be to follow the .git file inside the submodule
if that exists and if not build up an internal cache that can map
"path -> potential git dirs".

Of course we can argue that the same problem applies to e.g. remotes:
If I have
    remote.origin.url = git://kernel.org and
    remote.mirror.url = kernel.googlesource.com
then swapping the urls will of course yield different behavior
for 'origin' and 'mirror'. But in this case it is obvious because
"origin" is not the same string as "kernel.org".

So long term, maybe we should come up with a better default name
for submodules, e.g. just a hash of say the URL being used when
adding the submodule.
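
To illustrate the idea of a hash-derived default name, here is a minimal
Python sketch. The helper name and the choice of SHA-1 over the URL string
are assumptions made for illustration only; this is not something git
does today.

```python
import hashlib


def default_submodule_name(url: str) -> str:
    """Hypothetical helper: derive a stable default name from the URL
    that was used when the submodule was added."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


# The name is fixed at "add" time; later edits to the url setting in
# .gitmodules or in the config would not change it.
name = default_submodule_name("git://example.org")
print(name)
```

Such a name would no longer look like the path, but it would also no longer
be readable by humans.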

>  Don't
> forget that name was introduced as the identity because we cannot
> assume that URL for a single project will never change.

Yes, URL and path can both change over time, which is why it is
a good idea to have them versioned as well as having a way to
overwrite the URL in the config later on.

> I fully agree that our documentation and user education should
> stress that names must be unique and immutable throughout the
> history of a superproject, though.

This would be a good paragraph in this "background story" that this
patch tries to write. I'll add that.

Thanks,
Stefan

* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-14 22:10       ` Stefan Beller
@ 2017-02-14 22:17         ` Junio C Hamano
  2017-02-14 22:24           ` Stefan Beller
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2017-02-14 22:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Brandon Williams

Stefan Beller <sbeller@google.com> writes:

> On Tue, Feb 14, 2017 at 1:56 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> If we were to redesign the .gitmodules file, we might have it as
>>>
>>>     [submodule "path"]
>>>         url = git://example.org
>>>         branch = .
>>>         ...
>>>
>>> and the "path -> name/UID" mapping would be inside $GIT_DIR.
>>
>> I am not sure how you are going to keep track of that mapping,
>> though.  If .gitmodules file does not have a way to tell that what
>> used to be at "path" in its v1.0 is now at "htap" (instead the above
>> seems to assume there will just be an entry for [submodule "htap"]
>> in the newer version, without anything that links the old one with
>> the new one), how would the mapping inside $GIT_DIR know?
>
> It depends. Maybe git-mv could have rewritten the internal mapping
> as well.

And then after doing the "git mv" you have pushed the result, which
I pulled.  Now, how will your "internal mapping" propagate to me?

I also do not think "this is similar to file renames" holds water.
Moving the path a submodule bound to from one path to another is
done as a whole, and it is not like the blob contents where we need
to handle patch application that expresses a move as creation and
deletion of similar contents at two different paths.  We can afford
to be precise (after all, we are recording other information about
submodules by having an extra .gitmodules file).

In short, "name" is not a design mistake at all.  That needs to be
excised from the "background story".

* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-14 22:17         ` Junio C Hamano
@ 2017-02-14 22:24           ` Stefan Beller
  2017-02-14 22:39             ` Junio C Hamano
  2017-02-14 23:31             ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Beller @ 2017-02-14 22:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Brandon Williams

On Tue, Feb 14, 2017 at 2:17 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Tue, Feb 14, 2017 at 1:56 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Stefan Beller <sbeller@google.com> writes:
>>>
>>>> If we were to redesign the .gitmodules file, we might have it as
>>>>
>>>>     [submodule "path"]
>>>>         url = git://example.org
>>>>         branch = .
>>>>         ...
>>>>
>>>> and the "path -> name/UID" mapping would be inside $GIT_DIR.
>>>
>>> I am not sure how you are going to keep track of that mapping,
>>> though.  If .gitmodules file does not have a way to tell that what
>>> used to be at "path" in its v1.0 is now at "htap" (instead the above
>>> seems to assume there will just be an entry for [submodule "htap"]
>>> in the newer version, without anything that links the old one with
>>> the new one), how would the mapping inside $GIT_DIR know?
>>
>> It depends. Maybe git-mv could have rewritten the internal mapping
>> as well.
>
> And then after doing the "git mv" you have pushed the result, which
> I pulled.  Now, how will your "internal mapping" propagate to me?

The "name" inside your superproject's git dir may be different from mine;
after all, the name only serves to avoid duplicate git repositories
when renaming a submodule.

>
> I also do not think "this is similar to file renames" holds water.
> Moving the path a submodule bound to from one path to another is
> done as a whole, and it is not like the blob contents where we need
> to handle patch application that expresses a move as creation and
> deletion of similar contents at two different paths.  We can afford
> to be precise (after all, we are recording other information about
> submodules by having an extra .gitmodules file).
>
> In short, "name" is not a design mistake at all.  That needs to be
> excised from the "background story".

I am not saying it was a design mistake per se.

I claim that the exposure into .gitmodules combined with
the extreme similarity to its path is confusing. Maybe this
can be fixed by a different default name.

* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-14 22:24           ` Stefan Beller
@ 2017-02-14 22:39             ` Junio C Hamano
  2017-02-14 23:31             ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2017-02-14 22:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Brandon Williams

Stefan Beller <sbeller@google.com> writes:

>> And then after doing the "git mv" you have pushed the result, which
>> I pulled.  Now, how will your "internal mapping" propagate to me?
>
> The "name" inside your superproject's git dir may be different from mine,
> after all the name only serves the purpose to not have duplicate
> git repositories when renaming a submodule.

That is true, but you still need to convey "what I used to have at
'path' is now at 'htap'".  It is clear how to do so if we use "name"
in .gitmodules (you say "what we collectively call module A is now
at 'htap'").  I do not know how you do so without having a name.
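
A minimal Python sketch of that argument, assuming the name -> path
entries of two versions of .gitmodules have already been parsed into plain
dictionaries (the function name and the data layout are hypothetical, not
git internals):

```python
def moved_submodules(old, new):
    """Return {name: (old_path, new_path)} for every submodule name that
    appears in both versions but is bound to a different path."""
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


# name -> path, as recorded in .gitmodules at two points in history
v1 = {"A": "path"}
v2 = {"A": "htap"}
moves = moved_submodules(v1, v2)
print(moves)
```

Because both versions share the key "A", the move from 'path' to 'htap'
can be read off directly. If the sections were keyed by path instead,
the two versions would have no key in common and the link between them
would be lost.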

* Re: [RFC-PATCHv2] submodules: add a background story
  2017-02-14 22:24           ` Stefan Beller
  2017-02-14 22:39             ` Junio C Hamano
@ 2017-02-14 23:31             ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2017-02-14 23:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Brandon Williams

Stefan Beller <sbeller@google.com> writes:

> I claim that the exposure into .gitmodules combined with
> the extreme similarity to its path is confusing. Maybe this
> can be fixed by a different default name.

I think this may be worth thinking about further.

The names are something the end users are not supposed to change,
and one way to ensure that is to make .gitmodules file a binary
black box that can only be updated with a specialized tool---as long
as the tool does not allow updating the "name" field, you wouldn't
risk them mucking with it.  Limiting the update to a specialized
tool also would give us a single place to ensure that it is globally
unique across the history of the project (well, at least the part of
the history that is visible to your repository).

Of course, being "one way" to do so does not mean it is the only
way, or it is the best way.  Keeping the information in a text file
lets you merge them more easily when you add a submodule B while I
added a submodule C, for example, and having a human-readable name
lets us learn from the output of "git log -p .gitmodules" that the
repository of the "linux-kernel" submodule we use in our appliance
used to live at linux-2.6.git but has moved to linux.git over time
(for the latter use case to work well, we cannot change the name to
something unreadable by humans like uuid---discouraging people from
modifying and making them unreadable are two different things).

end of thread, other threads:[~2017-02-14 23:31 UTC | newest]

Thread overview: 10+ messages
2017-02-09  2:08 [RFC-PATCHv2] submodules: add a background story Stefan Beller
2017-02-09 23:32 ` Junio C Hamano
2017-02-14 21:46   ` Stefan Beller
2017-02-14 21:56     ` Junio C Hamano
2017-02-14 22:10       ` Stefan Beller
2017-02-14 22:17         ` Junio C Hamano
2017-02-14 22:24           ` Stefan Beller
2017-02-14 22:39             ` Junio C Hamano
2017-02-14 23:31             ` Junio C Hamano
2017-02-14  0:39 ` Brandon Williams
