git@vger.kernel.org mailing list mirror (one of many)
* [RFC/PATCH] submodules: overhaul documentation
@ 2017-06-07 18:53 Stefan Beller
  2017-06-13 19:29 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-07 18:53 UTC
  To: git; +Cc: Stefan Beller

This patch aims to disentangle (a) the usage of `git-submodule`
from (b) the concept of submodules, (c) what the actual
implementation looks like, such as where submodules are configured,
and (d) what the best practices are.

To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page, gitsubmodules(7). Like gitmodules(5),
gitattributes(5), gitcredentials(7), gitnamespaces(7) and
gittutorial(7), the new page introduces a concept rather than
explaining a specific command.

The moved text has been slightly restructured:
* Rewrite the first paragraph: "allows" is wrong, as you can keep
  untracked repositories inside another repository as well; what
  submodules add is tracking across versions. Also remove the short
  example, as there are examples later.

* Remove "that is completely separate" from the second sentence as
  that was said in the first sentence.

* Introduce the .gitmodules file in the third paragraph and mention the
  name as the basic requirement. The URL is optional, though strongly
  suggested; leave it out here, as gitmodules(5) explains the URL.

* The paragraphs about other mechanisms and implementation details
  are moved further down, as they are not as relevant to the concept of
  gitmodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

This is loosely a resend of [RFC-PATCHv2] submodules: add a background story
https://public-inbox.org/git/20170209020855.23486-1-sbeller@google.com/
but the new man page is completely reworked, so I expect it to go over
better, at least for the first half.

(The "data model" section begins to differ from reality, as it
mentions a new, not-yet-implemented place for submodule-related
configuration.)

Thanks,
Stefan

 Documentation/Makefile          |   1 +
 Documentation/git-rm.txt        |   4 +-
 Documentation/git-submodule.txt |  44 ++-------
 Documentation/gitsubmodules.txt | 214 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 227 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..2415e0d657 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index f1efc116eb..db444693dd 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a submodule's work
 tree from being removed.
 
 If you only want to remove the local checkout of a submodule from your
-work tree without committing the removal,
-use linkgit:git-submodule[1] `deinit` instead.
+work tree without committing the removal, use linkgit:git-submodule[1] `deinit`
+instead. Also see linkgit:gitsubmodules[7] for details on submodule removal.
 
 EXAMPLES
 --------
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 74bc6200d5..032590d828 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 -----------
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule. Directories
-that come from both projects can be cloned and checked out as a whole
-if you choose to go that route.
+For more information about submodules, see linkgit:gitsubmodules[7].
 
 COMMANDS
 --------
@@ -149,15 +119,17 @@ deinit [-f|--force] (--all|[--] <path>...)::
 	tree. Further calls to `git submodule update`, `git submodule foreach`
 	and `git submodule sync` will skip any unregistered submodules until
 	they are initialized again, so use this command if you don't want to
-	have a local checkout of the submodule in your working tree anymore. If
-	you really want to remove a submodule from the repository and commit
-	that use linkgit:git-rm[1] instead.
+	have a local checkout of the submodule in your working tree anymore.
 +
 When the command is run without pathspec, it errors out,
 instead of deinit-ing everything, to prevent mistakes.
 +
 If `--force` is specified, the submodule's working tree will
 be removed even if it contains local modifications.
++
+If you really want to remove a submodule from the repository and commit
+that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
+options.
 
 update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
 +
@@ -435,6 +407,10 @@ This file should be formatted in the same way as `$GIT_DIR/config`. The key
 to each submodule url is "submodule.$name.url".  See linkgit:gitmodules[5]
 for details.
 
+SEE ALSO
+--------
+linkgit:gitsubmodules[7], linkgit:gitmodules[5].
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
new file mode 100644
index 0000000000..2bf3149b68
--- /dev/null
+++ b/Documentation/gitsubmodules.txt
@@ -0,0 +1,214 @@
+gitsubmodules(7)
+================
+
+NAME
+----
+gitsubmodules - mounting one repository inside another
+
+SYNOPSIS
+--------
+.gitmodules, $GIT_DIR/config
+------------------
+git submodule
+git <command> --recurse-submodules
+------------------
+
+DESCRIPTION
+-----------
+
+A submodule is another Git repository tracked in a subdirectory of your
+repository. The tracked repository has its own history, which does not
+interfere with the history of the current repository.
+
+Submodules are composed from a so-called `gitlink` tree entry
+in the main repository that refers to a particular commit object
+within the inner repository.
+
+Additionally to the gitlink entry the `.gitmodules` file (see
+linkgit:gitmodules[5]) at the root of the source tree contains
+information needed for submodules. The only required information
+is the path setting, which estabishes a logical name for the submodule.
+
+The usual git configuration (see linkgit:git-config[1]) can be used to
+override settings given by the `.gitmodules` file.
+
+Submodules can be used for two different use cases:
+
+1. Using another project that stands on its own.
+  When you want to use a third party library, submodules allow you to
+  have a clean history for your own project as well as for the library.
+  This also allows for updating the third party library as needed.
+
+2. Artificially split a (logically single) project into multiple
+   repositories and tying them back together. This can be used to
+   overcome deficiences in the data model of Git, such as:
+
+* To have finer grained access control.
+  The design principles of Git do not allow for partial repositories to be
+  checked out or transferred. A repository is the smallest unit that a user
+  can be given access to. Submodules are separate repositories, such that
+  you can restrict access to parts of your project via the use of submodules.
+* In its current form Git scales up poorly for very large repositories that
+  change a lot, as the history grows very large. For that you may want to look
+  at shallow clone, sparse checkout, or git-LFS.
+  However you can also use submodules to e.g. hold large binary assets
+  and these repositories are then shallowly cloned such that you do not
+  have a large history locally.
+
+The data model
+--------------
+
+A submodule can be considered its own autonomous repository, that has a
+worktree and a git directory at a different place than the superproject.
+
+The superproject only records the commit object name in its tree, such that
+any other information, e.g. where to obtain a copy from, is not recorded
+in the core data structures of Git. The porcelain layer of Git however
+makes use of the `.gitmodules` file that gives hints where and how to
+obtain a copy of the submodule git repository from.
+
+Submodule operations can be configured using the following mechanisms
+(from highest to lowest precedence):
+
+ * the command line for those commands that support taking submodule specs.
+
+ * the configuration file `$GIT_DIR/config`.
+
+ * the configuration file `config` found in the `refs/submodule/config` branch.
+   This can be used to overwrite the upstream configuration in the `.gitmodules`
+   file without changing the history of the project.
+   Useful options here are overwriting the base, where relative URLs apply to,
+   when mirroring only parts of the larger collection of submodules.
+
+ * the `.gitmodules` file inside the repository. A project usually includes this
+   file to suggest defaults for the upstream collection of repositories.
+
+On the location of the git directory
+------------------------------------
+
+Since v1.7.7 of Git, the git directory of submodules is stored inside the
+superprojects git directory at $GIT_DIR/modules/<submodule-name>
+This location allows for the working tree to be non existent while keeping
+the history around. So we can use `git-rm` on a submodule without loosing
+information that may only be local; it is also possible to checkout the
+superproject before and after the deletion of the submodule without the
+need to reclone the submodule as it is kept locally.
+
+Workflow for a third party library
+----------------------------------
+
+  # add the submodule
+  git submodule add <url> <path>
+
+  # occasionally update the submodule to a new version:
+  git -C <path> checkout <new version>
+  git add <path>
+  git commit -m "update submodule to new version"
+
+  # see the discussion below on deleting submodules
+
+
+Workflow for an artifically split repo
+--------------------------------------
+
+  # Enable recursion for relevant commands, such that
+  # regular commands recurse into submodules by default
+  git config --global submodule.recurse true
+
+  # Unlike the other commands below clone still needs
+  # its own recurse flag:
+  git clone --recurse <URL> <directory>
+  cd <directory>
+
+  # Get to know the code:
+  git grep foo
+  git ls-files
+
+  # Get new code
+  git fetch
+  git pull --rebase
+
+  # change worktree
+  git checkout
+  git reset
+
+Deleting a submodule
+--------------------
+
+Deleting a submodule can happen on different levels:
+
+1) Removing it from the local working tree without tampering with
+   the history of the superproject.
+
+You may no longer need the submodule, but still want to keep it recorded
+in the superproject history as others may have use for it.
+--
+  git submodule deinit <submodule path>
+--
+will remove the configuration entries
+as well as the work
+
+2) Remove it from history:
+--
+   git rm <submodule>
+--
+
+3) Remove the submodules git directory:
+
+When you also want to free up the disk space that the submodules git
+directory uses, you have to delete it manually. It is found in
+`$GIT_DIR/modules`.
+The steps 1 and 2 can be undone via `git submodule init` or
+`git revert`, respectively.  This step may incur data loss,
+and cannot be undone. That is why there is no builtin.
+
+Other mechanisms
+----------------
+
+Git repositories are allowed to be kept inside other repositories without
+the need to use submodules. This however does not enable cross-repository
+versioning as the inner repository is unaware of the outer repository,
+which in turn ignores the inner.
+
+Submodules are not to be confused with remotes, which are other
+repositories of the same project; submodules are meant for
+different projects you would like to make part of your source tree,
+while the history of the two projects still stays completely
+independent and you cannot modify the contents of the submodule
+from within the main project.
+If you want to merge the project histories and want to treat the
+aggregated whole as a single project from then on, you may want to
+add a remote for the other project and use the 'subtree' merge strategy,
+instead of treating the other project as a submodule. Directories
+that come from both projects can be cloned and checked out as a whole
+if you choose to go that route.
+
+Third party tools
+-----------------
+
+There are a variety of third party tools that manage multiple repositories
+and their relationships to each other, such as Androids repo tool or git-slave.
+Often these tools lack cross repository versioning.
+
+https://source.android.com/source/using-repo
+
+http://gitslave.sourceforge.net/
+
+Implementation details
+----------------------
+
+When cloning or pulling a repository containing submodules the submodules
+will not be checked out by default; You can instruct 'clone' to recurse
+into submodules. The 'init' and 'update' subcommands of 'git submodule'
+will maintain submodules checked out and at an appropriate revision in
+your working tree. Alternatively you can set 'submodule.recurse' to have
+'checkout' recursing into submodules.
+
+
+SEE ALSO
+--------
+linkgit:git-submodule[1], linkgit:gitmodules[5].
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
2.13.0.17.gf3d7728391



* Re: [RFC/PATCH] submodules: overhaul documentation
  2017-06-07 18:53 [RFC/PATCH] submodules: overhaul documentation Stefan Beller
@ 2017-06-13 19:29 ` Junio C Hamano
  2017-06-13 21:06   ` Stefan Beller
  2017-06-20 18:18 ` Jonathan Tan
  2017-06-20 22:56 ` [PATCHv2] " Stefan Beller
  2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2017-06-13 19:29 UTC
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> @@ -149,15 +119,17 @@ deinit [-f|--force] (--all|[--] <path>...)::
>  	tree. Further calls to `git submodule update`, `git submodule foreach`
>  	and `git submodule sync` will skip any unregistered submodules until
>  	they are initialized again, so use this command if you don't want to
> -	have a local checkout of the submodule in your working tree anymore. If
> -	you really want to remove a submodule from the repository and commit
> -	that use linkgit:git-rm[1] instead.
> +	have a local checkout of the submodule in your working tree anymore.
>  +
>  When the command is run without pathspec, it errors out,
>  instead of deinit-ing everything, to prevent mistakes.
>  +
>  If `--force` is specified, the submodule's working tree will
>  be removed even if it contains local modifications.
> ++
> +If you really want to remove a submodule from the repository and commit
> +that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
> +options.

Good reorganization.

> diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
> new file mode 100644
> index 0000000000..2bf3149b68
> --- /dev/null
> +++ b/Documentation/gitsubmodules.txt
> @@ -0,0 +1,214 @@
> +gitsubmodules(7)
> +================
> +
> +NAME
> +----
> +gitsubmodules - mounting one repository inside another
> +
> +SYNOPSIS
> +--------
> +.gitmodules, $GIT_DIR/config
> +------------------
> +git submodule
> +git <command> --recurse-submodules
> +------------------
> +
> +DESCRIPTION
> +-----------
> +
> +A submodule is another Git repository tracked in a subdirectory of your
> +repository. The tracked repository has its own history, which does not
> +interfere with the history of the current repository.

"tracked in a subdirectory" sounds as if your top-level superproject
has a dedicated submodules/ directory and in it there live a bunch
of submodules.  Which obviously is not what you meant.  If phrased
"tracked as a subdirectory", I think the sentence makes sense.

While "which does not interfere" may be technically correct, I am
not sure what the value of saying that is.

> +Submodules are composed from a so-called `gitlink` tree entry
> +in the main repository that refers to a particular commit object
> +within the inner repository.

Correct, but it may be unclear to the readers why we do so.  Perhaps

        ... and this way, the tree of each commit in the main repository
        "knows" which commit from the submodule's history is "tied" to it.

or something like that?
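A minimal shell sketch of that point, assuming a POSIX shell, hypothetical repository names `lib` and `super`, and a throwaway identity. Recent Git releases restrict the `file` transport for submodule clones, hence the `-c protocol.file.allow=always`; on older versions the setting is simply ignored. The superproject's tree entry for the submodule turns out to be a gitlink: mode 160000, type "commit", naming the submodule commit tied to that tree.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# A stand-alone repository that will become the submodule ("lib" is hypothetical)
git init -q lib
echo 'int answer = 42;' >lib/lib.c
git -C lib add lib.c
git -C lib commit -q -m 'lib: initial'
lib_commit=$(git -C lib rev-parse HEAD)

# The superproject adds it at the path "lib"; local-path submodule
# clones need the file transport allowed explicitly on recent Git
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$PWD/lib" lib
git -C super commit -q -m 'add lib as a submodule'

# The tree entry for "lib": mode 160000, type "commit", and the
# submodule commit that this superproject tree is tied to
git -C super ls-tree HEAD lib
entry=$(git -C super ls-tree HEAD lib | awk '{ print $1, $2, $3 }')
```

Only the commit name is stored in the tree; where to fetch that commit from is left to `.gitmodules` and the configuration.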

> +Additionally to the gitlink entry the `.gitmodules` file (see
> +linkgit:gitmodules[5]) at the root of the source tree contains
> +information needed for submodules.

Is that really true?  Each submodule does not *need* what is in
.gitmodules; the top-level superproject needs to learn about
its submodules from the contents of that file, though.

> +The only required information
> +is the path setting, which estabishes a logical name for the submodule.

The phrase "the path setting" feels a bit unfortunate.  Is that
"only" thing we need?  Without URL we have no way to populate it,
no?

> +The usual git configuration (see linkgit:git-config[1]) can be used to
> +override settings given by the `.gitmodules` file.
> +
> +Submodules can be used for two different use cases:
> +
> +1. Using another project that stands on its own.
> +  When you want to use a third party library, submodules allow you to
> +  have a clean history for your own project as well as for the library.
> +  This also allows for updating the third party library as needed.
> +
> +2. Artificially split a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome deficiences in the data model of Git, such as:

s/deficiences in the data model/current limitations/ perhaps?

> +* To have finer grained access control.
> +  The design principles of Git do not allow for partial repositories to be
> +  checked out or transferred. A repository is the smallest unit that a user
> +  can be given access to. Submodules are separate repositories, such that
> +  you can restrict access to parts of your project via the use of submodules.

Some servers implement per-branch access control that seems to work
rather well.  Given that "shallow history" is possible (i.e. you
could give one commit without exposing older parts of the history),
I think the limitation this paragraph refers to is that "a tree is
the smallest unit that the user can be given access to."
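To illustrate the shallow-history half of that remark, here is a minimal sketch, assuming a POSIX shell and a hypothetical repository named `upstream`. It shows a shallow transfer only, not server-side access control: a depth-1 clone receives the tip commit without the older history. `--depth` is ignored for plain local-path clones, so the sketch uses a `file://` URL.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# A hypothetical upstream repository with three commits of history
git init -q upstream
for n in 1 2 3; do
    echo "version $n" >upstream/file.txt
    git -C upstream add file.txt
    git -C upstream commit -q -m "commit $n"
done

# A shallow clone receives only the tip commit
git clone -q --depth 1 "file://$PWD/upstream" shallow
full=$(git -C upstream rev-list --count HEAD)
shallow=$(git -C shallow rev-list --count HEAD)
echo "upstream has $full commits, the shallow clone has $shallow"
```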

> +* In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large. For that you may want to look
> +  at shallow clone, sparse checkout, or git-LFS.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.

This is why I suggest "current limitations"; this is not about
deficiency in the data model.

> +A submodule can be considered its own autonomous repository, that has a
> +worktree and a git directory at a different place than the superproject.

"Its own" I agree, but autonomous?

The mention of "main repository" in the earlier part of the document
may want to use the same phrase "superproject".

> +The superproject only records the commit object name in its tree, such that
> +any other information, e.g. where to obtain a copy from, is not recorded
> +in the core data structures of Git. The porcelain layer of Git however
> +makes use of the `.gitmodules` file that gives hints where and how to
> +obtain a copy of the submodule git repository from.

OK.

> +On the location of the git directory
> +------------------------------------
> +
> +Since v1.7.7 of Git, the git directory of submodules is stored inside the
> +superprojects git directory at $GIT_DIR/modules/<submodule-name>
> +This location allows for the working tree to be non existent while keeping
> +the history around. So we can use `git-rm` on a submodule without loosing

s/git-rm/git rm/
s/loosing/losing/
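A minimal sketch of the location described in the quoted paragraph, assuming a POSIX shell, hypothetical repository names `lib` and `super`, and the default submodule name (equal to its path). Recent Git needs `-c protocol.file.allow=always` for local-path submodule clones. Inside the submodule's working tree, `.git` is a plain file that points into the superproject's `$GIT_DIR/modules/<submodule-name>`.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q lib
echo hello >lib/README
git -C lib add README
git -C lib commit -q -m 'lib: initial'

git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$PWD/lib" lib

# ".git" in the submodule's working tree is a file (a "gitfile");
# it prints something like "gitdir: ../.git/modules/lib"
cat super/lib/.git

# Resolve both locations to physical paths and compare them
sub_gitdir=$(cd super/lib && cd "$(git rev-parse --git-dir)" && pwd -P)
modules_dir=$(cd super/.git/modules/lib && pwd -P)
echo "submodule git dir: $sub_gitdir"
```

Because the history lives under the superproject's git directory, removing the working tree (for example with `git rm`) does not by itself discard it.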

> +Workflow for a third party library
> +----------------------------------
> +
> +  # add the submodule
> +  git submodule add <url> <path>
> +
> +  # occasionally update the submodule to a new version:
> +  git -C <path> checkout <new version>
> +  git add <path>
> +  git commit -m "update submodule to new version"

OK.
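A runnable version of the quoted third-party workflow, as a sketch under these assumptions: a POSIX shell, hypothetical names `upstream-lib` (the library) and `super` (the project using it), and a hypothetical tag `v2` for the new version. One step is implicit in the quoted commands: the new version must be fetched into the submodule before it can be checked out there.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Hypothetical upstream library with a first release
git init -q upstream-lib
echo v1 >upstream-lib/VERSION
git -C upstream-lib add VERSION
git -C upstream-lib commit -q -m 'release 1'

# add the submodule (local-path clones need the file transport allowed)
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$PWD/upstream-lib" lib
git -C super commit -q -m 'add lib as a submodule'

# Later, upstream publishes a new version, tagged v2 here
echo v2 >upstream-lib/VERSION
git -C upstream-lib commit -q -a -m 'release 2'
git -C upstream-lib tag v2

# occasionally update the submodule to a new version: fetch it into
# the submodule, check it out, then record the new gitlink
git -C super/lib fetch -q --tags
git -C super/lib checkout -q v2
git -C super add lib
git -C super commit -q -m 'update submodule to new version'

recorded=$(git -C super rev-parse HEAD:lib)
```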

> +Workflow for an artifically split repo
> +--------------------------------------
> +
> +  # Enable recursion for relevant commands, such that
> +  # regular commands recurse into submodules by default
> +  git config --global submodule.recurse true
> +
> +  # Unlike the other commands below clone still needs
> +  # its own recurse flag:
> +  git clone --recurse <URL> <directory>
> +  cd <directory>
> +
> +  # Get to know the code:
> +  git grep foo
> +  git ls-files
> +
> +  # Get new code
> +  git fetch
> +  git pull --rebase
> +
> +  # change worktree
> +  git checkout
> +  git reset

This part is interesting ;-)
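Part of the quoted split-repository workflow can be tried with a small sketch, assuming a POSIX shell and hypothetical names `component`, `super` and `work`. It spells out `--recurse-submodules` on both `clone` and `grep` rather than relying on `submodule.recurse`, whose coverage of commands depends on the Git version. It shows that the clone needs its own flag to populate submodules, and that `grep` can then search into them.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# One hypothetical component repository containing the search token
git init -q component
echo 'frobnicate()' >component/api.c
git -C component add api.c
git -C component commit -q -m 'component: initial'

# The superproject ties the component in at "component"
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$PWD/component" component
git -C super commit -q -m 'add component'

# clone needs its own flag to populate the submodules
git -c protocol.file.allow=always clone -q --recurse-submodules "$PWD/super" work

# Search the superproject together with its submodules
git -C work grep --recurse-submodules -n frobnicate
hits=$(git -C work grep --recurse-submodules -l frobnicate)
```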

> +Deleting a submodule
> +--------------------
> +
> +Deleting a submodule can happen on different levels:
> +
> +1) Removing it from the local working tree without tampering with
> +   the history of the superproject.
> +
> +You may no longer need the submodule, but still want to keep it recorded
> +in the superproject history as others may have use for it.
> +--
> +  git submodule deinit <submodule path>
> +--
> +will remove the configuration entries
> +as well as the work

Do we have an adjective used for submodules that are checked out
vs deleted in this manner (I am thinking of "active" from earlier
work by Brandon)?  Do we want to mention it around here?

> +2) Remove it from history:
> +--
> +   git rm <submodule>
> +--

Is this removing from "history"?  Isn't it merely removing it from
the index of the superproject (hence potentially removing it from
the tree of the upcoming commit in the superproject)?
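A minimal sketch supporting that reading, assuming a POSIX shell and hypothetical repository names `lib` and `super`. After `git rm` and a commit, the new tree no longer has the gitlink, but the previous commit still does, and the submodule's git directory under `.git/modules` is kept.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q lib
echo hello >lib/README
git -C lib add README
git -C lib commit -q -m 'lib: initial'

git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$PWD/lib" lib
git -C super commit -q -m 'add lib'

# "git rm" stages the removal of the gitlink (and of its .gitmodules
# entry); the commit drops it from the new tree only
git -C super rm -q lib
git -C super commit -q -m 'remove lib'

now=$(git -C super ls-tree HEAD lib)
before=$(git -C super ls-tree HEAD~1 lib)
echo "HEAD:   ${now:-<no entry>}"
echo "HEAD~1: $before"
```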

> +3) Remove the submodules git directory:
> +
> +When you also want to free up the disk space that the submodules git
> +directory uses, you have to delete it manually. It is found in
> +`$GIT_DIR/modules`.
> +The steps 1 and 2 can be undone via `git submodule init` or
> +`git revert`, respectively.  This step may incur data loss,
> +and cannot be undone. That is why there is no builtin.

Perhaps "deinit" can learn an option to do this (tangent).  When you
are a follower, it is OK to do so.

When you are removing the only copy of the repository, of course
there will be some data loss ;-)
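The manual step can be sketched as follows, assuming a POSIX shell and hypothetical repository names `lib` and `super`; no builtin is implied, the deletion is a plain `rm -rf`. In this follower setup nothing is lost, since the upstream still has every commit and `git submodule update --init` can clone the submodule again.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

git init -q lib
echo hello >lib/README
git -C lib add README
git -C lib commit -q -m 'lib: initial'

git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$PWD/lib" lib
git -C super commit -q -m 'add lib'

# Level 1: unregister the submodule and clear its working tree
git -C super submodule --quiet deinit lib

# Level 3: delete the submodule's git directory by hand to free the
# disk space; commits that existed only there would be lost for good
(cd super && rm -rf "$(git rev-parse --git-path modules/lib)")
gone=no
[ -e super/.git/modules/lib ] || gone=yes

# The upstream still has every commit, so the submodule can be
# cloned again from scratch
git -C super -c protocol.file.allow=always submodule --quiet update --init
```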


> +Other mechanisms
> +----------------
> +
> +Git repositories are allowed to be kept inside other repositories without
> +the need to use submodules. This however does not enable cross-repository
> +versioning as the inner repository is unaware of the outer repository,
> +which in turn ignores the inner.

s/the inner/& repository/;

> +Submodules are not to be confused with remotes, which are other
> +repositories of the same project; submodules are meant for
> +different projects you would like to make part of your source tree,
> +while the history of the two projects still stays completely
> +independent and you cannot modify the contents of the submodule
> +from within the main project.

Would anybody make such a confusion, though?  Perhaps drop the first
sentence up to ';' in a follow-up patch?

> +If you want to merge the project histories and want to treat the
> +aggregated whole as a single project from then on, you may want to
> +add a remote for the other project and use the 'subtree' merge strategy,
> +instead of treating the other project as a submodule. Directories
> +that come from both projects can be cloned and checked out as a whole
> +if you choose to go that route.

While it is correct, is this something we want to mention in
gitsubmodules.txt?  It sounds more like what "git merge" should say,
if we wanted to.

Thanks.


* Re: [RFC/PATCH] submodules: overhaul documentation
  2017-06-13 19:29 ` Junio C Hamano
@ 2017-06-13 21:06   ` Stefan Beller
  2017-06-19 18:10     ` Brandon Williams
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Beller @ 2017-06-13 21:06 UTC
  To: Junio C Hamano, Brandon Williams, Jonathan Nieder; +Cc: git@vger.kernel.org

Adding two native speakers as we start wordsmithing.

On Tue, Jun 13, 2017 at 12:29 PM, Junio C Hamano <gitster@pobox.com> wrote:

>> +
>> +A submodule is another Git repository tracked in a subdirectory of your
>> +repository. The tracked repository has its own history, which does not
>> +interfere with the history of the current repository.
>
> "tracked in a subdirectory" sounds as if your top-level superproject
> has a dedicated submodules/ directory and in it there live a bunch
> of submodules.  Which obviously is not what you meant.  If phrased
> "tracked as a subdirectory", I think the sentence makes sense.

Given this explanation, "as a" also sounds wrong[1]. Maybe we need to
separate (1) where it is put/mounted and (2) the fact that it is tracked,
i.e. the superproject has an idea of what should be there at a given
revision. (I briefly thought about s/as a/using/ in the above, but):

  A submodule is another Git repository at an arbitrary place inside
  the working tree, and also tracked. The tracked repository has its
  own history, which does not interfere with the history of the current
  repository.

[1] http://www.thesaurus.com/browse/as

>
> While "which does not interfere" may be technically correct, I am
> not sure what the value of saying that is.

I think we can drop it here. When writing I wanted to separate it from
subtrees, but this is the wrong place for that.

>
>> +Submodules are composed from a so-called `gitlink` tree entry
>> +in the main repository that refers to a particular commit object
>> +within the inner repository.
>
> Correct, but it may be unclear to the readers why we do so.  Perhaps
>
>         ... and this way, the tree of each commit in the main repository
>         "knows" which commit from the submodule's history is "tied" to it.
>
> or something like that?

sounds good to me.

>
>> +Additionally to the gitlink entry the `.gitmodules` file (see
>> +linkgit:gitmodules[5]) at the root of the source tree contains
>> +information needed for submodules.
>
> Is that really true?  Each submodule do not *need* what is in
> .gitmodules; the top-level superproject needs to learn about
> its submodules from the contents of that file, though.

Ha! The elided words in my mind were:

 ... information needed for submodules [to work in the superproject].

But maybe we need to reword that as

  Additionally to the gitlink entry the `.gitmodules` file (see
  linkgit:gitmodules[5]) at the root of the source tree contains
  information on how to handle submodules.

I'd like to keep this part short and not go into detail.

>
>> +The only required information
>> +is the path setting, which estabishes a logical name for the submodule.
>
> The phrase "the path setting" feels a bit unfortunate.  Is that
> "only" thing we need?  Without URL we have no way to populate it,
> no?

    git config -f .gitmodules submodule.foo.path foo
    git config submodule.foo.url example.org/foo
    git submodule update --init

ought to work just fine. It is not the recommended way of working,
but it should work.
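A self-contained sketch of the three commands above, assuming a POSIX shell, a hypothetical upstream directory `foo-upstream`, and a superproject `super`. The gitlink is recorded by hand with `git update-index --cacheinfo` so that `.gitmodules` carries nothing but the path; the URL exists only in the local `.git/config`. Recent Git also needs `-c protocol.file.allow=always` to clone a submodule from a local path.

```shell
set -eu
# Throwaway identity and scratch directory (assumptions for the sketch)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
top=$(pwd)

# Hypothetical upstream for the submodule "foo"
git init -q foo-upstream
echo data >foo-upstream/file
git -C foo-upstream add file
git -C foo-upstream commit -q -m init
foo_commit=$(git -C foo-upstream rev-parse HEAD)

git init -q super
cd super
# Record the gitlink by hand, without "git submodule add"
git update-index --add --cacheinfo "160000,$foo_commit,foo"
# .gitmodules carries only the path (and thereby the name "foo")
git config -f .gitmodules submodule.foo.path foo
git add .gitmodules
git commit -q -m 'add submodule foo (path only)'

# The URL lives only in this repository's local configuration
git config submodule.foo.url "$top/foo-upstream"
git -c protocol.file.allow=always submodule --quiet update --init
populated=$(git -C foo rev-parse HEAD)
```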

I think (in the far future) we actually should have only the path
information in-tree and *any* other information, including the URL,
outside the tree.

See [2], where I describe how I'd like to shape the future:

  $ cat .gitmodules
  [submodule "sub42"]
    path = foo
  # path only in tree!

  $ cat .git/config
  ...
  [submodule]
    active = .
    active = :(exclude)Irrelevant/submodules/for/my/usecase/*
  # note how this is user centric

  $ git show refs/meta/magic/for/refs/heads/master:.gitmodules
  [submodule "sub42"]
    url = https://example.org/foo
    branch = .
  # Note how this is neither centering on the in-tree
  # contents, nor the user. Instead it focuses on the
  # project or group. It is *workflow* centric.
  # Workflows may change over time, e.g. the url could
  # be repointed to k.org or an in-house mirror without tree
  # changes.

Jonathan pointed out that the ref name is poorly chosen, but conceptually
I would want to keep the URL setting outside the tree. The URL may
change over time, independently of the history currently checked out
(think of bisect, which includes a "submodule update --init" step to bisect
across a fully populated superproject 'at the time').

[2] https://public-inbox.org/git/CAGZ79kbbTwQicVkRs51fV91R_7ZhDtC+FR8Z-SQzRpF2cjFfag@mail.gmail.com/




>
>> +The usual git configuration (see linkgit:git-config[1]) can be used to
>> +override settings given by the `.gitmodules` file.
>> +
>> +Submodules can be used for two different use cases:
>> +
>> +1. Using another project that stands on its own.
>> +  When you want to use a third party library, submodules allow you to
>> +  have a clean history for your own project as well as for the library.
>> +  This also allows for updating the third party library as needed.
>> +
>> +2. Artificially split a (logically single) project into multiple
>> +   repositories and tying them back together. This can be used to
>> +   overcome deficiences in the data model of Git, such as:
>
> s/deficiences in the data model/current limitations/ perhaps?

makes sense.

>
>> +* To have finer grained access control.
>> +  The design principles of Git do not allow for partial repositories to be
>> +  checked out or transferred. A repository is the smallest unit that a user
>> +  can be given access to. Submodules are separate repositories, such that
>> +  you can restrict access to parts of your project via the use of submodules.
>
> Some servers implement per-branch access control that seems to work
> rather well.

True. So maybe s/partial repository/partial working tree/

> Given that "shallow history" is possible (i.e. you
> could give one commit without exposing older parts of the history),
> I think the limitation this paragrah refers to is that "a tree is
> the smallest unit that the user can be given access to."

Yes. Though in theory (with the work on omitting blobs and potentially trees)
we could omit parts of trees as well and just tell the user they cannot have them.

>
>> +* In its current form Git scales up poorly for very large repositories that
>> +  change a lot, as the history grows very large. For that you may want to look
>> +  at shallow clone, sparse checkout, or git-LFS.
>> +  However you can also use submodules to e.g. hold large binary assets
>> +  and these repositories are then shallowly cloned such that you do not
>> +  have a large history locally.
>
> This is why I suggest "current limitations"; this is not about
> deficiency in the data model.

ok.

>
>> +A submodule can be considered its own autonomous repository, that has a
>> +worktree and a git directory at a different place than the superproject.
>
> "Its own" I agree, but autonomous?

I'll drop that word.


>> +Workflow for an artifically split repo
>> +--------------------------------------
>> +
...
>> +
>> +  # change worktree
>> +  git checkout
>> +  git reset
>
> This part is interesting ;-)

and the problem is this is still in flux ...


>
>> +Deleting a submodule
>> +--------------------
>> +
>> +Deleting a submodule can happen on different levels:
>> +
>> +1) Removing it from the local working tree without tampering with
>> +   the history of the superproject.
>> +
>> +You may no longer need the submodule, but still want to keep it recorded
>> +in the superproject history as others may have use for it.
>> +--
>> +  git submodule deinit <submodule path>
>> +--
>> +will remove the configuration entries
>> +as well as the work
>
> Do we have an adjective used for submodules that are checked out
> vs deleted in this manner (I am thinking of "active" from earlier
> work by Brandon)?  Do we want to mention it around here?

We'd want to propagate "active" more throughout our documentation,
too.

I think this state would be called "unpopulated" (as in: the working
tree is not populated; this gives no hint whether the git dir of the
submodule exists).

>
>> +2) Remove it from history:
>> +--
>> +   git rm <submodule>
>> +--
>
> Is this removing from "history"?  Isn't it merely removing it from
> the index of the superproject (hence potentially removing it from
> the tree of the upcoming commit in the superproject)?

True.

>
>> +3) Remove the submodules git directory:
>> +
>> +When you also want to free up the disk space that the submodules git
>> +directory uses, you have to delete it manually. It is found in
>> +`$GIT_DIR/modules`.
>> +The steps 1 and 2 can be undone via `git submodule init` or
>> +`git revert`, respectively.  This step may incur data loss,
>> +and cannot be undone. That is why there is no builtin.
>
> Perhaps "deinit" can learn an option to do this (tangent).  When you
> are a follower, it is OK to do so.
>
> When you are removing the only copy of the repository, of course
> there will be some data loss ;-)

Good point. deinit seems to be the logical place to put it.
Although we could also argue against hiding it behind a flag of deinit,
and for a new subcommand "git submodule delete" that removes
the working tree and the git dir, but not the gitlink.

>> +Other mechanisms
>> +----------------
>> +
>> +Git repositories are allowed to be kept inside other repositories without
>> +the need to use submodules. This however does not enable cross-repository
>> +versioning as the inner repository is unaware of the outer repository,
>> +which in turn ignores the inner.
>
> s/the inner/& repository/;
>
>> +Submodules are not to be confused with remotes, which are other
>> +repositories of the same project; submodules are meant for
>> +different projects you would like to make part of your source tree,
>> +while the history of the two projects still stays completely
>> +independent and you cannot modify the contents of the submodule
>> +from within the main project.
>
> Would anybody make such a confusion, though?  Perhaps drop the first
> sentence up to ';' in a follow-up patch?

This code was moved from the current git-submodule man page.
I questioned this confusion as well. Maybe this was confusing when
it was new?

Will remove.

>
>> +If you want to merge the project histories and want to treat the
>> +aggregated whole as a single project from then on, you may want to
>> +add a remote for the other project and use the 'subtree' merge strategy,
>> +instead of treating the other project as a submodule. Directories
>> +that come from both projects can be cloned and checked out as a whole
>> +if you choose to go that route.
>
> While it is correct, is this something we want to mention in
> gitsubmodule.txt?  It sounds more like what "git merge" should say,
> if we wanted to.

The "Other mechanisms" section is meant to point out everything that
is useful for slightly different use cases, which would include
subtrees, wouldn't it?

>
> Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC/PATCH] submodules: overhaul documentation
  2017-06-13 21:06   ` Stefan Beller
@ 2017-06-19 18:10     ` Brandon Williams
  2017-06-20 21:42       ` Stefan Beller
  0 siblings, 1 reply; 17+ messages in thread
From: Brandon Williams @ 2017-06-19 18:10 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Jonathan Nieder, git@vger.kernel.org

On 06/13, Stefan Beller wrote:
> Adding two native speakers as we start word smithing.
> 
> On Tue, Jun 13, 2017 at 12:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
> 
> >> +
> >> +A submodule is another Git repository tracked in a subdirectory of your
> >> +repository. The tracked repository has its own history, which does not
> >> +interfere with the history of the current repository.
> >
> > "tracked in a subdirectory" sounds as if your top-level superproject
> > has a dedicated submodules/ directory and in it there live a bunch
> > of submodules.  Which obviously is not what you meant.  If phrased
> > "tracked as a subdirectory", I think the sentence makes sense.
> 
> Given this explanation "as a" also sounds wrong[1], maybe we need to
> separate (1) where it is put/mounted and (2) the fact that is tracked,
> i.e. the superproject has an idea of what should be there at a given
> revision. (I shortly thought about /s/as a/using/ in the above, but):
> 
>   A submodule is another Git repository at an arbitrary place inside
>   the working tree, and also tracked. The tracked repository has its
>   own history, which does not interfere with the history of the current
>   repository.

I would probably change the first sentence to:

  A submodule is another Git repository tracked at an arbitrary place
  inside the working tree.

> 
> [1] http://www.thesaurus.com/browse/as
> 
> >
> > While "which does not interfere" may be technically correct, I am
> > not sure what the value of saying that is.
> 
> I think we can drop it here. When writing I wanted to separate it from
> subtrees, but this is the wrong place for that.
> 
> >
> >> +Submodules are composed from a so-called `gitlink` tree entry
> >> +in the main repository that refers to a particular commit object
> >> +within the inner repository.
> >
> > Correct, but it may be unclear to the readers why we do so.  Perhaps
> >
> >         ... and this way, the tree of each commit in the main repository
> >         "knows" which commit from the submodule's history is "tied" to it.
> >
> > or something like that?
> 
> sounds good to me.
> 
> >
> >> +Additionally to the gitlink entry the `.gitmodules` file (see
> >> +linkgit:gitmodules[5]) at the root of the source tree contains
> >> +information needed for submodules.
> >
> > Is that really true?  Each submodule do not *need* what is in
> > .gitmodules; the top-level superproject needs to learn about
> > its submodules from the contents of that file, though.
> 
> Ha! The ediled words in my mind were:
> 
>  ... information needed for submodules [to work in the superproject].
> 
> But maybe we need to reword that as
> 
>   Additionally to the gitlink entry the `.gitmodules` file (see
>   linkgit:gitmodules[5]) at the root of the source tree contains
>   information on how to handle submodules.

This sounds slightly awkward.  Maybe:

    In addition to the gitlink entry, the `.gitmodules` file (see
    linkgit:gitmodules[5]) at the root of the source tree contains
    information on how to handle submodules.


-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC/PATCH] submodules: overhaul documentation
  2017-06-07 18:53 [RFC/PATCH] submodules: overhaul documentation Stefan Beller
  2017-06-13 19:29 ` Junio C Hamano
@ 2017-06-20 18:18 ` Jonathan Tan
  2017-06-20 19:15   ` Stefan Beller
  2017-06-20 22:56 ` [PATCHv2] " Stefan Beller
  2 siblings, 1 reply; 17+ messages in thread
From: Jonathan Tan @ 2017-06-20 18:18 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Wed,  7 Jun 2017 11:53:54 -0700
Stefan Beller <sbeller@google.com> wrote:

[snip]

> +DESCRIPTION
> +-----------
> +
> +A submodule is another Git repository tracked in a subdirectory of your
> +repository. The tracked repository has its own history, which does not
> +interfere with the history of the current repository.
> +
> +Submodules are composed from a so-called `gitlink` tree entry
> +in the main repository that refers to a particular commit object
> +within the inner repository.
> +
> +Additionally to the gitlink entry the `.gitmodules` file (see
> +linkgit:gitmodules[5]) at the root of the source tree contains
> +information needed for submodules. The only required information
> +is the path setting, which estabishes a logical name for the submodule.

I know that this was copied over, but this is confusing to me, so it
might be worthwhile to change it. In particular, `gitlink` is, as far as
I know, the type of the child object, not any sort of tree entry. I
would rewrite this as:

    A submodule consists of a tracking subdirectory and an entry in the
    `.gitmodules` file (see linkgit:gitmodules[5]).

    The tracking subdirectory appears in the main repository as a
    `gitlink` object (instead of a `tree` object). The parent of the
    tracking subdirectory links to this `gitlink` object through its
    hash, just like linking to a `tree` or `blob`. This `gitlink` object
    contains the hash of a particular commit object of the submodule.

    The entry in the `.gitmodules` file establishes the name of the
    submodule (to be used as a reference by other entries and commands)
    and references the tracking subdirectory as follows:

        submodule.my_submodule_name.path = path/to/my_submodule

There might also be a need to mention that when the submodule is
populated (or whatever the right term is), the tracking subdirectory in
the working tree contains both the submodule's working tree and the
submodule's Git directory.
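For illustration: the `submodule.my_submodule_name.path = path/to/my_submodule`
line above is the flattened `git config` key; inside `.gitmodules` the same
setting is written as a `[submodule "my_submodule_name"]` section containing a
`path = ...` line. A minimal Python sketch of reading that name/path mapping,
assuming only a simple subset of the config syntax (no quoting, escapes or
includes); `parse_gitmodules` and `name_for_path` are hypothetical helpers,
not Git code:

```python
import re

# Section header such as: [submodule "my_submodule_name"]
SECTION_RE = re.compile(r'^\s*\[submodule\s+"([^"]+)"\]\s*$')
# Key/value line such as: path = path/to/my_submodule
KEY_RE = re.compile(r'^\s*([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*?)\s*$')


def parse_gitmodules(text):
    """Return {name: {key: value}} for each [submodule "<name>"] section.

    Simplified sketch: skips comments and non-submodule sections and does
    not handle quoted values or escapes, which real config syntax allows.
    """
    modules = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # blank line or comment
        m = SECTION_RE.match(line)
        if m:
            current = modules.setdefault(m.group(1), {})
            continue
        if stripped.startswith("["):
            current = None  # some other section: ignore its keys
            continue
        m = KEY_RE.match(line)
        if m and current is not None:
            # config variable names are case-insensitive
            current[m.group(1).lower()] = m.group(2)
    return modules


def name_for_path(modules, path):
    """Map a gitlink path in the tree back to the submodule's logical name."""
    for name, settings in modules.items():
        if settings.get("path") == path:
            return name
    return None
```

Keying the result by name rather than by path mirrors the split discussed in
this thread: the path says where the gitlink sits in the tree, while the name
identifies the submodule (and its git directory) independently of that path.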

> +The usual git configuration (see linkgit:git-config[1]) can be used to
> +override settings given by the `.gitmodules` file.

Not sure if this is relevant here.

> +Submodules can be used for two different use cases:

A creative person might come up with more, so this might be better as:

    Submodules can be used for at least these use cases:

> +1. Using another project that stands on its own.
> +  When you want to use a third party library, submodules allow you to
> +  have a clean history for your own project as well as for the library.
> +  This also allows for updating the third party library as needed.

Probably better as:

    1. Using another project while maintaining independent history.
    Submodules allow you to contain the working tree of another project
    within your own working tree while keeping the history of both
    projects separate. Also, since submodules are fixed to a hash, the
    other project can be independently developed without affecting the
    parent project, allowing the parent project to fix itself to new
    versions only whenever desired.

> +2. Artificially split a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome deficiences in the data model of Git, such as:

This should match the gerund used in point 1:

    2. Splitting a (logically single) project into multiple
    repositories. This can be used to overcome deficiencies in the data
    model of Git, such as:

> +* To have finer grained access control.
> +  The design principles of Git do not allow for partial repositories to be
> +  checked out or transferred. A repository is the smallest unit that a user
> +  can be given access to. Submodules are separate repositories, such that
> +  you can restrict access to parts of your project via the use of submodules.

Not sure about this point - if the project is logically single, you
would probably need to see the entire project. If this is about
different teams independently working on different subcomponents, this
seems more like point 1 (inclusion of other projects).

> +* In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large. For that you may want to look
> +  at shallow clone, sparse checkout, or git-LFS.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.
> +
> +The data model
> +--------------
> +
> +A submodule can be considered its own autonomous repository, that has a
> +worktree and a git directory at a different place than the superproject.

Isn't the worktree inside the superproject's worktree? I would write
this as:

    A submodule is its own repository, having its own working tree and
    Git directory.

> +The superproject only records the commit object name in its tree, such that
> +any other information, e.g. where to obtain a copy from, is not recorded
> +in the core data structures of Git. The porcelain layer of Git however
> +makes use of the `.gitmodules` file that gives hints where and how to
> +obtain a copy of the submodule git repository from.

I would write this as:

    Additional metadata (for example, from where this submodule was
    obtained) can be written to the `.gitmodules` file, to be used by
    the porcelain layer of Git.

> +Submodule operations can be configured using the following mechanisms
> +(from highest to lowest precedence):
> +
> + * the command line for those commands that support taking submodule specs.
> +
> + * the configuration file `$GIT_DIR/config`.

Which $GIT_DIR? The submodule or the superproject?

> + * the configuration file `config` found in the `refs/submodule/config` branch.
> +   This can be used to overwrite the upstream configuration in the `.gitmodules`
> +   file without changing the history of the project.
> +   Useful options here are overwriting the base, where relative URLs apply to,
> +   when mirroring only parts of the larger collection of submodules.
> +
> + * the `.gitmodules` file inside the repository. A project usually includes this
> +   file to suggest defaults for the upstream collection of repositories.

Which .gitmodules? The submodule's or the superproject's?

> +On the location of the git directory
> +------------------------------------
> +
> +Since v1.7.7 of Git, the git directory of submodules is stored inside the
> +superprojects git directory at $GIT_DIR/modules/<submodule-name>

Missing period at end of sentence.

Also, is this always true? We discussed that at least sometimes, it is
in the .git directory of the tracking directory.

> +This location allows for the working tree to be non existent while keeping
> +the history around. So we can use `git-rm` on a submodule without loosing
> +information that may only be local; it is also possible to checkout the
> +superproject before and after the deletion of the submodule without the
> +need to reclone the submodule as it is kept locally.

For me, it is confusing to refer to it as a "submodule" after deletion,
since when you delete something, normally it no longer exists. Maybe
write this as:

    When a submodule is deleted using `git-rm`, its Git directory is
    moved to the superproject in $GIT_DIR/modules/<submodule-name>. This
    allows a submodule of the same name to later be re-added.

Also, this seems like a strange feature to me - if there is a rationale,
maybe it could be added.

[snip]

> +Deleting a submodule
> +--------------------
> +
> +Deleting a submodule can happen on different levels:
> +
> +1) Removing it from the local working tree without tampering with
> +   the history of the superproject.
> +
> +You may no longer need the submodule, but still want to keep it recorded
> +in the superproject history as others may have use for it.
> +--
> +  git submodule deinit <submodule path>
> +--
> +will remove the configuration entries
> +as well as the work

Incomplete sentence? Also, won't removing the configuration entries
tamper with the history of the superproject? (.gitmodules is usually
tracked, right?)

> +2) Remove it from history:
> +--
> +   git rm <submodule>
> +--

Does this completely delete the Git directory of the submodule too?
Above, it is stated that in such a case, the directory will be moved to
the superproject.

Actually, after reading below, I see that it is not the case. I would
include the commit step and then mention how the whole thing can be
undone.

    git rm <submodule>
    git commit

    This can be undone with `git revert`.

> +3) Remove the submodules git directory:
> +
> +When you also want to free up the disk space that the submodules git
> +directory uses, you have to delete it manually. It is found in
> +`$GIT_DIR/modules`.
> +The steps 1 and 2 can be undone via `git submodule init` or
> +`git revert`, respectively.  This step may incur data loss,
> +and cannot be undone. That is why there is no builtin.
> +
> +Other mechanisms
> +----------------
> +
> +Git repositories are allowed to be kept inside other repositories without
> +the need to use submodules. This however does not enable cross-repository
> +versioning as the inner repository is unaware of the outer repository,
> +which in turn ignores the inner.

Not sure if this is necessary.

> +Submodules are not to be confused with remotes, which are other
> +repositories of the same project; submodules are meant for
> +different projects you would like to make part of your source tree,
> +while the history of the two projects still stays completely
> +independent and you cannot modify the contents of the submodule
> +from within the main project.
> +If you want to merge the project histories and want to treat the
> +aggregated whole as a single project from then on, you may want to
> +add a remote for the other project and use the 'subtree' merge strategy,
> +instead of treating the other project as a submodule. Directories
> +that come from both projects can be cloned and checked out as a whole
> +if you choose to go that route.
> +
> +Third party tools
> +-----------------
> +
> +There are a variety of third party tools that manage multiple repositories
> +and their relationships to each other, such as Androids repo tool or git-slave.
> +Often these tools lack cross repository versioning.
> +
> +https://source.android.com/source/using-repo
> +
> +http://gitslave.sourceforge.net/

Not sure if this is necessary.

> +Implementation details
> +----------------------
> +
> +When cloning or pulling a repository containing submodules the submodules
> +will not be checked out by default; You can instruct 'clone' to recurse

semicolon -> period

> +into submodules. The 'init' and 'update' subcommands of 'git submodule'
> +will maintain submodules checked out and at an appropriate revision in
> +your working tree.

This part should probably be in the workflow section above. Also, above,
"git checkout" was introduced as a way to update the state of a
submodule - what's the difference between this and "git submodule
update"?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC/PATCH] submodules: overhaul documentation
  2017-06-20 18:18 ` Jonathan Tan
@ 2017-06-20 19:15   ` Stefan Beller
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-20 19:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git@vger.kernel.org

On Tue, Jun 20, 2017 at 11:18 AM, Jonathan Tan <jonathantanmy@google.com> wrote:
>> +DESCRIPTION
>> +-----------
>> +
>> +A submodule is another Git repository tracked in a subdirectory of your
>> +repository. The tracked repository has its own history, which does not
>> +interfere with the history of the current repository.
>> +
>> +Submodules are composed from a so-called `gitlink` tree entry
>> +in the main repository that refers to a particular commit object
>> +within the inner repository.
>> +
>> +Additionally to the gitlink entry the `.gitmodules` file (see
>> +linkgit:gitmodules[5]) at the root of the source tree contains
>> +information needed for submodules. The only required information
>> +is the path setting, which estabishes a logical name for the submodule.
>
> I know that this was copied over, but this is confusing to me, so it
> might be worthwhile to change it. In particular, `gitlink` is, as far as
> I know, the type of the child object,

correct

> not any sort of tree entry. I

Well, a tree references (a) other trees, (b) blobs, or (c) gitlinks, so calling
a gitlink a tree entry is correct, but maybe confusing.
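To make the tree-entry point concrete: in `git ls-tree` output a gitlink shows
up as an entry with mode `160000` and type `commit`, next to `040000 tree` and
`100644 blob` entries, and its object id names a commit in the submodule's
history, which the superproject's object store normally does not contain. A
small Python sketch that picks the gitlinks out of such output; the helper
names are hypothetical, sample object ids are placeholders, and path quoting
for unusual file names is ignored:

```python
# Object type implied by each tree-entry mode, as printed by `git ls-tree`.
MODE_TYPES = {
    "040000": "tree",    # subdirectory
    "100644": "blob",    # regular file
    "100755": "blob",    # executable file
    "120000": "blob",    # symbolic link
    "160000": "commit",  # gitlink: a commit in the submodule's history
}


def parse_ls_tree_line(line):
    """Split one `git ls-tree` line of the form '<mode> <type> <object>\t<path>'."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode, objtype, oid = meta.split(" ")
    if MODE_TYPES.get(mode) != objtype:
        raise ValueError(f"unexpected type {objtype!r} for mode {mode}")
    return mode, objtype, oid, path


def gitlinks(ls_tree_output):
    """Return {path: commit id} for every gitlink (submodule) entry."""
    links = {}
    for line in ls_tree_output.splitlines():
        if not line:
            continue
        mode, _objtype, oid, path = parse_ls_tree_line(line)
        if mode == "160000":
            links[path] = oid
    return links
```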

> would rewrite this as:
>
>     A submodule consists of a tracking subdirectory and an entry in the
>     `.gitmodules` file (see linkgit:gitmodules[5]).

    A submodule consists of a tracking subdirectory [in the working directory]
    and an entry in the `.gitmodules` file (see linkgit:gitmodules[5]).

>     The tracking subdirectory appears in the main repository as a
>     `gitlink` object (instead of a `tree` object). The parent of the
>     tracking subdirectory links to this `gitlink` object through its
>     hash, just like linking to a `tree` or `blob`. This `gitlink` object
>     contains the hash of a particular commit object of the submodule.
>

I think this is confusing, too. :)
That is because a subdirectory exists on the FS, whereas the gitlink
appears in Git's representation of the world (in the tree). Maybe:

    The tracking subdirectory appears in the main repository at
    the point where it is tracked via the gitlink in the tree.
    It is empty when the submodule is not populated, otherwise
    it contains the content of the submodule repository.

    The gitlink object contains the hash of a particular commit
    object of the submodule.

    The .gitmodules file establishes a relationship between the
    path, which is where the gitlink is in the tree, and the logical
    name, which is used for the location of the submodule's git
    directory. The .gitmodules file has the same syntax as the
    $GIT_DIR/config file. The relationship mapping of path to name
    is done via setting submodule.<name value>.path = <path value>.

    The submodule's git directory is found in the main repository's
    '$GIT_DIR/modules/<name>', or inside the tracking subdirectory,
    though the latter is deprecated.
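The two locations described above suggest a lookup order. A minimal Python
sketch, assuming the `gitdir: <location>` gitfile that Git writes into the
tracking subdirectory when the git directory lives under
`$GIT_DIR/modules/<name>`; `submodule_git_dir` and its arguments are
hypothetical, and real Git handles more cases (e.g. `core.worktree`, linked
worktrees):

```python
import os


def submodule_git_dir(superproject_git_dir, worktree, name, path):
    """Return the submodule's git directory, or None if none is found.

    superproject_git_dir: the superproject's $GIT_DIR (e.g. <top>/.git)
    worktree: top level of the superproject's working tree
    name: logical submodule name from .gitmodules
    path: where the gitlink sits in the tree
    """
    dot_git = os.path.join(worktree, path, ".git")
    if os.path.isfile(dot_git):
        # A "gitfile": a single line of the form "gitdir: <location>".
        with open(dot_git) as f:
            line = f.readline().strip()
        if line.startswith("gitdir: "):
            target = line[len("gitdir: "):]
            # A relative location is relative to the directory holding .git.
            return os.path.normpath(os.path.join(worktree, path, target))
    if os.path.isdir(dot_git):
        # Deprecated layout: git directory inside the tracking subdirectory.
        return dot_git
    # Not populated: the git directory may still be kept, keyed by name.
    modules_dir = os.path.join(superproject_git_dir, "modules", name)
    if os.path.isdir(modules_dir):
        return modules_dir
    return None
```

The last fallback is what keeps history around after the working tree is gone,
which is the point of storing the git directory by name in the superproject.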

>     The entry in the `.gitmodules` file establishes the name of the
>     submodule (to be used as a reference by other entries and commands)
>     and references the tracking subdirectory as follows:
>
>         submodule.my_submodule_name.path = path/to/my_submodule
>
> There might also be a need to mention that when the submodule is
> populated (or whatever the right term is), the tracking subdirectory in
> the working tree contains both the submodule's working tree and the
> submodule's Git directory.

See the alternative proposal above: the git directory is best kept outside
the tracking subdirectory, contained inside the superproject's
git dir itself.

>
>> +The usual git configuration (see linkgit:git-config[1]) can be used to
>> +override settings given by the `.gitmodules` file.
>
> Not sure if this is relevant here.

This is relevant for overwriting e.g. the submodule.NAME.url setting.

>
>> +Submodules can be used for two different use cases:
>
> A creative person might come up with more, so this might be better as:
>
>     Submodules can be used for at least these use cases:

ok.

>> +1. Using another project that stands on its own.
>> +  When you want to use a third party library, submodules allow you to
>> +  have a clean history for your own project as well as for the library.
>> +  This also allows for updating the third party library as needed.
>
> Probably better as:
>
>     1. Using another project while maintaining independent history.
>     Submodules allow you to contain the working tree of another project
>     within your own working tree while keeping the history of both
>     projects separate. Also, since submodules are fixed to a hash, the

"fixed to an arbitrary version" instead?

>     other project can be independently developed without affecting the
>     parent project, allowing the parent project to fix itself to new
>     versions only whenever desired.

>> +2. Artificially split a (logically single) project into multiple
>> +   repositories and tying them back together. This can be used to
>> +   overcome deficiences in the data model of Git, such as:
>
> This should match the gerund used in point 1:
>
>     2. Splitting a (logically single) project into multiple
>     repositories. This can be used to overcome deficiencies in the data
>     model of Git, such as:

As Junio pointed out in a separate email, we'd not use "deficiencies
in the data model" but something like "current limitations".

>> +* To have finer grained access control.
>> +  The design principles of Git do not allow for partial repositories to be
>> +  checked out or transferred. A repository is the smallest unit that a user
>> +  can be given access to. Submodules are separate repositories, such that
>> +  you can restrict access to parts of your project via the use of submodules.
>
> Not sure about this point - if the project is logically single, you
> would probably need to see the entire project.

Well, no. Large projects do not require you to see everything. This can be
your own decision ("I am only here to fix grammar in the docs /
I am only here to work on this niche feature contained in 3 submodules,
so I don't want to download the whole thing") or the project's ("You
are not trustworthy enough to read the 'sekret' submodule, which only
our top engineers have access to, but believe me, you don't need it").

> If this is about
> different teams independently working on different subcomponents, this
> seems more like point 1 (inclusion of other projects).

In the very first attempt at this man page overhaul I also had an example
of big binary files put into submodules, which could be understood as a
solution to the same problem that git-LFS addresses. This is not a
long-term solution, as you are working on fixing this for real, but it
is how Android uses some repositories today.

>> +* In its current form Git scales up poorly for very large repositories that
>> +  change a lot, as the history grows very large. For that you may want to look
>> +  at shallow clone, sparse checkout, or git-LFS.
>> +  However you can also use submodules to e.g. hold large binary assets
>> +  and these repositories are then shallowly cloned such that you do not
>> +  have a large history locally.
>> +
>> +The data model
>> +--------------
>> +
>> +A submodule can be considered its own autonomous repository, that has a
>> +worktree and a git directory at a different place than the superproject.
>
> Isn't the worktree inside the superproject's worktree? I would write
> this as:
>
>     A submodule is its own repository, having its own working tree and
>     Git directory.

Yes, but continued:

    The working tree is inside the superproject's working tree, at the
    path of its gitlink entry.
    The git directory may be either inside the working tree or contained
    inside the superproject's git directory.

>
>> +The superproject only records the commit object name in its tree, such that
>> +any other information, e.g. where to obtain a copy from, is not recorded
>> +in the core data structures of Git. The porcelain layer of Git however
>> +makes use of the `.gitmodules` file that gives hints where and how to
>> +obtain a copy of the submodule git repository from.
>
> I would write this as:
>
>     Additional metadata (for example, from where this submodule was
>     obtained) can be written to the `.gitmodules` file, to be used by
>     the porcelain layer of Git.

s/was obtained/can be obtained/ IMHO, but this sounds good.

>
>> +Submodule operations can be configured using the following mechanisms
>> +(from highest to lowest precedence):
>> +
>> + * the command line for those commands that support taking submodule specs.
>> +
>> + * the configuration file `$GIT_DIR/config`.
>
> Which $GIT_DIR? The submodule or the superproject?

The superproject's.

>
>> + * the configuration file `config` found in the `refs/submodule/config` branch.
>> +   This can be used to overwrite the upstream configuration in the `.gitmodules`
>> +   file without changing the history of the project.
>> +   Useful options here are overwriting the base, where relative URLs apply to,
>> +   when mirroring only parts of the larger collection of submodules.
>> +
>> + * the `.gitmodules` file inside the repository. A project usually includes this
>> +   file to suggest defaults for the upstream collection of repositories.
>
> Which .gitmodules? The submodule's or the superproject's?

The superproject's.
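With both answers pointing at the superproject, the precedence list reads as a
first-match lookup over four layers; this is also how e.g. a
`submodule.NAME.url` from `.gitmodules` gets overridden locally. A minimal
Python model of the ordering proposed in the patch, not of Git's actual config
machinery: `refs/submodule/config` is the RFC's not-yet-implemented layer,
`resolve` is a hypothetical helper, and the URLs in any sample are
placeholders:

```python
# Layers in the order proposed in the patch, highest precedence first.
# "refs/submodule/config" is the proposed, not yet implemented, layer.
LAYERS = ("command line", "$GIT_DIR/config", "refs/submodule/config", ".gitmodules")


def resolve(key, layers):
    """Return (value, layer name) from the highest-precedence layer setting key.

    `layers` maps a layer name to a dict of flattened config keys such as
    "submodule.<name>.url"; layers that are absent are simply skipped.
    """
    for layer in LAYERS:
        settings = layers.get(layer, {})
        if key in settings:
            return settings[key], layer
    return None, None
```

For example, a URL set in the superproject's `$GIT_DIR/config` wins over the
one suggested by `.gitmodules`, while keys only present in `.gitmodules` still
fall through to it.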

>
>> +On the location of the git directory
>> +------------------------------------
>> +
>> +Since v1.7.7 of Git, the git directory of submodules is stored inside the
>> +superprojects git directory at $GIT_DIR/modules/<submodule-name>
>
> Missing period at end of sentence.
>
> Also, is this always true? We discussed that at least sometimes, it is
> in the .git directory of the tracking directory.
>
>> +This location allows for the working tree to be non existent while keeping
>> +the history around. So we can use `git-rm` on a submodule without loosing
>> +information that may only be local; it is also possible to checkout the
>> +superproject before and after the deletion of the submodule without the
>> +need to reclone the submodule as it is kept locally.
>
> For me, it is confusing to refer to it as a "submodule" after deletion,
> since when you delete something, normally it no longer exists. Maybe
> write this as:
>
>     When a submodule is deleted using `git-rm`, its Git directory is
>     moved to the superproject in $GIT_DIR/modules/<submodule-name>. This
>     allows a submodule of the same name to later be re-added.

Not just re-added: it is also kept around if you check out an older commit.

> Also, this seems like a strange feature to me - if there is a rationale,
> maybe it could be added.

The rationale is to not throw away the data, so that you can reconstruct any
state from before its deletion locally, without having to re-clone or fetch it.


>
> [snip]
>
>> +Deleting a submodule
>> +--------------------
>> +
>> +Deleting a submodule can happen on different levels:
>> +
>> +1) Removing it from the local working tree without tampering with
>> +   the history of the superproject.
>> +
>> +You may no longer need the submodule, but still want to keep it recorded
>> +in the superproject history as others may have use for it.
>> +--
>> +  git submodule deinit <submodule path>
>> +--
>> +will remove the configuration entries
>> +as well as the work
>
> Incomplete sentence? Also, won't removing the configuration entries
> tamper with the history of the superproject? (.gitmodules is usually
> tracked, right?)

The configuration entries in the .git/config file that override the
entries from the .gitmodules file.

>
>> +2) Remove it from history:
>> +--
>> +   git rm <submodule>
>> +--
>
> Does this completely delete the Git directory of the submodule too?
> Above, it is stated that in such a case, the directory will be moved to
> the superproject.
>
> Actually, after reading below, I see that it is not the case. I would
> include the commit step and then mention how the whole thing can be
> undone.
>
>     git rm <submodule>
>     git commit
>
>     This can be undone with `git revert`.

That makes sense.

>
>> +3) Remove the submodules git directory:
>> +
>> +When you also want to free up the disk space that the submodules git
>> +directory uses, you have to delete it manually. It is found in
>> +`$GIT_DIR/modules`.
>> +The steps 1 and 2 can be undone via `git submodule init` or
>> +`git revert`, respectively.  This step may incur data loss,
>> +and cannot be undone. That is why there is no builtin.
>> +
>> +Other mechanisms
>> +----------------
>> +
>> +Git repositories are allowed to be kept inside other repositories without
>> +the need to use submodules. This however does not enable cross-repository
>> +versioning as the inner repository is unaware of the outer repository,
>> +which in turn ignores the inner.
>
> Not sure if this is necessary.
>
>> +Submodules are not to be confused with remotes, which are other
>> +repositories of the same project; submodules are meant for
>> +different projects you would like to make part of your source tree,
>> +while the history of the two projects still stays completely
>> +independent and you cannot modify the contents of the submodule
>> +from within the main project.
>> +If you want to merge the project histories and want to treat the
>> +aggregated whole as a single project from then on, you may want to
>> +add a remote for the other project and use the 'subtree' merge strategy,
>> +instead of treating the other project as a submodule. Directories
>> +that come from both projects can be cloned and checked out as a whole
>> +if you choose to go that route.
>> +
>> +Third party tools
>> +-----------------
>> +
>> +There are a variety of third party tools that manage multiple repositories
>> +and their relationships to each other, such as Androids repo tool or git-slave.
>> +Often these tools lack cross repository versioning.
>> +
>> +https://source.android.com/source/using-repo
>> +
>> +http://gitslave.sourceforge.net/
>
> Not sure if this is necessary.
>
>> +Implementation details
>> +----------------------
>> +
>> +When cloning or pulling a repository containing submodules the submodules
>> +will not be checked out by default; You can instruct 'clone' to recurse
>
> semicolon -> period
>
>> +into submodules. The 'init' and 'update' subcommands of 'git submodule'
>> +will maintain submodules checked out and at an appropriate revision in
>> +your working tree.
>
> This part should probably be in the workflow section above. Also, above,
> "git checkout" was introduced as a way to update the state of a
> submodule - what's the difference between this and "git submodule
> update"?

checkout doesn't do any network traffic, whereas "submodule update"
tries everything it can, including re-cloning, to obtain the submodule
at the recorded state.
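A minimal shell sketch of the difference, assuming a recent Git, that `submodule.recurse` is not set in the user's configuration, and made-up names. Here the recorded commit already exists inside the submodule, so `submodule update` should not need to fetch anything:

```shell
set -e
tmp=$(mktemp -d)
# Throwaway identity so the commits below do not depend on user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Hypothetical upstream with two commits, v1 and v2.
git init -q "$tmp/lib-upstream"
git -C "$tmp/lib-upstream" commit -q --allow-empty -m "lib: v1"
git -C "$tmp/lib-upstream" commit -q --allow-empty -m "lib: v2"
v1=$(git -C "$tmp/lib-upstream" rev-parse HEAD~1)
v2=$(git -C "$tmp/lib-upstream" rev-parse HEAD)

git init -q "$tmp/super"
cd "$tmp/super"
git -c protocol.file.allow=always submodule add "$tmp/lib-upstream" lib
git commit -q -m "record lib at v2"        # the clone checked out v2
git -C lib checkout -q "$v1"
git add lib
git commit -q -m "record lib at v1"

# Checking out the older superproject commit only changes the recorded gitlink;
# the submodule itself is left at v1.
git checkout -q HEAD~1
echo "after checkout: $(git -C lib rev-parse HEAD)"

# submodule update brings the submodule to the recorded commit (v2),
# fetching or cloning only if needed (not needed here: v2 is local).
git -c protocol.file.allow=always submodule update
echo "after update:   $(git -C lib rev-parse HEAD)"
```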


* Re: [RFC/PATCH] submodules: overhaul documentation
  2017-06-19 18:10     ` Brandon Williams
@ 2017-06-20 21:42       ` Stefan Beller
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-20 21:42 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, Jonathan Nieder, git@vger.kernel.org

On Mon, Jun 19, 2017 at 11:10 AM, Brandon Williams <bmwill@google.com> wrote:

> I would probably change the first sentence to:
>
>   A submodule is another Git repository tracked at an arbitrary place
>   inside the working tree.

The tracking doesn't happen at an arbitrary place, but in
the gitlink and the .gitmodules file. The submodule itself can be
located at an arbitrary place within the working tree.
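To illustrate where the tracking lives, here is a minimal shell sketch, assuming a recent Git and made-up names. It records a gitlink (mode 160000) at an arbitrary path with `git update-index --cacheinfo` and maps it to the name `lib` in `.gitmodules`, without ever cloning the submodule into the superproject:

```shell
set -e
tmp=$(mktemp -d)
# Throwaway identity so the commits below do not depend on user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# A hypothetical upstream, only used to obtain a real commit id to point at.
git init -q "$tmp/lib-upstream"
git -C "$tmp/lib-upstream" commit -q --allow-empty -m "lib: initial"
commit=$(git -C "$tmp/lib-upstream" rev-parse HEAD)

git init -q "$tmp/super"
cd "$tmp/super"
# The gitlink: a tree entry of mode 160000 at an arbitrary path.
git update-index --add --cacheinfo 160000,"$commit",third_party/lib
# The .gitmodules entry maps the logical name "lib" to that path (plus a URL).
git config -f .gitmodules submodule.lib.path third_party/lib
git config -f .gitmodules submodule.lib.url "$tmp/lib-upstream"
git add .gitmodules
git commit -q -m "track lib at third_party/lib"

# .gitmodules is a blob; third_party/lib is listed as a commit (the gitlink).
git ls-tree -r HEAD
git config -f .gitmodules --get-regexp '^submodule\.'
```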

In a resend, I'll reword it completely (again) to focus more on the structure.


* [PATCHv2] submodules: overhaul documentation
  2017-06-07 18:53 [RFC/PATCH] submodules: overhaul documentation Stefan Beller
  2017-06-13 19:29 ` Junio C Hamano
  2017-06-20 18:18 ` Jonathan Tan
@ 2017-06-20 22:56 ` Stefan Beller
  2017-06-21  3:45   ` Jonathan Tan
                     ` (2 more replies)
  2 siblings, 3 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-20 22:56 UTC (permalink / raw)
  To: sbeller; +Cc: git, bmwill, jrnieder, gitster, jonathantanmy

This patch aims to detangle (a) the usage of `git-submodule`
from (b) the concept of submodules and (c) how the actual
implementation looks like, such as where they are configured
and (d) what the best practices are.

To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page gitsubmodules(7). This new page
is just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), which introduce a concept
rather than explaining a specific command.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

I have considered most of the feedback, and stopped marking it RFC,
but I'd like to propose this as a serious patch.

Thanks,
Stefan

 Documentation/Makefile          |   1 +
 Documentation/git-rm.txt        |   4 +-
 Documentation/git-submodule.txt |  44 +++-------
 Documentation/gitsubmodules.txt | 189 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 202 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..2415e0d657 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index f1efc116eb..db444693dd 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a submodule's work
 tree from being removed.
 
 If you only want to remove the local checkout of a submodule from your
-work tree without committing the removal,
-use linkgit:git-submodule[1] `deinit` instead.
+work tree without committing the removal, use linkgit:git-submodule[1] `deinit`
+instead. Also see linkgit:gitsubmodules[7] for details on submodule removal.
 
 EXAMPLES
 --------
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 74bc6200d5..9ffd129bbc 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 -----------
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule. Directories
-that come from both projects can be cloned and checked out as a whole
-if you choose to go that route.
+For more information about submodules, see linkgit:gitsubmodules[7].
 
 COMMANDS
 --------
@@ -149,15 +119,17 @@ deinit [-f|--force] (--all|[--] <path>...)::
 	tree. Further calls to `git submodule update`, `git submodule foreach`
 	and `git submodule sync` will skip any unregistered submodules until
 	they are initialized again, so use this command if you don't want to
-	have a local checkout of the submodule in your working tree anymore. If
-	you really want to remove a submodule from the repository and commit
-	that use linkgit:git-rm[1] instead.
+	have a local checkout of the submodule in your working tree anymore.
 +
 When the command is run without pathspec, it errors out,
 instead of deinit-ing everything, to prevent mistakes.
 +
 If `--force` is specified, the submodule's working tree will
 be removed even if it contains local modifications.
++
+If you really want to remove a submodule from the repository and commit
+that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
+options.
 
 update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
 +
@@ -435,6 +407,10 @@ This file should be formatted in the same way as `$GIT_DIR/config`. The key
 to each submodule url is "submodule.$name.url".  See linkgit:gitmodules[5]
 for details.
 
+SEE ALSO
+--------
+linkgit:gitsubmodules[7], linkgit:gitmodules[5].
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
new file mode 100644
index 0000000000..80e71ff55c
--- /dev/null
+++ b/Documentation/gitsubmodules.txt
@@ -0,0 +1,189 @@
+gitsubmodules(7)
+================
+
+NAME
+----
+gitsubmodules - mounting one repository inside another
+
+SYNOPSIS
+--------
+.gitmodules, $GIT_DIR/config
+------------------
+git submodule
+git <command> --recurse-submodules
+------------------
+
+DESCRIPTION
+-----------
+
+A submodule is another Git repository tracked inside a repository.
+The tracked repository has its own history, which does not
+interfere with the history of the current repository.
+
+It consists of a tracking subdirectory in the working directory,
+a 'gitlink' in the working tree and an entry in the `.gitmodules`
+file (see linkgit:gitmodules[5]) at the root of the source tree.
+
+The tracking subdirectory appears in the main repositorys working
+tree at the point where the submodules gitlink is tracked in the
+tree.  It is empty when the submodule is not populated, otherwise
+it contains the content of the submodule repository.
+The main repository is often referred to as superproject.
+
+The gitlink contains the object name of a particular commit
+of the submodule.
+
+The `.gitmodules` file establishes a relationship between the
+path, which is where the gitlink is in the tree, and the logical
+name, which is used for the location of the submodules git
+directory. The `.gitmodules` file has the same syntax as the
+$Git_DIR/config file and the mapping of path to name
+is done via setting `submodule.<name value>.path = <path value>`.
+
+The submodules git directory is found in in the main repositories
+'$GIT_DIR/modules/<name>' or inside the tracking subdirectory.
+
+Submodules can be used for at least two different use cases:
+
+1. Using another project while maintaining independent history.
+  Submodules allow you to contain the working tree of another project
+  within your own working tree while keeping the history of both
+  projects separate. Also, since submodules are fixed to a an arbitrary
+  version, the other project can be independently developed without
+  affecting the superproject, allowing the superproject project to
+  fix itself to new versions only whenever desired.
+
+2. Splitting a (logically single) project into multiple
+   repositories and tying them back together. This can be used to
+   overcome current limitations of Gits implementation to have
+   finer grained access:
+
+* Size of the git repository
+  In its current form Git scales up poorly for very large repositories that
+  change a lot, as the history grows very large.
+  However you can also use submodules to e.g. hold large binary assets
+  and these repositories are then shallowly cloned such that you do not
+  have a large history locally.
+
+* Transfer size
+  In its current form Git requires the whole working tree present. It
+  does not allow partial trees to be transferred in fetch or clone.
+
+* Access control
+  By restricting user access to submodules, this can be used to implement
+  read/write policies for different users.
+
+The configuration of submodules
+-------------------------------
+
+Submodule operations can be configured using the following mechanisms
+(from highest to lowest precedence):
+
+ * the command line for those commands that support taking submodule specs.
+
+ * the configuration file `$GIT_DIR/config` in the superproject.
+
+ * the `.gitmodules` file inside the superproject. A project usually
+   includes this file to suggest defaults for the upstream collection
+   of repositories.
+
+On the location of the git directory
+------------------------------------
+
+Since v1.7.7 of Git, the git directory of submodules is either stored inside
+the superprojects git directory at $GIT_DIR/modules/<submodule-name> or
+in the submodule.
+The location inside the superproject allows for the working tree to be
+non existent while keeping the history around. So we can delete a submodule
+working tree without losing information that may only be local. It is also
+possible to checkout the superproject before and after the deletion of the
+submodule without the need to reclone the submodule as it is kept locally.
+
+Workflow for a third party library
+----------------------------------
+
+  # add the submodule
+  git submodule add <url> <path>
+
+  # occasionally update the submodule to a new version:
+  git -C <path> checkout <new version>
+  git add <path>
+  git commit -m "update submodule to new version"
+
+  # see the discussion below on deleting submodules
+
+
+Workflow for an artifically split repo
+--------------------------------------
+
+  # Enable recursion for relevant commands, such that
+  # regular commands recurse into submodules by default
+  git config --global submodule.recurse true
+
+  # Unlike the other commands below clone still needs
+  # its own recurse flag:
+  git clone --recurse <URL> <directory>
+  cd <directory>
+
+  # Get to know the code:
+  git grep foo
+  git ls-files
+
+  # Get new code
+  git fetch
+  git pull --rebase
+
+  # change worktree
+  git checkout
+  git reset
+
+Deleting a submodule
+--------------------
+
+Deleting a submodule can happen on different levels:
+
+1) Removing it from the local working tree without tampering with
+   the history of the superproject.
+
+You may no longer need the submodule, but still want to keep it recorded
+in the superproject history as others may have use for it. The command
+`git submodule deinit <submodule path>` will remove any configuration
+entries from the config file, such that the submodule becomes
+uninitialized. The tracking directory in the superprojects working
+tree that holds the submodules working directory is emptied.
+This step can be undone via `git submodule init`.
+
+2) Remove it from history:
+--
+   git rm <submodule path>
+   git commit
+--
+This removes the submodules gitlink from the superprojects tree, as well
+as removing the entries from the `.gitmodules` file, but keeps the
+local configuration for the submodule. This can be undone using `git revert`.
+
+
+3) Remove the submodules git directory:
+
+When you also want to free up the disk space that the submodules git
+directory uses, you have to delete it manually as this
+step cannot be undone using git tools. It is found in `$GIT_DIR/modules`.
+
+Implementation details
+----------------------
+
+When cloning or pulling a repository containing submodules the submodules
+will not be checked out by default; You can instruct 'clone' to recurse
+into submodules. The 'init' and 'update' subcommands of 'git submodule'
+will maintain submodules checked out and at an appropriate revision in
+your working tree. Alternatively you can set 'submodule.recurse' to have
+'checkout' recursing into submodules.
+
+
+SEE ALSO
+--------
+linkgit:git-submodule[1], linkgit:gitmodules[5].
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
2.12.2.575.gb14f27f917



* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-20 22:56 ` [PATCHv2] " Stefan Beller
@ 2017-06-21  3:45   ` Jonathan Tan
  2017-06-21 17:25     ` Stefan Beller
  2017-06-22 17:46   ` Brandon Williams
  2017-06-22 21:01   ` [PATCHv3] " Stefan Beller
  2 siblings, 1 reply; 17+ messages in thread
From: Jonathan Tan @ 2017-06-21  3:45 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git mailing list, Brandon Williams, Jonathan Nieder,
	Junio C Hamano

Thanks, this looks like a good explanation. Some more nits, but
overall I feel like I understand this and have learned something from
it.

On Tue, Jun 20, 2017 at 3:56 PM, Stefan Beller <sbeller@google.com> wrote:
> +A submodule is another Git repository tracked inside a repository.
> +The tracked repository has its own history, which does not
> +interfere with the history of the current repository.
> +
> +It consists of a tracking subdirectory in the working directory,
> +a 'gitlink' in the working tree and an entry in the `.gitmodules`

Probably should be `gitlink` (the special quotes), and (optional)
s/`gitlink`/`gitlink` object/ because it might not be apparent that
gitlink is a type of object.

> +file (see linkgit:gitmodules[5]) at the root of the source tree.

After reading below, maybe we should mention the Git directory in
$GIT_DIR/modules/<submodule name> as part of what a submodule consists
of too.

> +The tracking subdirectory appears in the main repositorys working

s/repositorys/repository's/ (apostrophe is also missing in some other
places below)

> +tree at the point where the submodules gitlink is tracked in the
> +tree.  It is empty when the submodule is not populated, otherwise
> +it contains the content of the submodule repository.
> +The main repository is often referred to as superproject.
> +
> +The gitlink contains the object name of a particular commit
> +of the submodule.
> +
> +The `.gitmodules` file establishes a relationship between the
> +path, which is where the gitlink is in the tree, and the logical
> +name, which is used for the location of the submodules git
> +directory. The `.gitmodules` file has the same syntax as the
> +$Git_DIR/config file and the mapping of path to name

Capitalization of $GIT_DIR

> +is done via setting `submodule.<name value>.path = <path value>`.

(Optional) I would prefer <name> and <path> to be consistent with the
following paragraph.

> +The submodules git directory is found in in the main repositories
> +'$GIT_DIR/modules/<name>' or inside the tracking subdirectory.
> +
> +Submodules can be used for at least two different use cases:
> +
> +1. Using another project while maintaining independent history.
> +  Submodules allow you to contain the working tree of another project
> +  within your own working tree while keeping the history of both
> +  projects separate. Also, since submodules are fixed to a an arbitrary
> +  version, the other project can be independently developed without
> +  affecting the superproject, allowing the superproject project to
> +  fix itself to new versions only whenever desired.
> +
> +2. Splitting a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome current limitations of Gits implementation to have
> +   finer grained access:
> +
> +* Size of the git repository
> +  In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.
> +
> +* Transfer size
> +  In its current form Git requires the whole working tree present. It
> +  does not allow partial trees to be transferred in fetch or clone.
> +
> +* Access control
> +  By restricting user access to submodules, this can be used to implement
> +  read/write policies for different users.

The bullet points should probably be indented more.

[snip]

> +Deleting a submodule
> +--------------------
> +
> +Deleting a submodule can happen on different levels:
> +
> +1) Removing it from the local working tree without tampering with
> +   the history of the superproject.
> +
> +You may no longer need the submodule, but still want to keep it recorded
> +in the superproject history as others may have use for it. The command
> +`git submodule deinit <submodule path>` will remove any configuration
> +entries from the config file, such that the submodule becomes

s=config=$GIT_DIR/config= (since there are multiple relevant config files)

> +uninitialized. The tracking directory in the superprojects working
> +tree that holds the submodules working directory is emptied.
> +This step can be undone via `git submodule init`.
> +
> +2) Remove it from history:
> +--
> +   git rm <submodule path>
> +   git commit
> +--
> +This removes the submodules gitlink from the superprojects tree, as well
> +as removing the entries from the `.gitmodules` file, but keeps the
> +local configuration for the submodule. This can be undone using `git revert`.
> +
> +
> +3) Remove the submodules git directory:
> +
> +When you also want to free up the disk space that the submodules git
> +directory uses, you have to delete it manually as this
> +step cannot be undone using git tools. It is found in `$GIT_DIR/modules`.


* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-21  3:45   ` Jonathan Tan
@ 2017-06-21 17:25     ` Stefan Beller
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-21 17:25 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Git mailing list, Brandon Williams, Jonathan Nieder,
	Junio C Hamano

On Tue, Jun 20, 2017 at 8:45 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> Thanks, this looks like a good explanation. Some more nits, but
> overall I feel like I understand this and have learned something from
> it.
>
> On Tue, Jun 20, 2017 at 3:56 PM, Stefan Beller <sbeller@google.com> wrote:
>> +A submodule is another Git repository tracked inside a repository.
>> +The tracked repository has its own history, which does not
>> +interfere with the history of the current repository.
>> +
>> +It consists of a tracking subdirectory in the working directory,
>> +a 'gitlink' in the working tree and an entry in the `.gitmodules`
>
> Probably should be `gitlink` (the special quotes), and (optional)
> s/`gitlink`/`gitlink` object/ because it might not be apparent that
> gitlink is a type of object.
>
>> +file (see linkgit:gitmodules[5]) at the root of the source tree.
>
> After reading below, maybe we should mention the Git directory in
> $GIT_DIR/modules/<submodule name> as part of what a submodule consists
> of too.

I implemented the rest of the suggestions, but this one leaves me
scratching my head.
It's not as if we don't know what a submodule is already. Our glossary says:

    submodule
           A repository that holds the history of a separate project inside
           another repository (the latter of which is called superproject).

So by that definition the gitlink and the entry in .gitmodules are just
metadata of the superproject, and the actual submodule is the repository
consisting of its working tree (inside the superproject's working tree)
as well as its git directory, preferably inside the superproject's git dir.

This definition, however, is broken IMHO: when there is proper metadata
(gitlink + .gitmodules entry), you can have an "uninitialized" submodule,
which has neither a working tree nor a git dir at that time.

The wording above holds true for un-{initialized,populated} submodules
as well, as the tracking directory is empty and the git dir doesn't exist.
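A minimal shell sketch of such an uninitialized submodule, assuming a recent Git and made-up names: a plain `git clone` of the superproject checks out an empty tracking directory and creates no git dir under `.git/modules`, while the gitlink and the `.gitmodules` entry are present:

```shell
set -e
tmp=$(mktemp -d)
# Throwaway identity so the commits below do not depend on user config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q "$tmp/lib-upstream"
git -C "$tmp/lib-upstream" commit -q --allow-empty -m "lib: initial"

git init -q "$tmp/super"
# protocol.file.allow is needed for local-path URLs on recent Git versions.
git -C "$tmp/super" -c protocol.file.allow=always submodule add "$tmp/lib-upstream" lib
git -C "$tmp/super" commit -q -m "add submodule lib"

# A plain clone (no --recurse-submodules) leaves the submodule uninitialized.
git clone -q "$tmp/super" "$tmp/clone"
cd "$tmp/clone"
ls -A lib                      # the tracking directory exists but is empty
test ! -e .git/modules/lib && echo "no git dir for lib yet"

# The metadata is all there, though.
git ls-files -s lib            # a mode 160000 gitlink entry
git config -f .gitmodules --get submodule.lib.url
```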

So I think I'll resend without that change.

Thanks,
Stefan


* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-20 22:56 ` [PATCHv2] " Stefan Beller
  2017-06-21  3:45   ` Jonathan Tan
@ 2017-06-22 17:46   ` Brandon Williams
  2017-06-22 18:54     ` Stefan Beller
  2017-06-22 20:20     ` Junio C Hamano
  2017-06-22 21:01   ` [PATCHv3] " Stefan Beller
  2 siblings, 2 replies; 17+ messages in thread
From: Brandon Williams @ 2017-06-22 17:46 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder, gitster, jonathantanmy

On 06/20, Stefan Beller wrote:
> This patch aims to detangle (a) the usage of `git-submodule`
> from (b) the concept of submodules and (c) how the actual
> implementation looks like, such as where they are configured
> and (d) what the best practices are.
> 
> To do so, move the conceptual parts of the 'git-submodule'
> man page to a new man page gitsubmodules(7). This new page
> is just like gitmodules(5), gitattributes(5), gitcredentials(7),
> gitnamespaces(7), gittutorial(7), which introduce a concept
> rather than explaining a specific command.
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> 
> I have considered most of the feedback, and stopped marking it RFC,
> but I'd like to propose this as a serious patch.
> 
> Thanks,
> Stefan
> 
>  Documentation/Makefile          |   1 +
>  Documentation/git-rm.txt        |   4 +-
>  Documentation/git-submodule.txt |  44 +++-------
>  Documentation/gitsubmodules.txt | 189 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 202 insertions(+), 36 deletions(-)
>  create mode 100644 Documentation/gitsubmodules.txt
> 
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index b5be2e2d3f..2415e0d657 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
>  MAN7_TXT += gitglossary.txt
>  MAN7_TXT += gitnamespaces.txt
>  MAN7_TXT += gitrevisions.txt
> +MAN7_TXT += gitsubmodules.txt
>  MAN7_TXT += gittutorial-2.txt
>  MAN7_TXT += gittutorial.txt
>  MAN7_TXT += gitworkflows.txt
> diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
> index f1efc116eb..db444693dd 100644
> --- a/Documentation/git-rm.txt
> +++ b/Documentation/git-rm.txt
> @@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a submodule's work
>  tree from being removed.
>  
>  If you only want to remove the local checkout of a submodule from your
> -work tree without committing the removal,
> -use linkgit:git-submodule[1] `deinit` instead.
> +work tree without committing the removal, use linkgit:git-submodule[1] `deinit`
> +instead. Also see linkgit:gitsubmodules[7] for details on submodule removal.
>  
>  EXAMPLES
>  --------
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index 74bc6200d5..9ffd129bbc 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -24,37 +24,7 @@ DESCRIPTION
>  -----------
>  Inspects, updates and manages submodules.
>  
> -A submodule allows you to keep another Git repository in a subdirectory
> -of your repository. The other repository has its own history, which does not
> -interfere with the history of the current repository. This can be used to
> -have external dependencies such as third party libraries for example.
> -
> -When cloning or pulling a repository containing submodules however,
> -these will not be checked out by default; the 'init' and 'update'
> -subcommands will maintain submodules checked out and at
> -appropriate revision in your working tree.
> -
> -Submodules are composed from a so-called `gitlink` tree entry
> -in the main repository that refers to a particular commit object
> -within the inner repository that is completely separate.
> -A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
> -root of the source tree assigns a logical name to the submodule and
> -describes the default URL the submodule shall be cloned from.
> -The logical name can be used for overriding this URL within your
> -local repository configuration (see 'submodule init').
> -
> -Submodules are not to be confused with remotes, which are other
> -repositories of the same project; submodules are meant for
> -different projects you would like to make part of your source tree,
> -while the history of the two projects still stays completely
> -independent and you cannot modify the contents of the submodule
> -from within the main project.
> -If you want to merge the project histories and want to treat the
> -aggregated whole as a single project from then on, you may want to
> -add a remote for the other project and use the 'subtree' merge strategy,
> -instead of treating the other project as a submodule. Directories
> -that come from both projects can be cloned and checked out as a whole
> -if you choose to go that route.
> +For more information about submodules, see linkgit:gitsubmodules[7].
>  
>  COMMANDS
>  --------
> @@ -149,15 +119,17 @@ deinit [-f|--force] (--all|[--] <path>...)::
>  	tree. Further calls to `git submodule update`, `git submodule foreach`
>  	and `git submodule sync` will skip any unregistered submodules until
>  	they are initialized again, so use this command if you don't want to
> -	have a local checkout of the submodule in your working tree anymore. If
> -	you really want to remove a submodule from the repository and commit
> -	that use linkgit:git-rm[1] instead.
> +	have a local checkout of the submodule in your working tree anymore.
>  +
>  When the command is run without pathspec, it errors out,
>  instead of deinit-ing everything, to prevent mistakes.
>  +
>  If `--force` is specified, the submodule's working tree will
>  be removed even if it contains local modifications.
> ++
> +If you really want to remove a submodule from the repository and commit
> +that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
> +options.
>  
>  update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
>  +
> @@ -435,6 +407,10 @@ This file should be formatted in the same way as `$GIT_DIR/config`. The key
>  to each submodule url is "submodule.$name.url".  See linkgit:gitmodules[5]
>  for details.
>  
> +SEE ALSO
> +--------
> +linkgit:gitsubmodules[7], linkgit:gitmodules[5].
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
> new file mode 100644
> index 0000000000..80e71ff55c
> --- /dev/null
> +++ b/Documentation/gitsubmodules.txt
> @@ -0,0 +1,189 @@
> +gitsubmodules(7)
> +================
> +
> +NAME
> +----
> +gitsubmodules - mounting one repository inside another
> +
> +SYNOPSIS
> +--------
> +.gitmodules, $GIT_DIR/config
> +------------------
> +git submodule
> +git <command> --recurse-submodules
> +------------------
> +
> +DESCRIPTION
> +-----------
> +
> +A submodule is another Git repository tracked inside a repository.
> +The tracked repository has its own history, which does not
> +interfere with the history of the current repository.
> +
> +It consists of a tracking subdirectory in the working directory,
> +a 'gitlink' in the working tree and an entry in the `.gitmodules`
> +file (see linkgit:gitmodules[5]) at the root of the source tree.
> +
> +The tracking subdirectory appears in the main repositorys working

s/repositorys/repository's

> +tree at the point where the submodules gitlink is tracked in the

s/submodules/submodule's

> +tree.  It is empty when the submodule is not populated, otherwise
> +it contains the content of the submodule repository.
> +The main repository is often referred to as superproject.

maybe: "referred to as a superproject"

> +
> +The gitlink contains the object name of a particular commit
> +of the submodule.
> +
> +The `.gitmodules` file establishes a relationship between the
> +path, which is where the gitlink is in the tree, and the logical
> +name, which is used for the location of the submodules git

s/submodules/submodule's

> +directory. The `.gitmodules` file has the same syntax as the
> +$Git_DIR/config file and the mapping of path to name
> +is done via setting `submodule.<name value>.path = <path value>`.
> +
> +The submodules git directory is found in in the main repositories

s/submodules/submodule's
s/repositories/repository's

> +'$GIT_DIR/modules/<name>' or inside the tracking subdirectory.

Well, I'd say that the preferred place is inside the main repo's gitdir
(the normal location), but this is correct.

> +
> +Submodules can be used for at least two different use cases:
> +
> +1. Using another project while maintaining independent history.
> +  Submodules allow you to contain the working tree of another project
> +  within your own working tree while keeping the history of both
> +  projects separate. Also, since submodules are fixed to a an arbitrary
> +  version, the other project can be independently developed without
> +  affecting the superproject, allowing the superproject project to
> +  fix itself to new versions only whenever desired.
> +
> +2. Splitting a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome current limitations of Gits implementation to have
> +   finer grained access:
> +
> +* Size of the git repository
> +  In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.
> +
> +* Transfer size
> +  In its current form Git requires the whole working tree present. It
> +  does not allow partial trees to be transferred in fetch or clone.
> +
> +* Access control
> +  By restricting user access to submodules, this can be used to implement
> +  read/write policies for different users.
> +
> +The configuration of submodules
> +-------------------------------
> +
> +Submodule operations can be configured using the following mechanisms
> +(from highest to lowest precedence):
> +
> + * the command line for those commands that support taking submodule specs.
> +
> + * the configuration file `$GIT_DIR/config` in the superproject.
> +
> + * the `.gitmodules` file inside the superproject. A project usually
> +   includes this file to suggest defaults for the upstream collection
> +   of repositories.

I dislike this last point.  Realistically we don't want this right?  So
perhaps we shouldn't include it?

> +
> +On the location of the git directory
> +------------------------------------
> +
> +Since v1.7.7 of Git, the git directory of submodules is either stored inside
> +the superprojects git directory at $GIT_DIR/modules/<submodule-name> or
> +in the submodule.
> +The location inside the superproject allows for the working tree to be
> +non existent while keeping the history around. So we can delete a submodule

s/submodule/submodule's

> +working tree without losing information that may only be local. It is also
> +possible to checkout the superproject before and after the deletion of the
> +submodule without the need to reclone the submodule as it is kept locally.
> +
> +Workflow for a third party library
> +----------------------------------
> +
> +  # add the submodule
> +  git submodule add <url> <path>
> +
> +  # occasionally update the submodule to a new version:
> +  git -C <path> checkout <new version>
> +  git add <path>
> +  git commit -m "update submodule to new version"
> +
> +  # see the discussion below on deleting submodules
> +
> +
> +Workflow for an artifically split repo
> +--------------------------------------
> +
> +  # Enable recursion for relevant commands, such that
> +  # regular commands recurse into submodules by default
> +  git config --global submodule.recurse true
> +
> +  # Unlike the other commands below clone still needs
> +  # its own recurse flag:
> +  git clone --recurse <URL> <directory>
> +  cd <directory>
> +
> +  # Get to know the code:
> +  git grep foo
> +  git ls-files
> +
> +  # Get new code
> +  git fetch
> +  git pull --rebase
> +
> +  # change worktree
> +  git checkout
> +  git reset
> +
> +Deleting a submodule
> +--------------------
> +
> +Deleting a submodule can happen on different levels:
> +
> +1) Removing it from the local working tree without tampering with
> +   the history of the superproject.
> +
> +You may no longer need the submodule, but still want to keep it recorded
> +in the superproject history as others may have use for it. The command

s/superproject/superproject's

> +`git submodule deinit <submodule path>` will remove any configuration
> +entries from the config file, such that the submodule becomes
> +uninitialized. The tracking directory in the superprojects working

Do we want to use the term 'active' instead of un/initialized?  Unless
you intend for these to mean different things.

> +tree that holds the submodules working directory is emptied.
> +This step can be undone via `git submodule init`.
> +
> +2) Remove it from history:

I'd argue that this doesn't remove the submodule from history; it still
exists in the history.

> +--
> +   git rm <submodule path>
> +   git commit
> +--
> +This removes the submodules gitlink from the superprojects tree, as well
> +as removing the entries from the `.gitmodules` file, but keeps the
> +local configuration for the submodule. This can be undone using `git revert`.
> +
> +
> +3) Remove the submodules git directory:

s/submodules/submodule's

> +
> +When you also want to free up the disk space that the submodules git
> +directory uses, you have to delete it manually as this
> +step cannot be undone using git tools. It is found in `$GIT_DIR/modules`.
> +
> +Implementation details
> +----------------------
> +
> +When cloning or pulling a repository containing submodules the submodules
> +will not be checked out by default; You can instruct 'clone' to recurse
> +into submodules. The 'init' and 'update' subcommands of 'git submodule'
> +will maintain submodules checked out and at an appropriate revision in
> +your working tree. Alternatively you can set 'submodule.recurse' to have
> +'checkout' recursing into submodules.
> +
> +
> +SEE ALSO
> +--------
> +linkgit:git-submodule[1], linkgit:gitmodules[5].
> +
> +GIT
> +---
> +Part of the linkgit:git[1] suite
> -- 
> 2.12.2.575.gb14f27f917
> 

-- 
Brandon Williams


* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-22 17:46   ` Brandon Williams
@ 2017-06-22 18:54     ` Stefan Beller
  2017-06-22 20:20     ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-22 18:54 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git@vger.kernel.org, Jonathan Nieder, Junio C Hamano,
	Jonathan Tan

>> + * the `.gitmodules` file inside the superproject. A project usually
>> +   includes this file to suggest defaults for the upstream collection
>> +   of repositories.
>
> I dislike this last point.  Realistically we don't want this right?  So
> perhaps we shouldn't include it?

Well, it describes the current situation accurately. In a resend
I'll de-emphasize it.


* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-22 17:46   ` Brandon Williams
  2017-06-22 18:54     ` Stefan Beller
@ 2017-06-22 20:20     ` Junio C Hamano
  2017-06-22 20:27       ` Stefan Beller
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2017-06-22 20:20 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Stefan Beller, git, jrnieder, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> On 06/20, Stefan Beller wrote:
> ...
>> +The configuration of submodules
>> +-------------------------------
>> +
>> +Submodule operations can be configured using the following mechanisms
>> +(from highest to lowest precedence):
>> +
>> + * the command line for those commands that support taking submodule specs.
>> +
>> + * the configuration file `$GIT_DIR/config` in the superproject.
>> +
>> + * the `.gitmodules` file inside the superproject. A project usually
>> +   includes this file to suggest defaults for the upstream collection
>> +   of repositories.
>
> I dislike this last point.  Realistically we don't want this right?  So
> perhaps we shouldn't include it?

I am not sure if I follow.  Without .gitmodules, how would you, as a
downstream developer, bootstrap the whole thing?



* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-22 20:20     ` Junio C Hamano
@ 2017-06-22 20:27       ` Stefan Beller
  2017-06-22 21:03         ` Brandon Williams
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Beller @ 2017-06-22 20:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Brandon Williams, git@vger.kernel.org, Jonathan Nieder,
	Jonathan Tan

On Thu, Jun 22, 2017 at 1:20 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Brandon Williams <bmwill@google.com> writes:
>
>> On 06/20, Stefan Beller wrote:
>> ...
>>> +The configuration of submodules
>>> +-------------------------------
>>> +
>>> +Submodule operations can be configured using the following mechanisms
>>> +(from highest to lowest precedence):
>>> +
>>> + * the command line for those commands that support taking submodule specs.
>>> +
>>> + * the configuration file `$GIT_DIR/config` in the superproject.
>>> +
>>> + * the `.gitmodules` file inside the superproject. A project usually
>>> +   includes this file to suggest defaults for the upstream collection
>>> +   of repositories.
>>
>> I dislike this last point.  Realistically we don't want this right?  So
>> perhaps we shouldn't include it?
>
> I am not sure if I follow.  Without .gitmodules, how would you, as a
> downstream developer, bootstrap the whole thing?
>

I think Brandon alludes to our long-term vision of having a separate
magic ref containing this information instead of carrying it in the tree.

As URLs change over time, it is better to keep the URLs out of the
actual history, but still versioned, so maybe we'll want to have
a ref/submodule-config/master ref that contains all the bootstrapping
information. The .gitmodules file would degenerate to a pure
name<->path mapping.
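For illustration, a minimal sketch of how the name<->path mapping in `.gitmodules` can be consulted today, assuming a POSIX shell with `git` on PATH. The helpers `submodule_name_for_path` and `submodule_gitdir` are hypothetical, `foo` and `path/to/bar` are placeholders, and the lookup only follows the documented `$GIT_DIR/modules/<name>` layout; it is not Git's own implementation:

```sh
#!/bin/sh
set -e

# Hypothetical helper: print the logical name whose
# submodule.<name>.path equals $1, by scanning .gitmodules in the
# current superproject. Assumes submodule names contain no whitespace.
submodule_name_for_path () {
	git config -f .gitmodules --get-regexp '^submodule\..*\.path$' |
	while read -r key value
	do
		if [ "$value" = "$1" ]
		then
			name=${key#submodule.}
			echo "${name%.path}"
		fi
	done
}

# Hypothetical helper: where the submodule's git directory is expected
# to live inside the superproject's git directory (not the "old-form"
# layout with an embedded .git directory).
submodule_gitdir () {
	name=$(submodule_name_for_path "$1")
	echo "$(git rev-parse --git-dir)/modules/$name"
}

# Demo in a throwaway superproject with a single mapping entry.
tmp=$(mktemp -d)
cd "$tmp"
git init -q super
cd super
git config -f .gitmodules submodule.foo.path path/to/bar

submodule_name_for_path path/to/bar   # foo
submodule_gitdir path/to/bar          # .git/modules/foo
```

Only the path entry is needed for this lookup; a URL entry is not consulted at all.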


* [PATCHv3] submodules: overhaul documentation
  2017-06-20 22:56 ` [PATCHv2] " Stefan Beller
  2017-06-21  3:45   ` Jonathan Tan
  2017-06-22 17:46   ` Brandon Williams
@ 2017-06-22 21:01   ` Stefan Beller
  2 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-22 21:01 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder

This patch aims to detangle (a) the usage of `git-submodule`
from (b) the concept of submodules and (c) what the actual
implementation looks like, such as where they are configured
and (d) what the best practices are.

To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page gitsubmodules(7). This new page
is just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), which introduce a concept
rather than explaining a specific command.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

Rerolling sb/submodule-doc once again.

* Adding examples to the config section
* no extra word invented (the "tracking directory" is no more)

I rebased this on top of Kaartic's submodule cleanup patch; there were
no conflicts, so we can keep the two in parallel branches.

 Documentation/Makefile          |   1 +
 Documentation/git-rm.txt        |   4 +-
 Documentation/git-submodule.txt |  44 ++------
 Documentation/gitsubmodules.txt | 221 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 234 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..2415e0d657 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index f1efc116eb..db444693dd 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a submodule's work
 tree from being removed.
 
 If you only want to remove the local checkout of a submodule from your
-work tree without committing the removal,
-use linkgit:git-submodule[1] `deinit` instead.
+work tree without committing the removal, use linkgit:git-submodule[1] `deinit`
+instead. Also see linkgit:gitsubmodules[7] for details on submodule removal.
 
 EXAMPLES
 --------
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 6e07bade39..e67b58bddc 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 -----------
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule. Directories
-that come from both projects can be cloned and checked out as a whole
-if you choose to go that route.
+For more information about submodules, see linkgit:gitsubmodules[7].
 
 COMMANDS
 --------
@@ -142,15 +112,17 @@ deinit [-f|--force] (--all|[--] <path>...)::
 	tree. Further calls to `git submodule update`, `git submodule foreach`
 	and `git submodule sync` will skip any unregistered submodules until
 	they are initialized again, so use this command if you don't want to
-	have a local checkout of the submodule in your working tree anymore. If
-	you really want to remove a submodule from the repository and commit
-	that use linkgit:git-rm[1] instead.
+	have a local checkout of the submodule in your working tree anymore.
 +
 When the command is run without pathspec, it errors out,
 instead of deinit-ing everything, to prevent mistakes.
 +
 If `--force` is specified, the submodule's working tree will
 be removed even if it contains local modifications.
++
+If you really want to remove a submodule from the repository and commit
+that, use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
+options.
 
 update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]::
 +
@@ -428,6 +400,10 @@ This file should be formatted in the same way as `$GIT_DIR/config`. The key
 to each submodule url is "submodule.$name.url".  See linkgit:gitmodules[5]
 for details.
 
+SEE ALSO
+--------
+linkgit:gitsubmodules[7], linkgit:gitmodules[5].
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
new file mode 100644
index 0000000000..46cf120f66
--- /dev/null
+++ b/Documentation/gitsubmodules.txt
@@ -0,0 +1,221 @@
+gitsubmodules(7)
+================
+
+NAME
+----
+gitsubmodules - mounting one repository inside another
+
+SYNOPSIS
+--------
+ .gitmodules, $GIT_DIR/config
+------------------
+git submodule
+git <command> --recurse-submodules
+------------------
+
+DESCRIPTION
+-----------
+
+A submodule is a repository embedded inside another repository.
+The submodule has its own history; the repository it is embedded
+in is called a superproject.
+
+On the filesystem, a submodule usually (but not always - see FORMS below)
+consists of (i) a Git directory located under the `$GIT_DIR/modules/`
+directory of its superproject, (ii) a working directory inside the
+superproject's working directory, and (iii) a `.git` file at the root of
+the submodule’s working directory pointing to (i).
+
+Assuming the submodule has a Git directory at `$GIT_DIR/modules/foo/`
+and a working directory at `path/to/bar/`, the superproject tracks the
+submodule via a `gitlink` entry in the tree at `path/to/bar` and an entry
+in its `.gitmodules` file (see linkgit:gitmodules[5]) of the form
+`submodule.foo.path = path/to/bar`.
+
+The `gitlink` entry contains the object name of the commit that the
+superproject expects the submodule’s working directory to be at.
+
+The section `submodule.foo.*` in the `.gitmodules` file gives additional
+hints to Git's porcelain layer, such as where to obtain the submodule via
+the `submodule.foo.url` setting.
+
+Submodules can be used for at least two different use cases:
+
+1. Using another project while maintaining independent history.
+  Submodules allow you to contain the working tree of another project
+  within your own working tree while keeping the history of both
+  projects separate. Also, since submodules are fixed to an arbitrary
+  version, the other project can be independently developed without
+  affecting the superproject, allowing the superproject to
+  fix itself to new versions only when desired.
+
+2. Splitting a (logically single) project into multiple
+   repositories and tying them back together. This can be used to
+   overcome current limitations of Git's implementation and to gain
+   finer-grained access:
+
+    * Size of the git repository:
+      In its current form Git scales up poorly for large repositories containing
+      content that is not compressed by delta computation between trees.
+      However, you can use submodules to, e.g., hold large binary assets;
+      these repositories can then be cloned shallowly so that you do not
+      have a large history locally.
+    * Transfer size:
+      In its current form Git requires the whole working tree to be present. It
+      does not allow partial trees to be transferred in fetch or clone.
+    * Access control:
+      Restricting user access to individual submodules can be used to
+      implement read/write policies for different users.
+
+The configuration of submodules
+-------------------------------
+
+Submodule operations can be configured using the following mechanisms
+(from highest to lowest precedence):
+
+ * The command line for those commands that support taking submodule specs.
+   Most commands have a boolean flag `--recurse-submodules` that controls
+   whether to recurse into submodules. Examples are `ls-files` or `checkout`.
+   Some commands take enums, such as `fetch` and `push`, where you can
+   specify how submodules are affected.
+
+ * The configuration inside the submodule. This includes `$GIT_DIR/config`
+   in the submodule, but also settings in the tree such as `.gitattributes`
+   or `.gitignore` files that specify the behavior of commands inside the
+   submodule.
++
+For example, an effect of the submodule's `.gitignore` file
+would be observed when you run `git status --ignore-submodules=none` in
+the superproject. This collects information from the submodule's working
+directory by running `status` in the submodule, which does pay attention
+to its `.gitignore` file.
++
+The submodule's `$GIT_DIR/config` file would come into play when running
+`git push --recurse-submodules=check` in the superproject, as this would
+check if the submodule has any changes not published to any remote. The
+remotes are configured in the submodule as usual in the `$GIT_DIR/config`
+file.
+
+ * The configuration file `$GIT_DIR/config` in the superproject.
+   Typical configuration here controls whether a submodule
+   is recursed into at all, for example via the `active` flag.
++
+If the submodule is not yet initialized, then the configuration
+inside the submodule does not exist yet, so this is where, for example,
+the location to obtain the submodule from is configured.
+
+ * The `.gitmodules` file inside the superproject. In addition to the
+   required mapping between a submodule's name and path, a project usually
+   uses this file to suggest defaults for the upstream collection
+   of repositories.
++
+This file mainly serves as the mapping between name and path in
+the superproject, such that the submodule's git directory can be
+located.
++
+If the submodule has never been initialized, this is the only place
+where submodule configuration is found. It serves as the last fallback
+to specify where to obtain the submodule from.
+
+FORMS
+-----
+
+Submodules can take the following forms:
+
+ * The basic form described in DESCRIPTION with a Git directory,
+a working directory, a `gitlink`, and a `.gitmodules` entry.
+
+ * "Old-form" submodule: A working directory with an embedded
+`.git` directory, and the tracking `gitlink` and `.gitmodules` entry in
+the superproject. This is typically found in repositories generated
+using older versions of Git.
++
+It is possible to construct these old form repositories manually.
++
+When deinitialized or deleted (see below), the submodule’s Git
+directory is automatically moved to `$GIT_DIR/modules/<name>/`
+of the superproject.
+
+ * Deinitialized submodule: A `gitlink`, and a `.gitmodules` entry,
+but no submodule working directory. The submodule’s git directory
+may still be present, since deinitializing keeps the git directory around.
+The directory which is supposed to be the working directory is empty instead.
++
+A submodule can be deinitialized by running `git submodule deinit`.
+Besides emptying the working directory, this command only modifies
+the superproject’s `$GIT_DIR/config` file, so the superproject’s history
+is not affected. This can be undone using `git submodule init`.
+
+ * Deleted submodule: A submodule can be deleted by running
+`git rm <submodule path> && git commit`. This can be undone
+using `git revert`.
++
+The deletion removes the superproject’s tracking data, which are
+both the `gitlink` entry and the section in the `.gitmodules` file.
+The submodule’s working directory is removed from the file
+system, but the Git directory is kept around so that it is
+possible to check out past commits without fetching them
+from another repository.
++
+To completely remove a submodule, manually delete
+`$GIT_DIR/modules/<name>/`.
+
+Workflow for a third party library
+----------------------------------
+
+  # add a submodule
+  git submodule add <url> <path>
+
+  # occasionally update the submodule to a new version:
+  git -C <path> checkout <new version>
+  git add <path>
+  git commit -m "update submodule to new version"
+
+  # See the list of submodules in a superproject
+  git submodule status
+
+  # See FORMS on removing submodules
+
+
+Workflow for an artificially split repo
+---------------------------------------
+
+  # Enable recursion for relevant commands, such that
+  # regular commands recurse into submodules by default
+  git config --global submodule.recurse true
+
+  # Unlike the other commands below clone still needs
+  # its own recurse flag:
+  git clone --recurse-submodules <URL> <directory>
+  cd <directory>
+
+  # Get to know the code:
+  git grep foo
+  git ls-files
+
+  # Get new code
+  git fetch
+  git pull --rebase
+
+  # change worktree
+  git checkout
+  git reset
+
+Implementation details
+----------------------
+
+When cloning or pulling a repository containing submodules, the submodules
+will not be checked out by default; you can instruct 'clone' to recurse
+into submodules. The 'init' and 'update' subcommands of 'git submodule'
+will keep submodules checked out and at an appropriate revision in
+your working tree. Alternatively, you can set 'submodule.recurse' to have
+'checkout' recurse into submodules.
+
+
+SEE ALSO
+--------
+linkgit:git-submodule[1], linkgit:gitmodules[5].
+
+GIT
+---
+Part of the linkgit:git[1] suite
-- 
2.12.2.575.gb14f27f917
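A minimal sketch of the bootstrapping step discussed in this thread (how a downstream checkout gets from the in-tree `.gitmodules` suggestion to an entry in the superproject's `$GIT_DIR/config`), assuming a POSIX shell with `git` on PATH. The commit id recorded for the gitlink is a dummy placeholder, on the assumption that `git submodule init` only registers configuration and does not need the submodule's objects; `foo`, `path/to/bar` and the URL are placeholders as well:

```sh
#!/bin/sh
set -e

tmp=$(mktemp -d)
cd "$tmp"
git init -q super
cd super

# What a project ships in-tree: the name<->path mapping plus a suggested URL.
git config -f .gitmodules submodule.foo.path path/to/bar
git config -f .gitmodules submodule.foo.url https://example.com/upstream/foo.git

# Record a gitlink (mode 160000) for path/to/bar in the index.
# The object name is a dummy placeholder, not a real commit.
git update-index --add \
	--cacheinfo 160000,1111111111111111111111111111111111111111,path/to/bar

# Before init, the superproject's $GIT_DIR/config knows nothing about "foo".
git config submodule.foo.url || echo "not initialized"

# init copies the suggested URL from .gitmodules into $GIT_DIR/config,
# where the user may later override it.
git submodule init -- path/to/bar
git config submodule.foo.url
```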



* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-22 20:27       ` Stefan Beller
@ 2017-06-22 21:03         ` Brandon Williams
  2017-06-22 21:09           ` Stefan Beller
  0 siblings, 1 reply; 17+ messages in thread
From: Brandon Williams @ 2017-06-22 21:03 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, git@vger.kernel.org, Jonathan Nieder,
	Jonathan Tan

On 06/22, Stefan Beller wrote:
> On Thu, Jun 22, 2017 at 1:20 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Brandon Williams <bmwill@google.com> writes:
> >
> >> On 06/20, Stefan Beller wrote:
> >> ...
> >>> +The configuration of submodules
> >>> +-------------------------------
> >>> +
> >>> +Submodule operations can be configured using the following mechanisms
> >>> +(from highest to lowest precedence):
> >>> +
> >>> + * the command line for those commands that support taking submodule specs.
> >>> +
> >>> + * the configuration file `$GIT_DIR/config` in the superproject.
> >>> +
> >>> + * the `.gitmodules` file inside the superproject. A project usually
> >>> +   includes this file to suggest defaults for the upstream collection
> >>> +   of repositories.
> >>
> >> I dislike this last point.  Realistically we don't want this right?  So
> >> perhaps we shouldn't include it?
> >
> > I am not sure if I follow.  Without .gitmodules, how would you, as a
> > downstream developer, bootstrap the whole thing?
> >
> 
> I think Brandon alludes to our long-term vision of having a separate
> magic ref containing this information instead of carrying it in the tree.
>
> As URLs change over time, it is better to keep the URLs out of the
> actual history, but still versioned, so maybe we'll want to have
> a ref/submodule-config/master ref that contains all the bootstrapping
> information. The .gitmodules file would degenerate to a pure
> name<->path mapping.

I was more alluding to having fetch.recurse and the other similar bits
stored in the gitmodules file.

-- 
Brandon Williams


* Re: [PATCHv2] submodules: overhaul documentation
  2017-06-22 21:03         ` Brandon Williams
@ 2017-06-22 21:09           ` Stefan Beller
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-22 21:09 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Junio C Hamano, git@vger.kernel.org, Jonathan Nieder,
	Jonathan Tan

On Thu, Jun 22, 2017 at 2:03 PM, Brandon Williams <bmwill@google.com> wrote:
> On 06/22, Stefan Beller wrote:
>> On Thu, Jun 22, 2017 at 1:20 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> > Brandon Williams <bmwill@google.com> writes:
>> >
>> >> On 06/20, Stefan Beller wrote:
>> >> ...
>> >>> +The configuration of submodules
>> >>> +-------------------------------
>> >>> +
>> >>> +Submodule operations can be configured using the following mechanisms
>> >>> +(from highest to lowest precedence):
>> >>> +
>> >>> + * the command line for those commands that support taking submodule specs.
>> >>> +
>> >>> + * the configuration file `$GIT_DIR/config` in the superproject.
>> >>> +
>> >>> + * the `.gitmodules` file inside the superproject. A project usually
>> >>> +   includes this file to suggest defaults for the upstream collection
>> >>> +   of repositories.
>> >>
>> >> I dislike this last point.  Realistically we don't want this right?  So
>> >> perhaps we shouldn't include it?
>> >
>> > I am not sure if I follow.  Without .gitmodules, how would you, as a
>> > downstream developer, bootstrap the whole thing?
>> >
>>
>> I think Brandon alludes to our long-term vision of having a separate
>> magic ref containing this information instead of carrying it in the tree.
>>
>> As URLs change over time, it is better to keep the URLs out of the
>> actual history, but still versioned, so maybe we'll want to have
>> a ref/submodule-config/master ref that contains all the bootstrapping
>> information. The .gitmodules file would degenerate to a pure
>> name<->path mapping.
>
> I was more alluding to having fetch.recurse and the other similar bits
> stored in the gitmodules file.

Well yes, but these configurations would also go onto the new magic ref
once implemented.

And to answer your question:
Yes, we (you and I) dislike it, but it is the best we can do given
the current implementation. Once the implementation changes,
we may want to adapt this man page.
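A minimal sketch of the precedence being discussed, for the URL setting only, assuming a POSIX shell with `git` on PATH. `submodule_url` is a hypothetical helper and the URLs are placeholders; neither the command-line level nor the proposed magic ref is modeled. The superproject's own configuration wins, and `.gitmodules` is the last fallback:

```sh
#!/bin/sh
set -e

# Hypothetical helper: resolve submodule.<name>.url, preferring the
# superproject's configuration (which includes $GIT_DIR/config) and
# falling back to the in-tree .gitmodules suggestion.
submodule_url () {
	git config "submodule.$1.url" ||
	git config -f .gitmodules "submodule.$1.url"
}

tmp=$(mktemp -d)
cd "$tmp"
git init -q super
cd super
git config -f .gitmodules submodule.foo.path path/to/bar
git config -f .gitmodules submodule.foo.url https://example.com/upstream/foo.git

# Only .gitmodules knows the URL, so the upstream suggestion is used.
submodule_url foo   # https://example.com/upstream/foo.git

# A local override in $GIT_DIR/config takes precedence.
git config submodule.foo.url https://mirror.example.com/foo.git
submodule_url foo   # https://mirror.example.com/foo.git
```

The in-tree suggestion itself is left untouched by the override, which is why a project can keep shipping defaults while individual users point elsewhere.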


>
> --
> Brandon Williams


end of thread, other threads:[~2017-06-22 21:09 UTC | newest]

Thread overview: 17+ messages
2017-06-07 18:53 [RFC/PATCH] submodules: overhaul documentation Stefan Beller
2017-06-13 19:29 ` Junio C Hamano
2017-06-13 21:06   ` Stefan Beller
2017-06-19 18:10     ` Brandon Williams
2017-06-20 21:42       ` Stefan Beller
2017-06-20 18:18 ` Jonathan Tan
2017-06-20 19:15   ` Stefan Beller
2017-06-20 22:56 ` [PATCHv2] " Stefan Beller
2017-06-21  3:45   ` Jonathan Tan
2017-06-21 17:25     ` Stefan Beller
2017-06-22 17:46   ` Brandon Williams
2017-06-22 18:54     ` Stefan Beller
2017-06-22 20:20     ` Junio C Hamano
2017-06-22 20:27       ` Stefan Beller
2017-06-22 21:03         ` Brandon Williams
2017-06-22 21:09           ` Stefan Beller
2017-06-22 21:01   ` [PATCHv3] " Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
