git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v3 0/4] gitfaq: add issues in the 'Common Issues' section
@ 2020-04-21 13:12 Shourya Shukla
  2020-04-21 13:12 ` [PATCH v3 1/4] gitfaq: files in .gitignore are tracked Shourya Shukla
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Shourya Shukla @ 2020-04-21 13:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sandals, Shourya Shukla

This is the third version of the series adding issues to the 'Common Issues' section.
In this version I have:
	1. Changed column wrapping from 90 col. to ~70 col.
	2. Removed the issues: 'rebasing-and-merging' & 'checking-out'
	3. Added issue: 'shallow-cloning'
	4. Separated the issues into individual commits.
	5. Corrected spelling and grammatical mistakes.

I decided to drop the issues mentioned in (2) because they lacked clarity.
As Junio advised, it would be better to improve their respective
documentation rather than adding them to the FAQ.

I really appreciate Junio and Brian reviewing my patch in such great detail :)

Regards,
Shourya Shukla

Shourya Shukla (4):
  gitfaq: files in .gitignore are tracked
  gitfaq: changing the remote of a repository
  gitfaq: shallow cloning a repository
  gitfaq: fetching and pulling a repository

 Documentation/gitfaq.txt | 86 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/4] gitfaq: files in .gitignore are tracked
  2020-04-21 13:12 [PATCH v3 0/4] gitfaq: add issues in the 'Common Issues' section Shourya Shukla
@ 2020-04-21 13:12 ` Shourya Shukla
  2020-04-21 19:45   ` Junio C Hamano
  2020-04-21 13:12 ` [PATCH v3 2/4] gitfaq: changing the remote of a repository Shourya Shukla
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Shourya Shukla @ 2020-04-21 13:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sandals, Shourya Shukla

Add issue in 'Common Issues' section which addresses the problem of
Git tracking files/paths mentioned in '.gitignore'.

Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 Documentation/gitfaq.txt | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 1cf83df118..96767e7c75 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -223,6 +223,27 @@ a file checked into the repository which is a template or set of defaults which
 can then be copied alongside and modified as appropriate.  This second, modified
 file is usually ignored to prevent accidentally committing it.
 
+[[files-in-.gitignore-are-tracked]]
+I asked Git to ignore various files, yet they are still tracked::
+	Git ignores files matching the patterns listed in `.gitignore`.
+	Consequently, `git add` does not add the files/paths matching a
+	pattern in `.gitignore`, meaning they remain untracked; `git status`
+	does not list the aforementioned files/paths as untracked.
+
+	One thing to note is that the `.gitignore` mechanism applies only
+	to files that are not already tracked. A file/path that is
+	already tracked will continue to be tracked even if you add a
+	pattern that happens to match it to the `.gitignore` file.
+
+	This is probably the reason why Git still reports changes to some
+	of these files/paths. They were already being tracked when their
+	patterns were added to `.gitignore`, so the ignore rules do not
+	apply to them.
+
+	To completely ignore and untrack files/paths falling in the above
+	category, it is advised to use `git rm --cached <file>` as well as
+	add these files/paths in the `.gitignore`.
+
 Hooks
 -----
 
-- 
2.20.1



* [PATCH v3 2/4] gitfaq: changing the remote of a repository
  2020-04-21 13:12 [PATCH v3 0/4] gitfaq: add issues in the 'Common Issues' section Shourya Shukla
  2020-04-21 13:12 ` [PATCH v3 1/4] gitfaq: files in .gitignore are tracked Shourya Shukla
@ 2020-04-21 13:12 ` Shourya Shukla
  2020-04-21 19:54   ` Junio C Hamano
  2020-04-21 13:12 ` [PATCH v3 3/4] gitfaq: shallow cloning " Shourya Shukla
  2020-04-21 13:12 ` [PATCH v3 4/4] gitfaq: fetching and pulling " Shourya Shukla
  3 siblings, 1 reply; 16+ messages in thread
From: Shourya Shukla @ 2020-04-21 13:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sandals, Shourya Shukla

Add issue in 'Common Issues' section which addresses the problem of
changing the remote of a repository, covering various cases in which
one might want to change the remote and the ways to do the same.

Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 Documentation/gitfaq.txt | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 96767e7c75..13d37f96af 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -244,6 +244,37 @@ I asked Git to ignore various files, yet they are still tracked::
 	category, it is advised to use `git rm --cached <file>` as well as
 	add these files/paths in the `.gitignore`.
 
+[[changing-remote-of-the-repository]]
+I want to change the remote of my repository. How do I do that?::
+	A remote is an identifier for a location to which Git pushes your
+	changes and from which it fetches new changes (if any). There
+	might be different circumstances in which one might need to change
+	the remote:
+
+		1. One might want to update the URL of their remote; in that
+		   case, the command to use is `git remote set-url <name> <newurl>`.
+
+		2. One might want to have two different remotes for fetching
+		   and pushing; this generally happens in case of triangular
+		   workflows: one fetches from one repository and pushes to
+		   another. In this case, it is advisable to have separate
+		   remotes for fetching and pushing. Another way is to
+		   change the push URL using the `--push` option of the
+		   `git remote set-url` command.
+
+		3. One might want to push over a network protocol
+		   different from the one they fetch over. For instance,
+		   one may use an unauthenticated http:// URL for
+		   fetching from a repository and an ssh:// URL when
+		   pushing via the same remote. In such a case, one can
+		   change the 'push' URL of the same remote using the `--push`
+		   option of `git remote set-url`. Now, the same remote will
+		   have two different kinds of URLs (http and ssh) for fetching
+		   and pushing.
++
+One can list the remotes of a repository using `git remote -v` command.
+The default name of a remote is 'origin'.
+
 Hooks
 -----
 
-- 
2.20.1



* [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-21 13:12 [PATCH v3 0/4] gitfaq: add issues in the 'Common Issues' section Shourya Shukla
  2020-04-21 13:12 ` [PATCH v3 1/4] gitfaq: files in .gitignore are tracked Shourya Shukla
  2020-04-21 13:12 ` [PATCH v3 2/4] gitfaq: changing the remote of a repository Shourya Shukla
@ 2020-04-21 13:12 ` Shourya Shukla
  2020-04-21 20:00   ` Junio C Hamano
  2020-04-21 13:12 ` [PATCH v3 4/4] gitfaq: fetching and pulling " Shourya Shukla
  3 siblings, 1 reply; 16+ messages in thread
From: Shourya Shukla @ 2020-04-21 13:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sandals, Shourya Shukla

Add issue in 'Common issue' section which covers issues with cloning
large repositories. Use shallow cloning to clone the repository in
a smaller size.

Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 Documentation/gitfaq.txt | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 13d37f96af..cea293cf07 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -275,6 +275,20 @@ I want to change the remote of my repository. How do I do that?::
 One can list the remotes of a repository using `git remote -v` command.
 The default name of a remote is 'origin'.
 
+[[shallow-cloning]]
+The repository I am trying to clone is too big. Is there an alternative
+way of cloning it in lesser space?::
+	One can clone a repository having a truncated history, meaning the
+	history will span up to a specified number of commits instead of
+	the whole history of the repository. This is called 'Shallow Cloning'.
+	This helps to decrease the space taken up by the repository.
+	Shallow cloning can be done by using the `--depth` option
+	while cloning. Therefore, the command would look like:
+	`git clone --depth <n> <url>`.
+	Here, 'n' is the depth of the clone. For example, a depth of 1
+	would mean fetching only the top level commits of the repository
+	See linkgit:git-clone[1].
+
 Hooks
 -----
 
-- 
2.20.1



* [PATCH v3 4/4] gitfaq: fetching and pulling a repository
  2020-04-21 13:12 [PATCH v3 0/4] gitfaq: add issues in the 'Common Issues' section Shourya Shukla
                   ` (2 preceding siblings ...)
  2020-04-21 13:12 ` [PATCH v3 3/4] gitfaq: shallow cloning " Shourya Shukla
@ 2020-04-21 13:12 ` Shourya Shukla
  3 siblings, 0 replies; 16+ messages in thread
From: Shourya Shukla @ 2020-04-21 13:12 UTC (permalink / raw)
  To: git; +Cc: gitster, sandals, Shourya Shukla

Add an issue in 'Common Issues' section which addresses the confusion
between performing a 'fetch' and a 'pull'.

Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
 Documentation/gitfaq.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index cea293cf07..e93785f2f8 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -289,6 +289,26 @@ way of cloning it in lesser space?::
 	would mean fetching only the top level commits of the repository
 	See linkgit:git-clone[1].
 
+[[fetching-and-pulling]]
+How do I know if I want to do a fetch or a pull?::
+	A fetch brings in the latest changes made upstream (i.e., in the
+	remote repository we are working with). This allows us to inspect
+	the changes made upstream and integrate all those changes (if
+	and only if we want to) or only cherry-pick certain changes.
+	Fetching does not have any immediate effect on the local
+	branches or the working tree.
+
+	A pull is a wrapper for a fetch and a merge. This means that doing
+	a `git pull` will not only fetch the changes made upstream but
+	integrate them as well with our local repository. The merge may
+	go smoothly or have merge conflicts depending on the case. A pull
+	does not give you a chance to review the changes made upstream
+	before they are merged.
++
+This is the reason why it is sometimes advised to fetch the changes
+first and then merge them accordingly, because not every change might
+be of utility to the user.
+
 Hooks
 -----
 
-- 
2.20.1
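
The difference between a fetch and a pull described in this patch can be
tried out in a throwaway setup. What follows is only a sketch under stated
assumptions, not part of the patch: the directory names `upstream` and
`local`, the helper function `commit`, and the commit identity are all
made up for illustration, and the integration step is shown as a
fast-forward merge.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A stand-in "upstream" repository with one commit (names are hypothetical).
git init -q upstream
commit() {
    git -C upstream -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m "$1"
}
commit "first"

git clone -q "file://$work/upstream" local
branch=$(git -C local symbolic-ref --short HEAD)

# Upstream moves ahead by one commit.
commit "second"

# Fetch: only the remote-tracking branch moves, the local branch stays put.
git -C local fetch -q origin
local_after_fetch=$(git -C local rev-list --count HEAD)
remote_after_fetch=$(git -C local rev-list --count "origin/$branch")

# Inspect what came in before integrating it.
git -C local log --oneline "HEAD..origin/$branch"

# Integrate explicitly; a pull would have done the fetch and this step at once.
git -C local merge -q --ff-only "origin/$branch"
local_after_merge=$(git -C local rev-list --count HEAD)
```

Between the fetch and the merge, the new commit is visible only through
`origin/<branch>`, which is the window in which it can be reviewed.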



* Re: [PATCH v3 1/4] gitfaq: files in .gitignore are tracked
  2020-04-21 13:12 ` [PATCH v3 1/4] gitfaq: files in .gitignore are tracked Shourya Shukla
@ 2020-04-21 19:45   ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-04-21 19:45 UTC (permalink / raw)
  To: Shourya Shukla; +Cc: git, sandals

Shourya Shukla <shouryashukla.oo@gmail.com> writes:

> Add issue in 'Common Issues' section which addresses the problem of
> Git tracking files/paths mentioned in '.gitignore'.

I do not think this much text is warranted in this file.

The first part of Documentation/gitignore.txt *ought* to cover this
material and it does say "specifies intentionally untracked files"
and "already tracked by Git are not affected".  Read that paragraph
twice, and then jump to the NOTES section it refers to and also
read that twice.  Then let's work on polishing these places if there
is anything unclear.  I think what we have there is clear enough.

And then trim the text we see here down.  The way the question is
phrased may be good as-is (I trust that you researched to make sure
that is how the question is most frequently phrased).  The answer
should be just "see gitignore(5)", or perhaps repeat the first
paragraph of gitignore(5) and then refer to the page, i.e. no more
than

    [[files-in-.gitignore-are-tracked]]
    I asked Git to ignore various files, yet they are still tracked::
            A `gitignore` file specifies intentionally untracked files
            that Git should ignore.  Files already tracked by Git are
            not affected.  See linkgit:gitignore[5] for details.

should be in the FAQ file.
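
The behaviour summarised above (ignore rules do not affect files that are
already tracked) can be reproduced in a scratch repository. This is a
minimal sketch and not text from gitignore(5) or from the patch: the file
name `config.local`, the commit identity and the temporary directory are
assumptions chosen for illustration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Track a file first, then add a matching pattern to .gitignore.
echo "secret" > config.local
git add config.local
git -c user.name=Example -c user.email=example@example.com \
    commit -q -m "track config.local"
echo "config.local" > .gitignore
echo "changed" >> config.local

# The file is still tracked, and its modification is still reported.
tracked_before=$(git ls-files config.local)
status_before=$(git status --porcelain config.local)

# Drop it from the index only; the copy on disk is kept.
git rm -q --cached config.local
tracked_after=$(git ls-files config.local)

# Now that it is untracked, the .gitignore pattern applies to it.
ignored=$(git check-ignore config.local)
```

After `git rm --cached`, the removal still has to be committed for the
file to disappear from the project's history going forward.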


* Re: [PATCH v3 2/4] gitfaq: changing the remote of a repository
  2020-04-21 13:12 ` [PATCH v3 2/4] gitfaq: changing the remote of a repository Shourya Shukla
@ 2020-04-21 19:54   ` Junio C Hamano
  2020-04-27 17:30     ` Shourya Shukla
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-04-21 19:54 UTC (permalink / raw)
  To: Shourya Shukla; +Cc: git, sandals

Shourya Shukla <shouryashukla.oo@gmail.com> writes:

> Add issue in 'Common Issues' section which addresses the problem of
> changing the remote of a repository, covering various cases in which
> one might want to change the remote and the ways to do the same.
>
> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> ---
>  Documentation/gitfaq.txt | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)

Again, I think this belongs in Documentation/git-remote.txt; unlike
the ".gitignore" one, however, the existing description is heavily
concentrated on "what happens when X is set to Y?" and does not
say much about "why would I want to set X to Y in the first place?".
And the text below you have is a good thing to teach anybody
who learns "git-remote".

So how about clarifying the existing page, perhaps its DISCUSSION
section (which currently talks only about "how to add a remote, and
configure" without discussing "why would I want to add a remote, set
a URL and/or a pushURL to it") with what you have, and trim the
description here in the FAQ file to the minimum and refer to the
page instead?

> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> index 96767e7c75..13d37f96af 100644
> --- a/Documentation/gitfaq.txt
> +++ b/Documentation/gitfaq.txt
> @@ -244,6 +244,37 @@ I asked Git to ignore various files, yet they are still tracked::
>  	category, it is advised to use `git rm --cached <file>` as well as
>  	add these files/paths in the `.gitignore`.
>  
> +[[changing-remote-of-the-repository]]
> +I want to change the remote of my repository. How do I do that?::
> +	A remote is an identifier for a location to which Git pushes your
> +	changes and from which it fetches new changes (if any). There
> +	might be different circumstances in which one might need to change
> +	the remote:
> +
> +		1. One might want to update the URL of their remote; in that
> +		   case, the command to use is `git remote set-url <name> <newurl>`.
> +
> +		2. One might want to have two different remotes for fetching
> +		   and pushing; this generally happens in case of triangular
> +		   workflows: one fetches from one repository and pushes to
> +		   another. In this case, it is advisable to have separate
> +		   remotes for fetching and pushing. Another way is to
> +		   change the push URL using the `--push` option of the
> +		   `git remote set-url` command.
> +
> +		3. One might want to push over a network protocol
> +		   different from the one they fetch over. For instance,
> +		   one may use an unauthenticated http:// URL for
> +		   fetching from a repository and an ssh:// URL when
> +		   pushing via the same remote. In such a case, one can
> +		   change the 'push' URL of the same remote using the `--push`
> +		   option of `git remote set-url`. Now, the same remote will
> +		   have two different kinds of URLs (http and ssh) for fetching
> +		   and pushing.
> ++
> +One can list the remotes of a repository using `git remote -v` command.
> +The default name of a remote is 'origin'.
> +
>  Hooks
>  -----
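
The `set-url` cases listed in the quoted text can be sketched as below.
This is an illustration under assumptions, not wording proposed for
git-remote(1): the URLs on example.com and example.org are placeholders,
and the remote is added by hand to an empty scratch repository so that
nothing is contacted over the network.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Start with a single URL used for both fetching and pushing.
git remote add origin https://example.com/project.git

# Case 1: the repository moved, so update the URL of the remote.
git remote set-url origin https://example.org/project.git

# Case 3: keep fetching over https but push over ssh via the same remote.
git remote set-url --push origin ssh://git@example.org/project.git

fetch_url=$(git remote get-url origin)
push_url=$(git remote get-url --push origin)

# Lists one (fetch) line and one (push) line for the remote.
git remote -v
```

For a triangular workflow (case 2), a second remote added with
`git remote add` keeps the two repositories apart under separate names.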


* Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-21 13:12 ` [PATCH v3 3/4] gitfaq: shallow cloning " Shourya Shukla
@ 2020-04-21 20:00   ` Junio C Hamano
  2020-04-21 20:43     ` Randall S. Becker
  2020-04-22  0:13     ` Elijah Newren
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-04-21 20:00 UTC (permalink / raw)
  To: Shourya Shukla
  Cc: git, sandals, Derrick Stolee, Elijah Newren, Christian Couder

Shourya Shukla <shouryashukla.oo@gmail.com> writes:

> Add issue in 'Common issue' section which covers issues with cloning
> large repositories. Use shallow cloning to clone the repository in
> a smaller size.
>
> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> ---
>  Documentation/gitfaq.txt | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> index 13d37f96af..cea293cf07 100644
> --- a/Documentation/gitfaq.txt
> +++ b/Documentation/gitfaq.txt
> @@ -275,6 +275,20 @@ I want to change the remote of my repository. How do I do that?::
>  One can list the remotes of a repository using `git remote -v` command.
>  The default name of a remote is 'origin'.
>  
> +[[shallow-cloning]]
> +The repository I am trying to clone is too big. Is there an alternative
> +way of cloning it in lesser space?::
> +	One can clone a repository having a truncated history, meaning the
> +	history will span up to a specified number of commits instead of
> +	the whole history of the repository. This is called 'Shallow Cloning'.
> ...

The question is worth keeping but the answer is questionable.

I have a feeling that --depth/shallow is deprecated/frowned upon
these days and more people recommend partial/blob-less clones
instead (a few random people added to Cc: to see if they want to say
something here).  
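
For reference, a blob-less partial clone as mentioned above can be
sketched like this. It is only an illustration under assumptions: the
source repository `src` and its file are made up, a reasonably recent Git
is assumed, and `uploadpack.allowFilter` is enabled on the source because
a server has to permit filtering before `--filter` takes effect.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A small stand-in for the "too big" repository (hypothetical content).
git init -q src
git -C src config uploadpack.allowFilter true
for n in 1 2 3; do
    echo "version $n" > src/file.txt
    git -C src add file.txt
    git -C src -c user.name=Example -c user.email=example@example.com \
        commit -q -m "commit $n"
done

# Blob-less clone: all commits and trees, file contents fetched on demand.
git clone -q --no-checkout --filter=blob:none "file://$work/src" partial

commit_count=$(git -C partial rev-list --count HEAD)
is_shallow=$(git -C partial rev-parse --is-shallow-repository)
filter=$(git -C partial config remote.origin.partialclonefilter)
```

Unlike a shallow clone, the full commit history is present, so commands
that walk history keep working; only the file contents are deferred.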


* RE: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-21 20:00   ` Junio C Hamano
@ 2020-04-21 20:43     ` Randall S. Becker
  2020-04-21 20:57       ` Junio C Hamano
  2020-04-22  1:30       ` Derrick Stolee
  2020-04-22  0:13     ` Elijah Newren
  1 sibling, 2 replies; 16+ messages in thread
From: Randall S. Becker @ 2020-04-21 20:43 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Shourya Shukla'
  Cc: git, sandals, 'Derrick Stolee', 'Elijah Newren',
	'Christian Couder'

On April 21, 2020 4:01 PM, Junio C Hamano
> Subject: Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
> 
> Shourya Shukla <shouryashukla.oo@gmail.com> writes:
> 
> > Add issue in 'Common issue' section which covers issues with cloning
> > large repositories. Use shallow cloning to clone the repository in a
> > smaller size.
> >
> > Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> > ---
> >  Documentation/gitfaq.txt | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> > index 13d37f96af..cea293cf07 100644
> > --- a/Documentation/gitfaq.txt
> > +++ b/Documentation/gitfaq.txt
> > @@ -275,6 +275,20 @@ I want to change the remote of my repository. How do I do that?::
> >  One can list the remotes of a repository using `git remote -v` command.
> >  The default name of a remote is 'origin'.
> >
> > +[[shallow-cloning]]
> > +The repository I am trying to clone is too big. Is there an
> > +alternative way of cloning it in lesser space?::
> > +	One can clone a repository having a truncated history, meaning the
> > +	history will span up to a specified number of commits instead of
> > +	the whole history of the repository. This is called 'Shallow Cloning'.
> > ...
> 
> The question is worth keeping but the answer is questionable.
> 
> I have a feeling that --depth/shallow is deprecated/frowned upon these
> days and more people recommend partial/blob-less clones instead (a few
> random people added to Cc: to see if they want to say something here).

I rather hate to chime in as a dissenting opinion, but the --depth/shallow
clone is very useful when git is being used as an artifact repository for
production. The shallow clone allows only the production branch HEAD to be
cloned into production/staging areas and limits the visible history for
staff who do not want to go through a potentially long trail during
time-sensitive operations (a.k.a. production installs). There are also space
and policy constraints in some of these environments where they do not want
to have ongoing visibility to non-production commit paths. When the *stuff*
hits the fan, then it's good to be able to fetch everything (or a limited
set). I would be very disappointed to see --depth frowned upon.

Regards,
Randall

-- Brief whoami:
 NonStop developer since approximately 211288444200000000
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.
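
The workflow described in this message (clone only the tip for a
production area, deepen later if needed) can be sketched as follows. This
is an illustration under assumptions rather than an actual deployment
setup: the repository `src`, its three empty commits and the commit
identity are invented for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A stand-in repository with a short history (hypothetical content).
git init -q src
for n in 1 2 3; do
    git -C src -c user.name=Example -c user.email=example@example.com \
        commit -q --allow-empty -m "commit $n"
done

# Shallow clone: only the tip commit of the default branch is transferred.
git clone -q --depth 1 "file://$work/src" shallow
shallow_count=$(git -C shallow rev-list --count HEAD)
is_shallow=$(git -C shallow rev-parse --is-shallow-repository)

# When the full history is needed after all, deepen the clone in place.
git -C shallow fetch -q --unshallow
full_count=$(git -C shallow rev-list --count HEAD)
is_shallow_after=$(git -C shallow rev-parse --is-shallow-repository)
```

The `file://` URL matters in this sketch: for a plain local path, Git
warns that `--depth` is ignored and makes a complete clone.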





* Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-21 20:43     ` Randall S. Becker
@ 2020-04-21 20:57       ` Junio C Hamano
  2020-04-21 21:25         ` Randall S. Becker
  2020-04-22  1:30       ` Derrick Stolee
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-04-21 20:57 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Shourya Shukla', git, sandals, 'Derrick Stolee',
	'Elijah Newren', 'Christian Couder'

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

>> I have a feeling that --depth/shallow is deprecated/frowned upon these
>> days and more people recommend partial/blob-less clones instead (a few
>> random people added to Cc: to see if they want to say something here).
>
> I rather hate to chime in as a dissenting opinion,...

Oh, don't hate anything.  It is greatly appreciated so that we can
cover "in such and such use case, this solution is good" variants
for similarly-sounding-but-fundamentally-different classes of
problems.  We do not want to give a spinal-reflex answer of "use
shallow" (or "use partial", for that matter) to "too large a repo"
question without contexts that guide the readers to a better choice.

That is where a well-organized FAQ list shines.

Thanks.


* RE: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-21 20:57       ` Junio C Hamano
@ 2020-04-21 21:25         ` Randall S. Becker
  0 siblings, 0 replies; 16+ messages in thread
From: Randall S. Becker @ 2020-04-21 21:25 UTC (permalink / raw)
  To: 'Junio C Hamano'
  Cc: 'Shourya Shukla', git, sandals, 'Derrick Stolee',
	'Elijah Newren', 'Christian Couder'

On April 21, 2020 4:58 PM, Junio C Hamano wrote:
> Subject: Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> 
> >> I have a feeling that --depth/shallow is deprecated/frowned upon
> >> these days and more people recommend partial/blob-less clones instead
> >> (a few random people added to Cc: to see if they want to say something
> >> here).
> >
> > I rather hate to chime in as a dissenting opinion,...
> 
> Oh, don't hate anything.  It is greatly appreciated so that we can cover
> "in such and such use case, this solution is good" variants for
> similarly-sounding-but-fundamentally-different classes of problems.  We
> do not want to give a spinal-reflex answer of "use shallow" (or "use
> partial", for that matter) to "too large a repo" question without
> contexts that guide the readers to a better choice.
> 
> That is where a well-organized FAQ list shines.

I have spoken on this topic and can probably share some of it.



* Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-21 20:00   ` Junio C Hamano
  2020-04-21 20:43     ` Randall S. Becker
@ 2020-04-22  0:13     ` Elijah Newren
  1 sibling, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2020-04-22  0:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shourya Shukla, Git Mailing List, brian m. carlson,
	Derrick Stolee, Christian Couder

On Tue, Apr 21, 2020 at 1:00 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Shourya Shukla <shouryashukla.oo@gmail.com> writes:
>
> > Add issue in 'Common issue' section which covers issues with cloning
> > large repositories. Use shallow cloning to clone the repository in
> > a smaller size.
> >
> > Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> > ---
> >  Documentation/gitfaq.txt | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> > index 13d37f96af..cea293cf07 100644
> > --- a/Documentation/gitfaq.txt
> > +++ b/Documentation/gitfaq.txt
> > @@ -275,6 +275,20 @@ I want to change the remote of my repository. How do I do that?::
> >  One can list the remotes of a repository using `git remote -v` command.
> >  The default name of a remote is 'origin'.
> >
> > +[[shallow-cloning]]
> > +The repository I am trying to clone is too big. Is there an alternative
> > +way of cloning it in lesser space?::
> > +     One can clone a repository having a truncated history, meaning the
> > +     history will span up to a specified number of commits instead of
> > +     the whole history of the repository. This is called 'Shallow Cloning'.
> > ...
>
> The question is worth keeping but the answer is questionable.
>
> I have a feeling that --depth/shallow is deprecated/frowned upon
> these days and more people recommend partial/blob-less clones
> instead (a few random people added to Cc: to see if they want to say
> something here).

I don't have a problem with us saying we have to support shallow
clones for years or decades more, but I personally strongly dislike
advertising it, for multiple reasons:

* From an internal perspective: The shallow clone implementation feels
like a hack that isn't extensible and doesn't work with other
features, and as far as I can tell that's intrinsic to its design.

* From an end-user perspective: Shallow clones are heavily misused,
oversold, and induce or perpetuate various misunderstandings.  CI
systems seem especially keen on turning on shallow clones whether
requested or not and breaking all sorts of things from the simple (like
'git describe') to the more complex (like merge this branch with
master and run tests there too to avoid breaks due to semantic
conflicts) and all sorts of things in between (e.g. when trying to
'debug with SSH' the user can't look around in the repo because it's
all missing).  The stated reason for the huge waste of time projects
induce by defaulting this on, and sometimes making it hard to turn
off, is to 'save space', and they often sell it as a dramatic saving.
But if it's a standard source code repository then usually you save
only about 50% of the overall download size. (Years ago, I used to
like to point out
that a git clone of a repo would only be marginally bigger than a svn
checkout, despite having 'all history', and had a handful of repos
where I had measured the cost to back it up).  The way CI folks talk
about shallow clone makes people assume that 'all history' is
hundreds/thousands of times bigger than the most recent checkout.  The
only case where 'saving size' seems to matter is either the special
simple cases that have really simple builds (though they tend to be
small enough that the size doesn't matter anyway) or for repos that
have accidentally committed huge files in their history that are no
longer present in new versions.  But because shallow clones are touted
so much, people come to perceive the cost of 'all history' as being a
very onerous requirement in git.  And the perception seems to be
sticking in lots of places.  I can sometimes go dig out facts for a
repository in question to show people the differences in sizes and
dispel some of this, but that's a one-by-one case.  I think these
misunderstandings hurt us as a community.

* Diversion of resources: Even though there are current valid usecases
for shallow clones (e.g. Randall sounds like he has some), advertising
this feature is going to make it harder for us to focus efforts on the
better designed solutions we want to implement and extend.  Perhaps a
funny story is in order: At $FORMER_JOB, we made software used by
various groups on supercomputers (or high performance computers, or
however you want to refer to that class of many machines).  One
customer requested support on Itanium machines, and we made the
necessary (though painful) adjustments.  At some point we decide to
list our supported platforms on the DVDs we sent out.  Then at some
point, the Navy decides they're going to buy some nice
supercomputer(s).  They want to use our software, but also want to use
general well-supported industry standard hardware.  They put out a
purchase order for $100 million (I don't remember the real number but
it was large), and overlooked specifying the computer architecture.
Vendors who were just about to retire the very last Itanium chips and
were literally going to just scrap the rest of their inventory notice
this purchase order, bid on the procurement at dirt cheap prices, and
then the Navy is stuck because of "don't waste taxpayer dollars" and
"procurement has to be fair".  They need those machines to work for
several years.  Anyone who provides them software has to support that
architecture for several more years, but the vendors would not sell
any more Itanium machines after that even if you begged, so we were
working with some really old Itanium machines that didn't have enough
power to run the basic regression test in under 24 hours.  The last
sysadmin at $FORMER_JOB with the necessary qualifications to actually
maintain those systems (not just knowledge but red tape box checking
too; this was government after all) was retiring about a year and a
half before the mandatory support period ended for us as well.  We
found out at some point that they checked our requirements before
putting out the purchase order; had our DVDs only advertised support
for x86_64, the whole debacle could have been avoided.

Yes we totally need to support shallow clones (I brought them up as a
concern for fetch.writeCommitGraphs just last week after all), but I
really don't want to advertise them, and if we need to in some way,
then minimize it.

Anyway, that's my $0.02.

Elijah


* Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-21 20:43     ` Randall S. Becker
  2020-04-21 20:57       ` Junio C Hamano
@ 2020-04-22  1:30       ` Derrick Stolee
  2020-04-22  4:00         ` Jonathan Nieder
  1 sibling, 1 reply; 16+ messages in thread
From: Derrick Stolee @ 2020-04-22  1:30 UTC (permalink / raw)
  To: Randall S. Becker, 'Junio C Hamano',
	'Shourya Shukla'
  Cc: git, sandals, 'Derrick Stolee', 'Elijah Newren',
	'Christian Couder'

On 4/21/2020 4:43 PM, Randall S. Becker wrote:
> On April 21, 2020 4:01 PM, Junio C Hamano
>> Subject: Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
>>
>> Shourya Shukla <shouryashukla.oo@gmail.com> writes:
>>
>>> Add issue in 'Common issue' section which covers issues with cloning
>>> large repositories. Use shallow cloning to clone the repository in a
>>> smaller size.
>>>
>>> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
>>> ---
>>>  Documentation/gitfaq.txt | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt index
>>> 13d37f96af..cea293cf07 100644
>>> --- a/Documentation/gitfaq.txt
>>> +++ b/Documentation/gitfaq.txt
>>> @@ -275,6 +275,20 @@ I want to change the remote of my repository.
>> How do I do that?::
>>>  One can list the remotes of a repository using `git remote -v` command.
>>>  The default name of a remote is 'origin'.
>>>
>>> +[[shallow-cloning]]
>>> +The repository I am trying to clone is too big. Is there an
>>> +alternative way of cloning it in lesser space?::
>>> +	One can clone a repository having a truncated history, meaning the
>>> +	history	will span upto a specified number of commits instead of
>>> +	the whole history of the repository. This is called 'Shallow Cloning'.
>>> ...
>>
>> The question is worth keeping but the answer is questionable.
>>
>> I have a feeling that --depth/shallow is deprecated/frowned upon these
>> days and more people recommend partial/blob-less clones instead (a few
>> random people added to Cc: to see if they want to say something here).
> 
> I rather hate to chime in as a dissenting opinion, but the --depth/shallow
> clone is very useful when git is being used as an artifact repository for
> production. 

It is important, then, to mention what the _real_ uses for shallow clones are.

They are great for getting just the working directory at tip for a throwaway
action (like building an artifact, or just taking a static copy of something)
but it is a _terrible_ way to start working on source code for a project that
you intend to use for daily work.

The way this is worded in the FAQ will lead users to have a bad experience
and we should recommend partial clone (--filter=blob:none) instead.

Of course, with the speedups from reachability bitmaps, it is sometimes
_faster_ to do a partial clone than a shallow clone. (It definitely takes
less time in the "counting objects" phase, and the cost of downloading
all commits and trees might be small enough on top of the necessary blob
data to keep the total cost under a shallow clone. Your mileage may vary.)
Because the cost of a partial clone is "comparable" to shallow clone, I
would almost recommend partial clone over shallow clones 95% of the time,
even in scenarios like automated builds on cloud-hosted VMs.
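
As a rough illustration of the two kinds of clone compared above (a
sketch only, not part of the patch or the FAQ text): the script below
builds a throwaway local repository with three commits so that it is
self-contained. The temporary paths, the commit messages, and the two
`uploadpack.*` settings on the source repository are assumptions made
only so that a local `file://` clone accepts `--filter` and can fetch
blobs on demand.

```shell
#!/bin/sh
set -e

# Throwaway source repository with three commits (illustrative only).
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.name "Example"
git config user.email "example@example.com"
# Assumption: let clones of this local repository use --filter and
# fetch individual objects later.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
for i in 1 2 3; do
    echo "$i" >file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

cd "$work"
# Shallow clone: history truncated to the tip commit.
git clone -q --depth=1 "file://$work/src" shallow
# Partial (blob-less) clone: all commits and trees, blobs on demand.
git clone -q --filter=blob:none "file://$work/src" partial

shallow_commits=$(git -C shallow rev-list --count HEAD)
partial_commits=$(git -C partial rev-list --count HEAD)
echo "shallow: $shallow_commits commit(s), partial: $partial_commits commit(s)"
```

The shallow clone sees only the tip commit, while the blob-less
partial clone keeps the full commit history and downloads file
contents only when they are needed.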

Thanks,
-Stolee


* Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  2020-04-22  1:30       ` Derrick Stolee
@ 2020-04-22  4:00         ` Jonathan Nieder
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Nieder @ 2020-04-22  4:00 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Randall S. Becker, 'Junio C Hamano',
	'Shourya Shukla', git, sandals, 'Derrick Stolee',
	'Elijah Newren', 'Christian Couder'

Derrick Stolee wrote:

> Of course, with the speedups from reachability bitmaps, it is sometimes
> _faster_ to do a partial clone than a shallow clone. (It definitely takes
> less time in the "counting objects" phase, and the cost of downloading
> all commits and trees might be small enough on top of the necessary blob
> data to keep the total cost under a shallow clone. Your mileage may vary.)
> Because the cost of a partial clone is "comparable" to shallow clone, I
> would almost recommend partial clone over shallow clones 95% of the time,
> even in scenarios like automated builds on cloud-hosted VMs.

By the way, an idea for the interested (#leftoverbits?):

It would be possible to emulate the shallow clone experience making
use of the partial clone protocol.  That is, fetch a full history
without blobs but record the "shallows" somewhere and make user-facing
traversals like "git log" stop there (similar to the effect "git
replace" has on user-facing traversals).  Then later fetches would be
able to take advantage of the full commit history, but scripts and
muscle memory (e.g., the assumption that most commands will never
contact the remote) that assume a shallow clone would continue to
work.

Would that be useful or interesting to people?

Thanks,
Jonathan


* Re: [PATCH v3 2/4] gitfaq: changing the remote of a repository
  2020-04-21 19:54   ` Junio C Hamano
@ 2020-04-27 17:30     ` Shourya Shukla
  0 siblings, 0 replies; 16+ messages in thread
From: Shourya Shukla @ 2020-04-27 17:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sandals

On 21/04 12:54, Junio C Hamano wrote:
> Shourya Shukla <shouryashukla.oo@gmail.com> writes:
> 
> > Add issue in 'Common Issues' section which addresses the problem of
> > changing the remote of a repository, covering various cases in which
> > one might want to change the remote and the ways to do the same.
> >
> > Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> > ---
> >  Documentation/gitfaq.txt | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> 
> Again, I think this belongs to Documentation/git-remote.txt; unlike
> the ".gitignore" one, however, the existing description is heavily
> concentrated on "what happens when X is set to Y?" and does not
> answer "why would I want to set X to Y in the first place?" very
> much.  And the text below you have is a good thing to teach anybody
> who learns "git-remote".  
> 
> So how about clarifying the existing page, perhaps its DISCUSSION
> section (which currently talks only about "how to add a remote, and
> configure" without discussing "why would I want to add a remote, set
> a URL and/or a pushURL to it") with what you have, and trim the
> description here in the FAQ file to the minimum and refer to the
> page instead?

Yep, that seems reasonable. So a good strategy would be to extend the
'DISCUSSION' section with what I have added in the FAQ, and to keep in
the FAQ only a couple of lines quoted from the documentation that
provide the solution, plus a reference to the documentation, right?

	A remote is an identifier for a location to which Git pushes
	your changes and from which it fetches new changes (if any).

	To change the remote of your repository, you may want to
	execute:
		git remote set-url <name> <newurl>

Something along the above lines? I think that for a typical user this
will mostly solve their problem, without needing the '--push' option
to specify a different push URL.
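
For reference, a minimal sketch of both variants in a scratch
repository. The remote name 'origin' and the example.com URLs are
placeholders chosen for illustration; they are not taken from the
patch.

```shell
#!/bin/sh
set -e

# Scratch repository with a remote pointing at a placeholder URL.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git remote add origin https://example.com/old.git

# Common case: point the remote at a new URL.
git remote set-url origin https://example.com/new.git

# Less common case: use a different URL for pushing only.
git remote set-url --push origin https://example.com/push.git

fetch_url=$(git remote get-url origin)
push_url=$(git remote get-url --push origin)

# List the remotes with their fetch and push URLs.
git remote -v
```

After the second `set-url`, fetches still use the new URL while pushes
go to the separate push URL.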

