git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] gc docs: modernize and fix the documentation
@ 2019-03-18 16:14 Ævar Arnfjörð Bjarmason
  2019-03-18 16:14 ` [PATCH 1/4] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
                   ` (15 more replies)
  0 siblings, 16 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 16:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Ævar Arnfjörð Bjarmason

I've been annoyed by the "gc" docs for a while. This fixes most of the
issues I've noticed, and also removes the duplication between the "gc"
variables in git-config(1) and in git-gc(1), which was made possible
by Duy's Documentation/config/* series.

This should make the "gc" docs more awesome, and due to removing the
de-duplication results in a net deletion of lines. Yay.

The only thing I was on the fence about was removing the 'gc
"refs/remotes/*' config example, but I think the remaining docs
explain it well enough. It can be added back if someone insists...

Now, I was originally going to have 5 patches in this series by
modernizing the "NOTES" section, but that didn't make it in.

This series is unrelated (and does not conflict with) my in-flight gc
contention series
(https://public-inbox.org/git/20190315155959.12390-1-avarab@gmail.com/),
but the "git-gc" docs should be updated to discuss the
core.filesRefLockTimeout option and how it impacts contention, see 8/8
in that series for context. I.e. "you may have contention, but
core.filesRefLockTimeout can mitigate blah blah".

I was going to do that, but then thought that we should also mention
that on the server-side we mitigate most/all of the contention via the
quarantine, see "QUARANTINE ENVIRONMENT" in
git-receive-pack(1). I.e. we:

 1. Get the temp pack
 2. OK it (fsck, hooks etc.)
 3. Move *complete* previously temp packs over
 4. Update the refs

I.e. we are immune from the "concurrently with another process" race,
but of course something concurrently updating the "server" repo
without a quarantine environment may be subject to that race.

The only problem is that the last couple of paragraphs may be
wrong. That's just my understanding from a brief reading of
722ff7f876c ("receive-pack: quarantine objects until pre-receive
accepts", 2016-10-03) so I didn't want to include that in this
series. Peff (or others), any comments?

Ævar Arnfjörð Bjarmason (4):
  gc docs: modernize the advice for manually running "gc"
  gc docs: include the "gc.*" section from "config" in "gc"
  gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
  gc docs: downplay the usefulness of --aggressive

 Documentation/config/gc.txt |  39 ++++++++++--
 Documentation/git-gc.txt    | 120 +++++++++---------------------------
 2 files changed, 65 insertions(+), 94 deletions(-)

-- 
2.21.0.360.g471c308f928


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH 1/4] gc docs: modernize the advice for manually running "gc"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
@ 2019-03-18 16:14 ` Ævar Arnfjörð Bjarmason
  2019-03-18 21:27   ` Jeff King
  2019-03-18 16:15 ` [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 16:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Ævar Arnfjörð Bjarmason

The docs have been recommending that users need to run this manually,
but that hasn't been needed in practice for a long time.

Let's instead have this reflect reality and say that most users don't
need to run this manually at all.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index a7c1b0f60ed..cc82971022e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -20,13 +20,17 @@ created from prior invocations of 'git add', packing refs, pruning
 reflog, rerere metadata or stale working trees. May also update ancillary
 indexes such as the commit-graph.
 
-Users are encouraged to run this task on a regular basis within
-each repository to maintain good disk space utilization and good
-operating performance.
+Most users should not have to run this command manually. When common
+porcelain operations that create objects are run, such as
+linkgit:git-commit[1] and linkgit:git-fetch[1], `git gc --auto` will
+be run automatically.
 
-Some git commands may automatically run 'git gc'; see the `--auto` flag
-below for details. If you know what you're doing and all you want is to
-disable this behavior permanently without further considerations, just do:
+You should only need to run `git gc` manually when adding objects to a
+repository without regularly running such porcelain commands. Another
+use-case is wanting to do a one-off repository optimization.
+
+If you know what you're doing and all you want is to disable automatic
+runs, do:
 
 ----------------------
 $ git config --global gc.auto 0
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
  2019-03-18 16:14 ` [PATCH 1/4] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-18 16:15 ` Ævar Arnfjörð Bjarmason
  2019-03-18 21:31   ` Jeff King
  2019-03-19  2:08   ` Duy Nguyen
  2019-03-18 16:15 ` [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION" Ævar Arnfjörð Bjarmason
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 16:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Ævar Arnfjörð Bjarmason

Rather than duplicating the documentation for the various "gc" options
let's include the "gc" docs from git-config. They were mostly better
already, and now we don't have the same docs in two places with subtly
different wording.

In the cases where the git-gc(1) docs were saying something the "gc"
docs in git-config(1) didn't cover move the relevant section over to
the git-config(1) docs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 12 +++++++
 Documentation/git-gc.txt    | 65 +++----------------------------------
 2 files changed, 16 insertions(+), 61 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index c6fbb8a96f9..a834a801cd6 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -2,11 +2,17 @@ gc.aggressiveDepth::
 	The depth parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 50.
++
+See the documentation for the `--depth` option in
+linkgit:git-repack[1] for more details.
 
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 250.
++
+See the documentation for the `--window` option in
+linkgit:git-repack[1] for more details.
 
 gc.auto::
 	When there are approximately more than this many loose
@@ -94,6 +100,12 @@ gc.<pattern>.reflogExpireUnreachable::
 	With "<pattern>" (e.g. "refs/stash")
 	in the middle, the setting applies only to the refs that
 	match the <pattern>.
++
+These types of entries are generally created as a result of using `git
+commit --amend` or `git rebase` and are the commits prior to the amend
+or rebase occurring. Since these changes are not part of the current
+project history most users will want to expire them sooner, which is
+why the default is more aggressive than `gc.reflogExpire`.
 
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index cc82971022e..9edf4e465b4 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -29,8 +29,7 @@ You should only need to run `git gc` manually when adding objects to a
 repository without regularly running such porcelain commands. Another
 use-case is wanting to do a one-off repository optimization.
 
-If you know what you're doing and all you want is to disable automatic
-runs, do:
+If you know what you're doing and want to disable automatic runs, do:
 
 ----------------------
 $ git config --global gc.auto 0
@@ -103,66 +102,10 @@ be performed as well.
 CONFIGURATION
 -------------
 
-The optional configuration variable `gc.reflogExpire` can be
-set to indicate how long historical entries within each branch's
-reflog should remain available in this repository.  The setting is
-expressed as a length of time, for example '90 days' or '3 months'.
-It defaults to '90 days'.
-
-The optional configuration variable `gc.reflogExpireUnreachable`
-can be set to indicate how long historical reflog entries which
-are not part of the current branch should remain available in
-this repository.  These types of entries are generally created as
-a result of using `git commit --amend` or `git rebase` and are the
-commits prior to the amend or rebase occurring.  Since these changes
-are not part of the current project most users will want to expire
-them sooner.  This option defaults to '30 days'.
-
-The above two configuration variables can be given to a pattern.  For
-example, this sets non-default expiry values only to remote-tracking
-branches:
-
-------------
-[gc "refs/remotes/*"]
-	reflogExpire = never
-	reflogExpireUnreachable = 3 days
-------------
-
-The optional configuration variable `gc.rerereResolved` indicates
-how long records of conflicted merge you resolved earlier are
-kept.  This defaults to 60 days.
-
-The optional configuration variable `gc.rerereUnresolved` indicates
-how long records of conflicted merge you have not resolved are
-kept.  This defaults to 15 days.
-
-The optional configuration variable `gc.packRefs` determines if
-'git gc' runs 'git pack-refs'. This can be set to "notbare" to enable
-it within all non-bare repos or it can be set to a boolean value.
-This defaults to true.
-
-The optional configuration variable `gc.writeCommitGraph` determines if
-'git gc' should run 'git commit-graph write'. This can be set to a
-boolean value. This defaults to false.
-
-The optional configuration variable `gc.aggressiveWindow` controls how
-much time is spent optimizing the delta compression of the objects in
-the repository when the --aggressive option is specified.  The larger
-the value, the more time is spent optimizing the delta compression.  See
-the documentation for the --window option in linkgit:git-repack[1] for
-more details.  This defaults to 250.
-
-Similarly, the optional configuration variable `gc.aggressiveDepth`
-controls --depth option in linkgit:git-repack[1]. This defaults to 50.
-
-The optional configuration variable `gc.pruneExpire` controls how old
-the unreferenced loose objects have to be before they are pruned.  The
-default is "2 weeks ago".
-
-Optional configuration variable `gc.worktreePruneExpire` controls how
-old a stale working tree should be before `git worktree prune` deletes
-it. Default is "3 months ago".
+The below documentation is the same as what's found in
+linkgit:git-config[1]:
 
+include::config/gc.txt[]
 
 NOTES
 -----
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
  2019-03-18 16:14 ` [PATCH 1/4] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
  2019-03-18 16:15 ` [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-18 16:15 ` Ævar Arnfjörð Bjarmason
  2019-03-18 21:49   ` Jeff King
  2019-03-18 16:15 ` [PATCH 4/4] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 16:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Ævar Arnfjörð Bjarmason

In an earlier commit I started including the "gc.*" documentation from
git-config(1) in the git-gc(1) documentation. That still left us in a
state where the "--auto" option and "gc.auto" were redundantly
discussing the same thing.

Fix that by briefly discussing how the option itself works for
"--auto", and for the rest referring to the configuration
documentation.

This revealed existing blind spots in the configuration documentation,
move over the documentation and reword as appropriate.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 27 +++++++++++++++++++++++----
 Documentation/git-gc.txt    | 25 ++++---------------------
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index a834a801cd6..605e14bc80b 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -19,13 +19,27 @@ gc.auto::
 	objects in the repository, `git gc --auto` will pack them.
 	Some Porcelain commands use this command to perform a
 	light-weight garbage collection from time to time.  The
-	default value is 6700.  Setting this to 0 disables it.
+	default value is 6700.
++
+Setting this to 0 disables not only automatic packing based on the
+number of loose objects, but any other heuristic `git gc --auto` will
+otherwise use to determine if there's work to do, such as
+`gc.autoPackLimit`.
++
+The repacking of loose objects will be performed with `git repack -d
+-l`.
 
 gc.autoPackLimit::
+
 	When there are more than this many packs that are not
 	marked with `*.keep` file in the repository, `git gc
 	--auto` consolidates them into one larger pack.  The
-	default	value is 50.  Setting this to 0 disables it.
+	default value is 50.  Setting this (or `gc.auto`) to 0
+	disables it. Packs will be consolidated using the `-A` option
+	of `git repack`.
++
+See the `gc.bigPackThreshold` configuration variable below. When in
+use it'll effect how the auto pack limit works.
 
 gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
@@ -35,13 +49,18 @@ gc.bigPackThreshold::
 	If non-zero, all packs larger than this limit are kept when
 	`git gc` is run. This is very similar to `--keep-base-pack`
 	except that all packs that meet the threshold are kept, not
-	just the base pack. Defaults to zero. Common unit suffixes of
-	'k', 'm', or 'g' are supported.
+	just the base pack. Defaults to zero or a memory heuristic.
+	Common unit suffixes of 'k', 'm', or 'g' are supported.
 +
 Note that if the number of kept packs is more than gc.autoPackLimit,
 this configuration variable is ignored, all packs except the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
++
+If the amount of memory is estimated not enough for `git repack` to
+run smoothly and `gc.bigPackThreshold` is not set, the largest pack
+will also be excluded (which is the equivalent of running `git gc`
+with `--keep-base-pack`).
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 9edf4e465b4..154c7c5e652 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -49,29 +49,12 @@ OPTIONS
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
 	required; if not, it exits without performing any work.
-	Some git commands run `git gc --auto` after performing
-	operations that could create many loose objects. Housekeeping
-	is required if there are too many loose objects or too many
-	packs in the repository.
 +
-If the number of loose objects exceeds the value of the `gc.auto`
-configuration variable, then all loose objects are combined into a
-single pack using `git repack -d -l`.  Setting the value of `gc.auto`
-to 0 disables automatic packing of loose objects.
+See the `gc.auto' option in the "CONFIGURATION" below for how this
+heuristic works.
 +
-If the number of packs exceeds the value of `gc.autoPackLimit`,
-then existing packs (except those marked with a `.keep` file
-or over `gc.bigPackThreshold` limit)
-are consolidated into a single pack by using the `-A` option of
-'git repack'.
-If the amount of memory is estimated not enough for `git repack` to
-run smoothly and `gc.bigPackThreshold` is not set, the largest
-pack will also be excluded (this is the equivalent of running `git gc`
-with `--keep-base-pack`).
-Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
-packs.
-+
-If houskeeping is required due to many loose objects or packs, all
+Once housekeeping is triggered by exceeding the limits of
+configurations options such as `gc.auto` and `gc.autoPackLimit`, all
 other housekeeping tasks (e.g. rerere, working trees, reflog...) will
 be performed as well.
 
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH 4/4] gc docs: downplay the usefulness of --aggressive
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2019-03-18 16:15 ` [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION" Ævar Arnfjörð Bjarmason
@ 2019-03-18 16:15 ` Ævar Arnfjörð Bjarmason
  2019-03-18 20:28   ` Jonathan Nieder
  2019-03-19  6:54   ` Johannes Sixt
  2019-03-18 21:51 ` [PATCH 0/4] gc docs: modernize and fix the documentation Jeff King
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 16:15 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Ævar Arnfjörð Bjarmason

The existing "gc --aggressive" docs come just short of recommending to
users that they run it regularly. In reality it's a waste of CPU for
most users, and may even make things actively worse. I've personally
talked to many users who've taken these docs as an advice to use this
option, and have.

Let's change this documentation to better reflect reality, i.e. for
most users using --aggressive is a waste of time, and may even be
actively making things worse.

Let's also clarify the "The effects [...] are persistent" to clearly
note that that's true to the extent that subsequent gc's aren't going
to re-roll existing packs generated with --aggressive into a new set
of packs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 154c7c5e652..d0eaba98db5 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -41,10 +41,20 @@ OPTIONS
 --aggressive::
 	Usually 'git gc' runs very quickly while providing good disk
 	space utilization and performance.  This option will cause
-	'git gc' to more aggressively optimize the repository at the expense
-	of taking much more time.  The effects of this optimization are
-	persistent, so this option only needs to be used occasionally; every
-	few hundred changesets or so.
+	'git gc' to more aggressively optimize the repository to save storage space
+	at the expense of taking much more time.
++
+Using this option may optimize for disk space at the expense of
+runtime performance. See the `--depth` and `--window` documentation in
+linkgit:git-repack[1]. It is not recommended that this option be used
+to improve performance for a given repository without running tailored
+performance benchmarks on it. It may make things better, or worse. Not
+using this at all is the right trade-off for most users and their
+repositories.
++
+The effects of this option are persistent to the extent that
+`gc.autoPackLimit` and friends don't cause a consolidation of existing
+pack(s) generated with this option.
 
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive
  2019-03-18 16:15 ` [PATCH 4/4] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
@ 2019-03-18 20:28   ` Jonathan Nieder
  2019-03-18 21:22     ` Jeff King
  2019-03-19  6:54   ` Johannes Sixt
  1 sibling, 1 reply; 64+ messages in thread
From: Jonathan Nieder @ 2019-03-18 20:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Jeff King, Michael Haggerty, Stefan Beller, Matt McCutchen

Hi,

Ævar Arnfjörð Bjarmason wrote:

> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -41,10 +41,20 @@ OPTIONS
>  --aggressive::
>  	Usually 'git gc' runs very quickly while providing good disk
>  	space utilization and performance.  This option will cause
> -	'git gc' to more aggressively optimize the repository at the expense
> -	of taking much more time.  The effects of this optimization are
> -	persistent, so this option only needs to be used occasionally; every
> -	few hundred changesets or so.
> +	'git gc' to more aggressively optimize the repository to save storage space
> +	at the expense of taking much more time.

This part looks good.

> ++
> +Using this option may optimize for disk space at the expense of
> +runtime performance. See the `--depth` and `--window` documentation in
> +linkgit:git-repack[1]. It is not recommended that this option be used
> +to improve performance for a given repository without running tailored
> +performance benchmarks on it. It may make things better, or worse. Not
> +using this at all is the right trade-off for most users and their
> +repositories.

This part kind of feels like giving up.  Can we make --aggressive have
good runtime read performance so we don't have to hedge this way?
E.g. is this patch papering over a poor choice of --depth setting in
--aggressive?

> ++
> +The effects of this option are persistent to the extent that
> +`gc.autoPackLimit` and friends don't cause a consolidation of existing
> +pack(s) generated with this option.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive
  2019-03-18 20:28   ` Jonathan Nieder
@ 2019-03-18 21:22     ` Jeff King
  2019-03-18 22:13       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-03-18 21:22 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Michael Haggerty,
	Stefan Beller, Matt McCutchen

On Mon, Mar 18, 2019 at 01:28:24PM -0700, Jonathan Nieder wrote:

> > +Using this option may optimize for disk space at the expense of
> > +runtime performance. See the `--depth` and `--window` documentation in
> > +linkgit:git-repack[1]. It is not recommended that this option be used
> > +to improve performance for a given repository without running tailored
> > +performance benchmarks on it. It may make things better, or worse. Not
> > +using this at all is the right trade-off for most users and their
> > +repositories.
> 
> This part kind of feels like giving up.  Can we make --aggressive have
> good runtime read performance so we don't have to hedge this way?
> E.g. is this patch papering over a poor choice of --depth setting in
> --aggressive?

I thought we already did that, in 07e7dbf0db (gc: default aggressive
depth to 50, 2016-08-11). As far as I know, "--aggressive" produces
packs with similar runtime performance.

It is possible, if it finds more deltas due to the larger window, that
we'd spend more time accessing those deltas. But if the chains aren't
long, the base cache tends to perform well, and delta reconstruction is
about the same cost as zlib inflating. And we have a smaller disk cache
footprint.

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] gc docs: modernize the advice for manually running "gc"
  2019-03-18 16:14 ` [PATCH 1/4] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-18 21:27   ` Jeff King
  2019-03-18 22:18     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-03-18 21:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, Mar 18, 2019 at 05:14:59PM +0100, Ævar Arnfjörð Bjarmason wrote:

> The docs have been recommending that users need to run this manually,
> but that hasn't been needed in practice for a long time.
> 
> Let's instead have this reflect reality and say that most users don't
> need to run this manually at all.

Yeah, I think this makes sense.

> -Users are encouraged to run this task on a regular basis within
> -each repository to maintain good disk space utilization and good
> -operating performance.
> +Most users should not have to run this command manually. When common
> +porcelain operations that create objects are run, such as
> +linkgit:git-commit[1] and linkgit:git-fetch[1], `git gc --auto` will
> +be run automatically.

This is in the description, before "--auto" is introduced. I wonder if
it is worth just describing it briefly, like:

  When common porcelain operations that creates objects are run, they
  will check whether the repository has grown substantially since the
  last maintenance, and if so run `git gc` automatically.

That gives a first-time reader an idea of whether they need to care
about this command without having to dig into what "--auto" is.

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-18 16:15 ` [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-18 21:31   ` Jeff King
  2019-03-21 22:11     ` Andreas Heiduk
  2019-03-19  2:08   ` Duy Nguyen
  1 sibling, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-03-18 21:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, Mar 18, 2019 at 05:15:00PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Rather than duplicating the documentation for the various "gc" options
> let's include the "gc" docs from git-config. They were mostly better
> already, and now we don't have the same docs in two places with subtly
> different wording.
> 
> In the cases where the git-gc(1) docs were saying something the "gc"
> docs in git-config(1) didn't cover move the relevant section over to
> the git-config(1) docs.

Makes sense.

I think we lose the actual example for gc.*.reflogExpire:

> -The above two configuration variables can be given to a pattern.  For
> -example, this sets non-default expiry values only to remote-tracking
> -branches:
> -
> -------------
> -[gc "refs/remotes/*"]
> -	reflogExpire = never
> -	reflogExpireUnreachable = 3 days
> -------------

I don't actually think it's that big a loss. If we wanted to retain it,
though, it might make sense in the "EXAMPLES" section.

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
  2019-03-18 16:15 ` [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION" Ævar Arnfjörð Bjarmason
@ 2019-03-18 21:49   ` Jeff King
  2019-03-18 22:48     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-03-18 21:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, Mar 18, 2019 at 05:15:01PM +0100, Ævar Arnfjörð Bjarmason wrote:

> In an earlier commit I started including the "gc.*" documentation from
> git-config(1) in the git-gc(1) documentation. That still left us in a
> state where the "--auto" option and "gc.auto" were redundantly
> discussing the same thing.
> 
> Fix that by briefly discussing how the option itself works for
> "--auto", and for the rest referring to the configuration
> documentation.
> 
> This revealed existing blind spots in the configuration documentation,
> move over the documentation and reword as appropriate.

Nice improvement. A few comments:

> diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
> index a834a801cd6..605e14bc80b 100644
> --- a/Documentation/config/gc.txt
> +++ b/Documentation/config/gc.txt
> @@ -19,13 +19,27 @@ gc.auto::
>  	objects in the repository, `git gc --auto` will pack them.
>  	Some Porcelain commands use this command to perform a
>  	light-weight garbage collection from time to time.  The
> -	default value is 6700.  Setting this to 0 disables it.
> +	default value is 6700.
> ++
> +Setting this to 0 disables not only automatic packing based on the
> +number of loose objects, but any other heuristic `git gc --auto` will
> +otherwise use to determine if there's work to do, such as
> +`gc.autoPackLimit`.
> ++
> +The repacking of loose objects will be performed with `git repack -d
> +-l`.

I know this last sentence came from the existing documentation, but I
wonder if we should be more vague here. We'd pack with "repack -dl" when
we have just loose objects, and "repack -Adl" when we have too many
packs. Or "repack -adl" if we're pruning now, and "--unpack-unreachable"
otherwise.

I think the point of git-gc is that you don't have to care about that
stuff. It works magically, and if you are implementing your own custom
gc scheme, then you are probably better off reading the output of
GIT_TRACE or looking at the source, rather than this documentation.

>  gc.autoPackLimit::
> +
>  	When there are more than this many packs that are not

What's this newline for? I'm not completely opposed if that's the style
we want, but it seems odd that just this one has a blank between the
variable name and the text.

>  	marked with `*.keep` file in the repository, `git gc
>  	--auto` consolidates them into one larger pack.  The
> -	default	value is 50.  Setting this to 0 disables it.
> +	default value is 50.  Setting this (or `gc.auto`) to 0
> +	disables it. Packs will be consolidated using the `-A` option
> +	of `git repack`.

If we do revise the "-d -l" bit for the loose limit, we'd probably want
to adjust this to match.

> @@ -35,13 +49,18 @@ gc.bigPackThreshold::
>  	If non-zero, all packs larger than this limit are kept when
>  	`git gc` is run. This is very similar to `--keep-base-pack`
>  	except that all packs that meet the threshold are kept, not
> -	just the base pack. Defaults to zero. Common unit suffixes of
> -	'k', 'm', or 'g' are supported.
> +	just the base pack. Defaults to zero or a memory heuristic.
> +	Common unit suffixes of 'k', 'm', or 'g' are supported.

I'm not sure how to read this "or". What's the difference between "0" or
the memory heuristic, and when is one used? Or is that what the "if the
number of kept packs is more than..." below is trying to say?

If so, I wonder if it would be simpler to say "defaults to a memory
heuristic", but with a note for "but under these conditions it is not
used".

Or am I totally misunderstanding how it actually works (which seems
likely to me)?

> +If the amount of memory is estimated not enough for `git repack` to
> +run smoothly and `gc.bigPackThreshold` is not set, the largest pack
> +will also be excluded (which is the equivalent of running `git gc`
> +with `--keep-base-pack`).

I had trouble parsing this first line. Maybe:

  If the amount of memory estimated for `git repack` to run smoothly is
  not available and ...

I guess a lot of this is just being copied from elsewhere, but it's
probably worth cleaning it up while we're here.

> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> [...]
> +See the `gc.auto' option in the "CONFIGURATION" below for how this
> +heuristic works.

s/CONFIGURATION/& section/?

> +Once housekeeping is triggered by exceeding the limits of
> +configurations options such as `gc.auto` and `gc.autoPackLimit`, all

s/configurations/configuration/

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2019-03-18 16:15 ` [PATCH 4/4] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
@ 2019-03-18 21:51 ` Jeff King
  2019-03-18 22:45   ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-03-18 21:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, Mar 18, 2019 at 05:14:58PM +0100, Ævar Arnfjörð Bjarmason wrote:

> This series is unrelated (and does not conflict with) my in-flight gc
> contention series
> (https://public-inbox.org/git/20190315155959.12390-1-avarab@gmail.com/),
> but the "git-gc" docs should be updated to discuss the
> core.filesRefLockTimeout option and how it impacts contention, see 8/8
> in that series for context. I.e. "you may have contention, but
> core.filesRefLockTimeout can mitigate blah blah".
> 
> I was going to do that, but then thought that we should also mention
> that on the server-side we mitigate most/all of the contention via the
> quarantine, see "QUARANTINE ENVIRONMENT" in
> git-receive-pack(1). I.e. we:
> 
>  1. Get the temp pack
>  2. OK it (fsck, hooks etc.)
>  3. Move *complete* previously temp packs over
>  4. Update the refs
> 
> I.e. we are immune from the "concurrently with another process" race,
> but of course something concurrently updating the "server" repo
> without a quarantine environment may be subject to that race.
> 
> The only problem is that the last couple of paragraphs may be
> wrong. That's just my understanding from a brief reading of
> 722ff7f876c ("receive-pack: quarantine objects until pre-receive
> accepts", 2016-10-03) so I didn't want to include that in this
> series. Peff (or others), any comments?

I don't think the quarantine stuff should impact contention at all. It's
only quarantining the objects, which are the least contentious part of
Git (because object content is idempotent, so we don't do any locking
there, and with two racing processes, one will just "win").

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive
  2019-03-18 21:22     ` Jeff King
@ 2019-03-18 22:13       ` Ævar Arnfjörð Bjarmason
  2019-03-18 23:53         ` Jeff King
  0 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 22:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, git, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Michael Haggerty,
	Stefan Beller, Matt McCutchen


On Mon, Mar 18 2019, Jeff King wrote:

> On Mon, Mar 18, 2019 at 01:28:24PM -0700, Jonathan Nieder wrote:
>
>> > +Using this option may optimize for disk space at the expense of
>> > +runtime performance. See the `--depth` and `--window` documentation in
>> > +linkgit:git-repack[1]. It is not recommended that this option be used
>> > +to improve performance for a given repository without running tailored
>> > +performance benchmarks on it. It may make things better, or worse. Not
>> > +using this at all is the right trade-off for most users and their
>> > +repositories.
>>
>> This part kind of feels like giving up.  Can we make --aggressive have
>> good runtime read performance so we don't have to hedge this way?
>> E.g. is this patch papering over a poor choice of --depth setting in
>> --aggressive?
>
> I thought we already did that, in 07e7dbf0db (gc: default aggressive
> depth to 50, 2016-08-11). As far as I know, "--aggressive" produces
> packs with similar runtime performance.


What happened here is that I'd entirely forgotten about your 07e7dbf0db
and in skimming while writing this throught we were still picking larger
depth values, which we aren't.

I'll fix that, and see that gc.aggressiveDepth also needs to be changed
to note that the depth it's now using as "aggressive" is just the
default of 50 you'd get without --aggressive.

> It is possible, if it finds more deltas due to the larger window, that
> we'd spend more time accessing those deltas. But if the chains aren't
> long, the base cache tends to perform well, and delta reconstruction is
> about the same cost as zlib inflating. And we have a smaller disk cache
> footprint.

I haven't tested that but suspect it won't matter. We do spend a *lot*
more time though, so that still needs to be noted...

On the topic of other things I may have screwed up, is this:

    +The effects of this option are persistent to the extent that
    +`gc.autoPackLimit` and friends don't cause a consolidation of existing
    +pack(s) generated with this option.

Actually wrong since we don't pass -f usually, and thus a one-off
--aggressive would live forever for the objects involved in that run no
matter if we later consolidate?

From the docs it seems so, but I'd like to confirm...

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 1/4] gc docs: modernize the advice for manually running "gc"
  2019-03-18 21:27   ` Jeff King
@ 2019-03-18 22:18     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 22:18 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen


On Mon, Mar 18 2019, Jeff King wrote:

> On Mon, Mar 18, 2019 at 05:14:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> The docs have been recommending that users need to run this manually,
>> but that hasn't been needed in practice for a long time.
>>
>> Let's instead have this reflect reality and say that most users don't
>> need to run this manually at all.
>
> Yeah, I think this makes sense.
>
>> -Users are encouraged to run this task on a regular basis within
>> -each repository to maintain good disk space utilization and good
>> -operating performance.
>> +Most users should not have to run this command manually. When common
>> +porcelain operations that create objects are run, such as
>> +linkgit:git-commit[1] and linkgit:git-fetch[1], `git gc --auto` will
>> +be run automatically.
>
> This is in the description, before "--auto" is introduced. I wonder if
> it is worth just describing it briefly, like:
>
>   When common porcelain operations that creates objects are run, they
>   will check whether the repository has grown substantially since the
>   last maintenance, and if so run `git gc` automatically.
>
> That gives a first-time reader an idea of whether they need to care
> about this command without having to dig into what "--auto" is.

Yeah I think that's better. Also more briefly describing gc.auto=0
without an example (suggesting people run that, which for most is a bad
idea). I.e. just adding to that "This behavior can be disabled, see
`gc.auto` below."

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-03-18 21:51 ` [PATCH 0/4] gc docs: modernize and fix the documentation Jeff King
@ 2019-03-18 22:45   ` Ævar Arnfjörð Bjarmason
  2019-03-19  0:18     ` Jeff King
  0 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 22:45 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen


On Mon, Mar 18 2019, Jeff King wrote:

> On Mon, Mar 18, 2019 at 05:14:58PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This series is unrelated (and does not conflict with) my in-flight gc
>> contention series
>> (https://public-inbox.org/git/20190315155959.12390-1-avarab@gmail.com/),
>> but the "git-gc" docs should be updated to discuss the
>> core.filesRefLockTimeout option and how it impacts contention, see 8/8
>> in that series for context. I.e. "you may have contention, but
>> core.filesRefLockTimeout can mitigate blah blah".
>>
>> I was going to do that, but then thought that we should also mention
>> that on the server-side we mitigate most/all of the contention via the
>> quarantine, see "QUARANTINE ENVIRONMENT" in
>> git-receive-pack(1). I.e. we:
>>
>>  1. Get the temp pack
>>  2. OK it (fsck, hooks etc.)
>>  3. Move *complete* previously temp packs over
>>  4. Update the refs
>>
>> I.e. we are immune from the "concurrently with another process" race,
>> but of course something concurrently updating the "server" repo
>> without a quarantine environment may be subject to that race.
>>
>> The only problem is that the last couple of paragraphs may be
>> wrong. That's just my understanding from a brief reading of
>> 722ff7f876c ("receive-pack: quarantine objects until pre-receive
>> accepts", 2016-10-03) so I didn't want to include that in this
>> series. Peff (or others), any comments?
>
> I don't think the quarantine stuff should impact contention at all. It's
> only quarantining the objects, which are the least contentious part of
> Git (because object content is idempotent, so we don't do any locking
> there, and with two racing processes, one will just "win").

Without the quarantine, isn't there the race that the NOTES section
talks about (unless I've misread it).

I.e. we have some loose object "ABCD" not referrred to by anything for
the last 2 weeks, as we're gc-ing a ref update comes in that makes it
referenced again. We then delete "ABCD" (not used!) at the same time the
ref update happens, and get corruption.

Whereas the quarantine might work around since the client will have sent
ABCD with no reference pointing to it to the server in the temp pack,
which we then rename in-place and then update the ref, so we don't care
if "ABCD" goes away.

Unless that interacts racily with the receive.unpackLimit, but then I
have no idea that section is trying to say...

Also, surely the part where "NOTES" says something to the effect of "you
are subject to races unless gc.auto=0" is wrong. To the extent that
there's races it won't matter that you invoke "git gc" or "git gc
--auto", it's the concurrency that matters. So if there's still races we
should be saying the repo needs to be locked for writes for the duration
of the "gc".

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
  2019-03-18 21:49   ` Jeff King
@ 2019-03-18 22:48     ` Ævar Arnfjörð Bjarmason
  2019-03-18 23:42       ` Jeff King
  0 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-18 22:48 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen


On Mon, Mar 18 2019, Jeff King wrote:

> On Mon, Mar 18, 2019 at 05:15:01PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> In an earlier commit I started including the "gc.*" documentation from
>> git-config(1) in the git-gc(1) documentation. That still left us in a
>> state where the "--auto" option and "gc.auto" were redundantly
>> discussing the same thing.
>>
>> Fix that by briefly discussing how the option itself works for
>> "--auto", and for the rest referring to the configuration
>> documentation.
>>
>> This revealed existing blind spots in the configuration documentation,
>> move over the documentation and reword as appropriate.
>
> Nice improvement. A few comments:
>
>> diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
>> index a834a801cd6..605e14bc80b 100644
>> --- a/Documentation/config/gc.txt
>> +++ b/Documentation/config/gc.txt
>> @@ -19,13 +19,27 @@ gc.auto::
>>  	objects in the repository, `git gc --auto` will pack them.
>>  	Some Porcelain commands use this command to perform a
>>  	light-weight garbage collection from time to time.  The
>> -	default value is 6700.  Setting this to 0 disables it.
>> +	default value is 6700.
>> ++
>> +Setting this to 0 disables not only automatic packing based on the
>> +number of loose objects, but any other heuristic `git gc --auto` will
>> +otherwise use to determine if there's work to do, such as
>> +`gc.autoPackLimit`.
>> ++
>> +The repacking of loose objects will be performed with `git repack -d
>> +-l`.
>
> I know this last sentence came from the existing documentation, but I
> wonder if we should be more vague here. We'd pack with "repack -dl" when
> we have just loose objects, and "repack -Adl" when we have too many
> packs. Or "repack -adl" if we're pruning now, and "--unpack-unreachable"
> otherwise.
>
> I think the point of git-gc is that you don't have to care about that
> stuff. It works magically, and if you are implementing your own custom
> gc scheme, then you are probably better off reading the output of
> GIT_TRACE or looking at the source, rather than this documentation.

Yeah I can just drop it while I'm at it. Was just losslessly trying to
port the existing docs.

>>  gc.autoPackLimit::
>> +
>>  	When there are more than this many packs that are not
>
> What's this newline for? I'm not completely opposed if that's the style
> we want, but it seems odd that just this one has a blank between the
> variable name and the text.

Mistake, will fix.

>>  	marked with `*.keep` file in the repository, `git gc
>>  	--auto` consolidates them into one larger pack.  The
>> -	default	value is 50.  Setting this to 0 disables it.
>> +	default value is 50.  Setting this (or `gc.auto`) to 0
>> +	disables it. Packs will be consolidated using the `-A` option
>> +	of `git repack`.
>
> If we do revise the "-d -l" bit for the loose limit, we'd probably want
> to adjust this to match.

Or not mention it either?

>> @@ -35,13 +49,18 @@ gc.bigPackThreshold::
>>  	If non-zero, all packs larger than this limit are kept when
>>  	`git gc` is run. This is very similar to `--keep-base-pack`
>>  	except that all packs that meet the threshold are kept, not
>> -	just the base pack. Defaults to zero. Common unit suffixes of
>> -	'k', 'm', or 'g' are supported.
>> +	just the base pack. Defaults to zero or a memory heuristic.
>> +	Common unit suffixes of 'k', 'm', or 'g' are supported.
>
> I'm not sure how to read this "or". What's the difference between "0" or
> the memory heuristic, and when is one used? Or is that what the "if the
> number of kept packs is more than..." below is trying to say?

That by default we don't use gc.bigPackThreshold, unless we find that
you're under memory pressure. I.e. "it's off by default, unless your
system has too little memory".

> If so, I wonder if it would be simpler to say "defaults to a memory
> heuristic", but with a note for "but under these conditions it is not
> used".
>
> Or am I totally misunderstanding how it actually works (which seems
> likely to me)?
>
>> +If the amount of memory is estimated not enough for `git repack` to
>> +run smoothly and `gc.bigPackThreshold` is not set, the largest pack
>> +will also be excluded (which is the equivalent of running `git gc`
>> +with `--keep-base-pack`).
>
> I had trouble parsing this first line. Maybe:
>
>   If the amount of memory estimated for `git repack` to run smoothly is
>   not available and ...
>
> I guess a lot of this is just being copied from elsewhere, but it's
> probably worth cleaning it up while we're here.

Will try to make it suck less.

>> --- a/Documentation/git-gc.txt
>> +++ b/Documentation/git-gc.txt
>> [...]
>> +See the `gc.auto' option in the "CONFIGURATION" below for how this
>> +heuristic works.
>
> s/CONFIGURATION/& section/?
>
>> +Once housekeeping is triggered by exceeding the limits of
>> +configurations options such as `gc.auto` and `gc.autoPackLimit`, all
>
> s/configurations/configuration/

*Nod*. Thanks.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
  2019-03-18 22:48     ` Ævar Arnfjörð Bjarmason
@ 2019-03-18 23:42       ` Jeff King
  0 siblings, 0 replies; 64+ messages in thread
From: Jeff King @ 2019-03-18 23:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, Mar 18, 2019 at 11:48:46PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > I know this last sentence came from the existing documentation, but I
> > wonder if we should be more vague here. We'd pack with "repack -dl" when
> > we have just loose objects, and "repack -Adl" when we have too many
> > packs. Or "repack -adl" if we're pruning now, and "--unpack-unreachable"
> > otherwise.
> >
> > I think the point of git-gc is that you don't have to care about that
> > stuff. It works magically, and if you are implementing your own custom
> > gc scheme, then you are probably better off reading the output of
> > GIT_TRACE or looking at the source, rather than this documentation.
> 
> Yeah I can just drop it while I'm at it. Was just losslessly trying to
> port the existing docs.

Yeah, I'm sympathetic to that (if you did drop it, you might have gotten
the opposite review). ;) I think it would be OK to just mention it in
the commit message, but I'd also be OK dropping it in a preliminary
patch.

> >>  	marked with `*.keep` file in the repository, `git gc
> >>  	--auto` consolidates them into one larger pack.  The
> >> -	default	value is 50.  Setting this to 0 disables it.
> >> +	default value is 50.  Setting this (or `gc.auto`) to 0
> >> +	disables it. Packs will be consolidated using the `-A` option
> >> +	of `git repack`.
> >
> > If we do revise the "-d -l" bit for the loose limit, we'd probably want
> > to adjust this to match.
> 
> Or not mention it either?

Yes. :)

> > I'm not sure how to read this "or". What's the difference between "0" or
> > the memory heuristic, and when is one used? Or is that what the "if the
> > number of kept packs is more than..." below is trying to say?
> 
> That by default we don't use gc.bigPackThreshold, unless we find that
> you're under memory pressure. I.e. "it's off by default, unless your
> system has too little memory".

OK, I see. It might make sense to write that out more explicitly.

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive
  2019-03-18 22:13       ` Ævar Arnfjörð Bjarmason
@ 2019-03-18 23:53         ` Jeff King
  0 siblings, 0 replies; 64+ messages in thread
From: Jeff King @ 2019-03-18 23:53 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jonathan Nieder, git, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Michael Haggerty,
	Stefan Beller, Matt McCutchen

On Mon, Mar 18, 2019 at 11:13:57PM +0100, Ævar Arnfjörð Bjarmason wrote:

> What happened here is that I'd entirely forgotten about your 07e7dbf0db
> and in skimming while writing this throught we were still picking larger
> depth values, which we aren't.
> 
> I'll fix that, and see that gc.aggressiveDepth also needs to be changed
> to note that the depth it's now using as "aggressive" is just the
> default of 50 you'd get without --aggressive.

Yeah. I think I tweaked the documentation in that commit, but I agree
it's probably worth calling out the subtlety that it's the same as the
default.

> > It is possible, if it finds more deltas due to the larger window, that
> > we'd spend more time accessing those deltas. But if the chains aren't
> > long, the base cache tends to perform well, and delta reconstruction is
> > about the same cost as zlib inflating. And we have a smaller disk cache
> > footprint.
> 
> I haven't tested that but suspect it won't matter. We do spend a *lot*
> more time though, so that still needs to be noted...

Yeah, agreed on both counts.

> On the topic of other things I may have screwed up, is this:
> 
>     +The effects of this option are persistent to the extent that
>     +`gc.autoPackLimit` and friends don't cause a consolidation of existing
>     +pack(s) generated with this option.
> 
> Actually wrong since we don't pass -f usually, and thus a one-off
> --aggressive would live forever for the objects involved in that run no
> matter if we later consolidate?
> 
> From the docs it seems so, but I'd like to confirm...

In general, yeah, I'd expect an --aggressive repack's effects to live on
through subsequent gc's. It's not _entirely_ true, because objects from
that big repack may end up duplicate in another pack (e.g., due to
thin-fixing, or just a client which sends objects we didn't need due to
push's abbreviated negotiation). And then either:

  - we may select the copy of the object from the other pack, where it's
    a base object, and then end up looking for a new delta for it

  - after the pack-mru patches from ~2016, we don't have a strict
    ordering of the packs, which means we can see cycles in the delta
    graph. So even if object A isn't duplicated, it may be a delta on B,
    which deltas on C, and then the copy of C we pick is from another
    pack where it's a delta on A. We have to break the cycle, which
    could happen on any one of A, B, or C.

I haven't done careful measurements, but I'd be surprised if those cases
make a significant dent, even over many gc's. What I think probably does
make a dent is that new objects come into the repo with whatever crappy
packing the client did as part of the push, and you'd ideally like to
throw away all of their deltas and just find new good ones.

I think it might help for pack-objects to have a mode that isn't quite
"keep the big pack", but rather "keep the deltas from the big pack, but
not other ones, but otherwise create a new big pack". But this has
diverged pretty far from the point of your series. :)

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-03-18 22:45   ` Ævar Arnfjörð Bjarmason
@ 2019-03-19  0:18     ` Jeff King
  2019-05-06  9:44       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-03-19  0:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, Mar 18, 2019 at 11:45:39PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > I don't think the quarantine stuff should impact contention at all. It's
> > only quarantining the objects, which are the least contentious part of
> > Git (because object content is idempotent, so we don't do any locking
> > there, and with two racing processes, one will just "win").
> 
> Without the quarantine, isn't there the race that the NOTES section
> talks about (unless I've misread it).

Ah, OK, I wasn't quite sure which documentation you were talking about.
I see the discussion now in the "NOTES" section of git-gc(1).

> I.e. we have some loose object "ABCD" not referrred to by anything for
> the last 2 weeks, as we're gc-ing a ref update comes in that makes it
> referenced again. We then delete "ABCD" (not used!) at the same time the
> ref update happens, and get corruption.
> 
> Whereas the quarantine might work around since the client will have sent
> ABCD with no reference pointing to it to the server in the temp pack,
> which we then rename in-place and then update the ref, so we don't care
> if "ABCD" goes away.

tl;dr I don't think quarantine impacts this, but if you really want gory
details, read on.

This is a problem with or without the quarantine. It's fundamentally a
race because we do not atomically say "is anybody using X? If not, we
can delete it" and some other process saying "I'd like to use X".

Pushes are actually better off than most operations, because we only
advertise what's reachable, and the client is expected to send
everything else. So with just a normal update-ref call, we could race
like this:

  1. ABCD is ancient.

  2. Process 1 (update-ref) wants to reference ABCD. It sees that we
     have it.

  3. Process 2 (gc/prune) sees that nobody references it. It deletes
     ABCD.

  4. Process 1 writes out the reference.

That doesn't happen with a push, because the server never would have
told the client that it has ABCD in the first place (so process 1 here
is the client). That is true with or without quarantine.

But pushes aren't foolproof either. You said "loose object ABCD not
referred t oby anything for the last 2 weeks". But that's not exactly
how it works. It's "object with an mtime of more than 2 weeks which is
not currently referenced". So imagine a sequence like:

  1. ABCD is ancient.

  2. refs/heads/foo points to ABCD.

  3. Server receive-pack advertises foo pointing to ABCD.

  4. Simultaneous process on the server deletes refs/heads/foo (or
     perhaps somebody force-pushes over it).

  5. Client prepares and sends pack without ABCD.

  6. Server receive-pack checks that yes, we still have ABCD (i.e., the
     usual connectivity check).

  7. Server gc drops ABCD, which is now unreachable (reflogs can help
     here, if you've enabled them; but we do delete reflogs when the
     branch is deleted).

  8. Server receive-pack writes corrupt repo mentioning ABCD.

That's a lot more steps, though they might not be as implausible as you
think (e.g., consider moving "refs/heads/foo" to "refs/heads/bar" in a
single push; that's actually a delete and an update, which is all you
need to race with a simultaneous gc).

I have no idea how often this happens in practice. My subjective
recollection is that most of the race corruptions I've seen were from
local operations on the server. E.g., we compute a tentative merge for
somebody's pull request which shares objects with an older tentative
merge. They click the "merge" button and we reference that commit, which
is recent, but unbeknownst to us, while we were creating our new
tentative merge, a "gc" was deleting the old one.

We're sometimes saved by the "transitive freshness" rules in d3038d22f9
(prune: keep objects reachable from recent objects, 2014-10-15).  But
they're far from perfect:

 - some operations (like the push rename example) aren't writing new
   objects, so the ref write _is_ the moment that gc would find out
   something is reachable

 - the "is it reachable?" and "no, then delete it" steps aren't atomic.
   Unless you want a whole-repo stop-the-world lock, somebody can
   reference the object in between. And since it may take many seconds
   to compute reachability, stop-the-world is not great.

I think there are probably ways to make it better. Perhaps some kind of
lockless delete-but-be-able-to-rollback scheme (but keep in mind this
has to be implemented no top of POSIX filesystem semantics). Or even
just a "compute reachability, mark for deletion, and then hold a
stop-the-world lock briefly to double-check that our reachability is
still up to date".

At least those seem plausible to me. I've never worked out the details,
and our solution was to just stop deleting objects during routine
maintenance (using "repack -adk"). We do still occasionally prune
manually (e.g., when somebody writes to support to remove a confidential
mistake).

Anyway, that was more than you probably wanted to know. The short of it
is that I don't think quarantines help (they may even make things worse
by slightly increasing the length of the race window, though in practice
I doubt it).

> Unless that interacts racily with the receive.unpackLimit, but then I
> have no idea that section is trying to say...

No, I don't think unpackLimit really affects it much either way.

> Also, surely the part where "NOTES" says something to the effect of "you
> are subject to races unless gc.auto=0" is wrong. To the extent that
> there's races it won't matter that you invoke "git gc" or "git gc
> --auto", it's the concurrency that matters. So if there's still races we
> should be saying the repo needs to be locked for writes for the duration
> of the "gc".

Correct. It's the very act of pruning that is problematic. I think the
point is that if you are manually running "git gc", you'd presumably do
it at a time when the repository is not otherwise active.

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-18 16:15 ` [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
  2019-03-18 21:31   ` Jeff King
@ 2019-03-19  2:08   ` Duy Nguyen
  1 sibling, 0 replies; 64+ messages in thread
From: Duy Nguyen @ 2019-03-19  2:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Jeff King, Michael Haggerty,
	Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, Mar 18, 2019 at 11:15 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> Rather than duplicating the documentation for the various "gc" options
> let's include the "gc" docs from git-config. They were mostly better
> already, and now we don't have the same docs in two places with subtly
> different wording.

Nice. I have a wip series to do this for more man pages but it's
still, well, wip.

There may be one thing we need to sort out to include config/*.txt,
sometimes we mention see linkgit:git-gc[1]. When including
config/gc.txt in git-gc.1, this self reference seems silly. But I
think we can just leave it for now.
-- 
Duy

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive
  2019-03-18 16:15 ` [PATCH 4/4] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
  2019-03-18 20:28   ` Jonathan Nieder
@ 2019-03-19  6:54   ` Johannes Sixt
  2019-03-19  9:28     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 64+ messages in thread
From: Johannes Sixt @ 2019-03-19  6:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Jeff King, Michael Haggerty, Stefan Beller, Jonathan Nieder,
	Matt McCutchen

Am 18.03.19 um 17:15 schrieb Ævar Arnfjörð Bjarmason:
> +Using this option may optimize for disk space at the expense of
> +runtime performance. See the `--depth` and `--window` documentation in
> +linkgit:git-repack[1]. It is not recommended that this option be used
> +to improve performance for a given repository without running tailored
> +performance benchmarks on it. It may make things better, or worse. Not
> +using this at all is the right trade-off for most users and their
> +repositories.
> ++
> +The effects of this option are persistent to the extent that
> +`gc.autoPackLimit` and friends don't cause a consolidation of existing
> +pack(s) generated with this option.

The first paragraph talks about potential downsides. And I think that
the second paragraph attempts to tell me how I can back out if I'm hit
by those downsides. But I have not the slightest idea how to read this
sentence and know what I have to do.

-- Hannes

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive
  2019-03-19  6:54   ` Johannes Sixt
@ 2019-03-19  9:28     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-19  9:28 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Jeff King, Michael Haggerty, Stefan Beller, Jonathan Nieder,
	Matt McCutchen


On Tue, Mar 19 2019, Johannes Sixt wrote:

> Am 18.03.19 um 17:15 schrieb Ævar Arnfjörð Bjarmason:
>> +Using this option may optimize for disk space at the expense of
>> +runtime performance. See the `--depth` and `--window` documentation in
>> +linkgit:git-repack[1]. It is not recommended that this option be used
>> +to improve performance for a given repository without running tailored
>> +performance benchmarks on it. It may make things better, or worse. Not
>> +using this at all is the right trade-off for most users and their
>> +repositories.
>> ++
>> +The effects of this option are persistent to the extent that
>> +`gc.autoPackLimit` and friends don't cause a consolidation of existing
>> +pack(s) generated with this option.
>
> The first paragraph talks about potential downsides. And I think that
> the second paragraph attempts to tell me how I can back out if I'm hit
> by those downsides. But I have not the slightest idea how to read this
> sentence and know what I have to do.

That's an existing issue, but I'm fine with improving the docs even
more, will add that :)

You need to repack non-aggressively with the -f option. Right now
there's nothing that exposes that from "gc", except setting the
"aggressive" settings to the same as the default window/depth.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 00/10] gc docs: modernize and fix the documentation
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2019-03-18 21:51 ` [PATCH 0/4] gc docs: modernize and fix the documentation Jeff King
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 00/11] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
                     ` (11 more replies)
  2019-03-21 20:50 ` [PATCH v2 01/10] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
                   ` (9 subsequent siblings)
  15 siblings, 12 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

For v1 see: https://public-inbox.org/git/20190318161502.7979-1-avarab@gmail.com/

This addresses all the feedback it got, which includes splitting out
various "while we're at it" fixes, and then I found/remembered some
more things I needed to fix.

It would still be great to have Peff submit some version of his
https://public-inbox.org/git/20190319001829.GL29661@sigill.intra.peff.net/
reply to the NOTES section sometime, but I had to stop somewhere.

I also documented the fast-import caveat discussed in
https://public-inbox.org/git/87o964cnn0.fsf@evledraar.gmail.com/ while
I was at it, as promised.

Ævar Arnfjörð Bjarmason (10):
  gc docs: modernize the advice for manually running "gc"
  gc docs: stop noting "repack" flags
  gc docs: clean grammar for "gc.bigPackThreshold"
  gc docs: include the "gc.*" section from "config" in "gc"
  gc docs: re-flow the "gc.*" section in "config"
  gc docs: note how --aggressive impacts --window & --depth
  gc docs: downplay the usefulness of --aggressive
  gc docs: note "gc --aggressive" in "fast-import"
  gc docs: clarify that "gc" doesn't throw away referenced objects
  gc docs: remove incorrect reference to gc.auto=0

 Documentation/config/gc.txt       |  34 ++++++-
 Documentation/git-fast-import.txt |   7 ++
 Documentation/git-gc.txt          | 142 ++++++++++--------------------
 3 files changed, 84 insertions(+), 99 deletions(-)

Range-diff:
 1:  d48b9c7221 !  1:  89719142c7 gc docs: modernize the advice for manually running "gc"
    @@ -3,10 +3,17 @@
         gc docs: modernize the advice for manually running "gc"
     
         The docs have been recommending that users need to run this manually,
    -    but that hasn't been needed in practice for a long time.
    +    but that hasn't been needed in practice for a long time except in
    +    exceptional circumstances.
     
         Let's instead have this reflect reality and say that most users don't
    -    need to run this manually at all.
    +    need to run this manually at all, while briefly describing the sorts
    +    sort of cases where "gc" does need to be run manually.
    +
    +    Since we're recommending that users run this most of the and usually
    +    don't need to tweak it, let's tone down the very prominent example of
    +    the gc.auto=0 command. It's sufficient to point to the gc.auto
    +    documentation below.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ -20,20 +27,24 @@
     -Users are encouraged to run this task on a regular basis within
     -each repository to maintain good disk space utilization and good
     -operating performance.
    -+Most users should not have to run this command manually. When common
    -+porcelain operations that create objects are run, such as
    -+linkgit:git-commit[1] and linkgit:git-fetch[1], `git gc --auto` will
    -+be run automatically.
    - 
    +-
     -Some git commands may automatically run 'git gc'; see the `--auto` flag
     -below for details. If you know what you're doing and all you want is to
     -disable this behavior permanently without further considerations, just do:
    -+You should only need to run `git gc` manually when adding objects to a
    -+repository without regularly running such porcelain commands. Another
    -+use-case is wanting to do a one-off repository optimization.
    +-
    +-----------------------
    +-$ git config --global gc.auto 0
    +-----------------------
    ++When common porcelain operations that creates objects are run, they
    ++will check whether the repository has grown substantially since the
    ++last maintenance, and if so run `git gc` automatically. See `gc.auto`
    ++below for how to disable this behavior.
     +
    -+If you know what you're doing and all you want is to disable automatic
    -+runs, do:
    ++Running `git gc` manually should only be needed when adding objects to
    ++a repository without regularly running such porcelain commands, to do
    ++a one-off repository optimization, or e.g. to clean up a suboptimal
    ++mass-import. See the "PACKFILE OPTIMIZATION" section in
    ++linkgit:git-fast-import[1] for more details on the import case.
      
    - ----------------------
    - $ git config --global gc.auto 0
    + OPTIONS
    + -------
 -:  ---------- >  2:  d90a5b1b4c gc docs: stop noting "repack" flags
 -:  ---------- >  3:  fedd9bb886 gc docs: clean grammar for "gc.bigPackThreshold"
 2:  e670d514ce !  4:  6fad05a67c gc docs: include the "gc.*" section from "config" in "gc"
    @@ -34,16 +34,52 @@
      
      gc.auto::
      	When there are approximately more than this many loose
    + 	objects in the repository, `git gc --auto` will pack them.
    + 	Some Porcelain commands use this command to perform a
    + 	light-weight garbage collection from time to time.  The
    +-	default value is 6700.  Setting this to 0 disables it.
    ++	default value is 6700.
    +++
    ++Setting this to 0 disables not only automatic packing based on the
    ++number of loose objects, but any other heuristic `git gc --auto` will
    ++otherwise use to determine if there's work to do, such as
    ++`gc.autoPackLimit`.
    + 
    + gc.autoPackLimit::
    + 	When there are more than this many packs that are not
    + 	marked with `*.keep` file in the repository, `git gc
    + 	--auto` consolidates them into one larger pack.  The
    + 	default	value is 50.  Setting this to 0 disables it.
    ++	Setting `gc.auto` to 0 will also disable this.
    +++
    ++See the `gc.bigPackThreshold` configuration variable below. When in
    ++use, it'll affect how the auto pack limit works.
    + 
    + gc.autoDetach::
    + 	Make `git gc --auto` return immediately and run in background
    +@@
    + this configuration variable is ignored, all packs except the base pack
    + will be repacked. After this the number of packs should go below
    + gc.autoPackLimit and gc.bigPackThreshold should be respected again.
    +++
    ++If the amount of memory estimated for `git repack` to run smoothly is
    ++not available and `gc.bigPackThreshold` is not set, the largest
    ++pack will also be excluded (this is the equivalent of running `git gc`
    ++with `--keep-base-pack`).
    + 
    + gc.writeCommitGraph::
    + 	If true, then gc will rewrite the commit-graph file when
     @@
      	With "<pattern>" (e.g. "refs/stash")
      	in the middle, the setting applies only to the refs that
      	match the <pattern>.
     ++
    -+These types of entries are generally created as a result of using `git
    -+commit --amend` or `git rebase` and are the commits prior to the amend
    -+or rebase occurring. Since these changes are not part of the current
    -+project history most users will want to expire them sooner, which is
    -+why the default is more aggressive than `gc.reflogExpire`.
    ++These types of entries are generally created as
    ++a result of using `git commit --amend` or `git rebase` and are the
    ++commits prior to the amend or rebase occurring.  Since these changes
    ++are not part of the current project most users will want to expire
    ++them sooner, which is why the default is more aggressive than
    ++`gc.reflogExpire`.
      
      gc.rerereResolved::
      	Records of conflicted merge you resolved earlier are
    @@ -52,15 +88,38 @@
      --- a/Documentation/git-gc.txt
      +++ b/Documentation/git-gc.txt
     @@
    - repository without regularly running such porcelain commands. Another
    - use-case is wanting to do a one-off repository optimization.
    - 
    --If you know what you're doing and all you want is to disable automatic
    --runs, do:
    -+If you know what you're doing and want to disable automatic runs, do:
    + --auto::
    + 	With this option, 'git gc' checks whether any housekeeping is
    + 	required; if not, it exits without performing any work.
    +-	Some git commands run `git gc --auto` after performing
    +-	operations that could create many loose objects. Housekeeping
    +-	is required if there are too many loose objects or too many
    +-	packs in the repository.
    + +
    +-If the number of loose objects exceeds the value of the `gc.auto`
    +-configuration variable, then all loose objects are combined into a
    +-single pack.  Setting the value of `gc.auto`
    +-to 0 disables automatic packing of loose objects.
    ++See the `gc.auto' option in the "CONFIGURATION" section below for how
    ++this heuristic works.
    + +
    +-If the number of packs exceeds the value of `gc.autoPackLimit`,
    +-then existing packs (except those marked with a `.keep` file
    +-or over `gc.bigPackThreshold` limit)
    +-are consolidated into a single pack.
    +-If the amount of memory estimated for `git repack` to run smoothly is
    +-not available and `gc.bigPackThreshold` is not set, the largest
    +-pack will also be excluded (this is the equivalent of running `git gc`
    +-with `--keep-base-pack`).
    +-Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
    +-packs.
    +-+
    +-If houskeeping is required due to many loose objects or packs, all
    ++Once housekeeping is triggered by exceeding the limits of
    ++configuration options such as `gc.auto` and `gc.autoPackLimit`, all
    + other housekeeping tasks (e.g. rerere, working trees, reflog...) will
    + be performed as well.
      
    - ----------------------
    - $ git config --global gc.auto 0
     @@
      CONFIGURATION
      -------------
 3:  d6f1e001a4 !  5:  994e22a0d6 gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
    @@ -1,18 +1,10 @@
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    -    gc docs: de-duplicate "OPTIONS" and "CONFIGURATION"
    +    gc docs: re-flow the "gc.*" section in "config"
     
    -    In an earlier commit I started including the "gc.*" documentation from
    -    git-config(1) in the git-gc(1) documentation. That still left us in a
    -    state where the "--auto" option and "gc.auto" were redundantly
    -    discussing the same thing.
    -
    -    Fix that by briefly discussing how the option itself works for
    -    "--auto", and for the rest referring to the configuration
    -    documentation.
    -
    -    This revealed existing blind spots in the configuration documentation,
    -    move over the documentation and reword as appropriate.
    +    Re-flow the "gc.*" section in "config". A previous commit moved this
    +    over from the "gc" docs, but tried to keep as many of the lines
    +    identical to benefit from diff's move detection.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ -20,91 +12,33 @@
      --- a/Documentation/config/gc.txt
      +++ b/Documentation/config/gc.txt
     @@
    - 	objects in the repository, `git gc --auto` will pack them.
    - 	Some Porcelain commands use this command to perform a
    - 	light-weight garbage collection from time to time.  The
    --	default value is 6700.  Setting this to 0 disables it.
    -+	default value is 6700.
    -++
    -+Setting this to 0 disables not only automatic packing based on the
    -+number of loose objects, but any other heuristic `git gc --auto` will
    -+otherwise use to determine if there's work to do, such as
    -+`gc.autoPackLimit`.
    -++
    -+The repacking of loose objects will be performed with `git repack -d
    -+-l`.
    - 
    - gc.autoPackLimit::
    -+
    - 	When there are more than this many packs that are not
    - 	marked with `*.keep` file in the repository, `git gc
    - 	--auto` consolidates them into one larger pack.  The
    --	default	value is 50.  Setting this to 0 disables it.
    -+	default value is 50.  Setting this (or `gc.auto`) to 0
    -+	disables it. Packs will be consolidated using the `-A` option
    -+	of `git repack`.
    -++
    -+See the `gc.bigPackThreshold` configuration variable below. When in
    -+use it'll effect how the auto pack limit works.
    - 
    - gc.autoDetach::
    - 	Make `git gc --auto` return immediately and run in background
    -@@
    - 	If non-zero, all packs larger than this limit are kept when
    - 	`git gc` is run. This is very similar to `--keep-base-pack`
    - 	except that all packs that meet the threshold are kept, not
    --	just the base pack. Defaults to zero. Common unit suffixes of
    --	'k', 'm', or 'g' are supported.
    -+	just the base pack. Defaults to zero or a memory heuristic.
    -+	Common unit suffixes of 'k', 'm', or 'g' are supported.
    - +
    - Note that if the number of kept packs is more than gc.autoPackLimit,
    - this configuration variable is ignored, all packs except the base pack
    - will be repacked. After this the number of packs should go below
      gc.autoPackLimit and gc.bigPackThreshold should be respected again.
    -++
    -+If the amount of memory is estimated not enough for `git repack` to
    -+run smoothly and `gc.bigPackThreshold` is not set, the largest pack
    -+will also be excluded (which is the equivalent of running `git gc`
    -+with `--keep-base-pack`).
    + +
    + If the amount of memory estimated for `git repack` to run smoothly is
    +-not available and `gc.bigPackThreshold` is not set, the largest
    +-pack will also be excluded (this is the equivalent of running `git gc`
    +-with `--keep-base-pack`).
    ++not available and `gc.bigPackThreshold` is not set, the largest pack
    ++will also be excluded (this is the equivalent of running `git gc` with
    ++`--keep-base-pack`).
      
      gc.writeCommitGraph::
      	If true, then gc will rewrite the commit-graph file when
    -
    - diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
    - --- a/Documentation/git-gc.txt
    - +++ b/Documentation/git-gc.txt
     @@
    - --auto::
    - 	With this option, 'git gc' checks whether any housekeeping is
    - 	required; if not, it exits without performing any work.
    --	Some git commands run `git gc --auto` after performing
    --	operations that could create many loose objects. Housekeeping
    --	is required if there are too many loose objects or too many
    --	packs in the repository.
    + 	in the middle, the setting applies only to the refs that
    + 	match the <pattern>.
      +
    --If the number of loose objects exceeds the value of the `gc.auto`
    --configuration variable, then all loose objects are combined into a
    --single pack using `git repack -d -l`.  Setting the value of `gc.auto`
    --to 0 disables automatic packing of loose objects.
    -+See the `gc.auto' option in the "CONFIGURATION" below for how this
    -+heuristic works.
    - +
    --If the number of packs exceeds the value of `gc.autoPackLimit`,
    --then existing packs (except those marked with a `.keep` file
    --or over `gc.bigPackThreshold` limit)
    --are consolidated into a single pack by using the `-A` option of
    --'git repack'.
    --If the amount of memory is estimated not enough for `git repack` to
    --run smoothly and `gc.bigPackThreshold` is not set, the largest
    --pack will also be excluded (this is the equivalent of running `git gc`
    --with `--keep-base-pack`).
    --Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
    --packs.
    --+
    --If houskeeping is required due to many loose objects or packs, all
    -+Once housekeeping is triggered by exceeding the limits of
    -+configurations options such as `gc.auto` and `gc.autoPackLimit`, all
    - other housekeeping tasks (e.g. rerere, working trees, reflog...) will
    - be performed as well.
    +-These types of entries are generally created as
    +-a result of using `git commit --amend` or `git rebase` and are the
    +-commits prior to the amend or rebase occurring.  Since these changes
    +-are not part of the current project most users will want to expire
    +-them sooner, which is why the default is more aggressive than
    +-`gc.reflogExpire`.
    ++These types of entries are generally created as a result of using `git
    ++commit --amend` or `git rebase` and are the commits prior to the amend
    ++or rebase occurring.  Since these changes are not part of the current
    ++project most users will want to expire them sooner, which is why the
    ++default is more aggressive than `gc.reflogExpire`.
      
    + gc.rerereResolved::
    + 	Records of conflicted merge you resolved earlier are
 -:  ---------- >  6:  916433ef73 gc docs: note how --aggressive impacts --window & --depth
 4:  257aff2808 !  7:  457357b464 gc docs: downplay the usefulness of --aggressive
    @@ -3,19 +3,17 @@
         gc docs: downplay the usefulness of --aggressive
     
         The existing "gc --aggressive" docs come just short of recommending to
    -    users that they run it regularly. In reality it's a waste of CPU for
    -    most users, and may even make things actively worse. I've personally
    -    talked to many users who've taken these docs as an advice to use this
    -    option, and have.
    +    users that they run it regularly. I've personally talked to many users
    +    who've taken these docs as an advice to use this option, and have,
    +    usually it's (mostly) a waste of time.
     
    -    Let's change this documentation to better reflect reality, i.e. for
    -    most users using --aggressive is a waste of time, and may even be
    -    actively making things worse.
    +    So let's clarify what it really does, and let the user draw their own
    +    conclusions.
     
    -    Let's also clarify the "The effects [...] are persistent" to clearly
    -    note that that's true to the extent that subsequent gc's aren't going
    -    to re-roll existing packs generated with --aggressive into a new set
    -    of packs.
    +    Let's also clarify the "The effects [...] are persistent" to
    +    paraphrase a brief version of Jeff King's explanation at [1].
    +
    +    1. https://public-inbox.org/git/20190318235356.GK29661@sigill.intra.peff.net/
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ -23,27 +21,45 @@
      --- a/Documentation/git-gc.txt
      +++ b/Documentation/git-gc.txt
     @@
    - --aggressive::
    - 	Usually 'git gc' runs very quickly while providing good disk
      	space utilization and performance.  This option will cause
    --	'git gc' to more aggressively optimize the repository at the expense
    --	of taking much more time.  The effects of this optimization are
    + 	'git gc' to more aggressively optimize the repository at the expense
    + 	of taking much more time.  The effects of this optimization are
     -	persistent, so this option only needs to be used occasionally; every
     -	few hundred changesets or so.
    -+	'git gc' to more aggressively optimize the repository to save storage space
    -+	at the expense of taking much more time.
    -++
    -+Using this option may optimize for disk space at the expense of
    -+runtime performance. See the `--depth` and `--window` documentation in
    -+linkgit:git-repack[1]. It is not recommended that this option be used
    -+to improve performance for a given repository without running tailored
    -+performance benchmarks on it. It may make things better, or worse. Not
    -+using this at all is the right trade-off for most users and their
    -+repositories.
    -++
    -+The effects of this option are persistent to the extent that
    -+`gc.autoPackLimit` and friends don't cause a consolidation of existing
    -+pack(s) generated with this option.
    ++	mostly persistent. See the "AGGRESSIVE" section below for details.
      
      --auto::
      	With this option, 'git gc' checks whether any housekeeping is
    +@@
    + 	`.keep` files are consolidated into a single pack. When this
    + 	option is used, `gc.bigPackThreshold` is ignored.
    + 
    ++AGGRESSIVE
    ++----------
    ++
    ++When the `--aggressive` option is supplied, linkgit:git-repack[1] will
    ++be invoked with the `-f` flag, which in turn will pass
    ++`--no-reuse-delta` to linkgit:git-pack-objects[1]. This will throw
    ++away any existing deltas and re-compute them, at the expense of
    ++spending much more time on the repacking.
    ++
    ++The effects of this are mostly persistent, e.g. when packs and loose
    ++objects are coalesced into one another pack the existing deltas in
    ++that pack might get re-used, but there are also various cases where we
    ++might pick a sub-optimal delta from a newer pack instead.
    ++
    ++Furthermore, supplying `--aggressive` will tweak the `--depth` and
    ++`--window` options passed to linkgit:git-repack[1]. See the
    ++`gc.aggressiveDepth` and `gc.aggressiveWindow` settings below. By
    ++using a larger window size we're more likely to find more optimal
    ++deltas.
    ++
    ++It's probably not worth it to use this option on a given repository
    ++without running tailored performance benchmarks on it. It takes a lot
    ++more time, and the resulting space/delta optimization may or may not
    ++be worth it. Not using this at all is the right trade-off for most
    ++users and their repositories.
    ++
    + CONFIGURATION
    + -------------
    + 
 -:  ---------- >  8:  d80a6021f5 gc docs: note "gc --aggressive" in "fast-import"
 -:  ---------- >  9:  a5d31faf6f gc docs: clarify that "gc" doesn't throw away referenced objects
 -:  ---------- > 10:  9fd1203ad5 gc docs: remove incorrect reference to gc.auto=0
-- 
2.21.0.360.g471c308f928


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 01/10] gc docs: modernize the advice for manually running "gc"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-22  6:01   ` Junio C Hamano
  2019-03-21 20:50 ` [PATCH v2 02/10] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

The docs have been recommending that users need to run this manually,
but that hasn't been needed in practice for a long time except in
exceptional circumstances.

Let's instead have this reflect reality and say that most users don't
need to run this manually at all, while briefly describing the sorts
sort of cases where "gc" does need to be run manually.

Since we're recommending that users run this most of the and usually
don't need to tweak it, let's tone down the very prominent example of
the gc.auto=0 command. It's sufficient to point to the gc.auto
documentation below.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index a7c1b0f60e..774503e33d 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -20,17 +20,16 @@ created from prior invocations of 'git add', packing refs, pruning
 reflog, rerere metadata or stale working trees. May also update ancillary
 indexes such as the commit-graph.
 
-Users are encouraged to run this task on a regular basis within
-each repository to maintain good disk space utilization and good
-operating performance.
-
-Some git commands may automatically run 'git gc'; see the `--auto` flag
-below for details. If you know what you're doing and all you want is to
-disable this behavior permanently without further considerations, just do:
-
-----------------------
-$ git config --global gc.auto 0
-----------------------
+When common porcelain operations that creates objects are run, they
+will check whether the repository has grown substantially since the
+last maintenance, and if so run `git gc` automatically. See `gc.auto`
+below for how to disable this behavior.
+
+Running `git gc` manually should only be needed when adding objects to
+a repository without regularly running such porcelain commands, to do
+a one-off repository optimization, or e.g. to clean up a suboptimal
+mass-import. See the "PACKFILE OPTIMIZATION" section in
+linkgit:git-fast-import[1] for more details on the import case.
 
 OPTIONS
 -------
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 02/10] gc docs: stop noting "repack" flags
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 01/10] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 03/10] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Remove the mention of specific flags from the "gc" documentation, and
leave it at describing what we'll do instead. As seen in builtin/gc.c
we'll use various repack flags depending on what we detect we need to
do, so this isn't always accurate.

More importantly, a subsequent change is about to remove all this
documentation and replace it with an include of the gc.* docs in
git-config(1). By first changing this it's easier to reason about that
subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 774503e33d..95c3237f8e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -52,14 +52,13 @@ OPTIONS
 +
 If the number of loose objects exceeds the value of the `gc.auto`
 configuration variable, then all loose objects are combined into a
-single pack using `git repack -d -l`.  Setting the value of `gc.auto`
+single pack.  Setting the value of `gc.auto`
 to 0 disables automatic packing of loose objects.
 +
 If the number of packs exceeds the value of `gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file
 or over `gc.bigPackThreshold` limit)
-are consolidated into a single pack by using the `-A` option of
-'git repack'.
+are consolidated into a single pack.
 If the amount of memory is estimated not enough for `git repack` to
 run smoothly and `gc.bigPackThreshold` is not set, the largest
 pack will also be excluded (this is the equivalent of running `git gc`
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 03/10] gc docs: clean grammar for "gc.bigPackThreshold"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (7 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 02/10] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 04/10] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Clean up the grammar in the documentation for
"gc.bigPackThreshold". This documentation was added in 9806f5a7bf ("gc
--auto: exclude base pack if not enough mem to "repack -ad"",
2018-04-15).

Saying "the amount of memory estimated for" flows more smoothly than
the previous "the amount of memory is estimated not enough".

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 95c3237f8e..c31fe581d9 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -59,8 +59,8 @@ If the number of packs exceeds the value of `gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file
 or over `gc.bigPackThreshold` limit)
 are consolidated into a single pack.
-If the amount of memory is estimated not enough for `git repack` to
-run smoothly and `gc.bigPackThreshold` is not set, the largest
+If the amount of memory estimated for `git repack` to run smoothly is
+not available and `gc.bigPackThreshold` is not set, the largest
 pack will also be excluded (this is the equivalent of running `git gc`
 with `--keep-base-pack`).
 Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 04/10] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (8 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 03/10] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 05/10] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Rather than duplicating the documentation for the various "gc" options
let's include the "gc" docs from git-config. They were mostly better
already, and now we don't have the same docs in two places with subtly
different wording.

In the cases where the git-gc(1) docs were saying something the "gc"
docs in git-config(1) didn't cover move the relevant section over to
the git-config(1) docs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 29 ++++++++++++-
 Documentation/git-gc.txt    | 86 +++----------------------------------
 2 files changed, 35 insertions(+), 80 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index c6fbb8a96f..a255ae67b0 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -2,24 +2,39 @@ gc.aggressiveDepth::
 	The depth parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 50.
++
+See the documentation for the `--depth` option in
+linkgit:git-repack[1] for more details.
 
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 250.
++
+See the documentation for the `--window` option in
+linkgit:git-repack[1] for more details.
 
 gc.auto::
 	When there are approximately more than this many loose
 	objects in the repository, `git gc --auto` will pack them.
 	Some Porcelain commands use this command to perform a
 	light-weight garbage collection from time to time.  The
-	default value is 6700.  Setting this to 0 disables it.
+	default value is 6700.
++
+Setting this to 0 disables not only automatic packing based on the
+number of loose objects, but any other heuristic `git gc --auto` will
+otherwise use to determine if there's work to do, such as
+`gc.autoPackLimit`.
 
 gc.autoPackLimit::
 	When there are more than this many packs that are not
 	marked with `*.keep` file in the repository, `git gc
 	--auto` consolidates them into one larger pack.  The
 	default	value is 50.  Setting this to 0 disables it.
+	Setting `gc.auto` to 0 will also disable this.
++
+See the `gc.bigPackThreshold` configuration variable below. When in
+use, it'll affect how the auto pack limit works.
 
 gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
@@ -36,6 +51,11 @@ Note that if the number of kept packs is more than gc.autoPackLimit,
 this configuration variable is ignored, all packs except the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
++
+If the amount of memory estimated for `git repack` to run smoothly is
+not available and `gc.bigPackThreshold` is not set, the largest
+pack will also be excluded (this is the equivalent of running `git gc`
+with `--keep-base-pack`).
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
@@ -94,6 +114,13 @@ gc.<pattern>.reflogExpireUnreachable::
 	With "<pattern>" (e.g. "refs/stash")
 	in the middle, the setting applies only to the refs that
 	match the <pattern>.
++
+These types of entries are generally created as
+a result of using `git commit --amend` or `git rebase` and are the
+commits prior to the amend or rebase occurring.  Since these changes
+are not part of the current project most users will want to expire
+them sooner, which is why the default is more aggressive than
+`gc.reflogExpire`.
 
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index c31fe581d9..ba1ff9b4cf 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -45,28 +45,12 @@ OPTIONS
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
 	required; if not, it exits without performing any work.
-	Some git commands run `git gc --auto` after performing
-	operations that could create many loose objects. Housekeeping
-	is required if there are too many loose objects or too many
-	packs in the repository.
 +
-If the number of loose objects exceeds the value of the `gc.auto`
-configuration variable, then all loose objects are combined into a
-single pack.  Setting the value of `gc.auto`
-to 0 disables automatic packing of loose objects.
+See the `gc.auto' option in the "CONFIGURATION" section below for how
+this heuristic works.
 +
-If the number of packs exceeds the value of `gc.autoPackLimit`,
-then existing packs (except those marked with a `.keep` file
-or over `gc.bigPackThreshold` limit)
-are consolidated into a single pack.
-If the amount of memory estimated for `git repack` to run smoothly is
-not available and `gc.bigPackThreshold` is not set, the largest
-pack will also be excluded (this is the equivalent of running `git gc`
-with `--keep-base-pack`).
-Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
-packs.
-+
-If houskeeping is required due to many loose objects or packs, all
+Once housekeeping is triggered by exceeding the limits of
+configuration options such as `gc.auto` and `gc.autoPackLimit`, all
 other housekeeping tasks (e.g. rerere, working trees, reflog...) will
 be performed as well.
 
@@ -97,66 +81,10 @@ be performed as well.
 CONFIGURATION
 -------------
 
-The optional configuration variable `gc.reflogExpire` can be
-set to indicate how long historical entries within each branch's
-reflog should remain available in this repository.  The setting is
-expressed as a length of time, for example '90 days' or '3 months'.
-It defaults to '90 days'.
-
-The optional configuration variable `gc.reflogExpireUnreachable`
-can be set to indicate how long historical reflog entries which
-are not part of the current branch should remain available in
-this repository.  These types of entries are generally created as
-a result of using `git commit --amend` or `git rebase` and are the
-commits prior to the amend or rebase occurring.  Since these changes
-are not part of the current project most users will want to expire
-them sooner.  This option defaults to '30 days'.
-
-The above two configuration variables can be given to a pattern.  For
-example, this sets non-default expiry values only to remote-tracking
-branches:
-
-------------
-[gc "refs/remotes/*"]
-	reflogExpire = never
-	reflogExpireUnreachable = 3 days
-------------
-
-The optional configuration variable `gc.rerereResolved` indicates
-how long records of conflicted merge you resolved earlier are
-kept.  This defaults to 60 days.
-
-The optional configuration variable `gc.rerereUnresolved` indicates
-how long records of conflicted merge you have not resolved are
-kept.  This defaults to 15 days.
-
-The optional configuration variable `gc.packRefs` determines if
-'git gc' runs 'git pack-refs'. This can be set to "notbare" to enable
-it within all non-bare repos or it can be set to a boolean value.
-This defaults to true.
-
-The optional configuration variable `gc.writeCommitGraph` determines if
-'git gc' should run 'git commit-graph write'. This can be set to a
-boolean value. This defaults to false.
-
-The optional configuration variable `gc.aggressiveWindow` controls how
-much time is spent optimizing the delta compression of the objects in
-the repository when the --aggressive option is specified.  The larger
-the value, the more time is spent optimizing the delta compression.  See
-the documentation for the --window option in linkgit:git-repack[1] for
-more details.  This defaults to 250.
-
-Similarly, the optional configuration variable `gc.aggressiveDepth`
-controls --depth option in linkgit:git-repack[1]. This defaults to 50.
-
-The optional configuration variable `gc.pruneExpire` controls how old
-the unreferenced loose objects have to be before they are pruned.  The
-default is "2 weeks ago".
-
-Optional configuration variable `gc.worktreePruneExpire` controls how
-old a stale working tree should be before `git worktree prune` deletes
-it. Default is "3 months ago".
+The below documentation is the same as what's found in
+linkgit:git-config[1]:
 
+include::config/gc.txt[]
 
 NOTES
 -----
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 05/10] gc docs: re-flow the "gc.*" section in "config"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (9 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 04/10] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 06/10] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Re-flow the "gc.*" section in "config". A previous commit moved this
over from the "gc" docs, but tried to keep as many of the lines
identical to benefit from diff's move detection.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index a255ae67b0..3e7fc052d9 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -53,9 +53,9 @@ will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 +
 If the amount of memory estimated for `git repack` to run smoothly is
-not available and `gc.bigPackThreshold` is not set, the largest
-pack will also be excluded (this is the equivalent of running `git gc`
-with `--keep-base-pack`).
+not available and `gc.bigPackThreshold` is not set, the largest pack
+will also be excluded (this is the equivalent of running `git gc` with
+`--keep-base-pack`).
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
@@ -115,12 +115,11 @@ gc.<pattern>.reflogExpireUnreachable::
 	in the middle, the setting applies only to the refs that
 	match the <pattern>.
 +
-These types of entries are generally created as
-a result of using `git commit --amend` or `git rebase` and are the
-commits prior to the amend or rebase occurring.  Since these changes
-are not part of the current project most users will want to expire
-them sooner, which is why the default is more aggressive than
-`gc.reflogExpire`.
+These types of entries are generally created as a result of using `git
+commit --amend` or `git rebase` and are the commits prior to the amend
+or rebase occurring.  Since these changes are not part of the current
+project most users will want to expire them sooner, which is why the
+default is more aggressive than `gc.reflogExpire`.
 
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 06/10] gc docs: note how --aggressive impacts --window & --depth
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (10 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 05/10] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 07/10] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Since 07e7dbf0db (gc: default aggressive depth to 50, 2016-08-11) we
somewhat confusingly use the same depth under --aggressive as we do by
default.

As noted in that commit that makes sense, it was wrong to make more
depth the default for "aggressive", and thus save disk space at the
expense of runtime performance, which is usually the opposite of
someone who'd like "aggressive gc" wants.

But that's left us with a mostly-redundant configuration variable, so
let's clearly note in its documentation that it doesn't change the
default.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 3e7fc052d9..0daa4683f6 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -1,7 +1,8 @@
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
-	to 50.
+	to 50, which is the default for the `--depth` option when
+	`--aggressive` isn't in use.
 +
 See the documentation for the `--depth` option in
 linkgit:git-repack[1] for more details.
@@ -9,7 +10,8 @@ linkgit:git-repack[1] for more details.
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
-	to 250.
+	to 250, which is a much more aggressive window size than
+	the default `--window` of 10.
 +
 See the documentation for the `--window` option in
 linkgit:git-repack[1] for more details.
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 07/10] gc docs: downplay the usefulness of --aggressive
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (11 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 06/10] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 08/10] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

The existing "gc --aggressive" docs come just short of recommending to
users that they run it regularly. I've personally talked to many users
who've taken these docs as an advice to use this option, and have,
usually it's (mostly) a waste of time.

So let's clarify what it really does, and let the user draw their own
conclusions.

Let's also clarify the "The effects [...] are persistent" to
paraphrase a brief version of Jeff King's explanation at [1].

1. https://public-inbox.org/git/20190318235356.GK29661@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index ba1ff9b4cf..c50ec30c83 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -39,8 +39,7 @@ OPTIONS
 	space utilization and performance.  This option will cause
 	'git gc' to more aggressively optimize the repository at the expense
 	of taking much more time.  The effects of this optimization are
-	persistent, so this option only needs to be used occasionally; every
-	few hundred changesets or so.
+	mostly persistent. See the "AGGRESSIVE" section below for details.
 
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
@@ -78,6 +77,32 @@ be performed as well.
 	`.keep` files are consolidated into a single pack. When this
 	option is used, `gc.bigPackThreshold` is ignored.
 
+AGGRESSIVE
+----------
+
+When the `--aggressive` option is supplied, linkgit:git-repack[1] will
+be invoked with the `-f` flag, which in turn will pass
+`--no-reuse-delta` to linkgit:git-pack-objects[1]. This will throw
+away any existing deltas and re-compute them, at the expense of
+spending much more time on the repacking.
+
+The effects of this are mostly persistent, e.g. when packs and loose
+objects are coalesced into one another pack the existing deltas in
+that pack might get re-used, but there are also various cases where we
+might pick a sub-optimal delta from a newer pack instead.
+
+Furthermore, supplying `--aggressive` will tweak the `--depth` and
+`--window` options passed to linkgit:git-repack[1]. See the
+`gc.aggressiveDepth` and `gc.aggressiveWindow` settings below. By
+using a larger window size we're more likely to find more optimal
+deltas.
+
+It's probably not worth it to use this option on a given repository
+without running tailored performance benchmarks on it. It takes a lot
+more time, and the resulting space/delta optimization may or may not
+be worth it. Not using this at all is the right trade-off for most
+users and their repositories.
+
 CONFIGURATION
 -------------
 
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 08/10] gc docs: note "gc --aggressive" in "fast-import"
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (12 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 07/10] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 09/10] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 10/10] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Amend the "PACKFILE OPTIMIZATION" section in "fast-import" to explain
that simply running "git gc --aggressive" after a "fast-import" should
properly optimize the repository. This is simpler and more effective
than the existing "repack" advice (which I'm keeping as it helps
explain things) because it e.g. also packs the newly imported refs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-fast-import.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 43ab3b1637..2248755cb7 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -1396,6 +1396,13 @@ deltas are suboptimal (see above) then also adding the `-f` option
 to force recomputation of all deltas can significantly reduce the
 final packfile size (30-50% smaller can be quite typical).
 
+Instead of running `git repack` you can also run `git gc
+--aggressive`, which will also optimize other things after an import
+(e.g. pack loose refs). As noted in the "AGGRESSIVE" section in
+linkgit:git-gc[1] the `--aggressive` option will find new deltas with
+the `-f` option to linkgit:git-repack[1]. For the reasons elaborated
+on above using `--aggressive` after a fast-import is one of the few
+cases where it's known to be worthwhile.
 
 MEMORY UTILIZATION
 ------------------
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 09/10] gc docs: clarify that "gc" doesn't throw away referenced objects
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (13 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 08/10] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  2019-03-21 20:50 ` [PATCH v2 10/10] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Amend the "NOTES" section to fix up wording that's been with us since
3ffb58be0a ("doc/git-gc: add a note about what is collected",
2008-04-23).

I can't remember when/where anymore (I think Freenode #Git), but at
some point I was having a conversation with someone who was convinced
that "gc" would prune things only referenced by e.g. refs/pull/*, and
pointed to this section as proof.

It turned out that they'd read the "branches and tags" wording here
and thought just refs/{heads,tags}/* and refs/remotes/* etc. would be
kept, which is what we enumerate explicitly.

So let's say "other refs", even though just above we say "objects that
are referenced anywhere in your repository".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index c50ec30c83..dced7cde09 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,8 +119,8 @@ anywhere in your repository. In
 particular, it will keep not only objects referenced by your current set
 of branches and tags, but also objects referenced by the index,
 remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, or reflogs (which may reference commits in branches
-that were later amended or rewound).
+refs/original/, reflogs (which may reference commits in branches
+that were later amended or rewound), and anything else in the refs/* namespace.
 If you are expecting some objects to be deleted and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 10/10] gc docs: remove incorrect reference to gc.auto=0
  2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                   ` (14 preceding siblings ...)
  2019-03-21 20:50 ` [PATCH v2 09/10] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
@ 2019-03-21 20:50 ` Ævar Arnfjörð Bjarmason
  15 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21 20:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

The chance of a repository being corrupted due to a "gc" has nothing
to do with whether or not that "gc" was invoked via "gc --auto", but
whether there's other concurrent operations happening.

This is already noted earlier in the paragraph, so there's no reason
to suggest this here. The user can infer from the rest of the
documentation that "gc" will run automatically unless gc.auto=0 is
set, and we shouldn't confuse the issue by implying that "gc --auto"
is somehow more prone to produce corruption than a normal "gc".

Well, it is in the sense that a blocking "gc" would stop you from
doing anything else in *that* particular terminal window, but users
are likely to have another window, or to be worried about how
concurrent "gc" on a server might cause corruption.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index dced7cde09..1826d7e3bb 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -141,8 +141,7 @@ mitigate this problem:
 
 However, these features fall short of a complete solution, so users who
 run commands concurrently have to live with some risk of corruption (which
-seems to be low in practice) unless they turn off automatic garbage
-collection with 'git config gc.auto 0'.
+seems to be low in practice).
 
 HOOKS
 -----
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-18 21:31   ` Jeff King
@ 2019-03-21 22:11     ` Andreas Heiduk
  0 siblings, 0 replies; 64+ messages in thread
From: Andreas Heiduk @ 2019-03-21 22:11 UTC (permalink / raw)
  To: Jeff King, Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On 18.03.19 at 22:31 Jeff King wrote:
> On Mon, Mar 18, 2019 at 05:15:00PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
>> Rather than duplicating the documentation for the various "gc" options
>> let's include the "gc" docs from git-config. They were mostly better
>> already, and now we don't have the same docs in two places with subtly
>> different wording.
>>
>> In the cases where the git-gc(1) docs were saying something the "gc"
>> docs in git-config(1) didn't cover move the relevant section over to
>> the git-config(1) docs.
> 
> Makes sense.
> 
> I think we lose the actual example for gc.*.reflogExpire:
> 
>> -The above two configuration variables can be given to a pattern.  For
>> -example, this sets non-default expiry values only to remote-tracking
>> -branches:
>> -
>> -------------
>> -[gc "refs/remotes/*"]
>> -	reflogExpire = never
>> -	reflogExpireUnreachable = 3 days
>> -------------
> 
> I don't actually think it's that big a loss. If we wanted to retain it,
> though, it might make sense in the "EXAMPLES" section.

Coincidentally I stumbled over that example a  month ago and immediately
put `never` into my configuraton for a certain - often rebased branch
(something like "pu" here).

> 
> -Peff
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 01/10] gc docs: modernize the advice for manually running "gc"
  2019-03-21 20:50 ` [PATCH v2 01/10] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-22  6:01   ` Junio C Hamano
  0 siblings, 0 replies; 64+ messages in thread
From: Junio C Hamano @ 2019-03-22  6:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> +When common porcelain operations that creates objects are run, they

"operations that create objects are run"?

> +will check whether the repository has grown substantially since the
> +last maintenance, and if so run `git gc` automatically. See `gc.auto`
> +below for how to disable this behavior.
> +
> +Running `git gc` manually should only be needed when adding objects to
> +a repository without regularly running such porcelain commands, to do
> +a one-off repository optimization, or e.g. to clean up a suboptimal
> +mass-import. See the "PACKFILE OPTIMIZATION" section in
> +linkgit:git-fast-import[1] for more details on the import case.
>  
>  OPTIONS
>  -------

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v3 00/11] gc docs: modernize the advice for manually running "gc"
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 01/11] " Ævar Arnfjörð Bjarmason
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Patch v3 fixes a minor grammar issue noted by Junio in
<xmqqtvfvphv6.fsf@gitster-ct.c.googlers.com>, and another "while I'm
at it" formatting error.

Ævar Arnfjörð Bjarmason (11):
  gc docs: modernize the advice for manually running "gc"
  gc docs: stop noting "repack" flags
  gc docs: clean grammar for "gc.bigPackThreshold"
  gc docs: include the "gc.*" section from "config" in "gc"
  gc docs: re-flow the "gc.*" section in "config"
  gc docs: fix formatting for "gc.writeCommitGraph"
  gc docs: note how --aggressive impacts --window & --depth
  gc docs: downplay the usefulness of --aggressive
  gc docs: note "gc --aggressive" in "fast-import"
  gc docs: clarify that "gc" doesn't throw away referenced objects
  gc docs: remove incorrect reference to gc.auto=0

 Documentation/config/gc.txt       |  38 ++++++--
 Documentation/git-fast-import.txt |   7 ++
 Documentation/git-gc.txt          | 142 ++++++++++--------------------
 3 files changed, 86 insertions(+), 101 deletions(-)

Range-diff:
 1:  89719142c7 !  1:  a48ef8d5d8 gc docs: modernize the advice for manually running "gc"
    @@ -35,7 +35,7 @@
     -----------------------
     -$ git config --global gc.auto 0
     -----------------------
    -+When common porcelain operations that creates objects are run, they
    ++When common porcelain operations that create objects are run, they
     +will check whether the repository has grown substantially since the
     +last maintenance, and if so run `git gc` automatically. See `gc.auto`
     +below for how to disable this behavior.
 2:  d90a5b1b4c =  2:  21e66a7903 gc docs: stop noting "repack" flags
 3:  fedd9bb886 =  3:  c8a1342e34 gc docs: clean grammar for "gc.bigPackThreshold"
 4:  6fad05a67c =  4:  9163e2f885 gc docs: include the "gc.*" section from "config" in "gc"
 5:  994e22a0d6 =  5:  8fa0e26671 gc docs: re-flow the "gc.*" section in "config"
 -:  ---------- >  6:  b70396f029 gc docs: fix formatting for "gc.writeCommitGraph"
 6:  916433ef73 =  7:  04ee81a3c9 gc docs: note how --aggressive impacts --window & --depth
 7:  457357b464 =  8:  04af0afbcf gc docs: downplay the usefulness of --aggressive
 8:  d80a6021f5 =  9:  c35bc94416 gc docs: note "gc --aggressive" in "fast-import"
 9:  a5d31faf6f = 10:  702f2cd2d9 gc docs: clarify that "gc" doesn't throw away referenced objects
10:  9fd1203ad5 = 11:  08af3cc3ee gc docs: remove incorrect reference to gc.auto=0
-- 
2.21.0.360.g471c308f928


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v3 01/11] gc docs: modernize the advice for manually running "gc"
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 00/11] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 02/11] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

The docs have been recommending that users need to run this manually,
but that hasn't been needed in practice for a long time except in
exceptional circumstances.

Let's instead have this reflect reality and say that most users don't
need to run this manually at all, while briefly describing the sorts
sort of cases where "gc" does need to be run manually.

Since we're recommending that users run this most of the and usually
don't need to tweak it, let's tone down the very prominent example of
the gc.auto=0 command. It's sufficient to point to the gc.auto
documentation below.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index a7c1b0f60e..dd22eecc79 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -20,17 +20,16 @@ created from prior invocations of 'git add', packing refs, pruning
 reflog, rerere metadata or stale working trees. May also update ancillary
 indexes such as the commit-graph.
 
-Users are encouraged to run this task on a regular basis within
-each repository to maintain good disk space utilization and good
-operating performance.
-
-Some git commands may automatically run 'git gc'; see the `--auto` flag
-below for details. If you know what you're doing and all you want is to
-disable this behavior permanently without further considerations, just do:
-
-----------------------
-$ git config --global gc.auto 0
-----------------------
+When common porcelain operations that create objects are run, they
+will check whether the repository has grown substantially since the
+last maintenance, and if so run `git gc` automatically. See `gc.auto`
+below for how to disable this behavior.
+
+Running `git gc` manually should only be needed when adding objects to
+a repository without regularly running such porcelain commands, to do
+a one-off repository optimization, or e.g. to clean up a suboptimal
+mass-import. See the "PACKFILE OPTIMIZATION" section in
+linkgit:git-fast-import[1] for more details on the import case.
 
 OPTIONS
 -------
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 02/11] gc docs: stop noting "repack" flags
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 00/11] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 01/11] " Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 03/11] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Remove the mention of specific flags from the "gc" documentation, and
leave it at describing what we'll do instead. As seen in builtin/gc.c
we'll use various repack flags depending on what we detect we need to
do, so this isn't always accurate.

More importantly, a subsequent change is about to remove all this
documentation and replace it with an include of the gc.* docs in
git-config(1). By first changing this it's easier to reason about that
subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index dd22eecc79..c56f4f7cde 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -52,14 +52,13 @@ OPTIONS
 +
 If the number of loose objects exceeds the value of the `gc.auto`
 configuration variable, then all loose objects are combined into a
-single pack using `git repack -d -l`.  Setting the value of `gc.auto`
+single pack.  Setting the value of `gc.auto`
 to 0 disables automatic packing of loose objects.
 +
 If the number of packs exceeds the value of `gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file
 or over `gc.bigPackThreshold` limit)
-are consolidated into a single pack by using the `-A` option of
-'git repack'.
+are consolidated into a single pack.
 If the amount of memory is estimated not enough for `git repack` to
 run smoothly and `gc.bigPackThreshold` is not set, the largest
 pack will also be excluded (this is the equivalent of running `git gc`
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 03/11] gc docs: clean grammar for "gc.bigPackThreshold"
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (2 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 02/11] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Clean up the grammar in the documentation for
"gc.bigPackThreshold". This documentation was added in 9806f5a7bf ("gc
--auto: exclude base pack if not enough mem to "repack -ad"",
2018-04-15).

Saying "the amount of memory estimated for" flows more smoothly than
the previous "the amount of memory is estimated not enough".

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index c56f4f7cde..66386439b7 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -59,8 +59,8 @@ If the number of packs exceeds the value of `gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file
 or over `gc.bigPackThreshold` limit)
 are consolidated into a single pack.
-If the amount of memory is estimated not enough for `git repack` to
-run smoothly and `gc.bigPackThreshold` is not set, the largest
+If the amount of memory estimated for `git repack` to run smoothly is
+not available and `gc.bigPackThreshold` is not set, the largest
 pack will also be excluded (this is the equivalent of running `git gc`
 with `--keep-base-pack`).
 Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (3 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 03/11] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-30 18:04     ` Todd Zullinger
                       ` (12 more replies)
  2019-03-22  9:32   ` [PATCH v3 05/11] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
                     ` (6 subsequent siblings)
  11 siblings, 13 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Rather than duplicating the documentation for the various "gc" options
let's include the "gc" docs from git-config. They were mostly better
already, and now we don't have the same docs in two places with subtly
different wording.

In the cases where the git-gc(1) docs were saying something the "gc"
docs in git-config(1) didn't cover move the relevant section over to
the git-config(1) docs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 29 ++++++++++++-
 Documentation/git-gc.txt    | 86 +++----------------------------------
 2 files changed, 35 insertions(+), 80 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index c6fbb8a96f..a255ae67b0 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -2,24 +2,39 @@ gc.aggressiveDepth::
 	The depth parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 50.
++
+See the documentation for the `--depth` option in
+linkgit:git-repack[1] for more details.
 
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 250.
++
+See the documentation for the `--window` option in
+linkgit:git-repack[1] for more details.
 
 gc.auto::
 	When there are approximately more than this many loose
 	objects in the repository, `git gc --auto` will pack them.
 	Some Porcelain commands use this command to perform a
 	light-weight garbage collection from time to time.  The
-	default value is 6700.  Setting this to 0 disables it.
+	default value is 6700.
++
+Setting this to 0 disables not only automatic packing based on the
+number of loose objects, but any other heuristic `git gc --auto` will
+otherwise use to determine if there's work to do, such as
+`gc.autoPackLimit`.
 
 gc.autoPackLimit::
 	When there are more than this many packs that are not
 	marked with `*.keep` file in the repository, `git gc
 	--auto` consolidates them into one larger pack.  The
 	default	value is 50.  Setting this to 0 disables it.
+	Setting `gc.auto` to 0 will also disable this.
++
+See the `gc.bigPackThreshold` configuration variable below. When in
+use, it'll affect how the auto pack limit works.
 
 gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
@@ -36,6 +51,11 @@ Note that if the number of kept packs is more than gc.autoPackLimit,
 this configuration variable is ignored, all packs except the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
++
+If the amount of memory estimated for `git repack` to run smoothly is
+not available and `gc.bigPackThreshold` is not set, the largest
+pack will also be excluded (this is the equivalent of running `git gc`
+with `--keep-base-pack`).
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
@@ -94,6 +114,13 @@ gc.<pattern>.reflogExpireUnreachable::
 	With "<pattern>" (e.g. "refs/stash")
 	in the middle, the setting applies only to the refs that
 	match the <pattern>.
++
+These types of entries are generally created as
+a result of using `git commit --amend` or `git rebase` and are the
+commits prior to the amend or rebase occurring.  Since these changes
+are not part of the current project most users will want to expire
+them sooner, which is why the default is more aggressive than
+`gc.reflogExpire`.
 
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 66386439b7..c037a46b09 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -45,28 +45,12 @@ OPTIONS
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
 	required; if not, it exits without performing any work.
-	Some git commands run `git gc --auto` after performing
-	operations that could create many loose objects. Housekeeping
-	is required if there are too many loose objects or too many
-	packs in the repository.
 +
-If the number of loose objects exceeds the value of the `gc.auto`
-configuration variable, then all loose objects are combined into a
-single pack.  Setting the value of `gc.auto`
-to 0 disables automatic packing of loose objects.
+See the `gc.auto' option in the "CONFIGURATION" section below for how
+this heuristic works.
 +
-If the number of packs exceeds the value of `gc.autoPackLimit`,
-then existing packs (except those marked with a `.keep` file
-or over `gc.bigPackThreshold` limit)
-are consolidated into a single pack.
-If the amount of memory estimated for `git repack` to run smoothly is
-not available and `gc.bigPackThreshold` is not set, the largest
-pack will also be excluded (this is the equivalent of running `git gc`
-with `--keep-base-pack`).
-Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
-packs.
-+
-If houskeeping is required due to many loose objects or packs, all
+Once housekeeping is triggered by exceeding the limits of
+configuration options such as `gc.auto` and `gc.autoPackLimit`, all
 other housekeeping tasks (e.g. rerere, working trees, reflog...) will
 be performed as well.
 
@@ -97,66 +81,10 @@ be performed as well.
 CONFIGURATION
 -------------
 
-The optional configuration variable `gc.reflogExpire` can be
-set to indicate how long historical entries within each branch's
-reflog should remain available in this repository.  The setting is
-expressed as a length of time, for example '90 days' or '3 months'.
-It defaults to '90 days'.
-
-The optional configuration variable `gc.reflogExpireUnreachable`
-can be set to indicate how long historical reflog entries which
-are not part of the current branch should remain available in
-this repository.  These types of entries are generally created as
-a result of using `git commit --amend` or `git rebase` and are the
-commits prior to the amend or rebase occurring.  Since these changes
-are not part of the current project most users will want to expire
-them sooner.  This option defaults to '30 days'.
-
-The above two configuration variables can be given to a pattern.  For
-example, this sets non-default expiry values only to remote-tracking
-branches:
-
-------------
-[gc "refs/remotes/*"]
-	reflogExpire = never
-	reflogExpireUnreachable = 3 days
-------------
-
-The optional configuration variable `gc.rerereResolved` indicates
-how long records of conflicted merge you resolved earlier are
-kept.  This defaults to 60 days.
-
-The optional configuration variable `gc.rerereUnresolved` indicates
-how long records of conflicted merge you have not resolved are
-kept.  This defaults to 15 days.
-
-The optional configuration variable `gc.packRefs` determines if
-'git gc' runs 'git pack-refs'. This can be set to "notbare" to enable
-it within all non-bare repos or it can be set to a boolean value.
-This defaults to true.
-
-The optional configuration variable `gc.writeCommitGraph` determines if
-'git gc' should run 'git commit-graph write'. This can be set to a
-boolean value. This defaults to false.
-
-The optional configuration variable `gc.aggressiveWindow` controls how
-much time is spent optimizing the delta compression of the objects in
-the repository when the --aggressive option is specified.  The larger
-the value, the more time is spent optimizing the delta compression.  See
-the documentation for the --window option in linkgit:git-repack[1] for
-more details.  This defaults to 250.
-
-Similarly, the optional configuration variable `gc.aggressiveDepth`
-controls --depth option in linkgit:git-repack[1]. This defaults to 50.
-
-The optional configuration variable `gc.pruneExpire` controls how old
-the unreferenced loose objects have to be before they are pruned.  The
-default is "2 weeks ago".
-
-Optional configuration variable `gc.worktreePruneExpire` controls how
-old a stale working tree should be before `git worktree prune` deletes
-it. Default is "3 months ago".
+The below documentation is the same as what's found in
+linkgit:git-config[1]:
 
+include::config/gc.txt[]
 
 NOTES
 -----
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 05/11] gc docs: re-flow the "gc.*" section in "config"
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (4 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 06/11] gc docs: fix formatting for "gc.writeCommitGraph" Ævar Arnfjörð Bjarmason
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Re-flow the "gc.*" section in "config". A previous commit moved this
over from the "gc" docs, but tried to keep as many of the lines
identical to benefit from diff's move detection.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index a255ae67b0..3e7fc052d9 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -53,9 +53,9 @@ will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 +
 If the amount of memory estimated for `git repack` to run smoothly is
-not available and `gc.bigPackThreshold` is not set, the largest
-pack will also be excluded (this is the equivalent of running `git gc`
-with `--keep-base-pack`).
+not available and `gc.bigPackThreshold` is not set, the largest pack
+will also be excluded (this is the equivalent of running `git gc` with
+`--keep-base-pack`).
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
@@ -115,12 +115,11 @@ gc.<pattern>.reflogExpireUnreachable::
 	in the middle, the setting applies only to the refs that
 	match the <pattern>.
 +
-These types of entries are generally created as
-a result of using `git commit --amend` or `git rebase` and are the
-commits prior to the amend or rebase occurring.  Since these changes
-are not part of the current project most users will want to expire
-them sooner, which is why the default is more aggressive than
-`gc.reflogExpire`.
+These types of entries are generally created as a result of using `git
+commit --amend` or `git rebase` and are the commits prior to the amend
+or rebase occurring.  Since these changes are not part of the current
+project most users will want to expire them sooner, which is why the
+default is more aggressive than `gc.reflogExpire`.
 
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 06/11] gc docs: fix formatting for "gc.writeCommitGraph"
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (5 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 05/11] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 07/11] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Change the AsciiDoc formatting so that an example of "gc --auto" isn't
rendered as "git-gc(1) --auto", but as "git gc --auto". This is
consistent with the rest of the links and command examples in this
documentation.

The formatting I'm changing was initially introduced in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 3e7fc052d9..56918a5008 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -59,8 +59,8 @@ will also be excluded (this is the equivalent of running `git gc` with
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
-	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
-	'--auto' the commit-graph will be updated if housekeeping is
+	linkgit:git-gc[1] is run. When using `git gc --auto`
+	the commit-graph will be updated if housekeeping is
 	required. Default is false. See linkgit:git-commit-graph[1]
 	for details.
 
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 07/11] gc docs: note how --aggressive impacts --window & --depth
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (6 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 06/11] gc docs: fix formatting for "gc.writeCommitGraph" Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 08/11] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Since 07e7dbf0db (gc: default aggressive depth to 50, 2016-08-11) we
somewhat confusingly use the same depth under --aggressive as we do by
default.

As noted in that commit that makes sense, it was wrong to make more
depth the default for "aggressive", and thus save disk space at the
expense of runtime performance, which is usually the opposite of
someone who'd like "aggressive gc" wants.

But that's left us with a mostly-redundant configuration variable, so
let's clearly note in its documentation that it doesn't change the
default.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 56918a5008..f732fe5bfd 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -1,7 +1,8 @@
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
-	to 50.
+	to 50, which is the default for the `--depth` option when
+	`--aggressive` isn't in use.
 +
 See the documentation for the `--depth` option in
 linkgit:git-repack[1] for more details.
@@ -9,7 +10,8 @@ linkgit:git-repack[1] for more details.
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
-	to 250.
+	to 250, which is a much more aggressive window size than
+	the default `--window` of 10.
 +
 See the documentation for the `--window` option in
 linkgit:git-repack[1] for more details.
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 08/11] gc docs: downplay the usefulness of --aggressive
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (7 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 07/11] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 09/11] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

The existing "gc --aggressive" docs come just short of recommending to
users that they run it regularly. I've personally talked to many users
who've taken these docs as an advice to use this option, and have,
usually it's (mostly) a waste of time.

So let's clarify what it really does, and let the user draw their own
conclusions.

Let's also clarify the "The effects [...] are persistent" to
paraphrase a brief version of Jeff King's explanation at [1].

1. https://public-inbox.org/git/20190318235356.GK29661@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index c037a46b09..165f05e999 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -39,8 +39,7 @@ OPTIONS
 	space utilization and performance.  This option will cause
 	'git gc' to more aggressively optimize the repository at the expense
 	of taking much more time.  The effects of this optimization are
-	persistent, so this option only needs to be used occasionally; every
-	few hundred changesets or so.
+	mostly persistent. See the "AGGRESSIVE" section below for details.
 
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
@@ -78,6 +77,32 @@ be performed as well.
 	`.keep` files are consolidated into a single pack. When this
 	option is used, `gc.bigPackThreshold` is ignored.
 
+AGGRESSIVE
+----------
+
+When the `--aggressive` option is supplied, linkgit:git-repack[1] will
+be invoked with the `-f` flag, which in turn will pass
+`--no-reuse-delta` to linkgit:git-pack-objects[1]. This will throw
+away any existing deltas and re-compute them, at the expense of
+spending much more time on the repacking.
+
+The effects of this are mostly persistent, e.g. when packs and loose
+objects are coalesced into one another pack the existing deltas in
+that pack might get re-used, but there are also various cases where we
+might pick a sub-optimal delta from a newer pack instead.
+
+Furthermore, supplying `--aggressive` will tweak the `--depth` and
+`--window` options passed to linkgit:git-repack[1]. See the
+`gc.aggressiveDepth` and `gc.aggressiveWindow` settings below. By
+using a larger window size we're more likely to find more optimal
+deltas.
+
+It's probably not worth it to use this option on a given repository
+without running tailored performance benchmarks on it. It takes a lot
+more time, and the resulting space/delta optimization may or may not
+be worth it. Not using this at all is the right trade-off for most
+users and their repositories.
+
 CONFIGURATION
 -------------
 
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 09/11] gc docs: note "gc --aggressive" in "fast-import"
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (8 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 08/11] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 11/11] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Amend the "PACKFILE OPTIMIZATION" section in "fast-import" to explain
that simply running "git gc --aggressive" after a "fast-import" should
properly optimize the repository. This is simpler and more effective
than the existing "repack" advice (which I'm keeping as it helps
explain things) because it e.g. also packs the newly imported refs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-fast-import.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 43ab3b1637..2248755cb7 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -1396,6 +1396,13 @@ deltas are suboptimal (see above) then also adding the `-f` option
 to force recomputation of all deltas can significantly reduce the
 final packfile size (30-50% smaller can be quite typical).
 
+Instead of running `git repack` you can also run `git gc
+--aggressive`, which will also optimize other things after an import
+(e.g. pack loose refs). As noted in the "AGGRESSIVE" section in
+linkgit:git-gc[1] the `--aggressive` option will find new deltas with
+the `-f` option to linkgit:git-repack[1]. For the reasons elaborated
+on above using `--aggressive` after a fast-import is one of the few
+cases where it's known to be worthwhile.
 
 MEMORY UTILIZATION
 ------------------
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (9 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 09/11] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  2019-03-22  9:32   ` [PATCH v3 11/11] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

Amend the "NOTES" section to fix up wording that's been with us since
3ffb58be0a ("doc/git-gc: add a note about what is collected",
2008-04-23).

I can't remember when/where anymore (I think Freenode #Git), but at
some point I was having a conversation with someone who was convinced
that "gc" would prune things only referenced by e.g. refs/pull/*, and
pointed to this section as proof.

It turned out that they'd read the "branches and tags" wording here
and thought just refs/{heads,tags}/* and refs/remotes/* etc. would be
kept, which is what we enumerate explicitly.

So let's say "other refs", even though just above we say "objects that
are referenced anywhere in your repository".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 165f05e999..49aec5435b 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,8 +119,8 @@ anywhere in your repository. In
 particular, it will keep not only objects referenced by your current set
 of branches and tags, but also objects referenced by the index,
 remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, or reflogs (which may reference commits in branches
-that were later amended or rewound).
+refs/original/, reflogs (which may reference commits in branches
+that were later amended or rewound), and anything else in the refs/* namespace.
 If you are expecting some objects to be deleted and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v3 11/11] gc docs: remove incorrect reference to gc.auto=0
  2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
                     ` (10 preceding siblings ...)
  2019-03-22  9:32   ` [PATCH v3 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
@ 2019-03-22  9:32   ` Ævar Arnfjörð Bjarmason
  11 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-22  9:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Ævar Arnfjörð Bjarmason

The chance of a repository being corrupted due to a "gc" has nothing
to do with whether or not that "gc" was invoked via "gc --auto", but
whether there's other concurrent operations happening.

This is already noted earlier in the paragraph, so there's no reason
to suggest this here. The user can infer from the rest of the
documentation that "gc" will run automatically unless gc.auto=0 is
set, and we shouldn't confuse the issue by implying that "gc --auto"
is somehow more prone to produce corruption than a normal "gc".

Well, it is in the sense that a blocking "gc" would stop you from
doing anything else in *that* particular terminal window, but users
are likely to have another window, or to be worried about how
concurrent "gc" on a server might cause corruption.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 49aec5435b..2af503bdb1 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -141,8 +141,7 @@ mitigate this problem:
 
 However, these features fall short of a complete solution, so users who
 run commands concurrently have to live with some risk of corruption (which
-seems to be low in practice) unless they turn off automatic garbage
-collection with 'git config gc.auto 0'.
+seems to be low in practice).
 
 HOOKS
 -----
-- 
2.21.0.360.g471c308f928


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
@ 2019-03-30 18:04     ` Todd Zullinger
  2019-04-07 19:52     ` [PATCH v4 00/11] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
                       ` (11 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Todd Zullinger @ 2019-03-30 18:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Jeff King, Michael Haggerty, Stefan Beller, Jonathan Nieder,
	Matt McCutchen, Johannes Sixt

Hi Ævar,

Ævar Arnfjörð Bjarmason wrote:
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 66386439b7..c037a46b09 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -45,28 +45,12 @@ OPTIONS
>  --auto::
>  	With this option, 'git gc' checks whether any housekeeping is
>  	required; if not, it exits without performing any work.
> -	Some git commands run `git gc --auto` after performing
> -	operations that could create many loose objects. Housekeeping
> -	is required if there are too many loose objects or too many
> -	packs in the repository.
>  +
> -If the number of loose objects exceeds the value of the `gc.auto`
> -configuration variable, then all loose objects are combined into a
> -single pack.  Setting the value of `gc.auto`
> -to 0 disables automatic packing of loose objects.
> +See the `gc.auto' option in the "CONFIGURATION" section below for how
> +this heuristic works.

Did you want this "gc.auto" to use the differing left and
right accent/quote characters (which asciidoc renders as
single-quotes and asciidoctor as double-quotes) or should
the closing "'" instead be "`" to render "gc.auto" as
monospaced text?

I suspect it's the latter, as that matches most of the other
variable names in the docs.

I noticed this while comparing the output from asciidoc and
asciidoctor.  I've got a few similar changes queued up as
minor fixes to lower the diff between asciidoc/tor but I
wanted to check whether you intended this one before I sent
a patch to correct it. :)

Thanks,

-- 
Todd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 00/11] gc docs: modernize and fix the documentation
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
  2019-03-30 18:04     ` Todd Zullinger
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 01/11] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
                       ` (10 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

v4 fixes a misbalanced quote noted by Todd Zullinger in
<20190330180415.GC4047@pobox.com>, and makes this equivalent to the
post-squash version sitting in gitster/ab/gc-docs now.

Ævar Arnfjörð Bjarmason (11):
  gc docs: modernize the advice for manually running "gc"
  gc docs: stop noting "repack" flags
  gc docs: clean grammar for "gc.bigPackThreshold"
  gc docs: include the "gc.*" section from "config" in "gc"
  gc docs: re-flow the "gc.*" section in "config"
  gc docs: fix formatting for "gc.writeCommitGraph"
  gc docs: note how --aggressive impacts --window & --depth
  gc docs: downplay the usefulness of --aggressive
  gc docs: note "gc --aggressive" in "fast-import"
  gc docs: clarify that "gc" doesn't throw away referenced objects
  gc docs: remove incorrect reference to gc.auto=0

 Documentation/config/gc.txt       |  38 ++++++--
 Documentation/git-fast-import.txt |   7 ++
 Documentation/git-gc.txt          | 142 ++++++++++--------------------
 3 files changed, 86 insertions(+), 101 deletions(-)

Range-diff:
 1:  a48ef8d5d8 =  1:  a48ef8d5d8 gc docs: modernize the advice for manually running "gc"
 2:  21e66a7903 =  2:  21e66a7903 gc docs: stop noting "repack" flags
 3:  c8a1342e34 =  3:  c8a1342e34 gc docs: clean grammar for "gc.bigPackThreshold"
 4:  9163e2f885 !  4:  f54ef80e69 gc docs: include the "gc.*" section from "config" in "gc"
    @@ -100,7 +100,7 @@
     -configuration variable, then all loose objects are combined into a
     -single pack.  Setting the value of `gc.auto`
     -to 0 disables automatic packing of loose objects.
    -+See the `gc.auto' option in the "CONFIGURATION" section below for how
    ++See the `gc.auto` option in the "CONFIGURATION" section below for how
     +this heuristic works.
      +
     -If the number of packs exceeds the value of `gc.autoPackLimit`,
 5:  8fa0e26671 =  5:  4ea4cf885a gc docs: re-flow the "gc.*" section in "config"
 6:  b70396f029 =  6:  ae5755278f gc docs: fix formatting for "gc.writeCommitGraph"
 7:  04ee81a3c9 =  7:  fc3bd0d5f4 gc docs: note how --aggressive impacts --window & --depth
 8:  04af0afbcf =  8:  7cff026e58 gc docs: downplay the usefulness of --aggressive
 9:  c35bc94416 =  9:  64617c43f6 gc docs: note "gc --aggressive" in "fast-import"
10:  702f2cd2d9 = 10:  84e5c669eb gc docs: clarify that "gc" doesn't throw away referenced objects
11:  08af3cc3ee = 11:  6a027d25a7 gc docs: remove incorrect reference to gc.auto=0
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v4 01/11] gc docs: modernize the advice for manually running "gc"
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
  2019-03-30 18:04     ` Todd Zullinger
  2019-04-07 19:52     ` [PATCH v4 00/11] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 02/11] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
                       ` (9 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

The docs have been recommending that users need to run this manually,
but that hasn't been needed in practice for a long time except in
exceptional circumstances.

Let's instead have this reflect reality and say that most users don't
need to run this manually at all, while briefly describing the sorts
sort of cases where "gc" does need to be run manually.

Since we're recommending that users run this most of the and usually
don't need to tweak it, let's tone down the very prominent example of
the gc.auto=0 command. It's sufficient to point to the gc.auto
documentation below.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index a7c1b0f60e..dd22eecc79 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -20,17 +20,16 @@ created from prior invocations of 'git add', packing refs, pruning
 reflog, rerere metadata or stale working trees. May also update ancillary
 indexes such as the commit-graph.
 
-Users are encouraged to run this task on a regular basis within
-each repository to maintain good disk space utilization and good
-operating performance.
-
-Some git commands may automatically run 'git gc'; see the `--auto` flag
-below for details. If you know what you're doing and all you want is to
-disable this behavior permanently without further considerations, just do:
-
-----------------------
-$ git config --global gc.auto 0
-----------------------
+When common porcelain operations that create objects are run, they
+will check whether the repository has grown substantially since the
+last maintenance, and if so run `git gc` automatically. See `gc.auto`
+below for how to disable this behavior.
+
+Running `git gc` manually should only be needed when adding objects to
+a repository without regularly running such porcelain commands, to do
+a one-off repository optimization, or e.g. to clean up a suboptimal
+mass-import. See the "PACKFILE OPTIMIZATION" section in
+linkgit:git-fast-import[1] for more details on the import case.
 
 OPTIONS
 -------
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 02/11] gc docs: stop noting "repack" flags
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (2 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 01/11] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 03/11] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Remove the mention of specific flags from the "gc" documentation, and
leave it at describing what we'll do instead. As seen in builtin/gc.c
we'll use various repack flags depending on what we detect we need to
do, so this isn't always accurate.

More importantly, a subsequent change is about to remove all this
documentation and replace it with an include of the gc.* docs in
git-config(1). By first changing this it's easier to reason about that
subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index dd22eecc79..c56f4f7cde 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -52,14 +52,13 @@ OPTIONS
 +
 If the number of loose objects exceeds the value of the `gc.auto`
 configuration variable, then all loose objects are combined into a
-single pack using `git repack -d -l`.  Setting the value of `gc.auto`
+single pack.  Setting the value of `gc.auto`
 to 0 disables automatic packing of loose objects.
 +
 If the number of packs exceeds the value of `gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file
 or over `gc.bigPackThreshold` limit)
-are consolidated into a single pack by using the `-A` option of
-'git repack'.
+are consolidated into a single pack.
 If the amount of memory is estimated not enough for `git repack` to
 run smoothly and `gc.bigPackThreshold` is not set, the largest
 pack will also be excluded (this is the equivalent of running `git gc`
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 03/11] gc docs: clean grammar for "gc.bigPackThreshold"
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (3 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 02/11] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (7 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Clean up the grammar in the documentation for
"gc.bigPackThreshold". This documentation was added in 9806f5a7bf ("gc
--auto: exclude base pack if not enough mem to "repack -ad"",
2018-04-15).

Saying "the amount of memory estimated for" flows more smoothly than
the previous "the amount of memory is estimated not enough".

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index c56f4f7cde..66386439b7 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -59,8 +59,8 @@ If the number of packs exceeds the value of `gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file
 or over `gc.bigPackThreshold` limit)
 are consolidated into a single pack.
-If the amount of memory is estimated not enough for `git repack` to
-run smoothly and `gc.bigPackThreshold` is not set, the largest
+If the amount of memory estimated for `git repack` to run smoothly is
+not available and `gc.bigPackThreshold` is not set, the largest
 pack will also be excluded (this is the equivalent of running `git gc`
 with `--keep-base-pack`).
 Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 04/11] gc docs: include the "gc.*" section from "config" in "gc"
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (4 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 03/11] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 05/11] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
                       ` (6 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Rather than duplicating the documentation for the various "gc" options
let's include the "gc" docs from git-config. They were mostly better
already, and now we don't have the same docs in two places with subtly
different wording.

In the cases where the git-gc(1) docs were saying something the "gc"
docs in git-config(1) didn't cover move the relevant section over to
the git-config(1) docs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 29 ++++++++++++-
 Documentation/git-gc.txt    | 86 +++----------------------------------
 2 files changed, 35 insertions(+), 80 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index c6fbb8a96f..a255ae67b0 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -2,24 +2,39 @@ gc.aggressiveDepth::
 	The depth parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 50.
++
+See the documentation for the `--depth` option in
+linkgit:git-repack[1] for more details.
 
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
 	to 250.
++
+See the documentation for the `--window` option in
+linkgit:git-repack[1] for more details.
 
 gc.auto::
 	When there are approximately more than this many loose
 	objects in the repository, `git gc --auto` will pack them.
 	Some Porcelain commands use this command to perform a
 	light-weight garbage collection from time to time.  The
-	default value is 6700.  Setting this to 0 disables it.
+	default value is 6700.
++
+Setting this to 0 disables not only automatic packing based on the
+number of loose objects, but any other heuristic `git gc --auto` will
+otherwise use to determine if there's work to do, such as
+`gc.autoPackLimit`.
 
 gc.autoPackLimit::
 	When there are more than this many packs that are not
 	marked with `*.keep` file in the repository, `git gc
 	--auto` consolidates them into one larger pack.  The
 	default	value is 50.  Setting this to 0 disables it.
+	Setting `gc.auto` to 0 will also disable this.
++
+See the `gc.bigPackThreshold` configuration variable below. When in
+use, it'll affect how the auto pack limit works.
 
 gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
@@ -36,6 +51,11 @@ Note that if the number of kept packs is more than gc.autoPackLimit,
 this configuration variable is ignored, all packs except the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
++
+If the amount of memory estimated for `git repack` to run smoothly is
+not available and `gc.bigPackThreshold` is not set, the largest
+pack will also be excluded (this is the equivalent of running `git gc`
+with `--keep-base-pack`).
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
@@ -94,6 +114,13 @@ gc.<pattern>.reflogExpireUnreachable::
 	With "<pattern>" (e.g. "refs/stash")
 	in the middle, the setting applies only to the refs that
 	match the <pattern>.
++
+These types of entries are generally created as
+a result of using `git commit --amend` or `git rebase` and are the
+commits prior to the amend or rebase occurring.  Since these changes
+are not part of the current project most users will want to expire
+them sooner, which is why the default is more aggressive than
+`gc.reflogExpire`.
 
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 66386439b7..37c4d26a76 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -45,28 +45,12 @@ OPTIONS
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
 	required; if not, it exits without performing any work.
-	Some git commands run `git gc --auto` after performing
-	operations that could create many loose objects. Housekeeping
-	is required if there are too many loose objects or too many
-	packs in the repository.
 +
-If the number of loose objects exceeds the value of the `gc.auto`
-configuration variable, then all loose objects are combined into a
-single pack.  Setting the value of `gc.auto`
-to 0 disables automatic packing of loose objects.
+See the `gc.auto` option in the "CONFIGURATION" section below for how
+this heuristic works.
 +
-If the number of packs exceeds the value of `gc.autoPackLimit`,
-then existing packs (except those marked with a `.keep` file
-or over `gc.bigPackThreshold` limit)
-are consolidated into a single pack.
-If the amount of memory estimated for `git repack` to run smoothly is
-not available and `gc.bigPackThreshold` is not set, the largest
-pack will also be excluded (this is the equivalent of running `git gc`
-with `--keep-base-pack`).
-Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
-packs.
-+
-If houskeeping is required due to many loose objects or packs, all
+Once housekeeping is triggered by exceeding the limits of
+configuration options such as `gc.auto` and `gc.autoPackLimit`, all
 other housekeeping tasks (e.g. rerere, working trees, reflog...) will
 be performed as well.
 
@@ -97,66 +81,10 @@ be performed as well.
 CONFIGURATION
 -------------
 
-The optional configuration variable `gc.reflogExpire` can be
-set to indicate how long historical entries within each branch's
-reflog should remain available in this repository.  The setting is
-expressed as a length of time, for example '90 days' or '3 months'.
-It defaults to '90 days'.
-
-The optional configuration variable `gc.reflogExpireUnreachable`
-can be set to indicate how long historical reflog entries which
-are not part of the current branch should remain available in
-this repository.  These types of entries are generally created as
-a result of using `git commit --amend` or `git rebase` and are the
-commits prior to the amend or rebase occurring.  Since these changes
-are not part of the current project most users will want to expire
-them sooner.  This option defaults to '30 days'.
-
-The above two configuration variables can be given to a pattern.  For
-example, this sets non-default expiry values only to remote-tracking
-branches:
-
-------------
-[gc "refs/remotes/*"]
-	reflogExpire = never
-	reflogExpireUnreachable = 3 days
-------------
-
-The optional configuration variable `gc.rerereResolved` indicates
-how long records of conflicted merge you resolved earlier are
-kept.  This defaults to 60 days.
-
-The optional configuration variable `gc.rerereUnresolved` indicates
-how long records of conflicted merge you have not resolved are
-kept.  This defaults to 15 days.
-
-The optional configuration variable `gc.packRefs` determines if
-'git gc' runs 'git pack-refs'. This can be set to "notbare" to enable
-it within all non-bare repos or it can be set to a boolean value.
-This defaults to true.
-
-The optional configuration variable `gc.writeCommitGraph` determines if
-'git gc' should run 'git commit-graph write'. This can be set to a
-boolean value. This defaults to false.
-
-The optional configuration variable `gc.aggressiveWindow` controls how
-much time is spent optimizing the delta compression of the objects in
-the repository when the --aggressive option is specified.  The larger
-the value, the more time is spent optimizing the delta compression.  See
-the documentation for the --window option in linkgit:git-repack[1] for
-more details.  This defaults to 250.
-
-Similarly, the optional configuration variable `gc.aggressiveDepth`
-controls --depth option in linkgit:git-repack[1]. This defaults to 50.
-
-The optional configuration variable `gc.pruneExpire` controls how old
-the unreferenced loose objects have to be before they are pruned.  The
-default is "2 weeks ago".
-
-Optional configuration variable `gc.worktreePruneExpire` controls how
-old a stale working tree should be before `git worktree prune` deletes
-it. Default is "3 months ago".
+The below documentation is the same as what's found in
+linkgit:git-config[1]:
 
+include::config/gc.txt[]
 
 NOTES
 -----
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 05/11] gc docs: re-flow the "gc.*" section in "config"
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (5 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 06/11] gc docs: fix formatting for "gc.writeCommitGraph" Ævar Arnfjörð Bjarmason
                       ` (5 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Re-flow the "gc.*" section in "config". A previous commit moved this
over from the "gc" docs, but tried to keep as many of the lines
identical to benefit from diff's move detection.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index a255ae67b0..3e7fc052d9 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -53,9 +53,9 @@ will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 +
 If the amount of memory estimated for `git repack` to run smoothly is
-not available and `gc.bigPackThreshold` is not set, the largest
-pack will also be excluded (this is the equivalent of running `git gc`
-with `--keep-base-pack`).
+not available and `gc.bigPackThreshold` is not set, the largest pack
+will also be excluded (this is the equivalent of running `git gc` with
+`--keep-base-pack`).
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
@@ -115,12 +115,11 @@ gc.<pattern>.reflogExpireUnreachable::
 	in the middle, the setting applies only to the refs that
 	match the <pattern>.
 +
-These types of entries are generally created as
-a result of using `git commit --amend` or `git rebase` and are the
-commits prior to the amend or rebase occurring.  Since these changes
-are not part of the current project most users will want to expire
-them sooner, which is why the default is more aggressive than
-`gc.reflogExpire`.
+These types of entries are generally created as a result of using `git
+commit --amend` or `git rebase` and are the commits prior to the amend
+or rebase occurring.  Since these changes are not part of the current
+project most users will want to expire them sooner, which is why the
+default is more aggressive than `gc.reflogExpire`.
 
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 06/11] gc docs: fix formatting for "gc.writeCommitGraph"
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (6 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 05/11] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 07/11] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
                       ` (4 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Change the AsciiDoc formatting so that an example of "gc --auto" isn't
rendered as "git-gc(1) --auto", but as "git gc --auto". This is
consistent with the rest of the links and command examples in this
documentation.

The formatting I'm changing was initially introduced in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 3e7fc052d9..56918a5008 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -59,8 +59,8 @@ will also be excluded (this is the equivalent of running `git gc` with
 
 gc.writeCommitGraph::
 	If true, then gc will rewrite the commit-graph file when
-	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
-	'--auto' the commit-graph will be updated if housekeeping is
+	linkgit:git-gc[1] is run. When using `git gc --auto`
+	the commit-graph will be updated if housekeeping is
 	required. Default is false. See linkgit:git-commit-graph[1]
 	for details.
 
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 07/11] gc docs: note how --aggressive impacts --window & --depth
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (7 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 06/11] gc docs: fix formatting for "gc.writeCommitGraph" Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 08/11] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Since 07e7dbf0db (gc: default aggressive depth to 50, 2016-08-11) we
somewhat confusingly use the same depth under --aggressive as we do by
default.

As noted in that commit that makes sense, it was wrong to make more
depth the default for "aggressive", and thus save disk space at the
expense of runtime performance, which is usually the opposite of
someone who'd like "aggressive gc" wants.

But that's left us with a mostly-redundant configuration variable, so
let's clearly note in its documentation that it doesn't change the
default.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/gc.txt | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 56918a5008..f732fe5bfd 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -1,7 +1,8 @@
 gc.aggressiveDepth::
 	The depth parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
-	to 50.
+	to 50, which is the default for the `--depth` option when
+	`--aggressive` isn't in use.
 +
 See the documentation for the `--depth` option in
 linkgit:git-repack[1] for more details.
@@ -9,7 +10,8 @@ linkgit:git-repack[1] for more details.
 gc.aggressiveWindow::
 	The window size parameter used in the delta compression
 	algorithm used by 'git gc --aggressive'.  This defaults
-	to 250.
+	to 250, which is a much more aggressive window size than
+	the default `--window` of 10.
 +
 See the documentation for the `--window` option in
 linkgit:git-repack[1] for more details.
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 08/11] gc docs: downplay the usefulness of --aggressive
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (8 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 07/11] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 09/11] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

The existing "gc --aggressive" docs come just short of recommending to
users that they run it regularly. I've personally talked to many users
who've taken these docs as an advice to use this option, and have,
usually it's (mostly) a waste of time.

So let's clarify what it really does, and let the user draw their own
conclusions.

Let's also clarify the "The effects [...] are persistent" to
paraphrase a brief version of Jeff King's explanation at [1].

1. https://public-inbox.org/git/20190318235356.GK29661@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 37c4d26a76..5e80f306e7 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -39,8 +39,7 @@ OPTIONS
 	space utilization and performance.  This option will cause
 	'git gc' to more aggressively optimize the repository at the expense
 	of taking much more time.  The effects of this optimization are
-	persistent, so this option only needs to be used occasionally; every
-	few hundred changesets or so.
+	mostly persistent. See the "AGGRESSIVE" section below for details.
 
 --auto::
 	With this option, 'git gc' checks whether any housekeeping is
@@ -78,6 +77,32 @@ be performed as well.
 	`.keep` files are consolidated into a single pack. When this
 	option is used, `gc.bigPackThreshold` is ignored.
 
+AGGRESSIVE
+----------
+
+When the `--aggressive` option is supplied, linkgit:git-repack[1] will
+be invoked with the `-f` flag, which in turn will pass
+`--no-reuse-delta` to linkgit:git-pack-objects[1]. This will throw
+away any existing deltas and re-compute them, at the expense of
+spending much more time on the repacking.
+
+The effects of this are mostly persistent, e.g. when packs and loose
+objects are coalesced into one another pack the existing deltas in
+that pack might get re-used, but there are also various cases where we
+might pick a sub-optimal delta from a newer pack instead.
+
+Furthermore, supplying `--aggressive` will tweak the `--depth` and
+`--window` options passed to linkgit:git-repack[1]. See the
+`gc.aggressiveDepth` and `gc.aggressiveWindow` settings below. By
+using a larger window size we're more likely to find more optimal
+deltas.
+
+It's probably not worth it to use this option on a given repository
+without running tailored performance benchmarks on it. It takes a lot
+more time, and the resulting space/delta optimization may or may not
+be worth it. Not using this at all is the right trade-off for most
+users and their repositories.
+
 CONFIGURATION
 -------------
 
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 09/11] gc docs: note "gc --aggressive" in "fast-import"
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (9 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 08/11] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 11/11] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Amend the "PACKFILE OPTIMIZATION" section in "fast-import" to explain
that simply running "git gc --aggressive" after a "fast-import" should
properly optimize the repository. This is simpler and more effective
than the existing "repack" advice (which I'm keeping as it helps
explain things) because it e.g. also packs the newly imported refs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-fast-import.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 43ab3b1637..2248755cb7 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -1396,6 +1396,13 @@ deltas are suboptimal (see above) then also adding the `-f` option
 to force recomputation of all deltas can significantly reduce the
 final packfile size (30-50% smaller can be quite typical).
 
+Instead of running `git repack` you can also run `git gc
+--aggressive`, which will also optimize other things after an import
+(e.g. pack loose refs). As noted in the "AGGRESSIVE" section in
+linkgit:git-gc[1] the `--aggressive` option will find new deltas with
+the `-f` option to linkgit:git-repack[1]. For the reasons elaborated
+on above using `--aggressive` after a fast-import is one of the few
+cases where it's known to be worthwhile.
 
 MEMORY UTILIZATION
 ------------------
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (10 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 09/11] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  2019-04-07 19:52     ` [PATCH v4 11/11] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

Amend the "NOTES" section to fix up wording that's been with us since
3ffb58be0a ("doc/git-gc: add a note about what is collected",
2008-04-23).

I can't remember when/where anymore (I think Freenode #Git), but at
some point I was having a conversation with someone who was convinced
that "gc" would prune things only referenced by e.g. refs/pull/*, and
pointed to this section as proof.

It turned out that they'd read the "branches and tags" wording here
and thought just refs/{heads,tags}/* and refs/remotes/* etc. would be
kept, which is what we enumerate explicitly.

So let's say "other refs", even though just above we say "objects that
are referenced anywhere in your repository".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 5e80f306e7..9cdae588fb 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,8 +119,8 @@ anywhere in your repository. In
 particular, it will keep not only objects referenced by your current set
 of branches and tags, but also objects referenced by the index,
 remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, or reflogs (which may reference commits in branches
-that were later amended or rewound).
+refs/original/, reflogs (which may reference commits in branches
+that were later amended or rewound), and anything else in the refs/* namespace.
 If you are expecting some objects to be deleted and they aren't, check
 all of those locations and decide whether it makes sense in your case to
 remove those references.
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 11/11] gc docs: remove incorrect reference to gc.auto=0
  2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
                       ` (11 preceding siblings ...)
  2019-04-07 19:52     ` [PATCH v4 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
@ 2019-04-07 19:52     ` Ævar Arnfjörð Bjarmason
  12 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 19:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen,
	Johannes Sixt, Todd Zullinger,
	Ævar Arnfjörð Bjarmason

The chance of a repository being corrupted due to a "gc" has nothing
to do with whether or not that "gc" was invoked via "gc --auto", but
whether there's other concurrent operations happening.

This is already noted earlier in the paragraph, so there's no reason
to suggest this here. The user can infer from the rest of the
documentation that "gc" will run automatically unless gc.auto=0 is
set, and we shouldn't confuse the issue by implying that "gc --auto"
is somehow more prone to produce corruption than a normal "gc".

Well, it is in the sense that a blocking "gc" would stop you from
doing anything else in *that* particular terminal window, but users
are likely to have another window, or to be worried about how
concurrent "gc" on a server might cause corruption.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-gc.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 9cdae588fb..247f765604 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -141,8 +141,7 @@ mitigate this problem:
 
 However, these features fall short of a complete solution, so users who
 run commands concurrently have to live with some risk of corruption (which
-seems to be low in practice) unless they turn off automatic garbage
-collection with 'git config gc.auto 0'.
+seems to be low in practice).
 
 HOOKS
 -----
-- 
2.21.0.392.gf8f6787159e


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-03-19  0:18     ` Jeff King
@ 2019-05-06  9:44       ` Ævar Arnfjörð Bjarmason
  2019-05-07  7:51         ` Jeff King
  0 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-06  9:44 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen


On Tue, Mar 19 2019, Jeff King wrote:

> On Mon, Mar 18, 2019 at 11:45:39PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > I don't think the quarantine stuff should impact contention at all. It's
>> > only quarantining the objects, which are the least contentious part of
>> > Git (because object content is idempotent, so we don't do any locking
>> > there, and with two racing processes, one will just "win").
>>
>> Without the quarantine, isn't there the race that the NOTES section
>> talks about (unless I've misread it).
>
> Ah, OK, I wasn't quite sure which documentation you were talking about.
> I see the discussion now in the "NOTES" section of git-gc(1).
>
>> I.e. we have some loose object "ABCD" not referrred to by anything for
>> the last 2 weeks, as we're gc-ing a ref update comes in that makes it
>> referenced again. We then delete "ABCD" (not used!) at the same time the
>> ref update happens, and get corruption.
>>
>> Whereas the quarantine might work around since the client will have sent
>> ABCD with no reference pointing to it to the server in the temp pack,
>> which we then rename in-place and then update the ref, so we don't care
>> if "ABCD" goes away.
>
> tl;dr I don't think quarantine impacts this, but if you really want gory
> details, read on.
>
> This is a problem with or without the quarantine. It's fundamentally a
> race because we do not atomically say "is anybody using X? If not, we
> can delete it" and some other process saying "I'd like to use X".
>
> Pushes are actually better off than most operations, because we only
> advertise what's reachable, and the client is expected to send
> everything else. So with just a normal update-ref call, we could race
> like this:
>
>   1. ABCD is ancient.
>
>   2. Process 1 (update-ref) wants to reference ABCD. It sees that we
>      have it.
>
>   3. Process 2 (gc/prune) sees that nobody references it. It deletes
>      ABCD.
>
>   4. Process 1 writes out the reference.
>
> That doesn't happen with a push, because the server never would have
> told the client that it has ABCD in the first place (so process 1 here
> is the client). That is true with or without quarantine.
>
> But pushes aren't foolproof either. You said "loose object ABCD not
> referred t oby anything for the last 2 weeks". But that's not exactly
> how it works. It's "object with an mtime of more than 2 weeks which is
> not currently referenced". So imagine a sequence like:
>
>   1. ABCD is ancient.
>
>   2. refs/heads/foo points to ABCD.
>
>   3. Server receive-pack advertises foo pointing to ABCD.
>
>   4. Simultaneous process on the server deletes refs/heads/foo (or
>      perhaps somebody force-pushes over it).
>
>   5. Client prepares and sends pack without ABCD.
>
>   6. Server receive-pack checks that yes, we still have ABCD (i.e., the
>      usual connectivity check).
>
>   7. Server gc drops ABCD, which is now unreachable (reflogs can help
>      here, if you've enabled them; but we do delete reflogs when the
>      branch is deleted).
>
>   8. Server receive-pack writes corrupt repo mentioning ABCD.
>
> That's a lot more steps, though they might not be as implausible as you
> think (e.g., consider moving "refs/heads/foo" to "refs/heads/bar" in a
> single push; that's actually a delete and an update, which is all you
> need to race with a simultaneous gc).
>
> I have no idea how often this happens in practice. My subjective
> recollection is that most of the race corruptions I've seen were from
> local operations on the server. E.g., we compute a tentative merge for
> somebody's pull request which shares objects with an older tentative
> merge. They click the "merge" button and we reference that commit, which
> is recent, but unbeknownst to us, while we were creating our new
> tentative merge, a "gc" was deleting the old one.
>
> We're sometimes saved by the "transitive freshness" rules in d3038d22f9
> (prune: keep objects reachable from recent objects, 2014-10-15).  But
> they're far from perfect:
>
>  - some operations (like the push rename example) aren't writing new
>    objects, so the ref write _is_ the moment that gc would find out
>    something is reachable
>
>  - the "is it reachable?" and "no, then delete it" steps aren't atomic.
>    Unless you want a whole-repo stop-the-world lock, somebody can
>    reference the object in between. And since it may take many seconds
>    to compute reachability, stop-the-world is not great.
>
> I think there are probably ways to make it better. Perhaps some kind of
> lockless delete-but-be-able-to-rollback scheme (but keep in mind this
> has to be implemented no top of POSIX filesystem semantics). Or even
> just a "compute reachability, mark for deletion, and then hold a
> stop-the-world lock briefly to double-check that our reachability is
> still up to date".
>
> At least those seem plausible to me. I've never worked out the details,
> and our solution was to just stop deleting objects during routine
> maintenance (using "repack -adk"). We do still occasionally prune
> manually (e.g., when somebody writes to support to remove a confidential
> mistake).
>
> Anyway, that was more than you probably wanted to know. The short of it
> is that I don't think quarantines help (they may even make things worse
> by slightly increasing the length of the race window, though in practice
> I doubt it).
>
>> Unless that interacts racily with the receive.unpackLimit, but then I
>> have no idea that section is trying to say...
>
> No, I don't think unpackLimit really affects it much either way.
>
>> Also, surely the part where "NOTES" says something to the effect of "you
>> are subject to races unless gc.auto=0" is wrong. To the extent that
>> there's races it won't matter that you invoke "git gc" or "git gc
>> --auto", it's the concurrency that matters. So if there's still races we
>> should be saying the repo needs to be locked for writes for the duration
>> of the "gc".
>
> Correct. It's the very act of pruning that is problematic. I think the
> point is that if you are manually running "git gc", you'd presumably do
> it at a time when the repository is not otherwise active.

Thanks for that E-Mail. I'm hoping to get around to another set of "gc
doc" updates and will hopefully be able to steal liberally from it.

Maybe there's some case I haven't thought of that makes this stupid, but
I wonder if something like a "gc quarantine" might be a fix fo both of
the the issues you noted above.

I.e. it seems to me that the main issue is that we conflate "mtime 2
weeks old because it's unreferenced for 2 weeks" v.s. "mtime 2 weeks old
because we haven't gotten around to a 'gc'".

So in such a "gc quarantine" mode when we discover an object/pack that's
unreachable/purely made up of unreachable objects we'd move the relevant
loose object/"loose" pack to such a quarantine, which would just be
.git/unreferenced-objects/{??,pack}/ or whatever.

AFAICT both cases you mentioned above would be mitigated by this because
we'd no longer conflate "haven't gc'd this yet and it's 2 weeks old"
v.s. "hasn't been referenced in 2 weeks".

I started looking at this initially because I was wondering if the
--keep-unreachable mode you modified in e26a8c4721 ("repack: extend
--keep-unreachable to loose objects", 2016-06-13) could be made to write
out such "unreferenced" objects into their *own* pack, so we could
delete them all at once as a batch, and wouldn't create the "ref
explosions" mentioned in [1].

But of course without an accompanying quarantine described above doing
that would just make this race condition worse.

1. https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-05-06  9:44       ` Ævar Arnfjörð Bjarmason
@ 2019-05-07  7:51         ` Jeff King
  2019-05-09 23:20           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-05-07  7:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Mon, May 06, 2019 at 11:44:06AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Maybe there's some case I haven't thought of that makes this stupid, but
> I wonder if something like a "gc quarantine" might be a fix fo both of
> the the issues you noted above.
> 
> I.e. it seems to me that the main issue is that we conflate "mtime 2
> weeks old because it's unreferenced for 2 weeks" v.s. "mtime 2 weeks old
> because we haven't gotten around to a 'gc'".
> 
> So in such a "gc quarantine" mode when we discover an object/pack that's
> unreachable/purely made up of unreachable objects we'd move the relevant
> loose object/"loose" pack to such a quarantine, which would just be
> .git/unreferenced-objects/{??,pack}/ or whatever.
> 
> AFAICT both cases you mentioned above would be mitigated by this because
> we'd no longer conflate "haven't gc'd this yet and it's 2 weeks old"
> v.s. "hasn't been referenced in 2 weeks".

Michael Haggerty and I have (off-list) discussed variations on that, but
it opens up a lot of new issues.  Moving something into quarantine isn't
atomic. So you've still corrupted the repo, but now it's recoverable by
reaching into the quarantine. Who notices that the repo is corrupt, and
how? When do we expire objects from quarantine?

I think the heart of the issue is really the lack of atomicity in the
operations. You need some way to mark "I am using this now" in a way
that cannot race with "looks like nobody is using this, so I'll delete
it".

And ideally without traversing large bits of the graph on the writing
side, and without requiring any stop-the-world locks during pruning.

> I started looking at this initially because I was wondering if the
> --keep-unreachable mode you modified in e26a8c4721 ("repack: extend
> --keep-unreachable to loose objects", 2016-06-13) could be made to write
> out such "unreferenced" objects into their *own* pack, so we could
> delete them all at once as a batch, and wouldn't create the "ref
> explosions" mentioned in [1].
> 
> But of course without an accompanying quarantine described above doing
> that would just make this race condition worse.

I'm not sure it really makes it worse. The pack would have the same
mtime as the loose objects would.

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-05-07  7:51         ` Jeff King
@ 2019-05-09 23:20           ` Ævar Arnfjörð Bjarmason
  2019-07-31  4:26             ` Jeff King
  0 siblings, 1 reply; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-09 23:20 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen


On Tue, May 07 2019, Jeff King wrote:

> On Mon, May 06, 2019 at 11:44:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Maybe there's some case I haven't thought of that makes this stupid, but
>> I wonder if something like a "gc quarantine" might be a fix fo both of
>> the the issues you noted above.
>>
>> I.e. it seems to me that the main issue is that we conflate "mtime 2
>> weeks old because it's unreferenced for 2 weeks" v.s. "mtime 2 weeks old
>> because we haven't gotten around to a 'gc'".
>>
>> So in such a "gc quarantine" mode when we discover an object/pack that's
>> unreachable/purely made up of unreachable objects we'd move the relevant
>> loose object/"loose" pack to such a quarantine, which would just be
>> .git/unreferenced-objects/{??,pack}/ or whatever.
>>
>> AFAICT both cases you mentioned above would be mitigated by this because
>> we'd no longer conflate "haven't gc'd this yet and it's 2 weeks old"
>> v.s. "hasn't been referenced in 2 weeks".
>
> Michael Haggerty and I have (off-list) discussed variations on that, but
> it opens up a lot of new issues.  Moving something into quarantine isn't
> atomic. So you've still corrupted the repo, but now it's recoverable by
> reaching into the quarantine. Who notices that the repo is corrupt, and
> how? When do we expire objects from quarantine?
>
> I think the heart of the issue is really the lack of atomicity in the
> operations. You need some way to mark "I am using this now" in a way
> that cannot race with "looks like nobody is using this, so I'll delete
> it".
>
> And ideally without traversing large bits of the graph on the writing
> side, and without requiring any stop-the-world locks during pruning.

I was thinking (but realize now that I didn't articulate) that the "gc
quarantine" would be another "alternate" implementing a copy-on-write
"lockless delete-but-be-able-to-rollback scheme" as you put it.

So "gc" would decide (racily) what's unreachable, but instead of
unlink()-ing it would "mv" the loose object/pack into the
"unreferenced-objects" quarantine.

Then in your example #1 "wants to reference ABCD. It sees that we have
it." would race on the "other side". I.e. maybe ABCD was *just* moved to
the quarantine. But in that case we'd move it back, which would bump the
mtime and thus make it ineligible for expiry.

Similarly for example #2, the "ABCD is ancient" would be moved, but then
promptely moved back on the next GC as we notice ABCD has been
re-referenced.

Maybe it's just the same problem all over again, but I don't see how
yet.

Aside from that, I have a hunch that while it's theoretically true that
you can at any time re-reference some loose blob/tree/commit again, that
the likelyhood of that in practice goes down as it ages, since a user is
likely to e.g. re-push or rename some branch they pushed last week, not
last year.

Hence the mention of creating "unreferenced packs" with some new
--keep-unreachable mode. Since we'd pack those together they wouldn't
create the "ref explosion" problem we have with the loose refs, and thus
you could afford to keep them longer (even though the deltas would be
shittier).

Whereas now you either need --keep-unreachable (keep stuff forever) or a
more aggressive gc.pruneExpire if you'd like to not end up with a
ginormous amount of loose objects.

>> I started looking at this initially because I was wondering if the
>> --keep-unreachable mode you modified in e26a8c4721 ("repack: extend
>> --keep-unreachable to loose objects", 2016-06-13) could be made to write
>> out such "unreferenced" objects into their *own* pack, so we could
>> delete them all at once as a batch, and wouldn't create the "ref
>> explosions" mentioned in [1].
>>
>> But of course without an accompanying quarantine described above doing
>> that would just make this race condition worse.
>
> I'm not sure it really makes it worse. The pack would have the same
> mtime as the loose objects would.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-05-09 23:20           ` Ævar Arnfjörð Bjarmason
@ 2019-07-31  4:26             ` Jeff King
  2019-07-31 10:12               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 64+ messages in thread
From: Jeff King @ 2019-07-31  4:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen

On Fri, May 10, 2019 at 01:20:55AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Michael Haggerty and I have (off-list) discussed variations on that, but
> > it opens up a lot of new issues.  Moving something into quarantine isn't
> > atomic. So you've still corrupted the repo, but now it's recoverable by
> > reaching into the quarantine. Who notices that the repo is corrupt, and
> > how? When do we expire objects from quarantine?
> >
> > I think the heart of the issue is really the lack of atomicity in the
> > operations. You need some way to mark "I am using this now" in a way
> > that cannot race with "looks like nobody is using this, so I'll delete
> > it".
> >
> > And ideally without traversing large bits of the graph on the writing
> > side, and without requiring any stop-the-world locks during pruning.
> 
> I was thinking (but realize now that I didn't articulate) that the "gc
> quarantine" would be another "alternate" implementing a copy-on-write
> "lockless delete-but-be-able-to-rollback scheme" as you put it.
> 
> So "gc" would decide (racily) what's unreachable, but instead of
> unlink()-ing it would "mv" the loose object/pack into the
> "unreferenced-objects" quarantine.
> 
> Then in your example #1 "wants to reference ABCD. It sees that we have
> it." would race on the "other side". I.e. maybe ABCD was *just* moved to
> the quarantine. But in that case we'd move it back, which would bump the
> mtime and thus make it ineligible for expiry.

I think this is basically the same as the current freshening scheme,
though. In general, you can replace "move it back" with "update its
mtime". Neither is atomic with respect to other operations.

It does seem like the twist is that "gc" is supposed to do the "move it
back" step (and it's also the thing expiring, if we assume that there's
only one gc running at a time). But again, how do we know somebody isn't
referencing it _right now_ while we're deciding whether to move it back?

I think there are lots of solutions you can come up with if you have
atomicity. But fundamentally it isn't there in the way we handle updates
now. You could imagine something like a shared/unique lock where anybody
updating a ref takes the "shared" side, and multiple entities can hold
it at once. But somebody pruning takes the "unique" side and excludes
everybody else, stopping ref updates during the prune (which you'd
obviously want to do in a way that you hold the lock for as short as
possible; say, optimistically check reachability without the lock, then
take the lock and check to see if anything has changed).

(By shared/unique I basically mean a reader/writer lock, but I didn't
want to use those terms in the paragraph since both holders are
writing).

It is tricky to find out when to hold the shared lock, though. It's
_not_ just a ref write, for example. When you accept a push, you'd want
to hold the lock while you are checking that you have all of the
necessary objects to write the ref. For something like "git commit" it's
even harder, because we implicitly rely on state created by commands run
over the course of hours or days (e.g., "git add" to put a blob in the
index and maybe create the tree via cache-tree, then a commit to
reference it, and finally the ref write; each step adds state which the
next step relies on).

> Aside from that, I have a hunch that while it's theoretically true that
> you can at any time re-reference some loose blob/tree/commit again, that
> the likelyhood of that in practice goes down as it ages, since a user is
> likely to e.g. re-push or rename some branch they pushed last week, not
> last year.
>
> Hence the mention of creating "unreferenced packs" with some new
> --keep-unreachable mode. Since we'd pack those together they wouldn't
> create the "ref explosion" problem we have with the loose refs, and thus
> you could afford to keep them longer (even though the deltas would be
> shittier).

Yeah, that may make it less likely (and we'd like those unreferenced
packs for other reasons anyway, so it's certainly worth a shot). But the
whole race is kind of unlikely in the first place. If you have enough
repositories, you see it eventually. ;)

-Peff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH 0/4] gc docs: modernize and fix the documentation
  2019-07-31  4:26             ` Jeff King
@ 2019-07-31 10:12               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 64+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-07-31 10:12 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Stefan Beller, Jonathan Nieder, Matt McCutchen


On Wed, Jul 31 2019, Jeff King wrote:

> On Fri, May 10, 2019 at 01:20:55AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Michael Haggerty and I have (off-list) discussed variations on that, but
>> > it opens up a lot of new issues.  Moving something into quarantine isn't
>> > atomic. So you've still corrupted the repo, but now it's recoverable by
>> > reaching into the quarantine. Who notices that the repo is corrupt, and
>> > how? When do we expire objects from quarantine?
>> >
>> > I think the heart of the issue is really the lack of atomicity in the
>> > operations. You need some way to mark "I am using this now" in a way
>> > that cannot race with "looks like nobody is using this, so I'll delete
>> > it".
>> >
>> > And ideally without traversing large bits of the graph on the writing
>> > side, and without requiring any stop-the-world locks during pruning.
>>
>> I was thinking (but realize now that I didn't articulate) that the "gc
>> quarantine" would be another "alternate" implementing a copy-on-write
>> "lockless delete-but-be-able-to-rollback scheme" as you put it.
>>
>> So "gc" would decide (racily) what's unreachable, but instead of
>> unlink()-ing it would "mv" the loose object/pack into the
>> "unreferenced-objects" quarantine.
>>
>> Then in your example #1 "wants to reference ABCD. It sees that we have
>> it." would race on the "other side". I.e. maybe ABCD was *just* moved to
>> the quarantine. But in that case we'd move it back, which would bump the
>> mtime and thus make it ineligible for expiry.
>
> I think this is basically the same as the current freshening scheme,
> though. In general, you can replace "move it back" with "update its
> mtime". Neither is atomic with respect to other operations.
>
> It does seem like the twist is that "gc" is supposed to do the "move it
> back" step (and it's also the thing expiring, if we assume that there's
> only one gc running at a time). But again, how do we know somebody isn't
> referencing it _right now_ while we're deciding whether to move it back?

The twist is to create a "quarantine" area of the ref store you can't
read any objects from without copying them to the "main" area (git-gc
itself would be an exception).

Hence step #2 and #6, respectively, in your examples in
https://public-inbox.org/git/20190319001829.GL29661@sigill.intra.peff.net/
would have update-ref/receive-pack fail to find "ABCD" in the "main"
store due to the exact same race we have now with mtimes & gc, then fall
back to the "quarantine" and (this is the important part) immediately
copy it back to the "main" store.

IOW yes, you'd have the exact same race you have now with the initial
move to the quarantine. You'd have ref updates & gc racing and
"unreachable" things would be moved to the quarantine, but really the
just became reachable again.

The difference is that instead of unlinking that unreachable object we
move it to the quarantine, so the next "gc" (which is what would delete
it) would notice it's reachable and move it to the "main" area before
proceeding, *and* anything that "faults" back to reading the
"quarantine" would do the same.

> I think there are lots of solutions you can come up with if you have
> atomicity. But fundamentally it isn't there in the way we handle updates
> now. You could imagine something like a shared/unique lock where anybody
> updating a ref takes the "shared" side, and multiple entities can hold
> it at once. But somebody pruning takes the "unique" side and excludes
> everybody else, stopping ref updates during the prune (which you'd
> obviously want to do in a way that you hold the lock for as short as
> possible; say, optimistically check reachability without the lock, then
> take the lock and check to see if anything has changed).
>
> (By shared/unique I basically mean a reader/writer lock, but I didn't
> want to use those terms in the paragraph since both holders are
> writing).
>
> It is tricky to find out when to hold the shared lock, though. It's
> _not_ just a ref write, for example. When you accept a push, you'd want
> to hold the lock while you are checking that you have all of the
> necessary objects to write the ref. For something like "git commit" it's
> even harder, because we implicitly rely on state created by commands run
> over the course of hours or days (e.g., "git add" to put a blob in the
> index and maybe create the tree via cache-tree, then a commit to
> reference it, and finally the ref write; each step adds state which the
> next step relies on).

I don't think this sort of approach would require any global locks, but
it would be vulnerable to operations that take longer than the
"main->quarantine->unlink()" cycle takes. E.g. a "hash-object" that
takes a month before the subsequent "write-tree" etc.

All of the above written with the previously stated "I may be missing
something" caveat etc. :)

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2019-07-31 10:12 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-18 16:14 [PATCH 0/4] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
2019-03-18 16:14 ` [PATCH 1/4] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
2019-03-18 21:27   ` Jeff King
2019-03-18 22:18     ` Ævar Arnfjörð Bjarmason
2019-03-18 16:15 ` [PATCH 2/4] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
2019-03-18 21:31   ` Jeff King
2019-03-21 22:11     ` Andreas Heiduk
2019-03-19  2:08   ` Duy Nguyen
2019-03-18 16:15 ` [PATCH 3/4] gc docs: de-duplicate "OPTIONS" and "CONFIGURATION" Ævar Arnfjörð Bjarmason
2019-03-18 21:49   ` Jeff King
2019-03-18 22:48     ` Ævar Arnfjörð Bjarmason
2019-03-18 23:42       ` Jeff King
2019-03-18 16:15 ` [PATCH 4/4] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
2019-03-18 20:28   ` Jonathan Nieder
2019-03-18 21:22     ` Jeff King
2019-03-18 22:13       ` Ævar Arnfjörð Bjarmason
2019-03-18 23:53         ` Jeff King
2019-03-19  6:54   ` Johannes Sixt
2019-03-19  9:28     ` Ævar Arnfjörð Bjarmason
2019-03-18 21:51 ` [PATCH 0/4] gc docs: modernize and fix the documentation Jeff King
2019-03-18 22:45   ` Ævar Arnfjörð Bjarmason
2019-03-19  0:18     ` Jeff King
2019-05-06  9:44       ` Ævar Arnfjörð Bjarmason
2019-05-07  7:51         ` Jeff King
2019-05-09 23:20           ` Ævar Arnfjörð Bjarmason
2019-07-31  4:26             ` Jeff King
2019-07-31 10:12               ` Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 00/10] " Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 00/11] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 01/11] " Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 02/11] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 03/11] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
2019-03-30 18:04     ` Todd Zullinger
2019-04-07 19:52     ` [PATCH v4 00/11] gc docs: modernize and fix the documentation Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 01/11] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 02/11] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 03/11] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 04/11] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 05/11] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 06/11] gc docs: fix formatting for "gc.writeCommitGraph" Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 07/11] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 08/11] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 09/11] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
2019-04-07 19:52     ` [PATCH v4 11/11] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 05/11] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 06/11] gc docs: fix formatting for "gc.writeCommitGraph" Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 07/11] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 08/11] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 09/11] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 10/11] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
2019-03-22  9:32   ` [PATCH v3 11/11] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 01/10] gc docs: modernize the advice for manually running "gc" Ævar Arnfjörð Bjarmason
2019-03-22  6:01   ` Junio C Hamano
2019-03-21 20:50 ` [PATCH v2 02/10] gc docs: stop noting "repack" flags Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 03/10] gc docs: clean grammar for "gc.bigPackThreshold" Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 04/10] gc docs: include the "gc.*" section from "config" in "gc" Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 05/10] gc docs: re-flow the "gc.*" section in "config" Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 06/10] gc docs: note how --aggressive impacts --window & --depth Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 07/10] gc docs: downplay the usefulness of --aggressive Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 08/10] gc docs: note "gc --aggressive" in "fast-import" Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 09/10] gc docs: clarify that "gc" doesn't throw away referenced objects Ævar Arnfjörð Bjarmason
2019-03-21 20:50 ` [PATCH v2 10/10] gc docs: remove incorrect reference to gc.auto=0 Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).