git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] config: set pack.useSparse=true by default
@ 2020-03-19  1:58 Derrick Stolee via GitGitGadget
  2020-03-19 23:13 ` Jonathan Nieder
  2020-03-20 12:27 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  0 siblings, 2 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-19  1:58 UTC (permalink / raw)
  To: git; +Cc: me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The pack.useSparse config option was introduced by 3d036eb0
(pack-objects: create pack.useSparse setting, 2019-01-19) and was
first available in v2.21.0. When enabled, the pack-objects process
during 'git push' will use a sparse tree walk when deciding which
trees and blobs to send to the remote. The algorithm was introduced
by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
has been in production use by VFS for Git since around that time.
The feature.experimental config option also enabled pack.useSparse,
so hopefully that has also increased exposure.

It is worth noting that pack.useSparse may end up
sending more objects than necessary across a push, but only with a special
arrangement of exact _copies_ across directories. There is a test
in t5322-pack-objects-sparse.sh that demonstrates this possibility.

Since the downside is unlikely but the upside is significant, set
the default value of pack.useSparse to true. Remove it from the
set of options implied by feature.experimental.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    config: set pack.useSparse=true by default
    
    Here is a small patch to convert pack.useSparse to true by default. It's
    been released for over a year, so the feature is quite stable. I'm
    submitting this now to allow it to cook for a while during the next
    release cycle.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-585%2Fderrickstolee%2Fpack-use-sparse-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-585/derrickstolee/pack-use-sparse-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/585

 Documentation/config/feature.txt | 3 ---
 Documentation/config/pack.txt    | 4 ++--
 repo-settings.c                  | 3 ++-
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 875f8c8a66f..4e3a5c0cebc 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -12,9 +12,6 @@ feature.experimental::
 	setting if you are interested in providing feedback on experimental
 	features. The new default values are:
 +
-* `pack.useSparse=true` uses a new algorithm when constructing a pack-file
-which can improve `git push` performance in repos with many files.
-+
 * `fetch.negotiationAlgorithm=skipping` may improve fetch negotiation times by
 skipping more commits at a time, reducing the number of round trips.
 +
diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 0dac5805816..837f1b16792 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -119,8 +119,8 @@ pack.useSparse::
 	objects. This can have significant performance benefits when
 	computing a pack to send a small change. However, it is possible
 	that extra objects are added to the pack-file if the included
-	commits contain certain types of direct renames. Default is `false`
-	unless `feature.experimental` is enabled.
+	commits contain certain types of direct renames. Default is
+	`true`.
 
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
diff --git a/repo-settings.c b/repo-settings.c
index a703e407a3f..dc6817daa95 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -45,6 +45,8 @@ void prepare_repo_settings(struct repository *r)
 
 	if (!repo_config_get_bool(r, "pack.usesparse", &value))
 		r->settings.pack_use_sparse = value;
+	UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
+
 	if (!repo_config_get_bool(r, "feature.manyfiles", &value) && value) {
 		UPDATE_DEFAULT_BOOL(r->settings.index_version, 4);
 		UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_WRITE);
@@ -52,7 +54,6 @@ void prepare_repo_settings(struct repository *r)
 	if (!repo_config_get_bool(r, "fetch.writecommitgraph", &value))
 		r->settings.fetch_write_commit_graph = value;
 	if (!repo_config_get_bool(r, "feature.experimental", &value) && value) {
-		UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
 	}

base-commit: 6c85aac65fb455af85745130ce35ddae4678db84
-- 
gitgitgadget


* Re: [PATCH] config: set pack.useSparse=true by default
  2020-03-19  1:58 [PATCH] config: set pack.useSparse=true by default Derrick Stolee via GitGitGadget
@ 2020-03-19 23:13 ` Jonathan Nieder
  2020-03-20  0:34   ` Derrick Stolee
  2020-03-20 12:27 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Nieder @ 2020-03-19 23:13 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee

Hi,

Derrick Stolee wrote:

> The pack.useSparse config option was introduced by 3d036eb0
> (pack-objects: create pack.useSparse setting, 2019-01-19) and was
> first available in v2.21.0. When enabled, the pack-objects process
> during 'git push' will use a sparse tree walk when deciding which
> trees and blobs to send to the remote. The algorithm was introduced
> by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
> has been in production use by VFS for Git since around that time.
> The feature.experimental config option also enabled pack.useSparse,
> so hopefully that has also increased exposure.
>
> It is worth noting that pack.useSparse may end up
> sending more objects than necessary across a push, but only with a special
> arrangement of exact _copies_ across directories. There is a test
> in t5322-pack-objects-sparse.sh that demonstrates this possibility.
>
> Since the downside is unlikely but the upside is significant, set
> the default value of pack.useSparse to true. Remove it from the
> set of options implied by feature.experimental.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config/feature.txt | 3 ---
>  Documentation/config/pack.txt    | 4 ++--
>  repo-settings.c                  | 3 ++-
>  3 files changed, 4 insertions(+), 6 deletions(-)

Makes sense.  Thanks for writing it.

Should this have a test?

[...]
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -45,6 +45,8 @@ void prepare_repo_settings(struct repository *r)
>  
>  	if (!repo_config_get_bool(r, "pack.usesparse", &value))
>  		r->settings.pack_use_sparse = value;
> +	UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
> +
>  	if (!repo_config_get_bool(r, "feature.manyfiles", &value) && value) {
>  		UPDATE_DEFAULT_BOOL(r->settings.index_version, 4);
>  		UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_WRITE);
> @@ -52,7 +54,6 @@ void prepare_repo_settings(struct repository *r)
>  	if (!repo_config_get_bool(r, "fetch.writecommitgraph", &value))
>  		r->settings.fetch_write_commit_graph = value;
>  	if (!repo_config_get_bool(r, "feature.experimental", &value) && value) {
> -		UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
>  		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
>  		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
>  	}
> 

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>


* Re: [PATCH] config: set pack.useSparse=true by default
  2020-03-19 23:13 ` Jonathan Nieder
@ 2020-03-20  0:34   ` Derrick Stolee
  0 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2020-03-20  0:34 UTC (permalink / raw)
  To: Jonathan Nieder, Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee

On 3/19/2020 7:13 PM, Jonathan Nieder wrote:
> Hi,
> 
> Derrick Stolee wrote:
> 
>> The pack.useSparse config option was introduced by 3d036eb0
>> (pack-objects: create pack.useSparse setting, 2019-01-19) and was
>> first available in v2.21.0. When enabled, the pack-objects process
>> during 'git push' will use a sparse tree walk when deciding which
>> trees and blobs to send to the remote. The algorithm was introduced
>> by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
>> has been in production use by VFS for Git since around that time.
>> The feature.experimental config option also enabled pack.useSparse,
>> so hopefully that has also increased exposure.
>>
>> It is worth noting that pack.useSparse may end up
>> sending more objects than necessary across a push, but only with a special
>> arrangement of exact _copies_ across directories. There is a test
>> in t5322-pack-objects-sparse.sh that demonstrates this possibility.
>>
>> Since the downside is unlikely but the upside is significant, set
>> the default value of pack.useSparse to true. Remove it from the
>> set of options implied by feature.experimental.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  Documentation/config/feature.txt | 3 ---
>>  Documentation/config/pack.txt    | 4 ++--
>>  repo-settings.c                  | 3 ++-
>>  3 files changed, 4 insertions(+), 6 deletions(-)
> 
> Makes sense.  Thanks for writing it.
> 
> Should this have a test?

I suppose the test that demonstrates the difference between the algorithms
in t5322-pack-objects-sparse.sh could be adjusted to drop the
explicit config setting, which would demonstrate that the config
option is being set correctly.

While looking at that test, I see that we use --[no-]sparse
explicitly everywhere to avoid conflicts with the GIT_TEST_*
variable that enables the algorithm. This leads to two things
I will do in v2 that I did not do here:

1. Update the docs for "git pack-objects" because they don't
   mention that --no-sparse is an option. Point out that the
   new default is --sparse.

2. Remove GIT_TEST_PACK_SPARSE which was used to test this sparse
   algorithm throughout the test suite.

Thanks,
-Stolee


* [PATCH v2] config: set pack.useSparse=true by default
  2020-03-19  1:58 [PATCH] config: set pack.useSparse=true by default Derrick Stolee via GitGitGadget
  2020-03-19 23:13 ` Jonathan Nieder
@ 2020-03-20 12:27 ` Derrick Stolee via GitGitGadget
  2020-03-20 12:38   ` [PATCH v3 0/2] " Derrick Stolee via GitGitGadget
  2020-03-20 20:43   ` [PATCH v2] config: set pack.useSparse=true by default Junio C Hamano
  1 sibling, 2 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-20 12:27 UTC (permalink / raw)
  To: git; +Cc: me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The pack.useSparse config option was introduced by 3d036eb0
(pack-objects: create pack.useSparse setting, 2019-01-19) and was
first available in v2.21.0. When enabled, the pack-objects process
during 'git push' will use a sparse tree walk when deciding which
trees and blobs to send to the remote. The algorithm was introduced
by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
has been in production use by VFS for Git since around that time.
The feature.experimental config option also enabled pack.useSparse,
so hopefully that has also increased exposure.

It is worth noting that pack.useSparse may end up
sending more objects than necessary across a push, but only with a special
arrangement of exact _copies_ across directories. There is a test
in t5322-pack-objects-sparse.sh that demonstrates this possibility.
This test uses the --sparse option to "git pack-objects" but we
can make it implied by the config value to demonstrate that the
default value has changed.

While updating that test, I noticed that the documentation did not
include an option for --no-sparse, which is now more important than
it was before.

Since the downside is unlikely but the upside is significant, set
the default value of pack.useSparse to true. Remove it from the
set of options implied by feature.experimental.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    config: set pack.useSparse=true by default
    
    Here is a small patch to convert pack.useSparse to true by default. It's
    been released for over a year, so the feature is quite stable. I'm
    submitting this now to allow it to cook for a while during the next
    release cycle.
    
    Thanks to Jonathan Nieder for pointing out the test implications, I've
    added a patch to swap the role of GIT_TEST_PACK_SPARSE to test the other
    mode.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-585%2Fderrickstolee%2Fpack-use-sparse-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-585/derrickstolee/pack-use-sparse-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/585

Range-diff vs v1:

 1:  02e9a813126 ! 1:  60b5cc6f337 config: set pack.useSparse=true by default
     @@ -16,6 +16,13 @@
          sending more objects than necessary across a push, but only with a special
          arrangement of exact _copies_ across directories. There is a test
          in t5322-pack-objects-sparse.sh that demonstrates this possibility.
     +    This test uses the --sparse option to "git pack-objects" but we
     +    can make it implied by the config value to demonstrate that the
     +    default value has changed.
     +
     +    While updating that test, I noticed that the documentation did not
     +    include an option for --no-sparse, which is now more important than
     +    it was before.
      
          Since the downside is unlikely but the upside is significant, set
          the default value of pack.useSparse to true. Remove it from the
     @@ -52,6 +59,39 @@
       pack.writeBitmaps (deprecated)::
       	This is a deprecated synonym for `repack.writeBitmaps`.
      
     + diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
     + --- a/Documentation/git-pack-objects.txt
     + +++ b/Documentation/git-pack-objects.txt
     +@@
     + 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
     + 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
     + 	[--stdout [--filter=<filter-spec>] | base-name]
     +-	[--shallow] [--keep-true-parents] [--sparse] < object-list
     ++	[--shallow] [--keep-true-parents] [--[no-]sparse] < object-list
     + 
     + 
     + DESCRIPTION
     +@@
     + 	Add --no-reuse-object if you want to force a uniform compression
     + 	level on all data no matter the source.
     + 
     +---sparse::
     +-	Use the "sparse" algorithm to determine which objects to include in
     ++--[no-]sparse::
     ++	Toggle the "sparse" algorithm to determine which objects to include in
     + 	the pack, when combined with the "--revs" option. This algorithm
     + 	only walks trees that appear in paths that introduce new objects.
     + 	This can have significant performance benefits when computing
     + 	a pack to send a small change. However, it is possible that extra
     + 	objects are added to the pack-file if the included commits contain
     +-	certain types of direct renames.
     ++	certain types of direct renames. If this option is not included,
     ++	it defaults to the value of `pack.useSparse`, which is true unless
     ++	otherwise specified.
     + 
     + --thin::
     + 	Create a "thin" pack by omitting the common objects between a
     +
       diff --git a/repo-settings.c b/repo-settings.c
       --- a/repo-settings.c
       +++ b/repo-settings.c
     @@ -72,3 +112,24 @@
       		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
       		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
       	}
     +
     + diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
     + --- a/t/t5322-pack-objects-sparse.sh
     + +++ b/t/t5322-pack-objects-sparse.sh
     +@@
     + 	test_cmp required_objects.txt nonsparse_required_objects.txt
     + '
     + 
     ++# --sparse is enabled by default by pack.useSparse
     + test_expect_success 'sparse pack-objects' '
     + 	git rev-parse			\
     + 		topic1			\
     +@@
     + 		topic1:f3		\
     + 		topic1:f3/f4		\
     + 		topic1:f3/f4/data.txt | sort >expect_sparse_objects.txt &&
     +-	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
     ++	git pack-objects --stdout --revs <packinput.txt >sparse.pack &&
     + 	git index-pack -o sparse.idx sparse.pack &&
     + 	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
     + 	test_cmp expect_sparse_objects.txt sparse_objects.txt


 Documentation/config/feature.txt   |  3 ---
 Documentation/config/pack.txt      |  4 ++--
 Documentation/git-pack-objects.txt | 10 ++++++----
 repo-settings.c                    |  3 ++-
 t/t5322-pack-objects-sparse.sh     |  3 ++-
 5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 875f8c8a66f..4e3a5c0cebc 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -12,9 +12,6 @@ feature.experimental::
 	setting if you are interested in providing feedback on experimental
 	features. The new default values are:
 +
-* `pack.useSparse=true` uses a new algorithm when constructing a pack-file
-which can improve `git push` performance in repos with many files.
-+
 * `fetch.negotiationAlgorithm=skipping` may improve fetch negotiation times by
 skipping more commits at a time, reducing the number of round trips.
 +
diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 0dac5805816..837f1b16792 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -119,8 +119,8 @@ pack.useSparse::
 	objects. This can have significant performance benefits when
 	computing a pack to send a small change. However, it is possible
 	that extra objects are added to the pack-file if the included
-	commits contain certain types of direct renames. Default is `false`
-	unless `feature.experimental` is enabled.
+	commits contain certain types of direct renames. Default is
+	`true`.
 
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index fecdf2600cc..eaa2f2a4041 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
 	[--stdout [--filter=<filter-spec>] | base-name]
-	[--shallow] [--keep-true-parents] [--sparse] < object-list
+	[--shallow] [--keep-true-parents] [--[no-]sparse] < object-list
 
 
 DESCRIPTION
@@ -196,14 +196,16 @@ depth is 4095.
 	Add --no-reuse-object if you want to force a uniform compression
 	level on all data no matter the source.
 
---sparse::
-	Use the "sparse" algorithm to determine which objects to include in
+--[no-]sparse::
+	Toggle the "sparse" algorithm to determine which objects to include in
 	the pack, when combined with the "--revs" option. This algorithm
 	only walks trees that appear in paths that introduce new objects.
 	This can have significant performance benefits when computing
 	a pack to send a small change. However, it is possible that extra
 	objects are added to the pack-file if the included commits contain
-	certain types of direct renames.
+	certain types of direct renames. If this option is not included,
+	it defaults to the value of `pack.useSparse`, which is true unless
+	otherwise specified.
 
 --thin::
 	Create a "thin" pack by omitting the common objects between a
diff --git a/repo-settings.c b/repo-settings.c
index a703e407a3f..dc6817daa95 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -45,6 +45,8 @@ void prepare_repo_settings(struct repository *r)
 
 	if (!repo_config_get_bool(r, "pack.usesparse", &value))
 		r->settings.pack_use_sparse = value;
+	UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
+
 	if (!repo_config_get_bool(r, "feature.manyfiles", &value) && value) {
 		UPDATE_DEFAULT_BOOL(r->settings.index_version, 4);
 		UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_WRITE);
@@ -52,7 +54,6 @@ void prepare_repo_settings(struct repository *r)
 	if (!repo_config_get_bool(r, "fetch.writecommitgraph", &value))
 		r->settings.fetch_write_commit_graph = value;
 	if (!repo_config_get_bool(r, "feature.experimental", &value) && value) {
-		UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
 	}
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 7124b5581a0..6e5d6bdb0a7 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -105,6 +105,7 @@ test_expect_success 'non-sparse pack-objects' '
 	test_cmp required_objects.txt nonsparse_required_objects.txt
 '
 
+# --sparse is enabled by default by pack.useSparse
 test_expect_success 'sparse pack-objects' '
 	git rev-parse			\
 		topic1			\
@@ -112,7 +113,7 @@ test_expect_success 'sparse pack-objects' '
 		topic1:f3		\
 		topic1:f3/f4		\
 		topic1:f3/f4/data.txt | sort >expect_sparse_objects.txt &&
-	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git pack-objects --stdout --revs <packinput.txt >sparse.pack &&
 	git index-pack -o sparse.idx sparse.pack &&
 	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
 	test_cmp expect_sparse_objects.txt sparse_objects.txt

base-commit: 6c85aac65fb455af85745130ce35ddae4678db84
-- 
gitgitgadget


* [PATCH v3 0/2] config: set pack.useSparse=true by default
  2020-03-20 12:27 ` [PATCH v2] " Derrick Stolee via GitGitGadget
@ 2020-03-20 12:38   ` Derrick Stolee via GitGitGadget
  2020-03-20 12:38     ` [PATCH v3 1/2] " Derrick Stolee via GitGitGadget
  2020-03-20 12:38     ` [PATCH v3 2/2] pack-objects: flip the use of GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
  2020-03-20 20:43   ` [PATCH v2] config: set pack.useSparse=true by default Junio C Hamano
  1 sibling, 2 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-20 12:38 UTC (permalink / raw)
  To: git; +Cc: me, jrnieder, Derrick Stolee

Here is a small patch to convert pack.useSparse to true by default. It's
been released for over a year, so the feature is quite stable. I'm
submitting this now to allow it to cook for a while during the next release
cycle.

UPDATE IN V3: (Sorry for the rapid v3; I forgot to push the commit that
includes this change to GIT_TEST_PACK_SPARSE.)

Thanks to Jonathan Nieder for pointing out the test implications, I've added
a patch to swap the role of GIT_TEST_PACK_SPARSE to test the other mode.

Thanks, -Stolee

Derrick Stolee (2):
  config: set pack.useSparse=true by default
  pack-objects: flip the use of GIT_TEST_PACK_SPARSE

 Documentation/config/feature.txt   |  3 ---
 Documentation/config/pack.txt      |  4 ++--
 Documentation/git-pack-objects.txt | 10 ++++++----
 builtin/pack-objects.c             |  4 ++--
 repo-settings.c                    |  3 ++-
 t/README                           |  6 +++---
 t/t5322-pack-objects-sparse.sh     |  4 +++-
 7 files changed, 18 insertions(+), 16 deletions(-)


base-commit: 6c85aac65fb455af85745130ce35ddae4678db84
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-585%2Fderrickstolee%2Fpack-use-sparse-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-585/derrickstolee/pack-use-sparse-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/585

Range-diff vs v2:

 1:  60b5cc6f337 = 1:  60b5cc6f337 config: set pack.useSparse=true by default
 -:  ----------- > 2:  908d5c77c96 pack-objects: flip the use of GIT_TEST_PACK_SPARSE

-- 
gitgitgadget


* [PATCH v3 1/2] config: set pack.useSparse=true by default
  2020-03-20 12:38   ` [PATCH v3 0/2] " Derrick Stolee via GitGitGadget
@ 2020-03-20 12:38     ` Derrick Stolee via GitGitGadget
  2020-03-20 12:38     ` [PATCH v3 2/2] pack-objects: flip the use of GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-20 12:38 UTC (permalink / raw)
  To: git; +Cc: me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The pack.useSparse config option was introduced by 3d036eb0
(pack-objects: create pack.useSparse setting, 2019-01-19) and was
first available in v2.21.0. When enabled, the pack-objects process
during 'git push' will use a sparse tree walk when deciding which
trees and blobs to send to the remote. The algorithm was introduced
by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
has been in production use by VFS for Git since around that time.
The feature.experimental config option also enabled pack.useSparse,
so hopefully that has also increased exposure.

It is worth noting that pack.useSparse may end up
sending more objects than necessary across a push, but only with a special
arrangement of exact _copies_ across directories. There is a test
in t5322-pack-objects-sparse.sh that demonstrates this possibility.
This test uses the --sparse option to "git pack-objects" but we
can make it implied by the config value to demonstrate that the
default value has changed.

While updating that test, I noticed that the documentation did not
include an option for --no-sparse, which is now more important than
it was before.

Since the downside is unlikely but the upside is significant, set
the default value of pack.useSparse to true. Remove it from the
set of options implied by feature.experimental.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/feature.txt   |  3 ---
 Documentation/config/pack.txt      |  4 ++--
 Documentation/git-pack-objects.txt | 10 ++++++----
 repo-settings.c                    |  3 ++-
 t/t5322-pack-objects-sparse.sh     |  3 ++-
 5 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 875f8c8a66f..4e3a5c0cebc 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -12,9 +12,6 @@ feature.experimental::
 	setting if you are interested in providing feedback on experimental
 	features. The new default values are:
 +
-* `pack.useSparse=true` uses a new algorithm when constructing a pack-file
-which can improve `git push` performance in repos with many files.
-+
 * `fetch.negotiationAlgorithm=skipping` may improve fetch negotiation times by
 skipping more commits at a time, reducing the number of round trips.
 +
diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 0dac5805816..837f1b16792 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -119,8 +119,8 @@ pack.useSparse::
 	objects. This can have significant performance benefits when
 	computing a pack to send a small change. However, it is possible
 	that extra objects are added to the pack-file if the included
-	commits contain certain types of direct renames. Default is `false`
-	unless `feature.experimental` is enabled.
+	commits contain certain types of direct renames. Default is
+	`true`.
 
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index fecdf2600cc..eaa2f2a4041 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
 	[--stdout [--filter=<filter-spec>] | base-name]
-	[--shallow] [--keep-true-parents] [--sparse] < object-list
+	[--shallow] [--keep-true-parents] [--[no-]sparse] < object-list
 
 
 DESCRIPTION
@@ -196,14 +196,16 @@ depth is 4095.
 	Add --no-reuse-object if you want to force a uniform compression
 	level on all data no matter the source.
 
---sparse::
-	Use the "sparse" algorithm to determine which objects to include in
+--[no-]sparse::
+	Toggle the "sparse" algorithm to determine which objects to include in
 	the pack, when combined with the "--revs" option. This algorithm
 	only walks trees that appear in paths that introduce new objects.
 	This can have significant performance benefits when computing
 	a pack to send a small change. However, it is possible that extra
 	objects are added to the pack-file if the included commits contain
-	certain types of direct renames.
+	certain types of direct renames. If this option is not included,
+	it defaults to the value of `pack.useSparse`, which is true unless
+	otherwise specified.
 
 --thin::
 	Create a "thin" pack by omitting the common objects between a
diff --git a/repo-settings.c b/repo-settings.c
index a703e407a3f..dc6817daa95 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -45,6 +45,8 @@ void prepare_repo_settings(struct repository *r)
 
 	if (!repo_config_get_bool(r, "pack.usesparse", &value))
 		r->settings.pack_use_sparse = value;
+	UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
+
 	if (!repo_config_get_bool(r, "feature.manyfiles", &value) && value) {
 		UPDATE_DEFAULT_BOOL(r->settings.index_version, 4);
 		UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_WRITE);
@@ -52,7 +54,6 @@ void prepare_repo_settings(struct repository *r)
 	if (!repo_config_get_bool(r, "fetch.writecommitgraph", &value))
 		r->settings.fetch_write_commit_graph = value;
 	if (!repo_config_get_bool(r, "feature.experimental", &value) && value) {
-		UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
 	}
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 7124b5581a0..6e5d6bdb0a7 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -105,6 +105,7 @@ test_expect_success 'non-sparse pack-objects' '
 	test_cmp required_objects.txt nonsparse_required_objects.txt
 '
 
+# --sparse is enabled by default by pack.useSparse
 test_expect_success 'sparse pack-objects' '
 	git rev-parse			\
 		topic1			\
@@ -112,7 +113,7 @@ test_expect_success 'sparse pack-objects' '
 		topic1:f3		\
 		topic1:f3/f4		\
 		topic1:f3/f4/data.txt | sort >expect_sparse_objects.txt &&
-	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
+	git pack-objects --stdout --revs <packinput.txt >sparse.pack &&
 	git index-pack -o sparse.idx sparse.pack &&
 	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
 	test_cmp expect_sparse_objects.txt sparse_objects.txt
-- 
gitgitgadget



* [PATCH v3 2/2] pack-objects: flip the use of GIT_TEST_PACK_SPARSE
  2020-03-20 12:38   ` [PATCH v3 0/2] " Derrick Stolee via GitGitGadget
  2020-03-20 12:38     ` [PATCH v3 1/2] " Derrick Stolee via GitGitGadget
@ 2020-03-20 12:38     ` Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-03-20 12:38 UTC (permalink / raw)
  To: git; +Cc: me, jrnieder, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The environment variable GIT_TEST_PACK_SPARSE was previously used
to allow testing the --sparse option for "git pack-objects" in
the test suite. This allowed interesting cases of "git push" to
also test this algorithm.

Since pack.useSparse is now true by default, we do not need this
variable to _enable_ the --sparse option, but instead to _disable_
it. This flips how we work with the variable a bit.

When checking for the variable, default to a value of -1 for
"unset". If unset, then take the default from the repo settings,
which is currently 1. Then, the --[no-]sparse command-line option
will override either of these settings.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/pack-objects.c         | 4 ++--
 t/README                       | 6 +++---
 t/t5322-pack-objects-sparse.sh | 1 +
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 02aa6ee4808..eff9542f09f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3469,9 +3469,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 
-	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", 0);
+	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
 	prepare_repo_settings(the_repository);
-	if (!sparse && the_repository->settings.pack_use_sparse != -1)
+	if (sparse < 0)
 		sparse = the_repository->settings.pack_use_sparse;
 
 	reset_pack_idx_option(&pack_idx_opts);
diff --git a/t/README b/t/README
index 9afd61e3ca0..99ebb18829f 100644
--- a/t/README
+++ b/t/README
@@ -386,9 +386,9 @@ GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
 for the index version specified.  Can be set to any valid version
 (currently 2, 3, or 4).
 
-GIT_TEST_PACK_SPARSE=<boolean> if enabled will default the pack-objects
-builtin to use the sparse object walk. This can still be overridden by
-the --no-sparse command-line argument.
+GIT_TEST_PACK_SPARSE=<boolean> if disabled will default the pack-objects
+builtin to use the non-sparse object walk. This can still be overridden by
+the --sparse command-line argument.
 
 GIT_TEST_PRELOAD_INDEX=<boolean> exercises the preload-index code path
 by overriding the minimum number of cache entries required per thread.
diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
index 6e5d6bdb0a7..a581eaf5293 100755
--- a/t/t5322-pack-objects-sparse.sh
+++ b/t/t5322-pack-objects-sparse.sh
@@ -107,6 +107,7 @@ test_expect_success 'non-sparse pack-objects' '
 
 # --sparse is enabled by default by pack.useSparse
 test_expect_success 'sparse pack-objects' '
+	GIT_TEST_PACK_SPARSE=-1 &&
 	git rev-parse			\
 		topic1			\
 		topic1^{tree}		\
-- 
gitgitgadget


* Re: [PATCH v2] config: set pack.useSparse=true by default
  2020-03-20 12:27 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  2020-03-20 12:38   ` [PATCH v3 0/2] " Derrick Stolee via GitGitGadget
@ 2020-03-20 20:43   ` Junio C Hamano
  2020-03-20 21:14     ` Derrick Stolee
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2020-03-20 20:43 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, jrnieder, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>     Here is a small patch to convert pack.useSparse to true by default. It's
>     been released for over a year, so the feature is quite stable.

I would not say anything more than "it's been released for over a
year, so the feature is known not to cause problems when it is not
enabled (in other words, we coded our if/else correctly)", unless
some telemetry tells us that a significant number of users with widely
differing use patterns have enabled it and are not seeing much
negative effect.  And we can tell if it is stable only if we flip
the default.

> I'm submitting this now to allow it to cook for a while during the
> next release cycle.

I agree that it is about time to see if flipping of default would be
a good move for users whose usage patterns are unlike VFS for Git by
cooking a change like this in 'next', definitely at least a cycle
but possibly a bit more, and it is a good idea to have it at the
beginning of the next cycle.  Very much appreciated.

Thanks.


* Re: [PATCH v2] config: set pack.useSparse=true by default
  2020-03-20 20:43   ` [PATCH v2] config: set pack.useSparse=true by default Junio C Hamano
@ 2020-03-20 21:14     ` Derrick Stolee
  0 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2020-03-20 21:14 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, jrnieder, Derrick Stolee

On 3/20/2020 4:43 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>     Here is a small patch to convert pack.useSparse to true by default. It's
>>     been released for over a year, so the feature is quite stable.
> 
> I would not say anything more than "it's been released for over a
> year, so the feature is known not to cause problems when it is not
> enabled (in other words, we coded our if/else correctly)", unless
> some telemetry tells us that a significant number of users with widely
> differing use patterns have enabled it and are not seeing much
> negative effect.  And we can tell if it is stable only if we flip
> the default.

True. I've done my best to advertise the feature but have heard very
little from users about it outside of our "captive audience" in the
Windows OS team. I've also been using it myself (as part of
feature.experimental) on all of my machines, but that's hardly a
vote for rigorous use in strange patterns.

>> I'm submitting this now to allow it to cook for a while during the
>> next release cycle.
> 
> I agree that it is about time to see if flipping of default would be
> a good move for users whose usage patterns are unlike VFS for Git by
> cooking a change like this in 'next', definitely at least a cycle
> but possibly a bit more, and it is a good idea to have it at the
> beginning of the next cycle.  Very much appreciated.

I'm happy to let you decide when this has cooked long enough. If
you feel that more than one cycle is needed, then absolutely let's
be cautious here.

Thanks,
-Stolee


Thread overview: 9+ messages
2020-03-19  1:58 [PATCH] config: set pack.useSparse=true by default Derrick Stolee via GitGitGadget
2020-03-19 23:13 ` Jonathan Nieder
2020-03-20  0:34   ` Derrick Stolee
2020-03-20 12:27 ` [PATCH v2] " Derrick Stolee via GitGitGadget
2020-03-20 12:38   ` [PATCH v3 0/2] " Derrick Stolee via GitGitGadget
2020-03-20 12:38     ` [PATCH v3 1/2] " Derrick Stolee via GitGitGadget
2020-03-20 12:38     ` [PATCH v3 2/2] pack-objects: flip the use of GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
2020-03-20 20:43   ` [PATCH v2] config: set pack.useSparse=true by default Junio C Hamano
2020-03-20 21:14     ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git