git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] doc: mention bigFileThreshold for packing
@ 2021-02-09 19:07 Christian Walther via GitGitGadget
  2021-02-09 21:50 ` Junio C Hamano
  2021-02-21 13:23 ` [PATCH v2] " Christian Walther via GitGitGadget
  0 siblings, 2 replies; 5+ messages in thread
From: Christian Walther via GitGitGadget @ 2021-02-09 19:07 UTC
  To: git; +Cc: Christian Walther, Christian Walther

From: Christian Walther <cwalther@gmx.ch>

Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.

Signed-off-by: Christian Walther <cwalther@gmx.ch>
---
    doc: mention bigFileThreshold for packing
    
    I recently spent a lot of time trying to figure out why git repack would
    create huge packs on some clones of my repository and small ones on
    others, until I found out about the existence of the
    core.bigFileThreshold configuration variable, which happened to be set
    on some and not on others. It would have saved me a lot of time if that
    variable had been mentioned in the relevant manpages that I was reading,
    git-repack and git-pack-objects. So this patch adds that.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-872%2Fcwalther%2Fdeltadoc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-872/cwalther/deltadoc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/872

 Documentation/git-pack-objects.txt | 4 ++++
 Documentation/git-repack.txt       | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 54d715ead137..59150ded4bef 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -97,6 +97,10 @@ base-name::
 	side, because delta data needs to be applied that many
 	times to get to the necessary object.
 +
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see
+linkgit:git-config[1]).
++
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.
 
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 92f146d27dc3..0a7038ec4ad8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -96,6 +96,10 @@ to the new separate pack will be written.
 	affects the performance on the unpacker side, because delta data needs
 	to be applied that many times to get to the necessary object.
 +
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see
+linkgit:git-config[1]).
++
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.
 

base-commit: fb7fa4a1fd273f22efcafdd13c7f897814fd1eb9
-- 
gitgitgadget

* Re: [PATCH] doc: mention bigFileThreshold for packing
  2021-02-09 19:07 [PATCH] doc: mention bigFileThreshold for packing Christian Walther via GitGitGadget
@ 2021-02-09 21:50 ` Junio C Hamano
  2021-02-10 21:43   ` Christian Walther
  2021-02-21 13:23 ` [PATCH v2] " Christian Walther via GitGitGadget
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2021-02-09 21:50 UTC
  To: Christian Walther via GitGitGadget; +Cc: git, Christian Walther

"Christian Walther via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Christian Walther <cwalther@gmx.ch>
>
> Knowing about the core.bigFileThreshold configuration variable is
> helpful when examining pack file size differences between repositories.
> Add a reference to it to the manpages a user is likely to read in this
> situation.

Thanks.

I doubt that the description of --window/--depth command line
options, for both repack and pack-objects, is the best place to add
this "Note".  Even if we were to add it as an appendix to these
places, please do not break the flow of explanation by inserting it
before the description of the default values of these options.

>     I recently spent a lot of time trying to figure out why git repack would
>     create huge packs on some clones of my repository and small ones on
>     others, until I found out about the existence of the
>     core.bigFileThreshold configuration variable, which happened to be set
>     on some and not on others. It would have saved me a lot of time if that
>     variable had been mentioned in the relevant manpages that I was reading,
>     git-repack and git-pack-objects. So this patch adds that.

Not related to the contents of the patch, but I am somewhat curious
to know what configuration resulted in the "huge" ones and "small"
ones.  Documentation/config/core.txt::core.bigFileThreshold may be
helped by addition of a success story, and the configuration for the
"small" ones may be a good place to start.

Thanks

* Re: [PATCH] doc: mention bigFileThreshold for packing
  2021-02-09 21:50 ` Junio C Hamano
@ 2021-02-10 21:43   ` Christian Walther
  2021-02-10 22:19     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Walther @ 2021-02-10 21:43 UTC
  To: Junio C Hamano; +Cc: Christian Walther via GitGitGadget, git

Junio C Hamano wrote:

> I doubt that the description of --window/--depth command line
> options, for both repack and pack-objects, is the best place to add
> this "Note".  Even if we were to add it as an appendix to these
> places, please do not break the flow of explanation by inserting it
> before the description of the default values of these options.

OK. That was where I would have looked for it, because it explains
why --window wasn't effective in my attempts to get better
compression, but I don't insist on it - any place would have
worked, as I read both manpages back and forth several times.

In git-repack.txt, there is a "Configuration" section at the
bottom, I guess it would fit there? There is none in
git-pack-objects.txt, but I could add it. What do you think?


>>    I recently spent a lot of time trying to figure out why git repack would
>>    create huge packs on some clones of my repository and small ones on
>>    others
> 
> Not related to the contents of the patch, but I am somewhat curious
> to know what configuration resulted in the "huge" ones and "small"
> ones.  Documentation/config/core.txt::core.bigFileThreshold may be
> helped by addition of a success story, and the configuration for the
> "small" ones may be a good place to start.

The "huge" repository had bigFileThreshold = 1m. That was set by SubGit
when converting from Subversion, for reasons unknown to me (see some
discussion at https://support.tmatesoft.com/t/reduce-repository-size/2551
and https://issues.tmatesoft.com/issue/SGT-604). The result is a pack
file of about 3 GB.

The "small" repository has it unset, so the default 512m applies,
resulting in a pack file of about 50 MB.

What causes the huge difference is that the repository contains a
"changelog" file that changes in almost every commit and has grown to
2.4 MB over 10000 commits. So it exists in about that many different
versions, of which about 6000 are larger than 1 MB, but they only
differ from each other by successive addition of small pieces.
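
As a rough plausibility check of the figures above, the following Python
sketch estimates how many versions of the changelog exceed a given
threshold and how much space they would take if each were stored whole.
It is a back-of-the-envelope model, not a measurement: the linear growth
of the file and the 3:1 zlib ratio are assumptions, and the function
names are made up for illustration.

```python
# Back-of-the-envelope model of the changelog case described above.
# Assumptions (not from the thread): the file grows linearly from empty to
# its final size, and zlib shrinks each full copy by about 3:1.

KIB = 1024


def parse_size(value: str) -> int:
    """Parse a git-config style size such as '1m' or '512m' (k/m/g scale by 1024)."""
    units = {"k": KIB, "m": KIB**2, "g": KIB**3}
    value = value.strip().lower()
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def undeltified_versions(final_size: int, commits: int, threshold: int) -> list:
    """Sizes of the versions larger than the threshold, i.e. the ones that
    would be stored as whole deflated objects instead of deltas."""
    sizes = (final_size * i // commits for i in range(1, commits + 1))
    return [s for s in sizes if s > threshold]


def estimate(threshold_setting: str, zlib_ratio: float = 3.0) -> tuple:
    """Return (number of undeltified versions, estimated packed GiB for them)."""
    final_size = int(2.4 * KIB**2)  # "grown to 2.4 MB"
    commits = 10_000                # "over 10000 commits"
    big = undeltified_versions(final_size, commits, parse_size(threshold_setting))
    return len(big), sum(big) / zlib_ratio / KIB**3


if __name__ == "__main__":
    for setting in ("1m", "512m"):
        count, gib = estimate(setting)
        print(f"bigFileThreshold={setting}: {count} whole copies, ~{gib:.1f} GiB")
```

Under these assumptions the 1m setting leaves several thousand full
copies adding up to a few GiB, the same order of magnitude as the pack
size reported above, while the 512m default leaves none, so every
version remains a candidate for delta compression.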

I'm not sure if that makes for a good success story. 1m seems a rather
extreme value to me. If you think it does, I can try to write something
up.

Thanks

 Christian


* Re: [PATCH] doc: mention bigFileThreshold for packing
  2021-02-10 21:43   ` Christian Walther
@ 2021-02-10 22:19     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-02-10 22:19 UTC
  To: Christian Walther; +Cc: Christian Walther via GitGitGadget, git

Christian Walther <cwalther@gmx.ch> writes:

> Junio C Hamano wrote:
>
>> I doubt that the description of --window/--depth command line
>> options, for both repack and pack-objects, is the best place to add
>> this "Note".  Even if we were to add it as an appendix to these
>> places, please do not break the flow of explanation by inserting it
>> before the description of the default values of these options.
>
> OK. That was where I would have looked for it, because it explains
> why --window wasn't effective in my attempts to get better
> compression, but I don't insist on it - any place would have
> worked, as I read both manpages back and forth several times.

The "pack-objects" command (and to some degree "repack", too) is
about packing throughout, and --depth/--window is not necessarily
the central piece of the puzzle, and that, together with disruption
of the flow of the original explanation, was the reason why I found
the initial location a bit odd.

> In git-repack.txt, there is a "Configuration" section at the
> bottom, I guess it would fit there? There is none in
> git-pack-objects.txt, but I could add it. What do you think?

You're right---if there is an existing CONFIGURATION section, that
may be a much better place.  There are configuration variables that
affect how the packing works other than the core.bigFileThreshold,
and attributes like "delta" would also affect the outcome.

Describing all in one CONFIGURATION section would be valuable.

What I queued is with the following ready to be squashed in,
primarily because I was lazy and didn't have time/inclination to
look for a better place myself ;-)

Thanks.

---- >8 ----
Subject: [PATCH] fixup! doc: mention bigFileThreshold for packing

---
 Documentation/git-pack-objects.txt | 7 +++----
 Documentation/git-repack.txt       | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 59150ded4b..be0f953c35 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -97,12 +97,11 @@ base-name::
 	side, because delta data needs to be applied that many
 	times to get to the necessary object.
 +
-Note that delta compression is never used on objects larger than the
-`core.bigFileThreshold` configuration variable (see
-linkgit:git-config[1]).
-+
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.
++
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see linkgit:git-config[1]).
 
 --window-memory=<n>::
 	This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 0a7038ec4a..145fff6e01 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -96,12 +96,11 @@ to the new separate pack will be written.
 	affects the performance on the unpacker side, because delta data needs
 	to be applied that many times to get to the necessary object.
 +
-Note that delta compression is never used on objects larger than the
-`core.bigFileThreshold` configuration variable (see
-linkgit:git-config[1]).
-+
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.
++
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see linkgit:git-config[1]).
 
 --threads=<n>::
 	This option is passed through to `git pack-objects`.
-- 
2.30.1-597-g82b686dd6a


* [PATCH v2] doc: mention bigFileThreshold for packing
  2021-02-09 19:07 [PATCH] doc: mention bigFileThreshold for packing Christian Walther via GitGitGadget
  2021-02-09 21:50 ` Junio C Hamano
@ 2021-02-21 13:23 ` Christian Walther via GitGitGadget
  1 sibling, 0 replies; 5+ messages in thread
From: Christian Walther via GitGitGadget @ 2021-02-21 13:23 UTC
  To: git; +Cc: Christian Walther, Christian Walther

From: Christian Walther <cwalther@gmx.ch>

Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.

Capitalize CONFIGURATION for consistency with other pages having such a
section.

Signed-off-by: Christian Walther <cwalther@gmx.ch>
---
    doc: mention bigFileThreshold for packing
    
    I recently spent a lot of time trying to figure out why git repack would
    create huge packs on some clones of my repository and small ones on
    others, until I found out about the existence of the
    core.bigFileThreshold configuration variable, which happened to be set
    on some and not on others. It would have saved me a lot of time if that
    variable had been mentioned in the relevant manpages that I was reading,
    git-repack and git-pack-objects. So this patch adds that.
    
    Changes in v2:
    
     * Move additions to the CONFIGURATION section at the bottom.
     * Reword a little after realizing that there are more configuration
       variables affecting packing.
     * Capitalize CONFIGURATION for consistency with other pages having such
       a section.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-872%2Fcwalther%2Fdeltadoc-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-872/cwalther/deltadoc-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/872

Range-diff vs v1:

 1:  20b9a56d94b7 < -:  ------------ doc: mention bigFileThreshold for packing
 -:  ------------ > 1:  027d1038fbb1 doc: mention bigFileThreshold for packing


 Documentation/git-pack-objects.txt | 11 +++++++++++
 Documentation/git-repack.txt       |  9 ++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 54d715ead137..f85cb7ea934c 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -400,6 +400,17 @@ Note that we pick a single island for each regex to go into, using "last
 one wins" ordering (which allows repo-specific config to take precedence
 over user-wide config, and so forth).
 
+
+CONFIGURATION
+-------------
+
+Various configuration variables affect packing, see
+linkgit:git-config[1] (search for "pack" and "delta").
+
+Notably, delta compression is not used on objects larger than the
+`core.bigFileThreshold` configuration variable and on files with the
+attribute `delta` set to false.
+
 SEE ALSO
 --------
 linkgit:git-rev-list[1]
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 92f146d27dc3..fbd4b4ae0677 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -165,9 +165,12 @@ depth is 4095.
 	Pass the `--delta-islands` option to `git-pack-objects`, see
 	linkgit:git-pack-objects[1].
 
-Configuration
+CONFIGURATION
 -------------
 
+Various configuration variables affect packing, see
+linkgit:git-config[1] (search for "pack" and "delta").
+
 By default, the command passes `--delta-base-offset` option to
 'git pack-objects'; this typically results in slightly smaller packs,
 but the generated packs are incompatible with versions of Git older than
@@ -178,6 +181,10 @@ need to set the configuration variable `repack.UseDeltaBaseOffset` to
 is unaffected by this option as the conversion is performed on the fly
 as needed in that case.
 
+Delta compression is not used on objects larger than the
+`core.bigFileThreshold` configuration variable and on files with the
+attribute `delta` set to false.
+
 SEE ALSO
 --------
 linkgit:git-pack-objects[1]

base-commit: 2283e0e9af55689215afa39c03beb2315ce18e83
-- 
gitgitgadget
