From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: sandals@crustytoothpaste.net, steadmon@google.com,
jrnieder@gmail.com, peff@peff.net, congdanhqx@gmail.com,
phillip.wood123@gmail.com, emilyshaffer@google.com,
sluongng@gmail.com, jonathantanmy@google.com,
Jonathan Tan <jonathantanmy@google.com>,
Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v4 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks
Date: Fri, 25 Sep 2020 12:33:30 +0000
Message-ID: <pull.696.v4.git.1601037218.gitgitgadget@gmail.com>
In-Reply-To: <pull.696.v3.git.1598380599.gitgitgadget@gmail.com>
This series is based on ds/maintenance-part-1 [2].
This patch series contains 8 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help, I'm
splitting out the portions that create and test the 'maintenance' builtin
from the additional tasks (prefetch, loose-objects, incremental-repack) that
can be brought in later.
[1] https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.695.v3.git.1598380426.gitgitgadget@gmail.com/
As detailed in [2], the 'git maintenance run' subcommand will run certain
tasks based on config options or the --task=<task> arguments. The --auto
option tells each task to run only if an internal check finds that there
has been "enough" change in that domain to merit the work. In the case of
the 'gc' task, this also reduces the amount of work done.
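A minimal sketch of the two invocation styles described above, assuming a
Git recent enough to include 'git maintenance' and the tasks from this
series. The scratch repository and the threshold value of 100 are
illustrative only; the maintenance.loose-objects.auto setting is the one
exercised by the tests in this series.

```shell
#!/bin/sh
# Work in a scratch repository so no real repository is touched.
repo=$(mktemp -d) &&
git init --quiet "$repo" &&
cd "$repo" || exit 1

# Create a single loose object to give the task something to look at.
printf 'example data' | git hash-object -t blob --stdin -w >/dev/null

# Run one task explicitly, regardless of how much has changed.
git maintenance run --task=loose-objects 2>/dev/null
explicit_status=$?

# With --auto, the task consults its own condition first. The value
# 100 is an arbitrary threshold chosen for this example.
git -c maintenance.loose-objects.auto=100 \
	maintenance run --auto --task=loose-objects 2>/dev/null
auto_status=$?

echo "explicit=$explicit_status auto=$auto_status"
```

Both commands should exit with status zero; with --auto, a task whose
condition is not met is skipped rather than treated as an error.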
The new maintenance tasks in this series are:
* 'loose-objects' : prune packed loose objects, then create a new pack from
a batch of loose objects.
* 'incremental-repack' : expire redundant packs from the multi-pack-index,
then repack using the multi-pack-index's incremental repack strategy.
* 'prefetch' : fetch from each remote, storing the refs in
'refs/prefetch/<remote>/'.
These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=<task>".
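To make the task descriptions above more concrete, here is a hand-rolled
approximation of two of them using plumbing commands. This is a sketch,
not the code in builtin/gc.c: the function names are hypothetical, the
"loose" pack prefix and --batch-size=0 are choices made for the example,
and the real loose-objects task limits how many objects go into one
batch, which this sketch does not.

```shell
#!/bin/sh
# Hypothetical helpers; run them from inside a repository.

# Approximates 'loose-objects': delete loose objects that already exist
# in a pack, then write the remaining loose objects into one new pack.
# The loose copies of newly packed objects are removed on the next run.
loose_objects_sketch () {
	objdir=$(git rev-parse --git-path objects) || return 1
	git prune-packed --quiet || return 1
	for f in "$objdir"/[0-9a-f][0-9a-f]/*
	do
		[ -f "$f" ] || continue
		name=${f##*/}
		case $name in
		*[!0-9a-f]*) continue ;;	# skip temporary files
		esac
		dir=${f%/*}
		# Object ID = fan-out directory name + file name.
		echo "${dir##*/}$name"
	done |
	git pack-objects --quiet --non-empty "$objdir/pack/loose" >/dev/null
}

# Approximates 'incremental-repack': write the multi-pack-index, expire
# packs it no longer references, then repack into a larger pack.
incremental_repack_sketch () {
	git multi-pack-index write --no-progress &&
	git multi-pack-index expire --no-progress &&
	git multi-pack-index repack --no-progress --batch-size=0
}
```

Calling loose_objects_sketch twice in a row shows the two-phase
behavior: the first call packs the loose objects, and the second call
prunes the now-redundant loose copies.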
Since [2] replaced the 'git gc --auto' calls with 'git maintenance run
--auto' at the end of some Git commands, users could replace the 'gc' task
with these lighter-weight tasks for foreground maintenance.
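One possible way to try that swap is sketched below. It assumes the
maintenance.<task>.enabled settings from [2] and the per-task 'auto'
thresholds added in this series; the scratch repository and the
threshold values are illustrative, not recommendations.

```shell
#!/bin/sh
# Work in a scratch repository so no real configuration is changed.
repo=$(mktemp -d) &&
git init --quiet "$repo" &&
cd "$repo" || exit 1

# Turn off the 'gc' task and turn on the lighter-weight tasks.
git config maintenance.gc.enabled false &&
git config maintenance.loose-objects.enabled true &&
git config maintenance.incremental-repack.enabled true || exit 1

# Thresholds consulted when --auto is given (example values only).
git config maintenance.loose-objects.auto 100 &&
git config maintenance.incremental-repack.auto 10 || exit 1

# This is the form of the command that Git now runs after some
# commands, in place of 'git gc --auto'.
git maintenance run --auto 2>/dev/null
run_status=$?
echo "maintenance exit status: $run_status"
```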
The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves. I have an RFC series for
this available at [3].
[3] https://lore.kernel.org/git/pull.680.git.1597857408.gitgitgadget@gmail.com/
Updates in v4
=============
* Several commit message, documentation, and test updates from Jonathan
Tan's helpful review!
Updates since v2
================
* Dropped "fetch: optionally allow disabling FETCH_HEAD update"
* A lot of fallout from the change in the option parsing in v3 of
Maintenance I.
* Dropped the "verify, and delete and rewrite on failure" logic from the
incremental-repack task. This might be added again later after it can be
tested more thoroughly.
Updates since v1 (of this series)
=================================
* PATCH 1 ("fetch: optionally allow disabling FETCH_HEAD update") was
rewritten on-list. Getting a version out with this patch is the main
reason for rolling a v2. (That, and Part I is re-rolled with a v2 and I
want to make sure this series applies cleanly.)
* The 'prefetch' and 'loose-objects' tasks had some review, but my proposed
changes were not acked, so they may need another review.
UPDATES since v3 of [1]
=======================
* The biggest change here is the use of "test_subcommand", based on
Jonathan Nieder's approach. This requires having the exact command-line
figured out, which now requires spelling out all --no-[quiet|progress]
options. I also added a bunch of "2>/dev/null" checks because of the
isatty(2) calls. Without that, the behavior will change depending on
whether the test is run with -x/-v or without.
* The 0x7FFF/0x7FFFFFFF constant problem is fixed with an EXPENSIVE test
that verifies it.
* The option parsing has changed to use a local struct and pass that struct
to the helper methods. This is instead of having a global singleton.
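The isatty(2) point above can be illustrated with a small shell sketch.
The helper below is hypothetical and only mimics the idea: the spawned
command line depends on whether stderr is a terminal, so a test that
matches the exact command line needs stderr redirected to get a stable
answer.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: print the command line a
# maintenance step might spawn. '[ -t 2 ]' is the shell counterpart of
# an isatty(2) check on stderr.
midx_write_argv () {
	if [ -t 2 ]
	then
		# stderr is a terminal: no flag, progress may be shown
		echo "git multi-pack-index write"
	else
		# stderr is not a terminal: progress is suppressed
		echo "git multi-pack-index write --no-progress"
	fi
}

# With stderr redirected, the result no longer depends on how the
# script was started.
midx_write_argv 2>/dev/null
# prints: git multi-pack-index write --no-progress
```

Without the redirect, the same call prints a different command line when
run from an interactive terminal, which is the instability that the
"2>/dev/null" additions in the tests avoid.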
Thanks,
-Stolee
Derrick Stolee (8):
maintenance: add prefetch task
maintenance: add loose-objects task
maintenance: create auto condition for loose-objects
midx: enable core.multiPackIndex by default
midx: use start_delayed_progress()
maintenance: add incremental-repack task
maintenance: auto-size incremental-repack batch
maintenance: add incremental-repack auto condition
Documentation/config/core.txt | 4 +-
Documentation/config/maintenance.txt | 18 ++
Documentation/git-maintenance.txt | 48 ++++
builtin/gc.c | 326 +++++++++++++++++++++++++++
midx.c | 21 +-
repo-settings.c | 6 +
repository.h | 2 +
t/t5319-multi-pack-index.sh | 15 +-
t/t7900-maintenance.sh | 185 +++++++++++++++
9 files changed, 603 insertions(+), 22 deletions(-)
base-commit: 25914c4fdeefd99b06e134496dfb9bbb58a5c417
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-696%2Fderrickstolee%2Fmaintenance%2Fgc-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-696/derrickstolee/maintenance/gc-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/696
Range-diff vs v3:
1: da64c51a81 ! 1: 7a62e224cf maintenance: add prefetch task
@@ Commit message
of a foreground fetch to make that 'git fetch' command much faster.
However, if we simply ran 'git fetch <remote>' in the background,
- then the user running a foregroudn 'git fetch <remote>' would lose
+ then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
2: 75e846456b ! 2: f3a16fd324 maintenance: add loose-objects task
@@ Commit message
objects are created only by a user doing normal development.
We noticed users with _millions_ of loose objects because VFS
for Git downloads blobs on-demand when a file read operation
- requires populating a virtual file. This has potential of
- happening in partial clones if someone runs 'git grep' or
- otherwise evades the batch-download feature for requesting
- promisor objects.
+ requires populating a virtual file.
This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs
3: d6e382c43e ! 3: 931fff4883 maintenance: create auto condition for loose-objects
@@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' '
+ git -c maintenance.loose-objects.auto=1 maintenance \
+ run --auto --task=loose-objects 2>/dev/null &&
+ test_subcommand ! git prune-packed --quiet <trace-lo1.txt &&
-+ for i in 1 2
-+ do
-+ printf data-A-$i | git hash-object -t blob --stdin -w &&
-+ GIT_TRACE2_EVENT="$(pwd)/trace-loA-$i" \
-+ git -c maintenance.loose-objects.auto=2 \
-+ maintenance run --auto --task=loose-objects 2>/dev/null &&
-+ test_subcommand ! git prune-packed --quiet <trace-loA-$i &&
-+ printf data-B-$i | git hash-object -t blob --stdin -w &&
-+ GIT_TRACE2_EVENT="$(pwd)/trace-loB-$i" \
-+ git -c maintenance.loose-objects.auto=2 \
-+ maintenance run --auto --task=loose-objects 2>/dev/null &&
-+ test_subcommand git prune-packed --quiet <trace-loB-$i &&
-+ GIT_TRACE2_EVENT="$(pwd)/trace-loC-$i" \
-+ git -c maintenance.loose-objects.auto=2 \
-+ maintenance run --auto --task=loose-objects 2>/dev/null &&
-+ test_subcommand git prune-packed --quiet <trace-loC-$i || return 1
-+ done
++ printf data-A | git hash-object -t blob --stdin -w &&
++ GIT_TRACE2_EVENT="$(pwd)/trace-loA" \
++ git -c maintenance.loose-objects.auto=2 \
++ maintenance run --auto --task=loose-objects 2>/dev/null &&
++ test_subcommand ! git prune-packed --quiet <trace-loA &&
++ printf data-B | git hash-object -t blob --stdin -w &&
++ GIT_TRACE2_EVENT="$(pwd)/trace-loB" \
++ git -c maintenance.loose-objects.auto=2 \
++ maintenance run --auto --task=loose-objects 2>/dev/null &&
++ test_subcommand git prune-packed --quiet <trace-loB &&
++ GIT_TRACE2_EVENT="$(pwd)/trace-loC" \
++ git -c maintenance.loose-objects.auto=2 \
++ maintenance run --auto --task=loose-objects 2>/dev/null &&
++ test_subcommand git prune-packed --quiet <trace-loC
+'
+
test_done
4: d0f2ec70d9 = 4: 0fe2036aa8 midx: enable core.multiPackIndex by default
5: 2cd3c803d9 = 5: ce435bf784 midx: use start_delayed_progress()
6: 0dd26bb584 ! 6: d934899253 maintenance: add incremental-repack task
@@ Documentation/git-maintenance.txt: loose-objects::
+ The `incremental-repack` job repacks the object directory
+ using the `multi-pack-index` feature. In order to prevent race
+ conditions with concurrent Git commands, it follows a two-step
-+ process. First, it deletes any pack-files included in the
-+ `multi-pack-index` where none of the objects in the
-+ `multi-pack-index` reference those pack-files; this only happens
-+ if all objects in the pack-file are also stored in a newer
-+ pack-file. Second, it selects a group of pack-files whose "expected
-+ size" is below the batch size until the group has total expected
-+ size at least the batch size; see the `--batch-size` option for
-+ the `repack` subcommand in linkgit:git-multi-pack-index[1]. The
-+ default batch-size is zero, which is a special case that attempts
-+ to repack all pack-files into a single pack-file.
++ process. First, it calls `git multi-pack-index expire` to delete
++ pack-files unreferenced by the `multi-pack-index` file. Second, it
++ calls `git multi-pack-index repack` to select several small
++ pack-files and repack them into a bigger one, and then update the
++ `multi-pack-index` entries that refer to the small pack-files to
++ refer to the new pack-file. This prepares those small pack-files
++ for deletion upon the next run of `git multi-pack-index expire`.
++ The selection of the small pack-files is such that the expected
++ size of the big pack-file is at least the batch size; see the
++ `--batch-size` option for the `repack` subcommand in
++ linkgit:git-multi-pack-index[1]. The default batch-size is zero,
++ which is a special case that attempts to repack all pack-files
++ into a single pack-file.
+
OPTIONS
-------
--auto::
## builtin/gc.c ##
-@@
- #include "promisor-remote.h"
- #include "refs.h"
- #include "remote.h"
-+#include "midx.h"
-
- #define FAILED_RUN "failed to run %s"
-
@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
return prune_packed(opts) || pack_loose(opts);
}
@@ builtin/gc.c: static struct maintenance_task tasks[] = {
"gc",
maintenance_task_gc,
- ## midx.c ##
-@@
-
- #define PACK_EXPIRED UINT_MAX
-
--static char *get_midx_filename(const char *object_dir)
-+char *get_midx_filename(const char *object_dir)
- {
- return xstrfmt("%s/pack/multi-pack-index", object_dir);
- }
-
- ## midx.h ##
-@@ midx.h: struct multi_pack_index {
-
- #define MIDX_PROGRESS (1 << 0)
-
-+char *get_midx_filename(const char *object_dir);
- struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
- int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
- int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result);
-
## t/t5319-multi-pack-index.sh ##
@@
test_description='multi-pack-indexes'
@@ t/t7900-maintenance.sh: test_description='git maintenance builtin'
test_expect_success 'help text' '
test_expect_code 129 git maintenance -h 2>err &&
@@ t/t7900-maintenance.sh: test_expect_success 'maintenance.loose-objects.auto' '
- done
+ test_subcommand git prune-packed --quiet <trace-loC
'
+test_expect_success 'incremental-repack task' '
7: f3b25a9927 = 7: bade7706d5 maintenance: auto-size incremental-repack batch
8: e9bb32f53a ! 8: f660dd1890 maintenance: add incremental-repack auto condition
@@ Documentation/config/maintenance.txt: maintenance.loose-objects.auto::
## builtin/gc.c ##
@@
+ #include "promisor-remote.h"
#include "refs.h"
#include "remote.h"
- #include "midx.h"
+#include "object-store.h"
#define FAILED_RUN "failed to run %s"
@@ t/t7900-maintenance.sh: test_expect_success EXPENSIVE 'incremental-repack 2g lim
+ -c maintenance.incremental-repack.auto=1 \
+ maintenance run --auto --task=incremental-repack 2>/dev/null &&
+ test_subcommand ! git multi-pack-index write --no-progress <midx-init.txt &&
-+ for i in 1 2
-+ do
-+ test_commit A-$i &&
-+ git pack-objects --revs .git/objects/pack/pack <<-\EOF &&
-+ HEAD
-+ ^HEAD~1
-+ EOF
-+ GIT_TRACE2_EVENT=$(pwd)/trace-A-$i git \
-+ -c maintenance.incremental-repack.auto=2 \
-+ maintenance run --auto --task=incremental-repack 2>/dev/null &&
-+ test_subcommand ! git multi-pack-index write --no-progress <trace-A-$i &&
-+ test_commit B-$i &&
-+ git pack-objects --revs .git/objects/pack/pack <<-\EOF &&
-+ HEAD
-+ ^HEAD~1
-+ EOF
-+ GIT_TRACE2_EVENT=$(pwd)/trace-B-$i git \
-+ -c maintenance.incremental-repack.auto=2 \
-+ maintenance run --auto --task=incremental-repack 2>/dev/null &&
-+ test_subcommand git multi-pack-index write --no-progress <trace-B-$i || return 1
-+ done
++ test_commit A &&
++ git pack-objects --revs .git/objects/pack/pack <<-\EOF &&
++ HEAD
++ ^HEAD~1
++ EOF
++ GIT_TRACE2_EVENT=$(pwd)/trace-A git \
++ -c maintenance.incremental-repack.auto=2 \
++ maintenance run --auto --task=incremental-repack 2>/dev/null &&
++ test_subcommand ! git multi-pack-index write --no-progress <trace-A &&
++ test_commit B &&
++ git pack-objects --revs .git/objects/pack/pack <<-\EOF &&
++ HEAD
++ ^HEAD~1
++ EOF
++ GIT_TRACE2_EVENT=$(pwd)/trace-B git \
++ -c maintenance.incremental-repack.auto=2 \
++ maintenance run --auto --task=incremental-repack 2>/dev/null &&
++ test_subcommand git multi-pack-index write --no-progress <trace-B
+'
+
test_done
--
gitgitgadget
Thread overview: 66+ messages
2020-08-06 16:30 [PATCH 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-12 23:10 ` Emily Shaffer
2020-08-13 0:03 ` Junio C Hamano
2020-08-13 1:45 ` Jonathan Nieder
2020-08-13 4:37 ` [PATCH v3] " Junio C Hamano
2020-08-14 1:13 ` Derrick Stolee
2020-08-14 1:32 ` Junio C Hamano
2020-08-06 16:30 ` [PATCH 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-12 23:10 ` Emily Shaffer
2020-08-14 1:28 ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-12 23:10 ` Emily Shaffer
2020-08-14 1:46 ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-06 17:02 ` Son Luong Ngoc
2020-08-06 18:13 ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-25 18:36 ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-25 18:36 ` [PATCH v3 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-22 23:05 ` Jonathan Tan
2020-08-25 18:36 ` [PATCH v3 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-22 23:09 ` Jonathan Tan
2020-09-24 13:45 ` Derrick Stolee
2020-08-25 18:36 ` [PATCH v3 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-22 23:15 ` Jonathan Tan
2020-09-24 13:51 ` Derrick Stolee
2020-08-25 18:36 ` [PATCH v3 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-22 23:16 ` Jonathan Tan
2020-09-24 13:53 ` Derrick Stolee
2020-08-25 18:36 ` [PATCH v3 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-25 18:36 ` [PATCH v3 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-22 23:26 ` Jonathan Tan
2020-09-24 14:05 ` Derrick Stolee
2020-09-24 22:01 ` Jonathan Tan
2020-08-25 18:36 ` [PATCH v3 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-25 18:36 ` [PATCH v3 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-09-22 23:52 ` Jonathan Tan
2020-08-25 20:59 ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Junio C Hamano
2020-08-26 15:15 ` Son Luong Ngoc
2020-08-26 16:21 ` Derrick Stolee
2020-09-25 12:33 ` Derrick Stolee via GitGitGadget [this message]
2020-09-25 12:33 ` [PATCH v4 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-25 12:33 ` [PATCH v4 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-25 12:33 ` [PATCH v4 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-25 18:00 ` Junio C Hamano
2020-09-25 18:43 ` Derrick Stolee
2020-09-25 12:33 ` [PATCH v4 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-25 12:33 ` [PATCH v4 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-09-25 12:33 ` [PATCH v4 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-25 12:33 ` [PATCH v4 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-09-25 12:33 ` [PATCH v4 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget