git@vger.kernel.org mailing list mirror (one of many)
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: sandals@crustytoothpaste.net, steadmon@google.com,
	jrnieder@gmail.com, peff@peff.net, congdanhqx@gmail.com,
	phillip.wood123@gmail.com, emilyshaffer@google.com,
	sluongng@gmail.com, jonathantanmy@google.com,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks
Date: Tue, 25 Aug 2020 18:36:31 +0000
Message-ID: <pull.696.v3.git.1598380599.gitgitgadget@gmail.com>
In-Reply-To: <pull.696.v2.git.1597760730.gitgitgadget@gmail.com>

This series is based on v3 of part I (ds/maintenance-part-1) [2].

This patch series contains 8 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help, I'm
splitting out the portions that create and test the 'maintenance' builtin
from the additional tasks (prefetch, loose-objects, incremental-repack) that
can be brought in later.

[1] https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.695.v3.git.1598380426.gitgitgadget@gmail.com/

As detailed in [2], the 'git maintenance run' subcommand runs the tasks
selected by config options or by --task= arguments. The --auto option tells
each task to run only if an internal check finds that there has been
"enough" change in that domain to merit the work. In the case of the 'gc'
task, this also reduces the amount of work done.
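
A minimal C sketch of how --auto can gate each task, assuming a task table
where every entry carries a run function and an optional auto-condition
check. The names struct maintenance_run_opts, maintenance_task_fn, and
struct maintenance_task appear in the range-diff below; their fields here,
the counters, and the 100-object threshold are hypothetical stand-ins, not
the code in builtin/gc.c.

```c
#include <stddef.h>

/* Hypothetical fields; only the struct name comes from this series. */
struct maintenance_run_opts {
	int auto_flag; /* set by --auto */
	int quiet;     /* unused in this sketch */
};

typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
typedef int maintenance_auto_fn(void);

struct maintenance_task {
	const char *name;
	maintenance_task_fn *fn;
	maintenance_auto_fn *auto_condition; /* NULL means "always run" */
	int enabled;                         /* assumed: from config or --task= */
};

/* Toy state standing in for inspecting the object directory. */
int loose_object_count = 3;
int loose_objects_limit = 100; /* hypothetical threshold */
int gc_runs, loose_objects_runs;

static int loose_objects_auto_condition(void)
{
	/* Only "enough" loose objects make the work worthwhile. */
	return loose_object_count >= loose_objects_limit;
}

static int task_gc(struct maintenance_run_opts *opts)
{
	(void)opts;
	gc_runs++;
	return 0;
}

static int task_loose_objects(struct maintenance_run_opts *opts)
{
	(void)opts;
	loose_objects_runs++;
	return 0;
}

static struct maintenance_task tasks[] = {
	{ "gc", task_gc, NULL, 1 },
	{ "loose-objects", task_loose_objects, loose_objects_auto_condition, 1 },
};

/*
 * Run every enabled task. With --auto, skip any task whose auto condition
 * reports that there is not yet enough work. Returns 1 if any task failed.
 */
int maintenance_run_tasks(struct maintenance_run_opts *opts)
{
	int result = 0;
	size_t i;

	for (i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++) {
		struct maintenance_task *t = &tasks[i];

		if (!t->enabled)
			continue;
		if (opts->auto_flag && t->auto_condition && !t->auto_condition())
			continue;
		if (t->fn(opts))
			result = 1; /* record the failure but keep going */
	}
	return result;
}
```

With auto_flag set and only three toy loose objects, maintenance_run_tasks()
runs the "gc" stand-in and skips "loose-objects"; clearing auto_flag runs
both. Keeping the check separate from the task body lets a run without
--auto skip the check entirely.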

The new maintenance tasks in this series are:

 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'incremental-repack' : expire redundant packs from the multi-pack-index,
   then repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.
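
The batch limit in the 'loose-objects' task can be sketched as an iteration
callback that writes one object ID at a time and asks the iterator to stop
once the batch is full. It mirrors the "return ++(d->count) >
d->batch_size;" line in the range-diff below; the struct, the callback, the
array-based iterator, and pack_loose_batch() are hypothetical, and in the
series the IDs presumably go to a child 'git pack-objects' process rather
than to a FILE chosen by the caller.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical callback state, modeled on the range-diff's 'd'. */
struct write_loose_object_data {
	FILE *in;       /* destination for object IDs (assumed) */
	int count;      /* IDs written so far */
	int batch_size; /* limit for this run */
};

/* Write one ID; a nonzero return asks the iterator to stop. */
static int write_loose_object_id(const char *oid_hex, void *data)
{
	struct write_loose_object_data *d = data;

	fprintf(d->in, "%s\n", oid_hex);
	/* Strict '>' means the write that makes count exceed batch_size is the last. */
	return ++(d->count) > d->batch_size;
}

/* Hypothetical stand-in for loose-object iteration: stops on nonzero. */
int for_each_loose_id(const char **ids, size_t nr,
		      int (*cb)(const char *, void *), void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int r = cb(ids[i], data);
		if (r)
			return r;
	}
	return 0;
}

/* Returns how many IDs were written for this batch. */
int pack_loose_batch(const char **ids, size_t nr, int batch_size, FILE *out)
{
	struct write_loose_object_data d = { out, 0, batch_size };

	for_each_loose_id(ids, nr, write_loose_object_id, &d);
	return d.count;
}
```

With five IDs and a batch_size of 2, the callback stops after the third
write because the comparison is strict; a batch_size of 10 lets all five
through.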

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". 

Since [2] replaced the 'git gc --auto' calls with 'git maintenance run
--auto' at the end of some Git commands, users could replace the 'gc' task
with these lighter-weight tasks for foreground maintenance.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves. I have an RFC series for
this available at [3].

[3] https://lore.kernel.org/git/pull.680.git.1597857408.gitgitgadget@gmail.com/

Updates since v2
================

 * Dropped "fetch: optionally allow disabling FETCH_HEAD update".

 * A lot of fallout from the change in option parsing in v3 of
   Maintenance I.

 * Dropped the "verify, and delete and rewrite on failure" logic from the
   incremental-repack task. This might be added again later, once it can be
   tested more thoroughly.

Updates since v1 (of this series)
=================================

 * PATCH 1 ("fetch: optionally allow disabling FETCH_HEAD update") was
   rewritten on-list. Getting a version out with this patch is the main
   reason for rolling a v2. (That, and Part I was re-rolled as v2, and I
   wanted to make sure this series applies cleanly.)

 * The 'prefetch' and 'loose-objects' tasks had some review, but my proposed
   changes were not acked, so they may need another review.

Updates since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires knowing the exact command line,
   which now means spelling out all --no-[quiet|progress] options. I also
   added a number of "2>/dev/null" redirections because of the isatty(2)
   calls; without them, the behavior changes depending on whether the test
   is run with -x/-v or without.

 * The 0x7FFF/0x7FFFFFFF constant problem is fixed, and an EXPENSIVE test
   now verifies the fix.

 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
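
The auto-sized batch from the "auto-size incremental-repack batch" patch can
be sketched as follows, assuming the heuristic picks a --batch-size one byte
larger than the second-largest pack-file and caps it at 0x7FFFFFFF. The
function name and the array input are hypothetical; the series reads pack
sizes from the repository. A 0x7FFF cap would instead limit every batch to
32767 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Cap from the cover letter; 0x7FFF (32767 bytes) was the bug. */
#define BATCH_SIZE_CAP ((uint64_t)0x7FFFFFFF)

/*
 * Hypothetical helper: one more than the second-largest pack size,
 * clamped to BATCH_SIZE_CAP. Tracks the two largest sizes in one pass.
 */
uint64_t auto_batch_size(const uint64_t *pack_sizes, size_t nr)
{
	uint64_t largest = 0, second = 0, result;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pack_sizes[i] > largest) {
			second = largest;
			largest = pack_sizes[i];
		} else if (pack_sizes[i] > second) {
			second = pack_sizes[i];
		}
	}

	result = second + 1;
	if (result > BATCH_SIZE_CAP)
		result = BATCH_SIZE_CAP;
	return result;
}
```

For pack sizes {100, 5000, 300, 20} this returns 301; if the second-largest
pack exceeds 0x7FFFFFFF bytes, the result is clamped to 0x7FFFFFFF.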

Thanks, -Stolee

Derrick Stolee (8):
  maintenance: add prefetch task
  maintenance: add loose-objects task
  maintenance: create auto condition for loose-objects
  midx: enable core.multiPackIndex by default
  midx: use start_delayed_progress()
  maintenance: add incremental-repack task
  maintenance: auto-size incremental-repack batch
  maintenance: add incremental-repack auto condition

 Documentation/config/core.txt        |   4 +-
 Documentation/config/maintenance.txt |  18 ++
 Documentation/git-maintenance.txt    |  45 ++++
 builtin/gc.c                         | 327 +++++++++++++++++++++++++++
 midx.c                               |  23 +-
 midx.h                               |   1 +
 repo-settings.c                      |   6 +
 repository.h                         |   2 +
 t/t5319-multi-pack-index.sh          |  15 +-
 t/t7900-maintenance.sh               | 191 ++++++++++++++++
 10 files changed, 609 insertions(+), 23 deletions(-)


base-commit: 652a8eac57d04a51820c7a5b45031b50c5188e7b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-696%2Fderrickstolee%2Fmaintenance%2Fgc-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-696/derrickstolee/maintenance/gc-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/696

Range-diff vs v2:

  1:  f3bc0b2d92 <  -:  ---------- fetch: optionally allow disabling FETCH_HEAD update
  2:  8779c6c20d !  1:  da64c51a81 maintenance: add prefetch task
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: since it will not expire `.graph` files that were in the previous
     - `commit-graph-chain` file. They will be deleted by a later run based on
     - the expiration delay.
     +@@ Documentation/git-maintenance.txt: commit-graph::
     + 	`commit-graph-chain` file. They will be deleted by a later run based
     + 	on the expiration delay.
       
      +prefetch::
      +	The `prefetch` task updates the object directory with the latest
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *opts)
     - 	return 1;
     +@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
     + 	return 0;
       }
       
     -+static int fetch_remote(const char *remote, struct maintenance_opts *opts)
     ++static int fetch_remote(const char *remote, struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *
      +	return 0;
      +}
      +
     -+static int maintenance_task_prefetch(struct maintenance_opts *opts)
     ++static int maintenance_task_prefetch(struct maintenance_run_opts *opts)
      +{
      +	int result = 0;
      +	struct string_list_item *item;
     @@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *
      +	return result;
      +}
      +
     - static int maintenance_task_gc(struct maintenance_opts *opts)
     + static int maintenance_task_gc(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: struct maintenance_task {
  3:  4fa9d298b9 !  2:  75e846456b maintenance: add loose-objects task
     @@ Documentation/git-maintenance.txt: gc::
       --auto::
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       	return run_command(&child);
       }
       
     -+static int prune_packed(struct maintenance_opts *opts)
     ++static int prune_packed(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	return ++(d->count) > d->batch_size;
      +}
      +
     -+static int pack_loose(struct maintenance_opts *opts)
     ++static int pack_loose(struct maintenance_run_opts *opts)
      +{
      +	struct repository *r = the_repository;
      +	int result = 0;
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	return result;
      +}
      +
     -+static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     ++static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
      +{
      +	return prune_packed(opts) || pack_loose(opts);
      +}
      +
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
       /*
      @@ builtin/gc.c: struct maintenance_task {
  4:  3432bc3167 =  3:  d6e382c43e maintenance: create auto condition for loose-objects
  5:  0ee2434bdb =  4:  d0f2ec70d9 midx: enable core.multiPackIndex by default
  6:  e157ea8dd7 =  5:  2cd3c803d9 midx: use start_delayed_progress()
  7:  a8d956dad6 !  6:  0dd26bb584 maintenance: add incremental-repack task
     @@ Commit message
             it requires doing some calculations that are better isolated to
             a separate change.
      
     -    Each of the above steps update the multi-pack-index file. After
     -    each step, we verify the new multi-pack-index. If the new
     -    multi-pack-index is corrupt, then delete the multi-pack-index,
     -    rewrite it from scratch, and stop doing the later steps of the
     -    job. This is intended to be an extra-safe check without leaving
     -    a repo with many pack-files without a multi-pack-index.
     -
          These steps are based on a similar background maintenance step in
          Scalar (and VFS for Git) [1]. This was incredibly effective for
          users of the Windows OS repository. After using the same VFS for Git
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
       	return prune_packed(opts) || pack_loose(opts);
       }
       
     -+static int multi_pack_index_write(struct maintenance_opts *opts)
     ++static int multi_pack_index_write(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int rewrite_multi_pack_index(struct maintenance_opts *opts)
     -+{
     -+	struct repository *r = the_repository;
     -+	char *midx_name = get_midx_filename(r->objects->odb->path);
     -+
     -+	unlink(midx_name);
     -+	free(midx_name);
     -+
     -+	return multi_pack_index_write(opts);
     -+}
     -+
     -+static int multi_pack_index_verify(struct maintenance_opts *opts,
     -+				   const char *message)
     -+{
     -+	struct child_process child = CHILD_PROCESS_INIT;
     -+
     -+	child.git_cmd = 1;
     -+	strvec_pushl(&child.args, "multi-pack-index", "verify", NULL);
     -+
     -+	if (opts->quiet)
     -+		strvec_push(&child.args, "--no-progress");
     -+
     -+	if (run_command(&child)) {
     -+		warning(_("'git multi-pack-index verify' failed %s"), message);
     -+		return 1;
     -+	}
     -+
     -+	return 0;
     -+}
     -+
     -+static int multi_pack_index_expire(struct maintenance_opts *opts)
     ++static int multi_pack_index_expire(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int multi_pack_index_repack(struct maintenance_opts *opts)
     ++static int multi_pack_index_repack(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return 0;
      +}
      +
     -+static int maintenance_task_incremental_repack(struct maintenance_opts *opts)
     ++static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts)
      +{
      +	prepare_repo_settings(the_repository);
      +	if (!the_repository->settings.core_multi_pack_index) {
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +
      +	if (multi_pack_index_write(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after initial write"))
     -+		return rewrite_multi_pack_index(opts);
      +	if (multi_pack_index_expire(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after expire step"))
     -+		return !!rewrite_multi_pack_index(opts);
      +	if (multi_pack_index_repack(opts))
      +		return 1;
     -+	if (multi_pack_index_verify(opts, "after repack step"))
     -+		return !!rewrite_multi_pack_index(opts);
      +	return 0;
      +}
      +
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
       /*
      @@ builtin/gc.c: struct maintenance_task {
  8:  f0e7276755 !  7:  f3b25a9927 maintenance: auto-size incremental-repack batch
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_run_opts *opts)
       	return 0;
       }
       
     @@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
      +	return result_size;
      +}
      +
     - static int multi_pack_index_repack(struct maintenance_opts *opts)
     + static int multi_pack_index_repack(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
     -@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_run_opts *opts)
       	if (opts->quiet)
       		strvec_push(&child.args, "--no-progress");
       
  9:  5659a23ad5 !  8:  e9bb32f53a maintenance: add incremental-repack auto condition
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
       	return prune_packed(opts) || pack_loose(opts);
       }
       
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts
      +	return count >= incremental_repack_auto_limit;
      +}
      +
     - static int multi_pack_index_write(struct maintenance_opts *opts)
     + static int multi_pack_index_write(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: static struct maintenance_task tasks[] = {

-- 
gitgitgadget

Thread overview: 66+ messages
2020-08-06 16:30 [PATCH 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-13  0:03     ` Junio C Hamano
2020-08-13  1:45       ` Jonathan Nieder
2020-08-13  4:37       ` [PATCH v3] " Junio C Hamano
2020-08-14  1:13         ` Derrick Stolee
2020-08-14  1:32           ` Junio C Hamano
2020-08-06 16:30 ` [PATCH 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-14  1:28     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-12 23:10   ` Emily Shaffer
2020-08-14  1:46     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-06 16:30 ` [PATCH 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-06 17:02   ` Son Luong Ngoc
2020-08-06 18:13     ` Derrick Stolee
2020-08-06 16:30 ` [PATCH 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-18 14:25 ` [PATCH v2 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 1/9] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 2/9] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 3/9] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 4/9] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 5/9] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 6/9] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 7/9] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-18 14:25   ` [PATCH v2 9/9] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-08-25 18:36   ` Derrick Stolee via GitGitGadget [this message]
2020-08-25 18:36     ` [PATCH v3 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-22 23:05       ` Jonathan Tan
2020-08-25 18:36     ` [PATCH v3 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-22 23:09       ` Jonathan Tan
2020-09-24 13:45         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-22 23:15       ` Jonathan Tan
2020-09-24 13:51         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-22 23:16       ` Jonathan Tan
2020-09-24 13:53         ` Derrick Stolee
2020-08-25 18:36     ` [PATCH v3 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-22 23:26       ` Jonathan Tan
2020-09-24 14:05         ` Derrick Stolee
2020-09-24 22:01           ` Jonathan Tan
2020-08-25 18:36     ` [PATCH v3 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-25 18:36     ` [PATCH v3 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-09-22 23:52       ` Jonathan Tan
2020-08-25 20:59     ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Junio C Hamano
2020-08-26 15:15     ` Son Luong Ngoc
2020-08-26 16:21       ` Derrick Stolee
2020-09-25 12:33     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 1/8] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 2/8] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 3/8] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-09-25 18:00         ` Junio C Hamano
2020-09-25 18:43           ` Derrick Stolee
2020-09-25 12:33       ` [PATCH v4 4/8] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 5/8] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 6/8] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-09-25 12:33       ` [PATCH v4 8/8] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
