git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks
@ 2020-08-06 15:48 Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee

This series is based on jk/strvec.

This patch series contains 11patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rererelogs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
follow in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task= 
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions for which the
tasks run during the '--auto' option.

 Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no- [quiet%7Cprogress] 
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Here is the range-diff from the v3 of [1].

 1:  12fe73bb72 !  1:  2b9deb6d6a maintenance: create basic maintenance runner
    @@ Commit message
         'gc' builtin entirely at some point, leaving 'git gc' as an alias for
         some specific arguments to 'git maintenance run'.

    +    Create a new test_subcommand helper that allows us to test if a certain
    +    subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
    +    file. A negation mode is available that will be used in later tests.
    +
    +    Helped-by: Jonathan Nieder <jrnieder@gmail.com>
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## .gitignore ##
    @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     +    NULL
     +};
     +
    -+static struct maintenance_opts {
    ++struct maintenance_opts {
     +    int auto_flag;
    -+} opts;
    ++};
     +
    -+static int maintenance_task_gc(void)
    ++static int maintenance_task_gc(struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
     +    child.git_cmd = 1;
     +    strvec_push(&child.args, "gc");
     +
    -+    if (opts.auto_flag)
    ++    if (opts->auto_flag)
     +        strvec_push(&child.args, "--auto");
     +
     +    close_object_store(the_repository->objects);
     +    return run_command(&child);
     +}
     +
    -+static int maintenance_run(void)
    ++static int maintenance_run(struct maintenance_opts *opts)
     +{
    -+    return maintenance_task_gc();
    ++    return maintenance_task_gc(opts);
     +}
     +
     +int cmd_maintenance(int argc, const char **argv, const char *prefix)
     +{
    ++    static struct maintenance_opts opts;
     +    static struct option builtin_maintenance_options[] = {
     +        OPT_BOOL(0, "auto", &opts.auto_flag,
     +             N_("run tasks based on the state of the repository")),
    @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     +
     +    if (argc == 1) {
     +        if (!strcmp(argv[0], "run"))
    -+            return maintenance_run();
    ++            return maintenance_run(&opts);
     +    }
     +
     +    usage_with_options(builtin_maintenance_usage,
    @@ t/t7900-maintenance.sh (new)
     +test_expect_success 'run [--auto]' '
     +    GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
     +    GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
    -+    grep ",\"gc\"]" run-no-auto.txt  &&
    -+    grep ",\"gc\",\"--auto\"]" run-auto.txt
    ++    test_subcommand git gc <run-no-auto.txt &&
    ++    test_subcommand git gc --auto <run-auto.txt
     +'
     +
     +test_done
    +
    + ## t/test-lib-functions.sh ##
    +@@ t/test-lib-functions.sh: test_path_is_hidden () {
    +     case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
    +     return 1
    + }
    ++
    ++# Check that the given command was invoked as part of the
    ++# trace2-format trace on stdin.
    ++#
    ++#    test_subcommand [!] <command> <args>... < <trace>
    ++#
    ++# For example, to look for an invocation of "git upload-pack
    ++# /path/to/repo"
    ++#
    ++#    GIT_TRACE2_EVENT=event.log git fetch ... &&
    ++#    test_subcommand git upload-pack "$PATH" <event.log
    ++#
    ++# If the first parameter passed is !, this instead checks that
    ++# the given command was not called.
    ++#
    ++test_subcommand () {
    ++    local negate=
    ++    if test "$1" = "!"
    ++    then
    ++        negate=t
    ++        shift
    ++    fi
    ++
    ++    local expr=$(printf '"%s",' "$@")
    ++    expr="${expr%,}"
    ++
    ++    if test -n "$negate"
    ++    then
    ++        ! grep "\[$expr\]"
    ++    else
    ++        grep "\[$expr\]"
    ++    fi
    ++}
 2:  6e533e43d7 !  2:  d5faef26af maintenance: add --quiet option
    @@ Documentation/git-maintenance.txt: OPTIONS
      ## builtin/gc.c ##
     @@ builtin/gc.c: static const char * const builtin_maintenance_usage[] = {

    - static struct maintenance_opts {
    + struct maintenance_opts {
          int auto_flag;
     +    int quiet;
    - } opts;
    + };

    - static int maintenance_task_gc(void)
    -@@ builtin/gc.c: static int maintenance_task_gc(void)
    + static int maintenance_task_gc(struct maintenance_opts *opts)
    +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)

    -     if (opts.auto_flag)
    +     if (opts->auto_flag)
              strvec_push(&child.args, "--auto");
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--quiet");
    ++    else
    ++        strvec_push(&child.args, "--no-quiet");

          close_object_store(the_repository->objects);
          return run_command(&child);
    @@ t/t7900-maintenance.sh: test_expect_success 'help text' '

     -test_expect_success 'run [--auto]' '
     -    GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
    +-    GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
    +-    test_subcommand git gc <run-no-auto.txt &&
    +-    test_subcommand git gc --auto <run-auto.txt
     +test_expect_success 'run [--auto|--quiet]' '
    -+    GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run --no-quiet &&
    -     GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
    -+    GIT_TRACE2_EVENT="$(pwd)/run-quiet.txt" git maintenance run --quiet &&
    -     grep ",\"gc\"]" run-no-auto.txt  &&
    --    grep ",\"gc\",\"--auto\"]" run-auto.txt
    -+    grep ",\"gc\",\"--auto\"" run-auto.txt &&
    -+    grep ",\"gc\",\"--quiet\"" run-quiet.txt
    ++    GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
    ++        git maintenance run 2>/dev/null &&
    ++    GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
    ++        git maintenance run --auto 2>/dev/null &&
    ++    GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
    ++        git maintenance run --no-quiet 2>/dev/null &&
    ++    test_subcommand git gc --quiet <run-no-auto.txt &&
    ++    test_subcommand git gc --auto --quiet <run-auto.txt &&
    ++    test_subcommand git gc --no-quiet <run-no-quiet.txt
      '

      test_done
 3:  c4674fc211 !  3:  233811310b maintenance: replace run_auto_gc()
    @@ run-command.c: int run_processes_parallel_tr2(int n, get_next_task_fn get_next_t
     -    strvec_pushl(&argv_gc_auto, "gc", "--auto", NULL);
     -    if (quiet)
     -        strvec_push(&argv_gc_auto, "--quiet");
    --    status = run_command_v_opt(argv_gc_auto.items, RUN_GIT_CMD);
    +-    status = run_command_v_opt(argv_gc_auto.v, RUN_GIT_CMD);
     -    strvec_clear(&argv_gc_auto);
     -    return status;
     +    maint.git_cmd = 1;
 4:  b9332c1318 !  4:  7efa23abc8 maintenance: initialize task array
    @@ Commit message
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int maintenance_task_gc(void)
    +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
          return run_command(&child);
      }

    -+typedef int maintenance_task_fn(void);
    ++typedef int maintenance_task_fn(struct maintenance_opts *opts);
     +
     +struct maintenance_task {
     +    const char *name;
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +    },
     +};
     +
    - static int maintenance_run(void)
    + static int maintenance_run(struct maintenance_opts *opts)
      {
    --    return maintenance_task_gc();
    +-    return maintenance_task_gc(opts);
     +    int i;
     +    int result = 0;
     +
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +        if (!tasks[i].enabled)
     +            continue;
     +
    -+        if (tasks[i].fn()) {
    ++        if (tasks[i].fn(opts)) {
     +            error(_("task '%s' failed"), tasks[i].name);
     +            result = 1;
     +        }
 5:  a4d9836bed !  5:  902b742032 maintenance: add commit-graph task
    @@ Documentation/git-maintenance.txt: run::
          stands for "garbage collection," but this task performs many

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static struct maintenance_opts {
    +@@ builtin/gc.c: struct maintenance_opts {
          int quiet;
    - } opts;
    + };

    -+static int run_write_commit_graph(void)
    ++static int run_write_commit_graph(struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
    @@ builtin/gc.c: static struct maintenance_opts {
     +    strvec_pushl(&child.args, "commit-graph", "write",
     +             "--split", "--reachable", NULL);
     +
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--no-progress");
     +
     +    return !!run_command(&child);
     +}
     +
    -+static int run_verify_commit_graph(void)
    ++static int run_verify_commit_graph(struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
    @@ builtin/gc.c: static struct maintenance_opts {
     +    strvec_pushl(&child.args, "commit-graph", "verify",
     +             "--shallow", NULL);
     +
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--no-progress");
     +
     +    return !!run_command(&child);
     +}
     +
    -+static int maintenance_task_commit_graph(void)
    ++static int maintenance_task_commit_graph(struct maintenance_opts *opts)
     +{
     +    struct repository *r = the_repository;
     +    char *chain_path;
     +
     +    close_object_store(r->objects);
    -+    if (run_write_commit_graph()) {
    ++    if (run_write_commit_graph(opts)) {
     +        error(_("failed to write commit-graph"));
     +        return 1;
     +    }
     +
    -+    if (!run_verify_commit_graph())
    ++    if (!run_verify_commit_graph(opts))
     +        return 0;
     +
     +    warning(_("commit-graph verify caught error, rewriting"));
    @@ builtin/gc.c: static struct maintenance_opts {
     +    }
     +    free(chain_path);
     +
    -+    if (!run_write_commit_graph())
    ++    if (!run_write_commit_graph(opts))
     +        return 0;
     +
     +    error(_("failed to rewrite commit-graph"));
     +    return 1;
     +}
     +
    - static int maintenance_task_gc(void)
    + static int maintenance_task_gc(struct maintenance_opts *opts)
      {
          struct child_process child = CHILD_PROCESS_INIT;
     @@ builtin/gc.c: struct maintenance_task {
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {
     +    },
      };

    - static int maintenance_run(void)
    + static int maintenance_run(struct maintenance_opts *opts)

      ## commit-graph.c ##
     @@ commit-graph.c: static char *get_split_graph_filename(struct object_directory *odb,
 6:  dafb0d9bbc !  6:  dddbcc4f3d maintenance: add --task option
    @@ Commit message
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## Documentation/git-maintenance.txt ##
    +@@ Documentation/git-maintenance.txt: SUBCOMMANDS
    + -----------
    + 
    + run::
    +-    Run one or more maintenance tasks.
    ++    Run one or more maintenance tasks. If one or more `--task=<task>`
    ++    options are specified, then those tasks are run in the provided
    ++    order. Otherwise, only the `gc` task is run.
    + 
    + TASKS
    + -----
     @@ Documentation/git-maintenance.txt: OPTIONS
      --quiet::
          Do not report progress or other information over `stderr`.
    @@ Documentation/git-maintenance.txt: OPTIONS
      Part of the linkgit:git[1] suite

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static const char * const builtin_maintenance_usage[] = {
    - static struct maintenance_opts {
    -     int auto_flag;
    -     int quiet;
    -+    int tasks_selected;
    - } opts;
    - 
    - static int run_write_commit_graph(void)
    -@@ builtin/gc.c: typedef int maintenance_task_fn(void);
    +@@ builtin/gc.c: typedef int maintenance_task_fn(struct maintenance_opts *opts);
      struct maintenance_task {
          const char *name;
          maintenance_task_fn *fn;
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {
     +    return b->selected_order - a->selected_order;
     +}
     +
    - static int maintenance_run(void)
    + static int maintenance_run(struct maintenance_opts *opts)
      {
    -     int i;
    +-    int i;
    ++    int i, found_selected = 0;
          int result = 0;

    -+    if (opts.tasks_selected)
    ++    for (i = 0; !found_selected && i < TASK__COUNT; i++)
    ++        found_selected = tasks[i].selected;
    ++
    ++    if (found_selected)
     +        QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
     +
          for (i = 0; i < TASK__COUNT; i++) {
     -        if (!tasks[i].enabled)
    -+        if (opts.tasks_selected && !tasks[i].selected)
    ++        if (found_selected && !tasks[i].selected)
     +            continue;
     +
    -+        if (!opts.tasks_selected && !tasks[i].enabled)
    ++        if (!found_selected && !tasks[i].enabled)
                  continue;

    -         if (tasks[i].fn()) {
    -@@ builtin/gc.c: static int maintenance_run(void)
    +         if (tasks[i].fn(opts)) {
    +@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
          return result;
      }

     +static int task_option_parse(const struct option *opt,
     +                 const char *arg, int unset)
     +{
    -+    int i;
    ++    int i, num_selected = 0;
     +    struct maintenance_task *task = NULL;
     +
     +    BUG_ON_OPT_NEG(unset);
     +
    -+    opts.tasks_selected++;
    -+
     +    for (i = 0; i < TASK__COUNT; i++) {
    ++        num_selected += tasks[i].selected;
     +        if (!strcasecmp(tasks[i].name, arg)) {
     +            task = &tasks[i];
    -+            break;
     +        }
     +    }
     +
    @@ builtin/gc.c: static int maintenance_run(void)
     +    }
     +
     +    task->selected = 1;
    -+    task->selected_order = opts.tasks_selected;
    ++    task->selected_order = num_selected + 1;
     +
     +    return 0;
     +}
     +
      int cmd_maintenance(int argc, const char **argv, const char *prefix)
      {
    -     static struct option builtin_maintenance_options[] = {
    +     static struct maintenance_opts opts;
     @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
                   N_("run tasks based on the state of the repository")),
              OPT_BOOL(0, "quiet", &opts.quiet,
    @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefi

      ## t/t7900-maintenance.sh ##
     @@ t/t7900-maintenance.sh: test_expect_success 'run [--auto|--quiet]' '
    -     grep ",\"gc\",\"--quiet\"" run-quiet.txt
    +     test_subcommand git gc --no-quiet <run-no-quiet.txt
      '

     +test_expect_success 'run --task=<task>' '
    -+    GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" git maintenance run --task=commit-graph &&
    -+    GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" git maintenance run --task=gc &&
    -+    GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" git maintenance run --task=commit-graph &&
    -+    GIT_TRACE2_EVENT="$(pwd)/run-both.txt" git maintenance run --task=commit-graph --task=gc &&
    -+    ! grep ",\"gc\"" run-commit-graph.txt  &&
    -+    grep ",\"gc\"" run-gc.txt  &&
    -+    grep ",\"gc\"" run-both.txt  &&
    -+    grep ",\"commit-graph\",\"write\"" run-commit-graph.txt  &&
    -+    ! grep ",\"commit-graph\",\"write\"" run-gc.txt  &&
    -+    grep ",\"commit-graph\",\"write\"" run-both.txt
    ++    GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
    ++        git maintenance run --task=commit-graph 2>/dev/null &&
    ++    GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
    ++        git maintenance run --task=gc 2>/dev/null &&
    ++    GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
    ++        git maintenance run --task=commit-graph 2>/dev/null &&
    ++    GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
    ++        git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
    ++    test_subcommand ! git gc --quiet <run-commit-graph.txt &&
    ++    test_subcommand git gc --quiet <run-gc.txt &&
    ++    test_subcommand git gc --quiet <run-both.txt &&
    ++    test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
    ++    test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
    ++    test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
     +'
     +
     +test_expect_success 'run --task=bogus' '
 7:  1b00524da3 !  7:  79af39be13 maintenance: take a lock on the objects directory
    @@ Commit message
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int maintenance_run(void)
    +@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      {
    -     int i;
    +     int i, found_selected = 0;
          int result = 0;
     +    struct lock_file lk;
     +    struct repository *r = the_repository;
    @@ builtin/gc.c: static int maintenance_run(void)
     +         * recursive process stack. Do not report an error in
     +         * that case.
     +         */
    -+        if (!opts.auto_flag && !opts.quiet)
    ++        if (!opts->auto_flag && !opts->quiet)
     +            error(_("lock file '%s' exists, skipping maintenance"),
     +                  lock_path);
     +        free(lock_path);
    @@ builtin/gc.c: static int maintenance_run(void)
     +    }
     +    free(lock_path);

    -     if (opts.tasks_selected)
    -         QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
    -@@ builtin/gc.c: static int maintenance_run(void)
    +     for (i = 0; !found_selected && i < TASK__COUNT; i++)
    +         found_selected = tasks[i].selected;
    +@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
              }
          }

 8:  0e94e04dcd <  -:  ---------- fetch: optionally allow disabling FETCH_HEAD update
 9:  9e38ade15c <  -:  ---------- maintenance: add prefetch task
10:  0128fdfd1a <  -:  ---------- maintenance: add loose-objects task
11:  c2baf6e119 <  -:  ---------- midx: enable core.multiPackIndex by default
12:  00f47c4848 <  -:  ---------- maintenance: add incremental-repack task
13:  ef2a231956 <  -:  ---------- maintenance: auto-size incremental-repack batch
14:  99840c4b8f !  8:  69bfc6a4b2 maintenance: create maintenance.<task>.enabled config
    @@ Documentation/git-maintenance.txt: SUBCOMMANDS
      -----------

      run::
    --    Run one or more maintenance tasks.
    +-    Run one or more maintenance tasks. If one or more `--task=<task>`
    +-    options are specified, then those tasks are run in the provided
    +-    order. Otherwise, only the `gc` task is run.
     +    Run one or more maintenance tasks. If one or more `--task` options
     +    are specified, then those tasks are run in that order. Otherwise,
     +    the tasks are determined by which `maintenance.<task>.enabled`
    @@ Documentation/git-maintenance.txt: SUBCOMMANDS
      -----

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int maintenance_run(void)
    +@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
          return result;
      }

    @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefi

      ## t/t7900-maintenance.sh ##
     @@ t/t7900-maintenance.sh: test_expect_success 'run [--auto|--quiet]' '
    -     grep ",\"gc\",\"--quiet\"" run-quiet.txt
    +     test_subcommand git gc --no-quiet <run-no-quiet.txt
      '

     +test_expect_success 'maintenance.<task>.enabled' '
     +    git config maintenance.gc.enabled false &&
     +    git config maintenance.commit-graph.enabled true &&
    -+    git config maintenance.loose-objects.enabled true &&
    -+    GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run &&
    -+    ! grep ",\"fetch\"" run-config.txt &&
    -+    ! grep ",\"gc\"" run-config.txt &&
    -+    ! grep ",\"multi-pack-index\"" run-config.txt &&
    -+    grep ",\"commit-graph\"" run-config.txt &&
    -+    grep ",\"prune-packed\"" run-config.txt
    ++    GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
    ++    test_subcommand ! git gc --quiet <run-config.txt &&
    ++    test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
     +'
     +
      test_expect_success 'run --task=<task>' '
    -     GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" git maintenance run --task=commit-graph &&
    -     GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" git maintenance run --task=gc &&
    +     GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
    +         git maintenance run --task=commit-graph 2>/dev/null &&
15:  a087c63572 !  9:  df21bbb000 maintenance: use pointers to check --auto
    @@ Commit message
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int maintenance_task_incremental_repack(void)
    +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)

    - typedef int maintenance_task_fn(void);
    + typedef int maintenance_task_fn(struct maintenance_opts *opts);

     +/*
     + * An auto condition function returns 1 if the task should run
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {
              1,
          },
          [TASK_COMMIT_GRAPH] = {
    -@@ builtin/gc.c: static int maintenance_run(void)
    -         if (!opts.tasks_selected && !tasks[i].enabled)
    +@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
    +         if (!found_selected && !tasks[i].enabled)
                  continue;

    -+        if (opts.auto_flag &&
    ++        if (opts->auto_flag &&
     +            (!tasks[i].auto_condition ||
     +             !tasks[i].auto_condition()))
     +            continue;
     +
    -         if (tasks[i].fn()) {
    +         if (tasks[i].fn(opts)) {
                  error(_("task '%s' failed"), tasks[i].name);
                  result = 1;
     @@ builtin/gc.c: static void initialize_task_config(void)
    @@ t/t5514-fetch-multiple.sh: test_expect_success 'git fetch --multiple (two remote

      ## t/t7900-maintenance.sh ##
     @@ t/t7900-maintenance.sh: test_expect_success 'run [--auto|--quiet]' '
    -     GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
    -     GIT_TRACE2_EVENT="$(pwd)/run-quiet.txt" git maintenance run --quiet &&
    -     grep ",\"gc\"]" run-no-auto.txt  &&
    --    grep ",\"gc\",\"--auto\"" run-auto.txt &&
    -+    ! grep ",\"gc\",\"--auto\"" run-auto.txt &&
    -     grep ",\"gc\",\"--quiet\"" run-quiet.txt
    +     GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
    +         git maintenance run --no-quiet 2>/dev/null &&
    +     test_subcommand git gc --quiet <run-no-auto.txt &&
    +-    test_subcommand git gc --auto --quiet <run-auto.txt &&
    ++    test_subcommand ! git gc --auto --quiet <run-auto.txt &&
    +     test_subcommand git gc --no-quiet <run-no-quiet.txt
      '

16:  ef3a854508 ! 10:  e67e259aef maintenance: add auto condition for commit-graph task
    @@ Documentation/config/maintenance.txt: maintenance.<task>.enabled::

      ## builtin/gc.c ##
     @@
    + #include "blob.h"
    + #include "tree.h"
      #include "promisor-remote.h"
    - #include "remote.h"
    - #include "midx.h"
     +#include "refs.h"

      #define FAILED_RUN "failed to run %s"

    -@@ builtin/gc.c: static struct maintenance_opts {
    -     int tasks_selected;
    - } opts;
    +@@ builtin/gc.c: struct maintenance_opts {
    +     int quiet;
    + };

     +/* Remember to update object flag allocation in object.h */
     +#define PARENT1        (1u<<16)
    @@ builtin/gc.c: static struct maintenance_opts {
     +    return result;
     +}
     +
    - static int run_write_commit_graph(void)
    + static int run_write_commit_graph(struct maintenance_opts *opts)
      {
          struct child_process child = CHILD_PROCESS_INIT;
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
17:  6ac3a58f2f <  -:  ---------- maintenance: create auto condition for loose-objects
18:  801b262d1c <  -:  ---------- maintenance: add incremental-repack auto condition
19:  9b4cef7635 <  -:  ---------- midx: use start_delayed_progress()
20:  39eb83ad1e ! 11:  a5d1914846 maintenance: add trace2 regions for task execution
    @@ Commit message
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int maintenance_run(void)
    +@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
                   !tasks[i].auto_condition()))
                  continue;

     +        trace2_region_enter("maintenance", tasks[i].name, r);
    -         if (tasks[i].fn()) {
    +         if (tasks[i].fn(opts)) {
                  error(_("task '%s' failed"), tasks[i].name);
                  result = 1;
              }

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  14 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  86 +++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 353 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  61 +++++
 t/test-lib-functions.sh              |  33 +++
 23 files changed, 584 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: d70a9eb611a9d242c1d26847d223b8677609305b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/695
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 01/11] maintenance: create basic maintenance runner
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-07 22:16   ` Martin Ågren
  2020-08-12 21:03   ` Jonathan Nieder
  2020-08-06 15:48 ` [PATCH 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                        |  1 +
 Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
 builtin.h                         |  1 +
 builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
 git.c                             |  1 +
 t/t7900-maintenance.sh            | 19 +++++++++++
 t/test-lib-functions.sh           | 33 ++++++++++++++++++
 7 files changed, 169 insertions(+)
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..a5808fa30d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -90,6 +90,7 @@
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
new file mode 100644
index 0000000000..34cd2b4417
--- /dev/null
+++ b/Documentation/git-maintenance.txt
@@ -0,0 +1,57 @@
+git-maintenance(1)
+==================
+
+NAME
+----
+git-maintenance - Run tasks to optimize Git repository data
+
+
+SYNOPSIS
+--------
+[verse]
+'git maintenance' run [<options>]
+
+
+DESCRIPTION
+-----------
+Run tasks to optimize Git repository data, speeding up other Git commands
+and reducing storage requirements for the repository.
++
+Git commands that add repository data, such as `git add` or `git fetch`,
+are optimized for a responsive user experience. These commands do not take
+time to optimize the Git data, since such optimizations scale with the full
+size of the repository while these user commands each perform a relatively
+small action.
++
+The `git maintenance` command provides flexibility for how to optimize the
+Git repository.
+
+SUBCOMMANDS
+-----------
+
+run::
+	Run one or more maintenance tasks.
+
+TASKS
+-----
+
+gc::
+	Cleanup unnecessary files and optimize the local repository. "GC"
+	stands for "garbage collection," but this task performs many
+	smaller tasks. This task can be rather expensive for large
+	repositories, as it repacks all Git objects into a single pack-file.
+	It can also be disruptive in some situations, as it deletes stale
+	data.
+
+OPTIONS
+-------
+--auto::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain thresholds are met. For example, the `gc` task
+	runs when the number of loose objects exceeds the number stored
+	in the `gc.auto` config setting, or when the number of pack-files
+	exceeds the `gc.autoPackLimit` config setting.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..17c1c0ce49 100644
--- a/builtin.h
+++ b/builtin.h
@@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gc.c b/builtin/gc.c
index aafa0946f5..e4f0ce1c86 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	return 0;
 }
+
+static const char * const builtin_maintenance_usage[] = {
+	N_("git maintenance run [<options>]"),
+	NULL
+};
+
+struct maintenance_opts {
+	int auto_flag;
+};
+
+static int maintenance_task_gc(struct maintenance_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_push(&child.args, "gc");
+
+	if (opts->auto_flag)
+		strvec_push(&child.args, "--auto");
+
+	close_object_store(the_repository->objects);
+	return run_command(&child);
+}
+
+static int maintenance_run(struct maintenance_opts *opts)
+{
+	return maintenance_task_gc(opts);
+}
+
+int cmd_maintenance(int argc, const char **argv, const char *prefix)
+{
+	static struct maintenance_opts opts;
+	static struct option builtin_maintenance_options[] = {
+		OPT_BOOL(0, "auto", &opts.auto_flag,
+			 N_("run tasks based on the state of the repository")),
+		OPT_END()
+	};
+
+	memset(&opts, 0, sizeof(opts));
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_maintenance_usage,
+				   builtin_maintenance_options);
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_maintenance_options,
+			     builtin_maintenance_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+
+	if (argc == 1) {
+		if (!strcmp(argv[0], "run"))
+			return maintenance_run(&opts);
+	}
+
+	usage_with_options(builtin_maintenance_usage,
+			   builtin_maintenance_options);
+}
diff --git a/git.c b/git.c
index 8bd1d7551d..24f250d29a 100644
--- a/git.c
+++ b/git.c
@@ -529,6 +529,7 @@ static struct cmd_struct commands[] = {
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
+	{ "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE },
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
new file mode 100755
index 0000000000..c4b9b4a6fe
--- /dev/null
+++ b/t/t7900-maintenance.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+test_description='git maintenance builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'help text' '
+	test_expect_code 129 git maintenance -h 2>err &&
+	test_i18ngrep "usage: git maintenance run" err
+'
+
+test_expect_success 'run [--auto]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
+	test_subcommand git gc <run-no-auto.txt &&
+	test_subcommand git gc --auto <run-auto.txt
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 3103be8a32..0adf2b85f8 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1561,3 +1561,36 @@ test_path_is_hidden () {
 	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
 	return 1
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_subcommand [!] <command> <args>... < <trace>
+#
+# For example, to look for an invocation of "git upload-pack
+# /path/to/repo"
+#
+#	GIT_TRACE2_EVENT=event.log git fetch ... &&
+#	test_subcommand git upload-pack "$PATH" <event.log
+#
+# If the first parameter passed is !, this instead checks that
+# the given command was not called.
+#
+test_subcommand () {
+	local negate=
+	if test "$1" = "!"
+	then
+		negate=t
+		shift
+	fi
+
+	local expr=$(printf '"%s",' "$@")
+	expr="${expr%,}"
+
+	if test -n "$negate"
+	then
+		! grep "\[$expr\]"
+	else
+		grep "\[$expr\]"
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 02/11] maintenance: add --quiet option
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  3 +++
 builtin/gc.c                      |  9 +++++++++
 t/t7900-maintenance.sh            | 15 ++++++++++-----
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 34cd2b4417..089fa4cedc 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -52,6 +52,9 @@ OPTIONS
 	in the `gc.auto` config setting, or when the number of pack-files
 	exceeds the `gc.autoPackLimit` config setting.
 
+--quiet::
+	Do not report progress or other information over `stderr`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index e4f0ce1c86..a060ba8424 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -707,6 +707,7 @@ static const char * const builtin_maintenance_usage[] = {
 
 struct maintenance_opts {
 	int auto_flag;
+	int quiet;
 };
 
 static int maintenance_task_gc(struct maintenance_opts *opts)
@@ -718,6 +719,10 @@ static int maintenance_task_gc(struct maintenance_opts *opts)
 
 	if (opts->auto_flag)
 		strvec_push(&child.args, "--auto");
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	else
+		strvec_push(&child.args, "--no-quiet");
 
 	close_object_store(the_repository->objects);
 	return run_command(&child);
@@ -734,6 +739,8 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 	static struct option builtin_maintenance_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "quiet", &opts.quiet,
+			 N_("do not report progress or other information over stderr")),
 		OPT_END()
 	};
 
@@ -743,6 +750,8 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_maintenance_usage,
 				   builtin_maintenance_options);
 
+	opts.quiet = !isatty(2);
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_options,
 			     builtin_maintenance_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index c4b9b4a6fe..7b63b4ec0c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -9,11 +9,16 @@ test_expect_success 'help text' '
 	test_i18ngrep "usage: git maintenance run" err
 '
 
-test_expect_success 'run [--auto]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
-	test_subcommand git gc <run-no-auto.txt &&
-	test_subcommand git gc --auto <run-auto.txt
+test_expect_success 'run [--auto|--quiet]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
+		git maintenance run 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
+		git maintenance run --auto 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
+		git maintenance run --no-quiet 2>/dev/null &&
+	test_subcommand git gc --quiet <run-no-auto.txt &&
+	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 03/11] maintenance: replace run_auto_gc()
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/fetch-options.txt |  6 ++++--
 Documentation/git-clone.txt     |  6 +++---
 builtin/am.c                    |  2 +-
 builtin/commit.c                |  2 +-
 builtin/fetch.c                 |  6 ++++--
 builtin/merge.c                 |  2 +-
 builtin/rebase.c                |  4 ++--
 run-command.c                   | 16 +++++++---------
 run-command.h                   |  2 +-
 t/t5510-fetch.sh                |  2 +-
 10 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 6e2a160a47..495bc8ab5a 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -86,9 +86,11 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-maintenance::
 --[no-]auto-gc::
-	Run `git gc --auto` at the end to perform garbage collection
-	if needed. This is enabled by default.
+	Run `git maintenance run --auto` at the end to perform automatic
+	repository maintenance if needed. (`--[no-]auto-gc` is a synonym.)
+	This is enabled by default.
 
 --[no-]write-commit-graph::
 	Write a commit-graph after fetching. This overrides the config
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index c898310099..097e6a86c5 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -78,9 +78,9 @@ repository using this option and then delete branches (or use any
 other Git command that makes any existing commit unreferenced) in the
 source repository, some objects may become unreferenced (or dangling).
 These objects may be removed by normal Git operations (such as `git commit`)
-which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-If these objects are removed and were referenced by the cloned repository,
-then the cloned repository will become corrupt.
+which automatically call `git maintenance run --auto`. (See
+linkgit:git-maintenance[1].) If these objects are removed and were referenced
+by the cloned repository, then the cloned repository will become corrupt.
 +
 Note that running `git repack` without the `--local` option in a repository
 cloned with `--shared` will copy objects from the source repository into a pack
diff --git a/builtin/am.c b/builtin/am.c
index dd4e6c2d9b..68e9d17379 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
 	if (!state->rebasing) {
 		am_destroy(state);
 		close_object_store(the_repository->objects);
-		run_auto_gc(state->quiet);
+		run_auto_maintenance(state->quiet);
 	}
 }
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 69ac78d5e5..f9b0a0c05d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_gc(quiet);
+	run_auto_maintenance(quiet);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c49f0e9752..c8b9366d3c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -196,8 +196,10 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-maintenance", &enable_auto_gc,
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
-		 N_("run 'gc --auto' after fetching")),
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
@@ -1882,7 +1884,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc)
-		run_auto_gc(verbosity < 0);
+		run_auto_maintenance(verbosity < 0);
 
 	return result;
 }
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55..c068e73037 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -457,7 +457,7 @@ static void finish(struct commit *head_commit,
 			 * user should see them.
 			 */
 			close_object_store(the_repository->objects);
-			run_auto_gc(verbosity < 0);
+			run_auto_maintenance(verbosity < 0);
 		}
 	}
 	if (new_head && show_diffstat) {
diff --git a/builtin/rebase.c b/builtin/rebase.c
index dadb52fa92..9ffba9009d 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -728,10 +728,10 @@ static int finish_rebase(struct rebase_options *opts)
 	apply_autostash(state_dir_path("autostash", opts));
 	close_object_store(the_repository->objects);
 	/*
-	 * We ignore errors in 'gc --auto', since the
+	 * We ignore errors in 'git maintenance run --auto', since the
 	 * user should see them.
 	 */
-	run_auto_gc(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
+	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
 	if (opts->type == REBASE_MERGE) {
 		struct replay_opts replay = REPLAY_OPTS_INIT;
 
diff --git a/run-command.c b/run-command.c
index cc9c3296ba..2ee59acdc8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1866,15 +1866,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 	return result;
 }
 
-int run_auto_gc(int quiet)
+int run_auto_maintenance(int quiet)
 {
-	struct strvec argv_gc_auto = STRVEC_INIT;
-	int status;
+	struct child_process maint = CHILD_PROCESS_INIT;
 
-	strvec_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (quiet)
-		strvec_push(&argv_gc_auto, "--quiet");
-	status = run_command_v_opt(argv_gc_auto.v, RUN_GIT_CMD);
-	strvec_clear(&argv_gc_auto);
-	return status;
+	maint.git_cmd = 1;
+	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
+	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
+
+	return run_command(&maint);
 }
diff --git a/run-command.h b/run-command.h
index 8b9bfaef16..6472b38bde 100644
--- a/run-command.h
+++ b/run-command.h
@@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args);
 /*
  * Trigger an auto-gc
  */
-int run_auto_gc(int quiet);
+int run_auto_maintenance(int quiet);
 
 #define RUN_COMMAND_NO_STDIN 1
 #define RUN_GIT_CMD	     2	/*If this is to be git sub-command */
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index a66dbe0bde..9850ecde5d 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -919,7 +919,7 @@ test_expect_success 'fetching with auto-gc does not lock up' '
 		git config fetch.unpackLimit 1 &&
 		git config gc.autoPackLimit 1 &&
 		git config gc.autoDetach false &&
-		GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 &&
+		GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 &&
 		test_i18ngrep "Auto packing the repository" fetch.out &&
 		! grep "Should I try again" fetch.out
 	)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 04/11] maintenance: initialize task array
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.

The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.

A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.

The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will send an error message.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index a060ba8424..150dce4301 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -728,9 +728,45 @@ static int maintenance_task_gc(struct maintenance_opts *opts)
 	return run_command(&child);
 }
 
+typedef int maintenance_task_fn(struct maintenance_opts *opts);
+
+struct maintenance_task {
+	const char *name;
+	maintenance_task_fn *fn;
+	unsigned enabled:1;
+};
+
+enum maintenance_task_label {
+	TASK_GC,
+
+	/* Leave as final value */
+	TASK__COUNT
+};
+
+static struct maintenance_task tasks[] = {
+	[TASK_GC] = {
+		"gc",
+		maintenance_task_gc,
+		1,
+	},
+};
+
 static int maintenance_run(struct maintenance_opts *opts)
 {
-	return maintenance_task_gc(opts);
+	int i;
+	int result = 0;
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (!tasks[i].enabled)
+			continue;
+
+		if (tasks[i].fn(opts)) {
+			error(_("task '%s' failed"), tasks[i].name);
+			result = 1;
+		}
+	}
+
+	return result;
 }
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 05/11] maintenance: add commit-graph task
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-07 22:29   ` Martin Ågren
  2020-08-06 15:48 ` [PATCH 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The first new task in the 'git maintenance' builtin is the
'commit-graph' job. It is based on the sequence of events in the
'commit-graph' job in Scalar [1]. This sequence is as follows:

1. git commit-graph write --reachable --split
2. git commit-graph verify --shallow
3. If the verify succeeds, stop.
4. Delete the commit-graph-chain file.
5. git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

By using 'git commit-graph verify --shallow' we can ensure that
the file we just wrote is valid. This is an extra safety precaution
that is faster than our 'write' subcommand. In the rare situation
that the newest layer of the commit-graph is corrupt, we can "fix"
the corruption by deleting the commit-graph-chain file and rewrite
the full commit-graph as a new one-layer commit graph. This does
not completely prevent _that_ file from being corrupt, but it does
recompute the commit-graph by parsing commits from the object
database. In our use of this step in Scalar and VFS for Git, we
have only seen this issue arise because our microsoft/git fork
reverted 43d3561 ("commit-graph write: don't die if the existing
graph is corrupt" 2019-03-25) for a while to keep commit-graph
writes very fast. We dropped the revert when updating to v2.23.0.
The verify still has potential for catching corrupt data across
the layer boundary: if the new file has commit X with parent Y
in an old file but the commit ID for Y in the old file had a
bitswap, then we will notice that in the 'verify' command.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/CommitGraphStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt | 18 +++++++++
 builtin/gc.c                      | 63 +++++++++++++++++++++++++++++++
 commit-graph.c                    |  8 ++--
 commit-graph.h                    |  1 +
 t/t7900-maintenance.sh            |  2 +
 5 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 089fa4cedc..35b0be7d40 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -35,6 +35,24 @@ run::
 TASKS
 -----
 
+commit-graph::
+	The `commit-graph` job updates the `commit-graph` files incrementally,
+	then verifies that the written data is correct. If the new layer has an
+	issue, then the chain file is removed and the `commit-graph` is
+	rewritten from scratch.
++
+The verification only checks the top layer of the `commit-graph` chain.
+If the incremental write merged the new commits with at least one
+existing layer, then there is potential for on-disk corruption being
+carried forward into the new file. This will be noticed and the new
+commit-graph file will be clean as Git reparses the commit data from
+the object database.
++
+The incremental write is safe to run alongside concurrent Git processes
+since it will not expire `.graph` files that were in the previous
+`commit-graph-chain` file. They will be deleted by a later run based on
+the expiration delay.
+
 gc::
 	Cleanup unnecessary files and optimize the local repository. "GC"
 	stands for "garbage collection," but this task performs many
diff --git a/builtin/gc.c b/builtin/gc.c
index 150dce4301..3b7b914d60 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -710,6 +710,64 @@ struct maintenance_opts {
 	int quiet;
 };
 
+static int run_write_commit_graph(struct maintenance_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "write",
+		     "--split", "--reachable", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int run_verify_commit_graph(struct maintenance_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "verify",
+		     "--shallow", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_commit_graph(struct maintenance_opts *opts)
+{
+	struct repository *r = the_repository;
+	char *chain_path;
+
+	close_object_store(r->objects);
+	if (run_write_commit_graph(opts)) {
+		error(_("failed to write commit-graph"));
+		return 1;
+	}
+
+	if (!run_verify_commit_graph(opts))
+		return 0;
+
+	warning(_("commit-graph verify caught error, rewriting"));
+
+	chain_path = get_commit_graph_chain_filename(r->objects->odb);
+	if (unlink(chain_path)) {
+		UNLEAK(chain_path);
+		die(_("failed to remove commit-graph at %s"), chain_path);
+	}
+	free(chain_path);
+
+	if (!run_write_commit_graph(opts))
+		return 0;
+
+	error(_("failed to rewrite commit-graph"));
+	return 1;
+}
+
 static int maintenance_task_gc(struct maintenance_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -738,6 +796,7 @@ struct maintenance_task {
 
 enum maintenance_task_label {
 	TASK_GC,
+	TASK_COMMIT_GRAPH,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -749,6 +808,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_gc,
 		1,
 	},
+	[TASK_COMMIT_GRAPH] = {
+		"commit-graph",
+		maintenance_task_commit_graph,
+	},
 };
 
 static int maintenance_run(struct maintenance_opts *opts)
diff --git a/commit-graph.c b/commit-graph.c
index 1af68c297d..9705d237e4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -172,7 +172,7 @@ static char *get_split_graph_filename(struct object_directory *odb,
 		       oid_hex);
 }
 
-static char *get_chain_filename(struct object_directory *odb)
+char *get_commit_graph_chain_filename(struct object_directory *odb)
 {
 	return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
 }
@@ -521,7 +521,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r,
 	struct stat st;
 	struct object_id *oids;
 	int i = 0, valid = 1, count;
-	char *chain_name = get_chain_filename(odb);
+	char *chain_name = get_commit_graph_chain_filename(odb);
 	FILE *fp;
 	int stat_res;
 
@@ -1619,7 +1619,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	if (ctx->split) {
-		char *lock_name = get_chain_filename(ctx->odb);
+		char *lock_name = get_commit_graph_chain_filename(ctx->odb);
 
 		hold_lock_file_for_update_mode(&lk, lock_name,
 					       LOCK_DIE_ON_ERROR, 0444);
@@ -1996,7 +1996,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 	if (ctx->split_opts && ctx->split_opts->expire_time)
 		expire_time = ctx->split_opts->expire_time;
 	if (!ctx->split) {
-		char *chain_file_name = get_chain_filename(ctx->odb);
+		char *chain_file_name = get_commit_graph_chain_filename(ctx->odb);
 		unlink(chain_file_name);
 		free(chain_file_name);
 		ctx->num_commit_graphs_after = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 28f89cdf3e..3c202748c3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -25,6 +25,7 @@ struct commit;
 struct bloom_filter_settings;
 
 char *get_commit_graph_filename(struct object_directory *odb);
+char *get_commit_graph_chain_filename(struct object_directory *odb);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 
 /*
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 7b63b4ec0c..384294d111 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -4,6 +4,8 @@ test_description='git maintenance builtin'
 
 . ./test-lib.sh
 
+GIT_TEST_COMMIT_GRAPH=0
+
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 06/11] maintenance: add --task option
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  8 +++-
 builtin/gc.c                      | 61 +++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 27 ++++++++++++++
 3 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 35b0be7d40..5911213c9c 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,7 +30,9 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks.
+	Run one or more maintenance tasks. If one or more `--task=<task>`
+	options are specified, then those tasks are run in the provided
+	order. Otherwise, only the `gc` task is run.
 
 TASKS
 -----
@@ -73,6 +75,10 @@ OPTIONS
 --quiet::
 	Do not report progress or other information over `stderr`.
 
+--task=<task>::
+	If this option is specified one or more times, then only run the
+	specified tasks in the specified order.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 3b7b914d60..3d50ab7ac9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
-	unsigned enabled:1;
+	unsigned enabled:1,
+		 selected:1;
+	int selected_order;
 };
 
 enum maintenance_task_label {
@@ -814,13 +816,32 @@ static struct maintenance_task tasks[] = {
 	},
 };
 
+static int compare_tasks_by_selection(const void *a_, const void *b_)
+{
+	const struct maintenance_task *a, *b;
+
+	a = (const struct maintenance_task *)&a_;
+	b = (const struct maintenance_task *)&b_;
+
+	return b->selected_order - a->selected_order;
+}
+
 static int maintenance_run(struct maintenance_opts *opts)
 {
-	int i;
+	int i, found_selected = 0;
 	int result = 0;
 
+	for (i = 0; !found_selected && i < TASK__COUNT; i++)
+		found_selected = tasks[i].selected;
+
+	if (found_selected)
+		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
+
 	for (i = 0; i < TASK__COUNT; i++) {
-		if (!tasks[i].enabled)
+		if (found_selected && !tasks[i].selected)
+			continue;
+
+		if (!found_selected && !tasks[i].enabled)
 			continue;
 
 		if (tasks[i].fn(opts)) {
@@ -832,6 +853,37 @@ static int maintenance_run(struct maintenance_opts *opts)
 	return result;
 }
 
+static int task_option_parse(const struct option *opt,
+			     const char *arg, int unset)
+{
+	int i, num_selected = 0;
+	struct maintenance_task *task = NULL;
+
+	BUG_ON_OPT_NEG(unset);
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		num_selected += tasks[i].selected;
+		if (!strcasecmp(tasks[i].name, arg)) {
+			task = &tasks[i];
+		}
+	}
+
+	if (!task) {
+		error(_("'%s' is not a valid task"), arg);
+		return 1;
+	}
+
+	if (task->selected) {
+		error(_("task '%s' cannot be selected multiple times"), arg);
+		return 1;
+	}
+
+	task->selected = 1;
+	task->selected_order = num_selected + 1;
+
+	return 0;
+}
+
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
 	static struct maintenance_opts opts;
@@ -840,6 +892,9 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 			 N_("run tasks based on the state of the repository")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
+		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
+			N_("run a specific task"),
+			PARSE_OPT_NONEG, task_option_parse),
 		OPT_END()
 	};
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 384294d111..bcc7131818 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -23,4 +23,31 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'run --task=<task>' '
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
+		git maintenance run --task=gc 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
+		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
+	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
+	test_subcommand git gc --quiet <run-gc.txt &&
+	test_subcommand git gc --quiet <run-both.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
+	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
+'
+
+test_expect_success 'run --task=bogus' '
+	test_must_fail git maintenance run --task=bogus 2>err &&
+	test_i18ngrep "is not a valid task" err
+'
+
+test_expect_success 'run --task duplicate' '
+	test_must_fail git maintenance run --task=gc --task=gc 2>err &&
+	test_i18ngrep "cannot be selected multiple times" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 07/11] maintenance: take a lock on the objects directory
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then fail silently. This will become
very important later when we implement the 'fetch' task, as this is our
stop-gap from creating a recursive process loop between 'git fetch' and
'git maintenance run'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 3d50ab7ac9..1b7d502b87 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -830,6 +830,25 @@ static int maintenance_run(struct maintenance_opts *opts)
 {
 	int i, found_selected = 0;
 	int result = 0;
+	struct lock_file lk;
+	struct repository *r = the_repository;
+	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
+		/*
+		 * Another maintenance command is running.
+		 *
+		 * If --auto was provided, then it is likely due to a
+		 * recursive process stack. Do not report an error in
+		 * that case.
+		 */
+		if (!opts->auto_flag && !opts->quiet)
+			error(_("lock file '%s' exists, skipping maintenance"),
+			      lock_path);
+		free(lock_path);
+		return 0;
+	}
+	free(lock_path);
 
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected;
@@ -850,6 +869,7 @@ static int maintenance_run(struct maintenance_opts *opts)
 		}
 	}
 
+	rollback_lock_file(&lk);
 	return result;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 08/11] maintenance: create maintenance.<task>.enabled config
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt             |  2 ++
 Documentation/config/maintenance.txt |  4 ++++
 Documentation/git-maintenance.txt    |  8 +++++---
 builtin/gc.c                         | 19 +++++++++++++++++++
 t/t7900-maintenance.sh               |  8 ++++++++
 5 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ef0768b91a..2783b825f9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -396,6 +396,8 @@ include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
 
+include::config/maintenance.txt[]
+
 include::config/man.txt[]
 
 include::config/merge.txt[]
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
new file mode 100644
index 0000000000..370cbfb42f
--- /dev/null
+++ b/Documentation/config/maintenance.txt
@@ -0,0 +1,4 @@
+maintenance.<task>.enabled::
+	This boolean config option controls whether the maintenance task
+	with name `<task>` is run when no `--task` option is specified.
+	By default, only `maintenance.gc.enabled` is true.
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 5911213c9c..e1a2a8902c 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,9 +30,11 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks. If one or more `--task=<task>`
-	options are specified, then those tasks are run in the provided
-	order. Otherwise, only the `gc` task is run.
+	Run one or more maintenance tasks. If one or more `--task` options
+	are specified, then those tasks are run in that order. Otherwise,
+	the tasks are determined by which `maintenance.<task>.enabled`
+	config options are true. By default, only `maintenance.gc.enabled`
+	is true.
 
 TASKS
 -----
diff --git a/builtin/gc.c b/builtin/gc.c
index 1b7d502b87..dc1d260858 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -873,6 +873,24 @@ static int maintenance_run(struct maintenance_opts *opts)
 	return result;
 }
 
+static void initialize_task_config(void)
+{
+	int i;
+	struct strbuf config_name = STRBUF_INIT;
+	for (i = 0; i < TASK__COUNT; i++) {
+		int config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.enabled",
+			    tasks[i].name);
+
+		if (!git_config_get_bool(config_name.buf, &config_value))
+			tasks[i].enabled = config_value;
+	}
+
+	strbuf_release(&config_name);
+}
+
 static int task_option_parse(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -925,6 +943,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 				   builtin_maintenance_options);
 
 	opts.quiet = !isatty(2);
+	initialize_task_config();
 
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_options,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index bcc7131818..cdf0bf2e60 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -23,6 +23,14 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.<task>.enabled' '
+	git config maintenance.gc.enabled false &&
+	git config maintenance.commit-graph.enabled true &&
+	GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
+	test_subcommand ! git gc --quiet <run-config.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
+'
+
 test_expect_success 'run --task=<task>' '
 	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
 		git maintenance run --task=commit-graph 2>/dev/null &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 09/11] maintenance: use pointers to check --auto
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c              | 16 ++++++++++++++++
 t/t5514-fetch-multiple.sh |  2 +-
 t/t7900-maintenance.sh    |  2 +-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index dc1d260858..2b3e45eb71 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -788,9 +788,17 @@ static int maintenance_task_gc(struct maintenance_opts *opts)
 
 typedef int maintenance_task_fn(struct maintenance_opts *opts);
 
+/*
+ * An auto condition function returns 1 if the task should run
+ * and 0 if the task should NOT run. See needs_to_gc() for an
+ * example.
+ */
+typedef int maintenance_auto_fn(void);
+
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
+	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1,
 		 selected:1;
 	int selected_order;
@@ -808,6 +816,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_GC] = {
 		"gc",
 		maintenance_task_gc,
+		need_to_gc,
 		1,
 	},
 	[TASK_COMMIT_GRAPH] = {
@@ -863,6 +872,11 @@ static int maintenance_run(struct maintenance_opts *opts)
 		if (!found_selected && !tasks[i].enabled)
 			continue;
 
+		if (opts->auto_flag &&
+		    (!tasks[i].auto_condition ||
+		     !tasks[i].auto_condition()))
+			continue;
+
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
@@ -877,6 +891,8 @@ static void initialize_task_config(void)
 {
 	int i;
 	struct strbuf config_name = STRBUF_INIT;
+	gc_config();
+
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index de8e2f1531..bd202ec6f3 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -108,7 +108,7 @@ test_expect_success 'git fetch --multiple (two remotes)' '
 	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
 	 test_cmp ../expect output &&
-	 grep "built-in: git gc" trace >gc &&
+	 grep "built-in: git maintenance" trace >gc &&
 	 test_line_count = 1 gc
 	)
 '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index cdf0bf2e60..406dc7c303 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -19,7 +19,7 @@ test_expect_success 'run [--auto|--quiet]' '
 	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
 		git maintenance run --no-quiet 2>/dev/null &&
 	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 10/11] maintenance: add auto condition for commit-graph task
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-06 15:48 ` [PATCH 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the PARENT1 flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt | 10 ++++
 builtin/gc.c                         | 76 ++++++++++++++++++++++++++++
 object.h                             |  1 +
 3 files changed, 87 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 370cbfb42f..9bd69b9df3 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -2,3 +2,13 @@ maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
 	with name `<task>` is run when no `--task` option is specified.
 	By default, only `maintenance.gc.enabled` is true.
+
+maintenance.commit-graph.auto::
+	This integer config option controls how often the `commit-graph` task
+	should be run as part of `git maintenance run --auto`. If zero, then
+	the `commit-graph` task will not run with the `--auto` option. A
+	negative value will force the task to run every time. Otherwise, a
+	positive value implies the command should run when the number of
+	reachable commits that are not in the commit-graph file is at least
+	the value of `maintenance.commit-graph.auto`. The default value is
+	100.
diff --git a/builtin/gc.c b/builtin/gc.c
index 2b3e45eb71..20d85f6f4e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,6 +28,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "promisor-remote.h"
+#include "refs.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -710,6 +711,80 @@ struct maintenance_opts {
 	int quiet;
 };
 
+/* Remember to update object flag allocation in object.h */
+#define PARENT1		(1u<<16)
+
+static int num_commits_not_in_graph = 0;
+static int limit_commits_not_in_graph = 100;
+
+static int dfs_on_ref(const char *refname,
+		      const struct object_id *oid, int flags,
+		      void *cb_data)
+{
+	int result = 0;
+	struct object_id peeled;
+	struct commit_list *stack = NULL;
+	struct commit *commit;
+
+	if (!peel_ref(refname, &peeled))
+		oid = &peeled;
+	if (oid_object_info(the_repository, oid, NULL) != OBJ_COMMIT)
+		return 0;
+
+	commit = lookup_commit(the_repository, oid);
+	if (!commit)
+		return 0;
+	if (parse_commit(commit))
+		return 0;
+
+	commit_list_append(commit, &stack);
+
+	while (!result && stack) {
+		struct commit_list *parent;
+
+		commit = pop_commit(&stack);
+
+		for (parent = commit->parents; parent; parent = parent->next) {
+			if (parse_commit(parent->item) ||
+			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
+			    parent->item->object.flags & PARENT1)
+				continue;
+
+			parent->item->object.flags |= PARENT1;
+			num_commits_not_in_graph++;
+
+			if (num_commits_not_in_graph >= limit_commits_not_in_graph) {
+				result = 1;
+				break;
+			}
+
+			commit_list_append(parent->item, &stack);
+		}
+	}
+
+	free_commit_list(stack);
+	return result;
+}
+
+static int should_write_commit_graph(void)
+{
+	int result;
+
+	git_config_get_int("maintenance.commit-graph.auto",
+			   &limit_commits_not_in_graph);
+
+	if (!limit_commits_not_in_graph)
+		return 0;
+	if (limit_commits_not_in_graph < 0)
+		return 1;
+
+	result = for_each_ref(dfs_on_ref, NULL);
+
+	clear_commit_marks_all(PARENT1);
+
+	return result;
+}
+
 static int run_write_commit_graph(struct maintenance_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -822,6 +897,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_COMMIT_GRAPH] = {
 		"commit-graph",
 		maintenance_task_commit_graph,
+		should_write_commit_graph,
 	},
 };
 
diff --git a/object.h b/object.h
index 96a2105859..30d4e0f6a0 100644
--- a/object.h
+++ b/object.h
@@ -74,6 +74,7 @@ struct object_array {
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
  * builtin/index-pack.c:                                     2021
+ * builtin/maintenance.c:                           16
  * builtin/pack-objects.c:                                   20
  * builtin/reflog.c:                   10--12
  * builtin/show-branch.c:    0-------------------------------------------26
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 11/11] maintenance: add trace2 regions for task execution
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-06 15:48 ` Derrick Stolee via GitGitGadget
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-06 15:48 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 20d85f6f4e..8162bca974 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -953,10 +953,12 @@ static int maintenance_run(struct maintenance_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
 		}
+		trace2_region_leave("maintenance", tasks[i].name, r);
 	}
 
 	rollback_lock_file(&lk);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/11] maintenance: create basic maintenance runner
  2020-08-06 15:48 ` [PATCH 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
@ 2020-08-07 22:16   ` Martin Ågren
  2020-08-12 21:03   ` Jonathan Nieder
  1 sibling, 0 replies; 91+ messages in thread
From: Martin Ågren @ 2020-08-07 22:16 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, brian m. carlson, Josh Steadmon,
	Jonathan Nieder, Jeff King,
	Đoàn Trần Công Danh, Phillip Wood,
	Emily Shaffer, sluongng, Jonathan Tan, Derrick Stolee,
	Derrick Stolee

On Thu, 6 Aug 2020 at 19:57, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> +DESCRIPTION
> +-----------
> +Run tasks to optimize Git repository data, speeding up other Git commands
> +and reducing storage requirements for the repository.
> ++

This "+" and the one below render literally so you would want to drop
them. (You're not in any kind of "list" here, so no need for a "list
continuation".)

> +Git commands that add repository data, such as `git add` or `git fetch`,
> +are optimized for a responsive user experience. These commands do not take
> +time to optimize the Git data, since such optimizations scale with the full
> +size of the repository while these user commands each perform a relatively
> +small action.
> ++
> +The `git maintenance` command provides flexibility for how to optimize the
> +Git repository.

Martin

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/11] maintenance: add commit-graph task
  2020-08-06 15:48 ` [PATCH 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-07 22:29   ` Martin Ågren
  2020-08-12 13:30     ` Derrick Stolee
  0 siblings, 1 reply; 91+ messages in thread
From: Martin Ågren @ 2020-08-07 22:29 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, brian m. carlson, Josh Steadmon,
	Jonathan Nieder, Jeff King,
	Đoàn Trần Công Danh, Phillip Wood,
	Emily Shaffer, sluongng, Jonathan Tan, Derrick Stolee,
	Derrick Stolee

On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 089fa4cedc..35b0be7d40 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -35,6 +35,24 @@ run::
>  TASKS
>  -----
>
> +commit-graph::
> +       The `commit-graph` job updates the `commit-graph` files incrementally,
> +       then verifies that the written data is correct. If the new layer has an
> +       issue, then the chain file is removed and the `commit-graph` is
> +       rewritten from scratch.
> ++
> +The verification only checks the top layer of the `commit-graph` chain.
> +If the incremental write merged the new commits with at least one
> +existing layer, then there is potential for on-disk corruption being
> +carried forward into the new file. This will be noticed and the new
> +commit-graph file will be clean as Git reparses the commit data from
> +the object database.

This reads somewhat awkwardly. I think what you mean is something like
"is there a risk for on-disk corruption? yes, but no: we're clever
enough to detect it and avoid it". So from a user's point of view, I
think this is too detailed.

How about

 The verification checks the top layer of the resulting `commit-graph` chain.
 This ensures that the maintenance task leaves the top layer in a shape
 that matches the object database, even if it was ostensibly constructed
 by "merging in" existing, incorrect layers.

? I don't quite like it -- it's a bit too technical -- but it describes
the end result which the user should care about -- on-disk data is
consistent and correct -- rather than how we got there.

Perhaps:

 It is ensured that the resulting "top layer" is correct. This should
 help avoid most on-disk corruption of the commit-graph and ensure
 that the commit-graph matches the object database.

Don't know whether that's entirely true though...

Food for thought, perhaps.

There's something probabilistic about this whole thing: If a low layer
is corrupt, you might "eventually" get to replace it. I suppose it could
make sense to go "verify the whole thing, drop however many top layers
we need to drop to get only correct layers, then generate a new layer
(and merge and whatnot) on top of that". But the proposed commit message
makes it fairly clear that this would have other drawbacks and that we
don't really expect corrupt layers in the first place.

Martin

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/11] maintenance: add commit-graph task
  2020-08-07 22:29   ` Martin Ågren
@ 2020-08-12 13:30     ` Derrick Stolee
  2020-08-14 12:23       ` Martin Ågren
  0 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee @ 2020-08-12 13:30 UTC (permalink / raw)
  To: Martin Ågren, Derrick Stolee via GitGitGadget
  Cc: Git Mailing List, brian m. carlson, Josh Steadmon,
	Jonathan Nieder, Jeff King,
	Đoàn Trần Công Danh, Phillip Wood,
	Emily Shaffer, sluongng, Jonathan Tan, Derrick Stolee,
	Derrick Stolee

On 8/7/2020 6:29 PM, Martin Ågren wrote:
> On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
>> index 089fa4cedc..35b0be7d40 100644
>> --- a/Documentation/git-maintenance.txt
>> +++ b/Documentation/git-maintenance.txt
>> @@ -35,6 +35,24 @@ run::
>>  TASKS
>>  -----
>>
>> +commit-graph::
>> +       The `commit-graph` job updates the `commit-graph` files incrementally,
>> +       then verifies that the written data is correct. If the new layer has an
>> +       issue, then the chain file is removed and the `commit-graph` is
>> +       rewritten from scratch.
>> ++
>> +The verification only checks the top layer of the `commit-graph` chain.
>> +If the incremental write merged the new commits with at least one
>> +existing layer, then there is potential for on-disk corruption being
>> +carried forward into the new file. This will be noticed and the new
>> +commit-graph file will be clean as Git reparses the commit data from
>> +the object database.
> 
> This reads somewhat awkwardly. I think what you mean is something like
> "is there a risk for on-disk corruption? yes, but no: we're clever
> enough to detect it and avoid it". So from a user's point of view, I
> think this is too detailed.
> 
> How about
> 
>  The verification checks the top layer of the resulting `commit-graph` chain.
>  This ensures that the maintenance task leaves the top layer in a shape
>  that matches the object database, even if it was ostensibly constructed
>  by "merging in" existing, incorrect layers.
> 
> ? I don't quite like it -- it's a bit too technical -- but it describes
> the end result which the user should care about -- on-disk data is
> consistent and correct -- rather than how we got there.
> 
> Perhaps:
> 
>  It is ensured that the resulting "top layer" is correct. This should
>  help avoid most on-disk corruption of the commit-graph and ensure
>  that the commit-graph matches the object database.
> 
> Don't know whether that's entirely true though...

This is my understanding. We focus on the data we just wrote to see
if anything went wrong at a lower level (i.e. filesystem or RAM
corruption during the write process) instead of the data "at rest".

> Food for thought, perhaps.
> 
> There's something probabilistic about this whole thing: If a low layer
> is corrupt, you might "eventually" get to replace it. I suppose it could
> make sense to go "verify the whole thing, drop however many top layers
> we need to drop to get only correct layers, then generate a new layer
> (and merge and whatnot) on top of that". But the proposed commit message
> makes it fairly clear that this would have other drawbacks and that we
> don't really expect corrupt layers in the first place.

Right. And we want to keep the amount of work to be very small in most
cases, amortizing the cost of the big merge operations across many much
smaller runs. Perhaps a later change could introduce an option to drop
the '--shallow' option and check the entire chain, for users willing to
pay that price.

Back to the point of your comments: I'm not sure this second paragraph
is required at all in the documentation. The first paragraph already
says:

	...then verifies that the written data is correct.

This "written data" _is_ the top layer of the chain. There is probably
no reason to dig deeper into _why_ we do this in this user-facing
documentation.

So, I propose just deleting this paragraph. What do you think?

Thanks for keeping a close eye on documentation changes! I updated
my local branch to include the '+' problem from your other reply.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/11] maintenance: create basic maintenance runner
  2020-08-06 15:48 ` [PATCH 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
  2020-08-07 22:16   ` Martin Ågren
@ 2020-08-12 21:03   ` Jonathan Nieder
  2020-08-12 22:07     ` Junio C Hamano
  2020-08-14  1:05     ` Derrick Stolee
  1 sibling, 2 replies; 91+ messages in thread
From: Jonathan Nieder @ 2020-08-12 21:03 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

Derrick Stolee wrote:

> Helped-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  .gitignore                        |  1 +
>  Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
>  builtin.h                         |  1 +
>  builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
>  git.c                             |  1 +
>  t/t7900-maintenance.sh            | 19 +++++++++++
>  t/test-lib-functions.sh           | 33 ++++++++++++++++++
>  7 files changed, 169 insertions(+)
>  create mode 100644 Documentation/git-maintenance.txt
>  create mode 100755 t/t7900-maintenance.sh

Looks reasonable.

[...]
> --- /dev/null
> +++ b/Documentation/git-maintenance.txt
> @@ -0,0 +1,57 @@
> +git-maintenance(1)
> +==================
> +
> +NAME
> +----
> +git-maintenance - Run tasks to optimize Git repository data
> +
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git maintenance' run [<options>]
> +
> +
> +DESCRIPTION
> +-----------
> +Run tasks to optimize Git repository data, speeding up other Git commands
> +and reducing storage requirements for the repository.
> ++
> +Git commands that add repository data, such as `git add` or `git fetch`,
> +are optimized for a responsive user experience. These commands do not take
> +time to optimize the Git data, since such optimizations scale with the full
> +size of the repository while these user commands each perform a relatively
> +small action.
> ++
> +The `git maintenance` command provides flexibility for how to optimize the
> +Git repository.
> +
> +SUBCOMMANDS
> +-----------
> +
> +run::
> +	Run one or more maintenance tasks.

This is still confusing --- shouldn't it say something like "Run the
`gc` maintenance task (see below)"?

[...]
> +TASKS
> +-----
> +
> +gc::
> +	Cleanup unnecessary files and optimize the local repository. "GC"

nit: cleanup is the noun, "clean up" is the verb

> +	stands for "garbage collection," but this task performs many
> +	smaller tasks. This task can be rather expensive for large

nit: the word "rather" is not doing much work here, so we could leave
it out

> +	repositories, as it repacks all Git objects into a single pack-file.
> +	It can also be disruptive in some situations, as it deletes stale
> +	data.

What stale data is this referring to?  Where can I read more about
what disruption it causes (or in other words, as a user, how would I
decide when *not* to run this command)?

[...]
> +OPTIONS
> +-------
> +--auto::
> +	When combined with the `run` subcommand, run maintenance tasks
> +	only if certain thresholds are met. For example, the `gc` task
> +	runs when the number of loose objects exceeds the number stored
> +	in the `gc.auto` config setting, or when the number of pack-files
> +	exceeds the `gc.autoPackLimit` config setting.

Hm, today I learned.  I had thought that --auto doesn't only affect
thresholds but also would affect how aggressive the gc is when
triggered, but the git-gc(1) manpage agrees with what's said above.

Is there a way we can share information between the two to avoid one
falling out of date?  For example, should git-gc.txt point to this
page for the authoritative description?

[...]
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  
>  	return 0;
>  }
> +
> +static const char * const builtin_maintenance_usage[] = {
> +	N_("git maintenance run [<options>]"),

not about this patch: I wish we could automatically generate a usage
string in this simple kind of case, to decrease the burden on
translators.

[...]
> +static int maintenance_task_gc(struct maintenance_opts *opts)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	child.git_cmd = 1;
> +	strvec_push(&child.args, "gc");
> +
> +	if (opts->auto_flag)
> +		strvec_push(&child.args, "--auto");
> +
> +	close_object_store(the_repository->objects);

Interesting --- what does this function call do?

> +	return run_command(&child);
> +}
> +
> +static int maintenance_run(struct maintenance_opts *opts)
> +{
> +	return maintenance_task_gc(opts);
> +}
> +
> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
> +{
> +	static struct maintenance_opts opts;
> +	static struct option builtin_maintenance_options[] = {
> +		OPT_BOOL(0, "auto", &opts.auto_flag,
> +			 N_("run tasks based on the state of the repository")),
> +		OPT_END()
> +	};

optional: these could be local instead of static

> +
> +	memset(&opts, 0, sizeof(opts));

not needed if it's static.  If it's not static, could use `opts = {0}`
where it's declared to do the same thing in a simpler way.

> +
> +	if (argc == 2 && !strcmp(argv[1], "-h"))
> +		usage_with_options(builtin_maintenance_usage,
> +				   builtin_maintenance_options);
> +
> +	argc = parse_options(argc, argv, prefix,
> +			     builtin_maintenance_options,
> +			     builtin_maintenance_usage,
> +			     PARSE_OPT_KEEP_UNKNOWN);
> +
> +	if (argc == 1) {
> +		if (!strcmp(argv[0], "run"))
> +			return maintenance_run(&opts);
> +	}
> +
> +	usage_with_options(builtin_maintenance_usage,
> +			   builtin_maintenance_options);

optional: could reverse this test to get the uninteresting case out of
the way first:

	if (argc != 1)
		usage ...

	...

That would also allow making the usage string when argv[0] is not
"run" more specific.

[...]
> --- /dev/null
> +++ b/t/t7900-maintenance.sh
> @@ -0,0 +1,19 @@
> +#!/bin/sh
> +
> +test_description='git maintenance builtin'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'help text' '
> +	test_expect_code 129 git maintenance -h 2>err &&
> +	test_i18ngrep "usage: git maintenance run" err
> +'
> +
> +test_expect_success 'run [--auto]' '
> +	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
> +	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
> +	test_subcommand git gc <run-no-auto.txt &&
> +	test_subcommand git gc --auto <run-auto.txt

Nice.

[...]
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1561,3 +1561,36 @@ test_path_is_hidden () {
>  	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
>  	return 1
>  }
> +
> +# Check that the given command was invoked as part of the
> +# trace2-format trace on stdin.
> +#
> +#	test_subcommand [!] <command> <args>... < <trace>
> +#
> +# For example, to look for an invocation of "git upload-pack
> +# /path/to/repo"
> +#
> +#	GIT_TRACE2_EVENT=event.log git fetch ... &&
> +#	test_subcommand git upload-pack "$PATH" <event.log
> +#
> +# If the first parameter passed is !, this instead checks that
> +# the given command was not called.
> +#
> +test_subcommand () {
> +	local negate=
> +	if test "$1" = "!"
> +	then
> +		negate=t
> +		shift
> +	fi
> +
> +	local expr=$(printf '"%s",' "$@")
> +	expr="${expr%,}"
> +
> +	if test -n "$negate"
> +	then
> +		! grep "\[$expr\]"
> +	else
> +		grep "\[$expr\]"
> +	fi
> +}

With whatever subset of the changes described above makes sense,

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for your patient work.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/11] maintenance: create basic maintenance runner
  2020-08-12 21:03   ` Jonathan Nieder
@ 2020-08-12 22:07     ` Junio C Hamano
  2020-08-12 22:50       ` Jonathan Nieder
  2020-08-14  1:05     ` Derrick Stolee
  1 sibling, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2020-08-12 22:07 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Derrick Stolee via GitGitGadget, git, sandals, steadmon, peff,
	congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	jonathantanmy, Derrick Stolee, Derrick Stolee

Jonathan Nieder <jrnieder@gmail.com> writes:

>> +static int maintenance_run(struct maintenance_opts *opts)
>> +{
>> +	return maintenance_task_gc(opts);
>> +}
>> +
>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>> +{
>> +	static struct maintenance_opts opts;
>> +	static struct option builtin_maintenance_options[] = {
>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>> +			 N_("run tasks based on the state of the repository")),
>> +		OPT_END()
>> +	};
>
> optional: these could be local instead of static

Do we have preference?  I think it is more common in our codebase to
have these non-static but I do not think we've chosen which is more
preferable with technical reasoning behind it.  As the top-level
function in any callchain, cmd_foo() does not have multiple instances
running at the same time, or it does not have to be prepared to be
called twice, so they can afford to be static, but is there any upside
for them to be static?

>> +
>> +	memset(&opts, 0, sizeof(opts));
>
> not needed if it's static.  If it's not static, could use `opts = {0}`
> where it's declared to do the same thing in a simpler way.

Okay, so that's one upside, albeit a small one.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/11] maintenance: create basic maintenance runner
  2020-08-12 22:07     ` Junio C Hamano
@ 2020-08-12 22:50       ` Jonathan Nieder
  0 siblings, 0 replies; 91+ messages in thread
From: Jonathan Nieder @ 2020-08-12 22:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, sandals, steadmon, peff,
	congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	jonathantanmy, Derrick Stolee, Derrick Stolee

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>>> +static int maintenance_run(struct maintenance_opts *opts)
>>> +{
>>> +	return maintenance_task_gc(opts);
>>> +}
>>> +
>>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>>> +{
>>> +	static struct maintenance_opts opts;
>>> +	static struct option builtin_maintenance_options[] = {
>>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>>> +			 N_("run tasks based on the state of the repository")),
>>> +		OPT_END()
>>> +	};
>>
>> optional: these could be local instead of static
>
> Do we have preference?  I think it is more common in our codebase to
> have these non-static but I do not think we've chosen which is more
> preferable with technical reasoning behind it.  As the top-level
> function in any callchain, cmd_foo() does not have multiple instances
> running at the same time, or it does not have to be prepared to be
> called twice, so they can afford to be static, but is there any upside
> for them to be static?

Code size, execution time:

- benefit of static: can rely on automatic zero-initialization of
  pages instead of spending cycles and text size on explicit
  zero-initialization

- benefit of local: avoids wasting address space in bss when the
  function isn't called

Neither of those seems important enough to care about. :)

Maintainability:

- benefit of static: address is determined at compile time, so can build
  using C89 compilers that require struct initializers to be constant
  (but Git already cannot be built with such compilers)

- benefit of local: less likely to accidentally move the static var into
  a function that needs to be reentrant

- benefit of local: allows readers and reviewers to build the habit of
  seeing non-const "static" declarations within a function as the
  unusual case.  When a "static" is declared in a function body, this
  means we are caching something or have some other performance reason
  to sacrifice reentrancy

It's mostly that last aspect (saving readers from having to think
about it) that motivates my suggestion above.

Older Git code used static more because it was needed when building
using C89 compilers.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 01/11] maintenance: create basic maintenance runner
  2020-08-12 21:03   ` Jonathan Nieder
  2020-08-12 22:07     ` Junio C Hamano
@ 2020-08-14  1:05     ` Derrick Stolee
  1 sibling, 0 replies; 91+ messages in thread
From: Derrick Stolee @ 2020-08-14  1:05 UTC (permalink / raw)
  To: Jonathan Nieder, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

On 8/12/2020 5:03 PM, Jonathan Nieder wrote:
> Derrick Stolee wrote:
> 
>> Helped-by: Jonathan Nieder <jrnieder@gmail.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  .gitignore                        |  1 +
>>  Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
>>  builtin.h                         |  1 +
>>  builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
>>  git.c                             |  1 +
>>  t/t7900-maintenance.sh            | 19 +++++++++++
>>  t/test-lib-functions.sh           | 33 ++++++++++++++++++
>>  7 files changed, 169 insertions(+)
>>  create mode 100644 Documentation/git-maintenance.txt
>>  create mode 100755 t/t7900-maintenance.sh
> 
> Looks reasonable.
> 
> [...]
>> --- /dev/null
>> +++ b/Documentation/git-maintenance.txt
>> @@ -0,0 +1,57 @@
>> +git-maintenance(1)
>> +==================
>> +
>> +NAME
>> +----
>> +git-maintenance - Run tasks to optimize Git repository data
>> +
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git maintenance' run [<options>]
>> +
>> +
>> +DESCRIPTION
>> +-----------
>> +Run tasks to optimize Git repository data, speeding up other Git commands
>> +and reducing storage requirements for the repository.
>> ++
>> +Git commands that add repository data, such as `git add` or `git fetch`,
>> +are optimized for a responsive user experience. These commands do not take
>> +time to optimize the Git data, since such optimizations scale with the full
>> +size of the repository while these user commands each perform a relatively
>> +small action.
>> ++
>> +The `git maintenance` command provides flexibility for how to optimize the
>> +Git repository.
>> +
>> +SUBCOMMANDS
>> +-----------
>> +
>> +run::
>> +	Run one or more maintenance tasks.
> 
> This is still confusing --- shouldn't it say something like "Run the
> `gc` maintenance task (see below)"?

As I mentioned in the earlier version, I disagree with this.

> [...]
>> +TASKS
>> +-----
>> +
>> +gc::
>> +	Cleanup unnecessary files and optimize the local repository. "GC"
> 
> nit: cleanup is the noun, "clean up" is the verb
> 
>> +	stands for "garbage collection," but this task performs many
>> +	smaller tasks. This task can be rather expensive for large
> 
> nit: the word "rather" is not doing much work here, so we could leave
> it out

Both good.

> 
>> +	repositories, as it repacks all Git objects into a single pack-file.
>> +	It can also be disruptive in some situations, as it deletes stale
>> +	data.
> 
> What stale data is this referring to?  Where can I read more about
> what disruption it causes (or in other words, as a user, how would I
> decide when *not* to run this command)?

For this and the next comment, I prefer (for now) to keep all of the
deep details of the 'gc' task in the git-gc.txt documentation. However,
I should use "linkgit:git-gc[1]" to point users to that for reference.

Eventually, the maintenance builtin might replace the gc builtin, but
only after all of its behavior has been extracted into smaller tasks.
Those tasks would be documented in detail here, allowing us to make the
blurb for the 'gc' task be "Run the 'prune-reflog', 'pack-refs', ...,
and 'full-repack' tasks."

> [...]
>> +OPTIONS
>> +-------
>> +--auto::
>> +	When combined with the `run` subcommand, run maintenance tasks
>> +	only if certain thresholds are met. For example, the `gc` task
>> +	runs when the number of loose objects exceeds the number stored
>> +	in the `gc.auto` config setting, or when the number of pack-files
>> +	exceeds the `gc.autoPackLimit` config setting.
> 
> Hm, today I learned.  I had thought that --auto doesn't only affect
> thresholds but also would affect how aggressive the gc is when
> triggered, but the git-gc(1) manpage agrees with what's said above.
> 
> Is there a way we can share information between the two to avoid one
> falling out of date?  For example, should git-gc.txt point to this
> page for the authoritative description?

I remember Junio mentioning something about how 'gc' interprets '--auto'
as a "quick" mode, but like you I cannot find any reference to that in
the docs. Since it is not documented there and none of the other tasks
I am implementing here use that interpretation, I'll plan to leave this
portion as-is.
 
> [...]
>> --- a/builtin/gc.c
>> +++ b/builtin/gc.c
>> @@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>>  
>>  	return 0;
>>  }
>> +
>> +static const char * const builtin_maintenance_usage[] = {
>> +	N_("git maintenance run [<options>]"),
> 
> not about this patch: I wish we could automatically generate a usage
> string in this simple kind of case, to decrease the burden on
> translators.
> 
> [...]
>> +static int maintenance_task_gc(struct maintenance_opts *opts)
>> +{
>> +	struct child_process child = CHILD_PROCESS_INIT;
>> +
>> +	child.git_cmd = 1;
>> +	strvec_push(&child.args, "gc");
>> +
>> +	if (opts->auto_flag)
>> +		strvec_push(&child.args, "--auto");
>> +
>> +	close_object_store(the_repository->objects);
> 
> Interesting --- what does this function call do?

We need to close our file handles on the pack-files (and
commit-graph and multi-pack-index) or else the GC subcommand
cannot delete the files on Windows.

>> +	return run_command(&child);
>> +}
>> +
>> +static int maintenance_run(struct maintenance_opts *opts)
>> +{
>> +	return maintenance_task_gc(opts);
>> +}
>> +
>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>> +{
>> +	static struct maintenance_opts opts;
>> +	static struct option builtin_maintenance_options[] = {
>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>> +			 N_("run tasks based on the state of the repository")),
>> +		OPT_END()
>> +	};
> 
> optional: these could be local instead of static

I see my problem here. The builtin_maintenance_options is static
(copied from another builtin, I'm sure) which is why the 'opts'
struct needs to be static or else this doesn't compile.

>> +
>> +	memset(&opts, 0, sizeof(opts));
> 
> not needed if it's static.  If it's not static, could use `opts = {0}`
> where it's declared to do the same thing in a simpler way.
> 
>> +
>> +	if (argc == 2 && !strcmp(argv[1], "-h"))
>> +		usage_with_options(builtin_maintenance_usage,
>> +				   builtin_maintenance_options);
>> +
>> +	argc = parse_options(argc, argv, prefix,
>> +			     builtin_maintenance_options,
>> +			     builtin_maintenance_usage,
>> +			     PARSE_OPT_KEEP_UNKNOWN);
>> +
>> +	if (argc == 1) {
>> +		if (!strcmp(argv[0], "run"))
>> +			return maintenance_run(&opts);
>> +	}
>> +
>> +	usage_with_options(builtin_maintenance_usage,
>> +			   builtin_maintenance_options);
> 
> optional: could reverse this test to get the uninteresting case out of
> the way first:
> 
> 	if (argc != 1)
> 		usage ...
> 
> 	...
> 
> That would also allow making the usage string when argv[0] is not
> "run" more specific.

Sure. Were you thinking of anything more specific than

	die(_("invalid subcommand: %s"), argv[0]);

? Of course, running "git maintenance barf" would result in

	fatal: invalid subcommand: barf

with an exit code of 128 instead of 129 like the usage string does,
so I'm not sure of the best way to make a custom "usage" error.

> [...]
>> --- /dev/null
>> +++ b/t/t7900-maintenance.sh
>> @@ -0,0 +1,19 @@
>> +#!/bin/sh
>> +
>> +test_description='git maintenance builtin'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'help text' '
>> +	test_expect_code 129 git maintenance -h 2>err &&
>> +	test_i18ngrep "usage: git maintenance run" err
>> +'
>> +
>> +test_expect_success 'run [--auto]' '
>> +	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
>> +	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
>> +	test_subcommand git gc <run-no-auto.txt &&
>> +	test_subcommand git gc --auto <run-auto.txt
> 
> Nice.
> 
> [...]
>> --- a/t/test-lib-functions.sh
>> +++ b/t/test-lib-functions.sh
>> @@ -1561,3 +1561,36 @@ test_path_is_hidden () {
>>  	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
>>  	return 1
>>  }
>> +
>> +# Check that the given command was invoked as part of the
>> +# trace2-format trace on stdin.
>> +#
>> +#	test_subcommand [!] <command> <args>... < <trace>
>> +#
>> +# For example, to look for an invocation of "git upload-pack
>> +# /path/to/repo"
>> +#
>> +#	GIT_TRACE2_EVENT=event.log git fetch ... &&
>> +#	test_subcommand git upload-pack "$PATH" <event.log
>> +#
>> +# If the first parameter passed is !, this instead checks that
>> +# the given command was not called.
>> +#
>> +test_subcommand () {
>> +	local negate=
>> +	if test "$1" = "!"
>> +	then
>> +		negate=t
>> +		shift
>> +	fi
>> +
>> +	local expr=$(printf '"%s",' "$@")
>> +	expr="${expr%,}"
>> +
>> +	if test -n "$negate"
>> +	then
>> +		! grep "\[$expr\]"
>> +	else
>> +		grep "\[$expr\]"
>> +	fi
>> +}
> 
> With whatever subset of the changes described above makes sense,
> 
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
> 
> Thanks for your patient work.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH 05/11] maintenance: add commit-graph task
  2020-08-12 13:30     ` Derrick Stolee
@ 2020-08-14 12:23       ` Martin Ågren
  0 siblings, 0 replies; 91+ messages in thread
From: Martin Ågren @ 2020-08-14 12:23 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, Git Mailing List,
	brian m. carlson, Josh Steadmon, Jonathan Nieder, Jeff King,
	Đoàn Trần Công Danh, Phillip Wood,
	Emily Shaffer, sluongng, Jonathan Tan, Derrick Stolee,
	Derrick Stolee

On Wed, 12 Aug 2020 at 15:30, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/7/2020 6:29 PM, Martin Ågren wrote:
> > On Thu, 6 Aug 2020 at 18:50, Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> >> index 089fa4cedc..35b0be7d40 100644
> >> --- a/Documentation/git-maintenance.txt
> >> +++ b/Documentation/git-maintenance.txt
> >> @@ -35,6 +35,24 @@ run::
> >>  TASKS
> >>  -----
> >>
> >> +commit-graph::
> >> +       The `commit-graph` job updates the `commit-graph` files incrementally,
> >> +       then verifies that the written data is correct. If the new layer has an
> >> +       issue, then the chain file is removed and the `commit-graph` is
> >> +       rewritten from scratch.
> >> ++
> >> +The verification only checks the top layer of the `commit-graph` chain.
> >> +If the incremental write merged the new commits with at least one
> >> +existing layer, then there is potential for on-disk corruption being
> >> +carried forward into the new file. This will be noticed and the new
> >> +commit-graph file will be clean as Git reparses the commit data from
> >> +the object database.
> >
> > This reads somewhat awkwardly. I think what you mean is something like
> > "is there a risk for on-disk corruption? yes, but no: we're clever
> > enough to detect it and avoid it". So from a user's point of view, I
> > think this is too detailed.

[snip quite a bit]

> Back to the point of your comments: I'm not sure this second paragraph
> is required at all in the documentation. The first paragraph already
> says:
>
>         ...then verifies that the written data is correct.
>
> This "written data" _is_ the top layer of the chain. There is probably
> no reason to dig deeper into _why_ we do this in this user-facing
> documentation.
>
> So, I propose just deleting this paragraph. What do you think?

Yeah, that makes lots of sense. Thanks!


Martin

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2020-08-06 15:48 ` [PATCH 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
@ 2020-08-18 14:22 ` Derrick Stolee via GitGitGadget
  2020-08-18 14:22   ` [PATCH v2 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
                     ` (12 more replies)
  11 siblings, 13 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:22 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee

This series is based on jk/strvec.

This patch series contains 11patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rererelogs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
follow in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task= 
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions for which the
tasks run during the '--auto' option.

 Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no- [quiet%7Cprogress] 
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  14 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 ++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 354 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  63 +++++
 t/test-lib-functions.sh              |  33 +++
 23 files changed, 580 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: d70a9eb611a9d242c1d26847d223b8677609305b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v1:

  1:  2b9deb6d6a !  1:  e09e4a4a87 maintenance: create basic maintenance runner
     @@ Documentation/git-maintenance.txt (new)
      +-----------
      +Run tasks to optimize Git repository data, speeding up other Git commands
      +and reducing storage requirements for the repository.
     -++
     ++
      +Git commands that add repository data, such as `git add` or `git fetch`,
      +are optimized for a responsive user experience. These commands do not take
      +time to optimize the Git data, since such optimizations scale with the full
      +size of the repository while these user commands each perform a relatively
      +small action.
     -++
     ++
      +The `git maintenance` command provides flexibility for how to optimize the
      +Git repository.
      +
     @@ Documentation/git-maintenance.txt (new)
      +-----
      +
      +gc::
     -+	Cleanup unnecessary files and optimize the local repository. "GC"
     ++	Clean up unnecessary files and optimize the local repository. "GC"
      +	stands for "garbage collection," but this task performs many
     -+	smaller tasks. This task can be rather expensive for large
     -+	repositories, as it repacks all Git objects into a single pack-file.
     -+	It can also be disruptive in some situations, as it deletes stale
     -+	data.
     ++	smaller tasks. This task can be expensive for large repositories,
     ++	as it repacks all Git objects into a single pack-file. It can also
     ++	be disruptive in some situations, as it deletes stale data. See
     ++	linkgit:git-gc[1] for more details on garbage collection in Git.
      +
      +OPTIONS
      +-------
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +
      +int cmd_maintenance(int argc, const char **argv, const char *prefix)
      +{
     -+	static struct maintenance_opts opts;
     -+	static struct option builtin_maintenance_options[] = {
     ++	struct maintenance_opts opts;
     ++	struct option builtin_maintenance_options[] = {
      +		OPT_BOOL(0, "auto", &opts.auto_flag,
      +			 N_("run tasks based on the state of the repository")),
      +		OPT_END()
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +			     builtin_maintenance_usage,
      +			     PARSE_OPT_KEEP_UNKNOWN);
      +
     -+	if (argc == 1) {
     -+		if (!strcmp(argv[0], "run"))
     -+			return maintenance_run(&opts);
     -+	}
     ++	if (argc != 1)
     ++		usage_with_options(builtin_maintenance_usage,
     ++				   builtin_maintenance_options);
     ++
     ++	if (!strcmp(argv[0], "run"))
     ++		return maintenance_run(&opts);
      +
     -+	usage_with_options(builtin_maintenance_usage,
     -+			   builtin_maintenance_options);
     ++	die(_("invalid subcommand: %s"), argv[0]);
      +}
      
       ## git.c ##
     @@ t/t7900-maintenance.sh (new)
      +
      +test_expect_success 'help text' '
      +	test_expect_code 129 git maintenance -h 2>err &&
     -+	test_i18ngrep "usage: git maintenance run" err
     ++	test_i18ngrep "usage: git maintenance run" err &&
     ++	test_expect_code 128 git maintenance barf 2>err &&
     ++	test_i18ngrep "invalid subcommand: barf" err
      +'
      +
      +test_expect_success 'run [--auto]' '
  2:  d5faef26af !  2:  adae48d235 maintenance: add --quiet option
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
       	close_object_store(the_repository->objects);
       	return run_command(&child);
      @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 	static struct option builtin_maintenance_options[] = {
     + 	struct option builtin_maintenance_options[] = {
       		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
      +		OPT_BOOL(0, "quiet", &opts.quiet,
     @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefi
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'help text' '
     - 	test_i18ngrep "usage: git maintenance run" err
     + 	test_i18ngrep "invalid subcommand: barf" err
       '
       
      -test_expect_success 'run [--auto]' '
  3:  233811310b =  3:  91741a0cfc maintenance: replace run_auto_gc()
  4:  7efa23abc8 =  4:  1db3b96280 maintenance: initialize task array
  5:  902b742032 !  5:  50b457fd57 maintenance: add commit-graph task
     @@ Documentation/git-maintenance.txt: run::
      +	issue, then the chain file is removed and the `commit-graph` is
      +	rewritten from scratch.
      ++
     -+The verification only checks the top layer of the `commit-graph` chain.
     -+If the incremental write merged the new commits with at least one
     -+existing layer, then there is potential for on-disk corruption being
     -+carried forward into the new file. This will be noticed and the new
     -+commit-graph file will be clean as Git reparses the commit data from
     -+the object database.
     -++
      +The incremental write is safe to run alongside concurrent Git processes
      +since it will not expire `.graph` files that were in the previous
      +`commit-graph-chain` file. They will be deleted by a later run based on
      +the expiration delay.
      +
       gc::
     - 	Cleanup unnecessary files and optimize the local repository. "GC"
     + 	Clean up unnecessary files and optimize the local repository. "GC"
       	stands for "garbage collection," but this task performs many
      
       ## builtin/gc.c ##
     @@ t/t7900-maintenance.sh: test_description='git maintenance builtin'
      +
       test_expect_success 'help text' '
       	test_expect_code 129 git maintenance -h 2>err &&
     - 	test_i18ngrep "usage: git maintenance run" err
     + 	test_i18ngrep "usage: git maintenance run" err &&
  6:  dddbcc4f3d !  6:  85268bd53e maintenance: add --task option
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +
       int cmd_maintenance(int argc, const char **argv, const char *prefix)
       {
     - 	static struct maintenance_opts opts;
     + 	struct maintenance_opts opts;
      @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
       			 N_("run tasks based on the state of the repository")),
       		OPT_BOOL(0, "quiet", &opts.quiet,
  7:  79af39be13 =  7:  6f86cfaa94 maintenance: take a lock on the objects directory
  8:  69bfc6a4b2 =  8:  5c0f9d69d1 maintenance: create maintenance.<task>.enabled config
  9:  df21bbb000 =  9:  68bf5bef4b maintenance: use pointers to check --auto
 10:  e67e259aef = 10:  fc097c389a maintenance: add auto condition for commit-graph task
 11:  a5d1914846 = 11:  46fbe161aa maintenance: add trace2 regions for task execution

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 01/11] maintenance: create basic maintenance runner
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
@ 2020-08-18 14:22   ` Derrick Stolee via GitGitGadget
  2020-08-18 14:22   ` [PATCH v2 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:22 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                        |  1 +
 Documentation/git-maintenance.txt | 57 ++++++++++++++++++++++++++++++
 builtin.h                         |  1 +
 builtin/gc.c                      | 58 +++++++++++++++++++++++++++++++
 git.c                             |  1 +
 t/t7900-maintenance.sh            | 21 +++++++++++
 t/test-lib-functions.sh           | 33 ++++++++++++++++++
 7 files changed, 172 insertions(+)
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..a5808fa30d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -90,6 +90,7 @@
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
new file mode 100644
index 0000000000..ff47fb3641
--- /dev/null
+++ b/Documentation/git-maintenance.txt
@@ -0,0 +1,57 @@
+git-maintenance(1)
+==================
+
+NAME
+----
+git-maintenance - Run tasks to optimize Git repository data
+
+
+SYNOPSIS
+--------
+[verse]
+'git maintenance' run [<options>]
+
+
+DESCRIPTION
+-----------
+Run tasks to optimize Git repository data, speeding up other Git commands
+and reducing storage requirements for the repository.
+
+Git commands that add repository data, such as `git add` or `git fetch`,
+are optimized for a responsive user experience. These commands do not take
+time to optimize the Git data, since such optimizations scale with the full
+size of the repository while these user commands each perform a relatively
+small action.
+
+The `git maintenance` command provides flexibility for how to optimize the
+Git repository.
+
+SUBCOMMANDS
+-----------
+
+run::
+	Run one or more maintenance tasks.
+
+TASKS
+-----
+
+gc::
+	Clean up unnecessary files and optimize the local repository. "GC"
+	stands for "garbage collection," but this task performs many
+	smaller tasks. This task can be expensive for large repositories,
+	as it repacks all Git objects into a single pack-file. It can also
+	be disruptive in some situations, as it deletes stale data. See
+	linkgit:git-gc[1] for more details on garbage collection in Git.
+
+OPTIONS
+-------
+--auto::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain thresholds are met. For example, the `gc` task
+	runs when the number of loose objects exceeds the number stored
+	in the `gc.auto` config setting, or when the number of pack-files
+	exceeds the `gc.autoPackLimit` config setting.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..17c1c0ce49 100644
--- a/builtin.h
+++ b/builtin.h
@@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gc.c b/builtin/gc.c
index aafa0946f5..ce91c754ee 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -699,3 +699,61 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	return 0;
 }
+
+static const char * const builtin_maintenance_usage[] = {
+	N_("git maintenance run [<options>]"),
+	NULL
+};
+
+struct maintenance_opts {
+	int auto_flag;
+};
+
+static int maintenance_task_gc(struct maintenance_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_push(&child.args, "gc");
+
+	if (opts->auto_flag)
+		strvec_push(&child.args, "--auto");
+
+	close_object_store(the_repository->objects);
+	return run_command(&child);
+}
+
+static int maintenance_run(struct maintenance_opts *opts)
+{
+	return maintenance_task_gc(opts);
+}
+
+int cmd_maintenance(int argc, const char **argv, const char *prefix)
+{
+	struct maintenance_opts opts;
+	struct option builtin_maintenance_options[] = {
+		OPT_BOOL(0, "auto", &opts.auto_flag,
+			 N_("run tasks based on the state of the repository")),
+		OPT_END()
+	};
+
+	memset(&opts, 0, sizeof(opts));
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_maintenance_usage,
+				   builtin_maintenance_options);
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_maintenance_options,
+			     builtin_maintenance_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+
+	if (argc != 1)
+		usage_with_options(builtin_maintenance_usage,
+				   builtin_maintenance_options);
+
+	if (!strcmp(argv[0], "run"))
+		return maintenance_run(&opts);
+
+	die(_("invalid subcommand: %s"), argv[0]);
+}
diff --git a/git.c b/git.c
index 8bd1d7551d..24f250d29a 100644
--- a/git.c
+++ b/git.c
@@ -529,6 +529,7 @@ static struct cmd_struct commands[] = {
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
+	{ "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE },
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
new file mode 100755
index 0000000000..14aa691f19
--- /dev/null
+++ b/t/t7900-maintenance.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+test_description='git maintenance builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'help text' '
+	test_expect_code 129 git maintenance -h 2>err &&
+	test_i18ngrep "usage: git maintenance run" err &&
+	test_expect_code 128 git maintenance barf 2>err &&
+	test_i18ngrep "invalid subcommand: barf" err
+'
+
+test_expect_success 'run [--auto]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
+	test_subcommand git gc <run-no-auto.txt &&
+	test_subcommand git gc --auto <run-auto.txt
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 3103be8a32..0adf2b85f8 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1561,3 +1561,36 @@ test_path_is_hidden () {
 	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
 	return 1
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_subcommand [!] <command> <args>... < <trace>
+#
+# For example, to look for an invocation of "git upload-pack
+# /path/to/repo"
+#
+#	GIT_TRACE2_EVENT=event.log git fetch ... &&
+#	test_subcommand git upload-pack "$PATH" <event.log
+#
+# If the first parameter passed is !, this instead checks that
+# the given command was not called.
+#
+test_subcommand () {
+	local negate=
+	if test "$1" = "!"
+	then
+		negate=t
+		shift
+	fi
+
+	local expr=$(printf '"%s",' "$@")
+	expr="${expr%,}"
+
+	if test -n "$negate"
+	then
+		! grep "\[$expr\]"
+	else
+		grep "\[$expr\]"
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 02/11] maintenance: add --quiet option
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-08-18 14:22   ` [PATCH v2 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
@ 2020-08-18 14:22   ` Derrick Stolee via GitGitGadget
  2020-08-18 14:23   ` [PATCH v2 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:22 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  3 +++
 builtin/gc.c                      |  9 +++++++++
 t/t7900-maintenance.sh            | 15 ++++++++++-----
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index ff47fb3641..04fa0fe329 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -52,6 +52,9 @@ OPTIONS
 	in the `gc.auto` config setting, or when the number of pack-files
 	exceeds the `gc.autoPackLimit` config setting.
 
+--quiet::
+	Do not report progress or other information over `stderr`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index ce91c754ee..b75ddb780b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -707,6 +707,7 @@ static const char * const builtin_maintenance_usage[] = {
 
 struct maintenance_opts {
 	int auto_flag;
+	int quiet;
 };
 
 static int maintenance_task_gc(struct maintenance_opts *opts)
@@ -718,6 +719,10 @@ static int maintenance_task_gc(struct maintenance_opts *opts)
 
 	if (opts->auto_flag)
 		strvec_push(&child.args, "--auto");
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	else
+		strvec_push(&child.args, "--no-quiet");
 
 	close_object_store(the_repository->objects);
 	return run_command(&child);
@@ -734,6 +739,8 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "quiet", &opts.quiet,
+			 N_("do not report progress or other information over stderr")),
 		OPT_END()
 	};
 
@@ -743,6 +750,8 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 		usage_with_options(builtin_maintenance_usage,
 				   builtin_maintenance_options);
 
+	opts.quiet = !isatty(2);
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_options,
 			     builtin_maintenance_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 14aa691f19..47d512464c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -11,11 +11,16 @@ test_expect_success 'help text' '
 	test_i18ngrep "invalid subcommand: barf" err
 '
 
-test_expect_success 'run [--auto]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
-	test_subcommand git gc <run-no-auto.txt &&
-	test_subcommand git gc --auto <run-auto.txt
+test_expect_success 'run [--auto|--quiet]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
+		git maintenance run 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
+		git maintenance run --auto 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
+		git maintenance run --no-quiet 2>/dev/null &&
+	test_subcommand git gc --quiet <run-no-auto.txt &&
+	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 03/11] maintenance: replace run_auto_gc()
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-08-18 14:22   ` [PATCH v2 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
  2020-08-18 14:22   ` [PATCH v2 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-18 14:23   ` [PATCH v2 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/fetch-options.txt |  6 ++++--
 Documentation/git-clone.txt     |  6 +++---
 builtin/am.c                    |  2 +-
 builtin/commit.c                |  2 +-
 builtin/fetch.c                 |  6 ++++--
 builtin/merge.c                 |  2 +-
 builtin/rebase.c                |  4 ++--
 run-command.c                   | 16 +++++++---------
 run-command.h                   |  2 +-
 t/t5510-fetch.sh                |  2 +-
 10 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 6e2a160a47..495bc8ab5a 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -86,9 +86,11 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-maintenance::
 --[no-]auto-gc::
-	Run `git gc --auto` at the end to perform garbage collection
-	if needed. This is enabled by default.
+	Run `git maintenance run --auto` at the end to perform automatic
+	repository maintenance if needed. (`--[no-]auto-gc` is a synonym.)
+	This is enabled by default.
 
 --[no-]write-commit-graph::
 	Write a commit-graph after fetching. This overrides the config
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index c898310099..097e6a86c5 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -78,9 +78,9 @@ repository using this option and then delete branches (or use any
 other Git command that makes any existing commit unreferenced) in the
 source repository, some objects may become unreferenced (or dangling).
 These objects may be removed by normal Git operations (such as `git commit`)
-which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-If these objects are removed and were referenced by the cloned repository,
-then the cloned repository will become corrupt.
+which automatically call `git maintenance run --auto`. (See
+linkgit:git-maintenance[1].) If these objects are removed and were referenced
+by the cloned repository, then the cloned repository will become corrupt.
 +
 Note that running `git repack` without the `--local` option in a repository
 cloned with `--shared` will copy objects from the source repository into a pack
diff --git a/builtin/am.c b/builtin/am.c
index dd4e6c2d9b..68e9d17379 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
 	if (!state->rebasing) {
 		am_destroy(state);
 		close_object_store(the_repository->objects);
-		run_auto_gc(state->quiet);
+		run_auto_maintenance(state->quiet);
 	}
 }
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 69ac78d5e5..f9b0a0c05d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_gc(quiet);
+	run_auto_maintenance(quiet);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c49f0e9752..c8b9366d3c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -196,8 +196,10 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-maintenance", &enable_auto_gc,
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
-		 N_("run 'gc --auto' after fetching")),
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
@@ -1882,7 +1884,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc)
-		run_auto_gc(verbosity < 0);
+		run_auto_maintenance(verbosity < 0);
 
 	return result;
 }
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55..c068e73037 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -457,7 +457,7 @@ static void finish(struct commit *head_commit,
 			 * user should see them.
 			 */
 			close_object_store(the_repository->objects);
-			run_auto_gc(verbosity < 0);
+			run_auto_maintenance(verbosity < 0);
 		}
 	}
 	if (new_head && show_diffstat) {
diff --git a/builtin/rebase.c b/builtin/rebase.c
index dadb52fa92..9ffba9009d 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -728,10 +728,10 @@ static int finish_rebase(struct rebase_options *opts)
 	apply_autostash(state_dir_path("autostash", opts));
 	close_object_store(the_repository->objects);
 	/*
-	 * We ignore errors in 'gc --auto', since the
+	 * We ignore errors in 'git maintenance run --auto', since the
 	 * user should see them.
 	 */
-	run_auto_gc(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
+	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
 	if (opts->type == REBASE_MERGE) {
 		struct replay_opts replay = REPLAY_OPTS_INIT;
 
diff --git a/run-command.c b/run-command.c
index cc9c3296ba..2ee59acdc8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1866,15 +1866,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 	return result;
 }
 
-int run_auto_gc(int quiet)
+int run_auto_maintenance(int quiet)
 {
-	struct strvec argv_gc_auto = STRVEC_INIT;
-	int status;
+	struct child_process maint = CHILD_PROCESS_INIT;
 
-	strvec_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (quiet)
-		strvec_push(&argv_gc_auto, "--quiet");
-	status = run_command_v_opt(argv_gc_auto.v, RUN_GIT_CMD);
-	strvec_clear(&argv_gc_auto);
-	return status;
+	maint.git_cmd = 1;
+	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
+	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
+
+	return run_command(&maint);
 }
diff --git a/run-command.h b/run-command.h
index 8b9bfaef16..6472b38bde 100644
--- a/run-command.h
+++ b/run-command.h
@@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args);
 /*
  * Trigger an auto-gc
  */
-int run_auto_gc(int quiet);
+int run_auto_maintenance(int quiet);
 
 #define RUN_COMMAND_NO_STDIN 1
 #define RUN_GIT_CMD	     2	/*If this is to be git sub-command */
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index a66dbe0bde..9850ecde5d 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -919,7 +919,7 @@ test_expect_success 'fetching with auto-gc does not lock up' '
 		git config fetch.unpackLimit 1 &&
 		git config gc.autoPackLimit 1 &&
 		git config gc.autoDetach false &&
-		GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 &&
+		GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 &&
 		test_i18ngrep "Auto packing the repository" fetch.out &&
 		! grep "Should I try again" fetch.out
 	)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 04/11] maintenance: initialize task array
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-18 23:46     ` Jonathan Tan
  2020-08-18 14:23   ` [PATCH v2 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.

The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.

A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.

The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will send an error message.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index b75ddb780b..946d871d54 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -728,9 +728,45 @@ static int maintenance_task_gc(struct maintenance_opts *opts)
 	return run_command(&child);
 }
 
+typedef int maintenance_task_fn(struct maintenance_opts *opts);
+
+struct maintenance_task {
+	const char *name;
+	maintenance_task_fn *fn;
+	unsigned enabled:1;
+};
+
+enum maintenance_task_label {
+	TASK_GC,
+
+	/* Leave as final value */
+	TASK__COUNT
+};
+
+static struct maintenance_task tasks[] = {
+	[TASK_GC] = {
+		"gc",
+		maintenance_task_gc,
+		1,
+	},
+};
+
 static int maintenance_run(struct maintenance_opts *opts)
 {
-	return maintenance_task_gc(opts);
+	int i;
+	int result = 0;
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (!tasks[i].enabled)
+			continue;
+
+		if (tasks[i].fn(opts)) {
+			error(_("task '%s' failed"), tasks[i].name);
+			result = 1;
+		}
+	}
+
+	return result;
 }
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 05/11] maintenance: add commit-graph task
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-18 23:51     ` Jonathan Tan
  2020-08-18 14:23   ` [PATCH v2 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The first new task in the 'git maintenance' builtin is the
'commit-graph' job. It is based on the sequence of events in the
'commit-graph' job in Scalar [1]. This sequence is as follows:

1. git commit-graph write --reachable --split
2. git commit-graph verify --shallow
3. If the verify succeeds, stop.
4. Delete the commit-graph-chain file.
5. git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

By using 'git commit-graph verify --shallow' we can ensure that
the file we just wrote is valid. This is an extra safety precaution
that is faster than our 'write' subcommand. In the rare situation
that the newest layer of the commit-graph is corrupt, we can "fix"
the corruption by deleting the commit-graph-chain file and rewrite
the full commit-graph as a new one-layer commit graph. This does
not completely prevent _that_ file from being corrupt, but it does
recompute the commit-graph by parsing commits from the object
database. In our use of this step in Scalar and VFS for Git, we
have only seen this issue arise because our microsoft/git fork
reverted 43d3561 ("commit-graph write: don't die if the existing
graph is corrupt" 2019-03-25) for a while to keep commit-graph
writes very fast. We dropped the revert when updating to v2.23.0.
The verify still has potential for catching corrupt data across
the layer boundary: if the new file has commit X with parent Y
in an old file but the commit ID for Y in the old file had a
bitswap, then we will notice that in the 'verify' command.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/CommitGraphStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt | 11 ++++++
 builtin/gc.c                      | 63 +++++++++++++++++++++++++++++++
 commit-graph.c                    |  8 ++--
 commit-graph.h                    |  1 +
 t/t7900-maintenance.sh            |  2 +
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 04fa0fe329..c816fa1dcd 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -35,6 +35,17 @@ run::
 TASKS
 -----
 
+commit-graph::
+	The `commit-graph` job updates the `commit-graph` files incrementally,
+	then verifies that the written data is correct. If the new layer has an
+	issue, then the chain file is removed and the `commit-graph` is
+	rewritten from scratch.
++
+The incremental write is safe to run alongside concurrent Git processes
+since it will not expire `.graph` files that were in the previous
+`commit-graph-chain` file. They will be deleted by a later run based on
+the expiration delay.
+
 gc::
 	Clean up unnecessary files and optimize the local repository. "GC"
 	stands for "garbage collection," but this task performs many
diff --git a/builtin/gc.c b/builtin/gc.c
index 946d871d54..4f9352b9d0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -710,6 +710,64 @@ struct maintenance_opts {
 	int quiet;
 };
 
+static int run_write_commit_graph(struct maintenance_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "write",
+		     "--split", "--reachable", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int run_verify_commit_graph(struct maintenance_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "verify",
+		     "--shallow", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_commit_graph(struct maintenance_opts *opts)
+{
+	struct repository *r = the_repository;
+	char *chain_path;
+
+	close_object_store(r->objects);
+	if (run_write_commit_graph(opts)) {
+		error(_("failed to write commit-graph"));
+		return 1;
+	}
+
+	if (!run_verify_commit_graph(opts))
+		return 0;
+
+	warning(_("commit-graph verify caught error, rewriting"));
+
+	chain_path = get_commit_graph_chain_filename(r->objects->odb);
+	if (unlink(chain_path)) {
+		UNLEAK(chain_path);
+		die(_("failed to remove commit-graph at %s"), chain_path);
+	}
+	free(chain_path);
+
+	if (!run_write_commit_graph(opts))
+		return 0;
+
+	error(_("failed to rewrite commit-graph"));
+	return 1;
+}
+
 static int maintenance_task_gc(struct maintenance_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -738,6 +796,7 @@ struct maintenance_task {
 
 enum maintenance_task_label {
 	TASK_GC,
+	TASK_COMMIT_GRAPH,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -749,6 +808,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_gc,
 		1,
 	},
+	[TASK_COMMIT_GRAPH] = {
+		"commit-graph",
+		maintenance_task_commit_graph,
+	},
 };
 
 static int maintenance_run(struct maintenance_opts *opts)
diff --git a/commit-graph.c b/commit-graph.c
index 1af68c297d..9705d237e4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -172,7 +172,7 @@ static char *get_split_graph_filename(struct object_directory *odb,
 		       oid_hex);
 }
 
-static char *get_chain_filename(struct object_directory *odb)
+char *get_commit_graph_chain_filename(struct object_directory *odb)
 {
 	return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
 }
@@ -521,7 +521,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r,
 	struct stat st;
 	struct object_id *oids;
 	int i = 0, valid = 1, count;
-	char *chain_name = get_chain_filename(odb);
+	char *chain_name = get_commit_graph_chain_filename(odb);
 	FILE *fp;
 	int stat_res;
 
@@ -1619,7 +1619,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	if (ctx->split) {
-		char *lock_name = get_chain_filename(ctx->odb);
+		char *lock_name = get_commit_graph_chain_filename(ctx->odb);
 
 		hold_lock_file_for_update_mode(&lk, lock_name,
 					       LOCK_DIE_ON_ERROR, 0444);
@@ -1996,7 +1996,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 	if (ctx->split_opts && ctx->split_opts->expire_time)
 		expire_time = ctx->split_opts->expire_time;
 	if (!ctx->split) {
-		char *chain_file_name = get_chain_filename(ctx->odb);
+		char *chain_file_name = get_commit_graph_chain_filename(ctx->odb);
 		unlink(chain_file_name);
 		free(chain_file_name);
 		ctx->num_commit_graphs_after = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 28f89cdf3e..3c202748c3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -25,6 +25,7 @@ struct commit;
 struct bloom_filter_settings;
 
 char *get_commit_graph_filename(struct object_directory *odb);
+char *get_commit_graph_chain_filename(struct object_directory *odb);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 
 /*
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 47d512464c..c0c4e6846e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -4,6 +4,8 @@ test_description='git maintenance builtin'
 
 . ./test-lib.sh
 
+GIT_TEST_COMMIT_GRAPH=0
+
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 06/11] maintenance: add --task option
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-19  0:00     ` Jonathan Tan
  2020-08-18 14:23   ` [PATCH v2 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  8 +++-
 builtin/gc.c                      | 61 +++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 27 ++++++++++++++
 3 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index c816fa1dcd..cf59eb258c 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,7 +30,9 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks.
+	Run one or more maintenance tasks. If one or more `--task=<task>`
+	options are specified, then those tasks are run in the provided
+	order. Otherwise, only the `gc` task is run.
 
 TASKS
 -----
@@ -66,6 +68,10 @@ OPTIONS
 --quiet::
 	Do not report progress or other information over `stderr`.
 
+--task=<task>::
+	If this option is specified one or more times, then only run the
+	specified tasks in the specified order.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 4f9352b9d0..0231d2f8c1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
-	unsigned enabled:1;
+	unsigned enabled:1,
+		 selected:1;
+	int selected_order;
 };
 
 enum maintenance_task_label {
@@ -814,13 +816,32 @@ static struct maintenance_task tasks[] = {
 	},
 };
 
+static int compare_tasks_by_selection(const void *a_, const void *b_)
+{
+	const struct maintenance_task *a, *b;
+
+	a = (const struct maintenance_task *)&a_;
+	b = (const struct maintenance_task *)&b_;
+
+	return b->selected_order - a->selected_order;
+}
+
 static int maintenance_run(struct maintenance_opts *opts)
 {
-	int i;
+	int i, found_selected = 0;
 	int result = 0;
 
+	for (i = 0; !found_selected && i < TASK__COUNT; i++)
+		found_selected = tasks[i].selected;
+
+	if (found_selected)
+		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
+
 	for (i = 0; i < TASK__COUNT; i++) {
-		if (!tasks[i].enabled)
+		if (found_selected && !tasks[i].selected)
+			continue;
+
+		if (!found_selected && !tasks[i].enabled)
 			continue;
 
 		if (tasks[i].fn(opts)) {
@@ -832,6 +853,37 @@ static int maintenance_run(struct maintenance_opts *opts)
 	return result;
 }
 
+static int task_option_parse(const struct option *opt,
+			     const char *arg, int unset)
+{
+	int i, num_selected = 0;
+	struct maintenance_task *task = NULL;
+
+	BUG_ON_OPT_NEG(unset);
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		num_selected += tasks[i].selected;
+		if (!strcasecmp(tasks[i].name, arg)) {
+			task = &tasks[i];
+		}
+	}
+
+	if (!task) {
+		error(_("'%s' is not a valid task"), arg);
+		return 1;
+	}
+
+	if (task->selected) {
+		error(_("task '%s' cannot be selected multiple times"), arg);
+		return 1;
+	}
+
+	task->selected = 1;
+	task->selected_order = num_selected + 1;
+
+	return 0;
+}
+
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
 	struct maintenance_opts opts;
@@ -840,6 +892,9 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 			 N_("run tasks based on the state of the repository")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
+		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
+			N_("run a specific task"),
+			PARSE_OPT_NONEG, task_option_parse),
 		OPT_END()
 	};
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index c0c4e6846e..792765aff7 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,4 +25,31 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'run --task=<task>' '
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
+		git maintenance run --task=gc 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
+		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
+	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
+	test_subcommand git gc --quiet <run-gc.txt &&
+	test_subcommand git gc --quiet <run-both.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
+	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
+'
+
+test_expect_success 'run --task=bogus' '
+	test_must_fail git maintenance run --task=bogus 2>err &&
+	test_i18ngrep "is not a valid task" err
+'
+
+test_expect_success 'run --task duplicate' '
+	test_must_fail git maintenance run --task=gc --task=gc 2>err &&
+	test_i18ngrep "cannot be selected multiple times" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 07/11] maintenance: take a lock on the objects directory
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-19  0:04     ` Jonathan Tan
  2020-08-18 14:23   ` [PATCH v2 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then fail silently. This will become
very important later when we implement the 'fetch' task, as this is our
stop-gap from creating a recursive process loop between 'git fetch' and
'git maintenance run'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 0231d2f8c1..a47bf64623 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -830,6 +830,25 @@ static int maintenance_run(struct maintenance_opts *opts)
 {
 	int i, found_selected = 0;
 	int result = 0;
+	struct lock_file lk;
+	struct repository *r = the_repository;
+	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
+		/*
+		 * Another maintenance command is running.
+		 *
+		 * If --auto was provided, then it is likely due to a
+		 * recursive process stack. Do not report an error in
+		 * that case.
+		 */
+		if (!opts->auto_flag && !opts->quiet)
+			error(_("lock file '%s' exists, skipping maintenance"),
+			      lock_path);
+		free(lock_path);
+		return 0;
+	}
+	free(lock_path);
 
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected;
@@ -850,6 +869,7 @@ static int maintenance_run(struct maintenance_opts *opts)
 		}
 	}
 
+	rollback_lock_file(&lk);
 	return result;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 08/11] maintenance: create maintenance.<task>.enabled config
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-18 14:23   ` [PATCH v2 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt             |  2 ++
 Documentation/config/maintenance.txt |  4 ++++
 Documentation/git-maintenance.txt    |  8 +++++---
 builtin/gc.c                         | 19 +++++++++++++++++++
 t/t7900-maintenance.sh               |  8 ++++++++
 5 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ef0768b91a..2783b825f9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -396,6 +396,8 @@ include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
 
+include::config/maintenance.txt[]
+
 include::config/man.txt[]
 
 include::config/merge.txt[]
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
new file mode 100644
index 0000000000..370cbfb42f
--- /dev/null
+++ b/Documentation/config/maintenance.txt
@@ -0,0 +1,4 @@
+maintenance.<task>.enabled::
+	This boolean config option controls whether the maintenance task
+	with name `<task>` is run when no `--task` option is specified.
+	By default, only `maintenance.gc.enabled` is true.
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index cf59eb258c..9af08c644f 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,9 +30,11 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks. If one or more `--task=<task>`
-	options are specified, then those tasks are run in the provided
-	order. Otherwise, only the `gc` task is run.
+	Run one or more maintenance tasks. If one or more `--task` options
+	are specified, then those tasks are run in that order. Otherwise,
+	the tasks are determined by which `maintenance.<task>.enabled`
+	config options are true. By default, only `maintenance.gc.enabled`
+	is true.
 
 TASKS
 -----
diff --git a/builtin/gc.c b/builtin/gc.c
index a47bf64623..e66326ede9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -873,6 +873,24 @@ static int maintenance_run(struct maintenance_opts *opts)
 	return result;
 }
 
+static void initialize_task_config(void)
+{
+	int i;
+	struct strbuf config_name = STRBUF_INIT;
+	for (i = 0; i < TASK__COUNT; i++) {
+		int config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.enabled",
+			    tasks[i].name);
+
+		if (!git_config_get_bool(config_name.buf, &config_value))
+			tasks[i].enabled = config_value;
+	}
+
+	strbuf_release(&config_name);
+}
+
 static int task_option_parse(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -925,6 +943,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 				   builtin_maintenance_options);
 
 	opts.quiet = !isatty(2);
+	initialize_task_config();
 
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_options,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 792765aff7..290abb7832 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,6 +25,14 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.<task>.enabled' '
+	git config maintenance.gc.enabled false &&
+	git config maintenance.commit-graph.enabled true &&
+	GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
+	test_subcommand ! git gc --quiet <run-config.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
+'
+
 test_expect_success 'run --task=<task>' '
 	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
 		git maintenance run --task=commit-graph 2>/dev/null &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 09/11] maintenance: use pointers to check --auto
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-18 14:23   ` [PATCH v2 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c              | 16 ++++++++++++++++
 t/t5514-fetch-multiple.sh |  2 +-
 t/t7900-maintenance.sh    |  2 +-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e66326ede9..c54c3070b8 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -788,9 +788,17 @@ static int maintenance_task_gc(struct maintenance_opts *opts)
 
 typedef int maintenance_task_fn(struct maintenance_opts *opts);
 
+/*
+ * An auto condition function returns 1 if the task should run
+ * and 0 if the task should NOT run. See needs_to_gc() for an
+ * example.
+ */
+typedef int maintenance_auto_fn(void);
+
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
+	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1,
 		 selected:1;
 	int selected_order;
@@ -808,6 +816,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_GC] = {
 		"gc",
 		maintenance_task_gc,
+		need_to_gc,
 		1,
 	},
 	[TASK_COMMIT_GRAPH] = {
@@ -863,6 +872,11 @@ static int maintenance_run(struct maintenance_opts *opts)
 		if (!found_selected && !tasks[i].enabled)
 			continue;
 
+		if (opts->auto_flag &&
+		    (!tasks[i].auto_condition ||
+		     !tasks[i].auto_condition()))
+			continue;
+
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
@@ -877,6 +891,8 @@ static void initialize_task_config(void)
 {
 	int i;
 	struct strbuf config_name = STRBUF_INIT;
+	gc_config();
+
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index de8e2f1531..bd202ec6f3 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -108,7 +108,7 @@ test_expect_success 'git fetch --multiple (two remotes)' '
 	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
 	 test_cmp ../expect output &&
-	 grep "built-in: git gc" trace >gc &&
+	 grep "built-in: git maintenance" trace >gc &&
 	 test_line_count = 1 gc
 	)
 '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 290abb7832..4f6a04ddb1 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -21,7 +21,7 @@ test_expect_success 'run [--auto|--quiet]' '
 	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
 		git maintenance run --no-quiet 2>/dev/null &&
 	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 10/11] maintenance: add auto condition for commit-graph task
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-19  0:09     ` Jonathan Tan
  2020-08-18 14:23   ` [PATCH v2 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  12 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the PARENT1 flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt | 10 ++++
 builtin/gc.c                         | 76 ++++++++++++++++++++++++++++
 object.h                             |  1 +
 3 files changed, 87 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 370cbfb42f..9bd69b9df3 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -2,3 +2,13 @@ maintenance.<task>.enabled::
 	This boolean config option controls whether the maintenance task
 	with name `<task>` is run when no `--task` option is specified.
 	By default, only `maintenance.gc.enabled` is true.
+
+maintenance.commit-graph.auto::
+	This integer config option controls how often the `commit-graph` task
+	should be run as part of `git maintenance run --auto`. If zero, then
+	the `commit-graph` task will not run with the `--auto` option. A
+	negative value will force the task to run every time. Otherwise, a
+	positive value implies the command should run when the number of
+	reachable commits that are not in the commit-graph file is at least
+	the value of `maintenance.commit-graph.auto`. The default value is
+	100.
diff --git a/builtin/gc.c b/builtin/gc.c
index c54c3070b8..73e485bfc0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,6 +28,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "promisor-remote.h"
+#include "refs.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -710,6 +711,80 @@ struct maintenance_opts {
 	int quiet;
 };
 
+/* Remember to update object flag allocation in object.h */
+#define PARENT1		(1u<<16)
+
+static int num_commits_not_in_graph = 0;
+static int limit_commits_not_in_graph = 100;
+
+static int dfs_on_ref(const char *refname,
+		      const struct object_id *oid, int flags,
+		      void *cb_data)
+{
+	int result = 0;
+	struct object_id peeled;
+	struct commit_list *stack = NULL;
+	struct commit *commit;
+
+	if (!peel_ref(refname, &peeled))
+		oid = &peeled;
+	if (oid_object_info(the_repository, oid, NULL) != OBJ_COMMIT)
+		return 0;
+
+	commit = lookup_commit(the_repository, oid);
+	if (!commit)
+		return 0;
+	if (parse_commit(commit))
+		return 0;
+
+	commit_list_append(commit, &stack);
+
+	while (!result && stack) {
+		struct commit_list *parent;
+
+		commit = pop_commit(&stack);
+
+		for (parent = commit->parents; parent; parent = parent->next) {
+			if (parse_commit(parent->item) ||
+			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
+			    parent->item->object.flags & PARENT1)
+				continue;
+
+			parent->item->object.flags |= PARENT1;
+			num_commits_not_in_graph++;
+
+			if (num_commits_not_in_graph >= limit_commits_not_in_graph) {
+				result = 1;
+				break;
+			}
+
+			commit_list_append(parent->item, &stack);
+		}
+	}
+
+	free_commit_list(stack);
+	return result;
+}
+
+static int should_write_commit_graph(void)
+{
+	int result;
+
+	git_config_get_int("maintenance.commit-graph.auto",
+			   &limit_commits_not_in_graph);
+
+	if (!limit_commits_not_in_graph)
+		return 0;
+	if (limit_commits_not_in_graph < 0)
+		return 1;
+
+	result = for_each_ref(dfs_on_ref, NULL);
+
+	clear_commit_marks_all(PARENT1);
+
+	return result;
+}
+
 static int run_write_commit_graph(struct maintenance_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -822,6 +897,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_COMMIT_GRAPH] = {
 		"commit-graph",
 		maintenance_task_commit_graph,
+		should_write_commit_graph,
 	},
 };
 
diff --git a/object.h b/object.h
index 96a2105859..30d4e0f6a0 100644
--- a/object.h
+++ b/object.h
@@ -74,6 +74,7 @@ struct object_array {
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
  * builtin/index-pack.c:                                     2021
+ * builtin/maintenance.c:                           16
  * builtin/pack-objects.c:                                   20
  * builtin/reflog.c:                   10--12
  * builtin/show-branch.c:    0-------------------------------------------26
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 11/11] maintenance: add trace2 regions for task execution
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-18 14:23   ` Derrick Stolee via GitGitGadget
  2020-08-19  0:11     ` Jonathan Tan
  2020-08-18 20:18   ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  12 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:23 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 73e485bfc0..3fdb08655c 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -953,10 +953,12 @@ static int maintenance_run(struct maintenance_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
 		}
+		trace2_region_leave("maintenance", tasks[i].name, r);
 	}
 
 	rollback_lock_file(&lk);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (10 preceding siblings ...)
  2020-08-18 14:23   ` [PATCH v2 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
@ 2020-08-18 20:18   ` Junio C Hamano
  2020-08-19 14:51     ` Derrick Stolee
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  12 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2020-08-18 20:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series is based on jk/strvec.

As jc/no-update-fetch-head topic seems to be getting popular for
some reason, let's do this:

 - recreate jc/no-update-fetch-head topic with patch [1/9] of the
   maintenance-2 series directly on top of 'master';

 - fork ds/maintenance-part-1 topic off of jc/no-update-fetch-head
   and queue these 11 patches;

 - fork ds/maintenance-part-2 topic off of ds/maintenance-part-1 and
   queue its 8 patches [2/9 - 9/9].

Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 04/11] maintenance: initialize task array
  2020-08-18 14:23   ` [PATCH v2 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
@ 2020-08-18 23:46     ` Jonathan Tan
  0 siblings, 0 replies; 91+ messages in thread
From: Jonathan Tan @ 2020-08-18 23:46 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> In anticipation of implementing multiple maintenance tasks inside the
> 'maintenance' builtin, use a list of structs to describe the work to be
> done.
> 
> The struct maintenance_task stores the name of the task (as given by a
> future command-line argument) along with a function pointer to its
> implementation and a boolean for whether the step is enabled.
> 
> A list these structs are initialized with the full list of implemented
> tasks along with a default order. For now, this list only contains the
> "gc" task. This task is also the only task enabled by default.
> 
> The run subcommand will return a nonzero exit code if any task fails.
> However, it will attempt all tasks in its loop before returning with the
> failure. Also each failed task will send an error message.

Better to say "will print an error message", I think.

Other than that, patches 1-4 look good to me.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 05/11] maintenance: add commit-graph task
  2020-08-18 14:23   ` [PATCH v2 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-18 23:51     ` Jonathan Tan
  2020-08-19 15:04       ` Derrick Stolee
  0 siblings, 1 reply; 91+ messages in thread
From: Jonathan Tan @ 2020-08-18 23:51 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> By using 'git commit-graph verify --shallow' we can ensure that
> the file we just wrote is valid. This is an extra safety precaution
> that is faster than our 'write' subcommand. In the rare situation
> that the newest layer of the commit-graph is corrupt, we can "fix"
> the corruption by deleting the commit-graph-chain file and rewrite
> the full commit-graph as a new one-layer commit graph. This does
> not completely prevent _that_ file from being corrupt, but it does
> recompute the commit-graph by parsing commits from the object
> database. In our use of this step in Scalar and VFS for Git, we
> have only seen this issue arise because our microsoft/git fork
> reverted 43d3561 ("commit-graph write: don't die if the existing
> graph is corrupt" 2019-03-25) for a while to keep commit-graph
> writes very fast. We dropped the revert when updating to v2.23.0.
> The verify still has potential for catching corrupt data across
> the layer boundary: if the new file has commit X with parent Y
> in an old file but the commit ID for Y in the old file had a
> bitswap, then we will notice that in the 'verify' command.

I'm concerned about having this extra precaution, because it is never
tested (neither here or in a future patch). When you saw this issue
arise, was there ever an instance in which a valid set of commit graph
files turned invalid after this maintenance step? (It seems from your
description that the set was invalid to begin with, so the maintenance
step did not fix it but also did not make it worse. And it does not make
it worse, that seems not too bad to me.)

Other than that, this patch looks good to me.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 06/11] maintenance: add --task option
  2020-08-18 14:23   ` [PATCH v2 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
@ 2020-08-19  0:00     ` Jonathan Tan
  2020-08-19  0:36       ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Jonathan Tan @ 2020-08-19  0:00 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> @@ -66,6 +68,10 @@ OPTIONS
>  --quiet::
>  	Do not report progress or other information over `stderr`.
>  
> +--task=<task>::
> +	If this option is specified one or more times, then only run the
> +	specified tasks in the specified order.

We should list the accepted tasks somewhere but maybe this can wait
until after part 2.

> @@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
>  struct maintenance_task {
>  	const char *name;
>  	maintenance_task_fn *fn;
> -	unsigned enabled:1;
> +	unsigned enabled:1,
> +		 selected:1;
> +	int selected_order;
>  };

"selected" and "selected_order" are redundant in some cases - I think
this would be better if selected_order is negative if this task is not
selected, and non-negative otherwise.

Apart from that, maybe this should be documented. It is unusual (to me)
that a selection can override something being enabled, but that is the
case here.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 07/11] maintenance: take a lock on the objects directory
  2020-08-18 14:23   ` [PATCH v2 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-08-19  0:04     ` Jonathan Tan
  2020-08-19 15:10       ` Derrick Stolee
  0 siblings, 1 reply; 91+ messages in thread
From: Jonathan Tan @ 2020-08-19  0:04 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> If the lock file already exists, then fail silently. This will become

Maybe "skip all maintenance steps silently"?

> +	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
> +		/*
> +		 * Another maintenance command is running.
> +		 *
> +		 * If --auto was provided, then it is likely due to a
> +		 * recursive process stack. Do not report an error in
> +		 * that case.
> +		 */
> +		if (!opts->auto_flag && !opts->quiet)
> +			error(_("lock file '%s' exists, skipping maintenance"),
> +			      lock_path);
> +		free(lock_path);
> +		return 0;
> +	}

As it is, this doesn't seem very silent. :-) If we do want a message to
be printed maybe make it warning instead of error.

Other than that, the idea of having a lock file and the implementation
in this patch look good to me.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 10/11] maintenance: add auto condition for commit-graph task
  2020-08-18 14:23   ` [PATCH v2 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-19  0:09     ` Jonathan Tan
  2020-08-19 15:15       ` Derrick Stolee
  0 siblings, 1 reply; 91+ messages in thread
From: Jonathan Tan @ 2020-08-19  0:09 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> Instead of writing a new commit-graph in every 'git maintenance run
> --auto' process (when maintenance.commit-graph.enalbed is configured to

s/enalbed/enabled/

> +/* Remember to update object flag allocation in object.h */
> +#define PARENT1		(1u<<16)

Why this name? "SEEN" seems perfectly fine.

> +static int num_commits_not_in_graph = 0;
> +static int limit_commits_not_in_graph = 100;
> +
> +static int dfs_on_ref(const char *refname,
> +		      const struct object_id *oid, int flags,
> +		      void *cb_data)
> +{

[snip]

> +static int should_write_commit_graph(void)
> +{
> +	int result;
> +
> +	git_config_get_int("maintenance.commit-graph.auto",
> +			   &limit_commits_not_in_graph);
> +
> +	if (!limit_commits_not_in_graph)
> +		return 0;
> +	if (limit_commits_not_in_graph < 0)
> +		return 1;
> +
> +	result = for_each_ref(dfs_on_ref, NULL);

I don't like introducing the mutable global num_commits_not_in_graph
especially when there seems to be at least one easy alternative (e.g.
putting it in cb_data) but I know that this is a pattern than existing
code.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 11/11] maintenance: add trace2 regions for task execution
  2020-08-18 14:23   ` [PATCH v2 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
@ 2020-08-19  0:11     ` Jonathan Tan
  0 siblings, 0 replies; 91+ messages in thread
From: Jonathan Tan @ 2020-08-19  0:11 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

Thanks. Overall this looks like a good patch set. As you can see from my
other emails the only issues I have remaining are minor.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 06/11] maintenance: add --task option
  2020-08-19  0:00     ` Jonathan Tan
@ 2020-08-19  0:36       ` Junio C Hamano
  2020-08-19 15:09         ` Derrick Stolee
  0 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2020-08-19  0:36 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: gitgitgadget, git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, derrickstolee, dstolee

Jonathan Tan <jonathantanmy@google.com> writes:

>> @@ -66,6 +68,10 @@ OPTIONS
>>  --quiet::
>>  	Do not report progress or other information over `stderr`.
>>  
>> +--task=<task>::
>> +	If this option is specified one or more times, then only run the
>> +	specified tasks in the specified order.
>
> We should list the accepted tasks somewhere but maybe this can wait
> until after part 2.
>
>> @@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
>>  struct maintenance_task {
>>  	const char *name;
>>  	maintenance_task_fn *fn;
>> -	unsigned enabled:1;
>> +	unsigned enabled:1,
>> +		 selected:1;
>> +	int selected_order;
>>  };
>
> "selected" and "selected_order" are redundant in some cases - I think
> this would be better if selected_order is negative if this task is not
> selected, and non-negative otherwise.

It is good to get rid of redundancies.

> Apart from that, maybe this should be documented. It is unusual (to me)
> that a selection can override something being enabled, but that is the
> case here.

I had the same reaction as "(to me)" above during an earlier review
round, IIRC.  This definitely deserves documentation.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-08-18 20:18   ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
@ 2020-08-19 14:51     ` Derrick Stolee
  0 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee @ 2020-08-19 14:51 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	Derrick Stolee

On 8/18/2020 4:18 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This series is based on jk/strvec.
> 
> As jc/no-update-fetch-head topic seems to be getting popular for
> some reason, let's do this:
> 
>  - recreate jc/no-update-fetch-head topic with patch [1/9] of the
>    maintenance-2 series directly on top of 'master';
> 
>  - fork ds/maintenance-part-1 topic off of jc/no-update-fetch-head
>    and queue these 11 patches;
> 
>  - fork ds/maintenance-part-2 topic off of ds/maintenance-part-1 and
>    queue its 8 patches [2/9 - 9/9].

Sounds good. I will update my branches before their next versions.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 05/11] maintenance: add commit-graph task
  2020-08-18 23:51     ` Jonathan Tan
@ 2020-08-19 15:04       ` Derrick Stolee
  2020-08-19 17:43         ` Jonathan Tan
  0 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee @ 2020-08-19 15:04 UTC (permalink / raw)
  To: Jonathan Tan, gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, derrickstolee, dstolee

On 8/18/2020 7:51 PM, Jonathan Tan wrote:
>> By using 'git commit-graph verify --shallow' we can ensure that
>> the file we just wrote is valid. This is an extra safety precaution
>> that is faster than our 'write' subcommand. In the rare situation
>> that the newest layer of the commit-graph is corrupt, we can "fix"
>> the corruption by deleting the commit-graph-chain file and rewrite
>> the full commit-graph as a new one-layer commit graph. This does
>> not completely prevent _that_ file from being corrupt, but it does
>> recompute the commit-graph by parsing commits from the object
>> database. In our use of this step in Scalar and VFS for Git, we
>> have only seen this issue arise because our microsoft/git fork
>> reverted 43d3561 ("commit-graph write: don't die if the existing
>> graph is corrupt" 2019-03-25) for a while to keep commit-graph
>> writes very fast. We dropped the revert when updating to v2.23.0.
>> The verify still has potential for catching corrupt data across
>> the layer boundary: if the new file has commit X with parent Y
>> in an old file but the commit ID for Y in the old file had a
>> bitswap, then we will notice that in the 'verify' command.
> 
> I'm concerned about having this extra precaution, because it is never
> tested (neither here or in a future patch). When you saw this issue
> arise, was there ever an instance in which a valid set of commit graph
> files turned invalid after this maintenance step? (It seems from your
> description that the set was invalid to begin with, so the maintenance
> step did not fix it but also did not make it worse. And it does not make
> it worse, that seems not too bad to me.)

The cases we've seen this happen were root-caused to hardware
problems (disk or RAM), and the invalid data was present immediately
after the "git commit-graph write" command. Since implementing the
"verify" step after each write, we have not had any user reports of
problems with these files.

Are you suggesting one of these options?

 1. Remove this validation and rewrite entirely.

 2. Remove the rewrite and only delete the known-bad data.

 3. Insert a way to cause the verify to return failure mid-process,
    so that this function can be covered by tests.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 06/11] maintenance: add --task option
  2020-08-19  0:36       ` Junio C Hamano
@ 2020-08-19 15:09         ` Derrick Stolee
  2020-08-19 17:35           ` Jonathan Tan
  0 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee @ 2020-08-19 15:09 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan
  Cc: gitgitgadget, git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, derrickstolee, dstolee

On 8/18/2020 8:36 PM, Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
>>> @@ -66,6 +68,10 @@ OPTIONS
>>>  --quiet::
>>>  	Do not report progress or other information over `stderr`.
>>>  
>>> +--task=<task>::
>>> +	If this option is specified one or more times, then only run the
>>> +	specified tasks in the specified order.
>>
>> We should list the accepted tasks somewhere but maybe this can wait
>> until after part 2.

It's hard to see from the patch-context, but there is a section titled
"TASKS" that lists the 'gc' and 'commit-graph' tasks from the earlier
patches. Would a reference to that section be helpful?

	See the TASKS section for the list of available tasks.

>>> @@ -791,7 +791,9 @@ typedef int maintenance_task_fn(struct maintenance_opts *opts);
>>>  struct maintenance_task {
>>>  	const char *name;
>>>  	maintenance_task_fn *fn;
>>> -	unsigned enabled:1;
>>> +	unsigned enabled:1,
>>> +		 selected:1;
>>> +	int selected_order;
>>>  };
>>
>> "selected" and "selected_order" are redundant in some cases - I think
>> this would be better if selected_order is negative if this task is not
>> selected, and non-negative otherwise.
> 
> It is good to get rid of redundancies.
> 
>> Apart from that, maybe this should be documented. It is unusual (to me)
>> that a selection can override something being enabled, but that is the
>> case here.
> 
> I had the same reaction as "(to me)" above during an earlier review
> round, IIRC.  This definitely deserves documentation.

Will do. Thanks.

-Stolee
 


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 07/11] maintenance: take a lock on the objects directory
  2020-08-19  0:04     ` Jonathan Tan
@ 2020-08-19 15:10       ` Derrick Stolee
  0 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee @ 2020-08-19 15:10 UTC (permalink / raw)
  To: Jonathan Tan, gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, derrickstolee, dstolee

On 8/18/2020 8:04 PM, Jonathan Tan wrote:
>> If the lock file already exists, then fail silently. This will become
> 
> Maybe "skip all maintenance steps silently"?
> 
>> +	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
>> +		/*
>> +		 * Another maintenance command is running.
>> +		 *
>> +		 * If --auto was provided, then it is likely due to a
>> +		 * recursive process stack. Do not report an error in
>> +		 * that case.
>> +		 */
>> +		if (!opts->auto_flag && !opts->quiet)
>> +			error(_("lock file '%s' exists, skipping maintenance"),
>> +			      lock_path);
>> +		free(lock_path);
>> +		return 0;
>> +	}
> 
> As it is, this doesn't seem very silent. :-) If we do want a message to
> be printed maybe make it warning instead of error.

Sorry, it is silent when "--auto" is specified. The commit message needs
to reflect this properly. This could easily be downgraded to a warning.

> Other than that, the idea of having a lock file and the implementation
> in this patch look good to me.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 10/11] maintenance: add auto condition for commit-graph task
  2020-08-19  0:09     ` Jonathan Tan
@ 2020-08-19 15:15       ` Derrick Stolee
  0 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee @ 2020-08-19 15:15 UTC (permalink / raw)
  To: Jonathan Tan, gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, derrickstolee, dstolee

On 8/18/2020 8:09 PM, Jonathan Tan wrote:
>> Instead of writing a new commit-graph in every 'git maintenance run
>> --auto' process (when maintenance.commit-graph.enalbed is configured to
> 
> s/enalbed/enabled/
> 
>> +/* Remember to update object flag allocation in object.h */
>> +#define PARENT1		(1u<<16)
> 
> Why this name? "SEEN" seems perfectly fine.

For some reason I thought that there might be issues using
SEEN (1u<<0) but trying it locally does not seem to be a
problem. I think I've just been bit by how revision.c uses
it a bit too often. The use here is independent enough to
not cause problems.

>> +static int num_commits_not_in_graph = 0;
>> +static int limit_commits_not_in_graph = 100;
>> +
>> +static int dfs_on_ref(const char *refname,
>> +		      const struct object_id *oid, int flags,
>> +		      void *cb_data)
>> +{
> 
> [snip]
> 
>> +static int should_write_commit_graph(void)
>> +{
>> +	int result;
>> +
>> +	git_config_get_int("maintenance.commit-graph.auto",
>> +			   &limit_commits_not_in_graph);
>> +
>> +	if (!limit_commits_not_in_graph)
>> +		return 0;
>> +	if (limit_commits_not_in_graph < 0)
>> +		return 1;
>> +
>> +	result = for_each_ref(dfs_on_ref, NULL);
> 
> I don't like introducing the mutable global num_commits_not_in_graph
> especially when there seems to be at least one easy alternative (e.g.
> putting it in cb_data) but I know that this is a pattern than existing
> code.

I should have done this right in the first place. Thanks for
catching it.

-Stolee

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 06/11] maintenance: add --task option
  2020-08-19 15:09         ` Derrick Stolee
@ 2020-08-19 17:35           ` Jonathan Tan
  0 siblings, 0 replies; 91+ messages in thread
From: Jonathan Tan @ 2020-08-19 17:35 UTC (permalink / raw)
  To: stolee
  Cc: gitster, jonathantanmy, gitgitgadget, git, sandals, steadmon,
	jrnieder, peff, congdanhqx, phillip.wood123, emilyshaffer,
	sluongng, derrickstolee, dstolee

> On 8/18/2020 8:36 PM, Junio C Hamano wrote:
> > Jonathan Tan <jonathantanmy@google.com> writes:
> > 
> >>> @@ -66,6 +68,10 @@ OPTIONS
> >>>  --quiet::
> >>>  	Do not report progress or other information over `stderr`.
> >>>  
> >>> +--task=<task>::
> >>> +	If this option is specified one or more times, then only run the
> >>> +	specified tasks in the specified order.
> >>
> >> We should list the accepted tasks somewhere but maybe this can wait
> >> until after part 2.
> 
> It's hard to see from the patch-context, but there is a section titled
> "TASKS" that lists the 'gc' and 'commit-graph' tasks from the earlier
> patches. Would a reference to that section be helpful?
> 
> 	See the TASKS section for the list of available tasks.

Ah, I missed that. What you have now is fine then, but a reference might
be helpful if this man page ever grows.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 05/11] maintenance: add commit-graph task
  2020-08-19 15:04       ` Derrick Stolee
@ 2020-08-19 17:43         ` Jonathan Tan
  0 siblings, 0 replies; 91+ messages in thread
From: Jonathan Tan @ 2020-08-19 17:43 UTC (permalink / raw)
  To: stolee
  Cc: jonathantanmy, gitgitgadget, git, sandals, steadmon, jrnieder,
	peff, congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	derrickstolee, dstolee

> The cases we've seen this happen were root-caused to hardware
> problems (disk or RAM), and the invalid data was present immediately
> after the "git commit-graph write" command. Since implementing the
> "verify" step after each write, we have not had any user reports of
> problems with these files.
> 
> Are you suggesting one of these options?
> 
>  1. Remove this validation and rewrite entirely.
> 
>  2. Remove the rewrite and only delete the known-bad data.
> 
>  3. Insert a way to cause the verify to return failure mid-process,
>     so that this function can be covered by tests.

I was suggesting 1, although 2 and 3 would assuage my concerns also (2
less so, but just deleting the file is relatively simple and easier to
reason about). I don't think Git in general defends much against
hardware problems, but if an issue is noticeable in some hardware (as is
in this case, at least in the past) I can see why we would want to
defend against it. My concern is that if we do defend against it, but
leave it untested, several versions of Git later we might find that we
need this defense but it does not work.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                     ` (11 preceding siblings ...)
  2020-08-18 20:18   ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
@ 2020-08-25 18:33   ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
                       ` (11 more replies)
  12 siblings, 12 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee

This series is based on jc/no-update-fetch-head.

This patch series contains 11patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rererelogs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
follow in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task= 
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions for which the
tasks run during the '--auto' option.

 Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Updates since v2
================

 * Based on jc/no-update-fetch-head instead of jk/strvec.
   
   
 * I realized that the other maintenance subcommands should not accept the
   options for the 'run' subcommand, so I reorganized the option parsing
   into that subcommand. This makes the range-diff noisier than it would
   have been otherwise.
   
   
 * While updating the parsing, I also updated the usage strings.
   
   
 * The "verify, then delete and rewrite on failure" logic is removed. I'll
   consider bringing this back with a way to test the behavior in a later
   patch series.
   
   
 * Other feedback from Jonathan Tan is applied.
   
   

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no- [quiet%7Cprogress] 
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  16 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 +++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 336 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 command-list.txt                     |   1 +
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  63 +++++
 t/test-lib-functions.sh              |  33 +++
 24 files changed, 565 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: 887952b8c680626f4721cb5fa57704478801aca4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v2:

  1:  e09e4a4a87 !  1:  aa961af387 maintenance: create basic maintenance runner
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
       	return 0;
       }
      +
     -+static const char * const builtin_maintenance_usage[] = {
     -+	N_("git maintenance run [<options>]"),
     ++static const char * const builtin_maintenance_run_usage[] = {
     ++	N_("git maintenance run [--auto]"),
      +	NULL
      +};
      +
     -+struct maintenance_opts {
     ++struct maintenance_run_opts {
      +	int auto_flag;
      +};
      +
     -+static int maintenance_task_gc(struct maintenance_opts *opts)
     ++static int maintenance_task_gc(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +	return run_command(&child);
      +}
      +
     -+static int maintenance_run(struct maintenance_opts *opts)
     ++static int maintenance_run(int argc, const char **argv, const char *prefix)
      +{
     -+	return maintenance_task_gc(opts);
     -+}
     -+
     -+int cmd_maintenance(int argc, const char **argv, const char *prefix)
     -+{
     -+	struct maintenance_opts opts;
     -+	struct option builtin_maintenance_options[] = {
     ++	struct maintenance_run_opts opts;
     ++	struct option builtin_maintenance_run_options[] = {
      +		OPT_BOOL(0, "auto", &opts.auto_flag,
      +			 N_("run tasks based on the state of the repository")),
      +		OPT_END()
      +	};
     -+
      +	memset(&opts, 0, sizeof(opts));
      +
     -+	if (argc == 2 && !strcmp(argv[1], "-h"))
     -+		usage_with_options(builtin_maintenance_usage,
     -+				   builtin_maintenance_options);
     -+
      +	argc = parse_options(argc, argv, prefix,
     -+			     builtin_maintenance_options,
     -+			     builtin_maintenance_usage,
     -+			     PARSE_OPT_KEEP_UNKNOWN);
     ++			     builtin_maintenance_run_options,
     ++			     builtin_maintenance_run_usage,
     ++			     PARSE_OPT_STOP_AT_NON_OPTION);
     ++
     ++	if (argc != 0)
     ++		usage_with_options(builtin_maintenance_run_usage,
     ++				   builtin_maintenance_run_options);
     ++	return maintenance_task_gc(&opts);
     ++}
     ++
     ++static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
      +
     -+	if (argc != 1)
     -+		usage_with_options(builtin_maintenance_usage,
     -+				   builtin_maintenance_options);
     ++int cmd_maintenance(int argc, const char **argv, const char *prefix)
     ++{
     ++	if (argc == 2 && !strcmp(argv[1], "-h"))
     ++		usage(builtin_maintenance_usage);
      +
     -+	if (!strcmp(argv[0], "run"))
     -+		return maintenance_run(&opts);
     ++	if (!strcmp(argv[1], "run"))
     ++		return maintenance_run(argc - 1, argv + 1, prefix);
      +
     -+	die(_("invalid subcommand: %s"), argv[0]);
     ++	die(_("invalid subcommand: %s"), argv[1]);
      +}
      
     + ## command-list.txt ##
     +@@ command-list.txt: git-ls-remote                           plumbinginterrogators
     + git-ls-tree                             plumbinginterrogators
     + git-mailinfo                            purehelpers
     + git-mailsplit                           purehelpers
     ++git-maintenance                         mainporcelain
     + git-merge                               mainporcelain           history
     + git-merge-base                          plumbinginterrogators
     + git-merge-file                          plumbingmanipulators
     +
       ## git.c ##
      @@ git.c: static struct cmd_struct commands[] = {
       	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
  2:  adae48d235 !  2:  5386d8a628 maintenance: add --quiet option
     @@ Documentation/git-maintenance.txt: OPTIONS
       Part of the linkgit:git[1] suite
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static const char * const builtin_maintenance_usage[] = {
     +@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     + }
       
     - struct maintenance_opts {
     + static const char * const builtin_maintenance_run_usage[] = {
     +-	N_("git maintenance run [--auto]"),
     ++	N_("git maintenance run [--auto] [--[no-]quiet]"),
     + 	NULL
     + };
     + 
     + struct maintenance_run_opts {
       	int auto_flag;
      +	int quiet;
       };
       
     - static int maintenance_task_gc(struct maintenance_opts *opts)
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     + static int maintenance_task_gc(struct maintenance_run_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       
       	if (opts->auto_flag)
       		strvec_push(&child.args, "--auto");
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
       
       	close_object_store(the_repository->objects);
       	return run_command(&child);
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 	struct option builtin_maintenance_options[] = {
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	struct option builtin_maintenance_run_options[] = {
       		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
      +		OPT_BOOL(0, "quiet", &opts.quiet,
      +			 N_("do not report progress or other information over stderr")),
       		OPT_END()
       	};
     - 
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 		usage_with_options(builtin_maintenance_usage,
     - 				   builtin_maintenance_options);
     + 	memset(&opts, 0, sizeof(opts));
       
      +	opts.quiet = !isatty(2);
      +
       	argc = parse_options(argc, argv, prefix,
     - 			     builtin_maintenance_options,
     - 			     builtin_maintenance_usage,
     + 			     builtin_maintenance_run_options,
     + 			     builtin_maintenance_run_usage,
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'help text' '
  3:  91741a0cfc =  3:  e28b332df4 maintenance: replace run_auto_gc()
  4:  1db3b96280 !  4:  82326c1c38 maintenance: initialize task array
     @@ Commit message
      
          The run subcommand will return a nonzero exit code if any task fails.
          However, it will attempt all tasks in its loop before returning with the
     -    failure. Also each failed task will send an error message.
     +    failure. Also each failed task will print an error message.
      
          Helped-by: Taylor Blau <me@ttaylorr.com>
          Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       	return run_command(&child);
       }
       
     -+typedef int maintenance_task_fn(struct maintenance_opts *opts);
     ++typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
      +
      +struct maintenance_task {
      +	const char *name;
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	},
      +};
      +
     - static int maintenance_run(struct maintenance_opts *opts)
     - {
     --	return maintenance_task_gc(opts);
     ++static int maintenance_run_tasks(struct maintenance_run_opts *opts)
     ++{
      +	int i;
      +	int result = 0;
      +
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
      +	}
      +
      +	return result;
     ++}
     ++
     + static int maintenance_run(int argc, const char **argv, const char *prefix)
     + {
     + 	struct maintenance_run_opts opts;
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	if (argc != 0)
     + 		usage_with_options(builtin_maintenance_run_usage,
     + 				   builtin_maintenance_run_options);
     +-	return maintenance_task_gc(&opts);
     ++	return maintenance_run_tasks(&opts);
       }
       
     - int cmd_maintenance(int argc, const char **argv, const char *prefix)
     + static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
  5:  50b457fd57 !  5:  06984a42bf maintenance: add commit-graph task
     @@ Commit message
          maintenance: add commit-graph task
      
          The first new task in the 'git maintenance' builtin is the
     -    'commit-graph' job. It is based on the sequence of events in the
     -    'commit-graph' job in Scalar [1]. This sequence is as follows:
     +    'commit-graph' task. This updates the commit-graph file
     +    incrementally with the command
      
     -    1. git commit-graph write --reachable --split
     -    2. git commit-graph verify --shallow
     -    3. If the verify succeeds, stop.
     -    4. Delete the commit-graph-chain file.
     -    5. git commit-graph write --reachable --split
     +            git commit-graph write --reachable --split
      
          By writing an incremental commit-graph file using the "--split"
          option we minimize the disruption from this operation. The default
     @@ Commit message
          This could be avoided by a future update to use the --expire-time
          argument when writing the commit-graph.
      
     -    By using 'git commit-graph verify --shallow' we can ensure that
     -    the file we just wrote is valid. This is an extra safety precaution
     -    that is faster than our 'write' subcommand. In the rare situation
     -    that the newest layer of the commit-graph is corrupt, we can "fix"
     -    the corruption by deleting the commit-graph-chain file and rewrite
     -    the full commit-graph as a new one-layer commit graph. This does
     -    not completely prevent _that_ file from being corrupt, but it does
     -    recompute the commit-graph by parsing commits from the object
     -    database. In our use of this step in Scalar and VFS for Git, we
     -    have only seen this issue arise because our microsoft/git fork
     -    reverted 43d3561 ("commit-graph write: don't die if the existing
     -    graph is corrupt" 2019-03-25) for a while to keep commit-graph
     -    writes very fast. We dropped the revert when updating to v2.23.0.
     -    The verify still has potential for catching corrupt data across
     -    the layer boundary: if the new file has commit X with parent Y
     -    in an old file but the commit ID for Y in the old file had a
     -    bitswap, then we will notice that in the 'verify' command.
     -
     -    [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/CommitGraphStep.cs
     -
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     @@ Documentation/git-maintenance.txt: run::
       
      +commit-graph::
      +	The `commit-graph` job updates the `commit-graph` files incrementally,
     -+	then verifies that the written data is correct. If the new layer has an
     -+	issue, then the chain file is removed and the `commit-graph` is
     -+	rewritten from scratch.
     -++
     -+The incremental write is safe to run alongside concurrent Git processes
     -+since it will not expire `.graph` files that were in the previous
     -+`commit-graph-chain` file. They will be deleted by a later run based on
     -+the expiration delay.
     ++	then verifies that the written data is correct. The incremental
     ++	write is safe to run alongside concurrent Git processes since it
     ++	will not expire `.graph` files that were in the previous
     ++	`commit-graph-chain` file. They will be deleted by a later run based
     ++	on the expiration delay.
      +
       gc::
       	Clean up unnecessary files and optimize the local repository. "GC"
       	stands for "garbage collection," but this task performs many
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: struct maintenance_opts {
     +@@ builtin/gc.c: struct maintenance_run_opts {
       	int quiet;
       };
       
     -+static int run_write_commit_graph(struct maintenance_opts *opts)
     ++static int run_write_commit_graph(struct maintenance_run_opts *opts)
      +{
      +	struct child_process child = CHILD_PROCESS_INIT;
      +
     @@ builtin/gc.c: struct maintenance_opts {
      +	return !!run_command(&child);
      +}
      +
     -+static int run_verify_commit_graph(struct maintenance_opts *opts)
     ++static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
      +{
     -+	struct child_process child = CHILD_PROCESS_INIT;
     -+
     -+	child.git_cmd = 1;
     -+	strvec_pushl(&child.args, "commit-graph", "verify",
     -+		     "--shallow", NULL);
     -+
     -+	if (opts->quiet)
     -+		strvec_push(&child.args, "--no-progress");
     -+
     -+	return !!run_command(&child);
     -+}
     -+
     -+static int maintenance_task_commit_graph(struct maintenance_opts *opts)
     -+{
     -+	struct repository *r = the_repository;
     -+	char *chain_path;
     -+
     -+	close_object_store(r->objects);
     ++	close_object_store(the_repository->objects);
      +	if (run_write_commit_graph(opts)) {
      +		error(_("failed to write commit-graph"));
      +		return 1;
      +	}
      +
     -+	if (!run_verify_commit_graph(opts))
     -+		return 0;
     -+
     -+	warning(_("commit-graph verify caught error, rewriting"));
     -+
     -+	chain_path = get_commit_graph_chain_filename(r->objects->odb);
     -+	if (unlink(chain_path)) {
     -+		UNLEAK(chain_path);
     -+		die(_("failed to remove commit-graph at %s"), chain_path);
     -+	}
     -+	free(chain_path);
     -+
     -+	if (!run_write_commit_graph(opts))
     -+		return 0;
     -+
     -+	error(_("failed to rewrite commit-graph"));
     -+	return 1;
     ++	return 0;
      +}
      +
     - static int maintenance_task_gc(struct maintenance_opts *opts)
     + static int maintenance_task_gc(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: struct maintenance_task {
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
      +	},
       };
       
     - static int maintenance_run(struct maintenance_opts *opts)
     + static int maintenance_run_tasks(struct maintenance_run_opts *opts)
      
       ## commit-graph.c ##
      @@ commit-graph.c: static char *get_split_graph_filename(struct object_directory *odb,
     @@ commit-graph.c: static void expire_commit_graphs(struct write_commit_graph_conte
       		ctx->num_commit_graphs_after = 0;
      
       ## commit-graph.h ##
     -@@ commit-graph.h: struct commit;
     - struct bloom_filter_settings;
     +@@ commit-graph.h: struct raw_object_store;
     + struct string_list;
       
       char *get_commit_graph_filename(struct object_directory *odb);
      +char *get_commit_graph_chain_filename(struct object_directory *odb);
  6:  85268bd53e !  6:  69298aee24 maintenance: add --task option
     @@ Documentation/git-maintenance.txt: OPTIONS
       
      +--task=<task>::
      +	If this option is specified one or more times, then only run the
     -+	specified tasks in the specified order.
     ++	specified tasks in the specified order. See the 'TASKS' section
     ++	for the list of accepted `<task>` values.
      +
       GIT
       ---
       Part of the linkgit:git[1] suite
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: typedef int maintenance_task_fn(struct maintenance_opts *opts);
     - struct maintenance_task {
     +@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     + }
     + 
     + static const char * const builtin_maintenance_run_usage[] = {
     +-	N_("git maintenance run [--auto] [--[no-]quiet]"),
     ++	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
     + 	NULL
     + };
     + 
     +@@ builtin/gc.c: struct maintenance_task {
       	const char *name;
       	maintenance_task_fn *fn;
     --	unsigned enabled:1;
     -+	unsigned enabled:1,
     -+		 selected:1;
     + 	unsigned enabled:1;
     ++
     ++	/* -1 if not selected. */
      +	int selected_order;
       };
       
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
      +	return b->selected_order - a->selected_order;
      +}
      +
     - static int maintenance_run(struct maintenance_opts *opts)
     + static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       {
      -	int i;
      +	int i, found_selected = 0;
       	int result = 0;
       
      +	for (i = 0; !found_selected && i < TASK__COUNT; i++)
     -+		found_selected = tasks[i].selected;
     ++		found_selected = tasks[i].selected_order >= 0;
      +
      +	if (found_selected)
      +		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
      +
       	for (i = 0; i < TASK__COUNT; i++) {
      -		if (!tasks[i].enabled)
     -+		if (found_selected && !tasks[i].selected)
     ++		if (found_selected && tasks[i].selected_order < 0)
      +			continue;
      +
      +		if (!found_selected && !tasks[i].enabled)
       			continue;
       
       		if (tasks[i].fn(opts)) {
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       	return result;
       }
       
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +	BUG_ON_OPT_NEG(unset);
      +
      +	for (i = 0; i < TASK__COUNT; i++) {
     -+		num_selected += tasks[i].selected;
     ++		if (tasks[i].selected_order >= 0)
     ++			num_selected++;
      +		if (!strcasecmp(tasks[i].name, arg)) {
      +			task = &tasks[i];
      +		}
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +		return 1;
      +	}
      +
     -+	if (task->selected) {
     ++	if (task->selected_order >= 0) {
      +		error(_("task '%s' cannot be selected multiple times"), arg);
      +		return 1;
      +	}
      +
     -+	task->selected = 1;
      +	task->selected_order = num_selected + 1;
      +
      +	return 0;
      +}
      +
     - int cmd_maintenance(int argc, const char **argv, const char *prefix)
     + static int maintenance_run(int argc, const char **argv, const char *prefix)
       {
     - 	struct maintenance_opts opts;
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     ++	int i;
     + 	struct maintenance_run_opts opts;
     + 	struct option builtin_maintenance_run_options[] = {
     + 		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
       		OPT_BOOL(0, "quiet", &opts.quiet,
       			 N_("do not report progress or other information over stderr")),
     @@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefi
      +			PARSE_OPT_NONEG, task_option_parse),
       		OPT_END()
       	};
     + 	memset(&opts, 0, sizeof(opts));
     + 
     + 	opts.quiet = !isatty(2);
       
     ++	for (i = 0; i < TASK__COUNT; i++)
     ++		tasks[i].selected_order = -1;
     ++
     + 	argc = parse_options(argc, argv, prefix,
     + 			     builtin_maintenance_run_options,
     + 			     builtin_maintenance_run_usage,
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'run [--auto|--quiet]' '
  7:  6f86cfaa94 !  7:  3d513acdd8 maintenance: take a lock on the objects directory
     @@ Commit message
          lock is never committed, since it does not represent meaningful data.
          Instead, it is only a placeholder.
      
     -    If the lock file already exists, then fail silently. This will become
     -    very important later when we implement the 'fetch' task, as this is our
     -    stop-gap from creating a recursive process loop between 'git fetch' and
     -    'git maintenance run'.
     +    If the lock file already exists, then fail with a warning. If '--auto'
     +    is specified, then instead no warning is shown and no tasks are attempted.
     +    This will become very important later when we implement the 'prefetch'
     +    task, as this is our stop-gap from creating a recursive process loop
     +    between 'git fetch' and 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       {
       	int i, found_selected = 0;
       	int result = 0;
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
      +		 * that case.
      +		 */
      +		if (!opts->auto_flag && !opts->quiet)
     -+			error(_("lock file '%s' exists, skipping maintenance"),
     -+			      lock_path);
     ++			warning(_("lock file '%s' exists, skipping maintenance"),
     ++				lock_path);
      +		free(lock_path);
      +		return 0;
      +	}
      +	free(lock_path);
       
       	for (i = 0; !found_selected && i < TASK__COUNT; i++)
     - 		found_selected = tasks[i].selected;
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     + 		found_selected = tasks[i].selected_order >= 0;
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		}
       	}
       
  8:  5c0f9d69d1 !  8:  712f5f2d8e maintenance: create maintenance.<task>.enabled config
     @@ Documentation/config/maintenance.txt (new)
      @@
      +maintenance.<task>.enabled::
      +	This boolean config option controls whether the maintenance task
     -+	with name `<task>` is run when no `--task` option is specified.
     -+	By default, only `maintenance.gc.enabled` is true.
     ++	with name `<task>` is run when no `--task` option is specified to
     ++	`git maintenance run`. These config values are ignored if a
     ++	`--task` option exists. By default, only `maintenance.gc.enabled`
     ++	is true.
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: SUBCOMMANDS
     @@ Documentation/git-maintenance.txt: SUBCOMMANDS
       
       TASKS
       -----
     +@@ Documentation/git-maintenance.txt: OPTIONS
     + 
     + --task=<task>::
     + 	If this option is specified one or more times, then only run the
     +-	specified tasks in the specified order. See the 'TASKS' section
     +-	for the list of accepted `<task>` values.
     ++	specified tasks in the specified order. If no `--task=<task>`
     ++	arguments are specified, then only the tasks with
     ++	`maintenance.<task>.enabled` configured as `true` are considered.
     ++	See the 'TASKS' section for the list of accepted `<task>` values.
     + 
     + GIT
     + ---
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       	return result;
       }
       
     @@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
       static int task_option_parse(const struct option *opt,
       			     const char *arg, int unset)
       {
     -@@ builtin/gc.c: int cmd_maintenance(int argc, const char **argv, const char *prefix)
     - 				   builtin_maintenance_options);
     +@@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char *prefix)
     + 	memset(&opts, 0, sizeof(opts));
       
       	opts.quiet = !isatty(2);
      +	initialize_task_config();
       
     - 	argc = parse_options(argc, argv, prefix,
     - 			     builtin_maintenance_options,
     + 	for (i = 0; i < TASK__COUNT; i++)
     + 		tasks[i].selected_order = -1;
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'run [--auto|--quiet]' '
  9:  68bf5bef4b !  9:  69d3b48fd4 maintenance: use pointers to check --auto
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
       
     - typedef int maintenance_task_fn(struct maintenance_opts *opts);
     + typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
       
      +/*
      + * An auto condition function returns 1 if the task should run
     @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
       	const char *name;
       	maintenance_task_fn *fn;
      +	maintenance_auto_fn *auto_condition;
     - 	unsigned enabled:1,
     - 		 selected:1;
     - 	int selected_order;
     + 	unsigned enabled:1;
     + 
     + 	/* -1 if not selected. */
      @@ builtin/gc.c: static struct maintenance_task tasks[] = {
       	[TASK_GC] = {
       		"gc",
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
       		1,
       	},
       	[TASK_COMMIT_GRAPH] = {
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		if (!found_selected && !tasks[i].enabled)
       			continue;
       
 10:  fc097c389a ! 10:  4c3115fe35 maintenance: add auto condition for commit-graph task
     @@ Commit message
      
       ## Documentation/config/maintenance.txt ##
      @@ Documentation/config/maintenance.txt: maintenance.<task>.enabled::
     - 	This boolean config option controls whether the maintenance task
     - 	with name `<task>` is run when no `--task` option is specified.
     - 	By default, only `maintenance.gc.enabled` is true.
     + 	`git maintenance run`. These config values are ignored if a
     + 	`--task` option exists. By default, only `maintenance.gc.enabled`
     + 	is true.
      +
      +maintenance.commit-graph.auto::
      +	This integer config option controls how often the `commit-graph` task
     @@ builtin/gc.c
       
       #define FAILED_RUN "failed to run %s"
       
     -@@ builtin/gc.c: struct maintenance_opts {
     +@@ builtin/gc.c: struct maintenance_run_opts {
       	int quiet;
       };
       
      +/* Remember to update object flag allocation in object.h */
     -+#define PARENT1		(1u<<16)
     ++#define SEEN		(1u<<0)
      +
     -+static int num_commits_not_in_graph = 0;
     -+static int limit_commits_not_in_graph = 100;
     ++struct cg_auto_data {
     ++	int num_not_in_graph;
     ++	int limit;
     ++};
      +
      +static int dfs_on_ref(const char *refname,
      +		      const struct object_id *oid, int flags,
      +		      void *cb_data)
      +{
     ++	struct cg_auto_data *data = (struct cg_auto_data *)cb_data;
      +	int result = 0;
      +	struct object_id peeled;
      +	struct commit_list *stack = NULL;
     @@ builtin/gc.c: struct maintenance_opts {
      +		for (parent = commit->parents; parent; parent = parent->next) {
      +			if (parse_commit(parent->item) ||
      +			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
     -+			    parent->item->object.flags & PARENT1)
     ++			    parent->item->object.flags & SEEN)
      +				continue;
      +
     -+			parent->item->object.flags |= PARENT1;
     -+			num_commits_not_in_graph++;
     ++			parent->item->object.flags |= SEEN;
     ++			data->num_not_in_graph++;
      +
     -+			if (num_commits_not_in_graph >= limit_commits_not_in_graph) {
     ++			if (data->num_not_in_graph >= data->limit) {
      +				result = 1;
      +				break;
      +			}
     @@ builtin/gc.c: struct maintenance_opts {
      +static int should_write_commit_graph(void)
      +{
      +	int result;
     ++	struct cg_auto_data data;
      +
     ++	data.num_not_in_graph = 0;
     ++	data.limit = 100;
      +	git_config_get_int("maintenance.commit-graph.auto",
     -+			   &limit_commits_not_in_graph);
     ++			   &data.limit);
      +
     -+	if (!limit_commits_not_in_graph)
     ++	if (!data.limit)
      +		return 0;
     -+	if (limit_commits_not_in_graph < 0)
     ++	if (data.limit < 0)
      +		return 1;
      +
     -+	result = for_each_ref(dfs_on_ref, NULL);
     ++	result = for_each_ref(dfs_on_ref, &data);
      +
     -+	clear_commit_marks_all(PARENT1);
     ++	clear_commit_marks_all(SEEN);
      +
      +	return result;
      +}
      +
     - static int run_write_commit_graph(struct maintenance_opts *opts)
     + static int run_write_commit_graph(struct maintenance_run_opts *opts)
       {
       	struct child_process child = CHILD_PROCESS_INIT;
      @@ builtin/gc.c: static struct maintenance_task tasks[] = {
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
      
       ## object.h ##
      @@ object.h: struct object_array {
     +  * sha1-name.c:                                              20
        * list-objects-filter.c:                                      21
        * builtin/fsck.c:           0--3
     ++ * builtin/gc.c:             0
        * builtin/index-pack.c:                                     2021
     -+ * builtin/maintenance.c:                           16
        * builtin/pack-objects.c:                                   20
        * builtin/reflog.c:                   10--12
     -  * builtin/show-branch.c:    0-------------------------------------------26
 11:  46fbe161aa ! 11:  652a8eac57 maintenance: add trace2 regions for task execution
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int maintenance_run(struct maintenance_opts *opts)
     +@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
       		     !tasks[i].auto_condition()))
       			continue;
       

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 01/11] maintenance: create basic maintenance runner
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                        |  1 +
 Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
 builtin.h                         |  1 +
 builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
 command-list.txt                  |  1 +
 git.c                             |  1 +
 t/t7900-maintenance.sh            | 21 ++++++++++++
 t/test-lib-functions.sh           | 33 ++++++++++++++++++
 8 files changed, 172 insertions(+)
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..a5808fa30d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -90,6 +90,7 @@
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
new file mode 100644
index 0000000000..ff47fb3641
--- /dev/null
+++ b/Documentation/git-maintenance.txt
@@ -0,0 +1,57 @@
+git-maintenance(1)
+==================
+
+NAME
+----
+git-maintenance - Run tasks to optimize Git repository data
+
+
+SYNOPSIS
+--------
+[verse]
+'git maintenance' run [<options>]
+
+
+DESCRIPTION
+-----------
+Run tasks to optimize Git repository data, speeding up other Git commands
+and reducing storage requirements for the repository.
+
+Git commands that add repository data, such as `git add` or `git fetch`,
+are optimized for a responsive user experience. These commands do not take
+time to optimize the Git data, since such optimizations scale with the full
+size of the repository while these user commands each perform a relatively
+small action.
+
+The `git maintenance` command provides flexibility for how to optimize the
+Git repository.
+
+SUBCOMMANDS
+-----------
+
+run::
+	Run one or more maintenance tasks.
+
+TASKS
+-----
+
+gc::
+	Clean up unnecessary files and optimize the local repository. "GC"
+	stands for "garbage collection," but this task performs many
+	smaller tasks. This task can be expensive for large repositories,
+	as it repacks all Git objects into a single pack-file. It can also
+	be disruptive in some situations, as it deletes stale data. See
+	linkgit:git-gc[1] for more details on garbage collection in Git.
+
+OPTIONS
+-------
+--auto::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain thresholds are met. For example, the `gc` task
+	runs when the number of loose objects exceeds the number stored
+	in the `gc.auto` config setting, or when the number of pack-files
+	exceeds the `gc.autoPackLimit` config setting.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..17c1c0ce49 100644
--- a/builtin.h
+++ b/builtin.h
@@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gc.c b/builtin/gc.c
index aafa0946f5..7f2771e786 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	return 0;
 }
+
+static const char * const builtin_maintenance_run_usage[] = {
+	N_("git maintenance run [--auto]"),
+	NULL
+};
+
+struct maintenance_run_opts {
+	int auto_flag;
+};
+
+static int maintenance_task_gc(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_push(&child.args, "gc");
+
+	if (opts->auto_flag)
+		strvec_push(&child.args, "--auto");
+
+	close_object_store(the_repository->objects);
+	return run_command(&child);
+}
+
+static int maintenance_run(int argc, const char **argv, const char *prefix)
+{
+	struct maintenance_run_opts opts;
+	struct option builtin_maintenance_run_options[] = {
+		OPT_BOOL(0, "auto", &opts.auto_flag,
+			 N_("run tasks based on the state of the repository")),
+		OPT_END()
+	};
+	memset(&opts, 0, sizeof(opts));
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_maintenance_run_options,
+			     builtin_maintenance_run_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (argc != 0)
+		usage_with_options(builtin_maintenance_run_usage,
+				   builtin_maintenance_run_options);
+	return maintenance_task_gc(&opts);
+}
+
+static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
+
+int cmd_maintenance(int argc, const char **argv, const char *prefix)
+{
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage(builtin_maintenance_usage);
+
+	if (!strcmp(argv[1], "run"))
+		return maintenance_run(argc - 1, argv + 1, prefix);
+
+	die(_("invalid subcommand: %s"), argv[1]);
+}
diff --git a/command-list.txt b/command-list.txt
index e5901f2213..0e3204e7d1 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -117,6 +117,7 @@ git-ls-remote                           plumbinginterrogators
 git-ls-tree                             plumbinginterrogators
 git-mailinfo                            purehelpers
 git-mailsplit                           purehelpers
+git-maintenance                         mainporcelain
 git-merge                               mainporcelain           history
 git-merge-base                          plumbinginterrogators
 git-merge-file                          plumbingmanipulators
diff --git a/git.c b/git.c
index 8bd1d7551d..24f250d29a 100644
--- a/git.c
+++ b/git.c
@@ -529,6 +529,7 @@ static struct cmd_struct commands[] = {
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
+	{ "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE },
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
new file mode 100755
index 0000000000..14aa691f19
--- /dev/null
+++ b/t/t7900-maintenance.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+test_description='git maintenance builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'help text' '
+	test_expect_code 129 git maintenance -h 2>err &&
+	test_i18ngrep "usage: git maintenance run" err &&
+	test_expect_code 128 git maintenance barf 2>err &&
+	test_i18ngrep "invalid subcommand: barf" err
+'
+
+test_expect_success 'run [--auto]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
+	test_subcommand git gc <run-no-auto.txt &&
+	test_subcommand git gc --auto <run-auto.txt
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6a8e194a99..d805e73f45 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1628,3 +1628,36 @@ test_path_is_hidden () {
 	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
 	return 1
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_subcommand [!] <command> <args>... < <trace>
+#
+# For example, to look for an invocation of "git upload-pack
+# /path/to/repo"
+#
+#	GIT_TRACE2_EVENT=event.log git fetch ... &&
+#	test_subcommand git upload-pack "$PATH" <event.log
+#
+# If the first parameter passed is !, this instead checks that
+# the given command was not called.
+#
+test_subcommand () {
+	local negate=
+	if test "$1" = "!"
+	then
+		negate=t
+		shift
+	fi
+
+	local expr=$(printf '"%s",' "$@")
+	expr="${expr%,}"
+
+	if test -n "$negate"
+	then
+		! grep "\[$expr\]"
+	else
+		grep "\[$expr\]"
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 02/11] maintenance: add --quiet option
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  3 +++
 builtin/gc.c                      | 11 ++++++++++-
 t/t7900-maintenance.sh            | 15 ++++++++++-----
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index ff47fb3641..04fa0fe329 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -52,6 +52,9 @@ OPTIONS
 	in the `gc.auto` config setting, or when the number of pack-files
 	exceeds the `gc.autoPackLimit` config setting.
 
+--quiet::
+	Do not report progress or other information over `stderr`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 7f2771e786..da11f1edcd 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,12 +701,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto]"),
+	N_("git maintenance run [--auto] [--[no-]quiet]"),
 	NULL
 };
 
 struct maintenance_run_opts {
 	int auto_flag;
+	int quiet;
 };
 
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
@@ -718,6 +719,10 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 	if (opts->auto_flag)
 		strvec_push(&child.args, "--auto");
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	else
+		strvec_push(&child.args, "--no-quiet");
 
 	close_object_store(the_repository->objects);
 	return run_command(&child);
@@ -729,10 +734,14 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "quiet", &opts.quiet,
+			 N_("do not report progress or other information over stderr")),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
+	opts.quiet = !isatty(2);
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 14aa691f19..47d512464c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -11,11 +11,16 @@ test_expect_success 'help text' '
 	test_i18ngrep "invalid subcommand: barf" err
 '
 
-test_expect_success 'run [--auto]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
-	test_subcommand git gc <run-no-auto.txt &&
-	test_subcommand git gc --auto <run-auto.txt
+test_expect_success 'run [--auto|--quiet]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
+		git maintenance run 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
+		git maintenance run --auto 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
+		git maintenance run --no-quiet 2>/dev/null &&
+	test_subcommand git gc --quiet <run-no-auto.txt &&
+	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 03/11] maintenance: replace run_auto_gc()
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
                       ` (8 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/fetch-options.txt |  6 ++++--
 Documentation/git-clone.txt     |  6 +++---
 builtin/am.c                    |  2 +-
 builtin/commit.c                |  2 +-
 builtin/fetch.c                 |  6 ++++--
 builtin/merge.c                 |  2 +-
 builtin/rebase.c                |  4 ++--
 run-command.c                   | 16 +++++++---------
 run-command.h                   |  2 +-
 t/t5510-fetch.sh                |  2 +-
 10 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 07d06ff445..b65a758661 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -95,9 +95,11 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-maintenance::
 --[no-]auto-gc::
-	Run `git gc --auto` at the end to perform garbage collection
-	if needed. This is enabled by default.
+	Run `git maintenance run --auto` at the end to perform automatic
+	repository maintenance if needed. (`--[no-]auto-gc` is a synonym.)
+	This is enabled by default.
 
 --[no-]write-commit-graph::
 	Write a commit-graph after fetching. This overrides the config
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index c898310099..097e6a86c5 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -78,9 +78,9 @@ repository using this option and then delete branches (or use any
 other Git command that makes any existing commit unreferenced) in the
 source repository, some objects may become unreferenced (or dangling).
 These objects may be removed by normal Git operations (such as `git commit`)
-which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-If these objects are removed and were referenced by the cloned repository,
-then the cloned repository will become corrupt.
+which automatically call `git maintenance run --auto`. (See
+linkgit:git-maintenance[1].) If these objects are removed and were referenced
+by the cloned repository, then the cloned repository will become corrupt.
 +
 Note that running `git repack` without the `--local` option in a repository
 cloned with `--shared` will copy objects from the source repository into a pack
diff --git a/builtin/am.c b/builtin/am.c
index dd4e6c2d9b..68e9d17379 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
 	if (!state->rebasing) {
 		am_destroy(state);
 		close_object_store(the_repository->objects);
-		run_auto_gc(state->quiet);
+		run_auto_maintenance(state->quiet);
 	}
 }
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 69ac78d5e5..f9b0a0c05d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_gc(quiet);
+	run_auto_maintenance(quiet);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 2eb8d6a5a5..cb38e6f5ec 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -199,8 +199,10 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-maintenance", &enable_auto_gc,
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
-		 N_("run 'gc --auto' after fetching")),
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
@@ -1891,7 +1893,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc)
-		run_auto_gc(verbosity < 0);
+		run_auto_maintenance(verbosity < 0);
 
 	return result;
 }
diff --git a/builtin/merge.c b/builtin/merge.c
index 74829a838e..8f31bfab56 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -456,7 +456,7 @@ static void finish(struct commit *head_commit,
 			 * user should see them.
 			 */
 			close_object_store(the_repository->objects);
-			run_auto_gc(verbosity < 0);
+			run_auto_maintenance(verbosity < 0);
 		}
 	}
 	if (new_head && show_diffstat) {
diff --git a/builtin/rebase.c b/builtin/rebase.c
index dadb52fa92..9ffba9009d 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -728,10 +728,10 @@ static int finish_rebase(struct rebase_options *opts)
 	apply_autostash(state_dir_path("autostash", opts));
 	close_object_store(the_repository->objects);
 	/*
-	 * We ignore errors in 'gc --auto', since the
+	 * We ignore errors in 'git maintenance run --auto', since the
 	 * user should see them.
 	 */
-	run_auto_gc(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
+	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
 	if (opts->type == REBASE_MERGE) {
 		struct replay_opts replay = REPLAY_OPTS_INIT;
 
diff --git a/run-command.c b/run-command.c
index cc9c3296ba..2ee59acdc8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1866,15 +1866,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 	return result;
 }
 
-int run_auto_gc(int quiet)
+int run_auto_maintenance(int quiet)
 {
-	struct strvec argv_gc_auto = STRVEC_INIT;
-	int status;
+	struct child_process maint = CHILD_PROCESS_INIT;
 
-	strvec_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (quiet)
-		strvec_push(&argv_gc_auto, "--quiet");
-	status = run_command_v_opt(argv_gc_auto.v, RUN_GIT_CMD);
-	strvec_clear(&argv_gc_auto);
-	return status;
+	maint.git_cmd = 1;
+	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
+	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
+
+	return run_command(&maint);
 }
diff --git a/run-command.h b/run-command.h
index 8b9bfaef16..6472b38bde 100644
--- a/run-command.h
+++ b/run-command.h
@@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args);
 /*
  * Trigger an auto-gc
  */
-int run_auto_gc(int quiet);
+int run_auto_maintenance(int quiet);
 
 #define RUN_COMMAND_NO_STDIN 1
 #define RUN_GIT_CMD	     2	/*If this is to be git sub-command */
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2a1abe91f0..a9682c5669 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -934,7 +934,7 @@ test_expect_success 'fetching with auto-gc does not lock up' '
 		git config fetch.unpackLimit 1 &&
 		git config gc.autoPackLimit 1 &&
 		git config gc.autoDetach false &&
-		GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 &&
+		GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 &&
 		test_i18ngrep "Auto packing the repository" fetch.out &&
 		! grep "Should I try again" fetch.out
 	)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 04/11] maintenance: initialize task array
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.

The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.

A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.

The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index da11f1edcd..81643515e5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -728,6 +728,47 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 	return run_command(&child);
 }
 
+typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
+
+struct maintenance_task {
+	const char *name;
+	maintenance_task_fn *fn;
+	unsigned enabled:1;
+};
+
+enum maintenance_task_label {
+	TASK_GC,
+
+	/* Leave as final value */
+	TASK__COUNT
+};
+
+static struct maintenance_task tasks[] = {
+	[TASK_GC] = {
+		"gc",
+		maintenance_task_gc,
+		1,
+	},
+};
+
+static int maintenance_run_tasks(struct maintenance_run_opts *opts)
+{
+	int i;
+	int result = 0;
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (!tasks[i].enabled)
+			continue;
+
+		if (tasks[i].fn(opts)) {
+			error(_("task '%s' failed"), tasks[i].name);
+			result = 1;
+		}
+	}
+
+	return result;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
 	struct maintenance_run_opts opts;
@@ -750,7 +791,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
-	return maintenance_task_gc(&opts);
+	return maintenance_run_tasks(&opts);
 }
 
 static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 05/11] maintenance: add commit-graph task
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  8 ++++++++
 builtin/gc.c                      | 30 ++++++++++++++++++++++++++++++
 commit-graph.c                    |  8 ++++----
 commit-graph.h                    |  1 +
 t/t7900-maintenance.sh            |  2 ++
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 04fa0fe329..fc5dbcf0b9 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -35,6 +35,14 @@ run::
 TASKS
 -----
 
+commit-graph::
+	The `commit-graph` job updates the `commit-graph` files incrementally,
+	then verifies that the written data is correct. The incremental
+	write is safe to run alongside concurrent Git processes since it
+	will not expire `.graph` files that were in the previous
+	`commit-graph-chain` file. They will be deleted by a later run based
+	on the expiration delay.
+
 gc::
 	Clean up unnecessary files and optimize the local repository. "GC"
 	stands for "garbage collection," but this task performs many
diff --git a/builtin/gc.c b/builtin/gc.c
index 81643515e5..16e567992e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -710,6 +710,31 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+static int run_write_commit_graph(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "write",
+		     "--split", "--reachable", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
+{
+	close_object_store(the_repository->objects);
+	if (run_write_commit_graph(opts)) {
+		error(_("failed to write commit-graph"));
+		return 1;
+	}
+
+	return 0;
+}
+
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -738,6 +763,7 @@ struct maintenance_task {
 
 enum maintenance_task_label {
 	TASK_GC,
+	TASK_COMMIT_GRAPH,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -749,6 +775,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_gc,
 		1,
 	},
+	[TASK_COMMIT_GRAPH] = {
+		"commit-graph",
+		maintenance_task_commit_graph,
+	},
 };
 
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
diff --git a/commit-graph.c b/commit-graph.c
index e51c91dd5b..a9b0db7d4a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -172,7 +172,7 @@ static char *get_split_graph_filename(struct object_directory *odb,
 		       oid_hex);
 }
 
-static char *get_chain_filename(struct object_directory *odb)
+char *get_commit_graph_chain_filename(struct object_directory *odb)
 {
 	return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
 }
@@ -516,7 +516,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r,
 	struct stat st;
 	struct object_id *oids;
 	int i = 0, valid = 1, count;
-	char *chain_name = get_chain_filename(odb);
+	char *chain_name = get_commit_graph_chain_filename(odb);
 	FILE *fp;
 	int stat_res;
 
@@ -1668,7 +1668,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	if (ctx->split) {
-		char *lock_name = get_chain_filename(ctx->odb);
+		char *lock_name = get_commit_graph_chain_filename(ctx->odb);
 
 		hold_lock_file_for_update_mode(&lk, lock_name,
 					       LOCK_DIE_ON_ERROR, 0444);
@@ -2038,7 +2038,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 	if (ctx->split_opts && ctx->split_opts->expire_time)
 		expire_time = ctx->split_opts->expire_time;
 	if (!ctx->split) {
-		char *chain_file_name = get_chain_filename(ctx->odb);
+		char *chain_file_name = get_commit_graph_chain_filename(ctx->odb);
 		unlink(chain_file_name);
 		free(chain_file_name);
 		ctx->num_commit_graphs_after = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 09a97030dc..765221cdcc 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -25,6 +25,7 @@ struct raw_object_store;
 struct string_list;
 
 char *get_commit_graph_filename(struct object_directory *odb);
+char *get_commit_graph_chain_filename(struct object_directory *odb);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 
 /*
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 47d512464c..c0c4e6846e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -4,6 +4,8 @@ test_description='git maintenance builtin'
 
 . ./test-lib.sh
 
+GIT_TEST_COMMIT_GRAPH=0
+
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 06/11] maintenance: add --task option
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  9 ++++-
 builtin/gc.c                      | 66 +++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 27 +++++++++++++
 3 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index fc5dbcf0b9..819ca41ab6 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,7 +30,9 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks.
+	Run one or more maintenance tasks. If one or more `--task=<task>`
+	options are specified, then those tasks are run in the provided
+	order. Otherwise, only the `gc` task is run.
 
 TASKS
 -----
@@ -63,6 +65,11 @@ OPTIONS
 --quiet::
 	Do not report progress or other information over `stderr`.
 
+--task=<task>::
+	If this option is specified one or more times, then only run the
+	specified tasks in the specified order. See the 'TASKS' section
+	for the list of accepted `<task>` values.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 16e567992e..e94f263c77 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,7 +701,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto] [--[no-]quiet]"),
+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
 	NULL
 };
 
@@ -759,6 +759,9 @@ struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
 	unsigned enabled:1;
+
+	/* -1 if not selected. */
+	int selected_order;
 };
 
 enum maintenance_task_label {
@@ -781,13 +784,32 @@ static struct maintenance_task tasks[] = {
 	},
 };
 
+static int compare_tasks_by_selection(const void *a_, const void *b_)
+{
+	const struct maintenance_task *a, *b;
+
+	a = (const struct maintenance_task *)&a_;
+	b = (const struct maintenance_task *)&b_;
+
+	return b->selected_order - a->selected_order;
+}
+
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
-	int i;
+	int i, found_selected = 0;
 	int result = 0;
 
+	for (i = 0; !found_selected && i < TASK__COUNT; i++)
+		found_selected = tasks[i].selected_order >= 0;
+
+	if (found_selected)
+		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
+
 	for (i = 0; i < TASK__COUNT; i++) {
-		if (!tasks[i].enabled)
+		if (found_selected && tasks[i].selected_order < 0)
+			continue;
+
+		if (!found_selected && !tasks[i].enabled)
 			continue;
 
 		if (tasks[i].fn(opts)) {
@@ -799,20 +821,58 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static int task_option_parse(const struct option *opt,
+			     const char *arg, int unset)
+{
+	int i, num_selected = 0;
+	struct maintenance_task *task = NULL;
+
+	BUG_ON_OPT_NEG(unset);
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (tasks[i].selected_order >= 0)
+			num_selected++;
+		if (!strcasecmp(tasks[i].name, arg)) {
+			task = &tasks[i];
+		}
+	}
+
+	if (!task) {
+		error(_("'%s' is not a valid task"), arg);
+		return 1;
+	}
+
+	if (task->selected_order >= 0) {
+		error(_("task '%s' cannot be selected multiple times"), arg);
+		return 1;
+	}
+
+	task->selected_order = num_selected + 1;
+
+	return 0;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
+	int i;
 	struct maintenance_run_opts opts;
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
+		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
+			N_("run a specific task"),
+			PARSE_OPT_NONEG, task_option_parse),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
 
+	for (i = 0; i < TASK__COUNT; i++)
+		tasks[i].selected_order = -1;
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index c0c4e6846e..792765aff7 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,4 +25,31 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'run --task=<task>' '
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
+		git maintenance run --task=gc 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
+		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
+	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
+	test_subcommand git gc --quiet <run-gc.txt &&
+	test_subcommand git gc --quiet <run-both.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
+	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
+'
+
+test_expect_success 'run --task=bogus' '
+	test_must_fail git maintenance run --task=bogus 2>err &&
+	test_i18ngrep "is not a valid task" err
+'
+
+test_expect_success 'run --task duplicate' '
+	test_must_fail git maintenance run --task=gc --task=gc 2>err &&
+	test_i18ngrep "cannot be selected multiple times" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 07/11] maintenance: take a lock on the objects directory
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-26 23:02       ` Jonathan Tan
  2020-08-25 18:33     ` [PATCH v3 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  11 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then fail with a warning. If '--auto'
is specified, then instead no warning is shown and no tasks are attempted.
This will become very important later when we implement the 'prefetch'
task, as this is our stop-gap from creating a recursive process loop
between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index e94f263c77..1cebb7282d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -798,6 +798,25 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
 	int i, found_selected = 0;
 	int result = 0;
+	struct lock_file lk;
+	struct repository *r = the_repository;
+	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
+		/*
+		 * Another maintenance command is running.
+		 *
+		 * If --auto was provided, then it is likely due to a
+		 * recursive process stack. Do not report an error in
+		 * that case.
+		 */
+		if (!opts->auto_flag && !opts->quiet)
+			warning(_("lock file '%s' exists, skipping maintenance"),
+				lock_path);
+		free(lock_path);
+		return 0;
+	}
+	free(lock_path);
 
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected_order >= 0;
@@ -818,6 +837,7 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		}
 	}
 
+	rollback_lock_file(&lk);
 	return result;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 08/11] maintenance: create maintenance.<task>.enabled config
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt             |  2 ++
 Documentation/config/maintenance.txt |  6 ++++++
 Documentation/git-maintenance.txt    | 14 +++++++++-----
 builtin/gc.c                         | 19 +++++++++++++++++++
 t/t7900-maintenance.sh               |  8 ++++++++
 5 files changed, 44 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3042d80978..f93b6837e4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -398,6 +398,8 @@ include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
 
+include::config/maintenance.txt[]
+
 include::config/man.txt[]
 
 include::config/merge.txt[]
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
new file mode 100644
index 0000000000..4402b8b49f
--- /dev/null
+++ b/Documentation/config/maintenance.txt
@@ -0,0 +1,6 @@
+maintenance.<task>.enabled::
+	This boolean config option controls whether the maintenance task
+	with name `<task>` is run when no `--task` option is specified to
+	`git maintenance run`. These config values are ignored if a
+	`--task` option exists. By default, only `maintenance.gc.enabled`
+	is true.
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 819ca41ab6..6abcb8255a 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,9 +30,11 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks. If one or more `--task=<task>`
-	options are specified, then those tasks are run in the provided
-	order. Otherwise, only the `gc` task is run.
+	Run one or more maintenance tasks. If one or more `--task` options
+	are specified, then those tasks are run in that order. Otherwise,
+	the tasks are determined by which `maintenance.<task>.enabled`
+	config options are true. By default, only `maintenance.gc.enabled`
+	is true.
 
 TASKS
 -----
@@ -67,8 +69,10 @@ OPTIONS
 
 --task=<task>::
 	If this option is specified one or more times, then only run the
-	specified tasks in the specified order. See the 'TASKS' section
-	for the list of accepted `<task>` values.
+	specified tasks in the specified order. If no `--task=<task>`
+	arguments are specified, then only the tasks with
+	`maintenance.<task>.enabled` configured as `true` are considered.
+	See the 'TASKS' section for the list of accepted `<task>` values.
 
 GIT
 ---
diff --git a/builtin/gc.c b/builtin/gc.c
index 1cebb7282d..67a8d405a1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -841,6 +841,24 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static void initialize_task_config(void)
+{
+	int i;
+	struct strbuf config_name = STRBUF_INIT;
+	for (i = 0; i < TASK__COUNT; i++) {
+		int config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.enabled",
+			    tasks[i].name);
+
+		if (!git_config_get_bool(config_name.buf, &config_value))
+			tasks[i].enabled = config_value;
+	}
+
+	strbuf_release(&config_name);
+}
+
 static int task_option_parse(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -889,6 +907,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
+	initialize_task_config();
 
 	for (i = 0; i < TASK__COUNT; i++)
 		tasks[i].selected_order = -1;
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 792765aff7..290abb7832 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,6 +25,14 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.<task>.enabled' '
+	git config maintenance.gc.enabled false &&
+	git config maintenance.commit-graph.enabled true &&
+	GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
+	test_subcommand ! git gc --quiet <run-config.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
+'
+
 test_expect_success 'run --task=<task>' '
 	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
 		git maintenance run --task=commit-graph 2>/dev/null &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 09/11] maintenance: use pointers to check --auto
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-25 18:33     ` [PATCH v3 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c              | 16 ++++++++++++++++
 t/t5514-fetch-multiple.sh |  2 +-
 t/t7900-maintenance.sh    |  2 +-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 67a8d405a1..709d13553b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -755,9 +755,17 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
 
+/*
+ * An auto condition function returns 1 if the task should run
+ * and 0 if the task should NOT run. See needs_to_gc() for an
+ * example.
+ */
+typedef int maintenance_auto_fn(void);
+
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
+	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1;
 
 	/* -1 if not selected. */
@@ -776,6 +784,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_GC] = {
 		"gc",
 		maintenance_task_gc,
+		need_to_gc,
 		1,
 	},
 	[TASK_COMMIT_GRAPH] = {
@@ -831,6 +840,11 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		if (!found_selected && !tasks[i].enabled)
 			continue;
 
+		if (opts->auto_flag &&
+		    (!tasks[i].auto_condition ||
+		     !tasks[i].auto_condition()))
+			continue;
+
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
@@ -845,6 +859,8 @@ static void initialize_task_config(void)
 {
 	int i;
 	struct strbuf config_name = STRBUF_INIT;
+	gc_config();
+
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index de8e2f1531..bd202ec6f3 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -108,7 +108,7 @@ test_expect_success 'git fetch --multiple (two remotes)' '
 	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
 	 test_cmp ../expect output &&
-	 grep "built-in: git gc" trace >gc &&
+	 grep "built-in: git maintenance" trace >gc &&
 	 test_line_count = 1 gc
 	)
 '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 290abb7832..4f6a04ddb1 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -21,7 +21,7 @@ test_expect_success 'run [--auto|--quiet]' '
 	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
 		git maintenance run --no-quiet 2>/dev/null &&
 	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 10/11] maintenance: add auto condition for commit-graph task
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (8 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-08-26 23:02       ` Jonathan Tan
  2020-08-25 18:33     ` [PATCH v3 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  11 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the PARENT1 flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt | 10 ++++
 builtin/gc.c                         | 82 ++++++++++++++++++++++++++++
 object.h                             |  1 +
 3 files changed, 93 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 4402b8b49f..7cc6700d57 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -4,3 +4,13 @@ maintenance.<task>.enabled::
 	`git maintenance run`. These config values are ignored if a
 	`--task` option exists. By default, only `maintenance.gc.enabled`
 	is true.
+
+maintenance.commit-graph.auto::
+	This integer config option controls how often the `commit-graph` task
+	should be run as part of `git maintenance run --auto`. If zero, then
+	the `commit-graph` task will not run with the `--auto` option. A
+	negative value will force the task to run every time. Otherwise, a
+	positive value implies the command should run when the number of
+	reachable commits that are not in the commit-graph file is at least
+	the value of `maintenance.commit-graph.auto`. The default value is
+	100.
diff --git a/builtin/gc.c b/builtin/gc.c
index 709d13553b..8c4edf19ba 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,6 +28,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "promisor-remote.h"
+#include "refs.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -710,6 +711,86 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+/* Remember to update object flag allocation in object.h */
+#define SEEN		(1u<<0)
+
+struct cg_auto_data {
+	int num_not_in_graph;
+	int limit;
+};
+
+static int dfs_on_ref(const char *refname,
+		      const struct object_id *oid, int flags,
+		      void *cb_data)
+{
+	struct cg_auto_data *data = (struct cg_auto_data *)cb_data;
+	int result = 0;
+	struct object_id peeled;
+	struct commit_list *stack = NULL;
+	struct commit *commit;
+
+	if (!peel_ref(refname, &peeled))
+		oid = &peeled;
+	if (oid_object_info(the_repository, oid, NULL) != OBJ_COMMIT)
+		return 0;
+
+	commit = lookup_commit(the_repository, oid);
+	if (!commit)
+		return 0;
+	if (parse_commit(commit))
+		return 0;
+
+	commit_list_append(commit, &stack);
+
+	while (!result && stack) {
+		struct commit_list *parent;
+
+		commit = pop_commit(&stack);
+
+		for (parent = commit->parents; parent; parent = parent->next) {
+			if (parse_commit(parent->item) ||
+			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
+			    parent->item->object.flags & SEEN)
+				continue;
+
+			parent->item->object.flags |= SEEN;
+			data->num_not_in_graph++;
+
+			if (data->num_not_in_graph >= data->limit) {
+				result = 1;
+				break;
+			}
+
+			commit_list_append(parent->item, &stack);
+		}
+	}
+
+	free_commit_list(stack);
+	return result;
+}
+
+static int should_write_commit_graph(void)
+{
+	int result;
+	struct cg_auto_data data;
+
+	data.num_not_in_graph = 0;
+	data.limit = 100;
+	git_config_get_int("maintenance.commit-graph.auto",
+			   &data.limit);
+
+	if (!data.limit)
+		return 0;
+	if (data.limit < 0)
+		return 1;
+
+	result = for_each_ref(dfs_on_ref, &data);
+
+	clear_commit_marks_all(SEEN);
+
+	return result;
+}
+
 static int run_write_commit_graph(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -790,6 +871,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_COMMIT_GRAPH] = {
 		"commit-graph",
 		maintenance_task_commit_graph,
+		should_write_commit_graph,
 	},
 };
 
diff --git a/object.h b/object.h
index 96a2105859..20b18805f0 100644
--- a/object.h
+++ b/object.h
@@ -73,6 +73,7 @@ struct object_array {
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
+ * builtin/gc.c:             0
  * builtin/index-pack.c:                                     2021
  * builtin/pack-objects.c:                                   20
  * builtin/reflog.c:                   10--12
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 11/11] maintenance: add trace2 regions for task execution
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (9 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-25 18:33     ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 8c4edf19ba..c3bcdc1167 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -927,10 +927,12 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
 		}
+		trace2_region_leave("maintenance", tasks[i].name, r);
 	}
 
 	rollback_lock_file(&lk);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3 07/11] maintenance: take a lock on the objects directory
  2020-08-25 18:33     ` [PATCH v3 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-08-26 23:02       ` Jonathan Tan
  0 siblings, 0 replies; 91+ messages in thread
From: Jonathan Tan @ 2020-08-26 23:02 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> If the lock file already exists, then fail with a warning. If '--auto'
> is specified, then instead no warning is shown and no tasks are attempted.

Maybe just delete these 2 lines - you're not failing, and you're
refraining from attempting all tasks regardless.

With or without this change, patches 1-7 look good.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3 10/11] maintenance: add auto condition for commit-graph task
  2020-08-25 18:33     ` [PATCH v3 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
@ 2020-08-26 23:02       ` Jonathan Tan
  2020-08-26 23:56         ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Jonathan Tan @ 2020-08-26 23:02 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, jonathantanmy,
	derrickstolee, dstolee

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Instead of writing a new commit-graph in every 'git maintenance run
> --auto' process (when maintenance.commit-graph.enalbed is configured to
> be true), only write when there are "enough" commits not in a
> commit-graph file.
> 
> This count is controlled by the maintenance.commit-graph.auto config
> option.
> 
> To compute the count, use a depth-first search starting at each ref, and
> leaving markers using the PARENT1 flag. If this count reaches the limit,

PARENT1 -> SEEN

Other than that, all the 11 patches look good.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3 10/11] maintenance: add auto condition for commit-graph task
  2020-08-26 23:02       ` Jonathan Tan
@ 2020-08-26 23:56         ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2020-08-26 23:56 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: gitgitgadget, git, sandals, steadmon, jrnieder, peff, congdanhqx,
	phillip.wood123, emilyshaffer, sluongng, derrickstolee, dstolee

Jonathan Tan <jonathantanmy@google.com> writes:

>> From: Derrick Stolee <dstolee@microsoft.com>
>> 
>> Instead of writing a new commit-graph in every 'git maintenance run
>> --auto' process (when maintenance.commit-graph.enalbed is configured to
>> be true), only write when there are "enough" commits not in a
>> commit-graph file.
>> 
>> This count is controlled by the maintenance.commit-graph.auto config
>> option.
>> 
>> To compute the count, use a depth-first search starting at each ref, and
>> leaving markers using the PARENT1 flag. If this count reaches the limit,
>
> PARENT1 -> SEEN
>
> Other than that, all the 11 patches look good.

Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (10 preceding siblings ...)
  2020-08-25 18:33     ` [PATCH v3 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09     ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
                         ` (11 more replies)
  11 siblings, 12 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee

This series is based on jc/no-update-fetch-head.

This patch series contains 11patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rererelogs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
follow in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task= 
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions for which the
tasks run during the '--auto' option.

 Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Updates since v3
================

 * Two commit-message typos are fixed.

Thanks for all of the review!

Updates since v2
================

 * Based on jc/no-update-fetch-head instead of jk/strvec.
   
   
 * I realized that the other maintenance subcommands should not accept the
   options for the 'run' subcommand, so I reorganized the option parsing
   into that subcommand. This makes the range-diff noisier than it would
   have been otherwise.
   
   
 * While updating the parsing, I also updated the usage strings.
   
   
 * The "verify, then delete and rewrite on failure" logic is removed. I'll
   consider bringing this back with a way to test the behavior in a later
   patch series.
   
   
 * Other feedback from Jonathan Tan is applied.
   
   

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no- [quiet%7Cprogress] 
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Cc: sandals@crustytoothpaste.net [sandals@crustytoothpaste.net], 
steadmon@google.com [steadmon@google.com], jrnieder@gmail.com
[jrnieder@gmail.com], peff@peff.net [peff@peff.net], congdanhqx@gmail.com
[congdanhqx@gmail.com], phillip.wood123@gmail.com
[phillip.wood123@gmail.com], emilyshaffer@google.com
[emilyshaffer@google.com], sluongng@gmail.com [sluongng@gmail.com], 
jonathantanmy@google.com [jonathantanmy@google.com]

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  16 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 +++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 336 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 command-list.txt                     |   1 +
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  63 +++++
 t/test-lib-functions.sh              |  33 +++
 24 files changed, 565 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: 887952b8c680626f4721cb5fa57704478801aca4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v3:

  1:  aa961af387 =  1:  aa961af387 maintenance: create basic maintenance runner
  2:  5386d8a628 =  2:  5386d8a628 maintenance: add --quiet option
  3:  e28b332df4 =  3:  e28b332df4 maintenance: replace run_auto_gc()
  4:  82326c1c38 =  4:  82326c1c38 maintenance: initialize task array
  5:  06984a42bf =  5:  06984a42bf maintenance: add commit-graph task
  6:  69298aee24 =  6:  69298aee24 maintenance: add --task option
  7:  3d513acdd8 !  7:  c57998a1c8 maintenance: take a lock on the objects directory
     @@ Commit message
          lock is never committed, since it does not represent meaningful data.
          Instead, it is only a placeholder.
      
     -    If the lock file already exists, then fail with a warning. If '--auto'
     -    is specified, then instead no warning is shown and no tasks are attempted.
     -    This will become very important later when we implement the 'prefetch'
     -    task, as this is our stop-gap from creating a recursive process loop
     -    between 'git fetch' and 'git maintenance run --auto'.
     +    If the lock file already exists, then no maintenance tasks are
     +    attempted. This will become very important later when we implement the
     +    'prefetch' task, as this is our stop-gap from creating a recursive process
     +    loop between 'git fetch' and 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
  8:  712f5f2d8e =  8:  dc2a092366 maintenance: create maintenance.<task>.enabled config
  9:  69d3b48fd4 =  9:  f5f1e85b03 maintenance: use pointers to check --auto
 10:  4c3115fe35 ! 10:  8f84b03c46 maintenance: add auto condition for commit-graph task
     @@ Commit message
          option.
      
          To compute the count, use a depth-first search starting at each ref, and
     -    leaving markers using the PARENT1 flag. If this count reaches the limit,
     +    leaving markers using the SEEN flag. If this count reaches the limit,
          then terminate early and start the task. Otherwise, this operation will
          peel every ref and parse the commit it points to. If these are all in
          the commit-graph, then this is typically a very fast operation. Users
 11:  652a8eac57 = 11:  2d6414d89b maintenance: add trace2 regions for task execution

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 01/11] maintenance: create basic maintenance runner
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
                         ` (10 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                        |  1 +
 Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
 builtin.h                         |  1 +
 builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
 command-list.txt                  |  1 +
 git.c                             |  1 +
 t/t7900-maintenance.sh            | 21 ++++++++++++
 t/test-lib-functions.sh           | 33 ++++++++++++++++++
 8 files changed, 172 insertions(+)
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..a5808fa30d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -90,6 +90,7 @@
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
new file mode 100644
index 0000000000..ff47fb3641
--- /dev/null
+++ b/Documentation/git-maintenance.txt
@@ -0,0 +1,57 @@
+git-maintenance(1)
+==================
+
+NAME
+----
+git-maintenance - Run tasks to optimize Git repository data
+
+
+SYNOPSIS
+--------
+[verse]
+'git maintenance' run [<options>]
+
+
+DESCRIPTION
+-----------
+Run tasks to optimize Git repository data, speeding up other Git commands
+and reducing storage requirements for the repository.
+
+Git commands that add repository data, such as `git add` or `git fetch`,
+are optimized for a responsive user experience. These commands do not take
+time to optimize the Git data, since such optimizations scale with the full
+size of the repository while these user commands each perform a relatively
+small action.
+
+The `git maintenance` command provides flexibility for how to optimize the
+Git repository.
+
+SUBCOMMANDS
+-----------
+
+run::
+	Run one or more maintenance tasks.
+
+TASKS
+-----
+
+gc::
+	Clean up unnecessary files and optimize the local repository. "GC"
+	stands for "garbage collection," but this task performs many
+	smaller tasks. This task can be expensive for large repositories,
+	as it repacks all Git objects into a single pack-file. It can also
+	be disruptive in some situations, as it deletes stale data. See
+	linkgit:git-gc[1] for more details on garbage collection in Git.
+
+OPTIONS
+-------
+--auto::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain thresholds are met. For example, the `gc` task
+	runs when the number of loose objects exceeds the number stored
+	in the `gc.auto` config setting, or when the number of pack-files
+	exceeds the `gc.autoPackLimit` config setting.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..17c1c0ce49 100644
--- a/builtin.h
+++ b/builtin.h
@@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gc.c b/builtin/gc.c
index aafa0946f5..7f2771e786 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	return 0;
 }
+
+static const char * const builtin_maintenance_run_usage[] = {
+	N_("git maintenance run [--auto]"),
+	NULL
+};
+
+struct maintenance_run_opts {
+	int auto_flag;
+};
+
+static int maintenance_task_gc(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_push(&child.args, "gc");
+
+	if (opts->auto_flag)
+		strvec_push(&child.args, "--auto");
+
+	close_object_store(the_repository->objects);
+	return run_command(&child);
+}
+
+static int maintenance_run(int argc, const char **argv, const char *prefix)
+{
+	struct maintenance_run_opts opts;
+	struct option builtin_maintenance_run_options[] = {
+		OPT_BOOL(0, "auto", &opts.auto_flag,
+			 N_("run tasks based on the state of the repository")),
+		OPT_END()
+	};
+	memset(&opts, 0, sizeof(opts));
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_maintenance_run_options,
+			     builtin_maintenance_run_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (argc != 0)
+		usage_with_options(builtin_maintenance_run_usage,
+				   builtin_maintenance_run_options);
+	return maintenance_task_gc(&opts);
+}
+
+static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
+
+int cmd_maintenance(int argc, const char **argv, const char *prefix)
+{
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage(builtin_maintenance_usage);
+
+	if (!strcmp(argv[1], "run"))
+		return maintenance_run(argc - 1, argv + 1, prefix);
+
+	die(_("invalid subcommand: %s"), argv[1]);
+}
diff --git a/command-list.txt b/command-list.txt
index e5901f2213..0e3204e7d1 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -117,6 +117,7 @@ git-ls-remote                           plumbinginterrogators
 git-ls-tree                             plumbinginterrogators
 git-mailinfo                            purehelpers
 git-mailsplit                           purehelpers
+git-maintenance                         mainporcelain
 git-merge                               mainporcelain           history
 git-merge-base                          plumbinginterrogators
 git-merge-file                          plumbingmanipulators
diff --git a/git.c b/git.c
index 8bd1d7551d..24f250d29a 100644
--- a/git.c
+++ b/git.c
@@ -529,6 +529,7 @@ static struct cmd_struct commands[] = {
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
+	{ "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE },
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
new file mode 100755
index 0000000000..14aa691f19
--- /dev/null
+++ b/t/t7900-maintenance.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+
+test_description='git maintenance builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'help text' '
+	test_expect_code 129 git maintenance -h 2>err &&
+	test_i18ngrep "usage: git maintenance run" err &&
+	test_expect_code 128 git maintenance barf 2>err &&
+	test_i18ngrep "invalid subcommand: barf" err
+'
+
+test_expect_success 'run [--auto]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
+	test_subcommand git gc <run-no-auto.txt &&
+	test_subcommand git gc --auto <run-auto.txt
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6a8e194a99..d805e73f45 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1628,3 +1628,36 @@ test_path_is_hidden () {
 	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
 	return 1
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_subcommand [!] <command> <args>... < <trace>
+#
+# For example, to look for an invocation of "git upload-pack
+# /path/to/repo"
+#
+#	GIT_TRACE2_EVENT=event.log git fetch ... &&
+#	test_subcommand git upload-pack "$PATH" <event.log
+#
+# If the first parameter passed is !, this instead checks that
+# the given command was not called.
+#
+test_subcommand () {
+	local negate=
+	if test "$1" = "!"
+	then
+		negate=t
+		shift
+	fi
+
+	local expr=$(printf '"%s",' "$@")
+	expr="${expr%,}"
+
+	if test -n "$negate"
+	then
+		! grep "\[$expr\]"
+	else
+		grep "\[$expr\]"
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 02/11] maintenance: add --quiet option
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
                         ` (9 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  3 +++
 builtin/gc.c                      | 11 ++++++++++-
 t/t7900-maintenance.sh            | 15 ++++++++++-----
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index ff47fb3641..04fa0fe329 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -52,6 +52,9 @@ OPTIONS
 	in the `gc.auto` config setting, or when the number of pack-files
 	exceeds the `gc.autoPackLimit` config setting.
 
+--quiet::
+	Do not report progress or other information over `stderr`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 7f2771e786..da11f1edcd 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,12 +701,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto]"),
+	N_("git maintenance run [--auto] [--[no-]quiet]"),
 	NULL
 };
 
 struct maintenance_run_opts {
 	int auto_flag;
+	int quiet;
 };
 
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
@@ -718,6 +719,10 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 	if (opts->auto_flag)
 		strvec_push(&child.args, "--auto");
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	else
+		strvec_push(&child.args, "--no-quiet");
 
 	close_object_store(the_repository->objects);
 	return run_command(&child);
@@ -729,10 +734,14 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "quiet", &opts.quiet,
+			 N_("do not report progress or other information over stderr")),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
+	opts.quiet = !isatty(2);
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 14aa691f19..47d512464c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -11,11 +11,16 @@ test_expect_success 'help text' '
 	test_i18ngrep "invalid subcommand: barf" err
 '
 
-test_expect_success 'run [--auto]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
-	test_subcommand git gc <run-no-auto.txt &&
-	test_subcommand git gc --auto <run-auto.txt
+test_expect_success 'run [--auto|--quiet]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
+		git maintenance run 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
+		git maintenance run --auto 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
+		git maintenance run --no-quiet 2>/dev/null &&
+	test_subcommand git gc --quiet <run-no-auto.txt &&
+	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 03/11] maintenance: replace run_auto_gc()
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
                         ` (8 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/fetch-options.txt |  6 ++++--
 Documentation/git-clone.txt     |  6 +++---
 builtin/am.c                    |  2 +-
 builtin/commit.c                |  2 +-
 builtin/fetch.c                 |  6 ++++--
 builtin/merge.c                 |  2 +-
 builtin/rebase.c                |  4 ++--
 run-command.c                   | 16 +++++++---------
 run-command.h                   |  2 +-
 t/t5510-fetch.sh                |  2 +-
 10 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 07d06ff445..b65a758661 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -95,9 +95,11 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-maintenance::
 --[no-]auto-gc::
-	Run `git gc --auto` at the end to perform garbage collection
-	if needed. This is enabled by default.
+	Run `git maintenance run --auto` at the end to perform automatic
+	repository maintenance if needed. (`--[no-]auto-gc` is a synonym.)
+	This is enabled by default.
 
 --[no-]write-commit-graph::
 	Write a commit-graph after fetching. This overrides the config
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index c898310099..097e6a86c5 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -78,9 +78,9 @@ repository using this option and then delete branches (or use any
 other Git command that makes any existing commit unreferenced) in the
 source repository, some objects may become unreferenced (or dangling).
 These objects may be removed by normal Git operations (such as `git commit`)
-which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-If these objects are removed and were referenced by the cloned repository,
-then the cloned repository will become corrupt.
+which automatically call `git maintenance run --auto`. (See
+linkgit:git-maintenance[1].) If these objects are removed and were referenced
+by the cloned repository, then the cloned repository will become corrupt.
 +
 Note that running `git repack` without the `--local` option in a repository
 cloned with `--shared` will copy objects from the source repository into a pack
diff --git a/builtin/am.c b/builtin/am.c
index dd4e6c2d9b..68e9d17379 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
 	if (!state->rebasing) {
 		am_destroy(state);
 		close_object_store(the_repository->objects);
-		run_auto_gc(state->quiet);
+		run_auto_maintenance(state->quiet);
 	}
 }
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 69ac78d5e5..f9b0a0c05d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_gc(quiet);
+	run_auto_maintenance(quiet);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 2eb8d6a5a5..cb38e6f5ec 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -199,8 +199,10 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-maintenance", &enable_auto_gc,
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
-		 N_("run 'gc --auto' after fetching")),
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
@@ -1891,7 +1893,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc)
-		run_auto_gc(verbosity < 0);
+		run_auto_maintenance(verbosity < 0);
 
 	return result;
 }
diff --git a/builtin/merge.c b/builtin/merge.c
index 74829a838e..8f31bfab56 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -456,7 +456,7 @@ static void finish(struct commit *head_commit,
 			 * user should see them.
 			 */
 			close_object_store(the_repository->objects);
-			run_auto_gc(verbosity < 0);
+			run_auto_maintenance(verbosity < 0);
 		}
 	}
 	if (new_head && show_diffstat) {
diff --git a/builtin/rebase.c b/builtin/rebase.c
index dadb52fa92..9ffba9009d 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -728,10 +728,10 @@ static int finish_rebase(struct rebase_options *opts)
 	apply_autostash(state_dir_path("autostash", opts));
 	close_object_store(the_repository->objects);
 	/*
-	 * We ignore errors in 'gc --auto', since the
+	 * We ignore errors in 'git maintenance run --auto', since the
 	 * user should see them.
 	 */
-	run_auto_gc(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
+	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
 	if (opts->type == REBASE_MERGE) {
 		struct replay_opts replay = REPLAY_OPTS_INIT;
 
diff --git a/run-command.c b/run-command.c
index cc9c3296ba..2ee59acdc8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1866,15 +1866,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 	return result;
 }
 
-int run_auto_gc(int quiet)
+int run_auto_maintenance(int quiet)
 {
-	struct strvec argv_gc_auto = STRVEC_INIT;
-	int status;
+	struct child_process maint = CHILD_PROCESS_INIT;
 
-	strvec_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (quiet)
-		strvec_push(&argv_gc_auto, "--quiet");
-	status = run_command_v_opt(argv_gc_auto.v, RUN_GIT_CMD);
-	strvec_clear(&argv_gc_auto);
-	return status;
+	maint.git_cmd = 1;
+	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
+	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
+
+	return run_command(&maint);
 }
diff --git a/run-command.h b/run-command.h
index 8b9bfaef16..6472b38bde 100644
--- a/run-command.h
+++ b/run-command.h
@@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args);
 /*
  * Trigger an auto-gc
  */
-int run_auto_gc(int quiet);
+int run_auto_maintenance(int quiet);
 
 #define RUN_COMMAND_NO_STDIN 1
 #define RUN_GIT_CMD	     2	/*If this is to be git sub-command */
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2a1abe91f0..a9682c5669 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -934,7 +934,7 @@ test_expect_success 'fetching with auto-gc does not lock up' '
 		git config fetch.unpackLimit 1 &&
 		git config gc.autoPackLimit 1 &&
 		git config gc.autoDetach false &&
-		GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 &&
+		GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 &&
 		test_i18ngrep "Auto packing the repository" fetch.out &&
 		! grep "Should I try again" fetch.out
 	)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 04/11] maintenance: initialize task array
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (2 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
                         ` (7 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.

The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.

A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.

The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index da11f1edcd..81643515e5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -728,6 +728,47 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 	return run_command(&child);
 }
 
+typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
+
+struct maintenance_task {
+	const char *name;
+	maintenance_task_fn *fn;
+	unsigned enabled:1;
+};
+
+enum maintenance_task_label {
+	TASK_GC,
+
+	/* Leave as final value */
+	TASK__COUNT
+};
+
+static struct maintenance_task tasks[] = {
+	[TASK_GC] = {
+		"gc",
+		maintenance_task_gc,
+		1,
+	},
+};
+
+static int maintenance_run_tasks(struct maintenance_run_opts *opts)
+{
+	int i;
+	int result = 0;
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (!tasks[i].enabled)
+			continue;
+
+		if (tasks[i].fn(opts)) {
+			error(_("task '%s' failed"), tasks[i].name);
+			result = 1;
+		}
+	}
+
+	return result;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
 	struct maintenance_run_opts opts;
@@ -750,7 +791,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
-	return maintenance_task_gc(&opts);
+	return maintenance_run_tasks(&opts);
 }
 
 static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 05/11] maintenance: add commit-graph task
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (3 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
                         ` (6 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  8 ++++++++
 builtin/gc.c                      | 30 ++++++++++++++++++++++++++++++
 commit-graph.c                    |  8 ++++----
 commit-graph.h                    |  1 +
 t/t7900-maintenance.sh            |  2 ++
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 04fa0fe329..fc5dbcf0b9 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -35,6 +35,14 @@ run::
 TASKS
 -----
 
+commit-graph::
+	The `commit-graph` job updates the `commit-graph` files incrementally,
+	then verifies that the written data is correct. The incremental
+	write is safe to run alongside concurrent Git processes since it
+	will not expire `.graph` files that were in the previous
+	`commit-graph-chain` file. They will be deleted by a later run based
+	on the expiration delay.
+
 gc::
 	Clean up unnecessary files and optimize the local repository. "GC"
 	stands for "garbage collection," but this task performs many
diff --git a/builtin/gc.c b/builtin/gc.c
index 81643515e5..16e567992e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -710,6 +710,31 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+static int run_write_commit_graph(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "write",
+		     "--split", "--reachable", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
+{
+	close_object_store(the_repository->objects);
+	if (run_write_commit_graph(opts)) {
+		error(_("failed to write commit-graph"));
+		return 1;
+	}
+
+	return 0;
+}
+
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -738,6 +763,7 @@ struct maintenance_task {
 
 enum maintenance_task_label {
 	TASK_GC,
+	TASK_COMMIT_GRAPH,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -749,6 +775,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_gc,
 		1,
 	},
+	[TASK_COMMIT_GRAPH] = {
+		"commit-graph",
+		maintenance_task_commit_graph,
+	},
 };
 
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
diff --git a/commit-graph.c b/commit-graph.c
index e51c91dd5b..a9b0db7d4a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -172,7 +172,7 @@ static char *get_split_graph_filename(struct object_directory *odb,
 		       oid_hex);
 }
 
-static char *get_chain_filename(struct object_directory *odb)
+char *get_commit_graph_chain_filename(struct object_directory *odb)
 {
 	return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
 }
@@ -516,7 +516,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r,
 	struct stat st;
 	struct object_id *oids;
 	int i = 0, valid = 1, count;
-	char *chain_name = get_chain_filename(odb);
+	char *chain_name = get_commit_graph_chain_filename(odb);
 	FILE *fp;
 	int stat_res;
 
@@ -1668,7 +1668,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	if (ctx->split) {
-		char *lock_name = get_chain_filename(ctx->odb);
+		char *lock_name = get_commit_graph_chain_filename(ctx->odb);
 
 		hold_lock_file_for_update_mode(&lk, lock_name,
 					       LOCK_DIE_ON_ERROR, 0444);
@@ -2038,7 +2038,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 	if (ctx->split_opts && ctx->split_opts->expire_time)
 		expire_time = ctx->split_opts->expire_time;
 	if (!ctx->split) {
-		char *chain_file_name = get_chain_filename(ctx->odb);
+		char *chain_file_name = get_commit_graph_chain_filename(ctx->odb);
 		unlink(chain_file_name);
 		free(chain_file_name);
 		ctx->num_commit_graphs_after = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 09a97030dc..765221cdcc 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -25,6 +25,7 @@ struct raw_object_store;
 struct string_list;
 
 char *get_commit_graph_filename(struct object_directory *odb);
+char *get_commit_graph_chain_filename(struct object_directory *odb);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 
 /*
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 47d512464c..c0c4e6846e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -4,6 +4,8 @@ test_description='git maintenance builtin'
 
 . ./test-lib.sh
 
+GIT_TEST_COMMIT_GRAPH=0
+
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 06/11] maintenance: add --task option
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (4 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
                         ` (5 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  9 ++++-
 builtin/gc.c                      | 66 +++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 27 +++++++++++++
 3 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index fc5dbcf0b9..819ca41ab6 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,7 +30,9 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks.
+	Run one or more maintenance tasks. If one or more `--task=<task>`
+	options are specified, then those tasks are run in the provided
+	order. Otherwise, only the `gc` task is run.
 
 TASKS
 -----
@@ -63,6 +65,11 @@ OPTIONS
 --quiet::
 	Do not report progress or other information over `stderr`.
 
+--task=<task>::
+	If this option is specified one or more times, then only run the
+	specified tasks in the specified order. See the 'TASKS' section
+	for the list of accepted `<task>` values.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 16e567992e..e94f263c77 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,7 +701,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto] [--[no-]quiet]"),
+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
 	NULL
 };
 
@@ -759,6 +759,9 @@ struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
 	unsigned enabled:1;
+
+	/* -1 if not selected. */
+	int selected_order;
 };
 
 enum maintenance_task_label {
@@ -781,13 +784,32 @@ static struct maintenance_task tasks[] = {
 	},
 };
 
+static int compare_tasks_by_selection(const void *a_, const void *b_)
+{
+	const struct maintenance_task *a, *b;
+
+	a = (const struct maintenance_task *)&a_;
+	b = (const struct maintenance_task *)&b_;
+
+	return b->selected_order - a->selected_order;
+}
+
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
-	int i;
+	int i, found_selected = 0;
 	int result = 0;
 
+	for (i = 0; !found_selected && i < TASK__COUNT; i++)
+		found_selected = tasks[i].selected_order >= 0;
+
+	if (found_selected)
+		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
+
 	for (i = 0; i < TASK__COUNT; i++) {
-		if (!tasks[i].enabled)
+		if (found_selected && tasks[i].selected_order < 0)
+			continue;
+
+		if (!found_selected && !tasks[i].enabled)
 			continue;
 
 		if (tasks[i].fn(opts)) {
@@ -799,20 +821,58 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static int task_option_parse(const struct option *opt,
+			     const char *arg, int unset)
+{
+	int i, num_selected = 0;
+	struct maintenance_task *task = NULL;
+
+	BUG_ON_OPT_NEG(unset);
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (tasks[i].selected_order >= 0)
+			num_selected++;
+		if (!strcasecmp(tasks[i].name, arg)) {
+			task = &tasks[i];
+		}
+	}
+
+	if (!task) {
+		error(_("'%s' is not a valid task"), arg);
+		return 1;
+	}
+
+	if (task->selected_order >= 0) {
+		error(_("task '%s' cannot be selected multiple times"), arg);
+		return 1;
+	}
+
+	task->selected_order = num_selected + 1;
+
+	return 0;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
+	int i;
 	struct maintenance_run_opts opts;
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
+		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
+			N_("run a specific task"),
+			PARSE_OPT_NONEG, task_option_parse),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
 
+	for (i = 0; i < TASK__COUNT; i++)
+		tasks[i].selected_order = -1;
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index c0c4e6846e..792765aff7 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,4 +25,31 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'run --task=<task>' '
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
+		git maintenance run --task=gc 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
+		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
+	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
+	test_subcommand git gc --quiet <run-gc.txt &&
+	test_subcommand git gc --quiet <run-both.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
+	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
+'
+
+test_expect_success 'run --task=bogus' '
+	test_must_fail git maintenance run --task=bogus 2>err &&
+	test_i18ngrep "is not a valid task" err
+'
+
+test_expect_success 'run --task duplicate' '
+	test_must_fail git maintenance run --task=gc --task=gc 2>err &&
+	test_i18ngrep "cannot be selected multiple times" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 07/11] maintenance: take a lock on the objects directory
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (5 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
                         ` (4 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index e94f263c77..1cebb7282d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -798,6 +798,25 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
 	int i, found_selected = 0;
 	int result = 0;
+	struct lock_file lk;
+	struct repository *r = the_repository;
+	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
+		/*
+		 * Another maintenance command is running.
+		 *
+		 * If --auto was provided, then it is likely due to a
+		 * recursive process stack. Do not report an error in
+		 * that case.
+		 */
+		if (!opts->auto_flag && !opts->quiet)
+			warning(_("lock file '%s' exists, skipping maintenance"),
+				lock_path);
+		free(lock_path);
+		return 0;
+	}
+	free(lock_path);
 
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected_order >= 0;
@@ -818,6 +837,7 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		}
 	}
 
+	rollback_lock_file(&lk);
 	return result;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 08/11] maintenance: create maintenance.<task>.enabled config
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (6 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
                         ` (3 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt             |  2 ++
 Documentation/config/maintenance.txt |  6 ++++++
 Documentation/git-maintenance.txt    | 14 +++++++++-----
 builtin/gc.c                         | 19 +++++++++++++++++++
 t/t7900-maintenance.sh               |  8 ++++++++
 5 files changed, 44 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3042d80978..f93b6837e4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -398,6 +398,8 @@ include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
 
+include::config/maintenance.txt[]
+
 include::config/man.txt[]
 
 include::config/merge.txt[]
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
new file mode 100644
index 0000000000..4402b8b49f
--- /dev/null
+++ b/Documentation/config/maintenance.txt
@@ -0,0 +1,6 @@
+maintenance.<task>.enabled::
+	This boolean config option controls whether the maintenance task
+	with name `<task>` is run when no `--task` option is specified to
+	`git maintenance run`. These config values are ignored if a
+	`--task` option exists. By default, only `maintenance.gc.enabled`
+	is true.
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 819ca41ab6..6abcb8255a 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,9 +30,11 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks. If one or more `--task=<task>`
-	options are specified, then those tasks are run in the provided
-	order. Otherwise, only the `gc` task is run.
+	Run one or more maintenance tasks. If one or more `--task` options
+	are specified, then those tasks are run in that order. Otherwise,
+	the tasks are determined by which `maintenance.<task>.enabled`
+	config options are true. By default, only `maintenance.gc.enabled`
+	is true.
 
 TASKS
 -----
@@ -67,8 +69,10 @@ OPTIONS
 
 --task=<task>::
 	If this option is specified one or more times, then only run the
-	specified tasks in the specified order. See the 'TASKS' section
-	for the list of accepted `<task>` values.
+	specified tasks in the specified order. If no `--task=<task>`
+	arguments are specified, then only the tasks with
+	`maintenance.<task>.enabled` configured as `true` are considered.
+	See the 'TASKS' section for the list of accepted `<task>` values.
 
 GIT
 ---
diff --git a/builtin/gc.c b/builtin/gc.c
index 1cebb7282d..67a8d405a1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -841,6 +841,24 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static void initialize_task_config(void)
+{
+	int i;
+	struct strbuf config_name = STRBUF_INIT;
+	for (i = 0; i < TASK__COUNT; i++) {
+		int config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.enabled",
+			    tasks[i].name);
+
+		if (!git_config_get_bool(config_name.buf, &config_value))
+			tasks[i].enabled = config_value;
+	}
+
+	strbuf_release(&config_name);
+}
+
 static int task_option_parse(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -889,6 +907,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
+	initialize_task_config();
 
 	for (i = 0; i < TASK__COUNT; i++)
 		tasks[i].selected_order = -1;
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 792765aff7..290abb7832 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -25,6 +25,14 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.<task>.enabled' '
+	git config maintenance.gc.enabled false &&
+	git config maintenance.commit-graph.enabled true &&
+	GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
+	test_subcommand ! git gc --quiet <run-config.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
+'
+
 test_expect_success 'run --task=<task>' '
 	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
 		git maintenance run --task=commit-graph 2>/dev/null &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 09/11] maintenance: use pointers to check --auto
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (7 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
                         ` (2 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c              | 16 ++++++++++++++++
 t/t5514-fetch-multiple.sh |  2 +-
 t/t7900-maintenance.sh    |  2 +-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 67a8d405a1..709d13553b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -755,9 +755,17 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
 
+/*
+ * An auto condition function returns 1 if the task should run
+ * and 0 if the task should NOT run. See needs_to_gc() for an
+ * example.
+ */
+typedef int maintenance_auto_fn(void);
+
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
+	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1;
 
 	/* -1 if not selected. */
@@ -776,6 +784,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_GC] = {
 		"gc",
 		maintenance_task_gc,
+		need_to_gc,
 		1,
 	},
 	[TASK_COMMIT_GRAPH] = {
@@ -831,6 +840,11 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		if (!found_selected && !tasks[i].enabled)
 			continue;
 
+		if (opts->auto_flag &&
+		    (!tasks[i].auto_condition ||
+		     !tasks[i].auto_condition()))
+			continue;
+
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
@@ -845,6 +859,8 @@ static void initialize_task_config(void)
 {
 	int i;
 	struct strbuf config_name = STRBUF_INIT;
+	gc_config();
+
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index de8e2f1531..bd202ec6f3 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -108,7 +108,7 @@ test_expect_success 'git fetch --multiple (two remotes)' '
 	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
 	 test_cmp ../expect output &&
-	 grep "built-in: git gc" trace >gc &&
+	 grep "built-in: git maintenance" trace >gc &&
 	 test_line_count = 1 gc
 	)
 '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 290abb7832..4f6a04ddb1 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -21,7 +21,7 @@ test_expect_success 'run [--auto|--quiet]' '
 	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
 		git maintenance run --no-quiet 2>/dev/null &&
 	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 10/11] maintenance: add auto condition for commit-graph task
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (8 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-04 13:09       ` [PATCH v4 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt | 10 ++++
 builtin/gc.c                         | 82 ++++++++++++++++++++++++++++
 object.h                             |  1 +
 3 files changed, 93 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 4402b8b49f..7cc6700d57 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -4,3 +4,13 @@ maintenance.<task>.enabled::
 	`git maintenance run`. These config values are ignored if a
 	`--task` option exists. By default, only `maintenance.gc.enabled`
 	is true.
+
+maintenance.commit-graph.auto::
+	This integer config option controls how often the `commit-graph` task
+	should be run as part of `git maintenance run --auto`. If zero, then
+	the `commit-graph` task will not run with the `--auto` option. A
+	negative value will force the task to run every time. Otherwise, a
+	positive value implies the command should run when the number of
+	reachable commits that are not in the commit-graph file is at least
+	the value of `maintenance.commit-graph.auto`. The default value is
+	100.
diff --git a/builtin/gc.c b/builtin/gc.c
index 709d13553b..8c4edf19ba 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,6 +28,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "promisor-remote.h"
+#include "refs.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -710,6 +711,86 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+/* Remember to update object flag allocation in object.h */
+#define SEEN		(1u<<0)
+
+struct cg_auto_data {
+	int num_not_in_graph;
+	int limit;
+};
+
+static int dfs_on_ref(const char *refname,
+		      const struct object_id *oid, int flags,
+		      void *cb_data)
+{
+	struct cg_auto_data *data = (struct cg_auto_data *)cb_data;
+	int result = 0;
+	struct object_id peeled;
+	struct commit_list *stack = NULL;
+	struct commit *commit;
+
+	if (!peel_ref(refname, &peeled))
+		oid = &peeled;
+	if (oid_object_info(the_repository, oid, NULL) != OBJ_COMMIT)
+		return 0;
+
+	commit = lookup_commit(the_repository, oid);
+	if (!commit)
+		return 0;
+	if (parse_commit(commit))
+		return 0;
+
+	commit_list_append(commit, &stack);
+
+	while (!result && stack) {
+		struct commit_list *parent;
+
+		commit = pop_commit(&stack);
+
+		for (parent = commit->parents; parent; parent = parent->next) {
+			if (parse_commit(parent->item) ||
+			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
+			    parent->item->object.flags & SEEN)
+				continue;
+
+			parent->item->object.flags |= SEEN;
+			data->num_not_in_graph++;
+
+			if (data->num_not_in_graph >= data->limit) {
+				result = 1;
+				break;
+			}
+
+			commit_list_append(parent->item, &stack);
+		}
+	}
+
+	free_commit_list(stack);
+	return result;
+}
+
+static int should_write_commit_graph(void)
+{
+	int result;
+	struct cg_auto_data data;
+
+	data.num_not_in_graph = 0;
+	data.limit = 100;
+	git_config_get_int("maintenance.commit-graph.auto",
+			   &data.limit);
+
+	if (!data.limit)
+		return 0;
+	if (data.limit < 0)
+		return 1;
+
+	result = for_each_ref(dfs_on_ref, &data);
+
+	clear_commit_marks_all(SEEN);
+
+	return result;
+}
+
 static int run_write_commit_graph(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -790,6 +871,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_COMMIT_GRAPH] = {
 		"commit-graph",
 		maintenance_task_commit_graph,
+		should_write_commit_graph,
 	},
 };
 
diff --git a/object.h b/object.h
index 96a2105859..20b18805f0 100644
--- a/object.h
+++ b/object.h
@@ -73,6 +73,7 @@ struct object_array {
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
+ * builtin/gc.c:             0
  * builtin/index-pack.c:                                     2021
  * builtin/pack-objects.c:                                   20
  * builtin/reflog.c:                   10--12
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v4 11/11] maintenance: add trace2 regions for task execution
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (9 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
@ 2020-09-04 13:09       ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 8c4edf19ba..c3bcdc1167 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -927,10 +927,12 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
 		}
+		trace2_region_leave("maintenance", tasks[i].name, r);
 	}
 
 	rollback_lock_file(&lk);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                         ` (10 preceding siblings ...)
  2020-09-04 13:09       ` [PATCH v4 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11       ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
                           ` (11 more replies)
  11 siblings, 12 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee

This series is based on jc/no-update-fetch-head.

This patch series contains 11patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rererelogs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
follow in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task= 
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions for which the
tasks run during the '--auto' option.

 Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Update in v4
============

A segfault when running just "git maintenance" is fixed.

Updates since v3
================

 * Two commit-message typos are fixed.

Thanks for all of the review!

Updates since v2
================

 * Based on jc/no-update-fetch-head instead of jk/strvec.
   
   
 * I realized that the other maintenance subcommands should not accept the
   options for the 'run' subcommand, so I reorganized the option parsing
   into that subcommand. This makes the range-diff noisier than it would
   have been otherwise.
   
   
 * While updating the parsing, I also updated the usage strings.
   
   
 * The "verify, then delete and rewrite on failure" logic is removed. I'll
   consider bringing this back with a way to test the behavior in a later
   patch series.
   
   
 * Other feedback from Jonathan Tan is applied.
   
   

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no- [quiet%7Cprogress] 
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Cc: sandals@crustytoothpaste.net [sandals@crustytoothpaste.net], 
steadmon@google.com [steadmon@google.com], jrnieder@gmail.com
[jrnieder@gmail.com], peff@peff.net [peff@peff.net], congdanhqx@gmail.com
[congdanhqx@gmail.com], phillip.wood123@gmail.com
[phillip.wood123@gmail.com], emilyshaffer@google.com
[emilyshaffer@google.com], sluongng@gmail.com [sluongng@gmail.com], 
jonathantanmy@google.com [jonathantanmy@google.com]

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  16 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 +++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 337 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 command-list.txt                     |   1 +
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  65 ++++++
 t/test-lib-functions.sh              |  33 +++
 24 files changed, 568 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: 887952b8c680626f4721cb5fa57704478801aca4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v4:

  1:  aa961af387 !  1:  00a0d63508 maintenance: create basic maintenance runner
     @@ Commit message
      
          Helped-by: Jonathan Nieder <jrnieder@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## .gitignore ##
      @@
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +
      +int cmd_maintenance(int argc, const char **argv, const char *prefix)
      +{
     -+	if (argc == 2 && !strcmp(argv[1], "-h"))
     ++	if (argc < 2 ||
     ++	    (argc == 2 && !strcmp(argv[1], "-h")))
      +		usage(builtin_maintenance_usage);
      +
      +	if (!strcmp(argv[1], "run"))
     @@ t/t7900-maintenance.sh (new)
      +	test_expect_code 129 git maintenance -h 2>err &&
      +	test_i18ngrep "usage: git maintenance run" err &&
      +	test_expect_code 128 git maintenance barf 2>err &&
     -+	test_i18ngrep "invalid subcommand: barf" err
     ++	test_i18ngrep "invalid subcommand: barf" err &&
     ++	test_expect_code 129 git maintenance 2>err &&
     ++	test_i18ngrep "usage: git maintenance" err
      +'
      +
      +test_expect_success 'run [--auto]' '
  2:  5386d8a628 !  2:  52eb937f49 maintenance: add --quiet option
     @@ Commit message
          Pipe the option to the 'git gc' child process.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: OPTIONS
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'help text' '
     - 	test_i18ngrep "invalid subcommand: barf" err
     + 	test_i18ngrep "usage: git maintenance" err
       '
       
      -test_expect_success 'run [--auto]' '
  3:  e28b332df4 !  3:  3cbdeeafb5 maintenance: replace run_auto_gc()
     @@ Commit message
          documentation to include these options at the same time.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/fetch-options.txt ##
      @@ Documentation/fetch-options.txt: ifndef::git-pull[]
  4:  82326c1c38 !  4:  1d2f2731bd maintenance: initialize task array
     @@ Commit message
          Helped-by: Taylor Blau <me@ttaylorr.com>
          Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
  5:  06984a42bf !  5:  8204ebbf83 maintenance: add commit-graph task
     @@ Commit message
          argument when writing the commit-graph.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: run::
  6:  69298aee24 !  6:  91b8555c9e maintenance: add --task option
     @@ Commit message
          member should be ignored if --task=<task> appears.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: SUBCOMMANDS
  7:  c57998a1c8 !  7:  1a0a3eebb8 maintenance: take a lock on the objects directory
     @@ Commit message
          loop between 'git fetch' and 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
  8:  dc2a092366 !  8:  713207b4a1 maintenance: create maintenance.<task>.enabled config
     @@ Commit message
          tasks (or turn off the 'gc' task).
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/config.txt ##
      @@ Documentation/config.txt: include::config/mailinfo.txt[]
  9:  f5f1e85b03 !  9:  d424cda058 maintenance: use pointers to check --auto
     @@ Commit message
          gc.autoDetach as a maintenance.autoDetach config option.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
 10:  8f84b03c46 ! 10:  7a2a4e1e52 maintenance: add auto condition for commit-graph task
     @@ Commit message
          run every time.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/config/maintenance.txt ##
      @@ Documentation/config/maintenance.txt: maintenance.<task>.enabled::
 11:  2d6414d89b ! 11:  20a74abd96 maintenance: add trace2 regions for task execution
     @@ Commit message
          maintenance: add trace2 regions for task execution
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 01/11] maintenance: create basic maintenance runner
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
                           ` (10 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore                        |  1 +
 Documentation/git-maintenance.txt | 57 ++++++++++++++++++++++++++++++
 builtin.h                         |  1 +
 builtin/gc.c                      | 58 +++++++++++++++++++++++++++++++
 command-list.txt                  |  1 +
 git.c                             |  1 +
 t/t7900-maintenance.sh            | 23 ++++++++++++
 t/test-lib-functions.sh           | 33 ++++++++++++++++++
 8 files changed, 175 insertions(+)
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..a5808fa30d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -90,6 +90,7 @@
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
new file mode 100644
index 0000000000..ff47fb3641
--- /dev/null
+++ b/Documentation/git-maintenance.txt
@@ -0,0 +1,57 @@
+git-maintenance(1)
+==================
+
+NAME
+----
+git-maintenance - Run tasks to optimize Git repository data
+
+
+SYNOPSIS
+--------
+[verse]
+'git maintenance' run [<options>]
+
+
+DESCRIPTION
+-----------
+Run tasks to optimize Git repository data, speeding up other Git commands
+and reducing storage requirements for the repository.
+
+Git commands that add repository data, such as `git add` or `git fetch`,
+are optimized for a responsive user experience. These commands do not take
+time to optimize the Git data, since such optimizations scale with the full
+size of the repository while these user commands each perform a relatively
+small action.
+
+The `git maintenance` command provides flexibility for how to optimize the
+Git repository.
+
+SUBCOMMANDS
+-----------
+
+run::
+	Run one or more maintenance tasks.
+
+TASKS
+-----
+
+gc::
+	Clean up unnecessary files and optimize the local repository. "GC"
+	stands for "garbage collection," but this task performs many
+	smaller tasks. This task can be expensive for large repositories,
+	as it repacks all Git objects into a single pack-file. It can also
+	be disruptive in some situations, as it deletes stale data. See
+	linkgit:git-gc[1] for more details on garbage collection in Git.
+
+OPTIONS
+-------
+--auto::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain thresholds are met. For example, the `gc` task
+	runs when the number of loose objects exceeds the number stored
+	in the `gc.auto` config setting, or when the number of pack-files
+	exceeds the `gc.autoPackLimit` config setting.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..17c1c0ce49 100644
--- a/builtin.h
+++ b/builtin.h
@@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gc.c b/builtin/gc.c
index aafa0946f5..ec064e8686 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -699,3 +699,61 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	return 0;
 }
+
+static const char * const builtin_maintenance_run_usage[] = {
+	N_("git maintenance run [--auto]"),
+	NULL
+};
+
+struct maintenance_run_opts {
+	int auto_flag;
+};
+
+static int maintenance_task_gc(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_push(&child.args, "gc");
+
+	if (opts->auto_flag)
+		strvec_push(&child.args, "--auto");
+
+	close_object_store(the_repository->objects);
+	return run_command(&child);
+}
+
+static int maintenance_run(int argc, const char **argv, const char *prefix)
+{
+	struct maintenance_run_opts opts;
+	struct option builtin_maintenance_run_options[] = {
+		OPT_BOOL(0, "auto", &opts.auto_flag,
+			 N_("run tasks based on the state of the repository")),
+		OPT_END()
+	};
+	memset(&opts, 0, sizeof(opts));
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_maintenance_run_options,
+			     builtin_maintenance_run_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (argc != 0)
+		usage_with_options(builtin_maintenance_run_usage,
+				   builtin_maintenance_run_options);
+	return maintenance_task_gc(&opts);
+}
+
+static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
+
+int cmd_maintenance(int argc, const char **argv, const char *prefix)
+{
+	if (argc < 2 ||
+	    (argc == 2 && !strcmp(argv[1], "-h")))
+		usage(builtin_maintenance_usage);
+
+	if (!strcmp(argv[1], "run"))
+		return maintenance_run(argc - 1, argv + 1, prefix);
+
+	die(_("invalid subcommand: %s"), argv[1]);
+}
diff --git a/command-list.txt b/command-list.txt
index e5901f2213..0e3204e7d1 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -117,6 +117,7 @@ git-ls-remote                           plumbinginterrogators
 git-ls-tree                             plumbinginterrogators
 git-mailinfo                            purehelpers
 git-mailsplit                           purehelpers
+git-maintenance                         mainporcelain
 git-merge                               mainporcelain           history
 git-merge-base                          plumbinginterrogators
 git-merge-file                          plumbingmanipulators
diff --git a/git.c b/git.c
index 8bd1d7551d..24f250d29a 100644
--- a/git.c
+++ b/git.c
@@ -529,6 +529,7 @@ static struct cmd_struct commands[] = {
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
+	{ "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE },
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
new file mode 100755
index 0000000000..c2f0b1d0c0
--- /dev/null
+++ b/t/t7900-maintenance.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='git maintenance builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'help text' '
+	test_expect_code 129 git maintenance -h 2>err &&
+	test_i18ngrep "usage: git maintenance run" err &&
+	test_expect_code 128 git maintenance barf 2>err &&
+	test_i18ngrep "invalid subcommand: barf" err &&
+	test_expect_code 129 git maintenance 2>err &&
+	test_i18ngrep "usage: git maintenance" err
+'
+
+test_expect_success 'run [--auto]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
+	test_subcommand git gc <run-no-auto.txt &&
+	test_subcommand git gc --auto <run-auto.txt
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6a8e194a99..d805e73f45 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1628,3 +1628,36 @@ test_path_is_hidden () {
 	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
 	return 1
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_subcommand [!] <command> <args>... < <trace>
+#
+# For example, to look for an invocation of "git upload-pack
+# /path/to/repo"
+#
+#	GIT_TRACE2_EVENT=event.log git fetch ... &&
+#	test_subcommand git upload-pack "$PATH" <event.log
+#
+# If the first parameter passed is !, this instead checks that
+# the given command was not called.
+#
+test_subcommand () {
+	local negate=
+	if test "$1" = "!"
+	then
+		negate=t
+		shift
+	fi
+
+	local expr=$(printf '"%s",' "$@")
+	expr="${expr%,}"
+
+	if test -n "$negate"
+	then
+		! grep "\[$expr\]"
+	else
+		grep "\[$expr\]"
+	fi
+}
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 02/11] maintenance: add --quiet option
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
                           ` (9 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-maintenance.txt |  3 +++
 builtin/gc.c                      | 11 ++++++++++-
 t/t7900-maintenance.sh            | 15 ++++++++++-----
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index ff47fb3641..04fa0fe329 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -52,6 +52,9 @@ OPTIONS
 	in the `gc.auto` config setting, or when the number of pack-files
 	exceeds the `gc.autoPackLimit` config setting.
 
+--quiet::
+	Do not report progress or other information over `stderr`.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index ec064e8686..bacce0c747 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,12 +701,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto]"),
+	N_("git maintenance run [--auto] [--[no-]quiet]"),
 	NULL
 };
 
 struct maintenance_run_opts {
 	int auto_flag;
+	int quiet;
 };
 
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
@@ -718,6 +719,10 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 	if (opts->auto_flag)
 		strvec_push(&child.args, "--auto");
+	if (opts->quiet)
+		strvec_push(&child.args, "--quiet");
+	else
+		strvec_push(&child.args, "--no-quiet");
 
 	close_object_store(the_repository->objects);
 	return run_command(&child);
@@ -729,10 +734,14 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "quiet", &opts.quiet,
+			 N_("do not report progress or other information over stderr")),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
+	opts.quiet = !isatty(2);
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index c2f0b1d0c0..5637053bf6 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -13,11 +13,16 @@ test_expect_success 'help text' '
 	test_i18ngrep "usage: git maintenance" err
 '
 
-test_expect_success 'run [--auto]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
-	test_subcommand git gc <run-no-auto.txt &&
-	test_subcommand git gc --auto <run-auto.txt
+test_expect_success 'run [--auto|--quiet]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
+		git maintenance run 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
+		git maintenance run --auto 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
+		git maintenance run --no-quiet 2>/dev/null &&
+	test_subcommand git gc --quiet <run-no-auto.txt &&
+	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 03/11] maintenance: replace run_auto_gc()
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
                           ` (8 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/fetch-options.txt |  6 ++++--
 Documentation/git-clone.txt     |  6 +++---
 builtin/am.c                    |  2 +-
 builtin/commit.c                |  2 +-
 builtin/fetch.c                 |  6 ++++--
 builtin/merge.c                 |  2 +-
 builtin/rebase.c                |  4 ++--
 run-command.c                   | 16 +++++++---------
 run-command.h                   |  2 +-
 t/t5510-fetch.sh                |  2 +-
 10 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 07d06ff445..b65a758661 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -95,9 +95,11 @@ ifndef::git-pull[]
 	Allow several <repository> and <group> arguments to be
 	specified. No <refspec>s may be specified.
 
+--[no-]auto-maintenance::
 --[no-]auto-gc::
-	Run `git gc --auto` at the end to perform garbage collection
-	if needed. This is enabled by default.
+	Run `git maintenance run --auto` at the end to perform automatic
+	repository maintenance if needed. (`--[no-]auto-gc` is a synonym.)
+	This is enabled by default.
 
 --[no-]write-commit-graph::
 	Write a commit-graph after fetching. This overrides the config
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index c898310099..097e6a86c5 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -78,9 +78,9 @@ repository using this option and then delete branches (or use any
 other Git command that makes any existing commit unreferenced) in the
 source repository, some objects may become unreferenced (or dangling).
 These objects may be removed by normal Git operations (such as `git commit`)
-which automatically call `git gc --auto`. (See linkgit:git-gc[1].)
-If these objects are removed and were referenced by the cloned repository,
-then the cloned repository will become corrupt.
+which automatically call `git maintenance run --auto`. (See
+linkgit:git-maintenance[1].) If these objects are removed and were referenced
+by the cloned repository, then the cloned repository will become corrupt.
 +
 Note that running `git repack` without the `--local` option in a repository
 cloned with `--shared` will copy objects from the source repository into a pack
diff --git a/builtin/am.c b/builtin/am.c
index dd4e6c2d9b..68e9d17379 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1795,7 +1795,7 @@ static void am_run(struct am_state *state, int resume)
 	if (!state->rebasing) {
 		am_destroy(state);
 		close_object_store(the_repository->objects);
-		run_auto_gc(state->quiet);
+		run_auto_maintenance(state->quiet);
 	}
 }
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 69ac78d5e5..f9b0a0c05d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1702,7 +1702,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	git_test_write_commit_graph_or_die();
 
 	repo_rerere(the_repository, 0);
-	run_auto_gc(quiet);
+	run_auto_maintenance(quiet);
 	run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
 	if (amend && !no_post_rewrite) {
 		commit_post_rewrite(the_repository, current_head, &oid);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 2eb8d6a5a5..cb38e6f5ec 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -199,8 +199,10 @@ static struct option builtin_fetch_options[] = {
 	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
 			N_("report that we have only objects reachable from this object")),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "auto-maintenance", &enable_auto_gc,
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
-		 N_("run 'gc --auto' after fetching")),
+		 N_("run 'maintenance --auto' after fetching")),
 	OPT_BOOL(0, "show-forced-updates", &fetch_show_forced_updates,
 		 N_("check for forced-updates on all updated branches")),
 	OPT_BOOL(0, "write-commit-graph", &fetch_write_commit_graph,
@@ -1891,7 +1893,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	close_object_store(the_repository->objects);
 
 	if (enable_auto_gc)
-		run_auto_gc(verbosity < 0);
+		run_auto_maintenance(verbosity < 0);
 
 	return result;
 }
diff --git a/builtin/merge.c b/builtin/merge.c
index 74829a838e..8f31bfab56 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -456,7 +456,7 @@ static void finish(struct commit *head_commit,
 			 * user should see them.
 			 */
 			close_object_store(the_repository->objects);
-			run_auto_gc(verbosity < 0);
+			run_auto_maintenance(verbosity < 0);
 		}
 	}
 	if (new_head && show_diffstat) {
diff --git a/builtin/rebase.c b/builtin/rebase.c
index dadb52fa92..9ffba9009d 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -728,10 +728,10 @@ static int finish_rebase(struct rebase_options *opts)
 	apply_autostash(state_dir_path("autostash", opts));
 	close_object_store(the_repository->objects);
 	/*
-	 * We ignore errors in 'gc --auto', since the
+	 * We ignore errors in 'git maintenance run --auto', since the
 	 * user should see them.
 	 */
-	run_auto_gc(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
+	run_auto_maintenance(!(opts->flags & (REBASE_NO_QUIET|REBASE_VERBOSE)));
 	if (opts->type == REBASE_MERGE) {
 		struct replay_opts replay = REPLAY_OPTS_INIT;
 
diff --git a/run-command.c b/run-command.c
index cc9c3296ba..2ee59acdc8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1866,15 +1866,13 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 	return result;
 }
 
-int run_auto_gc(int quiet)
+int run_auto_maintenance(int quiet)
 {
-	struct strvec argv_gc_auto = STRVEC_INIT;
-	int status;
+	struct child_process maint = CHILD_PROCESS_INIT;
 
-	strvec_pushl(&argv_gc_auto, "gc", "--auto", NULL);
-	if (quiet)
-		strvec_push(&argv_gc_auto, "--quiet");
-	status = run_command_v_opt(argv_gc_auto.v, RUN_GIT_CMD);
-	strvec_clear(&argv_gc_auto);
-	return status;
+	maint.git_cmd = 1;
+	strvec_pushl(&maint.args, "maintenance", "run", "--auto", NULL);
+	strvec_push(&maint.args, quiet ? "--quiet" : "--no-quiet");
+
+	return run_command(&maint);
 }
diff --git a/run-command.h b/run-command.h
index 8b9bfaef16..6472b38bde 100644
--- a/run-command.h
+++ b/run-command.h
@@ -221,7 +221,7 @@ int run_hook_ve(const char *const *env, const char *name, va_list args);
 /*
  * Trigger an auto-gc
  */
-int run_auto_gc(int quiet);
+int run_auto_maintenance(int quiet);
 
 #define RUN_COMMAND_NO_STDIN 1
 #define RUN_GIT_CMD	     2	/*If this is to be git sub-command */
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 2a1abe91f0..a9682c5669 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -934,7 +934,7 @@ test_expect_success 'fetching with auto-gc does not lock up' '
 		git config fetch.unpackLimit 1 &&
 		git config gc.autoPackLimit 1 &&
 		git config gc.autoDetach false &&
-		GIT_ASK_YESNO="$D/askyesno" git fetch >fetch.out 2>&1 &&
+		GIT_ASK_YESNO="$D/askyesno" git fetch --verbose >fetch.out 2>&1 &&
 		test_i18ngrep "Auto packing the repository" fetch.out &&
 		! grep "Should I try again" fetch.out
 	)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 04/11] maintenance: initialize task array
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (2 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
                           ` (7 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.

The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.

A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.

The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/gc.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index bacce0c747..904fb2d9aa 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -728,6 +728,47 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 	return run_command(&child);
 }
 
+typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
+
+struct maintenance_task {
+	const char *name;
+	maintenance_task_fn *fn;
+	unsigned enabled:1;
+};
+
+enum maintenance_task_label {
+	TASK_GC,
+
+	/* Leave as final value */
+	TASK__COUNT
+};
+
+static struct maintenance_task tasks[] = {
+	[TASK_GC] = {
+		"gc",
+		maintenance_task_gc,
+		1,
+	},
+};
+
+static int maintenance_run_tasks(struct maintenance_run_opts *opts)
+{
+	int i;
+	int result = 0;
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (!tasks[i].enabled)
+			continue;
+
+		if (tasks[i].fn(opts)) {
+			error(_("task '%s' failed"), tasks[i].name);
+			result = 1;
+		}
+	}
+
+	return result;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
 	struct maintenance_run_opts opts;
@@ -750,7 +791,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	if (argc != 0)
 		usage_with_options(builtin_maintenance_run_usage,
 				   builtin_maintenance_run_options);
-	return maintenance_task_gc(&opts);
+	return maintenance_run_tasks(&opts);
 }
 
 static const char builtin_maintenance_usage[] = N_("git maintenance run [<options>]");
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 05/11] maintenance: add commit-graph task
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (3 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
                           ` (6 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-maintenance.txt |  8 ++++++++
 builtin/gc.c                      | 30 ++++++++++++++++++++++++++++++
 commit-graph.c                    |  8 ++++----
 commit-graph.h                    |  1 +
 t/t7900-maintenance.sh            |  2 ++
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 04fa0fe329..fc5dbcf0b9 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -35,6 +35,14 @@ run::
 TASKS
 -----
 
+commit-graph::
+	The `commit-graph` job updates the `commit-graph` files incrementally,
+	then verifies that the written data is correct. The incremental
+	write is safe to run alongside concurrent Git processes since it
+	will not expire `.graph` files that were in the previous
+	`commit-graph-chain` file. They will be deleted by a later run based
+	on the expiration delay.
+
 gc::
 	Clean up unnecessary files and optimize the local repository. "GC"
 	stands for "garbage collection," but this task performs many
diff --git a/builtin/gc.c b/builtin/gc.c
index 904fb2d9aa..86b807a008 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -710,6 +710,31 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+static int run_write_commit_graph(struct maintenance_run_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "commit-graph", "write",
+		     "--split", "--reachable", NULL);
+
+	if (opts->quiet)
+		strvec_push(&child.args, "--no-progress");
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
+{
+	close_object_store(the_repository->objects);
+	if (run_write_commit_graph(opts)) {
+		error(_("failed to write commit-graph"));
+		return 1;
+	}
+
+	return 0;
+}
+
 static int maintenance_task_gc(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -738,6 +763,7 @@ struct maintenance_task {
 
 enum maintenance_task_label {
 	TASK_GC,
+	TASK_COMMIT_GRAPH,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -749,6 +775,10 @@ static struct maintenance_task tasks[] = {
 		maintenance_task_gc,
 		1,
 	},
+	[TASK_COMMIT_GRAPH] = {
+		"commit-graph",
+		maintenance_task_commit_graph,
+	},
 };
 
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
diff --git a/commit-graph.c b/commit-graph.c
index e51c91dd5b..a9b0db7d4a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -172,7 +172,7 @@ static char *get_split_graph_filename(struct object_directory *odb,
 		       oid_hex);
 }
 
-static char *get_chain_filename(struct object_directory *odb)
+char *get_commit_graph_chain_filename(struct object_directory *odb)
 {
 	return xstrfmt("%s/info/commit-graphs/commit-graph-chain", odb->path);
 }
@@ -516,7 +516,7 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r,
 	struct stat st;
 	struct object_id *oids;
 	int i = 0, valid = 1, count;
-	char *chain_name = get_chain_filename(odb);
+	char *chain_name = get_commit_graph_chain_filename(odb);
 	FILE *fp;
 	int stat_res;
 
@@ -1668,7 +1668,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	if (ctx->split) {
-		char *lock_name = get_chain_filename(ctx->odb);
+		char *lock_name = get_commit_graph_chain_filename(ctx->odb);
 
 		hold_lock_file_for_update_mode(&lk, lock_name,
 					       LOCK_DIE_ON_ERROR, 0444);
@@ -2038,7 +2038,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 	if (ctx->split_opts && ctx->split_opts->expire_time)
 		expire_time = ctx->split_opts->expire_time;
 	if (!ctx->split) {
-		char *chain_file_name = get_chain_filename(ctx->odb);
+		char *chain_file_name = get_commit_graph_chain_filename(ctx->odb);
 		unlink(chain_file_name);
 		free(chain_file_name);
 		ctx->num_commit_graphs_after = 0;
diff --git a/commit-graph.h b/commit-graph.h
index 09a97030dc..765221cdcc 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -25,6 +25,7 @@ struct raw_object_store;
 struct string_list;
 
 char *get_commit_graph_filename(struct object_directory *odb);
+char *get_commit_graph_chain_filename(struct object_directory *odb);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 
 /*
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 5637053bf6..505a1b4d60 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -4,6 +4,8 @@ test_description='git maintenance builtin'
 
 . ./test-lib.sh
 
+GIT_TEST_COMMIT_GRAPH=0
+
 test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 06/11] maintenance: add --task option
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (4 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
                           ` (5 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-maintenance.txt |  9 ++++-
 builtin/gc.c                      | 66 +++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 27 +++++++++++++
 3 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index fc5dbcf0b9..819ca41ab6 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,7 +30,9 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks.
+	Run one or more maintenance tasks. If one or more `--task=<task>`
+	options are specified, then those tasks are run in the provided
+	order. Otherwise, only the `gc` task is run.
 
 TASKS
 -----
@@ -63,6 +65,11 @@ OPTIONS
 --quiet::
 	Do not report progress or other information over `stderr`.
 
+--task=<task>::
+	If this option is specified one or more times, then only run the
+	specified tasks in the specified order. See the 'TASKS' section
+	for the list of accepted `<task>` values.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/gc.c b/builtin/gc.c
index 86b807a008..00fff59bdb 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -701,7 +701,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 }
 
 static const char * const builtin_maintenance_run_usage[] = {
-	N_("git maintenance run [--auto] [--[no-]quiet]"),
+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
 	NULL
 };
 
@@ -759,6 +759,9 @@ struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
 	unsigned enabled:1;
+
+	/* -1 if not selected. */
+	int selected_order;
 };
 
 enum maintenance_task_label {
@@ -781,13 +784,32 @@ static struct maintenance_task tasks[] = {
 	},
 };
 
+static int compare_tasks_by_selection(const void *a_, const void *b_)
+{
+	const struct maintenance_task *a, *b;
+
+	a = (const struct maintenance_task *)&a_;
+	b = (const struct maintenance_task *)&b_;
+
+	return b->selected_order - a->selected_order;
+}
+
 static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
-	int i;
+	int i, found_selected = 0;
 	int result = 0;
 
+	for (i = 0; !found_selected && i < TASK__COUNT; i++)
+		found_selected = tasks[i].selected_order >= 0;
+
+	if (found_selected)
+		QSORT(tasks, TASK__COUNT, compare_tasks_by_selection);
+
 	for (i = 0; i < TASK__COUNT; i++) {
-		if (!tasks[i].enabled)
+		if (found_selected && tasks[i].selected_order < 0)
+			continue;
+
+		if (!found_selected && !tasks[i].enabled)
 			continue;
 
 		if (tasks[i].fn(opts)) {
@@ -799,20 +821,58 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static int task_option_parse(const struct option *opt,
+			     const char *arg, int unset)
+{
+	int i, num_selected = 0;
+	struct maintenance_task *task = NULL;
+
+	BUG_ON_OPT_NEG(unset);
+
+	for (i = 0; i < TASK__COUNT; i++) {
+		if (tasks[i].selected_order >= 0)
+			num_selected++;
+		if (!strcasecmp(tasks[i].name, arg)) {
+			task = &tasks[i];
+		}
+	}
+
+	if (!task) {
+		error(_("'%s' is not a valid task"), arg);
+		return 1;
+	}
+
+	if (task->selected_order >= 0) {
+		error(_("task '%s' cannot be selected multiple times"), arg);
+		return 1;
+	}
+
+	task->selected_order = num_selected + 1;
+
+	return 0;
+}
+
 static int maintenance_run(int argc, const char **argv, const char *prefix)
 {
+	int i;
 	struct maintenance_run_opts opts;
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
+		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
+			N_("run a specific task"),
+			PARSE_OPT_NONEG, task_option_parse),
 		OPT_END()
 	};
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
 
+	for (i = 0; i < TASK__COUNT; i++)
+		tasks[i].selected_order = -1;
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_maintenance_run_options,
 			     builtin_maintenance_run_usage,
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 505a1b4d60..fb4cadd30c 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -27,4 +27,31 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'run --task=<task>' '
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-gc.txt" \
+		git maintenance run --task=gc 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
+		git maintenance run --task=commit-graph 2>/dev/null &&
+	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
+		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
+	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
+	test_subcommand git gc --quiet <run-gc.txt &&
+	test_subcommand git gc --quiet <run-both.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
+	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
+'
+
+test_expect_success 'run --task=bogus' '
+	test_must_fail git maintenance run --task=bogus 2>err &&
+	test_i18ngrep "is not a valid task" err
+'
+
+test_expect_success 'run --task duplicate' '
+	test_must_fail git maintenance run --task=gc --task=gc 2>err &&
+	test_i18ngrep "cannot be selected multiple times" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 07/11] maintenance: take a lock on the objects directory
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (5 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-21 13:36           ` Ævar Arnfjörð Bjarmason
  2020-09-17 18:11         ` [PATCH v5 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
                           ` (4 subsequent siblings)
  11 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/gc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 00fff59bdb..7ba9c6f7c9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -798,6 +798,25 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
 	int i, found_selected = 0;
 	int result = 0;
+	struct lock_file lk;
+	struct repository *r = the_repository;
+	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
+		/*
+		 * Another maintenance command is running.
+		 *
+		 * If --auto was provided, then it is likely due to a
+		 * recursive process stack. Do not report an error in
+		 * that case.
+		 */
+		if (!opts->auto_flag && !opts->quiet)
+			warning(_("lock file '%s' exists, skipping maintenance"),
+				lock_path);
+		free(lock_path);
+		return 0;
+	}
+	free(lock_path);
 
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected_order >= 0;
@@ -818,6 +837,7 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		}
 	}
 
+	rollback_lock_file(&lk);
 	return result;
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 08/11] maintenance: create maintenance.<task>.enabled config
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (6 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
                           ` (3 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config.txt             |  2 ++
 Documentation/config/maintenance.txt |  6 ++++++
 Documentation/git-maintenance.txt    | 14 +++++++++-----
 builtin/gc.c                         | 19 +++++++++++++++++++
 t/t7900-maintenance.sh               |  8 ++++++++
 5 files changed, 44 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3042d80978..f93b6837e4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -398,6 +398,8 @@ include::config/mailinfo.txt[]
 
 include::config/mailmap.txt[]
 
+include::config/maintenance.txt[]
+
 include::config/man.txt[]
 
 include::config/merge.txt[]
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
new file mode 100644
index 0000000000..4402b8b49f
--- /dev/null
+++ b/Documentation/config/maintenance.txt
@@ -0,0 +1,6 @@
+maintenance.<task>.enabled::
+	This boolean config option controls whether the maintenance task
+	with name `<task>` is run when no `--task` option is specified to
+	`git maintenance run`. These config values are ignored if a
+	`--task` option exists. By default, only `maintenance.gc.enabled`
+	is true.
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 819ca41ab6..6abcb8255a 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -30,9 +30,11 @@ SUBCOMMANDS
 -----------
 
 run::
-	Run one or more maintenance tasks. If one or more `--task=<task>`
-	options are specified, then those tasks are run in the provided
-	order. Otherwise, only the `gc` task is run.
+	Run one or more maintenance tasks. If one or more `--task` options
+	are specified, then those tasks are run in that order. Otherwise,
+	the tasks are determined by which `maintenance.<task>.enabled`
+	config options are true. By default, only `maintenance.gc.enabled`
+	is true.
 
 TASKS
 -----
@@ -67,8 +69,10 @@ OPTIONS
 
 --task=<task>::
 	If this option is specified one or more times, then only run the
-	specified tasks in the specified order. See the 'TASKS' section
-	for the list of accepted `<task>` values.
+	specified tasks in the specified order. If no `--task=<task>`
+	arguments are specified, then only the tasks with
+	`maintenance.<task>.enabled` configured as `true` are considered.
+	See the 'TASKS' section for the list of accepted `<task>` values.
 
 GIT
 ---
diff --git a/builtin/gc.c b/builtin/gc.c
index 7ba9c6f7c9..55a3d836f0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -841,6 +841,24 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 	return result;
 }
 
+static void initialize_task_config(void)
+{
+	int i;
+	struct strbuf config_name = STRBUF_INIT;
+	for (i = 0; i < TASK__COUNT; i++) {
+		int config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.enabled",
+			    tasks[i].name);
+
+		if (!git_config_get_bool(config_name.buf, &config_value))
+			tasks[i].enabled = config_value;
+	}
+
+	strbuf_release(&config_name);
+}
+
 static int task_option_parse(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -889,6 +907,7 @@ static int maintenance_run(int argc, const char **argv, const char *prefix)
 	memset(&opts, 0, sizeof(opts));
 
 	opts.quiet = !isatty(2);
+	initialize_task_config();
 
 	for (i = 0; i < TASK__COUNT; i++)
 		tasks[i].selected_order = -1;
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index fb4cadd30c..8a162a18ba 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -27,6 +27,14 @@ test_expect_success 'run [--auto|--quiet]' '
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
+test_expect_success 'maintenance.<task>.enabled' '
+	git config maintenance.gc.enabled false &&
+	git config maintenance.commit-graph.enabled true &&
+	GIT_TRACE2_EVENT="$(pwd)/run-config.txt" git maintenance run 2>err &&
+	test_subcommand ! git gc --quiet <run-config.txt &&
+	test_subcommand git commit-graph write --split --reachable --no-progress <run-config.txt
+'
+
 test_expect_success 'run --task=<task>' '
 	GIT_TRACE2_EVENT="$(pwd)/run-commit-graph.txt" \
 		git maintenance run --task=commit-graph 2>/dev/null &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 09/11] maintenance: use pointers to check --auto
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (7 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
                           ` (2 subsequent siblings)
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/gc.c              | 16 ++++++++++++++++
 t/t5514-fetch-multiple.sh |  2 +-
 t/t7900-maintenance.sh    |  2 +-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 55a3d836f0..13c24bca7d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -755,9 +755,17 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts)
 
 typedef int maintenance_task_fn(struct maintenance_run_opts *opts);
 
+/*
+ * An auto condition function returns 1 if the task should run
+ * and 0 if the task should NOT run. See needs_to_gc() for an
+ * example.
+ */
+typedef int maintenance_auto_fn(void);
+
 struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
+	maintenance_auto_fn *auto_condition;
 	unsigned enabled:1;
 
 	/* -1 if not selected. */
@@ -776,6 +784,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_GC] = {
 		"gc",
 		maintenance_task_gc,
+		need_to_gc,
 		1,
 	},
 	[TASK_COMMIT_GRAPH] = {
@@ -831,6 +840,11 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		if (!found_selected && !tasks[i].enabled)
 			continue;
 
+		if (opts->auto_flag &&
+		    (!tasks[i].auto_condition ||
+		     !tasks[i].auto_condition()))
+			continue;
+
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
@@ -845,6 +859,8 @@ static void initialize_task_config(void)
 {
 	int i;
 	struct strbuf config_name = STRBUF_INIT;
+	gc_config();
+
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
 
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index de8e2f1531..bd202ec6f3 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -108,7 +108,7 @@ test_expect_success 'git fetch --multiple (two remotes)' '
 	 GIT_TRACE=1 git fetch --multiple one two 2>trace &&
 	 git branch -r > output &&
 	 test_cmp ../expect output &&
-	 grep "built-in: git gc" trace >gc &&
+	 grep "built-in: git maintenance" trace >gc &&
 	 test_line_count = 1 gc
 	)
 '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 8a162a18ba..53c883531e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -23,7 +23,7 @@ test_expect_success 'run [--auto|--quiet]' '
 	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
 		git maintenance run --no-quiet 2>/dev/null &&
 	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand git gc --auto --quiet <run-auto.txt &&
+	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
 	test_subcommand git gc --no-quiet <run-no-quiet.txt
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 10/11] maintenance: add auto condition for commit-graph task
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (8 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11         ` [PATCH v5 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
  2020-09-17 18:35         ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/maintenance.txt | 10 ++++
 builtin/gc.c                         | 82 ++++++++++++++++++++++++++++
 object.h                             |  1 +
 3 files changed, 93 insertions(+)

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 4402b8b49f..7cc6700d57 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -4,3 +4,13 @@ maintenance.<task>.enabled::
 	`git maintenance run`. These config values are ignored if a
 	`--task` option exists. By default, only `maintenance.gc.enabled`
 	is true.
+
+maintenance.commit-graph.auto::
+	This integer config option controls how often the `commit-graph` task
+	should be run as part of `git maintenance run --auto`. If zero, then
+	the `commit-graph` task will not run with the `--auto` option. A
+	negative value will force the task to run every time. Otherwise, a
+	positive value implies the command should run when the number of
+	reachable commits that are not in the commit-graph file is at least
+	the value of `maintenance.commit-graph.auto`. The default value is
+	100.
diff --git a/builtin/gc.c b/builtin/gc.c
index 13c24bca7d..4db853c561 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -28,6 +28,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "promisor-remote.h"
+#include "refs.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -710,6 +711,86 @@ struct maintenance_run_opts {
 	int quiet;
 };
 
+/* Remember to update object flag allocation in object.h */
+#define SEEN		(1u<<0)
+
+struct cg_auto_data {
+	int num_not_in_graph;
+	int limit;
+};
+
+static int dfs_on_ref(const char *refname,
+		      const struct object_id *oid, int flags,
+		      void *cb_data)
+{
+	struct cg_auto_data *data = (struct cg_auto_data *)cb_data;
+	int result = 0;
+	struct object_id peeled;
+	struct commit_list *stack = NULL;
+	struct commit *commit;
+
+	if (!peel_ref(refname, &peeled))
+		oid = &peeled;
+	if (oid_object_info(the_repository, oid, NULL) != OBJ_COMMIT)
+		return 0;
+
+	commit = lookup_commit(the_repository, oid);
+	if (!commit)
+		return 0;
+	if (parse_commit(commit))
+		return 0;
+
+	commit_list_append(commit, &stack);
+
+	while (!result && stack) {
+		struct commit_list *parent;
+
+		commit = pop_commit(&stack);
+
+		for (parent = commit->parents; parent; parent = parent->next) {
+			if (parse_commit(parent->item) ||
+			    commit_graph_position(parent->item) != COMMIT_NOT_FROM_GRAPH ||
+			    parent->item->object.flags & SEEN)
+				continue;
+
+			parent->item->object.flags |= SEEN;
+			data->num_not_in_graph++;
+
+			if (data->num_not_in_graph >= data->limit) {
+				result = 1;
+				break;
+			}
+
+			commit_list_append(parent->item, &stack);
+		}
+	}
+
+	free_commit_list(stack);
+	return result;
+}
+
+static int should_write_commit_graph(void)
+{
+	int result;
+	struct cg_auto_data data;
+
+	data.num_not_in_graph = 0;
+	data.limit = 100;
+	git_config_get_int("maintenance.commit-graph.auto",
+			   &data.limit);
+
+	if (!data.limit)
+		return 0;
+	if (data.limit < 0)
+		return 1;
+
+	result = for_each_ref(dfs_on_ref, &data);
+
+	clear_commit_marks_all(SEEN);
+
+	return result;
+}
+
 static int run_write_commit_graph(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -790,6 +871,7 @@ static struct maintenance_task tasks[] = {
 	[TASK_COMMIT_GRAPH] = {
 		"commit-graph",
 		maintenance_task_commit_graph,
+		should_write_commit_graph,
 	},
 };
 
diff --git a/object.h b/object.h
index 96a2105859..20b18805f0 100644
--- a/object.h
+++ b/object.h
@@ -73,6 +73,7 @@ struct object_array {
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
  * builtin/fsck.c:           0--3
+ * builtin/gc.c:             0
  * builtin/index-pack.c:                                     2021
  * builtin/pack-objects.c:                                   20
  * builtin/reflog.c:                   10--12
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v5 11/11] maintenance: add trace2 regions for task execution
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (9 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11         ` Derrick Stolee via GitGitGadget
  2020-09-17 18:35         ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
  11 siblings, 0 replies; 91+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/gc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/gc.c b/builtin/gc.c
index 4db853c561..090959350e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -927,10 +927,12 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		trace2_region_enter("maintenance", tasks[i].name, r);
 		if (tasks[i].fn(opts)) {
 			error(_("task '%s' failed"), tasks[i].name);
 			result = 1;
 		}
+		trace2_region_leave("maintenance", tasks[i].name, r);
 	}
 
 	rollback_lock_file(&lk);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
                           ` (10 preceding siblings ...)
  2020-09-17 18:11         ` [PATCH v5 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
@ 2020-09-17 18:35         ` Junio C Hamano
  2020-09-18 13:14           ` Johannes Schindelin
  11 siblings, 1 reply; 91+ messages in thread
From: Junio C Hamano @ 2020-09-17 18:35 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Jonathan Tan, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Update in v4
> ============
>
> A segfault when running just "git maintenance" is fixed.

The net change is just a single liner below, and is obviously
correct.

I propagated it through to part #2 and part #3 locally, and
hopefully this makes part #1 ready to go.

Thanks.



diff --git a/builtin/gc.c b/builtin/gc.c
index c3bcdc1167..090959350e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1027,7 +1027,8 @@ static const char builtin_maintenance_usage[] = N_("git maintenance run [<option
 
 int cmd_maintenance(int argc, const char **argv, const char *prefix)
 {
-	if (argc == 2 && !strcmp(argv[1], "-h"))
+	if (argc < 2 ||
+	    (argc == 2 && !strcmp(argv[1], "-h")))
 		usage(builtin_maintenance_usage);
 
 	if (!strcmp(argv[1], "run"))
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 4f6a04ddb1..53c883531e 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -10,7 +10,9 @@ test_expect_success 'help text' '
 	test_expect_code 129 git maintenance -h 2>err &&
 	test_i18ngrep "usage: git maintenance run" err &&
 	test_expect_code 128 git maintenance barf 2>err &&
-	test_i18ngrep "invalid subcommand: barf" err
+	test_i18ngrep "invalid subcommand: barf" err &&
+	test_expect_code 129 git maintenance 2>err &&
+	test_i18ngrep "usage: git maintenance" err
 '
 
 test_expect_success 'run [--auto|--quiet]' '

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-09-17 18:35         ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
@ 2020-09-18 13:14           ` Johannes Schindelin
  0 siblings, 0 replies; 91+ messages in thread
From: Johannes Schindelin @ 2020-09-18 13:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, Jonathan Tan, Derrick Stolee

Hi Junio,

On Thu, 17 Sep 2020, Junio C Hamano wrote:

> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Update in v4
> > ============
> >
> > A segfault when running just "git maintenance" is fixed.
>
> The net change is just a single liner below, and is obviously
> correct.
>
> I propagated it through to part #2 and part #3 locally, and
> hopefully this makes part #1 ready to go.

Yay!

> diff --git a/builtin/gc.c b/builtin/gc.c
> index c3bcdc1167..090959350e 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1027,7 +1027,8 @@ static const char builtin_maintenance_usage[] = N_("git maintenance run [<option
>
>  int cmd_maintenance(int argc, const char **argv, const char *prefix)
>  {
> -	if (argc == 2 && !strcmp(argv[1], "-h"))
> +	if (argc < 2 ||
> +	    (argc == 2 && !strcmp(argv[1], "-h")))
>  		usage(builtin_maintenance_usage);
>
>  	if (!strcmp(argv[1], "run"))
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 4f6a04ddb1..53c883531e 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -10,7 +10,9 @@ test_expect_success 'help text' '
>  	test_expect_code 129 git maintenance -h 2>err &&
>  	test_i18ngrep "usage: git maintenance run" err &&
>  	test_expect_code 128 git maintenance barf 2>err &&
> -	test_i18ngrep "invalid subcommand: barf" err
> +	test_i18ngrep "invalid subcommand: barf" err &&
> +	test_expect_code 129 git maintenance 2>err &&
> +	test_i18ngrep "usage: git maintenance" err
>  '
>
>  test_expect_success 'run [--auto|--quiet]' '
>

Yep, that's how I read the range-diff, too.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v5 07/11] maintenance: take a lock on the objects directory
  2020-09-17 18:11         ` [PATCH v5 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
@ 2020-09-21 13:36           ` Ævar Arnfjörð Bjarmason
  2020-09-21 13:43             ` Derrick Stolee
  0 siblings, 1 reply; 91+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-09-21 13:36 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, Jonathan Tan, Derrick Stolee, Derrick Stolee


On Thu, Sep 17 2020, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Performing maintenance on a Git repository involves writing data to the
> .git directory, which is not safe to do with multiple writers attempting
> the same operation. Ensure that only one 'git maintenance' process is
> running at a time by holding a file-based lock. Simply the presence of
> the .git/maintenance.lock file will prevent future maintenance. This
> lock is never committed, since it does not represent meaningful data.
> Instead, it is only a placeholder.
>
> If the lock file already exists, then no maintenance tasks are
> attempted. This will become very important later when we implement the
> 'prefetch' task, as this is our stop-gap from creating a recursive process
> loop between 'git fetch' and 'git maintenance run --auto'.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/gc.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 00fff59bdb..7ba9c6f7c9 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -798,6 +798,25 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
>  {
>  	int i, found_selected = 0;
>  	int result = 0;
> +	struct lock_file lk;
> +	struct repository *r = the_repository;
> +	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
> +
> +	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
> +		/*
> +		 * Another maintenance command is running.
> +		 *
> +		 * If --auto was provided, then it is likely due to a
> +		 * recursive process stack. Do not report an error in
> +		 * that case.
> +		 */
> +		if (!opts->auto_flag && !opts->quiet)
> +			warning(_("lock file '%s' exists, skipping maintenance"),
> +				lock_path);
> +		free(lock_path);
> +		return 0;
> +	}
> +	free(lock_path);
>  
>  	for (i = 0; !found_selected && i < TASK__COUNT; i++)
>  		found_selected = tasks[i].selected_order >= 0;

There's now two different lock strategies in builtin/gc.c, the existing
one introduced in 64a99eb476 ("gc: reject if another gc is running,
unless --force is given", 2013-08-08) where we write the hostname to the
gc.pid file, and then discard the lockfile depending on a heuristic of
whether or not it's the same etc., and this one.

With this as an entry point we'll entirely do away with the old one
since we don't use the "gc --auto" entry point.

All of that may or may not be desirable, but I think a description in
the docs & tests for how these lock files should interact would be
helpful. E.g. writing a different hostname in the gc lockfile and
setting the time on it with with "test-tool chmtime" will cause it to
plow ahead, but doing the same for "git maintenance" will stop it in its
tracks no matter the time or content.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v5 07/11] maintenance: take a lock on the objects directory
  2020-09-21 13:36           ` Ævar Arnfjörð Bjarmason
@ 2020-09-21 13:43             ` Derrick Stolee
  2020-09-21 19:29               ` Junio C Hamano
  0 siblings, 1 reply; 91+ messages in thread
From: Derrick Stolee @ 2020-09-21 13:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, Jonathan Tan, Derrick Stolee, Derrick Stolee

On 9/21/2020 9:36 AM, Ævar Arnfjörð Bjarmason wrote:
> There's now two different lock strategies in builtin/gc.c, the existing
> one introduced in 64a99eb476 ("gc: reject if another gc is running,
> unless --force is given", 2013-08-08) where we write the hostname to the
> gc.pid file, and then discard the lockfile depending on a heuristic of
> whether or not it's the same etc., and this one.
> 
> With this as an entry point we'll entirely do away with the old one
> since we don't use the "gc --auto" entry point.

Users could still erroneously launch two `git gc` commands concurrently
and that will not use the maintenance.lock file. The GC lock is worth
keeping around until we redirect the `gc` command to be an alias that
runs `git maintenance run --task=gc [options]`.

> All of that may or may not be desirable, but I think a description in
> the docs & tests for how these lock files should interact would be
> helpful. E.g. writing a different hostname in the gc lockfile and
> setting the time on it with with "test-tool chmtime" will cause it to
> plow ahead, but doing the same for "git maintenance" will stop it in its
> tracks no matter the time or content.

Thanks for the pointers to create this test. I'll try to get to it
soon. For now, here is a tracking issue [1].

[1] https://github.com/gitgitgadget/git/issues/737

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v5 07/11] maintenance: take a lock on the objects directory
  2020-09-21 13:43             ` Derrick Stolee
@ 2020-09-21 19:29               ` Junio C Hamano
  0 siblings, 0 replies; 91+ messages in thread
From: Junio C Hamano @ 2020-09-21 19:29 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget, git, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>> With this as an entry point we'll entirely do away with the old one
>> since we don't use the "gc --auto" entry point.
>
> Users could still erroneously launch two `git gc` commands concurrently
> and that will not use the maintenance.lock file. The GC lock is worth
> keeping around until we redirect the `gc` command to be an alias that
> runs `git maintenance run --task=gc [options]`.

Yes, that is prudent.  And as long as the traditional gc lock stops
the maintenance command from running the gc task, everthing would
work consistently ;-)

Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

end of thread, other threads:[~2020-09-21 19:30 UTC | newest]

Thread overview: 91+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-06 15:48 [PATCH 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-08-07 22:16   ` Martin Ågren
2020-08-12 21:03   ` Jonathan Nieder
2020-08-12 22:07     ` Junio C Hamano
2020-08-12 22:50       ` Jonathan Nieder
2020-08-14  1:05     ` Derrick Stolee
2020-08-06 15:48 ` [PATCH 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-08-07 22:29   ` Martin Ågren
2020-08-12 13:30     ` Derrick Stolee
2020-08-14 12:23       ` Martin Ågren
2020-08-06 15:48 ` [PATCH 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-08-06 15:48 ` [PATCH 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-08-18 14:22 ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-08-18 14:22   ` [PATCH v2 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-08-18 14:22   ` [PATCH v2 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-08-18 23:46     ` Jonathan Tan
2020-08-18 14:23   ` [PATCH v2 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-08-18 23:51     ` Jonathan Tan
2020-08-19 15:04       ` Derrick Stolee
2020-08-19 17:43         ` Jonathan Tan
2020-08-18 14:23   ` [PATCH v2 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-08-19  0:00     ` Jonathan Tan
2020-08-19  0:36       ` Junio C Hamano
2020-08-19 15:09         ` Derrick Stolee
2020-08-19 17:35           ` Jonathan Tan
2020-08-18 14:23   ` [PATCH v2 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-08-19  0:04     ` Jonathan Tan
2020-08-19 15:10       ` Derrick Stolee
2020-08-18 14:23   ` [PATCH v2 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-08-18 14:23   ` [PATCH v2 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-08-19  0:09     ` Jonathan Tan
2020-08-19 15:15       ` Derrick Stolee
2020-08-18 14:23   ` [PATCH v2 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-08-19  0:11     ` Jonathan Tan
2020-08-18 20:18   ` [PATCH v2 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
2020-08-19 14:51     ` Derrick Stolee
2020-08-25 18:33   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-08-26 23:02       ` Jonathan Tan
2020-08-25 18:33     ` [PATCH v3 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-08-25 18:33     ` [PATCH v3 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-08-26 23:02       ` Jonathan Tan
2020-08-26 23:56         ` Junio C Hamano
2020-08-25 18:33     ` [PATCH v3 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-09-04 13:09     ` [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-09-04 13:09       ` [PATCH v4 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-09-17 18:11       ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 01/11] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 02/11] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 03/11] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 04/11] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 05/11] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 06/11] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 07/11] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-09-21 13:36           ` Ævar Arnfjörð Bjarmason
2020-09-21 13:43             ` Derrick Stolee
2020-09-21 19:29               ` Junio C Hamano
2020-09-17 18:11         ` [PATCH v5 08/11] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 09/11] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 10/11] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-09-17 18:11         ` [PATCH v5 11/11] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-09-17 18:35         ` [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks Junio C Hamano
2020-09-18 13:14           ` Johannes Schindelin

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git