git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Jonathan Tan <jonathantanmy@google.com>
Subject: [PATCH 0/2] repack: implement `--cruft-max-size`
Date: Thu, 7 Sep 2023 17:51:58 -0400
Message-ID: <cover.1694123506.git.me@ttaylorr.com>

(These patches should be applied on top of a merge with
tb/repack-existing-packs-cleanup, and tb/multi-cruft-pack).

This series attempts to give users some more robust tools for managing
repositories with a large number of unreachable objects by storing them
in separate cruft packs, via a new option `--cruft-max-size`, like so:

    $ git.compile repack -d --cruft --cruft-max-size=10M
    [...]
    Enumerating cruft objects: 617483, done.
    Counting objects: 100% (83791/83791), done.
    Delta compression using up to 20 threads
    Compressing objects: 100% (59696/59696), done.
    Writing objects: 100% (83791/83791), done.
    Total 83791 (delta 19251), reused 82502 (delta 19148), pack-reused 0

    $ ls -la .git/objects/pack/pack-*.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr 179144 Sep  7 17:46 .git/objects/pack/pack-1a95260d26f2897abfd2d54f1d58f535acb81d23.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr    452 Sep  7 17:46 .git/objects/pack/pack-5fde8701ae0f2e5553f1fa33de05faf12f94c07f.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr 155720 Sep  7 17:46 .git/objects/pack/pack-91f9e66921e0ebe1b5e35d34842551468cecdc28.mtimes
    -r--r--r-- 1 ttaylorr ttaylorr     56 Sep  7 17:46 .git/objects/pack/pack-95fe626743207b177b45f32b60fdc313e525ea60.mtimes

The details are explained in the second patch, but the gist is that we
will combine cruft packs up until they reach a certain threshold (as
specified by `--cruft-max-size`) and then begin a new "generation" of
cruft packs. That younger generation will grow up until it reaches the
configured threshold, at which point it will become "frozen" and then
any new unreachable objects will be written into a new generation of
cruft packs.
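
To make the "generation" idea concrete, here is a toy model in Python.
It is only a sketch of the behavior described above, not the logic in
builtin/repack.c: the function name and its parameters are made up for
illustration, sizes are plain integers (say, MiB), and it assumes that
pack contents can be split at any byte, which real packs cannot do since
objects are indivisible.

```python
def plan_cruft_repack(existing_sizes, new_bytes, max_size):
    """Return (frozen, written) cruft pack sizes for one repack.

    Assumptions for illustration only:
      - a cruft pack whose size is >= max_size is "frozen" and left alone;
      - all smaller cruft packs are combined with the new unreachable
        objects and rewritten as packs of at most max_size each.
    """
    frozen = [s for s in existing_sizes if s >= max_size]
    pending = sum(s for s in existing_sizes if s < max_size) + new_bytes

    written = []
    while pending > 0:
        chunk = min(pending, max_size)
        written.append(chunk)
        pending -= chunk
    return frozen, written


# Two full packs stay frozen; the 4-unit pack absorbs 3 new units.
print(plan_cruft_repack([10, 10, 4], 3, 10))  # ([10, 10], [7])

# The 8-unit pack fills up and a new generation starts with 3 units.
print(plan_cruft_repack([10, 8], 5, 10))      # ([10], [10, 3])
```

In this model only the youngest, not-yet-full generation is ever
rewritten; everything that has already reached the threshold is skipped.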

The goal of this series is to reduce I/O churn in repositories that
have a large number of unreachable objects, rarely prune them, or both.

Instead of having to rewrite a cruft pack containing every unreachable
object in the repository, we only have to rewrite a cruft pack up until
it reaches the given threshold, at which point it is effectively kept
(i.e., it behaves as if the cruft pack had a ".keep" file tied to it,
provided that the threshold is held constant).
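
As a rough back-of-the-envelope illustration of the savings, the
following Python sketch counts how many bytes of cruft get written over
a series of repacks, with and without a threshold. The model is an
assumption made for this example (a fixed amount of new unreachable data
per repack, no pruning, packs that split at exact byte boundaries), and
the function name is hypothetical; it is not a measurement of the
patches themselves.

```python
def cruft_bytes_written(rounds, new_per_round, max_size=None):
    """Total cruft bytes written over `rounds` repacks in a toy model.

    With max_size=None there is a single cruft pack that is rewritten in
    full on every repack. Otherwise, data that fills a pack up to
    max_size is treated as kept and is never rewritten again.
    """
    young = 0    # bytes in the cruft pack that is still growing
    written = 0  # running total of bytes written to cruft packs
    for _ in range(rounds):
        young += new_per_round
        written += young  # the growing pack is rewritten this round
        if max_size is not None:
            while young >= max_size:
                young -= max_size  # a full pack freezes and drops out
    return written


print(cruft_bytes_written(10, 1))     # 55: one ever-growing cruft pack
print(cruft_bytes_written(10, 1, 5))  # 30: packs freeze at 5 units
```

Without a threshold the cost per repack grows with the total amount of
cruft, while with one it is bounded by roughly the threshold plus
whatever is new.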

Thanks in advance for your review!

Taylor Blau (2):
  t7700: split cruft-related tests to t7704
  builtin/repack.c: implement support for `--cruft-max-size`

 Documentation/config/gc.txt  |   6 +
 Documentation/git-gc.txt     |   7 +
 Documentation/git-repack.txt |   9 +
 builtin/gc.c                 |   8 +
 builtin/repack.c             | 133 +++++++++++--
 t/t6500-gc.sh                |  27 +++
 t/t7700-repack.sh            | 121 -----------
 t/t7704-repack-cruft.sh      | 375 +++++++++++++++++++++++++++++++++++
 8 files changed, 553 insertions(+), 133 deletions(-)
 create mode 100755 t/t7704-repack-cruft.sh

-- 
2.42.0.138.g7e4e42e1aa


Thread overview: 21+ messages
2023-09-07 21:51 Taylor Blau [this message]
2023-09-07 21:52 ` [PATCH 1/2] t7700: split cruft-related tests to t7704 Taylor Blau
2023-09-08  0:01   ` Eric Sunshine
2023-09-07 21:52 ` [PATCH 2/2] builtin/repack.c: implement support for `--cruft-max-size` Taylor Blau
2023-09-07 23:42   ` Junio C Hamano
2023-09-25 18:01     ` Taylor Blau
2023-09-08 11:21   ` Patrick Steinhardt
2023-10-02 20:30     ` Taylor Blau
2023-10-03  0:44 ` [PATCH v2 0/3] repack: implement `--cruft-max-size` Taylor Blau
2023-10-03  0:44   ` [PATCH v2 1/3] t7700: split cruft-related tests to t7704 Taylor Blau
2023-10-03  0:44   ` [PATCH v2 2/3] builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE Taylor Blau
2023-10-05 11:31     ` Patrick Steinhardt
2023-10-05 17:28       ` Taylor Blau
2023-10-05 20:22         ` Junio C Hamano
2023-10-03  0:44   ` [PATCH v2 3/3] builtin/repack.c: implement support for `--max-cruft-size` Taylor Blau
2023-10-05 12:08     ` Patrick Steinhardt
2023-10-05 17:35       ` Taylor Blau
2023-10-05 20:25       ` Junio C Hamano
2023-10-07 17:20     ` [PATCH] repack: free existing_cruft array after use Jeff King
2023-10-09  1:24       ` Taylor Blau
2023-10-09 17:28         ` Junio C Hamano
