From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: sbeller@google.com, peff@peff.net, jrnieder@gmail.com,
avarab@gmail.com, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/7] Create 'expire' and 'repack' verbs for git-multi-pack-index
Date: Fri, 21 Dec 2018 08:28:36 -0800 (PST)
Message-ID: <pull.92.v2.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.92.git.gitgitgadget@gmail.com>
The multi-pack-index provides a fast way to find an object among a large
list of pack-files. It stores a single pack-reference for each object id, so
duplicate objects are ignored. Among a list of pack-files storing the same
object, the most-recently modified one is used.
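As a rough illustration of that tie-breaking rule, here is a minimal sketch in C. It is not the code in midx.c: 'struct pack_info' and pick_pack_for_object() are hypothetical names invented for this example, and the behavior on equal modified times (keep the earliest candidate) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a pack-file; not Git's struct packed_git. */
struct pack_info {
	const char *name;
	time_t mtime;	/* last-modified time of the pack-file */
};

/*
 * Among 'nr' packs that all contain the same object, return the index of
 * the most-recently modified one. On equal mtimes the earliest candidate
 * is kept (an assumption made for this sketch). Returns -1 when there are
 * no candidates.
 */
int pick_pack_for_object(const struct pack_info *packs, size_t nr)
{
	int best = -1;
	size_t i;

	assert(packs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (best < 0 || packs[i].mtime > packs[best].mtime)
			best = (int)i;
	}
	return best;
}
```

Every pack that is never picked this way for any object is exactly the kind of pack the 'expire' subcommand is meant to delete.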
Create new subcommands for the multi-pack-index builtin.
* 'git multi-pack-index expire': If we have a pack-file indexed by the
multi-pack-index, but all objects in that pack are duplicated in
more-recently modified packs, then delete that pack (and any others like
it). Delete the reference to that pack in the multi-pack-index.
* 'git multi-pack-index repack --batch-size=<size>': Starting from the oldest
pack-files covered by the multi-pack-index, find those whose on-disk size
is below the batch size until we have a collection of packs whose sizes
add up to the batch size. Create a new pack containing all objects that
the multi-pack-index references in those packs.
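The batch selection in the 'repack' bullet above can be sketched roughly as follows. This is an illustration only, not the midx_repack() implementation from the series: 'struct pack_entry' and select_repack_batch() are hypothetical names, the caller is assumed to have sorted the packs oldest-first, and selecting nothing when the small packs never add up to the batch size is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one pack covered by the multi-pack-index. */
struct pack_entry {
	uint64_t size;	/* on-disk size in bytes */
	int selected;	/* output: 1 when the pack is part of the batch */
};

/*
 * 'packs' must be sorted oldest-first. Greedily select packs smaller than
 * 'batch_size' until their sizes add up to at least 'batch_size'.
 * Returns the number of selected packs. If the total never reaches
 * 'batch_size', nothing is selected and 0 is returned (assumed behavior).
 */
size_t select_repack_batch(struct pack_entry *packs, size_t nr,
			   uint64_t batch_size)
{
	uint64_t total = 0;
	size_t i, picked = 0;

	assert(packs != NULL || nr == 0);

	for (i = 0; i < nr; i++)
		packs[i].selected = 0;

	for (i = 0; i < nr && total < batch_size; i++) {
		if (packs[i].size >= batch_size)
			continue;	/* not below the batch size: skip */
		packs[i].selected = 1;
		total += packs[i].size;
		picked++;
	}

	if (total < batch_size) {
		/* Not enough small packs: leave every pack alone. */
		for (i = 0; i < nr; i++)
			packs[i].selected = 0;
		return 0;
	}
	return picked;
}
```

With sizes {40, 500, 30, 50, 10} and a batch size of 100, the sketch selects the packs of size 40, 30 and 50 and stops, because their sum (120) has reached the batch size; the 500-byte pack is skipped as too large.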
This allows us to create a new pattern for repacking objects: run 'repack',
then, after enough time has passed that all Git commands that started before
the last 'repack' are finished, run 'expire'. This approach has some
advantages over the existing "repack everything" model:
1. Incremental. We can repack a small batch of objects at a time, instead
of repacking all reachable objects. We can also limit ourselves to the
objects that do not appear in newer pack-files.
2. Highly Available. By adding a new pack-file (and not deleting the old
pack-files) we do not interrupt concurrent Git commands, and do not
suffer performance degradation. By expiring only pack-files that have no
referenced objects, we know that Git commands that are doing normal
object lookups* will not be interrupted.
* Note: if someone concurrently runs a Git command that uses
get_all_packs(), then that command could try to read the pack-files and
pack-indexes that we are deleting during an expire command. Such
commands are usually related to object maintenance (e.g. fsck, gc,
pack-objects) or to less-often-used features (e.g. fast-import,
http-backend, server-info).
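To make the "no referenced objects" condition concrete, here is a minimal sketch of how the packs eligible for 'expire' could be identified. It follows the counting idea visible in the range-diff below (count objects per pack, skip packs with a nonzero count, skip kept packs), but it is a simplified model: 'struct midx_pack', mark_expired_packs() and the 'object_pack' array are hypothetical names, and no files are actually deleted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-pack state; not Git's struct packed_git. */
struct midx_pack {
	int has_keep;	/* non-zero when a .keep file protects the pack */
	int expired;	/* output: 1 when the pack could be deleted */
};

/*
 * object_pack[j] is the index of the pack that the multi-pack-index
 * references for object j. A pack is marked expired when no object
 * references it and it has no .keep file. Returns the number of packs
 * marked, or -1 if memory allocation fails.
 */
int mark_expired_packs(struct midx_pack *packs, uint32_t num_packs,
		       const uint32_t *object_pack, uint32_t num_objects)
{
	uint32_t i, *count;
	int dropped = 0;

	assert(packs != NULL || num_packs == 0);

	count = calloc(num_packs ? num_packs : 1, sizeof(*count));
	if (!count)
		return -1;

	/* Count how many objects the index attributes to each pack. */
	for (i = 0; i < num_objects; i++) {
		assert(object_pack[i] < num_packs);
		count[object_pack[i]]++;
	}

	for (i = 0; i < num_packs; i++) {
		packs[i].expired = 0;
		if (count[i])
			continue;	/* still referenced by the index */
		if (packs[i].has_keep)
			continue;	/* protected by a .keep file */
		packs[i].expired = 1;
		dropped++;
	}

	free(count);
	return dropped;
}
```

Because a pack is only marked when its count is zero, a command doing normal object lookups through the multi-pack-index would never be directed to a pack that this sketch marks for deletion.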
We plan to use this approach in VFS for Git to do background maintenance of
the "shared object cache", which is a Git alternate directory filled with
pack-files containing commits and trees. We currently download pack-files on
an hourly basis to keep up-to-date with the central server. The cache
servers supply packs on an hourly and daily basis, so most of the hourly
packs become useless after a new daily pack is downloaded. The 'expire'
command would clear out most of those packs, but many will be left behind
with fewer than 100 objects remaining. The 'repack' command (with a batch
size of 1-3 GB, probably) can condense the remaining packs in commands that
run for 1-3 minutes at a time. Since the daily packs range from 100-250 MB,
we will also combine and condense those packs.
Updates in V2:
* Added a method, unlink_pack_path(), to remove pack-files, but with the
additional check for a .keep file. This borrows logic from
builtin/repack.c.
* Modified documentation and commit messages to replace 'verb' with
'subcommand'. Simplified the documentation. (I left 'verbs' in the title
of the cover letter for consistency.)
Thanks,
-Stolee
Derrick Stolee (7):
repack: refactor pack deletion for future use
Docs: rearrange subcommands for multi-pack-index
multi-pack-index: prepare for 'expire' subcommand
midx: refactor permutation logic
multi-pack-index: implement 'expire' verb
multi-pack-index: prepare 'repack' subcommand
midx: implement midx_repack()
 Documentation/git-multi-pack-index.txt |  26 ++-
 builtin/multi-pack-index.c             |  12 +-
 builtin/repack.c                       |  14 +-
 midx.c                                 | 217 +++++++++++++++++++++++--
 midx.h                                 |   2 +
 packfile.c                             |  28 ++++
 packfile.h                             |   7 +
 t/t5319-multi-pack-index.sh            |  98 +++++++++++
 8 files changed, 376 insertions(+), 28 deletions(-)
base-commit: 26aa9fc81d4c7f6c3b456a29da0b7ec72e5c6595
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-92%2Fderrickstolee%2Fmidx-expire%2Fupstream-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-92/derrickstolee/midx-expire/upstream-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/92
Range-diff vs v1:
-: ---------- > 1: a697df120c repack: refactor pack deletion for future use
-: ---------- > 2: 55df6b20ff Docs: rearrange subcommands for multi-pack-index
1: 1e34b48a20 ! 3: 2529afe89e multi-pack-index: prepare for 'expire' verb
@@ -1,6 +1,6 @@
Author: Derrick Stolee <dstolee@microsoft.com>
- multi-pack-index: prepare for 'expire' verb
+ multi-pack-index: prepare for 'expire' subcommand
The multi-pack-index tracks objects in a collection of pack-files.
Only one copy of each object is indexed, using the modified time
@@ -8,12 +8,12 @@
have a pack-file with no referenced objects because all objects
have a duplicate in a newer pack-file.
- Introduce a new 'expire' verb to the multi-pack-index builtin.
- This verb will delete these unused pack-files and rewrite the
+ Introduce a new 'expire' subcommand to the multi-pack-index builtin.
+ This subcommand will delete these unused pack-files and rewrite the
multi-pack-index to no longer refer to those files. More details
about the specifics will follow as the method is implemented.
- Add a test that verifies the 'expire' verb is correctly wired,
+ Add a test that verifies the 'expire' subcommand is correctly wired,
but will still be valid when the verb is implemented. Specifically,
create a set of packs that should all have referenced objects and
should not be removed during an 'expire' operation.
@@ -24,16 +24,13 @@
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@
- When given as the verb, verify the contents of the MIDX file
- at `<dir>/packs/multi-pack-index`.
+ verify::
+ Verify the contents of the MIDX file.
+expire::
-+ When given as the verb, delete the pack-files that are tracked
-+ by the MIDX file at `<dir>/packs/multi-pack-index` but have
-+ no objects referenced by the MIDX. All objects in these pack-
-+ files have another copy in a more-recently modified pack-file.
-+ Rewrite the MIDX file afterward to remove all references to
-+ these pack-files.
++ Delete the pack-files that are tracked by the MIDX file, but
++ have no objects referenced by the MIDX. Rewrite the MIDX file
++ afterward to remove all references to these pack-files.
+
EXAMPLES
2: 8f496ccb46 = 4: 0c29a242fe midx: refactor permutation logic
3: 244bdf2a6f ! 5: 1c4af93f5e multi-pack-index: implement 'expire' verb
@@ -75,6 +75,7 @@
+ drop_index++;
+ i--;
+ missing_drops++;
++ continue;
+ }
+ }
+
@@ -114,8 +115,6 @@
{
- return 0;
+ uint32_t i, *count, result = 0;
-+ size_t dirlen;
-+ struct strbuf buf = STRBUF_INIT;
+ struct string_list packs_to_drop = STRING_LIST_INIT_DUP;
+ struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+
@@ -128,31 +127,27 @@
+ count[pack_int_id]++;
+ }
+
-+ strbuf_addstr(&buf, object_dir);
-+ strbuf_addstr(&buf, "/pack/");
-+ dirlen = buf.len;
-+
+ for (i = 0; i < m->num_packs; i++) {
++ char *pack_name;
++
+ if (count[i])
+ continue;
+
-+ if (m->packs[i]) {
-+ close_pack(m->packs[i]);
-+ m->packs[i] = NULL;
-+ }
++ if (prepare_midx_pack(m, i))
++ continue;
+
-+ string_list_insert(&packs_to_drop, m->pack_names[i]);
++ if (m->packs[i]->pack_keep)
++ continue;
+
-+ strbuf_setlen(&buf, dirlen);
-+ strbuf_addstr(&buf, m->pack_names[i]);
-+ unlink(buf.buf);
++ pack_name = xstrdup(m->packs[i]->pack_name);
++ close_pack(m->packs[i]);
++ FREE_AND_NULL(m->packs[i]);
+
-+ strip_suffix_mem(buf.buf, &buf.len, "idx");
-+ strbuf_addstr(&buf, "pack");
-+ unlink(buf.buf);
++ string_list_insert(&packs_to_drop, m->pack_names[i]);
++ unlink_pack_path(pack_name, 0);
++ free(pack_name);
+ }
+
-+ strbuf_release(&buf);
+ free(count);
+
+ if (packs_to_drop.nr)
4: 72b2139591 ! 6: af08e21c97 multi-pack-index: prepare 'repack' verb
@@ -1,6 +1,6 @@
Author: Derrick Stolee <dstolee@microsoft.com>
- multi-pack-index: prepare 'repack' verb
+ multi-pack-index: prepare 'repack' subcommand
In an environment where the multi-pack-index is useful, it is due
to many pack-files and an inability to repack the object store
@@ -10,16 +10,16 @@
to ensure the object store is highly available and the repack
operation does not interrupt concurrent git commands.
- Introduce a 'repack' verb to 'git multi-pack-index' that takes a
- '--batch-size' option. The verb will inspect the multi-pack-index
- for referenced pack-files whose size is smaller than the batch
- size, until collecting a list of pack-files whose sizes sum to
- larger than the batch size. Then, a new pack-file will be created
- containing the objects from those pack-files that are referenced
- by the multi-pack-index. The resulting pack is likely to actually
- be smaller than the batch size due to compression and the fact
- that there may be objects in the pack-files that have duplicate
- copies in other pack-files.
+ Introduce a 'repack' subcommand to 'git multi-pack-index' that
+ takes a '--batch-size' option. The verb will inspect the
+ multi-pack-index for referenced pack-files whose size is smaller
+ than the batch size, until collecting a list of pack-files whose
+ sizes sum to larger than the batch size. Then, a new pack-file
+ will be created containing the objects from those pack-files that
+ are referenced by the multi-pack-index. The resulting pack is
+ likely to actually be smaller than the batch size due to
+ compression and the fact that there may be objects in the pack-
+ files that have duplicate copies in other pack-files.
The current change introduces the command-line arguments, and we
add a test that ensures we parse these options properly. Since
@@ -32,20 +32,19 @@
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@
- Rewrite the MIDX file afterward to remove all references to
- these pack-files.
+ have no objects referenced by the MIDX. Rewrite the MIDX file
+ afterward to remove all references to these pack-files.
+repack::
-+ When given as the verb, collect a batch of pack-files whose
-+ size are all at most the size given by --batch-size, but
-+ whose sizes sum to larger than --batch-size. The batch is
-+ selected by greedily adding small pack-files starting with
-+ the oldest pack-files that fit the size. Create a new pack-
-+ file containing the objects the multi-pack-index indexes
-+ into thos pack-files, and rewrite the multi-pack-index to
-+ contain that pack-file. A later run of 'git multi-pack-index
-+ expire' will delete the pack-files that were part of this
-+ batch.
++ Collect a batch of pack-files whose size are all at most the
++ size given by --batch-size, but whose sizes sum to larger
++ than --batch-size. The batch is selected by greedily adding
++ small pack-files starting with the oldest pack-files that fit
++ the size. Create a new pack-file containing the objects the
++ multi-pack-index indexes into those pack-files, and rewrite
++ the multi-pack-index to contain that pack-file. A later run
++ of 'git multi-pack-index expire' will delete the pack-files
++ that were part of this batch.
+
EXAMPLES
@@ -123,7 +122,7 @@
)
'
-+test_expect_success 'repack does not create any packs' '
++test_expect_success 'repack with minimum size does not alter existing packs' '
+ (
+ cd dup &&
+ ls .git/objects/pack >expect &&
5: 41ef671ec8 = 7: bef7aa007c midx: implement midx_repack()
--
gitgitgadget
Thread overview: 92+ messages
2018-12-10 18:06 [PATCH 0/5] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee via GitGitGadget
2018-12-10 18:06 ` [PATCH 1/5] multi-pack-index: prepare for 'expire' verb Derrick Stolee via GitGitGadget
2018-12-11 1:35 ` Stefan Beller
2018-12-11 1:59 ` SZEDER Gábor
2018-12-11 12:32 ` Derrick Stolee
2018-12-10 18:06 ` [PATCH 2/5] midx: refactor permutation logic Derrick Stolee via GitGitGadget
2018-12-10 18:06 ` [PATCH 3/5] multi-pack-index: implement 'expire' verb Derrick Stolee via GitGitGadget
2018-12-10 18:06 ` [PATCH 4/5] multi-pack-index: prepare 'repack' verb Derrick Stolee via GitGitGadget
2018-12-11 1:54 ` Stefan Beller
2018-12-11 12:45 ` Derrick Stolee
2018-12-10 18:06 ` [PATCH 5/5] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2018-12-11 2:32 ` Stefan Beller
2018-12-11 13:00 ` Derrick Stolee
2018-12-12 7:40 ` Junio C Hamano
2018-12-13 4:23 ` Junio C Hamano
2018-12-21 16:28 ` Derrick Stolee via GitGitGadget [this message]
2018-12-21 16:28 ` [PATCH v2 1/7] repack: refactor pack deletion for future use Derrick Stolee via GitGitGadget
2018-12-21 16:28 ` [PATCH v2 2/7] Docs: rearrange subcommands for multi-pack-index Derrick Stolee via GitGitGadget
2018-12-21 16:28 ` [PATCH v2 3/7] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee via GitGitGadget
2018-12-21 16:28 ` [PATCH v2 4/7] midx: refactor permutation logic Derrick Stolee via GitGitGadget
2018-12-21 16:28 ` [PATCH v2 5/7] multi-pack-index: implement 'expire' verb Derrick Stolee via GitGitGadget
2018-12-21 16:28 ` [PATCH v2 6/7] multi-pack-index: prepare 'repack' subcommand Derrick Stolee via GitGitGadget
2018-12-21 16:28 ` [PATCH v2 7/7] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2019-01-09 15:21 ` [PATCH v3 0/9] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee via GitGitGadget
2019-01-09 15:21 ` [PATCH v3 1/9] repack: refactor pack deletion for future use Derrick Stolee via GitGitGadget
2019-01-09 15:21 ` [PATCH v3 2/9] Docs: rearrange subcommands for multi-pack-index Derrick Stolee via GitGitGadget
2019-01-09 15:21 ` [PATCH v3 3/9] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee via GitGitGadget
2019-01-09 15:21 ` [PATCH v3 4/9] midx: simplify computation of pack name lengths Derrick Stolee via GitGitGadget
2019-01-09 15:21 ` [PATCH v3 5/9] midx: refactor permutation logic and pack sorting Derrick Stolee via GitGitGadget
2019-01-23 21:00 ` Jonathan Tan
2019-01-24 17:34 ` Derrick Stolee
2019-01-24 19:17 ` Derrick Stolee
2019-01-09 15:21 ` [PATCH v3 6/9] multi-pack-index: implement 'expire' verb Derrick Stolee via GitGitGadget
2019-01-09 15:54 ` SZEDER Gábor
2019-01-10 18:05 ` Junio C Hamano
2019-01-23 22:13 ` Jonathan Tan
2019-01-24 17:36 ` Derrick Stolee
2019-01-09 15:21 ` [PATCH v3 7/9] multi-pack-index: prepare 'repack' subcommand Derrick Stolee via GitGitGadget
2019-01-09 15:56 ` SZEDER Gábor
2019-01-23 22:38 ` Jonathan Tan
2019-01-24 19:36 ` Derrick Stolee
2019-01-24 21:38 ` Jonathan Tan
2019-01-09 15:21 ` [PATCH v3 8/9] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2019-01-23 22:33 ` Jonathan Tan
2019-01-09 15:21 ` [PATCH v3 9/9] multi-pack-index: test expire while adding packs Derrick Stolee via GitGitGadget
2019-01-17 15:27 ` [PATCH v3 0/9] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee
2019-01-23 22:44 ` Jonathan Tan
2019-01-24 21:51 ` [PATCH v4 00/10] " Derrick Stolee via GitGitGadget
2019-01-24 21:51 ` [PATCH v4 01/10] repack: refactor pack deletion for future use Derrick Stolee via GitGitGadget
2019-01-24 21:51 ` [PATCH v4 02/10] Docs: rearrange subcommands for multi-pack-index Derrick Stolee via GitGitGadget
2019-01-24 21:51 ` [PATCH v4 03/10] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee via GitGitGadget
2019-01-24 21:51 ` [PATCH v4 04/10] midx: simplify computation of pack name lengths Derrick Stolee via GitGitGadget
2019-01-24 21:51 ` [PATCH v4 05/10] midx: refactor permutation logic and pack sorting Derrick Stolee via GitGitGadget
2019-01-24 21:51 ` [PATCH v4 06/10] multi-pack-index: implement 'expire' subcommand Derrick Stolee via GitGitGadget
2019-01-24 21:51 ` [PATCH v4 07/10] multi-pack-index: prepare 'repack' subcommand Derrick Stolee via GitGitGadget
2019-01-25 23:24 ` Josh Steadmon
2019-01-24 21:52 ` [PATCH v4 08/10] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2019-01-26 17:10 ` Derrick Stolee
2019-01-27 22:50 ` Junio C Hamano
2019-01-24 21:52 ` [PATCH v4 09/10] multi-pack-index: test expire while adding packs Derrick Stolee via GitGitGadget
2019-01-24 21:52 ` [PATCH v4 10/10] midx: add test that 'expire' respects .keep files Derrick Stolee via GitGitGadget
2019-01-24 22:14 ` [PATCH v4 00/10] Create 'expire' and 'repack' verbs for git-multi-pack-index Jonathan Tan
2019-01-25 23:49 ` Josh Steadmon
2019-04-24 15:14 ` [PATCH v5 00/11] " Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 01/11] repack: refactor pack deletion for future use Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 02/11] Docs: rearrange subcommands for multi-pack-index Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 03/11] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 04/11] midx: simplify computation of pack name lengths Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 05/11] midx: refactor permutation logic and pack sorting Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 06/11] multi-pack-index: implement 'expire' subcommand Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 07/11] multi-pack-index: prepare 'repack' subcommand Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 08/11] midx: implement midx_repack() Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 09/11] multi-pack-index: test expire while adding packs Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 10/11] midx: add test that 'expire' respects .keep files Derrick Stolee
2019-04-24 15:14 ` [PATCH v5 11/11] t5319-multi-pack-index.sh: test batch size zero Derrick Stolee
2019-04-25 5:38 ` [PATCH v5 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index Junio C Hamano
2019-04-25 11:06 ` Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 " Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 01/11] repack: refactor pack deletion for future use Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 02/11] Docs: rearrange subcommands for multi-pack-index Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 03/11] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 04/11] midx: simplify computation of pack name lengths Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 05/11] midx: refactor permutation logic and pack sorting Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 06/11] multi-pack-index: implement 'expire' subcommand Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 07/11] multi-pack-index: prepare 'repack' subcommand Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 08/11] midx: implement midx_repack() Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 09/11] multi-pack-index: test expire while adding packs Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 10/11] midx: add test that 'expire' respects .keep files Derrick Stolee
2019-05-14 18:47 ` [PATCH v6 11/11] t5319-multi-pack-index.sh: test batch size zero Derrick Stolee
2019-06-10 14:15 ` [PATCH v6 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee
2019-06-10 17:31 ` Junio C Hamano
2019-06-10 17:57 ` Derrick Stolee