git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire`
@ 2022-09-20  1:55 Taylor Blau
  2022-09-20  1:55 ` [PATCH 1/7] Documentation/git-multi-pack-index.txt: fix typo Taylor Blau
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

This series fixes a pair of bugs that were originally pointed out by
Gregory Szorc in [1].

Namely, that both `git multi-pack-index repack` and `git
multi-pack-index expire` can cause us to "absorb" the cruft pack,
distributing its objects into a new pack, and removing its metadata.

This is worth avoiding, since even though it doesn't result in object
corruption, this bug removes semi-important metadata contained in the
.mtimes file, which controls how fast objects leave the repository
during a pruning GC.

This series teaches both sub-commands to avoid any cruft pack(s),
preserving their metadata.

Thanks in advance for your review.

[1]: https://lore.kernel.org/git/CAKQoGanPBec6wRO6uWrETaoJXdszpjRWytXaJwx6jw0mrrj-gQ@mail.gmail.com/

Taylor Blau (7):
  Documentation/git-multi-pack-index.txt: fix typo
  Documentation/git-multi-pack-index.txt: clarify expire behavior
  midx.c: prevent `expire` from removing the cruft pack
  midx.c: avoid cruft packs with `repack --batch-size=0`
  midx.c: replace `xcalloc()` with `CALLOC_ARRAY()`
  midx.c: remove unnecessary loop condition
  midx.c: avoid cruft packs with non-zero `repack --batch-size`

 Documentation/git-multi-pack-index.txt |  7 +-
 midx.c                                 | 12 +++-
 t/t5319-multi-pack-index.sh            | 94 ++++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 6 deletions(-)

-- 
2.37.0.1.g1379af2e9d

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/7] Documentation/git-multi-pack-index.txt: fix typo
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
@ 2022-09-20  1:55 ` Taylor Blau
  2022-09-20  1:55 ` [PATCH 2/7] Documentation/git-multi-pack-index.txt: clarify expire behavior Taylor Blau
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

Remove the extra space character between "tracked" and "by", which dates
back to when this paragraph was originally written in cff9711616
(multi-pack-index: prepare for 'expire' subcommand, 2019-06-10).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index a48c3d5ea6..b4a2378cd8 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -70,7 +70,7 @@ verify::
 	Verify the contents of the MIDX file.
 
 expire::
-	Delete the pack-files that are tracked 	by the MIDX file, but
+	Delete the pack-files that are tracked by the MIDX file, but
 	have no objects referenced by the MIDX. Rewrite the MIDX file
 	afterward to remove all references to these pack-files.
 
-- 
@@ -70,7 +70,7 @@ verify::
 	Verify the contents of the MIDX file.
 
 expire::
-	Delete the pack-files that are tracked 	by the MIDX file, but
+	Delete the pack-files that are tracked by the MIDX file, but
 	have no objects referenced by the MIDX. Rewrite the MIDX file
 	afterward to remove all references to these pack-files.
 
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/7] Documentation/git-multi-pack-index.txt: clarify expire behavior
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
  2022-09-20  1:55 ` [PATCH 1/7] Documentation/git-multi-pack-index.txt: fix typo Taylor Blau
@ 2022-09-20  1:55 ` Taylor Blau
  2022-09-20  1:55 ` [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack Taylor Blau
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

The `expire` sub-command of `git multi-pack-index` will never expire
`.keep` packs, regardless of whether or not any of their objects were
selected in the MIDX.

This has always been the case since 19575c7c8e (multi-pack-index:
implement 'expire' subcommand, 2019-06-10), which came after cff9711616
(multi-pack-index: prepare for 'expire' subcommand, 2019-06-10), when
this documentation was originally written.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index b4a2378cd8..11e6dc53e3 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -71,8 +71,9 @@ verify::
 
 expire::
 	Delete the pack-files that are tracked by the MIDX file, but
-	have no objects referenced by the MIDX. Rewrite the MIDX file
-	afterward to remove all references to these pack-files.
+	have no objects referenced by the MIDX (with the exception of
+	`.keep` packs). Rewrite the MIDX file afterward to remove all
+	references to these pack-files.
 
 repack::
 	Create a new pack-file containing objects in small pack-files
-- 
@@ -71,8 +71,9 @@ verify::
 
 expire::
 	Delete the pack-files that are tracked by the MIDX file, but
-	have no objects referenced by the MIDX. Rewrite the MIDX file
-	afterward to remove all references to these pack-files.
+	have no objects referenced by the MIDX (with the exception of
+	`.keep` packs). Rewrite the MIDX file afterward to remove all
+	references to these pack-files.
 
 repack::
 	Create a new pack-file containing objects in small pack-files
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
  2022-09-20  1:55 ` [PATCH 1/7] Documentation/git-multi-pack-index.txt: fix typo Taylor Blau
  2022-09-20  1:55 ` [PATCH 2/7] Documentation/git-multi-pack-index.txt: clarify expire behavior Taylor Blau
@ 2022-09-20  1:55 ` Taylor Blau
  2022-09-22 14:08   ` Derrick Stolee
  2022-09-20  1:55 ` [PATCH 4/7] midx.c: avoid cruft packs with `repack --batch-size=0` Taylor Blau
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

The `expire` sub-command unlinks any packs that are (a) contained in the
MIDX, but (b) have no objects referenced by the MIDX.

This sub-command ignores `.keep` packs, which remain on-disk even if
they have no objects referenced by the MIDX. Cruft packs, however,
aren't given the same treatment: if none of the objects contained in the
cruft pack are selected from the cruft pack by the MIDX, then the cruft
pack is eligible to be expired.

This is less than desireable, since the cruft pack has important
metadata about the individual object mtimes, which is useful to
determine how quickly an object should age out of the repository when
pruning.

Ordinarily, we wouldn't expect the contents of a cruft pack to
duplicated across non-cruft packs (and we'd expect to see the MIDX
select all cruft objects from other sources even less often). But
nonetheless, it is still possible to trick the `expire` sub-command into
removing the `.mtimes` file in this circumstance.

Teach the `expire` sub-command to ignore cruft packs in the same manner
as it does `.keep` packs, in order to keep their metadata around, even
when they are unreferenced by the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  4 ++--
 midx.c                                 |  2 +-
 t/t5319-multi-pack-index.sh            | 30 ++++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 11e6dc53e3..3696506eb3 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -72,8 +72,8 @@ verify::
 expire::
 	Delete the pack-files that are tracked by the MIDX file, but
 	have no objects referenced by the MIDX (with the exception of
-	`.keep` packs). Rewrite the MIDX file afterward to remove all
-	references to these pack-files.
+	`.keep` packs and cruft packs). Rewrite the MIDX file afterward
+	to remove all references to these pack-files.
 
 repack::
 	Create a new pack-file containing objects in small pack-files
diff --git a/midx.c b/midx.c
index c27d0e5f15..bff5b99933 100644
--- a/midx.c
+++ b/midx.c
@@ -1839,7 +1839,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 		if (prepare_midx_pack(r, m, i))
 			continue;
 
-		if (m->packs[i]->pack_keep)
+		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
 			continue;
 
 		pack_name = xstrdup(m->packs[i]->pack_name);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index afbe93f162..2d51b09680 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -847,6 +847,36 @@ test_expect_success 'expire respects .keep files' '
 	)
 '
 
+test_expect_success 'expiring unreferenced cruft pack retains pack' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		test_commit --no-tag unreachable &&
+		unreachable=$(git rev-parse HEAD) &&
+
+		git reset --hard base &&
+		git reflog expire --all --expire=all &&
+		git repack --cruft -d &&
+		mtimes="$(ls $objdir/pack/pack-*.mtimes)" &&
+
+		echo "base..$unreachable" >in &&
+		pack="$(git pack-objects --revs --delta-base-offset \
+			$objdir/pack/pack <in)" &&
+
+		# Preferring the contents of "$pack" will leave the
+		# cruft pack unreferenced (ie., none of the objects
+		# contained in the cruft pack will have their MIDX copy
+		# selected from the cruft pack).
+		git multi-pack-index write --preferred-pack="pack-$pack.pack" &&
+		git multi-pack-index expire &&
+
+		test_path_is_file "$mtimes"
+	)
+'
+
 test_expect_success 'repack --batch-size=0 repacks everything' '
 	cp -r dup dup2 &&
 	(
-- 
@@ -847,6 +847,36 @@ test_expect_success 'expire respects .keep files' '
 	)
 '
 
+test_expect_success 'expiring unreferenced cruft pack retains pack' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		test_commit --no-tag unreachable &&
+		unreachable=$(git rev-parse HEAD) &&
+
+		git reset --hard base &&
+		git reflog expire --all --expire=all &&
+		git repack --cruft -d &&
+		mtimes="$(ls $objdir/pack/pack-*.mtimes)" &&
+
+		echo "base..$unreachable" >in &&
+		pack="$(git pack-objects --revs --delta-base-offset \
+			$objdir/pack/pack <in)" &&
+
+		# Preferring the contents of "$pack" will leave the
+		# cruft pack unreferenced (ie., none of the objects
+		# contained in the cruft pack will have their MIDX copy
+		# selected from the cruft pack).
+		git multi-pack-index write --preferred-pack="pack-$pack.pack" &&
+		git multi-pack-index expire &&
+
+		test_path_is_file "$mtimes"
+	)
+'
+
 test_expect_success 'repack --batch-size=0 repacks everything' '
 	cp -r dup dup2 &&
 	(
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/7] midx.c: avoid cruft packs with `repack --batch-size=0`
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
                   ` (2 preceding siblings ...)
  2022-09-20  1:55 ` [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack Taylor Blau
@ 2022-09-20  1:55 ` Taylor Blau
  2022-09-20  1:55 ` [PATCH 5/7] midx.c: replace `xcalloc()` with `CALLOC_ARRAY()` Taylor Blau
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

The `repack` sub-command of the `git multi-pack-index` builtin creates a
new pack aggregating smaller packs contained in the MIDX up to some
given `--batch-size`.

When `--batch-size=0`, this instructs the MIDX builtin to repack
everything contained in the MIDX into a single pack.

In similar spirit as a previous commit, it is undesirable to repack the
contents of a cruft pack in this step. Teach `repack` to ignore any
cruft pack(s) when `--batch-size=0` for the same reason(s).

(The case of a non-zero `--batch-size` will be handled in a subsequent
commit).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                      |  2 ++
 t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/midx.c b/midx.c
index bff5b99933..05bcfc6f02 100644
--- a/midx.c
+++ b/midx.c
@@ -1895,6 +1895,8 @@ static int fill_included_packs_all(struct repository *r,
 			continue;
 		if (!pack_kept_objects && m->packs[i]->pack_keep)
 			continue;
+		if (m->packs[i]->is_cruft)
+			continue;
 
 		include_pack[i] = 1;
 		count++;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 2d51b09680..d967d92c20 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -784,6 +784,29 @@ test_expect_success 'repack creates a new pack' '
 	)
 '
 
+test_expect_success 'repack (all) ignores cruft pack' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		test_commit --no-tag unreachable &&
+
+		git reset --hard base &&
+		git reflog expire --all --expire=all &&
+		git repack --cruft -d &&
+
+		git multi-pack-index write &&
+
+		find $objdir/pack | sort >before &&
+		git multi-pack-index repack --batch-size=0 &&
+		find $objdir/pack | sort >after &&
+
+		test_cmp before after
+	)
+'
+
 test_expect_success 'expire removes repacked packs' '
 	(
 		cd dup &&
-- 
@@ -784,6 +784,29 @@ test_expect_success 'repack creates a new pack' '
 	)
 '
 
+test_expect_success 'repack (all) ignores cruft pack' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		test_commit --no-tag unreachable &&
+
+		git reset --hard base &&
+		git reflog expire --all --expire=all &&
+		git repack --cruft -d &&
+
+		git multi-pack-index write &&
+
+		find $objdir/pack | sort >before &&
+		git multi-pack-index repack --batch-size=0 &&
+		find $objdir/pack | sort >after &&
+
+		test_cmp before after
+	)
+'
+
 test_expect_success 'expire removes repacked packs' '
 	(
 		cd dup &&
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/7] midx.c: replace `xcalloc()` with `CALLOC_ARRAY()`
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
                   ` (3 preceding siblings ...)
  2022-09-20  1:55 ` [PATCH 4/7] midx.c: avoid cruft packs with `repack --batch-size=0` Taylor Blau
@ 2022-09-20  1:55 ` Taylor Blau
  2022-09-20  1:55 ` [PATCH 6/7] midx.c: remove unnecessary loop condition Taylor Blau
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

Replace a direct invocation of Git's `xcalloc()` wrapper with the
`CALLOC_ARRAY()` macro instead.

The latter is preferred since it is more conventional in Git's codebase,
but also because it automatically picks the correct value for the record
size.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 05bcfc6f02..d703fc5a16 100644
--- a/midx.c
+++ b/midx.c
@@ -1912,9 +1912,11 @@ static int fill_included_packs_batch(struct repository *r,
 {
 	uint32_t i, packs_to_repack;
 	size_t total_size;
-	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	struct repack_info *pack_info;
 	int pack_kept_objects = 0;
 
+	CALLOC_ARRAY(pack_info, m->num_packs);
+
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
-- 
@@ -1912,9 +1912,11 @@ static int fill_included_packs_batch(struct repository *r,
 {
 	uint32_t i, packs_to_repack;
 	size_t total_size;
-	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	struct repack_info *pack_info;
 	int pack_kept_objects = 0;
 
+	CALLOC_ARRAY(pack_info, m->num_packs);
+
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/7] midx.c: remove unnecessary loop condition
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
                   ` (4 preceding siblings ...)
  2022-09-20  1:55 ` [PATCH 5/7] midx.c: replace `xcalloc()` with `CALLOC_ARRAY()` Taylor Blau
@ 2022-09-20  1:55 ` Taylor Blau
  2022-09-20  1:55 ` [PATCH 7/7] midx.c: avoid cruft packs with non-zero `repack --batch-size` Taylor Blau
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

The fill_included_packs_batch() routine is responsible for aggregating
objects in packs with a non-zero value for the `--batch-size` option of
the `git multi-pack-index repack` sub-command.

Since this routine is explicitly called only when `--batch-size` is
non-zero, there is no point in checking that this is the case in our
loop condition.

Remove the unnecessary part of this condition to avoid confusion.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index d703fc5a16..273704caed 100644
--- a/midx.c
+++ b/midx.c
@@ -1928,7 +1928,7 @@ static int fill_included_packs_batch(struct repository *r,
 		pack_info[i].mtime = m->packs[i]->mtime;
 	}
 
-	for (i = 0; batch_size && i < m->num_objects; i++) {
+	for (i = 0; i < m->num_objects; i++) {
 		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
 		pack_info[pack_int_id].referenced_objects++;
 	}
-- 
@@ -1928,7 +1928,7 @@ static int fill_included_packs_batch(struct repository *r,
 		pack_info[i].mtime = m->packs[i]->mtime;
 	}
 
-	for (i = 0; batch_size && i < m->num_objects; i++) {
+	for (i = 0; i < m->num_objects; i++) {
 		uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
 		pack_info[pack_int_id].referenced_objects++;
 	}
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 7/7] midx.c: avoid cruft packs with non-zero `repack --batch-size`
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
                   ` (5 preceding siblings ...)
  2022-09-20  1:55 ` [PATCH 6/7] midx.c: remove unnecessary loop condition Taylor Blau
@ 2022-09-20  1:55 ` Taylor Blau
  2022-09-20 14:39 ` [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Christian Couder
  2022-09-22 14:14 ` Derrick Stolee
  8 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2022-09-20  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, derrickstolee, gregory.szorc

Apply similar treatment with respect to cruft packs as in a few commits
ago to `repack` with a non-zero `--batch-size`.

Since the case of a non-zero `--batch-size` is handled separately (in
`fill_included_packs_batch()` instead of `fill_included_packs_all()`), a
separate fix must be applied for this case.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                      |  2 ++
 t/t5319-multi-pack-index.sh | 41 +++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/midx.c b/midx.c
index 273704caed..3a8dcfe98e 100644
--- a/midx.c
+++ b/midx.c
@@ -1946,6 +1946,8 @@ static int fill_included_packs_batch(struct repository *r,
 			continue;
 		if (!pack_kept_objects && p->pack_keep)
 			continue;
+		if (p->is_cruft)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index d967d92c20..b5f9b10922 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -807,6 +807,47 @@ test_expect_success 'repack (all) ignores cruft pack' '
 	)
 '
 
+test_expect_success 'repack (--batch-size) ignores cruft pack' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk 5 &&
+		test_commit --no-tag unreachable &&
+
+		git reset --hard HEAD^ &&
+		git reflog expire --all --expire=all &&
+		git repack --cruft -d &&
+
+		test_commit four &&
+
+		find $objdir/pack -type f -name "*.pack" | sort >before &&
+		git repack -d &&
+		find $objdir/pack -type f -name "*.pack" | sort >after &&
+
+		pack="$(comm -13 before after)" &&
+		test_file_size "$pack" >sz &&
+		# Set --batch-size to twice the size of the pack created
+		# in the previous step, since this is enough to
+		# accommodate it and the cruft pack.
+		#
+		# This means that the MIDX machinery *could* combine the
+		# new and cruft packs together.
+		#
+		# We ensure that it does not below.
+		batch="$((($(cat sz) * 2)))" &&
+
+		git multi-pack-index write &&
+
+		find $objdir/pack | sort >before &&
+		git multi-pack-index repack --batch-size=$batch &&
+		find $objdir/pack | sort >after &&
+
+		test_cmp before after
+	)
+'
+
 test_expect_success 'expire removes repacked packs' '
 	(
 		cd dup &&
-- 
@@ -807,6 +807,47 @@ test_expect_success 'repack (all) ignores cruft pack' '
 	)
 '
 
+test_expect_success 'repack (--batch-size) ignores cruft pack' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk 5 &&
+		test_commit --no-tag unreachable &&
+
+		git reset --hard HEAD^ &&
+		git reflog expire --all --expire=all &&
+		git repack --cruft -d &&
+
+		test_commit four &&
+
+		find $objdir/pack -type f -name "*.pack" | sort >before &&
+		git repack -d &&
+		find $objdir/pack -type f -name "*.pack" | sort >after &&
+
+		pack="$(comm -13 before after)" &&
+		test_file_size "$pack" >sz &&
+		# Set --batch-size to twice the size of the pack created
+		# in the previous step, since this is enough to
+		# accommodate it and the cruft pack.
+		#
+		# This means that the MIDX machinery *could* combine the
+		# new and cruft packs together.
+		#
+		# We ensure that it does not below.
+		batch="$((($(cat sz) * 2)))" &&
+
+		git multi-pack-index write &&
+
+		find $objdir/pack | sort >before &&
+		git multi-pack-index repack --batch-size=$batch &&
+		find $objdir/pack | sort >after &&
+
+		test_cmp before after
+	)
+'
+
 test_expect_success 'expire removes repacked packs' '
 	(
 		cd dup &&
-- 
2.37.0.1.g1379af2e9d

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire`
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
                   ` (6 preceding siblings ...)
  2022-09-20  1:55 ` [PATCH 7/7] midx.c: avoid cruft packs with non-zero `repack --batch-size` Taylor Blau
@ 2022-09-20 14:39 ` Christian Couder
  2022-09-22 14:14 ` Derrick Stolee
  8 siblings, 0 replies; 11+ messages in thread
From: Christian Couder @ 2022-09-20 14:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster, derrickstolee, gregory.szorc

On Tue, Sep 20, 2022 at 4:59 AM Taylor Blau <me@ttaylorr.com> wrote:

> This series teaches both sub-commands to avoid any cruft pack(s),
> preserving their metadata.

Maybe:

s/to avoid any cruft pack(s)/to avoid removing any cruft pack(s)/

or:

s/to avoid any cruft pack(s)/to avoid absorbing any cruft pack(s)/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack
  2022-09-20  1:55 ` [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack Taylor Blau
@ 2022-09-22 14:08   ` Derrick Stolee
  0 siblings, 0 replies; 11+ messages in thread
From: Derrick Stolee @ 2022-09-22 14:08 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: gitster, gregory.szorc

On 9/19/2022 9:55 PM, Taylor Blau wrote:
> The `expire` sub-command unlinks any packs that are (a) contained in the
> MIDX, but (b) have no objects referenced by the MIDX.

It is important to note that this can only happen if all objects in
the pack have duplicates in other pack-files.
 
> This sub-command ignores `.keep` packs, which remain on-disk even if
> they have no objects referenced by the MIDX. Cruft packs, however,
> aren't given the same treatment: if none of the objects contained in the
> cruft pack are selected from the cruft pack by the MIDX, then the cruft
> pack is eligible to be expired.
> 
> This is less than desireable, since the cruft pack has important

s/desireable/desirable/
(according to my spell-checker)

> metadata about the individual object mtimes, which is useful to
> determine how quickly an object should age out of the repository when
> pruning.
>
> Ordinarily, we wouldn't expect the contents of a cruft pack to
> duplicated across non-cruft packs (and we'd expect to see the MIDX
> select all cruft objects from other sources even less often). But
> nonetheless, it is still possible to trick the `expire` sub-command into
> removing the `.mtimes` file in this circumstance.

I was initially unconvinced that this scenario was super-critical
to keeping the .mtimes file, but I was able to think of cases where
objects are duplicated out of the cruft pack due to de-thinning or
otherwise inefficiently packing unreachable objects into these
other pack-files.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire`
  2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
                   ` (7 preceding siblings ...)
  2022-09-20 14:39 ` [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Christian Couder
@ 2022-09-22 14:14 ` Derrick Stolee
  8 siblings, 0 replies; 11+ messages in thread
From: Derrick Stolee @ 2022-09-22 14:14 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: gitster, gregory.szorc

On 9/19/2022 9:55 PM, Taylor Blau wrote:
> This series fixes a pair of bugs that were originally pointed out by
> Gregory Szorc in [1].
> 
> Namely, that both `git multi-pack-index repack` and `git
> multi-pack-index expire` can cause us to "absorb" the cruft pack,
> distributing its objects into a new pack, and removing its metadata.
> 
> This is worth avoiding, since even though it doesn't result in object
> corruption, this bug removes semi-important metadata contained in the
> .mtimes file, which controls how fast objects leave the repository
> during a pruning GC.
> 
> This series teaches both sub-commands to avoid any cruft pack(s),
> preserving their metadata.

I gave these patches a careful review. They all look correct and
are closing possible gaps in this mixed mode of repacking with
different strategies.

I do think the 'expire' situation is very rare, but it's best to
be safe here.

Overall, the end result is that users could set up their background
maintenance as follows:

 1. maintenance.incremental-repack.schedule=daily to do an
    incremental repack every night.

 2. maintenance.gc.schedule=weekly to do a GC once a week.

 3. With gc.cruftPacks=true and a gc.pruneExpire value, the cruft
    packs will be written on that weekly job, only expiring old
    enough objects.

Before this series, that weekly GC creating cruft packs ran the
(very likely) risk of those cruft packs being rewritten with the
rest of the new object data, losing the .mtimes data.

(While the proposed schedule above is now possible, I don't
recommend it as a default. Not only will it delete objects
automatically, but the GC task is very expensive for very large
repositories and the incremental-repack task was explicitly
created to avoid that huge cost in those cases.)

Thanks for this update!
-Stolee

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-09-22 14:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-20  1:55 [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Taylor Blau
2022-09-20  1:55 ` [PATCH 1/7] Documentation/git-multi-pack-index.txt: fix typo Taylor Blau
2022-09-20  1:55 ` [PATCH 2/7] Documentation/git-multi-pack-index.txt: clarify expire behavior Taylor Blau
2022-09-20  1:55 ` [PATCH 3/7] midx.c: prevent `expire` from removing the cruft pack Taylor Blau
2022-09-22 14:08   ` Derrick Stolee
2022-09-20  1:55 ` [PATCH 4/7] midx.c: avoid cruft packs with `repack --batch-size=0` Taylor Blau
2022-09-20  1:55 ` [PATCH 5/7] midx.c: replace `xcalloc()` with `CALLOC_ARRAY()` Taylor Blau
2022-09-20  1:55 ` [PATCH 6/7] midx.c: remove unnecessary loop condition Taylor Blau
2022-09-20  1:55 ` [PATCH 7/7] midx.c: avoid cruft packs with non-zero `repack --batch-size` Taylor Blau
2022-09-20 14:39 ` [PATCH 0/7] midx: ignore cruft pack with `repack`, `expire` Christian Couder
2022-09-22 14:14 ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).