git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks
@ 2022-05-20 19:01 Taylor Blau
  2022-05-20 19:01 ` [PATCH 1/3] repack: respect --keep-pack with geometric repack Taylor Blau
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Taylor Blau @ 2022-05-20 19:01 UTC (permalink / raw)
  To: git; +Cc: vdye, gitster

This series fixes two issues that Victoria and I noticed while working on an
unrelated issue yesterday.

  - The first patch comes from Victoria's earlier submission[1], and addresses
    an issue where packs specified as kept via the `--keep-pack` option could
    potentially be removed (without rewriting their objects) during a
    `--geometric` repack.

    The first patch is Victoria's alone, with some minor fixups applied from my
    review in [2]. It's included in this series since it's related, and avoids
    any conflicts when merging.

  - The latter two patches are mine, and address an issue where specifying a
    `--max-pack-size` value during a `--geometric` repack could result in object
    loss because of a false positive in our "did we write a pack with this
    name?" check (which can occur when the list of packs we wrote isn't sorted).

    The first of these two patches demonstrates the issue (done in a separate
    patch, since the scenario is quite involved), and the second patch fixes the
    bug.

Thanks in advance for your review.

[1]: https://lore.kernel.org/git/pull.1235.git.1653064572170.gitgitgadget@gmail.com/
[2]: https://lore.kernel.org/git/YofJLv8+x5J7yPmf@nand.local/

Taylor Blau (2):
  t7703: demonstrate object corruption with pack.packSizeLimit
  builtin/repack.c: ensure that `names` is sorted

Victoria Dye (1):
  repack: respect --keep-pack with geometric repack

 builtin/repack.c            | 49 ++++++++++++++++++------
 t/t7703-repack-geometric.sh | 75 +++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+), 12 deletions(-)

-- 
2.36.1.94.gb0d54bedca

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] repack: respect --keep-pack with geometric repack
  2022-05-20 19:01 [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Taylor Blau
@ 2022-05-20 19:01 ` Taylor Blau
  2022-05-20 19:01 ` [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit Taylor Blau
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2022-05-20 19:01 UTC (permalink / raw)
  To: git; +Cc: vdye, gitster

From: Victoria Dye <vdye@github.com>

Update 'repack' to ignore packs named on the command line with the
'--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat
command line-kept packs the same way it treats packs with an on-disk '.keep'
file (that is, skip the pack and do not include it in the 'geometry'
structure).

Without this handling, a '--keep-pack' pack would be included in the
'geometry' structure. If the pack is *before* the geometry split line (with
at least one other pack and/or loose objects present), 'repack' assumes the
pack's contents are "rolled up" into another pack via 'pack-objects'.
However, because the internally-invoked 'pack-objects' properly excludes
'--keep-pack' objects, any new pack it creates will not contain the kept
objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since
it assumes 'pack-objects' created a new pack with its contents), resulting
in possible object loss and repository corruption.

Add a test ensuring that '--keep-pack' packs are now appropriately handled.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c            | 46 ++++++++++++++++++++++++++++---------
 t/t7703-repack-geometric.sh | 28 ++++++++++++++++++++++
 2 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index d1a563d5b6..ea56e92ad5 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -137,6 +137,8 @@ static void collect_pack_filenames(struct string_list *fname_nonkept_list,
 			string_list_append_nodup(fname_nonkept_list, fname);
 	}
 	closedir(dir);
+
+	string_list_sort(fname_kept_list);
 }
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
@@ -332,17 +334,38 @@ static int geometry_cmp(const void *va, const void *vb)
 	return 0;
 }
 
-static void init_pack_geometry(struct pack_geometry **geometry_p)
+static void init_pack_geometry(struct pack_geometry **geometry_p,
+			       struct string_list *existing_kept_packs)
 {
 	struct packed_git *p;
 	struct pack_geometry *geometry;
+	struct strbuf buf = STRBUF_INIT;
 
 	*geometry_p = xcalloc(1, sizeof(struct pack_geometry));
 	geometry = *geometry_p;
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
-		if (!pack_kept_objects && p->pack_keep)
-			continue;
+		if (!pack_kept_objects) {
+			/*
+			 * Any pack that has its pack_keep bit set will appear
+			 * in existing_kept_packs below, but this saves us from
+			 * doing a more expensive check.
+			 */
+			if (p->pack_keep)
+				continue;
+
+			/*
+			 * The pack may be kept via the --keep-pack option;
+			 * check 'existing_kept_packs' to determine whether to
+			 * ignore it.
+			 */
+			strbuf_reset(&buf);
+			strbuf_addstr(&buf, pack_basename(p));
+			strbuf_strip_suffix(&buf, ".pack");
+
+			if (string_list_has_string(existing_kept_packs, buf.buf))
+				continue;
+		}
 
 		ALLOC_GROW(geometry->pack,
 			   geometry->pack_nr + 1,
@@ -353,6 +376,7 @@ static void init_pack_geometry(struct pack_geometry **geometry_p)
 	}
 
 	QSORT(geometry->pack, geometry->pack_nr, geometry_cmp);
+	strbuf_release(&buf);
 }
 
 static void split_pack_geometry(struct pack_geometry *geometry, int factor)
@@ -714,17 +738,20 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		strbuf_release(&path);
 	}
 
+	packdir = mkpathdup("%s/pack", get_object_directory());
+	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
+	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
+
+	collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
+			       &keep_pack_list);
+
 	if (geometric_factor) {
 		if (pack_everything)
 			die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
-		init_pack_geometry(&geometry);
+		init_pack_geometry(&geometry, &existing_kept_packs);
 		split_pack_geometry(geometry, geometric_factor);
 	}
 
-	packdir = mkpathdup("%s/pack", get_object_directory());
-	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
-	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
-
 	sigchain_push_common(remove_pack_on_signal);
 
 	prepare_pack_objects(&cmd, &po_args);
@@ -764,9 +791,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (use_delta_islands)
 		strvec_push(&cmd.args, "--delta-islands");
 
-	collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs,
-			       &keep_pack_list);
-
 	if (pack_everything & ALL_INTO_ONE) {
 		repack_promisor_objects(&po_args, &names);
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index bdbbcbf1ec..91bb2b37a8 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -180,6 +180,34 @@ test_expect_success '--geometric ignores kept packs' '
 	)
 '
 
+test_expect_success '--geometric ignores --keep-pack packs' '
+	git init geometric &&
+	test_when_finished "rm -fr geometric" &&
+	(
+		cd geometric &&
+
+		# Create two equal-sized packs
+		test_commit kept && # 3 objects
+		git repack -d &&
+		test_commit pack && # 3 objects
+		git repack -d &&
+
+		find $objdir/pack -type f -name "*.pack" | sort >packs.before &&
+		git repack --geometric 2 -dm \
+			--keep-pack="$(basename "$(head -n 1 packs.before)")" >out &&
+		find $objdir/pack -type f -name "*.pack" | sort >packs.after &&
+
+		# Packs should not have changed (only one non-kept pack, no
+		# loose objects), but $midx should now exist.
+		grep "Nothing new to pack" out &&
+		test_path_is_file $midx &&
+
+		test_cmp packs.before packs.after &&
+
+		git fsck
+	)
+'
+
 test_expect_success '--geometric chooses largest MIDX preferred pack' '
 	git init geometric &&
 	test_when_finished "rm -fr geometric" &&
-- 
2.36.1.94.gb0d54bedca


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit
  2022-05-20 19:01 [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Taylor Blau
  2022-05-20 19:01 ` [PATCH 1/3] repack: respect --keep-pack with geometric repack Taylor Blau
@ 2022-05-20 19:01 ` Taylor Blau
  2022-05-20 19:42   ` Victoria Dye
  2022-05-20 20:54   ` Junio C Hamano
  2022-05-20 19:01 ` [PATCH 3/3] builtin/repack.c: ensure that `names` is sorted Taylor Blau
  2022-05-20 19:46 ` [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Victoria Dye
  3 siblings, 2 replies; 10+ messages in thread
From: Taylor Blau @ 2022-05-20 19:01 UTC (permalink / raw)
  To: git; +Cc: vdye, gitster

When doing a `--geometric=<d>` repack, `git repack` determines a
splitting point among packs ordered by their object count such that:

  - each pack above the split has at least `<d>` times as many objects
    as the next-largest pack by object count, and
  - the first pack above the split has at least `<d>` times as many
    object as the sum of all packs below the split line combined

`git repack` then creates a pack containing all of the objects contained
in packs below the split line by running `git pack-objects
--stdin-packs` underneath. Once packs are moved into place, then any
packs below the split line are removed, since their objects were just
combined into a new pack.

But `git repack` tries to be careful to avoid removing a pack that it
just wrote, by checking:

    struct packed_git *p = geometry->pack[i];
    if (string_list_has_string(&names, hash_to_hex(p->hash)))
      continue;

in the `delete_redundant` and `geometric` conditional towards the end of
`cmd_repack`.

But it's possible to trick `git repack` into not recognizing a pack that
it just wrote when `names` is out-of-order (which violates
`string_list_has_string()`'s assumption that the list is sorted and thus
binary search-able).

When this happens in just the right circumstances, it is possible to
remove a pack that we just wrote, leading to object corruption.

Luckily, this is quite difficult to provoke in practice (for a couple of
reasons):

  - we ordinarily write just one pack, so `names` usually contains just
    one entry, and is thus sorted
  - when we do write more than one pack (e.g., due to `--max-pack-size`)
    we have to: (a) write a pack identical to one that already
    exists, (b) have that pack be below the split line, and (c) have
    the set of packs written by `pack-objects` occur in an order which
    tricks `string_list_has_string()`.

Demonstrate the above scenario in a failing test, which causes `git
repack --geometric` to write a pack which occurs below the split line,
_and_ fail to recognize that it wrote that pack.

The following patch will fix this bug.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t7703-repack-geometric.sh | 47 +++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 91bb2b37a8..2cd1de7295 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -7,6 +7,7 @@ test_description='git repack --geometric works correctly'
 GIT_TEST_MULTI_PACK_INDEX=0
 
 objdir=.git/objects
+packdir=$objdir/pack
 midx=$objdir/pack/multi-pack-index
 
 test_expect_success '--geometric with no packs' '
@@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' '
 	)
 '
 
+test_expect_failure '--geometric with pack.packSizeLimit' '
+	git init pack-rewrite &&
+	test_when_finished "rm -fr pack-rewrite" &&
+	(
+		cd pack-rewrite &&
+
+		test-tool genrandom foo 1048576 >foo &&
+		test-tool genrandom bar 1048576 >bar &&
+
+		git add foo bar &&
+		test_tick &&
+		git commit -m base &&
+
+		git rev-parse HEAD:foo HEAD:bar >p1.objects &&
+		git rev-parse HEAD HEAD^{tree} >p2.objects &&
+
+		# These two packs each contain two objects, so the following
+		# `--geometric` repack will try to combine them.
+		p1="$(git pack-objects $packdir/pack <p1.objects)" &&
+		p2="$(git pack-objects $packdir/pack <p2.objects)" &&
+
+		# Remove any loose objects in packs, since we do not want extra
+		# copies around (which would mask over potential object
+		# corruption issues).
+		git prune-packed &&
+
+		# Both p1 and p2 will be rolled up, but pack-objects will write
+		# three packs:
+		#
+		#   - one containing object "foo",
+		#   - another containing object "bar",
+		#   - a final pack containing the commit and tree objects
+		#     (identical to p2 above)
+		git repack --geometric 2 -d --max-pack-size=1048576 &&
+
+		# Ensure `repack` can detect that the third pack it wrote
+		# (containing just the tree and commit objects) was identical to
+		# one that was below the geometric split, so that we can save it
+		# from deletion.
+		#
+		# If `repack` fails to do that, we will incorrectly delete p2,
+		# causing object corruption.
+		git fsck
+	)
+'
+
 test_done
-- 
2.36.1.94.gb0d54bedca


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] builtin/repack.c: ensure that `names` is sorted
  2022-05-20 19:01 [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Taylor Blau
  2022-05-20 19:01 ` [PATCH 1/3] repack: respect --keep-pack with geometric repack Taylor Blau
  2022-05-20 19:01 ` [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit Taylor Blau
@ 2022-05-20 19:01 ` Taylor Blau
  2022-05-20 19:46 ` [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Victoria Dye
  3 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2022-05-20 19:01 UTC (permalink / raw)
  To: git; +Cc: vdye, gitster

The previous patch demonstrates a scenario where the list of packs
written by `pack-objects` (and stored in the `names` string_list) is
out-of-order, and can thus cause us to delete packs we shouldn't.

This patch resolves that bug by ensuring that `names` is sorted in all
cases, not just when

    delete_redundant && pack_everything & ALL_INTO_ONE

is true.

Because we did sort `names` in that case (which, prior to `--geometric`
repacks, was the only time we would actually delete packs, this is only
a bug for `--geometric` repacks.

It would be sufficient to only sort `names` when `delete_redundant` is
set to a non-zero value. But sorting a small list of strings is cheap,
and it is defensive against future calls to `string_list_has_string()`
on this list.

Co-discovered-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c            | 3 ++-
 t/t7703-repack-geometric.sh | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index ea56e92ad5..0e4aae80c0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -856,6 +856,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (!names.nr && !po_args.quiet)
 		printf_ln(_("Nothing new to pack."));
 
+	string_list_sort(&names);
+
 	for_each_string_list_item(item, &names) {
 		item->util = (void *)(uintptr_t)populate_pack_exts(item->string);
 	}
@@ -896,7 +898,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	if (delete_redundant && pack_everything & ALL_INTO_ONE) {
 		const int hexsz = the_hash_algo->hexsz;
-		string_list_sort(&names);
 		for_each_string_list_item(item, &existing_nonkept_packs) {
 			char *sha1;
 			size_t len = strlen(item->string);
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 2cd1de7295..da87f8b2d8 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -231,7 +231,7 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' '
 	)
 '
 
-test_expect_failure '--geometric with pack.packSizeLimit' '
+test_expect_success '--geometric with pack.packSizeLimit' '
 	git init pack-rewrite &&
 	test_when_finished "rm -fr pack-rewrite" &&
 	(
-- 
2.36.1.94.gb0d54bedca

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit
  2022-05-20 19:01 ` [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit Taylor Blau
@ 2022-05-20 19:42   ` Victoria Dye
  2022-05-20 23:22     ` Taylor Blau
  2022-05-20 20:54   ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Victoria Dye @ 2022-05-20 19:42 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: gitster

Taylor Blau wrote:
> When doing a `--geometric=<d>` repack, `git repack` determines a
> splitting point among packs ordered by their object count such that:
> 
>   - each pack above the split has at least `<d>` times as many objects
>     as the next-largest pack by object count, and
>   - the first pack above the split has at least `<d>` times as many
>     object as the sum of all packs below the split line combined
> 
> `git repack` then creates a pack containing all of the objects contained
> in packs below the split line by running `git pack-objects
> --stdin-packs` underneath. Once packs are moved into place, then any
> packs below the split line are removed, since their objects were just
> combined into a new pack.
> 
> But `git repack` tries to be careful to avoid removing a pack that it
> just wrote, by checking:
> 
>     struct packed_git *p = geometry->pack[i];
>     if (string_list_has_string(&names, hash_to_hex(p->hash)))
>       continue;
> 
> in the `delete_redundant` and `geometric` conditional towards the end of
> `cmd_repack`.
> 
> But it's possible to trick `git repack` into not recognizing a pack that
> it just wrote when `names` is out-of-order (which violates
> `string_list_has_string()`'s assumption that the list is sorted and thus
> binary search-able).
> 
> When this happens in just the right circumstances, it is possible to
> remove a pack that we just wrote, leading to object corruption.
> 
> Luckily, this is quite difficult to provoke in practice (for a couple of
> reasons):
> 
>   - we ordinarily write just one pack, so `names` usually contains just
>     one entry, and is thus sorted
>   - when we do write more than one pack (e.g., due to `--max-pack-size`)
>     we have to: (a) write a pack identical to one that already
>     exists, (b) have that pack be below the split line, and (c) have
>     the set of packs written by `pack-objects` occur in an order which
>     tricks `string_list_has_string()`.
> 
> Demonstrate the above scenario in a failing test, which causes `git
> repack --geometric` to write a pack which occurs below the split line,
> _and_ fail to recognize that it wrote that pack.
> 
> The following patch will fix this bug.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  t/t7703-repack-geometric.sh | 47 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 91bb2b37a8..2cd1de7295 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -7,6 +7,7 @@ test_description='git repack --geometric works correctly'
>  GIT_TEST_MULTI_PACK_INDEX=0
>  
>  objdir=.git/objects
> +packdir=$objdir/pack
>  midx=$objdir/pack/multi-pack-index
>  
>  test_expect_success '--geometric with no packs' '
> @@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' '
>  	)
>  '
>  
> +test_expect_failure '--geometric with pack.packSizeLimit' '
> +	git init pack-rewrite &&
> +	test_when_finished "rm -fr pack-rewrite" &&
> +	(
> +		cd pack-rewrite &&
> +
> +		test-tool genrandom foo 1048576 >foo &&
> +		test-tool genrandom bar 1048576 >bar &&
> +

I was a bit worried about this test being flaky in the future (relying on
particular pseudorandomly-generated file contents and the subsequent
ordering of hashes on the packs). But, since neither 'genrandom' nor the
pack hash generation seem likely to change (and I can't come up with an
alternative to this approach anyway), the test looks good as-is. 

> +		git add foo bar &&
> +		test_tick &&
> +		git commit -m base &&
> +
> +		git rev-parse HEAD:foo HEAD:bar >p1.objects &&
> +		git rev-parse HEAD HEAD^{tree} >p2.objects &&
> +
> +		# These two packs each contain two objects, so the following
> +		# `--geometric` repack will try to combine them.
> +		p1="$(git pack-objects $packdir/pack <p1.objects)" &&
> +		p2="$(git pack-objects $packdir/pack <p2.objects)" &&
> +
> +		# Remove any loose objects in packs, since we do not want extra
> +		# copies around (which would mask over potential object
> +		# corruption issues).
> +		git prune-packed &&
> +
> +		# Both p1 and p2 will be rolled up, but pack-objects will write
> +		# three packs:
> +		#
> +		#   - one containing object "foo",
> +		#   - another containing object "bar",
> +		#   - a final pack containing the commit and tree objects
> +		#     (identical to p2 above)
> +		git repack --geometric 2 -d --max-pack-size=1048576 &&
> +
> +		# Ensure `repack` can detect that the third pack it wrote
> +		# (containing just the tree and commit objects) was identical to
> +		# one that was below the geometric split, so that we can save it
> +		# from deletion.
> +		#
> +		# If `repack` fails to do that, we will incorrectly delete p2,
> +		# causing object corruption.
> +		git fsck
> +	)
> +'
> +
>  test_done


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks
  2022-05-20 19:01 [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Taylor Blau
                   ` (2 preceding siblings ...)
  2022-05-20 19:01 ` [PATCH 3/3] builtin/repack.c: ensure that `names` is sorted Taylor Blau
@ 2022-05-20 19:46 ` Victoria Dye
  2022-05-20 20:05   ` Derrick Stolee
  3 siblings, 1 reply; 10+ messages in thread
From: Victoria Dye @ 2022-05-20 19:46 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: gitster

Taylor Blau wrote:
> This series fixes two issues that Victoria and I noticed while working on an
> unrelated issue yesterday.
> 
>   - The first patch comes from Victoria's earlier submission[1], and addresses
>     an issue where packs specified as kept via the `--keep-pack` option could
>     potentially be removed (without rewriting their objects) during a
>     `--geometric` repack.
> 
>     The first patch is Victoria's alone, with some minor fixups applied from my
>     review in [2]. It's included in this series since it's related, and avoids
>     any conflicts when merging.
> 

I'm happy with the fixes you applied here and don't have anything else I'd
like to add this patch.

>   - The latter two patches are mine, and address an issue where specifying a
>     `--max-pack-size` value during a `--geometric` repack could result in object
>     loss because of a false positive in our "did we write a pack with this
>     name?" check (which can occur when the list of packs we wrote isn't sorted).
> 
>     The first of these two patches demonstrates the issue (done in a separate
>     patch, since the scenario is quite involved), and the second patch fixes the
>     bug.
> 

I was worried about the robustness of the test, but some deeper diving
revealed that it should produce consistent results. Otherwise, the fix
itself is a straightforward (albeit hard to find in the first place). These
two patches look good to me!

> Thanks in advance for your review.
> 
> [1]: https://lore.kernel.org/git/pull.1235.git.1653064572170.gitgitgadget@gmail.com/
> [2]: https://lore.kernel.org/git/YofJLv8+x5J7yPmf@nand.local/
> 
> Taylor Blau (2):
>   t7703: demonstrate object corruption with pack.packSizeLimit
>   builtin/repack.c: ensure that `names` is sorted
> 
> Victoria Dye (1):
>   repack: respect --keep-pack with geometric repack
> 
>  builtin/repack.c            | 49 ++++++++++++++++++------
>  t/t7703-repack-geometric.sh | 75 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 112 insertions(+), 12 deletions(-)
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks
  2022-05-20 19:46 ` [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Victoria Dye
@ 2022-05-20 20:05   ` Derrick Stolee
  2022-05-20 20:55     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Derrick Stolee @ 2022-05-20 20:05 UTC (permalink / raw)
  To: Victoria Dye, Taylor Blau, git; +Cc: gitster

On 5/20/2022 3:46 PM, Victoria Dye wrote:
> Taylor Blau wrote:
>> This series fixes two issues that Victoria and I noticed while working on an
>> unrelated issue yesterday.
>>
>>   - The first patch comes from Victoria's earlier submission[1], and addresses
>>     an issue where packs specified as kept via the `--keep-pack` option could
>>     potentially be removed (without rewriting their objects) during a
>>     `--geometric` repack.
>>
>>     The first patch is Victoria's alone, with some minor fixups applied from my
>>     review in [2]. It's included in this series since it's related, and avoids
>>     any conflicts when merging.
>>
> 
> I'm happy with the fixes you applied here and don't have anything else I'd
> like to add this patch.
> 
>>   - The latter two patches are mine, and address an issue where specifying a
>>     `--max-pack-size` value during a `--geometric` repack could result in object
>>     loss because of a false positive in our "did we write a pack with this
>>     name?" check (which can occur when the list of packs we wrote isn't sorted).
>>
>>     The first of these two patches demonstrates the issue (done in a separate
>>     patch, since the scenario is quite involved), and the second patch fixes the
>>     bug.
>>
> 
> I was worried about the robustness of the test, but some deeper diving
> revealed that it should produce consistent results. Otherwise, the fix
> itself is a straightforward (albeit hard to find in the first place). These
> two patches look good to me!
> 
>> Thanks in advance for your review.

I'm chiming in to say that I also read these patches and think they
are good. Couldn't find a way to improve them.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit
  2022-05-20 19:01 ` [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit Taylor Blau
  2022-05-20 19:42   ` Victoria Dye
@ 2022-05-20 20:54   ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2022-05-20 20:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, vdye

Taylor Blau <me@ttaylorr.com> writes:

> When this happens in just the right circumstances, it is possible to
> remove a pack that we just wrote, leading to object corruption.
>
> Luckily, this is quite difficult to provoke in practice (for a couple of
> reasons):

I'd call it unlucky that it is hard to trigger, actually ;-) With a
system like Git with more than a few handful of users, even an issue
that is hard-to-trigger is bound to hit somebody every day, but it
it is hard to trigger without the right condition, it is hard to
debug.

Thanks for finding and fixing (presumably we need a call to keep the
list sorted at the right places?)

>   - we ordinarily write just one pack, so `names` usually contains just
>     one entry, and is thus sorted
>   - when we do write more than one pack (e.g., due to `--max-pack-size`)
>     we have to: (a) write a pack identical to one that already
>     exists, (b) have that pack be below the split line, and (c) have
>     the set of packs written by `pack-objects` occur in an order which
>     tricks `string_list_has_string()`.
>
> Demonstrate the above scenario in a failing test, which causes `git
> repack --geometric` to write a pack which occurs below the split line,
> _and_ fail to recognize that it wrote that pack.
>
> The following patch will fix this bug.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  t/t7703-repack-geometric.sh | 47 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 91bb2b37a8..2cd1de7295 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -7,6 +7,7 @@ test_description='git repack --geometric works correctly'
>  GIT_TEST_MULTI_PACK_INDEX=0
>  
>  objdir=.git/objects
> +packdir=$objdir/pack
>  midx=$objdir/pack/multi-pack-index
>  
>  test_expect_success '--geometric with no packs' '
> @@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' '
>  	)
>  '
>  
> +test_expect_failure '--geometric with pack.packSizeLimit' '
> +	git init pack-rewrite &&
> +	test_when_finished "rm -fr pack-rewrite" &&
> +	(
> +		cd pack-rewrite &&
> +
> +		test-tool genrandom foo 1048576 >foo &&
> +		test-tool genrandom bar 1048576 >bar &&
> +
> +		git add foo bar &&
> +		test_tick &&
> +		git commit -m base &&
> +
> +		git rev-parse HEAD:foo HEAD:bar >p1.objects &&
> +		git rev-parse HEAD HEAD^{tree} >p2.objects &&
> +
> +		# These two packs each contain two objects, so the following
> +		# `--geometric` repack will try to combine them.
> +		p1="$(git pack-objects $packdir/pack <p1.objects)" &&
> +		p2="$(git pack-objects $packdir/pack <p2.objects)" &&
> +
> +		# Remove any loose objects in packs, since we do not want extra
> +		# copies around (which would mask over potential object
> +		# corruption issues).
> +		git prune-packed &&
> +
> +		# Both p1 and p2 will be rolled up, but pack-objects will write
> +		# three packs:
> +		#
> +		#   - one containing object "foo",
> +		#   - another containing object "bar",
> +		#   - a final pack containing the commit and tree objects
> +		#     (identical to p2 above)
> +		git repack --geometric 2 -d --max-pack-size=1048576 &&
> +
> +		# Ensure `repack` can detect that the third pack it wrote
> +		# (containing just the tree and commit objects) was identical to
> +		# one that was below the geometric split, so that we can save it
> +		# from deletion.
> +		#
> +		# If `repack` fails to do that, we will incorrectly delete p2,
> +		# causing object corruption.
> +		git fsck
> +	)
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks
  2022-05-20 20:05   ` Derrick Stolee
@ 2022-05-20 20:55     ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2022-05-20 20:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Victoria Dye, Taylor Blau, git

Derrick Stolee <derrickstolee@github.com> writes:

>> I was worried about the robustness of the test, but some deeper diving
>> revealed that it should produce consistent results. Otherwise, the fix
>> itself is a straightforward (albeit hard to find in the first place). These
>> two patches look good to me!
>> 
>>> Thanks in advance for your review.
>
> I'm chiming in to say that I also read these patches and think they
> are good. Couldn't find a way to improve them.

Thanks, all.  Queued.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit
  2022-05-20 19:42   ` Victoria Dye
@ 2022-05-20 23:22     ` Taylor Blau
  0 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2022-05-20 23:22 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Taylor Blau, git, gitster

On Fri, May 20, 2022 at 12:42:58PM -0700, Victoria Dye wrote:
> > @@ -230,4 +231,50 @@ test_expect_success '--geometric chooses largest MIDX preferred pack' '
> >  	)
> >  '
> >
> > +test_expect_failure '--geometric with pack.packSizeLimit' '
> > +	git init pack-rewrite &&
> > +	test_when_finished "rm -fr pack-rewrite" &&
> > +	(
> > +		cd pack-rewrite &&
> > +
> > +		test-tool genrandom foo 1048576 >foo &&
> > +		test-tool genrandom bar 1048576 >bar &&
> > +
>
> I was a bit worried about this test being flaky in the future (relying on
> particular pseudorandomly-generated file contents and the subsequent
> ordering of hashes on the packs). But, since neither 'genrandom' nor the
> pack hash generation seem likely to change (and I can't come up with an
> alternative to this approach anyway), the test looks good as-is.

Note that the "random" contents aren't so random (though I suspect
you're talking about _how_ genrandom interprets the seed changing), and
that we're really only depending on genrandom here to create a large
amount of data.

We are relying on the pack hashes appearing in a certain order, so in
that sense this test could "break" even if pack-objects reported the
packs it wrote in a different order.

But I agree in the sense that I also cannot come up with a less brittle
approach for writing this test ;).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-05-20 23:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-20 19:01 [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Taylor Blau
2022-05-20 19:01 ` [PATCH 1/3] repack: respect --keep-pack with geometric repack Taylor Blau
2022-05-20 19:01 ` [PATCH 2/3] t7703: demonstrate object corruption with pack.packSizeLimit Taylor Blau
2022-05-20 19:42   ` Victoria Dye
2022-05-20 23:22     ` Taylor Blau
2022-05-20 20:54   ` Junio C Hamano
2022-05-20 19:01 ` [PATCH 3/3] builtin/repack.c: ensure that `names` is sorted Taylor Blau
2022-05-20 19:46 ` [PATCH 0/3] repack: handle --keep-pack, --max-pack-size for geometric repacks Victoria Dye
2022-05-20 20:05   ` Derrick Stolee
2022-05-20 20:55     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).