git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, avarab@gmail.com,
	jrnieder@gmail.com, Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v4 3/6] pack-objects: add --sparse option
Date: Fri, 11 Jan 2019 14:30:35 -0800	[thread overview]
Message-ID: <xmqqmuo6yghg.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <ab733daff5398fd07ff051c323f51b70efbc2e57.1544822533.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Fri, 14 Dec 2018 13:22:19 -0800 (PST)")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Add a '--sparse' option flag to the pack-objects builtin. This
> allows the user to specify that they want to use the new logic
> for walking trees. This logic currently does not differ from the
> existing output, but will in a later change.
>
> Create a new test script, t5322-pack-objects-sparse.sh, to ensure
> the object list that is selected matches what we expect. When we
> update the logic to walk in a sparse fashion, the final test will
> be updated to show the extra objects that are added.

Somehow checking the "these are exactly what we expect" feels
brittle.  In a history with three relevant commits A---B---C,
packing B..C could omit trees and blobs in C that appear in A but
not in B, but traditionally, because we stop traversal at B and do
not even look at A, we do not notice that such objects that need to
complete C's tree are already available in a repository that has B.
The approach of test in this patch feels similar to saying "we must
send these duplicates because we must stay dumb", even though with a
perfect knowledge about the history, e.g. bitmap, we would be able
to omit these objects in A that appear in C but not in B.

I think we want to test test both "are we sending enough, even
though we might be wasting some bandwidth by not noticing the other
side already have some?" and "are we still avoiding from becoming
overly stupid, even though we may be cheating to save traversal cost
and risking to redundantly send some objects?"

IOW, a set of tests to make sure that truly new objects are all sent
by seeing what is sent is a strict superset of what we expect.  And
another set of tests to make sure that objects that are so obviously
(this criterion may be highly subjective) be present in the
receiving repository (e.g. the tree object of commit B and what it
contains when seinding B..C) are never sent, even when using an
algorithm that are tuned for traversal cost over bandwidth
consumption.

The code change in this step looks all trivially good, and it may
want to be squashed into the previous step to become a single patch.
Otherwise, [2/6] would be adding a dead code that nobody exercises
until this step.

Thanks.


> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-pack-objects.txt |  11 ++-
>  builtin/pack-objects.c             |   5 +-
>  t/t5322-pack-objects-sparse.sh     | 115 +++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+), 2 deletions(-)
>  create mode 100755 t/t5322-pack-objects-sparse.sh
>
> diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
> index 40c825c381..e45f3e680d 100644
> --- a/Documentation/git-pack-objects.txt
> +++ b/Documentation/git-pack-objects.txt
> @@ -14,7 +14,7 @@ SYNOPSIS
>  	[--local] [--incremental] [--window=<n>] [--depth=<n>]
>  	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
>  	[--stdout [--filter=<filter-spec>] | base-name]
> -	[--shallow] [--keep-true-parents] < object-list
> +	[--shallow] [--keep-true-parents] [--sparse] < object-list
>  
>  
>  DESCRIPTION
> @@ -196,6 +196,15 @@ depth is 4095.
>  	Add --no-reuse-object if you want to force a uniform compression
>  	level on all data no matter the source.
>  
> +--sparse::
> +	Use the "sparse" algorithm to determine which objects to include in
> +	the pack, when combined with the "--revs" option. This algorithm
> +	only walks trees that appear in paths that introduce new objects.
> +	This can have significant performance benefits when computing
> +	a pack to send a small change. However, it is possible that extra
> +	objects are added to the pack-file if the included commits contain
> +	certain types of direct renames.
> +
>  --thin::
>  	Create a "thin" pack by omitting the common objects between a
>  	sender and a receiver in order to reduce network transfer. This
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5f70d840a7..7d5b0735e3 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -84,6 +84,7 @@ static unsigned long pack_size_limit;
>  static int depth = 50;
>  static int delta_search_threads;
>  static int pack_to_stdout;
> +static int sparse;
>  static int thin;
>  static int num_preferred_base;
>  static struct progress *progress_state;
> @@ -3135,7 +3136,7 @@ static void get_object_list(int ac, const char **av)
>  
>  	if (prepare_revision_walk(&revs))
>  		die(_("revision walk setup failed"));
> -	mark_edges_uninteresting(&revs, show_edge, 0);
> +	mark_edges_uninteresting(&revs, show_edge, sparse);
>  
>  	if (!fn_show_object)
>  		fn_show_object = show_object;
> @@ -3292,6 +3293,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  		{ OPTION_CALLBACK, 0, "unpack-unreachable", NULL, N_("time"),
>  		  N_("unpack unreachable objects newer than <time>"),
>  		  PARSE_OPT_OPTARG, option_parse_unpack_unreachable },
> +		OPT_BOOL(0, "sparse", &sparse,
> +			 N_("use the sparse reachability algorithm")),
>  		OPT_BOOL(0, "thin", &thin,
>  			 N_("create thin packs")),
>  		OPT_BOOL(0, "shallow", &shallow,
> diff --git a/t/t5322-pack-objects-sparse.sh b/t/t5322-pack-objects-sparse.sh
> new file mode 100755
> index 0000000000..81f6805bc3
> --- /dev/null
> +++ b/t/t5322-pack-objects-sparse.sh
> @@ -0,0 +1,115 @@
> +#!/bin/sh
> +
> +test_description='pack-objects object selection using sparse algorithm'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup repo' '
> +	test_commit initial &&
> +	for i in $(test_seq 1 3)
> +	do
> +		mkdir f$i &&
> +		for j in $(test_seq 1 3)
> +		do
> +			mkdir f$i/f$j &&
> +			echo $j >f$i/f$j/data.txt
> +		done
> +	done &&
> +	git add . &&
> +	git commit -m "Initialized trees" &&
> +	for i in $(test_seq 1 3)
> +	do
> +		git checkout -b topic$i master &&
> +		echo change-$i >f$i/f$i/data.txt &&
> +		git commit -a -m "Changed f$i/f$i/data.txt"
> +	done &&
> +	cat >packinput.txt <<-EOF &&
> +	topic1
> +	^topic2
> +	^topic3
> +	EOF
> +	git rev-parse			\
> +		topic1			\
> +		topic1^{tree}		\
> +		topic1:f1		\
> +		topic1:f1/f1		\
> +		topic1:f1/f1/data.txt | sort >expect_objects.txt
> +'
> +
> +test_expect_success 'non-sparse pack-objects' '
> +	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
> +	git index-pack -o nonsparse.idx nonsparse.pack &&
> +	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
> +	test_cmp expect_objects.txt nonsparse_objects.txt
> +'
> +
> +test_expect_success 'sparse pack-objects' '
> +	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
> +	git index-pack -o sparse.idx sparse.pack &&
> +	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
> +	test_cmp expect_objects.txt sparse_objects.txt
> +'
> +
> +# Demonstrate that both algorithms send "extra" objects because
> +# they are not in the frontier.
> +
> +test_expect_success 'duplicate a folder from f3 and commit to topic1' '
> +	git checkout topic1 &&
> +	echo change-3 >f3/f3/data.txt &&
> +	git commit -a -m "Changed f3/f3/data.txt" &&
> +	git rev-parse			\
> +		topic1~1		\
> +		topic1~1^{tree}		\
> +		topic1^{tree}		\
> +		topic1			\
> +		topic1:f1		\
> +		topic1:f1/f1		\
> +		topic1:f1/f1/data.txt	\
> +		topic1:f3		\
> +		topic1:f3/f3		\
> +		topic1:f3/f3/data.txt | sort >expect_objects.txt
> +'
> +
> +test_expect_success 'non-sparse pack-objects' '
> +	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
> +	git index-pack -o nonsparse.idx nonsparse.pack &&
> +	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
> +	test_cmp expect_objects.txt nonsparse_objects.txt
> +'
> +
> +test_expect_success 'sparse pack-objects' '
> +	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
> +	git index-pack -o sparse.idx sparse.pack &&
> +	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
> +	test_cmp expect_objects.txt sparse_objects.txt
> +'
> +
> +test_expect_success 'duplicate a folder from f1 into f3' '
> +	mkdir f3/f4 &&
> +	cp -r f1/f1/* f3/f4 &&
> +	git add f3/f4 &&
> +	git commit -m "Copied f1/f1 to f3/f4" &&
> +	cat >packinput.txt <<-EOF &&
> +	topic1
> +	^topic1~1
> +	EOF
> +	git rev-parse		\
> +		topic1		\
> +		topic1^{tree}	\
> +		topic1:f3 | sort >expect_objects.txt
> +'
> +
> +test_expect_success 'non-sparse pack-objects' '
> +	git pack-objects --stdout --revs <packinput.txt >nonsparse.pack &&
> +	git index-pack -o nonsparse.idx nonsparse.pack &&
> +	git show-index <nonsparse.idx | awk "{print \$2}" >nonsparse_objects.txt &&
> +	test_cmp expect_objects.txt nonsparse_objects.txt
> +'
> +
> +test_expect_success 'sparse pack-objects' '
> +	git pack-objects --stdout --revs --sparse <packinput.txt >sparse.pack &&
> +	git index-pack -o sparse.idx sparse.pack &&
> +	git show-index <sparse.idx | awk "{print \$2}" >sparse_objects.txt &&
> +	test_cmp expect_objects.txt sparse_objects.txt
> +'
> +
> +test_done

  reply	other threads:[~2019-01-11 22:30 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-28 21:52 [PATCH 0/5] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 1/5] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 2/5] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 3/5] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2018-11-28 22:11   ` Stefan Beller
2018-11-29 14:20     ` Derrick Stolee
2018-11-30  2:39       ` Junio C Hamano
2018-11-30 15:53         ` Derrick Stolee
2018-11-28 21:52 ` [PATCH 4/5] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 5/5] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-11-28 22:18 ` [PATCH 0/5] Add a new "sparse" tree walk algorithm Ævar Arnfjörð Bjarmason
2018-11-29  4:05   ` Derrick Stolee
2018-11-29 14:24 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
2018-12-10 16:42   ` [PATCH v3 0/6] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
2018-12-14 21:22     ` [PATCH v4 0/6] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget
2018-12-14 21:22       ` [PATCH v4 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2019-01-11 19:43         ` Junio C Hamano
2019-01-11 20:25           ` Junio C Hamano
2019-01-11 22:05             ` Derrick Stolee
2018-12-14 21:22       ` [PATCH v4 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2019-01-11 23:20         ` Junio C Hamano
2018-12-14 21:22       ` [PATCH v4 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2019-01-11 22:30         ` Junio C Hamano [this message]
2019-01-15 15:06           ` Derrick Stolee
2019-01-15 18:23             ` Junio C Hamano
2018-12-14 21:22       ` [PATCH v4 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-12-14 23:32         ` Ævar Arnfjörð Bjarmason
2018-12-17 14:20           ` Derrick Stolee
2018-12-17 14:26             ` Ævar Arnfjörð Bjarmason
2018-12-17 14:50               ` Derrick Stolee
2019-01-11 23:20         ` Junio C Hamano
2018-12-14 21:22       ` [PATCH v4 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-12-14 21:22       ` [PATCH v4 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
2019-01-16 18:25       ` [PATCH v5 0/5] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget
2019-01-16 18:25         ` [PATCH v5 2/5] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2019-01-16 18:25         ` [PATCH v5 1/5] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2019-01-16 18:25         ` [PATCH v5 3/5] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2019-01-16 18:26         ` [PATCH v5 4/5] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2019-01-16 18:26         ` [PATCH v5 5/5] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqmuo6yghg.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).