git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Repacking of promisor packfiles
@ 2018-08-07 20:12 Jonathan Tan
  2018-08-07 20:12 ` [PATCH 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-07 20:12 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

These patches teach Git to also repack promisor packfiles upon GC, which
reduces one of the pain points of current partial clone usage (many
promisor packfiles in the objects/pack/ directory, generated upon each
fetch).

In the t/ tests, I strived to verify that repack doesn't accidentally
delete any objects. Let me know if you can think of better ways to do
that.

Jonathan Tan (2):
  repack: refactor setup of pack-objects cmd
  repack: repack promisor objects if -a or -A is set

 Documentation/git-repack.txt |   5 ++
 builtin/repack.c             | 163 ++++++++++++++++++++++++-----------
 t/t0410-partial-clone.sh     |  71 ++++++++++++---
 3 files changed, 180 insertions(+), 59 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] repack: refactor setup of pack-objects cmd
  2018-08-07 20:12 [PATCH 0/2] Repacking of promisor packfiles Jonathan Tan
@ 2018-08-07 20:12 ` Jonathan Tan
  2018-08-07 20:12 ` [PATCH 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
  2018-08-08 22:34 ` [PATCH v2 0/2] Repacking of promisor packfiles Jonathan Tan
  2 siblings, 0 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-07 20:12 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

A subsequent patch will teach repack to run pack-objects with some same
and some different arguments if repacking of promisor objects is
required. Refactor the setup of the pack-objects cmd so that setting up
the arguments common to both is done in a function.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/repack.c | 99 +++++++++++++++++++++++++++---------------------
 1 file changed, 55 insertions(+), 44 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159..ca695fa10 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -138,6 +138,47 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
 	strbuf_release(&buf);
 }
 
+struct packed_objects_args {
+	const char *window;
+	const char *window_memory;
+	const char *depth;
+	const char *threads;
+	const char *max_pack_size;
+	int no_reuse_delta;
+	int no_reuse_object;
+	int quiet;
+	int local;
+};
+
+static void prepare_packed_objects(struct child_process *cmd,
+				   const struct packed_objects_args *args)
+{
+	argv_array_push(&cmd->args, "pack-objects");
+	if (args->window)
+		argv_array_pushf(&cmd->args, "--window=%s", args->window);
+	if (args->window_memory)
+		argv_array_pushf(&cmd->args, "--window-memory=%s", args->window_memory);
+	if (args->depth)
+		argv_array_pushf(&cmd->args, "--depth=%s", args->depth);
+	if (args->threads)
+		argv_array_pushf(&cmd->args, "--threads=%s", args->threads);
+	if (args->max_pack_size)
+		argv_array_pushf(&cmd->args, "--max-pack-size=%s", args->max_pack_size);
+	if (args->no_reuse_delta)
+		argv_array_pushf(&cmd->args, "--no-reuse-delta");
+	if (args->no_reuse_object)
+		argv_array_pushf(&cmd->args, "--no-reuse-object");
+	if (args->local)
+		argv_array_push(&cmd->args,  "--local");
+	if (args->quiet)
+		argv_array_push(&cmd->args,  "--quiet");
+	if (delta_base_offset)
+		argv_array_push(&cmd->args,  "--delta-base-offset");
+	argv_array_push(&cmd->args, packtmp);
+	cmd->git_cmd = 1;
+	cmd->out = -1;
+}
+
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
 
@@ -165,15 +206,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int delete_redundant = 0;
 	const char *unpack_unreachable = NULL;
 	int keep_unreachable = 0;
-	const char *window = NULL, *window_memory = NULL;
-	const char *depth = NULL;
-	const char *threads = NULL;
-	const char *max_pack_size = NULL;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	int no_reuse_delta = 0, no_reuse_object = 0;
 	int no_update_server_info = 0;
-	int quiet = 0;
-	int local = 0;
+	struct packed_objects_args po_args = {NULL};
 
 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -183,14 +218,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				   LOOSEN_UNREACHABLE | ALL_INTO_ONE),
 		OPT_BOOL('d', NULL, &delete_redundant,
 				N_("remove redundant packs, and run git-prune-packed")),
-		OPT_BOOL('f', NULL, &no_reuse_delta,
+		OPT_BOOL('f', NULL, &po_args.no_reuse_delta,
 				N_("pass --no-reuse-delta to git-pack-objects")),
-		OPT_BOOL('F', NULL, &no_reuse_object,
+		OPT_BOOL('F', NULL, &po_args.no_reuse_object,
 				N_("pass --no-reuse-object to git-pack-objects")),
 		OPT_BOOL('n', NULL, &no_update_server_info,
 				N_("do not run git-update-server-info")),
-		OPT__QUIET(&quiet, N_("be quiet")),
-		OPT_BOOL('l', "local", &local,
+		OPT__QUIET(&po_args.quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &po_args.local,
 				N_("pass --local to git-pack-objects")),
 		OPT_BOOL('b', "write-bitmap-index", &write_bitmaps,
 				N_("write bitmap index")),
@@ -198,15 +233,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("with -A, do not loosen objects older than this")),
 		OPT_BOOL('k', "keep-unreachable", &keep_unreachable,
 				N_("with -a, repack unreachable objects")),
-		OPT_STRING(0, "window", &window, N_("n"),
+		OPT_STRING(0, "window", &po_args.window, N_("n"),
 				N_("size of the window used for delta compression")),
-		OPT_STRING(0, "window-memory", &window_memory, N_("bytes"),
+		OPT_STRING(0, "window-memory", &po_args.window_memory, N_("bytes"),
 				N_("same as the above, but limit memory size instead of entries count")),
-		OPT_STRING(0, "depth", &depth, N_("n"),
+		OPT_STRING(0, "depth", &po_args.depth, N_("n"),
 				N_("limits the maximum delta depth")),
-		OPT_STRING(0, "threads", &threads, N_("n"),
+		OPT_STRING(0, "threads", &po_args.threads, N_("n"),
 				N_("limits the maximum number of threads")),
-		OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"),
+		OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
 				N_("maximum size of each packfile")),
 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
 				N_("repack objects in packs marked with .keep")),
@@ -238,7 +273,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	sigchain_push_common(remove_pack_on_signal);
 
-	argv_array_push(&cmd.args, "pack-objects");
+	prepare_packed_objects(&cmd, &po_args);
+
 	argv_array_push(&cmd.args, "--keep-true-parents");
 	if (!pack_kept_objects)
 		argv_array_push(&cmd.args, "--honor-pack-keep");
@@ -251,20 +287,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--indexed-objects");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
-	if (window)
-		argv_array_pushf(&cmd.args, "--window=%s", window);
-	if (window_memory)
-		argv_array_pushf(&cmd.args, "--window-memory=%s", window_memory);
-	if (depth)
-		argv_array_pushf(&cmd.args, "--depth=%s", depth);
-	if (threads)
-		argv_array_pushf(&cmd.args, "--threads=%s", threads);
-	if (max_pack_size)
-		argv_array_pushf(&cmd.args, "--max-pack-size=%s", max_pack_size);
-	if (no_reuse_delta)
-		argv_array_pushf(&cmd.args, "--no-reuse-delta");
-	if (no_reuse_object)
-		argv_array_pushf(&cmd.args, "--no-reuse-object");
 	if (write_bitmaps)
 		argv_array_push(&cmd.args, "--write-bitmap-index");
 
@@ -292,17 +314,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		argv_array_push(&cmd.args, "--incremental");
 	}
 
-	if (local)
-		argv_array_push(&cmd.args,  "--local");
-	if (quiet)
-		argv_array_push(&cmd.args,  "--quiet");
-	if (delta_base_offset)
-		argv_array_push(&cmd.args,  "--delta-base-offset");
-
-	argv_array_push(&cmd.args, packtmp);
-
-	cmd.git_cmd = 1;
-	cmd.out = -1;
 	cmd.no_stdin = 1;
 
 	ret = start_command(&cmd);
@@ -320,7 +331,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (ret)
 		return ret;
 
-	if (!names.nr && !quiet)
+	if (!names.nr && !po_args.quiet)
 		printf("Nothing new to pack.\n");
 
 	/*
@@ -441,7 +452,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			if (!string_list_has_string(&names, sha1))
 				remove_redundant_pack(packdir, item->string);
 		}
-		if (!quiet && isatty(2))
+		if (!po_args.quiet && isatty(2))
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);
 	}
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 20:12 [PATCH 0/2] Repacking of promisor packfiles Jonathan Tan
  2018-08-07 20:12 ` [PATCH 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
@ 2018-08-07 20:12 ` Jonathan Tan
  2018-08-07 20:57   ` Junio C Hamano
  2018-08-07 22:37   ` Junio C Hamano
  2018-08-08 22:34 ` [PATCH v2 0/2] Repacking of promisor packfiles Jonathan Tan
  2 siblings, 2 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-07 20:12 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Currently, repack does not touch promisor packfiles at all, potentially
causing the performance of repositories that have many such packfiles to
drop. Therefore, repack all promisor objects if invoked with -a or -A.

This is done by an additional invocation of pack-objects on all promisor
objects individually given, which takes care of deduplication and allows
the resulting packfiles to respect flags such as --max-pack-size.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-repack.txt |  5 +++
 builtin/repack.c             | 64 ++++++++++++++++++++++++++++++--
 t/t0410-partial-clone.sh     | 71 ++++++++++++++++++++++++++++++------
 3 files changed, 125 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index d90e7907f..d05625096 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -40,6 +40,11 @@ OPTIONS
 Note that users fetching over dumb protocols will have to fetch the
 whole new pack in order to get any contained object, no matter how many
 other objects in that pack they already have locally.
++
+Promisor packfiles are repacked separately: if there are packfiles that
+have an associated ".promisor" file, these packfiles will be repacked
+into another separate pack, and an empty ".promisor" file corresponding
+to the new separate pack will be written.
 
 -A::
 	Same as `-a`, unless `-d` is used.  Then any unreachable
diff --git a/builtin/repack.c b/builtin/repack.c
index ca695fa10..a78d2f0aa 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "packfile.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -83,7 +84,7 @@ static void remove_pack_on_signal(int signo)
 
 /*
  * Adds all packs hex strings to the fname list, which do not
- * have a corresponding .keep or .promisor file. These packs are not to
+ * have a corresponding .keep file. These packs are not to
  * be kept if we are going to pack everything into one file.
  */
 static void get_non_kept_pack_filenames(struct string_list *fname_list,
@@ -111,8 +112,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 
 		fname = xmemdupz(e->d_name, len);
 
-		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)) &&
-		    !file_exists(mkpath("%s/%s.promisor", packdir, fname)))
+		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
 			string_list_append_nodup(fname_list, fname);
 		else
 			free(fname);
@@ -122,7 +122,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-	const char *exts[] = {".pack", ".idx", ".keep", ".bitmap"};
+	const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
@@ -179,6 +179,58 @@ static void prepare_packed_objects(struct child_process *cmd,
 	cmd->out = -1;
 }
 
+static int write_oid(const struct object_id *oid, struct packed_git *pack,
+		     uint32_t pos, void *data)
+{
+	int fd = *(int *)data;
+
+	xwrite(fd, oid_to_hex(oid), GIT_SHA1_HEXSZ);
+	xwrite(fd, "\n", 1);
+	return 0;
+}
+
+static void repack_promisor_objects(const struct packed_objects_args *args,
+				    struct string_list *names)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	FILE *out;
+	struct strbuf line = STRBUF_INIT;
+
+	prepare_packed_objects(&cmd, args);
+	cmd.in = -1;
+
+	if (start_command(&cmd))
+		die("Could not start pack-objects to repack promisor objects");
+
+	for_each_packed_object(write_oid, &cmd.in,
+			       FOR_EACH_OBJECT_PROMISOR_ONLY);
+	close(cmd.in);
+
+	out = xfdopen(cmd.out, "r");
+	while (strbuf_getline_lf(&line, out) != EOF) {
+		char *promisor_name;
+		int fd;
+		if (line.len != 40)
+			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
+		string_list_append(names, line.buf);
+
+		/*
+		 * pack-objects creates the .pack and .idx files, but not the
+		 * .promisor file. Create the .promisor file, which is empty.
+		 */
+		promisor_name = mkpathdup("%s-%s.promisor", packtmp,
+					  line.buf);
+		fd = open(promisor_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+		if (fd < 0)
+			die_errno("unable to create '%s'", promisor_name);
+		close(fd);
+		free(promisor_name);
+	}
+	fclose(out);
+	if (finish_command(&cmd))
+		die("Could not finish pack-objects to repack promisor objects");
+}
+
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
 
@@ -191,6 +243,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		{".pack"},
 		{".idx"},
 		{".bitmap", 1},
+		{".promisor", 1},
 	};
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
@@ -293,6 +346,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (pack_everything & ALL_INTO_ONE) {
 		get_non_kept_pack_filenames(&existing_packs, &keep_pack_list);
 
+		if (repository_format_partial_clone)
+			repack_promisor_objects(&po_args, &names);
+
 		if (existing_packs.nr && delete_redundant) {
 			if (unpack_unreachable) {
 				argv_array_pushf(&cmd.args,
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583..27ba7d201 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -271,28 +271,77 @@ test_expect_success 'rev-list accepts missing and promised objects on command li
 	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
 '
 
-test_expect_success 'gc does not repack promisor objects' '
+test_expect_success 'gc repacks promisor objects separately from non-promisor objects' '
 	rm -rf repo &&
 	test_create_repo repo &&
-	test_commit -C repo my_commit &&
+	test_commit -C repo one &&
+	test_commit -C repo two &&
 
-	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
-	HASH=$(printf "$TREE_HASH\n" | pack_as_from_promisor) &&
+	TREE_ONE=$(git -C repo rev-parse one^{tree}) &&
+	printf "$TREE_ONE\n" | pack_as_from_promisor &&
+	TREE_TWO=$(git -C repo rev-parse two^{tree}) &&
+	printf "$TREE_TWO\n" | pack_as_from_promisor &&
 
 	git -C repo config core.repositoryformatversion 1 &&
 	git -C repo config extensions.partialclone "arbitrary string" &&
 	git -C repo gc &&
 
-	# Ensure that the promisor packfile still exists, and remove it
-	test -e repo/.git/objects/pack/pack-$HASH.pack &&
-	rm repo/.git/objects/pack/pack-$HASH.* &&
-
-	# Ensure that the single other pack contains the commit, but not the tree
+	# Ensure that exactly one promisor packfile exists, and that it
+	# contains the trees but not the commits
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	PROMISOR_PACKFILE=$(sed "s/.promisor/.pack/" <promisorlist) &&
+	git verify-pack $PROMISOR_PACKFILE -v >out &&
+	grep "$TREE_ONE" out &&
+	grep "$TREE_TWO" out &&
+	! grep "$(git -C repo rev-parse one)" out &&
+	! grep "$(git -C repo rev-parse two)" out &&
+
+	# Remove the promisor packfile and associated files
+	rm $(sed "s/.promisor//" <promisorlist).* &&
+
+	# Ensure that the single other pack contains the commits, but not the
+	# trees
 	ls repo/.git/objects/pack/pack-*.pack >packlist &&
 	test_line_count = 1 packlist &&
 	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
-	grep "$(git -C repo rev-parse HEAD)" out &&
-	! grep "$TREE_HASH" out
+	grep "$(git -C repo rev-parse one)" out &&
+	grep "$(git -C repo rev-parse two)" out &&
+	! grep "$TREE_ONE" out &&
+	! grep "$TREE_TWO" out
+'
+
+repack_and_check () {
+	rm -rf repo2 &&
+	cp -r repo repo2 &&
+	git -C repo2 repack $1 -d &&
+	git -C repo2 fsck &&
+
+	git -C repo2 cat-file -e $2 &&
+	git -C repo2 cat-file -e $3
+}
+
+test_expect_success 'repack -d does not irreversibly delete promisor objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo commit --allow-empty -m one &&
+	git -C repo commit --allow-empty -m two &&
+	git -C repo commit --allow-empty -m three &&
+	git -C repo commit --allow-empty -m four &&
+	ONE=$(git -C repo rev-parse HEAD^^^) &&
+	TWO=$(git -C repo rev-parse HEAD^^) &&
+	THREE=$(git -C repo rev-parse HEAD^) &&
+
+	printf "$TWO\n" | pack_as_from_promisor &&
+	printf "$THREE\n" | pack_as_from_promisor &&
+	delete_object repo "$ONE" &&
+
+	repack_and_check -a "$TWO" "$THREE" &&
+	repack_and_check -A "$TWO" "$THREE" &&
+	repack_and_check -l "$TWO" "$THREE"
 '
 
 test_expect_success 'gc stops traversal when a missing but promised object is reached' '
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 20:12 ` [PATCH 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
@ 2018-08-07 20:57   ` Junio C Hamano
  2018-08-07 23:23     ` Jonathan Tan
  2018-08-07 22:37   ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2018-08-07 20:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> +static int write_oid(const struct object_id *oid, struct packed_git *pack,
> +		     uint32_t pos, void *data)
> +{
> +	int fd = *(int *)data;
> +
> +	xwrite(fd, oid_to_hex(oid), GIT_SHA1_HEXSZ);
> +	xwrite(fd, "\n", 1);
> +	return 0;
> +}
> +
> +static void repack_promisor_objects(const struct packed_objects_args *args,
> +				    struct string_list *names)
> +{
> +	struct child_process cmd = CHILD_PROCESS_INIT;
> +	FILE *out;
> +	struct strbuf line = STRBUF_INIT;
> +
> +	prepare_packed_objects(&cmd, args);
> +	cmd.in = -1;
> +
> +	if (start_command(&cmd))
> +		die("Could not start pack-objects to repack promisor objects");
> +
> +	for_each_packed_object(write_oid, &cmd.in,
> +			       FOR_EACH_OBJECT_PROMISOR_ONLY);
> +	close(cmd.in);

for_each_object_in_pack() is a fine way to make sure that you list
everythning in a pack, but I suspect it is a horrible way to feed a
list of objects to pack-objects, as it goes by the .idx order, which
is by definition a way to enumerate objects in a randomly useless
order.

Do we already have an access to the in-core reverse index for the
pack at this point in the code?  If so, we can enumerate the objects
in the offset order without incurring a lot of cost (building the
in-core revindex is the more expensive part).  When writing a pack,
we try to make sure that related objects are written out close to
each other [*1*], so listing them in the offset order (with made-up
pathname information to _force_ that objects that live close
together in the original pack are grouped together by getting
similar names) might give us a better than horrible deltification.
I dunno.

	Side note *1*: "related" has two axis, and one is relevant
	for better deltification, while the other is not useful.
	The one I have in mind here is that we write set of blobs
	that belong to the same "delta family" together.

I do not think such a "make it a bit better than horrible" is necessary
in an initial version, but it deserves an in-code NEEDSWORK comment
to remind us that we need to measure and experiment.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 20:12 ` [PATCH 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
  2018-08-07 20:57   ` Junio C Hamano
@ 2018-08-07 22:37   ` Junio C Hamano
  2018-08-07 23:25     ` Jonathan Tan
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2018-08-07 22:37 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Christian Couder

Jonathan Tan <jonathantanmy@google.com> writes:

> @@ -293,6 +346,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	if (pack_everything & ALL_INTO_ONE) {
>  		get_non_kept_pack_filenames(&existing_packs, &keep_pack_list);
>  
> +		if (repository_format_partial_clone)
> +			repack_promisor_objects(&po_args, &names);
> +
>  		if (existing_packs.nr && delete_redundant) {
>  			if (unpack_unreachable) {
>  				argv_array_pushf(&cmd.args,

Just a note (and a request-for-sanity-check) and not meant to be a
request to update the code, but with a still-in-pu 4b757a40 ("Use
remote_odb_get_direct() and has_remote_odb()", 2018-08-02) in
flight, repository_format_partial_clone is now gone.

I've tentatively resolved the above to read like so:

		if (has_remote_odb())
			repack_promisor_objects(&po_args, &names);

but I am not sure if it makes sense to always require odb helper to
be present for any promisor.  As long as you do not have need to
actually access these missing objects, you do not need any remote
odb access at all, in which case requiring has_remote_odb() as a
precondition to concatenate the promisor packs to coalesce them into
one pack does not make sense---you only want to know if there are
any .promisor packs.

In other words, I suspect that the world is not black (i.e. partial
clone, which always has remote-odb) and white (i.e. full repository,
without remote-odb).  4b757a40 makes it impossible to have a gray
(i.e. partial clone, but no access to remote-odb), which I am not
sure if it is a good thing.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 20:57   ` Junio C Hamano
@ 2018-08-07 23:23     ` Jonathan Tan
  2018-08-08  0:25       ` Junio C Hamano
  2018-08-08 15:50       ` Jeff King
  0 siblings, 2 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-07 23:23 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> for_each_object_in_pack() is a fine way to make sure that you list
> everythning in a pack, but I suspect it is a horrible way to feed a
> list of objects to pack-objects, as it goes by the .idx order, which
> is by definition a way to enumerate objects in a randomly useless
> order.

That's true.

> Do we already have an access to the in-core reverse index for the
> pack at this point in the code?

As far as I can tell, no. These patches construct the list of promisor
objects in repack.c (which does not use the revindex at all), to be
processed by pack-objects in a different process (which does use the
revindex in reuse-delta mode, which is the default). I guess that we
could have access to the revindex if we were to libify pack-objects and
run it in the same process as repack.c or if we were to add a special
mode to pack-objects that reads for itself the list of all the promisor
objects.

> If so, we can enumerate the objects
> in the offset order without incurring a lot of cost (building the
> in-core revindex is the more expensive part).  When writing a pack,
> we try to make sure that related objects are written out close to
> each other [*1*], so listing them in the offset order (with made-up
> pathname information to _force_ that objects that live close
> together in the original pack are grouped together by getting
> similar names) might give us a better than horrible deltification.
> I dunno.

Before I write the NEEDSWORK comment, just to clarify - do you mean
better than horrible object locality? I think deltification still occurs
because pack-objects sorts the input when doing the windowed
deltification, for example:

  git rev-parse HEAD:fetch-pack.c HEAD HEAD^ HEAD^^ \
	HEAD^^^ v2.17.0:fetch-pack.c | git pack-objects --window=2 abc

produces a packfile with a delta, as checked by `verify-pack -v`.

> 	Side note *1*: "related" has two axis, and one is relevant
> 	for better deltification, while the other is not useful.
> 	The one I have in mind here is that we write set of blobs
> 	that belong to the same "delta family" together.

As far as I can see, they do not need to be adjacent in the packfile to
have one be a delta against the other.

> I do not think such a "make it a bit better than horrible" is necessary
> in an initial version, but it deserves an in-code NEEDSWORK comment
> to remind us that we need to measure and experiment.

OK, I'll do this in the next reroll.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 22:37   ` Junio C Hamano
@ 2018-08-07 23:25     ` Jonathan Tan
  2018-08-08 16:34       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jonathan Tan @ 2018-08-07 23:25 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, christian.couder

> Just a note (and a request-for-sanity-check) and not meant to be a
> request to update the code, but with a still-in-pu 4b757a40 ("Use
> remote_odb_get_direct() and has_remote_odb()", 2018-08-02) in
> flight, repository_format_partial_clone is now gone.
> 
> I've tentatively resolved the above to read like so:
> 
> 		if (has_remote_odb())
> 			repack_promisor_objects(&po_args, &names);
> 
> but I am not sure if it makes sense to always require odb helper to
> be present for any promisor.  As long as you do not have need to
> actually access these missing objects, you do not need any remote
> odb access at all, in which case requiring has_remote_odb() as a
> precondition to concatenate the promisor packs to coalesce them into
> one pack does not make sense---you only want to know if there are
> any .promisor packs.
> 
> In other words, I suspect that the world is not black (i.e. partial
> clone, which always has remote-odb) and white (i.e. full repository,
> without remote-odb).  4b757a40 makes it impossible to have a gray
> (i.e. partial clone, but no access to remote-odb), which I am not
> sure if it is a good thing.

Thanks for the heads-up. This makes me realize that even the current
precondition (repository_format_partial_clone) is not an exact one - I
should only be doing this if I know that there is at least one promisor
object (if not, in the rare case that a repo is configured to be partial
but does not have any promisor objects, repack will generate an empty
packfile). In the next reroll, I'll take care of this, which should
avoid this merge issue too.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 23:23     ` Jonathan Tan
@ 2018-08-08  0:25       ` Junio C Hamano
  2018-08-08 18:45         ` Jonathan Tan
  2018-08-08 15:50       ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2018-08-08  0:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>> each other [*1*], so listing them in the offset order (with made-up
>> pathname information to _force_ that objects that live close
>> together in the original pack are grouped together by getting
>> similar names) might give us a better than horrible deltification.
>> I dunno.
>
> Before I write the NEEDSWORK comment, just to clarify - do you mean
> better than horrible object locality? I think deltification still occurs
> because pack-objects sorts the input when doing the windowed
> deltification, for example:

But what to delta against what else is determined by the pathname
info, which is now lost by enumerating objects without tree/history
walking.  By giving phoney pathnames to objects while enumerating
them in offset order and giving similar pathnames to objects closer
to each other, I was hoping that better bases will likely to be in
the same window.  The order in which objects are prepared and fed to
try_delta() is "group by type, and then sort by path-hash (major
key) and size (descending order, used as minor key)", so that
the largest among similarly named blobs is stored as base and other
blobs with similar name try to become delta against that one.

>   git rev-parse HEAD:fetch-pack.c HEAD HEAD^ HEAD^^ \
> 	HEAD^^^ v2.17.0:fetch-pack.c | git pack-objects --window=2 abc
>
> produces a packfile with a delta, as checked by `verify-pack -v`.

Is that interesting?  You can make the same argument that
fast-import produces a packfile with a delta.

It is known to produce horrible delta---its delta base selection
essentially is "you are giving me a new object?  let me see if it
deltas well with the object you gave me immediately before that (and
nothing else)".


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 23:23     ` Jonathan Tan
  2018-08-08  0:25       ` Junio C Hamano
@ 2018-08-08 15:50       ` Jeff King
  2018-08-08 16:17         ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff King @ 2018-08-08 15:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitster, git

On Tue, Aug 07, 2018 at 04:23:04PM -0700, Jonathan Tan wrote:

> > Do we already have an access to the in-core reverse index for the
> > pack at this point in the code?
> 
> As far as I can tell, no. These patches construct the list of promisor
> objects in repack.c (which does not use the revindex at all), to be
> processed by pack-objects in a different process (which does use the
> revindex in reuse-delta mode, which is the default). I guess that we
> could have access to the revindex if we were to libify pack-objects and
> run it in the same process as repack.c or if we were to add a special
> mode to pack-objects that reads for itself the list of all the promisor
> objects.

It's not the end of the world to access the revindex here. It requires
effort proportional to the size of the packfile, which makes it a bad
choice for operations that want to touch only a few objects. But here
you're iterating over the whole packfile anyway.

I think that the for_each_packed_object() interface should visit items
in pack order. Something like this (I'd probably change the name of the
inner "i" variable for clarity, but I didn't here to minimize the diff):

diff --git a/packfile.c b/packfile.c
index 04d387f312..334f47e4cd 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1887,10 +1887,13 @@ int has_pack_index(const unsigned char *sha1)
 
 int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
 {
-	uint32_t i;
+	uint32_t in_pack;
 	int r = 0;
 
-	for (i = 0; i < p->num_objects; i++) {
+	load_pack_revindex(p);
+
+	for (in_pack = 0; in_pack < p->num_objects; in_pack++) {
+		uint32_t i = p->revindex[in_pack].nr;
 		struct object_id oid;
 
 		if (!nth_packed_object_oid(&oid, p, i))

Possibly it should be an optional flag, in case the caller is going to
eventually sort by sha1 anyway. But the major caller that sorts by sha1
is cat-file's --batch-all-objects option.  And IMHO that is actually a
bad default. Try timing this in a fully-packed repository:

  time git cat-file --batch-all-objects --buffer --batch | wc -c

versus this:

  time git show-index <.git/objects/pack/*.idx |
  sort -n |
  awk '{print $2}' |
  git cat-file --buffer --batch | wc -c

On my git.git repo, it's literally 7x faster to do the latter, because
of locality in resolving deltas.

So I think we should consider at least making it an option for cat-file
to do the iteration in pack-ish order (it can't be perfect, of course,
because there may be multiple packs, loose objects, etc, but
approximating it is a good idea).

I guess it has to be an option, since we advertise in the manpage that
the output is sorted by hash.

> > If so, we can enumerate the objects
> > in the offset order without incurring a lot of cost (building the
> > in-core revindex is the more expensive part).  When writing a pack,
> > we try to make sure that related objects are written out close to
> > each other [*1*], so listing them in the offset order (with made-up
> > pathname information to _force_ that objects that live close
> > together in the original pack are grouped together by getting
> > similar names) might give us a better than horrible deltification.
> > I dunno.
> 
> Before I write the NEEDSWORK comment, just to clarify - do you mean
> better than horrible object locality? I think deltification still occurs
> because pack-objects sorts the input when doing the windowed
> deltification, for example:
> 
>   git rev-parse HEAD:fetch-pack.c HEAD HEAD^ HEAD^^ \
> 	HEAD^^^ v2.17.0:fetch-pack.c | git pack-objects --window=2 abc
> 
> produces a packfile with a delta, as checked by `verify-pack -v`.

You're going to get a lot of reused deltas from the old pack, but you'll
probably miss opportunities to delta objects between packs. And
pack-object's compute_write_order() will fix some of the issues:

 - it will group objects of each type together

 - it will group delta "families" together

So the resulting packfile may not actually be _too_ bad with respect to
delta locality. Where it will fail is:

 - there's no locality with respect to history, so a traversal will have
   bad I/O patterns as it skips all over the packfile

 - the delta heuristics do use the incoming order (which usually is from
   a traversal) as a hint. But IMHO that is way less important than
   lacking the names. Even if you have an actual traversal, two versions
   of the same file may be separated by hundreds of other blob changes
   in the middle. Traversal order doesn't help you; putting things with
   similar names next to each other does.

   For traversals using bitmaps, we have a "name hash cache" in the
   .bitmap file, which stores the 32-bit hash of the name. That lets us
   put similar names next to each other without actually knowing the
   names. If we absolutely can't traverse as part of the strategy for
   repacking a promisor pack, then it may be worth trying to figure out
   how to carry that hash around.

-Peff

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-08 15:50       ` Jeff King
@ 2018-08-08 16:17         ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2018-08-08 16:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git

Jeff King <peff@peff.net> writes:

> I think that the for_each_packed_object() interface should visit items
> in pack order.

Yeah, that may make more sense.  Any order is more logical than the
object name order (which by definition is random, so useful only
when you want them in a random order), and pack order will give us
locality of accesses.

>    For traversals using bitmaps, we have a "name hash cache" in the
>    .bitmap file, which stores the 32-bit hash of the name. That lets us
>    put similar names next to each other without actually knowing the
>    names.

I forgot about that one; it is a neat trick.  As .primisor packs are
intentionally kept apart from the remainder of the repository, and
because .bitmap wants to have a single pack, we cannot just read
the .bitmap file for this particular application, though.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-07 23:25     ` Jonathan Tan
@ 2018-08-08 16:34       ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2018-08-08 16:34 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, christian.couder

Jonathan Tan <jonathantanmy@google.com> writes:

>> Just a note (and a request-for-sanity-check) and not meant to be a
>> request to update the code, but with a still-in-pu 4b757a40 ("Use
>> remote_odb_get_direct() and has_remote_odb()", 2018-08-02) in
>> flight, repository_format_partial_clone is now gone.
>> ...
> Thanks for the heads-up. This makes me realize that even the current
> precondition (repository_format_partial_clone) is not an exact one - I
> should only be doing this if I know that there is at least one promisor
> object (if not, in the rare case that a repo is configured to be partial
> but does not have any promisor objects, repack will generate an empty
> packfile). In the next reroll, I'll take care of this, which should
> avoid this merge issue too.

I think that it is a good change, regardless of the semantic
conflict with any other topic, to look at the .promisor packs and
not look at the partial clone extension for this "do we want to
coalesce .promisor packs into one?" application.  You only want to
do so when you have two or more such packs; being in partial clone
might be a preconditon for having such packs, but it is overly loose.

In any case, remote-odb topic is turning out to be more trouble than
I originally thought; in the same change that removes the variable,
it actually removes the support for a repository configured with a
partial clone extension, and causes a hard error in t0410, breaking
test of 'pu'.  I do not know how many people actually use partial
clone right now, but their repositories would be broken the same way
when they update Git, if we merge that topic in its current shape.

I am dropping it from 'pu' for today's integration cycle.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-08  0:25       ` Junio C Hamano
@ 2018-08-08 18:45         ` Jonathan Tan
  0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-08 18:45 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> But what to delta against what else is determined by the pathname
> info, which is now lost by enumerating objects without tree/history
> walking.  By giving phoney pathnames to objects while enumerating
> them in offset order and giving similar pathnames to objects closer
> to each other, I was hoping that better bases will likely to be in
> the same window.  The order in which objects are prepared and fed to
> try_delta() is "group by type, and then sort by path-hash (major
> key) and size (descending order, used as minor key)", so that
> the largest among similarly named blobs is stored as base and other
> blobs with similar name try to become delta against that one.

Thanks for the patient explanation - I think I see it now. If we
enumerate in pack order and pass "<sha-1> <fake-name>" instead of
"<sha-1>" to pack-objects in such a way that we make pack-objects sort
by {type -> pack-order -> size} instead of {type -> size}, we can
hopefully get better deltas. I'll write a NEEDSWORK similar to this.

I'll send a reroll later today.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 0/2] Repacking of promisor packfiles
  2018-08-07 20:12 [PATCH 0/2] Repacking of promisor packfiles Jonathan Tan
  2018-08-07 20:12 ` [PATCH 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
  2018-08-07 20:12 ` [PATCH 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
@ 2018-08-08 22:34 ` Jonathan Tan
  2018-08-08 22:34   ` [PATCH v2 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
  2018-08-08 22:34   ` [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
  2 siblings, 2 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-08 22:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, gitster

Changes from v1:
 - add NEEDSWORK stating that input to pack-objects could be removed
 - run pack-objects to repack promisor objects only if there is at least
   one of them - this exposed a possible bug where the later part of
   cmd_repack() requires a correct packed_git (this was mostly noticed
   by tests that deal with corrupt packfiles), so insert a call to
   reprepare_packed_git() after all packfile manipulation is done
 - rename prepare_packed_objects() to prepare_pack_objects() to better
   reflect that we are preparing a call to pack-objects, not preparing
   existing packed objects

Peff raised the possibility of making for_each_packed_object() use pack
order instead of index order - I'll also take a look at that, separately
from this patch series.

Jonathan Tan (2):
  repack: refactor setup of pack-objects cmd
  repack: repack promisor objects if -a or -A is set

 Documentation/git-repack.txt |   5 +
 builtin/repack.c             | 182 ++++++++++++++++++++++++++---------
 t/t0410-partial-clone.sh     |  85 +++++++++++++---
 3 files changed, 213 insertions(+), 59 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/2] repack: refactor setup of pack-objects cmd
  2018-08-08 22:34 ` [PATCH v2 0/2] Repacking of promisor packfiles Jonathan Tan
@ 2018-08-08 22:34   ` Jonathan Tan
  2018-08-08 22:34   ` [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
  1 sibling, 0 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-08 22:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, gitster

A subsequent patch will teach repack to run pack-objects with some same
and some different arguments if repacking of promisor objects is
required. Refactor the setup of the pack-objects cmd so that setting up
the arguments common to both is done in a function.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/repack.c | 99 +++++++++++++++++++++++++++---------------------
 1 file changed, 55 insertions(+), 44 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159..f8cae7d66 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -138,6 +138,47 @@ static void remove_redundant_pack(const char *dir_name, const char *base_name)
 	strbuf_release(&buf);
 }
 
+struct pack_objects_args {
+	const char *window;
+	const char *window_memory;
+	const char *depth;
+	const char *threads;
+	const char *max_pack_size;
+	int no_reuse_delta;
+	int no_reuse_object;
+	int quiet;
+	int local;
+};
+
+static void prepare_pack_objects(struct child_process *cmd,
+				 const struct pack_objects_args *args)
+{
+	argv_array_push(&cmd->args, "pack-objects");
+	if (args->window)
+		argv_array_pushf(&cmd->args, "--window=%s", args->window);
+	if (args->window_memory)
+		argv_array_pushf(&cmd->args, "--window-memory=%s", args->window_memory);
+	if (args->depth)
+		argv_array_pushf(&cmd->args, "--depth=%s", args->depth);
+	if (args->threads)
+		argv_array_pushf(&cmd->args, "--threads=%s", args->threads);
+	if (args->max_pack_size)
+		argv_array_pushf(&cmd->args, "--max-pack-size=%s", args->max_pack_size);
+	if (args->no_reuse_delta)
+		argv_array_pushf(&cmd->args, "--no-reuse-delta");
+	if (args->no_reuse_object)
+		argv_array_pushf(&cmd->args, "--no-reuse-object");
+	if (args->local)
+		argv_array_push(&cmd->args,  "--local");
+	if (args->quiet)
+		argv_array_push(&cmd->args,  "--quiet");
+	if (delta_base_offset)
+		argv_array_push(&cmd->args,  "--delta-base-offset");
+	argv_array_push(&cmd->args, packtmp);
+	cmd->git_cmd = 1;
+	cmd->out = -1;
+}
+
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
 
@@ -165,15 +206,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int delete_redundant = 0;
 	const char *unpack_unreachable = NULL;
 	int keep_unreachable = 0;
-	const char *window = NULL, *window_memory = NULL;
-	const char *depth = NULL;
-	const char *threads = NULL;
-	const char *max_pack_size = NULL;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
-	int no_reuse_delta = 0, no_reuse_object = 0;
 	int no_update_server_info = 0;
-	int quiet = 0;
-	int local = 0;
+	struct pack_objects_args po_args = {NULL};
 
 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -183,14 +218,14 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				   LOOSEN_UNREACHABLE | ALL_INTO_ONE),
 		OPT_BOOL('d', NULL, &delete_redundant,
 				N_("remove redundant packs, and run git-prune-packed")),
-		OPT_BOOL('f', NULL, &no_reuse_delta,
+		OPT_BOOL('f', NULL, &po_args.no_reuse_delta,
 				N_("pass --no-reuse-delta to git-pack-objects")),
-		OPT_BOOL('F', NULL, &no_reuse_object,
+		OPT_BOOL('F', NULL, &po_args.no_reuse_object,
 				N_("pass --no-reuse-object to git-pack-objects")),
 		OPT_BOOL('n', NULL, &no_update_server_info,
 				N_("do not run git-update-server-info")),
-		OPT__QUIET(&quiet, N_("be quiet")),
-		OPT_BOOL('l', "local", &local,
+		OPT__QUIET(&po_args.quiet, N_("be quiet")),
+		OPT_BOOL('l', "local", &po_args.local,
 				N_("pass --local to git-pack-objects")),
 		OPT_BOOL('b', "write-bitmap-index", &write_bitmaps,
 				N_("write bitmap index")),
@@ -198,15 +233,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("with -A, do not loosen objects older than this")),
 		OPT_BOOL('k', "keep-unreachable", &keep_unreachable,
 				N_("with -a, repack unreachable objects")),
-		OPT_STRING(0, "window", &window, N_("n"),
+		OPT_STRING(0, "window", &po_args.window, N_("n"),
 				N_("size of the window used for delta compression")),
-		OPT_STRING(0, "window-memory", &window_memory, N_("bytes"),
+		OPT_STRING(0, "window-memory", &po_args.window_memory, N_("bytes"),
 				N_("same as the above, but limit memory size instead of entries count")),
-		OPT_STRING(0, "depth", &depth, N_("n"),
+		OPT_STRING(0, "depth", &po_args.depth, N_("n"),
 				N_("limits the maximum delta depth")),
-		OPT_STRING(0, "threads", &threads, N_("n"),
+		OPT_STRING(0, "threads", &po_args.threads, N_("n"),
 				N_("limits the maximum number of threads")),
-		OPT_STRING(0, "max-pack-size", &max_pack_size, N_("bytes"),
+		OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
 				N_("maximum size of each packfile")),
 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
 				N_("repack objects in packs marked with .keep")),
@@ -238,7 +273,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	sigchain_push_common(remove_pack_on_signal);
 
-	argv_array_push(&cmd.args, "pack-objects");
+	prepare_pack_objects(&cmd, &po_args);
+
 	argv_array_push(&cmd.args, "--keep-true-parents");
 	if (!pack_kept_objects)
 		argv_array_push(&cmd.args, "--honor-pack-keep");
@@ -251,20 +287,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd.args, "--indexed-objects");
 	if (repository_format_partial_clone)
 		argv_array_push(&cmd.args, "--exclude-promisor-objects");
-	if (window)
-		argv_array_pushf(&cmd.args, "--window=%s", window);
-	if (window_memory)
-		argv_array_pushf(&cmd.args, "--window-memory=%s", window_memory);
-	if (depth)
-		argv_array_pushf(&cmd.args, "--depth=%s", depth);
-	if (threads)
-		argv_array_pushf(&cmd.args, "--threads=%s", threads);
-	if (max_pack_size)
-		argv_array_pushf(&cmd.args, "--max-pack-size=%s", max_pack_size);
-	if (no_reuse_delta)
-		argv_array_pushf(&cmd.args, "--no-reuse-delta");
-	if (no_reuse_object)
-		argv_array_pushf(&cmd.args, "--no-reuse-object");
 	if (write_bitmaps)
 		argv_array_push(&cmd.args, "--write-bitmap-index");
 
@@ -292,17 +314,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		argv_array_push(&cmd.args, "--incremental");
 	}
 
-	if (local)
-		argv_array_push(&cmd.args,  "--local");
-	if (quiet)
-		argv_array_push(&cmd.args,  "--quiet");
-	if (delta_base_offset)
-		argv_array_push(&cmd.args,  "--delta-base-offset");
-
-	argv_array_push(&cmd.args, packtmp);
-
-	cmd.git_cmd = 1;
-	cmd.out = -1;
 	cmd.no_stdin = 1;
 
 	ret = start_command(&cmd);
@@ -320,7 +331,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (ret)
 		return ret;
 
-	if (!names.nr && !quiet)
+	if (!names.nr && !po_args.quiet)
 		printf("Nothing new to pack.\n");
 
 	/*
@@ -441,7 +452,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			if (!string_list_has_string(&names, sha1))
 				remove_redundant_pack(packdir, item->string);
 		}
-		if (!quiet && isatty(2))
+		if (!po_args.quiet && isatty(2))
 			opts |= PRUNE_PACKED_VERBOSE;
 		prune_packed_objects(opts);
 	}
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-08 22:34 ` [PATCH v2 0/2] Repacking of promisor packfiles Jonathan Tan
  2018-08-08 22:34   ` [PATCH v2 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
@ 2018-08-08 22:34   ` Jonathan Tan
  2018-08-09 17:05     ` Junio C Hamano
  2018-08-18 16:05     ` Duy Nguyen
  1 sibling, 2 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-08 22:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, gitster

Currently, repack does not touch promisor packfiles at all, potentially
causing the performance of repositories that have many such packfiles to
drop. Therefore, repack all promisor objects if invoked with -a or -A.

This is done by an additional invocation of pack-objects on all promisor
objects individually given, which takes care of deduplication and allows
the resulting packfiles to respect flags such as --max-pack-size.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-repack.txt |  5 +++
 builtin/repack.c             | 83 +++++++++++++++++++++++++++++++++--
 t/t0410-partial-clone.sh     | 85 +++++++++++++++++++++++++++++++-----
 3 files changed, 158 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index d90e7907f..d05625096 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -40,6 +40,11 @@ OPTIONS
 Note that users fetching over dumb protocols will have to fetch the
 whole new pack in order to get any contained object, no matter how many
 other objects in that pack they already have locally.
++
+Promisor packfiles are repacked separately: if there are packfiles that
+have an associated ".promisor" file, these packfiles will be repacked
+into another separate pack, and an empty ".promisor" file corresponding
+to the new separate pack will be written.
 
 -A::
 	Same as `-a`, unless `-d` is used.  Then any unreachable
diff --git a/builtin/repack.c b/builtin/repack.c
index f8cae7d66..5c97dec3d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "packfile.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -83,7 +84,7 @@ static void remove_pack_on_signal(int signo)
 
 /*
  * Adds all packs hex strings to the fname list, which do not
- * have a corresponding .keep or .promisor file. These packs are not to
+ * have a corresponding .keep file. These packs are not to
  * be kept if we are going to pack everything into one file.
  */
 static void get_non_kept_pack_filenames(struct string_list *fname_list,
@@ -111,8 +112,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 
 		fname = xmemdupz(e->d_name, len);
 
-		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)) &&
-		    !file_exists(mkpath("%s/%s.promisor", packdir, fname)))
+		if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
 			string_list_append_nodup(fname_list, fname);
 		else
 			free(fname);
@@ -122,7 +122,7 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-	const char *exts[] = {".pack", ".idx", ".keep", ".bitmap"};
+	const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
@@ -179,6 +179,76 @@ static void prepare_pack_objects(struct child_process *cmd,
 	cmd->out = -1;
 }
 
+/*
+ * Write oid to the given struct child_process's stdin, starting it first if
+ * necessary.
+ */
+static int write_oid(const struct object_id *oid, struct packed_git *pack,
+		     uint32_t pos, void *data)
+{
+	struct child_process *cmd = data;
+
+	if (cmd->in == -1) {
+		if (start_command(cmd))
+			die("Could not start pack-objects to repack promisor objects");
+	}
+
+	xwrite(cmd->in, oid_to_hex(oid), GIT_SHA1_HEXSZ);
+	xwrite(cmd->in, "\n", 1);
+	return 0;
+}
+
+static void repack_promisor_objects(const struct pack_objects_args *args,
+				    struct string_list *names)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	FILE *out;
+	struct strbuf line = STRBUF_INIT;
+
+	prepare_pack_objects(&cmd, args);
+	cmd.in = -1;
+
+	/*
+	 * NEEDSWORK: Giving pack-objects only the OIDs without any ordering
+	 * hints may result in suboptimal deltas in the resulting pack. See if
+	 * the OIDs can be sent with fake paths such that pack-objects can use a
+	 * {type -> existing pack order} ordering when computing deltas instead
+	 * of a {type -> size} ordering, which may produce better deltas.
+	 */
+	for_each_packed_object(write_oid, &cmd,
+			       FOR_EACH_OBJECT_PROMISOR_ONLY);
+
+	if (cmd.in == -1)
+		/* No packed objects; cmd was never started */
+		return;
+
+	close(cmd.in);
+
+	out = xfdopen(cmd.out, "r");
+	while (strbuf_getline_lf(&line, out) != EOF) {
+		char *promisor_name;
+		int fd;
+		if (line.len != 40)
+			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
+		string_list_append(names, line.buf);
+
+		/*
+		 * pack-objects creates the .pack and .idx files, but not the
+		 * .promisor file. Create the .promisor file, which is empty.
+		 */
+		promisor_name = mkpathdup("%s-%s.promisor", packtmp,
+					  line.buf);
+		fd = open(promisor_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+		if (fd < 0)
+			die_errno("unable to create '%s'", promisor_name);
+		close(fd);
+		free(promisor_name);
+	}
+	fclose(out);
+	if (finish_command(&cmd))
+		die("Could not finish pack-objects to repack promisor objects");
+}
+
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
 
@@ -191,6 +261,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		{".pack"},
 		{".idx"},
 		{".bitmap", 1},
+		{".promisor", 1},
 	};
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
@@ -293,6 +364,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (pack_everything & ALL_INTO_ONE) {
 		get_non_kept_pack_filenames(&existing_packs, &keep_pack_list);
 
+		repack_promisor_objects(&po_args, &names);
+
 		if (existing_packs.nr && delete_redundant) {
 			if (unpack_unreachable) {
 				argv_array_pushf(&cmd.args,
@@ -440,6 +513,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	/* End of pack replacement. */
 
+	reprepare_packed_git(the_repository);
+
 	if (delete_redundant) {
 		int opts = 0;
 		string_list_sort(&names);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583..128130066 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -271,28 +271,91 @@ test_expect_success 'rev-list accepts missing and promised objects on command li
 	git -C repo rev-list --exclude-promisor-objects --objects "$COMMIT" "$TREE" "$BLOB"
 '
 
-test_expect_success 'gc does not repack promisor objects' '
+test_expect_success 'gc repacks promisor objects separately from non-promisor objects' '
 	rm -rf repo &&
 	test_create_repo repo &&
-	test_commit -C repo my_commit &&
+	test_commit -C repo one &&
+	test_commit -C repo two &&
 
-	TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) &&
-	HASH=$(printf "$TREE_HASH\n" | pack_as_from_promisor) &&
+	TREE_ONE=$(git -C repo rev-parse one^{tree}) &&
+	printf "$TREE_ONE\n" | pack_as_from_promisor &&
+	TREE_TWO=$(git -C repo rev-parse two^{tree}) &&
+	printf "$TREE_TWO\n" | pack_as_from_promisor &&
 
 	git -C repo config core.repositoryformatversion 1 &&
 	git -C repo config extensions.partialclone "arbitrary string" &&
 	git -C repo gc &&
 
-	# Ensure that the promisor packfile still exists, and remove it
-	test -e repo/.git/objects/pack/pack-$HASH.pack &&
-	rm repo/.git/objects/pack/pack-$HASH.* &&
-
-	# Ensure that the single other pack contains the commit, but not the tree
+	# Ensure that exactly one promisor packfile exists, and that it
+	# contains the trees but not the commits
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	PROMISOR_PACKFILE=$(sed "s/.promisor/.pack/" <promisorlist) &&
+	git verify-pack $PROMISOR_PACKFILE -v >out &&
+	grep "$TREE_ONE" out &&
+	grep "$TREE_TWO" out &&
+	! grep "$(git -C repo rev-parse one)" out &&
+	! grep "$(git -C repo rev-parse two)" out &&
+
+	# Remove the promisor packfile and associated files
+	rm $(sed "s/.promisor//" <promisorlist).* &&
+
+	# Ensure that the single other pack contains the commits, but not the
+	# trees
 	ls repo/.git/objects/pack/pack-*.pack >packlist &&
 	test_line_count = 1 packlist &&
 	git verify-pack repo/.git/objects/pack/pack-*.pack -v >out &&
-	grep "$(git -C repo rev-parse HEAD)" out &&
-	! grep "$TREE_HASH" out
+	grep "$(git -C repo rev-parse one)" out &&
+	grep "$(git -C repo rev-parse two)" out &&
+	! grep "$TREE_ONE" out &&
+	! grep "$TREE_TWO" out
+'
+
+test_expect_success 'gc does not repack promisor objects if there are none' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo one &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo gc &&
+
+	# Ensure that only one pack exists
+	ls repo/.git/objects/pack/pack-*.pack >packlist &&
+	test_line_count = 1 packlist
+'
+
+repack_and_check () {
+	rm -rf repo2 &&
+	cp -r repo repo2 &&
+	git -C repo2 repack $1 -d &&
+	git -C repo2 fsck &&
+
+	git -C repo2 cat-file -e $2 &&
+	git -C repo2 cat-file -e $3
+}
+
+test_expect_success 'repack -d does not irreversibly delete promisor objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo commit --allow-empty -m one &&
+	git -C repo commit --allow-empty -m two &&
+	git -C repo commit --allow-empty -m three &&
+	git -C repo commit --allow-empty -m four &&
+	ONE=$(git -C repo rev-parse HEAD^^^) &&
+	TWO=$(git -C repo rev-parse HEAD^^) &&
+	THREE=$(git -C repo rev-parse HEAD^) &&
+
+	printf "$TWO\n" | pack_as_from_promisor &&
+	printf "$THREE\n" | pack_as_from_promisor &&
+	delete_object repo "$ONE" &&
+
+	repack_and_check -a "$TWO" "$THREE" &&
+	repack_and_check -A "$TWO" "$THREE" &&
+	repack_and_check -l "$TWO" "$THREE"
 '
 
 test_expect_success 'gc stops traversal when a missing but promised object is reached' '
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-08 22:34   ` [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
@ 2018-08-09 17:05     ` Junio C Hamano
  2018-08-09 18:12       ` Jonathan Tan
  2018-08-18 16:05     ` Duy Nguyen
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2018-08-09 17:05 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

> +/*
> + * Write oid to the given struct child_process's stdin, starting it first if
> + * necessary.
> + */
> +static int write_oid(const struct object_id *oid, struct packed_git *pack,
> +		     uint32_t pos, void *data)
> +{
> +	struct child_process *cmd = data;
> +
> +	if (cmd->in == -1) {
> +		if (start_command(cmd))
> +			die("Could not start pack-objects to repack promisor objects");
> +	}
> +
> +	xwrite(cmd->in, oid_to_hex(oid), GIT_SHA1_HEXSZ);
> +	xwrite(cmd->in, "\n", 1);
> +	return 0;
> +}
> +
> +static void repack_promisor_objects(const struct pack_objects_args *args,
> +				    struct string_list *names)
> +{
> +	struct child_process cmd = CHILD_PROCESS_INIT;
> +	FILE *out;
> +	struct strbuf line = STRBUF_INIT;
> +
> +	prepare_pack_objects(&cmd, args);
> +	cmd.in = -1;
> +
> +	/*
> +	 * NEEDSWORK: Giving pack-objects only the OIDs without any ordering
> +	 * hints may result in suboptimal deltas in the resulting pack. See if
> +	 * the OIDs can be sent with fake paths such that pack-objects can use a
> +	 * {type -> existing pack order} ordering when computing deltas instead
> +	 * of a {type -> size} ordering, which may produce better deltas.
> +	 */
> +	for_each_packed_object(write_oid, &cmd,
> +			       FOR_EACH_OBJECT_PROMISOR_ONLY);
> +
> +	if (cmd.in == -1)
> +		/* No packed objects; cmd was never started */
> +		return;
> +	close(cmd.in);
> +
> +	out = xfdopen(cmd.out, "r");

Hmm, it is clever to auto-start the pack-objects (and notice there
wasn't anything needing to pack).  Two things that are worth noting
are:

 - The code takes advantage of the fact that cmd.in being left as -1
   is a sign that start_command() was never called (well, it could
   be that it was called but failed to open pipe---but then we would
   have died inside write_oid(), so we can ignore taht case).  This
   is not relying on its implementation detail---cmd.in would be
   replaced by a real fd to the pipe if start_command() was called.

 - We know that pack-objects reads all its standard input through to
   the EOF before emitting a single byte to its standard output, and
   that is why we can use bidi pipe and not worry about deadlocking
   caused by not reading from cmd.out at all (before closing cmd.in,
   that is).

So I kind of like the cleverness of the design of this code.

> +	while (strbuf_getline_lf(&line, out) != EOF) {
> +		char *promisor_name;
> +		int fd;
> +		if (line.len != 40)
> +			die("repack: Expecting 40 character sha1 lines only from pack-objects.");
> +		string_list_append(names, line.buf);
> +
> +		/*
> +		 * pack-objects creates the .pack and .idx files, but not the
> +		 * .promisor file. Create the .promisor file, which is empty.
> +		 */
> +		promisor_name = mkpathdup("%s-%s.promisor", packtmp,
> +					  line.buf);
> +		fd = open(promisor_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
> +		if (fd < 0)
> +			die_errno("unable to create '%s'", promisor_name);
> +		close(fd);
> +		free(promisor_name);

For now, as we do not know what is the appropriate thing to leave in
the file, empty is fine, but perhaps in the future we may want to
carry forward the info from the existing .promisor file?  What does
it initially contain?  Transport invoked with from_promisor bit
seems to call index-pack with "--promisor" option with no argument,
so this is probably in line with that.  Do we in the future need to
teach "--promisor" option to pack-objects we invoke here, which will
be passed to the index-pack it internally performs, so that we do
not have to do this by hand here?

In any case, don't we need to update the doc for the "--promisor"
option of "index-pack"?  The same comment for the new options added
in the same topic, e.g. "--from-promisor" and "--no-dependents"
options added to "fetch-pack".  It seems that the topic that ended
at 0c16cd499d was done rather hastily and under-documented?

> @@ -440,6 +513,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  
>  	/* End of pack replacement. */
>  
> +	reprepare_packed_git(the_repository);
> +

I do not think this call hurts, but why does this become suddenly
necessary with _this_ patch?  Is it an unrelated bugfix?

>  	if (delete_redundant) {
>  		int opts = 0;
>  		string_list_sort(&names);

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-09 17:05     ` Junio C Hamano
@ 2018-08-09 18:12       ` Jonathan Tan
  0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Tan @ 2018-08-09 18:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git mailing list, Jeff King

On Thu, Aug 9, 2018 at 10:05 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Hmm, it is clever to auto-start the pack-objects (and notice there
> wasn't anything needing to pack).  Two things that are worth noting
> are:
>
>  - The code takes advantage of the fact that cmd.in being left as -1
>    is a sign that start_command() was never called (well, it could
>    be that it was called but failed to open pipe---but then we would
>    have died inside write_oid(), so we can ignore taht case).  This
>    is not relying on its implementation detail---cmd.in would be
>    replaced by a real fd to the pipe if start_command() was called.
>
>  - We know that pack-objects reads all its standard input through to
>    the EOF before emitting a single byte to its standard output, and
>    that is why we can use bidi pipe and not worry about deadlocking
>    caused by not reading from cmd.out at all (before closing cmd.in,
>    that is).
>
> So I kind of like the cleverness of the design of this code.

Thanks for checking this.

> For now, as we do not know what is the appropriate thing to leave in
> the file, empty is fine, but perhaps in the future we may want to
> carry forward the info from the existing .promisor file?  What does
> it initially contain?  Transport invoked with from_promisor bit
> seems to call index-pack with "--promisor" option with no argument,
> so this is probably in line with that.  Do we in the future need to
> teach "--promisor" option to pack-objects we invoke here, which will
> be passed to the index-pack it internally performs, so that we do
> not have to do this by hand here?

There might be more than one .promisor file that we are repacking, so
we would also have to deal with the order. It might be better to
document that the contents of .promisor files are lost upon repack.

> In any case, don't we need to update the doc for the "--promisor"
> option of "index-pack"?  The same comment for the new options added
> in the same topic, e.g. "--from-promisor" and "--no-dependents"
> options added to "fetch-pack".  It seems that the topic that ended
> at 0c16cd499d was done rather hastily and under-documented?

Yes, you're right - I'll make a patch documenting these.

>> @@ -440,6 +513,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>
>>       /* End of pack replacement. */
>>
>> +     reprepare_packed_git(the_repository);
>> +
>
> I do not think this call hurts, but why does this become suddenly
> necessary with _this_ patch?  Is it an unrelated bugfix?

Hmm...I thought I mentioned somewhere that we need
reprepare_packed_git now because for_each_packed_object calls
prepare_packed_git, but apparently I didn't. I would add the following
to the commit message (if you'd rather not do it yourself, let me know
and I'll send a v3 with the new commit message):

Because for_each_packed_object() calls prepare_packed_git(), a call to
reprepare_packed_git() now has to be inserted after all the packfile
manipulation - if not, old information about packed objects would
remain in the_repository->objects->packed_git.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set
  2018-08-08 22:34   ` [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
  2018-08-09 17:05     ` Junio C Hamano
@ 2018-08-18 16:05     ` Duy Nguyen
  1 sibling, 0 replies; 18+ messages in thread
From: Duy Nguyen @ 2018-08-18 16:05 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Jeff King, Junio C Hamano

On Thu, Aug 9, 2018 at 12:35 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> @@ -179,6 +179,76 @@ static void prepare_pack_objects(struct child_process *cmd,
>         cmd->out = -1;
>  }
>
> +/*
> + * Write oid to the given struct child_process's stdin, starting it first if
> + * necessary.
> + */
> +static int write_oid(const struct object_id *oid, struct packed_git *pack,
> +                    uint32_t pos, void *data)
> +{
> +       struct child_process *cmd = data;
> +
> +       if (cmd->in == -1) {
> +               if (start_command(cmd))
> +                       die("Could not start pack-objects to repack promisor objects");

_() ? (also other messages)
-- 
Duy

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-08-18 16:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-07 20:12 [PATCH 0/2] Repacking of promisor packfiles Jonathan Tan
2018-08-07 20:12 ` [PATCH 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
2018-08-07 20:12 ` [PATCH 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
2018-08-07 20:57   ` Junio C Hamano
2018-08-07 23:23     ` Jonathan Tan
2018-08-08  0:25       ` Junio C Hamano
2018-08-08 18:45         ` Jonathan Tan
2018-08-08 15:50       ` Jeff King
2018-08-08 16:17         ` Junio C Hamano
2018-08-07 22:37   ` Junio C Hamano
2018-08-07 23:25     ` Jonathan Tan
2018-08-08 16:34       ` Junio C Hamano
2018-08-08 22:34 ` [PATCH v2 0/2] Repacking of promisor packfiles Jonathan Tan
2018-08-08 22:34   ` [PATCH v2 1/2] repack: refactor setup of pack-objects cmd Jonathan Tan
2018-08-08 22:34   ` [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set Jonathan Tan
2018-08-09 17:05     ` Junio C Hamano
2018-08-09 18:12       ` Jonathan Tan
2018-08-18 16:05     ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).