git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 0/6] Create commit-graph file format v2
@ 2019-01-23 21:59 Derrick Stolee via GitGitGadget
  2019-01-23 21:59 ` [PATCH 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
                   ` (8 more replies)
  0 siblings, 9 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-01-23 21:59 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano

The commit-graph file format has some shortcomings that were discussed
on-list:

 1. It doesn't use the 4-byte format ID from the_hash_algo.
    
    
 2. There is no way to change the reachability index from generation numbers
    to corrected commit date [1].
    
    
 3. The unused byte in the format could be used to signal the file is
    incremental, but current clients ignore the value even if it is
    non-zero.
    
    

This series adds a new version (2) to the commit-graph file. The fifth byte
already specified the file format, so existing clients will gracefully
respond to files with a different version number. The only real change now
is that the header takes 12 bytes instead of 8, due to using the 4-byte
format ID for the hash algorithm.

The new bytes reserved for the reachability index version and incremental
file formats are now expected to be equal to the defaults. When we update
these values to be flexible in the future, if a client understands
commit-graph v2 but not those new values, then it will fail gracefully.

This series is based on ab/commit-graph-write-progress and bc/sha-256.

Thanks, -Stolee

[1] 
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/

Derrick Stolee (6):
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: create new version flags
  commit-graph: add --version=<n> option
  commit-graph: implement file format version 2
  commit-graph: test verifying a corrupt v2 header

 Documentation/git-commit-graph.txt            |   3 +
 .../technical/commit-graph-format.txt         |  26 ++-
 builtin/commit-graph.c                        |  43 +++--
 builtin/commit.c                              |   5 +-
 builtin/gc.c                                  |   7 +-
 commit-graph.c                                | 158 +++++++++++++-----
 commit-graph.h                                |  16 +-
 t/t5318-commit-graph.sh                       |  42 ++++-
 8 files changed, 233 insertions(+), 67 deletions(-)


base-commit: 91b3ce35eeb93be1f4406e25ccdc4ab983a8e5af
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-112/derrickstolee/graph/v2-head-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/112
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 1/6] commit-graph: return with errors during write
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
@ 2019-01-23 21:59 ` Derrick Stolee via GitGitGadget
  2019-01-23 21:59 ` [PATCH 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-01-23 21:59 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 19 +++++++------
 builtin/commit.c       |  5 ++--
 builtin/gc.c           |  7 ++---
 commit-graph.c         | 60 +++++++++++++++++++++++++++++-------------
 commit-graph.h         | 10 +++----
 5 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 4ae502754c..b12d46fdc8 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -126,6 +126,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *pack_indexes = NULL;
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
+	int result;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -153,10 +154,8 @@ static int graph_write(int argc, const char **argv)
 
 	read_replace_refs = 0;
 
-	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-		return 0;
-	}
+	if (opts.reachable)
+		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -173,14 +172,14 @@ static int graph_write(int argc, const char **argv)
 		UNLEAK(buf);
 	}
 
-	write_commit_graph(opts.obj_dir,
-			   pack_indexes,
-			   commit_hex,
-			   opts.append,
-			   1);
+	result = write_commit_graph(opts.obj_dir,
+				    pack_indexes,
+				    commit_hex,
+				    opts.append,
+				    1);
 
 	UNLEAK(lines);
-	return 0;
+	return result;
 }
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
diff --git a/builtin/commit.c b/builtin/commit.c
index 004b816635..04b0717b35 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1667,8 +1667,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "new_index file. Check that disk is not full and quota is\n"
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-		write_commit_graph_reachable(get_object_directory(), 0, 0);
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+		return 1;
 
 	repo_rerere(the_repository, 0);
 	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
diff --git a/builtin/gc.c b/builtin/gc.c
index 7696017cd4..9c6c9c9007 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -662,9 +662,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (pack_garbage.nr > 0)
 		clean_pack_garbage();
 
-	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0,
-					     !quiet && !daemonized);
+	if (gc_write_commit_graph &&
+	    write_commit_graph_reachable(get_object_directory(), 0,
+					 !quiet && !daemonized))
+		return 1;
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 0f8274d15d..162b9f2a85 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -755,27 +755,30 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int append,
+				 int report_progress)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
+	int result;
 
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+	result = write_commit_graph(obj_dir, NULL, &list,
+				    append, report_progress);
 
 	string_list_clear(&list, 0);
+	return result;
 }
 
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress)
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
 	struct hashfile *f;
 	uint32_t i, count_distinct = 0;
-	char *graph_name;
+	char *graph_name = NULL;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
@@ -787,15 +790,17 @@ void write_commit_graph(const char *obj_dir,
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
+	int res = 0;
 
 	if (!commit_graph_compatible(the_repository))
-		return;
+		return 0;
 
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
+	commits.list = NULL;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -836,10 +841,16 @@ void write_commit_graph(const char *obj_dir,
 			strbuf_setlen(&packname, dirlen);
 			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
-			if (!p)
-				die(_("error adding pack %s"), packname.buf);
-			if (open_pack_index(p))
-				die(_("error opening index for %s"), packname.buf);
+			if (!p) {
+				error(_("error adding pack %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
+			if (open_pack_index(p)) {
+				error(_("error opening index for %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
 			for_each_object_in_pack(p, add_packed_commits, &oids,
 						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
@@ -910,8 +921,11 @@ void write_commit_graph(const char *obj_dir,
 	}
 	stop_progress(&progress);
 
-	if (count_distinct >= GRAPH_PARENT_MISSING)
-		die(_("the commit graph format cannot write %d commits"), count_distinct);
+	if (count_distinct >= GRAPH_PARENT_MISSING) {
+		error(_("the commit graph format cannot write %d commits"), count_distinct);
+		res = 1;
+		goto cleanup;
+	}
 
 	commits.nr = 0;
 	commits.alloc = count_distinct;
@@ -943,16 +957,21 @@ void write_commit_graph(const char *obj_dir,
 	num_chunks = num_extra_edges ? 4 : 3;
 	stop_progress(&progress);
 
-	if (commits.nr >= GRAPH_PARENT_MISSING)
-		die(_("too many commits to write graph"));
+	if (commits.nr >= GRAPH_PARENT_MISSING) {
+		error(_("too many commits to write graph"));
+		res = 1;
+		goto cleanup;
+	}
 
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name)) {
 		UNLEAK(graph_name);
-		die_errno(_("unable to create leading directories of %s"),
-			  graph_name);
+		error(_("unable to create leading directories of %s"),
+			graph_name);
+		res = errno;
+		goto cleanup;
 	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
@@ -1011,9 +1030,12 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+cleanup:
 	free(graph_name);
 	free(commits.list);
 	free(oids.list);
+
+	return res;
 }
 
 #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
diff --git a/commit-graph.h b/commit-graph.h
index e6aff2c2e1..cd333a0cd0 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -60,12 +60,12 @@ struct commit_graph *load_commit_graph_one(const char *graph_file);
  */
 int generation_numbers_enabled(struct repository *r);
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
+int write_commit_graph_reachable(const char *obj_dir, int append,
 				  int report_progress);
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress);
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 2/6] commit-graph: collapse parameters into flags
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
  2019-01-23 21:59 ` [PATCH 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-01-23 21:59 ` Derrick Stolee via GitGitGadget
  2019-01-23 21:59 ` [PATCH 3/6] commit-graph: create new version flags Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-01-23 21:59 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
We will soon expand the possible options to send to these methods, so
instead of complicating the parameter list, first simplify it.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 8 +++++---
 builtin/commit.c       | 2 +-
 builtin/gc.c           | 4 ++--
 commit-graph.c         | 9 +++++----
 commit-graph.h         | 8 +++++---
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index b12d46fdc8..0c92421f75 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -127,6 +127,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
 	int result;
+	int flags = COMMIT_GRAPH_PROGRESS;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -151,11 +152,13 @@ static int graph_write(int argc, const char **argv)
 		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
+	if (opts.append)
+		flags |= COMMIT_GRAPH_APPEND;
 
 	read_replace_refs = 0;
 
 	if (opts.reachable)
-		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
+		return write_commit_graph_reachable(opts.obj_dir, flags);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -175,8 +178,7 @@ static int graph_write(int argc, const char **argv)
 	result = write_commit_graph(opts.obj_dir,
 				    pack_indexes,
 				    commit_hex,
-				    opts.append,
-				    1);
+				    flags);
 
 	UNLEAK(lines);
 	return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index 04b0717b35..3228de4e3c 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1668,7 +1668,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+	    write_commit_graph_reachable(get_object_directory(), 0))
 		return 1;
 
 	repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index 9c6c9c9007..198872206b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -663,8 +663,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 
 	if (gc_write_commit_graph &&
-	    write_commit_graph_reachable(get_object_directory(), 0,
-					 !quiet && !daemonized))
+	    write_commit_graph_reachable(get_object_directory(),
+					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
 		return 1;
 
 	if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index 162b9f2a85..28fe2378be 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -755,15 +755,14 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				 int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int flags)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
 	for_each_ref(add_ref_to_list, &list);
 	result = write_commit_graph(obj_dir, NULL, &list,
-				    append, report_progress);
+				    flags);
 
 	string_list_clear(&list, 0);
 	return result;
@@ -772,7 +771,7 @@ int write_commit_graph_reachable(const char *obj_dir, int append,
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress)
+		       int flags)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -791,6 +790,8 @@ int write_commit_graph(const char *obj_dir,
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
 	int res = 0;
+	int append = flags & COMMIT_GRAPH_APPEND;
+	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
diff --git a/commit-graph.h b/commit-graph.h
index cd333a0cd0..83fa548138 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -60,12 +60,14 @@ struct commit_graph *load_commit_graph_one(const char *graph_file);
  */
 int generation_numbers_enabled(struct repository *r);
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress);
+#define COMMIT_GRAPH_APPEND     (1 << 0)
+#define COMMIT_GRAPH_PROGRESS   (1 << 1)
+
+int write_commit_graph_reachable(const char *obj_dir, int flags);
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress);
+		       int flags);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 3/6] commit-graph: create new version flags
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
  2019-01-23 21:59 ` [PATCH 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
  2019-01-23 21:59 ` [PATCH 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
@ 2019-01-23 21:59 ` Derrick Stolee via GitGitGadget
  2019-01-23 21:59 ` [PATCH 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-01-23 21:59 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of a new commit-graph file format version, create
a flag for the write_commit_graph() and write_commit_graph_reachable()
methods to take a version number.

When there is no specified version, the implementation selects a
default value. Currently, the only valid value is 1.

The file format will change the header information, so place the
existing header logic inside a switch statement with only one case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 58 +++++++++++++++++++++++++++++++++-----------------
 commit-graph.h |  1 +
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 28fe2378be..f7f45893fd 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -25,9 +25,6 @@
 
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
-#define GRAPH_VERSION_1 0x1
-#define GRAPH_VERSION GRAPH_VERSION_1
-
 #define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
 #define GRAPH_PARENT_MISSING 0x7fffffff
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
@@ -118,30 +115,35 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != GRAPH_VERSION) {
+	if (graph_version != 1) {
 		error(_("graph version %X does not match version %X"),
-		      graph_version, GRAPH_VERSION);
-		goto cleanup_fail;
-	}
-
-	hash_version = *(unsigned char*)(data + 5);
-	if (hash_version != oid_version()) {
-		error(_("hash version %X does not match version %X"),
-		      hash_version, oid_version());
+		      graph_version, 1);
 		goto cleanup_fail;
 	}
 
 	graph = alloc_commit_graph();
 
+	switch (graph_version) {
+	case 1:
+		hash_version = *(unsigned char*)(data + 5);
+		if (hash_version != oid_version()) {
+			error(_("hash version %X does not match version %X"),
+			      hash_version, oid_version());
+			goto cleanup_fail;
+		}
+
+		graph->num_chunks = *(unsigned char*)(data + 6);
+		chunk_lookup = data + 8;
+		break;
+	}
+
 	graph->hash_len = the_hash_algo->rawsz;
-	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->graph_fd = fd;
 	graph->data = graph_map;
 	graph->data_len = graph_size;
 
 	last_chunk_id = 0;
 	last_chunk_offset = 8;
-	chunk_lookup = data + 8;
 	for (i = 0; i < graph->num_chunks; i++) {
 		uint32_t chunk_id = get_be32(chunk_lookup + 0);
 		uint64_t chunk_offset = get_be64(chunk_lookup + 4);
@@ -792,10 +794,22 @@ int write_commit_graph(const char *obj_dir,
 	int res = 0;
 	int append = flags & COMMIT_GRAPH_APPEND;
 	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
+	int version = 0;
+	int header_size = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
 
+	if (flags & COMMIT_GRAPH_VERSION_1)
+		version = 1;
+	if (!version)
+		version = 1;
+	if (version != 1) {
+		error(_("unsupported commit-graph version %d"),
+		      version);
+		return 1;
+	}
+
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
@@ -980,10 +994,16 @@ int write_commit_graph(const char *obj_dir,
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
-	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, oid_version());
-	hashwrite_u8(f, num_chunks);
-	hashwrite_u8(f, 0); /* unused padding byte */
+	hashwrite_u8(f, version);
+
+	switch (version) {
+	case 1:
+		hashwrite_u8(f, oid_version());
+		hashwrite_u8(f, num_chunks);
+		hashwrite_u8(f, 0); /* unused padding byte */
+		header_size = 8;
+		break;
+	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
@@ -994,7 +1014,7 @@ int write_commit_graph(const char *obj_dir,
 		chunk_ids[3] = 0;
 	chunk_ids[4] = 0;
 
-	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[0] = header_size + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
 	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
 	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
diff --git a/commit-graph.h b/commit-graph.h
index 83fa548138..e03df54e33 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -62,6 +62,7 @@ int generation_numbers_enabled(struct repository *r);
 
 #define COMMIT_GRAPH_APPEND     (1 << 0)
 #define COMMIT_GRAPH_PROGRESS   (1 << 1)
+#define COMMIT_GRAPH_VERSION_1  (1 << 2)
 
 int write_commit_graph_reachable(const char *obj_dir, int flags);
 int write_commit_graph(const char *obj_dir,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 4/6] commit-graph: add --version=<n> option
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2019-01-23 21:59 ` [PATCH 3/6] commit-graph: create new version flags Derrick Stolee via GitGitGadget
@ 2019-01-23 21:59 ` Derrick Stolee via GitGitGadget
  2019-01-24  9:31   ` Ævar Arnfjörð Bjarmason
  2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-01-23 21:59 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Allo the commit-graph builtin to specify the file format version
using the '--version=<n>' option. Specify the version exactly in
the verification tests as using a different version would change
the offsets used in those tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  3 +++
 builtin/commit-graph.c             | 13 +++++++++++--
 t/t5318-commit-graph.sh            |  2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 624470e198..1d1cc70de4 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -51,6 +51,9 @@ or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
++
+With the `--version=<n>` option, specify the file format version. Used
+only for testing.
 
 'read'::
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0c92421f75..b1bed84260 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph verify [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=<n>]"),
 	NULL
 };
 
@@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=<n>]"),
 	NULL
 };
 
@@ -35,6 +35,7 @@ static struct opts_commit_graph {
 	int stdin_packs;
 	int stdin_commits;
 	int append;
+	int version;
 } opts;
 
 
@@ -141,6 +142,8 @@ static int graph_write(int argc, const char **argv)
 			N_("start walk at commits listed by stdin")),
 		OPT_BOOL(0, "append", &opts.append,
 			N_("include all commits already in the commit-graph file")),
+		OPT_INTEGER(0, "version", &opts.version,
+			N_("specify the file format version")),
 		OPT_END(),
 	};
 
@@ -155,6 +158,12 @@ static int graph_write(int argc, const char **argv)
 	if (opts.append)
 		flags |= COMMIT_GRAPH_APPEND;
 
+	switch (opts.version) {
+	case 1:
+		flags |= COMMIT_GRAPH_VERSION_1;
+		break;
+	}
+
 	read_replace_refs = 0;
 
 	if (opts.reachable)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index f4deb13b1d..b79d6263e9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -328,7 +328,7 @@ test_expect_success 'replace-objects invalidates commit-graph' '
 
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
-	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
+	git rev-parse commits/8 | git commit-graph write --stdin-commits --version=1 &&
 	git commit-graph verify >output
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 5/6] commit-graph: implement file format version 2
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2019-01-23 21:59 ` [PATCH 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
@ 2019-01-23 21:59 ` Derrick Stolee via GitGitGadget
  2019-01-23 23:56   ` Jonathan Tan
                     ` (2 more replies)
  2019-01-23 21:59 ` [PATCH 6/6] commit-graph: test verifying a corrupt v2 header Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-01-23 21:59 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph file format had some shortcomings which we now
correct:

  1. The hash algorithm was determined by a single byte, instead
     of the 4-byte format identifier.

  2. There was no way to update the reachability index we used.
     We currently only support generation numbers, but that will
     change in the future.

  3. Git did not fail with error if the unused eighth byte was
     non-zero, so we could not use that to indicate an incremental
     file format without breaking compatibility across versions.

The new format modifies the header of the commit-graph to solve
these problems. We use the 4-byte hash format id, freeing up a byte
in our 32-bit alignment to introduce a reachability index version.
We can also fail to read the commit-graph if the eighth byte is
non-zero.

The 'git commit-graph read' subcommand needs updating to show the
new data.

Set the default file format version to 2, and adjust the tests to
expect the new 'git commit-graph read' output.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .../technical/commit-graph-format.txt         | 26 +++++++++-
 builtin/commit-graph.c                        |  9 ++++
 commit-graph.c                                | 47 ++++++++++++++++---
 commit-graph.h                                |  1 +
 t/t5318-commit-graph.sh                       |  9 +++-
 5 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 16452a0504..e367aa94b1 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -31,13 +31,22 @@ and hash type.
 
 All 4-byte numbers are in network order.
 
+There are two versions available, 1 and 2. These currently differ only in
+the header.
+
 HEADER:
 
+All commit-graph files use the first five bytes for the same purpose.
+
   4-byte signature:
       The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-      Currently, the only valid version is 1.
+      Currently, the valid version numbers are 1 and 2.
+
+The remainder of the header changes depending on the version.
+
+Version 1:
 
   1-byte Hash Version (1 = SHA-1)
       We infer the hash length (H) from this value.
@@ -47,6 +56,21 @@ HEADER:
   1-byte (reserved for later use)
      Current clients should ignore this value.
 
+Version 2:
+
+  1-byte number (C) of "chunks"
+
+  1-byte reachability index version number:
+      Currently, the only valid number is 1.
+
+  1-byte (reserved for later use)
+      Current clients expect this value to be zero, and will not
+      try to read the commit-graph file if it is non-zero.
+
+  4-byte format identifier for the hash algorithm:
+      If this identifier does not agree with the repository's current
+      hash algorithm, then the client will not read the commit graph.
+
 CHUNK LOOKUP:
 
   (C + 1) * 12 bytes listing the table of contents for the chunks:
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index b1bed84260..28787d0c9c 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -102,6 +102,11 @@ static int graph_read(int argc, const char **argv)
 		*(unsigned char*)(graph->data + 5),
 		*(unsigned char*)(graph->data + 6),
 		*(unsigned char*)(graph->data + 7));
+
+	if (*(unsigned char *)(graph->data + 4) == 2)
+		printf("hash algorithm: %X\n",
+		       get_be32(graph->data + 8));
+
 	printf("num_commits: %u\n", graph->num_commits);
 	printf("chunks:");
 
@@ -162,6 +167,10 @@ static int graph_write(int argc, const char **argv)
 	case 1:
 		flags |= COMMIT_GRAPH_VERSION_1;
 		break;
+
+	case 2:
+		flags |= COMMIT_GRAPH_VERSION_2;
+		break;
 	}
 
 	read_replace_refs = 0;
diff --git a/commit-graph.c b/commit-graph.c
index f7f45893fd..aeb6cae656 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -90,7 +90,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	uint64_t last_chunk_offset;
 	uint32_t last_chunk_id;
 	uint32_t graph_signature;
-	unsigned char graph_version, hash_version;
+	unsigned char graph_version, hash_version, reach_index_version;
+	uint32_t hash_id;
 
 	if (fd < 0)
 		return NULL;
@@ -115,9 +116,9 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != 1) {
-		error(_("graph version %X does not match version %X"),
-		      graph_version, 1);
+	if (!graph_version || graph_version > 2) {
+		error(_("unsupported graph version %X"),
+		      graph_version);
 		goto cleanup_fail;
 	}
 
@@ -135,6 +136,30 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 		graph->num_chunks = *(unsigned char*)(data + 6);
 		chunk_lookup = data + 8;
 		break;
+
+	case 2:
+		graph->num_chunks = *(unsigned char *)(data + 5);
+
+		reach_index_version = *(unsigned char *)(data + 6);
+		if (reach_index_version != 1) {
+			error(_("unsupported reachability index version %d"),
+			      reach_index_version);
+			goto cleanup_fail;
+		}
+
+		if (*(unsigned char*)(data + 7)) {
+			error(_("unsupported value in commit-graph header"));
+			goto cleanup_fail;
+		}
+
+		hash_id = get_be32(data + 8);
+		if (hash_id != the_hash_algo->format_id) {
+			error(_("commit-graph hash algorithm does not match current algorithm"));
+			goto cleanup_fail;
+		}
+
+		chunk_lookup = data + 12;
+		break;
 	}
 
 	graph->hash_len = the_hash_algo->rawsz;
@@ -802,9 +827,11 @@ int write_commit_graph(const char *obj_dir,
 
 	if (flags & COMMIT_GRAPH_VERSION_1)
 		version = 1;
+	if (flags & COMMIT_GRAPH_VERSION_2)
+		version = 2;
 	if (!version)
-		version = 1;
-	if (version != 1) {
+		version = 2;
+	if (version <= 0 || version > 2) {
 		error(_("unsupported commit-graph version %d"),
 		      version);
 		return 1;
@@ -1003,6 +1030,14 @@ int write_commit_graph(const char *obj_dir,
 		hashwrite_u8(f, 0); /* unused padding byte */
 		header_size = 8;
 		break;
+
+	case 2:
+		hashwrite_u8(f, num_chunks);
+		hashwrite_u8(f, 1); /* reachability index version */
+		hashwrite_u8(f, 0); /* unused padding byte */
+		hashwrite_be32(f, the_hash_algo->format_id);
+		header_size = 12;
+		break;
 	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
diff --git a/commit-graph.h b/commit-graph.h
index e03df54e33..050137063b 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -63,6 +63,7 @@ int generation_numbers_enabled(struct repository *r);
 #define COMMIT_GRAPH_APPEND     (1 << 0)
 #define COMMIT_GRAPH_PROGRESS   (1 << 1)
 #define COMMIT_GRAPH_VERSION_1  (1 << 2)
+#define COMMIT_GRAPH_VERSION_2  (1 << 3)
 
 int write_commit_graph_reachable(const char *obj_dir, int flags);
 int write_commit_graph(const char *obj_dir,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b79d6263e9..3ff5e3b48d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -65,7 +65,8 @@ graph_read_expect() {
 		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
 	fi
 	cat >expect <<- EOF
-	header: 43475048 1 1 $NUM_CHUNKS 0
+	header: 43475048 2 $NUM_CHUNKS 1 0
+	hash algorithm: 73686131
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
 	EOF
@@ -390,10 +391,14 @@ test_expect_success 'detect bad signature' '
 '
 
 test_expect_success 'detect bad version' '
-	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\03" \
 		"graph version"
 '
 
+test_expect_success 'detect version 2 with version 1 data' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"reachability index version"
+'
 test_expect_success 'detect bad hash version' '
 	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
 		"hash version"
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 6/6] commit-graph: test verifying a corrupt v2 header
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
@ 2019-01-23 21:59 ` Derrick Stolee via GitGitGadget
  2019-01-23 23:59   ` Jonathan Tan
  2019-01-24 23:05 ` [PATCH 0/6] Create commit-graph file format v2 Junio C Hamano
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-01-23 21:59 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph file format v2 changes the v1 data only in the
header information. Add tests that check the 'verify' subcommand
catches corruption in the v2 header.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 3ff5e3b48d..be7bbf911a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -497,6 +497,37 @@ test_expect_success 'git fsck (checks commit-graph)' '
 	test_must_fail git fsck
 '
 
+test_expect_success 'rewrite commmit-graph with version 2' '
+	rm -f .git/objects/info/commit-graph &&
+	git commit-graph write --reachable --version=2 &&
+	git commit-graph verify
+'
+
+GRAPH_BYTE_CHUNK_COUNT=5
+GRAPH_BYTE_REACH_INDEX=6
+GRAPH_BYTE_UNUSED=7
+GRAPH_BYTE_HASH=8
+
+test_expect_success 'detect low chunk count (v2)' '
+	corrupt_graph_and_verify $GRAPH_CHUNK_COUNT "\02" \
+		"missing the .* chunk"
+'
+
+test_expect_success 'detect incorrect reachability index' '
+	corrupt_graph_and_verify $GRAPH_REACH_INDEX "\03" \
+		"reachability index version"
+'
+
+test_expect_success 'detect non-zero unused byte' '
+	corrupt_graph_and_verify $GRAPH_BYTE_UNUSED "\01" \
+		"unsupported value"
+'
+
+test_expect_success 'detect bad hash version (v2)' '
+	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\00" \
+		"hash algorithm"
+'
+
 test_expect_success 'setup non-the_repository tests' '
 	rm -rf repo &&
 	git init repo &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 5/6] commit-graph: implement file format version 2
  2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
@ 2019-01-23 23:56   ` Jonathan Tan
  2019-01-24  9:40   ` Ævar Arnfjörð Bjarmason
  2019-03-21  9:21   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 89+ messages in thread
From: Jonathan Tan @ 2019-01-23 23:56 UTC (permalink / raw)
  To: gitgitgadget; +Cc: git, sandals, avarab, gitster, dstolee, Jonathan Tan

> +Version 2:
> +
> +  1-byte number (C) of "chunks"
> +
> +  1-byte reachability index version number:
> +      Currently, the only valid number is 1.
> +
> +  1-byte (reserved for later use)
> +      Current clients expect this value to be zero, and will not
> +      try to read the commit-graph file if it is non-zero.
> +
> +  4-byte format identifier for the hash algorithm:
> +      If this identifier does not agree with the repository's current
> +      hash algorithm, then the client will not read the commit graph.

[snip]

> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index b79d6263e9..3ff5e3b48d 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -65,7 +65,8 @@ graph_read_expect() {
>  		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
>  	fi
>  	cat >expect <<- EOF
> -	header: 43475048 1 1 $NUM_CHUNKS 0
> +	header: 43475048 2 $NUM_CHUNKS 1 0
> +	hash algorithm: 73686131
>  	num_commits: $1
>  	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
>  	EOF
> @@ -390,10 +391,14 @@ test_expect_success 'detect bad signature' '
>  '
>  
>  test_expect_success 'detect bad version' '
> -	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
> +	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\03" \
>  		"graph version"
>  '
>  
> +test_expect_success 'detect version 2 with version 1 data' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
> +		"reachability index version"
> +'
>  test_expect_success 'detect bad hash version' '
>  	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
>  		"hash version"

Should there also be a test that the "reserved" section be 0 and the
4-byte identifier agrees with the repo's hash algorithm? I assume that
this can be done by "corrupting" the version to 2 and then truly
corrupting the subsequent bytes.

Other than that, this and the previous patches look good.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 6/6] commit-graph: test verifying a corrupt v2 header
  2019-01-23 21:59 ` [PATCH 6/6] commit-graph: test verifying a corrupt v2 header Derrick Stolee via GitGitGadget
@ 2019-01-23 23:59   ` Jonathan Tan
  0 siblings, 0 replies; 89+ messages in thread
From: Jonathan Tan @ 2019-01-23 23:59 UTC (permalink / raw)
  To: gitgitgadget; +Cc: git, sandals, avarab, gitster, dstolee, Jonathan Tan

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The commit-graph file format v2 changes the v1 data only in the
> header information. Add tests that check the 'verify' subcommand
> catches corruption in the v2 header.

Ah, I should have read this patch before I wrote [1]. I think the commit
message of that patch should contain a note that verification of the v2
file format is done in a subsequent patch.

With or without that additional note, this series looks good to me.

[1] https://public-inbox.org/git/20190123235630.183779-1-jonathantanmy@google.com/

> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t5318-commit-graph.sh | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 3ff5e3b48d..be7bbf911a 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -497,6 +497,37 @@ test_expect_success 'git fsck (checks commit-graph)' '
>  	test_must_fail git fsck
>  '
>  
> +test_expect_success 'rewrite commmit-graph with version 2' '
> +	rm -f .git/objects/info/commit-graph &&
> +	git commit-graph write --reachable --version=2 &&
> +	git commit-graph verify
> +'
> +
> +GRAPH_BYTE_CHUNK_COUNT=5
> +GRAPH_BYTE_REACH_INDEX=6
> +GRAPH_BYTE_UNUSED=7
> +GRAPH_BYTE_HASH=8
> +
> +test_expect_success 'detect low chunk count (v2)' '
> +	corrupt_graph_and_verify $GRAPH_CHUNK_COUNT "\02" \
> +		"missing the .* chunk"
> +'
> +
> +test_expect_success 'detect incorrect reachability index' '
> +	corrupt_graph_and_verify $GRAPH_REACH_INDEX "\03" \
> +		"reachability index version"
> +'
> +
> +test_expect_success 'detect non-zero unused byte' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_UNUSED "\01" \
> +		"unsupported value"
> +'
> +
> +test_expect_success 'detect bad hash version (v2)' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\00" \
> +		"hash algorithm"
> +'
> +
>  test_expect_success 'setup non-the_repository tests' '
>  	rm -rf repo &&
>  	git init repo &&
> -- 
> gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/6] commit-graph: add --version=<n> option
  2019-01-23 21:59 ` [PATCH 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
@ 2019-01-24  9:31   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-01-24  9:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee


On Wed, Jan 23 2019, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Allo the commit-graph builtin to specify the file format version

"Allow"

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 5/6] commit-graph: implement file format version 2
  2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
  2019-01-23 23:56   ` Jonathan Tan
@ 2019-01-24  9:40   ` Ævar Arnfjörð Bjarmason
  2019-01-24 14:34     ` Derrick Stolee
  2019-03-21  9:21   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-01-24  9:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee


On Wed, Jan 23 2019, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The commit-graph file format had some shortcomings which we now
> correct:
>
>   1. The hash algorithm was determined by a single byte, instead
>      of the 4-byte format identifier.
>
>   2. There was no way to update the reachability index we used.
>      We currently only support generation numbers, but that will
>      change in the future.
>
>   3. Git did not fail with error if the unused eighth byte was
>      non-zero, so we could not use that to indicate an incremental
>      file format without breaking compatibility across versions.
>
> The new format modifies the header of the commit-graph to solve
> these problems. We use the 4-byte hash format id, freeing up a byte
> in our 32-bit alignment to introduce a reachability index version.
> We can also fail to read the commit-graph if the eighth byte is
> non-zero.

I haven't tested, but it looks from the patch like we can transparently
read existing v1 data and then will write v2 the next time. Would be
helpful for reviewers if this was noted explicitly in the commit
message.

Should there be a GIT_TEST_COMMIT_GRAPH_VERSION=[12] going forward to
test the non-default version, or do you feel confident the tests added
here test the upgrade path & old code well enough?

> The 'git commit-graph read' subcommand needs updating to show the
> new data.

Let's say "The ... subcommand has been updated to show the new
data". This sounds like a later patch is going to do that, but in fact
it's done here.

> Set the default file format version to 2, and adjust the tests to
> expect the new 'git commit-graph read' output.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  .../technical/commit-graph-format.txt         | 26 +++++++++-
>  builtin/commit-graph.c                        |  9 ++++
>  commit-graph.c                                | 47 ++++++++++++++++---
>  commit-graph.h                                |  1 +
>  t/t5318-commit-graph.sh                       |  9 +++-
>  5 files changed, 83 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index 16452a0504..e367aa94b1 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -31,13 +31,22 @@ and hash type.
>
>  All 4-byte numbers are in network order.
>
> +There are two versions available, 1 and 2. These currently differ only in
> +the header.

Shouldn't this be s/currently/ / ? Won't we add a version 3 if we make
new changes?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 5/6] commit-graph: implement file format version 2
  2019-01-24  9:40   ` Ævar Arnfjörð Bjarmason
@ 2019-01-24 14:34     ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-01-24 14:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee

On 1/24/2019 4:40 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Jan 23 2019, Derrick Stolee via GitGitGadget wrote:
>
>> The new format modifies the header of the commit-graph to solve
>> these problems. We use the 4-byte hash format id, freeing up a byte
>> in our 32-bit alignment to introduce a reachability index version.
>> We can also fail to read the commit-graph if the eighth byte is
>> non-zero.
> I haven't tested, but it looks from the patch like we can transparently
> read existing v1 data and then will write v2 the next time. Would be
> helpful for reviewers if this was noted explicitly in the commit
> message.

Can do.

>
> Should there be a GIT_TEST_COMMIT_GRAPH_VERSION=[12] going forward to
> test the non-default version, or do you feel confident the tests added
> here test the upgrade path & old code well enough?

You're right that we should have an explicit "upgrade" test:

  1. Write a v1 commit-graph
  2. Add a commit
  3. Write a v2 commit-graph

As for a new GIT_TEST_ variable, we should only need to continue relying 
on GIT_TEST_COMMIT_GRAPH to test v2. I can add a 'graph_git_behavior' 
call on an explicitly v1 commit-graph file to get most of the coverage 
we need.

>> The 'git commit-graph read' subcommand needs updating to show the
>> new data.
> Let's say "The ... subcommand has been updated to show the new
> data". This sounds like a later patch is going to do that, but in fact
> it's done here.

Will clean up.

>> Set the default file format version to 2, and adjust the tests to
>> expect the new 'git commit-graph read' output.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   .../technical/commit-graph-format.txt         | 26 +++++++++-
>>   builtin/commit-graph.c                        |  9 ++++
>>   commit-graph.c                                | 47 ++++++++++++++++---
>>   commit-graph.h                                |  1 +
>>   t/t5318-commit-graph.sh                       |  9 +++-
>>   5 files changed, 83 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
>> index 16452a0504..e367aa94b1 100644
>> --- a/Documentation/technical/commit-graph-format.txt
>> +++ b/Documentation/technical/commit-graph-format.txt
>> @@ -31,13 +31,22 @@ and hash type.
>>
>>   All 4-byte numbers are in network order.
>>
>> +There are two versions available, 1 and 2. These currently differ only in
>> +the header.
> Shouldn't this be s/currently/ / ? Won't we add a version 3 if we make
> new changes?

When we add a new reachability index version, then the content of the 
data chunk will change. Since we have a separate byte for versioning 
that data, we do not need a v3 for the file format as a whole. A similar 
statement applies to the unused byte reserved for the incremental file 
format: we won't need to increase the file format version as we will 
make that number non-zero and add a chunk with extra data.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 0/6] Create commit-graph file format v2
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2019-01-23 21:59 ` [PATCH 6/6] commit-graph: test verifying a corrupt v2 header Derrick Stolee via GitGitGadget
@ 2019-01-24 23:05 ` Junio C Hamano
  2019-01-24 23:39 ` Junio C Hamano
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  8 siblings, 0 replies; 89+ messages in thread
From: Junio C Hamano @ 2019-01-24 23:05 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, sandals, avarab

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The commit-graph file format has some shortcomings that were discussed
> on-list:
> ...
> This series adds a new version (2) to the commit-graph file.

Sigh.  It is unfortunate that we have to bump the format version
this early before we can say "it is no longer experimental" with
confidence X-<.  Perhaps we have been moving too fast for our own
good and should slow down in introducing new low-level machineries?

Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 0/6] Create commit-graph file format v2
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2019-01-24 23:05 ` [PATCH 0/6] Create commit-graph file format v2 Junio C Hamano
@ 2019-01-24 23:39 ` Junio C Hamano
  2019-01-25 13:54   ` Derrick Stolee
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  8 siblings, 1 reply; 89+ messages in thread
From: Junio C Hamano @ 2019-01-24 23:39 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, sandals, avarab

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series is based on ab/commit-graph-write-progress and bc/sha-256.

Thanks.

It seems that the base (i.e. merge between these two topics) you
used may have used a version of either topic (most likely the
latter) slightly older than what I have, as patches 1 and 2 seem to
lack the local variable "hashsz" in the context, near the beginning
of commit_graph_write().  I wiggled the patches in, but it has too
heavy conflict merging to 'pu', so it may have to wait until the
other topics stabilize a bit further.


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 0/6] Create commit-graph file format v2
  2019-01-24 23:39 ` Junio C Hamano
@ 2019-01-25 13:54   ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-01-25 13:54 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, sandals, avarab

On 1/24/2019 6:39 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This series is based on ab/commit-graph-write-progress and bc/sha-256.
> 
> Thanks.
> 
> It seems that the base (i.e. merge between these two topics) you
> used may have used a version of either topic (most likely the
> latter) slightly older than what I have, as patches 1 and 2 seem to
> lack the local variable "hashsz" in the context, near the beginning
> of commit_graph_write().  I wiggled the patches in, but it has too
> heavy conflict merging to 'pu', so it may have to wait until the
> other topics stabilize a bit further.

Sorry that the merge was painful. I would have waited longer for
things to stabilize, but I'm expecting to go on paternity leave
soon. Didn't want to get the idea out there before I disappear
for a while.

When things stabilize, I may have time to do a rebase and work
out the details myself. Otherwise, everyone has my blessing to
take work I've started and move it forward themselves.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 5/6] commit-graph: implement file format version 2
  2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
  2019-01-23 23:56   ` Jonathan Tan
  2019-01-24  9:40   ` Ævar Arnfjörð Bjarmason
@ 2019-03-21  9:21   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-03-21  9:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee


On Wed, Jan 23 2019, Derrick Stolee via GitGitGadget wrote:

>  	graph_version = *(unsigned char*)(data + 4);
> -	if (graph_version != 1) {
> -		error(_("graph version %X does not match version %X"),
> -		      graph_version, 1);
> +	if (!graph_version || graph_version > 2) {
> +		error(_("unsupported graph version %X"),
> +		      graph_version);
>  		goto cleanup_fail;
>  	}

Just noticed this while writing
https://public-inbox.org/git/87va0cd1zp.fsf@evledraar.gmail.com/ i.e. to
resolve the conflict with my commit-graph segfault fixing series.

This really should be something like:

	/* earlier */
	#define GRAPH_MAX_VERSION 2

	if (!graph_version || graph_version > GRAPH_MAX_VERSION) {
		error(_("commit-graph unsupported graph version %X, we support up to %X"),
			graph_version, GRAPH_MAX_VERSION);

Also, I'm confused as to what these patches are based on. Your is doing
"!= 1", but on "master" and ever since 2a2e32bdc5 ("commit-graph:
implement git commit-graph read", 2018-04-10) this has been the macro
"GRAPH_VERSION" instead of "1".

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v2 0/5] Create commit-graph file format v2
  2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2019-01-24 23:39 ` Junio C Hamano
@ 2019-04-24 19:58 ` " Derrick Stolee via GitGitGadget
  2019-04-24 19:58   ` [PATCH v2 1/5] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
                     ` (6 more replies)
  8 siblings, 7 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-04-24 19:58 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano

The commit-graph file format has some shortcomings that were discussed
on-list:

 1. It doesn't use the 4-byte format ID from the_hash_algo.
    
    
 2. There is no way to change the reachability index from generation numbers
    to corrected commit date [1].
    
    
 3. The unused byte in the format could be used to signal the file is
    incremental, but current clients ignore the value even if it is
    non-zero.
    
    

This series adds a new version (2) to the commit-graph file. The fifth byte
already specified the file format, so existing clients will gracefully
respond to files with a different version number. The only real change now
is that the header takes 12 bytes instead of 8, due to using the 4-byte
format ID for the hash algorithm.

The new bytes reserved for the reachability index version and incremental
file formats are now expected to be equal to the defaults. When we update
these values to be flexible in the future, if a client understands
commit-graph v2 but not those new values, then it will fail gracefully.

NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
were significant and subtle.

Thanks, -Stolee

[1] 
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/

Derrick Stolee (5):
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: create new version flags
  commit-graph: add --version=<n> option
  commit-graph: implement file format version 2

 Documentation/git-commit-graph.txt            |   3 +
 .../technical/commit-graph-format.txt         |  26 ++-
 builtin/commit-graph.c                        |  43 +++--
 builtin/commit.c                              |   5 +-
 builtin/gc.c                                  |   7 +-
 commit-graph.c                                | 156 +++++++++++++-----
 commit-graph.h                                |  16 +-
 t/t5318-commit-graph.sh                       |  68 +++++++-
 8 files changed, 254 insertions(+), 70 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-112/derrickstolee/graph/v2-head-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/112

Range-diff vs v1:

 1:  e72498d0c5 ! 1:  91f300ec0a commit-graph: return with errors during write
     @@ -82,8 +82,8 @@
       --- a/builtin/gc.c
       +++ b/builtin/gc.c
      @@
     - 	if (pack_garbage.nr > 0)
       		clean_pack_garbage();
     + 	}
       
      -	if (gc_write_commit_graph)
      -		write_commit_graph_reachable(get_object_directory(), 0,
     @@ -182,9 +182,9 @@
       	}
       	stop_progress(&progress);
       
     --	if (count_distinct >= GRAPH_PARENT_MISSING)
     +-	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
      -		die(_("the commit graph format cannot write %d commits"), count_distinct);
     -+	if (count_distinct >= GRAPH_PARENT_MISSING) {
     ++	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
      +		error(_("the commit graph format cannot write %d commits"), count_distinct);
      +		res = 1;
      +		goto cleanup;
     @@ -196,9 +196,9 @@
       	num_chunks = num_extra_edges ? 4 : 3;
       	stop_progress(&progress);
       
     --	if (commits.nr >= GRAPH_PARENT_MISSING)
     +-	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
      -		die(_("too many commits to write graph"));
     -+	if (commits.nr >= GRAPH_PARENT_MISSING) {
     ++	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
      +		error(_("too many commits to write graph"));
      +		res = 1;
      +		goto cleanup;
 2:  43a40d0c43 ! 2:  04f5df1135 commit-graph: collapse parameters into flags
     @@ -66,7 +66,7 @@
       --- a/builtin/gc.c
       +++ b/builtin/gc.c
      @@
     - 		clean_pack_garbage();
     + 	}
       
       	if (gc_write_commit_graph &&
      -	    write_commit_graph_reachable(get_object_directory(), 0,
 3:  39319e36bc ! 3:  4ddb829163 commit-graph: create new version flags
     @@ -25,25 +25,25 @@
      -#define GRAPH_VERSION GRAPH_VERSION_1
      -
       #define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
     - #define GRAPH_PARENT_MISSING 0x7fffffff
       #define GRAPH_EDGE_LAST_MASK 0x7fffffff
     + #define GRAPH_PARENT_NONE 0x70000000
      @@
       	}
       
       	graph_version = *(unsigned char*)(data + 4);
      -	if (graph_version != GRAPH_VERSION) {
      +	if (graph_version != 1) {
     - 		error(_("graph version %X does not match version %X"),
     + 		error(_("commit-graph version %X does not match version %X"),
      -		      graph_version, GRAPH_VERSION);
     --		goto cleanup_fail;
     +-		return NULL;
      -	}
      -
      -	hash_version = *(unsigned char*)(data + 5);
      -	if (hash_version != oid_version()) {
     --		error(_("hash version %X does not match version %X"),
     +-		error(_("commit-graph hash version %X does not match version %X"),
      -		      hash_version, oid_version());
      +		      graph_version, 1);
     - 		goto cleanup_fail;
     + 		return NULL;
       	}
       
       	graph = alloc_commit_graph();
     @@ -52,9 +52,9 @@
      +	case 1:
      +		hash_version = *(unsigned char*)(data + 5);
      +		if (hash_version != oid_version()) {
     -+			error(_("hash version %X does not match version %X"),
     ++			error(_("commit-graph hash version %X does not match version %X"),
      +			      hash_version, oid_version());
     -+			goto cleanup_fail;
     ++			return NULL;
      +		}
      +
      +		graph->num_chunks = *(unsigned char*)(data + 6);
     @@ -72,8 +72,8 @@
       	last_chunk_offset = 8;
      -	chunk_lookup = data + 8;
       	for (i = 0; i < graph->num_chunks; i++) {
     - 		uint32_t chunk_id = get_be32(chunk_lookup + 0);
     - 		uint64_t chunk_offset = get_be64(chunk_lookup + 4);
     + 		uint32_t chunk_id;
     + 		uint64_t chunk_offset;
      @@
       	int res = 0;
       	int append = flags & COMMIT_GRAPH_APPEND;
 4:  e7ae3007f5 ! 4:  b1b0c76eb4 commit-graph: add --version=<n> option
     @@ -2,7 +2,7 @@
      
          commit-graph: add --version=<n> option
      
     -    Allo the commit-graph builtin to specify the file format version
     +    Allow the commit-graph builtin to specify the file format version
          using the '--version=<n>' option. Specify the version exactly in
          the verification tests as using a different version would change
          the offsets used in those tests.
 5:  c55e0a738c ! 5:  09362bda1b commit-graph: implement file format version 2
     @@ -22,12 +22,20 @@
          We can also fail to read the commit-graph if the eighth byte is
          non-zero.
      
     -    The 'git commit-graph read' subcommand needs updating to show the
     -    new data.
     +    Update the 'git commit-graph read' subcommand to display the new
     +    data.
      
          Set the default file format version to 2, and adjust the tests to
          expect the new 'git commit-graph read' output.
      
     +    Add explicit tests for the upgrade path from version 1 to 2. Users
     +    with an existing commit-graph with version 1 will seamlessly
     +    upgrade to version 2 on their next write.
     +
     +    While we converted the existing 'verify' tests to use a version 1
     +    file to avoid recalculating data offsets, add explicit 'verify'
     +    tests on a version 2 file that corrupt the new header values.
     +
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
     @@ -118,21 +126,17 @@
      +	unsigned char graph_version, hash_version, reach_index_version;
      +	uint32_t hash_id;
       
     - 	if (fd < 0)
     + 	if (!graph_map)
       		return NULL;
      @@
       	}
       
       	graph_version = *(unsigned char*)(data + 4);
      -	if (graph_version != 1) {
     --		error(_("graph version %X does not match version %X"),
     --		      graph_version, 1);
      +	if (!graph_version || graph_version > 2) {
     -+		error(_("unsupported graph version %X"),
     -+		      graph_version);
     - 		goto cleanup_fail;
     - 	}
     - 
     + 		error(_("commit-graph version %X does not match version %X"),
     + 		      graph_version, 1);
     + 		return NULL;
      @@
       		graph->num_chunks = *(unsigned char*)(data + 6);
       		chunk_lookup = data + 8;
     @@ -145,18 +149,18 @@
      +		if (reach_index_version != 1) {
      +			error(_("unsupported reachability index version %d"),
      +			      reach_index_version);
     -+			goto cleanup_fail;
     ++			return NULL;
      +		}
      +
      +		if (*(unsigned char*)(data + 7)) {
      +			error(_("unsupported value in commit-graph header"));
     -+			goto cleanup_fail;
     ++			return NULL;
      +		}
      +
      +		hash_id = get_be32(data + 8);
      +		if (hash_id != the_hash_algo->format_id) {
      +			error(_("commit-graph hash algorithm does not match current algorithm"));
     -+			goto cleanup_fail;
     ++			return NULL;
      +		}
      +
      +		chunk_lookup = data + 12;
     @@ -209,6 +213,31 @@
       diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
       --- a/t/t5318-commit-graph.sh
       +++ b/t/t5318-commit-graph.sh
     +@@
     + 	git repack
     + '
     + 
     +-graph_git_two_modes() {
     ++graph_git_two_modes () {
     + 	git -c core.commitGraph=true $1 >output
     + 	git -c core.commitGraph=false $1 >expect
     + 	test_cmp expect output
     + }
     + 
     +-graph_git_behavior() {
     ++graph_git_behavior () {
     + 	MSG=$1
     + 	DIR=$2
     + 	BRANCH=$3
     +@@
     + 
     + graph_git_behavior 'no graph' full commits/3 commits/1
     + 
     +-graph_read_expect() {
     ++graph_read_expect () {
     + 	OPTIONAL=""
     + 	NUM_CHUNKS=3
     + 	if test ! -z $2
      @@
       		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
       	fi
     @@ -219,6 +248,40 @@
       	num_commits: $1
       	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
       	EOF
     +@@
     + 	)
     + '
     + 
     ++test_expect_success 'write v1 graph' '
     ++	git commit-graph write --reachable --version=1 &&
     ++	git commit-graph verify
     ++'
     ++
     ++graph_git_behavior 'version 1 graph, commit 8 vs merge 2' full commits/8 merge/2
     ++graph_git_behavior 'version 1 graph, commit 8 vs merge 2' full commits/8 merge/2
     ++
     ++test_expect_success 'upgrade from v1 to v2' '
     ++	git checkout -b new-commit-for-upgrade &&
     ++	test_commit force-upgrade &&
     ++	git commit-graph write --reachable --version=2 &&
     ++	git commit-graph verify
     ++'
     ++
     ++graph_git_behavior 'upgraded graph, commit 8 vs merge 2' full commits/8 merge/2
     ++graph_git_behavior 'upgraded graph, commit 8 vs merge 2' full commits/8 merge/2
     ++
     + # the verify tests below expect the commit-graph to contain
     + # exactly the commits reachable from the commits/8 branch.
     + # If the file changes the set of commits in the list, then the
     +@@
     + # starting at <zero_pos>, then runs 'git commit-graph verify'
     + # and places the output in the file 'err'. Test 'err' for
     + # the given string.
     +-corrupt_graph_and_verify() {
     ++corrupt_graph_and_verify () {
     + 	pos=$1
     + 	data="${2:-\0}"
     + 	grepstr=$3
      @@
       '
       
     @@ -235,3 +298,41 @@
       test_expect_success 'detect bad hash version' '
       	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
       		"hash version"
     +@@
     + 	test_must_fail git fsck
     + '
     + 
     ++test_expect_success 'rewrite commmit-graph with version 2' '
     ++	rm -f .git/objects/info/commit-graph &&
     ++	git commit-graph write --reachable --version=2 &&
     ++	git commit-graph verify
     ++'
     ++
     ++GRAPH_BYTE_CHUNK_COUNT=5
     ++GRAPH_BYTE_REACH_INDEX=6
     ++GRAPH_BYTE_UNUSED=7
     ++GRAPH_BYTE_HASH=8
     ++
     ++test_expect_success 'detect low chunk count (v2)' '
     ++	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
     ++		"missing the .* chunk"
     ++'
     ++
     ++test_expect_success 'detect incorrect reachability index' '
     ++	corrupt_graph_and_verify $GRAPH_BYTE_REACH_INDEX "\03" \
     ++		"reachability index version"
     ++'
     ++
     ++test_expect_success 'detect non-zero unused byte' '
     ++	corrupt_graph_and_verify $GRAPH_BYTE_UNUSED "\01" \
     ++		"unsupported value"
     ++'
     ++
     ++test_expect_success 'detect bad hash version (v2)' '
     ++	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\00" \
     ++		"hash algorithm"
     ++'
     ++
     + test_expect_success 'setup non-the_repository tests' '
     + 	rm -rf repo &&
     + 	git init repo &&
 6:  693900b4c5 < -:  ---------- commit-graph: test verifying a corrupt v2 header

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v2 1/5] commit-graph: return with errors during write
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
@ 2019-04-24 19:58   ` Derrick Stolee via GitGitGadget
  2019-04-24 19:58   ` [PATCH v2 2/5] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-04-24 19:58 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 19 +++++++------
 builtin/commit.c       |  5 ++--
 builtin/gc.c           |  7 ++---
 commit-graph.c         | 60 +++++++++++++++++++++++++++++-------------
 commit-graph.h         | 10 +++----
 5 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 537fdfd0f0..2e86251f02 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *pack_indexes = NULL;
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
+	int result;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
 
 	read_replace_refs = 0;
 
-	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-		return 0;
-	}
+	if (opts.reachable)
+		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
 		UNLEAK(buf);
 	}
 
-	write_commit_graph(opts.obj_dir,
-			   pack_indexes,
-			   commit_hex,
-			   opts.append,
-			   1);
+	result = write_commit_graph(opts.obj_dir,
+				    pack_indexes,
+				    commit_hex,
+				    opts.append,
+				    1);
 
 	UNLEAK(lines);
-	return 0;
+	return result;
 }
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
diff --git a/builtin/commit.c b/builtin/commit.c
index 2986553d5f..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "new_index file. Check that disk is not full and quota is\n"
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-		write_commit_graph_reachable(get_object_directory(), 0, 0);
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+		return 1;
 
 	repo_rerere(the_repository, 0);
 	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..3984addf73 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 	}
 
-	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0,
-					     !quiet && !daemonized);
+	if (gc_write_commit_graph &&
+	    write_commit_graph_reachable(get_object_directory(), 0,
+					 !quiet && !daemonized))
+		return 1;
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 66865acbd7..ee487a364b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int append,
+				 int report_progress)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
+	int result;
 
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+	result = write_commit_graph(obj_dir, NULL, &list,
+				    append, report_progress);
 
 	string_list_clear(&list, 0);
+	return result;
 }
 
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress)
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
 	struct hashfile *f;
 	uint32_t i, count_distinct = 0;
-	char *graph_name;
+	char *graph_name = NULL;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
@@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
+	int res = 0;
 
 	if (!commit_graph_compatible(the_repository))
-		return;
+		return 0;
 
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
+	commits.list = NULL;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
 			strbuf_setlen(&packname, dirlen);
 			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
-			if (!p)
-				die(_("error adding pack %s"), packname.buf);
-			if (open_pack_index(p))
-				die(_("error opening index for %s"), packname.buf);
+			if (!p) {
+				error(_("error adding pack %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
+			if (open_pack_index(p)) {
+				error(_("error opening index for %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
 			for_each_object_in_pack(p, add_packed_commits, &oids,
 						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
@@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
 	}
 	stop_progress(&progress);
 
-	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
-		die(_("the commit graph format cannot write %d commits"), count_distinct);
+	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
+		error(_("the commit graph format cannot write %d commits"), count_distinct);
+		res = 1;
+		goto cleanup;
+	}
 
 	commits.nr = 0;
 	commits.alloc = count_distinct;
@@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
 	num_chunks = num_extra_edges ? 4 : 3;
 	stop_progress(&progress);
 
-	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
-		die(_("too many commits to write graph"));
+	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
+		error(_("too many commits to write graph"));
+		res = 1;
+		goto cleanup;
+	}
 
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name)) {
 		UNLEAK(graph_name);
-		die_errno(_("unable to create leading directories of %s"),
-			  graph_name);
+		error(_("unable to create leading directories of %s"),
+			graph_name);
+		res = errno;
+		goto cleanup;
 	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
@@ -1107,9 +1126,12 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+cleanup:
 	free(graph_name);
 	free(commits.list);
 	free(oids.list);
+
+	return res;
 }
 
 #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
diff --git a/commit-graph.h b/commit-graph.h
index 7dfb8c896f..d15670bf46 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,12 +65,12 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
+int write_commit_graph_reachable(const char *obj_dir, int append,
 				  int report_progress);
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress);
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v2 2/5] commit-graph: collapse parameters into flags
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  2019-04-24 19:58   ` [PATCH v2 1/5] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-04-24 19:58   ` Derrick Stolee via GitGitGadget
  2019-04-25  5:21     ` Junio C Hamano
  2019-04-24 19:58   ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-04-24 19:58 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
We will soon expand the possible options to send to these methods, so
instead of complicating the parameter list, first simplify it.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 8 +++++---
 builtin/commit.c       | 2 +-
 builtin/gc.c           | 4 ++--
 commit-graph.c         | 9 +++++----
 commit-graph.h         | 8 +++++---
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 2e86251f02..828b1a713f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
 	int result;
+	int flags = COMMIT_GRAPH_PROGRESS;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv)
 		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
+	if (opts.append)
+		flags |= COMMIT_GRAPH_APPEND;
 
 	read_replace_refs = 0;
 
 	if (opts.reachable)
-		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
+		return write_commit_graph_reachable(opts.obj_dir, flags);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv)
 	result = write_commit_graph(opts.obj_dir,
 				    pack_indexes,
 				    commit_hex,
-				    opts.append,
-				    1);
+				    flags);
 
 	UNLEAK(lines);
 	return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index b9ea7222fa..b001ef565d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+	    write_commit_graph_reachable(get_object_directory(), 0))
 		return 1;
 
 	repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index 3984addf73..df2573f124 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	}
 
 	if (gc_write_commit_graph &&
-	    write_commit_graph_reachable(get_object_directory(), 0,
-					 !quiet && !daemonized))
+	    write_commit_graph_reachable(get_object_directory(),
+					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
 		return 1;
 
 	if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index ee487a364b..b16c71fd82 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				 int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int flags)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
 	for_each_ref(add_ref_to_list, &list);
 	result = write_commit_graph(obj_dir, NULL, &list,
-				    append, report_progress);
+				    flags);
 
 	string_list_clear(&list, 0);
 	return result;
@@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int append,
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress)
+		       int flags)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir,
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
 	int res = 0;
+	int append = flags & COMMIT_GRAPH_APPEND;
+	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
diff --git a/commit-graph.h b/commit-graph.h
index d15670bf46..390474047c 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,12 +65,14 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress);
+#define COMMIT_GRAPH_APPEND     (1 << 0)
+#define COMMIT_GRAPH_PROGRESS   (1 << 1)
+
+int write_commit_graph_reachable(const char *obj_dir, int flags);
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress);
+		       int flags);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v2 3/5] commit-graph: create new version flags
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  2019-04-24 19:58   ` [PATCH v2 1/5] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
  2019-04-24 19:58   ` [PATCH v2 2/5] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
@ 2019-04-24 19:58   ` Derrick Stolee via GitGitGadget
  2019-04-25  5:29     ` Junio C Hamano
  2019-04-25 21:31     ` Ævar Arnfjörð Bjarmason
  2019-04-24 19:58   ` [PATCH v2 4/5] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-04-24 19:58 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of a new commit-graph file format version, create
a flag for the write_commit_graph() and write_commit_graph_reachable()
methods to take a version number.

When there is no specified version, the implementation selects a
default value. Currently, the only valid value is 1.

The file format will change the header information, so place the
existing header logic inside a switch statement with only one case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 58 +++++++++++++++++++++++++++++++++-----------------
 commit-graph.h |  1 +
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b16c71fd82..e75e1655fb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -25,9 +25,6 @@
 
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
-#define GRAPH_VERSION_1 0x1
-#define GRAPH_VERSION GRAPH_VERSION_1
-
 #define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
 #define GRAPH_PARENT_NONE 0x70000000
@@ -173,30 +170,35 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != GRAPH_VERSION) {
+	if (graph_version != 1) {
 		error(_("commit-graph version %X does not match version %X"),
-		      graph_version, GRAPH_VERSION);
-		return NULL;
-	}
-
-	hash_version = *(unsigned char*)(data + 5);
-	if (hash_version != oid_version()) {
-		error(_("commit-graph hash version %X does not match version %X"),
-		      hash_version, oid_version());
+		      graph_version, 1);
 		return NULL;
 	}
 
 	graph = alloc_commit_graph();
 
+	switch (graph_version) {
+	case 1:
+		hash_version = *(unsigned char*)(data + 5);
+		if (hash_version != oid_version()) {
+			error(_("commit-graph hash version %X does not match version %X"),
+			      hash_version, oid_version());
+			return NULL;
+		}
+
+		graph->num_chunks = *(unsigned char*)(data + 6);
+		chunk_lookup = data + 8;
+		break;
+	}
+
 	graph->hash_len = the_hash_algo->rawsz;
-	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->graph_fd = fd;
 	graph->data = graph_map;
 	graph->data_len = graph_size;
 
 	last_chunk_id = 0;
 	last_chunk_offset = 8;
-	chunk_lookup = data + 8;
 	for (i = 0; i < graph->num_chunks; i++) {
 		uint32_t chunk_id;
 		uint64_t chunk_offset;
@@ -888,10 +890,22 @@ int write_commit_graph(const char *obj_dir,
 	int res = 0;
 	int append = flags & COMMIT_GRAPH_APPEND;
 	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
+	int version = 0;
+	int header_size = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
 
+	if (flags & COMMIT_GRAPH_VERSION_1)
+		version = 1;
+	if (!version)
+		version = 1;
+	if (version != 1) {
+		error(_("unsupported commit-graph version %d"),
+		      version);
+		return 1;
+	}
+
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
@@ -1076,10 +1090,16 @@ int write_commit_graph(const char *obj_dir,
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
-	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, oid_version());
-	hashwrite_u8(f, num_chunks);
-	hashwrite_u8(f, 0); /* unused padding byte */
+	hashwrite_u8(f, version);
+
+	switch (version) {
+	case 1:
+		hashwrite_u8(f, oid_version());
+		hashwrite_u8(f, num_chunks);
+		hashwrite_u8(f, 0); /* unused padding byte */
+		header_size = 8;
+		break;
+	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
@@ -1090,7 +1110,7 @@ int write_commit_graph(const char *obj_dir,
 		chunk_ids[3] = 0;
 	chunk_ids[4] = 0;
 
-	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[0] = header_size + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
 	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
 	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
diff --git a/commit-graph.h b/commit-graph.h
index 390474047c..d7cd13deb3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -67,6 +67,7 @@ int generation_numbers_enabled(struct repository *r);
 
 #define COMMIT_GRAPH_APPEND     (1 << 0)
 #define COMMIT_GRAPH_PROGRESS   (1 << 1)
+#define COMMIT_GRAPH_VERSION_1  (1 << 2)
 
 int write_commit_graph_reachable(const char *obj_dir, int flags);
 int write_commit_graph(const char *obj_dir,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v2 4/5] commit-graph: add --version=<n> option
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2019-04-24 19:58   ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget
@ 2019-04-24 19:58   ` Derrick Stolee via GitGitGadget
  2019-04-24 19:58   ` [PATCH v2 5/5] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-04-24 19:58 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Allow the commit-graph builtin to specify the file format version
using the '--version=<n>' option. Specify the version exactly in
the verification tests as using a different version would change
the offsets used in those tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  3 +++
 builtin/commit-graph.c             | 13 +++++++++++--
 t/t5318-commit-graph.sh            |  2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 624470e198..1d1cc70de4 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -51,6 +51,9 @@ or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
++
+With the `--version=<n>` option, specify the file format version. Used
+only for testing.
 
 'read'::
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 828b1a713f..65ceb7a141 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph verify [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=<n>]"),
 	NULL
 };
 
@@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=<n>]"),
 	NULL
 };
 
@@ -35,6 +35,7 @@ static struct opts_commit_graph {
 	int stdin_packs;
 	int stdin_commits;
 	int append;
+	int version;
 } opts;
 
 
@@ -156,6 +157,8 @@ static int graph_write(int argc, const char **argv)
 			N_("start walk at commits listed by stdin")),
 		OPT_BOOL(0, "append", &opts.append,
 			N_("include all commits already in the commit-graph file")),
+		OPT_INTEGER(0, "version", &opts.version,
+			N_("specify the file format version")),
 		OPT_END(),
 	};
 
@@ -170,6 +173,12 @@ static int graph_write(int argc, const char **argv)
 	if (opts.append)
 		flags |= COMMIT_GRAPH_APPEND;
 
+	switch (opts.version) {
+	case 1:
+		flags |= COMMIT_GRAPH_VERSION_1;
+		break;
+	}
+
 	read_replace_refs = 0;
 
 	if (opts.reachable)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e80c1cac02..4eb5a09ef3 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -328,7 +328,7 @@ test_expect_success 'replace-objects invalidates commit-graph' '
 
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
-	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
+	git rev-parse commits/8 | git commit-graph write --stdin-commits --version=1 &&
 	git commit-graph verify >output
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v2 5/5] commit-graph: implement file format version 2
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2019-04-24 19:58   ` [PATCH v2 4/5] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
@ 2019-04-24 19:58   ` Derrick Stolee via GitGitGadget
  2019-04-25 22:09   ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
  6 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-04-24 19:58 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph file format had some shortcomings which we now
correct:

  1. The hash algorithm was determined by a single byte, instead
     of the 4-byte format identifier.

  2. There was no way to update the reachability index we used.
     We currently only support generation numbers, but that will
     change in the future.

  3. Git did not fail with error if the unused eighth byte was
     non-zero, so we could not use that to indicate an incremental
     file format without breaking compatibility across versions.

The new format modifies the header of the commit-graph to solve
these problems. We use the 4-byte hash format id, freeing up a byte
in our 32-bit alignment to introduce a reachability index version.
We can also fail to read the commit-graph if the eighth byte is
non-zero.

Update the 'git commit-graph read' subcommand to display the new
data.

Set the default file format version to 2, and adjust the tests to
expect the new 'git commit-graph read' output.

Add explicit tests for the upgrade path from version 1 to 2. Users
with an existing commit-graph with version 1 will seamlessly
upgrade to version 2 on their next write.

While we converted the existing 'verify' tests to use a version 1
file to avoid recalculating data offsets, add explicit 'verify'
tests on a version 2 file that corrupt the new header values.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .../technical/commit-graph-format.txt         | 26 +++++++-
 builtin/commit-graph.c                        |  9 +++
 commit-graph.c                                | 43 ++++++++++--
 commit-graph.h                                |  1 +
 t/t5318-commit-graph.sh                       | 66 +++++++++++++++++--
 5 files changed, 134 insertions(+), 11 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 16452a0504..e367aa94b1 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -31,13 +31,22 @@ and hash type.
 
 All 4-byte numbers are in network order.
 
+There are two versions available, 1 and 2. These currently differ only in
+the header.
+
 HEADER:
 
+All commit-graph files use the first five bytes for the same purpose.
+
   4-byte signature:
       The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-      Currently, the only valid version is 1.
+      Currently, the valid version numbers are 1 and 2.
+
+The remainder of the header changes depending on the version.
+
+Version 1:
 
   1-byte Hash Version (1 = SHA-1)
       We infer the hash length (H) from this value.
@@ -47,6 +56,21 @@ HEADER:
   1-byte (reserved for later use)
      Current clients should ignore this value.
 
+Version 2:
+
+  1-byte number (C) of "chunks"
+
+  1-byte reachability index version number:
+      Currently, the only valid number is 1.
+
+  1-byte (reserved for later use)
+      Current clients expect this value to be zero, and will not
+      try to read the commit-graph file if it is non-zero.
+
+  4-byte format identifier for the hash algorithm:
+      If this identifier does not agree with the repository's current
+      hash algorithm, then the client will not read the commit graph.
+
 CHUNK LOOKUP:
 
   (C + 1) * 12 bytes listing the table of contents for the chunks:
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 65ceb7a141..1485b4daaf 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -117,6 +117,11 @@ static int graph_read(int argc, const char **argv)
 		*(unsigned char*)(graph->data + 5),
 		*(unsigned char*)(graph->data + 6),
 		*(unsigned char*)(graph->data + 7));
+
+	if (*(unsigned char *)(graph->data + 4) == 2)
+		printf("hash algorithm: %X\n",
+		       get_be32(graph->data + 8));
+
 	printf("num_commits: %u\n", graph->num_commits);
 	printf("chunks:");
 
@@ -177,6 +182,10 @@ static int graph_write(int argc, const char **argv)
 	case 1:
 		flags |= COMMIT_GRAPH_VERSION_1;
 		break;
+
+	case 2:
+		flags |= COMMIT_GRAPH_VERSION_2;
+		break;
 	}
 
 	read_replace_refs = 0;
diff --git a/commit-graph.c b/commit-graph.c
index e75e1655fb..14d6aebd99 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -152,7 +152,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	uint64_t last_chunk_offset;
 	uint32_t last_chunk_id;
 	uint32_t graph_signature;
-	unsigned char graph_version, hash_version;
+	unsigned char graph_version, hash_version, reach_index_version;
+	uint32_t hash_id;
 
 	if (!graph_map)
 		return NULL;
@@ -170,7 +171,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != 1) {
+	if (!graph_version || graph_version > 2) {
 		error(_("commit-graph version %X does not match version %X"),
 		      graph_version, 1);
 		return NULL;
@@ -190,6 +191,30 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 		graph->num_chunks = *(unsigned char*)(data + 6);
 		chunk_lookup = data + 8;
 		break;
+
+	case 2:
+		graph->num_chunks = *(unsigned char *)(data + 5);
+
+		reach_index_version = *(unsigned char *)(data + 6);
+		if (reach_index_version != 1) {
+			error(_("unsupported reachability index version %d"),
+			      reach_index_version);
+			return NULL;
+		}
+
+		if (*(unsigned char*)(data + 7)) {
+			error(_("unsupported value in commit-graph header"));
+			return NULL;
+		}
+
+		hash_id = get_be32(data + 8);
+		if (hash_id != the_hash_algo->format_id) {
+			error(_("commit-graph hash algorithm does not match current algorithm"));
+			return NULL;
+		}
+
+		chunk_lookup = data + 12;
+		break;
 	}
 
 	graph->hash_len = the_hash_algo->rawsz;
@@ -898,9 +923,11 @@ int write_commit_graph(const char *obj_dir,
 
 	if (flags & COMMIT_GRAPH_VERSION_1)
 		version = 1;
+	if (flags & COMMIT_GRAPH_VERSION_2)
+		version = 2;
 	if (!version)
-		version = 1;
-	if (version != 1) {
+		version = 2;
+	if (version <= 0 || version > 2) {
 		error(_("unsupported commit-graph version %d"),
 		      version);
 		return 1;
@@ -1099,6 +1126,14 @@ int write_commit_graph(const char *obj_dir,
 		hashwrite_u8(f, 0); /* unused padding byte */
 		header_size = 8;
 		break;
+
+	case 2:
+		hashwrite_u8(f, num_chunks);
+		hashwrite_u8(f, 1); /* reachability index version */
+		hashwrite_u8(f, 0); /* unused padding byte */
+		hashwrite_be32(f, the_hash_algo->format_id);
+		header_size = 12;
+		break;
 	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
diff --git a/commit-graph.h b/commit-graph.h
index d7cd13deb3..2c461770e8 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -68,6 +68,7 @@ int generation_numbers_enabled(struct repository *r);
 #define COMMIT_GRAPH_APPEND     (1 << 0)
 #define COMMIT_GRAPH_PROGRESS   (1 << 1)
 #define COMMIT_GRAPH_VERSION_1  (1 << 2)
+#define COMMIT_GRAPH_VERSION_2  (1 << 3)
 
 int write_commit_graph_reachable(const char *obj_dir, int flags);
 int write_commit_graph(const char *obj_dir,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4eb5a09ef3..0c766e7cdb 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -33,13 +33,13 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
-graph_git_two_modes() {
+graph_git_two_modes () {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
 	test_cmp expect output
 }
 
-graph_git_behavior() {
+graph_git_behavior () {
 	MSG=$1
 	DIR=$2
 	BRANCH=$3
@@ -56,7 +56,7 @@ graph_git_behavior() {
 
 graph_git_behavior 'no graph' full commits/3 commits/1
 
-graph_read_expect() {
+graph_read_expect () {
 	OPTIONAL=""
 	NUM_CHUNKS=3
 	if test ! -z $2
@@ -65,7 +65,8 @@ graph_read_expect() {
 		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
 	fi
 	cat >expect <<- EOF
-	header: 43475048 1 1 $NUM_CHUNKS 0
+	header: 43475048 2 $NUM_CHUNKS 1 0
+	hash algorithm: 73686131
 	num_commits: $1
 	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
 	EOF
@@ -320,6 +321,24 @@ test_expect_success 'replace-objects invalidates commit-graph' '
 	)
 '
 
+test_expect_success 'write v1 graph' '
+	git commit-graph write --reachable --version=1 &&
+	git commit-graph verify
+'
+
+graph_git_behavior 'version 1 graph, commit 8 vs merge 2' full commits/8 merge/2
+graph_git_behavior 'version 1 graph, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'upgrade from v1 to v2' '
+	git checkout -b new-commit-for-upgrade &&
+	test_commit force-upgrade &&
+	git commit-graph write --reachable --version=2 &&
+	git commit-graph verify
+'
+
+graph_git_behavior 'upgraded graph, commit 8 vs merge 2' full commits/8 merge/2
+graph_git_behavior 'upgraded graph, commit 8 vs merge 2' full commits/8 merge/2
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
@@ -392,7 +411,7 @@ corrupt_graph_verify() {
 # starting at <zero_pos>, then runs 'git commit-graph verify'
 # and places the output in the file 'err'. Test 'err' for
 # the given string.
-corrupt_graph_and_verify() {
+corrupt_graph_and_verify () {
 	pos=$1
 	data="${2:-\0}"
 	grepstr=$3
@@ -424,10 +443,14 @@ test_expect_success 'detect bad signature' '
 '
 
 test_expect_success 'detect bad version' '
-	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\03" \
 		"graph version"
 '
 
+test_expect_success 'detect version 2 with version 1 data' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"reachability index version"
+'
 test_expect_success 'detect bad hash version' '
 	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
 		"hash version"
@@ -532,6 +555,37 @@ test_expect_success 'git fsck (checks commit-graph)' '
 	test_must_fail git fsck
 '
 
+test_expect_success 'rewrite commmit-graph with version 2' '
+	rm -f .git/objects/info/commit-graph &&
+	git commit-graph write --reachable --version=2 &&
+	git commit-graph verify
+'
+
+GRAPH_BYTE_CHUNK_COUNT=5
+GRAPH_BYTE_REACH_INDEX=6
+GRAPH_BYTE_UNUSED=7
+GRAPH_BYTE_HASH=8
+
+test_expect_success 'detect low chunk count (v2)' '
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+		"missing the .* chunk"
+'
+
+test_expect_success 'detect incorrect reachability index' '
+	corrupt_graph_and_verify $GRAPH_BYTE_REACH_INDEX "\03" \
+		"reachability index version"
+'
+
+test_expect_success 'detect non-zero unused byte' '
+	corrupt_graph_and_verify $GRAPH_BYTE_UNUSED "\01" \
+		"unsupported value"
+'
+
+test_expect_success 'detect bad hash version (v2)' '
+	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\00" \
+		"hash algorithm"
+'
+
 test_expect_success 'setup non-the_repository tests' '
 	rm -rf repo &&
 	git init repo &&
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 2/5] commit-graph: collapse parameters into flags
  2019-04-24 19:58   ` [PATCH v2 2/5] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
@ 2019-04-25  5:21     ` Junio C Hamano
  0 siblings, 0 replies; 89+ messages in thread
From: Junio C Hamano @ 2019-04-25  5:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, sandals, avarab, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() and write_commit_graph_reachable() methods
> currently take two boolean parameters: 'append' and 'report_progress'.
> We will soon expand the possible options to send to these methods, so
> instead of complicating the parameter list, first simplify it.
>
> Collapse these parameters into a 'flags' parameter, and adjust the
> callers to provide flags as necessary.

Nice.  It would make more sense for a collection of independent bits
to be in an unsigned, not signed integer variable, though.  Unless
you assign some special meaning to the top-most bit, that is.


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 3/5] commit-graph: create new version flags
  2019-04-24 19:58   ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget
@ 2019-04-25  5:29     ` Junio C Hamano
  2019-04-25 11:09       ` Derrick Stolee
  2019-04-25 21:31     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 89+ messages in thread
From: Junio C Hamano @ 2019-04-25  5:29 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, sandals, avarab, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	int version = 0;
> ...
> +	if (flags & COMMIT_GRAPH_VERSION_1)
> +		version = 1;
> +	if (!version)
> +		version = 1;
> +	if (version != 1) {
> +		error(_("unsupported commit-graph version %d"),
> +		      version);
> +		return 1;
> +	}

The above sequence had a certain "Huh?" factor before 5/5 introduced
the support for a later version that is in use by default.

Is it sensible to define VERSION_$N as if they are independent bits
in a single flags variable?  What does it mean for the flags variable
to have both GRAPH_VERSION_1 and GRAPH_VERSION_2 bits set?

What I am getting at is if this is better done as a n-bit bitfield
that represents a small unsigned integer (e.g. "unsigned char" that
lets you play with up to 255 versions, or "unsigned version : 3"
that limits you to up to 7 versions).

You use an 8-bit byte in the file format anyway, so it might not be
so bad to have a separate version parameter that is not mixed with
the flag bits, perhaps?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 3/5] commit-graph: create new version flags
  2019-04-25  5:29     ` Junio C Hamano
@ 2019-04-25 11:09       ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-04-25 11:09 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, Derrick Stolee

On 4/25/2019 1:29 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	int version = 0;
>> ...
>> +	if (flags & COMMIT_GRAPH_VERSION_1)
>> +		version = 1;
>> +	if (!version)
>> +		version = 1;
>> +	if (version != 1) {
>> +		error(_("unsupported commit-graph version %d"),
>> +		      version);
>> +		return 1;
>> +	}
> 
> The above sequence had a certain "Huh?" factor before 5/5 introduced
> the support for a later version that is in use by default.
> 
> Is it sensible to define VERSION_$N as if they are independent bits
> in a single flags variable?  What does it mean for the flags variable
> to have both GRAPH_VERSION_1 and GRAPH_VERSION_2 bits set?
>
> What I am getting at is if this is better done as a n-bit bitfield
> that represents a small unsigned integer (e.g. "unsigned char" that
> lets you play with up to 255 versions, or "unsigned version : 3"
> that limits you to up to 7 versions).
> 
> You use an 8-bit byte in the file format anyway, so it might not be
> so bad to have a separate version parameter that is not mixed with
> the flag bits, perhaps?

This is a reasonable idea, as this is a "pick exactly one" option.
It is still important to reduce the overall parameter count by combining
the other boolean options into flags.

Thanks,
-Stolee
 

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 3/5] commit-graph: create new version flags
  2019-04-24 19:58   ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget
  2019-04-25  5:29     ` Junio C Hamano
@ 2019-04-25 21:31     ` Ævar Arnfjörð Bjarmason
  2019-04-26  2:20       ` Junio C Hamano
  1 sibling, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-25 21:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee, Duy Nguyen


On Wed, Apr 24 2019, Derrick Stolee via GitGitGadget wrote:

> -	hash_version = *(unsigned char*)(data + 5);
> -	if (hash_version != oid_version()) {
> -		error(_("commit-graph hash version %X does not match version %X"),
> -		      hash_version, oid_version());
> +		      graph_version, 1);
>  		return NULL;
>  	}
>
>  	graph = alloc_commit_graph();
>
> +	switch (graph_version) {
> +	case 1:
> +		hash_version = *(unsigned char*)(data + 5);
> +		if (hash_version != oid_version()) {
> +			error(_("commit-graph hash version %X does not match version %X"),
> +			      hash_version, oid_version());
> +			return NULL;
> +		}

This is just munging existing code, but one thing in my series that I
didn't follow-up on was Duy's suggestion[1] of %X here being
nonsensical.

It doesn't make sense to start saying "version A" here when we make it
to version 10, however unlikely that is :)

So I think for the existing %X in this file it should be 0x%X as he
suggests, except in cases like this where we should just use %d.

1. https://public-inbox.org/git/CACsJy8DgNzGK3g2P7ZyRmd7sbiSOXY07KqYEh-gSsPkEZ+D5Qw@mail.gmail.com/

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 0/5] Create commit-graph file format v2
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2019-04-24 19:58   ` [PATCH v2 5/5] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
@ 2019-04-25 22:09   ` Ævar Arnfjörð Bjarmason
  2019-04-26  2:28     ` Junio C Hamano
  2019-04-27 12:57     ` Ævar Arnfjörð Bjarmason
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
  6 siblings, 2 replies; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-25 22:09 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, sandals, Junio C Hamano, Jeff King


On Wed, Apr 24 2019, Derrick Stolee via GitGitGadget wrote:

> NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
> were significant and subtle.

Sorry, hopefully it helped more than it harmed :)

A few unrelated things:

1)

First, before that series of mine applying this and writing a v2 file
would make most things (e.g. "status") hard error on e.g. v2.21.0:


    $ git status
    error: graph version 2 does not match version 1
    $

Now as noted in my series we now on 'master' downgrade that to a warning
(along with the rest of the errors):

    $ ~/g/git/git --exec-path=$PWD status
    error: commit-graph version 2 does not match version 1
    On branch master
    [...]

...and this series sets the default version for all new graphs to v2.

I think this is *way* too aggressive of an upgrade path. If these
patches go into v2.22.0 then git clients on all older versions that grok
the commit graph (IIRC v2.18 and above) will have their git completely
broken if they're in a mixed-git-version environment.

Is it really so important to move to v2 right away that we need to risk
those breakages? I think even with my ab/commit-graph-fixes it's still
too annoying (I was mostly trying to fix other stuff...). If only we
could detect "we should make a new graph now" ....

2)

...speaking of which, digging up outstanding stuff I have on the
commit-graph I was reminded to finish up my "commit graph on clone"
patch in:
https://public-inbox.org/git/87in2hgzin.fsf@evledraar.gmail.com/

And re #1 above: I guess we could also do that "let's make a graph" and
call "gc --auto" if a) we have gc.writeCommitGraph b) we see it's not
the "right" version. As long as older versions always write a "old" one
if they can't grok the "new" one, and newer versions leave existing
graphs alone even if they're older versions, so we don't flip-flop.

One of the things that would make that "graph on clone/fetch/whatever"
easier is having the graph store the total number of objects while it
was at it, you indicated in
https://public-inbox.org/git/934fa00e-f6df-c333-4968-3e9acffab22d@gmail.com/
that you already have an internal MSFT implementation of it that does
it.

Any reason not to make it part of v2 while we're at it? We already find
out how many (packed) objects we have in "add_packed_commits", we just
don't do anything with that information now.

3)

Also (but mostly unrelated). I see that "Future Work" in
Documentation/technical/commit-graph.txt now appears to entirely
describe "Past Work" instead :)

4)

The third point in "Design Details" also says that the format doesn't
need a change for a future hash algo change, yet here we are at v2
making a (small) change for that purpose :)

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 3/5] commit-graph: create new version flags
  2019-04-25 21:31     ` Ævar Arnfjörð Bjarmason
@ 2019-04-26  2:20       ` Junio C Hamano
  0 siblings, 0 replies; 89+ messages in thread
From: Junio C Hamano @ 2019-04-26  2:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, sandals, Derrick Stolee,
	Duy Nguyen

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It doesn't make sense to start saying "version A" here when we make it
> to version 10, however unlikely that is :)

;-)

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 0/5] Create commit-graph file format v2
  2019-04-25 22:09   ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
@ 2019-04-26  2:28     ` Junio C Hamano
  2019-04-26  8:33       ` Ævar Arnfjörð Bjarmason
  2019-04-27 12:57     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 89+ messages in thread
From: Junio C Hamano @ 2019-04-26  2:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, sandals, Jeff King

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Apr 24 2019, Derrick Stolee via GitGitGadget wrote:
>
>> NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
>> were significant and subtle.
>
> Sorry, hopefully it helped more than it harmed :)
>
> A few unrelated things:

Thanks always for your careful review and thoughtful comments, by
the way.

> Now as noted in my series we now on 'master' downgrade that to a warning
> (along with the rest of the errors):
>
>     $ ~/g/git/git --exec-path=$PWD status
>     error: commit-graph version 2 does not match version 1
>     On branch master
>     [...]
>
> ...and this series sets the default version for all new graphs to v2.

The phrasing seems odd.  It is unclear, even to me who is vaguely
familiar with the word "commit-graph" and is aware of the fact that
the file format is being updated, what

    "commit-graph version 2 does not match version 1" 

wants to say.  Do I have version #2 on disk and the running binary
only understands version #1?  Or do I have version #1 on disk and
the binary expected version #2?  How would I get out of this
situation?  Is it sufficient to do "rm -f .git/info/commit-graph*"
and is it safe?

> I think this is *way* too aggressive of an upgrade path. If these
> patches go into v2.22.0 then git clients on all older versions that grok
> the commit graph (IIRC v2.18 and above) will have their git completely
> broken if they're in a mixed-git-version environment.
>
> Is it really so important to move to v2 right away that we need to risk
> those breakages? I think even with my ab/commit-graph-fixes it's still
> too annoying (I was mostly trying to fix other stuff...). If only we
> could detect "we should make a new graph now" ....

True.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 0/5] Create commit-graph file format v2
  2019-04-26  2:28     ` Junio C Hamano
@ 2019-04-26  8:33       ` Ævar Arnfjörð Bjarmason
  2019-04-26 12:06         ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-26  8:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, sandals, Jeff King


On Fri, Apr 26 2019, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Wed, Apr 24 2019, Derrick Stolee via GitGitGadget wrote:
>>
>>> NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
>>> were significant and subtle.
>>
>> Sorry, hopefully it helped more than it harmed :)
>>
>> A few unrelated things:
>
> Thanks always for your careful review and thoughtful comments, by
> the way.
>
>> Now as noted in my series we now on 'master' downgrade that to a warning
>> (along with the rest of the errors):
>>
>>     $ ~/g/git/git --exec-path=$PWD status
>>     error: commit-graph version 2 does not match version 1
>>     On branch master
>>     [...]
>>
>> ...and this series sets the default version for all new graphs to v2.
>
> The phrasing seems odd.  It is unclear, even to me who is vaguely
> familiar with the word "commit-graph" and is aware of the fact that
> the file format is being updated, what
>
>     "commit-graph version 2 does not match version 1"

Yeah it should really say:

    "commit-graph is of version 2, our maximum supported version is 1"

Hindsight is 20/20, but more generally I wonder if we should have these
format versions match that of the git version (unlikely to change it
twice in the same release...) which would allow us to say things like:

    "commit-graph needs v2.22.0 or later, we have the version written by v2.18.0..v2.21.0"

But of course dealing with those larger integers in the code/gaps is
also messy :)

> wants to say.  Do I have version #2 on disk and the running binary
> only understands version #1?  Or do I have version #1 on disk and
> the binary expected version #2?  How would I get out of this
> situation?  Is it sufficient to do "rm -f .git/info/commit-graph*"
> and is it safe?

Yeah. An rm of .git/info/commit-graph is safe, so is "-c
core.commitGraph=false" as a workaround.

I'd say "let's improve the error", but that ship has sailed, and we can
do better than an error here, no matter how it's phrased...

>> I think this is *way* too aggressive of an upgrade path. If these
>> patches go into v2.22.0 then git clients on all older versions that grok
>> the commit graph (IIRC v2.18 and above) will have their git completely
>> broken if they're in a mixed-git-version environment.
>>


I should note that "all older versions..." here is those that have
core.commitGraph=true set. More details in 43d3561805 ("commit-graph
write: don't die if the existing graph is corrupt", 2019-03-25).

>> Is it really so important to move to v2 right away that we need to risk
>> those breakages? I think even with my ab/commit-graph-fixes it's still
>> too annoying (I was mostly trying to fix other stuff...). If only we
>> could detect "we should make a new graph now" ....
>
> True.

Having slept on my earlier
https://public-inbox.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ I think
I see a better way to deal with this than my earlier suggestion that we
perform some version flip-flop dance on the single "commit-graph" file:

How about just writing .git/objects/info/commit-graph-v2, and for the
upcoming plan when where they'll be split have some dir/prefix there
where we include the version?

That means that:

 1. If there's an existing v1 "commit-graph" file we don't write a v2 at
    that path in v2.22, although we might have some "write v1 (as well
    as v2?) for old client compat" config where we opt-in to do that.

 2. By default in v2.22 we read/write a "commit-graph-v2" file,
    preferring it over the v1 "commit-graph", falling back on earlier
    versions if it's not there (until gc --auto kicks in on v2.22 and
    makes a v2 graph).

 3. If you have concurrent v2.21 and v2.22 clients accessing the repo
    you might end up generating one commit-graph or the other depending
    on who happens to trigger "gc --auto".

    Hopefully that's a non-issue since an out-of-date graph isn't
    usually a big deal, and client versions mostly march forward. But
    v2.22 could also learn some "incremental gc" where it says "my v2 is
    older, v1 client must have refreshed it, I'll refresh mine/both".

 4. v2.22 and newer versions will have some code in git-gc where we'll
    eventually readdir() .git/objects/info and remove graphs that are
    too old per some new config (say
    "gc.pruneOlderCommitGraphVersions=180 days").

This means that:

 A. GOOD: Now and going forward we can fearlessly create new versions of
    the graph without worrying/testing how older clients deal with it.

 B. BAD: We are going to eat ~2x the disk space for commit-graphs while
    such transitions are underway. I think that's fine. They're
    relatively small compared to .git/objects, and we'll eventually "gc"
    the old ones.

 C. BAD: Different versions of git might perform wildly differently (new
    version slower) since their respective preferred graph versions
    might have a very different/up-to-date number of commits v.s. what's
    in the packs.

I think "A" outweighs "B" && "C" in this case. It's "just" a caching
data structure, and git works without it. So we can be a lot looser than
say updating the index/pack format.

Worst case things slow down but still work, and as noted in #3 above if
we care it can be mitigated (but I don't, I think we can safely assume
"client versions march forward").

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 0/5] Create commit-graph file format v2
  2019-04-26  8:33       ` Ævar Arnfjörð Bjarmason
@ 2019-04-26 12:06         ` Derrick Stolee
  2019-04-26 13:55           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee @ 2019-04-26 12:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, sandals, Jeff King

On 4/26/2019 4:33 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Apr 26 2019, Junio C Hamano wrote:
>>
>> Thanks always for your careful review and thoughtful comments, by
>> the way.

I agree that these comments are extremely helpful.

>>> Now as noted in my series we now on 'master' downgrade that to a warning
>>> (along with the rest of the errors):
>>>
>>>     $ ~/g/git/git --exec-path=$PWD status
>>>     error: commit-graph version 2 does not match version 1
>>>     On branch master
>>>     [...]
>>>
>>> ...and this series sets the default version for all new graphs to v2.>>>> The phrasing seems odd.  It is unclear, even to me who is vaguely
>> familiar with the word "commit-graph" and is aware of the fact that
>> the file format is being updated, what
>>
>>     "commit-graph version 2 does not match version 1"
> 
> Yeah it should really say:
> 
>     "commit-graph is of version 2, our maximum supported version is 1"

I agree this phrasing is better. Please see the patch I just submitted [1]
to try and improve these messages.

[1] https://public-inbox.org/git/pull.181.git.gitgitgadget@gmail.com/
 
> Hindsight is 20/20, but more generally I wonder if we should have these
> format versions match that of the git version (unlikely to change it
> twice in the same release...) which would allow us to say things like:
> 
>     "commit-graph needs v2.22.0 or later, we have the version written by v2.18.0..v2.21.0"
> 
> But of course dealing with those larger integers in the code/gaps is
> also messy :)

There are a couple issues with using the version numbers, from my
perspective:

1. We don't do that anywhere else, like the index file.

2. The microsoft/git fork takes certain performance changes faster
   than core git, and frequently ships versions between major version
   updates. Our 2.17 had the commit-graph, for instance. It's also
   possible that we'd take commit-graph v2 earlier than the core Git
   major release.
 
>> wants to say.  Do I have version #2 on disk and the running binary
>> only understands version #1?  Or do I have version #1 on disk and
>> the binary expected version #2?  How would I get out of this
>> situation?  Is it sufficient to do "rm -f .git/info/commit-graph*"
>> and is it safe?
> 
> Yeah. An rm of .git/info/commit-graph is safe, so is "-c
> core.commitGraph=false" as a workaround.

That is true. I'm not sure the error message is the right place to
describe the workaround.

> I'd say "let's improve the error", but that ship has sailed, and we can
> do better than an error here, no matter how it's phrased...
> 
>>> I think this is *way* too aggressive of an upgrade path. If these
>>> patches go into v2.22.0 then git clients on all older versions that grok
>>> the commit graph (IIRC v2.18 and above) will have their git completely
>>> broken if they're in a mixed-git-version environmen.>
> I should note that "all older versions..." here is those that have
> core.commitGraph=true set. More details in 43d3561805 ("commit-graph
> write: don't die if the existing graph is corrupt", 2019-03-25).
> 
>>> Is it really so important to move to v2 right away that we need to risk
>>> those breakages? I think even with my ab/commit-graph-fixes it's still
>>> too annoying (I was mostly trying to fix other stuff...). If only we
>>> could detect "we should make a new graph now" ....
>>
>> True.

You are right, this is too aggressive and I should have known better. I'll
update in the next version to keep a default to v1. Not only do we have this
downgrade risk, there is no actual benefit in this series alone. This only
sets up the ability for other features.
 
> Having slept on my earlier
> https://public-inbox.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ I think
> I see a better way to deal with this than my earlier suggestion that we
> perform some version flip-flop dance on the single "commit-graph" file:
> 
> How about just writing .git/objects/info/commit-graph-v2, and for the
> upcoming plan when where they'll be split have some dir/prefix there
> where we include the version?
> 
> That means that:
> 
>  1. If there's an existing v1 "commit-graph" file we don't write a v2 at
>     that path in v2.22, although we might have some "write v1 (as well
>     as v2?) for old client compat" config where we opt-in to do that.
> 
>  2. By default in v2.22 we read/write a "commit-graph-v2" file,
>     preferring it over the v1 "commit-graph", falling back on earlier
>     versions if it's not there (until gc --auto kicks in on v2.22 and
>     makes a v2 graph).
> 
>  3. If you have concurrent v2.21 and v2.22 clients accessing the repo
>     you might end up generating one commit-graph or the other depending
>     on who happens to trigger "gc --auto".
> 
>     Hopefully that's a non-issue since an out-of-date graph isn't
>     usually a big deal, and client versions mostly march forward. But
>     v2.22 could also learn some "incremental gc" where it says "my v2 is
>     older, v1 client must have refreshed it, I'll refresh mine/both".
> 
>  4. v2.22 and newer versions will have some code in git-gc where we'll
>     eventually readdir() .git/objects/info and remove graphs that are
>     too old per some new config (say
>     "gc.pruneOlderCommitGraphVersions=180 days").
> 
> This means that:
> 
>  A. GOOD: Now and going forward we can fearlessly create new versions of
>     the graph without worrying/testing how older clients deal with it.
> 
>  B. BAD: We are going to eat ~2x the disk space for commit-graphs while
>     such transitions are underway. I think that's fine. They're
>     relatively small compared to .git/objects, and we'll eventually "gc"
>     the old ones.

We could also write 'commit-graph-v2' and delete 'commit-graph' and if
someone downgrades they would just have a performance issue, not a failure.

>  C. BAD: Different versions of git might perform wildly differently (new
>     version slower) since their respective preferred graph versions
>     might have a very different/up-to-date number of commits v.s. what's
>     in the packs.
> 
> I think "A" outweighs "B" && "C" in this case. It's "just" a caching
> data structure, and git works without it. So we can be a lot looser than
> say updating the index/pack format.
> 
> Worst case things slow down but still work, and as noted in #3 above if
> we care it can be mitigated (but I don't, I think we can safely assume
> "client versions march forward").

While I agree that this downgrade path can be a problem, I don't like the
idea of adding a version in the filename. The whole point of having a versioned
file format is so we can make these changes without changing the filename.

Is it sufficient to remove the auto-upgrade path, at least for a few major
versions? And I can learn from past mistakes and change the response to
the other version information in the v2 file (reach index version, hash version,
unused value in 8th byte) and instead make them non-fatal warnings.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 0/5] Create commit-graph file format v2
  2019-04-26 12:06         ` Derrick Stolee
@ 2019-04-26 13:55           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-26 13:55 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, sandals, Jeff King


On Fri, Apr 26 2019, Derrick Stolee wrote:

> On 4/26/2019 4:33 AM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Apr 26 2019, Junio C Hamano wrote:
>>>
>>> Thanks always for your careful review and thoughtful comments, by
>>> the way.
>
> I agree that these comments are extremely helpful.
>
>>>> Now as noted in my series we now on 'master' downgrade that to a warning
>>>> (along with the rest of the errors):
>>>>
>>>>     $ ~/g/git/git --exec-path=$PWD status
>>>>     error: commit-graph version 2 does not match version 1
>>>>     On branch master
>>>>     [...]
>>>>
>>>> ...and this series sets the default version for all new graphs to v2.>>>> The phrasing seems odd.  It is unclear, even to me who is vaguely
>>> familiar with the word "commit-graph" and is aware of the fact that
>>> the file format is being updated, what
>>>
>>>     "commit-graph version 2 does not match version 1"
>>
>> Yeah it should really say:
>>
>>     "commit-graph is of version 2, our maximum supported version is 1"
>
> I agree this phrasing is better. Please see the patch I just submitted [1]
> to try and improve these messages.
>
> [1] https://public-inbox.org/git/pull.181.git.gitgitgadget@gmail.com/
>
>> Hindsight is 20/20, but more generally I wonder if we should have these
>> format versions match that of the git version (unlikely to change it
>> twice in the same release...) which would allow us to say things like:
>>
>>     "commit-graph needs v2.22.0 or later, we have the version written by v2.18.0..v2.21.0"
>>
>> But of course dealing with those larger integers in the code/gaps is
>> also messy :)
>
> There are a couple issues with using the version numbers, from my
> perspective:
>
> 1. We don't do that anywhere else, like the index file.
>
> 2. The microsoft/git fork takes certain performance changes faster
>    than core git, and frequently ships versions between major version
>    updates. Our 2.17 had the commit-graph, for instance. It's also
>    possible that we'd take commit-graph v2 earlier than the core Git
>    major release.

Good points. I'm just blathering on and playing architecture astronaut
:)

>>> wants to say.  Do I have version #2 on disk and the running binary
>>> only understands version #1?  Or do I have version #1 on disk and
>>> the binary expected version #2?  How would I get out of this
>>> situation?  Is it sufficient to do "rm -f .git/info/commit-graph*"
>>> and is it safe?
>>
>> Yeah. An rm of .git/info/commit-graph is safe, so is "-c
>> core.commitGraph=false" as a workaround.
>
> That is true. I'm not sure the error message is the right place to
> describe the workaround.

Yeah for sure, it should Just Work...

>> I'd say "let's improve the error", but that ship has sailed, and we can
>> do better than an error here, no matter how it's phrased...
>>
>>>> I think this is *way* too aggressive of an upgrade path. If these
>>>> patches go into v2.22.0 then git clients on all older versions that grok
>>>> the commit graph (IIRC v2.18 and above) will have their git completely
>>>> broken if they're in a mixed-git-version environmen.>
>> I should note that "all older versions..." here is those that have
>> core.commitGraph=true set. More details in 43d3561805 ("commit-graph
>> write: don't die if the existing graph is corrupt", 2019-03-25).
>>
>>>> Is it really so important to move to v2 right away that we need to risk
>>>> those breakages? I think even with my ab/commit-graph-fixes it's still
>>>> too annoying (I was mostly trying to fix other stuff...). If only we
>>>> could detect "we should make a new graph now" ....
>>>
>>> True.
>
> You are right, this is too aggressive and I should have known better. I'll
> update in the next version to keep a default to v1. Not only do we have this
> downgrade risk, there is no actual benefit in this series alone. This only
> sets up the ability for other features.
>
>> Having slept on my earlier
>> https://public-inbox.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ I think
>> I see a better way to deal with this than my earlier suggestion that we
>> perform some version flip-flop dance on the single "commit-graph" file:
>>
>> How about just writing .git/objects/info/commit-graph-v2, and for the
>> upcoming plan when where they'll be split have some dir/prefix there
>> where we include the version?
>>
>> That means that:
>>
>>  1. If there's an existing v1 "commit-graph" file we don't write a v2 at
>>     that path in v2.22, although we might have some "write v1 (as well
>>     as v2?) for old client compat" config where we opt-in to do that.
>>
>>  2. By default in v2.22 we read/write a "commit-graph-v2" file,
>>     preferring it over the v1 "commit-graph", falling back on earlier
>>     versions if it's not there (until gc --auto kicks in on v2.22 and
>>     makes a v2 graph).
>>
>>  3. If you have concurrent v2.21 and v2.22 clients accessing the repo
>>     you might end up generating one commit-graph or the other depending
>>     on who happens to trigger "gc --auto".
>>
>>     Hopefully that's a non-issue since an out-of-date graph isn't
>>     usually a big deal, and client versions mostly march forward. But
>>     v2.22 could also learn some "incremental gc" where it says "my v2 is
>>     older, v1 client must have refreshed it, I'll refresh mine/both".
>>
>>  4. v2.22 and newer versions will have some code in git-gc where we'll
>>     eventually readdir() .git/objects/info and remove graphs that are
>>     too old per some new config (say
>>     "gc.pruneOlderCommitGraphVersions=180 days").
>>
>> This means that:
>>
>>  A. GOOD: Now and going forward we can fearlessly create new versions of
>>     the graph without worrying/testing how older clients deal with it.
>>
>>  B. BAD: We are going to eat ~2x the disk space for commit-graphs while
>>     such transitions are underway. I think that's fine. They're
>>     relatively small compared to .git/objects, and we'll eventually "gc"
>>     the old ones.
>
> We could also write 'commit-graph-v2' and delete 'commit-graph' and if
> someone downgrades they would just have a performance issue, not a failure.
>
>>  C. BAD: Different versions of git might perform wildly differently (new
>>     version slower) since their respective preferred graph versions
>>     might have a very different/up-to-date number of commits v.s. what's
>>     in the packs.
>>
>> I think "A" outweighs "B" && "C" in this case. It's "just" a caching
>> data structure, and git works without it. So we can be a lot looser than
>> say updating the index/pack format.
>>
>> Worst case things slow down but still work, and as noted in #3 above if
>> we care it can be mitigated (but I don't, I think we can safely assume
>> "client versions march forward").
>
> While I agree that this downgrade path can be a problem, I don't like the
> idea of adding a version in the filename. The whole point of having a versioned
> file format is so we can make these changes without changing the filename.
>
> Is it sufficient to remove the auto-upgrade path, at least for a few major
> versions? And I can learn from past mistakes and change the response to
> the other version information in the v2 file (reach index version, hash version,
> unused value in 8th byte) and instead make them non-fatal warnings.

I think there's two things here, and for *me* just one of them would be
enough for "screw it, let's write another file":

 1. Our "you have v2" error reporting for all versions until the one we
    haven't released yet sucks/hard errors.

    So right *now* I think few people turn on core.commitGraph=true, I
    do, you do, I just convinced GitLab to do it (after doing it myself
    & carrying patches...):
    https://gitlab.com/gitlab-org/gitaly/issues/1627

    I think we both want to get to a point where core.commitGraph=true
    is the default though, because it's cheap to make it, and it rocks
    for a lot of use-cases, meanwhile people are toggling it on manually
    at an increasing rate.

    So that combined with some distros/OSs upgrading at a glacial pace
    basically means that we'd *at least* need to do the equivalent of a
    one-off "commit-graph" -> "commit-graph-verion-2-and-beyond",
    because just like some OSs are still shipping with git 1.8.* or 2.11
    whatever, we're also going to have the versions where this is hard
    erroring in production somewhere for a long time.

    It would suck with even a "conservative" upgrade path to need to
    wait until 2022 or something just so I can have a commit graph that
    by default has the number of objects or whatever other small thing
    we add to it because we're paranoid that it hasn't been N versions
    since we stopped hard-erroring on v2.

    There's a lot of such mixed-version cases, e.g. running a server
    cluster where the git data is on NFS and you're in the process of
    updating git on some "client" nodes but not others, now if the git
    version is one of the ones impacted and you e.g. run gitlab the new
    version will ruin your day. Ditto a user on their laptop testing a
    git from debian testing and going back to stable etc, only to find
    that their repos broke.

 2. While we *can* say a lost commit-graph is "just a performance issue,
    not a failure", which is easy enough to make the case for v2.22 and
    beyond if we fix a couple of things, I think this is becoming less
    and less acceptable in practice.

    E.g. I have some things now where I pretty much hard-rely on it if I
    don't want a CPU/IO spike as some commands that take 5ms now start
    taking 30s (e.g. "what branch is this on" in the GitLab UI)., which
    would happen if we have one file and switch to v2 by default at some
    point (and v3, ...).


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v2 0/5] Create commit-graph file format v2
  2019-04-25 22:09   ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
  2019-04-26  2:28     ` Junio C Hamano
@ 2019-04-27 12:57     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-27 12:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, sandals, Junio C Hamano, Jeff King


On Fri, Apr 26 2019, Ævar Arnfjörð Bjarmason wrote:

> ...speaking of which, digging up outstanding stuff I have on the
> commit-graph I was reminded to finish up my "commit graph on clone"
> patch in:
> https://public-inbox.org/git/87in2hgzin.fsf@evledraar.gmail.com/
>
> And re #1 above: I guess we could also do that "let's make a graph" and
> call "gc --auto" if a) we have gc.writeCommitGraph b) we see it's not
> the "right" version. As long as older versions always write a "old" one
> if they can't grok the "new" one, and newer versions leave existing
> graphs alone even if they're older versions, so we don't flip-flop.
>
> One of the things that would make that "graph on clone/fetch/whatever"
> easier is having the graph store the total number of objects while it
> was at it, you indicated in
> https://public-inbox.org/git/934fa00e-f6df-c333-4968-3e9acffab22d@gmail.com/
> that you already have an internal MSFT implementation of it that does
> it.
>
> Any reason not to make it part of v2 while we're at it? We already find
> out how many (packed) objects we have in "add_packed_commits", we just
> don't do anything with that information now.

I hacked up this plus general tag/tree/blob count stats in the WIP/RFC
patch below. I figured once I did objects I might as well do tags (note:
annotated tag objects, not # tag refs)/trees/blobs as well.

It passes all tests with GIT_TEST_COMMIT_GRAPH=true, it fails on the
commit-graph's own test suite, but AFAICT only because the selective
corruption tests are thrown off by the location of this new chunk.

Since we now skip some commits found in the pack(s) (just duplicates?)
the new "num_commits_stat" is not the same as the current "num_commits",
but usually really close.

It's probably best if we do something like this to make this chunk be of
dynamic length, as long as we kept order we could keep adding new stats
to the file even within the same "version".

This as (ab)using the "commit-graph" to start storing arbitrary stats
about stuff we find the the packs during gc. Maybe that sucks, but OTOH
it's useful, and just having some new "last gc stats" format/file would
be overkill...

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 1485b4daaf..d9378f23d9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -114,7 +114,9 @@ static int graph_read(int argc, const char **argv)
 	printf("header: %08x %d %d %d %d\n",
 		ntohl(*(uint32_t*)graph->data),
 		*(unsigned char*)(graph->data + 4),
-		*(unsigned char*)(graph->data + 5),
+		(getenv("STAT_ME")
+		 ? *(unsigned char*)(graph->data + 5)
+		 : (*(unsigned char*)(graph->data + 5) - 1)),
 		*(unsigned char*)(graph->data + 6),
 		*(unsigned char*)(graph->data + 7));

@@ -123,8 +125,20 @@ static int graph_read(int argc, const char **argv)
 		       get_be32(graph->data + 8));

 	printf("num_commits: %u\n", graph->num_commits);
+	if (getenv("STAT_ME")) {
+		printf(" pack num commits (st): %u\n", graph->num_commits_stat);
+		printf(" pack num commits - inferred diff (diff = duplicate (I think!)): %u\n", graph->num_commits_stat - graph->num_commits);
+		printf(" pack num objects: %u\n", graph->num_objects);
+		printf(" pack num tags: %u\n", graph->num_tags);
+		printf(" pack num trees: %u\n", graph->num_trees);
+		printf(" pack num blobs: %u\n", graph->num_blobs);
+	}
 	printf("chunks:");

+	if (getenv("STAT_ME")) {
+		if (graph->chunk_oid_numbers)
+			printf(" oid_numbers");
+	}
 	if (graph->chunk_oid_fanout)
 		printf(" oid_fanout");
 	if (graph->chunk_oid_lookup)
diff --git a/commit-graph.c b/commit-graph.c
index 14d6aebd99..3d0fb5193b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -18,6 +18,7 @@
 #include "progress.h"

 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDNUMBERS 0x4f49444e  /* "OIDN" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
@@ -32,6 +33,7 @@
 #define GRAPH_LAST_EDGE 0x80000000

 #define GRAPH_HEADER_SIZE 8
+#define GRAPH_OIDNUMBERS_SIZE (4 * 5)
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
@@ -127,6 +129,10 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	 * over g->num_commits, or runs a checksum on the commit-graph
 	 * itself.
 	 */
+	if (!g->chunk_oid_numbers) {
+		error("commit-graph is missing the OID Numbers chunk");
+		return 1;
+	}
 	if (!g->chunk_oid_fanout) {
 		error("commit-graph is missing the OID Fanout chunk");
 		return 1;
@@ -249,6 +255,18 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 		}

 		switch (chunk_id) {
+		case GRAPH_CHUNKID_OIDNUMBERS:
+			if (graph->chunk_oid_numbers) {
+				chunk_repeated = 1;
+			} else {
+				graph->chunk_oid_numbers = data + chunk_offset;
+				graph->num_objects = get_be32(graph->chunk_oid_numbers + 0);
+				graph->num_commits_stat = get_be32(graph->chunk_oid_numbers + 4);
+				graph->num_tags    = get_be32(graph->chunk_oid_numbers + 8);
+				graph->num_trees   = get_be32(graph->chunk_oid_numbers + 12);
+				graph->num_blobs   = get_be32(graph->chunk_oid_numbers + 16);
+			}
+			break;
 		case GRAPH_CHUNKID_OIDFANOUT:
 			if (graph->chunk_oid_fanout)
 				chunk_repeated = 1;
@@ -545,6 +563,22 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 	return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c);
 }

+static void write_graph_chunk_numbers(struct hashfile *f,
+				      struct progress *progress,
+				      uint64_t *progress_cnt,
+				      uint32_t num_objects,
+				      uint32_t num_commits,
+				      uint32_t num_tags,
+				      uint32_t num_trees,
+				      uint32_t num_blobs)
+{
+	hashwrite_be32(f, num_objects);
+	hashwrite_be32(f, num_commits);
+	hashwrite_be32(f, num_tags);
+	hashwrite_be32(f, num_trees);
+	hashwrite_be32(f, num_blobs);
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits,
@@ -567,7 +601,6 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 			count++;
 			list++;
 		}
-
 		hashwrite_be32(f, count);
 	}
 }
@@ -732,6 +765,11 @@ struct packed_oid_list {
 	int alloc;
 	struct progress *progress;
 	int progress_done;
+	uint32_t num_objects;
+	uint32_t num_commits;
+	uint32_t num_tags;
+	uint32_t num_trees;
+	uint32_t num_blobs;
 };

 static int add_packed_commits(const struct object_id *oid,
@@ -751,6 +789,21 @@ static int add_packed_commits(const struct object_id *oid,
 	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));

+	/*
+	 * Aggregate object statistics
+	 */
+	list->num_objects++;
+	if (type == OBJ_COMMIT)
+		list->num_commits++;
+	else if (type == OBJ_TAG)
+		list->num_tags++;
+	else if (type == OBJ_TREE)
+		list->num_trees++;
+	else if (type == OBJ_BLOB)
+		list->num_blobs++;
+	else
+		BUG("should not encounter internal-only object_type %d value here!", type);
+
 	if (type != OBJ_COMMIT)
 		return 0;

@@ -939,6 +992,8 @@ int write_commit_graph(const char *obj_dir,
 	oids.progress = NULL;
 	oids.progress_done = 0;
 	commits.list = NULL;
+	oids.num_objects = oids.num_commits = oids.num_tags =
+		oids.num_trees = oids.num_blobs = 0;

 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -1092,7 +1147,7 @@ int write_commit_graph(const char *obj_dir,

 		commits.nr++;
 	}
-	num_chunks = num_extra_edges ? 4 : 3;
+	num_chunks = num_extra_edges ? 5 : 4;
 	stop_progress(&progress);

 	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
@@ -1136,20 +1191,22 @@ int write_commit_graph(const char *obj_dir,
 		break;
 	}

-	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
-	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
-	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	chunk_ids[0] = GRAPH_CHUNKID_OIDNUMBERS;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[2] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[3] = GRAPH_CHUNKID_DATA;
 	if (num_extra_edges)
-		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
+		chunk_ids[4] = GRAPH_CHUNKID_EXTRAEDGES;
 	else
-		chunk_ids[3] = 0;
-	chunk_ids[4] = 0;
+		chunk_ids[4] = 0;
+	chunk_ids[5] = 0;

 	chunk_offsets[0] = header_size + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
-	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_OIDNUMBERS_SIZE;
+	chunk_offsets[2] = chunk_offsets[1] + GRAPH_FANOUT_SIZE;
+	chunk_offsets[3] = chunk_offsets[2] + hashsz * commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + (hashsz + 16) * commits.nr;
+	chunk_offsets[5] = chunk_offsets[4] + 4 * num_extra_edges;

 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
@@ -1170,6 +1227,10 @@ int write_commit_graph(const char *obj_dir,
 			progress_title.buf,
 			num_chunks * commits.nr);
 	}
+	write_graph_chunk_numbers(f, progress, &progress_cnt,
+				  oids.num_objects, oids.num_commits,
+				  oids.num_tags, oids.num_trees,
+				  oids.num_blobs);
 	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
 	write_graph_chunk_oids(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
 	write_graph_chunk_data(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
diff --git a/commit-graph.h b/commit-graph.h
index 2c461770e8..ef9eb0b6cb 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -45,10 +45,16 @@ struct commit_graph {

 	unsigned char hash_len;
 	unsigned char num_chunks;
+	uint32_t num_objects;
 	uint32_t num_commits;
+	uint32_t num_commits_stat;
+	uint32_t num_tags;
+	uint32_t num_trees;
+	uint32_t num_blobs;
 	struct object_id oid;

 	const uint32_t *chunk_oid_fanout;
+	const unsigned char *chunk_oid_numbers;
 	const unsigned char *chunk_oid_lookup;
 	const unsigned char *chunk_commit_data;
 	const unsigned char *chunk_extra_edges;


Opel Vivaro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v3 0/6] Create commit-graph file format v2
  2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2019-04-25 22:09   ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
@ 2019-05-01 13:11   ` " Derrick Stolee via GitGitGadget
  2019-05-01 13:11     ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
                       ` (7 more replies)
  6 siblings, 8 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-01 13:11 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano

The commit-graph file format has some shortcomings that were discussed
on-list:

 1. It doesn't use the 4-byte format ID from the_hash_algo.
    
    
 2. There is no way to change the reachability index from generation numbers
    to corrected commit date [1].
    
    
 3. The unused byte in the format could be used to signal the file is
    incremental, but current clients ignore the value even if it is
    non-zero.
    
    

This series adds a new version (2) to the commit-graph file. The fifth byte
already specified the file format, so existing clients will gracefully
respond to files with a different version number. The only real change now
is that the header takes 12 bytes instead of 8, due to using the 4-byte
format ID for the hash algorithm.

The new bytes reserved for the reachability index version and incremental
file formats are now expected to be equal to the defaults. When we update
these values to be flexible in the future, if a client understands
commit-graph v2 but not those new values, then it will fail gracefully.

NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
were significant and subtle.

Updates in V3: Thanks for all the feedback so far!

 * Moved the version information into an unsigned char parameter, instead of
   a flag.
   
   
 * We no longer default to the v2 file format, as that will break users who
   downgrade. This required some changes to the test script.
   
   
 * Removed the "Future work" section from the commit-graph design document
   in a new patch.
   
   
 * I did not change the file name for v2 file formats, as Ævar suggested.
   I'd like the discussion to continue on this topic.
   
   

Thanks, -Stolee

[1] 
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/

Derrick Stolee (6):
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: create new version parameter
  commit-graph: add --version=<n> option
  commit-graph: implement file format version 2
  commit-graph: remove Future Work section

 Documentation/git-commit-graph.txt            |   3 +
 .../technical/commit-graph-format.txt         |  26 ++-
 Documentation/technical/commit-graph.txt      |  17 --
 builtin/commit-graph.c                        |  33 ++--
 builtin/commit.c                              |   5 +-
 builtin/gc.c                                  |   8 +-
 commit-graph.c                                | 153 +++++++++++++-----
 commit-graph.h                                |  16 +-
 t/t5318-commit-graph.sh                       |  75 ++++++++-
 9 files changed, 250 insertions(+), 86 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-112/derrickstolee/graph/v2-head-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/112

Range-diff vs v2:

 1:  91f300ec0a = 1:  91f300ec0a commit-graph: return with errors during write
 2:  04f5df1135 ! 2:  924b22f990 commit-graph: collapse parameters into flags
     @@ -86,7 +86,7 @@
       
      -int write_commit_graph_reachable(const char *obj_dir, int append,
      -				 int report_progress)
     -+int write_commit_graph_reachable(const char *obj_dir, int flags)
     ++int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
       {
       	struct string_list list = STRING_LIST_INIT_DUP;
       	int result;
     @@ -103,7 +103,7 @@
       		       struct string_list *pack_indexes,
       		       struct string_list *commit_hex,
      -		       int append, int report_progress)
     -+		       int flags)
     ++		       unsigned int flags)
       {
       	struct packed_oid_list oids;
       	struct packed_commit_list commits;
     @@ -129,12 +129,12 @@
      +#define COMMIT_GRAPH_APPEND     (1 << 0)
      +#define COMMIT_GRAPH_PROGRESS   (1 << 1)
      +
     -+int write_commit_graph_reachable(const char *obj_dir, int flags);
     ++int write_commit_graph_reachable(const char *obj_dir, unsigned int flags);
       int write_commit_graph(const char *obj_dir,
       		       struct string_list *pack_indexes,
       		       struct string_list *commit_hex,
      -		       int append, int report_progress);
     -+		       int flags);
     ++		       unsigned int flags);
       
       int verify_commit_graph(struct repository *r, struct commit_graph *g);
       
 3:  4ddb829163 ! 3:  8446011a43 commit-graph: create new version flags
     @@ -1,12 +1,12 @@
      Author: Derrick Stolee <dstolee@microsoft.com>
      
     -    commit-graph: create new version flags
     +    commit-graph: create new version parameter
      
          In anticipation of a new commit-graph file format version, create
     -    a flag for the write_commit_graph() and write_commit_graph_reachable()
     +    a parameter for the write_commit_graph() and write_commit_graph_reachable()
          methods to take a version number.
      
     -    When there is no specified version, the implementation selects a
     +    When the given version is zero, the implementation selects a
          default value. Currently, the only valid value is 1.
      
          The file format will change the header information, so place the
     @@ -14,6 +14,55 @@
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     + diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
     + --- a/builtin/commit-graph.c
     + +++ b/builtin/commit-graph.c
     +@@
     + 	read_replace_refs = 0;
     + 
     + 	if (opts.reachable)
     +-		return write_commit_graph_reachable(opts.obj_dir, flags);
     ++		return write_commit_graph_reachable(opts.obj_dir, flags, 0);
     + 
     + 	string_list_init(&lines, 0);
     + 	if (opts.stdin_packs || opts.stdin_commits) {
     +@@
     + 	result = write_commit_graph(opts.obj_dir,
     + 				    pack_indexes,
     + 				    commit_hex,
     +-				    flags);
     ++				    flags, 0);
     + 
     + 	UNLEAK(lines);
     + 	return result;
     +
     + diff --git a/builtin/commit.c b/builtin/commit.c
     + --- a/builtin/commit.c
     + +++ b/builtin/commit.c
     +@@
     + 		      "not exceeded, and then \"git reset HEAD\" to recover."));
     + 
     + 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
     +-	    write_commit_graph_reachable(get_object_directory(), 0))
     ++	    write_commit_graph_reachable(get_object_directory(), 0, 0))
     + 		return 1;
     + 
     + 	repo_rerere(the_repository, 0);
     +
     + diff --git a/builtin/gc.c b/builtin/gc.c
     + --- a/builtin/gc.c
     + +++ b/builtin/gc.c
     +@@
     + 
     + 	if (gc_write_commit_graph &&
     + 	    write_commit_graph_reachable(get_object_directory(),
     +-					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
     ++					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0,
     ++					 0))
     + 		return 1;
     + 
     + 	if (auto_gc && too_many_loose_objects())
     +
       diff --git a/commit-graph.c b/commit-graph.c
       --- a/commit-graph.c
       +++ b/commit-graph.c
     @@ -74,18 +123,43 @@
       	for (i = 0; i < graph->num_chunks; i++) {
       		uint32_t chunk_id;
       		uint64_t chunk_offset;
     +@@
     + 	return 0;
     + }
     + 
     +-int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
     ++int write_commit_graph_reachable(const char *obj_dir, unsigned int flags,
     ++				 unsigned char version)
     + {
     + 	struct string_list list = STRING_LIST_INIT_DUP;
     + 	int result;
     + 
     + 	for_each_ref(add_ref_to_list, &list);
     + 	result = write_commit_graph(obj_dir, NULL, &list,
     +-				    flags);
     ++				    flags, version);
     + 
     + 	string_list_clear(&list, 0);
     + 	return result;
     +@@
     + int write_commit_graph(const char *obj_dir,
     + 		       struct string_list *pack_indexes,
     + 		       struct string_list *commit_hex,
     +-		       unsigned int flags)
     ++		       unsigned int flags,
     ++		       unsigned char version)
     + {
     + 	struct packed_oid_list oids;
     + 	struct packed_commit_list commits;
      @@
       	int res = 0;
       	int append = flags & COMMIT_GRAPH_APPEND;
       	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
     -+	int version = 0;
      +	int header_size = 0;
       
       	if (!commit_graph_compatible(the_repository))
       		return 0;
       
     -+	if (flags & COMMIT_GRAPH_VERSION_1)
     -+		version = 1;
      +	if (!version)
      +		version = 1;
      +	if (version != 1) {
     @@ -132,10 +206,18 @@
       --- a/commit-graph.h
       +++ b/commit-graph.h
      @@
     - 
       #define COMMIT_GRAPH_APPEND     (1 << 0)
       #define COMMIT_GRAPH_PROGRESS   (1 << 1)
     -+#define COMMIT_GRAPH_VERSION_1  (1 << 2)
       
     - int write_commit_graph_reachable(const char *obj_dir, int flags);
     +-int write_commit_graph_reachable(const char *obj_dir, unsigned int flags);
     ++int write_commit_graph_reachable(const char *obj_dir, unsigned int flags,
     ++				 unsigned char version);
       int write_commit_graph(const char *obj_dir,
     + 		       struct string_list *pack_indexes,
     + 		       struct string_list *commit_hex,
     +-		       unsigned int flags);
     ++		       unsigned int flags,
     ++		       unsigned char version);
     + 
     + int verify_commit_graph(struct repository *r, struct commit_graph *g);
     + 
 4:  b1b0c76eb4 ! 4:  6a0e99f9f9 commit-graph: add --version=<n> option
     @@ -62,18 +62,29 @@
       	};
       
      @@
     + 	if (!opts.obj_dir)
     + 		opts.obj_dir = get_object_directory();
       	if (opts.append)
     - 		flags |= COMMIT_GRAPH_APPEND;
     +-		flags |= COMMIT_GRAPH_APPEND;
     ++		flags |= COMMIT_GRAPH_APPEND;	
       
     -+	switch (opts.version) {
     -+	case 1:
     -+		flags |= COMMIT_GRAPH_VERSION_1;
     -+		break;
     -+	}
     -+
       	read_replace_refs = 0;
       
       	if (opts.reachable)
     +-		return write_commit_graph_reachable(opts.obj_dir, flags, 0);
     ++		return write_commit_graph_reachable(opts.obj_dir, flags, opts.version);
     + 
     + 	string_list_init(&lines, 0);
     + 	if (opts.stdin_packs || opts.stdin_commits) {
     +@@
     + 	result = write_commit_graph(opts.obj_dir,
     + 				    pack_indexes,
     + 				    commit_hex,
     +-				    flags, 0);
     ++				    flags, opts.version);
     + 
     + 	UNLEAK(lines);
     + 	return result;
      
       diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
       --- a/t/t5318-commit-graph.sh
 5:  09362bda1b ! 5:  cca8267dfe commit-graph: implement file format version 2
     @@ -23,14 +23,8 @@
          non-zero.
      
          Update the 'git commit-graph read' subcommand to display the new
     -    data.
     -
     -    Set the default file format version to 2, and adjust the tests to
     -    expect the new 'git commit-graph read' output.
     -
     -    Add explicit tests for the upgrade path from version 1 to 2. Users
     -    with an existing commit-graph with version 1 will seamlessly
     -    upgrade to version 2 on their next write.
     +    data, and check this output in the test that explicitly writes a
     +    v2 commit-graph file.
      
          While we converted the existing 'verify' tests to use a version 1
          file to avoid recalculating data offsets, add explicit 'verify'
     @@ -103,17 +97,6 @@
       	printf("num_commits: %u\n", graph->num_commits);
       	printf("chunks:");
       
     -@@
     - 	case 1:
     - 		flags |= COMMIT_GRAPH_VERSION_1;
     - 		break;
     -+
     -+	case 2:
     -+		flags |= COMMIT_GRAPH_VERSION_2;
     -+		break;
     - 	}
     - 
     - 	read_replace_refs = 0;
      
       diff --git a/commit-graph.c b/commit-graph.c
       --- a/commit-graph.c
     @@ -170,14 +153,9 @@
       	graph->hash_len = the_hash_algo->rawsz;
      @@
       
     - 	if (flags & COMMIT_GRAPH_VERSION_1)
     - 		version = 1;
     -+	if (flags & COMMIT_GRAPH_VERSION_2)
     -+		version = 2;
       	if (!version)
     --		version = 1;
     + 		version = 1;
      -	if (version != 1) {
     -+		version = 2;
      +	if (version <= 0 || version > 2) {
       		error(_("unsupported commit-graph version %d"),
       		      version);
     @@ -198,18 +176,6 @@
       
       	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
      
     - diff --git a/commit-graph.h b/commit-graph.h
     - --- a/commit-graph.h
     - +++ b/commit-graph.h
     -@@
     - #define COMMIT_GRAPH_APPEND     (1 << 0)
     - #define COMMIT_GRAPH_PROGRESS   (1 << 1)
     - #define COMMIT_GRAPH_VERSION_1  (1 << 2)
     -+#define COMMIT_GRAPH_VERSION_2  (1 << 3)
     - 
     - int write_commit_graph_reachable(const char *obj_dir, int flags);
     - int write_commit_graph(const char *obj_dir,
     -
       diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
       --- a/t/t5318-commit-graph.sh
       +++ b/t/t5318-commit-graph.sh
     @@ -238,37 +204,37 @@
       	OPTIONAL=""
       	NUM_CHUNKS=3
       	if test ! -z $2
     -@@
     - 		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
     - 	fi
     - 	cat >expect <<- EOF
     --	header: 43475048 1 1 $NUM_CHUNKS 0
     -+	header: 43475048 2 $NUM_CHUNKS 1 0
     -+	hash algorithm: 73686131
     - 	num_commits: $1
     - 	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
     - 	EOF
      @@
       	)
       '
       
     -+test_expect_success 'write v1 graph' '
     -+	git commit-graph write --reachable --version=1 &&
     -+	git commit-graph verify
     -+'
     -+
     -+graph_git_behavior 'version 1 graph, commit 8 vs merge 2' full commits/8 merge/2
     -+graph_git_behavior 'version 1 graph, commit 8 vs merge 2' full commits/8 merge/2
     ++graph_read_expect_v2 () {
     ++	OPTIONAL=""
     ++	NUM_CHUNKS=3
     ++	if test ! -z $2
     ++	then
     ++		OPTIONAL=" $2"
     ++		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
     ++	fi	
     ++	cat >expect <<- EOF
     ++	header: 43475048 2 $NUM_CHUNKS 1 0
     ++	hash algorithm: 73686131
     ++	num_commits: $1
     ++	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
     ++	EOF
     ++	git commit-graph read >output &&
     ++	test_cmp expect output
     ++}
      +
     -+test_expect_success 'upgrade from v1 to v2' '
     -+	git checkout -b new-commit-for-upgrade &&
     -+	test_commit force-upgrade &&
     ++test_expect_success 'write v2 graph' '
     ++	cd "$TRASH_DIRECTORY/full" &&
      +	git commit-graph write --reachable --version=2 &&
     ++	graph_read_expect_v2 11 extra_edges &&
      +	git commit-graph verify
      +'
      +
     -+graph_git_behavior 'upgraded graph, commit 8 vs merge 2' full commits/8 merge/2
     -+graph_git_behavior 'upgraded graph, commit 8 vs merge 2' full commits/8 merge/2
     ++graph_git_behavior 'version 2 graph, commit 8 vs merge 2' full commits/8 merge/2
     ++graph_git_behavior 'version 2 graph, commit 8 vs merge 2' full commits/8 merge/2
      +
       # the verify tests below expect the commit-graph to contain
       # exactly the commits reachable from the commits/8 branch.
 -:  ---------- > 6:  e72bca6c78 commit-graph: remove Future Work section

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v3 1/6] commit-graph: return with errors during write
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
@ 2019-05-01 13:11     ` Derrick Stolee via GitGitGadget
  2019-05-01 14:46       ` Ævar Arnfjörð Bjarmason
  2019-05-01 13:11     ` [PATCH v3 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-01 13:11 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 19 +++++++------
 builtin/commit.c       |  5 ++--
 builtin/gc.c           |  7 ++---
 commit-graph.c         | 60 +++++++++++++++++++++++++++++-------------
 commit-graph.h         | 10 +++----
 5 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 537fdfd0f0..2e86251f02 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *pack_indexes = NULL;
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
+	int result;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
 
 	read_replace_refs = 0;
 
-	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-		return 0;
-	}
+	if (opts.reachable)
+		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
 		UNLEAK(buf);
 	}
 
-	write_commit_graph(opts.obj_dir,
-			   pack_indexes,
-			   commit_hex,
-			   opts.append,
-			   1);
+	result = write_commit_graph(opts.obj_dir,
+				    pack_indexes,
+				    commit_hex,
+				    opts.append,
+				    1);
 
 	UNLEAK(lines);
-	return 0;
+	return result;
 }
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
diff --git a/builtin/commit.c b/builtin/commit.c
index 2986553d5f..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "new_index file. Check that disk is not full and quota is\n"
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-		write_commit_graph_reachable(get_object_directory(), 0, 0);
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+		return 1;
 
 	repo_rerere(the_repository, 0);
 	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..3984addf73 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 	}
 
-	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0,
-					     !quiet && !daemonized);
+	if (gc_write_commit_graph &&
+	    write_commit_graph_reachable(get_object_directory(), 0,
+					 !quiet && !daemonized))
+		return 1;
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 66865acbd7..ee487a364b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int append,
+				 int report_progress)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
+	int result;
 
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+	result = write_commit_graph(obj_dir, NULL, &list,
+				    append, report_progress);
 
 	string_list_clear(&list, 0);
+	return result;
 }
 
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress)
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
 	struct hashfile *f;
 	uint32_t i, count_distinct = 0;
-	char *graph_name;
+	char *graph_name = NULL;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
@@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
+	int res = 0;
 
 	if (!commit_graph_compatible(the_repository))
-		return;
+		return 0;
 
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
+	commits.list = NULL;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
 			strbuf_setlen(&packname, dirlen);
 			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
-			if (!p)
-				die(_("error adding pack %s"), packname.buf);
-			if (open_pack_index(p))
-				die(_("error opening index for %s"), packname.buf);
+			if (!p) {
+				error(_("error adding pack %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
+			if (open_pack_index(p)) {
+				error(_("error opening index for %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
 			for_each_object_in_pack(p, add_packed_commits, &oids,
 						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
@@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
 	}
 	stop_progress(&progress);
 
-	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
-		die(_("the commit graph format cannot write %d commits"), count_distinct);
+	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
+		error(_("the commit graph format cannot write %d commits"), count_distinct);
+		res = 1;
+		goto cleanup;
+	}
 
 	commits.nr = 0;
 	commits.alloc = count_distinct;
@@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
 	num_chunks = num_extra_edges ? 4 : 3;
 	stop_progress(&progress);
 
-	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
-		die(_("too many commits to write graph"));
+	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
+		error(_("too many commits to write graph"));
+		res = 1;
+		goto cleanup;
+	}
 
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name)) {
 		UNLEAK(graph_name);
-		die_errno(_("unable to create leading directories of %s"),
-			  graph_name);
+		error(_("unable to create leading directories of %s"),
+			graph_name);
+		res = errno;
+		goto cleanup;
 	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
@@ -1107,9 +1126,12 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+cleanup:
 	free(graph_name);
 	free(commits.list);
 	free(oids.list);
+
+	return res;
 }
 
 #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
diff --git a/commit-graph.h b/commit-graph.h
index 7dfb8c896f..d15670bf46 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,12 +65,12 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
+int write_commit_graph_reachable(const char *obj_dir, int append,
 				  int report_progress);
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress);
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v3 2/6] commit-graph: collapse parameters into flags
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
  2019-05-01 13:11     ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-05-01 13:11     ` Derrick Stolee via GitGitGadget
  2019-05-01 13:11     ` [PATCH v3 3/6] commit-graph: create new version parameter Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-01 13:11 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
We will soon expand the possible options to send to these methods, so
instead of complicating the parameter list, first simplify it.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 8 +++++---
 builtin/commit.c       | 2 +-
 builtin/gc.c           | 4 ++--
 commit-graph.c         | 9 +++++----
 commit-graph.h         | 8 +++++---
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 2e86251f02..828b1a713f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
 	int result;
+	int flags = COMMIT_GRAPH_PROGRESS;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv)
 		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
+	if (opts.append)
+		flags |= COMMIT_GRAPH_APPEND;
 
 	read_replace_refs = 0;
 
 	if (opts.reachable)
-		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
+		return write_commit_graph_reachable(opts.obj_dir, flags);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv)
 	result = write_commit_graph(opts.obj_dir,
 				    pack_indexes,
 				    commit_hex,
-				    opts.append,
-				    1);
+				    flags);
 
 	UNLEAK(lines);
 	return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index b9ea7222fa..b001ef565d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+	    write_commit_graph_reachable(get_object_directory(), 0))
 		return 1;
 
 	repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index 3984addf73..df2573f124 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	}
 
 	if (gc_write_commit_graph &&
-	    write_commit_graph_reachable(get_object_directory(), 0,
-					 !quiet && !daemonized))
+	    write_commit_graph_reachable(get_object_directory(),
+					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
 		return 1;
 
 	if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index ee487a364b..8bbd50658c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				 int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
 	for_each_ref(add_ref_to_list, &list);
 	result = write_commit_graph(obj_dir, NULL, &list,
-				    append, report_progress);
+				    flags);
 
 	string_list_clear(&list, 0);
 	return result;
@@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int append,
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress)
+		       unsigned int flags)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir,
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
 	int res = 0;
+	int append = flags & COMMIT_GRAPH_APPEND;
+	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
diff --git a/commit-graph.h b/commit-graph.h
index d15670bf46..70f4caf0c7 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,12 +65,14 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress);
+#define COMMIT_GRAPH_APPEND     (1 << 0)
+#define COMMIT_GRAPH_PROGRESS   (1 << 1)
+
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags);
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress);
+		       unsigned int flags);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v3 3/6] commit-graph: create new version parameter
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
  2019-05-01 13:11     ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
  2019-05-01 13:11     ` [PATCH v3 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
@ 2019-05-01 13:11     ` Derrick Stolee via GitGitGadget
  2019-05-01 13:11     ` [PATCH v3 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-01 13:11 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In anticipation of a new commit-graph file format version, create
a parameter for the write_commit_graph() and write_commit_graph_reachable()
methods to take a version number.

When the given version is zero, the implementation selects a
default value. Currently, the only valid value is 1.

The file format will change the header information, so place the
existing header logic inside a switch statement with only one case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c |  4 +--
 builtin/commit.c       |  2 +-
 builtin/gc.c           |  3 +-
 commit-graph.c         | 63 +++++++++++++++++++++++++++---------------
 commit-graph.h         |  6 ++--
 5 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 828b1a713f..7d9185dfc2 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -173,7 +173,7 @@ static int graph_write(int argc, const char **argv)
 	read_replace_refs = 0;
 
 	if (opts.reachable)
-		return write_commit_graph_reachable(opts.obj_dir, flags);
+		return write_commit_graph_reachable(opts.obj_dir, flags, 0);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -193,7 +193,7 @@ static int graph_write(int argc, const char **argv)
 	result = write_commit_graph(opts.obj_dir,
 				    pack_indexes,
 				    commit_hex,
-				    flags);
+				    flags, 0);
 
 	UNLEAK(lines);
 	return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index b001ef565d..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-	    write_commit_graph_reachable(get_object_directory(), 0))
+	    write_commit_graph_reachable(get_object_directory(), 0, 0))
 		return 1;
 
 	repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index df2573f124..41637242b1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -666,7 +666,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	if (gc_write_commit_graph &&
 	    write_commit_graph_reachable(get_object_directory(),
-					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
+					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0,
+					 0))
 		return 1;
 
 	if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index 8bbd50658c..b6f09f1be2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -25,9 +25,6 @@
 
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
-#define GRAPH_VERSION_1 0x1
-#define GRAPH_VERSION GRAPH_VERSION_1
-
 #define GRAPH_EXTRA_EDGES_NEEDED 0x80000000
 #define GRAPH_EDGE_LAST_MASK 0x7fffffff
 #define GRAPH_PARENT_NONE 0x70000000
@@ -173,30 +170,35 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != GRAPH_VERSION) {
+	if (graph_version != 1) {
 		error(_("commit-graph version %X does not match version %X"),
-		      graph_version, GRAPH_VERSION);
-		return NULL;
-	}
-
-	hash_version = *(unsigned char*)(data + 5);
-	if (hash_version != oid_version()) {
-		error(_("commit-graph hash version %X does not match version %X"),
-		      hash_version, oid_version());
+		      graph_version, 1);
 		return NULL;
 	}
 
 	graph = alloc_commit_graph();
 
+	switch (graph_version) {
+	case 1:
+		hash_version = *(unsigned char*)(data + 5);
+		if (hash_version != oid_version()) {
+			error(_("commit-graph hash version %X does not match version %X"),
+			      hash_version, oid_version());
+			return NULL;
+		}
+
+		graph->num_chunks = *(unsigned char*)(data + 6);
+		chunk_lookup = data + 8;
+		break;
+	}
+
 	graph->hash_len = the_hash_algo->rawsz;
-	graph->num_chunks = *(unsigned char*)(data + 6);
 	graph->graph_fd = fd;
 	graph->data = graph_map;
 	graph->data_len = graph_size;
 
 	last_chunk_id = 0;
 	last_chunk_offset = 8;
-	chunk_lookup = data + 8;
 	for (i = 0; i < graph->num_chunks; i++) {
 		uint32_t chunk_id;
 		uint64_t chunk_offset;
@@ -851,14 +853,15 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags,
+				 unsigned char version)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
 	for_each_ref(add_ref_to_list, &list);
 	result = write_commit_graph(obj_dir, NULL, &list,
-				    flags);
+				    flags, version);
 
 	string_list_clear(&list, 0);
 	return result;
@@ -867,7 +870,8 @@ int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       unsigned int flags)
+		       unsigned int flags,
+		       unsigned char version)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -888,10 +892,19 @@ int write_commit_graph(const char *obj_dir,
 	int res = 0;
 	int append = flags & COMMIT_GRAPH_APPEND;
 	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
+	int header_size = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
 
+	if (!version)
+		version = 1;
+	if (version != 1) {
+		error(_("unsupported commit-graph version %d"),
+		      version);
+		return 1;
+	}
+
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
@@ -1076,10 +1089,16 @@ int write_commit_graph(const char *obj_dir,
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
-	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, oid_version());
-	hashwrite_u8(f, num_chunks);
-	hashwrite_u8(f, 0); /* unused padding byte */
+	hashwrite_u8(f, version);
+
+	switch (version) {
+	case 1:
+		hashwrite_u8(f, oid_version());
+		hashwrite_u8(f, num_chunks);
+		hashwrite_u8(f, 0); /* unused padding byte */
+		header_size = 8;
+		break;
+	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
@@ -1090,7 +1109,7 @@ int write_commit_graph(const char *obj_dir,
 		chunk_ids[3] = 0;
 	chunk_ids[4] = 0;
 
-	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[0] = header_size + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
 	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
 	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
diff --git a/commit-graph.h b/commit-graph.h
index 70f4caf0c7..d64a2cc78c 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -68,11 +68,13 @@ int generation_numbers_enabled(struct repository *r);
 #define COMMIT_GRAPH_APPEND     (1 << 0)
 #define COMMIT_GRAPH_PROGRESS   (1 << 1)
 
-int write_commit_graph_reachable(const char *obj_dir, unsigned int flags);
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags,
+				 unsigned char version);
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       unsigned int flags);
+		       unsigned int flags,
+		       unsigned char version);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v3 4/6] commit-graph: add --version=<n> option
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2019-05-01 13:11     ` [PATCH v3 3/6] commit-graph: create new version parameter Derrick Stolee via GitGitGadget
@ 2019-05-01 13:11     ` Derrick Stolee via GitGitGadget
  2019-05-01 13:11     ` [PATCH v3 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-01 13:11 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Allow the commit-graph builtin to specify the file format version
using the '--version=<n>' option. Specify the version exactly in
the verification tests as using a different version would change
the offsets used in those tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  3 +++
 builtin/commit-graph.c             | 13 ++++++++-----
 t/t5318-commit-graph.sh            |  2 +-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 624470e198..1d1cc70de4 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -51,6 +51,9 @@ or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
++
+With the `--version=<n>` option, specify the file format version. Used
+only for testing.
 
 'read'::
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 7d9185dfc2..e766dd076e 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph verify [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=<n>]"),
 	NULL
 };
 
@@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=<n>]"),
 	NULL
 };
 
@@ -35,6 +35,7 @@ static struct opts_commit_graph {
 	int stdin_packs;
 	int stdin_commits;
 	int append;
+	int version;
 } opts;
 
 
@@ -156,6 +157,8 @@ static int graph_write(int argc, const char **argv)
 			N_("start walk at commits listed by stdin")),
 		OPT_BOOL(0, "append", &opts.append,
 			N_("include all commits already in the commit-graph file")),
+		OPT_INTEGER(0, "version", &opts.version,
+			N_("specify the file format version")),
 		OPT_END(),
 	};
 
@@ -168,12 +171,12 @@ static int graph_write(int argc, const char **argv)
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 	if (opts.append)
-		flags |= COMMIT_GRAPH_APPEND;
+		flags |= COMMIT_GRAPH_APPEND;	
 
 	read_replace_refs = 0;
 
 	if (opts.reachable)
-		return write_commit_graph_reachable(opts.obj_dir, flags, 0);
+		return write_commit_graph_reachable(opts.obj_dir, flags, opts.version);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -193,7 +196,7 @@ static int graph_write(int argc, const char **argv)
 	result = write_commit_graph(opts.obj_dir,
 				    pack_indexes,
 				    commit_hex,
-				    flags, 0);
+				    flags, opts.version);
 
 	UNLEAK(lines);
 	return result;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e80c1cac02..4eb5a09ef3 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -328,7 +328,7 @@ test_expect_success 'replace-objects invalidates commit-graph' '
 
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
-	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
+	git rev-parse commits/8 | git commit-graph write --stdin-commits --version=1 &&
 	git commit-graph verify >output
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v3 5/6] commit-graph: implement file format version 2
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2019-05-01 13:11     ` [PATCH v3 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
@ 2019-05-01 13:11     ` Derrick Stolee via GitGitGadget
  2019-05-01 19:12       ` Ævar Arnfjörð Bjarmason
  2019-05-01 13:11     ` [PATCH v3 6/6] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-01 13:11 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph file format had some shortcomings which we now
correct:

  1. The hash algorithm was determined by a single byte, instead
     of the 4-byte format identifier.

  2. There was no way to update the reachability index we used.
     We currently only support generation numbers, but that will
     change in the future.

  3. Git did not fail with error if the unused eighth byte was
     non-zero, so we could not use that to indicate an incremental
     file format without breaking compatibility across versions.

The new format modifies the header of the commit-graph to solve
these problems. We use the 4-byte hash format id, freeing up a byte
in our 32-bit alignment to introduce a reachability index version.
We can also fail to read the commit-graph if the eighth byte is
non-zero.

Update the 'git commit-graph read' subcommand to display the new
data, and check this output in the test that explicitly writes a
v2 commit-graph file.

While we converted the existing 'verify' tests to use a version 1
file to avoid recalculating data offsets, add explicit 'verify'
tests on a version 2 file that corrupt the new header values.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .../technical/commit-graph-format.txt         | 26 ++++++-
 builtin/commit-graph.c                        |  5 ++
 commit-graph.c                                | 39 +++++++++-
 t/t5318-commit-graph.sh                       | 73 +++++++++++++++++--
 4 files changed, 134 insertions(+), 9 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 16452a0504..e367aa94b1 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -31,13 +31,22 @@ and hash type.
 
 All 4-byte numbers are in network order.
 
+There are two versions available, 1 and 2. These currently differ only in
+the header.
+
 HEADER:
 
+All commit-graph files use the first five bytes for the same purpose.
+
   4-byte signature:
       The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-      Currently, the only valid version is 1.
+      Currently, the valid version numbers are 1 and 2.
+
+The remainder of the header changes depending on the version.
+
+Version 1:
 
   1-byte Hash Version (1 = SHA-1)
       We infer the hash length (H) from this value.
@@ -47,6 +56,21 @@ HEADER:
   1-byte (reserved for later use)
      Current clients should ignore this value.
 
+Version 2:
+
+  1-byte number (C) of "chunks"
+
+  1-byte reachability index version number:
+      Currently, the only valid number is 1.
+
+  1-byte (reserved for later use)
+      Current clients expect this value to be zero, and will not
+      try to read the commit-graph file if it is non-zero.
+
+  4-byte format identifier for the hash algorithm:
+      If this identifier does not agree with the repository's current
+      hash algorithm, then the client will not read the commit graph.
+
 CHUNK LOOKUP:
 
   (C + 1) * 12 bytes listing the table of contents for the chunks:
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index e766dd076e..7df6688b08 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -117,6 +117,11 @@ static int graph_read(int argc, const char **argv)
 		*(unsigned char*)(graph->data + 5),
 		*(unsigned char*)(graph->data + 6),
 		*(unsigned char*)(graph->data + 7));
+
+	if (*(unsigned char *)(graph->data + 4) == 2)
+		printf("hash algorithm: %X\n",
+		       get_be32(graph->data + 8));
+
 	printf("num_commits: %u\n", graph->num_commits);
 	printf("chunks:");
 
diff --git a/commit-graph.c b/commit-graph.c
index b6f09f1be2..5eebba6a0f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -152,7 +152,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	uint64_t last_chunk_offset;
 	uint32_t last_chunk_id;
 	uint32_t graph_signature;
-	unsigned char graph_version, hash_version;
+	unsigned char graph_version, hash_version, reach_index_version;
+	uint32_t hash_id;
 
 	if (!graph_map)
 		return NULL;
@@ -170,7 +171,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	}
 
 	graph_version = *(unsigned char*)(data + 4);
-	if (graph_version != 1) {
+	if (!graph_version || graph_version > 2) {
 		error(_("commit-graph version %X does not match version %X"),
 		      graph_version, 1);
 		return NULL;
@@ -190,6 +191,30 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 		graph->num_chunks = *(unsigned char*)(data + 6);
 		chunk_lookup = data + 8;
 		break;
+
+	case 2:
+		graph->num_chunks = *(unsigned char *)(data + 5);
+
+		reach_index_version = *(unsigned char *)(data + 6);
+		if (reach_index_version != 1) {
+			error(_("unsupported reachability index version %d"),
+			      reach_index_version);
+			return NULL;
+		}
+
+		if (*(unsigned char*)(data + 7)) {
+			error(_("unsupported value in commit-graph header"));
+			return NULL;
+		}
+
+		hash_id = get_be32(data + 8);
+		if (hash_id != the_hash_algo->format_id) {
+			error(_("commit-graph hash algorithm does not match current algorithm"));
+			return NULL;
+		}
+
+		chunk_lookup = data + 12;
+		break;
 	}
 
 	graph->hash_len = the_hash_algo->rawsz;
@@ -899,7 +924,7 @@ int write_commit_graph(const char *obj_dir,
 
 	if (!version)
 		version = 1;
-	if (version != 1) {
+	if (version <= 0 || version > 2) {
 		error(_("unsupported commit-graph version %d"),
 		      version);
 		return 1;
@@ -1098,6 +1123,14 @@ int write_commit_graph(const char *obj_dir,
 		hashwrite_u8(f, 0); /* unused padding byte */
 		header_size = 8;
 		break;
+
+	case 2:
+		hashwrite_u8(f, num_chunks);
+		hashwrite_u8(f, 1); /* reachability index version */
+		hashwrite_u8(f, 0); /* unused padding byte */
+		hashwrite_be32(f, the_hash_algo->format_id);
+		header_size = 12;
+		break;
 	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4eb5a09ef3..373a6cd0d4 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -33,13 +33,13 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
-graph_git_two_modes() {
+graph_git_two_modes () {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
 	test_cmp expect output
 }
 
-graph_git_behavior() {
+graph_git_behavior () {
 	MSG=$1
 	DIR=$2
 	BRANCH=$3
@@ -56,7 +56,7 @@ graph_git_behavior() {
 
 graph_git_behavior 'no graph' full commits/3 commits/1
 
-graph_read_expect() {
+graph_read_expect () {
 	OPTIONAL=""
 	NUM_CHUNKS=3
 	if test ! -z $2
@@ -320,6 +320,34 @@ test_expect_success 'replace-objects invalidates commit-graph' '
 	)
 '
 
+graph_read_expect_v2 () {
+	OPTIONAL=""
+	NUM_CHUNKS=3
+	if test ! -z $2
+	then
+		OPTIONAL=" $2"
+		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
+	fi	
+	cat >expect <<- EOF
+	header: 43475048 2 $NUM_CHUNKS 1 0
+	hash algorithm: 73686131
+	num_commits: $1
+	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+	EOF
+	git commit-graph read >output &&
+	test_cmp expect output
+}
+
+test_expect_success 'write v2 graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --reachable --version=2 &&
+	graph_read_expect_v2 11 extra_edges &&
+	git commit-graph verify
+'
+
+graph_git_behavior 'version 2 graph, commit 8 vs merge 2' full commits/8 merge/2
+graph_git_behavior 'version 2 graph, commit 8 vs merge 2' full commits/8 merge/2
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
@@ -392,7 +420,7 @@ corrupt_graph_verify() {
 # starting at <zero_pos>, then runs 'git commit-graph verify'
 # and places the output in the file 'err'. Test 'err' for
 # the given string.
-corrupt_graph_and_verify() {
+corrupt_graph_and_verify () {
 	pos=$1
 	data="${2:-\0}"
 	grepstr=$3
@@ -424,10 +452,14 @@ test_expect_success 'detect bad signature' '
 '
 
 test_expect_success 'detect bad version' '
-	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\03" \
 		"graph version"
 '
 
+test_expect_success 'detect version 2 with version 1 data' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"reachability index version"
+'
 test_expect_success 'detect bad hash version' '
 	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
 		"hash version"
@@ -532,6 +564,37 @@ test_expect_success 'git fsck (checks commit-graph)' '
 	test_must_fail git fsck
 '
 
+test_expect_success 'rewrite commmit-graph with version 2' '
+	rm -f .git/objects/info/commit-graph &&
+	git commit-graph write --reachable --version=2 &&
+	git commit-graph verify
+'
+
+GRAPH_BYTE_CHUNK_COUNT=5
+GRAPH_BYTE_REACH_INDEX=6
+GRAPH_BYTE_UNUSED=7
+GRAPH_BYTE_HASH=8
+
+test_expect_success 'detect low chunk count (v2)' '
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+		"missing the .* chunk"
+'
+
+test_expect_success 'detect incorrect reachability index' '
+	corrupt_graph_and_verify $GRAPH_BYTE_REACH_INDEX "\03" \
+		"reachability index version"
+'
+
+test_expect_success 'detect non-zero unused byte' '
+	corrupt_graph_and_verify $GRAPH_BYTE_UNUSED "\01" \
+		"unsupported value"
+'
+
+test_expect_success 'detect bad hash version (v2)' '
+	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\00" \
+		"hash algorithm"
+'
+
 test_expect_success 'setup non-the_repository tests' '
 	rm -rf repo &&
 	git init repo &&
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v3 6/6] commit-graph: remove Future Work section
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2019-05-01 13:11     ` [PATCH v3 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
@ 2019-05-01 13:11     ` Derrick Stolee via GitGitGadget
  2019-05-01 14:58       ` Ævar Arnfjörð Bjarmason
  2019-05-01 20:25     ` [PATCH v3 0/6] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
  7 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-01 13:11 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph feature began with a long list of planned
benefits, most of which are now complete. The future work
section has only a few items left.

As for making more algorithms aware of generation numbers,
some are only waiting for generation number v2 to ensure the
performance matches the existing behavior using commit date.

It is unlikely that we will ever send a commit-graph file
as part of the protocol, since we would need to verify the
data, and that is as expensive as writing a commit-graph from
scratch. If we want to start trusting remote content, then
that item can be investigated again.

While there is more work to be done on the feature, having
a section of the docs devoted to a TODO list is wasteful and
hard to keep up-to-date.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index 7805b0968c..fb53341d5e 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -127,23 +127,6 @@ Design Details
   helpful for these clones, anyway. The commit-graph will not be read or
   written when shallow commits are present.
 
-Future Work
------------
-
-- After computing and storing generation numbers, we must make graph
-  walks aware of generation numbers to gain the performance benefits they
-  enable. This will mostly be accomplished by swapping a commit-date-ordered
-  priority queue with one ordered by generation number. The following
-  operations are important candidates:
-
-    - 'log --topo-order'
-    - 'tag --merged'
-
-- A server could provide a commit-graph file as part of the network protocol
-  to avoid extra calculations by clients. This feature is only of benefit if
-  the user is willing to trust the file, because verifying the file is correct
-  is as hard as computing it from scratch.
-
 Related Links
 -------------
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 1/6] commit-graph: return with errors during write
  2019-05-01 13:11     ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-05-01 14:46       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-01 14:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee


On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() method uses die() to report failure and
> exit when confronted with an unexpected condition. This use of
> die() in a library function is incorrect and is now replaced by
> error() statements and an int return type.
>
> Now that we use 'goto cleanup' to jump to the terminal condition
> on an error, we have new paths that could lead to uninitialized
> values. New initializers are added to correct for this.
>
> The builtins 'commit-graph', 'gc', and 'commit' call these methods,
> so update them to check the return value.

Seems good to have a test to check for some of this behavior. I see that
can be done as just:

    echo doesnotexist | git commit-graph write --stdin-packs

And checking the exit code is 1 as it is now, not 128.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 6/6] commit-graph: remove Future Work section
  2019-05-01 13:11     ` [PATCH v3 6/6] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
@ 2019-05-01 14:58       ` Ævar Arnfjörð Bjarmason
  2019-05-01 19:59         ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-01 14:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee


On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:

> The commit-graph feature began with a long list of planned
> benefits, most of which are now complete. The future work
> section has only a few items left.
>
> As for making more algorithms aware of generation numbers,
> some are only waiting for generation number v2 to ensure the
> performance matches the existing behavior using commit date.
>
> It is unlikely that we will ever send a commit-graph file
> as part of the protocol, since we would need to verify the
> data, and that is as expensive as writing a commit-graph from
> scratch. If we want to start trusting remote content, then
> that item can be investigated again.

My best of 3 times for "write" followed by "verify" on linux.git are
8.7/7.9 real/user for "write" and 5.2/4.9 real/user for "write".

So that's a reduction of ~40%. I have another big in-house repo where I
get similar numbers of 17/16 for "write" and 10/9 for "verify". Both for
a commit-graph file on the order of 50MB where it would be quicker for
me to download and verify it if the protocol supported it.

I'm not clamoring to make it part of the protocol, but the claim that
"verify" needs to do the equivalent of "write" seems to be demonstrably
wrong, or perhaps "verify" isn't doing all the work it should be doing?

> While there is more work to be done on the feature, having
> a section of the docs devoted to a TODO list is wasteful and
> hard to keep up-to-date.

Agreed, whatever we decide to do in the future I think it makes sense to
remove this section from the docs, although perhaps the commit message
should be amended per the above :)

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/commit-graph.txt | 17 -----------------
>  1 file changed, 17 deletions(-)
>
> diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
> index 7805b0968c..fb53341d5e 100644
> --- a/Documentation/technical/commit-graph.txt
> +++ b/Documentation/technical/commit-graph.txt
> @@ -127,23 +127,6 @@ Design Details
>    helpful for these clones, anyway. The commit-graph will not be read or
>    written when shallow commits are present.
>
> -Future Work
> ------------
> -
> -- After computing and storing generation numbers, we must make graph
> -  walks aware of generation numbers to gain the performance benefits they
> -  enable. This will mostly be accomplished by swapping a commit-date-ordered
> -  priority queue with one ordered by generation number. The following
> -  operations are important candidates:
> -
> -    - 'log --topo-order'
> -    - 'tag --merged'
> -
> -- A server could provide a commit-graph file as part of the network protocol
> -  to avoid extra calculations by clients. This feature is only of benefit if
> -  the user is willing to trust the file, because verifying the file is correct
> -  is as hard as computing it from scratch.
> -
>  Related Links
>  -------------
>  [0] https://bugs.chromium.org/p/git/issues/detail?id=8

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 5/6] commit-graph: implement file format version 2
  2019-05-01 13:11     ` [PATCH v3 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
@ 2019-05-01 19:12       ` Ævar Arnfjörð Bjarmason
  2019-05-01 19:56         ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-01 19:12 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee


On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:

>   3. Git did not fail with error if the unused eighth byte was
>      non-zero, so we could not use that to indicate an incremental
>      file format without breaking compatibility across versions.

This isn't new, I just missed this the last time around. I don't see how
this makes any sense.

On the current v1 graph code you can apply this patch and everything
continues to work because we ignore this padding byte:

    -       hashwrite_u8(f, 0); /* unused padding byte */
    +       hashwrite_u8(f, 1); /* unused padding byte */

But now after ab/commit-graph-fixes just got finished fixed the version
bump being a hard error of:

    error: graph version 2 does not match version 1

This v2 code is basically, as I understand it, introducing two ways of
expressing the version, so e.g. we might have v2 graphs with "0" here
changed to "1" for an incremental version of "2.1".

OK, let's try that then, on top of this series:

    diff --git a/commit-graph.c b/commit-graph.c
    index 5eebba6a0f..36c8cdb950 100644
    --- a/commit-graph.c
    +++ b/commit-graph.c
    @@ -1127,7 +1127,7 @@ int write_commit_graph(const char *obj_dir,
            case 2:
                    hashwrite_u8(f, num_chunks);
                    hashwrite_u8(f, 1); /* reachability index version */
    -               hashwrite_u8(f, 0); /* unused padding byte */
    +               hashwrite_u8(f, 1); /* unused padding byte */

Then:

    $ ~/g/git/git --exec-path=$PWD commit-graph write --version=2; ~/g/git/git --exec-path=$PWD status
    Expanding reachable commits in commit graph: 100% (201645/201645), done.
    Computing commit graph generation numbers: 100% (200556/200556), done.
    error: unsupported value in commit-graph header
    HEAD detached at pr-112/derrickstolee/graph/v2-head-v3

So we'll error out in the same way as if "2.0" was changed to "3.0" with
this "2.1" change, just with a more cryptic error message on this new v2
code.

I don't see how this is meaningfully different from just bumping the
version to "3". We abort parsing the graph just like with major version
changes.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 5/6] commit-graph: implement file format version 2
  2019-05-01 19:12       ` Ævar Arnfjörð Bjarmason
@ 2019-05-01 19:56         ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-05-01 19:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee

On 5/1/2019 3:12 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> OK, let's try that then, on top of this series:
> 
>     diff --git a/commit-graph.c b/commit-graph.c
>     index 5eebba6a0f..36c8cdb950 100644
>     --- a/commit-graph.c
>     +++ b/commit-graph.c
>     @@ -1127,7 +1127,7 @@ int write_commit_graph(const char *obj_dir,
>             case 2:
>                     hashwrite_u8(f, num_chunks);
>                     hashwrite_u8(f, 1); /* reachability index version */
>     -               hashwrite_u8(f, 0); /* unused padding byte */
>     +               hashwrite_u8(f, 1); /* unused padding byte */
> 
> Then:
> 
>     $ ~/g/git/git --exec-path=$PWD commit-graph write --version=2; ~/g/git/git --exec-path=$PWD status
>     Expanding reachable commits in commit graph: 100% (201645/201645), done.
>     Computing commit graph generation numbers: 100% (200556/200556), done.
>     error: unsupported value in commit-graph header
>     HEAD detached at pr-112/derrickstolee/graph/v2-head-v3
> 
> So we'll error out in the same way as if "2.0" was changed to "3.0" with
> this "2.1" change, just with a more cryptic error message on this new v2
> code.
> 
> I don't see how this is meaningfully different from just bumping the
> version to "3". We abort parsing the graph just like with major version
> changes.

Having a non-zero value here doesn't really mean "2.1" or "3". But I understand
your apprehension.

I'm currently working on building the incremental file format, based on
this series. This "unused" byte will be used to say "how many base commit-graph
files does this graph have?" If non-zero, we do not currently understand
how to stitch these files together into a "combined" graph at run time,
so we should fail.

If we should never have an intermediate version of Git that doesn't
understand this byte, then this series can wait until that feature is ready.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 6/6] commit-graph: remove Future Work section
  2019-05-01 14:58       ` Ævar Arnfjörð Bjarmason
@ 2019-05-01 19:59         ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-05-01 19:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano, Derrick Stolee

On 5/1/2019 10:58 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:
> 
>> The commit-graph feature began with a long list of planned
>> benefits, most of which are now complete. The future work
>> section has only a few items left.
>>
>> As for making more algorithms aware of generation numbers,
>> some are only waiting for generation number v2 to ensure the
>> performance matches the existing behavior using commit date.
>>
>> It is unlikely that we will ever send a commit-graph file
>> as part of the protocol, since we would need to verify the
>> data, and that is as expensive as writing a commit-graph from
>> scratch. If we want to start trusting remote content, then
>> that item can be investigated again.
> 
> My best of 3 times for "write" followed by "verify" on linux.git are
> 8.7/7.9 real/user for "write" and 5.2/4.9 real/user for "write".
> 
> So that's a reduction of ~40%. I have another big in-house repo where I
> get similar numbers of 17/16 for "write" and 10/9 for "verify". Both for
> a commit-graph file on the order of 50MB where it would be quicker for
> me to download and verify it if the protocol supported it.

Keep in mind that your first "write" may have warmed up the file system
and your pack-files parsed faster the second time around.

You are right though, 'verify' doesn't do these things:

1. Sort a list of OIDs.
2. Write a file.

And perhaps some other things. I should mean that "the main task of
'git commit-graph verify' is to parse commits from the object store,
and this is the most expensive operation in 'git commit-graph write'."

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2019-05-01 13:11     ` [PATCH v3 6/6] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
@ 2019-05-01 20:25     ` Ævar Arnfjörð Bjarmason
  2019-05-02 13:26       ` Derrick Stolee
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
  7 siblings, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-01 20:25 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, sandals, Junio C Hamano


On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:

> The commit-graph file format has some shortcomings that were discussed
> on-list:
>
>  1. It doesn't use the 4-byte format ID from the_hash_algo.
>
>  2. There is no way to change the reachability index from generation numbers
>     to corrected commit date [1].
>
>  3. The unused byte in the format could be used to signal the file is
>     incremental, but current clients ignore the value even if it is
>     non-zero.
>
> This series adds a new version (2) to the commit-graph file. The fifth byte
> already specified the file format, so existing clients will gracefully
> respond to files with a different version number. The only real change now
> is that the header takes 12 bytes instead of 8, due to using the 4-byte
> format ID for the hash algorithm.
>
> The new bytes reserved for the reachability index version and incremental
> file formats are now expected to be equal to the defaults. When we update
> these values to be flexible in the future, if a client understands
> commit-graph v2 but not those new values, then it will fail gracefully.
>
> NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
> were significant and subtle.
>
> Updates in V3: Thanks for all the feedback so far!
>
>  * Moved the version information into an unsigned char parameter, instead of
>    a flag.
>
>  * We no longer default to the v2 file format, as that will break users who
>    downgrade. This required some changes to the test script.
>
>  * Removed the "Future work" section from the commit-graph design document
>    in a new patch.
>
>  * I did not change the file name for v2 file formats, as Ævar suggested.
>    I'd like the discussion to continue on this topic.

I won't repeat my outstanding v2 feedback about v1 & v2
incompatibilities, except to say that I'd in principle be fine with
having a v2 format the way this series is adding it. I.e. saying "here
it is, it's never written by default, we'll figure out these compat
issues later".

My only objection/nit on that point would be that the current
docs/commit messages should make some mention of the really bad
interactions between v1 and v2 on different git versions.

However, having written this again I really don't understand why we need
a v2 of this format at all.

The current format is:

    <CGPH signature>
    <CGPH version = 1>
    <hash version (0..255) where 1 == SHA-1>
    <num chunks (0..255)>
    <reserved byte ignored>
    [chunk offsets for our $num_chunks]
    [arbitrary chunk data for our $num_chunks]

And you want to change it to:

    <CGPH signature>
    <CGPH version = 2>
    <num chunks (0..255)>
    <reachability index version, hard error on values != 1 (should have seen this in my [1])>
    <reserved byte hard error on values != 0 [1]>
    <hash version 32 bit. So 0x73686131 = "sha1" instead of "1">
    [chunk offsets for our $num_chunks]
    [arbitrary chunk data for our $num_chunks]

Where "chunks" in the v1 format has always been a non-exhaustive list of
things *where we ignore anything we don't know about*.

So given our really bad compatibility issues with any non-v1 format I
suggested "let's use a different filename". But on closer look I retract
that.

How about we instead just don't change the header? I.e.:

 * Let's just live with "1" as the marker for SHA-1.

   Yeah it would be cute to use 0x73686131 instead like "struct
   git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
   2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
   be running into the 255 limit of hash algorithms Git will support any
   time soon.

 * Don't add the reachability index version *to the header* or change
   the reserved byte to be an error (see [1] again).

Instead we just add these things to new "chunks" as appropriate. As this
patch of mine shows we can easily do that, and it doesn't error out on
any existing version of git:
https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95

I now can't imagine a situation where we'd ever need to change the
format. We have 32 bits of chunk ids to play with, and can have 255 of
these chunks at a time, and unknown chunks are ignored by existing
versions and future version.

We can even have more than 255 if it comes to that by having a special
"extension" chunk, or even use the existing reserved byte for that and
pull the nasty trick of putting another set after the existing file
checksum, but I digress.

If we ever find that we e.g. don't want to write SHA-1 data anymore but
just want SHA-256 we just write a tiny amount of dummy data. Older git
versions will shrug at what looks like a really incomplete commit graph
data, but newer versions will know the real data is in some other chunk
they know about.

Ditto this "gen numbers or adjusted timestamps?" plan in
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/
We can have a chunk of adjusted timestamps into the generation number
chunk, and even start adding chunks of other side-data, e.g. the path
bloom filters...

This E-Mail needs to stop at some point, but as a brief aside I don't
see how this die("commit-graph hash algorithm does not match current
algorithm") plan makes sense either.

The hash-function-transition.txt plan describes how we'll have an index
of SHA-1<->SHA-256 object names. Why would it be an error to read a
SHA-1 commit-graph under SHA-256? Won't we just say "oh this graph lists
the SHA-1s" and then use that lookup table to resolve them as SHA-256s?

And as discussed once we go for extending things with chunks we can do a
lot better. We'd just keep the SHA-1 data segment, and perhaps (for
"just commits" cache locality) have another chunk of SHA-256 mappings to
those SHA-1s in the commit-graph file.

1. See feedback on the v2 patch in
   https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-01 20:25     ` [PATCH v3 0/6] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
@ 2019-05-02 13:26       ` Derrick Stolee
  2019-05-02 18:02         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee @ 2019-05-02 13:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee via GitGitGadget
  Cc: git, sandals, Junio C Hamano

On 5/1/2019 4:25 PM, Ævar Arnfjörð Bjarmason wrote:
> I won't repeat my outstanding v2 feedback about v1 & v2
> incompatibilities, except to say that I'd in principle be fine with
> having a v2 format the way this series is adding it. I.e. saying "here
> it is, it's never written by default, we'll figure out these compat
> issues later".
> 
> My only objection/nit on that point would be that the current
> docs/commit messages should make some mention of the really bad
> interactions between v1 and v2 on different git versions.

Good idea to add some warnings in the docs to say something like
"version 2 is not supported by Git 2.2x and earlier".

> However, having written this again I really don't understand why we need
> a v2 of this format at all.

[snip]

> How about we instead just don't change the header? I.e.:
> 
>  * Let's just live with "1" as the marker for SHA-1.
> 
>    Yeah it would be cute to use 0x73686131 instead like "struct
>    git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
>    2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
>    be running into the 255 limit of hash algorithms Git will support any
>    time soon.

This was the intended direction of using the 1-byte value before, but
we have a preferred plan to use the 4-byte value in all future file formats.

>  * Don't add the reachability index version *to the header* or change
>    the reserved byte to be an error (see [1] again).

Since we can make the "corrected commit date" offset for a commit be
strictly larger than the offset of a parent, we can make it so an old client
will not give incorrect values when we use the new values. The only downside
would be that we would fail on 'git commit-graph verify' since the offsets
are not actually generation numbers in all cases.

> Instead we just add these things to new "chunks" as appropriate. As this
> patch of mine shows we can easily do that, and it doesn't error out on
> any existing version of git:
> https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95

I like the idea of a "metadata" chunk. This can be useful for a lot of things.
If we start the chunk with a "number of items" and only append items to the
list, we can dynamically grow the chunk as we add values.

> I now can't imagine a situation where we'd ever need to change the
> format. We have 32 bits of chunk ids to play with, and can have 255 of
> these chunks at a time, and unknown chunks are ignored by existing
> versions and future version.

The solutions you have discussed work for 2 of the 3 problems at hand.
The incremental file format is a high-value feature, but _would_ break
existing clients if they don't understand the extra data. Unless I am
missing something for how to succeed here.

> 1. See feedback on the v2 patch in
>    https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/

My response [2] to that message includes the discussion of the
incremental file format.

[2] https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-02 13:26       ` Derrick Stolee
@ 2019-05-02 18:02         ` Ævar Arnfjörð Bjarmason
  2019-05-03 12:47           ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-02 18:02 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, sandals, Junio C Hamano


On Thu, May 02 2019, Derrick Stolee wrote:

> On 5/1/2019 4:25 PM, Ævar Arnfjörð Bjarmason wrote:
>> I won't repeat my outstanding v2 feedback about v1 & v2
>> incompatibilities, except to say that I'd in principle be fine with
>> having a v2 format the way this series is adding it. I.e. saying "here
>> it is, it's never written by default, we'll figure out these compat
>> issues later".
>>
>> My only objection/nit on that point would be that the current
>> docs/commit messages should make some mention of the really bad
>> interactions between v1 and v2 on different git versions.
>
> Good idea to add some warnings in the docs to say something like
> "version 2 is not supported by Git 2.2x and earlier".
>
>> However, having written this again I really don't understand why we need
>> a v2 of this format at all.
>
> [snip]
>
>> How about we instead just don't change the header? I.e.:
>>
>>  * Let's just live with "1" as the marker for SHA-1.
>>
>>    Yeah it would be cute to use 0x73686131 instead like "struct
>>    git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
>>    2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
>>    be running into the 255 limit of hash algorithms Git will support any
>>    time soon.
>
> This was the intended direction of using the 1-byte value before, but
> we have a preferred plan to use the 4-byte value in all future file formats.

Right, and I wouldn't argue about such a pointless thing for a future
file format.

But since the v1->v2 migration story is so unfriendly already for
reasons that can't be helped at this point (existing released versions)
I think we need to weigh the trade-offs of changing the header v.s. just
doing the conceptually less clean thing that allows existing clients a
painless transition.

>>  * Don't add the reachability index version *to the header* or change
>>    the reserved byte to be an error (see [1] again).
>
> Since we can make the "corrected commit date" offset for a commit be
> strictly larger than the offset of a parent, we can make it so an old client
> will not give incorrect values when we use the new values. The only downside
> would be that we would fail on 'git commit-graph verify' since the offsets
> are not actually generation numbers in all cases.

Aren't you talking about how the *content* (presumably in the chunk part
of the graph) is going to look like? I just mean these couple of bytes
in the header, again as a proxy discussion for "do we *really* need to
change this?".

>> Instead we just add these things to new "chunks" as appropriate. As this
>> patch of mine shows we can easily do that, and it doesn't error out on
>> any existing version of git:
>> https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95
>
> I like the idea of a "metadata" chunk. This can be useful for a lot of things.
> If we start the chunk with a "number of items" and only append items to the
> list, we can dynamically grow the chunk as we add values.

Right. I like it too :) But right now I'm just using it as a demo for
how new arbitrary chunk data can be added to the v1 format in backwards
compatible ways.

My inclination for an actual version of that patch would be to make it
easier to read/extend (even just dump JSON there, or a series of
key/values) over micro-optimizing the storage size. Such metadata will
always be tiny v.s. the rest, but that's for later bikeshedding...

>> I now can't imagine a situation where we'd ever need to change the
>> format. We have 32 bits of chunk ids to play with, and can have 255 of
>> these chunks at a time, and unknown chunks are ignored by existing
>> versions and future version.
>
> The solutions you have discussed work for 2 of the 3 problems at hand.
> The incremental file format is a high-value feature, but _would_ break
> existing clients if they don't understand the extra data. Unless I am
> missing something for how to succeed here.

We would write out a file like this:

    <CGPH signature>
    <rest of v1 header incl. chunk offsets (but higher chunk count)>
    <chunks git understands now>
    <new chunks>
    <signature>

Where one of the new chunks could be INCC ("incremental count") or
whatever, serving the same purpose as the v2 modification of the header
to use the padding byte for the count. Then we'd have more chunks with
the incremental data (or pack it into one "magic" chunk with its own
format...).

Existing clients deal with the graph being incomplete, so the writer
could just not bother to update that part of the data and a newer
clients would know to find the rest in a series of incremental updates.

IOW an "empty" commit-graph now is 1100 bytes. Worst case we'd be
writing at least that number of bytes that would be mostly or entirely
useless to older clients, with the rest being new stuff newer clients
understand.

On the incremental format: I don't like:

 1) The idea that an incremental format would involve in-place
    modification of an existing file (or would we write a completely new
    one and move it in-place?).

    If it's in-place modification we get away a lot of things/avoid
    complexity with "we might delete, but we never modify existing" on
    existing *.{idx,bitmap} formats. E.g. mmap() is a royal PITA
    (more-so than now) once you need to deal with modifications.

    Also, if it's in-place we'd need to fully recompute the checksum
    SHA-1 as we modify the file.

 2) The assumption that we'd just have 255 of these, wouldn't it be
    reasonable to have a MIDX-like for it & write it along with packs as
    they come in? I.e. eventually have PACK.{pack,idx,bitmap,graph}.

    We support more than 255 packs, and it seems likely that we'd
    eventually want to generate/coalesce packs/idx and any other
    side-indexes on the same "gc" schedule.

But those are separate from any back-compat concerns, which is what I
think makes sense to focus on now.

>> 1. See feedback on the v2 patch in
>>    https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/
>
> My response [2] to that message includes the discussion of the
> incremental file format.
>
> [2] https://public-inbox.org/git/87muk6q98k.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-02 18:02         ` Ævar Arnfjörð Bjarmason
@ 2019-05-03 12:47           ` Derrick Stolee
  2019-05-03 13:41             ` Ævar Arnfjörð Bjarmason
  2019-05-03 14:16             ` SZEDER Gábor
  0 siblings, 2 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-05-03 12:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, sandals, Junio C Hamano

On 5/2/2019 2:02 PM, Ævar Arnfjörð Bjarmason wrote:
>
> But those are separate from any back-compat concerns, which is what I
> think makes sense to focus on now.

Thinking more on this topic, I think I have a way to satisfy _all_ of
your concerns by simplifying the plan for incremental commit-graph files.

My initial plan was to have the "commit-graph" file always be the "tip"
file, and it would point to some number of "graph-{hash}.graph" files.
Then, we would have some set of strategies to decide when we create a new
.graph file or when we compact the files down into the "commit-graph"
file.

This has several issues regarding race conditions that I had not yet
resolved (and maybe would always have problems).

It would be much simpler to restrict the model. Your idea of changing
the file name is the inspiration here.

* The "commit-graph" file is the base commit graph. It is always
  closed under reachability (if a commit exists in this file, then
  its parents are also in this file). We will also consider this to
  be "commit-graph-0".

* A commit-graph-<N> exists, then we check for the existence of
  commit-graph-<N+1>. This file can contain commits whose parents
  are in any smaller file.

I think this resolves the issue of back-compat without updating
the file format:

1. Old clients will never look at commit-graph-N, so they will
   never complain about an "incomplete" file.

2. If we always open a read handle as we move up the list, then
   a "merge and collapse" write to commit-graph-N will not
   interrupt an existing process reading that file.

I'll start hacking on this model.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-03 12:47           ` Derrick Stolee
@ 2019-05-03 13:41             ` Ævar Arnfjörð Bjarmason
  2019-05-06  8:27               ` Christian Couder
  2019-05-03 14:16             ` SZEDER Gábor
  1 sibling, 1 reply; 89+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-03 13:41 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, sandals, Junio C Hamano, Jeff King


On Fri, May 03 2019, Derrick Stolee wrote:

> On 5/2/2019 2:02 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> But those are separate from any back-compat concerns, which is what I
>> think makes sense to focus on now.
>
> Thinking more on this topic, I think I have a way to satisfy _all_ of
> your concerns by simplifying the plan for incremental commit-graph files.
>
> My initial plan was to have the "commit-graph" file always be the "tip"
> file, and it would point to some number of "graph-{hash}.graph" files.
> Then, we would have some set of strategies to decide when we create a new
> .graph file or when we compact the files down into the "commit-graph"
> file.
>
> This has several issues regarding race conditions that I had not yet
> resolved (and maybe would always have problems).
>
> It would be much simpler to restrict the model. Your idea of changing
> the file name is the inspiration here.

I still have some questions about SHA-256, how we'll eventually change
the format of these new files etc, but those can wait...

> * The "commit-graph" file is the base commit graph. It is always
>   closed under reachability (if a commit exists in this file, then
>   its parents are also in this file). We will also consider this to
>   be "commit-graph-0".
>
> * A commit-graph-<N> exists, then we check for the existence of
>   commit-graph-<N+1>. This file can contain commits whose parents
>   are in any smaller file.
>
> I think this resolves the issue of back-compat without updating
> the file format:
>
> 1. Old clients will never look at commit-graph-N, so they will
>    never complain about an "incomplete" file.
>
> 2. If we always open a read handle as we move up the list, then
>    a "merge and collapse" write to commit-graph-N will not
>    interrupt an existing process reading that file.
>
> I'll start hacking on this model.

Just on this, consider storing them in
.git/objects/info/commit-graphs/commit-graph-<THIS-FILE'S-CHECKSUM-SHA1>,
because:

1) We can stat() the "commit-graphs" directory to see if there's any
   new/deleted ones (dir mtime changed), similar to what we do for the
   untracked cache, and can (but I don't think we do...) do for packs/*
   and objects/??/.

   As opposed to ".git/objects/info" itself which e.g. has the "packs",
   "alternates" etc. files (can still do it, but more false positives)

2) If these things are "chained" supersets of one another anyway we have
   a big caveat that we don't have with repacking the *.pack
   files.

   Those you can do in any order, as long as you write a new one it
   doesn't matter if the data is already elsewhere stored. Just repack
   however many N at a time, then unlink() all the old ones later.

   If you have commit-graph-<N>, commit-graph-<N+1> you can only munge
   them in that sequence, and you get to do a much more careful fsync()
   dance around with both the directory entry and the file itself, which
   has a much higher chance of breaking things.

   I.e. the client might see a change to "N" before "N+1", and now we're
   screwed. Dealing with this is a giant but avoidable PITA. See
   https://public-inbox.org/git/20180117184828.31816-1-hch@lst.de/ and
   http://blog.httrack.com/blog/2013/11/15/everything-you-always-wanted-to-know-about-fsync/

   If instead you name the files commit-graph-<MY-OWN-SHA>, and add a
   new (v1 backwards-compatible) chunk for "and the next file's SHA-1
   is..."  we can more aggressively repack these.

   Worst case the directory entries will sync in the wrong order and the
   chain will be broken, but as long as we gracefully handle the chain
   abruptly ending that's OK.

   You can also do things like rewrite the middle "[]" of a "N, [N+1,
   N+2], N+3" sequence to have a single file for what was previously
   within those brackets (again, like "repack"). You'd just need to drop
   in a new "N" that points to the replacement for "[N+1, N+2]", but
   *don't* need to touch "N+3".

   Whereas if they're numbered you need to also move "N+3", and you're
   back to juggling both fsync() on files and directory entries.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-03 12:47           ` Derrick Stolee
  2019-05-03 13:41             ` Ævar Arnfjörð Bjarmason
@ 2019-05-03 14:16             ` SZEDER Gábor
  2019-05-03 15:11               ` Derrick Stolee
  1 sibling, 1 reply; 89+ messages in thread
From: SZEDER Gábor @ 2019-05-03 14:16 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget, git, sandals, Junio C Hamano

On Fri, May 03, 2019 at 08:47:25AM -0400, Derrick Stolee wrote:
> It would be much simpler to restrict the model. Your idea of changing
> the file name is the inspiration here.
> 
> * The "commit-graph" file is the base commit graph. It is always
>   closed under reachability (if a commit exists in this file, then
>   its parents are also in this file). We will also consider this to
>   be "commit-graph-0".
> 
> * A commit-graph-<N> exists, then we check for the existence of
>   commit-graph-<N+1>. This file can contain commits whose parents
>   are in any smaller file.
> 
> I think this resolves the issue of back-compat without updating
> the file format:
> 
> 1. Old clients will never look at commit-graph-N, so they will
>    never complain about an "incomplete" file.
> 
> 2. If we always open a read handle as we move up the list, then
>    a "merge and collapse" write to commit-graph-N will not
>    interrupt an existing process reading that file.

What if a process reading the commit-graph files runs short on file
descriptors and has to close some of them, while a second process is
merging commit-graph files?


> I'll start hacking on this model.

Have fun! :)


Semi-related, but I'm curious:  what are your plans for 'struct
commit's 'graph_pos' field, and how will it work with multiple
commit-graph files?

In particular: currently we use this 'graph_pos' field as an index
into the Commit Data chunk to find the metadata associated with a
given commit object.  But we could add any commit-specific metadata in
a new chunk, being an array ordered by commit OID, and then use
'graph_pos' as an index into this chunk as well.  I find this quite
convenient.  However, with mulitple commit-graph files there will be
multiple arrays...


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-03 14:16             ` SZEDER Gábor
@ 2019-05-03 15:11               ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-05-03 15:11 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget, git, sandals, Junio C Hamano

On 5/3/2019 10:16 AM, SZEDER Gábor wrote:
> On Fri, May 03, 2019 at 08:47:25AM -0400, Derrick Stolee wrote:
>> It would be much simpler to restrict the model. Your idea of changing
>> the file name is the inspiration here.
>>
>> * The "commit-graph" file is the base commit graph. It is always
>>   closed under reachability (if a commit exists in this file, then
>>   its parents are also in this file). We will also consider this to
>>   be "commit-graph-0".
>>
>> * A commit-graph-<N> exists, then we check for the existence of
>>   commit-graph-<N+1>. This file can contain commits whose parents
>>   are in any smaller file.
>>
>> I think this resolves the issue of back-compat without updating
>> the file format:
>>
>> 1. Old clients will never look at commit-graph-N, so they will
>>    never complain about an "incomplete" file.
>>
>> 2. If we always open a read handle as we move up the list, then
>>    a "merge and collapse" write to commit-graph-N will not
>>    interrupt an existing process reading that file.
> 
> What if a process reading the commit-graph files runs short on file
> descriptors and has to close some of them, while a second process is
> merging commit-graph files?

We will want to keep the number small so we never recycle the file
handles. Instead, we will keep them open for the entire process.

The strategies for creating these graphs should include a "merge"
strategy that keeps the number of commit-graph files very small
(fewer than 5 should be sufficient).
 
>> I'll start hacking on this model.
> 
> Have fun! :)
> 
> 
> Semi-related, but I'm curious:  what are your plans for 'struct
> commit's 'graph_pos' field, and how will it work with multiple
> commit-graph files?

Since we have a predefined "sequence" of graphs, the graph_pos
will be the position in the "meta-order" given by concatenating
the commit lists from each commit-graph. We then navigate to a commit
in O(num graphs) instead of O(1).

In the commit-graph format, we will use this "meta-order" number
to refer to parent positions.

> In particular: currently we use this 'graph_pos' field as an index
> into the Commit Data chunk to find the metadata associated with a
> given commit object.  But we could add any commit-specific metadata in
> a new chunk, being an array ordered by commit OID, and then use
> 'graph_pos' as an index into this chunk as well.  I find this quite
> convenient.  However, with mulitple commit-graph files there will be
> multiple arrays...

Yes, this will continue to be useful*. To find the position inside
a specific commit-graph-N file, take the graph_pos and subtract the
number of commits in the "lower" commit-graph files.

* For example, this meta-data information is necessary for the Bloom
filter data [1].

Thanks,
-Stolee

[1] https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@gmail.com/

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-03 13:41             ` Ævar Arnfjörð Bjarmason
@ 2019-05-06  8:27               ` Christian Couder
  2019-05-06 13:47                 ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Christian Couder @ 2019-05-06  8:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git,
	brian m. carlson, Junio C Hamano, Jeff King

On Fri, May 3, 2019 at 3:44 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Fri, May 03 2019, Derrick Stolee wrote:
>
> > On 5/2/2019 2:02 PM, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> But those are separate from any back-compat concerns, which is what I
> >> think makes sense to focus on now.
> >
> > Thinking more on this topic, I think I have a way to satisfy _all_ of
> > your concerns by simplifying the plan for incremental commit-graph files.

[...]

> Just on this, consider storing them in
> .git/objects/info/commit-graphs/commit-graph-<THIS-FILE'S-CHECKSUM-SHA1>,
> because:
>
> 1) We can stat() the "commit-graphs" directory to see if there's any
>    new/deleted ones (dir mtime changed), similar to what we do for the
>    untracked cache, and can (but I don't think we do...) do for packs/*
>    and objects/??/.
>
>    As opposed to ".git/objects/info" itself which e.g. has the "packs",
>    "alternates" etc. files (can still do it, but more false positives)

About incremental commit-graph files and alternates, I wonder if they
could work well together. The main use case would be for servers that
use a common repo for all the forks.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v3 0/6] Create commit-graph file format v2
  2019-05-06  8:27               ` Christian Couder
@ 2019-05-06 13:47                 ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-05-06 13:47 UTC (permalink / raw)
  To: Christian Couder, Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, brian m. carlson,
	Junio C Hamano, Jeff King

On 5/6/2019 4:27 AM, Christian Couder wrote:
> On Fri, May 3, 2019 at 3:44 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> 1) We can stat() the "commit-graphs" directory to see if there's any
>>    new/deleted ones (dir mtime changed), similar to what we do for the
>>    untracked cache, and can (but I don't think we do...) do for packs/*
>>    and objects/??/.
>>
>>    As opposed to ".git/objects/info" itself which e.g. has the "packs",
>>    "alternates" etc. files (can still do it, but more false positives)
> 
> About incremental commit-graph files and alternates, I wonder if they
> could work well together. The main use case would be for servers that
> use a common repo for all the forks.

We use a "shared object cache" in VFS for Git, implemented as an alternate,
so all enlistments share prefetch packs, multi-pack-indexes, and commit-graph
files.  This is something we are very focused on.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2)
  2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2019-05-01 20:25     ` [PATCH v3 0/6] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
@ 2019-05-09 14:22     ` Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
                         ` (12 more replies)
  7 siblings, 13 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano

This series replaces ds/commit-graph-file-v2, and I'm using the same
gitgitgadget PR to continue the version numbers and hopefully make that
clear. This is a slight modification on patches 1-11 from the incremental
file format RFC [0].

The commit-graph feature is growing, thanks to all of the contributions by
several community members. This also means that the write_commit_graph()
method is a bit unwieldy now. This series refactors that method to use a
write_commit_graph_context struct that is passed between several smaller
methods. The final result should be a write_commit_graph() method that has a
clear set of steps. Future changes should then be easier to understand.

 * Patches 1-4: these are small changes which either fix issues or just
   provide clean-up. These are mostly borrowed from
   ds/commit-graph-format-v2. 
   
   
 * Patches 5-11: these provide a non-functional refactor of
   write_commit_graph() into several methods using a "struct
   write_commit_graph_context" to share across the methods.
   
   

Updates to commits previously in this thread:

 * "commit-graph: remove Future Work section" no longer says that 'verify'
   takes as long as 'write'. [1]
   
   
 * "commit-graph: return with errors during write" now has a test to check
   we don't die(). [2]
   
   

Ævar: Looking at the old thread, I only saw two comments that still apply to
this series [1] [2]. Please point me to any comments I have missed.

Thanks, -Stolee

[0] https://public-inbox.org/git/pull.184.git.gitgitgadget@gmail.com/

[1] https://public-inbox.org/git/87o94mql0a.fsf@evledraar.gmail.com/

[2] https://public-inbox.org/git/87pnp2qlkv.fsf@evledraar.gmail.com/

Derrick Stolee (11):
  commit-graph: fix the_repository reference
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: remove Future Work section
  commit-graph: create write_commit_graph_context
  commit-graph: extract fill_oids_from_packs()
  commit-graph: extract fill_oids_from_commit_hex()
  commit-graph: extract fill_oids_from_all_packs()
  commit-graph: extract count_distinct_commits()
  commit-graph: extract copy_oids_to_commits()
  commit-graph: extract write_commit_graph_file()

 Documentation/technical/commit-graph.txt |  17 -
 builtin/commit-graph.c                   |  21 +-
 builtin/commit.c                         |   5 +-
 builtin/gc.c                             |   7 +-
 commit-graph.c                           | 607 +++++++++++++----------
 commit-graph.h                           |  14 +-
 commit.c                                 |   2 +-
 t/t5318-commit-graph.sh                  |   8 +
 8 files changed, 371 insertions(+), 310 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-112/derrickstolee/graph/v2-head-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/112

Range-diff vs v3:

  -:  ---------- >  1:  0be7713a25 commit-graph: fix the_repository reference
  1:  91f300ec0a !  2:  a4082b827e commit-graph: return with errors during write
     @@ -253,3 +253,22 @@
       
       int verify_commit_graph(struct repository *r, struct commit_graph *g);
       
     +
     + diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
     + --- a/t/t5318-commit-graph.sh
     + +++ b/t/t5318-commit-graph.sh
     +@@
     + 	test_path_is_file info/commit-graph
     + '
     + 
     ++test_expect_success 'close with correct error on bad input' '
     ++	cd "$TRASH_DIRECTORY/full" &&
     ++	echo doesnotexist >in &&
     ++	{ git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
     ++	test "$ret" = 1 &&
     ++	test_i18ngrep "error adding pack" stderr
     ++'
     ++
     + test_expect_success 'create commits and repack' '
     + 	cd "$TRASH_DIRECTORY/full" &&
     + 	for i in $(test_seq 3)
  2:  924b22f990 =  3:  469d0c9a32 commit-graph: collapse parameters into flags
  3:  8446011a43 <  -:  ---------- commit-graph: create new version parameter
  4:  6a0e99f9f9 <  -:  ---------- commit-graph: add --version=<n> option
  5:  cca8267dfe <  -:  ---------- commit-graph: implement file format version 2
  6:  e72bca6c78 !  4:  130007d0e1 commit-graph: remove Future Work section
     @@ -12,9 +12,8 @@
      
          It is unlikely that we will ever send a commit-graph file
          as part of the protocol, since we would need to verify the
     -    data, and that is as expensive as writing a commit-graph from
     -    scratch. If we want to start trusting remote content, then
     -    that item can be investigated again.
     +    data, and that is expensive. If we want to start trusting
     +    remote content, then that item can be investigated again.
      
          While there is more work to be done on the feature, having
          a section of the docs devoted to a TODO list is wasteful and
  -:  ---------- >  5:  0ca4e18e98 commit-graph: create write_commit_graph_context
  -:  ---------- >  6:  30c1b618b1 commit-graph: extract fill_oids_from_packs()
  -:  ---------- >  7:  8cb2613dfa commit-graph: extract fill_oids_from_commit_hex()
  -:  ---------- >  8:  8f7129672a commit-graph: extract fill_oids_from_all_packs()
  -:  ---------- >  9:  a37548745b commit-graph: extract count_distinct_commits()
  -:  ---------- > 10:  57366ffdaa commit-graph: extract copy_oids_to_commits()
  -:  ---------- > 11:  fc81c8946d commit-graph: extract write_commit_graph_file()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 01/11] commit-graph: fix the_repository reference
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-13  2:56         ` Junio C Hamano
  2019-05-09 14:22       ` [PATCH v4 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
                         ` (11 subsequent siblings)
  12 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The parse_commit_buffer() method takes a repository pointer, so it
should not refer to the_repository anymore.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index a5333c7ac6..e4d1233226 100644
--- a/commit.c
+++ b/commit.c
@@ -443,7 +443,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	item->date = parse_commit_date(bufptr, tail);
 
 	if (check_graph)
-		load_commit_graph_info(the_repository, item);
+		load_commit_graph_info(r, item);
 
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 02/11] commit-graph: return with errors during write
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-13  3:13         ` Junio C Hamano
  2019-05-09 14:22       ` [PATCH v4 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
                         ` (10 subsequent siblings)
  12 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c  | 19 +++++++------
 builtin/commit.c        |  5 ++--
 builtin/gc.c            |  7 ++---
 commit-graph.c          | 60 ++++++++++++++++++++++++++++-------------
 commit-graph.h          | 10 +++----
 t/t5318-commit-graph.sh |  8 ++++++
 6 files changed, 70 insertions(+), 39 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 537fdfd0f0..2e86251f02 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *pack_indexes = NULL;
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
+	int result;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
 
 	read_replace_refs = 0;
 
-	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-		return 0;
-	}
+	if (opts.reachable)
+		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
 		UNLEAK(buf);
 	}
 
-	write_commit_graph(opts.obj_dir,
-			   pack_indexes,
-			   commit_hex,
-			   opts.append,
-			   1);
+	result = write_commit_graph(opts.obj_dir,
+				    pack_indexes,
+				    commit_hex,
+				    opts.append,
+				    1);
 
 	UNLEAK(lines);
-	return 0;
+	return result;
 }
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
diff --git a/builtin/commit.c b/builtin/commit.c
index 2986553d5f..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "new_index file. Check that disk is not full and quota is\n"
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-		write_commit_graph_reachable(get_object_directory(), 0, 0);
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+		return 1;
 
 	repo_rerere(the_repository, 0);
 	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..3984addf73 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 	}
 
-	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0,
-					     !quiet && !daemonized);
+	if (gc_write_commit_graph &&
+	    write_commit_graph_reachable(get_object_directory(), 0,
+					 !quiet && !daemonized))
+		return 1;
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 66865acbd7..ee487a364b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int append,
+				 int report_progress)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
+	int result;
 
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+	result = write_commit_graph(obj_dir, NULL, &list,
+				    append, report_progress);
 
 	string_list_clear(&list, 0);
+	return result;
 }
 
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress)
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
 	struct hashfile *f;
 	uint32_t i, count_distinct = 0;
-	char *graph_name;
+	char *graph_name = NULL;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
@@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
+	int res = 0;
 
 	if (!commit_graph_compatible(the_repository))
-		return;
+		return 0;
 
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
+	commits.list = NULL;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
 			strbuf_setlen(&packname, dirlen);
 			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
-			if (!p)
-				die(_("error adding pack %s"), packname.buf);
-			if (open_pack_index(p))
-				die(_("error opening index for %s"), packname.buf);
+			if (!p) {
+				error(_("error adding pack %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
+			if (open_pack_index(p)) {
+				error(_("error opening index for %s"), packname.buf);
+				res = 1;
+				goto cleanup;
+			}
 			for_each_object_in_pack(p, add_packed_commits, &oids,
 						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
@@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
 	}
 	stop_progress(&progress);
 
-	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
-		die(_("the commit graph format cannot write %d commits"), count_distinct);
+	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
+		error(_("the commit graph format cannot write %d commits"), count_distinct);
+		res = 1;
+		goto cleanup;
+	}
 
 	commits.nr = 0;
 	commits.alloc = count_distinct;
@@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
 	num_chunks = num_extra_edges ? 4 : 3;
 	stop_progress(&progress);
 
-	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
-		die(_("too many commits to write graph"));
+	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
+		error(_("too many commits to write graph"));
+		res = 1;
+		goto cleanup;
+	}
 
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name)) {
 		UNLEAK(graph_name);
-		die_errno(_("unable to create leading directories of %s"),
-			  graph_name);
+		error(_("unable to create leading directories of %s"),
+			graph_name);
+		res = errno;
+		goto cleanup;
 	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
@@ -1107,9 +1126,12 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+cleanup:
 	free(graph_name);
 	free(commits.list);
 	free(oids.list);
+
+	return res;
 }
 
 #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
diff --git a/commit-graph.h b/commit-graph.h
index 7dfb8c896f..d15670bf46 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,12 +65,12 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
+int write_commit_graph_reachable(const char *obj_dir, int append,
 				  int report_progress);
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress);
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e80c1cac02..3b6fd0d728 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -23,6 +23,14 @@ test_expect_success 'write graph with no packs' '
 	test_path_is_file info/commit-graph
 '
 
+test_expect_success 'close with correct error on bad input' '
+	cd "$TRASH_DIRECTORY/full" &&
+	echo doesnotexist >in &&
+	{ git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
+	test "$ret" = 1 &&
+	test_i18ngrep "error adding pack" stderr
+'
+
 test_expect_success 'create commits and repack' '
 	cd "$TRASH_DIRECTORY/full" &&
 	for i in $(test_seq 3)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 03/11] commit-graph: collapse parameters into flags
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-13  3:44         ` Junio C Hamano
  2019-05-09 14:22       ` [PATCH v4 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
                         ` (9 subsequent siblings)
  12 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
We will soon expand the possible options to send to these methods, so
instead of complicating the parameter list, first simplify it.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 8 +++++---
 builtin/commit.c       | 2 +-
 builtin/gc.c           | 4 ++--
 commit-graph.c         | 9 +++++----
 commit-graph.h         | 8 +++++---
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 2e86251f02..828b1a713f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
 	int result;
+	int flags = COMMIT_GRAPH_PROGRESS;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv)
 		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
+	if (opts.append)
+		flags |= COMMIT_GRAPH_APPEND;
 
 	read_replace_refs = 0;
 
 	if (opts.reachable)
-		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
+		return write_commit_graph_reachable(opts.obj_dir, flags);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv)
 	result = write_commit_graph(opts.obj_dir,
 				    pack_indexes,
 				    commit_hex,
-				    opts.append,
-				    1);
+				    flags);
 
 	UNLEAK(lines);
 	return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index b9ea7222fa..b001ef565d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+	    write_commit_graph_reachable(get_object_directory(), 0))
 		return 1;
 
 	repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index 3984addf73..df2573f124 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	}
 
 	if (gc_write_commit_graph &&
-	    write_commit_graph_reachable(get_object_directory(), 0,
-					 !quiet && !daemonized))
+	    write_commit_graph_reachable(get_object_directory(),
+					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
 		return 1;
 
 	if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index ee487a364b..8bbd50658c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				 int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
 	for_each_ref(add_ref_to_list, &list);
 	result = write_commit_graph(obj_dir, NULL, &list,
-				    append, report_progress);
+				    flags);
 
 	string_list_clear(&list, 0);
 	return result;
@@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int append,
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress)
+		       unsigned int flags)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir,
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
 	int res = 0;
+	int append = flags & COMMIT_GRAPH_APPEND;
+	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
diff --git a/commit-graph.h b/commit-graph.h
index d15670bf46..70f4caf0c7 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,12 +65,14 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress);
+#define COMMIT_GRAPH_APPEND     (1 << 0)
+#define COMMIT_GRAPH_PROGRESS   (1 << 1)
+
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags);
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress);
+		       unsigned int flags);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 04/11] commit-graph: remove Future Work section
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (2 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
                         ` (8 subsequent siblings)
  12 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph feature began with a long list of planned
benefits, most of which are now complete. The future work
section has only a few items left.

As for making more algorithms aware of generation numbers,
some are only waiting for generation number v2 to ensure the
performance matches the existing behavior using commit date.

It is unlikely that we will ever send a commit-graph file
as part of the protocol, since we would need to verify the
data, and that is expensive. If we want to start trusting
remote content, then that item can be investigated again.

While there is more work to be done on the feature, having
a section of the docs devoted to a TODO list is wasteful and
hard to keep up-to-date.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index 7805b0968c..fb53341d5e 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -127,23 +127,6 @@ Design Details
   helpful for these clones, anyway. The commit-graph will not be read or
   written when shallow commits are present.
 
-Future Work
------------
-
-- After computing and storing generation numbers, we must make graph
-  walks aware of generation numbers to gain the performance benefits they
-  enable. This will mostly be accomplished by swapping a commit-date-ordered
-  priority queue with one ordered by generation number. The following
-  operations are important candidates:
-
-    - 'log --topo-order'
-    - 'tag --merged'
-
-- A server could provide a commit-graph file as part of the network protocol
-  to avoid extra calculations by clients. This feature is only of benefit if
-  the user is willing to trust the file, because verifying the file is correct
-  is as hard as computing it from scratch.
-
 Related Links
 -------------
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 05/11] commit-graph: create write_commit_graph_context
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (3 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
                         ` (7 subsequent siblings)
  12 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too large and complex. To simplify
it, we should extract several small methods. However, we will risk
repeating a lot of declarations related to progress incidators and
object id or commit lists.

Create a new write_commit_graph_context struct that contains the
core data structures used in this process. Replace the other local
variables with the values inside the context object. Following this
change, we will start to lift code segments wholesale out of the
write_commit_graph() method and into their own methods.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 390 ++++++++++++++++++++++++-------------------------
 1 file changed, 194 insertions(+), 196 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8bbd50658c..58f0f0ae34 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -518,14 +518,38 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 	return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c);
 }
 
+struct packed_commit_list {
+	struct commit **list;
+	int nr;
+	int alloc;
+};
+
+struct packed_oid_list {
+	struct object_id *list;
+	int nr;
+	int alloc;
+};
+
+struct write_commit_graph_context {
+	struct repository *r;
+	const char *obj_dir;
+	char *graph_name;
+	struct packed_oid_list oids;
+	struct packed_commit_list commits;
+	int num_extra_edges;
+	unsigned long approx_nr_objects;
+	struct progress *progress;
+	int progress_done;
+	uint64_t progress_cnt;
+	unsigned append:1,
+		 report_progress:1;
+};
+
 static void write_graph_chunk_fanout(struct hashfile *f,
-				     struct commit **commits,
-				     int nr_commits,
-				     struct progress *progress,
-				     uint64_t *progress_cnt)
+				     struct write_commit_graph_context *ctx)
 {
 	int i, count = 0;
-	struct commit **list = commits;
+	struct commit **list = ctx->commits.list;
 
 	/*
 	 * Write the first-level table (the list is sorted,
@@ -533,10 +557,10 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 	 * having to do eight extra binary search iterations).
 	 */
 	for (i = 0; i < 256; i++) {
-		while (count < nr_commits) {
+		while (count < ctx->commits.nr) {
 			if ((*list)->object.oid.hash[0] != i)
 				break;
-			display_progress(progress, ++*progress_cnt);
+			display_progress(ctx->progress, ++ctx->progress_cnt);
 			count++;
 			list++;
 		}
@@ -546,14 +570,12 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits,
-				   struct progress *progress,
-				   uint64_t *progress_cnt)
+				   struct write_commit_graph_context *ctx)
 {
-	struct commit **list = commits;
+	struct commit **list = ctx->commits.list;
 	int count;
-	for (count = 0; count < nr_commits; count++, list++) {
-		display_progress(progress, ++*progress_cnt);
+	for (count = 0; count < ctx->commits.nr; count++, list++) {
+		display_progress(ctx->progress, ++ctx->progress_cnt);
 		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
 	}
 }
@@ -565,19 +587,17 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits,
-				   struct progress *progress,
-				   uint64_t *progress_cnt)
+				   struct write_commit_graph_context *ctx)
 {
-	struct commit **list = commits;
-	struct commit **last = commits + nr_commits;
+	struct commit **list = ctx->commits.list;
+	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t num_extra_edges = 0;
 
 	while (list < last) {
 		struct commit_list *parent;
 		int edge_value;
 		uint32_t packedDate[2];
-		display_progress(progress, ++*progress_cnt);
+		display_progress(ctx->progress, ++ctx->progress_cnt);
 
 		parse_commit_no_graph(*list);
 		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -588,8 +608,8 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			edge_value = GRAPH_PARENT_NONE;
 		else {
 			edge_value = sha1_pos(parent->item->object.oid.hash,
-					      commits,
-					      nr_commits,
+					      ctx->commits.list,
+					      ctx->commits.nr,
 					      commit_to_sha1);
 
 			if (edge_value < 0)
@@ -609,8 +629,8 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			edge_value = GRAPH_EXTRA_EDGES_NEEDED | num_extra_edges;
 		else {
 			edge_value = sha1_pos(parent->item->object.oid.hash,
-					      commits,
-					      nr_commits,
+					      ctx->commits.list,
+					      ctx->commits.nr,
 					      commit_to_sha1);
 			if (edge_value < 0)
 				BUG("missing parent %s for commit %s",
@@ -642,19 +662,16 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 }
 
 static void write_graph_chunk_extra_edges(struct hashfile *f,
-					  struct commit **commits,
-					  int nr_commits,
-					  struct progress *progress,
-					  uint64_t *progress_cnt)
+					  struct write_commit_graph_context *ctx)
 {
-	struct commit **list = commits;
-	struct commit **last = commits + nr_commits;
+	struct commit **list = ctx->commits.list;
+	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	struct commit_list *parent;
 
 	while (list < last) {
 		int num_parents = 0;
 
-		display_progress(progress, ++*progress_cnt);
+		display_progress(ctx->progress, ++ctx->progress_cnt);
 
 		for (parent = (*list)->parents; num_parents < 3 && parent;
 		     parent = parent->next)
@@ -668,8 +685,8 @@ static void write_graph_chunk_extra_edges(struct hashfile *f,
 		/* Since num_parents > 2, this initializer is safe. */
 		for (parent = (*list)->parents->next; parent; parent = parent->next) {
 			int edge_value = sha1_pos(parent->item->object.oid.hash,
-						  commits,
-						  nr_commits,
+						  ctx->commits.list,
+						  ctx->commits.nr,
 						  commit_to_sha1);
 
 			if (edge_value < 0)
@@ -693,125 +710,111 @@ static int commit_compare(const void *_a, const void *_b)
 	return oidcmp(a, b);
 }
 
-struct packed_commit_list {
-	struct commit **list;
-	int nr;
-	int alloc;
-};
-
-struct packed_oid_list {
-	struct object_id *list;
-	int nr;
-	int alloc;
-	struct progress *progress;
-	int progress_done;
-};
-
 static int add_packed_commits(const struct object_id *oid,
 			      struct packed_git *pack,
 			      uint32_t pos,
 			      void *data)
 {
-	struct packed_oid_list *list = (struct packed_oid_list*)data;
+	struct write_commit_graph_context *ctx = (struct write_commit_graph_context*)data;
 	enum object_type type;
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
-	if (list->progress)
-		display_progress(list->progress, ++list->progress_done);
+	if (ctx->progress)
+		display_progress(ctx->progress, ++ctx->progress_done);
 
 	oi.typep = &type;
-	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
+	if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
 
 	if (type != OBJ_COMMIT)
 		return 0;
 
-	ALLOC_GROW(list->list, list->nr + 1, list->alloc);
-	oidcpy(&(list->list[list->nr]), oid);
-	list->nr++;
+	ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+	oidcpy(&(ctx->oids.list[ctx->oids.nr]), oid);
+	ctx->oids.nr++;
 
 	return 0;
 }
 
-static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit)
+static void add_missing_parents(struct write_commit_graph_context *ctx, struct commit *commit)
 {
 	struct commit_list *parent;
 	for (parent = commit->parents; parent; parent = parent->next) {
 		if (!(parent->item->object.flags & UNINTERESTING)) {
-			ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
-			oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));
-			oids->nr++;
+			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+			oidcpy(&ctx->oids.list[ctx->oids.nr], &(parent->item->object.oid));
+			ctx->oids.nr++;
 			parent->item->object.flags |= UNINTERESTING;
 		}
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids, int report_progress)
+static void close_reachable(struct write_commit_graph_context *ctx)
 {
 	int i;
 	struct commit *commit;
-	struct progress *progress = NULL;
 
-	if (report_progress)
-		progress = start_delayed_progress(
-			_("Loading known commits in commit graph"), oids->nr);
-	for (i = 0; i < oids->nr; i++) {
-		display_progress(progress, i + 1);
-		commit = lookup_commit(the_repository, &oids->list[i]);
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Loading known commits in commit graph"),
+					ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
 	/*
-	 * As this loop runs, oids->nr may grow, but not more
+	 * As this loop runs, ctx->oids.nr may grow, but not more
 	 * than the number of missing commits in the reachable
 	 * closure.
 	 */
-	if (report_progress)
-		progress = start_delayed_progress(
-			_("Expanding reachable commits in commit graph"), oids->nr);
-	for (i = 0; i < oids->nr; i++) {
-		display_progress(progress, i + 1);
-		commit = lookup_commit(the_repository, &oids->list[i]);
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Expanding reachable commits in commit graph"),
+					ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit && !parse_commit_no_graph(commit))
-			add_missing_parents(oids, commit);
+			add_missing_parents(ctx, commit);
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
-	if (report_progress)
-		progress = start_delayed_progress(
-			_("Clearing commit marks in commit graph"), oids->nr);
-	for (i = 0; i < oids->nr; i++) {
-		display_progress(progress, i + 1);
-		commit = lookup_commit(the_repository, &oids->list[i]);
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Clearing commit marks in commit graph"),
+					ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 }
 
-static void compute_generation_numbers(struct packed_commit_list* commits,
-				       int report_progress)
+static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 {
 	int i;
 	struct commit_list *list = NULL;
-	struct progress *progress = NULL;
 
-	if (report_progress)
-		progress = start_progress(
-			_("Computing commit graph generation numbers"),
-			commits->nr);
-	for (i = 0; i < commits->nr; i++) {
-		display_progress(progress, i + 1);
-		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
-		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+	if (ctx->report_progress)
+		ctx->progress = start_progress(
+					_("Computing commit graph generation numbers"),
+					ctx->commits.nr);
+	for (i = 0; i < ctx->commits.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
+		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
 
-		commit_list_insert(commits->list[i], &list);
+		commit_list_insert(ctx->commits.list[i], &list);
 		while (list) {
 			struct commit *current = list->item;
 			struct commit_list *parent;
@@ -838,7 +841,7 @@ static void compute_generation_numbers(struct packed_commit_list* commits,
 			}
 		}
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -869,8 +872,7 @@ int write_commit_graph(const char *obj_dir,
 		       struct string_list *commit_hex,
 		       unsigned int flags)
 {
-	struct packed_oid_list oids;
-	struct packed_commit_list commits;
+	struct write_commit_graph_context *ctx;
 	struct hashfile *f;
 	uint32_t i, count_distinct = 0;
 	char *graph_name = NULL;
@@ -878,44 +880,38 @@ int write_commit_graph(const char *obj_dir,
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
 	int num_chunks;
-	int num_extra_edges;
 	struct commit_list *parent;
-	struct progress *progress = NULL;
 	const unsigned hashsz = the_hash_algo->rawsz;
-	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
-	unsigned long approx_nr_objects;
 	int res = 0;
-	int append = flags & COMMIT_GRAPH_APPEND;
-	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
 
-	oids.nr = 0;
-	approx_nr_objects = approximate_object_count();
-	oids.alloc = approx_nr_objects / 32;
-	oids.progress = NULL;
-	oids.progress_done = 0;
-	commits.list = NULL;
-
-	if (append) {
-		prepare_commit_graph_one(the_repository, obj_dir);
-		if (the_repository->objects->commit_graph)
-			oids.alloc += the_repository->objects->commit_graph->num_commits;
+	ctx = xcalloc(1, sizeof(struct write_commit_graph_context));
+	ctx->r = the_repository;
+	ctx->obj_dir = obj_dir;
+	ctx->append = flags & COMMIT_GRAPH_APPEND ? 1 : 0;
+	ctx->report_progress = flags & COMMIT_GRAPH_PROGRESS ? 1 : 0;
+
+	ctx->approx_nr_objects = approximate_object_count();
+	ctx->oids.alloc = ctx->approx_nr_objects / 32;
+
+	if (ctx->append) {
+		prepare_commit_graph_one(ctx->r, ctx->obj_dir);
+		if (ctx->r->objects->commit_graph)
+			ctx->oids.alloc += ctx->r->objects->commit_graph->num_commits;
 	}
 
-	if (oids.alloc < 1024)
-		oids.alloc = 1024;
-	ALLOC_ARRAY(oids.list, oids.alloc);
-
-	if (append && the_repository->objects->commit_graph) {
-		struct commit_graph *commit_graph =
-			the_repository->objects->commit_graph;
-		for (i = 0; i < commit_graph->num_commits; i++) {
-			const unsigned char *hash = commit_graph->chunk_oid_lookup +
-				commit_graph->hash_len * i;
-			hashcpy(oids.list[oids.nr++].hash, hash);
+	if (ctx->oids.alloc < 1024)
+		ctx->oids.alloc = 1024;
+	ALLOC_ARRAY(ctx->oids.list, ctx->oids.alloc);
+
+	if (ctx->append && ctx->r->objects->commit_graph) {
+		struct commit_graph *g = ctx->r->objects->commit_graph;
+		for (i = 0; i < g->num_commits; i++) {
+			const unsigned char *hash = g->chunk_oid_lookup + g->hash_len * i;
+			hashcpy(ctx->oids.list[ctx->oids.nr++].hash, hash);
 		}
 	}
 
@@ -924,14 +920,14 @@ int write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
-		if (report_progress) {
+		if (ctx->report_progress) {
 			strbuf_addf(&progress_title,
 				    Q_("Finding commits for commit graph in %d pack",
 				       "Finding commits for commit graph in %d packs",
 				       pack_indexes->nr),
 				    pack_indexes->nr);
-			oids.progress = start_delayed_progress(progress_title.buf, 0);
-			oids.progress_done = 0;
+			ctx->progress = start_delayed_progress(progress_title.buf, 0);
+			ctx->progress_done = 0;
 		}
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
@@ -948,75 +944,76 @@ int write_commit_graph(const char *obj_dir,
 				res = 1;
 				goto cleanup;
 			}
-			for_each_object_in_pack(p, add_packed_commits, &oids,
+			for_each_object_in_pack(p, add_packed_commits, ctx,
 						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
 			free(p);
 		}
-		stop_progress(&oids.progress);
+		stop_progress(&ctx->progress);
 		strbuf_reset(&progress_title);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
-		if (report_progress) {
+		if (ctx->report_progress) {
 			strbuf_addf(&progress_title,
 				    Q_("Finding commits for commit graph from %d ref",
 				       "Finding commits for commit graph from %d refs",
 				       commit_hex->nr),
 				    commit_hex->nr);
-			progress = start_delayed_progress(progress_title.buf,
-							  commit_hex->nr);
+			ctx->progress = start_delayed_progress(
+						progress_title.buf,
+						commit_hex->nr);
 		}
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
-			display_progress(progress, i + 1);
+			display_progress(ctx->progress, i + 1);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
 
-			result = lookup_commit_reference_gently(the_repository, &oid, 1);
+			result = lookup_commit_reference_gently(ctx->r, &oid, 1);
 
 			if (result) {
-				ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
-				oidcpy(&oids.list[oids.nr], &(result->object.oid));
-				oids.nr++;
+				ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+				oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
+				ctx->oids.nr++;
 			}
 		}
-		stop_progress(&progress);
+		stop_progress(&ctx->progress);
 		strbuf_reset(&progress_title);
 	}
 
 	if (!pack_indexes && !commit_hex) {
-		if (report_progress)
-			oids.progress = start_delayed_progress(
+		if (ctx->report_progress)
+			ctx->progress = start_delayed_progress(
 				_("Finding commits for commit graph among packed objects"),
-				approx_nr_objects);
-		for_each_packed_object(add_packed_commits, &oids,
+				ctx->approx_nr_objects);
+		for_each_packed_object(add_packed_commits, ctx,
 				       FOR_EACH_OBJECT_PACK_ORDER);
-		if (oids.progress_done < approx_nr_objects)
-			display_progress(oids.progress, approx_nr_objects);
-		stop_progress(&oids.progress);
+		if (ctx->progress_done < ctx->approx_nr_objects)
+			display_progress(ctx->progress, ctx->approx_nr_objects);
+		stop_progress(&ctx->progress);
 	}
 
-	close_reachable(&oids, report_progress);
+	close_reachable(ctx);
 
-	if (report_progress)
-		progress = start_delayed_progress(
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
 			_("Counting distinct commits in commit graph"),
-			oids.nr);
-	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
-	QSORT(oids.list, oids.nr, commit_compare);
+			ctx->oids.nr);
+	display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
+	QSORT(ctx->oids.list, ctx->oids.nr, commit_compare);
 	count_distinct = 1;
-	for (i = 1; i < oids.nr; i++) {
-		display_progress(progress, i + 1);
-		if (!oideq(&oids.list[i - 1], &oids.list[i]))
+	for (i = 1; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		if (!oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
 			count_distinct++;
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
 	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
 		error(_("the commit graph format cannot write %d commits"), count_distinct);
@@ -1024,54 +1021,54 @@ int write_commit_graph(const char *obj_dir,
 		goto cleanup;
 	}
 
-	commits.nr = 0;
-	commits.alloc = count_distinct;
-	ALLOC_ARRAY(commits.list, commits.alloc);
+	ctx->commits.alloc = count_distinct;
+	ALLOC_ARRAY(ctx->commits.list, ctx->commits.alloc);
 
-	num_extra_edges = 0;
-	if (report_progress)
-		progress = start_delayed_progress(
+	ctx->num_extra_edges = 0;
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
 			_("Finding extra edges in commit graph"),
-			oids.nr);
-	for (i = 0; i < oids.nr; i++) {
+			ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
 		int num_parents = 0;
-		display_progress(progress, i + 1);
-		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
+		display_progress(ctx->progress, i + 1);
+		if (i > 0 && oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
 			continue;
 
-		commits.list[commits.nr] = lookup_commit(the_repository, &oids.list[i]);
-		parse_commit_no_graph(commits.list[commits.nr]);
+		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
+		parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
 
-		for (parent = commits.list[commits.nr]->parents;
+		for (parent = ctx->commits.list[ctx->commits.nr]->parents;
 		     parent; parent = parent->next)
 			num_parents++;
 
 		if (num_parents > 2)
-			num_extra_edges += num_parents - 1;
+			ctx->num_extra_edges += num_parents - 1;
 
-		commits.nr++;
+		ctx->commits.nr++;
 	}
-	num_chunks = num_extra_edges ? 4 : 3;
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
-	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
+	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {
 		error(_("too many commits to write graph"));
 		res = 1;
 		goto cleanup;
 	}
 
-	compute_generation_numbers(&commits, report_progress);
+	compute_generation_numbers(ctx);
 
-	graph_name = get_commit_graph_filename(obj_dir);
-	if (safe_create_leading_directories(graph_name)) {
-		UNLEAK(graph_name);
+	num_chunks = ctx->num_extra_edges ? 4 : 3;
+
+	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
+	if (safe_create_leading_directories(ctx->graph_name)) {
+		UNLEAK(ctx->graph_name);
 		error(_("unable to create leading directories of %s"),
-			graph_name);
+			ctx->graph_name);
 		res = errno;
 		goto cleanup;
 	}
 
-	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
@@ -1084,7 +1081,7 @@ int write_commit_graph(const char *obj_dir,
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
 	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	if (num_extra_edges)
+	if (ctx->num_extra_edges)
 		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
 	else
 		chunk_ids[3] = 0;
@@ -1092,9 +1089,9 @@ int write_commit_graph(const char *obj_dir,
 
 	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
@@ -1105,32 +1102,33 @@ int write_commit_graph(const char *obj_dir,
 		hashwrite(f, chunk_write, 12);
 	}
 
-	if (report_progress) {
+	if (ctx->report_progress) {
 		strbuf_addf(&progress_title,
 			    Q_("Writing out commit graph in %d pass",
 			       "Writing out commit graph in %d passes",
 			       num_chunks),
 			    num_chunks);
-		progress = start_delayed_progress(
+		ctx->progress = start_delayed_progress(
 			progress_title.buf,
-			num_chunks * commits.nr);
+			num_chunks * ctx->commits.nr);
 	}
-	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
-	write_graph_chunk_oids(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
-	write_graph_chunk_data(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
-	if (num_extra_edges)
-		write_graph_chunk_extra_edges(f, commits.list, commits.nr, progress, &progress_cnt);
-	stop_progress(&progress);
+	write_graph_chunk_fanout(f, ctx);
+	write_graph_chunk_oids(f, hashsz, ctx);
+	write_graph_chunk_data(f, hashsz, ctx);
+	if (ctx->num_extra_edges)
+		write_graph_chunk_extra_edges(f, ctx);
+	stop_progress(&ctx->progress);
 	strbuf_release(&progress_title);
 
-	close_commit_graph(the_repository);
+	close_commit_graph(ctx->r);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
 cleanup:
 	free(graph_name);
-	free(commits.list);
-	free(oids.list);
+	free(ctx->commits.list);
+	free(ctx->oids.list);
+	free(ctx);
 
 	return res;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 07/11] commit-graph: extract fill_oids_from_commit_hex()
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (4 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
                         ` (6 subsequent siblings)
  12 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting methods one by one.

Extract fill_oids_from_commit_hex() that reads the given commit
id list and fille the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 72 ++++++++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 32 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 80c7069aaa..fb25280df1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -912,6 +912,44 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
 	return 0;
 }
 
+static void fill_oids_from_commit_hex(struct write_commit_graph_context *ctx,
+				      struct string_list *commit_hex)
+{
+	uint32_t i;
+	struct strbuf progress_title = STRBUF_INIT;
+
+	if (ctx->report_progress) {
+		strbuf_addf(&progress_title,
+			    Q_("Finding commits for commit graph from %d ref",
+			       "Finding commits for commit graph from %d refs",
+			       commit_hex->nr),
+			    commit_hex->nr);
+		ctx->progress = start_delayed_progress(
+					progress_title.buf,
+					commit_hex->nr);
+	}
+	for (i = 0; i < commit_hex->nr; i++) {
+		const char *end;
+		struct object_id oid;
+		struct commit *result;
+
+		display_progress(ctx->progress, i + 1);
+		if (commit_hex->items[i].string &&
+		    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
+			continue;
+
+		result = lookup_commit_reference_gently(ctx->r, &oid, 1);
+
+		if (result) {
+			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+			oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
+			ctx->oids.nr++;
+		}
+	}
+	stop_progress(&ctx->progress);
+	strbuf_release(&progress_title);
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -965,38 +1003,8 @@ int write_commit_graph(const char *obj_dir,
 			goto cleanup;
 	}
 
-	if (commit_hex) {
-		if (ctx->report_progress) {
-			strbuf_addf(&progress_title,
-				    Q_("Finding commits for commit graph from %d ref",
-				       "Finding commits for commit graph from %d refs",
-				       commit_hex->nr),
-				    commit_hex->nr);
-			ctx->progress = start_delayed_progress(
-						progress_title.buf,
-						commit_hex->nr);
-		}
-		for (i = 0; i < commit_hex->nr; i++) {
-			const char *end;
-			struct object_id oid;
-			struct commit *result;
-
-			display_progress(ctx->progress, i + 1);
-			if (commit_hex->items[i].string &&
-			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
-				continue;
-
-			result = lookup_commit_reference_gently(ctx->r, &oid, 1);
-
-			if (result) {
-				ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
-				oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
-				ctx->oids.nr++;
-			}
-		}
-		stop_progress(&ctx->progress);
-		strbuf_reset(&progress_title);
-	}
+	if (commit_hex)
+		fill_oids_from_commit_hex(ctx, commit_hex);
 
 	if (!pack_indexes && !commit_hex) {
 		if (ctx->report_progress)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs()
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (5 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-13  5:05         ` Junio C Hamano
  2019-05-09 14:22       ` [PATCH v4 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
                         ` (5 subsequent siblings)
  12 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting methods one by one.

This extracts fill_oids_from_packs() that reads the given
pack-file list and fills the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 83 ++++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 36 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 58f0f0ae34..80c7069aaa 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -867,6 +867,51 @@ int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
 	return result;
 }
 
+static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
+				struct string_list *pack_indexes)
+{
+	uint32_t i;
+	struct strbuf progress_title = STRBUF_INIT;
+	struct strbuf packname = STRBUF_INIT;
+	int dirlen;
+
+	strbuf_addf(&packname, "%s/pack/", ctx->obj_dir);
+	dirlen = packname.len;
+	if (ctx->report_progress) {
+		strbuf_addf(&progress_title,
+			    Q_("Finding commits for commit graph in %d pack",
+			       "Finding commits for commit graph in %d packs",
+			       pack_indexes->nr),
+			    pack_indexes->nr);
+		ctx->progress = start_delayed_progress(progress_title.buf, 0);
+		ctx->progress_done = 0;
+	}
+	for (i = 0; i < pack_indexes->nr; i++) {
+		struct packed_git *p;
+		strbuf_setlen(&packname, dirlen);
+		strbuf_addstr(&packname, pack_indexes->items[i].string);
+		p = add_packed_git(packname.buf, packname.len, 1);
+		if (!p) {
+			error(_("error adding pack %s"), packname.buf);
+			return 1;
+		}
+		if (open_pack_index(p)) {
+			error(_("error opening index for %s"), packname.buf);
+			return 1;
+		}
+		for_each_object_in_pack(p, add_packed_commits, ctx,
+					FOR_EACH_OBJECT_PACK_ORDER);
+		close_pack(p);
+		free(p);
+	}
+
+	stop_progress(&ctx->progress);
+	strbuf_reset(&progress_title);
+	strbuf_release(&packname);
+
+	return 0;
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -916,42 +961,8 @@ int write_commit_graph(const char *obj_dir,
 	}
 
 	if (pack_indexes) {
-		struct strbuf packname = STRBUF_INIT;
-		int dirlen;
-		strbuf_addf(&packname, "%s/pack/", obj_dir);
-		dirlen = packname.len;
-		if (ctx->report_progress) {
-			strbuf_addf(&progress_title,
-				    Q_("Finding commits for commit graph in %d pack",
-				       "Finding commits for commit graph in %d packs",
-				       pack_indexes->nr),
-				    pack_indexes->nr);
-			ctx->progress = start_delayed_progress(progress_title.buf, 0);
-			ctx->progress_done = 0;
-		}
-		for (i = 0; i < pack_indexes->nr; i++) {
-			struct packed_git *p;
-			strbuf_setlen(&packname, dirlen);
-			strbuf_addstr(&packname, pack_indexes->items[i].string);
-			p = add_packed_git(packname.buf, packname.len, 1);
-			if (!p) {
-				error(_("error adding pack %s"), packname.buf);
-				res = 1;
-				goto cleanup;
-			}
-			if (open_pack_index(p)) {
-				error(_("error opening index for %s"), packname.buf);
-				res = 1;
-				goto cleanup;
-			}
-			for_each_object_in_pack(p, add_packed_commits, ctx,
-						FOR_EACH_OBJECT_PACK_ORDER);
-			close_pack(p);
-			free(p);
-		}
-		stop_progress(&ctx->progress);
-		strbuf_reset(&progress_title);
-		strbuf_release(&packname);
+		if ((res = fill_oids_from_packs(ctx, pack_indexes)))
+			goto cleanup;
 	}
 
 	if (commit_hex) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 08/11] commit-graph: extract fill_oids_from_all_packs()
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (6 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
                         ` (4 subsequent siblings)
  12 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting methods one by one.

Extract fill_oids_from_all_packs() that reads all pack-files
for commits and fills the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fb25280df1..730d529815 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -950,6 +950,19 @@ static void fill_oids_from_commit_hex(struct write_commit_graph_context *ctx,
 	strbuf_release(&progress_title);
 }
 
+static void fill_oids_from_all_packs(struct write_commit_graph_context *ctx)
+{
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+			_("Finding commits for commit graph among packed objects"),
+			ctx->approx_nr_objects);
+	for_each_packed_object(add_packed_commits, ctx,
+			       FOR_EACH_OBJECT_PACK_ORDER);
+	if (ctx->progress_done < ctx->approx_nr_objects)
+		display_progress(ctx->progress, ctx->approx_nr_objects);
+	stop_progress(&ctx->progress);
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -1006,17 +1019,8 @@ int write_commit_graph(const char *obj_dir,
 	if (commit_hex)
 		fill_oids_from_commit_hex(ctx, commit_hex);
 
-	if (!pack_indexes && !commit_hex) {
-		if (ctx->report_progress)
-			ctx->progress = start_delayed_progress(
-				_("Finding commits for commit graph among packed objects"),
-				ctx->approx_nr_objects);
-		for_each_packed_object(add_packed_commits, ctx,
-				       FOR_EACH_OBJECT_PACK_ORDER);
-		if (ctx->progress_done < ctx->approx_nr_objects)
-			display_progress(ctx->progress, ctx->approx_nr_objects);
-		stop_progress(&ctx->progress);
-	}
+	if (!pack_indexes && !commit_hex)
+		fill_oids_from_all_packs(ctx);
 
 	close_reachable(ctx);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 09/11] commit-graph: extract count_distinct_commits()
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (7 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
                         ` (3 subsequent siblings)
  12 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting methods one by one.

Extract count_distinct_commits(), which sorts the oids list, then
iterates through to find duplicates.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 730d529815..f7419c919b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -963,6 +963,27 @@ static void fill_oids_from_all_packs(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
+{
+	uint32_t i, count_distinct = 1;
+
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+			_("Counting distinct commits in commit graph"),
+			ctx->oids.nr);
+	display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
+	QSORT(ctx->oids.list, ctx->oids.nr, commit_compare);
+
+	for (i = 1; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		if (!oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
+			count_distinct++;
+	}
+	stop_progress(&ctx->progress);
+
+	return count_distinct;
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -1024,19 +1045,7 @@ int write_commit_graph(const char *obj_dir,
 
 	close_reachable(ctx);
 
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-			_("Counting distinct commits in commit graph"),
-			ctx->oids.nr);
-	display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
-	QSORT(ctx->oids.list, ctx->oids.nr, commit_compare);
-	count_distinct = 1;
-	for (i = 1; i < ctx->oids.nr; i++) {
-		display_progress(ctx->progress, i + 1);
-		if (!oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
-			count_distinct++;
-	}
-	stop_progress(&ctx->progress);
+	count_distinct = count_distinct_commits(ctx);
 
 	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
 		error(_("the commit graph format cannot write %d commits"), count_distinct);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 10/11] commit-graph: extract copy_oids_to_commits()
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (8 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-09 14:22       ` [PATCH v4 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
                         ` (2 subsequent siblings)
  12 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting methods one by one.

Extract copy_oids_to_commits(), which fills the commits list
with the distinct commits from the oids list. During this loop,
it also counts the number of "extra" edges from octopus merges.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 57 ++++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f7419c919b..16cdd7afb2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -984,6 +984,37 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
 	return count_distinct;
 }
 
+static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
+{
+	uint32_t i;
+	struct commit_list *parent;
+
+	ctx->num_extra_edges = 0;
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+			_("Finding extra edges in commit graph"),
+			ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		int num_parents = 0;
+		display_progress(ctx->progress, i + 1);
+		if (i > 0 && oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
+			continue;
+
+		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
+		parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
+
+		for (parent = ctx->commits.list[ctx->commits.nr]->parents;
+		     parent; parent = parent->next)
+			num_parents++;
+
+		if (num_parents > 2)
+			ctx->num_extra_edges += num_parents - 1;
+
+		ctx->commits.nr++;
+	}
+	stop_progress(&ctx->progress);
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -997,7 +1028,6 @@ int write_commit_graph(const char *obj_dir,
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
 	int num_chunks;
-	struct commit_list *parent;
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
 	int res = 0;
@@ -1056,30 +1086,7 @@ int write_commit_graph(const char *obj_dir,
 	ctx->commits.alloc = count_distinct;
 	ALLOC_ARRAY(ctx->commits.list, ctx->commits.alloc);
 
-	ctx->num_extra_edges = 0;
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-			_("Finding extra edges in commit graph"),
-			ctx->oids.nr);
-	for (i = 0; i < ctx->oids.nr; i++) {
-		int num_parents = 0;
-		display_progress(ctx->progress, i + 1);
-		if (i > 0 && oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
-			continue;
-
-		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
-		parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
-
-		for (parent = ctx->commits.list[ctx->commits.nr]->parents;
-		     parent; parent = parent->next)
-			num_parents++;
-
-		if (num_parents > 2)
-			ctx->num_extra_edges += num_parents - 1;
-
-		ctx->commits.nr++;
-	}
-	stop_progress(&ctx->progress);
+	copy_oids_to_commits(ctx);
 
 	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {
 		error(_("too many commits to write graph"));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v4 11/11] commit-graph: extract write_commit_graph_file()
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (9 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
@ 2019-05-09 14:22       ` Derrick Stolee via GitGitGadget
  2019-05-13  5:09         ` Junio C Hamano
  2019-05-09 17:58       ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Josh Steadmon
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
  12 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-05-09 14:22 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting methods one by one.

Extract write_commit_graph_file() that takes all of the information
in the context struct and writes the data to a commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 155 +++++++++++++++++++++++++------------------------
 1 file changed, 80 insertions(+), 75 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 16cdd7afb2..7723156964 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1015,21 +1015,91 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-int write_commit_graph(const char *obj_dir,
-		       struct string_list *pack_indexes,
-		       struct string_list *commit_hex,
-		       unsigned int flags)
+static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
-	struct write_commit_graph_context *ctx;
+	uint32_t i;
 	struct hashfile *f;
-	uint32_t i, count_distinct = 0;
-	char *graph_name = NULL;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
-	int num_chunks;
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
+	int num_chunks = ctx->num_extra_edges ? 4 : 3;
+
+	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
+	if (safe_create_leading_directories(ctx->graph_name)) {
+		UNLEAK(ctx->graph_name);
+		error(_("unable to create leading directories of %s"),
+			ctx->graph_name);
+		return errno;
+	}
+
+	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
+	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+	hashwrite_be32(f, GRAPH_SIGNATURE);
+
+	hashwrite_u8(f, GRAPH_VERSION);
+	hashwrite_u8(f, oid_version());
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused padding byte */
+
+	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	if (ctx->num_extra_edges)
+		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
+	else
+		chunk_ids[3] = 0;
+	chunk_ids[4] = 0;
+
+	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
+
+	for (i = 0; i <= num_chunks; i++) {
+		uint32_t chunk_write[3];
+
+		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		hashwrite(f, chunk_write, 12);
+	}
+
+	if (ctx->report_progress) {
+		strbuf_addf(&progress_title,
+			    Q_("Writing out commit graph in %d pass",
+			       "Writing out commit graph in %d passes",
+			       num_chunks),
+			    num_chunks);
+		ctx->progress = start_delayed_progress(
+			progress_title.buf,
+			num_chunks * ctx->commits.nr);
+	}
+	write_graph_chunk_fanout(f, ctx);
+	write_graph_chunk_oids(f, hashsz, ctx);
+	write_graph_chunk_data(f, hashsz, ctx);
+	if (ctx->num_extra_edges)
+		write_graph_chunk_extra_edges(f, ctx);
+	stop_progress(&ctx->progress);
+	strbuf_release(&progress_title);
+
+	close_commit_graph(ctx->r);
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+	commit_lock_file(&lk);
+
+	return 0;
+}
+
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       unsigned int flags)
+{
+	struct write_commit_graph_context *ctx;
+	uint32_t i, count_distinct = 0;
 	int res = 0;
 
 	if (!commit_graph_compatible(the_repository))
@@ -1096,75 +1166,10 @@ int write_commit_graph(const char *obj_dir,
 
 	compute_generation_numbers(ctx);
 
-	num_chunks = ctx->num_extra_edges ? 4 : 3;
-
-	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
-	if (safe_create_leading_directories(ctx->graph_name)) {
-		UNLEAK(ctx->graph_name);
-		error(_("unable to create leading directories of %s"),
-			ctx->graph_name);
-		res = errno;
-		goto cleanup;
-	}
-
-	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
-	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
-
-	hashwrite_be32(f, GRAPH_SIGNATURE);
-
-	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, oid_version());
-	hashwrite_u8(f, num_chunks);
-	hashwrite_u8(f, 0); /* unused padding byte */
-
-	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
-	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
-	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	if (ctx->num_extra_edges)
-		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
-	else
-		chunk_ids[3] = 0;
-	chunk_ids[4] = 0;
-
-	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
-	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
-
-	for (i = 0; i <= num_chunks; i++) {
-		uint32_t chunk_write[3];
-
-		chunk_write[0] = htonl(chunk_ids[i]);
-		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
-		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
-		hashwrite(f, chunk_write, 12);
-	}
-
-	if (ctx->report_progress) {
-		strbuf_addf(&progress_title,
-			    Q_("Writing out commit graph in %d pass",
-			       "Writing out commit graph in %d passes",
-			       num_chunks),
-			    num_chunks);
-		ctx->progress = start_delayed_progress(
-			progress_title.buf,
-			num_chunks * ctx->commits.nr);
-	}
-	write_graph_chunk_fanout(f, ctx);
-	write_graph_chunk_oids(f, hashsz, ctx);
-	write_graph_chunk_data(f, hashsz, ctx);
-	if (ctx->num_extra_edges)
-		write_graph_chunk_extra_edges(f, ctx);
-	stop_progress(&ctx->progress);
-	strbuf_release(&progress_title);
-
-	close_commit_graph(ctx->r);
-	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-	commit_lock_file(&lk);
+	res = write_commit_graph_file(ctx);
 
 cleanup:
-	free(graph_name);
+	free(ctx->graph_name);
 	free(ctx->commits.list);
 	free(ctx->oids.list);
 	free(ctx);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2)
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (10 preceding siblings ...)
  2019-05-09 14:22       ` [PATCH v4 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
@ 2019-05-09 17:58       ` Josh Steadmon
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
  12 siblings, 0 replies; 89+ messages in thread
From: Josh Steadmon @ 2019-05-09 17:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Junio C Hamano

On 2019.05.09 07:22, Derrick Stolee via GitGitGadget wrote:
> This series replaces ds/commit-graph-file-v2, and I'm using the same
> gitgitgadget PR to continue the version numbers and hopefully make that
> clear. This is a slight modification on patches 1-11 from the incremental
> file format RFC [0].
> 
> The commit-graph feature is growing, thanks to all of the contributions by
> several community members. This also means that the write_commit_graph()
> method is a bit unwieldy now. This series refactors that method to use a
> write_commit_graph_context struct that is passed between several smaller
> methods. The final result should be a write_commit_graph() method that has a
> clear set of steps. Future changes should then be easier to understand.

This series looks good to me.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 01/11] commit-graph: fix the_repository reference
  2019-05-09 14:22       ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
@ 2019-05-13  2:56         ` Junio C Hamano
  0 siblings, 0 replies; 89+ messages in thread
From: Junio C Hamano @ 2019-05-13  2:56 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The parse_commit_buffer() method takes a repository pointer, so it
> should not refer to the_repository anymore.

Yup, makes sense.  Thanks for spotting.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/commit.c b/commit.c
> index a5333c7ac6..e4d1233226 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -443,7 +443,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
>  	item->date = parse_commit_date(bufptr, tail);
>  
>  	if (check_graph)
> -		load_commit_graph_info(the_repository, item);
> +		load_commit_graph_info(r, item);
>  
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 02/11] commit-graph: return with errors during write
  2019-05-09 14:22       ` [PATCH v4 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-05-13  3:13         ` Junio C Hamano
  2019-05-13 11:04           ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Junio C Hamano @ 2019-05-13  3:13 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() method uses die() to report failure and
> exit when confronted with an unexpected condition. This use of
> die() in a library function is incorrect and is now replaced by
> error() statements and an int return type.

OK.

> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 537fdfd0f0..2e86251f02 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
>  	struct string_list *pack_indexes = NULL;
>  	struct string_list *commit_hex = NULL;
>  	struct string_list lines;
> +	int result;
>  
>  	static struct option builtin_commit_graph_write_options[] = {
>  		OPT_STRING(0, "object-dir", &opts.obj_dir,
> @@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
>  
>  	read_replace_refs = 0;
>  
> -	if (opts.reachable) {
> -		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
> -		return 0;
> -	}
> +	if (opts.reachable)
> +		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
>  
>  	string_list_init(&lines, 0);
>  	if (opts.stdin_packs || opts.stdin_commits) {
> @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
>  		UNLEAK(buf);
>  	}
>  
> -	write_commit_graph(opts.obj_dir,
> -			   pack_indexes,
> -			   commit_hex,
> -			   opts.append,
> -			   1);
> +	result = write_commit_graph(opts.obj_dir,
> +				    pack_indexes,
> +				    commit_hex,
> +				    opts.append,
> +				    1);
>  
>  	UNLEAK(lines);
> -	return 0;
> +	return result;
>  }

What were the error values this function used to return?  I am
wondering if the callers of this function are prepraed to see the
returned values from write_commit_graph() this function stores in
'result' (which presumably are small negative value like our usual
internal API convention)?


> diff --git a/builtin/commit.c b/builtin/commit.c
> index 2986553d5f..b9ea7222fa 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  		      "new_index file. Check that disk is not full and quota is\n"
>  		      "not exceeded, and then \"git reset HEAD\" to recover."));
>  
> -	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
> -		write_commit_graph_reachable(get_object_directory(), 0, 0);
> +	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
> +	    write_commit_graph_reachable(get_object_directory(), 0, 0))
> +		return 1;

This is good.  An error signalled as a small negative integer would
not seep thru to the exit status but is explicitly turned into 1
with this change.

> +	if (gc_write_commit_graph &&
> +	    write_commit_graph_reachable(get_object_directory(), 0,
> +					 !quiet && !daemonized))
> +		return 1;

Ditto.

> +int write_commit_graph_reachable(const char *obj_dir, int append,
> +				 int report_progress)
>  {
>  	struct string_list list = STRING_LIST_INIT_DUP;
> +	int result;
>  
>  	for_each_ref(add_ref_to_list, &list);
> -	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
> +	result = write_commit_graph(obj_dir, NULL, &list,
> +				    append, report_progress);
>  
>  	string_list_clear(&list, 0);
> +	return result;
>  }

OK.  The callers of write_commit_graph_reachable() can be careful
about its return values to the same degree as the callers of
write_commit_graph().

These functions perhaps deserves
/*
 * returns X when ....
 */
in front (or in *.h)?

> +int write_commit_graph(const char *obj_dir,
> +		       struct string_list *pack_indexes,
> +		       struct string_list *commit_hex,
> +		       int append, int report_progress)
>  {
>  	struct packed_oid_list oids;
>  	struct packed_commit_list commits;
>  	struct hashfile *f;
>  	uint32_t i, count_distinct = 0;
> -	char *graph_name;
> +	char *graph_name = NULL;
>  	struct lock_file lk = LOCK_INIT;
>  	uint32_t chunk_ids[5];
>  	uint64_t chunk_offsets[5];
> @@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
>  	uint64_t progress_cnt = 0;
>  	struct strbuf progress_title = STRBUF_INIT;
>  	unsigned long approx_nr_objects;
> +	int res = 0;
>  
>  	if (!commit_graph_compatible(the_repository))
> -		return;
> +		return 0;

OK.  I tend to find "return 0" easier to read/follow than "return
res" here.

>  	oids.nr = 0;
>  	approx_nr_objects = approximate_object_count();
>  	oids.alloc = approx_nr_objects / 32;
>  	oids.progress = NULL;
>  	oids.progress_done = 0;
> +	commits.list = NULL;
>  
>  	if (append) {
>  		prepare_commit_graph_one(the_repository, obj_dir);
> @@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
>  			strbuf_setlen(&packname, dirlen);
>  			strbuf_addstr(&packname, pack_indexes->items[i].string);
>  			p = add_packed_git(packname.buf, packname.len, 1);
> -			if (!p)
> -				die(_("error adding pack %s"), packname.buf);
> -			if (open_pack_index(p))
> -				die(_("error opening index for %s"), packname.buf);
> +			if (!p) {
> +				error(_("error adding pack %s"), packname.buf);
> +				res = 1;
> +				goto cleanup;
> +			}
> +			if (open_pack_index(p)) {
> +				error(_("error opening index for %s"), packname.buf);
> +				res = 1;
> +				goto cleanup;
> +			}

Hmph, is this signal an error by returning a positive "1"?  That's a
bit unusual.

> @@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
>  	}
>  	stop_progress(&progress);
>  
> -	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
> -		die(_("the commit graph format cannot write %d commits"), count_distinct);
> +	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
> +		error(_("the commit graph format cannot write %d commits"), count_distinct);
> +		res = 1;
> +		goto cleanup;
> +	}
>  
>  	commits.nr = 0;
>  	commits.alloc = count_distinct;
> @@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
>  	num_chunks = num_extra_edges ? 4 : 3;
>  	stop_progress(&progress);
>  
> -	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
> -		die(_("too many commits to write graph"));
> +	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
> +		error(_("too many commits to write graph"));
> +		res = 1;
> +		goto cleanup;
> +	}
>  
>  	compute_generation_numbers(&commits, report_progress);
>  
>  	graph_name = get_commit_graph_filename(obj_dir);
>  	if (safe_create_leading_directories(graph_name)) {
>  		UNLEAK(graph_name);
> -		die_errno(_("unable to create leading directories of %s"),
> -			  graph_name);
> +		error(_("unable to create leading directories of %s"),
> +			graph_name);
> +		res = errno;
> +		goto cleanup;
>  	}

Hmph.  Do we know errno==0 means no error everywhere?  Do we know
errno==1 is not used by anybody as a meaningful value?

What I am getting at is if a hardcoded "1" we saw above as "error
exists but we are not telling the caller what kind of system-level
error led to it by returning errno" (and a hardcoded "0" as "there
is no error") are consistent with this use of "res" where "the
callers are allowed to learn what system-level error led to this
error return from this function by sending the return value of this
function to strerror() or comparing with EWHATEVER".  I do not think
this is a good design.


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 03/11] commit-graph: collapse parameters into flags
  2019-05-09 14:22       ` [PATCH v4 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
@ 2019-05-13  3:44         ` Junio C Hamano
  2019-05-13 11:07           ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Junio C Hamano @ 2019-05-13  3:44 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() and write_commit_graph_reachable() methods
> currently take two boolean parameters: 'append' and 'report_progress'.
> We will soon expand the possible options to send to these methods, so
> instead of complicating the parameter list, first simplify it.

I think this change to introduce "flags" and pack these two into a
single parameter, even if there is no plan to add code that starts
using third and subsequent bits immediately.

We are no longer adding anything beyond PROGRESS and APPEND in this
series, no?

>
> Collapse these parameters into a 'flags' parameter, and adjust the
> callers to provide flags as necessary.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit-graph.c | 8 +++++---
>  builtin/commit.c       | 2 +-
>  builtin/gc.c           | 4 ++--
>  commit-graph.c         | 9 +++++----
>  commit-graph.h         | 8 +++++---
>  5 files changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 2e86251f02..828b1a713f 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
>  	struct string_list *commit_hex = NULL;
>  	struct string_list lines;
>  	int result;
> +	int flags = COMMIT_GRAPH_PROGRESS;

Make it a habit to use "unsigned" not a signed type, when you pack a
collection of bits into a flag word, unless you are treating the MSB
specially, e.g. checking to see if it is negative is cheaper than
masking with MSB to see if it is set.

> ...
>  	result = write_commit_graph(opts.obj_dir,
>  				    pack_indexes,
>  				    commit_hex,
> -				    opts.append,
> -				    1);
> +				    flags);
> ...
> -int write_commit_graph_reachable(const char *obj_dir, int append,
> -				 int report_progress)
> +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
> ...
>  int write_commit_graph(const char *obj_dir,
>  		       struct string_list *pack_indexes,
>  		       struct string_list *commit_hex,
> -		       int append, int report_progress)
> +		       unsigned int flags)

OK, so the receivers of the flags word know the collection is
unsigned; it's just the user of the API in graph_write() that gets
the signedness wrong.  OK, easy enough to correct, I guess.


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs()
  2019-05-09 14:22       ` [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
@ 2019-05-13  5:05         ` Junio C Hamano
  0 siblings, 0 replies; 89+ messages in thread
From: Junio C Hamano @ 2019-05-13  5:05 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() method is too complex, so we are
> extracting methods one by one.
>
> This extracts fill_oids_from_packs() that reads the given
> pack-file list and fills the oid list in the context.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---

Quite straight-forward.  Looking good.

Thanks.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 11/11] commit-graph: extract write_commit_graph_file()
  2019-05-09 14:22       ` [PATCH v4 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
@ 2019-05-13  5:09         ` Junio C Hamano
  0 siblings, 0 replies; 89+ messages in thread
From: Junio C Hamano @ 2019-05-13  5:09 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The write_commit_graph() method is too complex, so we are
> extracting methods one by one.
>
> Extract write_commit_graph_file() that takes all of the information
> in the context struct and writes the data to a commit-graph file.

The later parts of splitting pieces out of write_commit_graph() into
separate helper functions look all sensible.  One big benefit of
doing this, even if each of these helper functions have a single
caller, is that each of these individual steps now has a descriptive
name.

Module a few nits (and possibly s/method/helper function/g), the
series look good to me.

Thanks.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c | 155 +++++++++++++++++++++++++------------------------
>  1 file changed, 80 insertions(+), 75 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 16cdd7afb2..7723156964 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1015,21 +1015,91 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
>  	stop_progress(&ctx->progress);
>  }
>  
> -int write_commit_graph(const char *obj_dir,
> -		       struct string_list *pack_indexes,
> -		       struct string_list *commit_hex,
> -		       unsigned int flags)
> +static int write_commit_graph_file(struct write_commit_graph_context *ctx)
>  {
> -	struct write_commit_graph_context *ctx;
> +	uint32_t i;
>  	struct hashfile *f;
> -	uint32_t i, count_distinct = 0;
> -	char *graph_name = NULL;
>  	struct lock_file lk = LOCK_INIT;
>  	uint32_t chunk_ids[5];
>  	uint64_t chunk_offsets[5];
> -	int num_chunks;
>  	const unsigned hashsz = the_hash_algo->rawsz;
>  	struct strbuf progress_title = STRBUF_INIT;
> +	int num_chunks = ctx->num_extra_edges ? 4 : 3;
> +
> +	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
> +	if (safe_create_leading_directories(ctx->graph_name)) {
> +		UNLEAK(ctx->graph_name);
> +		error(_("unable to create leading directories of %s"),
> +			ctx->graph_name);
> +		return errno;
> +	}
> +
> +	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
> +	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
> +
> +	hashwrite_be32(f, GRAPH_SIGNATURE);
> +
> +	hashwrite_u8(f, GRAPH_VERSION);
> +	hashwrite_u8(f, oid_version());
> +	hashwrite_u8(f, num_chunks);
> +	hashwrite_u8(f, 0); /* unused padding byte */
> +
> +	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
> +	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
> +	chunk_ids[2] = GRAPH_CHUNKID_DATA;
> +	if (ctx->num_extra_edges)
> +		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
> +	else
> +		chunk_ids[3] = 0;
> +	chunk_ids[4] = 0;
> +
> +	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
> +	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
> +	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
> +	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
> +	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
> +
> +	for (i = 0; i <= num_chunks; i++) {
> +		uint32_t chunk_write[3];
> +
> +		chunk_write[0] = htonl(chunk_ids[i]);
> +		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
> +		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
> +		hashwrite(f, chunk_write, 12);
> +	}
> +
> +	if (ctx->report_progress) {
> +		strbuf_addf(&progress_title,
> +			    Q_("Writing out commit graph in %d pass",
> +			       "Writing out commit graph in %d passes",
> +			       num_chunks),
> +			    num_chunks);
> +		ctx->progress = start_delayed_progress(
> +			progress_title.buf,
> +			num_chunks * ctx->commits.nr);
> +	}
> +	write_graph_chunk_fanout(f, ctx);
> +	write_graph_chunk_oids(f, hashsz, ctx);
> +	write_graph_chunk_data(f, hashsz, ctx);
> +	if (ctx->num_extra_edges)
> +		write_graph_chunk_extra_edges(f, ctx);
> +	stop_progress(&ctx->progress);
> +	strbuf_release(&progress_title);
> +
> +	close_commit_graph(ctx->r);
> +	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
> +	commit_lock_file(&lk);
> +
> +	return 0;
> +}
> +
> +int write_commit_graph(const char *obj_dir,
> +		       struct string_list *pack_indexes,
> +		       struct string_list *commit_hex,
> +		       unsigned int flags)
> +{
> +	struct write_commit_graph_context *ctx;
> +	uint32_t i, count_distinct = 0;
>  	int res = 0;
>  
>  	if (!commit_graph_compatible(the_repository))
> @@ -1096,75 +1166,10 @@ int write_commit_graph(const char *obj_dir,
>  
>  	compute_generation_numbers(ctx);
>  
> -	num_chunks = ctx->num_extra_edges ? 4 : 3;
> -
> -	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
> -	if (safe_create_leading_directories(ctx->graph_name)) {
> -		UNLEAK(ctx->graph_name);
> -		error(_("unable to create leading directories of %s"),
> -			ctx->graph_name);
> -		res = errno;
> -		goto cleanup;
> -	}
> -
> -	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
> -	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
> -
> -	hashwrite_be32(f, GRAPH_SIGNATURE);
> -
> -	hashwrite_u8(f, GRAPH_VERSION);
> -	hashwrite_u8(f, oid_version());
> -	hashwrite_u8(f, num_chunks);
> -	hashwrite_u8(f, 0); /* unused padding byte */
> -
> -	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
> -	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
> -	chunk_ids[2] = GRAPH_CHUNKID_DATA;
> -	if (ctx->num_extra_edges)
> -		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
> -	else
> -		chunk_ids[3] = 0;
> -	chunk_ids[4] = 0;
> -
> -	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
> -	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
> -	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
> -	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
> -	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
> -
> -	for (i = 0; i <= num_chunks; i++) {
> -		uint32_t chunk_write[3];
> -
> -		chunk_write[0] = htonl(chunk_ids[i]);
> -		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
> -		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
> -		hashwrite(f, chunk_write, 12);
> -	}
> -
> -	if (ctx->report_progress) {
> -		strbuf_addf(&progress_title,
> -			    Q_("Writing out commit graph in %d pass",
> -			       "Writing out commit graph in %d passes",
> -			       num_chunks),
> -			    num_chunks);
> -		ctx->progress = start_delayed_progress(
> -			progress_title.buf,
> -			num_chunks * ctx->commits.nr);
> -	}
> -	write_graph_chunk_fanout(f, ctx);
> -	write_graph_chunk_oids(f, hashsz, ctx);
> -	write_graph_chunk_data(f, hashsz, ctx);
> -	if (ctx->num_extra_edges)
> -		write_graph_chunk_extra_edges(f, ctx);
> -	stop_progress(&ctx->progress);
> -	strbuf_release(&progress_title);
> -
> -	close_commit_graph(ctx->r);
> -	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
> -	commit_lock_file(&lk);
> +	res = write_commit_graph_file(ctx);
>  
>  cleanup:
> -	free(graph_name);
> +	free(ctx->graph_name);
>  	free(ctx->commits.list);
>  	free(ctx->oids.list);
>  	free(ctx);

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 02/11] commit-graph: return with errors during write
  2019-05-13  3:13         ` Junio C Hamano
@ 2019-05-13 11:04           ` Derrick Stolee
  2019-05-13 11:22             ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee @ 2019-05-13 11:04 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

On 5/12/2019 11:13 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>> @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
>>  		UNLEAK(buf);
>>  	}
>>  
>> -	write_commit_graph(opts.obj_dir,
>> -			   pack_indexes,
>> -			   commit_hex,
>> -			   opts.append,
>> -			   1);
>> +	result = write_commit_graph(opts.obj_dir,
>> +				    pack_indexes,
>> +				    commit_hex,
>> +				    opts.append,
>> +				    1);
>>  
>>  	UNLEAK(lines);
>> -	return 0;
>> +	return result;
>>  }
> 
> What were the error values this function used to return?  I am
> wondering if the callers of this function are prepraed to see the
> returned values from write_commit_graph() this function stores in
> 'result' (which presumably are small negative value like our usual
> internal API convention)?

The only caller is cmd_commit_graph() and it is in this snippet:

        if (argc > 0) {
                if (!strcmp(argv[0], "read"))
                        return graph_read(argc, argv);
                if (!strcmp(argv[0], "verify"))
                        return graph_verify(argc, argv);
                if (!strcmp(argv[0], "write"))
                        return graph_write(argc, argv);
        }

So these return values are passed directly to the result of the
builtin. If that is against convention (passing an error code from
the library to the result of the builtin) then I can modify.

> OK.  The callers of write_commit_graph_reachable() can be careful
> about its return values to the same degree as the callers of
> write_commit_graph().
> 
> These functions perhaps deserves
> /*
>  * returns X when ....
>  */
> in front (or in *.h)?

Can do, in commit-graph.h.

>> +int write_commit_graph(const char *obj_dir,
>> +		       struct string_list *pack_indexes,
>> +		       struct string_list *commit_hex,
>> +		       int append, int report_progress)
>>  {
>>  	struct packed_oid_list oids;
>>  	struct packed_commit_list commits;
>>  	struct hashfile *f;
>>  	uint32_t i, count_distinct = 0;
>> -	char *graph_name;
>> +	char *graph_name = NULL;
>>  	struct lock_file lk = LOCK_INIT;
>>  	uint32_t chunk_ids[5];
>>  	uint64_t chunk_offsets[5];
>> @@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
>>  	uint64_t progress_cnt = 0;
>>  	struct strbuf progress_title = STRBUF_INIT;
>>  	unsigned long approx_nr_objects;
>> +	int res = 0;
>>  
>>  	if (!commit_graph_compatible(the_repository))
>> -		return;
>> +		return 0;
> 
> OK.  I tend to find "return 0" easier to read/follow than "return
> res" here.

Yes, this choice was deliberate as there is no cleanup to do if we
return this early. Also note that we don't "fail" because we did
exactly as much work as we expect in this scenario. I'll be careful
to point this out when I add a comment to the header file.

>>  	oids.nr = 0;
>>  	approx_nr_objects = approximate_object_count();
>>  	oids.alloc = approx_nr_objects / 32;
>>  	oids.progress = NULL;
>>  	oids.progress_done = 0;
>> +	commits.list = NULL;
>>  
>>  	if (append) {
>>  		prepare_commit_graph_one(the_repository, obj_dir);
>> @@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
>>  			strbuf_setlen(&packname, dirlen);
>>  			strbuf_addstr(&packname, pack_indexes->items[i].string);
>>  			p = add_packed_git(packname.buf, packname.len, 1);
>> -			if (!p)
>> -				die(_("error adding pack %s"), packname.buf);
>> -			if (open_pack_index(p))
>> -				die(_("error opening index for %s"), packname.buf);
>> +			if (!p) {
>> +				error(_("error adding pack %s"), packname.buf);
>> +				res = 1;
>> +				goto cleanup;
>> +			}
>> +			if (open_pack_index(p)) {
>> +				error(_("error opening index for %s"), packname.buf);
>> +				res = 1;
>> +				goto cleanup;
>> +			}
> 
> Hmph, is this signal an error by returning a positive "1"?  That's a
> bit unusual.

Your hint above of "passing a negative value by convention" did make me
think I must be doing something wrong.

>> @@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
>>  	}
>>  	stop_progress(&progress);
>>  
>> -	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
>> -		die(_("the commit graph format cannot write %d commits"), count_distinct);
>> +	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
>> +		error(_("the commit graph format cannot write %d commits"), count_distinct);
>> +		res = 1;
>> +		goto cleanup;
>> +	}
>>  
>>  	commits.nr = 0;
>>  	commits.alloc = count_distinct;
>> @@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
>>  	num_chunks = num_extra_edges ? 4 : 3;
>>  	stop_progress(&progress);
>>  
>> -	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
>> -		die(_("too many commits to write graph"));
>> +	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
>> +		error(_("too many commits to write graph"));
>> +		res = 1;
>> +		goto cleanup;
>> +	}
>>  
>>  	compute_generation_numbers(&commits, report_progress);
>>  
>>  	graph_name = get_commit_graph_filename(obj_dir);
>>  	if (safe_create_leading_directories(graph_name)) {
>>  		UNLEAK(graph_name);
>> -		die_errno(_("unable to create leading directories of %s"),
>> -			  graph_name);
>> +		error(_("unable to create leading directories of %s"),
>> +			graph_name);
>> +		res = errno;
>> +		goto cleanup;
>>  	}
> 
> Hmph.  Do we know errno==0 means no error everywhere?  Do we know
> errno==1 is not used by anybody as a meaningful value?
> 
> What I am getting at is if a hardcoded "1" we saw above as "error
> exists but we are not telling the caller what kind of system-level
> error led to it by returning errno" (and a hardcoded "0" as "there
> is no error") are consistent with this use of "res" where "the
> callers are allowed to learn what system-level error led to this
> error return from this function by sending the return value of this
> function to strerror() or comparing with EWHATEVER".  I do not think
> this is a good design.

That's a good point. In a new design, would you like me to (1) ignore
errno here and use a constant value for "write_commit_graph() failed
at some point" or to (2) split the possible _reasons_ for the failure
into different constants? I believe the use of error() should prevent
the need for the second option. The first option would only change
this 'res = errno' into 'res = 1'.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 03/11] commit-graph: collapse parameters into flags
  2019-05-13  3:44         ` Junio C Hamano
@ 2019-05-13 11:07           ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-05-13 11:07 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

On 5/12/2019 11:44 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The write_commit_graph() and write_commit_graph_reachable() methods
>> currently take two boolean parameters: 'append' and 'report_progress'.
>> We will soon expand the possible options to send to these methods, so
>> instead of complicating the parameter list, first simplify it.
> 
> I think this change to introduce "flags" and pack these two into a
> single parameter, even if there is no plan to add code that starts
> using third and subsequent bits immediately.
> 
> We are no longer adding anything beyond PROGRESS and APPEND in this
> series, no?

In this series, we are no longer expanding the options. I will add
a flag when I update the incremental file format series. I can modify
the message to no longer hint at an immediate addition.

>>
>> Collapse these parameters into a 'flags' parameter, and adjust the
>> callers to provide flags as necessary.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  builtin/commit-graph.c | 8 +++++---
>>  builtin/commit.c       | 2 +-
>>  builtin/gc.c           | 4 ++--
>>  commit-graph.c         | 9 +++++----
>>  commit-graph.h         | 8 +++++---
>>  5 files changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>> index 2e86251f02..828b1a713f 100644
>> --- a/builtin/commit-graph.c
>> +++ b/builtin/commit-graph.c
>> @@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
>>  	struct string_list *commit_hex = NULL;
>>  	struct string_list lines;
>>  	int result;
>> +	int flags = COMMIT_GRAPH_PROGRESS;
> 
> Make it a habit to use "unsigned" not a signed type, when you pack a
> collection of bits into a flag word, unless you are treating the MSB
> specially, e.g. checking to see if it is negative is cheaper than
> masking with MSB to see if it is set.

Ah sorry. I missed this one after changing the parameter in your earlier
feedback.
 
>> ...
>>  	result = write_commit_graph(opts.obj_dir,
>>  				    pack_indexes,
>>  				    commit_hex,
>> -				    opts.append,
>> -				    1);
>> +				    flags);
>> ...
>> -int write_commit_graph_reachable(const char *obj_dir, int append,
>> -				 int report_progress)
>> +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
>> ...
>>  int write_commit_graph(const char *obj_dir,
>>  		       struct string_list *pack_indexes,
>>  		       struct string_list *commit_hex,
>> -		       int append, int report_progress)
>> +		       unsigned int flags)
> 
> OK, so the receivers of the flags word know the collection is
> unsigned; it's just the user of the API in graph_write() that gets
> the signedness wrong.  OK, easy enough to correct, I guess.
> 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v4 02/11] commit-graph: return with errors during write
  2019-05-13 11:04           ` Derrick Stolee
@ 2019-05-13 11:22             ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-05-13 11:22 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Derrick Stolee

On 5/13/2019 7:04 AM, Derrick Stolee wrote:
> On 5/12/2019 11:13 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>> @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
>>>  		UNLEAK(buf);
>>>  	}
>>>  
>>> -	write_commit_graph(opts.obj_dir,
>>> -			   pack_indexes,
>>> -			   commit_hex,
>>> -			   opts.append,
>>> -			   1);
>>> +	result = write_commit_graph(opts.obj_dir,
>>> +				    pack_indexes,
>>> +				    commit_hex,
>>> +				    opts.append,
>>> +				    1);
>>>  
>>>  	UNLEAK(lines);
>>> -	return 0;
>>> +	return result;
>>>  }
>>
>> What were the error values this function used to return?  I am
>> wondering if the callers of this function are prepraed to see the
>> returned values from write_commit_graph() this function stores in
>> 'result' (which presumably are small negative value like our usual
>> internal API convention)?
> 
> The only caller is cmd_commit_graph() and it is in this snippet:
> 
>         if (argc > 0) {
>                 if (!strcmp(argv[0], "read"))
>                         return graph_read(argc, argv);
>                 if (!strcmp(argv[0], "verify"))
>                         return graph_verify(argc, argv);
>                 if (!strcmp(argv[0], "write"))
>                         return graph_write(argc, argv);
>         }
> 
> So these return values are passed directly to the result of the
> builtin. If that is against convention (passing an error code from
> the library to the result of the builtin) then I can modify.

And I see from your other feedback (upon re-reading) that you prefer
translating a negative error value from the library into a "1" here
for the builtin.

As I prepare my next version, I'll have write_commit_graph() return -1
for all errors and have graph_write() translate that to a 1. But I'll
wait to see if you want more specific error codes from write_commit_graph().

-Stolee


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 00/11] Commit-graph write refactor (was: Create commit-graph file format v2)
  2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
                         ` (11 preceding siblings ...)
  2019-05-09 17:58       ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Josh Steadmon
@ 2019-06-12 13:29       ` " Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
                           ` (10 more replies)
  12 siblings, 11 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano

This series replaces ds/commit-graph-file-v2, and I'm using the same
gitgitgadget PR to continue the version numbers and hopefully make that
clear. This is a slight modification on patches 1-11 from the incremental
file format RFC [0].

The commit-graph feature is growing, thanks to all of the contributions by
several community members. This also means that the write_commit_graph()
method is a bit unwieldy now. This series refactors that method to use a
write_commit_graph_context struct that is passed between several smaller
methods. The final result should be a write_commit_graph() method that has a
clear set of steps. Future changes should then be easier to understand.

 * Patches 1-4: these are small changes which either fix issues or just
   provide clean-up. These are mostly borrowed from
   ds/commit-graph-format-v2. 
   
   
 * Patches 5-11: these provide a non-functional refactor of
   write_commit_graph() into several methods using a "struct
   write_commit_graph_context" to share across the methods.
   
   

Updates to commits previously in this thread:

 * "commit-graph: remove Future Work section" no longer says that 'verify'
   takes as long as 'write'. [1]
   
   
 * "commit-graph: return with errors during write" now has a test to check
   we don't die(). [2]
   
   

Ævar: Looking at the old thread, I only saw two comments that still apply to
this series [1] [2]. Please point me to any comments I have missed.

Updates in V5:

 * API calls are updated to return 0 on success and a negative value on
   failure.
   
   
 * Stopped passing 'errno' through an API function, instead returns -1.
   
   
 * "extracting methods" -> "extracting helper functions" in commit messages.
   
   
 * flags are now unsigned ints.
   
   

Thanks, -Stolee

[0] https://public-inbox.org/git/pull.184.git.gitgitgadget@gmail.com/

[1] https://public-inbox.org/git/87o94mql0a.fsf@evledraar.gmail.com/

[2] https://public-inbox.org/git/87pnp2qlkv.fsf@evledraar.gmail.com/

Derrick Stolee (11):
  commit-graph: fix the_repository reference
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: remove Future Work section
  commit-graph: create write_commit_graph_context
  commit-graph: extract fill_oids_from_packs()
  commit-graph: extract fill_oids_from_commit_hex()
  commit-graph: extract fill_oids_from_all_packs()
  commit-graph: extract count_distinct_commits()
  commit-graph: extract copy_oids_to_commits()
  commit-graph: extract write_commit_graph_file()

 Documentation/technical/commit-graph.txt |  17 -
 builtin/commit-graph.c                   |  22 +-
 builtin/commit.c                         |   5 +-
 builtin/gc.c                             |   7 +-
 commit-graph.c                           | 607 +++++++++++++----------
 commit-graph.h                           |  20 +-
 commit.c                                 |   2 +-
 t/t5318-commit-graph.sh                  |   8 +
 8 files changed, 378 insertions(+), 310 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-112/derrickstolee/graph/v2-head-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/112

Range-diff vs v4:

  1:  0be7713a25 =  1:  0be7713a25 commit-graph: fix the_repository reference
  2:  a4082b827e !  2:  95f66e85b2 commit-graph: return with errors during write
     @@ -5,14 +5,17 @@
          The write_commit_graph() method uses die() to report failure and
          exit when confronted with an unexpected condition. This use of
          die() in a library function is incorrect and is now replaced by
     -    error() statements and an int return type.
     +    error() statements and an int return type. Return zero on success
     +    and a negative value on failure.
      
          Now that we use 'goto cleanup' to jump to the terminal condition
          on an error, we have new paths that could lead to uninitialized
          values. New initializers are added to correct for this.
      
          The builtins 'commit-graph', 'gc', and 'commit' call these methods,
     -    so update them to check the return value.
     +    so update them to check the return value. Test that 'git commit-graph
     +    write' returns a proper error code when hitting a failure condition
     +    in write_commit_graph().
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -23,7 +26,7 @@
       	struct string_list *pack_indexes = NULL;
       	struct string_list *commit_hex = NULL;
       	struct string_list lines;
     -+	int result;
     ++	int result = 0;
       
       	static struct option builtin_commit_graph_write_options[] = {
       		OPT_STRING(0, "object-dir", &opts.obj_dir,
     @@ -49,11 +52,12 @@
      -			   commit_hex,
      -			   opts.append,
      -			   1);
     -+	result = write_commit_graph(opts.obj_dir,
     -+				    pack_indexes,
     -+				    commit_hex,
     -+				    opts.append,
     -+				    1);
     ++	if (write_commit_graph(opts.obj_dir,
     ++			       pack_indexes,
     ++			       commit_hex,
     ++			       opts.append,
     ++			       1))
     ++		result = 1;
       
       	UNLEAK(lines);
      -	return 0;
     @@ -167,12 +171,12 @@
      -				die(_("error opening index for %s"), packname.buf);
      +			if (!p) {
      +				error(_("error adding pack %s"), packname.buf);
     -+				res = 1;
     ++				res = -1;
      +				goto cleanup;
      +			}
      +			if (open_pack_index(p)) {
      +				error(_("error opening index for %s"), packname.buf);
     -+				res = 1;
     ++				res = -1;
      +				goto cleanup;
      +			}
       			for_each_object_in_pack(p, add_packed_commits, &oids,
     @@ -186,7 +190,7 @@
      -		die(_("the commit graph format cannot write %d commits"), count_distinct);
      +	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
      +		error(_("the commit graph format cannot write %d commits"), count_distinct);
     -+		res = 1;
     ++		res = -1;
      +		goto cleanup;
      +	}
       
     @@ -200,7 +204,7 @@
      -		die(_("too many commits to write graph"));
      +	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
      +		error(_("too many commits to write graph"));
     -+		res = 1;
     ++		res = -1;
      +		goto cleanup;
      +	}
       
     @@ -213,7 +217,7 @@
      -			  graph_name);
      +		error(_("unable to create leading directories of %s"),
      +			graph_name);
     -+		res = errno;
     ++		res = -1;
      +		goto cleanup;
       	}
       
     @@ -240,6 +244,12 @@
       int generation_numbers_enabled(struct repository *r);
       
      -void write_commit_graph_reachable(const char *obj_dir, int append,
     ++/*
     ++ * The write_commit_graph* methods return zero on success
     ++ * and a negative value on failure. Note that if the repository
     ++ * is not compatible with the commit-graph feature, then the
     ++ * methods will return 0 without writing a commit-graph.
     ++ */
      +int write_commit_graph_reachable(const char *obj_dir, int append,
       				  int report_progress);
      -void write_commit_graph(const char *obj_dir,
  3:  469d0c9a32 !  3:  b4e3ae579a commit-graph: collapse parameters into flags
     @@ -4,8 +4,8 @@
      
          The write_commit_graph() and write_commit_graph_reachable() methods
          currently take two boolean parameters: 'append' and 'report_progress'.
     -    We will soon expand the possible options to send to these methods, so
     -    instead of complicating the parameter list, first simplify it.
     +    As we update these methods, adding more parameters this way becomes
     +    cluttered and hard to maintain.
      
          Collapse these parameters into a 'flags' parameter, and adjust the
          callers to provide flags as necessary.
     @@ -18,8 +18,8 @@
      @@
       	struct string_list *commit_hex = NULL;
       	struct string_list lines;
     - 	int result;
     -+	int flags = COMMIT_GRAPH_PROGRESS;
     + 	int result = 0;
     ++	unsigned int flags = COMMIT_GRAPH_PROGRESS;
       
       	static struct option builtin_commit_graph_write_options[] = {
       		OPT_STRING(0, "object-dir", &opts.obj_dir,
     @@ -39,15 +39,15 @@
       	string_list_init(&lines, 0);
       	if (opts.stdin_packs || opts.stdin_commits) {
      @@
     - 	result = write_commit_graph(opts.obj_dir,
     - 				    pack_indexes,
     - 				    commit_hex,
     --				    opts.append,
     --				    1);
     -+				    flags);
     + 	if (write_commit_graph(opts.obj_dir,
     + 			       pack_indexes,
     + 			       commit_hex,
     +-			       opts.append,
     +-			       1))
     ++			       flags))
     + 		result = 1;
       
       	UNLEAK(lines);
     - 	return result;
      
       diff --git a/builtin/commit.c b/builtin/commit.c
       --- a/builtin/commit.c
     @@ -124,11 +124,17 @@
        */
       int generation_numbers_enabled(struct repository *r);
       
     --int write_commit_graph_reachable(const char *obj_dir, int append,
     --				  int report_progress);
      +#define COMMIT_GRAPH_APPEND     (1 << 0)
      +#define COMMIT_GRAPH_PROGRESS   (1 << 1)
      +
     + /*
     +  * The write_commit_graph* methods return zero on success
     +  * and a negative value on failure. Note that if the repository
     +  * is not compatible with the commit-graph feature, then the
     +  * methods will return 0 without writing a commit-graph.
     +  */
     +-int write_commit_graph_reachable(const char *obj_dir, int append,
     +-				  int report_progress);
      +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags);
       int write_commit_graph(const char *obj_dir,
       		       struct string_list *pack_indexes,
  4:  130007d0e1 =  4:  a5223a37a9 commit-graph: remove Future Work section
  5:  0ca4e18e98 !  5:  b5b8a87676 commit-graph: create write_commit_graph_context
     @@ -3,7 +3,7 @@
          commit-graph: create write_commit_graph_context
      
          The write_commit_graph() method is too large and complex. To simplify
     -    it, we should extract several small methods. However, we will risk
     +    it, we should extract several helper functions. However, we will risk
          repeating a lot of declarations related to progress incidators and
          object id or commit lists.
      
     @@ -11,7 +11,7 @@
          core data structures used in this process. Replace the other local
          variables with the values inside the context object. Following this
          change, we will start to lift code segments wholesale out of the
     -    write_commit_graph() method and into their own methods.
     +    write_commit_graph() method and into helper functions.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -454,7 +454,7 @@
       		for (i = 0; i < pack_indexes->nr; i++) {
       			struct packed_git *p;
      @@
     - 				res = 1;
     + 				res = -1;
       				goto cleanup;
       			}
      -			for_each_object_in_pack(p, add_packed_commits, &oids,
     @@ -610,7 +610,7 @@
      -	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
      +	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {
       		error(_("too many commits to write graph"));
     - 		res = 1;
     + 		res = -1;
       		goto cleanup;
       	}
       
     @@ -628,7 +628,7 @@
       		error(_("unable to create leading directories of %s"),
      -			graph_name);
      +			ctx->graph_name);
     - 		res = errno;
     + 		res = -1;
       		goto cleanup;
       	}
       
  6:  30c1b618b1 !  6:  98e243be67 commit-graph: extract fill_oids_from_packs()
     @@ -3,7 +3,7 @@
          commit-graph: extract fill_oids_from_packs()
      
          The write_commit_graph() method is too complex, so we are
     -    extracting methods one by one.
     +    extracting helper functions one by one.
      
          This extracts fill_oids_from_packs() that reads the given
          pack-file list and fills the oid list in the context.
     @@ -43,11 +43,11 @@
      +		p = add_packed_git(packname.buf, packname.len, 1);
      +		if (!p) {
      +			error(_("error adding pack %s"), packname.buf);
     -+			return 1;
     ++			return -1;
      +		}
      +		if (open_pack_index(p)) {
      +			error(_("error opening index for %s"), packname.buf);
     -+			return 1;
     ++			return -1;
      +		}
      +		for_each_object_in_pack(p, add_packed_commits, ctx,
      +					FOR_EACH_OBJECT_PACK_ORDER);
     @@ -89,12 +89,12 @@
      -			p = add_packed_git(packname.buf, packname.len, 1);
      -			if (!p) {
      -				error(_("error adding pack %s"), packname.buf);
     --				res = 1;
     +-				res = -1;
      -				goto cleanup;
      -			}
      -			if (open_pack_index(p)) {
      -				error(_("error opening index for %s"), packname.buf);
     --				res = 1;
     +-				res = -1;
      -				goto cleanup;
      -			}
      -			for_each_object_in_pack(p, add_packed_commits, ctx,
  7:  8cb2613dfa !  7:  fe36c8ad28 commit-graph: extract fill_oids_from_commit_hex()
     @@ -3,7 +3,7 @@
          commit-graph: extract fill_oids_from_commit_hex()
      
          The write_commit_graph() method is too complex, so we are
     -    extracting methods one by one.
     +    extracting helper functions one by one.
      
          Extract fill_oids_from_commit_hex() that reads the given commit
          id list and fille the oid list in the context.
  8:  8f7129672a !  8:  b8dfb663f3 commit-graph: extract fill_oids_from_all_packs()
     @@ -3,7 +3,7 @@
          commit-graph: extract fill_oids_from_all_packs()
      
          The write_commit_graph() method is too complex, so we are
     -    extracting methods one by one.
     +    extracting helper functions one by one.
      
          Extract fill_oids_from_all_packs() that reads all pack-files
          for commits and fills the oid list in the context.
  9:  a37548745b !  9:  40acc6ec37 commit-graph: extract count_distinct_commits()
     @@ -3,7 +3,7 @@
          commit-graph: extract count_distinct_commits()
      
          The write_commit_graph() method is too complex, so we are
     -    extracting methods one by one.
     +    extracting helper functions one by one.
      
          Extract count_distinct_commits(), which sorts the oids list, then
          iterates through to find duplicates.
 10:  57366ffdaa ! 10:  b403c01ef5 commit-graph: extract copy_oids_to_commits()
     @@ -3,7 +3,7 @@
          commit-graph: extract copy_oids_to_commits()
      
          The write_commit_graph() method is too complex, so we are
     -    extracting methods one by one.
     +    extracting helper functions one by one.
      
          Extract copy_oids_to_commits(), which fills the commits list
          with the distinct commits from the oids list. During this loop,
 11:  fc81c8946d ! 11:  7ecf923040 commit-graph: extract write_commit_graph_file()
     @@ -3,7 +3,7 @@
          commit-graph: extract write_commit_graph_file()
      
          The write_commit_graph() method is too complex, so we are
     -    extracting methods one by one.
     +    extracting helper functions one by one.
      
          Extract write_commit_graph_file() that takes all of the information
          in the context struct and writes the data to a commit-graph file.
     @@ -41,7 +41,7 @@
      +		UNLEAK(ctx->graph_name);
      +		error(_("unable to create leading directories of %s"),
      +			ctx->graph_name);
     -+		return errno;
     ++		return -1;
      +	}
      +
      +	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
     @@ -124,7 +124,7 @@
      -		UNLEAK(ctx->graph_name);
      -		error(_("unable to create leading directories of %s"),
      -			ctx->graph_name);
     --		res = errno;
     +-		res = -1;
      -		goto cleanup;
      -	}
      -

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 01/11] commit-graph: fix the_repository reference
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
                           ` (9 subsequent siblings)
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The parse_commit_buffer() method takes a repository pointer, so it
should not refer to the_repository anymore.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index a5333c7ac6..e4d1233226 100644
--- a/commit.c
+++ b/commit.c
@@ -443,7 +443,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	item->date = parse_commit_date(bufptr, tail);
 
 	if (check_graph)
-		load_commit_graph_info(the_repository, item);
+		load_commit_graph_info(r, item);
 
 	return 0;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 02/11] commit-graph: return with errors during write
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-29 17:23           ` SZEDER Gábor
  2019-06-12 13:29         ` [PATCH v5 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
                           ` (8 subsequent siblings)
  10 siblings, 1 reply; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type. Return zero on success
and a negative value on failure.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value. Test that 'git commit-graph
write' returns a proper error code when hitting a failure condition
in write_commit_graph().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c  | 20 +++++++-------
 builtin/commit.c        |  5 ++--
 builtin/gc.c            |  7 ++---
 commit-graph.c          | 60 ++++++++++++++++++++++++++++-------------
 commit-graph.h          | 16 +++++++----
 t/t5318-commit-graph.sh |  8 ++++++
 6 files changed, 77 insertions(+), 39 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 537fdfd0f0..2a1c4d701f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *pack_indexes = NULL;
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
+	int result = 0;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
 
 	read_replace_refs = 0;
 
-	if (opts.reachable) {
-		write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-		return 0;
-	}
+	if (opts.reachable)
+		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -188,14 +187,15 @@ static int graph_write(int argc, const char **argv)
 		UNLEAK(buf);
 	}
 
-	write_commit_graph(opts.obj_dir,
-			   pack_indexes,
-			   commit_hex,
-			   opts.append,
-			   1);
+	if (write_commit_graph(opts.obj_dir,
+			       pack_indexes,
+			       commit_hex,
+			       opts.append,
+			       1))
+		result = 1;
 
 	UNLEAK(lines);
-	return 0;
+	return result;
 }
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
diff --git a/builtin/commit.c b/builtin/commit.c
index 2986553d5f..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "new_index file. Check that disk is not full and quota is\n"
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
-	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-		write_commit_graph_reachable(get_object_directory(), 0, 0);
+	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+		return 1;
 
 	repo_rerere(the_repository, 0);
 	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..3984addf73 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 	}
 
-	if (gc_write_commit_graph)
-		write_commit_graph_reachable(get_object_directory(), 0,
-					     !quiet && !daemonized);
+	if (gc_write_commit_graph &&
+	    write_commit_graph_reachable(get_object_directory(), 0,
+					 !quiet && !daemonized))
+		return 1;
 
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 66865acbd7..1b58d1da14 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int append,
+				 int report_progress)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
+	int result;
 
 	for_each_ref(add_ref_to_list, &list);
-	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+	result = write_commit_graph(obj_dir, NULL, &list,
+				    append, report_progress);
 
 	string_list_clear(&list, 0);
+	return result;
 }
 
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress)
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
 	struct hashfile *f;
 	uint32_t i, count_distinct = 0;
-	char *graph_name;
+	char *graph_name = NULL;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
@@ -883,15 +886,17 @@ void write_commit_graph(const char *obj_dir,
 	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
+	int res = 0;
 
 	if (!commit_graph_compatible(the_repository))
-		return;
+		return 0;
 
 	oids.nr = 0;
 	approx_nr_objects = approximate_object_count();
 	oids.alloc = approx_nr_objects / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
+	commits.list = NULL;
 
 	if (append) {
 		prepare_commit_graph_one(the_repository, obj_dir);
@@ -932,10 +937,16 @@ void write_commit_graph(const char *obj_dir,
 			strbuf_setlen(&packname, dirlen);
 			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
-			if (!p)
-				die(_("error adding pack %s"), packname.buf);
-			if (open_pack_index(p))
-				die(_("error opening index for %s"), packname.buf);
+			if (!p) {
+				error(_("error adding pack %s"), packname.buf);
+				res = -1;
+				goto cleanup;
+			}
+			if (open_pack_index(p)) {
+				error(_("error opening index for %s"), packname.buf);
+				res = -1;
+				goto cleanup;
+			}
 			for_each_object_in_pack(p, add_packed_commits, &oids,
 						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
@@ -1006,8 +1017,11 @@ void write_commit_graph(const char *obj_dir,
 	}
 	stop_progress(&progress);
 
-	if (count_distinct >= GRAPH_EDGE_LAST_MASK)
-		die(_("the commit graph format cannot write %d commits"), count_distinct);
+	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
+		error(_("the commit graph format cannot write %d commits"), count_distinct);
+		res = -1;
+		goto cleanup;
+	}
 
 	commits.nr = 0;
 	commits.alloc = count_distinct;
@@ -1039,16 +1053,21 @@ void write_commit_graph(const char *obj_dir,
 	num_chunks = num_extra_edges ? 4 : 3;
 	stop_progress(&progress);
 
-	if (commits.nr >= GRAPH_EDGE_LAST_MASK)
-		die(_("too many commits to write graph"));
+	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
+		error(_("too many commits to write graph"));
+		res = -1;
+		goto cleanup;
+	}
 
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
 	if (safe_create_leading_directories(graph_name)) {
 		UNLEAK(graph_name);
-		die_errno(_("unable to create leading directories of %s"),
-			  graph_name);
+		error(_("unable to create leading directories of %s"),
+			graph_name);
+		res = -1;
+		goto cleanup;
 	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
@@ -1107,9 +1126,12 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+cleanup:
 	free(graph_name);
 	free(commits.list);
 	free(oids.list);
+
+	return res;
 }
 
 #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
diff --git a/commit-graph.h b/commit-graph.h
index 7dfb8c896f..869717ca19 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,12 +65,18 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
+/*
+ * The write_commit_graph* methods return zero on success
+ * and a negative value on failure. Note that if the repository
+ * is not compatible with the commit-graph feature, then the
+ * methods will return 0 without writing a commit-graph.
+ */
+int write_commit_graph_reachable(const char *obj_dir, int append,
 				  int report_progress);
-void write_commit_graph(const char *obj_dir,
-			struct string_list *pack_indexes,
-			struct string_list *commit_hex,
-			int append, int report_progress);
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       int append, int report_progress);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e80c1cac02..3b6fd0d728 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -23,6 +23,14 @@ test_expect_success 'write graph with no packs' '
 	test_path_is_file info/commit-graph
 '
 
+test_expect_success 'close with correct error on bad input' '
+	cd "$TRASH_DIRECTORY/full" &&
+	echo doesnotexist >in &&
+	{ git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
+	test "$ret" = 1 &&
+	test_i18ngrep "error adding pack" stderr
+'
+
 test_expect_success 'create commits and repack' '
 	cd "$TRASH_DIRECTORY/full" &&
 	for i in $(test_seq 3)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 03/11] commit-graph: collapse parameters into flags
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
                           ` (7 subsequent siblings)
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
As we update these methods, adding more parameters this way becomes
cluttered and hard to maintain.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 8 +++++---
 builtin/commit.c       | 2 +-
 builtin/gc.c           | 4 ++--
 commit-graph.c         | 9 +++++----
 commit-graph.h         | 8 +++++---
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 2a1c4d701f..d8efa5bab2 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
 	struct string_list *commit_hex = NULL;
 	struct string_list lines;
 	int result = 0;
+	unsigned int flags = COMMIT_GRAPH_PROGRESS;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv)
 		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
+	if (opts.append)
+		flags |= COMMIT_GRAPH_APPEND;
 
 	read_replace_refs = 0;
 
 	if (opts.reachable)
-		return write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
+		return write_commit_graph_reachable(opts.obj_dir, flags);
 
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
@@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv)
 	if (write_commit_graph(opts.obj_dir,
 			       pack_indexes,
 			       commit_hex,
-			       opts.append,
-			       1))
+			       flags))
 		result = 1;
 
 	UNLEAK(lines);
diff --git a/builtin/commit.c b/builtin/commit.c
index b9ea7222fa..b001ef565d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		      "not exceeded, and then \"git reset HEAD\" to recover."));
 
 	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-	    write_commit_graph_reachable(get_object_directory(), 0, 0))
+	    write_commit_graph_reachable(get_object_directory(), 0))
 		return 1;
 
 	repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index 3984addf73..df2573f124 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	}
 
 	if (gc_write_commit_graph &&
-	    write_commit_graph_reachable(get_object_directory(), 0,
-					 !quiet && !daemonized))
+	    write_commit_graph_reachable(get_object_directory(),
+					 !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0))
 		return 1;
 
 	if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index 1b58d1da14..fc40b531af 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname,
 	return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				 int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
 {
 	struct string_list list = STRING_LIST_INIT_DUP;
 	int result;
 
 	for_each_ref(add_ref_to_list, &list);
 	result = write_commit_graph(obj_dir, NULL, &list,
-				    append, report_progress);
+				    flags);
 
 	string_list_clear(&list, 0);
 	return result;
@@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int append,
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress)
+		       unsigned int flags)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir,
 	struct strbuf progress_title = STRBUF_INIT;
 	unsigned long approx_nr_objects;
 	int res = 0;
+	int append = flags & COMMIT_GRAPH_APPEND;
+	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
diff --git a/commit-graph.h b/commit-graph.h
index 869717ca19..01538b5cf5 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -65,18 +65,20 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
  */
 int generation_numbers_enabled(struct repository *r);
 
+#define COMMIT_GRAPH_APPEND     (1 << 0)
+#define COMMIT_GRAPH_PROGRESS   (1 << 1)
+
 /*
  * The write_commit_graph* methods return zero on success
  * and a negative value on failure. Note that if the repository
  * is not compatible with the commit-graph feature, then the
  * methods will return 0 without writing a commit-graph.
  */
-int write_commit_graph_reachable(const char *obj_dir, int append,
-				  int report_progress);
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags);
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
-		       int append, int report_progress);
+		       unsigned int flags);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 04/11] commit-graph: remove Future Work section
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (2 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
                           ` (6 subsequent siblings)
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit-graph feature began with a long list of planned
benefits, most of which are now complete. The future work
section has only a few items left.

As for making more algorithms aware of generation numbers,
some are only waiting for generation number v2 to ensure the
performance matches the existing behavior using commit date.

It is unlikely that we will ever send a commit-graph file
as part of the protocol, since we would need to verify the
data, and that is expensive. If we want to start trusting
remote content, then that item can be investigated again.

While there is more work to be done on the feature, having
a section of the docs devoted to a TODO list is wasteful and
hard to keep up-to-date.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index 7805b0968c..fb53341d5e 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -127,23 +127,6 @@ Design Details
   helpful for these clones, anyway. The commit-graph will not be read or
   written when shallow commits are present.
 
-Future Work
------------
-
-- After computing and storing generation numbers, we must make graph
-  walks aware of generation numbers to gain the performance benefits they
-  enable. This will mostly be accomplished by swapping a commit-date-ordered
-  priority queue with one ordered by generation number. The following
-  operations are important candidates:
-
-    - 'log --topo-order'
-    - 'tag --merged'
-
-- A server could provide a commit-graph file as part of the network protocol
-  to avoid extra calculations by clients. This feature is only of benefit if
-  the user is willing to trust the file, because verifying the file is correct
-  is as hard as computing it from scratch.
-
 Related Links
 -------------
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 05/11] commit-graph: create write_commit_graph_context
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (3 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
                           ` (5 subsequent siblings)
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too large and complex. To simplify
it, we should extract several helper functions. However, we will risk
repeating a lot of declarations related to progress incidators and
object id or commit lists.

Create a new write_commit_graph_context struct that contains the
core data structures used in this process. Replace the other local
variables with the values inside the context object. Following this
change, we will start to lift code segments wholesale out of the
write_commit_graph() method and into helper functions.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 390 ++++++++++++++++++++++++-------------------------
 1 file changed, 194 insertions(+), 196 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc40b531af..6d7e83cfe8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -518,14 +518,38 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
 	return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c);
 }
 
+struct packed_commit_list {
+	struct commit **list;
+	int nr;
+	int alloc;
+};
+
+struct packed_oid_list {
+	struct object_id *list;
+	int nr;
+	int alloc;
+};
+
+struct write_commit_graph_context {
+	struct repository *r;
+	const char *obj_dir;
+	char *graph_name;
+	struct packed_oid_list oids;
+	struct packed_commit_list commits;
+	int num_extra_edges;
+	unsigned long approx_nr_objects;
+	struct progress *progress;
+	int progress_done;
+	uint64_t progress_cnt;
+	unsigned append:1,
+		 report_progress:1;
+};
+
 static void write_graph_chunk_fanout(struct hashfile *f,
-				     struct commit **commits,
-				     int nr_commits,
-				     struct progress *progress,
-				     uint64_t *progress_cnt)
+				     struct write_commit_graph_context *ctx)
 {
 	int i, count = 0;
-	struct commit **list = commits;
+	struct commit **list = ctx->commits.list;
 
 	/*
 	 * Write the first-level table (the list is sorted,
@@ -533,10 +557,10 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 	 * having to do eight extra binary search iterations).
 	 */
 	for (i = 0; i < 256; i++) {
-		while (count < nr_commits) {
+		while (count < ctx->commits.nr) {
 			if ((*list)->object.oid.hash[0] != i)
 				break;
-			display_progress(progress, ++*progress_cnt);
+			display_progress(ctx->progress, ++ctx->progress_cnt);
 			count++;
 			list++;
 		}
@@ -546,14 +570,12 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits,
-				   struct progress *progress,
-				   uint64_t *progress_cnt)
+				   struct write_commit_graph_context *ctx)
 {
-	struct commit **list = commits;
+	struct commit **list = ctx->commits.list;
 	int count;
-	for (count = 0; count < nr_commits; count++, list++) {
-		display_progress(progress, ++*progress_cnt);
+	for (count = 0; count < ctx->commits.nr; count++, list++) {
+		display_progress(ctx->progress, ++ctx->progress_cnt);
 		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
 	}
 }
@@ -565,19 +587,17 @@ static const unsigned char *commit_to_sha1(size_t index, void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-				   struct commit **commits, int nr_commits,
-				   struct progress *progress,
-				   uint64_t *progress_cnt)
+				   struct write_commit_graph_context *ctx)
 {
-	struct commit **list = commits;
-	struct commit **last = commits + nr_commits;
+	struct commit **list = ctx->commits.list;
+	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	uint32_t num_extra_edges = 0;
 
 	while (list < last) {
 		struct commit_list *parent;
 		int edge_value;
 		uint32_t packedDate[2];
-		display_progress(progress, ++*progress_cnt);
+		display_progress(ctx->progress, ++ctx->progress_cnt);
 
 		parse_commit_no_graph(*list);
 		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -588,8 +608,8 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			edge_value = GRAPH_PARENT_NONE;
 		else {
 			edge_value = sha1_pos(parent->item->object.oid.hash,
-					      commits,
-					      nr_commits,
+					      ctx->commits.list,
+					      ctx->commits.nr,
 					      commit_to_sha1);
 
 			if (edge_value < 0)
@@ -609,8 +629,8 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			edge_value = GRAPH_EXTRA_EDGES_NEEDED | num_extra_edges;
 		else {
 			edge_value = sha1_pos(parent->item->object.oid.hash,
-					      commits,
-					      nr_commits,
+					      ctx->commits.list,
+					      ctx->commits.nr,
 					      commit_to_sha1);
 			if (edge_value < 0)
 				BUG("missing parent %s for commit %s",
@@ -642,19 +662,16 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 }
 
 static void write_graph_chunk_extra_edges(struct hashfile *f,
-					  struct commit **commits,
-					  int nr_commits,
-					  struct progress *progress,
-					  uint64_t *progress_cnt)
+					  struct write_commit_graph_context *ctx)
 {
-	struct commit **list = commits;
-	struct commit **last = commits + nr_commits;
+	struct commit **list = ctx->commits.list;
+	struct commit **last = ctx->commits.list + ctx->commits.nr;
 	struct commit_list *parent;
 
 	while (list < last) {
 		int num_parents = 0;
 
-		display_progress(progress, ++*progress_cnt);
+		display_progress(ctx->progress, ++ctx->progress_cnt);
 
 		for (parent = (*list)->parents; num_parents < 3 && parent;
 		     parent = parent->next)
@@ -668,8 +685,8 @@ static void write_graph_chunk_extra_edges(struct hashfile *f,
 		/* Since num_parents > 2, this initializer is safe. */
 		for (parent = (*list)->parents->next; parent; parent = parent->next) {
 			int edge_value = sha1_pos(parent->item->object.oid.hash,
-						  commits,
-						  nr_commits,
+						  ctx->commits.list,
+						  ctx->commits.nr,
 						  commit_to_sha1);
 
 			if (edge_value < 0)
@@ -693,125 +710,111 @@ static int commit_compare(const void *_a, const void *_b)
 	return oidcmp(a, b);
 }
 
-struct packed_commit_list {
-	struct commit **list;
-	int nr;
-	int alloc;
-};
-
-struct packed_oid_list {
-	struct object_id *list;
-	int nr;
-	int alloc;
-	struct progress *progress;
-	int progress_done;
-};
-
 static int add_packed_commits(const struct object_id *oid,
 			      struct packed_git *pack,
 			      uint32_t pos,
 			      void *data)
 {
-	struct packed_oid_list *list = (struct packed_oid_list*)data;
+	struct write_commit_graph_context *ctx = (struct write_commit_graph_context*)data;
 	enum object_type type;
 	off_t offset = nth_packed_object_offset(pack, pos);
 	struct object_info oi = OBJECT_INFO_INIT;
 
-	if (list->progress)
-		display_progress(list->progress, ++list->progress_done);
+	if (ctx->progress)
+		display_progress(ctx->progress, ++ctx->progress_done);
 
 	oi.typep = &type;
-	if (packed_object_info(the_repository, pack, offset, &oi) < 0)
+	if (packed_object_info(ctx->r, pack, offset, &oi) < 0)
 		die(_("unable to get type of object %s"), oid_to_hex(oid));
 
 	if (type != OBJ_COMMIT)
 		return 0;
 
-	ALLOC_GROW(list->list, list->nr + 1, list->alloc);
-	oidcpy(&(list->list[list->nr]), oid);
-	list->nr++;
+	ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+	oidcpy(&(ctx->oids.list[ctx->oids.nr]), oid);
+	ctx->oids.nr++;
 
 	return 0;
 }
 
-static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit)
+static void add_missing_parents(struct write_commit_graph_context *ctx, struct commit *commit)
 {
 	struct commit_list *parent;
 	for (parent = commit->parents; parent; parent = parent->next) {
 		if (!(parent->item->object.flags & UNINTERESTING)) {
-			ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
-			oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));
-			oids->nr++;
+			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+			oidcpy(&ctx->oids.list[ctx->oids.nr], &(parent->item->object.oid));
+			ctx->oids.nr++;
 			parent->item->object.flags |= UNINTERESTING;
 		}
 	}
 }
 
-static void close_reachable(struct packed_oid_list *oids, int report_progress)
+static void close_reachable(struct write_commit_graph_context *ctx)
 {
 	int i;
 	struct commit *commit;
-	struct progress *progress = NULL;
 
-	if (report_progress)
-		progress = start_delayed_progress(
-			_("Loading known commits in commit graph"), oids->nr);
-	for (i = 0; i < oids->nr; i++) {
-		display_progress(progress, i + 1);
-		commit = lookup_commit(the_repository, &oids->list[i]);
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Loading known commits in commit graph"),
+					ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 		if (commit)
 			commit->object.flags |= UNINTERESTING;
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
 	/*
-	 * As this loop runs, oids->nr may grow, but not more
+	 * As this loop runs, ctx->oids.nr may grow, but not more
 	 * than the number of missing commits in the reachable
 	 * closure.
 	 */
-	if (report_progress)
-		progress = start_delayed_progress(
-			_("Expanding reachable commits in commit graph"), oids->nr);
-	for (i = 0; i < oids->nr; i++) {
-		display_progress(progress, i + 1);
-		commit = lookup_commit(the_repository, &oids->list[i]);
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Expanding reachable commits in commit graph"),
+					ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit && !parse_commit_no_graph(commit))
-			add_missing_parents(oids, commit);
+			add_missing_parents(ctx, commit);
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
-	if (report_progress)
-		progress = start_delayed_progress(
-			_("Clearing commit marks in commit graph"), oids->nr);
-	for (i = 0; i < oids->nr; i++) {
-		display_progress(progress, i + 1);
-		commit = lookup_commit(the_repository, &oids->list[i]);
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Clearing commit marks in commit graph"),
+					ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		commit = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (commit)
 			commit->object.flags &= ~UNINTERESTING;
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 }
 
-static void compute_generation_numbers(struct packed_commit_list* commits,
-				       int report_progress)
+static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 {
 	int i;
 	struct commit_list *list = NULL;
-	struct progress *progress = NULL;
 
-	if (report_progress)
-		progress = start_progress(
-			_("Computing commit graph generation numbers"),
-			commits->nr);
-	for (i = 0; i < commits->nr; i++) {
-		display_progress(progress, i + 1);
-		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
-		    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+	if (ctx->report_progress)
+		ctx->progress = start_progress(
+					_("Computing commit graph generation numbers"),
+					ctx->commits.nr);
+	for (i = 0; i < ctx->commits.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
+		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
 			continue;
 
-		commit_list_insert(commits->list[i], &list);
+		commit_list_insert(ctx->commits.list[i], &list);
 		while (list) {
 			struct commit *current = list->item;
 			struct commit_list *parent;
@@ -838,7 +841,7 @@ static void compute_generation_numbers(struct packed_commit_list* commits,
 			}
 		}
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 }
 
 static int add_ref_to_list(const char *refname,
@@ -869,8 +872,7 @@ int write_commit_graph(const char *obj_dir,
 		       struct string_list *commit_hex,
 		       unsigned int flags)
 {
-	struct packed_oid_list oids;
-	struct packed_commit_list commits;
+	struct write_commit_graph_context *ctx;
 	struct hashfile *f;
 	uint32_t i, count_distinct = 0;
 	char *graph_name = NULL;
@@ -878,44 +880,38 @@ int write_commit_graph(const char *obj_dir,
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
 	int num_chunks;
-	int num_extra_edges;
 	struct commit_list *parent;
-	struct progress *progress = NULL;
 	const unsigned hashsz = the_hash_algo->rawsz;
-	uint64_t progress_cnt = 0;
 	struct strbuf progress_title = STRBUF_INIT;
-	unsigned long approx_nr_objects;
 	int res = 0;
-	int append = flags & COMMIT_GRAPH_APPEND;
-	int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
 
-	oids.nr = 0;
-	approx_nr_objects = approximate_object_count();
-	oids.alloc = approx_nr_objects / 32;
-	oids.progress = NULL;
-	oids.progress_done = 0;
-	commits.list = NULL;
-
-	if (append) {
-		prepare_commit_graph_one(the_repository, obj_dir);
-		if (the_repository->objects->commit_graph)
-			oids.alloc += the_repository->objects->commit_graph->num_commits;
+	ctx = xcalloc(1, sizeof(struct write_commit_graph_context));
+	ctx->r = the_repository;
+	ctx->obj_dir = obj_dir;
+	ctx->append = flags & COMMIT_GRAPH_APPEND ? 1 : 0;
+	ctx->report_progress = flags & COMMIT_GRAPH_PROGRESS ? 1 : 0;
+
+	ctx->approx_nr_objects = approximate_object_count();
+	ctx->oids.alloc = ctx->approx_nr_objects / 32;
+
+	if (ctx->append) {
+		prepare_commit_graph_one(ctx->r, ctx->obj_dir);
+		if (ctx->r->objects->commit_graph)
+			ctx->oids.alloc += ctx->r->objects->commit_graph->num_commits;
 	}
 
-	if (oids.alloc < 1024)
-		oids.alloc = 1024;
-	ALLOC_ARRAY(oids.list, oids.alloc);
-
-	if (append && the_repository->objects->commit_graph) {
-		struct commit_graph *commit_graph =
-			the_repository->objects->commit_graph;
-		for (i = 0; i < commit_graph->num_commits; i++) {
-			const unsigned char *hash = commit_graph->chunk_oid_lookup +
-				commit_graph->hash_len * i;
-			hashcpy(oids.list[oids.nr++].hash, hash);
+	if (ctx->oids.alloc < 1024)
+		ctx->oids.alloc = 1024;
+	ALLOC_ARRAY(ctx->oids.list, ctx->oids.alloc);
+
+	if (ctx->append && ctx->r->objects->commit_graph) {
+		struct commit_graph *g = ctx->r->objects->commit_graph;
+		for (i = 0; i < g->num_commits; i++) {
+			const unsigned char *hash = g->chunk_oid_lookup + g->hash_len * i;
+			hashcpy(ctx->oids.list[ctx->oids.nr++].hash, hash);
 		}
 	}
 
@@ -924,14 +920,14 @@ int write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
-		if (report_progress) {
+		if (ctx->report_progress) {
 			strbuf_addf(&progress_title,
 				    Q_("Finding commits for commit graph in %d pack",
 				       "Finding commits for commit graph in %d packs",
 				       pack_indexes->nr),
 				    pack_indexes->nr);
-			oids.progress = start_delayed_progress(progress_title.buf, 0);
-			oids.progress_done = 0;
+			ctx->progress = start_delayed_progress(progress_title.buf, 0);
+			ctx->progress_done = 0;
 		}
 		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
@@ -948,75 +944,76 @@ int write_commit_graph(const char *obj_dir,
 				res = -1;
 				goto cleanup;
 			}
-			for_each_object_in_pack(p, add_packed_commits, &oids,
+			for_each_object_in_pack(p, add_packed_commits, ctx,
 						FOR_EACH_OBJECT_PACK_ORDER);
 			close_pack(p);
 			free(p);
 		}
-		stop_progress(&oids.progress);
+		stop_progress(&ctx->progress);
 		strbuf_reset(&progress_title);
 		strbuf_release(&packname);
 	}
 
 	if (commit_hex) {
-		if (report_progress) {
+		if (ctx->report_progress) {
 			strbuf_addf(&progress_title,
 				    Q_("Finding commits for commit graph from %d ref",
 				       "Finding commits for commit graph from %d refs",
 				       commit_hex->nr),
 				    commit_hex->nr);
-			progress = start_delayed_progress(progress_title.buf,
-							  commit_hex->nr);
+			ctx->progress = start_delayed_progress(
+						progress_title.buf,
+						commit_hex->nr);
 		}
 		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
-			display_progress(progress, i + 1);
+			display_progress(ctx->progress, i + 1);
 			if (commit_hex->items[i].string &&
 			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
 
-			result = lookup_commit_reference_gently(the_repository, &oid, 1);
+			result = lookup_commit_reference_gently(ctx->r, &oid, 1);
 
 			if (result) {
-				ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
-				oidcpy(&oids.list[oids.nr], &(result->object.oid));
-				oids.nr++;
+				ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+				oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
+				ctx->oids.nr++;
 			}
 		}
-		stop_progress(&progress);
+		stop_progress(&ctx->progress);
 		strbuf_reset(&progress_title);
 	}
 
 	if (!pack_indexes && !commit_hex) {
-		if (report_progress)
-			oids.progress = start_delayed_progress(
+		if (ctx->report_progress)
+			ctx->progress = start_delayed_progress(
 				_("Finding commits for commit graph among packed objects"),
-				approx_nr_objects);
-		for_each_packed_object(add_packed_commits, &oids,
+				ctx->approx_nr_objects);
+		for_each_packed_object(add_packed_commits, ctx,
 				       FOR_EACH_OBJECT_PACK_ORDER);
-		if (oids.progress_done < approx_nr_objects)
-			display_progress(oids.progress, approx_nr_objects);
-		stop_progress(&oids.progress);
+		if (ctx->progress_done < ctx->approx_nr_objects)
+			display_progress(ctx->progress, ctx->approx_nr_objects);
+		stop_progress(&ctx->progress);
 	}
 
-	close_reachable(&oids, report_progress);
+	close_reachable(ctx);
 
-	if (report_progress)
-		progress = start_delayed_progress(
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
 			_("Counting distinct commits in commit graph"),
-			oids.nr);
-	display_progress(progress, 0); /* TODO: Measure QSORT() progress */
-	QSORT(oids.list, oids.nr, commit_compare);
+			ctx->oids.nr);
+	display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
+	QSORT(ctx->oids.list, ctx->oids.nr, commit_compare);
 	count_distinct = 1;
-	for (i = 1; i < oids.nr; i++) {
-		display_progress(progress, i + 1);
-		if (!oideq(&oids.list[i - 1], &oids.list[i]))
+	for (i = 1; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		if (!oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
 			count_distinct++;
 	}
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
 	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
 		error(_("the commit graph format cannot write %d commits"), count_distinct);
@@ -1024,54 +1021,54 @@ int write_commit_graph(const char *obj_dir,
 		goto cleanup;
 	}
 
-	commits.nr = 0;
-	commits.alloc = count_distinct;
-	ALLOC_ARRAY(commits.list, commits.alloc);
+	ctx->commits.alloc = count_distinct;
+	ALLOC_ARRAY(ctx->commits.list, ctx->commits.alloc);
 
-	num_extra_edges = 0;
-	if (report_progress)
-		progress = start_delayed_progress(
+	ctx->num_extra_edges = 0;
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
 			_("Finding extra edges in commit graph"),
-			oids.nr);
-	for (i = 0; i < oids.nr; i++) {
+			ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
 		int num_parents = 0;
-		display_progress(progress, i + 1);
-		if (i > 0 && oideq(&oids.list[i - 1], &oids.list[i]))
+		display_progress(ctx->progress, i + 1);
+		if (i > 0 && oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
 			continue;
 
-		commits.list[commits.nr] = lookup_commit(the_repository, &oids.list[i]);
-		parse_commit_no_graph(commits.list[commits.nr]);
+		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
+		parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
 
-		for (parent = commits.list[commits.nr]->parents;
+		for (parent = ctx->commits.list[ctx->commits.nr]->parents;
 		     parent; parent = parent->next)
 			num_parents++;
 
 		if (num_parents > 2)
-			num_extra_edges += num_parents - 1;
+			ctx->num_extra_edges += num_parents - 1;
 
-		commits.nr++;
+		ctx->commits.nr++;
 	}
-	num_chunks = num_extra_edges ? 4 : 3;
-	stop_progress(&progress);
+	stop_progress(&ctx->progress);
 
-	if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
+	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {
 		error(_("too many commits to write graph"));
 		res = -1;
 		goto cleanup;
 	}
 
-	compute_generation_numbers(&commits, report_progress);
+	compute_generation_numbers(ctx);
 
-	graph_name = get_commit_graph_filename(obj_dir);
-	if (safe_create_leading_directories(graph_name)) {
-		UNLEAK(graph_name);
+	num_chunks = ctx->num_extra_edges ? 4 : 3;
+
+	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
+	if (safe_create_leading_directories(ctx->graph_name)) {
+		UNLEAK(ctx->graph_name);
 		error(_("unable to create leading directories of %s"),
-			graph_name);
+			ctx->graph_name);
 		res = -1;
 		goto cleanup;
 	}
 
-	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
@@ -1084,7 +1081,7 @@ int write_commit_graph(const char *obj_dir,
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
 	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	if (num_extra_edges)
+	if (ctx->num_extra_edges)
 		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
 	else
 		chunk_ids[3] = 0;
@@ -1092,9 +1089,9 @@ int write_commit_graph(const char *obj_dir,
 
 	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
 
 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
@@ -1105,32 +1102,33 @@ int write_commit_graph(const char *obj_dir,
 		hashwrite(f, chunk_write, 12);
 	}
 
-	if (report_progress) {
+	if (ctx->report_progress) {
 		strbuf_addf(&progress_title,
 			    Q_("Writing out commit graph in %d pass",
 			       "Writing out commit graph in %d passes",
 			       num_chunks),
 			    num_chunks);
-		progress = start_delayed_progress(
+		ctx->progress = start_delayed_progress(
 			progress_title.buf,
-			num_chunks * commits.nr);
+			num_chunks * ctx->commits.nr);
 	}
-	write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt);
-	write_graph_chunk_oids(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
-	write_graph_chunk_data(f, hashsz, commits.list, commits.nr, progress, &progress_cnt);
-	if (num_extra_edges)
-		write_graph_chunk_extra_edges(f, commits.list, commits.nr, progress, &progress_cnt);
-	stop_progress(&progress);
+	write_graph_chunk_fanout(f, ctx);
+	write_graph_chunk_oids(f, hashsz, ctx);
+	write_graph_chunk_data(f, hashsz, ctx);
+	if (ctx->num_extra_edges)
+		write_graph_chunk_extra_edges(f, ctx);
+	stop_progress(&ctx->progress);
 	strbuf_release(&progress_title);
 
-	close_commit_graph(the_repository);
+	close_commit_graph(ctx->r);
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
 cleanup:
 	free(graph_name);
-	free(commits.list);
-	free(oids.list);
+	free(ctx->commits.list);
+	free(ctx->oids.list);
+	free(ctx);
 
 	return res;
 }
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 06/11] commit-graph: extract fill_oids_from_packs()
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (4 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
                           ` (4 subsequent siblings)
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

This extracts fill_oids_from_packs() that reads the given
pack-file list and fills the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 83 ++++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 36 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6d7e83cfe8..02e5f8c651 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -867,6 +867,51 @@ int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
 	return result;
 }
 
+static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
+				struct string_list *pack_indexes)
+{
+	uint32_t i;
+	struct strbuf progress_title = STRBUF_INIT;
+	struct strbuf packname = STRBUF_INIT;
+	int dirlen;
+
+	strbuf_addf(&packname, "%s/pack/", ctx->obj_dir);
+	dirlen = packname.len;
+	if (ctx->report_progress) {
+		strbuf_addf(&progress_title,
+			    Q_("Finding commits for commit graph in %d pack",
+			       "Finding commits for commit graph in %d packs",
+			       pack_indexes->nr),
+			    pack_indexes->nr);
+		ctx->progress = start_delayed_progress(progress_title.buf, 0);
+		ctx->progress_done = 0;
+	}
+	for (i = 0; i < pack_indexes->nr; i++) {
+		struct packed_git *p;
+		strbuf_setlen(&packname, dirlen);
+		strbuf_addstr(&packname, pack_indexes->items[i].string);
+		p = add_packed_git(packname.buf, packname.len, 1);
+		if (!p) {
+			error(_("error adding pack %s"), packname.buf);
+			return -1;
+		}
+		if (open_pack_index(p)) {
+			error(_("error opening index for %s"), packname.buf);
+			return -1;
+		}
+		for_each_object_in_pack(p, add_packed_commits, ctx,
+					FOR_EACH_OBJECT_PACK_ORDER);
+		close_pack(p);
+		free(p);
+	}
+
+	stop_progress(&ctx->progress);
+	strbuf_reset(&progress_title);
+	strbuf_release(&packname);
+
+	return 0;
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -916,42 +961,8 @@ int write_commit_graph(const char *obj_dir,
 	}
 
 	if (pack_indexes) {
-		struct strbuf packname = STRBUF_INIT;
-		int dirlen;
-		strbuf_addf(&packname, "%s/pack/", obj_dir);
-		dirlen = packname.len;
-		if (ctx->report_progress) {
-			strbuf_addf(&progress_title,
-				    Q_("Finding commits for commit graph in %d pack",
-				       "Finding commits for commit graph in %d packs",
-				       pack_indexes->nr),
-				    pack_indexes->nr);
-			ctx->progress = start_delayed_progress(progress_title.buf, 0);
-			ctx->progress_done = 0;
-		}
-		for (i = 0; i < pack_indexes->nr; i++) {
-			struct packed_git *p;
-			strbuf_setlen(&packname, dirlen);
-			strbuf_addstr(&packname, pack_indexes->items[i].string);
-			p = add_packed_git(packname.buf, packname.len, 1);
-			if (!p) {
-				error(_("error adding pack %s"), packname.buf);
-				res = -1;
-				goto cleanup;
-			}
-			if (open_pack_index(p)) {
-				error(_("error opening index for %s"), packname.buf);
-				res = -1;
-				goto cleanup;
-			}
-			for_each_object_in_pack(p, add_packed_commits, ctx,
-						FOR_EACH_OBJECT_PACK_ORDER);
-			close_pack(p);
-			free(p);
-		}
-		stop_progress(&ctx->progress);
-		strbuf_reset(&progress_title);
-		strbuf_release(&packname);
+		if ((res = fill_oids_from_packs(ctx, pack_indexes)))
+			goto cleanup;
 	}
 
 	if (commit_hex) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 07/11] commit-graph: extract fill_oids_from_commit_hex()
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (6 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
                           ` (2 subsequent siblings)
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract fill_oids_from_commit_hex() that reads the given commit
id list and fille the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 72 ++++++++++++++++++++++++++++----------------------
 1 file changed, 40 insertions(+), 32 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 02e5f8c651..4fae1fcdb2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -912,6 +912,44 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
 	return 0;
 }
 
+static void fill_oids_from_commit_hex(struct write_commit_graph_context *ctx,
+				      struct string_list *commit_hex)
+{
+	uint32_t i;
+	struct strbuf progress_title = STRBUF_INIT;
+
+	if (ctx->report_progress) {
+		strbuf_addf(&progress_title,
+			    Q_("Finding commits for commit graph from %d ref",
+			       "Finding commits for commit graph from %d refs",
+			       commit_hex->nr),
+			    commit_hex->nr);
+		ctx->progress = start_delayed_progress(
+					progress_title.buf,
+					commit_hex->nr);
+	}
+	for (i = 0; i < commit_hex->nr; i++) {
+		const char *end;
+		struct object_id oid;
+		struct commit *result;
+
+		display_progress(ctx->progress, i + 1);
+		if (commit_hex->items[i].string &&
+		    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
+			continue;
+
+		result = lookup_commit_reference_gently(ctx->r, &oid, 1);
+
+		if (result) {
+			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
+			oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
+			ctx->oids.nr++;
+		}
+	}
+	stop_progress(&ctx->progress);
+	strbuf_release(&progress_title);
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -965,38 +1003,8 @@ int write_commit_graph(const char *obj_dir,
 			goto cleanup;
 	}
 
-	if (commit_hex) {
-		if (ctx->report_progress) {
-			strbuf_addf(&progress_title,
-				    Q_("Finding commits for commit graph from %d ref",
-				       "Finding commits for commit graph from %d refs",
-				       commit_hex->nr),
-				    commit_hex->nr);
-			ctx->progress = start_delayed_progress(
-						progress_title.buf,
-						commit_hex->nr);
-		}
-		for (i = 0; i < commit_hex->nr; i++) {
-			const char *end;
-			struct object_id oid;
-			struct commit *result;
-
-			display_progress(ctx->progress, i + 1);
-			if (commit_hex->items[i].string &&
-			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
-				continue;
-
-			result = lookup_commit_reference_gently(ctx->r, &oid, 1);
-
-			if (result) {
-				ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
-				oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
-				ctx->oids.nr++;
-			}
-		}
-		stop_progress(&ctx->progress);
-		strbuf_reset(&progress_title);
-	}
+	if (commit_hex)
+		fill_oids_from_commit_hex(ctx, commit_hex);
 
 	if (!pack_indexes && !commit_hex) {
 		if (ctx->report_progress)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 08/11] commit-graph: extract fill_oids_from_all_packs()
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (5 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
                           ` (3 subsequent siblings)
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract fill_oids_from_all_packs() that reads all pack-files
for commits and fills the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 4fae1fcdb2..61cb43ddf8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -950,6 +950,19 @@ static void fill_oids_from_commit_hex(struct write_commit_graph_context *ctx,
 	strbuf_release(&progress_title);
 }
 
+static void fill_oids_from_all_packs(struct write_commit_graph_context *ctx)
+{
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+			_("Finding commits for commit graph among packed objects"),
+			ctx->approx_nr_objects);
+	for_each_packed_object(add_packed_commits, ctx,
+			       FOR_EACH_OBJECT_PACK_ORDER);
+	if (ctx->progress_done < ctx->approx_nr_objects)
+		display_progress(ctx->progress, ctx->approx_nr_objects);
+	stop_progress(&ctx->progress);
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -1006,17 +1019,8 @@ int write_commit_graph(const char *obj_dir,
 	if (commit_hex)
 		fill_oids_from_commit_hex(ctx, commit_hex);
 
-	if (!pack_indexes && !commit_hex) {
-		if (ctx->report_progress)
-			ctx->progress = start_delayed_progress(
-				_("Finding commits for commit graph among packed objects"),
-				ctx->approx_nr_objects);
-		for_each_packed_object(add_packed_commits, ctx,
-				       FOR_EACH_OBJECT_PACK_ORDER);
-		if (ctx->progress_done < ctx->approx_nr_objects)
-			display_progress(ctx->progress, ctx->approx_nr_objects);
-		stop_progress(&ctx->progress);
-	}
+	if (!pack_indexes && !commit_hex)
+		fill_oids_from_all_packs(ctx);
 
 	close_reachable(ctx);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 09/11] commit-graph: extract count_distinct_commits()
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (7 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract count_distinct_commits(), which sorts the oids list, then
iterates through to find duplicates.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 61cb43ddf8..1a0a875a7b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -963,6 +963,27 @@ static void fill_oids_from_all_packs(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
+{
+	uint32_t i, count_distinct = 1;
+
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+			_("Counting distinct commits in commit graph"),
+			ctx->oids.nr);
+	display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
+	QSORT(ctx->oids.list, ctx->oids.nr, commit_compare);
+
+	for (i = 1; i < ctx->oids.nr; i++) {
+		display_progress(ctx->progress, i + 1);
+		if (!oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
+			count_distinct++;
+	}
+	stop_progress(&ctx->progress);
+
+	return count_distinct;
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -1024,19 +1045,7 @@ int write_commit_graph(const char *obj_dir,
 
 	close_reachable(ctx);
 
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-			_("Counting distinct commits in commit graph"),
-			ctx->oids.nr);
-	display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
-	QSORT(ctx->oids.list, ctx->oids.nr, commit_compare);
-	count_distinct = 1;
-	for (i = 1; i < ctx->oids.nr; i++) {
-		display_progress(ctx->progress, i + 1);
-		if (!oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
-			count_distinct++;
-	}
-	stop_progress(&ctx->progress);
+	count_distinct = count_distinct_commits(ctx);
 
 	if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
 		error(_("the commit graph format cannot write %d commits"), count_distinct);
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 10/11] commit-graph: extract copy_oids_to_commits()
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (8 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  2019-06-12 13:29         ` [PATCH v5 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract copy_oids_to_commits(), which fills the commits list
with the distinct commits from the oids list. During this loop,
it also counts the number of "extra" edges from octopus merges.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 57 ++++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 1a0a875a7b..72f9c5c7e2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -984,6 +984,37 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
 	return count_distinct;
 }
 
+static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
+{
+	uint32_t i;
+	struct commit_list *parent;
+
+	ctx->num_extra_edges = 0;
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+			_("Finding extra edges in commit graph"),
+			ctx->oids.nr);
+	for (i = 0; i < ctx->oids.nr; i++) {
+		int num_parents = 0;
+		display_progress(ctx->progress, i + 1);
+		if (i > 0 && oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
+			continue;
+
+		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
+		parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
+
+		for (parent = ctx->commits.list[ctx->commits.nr]->parents;
+		     parent; parent = parent->next)
+			num_parents++;
+
+		if (num_parents > 2)
+			ctx->num_extra_edges += num_parents - 1;
+
+		ctx->commits.nr++;
+	}
+	stop_progress(&ctx->progress);
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -997,7 +1028,6 @@ int write_commit_graph(const char *obj_dir,
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
 	int num_chunks;
-	struct commit_list *parent;
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
 	int res = 0;
@@ -1056,30 +1086,7 @@ int write_commit_graph(const char *obj_dir,
 	ctx->commits.alloc = count_distinct;
 	ALLOC_ARRAY(ctx->commits.list, ctx->commits.alloc);
 
-	ctx->num_extra_edges = 0;
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-			_("Finding extra edges in commit graph"),
-			ctx->oids.nr);
-	for (i = 0; i < ctx->oids.nr; i++) {
-		int num_parents = 0;
-		display_progress(ctx->progress, i + 1);
-		if (i > 0 && oideq(&ctx->oids.list[i - 1], &ctx->oids.list[i]))
-			continue;
-
-		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
-		parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
-
-		for (parent = ctx->commits.list[ctx->commits.nr]->parents;
-		     parent; parent = parent->next)
-			num_parents++;
-
-		if (num_parents > 2)
-			ctx->num_extra_edges += num_parents - 1;
-
-		ctx->commits.nr++;
-	}
-	stop_progress(&ctx->progress);
+	copy_oids_to_commits(ctx);
 
 	if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) {
 		error(_("too many commits to write graph"));
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH v5 11/11] commit-graph: extract write_commit_graph_file()
  2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
                           ` (9 preceding siblings ...)
  2019-06-12 13:29         ` [PATCH v5 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
@ 2019-06-12 13:29         ` Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2019-06-12 13:29 UTC (permalink / raw)
  To: git; +Cc: sandals, avarab, peff, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract write_commit_graph_file() that takes all of the information
in the context struct and writes the data to a commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 155 +++++++++++++++++++++++++------------------------
 1 file changed, 80 insertions(+), 75 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 72f9c5c7e2..9d2c72f5b4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1015,21 +1015,91 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-int write_commit_graph(const char *obj_dir,
-		       struct string_list *pack_indexes,
-		       struct string_list *commit_hex,
-		       unsigned int flags)
+static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
-	struct write_commit_graph_context *ctx;
+	uint32_t i;
 	struct hashfile *f;
-	uint32_t i, count_distinct = 0;
-	char *graph_name = NULL;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[5];
 	uint64_t chunk_offsets[5];
-	int num_chunks;
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
+	int num_chunks = ctx->num_extra_edges ? 4 : 3;
+
+	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
+	if (safe_create_leading_directories(ctx->graph_name)) {
+		UNLEAK(ctx->graph_name);
+		error(_("unable to create leading directories of %s"),
+			ctx->graph_name);
+		return -1;
+	}
+
+	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
+	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+	hashwrite_be32(f, GRAPH_SIGNATURE);
+
+	hashwrite_u8(f, GRAPH_VERSION);
+	hashwrite_u8(f, oid_version());
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused padding byte */
+
+	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	if (ctx->num_extra_edges)
+		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
+	else
+		chunk_ids[3] = 0;
+	chunk_ids[4] = 0;
+
+	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
+
+	for (i = 0; i <= num_chunks; i++) {
+		uint32_t chunk_write[3];
+
+		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		hashwrite(f, chunk_write, 12);
+	}
+
+	if (ctx->report_progress) {
+		strbuf_addf(&progress_title,
+			    Q_("Writing out commit graph in %d pass",
+			       "Writing out commit graph in %d passes",
+			       num_chunks),
+			    num_chunks);
+		ctx->progress = start_delayed_progress(
+			progress_title.buf,
+			num_chunks * ctx->commits.nr);
+	}
+	write_graph_chunk_fanout(f, ctx);
+	write_graph_chunk_oids(f, hashsz, ctx);
+	write_graph_chunk_data(f, hashsz, ctx);
+	if (ctx->num_extra_edges)
+		write_graph_chunk_extra_edges(f, ctx);
+	stop_progress(&ctx->progress);
+	strbuf_release(&progress_title);
+
+	close_commit_graph(ctx->r);
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+	commit_lock_file(&lk);
+
+	return 0;
+}
+
+int write_commit_graph(const char *obj_dir,
+		       struct string_list *pack_indexes,
+		       struct string_list *commit_hex,
+		       unsigned int flags)
+{
+	struct write_commit_graph_context *ctx;
+	uint32_t i, count_distinct = 0;
 	int res = 0;
 
 	if (!commit_graph_compatible(the_repository))
@@ -1096,75 +1166,10 @@ int write_commit_graph(const char *obj_dir,
 
 	compute_generation_numbers(ctx);
 
-	num_chunks = ctx->num_extra_edges ? 4 : 3;
-
-	ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
-	if (safe_create_leading_directories(ctx->graph_name)) {
-		UNLEAK(ctx->graph_name);
-		error(_("unable to create leading directories of %s"),
-			ctx->graph_name);
-		res = -1;
-		goto cleanup;
-	}
-
-	hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
-	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
-
-	hashwrite_be32(f, GRAPH_SIGNATURE);
-
-	hashwrite_u8(f, GRAPH_VERSION);
-	hashwrite_u8(f, oid_version());
-	hashwrite_u8(f, num_chunks);
-	hashwrite_u8(f, 0); /* unused padding byte */
-
-	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
-	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
-	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	if (ctx->num_extra_edges)
-		chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
-	else
-		chunk_ids[3] = 0;
-	chunk_ids[4] = 0;
-
-	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
-	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
-	chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
-
-	for (i = 0; i <= num_chunks; i++) {
-		uint32_t chunk_write[3];
-
-		chunk_write[0] = htonl(chunk_ids[i]);
-		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
-		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
-		hashwrite(f, chunk_write, 12);
-	}
-
-	if (ctx->report_progress) {
-		strbuf_addf(&progress_title,
-			    Q_("Writing out commit graph in %d pass",
-			       "Writing out commit graph in %d passes",
-			       num_chunks),
-			    num_chunks);
-		ctx->progress = start_delayed_progress(
-			progress_title.buf,
-			num_chunks * ctx->commits.nr);
-	}
-	write_graph_chunk_fanout(f, ctx);
-	write_graph_chunk_oids(f, hashsz, ctx);
-	write_graph_chunk_data(f, hashsz, ctx);
-	if (ctx->num_extra_edges)
-		write_graph_chunk_extra_edges(f, ctx);
-	stop_progress(&ctx->progress);
-	strbuf_release(&progress_title);
-
-	close_commit_graph(ctx->r);
-	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-	commit_lock_file(&lk);
+	res = write_commit_graph_file(ctx);
 
 cleanup:
-	free(graph_name);
+	free(ctx->graph_name);
 	free(ctx->commits.list);
 	free(ctx->oids.list);
 	free(ctx);
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v5 02/11] commit-graph: return with errors during write
  2019-06-12 13:29         ` [PATCH v5 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
@ 2019-06-29 17:23           ` SZEDER Gábor
  2019-07-01 12:19             ` Derrick Stolee
  0 siblings, 1 reply; 89+ messages in thread
From: SZEDER Gábor @ 2019-06-29 17:23 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Junio C Hamano, Derrick Stolee

On Wed, Jun 12, 2019 at 06:29:37AM -0700, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index e80c1cac02..3b6fd0d728 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -23,6 +23,14 @@ test_expect_success 'write graph with no packs' '
>  	test_path_is_file info/commit-graph
>  '
>  
> +test_expect_success 'close with correct error on bad input' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	echo doesnotexist >in &&
> +	{ git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
> +	test "$ret" = 1 &&

This could be: 

  test_expect_code 1 git commit-graph write --stdin-packs <in 2>stderr


> +	test_i18ngrep "error adding pack" stderr
> +'

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v5 02/11] commit-graph: return with errors during write
  2019-06-29 17:23           ` SZEDER Gábor
@ 2019-07-01 12:19             ` Derrick Stolee
  0 siblings, 0 replies; 89+ messages in thread
From: Derrick Stolee @ 2019-07-01 12:19 UTC (permalink / raw)
  To: SZEDER Gábor, Derrick Stolee via GitGitGadget
  Cc: git, sandals, avarab, peff, Junio C Hamano, Derrick Stolee

On 6/29/2019 1:23 PM, SZEDER Gábor wrote:
> On Wed, Jun 12, 2019 at 06:29:37AM -0700, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
>> index e80c1cac02..3b6fd0d728 100755
>> --- a/t/t5318-commit-graph.sh
>> +++ b/t/t5318-commit-graph.sh
>> @@ -23,6 +23,14 @@ test_expect_success 'write graph with no packs' '
>>  	test_path_is_file info/commit-graph
>>  '
>>  
>> +test_expect_success 'close with correct error on bad input' '
>> +	cd "$TRASH_DIRECTORY/full" &&
>> +	echo doesnotexist >in &&
>> +	{ git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
>> +	test "$ret" = 1 &&
> 
> This could be: 
> 
>   test_expect_code 1 git commit-graph write --stdin-packs <in 2>stderr
> 
> 
>> +	test_i18ngrep "error adding pack" stderr
>> +'

Thanks!, you are right! test_expect_code is what I should have used here
instead of finding the "ret=$?" trick in t0005-signals.sh, which needs to
do more interesting logic on the return code.

Here is your suggestion as a diff. Junio: could you squash this in, or
should I submit a full patch?

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 22cb9d66430..4391007f4c1 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -26,8 +26,7 @@ test_expect_success 'write graph with no packs' '
 test_expect_success 'close with correct error on bad input' '
        cd "$TRASH_DIRECTORY/full" &&
        echo doesnotexist >in &&
-       { git commit-graph write --stdin-packs <in 2>stderr; ret=$?; } &&
-       test "$ret" = 1 &&
+       test_expect_code 1 git commit-graph write --stdin-packs <in 2>stderr &&
        test_i18ngrep "error adding pack" stderr
 '

I took inventory of when we are using "=$?" in the test scripts and saw
this was the only one that could easily be removed. Every other place is
doing something that can't be replaced by test_expect_code.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 89+ messages in thread

end of thread, back to index

Thread overview: 89+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-23 21:59 [PATCH 0/6] Create commit-graph file format v2 Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 3/6] commit-graph: create new version flags Derrick Stolee via GitGitGadget
2019-01-23 21:59 ` [PATCH 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-01-24  9:31   ` Ævar Arnfjörð Bjarmason
2019-01-23 21:59 ` [PATCH 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-01-23 23:56   ` Jonathan Tan
2019-01-24  9:40   ` Ævar Arnfjörð Bjarmason
2019-01-24 14:34     ` Derrick Stolee
2019-03-21  9:21   ` Ævar Arnfjörð Bjarmason
2019-01-23 21:59 ` [PATCH 6/6] commit-graph: test verifying a corrupt v2 header Derrick Stolee via GitGitGadget
2019-01-23 23:59   ` Jonathan Tan
2019-01-24 23:05 ` [PATCH 0/6] Create commit-graph file format v2 Junio C Hamano
2019-01-24 23:39 ` Junio C Hamano
2019-01-25 13:54   ` Derrick Stolee
2019-04-24 19:58 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 1/5] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 2/5] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-04-25  5:21     ` Junio C Hamano
2019-04-24 19:58   ` [PATCH v2 3/5] commit-graph: create new version flags Derrick Stolee via GitGitGadget
2019-04-25  5:29     ` Junio C Hamano
2019-04-25 11:09       ` Derrick Stolee
2019-04-25 21:31     ` Ævar Arnfjörð Bjarmason
2019-04-26  2:20       ` Junio C Hamano
2019-04-24 19:58   ` [PATCH v2 4/5] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-04-24 19:58   ` [PATCH v2 5/5] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-04-25 22:09   ` [PATCH v2 0/5] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
2019-04-26  2:28     ` Junio C Hamano
2019-04-26  8:33       ` Ævar Arnfjörð Bjarmason
2019-04-26 12:06         ` Derrick Stolee
2019-04-26 13:55           ` Ævar Arnfjörð Bjarmason
2019-04-27 12:57     ` Ævar Arnfjörð Bjarmason
2019-05-01 13:11   ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 1/6] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-01 14:46       ` Ævar Arnfjörð Bjarmason
2019-05-01 13:11     ` [PATCH v3 2/6] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 3/6] commit-graph: create new version parameter Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 4/6] commit-graph: add --version=<n> option Derrick Stolee via GitGitGadget
2019-05-01 13:11     ` [PATCH v3 5/6] commit-graph: implement file format version 2 Derrick Stolee via GitGitGadget
2019-05-01 19:12       ` Ævar Arnfjörð Bjarmason
2019-05-01 19:56         ` Derrick Stolee
2019-05-01 13:11     ` [PATCH v3 6/6] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-01 14:58       ` Ævar Arnfjörð Bjarmason
2019-05-01 19:59         ` Derrick Stolee
2019-05-01 20:25     ` [PATCH v3 0/6] Create commit-graph file format v2 Ævar Arnfjörð Bjarmason
2019-05-02 13:26       ` Derrick Stolee
2019-05-02 18:02         ` Ævar Arnfjörð Bjarmason
2019-05-03 12:47           ` Derrick Stolee
2019-05-03 13:41             ` Ævar Arnfjörð Bjarmason
2019-05-06  8:27               ` Christian Couder
2019-05-06 13:47                 ` Derrick Stolee
2019-05-03 14:16             ` SZEDER Gábor
2019-05-03 15:11               ` Derrick Stolee
2019-05-09 14:22     ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-05-13  2:56         ` Junio C Hamano
2019-05-09 14:22       ` [PATCH v4 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-05-13  3:13         ` Junio C Hamano
2019-05-13 11:04           ` Derrick Stolee
2019-05-13 11:22             ` Derrick Stolee
2019-05-09 14:22       ` [PATCH v4 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-05-13  3:44         ` Junio C Hamano
2019-05-13 11:07           ` Derrick Stolee
2019-05-09 14:22       ` [PATCH v4 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-05-13  5:05         ` Junio C Hamano
2019-05-09 14:22       ` [PATCH v4 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-05-09 14:22       ` [PATCH v4 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget
2019-05-13  5:09         ` Junio C Hamano
2019-05-09 17:58       ` [PATCH v4 00/11] Commit-graph write refactor (was: Create commit-graph file format v2) Josh Steadmon
2019-06-12 13:29       ` [PATCH v5 " Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 01/11] commit-graph: fix the_repository reference Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 02/11] commit-graph: return with errors during write Derrick Stolee via GitGitGadget
2019-06-29 17:23           ` SZEDER Gábor
2019-07-01 12:19             ` Derrick Stolee
2019-06-12 13:29         ` [PATCH v5 03/11] commit-graph: collapse parameters into flags Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 04/11] commit-graph: remove Future Work section Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 05/11] commit-graph: create write_commit_graph_context Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 06/11] commit-graph: extract fill_oids_from_packs() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 08/11] commit-graph: extract fill_oids_from_all_packs() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 07/11] commit-graph: extract fill_oids_from_commit_hex() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 09/11] commit-graph: extract count_distinct_commits() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 10/11] commit-graph: extract copy_oids_to_commits() Derrick Stolee via GitGitGadget
2019-06-12 13:29         ` [PATCH v5 11/11] commit-graph: extract write_commit_graph_file() Derrick Stolee via GitGitGadget

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git