git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc'
@ 2018-06-04 16:52 Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 01/21] commit-graph: UNLEAK before die() Derrick Stolee
                   ` (23 more replies)
  0 siblings, 24 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Thanks for the feedback on v3. There were several small cleanups, but
perhaps the biggest change is the addition of "commit-graph: use
string-list API for input" which makes "commit-graph: add '--reachable'
option" much simpler.

The inter-diff is still reasonably large, but I'll send it in a
follow-up PR.

Thanks,
-Stolee

Derrick Stolee (21):
  commit-graph: UNLEAK before die()
  commit-graph: fix GRAPH_MIN_SIZE
  commit-graph: parse commit from chosen graph
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: add 'verify' subcommand
  commit-graph: verify catches corrupt signature
  commit-graph: verify required chunks are present
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify objects exist
  commit-graph: verify root tree OIDs
  commit-graph: verify parent list
  commit-graph: verify generation number
  commit-graph: verify commit date
  commit-graph: test for corrupted octopus edge
  commit-graph: verify contents match checksum
  fsck: verify commit-graph
  commit-graph: use string-list API for input
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/config.txt                 |  10 +-
 Documentation/git-commit-graph.txt       |  14 +-
 Documentation/git-fsck.txt               |   3 +
 Documentation/git-gc.txt                 |   4 +
 Documentation/technical/commit-graph.txt |  22 --
 builtin/commit-graph.c                   |  98 ++++++---
 builtin/fsck.c                           |  21 ++
 builtin/gc.c                             |   6 +
 commit-graph.c                           | 248 +++++++++++++++++++++--
 commit-graph.h                           |  10 +-
 commit.c                                 |   9 +-
 commit.h                                 |   1 +
 t/t5318-commit-graph.sh                  | 201 ++++++++++++++++++
 13 files changed, 569 insertions(+), 78 deletions(-)

-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 01/21] commit-graph: UNLEAK before die()
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
 	graph_name = get_commit_graph_filename(opts.obj_dir);
 	graph = load_commit_graph_one(graph_name);
 
-	if (!graph)
+	if (!graph) {
+		UNLEAK(graph_name);
 		die("graph file %s does not exist", graph_name);
+	}
+
 	FREE_AND_NULL(graph_name);
 
 	printf("header: %08x %d %d %d %d\n",
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 02/21] commit-graph: fix GRAPH_MIN_SIZE
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 01/21] commit-graph: UNLEAK before die() Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The GRAPH_MIN_SIZE macro should be the smallest size of a parsable
commit-graph file. However, the minimum number of chunks was wrong.
It is possible to write a commit-graph file with zero commits, and
that violates this macro's value.

Rewrite the macro, and use extra macros to better explain the magic
constants.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index bb54c1214c..c09e87c3c2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -34,10 +34,11 @@
 
 #define GRAPH_LAST_EDGE 0x80000000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-			GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 03/21] commit-graph: parse commit from chosen graph
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 01/21] commit-graph: UNLEAK before die() Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 04/21] commit: force commit to parse from object database Derrick Stolee
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index c09e87c3c2..472b41ec30 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -311,7 +311,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin
 	}
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
 {
 	uint32_t pos;
 
@@ -319,9 +319,21 @@ int parse_commit_in_graph(struct commit *item)
 		return 0;
 	if (item->object.parsed)
 		return 1;
+
+	if (find_commit_in_graph(item, g, &pos))
+		return fill_commit_in_graph(item, g, pos);
+
+	return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+
 	prepare_commit_graph();
-	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
-		return fill_commit_in_graph(item, commit_graph, pos);
+	if (commit_graph)
+		return parse_commit_in_graph_one(commit_graph, item);
 	return 0;
 }
 
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 04/21] commit: force commit to parse from object database
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (2 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit.c | 9 +++++++--
 commit.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 1d28677dfb..6eaed0174c 100644
--- a/commit.c
+++ b/commit.c
@@ -392,7 +392,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph)
 {
 	enum object_type type;
 	void *buffer;
@@ -403,7 +403,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	if (parse_commit_in_graph(item))
+	if (use_commit_graph && parse_commit_in_graph(item))
 		return 0;
 	buffer = read_sha1_file(item->object.oid.hash, &type, &size);
 	if (!buffer)
@@ -424,6 +424,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 	return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+	return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
 	if (parse_commit(item))
diff --git a/commit.h b/commit.h
index b5afde1ae9..5fde74fcd7 100644
--- a/commit.h
+++ b/commit.h
@@ -73,6 +73,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 05/21] commit-graph: load a root tree from specific graph
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (3 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 04/21] commit: force commit to parse from object database Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 472b41ec30..fee8437ce3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -359,14 +359,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *
 	return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+						 const struct commit *c)
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
 	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		BUG("get_commit_tree_in_graph called from non-commit-graph commit");
+		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
+
+	return load_tree_for_commit(g, (struct commit *)c);
+}
 
-	return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+	return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 06/21] commit-graph: add 'verify' subcommand
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (4 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  6 +++++
 builtin/commit-graph.c             | 38 ++++++++++++++++++++++++++++++
 commit-graph.c                     | 23 ++++++++++++++++++
 commit-graph.h                     |  2 ++
 t/t5318-commit-graph.sh            | 10 ++++++++
 5 files changed, 79 insertions(+)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git commit-graph read' [--object-dir <dir>]
+'git commit-graph verify' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 --------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f0875b8bf3..3079cde6f9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,10 +8,16 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
+	N_("git commit-graph verify [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+	N_("git commit-graph verify [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	NULL
@@ -29,6 +35,36 @@ static struct opts_commit_graph {
 	int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_verify_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			   N_("dir"),
+			   N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_verify_options,
+			     builtin_commit_graph_verify_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+	FREE_AND_NULL(graph_name);
+
+	if (!graph)
+		return 0;
+
+	return verify_commit_graph(graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
@@ -165,6 +201,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);
+		if (!strcmp(argv[0], "verify"))
+			return graph_verify(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index fee8437ce3..c860cbe721 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -824,3 +824,26 @@ void write_commit_graph(const char *obj_dir,
 	oids.alloc = 0;
 	oids.nr = 0;
 }
+
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+	va_list ap;
+	verify_commit_graph_error = 1;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	fprintf(stderr, "\n");
+	va_end(ap);
+}
+
+int verify_commit_graph(struct commit_graph *g)
+{
+	if (!g) {
+		graph_report("no commit-graph file loaded");
+		return 1;
+	}
+
+	return verify_commit_graph_error;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..71a39c5a57 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -53,4 +53,6 @@ void write_commit_graph(const char *obj_dir,
 			int nr_commits,
 			int append);
 
+int verify_commit_graph(struct commit_graph *g);
+
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 59d0be2877..0830ef9fdd 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -11,6 +11,11 @@ test_expect_success 'setup full repo' '
 	objdir=".git/objects"
 '
 
+test_expect_success 'verify graph with no graph file' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify
+'
+
 test_expect_success 'write graph with no packs' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write --object-dir . &&
@@ -230,4 +235,9 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'git commit-graph verify' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify >output
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 07/21] commit-graph: verify catches corrupt signature
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (5 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 08/21] commit-graph: verify required chunks are present Derrick Stolee
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Most messages from 'git commit-graph verify' will not be marked
for translation. There will be one exception: the message that
reports an invalid checksum will be marked for translation, as that
is the only message that is intended for a typical user.

Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 43 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0830ef9fdd..c0c1ff09b9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
 	git commit-graph verify >output
 '
 
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+
+# usage: corrupt_graph_and_verify <position> <data> <string>
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+	pos=$1
+	data="${2:-\0}"
+	grepstr=$3
+	cd "$TRASH_DIRECTORY/full" &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
+	test_must_fail git commit-graph verify 2>test_err &&
+	grep -v "^+" test_err >err
+	test_i18ngrep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+	corrupt_graph_and_verify 0 "\0" \
+		"graph signature"
+'
+
+test_expect_success 'detect bad version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+		"hash version"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 08/21] commit-graph: verify required chunks are present
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (6 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          |  9 +++++++++
 t/t5318-commit-graph.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index c860cbe721..25d5edea82 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -845,5 +845,14 @@ int verify_commit_graph(struct commit_graph *g)
 		return 1;
 	}
 
+	verify_commit_graph_error = 0;
+
+	if (!g->chunk_oid_fanout)
+		graph_report("commit-graph is missing the OID Fanout chunk");
+	if (!g->chunk_oid_lookup)
+		graph_report("commit-graph is missing the OID Lookup chunk");
+	if (!g->chunk_commit_data)
+		graph_report("commit-graph is missing the Commit Data chunk");
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0c1ff09b9..846396665e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
 
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			    1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			     2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' '
 		"hash version"
 '
 
+test_expect_success 'detect low chunk count' '
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+		"missing the .* chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+		"missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+		"missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+		"missing the Commit Data chunk"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 09/21] commit-graph: verify corrupt OID fanout and lookup
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (7 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 08/21] commit-graph: verify required chunks are present Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 10/21] commit-graph: verify objects exist Derrick Stolee
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 36 ++++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 22 ++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 25d5edea82..a90cc2e6f4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -840,6 +840,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct commit_graph *g)
 {
+	uint32_t i, cur_fanout_pos = 0;
+	struct object_id prev_oid, cur_oid;
+
 	if (!g) {
 		graph_report("no commit-graph file loaded");
 		return 1;
@@ -854,5 +857,38 @@ int verify_commit_graph(struct commit_graph *g)
 	if (!g->chunk_commit_data)
 		graph_report("commit-graph is missing the Commit Data chunk");
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
+			graph_report("commit-graph has incorrect OID order: %s then %s",
+				     oid_to_hex(&prev_oid),
+				     oid_to_hex(&cur_oid));
+
+		oidcpy(&prev_oid, &cur_oid);
+
+		while (cur_oid.hash[0] > cur_fanout_pos) {
+			uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+			if (i != fanout_value)
+				graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+					     cur_fanout_pos, fanout_value, i);
+
+			cur_fanout_pos++;
+		}
+	}
+
+	while (cur_fanout_pos < 256) {
+		uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+
+		if (g->num_commits != fanout_value)
+			graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+				     cur_fanout_pos, fanout_value, i);
+
+		cur_fanout_pos++;
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 846396665e..c29eae47c9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			    1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			     2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+		       $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
 		"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect fanout final value' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+		"incorrect OID order"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 10/21] commit-graph: verify objects exist
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (8 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 11/21] commit-graph: verify root tree OIDs Derrick Stolee
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 17 +++++++++++++++++
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 24 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index a90cc2e6f4..0cf1b61d80 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -239,6 +239,7 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 {
 	struct commit *c;
 	struct object_id oid;
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -890,5 +891,21 @@ int verify_commit_graph(struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		struct commit *odb_commit;
+
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node());
+		if (parse_commit_internal(odb_commit, 0, 0)) {
+			graph_report("failed to parse %s from object database",
+				     oid_to_hex(&cur_oid));
+			continue;
+		}
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c29eae47c9..cf60e48496 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+NUM_COMMITS=9
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4))
 GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
 		"incorrect OID order"
 '
 
+test_expect_success 'detect OID not in object database' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+		"from object database"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 11/21] commit-graph: verify root tree OIDs
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (9 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 10/21] commit-graph: verify objects exist Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 12/21] commit-graph: verify parent list Derrick Stolee
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The 'verify' subcommand must compare the commit content parsed from the
commit-graph against the content in the object database. Use
lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 17 ++++++++++++++++-
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 0cf1b61d80..50a8e27910 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -862,6 +862,8 @@ int verify_commit_graph(struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
+		struct commit *graph_commit;
+
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
@@ -879,6 +881,11 @@ int verify_commit_graph(struct commit_graph *g)
 
 			cur_fanout_pos++;
 		}
+
+		graph_commit = lookup_commit(&cur_oid);
+		if (!parse_commit_in_graph_one(g, graph_commit))
+			graph_report("failed to parse %s from commit-graph",
+				     oid_to_hex(&cur_oid));
 	}
 
 	while (cur_fanout_pos < 256) {
@@ -895,16 +902,24 @@ int verify_commit_graph(struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
-		struct commit *odb_commit;
+		struct commit *graph_commit, *odb_commit;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
+		graph_commit = lookup_commit(&cur_oid);
 		odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node());
 		if (parse_commit_internal(odb_commit, 0, 0)) {
 			graph_report("failed to parse %s from object database",
 				     oid_to_hex(&cur_oid));
 			continue;
 		}
+
+		if (oidcmp(&get_commit_tree_in_graph_one(g, graph_commit)->object.oid,
+			   get_commit_tree_oid(odb_commit)))
+			graph_report("root tree OID for commit %s in commit-graph is %s != %s",
+				     oid_to_hex(&cur_oid),
+				     oid_to_hex(get_commit_tree_oid(graph_commit)),
+				     oid_to_hex(get_commit_tree_oid(odb_commit)));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index cf60e48496..c0c1248eda 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* $NUM_COMMITS))
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
 		"from object database"
 '
 
+test_expect_success 'detect incorrect tree OID' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+		"root tree OID for commit"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 12/21] commit-graph: verify parent list
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (10 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 11/21] commit-graph: verify root tree OIDs Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 13/21] commit-graph: verify generation number Derrick Stolee
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0xFFFFFFFF to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

Add a boundary check to insert_parent_or_die() for when the parent
position value is out of range.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 28 ++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 18 ++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 50a8e27910..d2a86e329e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -240,6 +240,9 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 	struct commit *c;
 	struct object_id oid;
 
+	if (pos >= g->num_commits)
+		die("invalid parent position %"PRIu64, pos);
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -903,6 +906,7 @@ int verify_commit_graph(struct commit_graph *g)
 
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
+		struct commit_list *graph_parents, *odb_parents;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -920,6 +924,30 @@ int verify_commit_graph(struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     oid_to_hex(get_commit_tree_oid(graph_commit)),
 				     oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+		graph_parents = graph_commit->parents;
+		odb_parents = odb_commit->parents;
+
+		while (graph_parents) {
+			if (odb_parents == NULL) {
+				graph_report("commit-graph parent list for commit %s is too long",
+					     oid_to_hex(&cur_oid));
+				break;
+			}
+
+			if (oidcmp(&graph_parents->item->object.oid, &odb_parents->item->object.oid))
+				graph_report("commit-graph parent for %s is %s != %s",
+					     oid_to_hex(&cur_oid),
+					     oid_to_hex(&graph_parents->item->object.oid),
+					     oid_to_hex(&odb_parents->item->object.oid));
+
+			graph_parents = graph_parents->next;
+			odb_parents = odb_parents->next;
+		}
+
+		if (odb_parents != NULL)
+			graph_report("commit-graph parent list for commit %s terminates early",
+				     oid_to_hex(&cur_oid));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0c1248eda..ec0964112a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10))
 GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* $NUM_COMMITS))
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
 		"root tree OID for commit"
 '
 
+test_expect_success 'detect incorrect parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+		"invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+		"is too long"
+'
+
+test_expect_success 'detect wrong parent' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+		"commit-graph parent for"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 13/21] commit-graph: verify generation number
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (11 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 12/21] commit-graph: verify parent list Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 14/21] commit-graph: verify commit date Derrick Stolee
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Since Git will ship with the commit-graph feature without generation
numbers, we need to accept commit-graphs with all generation numbers
equal to zero. In this case, ignore the generation number calculation.

However, verify that we should never have a mix of zero and non-zero
generation numbers. Create a test that sets one commit to generation
zero and all following commits report a failure as they have non-zero
generation in a file that contains generation number zero.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 34 ++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 11 +++++++++++
 2 files changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index d2a86e329e..5faecae2a7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -842,10 +842,14 @@ static void graph_report(const char *fmt, ...)
 	va_end(ap);
 }
 
+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
 int verify_commit_graph(struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
 	struct object_id prev_oid, cur_oid;
+	int generation_zero = 0;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -907,6 +911,7 @@ int verify_commit_graph(struct commit_graph *g)
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
+		uint32_t max_generation = 0;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -941,6 +946,9 @@ int verify_commit_graph(struct commit_graph *g)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
+			if (graph_parents->item->generation > max_generation)
+				max_generation = graph_parents->item->generation;
+
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
 		}
@@ -948,6 +956,32 @@ int verify_commit_graph(struct commit_graph *g)
 		if (odb_parents != NULL)
 			graph_report("commit-graph parent list for commit %s terminates early",
 				     oid_to_hex(&cur_oid));
+
+		if (!graph_commit->generation) {
+			if (generation_zero == GENERATION_NUMBER_EXISTS)
+				graph_report("commit-graph has generation number zero for commit %s, but non-zero elsewhere",
+					     oid_to_hex(&cur_oid));
+			generation_zero = GENERATION_ZERO_EXISTS;
+		} else if (generation_zero == GENERATION_ZERO_EXISTS)
+			graph_report("commit-graph has non-zero generation number for commit %s, but zero elsewhere",
+				     oid_to_hex(&cur_oid));
+
+		if (generation_zero == GENERATION_ZERO_EXISTS)
+			continue;
+
+		/*
+		 * If one of our parents has generation GENERATION_NUMBER_MAX, then
+		 * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
+		 * extra logic in the following condition.
+		 */
+		if (max_generation == GENERATION_NUMBER_MAX)
+			max_generation--;
+
+		if (graph_commit->generation != max_generation + 1)
+			graph_report("commit-graph generation for commit %s is %u != %u",
+				     oid_to_hex(&cur_oid),
+				     graph_commit->generation,
+				     max_generation + 1);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ec0964112a..a6ea1341dc 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -366,4 +367,14 @@ test_expect_success 'detect wrong parent' '
 		"commit-graph parent for"
 '
 
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
+		"generation for commit"
+'
+
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
+		"non-zero generation number"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 14/21] commit-graph: verify commit date
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (12 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 13/21] commit-graph: verify generation number Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 6 ++++++
 t/t5318-commit-graph.sh | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 5faecae2a7..47fdd62e88 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -982,6 +982,12 @@ int verify_commit_graph(struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     graph_commit->generation,
 				     max_generation + 1);
+
+		if (graph_commit->date != odb_commit->date)
+			graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime,
+				     oid_to_hex(&cur_oid),
+				     graph_commit->date,
+				     odb_commit->date);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a6ea1341dc..6a873bfda8 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -377,4 +378,9 @@ test_expect_success 'detect incorrect generation number' '
 		"non-zero generation number"
 '
 
+test_expect_success 'detect incorrect commit date' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+		"commit date"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 15/21] commit-graph: test for corrupted octopus edge
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (13 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 14/21] commit-graph: verify commit date Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 16/21] commit-graph: verify contents match checksum Derrick Stolee
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6a873bfda8..cf67fb391a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' '
 '
 
 NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -274,6 +275,10 @@ GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+			     $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -383,4 +388,9 @@ test_expect_success 'detect incorrect commit date' '
 		"commit date"
 '
 
+test_expect_success 'detect incorrect parent for octopus merge' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+		"invalid parent"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 16/21] commit-graph: verify contents match checksum
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (14 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 17/21] fsck: verify commit-graph Derrick Stolee
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 16 ++++++++++++++--
 t/t5318-commit-graph.sh |  6 ++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 47fdd62e88..05879de26c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -829,6 +829,7 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 }
 
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 static int verify_commit_graph_error;
 
 static void graph_report(const char *fmt, ...)
@@ -848,8 +849,10 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
-	struct object_id prev_oid, cur_oid;
+	struct object_id prev_oid, cur_oid, checksum;
 	int generation_zero = 0;
+	struct hashfile *f;
+	int devnull;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -868,6 +871,15 @@ int verify_commit_graph(struct commit_graph *g)
 	if (verify_commit_graph_error)
 		return verify_commit_graph_error;
 
+	devnull = open("/dev/null", O_WRONLY);
+	f = hashfd(devnull, NULL);
+	hashwrite(f, g->data, g->data_len - g->hash_len);
+	finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+	if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+		graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
+		verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+	}
+
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit;
 
@@ -905,7 +917,7 @@ int verify_commit_graph(struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
-	if (verify_commit_graph_error)
+	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index cf67fb391a..2297a44e7d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
 GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 			     $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS))
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus merge' '
 		"invalid parent"
 '
 
+test_expect_success 'detect invalid checksum hash' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 17/21] fsck: verify commit-graph
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (15 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 16/21] commit-graph: verify contents match checksum Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-06 11:08   ` Ævar Arnfjörð Bjarmason
  2018-06-04 16:52 ` [PATCH v4 18/21] commit-graph: use string-list API for input Derrick Stolee
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c             | 21 +++++++++++++++++++++
 t/t5318-commit-graph.sh    |  8 ++++++++
 3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 ---------------------
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index ef78c6c00c..a6d5045b77 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -16,6 +16,7 @@
 #include "streaming.h"
 #include "decorate.h"
 #include "packfile.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -45,6 +46,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -815,5 +817,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	check_connectivity();
+
+	if (core_commit_graph) {
+		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+		const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL, NULL };
+		commit_graph_verify.argv = verify_argv;
+		commit_graph_verify.git_cmd = 1;
+
+		if (run_command(&commit_graph_verify))
+			errors_found |= ERROR_COMMIT_GRAPH;
+
+		prepare_alt_odb();
+		for (alt = alt_odb_list; alt; alt = alt->next) {
+			verify_argv[2] = "--object-dir";
+			verify_argv[3] = alt->path;
+			if (run_command(&commit_graph_verify))
+				errors_found |= ERROR_COMMIT_GRAPH;
+		}
+	}
+
 	return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2297a44e7d..44d4c71f0b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -399,4 +399,12 @@ test_expect_success 'detect invalid checksum hash' '
 		"incorrect checksum"
 '
 
+test_expect_success 'git fsck (checks commit-graph)' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git fsck &&
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum" &&
+	test_must_fail git fsck
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 18/21] commit-graph: use string-list API for input
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (16 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 17/21] fsck: verify commit-graph Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:52 ` [PATCH v4 19/21] commit-graph: add '--reachable' option Derrick Stolee
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 39 +++++++++++++--------------------------
 commit-graph.c         | 15 +++++++--------
 commit-graph.h         |  7 +++----
 3 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 3079cde6f9..d8eb8278b3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-	const char **pack_indexes = NULL;
-	int packs_nr = 0;
-	const char **commit_hex = NULL;
-	int commits_nr = 0;
-	const char **lines = NULL;
-	int lines_nr = 0;
-	int lines_alloc = 0;
+	struct string_list *pack_indexes = NULL;
+	struct string_list *commit_hex = NULL;
+	struct string_list lines;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)
 
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-		lines_nr = 0;
-		lines_alloc = 128;
-		ALLOC_ARRAY(lines, lines_alloc);
-
-		while (strbuf_getline(&buf, stdin) != EOF) {
-			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-			lines[lines_nr++] = strbuf_detach(&buf, NULL);
-		}
-
-		if (opts.stdin_packs) {
-			pack_indexes = lines;
-			packs_nr = lines_nr;
-		}
-		if (opts.stdin_commits) {
-			commit_hex = lines;
-			commits_nr = lines_nr;
-		}
+		string_list_init(&lines, 0);
+
+		while (strbuf_getline(&buf, stdin) != EOF)
+			string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+		if (opts.stdin_packs)
+			pack_indexes = &lines;
+		if (opts.stdin_commits)
+			commit_hex = &lines;
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr,
 			   commit_hex,
-			   commits_nr,
 			   opts.append);
 
+	string_list_clear(&lines, 0);
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 05879de26c..c6070735c2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -653,10 +653,8 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 }
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append)
 {
 	struct packed_oid_list oids;
@@ -697,10 +695,10 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
-		for (i = 0; i < nr_packs; i++) {
+		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
-			strbuf_addstr(&packname, pack_indexes[i]);
+			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
 			if (!p)
 				die("error adding pack %s", packname.buf);
@@ -713,12 +711,13 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	if (commit_hex) {
-		for (i = 0; i < nr_commits; i++) {
+		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
-			if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+			if (commit_hex->items[i].string &&
+			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
 
 			result = lookup_commit_reference_gently(&oid, 1);
diff --git a/commit-graph.h b/commit-graph.h
index 71a39c5a57..66661e1fc5 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,7 @@
 #define COMMIT_GRAPH_H
 
 #include "git-compat-util.h"
+#include "string-list.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -47,10 +48,8 @@ struct commit_graph {
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append);
 
 int verify_commit_graph(struct commit_graph *g);
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 19/21] commit-graph: add '--reachable' option
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (17 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 18/21] commit-graph: use string-list API for input Derrick Stolee
@ 2018-06-04 16:52 ` Derrick Stolee
  2018-06-04 16:53 ` [PATCH v4 20/21] gc: automatically write commit-graph files Derrick Stolee
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:52 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  8 ++++++--
 builtin/commit-graph.c             | 16 ++++++++++++----
 commit-graph.c                     | 18 ++++++++++++++++++
 commit-graph.h                     |  1 +
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index a222cfab08..dececb79d7 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index d8eb8278b3..76423b3fa5 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -9,7 +9,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph verify [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -24,12 +24,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int reachable;
 	int stdin_packs;
 	int stdin_commits;
 	int append;
@@ -126,6 +127,8 @@ static int graph_write(int argc, const char **argv)
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "reachable", &opts.reachable,
+			N_("start walk at all refs")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -139,11 +142,16 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
-	if (opts.stdin_packs && opts.stdin_commits)
-		die(_("cannot use both --stdin-commits and --stdin-packs"));
+	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
+	if (opts.reachable) {
+		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		return 0;
+	}
+
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
 		string_list_init(&lines, 0);
diff --git a/commit-graph.c b/commit-graph.c
index c6070735c2..946bcfa98c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -7,6 +7,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
@@ -652,6 +653,23 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 	}
 }
 
+static int add_ref_to_list(const char *refname,
+			   const struct object_id *oid,
+			   int flags, void *cb_data)
+{
+	struct string_list *list = (struct string_list*)cb_data;
+	string_list_append(list, oid_to_hex(oid));
+	return 0;
+}
+
+void write_commit_graph_reachable(const char *obj_dir, int append)
+{
+	struct string_list list;
+	string_list_init(&list, 1);
+	for_each_ref(add_ref_to_list, &list);
+	write_commit_graph(obj_dir, NULL, &list, append);
+}
+
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/commit-graph.h b/commit-graph.h
index 66661e1fc5..ee20f5e280 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -47,6 +47,7 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+void write_commit_graph_reachable(const char *obj_dir, int append);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 44d4c71f0b..ffb2ed7c95 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -205,6 +205,16 @@ test_expect_success 'build graph from commits with append' '
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph using --reachable' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --reachable &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 20/21] gc: automatically write commit-graph files
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (18 preceding siblings ...)
  2018-06-04 16:52 ` [PATCH v4 19/21] commit-graph: add '--reachable' option Derrick Stolee
@ 2018-06-04 16:53 ` Derrick Stolee
  2018-06-06 11:11   ` Ævar Arnfjörð Bjarmason
  2018-06-04 16:53 ` [PATCH v4 21/21] commit-graph: update design document Derrick Stolee
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:53 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 10 +++++++++-
 Documentation/git-gc.txt |  4 ++++
 builtin/gc.c             |  6 ++++++
 t/t5318-commit-graph.sh  | 14 ++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 11f027194e..d2eb3c8e9b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -900,7 +900,8 @@ the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
 core.commitGraph::
 	Enable git commit graph feature. Allows reading from the
-	commit-graph file.
+	commit-graph file. See `gc.commitGraph` for automatically
+	maintaining the file.
 
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
@@ -1553,6 +1554,13 @@ gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
 	if the system supports it. Default is true.
 
+gc.commitGraph::
+	If true, then gc will rewrite the commit-graph file when
+	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+	'--auto' the commit-graph will be updated if housekeeping is
+	required. Default is false. See linkgit:git-commit-graph[1]
+	for details.
+
 gc.logExpiry::
 	If the file gc.log exists, then `git gc --auto` won't run
 	unless that file is more than 'gc.logExpiry' old.  Default is
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 571b5a7e3c..a6526b3592 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,6 +119,10 @@ The optional configuration variable `gc.packRefs` determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index 77fa720bd0..efd214a59f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "argv-array.h"
 #include "commit.h"
 #include "packfile.h"
+#include "commit-graph.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -34,6 +35,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -121,6 +123,7 @@ static void gc_config(void)
 	git_config_get_int("gc.aggressivedepth", &aggressive_depth);
 	git_config_get_int("gc.auto", &gc_auto_threshold);
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
+	git_config_get_bool("gc.commitgraph", &gc_commit_graph);
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_get_expiry("gc.pruneexpire", &prune_expire);
 	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -480,6 +483,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (pack_garbage.nr > 0)
 		clean_pack_garbage();
 
+	if (gc_commit_graph)
+		write_commit_graph_reachable(get_object_directory(), 0);
+
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ffb2ed7c95..b24e8b6689 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'check that gc computes commit-graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit --allow-empty -m "blank" &&
+	git commit-graph write --reachable &&
+	cp $objdir/info/commit-graph commit-graph-before-gc &&
+	git reset --hard HEAD~1 &&
+	git config gc.commitGraph true &&
+	git gc &&
+	cp $objdir/info/commit-graph commit-graph-after-gc &&
+	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+	git commit-graph write --reachable &&
+	test_cmp commit-graph-after-gc $objdir/info/commit-graph
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v4 21/21] commit-graph: update design document
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (19 preceding siblings ...)
  2018-06-04 16:53 ` [PATCH v4 20/21] gc: automatically write commit-graph files Derrick Stolee
@ 2018-06-04 16:53 ` Derrick Stolee
  2018-06-04 17:02 ` [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 16:53 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
     - 'log --topo-order'
     - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (20 preceding siblings ...)
  2018-06-04 16:53 ` [PATCH v4 21/21] commit-graph: update design document Derrick Stolee
@ 2018-06-04 17:02 ` Derrick Stolee
  2018-06-05 14:51 ` Ævar Arnfjörð Bjarmason
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
  23 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-04 17:02 UTC (permalink / raw)
  To: Derrick Stolee, git; +Cc: jnareb, avarab, marten.agren, gitster

Sorry I forgot to --in-reply-to the previous version [1]

[1] 
https://public-inbox.org/git/20180524162504.158394-1-dstolee@microsoft.com/T/#u

On 6/4/2018 12:52 PM, Derrick Stolee wrote:
> Thanks for the feedback on v3. There were several small cleanups, but
> perhaps the biggest change is the addition of "commit-graph: use
> string-list API for input" which makes "commit-graph: add '--reachable'
> option" much simpler.
>
> The inter-diff is still reasonably large, but I'll send it in a
> follow-up PR.

s/PR/message

Here is that diff:


diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9a3abd87e7..d2eb3c8e9b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -900,7 +900,8 @@ the `GIT_NOTES_REF` environment variable.  See 
linkgit:git-notes[1].

  core.commitGraph::
         Enable git commit graph feature. Allows reading from the
-       commit-graph file.
+       commit-graph file. See `gc.commitGraph` for automatically
+       maintaining the file.

  core.sparseCheckout::
         Enable "sparse checkout" feature. See section "Sparse checkout" in
@@ -1554,10 +1555,11 @@ gc.autoDetach::
         if the system supports it. Default is true.

  gc.commitGraph::
-       If true, then gc will rewrite the commit-graph file after any
-       change to the object database. If '--auto' is used, then the
-       commit-graph will not be updated unless the threshold is met.
-       See linkgit:git-commit-graph[1] for details.
+       If true, then gc will rewrite the commit-graph file when
+       linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+       '--auto' the commit-graph will be updated if housekeeping is
+       required. Default is false. See linkgit:git-commit-graph[1]
+       for details.

  gc.logExpiry::
         If the file gc.log exists, then `git gc --auto` won't run
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 17dd654a59..a6526b3592 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,9 +119,9 @@ The optional configuration variable `gc.packRefs` 
determines if
  it within all non-bare repos or it can be set to a boolean value.
  This defaults to true.

-The optional configuration variable 'gc.commitGraph' determines if
-'git gc' runs 'git commit-graph write'. This can be set to a boolean
-value. This defaults to false.
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.

  The optional configuration variable `gc.aggressiveWindow` controls how
  much time is spent optimizing the delta compression of the objects in
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 20ce6437ae..76423b3fa5 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -39,7 +39,7 @@ static struct opts_commit_graph {

  static int graph_verify(int argc, const char **argv)
  {
-       struct commit_graph *graph = 0;
+       struct commit_graph *graph = NULL;
         char *graph_name;

         static struct option builtin_commit_graph_verify_options[] = {
@@ -119,13 +119,9 @@ static int graph_read(int argc, const char **argv)

  static int graph_write(int argc, const char **argv)
  {
-       const char **pack_indexes = NULL;
-       int packs_nr = 0;
-       const char **commit_hex = NULL;
-       int commits_nr = 0;
-       const char **lines = NULL;
-       int lines_nr = 0;
-       int lines_alloc = 0;
+       struct string_list *pack_indexes = NULL;
+       struct string_list *commit_hex = NULL;
+       struct string_list lines;

         static struct option builtin_commit_graph_write_options[] = {
                 OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -158,32 +154,23 @@ static int graph_write(int argc, const char **argv)

         if (opts.stdin_packs || opts.stdin_commits) {
                 struct strbuf buf = STRBUF_INIT;
-               lines_nr = 0;
-               lines_alloc = 128;
-               ALLOC_ARRAY(lines, lines_alloc);
-
-               while (strbuf_getline(&buf, stdin) != EOF) {
-                       ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-                       lines[lines_nr++] = strbuf_detach(&buf, NULL);
-               }
-
-               if (opts.stdin_packs) {
-                       pack_indexes = lines;
-                       packs_nr = lines_nr;
-               }
-               if (opts.stdin_commits) {
-                       commit_hex = lines;
-                       commits_nr = lines_nr;
-               }
+               string_list_init(&lines, 0);
+
+               while (strbuf_getline(&buf, stdin) != EOF)
+                       string_list_append(&lines, strbuf_detach(&buf, 
NULL));
+
+               if (opts.stdin_packs)
+                       pack_indexes = &lines;
+               if (opts.stdin_commits)
+                       commit_hex = &lines;
         }

         write_commit_graph(opts.obj_dir,
                            pack_indexes,
-                          packs_nr,
                            commit_hex,
-                          commits_nr,
                            opts.append);

+       string_list_clear(&lines, 0);
         return 0;
  }

@@ -207,10 +194,10 @@ int cmd_commit_graph(int argc, const char **argv, 
const char *prefix)
                              PARSE_OPT_STOP_AT_NON_OPTION);

         if (argc > 0) {
-               if (!strcmp(argv[0], "verify"))
-                       return graph_verify(argc, argv);
                 if (!strcmp(argv[0], "read"))
                         return graph_read(argc, argv);
+               if (!strcmp(argv[0], "verify"))
+                       return graph_verify(argc, argv);
                 if (!strcmp(argv[0], "write"))
                         return graph_write(argc, argv);
         }
diff --git a/commit-graph.c b/commit-graph.c
index 057d734926..946bcfa98c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1,5 +1,6 @@
  #include "cache.h"
  #include "config.h"
+#include "dir.h"
  #include "git-compat-util.h"
  #include "lockfile.h"
  #include "pack.h"
@@ -652,42 +653,26 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)
         }
  }

-struct hex_list {
-       char **hex_strs;
-       int hex_nr;
-       int hex_alloc;
-};
-
  static int add_ref_to_list(const char *refname,
                            const struct object_id *oid,
                            int flags, void *cb_data)
  {
-       struct hex_list *list = (struct hex_list*)cb_data;
-
-       ALLOC_GROW(list->hex_strs, list->hex_nr + 1, list->hex_alloc);
-       list->hex_strs[list->hex_nr] = xcalloc(GIT_MAX_HEXSZ + 1, 1);
-       strcpy(list->hex_strs[list->hex_nr], oid_to_hex(oid));
-       list->hex_nr++;
+       struct string_list *list = (struct string_list*)cb_data;
+       string_list_append(list, oid_to_hex(oid));
         return 0;
  }

  void write_commit_graph_reachable(const char *obj_dir, int append)
  {
-       struct hex_list list;
-       list.hex_nr = 0;
-       list.hex_alloc = 128;
-       ALLOC_ARRAY(list.hex_strs, list.hex_alloc);
-
+       struct string_list list;
+       string_list_init(&list, 1);
         for_each_ref(add_ref_to_list, &list);
-
-       write_commit_graph(obj_dir, NULL, 0, (const char 
**)list.hex_strs, list.hex_nr, append);
+       write_commit_graph(obj_dir, NULL, &list, append);
  }

  void write_commit_graph(const char *obj_dir,
-                       const char **pack_indexes,
-                       int nr_packs,
-                       const char **commit_hex,
-                       int nr_commits,
+                       struct string_list *pack_indexes,
+                       struct string_list *commit_hex,
                         int append)
  {
         struct packed_oid_list oids;
@@ -695,7 +680,6 @@ void write_commit_graph(const char *obj_dir,
         struct hashfile *f;
         uint32_t i, count_distinct = 0;
         char *graph_name;
-       int fd;
         struct lock_file lk = LOCK_INIT;
         uint32_t chunk_ids[5];
         uint64_t chunk_offsets[5];
@@ -729,10 +713,10 @@ void write_commit_graph(const char *obj_dir,
                 int dirlen;
                 strbuf_addf(&packname, "%s/pack/", obj_dir);
                 dirlen = packname.len;
-               for (i = 0; i < nr_packs; i++) {
+               for (i = 0; i < pack_indexes->nr; i++) {
                         struct packed_git *p;
                         strbuf_setlen(&packname, dirlen);
-                       strbuf_addstr(&packname, pack_indexes[i]);
+                       strbuf_addstr(&packname, 
pack_indexes->items[i].string);
                         p = add_packed_git(packname.buf, packname.len, 1);
                         if (!p)
                                 die("error adding pack %s", packname.buf);
@@ -745,12 +729,13 @@ void write_commit_graph(const char *obj_dir,
         }

         if (commit_hex) {
-               for (i = 0; i < nr_commits; i++) {
+               for (i = 0; i < commit_hex->nr; i++) {
                         const char *end;
                         struct object_id oid;
                         struct commit *result;

-                       if (commit_hex[i] && 
parse_oid_hex(commit_hex[i], &oid, &end))
+                       if (commit_hex->items[i].string &&
+ parse_oid_hex(commit_hex->items[i].string, &oid, &end))
                                 continue;

                         result = lookup_commit_reference_gently(&oid, 1);
@@ -809,23 +794,11 @@ void write_commit_graph(const char *obj_dir,
         compute_generation_numbers(&commits);

         graph_name = get_commit_graph_filename(obj_dir);
-       fd = hold_lock_file_for_update(&lk, graph_name, 0);
-
-       if (fd < 0) {
-               struct strbuf folder = STRBUF_INIT;
-               strbuf_addstr(&folder, graph_name);
-               strbuf_setlen(&folder, strrchr(folder.buf, '/') - 
folder.buf);
-
-               if (mkdir(folder.buf, 0777) < 0)
-                       die_errno(_("cannot mkdir %s"), folder.buf);
-               strbuf_release(&folder);
-
-               fd = hold_lock_file_for_update(&lk, graph_name, 
LOCK_DIE_ON_ERROR);
-
-               if (fd < 0)
-                       die_errno("unable to create '%s'", graph_name);
-       }
+       if (safe_create_leading_directories(graph_name))
+               die_errno(_("unable to create leading directories of %s"),
+                         graph_name);

+       hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
         f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);

         hashwrite_be32(f, GRAPH_SIGNATURE);
@@ -879,21 +852,22 @@ static int verify_commit_graph_error;
  static void graph_report(const char *fmt, ...)
  {
         va_list ap;
-       struct strbuf sb = STRBUF_INIT;
         verify_commit_graph_error = 1;

         va_start(ap, fmt);
-       strbuf_vaddf(&sb, fmt, ap);
-
-       fprintf(stderr, "%s\n", sb.buf);
-       strbuf_release(&sb);
+       vfprintf(stderr, fmt, ap);
+       fprintf(stderr, "\n");
         va_end(ap);
  }

+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
  int verify_commit_graph(struct commit_graph *g)
  {
         uint32_t i, cur_fanout_pos = 0;
         struct object_id prev_oid, cur_oid, checksum;
+       int generation_zero = 0;
         struct hashfile *f;
         int devnull;

@@ -1012,6 +986,18 @@ int verify_commit_graph(struct commit_graph *g)
                         graph_report("commit-graph parent list for 
commit %s terminates early",
                                      oid_to_hex(&cur_oid));

+               if (!graph_commit->generation) {
+                       if (generation_zero == GENERATION_NUMBER_EXISTS)
+                               graph_report("commit-graph has 
generation number zero for commit %s, but non-zero elsewhere",
+ oid_to_hex(&cur_oid));
+                       generation_zero = GENERATION_ZERO_EXISTS;
+               } else if (generation_zero == GENERATION_ZERO_EXISTS)
+                       graph_report("commit-graph has non-zero 
generation number for commit %s, but zero elsewhere",
+                                    oid_to_hex(&cur_oid));
+
+               if (generation_zero == GENERATION_ZERO_EXISTS)
+                       continue;
+
                 /*
                  * If one of our parents has generation 
GENERATION_NUMBER_MAX, then
                  * our generation is also GENERATION_NUMBER_MAX. 
Decrement to avoid
diff --git a/commit-graph.h b/commit-graph.h
index 9a06a5f188..ee20f5e280 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,7 @@
  #define COMMIT_GRAPH_H

  #include "git-compat-util.h"
+#include "string-list.h"

  char *get_commit_graph_filename(const char *obj_dir);

@@ -48,10 +49,8 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file);

  void write_commit_graph_reachable(const char *obj_dir, int append);
  void write_commit_graph(const char *obj_dir,
-                       const char **pack_indexes,
-                       int nr_packs,
-                       const char **commit_hex,
-                       int nr_commits,
+                       struct string_list *pack_indexes,
+                       struct string_list *commit_hex,
                         int append);

  int verify_commit_graph(struct commit_graph *g);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d20b17586f..b24e8b6689 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
  '

  graph_git_two_modes() {
-       git -c core.graph=true $1 >output
-       git -c core.graph=false $1 >expect
+       git -c core.commitGraph=true $1 >output
+       git -c core.commitGraph=false $1 >expect
         test_cmp output expect
  }

@@ -245,7 +245,7 @@ test_expect_success 'perform fast-forward merge in 
full repo' '
         test_cmp expect output
  '

-test_expect_success 'check that gc clears commit-graph' '
+test_expect_success 'check that gc computes commit-graph' '
         cd "$TRASH_DIRECTORY/full" &&
         git commit --allow-empty -m "blank" &&
         git commit-graph write --reachable &&
@@ -281,29 +281,29 @@ GRAPH_CHUNK_LOOKUP_OFFSET=8
  GRAPH_CHUNK_LOOKUP_WIDTH=12
  GRAPH_CHUNK_LOOKUP_ROWS=5
  GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
-GRAPH_BYTE_OID_LOOKUP_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
-                             1 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
-GRAPH_BYTE_COMMIT_DATA_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
-                               2 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
-GRAPH_FANOUT_OFFSET=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
-                         $GRAPH_CHUNK_LOOKUP_WIDTH \* 
$GRAPH_CHUNK_LOOKUP_ROWS`
-GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4`
-GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
-GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
-GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN 
\* 8`
-GRAPH_BYTE_OID_LOOKUP_MISSING=`expr $GRAPH_OID_LOOKUP_OFFSET + 
$HASH_LEN \* 4 + 10`
-GRAPH_COMMIT_DATA_OFFSET=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 
$NUM_COMMITS`
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+                           1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+                            2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+                      $GRAPH_CHUNK_LOOKUP_WIDTH \* 
$GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN 
\* 4 + 10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 
$NUM_COMMITS))
  GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
-GRAPH_BYTE_COMMIT_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN`
-GRAPH_BYTE_COMMIT_EXTRA_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + 
$HASH_LEN + 4`
-GRAPH_BYTE_COMMIT_WRONG_PARENT=`expr $GRAPH_COMMIT_DATA_OFFSET + 
$HASH_LEN + 3`
-GRAPH_BYTE_COMMIT_GENERATION=`expr $GRAPH_COMMIT_DATA_OFFSET + 
$HASH_LEN + 8`
-GRAPH_BYTE_COMMIT_DATE=`expr $GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12`
-GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16`
-GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \
-                               $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS`
-GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4`
-GRAPH_BYTE_FOOTER=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4 \* 
$NUM_OCTOPUS_EDGES`
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN 
+ 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN 
+ 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 
11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+                            $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES))

  # usage: corrupt_graph_and_verify <position> <data> <string>
  # Manipulates the commit-graph file at the position
@@ -320,7 +320,7 @@ corrupt_graph_and_verify() {
         printf "$data" | dd of="$objdir/info/commit-graph" bs=1 
seek="$pos" conv=notrunc &&
         test_must_fail git commit-graph verify 2>test_err &&
         grep -v "^+" test_err >err
-       grep "$grepstr" err
+       test_i18ngrep "$grepstr" err
  }

  test_expect_success 'detect bad signature' '
@@ -338,9 +338,9 @@ test_expect_success 'detect bad hash version' '
                 "hash version"
  '

-test_expect_success 'detect bad chunk count' '
+test_expect_success 'detect low chunk count' '
         corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
-               "missing the Commit Data chunk"
+               "missing the .* chunk"
  '

  test_expect_success 'detect missing OID fanout chunk' '
@@ -363,7 +363,7 @@ test_expect_success 'detect incorrect fanout' '
                 "fanout value"
  '

-test_expect_success 'detect incorrect fanout' '
+test_expect_success 'detect incorrect fanout final value' '
         corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
                 "fanout value"
  '
@@ -393,14 +393,19 @@ test_expect_success 'detect extra parent int-id' '
                 "is too long"
  '

-test_expect_success 'detect incorrect tree OID' '
+test_expect_success 'detect wrong parent' '
         corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
                 "commit-graph parent for"
  '

+test_expect_success 'detect incorrect generation number' '
+       corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
+               "generation for commit"
+'
+
  test_expect_success 'detect incorrect generation number' '
         corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
-               "generation"
+               "non-zero generation number"
  '

  test_expect_success 'detect incorrect commit date' '


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (21 preceding siblings ...)
  2018-06-04 17:02 ` [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
@ 2018-06-05 14:51 ` Ævar Arnfjörð Bjarmason
  2018-06-05 16:37   ` Derrick Stolee
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
  23 siblings, 1 reply; 122+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-05 14:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, stolee, jnareb, marten.agren, gitster

On Mon, Jun 4, 2018 at 6:52 PM, Derrick Stolee <dstolee@microsoft.com> wrote:
> Thanks for the feedback on v3. There were several small cleanups, but
> perhaps the biggest change is the addition of "commit-graph: use
> string-list API for input" which makes "commit-graph: add '--reachable'
> option" much simpler.

Do you have a version of this pushed somewhere? Your fsck/upstream
branch conflicts with e.g. long-since-merged nd/repack-keep-pack.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-05 14:51 ` Ævar Arnfjörð Bjarmason
@ 2018-06-05 16:37   ` Derrick Stolee
  0 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-05 16:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee
  Cc: git, jnareb, marten.agren, gitster

On 6/5/2018 10:51 AM, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Jun 4, 2018 at 6:52 PM, Derrick Stolee <dstolee@microsoft.com> wrote:
>> Thanks for the feedback on v3. There were several small cleanups, but
>> perhaps the biggest change is the addition of "commit-graph: use
>> string-list API for input" which makes "commit-graph: add '--reachable'
>> option" much simpler.
> Do you have a version of this pushed somewhere? Your fsck/upstream
> branch conflicts with e.g. long-since-merged nd/repack-keep-pack.

Sorry I forgot to push. It is now available on GitHub [1]. I based my 
commits on 'next' including the core.commitGraph fix for 
t5318-commit-graph.sh I already sent [2].

[1] https://github.com/derrickstolee/git/tree/fsck/upstream
     fsck/upstream branch matches this patch series

[2] 
https://public-inbox.org/git/20180604123906.136417-1-dstolee@microsoft.com/
     [PATCH] t5318-commit-graph.sh: use core.commitGraph

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v4 17/21] fsck: verify commit-graph
  2018-06-04 16:52 ` [PATCH v4 17/21] fsck: verify commit-graph Derrick Stolee
@ 2018-06-06 11:08   ` Ævar Arnfjörð Bjarmason
  2018-06-06 11:31     ` Derrick Stolee
  0 siblings, 1 reply; 122+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-06 11:08 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git\, stolee\, jnareb\, marten.agren\, gitster\


On Mon, Jun 04 2018, Derrick Stolee wrote:

> +		prepare_alt_odb();
> +		for (alt = alt_odb_list; alt; alt = alt->next) {
> +			verify_argv[2] = "--object-dir";
> +			verify_argv[3] = alt->path;
> +			if (run_command(&commit_graph_verify))
> +				errors_found |= ERROR_COMMIT_GRAPH;
> +		}
> +	}
> +

This doesn't compile under clang on master. It needs to account for
0b20903405 ("sha1_file: add repository argument to prepare_alt_odb",
2018-03-23).

    builtin/fsck.c:837:19: error: too few arguments to function call, single argument 'r' was not specified
                    prepare_alt_odb();
                    ~~~~~~~~~~~~~~~ ^

Ditto this error due to a missing resolution with 031dc927f4
("object-store: move alt_odb_list and alt_odb_tail to object store",
2018-03-23):

    builtin/fsck.c:838:14: error: use of undeclared identifier 'alt_odb_list'
                for (alt = alt_odb_list; alt; alt = alt->next) {

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v4 20/21] gc: automatically write commit-graph files
  2018-06-04 16:53 ` [PATCH v4 20/21] gc: automatically write commit-graph files Derrick Stolee
@ 2018-06-06 11:11   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 122+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-06 11:11 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git\, stolee\, jnareb\, marten.agren\, gitster\


On Mon, Jun 04 2018, Derrick Stolee wrote:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 11f027194e..d2eb3c8e9b 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -900,7 +900,8 @@ the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>
>  core.commitGraph::
>  	Enable git commit graph feature. Allows reading from the
> -	commit-graph file.
> +	commit-graph file. See `gc.commitGraph` for automatically
> +	maintaining the file.
>
>  core.sparseCheckout::
>  	Enable "sparse checkout" feature. See section "Sparse checkout" in
> @@ -1553,6 +1554,13 @@ gc.autoDetach::
>  	Make `git gc --auto` return immediately and run in background
>  	if the system supports it. Default is true.
>
> +gc.commitGraph::
> +	If true, then gc will rewrite the commit-graph file when
> +	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
> +	'--auto' the commit-graph will be updated if housekeeping is
> +	required. Default is false. See linkgit:git-commit-graph[1]
> +	for details.
> +
>  gc.logExpiry::
>  	If the file gc.log exists, then `git gc --auto` won't run
>  	unless that file is more than 'gc.logExpiry' old.  Default is

Re my last E-Mail in
https://public-inbox.org/git/87k1rcyxxj.fsf@evledraar.gmail.com/
88258f58-c28f-0f74-a378-003704c41117@gmail.com and your reply in
https://public-inbox.org/git/88258f58-c28f-0f74-a378-003704c41117@gmail.com/
it seems what you've submitted here is the old version. E.g. this hunk
fails to apply to master because of the conflict with
gc.bigPackThreshold, which was merged in 30b015bffe ("Merge branch
'nd/repack-keep-pack'", 2018-05-23) .

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v4 17/21] fsck: verify commit-graph
  2018-06-06 11:08   ` Ævar Arnfjörð Bjarmason
@ 2018-06-06 11:31     ` Derrick Stolee
  0 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee
  Cc: git, jnareb, marten.agren, gitster

On 6/6/2018 7:08 AM, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Jun 04 2018, Derrick Stolee wrote:
>
>> +		prepare_alt_odb();
>> +		for (alt = alt_odb_list; alt; alt = alt->next) {
>> +			verify_argv[2] = "--object-dir";
>> +			verify_argv[3] = alt->path;
>> +			if (run_command(&commit_graph_verify))
>> +				errors_found |= ERROR_COMMIT_GRAPH;
>> +		}
>> +	}
>> +
> This doesn't compile under clang on master. It needs to account for
> 0b20903405 ("sha1_file: add repository argument to prepare_alt_odb",
> 2018-03-23).
>
>      builtin/fsck.c:837:19: error: too few arguments to function call, single argument 'r' was not specified
>                      prepare_alt_odb();
>                      ~~~~~~~~~~~~~~~ ^
>
> Ditto this error due to a missing resolution with 031dc927f4
> ("object-store: move alt_odb_list and alt_odb_tail to object store",
> 2018-03-23):
>
>      builtin/fsck.c:838:14: error: use of undeclared identifier 'alt_odb_list'
>                  for (alt = alt_odb_list; alt; alt = alt->next) {

Thanks, Ævar. I forgot to rebase onto 'next'. Doing so now and will send 
v5 shortly.

-Stolee

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 00/21] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                   ` (22 preceding siblings ...)
  2018-06-05 14:51 ` Ævar Arnfjörð Bjarmason
@ 2018-06-06 11:36 ` " Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 01/21] commit-graph: UNLEAK before die() Derrick Stolee
                     ` (21 more replies)
  23 siblings, 22 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Thanks, Ævar, for pointing out that I forgot to rebase onto 'next'.

There were a few collisions with the object-store refactoring. Junio
even pointed them out, so I'm sorry to forget that. I also did a test
merge with 'pu' and it seems the only conflicts are with the earlier
version of this patch.

Thanks,
-Stolee

Derrick Stolee (21):
  commit-graph: UNLEAK before die()
  commit-graph: fix GRAPH_MIN_SIZE
  commit-graph: parse commit from chosen graph
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: add 'verify' subcommand
  commit-graph: verify catches corrupt signature
  commit-graph: verify required chunks are present
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify objects exist
  commit-graph: verify root tree OIDs
  commit-graph: verify parent list
  commit-graph: verify generation number
  commit-graph: verify commit date
  commit-graph: test for corrupted octopus edge
  commit-graph: verify contents match checksum
  fsck: verify commit-graph
  commit-graph: use string-list API for input
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/config.txt                 |  10 +-
 Documentation/git-commit-graph.txt       |  14 +-
 Documentation/git-fsck.txt               |   3 +
 Documentation/git-gc.txt                 |   4 +
 Documentation/technical/commit-graph.txt |  22 --
 builtin/commit-graph.c                   |  98 ++++++---
 builtin/fsck.c                           |  21 ++
 builtin/gc.c                             |   6 +
 commit-graph.c                           | 248 +++++++++++++++++++++--
 commit-graph.h                           |  10 +-
 commit.c                                 |   9 +-
 commit.h                                 |   1 +
 t/t5318-commit-graph.sh                  | 201 ++++++++++++++++++
 13 files changed, 569 insertions(+), 78 deletions(-)

-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 01/21] commit-graph: UNLEAK before die()
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
                     ` (20 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
 	graph_name = get_commit_graph_filename(opts.obj_dir);
 	graph = load_commit_graph_one(graph_name);
 
-	if (!graph)
+	if (!graph) {
+		UNLEAK(graph_name);
 		die("graph file %s does not exist", graph_name);
+	}
+
 	FREE_AND_NULL(graph_name);
 
 	printf("header: %08x %d %d %d %d\n",
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 02/21] commit-graph: fix GRAPH_MIN_SIZE
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 01/21] commit-graph: UNLEAK before die() Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 04/21] commit: force commit to parse from object database Derrick Stolee
                     ` (18 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The GRAPH_MIN_SIZE macro should be the smallest size of a parsable
commit-graph file. However, the minimum number of chunks was wrong.
It is possible to write a commit-graph file with zero commits, and
that violates this macro's value.

Rewrite the macro, and use extra macros to better explain the magic
constants.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b63a1fc85e..f83f6d2373 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -35,10 +35,11 @@
 
 #define GRAPH_LAST_EDGE 0x80000000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-			GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 03/21] commit-graph: parse commit from chosen graph
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 01/21] commit-graph: UNLEAK before die() Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
                     ` (19 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f83f6d2373..e77b19971d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -314,7 +314,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin
 	}
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
 {
 	uint32_t pos;
 
@@ -322,9 +322,21 @@ int parse_commit_in_graph(struct commit *item)
 		return 0;
 	if (item->object.parsed)
 		return 1;
+
+	if (find_commit_in_graph(item, g, &pos))
+		return fill_commit_in_graph(item, g, pos);
+
+	return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+
 	prepare_commit_graph();
-	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
-		return fill_commit_in_graph(item, commit_graph, pos);
+	if (commit_graph)
+		return parse_commit_in_graph_one(commit_graph, item);
 	return 0;
 }
 
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 04/21] commit: force commit to parse from object database
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (2 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
                     ` (17 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit.c | 9 +++++++--
 commit.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 298ad747c6..922bb68741 100644
--- a/commit.c
+++ b/commit.c
@@ -405,7 +405,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph)
 {
 	enum object_type type;
 	void *buffer;
@@ -416,7 +416,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	if (parse_commit_in_graph(item))
+	if (use_commit_graph && parse_commit_in_graph(item))
 		return 0;
 	buffer = read_object_file(&item->object.oid, &type, &size);
 	if (!buffer)
@@ -437,6 +437,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 	return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+	return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
 	if (parse_commit(item))
diff --git a/commit.h b/commit.h
index cb943013d0..4065fd12ac 100644
--- a/commit.h
+++ b/commit.h
@@ -77,6 +77,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 05/21] commit-graph: load a root tree from specific graph
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (3 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 04/21] commit: force commit to parse from object database Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
                     ` (16 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e77b19971d..9e228d3bb5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -362,14 +362,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *
 	return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+						 const struct commit *c)
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
 	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		BUG("get_commit_tree_in_graph called from non-commit-graph commit");
+		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
+
+	return load_tree_for_commit(g, (struct commit *)c);
+}
 
-	return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+	return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 06/21] commit-graph: add 'verify' subcommand
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (4 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
                     ` (15 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  6 +++++
 builtin/commit-graph.c             | 38 ++++++++++++++++++++++++++++++
 commit-graph.c                     | 23 ++++++++++++++++++
 commit-graph.h                     |  2 ++
 t/t5318-commit-graph.sh            | 10 ++++++++
 5 files changed, 79 insertions(+)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git commit-graph read' [--object-dir <dir>]
+'git commit-graph verify' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 --------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f0875b8bf3..3079cde6f9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,10 +8,16 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
+	N_("git commit-graph verify [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+	N_("git commit-graph verify [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	NULL
@@ -29,6 +35,36 @@ static struct opts_commit_graph {
 	int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_verify_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			   N_("dir"),
+			   N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_verify_options,
+			     builtin_commit_graph_verify_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+	FREE_AND_NULL(graph_name);
+
+	if (!graph)
+		return 0;
+
+	return verify_commit_graph(graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
@@ -165,6 +201,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);
+		if (!strcmp(argv[0], "verify"))
+			return graph_verify(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index 9e228d3bb5..432920ad2a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -827,3 +827,26 @@ void write_commit_graph(const char *obj_dir,
 	oids.alloc = 0;
 	oids.nr = 0;
 }
+
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+	va_list ap;
+	verify_commit_graph_error = 1;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	fprintf(stderr, "\n");
+	va_end(ap);
+}
+
+int verify_commit_graph(struct commit_graph *g)
+{
+	if (!g) {
+		graph_report("no commit-graph file loaded");
+		return 1;
+	}
+
+	return verify_commit_graph_error;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..71a39c5a57 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -53,4 +53,6 @@ void write_commit_graph(const char *obj_dir,
 			int nr_commits,
 			int append);
 
+int verify_commit_graph(struct commit_graph *g);
+
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 59d0be2877..0830ef9fdd 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -11,6 +11,11 @@ test_expect_success 'setup full repo' '
 	objdir=".git/objects"
 '
 
+test_expect_success 'verify graph with no graph file' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify
+'
+
 test_expect_success 'write graph with no packs' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write --object-dir . &&
@@ -230,4 +235,9 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'git commit-graph verify' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify >output
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 07/21] commit-graph: verify catches corrupt signature
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (5 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 08/21] commit-graph: verify required chunks are present Derrick Stolee
                     ` (14 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Most messages from 'git commit-graph verify' will not be marked
for translation. There will be one exception: the message that
reports an invalid checksum will be marked for translation, as that
is the only message that is intended for a typical user.

Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 43 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0830ef9fdd..c0c1ff09b9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
 	git commit-graph verify >output
 '
 
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+
+# usage: corrupt_graph_and_verify <position> <data> <string>
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+	pos=$1
+	data="${2:-\0}"
+	grepstr=$3
+	cd "$TRASH_DIRECTORY/full" &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
+	test_must_fail git commit-graph verify 2>test_err &&
+	grep -v "^+" test_err >err
+	test_i18ngrep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+	corrupt_graph_and_verify 0 "\0" \
+		"graph signature"
+'
+
+test_expect_success 'detect bad version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+		"hash version"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 08/21] commit-graph: verify required chunks are present
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (6 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 12:21     ` Ævar Arnfjörð Bjarmason
  2018-06-06 11:36   ` [PATCH v5 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
                     ` (13 subsequent siblings)
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          |  9 +++++++++
 t/t5318-commit-graph.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 432920ad2a..f41d5a0504 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -848,5 +848,14 @@ int verify_commit_graph(struct commit_graph *g)
 		return 1;
 	}
 
+	verify_commit_graph_error = 0;
+
+	if (!g->chunk_oid_fanout)
+		graph_report("commit-graph is missing the OID Fanout chunk");
+	if (!g->chunk_oid_lookup)
+		graph_report("commit-graph is missing the OID Lookup chunk");
+	if (!g->chunk_commit_data)
+		graph_report("commit-graph is missing the Commit Data chunk");
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0c1ff09b9..846396665e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
 
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			    1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			     2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' '
 		"hash version"
 '
 
+test_expect_success 'detect low chunk count' '
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+		"missing the .* chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+		"missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+		"missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+		"missing the Commit Data chunk"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 09/21] commit-graph: verify corrupt OID fanout and lookup
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (7 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 08/21] commit-graph: verify required chunks are present Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 10/21] commit-graph: verify objects exist Derrick Stolee
                     ` (12 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 36 ++++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 22 ++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index f41d5a0504..d7a5b50a6c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -843,6 +843,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct commit_graph *g)
 {
+	uint32_t i, cur_fanout_pos = 0;
+	struct object_id prev_oid, cur_oid;
+
 	if (!g) {
 		graph_report("no commit-graph file loaded");
 		return 1;
@@ -857,5 +860,38 @@ int verify_commit_graph(struct commit_graph *g)
 	if (!g->chunk_commit_data)
 		graph_report("commit-graph is missing the Commit Data chunk");
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
+			graph_report("commit-graph has incorrect OID order: %s then %s",
+				     oid_to_hex(&prev_oid),
+				     oid_to_hex(&cur_oid));
+
+		oidcpy(&prev_oid, &cur_oid);
+
+		while (cur_oid.hash[0] > cur_fanout_pos) {
+			uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+			if (i != fanout_value)
+				graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+					     cur_fanout_pos, fanout_value, i);
+
+			cur_fanout_pos++;
+		}
+	}
+
+	while (cur_fanout_pos < 256) {
+		uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+
+		if (g->num_commits != fanout_value)
+			graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+				     cur_fanout_pos, fanout_value, i);
+
+		cur_fanout_pos++;
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 846396665e..c29eae47c9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			    1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			     2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+		       $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
 		"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect fanout final value' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+		"incorrect OID order"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 10/21] commit-graph: verify objects exist
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (8 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 11/21] commit-graph: verify root tree OIDs Derrick Stolee
                     ` (11 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 17 +++++++++++++++++
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 24 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index d7a5b50a6c..893cc2f346 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -242,6 +242,7 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 {
 	struct commit *c;
 	struct object_id oid;
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -893,5 +894,21 @@ int verify_commit_graph(struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		struct commit *odb_commit;
+
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node());
+		if (parse_commit_internal(odb_commit, 0, 0)) {
+			graph_report("failed to parse %s from object database",
+				     oid_to_hex(&cur_oid));
+			continue;
+		}
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c29eae47c9..cf60e48496 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+NUM_COMMITS=9
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4))
 GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
 		"incorrect OID order"
 '
 
+test_expect_success 'detect OID not in object database' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+		"from object database"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 11/21] commit-graph: verify root tree OIDs
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (9 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 10/21] commit-graph: verify objects exist Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 13/21] commit-graph: verify generation number Derrick Stolee
                     ` (10 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The 'verify' subcommand must compare the commit content parsed from the
commit-graph against the content in the object database. Use
lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 17 ++++++++++++++++-
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 893cc2f346..d7e408a99d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -865,6 +865,8 @@ int verify_commit_graph(struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
+		struct commit *graph_commit;
+
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
@@ -882,6 +884,11 @@ int verify_commit_graph(struct commit_graph *g)
 
 			cur_fanout_pos++;
 		}
+
+		graph_commit = lookup_commit(&cur_oid);
+		if (!parse_commit_in_graph_one(g, graph_commit))
+			graph_report("failed to parse %s from commit-graph",
+				     oid_to_hex(&cur_oid));
 	}
 
 	while (cur_fanout_pos < 256) {
@@ -898,16 +905,24 @@ int verify_commit_graph(struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
-		struct commit *odb_commit;
+		struct commit *graph_commit, *odb_commit;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
+		graph_commit = lookup_commit(&cur_oid);
 		odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node());
 		if (parse_commit_internal(odb_commit, 0, 0)) {
 			graph_report("failed to parse %s from object database",
 				     oid_to_hex(&cur_oid));
 			continue;
 		}
+
+		if (oidcmp(&get_commit_tree_in_graph_one(g, graph_commit)->object.oid,
+			   get_commit_tree_oid(odb_commit)))
+			graph_report("root tree OID for commit %s in commit-graph is %s != %s",
+				     oid_to_hex(&cur_oid),
+				     oid_to_hex(get_commit_tree_oid(graph_commit)),
+				     oid_to_hex(get_commit_tree_oid(odb_commit)));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index cf60e48496..c0c1248eda 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* $NUM_COMMITS))
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
 		"from object database"
 '
 
+test_expect_success 'detect incorrect tree OID' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+		"root tree OID for commit"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 12/21] commit-graph: verify parent list
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (11 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 13/21] commit-graph: verify generation number Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 14/21] commit-graph: verify commit date Derrick Stolee
                     ` (8 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0xFFFFFFFF to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

Add a boundary check to insert_parent_or_die() for when the parent
position value is out of range.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 28 ++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 18 ++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index d7e408a99d..fcebd0925c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -243,6 +243,9 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 	struct commit *c;
 	struct object_id oid;
 
+	if (pos >= g->num_commits)
+		die("invalid parent position %"PRIu64, pos);
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -906,6 +909,7 @@ int verify_commit_graph(struct commit_graph *g)
 
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
+		struct commit_list *graph_parents, *odb_parents;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -923,6 +927,30 @@ int verify_commit_graph(struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     oid_to_hex(get_commit_tree_oid(graph_commit)),
 				     oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+		graph_parents = graph_commit->parents;
+		odb_parents = odb_commit->parents;
+
+		while (graph_parents) {
+			if (odb_parents == NULL) {
+				graph_report("commit-graph parent list for commit %s is too long",
+					     oid_to_hex(&cur_oid));
+				break;
+			}
+
+			if (oidcmp(&graph_parents->item->object.oid, &odb_parents->item->object.oid))
+				graph_report("commit-graph parent for %s is %s != %s",
+					     oid_to_hex(&cur_oid),
+					     oid_to_hex(&graph_parents->item->object.oid),
+					     oid_to_hex(&odb_parents->item->object.oid));
+
+			graph_parents = graph_parents->next;
+			odb_parents = odb_parents->next;
+		}
+
+		if (odb_parents != NULL)
+			graph_report("commit-graph parent list for commit %s terminates early",
+				     oid_to_hex(&cur_oid));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0c1248eda..ec0964112a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10))
 GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* $NUM_COMMITS))
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
 		"root tree OID for commit"
 '
 
+test_expect_success 'detect incorrect parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+		"invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+		"is too long"
+'
+
+test_expect_success 'detect wrong parent' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+		"commit-graph parent for"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 13/21] commit-graph: verify generation number
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (10 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 11/21] commit-graph: verify root tree OIDs Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 12/21] commit-graph: verify parent list Derrick Stolee
                     ` (9 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Since Git will ship with the commit-graph feature without generation
numbers, we need to accept commit-graphs with all generation numbers
equal to zero. In this case, ignore the generation number calculation.

However, verify that we should never have a mix of zero and non-zero
generation numbers. Create a test that sets one commit to generation
zero and all following commits report a failure as they have non-zero
generation in a file that contains generation number zero.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 34 ++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 11 +++++++++++
 2 files changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index fcebd0925c..b97fa05ec9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -845,10 +845,14 @@ static void graph_report(const char *fmt, ...)
 	va_end(ap);
 }
 
+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
 int verify_commit_graph(struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
 	struct object_id prev_oid, cur_oid;
+	int generation_zero = 0;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -910,6 +914,7 @@ int verify_commit_graph(struct commit_graph *g)
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
+		uint32_t max_generation = 0;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -944,6 +949,9 @@ int verify_commit_graph(struct commit_graph *g)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
+			if (graph_parents->item->generation > max_generation)
+				max_generation = graph_parents->item->generation;
+
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
 		}
@@ -951,6 +959,32 @@ int verify_commit_graph(struct commit_graph *g)
 		if (odb_parents != NULL)
 			graph_report("commit-graph parent list for commit %s terminates early",
 				     oid_to_hex(&cur_oid));
+
+		if (!graph_commit->generation) {
+			if (generation_zero == GENERATION_NUMBER_EXISTS)
+				graph_report("commit-graph has generation number zero for commit %s, but non-zero elsewhere",
+					     oid_to_hex(&cur_oid));
+			generation_zero = GENERATION_ZERO_EXISTS;
+		} else if (generation_zero == GENERATION_ZERO_EXISTS)
+			graph_report("commit-graph has non-zero generation number for commit %s, but zero elsewhere",
+				     oid_to_hex(&cur_oid));
+
+		if (generation_zero == GENERATION_ZERO_EXISTS)
+			continue;
+
+		/*
+		 * If one of our parents has generation GENERATION_NUMBER_MAX, then
+		 * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
+		 * extra logic in the following condition.
+		 */
+		if (max_generation == GENERATION_NUMBER_MAX)
+			max_generation--;
+
+		if (graph_commit->generation != max_generation + 1)
+			graph_report("commit-graph generation for commit %s is %u != %u",
+				     oid_to_hex(&cur_oid),
+				     graph_commit->generation,
+				     max_generation + 1);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ec0964112a..a6ea1341dc 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -366,4 +367,14 @@ test_expect_success 'detect wrong parent' '
 		"commit-graph parent for"
 '
 
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
+		"generation for commit"
+'
+
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
+		"non-zero generation number"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 14/21] commit-graph: verify commit date
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (12 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 12/21] commit-graph: verify parent list Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
                     ` (7 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 6 ++++++
 t/t5318-commit-graph.sh | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index b97fa05ec9..d83f0ce5d5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -985,6 +985,12 @@ int verify_commit_graph(struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     graph_commit->generation,
 				     max_generation + 1);
+
+		if (graph_commit->date != odb_commit->date)
+			graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime,
+				     oid_to_hex(&cur_oid),
+				     graph_commit->date,
+				     odb_commit->date);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a6ea1341dc..6a873bfda8 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -377,4 +378,9 @@ test_expect_success 'detect incorrect generation number' '
 		"non-zero generation number"
 '
 
+test_expect_success 'detect incorrect commit date' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+		"commit date"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 15/21] commit-graph: test for corrupted octopus edge
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (13 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 14/21] commit-graph: verify commit date Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 16/21] commit-graph: verify contents match checksum Derrick Stolee
                     ` (6 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6a873bfda8..cf67fb391a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' '
 '
 
 NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -274,6 +275,10 @@ GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+			     $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -383,4 +388,9 @@ test_expect_success 'detect incorrect commit date' '
 		"commit date"
 '
 
+test_expect_success 'detect incorrect parent for octopus merge' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+		"invalid parent"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 16/21] commit-graph: verify contents match checksum
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (14 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 17/21] fsck: verify commit-graph Derrick Stolee
                     ` (5 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 16 ++++++++++++++--
 t/t5318-commit-graph.sh |  6 ++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d83f0ce5d5..0f93d5d864 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -832,6 +832,7 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 }
 
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 static int verify_commit_graph_error;
 
 static void graph_report(const char *fmt, ...)
@@ -851,8 +852,10 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
-	struct object_id prev_oid, cur_oid;
+	struct object_id prev_oid, cur_oid, checksum;
 	int generation_zero = 0;
+	struct hashfile *f;
+	int devnull;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -871,6 +874,15 @@ int verify_commit_graph(struct commit_graph *g)
 	if (verify_commit_graph_error)
 		return verify_commit_graph_error;
 
+	devnull = open("/dev/null", O_WRONLY);
+	f = hashfd(devnull, NULL);
+	hashwrite(f, g->data, g->data_len - g->hash_len);
+	finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+	if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+		graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
+		verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+	}
+
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit;
 
@@ -908,7 +920,7 @@ int verify_commit_graph(struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
-	if (verify_commit_graph_error)
+	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index cf67fb391a..2297a44e7d 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
 GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 			     $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS))
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus merge' '
 		"invalid parent"
 '
 
+test_expect_success 'detect invalid checksum hash' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 18/21] commit-graph: use string-list API for input
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (16 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 17/21] fsck: verify commit-graph Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 12:11     ` Ævar Arnfjörð Bjarmason
  2018-06-06 11:36   ` [PATCH v5 19/21] commit-graph: add '--reachable' option Derrick Stolee
                     ` (3 subsequent siblings)
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 39 +++++++++++++--------------------------
 commit-graph.c         | 15 +++++++--------
 commit-graph.h         |  7 +++----
 3 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 3079cde6f9..d8eb8278b3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-	const char **pack_indexes = NULL;
-	int packs_nr = 0;
-	const char **commit_hex = NULL;
-	int commits_nr = 0;
-	const char **lines = NULL;
-	int lines_nr = 0;
-	int lines_alloc = 0;
+	struct string_list *pack_indexes = NULL;
+	struct string_list *commit_hex = NULL;
+	struct string_list lines;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)
 
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-		lines_nr = 0;
-		lines_alloc = 128;
-		ALLOC_ARRAY(lines, lines_alloc);
-
-		while (strbuf_getline(&buf, stdin) != EOF) {
-			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-			lines[lines_nr++] = strbuf_detach(&buf, NULL);
-		}
-
-		if (opts.stdin_packs) {
-			pack_indexes = lines;
-			packs_nr = lines_nr;
-		}
-		if (opts.stdin_commits) {
-			commit_hex = lines;
-			commits_nr = lines_nr;
-		}
+		string_list_init(&lines, 0);
+
+		while (strbuf_getline(&buf, stdin) != EOF)
+			string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+		if (opts.stdin_packs)
+			pack_indexes = &lines;
+		if (opts.stdin_commits)
+			commit_hex = &lines;
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr,
 			   commit_hex,
-			   commits_nr,
 			   opts.append);
 
+	string_list_clear(&lines, 0);
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 0f93d5d864..f23bf4cf50 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -656,10 +656,8 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 }
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append)
 {
 	struct packed_oid_list oids;
@@ -700,10 +698,10 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
-		for (i = 0; i < nr_packs; i++) {
+		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
-			strbuf_addstr(&packname, pack_indexes[i]);
+			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
 			if (!p)
 				die("error adding pack %s", packname.buf);
@@ -716,12 +714,13 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	if (commit_hex) {
-		for (i = 0; i < nr_commits; i++) {
+		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
-			if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+			if (commit_hex->items[i].string &&
+			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
 
 			result = lookup_commit_reference_gently(&oid, 1);
diff --git a/commit-graph.h b/commit-graph.h
index 71a39c5a57..66661e1fc5 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,7 @@
 #define COMMIT_GRAPH_H
 
 #include "git-compat-util.h"
+#include "string-list.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -47,10 +48,8 @@ struct commit_graph {
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append);
 
 int verify_commit_graph(struct commit_graph *g);
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 17/21] fsck: verify commit-graph
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (15 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 16/21] commit-graph: verify contents match checksum Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 18/21] commit-graph: use string-list API for input Derrick Stolee
                     ` (4 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c             | 21 +++++++++++++++++++++
 t/t5318-commit-graph.sh    |  8 ++++++++
 3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 ---------------------
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3ad4f160f9..9fb2edc69f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -18,6 +18,7 @@
 #include "decorate.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -47,6 +48,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	check_connectivity();
+
+	if (core_commit_graph) {
+		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+		const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
+		commit_graph_verify.argv = verify_argv;
+		commit_graph_verify.git_cmd = 1;
+
+		if (run_command(&commit_graph_verify))
+			errors_found |= ERROR_COMMIT_GRAPH;
+
+		prepare_alt_odb(the_repository);
+		for (alt =  the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+			verify_argv[2] = "--object-dir";
+			verify_argv[3] = alt->path;
+			if (run_command(&commit_graph_verify))
+				errors_found |= ERROR_COMMIT_GRAPH;
+		}
+	}
+
 	return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2297a44e7d..44d4c71f0b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -399,4 +399,12 @@ test_expect_success 'detect invalid checksum hash' '
 		"incorrect checksum"
 '
 
+test_expect_success 'git fsck (checks commit-graph)' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git fsck &&
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum" &&
+	test_must_fail git fsck
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 19/21] commit-graph: add '--reachable' option
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (17 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 18/21] commit-graph: use string-list API for input Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 20/21] gc: automatically write commit-graph files Derrick Stolee
                     ` (2 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  8 ++++++--
 builtin/commit-graph.c             | 16 ++++++++++++----
 commit-graph.c                     | 18 ++++++++++++++++++
 commit-graph.h                     |  1 +
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index a222cfab08..dececb79d7 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index d8eb8278b3..76423b3fa5 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -9,7 +9,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph verify [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -24,12 +24,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int reachable;
 	int stdin_packs;
 	int stdin_commits;
 	int append;
@@ -126,6 +127,8 @@ static int graph_write(int argc, const char **argv)
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "reachable", &opts.reachable,
+			N_("start walk at all refs")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -139,11 +142,16 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
-	if (opts.stdin_packs && opts.stdin_commits)
-		die(_("cannot use both --stdin-commits and --stdin-packs"));
+	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
+	if (opts.reachable) {
+		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		return 0;
+	}
+
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
 		string_list_init(&lines, 0);
diff --git a/commit-graph.c b/commit-graph.c
index f23bf4cf50..0d5adc8035 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -7,6 +7,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
@@ -655,6 +656,23 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 	}
 }
 
+static int add_ref_to_list(const char *refname,
+			   const struct object_id *oid,
+			   int flags, void *cb_data)
+{
+	struct string_list *list = (struct string_list*)cb_data;
+	string_list_append(list, oid_to_hex(oid));
+	return 0;
+}
+
+void write_commit_graph_reachable(const char *obj_dir, int append)
+{
+	struct string_list list;
+	string_list_init(&list, 1);
+	for_each_ref(add_ref_to_list, &list);
+	write_commit_graph(obj_dir, NULL, &list, append);
+}
+
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/commit-graph.h b/commit-graph.h
index 66661e1fc5..ee20f5e280 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -47,6 +47,7 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+void write_commit_graph_reachable(const char *obj_dir, int append);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 44d4c71f0b..ffb2ed7c95 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -205,6 +205,16 @@ test_expect_success 'build graph from commits with append' '
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph using --reachable' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --reachable &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 20/21] gc: automatically write commit-graph files
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (18 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 19/21] commit-graph: add '--reachable' option Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-06 11:36   ` [PATCH v5 21/21] commit-graph: update design document Derrick Stolee
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 10 +++++++++-
 Documentation/git-gc.txt |  4 ++++
 builtin/gc.c             |  6 ++++++
 t/t5318-commit-graph.sh  | 14 ++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..f2b5ed17c8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -906,7 +906,8 @@ the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
 core.commitGraph::
 	Enable git commit graph feature. Allows reading from the
-	commit-graph file.
+	commit-graph file. See `gc.commitGraph` for automatically
+	maintaining the file.
 
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
@@ -1647,6 +1648,13 @@ this configuration variable is ignored, all packs except the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 
+gc.commitGraph::
+	If true, then gc will rewrite the commit-graph file when
+	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+	'--auto' the commit-graph will be updated if housekeeping is
+	required. Default is false. See linkgit:git-commit-graph[1]
+	for details.
+
 gc.logExpiry::
 	If the file gc.log exists, then `git gc --auto` won't run
 	unless that file is more than 'gc.logExpiry' old.  Default is
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 24b2dd44fe..f5bc98ccb3 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..4e06e8372d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
 	git_config_get_int("gc.aggressivedepth", &aggressive_depth);
 	git_config_get_int("gc.auto", &gc_auto_threshold);
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
+	git_config_get_bool("gc.commitgraph", &gc_commit_graph);
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_get_expiry("gc.pruneexpire", &prune_expire);
 	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (pack_garbage.nr > 0)
 		clean_pack_garbage();
 
+	if (gc_commit_graph)
+		write_commit_graph_reachable(get_object_directory(), 0);
+
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ffb2ed7c95..b24e8b6689 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'check that gc computes commit-graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit --allow-empty -m "blank" &&
+	git commit-graph write --reachable &&
+	cp $objdir/info/commit-graph commit-graph-before-gc &&
+	git reset --hard HEAD~1 &&
+	git config gc.commitGraph true &&
+	git gc &&
+	cp $objdir/info/commit-graph commit-graph-after-gc &&
+	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+	git commit-graph write --reachable &&
+	test_cmp commit-graph-after-gc $objdir/info/commit-graph
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v5 21/21] commit-graph: update design document
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (19 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 20/21] gc: automatically write commit-graph files Derrick Stolee
@ 2018-06-06 11:36   ` Derrick Stolee
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 11:36 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb, avarab, marten.agren, gitster, Derrick Stolee

The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
     - 'log --topo-order'
     - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v5 18/21] commit-graph: use string-list API for input
  2018-06-06 11:36   ` [PATCH v5 18/21] commit-graph: use string-list API for input Derrick Stolee
@ 2018-06-06 12:11     ` Ævar Arnfjörð Bjarmason
  2018-06-06 12:15       ` Derrick Stolee
  0 siblings, 1 reply; 122+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-06 12:11 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git\, stolee\, jnareb\, marten.agren\, gitster\


On Wed, Jun 06 2018, Derrick Stolee wrote:

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit-graph.c | 39 +++++++++++++--------------------------
>  commit-graph.c         | 15 +++++++--------
>  commit-graph.h         |  7 +++----
>  3 files changed, 23 insertions(+), 38 deletions(-)

> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 3079cde6f9..d8eb8278b3 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)
>
>  static int graph_write(int argc, const char **argv)
>  {
> -	const char **pack_indexes = NULL;
> -	int packs_nr = 0;
> -	const char **commit_hex = NULL;
> -	int commits_nr = 0;
> -	const char **lines = NULL;
> -	int lines_nr = 0;
> -	int lines_alloc = 0;
> +	struct string_list *pack_indexes = NULL;
> +	struct string_list *commit_hex = NULL;
> +	struct string_list lines;
>
>  	static struct option builtin_commit_graph_write_options[] = {
>  		OPT_STRING(0, "object-dir", &opts.obj_dir,
> @@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)
>
>  	if (opts.stdin_packs || opts.stdin_commits) {
>  		struct strbuf buf = STRBUF_INIT;
> -		lines_nr = 0;
> -		lines_alloc = 128;
> -		ALLOC_ARRAY(lines, lines_alloc);
> -
> -		while (strbuf_getline(&buf, stdin) != EOF) {
> -			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
> -			lines[lines_nr++] = strbuf_detach(&buf, NULL);
> -		}
> -
> -		if (opts.stdin_packs) {
> -			pack_indexes = lines;
> -			packs_nr = lines_nr;
> -		}
> -		if (opts.stdin_commits) {
> -			commit_hex = lines;
> -			commits_nr = lines_nr;
> -		}
> +		string_list_init(&lines, 0);
> +
> +		while (strbuf_getline(&buf, stdin) != EOF)
> +			string_list_append(&lines, strbuf_detach(&buf, NULL));
> +
> +		if (opts.stdin_packs)
> +			pack_indexes = &lines;
> +		if (opts.stdin_commits)
> +			commit_hex = &lines;
>  	}
>
>  	write_commit_graph(opts.obj_dir,
>  			   pack_indexes,
> -			   packs_nr,
>  			   commit_hex,
> -			   commits_nr,
>  			   opts.append);
>
> +	string_list_clear(&lines, 0);
>  	return 0;
>  }

This results in an invalid free() & segfault because you're freeing
&lines which may not have been allocated by string_list_init().

Monkeypatch on top which I used to fix it:

    diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
    index 76423b3fa5..c7eb68aa3a 100644
    --- a/builtin/commit-graph.c
    +++ b/builtin/commit-graph.c
    @@ -122,6 +122,7 @@ static int graph_write(int argc, const char **argv)
            struct string_list *pack_indexes = NULL;
            struct string_list *commit_hex = NULL;
            struct string_list lines;
    +       int free_lines = 0;

            static struct option builtin_commit_graph_write_options[] = {
                    OPT_STRING(0, "object-dir", &opts.obj_dir,
    @@ -155,6 +156,7 @@ static int graph_write(int argc, const char **argv)
            if (opts.stdin_packs || opts.stdin_commits) {
                    struct strbuf buf = STRBUF_INIT;
                    string_list_init(&lines, 0);
    +               free_lines = 1;

                    while (strbuf_getline(&buf, stdin) != EOF)
                            string_list_append(&lines, strbuf_detach(&buf, NULL));
    @@ -170,7 +172,8 @@ static int graph_write(int argc, const char **argv)
                               commit_hex,
                               opts.append);

    -       string_list_clear(&lines, 0);
    +       if (free_lines)
    +               string_list_clear(&lines, 0);
            return 0;
     }

But probably having a pointer to the struct which is NULL etc. is
better.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v5 18/21] commit-graph: use string-list API for input
  2018-06-06 12:11     ` Ævar Arnfjörð Bjarmason
@ 2018-06-06 12:15       ` Derrick Stolee
  2018-06-06 12:26         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 12:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Derrick Stolee
  Cc: git, jnareb, marten.agren, gitster

On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Jun 06 2018, Derrick Stolee wrote:
>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   builtin/commit-graph.c | 39 +++++++++++++--------------------------
>>   commit-graph.c         | 15 +++++++--------
>>   commit-graph.h         |  7 +++----
>>   3 files changed, 23 insertions(+), 38 deletions(-)
>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>> index 3079cde6f9..d8eb8278b3 100644
>> --- a/builtin/commit-graph.c
>> +++ b/builtin/commit-graph.c
>> @@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)
>>
>>   static int graph_write(int argc, const char **argv)
>>   {
>> -	const char **pack_indexes = NULL;
>> -	int packs_nr = 0;
>> -	const char **commit_hex = NULL;
>> -	int commits_nr = 0;
>> -	const char **lines = NULL;
>> -	int lines_nr = 0;
>> -	int lines_alloc = 0;
>> +	struct string_list *pack_indexes = NULL;
>> +	struct string_list *commit_hex = NULL;
>> +	struct string_list lines;
>>
>>   	static struct option builtin_commit_graph_write_options[] = {
>>   		OPT_STRING(0, "object-dir", &opts.obj_dir,
>> @@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)
>>
>>   	if (opts.stdin_packs || opts.stdin_commits) {
>>   		struct strbuf buf = STRBUF_INIT;
>> -		lines_nr = 0;
>> -		lines_alloc = 128;
>> -		ALLOC_ARRAY(lines, lines_alloc);
>> -
>> -		while (strbuf_getline(&buf, stdin) != EOF) {
>> -			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
>> -			lines[lines_nr++] = strbuf_detach(&buf, NULL);
>> -		}
>> -
>> -		if (opts.stdin_packs) {
>> -			pack_indexes = lines;
>> -			packs_nr = lines_nr;
>> -		}
>> -		if (opts.stdin_commits) {
>> -			commit_hex = lines;
>> -			commits_nr = lines_nr;
>> -		}
>> +		string_list_init(&lines, 0);
>> +
>> +		while (strbuf_getline(&buf, stdin) != EOF)
>> +			string_list_append(&lines, strbuf_detach(&buf, NULL));
>> +
>> +		if (opts.stdin_packs)
>> +			pack_indexes = &lines;
>> +		if (opts.stdin_commits)
>> +			commit_hex = &lines;
>>   	}
>>
>>   	write_commit_graph(opts.obj_dir,
>>   			   pack_indexes,
>> -			   packs_nr,
>>   			   commit_hex,
>> -			   commits_nr,
>>   			   opts.append);
>>
>> +	string_list_clear(&lines, 0);
>>   	return 0;
>>   }
> This results in an invalid free() & segfault because you're freeing
> &lines which may not have been allocated by string_list_init().

Good point. Did my tests not catch this? (seems it requires calling `git 
commit-graph write` with no `--stdin-packs` or `--stdin-commits`).

>
> Monkeypatch on top which I used to fix it:
>
>      diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>      index 76423b3fa5..c7eb68aa3a 100644
>      --- a/builtin/commit-graph.c
>      +++ b/builtin/commit-graph.c
>      @@ -122,6 +122,7 @@ static int graph_write(int argc, const char **argv)
>              struct string_list *pack_indexes = NULL;
>              struct string_list *commit_hex = NULL;
>              struct string_list lines;
>      +       int free_lines = 0;
>
>              static struct option builtin_commit_graph_write_options[] = {
>                      OPT_STRING(0, "object-dir", &opts.obj_dir,
>      @@ -155,6 +156,7 @@ static int graph_write(int argc, const char **argv)
>              if (opts.stdin_packs || opts.stdin_commits) {
>                      struct strbuf buf = STRBUF_INIT;
>                      string_list_init(&lines, 0);
>      +               free_lines = 1;
>
>                      while (strbuf_getline(&buf, stdin) != EOF)
>                              string_list_append(&lines, strbuf_detach(&buf, NULL));
>      @@ -170,7 +172,8 @@ static int graph_write(int argc, const char **argv)
>                                 commit_hex,
>                                 opts.append);
>
>      -       string_list_clear(&lines, 0);
>      +       if (free_lines)
>      +               string_list_clear(&lines, 0);
>              return 0;
>       }
>
> But probably having a pointer to the struct which is NULL etc. is
> better.

Wouldn't the easiest fix be to call `string_list_init(&lines, 0)` 
outside of any conditional?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v5 08/21] commit-graph: verify required chunks are present
  2018-06-06 11:36   ` [PATCH v5 08/21] commit-graph: verify required chunks are present Derrick Stolee
@ 2018-06-06 12:21     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 122+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-06 12:21 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git\, stolee\, jnareb\, marten.agren\, gitster\


On Wed, Jun 06 2018, Derrick Stolee wrote:

> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index c0c1ff09b9..846396665e 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
>
>  GRAPH_BYTE_VERSION=4
>  GRAPH_BYTE_HASH=5
> +GRAPH_BYTE_CHUNK_COUNT=6
> +GRAPH_CHUNK_LOOKUP_OFFSET=8
> +GRAPH_CHUNK_LOOKUP_WIDTH=12
> +GRAPH_CHUNK_LOOKUP_ROWS=5
> +GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
> +GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
> +			    1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))

On GNU bash, version 4.1.2(2)-release (x86_64-redhat-linux-gnu) this
emits:

./t5318-commit-graph.sh: line 285: 8 +                      1 \* 12: syntax error: invalid arithmetic operator (error token is "\* 12")

The same goes for the rest of this "\*" within $((...)) in this file, it
should just be "*"..

> +GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
> +			     2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v5 18/21] commit-graph: use string-list API for input
  2018-06-06 12:15       ` Derrick Stolee
@ 2018-06-06 12:26         ` Ævar Arnfjörð Bjarmason
  2018-06-06 12:45           ` Derrick Stolee
  0 siblings, 1 reply; 122+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-06 12:26 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee, git\, jnareb\, marten.agren\, gitster\


On Wed, Jun 06 2018, Derrick Stolee wrote:

> On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Jun 06 2018, Derrick Stolee wrote:
>>
>>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>>> ---
>>>   builtin/commit-graph.c | 39 +++++++++++++--------------------------
>>>   commit-graph.c         | 15 +++++++--------
>>>   commit-graph.h         |  7 +++----
>>>   3 files changed, 23 insertions(+), 38 deletions(-)
>>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>> index 3079cde6f9..d8eb8278b3 100644
>>> --- a/builtin/commit-graph.c
>>> +++ b/builtin/commit-graph.c
>>> @@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)
>>>
>>>   static int graph_write(int argc, const char **argv)
>>>   {
>>> -	const char **pack_indexes = NULL;
>>> -	int packs_nr = 0;
>>> -	const char **commit_hex = NULL;
>>> -	int commits_nr = 0;
>>> -	const char **lines = NULL;
>>> -	int lines_nr = 0;
>>> -	int lines_alloc = 0;
>>> +	struct string_list *pack_indexes = NULL;
>>> +	struct string_list *commit_hex = NULL;
>>> +	struct string_list lines;
>>>
>>>   	static struct option builtin_commit_graph_write_options[] = {
>>>   		OPT_STRING(0, "object-dir", &opts.obj_dir,
>>> @@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)
>>>
>>>   	if (opts.stdin_packs || opts.stdin_commits) {
>>>   		struct strbuf buf = STRBUF_INIT;
>>> -		lines_nr = 0;
>>> -		lines_alloc = 128;
>>> -		ALLOC_ARRAY(lines, lines_alloc);
>>> -
>>> -		while (strbuf_getline(&buf, stdin) != EOF) {
>>> -			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
>>> -			lines[lines_nr++] = strbuf_detach(&buf, NULL);
>>> -		}
>>> -
>>> -		if (opts.stdin_packs) {
>>> -			pack_indexes = lines;
>>> -			packs_nr = lines_nr;
>>> -		}
>>> -		if (opts.stdin_commits) {
>>> -			commit_hex = lines;
>>> -			commits_nr = lines_nr;
>>> -		}
>>> +		string_list_init(&lines, 0);
>>> +
>>> +		while (strbuf_getline(&buf, stdin) != EOF)
>>> +			string_list_append(&lines, strbuf_detach(&buf, NULL));
>>> +
>>> +		if (opts.stdin_packs)
>>> +			pack_indexes = &lines;
>>> +		if (opts.stdin_commits)
>>> +			commit_hex = &lines;
>>>   	}
>>>
>>>   	write_commit_graph(opts.obj_dir,
>>>   			   pack_indexes,
>>> -			   packs_nr,
>>>   			   commit_hex,
>>> -			   commits_nr,
>>>   			   opts.append);
>>>
>>> +	string_list_clear(&lines, 0);
>>>   	return 0;
>>>   }
>> This results in an invalid free() & segfault because you're freeing
>> &lines which may not have been allocated by string_list_init().
>
> Good point. Did my tests not catch this? (seems it requires calling
> `git commit-graph write` with no `--stdin-packs` or
> `--stdin-commits`).

Most of your tests (t5318-commit-graph.sh) segfaulted, but presumably
you're on a more forgiving compiler/platform/options. I compiled with
-O0 -g on clang 4.0.1-8 + Debian testing.

>>
>> Monkeypatch on top which I used to fix it:
>>
>>      diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>      index 76423b3fa5..c7eb68aa3a 100644
>>      --- a/builtin/commit-graph.c
>>      +++ b/builtin/commit-graph.c
>>      @@ -122,6 +122,7 @@ static int graph_write(int argc, const char **argv)
>>              struct string_list *pack_indexes = NULL;
>>              struct string_list *commit_hex = NULL;
>>              struct string_list lines;
>>      +       int free_lines = 0;
>>
>>              static struct option builtin_commit_graph_write_options[] = {
>>                      OPT_STRING(0, "object-dir", &opts.obj_dir,
>>      @@ -155,6 +156,7 @@ static int graph_write(int argc, const char **argv)
>>              if (opts.stdin_packs || opts.stdin_commits) {
>>                      struct strbuf buf = STRBUF_INIT;
>>                      string_list_init(&lines, 0);
>>      +               free_lines = 1;
>>
>>                      while (strbuf_getline(&buf, stdin) != EOF)
>>                              string_list_append(&lines, strbuf_detach(&buf, NULL));
>>      @@ -170,7 +172,8 @@ static int graph_write(int argc, const char **argv)
>>                                 commit_hex,
>>                                 opts.append);
>>
>>      -       string_list_clear(&lines, 0);
>>      +       if (free_lines)
>>      +               string_list_clear(&lines, 0);
>>              return 0;
>>       }
>>
>> But probably having a pointer to the struct which is NULL etc. is
>> better.
>
> Wouldn't the easiest fix be to call `string_list_init(&lines, 0)`
> outside of any conditional?

Sure that works too. We'd be doing the init when we don't need it, but
it's not like this part is performance critical or anything...

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v5 18/21] commit-graph: use string-list API for input
  2018-06-06 12:26         ` Ævar Arnfjörð Bjarmason
@ 2018-06-06 12:45           ` Derrick Stolee
  2018-06-08 12:48             ` Derrick Stolee
  0 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-06 12:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, git, jnareb, marten.agren, gitster

On 6/6/2018 8:26 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Jun 06 2018, Derrick Stolee wrote:
>
>> On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
>>> On Wed, Jun 06 2018, Derrick Stolee wrote:
>>>
>>>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>>>> ---
>>>>    builtin/commit-graph.c | 39 +++++++++++++--------------------------
>>>>    commit-graph.c         | 15 +++++++--------
>>>>    commit-graph.h         |  7 +++----
>>>>    3 files changed, 23 insertions(+), 38 deletions(-)
>>>> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>>> index 3079cde6f9..d8eb8278b3 100644
>>>> --- a/builtin/commit-graph.c
>>>> +++ b/builtin/commit-graph.c
>>>> @@ -118,13 +118,9 @@ static int graph_read(int argc, const char **argv)
>>>>
>>>>    static int graph_write(int argc, const char **argv)
>>>>    {
>>>> -	const char **pack_indexes = NULL;
>>>> -	int packs_nr = 0;
>>>> -	const char **commit_hex = NULL;
>>>> -	int commits_nr = 0;
>>>> -	const char **lines = NULL;
>>>> -	int lines_nr = 0;
>>>> -	int lines_alloc = 0;
>>>> +	struct string_list *pack_indexes = NULL;
>>>> +	struct string_list *commit_hex = NULL;
>>>> +	struct string_list lines;
>>>>
>>>>    	static struct option builtin_commit_graph_write_options[] = {
>>>>    		OPT_STRING(0, "object-dir", &opts.obj_dir,
>>>> @@ -150,32 +146,23 @@ static int graph_write(int argc, const char **argv)
>>>>
>>>>    	if (opts.stdin_packs || opts.stdin_commits) {
>>>>    		struct strbuf buf = STRBUF_INIT;
>>>> -		lines_nr = 0;
>>>> -		lines_alloc = 128;
>>>> -		ALLOC_ARRAY(lines, lines_alloc);
>>>> -
>>>> -		while (strbuf_getline(&buf, stdin) != EOF) {
>>>> -			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
>>>> -			lines[lines_nr++] = strbuf_detach(&buf, NULL);
>>>> -		}
>>>> -
>>>> -		if (opts.stdin_packs) {
>>>> -			pack_indexes = lines;
>>>> -			packs_nr = lines_nr;
>>>> -		}
>>>> -		if (opts.stdin_commits) {
>>>> -			commit_hex = lines;
>>>> -			commits_nr = lines_nr;
>>>> -		}
>>>> +		string_list_init(&lines, 0);
>>>> +
>>>> +		while (strbuf_getline(&buf, stdin) != EOF)
>>>> +			string_list_append(&lines, strbuf_detach(&buf, NULL));
>>>> +
>>>> +		if (opts.stdin_packs)
>>>> +			pack_indexes = &lines;
>>>> +		if (opts.stdin_commits)
>>>> +			commit_hex = &lines;
>>>>    	}
>>>>
>>>>    	write_commit_graph(opts.obj_dir,
>>>>    			   pack_indexes,
>>>> -			   packs_nr,
>>>>    			   commit_hex,
>>>> -			   commits_nr,
>>>>    			   opts.append);
>>>>
>>>> +	string_list_clear(&lines, 0);
>>>>    	return 0;
>>>>    }
>>> This results in an invalid free() & segfault because you're freeing
>>> &lines which may not have been allocated by string_list_init().
>> Good point. Did my tests not catch this? (seems it requires calling
>> `git commit-graph write` with no `--stdin-packs` or
>> `--stdin-commits`).
> Most of your tests (t5318-commit-graph.sh) segfaulted, but presumably
> you're on a more forgiving compiler/platform/options. I compiled with
> -O0 -g on clang 4.0.1-8 + Debian testing.

I appreciate the extra platform testing. I'm using GCC on Ubuntu (gcc 
(Ubuntu 7.3.0-16ubuntu3) 7.3.0).

>
>>> Monkeypatch on top which I used to fix it:
>>>
>>>       diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
>>>       index 76423b3fa5..c7eb68aa3a 100644
>>>       --- a/builtin/commit-graph.c
>>>       +++ b/builtin/commit-graph.c
>>>       @@ -122,6 +122,7 @@ static int graph_write(int argc, const char **argv)
>>>               struct string_list *pack_indexes = NULL;
>>>               struct string_list *commit_hex = NULL;
>>>               struct string_list lines;
>>>       +       int free_lines = 0;
>>>
>>>               static struct option builtin_commit_graph_write_options[] = {
>>>                       OPT_STRING(0, "object-dir", &opts.obj_dir,
>>>       @@ -155,6 +156,7 @@ static int graph_write(int argc, const char **argv)
>>>               if (opts.stdin_packs || opts.stdin_commits) {
>>>                       struct strbuf buf = STRBUF_INIT;
>>>                       string_list_init(&lines, 0);
>>>       +               free_lines = 1;
>>>
>>>                       while (strbuf_getline(&buf, stdin) != EOF)
>>>                               string_list_append(&lines, strbuf_detach(&buf, NULL));
>>>       @@ -170,7 +172,8 @@ static int graph_write(int argc, const char **argv)
>>>                                  commit_hex,
>>>                                  opts.append);
>>>
>>>       -       string_list_clear(&lines, 0);
>>>       +       if (free_lines)
>>>       +               string_list_clear(&lines, 0);
>>>               return 0;
>>>        }
>>>
>>> But probably having a pointer to the struct which is NULL etc. is
>>> better.
>> Wouldn't the easiest fix be to call `string_list_init(&lines, 0)`
>> outside of any conditional?
> Sure that works too. We'd be doing the init when we don't need it, but
> it's not like this part is performance critical or anything...


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v5 18/21] commit-graph: use string-list API for input
  2018-06-06 12:45           ` Derrick Stolee
@ 2018-06-08 12:48             ` Derrick Stolee
  0 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 12:48 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, git, jnareb, marten.agren, gitster

On 6/6/2018 8:45 AM, Derrick Stolee wrote:
> On 6/6/2018 8:26 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Jun 06 2018, Derrick Stolee wrote:
>>
>>> On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote:
>>>> On Wed, Jun 06 2018, Derrick Stolee wrote:
>>>>
>>>>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>>>>>
>>>>> +    string_list_clear(&lines, 0);
>>>>>        return 0;
>>>>>    }
>>>> This results in an invalid free() & segfault because you're freeing
>>>> &lines which may not have been allocated by string_list_init().
>>> Good point. Did my tests not catch this? (seems it requires calling
>>> `git commit-graph write` with no `--stdin-packs` or
>>> `--stdin-commits`).
>> Most of your tests (t5318-commit-graph.sh) segfaulted, but presumably
>> you're on a more forgiving compiler/platform/options. I compiled with
>> -O0 -g on clang 4.0.1-8 + Debian testing.
>
> I appreciate the extra platform testing. I'm using GCC on Ubuntu (gcc 
> (Ubuntu 7.3.0-16ubuntu3) 7.3.0).

While the errors didn't repro on my box, they did on the VSTS Linux 
build machines [1]. I created an internal PR just so I could run those 
tests (and on our OSX agents which use clang) and so I can verify I 
fixed the situation. I'll send a v6 soon so Junio only picks up a 
version that succeeds in tests.

Thanks,
-Stolee

[1] 
https://github.com/Microsoft/vsts-agent-docker/blob/master/ubuntu/16.04/standard/Dockerfile
     What's installed on a VSTS Linux build agent

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
                     ` (20 preceding siblings ...)
  2018-06-06 11:36   ` [PATCH v5 21/21] commit-graph: update design document Derrick Stolee
@ 2018-06-08 13:56   ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 01/21] commit-graph: UNLEAK before die() Derrick Stolee
                       ` (22 more replies)
  21 siblings, 23 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

Sorry for the fast rerolls, but Ævar found some runtime issues with the
string-list use and '\*' as multiplication on some platforms. Thus, v4
and v5 are broken. I did get a repro of those issues using VSTS Linux
build servers and checked that they are fixed in this version.

This version depends on 'next' and sb/object-store-alloc. To accommodate
the repository arguments added by sb/object-store-alloc, the following
diff occurs from the previous patch:

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 76423b3fa5..c7d0db5ab4 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,6 +3,7 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"

 static char const * const builtin_commit_graph_usage[] = {
@@ -63,7 +64,7 @@ static int graph_verify(int argc, const char **argv)
        if (!graph)
                return 0;

-       return verify_commit_graph(graph);
+       return verify_commit_graph(the_repository, graph);
 }

 static int graph_read(int argc, const char **argv)
@@ -152,9 +153,9 @@ static int graph_write(int argc, const char **argv)
                return 0;
        }

+       string_list_init(&lines, 0);
        if (opts.stdin_packs || opts.stdin_commits) {
                struct strbuf buf = STRBUF_INIT;
-               string_list_init(&lines, 0);

                while (strbuf_getline(&buf, stdin) != EOF)
                        string_list_append(&lines, strbuf_detach(&buf, NULL));
diff --git a/commit-graph.c b/commit-graph.c
index 0d5adc8035..adf54e3fe7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -12,6 +12,7 @@
 #include "sha1-lookup.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "alloc.h"

 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -866,7 +867,7 @@ static void graph_report(const char *fmt, ...)
 #define GENERATION_ZERO_EXISTS 1
 #define GENERATION_NUMBER_EXISTS 2

-int verify_commit_graph(struct commit_graph *g)
+int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
        uint32_t i, cur_fanout_pos = 0;
        struct object_id prev_oid, cur_oid, checksum;
@@ -948,7 +949,7 @@ int verify_commit_graph(struct commit_graph *g)
                hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);

                graph_commit = lookup_commit(&cur_oid);
-               odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node());
+               odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));
                if (parse_commit_internal(odb_commit, 0, 0)) {
                        graph_report("failed to parse %s from object database",
                                     oid_to_hex(&cur_oid));
diff --git a/commit-graph.h b/commit-graph.h
index ee20f5e280..506cb45fb1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,7 @@
 #define COMMIT_GRAPH_H

 #include "git-compat-util.h"
+#include "repository.h"
 #include "string-list.h"

 char *get_commit_graph_filename(const char *obj_dir);
@@ -53,6 +54,6 @@ void write_commit_graph(const char *obj_dir,
                        struct string_list *commit_hex,
                        int append);

-int verify_commit_graph(struct commit_graph *g);
+int verify_commit_graph(struct repository *r, struct commit_graph *g);

 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b24e8b6689..9a0661983c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
 '

 graph_git_two_modes() {
-       git -c core.commitGraph=true $1 >output
-       git -c core.commitGraph=false $1 >expect
+       git -c core.graph=true $1 >output
+       git -c core.graph=false $1 >expect
        test_cmp output expect
 }

@@ -282,17 +282,17 @@ GRAPH_CHUNK_LOOKUP_WIDTH=12
 GRAPH_CHUNK_LOOKUP_ROWS=5
 GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
 GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
-                           1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+                           1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
-                            2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+                            2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
-                      $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS))
-GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4))
-GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
-GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
-GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
-GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 10))
-GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* $NUM_COMMITS))
+                      $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
@@ -301,9 +301,9 @@ GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
 GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
 GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
-                            $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS))
+                            $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
-GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))

 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position

---

Derrick Stolee (21):
  commit-graph: UNLEAK before die()
  commit-graph: fix GRAPH_MIN_SIZE
  commit-graph: parse commit from chosen graph
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: add 'verify' subcommand
  commit-graph: verify catches corrupt signature
  commit-graph: verify required chunks are present
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify objects exist
  commit-graph: verify root tree OIDs
  commit-graph: verify parent list
  commit-graph: verify generation number
  commit-graph: verify commit date
  commit-graph: test for corrupted octopus edge
  commit-graph: verify contents match checksum
  fsck: verify commit-graph
  commit-graph: use string-list API for input
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/config.txt                 |  10 +-
 Documentation/git-commit-graph.txt       |  14 +-
 Documentation/git-fsck.txt               |   3 +
 Documentation/git-gc.txt                 |   4 +
 Documentation/technical/commit-graph.txt |  22 --
 builtin/commit-graph.c                   |  99 ++++++---
 builtin/fsck.c                           |  21 ++
 builtin/gc.c                             |   6 +
 commit-graph.c                           | 249 +++++++++++++++++++++--
 commit-graph.h                           |  11 +-
 commit.c                                 |   9 +-
 commit.h                                 |   1 +
 t/t5318-commit-graph.sh                  | 201 ++++++++++++++++++
 13 files changed, 572 insertions(+), 78 deletions(-)

-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 01/21] commit-graph: UNLEAK before die()
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
                       ` (21 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
 	graph_name = get_commit_graph_filename(opts.obj_dir);
 	graph = load_commit_graph_one(graph_name);
 
-	if (!graph)
+	if (!graph) {
+		UNLEAK(graph_name);
 		die("graph file %s does not exist", graph_name);
+	}
+
 	FREE_AND_NULL(graph_name);
 
 	printf("header: %08x %d %d %d %d\n",
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 03/21] commit-graph: parse commit from chosen graph
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 01/21] commit-graph: UNLEAK before die() Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 04/21] commit: force commit to parse from object database Derrick Stolee
                       ` (19 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f83f6d2373..e77b19971d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -314,7 +314,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin
 	}
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
 {
 	uint32_t pos;
 
@@ -322,9 +322,21 @@ int parse_commit_in_graph(struct commit *item)
 		return 0;
 	if (item->object.parsed)
 		return 1;
+
+	if (find_commit_in_graph(item, g, &pos))
+		return fill_commit_in_graph(item, g, pos);
+
+	return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+
 	prepare_commit_graph();
-	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
-		return fill_commit_in_graph(item, commit_graph, pos);
+	if (commit_graph)
+		return parse_commit_in_graph_one(commit_graph, item);
 	return 0;
 }
 
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 02/21] commit-graph: fix GRAPH_MIN_SIZE
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 01/21] commit-graph: UNLEAK before die() Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
                       ` (20 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The GRAPH_MIN_SIZE macro should be the smallest size of a parsable
commit-graph file. However, the minimum number of chunks was wrong.
It is possible to write a commit-graph file with zero commits, and
that violates this macro's value.

Rewrite the macro, and use extra macros to better explain the magic
constants.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b63a1fc85e..f83f6d2373 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -35,10 +35,11 @@
 
 #define GRAPH_LAST_EDGE 0x80000000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-			GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 04/21] commit: force commit to parse from object database
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (2 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
                       ` (18 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit.c | 9 +++++++--
 commit.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index d53dc16d72..720c6acddf 100644
--- a/commit.c
+++ b/commit.c
@@ -418,7 +418,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph)
 {
 	enum object_type type;
 	void *buffer;
@@ -429,7 +429,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	if (parse_commit_in_graph(item))
+	if (use_commit_graph && parse_commit_in_graph(item))
 		return 0;
 	buffer = read_object_file(&item->object.oid, &type, &size);
 	if (!buffer)
@@ -450,6 +450,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 	return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+	return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
 	if (parse_commit(item))
diff --git a/commit.h b/commit.h
index 3ad07c2e3d..7e0f273720 100644
--- a/commit.h
+++ b/commit.h
@@ -77,6 +77,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 05/21] commit-graph: load a root tree from specific graph
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (3 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 04/21] commit: force commit to parse from object database Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
                       ` (17 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e77b19971d..9e228d3bb5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -362,14 +362,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *
 	return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+						 const struct commit *c)
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
 	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		BUG("get_commit_tree_in_graph called from non-commit-graph commit");
+		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
+
+	return load_tree_for_commit(g, (struct commit *)c);
+}
 
-	return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+	return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 06/21] commit-graph: add 'verify' subcommand
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (4 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
                       ` (16 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  6 +++++
 builtin/commit-graph.c             | 39 ++++++++++++++++++++++++++++++
 commit-graph.c                     | 23 ++++++++++++++++++
 commit-graph.h                     |  3 +++
 t/t5318-commit-graph.sh            | 10 ++++++++
 5 files changed, 81 insertions(+)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git commit-graph read' [--object-dir <dir>]
+'git commit-graph verify' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 --------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f0875b8bf3..9d108f43a9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,15 +3,22 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
+	N_("git commit-graph verify [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+	N_("git commit-graph verify [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	NULL
@@ -29,6 +36,36 @@ static struct opts_commit_graph {
 	int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_verify_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			   N_("dir"),
+			   N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_verify_options,
+			     builtin_commit_graph_verify_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+	FREE_AND_NULL(graph_name);
+
+	if (!graph)
+		return 0;
+
+	return verify_commit_graph(the_repository, graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
@@ -165,6 +202,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);
+		if (!strcmp(argv[0], "verify"))
+			return graph_verify(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index 9e228d3bb5..22ef696e18 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -827,3 +827,26 @@ void write_commit_graph(const char *obj_dir,
 	oids.alloc = 0;
 	oids.nr = 0;
 }
+
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+	va_list ap;
+	verify_commit_graph_error = 1;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	fprintf(stderr, "\n");
+	va_end(ap);
+}
+
+int verify_commit_graph(struct repository *r, struct commit_graph *g)
+{
+	if (!g) {
+		graph_report("no commit-graph file loaded");
+		return 1;
+	}
+
+	return verify_commit_graph_error;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..4359812fb4 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,7 @@
 #define COMMIT_GRAPH_H
 
 #include "git-compat-util.h"
+#include "repository.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -53,4 +54,6 @@ void write_commit_graph(const char *obj_dir,
 			int nr_commits,
 			int append);
 
+int verify_commit_graph(struct repository *r, struct commit_graph *g);
+
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 77d85aefe7..6ca451dfd2 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -11,6 +11,11 @@ test_expect_success 'setup full repo' '
 	objdir=".git/objects"
 '
 
+test_expect_success 'verify graph with no graph file' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify
+'
+
 test_expect_success 'write graph with no packs' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write --object-dir . &&
@@ -230,4 +235,9 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'git commit-graph verify' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify >output
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 07/21] commit-graph: verify catches corrupt signature
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (5 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 08/21] commit-graph: verify required chunks are present Derrick Stolee
                       ` (15 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Most messages from 'git commit-graph verify' will not be marked
for translation. There will be one exception: the message that
reports an invalid checksum will be marked for translation, as that
is the only message that is intended for a typical user.

Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 43 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6ca451dfd2..8f96e2636c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
 	git commit-graph verify >output
 '
 
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+
+# usage: corrupt_graph_and_verify <position> <data> <string>
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+	pos=$1
+	data="${2:-\0}"
+	grepstr=$3
+	cd "$TRASH_DIRECTORY/full" &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
+	test_must_fail git commit-graph verify 2>test_err &&
+	grep -v "^+" test_err >err
+	test_i18ngrep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+	corrupt_graph_and_verify 0 "\0" \
+		"graph signature"
+'
+
+test_expect_success 'detect bad version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+		"hash version"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 08/21] commit-graph: verify required chunks are present
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (6 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
                       ` (14 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          |  9 +++++++++
 t/t5318-commit-graph.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 22ef696e18..f30b4ccee9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -848,5 +848,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		return 1;
 	}
 
+	verify_commit_graph_error = 0;
+
+	if (!g->chunk_oid_fanout)
+		graph_report("commit-graph is missing the OID Fanout chunk");
+	if (!g->chunk_oid_lookup)
+		graph_report("commit-graph is missing the OID Lookup chunk");
+	if (!g->chunk_commit_data)
+		graph_report("commit-graph is missing the Commit Data chunk");
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 8f96e2636c..c03792a8ed 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
 
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			    1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			     2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' '
 		"hash version"
 '
 
+test_expect_success 'detect low chunk count' '
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+		"missing the .* chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+		"missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+		"missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+		"missing the Commit Data chunk"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 09/21] commit-graph: verify corrupt OID fanout and lookup
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (7 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 08/21] commit-graph: verify required chunks are present Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 10/21] commit-graph: verify objects exist Derrick Stolee
                       ` (13 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 36 ++++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 22 ++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index f30b4ccee9..866a9e7e41 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -843,6 +843,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
+	uint32_t i, cur_fanout_pos = 0;
+	struct object_id prev_oid, cur_oid;
+
 	if (!g) {
 		graph_report("no commit-graph file loaded");
 		return 1;
@@ -857,5 +860,38 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (!g->chunk_commit_data)
 		graph_report("commit-graph is missing the Commit Data chunk");
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
+			graph_report("commit-graph has incorrect OID order: %s then %s",
+				     oid_to_hex(&prev_oid),
+				     oid_to_hex(&cur_oid));
+
+		oidcpy(&prev_oid, &cur_oid);
+
+		while (cur_oid.hash[0] > cur_fanout_pos) {
+			uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+			if (i != fanout_value)
+				graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+					     cur_fanout_pos, fanout_value, i);
+
+			cur_fanout_pos++;
+		}
+	}
+
+	while (cur_fanout_pos < 256) {
+		uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+
+		if (g->num_commits != fanout_value)
+			graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+				     cur_fanout_pos, fanout_value, i);
+
+		cur_fanout_pos++;
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c03792a8ed..4809cc881f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			    1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			     2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+		       $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
 		"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect fanout final value' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+		"incorrect OID order"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 10/21] commit-graph: verify objects exist
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (8 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 12/21] commit-graph: verify parent list Derrick Stolee
                       ` (12 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 18 ++++++++++++++++++
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 25 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 866a9e7e41..00e89b71e9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -11,6 +11,7 @@
 #include "sha1-lookup.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "alloc.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -242,6 +243,7 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 {
 	struct commit *c;
 	struct object_id oid;
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -893,5 +895,21 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		struct commit *odb_commit;
+
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));
+		if (parse_commit_internal(odb_commit, 0, 0)) {
+			graph_report("failed to parse %s from object database",
+				     oid_to_hex(&cur_oid));
+			continue;
+		}
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4809cc881f..af5e34c0cb 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+NUM_COMMITS=9
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
 GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
 		"incorrect OID order"
 '
 
+test_expect_success 'detect OID not in object database' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+		"from object database"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 11/21] commit-graph: verify root tree OIDs
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (10 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 12/21] commit-graph: verify parent list Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 13/21] commit-graph: verify generation number Derrick Stolee
                       ` (10 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The 'verify' subcommand must compare the commit content parsed from the
commit-graph against the content in the object database. Use
lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 17 ++++++++++++++++-
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 00e89b71e9..5df18394f9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -866,6 +866,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
+		struct commit *graph_commit;
+
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
@@ -883,6 +885,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 
 			cur_fanout_pos++;
 		}
+
+		graph_commit = lookup_commit(&cur_oid);
+		if (!parse_commit_in_graph_one(g, graph_commit))
+			graph_report("failed to parse %s from commit-graph",
+				     oid_to_hex(&cur_oid));
 	}
 
 	while (cur_fanout_pos < 256) {
@@ -899,16 +906,24 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
-		struct commit *odb_commit;
+		struct commit *graph_commit, *odb_commit;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
+		graph_commit = lookup_commit(&cur_oid);
 		odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));
 		if (parse_commit_internal(odb_commit, 0, 0)) {
 			graph_report("failed to parse %s from object database",
 				     oid_to_hex(&cur_oid));
 			continue;
 		}
+
+		if (oidcmp(&get_commit_tree_in_graph_one(g, graph_commit)->object.oid,
+			   get_commit_tree_oid(odb_commit)))
+			graph_report("root tree OID for commit %s in commit-graph is %s != %s",
+				     oid_to_hex(&cur_oid),
+				     oid_to_hex(get_commit_tree_oid(graph_commit)),
+				     oid_to_hex(get_commit_tree_oid(odb_commit)));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index af5e34c0cb..2b9214bc83 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
 		"from object database"
 '
 
+test_expect_success 'detect incorrect tree OID' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+		"root tree OID for commit"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 12/21] commit-graph: verify parent list
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (9 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 10/21] commit-graph: verify objects exist Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-12 16:55       ` Junio C Hamano
  2018-06-08 13:56     ` [PATCH v6 11/21] commit-graph: verify root tree OIDs Derrick Stolee
                       ` (11 subsequent siblings)
  22 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0xFFFFFFFF to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

Add a boundary check to insert_parent_or_die() for when the parent
position value is out of range.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 28 ++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 18 ++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 5df18394f9..6d8d774eb0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -244,6 +244,9 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 	struct commit *c;
 	struct object_id oid;
 
+	if (pos >= g->num_commits)
+		die("invalid parent position %"PRIu64, pos);
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -907,6 +910,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
+		struct commit_list *graph_parents, *odb_parents;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -924,6 +928,30 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     oid_to_hex(get_commit_tree_oid(graph_commit)),
 				     oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+		graph_parents = graph_commit->parents;
+		odb_parents = odb_commit->parents;
+
+		while (graph_parents) {
+			if (odb_parents == NULL) {
+				graph_report("commit-graph parent list for commit %s is too long",
+					     oid_to_hex(&cur_oid));
+				break;
+			}
+
+			if (oidcmp(&graph_parents->item->object.oid, &odb_parents->item->object.oid))
+				graph_report("commit-graph parent for %s is %s != %s",
+					     oid_to_hex(&cur_oid),
+					     oid_to_hex(&graph_parents->item->object.oid),
+					     oid_to_hex(&odb_parents->item->object.oid));
+
+			graph_parents = graph_parents->next;
+			odb_parents = odb_parents->next;
+		}
+
+		if (odb_parents != NULL)
+			graph_report("commit-graph parent list for commit %s terminates early",
+				     oid_to_hex(&cur_oid));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2b9214bc83..9a3481c30f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
 GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
 		"root tree OID for commit"
 '
 
+test_expect_success 'detect incorrect parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+		"invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+		"is too long"
+'
+
+test_expect_success 'detect wrong parent' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+		"commit-graph parent for"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 13/21] commit-graph: verify generation number
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (11 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 11/21] commit-graph: verify root tree OIDs Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 14/21] commit-graph: verify commit date Derrick Stolee
                       ` (9 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Since Git will ship with the commit-graph feature without generation
numbers, we need to accept commit-graphs with all generation numbers
equal to zero. In this case, ignore the generation number calculation.

However, verify that we should never have a mix of zero and non-zero
generation numbers. Create a test that sets one commit to generation
zero and all following commits report a failure as they have non-zero
generation in a file that contains generation number zero.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 34 ++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 11 +++++++++++
 2 files changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 6d8d774eb0..e0f71658da 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -846,10 +846,14 @@ static void graph_report(const char *fmt, ...)
 	va_end(ap);
 }
 
+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
 	struct object_id prev_oid, cur_oid;
+	int generation_zero = 0;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -911,6 +915,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
+		uint32_t max_generation = 0;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -945,6 +950,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
+			if (graph_parents->item->generation > max_generation)
+				max_generation = graph_parents->item->generation;
+
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
 		}
@@ -952,6 +960,32 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		if (odb_parents != NULL)
 			graph_report("commit-graph parent list for commit %s terminates early",
 				     oid_to_hex(&cur_oid));
+
+		if (!graph_commit->generation) {
+			if (generation_zero == GENERATION_NUMBER_EXISTS)
+				graph_report("commit-graph has generation number zero for commit %s, but non-zero elsewhere",
+					     oid_to_hex(&cur_oid));
+			generation_zero = GENERATION_ZERO_EXISTS;
+		} else if (generation_zero == GENERATION_ZERO_EXISTS)
+			graph_report("commit-graph has non-zero generation number for commit %s, but zero elsewhere",
+				     oid_to_hex(&cur_oid));
+
+		if (generation_zero == GENERATION_ZERO_EXISTS)
+			continue;
+
+		/*
+		 * If one of our parents has generation GENERATION_NUMBER_MAX, then
+		 * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
+		 * extra logic in the following condition.
+		 */
+		if (max_generation == GENERATION_NUMBER_MAX)
+			max_generation--;
+
+		if (graph_commit->generation != max_generation + 1)
+			graph_report("commit-graph generation for commit %s is %u != %u",
+				     oid_to_hex(&cur_oid),
+				     graph_commit->generation,
+				     max_generation + 1);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 9a3481c30f..5b75c4dca2 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -366,4 +367,14 @@ test_expect_success 'detect wrong parent' '
 		"commit-graph parent for"
 '
 
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
+		"generation for commit"
+'
+
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
+		"non-zero generation number"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 14/21] commit-graph: verify commit date
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (12 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 13/21] commit-graph: verify generation number Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
                       ` (8 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 6 ++++++
 t/t5318-commit-graph.sh | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index e0f71658da..6d6c6beff9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -986,6 +986,12 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     graph_commit->generation,
 				     max_generation + 1);
+
+		if (graph_commit->date != odb_commit->date)
+			graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime,
+				     oid_to_hex(&cur_oid),
+				     graph_commit->date,
+				     odb_commit->date);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5b75c4dca2..b7b4410e75 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -377,4 +378,9 @@ test_expect_success 'detect incorrect generation number' '
 		"non-zero generation number"
 '
 
+test_expect_success 'detect incorrect commit date' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+		"commit date"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 15/21] commit-graph: test for corrupted octopus edge
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (13 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 14/21] commit-graph: verify commit date Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 16/21] commit-graph: verify contents match checksum Derrick Stolee
                       ` (7 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b7b4410e75..cbd6462226 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' '
 '
 
 NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -274,6 +275,10 @@ GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+			     $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -383,4 +388,9 @@ test_expect_success 'detect incorrect commit date' '
 		"commit date"
 '
 
+test_expect_success 'detect incorrect parent for octopus merge' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+		"invalid parent"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 16/21] commit-graph: verify contents match checksum
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (14 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 17/21] fsck: verify commit-graph Derrick Stolee
                       ` (6 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 16 ++++++++++++++--
 t/t5318-commit-graph.sh |  6 ++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6d6c6beff9..d926c4b59f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -833,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 }
 
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 static int verify_commit_graph_error;
 
 static void graph_report(const char *fmt, ...)
@@ -852,8 +853,10 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
-	struct object_id prev_oid, cur_oid;
+	struct object_id prev_oid, cur_oid, checksum;
 	int generation_zero = 0;
+	struct hashfile *f;
+	int devnull;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -872,6 +875,15 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error)
 		return verify_commit_graph_error;
 
+	devnull = open("/dev/null", O_WRONLY);
+	f = hashfd(devnull, NULL);
+	hashwrite(f, g->data, g->data_len - g->hash_len);
+	finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+	if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+		graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
+		verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+	}
+
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit;
 
@@ -909,7 +921,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
-	if (verify_commit_graph_error)
+	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index cbd6462226..bd5b8428af 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
 GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 			     $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus merge' '
 		"invalid parent"
 '
 
+test_expect_success 'detect invalid checksum hash' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum"
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 17/21] fsck: verify commit-graph
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (15 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 16/21] commit-graph: verify contents match checksum Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 19/21] commit-graph: add '--reachable' option Derrick Stolee
                       ` (5 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c             | 21 +++++++++++++++++++++
 t/t5318-commit-graph.sh    |  8 ++++++++
 3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 ---------------------
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3ad4f160f9..9fb2edc69f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -18,6 +18,7 @@
 #include "decorate.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -47,6 +48,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	check_connectivity();
+
+	if (core_commit_graph) {
+		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+		const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
+		commit_graph_verify.argv = verify_argv;
+		commit_graph_verify.git_cmd = 1;
+
+		if (run_command(&commit_graph_verify))
+			errors_found |= ERROR_COMMIT_GRAPH;
+
+		prepare_alt_odb(the_repository);
+		for (alt =  the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+			verify_argv[2] = "--object-dir";
+			verify_argv[3] = alt->path;
+			if (run_command(&commit_graph_verify))
+				errors_found |= ERROR_COMMIT_GRAPH;
+		}
+	}
+
 	return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index bd5b8428af..0da5a51552 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -399,4 +399,12 @@ test_expect_success 'detect invalid checksum hash' '
 		"incorrect checksum"
 '
 
+test_expect_success 'git fsck (checks commit-graph)' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git fsck &&
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum" &&
+	test_must_fail git fsck
+'
+
 test_done
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 19/21] commit-graph: add '--reachable' option
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (16 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 17/21] fsck: verify commit-graph Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 18/21] commit-graph: use string-list API for input Derrick Stolee
                       ` (4 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  8 ++++++--
 builtin/commit-graph.c             | 16 ++++++++++++----
 commit-graph.c                     | 18 ++++++++++++++++++
 commit-graph.h                     |  1 +
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index a222cfab08..dececb79d7 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index ea28bc311a..c7d0db5ab4 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph verify [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -25,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int reachable;
 	int stdin_packs;
 	int stdin_commits;
 	int append;
@@ -127,6 +128,8 @@ static int graph_write(int argc, const char **argv)
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "reachable", &opts.reachable,
+			N_("start walk at all refs")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -140,11 +143,16 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
-	if (opts.stdin_packs && opts.stdin_commits)
-		die(_("cannot use both --stdin-commits and --stdin-packs"));
+	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
+	if (opts.reachable) {
+		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		return 0;
+	}
+
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
diff --git a/commit-graph.c b/commit-graph.c
index a06e7e9549..adf54e3fe7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -7,6 +7,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
@@ -656,6 +657,23 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 	}
 }
 
+static int add_ref_to_list(const char *refname,
+			   const struct object_id *oid,
+			   int flags, void *cb_data)
+{
+	struct string_list *list = (struct string_list*)cb_data;
+	string_list_append(list, oid_to_hex(oid));
+	return 0;
+}
+
+void write_commit_graph_reachable(const char *obj_dir, int append)
+{
+	struct string_list list;
+	string_list_init(&list, 1);
+	for_each_ref(add_ref_to_list, &list);
+	write_commit_graph(obj_dir, NULL, &list, append);
+}
+
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/commit-graph.h b/commit-graph.h
index a79b854470..506cb45fb1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -48,6 +48,7 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+void write_commit_graph_reachable(const char *obj_dir, int append);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0da5a51552..06ecbb3f4a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -205,6 +205,16 @@ test_expect_success 'build graph from commits with append' '
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph using --reachable' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --reachable &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 18/21] commit-graph: use string-list API for input
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (17 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 19/21] commit-graph: add '--reachable' option Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 13:56     ` [PATCH v6 20/21] gc: automatically write commit-graph files Derrick Stolee
                       ` (3 subsequent siblings)
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 39 +++++++++++++--------------------------
 commit-graph.c         | 15 +++++++--------
 commit-graph.h         |  7 +++----
 3 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 9d108f43a9..ea28bc311a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -119,13 +119,9 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-	const char **pack_indexes = NULL;
-	int packs_nr = 0;
-	const char **commit_hex = NULL;
-	int commits_nr = 0;
-	const char **lines = NULL;
-	int lines_nr = 0;
-	int lines_alloc = 0;
+	struct string_list *pack_indexes = NULL;
+	struct string_list *commit_hex = NULL;
+	struct string_list lines;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -149,34 +145,25 @@ static int graph_write(int argc, const char **argv)
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
+	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-		lines_nr = 0;
-		lines_alloc = 128;
-		ALLOC_ARRAY(lines, lines_alloc);
-
-		while (strbuf_getline(&buf, stdin) != EOF) {
-			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-			lines[lines_nr++] = strbuf_detach(&buf, NULL);
-		}
-
-		if (opts.stdin_packs) {
-			pack_indexes = lines;
-			packs_nr = lines_nr;
-		}
-		if (opts.stdin_commits) {
-			commit_hex = lines;
-			commits_nr = lines_nr;
-		}
+
+		while (strbuf_getline(&buf, stdin) != EOF)
+			string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+		if (opts.stdin_packs)
+			pack_indexes = &lines;
+		if (opts.stdin_commits)
+			commit_hex = &lines;
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr,
 			   commit_hex,
-			   commits_nr,
 			   opts.append);
 
+	string_list_clear(&lines, 0);
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index d926c4b59f..a06e7e9549 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -657,10 +657,8 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 }
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append)
 {
 	struct packed_oid_list oids;
@@ -701,10 +699,10 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
-		for (i = 0; i < nr_packs; i++) {
+		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
-			strbuf_addstr(&packname, pack_indexes[i]);
+			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
 			if (!p)
 				die("error adding pack %s", packname.buf);
@@ -717,12 +715,13 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	if (commit_hex) {
-		for (i = 0; i < nr_commits; i++) {
+		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
-			if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+			if (commit_hex->items[i].string &&
+			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
 
 			result = lookup_commit_reference_gently(&oid, 1);
diff --git a/commit-graph.h b/commit-graph.h
index 4359812fb4..a79b854470 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -3,6 +3,7 @@
 
 #include "git-compat-util.h"
 #include "repository.h"
+#include "string-list.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -48,10 +49,8 @@ struct commit_graph {
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 20/21] gc: automatically write commit-graph files
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (18 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 18/21] commit-graph: use string-list API for input Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 22:24       ` SZEDER Gábor
  2018-06-08 13:56     ` [PATCH v6 21/21] commit-graph: update design document Derrick Stolee
                       ` (2 subsequent siblings)
  22 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 10 +++++++++-
 Documentation/git-gc.txt |  4 ++++
 builtin/gc.c             |  6 ++++++
 t/t5318-commit-graph.sh  | 14 ++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..f2b5ed17c8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -906,7 +906,8 @@ the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
 core.commitGraph::
 	Enable git commit graph feature. Allows reading from the
-	commit-graph file.
+	commit-graph file. See `gc.commitGraph` for automatically
+	maintaining the file.
 
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
@@ -1647,6 +1648,13 @@ this configuration variable is ignored, all packs except the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 
+gc.commitGraph::
+	If true, then gc will rewrite the commit-graph file when
+	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+	'--auto' the commit-graph will be updated if housekeeping is
+	required. Default is false. See linkgit:git-commit-graph[1]
+	for details.
+
 gc.logExpiry::
 	If the file gc.log exists, then `git gc --auto` won't run
 	unless that file is more than 'gc.logExpiry' old.  Default is
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 24b2dd44fe..f5bc98ccb3 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..4e06e8372d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
 	git_config_get_int("gc.aggressivedepth", &aggressive_depth);
 	git_config_get_int("gc.auto", &gc_auto_threshold);
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
+	git_config_get_bool("gc.commitgraph", &gc_commit_graph);
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_get_expiry("gc.pruneexpire", &prune_expire);
 	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (pack_garbage.nr > 0)
 		clean_pack_garbage();
 
+	if (gc_commit_graph)
+		write_commit_graph_reachable(get_object_directory(), 0);
+
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 06ecbb3f4a..9a0661983c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'check that gc computes commit-graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit --allow-empty -m "blank" &&
+	git commit-graph write --reachable &&
+	cp $objdir/info/commit-graph commit-graph-before-gc &&
+	git reset --hard HEAD~1 &&
+	git config gc.commitGraph true &&
+	git gc &&
+	cp $objdir/info/commit-graph commit-graph-after-gc &&
+	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+	git commit-graph write --reachable &&
+	test_cmp commit-graph-after-gc $objdir/info/commit-graph
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v6 21/21] commit-graph: update design document
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (19 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 20/21] gc: automatically write commit-graph files Derrick Stolee
@ 2018-06-08 13:56     ` Derrick Stolee
  2018-06-08 15:05     ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Jakub Narębski
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
  22 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 13:56 UTC (permalink / raw)
  To: git; +Cc: avarab, sbeller, jnareb, marten.agren, gitster, Derrick Stolee

The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
     - 'log --topo-order'
     - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.18.0.rc1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (20 preceding siblings ...)
  2018-06-08 13:56     ` [PATCH v6 21/21] commit-graph: update design document Derrick Stolee
@ 2018-06-08 15:05     ` Jakub Narębski
  2018-06-08 15:11       ` Derrick Stolee
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
  22 siblings, 1 reply; 122+ messages in thread
From: Jakub Narębski @ 2018-06-08 15:05 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Ævar Arnfjörð Bjarmason, Stefan Beller,
	Martin Ågren, Junio C Hamano

On Fri, 8 Jun 2018 at 15:56, Derrick Stolee <dstolee@microsoft.com> wrote:

> [..], the following
> diff occurs from the previous patch:
[...]
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index b24e8b6689..9a0661983c 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
>  '
>
>  graph_git_two_modes() {
> -       git -c core.commitGraph=true $1 >output
> -       git -c core.commitGraph=false $1 >expect
> +       git -c core.graph=true $1 >output
> +       git -c core.graph=false $1 >expect
>         test_cmp output expect
>  }

It seems that you have accidentally removed the fix from previous version.
It needs to be core.commitGraph, not core.graph.

Best,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-08 15:05     ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Jakub Narębski
@ 2018-06-08 15:11       ` Derrick Stolee
  0 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-08 15:11 UTC (permalink / raw)
  To: Jakub Narębski, Derrick Stolee
  Cc: git, Ævar Arnfjörð Bjarmason, Stefan Beller,
	Martin Ågren, Junio C Hamano

On 6/8/2018 11:05 AM, Jakub Narębski wrote:
> On Fri, 8 Jun 2018 at 15:56, Derrick Stolee <dstolee@microsoft.com> wrote:
>
>> [..], the following
>> diff occurs from the previous patch:
> [...]
>> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
>> index b24e8b6689..9a0661983c 100755
>> --- a/t/t5318-commit-graph.sh
>> +++ b/t/t5318-commit-graph.sh
>> @@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
>>   '
>>
>>   graph_git_two_modes() {
>> -       git -c core.commitGraph=true $1 >output
>> -       git -c core.commitGraph=false $1 >expect
>> +       git -c core.graph=true $1 >output
>> +       git -c core.graph=false $1 >expect
>>          test_cmp output expect
>>   }
> It seems that you have accidentally removed the fix from previous version.
> It needs to be core.commitGraph, not core.graph.
>
>

I didn't rebase the fix that I sent as a separate patch [1] (we want 
that change applied to 'master' while this one targets topics in 'next' 
and 'pu'). So this specific diff is unfortunate noise.

Thanks!
-Stolee

[1] 
https://public-inbox.org/git/20180604123906.136417-1-dstolee@microsoft.com/
     [PATCH] t5318-commit-graph.sh: use core.commitGraph

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v6 20/21] gc: automatically write commit-graph files
  2018-06-08 13:56     ` [PATCH v6 20/21] gc: automatically write commit-graph files Derrick Stolee
@ 2018-06-08 22:24       ` SZEDER Gábor
  2018-06-25 14:48         ` Derrick Stolee
  0 siblings, 1 reply; 122+ messages in thread
From: SZEDER Gábor @ 2018-06-08 22:24 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, git, avarab, sbeller, jnareb, marten.agren, gitster


> The commit-graph file is a very helpful feature for speeding up git
> operations. In order to make it more useful, make it possible to
> write the commit-graph file during standard garbage collection
> operations.
> 
> Add a 'gc.commitGraph' config setting that triggers writing a
> commit-graph file after any non-trivial 'git gc' command. Defaults to
> false while the commit-graph feature matures. We specifically do not
> want to have this on by default until the commit-graph feature is fully
> integrated with history-modifying features like shallow clones.

So I played around with an earlier version of this patch series a
while ago... and as I looked into my gitconfig today I was surprised
to have both 'core.commitGraph' and 'gc.commitGraph' config variables
set.  When I looked into it I came across this email from Ævar:

  https://public-inbox.org/git/87fu3peni2.fsf@evledraar.gmail.com/

  > Other than the question if 'gc.commitGraph' and 'core.commitGraph'
  > should be independent config variables, and the exact wording of the
  > git-gc docs, it looks good to me.

  Sans doc errors you pointed out in other places (you need to set
  core.commitGraph so it's read at all), I think it's very useful to have
  these split up. It's simliar to pack.useBitmaps & pack.writeBitmaps.

I think the comparison with pack bitmaps makes a lot of sense and I
have to say that I really like those 'useBitmaps' and 'writeBitmaps'
variable names, because it's clear right away which one is which,
without consulting the documentation.  I think having 'useCommitGraph'
and 'writeCommitGraph' variables would be a lot better than the same
variable name in two different sections, and I'm sure that then I
wouldn't have been caught off guard.

Yeah, I know, my timing sucks, with 'core.commitGraph' already out
there in the -rc releases...  sorry.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v6 12/21] commit-graph: verify parent list
  2018-06-08 13:56     ` [PATCH v6 12/21] commit-graph: verify parent list Derrick Stolee
@ 2018-06-12 16:55       ` Junio C Hamano
  0 siblings, 0 replies; 122+ messages in thread
From: Junio C Hamano @ 2018-06-12 16:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git\, avarab\, sbeller\, jnareb\, marten.agren\

Derrick Stolee <dstolee@microsoft.com> writes:

> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 2b9214bc83..9a3481c30f 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
>  GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
>  GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
>  GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
> +GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
> +GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
> +GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))

There seems to depend on having GRAPH_COMMIT_DATA_OFFSET in this
file, but I do not think 'next' or sb/object-store-alloc has one,
and I do not think I saw an earlier patch in this series to add such
a line.

It appears that many messages in this series are sent in
quoted-printable when they do not have to, which made me say fuzzy
things like "I do not think I saw" in the previous paragraph,
instead of being able to be definitive with a simple "grep" result.
Please try to see if you can find a way to tell your MUA not to do
unnecessary mail munging, if you can do so easily.

The worst one was 07/21, which came in base64 and contained a patch
to this file in MS-DOS line ending convention, which made it
impossible to apply.

>  
>  # usage: corrupt_graph_and_verify <position> <data> <string>
>  # Manipulates the commit-graph file at the position
> @@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
>  		"root tree OID for commit"
>  '
>  
> +test_expect_success 'detect incorrect parent int-id' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
> +		"invalid parent"
> +'
> +
> +test_expect_success 'detect extra parent int-id' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
> +		"is too long"
> +'
> +
> +test_expect_success 'detect wrong parent' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
> +		"commit-graph parent for"
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v6 20/21] gc: automatically write commit-graph files
  2018-06-08 22:24       ` SZEDER Gábor
@ 2018-06-25 14:48         ` Derrick Stolee
  0 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-25 14:48 UTC (permalink / raw)
  To: SZEDER Gábor, Derrick Stolee
  Cc: git, avarab, sbeller, jnareb, marten.agren, gitster

On 6/8/2018 6:24 PM, SZEDER Gábor wrote:
>> The commit-graph file is a very helpful feature for speeding up git
>> operations. In order to make it more useful, make it possible to
>> write the commit-graph file during standard garbage collection
>> operations.
>>
>> Add a 'gc.commitGraph' config setting that triggers writing a
>> commit-graph file after any non-trivial 'git gc' command. Defaults to
>> false while the commit-graph feature matures. We specifically do not
>> want to have this on by default until the commit-graph feature is fully
>> integrated with history-modifying features like shallow clones.
> So I played around with an earlier version of this patch series a
> while ago... and as I looked into my gitconfig today I was surprised
> to have both 'core.commitGraph' and 'gc.commitGraph' config variables
> set.  When I looked into it I came across this email from Ævar:
>
>    https://public-inbox.org/git/87fu3peni2.fsf@evledraar.gmail.com/
>
>    > Other than the question if 'gc.commitGraph' and 'core.commitGraph'
>    > should be independent config variables, and the exact wording of the
>    > git-gc docs, it looks good to me.
>
>    Sans doc errors you pointed out in other places (you need to set
>    core.commitGraph so it's read at all), I think it's very useful to have
>    these split up. It's simliar to pack.useBitmaps & pack.writeBitmaps.
>
> I think the comparison with pack bitmaps makes a lot of sense and I
> have to say that I really like those 'useBitmaps' and 'writeBitmaps'
> variable names, because it's clear right away which one is which,
> without consulting the documentation.  I think having 'useCommitGraph'
> and 'writeCommitGraph' variables would be a lot better than the same
> variable name in two different sections, and I'm sure that then I
> wouldn't have been caught off guard.

Sorry for the late reply. Maybe 'gc.writeCommitGraph' makes it clear 
that the setting enables writing the commit-graph during  a 'git gc'. Is 
that more clear?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 00/22] Integrate commit-graph into 'fsck' and 'gc'
  2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
                       ` (21 preceding siblings ...)
  2018-06-08 15:05     ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Jakub Narębski
@ 2018-06-27 13:24     ` " Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph Derrick Stolee
                         ` (21 more replies)
  22 siblings, 22 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

This v7 has very little material difference from the previous version.
The only changes are:

* Rename 'gc.commitGraph' to 'gc.writeCommitGraph'

* Add 't5318-commit-graph.sh: use core.commitGraph' patch that was
  dropped.

* Rebase onto latest master, which appears to fix any issues merging
  into 'next' and 'pu'.

I'm sending this from my gmail account to hopefully avoid any message
munging that happened in v6.

You can see my commits on GitHub [1].

[1] https://github.com/gitgitgadget/git/pull/6

Thanks,
-Stolee

Derrick Stolee (22):
  t5318-commit-graph.sh: use core.commitGraph
  commit-graph: UNLEAK before die()
  commit-graph: fix GRAPH_MIN_SIZE
  commit-graph: parse commit from chosen graph
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: add 'verify' subcommand
  commit-graph: verify catches corrupt signature
  commit-graph: verify required chunks are present
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify objects exist
  commit-graph: verify root tree OIDs
  commit-graph: verify parent list
  commit-graph: verify generation number
  commit-graph: verify commit date
  commit-graph: test for corrupted octopus edge
  commit-graph: verify contents match checksum
  fsck: verify commit-graph
  commit-graph: use string-list API for input
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/config.txt                 |   9 +-
 Documentation/git-commit-graph.txt       |  14 +-
 Documentation/git-fsck.txt               |   3 +
 Documentation/git-gc.txt                 |   4 +
 Documentation/technical/commit-graph.txt |  22 --
 builtin/commit-graph.c                   |  99 ++++++---
 builtin/fsck.c                           |  21 ++
 builtin/gc.c                             |   6 +
 commit-graph.c                           | 249 +++++++++++++++++++++--
 commit-graph.h                           |  11 +-
 commit.c                                 |  10 +-
 commit.h                                 |   1 +
 t/t5318-commit-graph.sh                  | 205 ++++++++++++++++++-
 13 files changed, 572 insertions(+), 82 deletions(-)


base-commit: ed843436dd4924c10669820cc73daf50f0b4dabd
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 17:38         ` Junio C Hamano
  2018-06-27 13:24       ` [PATCH v7 02/22] commit-graph: UNLEAK before die() Derrick Stolee
                         ` (20 subsequent siblings)
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The commit-graph tests should be checking that normal Git operations
succeed and have matching output with and without the commit-graph
feature enabled. However, the test was toggling 'core.graph' instead
of the correct 'core.commitGraph' variable.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---

Junio,

I sent this patch as a one-off a while ago, but it seems it was dropped.
I'm adding it back here so we don't forget it.

-Stolee

 t/t5318-commit-graph.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 77d85aefe7..59d0be2877 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -28,8 +28,8 @@ test_expect_success 'create commits and repack' '
 '
 
 graph_git_two_modes() {
-	git -c core.graph=true $1 >output
-	git -c core.graph=false $1 >expect
+	git -c core.commitGraph=true $1 >output
+	git -c core.commitGraph=false $1 >expect
 	test_cmp output expect
 }
 
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 02/22] commit-graph: UNLEAK before die()
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 03/22] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
                         ` (19 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
 	graph_name = get_commit_graph_filename(opts.obj_dir);
 	graph = load_commit_graph_one(graph_name);
 
-	if (!graph)
+	if (!graph) {
+		UNLEAK(graph_name);
 		die("graph file %s does not exist", graph_name);
+	}
+
 	FREE_AND_NULL(graph_name);
 
 	printf("header: %08x %d %d %d %d\n",
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 03/22] commit-graph: fix GRAPH_MIN_SIZE
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 02/22] commit-graph: UNLEAK before die() Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 04/22] commit-graph: parse commit from chosen graph Derrick Stolee
                         ` (18 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The GRAPH_MIN_SIZE macro should be the smallest size of a parsable
commit-graph file. However, the minimum number of chunks was wrong.
It is possible to write a commit-graph file with zero commits, and
that violates this macro's value.

Rewrite the macro, and use extra macros to better explain the magic
constants.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b63a1fc85e..f83f6d2373 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -35,10 +35,11 @@
 
 #define GRAPH_LAST_EDGE 0x80000000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-			GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+			+ GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 04/22] commit-graph: parse commit from chosen graph
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (2 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 03/22] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 05/22] commit: force commit to parse from object database Derrick Stolee
                         ` (17 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f83f6d2373..e77b19971d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -314,7 +314,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin
 	}
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
 {
 	uint32_t pos;
 
@@ -322,9 +322,21 @@ int parse_commit_in_graph(struct commit *item)
 		return 0;
 	if (item->object.parsed)
 		return 1;
+
+	if (find_commit_in_graph(item, g, &pos))
+		return fill_commit_in_graph(item, g, pos);
+
+	return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+
 	prepare_commit_graph();
-	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
-		return fill_commit_in_graph(item, commit_graph, pos);
+	if (commit_graph)
+		return parse_commit_in_graph_one(commit_graph, item);
 	return 0;
 }
 
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 05/22] commit: force commit to parse from object database
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (3 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 04/22] commit-graph: parse commit from chosen graph Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 06/22] commit-graph: load a root tree from specific graph Derrick Stolee
                         ` (16 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit.c | 10 ++++++++--
 commit.h |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 0c3b75aeff..598cf21cee 100644
--- a/commit.c
+++ b/commit.c
@@ -418,7 +418,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph)
 {
 	enum object_type type;
 	void *buffer;
@@ -429,7 +429,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	if (parse_commit_in_graph(item))
+	if (use_commit_graph && parse_commit_in_graph(item))
 		return 0;
 	buffer = read_object_file(&item->object.oid, &type, &size);
 	if (!buffer)
@@ -441,6 +441,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return error("Object %s not a commit",
 			     oid_to_hex(&item->object.oid));
 	}
+
 	ret = parse_commit_buffer(item, buffer, size, 0);
 	if (save_commit_buffer && !ret) {
 		set_commit_buffer(item, buffer, size);
@@ -450,6 +451,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 	return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+	return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
 	if (parse_commit(item))
diff --git a/commit.h b/commit.h
index 3ad07c2e3d..7e0f273720 100644
--- a/commit.h
+++ b/commit.h
@@ -77,6 +77,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 06/22] commit-graph: load a root tree from specific graph
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (4 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 05/22] commit: force commit to parse from object database Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-07-11  9:38         ` SZEDER Gábor
  2018-06-27 13:24       ` [PATCH v7 07/22] commit-graph: add 'verify' subcommand Derrick Stolee
                         ` (15 subsequent siblings)
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e77b19971d..9e228d3bb5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -362,14 +362,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *
 	return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+						 const struct commit *c)
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
 	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		BUG("get_commit_tree_in_graph called from non-commit-graph commit");
+		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
+
+	return load_tree_for_commit(g, (struct commit *)c);
+}
 
-	return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+	return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 07/22] commit-graph: add 'verify' subcommand
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (5 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 06/22] commit-graph: load a root tree from specific graph Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 21:59         ` Jonathan Tan
  2018-06-27 13:24       ` [PATCH v7 08/22] commit-graph: verify catches corrupt signature Derrick Stolee
                         ` (14 subsequent siblings)
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  6 +++++
 builtin/commit-graph.c             | 39 ++++++++++++++++++++++++++++++
 commit-graph.c                     | 23 ++++++++++++++++++
 commit-graph.h                     |  3 +++
 t/t5318-commit-graph.sh            | 10 ++++++++
 5 files changed, 81 insertions(+)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git commit-graph read' [--object-dir <dir>]
+'git commit-graph verify' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 --------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f0875b8bf3..9d108f43a9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,15 +3,22 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
+	N_("git commit-graph verify [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+	N_("git commit-graph verify [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	NULL
@@ -29,6 +36,36 @@ static struct opts_commit_graph {
 	int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_verify_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			   N_("dir"),
+			   N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_verify_options,
+			     builtin_commit_graph_verify_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+	FREE_AND_NULL(graph_name);
+
+	if (!graph)
+		return 0;
+
+	return verify_commit_graph(the_repository, graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
@@ -165,6 +202,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 	if (argc > 0) {
 		if (!strcmp(argv[0], "read"))
 			return graph_read(argc, argv);
+		if (!strcmp(argv[0], "verify"))
+			return graph_verify(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index 9e228d3bb5..22ef696e18 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -827,3 +827,26 @@ void write_commit_graph(const char *obj_dir,
 	oids.alloc = 0;
 	oids.nr = 0;
 }
+
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+	va_list ap;
+	verify_commit_graph_error = 1;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	fprintf(stderr, "\n");
+	va_end(ap);
+}
+
+int verify_commit_graph(struct repository *r, struct commit_graph *g)
+{
+	if (!g) {
+		graph_report("no commit-graph file loaded");
+		return 1;
+	}
+
+	return verify_commit_graph_error;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..4359812fb4 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,7 @@
 #define COMMIT_GRAPH_H
 
 #include "git-compat-util.h"
+#include "repository.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -53,4 +54,6 @@ void write_commit_graph(const char *obj_dir,
 			int nr_commits,
 			int append);
 
+int verify_commit_graph(struct repository *r, struct commit_graph *g);
+
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 59d0be2877..0830ef9fdd 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -11,6 +11,11 @@ test_expect_success 'setup full repo' '
 	objdir=".git/objects"
 '
 
+test_expect_success 'verify graph with no graph file' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify
+'
+
 test_expect_success 'write graph with no packs' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write --object-dir . &&
@@ -230,4 +235,9 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'git commit-graph verify' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph verify >output
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 08/22] commit-graph: verify catches corrupt signature
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (6 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 07/22] commit-graph: add 'verify' subcommand Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 09/22] commit-graph: verify required chunks are present Derrick Stolee
                         ` (13 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Most messages from 'git commit-graph verify' will not be marked
for translation. There will be one exception: the message that
reports an invalid checksum will be marked for translation, as that
is the only message that is intended for a typical user.

Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 43 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0830ef9fdd..c0c1ff09b9 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
 test_expect_success 'git commit-graph verify' '
 	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse commits/8 | git commit-graph write --stdin-commits &&
 	git commit-graph verify >output
 '
 
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+
+# usage: corrupt_graph_and_verify <position> <data> <string>
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+	pos=$1
+	data="${2:-\0}"
+	grepstr=$3
+	cd "$TRASH_DIRECTORY/full" &&
+	test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+	cp $objdir/info/commit-graph commit-graph-backup &&
+	printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
+	test_must_fail git commit-graph verify 2>test_err &&
+	grep -v "^+" test_err >err
+	test_i18ngrep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+	corrupt_graph_and_verify 0 "\0" \
+		"graph signature"
+'
+
+test_expect_success 'detect bad version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+		"graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+	corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+		"hash version"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 09/22] commit-graph: verify required chunks are present
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (7 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 08/22] commit-graph: verify catches corrupt signature Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 10/22] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
                         ` (12 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          |  9 +++++++++
 t/t5318-commit-graph.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 22ef696e18..f30b4ccee9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -848,5 +848,14 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		return 1;
 	}
 
+	verify_commit_graph_error = 0;
+
+	if (!g->chunk_oid_fanout)
+		graph_report("commit-graph is missing the OID Fanout chunk");
+	if (!g->chunk_oid_lookup)
+		graph_report("commit-graph is missing the OID Lookup chunk");
+	if (!g->chunk_commit_data)
+		graph_report("commit-graph is missing the Commit Data chunk");
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0c1ff09b9..dc16849ddd 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
 
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			    1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+			     2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' '
 		"hash version"
 '
 
+test_expect_success 'detect low chunk count' '
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+		"missing the .* chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+		"missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+		"missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+		"missing the Commit Data chunk"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 10/22] commit-graph: verify corrupt OID fanout and lookup
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (8 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 09/22] commit-graph: verify required chunks are present Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 11/22] commit-graph: verify objects exist Derrick Stolee
                         ` (11 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 36 ++++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 22 ++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index f30b4ccee9..866a9e7e41 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -843,6 +843,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
+	uint32_t i, cur_fanout_pos = 0;
+	struct object_id prev_oid, cur_oid;
+
 	if (!g) {
 		graph_report("no commit-graph file loaded");
 		return 1;
@@ -857,5 +860,38 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (!g->chunk_commit_data)
 		graph_report("commit-graph is missing the Commit Data chunk");
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
+			graph_report("commit-graph has incorrect OID order: %s then %s",
+				     oid_to_hex(&prev_oid),
+				     oid_to_hex(&cur_oid));
+
+		oidcpy(&prev_oid, &cur_oid);
+
+		while (cur_oid.hash[0] > cur_fanout_pos) {
+			uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+			if (i != fanout_value)
+				graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+					     cur_fanout_pos, fanout_value, i);
+
+			cur_fanout_pos++;
+		}
+	}
+
+	while (cur_fanout_pos < 256) {
+		uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
+
+		if (g->num_commits != fanout_value)
+			graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
+				     cur_fanout_pos, fanout_value, i);
+
+		cur_fanout_pos++;
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index dc16849ddd..8ae2a49cfa 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			    1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 			     2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+		       $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
 		"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect fanout final value' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+		"fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+		"incorrect OID order"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 11/22] commit-graph: verify objects exist
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (9 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 10/22] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 12/22] commit-graph: verify root tree OIDs Derrick Stolee
                         ` (10 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 18 ++++++++++++++++++
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 25 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 866a9e7e41..00e89b71e9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -11,6 +11,7 @@
 #include "sha1-lookup.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "alloc.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -242,6 +243,7 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 {
 	struct commit *c;
 	struct object_id oid;
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -893,5 +895,21 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
+	if (verify_commit_graph_error)
+		return verify_commit_graph_error;
+
+	for (i = 0; i < g->num_commits; i++) {
+		struct commit *odb_commit;
+
+		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+		odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));
+		if (parse_commit_internal(odb_commit, 0, 0)) {
+			graph_report("failed to parse %s from object database",
+				     oid_to_hex(&cur_oid));
+			continue;
+		}
+	}
+
 	return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 8ae2a49cfa..a27ec97269 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
 	git commit-graph verify >output
 '
 
+NUM_COMMITS=9
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
 GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
 		"incorrect OID order"
 '
 
+test_expect_success 'detect OID not in object database' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+		"from object database"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 12/22] commit-graph: verify root tree OIDs
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (10 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 11/22] commit-graph: verify objects exist Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 13/22] commit-graph: verify parent list Derrick Stolee
                         ` (9 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The 'verify' subcommand must compare the commit content parsed from the
commit-graph against the content in the object database. Use
lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 17 ++++++++++++++++-
 t/t5318-commit-graph.sh |  7 +++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 00e89b71e9..5df18394f9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -866,6 +866,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
+		struct commit *graph_commit;
+
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
 		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
@@ -883,6 +885,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 
 			cur_fanout_pos++;
 		}
+
+		graph_commit = lookup_commit(&cur_oid);
+		if (!parse_commit_in_graph_one(g, graph_commit))
+			graph_report("failed to parse %s from commit-graph",
+				     oid_to_hex(&cur_oid));
 	}
 
 	while (cur_fanout_pos < 256) {
@@ -899,16 +906,24 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
-		struct commit *odb_commit;
+		struct commit *graph_commit, *odb_commit;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
+		graph_commit = lookup_commit(&cur_oid);
 		odb_commit = (struct commit *)create_object(r, cur_oid.hash, alloc_commit_node(r));
 		if (parse_commit_internal(odb_commit, 0, 0)) {
 			graph_report("failed to parse %s from object database",
 				     oid_to_hex(&cur_oid));
 			continue;
 		}
+
+		if (oidcmp(&get_commit_tree_in_graph_one(g, graph_commit)->object.oid,
+			   get_commit_tree_oid(odb_commit)))
+			graph_report("root tree OID for commit %s in commit-graph is %s != %s",
+				     oid_to_hex(&cur_oid),
+				     oid_to_hex(get_commit_tree_oid(graph_commit)),
+				     oid_to_hex(get_commit_tree_oid(odb_commit)));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a27ec97269..f258c6d5d0 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
 		"from object database"
 '
 
+test_expect_success 'detect incorrect tree OID' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+		"root tree OID for commit"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 13/22] commit-graph: verify parent list
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (11 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 12/22] commit-graph: verify root tree OIDs Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 14/22] commit-graph: verify generation number Derrick Stolee
                         ` (8 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0xFFFFFFFF to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

Add a boundary check to insert_parent_or_die() for when the parent
position value is out of range.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 28 ++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 18 ++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 5df18394f9..6d8d774eb0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -244,6 +244,9 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 	struct commit *c;
 	struct object_id oid;
 
+	if (pos >= g->num_commits)
+		die("invalid parent position %"PRIu64, pos);
+
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
 	c = lookup_commit(&oid);
 	if (!c)
@@ -907,6 +910,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
+		struct commit_list *graph_parents, *odb_parents;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -924,6 +928,30 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     oid_to_hex(get_commit_tree_oid(graph_commit)),
 				     oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+		graph_parents = graph_commit->parents;
+		odb_parents = odb_commit->parents;
+
+		while (graph_parents) {
+			if (odb_parents == NULL) {
+				graph_report("commit-graph parent list for commit %s is too long",
+					     oid_to_hex(&cur_oid));
+				break;
+			}
+
+			if (oidcmp(&graph_parents->item->object.oid, &odb_parents->item->object.oid))
+				graph_report("commit-graph parent for %s is %s != %s",
+					     oid_to_hex(&cur_oid),
+					     oid_to_hex(&graph_parents->item->object.oid),
+					     oid_to_hex(&odb_parents->item->object.oid));
+
+			graph_parents = graph_parents->next;
+			odb_parents = odb_parents->next;
+		}
+
+		if (odb_parents != NULL)
+			graph_report("commit-graph parent list for commit %s terminates early",
+				     oid_to_hex(&cur_oid));
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index f258c6d5d0..b41c8f4d9b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
 GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
 		"root tree OID for commit"
 '
 
+test_expect_success 'detect incorrect parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+		"invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+		"is too long"
+'
+
+test_expect_success 'detect wrong parent' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+		"commit-graph parent for"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 14/22] commit-graph: verify generation number
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (12 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 13/22] commit-graph: verify parent list Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 15/22] commit-graph: verify commit date Derrick Stolee
                         ` (7 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Since Git will ship with the commit-graph feature without generation
numbers, we need to accept commit-graphs with all generation numbers
equal to zero. In this case, ignore the generation number calculation.

However, verify that we should never have a mix of zero and non-zero
generation numbers. Create a test that sets one commit to generation
zero and all following commits report a failure as they have non-zero
generation in a file that contains generation number zero.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 34 ++++++++++++++++++++++++++++++++++
 t/t5318-commit-graph.sh | 11 +++++++++++
 2 files changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 6d8d774eb0..e0f71658da 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -846,10 +846,14 @@ static void graph_report(const char *fmt, ...)
 	va_end(ap);
 }
 
+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
 	struct object_id prev_oid, cur_oid;
+	int generation_zero = 0;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -911,6 +915,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
+		uint32_t max_generation = 0;
 
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -945,6 +950,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
+			if (graph_parents->item->generation > max_generation)
+				max_generation = graph_parents->item->generation;
+
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
 		}
@@ -952,6 +960,32 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		if (odb_parents != NULL)
 			graph_report("commit-graph parent list for commit %s terminates early",
 				     oid_to_hex(&cur_oid));
+
+		if (!graph_commit->generation) {
+			if (generation_zero == GENERATION_NUMBER_EXISTS)
+				graph_report("commit-graph has generation number zero for commit %s, but non-zero elsewhere",
+					     oid_to_hex(&cur_oid));
+			generation_zero = GENERATION_ZERO_EXISTS;
+		} else if (generation_zero == GENERATION_ZERO_EXISTS)
+			graph_report("commit-graph has non-zero generation number for commit %s, but zero elsewhere",
+				     oid_to_hex(&cur_oid));
+
+		if (generation_zero == GENERATION_ZERO_EXISTS)
+			continue;
+
+		/*
+		 * If one of our parents has generation GENERATION_NUMBER_MAX, then
+		 * our generation is also GENERATION_NUMBER_MAX. Decrement to avoid
+		 * extra logic in the following condition.
+		 */
+		if (max_generation == GENERATION_NUMBER_MAX)
+			max_generation--;
+
+		if (graph_commit->generation != max_generation + 1)
+			graph_report("commit-graph generation for commit %s is %u != %u",
+				     oid_to_hex(&cur_oid),
+				     graph_commit->generation,
+				     max_generation + 1);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b41c8f4d9b..3128e19aef 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -366,4 +367,14 @@ test_expect_success 'detect wrong parent' '
 		"commit-graph parent for"
 '
 
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
+		"generation for commit"
+'
+
+test_expect_success 'detect incorrect generation number' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
+		"non-zero generation number"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 15/22] commit-graph: verify commit date
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (13 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 14/22] commit-graph: verify generation number Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 16/22] commit-graph: test for corrupted octopus edge Derrick Stolee
                         ` (6 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 6 ++++++
 t/t5318-commit-graph.sh | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index e0f71658da..6d6c6beff9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -986,6 +986,12 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 				     oid_to_hex(&cur_oid),
 				     graph_commit->generation,
 				     max_generation + 1);
+
+		if (graph_commit->date != odb_commit->date)
+			graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime,
+				     oid_to_hex(&cur_oid),
+				     graph_commit->date,
+				     odb_commit->date);
 	}
 
 	return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 3128e19aef..2c65e6a95c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -377,4 +378,9 @@ test_expect_success 'detect incorrect generation number' '
 		"non-zero generation number"
 '
 
+test_expect_success 'detect incorrect commit date' '
+	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+		"commit date"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 16/22] commit-graph: test for corrupted octopus edge
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (14 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 15/22] commit-graph: verify commit date Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 17/22] commit-graph: verify contents match checksum Derrick Stolee
                         ` (5 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t5318-commit-graph.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2c65e6a95c..a0cf1f66de 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' '
 '
 
 NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -274,6 +275,10 @@ GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+			     $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -383,4 +388,9 @@ test_expect_success 'detect incorrect commit date' '
 		"commit date"
 '
 
+test_expect_success 'detect incorrect parent for octopus merge' '
+	corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+		"invalid parent"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 17/22] commit-graph: verify contents match checksum
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (15 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 16/22] commit-graph: test for corrupted octopus edge Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 17:51         ` Junio C Hamano
  2018-06-27 13:24       ` [PATCH v7 18/22] fsck: verify commit-graph Derrick Stolee
                         ` (4 subsequent siblings)
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 16 ++++++++++++++--
 t/t5318-commit-graph.sh |  6 ++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6d6c6beff9..d926c4b59f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -833,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 }
 
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 static int verify_commit_graph_error;
 
 static void graph_report(const char *fmt, ...)
@@ -852,8 +853,10 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
 	uint32_t i, cur_fanout_pos = 0;
-	struct object_id prev_oid, cur_oid;
+	struct object_id prev_oid, cur_oid, checksum;
 	int generation_zero = 0;
+	struct hashfile *f;
+	int devnull;
 
 	if (!g) {
 		graph_report("no commit-graph file loaded");
@@ -872,6 +875,15 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 	if (verify_commit_graph_error)
 		return verify_commit_graph_error;
 
+	devnull = open("/dev/null", O_WRONLY);
+	f = hashfd(devnull, NULL);
+	hashwrite(f, g->data, g->data_len - g->hash_len);
+	finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+	if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+		graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
+		verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+	}
+
 	for (i = 0; i < g->num_commits; i++) {
 		struct commit *graph_commit;
 
@@ -909,7 +921,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
 		cur_fanout_pos++;
 	}
 
-	if (verify_commit_graph_error)
+	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
 	for (i = 0; i < g->num_commits; i++) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a0cf1f66de..fed05e2f12 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
 GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 			     $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
 
 # usage: corrupt_graph_and_verify <position> <data> <string>
 # Manipulates the commit-graph file at the position
@@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus merge' '
 		"invalid parent"
 '
 
+test_expect_success 'detect invalid checksum hash' '
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum"
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 18/22] fsck: verify commit-graph
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (16 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 17/22] commit-graph: verify contents match checksum Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 19/22] commit-graph: use string-list API for input Derrick Stolee
                         ` (3 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c             | 21 +++++++++++++++++++++
 t/t5318-commit-graph.sh    |  8 ++++++++
 3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 ---------------------
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3ad4f160f9..9fb2edc69f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -18,6 +18,7 @@
 #include "decorate.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -47,6 +48,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	check_connectivity();
+
+	if (core_commit_graph) {
+		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+		const char *verify_argv[] = { "commit-graph", "verify", NULL, NULL, NULL };
+		commit_graph_verify.argv = verify_argv;
+		commit_graph_verify.git_cmd = 1;
+
+		if (run_command(&commit_graph_verify))
+			errors_found |= ERROR_COMMIT_GRAPH;
+
+		prepare_alt_odb(the_repository);
+		for (alt =  the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+			verify_argv[2] = "--object-dir";
+			verify_argv[3] = alt->path;
+			if (run_command(&commit_graph_verify))
+				errors_found |= ERROR_COMMIT_GRAPH;
+		}
+	}
+
 	return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index fed05e2f12..a9e8c774d5 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -399,4 +399,12 @@ test_expect_success 'detect invalid checksum hash' '
 		"incorrect checksum"
 '
 
+test_expect_success 'git fsck (checks commit-graph)' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git fsck &&
+	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+		"incorrect checksum" &&
+	test_must_fail git fsck
+'
+
 test_done
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 19/22] commit-graph: use string-list API for input
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (17 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 18/22] fsck: verify commit-graph Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 20/22] commit-graph: add '--reachable' option Derrick Stolee
                         ` (2 subsequent siblings)
  21 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 39 +++++++++++++--------------------------
 commit-graph.c         | 15 +++++++--------
 commit-graph.h         |  7 +++----
 3 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 9d108f43a9..ea28bc311a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -119,13 +119,9 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-	const char **pack_indexes = NULL;
-	int packs_nr = 0;
-	const char **commit_hex = NULL;
-	int commits_nr = 0;
-	const char **lines = NULL;
-	int lines_nr = 0;
-	int lines_alloc = 0;
+	struct string_list *pack_indexes = NULL;
+	struct string_list *commit_hex = NULL;
+	struct string_list lines;
 
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -149,34 +145,25 @@ static int graph_write(int argc, const char **argv)
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
+	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-		lines_nr = 0;
-		lines_alloc = 128;
-		ALLOC_ARRAY(lines, lines_alloc);
-
-		while (strbuf_getline(&buf, stdin) != EOF) {
-			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-			lines[lines_nr++] = strbuf_detach(&buf, NULL);
-		}
-
-		if (opts.stdin_packs) {
-			pack_indexes = lines;
-			packs_nr = lines_nr;
-		}
-		if (opts.stdin_commits) {
-			commit_hex = lines;
-			commits_nr = lines_nr;
-		}
+
+		while (strbuf_getline(&buf, stdin) != EOF)
+			string_list_append(&lines, strbuf_detach(&buf, NULL));
+
+		if (opts.stdin_packs)
+			pack_indexes = &lines;
+		if (opts.stdin_commits)
+			commit_hex = &lines;
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr,
 			   commit_hex,
-			   commits_nr,
 			   opts.append);
 
+	string_list_clear(&lines, 0);
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index d926c4b59f..a06e7e9549 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -657,10 +657,8 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 }
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append)
 {
 	struct packed_oid_list oids;
@@ -701,10 +699,10 @@ void write_commit_graph(const char *obj_dir,
 		int dirlen;
 		strbuf_addf(&packname, "%s/pack/", obj_dir);
 		dirlen = packname.len;
-		for (i = 0; i < nr_packs; i++) {
+		for (i = 0; i < pack_indexes->nr; i++) {
 			struct packed_git *p;
 			strbuf_setlen(&packname, dirlen);
-			strbuf_addstr(&packname, pack_indexes[i]);
+			strbuf_addstr(&packname, pack_indexes->items[i].string);
 			p = add_packed_git(packname.buf, packname.len, 1);
 			if (!p)
 				die("error adding pack %s", packname.buf);
@@ -717,12 +715,13 @@ void write_commit_graph(const char *obj_dir,
 	}
 
 	if (commit_hex) {
-		for (i = 0; i < nr_commits; i++) {
+		for (i = 0; i < commit_hex->nr; i++) {
 			const char *end;
 			struct object_id oid;
 			struct commit *result;
 
-			if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+			if (commit_hex->items[i].string &&
+			    parse_oid_hex(commit_hex->items[i].string, &oid, &end))
 				continue;
 
 			result = lookup_commit_reference_gently(&oid, 1);
diff --git a/commit-graph.h b/commit-graph.h
index 4359812fb4..a79b854470 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -3,6 +3,7 @@
 
 #include "git-compat-util.h"
 #include "repository.h"
+#include "string-list.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -48,10 +49,8 @@ struct commit_graph {
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
-			const char **pack_indexes,
-			int nr_packs,
-			const char **commit_hex,
-			int nr_commits,
+			struct string_list *pack_indexes,
+			struct string_list *commit_hex,
 			int append);
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g);
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 20/22] commit-graph: add '--reachable' option
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (18 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 19/22] commit-graph: use string-list API for input Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 17:53         ` Junio C Hamano
  2018-06-27 13:24       ` [PATCH v7 21/22] gc: automatically write commit-graph files Derrick Stolee
  2018-06-27 13:24       ` [PATCH v7 22/22] commit-graph: update design document Derrick Stolee
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  8 ++++++--
 builtin/commit-graph.c             | 16 ++++++++++++----
 commit-graph.c                     | 18 ++++++++++++++++++
 commit-graph.h                     |  1 +
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index a222cfab08..dececb79d7 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index ea28bc311a..c7d0db5ab4 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph verify [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -25,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--reachable|--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int reachable;
 	int stdin_packs;
 	int stdin_commits;
 	int append;
@@ -127,6 +128,8 @@ static int graph_write(int argc, const char **argv)
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "reachable", &opts.reachable,
+			N_("start walk at all refs")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
@@ -140,11 +143,16 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
-	if (opts.stdin_packs && opts.stdin_commits)
-		die(_("cannot use both --stdin-commits and --stdin-packs"));
+	if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+		die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
+	if (opts.reachable) {
+		write_commit_graph_reachable(opts.obj_dir, opts.append);
+		return 0;
+	}
+
 	string_list_init(&lines, 0);
 	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
diff --git a/commit-graph.c b/commit-graph.c
index a06e7e9549..adf54e3fe7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -7,6 +7,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
@@ -656,6 +657,23 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
 	}
 }
 
+static int add_ref_to_list(const char *refname,
+			   const struct object_id *oid,
+			   int flags, void *cb_data)
+{
+	struct string_list *list = (struct string_list*)cb_data;
+	string_list_append(list, oid_to_hex(oid));
+	return 0;
+}
+
+void write_commit_graph_reachable(const char *obj_dir, int append)
+{
+	struct string_list list;
+	string_list_init(&list, 1);
+	for_each_ref(add_ref_to_list, &list);
+	write_commit_graph(obj_dir, NULL, &list, append);
+}
+
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/commit-graph.h b/commit-graph.h
index a79b854470..506cb45fb1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -48,6 +48,7 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
+void write_commit_graph_reachable(const char *obj_dir, int append);
 void write_commit_graph(const char *obj_dir,
 			struct string_list *pack_indexes,
 			struct string_list *commit_hex,
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a9e8c774d5..2208c266b5 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -205,6 +205,16 @@ test_expect_success 'build graph from commits with append' '
 graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph using --reachable' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --reachable &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 21/22] gc: automatically write commit-graph files
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (19 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 20/22] commit-graph: add '--reachable' option Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 18:09         ` Junio C Hamano
                           ` (2 more replies)
  2018-06-27 13:24       ` [PATCH v7 22/22] commit-graph: update design document Derrick Stolee
  21 siblings, 3 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt |  9 ++++++---
 Documentation/git-gc.txt |  4 ++++
 builtin/gc.c             |  6 ++++++
 t/t5318-commit-graph.sh  | 14 ++++++++++++++
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1cc18a828c..978deecfee 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -904,9 +904,12 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
-core.commitGraph::
-	Enable git commit graph feature. Allows reading from the
-	commit-graph file.
+gc.commitGraph::
+	If true, then gc will rewrite the commit-graph file when
+	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+	'--auto' the commit-graph will be updated if housekeeping is
+	required. Default is false. See linkgit:git-commit-graph[1]
+	for details.
 
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 24b2dd44fe..f5bc98ccb3 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..b7dd206f41 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_write_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
 	git_config_get_int("gc.aggressivedepth", &aggressive_depth);
 	git_config_get_int("gc.auto", &gc_auto_threshold);
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
+	git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
 	git_config_get_bool("gc.autodetach", &detach_auto);
 	git_config_get_expiry("gc.pruneexpire", &prune_expire);
 	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	if (pack_garbage.nr > 0)
 		clean_pack_garbage();
 
+	if (gc_write_commit_graph)
+		write_commit_graph_reachable(get_object_directory(), 0);
+
 	if (auto_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2208c266b5..5947de3d24 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full repo' '
 	test_cmp expect output
 '
 
+test_expect_success 'check that gc computes commit-graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit --allow-empty -m "blank" &&
+	git commit-graph write --reachable &&
+	cp $objdir/info/commit-graph commit-graph-before-gc &&
+	git reset --hard HEAD~1 &&
+	git config gc.writeCommitGraph true &&
+	git gc &&
+	cp $objdir/info/commit-graph commit-graph-after-gc &&
+	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+	git commit-graph write --reachable &&
+	test_cmp commit-graph-after-gc $objdir/info/commit-graph
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v7 22/22] commit-graph: update design document
  2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
                         ` (20 preceding siblings ...)
  2018-06-27 13:24       ` [PATCH v7 21/22] gc: automatically write commit-graph files Derrick Stolee
@ 2018-06-27 13:24       ` Derrick Stolee
  2018-06-27 18:09         ` Junio C Hamano
  21 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 13:24 UTC (permalink / raw)
  To: git; +Cc: gitster, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
     - 'log --topo-order'
     - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.18.0.24.g1b579a2ee9


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph
  2018-06-27 13:24       ` [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph Derrick Stolee
@ 2018-06-27 17:38         ` Junio C Hamano
  0 siblings, 0 replies; 122+ messages in thread
From: Junio C Hamano @ 2018-06-27 17:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> The commit-graph tests should be checking that normal Git operations
> succeed and have matching output with and without the commit-graph
> feature enabled. However, the test was toggling 'core.graph' instead
> of the correct 'core.commitGraph' variable.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>
> Junio,
>
> I sent this patch as a one-off a while ago, but it seems it was dropped.
> I'm adding it back here so we don't forget it.

Thanks, will queue.


>
> -Stolee
>
>  t/t5318-commit-graph.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 77d85aefe7..59d0be2877 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -28,8 +28,8 @@ test_expect_success 'create commits and repack' '
>  '
>  
>  graph_git_two_modes() {
> -	git -c core.graph=true $1 >output
> -	git -c core.graph=false $1 >expect
> +	git -c core.commitGraph=true $1 >output
> +	git -c core.commitGraph=false $1 >expect
>  	test_cmp output expect
>  }

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 17/22] commit-graph: verify contents match checksum
  2018-06-27 13:24       ` [PATCH v7 17/22] commit-graph: verify contents match checksum Derrick Stolee
@ 2018-06-27 17:51         ` Junio C Hamano
  0 siblings, 0 replies; 122+ messages in thread
From: Junio C Hamano @ 2018-06-27 17:51 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> +	devnull = open("/dev/null", O_WRONLY);
> +	f = hashfd(devnull, NULL);
> +	hashwrite(f, g->data, g->data_len - g->hash_len);

I wondered if we can hide the knowledge of "/dev/null" by reusing
hashfd_check() from csum-file.c (which is used by the verification
of pack idx file), which also gives us the hash comparison of an
existing file on disk for free.  But unfortunately this codepath
only verifies the checksum and not contents, so we cannot avoid
setting up the /dev/null sink manually like this patch does, I
guess.

> +	finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
> +	if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
> +		graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt"));
> +		verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
> +	}
> +
>  	for (i = 0; i < g->num_commits; i++) {
>  		struct commit *graph_commit;
>  
> @@ -909,7 +921,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
>  		cur_fanout_pos++;
>  	}
>  
> -	if (verify_commit_graph_error)
> +	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>  		return verify_commit_graph_error;
>  
>  	for (i = 0; i < g->num_commits; i++) {
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index a0cf1f66de..fed05e2f12 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
>  GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
>  			     $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
>  GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
> +GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
>  
>  # usage: corrupt_graph_and_verify <position> <data> <string>
>  # Manipulates the commit-graph file at the position
> @@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus merge' '
>  		"invalid parent"
>  '
>  
> +test_expect_success 'detect invalid checksum hash' '
> +	corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> +		"incorrect checksum"
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 20/22] commit-graph: add '--reachable' option
  2018-06-27 13:24       ` [PATCH v7 20/22] commit-graph: add '--reachable' option Derrick Stolee
@ 2018-06-27 17:53         ` Junio C Hamano
  0 siblings, 0 replies; 122+ messages in thread
From: Junio C Hamano @ 2018-06-27 17:53 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> diff --git a/commit-graph.c b/commit-graph.c
> index a06e7e9549..adf54e3fe7 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -7,6 +7,7 @@
>  #include "packfile.h"
>  #include "commit.h"
>  #include "object.h"
> +#include "refs.h"
>  #include "revision.h"
>  #include "sha1-lookup.h"
>  #include "commit-graph.h"
> @@ -656,6 +657,23 @@ static void compute_generation_numbers(struct packed_commit_list* commits)
>  	}
>  }
>  
> +static int add_ref_to_list(const char *refname,
> +			   const struct object_id *oid,
> +			   int flags, void *cb_data)
> +{
> +	struct string_list *list = (struct string_list*)cb_data;

Style: "(struct string_list *)cb_data"

Also please have a blank line here between the decls and the first
statement.

> +	string_list_append(list, oid_to_hex(oid));
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 21/22] gc: automatically write commit-graph files
  2018-06-27 13:24       ` [PATCH v7 21/22] gc: automatically write commit-graph files Derrick Stolee
@ 2018-06-27 18:09         ` Junio C Hamano
  2018-06-27 18:24           ` Derrick Stolee
  2018-08-03  9:39         ` SZEDER Gábor
  2018-08-12 20:18         ` [PATCH] t5318: use 'test_cmp_bin' to compare " SZEDER Gábor
  2 siblings, 1 reply; 122+ messages in thread
From: Junio C Hamano @ 2018-06-27 18:09 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> @@ -40,6 +41,7 @@ static int aggressive_depth = 50;
>  static int aggressive_window = 250;
>  static int gc_auto_threshold = 6700;
>  static int gc_auto_pack_limit = 50;
> +static int gc_write_commit_graph = 0;

Please avoid unnecessary (and undesirable) explicit initialization
to 0.  Instead, let BSS to handle it by leaving " = 0" out.

> +test_expect_success 'check that gc computes commit-graph' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	git commit --allow-empty -m "blank" &&
> +	git commit-graph write --reachable &&
> +	cp $objdir/info/commit-graph commit-graph-before-gc &&
> +	git reset --hard HEAD~1 &&
> +	git config gc.writeCommitGraph true &&
> +	git gc &&
> +	cp $objdir/info/commit-graph commit-graph-after-gc &&
> +	! test_cmp commit-graph-before-gc commit-graph-after-gc &&

The set of commits in the commit graph will chanbe by discarding the
(old) tip commit, so the fact that the contents of the file changed
across gc proves that "commit-graph write" was triggered during gc.

Come to think of it, do we promise to end-users (in docs etc.) that
commit-graph covers *ONLY* commits reachable from refs and HEAD?  I
am wondering what should happen if "git gc" here does not prune the
reflog for HEAD---wouldn't we want to reuse the properties of the
commit we are discarding when it comes back (e.g. you push, then
reset back, and the next pull makes it reappear in your repository)?

I guess what I am really questioning is if it is sensible to define
"--reachable" as "starting at all refs", unlike the usual connectivity
rules "gc" uses, especially when this is run from inside "gc".

> +	git commit-graph write --reachable &&
> +	test_cmp commit-graph-after-gc $objdir/info/commit-graph

This says that running "commit-graph write" twice without changing
the topology MUST yield byte-for-byte identical commit-graph file.

Is that a reasonable requirement on the future implementation?  I am
wondering if there will arise a situation where you need to store
records in "some" fixed order but two records compare "the same" and
tie-breaking them to give stable sort is expensive, or something
like that, which would benefit if you leave an escape hatch to allow
two logically identical graphs expressed bitwise differently.

> +'
> +
>  # the verify tests below expect the commit-graph to contain
>  # exactly the commits reachable from the commits/8 branch.
>  # If the file changes the set of commits in the list, then the

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 22/22] commit-graph: update design document
  2018-06-27 13:24       ` [PATCH v7 22/22] commit-graph: update design document Derrick Stolee
@ 2018-06-27 18:09         ` Junio C Hamano
  0 siblings, 0 replies; 122+ messages in thread
From: Junio C Hamano @ 2018-06-27 18:09 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> The commit-graph feature is now integrated with 'fsck' and 'gc',
> so remove those items from the "Future Work" section of the
> commit-graph design document.

Nice ;-)

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 21/22] gc: automatically write commit-graph files
  2018-06-27 18:09         ` Junio C Hamano
@ 2018-06-27 18:24           ` Derrick Stolee
  0 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-27 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, avarab, sbeller, jnareb, marten.agren, Derrick Stolee

On 6/27/2018 2:09 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> @@ -40,6 +41,7 @@ static int aggressive_depth = 50;
>>   static int aggressive_window = 250;
>>   static int gc_auto_threshold = 6700;
>>   static int gc_auto_pack_limit = 50;
>> +static int gc_write_commit_graph = 0;
> Please avoid unnecessary (and undesirable) explicit initialization
> to 0.  Instead, let BSS to handle it by leaving " = 0" out.
>
>> +test_expect_success 'check that gc computes commit-graph' '
>> +	cd "$TRASH_DIRECTORY/full" &&
>> +	git commit --allow-empty -m "blank" &&
>> +	git commit-graph write --reachable &&
>> +	cp $objdir/info/commit-graph commit-graph-before-gc &&
>> +	git reset --hard HEAD~1 &&
>> +	git config gc.writeCommitGraph true &&
>> +	git gc &&
>> +	cp $objdir/info/commit-graph commit-graph-after-gc &&
>> +	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
> The set of commits in the commit graph will chanbe by discarding the
> (old) tip commit, so the fact that the contents of the file changed
> across gc proves that "commit-graph write" was triggered during gc.
>
> Come to think of it, do we promise to end-users (in docs etc.) that
> commit-graph covers *ONLY* commits reachable from refs and HEAD?  I
> am wondering what should happen if "git gc" here does not prune the
> reflog for HEAD---wouldn't we want to reuse the properties of the
> commit we are discarding when it comes back (e.g. you push, then
> reset back, and the next pull makes it reappear in your repository)?

Today I learned that 'gc' keeps some of the reflog around. That makes 
sense, but I wouldn't optimize the commit-graph file for this scenario.

> I guess what I am really questioning is if it is sensible to define
> "--reachable" as "starting at all refs", unlike the usual connectivity
> rules "gc" uses, especially when this is run from inside "gc".

It is sensible to me, especially because we only lose performance if we 
visit those other commits that are still in the object database. By 
writing the commit-graph on 'gc' and not during 'fetch', we are already 
assuming the commit-graph will usually be behind the set of commits that 
the user cares about, by design.

An alternate view on the decision will need help answering from others 
who know more than me: In fetch negotiation, does the client report 
commits in the reflog as 'have's or do they get re-downloaded on a 
resulting 'git pull'?

>
>> +	git commit-graph write --reachable &&
>> +	test_cmp commit-graph-after-gc $objdir/info/commit-graph
> This says that running "commit-graph write" twice without changing
> the topology MUST yield byte-for-byte identical commit-graph file.
>
> Is that a reasonable requirement on the future implementation?  I am
> wondering if there will arise a situation where you need to store
> records in "some" fixed order but two records compare "the same" and
> tie-breaking them to give stable sort is expensive, or something
> like that, which would benefit if you leave an escape hatch to allow
> two logically identical graphs expressed bitwise differently.

Since the file format allows flexibility in the order of the chunks, it 
is possible to have bitwise-different files that represent the same set 
of data. However, I would not want git to provide inconsistent output 
given the same set of commits covered by the file.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 07/22] commit-graph: add 'verify' subcommand
  2018-06-27 13:24       ` [PATCH v7 07/22] commit-graph: add 'verify' subcommand Derrick Stolee
@ 2018-06-27 21:59         ` Jonathan Tan
  2018-06-29 12:47           ` Derrick Stolee
  0 siblings, 1 reply; 122+ messages in thread
From: Jonathan Tan @ 2018-06-27 21:59 UTC (permalink / raw)
  To: stolee
  Cc: git, gitster, avarab, sbeller, jnareb, marten.agren, dstolee,
	Jonathan Tan

> +int verify_commit_graph(struct repository *r, struct commit_graph *g)

I haven't had the time to review this patch set, but I did rebase my
object store refactoring [1] on this and wrote a test:

    static void test_verify_commit_graph(const char *gitdir, const char *worktree)
    {
    	struct repository r;
    	char *graph_name;
    	struct commit_graph *graph;
    
    	repo_init(&r, gitdir, worktree);
    
    	graph_name = get_commit_graph_filename(r.objects->objectdir);
    	graph = load_commit_graph_one(graph_name);
    
    	printf("verification returned %d\n", verify_commit_graph(&r, graph));
    
    	repo_clear(&r);
    }

However, it doesn't work because verify_commit_graph() invokes
parse_commit_internal(), which tries to look up replace refs in
the_repository.

I think that verify_commit_graph() should not take a repository argument
for now. To minimize churn on the review of this patch set, and to
minimize diffs when we migrate parse_commit_internal() (and likely other
functions) to take in a repository argument, I would be OK with
something like the following instead:

    int verify_commit_graph(struct commit_graph *g)
    {
	    /*
	     * NEEDSWORK: Make r into a parameter when all functions
	     * invoked by this function are not hardcoded to operate on
	     * the_repository.
	     */
	    struct repository *r = the_repository;
	    /* ... */

As for my rebased refactoring, I'll send the patches to the mailing list
once Junio updates ds/commit-graph-fsck with these latest changes, so
that I can rebase once again on that and ensure that everything still
works.

[1] https://public-inbox.org/git/cover.1529616356.git.jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 07/22] commit-graph: add 'verify' subcommand
  2018-06-27 21:59         ` Jonathan Tan
@ 2018-06-29 12:47           ` Derrick Stolee
  0 siblings, 0 replies; 122+ messages in thread
From: Derrick Stolee @ 2018-06-29 12:47 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, avarab, sbeller, jnareb, marten.agren, dstolee

On 6/27/2018 5:59 PM, Jonathan Tan wrote:
>> +int verify_commit_graph(struct repository *r, struct commit_graph *g)
> I haven't had the time to review this patch set, but I did rebase my
> object store refactoring [1] on this and wrote a test:
>
>      static void test_verify_commit_graph(const char *gitdir, const char *worktree)
>      {
>      	struct repository r;
>      	char *graph_name;
>      	struct commit_graph *graph;
>      
>      	repo_init(&r, gitdir, worktree);
>      
>      	graph_name = get_commit_graph_filename(r.objects->objectdir);
>      	graph = load_commit_graph_one(graph_name);
>      
>      	printf("verification returned %d\n", verify_commit_graph(&r, graph));
>      
>      	repo_clear(&r);
>      }
>
> However, it doesn't work because verify_commit_graph() invokes
> parse_commit_internal(), which tries to look up replace refs in
> the_repository.
>
> I think that verify_commit_graph() should not take a repository argument
> for now. To minimize churn on the review of this patch set, and to
> minimize diffs when we migrate parse_commit_internal() (and likely other
> functions) to take in a repository argument, I would be OK with
> something like the following instead:
>
>      int verify_commit_graph(struct commit_graph *g)
>      {
> 	    /*
> 	     * NEEDSWORK: Make r into a parameter when all functions
> 	     * invoked by this function are not hardcoded to operate on
> 	     * the_repository.
> 	     */
> 	    struct repository *r = the_repository;
> 	    /* ... */
>
> As for my rebased refactoring, I'll send the patches to the mailing list
> once Junio updates ds/commit-graph-fsck with these latest changes, so
> that I can rebase once again on that and ensure that everything still
> works.
>
> [1] https://public-inbox.org/git/cover.1529616356.git.jonathantanmy@google.com/

Thanks for looking into this, Jonathan. At some point I took my series 
and moved it on top of Stefan's lookup_object() series, but then moved 
off of it since those commits were not in 'pu'. This 'struct repository 
*r' didn't get removed in that process.

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 06/22] commit-graph: load a root tree from specific graph
  2018-06-27 13:24       ` [PATCH v7 06/22] commit-graph: load a root tree from specific graph Derrick Stolee
@ 2018-07-11  9:38         ` SZEDER Gábor
  2018-07-13 16:30           ` [PATCH] coccinelle: update commit.cocci Derrick Stolee
  0 siblings, 1 reply; 122+ messages in thread
From: SZEDER Gábor @ 2018-07-11  9:38 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, git, gitster, avarab, sbeller, jnareb,
	marten.agren, Derrick Stolee


> When lazy-loading a tree for a commit, it will be important to select
> the tree from a specific struct commit_graph. Create a new method that
> specifies the commit-graph file and use that in
> get_commit_tree_in_graph().
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index e77b19971d..9e228d3bb5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -362,14 +362,20 @@ static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *
>  	return c->maybe_tree;
>  }
>  
> -struct tree *get_commit_tree_in_graph(const struct commit *c)
> +static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
> +						 const struct commit *c)
>  {
>  	if (c->maybe_tree)
>  		return c->maybe_tree;

Coccinelle and 'commit.cocci' complain about this change, because it
renames a function directly accessing struct commit's maybe_tree
member.

In 'commit.cocci' the regex listing the names of functions that are
allowed to access c->maybe_tree directly should be updated accordingly
as well.

>  	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
> -		BUG("get_commit_tree_in_graph called from non-commit-graph commit");
> +		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
> +
> +	return load_tree_for_commit(g, (struct commit *)c);
> +}
>  
> -	return load_tree_for_commit(commit_graph, (struct commit *)c);
> +struct tree *get_commit_tree_in_graph(const struct commit *c)
> +{
> +	return get_commit_tree_in_graph_one(commit_graph, c);
>  }
>  
>  static void write_graph_chunk_fanout(struct hashfile *f,
> -- 
> 2.18.0.24.g1b579a2ee9
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH] coccinelle: update commit.cocci
  2018-07-11  9:38         ` SZEDER Gábor
@ 2018-07-13 16:30           ` Derrick Stolee
  2018-07-13 17:45             ` Eric Sunshine
  0 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-07-13 16:30 UTC (permalink / raw)
  To: git; +Cc: gitster, szeder.dev, Derrick Stolee

A recent patch series renamed the get_commit_tree_from_graph method but
forgot to update the coccinelle script that excempted it from rules
regarding accesses to 'maybe_tree'. This fixes that oversight to bring
the coccinelle scripts back to a good state.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 contrib/coccinelle/commit.cocci | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
index a7e9215ffc..aec3345adb 100644
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -12,7 +12,7 @@ expression c;
 
 // These excluded functions must access c->maybe_tree direcly.
 @@
-identifier f !~ "^(get_commit_tree|get_commit_tree_in_graph|load_tree_for_commit)$";
+identifier f !~ "^(get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
 expression c;
 @@
   f(...) {...

base-commit: d4f65b8d141e041eb5e558cd9e763873e29863b9
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH] coccinelle: update commit.cocci
  2018-07-13 16:30           ` [PATCH] coccinelle: update commit.cocci Derrick Stolee
@ 2018-07-13 17:45             ` Eric Sunshine
  0 siblings, 0 replies; 122+ messages in thread
From: Eric Sunshine @ 2018-07-13 17:45 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git List, Junio C Hamano, SZEDER Gábor

On Fri, Jul 13, 2018 at 12:30 PM Derrick Stolee <dstolee@microsoft.com> wrote:
> A recent patch series renamed the get_commit_tree_from_graph method but
> forgot to update the coccinelle script that excempted it from rules

s/excempted/exempted/

> regarding accesses to 'maybe_tree'. This fixes that oversight to bring
> the coccinelle scripts back to a good state.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v7 21/22] gc: automatically write commit-graph files
  2018-06-27 13:24       ` [PATCH v7 21/22] gc: automatically write commit-graph files Derrick Stolee
  2018-06-27 18:09         ` Junio C Hamano
@ 2018-08-03  9:39         ` SZEDER Gábor
  2018-08-12 20:18         ` [PATCH] t5318: use 'test_cmp_bin' to compare " SZEDER Gábor
  2 siblings, 0 replies; 122+ messages in thread
From: SZEDER Gábor @ 2018-08-03  9:39 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: SZEDER Gábor, git, gitster, avarab, sbeller, jnareb,
	marten.agren, Derrick Stolee


Hi Derrick,

> The commit-graph file is a very helpful feature for speeding up git
> operations. In order to make it more useful, make it possible to
> write the commit-graph file during standard garbage collection
> operations.
> 
> Add a 'gc.commitGraph' config setting that triggers writing a
> commit-graph file after any non-trivial 'git gc' command. Defaults to
> false while the commit-graph feature matures. We specifically do not
> want to have this on by default until the commit-graph feature is fully
> integrated with history-modifying features like shallow clones.
> 
> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config.txt |  9 ++++++---
>  Documentation/git-gc.txt |  4 ++++
>  builtin/gc.c             |  6 ++++++
>  t/t5318-commit-graph.sh  | 14 ++++++++++++++
>  4 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1cc18a828c..978deecfee 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -904,9 +904,12 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>  
> -core.commitGraph::
> -	Enable git commit graph feature. Allows reading from the
> -	commit-graph file.
> +gc.commitGraph::
> +	If true, then gc will rewrite the commit-graph file when
> +	linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
> +	'--auto' the commit-graph will be updated if housekeeping is
> +	required. Default is false. See linkgit:git-commit-graph[1]
> +	for details.

Sorry, I've lost track of this patch series, unfortunately, but after
it hit master yesterday a merge conflict with one of my (unrelated)
WIP patches drew my attention to this hunk above, which got me
confused:

 - Why remove the description of 'core.commitGraph'?  'git grep -i
   core.commitGraph' shows that it's still used supported.

 - Why insert the description of 'gc.commitGraph' here instead of next
   to the other 'gc.*' config variables, breaking lexicographical
   order of confvars?

 - This hunk, the other documentation hunk right below, and indeed the
   commit message all speak about 'gc.commitGraph', but the new code
   and test added in this patch use 'gc.writeCommitGraph' (following
   my suggestion from a couple of weeks ago, I think?).

What's going on?

>  
>  core.sparseCheckout::
>  	Enable "sparse checkout" feature. See section "Sparse checkout" in
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 24b2dd44fe..f5bc98ccb3 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` determines if
>  it within all non-bare repos or it can be set to a boolean value.
>  This defaults to true.
>  
> +The optional configuration variable `gc.commitGraph` determines if
> +'git gc' should run 'git commit-graph write'. This can be set to a
> +boolean value. This defaults to false.
> +
>  The optional configuration variable `gc.aggressiveWindow` controls how
>  much time is spent optimizing the delta compression of the objects in
>  the repository when the --aggressive option is specified.  The larger
> diff --git a/builtin/gc.c b/builtin/gc.c
> index ccfb1ceaeb..b7dd206f41 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -20,6 +20,7 @@
>  #include "sigchain.h"
>  #include "argv-array.h"
>  #include "commit.h"
> +#include "commit-graph.h"
>  #include "packfile.h"
>  #include "object-store.h"
>  #include "pack.h"
> @@ -40,6 +41,7 @@ static int aggressive_depth = 50;
>  static int aggressive_window = 250;
>  static int gc_auto_threshold = 6700;
>  static int gc_auto_pack_limit = 50;
> +static int gc_write_commit_graph = 0;
>  static int detach_auto = 1;
>  static timestamp_t gc_log_expire_time;
>  static const char *gc_log_expire = "1.day.ago";
> @@ -129,6 +131,7 @@ static void gc_config(void)
>  	git_config_get_int("gc.aggressivedepth", &aggressive_depth);
>  	git_config_get_int("gc.auto", &gc_auto_threshold);
>  	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
> +	git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
>  	git_config_get_bool("gc.autodetach", &detach_auto);
>  	git_config_get_expiry("gc.pruneexpire", &prune_expire);
>  	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
> @@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  	if (pack_garbage.nr > 0)
>  		clean_pack_garbage();
>  
> +	if (gc_write_commit_graph)
> +		write_commit_graph_reachable(get_object_directory(), 0);
> +
>  	if (auto_gc && too_many_loose_objects())
>  		warning(_("There are too many unreachable loose objects; "
>  			"run 'git prune' to remove them."));
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 2208c266b5..5947de3d24 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full repo' '
>  	test_cmp expect output
>  '
>  
> +test_expect_success 'check that gc computes commit-graph' '
> +	cd "$TRASH_DIRECTORY/full" &&
> +	git commit --allow-empty -m "blank" &&
> +	git commit-graph write --reachable &&
> +	cp $objdir/info/commit-graph commit-graph-before-gc &&
> +	git reset --hard HEAD~1 &&
> +	git config gc.writeCommitGraph true &&
> +	git gc &&
> +	cp $objdir/info/commit-graph commit-graph-after-gc &&
> +	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
> +	git commit-graph write --reachable &&
> +	test_cmp commit-graph-after-gc $objdir/info/commit-graph
> +'
> +
>  # the verify tests below expect the commit-graph to contain
>  # exactly the commits reachable from the commits/8 branch.
>  # If the file changes the set of commits in the list, then the
> -- 
> 2.18.0.24.g1b579a2ee9
> 
> 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH] t5318: use 'test_cmp_bin' to compare commit-graph files
  2018-06-27 13:24       ` [PATCH v7 21/22] gc: automatically write commit-graph files Derrick Stolee
  2018-06-27 18:09         ` Junio C Hamano
  2018-08-03  9:39         ` SZEDER Gábor
@ 2018-08-12 20:18         ` " SZEDER Gábor
  2018-08-13 11:24           ` Derrick Stolee
  2 siblings, 1 reply; 122+ messages in thread
From: SZEDER Gábor @ 2018-08-12 20:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git, SZEDER Gábor

The commit-graph files are binary files, so they should not be
compared with 'test_cmp', because that might cause issues on Windows,
where 'test_cmp' is a shell function to deal with random LF-CRLF
conversions.

Use 'test_cmp_bin' instead.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t5318-commit-graph.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4f17d7701e..1c148ebf21 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -254,9 +254,9 @@ test_expect_success 'check that gc computes commit-graph' '
 	git config gc.writeCommitGraph true &&
 	git gc &&
 	cp $objdir/info/commit-graph commit-graph-after-gc &&
-	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+	! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
 	git commit-graph write --reachable &&
-	test_cmp commit-graph-after-gc $objdir/info/commit-graph
+	test_cmp_bin commit-graph-after-gc $objdir/info/commit-graph
 '
 
 # the verify tests below expect the commit-graph to contain
-- 
2.18.0.408.g42635c01bc


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH] t5318: use 'test_cmp_bin' to compare commit-graph files
  2018-08-12 20:18         ` [PATCH] t5318: use 'test_cmp_bin' to compare " SZEDER Gábor
@ 2018-08-13 11:24           ` Derrick Stolee
  2018-08-13 11:41             ` SZEDER Gábor
  0 siblings, 1 reply; 122+ messages in thread
From: Derrick Stolee @ 2018-08-13 11:24 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 8/12/2018 4:18 PM, SZEDER Gábor wrote:
> The commit-graph files are binary files, so they should not be
> compared with 'test_cmp', because that might cause issues on Windows,
> where 'test_cmp' is a shell function to deal with random LF-CRLF
> conversions.
>
> Use 'test_cmp_bin' instead.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>   t/t5318-commit-graph.sh | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 4f17d7701e..1c148ebf21 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -254,9 +254,9 @@ test_expect_success 'check that gc computes commit-graph' '
>   	git config gc.writeCommitGraph true &&
>   	git gc &&
>   	cp $objdir/info/commit-graph commit-graph-after-gc &&
> -	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
> +	! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
>   	git commit-graph write --reachable &&
> -	test_cmp commit-graph-after-gc $objdir/info/commit-graph
> +	test_cmp_bin commit-graph-after-gc $objdir/info/commit-graph
>   '
>   
>   # the verify tests below expect the commit-graph to contain

Thanks, Szeder! I didn't know about test_cmp_bin, and I appreciate you 
keeping up test hygiene.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH] t5318: use 'test_cmp_bin' to compare commit-graph files
  2018-08-13 11:24           ` Derrick Stolee
@ 2018-08-13 11:41             ` SZEDER Gábor
  2018-08-13 11:52               ` [PATCH v2] " SZEDER Gábor
  0 siblings, 1 reply; 122+ messages in thread
From: SZEDER Gábor @ 2018-08-13 11:41 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Git mailing list

On Mon, Aug 13, 2018 at 1:24 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/12/2018 4:18 PM, SZEDER Gábor wrote:
> > The commit-graph files are binary files, so they should not be
> > compared with 'test_cmp', because that might cause issues on Windows,
> > where 'test_cmp' is a shell function to deal with random LF-CRLF
> > conversions.
> >
> > Use 'test_cmp_bin' instead.

Hang on a minute, this is not the commit message I wanted to send
out...

^ permalink raw reply	[flat|nested] 122+ messages in thread

* [PATCH v2] t5318: use 'test_cmp_bin' to compare commit-graph files
  2018-08-13 11:41             ` SZEDER Gábor
@ 2018-08-13 11:52               ` " SZEDER Gábor
  0 siblings, 0 replies; 122+ messages in thread
From: SZEDER Gábor @ 2018-08-13 11:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git, SZEDER Gábor

The commit-graph files are binary files, so they should not be
compared with 'test_cmp', because that might cause issues like
crashing[1] or infinite loop[2] on Windows, where 'test_cmp' is a
shell function to deal with random LF-CRLF conversions[3].

Use 'test_cmp_bin' instead.

1 - b93e6e3663 (t5000, t5003: do not use test_cmp to compare binary
    files, 2014-06-04)
2 - f9f3851b4d (t9300: use test_cmp_bin instead of test_cmp to compare
    binary files, 2014-09-12)
3 - 4d715ac05c (Windows: a test_cmp that is agnostic to random LF <>
    CRLF conversions, 2013-10-26)

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---

Same diff, updated commit message, including Derrick's Reviewed-by.

 t/t5318-commit-graph.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4f17d7701e..1c148ebf21 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -254,9 +254,9 @@ test_expect_success 'check that gc computes commit-graph' '
 	git config gc.writeCommitGraph true &&
 	git gc &&
 	cp $objdir/info/commit-graph commit-graph-after-gc &&
-	! test_cmp commit-graph-before-gc commit-graph-after-gc &&
+	! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
 	git commit-graph write --reachable &&
-	test_cmp commit-graph-after-gc $objdir/info/commit-graph
+	test_cmp_bin commit-graph-after-gc $objdir/info/commit-graph
 '
 
 # the verify tests below expect the commit-graph to contain
-- 
2.18.0.408.g42635c01bc


^ permalink raw reply	[flat|nested] 122+ messages in thread

end of thread, back to index

Thread overview: 122+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-04 16:52 [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 01/21] commit-graph: UNLEAK before die() Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 04/21] commit: force commit to parse from object database Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 08/21] commit-graph: verify required chunks are present Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 10/21] commit-graph: verify objects exist Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 11/21] commit-graph: verify root tree OIDs Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 12/21] commit-graph: verify parent list Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 13/21] commit-graph: verify generation number Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 14/21] commit-graph: verify commit date Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 16/21] commit-graph: verify contents match checksum Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 17/21] fsck: verify commit-graph Derrick Stolee
2018-06-06 11:08   ` Ævar Arnfjörð Bjarmason
2018-06-06 11:31     ` Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 18/21] commit-graph: use string-list API for input Derrick Stolee
2018-06-04 16:52 ` [PATCH v4 19/21] commit-graph: add '--reachable' option Derrick Stolee
2018-06-04 16:53 ` [PATCH v4 20/21] gc: automatically write commit-graph files Derrick Stolee
2018-06-06 11:11   ` Ævar Arnfjörð Bjarmason
2018-06-04 16:53 ` [PATCH v4 21/21] commit-graph: update design document Derrick Stolee
2018-06-04 17:02 ` [PATCH v4 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-06-05 14:51 ` Ævar Arnfjörð Bjarmason
2018-06-05 16:37   ` Derrick Stolee
2018-06-06 11:36 ` [PATCH v5 " Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 01/21] commit-graph: UNLEAK before die() Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 04/21] commit: force commit to parse from object database Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 08/21] commit-graph: verify required chunks are present Derrick Stolee
2018-06-06 12:21     ` Ævar Arnfjörð Bjarmason
2018-06-06 11:36   ` [PATCH v5 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 10/21] commit-graph: verify objects exist Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 11/21] commit-graph: verify root tree OIDs Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 13/21] commit-graph: verify generation number Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 12/21] commit-graph: verify parent list Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 14/21] commit-graph: verify commit date Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 16/21] commit-graph: verify contents match checksum Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 17/21] fsck: verify commit-graph Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 18/21] commit-graph: use string-list API for input Derrick Stolee
2018-06-06 12:11     ` Ævar Arnfjörð Bjarmason
2018-06-06 12:15       ` Derrick Stolee
2018-06-06 12:26         ` Ævar Arnfjörð Bjarmason
2018-06-06 12:45           ` Derrick Stolee
2018-06-08 12:48             ` Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 19/21] commit-graph: add '--reachable' option Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 20/21] gc: automatically write commit-graph files Derrick Stolee
2018-06-06 11:36   ` [PATCH v5 21/21] commit-graph: update design document Derrick Stolee
2018-06-08 13:56   ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 01/21] commit-graph: UNLEAK before die() Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 02/21] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 03/21] commit-graph: parse commit from chosen graph Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 04/21] commit: force commit to parse from object database Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 05/21] commit-graph: load a root tree from specific graph Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 06/21] commit-graph: add 'verify' subcommand Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 07/21] commit-graph: verify catches corrupt signature Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 08/21] commit-graph: verify required chunks are present Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 09/21] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 10/21] commit-graph: verify objects exist Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 12/21] commit-graph: verify parent list Derrick Stolee
2018-06-12 16:55       ` Junio C Hamano
2018-06-08 13:56     ` [PATCH v6 11/21] commit-graph: verify root tree OIDs Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 13/21] commit-graph: verify generation number Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 14/21] commit-graph: verify commit date Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 15/21] commit-graph: test for corrupted octopus edge Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 16/21] commit-graph: verify contents match checksum Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 17/21] fsck: verify commit-graph Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 19/21] commit-graph: add '--reachable' option Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 18/21] commit-graph: use string-list API for input Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 20/21] gc: automatically write commit-graph files Derrick Stolee
2018-06-08 22:24       ` SZEDER Gábor
2018-06-25 14:48         ` Derrick Stolee
2018-06-08 13:56     ` [PATCH v6 21/21] commit-graph: update design document Derrick Stolee
2018-06-08 15:05     ` [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc' Jakub Narębski
2018-06-08 15:11       ` Derrick Stolee
2018-06-27 13:24     ` [PATCH v7 00/22] " Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 01/22] t5318-commit-graph.sh: use core.commitGraph Derrick Stolee
2018-06-27 17:38         ` Junio C Hamano
2018-06-27 13:24       ` [PATCH v7 02/22] commit-graph: UNLEAK before die() Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 03/22] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 04/22] commit-graph: parse commit from chosen graph Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 05/22] commit: force commit to parse from object database Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 06/22] commit-graph: load a root tree from specific graph Derrick Stolee
2018-07-11  9:38         ` SZEDER Gábor
2018-07-13 16:30           ` [PATCH] coccinelle: update commit.cocci Derrick Stolee
2018-07-13 17:45             ` Eric Sunshine
2018-06-27 13:24       ` [PATCH v7 07/22] commit-graph: add 'verify' subcommand Derrick Stolee
2018-06-27 21:59         ` Jonathan Tan
2018-06-29 12:47           ` Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 08/22] commit-graph: verify catches corrupt signature Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 09/22] commit-graph: verify required chunks are present Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 10/22] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 11/22] commit-graph: verify objects exist Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 12/22] commit-graph: verify root tree OIDs Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 13/22] commit-graph: verify parent list Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 14/22] commit-graph: verify generation number Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 15/22] commit-graph: verify commit date Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 16/22] commit-graph: test for corrupted octopus edge Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 17/22] commit-graph: verify contents match checksum Derrick Stolee
2018-06-27 17:51         ` Junio C Hamano
2018-06-27 13:24       ` [PATCH v7 18/22] fsck: verify commit-graph Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 19/22] commit-graph: use string-list API for input Derrick Stolee
2018-06-27 13:24       ` [PATCH v7 20/22] commit-graph: add '--reachable' option Derrick Stolee
2018-06-27 17:53         ` Junio C Hamano
2018-06-27 13:24       ` [PATCH v7 21/22] gc: automatically write commit-graph files Derrick Stolee
2018-06-27 18:09         ` Junio C Hamano
2018-06-27 18:24           ` Derrick Stolee
2018-08-03  9:39         ` SZEDER Gábor
2018-08-12 20:18         ` [PATCH] t5318: use 'test_cmp_bin' to compare " SZEDER Gábor
2018-08-13 11:24           ` Derrick Stolee
2018-08-13 11:41             ` SZEDER Gábor
2018-08-13 11:52               ` [PATCH v2] " SZEDER Gábor
2018-06-27 13:24       ` [PATCH v7 22/22] commit-graph: update design document Derrick Stolee
2018-06-27 18:09         ` Junio C Hamano

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox