git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
@ 2018-05-31 17:40 Derrick Stolee
  2018-05-31 17:41 ` [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit Derrick Stolee
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Derrick Stolee @ 2018-05-31 17:40 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: sbeller@google.com, jnareb@gmail.com, stolee@gmail.com,
	Derrick Stolee

The commit-graph file stores a condensed version of the commit history.
This helps speed up several operations involving commit walks. This
feature does not work well if those commits "change" using features like
commit grafts, replace objects, or shallow clones.

Since the commit-graph feature is optional, hidden behind the
'core.commitGraph' config option, and requires manual maintenance (until
ds/commit-graph-fsck' is merged), these issues only arise for expert
users who decided to opt-in.

This RFC is a first attempt at rectify the issues that come up when
these features interact. I have not succeeded entirely, as I will
discuss below.

The first two "DO NOT MERGE" commits are not intended to be part of the
main product, but are ways to help the full test suite run while
computing and consuming commit-graph files. This is acheived by calling
write_commit_graph_reachable() from `git fetch` and `git commit`. I
believe this is the only dependence on ds/commit-graph-fsck. The other
commits should only depend on ds/commit-graph-lockfile-fix.

Running the full test suite after these DO NO MERGE commits results in
the following test failures, which I categorize by type of failure.

The following tests are red herrings. Most work by deleting a commit
from the object database and trying to demonstrate a failure. Others
rely on how certain objects are loaded. These are not bugs, but will
add noise if you run the tests suite with this patch.

t0410-partial-clone.sh			Failed tests:  5, 8
t5307-pack-missing-commit.sh		Failed tests:  3-4
t6011-rev-list-with-bad-commit.sh	Failed test:  4
t6024-recursive-merge.sh		Failed test:  4

The following tests are fixed in "commit-graph: enable replace-object
and grafts".

t6001-rev-list-graft.sh			Failed tests:  3, 5, 7, 9, 11, 13
t6050-replace.sh			Failed tests:  11-15, 23-24, 29

The following tests involve shallow clones.

t5500-fetch-pack.sh			Failed tests:  30-31, 34, 348-349
t5537-fetch-shallow.sh			Failed tests:  4-7, 9
t5802-connect-helper.sh			Failed test:  3

These tests are mostly fixed by three commits:

* commit-graph: avoid writing when repo is shallow
* fetch: destroy commit graph on shallow parameters
* commit-graph: revert to odb on missing parents

Each of these cases cover a different interaction that occurs with
shallow clones. Some are due to a commit becoming shallow, while others
are due to a commit becoming unshallow (and hence invalidating its
generation number).

After these changes, there is one test case that still fails, and I
cannot understand why:

t5500-fetch-pack.sh			Failed test:  348

This test fails when performing a `git fetch --shallow-exclude`. When I
halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
directory to replay the fetch it performs as expected. After struggling
with it for a while, I figured I should just put this series up for
discussion. Maybe someone with more experience in shallow clones can
point out the obvious issues I'm having.

Thanks,
-Stolee

Derrick Stolee (6):
  DO NOT MERGE: compute commit-graph on every commit
  DO NOT MERGE: write commit-graph on every fetch
  commit-graph: enable replace-object and grafts
  commit-graph: avoid writing when repo is shallow
  fetch: destroy commit graph on shallow parameters
  commit-graph: revert to odb on missing parents

 builtin/commit.c  |  5 +++++
 builtin/fetch.c   | 10 +++++++++
 builtin/gc.c      |  2 +-
 builtin/replace.c |  3 +++
 commit-graph.c    | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 commit-graph.h    |  9 ++++++++
 commit.c          |  5 +++++
 environment.c     |  2 +-
 8 files changed, 95 insertions(+), 6 deletions(-)

-- 
2.16.2.338.gcfe06ae955


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
@ 2018-05-31 17:41 ` Derrick Stolee
  2018-05-31 19:39   ` Stefan Beller
  2018-05-31 17:41 ` [RFC PATCH 2/6] DO NOT MERGE: write commit-graph on every fetch Derrick Stolee
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Derrick Stolee @ 2018-05-31 17:41 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: sbeller@google.com, jnareb@gmail.com, stolee@gmail.com,
	Derrick Stolee

Also enable core.commitGraph and gc.commitGraph. Helps to test the
commit-graph feature with the rest of the test infrastructure.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit.c | 5 +++++
 builtin/gc.c     | 2 +-
 environment.c    | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index e8e8d13be4..8751b816c1 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -32,6 +32,7 @@
 #include "column.h"
 #include "sequencer.h"
 #include "mailmap.h"
+#include "commit-graph.h"
 
 static const char * const builtin_commit_usage[] = {
 	N_("git commit [<options>] [--] <pathspec>..."),
@@ -1623,5 +1624,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 
 	UNLEAK(err);
 	UNLEAK(sb);
+
+	if (core_commit_graph)
+		write_commit_graph_reachable(get_object_directory(), 1);
+
 	return 0;
 }
diff --git a/builtin/gc.c b/builtin/gc.c
index efd214a59f..999c09d8e2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -35,7 +35,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
-static int gc_commit_graph = 0;
+static int gc_commit_graph = 1;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
diff --git a/environment.c b/environment.c
index 8853e2f0dd..fdb2d56d90 100644
--- a/environment.c
+++ b/environment.c
@@ -62,7 +62,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
-int core_commit_graph;
+int core_commit_graph = 1;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-- 
2.16.2.338.gcfe06ae955


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/6] DO NOT MERGE: write commit-graph on every fetch
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
  2018-05-31 17:41 ` [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit Derrick Stolee
@ 2018-05-31 17:41 ` Derrick Stolee
  2018-05-31 17:41 ` [RFC PATCH 3/6] commit-graph: enable replace-object and grafts Derrick Stolee
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Derrick Stolee @ 2018-05-31 17:41 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: sbeller@google.com, jnareb@gmail.com, stolee@gmail.com,
	Derrick Stolee

THIS IS ONLY FOR TESTING TO INCREASE TEST COVERAGE

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fetch.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 8ee998ea2e..af896e4b74 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -20,6 +20,7 @@
 #include "utf8.h"
 #include "packfile.h"
 #include "list-objects-filter-options.h"
+#include "commit-graph.h"
 
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
@@ -1480,6 +1481,9 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	close_all_packs();
 
+	if (core_commit_graph)
+		write_commit_graph_reachable(get_object_directory(), 1);
+
 	argv_array_pushl(&argv_gc_auto, "gc", "--auto", NULL);
 	if (verbosity < 0)
 		argv_array_push(&argv_gc_auto, "--quiet");
-- 
2.16.2.338.gcfe06ae955


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 3/6] commit-graph: enable replace-object and grafts
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
  2018-05-31 17:41 ` [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit Derrick Stolee
  2018-05-31 17:41 ` [RFC PATCH 2/6] DO NOT MERGE: write commit-graph on every fetch Derrick Stolee
@ 2018-05-31 17:41 ` Derrick Stolee
  2018-06-09 15:47   ` Jakub Narebski
  2018-05-31 17:41 ` [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow Derrick Stolee
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Derrick Stolee @ 2018-05-31 17:41 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: sbeller@google.com, jnareb@gmail.com, stolee@gmail.com,
	Derrick Stolee

Create destroy_commit_graph() method to delete the commit-graph file
when history is altered by a replace-object call. If the commit-graph
is rebuilt after that, we will load the correct object while reading
the commit-graph.

When parsing a commit, first check if the commit was grafted. If so,
then ignore the commit-graph for that commit and insted use the parents
loaded by parsing the commit buffer and comparing against the graft
file. 

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/replace.c |  3 +++
 commit-graph.c    | 20 +++++++++++++++++++-
 commit-graph.h    |  9 +++++++++
 commit.c          |  5 +++++
 4 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index 9f01f3fc7f..d553aadcdc 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -15,6 +15,7 @@
 #include "parse-options.h"
 #include "run-command.h"
 #include "tag.h"
+#include "commit-graph.h"
 
 static const char * const git_replace_usage[] = {
 	N_("git replace [-f] <object> <replacement>"),
@@ -468,6 +469,8 @@ int cmd_replace(int argc, const char **argv, const char *prefix)
 		usage_msg_opt("--raw only makes sense with --edit",
 			      git_replace_usage, options);
 
+	destroy_commit_graph(get_object_directory());
+
 	switch (cmdmode) {
 	case MODE_DELETE:
 		if (argc < 1)
diff --git a/commit-graph.c b/commit-graph.c
index e9195dfb17..95af4ed519 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -6,6 +6,7 @@
 #include "pack.h"
 #include "packfile.h"
 #include "commit.h"
+#include "dir.h"
 #include "object.h"
 #include "refs.h"
 #include "revision.h"
@@ -240,15 +241,22 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 {
 	struct commit *c;
 	struct object_id oid;
+	const unsigned char *real;
 
 	if (pos >= g->num_commits)
 		die("invalid parent position %"PRIu64, pos);
 
 	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+
+	real = lookup_replace_object(oid.hash);
+
 	c = lookup_commit(&oid);
 	if (!c)
 		die("could not find commit %s", oid_to_hex(&oid));
-	c->graph_pos = pos;
+
+	if (!hashcmp(real, oid.hash))
+		c->graph_pos = pos;
+
 	return &commit_list_insert(c, pptr)->next;
 }
 
@@ -1019,3 +1027,13 @@ int verify_commit_graph(struct commit_graph *g)
 
 	return verify_commit_graph_error;
 }
+
+void destroy_commit_graph(const char *obj_dir)
+{
+	char *graph_name;
+	close_commit_graph();
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	remove_path(graph_name);
+	FREE_AND_NULL(graph_name);
+}
diff --git a/commit-graph.h b/commit-graph.h
index 9a06a5f188..1d17da1582 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -56,4 +56,13 @@ void write_commit_graph(const char *obj_dir,
 
 int verify_commit_graph(struct commit_graph *g);
 
+/*
+ * Delete the commit-graph file in the given object directory.
+ *
+ * WARNING: this deletes data, so should only be used when
+ * performing history-altering actions like replace-object
+ * or grafts.
+ */
+void destroy_commit_graph(const char *obj_dir);
+
 #endif
diff --git a/commit.c b/commit.c
index 6eaed0174c..2fe31cde77 100644
--- a/commit.c
+++ b/commit.c
@@ -403,6 +403,11 @@ int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_com
 		return -1;
 	if (item->object.parsed)
 		return 0;
+
+	prepare_commit_graft();
+	if (commit_graft_pos(item->object.oid.hash) >= 0)
+		use_commit_graph = 0;
+
 	if (use_commit_graph && parse_commit_in_graph(item))
 		return 0;
 	buffer = read_sha1_file(item->object.oid.hash, &type, &size);
-- 
2.16.2.338.gcfe06ae955


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
                   ` (2 preceding siblings ...)
  2018-05-31 17:41 ` [RFC PATCH 3/6] commit-graph: enable replace-object and grafts Derrick Stolee
@ 2018-05-31 17:41 ` Derrick Stolee
  2018-05-31 19:07   ` Stefan Beller
  2018-06-01  2:30   ` Junio C Hamano
  2018-05-31 17:41 ` [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters Derrick Stolee
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 18+ messages in thread
From: Derrick Stolee @ 2018-05-31 17:41 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: sbeller@google.com, jnareb@gmail.com, stolee@gmail.com,
	Derrick Stolee

Shallow clones do not interact well with the commit-graph feature for
several reasons. Instead of doing the hard thing to fix those
interactions, instead prevent reading or writing a commit-graph file for
shallow repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 95af4ed519..80e377b90f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -208,6 +208,9 @@ static void prepare_commit_graph(void)
 		return;
 	prepare_commit_graph_run_once = 1;
 
+	if (is_repository_shallow())
+		return;
+
 	obj_dir = get_object_directory();
 	prepare_commit_graph_one(obj_dir);
 	prepare_alt_odb();
@@ -711,6 +714,15 @@ void write_commit_graph(const char *obj_dir,
 	int num_extra_edges;
 	struct commit_list *parent;
 
+	/*
+	 * Shallow clones are not supproted, as they create bad
+	 * generation skips as they are un-shallowed.
+	 */
+	if (is_repository_shallow()) {
+		warning("writing a commit-graph in a shallow repository is not supported");
+		return;
+	}
+
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
 
-- 
2.16.2.338.gcfe06ae955


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
                   ` (3 preceding siblings ...)
  2018-05-31 17:41 ` [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow Derrick Stolee
@ 2018-05-31 17:41 ` Derrick Stolee
  2018-05-31 19:29   ` Stefan Beller
  2018-05-31 17:41 ` [RFC PATCH 6/6] commit-graph: revert to odb on missing parents Derrick Stolee
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Derrick Stolee @ 2018-05-31 17:41 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: sbeller@google.com, jnareb@gmail.com, stolee@gmail.com,
	Derrick Stolee

The commit-graph feature is not compatible with history-rewriting
features such as shallow clones. When running a 'git fetch' with
any of the shallow/unshallow options, destroy the commit-graph
file and override core.commitGraph to be false.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fetch.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index af896e4b74..2001dca881 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1452,6 +1452,12 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (update_shallow || depth || deepen_since || deepen_not.nr ||
+	    deepen_relative || unshallow || update_shallow || is_repository_shallow()) {
+		destroy_commit_graph(get_object_directory());
+		core_commit_graph = 0;
+	}
+
 	if (remote) {
 		if (filter_options.choice || repository_format_partial_clone)
 			fetch_one_setup_partial(remote);
-- 
2.16.2.338.gcfe06ae955


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 6/6] commit-graph: revert to odb on missing parents
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
                   ` (4 preceding siblings ...)
  2018-05-31 17:41 ` [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters Derrick Stolee
@ 2018-05-31 17:41 ` Derrick Stolee
  2018-05-31 18:33 ` [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Stefan Beller
  2018-06-08 11:59 ` Jakub Narebski
  7 siblings, 0 replies; 18+ messages in thread
From: Derrick Stolee @ 2018-05-31 17:41 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: sbeller@google.com, jnareb@gmail.com, stolee@gmail.com,
	Derrick Stolee

The commit-graph format includes a way to specify a parent is
"missing" from the commit-graph (i.e. we do not have a record of
that parent in our list of object IDs, and hence cannot provide
a graph position). For mose cases, this does not occur due to
the close_reachable() method adding all reachable commits. However,
in a shallow clone, we will try to record the parents of a commit
on the shallow boundary, but the parents are not in the repository.

The GRAPH_PARENT_MISSING value that is stored in the format is
purposeful, especially for future plans to make the commit-graph file
incremental or transporting sections of a commit-graph file across
the network.

In the meantime, check if a commit has a missing parent while filling
its details from the commit-graph. If a parent is missing, still
assign the generation number and graph position for that item, but
report that the commit-graph failed to fill the contents. Then the
caller is responsible for filling the rest of the data from a commit
buffer.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 80e377b90f..3e33d061fe 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -278,17 +278,44 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin
 	struct commit_list **pptr;
 	const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
 
-	item->object.parsed = 1;
+	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 	item->graph_pos = pos;
 
+	/*
+	 * If we have any edges marked as GRAPH_PARENT_MISSING, we must not parse any
+	 * more of this object and leave it to the commit buffer to parse.
+	 */
+	edge_value = get_be32(commit_data + g->hash_len);
+	if (edge_value == GRAPH_PARENT_MISSING)
+		return 0;
+	if (edge_value == GRAPH_PARENT_NONE)
+		goto continue_parsing;
+
+	edge_value = get_be32(commit_data + g->hash_len + 4);
+	if (edge_value == GRAPH_PARENT_MISSING)
+		return 0;
+	if (edge_value == GRAPH_PARENT_NONE)
+		goto continue_parsing;
+	if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED))
+		goto continue_parsing;
+
+	parent_data_ptr = (uint32_t*)(g->chunk_large_edges +
+			  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
+	do {
+		edge_value = get_be32(parent_data_ptr);
+		if (edge_value == GRAPH_PARENT_MISSING)
+			return 0;
+		parent_data_ptr++;
+	} while (!(edge_value & GRAPH_LAST_EDGE));
+	
+continue_parsing:
+	item->object.parsed = 1;
 	item->maybe_tree = NULL;
 
 	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
-
 	pptr = &item->parents;
 
 	edge_value = get_be32(commit_data + g->hash_len);
-- 
2.16.2.338.gcfe06ae955


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
                   ` (5 preceding siblings ...)
  2018-05-31 17:41 ` [RFC PATCH 6/6] commit-graph: revert to odb on missing parents Derrick Stolee
@ 2018-05-31 18:33 ` Stefan Beller
  2018-06-01  1:09   ` Derrick Stolee
  2018-06-08 11:59 ` Jakub Narebski
  7 siblings, 1 reply; 18+ messages in thread
From: Stefan Beller @ 2018-05-31 18:33 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git@vger.kernel.org, jnareb@gmail.com, stolee@gmail.com

On Thu, May 31, 2018 at 10:40 AM, Derrick Stolee <dstolee@microsoft.com> wrote:
> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.
>
> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck' is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectify the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.
>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is acheived by calling
> write_commit_graph_reachable() from `git fetch` and `git commit`. I
> believe this is the only dependence on ds/commit-graph-fsck. The other
> commits should only depend on ds/commit-graph-lockfile-fix.

I applied these patches on top of ds/commit-graph-fsck
(it would have been nice if you mentioned that it applies there)
Running the test suite with all patches applied results in:

./t0410-partial-clone.sh                    (Wstat: 256 Tests: 15 Failed: 2)
  Failed tests:  5, 8
./t5307-pack-missing-commit.sh              (Wstat: 256 Tests: 5 Failed: 2)
  Failed tests:  3-4
./t5500-fetch-pack.sh                       (Wstat: 256 Tests: 353 Failed: 1)
  Failed test:  348
./t6011-rev-list-with-bad-commit.sh         (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  4
./t6024-recursive-merge.sh                  (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  4

which you identified as 4x noise and t5500 as not understood.

$ GIT_TRACE=1 ./t5500-fetch-pack.sh -d -i -v -x
[...]
expecting success:
git -C shallow12 fetch --shallow-exclude one origin &&
git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
test_write_lines three two >expected &&
test_cmp expected actual

++ git -C shallow12 fetch --shallow-exclude one origin
trace: built-in: git fetch --shallow-exclude one origin
trace: run_command: unset GIT_PREFIX; 'git-upload-pack
'\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
trace: run_command: git --shallow-file  pack-objects --revs --thin
--stdout --shallow --progress --delta-base-offset --include-tag
trace: built-in: git pack-objects --revs --thin --stdout --shallow
--progress --delta-base-offset --include-tag
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 0 (delta 0)
trace: run_command: git --shallow-file  unpack-objects --pack_header=2,4
trace: built-in: git unpack-objects --pack_header=2,4
Unpacking objects: 100% (4/4), done.
trace: run_command: git rev-list --objects --stdin --not --all --quiet
trace: built-in: git rev-list --objects --stdin --not --all --quiet
trace: run_command: unset GIT_PREFIX; 'git-upload-pack
'\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
trace: run_command: git pack-objects --revs --thin --stdout --progress
--delta-base-offset
trace: built-in: git pack-objects --revs --thin --stdout --progress
--delta-base-offset
remote: Total 0 (delta 0), reused 0 (delta 0)
trace: run_command: git unpack-objects --pack_header=2,0
trace: built-in: git unpack-objects --pack_header=2,0
trace: run_command: git rev-list --objects --stdin --not --all --quiet
trace: built-in: git rev-list --objects --stdin --not --all --quiet
From file:///u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.
 * [new tag]         one        -> one
 * [new tag]         two        -> two
run_processes_parallel: preparing to run up to 1 tasks
run_processes_parallel: done
trace: run_command: git gc --auto
trace: built-in: git gc --auto
++ git -C shallow12 log --pretty=tformat:%s origin/master
trace: built-in: git log '--pretty=tformat:%s' origin/master
++ test_write_lines three two
++ printf '%s\n' three two
++ test_cmp expected actual
++ diff -u expected actual
--- expected 2018-05-31 18:14:25.944540810 +0000
+++ actual 2018-05-31 18:14:25.944540810 +0000
@@ -1,2 +1,3 @@
 three
 two
+one
error: last command exited with $?=1
not ok 348 - fetch exclude tag one
#
# git -C shallow12 fetch --shallow-exclude one origin &&
# git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
# test_write_lines three two >expected &&
# test_cmp expected actual
#


> After these changes, there is one test case that still fails, and I
> cannot understand why:
>
> t5500-fetch-pack.sh                     Failed test:  348
>
> This test fails when performing a `git fetch --shallow-exclude`. When I
> halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
> directory to replay the fetch it performs as expected.

What is "as expected" ?

When I insert a test_pause before that first line, such that:

test_expect_success 'fetch exclude tag one' '
    test_pause &&
    git -C shallow12 fetch --shallow-exclude one origin &&
    git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
    test_write_lines three two >expected &&
    test_cmp expected actual
'

and then run

  rm "shallow-exclude/.git/objects/info/commit-graph

the test works after exiting the test_pause.

Note how the shallow-exclude is the *remote* of the fetch.
So I think you also want to introduce the destruction
of commit graphs in upload-pack.c which is the remote counter part to fetch.

Why do you think these other tests are noice?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow
  2018-05-31 17:41 ` [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow Derrick Stolee
@ 2018-05-31 19:07   ` Stefan Beller
  2018-06-01  2:30   ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Stefan Beller @ 2018-05-31 19:07 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git@vger.kernel.org, jnareb@gmail.com, stolee@gmail.com

On Thu, May 31, 2018 at 10:41 AM, Derrick Stolee <dstolee@microsoft.com> wrote:
> Shallow clones do not interact well with the commit-graph feature for
> several reasons. Instead of doing the hard thing to fix those
> interactions, instead prevent reading or writing a commit-graph file for
> shallow repositories.

Makes sense.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 95af4ed519..80e377b90f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -208,6 +208,9 @@ static void prepare_commit_graph(void)
>                 return;
>         prepare_commit_graph_run_once = 1;
>
> +       if (is_repository_shallow())
> +               return;
> +
>         obj_dir = get_object_directory();
>         prepare_commit_graph_one(obj_dir);
>         prepare_alt_odb();
> @@ -711,6 +714,15 @@ void write_commit_graph(const char *obj_dir,
>         int num_extra_edges;
>         struct commit_list *parent;
>
> +       /*
> +        * Shallow clones are not supproted, as they create bad

supported

> +        * generation skips as they are un-shallowed.
> +        */
> +       if (is_repository_shallow()) {
> +               warning("writing a commit-graph in a shallow repository is not supported");

_() ?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters
  2018-05-31 17:41 ` [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters Derrick Stolee
@ 2018-05-31 19:29   ` Stefan Beller
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Beller @ 2018-05-31 19:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git@vger.kernel.org, jnareb@gmail.com, stolee@gmail.com

On Thu, May 31, 2018 at 10:41 AM, Derrick Stolee <dstolee@microsoft.com> wrote:
> The commit-graph feature is not compatible with history-rewriting
> features such as shallow clones.

I associate "history rewriting" with changing objects in the history.
For example interactive rebase or the BFG cleaner[1] / filter-branch
to remove certain commits from other commits as parents.
This history rewriting leads to different sha1s, and the commit
graph feature is compatible with that in the sense that you can
add all the new sha1s /commits to the graph and prune out the
old unreferenced commits.

Shallow clones are not rewriting history IMHO, as the sha1s do
not change. What changes is the assumption of presence of
the parent commits (which makes it hard to compute the
generation number), by the grafting trick, that "overlays" (?)
history.

This is more of a nit, though.

[1] https://rtyley.github.io/bfg-repo-cleaner/

> When running a 'git fetch' with
> any of the shallow/unshallow options, destroy the commit-graph
> file and override core.commitGraph to be false.

We do that *before* the actual fetch happens such that
the improved negotiation of the future cannot even try to
benefit from generation numbers?

We do it at fetch time instead of other local operations
as that is an entry point to commit-graph incompatible
features. Would this also be needed in clone?

I was about to ask if a more fine grained inclusion to
lookup_commit would make sense, but that is explicitely
called out in the cover letter as 'too hard for now'.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/fetch.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index af896e4b74..2001dca881 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1452,6 +1452,12 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>                 }
>         }
>
> +       if (update_shallow || depth || deepen_since || deepen_not.nr ||
> +           deepen_relative || unshallow || update_shallow || is_repository_shallow()) {
> +               destroy_commit_graph(get_object_directory());
> +               core_commit_graph = 0;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit
  2018-05-31 17:41 ` [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit Derrick Stolee
@ 2018-05-31 19:39   ` Stefan Beller
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Beller @ 2018-05-31 19:39 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git@vger.kernel.org, jnareb@gmail.com, stolee@gmail.com

On Thu, May 31, 2018 at 10:41 AM, Derrick Stolee <dstolee@microsoft.com> wrote:
> Also enable core.commitGraph and gc.commitGraph. Helps to test the
> commit-graph feature with the rest of the test infrastructure.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/commit.c | 5 +++++
>  builtin/gc.c     | 2 +-
>  environment.c    | 2 +-
>  3 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index e8e8d13be4..8751b816c1 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -32,6 +32,7 @@
>  #include "column.h"
>  #include "sequencer.h"
>  #include "mailmap.h"
> +#include "commit-graph.h"
>
>  static const char * const builtin_commit_usage[] = {
>         N_("git commit [<options>] [--] <pathspec>..."),
> @@ -1623,5 +1624,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>
>         UNLEAK(err);
>         UNLEAK(sb);
> +
> +       if (core_commit_graph)
> +               write_commit_graph_reachable(get_object_directory(), 1);
> +

I'd personally put it before the UNLEAKS, as then
we have the cleanup at the end of the function and
not scattered somewhere in the middle.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
  2018-05-31 18:33 ` [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Stefan Beller
@ 2018-06-01  1:09   ` Derrick Stolee
  0 siblings, 0 replies; 18+ messages in thread
From: Derrick Stolee @ 2018-06-01  1:09 UTC (permalink / raw)
  To: Stefan Beller, Derrick Stolee; +Cc: git@vger.kernel.org, jnareb@gmail.com

On 5/31/2018 2:33 PM, Stefan Beller wrote:
> On Thu, May 31, 2018 at 10:40 AM, Derrick Stolee <dstolee@microsoft.com> wrote:
>> The commit-graph file stores a condensed version of the commit history.
>> This helps speed up several operations involving commit walks. This
>> feature does not work well if those commits "change" using features like
>> commit grafts, replace objects, or shallow clones.
>>
>> Since the commit-graph feature is optional, hidden behind the
>> 'core.commitGraph' config option, and requires manual maintenance (until
>> ds/commit-graph-fsck' is merged), these issues only arise for expert
>> users who decided to opt-in.
>>
>> This RFC is a first attempt at rectify the issues that come up when
>> these features interact. I have not succeeded entirely, as I will
>> discuss below.
>>
>> The first two "DO NOT MERGE" commits are not intended to be part of the
>> main product, but are ways to help the full test suite run while
>> computing and consuming commit-graph files. This is acheived by calling
>> write_commit_graph_reachable() from `git fetch` and `git commit`. I
>> believe this is the only dependence on ds/commit-graph-fsck. The other
>> commits should only depend on ds/commit-graph-lockfile-fix.
> I applied these patches on top of ds/commit-graph-fsck
> (it would have been nice if you mentioned that it applies there)
> Running the test suite with all patches applied results in:
>
> ./t0410-partial-clone.sh                    (Wstat: 256 Tests: 15 Failed: 2)
>    Failed tests:  5, 8
> ./t5307-pack-missing-commit.sh              (Wstat: 256 Tests: 5 Failed: 2)
>    Failed tests:  3-4
> ./t5500-fetch-pack.sh                       (Wstat: 256 Tests: 353 Failed: 1)
>    Failed test:  348
> ./t6011-rev-list-with-bad-commit.sh         (Wstat: 256 Tests: 6 Failed: 1)
>    Failed test:  4
> ./t6024-recursive-merge.sh                  (Wstat: 256 Tests: 6 Failed: 1)
>    Failed test:  4
>
> which you identified as 4x noise and t5500 as not understood.
>
> $ GIT_TRACE=1 ./t5500-fetch-pack.sh -d -i -v -x
> [...]
> expecting success:
> git -C shallow12 fetch --shallow-exclude one origin &&
> git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
> test_write_lines three two >expected &&
> test_cmp expected actual
>
> ++ git -C shallow12 fetch --shallow-exclude one origin
> trace: built-in: git fetch --shallow-exclude one origin
> trace: run_command: unset GIT_PREFIX; 'git-upload-pack
> '\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
> trace: run_command: git --shallow-file  pack-objects --revs --thin
> --stdout --shallow --progress --delta-base-offset --include-tag
> trace: built-in: git pack-objects --revs --thin --stdout --shallow
> --progress --delta-base-offset --include-tag
> remote: Counting objects: 4, done.
> remote: Compressing objects: 100% (3/3), done.
> remote: Total 4 (delta 0), reused 0 (delta 0)
> trace: run_command: git --shallow-file  unpack-objects --pack_header=2,4
> trace: built-in: git unpack-objects --pack_header=2,4
> Unpacking objects: 100% (4/4), done.
> trace: run_command: git rev-list --objects --stdin --not --all --quiet
> trace: built-in: git rev-list --objects --stdin --not --all --quiet
> trace: run_command: unset GIT_PREFIX; 'git-upload-pack
> '\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
> trace: run_command: git pack-objects --revs --thin --stdout --progress
> --delta-base-offset
> trace: built-in: git pack-objects --revs --thin --stdout --progress
> --delta-base-offset
> remote: Total 0 (delta 0), reused 0 (delta 0)
> trace: run_command: git unpack-objects --pack_header=2,0
> trace: built-in: git unpack-objects --pack_header=2,0
> trace: run_command: git rev-list --objects --stdin --not --all --quiet
> trace: built-in: git rev-list --objects --stdin --not --all --quiet
>  From file:///u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.
>   * [new tag]         one        -> one
>   * [new tag]         two        -> two
> run_processes_parallel: preparing to run up to 1 tasks
> run_processes_parallel: done
> trace: run_command: git gc --auto
> trace: built-in: git gc --auto
> ++ git -C shallow12 log --pretty=tformat:%s origin/master
> trace: built-in: git log '--pretty=tformat:%s' origin/master
> ++ test_write_lines three two
> ++ printf '%s\n' three two
> ++ test_cmp expected actual
> ++ diff -u expected actual
> --- expected 2018-05-31 18:14:25.944540810 +0000
> +++ actual 2018-05-31 18:14:25.944540810 +0000
> @@ -1,2 +1,3 @@
>   three
>   two
> +one
> error: last command exited with $?=1
> not ok 348 - fetch exclude tag one
> #
> # git -C shallow12 fetch --shallow-exclude one origin &&
> # git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
> # test_write_lines three two >expected &&
> # test_cmp expected actual
> #
>
>
>> After these changes, there is one test case that still fails, and I
>> cannot understand why:
>>
>> t5500-fetch-pack.sh                     Failed test:  348
>>
>> This test fails when performing a `git fetch --shallow-exclude`. When I
>> halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
>> directory to replay the fetch it performs as expected.
> What is "as expected" ?
>
> When I insert a test_pause before that first line, such that:
>
> test_expect_success 'fetch exclude tag one' '
>      test_pause &&
>      git -C shallow12 fetch --shallow-exclude one origin &&
>      git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
>      test_write_lines three two >expected &&
>      test_cmp expected actual
> '
>
> and then run
>
>    rm "shallow-exclude/.git/objects/info/commit-graph
>
> the test works after exiting the test_pause.
>
> Note how the shallow-exclude is the *remote* of the fetch.
> So I think you also want to introduce the destruction
> of commit graphs in upload-pack.c which is the remote counter part to fetch.

Thanks for the details above. I agree that the solution is probably to 
disable the commit-graph during certain upload-pack parameters. 
Something is interfering there. I don't think the "destroy" pattern is 
right here.

> Why do you think these other tests are noice?

Take t5307-pack-missing-commit.sh for a typical example: a repo is 
constructed, a pack is made, and then that pack is manipulated to 
"remove" a commit. That missing commit is "detected" by running 
rev-list. Except if we are running the commit-graph on every 'git 
commit' call, the rev-list now succeeds since we are not looking at the 
packfile for the commit details any more!

This is what I mean by the tests being "noise". Adding the "DO NOT 
MERGE" commits only interrupted the test mechanism for unrelated behavior.

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow
  2018-05-31 17:41 ` [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow Derrick Stolee
  2018-05-31 19:07   ` Stefan Beller
@ 2018-06-01  2:30   ` Junio C Hamano
  2018-06-01 11:46     ` Derrick Stolee
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2018-06-01  2:30 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git@vger.kernel.org, sbeller@google.com, jnareb@gmail.com,
	stolee@gmail.com

Derrick Stolee <dstolee@microsoft.com> writes:

> Shallow clones do not interact well with the commit-graph feature for
> several reasons. Instead of doing the hard thing to fix those
> interactions, instead prevent reading or writing a commit-graph file for
> shallow repositories.

The latter instead would want to vanish, I would guess.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 95af4ed519..80e377b90f 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -208,6 +208,9 @@ static void prepare_commit_graph(void)
>  		return;
>  	prepare_commit_graph_run_once = 1;
>  
> +	if (is_repository_shallow())
> +		return;
> +
>  	obj_dir = get_object_directory();
>  	prepare_commit_graph_one(obj_dir);
>  	prepare_alt_odb();
> @@ -711,6 +714,15 @@ void write_commit_graph(const char *obj_dir,
>  	int num_extra_edges;
>  	struct commit_list *parent;
>  
> +	/*
> +	 * Shallow clones are not supproted, as they create bad
> +	 * generation skips as they are un-shallowed.
> +	 */
> +	if (is_repository_shallow()) {
> +		warning("writing a commit-graph in a shallow repository is not supported");
> +		return;
> +	}
> +
>  	oids.nr = 0;
>  	oids.alloc = approximate_object_count() / 4;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow
  2018-06-01  2:30   ` Junio C Hamano
@ 2018-06-01 11:46     ` Derrick Stolee
  2018-06-02 18:39       ` Jakub Narebski
  2018-06-04  2:19       ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Derrick Stolee @ 2018-06-01 11:46 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee
  Cc: git@vger.kernel.org, sbeller@google.com, jnareb@gmail.com

On 5/31/2018 10:30 PM, Junio C Hamano wrote:
> Derrick Stolee <dstolee@microsoft.com> writes:
>
>> Shallow clones do not interact well with the commit-graph feature for
>> several reasons. Instead of doing the hard thing to fix those
>> interactions, instead prevent reading or writing a commit-graph file for
>> shallow repositories.
> The latter instead would want to vanish, I would guess.

Do you mean that we should call destroy_commit_graph() if we detect a 
shallow repository during write_commit_graph(), then I can make that change.

>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   commit-graph.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 95af4ed519..80e377b90f 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -208,6 +208,9 @@ static void prepare_commit_graph(void)
>>   		return;
>>   	prepare_commit_graph_run_once = 1;
>>   
>> +	if (is_repository_shallow())
>> +		return;
>> +
>>   	obj_dir = get_object_directory();
>>   	prepare_commit_graph_one(obj_dir);
>>   	prepare_alt_odb();
>> @@ -711,6 +714,15 @@ void write_commit_graph(const char *obj_dir,
>>   	int num_extra_edges;
>>   	struct commit_list *parent;
>>   
>> +	/*
>> +	 * Shallow clones are not supproted, as they create bad
>> +	 * generation skips as they are un-shallowed.
>> +	 */
>> +	if (is_repository_shallow()) {
>> +		warning("writing a commit-graph in a shallow repository is not supported");
>> +		return;
>> +	}
>> +
>>   	oids.nr = 0;
>>   	oids.alloc = approximate_object_count() / 4;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow
  2018-06-01 11:46     ` Derrick Stolee
@ 2018-06-02 18:39       ` Jakub Narebski
  2018-06-04  2:19       ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Jakub Narebski @ 2018-06-02 18:39 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee, git@vger.kernel.org,
	sbeller@google.com

Derrick Stolee <stolee@gmail.com> writes:
> On 5/31/2018 10:30 PM, Junio C Hamano wrote:
>> Derrick Stolee <dstolee@microsoft.com> writes:
>>
>>> Shallow clones do not interact well with the commit-graph feature for
>>> several reasons. Instead of doing the hard thing to fix those
>>> interactions, instead prevent reading or writing a commit-graph file for
>>> shallow repositories.
>>
>> The latter instead would want to vanish, I would guess.
>
> Do you mean that we should call destroy_commit_graph() if we detect a
> shallow repository during write_commit_graph(), then I can make that
> change.

I think Junio meant here the "instead" word, because you have it twice
in the second sentence of quoted paragraph.

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow
  2018-06-01 11:46     ` Derrick Stolee
  2018-06-02 18:39       ` Jakub Narebski
@ 2018-06-04  2:19       ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2018-06-04  2:19 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee, git@vger.kernel.org, sbeller@google.com,
	jnareb@gmail.com

Derrick Stolee <stolee@gmail.com> writes:

>>> several reasons. Instead of doing the hard thing to fix those
>>> interactions, instead prevent reading or writing a commit-graph file for
>>> shallow repositories.
>> The latter instead would want to vanish, I would guess.
>
> Do you mean that we should call destroy_commit_graph() if we detect a
> shallow repository during write_commit_graph(), then I can make that
> change.
>

No, I was just having trouble with reading "Instead of doing X,
instead do Y".

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
  2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
                   ` (6 preceding siblings ...)
  2018-05-31 18:33 ` [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Stefan Beller
@ 2018-06-08 11:59 ` Jakub Narebski
  7 siblings, 0 replies; 18+ messages in thread
From: Jakub Narebski @ 2018-06-08 11:59 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Stefan Beller, Derrick Stolee, Johannes Schindelin

Derrick Stolee <dstolee@microsoft.com> writes:

> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.

I like to think about those features as providing an overlay for the
commit graph (similar to overlay filesystems, like overlayfs); in the
case of git-replace quite literally.

I will be calling all those features "history-altering features", for
short.

> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck' is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectify the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.

What I would like to see here in cover letter is a generic description
of _strategy_ to deal with those history-altering features.

From what I understand you have the following options for each of
replacements, shallow clones and grafts:
 - turn off serialized commit-graph if given history-altering feature is
   present, as if core.commitGraph was set to false
 - invalidate and optionally refresh commit-graph file if given
   history-altering feature is present (maybe only if it changed the
   view of the history, is such check is possible)

For automatic invalidation you would need to have either:
 - cover all possible ways by which given history-altering feature can
   change the view of history, or
 - keep state of history-altering change for which commit-graph was
   created (e.g. in proposed VAL4 chunk) and compare with current view
   of history if it changed

For automatic turning off you would need only to check if
history-altering feature is present.


Let us examine each of those history-altering features that Git
supports:

* shallow clone

  - shallow clone usually means having shorter history, so serialized
    commit-graph is not needed as much; also changing the depth
    screws-up assumptions about generation numbers

  - there are only a few entry points changing the view of history,
    namely fetch and push with shallow options (--depth=<depth>,
    --deepen=<depth>, --shallow-since=<date>, --update-shallow,
    --shallow-exclude=<revision>, --unshallow)

  - it is easy to check for presence of shallow clone feature by
    chacking of $GIT_DIR/shallow exists and is not empty

  - different contents of $GIT_DIR/shallow means different view of
    history

  - NOTE: internally uses grafts mechanism (emulated grafts)

* replacements (replace objects, git-replace)

  - git-replace can be used to join current development repository with
    historical repository, in which case we would certainly want to make
    use of serialied commit graph; on the other hand the replacements do
    not necessary alter the view of the history

  - theoretically you could create replacement refs by hand, but in
    practice there are TWO ways of getting them:

    - using git-replace command to create, edit/change and delete
      replacement objects'
      '
    - fetching or having pushed-to refs in refs/replace/* namespace

  - you need to check if there are any refs in refs/replace/* namespace
    to check if the feature is used (but it doesn't necessarily mean
    that it altered project history)

  - changed list of refs in refs/replace/* namespace (which you can get
    with for-each-ref command/API) does not necessarily mean that the
    view of the history changed: you can replace non-commit object, you
    can replace commits and not change their parents; it is not as easy
    as checking if file exists

  - NOTE: the feature can be turned off manually with
    GIT_NO_REPLACE_OBJECTS environment variable and with
    --no-replace-objects option to git wrapper.

    Also when pushing, fetching and fsck-ing this feature is turned off
    and refs in refs/replace/* namespace are treated as ordinary refs.

    This may mean that we would want to create commit-graph with
    replacements for ordinary non-bare repository, and without
    replacements for server-side bare repository.

* grafts

  - subset of uses of git-replace, older and *obsolete* feature (because
    it is unsafe to use; that is you need to be careful with it).

  - edited by hand, no automatic entry points

  - if $GIT_DIR/info/grafts file is present, then feature is enabled
    (barring some corner cases, like empty file or file consisting only
    of comments)

  - changed contents of this file means changed view of history (well,
    except for reordering lines, or removing/adding empty lines and
    comments)

>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is acheived by calling
> write_commit_graph_reachable() from `git fetch` and `git commit`. I
> believe this is the only dependence on ds/commit-graph-fsck. The other
> commits should only depend on ds/commit-graph-lockfile-fix.

That's clever way of increasing coverage.

> Running the full test suite after these DO NO MERGE commits results in
> the following test failures, which I categorize by type of failure.
>
> The following tests are red herrings. Most work by deleting a commit
> from the object database and trying to demonstrate a failure. Others
> rely on how certain objects are loaded. These are not bugs, but will
> add noise if you run the tests suite with this patch.
>
> t0410-partial-clone.sh		Failed tests:  5, 8
> t5307-pack-missing-commit.sh		Failed tests:  3-4
> t6011-rev-list-with-bad-commit.sh	Failed test:  4
> t6024-recursive-merge.sh		Failed test:  4

Does this means that those tests should be in the end (i.e. when
core.commitGraph is turned on by defult) be simply run with
core.commitGraph explicitly disabled for the test?

> The following tests are fixed in "commit-graph: enable replace-object
> and grafts".

Would it make sense to split this commit into two dealing separately
with replace objects and with grafts?  Or do they use the same
underlying API?

> t6001-rev-list-graft.sh		Failed tests:  3, 5, 7, 9, 11, 13

O.K.  It might be good idea to add separate test that does the same, but
with git-replace instead of grafts, though.

> t6050-replace.sh			Failed tests:  11-15, 23-24, 29

The t6050-replace.sh does not test changing the DAG of revision
(excluding changing the SHA-1 of commit), if I read it correctly.  It
would be good to test using git-replace to change committer date, and to
change parents: shortening or lengthening history (e.g. emulating
joining two independent lines of development in the latter case).

See also comment about t6001 above.

>
> The following tests involve shallow clones.
>
> t5500-fetch-pack.sh			Failed tests:  30-31, 34, 348-349
> t5537-fetch-shallow.sh		Failed tests:  4-7, 9
> t5802-connect-helper.sh		Failed test:  3
>
> These tests are mostly fixed by three commits:
>
> * commit-graph: avoid writing when repo is shallow
> * fetch: destroy commit graph on shallow parameters

Seems O.K. from the subject, but I'd have to check the details.

> * commit-graph: revert to odb on missing parents

I wonder if reverting to using object database is a good solution, and
if it wouldn't be better to invalidate / delete commit-graph file
instead...

>
> Each of these cases cover a different interaction that occurs with
> shallow clones. Some are due to a commit becoming shallow, while others
> are due to a commit becoming unshallow (and hence invalidating its
> generation number).

Why do not simply check if repository is shallow?

>
> After these changes, there is one test case that still fails, and I
> cannot understand why:
>
> t5500-fetch-pack.sh			Failed test:  348
>
> This test fails when performing a `git fetch --shallow-exclude`. When I
> halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
> directory to replay the fetch it performs as expected. After struggling
> with it for a while, I figured I should just put this series up for
> discussion. Maybe someone with more experience in shallow clones can
> point out the obvious issues I'm having.
>
> Thanks,
> -Stolee
>
> Derrick Stolee (6):
>   DO NOT MERGE: compute commit-graph on every commit
>   DO NOT MERGE: write commit-graph on every fetch
>   commit-graph: enable replace-object and grafts
>   commit-graph: avoid writing when repo is shallow
>   fetch: destroy commit graph on shallow parameters
>   commit-graph: revert to odb on missing parents
>
>  builtin/commit.c  |  5 +++++
>  builtin/fetch.c   | 10 +++++++++
>  builtin/gc.c      |  2 +-
>  builtin/replace.c |  3 +++
>  commit-graph.c    | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>  commit-graph.h    |  9 ++++++++
>  commit.c          |  5 +++++
>  environment.c     |  2 +-
>  8 files changed, 95 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 3/6] commit-graph: enable replace-object and grafts
  2018-05-31 17:41 ` [RFC PATCH 3/6] commit-graph: enable replace-object and grafts Derrick Stolee
@ 2018-06-09 15:47   ` Jakub Narebski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Narebski @ 2018-06-09 15:47 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Stefan Beller, Derrick Stolee

Derrick Stolee <dstolee@microsoft.com> writes:

First, this commit seems to try to do two things at once, and in its
current form it should be split into adding destroy_commit_graph() and
into grafts / replacements check.  Those are separate and unconnected
features, and should be in separate patches, in my opinion.

> Create destroy_commit_graph() method to delete the commit-graph file
> when history is altered by a replace-object call. If the commit-graph
> is rebuilt after that, we will load the correct object while reading
> the commit-graph.

I'm not sure if this is the best solution.  It would work both for
invalidation, and for turning off the feature, but in my opinion there
are better solutions for both:
 - in case of invalidation, wouldn't it be better to re-create the file?
 - in case of turning off, wouldn't it be better to simply set
   core_commit_graph to false?

Nevertheless I think destroy_commit_graph(), however it would be named
at the end, would be a good to have.

> When parsing a commit, first check if the commit was grafted. If so,
> then ignore the commit-graph for that commit and insted use the parents
> loaded by parsing the commit buffer and comparing against the graft
> file.

Unless uyou turn off using generation numbers and other such indices,
this is not enough to prevent from generating incorrect results if
current commit-graph file doesn't agree with current altered view of the
project history.

Take a simple example of joining two histories using replacements
(git-replace) or grafts.  Numbers denote generation numbers:

Original history:

    A<---B<---C             <===  master
    1    2    3


    a<---b<---c<---d<---e   <===  historical
    1    2    3    4    5

Altered view of history:

    a<---b<---c<---d<---e<---[A]<---B<---C   <===  master

In the new view of history, 'e' is reachable from 'C'.  But going by
pre-altering generation numbers, gen_{pre}(e) = 5  >  gen_{pre}(C) = 3.
This means that we don't even arrive at '[A]', where replacement object
or graft changed parents compared to what's loaded from commit-graph
file, isn't it?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/replace.c |  3 +++
>  commit-graph.c    | 20 +++++++++++++++++++-
>  commit-graph.h    |  9 +++++++++
>  commit.c          |  5 +++++
>  4 files changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/replace.c b/builtin/replace.c
> index 9f01f3fc7f..d553aadcdc 100644
> --- a/builtin/replace.c
> +++ b/builtin/replace.c
> @@ -15,6 +15,7 @@
>  #include "parse-options.h"
>  #include "run-command.h"
>  #include "tag.h"
> +#include "commit-graph.h"
>  
>  static const char * const git_replace_usage[] = {
>  	N_("git replace [-f] <object> <replacement>"),
> @@ -468,6 +469,8 @@ int cmd_replace(int argc, const char **argv, const char *prefix)
>  		usage_msg_opt("--raw only makes sense with --edit",
>  			      git_replace_usage, options);
>  
> +	destroy_commit_graph(get_object_directory());
> +
>  	switch (cmdmode) {
>  	case MODE_DELETE:
>  		if (argc < 1)
> diff --git a/commit-graph.c b/commit-graph.c
> index e9195dfb17..95af4ed519 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -6,6 +6,7 @@
>  #include "pack.h"
>  #include "packfile.h"
>  #include "commit.h"
> +#include "dir.h"
>  #include "object.h"
>  #include "refs.h"
>  #include "revision.h"
> @@ -240,15 +241,22 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
>  {
>  	struct commit *c;
>  	struct object_id oid;
> +	const unsigned char *real;
>  
>  	if (pos >= g->num_commits)
>  		die("invalid parent position %"PRIu64, pos);
>  
>  	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
> +
> +	real = lookup_replace_object(oid.hash);

So this function returns either original SHA-1 / OID of object, or if
object should be replaced then it returns the replacement object's name
(replaced recursively, if necessary).

This deals with replacements, but not with grafts (from graft file)!

> +
>  	c = lookup_commit(&oid);
>  	if (!c)
>  		die("could not find commit %s", oid_to_hex(&oid));
> -	c->graph_pos = pos;
> +
> +	if (!hashcmp(real, oid.hash))
> +		c->graph_pos = pos;

I don't quite understand this reasoning here.  The serialized
commit-graph file can either follow the current state of altered
history, or not.  In the later case the commit-graph can either follow
original history, or altered but not current view of history.

What does it matter of the object was replaced or not when marking
object as coming from graph at position 'pos'?  If commit is replaced,
it still comes from commit-graph at position 'pos', isn't it?

Also, what if the starting commit was replaced?  Then this logic
wouldn't file, is it?


I don't see how it relates to what's written in the commit message:

DS> When parsing a commit, first check if the commit was grafted. If so,
DS> then ignore the commit-graph for that commit and insted use the parents
DS> loaded by parsing the commit buffer and comparing against the graft
DS> file.

> +
>  	return &commit_list_insert(c, pptr)->next;
>  }
>  
> @@ -1019,3 +1027,13 @@ int verify_commit_graph(struct commit_graph *g)
>  
>  	return verify_commit_graph_error;
>  }
> +
> +void destroy_commit_graph(const char *obj_dir)
> +{
> +	char *graph_name;
> +	close_commit_graph();
> +
> +	graph_name = get_commit_graph_filename(obj_dir);
> +	remove_path(graph_name);
> +	FREE_AND_NULL(graph_name);
> +}

All right, though I onder if it wouldn't be better to first
delete/remove/unlink the graph file, and only then close it.

Otherwise I think you might be introducing subtle race condition here.
But I may be wrong here.

> diff --git a/commit-graph.h b/commit-graph.h
> index 9a06a5f188..1d17da1582 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -56,4 +56,13 @@ void write_commit_graph(const char *obj_dir,
>  
>  int verify_commit_graph(struct commit_graph *g);
>  
> +/*
> + * Delete the commit-graph file in the given object directory.
> + *
> + * WARNING: this deletes data, so should only be used when
> + * performing history-altering actions like replace-object
> + * or grafts.
> + */
> +void destroy_commit_graph(const char *obj_dir);

Why it is not called delete_commit_graph(), then?

> +
>  #endif
> diff --git a/commit.c b/commit.c
> index 6eaed0174c..2fe31cde77 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -403,6 +403,11 @@ int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_com
>  		return -1;
>  	if (item->object.parsed)
>  		return 0;
> +
> +	prepare_commit_graft();
> +	if (commit_graft_pos(item->object.oid.hash) >= 0)
> +		use_commit_graph = 0;

All right, this allegedly handles grafts.

I don't understand why you treat replacement objects in different place
than grafts; I can understand that you might want to treat them
differently (because grafts are deprecated).

> +
>  	if (use_commit_graph && parse_commit_in_graph(item))
>  		return 0;
>  	buffer = read_sha1_file(item->object.oid.hash, &type, &size);

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-06-09 15:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-31 17:40 [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Derrick Stolee
2018-05-31 17:41 ` [RFC PATCH 1/6] DO NOT MERGE: compute commit-graph on every commit Derrick Stolee
2018-05-31 19:39   ` Stefan Beller
2018-05-31 17:41 ` [RFC PATCH 2/6] DO NOT MERGE: write commit-graph on every fetch Derrick Stolee
2018-05-31 17:41 ` [RFC PATCH 3/6] commit-graph: enable replace-object and grafts Derrick Stolee
2018-06-09 15:47   ` Jakub Narebski
2018-05-31 17:41 ` [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow Derrick Stolee
2018-05-31 19:07   ` Stefan Beller
2018-06-01  2:30   ` Junio C Hamano
2018-06-01 11:46     ` Derrick Stolee
2018-06-02 18:39       ` Jakub Narebski
2018-06-04  2:19       ` Junio C Hamano
2018-05-31 17:41 ` [RFC PATCH 5/6] fetch: destroy commit graph on shallow parameters Derrick Stolee
2018-05-31 19:29   ` Stefan Beller
2018-05-31 17:41 ` [RFC PATCH 6/6] commit-graph: revert to odb on missing parents Derrick Stolee
2018-05-31 18:33 ` [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo Stefan Beller
2018-06-01  1:09   ` Derrick Stolee
2018-06-08 11:59 ` Jakub Narebski

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).