git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC Patch 0/3] Move generation, graph_pos to a slab
@ 2020-06-04  7:27 Abhishek Kumar
  2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
                   ` (6 more replies)
  0 siblings, 7 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-04  7:27 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

The struct commit is used in many contexts. However, members generation
and graph_pos are only used for commit-graph related operations and
otherwise waste memory.

This wastage would have been more pronounced as transistion to
generation number v2, which uses 64-bit generation number instead of
current 32-bits.

The third patch ("commit: convert commit->graph_pos to a slab",
2020-06-04) is currently failing diff-submodule related tests (t4041,
t4059 and t4060) for gcc [1]. I am going to send a second version soon,
fixing that.

[1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189

Abhishek Kumar (3):
  commit: introduce helpers for generation slab
  commit: convert commit->generation to a slab
  commit: convert commit->graph_pos to a slab

 alloc.c                             |   2 -
 blame.c                             |   2 +-
 bloom.c                             |   6 +-
 commit-graph.c                      | 116 +++++++++++++++++++++-------
 commit-graph.h                      |   8 ++
 commit-reach.c                      |  50 ++++++------
 commit.c                            |   6 +-
 commit.h                            |   6 --
 contrib/coccinelle/generation.cocci |  12 +++
 contrib/coccinelle/graph_pos.cocci  |  12 +++
 revision.c                          |  16 ++--
 11 files changed, 158 insertions(+), 78 deletions(-)
 create mode 100644 contrib/coccinelle/generation.cocci
 create mode 100644 contrib/coccinelle/graph_pos.cocci

-- 
2.27.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [GSoC Patch 1/3] commit: introduce helpers for generation slab
  2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
@ 2020-06-04  7:27 ` Abhishek Kumar
  2020-06-04 14:36   ` Derrick Stolee
                     ` (2 more replies)
  2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-04  7:27 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

The struct member generation refers to "generation number" (or more
broadly, a reachablity index value) used by commit-graph to reduce time
taken to walk commits. However, generation is not useful in other
contexts and bloats the struct.

Let's move it to a commit-slab and shrink the struct by four bytes.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c | 27 +++++++++++++++++++++++++++
 commit-graph.h |  5 +++++
 commit.h       |  3 ---
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e3420ddcbf..63f419048d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb)
 	       commit_pos_at(&commit_pos, b);
 }
 
+define_commit_slab(generation_slab, uint32_t);
+static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab);
+
+uint32_t generation(const struct commit *c)
+{
+	uint32_t *gen = generation_slab_peek(&generation_slab, c);
+
+	return gen ? *gen : GENERATION_NUMBER_INFINITY;
+}
+
+static void set_generation(const struct commit *c, const uint32_t generation)
+{
+	unsigned int i = generation_slab.slab_count;
+	uint32_t *gen = generation_slab_at(&generation_slab, c);
+
+	/*
+	 * commit-slab initializes with zero, overwrite this with
+	 * GENERATION_NUMBER_INFINITY
+	 */
+	for (; i < generation_slab.slab_count; ++i) {
+		memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY,
+		       generation_slab.slab_size * sizeof(uint32_t));
+	}
+
+	*gen = generation;
+}
+
 static int commit_gen_cmp(const void *va, const void *vb)
 {
 	const struct commit *a = *(const struct commit **)va;
diff --git a/commit-graph.h b/commit-graph.h
index 4212766a4f..653bd041ad 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -8,6 +8,10 @@
 #include "object-store.h"
 #include "oidset.h"
 
+#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
+#define GENERATION_NUMBER_MAX 0x3FFFFFFF
+#define GENERATION_NUMBER_ZERO 0
+
 #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
 #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
 #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
@@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *);
  */
 void disable_commit_graph(struct repository *r);
 
+uint32_t generation(const struct commit *c);
 #endif
diff --git a/commit.h b/commit.h
index 1b2dea5d85..cc610400d5 100644
--- a/commit.h
+++ b/commit.h
@@ -11,9 +11,6 @@
 #include "commit-slab.h"
 
 #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
-#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
-#define GENERATION_NUMBER_MAX 0x3FFFFFFF
-#define GENERATION_NUMBER_ZERO 0
 
 struct commit_list {
 	struct commit *item;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSoC Patch 2/3] commit: convert commit->generation to a slab
  2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
  2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
@ 2020-06-04  7:27 ` Abhishek Kumar
  2020-06-04 14:27   ` Derrick Stolee
                     ` (2 more replies)
  2020-06-04  7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-04  7:27 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

In this commit, we will use the generation slab helpers introduced in
last commit and replace existing uses of commit->generation using
'contrib/coccinelle/generation.cocci'

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 alloc.c                             |  1 -
 blame.c                             |  2 +-
 commit-graph.c                      | 39 +++++++++++-----------
 commit-reach.c                      | 50 ++++++++++++++---------------
 commit.c                            |  4 +--
 commit.h                            |  1 -
 contrib/coccinelle/generation.cocci | 12 +++++++
 revision.c                          | 16 ++++-----
 8 files changed, 68 insertions(+), 57 deletions(-)
 create mode 100644 contrib/coccinelle/generation.cocci

diff --git a/alloc.c b/alloc.c
index 1c64c4dd16..cbed187094 100644
--- a/alloc.c
+++ b/alloc.c
@@ -109,7 +109,6 @@ void init_commit_node(struct repository *r, struct commit *c)
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index(r);
 	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
-	c->generation = GENERATION_NUMBER_INFINITY;
 }
 
 void *alloc_commit_node(struct repository *r)
diff --git a/blame.c b/blame.c
index da7e28800e..50e6316076 100644
--- a/blame.c
+++ b/blame.c
@@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
 	if (!bd)
 		return 1;
 
-	if (origin->commit->generation == GENERATION_NUMBER_INFINITY)
+	if (generation(origin->commit) == GENERATION_NUMBER_INFINITY)
 		return 1;
 
 	filter = get_bloom_filter(r, origin->commit, 0);
diff --git a/commit-graph.c b/commit-graph.c
index 63f419048d..9ce7d4acb1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -120,9 +120,9 @@ static int commit_gen_cmp(const void *va, const void *vb)
 	const struct commit *b = *(const struct commit **)vb;
 
 	/* lower generation commits first */
-	if (a->generation < b->generation)
+	if (generation(a) < generation(b))
 		return -1;
-	else if (a->generation > b->generation)
+	else if (generation(a) > generation(b))
 		return 1;
 
 	/* use date as a heuristic when generations are equal */
@@ -712,7 +712,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
 	item->graph_pos = pos;
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2);
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -754,7 +754,7 @@ static int fill_commit_in_graph(struct repository *r,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2);
 
 	pptr = &item->parents;
 
@@ -1048,7 +1048,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		else
 			packedDate[0] = 0;
 
-		packedDate[0] |= htonl((*list)->generation << 2);
+		packedDate[0] |= htonl(generation((*list)) << 2);
 
 		packedDate[1] = htonl((*list)->date);
 		hashwrite(f, packedDate, 8);
@@ -1280,8 +1280,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
 		display_progress(ctx->progress, i + 1);
-		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
-		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
+		if (generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
+		    generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
 			continue;
 
 		commit_list_insert(ctx->commits.list[i], &list);
@@ -1292,22 +1292,23 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			uint32_t max_generation = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
-				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+				if (generation(parent->item) == GENERATION_NUMBER_INFINITY ||
+				    generation(parent->item) == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
-				} else if (parent->item->generation > max_generation) {
-					max_generation = parent->item->generation;
+				} else if (generation(parent->item) > max_generation) {
+					max_generation = generation(parent->item);
 				}
 			}
 
 			if (all_parents_computed) {
-				current->generation = max_generation + 1;
+				set_generation(current, max_generation + 1);
 				pop_commit(&list);
 
-				if (current->generation > GENERATION_NUMBER_MAX)
-					current->generation = GENERATION_NUMBER_MAX;
+				if (generation(current) > GENERATION_NUMBER_MAX)
+					set_generation(current,
+						       GENERATION_NUMBER_MAX);
 			}
 		}
 	}
@@ -2314,8 +2315,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			if (graph_parents->item->generation > max_generation)
-				max_generation = graph_parents->item->generation;
+			if (generation(graph_parents->item) > max_generation)
+				max_generation = generation(graph_parents->item);
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2325,7 +2326,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 			graph_report(_("commit-graph parent list for commit %s terminates early"),
 				     oid_to_hex(&cur_oid));
 
-		if (!graph_commit->generation) {
+		if (!generation(graph_commit)) {
 			if (generation_zero == GENERATION_NUMBER_EXISTS)
 				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
 					     oid_to_hex(&cur_oid));
@@ -2345,10 +2346,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (max_generation == GENERATION_NUMBER_MAX)
 			max_generation--;
 
-		if (graph_commit->generation != max_generation + 1)
+		if (generation(graph_commit) != max_generation + 1)
 			graph_report(_("commit-graph generation for commit %s is %u != %u"),
 				     oid_to_hex(&cur_oid),
-				     graph_commit->generation,
+				     generation(graph_commit),
 				     max_generation + 1);
 
 		if (graph_commit->date != odb_commit->date)
diff --git a/commit-reach.c b/commit-reach.c
index 4ca7e706a1..77c980054a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		struct commit_list *parents;
 		int flags;
 
-		if (min_generation && commit->generation > last_gen)
+		if (min_generation && generation(commit) > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
-			    commit->generation, last_gen,
+			    generation(commit), last_gen,
 			    oid_to_hex(&commit->object.oid));
-		last_gen = commit->generation;
+		last_gen = generation(commit);
 
-		if (commit->generation < min_generation)
+		if (generation(commit) < min_generation)
 			break;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
@@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 		repo_parse_commit(r, array[i]);
 	for (i = 0; i < cnt; i++) {
 		struct commit_list *common;
-		uint32_t min_generation = array[i]->generation;
+		uint32_t min_generation = generation(array[i]);
 
 		if (redundant[i])
 			continue;
@@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 			filled_index[filled] = j;
 			work[filled++] = array[j];
 
-			if (array[j]->generation < min_generation)
-				min_generation = array[j]->generation;
+			if (generation(array[j]) < min_generation)
+				min_generation = generation(array[j]);
 		}
 		common = paint_down_to_common(r, array[i], filled,
 					      work, min_generation);
@@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
-		if (reference[i]->generation < min_generation)
-			min_generation = reference[i]->generation;
+		if (generation(reference[i]) < min_generation)
+			min_generation = generation(reference[i]);
 	}
 
-	if (commit->generation > min_generation)
+	if (generation(commit) > min_generation)
 		return ret;
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     commit->generation);
+				     generation(commit));
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
@@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate,
 	/* Otherwise, we don't know; prepare to recurse */
 	parse_commit_or_die(candidate);
 
-	if (candidate->generation < cutoff)
+	if (generation(candidate) < cutoff)
 		return CONTAINS_NO;
 
 	return CONTAINS_UNKNOWN;
@@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	for (p = want; p; p = p->next) {
 		struct commit *c = p->item;
 		load_commit_graph_info(the_repository, c);
-		if (c->generation < cutoff)
-			cutoff = c->generation;
+		if (generation(c) < cutoff)
+			cutoff = generation(c);
 	}
 
 	result = contains_test(candidate, want, cache, cutoff);
@@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 	const struct commit *a = *(const struct commit * const *)_a;
 	const struct commit *b = *(const struct commit * const *)_b;
 
-	if (a->generation < b->generation)
+	if (generation(a) < generation(b))
 		return -1;
-	if (a->generation > b->generation)
+	if (generation(a) > generation(b))
 		return 1;
 	return 0;
 }
@@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 		list[nr_commits] = (struct commit *)from_one;
 		if (parse_commit(list[nr_commits]) ||
-		    list[nr_commits]->generation < min_generation) {
+		    generation(list[nr_commits]) < min_generation) {
 			result = 0;
 			goto cleanup;
 		}
@@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 					if (parse_commit(parent->item) ||
 					    parent->item->date < min_commit_date ||
-					    parent->item->generation < min_generation)
+					    generation(parent->item) < min_generation)
 						continue;
 
 					commit_list_insert(parent->item, &stack);
@@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
 
-			if (from_iter->item->generation < min_generation)
-				min_generation = from_iter->item->generation;
+			if (generation(from_iter->item) < min_generation)
+				min_generation = generation(from_iter->item);
 		}
 
 		from_iter = from_iter->next;
@@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
 
-			if (to_iter->item->generation < min_generation)
-				min_generation = to_iter->item->generation;
+			if (generation(to_iter->item) < min_generation)
+				min_generation = generation(to_iter->item);
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 		struct commit *c = *item;
 
 		parse_commit(c);
-		if (c->generation < min_generation)
-			min_generation = c->generation;
+		if (generation(c) < min_generation)
+			min_generation = generation(c);
 
 		if (!(c->object.flags & PARENT1)) {
 			c->object.flags |= PARENT1;
@@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 			parse_commit(p);
 
-			if (p->generation < min_generation)
+			if (generation(p) < min_generation)
 				continue;
 
 			if (p->object.flags & PARENT2)
diff --git a/commit.c b/commit.c
index 87686a7055..8dad0f8446 100644
--- a/commit.c
+++ b/commit.c
@@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void
 	const struct commit *a = a_, *b = b_;
 
 	/* newer commits first */
-	if (a->generation < b->generation)
+	if (generation(a) < generation(b))
 		return 1;
-	else if (a->generation > b->generation)
+	else if (generation(a) > generation(b))
 		return -1;
 
 	/* use date as a heuristic when generations are equal */
diff --git a/commit.h b/commit.h
index cc610400d5..01e1c4c3eb 100644
--- a/commit.h
+++ b/commit.h
@@ -34,7 +34,6 @@ struct commit {
 	 */
 	struct tree *maybe_tree;
 	uint32_t graph_pos;
-	uint32_t generation;
 	unsigned int index;
 };
 
diff --git a/contrib/coccinelle/generation.cocci b/contrib/coccinelle/generation.cocci
new file mode 100644
index 0000000000..da13c44856
--- /dev/null
+++ b/contrib/coccinelle/generation.cocci
@@ -0,0 +1,12 @@
+@@
+struct commit *c;
+expression E;
+@@
+- c->generation = E
++ set_generation(c, E)
+
+@@
+struct commit *c;
+@@
+- c->generation
++ generation(c)
diff --git a/revision.c b/revision.c
index 60cca8c0b9..d76382007c 100644
--- a/revision.c
+++ b/revision.c
@@ -720,7 +720,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
 	if (!revs->repo->objects->commit_graph)
 		return -1;
 
-	if (commit->generation == GENERATION_NUMBER_INFINITY)
+	if (generation(commit) == GENERATION_NUMBER_INFINITY)
 		return -1;
 
 	filter = get_bloom_filter(revs->repo, commit, 0);
@@ -3314,7 +3314,7 @@ static void explore_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->explore_queue)) &&
-	       c->generation >= gen_cutoff)
+	       generation(c) >= gen_cutoff)
 		explore_walk_step(revs);
 }
 
@@ -3330,7 +3330,7 @@ static void indegree_walk_step(struct rev_info *revs)
 	if (parse_commit_gently(c, 1) < 0)
 		return;
 
-	explore_to_depth(revs, c->generation);
+	explore_to_depth(revs, generation(c));
 
 	for (p = c->parents; p; p = p->next) {
 		struct commit *parent = p->item;
@@ -3354,7 +3354,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->indegree_queue)) &&
-	       c->generation >= gen_cutoff)
+	       generation(c) >= gen_cutoff)
 		indegree_walk_step(revs);
 }
 
@@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs)
 		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
 		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
 
-		if (c->generation < info->min_generation)
-			info->min_generation = c->generation;
+		if (generation(c) < info->min_generation)
+			info->min_generation = generation(c);
 
 		*(indegree_slab_at(&info->indegree, c)) = 1;
 
@@ -3473,8 +3473,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-		if (parent->generation < info->min_generation) {
-			info->min_generation = parent->generation;
+		if (generation(parent) < info->min_generation) {
+			info->min_generation = generation(parent);
 			compute_indegrees_to_depth(revs, info->min_generation);
 		}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSoC Patch 3/3] commit: convert commit->graph_pos to a slab
  2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
  2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
  2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
@ 2020-06-04  7:27 ` Abhishek Kumar
  2020-06-07 12:12   ` Jakub Narębski
  2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-04  7:27 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

The member graph_pos refers to the integer position used to identify a
commit in commit-graph files. However, graph_pos is not useful in other
contexts and bloats the struct.

Let's move it to a commit-slab and shrink the struct by four bytes.

Existing references to graph_pos are replaced using
'contrib/coccinelle/graph_pos.cocci'.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 alloc.c                            |  1 -
 bloom.c                            |  6 ++--
 commit-graph.c                     | 50 +++++++++++++++++++++++-------
 commit-graph.h                     |  3 ++
 commit.c                           |  2 +-
 commit.h                           |  2 --
 contrib/coccinelle/graph_pos.cocci | 12 +++++++
 7 files changed, 58 insertions(+), 18 deletions(-)
 create mode 100644 contrib/coccinelle/graph_pos.cocci

diff --git a/alloc.c b/alloc.c
index cbed187094..f37fb3b8b6 100644
--- a/alloc.c
+++ b/alloc.c
@@ -108,7 +108,6 @@ void init_commit_node(struct repository *r, struct commit *c)
 {
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index(r);
-	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
 }
 
 void *alloc_commit_node(struct repository *r)
diff --git a/bloom.c b/bloom.c
index 9b86aa3f59..5bee5bb0c1 100644
--- a/bloom.c
+++ b/bloom.c
@@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 {
 	uint32_t lex_pos, start_index, end_index;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (graph_pos(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
-	lex_pos = c->graph_pos - g->num_commits_in_base;
+	lex_pos = graph_pos(c) - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
 
@@ -188,7 +188,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 
 	if (!filter->data) {
 		load_commit_graph_info(r, c);
-		if (c->graph_pos != COMMIT_NOT_FROM_GRAPH &&
+		if (graph_pos(c) != COMMIT_NOT_FROM_GRAPH &&
 			r->objects->commit_graph->chunk_bloom_indexes) {
 			if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c))
 				return filter;
diff --git a/commit-graph.c b/commit-graph.c
index 9ce7d4acb1..7ff460b442 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -87,6 +87,34 @@ static int commit_pos_cmp(const void *va, const void *vb)
 	       commit_pos_at(&commit_pos, b);
 }
 
+define_commit_slab(graph_pos_slab, uint32_t);
+static struct graph_pos_slab graph_pos_slab = COMMIT_SLAB_INIT(1, graph_pos_slab);
+
+uint32_t graph_pos(const struct commit *c)
+{
+	uint32_t *pos = graph_pos_slab_peek(&graph_pos_slab, c);
+
+	return pos ? *pos : COMMIT_NOT_FROM_GRAPH;
+}
+
+static void set_graph_pos(const struct commit *c, const uint32_t position)
+{
+	unsigned int i = graph_pos_slab.slab_count;
+	uint32_t *pos = graph_pos_slab_at(&graph_pos_slab, c);
+
+	/*
+	 * commit-slab initializes with zero, overwrite this with
+	 * COMMIT_NOT_FROM_GRAPH
+	 */
+	for (; i < graph_pos_slab.slab_count; ++i)
+	{
+		memset(graph_pos_slab.slab[i], COMMIT_NOT_FROM_GRAPH,
+		       graph_pos_slab.slab_size * sizeof(uint32_t));
+	}
+
+	*pos = position;
+}
+
 define_commit_slab(generation_slab, uint32_t);
 static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab);
 
@@ -697,7 +725,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 	c = lookup_commit(r, &oid);
 	if (!c)
 		die(_("could not find commit %s"), oid_to_hex(&oid));
-	c->graph_pos = pos;
+	set_graph_pos(c, pos);
 	return &commit_list_insert(c, pptr)->next;
 }
 
@@ -711,7 +739,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
-	item->graph_pos = pos;
+	set_graph_pos(item, pos);
 	set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2);
 }
 
@@ -741,7 +769,7 @@ static int fill_commit_in_graph(struct repository *r,
 	 * Store the "full" position, but then use the
 	 * "local" position for the rest of the calculation.
 	 */
-	item->graph_pos = pos;
+	set_graph_pos(item, pos);
 	lex_index = pos - g->num_commits_in_base;
 
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
@@ -786,8 +814,8 @@ static int fill_commit_in_graph(struct repository *r,
 
 static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
 {
-	if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
-		*pos = item->graph_pos;
+	if (graph_pos(item) != COMMIT_NOT_FROM_GRAPH) {
+		*pos = graph_pos(item);
 		return 1;
 	} else {
 		struct commit_graph *cur_g = g;
@@ -843,11 +871,11 @@ static struct tree *load_tree_for_commit(struct repository *r,
 	struct object_id oid;
 	const unsigned char *commit_data;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (graph_pos(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	commit_data = g->chunk_commit_data +
-			GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base);
+			GRAPH_DATA_WIDTH * (graph_pos(c) - g->num_commits_in_base);
 
 	hashcpy(oid.hash, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
@@ -861,7 +889,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r,
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
-	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
+	if (graph_pos(c) == COMMIT_NOT_FROM_GRAPH)
 		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
 
 	return load_tree_for_commit(r, g, (struct commit *)c);
@@ -1247,7 +1275,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 			continue;
 		if (ctx->split) {
 			if ((!parse_commit(commit) &&
-			     commit->graph_pos == COMMIT_NOT_FROM_GRAPH) ||
+			     graph_pos(commit) == COMMIT_NOT_FROM_GRAPH) ||
 			    flags == COMMIT_GRAPH_SPLIT_REPLACE)
 				add_missing_parents(ctx, commit);
 		} else if (!parse_commit_no_graph(commit))
@@ -1493,7 +1521,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
 			if (ctx->split) {
 				struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
-				if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH)
+				if (!c || graph_pos(c) != COMMIT_NOT_FROM_GRAPH)
 					continue;
 			}
 
@@ -1527,7 +1555,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE &&
-		    ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH)
+		    graph_pos(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH)
 			continue;
 
 		if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE)
diff --git a/commit-graph.h b/commit-graph.h
index 653bd041ad..3cb59ba336 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -8,6 +8,7 @@
 #include "object-store.h"
 #include "oidset.h"
 
+#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
 #define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
 #define GENERATION_NUMBER_MAX 0x3FFFFFFF
 #define GENERATION_NUMBER_ZERO 0
@@ -142,4 +143,6 @@ void free_commit_graph(struct commit_graph *);
 void disable_commit_graph(struct repository *r);
 
 uint32_t generation(const struct commit *c);
+
+uint32_t graph_pos(const struct commit *c);
 #endif
diff --git a/commit.c b/commit.c
index 8dad0f8446..da6de08b2b 100644
--- a/commit.c
+++ b/commit.c
@@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
 	if (commit->maybe_tree || !commit->object.parsed)
 		return commit->maybe_tree;
 
-	if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+	if (graph_pos(commit) != COMMIT_NOT_FROM_GRAPH)
 		return get_commit_tree_in_graph(r, commit);
 
 	return NULL;
diff --git a/commit.h b/commit.h
index 01e1c4c3eb..0b10464a10 100644
--- a/commit.h
+++ b/commit.h
@@ -10,8 +10,6 @@
 #include "pretty.h"
 #include "commit-slab.h"
 
-#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
-
 struct commit_list {
 	struct commit *item;
 	struct commit_list *next;
diff --git a/contrib/coccinelle/graph_pos.cocci b/contrib/coccinelle/graph_pos.cocci
new file mode 100644
index 0000000000..0929164bdf
--- /dev/null
+++ b/contrib/coccinelle/graph_pos.cocci
@@ -0,0 +1,12 @@
+@@
+struct commit *c;
+expression E;
+@@
+- c->graph_pos = E
++ set_graph_pos(c, E)
+
+@@
+struct commit *c;
+@@
+- c->graph_pos
++ graph_pos(c)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
                   ` (2 preceding siblings ...)
  2020-06-04  7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar
@ 2020-06-04 14:22 ` Derrick Stolee
  2020-06-04 17:55   ` Junio C Hamano
  2020-06-07 19:53   ` SZEDER Gábor
  2020-06-05 19:00 ` Jakub Narębski
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 39+ messages in thread
From: Derrick Stolee @ 2020-06-04 14:22 UTC (permalink / raw)
  To: Abhishek Kumar, git; +Cc: jnareb

On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> The struct commit is used in many contexts. However, members generation
> and graph_pos are only used for commit-graph related operations and
> otherwise waste memory.
> 
> This wastage would have been more pronounced as transistion to
> generation number v2, which uses 64-bit generation number instead of
> current 32-bits.

Thanks! This is an important step, and will already improve
performance in subtle ways.

> The third patch ("commit: convert commit->graph_pos to a slab",
> 2020-06-04) is currently failing diff-submodule related tests (t4041,
> t4059 and t4060) for gcc [1]. I am going to send a second version soon,
> fixing that.
> 
> [1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189
> 
> Abhishek Kumar (3):
>   commit: introduce helpers for generation slab
>   commit: convert commit->generation to a slab
>   commit: convert commit->graph_pos to a slab

If we have a commit-graph file, then we have graph_pos
and generation both coming from that file. Perhaps it
would be better to combine the data into a single slab
that stores a "struct commit_graph_data" or something?

This would change only the slab definitions, since you
already do a good job of wrapping the slab access in
methods.

>  alloc.c                             |   2 -
>  blame.c                             |   2 +-
>  bloom.c                             |   6 +-
>  commit-graph.c                      | 116 +++++++++++++++++++++-------
>  commit-graph.h                      |   8 ++
>  commit-reach.c                      |  50 ++++++------
>  commit.c                            |   6 +-
>  commit.h                            |   6 --
>  contrib/coccinelle/generation.cocci |  12 +++
>  contrib/coccinelle/graph_pos.cocci  |  12 +++
>  revision.c                          |  16 ++--
>  11 files changed, 158 insertions(+), 78 deletions(-)
>  create mode 100644 contrib/coccinelle/generation.cocci
>  create mode 100644 contrib/coccinelle/graph_pos.cocci

I appreciate the Coccinelle scripts to help identify
automatic fixes for other topics in-flight. However,
I wonder if they would be better placed inside the
existing commit.cocci file?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 2/3] commit: convert commit->generation to a slab
  2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
@ 2020-06-04 14:27   ` Derrick Stolee
  2020-06-04 17:49   ` Junio C Hamano
  2020-06-06 22:03   ` Jakub Narębski
  2 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2020-06-04 14:27 UTC (permalink / raw)
  To: Abhishek Kumar, git; +Cc: jnareb

On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> In this commit, we will use the generation slab helpers introduced in
> last commit and replace existing uses of commit->generation using
> 'contrib/coccinelle/generation.cocci'
> @@ -1048,7 +1048,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  		else
>  			packedDate[0] = 0;
>  
> -		packedDate[0] |= htonl((*list)->generation << 2);
> +		packedDate[0] |= htonl(generation((*list)) << 2);

nit: We no longer need the extra parens around *list.

> @@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs)
>  		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
>  		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
>  
> -		if (c->generation < info->min_generation)
> -			info->min_generation = c->generation;
> +		if (generation(c) < info->min_generation)
> +			info->min_generation = generation(c);

A pattern I've noticed in several places is that the struct
member is accessed multiple times in the same method body,
and this is auto-converted to multiple method calls. However,
these values are fixed, so it would be better to store the
value as a local variable and reuse that variable instead.

This is one of the shortcomings of the Coccinelle transformation,
so you'll need to manually inspect each of the diff fragments to
see if we can reduce the number of method calls. It might be
helpful to do that as a follow-up, so we can see that this patch
is generated by the Coccinelle script, and then a later patch can
be scrutinized more carefully when you are doing manual code
manipulation.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 1/3] commit: introduce helpers for generation slab
  2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
@ 2020-06-04 14:36   ` Derrick Stolee
  2020-06-04 17:35   ` Junio C Hamano
  2020-06-05 23:23   ` Jakub Narębski
  2 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2020-06-04 14:36 UTC (permalink / raw)
  To: Abhishek Kumar, git; +Cc: jnareb

On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> The struct member generation refers to "generation number" (or more
> broadly, a reachablity index value) used by commit-graph to reduce time
> taken to walk commits. However, generation is not useful in other
> contexts and bloats the struct.
> 
> Let's move it to a commit-slab and shrink the struct by four bytes.
> 
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  commit-graph.c | 27 +++++++++++++++++++++++++++
>  commit-graph.h |  5 +++++
>  commit.h       |  3 ---
>  3 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index e3420ddcbf..63f419048d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb)
>  	       commit_pos_at(&commit_pos, b);
>  }
>  
> +define_commit_slab(generation_slab, uint32_t);
> +static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab);
> +
> +uint32_t generation(const struct commit *c)
> +{
> +	uint32_t *gen = generation_slab_peek(&generation_slab, c);
> +
> +	return gen ? *gen : GENERATION_NUMBER_INFINITY;
> +}

This is good: if we don't have the value, then use INFINITY.
In the header file, perhaps include a warning comment that a
caller _must_ first parse the commit or else we have no guarantee
that the generation slab is populated. This matches the current
expectations before accessing the generation member.

> +static void set_generation(const struct commit *c, const uint32_t generation)
> +{
> +	unsigned int i = generation_slab.slab_count;
> +	uint32_t *gen = generation_slab_at(&generation_slab, c);
> +
> +	/*
> +	 * commit-slab initializes with zero, overwrite this with
> +	 * GENERATION_NUMBER_INFINITY
> +	 */
> +	for (; i < generation_slab.slab_count; ++i) {
> +		memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY,
> +		       generation_slab.slab_size * sizeof(uint32_t));
> +	}

Here is an example where combining the graph_pos and generation
slabs into one would be helpful. The only reason the generation
would be INFINITY is if graph_pos is COMMIT_NOT_FROM_GRAPH. If
the two values are side-by-side, we could just check graph_pos
first and return INFINITY instead of paying this initialization
cost as the slab grows.

I would also like to avoid initializing the slab if there is
no commit-graph present. I wonder if we can populate the slab
while parsing the commit-graph and check here if the slab is
NULL before doing any other logic? (I'm not sure if this is
possible, but it would be nice.)

> diff --git a/commit-graph.h b/commit-graph.h
> index 4212766a4f..653bd041ad 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -8,6 +8,10 @@
>  #include "object-store.h"
>  #include "oidset.h"
>  
> +#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
> +#define GENERATION_NUMBER_MAX 0x3FFFFFFF
> +#define GENERATION_NUMBER_ZERO 0
> +
>  #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
>  #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
>  #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
> @@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *);
>   */
>  void disable_commit_graph(struct repository *r);
>  
> +uint32_t generation(const struct commit *c);
>  #endif
> diff --git a/commit.h b/commit.h
> index 1b2dea5d85..cc610400d5 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -11,9 +11,6 @@
>  #include "commit-slab.h"
>  
>  #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
> -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
> -#define GENERATION_NUMBER_MAX 0x3FFFFFFF
> -#define GENERATION_NUMBER_ZERO 0

I appreciate that you are able to relocate these constants to
a more appropriate location.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 1/3] commit: introduce helpers for generation slab
  2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
  2020-06-04 14:36   ` Derrick Stolee
@ 2020-06-04 17:35   ` Junio C Hamano
  2020-06-05 23:23   ` Jakub Narębski
  2 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2020-06-04 17:35 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, stolee, jnareb

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> The struct member generation refers to "generation number" (or more
> broadly, a reachablity index value) used by commit-graph to reduce time
> taken to walk commits. However, generation is not useful in other
> contexts and bloats the struct.
>
> Let's move it to a commit-slab and shrink the struct by four bytes.
>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  commit-graph.c | 27 +++++++++++++++++++++++++++
>  commit-graph.h |  5 +++++
>  commit.h       |  3 ---
>  3 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index e3420ddcbf..63f419048d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb)
>  	       commit_pos_at(&commit_pos, b);
>  }
>  
> +define_commit_slab(generation_slab, uint32_t);
> +static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab);
> +
> +uint32_t generation(const struct commit *c)
> +{
> +	uint32_t *gen = generation_slab_peek(&generation_slab, c);
> +
> +	return gen ? *gen : GENERATION_NUMBER_INFINITY;
> +}
> +
> +static void set_generation(const struct commit *c, const uint32_t generation)
> +{
> +	unsigned int i = generation_slab.slab_count;
> +	uint32_t *gen = generation_slab_at(&generation_slab, c);
> +
> +	/*
> +	 * commit-slab initializes with zero, overwrite this with
> +	 * GENERATION_NUMBER_INFINITY
> +	 */
> +	for (; i < generation_slab.slab_count; ++i) {

Style: favor post-increment over pre-increment when there is no
difference, especially when updating the loop control in for() loop.

> +		memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY,
> +		       generation_slab.slab_size * sizeof(uint32_t));
> +	}
> +
> +	*gen = generation;
> +}
> +
>  static int commit_gen_cmp(const void *va, const void *vb)
>  {
>  	const struct commit *a = *(const struct commit **)va;
> diff --git a/commit-graph.h b/commit-graph.h
> index 4212766a4f..653bd041ad 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -8,6 +8,10 @@
>  #include "object-store.h"
>  #include "oidset.h"
>  
> +#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
> +#define GENERATION_NUMBER_MAX 0x3FFFFFFF
> +#define GENERATION_NUMBER_ZERO 0
> +

Makes sense to move it from commit.h, I guess.

>  #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
>  #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
>  #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
> @@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *);
>   */
>  void disable_commit_graph(struct repository *r);
>  
> +uint32_t generation(const struct commit *c);
>  #endif
> diff --git a/commit.h b/commit.h
> index 1b2dea5d85..cc610400d5 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -11,9 +11,6 @@
>  #include "commit-slab.h"
>  
>  #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
> -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
> -#define GENERATION_NUMBER_MAX 0x3FFFFFFF
> -#define GENERATION_NUMBER_ZERO 0
>  
>  struct commit_list {
>  	struct commit *item;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 2/3] commit: convert commit->generation to a slab
  2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
  2020-06-04 14:27   ` Derrick Stolee
@ 2020-06-04 17:49   ` Junio C Hamano
  2020-06-06 22:03   ` Jakub Narębski
  2 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2020-06-04 17:49 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, stolee, jnareb

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> In this commit, we will use the generation slab helpers introduced in
> last commit and replace existing uses of commit->generation using
> 'contrib/coccinelle/generation.cocci'
>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  alloc.c                             |  1 -
>  blame.c                             |  2 +-
>  commit-graph.c                      | 39 +++++++++++-----------
>  commit-reach.c                      | 50 ++++++++++++++---------------
>  commit.c                            |  4 +--
>  commit.h                            |  1 -
>  contrib/coccinelle/generation.cocci | 12 +++++++
>  revision.c                          | 16 ++++-----
>  8 files changed, 68 insertions(+), 57 deletions(-)
>  create mode 100644 contrib/coccinelle/generation.cocci
>
> diff --git a/alloc.c b/alloc.c
> index 1c64c4dd16..cbed187094 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -109,7 +109,6 @@ void init_commit_node(struct repository *r, struct commit *c)
>  	c->object.type = OBJ_COMMIT;
>  	c->index = alloc_commit_index(r);
>  	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
> -	c->generation = GENERATION_NUMBER_INFINITY;
>  }
>  
>  void *alloc_commit_node(struct repository *r)
> diff --git a/blame.c b/blame.c
> index da7e28800e..50e6316076 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
>  	if (!bd)
>  		return 1;
>  
> -	if (origin->commit->generation == GENERATION_NUMBER_INFINITY)
> +	if (generation(origin->commit) == GENERATION_NUMBER_INFINITY)

Hmmmm, as C is not all that object-oriented that lets us say "commit
objects have generation() method", a plain vanilla function whose
name is generation() is a bit overly vague.  The field name this
helper function replaces, .generation, is very localized to a commit
"object" and does not have such a problem.

We probably need to choose a better name in the previous step to fix
it.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee
@ 2020-06-04 17:55   ` Junio C Hamano
  2020-06-07 19:53   ` SZEDER Gábor
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2020-06-04 17:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhishek Kumar, git, jnareb

Derrick Stolee <stolee@gmail.com> writes:

> If we have a commit-graph file, then we have graph_pos
> and generation both coming from that file. Perhaps it
> would be better to combine the data into a single slab
> that stores a "struct commit_graph_data" or something?

Excellent.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
                   ` (3 preceding siblings ...)
  2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee
@ 2020-06-05 19:00 ` Jakub Narębski
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
  2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
  6 siblings, 0 replies; 39+ messages in thread
From: Jakub Narębski @ 2020-06-05 19:00 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, Derrick Stolee, Jakub Narębski

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> The struct commit is used in many contexts. However, members generation
> and graph_pos are only used for commit-graph related operations and
> otherwise waste memory.

Very minor nitpick: this sentence would read better if the names of
`generation` and `graph_pos` fields (but especially the 'generation')
were quoted.

>
> This wastage would have been more pronounced as transistion to
> generation number v2, which uses 64-bit generation number instead of
> current 32-bits.

Good.  Moving reachability index value into a commit slab was one of
prerequisites to switching to the generation number v2, see [2]

[2]: https://public-inbox.org/git/cfa2c367-5cd7-add5-0293-caa75b103f34@gmail.com/t/#u

The other prerequisite was proper handling of commit-graph format
change, either by using "metadata chunk" as more flexible replacement of
mishandled format version field in the commit-graph file header, or as
proposed in [3] (and subsequent posts), removing "CDAT" chunk and
replacing it with "CDA2" chunk.

[3]: https://public-inbox.org/git/xmqq369z7i1b.fsf@gitster.c.googlers.com/t/#u


Also, we should probably stop mishandling the format version field, that
is do not error out [4] when commit-graph version of the file does not
match version supported by git code running the command, but just simply
not use the commit-graph (like it is done for Bloom filter chunks).

[4]: https://github.com/git/git/blob/master/commit-graph.c#L253

>
> The third patch ("commit: convert commit->graph_pos to a slab",
> 2020-06-04) is currently failing diff-submodule related tests (t4041,
> t4059 and t4060) for gcc [1]. I am going to send a second version soon,
> fixing that.
>
> [1]: https://travis-ci.com/github/abhishekkumar2718/git/jobs/343441189
>
> Abhishek Kumar (3):
>   commit: introduce helpers for generation slab
>   commit: convert commit->generation to a slab
>   commit: convert commit->graph_pos to a slab
>
>  alloc.c                             |   2 -
>  blame.c                             |   2 +-
>  bloom.c                             |   6 +-
>  commit-graph.c                      | 116 +++++++++++++++++++++-------
>  commit-graph.h                      |   8 ++
>  commit-reach.c                      |  50 ++++++------
>  commit.c                            |   6 +-
>  commit.h                            |   6 --
>  contrib/coccinelle/generation.cocci |  12 +++
>  contrib/coccinelle/graph_pos.cocci  |  12 +++

It is nice to see the use of Coccinelle scripts.

>  revision.c                          |  16 ++--
>  11 files changed, 158 insertions(+), 78 deletions(-)
>  create mode 100644 contrib/coccinelle/generation.cocci
>  create mode 100644 contrib/coccinelle/graph_pos.cocci

Best,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 1/3] commit: introduce helpers for generation slab
  2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
  2020-06-04 14:36   ` Derrick Stolee
  2020-06-04 17:35   ` Junio C Hamano
@ 2020-06-05 23:23   ` Jakub Narębski
  2 siblings, 0 replies; 39+ messages in thread
From: Jakub Narębski @ 2020-06-05 23:23 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, Derrick Stolee

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> The struct member generation refers to "generation number" (or more

Again, a minor thing: 'generation'.

> broadly, a reachablity index value) used by commit-graph to reduce time
> taken to walk commits. However, generation is not useful in other
> contexts and bloats the struct.
>
> Let's move it to a commit-slab and shrink the struct by four bytes.

It looks like the description is from earlier version of the commit,
before it was split -- because this commit does not remove 'generation'
member from the 'struct commit', actually.

This commit is about creating helper functions.

>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  commit-graph.c | 27 +++++++++++++++++++++++++++
>  commit-graph.h |  5 +++++
>  commit.h       |  3 ---
>  3 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index e3420ddcbf..63f419048d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -87,6 +87,33 @@ static int commit_pos_cmp(const void *va, const void *vb)
>  	       commit_pos_at(&commit_pos, b);
>  }
>  
> +define_commit_slab(generation_slab, uint32_t);
> +static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab);

All right, we need this for the following helper functions to work.

We might want to encapsulate all commit-graph data together, in a single
struct (e.g. as 'struct commit_graph_data' instead of uint32_t here).

On the other hand other data is stored on slab often as separate
scalar data (contains_cache, commit_seen, indegree_slab,
author_date_slab, commit_base, commit_pos), but not always; sometimes it
is a struct (bloom_filter_slab, buffer_slab, commit_rev_name), sometimes
it is an array (commit_depth, ref_bitmap, commit_weight), and sometimes
it is an array/list of structs or pointer to struct (commit_names,
commit_name_slab, saved_parents, blame_suspects, commit_todo_item).

> +
> +uint32_t generation(const struct commit *c)
> +{
> +	uint32_t *gen = generation_slab_peek(&generation_slab, c);
> +
> +	return gen ? *gen : GENERATION_NUMBER_INFINITY;
> +}

All right, this is a synthetic getter using the fact that commits
outside the commit-graph should get GENERATION_NUMBER_INFINITY (because
[effective] commit-graph is closed under reachability, is full DAG).

Should we have something like that for 'graph_pos' and
COMMIT_NOT_FROM_GRAPH?

> +
> +static void set_generation(const struct commit *c, const uint32_t generation)
> +{
> +	unsigned int i = generation_slab.slab_count;
> +	uint32_t *gen = generation_slab_at(&generation_slab, c);
> +
> +	/*
> +	 * commit-slab initializes with zero, overwrite this with
> +	 * GENERATION_NUMBER_INFINITY
> +	 */
> +	for (; i < generation_slab.slab_count; ++i) {
> +		memset(generation_slab.slab[i], GENERATION_NUMBER_INFINITY,
> +		       generation_slab.slab_size * sizeof(uint32_t));
> +	}
> +
> +	*gen = generation;
> +}

All right. I wonder if putting 'generation' and 'graph_pos' on the slab
together, gathered in 'struct commit_graph_data' would make this helper
more complex...

> +
>  static int commit_gen_cmp(const void *va, const void *vb)
>  {
>  	const struct commit *a = *(const struct commit **)va;
> diff --git a/commit-graph.h b/commit-graph.h
> index 4212766a4f..653bd041ad 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -8,6 +8,10 @@
>  #include "object-store.h"
>  #include "oidset.h"
>  
> +#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
> +#define GENERATION_NUMBER_MAX 0x3FFFFFFF
> +#define GENERATION_NUMBER_ZERO 0
> +
>  #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
>  #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
>  #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"
> @@ -137,4 +141,5 @@ void free_commit_graph(struct commit_graph *);
>   */
>  void disable_commit_graph(struct repository *r);
>  
> +uint32_t generation(const struct commit *c);
>  #endif
> diff --git a/commit.h b/commit.h
> index 1b2dea5d85..cc610400d5 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -11,9 +11,6 @@
>  #include "commit-slab.h"
>  
>  #define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
> -#define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
> -#define GENERATION_NUMBER_MAX 0x3FFFFFFF
> -#define GENERATION_NUMBER_ZERO 0
>  
>  struct commit_list {
>  	struct commit *item;

Why this change?

Best,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 2/3] commit: convert commit->generation to a slab
  2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
  2020-06-04 14:27   ` Derrick Stolee
  2020-06-04 17:49   ` Junio C Hamano
@ 2020-06-06 22:03   ` Jakub Narębski
  2 siblings, 0 replies; 39+ messages in thread
From: Jakub Narębski @ 2020-06-06 22:03 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, Derrick Stolee

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> In this commit, we will use the generation slab helpers introduced in
> last commit and replace existing uses of commit->generation using
> 'contrib/coccinelle/generation.cocci'
>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  alloc.c                             |  1 -
>  blame.c                             |  2 +-
>  commit-graph.c                      | 39 +++++++++++-----------
>  commit-reach.c                      | 50 ++++++++++++++---------------
>  commit.c                            |  4 +--
>  commit.h                            |  1 -
>  contrib/coccinelle/generation.cocci | 12 +++++++
>  revision.c                          | 16 ++++-----
>  8 files changed, 68 insertions(+), 57 deletions(-)
>  create mode 100644 contrib/coccinelle/generation.cocci
>

For easier review I have changed the order of files, grouping together
different categories of changes.


First category is removing commit->generation and related changes:

> diff --git a/commit.h b/commit.h
> index cc610400d5..01e1c4c3eb 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -34,7 +34,6 @@ struct commit {
>  	 */
>  	struct tree *maybe_tree;
>  	uint32_t graph_pos;
> -	uint32_t generation;
>  	unsigned int index;
>  };
>  

This is quite straightforward.

> diff --git a/alloc.c b/alloc.c
> index 1c64c4dd16..cbed187094 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -109,7 +109,6 @@ void init_commit_node(struct repository *r, struct commit *c)
>  	c->object.type = OBJ_COMMIT;
>  	c->index = alloc_commit_index(r);
>  	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
> -	c->generation = GENERATION_NUMBER_INFINITY;
>  }
>  
>  void *alloc_commit_node(struct repository *r)

But this change might need a more detailed write-up in the commit
message.

If I understand it correctly, this is function is used by generic commit
allocator, and given commit object may not need generation number. That
is why the default value of GENERATION_NUMBER_INFINITY is handled by new
"getters" and "setters", i.e. generation() and set_generation() -- which
are called only if commit-graph is present, I think.

The generation() returns GENERATION_NUMBER_INFINITY for commits not in
generation_slab, assuming that for all commits in the commit graph it
would be filled by commit-graph parsing -- just like init_commit_node()
would initialize commit->generation to this value.

Because <slabname>_at() (including generation_slab_at()) extends array
if necessary, we want all data on generation_slab to be correctly
initialized to GENERATION_NUMBER_INFINITY, just like init_commit_node()
would do it, because generation() "getter" cannot.  The commit might be
present on generation_slab just because allocation is done in chunks
(slabs).


Second category is semantic patch that generates the rest of changes
(which could be adjusted manually in subsequent commits).

> diff --git a/contrib/coccinelle/generation.cocci b/contrib/coccinelle/generation.cocci
> new file mode 100644
> index 0000000000..da13c44856
> --- /dev/null
> +++ b/contrib/coccinelle/generation.cocci
> @@ -0,0 +1,12 @@
> +@@
> +struct commit *c;
> +expression E;
> +@@
> +- c->generation = E
> ++ set_generation(c, E)
> +
> +@@
> +struct commit *c;
> +@@
> +- c->generation
> ++ generation(c)

I wonder if Coccinelle is able to automatically discard extra
parentheses (the problem noticed by Stolee in his reply) with the
following chunk:

  +@@
  +struct commit *c;
  +@@
  +- (c)->generation
  ++ generation(c)


Third category is all the changes that are just straight mechanical
changes being the result of applying the Coccinelle patch.

> diff --git a/blame.c b/blame.c
> index da7e28800e..50e6316076 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
>  	if (!bd)
>  		return 1;
>  
> -	if (origin->commit->generation == GENERATION_NUMBER_INFINITY)
> +	if (generation(origin->commit) == GENERATION_NUMBER_INFINITY)
>  		return 1;
>  
>  	filter = get_bloom_filter(r, origin->commit, 0);
> diff --git a/commit-graph.c b/commit-graph.c
> index 63f419048d..9ce7d4acb1 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -120,9 +120,9 @@ static int commit_gen_cmp(const void *va, const void *vb)
>  	const struct commit *b = *(const struct commit **)vb;
>  
>  	/* lower generation commits first */
> -	if (a->generation < b->generation)
> +	if (generation(a) < generation(b))
>  		return -1;
> -	else if (a->generation > b->generation)
> +	else if (generation(a) > generation(b))
>  		return 1;
>  
>  	/* use date as a heuristic when generations are equal */
> @@ -712,7 +712,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
>  	lex_index = pos - g->num_commits_in_base;
>  	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
>  	item->graph_pos = pos;
> -	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> +	set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2);
>  }
>  
>  static inline void set_commit_tree(struct commit *c, struct tree *t)
> @@ -754,7 +754,7 @@ static int fill_commit_in_graph(struct repository *r,
>  	date_low = get_be32(commit_data + g->hash_len + 12);
>  	item->date = (timestamp_t)((date_high << 32) | date_low);
>  
> -	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> +	set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2);
>  
>  	pptr = &item->parents;
>  
> @@ -1048,7 +1048,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  		else
>  			packedDate[0] = 0;
>  
> -		packedDate[0] |= htonl((*list)->generation << 2);
> +		packedDate[0] |= htonl(generation((*list)) << 2);
>  
>  		packedDate[1] = htonl((*list)->date);
>  		hashwrite(f, packedDate, 8);
> @@ -1280,8 +1280,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					ctx->commits.nr);
>  	for (i = 0; i < ctx->commits.nr; i++) {
>  		display_progress(ctx->progress, i + 1);
> -		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
> -		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
> +		if (generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
> +		    generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
>  			continue;
>  
>  		commit_list_insert(ctx->commits.list[i], &list);
> @@ -1292,22 +1292,23 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  			uint32_t max_generation = 0;
>  
>  			for (parent = current->parents; parent; parent = parent->next) {
> -				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
> -				    parent->item->generation == GENERATION_NUMBER_ZERO) {
> +				if (generation(parent->item) == GENERATION_NUMBER_INFINITY ||
> +				    generation(parent->item) == GENERATION_NUMBER_ZERO) {
>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;
> -				} else if (parent->item->generation > max_generation) {
> -					max_generation = parent->item->generation;
> +				} else if (generation(parent->item) > max_generation) {
> +					max_generation = generation(parent->item);
>  				}
>  			}

Here generation(parent->item) is called three times, which is probably
cost effective to save the value to the local variable (as Stolee
noticed).

>  
>  			if (all_parents_computed) {
> -				current->generation = max_generation + 1;
> +				set_generation(current, max_generation + 1);
>  				pop_commit(&list);
>  
> -				if (current->generation > GENERATION_NUMBER_MAX)
> -					current->generation = GENERATION_NUMBER_MAX;
> +				if (generation(current) > GENERATION_NUMBER_MAX)
> +					set_generation(current,
> +						       GENERATION_NUMBER_MAX);
>  			}
>  		}
>  	}
> @@ -2314,8 +2315,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  					     oid_to_hex(&graph_parents->item->object.oid),
>  					     oid_to_hex(&odb_parents->item->object.oid));
>  
> -			if (graph_parents->item->generation > max_generation)
> -				max_generation = graph_parents->item->generation;
> +			if (generation(graph_parents->item) > max_generation)
> +				max_generation = generation(graph_parents->item);
>  
>  			graph_parents = graph_parents->next;
>  			odb_parents = odb_parents->next;
> @@ -2325,7 +2326,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  			graph_report(_("commit-graph parent list for commit %s terminates early"),
>  				     oid_to_hex(&cur_oid));
>  
> -		if (!graph_commit->generation) {
> +		if (!generation(graph_commit)) {
>  			if (generation_zero == GENERATION_NUMBER_EXISTS)
>  				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
>  					     oid_to_hex(&cur_oid));
> @@ -2345,10 +2346,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
>  		if (max_generation == GENERATION_NUMBER_MAX)
>  			max_generation--;
>  
> -		if (graph_commit->generation != max_generation + 1)
> +		if (generation(graph_commit) != max_generation + 1)
>  			graph_report(_("commit-graph generation for commit %s is %u != %u"),
>  				     oid_to_hex(&cur_oid),
> -				     graph_commit->generation,
> +				     generation(graph_commit),
>  				     max_generation + 1);
>  
>  		if (graph_commit->date != odb_commit->date)
> diff --git a/commit-reach.c b/commit-reach.c
> index 4ca7e706a1..77c980054a 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r,
>  		struct commit_list *parents;
>  		int flags;
>  
> -		if (min_generation && commit->generation > last_gen)
> +		if (min_generation && generation(commit) > last_gen)
>  			BUG("bad generation skip %8x > %8x at %s",
> -			    commit->generation, last_gen,
> +			    generation(commit), last_gen,
>  			    oid_to_hex(&commit->object.oid));
> -		last_gen = commit->generation;
> +		last_gen = generation(commit);
>  
> -		if (commit->generation < min_generation)
> +		if (generation(commit) < min_generation)
>  			break;

Here generation(commit) is called three times (two times in the
exceptional case).

>  
>  		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
> @@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
>  		repo_parse_commit(r, array[i]);
>  	for (i = 0; i < cnt; i++) {
>  		struct commit_list *common;
> -		uint32_t min_generation = array[i]->generation;
> +		uint32_t min_generation = generation(array[i]);
>  
>  		if (redundant[i])
>  			continue;
> @@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
>  			filled_index[filled] = j;
>  			work[filled++] = array[j];
>  
> -			if (array[j]->generation < min_generation)
> -				min_generation = array[j]->generation;
> +			if (generation(array[j]) < min_generation)
> +				min_generation = generation(array[j]);
>  		}
>  		common = paint_down_to_common(r, array[i], filled,
>  					      work, min_generation);
> @@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
>  	for (i = 0; i < nr_reference; i++) {
>  		if (repo_parse_commit(r, reference[i]))
>  			return ret;
> -		if (reference[i]->generation < min_generation)
> -			min_generation = reference[i]->generation;
> +		if (generation(reference[i]) < min_generation)
> +			min_generation = generation(reference[i]);
>  	}
>  
> -	if (commit->generation > min_generation)
> +	if (generation(commit) > min_generation)
>  		return ret;
>  
>  	bases = paint_down_to_common(r, commit,
>  				     nr_reference, reference,
> -				     commit->generation);
> +				     generation(commit));
>  	if (commit->object.flags & PARENT2)
>  		ret = 1;
>  	clear_commit_marks(commit, all_flags);
> @@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate,
>  	/* Otherwise, we don't know; prepare to recurse */
>  	parse_commit_or_die(candidate);
>  
> -	if (candidate->generation < cutoff)
> +	if (generation(candidate) < cutoff)
>  		return CONTAINS_NO;
>  
>  	return CONTAINS_UNKNOWN;
> @@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>  	for (p = want; p; p = p->next) {
>  		struct commit *c = p->item;
>  		load_commit_graph_info(the_repository, c);
> -		if (c->generation < cutoff)
> -			cutoff = c->generation;
> +		if (generation(c) < cutoff)
> +			cutoff = generation(c);
>  	}
>  
>  	result = contains_test(candidate, want, cache, cutoff);
> @@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
>  	const struct commit *a = *(const struct commit * const *)_a;
>  	const struct commit *b = *(const struct commit * const *)_b;
>  
> -	if (a->generation < b->generation)
> +	if (generation(a) < generation(b))
>  		return -1;
> -	if (a->generation > b->generation)
> +	if (generation(a) > generation(b))
>  		return 1;
>  	return 0;
>  }
> @@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
>  
>  		list[nr_commits] = (struct commit *)from_one;
>  		if (parse_commit(list[nr_commits]) ||
> -		    list[nr_commits]->generation < min_generation) {
> +		    generation(list[nr_commits]) < min_generation) {
>  			result = 0;
>  			goto cleanup;
>  		}
> @@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
>  
>  					if (parse_commit(parent->item) ||
>  					    parent->item->date < min_commit_date ||
> -					    parent->item->generation < min_generation)
> +					    generation(parent->item) < min_generation)
>  						continue;
>  
>  					commit_list_insert(parent->item, &stack);
> @@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
>  			if (from_iter->item->date < min_commit_date)
>  				min_commit_date = from_iter->item->date;
>  
> -			if (from_iter->item->generation < min_generation)
> -				min_generation = from_iter->item->generation;
> +			if (generation(from_iter->item) < min_generation)
> +				min_generation = generation(from_iter->item);
>  		}
>  
>  		from_iter = from_iter->next;
> @@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
>  			if (to_iter->item->date < min_commit_date)
>  				min_commit_date = to_iter->item->date;
>  
> -			if (to_iter->item->generation < min_generation)
> -				min_generation = to_iter->item->generation;
> +			if (generation(to_iter->item) < min_generation)
> +				min_generation = generation(to_iter->item);
>  		}
>  
>  		to_iter->item->object.flags |= PARENT2;
> @@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
>  		struct commit *c = *item;
>  
>  		parse_commit(c);
> -		if (c->generation < min_generation)
> -			min_generation = c->generation;
> +		if (generation(c) < min_generation)
> +			min_generation = generation(c);
>  
>  		if (!(c->object.flags & PARENT1)) {
>  			c->object.flags |= PARENT1;
> @@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
>  
>  			parse_commit(p);
>  
> -			if (p->generation < min_generation)
> +			if (generation(p) < min_generation)
>  				continue;
>  
>  			if (p->object.flags & PARENT2)
> diff --git a/commit.c b/commit.c
> index 87686a7055..8dad0f8446 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void
>  	const struct commit *a = a_, *b = b_;
>  
>  	/* newer commits first */
> -	if (a->generation < b->generation)
> +	if (generation(a) < generation(b))
>  		return 1;
> -	else if (a->generation > b->generation)
> +	else if (generation(a) > generation(b))
>  		return -1;
>  
>  	/* use date as a heuristic when generations are equal */
> diff --git a/revision.c b/revision.c
> index 60cca8c0b9..d76382007c 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -720,7 +720,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
>  	if (!revs->repo->objects->commit_graph)
>  		return -1;
>  
> -	if (commit->generation == GENERATION_NUMBER_INFINITY)
> +	if (generation(commit) == GENERATION_NUMBER_INFINITY)
>  		return -1;
>  
>  	filter = get_bloom_filter(revs->repo, commit, 0);
> @@ -3314,7 +3314,7 @@ static void explore_to_depth(struct rev_info *revs,
>  	struct topo_walk_info *info = revs->topo_walk_info;
>  	struct commit *c;
>  	while ((c = prio_queue_peek(&info->explore_queue)) &&
> -	       c->generation >= gen_cutoff)
> +	       generation(c) >= gen_cutoff)
>  		explore_walk_step(revs);
>  }
>  
> @@ -3330,7 +3330,7 @@ static void indegree_walk_step(struct rev_info *revs)
>  	if (parse_commit_gently(c, 1) < 0)
>  		return;
>  
> -	explore_to_depth(revs, c->generation);
> +	explore_to_depth(revs, generation(c));
>  
>  	for (p = c->parents; p; p = p->next) {
>  		struct commit *parent = p->item;
> @@ -3354,7 +3354,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs,
>  	struct topo_walk_info *info = revs->topo_walk_info;
>  	struct commit *c;
>  	while ((c = prio_queue_peek(&info->indegree_queue)) &&
> -	       c->generation >= gen_cutoff)
> +	       generation(c) >= gen_cutoff)
>  		indegree_walk_step(revs);
>  }
>  
> @@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs)
>  		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
>  		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
>  
> -		if (c->generation < info->min_generation)
> -			info->min_generation = c->generation;
> +		if (generation(c) < info->min_generation)
> +			info->min_generation = generation(c);
>  
>  		*(indegree_slab_at(&info->indegree, c)) = 1;
>  
> @@ -3473,8 +3473,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
>  		if (parse_commit_gently(parent, 1) < 0)
>  			continue;
>  
> -		if (parent->generation < info->min_generation) {
> -			info->min_generation = parent->generation;
> +		if (generation(parent) < info->min_generation) {
> +			info->min_generation = generation(parent);
>  			compute_indegrees_to_depth(revs, info->min_generation);
>  		}

Best,
--
Jakub Narębski

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 3/3] commit: convert commit->graph_pos to a slab
  2020-06-04  7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar
@ 2020-06-07 12:12   ` Jakub Narębski
  0 siblings, 0 replies; 39+ messages in thread
From: Jakub Narębski @ 2020-06-07 12:12 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, Derrick Stolee

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> The member graph_pos refers to the integer position used to identify a
> commit in commit-graph files. However, graph_pos is not useful in other
> contexts and bloats the struct.
>
> Let's move it to a commit-slab and shrink the struct by four bytes.
>
> Existing references to graph_pos are replaced using
> 'contrib/coccinelle/graph_pos.cocci'.
>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  alloc.c                            |  1 -
>  bloom.c                            |  6 ++--
>  commit-graph.c                     | 50 +++++++++++++++++++++++-------
>  commit-graph.h                     |  3 ++
>  commit.c                           |  2 +-
>  commit.h                           |  2 --
>  contrib/coccinelle/graph_pos.cocci | 12 +++++++
>  7 files changed, 58 insertions(+), 18 deletions(-)
>  create mode 100644 contrib/coccinelle/graph_pos.cocci

I have reordered the chunks to make it easier to review.

> diff --git a/commit.h b/commit.h
> index 01e1c4c3eb..0b10464a10 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -10,8 +10,6 @@
>  #include "pretty.h"
>  #include "commit-slab.h"
>  
> -#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
> -
>  struct commit_list {
>  	struct commit *item;
>  	struct commit_list *next;
> diff --git a/commit-graph.h b/commit-graph.h
> index 653bd041ad..3cb59ba336 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -8,6 +8,7 @@
>  #include "object-store.h"
>  #include "oidset.h"
>  
> +#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
>  #define GENERATION_NUMBER_INFINITY 0xFFFFFFFF
>  #define GENERATION_NUMBER_MAX 0x3FFFFFFF
>  #define GENERATION_NUMBER_ZERO 0

Those two chunks move COMMIT_NOT_FROM_GRAPH from commit.h to
commit-graph.h, because it is no longer needed in init_commit_node()
from alloc.c.  On the other hand the remaining commit-graph #define-s
were moved in first commit in series.


What I DO NOT SEE in this commit is actual *removal* of `graph_pos`
field from the `struct commit`, i.e.:

  diff --git a/commit.h b/commit.h
  index cc610400d5..01e1c4c3eb 100644
  --- a/commit.h
  +++ b/commit.h
  @@ -34,6 +34,5 @@ struct commit {
   	 */
   	struct tree *maybe_tree;
  -	uint32_t graph_pos;
   	unsigned int index;
   };
   


> diff --git a/alloc.c b/alloc.c
> index cbed187094..f37fb3b8b6 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -108,7 +108,6 @@ void init_commit_node(struct repository *r, struct commit *c)
>  {
>  	c->object.type = OBJ_COMMIT;
>  	c->index = alloc_commit_index(r);
> -	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
>  }
>
>  void *alloc_commit_node(struct repository *r)

This removes commit->graph_pos initialization from init_commit_node(),
and thus from alloc_commit_node(); the handling of COMMIT_NOT_FROM_GRAPH
is moved to setter and getter "methods", see below.

> diff --git a/commit-graph.c b/commit-graph.c
> index 9ce7d4acb1..7ff460b442 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -87,6 +87,34 @@ static int commit_pos_cmp(const void *va, const void *vb)
>  	       commit_pos_at(&commit_pos, b);
>  }
>  
> +define_commit_slab(graph_pos_slab, uint32_t);
> +static struct graph_pos_slab graph_pos_slab = COMMIT_SLAB_INIT(1, graph_pos_slab);
> +
> +uint32_t graph_pos(const struct commit *c)
> +{
> +	uint32_t *pos = graph_pos_slab_peek(&graph_pos_slab, c);
> +
> +	return pos ? *pos : COMMIT_NOT_FROM_GRAPH;
> +}
> +
> +static void set_graph_pos(const struct commit *c, const uint32_t position)
> +{
> +	unsigned int i = graph_pos_slab.slab_count;
> +	uint32_t *pos = graph_pos_slab_at(&graph_pos_slab, c);
> +
> +	/*
> +	 * commit-slab initializes with zero, overwrite this with
> +	 * COMMIT_NOT_FROM_GRAPH
> +	 */
> +	for (; i < graph_pos_slab.slab_count; ++i)
> +	{
> +		memset(graph_pos_slab.slab[i], COMMIT_NOT_FROM_GRAPH,
> +		       graph_pos_slab.slab_size * sizeof(uint32_t));
> +	}
> +
> +	*pos = position;
> +}
> +
>  define_commit_slab(generation_slab, uint32_t);
>  static struct generation_slab generation_slab = COMMIT_SLAB_INIT(1, generation_slab);
>  
[...]
> @@ -142,4 +143,6 @@ void free_commit_graph(struct commit_graph *);
>  void disable_commit_graph(struct repository *r);
>  
>  uint32_t generation(const struct commit *c);
> +
> +uint32_t graph_pos(const struct commit *c);
>  #endif


I wonder why those helper functions: graph_pos() and set_graph_pos()
were not introduced in the 1st patch of this series, together with
generation() and set_generation().

The same comments as for previous patch apply: if graph_pos was not
explicitely set, we want for it to be COMMIT_NOT_FROM_GRAPH (like
init_commit_node() did before this change).  If it is not on
commit-slab, it is not set -- this is handled by graph_pos() function.
However we allocate memory on slab in chunks, and set_graph_pos()
ensures that those extra allocated `graph_pos` values on commit-slab are
properly initialized to COMMIT_NOT_FROM_GRAPH, like init_commit_node()
did.


Note that we always have COMMIT_NOT_FROM_GRAPH for `graph_pos` if and
only if there is GENERATION_NUMBER_INFINITY for `generation`, so perhaps
putting those two together in `struct commit_graph_info` would make
sense.  But whether doing more work in "setter" set_*() functions or
doing extra conditional in "getter" *() would give better performance
needs (micro-)benchmarking.


> diff --git a/contrib/coccinelle/graph_pos.cocci b/contrib/coccinelle/graph_pos.cocci
> new file mode 100644
> index 0000000000..0929164bdf
> --- /dev/null
> +++ b/contrib/coccinelle/graph_pos.cocci
> @@ -0,0 +1,12 @@
> +@@
> +struct commit *c;
> +expression E;
> +@@
> +- c->graph_pos = E
> ++ set_graph_pos(c, E)
> +
> +@@
> +struct commit *c;
> +@@
> +- c->graph_pos
> ++ graph_pos(c)

This is semantic patch that generates the rest of changes.  (If any of
those needs manual improvement, it is better left for a separate "manual
fixup" patch, in my opinion.)

> diff --git a/bloom.c b/bloom.c
> index 9b86aa3f59..5bee5bb0c1 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
>  {
>  	uint32_t lex_pos, start_index, end_index;
>  
> -	while (c->graph_pos < g->num_commits_in_base)
> +	while (graph_pos(c) < g->num_commits_in_base)
>  		g = g->base_graph;
>  
>  	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
>  	if (!g->chunk_bloom_indexes)
>  		return 0;
>  
> -	lex_pos = c->graph_pos - g->num_commits_in_base;
> +	lex_pos = graph_pos(c) - g->num_commits_in_base;
>  
>  	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
>

Here graph_pos(c) is used twice.

It probably needs to be checked (with benchmark) if it matters.


> @@ -188,7 +188,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
>  
>  	if (!filter->data) {
>  		load_commit_graph_info(r, c);
> -		if (c->graph_pos != COMMIT_NOT_FROM_GRAPH &&
> +		if (graph_pos(c) != COMMIT_NOT_FROM_GRAPH &&
>  			r->objects->commit_graph->chunk_bloom_indexes) {
>  			if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c))
>  				return filter;

> diff --git a/commit-graph.c b/commit-graph.c
> index 9ce7d4acb1..7ff460b442 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -697,7 +725,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
>  	c = lookup_commit(r, &oid);
>  	if (!c)
>  		die(_("could not find commit %s"), oid_to_hex(&oid));
> -	c->graph_pos = pos;
> +	set_graph_pos(c, pos);
>  	return &commit_list_insert(c, pptr)->next;
>  }
>  
> @@ -711,7 +739,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
>  
>  	lex_index = pos - g->num_commits_in_base;
>  	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
> -	item->graph_pos = pos;
> +	set_graph_pos(item, pos);
>  	set_generation(item, get_be32(commit_data + g->hash_len + 8) >> 2);
>  }
>  
> @@ -741,7 +769,7 @@ static int fill_commit_in_graph(struct repository *r,
>  	 * Store the "full" position, but then use the
>  	 * "local" position for the rest of the calculation.
>  	 */
> -	item->graph_pos = pos;
> +	set_graph_pos(item, pos);
>  	lex_index = pos - g->num_commits_in_base;
>  
>  	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;

> @@ -786,8 +814,8 @@ static int fill_commit_in_graph(struct repository *r,
>  
>  static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
>  {
> -	if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
> -		*pos = item->graph_pos;
> +	if (graph_pos(item) != COMMIT_NOT_FROM_GRAPH) {
> +		*pos = graph_pos(item);
>  		return 1;

Here graph_pos(item) is used twice.

>  	} else {
>  		struct commit_graph *cur_g = g;
> @@ -843,11 +871,11 @@ static struct tree *load_tree_for_commit(struct repository *r,
>  	struct object_id oid;
>  	const unsigned char *commit_data;
>  
> -	while (c->graph_pos < g->num_commits_in_base)
> +	while (graph_pos(c) < g->num_commits_in_base)
>  		g = g->base_graph;
>  
>  	commit_data = g->chunk_commit_data +
> -			GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base);
> +			GRAPH_DATA_WIDTH * (graph_pos(c) - g->num_commits_in_base);
>  
>  	hashcpy(oid.hash, commit_data);
>  	set_commit_tree(c, lookup_tree(r, &oid));

Here graph_pos(c) is used twice.

> @@ -861,7 +889,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r,
>  {
>  	if (c->maybe_tree)
>  		return c->maybe_tree;
> -	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
> +	if (graph_pos(c) == COMMIT_NOT_FROM_GRAPH)
>  		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
>  
>  	return load_tree_for_commit(r, g, (struct commit *)c);
> @@ -1247,7 +1275,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
>  			continue;
>  		if (ctx->split) {
>  			if ((!parse_commit(commit) &&
> -			     commit->graph_pos == COMMIT_NOT_FROM_GRAPH) ||
> +			     graph_pos(commit) == COMMIT_NOT_FROM_GRAPH) ||
>  			    flags == COMMIT_GRAPH_SPLIT_REPLACE)
>  				add_missing_parents(ctx, commit);
>  		} else if (!parse_commit_no_graph(commit))
> @@ -1493,7 +1521,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
>  			if (ctx->split) {
>  				struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]);
>  
> -				if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH)
> +				if (!c || graph_pos(c) != COMMIT_NOT_FROM_GRAPH)
>  					continue;
>  			}
>  
> @@ -1527,7 +1555,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
>  		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
>  
>  		if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE &&
> -		    ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH)
> +		    graph_pos(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH)
>  			continue;
>  
>  		if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE)
> diff --git a/commit.c b/commit.c
> index 8dad0f8446..da6de08b2b 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
>  	if (commit->maybe_tree || !commit->object.parsed)
>  		return commit->maybe_tree;
>  
> -	if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
> +	if (graph_pos(commit) != COMMIT_NOT_FROM_GRAPH)
>  		return get_commit_tree_in_graph(r, commit);
>  
>  	return NULL;

Best,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [GSOC Patch v2 0/4] Move generation, graph_pos to a slab
  2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
                   ` (4 preceding siblings ...)
  2020-06-05 19:00 ` Jakub Narębski
@ 2020-06-07 19:32 ` Abhishek Kumar
  2020-06-07 19:32   ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
                     ` (5 more replies)
  2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
  6 siblings, 6 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

The struct commit is used in many contexts. However, members
`generation` and `graph_pos` are only used for commit graph related
operations and otherwise waste memory.

This wastage would have been more pronounced as we transition to
generation number v2, which uses 64-bite generation number instead of
current 32-bits.

Abhishek Kumar (4):
  commit-graph: introduce commit_graph_data_slab
  commit: move members graph_pos, generation to a slab
  commit-graph: use generation directly when writing commit-graph
  commit-graph: minimize commit_graph_data_slab access

 alloc.c                         |   2 -
 blame.c                         |   2 +-
 bloom.c                         |   7 +-
 commit-graph.c                  | 127 ++++++++++++++++++++++++--------
 commit-graph.h                  |  10 +++
 commit-reach.c                  |  69 ++++++++++-------
 commit.c                        |   8 +-
 contrib/coccinelle/commit.cocci |  18 +++++
 revision.c                      |  20 +++--
 9 files changed, 190 insertions(+), 73 deletions(-)

-- 
2.27.0

Thanks to Dr. Stolee, Dr. Narebski and Junio for their excellent
suggestions.

Changes in v2:
- Introduce struct commit_graph_data.
- Merge `graph_pos`, `generation` slabs into a single,
  `commit_graph_data` slab.
- Use graph position for an intermediate check for generation, saving
  the cost of initializing generation numbers.
- Add an follow-up patch caching results of slab access in local
  variables.
- Move coccinelle transformation to commit.coccinelle instead of
  creating new scripts.
- Elaborate on removing default values from init_commit_node().
- Revert moving macro constants (e.g. COMMIT_NOT_FROM_GRAPH,
  GENERATION_NUMBER_ZERO) from commit.h to commit-graph.h

About the failing diff-submodule related tests, I came up with a
plausible explanation but could be wrong on this:

Commit slabs rely on uniqueness of commit->index to access data. But
submodules are repositories on their own, alloc_commit_index(), which
relies on repository->parsed_objects->commit_count no longer returns
unique values.

A commit belong to super repo and another belonging to submodule might
have the same index but different generation and graph positions.

This could be fixed by defining commit index as maximum of commit index
of all repositories + 1 but I have no idea how that would impact other
code.

Thoughts on this?

Regards
Abhishek

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
@ 2020-06-07 19:32   ` Abhishek Kumar
  2020-06-15 16:27     ` Taylor Blau
  2020-06-07 19:32   ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

The struct commit is used in many contexts. However, members
`generation` and `graph_pos` are only used for commit-graph related
operations and otherwise waste memory.

As they are often accessed together, let's introduce struct
commit_graph_data and move them to a commit_graph_data slab.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 commit-graph.h | 10 ++++++++++
 2 files changed, 59 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index e3420ddcbf..7d887a6a2c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -87,6 +87,55 @@ static int commit_pos_cmp(const void *va, const void *vb)
 	       commit_pos_at(&commit_pos, b);
 }
 
+define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
+static struct commit_graph_data_slab commit_graph_data_slab =
+	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
+
+uint32_t commit_graph_position(const struct commit *c)
+{
+	struct commit_graph_data *data =
+		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
+
+	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
+}
+
+uint32_t commit_graph_generation(const struct commit *c)
+{
+	struct commit_graph_data *data =
+		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
+
+	if (!data)
+		return GENERATION_NUMBER_INFINITY;
+	if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
+		return GENERATION_NUMBER_INFINITY;
+
+	return data->generation;
+}
+
+static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
+{
+	uint32_t i = commit_graph_data_slab.slab_count, j;
+	uint32_t slab_size = commit_graph_data_slab.slab_size;
+	struct commit_graph_data *data =
+		commit_graph_data_slab_at(&commit_graph_data_slab, c);
+
+	/*
+	 * commit-slab initializes elements with zero, overwrite this with
+	 * COMMIT_NOT_FROM_GRAPH for graph_pos.
+	 *
+	 * We avoid the cost of initializing `generation` as generation
+	 * number would be GENERATION_NUMBER_INFINITY if graph position
+	 * is COMMIT_NOT_FROM_GRAPH.
+	 */
+	for (; i < commit_graph_data_slab.slab_count; i++) {
+		for (j = 0; j < slab_size; j++) {
+			commit_graph_data_slab.slab[i][j].graph_pos = COMMIT_NOT_FROM_GRAPH;
+		}
+	}
+
+	return data;
+}
+
 static int commit_gen_cmp(const void *va, const void *vb)
 {
 	const struct commit *a = *(const struct commit **)va;
diff --git a/commit-graph.h b/commit-graph.h
index 4212766a4f..9d22f98f44 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -137,4 +137,14 @@ void free_commit_graph(struct commit_graph *);
  */
 void disable_commit_graph(struct repository *r);
 
+struct commit_graph_data {
+	uint32_t graph_pos;
+	uint32_t generation;
+};
+
+/* 
+ * Commits should be parsed before accessing generation, graph positions.
+ */
+uint32_t commit_graph_generation(const struct commit *);
+uint32_t commit_graph_position(const struct commit *);
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
  2020-06-07 19:32   ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
@ 2020-06-07 19:32   ` Abhishek Kumar
  2020-06-08  8:26     ` SZEDER Gábor
  2020-06-07 19:32   ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

We remove members `graph_pos` and `generation` from the struct commit.
The default assignments in init_commit_node() are no longer valid,
which is fine as the slab helpers return appropriate default values and
the assignments are removed.

We will replace existing use of commit->generation and commit->graph_pos
by commit_graph_data slab helpers using
`contrib/coccinelle/commit.cocci'.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 alloc.c                         |  2 --
 blame.c                         |  2 +-
 bloom.c                         |  6 ++--
 commit-graph.c                  | 60 ++++++++++++++++-----------------
 commit-graph.h                  |  2 +-
 commit-reach.c                  | 50 +++++++++++++--------------
 commit.c                        |  6 ++--
 contrib/coccinelle/commit.cocci | 18 ++++++++++
 revision.c                      | 16 ++++-----
 9 files changed, 89 insertions(+), 73 deletions(-)

diff --git a/alloc.c b/alloc.c
index 1c64c4dd16..f37fb3b8b6 100644
--- a/alloc.c
+++ b/alloc.c
@@ -108,8 +108,6 @@ void init_commit_node(struct repository *r, struct commit *c)
 {
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index(r);
-	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
-	c->generation = GENERATION_NUMBER_INFINITY;
 }
 
 void *alloc_commit_node(struct repository *r)
diff --git a/blame.c b/blame.c
index da7e28800e..82fa16d658 100644
--- a/blame.c
+++ b/blame.c
@@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
 	if (!bd)
 		return 1;
 
-	if (origin->commit->generation == GENERATION_NUMBER_INFINITY)
+	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY)
 		return 1;
 
 	filter = get_bloom_filter(r, origin->commit, 0);
diff --git a/bloom.c b/bloom.c
index 9b86aa3f59..df62e3763d 100644
--- a/bloom.c
+++ b/bloom.c
@@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 {
 	uint32_t lex_pos, start_index, end_index;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (commit_graph_position(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
-	lex_pos = c->graph_pos - g->num_commits_in_base;
+	lex_pos = commit_graph_position(c) - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
 
@@ -188,7 +188,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 
 	if (!filter->data) {
 		load_commit_graph_info(r, c);
-		if (c->graph_pos != COMMIT_NOT_FROM_GRAPH &&
+		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH &&
 			r->objects->commit_graph->chunk_bloom_indexes) {
 			if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c))
 				return filter;
diff --git a/commit-graph.c b/commit-graph.c
index 7d887a6a2c..f7cca4def4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -142,9 +142,9 @@ static int commit_gen_cmp(const void *va, const void *vb)
 	const struct commit *b = *(const struct commit **)vb;
 
 	/* lower generation commits first */
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return -1;
-	else if (a->generation > b->generation)
+	else if (commit_graph_generation(a) > commit_graph_generation(b))
 		return 1;
 
 	/* use date as a heuristic when generations are equal */
@@ -719,7 +719,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 	c = lookup_commit(r, &oid);
 	if (!c)
 		die(_("could not find commit %s"), oid_to_hex(&oid));
-	c->graph_pos = pos;
+	commit_graph_data_at(c)->graph_pos = pos;
 	return &commit_list_insert(c, pptr)->next;
 }
 
@@ -733,8 +733,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
-	item->graph_pos = pos;
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	commit_graph_data_at(item)->graph_pos = pos;
+	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -763,7 +763,7 @@ static int fill_commit_in_graph(struct repository *r,
 	 * Store the "full" position, but then use the
 	 * "local" position for the rest of the calculation.
 	 */
-	item->graph_pos = pos;
+	commit_graph_data_at(item)->graph_pos = pos;
 	lex_index = pos - g->num_commits_in_base;
 
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
@@ -776,7 +776,7 @@ static int fill_commit_in_graph(struct repository *r,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 
 	pptr = &item->parents;
 
@@ -808,8 +808,8 @@ static int fill_commit_in_graph(struct repository *r,
 
 static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
 {
-	if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
-		*pos = item->graph_pos;
+	if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) {
+		*pos = commit_graph_position(item);
 		return 1;
 	} else {
 		struct commit_graph *cur_g = g;
@@ -865,11 +865,11 @@ static struct tree *load_tree_for_commit(struct repository *r,
 	struct object_id oid;
 	const unsigned char *commit_data;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (commit_graph_position(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	commit_data = g->chunk_commit_data +
-			GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base);
+			GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base);
 
 	hashcpy(oid.hash, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
@@ -883,7 +883,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r,
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
-	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
+	if (commit_graph_position(c) == COMMIT_NOT_FROM_GRAPH)
 		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
 
 	return load_tree_for_commit(r, g, (struct commit *)c);
@@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		else
 			packedDate[0] = 0;
 
-		packedDate[0] |= htonl((*list)->generation << 2);
+		packedDate[0] |= htonl(commit_graph_generation((*list)) << 2);
 
 		packedDate[1] = htonl((*list)->date);
 		hashwrite(f, packedDate, 8);
@@ -1269,7 +1269,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 			continue;
 		if (ctx->split) {
 			if ((!parse_commit(commit) &&
-			     commit->graph_pos == COMMIT_NOT_FROM_GRAPH) ||
+			     commit_graph_position(commit) == COMMIT_NOT_FROM_GRAPH) ||
 			    flags == COMMIT_GRAPH_SPLIT_REPLACE)
 				add_missing_parents(ctx, commit);
 		} else if (!parse_commit_no_graph(commit))
@@ -1302,8 +1302,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
 		display_progress(ctx->progress, i + 1);
-		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
-		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
+		if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
+		    commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
 			continue;
 
 		commit_list_insert(ctx->commits.list[i], &list);
@@ -1314,22 +1314,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			uint32_t max_generation = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
-				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+				if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY ||
+				    commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
-				} else if (parent->item->generation > max_generation) {
-					max_generation = parent->item->generation;
+				} else if (commit_graph_generation(parent->item) > max_generation) {
+					max_generation = commit_graph_generation(parent->item);
 				}
 			}
 
 			if (all_parents_computed) {
-				current->generation = max_generation + 1;
+				commit_graph_data_at(current)->generation = max_generation + 1;
 				pop_commit(&list);
 
-				if (current->generation > GENERATION_NUMBER_MAX)
-					current->generation = GENERATION_NUMBER_MAX;
+				if (commit_graph_generation(current) > GENERATION_NUMBER_MAX)
+					commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX;
 			}
 		}
 	}
@@ -1514,7 +1514,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
 			if (ctx->split) {
 				struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
-				if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH)
+				if (!c || commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
 					continue;
 			}
 
@@ -1548,7 +1548,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE &&
-		    ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH)
+		    commit_graph_position(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH)
 			continue;
 
 		if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE)
@@ -2336,8 +2336,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			if (graph_parents->item->generation > max_generation)
-				max_generation = graph_parents->item->generation;
+			if (commit_graph_generation(graph_parents->item) > max_generation)
+				max_generation = commit_graph_generation(graph_parents->item);
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2347,7 +2347,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 			graph_report(_("commit-graph parent list for commit %s terminates early"),
 				     oid_to_hex(&cur_oid));
 
-		if (!graph_commit->generation) {
+		if (!commit_graph_generation(graph_commit)) {
 			if (generation_zero == GENERATION_NUMBER_EXISTS)
 				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
 					     oid_to_hex(&cur_oid));
@@ -2367,10 +2367,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (max_generation == GENERATION_NUMBER_MAX)
 			max_generation--;
 
-		if (graph_commit->generation != max_generation + 1)
+		if (commit_graph_generation(graph_commit) != max_generation + 1)
 			graph_report(_("commit-graph generation for commit %s is %u != %u"),
 				     oid_to_hex(&cur_oid),
-				     graph_commit->generation,
+				     commit_graph_generation(graph_commit),
 				     max_generation + 1);
 
 		if (graph_commit->date != odb_commit->date)
diff --git a/commit-graph.h b/commit-graph.h
index 9d22f98f44..2d1fecf481 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -142,7 +142,7 @@ struct commit_graph_data {
 	uint32_t generation;
 };
 
-/* 
+/*
  * Commits should be parsed before accessing generation, graph positions.
  */
 uint32_t commit_graph_generation(const struct commit *);
diff --git a/commit-reach.c b/commit-reach.c
index 4ca7e706a1..3b2f863f5f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		struct commit_list *parents;
 		int flags;
 
-		if (min_generation && commit->generation > last_gen)
+		if (min_generation && commit_graph_generation(commit) > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
-			    commit->generation, last_gen,
+			    commit_graph_generation(commit), last_gen,
 			    oid_to_hex(&commit->object.oid));
-		last_gen = commit->generation;
+		last_gen = commit_graph_generation(commit);
 
-		if (commit->generation < min_generation)
+		if (commit_graph_generation(commit) < min_generation)
 			break;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
@@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 		repo_parse_commit(r, array[i]);
 	for (i = 0; i < cnt; i++) {
 		struct commit_list *common;
-		uint32_t min_generation = array[i]->generation;
+		uint32_t min_generation = commit_graph_generation(array[i]);
 
 		if (redundant[i])
 			continue;
@@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 			filled_index[filled] = j;
 			work[filled++] = array[j];
 
-			if (array[j]->generation < min_generation)
-				min_generation = array[j]->generation;
+			if (commit_graph_generation(array[j]) < min_generation)
+				min_generation = commit_graph_generation(array[j]);
 		}
 		common = paint_down_to_common(r, array[i], filled,
 					      work, min_generation);
@@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
-		if (reference[i]->generation < min_generation)
-			min_generation = reference[i]->generation;
+		if (commit_graph_generation(reference[i]) < min_generation)
+			min_generation = commit_graph_generation(reference[i]);
 	}
 
-	if (commit->generation > min_generation)
+	if (commit_graph_generation(commit) > min_generation)
 		return ret;
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     commit->generation);
+				     commit_graph_generation(commit));
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
@@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate,
 	/* Otherwise, we don't know; prepare to recurse */
 	parse_commit_or_die(candidate);
 
-	if (candidate->generation < cutoff)
+	if (commit_graph_generation(candidate) < cutoff)
 		return CONTAINS_NO;
 
 	return CONTAINS_UNKNOWN;
@@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	for (p = want; p; p = p->next) {
 		struct commit *c = p->item;
 		load_commit_graph_info(the_repository, c);
-		if (c->generation < cutoff)
-			cutoff = c->generation;
+		if (commit_graph_generation(c) < cutoff)
+			cutoff = commit_graph_generation(c);
 	}
 
 	result = contains_test(candidate, want, cache, cutoff);
@@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 	const struct commit *a = *(const struct commit * const *)_a;
 	const struct commit *b = *(const struct commit * const *)_b;
 
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return -1;
-	if (a->generation > b->generation)
+	if (commit_graph_generation(a) > commit_graph_generation(b))
 		return 1;
 	return 0;
 }
@@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 		list[nr_commits] = (struct commit *)from_one;
 		if (parse_commit(list[nr_commits]) ||
-		    list[nr_commits]->generation < min_generation) {
+		    commit_graph_generation(list[nr_commits]) < min_generation) {
 			result = 0;
 			goto cleanup;
 		}
@@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 					if (parse_commit(parent->item) ||
 					    parent->item->date < min_commit_date ||
-					    parent->item->generation < min_generation)
+					    commit_graph_generation(parent->item) < min_generation)
 						continue;
 
 					commit_list_insert(parent->item, &stack);
@@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
 
-			if (from_iter->item->generation < min_generation)
-				min_generation = from_iter->item->generation;
+			if (commit_graph_generation(from_iter->item) < min_generation)
+				min_generation = commit_graph_generation(from_iter->item);
 		}
 
 		from_iter = from_iter->next;
@@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
 
-			if (to_iter->item->generation < min_generation)
-				min_generation = to_iter->item->generation;
+			if (commit_graph_generation(to_iter->item) < min_generation)
+				min_generation = commit_graph_generation(to_iter->item);
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 		struct commit *c = *item;
 
 		parse_commit(c);
-		if (c->generation < min_generation)
-			min_generation = c->generation;
+		if (commit_graph_generation(c) < min_generation)
+			min_generation = commit_graph_generation(c);
 
 		if (!(c->object.flags & PARENT1)) {
 			c->object.flags |= PARENT1;
@@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 			parse_commit(p);
 
-			if (p->generation < min_generation)
+			if (commit_graph_generation(p) < min_generation)
 				continue;
 
 			if (p->object.flags & PARENT2)
diff --git a/commit.c b/commit.c
index 87686a7055..ad9a76dcc6 100644
--- a/commit.c
+++ b/commit.c
@@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
 	if (commit->maybe_tree || !commit->object.parsed)
 		return commit->maybe_tree;
 
-	if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+	if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
 		return get_commit_tree_in_graph(r, commit);
 
 	return NULL;
@@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void
 	const struct commit *a = a_, *b = b_;
 
 	/* newer commits first */
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return 1;
-	else if (a->generation > b->generation)
+	else if (commit_graph_generation(a) > commit_graph_generation(b))
 		return -1;
 
 	/* use date as a heuristic when generations are equal */
diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
index 778e4704f6..af6dd4c20c 100644
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -32,3 +32,21 @@ expression c;
 - c->maybe_tree
 + repo_get_commit_tree(specify_the_right_repo_here, c)
   ...>}
+
+@@
+struct commit *c;
+expression E;
+@@
+(
+- c->generation = E;
++ commit_graph_data_at(c)->generation = E;
+|
+- c->graph_pos = E;
++ commit_graph_data_at(c)->graph_pos = E;
+|
+- c->generation
++ commit_graph_generation(c)
+|
+- c->graph_pos
++ commit_graph_position(c)
+)
diff --git a/revision.c b/revision.c
index 60cca8c0b9..cb1b200e9f 100644
--- a/revision.c
+++ b/revision.c
@@ -720,7 +720,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
 	if (!revs->repo->objects->commit_graph)
 		return -1;
 
-	if (commit->generation == GENERATION_NUMBER_INFINITY)
+	if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY)
 		return -1;
 
 	filter = get_bloom_filter(revs->repo, commit, 0);
@@ -3314,7 +3314,7 @@ static void explore_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->explore_queue)) &&
-	       c->generation >= gen_cutoff)
+	       commit_graph_generation(c) >= gen_cutoff)
 		explore_walk_step(revs);
 }
 
@@ -3330,7 +3330,7 @@ static void indegree_walk_step(struct rev_info *revs)
 	if (parse_commit_gently(c, 1) < 0)
 		return;
 
-	explore_to_depth(revs, c->generation);
+	explore_to_depth(revs, commit_graph_generation(c));
 
 	for (p = c->parents; p; p = p->next) {
 		struct commit *parent = p->item;
@@ -3354,7 +3354,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->indegree_queue)) &&
-	       c->generation >= gen_cutoff)
+	       commit_graph_generation(c) >= gen_cutoff)
 		indegree_walk_step(revs);
 }
 
@@ -3414,8 +3414,8 @@ static void init_topo_walk(struct rev_info *revs)
 		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
 		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
 
-		if (c->generation < info->min_generation)
-			info->min_generation = c->generation;
+		if (commit_graph_generation(c) < info->min_generation)
+			info->min_generation = commit_graph_generation(c);
 
 		*(indegree_slab_at(&info->indegree, c)) = 1;
 
@@ -3473,8 +3473,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-		if (parent->generation < info->min_generation) {
-			info->min_generation = parent->generation;
+		if (commit_graph_generation(parent) < info->min_generation) {
+			info->min_generation = commit_graph_generation(parent);
 			compute_indegrees_to_depth(revs, info->min_generation);
 		}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
  2020-06-07 19:32   ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
  2020-06-07 19:32   ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
@ 2020-06-07 19:32   ` Abhishek Kumar
  2020-06-08 16:31     ` Jakub Narębski
  2020-06-07 19:32   ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

commit_graph_generation() returns GENERATION_NUMBER_INFINITY if the
graph position for commit is COMMIT_NOT_FROM_GRAPH.

While this is true when reading from a commit graph, no graph positions
are associated with a commit when writing a commit graph. Therefore, the
helper incorrectly returns GENERATION_NUMBER_INFINITY despite having a
finite generation number.

Let's fix this by using generation number directly when writing a commit
graph.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f7cca4def4..0dc79e7c90 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		else
 			packedDate[0] = 0;
 
-		packedDate[0] |= htonl(commit_graph_generation((*list)) << 2);
+		packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2);
 
 		packedDate[1] = htonl((*list)->date);
 		hashwrite(f, packedDate, 8);
@@ -1301,9 +1301,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					_("Computing commit graph generation numbers"),
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
+		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
+
 		display_progress(ctx->progress, i + 1);
-		if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
-		    commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
+		if (generation != GENERATION_NUMBER_INFINITY &&
+		    generation != GENERATION_NUMBER_ZERO)
 			continue;
 
 		commit_list_insert(ctx->commits.list[i], &list);
@@ -1314,8 +1316,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			uint32_t max_generation = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY ||
-				    commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) {
+
+				if (generation == GENERATION_NUMBER_INFINITY ||
+				    generation == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
                     ` (2 preceding siblings ...)
  2020-06-07 19:32   ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar
@ 2020-06-07 19:32   ` Abhishek Kumar
  2020-06-08 16:22   ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski
  2020-06-15 16:24   ` Taylor Blau
  5 siblings, 0 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-07 19:32 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

In an earlier patch, multiple struct acccesses to `graph_pos` and
`generation` were auto-converted to multiple method calls.

Since the values are fixed and commit-slab access costly, we would be
better off with storing the values as a local variable and reuse it.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>

[1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/
---
 bloom.c        |  5 +++--
 commit-graph.c | 55 +++++++++++++++++++++++++++++-----------------
 commit-reach.c | 59 ++++++++++++++++++++++++++++++++------------------
 commit.c       |  6 +++--
 revision.c     | 12 ++++++----
 5 files changed, 88 insertions(+), 49 deletions(-)

diff --git a/bloom.c b/bloom.c
index df62e3763d..568ebd75e7 100644
--- a/bloom.c
+++ b/bloom.c
@@ -33,15 +33,16 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 					struct commit *c)
 {
 	uint32_t lex_pos, start_index, end_index;
+	uint32_t graph_pos = commit_graph_position(c);
 
-	while (commit_graph_position(c) < g->num_commits_in_base)
+	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
 
 	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
-	lex_pos = commit_graph_position(c) - g->num_commits_in_base;
+	lex_pos = graph_pos - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
 
diff --git a/commit-graph.c b/commit-graph.c
index 0dc79e7c90..5b10d1da2c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -141,10 +141,12 @@ static int commit_gen_cmp(const void *va, const void *vb)
 	const struct commit *a = *(const struct commit **)va;
 	const struct commit *b = *(const struct commit **)vb;
 
+	uint32_t generation_a = commit_graph_generation(a);
+	uint32_t generation_b = commit_graph_generation(b);
 	/* lower generation commits first */
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	if (generation_a < generation_b)
 		return -1;
-	else if (commit_graph_generation(a) > commit_graph_generation(b))
+	else if (generation_a > generation_b)
 		return 1;
 
 	/* use date as a heuristic when generations are equal */
@@ -726,6 +728,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
 {
 	const unsigned char *commit_data;
+	struct commit_graph_data *graph_data;
 	uint32_t lex_index;
 
 	while (pos < g->num_commits_in_base)
@@ -733,8 +736,10 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
-	commit_graph_data_at(item)->graph_pos = pos;
-	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+
+	graph_data = commit_graph_data_at(item);
+	graph_data->graph_pos = pos;
+	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -750,6 +755,7 @@ static int fill_commit_in_graph(struct repository *r,
 	uint32_t *parent_data_ptr;
 	uint64_t date_low, date_high;
 	struct commit_list **pptr;
+	struct commit_graph_data *graph_data;
 	const unsigned char *commit_data;
 	uint32_t lex_index;
 
@@ -763,7 +769,8 @@ static int fill_commit_in_graph(struct repository *r,
 	 * Store the "full" position, but then use the
 	 * "local" position for the rest of the calculation.
 	 */
-	commit_graph_data_at(item)->graph_pos = pos;
+	graph_data = commit_graph_data_at(item);
+	graph_data->graph_pos = pos;
 	lex_index = pos - g->num_commits_in_base;
 
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
@@ -776,7 +783,7 @@ static int fill_commit_in_graph(struct repository *r,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 
 	pptr = &item->parents;
 
@@ -808,8 +815,9 @@ static int fill_commit_in_graph(struct repository *r,
 
 static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
 {
-	if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) {
-		*pos = commit_graph_position(item);
+	uint32_t graph_pos = commit_graph_position(item);
+	if (graph_pos != COMMIT_NOT_FROM_GRAPH) {
+		*pos = graph_pos;
 		return 1;
 	} else {
 		struct commit_graph *cur_g = g;
@@ -864,12 +872,13 @@ static struct tree *load_tree_for_commit(struct repository *r,
 {
 	struct object_id oid;
 	const unsigned char *commit_data;
+	uint32_t graph_pos = commit_graph_position(c);
 
-	while (commit_graph_position(c) < g->num_commits_in_base)
+	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
 
 	commit_data = g->chunk_commit_data +
-			GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base);
+			GRAPH_DATA_WIDTH * (graph_pos - g->num_commits_in_base);
 
 	hashcpy(oid.hash, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
@@ -1301,7 +1310,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					_("Computing commit graph generation numbers"),
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
-		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
+		uint32_t generation = commit_graph_generation(ctx->commits.list[i]);
 
 		display_progress(ctx->progress, i + 1);
 		if (generation != GENERATION_NUMBER_INFINITY &&
@@ -1316,23 +1325,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			uint32_t max_generation = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
+				generation = commit_graph_generation(parent->item);
 
 				if (generation == GENERATION_NUMBER_INFINITY ||
 				    generation == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
-				} else if (commit_graph_generation(parent->item) > max_generation) {
-					max_generation = commit_graph_generation(parent->item);
+				} else if (generation > max_generation) {
+					max_generation = generation;
 				}
 			}
 
 			if (all_parents_computed) {
-				commit_graph_data_at(current)->generation = max_generation + 1;
+				struct commit_graph_data *graph_data = commit_graph_data_at(current);
+
+				graph_data->generation = max_generation + 1;
 				pop_commit(&list);
 
-				if (commit_graph_generation(current) > GENERATION_NUMBER_MAX)
-					commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX;
+				if (graph_data->generation > GENERATION_NUMBER_MAX)
+					graph_data->generation = GENERATION_NUMBER_MAX;
 			}
 		}
 	}
@@ -2301,6 +2313,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
+		uint32_t generation;
 
 		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
@@ -2339,8 +2352,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			if (commit_graph_generation(graph_parents->item) > max_generation)
-				max_generation = commit_graph_generation(graph_parents->item);
+			generation = commit_graph_generation(graph_parents->item);
+			if (generation > max_generation)
+				max_generation = generation;
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2370,10 +2384,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (max_generation == GENERATION_NUMBER_MAX)
 			max_generation--;
 
-		if (commit_graph_generation(graph_commit) != max_generation + 1)
+		generation = commit_graph_generation(graph_commit);
+		if (generation != max_generation + 1)
 			graph_report(_("commit-graph generation for commit %s is %u != %u"),
 				     oid_to_hex(&cur_oid),
-				     commit_graph_generation(graph_commit),
+				     generation,
 				     max_generation + 1);
 
 		if (graph_commit->date != odb_commit->date)
diff --git a/commit-reach.c b/commit-reach.c
index 3b2f863f5f..f5e5c0a32b 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -58,14 +58,15 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		struct commit *commit = prio_queue_get(&queue);
 		struct commit_list *parents;
 		int flags;
+		uint32_t generation = commit_graph_generation(commit);
 
-		if (min_generation && commit_graph_generation(commit) > last_gen)
+		if (min_generation && generation > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
-			    commit_graph_generation(commit), last_gen,
+			    generation, last_gen,
 			    oid_to_hex(&commit->object.oid));
-		last_gen = commit_graph_generation(commit);
+		last_gen = generation;
 
-		if (commit_graph_generation(commit) < min_generation)
+		if (generation < min_generation)
 			break;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
@@ -181,13 +182,15 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 		if (redundant[i])
 			continue;
 		for (j = filled = 0; j < cnt; j++) {
+			uint32_t curr_generation;
 			if (i == j || redundant[j])
 				continue;
 			filled_index[filled] = j;
 			work[filled++] = array[j];
 
-			if (commit_graph_generation(array[j]) < min_generation)
-				min_generation = commit_graph_generation(array[j]);
+			curr_generation = commit_graph_generation(array[j]);
+			if (curr_generation < min_generation)
+				min_generation = curr_generation;
 		}
 		common = paint_down_to_common(r, array[i], filled,
 					      work, min_generation);
@@ -316,23 +319,26 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 {
 	struct commit_list *bases;
 	int ret = 0, i;
-	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+	uint32_t generation, min_generation = GENERATION_NUMBER_INFINITY;
 
 	if (repo_parse_commit(r, commit))
 		return ret;
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
-		if (commit_graph_generation(reference[i]) < min_generation)
-			min_generation = commit_graph_generation(reference[i]);
+
+		generation = commit_graph_generation(reference[i]);
+		if (generation < min_generation)
+			min_generation = generation;
 	}
 
-	if (commit_graph_generation(commit) > min_generation)
+	generation = commit_graph_generation(commit);
+	if (generation > min_generation)
 		return ret;
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     commit_graph_generation(commit));
+				     generation);
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
@@ -490,10 +496,12 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	const struct commit_list *p;
 
 	for (p = want; p; p = p->next) {
+		uint32_t generation;
 		struct commit *c = p->item;
 		load_commit_graph_info(the_repository, c);
-		if (commit_graph_generation(c) < cutoff)
-			cutoff = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < cutoff)
+			cutoff = generation;
 	}
 
 	result = contains_test(candidate, want, cache, cutoff);
@@ -544,9 +552,12 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 	const struct commit *a = *(const struct commit * const *)_a;
 	const struct commit *b = *(const struct commit * const *)_b;
 
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	uint32_t generation_a = commit_graph_generation(a);
+	uint32_t generation_b = commit_graph_generation(b);
+
+	if (generation_a < generation_b)
 		return -1;
-	if (commit_graph_generation(a) > commit_graph_generation(b))
+	if (generation_a > generation_b)
 		return 1;
 	return 0;
 }
@@ -662,11 +673,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 		add_object_array(&from_iter->item->object, NULL, &from_objs);
 
 		if (!parse_commit(from_iter->item)) {
+			uint32_t generation;
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
 
-			if (commit_graph_generation(from_iter->item) < min_generation)
-				min_generation = commit_graph_generation(from_iter->item);
+			generation = commit_graph_generation(from_iter->item);
+			if (generation < min_generation)
+				min_generation = generation;
 		}
 
 		from_iter = from_iter->next;
@@ -674,11 +687,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 
 	while (to_iter) {
 		if (!parse_commit(to_iter->item)) {
+			uint32_t generation;
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
 
-			if (commit_graph_generation(to_iter->item) < min_generation)
-				min_generation = commit_graph_generation(to_iter->item);
+			generation = commit_graph_generation(to_iter->item);
+			if (generation < min_generation)
+				min_generation = generation;
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -718,11 +733,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 
 	for (item = to; item < to_last; item++) {
+		uint32_t generation;
 		struct commit *c = *item;
 
 		parse_commit(c);
-		if (commit_graph_generation(c) < min_generation)
-			min_generation = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < min_generation)
+			min_generation = generation;
 
 		if (!(c->object.flags & PARENT1)) {
 			c->object.flags |= PARENT1;
diff --git a/commit.c b/commit.c
index ad9a76dcc6..f85ade78ed 100644
--- a/commit.c
+++ b/commit.c
@@ -729,11 +729,13 @@ int compare_commits_by_author_date(const void *a_, const void *b_,
 int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused)
 {
 	const struct commit *a = a_, *b = b_;
+	const uint32_t generation_a = commit_graph_generation(a),
+		       generation_b = commit_graph_generation(b);
 
 	/* newer commits first */
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	if (generation_a < generation_b)
 		return 1;
-	else if (commit_graph_generation(a) > commit_graph_generation(b))
+	else if (generation_a > generation_b)
 		return -1;
 
 	/* use date as a heuristic when generations are equal */
diff --git a/revision.c b/revision.c
index cb1b200e9f..c257089808 100644
--- a/revision.c
+++ b/revision.c
@@ -3407,6 +3407,7 @@ static void init_topo_walk(struct rev_info *revs)
 	info->min_generation = GENERATION_NUMBER_INFINITY;
 	for (list = revs->commits; list; list = list->next) {
 		struct commit *c = list->item;
+		uint32_t generation;
 
 		if (parse_commit_gently(c, 1))
 			continue;
@@ -3414,8 +3415,9 @@ static void init_topo_walk(struct rev_info *revs)
 		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
 		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
 
-		if (commit_graph_generation(c) < info->min_generation)
-			info->min_generation = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < info->min_generation)
+			info->min_generation = generation;
 
 		*(indegree_slab_at(&info->indegree, c)) = 1;
 
@@ -3466,6 +3468,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 	for (p = commit->parents; p; p = p->next) {
 		struct commit *parent = p->item;
 		int *pi;
+		uint32_t generation;
 
 		if (parent->object.flags & UNINTERESTING)
 			continue;
@@ -3473,8 +3476,9 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-		if (commit_graph_generation(parent) < info->min_generation) {
-			info->min_generation = commit_graph_generation(parent);
+		generation = commit_graph_generation(parent);
+		if (generation < info->min_generation) {
+			info->min_generation = generation;
 			compute_indegrees_to_depth(revs, info->min_generation);
 		}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee
  2020-06-04 17:55   ` Junio C Hamano
@ 2020-06-07 19:53   ` SZEDER Gábor
  2020-06-08  5:48     ` Abhishek Kumar
  1 sibling, 1 reply; 39+ messages in thread
From: SZEDER Gábor @ 2020-06-07 19:53 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhishek Kumar, git, jnareb

On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> > The struct commit is used in many contexts. However, members generation
> > and graph_pos are only used for commit-graph related operations and
> > otherwise waste memory.
> > 
> > This wastage would have been more pronounced as transistion to
> > generation number v2, which uses 64-bit generation number instead of
> > current 32-bits.
> 
> Thanks! This is an important step, and will already improve
> performance in subtle ways.

While the reduced memory footprint of each commit object might improve
performance, accessing graph position and generation numbers in a
commit-slab is more expensive than direct field accesses in 'struct
commit' instances.  Consequently, these patches increase the runtime
of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
repository from 0.630s to 0.940s.


> >  create mode 100644 contrib/coccinelle/generation.cocci
> >  create mode 100644 contrib/coccinelle/graph_pos.cocci
> 
> I appreciate the Coccinelle scripts to help identify
> automatic fixes for other topics in-flight. However,
> I wonder if they would be better placed inside the
> existing commit.cocci file?

We add Coccinelle scripts to avoid undesirable code patterns entering
our code base.  That, however, is not the case here: this is a
one-time conversion, and at the end of this series 'struct commit'
won't have a 'generation' field anymore, so once it's merged the
compiler will catch any new 'commit->generation' accesses.  Therefore
I don't think that these Coccinelle scripts should be added at all.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-07 19:53   ` SZEDER Gábor
@ 2020-06-08  5:48     ` Abhishek Kumar
  2020-06-08  8:36       ` SZEDER Gábor
  0 siblings, 1 reply; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-08  5:48 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: stolee, jnareb, git

On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> > On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> > > The struct commit is used in many contexts. However, members generation
> > > and graph_pos are only used for commit-graph related operations and
> > > otherwise waste memory.
> > > 
> > > This wastage would have been more pronounced as transistion to
> > > generation number v2, which uses 64-bit generation number instead of
> > > current 32-bits.
> > 
> > Thanks! This is an important step, and will already improve
> > performance in subtle ways.
> 
> While the reduced memory footprint of each commit object might improve
> performance, accessing graph position and generation numbers in a
> commit-slab is more expensive than direct field accesses in 'struct
> commit' instances.  Consequently, these patches increase the runtime
> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> repository from 0.630s to 0.940s.
> 

Thank you for checking performance. Performance penalty was something we
had discussed here [1]. 

Caching the commit slab results in local variables helped wonderfully in v2 [2].
For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
in the linux repository increased from 0.762 to 0.767s. Since this is a
change of <1%, it is *no longer* a performance regression in my opinion.

[1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/
[2]: https://lore.kernel.org/git/20200607193237.699335-5-abhishekkumar8222@gmail.com/

> 
> > >  create mode 100644 contrib/coccinelle/generation.cocci
> > >  create mode 100644 contrib/coccinelle/graph_pos.cocci
> > 
> > I appreciate the Coccinelle scripts to help identify
> > automatic fixes for other topics in-flight. However,
> > I wonder if they would be better placed inside the
> > existing commit.cocci file?
> 
> We add Coccinelle scripts to avoid undesirable code patterns entering
> our code base.  That, however, is not the case here: this is a
> one-time conversion, and at the end of this series 'struct commit'
> won't have a 'generation' field anymore, so once it's merged the
> compiler will catch any new 'commit->generation' accesses.  Therefore
> I don't think that these Coccinelle scripts should be added at all.
> 

Alright, that makes sense to me. Will remove in a subsequent version.

Thanks
Abhishek

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab
  2020-06-07 19:32   ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
@ 2020-06-08  8:26     ` SZEDER Gábor
  2020-06-08 12:35       ` Derrick Stolee
  0 siblings, 1 reply; 39+ messages in thread
From: SZEDER Gábor @ 2020-06-08  8:26 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, jnareb, stolee

On Mon, Jun 08, 2020 at 01:02:35AM +0530, Abhishek Kumar wrote:
> diff --git a/commit-graph.c b/commit-graph.c
> index 7d887a6a2c..f7cca4def4 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c

> @@ -1302,8 +1302,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					ctx->commits.nr);
>  	for (i = 0; i < ctx->commits.nr; i++) {
>  		display_progress(ctx->progress, i + 1);
> -		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
> -		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
> +		if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
> +		    commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
>  			continue;
>  
>  		commit_list_insert(ctx->commits.list[i], &list);
> @@ -1314,22 +1314,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  			uint32_t max_generation = 0;
>  
>  			for (parent = current->parents; parent; parent = parent->next) {
> -				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
> -				    parent->item->generation == GENERATION_NUMBER_ZERO) {
> +				if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY ||
> +				    commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) {
>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;
> -				} else if (parent->item->generation > max_generation) {
> -					max_generation = parent->item->generation;
> +				} else if (commit_graph_generation(parent->item) > max_generation) {
> +					max_generation = commit_graph_generation(parent->item);
>  				}
>  			}
>  
>  			if (all_parents_computed) {
> -				current->generation = max_generation + 1;
> +				commit_graph_data_at(current)->generation = max_generation + 1;
>  				pop_commit(&list);
>  
> -				if (current->generation > GENERATION_NUMBER_MAX)
> -					current->generation = GENERATION_NUMBER_MAX;
> +				if (commit_graph_generation(current) > GENERATION_NUMBER_MAX)
> +					commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX;
>  			}
>  		}
>  	}

Something about these conversions is not right, as they send
compute_generation_numbers() into an endless loop, and
't5318-commit-graph.sh' hangs because of this.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-08  5:48     ` Abhishek Kumar
@ 2020-06-08  8:36       ` SZEDER Gábor
  2020-06-08 13:45         ` Derrick Stolee
  2020-06-08 15:21         ` Jakub Narębski
  0 siblings, 2 replies; 39+ messages in thread
From: SZEDER Gábor @ 2020-06-08  8:36 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: stolee, jnareb, git

On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> > On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> > > On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> > > > The struct commit is used in many contexts. However, members generation
> > > > and graph_pos are only used for commit-graph related operations and
> > > > otherwise waste memory.
> > > > 
> > > > This wastage would have been more pronounced as transistion to
> > > > generation number v2, which uses 64-bit generation number instead of
> > > > current 32-bits.
> > > 
> > > Thanks! This is an important step, and will already improve
> > > performance in subtle ways.
> > 
> > While the reduced memory footprint of each commit object might improve
> > performance, accessing graph position and generation numbers in a
> > commit-slab is more expensive than direct field accesses in 'struct
> > commit' instances.  Consequently, these patches increase the runtime
> > of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> > repository from 0.630s to 0.940s.
> > 
> 
> Thank you for checking performance. Performance penalty was something we
> had discussed here [1]. 
> 
> Caching the commit slab results in local variables helped wonderfully in v2 [2].
> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
> in the linux repository increased from 0.762 to 0.767s. Since this is a
> change of <1%, it is *no longer* a performance regression in my opinion.

Interesting, I measured 0.870s with v2, still a notable increase from
0.630s.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab
  2020-06-08  8:26     ` SZEDER Gábor
@ 2020-06-08 12:35       ` Derrick Stolee
  0 siblings, 0 replies; 39+ messages in thread
From: Derrick Stolee @ 2020-06-08 12:35 UTC (permalink / raw)
  To: SZEDER Gábor, Abhishek Kumar; +Cc: git, jnareb

On 6/8/2020 4:26 AM, SZEDER Gábor wrote:
> On Mon, Jun 08, 2020 at 01:02:35AM +0530, Abhishek Kumar wrote:
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 7d887a6a2c..f7cca4def4 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
> 
>> @@ -1302,8 +1302,8 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>>  					ctx->commits.nr);
>>  	for (i = 0; i < ctx->commits.nr; i++) {
>>  		display_progress(ctx->progress, i + 1);
>> -		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
>> -		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
>> +		if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
>> +		    commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
>>  			continue;
>>  
>>  		commit_list_insert(ctx->commits.list[i], &list);
>> @@ -1314,22 +1314,22 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>>  			uint32_t max_generation = 0;
>>  
>>  			for (parent = current->parents; parent; parent = parent->next) {
>> -				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
>> -				    parent->item->generation == GENERATION_NUMBER_ZERO) {
>> +				if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY ||
>> +				    commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) {
>>  					all_parents_computed = 0;
>>  					commit_list_insert(parent->item, &list);
>>  					break;
>> -				} else if (parent->item->generation > max_generation) {
>> -					max_generation = parent->item->generation;
>> +				} else if (commit_graph_generation(parent->item) > max_generation) {
>> +					max_generation = commit_graph_generation(parent->item);
>>  				}
>>  			}
>>  
>>  			if (all_parents_computed) {
>> -				current->generation = max_generation + 1;
>> +				commit_graph_data_at(current)->generation = max_generation + 1;
>>  				pop_commit(&list);
>>  
>> -				if (current->generation > GENERATION_NUMBER_MAX)
>> -					current->generation = GENERATION_NUMBER_MAX;
>> +				if (commit_graph_generation(current) > GENERATION_NUMBER_MAX)
>> +					commit_graph_data_at(current)->generation = GENERATION_NUMBER_MAX;
>>  			}
>>  		}
>>  	}
> 
> Something about these conversions is not right, as they send
> compute_generation_numbers() into an endless loop, and
> 't5318-commit-graph.sh' hangs because of this.

Abhishek responded off-list, but it's worth having the discussion
here, too.

While the next patch fixes the bug introduced here, we strive to
have every patch compile and pass all tests on all platforms. It
can be hard to verify that last "all platforms" condition, but
we can run (most) tests on each of our patches using the following:

$ git rebase -x "make -j12 DEVELOPER=1 && (cd t && prove -j8 t[0-8]*.sh)" <base>

Thanks, Szeder, for finding this issue in the patch.

Looking at this patch and patch 3, I think you should just squash that patch
into this one, since the code you are removing in patch 3 was added by this
one. Add a paragraph in your commit message that details why we need to use
commit_graph_data_at() directly in write_graph_chunk_data() and
compute_generation_numbers().

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-08  8:36       ` SZEDER Gábor
@ 2020-06-08 13:45         ` Derrick Stolee
  2020-06-08 16:46           ` SZEDER Gábor
  2020-06-08 15:21         ` Jakub Narębski
  1 sibling, 1 reply; 39+ messages in thread
From: Derrick Stolee @ 2020-06-08 13:45 UTC (permalink / raw)
  To: SZEDER Gábor, Abhishek Kumar; +Cc: jnareb, git

On 6/8/2020 4:36 AM, SZEDER Gábor wrote:
> On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
>> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
>>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
>>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
>>>>> The struct commit is used in many contexts. However, members generation
>>>>> and graph_pos are only used for commit-graph related operations and
>>>>> otherwise waste memory.
>>>>>
>>>>> This wastage would have been more pronounced as transistion to
>>>>> generation number v2, which uses 64-bit generation number instead of
>>>>> current 32-bits.
>>>>
>>>> Thanks! This is an important step, and will already improve
>>>> performance in subtle ways.
>>>
>>> While the reduced memory footprint of each commit object might improve
>>> performance, accessing graph position and generation numbers in a
>>> commit-slab is more expensive than direct field accesses in 'struct
>>> commit' instances.  Consequently, these patches increase the runtime
>>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
>>> repository from 0.630s to 0.940s.
>>>
>>
>> Thank you for checking performance. Performance penalty was something we
>> had discussed here [1]. 
>>
>> Caching the commit slab results in local variables helped wonderfully in v2 [2].
>> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
>> in the linux repository increased from 0.762 to 0.767s. Since this is a
>> change of <1%, it is *no longer* a performance regression in my opinion.
> 
> Interesting, I measured 0.870s with v2, still a notable increase from
> 0.630s.

This is an interesting point. The --is-ancestor is critical to the
performance issue (as measured on my machine).

For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get

v2.27.0:
real    0m0.515s
user    0m0.467s
sys     0m0.048s

v2 series:
real    0m0.534s
user    0m0.481s
sys     0m0.053s

With "--is-ancestor" I see the following:

v2.27.0:
real    0m0.591s
user    0m0.539s
sys     0m0.052s

v2 series:
real    0m0.773s
user    0m0.733s
sys     0m0.040s

The --is-ancestor option [1] says

    Check if the first <commit> is an ancestor of the second
    <commit>, and exit with status 0 if true, or with status
    1 if not. Errors are signaled by a non-zero status that
    is not 1.

[1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor

This _should_ be faster than "git branch --contains HEAD~50000",
but it is much much slower:

$ time git branch --contains HEAD~50000
real    0m0.068s
user    0m0.061s
sys     0m0.008s

So, there is definitely something going on that slows the
"--is-ancestor" path in this case. But, the solution is not
to halt the current patch (which likely has memory footprint
benefits when dealing with a lot of tree and blob objects)
and instead fix the underlying algorithm.

Let's add that to the list of things to do.

>>>  create mode 100644 contrib/coccinelle/generation.cocci
>>>  create mode 100644 contrib/coccinelle/graph_pos.cocci
>>
>> I appreciate the Coccinelle scripts to help identify
>> automatic fixes for other topics in-flight. However,
>> I wonder if they would be better placed inside the
>> existing commit.cocci file?
>
> We add Coccinelle scripts to avoid undesirable code patterns entering
> our code base.  That, however, is not the case here: this is a
> one-time conversion, and at the end of this series 'struct commit'
> won't have a 'generation' field anymore, so once it's merged the
> compiler will catch any new 'commit->generation' accesses.  Therefore
> I don't think that these Coccinelle scripts should be added at all.

I disagree. We _also_ add Coccinelle scripts when doing one-time
refactors to avoid logical merge conflicts with other topics in
flight. If someone else is working on a parallel topic that adds
references to graph_pos or generation member, then the scripts provide
an easy way for the maintainer to update those references in the merge
commit. Alternatively, the contributor could rebase on top of this
series and run the scripts themselves to fix their patches before
submission.

For example, this was done carefully in the sha->object_id
conversion using contrib/coccinelle/object_id.cocci.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-08  8:36       ` SZEDER Gábor
  2020-06-08 13:45         ` Derrick Stolee
@ 2020-06-08 15:21         ` Jakub Narębski
  1 sibling, 0 replies; 39+ messages in thread
From: Jakub Narębski @ 2020-06-08 15:21 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Abhishek Kumar, Derrick Stolee, git

SZEDER Gábor <szeder.dev@gmail.com> writes:
> On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
>> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
>>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
>>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:

>>>>> The struct commit is used in many contexts. However, members generation
>>>>> and graph_pos are only used for commit-graph related operations and
>>>>> otherwise waste memory.
>>>>> 
>>>>> This wastage would have been more pronounced as transistion to
>>>>> generation number v2, which uses 64-bit generation number instead of
>>>>> current 32-bits.
>>>> 
>>>> Thanks! This is an important step, and will already improve
>>>> performance in subtle ways.
>>> 
>>> While the reduced memory footprint of each commit object might improve
>>> performance, accessing graph position and generation numbers in a
>>> commit-slab is more expensive than direct field accesses in 'struct
>>> commit' instances.  Consequently, these patches increase the runtime
>>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
>>> repository from 0.630s to 0.940s. 
>> 
>> Thank you for checking performance. Performance penalty was something we
>> had discussed here [1]. 
>> 
>> Caching the commit slab results in local variables helped wonderfully in v2 [2].
>> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
>> in the linux repository increased from 0.762 to 0.767s. Since this is a
>> change of <1%, it is *no longer* a performance regression in my opinion.
>>
>> [1]: https://lore.kernel.org/git/9a15c7ba-8b55-099a-3c59-b5e7ff6124f6@gmail.com/
>> [2]: https://lore.kernel.org/git/20200607193237.699335-5-abhishekkumar8222@gmail.com/
>
> Interesting, I measured 0.870s with v2, still a notable increase from
> 0.630s [a change of +38%].

I wonder what might be the cause for this difference.  Is it difference
in hardware (faster memory, larger CPU cache?), difference in operating
system, or difference in position of HEAD?

On one hand it is large relative difference.  On the other hand it is
almost unnoticeable absolute difference of 0.25s.


I also wonder how the performance changes (with moving commit-graph data
to the slab) for commands that do not use this data, like e.g.:

  $ git -o core.commitGraph=false merge-base --is-ancestor HEAD~50000 HEAD

or

  $ git gc


Sidenote: I think the performance changes should be mentioned at least
in the cover letter for the series, if not in commit message(s).

Best,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v2 0/4] Move generation, graph_pos to a slab
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
                     ` (3 preceding siblings ...)
  2020-06-07 19:32   ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
@ 2020-06-08 16:22   ` Jakub Narębski
  2020-06-15 16:24   ` Taylor Blau
  5 siblings, 0 replies; 39+ messages in thread
From: Jakub Narębski @ 2020-06-08 16:22 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, Derrick Stolee

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> The struct commit is used in many contexts. However, members
> `generation` and `graph_pos` are only used for commit graph related
> operations and otherwise waste memory.
>
> This wastage would have been more pronounced as we transition to
> generation number v2, which uses 64-bite generation number instead of
> current 32-bits.

It would be nice (though not required) to have some specific data:
benchmarks with time and memory (RSS maybe?) of Git commands using
commit-graph and those not using it, before and after this patch series.
Maybe time to run the test suite, or the perf suite...

But this is not a show stopper, in my opinion.

Best,

  Jakub Narębski


> Abhishek Kumar (4):
>   commit-graph: introduce commit_graph_data_slab
>   commit: move members graph_pos, generation to a slab
>   commit-graph: use generation directly when writing commit-graph
>   commit-graph: minimize commit_graph_data_slab access
>
>  alloc.c                         |   2 -
>  blame.c                         |   2 +-
>  bloom.c                         |   7 +-
>  commit-graph.c                  | 127 ++++++++++++++++++++++++--------
>  commit-graph.h                  |  10 +++
>  commit-reach.c                  |  69 ++++++++++-------
>  commit.c                        |   8 +-
>  contrib/coccinelle/commit.cocci |  18 +++++
>  revision.c                      |  20 +++--
>  9 files changed, 190 insertions(+), 73 deletions(-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph
  2020-06-07 19:32   ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar
@ 2020-06-08 16:31     ` Jakub Narębski
  2020-06-15 16:31       ` Taylor Blau
  0 siblings, 1 reply; 39+ messages in thread
From: Jakub Narębski @ 2020-06-08 16:31 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, Derrick Stolee

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

> commit_graph_generation() returns GENERATION_NUMBER_INFINITY if the
> graph position for commit is COMMIT_NOT_FROM_GRAPH.
>
> While this is true when reading from a commit graph, no graph positions
> are associated with a commit when writing a commit graph. Therefore, the
> helper incorrectly returns GENERATION_NUMBER_INFINITY despite having a
> finite generation number.
>
> Let's fix this by using generation number directly when writing a commit
> graph.

I think that to avoid having non-working patch (which can cause problems
when bisecting), it would be a better idea to switch the order of
patches 2 and 3.  This way we won't have incorrect behaviour.

>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  commit-graph.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index f7cca4def4..0dc79e7c90 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  		else
>  			packedDate[0] = 0;
>  
> -		packedDate[0] |= htonl(commit_graph_generation((*list)) << 2);
> +		packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2);
>

All right.

>  		packedDate[1] = htonl((*list)->date);
>  		hashwrite(f, packedDate, 8);
> @@ -1301,9 +1301,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  					_("Computing commit graph generation numbers"),
>  					ctx->commits.nr);
>  	for (i = 0; i < ctx->commits.nr; i++) {
> +		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
> +
>  		display_progress(ctx->progress, i + 1);
> -		if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
> -		    commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
> +		if (generation != GENERATION_NUMBER_INFINITY &&
> +		    generation != GENERATION_NUMBER_ZERO)
>  			continue;
>

All right; this also introduces local variable to avoid accessing the
slab twice^W four times...

>  		commit_list_insert(ctx->commits.list[i], &list);
> @@ -1314,8 +1316,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  			uint32_t max_generation = 0;
>  
>  			for (parent = current->parents; parent; parent = parent->next) {
> -				if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY ||
> -				    commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) {
> +
> +				if (generation == GENERATION_NUMBER_INFINITY ||
> +				    generation == GENERATION_NUMBER_ZERO) {
>  					all_parents_computed = 0;
>  					commit_list_insert(parent->item, &list);
>  					break;

... which is then used here.

Best,
-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSoC Patch 0/3] Move generation, graph_pos to a slab
  2020-06-08 13:45         ` Derrick Stolee
@ 2020-06-08 16:46           ` SZEDER Gábor
  0 siblings, 0 replies; 39+ messages in thread
From: SZEDER Gábor @ 2020-06-08 16:46 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Abhishek Kumar, jnareb, git

On Mon, Jun 08, 2020 at 09:45:12AM -0400, Derrick Stolee wrote:
> On 6/8/2020 4:36 AM, SZEDER Gábor wrote:
> > On Mon, Jun 08, 2020 at 11:18:27AM +0530, Abhishek Kumar wrote:
> >> On Sun, Jun 07, 2020 at 09:53:47PM +0200, SZEDER Gábor wrote:
> >>> On Thu, Jun 04, 2020 at 10:22:27AM -0400, Derrick Stolee wrote:
> >>>> On 6/4/2020 3:27 AM, Abhishek Kumar wrote:
> >>>>> The struct commit is used in many contexts. However, members generation
> >>>>> and graph_pos are only used for commit-graph related operations and
> >>>>> otherwise waste memory.
> >>>>>
> >>>>> This wastage would have been more pronounced as transistion to
> >>>>> generation number v2, which uses 64-bit generation number instead of
> >>>>> current 32-bits.
> >>>>
> >>>> Thanks! This is an important step, and will already improve
> >>>> performance in subtle ways.
> >>>
> >>> While the reduced memory footprint of each commit object might improve
> >>> performance, accessing graph position and generation numbers in a
> >>> commit-slab is more expensive than direct field accesses in 'struct
> >>> commit' instances.  Consequently, these patches increase the runtime
> >>> of 'git merge-base --is-ancestor HEAD~50000 HEAD' in the linux
> >>> repository from 0.630s to 0.940s.
> >>>
> >>
> >> Thank you for checking performance. Performance penalty was something we
> >> had discussed here [1]. 
> >>
> >> Caching the commit slab results in local variables helped wonderfully in v2 [2].
> >> For example, the runtime of 'git merge-base --is-ancestor HEAD~50000 HEAD'
> >> in the linux repository increased from 0.762 to 0.767s. Since this is a
> >> change of <1%, it is *no longer* a performance regression in my opinion.
> > 
> > Interesting, I measured 0.870s with v2, still a notable increase from
> > 0.630s.
> 
> This is an interesting point. The --is-ancestor is critical to the
> performance issue (as measured on my machine).
> 
> For "git merge-base HEAD~50000 HEAD" on the Linux repo, I get
> 
> v2.27.0:
> real    0m0.515s
> user    0m0.467s
> sys     0m0.048s
> 
> v2 series:
> real    0m0.534s
> user    0m0.481s
> sys     0m0.053s

I, too, see similarly small differences in this case.

> With "--is-ancestor" I see the following:
> 
> v2.27.0:
> real    0m0.591s
> user    0m0.539s
> sys     0m0.052s
> 
> v2 series:
> real    0m0.773s
> user    0m0.733s
> sys     0m0.040s
> 
> The --is-ancestor option [1] says
> 
>     Check if the first <commit> is an ancestor of the second
>     <commit>, and exit with status 0 if true, or with status
>     1 if not. Errors are signaled by a non-zero status that
>     is not 1.
> 
> [1] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor
> 
> This _should_ be faster than "git branch --contains HEAD~50000",
> but it is much much slower:
> 
> $ time git branch --contains HEAD~50000
> real    0m0.068s
> user    0m0.061s
> sys     0m0.008s
> 
> So, there is definitely something going on that slows the
> "--is-ancestor" path in this case. But, the solution is not
> to halt the current patch (which likely has memory footprint
> benefits when dealing with a lot of tree and blob objects)
> and instead fix the underlying algorithm.

Other, more common cases are affected as well, notably the simple 'git
rev-list --topo-order':

  performance: 1.226479734 s: git command: /home/szeder/src/git/BUILDS/v2.27.0/bin/git rev-list --topo-order HEAD
  max RSS: 162400k
  
  performance: 1.741309536 s: git command: /home/szeder/src/git/git rev-list --topo-order HEAD
  max RSS: 169556k

Is the supposed memory footprint reduction that large to justify this
runtime increase?

> Let's add that to the list of things to do.

And to the commit messages.

> >>>  create mode 100644 contrib/coccinelle/generation.cocci
> >>>  create mode 100644 contrib/coccinelle/graph_pos.cocci
> >>
> >> I appreciate the Coccinelle scripts to help identify
> >> automatic fixes for other topics in-flight. However,
> >> I wonder if they would be better placed inside the
> >> existing commit.cocci file?
> >
> > We add Coccinelle scripts to avoid undesirable code patterns entering
> > our code base.  That, however, is not the case here: this is a
> > one-time conversion, and at the end of this series 'struct commit'
> > won't have a 'generation' field anymore, so once it's merged the
> > compiler will catch any new 'commit->generation' accesses.  Therefore
> > I don't think that these Coccinelle scripts should be added at all.
> 
> I disagree. We _also_ add Coccinelle scripts when doing one-time
> refactors to avoid logical merge conflicts with other topics in
> flight. If someone else is working on a parallel topic that adds
> references to graph_pos or generation member, then the scripts provide
> an easy way for the maintainer to update those references in the merge
> commit. Alternatively, the contributor could rebase on top of this
> series and run the scripts themselves to fix their patches before
> submission.
> 
> For example, this was done carefully in the sha->object_id
> conversion using contrib/coccinelle/object_id.cocci.

'object_id.cocci' is not about sha->object_id conversions, but about
avoiding undesirable code patterns, e.g. we prefer oideq() over
!oidcmp(), and the compiler, of course, can't help to catch that.
Coccinelle scripts used for actual sha->object_id transformations were
not added to 'object_id.cocci', but were recorded only in the commit
messages for reference, see e.g.  9b56149996 (merge-recursive: convert
struct merge_file_info to object_id, 2016-06-24) and a couple of its
ancestors.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v2 0/4] Move generation, graph_pos to a slab
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
                     ` (4 preceding siblings ...)
  2020-06-08 16:22   ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski
@ 2020-06-15 16:24   ` Taylor Blau
  5 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2020-06-15 16:24 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, jnareb, stolee

Hi Abhishek,

I am so excited that you are working on this, and welcome to Git! This
change will make a meaningful difference for us at GitHub (who use
commit graphs extensively), and it sounds like they will even make a
larger difference with your later changes.

I was out of the office last week, and so I hadn't gotten a chance to
review the first version of this series, but I'll review v2 now...

On Mon, Jun 08, 2020 at 01:02:33AM +0530, Abhishek Kumar wrote:
> The struct commit is used in many contexts. However, members
> `generation` and `graph_pos` are only used for commit graph related
> operations and otherwise waste memory.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab
  2020-06-07 19:32   ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
@ 2020-06-15 16:27     ` Taylor Blau
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2020-06-15 16:27 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, jnareb, stolee

On Mon, Jun 08, 2020 at 01:02:34AM +0530, Abhishek Kumar wrote:
> The struct commit is used in many contexts. However, members
> `generation` and `graph_pos` are only used for commit-graph related
> operations and otherwise waste memory.
>
> As they are often accessed together, let's introduce struct
> commit_graph_data and move them to a commit_graph_data slab.
>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  commit-graph.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  commit-graph.h | 10 ++++++++++
>  2 files changed, 59 insertions(+)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index e3420ddcbf..7d887a6a2c 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -87,6 +87,55 @@ static int commit_pos_cmp(const void *va, const void *vb)
>  	       commit_pos_at(&commit_pos, b);
>  }
>
> +define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
> +static struct commit_graph_data_slab commit_graph_data_slab =
> +	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
> +
> +uint32_t commit_graph_position(const struct commit *c)
> +{
> +	struct commit_graph_data *data =
> +		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
> +
> +	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
> +}
> +
> +uint32_t commit_graph_generation(const struct commit *c)
> +{
> +	struct commit_graph_data *data =
> +		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
> +
> +	if (!data)
> +		return GENERATION_NUMBER_INFINITY;
> +	if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
> +		return GENERATION_NUMBER_INFINITY;
> +
> +	return data->generation;
> +}
> +
> +static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
> +{
> +	uint32_t i = commit_graph_data_slab.slab_count, j;
> +	uint32_t slab_size = commit_graph_data_slab.slab_size;
> +	struct commit_graph_data *data =
> +		commit_graph_data_slab_at(&commit_graph_data_slab, c);
> +
> +	/*
> +	 * commit-slab initializes elements with zero, overwrite this with
> +	 * COMMIT_NOT_FROM_GRAPH for graph_pos.
> +	 *
> +	 * We avoid the cost of initializing `generation` as generation
> +	 * number would be GENERATION_NUMBER_INFINITY if graph position
> +	 * is COMMIT_NOT_FROM_GRAPH.
> +	 */
> +	for (; i < commit_graph_data_slab.slab_count; i++) {
> +		for (j = 0; j < slab_size; j++) {
> +			commit_graph_data_slab.slab[i][j].graph_pos = COMMIT_NOT_FROM_GRAPH;
> +		}
> +	}
> +
> +	return data;
> +}
> +

It looks like this function ('commit_graph_data_at') is unused in this
patch. No big deal, especially since I can see that you are using it in
the second patch, but I think that you should remove it from this patch
and instead introduce it there.

In fact, this causes a build error (with 'make DEVELOPER=1') because of
'-Wunused-function'. It's good practice to "git rebase -x 'make
DEVELOPER=1' origin/master" before sending.

>  static int commit_gen_cmp(const void *va, const void *vb)
>  {
>  	const struct commit *a = *(const struct commit **)va;
> diff --git a/commit-graph.h b/commit-graph.h
> index 4212766a4f..9d22f98f44 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -137,4 +137,14 @@ void free_commit_graph(struct commit_graph *);
>   */
>  void disable_commit_graph(struct repository *r);
>
> +struct commit_graph_data {
> +	uint32_t graph_pos;
> +	uint32_t generation;
> +};
> +
> +/*
> + * Commits should be parsed before accessing generation, graph positions.
> + */
> +uint32_t commit_graph_generation(const struct commit *);
> +uint32_t commit_graph_position(const struct commit *);
>  #endif
> --
> 2.27.0

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph
  2020-06-08 16:31     ` Jakub Narębski
@ 2020-06-15 16:31       ` Taylor Blau
  0 siblings, 0 replies; 39+ messages in thread
From: Taylor Blau @ 2020-06-15 16:31 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Abhishek Kumar, git, Derrick Stolee

On Mon, Jun 08, 2020 at 06:31:49PM +0200, Jakub Narębski wrote:
> Abhishek Kumar <abhishekkumar8222@gmail.com> writes:
>
> > commit_graph_generation() returns GENERATION_NUMBER_INFINITY if the
> > graph position for commit is COMMIT_NOT_FROM_GRAPH.
> >
> > While this is true when reading from a commit graph, no graph positions
> > are associated with a commit when writing a commit graph. Therefore, the
> > helper incorrectly returns GENERATION_NUMBER_INFINITY despite having a
> > finite generation number.
> >
> > Let's fix this by using generation number directly when writing a commit
> > graph.
>
> I think that to avoid having non-working patch (which can cause problems
> when bisecting), it would be a better idea to switch the order of
> patches 2 and 3.  This way we won't have incorrect behaviour.

Yup... agreed (with the additional caveat that the unused function in
the first patch also be moved into this new--larger--patch).

> >
> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> > ---
> >  commit-graph.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index f7cca4def4..0dc79e7c90 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -1070,7 +1070,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> >  		else
> >  			packedDate[0] = 0;
> >
> > -		packedDate[0] |= htonl(commit_graph_generation((*list)) << 2);
> > +		packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2);
> >
>
> All right.
>
> >  		packedDate[1] = htonl((*list)->date);
> >  		hashwrite(f, packedDate, 8);
> > @@ -1301,9 +1301,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  					_("Computing commit graph generation numbers"),
> >  					ctx->commits.nr);
> >  	for (i = 0; i < ctx->commits.nr; i++) {
> > +		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
> > +
> >  		display_progress(ctx->progress, i + 1);
> > -		if (commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_INFINITY &&
> > -		    commit_graph_generation(ctx->commits.list[i]) != GENERATION_NUMBER_ZERO)
> > +		if (generation != GENERATION_NUMBER_INFINITY &&
> > +		    generation != GENERATION_NUMBER_ZERO)
> >  			continue;
> >
>
> All right; this also introduces local variable to avoid accessing the
> slab twice^W four times...
>
> >  		commit_list_insert(ctx->commits.list[i], &list);
> > @@ -1314,8 +1316,9 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
> >  			uint32_t max_generation = 0;
> >
> >  			for (parent = current->parents; parent; parent = parent->next) {
> > -				if (commit_graph_generation(parent->item) == GENERATION_NUMBER_INFINITY ||
> > -				    commit_graph_generation(parent->item) == GENERATION_NUMBER_ZERO) {
> > +
> > +				if (generation == GENERATION_NUMBER_INFINITY ||
> > +				    generation == GENERATION_NUMBER_ZERO) {
> >  					all_parents_computed = 0;
> >  					commit_list_insert(parent->item, &list);
> >  					break;
>
> ... which is then used here.
>
> Best,
> --
> Jakub Narębski
Thanks,
Taylor

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [GSOC Patch v4 0/4] Move generation, graph_pos to a slab
  2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
                   ` (5 preceding siblings ...)
  2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
@ 2020-06-17  9:14 ` Abhishek Kumar
  2020-06-17  9:14   ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar
                     ` (4 more replies)
  6 siblings, 5 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-17  9:14 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

The struct commit is used in many contexts. However, members
`generation` and `graph_pos` are only used for commit graph related
operations and otherwise waste memory.

This wastage would have been more pronounced as we transition to
generation nuber v2, which uses 64-bit generation number instead of
current 32-bits.

While the overall test suite runs as fast as master
(series: 26m48s, master: 27m34s, faster by 2.87%), certain commands
like `git merge-base --is-ancestor` were slowed by 40% as discovered
by Szeder Gábor [1]. After minimizing commit-slab access, the slow down
persists but is closer to 20%.

Derrick Stolee believes the slow down is attributable to the underlying
algorithm rather than the slowness of commit-slab access [2] and we will
follow-up in a later series.

Abhishek Kumar (4):
  object: drop parsed_object_pool->commit_count
  commit-graph: introduce commit_graph_data_slab
  commit: move members graph_pos, generation to a slab
  commit-graph: minimize commit_graph_data_slab access

 alloc.c                         |  18 +++--
 alloc.h                         |   2 +-
 blame.c                         |   2 +-
 blob.c                          |   2 +-
 bloom.c                         |   7 +-
 builtin/commit-graph.c          |   2 +-
 builtin/fsck.c                  |   2 +-
 commit-graph.c                  | 130 ++++++++++++++++++++++++--------
 commit-graph.h                  |  10 +++
 commit-reach.c                  |  69 ++++++++++-------
 commit.c                        |  12 +--
 commit.h                        |   2 -
 contrib/coccinelle/commit.cocci |  18 +++++
 object.c                        |   4 +-
 object.h                        |   3 +-
 refs.c                          |   2 +-
 revision.c                      |  20 +++--
 t/helper/test-reach.c           |   2 +-
 tag.c                           |   2 +-
 tree.c                          |   2 +-
 20 files changed, 217 insertions(+), 94 deletions(-)

-- 
2.27.0

Changes in v4:
- Fix segfault while initializing commit_graph_data_slab.
- Fix the "commit index should have unique values" more cleanly

Changes in v3:
- Introduce alloc commit to fix the failing diff-submodule test.
- Elaborate on perforamnce and the slow down in commit message

Changes in v2:
- Introduce struct commit_graph_data
- Merge `graph_pos`, `generation` slabs into a single,
  `commit_graph_data` slab.
- Use graph position for an intermediate check for generation, saving
  the cost of initializing generation numbers.
- Add an follow-up patch caching results of slab access in local
  variables.
- Move coccinelle transformation to commit.coccinelle instead of
  creating new scripts.
- Elaborate on removing default values from init_commit_node().
- Revert moving macro constants (e.g. COMMIT_NOT_FROM_GRAPH,
  GENERATION_NUMBER_ZERO0 from commit.h to commit-graph.h

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count
  2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
@ 2020-06-17  9:14   ` Abhishek Kumar
  2020-06-17  9:14   ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-17  9:14 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.

However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.

Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.

As commit_count has no use anymore, let's also drop it from the struct.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 alloc.c                | 16 +++++++++++-----
 alloc.h                |  2 +-
 blob.c                 |  2 +-
 builtin/commit-graph.c |  2 +-
 builtin/fsck.c         |  2 +-
 commit.c               |  4 ++--
 object.c               |  4 ++--
 object.h               |  3 +--
 refs.c                 |  2 +-
 t/helper/test-reach.c  |  2 +-
 tag.c                  |  2 +-
 tree.c                 |  2 +-
 12 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/alloc.c b/alloc.c
index 1c64c4dd16..99fa934b32 100644
--- a/alloc.c
+++ b/alloc.c
@@ -99,15 +99,21 @@ void *alloc_object_node(struct repository *r)
 	return obj;
 }
 
-static unsigned int alloc_commit_index(struct repository *r)
+/*
+ * The returned count is to be used as an index into commit slabs,
+ * that are *NOT* maintained per repository, and that is why a single
+ * global counter is used.
+ */
+static unsigned int alloc_commit_index(void)
 {
-	return r->parsed_objects->commit_count++;
+	static unsigned int parsed_commits_count;
+	return parsed_commits_count++;
 }
 
-void init_commit_node(struct repository *r, struct commit *c)
+void init_commit_node(struct commit *c)
 {
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index(r);
+	c->index = alloc_commit_index();
 	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
 	c->generation = GENERATION_NUMBER_INFINITY;
 }
@@ -115,7 +121,7 @@ void init_commit_node(struct repository *r, struct commit *c)
 void *alloc_commit_node(struct repository *r)
 {
 	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
-	init_commit_node(r, c);
+	init_commit_node(c);
 	return c;
 }
 
diff --git a/alloc.h b/alloc.h
index ed1071c11e..371d388b55 100644
--- a/alloc.h
+++ b/alloc.h
@@ -9,7 +9,7 @@ struct repository;
 
 void *alloc_blob_node(struct repository *r);
 void *alloc_tree_node(struct repository *r);
-void init_commit_node(struct repository *r, struct commit *c);
+void init_commit_node(struct commit *c);
 void *alloc_commit_node(struct repository *r);
 void *alloc_tag_node(struct repository *r);
 void *alloc_object_node(struct repository *r);
diff --git a/blob.c b/blob.c
index 36f9abda19..182718aba9 100644
--- a/blob.c
+++ b/blob.c
@@ -10,7 +10,7 @@ struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_blob_node(r));
-	return object_as_type(r, obj, OBJ_BLOB, 0);
+	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 75455da138..f6797e2a9f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -154,7 +154,7 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
 			   NULL, 0);
 	if (!result)
 		return error(_("invalid object: %s"), hash);
-	else if (object_as_type(the_repository, result, OBJ_COMMIT, 1))
+	else if (object_as_type(result, OBJ_COMMIT, 1))
 		oidset_insert(commits, &result->oid);
 
 	display_progress(progress, oidset_size(commits));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index f02cbdb439..b2cef01389 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -241,7 +241,7 @@ static void mark_unreachable_referents(const struct object_id *oid)
 		enum object_type type = oid_object_info(the_repository,
 							&obj->oid, NULL);
 		if (type > 0)
-			object_as_type(the_repository, obj, type, 0);
+			object_as_type(obj, type, 0);
 	}
 
 	options.walk = mark_used;
diff --git a/commit.c b/commit.c
index 87686a7055..b30875e66b 100644
--- a/commit.c
+++ b/commit.c
@@ -37,7 +37,7 @@ struct commit *lookup_commit_reference_gently(struct repository *r,
 
 	if (!obj)
 		return NULL;
-	return object_as_type(r, obj, OBJ_COMMIT, quiet);
+	return object_as_type(obj, OBJ_COMMIT, quiet);
 }
 
 struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
@@ -62,7 +62,7 @@ struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_commit_node(r));
-	return object_as_type(r, obj, OBJ_COMMIT, 0);
+	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/object.c b/object.c
index 794c86650e..3257518656 100644
--- a/object.c
+++ b/object.c
@@ -157,13 +157,13 @@ void *create_object(struct repository *r, const struct object_id *oid, void *o)
 	return obj;
 }
 
-void *object_as_type(struct repository *r, struct object *obj, enum object_type type, int quiet)
+void *object_as_type(struct object *obj, enum object_type type, int quiet)
 {
 	if (obj->type == type)
 		return obj;
 	else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
-			init_commit_node(r, (struct commit *) obj);
+			init_commit_node((struct commit *) obj);
 		else
 			obj->type = type;
 		return obj;
diff --git a/object.h b/object.h
index b22328b838..532d7d7f28 100644
--- a/object.h
+++ b/object.h
@@ -15,7 +15,6 @@ struct parsed_object_pool {
 	struct alloc_state *commit_state;
 	struct alloc_state *tag_state;
 	struct alloc_state *object_state;
-	unsigned commit_count;
 
 	/* parent substitutions from .git/info/grafts and .git/shallow */
 	struct commit_graft **grafts;
@@ -121,7 +120,7 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid);
 
 void *create_object(struct repository *r, const struct object_id *oid, void *obj);
 
-void *object_as_type(struct repository *r, struct object *obj, enum object_type type, int quiet);
+void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
 /*
  * Returns the object, having parsed it to find out what it is.
diff --git a/refs.c b/refs.c
index 224ff66c7b..1f551dd279 100644
--- a/refs.c
+++ b/refs.c
@@ -339,7 +339,7 @@ enum peel_status peel_object(const struct object_id *name, struct object_id *oid
 
 	if (o->type == OBJ_NONE) {
 		int type = oid_object_info(the_repository, name, NULL);
-		if (type < 0 || !object_as_type(the_repository, o, type, 0))
+		if (type < 0 || !object_as_type(o, type, 0))
 			return PEEL_INVALID;
 	}
 
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index a0272178b7..ccf837cb33 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -67,7 +67,7 @@ int cmd__reach(int ac, const char **av)
 			die("failed to load commit for input %s resulting in oid %s\n",
 			    buf.buf, oid_to_hex(&oid));
 
-		c = object_as_type(r, peeled, OBJ_COMMIT, 0);
+		c = object_as_type(peeled, OBJ_COMMIT, 0);
 
 		if (!c)
 			die("failed to load commit for input %s resulting in oid %s\n",
diff --git a/tag.c b/tag.c
index 71b544467e..1ed2684e45 100644
--- a/tag.c
+++ b/tag.c
@@ -103,7 +103,7 @@ struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tag_node(r));
-	return object_as_type(r, obj, OBJ_TAG, 0);
+	return object_as_type(obj, OBJ_TAG, 0);
 }
 
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
diff --git a/tree.c b/tree.c
index 1466bcc6a8..e76517f6b1 100644
--- a/tree.c
+++ b/tree.c
@@ -200,7 +200,7 @@ struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tree_node(r));
-	return object_as_type(r, obj, OBJ_TREE, 0);
+	return object_as_type(obj, OBJ_TREE, 0);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab
  2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
  2020-06-17  9:14   ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar
@ 2020-06-17  9:14   ` Abhishek Kumar
  2020-06-17  9:14   ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-17  9:14 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

The struct commit is used in many contexts. However, members
`generation` and `graph_pos` are only used for commit-graph related
operations and otherwise waste memory.

This wastage would have been more pronounced as we transition to
generation number v2, which uses 64-bit generation number instead of
current 32-bits.

As they are often accessed together, let's introduce struct
commit_graph_data and move them to a commit_graph_data slab.

While the overall test suite runs just as fast as master,
(series: 26m48s, master: 27m34s, faster by 2.87%), certain commands
like `git merge-base --is-ancestor` were slowed by 40% as discovered
by Szeder Gábor [1]. After minimizing commit-slab access, the slow down
persists but is closer to 20%.

Derrick Stolee believes the slow down is attributable to the underlying
algorithm rather than the slowness of commit-slab access [2] and we will
follow-up in a later series.

[1]: https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2]: https://lore.kernel.org/git/13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com/

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---

On linux.git with HEAD at 08bf1a27 (Merge tag 'powerpc-5.8-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux,
2020-06-13):

`git merge-base --is-ancestor HEAD~50000 HEAD`
Time (master):    0.787s
Time (series):    0.927s
Change:           17.79%    (slower)

Max RSS (master): 177694kb
Max RSS (series): 177707kb
Change:           0.01%     (more)

`git gc`
Time (master):    3m55s
Time (series):    3m38s
Change:           7.23%     (faster)

Max RSS (master): 4889868kb
Max RSS (series): 4911960kb
Change:           0.45%     (more)

Earlier implementation of commit_graph_data_at() was incorrect, as we
used to iterate from old slab count to new slab count - assuming all
intermediate slabs are allocated. This is incorrect as the slabs are
allocated only when there's a corresponding commit.

It now makes *two slab accesses* in the worst case, but it's okay since
the worst case occurs once nearly every (512kb / 8b) commits.

 commit-graph.c | 78 +++++++++++++++++++++++++++++++++++++++++++-------
 commit-graph.h | 10 +++++++
 2 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 2ff042fbf4..8ad7d202b2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -87,6 +87,58 @@ static int commit_pos_cmp(const void *va, const void *vb)
 	       commit_pos_at(&commit_pos, b);
 }
 
+define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
+static struct commit_graph_data_slab commit_graph_data_slab =
+	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
+
+uint32_t commit_graph_position(const struct commit *c)
+{
+	struct commit_graph_data *data =
+		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
+
+	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
+}
+
+uint32_t commit_graph_generation(const struct commit *c)
+{
+	struct commit_graph_data *data =
+		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
+
+	if (!data)
+		return GENERATION_NUMBER_INFINITY;
+	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
+		return GENERATION_NUMBER_INFINITY;
+
+	return data->generation;
+}
+
+static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
+{
+	unsigned int i, nth_slab;
+	struct commit_graph_data *data =
+		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
+
+	if (data)
+		return data;
+
+	nth_slab = c->index / commit_graph_data_slab.slab_size;
+	data = commit_graph_data_slab_at(&commit_graph_data_slab, c);
+
+	/*
+	 * commit-slab initializes elements with zero, overwrite this with
+	 * COMMIT_NOT_FROM_GRAPH for graph_pos.
+	 *
+	 * We avoid initializing generation with checking if graph position
+	 * is not COMMIT_NOT_FROM_GRAPH.
+	 */
+	for (i = 0; i < commit_graph_data_slab.slab_size; i++) {
+		commit_graph_data_slab.slab[nth_slab][i].graph_pos =
+			COMMIT_NOT_FROM_GRAPH;
+	}
+
+	return data;
+}
+
 static int commit_gen_cmp(const void *va, const void *vb)
 {
 	const struct commit *a = *(const struct commit **)va;
@@ -1020,7 +1072,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		else
 			packedDate[0] = 0;
 
-		packedDate[0] |= htonl((*list)->generation << 2);
+		packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2);
 
 		packedDate[1] = htonl((*list)->date);
 		hashwrite(f, packedDate, 8);
@@ -1251,9 +1303,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					_("Computing commit graph generation numbers"),
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
+		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
+
 		display_progress(ctx->progress, i + 1);
-		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
-		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
+		if (generation != GENERATION_NUMBER_INFINITY &&
+		    generation != GENERATION_NUMBER_ZERO)
 			continue;
 
 		commit_list_insert(ctx->commits.list[i], &list);
@@ -1264,22 +1318,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			uint32_t max_generation = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
-				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+				generation = commit_graph_data_at(parent->item)->generation;
+
+				if (generation == GENERATION_NUMBER_INFINITY ||
+				    generation == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
-				} else if (parent->item->generation > max_generation) {
-					max_generation = parent->item->generation;
+				} else if (generation > max_generation) {
+					max_generation = generation;
 				}
 			}
 
 			if (all_parents_computed) {
-				current->generation = max_generation + 1;
+				struct commit_graph_data *data = commit_graph_data_at(current);
+
+				data->generation = max_generation + 1;
 				pop_commit(&list);
 
-				if (current->generation > GENERATION_NUMBER_MAX)
-					current->generation = GENERATION_NUMBER_MAX;
+				if (data->generation > GENERATION_NUMBER_MAX)
+					data->generation = GENERATION_NUMBER_MAX;
 			}
 		}
 	}
diff --git a/commit-graph.h b/commit-graph.h
index 3ba0da1e5f..28f89cdf3e 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -135,4 +135,14 @@ void free_commit_graph(struct commit_graph *);
  */
 void disable_commit_graph(struct repository *r);
 
+struct commit_graph_data {
+	uint32_t graph_pos;
+	uint32_t generation;
+};
+
+/*
+ * Commits should be parsed before accessing generation, graph positions.
+ */
+uint32_t commit_graph_generation(const struct commit *);
+uint32_t commit_graph_position(const struct commit *);
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab
  2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
  2020-06-17  9:14   ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar
  2020-06-17  9:14   ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
@ 2020-06-17  9:14   ` Abhishek Kumar
  2020-06-17  9:14   ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
  2020-06-19 13:59   ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee
  4 siblings, 0 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-17  9:14 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

We remove members `graph_pos` and `generation` from the struct commit.
The default assignments in init_commit_node() are no longer valid,
which is fine as the slab helpers return appropriate default values and
the assignments are removed.

We will replace existing use of commit->generation and commit->graph_pos
by commit_graph_data_slab helpers using
`contrib/coccinelle/commit.cocci'.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 alloc.c                         |  2 --
 blame.c                         |  2 +-
 bloom.c                         |  6 ++--
 commit-graph.c                  | 40 +++++++++++++-------------
 commit-reach.c                  | 50 ++++++++++++++++-----------------
 commit.c                        |  6 ++--
 commit.h                        |  2 --
 contrib/coccinelle/commit.cocci | 18 ++++++++++++
 revision.c                      | 16 +++++------
 9 files changed, 78 insertions(+), 64 deletions(-)

diff --git a/alloc.c b/alloc.c
index 99fa934b32..957a0af362 100644
--- a/alloc.c
+++ b/alloc.c
@@ -114,8 +114,6 @@ void init_commit_node(struct commit *c)
 {
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index();
-	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
-	c->generation = GENERATION_NUMBER_INFINITY;
 }
 
 void *alloc_commit_node(struct repository *r)
diff --git a/blame.c b/blame.c
index da7e28800e..82fa16d658 100644
--- a/blame.c
+++ b/blame.c
@@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
 	if (!bd)
 		return 1;
 
-	if (origin->commit->generation == GENERATION_NUMBER_INFINITY)
+	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY)
 		return 1;
 
 	filter = get_bloom_filter(r, origin->commit, 0);
diff --git a/bloom.c b/bloom.c
index 6c7611847a..3062aafaba 100644
--- a/bloom.c
+++ b/bloom.c
@@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 {
 	uint32_t lex_pos, start_index, end_index;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (commit_graph_position(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
-	lex_pos = c->graph_pos - g->num_commits_in_base;
+	lex_pos = commit_graph_position(c) - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
 
@@ -193,7 +193,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 
 	if (!filter->data) {
 		load_commit_graph_info(r, c);
-		if (c->graph_pos != COMMIT_NOT_FROM_GRAPH &&
+		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH &&
 			r->objects->commit_graph->chunk_bloom_indexes) {
 			if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c))
 				return filter;
diff --git a/commit-graph.c b/commit-graph.c
index 8ad7d202b2..14cc7e931c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -145,9 +145,9 @@ static int commit_gen_cmp(const void *va, const void *vb)
 	const struct commit *b = *(const struct commit **)vb;
 
 	/* lower generation commits first */
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return -1;
-	else if (a->generation > b->generation)
+	else if (commit_graph_generation(a) > commit_graph_generation(b))
 		return 1;
 
 	/* use date as a heuristic when generations are equal */
@@ -722,7 +722,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 	c = lookup_commit(r, &oid);
 	if (!c)
 		die(_("could not find commit %s"), oid_to_hex(&oid));
-	c->graph_pos = pos;
+	commit_graph_data_at(c)->graph_pos = pos;
 	return &commit_list_insert(c, pptr)->next;
 }
 
@@ -736,8 +736,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
-	item->graph_pos = pos;
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	commit_graph_data_at(item)->graph_pos = pos;
+	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -766,7 +766,7 @@ static int fill_commit_in_graph(struct repository *r,
 	 * Store the "full" position, but then use the
 	 * "local" position for the rest of the calculation.
 	 */
-	item->graph_pos = pos;
+	commit_graph_data_at(item)->graph_pos = pos;
 	lex_index = pos - g->num_commits_in_base;
 
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
@@ -779,7 +779,7 @@ static int fill_commit_in_graph(struct repository *r,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 
 	pptr = &item->parents;
 
@@ -811,8 +811,8 @@ static int fill_commit_in_graph(struct repository *r,
 
 static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
 {
-	if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
-		*pos = item->graph_pos;
+	if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) {
+		*pos = commit_graph_position(item);
 		return 1;
 	} else {
 		struct commit_graph *cur_g = g;
@@ -868,11 +868,11 @@ static struct tree *load_tree_for_commit(struct repository *r,
 	struct object_id oid;
 	const unsigned char *commit_data;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (commit_graph_position(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	commit_data = g->chunk_commit_data +
-			GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base);
+			GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base);
 
 	hashcpy(oid.hash, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
@@ -886,7 +886,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r,
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
-	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
+	if (commit_graph_position(c) == COMMIT_NOT_FROM_GRAPH)
 		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
 
 	return load_tree_for_commit(r, g, (struct commit *)c);
@@ -1271,7 +1271,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 			continue;
 		if (ctx->split) {
 			if ((!parse_commit(commit) &&
-			     commit->graph_pos == COMMIT_NOT_FROM_GRAPH) ||
+			     commit_graph_position(commit) == COMMIT_NOT_FROM_GRAPH) ||
 			    flags == COMMIT_GRAPH_SPLIT_REPLACE)
 				add_missing_parents(ctx, commit);
 		} else if (!parse_commit_no_graph(commit))
@@ -1516,7 +1516,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
 			if (ctx->split) {
 				struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
-				if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH)
+				if (!c || commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
 					continue;
 			}
 
@@ -1550,7 +1550,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE &&
-		    ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH)
+		    commit_graph_position(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH)
 			continue;
 
 		if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE)
@@ -2337,8 +2337,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			if (graph_parents->item->generation > max_generation)
-				max_generation = graph_parents->item->generation;
+			if (commit_graph_generation(graph_parents->item) > max_generation)
+				max_generation = commit_graph_generation(graph_parents->item);
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2348,7 +2348,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 			graph_report(_("commit-graph parent list for commit %s terminates early"),
 				     oid_to_hex(&cur_oid));
 
-		if (!graph_commit->generation) {
+		if (!commit_graph_generation(graph_commit)) {
 			if (generation_zero == GENERATION_NUMBER_EXISTS)
 				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
 					     oid_to_hex(&cur_oid));
@@ -2368,10 +2368,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (max_generation == GENERATION_NUMBER_MAX)
 			max_generation--;
 
-		if (graph_commit->generation != max_generation + 1)
+		if (commit_graph_generation(graph_commit) != max_generation + 1)
 			graph_report(_("commit-graph generation for commit %s is %u != %u"),
 				     oid_to_hex(&cur_oid),
-				     graph_commit->generation,
+				     commit_graph_generation(graph_commit),
 				     max_generation + 1);
 
 		if (graph_commit->date != odb_commit->date)
diff --git a/commit-reach.c b/commit-reach.c
index 4ca7e706a1..3b2f863f5f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		struct commit_list *parents;
 		int flags;
 
-		if (min_generation && commit->generation > last_gen)
+		if (min_generation && commit_graph_generation(commit) > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
-			    commit->generation, last_gen,
+			    commit_graph_generation(commit), last_gen,
 			    oid_to_hex(&commit->object.oid));
-		last_gen = commit->generation;
+		last_gen = commit_graph_generation(commit);
 
-		if (commit->generation < min_generation)
+		if (commit_graph_generation(commit) < min_generation)
 			break;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
@@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 		repo_parse_commit(r, array[i]);
 	for (i = 0; i < cnt; i++) {
 		struct commit_list *common;
-		uint32_t min_generation = array[i]->generation;
+		uint32_t min_generation = commit_graph_generation(array[i]);
 
 		if (redundant[i])
 			continue;
@@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 			filled_index[filled] = j;
 			work[filled++] = array[j];
 
-			if (array[j]->generation < min_generation)
-				min_generation = array[j]->generation;
+			if (commit_graph_generation(array[j]) < min_generation)
+				min_generation = commit_graph_generation(array[j]);
 		}
 		common = paint_down_to_common(r, array[i], filled,
 					      work, min_generation);
@@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
-		if (reference[i]->generation < min_generation)
-			min_generation = reference[i]->generation;
+		if (commit_graph_generation(reference[i]) < min_generation)
+			min_generation = commit_graph_generation(reference[i]);
 	}
 
-	if (commit->generation > min_generation)
+	if (commit_graph_generation(commit) > min_generation)
 		return ret;
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     commit->generation);
+				     commit_graph_generation(commit));
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
@@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate,
 	/* Otherwise, we don't know; prepare to recurse */
 	parse_commit_or_die(candidate);
 
-	if (candidate->generation < cutoff)
+	if (commit_graph_generation(candidate) < cutoff)
 		return CONTAINS_NO;
 
 	return CONTAINS_UNKNOWN;
@@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	for (p = want; p; p = p->next) {
 		struct commit *c = p->item;
 		load_commit_graph_info(the_repository, c);
-		if (c->generation < cutoff)
-			cutoff = c->generation;
+		if (commit_graph_generation(c) < cutoff)
+			cutoff = commit_graph_generation(c);
 	}
 
 	result = contains_test(candidate, want, cache, cutoff);
@@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 	const struct commit *a = *(const struct commit * const *)_a;
 	const struct commit *b = *(const struct commit * const *)_b;
 
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return -1;
-	if (a->generation > b->generation)
+	if (commit_graph_generation(a) > commit_graph_generation(b))
 		return 1;
 	return 0;
 }
@@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 		list[nr_commits] = (struct commit *)from_one;
 		if (parse_commit(list[nr_commits]) ||
-		    list[nr_commits]->generation < min_generation) {
+		    commit_graph_generation(list[nr_commits]) < min_generation) {
 			result = 0;
 			goto cleanup;
 		}
@@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 					if (parse_commit(parent->item) ||
 					    parent->item->date < min_commit_date ||
-					    parent->item->generation < min_generation)
+					    commit_graph_generation(parent->item) < min_generation)
 						continue;
 
 					commit_list_insert(parent->item, &stack);
@@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
 
-			if (from_iter->item->generation < min_generation)
-				min_generation = from_iter->item->generation;
+			if (commit_graph_generation(from_iter->item) < min_generation)
+				min_generation = commit_graph_generation(from_iter->item);
 		}
 
 		from_iter = from_iter->next;
@@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
 
-			if (to_iter->item->generation < min_generation)
-				min_generation = to_iter->item->generation;
+			if (commit_graph_generation(to_iter->item) < min_generation)
+				min_generation = commit_graph_generation(to_iter->item);
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 		struct commit *c = *item;
 
 		parse_commit(c);
-		if (c->generation < min_generation)
-			min_generation = c->generation;
+		if (commit_graph_generation(c) < min_generation)
+			min_generation = commit_graph_generation(c);
 
 		if (!(c->object.flags & PARENT1)) {
 			c->object.flags |= PARENT1;
@@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 			parse_commit(p);
 
-			if (p->generation < min_generation)
+			if (commit_graph_generation(p) < min_generation)
 				continue;
 
 			if (p->object.flags & PARENT2)
diff --git a/commit.c b/commit.c
index b30875e66b..ed0917a2c7 100644
--- a/commit.c
+++ b/commit.c
@@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
 	if (commit->maybe_tree || !commit->object.parsed)
 		return commit->maybe_tree;
 
-	if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+	if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
 		return get_commit_tree_in_graph(r, commit);
 
 	return NULL;
@@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void
 	const struct commit *a = a_, *b = b_;
 
 	/* newer commits first */
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return 1;
-	else if (a->generation > b->generation)
+	else if (commit_graph_generation(a) > commit_graph_generation(b))
 		return -1;
 
 	/* use date as a heuristic when generations are equal */
diff --git a/commit.h b/commit.h
index 1b2dea5d85..e901538909 100644
--- a/commit.h
+++ b/commit.h
@@ -36,8 +36,6 @@ struct commit {
 	 * or get_commit_tree_oid().
 	 */
 	struct tree *maybe_tree;
-	uint32_t graph_pos;
-	uint32_t generation;
 	unsigned int index;
 };
 
diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
index 778e4704f6..af6dd4c20c 100644
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -32,3 +32,21 @@ expression c;
 - c->maybe_tree
 + repo_get_commit_tree(specify_the_right_repo_here, c)
   ...>}
+
+@@
+struct commit *c;
+expression E;
+@@
+(
+- c->generation = E;
++ commit_graph_data_at(c)->generation = E;
+|
+- c->graph_pos = E;
++ commit_graph_data_at(c)->graph_pos = E;
+|
+- c->generation
++ commit_graph_generation(c)
+|
+- c->graph_pos
++ commit_graph_position(c)
+)
diff --git a/revision.c b/revision.c
index ebb4d2a0f2..8648d7c43c 100644
--- a/revision.c
+++ b/revision.c
@@ -725,7 +725,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
 	if (!revs->repo->objects->commit_graph)
 		return -1;
 
-	if (commit->generation == GENERATION_NUMBER_INFINITY)
+	if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY)
 		return -1;
 
 	filter = get_bloom_filter(revs->repo, commit, 0);
@@ -3320,7 +3320,7 @@ static void explore_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->explore_queue)) &&
-	       c->generation >= gen_cutoff)
+	       commit_graph_generation(c) >= gen_cutoff)
 		explore_walk_step(revs);
 }
 
@@ -3336,7 +3336,7 @@ static void indegree_walk_step(struct rev_info *revs)
 	if (parse_commit_gently(c, 1) < 0)
 		return;
 
-	explore_to_depth(revs, c->generation);
+	explore_to_depth(revs, commit_graph_generation(c));
 
 	for (p = c->parents; p; p = p->next) {
 		struct commit *parent = p->item;
@@ -3360,7 +3360,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->indegree_queue)) &&
-	       c->generation >= gen_cutoff)
+	       commit_graph_generation(c) >= gen_cutoff)
 		indegree_walk_step(revs);
 }
 
@@ -3420,8 +3420,8 @@ static void init_topo_walk(struct rev_info *revs)
 		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
 		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
 
-		if (c->generation < info->min_generation)
-			info->min_generation = c->generation;
+		if (commit_graph_generation(c) < info->min_generation)
+			info->min_generation = commit_graph_generation(c);
 
 		*(indegree_slab_at(&info->indegree, c)) = 1;
 
@@ -3479,8 +3479,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-		if (parent->generation < info->min_generation) {
-			info->min_generation = parent->generation;
+		if (commit_graph_generation(parent) < info->min_generation) {
+			info->min_generation = commit_graph_generation(parent);
 			compute_indegrees_to_depth(revs, info->min_generation);
 		}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access
  2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
                     ` (2 preceding siblings ...)
  2020-06-17  9:14   ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
@ 2020-06-17  9:14   ` Abhishek Kumar
  2020-06-19 13:59   ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee
  4 siblings, 0 replies; 39+ messages in thread
From: Abhishek Kumar @ 2020-06-17  9:14 UTC (permalink / raw)
  To: git; +Cc: jnareb, stolee

In an earlier patch, multiple struct acccesses to `graph_pos` and
`generation` were auto-converted to multiple method calls.

Since the values are fixed and commit-slab access costly, we would be
better off with storing the values as a local variable and reusing it.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 bloom.c        |  5 +++--
 commit-graph.c | 40 ++++++++++++++++++++++------------
 commit-reach.c | 59 ++++++++++++++++++++++++++++++++------------------
 commit.c       |  6 +++--
 revision.c     | 12 ++++++----
 5 files changed, 79 insertions(+), 43 deletions(-)

diff --git a/bloom.c b/bloom.c
index 3062aafaba..6a7f2f2bdc 100644
--- a/bloom.c
+++ b/bloom.c
@@ -33,15 +33,16 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 					struct commit *c)
 {
 	uint32_t lex_pos, start_index, end_index;
+	uint32_t graph_pos = commit_graph_position(c);
 
-	while (commit_graph_position(c) < g->num_commits_in_base)
+	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
 
 	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
-	lex_pos = commit_graph_position(c) - g->num_commits_in_base;
+	lex_pos = graph_pos - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
 
diff --git a/commit-graph.c b/commit-graph.c
index 14cc7e931c..fdd1c4fa7c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -144,10 +144,12 @@ static int commit_gen_cmp(const void *va, const void *vb)
 	const struct commit *a = *(const struct commit **)va;
 	const struct commit *b = *(const struct commit **)vb;
 
+	uint32_t generation_a = commit_graph_generation(a);
+	uint32_t generation_b = commit_graph_generation(b);
 	/* lower generation commits first */
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	if (generation_a < generation_b)
 		return -1;
-	else if (commit_graph_generation(a) > commit_graph_generation(b))
+	else if (generation_a > generation_b)
 		return 1;
 
 	/* use date as a heuristic when generations are equal */
@@ -729,6 +731,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
 {
 	const unsigned char *commit_data;
+	struct commit_graph_data *graph_data;
 	uint32_t lex_index;
 
 	while (pos < g->num_commits_in_base)
@@ -736,8 +739,10 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
-	commit_graph_data_at(item)->graph_pos = pos;
-	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+
+	graph_data = commit_graph_data_at(item);
+	graph_data->graph_pos = pos;
+	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -753,6 +758,7 @@ static int fill_commit_in_graph(struct repository *r,
 	uint32_t *parent_data_ptr;
 	uint64_t date_low, date_high;
 	struct commit_list **pptr;
+	struct commit_graph_data *graph_data;
 	const unsigned char *commit_data;
 	uint32_t lex_index;
 
@@ -766,7 +772,8 @@ static int fill_commit_in_graph(struct repository *r,
 	 * Store the "full" position, but then use the
 	 * "local" position for the rest of the calculation.
 	 */
-	commit_graph_data_at(item)->graph_pos = pos;
+	graph_data = commit_graph_data_at(item);
+	graph_data->graph_pos = pos;
 	lex_index = pos - g->num_commits_in_base;
 
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
@@ -779,7 +786,7 @@ static int fill_commit_in_graph(struct repository *r,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 
 	pptr = &item->parents;
 
@@ -811,8 +818,9 @@ static int fill_commit_in_graph(struct repository *r,
 
 static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
 {
-	if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) {
-		*pos = commit_graph_position(item);
+	uint32_t graph_pos = commit_graph_position(item);
+	if (graph_pos != COMMIT_NOT_FROM_GRAPH) {
+		*pos = graph_pos;
 		return 1;
 	} else {
 		struct commit_graph *cur_g = g;
@@ -867,12 +875,13 @@ static struct tree *load_tree_for_commit(struct repository *r,
 {
 	struct object_id oid;
 	const unsigned char *commit_data;
+	uint32_t graph_pos = commit_graph_position(c);
 
-	while (commit_graph_position(c) < g->num_commits_in_base)
+	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
 
 	commit_data = g->chunk_commit_data +
-			GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base);
+			GRAPH_DATA_WIDTH * (graph_pos - g->num_commits_in_base);
 
 	hashcpy(oid.hash, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
@@ -2299,6 +2308,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
+		uint32_t generation;
 
 		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
@@ -2337,8 +2347,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			if (commit_graph_generation(graph_parents->item) > max_generation)
-				max_generation = commit_graph_generation(graph_parents->item);
+			generation = commit_graph_generation(graph_parents->item);
+			if (generation > max_generation)
+				max_generation = generation;
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2368,10 +2379,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (max_generation == GENERATION_NUMBER_MAX)
 			max_generation--;
 
-		if (commit_graph_generation(graph_commit) != max_generation + 1)
+		generation = commit_graph_generation(graph_commit);
+		if (generation != max_generation + 1)
 			graph_report(_("commit-graph generation for commit %s is %u != %u"),
 				     oid_to_hex(&cur_oid),
-				     commit_graph_generation(graph_commit),
+				     generation,
 				     max_generation + 1);
 
 		if (graph_commit->date != odb_commit->date)
diff --git a/commit-reach.c b/commit-reach.c
index 3b2f863f5f..f5e5c0a32b 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -58,14 +58,15 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		struct commit *commit = prio_queue_get(&queue);
 		struct commit_list *parents;
 		int flags;
+		uint32_t generation = commit_graph_generation(commit);
 
-		if (min_generation && commit_graph_generation(commit) > last_gen)
+		if (min_generation && generation > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
-			    commit_graph_generation(commit), last_gen,
+			    generation, last_gen,
 			    oid_to_hex(&commit->object.oid));
-		last_gen = commit_graph_generation(commit);
+		last_gen = generation;
 
-		if (commit_graph_generation(commit) < min_generation)
+		if (generation < min_generation)
 			break;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
@@ -181,13 +182,15 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 		if (redundant[i])
 			continue;
 		for (j = filled = 0; j < cnt; j++) {
+			uint32_t curr_generation;
 			if (i == j || redundant[j])
 				continue;
 			filled_index[filled] = j;
 			work[filled++] = array[j];
 
-			if (commit_graph_generation(array[j]) < min_generation)
-				min_generation = commit_graph_generation(array[j]);
+			curr_generation = commit_graph_generation(array[j]);
+			if (curr_generation < min_generation)
+				min_generation = curr_generation;
 		}
 		common = paint_down_to_common(r, array[i], filled,
 					      work, min_generation);
@@ -316,23 +319,26 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 {
 	struct commit_list *bases;
 	int ret = 0, i;
-	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+	uint32_t generation, min_generation = GENERATION_NUMBER_INFINITY;
 
 	if (repo_parse_commit(r, commit))
 		return ret;
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
-		if (commit_graph_generation(reference[i]) < min_generation)
-			min_generation = commit_graph_generation(reference[i]);
+
+		generation = commit_graph_generation(reference[i]);
+		if (generation < min_generation)
+			min_generation = generation;
 	}
 
-	if (commit_graph_generation(commit) > min_generation)
+	generation = commit_graph_generation(commit);
+	if (generation > min_generation)
 		return ret;
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     commit_graph_generation(commit));
+				     generation);
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
@@ -490,10 +496,12 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	const struct commit_list *p;
 
 	for (p = want; p; p = p->next) {
+		uint32_t generation;
 		struct commit *c = p->item;
 		load_commit_graph_info(the_repository, c);
-		if (commit_graph_generation(c) < cutoff)
-			cutoff = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < cutoff)
+			cutoff = generation;
 	}
 
 	result = contains_test(candidate, want, cache, cutoff);
@@ -544,9 +552,12 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 	const struct commit *a = *(const struct commit * const *)_a;
 	const struct commit *b = *(const struct commit * const *)_b;
 
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	uint32_t generation_a = commit_graph_generation(a);
+	uint32_t generation_b = commit_graph_generation(b);
+
+	if (generation_a < generation_b)
 		return -1;
-	if (commit_graph_generation(a) > commit_graph_generation(b))
+	if (generation_a > generation_b)
 		return 1;
 	return 0;
 }
@@ -662,11 +673,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 		add_object_array(&from_iter->item->object, NULL, &from_objs);
 
 		if (!parse_commit(from_iter->item)) {
+			uint32_t generation;
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
 
-			if (commit_graph_generation(from_iter->item) < min_generation)
-				min_generation = commit_graph_generation(from_iter->item);
+			generation = commit_graph_generation(from_iter->item);
+			if (generation < min_generation)
+				min_generation = generation;
 		}
 
 		from_iter = from_iter->next;
@@ -674,11 +687,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 
 	while (to_iter) {
 		if (!parse_commit(to_iter->item)) {
+			uint32_t generation;
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
 
-			if (commit_graph_generation(to_iter->item) < min_generation)
-				min_generation = commit_graph_generation(to_iter->item);
+			generation = commit_graph_generation(to_iter->item);
+			if (generation < min_generation)
+				min_generation = generation;
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -718,11 +733,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 
 	for (item = to; item < to_last; item++) {
+		uint32_t generation;
 		struct commit *c = *item;
 
 		parse_commit(c);
-		if (commit_graph_generation(c) < min_generation)
-			min_generation = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < min_generation)
+			min_generation = generation;
 
 		if (!(c->object.flags & PARENT1)) {
 			c->object.flags |= PARENT1;
diff --git a/commit.c b/commit.c
index ed0917a2c7..43d29a800d 100644
--- a/commit.c
+++ b/commit.c
@@ -729,11 +729,13 @@ int compare_commits_by_author_date(const void *a_, const void *b_,
 int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused)
 {
 	const struct commit *a = a_, *b = b_;
+	const uint32_t generation_a = commit_graph_generation(a),
+		       generation_b = commit_graph_generation(b);
 
 	/* newer commits first */
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	if (generation_a < generation_b)
 		return 1;
-	else if (commit_graph_generation(a) > commit_graph_generation(b))
+	else if (generation_a > generation_b)
 		return -1;
 
 	/* use date as a heuristic when generations are equal */
diff --git a/revision.c b/revision.c
index 8648d7c43c..32be93f404 100644
--- a/revision.c
+++ b/revision.c
@@ -3413,6 +3413,7 @@ static void init_topo_walk(struct rev_info *revs)
 	info->min_generation = GENERATION_NUMBER_INFINITY;
 	for (list = revs->commits; list; list = list->next) {
 		struct commit *c = list->item;
+		uint32_t generation;
 
 		if (parse_commit_gently(c, 1))
 			continue;
@@ -3420,8 +3421,9 @@ static void init_topo_walk(struct rev_info *revs)
 		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
 		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
 
-		if (commit_graph_generation(c) < info->min_generation)
-			info->min_generation = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < info->min_generation)
+			info->min_generation = generation;
 
 		*(indegree_slab_at(&info->indegree, c)) = 1;
 
@@ -3472,6 +3474,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 	for (p = commit->parents; p; p = p->next) {
 		struct commit *parent = p->item;
 		int *pi;
+		uint32_t generation;
 
 		if (parent->object.flags & UNINTERESTING)
 			continue;
@@ -3479,8 +3482,9 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-		if (commit_graph_generation(parent) < info->min_generation) {
-			info->min_generation = commit_graph_generation(parent);
+		generation = commit_graph_generation(parent);
+		if (generation < info->min_generation) {
+			info->min_generation = generation;
 			compute_indegrees_to_depth(revs, info->min_generation);
 		}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v4 0/4] Move generation, graph_pos to a slab
  2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
                     ` (3 preceding siblings ...)
  2020-06-17  9:14   ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
@ 2020-06-19 13:59   ` Derrick Stolee
  2020-06-19 17:44     ` Junio C Hamano
  4 siblings, 1 reply; 39+ messages in thread
From: Derrick Stolee @ 2020-06-19 13:59 UTC (permalink / raw)
  To: Abhishek Kumar, git; +Cc: jnareb, SZEDER Gábor, Taylor Blau

On 6/17/2020 5:14 AM, Abhishek Kumar wrote:
> The struct commit is used in many contexts. However, members
> `generation` and `graph_pos` are only used for commit graph related
> operations and otherwise waste memory.
> 
> This wastage would have been more pronounced as we transition to
> generation nuber v2, which uses 64-bit generation number instead of
> current 32-bits.

Thanks, Szeder (CC'd) for the quality review in the previous
versions. I manually built and tested all of the patches here
and verified they passed all tests.

I think this series is in good shape.

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [GSOC Patch v4 0/4] Move generation, graph_pos to a slab
  2020-06-19 13:59   ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee
@ 2020-06-19 17:44     ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2020-06-19 17:44 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Abhishek Kumar, git, jnareb, SZEDER Gábor, Taylor Blau

Derrick Stolee <stolee@gmail.com> writes:

> On 6/17/2020 5:14 AM, Abhishek Kumar wrote:
>> The struct commit is used in many contexts. However, members
>> `generation` and `graph_pos` are only used for commit graph related
>> operations and otherwise waste memory.
>> 
>> This wastage would have been more pronounced as we transition to
>> generation nuber v2, which uses 64-bit generation number instead of
>> current 32-bits.
>
> Thanks, Szeder (CC'd) for the quality review in the previous
> versions. I manually built and tested all of the patches here
> and verified they passed all tests.
>
> I think this series is in good shape.

Thank you to all who are involved in this topic.  Looking good.


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2020-06-19 17:44 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-04  7:27 [GSoC Patch 0/3] Move generation, graph_pos to a slab Abhishek Kumar
2020-06-04  7:27 ` [GSoC Patch 1/3] commit: introduce helpers for generation slab Abhishek Kumar
2020-06-04 14:36   ` Derrick Stolee
2020-06-04 17:35   ` Junio C Hamano
2020-06-05 23:23   ` Jakub Narębski
2020-06-04  7:27 ` [GSoC Patch 2/3] commit: convert commit->generation to a slab Abhishek Kumar
2020-06-04 14:27   ` Derrick Stolee
2020-06-04 17:49   ` Junio C Hamano
2020-06-06 22:03   ` Jakub Narębski
2020-06-04  7:27 ` [GSoC Patch 3/3] commit: convert commit->graph_pos " Abhishek Kumar
2020-06-07 12:12   ` Jakub Narębski
2020-06-04 14:22 ` [GSoC Patch 0/3] Move generation, graph_pos " Derrick Stolee
2020-06-04 17:55   ` Junio C Hamano
2020-06-07 19:53   ` SZEDER Gábor
2020-06-08  5:48     ` Abhishek Kumar
2020-06-08  8:36       ` SZEDER Gábor
2020-06-08 13:45         ` Derrick Stolee
2020-06-08 16:46           ` SZEDER Gábor
2020-06-08 15:21         ` Jakub Narębski
2020-06-05 19:00 ` Jakub Narębski
2020-06-07 19:32 ` [GSOC Patch v2 0/4] " Abhishek Kumar
2020-06-07 19:32   ` [GSOC Patch v2 1/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
2020-06-15 16:27     ` Taylor Blau
2020-06-07 19:32   ` [GSOC Patch v2 2/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
2020-06-08  8:26     ` SZEDER Gábor
2020-06-08 12:35       ` Derrick Stolee
2020-06-07 19:32   ` [GSOC Patch v2 3/4] commit-graph: use generation directly when writing commit-graph Abhishek Kumar
2020-06-08 16:31     ` Jakub Narębski
2020-06-15 16:31       ` Taylor Blau
2020-06-07 19:32   ` [GSOC Patch v2 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
2020-06-08 16:22   ` [GSOC Patch v2 0/4] Move generation, graph_pos to a slab Jakub Narębski
2020-06-15 16:24   ` Taylor Blau
2020-06-17  9:14 ` [GSOC Patch v4 " Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 1/4] object: drop parsed_object_pool->commit_count Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
2020-06-17  9:14   ` [GSOC Patch v4 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
2020-06-19 13:59   ` [GSOC Patch v4 0/4] Move generation, graph_pos to a slab Derrick Stolee
2020-06-19 17:44     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).