git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC Patch v3 0/4] Move generation, graph_pos to a slab
@ 2020-06-12 18:40 Abhishek Kumar
  2020-06-12 18:40 ` [PATCH v3 1/4] alloc: introduce parsed_commits_count Abhishek Kumar
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Abhishek Kumar @ 2020-06-12 18:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

The struct commit is used in many contexts. However, members
`generation` and `graph_pos` are only used for commit graph related
operations and otherwise waste memory.

This wastage would have been more pronounced as we transition to
generation number v2, which uses 64-bit generation number instead of
current 32-bits.

While the overall test suite runs slightly faster than master
(series: 27m10s, master: 27ms34s, faster by 2.35%), certain commands
like `git merge-base --is-ancestory` are slowed by nearly 40% as
discovered by SDEZER Gabor [1].

Derrick Stolee believes the slow down is attributable to the underlying
algorithm rather than the slowness of commit-slab access [2] and we will
follow-up on that in a later series.

I did not mention maximum RSS in the commit messages as they were nearly
identical (series: 68104kb, master: 68040kb, fewer by <0.1%). This leads
me to conclude that either the test using maximum memory involves commit
graph or did not involve the struct commit at all. The move to
commit-slab reduces memory footprint for the cases where struct commit
is used but members generation and graph position are not. Average RSS
would have been a good and more representative measure, but 
unfortunately time(1) could not measure it on my system.

With this, I feel the patch will require minor fixes, if any. I am
moving ahead with working the next step of "Implement Generation Number
v2" that is proper handling of commit-graph format change.

Based on the discussions, I feel we should compute both generation
number v1 and the date offset value with storing date offsets in a new
chunk as the cost is mostly from walking the commits.

Abhishek Kumar (4):
  alloc: introduce parsed_commits_count
  commit-graph: introduce commit_graph_data_slab
  commit: move members graph_pos, generation to a slab
  commit-graph: minimize commit_graph_data_slab access

 alloc.c                         |   6 +-
 blame.c                         |   2 +-
 bloom.c                         |   7 +-
 commit-graph.c                  | 122 ++++++++++++++++++++++++--------
 commit-graph.h                  |  10 +++
 commit-reach.c                  |  69 +++++++++++-------
 commit.c                        |   8 ++-
 contrib/coccinelle/commit.cocci |  18 +++++
 revision.c                      |  20 +++---
 9 files changed, 188 insertions(+), 74 deletions(-)

-- 
2.27.0

Changes in v3:
- Introduce alloc commit to fix the failing diff-submodule test.
- Elaborate on performance and slow down noticed in the commit message.

Changes in v2:
- Introduce struct commit_graph_data.
- Merge `graph_pos`, `generation` slabs into a single,
  `commit_graph_data` slab.
- Use graph position for an intermediate check for generation, saving
  the cost of initializing generation numbers.
- Add an follow-up patch caching results of slab access in local
  variables.
- Move coccinelle transformation to commit.coccinelle instead of
  creating new scripts.
- Elaborate on removing default values from init_commit_node().
- Revert moving macro constants (e.g. COMMIT_NOT_FROM_GRAPH,
  GENERATION_NUMBER_ZERO) from commit.h to commit-graph.h

CI Build: https://travis-ci.com/github/abhishekkumar2718/git  

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 1/4] alloc: introduce parsed_commits_count
  2020-06-12 18:40 [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Abhishek Kumar
@ 2020-06-12 18:40 ` Abhishek Kumar
  2020-06-12 20:08   ` Junio C Hamano
                     ` (2 more replies)
  2020-06-12 18:40 ` [PATCH v3 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 14+ messages in thread
From: Abhishek Kumar @ 2020-06-12 18:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

Commit slab relies on uniqueness of commit->index to access data. As
submodules are repositories on their own, alloc_commit_index() (which
depends on repository->parsed_objects->commit_count) no longer
returns unique values.

This would break tests once we move `generation` and `graph_pos` into a
commit slab, as commits of supermodule and submodule can have the same
index but must have different graph positions.

Let's introduce a counter variable, `parsed_commits_count` to keep track
of parsed commits so far.


Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---

CI Build for the failing tests:
https://travis-ci.com/github/abhishekkumar2718/git/jobs/345413840

 alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/alloc.c b/alloc.c
index 1c64c4dd16..29f0e3aa80 100644
--- a/alloc.c
+++ b/alloc.c
@@ -101,7 +101,9 @@ void *alloc_object_node(struct repository *r)
 
 static unsigned int alloc_commit_index(struct repository *r)
 {
-	return r->parsed_objects->commit_count++;
+	static unsigned int parsed_commits_count = 0;
+	r->parsed_objects->commit_count++;
+	return parsed_commits_count++;
 }
 
 void init_commit_node(struct repository *r, struct commit *c)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 2/4] commit-graph: introduce commit_graph_data_slab
  2020-06-12 18:40 [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Abhishek Kumar
  2020-06-12 18:40 ` [PATCH v3 1/4] alloc: introduce parsed_commits_count Abhishek Kumar
@ 2020-06-12 18:40 ` Abhishek Kumar
  2020-06-13  6:53   ` SZEDER Gábor
  2020-06-12 18:40 ` [PATCH v3 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Abhishek Kumar @ 2020-06-12 18:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

The struct commit is used in many contexts. However, members
`generation` and `graph_pos` are only used for commit-graph related
operations and otherwise waste memory.

This wastage would have been more pronounced as we transition to
generation number v2, which uses 64-bit generation number instead of
current 32-bits.

As they are often accessed together, let's introduce struct
commit_graph_data and move them to a commit_graph_data slab.

While the overall test suite runs just as fast as master,
(series: 27m10s, master: 27m34s), certain commands like
`git merge-base --is-ancestor` are slowed by nearly 40% as
discovered by SDEZER Gabor [1].

Derrick Stolee believes the slow down is attributable to the
underlying algorithm rather slowness of commit-slab access [2] and we will
follow-up on that in a later series.

[1]: https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2]: https://lore.kernel.org/git/13db757a-9412-7f1e-805c-8a028c4ab2b1@gmail.com/

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 commit-graph.h | 10 ++++++++++
 2 files changed, 59 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 2ff042fbf4..91120ba3d3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -87,6 +87,55 @@ static int commit_pos_cmp(const void *va, const void *vb)
 	       commit_pos_at(&commit_pos, b);
 }
 
+define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
+static struct commit_graph_data_slab commit_graph_data_slab =
+	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
+
+uint32_t commit_graph_position(const struct commit *c)
+{
+	struct commit_graph_data *data =
+		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
+
+	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
+}
+
+uint32_t commit_graph_generation(const struct commit *c)
+{
+	struct commit_graph_data *data =
+		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
+
+	if (!data)
+		return GENERATION_NUMBER_INFINITY;
+	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
+		return GENERATION_NUMBER_INFINITY;
+
+	return data->generation;
+}
+
+static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
+{
+	uint32_t i = commit_graph_data_slab.slab_count, j;
+	uint32_t slab_size = commit_graph_data_slab.slab_size;
+	struct commit_graph_data *data =
+		commit_graph_data_slab_at(&commit_graph_data_slab, c);
+
+	/*
+	 * commit-slab initializes elements with zero, overwrite this with
+	 * COMMIT_NOT_FROM_GRAPH for graph_pos.
+	 *
+	 * We avoid initializing generation with checking if graph position
+	 * is not COMMIT_NOT_FROM_GRAPH.
+	 */
+	for (; i < commit_graph_data_slab.slab_count; i++) {
+		for (j = 0; j < slab_size; j++) {
+			commit_graph_data_slab[i][j].graph_pos =
+				COMMIT_NOT_FROM_GRAPH;
+		}
+	}
+
+	return data;
+}
+
 static int commit_gen_cmp(const void *va, const void *vb)
 {
 	const struct commit *a = *(const struct commit **)va;
diff --git a/commit-graph.h b/commit-graph.h
index 3ba0da1e5f..cc76757007 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -135,4 +135,14 @@ void free_commit_graph(struct commit_graph *);
  */
 void disable_commit_graph(struct repository *r);
 
+struct commit_graph_data {
+	uint32_t graph_pos;
+	uint32_t generation;
+};
+
+/* 
+ * Commits should be parsed before accessing generation, graph positions.
+ */
+uint32_t commit_graph_generation(const struct commit *);
+uint32_t commit_graph_position(const struct commit *);
 #endif
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 3/4] commit: move members graph_pos, generation to a slab
  2020-06-12 18:40 [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Abhishek Kumar
  2020-06-12 18:40 ` [PATCH v3 1/4] alloc: introduce parsed_commits_count Abhishek Kumar
  2020-06-12 18:40 ` [PATCH v3 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
@ 2020-06-12 18:40 ` Abhishek Kumar
  2020-06-12 18:40 ` [PATCH v3 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
  2020-06-12 21:26 ` [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Jakub Narębski
  4 siblings, 0 replies; 14+ messages in thread
From: Abhishek Kumar @ 2020-06-12 18:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

We remove members `graph_pos` and `generation` from the struct commit.
The default assignments in init_commit_node() are no longer valid,
which is fine as the slab helpers return appropriate default values and
the assignments are removed.

We will replace existing use of commit->generation and commit->graph_pos
by commit_graph_data_slab helpers using
`contrib/coccinelle/commit.cocci'.

commit_graph_generation() returns GENERATION_NUMBER_INFINITY if the
graph position for commit is COMMIT_NOT_FROM_GRAPH.

While this is true when reading from a commit graph, no graph positions
are associated with a commit when writing a commit graph. Therefore, the
helper incorrectly returns GENERATION_NUMBER_INFINITY despite having a
finite generation number. This is fixed by using generation number
directly when writing a commit graph, namely in functions
compute_generation_numbers() and write_graph_chunk_data().

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 alloc.c                         |  2 -
 blame.c                         |  2 +-
 bloom.c                         |  6 +--
 commit-graph.c                  | 68 ++++++++++++++++++---------------
 commit-graph.h                  |  2 +-
 commit-reach.c                  | 50 ++++++++++++------------
 commit.c                        |  6 +--
 contrib/coccinelle/commit.cocci | 18 +++++++++
 revision.c                      | 16 ++++----
 9 files changed, 96 insertions(+), 74 deletions(-)

diff --git a/alloc.c b/alloc.c
index 29f0e3aa80..fe7d7345e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -110,8 +110,6 @@ void init_commit_node(struct repository *r, struct commit *c)
 {
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index(r);
-	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
-	c->generation = GENERATION_NUMBER_INFINITY;
 }
 
 void *alloc_commit_node(struct repository *r)
diff --git a/blame.c b/blame.c
index da7e28800e..82fa16d658 100644
--- a/blame.c
+++ b/blame.c
@@ -1272,7 +1272,7 @@ static int maybe_changed_path(struct repository *r,
 	if (!bd)
 		return 1;
 
-	if (origin->commit->generation == GENERATION_NUMBER_INFINITY)
+	if (commit_graph_generation(origin->commit) == GENERATION_NUMBER_INFINITY)
 		return 1;
 
 	filter = get_bloom_filter(r, origin->commit, 0);
diff --git a/bloom.c b/bloom.c
index 6c7611847a..3062aafaba 100644
--- a/bloom.c
+++ b/bloom.c
@@ -34,14 +34,14 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 {
 	uint32_t lex_pos, start_index, end_index;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (commit_graph_position(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
-	lex_pos = c->graph_pos - g->num_commits_in_base;
+	lex_pos = commit_graph_position(c) - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
 
@@ -193,7 +193,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 
 	if (!filter->data) {
 		load_commit_graph_info(r, c);
-		if (c->graph_pos != COMMIT_NOT_FROM_GRAPH &&
+		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH &&
 			r->objects->commit_graph->chunk_bloom_indexes) {
 			if (load_bloom_filter_from_graph(r->objects->commit_graph, filter, c))
 				return filter;
diff --git a/commit-graph.c b/commit-graph.c
index 91120ba3d3..cb4365e9b6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -128,7 +128,7 @@ static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
 	 */
 	for (; i < commit_graph_data_slab.slab_count; i++) {
 		for (j = 0; j < slab_size; j++) {
-			commit_graph_data_slab[i][j].graph_pos =
+			commit_graph_data_slab.slab[i][j].graph_pos =
 				COMMIT_NOT_FROM_GRAPH;
 		}
 	}
@@ -142,9 +142,9 @@ static int commit_gen_cmp(const void *va, const void *vb)
 	const struct commit *b = *(const struct commit **)vb;
 
 	/* lower generation commits first */
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return -1;
-	else if (a->generation > b->generation)
+	else if (commit_graph_generation(a) > commit_graph_generation(b))
 		return 1;
 
 	/* use date as a heuristic when generations are equal */
@@ -719,7 +719,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 	c = lookup_commit(r, &oid);
 	if (!c)
 		die(_("could not find commit %s"), oid_to_hex(&oid));
-	c->graph_pos = pos;
+	commit_graph_data_at(c)->graph_pos = pos;
 	return &commit_list_insert(c, pptr)->next;
 }
 
@@ -733,8 +733,8 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
-	item->graph_pos = pos;
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	commit_graph_data_at(item)->graph_pos = pos;
+	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -763,7 +763,7 @@ static int fill_commit_in_graph(struct repository *r,
 	 * Store the "full" position, but then use the
 	 * "local" position for the rest of the calculation.
 	 */
-	item->graph_pos = pos;
+	commit_graph_data_at(item)->graph_pos = pos;
 	lex_index = pos - g->num_commits_in_base;
 
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
@@ -776,7 +776,7 @@ static int fill_commit_in_graph(struct repository *r,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 
 	pptr = &item->parents;
 
@@ -808,8 +808,8 @@ static int fill_commit_in_graph(struct repository *r,
 
 static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
 {
-	if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
-		*pos = item->graph_pos;
+	if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) {
+		*pos = commit_graph_position(item);
 		return 1;
 	} else {
 		struct commit_graph *cur_g = g;
@@ -865,11 +865,11 @@ static struct tree *load_tree_for_commit(struct repository *r,
 	struct object_id oid;
 	const unsigned char *commit_data;
 
-	while (c->graph_pos < g->num_commits_in_base)
+	while (commit_graph_position(c) < g->num_commits_in_base)
 		g = g->base_graph;
 
 	commit_data = g->chunk_commit_data +
-			GRAPH_DATA_WIDTH * (c->graph_pos - g->num_commits_in_base);
+			GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base);
 
 	hashcpy(oid.hash, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
@@ -883,7 +883,7 @@ static struct tree *get_commit_tree_in_graph_one(struct repository *r,
 {
 	if (c->maybe_tree)
 		return c->maybe_tree;
-	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
+	if (commit_graph_position(c) == COMMIT_NOT_FROM_GRAPH)
 		BUG("get_commit_tree_in_graph_one called from non-commit-graph commit");
 
 	return load_tree_for_commit(r, g, (struct commit *)c);
@@ -1069,7 +1069,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 		else
 			packedDate[0] = 0;
 
-		packedDate[0] |= htonl((*list)->generation << 2);
+		packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2);
 
 		packedDate[1] = htonl((*list)->date);
 		hashwrite(f, packedDate, 8);
@@ -1268,7 +1268,7 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 			continue;
 		if (ctx->split) {
 			if ((!parse_commit(commit) &&
-			     commit->graph_pos == COMMIT_NOT_FROM_GRAPH) ||
+			     commit_graph_position(commit) == COMMIT_NOT_FROM_GRAPH) ||
 			    flags == COMMIT_GRAPH_SPLIT_REPLACE)
 				add_missing_parents(ctx, commit);
 		} else if (!parse_commit_no_graph(commit))
@@ -1300,9 +1300,11 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 					_("Computing commit graph generation numbers"),
 					ctx->commits.nr);
 	for (i = 0; i < ctx->commits.nr; i++) {
+		uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
+
 		display_progress(ctx->progress, i + 1);
-		if (ctx->commits.list[i]->generation != GENERATION_NUMBER_INFINITY &&
-		    ctx->commits.list[i]->generation != GENERATION_NUMBER_ZERO)
+		if (generation != GENERATION_NUMBER_INFINITY &&
+		    generation != GENERATION_NUMBER_ZERO)
 			continue;
 
 		commit_list_insert(ctx->commits.list[i], &list);
@@ -1313,22 +1315,26 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 			uint32_t max_generation = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
-				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+				generation = commit_graph_data_at(parent->item)->generation;
+
+				if (generation == GENERATION_NUMBER_INFINITY ||
+				    generation == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
-				} else if (parent->item->generation > max_generation) {
-					max_generation = parent->item->generation;
+				} else if (generation > max_generation) {
+					max_generation = generation;
 				}
 			}
 
 			if (all_parents_computed) {
-				current->generation = max_generation + 1;
+				struct commit_graph_data *data = commit_graph_data_at(current);
+
+				data->generation = max_generation + 1;
 				pop_commit(&list);
 
-				if (current->generation > GENERATION_NUMBER_MAX)
-					current->generation = GENERATION_NUMBER_MAX;
+				if (data->generation > GENERATION_NUMBER_MAX)
+					data->generation = GENERATION_NUMBER_MAX;
 			}
 		}
 	}
@@ -1507,7 +1513,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
 			if (ctx->split) {
 				struct commit *c = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
-				if (!c || c->graph_pos != COMMIT_NOT_FROM_GRAPH)
+				if (!c || commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
 					continue;
 			}
 
@@ -1541,7 +1547,7 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
 		if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE &&
-		    ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH)
+		    commit_graph_position(ctx->commits.list[ctx->commits.nr]) != COMMIT_NOT_FROM_GRAPH)
 			continue;
 
 		if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE)
@@ -2328,8 +2334,8 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			if (graph_parents->item->generation > max_generation)
-				max_generation = graph_parents->item->generation;
+			if (commit_graph_generation(graph_parents->item) > max_generation)
+				max_generation = commit_graph_generation(graph_parents->item);
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2339,7 +2345,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 			graph_report(_("commit-graph parent list for commit %s terminates early"),
 				     oid_to_hex(&cur_oid));
 
-		if (!graph_commit->generation) {
+		if (!commit_graph_generation(graph_commit)) {
 			if (generation_zero == GENERATION_NUMBER_EXISTS)
 				graph_report(_("commit-graph has generation number zero for commit %s, but non-zero elsewhere"),
 					     oid_to_hex(&cur_oid));
@@ -2359,10 +2365,10 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (max_generation == GENERATION_NUMBER_MAX)
 			max_generation--;
 
-		if (graph_commit->generation != max_generation + 1)
+		if (commit_graph_generation(graph_commit) != max_generation + 1)
 			graph_report(_("commit-graph generation for commit %s is %u != %u"),
 				     oid_to_hex(&cur_oid),
-				     graph_commit->generation,
+				     commit_graph_generation(graph_commit),
 				     max_generation + 1);
 
 		if (graph_commit->date != odb_commit->date)
diff --git a/commit-graph.h b/commit-graph.h
index cc76757007..28f89cdf3e 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -140,7 +140,7 @@ struct commit_graph_data {
 	uint32_t generation;
 };
 
-/* 
+/*
  * Commits should be parsed before accessing generation, graph positions.
  */
 uint32_t commit_graph_generation(const struct commit *);
diff --git a/commit-reach.c b/commit-reach.c
index 4ca7e706a1..3b2f863f5f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -59,13 +59,13 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		struct commit_list *parents;
 		int flags;
 
-		if (min_generation && commit->generation > last_gen)
+		if (min_generation && commit_graph_generation(commit) > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
-			    commit->generation, last_gen,
+			    commit_graph_generation(commit), last_gen,
 			    oid_to_hex(&commit->object.oid));
-		last_gen = commit->generation;
+		last_gen = commit_graph_generation(commit);
 
-		if (commit->generation < min_generation)
+		if (commit_graph_generation(commit) < min_generation)
 			break;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
@@ -176,7 +176,7 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 		repo_parse_commit(r, array[i]);
 	for (i = 0; i < cnt; i++) {
 		struct commit_list *common;
-		uint32_t min_generation = array[i]->generation;
+		uint32_t min_generation = commit_graph_generation(array[i]);
 
 		if (redundant[i])
 			continue;
@@ -186,8 +186,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 			filled_index[filled] = j;
 			work[filled++] = array[j];
 
-			if (array[j]->generation < min_generation)
-				min_generation = array[j]->generation;
+			if (commit_graph_generation(array[j]) < min_generation)
+				min_generation = commit_graph_generation(array[j]);
 		}
 		common = paint_down_to_common(r, array[i], filled,
 					      work, min_generation);
@@ -323,16 +323,16 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
-		if (reference[i]->generation < min_generation)
-			min_generation = reference[i]->generation;
+		if (commit_graph_generation(reference[i]) < min_generation)
+			min_generation = commit_graph_generation(reference[i]);
 	}
 
-	if (commit->generation > min_generation)
+	if (commit_graph_generation(commit) > min_generation)
 		return ret;
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     commit->generation);
+				     commit_graph_generation(commit));
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
@@ -467,7 +467,7 @@ static enum contains_result contains_test(struct commit *candidate,
 	/* Otherwise, we don't know; prepare to recurse */
 	parse_commit_or_die(candidate);
 
-	if (candidate->generation < cutoff)
+	if (commit_graph_generation(candidate) < cutoff)
 		return CONTAINS_NO;
 
 	return CONTAINS_UNKNOWN;
@@ -492,8 +492,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	for (p = want; p; p = p->next) {
 		struct commit *c = p->item;
 		load_commit_graph_info(the_repository, c);
-		if (c->generation < cutoff)
-			cutoff = c->generation;
+		if (commit_graph_generation(c) < cutoff)
+			cutoff = commit_graph_generation(c);
 	}
 
 	result = contains_test(candidate, want, cache, cutoff);
@@ -544,9 +544,9 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 	const struct commit *a = *(const struct commit * const *)_a;
 	const struct commit *b = *(const struct commit * const *)_b;
 
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return -1;
-	if (a->generation > b->generation)
+	if (commit_graph_generation(a) > commit_graph_generation(b))
 		return 1;
 	return 0;
 }
@@ -585,7 +585,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 		list[nr_commits] = (struct commit *)from_one;
 		if (parse_commit(list[nr_commits]) ||
-		    list[nr_commits]->generation < min_generation) {
+		    commit_graph_generation(list[nr_commits]) < min_generation) {
 			result = 0;
 			goto cleanup;
 		}
@@ -621,7 +621,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
 
 					if (parse_commit(parent->item) ||
 					    parent->item->date < min_commit_date ||
-					    parent->item->generation < min_generation)
+					    commit_graph_generation(parent->item) < min_generation)
 						continue;
 
 					commit_list_insert(parent->item, &stack);
@@ -665,8 +665,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
 
-			if (from_iter->item->generation < min_generation)
-				min_generation = from_iter->item->generation;
+			if (commit_graph_generation(from_iter->item) < min_generation)
+				min_generation = commit_graph_generation(from_iter->item);
 		}
 
 		from_iter = from_iter->next;
@@ -677,8 +677,8 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
 
-			if (to_iter->item->generation < min_generation)
-				min_generation = to_iter->item->generation;
+			if (commit_graph_generation(to_iter->item) < min_generation)
+				min_generation = commit_graph_generation(to_iter->item);
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -721,8 +721,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 		struct commit *c = *item;
 
 		parse_commit(c);
-		if (c->generation < min_generation)
-			min_generation = c->generation;
+		if (commit_graph_generation(c) < min_generation)
+			min_generation = commit_graph_generation(c);
 
 		if (!(c->object.flags & PARENT1)) {
 			c->object.flags |= PARENT1;
@@ -755,7 +755,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 			parse_commit(p);
 
-			if (p->generation < min_generation)
+			if (commit_graph_generation(p) < min_generation)
 				continue;
 
 			if (p->object.flags & PARENT2)
diff --git a/commit.c b/commit.c
index 87686a7055..ad9a76dcc6 100644
--- a/commit.c
+++ b/commit.c
@@ -339,7 +339,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
 	if (commit->maybe_tree || !commit->object.parsed)
 		return commit->maybe_tree;
 
-	if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+	if (commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
 		return get_commit_tree_in_graph(r, commit);
 
 	return NULL;
@@ -731,9 +731,9 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void
 	const struct commit *a = a_, *b = b_;
 
 	/* newer commits first */
-	if (a->generation < b->generation)
+	if (commit_graph_generation(a) < commit_graph_generation(b))
 		return 1;
-	else if (a->generation > b->generation)
+	else if (commit_graph_generation(a) > commit_graph_generation(b))
 		return -1;
 
 	/* use date as a heuristic when generations are equal */
diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
index 778e4704f6..af6dd4c20c 100644
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -32,3 +32,21 @@ expression c;
 - c->maybe_tree
 + repo_get_commit_tree(specify_the_right_repo_here, c)
   ...>}
+
+@@
+struct commit *c;
+expression E;
+@@
+(
+- c->generation = E;
++ commit_graph_data_at(c)->generation = E;
+|
+- c->graph_pos = E;
++ commit_graph_data_at(c)->graph_pos = E;
+|
+- c->generation
++ commit_graph_generation(c)
+|
+- c->graph_pos
++ commit_graph_position(c)
+)
diff --git a/revision.c b/revision.c
index ebb4d2a0f2..8648d7c43c 100644
--- a/revision.c
+++ b/revision.c
@@ -725,7 +725,7 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
 	if (!revs->repo->objects->commit_graph)
 		return -1;
 
-	if (commit->generation == GENERATION_NUMBER_INFINITY)
+	if (commit_graph_generation(commit) == GENERATION_NUMBER_INFINITY)
 		return -1;
 
 	filter = get_bloom_filter(revs->repo, commit, 0);
@@ -3320,7 +3320,7 @@ static void explore_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->explore_queue)) &&
-	       c->generation >= gen_cutoff)
+	       commit_graph_generation(c) >= gen_cutoff)
 		explore_walk_step(revs);
 }
 
@@ -3336,7 +3336,7 @@ static void indegree_walk_step(struct rev_info *revs)
 	if (parse_commit_gently(c, 1) < 0)
 		return;
 
-	explore_to_depth(revs, c->generation);
+	explore_to_depth(revs, commit_graph_generation(c));
 
 	for (p = c->parents; p; p = p->next) {
 		struct commit *parent = p->item;
@@ -3360,7 +3360,7 @@ static void compute_indegrees_to_depth(struct rev_info *revs,
 	struct topo_walk_info *info = revs->topo_walk_info;
 	struct commit *c;
 	while ((c = prio_queue_peek(&info->indegree_queue)) &&
-	       c->generation >= gen_cutoff)
+	       commit_graph_generation(c) >= gen_cutoff)
 		indegree_walk_step(revs);
 }
 
@@ -3420,8 +3420,8 @@ static void init_topo_walk(struct rev_info *revs)
 		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
 		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
 
-		if (c->generation < info->min_generation)
-			info->min_generation = c->generation;
+		if (commit_graph_generation(c) < info->min_generation)
+			info->min_generation = commit_graph_generation(c);
 
 		*(indegree_slab_at(&info->indegree, c)) = 1;
 
@@ -3479,8 +3479,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-		if (parent->generation < info->min_generation) {
-			info->min_generation = parent->generation;
+		if (commit_graph_generation(parent) < info->min_generation) {
+			info->min_generation = commit_graph_generation(parent);
 			compute_indegrees_to_depth(revs, info->min_generation);
 		}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 4/4] commit-graph: minimize commit_graph_data_slab access
  2020-06-12 18:40 [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Abhishek Kumar
                   ` (2 preceding siblings ...)
  2020-06-12 18:40 ` [PATCH v3 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
@ 2020-06-12 18:40 ` Abhishek Kumar
  2020-06-12 21:26 ` [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Jakub Narębski
  4 siblings, 0 replies; 14+ messages in thread
From: Abhishek Kumar @ 2020-06-12 18:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jnareb

In an earlier patch, multiple struct acccesses to `graph_pos` and
`generation` were auto-converted to multiple method calls.

Since the values are fixed and commit-slab access costly, we would be
better off with storing the values as a local variable and reuse it.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 bloom.c        |  5 +++--
 commit-graph.c | 40 ++++++++++++++++++++++------------
 commit-reach.c | 59 ++++++++++++++++++++++++++++++++------------------
 commit.c       |  6 +++--
 revision.c     | 12 ++++++----
 5 files changed, 79 insertions(+), 43 deletions(-)

diff --git a/bloom.c b/bloom.c
index 3062aafaba..6a7f2f2bdc 100644
--- a/bloom.c
+++ b/bloom.c
@@ -33,15 +33,16 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 					struct commit *c)
 {
 	uint32_t lex_pos, start_index, end_index;
+	uint32_t graph_pos = commit_graph_position(c);
 
-	while (commit_graph_position(c) < g->num_commits_in_base)
+	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
 
 	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
-	lex_pos = commit_graph_position(c) - g->num_commits_in_base;
+	lex_pos = graph_pos - g->num_commits_in_base;
 
 	end_index = get_be32(g->chunk_bloom_indexes + 4 * lex_pos);
 
diff --git a/commit-graph.c b/commit-graph.c
index cb4365e9b6..5d86d20459 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -141,10 +141,12 @@ static int commit_gen_cmp(const void *va, const void *vb)
 	const struct commit *a = *(const struct commit **)va;
 	const struct commit *b = *(const struct commit **)vb;
 
+	uint32_t generation_a = commit_graph_generation(a);
+	uint32_t generation_b = commit_graph_generation(b);
 	/* lower generation commits first */
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	if (generation_a < generation_b)
 		return -1;
-	else if (commit_graph_generation(a) > commit_graph_generation(b))
+	else if (generation_a > generation_b)
 		return 1;
 
 	/* use date as a heuristic when generations are equal */
@@ -726,6 +728,7 @@ static struct commit_list **insert_parent_or_die(struct repository *r,
 static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
 {
 	const unsigned char *commit_data;
+	struct commit_graph_data *graph_data;
 	uint32_t lex_index;
 
 	while (pos < g->num_commits_in_base)
@@ -733,8 +736,10 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
-	commit_graph_data_at(item)->graph_pos = pos;
-	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+
+	graph_data = commit_graph_data_at(item);
+	graph_data->graph_pos = pos;
+	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
 static inline void set_commit_tree(struct commit *c, struct tree *t)
@@ -750,6 +755,7 @@ static int fill_commit_in_graph(struct repository *r,
 	uint32_t *parent_data_ptr;
 	uint64_t date_low, date_high;
 	struct commit_list **pptr;
+	struct commit_graph_data *graph_data;
 	const unsigned char *commit_data;
 	uint32_t lex_index;
 
@@ -763,7 +769,8 @@ static int fill_commit_in_graph(struct repository *r,
 	 * Store the "full" position, but then use the
 	 * "local" position for the rest of the calculation.
 	 */
-	commit_graph_data_at(item)->graph_pos = pos;
+	graph_data = commit_graph_data_at(item);
+	graph_data->graph_pos = pos;
 	lex_index = pos - g->num_commits_in_base;
 
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
@@ -776,7 +783,7 @@ static int fill_commit_in_graph(struct repository *r,
 	date_low = get_be32(commit_data + g->hash_len + 12);
 	item->date = (timestamp_t)((date_high << 32) | date_low);
 
-	commit_graph_data_at(item)->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 
 	pptr = &item->parents;
 
@@ -808,8 +815,9 @@ static int fill_commit_in_graph(struct repository *r,
 
 static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
 {
-	if (commit_graph_position(item) != COMMIT_NOT_FROM_GRAPH) {
-		*pos = commit_graph_position(item);
+	uint32_t graph_pos = commit_graph_position(item);
+	if (graph_pos != COMMIT_NOT_FROM_GRAPH) {
+		*pos = graph_pos;
 		return 1;
 	} else {
 		struct commit_graph *cur_g = g;
@@ -864,12 +872,13 @@ static struct tree *load_tree_for_commit(struct repository *r,
 {
 	struct object_id oid;
 	const unsigned char *commit_data;
+	uint32_t graph_pos = commit_graph_position(c);
 
-	while (commit_graph_position(c) < g->num_commits_in_base)
+	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
 
 	commit_data = g->chunk_commit_data +
-			GRAPH_DATA_WIDTH * (commit_graph_position(c) - g->num_commits_in_base);
+			GRAPH_DATA_WIDTH * (graph_pos - g->num_commits_in_base);
 
 	hashcpy(oid.hash, commit_data);
 	set_commit_tree(c, lookup_tree(r, &oid));
@@ -2296,6 +2305,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		struct commit *graph_commit, *odb_commit;
 		struct commit_list *graph_parents, *odb_parents;
 		uint32_t max_generation = 0;
+		uint32_t generation;
 
 		display_progress(progress, i + 1);
 		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
@@ -2334,8 +2344,9 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 					     oid_to_hex(&graph_parents->item->object.oid),
 					     oid_to_hex(&odb_parents->item->object.oid));
 
-			if (commit_graph_generation(graph_parents->item) > max_generation)
-				max_generation = commit_graph_generation(graph_parents->item);
+			generation = commit_graph_generation(graph_parents->item);
+			if (generation > max_generation)
+				max_generation = generation;
 
 			graph_parents = graph_parents->next;
 			odb_parents = odb_parents->next;
@@ -2365,10 +2376,11 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		if (max_generation == GENERATION_NUMBER_MAX)
 			max_generation--;
 
-		if (commit_graph_generation(graph_commit) != max_generation + 1)
+		generation = commit_graph_generation(graph_commit);
+		if (generation != max_generation + 1)
 			graph_report(_("commit-graph generation for commit %s is %u != %u"),
 				     oid_to_hex(&cur_oid),
-				     commit_graph_generation(graph_commit),
+				     generation,
 				     max_generation + 1);
 
 		if (graph_commit->date != odb_commit->date)
diff --git a/commit-reach.c b/commit-reach.c
index 3b2f863f5f..f5e5c0a32b 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -58,14 +58,15 @@ static struct commit_list *paint_down_to_common(struct repository *r,
 		struct commit *commit = prio_queue_get(&queue);
 		struct commit_list *parents;
 		int flags;
+		uint32_t generation = commit_graph_generation(commit);
 
-		if (min_generation && commit_graph_generation(commit) > last_gen)
+		if (min_generation && generation > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
-			    commit_graph_generation(commit), last_gen,
+			    generation, last_gen,
 			    oid_to_hex(&commit->object.oid));
-		last_gen = commit_graph_generation(commit);
+		last_gen = generation;
 
-		if (commit_graph_generation(commit) < min_generation)
+		if (generation < min_generation)
 			break;
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
@@ -181,13 +182,15 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt
 		if (redundant[i])
 			continue;
 		for (j = filled = 0; j < cnt; j++) {
+			uint32_t curr_generation;
 			if (i == j || redundant[j])
 				continue;
 			filled_index[filled] = j;
 			work[filled++] = array[j];
 
-			if (commit_graph_generation(array[j]) < min_generation)
-				min_generation = commit_graph_generation(array[j]);
+			curr_generation = commit_graph_generation(array[j]);
+			if (curr_generation < min_generation)
+				min_generation = curr_generation;
 		}
 		common = paint_down_to_common(r, array[i], filled,
 					      work, min_generation);
@@ -316,23 +319,26 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 {
 	struct commit_list *bases;
 	int ret = 0, i;
-	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+	uint32_t generation, min_generation = GENERATION_NUMBER_INFINITY;
 
 	if (repo_parse_commit(r, commit))
 		return ret;
 	for (i = 0; i < nr_reference; i++) {
 		if (repo_parse_commit(r, reference[i]))
 			return ret;
-		if (commit_graph_generation(reference[i]) < min_generation)
-			min_generation = commit_graph_generation(reference[i]);
+
+		generation = commit_graph_generation(reference[i]);
+		if (generation < min_generation)
+			min_generation = generation;
 	}
 
-	if (commit_graph_generation(commit) > min_generation)
+	generation = commit_graph_generation(commit);
+	if (generation > min_generation)
 		return ret;
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     commit_graph_generation(commit));
+				     generation);
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);
@@ -490,10 +496,12 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	const struct commit_list *p;
 
 	for (p = want; p; p = p->next) {
+		uint32_t generation;
 		struct commit *c = p->item;
 		load_commit_graph_info(the_repository, c);
-		if (commit_graph_generation(c) < cutoff)
-			cutoff = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < cutoff)
+			cutoff = generation;
 	}
 
 	result = contains_test(candidate, want, cache, cutoff);
@@ -544,9 +552,12 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 	const struct commit *a = *(const struct commit * const *)_a;
 	const struct commit *b = *(const struct commit * const *)_b;
 
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	uint32_t generation_a = commit_graph_generation(a);
+	uint32_t generation_b = commit_graph_generation(b);
+
+	if (generation_a < generation_b)
 		return -1;
-	if (commit_graph_generation(a) > commit_graph_generation(b))
+	if (generation_a > generation_b)
 		return 1;
 	return 0;
 }
@@ -662,11 +673,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 		add_object_array(&from_iter->item->object, NULL, &from_objs);
 
 		if (!parse_commit(from_iter->item)) {
+			uint32_t generation;
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
 
-			if (commit_graph_generation(from_iter->item) < min_generation)
-				min_generation = commit_graph_generation(from_iter->item);
+			generation = commit_graph_generation(from_iter->item);
+			if (generation < min_generation)
+				min_generation = generation;
 		}
 
 		from_iter = from_iter->next;
@@ -674,11 +687,13 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 
 	while (to_iter) {
 		if (!parse_commit(to_iter->item)) {
+			uint32_t generation;
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
 
-			if (commit_graph_generation(to_iter->item) < min_generation)
-				min_generation = commit_graph_generation(to_iter->item);
+			generation = commit_graph_generation(to_iter->item);
+			if (generation < min_generation)
+				min_generation = generation;
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -718,11 +733,13 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 
 	for (item = to; item < to_last; item++) {
+		uint32_t generation;
 		struct commit *c = *item;
 
 		parse_commit(c);
-		if (commit_graph_generation(c) < min_generation)
-			min_generation = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < min_generation)
+			min_generation = generation;
 
 		if (!(c->object.flags & PARENT1)) {
 			c->object.flags |= PARENT1;
diff --git a/commit.c b/commit.c
index ad9a76dcc6..f85ade78ed 100644
--- a/commit.c
+++ b/commit.c
@@ -729,11 +729,13 @@ int compare_commits_by_author_date(const void *a_, const void *b_,
 int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused)
 {
 	const struct commit *a = a_, *b = b_;
+	const uint32_t generation_a = commit_graph_generation(a),
+		       generation_b = commit_graph_generation(b);
 
 	/* newer commits first */
-	if (commit_graph_generation(a) < commit_graph_generation(b))
+	if (generation_a < generation_b)
 		return 1;
-	else if (commit_graph_generation(a) > commit_graph_generation(b))
+	else if (generation_a > generation_b)
 		return -1;
 
 	/* use date as a heuristic when generations are equal */
diff --git a/revision.c b/revision.c
index 8648d7c43c..32be93f404 100644
--- a/revision.c
+++ b/revision.c
@@ -3413,6 +3413,7 @@ static void init_topo_walk(struct rev_info *revs)
 	info->min_generation = GENERATION_NUMBER_INFINITY;
 	for (list = revs->commits; list; list = list->next) {
 		struct commit *c = list->item;
+		uint32_t generation;
 
 		if (parse_commit_gently(c, 1))
 			continue;
@@ -3420,8 +3421,9 @@ static void init_topo_walk(struct rev_info *revs)
 		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
 		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
 
-		if (commit_graph_generation(c) < info->min_generation)
-			info->min_generation = commit_graph_generation(c);
+		generation = commit_graph_generation(c);
+		if (generation < info->min_generation)
+			info->min_generation = generation;
 
 		*(indegree_slab_at(&info->indegree, c)) = 1;
 
@@ -3472,6 +3474,7 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 	for (p = commit->parents; p; p = p->next) {
 		struct commit *parent = p->item;
 		int *pi;
+		uint32_t generation;
 
 		if (parent->object.flags & UNINTERESTING)
 			continue;
@@ -3479,8 +3482,9 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 		if (parse_commit_gently(parent, 1) < 0)
 			continue;
 
-		if (commit_graph_generation(parent) < info->min_generation) {
-			info->min_generation = commit_graph_generation(parent);
+		generation = commit_graph_generation(parent);
+		if (generation < info->min_generation) {
+			info->min_generation = generation;
 			compute_indegrees_to_depth(revs, info->min_generation);
 		}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/4] alloc: introduce parsed_commits_count
  2020-06-12 18:40 ` [PATCH v3 1/4] alloc: introduce parsed_commits_count Abhishek Kumar
@ 2020-06-12 20:08   ` Junio C Hamano
  2020-06-12 22:00     ` Junio C Hamano
  2020-06-12 22:52   ` Junio C Hamano
  2020-06-12 23:16   ` Jakub Narębski
  2 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2020-06-12 20:08 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, stolee, jnareb

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

>  static unsigned int alloc_commit_index(struct repository *r)
>  {
> -	return r->parsed_objects->commit_count++;
> +	static unsigned int parsed_commits_count = 0;
> +	r->parsed_objects->commit_count++;
> +	return parsed_commits_count++;
>  }

Hmph, ugly.  

The need for this hack, which butchers the function that explicitly
takes a repository as its parameter so that parsed commits can be
counted per repository to not count commits per repository, makes
the whole thing smell fishy.

There shouldn't be a reason why a commit in the superproject and
another commit in the submodule need to be in the same commit graph
in the first place.

Instead of breaking the function like this, the right "fix" may be
to make the commit slab per repository, because the commit index are
taken from the separate namespace per repository.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GSoC Patch v3 0/4] Move generation, graph_pos to a slab
  2020-06-12 18:40 [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Abhishek Kumar
                   ` (3 preceding siblings ...)
  2020-06-12 18:40 ` [PATCH v3 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
@ 2020-06-12 21:26 ` Jakub Narębski
  4 siblings, 0 replies; 14+ messages in thread
From: Jakub Narębski @ 2020-06-12 21:26 UTC (permalink / raw)
  To: Abhishek Kumar, git; +Cc: Jakub Narębski, Derrick Stolee

On 12.06.2020, Abhishek Kumar wrote:

> The struct commit is used in many contexts. However, members
> `generation` and `graph_pos` are only used for commit graph related
> operations and otherwise waste memory.
> 
> This wastage would have been more pronounced as we transition to
> generation number v2, which uses 64-bit generation number instead of
> current 32-bits.

I think s/would have been/would be/1, but I am not native English
speaker.

> 
> While the overall test suite runs slightly faster than master
> (series: 27m10s, master: 27ms34s, faster by 2.35%), certain commands
> like `git merge-base --is-ancestory` are slowed by nearly 40% as
> discovered by SDEZER Gabor [1].

First, the name is SZEDER Gábor.

Second, it would be nice to have some specific examples, like for
example the results of running `git merge-base --is-ancestory` in
specific repository, and from specific starting point.

It might be good idea to also show performance change for a command
that handles large amount of commits but does not use the commit-graph,
like for example `git gc`.

> 
> Derrick Stolee believes the slow down is attributable to the underlying
> algorithm rather than the slowness of commit-slab access [2] and we will
> follow-up on that in a later series.

Would it be possible to show profiling results?

> 
> I did not mention maximum RSS in the commit messages as they were nearly
> identical (series: 68104kb, master: 68040kb, fewer by <0.1%). This leads
> me to conclude that either the test using maximum memory involves commit
> graph or did not involve the struct commit at all. The move to
> commit-slab reduces memory footprint for the cases where struct commit
> is used but members generation and graph position are not. Average RSS
> would have been a good and more representative measure, but
> unfortunately time(1) could not measure it on my system.

What operating system do you use?

> 
> With this, I feel the patch will require minor fixes, if any. I am
> moving ahead with working the next step of "Implement Generation Number
> v2" that is proper handling of commit-graph format change.

All right.  It should be not a problem to rebase series on top of
different implementation of [inline-able] helper function if it
turns out that the move to slab serious affects negatively performance.

> 
> Based on the discussions, I feel we should compute both generation
> number v1 and the date offset value with storing date offsets in a new
> chunk as the cost is mostly from walking the commits.

Should we store offsets and corrected commit date on the slab,
or just the corrected date (with offset applied)?  We should be
using corrected commit date only; offset can be recomputed if
needed, e.g. when writing the commit-graph.

> 
> Abhishek Kumar (4):
>    alloc: introduce parsed_commits_count
>    commit-graph: introduce commit_graph_data_slab
>    commit: move members graph_pos, generation to a slab
>    commit-graph: minimize commit_graph_data_slab access
> 
>   alloc.c                         |   6 +-
>   blame.c                         |   2 +-
>   bloom.c                         |   7 +-
>   commit-graph.c                  | 122 ++++++++++++++++++++++++--------
>   commit-graph.h                  |  10 +++
>   commit-reach.c                  |  69 +++++++++++-------
>   commit.c                        |   8 ++-
>   contrib/coccinelle/commit.cocci |  18 +++++
>   revision.c                      |  20 +++---
>   9 files changed, 188 insertions(+), 74 deletions(-)
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/4] alloc: introduce parsed_commits_count
  2020-06-12 20:08   ` Junio C Hamano
@ 2020-06-12 22:00     ` Junio C Hamano
  2020-06-12 22:20       ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2020-06-12 22:00 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, stolee, jnareb, SZEDER Gábor

Junio C Hamano <gitster@pobox.com> writes:

> Abhishek Kumar <abhishekkumar8222@gmail.com> writes:
>
>>  static unsigned int alloc_commit_index(struct repository *r)
>>  {
>> -	return r->parsed_objects->commit_count++;
>> +	static unsigned int parsed_commits_count = 0;
>> +	r->parsed_objects->commit_count++;
>> +	return parsed_commits_count++;
>>  }
>
> Hmph, ugly.  
>
> The need for this hack, which butchers the function that explicitly
> takes a repository as its parameter so that parsed commits can be
> counted per repository to not count commits per repository, makes
> the whole thing smell fishy.
>
> There shouldn't be a reason why a commit in the superproject and
> another commit in the submodule need to be in the same commit graph
> in the first place.
>
> Instead of breaking the function like this, the right "fix" may be
> to make the commit slab per repository, because the commit index are
> taken from the separate namespace per repository.

Given a commit object (i.e. "struct commit *"), currently there is
no way to tell from which repository it came from.  So from that
point of view, we cannot identify which commit slab to use, unless
we add a back-pointer that says from which repository each commit
object came from, but that makes the commit object even heavier.

Back when 14ba97f8 (alloc: allow arbitrary repositories for alloc
functions, 2018-05-15) made the count per repository (which you are
reverting with this patch), there must have been a reason why it did
so.  We know that commit slab code wants you to count globally and
that is why you wrote this patch, but don't other parts of the code
expect and rely on the commits being counted per repository?  In
other words, with this change, who are you breaking to help the
commit slab code?

Cc'ing SZEDER who also have touched this and made the function
static early last year.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/4] alloc: introduce parsed_commits_count
  2020-06-12 22:00     ` Junio C Hamano
@ 2020-06-12 22:20       ` Junio C Hamano
  0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2020-06-12 22:20 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, stolee, jnareb, SZEDER Gábor

Junio C Hamano <gitster@pobox.com> writes:

> Back when 14ba97f8 (alloc: allow arbitrary repositories for alloc
> functions, 2018-05-15) made the count per repository (which you are
> reverting with this patch), there must have been a reason why it did
> so.  We know that commit slab code wants you to count globally and
> that is why you wrote this patch, but don't other parts of the code
> expect and rely on the commits being counted per repository?  In
> other words, with this change, who are you breaking to help the
> commit slab code?

I did a bit more digging, as I had a bit of time after pushing
today's integration result out, and it turns out that in the today's
code, c->index is the only thing that uses this counter.  It is set
in init_commit_node() function, which is called from object.c when
an in-core commit object is created.  The callchain looks like this:

object_as_type(struct repository *, struct object *, enum object_type, int quiet)
  -> init_commit_node(struct repository *, struct commit *)
       -> alloc_commit_index(struct repository *)

What is interesting is that object_as_type() takes the parameter
"struct repository *" ONLY BECAUSE it needs to pass something to
init_commit_node(), which in turn takes it ONLY BECAUSE the
alloc_commit_index() wants one to be passed.

Since the ONLY reason why there needs a monotorically increasing
counter of in-core commit objects is because we need to be able to
index into the commit slab, and because we cannot make commit slabs
per-repository due to the lack of backpointer from an in-core commit
object to the repository it belongs to, reverting 14ba97f8 (alloc:
allow arbitrary repositories for alloc functions, 2018-05-15) is a
reasonable thing to do, but then since the only reason the above
three functions in the callchain take "struct repository *" is
because the bottom-most alloc_commit_index() wants it WHEN IT DOES
NOT USE IT, it would be good to get rid of the "struct repository"
parameter from the functions involved in the callchain altogether.

Then we won't have to ask the "who are we breaking with this change"
question.  At that point it would be clear that everybody would be
OK with a single counter to ensure uniqueness of in-core commit
across all the in-core repository instances.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/4] alloc: introduce parsed_commits_count
  2020-06-12 18:40 ` [PATCH v3 1/4] alloc: introduce parsed_commits_count Abhishek Kumar
  2020-06-12 20:08   ` Junio C Hamano
@ 2020-06-12 22:52   ` Junio C Hamano
  2020-06-13 18:57     ` Abhishek Kumar
  2020-06-12 23:16   ` Jakub Narębski
  2 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2020-06-12 22:52 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, stolee, jnareb

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:

>  static unsigned int alloc_commit_index(struct repository *r)
>  {
> -	return r->parsed_objects->commit_count++;
> +	static unsigned int parsed_commits_count = 0;
> +	r->parsed_objects->commit_count++;
> +	return parsed_commits_count++;
>  }

I'll queue this as-is, together with the rest of the series, but
with the following SQUASH??? to document why we are counting
globally.

I do not think r->parsed_objects->*_count is used by anybody, and we
probably can eventually get rid of these stats fields, and when that
happens, we may want to lose the "struct repository *" pointer from
the functions in the callchain, but not right now.

Thanks.

 alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/alloc.c b/alloc.c
index 29f0e3aa80..ee92661b71 100644
--- a/alloc.c
+++ b/alloc.c
@@ -99,6 +99,11 @@ void *alloc_object_node(struct repository *r)
 	return obj;
 }
 
+/*
+ * The returned count is to be used as an index into commit slabs,
+ * that are *NOT* maintained per repository, and that is why a single
+ * global counter is used.
+ */
 static unsigned int alloc_commit_index(struct repository *r)
 {
 	static unsigned int parsed_commits_count = 0;
-- 
2.27.0-90-geebb51ba8c


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/4] alloc: introduce parsed_commits_count
  2020-06-12 18:40 ` [PATCH v3 1/4] alloc: introduce parsed_commits_count Abhishek Kumar
  2020-06-12 20:08   ` Junio C Hamano
  2020-06-12 22:52   ` Junio C Hamano
@ 2020-06-12 23:16   ` Jakub Narębski
  2 siblings, 0 replies; 14+ messages in thread
From: Jakub Narębski @ 2020-06-12 23:16 UTC (permalink / raw)
  To: Abhishek Kumar, git; +Cc: Jakub Narębski, Derrick Stolee

On 12.06.2020, Abhishek Kumar wrote:

> Commit slab relies on uniqueness of commit->index to access data. As
> submodules are repositories on their own, alloc_commit_index() (which
> depends on repository->parsed_objects->commit_count) no longer
> returns unique values.
> 
> This would break tests once we move `generation` and `graph_pos` into a
> commit slab, as commits of supermodule and submodule can have the same
> index but must have different graph positions.

First, commits of supermodule and of submodule are in different graphs,
so I don't see why they have to be in the same serialized commit-graph
file.

Second, Git stores many different types of information on slab already.
How comes that we have not had any problems till now?  

There is contains_cache, commit_seen, indegree_slab, author_date_slab,
commit_base, commit_pos, bloom_filter_slab, buffer_slab, commit_rev_name,
commit_names, commit_name_slab, saved_parents, blame_suspects,
commit_todo_item.

> 
> Let's introduce a counter variable, `parsed_commits_count` to keep track
> of parsed commits so far.

All right, thought it might be worth mentioning that it is a global
variable, or rather a static variable in a function.

> 
> 
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
> 
> CI Build for the failing tests:
> https://travis-ci.com/github/abhishekkumar2718/git/jobs/345413840

The failed tests are, from what I see:
- t4060-diff-submodule-option-diff-format.sh
- t4041-diff-submodule-option.sh
- t4059-diff-submodule-not-initialized.sh


> 
>   alloc.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/alloc.c b/alloc.c
> index 1c64c4dd16..29f0e3aa80 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -101,7 +101,9 @@ void *alloc_object_node(struct repository *r)
>   
>   static unsigned int alloc_commit_index(struct repository *r)
>   {
> -	return r->parsed_objects->commit_count++;
> +	static unsigned int parsed_commits_count = 0;
> +	r->parsed_objects->commit_count++;

Do we use r->parsed_objects->commit_count anywhere?

> +	return parsed_commits_count++;

Does it matter that it is not thread safe, because it is not atomic?
Shouldn't it be

  +	static _Atomic unsigned int parsed_commits_count = 0;

or

  +	static _Atomic unsigned int parsed_commits_count = ATOMIC_VAR_INIT(0);

(If it is allowed).

>   }
>   
>   void init_commit_node(struct repository *r, struct commit *c)
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/4] commit-graph: introduce commit_graph_data_slab
  2020-06-12 18:40 ` [PATCH v3 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
@ 2020-06-13  6:53   ` SZEDER Gábor
  2020-06-17  9:18     ` Abhishek Kumar
  0 siblings, 1 reply; 14+ messages in thread
From: SZEDER Gábor @ 2020-06-13  6:53 UTC (permalink / raw)
  To: Abhishek Kumar; +Cc: git, stolee, jnareb

On Sat, Jun 13, 2020 at 12:10:12AM +0530, Abhishek Kumar wrote:
> diff --git a/commit-graph.c b/commit-graph.c
> index 2ff042fbf4..91120ba3d3 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -87,6 +87,55 @@ static int commit_pos_cmp(const void *va, const void *vb)
>  	       commit_pos_at(&commit_pos, b);
>  }
>  
> +define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
> +static struct commit_graph_data_slab commit_graph_data_slab =
> +	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
> +
> +uint32_t commit_graph_position(const struct commit *c)
> +{
> +	struct commit_graph_data *data =
> +		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
> +
> +	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
> +}
> +
> +uint32_t commit_graph_generation(const struct commit *c)
> +{
> +	struct commit_graph_data *data =
> +		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
> +
> +	if (!data)
> +		return GENERATION_NUMBER_INFINITY;
> +	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
> +		return GENERATION_NUMBER_INFINITY;
> +
> +	return data->generation;
> +}
> +
> +static struct commit_graph_data *commit_graph_data_at(const struct commit *c)

  commit-graph.c: At top level:
  commit-graph.c:115:34: error: ‘commit_graph_data_at’ defined but not used [-Werror=unused-function]
   static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
                                    ^

Please make sure that each and every commit of your series can be
built with DEVELOPER=1 and passes the test suite.

> +{
> +	uint32_t i = commit_graph_data_slab.slab_count, j;
> +	uint32_t slab_size = commit_graph_data_slab.slab_size;
> +	struct commit_graph_data *data =
> +		commit_graph_data_slab_at(&commit_graph_data_slab, c);
> +
> +	/*
> +	 * commit-slab initializes elements with zero, overwrite this with
> +	 * COMMIT_NOT_FROM_GRAPH for graph_pos.
> +	 *
> +	 * We avoid initializing generation with checking if graph position
> +	 * is not COMMIT_NOT_FROM_GRAPH.
> +	 */
> +	for (; i < commit_graph_data_slab.slab_count; i++) {
> +		for (j = 0; j < slab_size; j++) {
> +			commit_graph_data_slab[i][j].graph_pos =

  commit-graph.c: In function ‘commit_graph_data_at’:
  commit-graph.c:131:26: error: subscripted value is neither array nor pointer nor vector
      commit_graph_data_slab[i][j].graph_pos =
                            ^

Once the next patch is applied Git can be built again, but then:

  $ ~/src/git/git -C webkit.git commit-graph write
  Finding commits for commit graph among packed objects: 100% (3984415/3984415), done.
  Expanding reachable commits in commit graph: 219420, done.
  Segmentation fault

Looking at it with a debugger:

  Program received signal SIGSEGV, Segmentation fault.
  0x00000000005079b2 in commit_graph_data_at (c=0x1ad9aa0) at commit-graph.c:131
  131                             commit_graph_data_slab.slab[i][j].graph_pos =
  (gdb) bt
  #0  0x00000000005079b2 in commit_graph_data_at (c=0x1ad9aa0)
      at commit-graph.c:131
  #1  0x000000000050a6a7 in compute_generation_numbers (ctx=0x9d61a0)
      at commit-graph.c:1304
  #2  0x000000000050d37d in write_commit_graph (odb=0x9d3d80, pack_indexes=0x0, 
      commits=0x0, flags=COMMIT_GRAPH_WRITE_PROGRESS, 
      split_opts=0x98e730 <split_opts>) at commit-graph.c:2178
  #3  0x000000000042bb7e in graph_write (argc=0, argv=0x7fffffffe2f0)
      at builtin/commit-graph.c:242
  #4  0x000000000042bc8d in cmd_commit_graph (argc=1, argv=0x7fffffffe2f0, 
      prefix=0x0) at builtin/commit-graph.c:278
  #5  0x00000000004061d1 in run_builtin (p=0x97b950 <commands+528>, argc=2, 
      argv=0x7fffffffe2f0) at git.c:448
  #6  0x0000000000406521 in handle_builtin (argc=2, argv=0x7fffffffe2f0)
      at git.c:673
  #7  0x00000000004067c9 in run_argv (argcp=0x7fffffffe18c, argv=0x7fffffffe180)
      at git.c:740
  #8  0x0000000000406c3a in cmd_main (argc=2, argv=0x7fffffffe2f0) at git.c:871
  #9  0x00000000004cfbda in main (argc=5, argv=0x7fffffffe2d8)
      at common-main.c:52
  (gdb) p commit_graph_data_slab
  $1 = {
    slab_size = 65532, 
    stride = 1, 
    slab_count = 4, 
    slab = 0x4f68690
  }
  (gdb) p i
  $2 = 0
  (gdb) p commit_graph_data_slab.slab[i]
  $3 = (struct commit_graph_data *) 0x0
  (gdb) p c->index
  $4 = 213506
  (gdb) up
  #1  0x000000000050a6a7 in compute_generation_numbers (ctx=0x9d61a0)
      at commit-graph.c:1304
  1304                    uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
  (gdb) p i
  $5 = 0


The way the loop variable 'i' is initialized and is used in the loop
header suggests that you assume that all slabs with an index
<slab_count are allocated.  That's not the case, only those slabs are
allocated that contain data associated with a commit that has already
been looked at.


> +				COMMIT_NOT_FROM_GRAPH;
> +		}
> +	}
> +
> +	return data;
> +}
> +
>  static int commit_gen_cmp(const void *va, const void *vb)
>  {
>  	const struct commit *a = *(const struct commit **)va;
> diff --git a/commit-graph.h b/commit-graph.h
> index 3ba0da1e5f..cc76757007 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -135,4 +135,14 @@ void free_commit_graph(struct commit_graph *);
>   */
>  void disable_commit_graph(struct repository *r);
>  
> +struct commit_graph_data {
> +	uint32_t graph_pos;
> +	uint32_t generation;
> +};
> +
> +/* 

This line adds a trailing space which is then removed in next patch.
Please don't add it in the first place.

> + * Commits should be parsed before accessing generation, graph positions.
> + */
> +uint32_t commit_graph_generation(const struct commit *);
> +uint32_t commit_graph_position(const struct commit *);
>  #endif
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/4] alloc: introduce parsed_commits_count
  2020-06-12 22:52   ` Junio C Hamano
@ 2020-06-13 18:57     ` Abhishek Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Abhishek Kumar @ 2020-06-13 18:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: jnareb, stolee, git

On Fri, Jun 12, 2020 at 03:52:26PM -0700, Junio C Hamano wrote:
> Abhishek Kumar <abhishekkumar8222@gmail.com> writes:
> 
> >  static unsigned int alloc_commit_index(struct repository *r)
> >  {
> > -	return r->parsed_objects->commit_count++;
> > +	static unsigned int parsed_commits_count = 0;
> > +	r->parsed_objects->commit_count++;
> > +	return parsed_commits_count++;
> >  }
> 
> I'll queue this as-is, together with the rest of the series, but
> with the following SQUASH??? to document why we are counting
> globally.
> 
> I do not think r->parsed_objects->*_count is used by anybody, and we
> probably can eventually get rid of these stats fields, and when that
> happens, we may want to lose the "struct repository *" pointer from
> the functions in the callchain, but not right now.
> 
> Thanks.
> 
>  alloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/alloc.c b/alloc.c
> index 29f0e3aa80..ee92661b71 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -99,6 +99,11 @@ void *alloc_object_node(struct repository *r)
>  	return obj;
>  }
>  
> +/*
> + * The returned count is to be used as an index into commit slabs,
> + * that are *NOT* maintained per repository, and that is why a single
> + * global counter is used.
> + */
>  static unsigned int alloc_commit_index(struct repository *r)
>  {
>  	static unsigned int parsed_commits_count = 0;
> -- 
> 2.27.0-90-geebb51ba8c
> 

Thanks for taking a closer look. I have created an issue on
gitgitgadget [1] to get rid of commit count and avoid passing repository
to the functions anymore.

[1]: https://github.com/gitgitgadget/git/issues/657

Abhishek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/4] commit-graph: introduce commit_graph_data_slab
  2020-06-13  6:53   ` SZEDER Gábor
@ 2020-06-17  9:18     ` Abhishek Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Abhishek Kumar @ 2020-06-17  9:18 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: jnareb, stolee, git

On Sat, Jun 13, 2020 at 08:53:39AM +0200, SZEDER Gábor wrote:
> On Sat, Jun 13, 2020 at 12:10:12AM +0530, Abhishek Kumar wrote:
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 2ff042fbf4..91120ba3d3 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -87,6 +87,55 @@ static int commit_pos_cmp(const void *va, const void *vb)
> >  	       commit_pos_at(&commit_pos, b);
> >  }
> >  
> > +define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
> > +static struct commit_graph_data_slab commit_graph_data_slab =
> > +	COMMIT_SLAB_INIT(1, commit_graph_data_slab);
> > +
> > +uint32_t commit_graph_position(const struct commit *c)
> > +{
> > +	struct commit_graph_data *data =
> > +		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
> > +
> > +	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
> > +}
> > +
> > +uint32_t commit_graph_generation(const struct commit *c)
> > +{
> > +	struct commit_graph_data *data =
> > +		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
> > +
> > +	if (!data)
> > +		return GENERATION_NUMBER_INFINITY;
> > +	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
> > +		return GENERATION_NUMBER_INFINITY;
> > +
> > +	return data->generation;
> > +}
> > +
> > +static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
> 
>   commit-graph.c: At top level:
>   commit-graph.c:115:34: error: ‘commit_graph_data_at’ defined but not used [-Werror=unused-function]
>    static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
>                                     ^
> 
> Please make sure that each and every commit of your series can be
> built with DEVELOPER=1 and passes the test suite.
> 
> > +{
> > +	uint32_t i = commit_graph_data_slab.slab_count, j;
> > +	uint32_t slab_size = commit_graph_data_slab.slab_size;
> > +	struct commit_graph_data *data =
> > +		commit_graph_data_slab_at(&commit_graph_data_slab, c);
> > +
> > +	/*
> > +	 * commit-slab initializes elements with zero, overwrite this with
> > +	 * COMMIT_NOT_FROM_GRAPH for graph_pos.
> > +	 *
> > +	 * We avoid initializing generation with checking if graph position
> > +	 * is not COMMIT_NOT_FROM_GRAPH.
> > +	 */
> > +	for (; i < commit_graph_data_slab.slab_count; i++) {
> > +		for (j = 0; j < slab_size; j++) {
> > +			commit_graph_data_slab[i][j].graph_pos =
> 
>   commit-graph.c: In function ‘commit_graph_data_at’:
>   commit-graph.c:131:26: error: subscripted value is neither array nor pointer nor vector
>       commit_graph_data_slab[i][j].graph_pos =
>                             ^
> 
> Once the next patch is applied Git can be built again, but then:
> 
>   $ ~/src/git/git -C webkit.git commit-graph write
>   Finding commits for commit graph among packed objects: 100% (3984415/3984415), done.
>   Expanding reachable commits in commit graph: 219420, done.
>   Segmentation fault
> 
> Looking at it with a debugger:
> 
>   Program received signal SIGSEGV, Segmentation fault.
>   0x00000000005079b2 in commit_graph_data_at (c=0x1ad9aa0) at commit-graph.c:131
>   131                             commit_graph_data_slab.slab[i][j].graph_pos =
>   (gdb) bt
>   #0  0x00000000005079b2 in commit_graph_data_at (c=0x1ad9aa0)
>       at commit-graph.c:131
>   #1  0x000000000050a6a7 in compute_generation_numbers (ctx=0x9d61a0)
>       at commit-graph.c:1304
>   #2  0x000000000050d37d in write_commit_graph (odb=0x9d3d80, pack_indexes=0x0, 
>       commits=0x0, flags=COMMIT_GRAPH_WRITE_PROGRESS, 
>       split_opts=0x98e730 <split_opts>) at commit-graph.c:2178
>   #3  0x000000000042bb7e in graph_write (argc=0, argv=0x7fffffffe2f0)
>       at builtin/commit-graph.c:242
>   #4  0x000000000042bc8d in cmd_commit_graph (argc=1, argv=0x7fffffffe2f0, 
>       prefix=0x0) at builtin/commit-graph.c:278
>   #5  0x00000000004061d1 in run_builtin (p=0x97b950 <commands+528>, argc=2, 
>       argv=0x7fffffffe2f0) at git.c:448
>   #6  0x0000000000406521 in handle_builtin (argc=2, argv=0x7fffffffe2f0)
>       at git.c:673
>   #7  0x00000000004067c9 in run_argv (argcp=0x7fffffffe18c, argv=0x7fffffffe180)
>       at git.c:740
>   #8  0x0000000000406c3a in cmd_main (argc=2, argv=0x7fffffffe2f0) at git.c:871
>   #9  0x00000000004cfbda in main (argc=5, argv=0x7fffffffe2d8)
>       at common-main.c:52
>   (gdb) p commit_graph_data_slab
>   $1 = {
>     slab_size = 65532, 
>     stride = 1, 
>     slab_count = 4, 
>     slab = 0x4f68690
>   }
>   (gdb) p i
>   $2 = 0
>   (gdb) p commit_graph_data_slab.slab[i]
>   $3 = (struct commit_graph_data *) 0x0
>   (gdb) p c->index
>   $4 = 213506
>   (gdb) up
>   #1  0x000000000050a6a7 in compute_generation_numbers (ctx=0x9d61a0)
>       at commit-graph.c:1304
>   1304                    uint32_t generation = commit_graph_data_at(ctx->commits.list[i])->generation;
>   (gdb) p i
>   $5 = 0
> 
> 
> The way the loop variable 'i' is initialized and is used in the loop
> header suggests that you assume that all slabs with an index
> <slab_count are allocated.  That's not the case, only those slabs are
> allocated that contain data associated with a commit that has already
> been looked at.
> 

Wait, so assuming there are no slabs before and I call
`commit_graph_data_slab_at(&commit_graph_data_slab, c)` with an commit
index that would be in fourth slab, the slab would look like:

commit_graph_data_slab.slab[] = {NULL, NULL, NULL, <address>}

That's new to me! Thanks for sharing this.

Hopefully this is fixed in the v4 series here [1].

[1]: https://lore.kernel.org/git/20200617091411.14650-1-abhishekkumar8222@gmail.com/T/#u

> 
> > +				COMMIT_NOT_FROM_GRAPH;
> > +		}
> > +	}
> > +
> > +	return data;
> > +}
> > +
> >  static int commit_gen_cmp(const void *va, const void *vb)
> >  {
> >  	const struct commit *a = *(const struct commit **)va;
> > diff --git a/commit-graph.h b/commit-graph.h
> > index 3ba0da1e5f..cc76757007 100644
> > --- a/commit-graph.h
> > +++ b/commit-graph.h
> > @@ -135,4 +135,14 @@ void free_commit_graph(struct commit_graph *);
> >   */
> >  void disable_commit_graph(struct repository *r);
> >  
> > +struct commit_graph_data {
> > +	uint32_t graph_pos;
> > +	uint32_t generation;
> > +};
> > +
> > +/* 
> 
> This line adds a trailing space which is then removed in next patch.
> Please don't add it in the first place.
> 
> > + * Commits should be parsed before accessing generation, graph positions.
> > + */
> > +uint32_t commit_graph_generation(const struct commit *);
> > +uint32_t commit_graph_position(const struct commit *);
> >  #endif
> > -- 
> > 2.27.0
> > 

Regards
Abhishek

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-17  9:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-12 18:40 [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Abhishek Kumar
2020-06-12 18:40 ` [PATCH v3 1/4] alloc: introduce parsed_commits_count Abhishek Kumar
2020-06-12 20:08   ` Junio C Hamano
2020-06-12 22:00     ` Junio C Hamano
2020-06-12 22:20       ` Junio C Hamano
2020-06-12 22:52   ` Junio C Hamano
2020-06-13 18:57     ` Abhishek Kumar
2020-06-12 23:16   ` Jakub Narębski
2020-06-12 18:40 ` [PATCH v3 2/4] commit-graph: introduce commit_graph_data_slab Abhishek Kumar
2020-06-13  6:53   ` SZEDER Gábor
2020-06-17  9:18     ` Abhishek Kumar
2020-06-12 18:40 ` [PATCH v3 3/4] commit: move members graph_pos, generation to a slab Abhishek Kumar
2020-06-12 18:40 ` [PATCH v3 4/4] commit-graph: minimize commit_graph_data_slab access Abhishek Kumar
2020-06-12 21:26 ` [GSoC Patch v3 0/4] Move generation, graph_pos to a slab Jakub Narębski

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).