git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v4 05/10] commit-graph: always load commit-graph information
Date: Mon, 30 Apr 2018 00:14:58 +0200	[thread overview]
Message-ID: <866049irrx.fsf@gmail.com> (raw)
In-Reply-To: <20180425143735.240183-6-dstolee@microsoft.com> (Derrick Stolee's message of "Wed, 25 Apr 2018 14:37:58 +0000")

Derrick Stolee <dstolee@microsoft.com> writes:

> Most code paths load commits using lookup_commit() and then
> parse_commit().

And this automatically loads commit graph if needed, thanks to changes
in parse_commit_gently(), which parse_commit() uses.

>                 In some cases, including some branch lookups, the commit
> is parsed using parse_object_buffer() which side-steps parse_commit() in
> favor of parse_commit_buffer().

I guess the problem is that we cannot just add parse_commit_in_graph() 
like we did in parse_commit_gently(), for some reason?  Like for example
that parse_commit_gently() uses parse_commit_buffer() - which could have
been handled by moving parse_commit_in_graph() down the call chain from
parse_commit_gently() to parse_commit_buffer()... if not the fact that
check_commit() also uses parse_commit_buffer(), but it does not want to
load commit graph.  Am I right?

>
> With generation numbers in the commit-graph, we need to ensure that any
> commit that exists in the commit-graph file has its generation number
> loaded.

Is it generation number, or generation number and position in commit
graph?

>
> Create new load_commit_graph_info() method to fill in the information
> for a commit that exists only in the commit-graph file. Call it from
> parse_commit_buffer() after loading the other commit information from
> the given buffer. Only fill this information when specified by the
> 'check_graph' parameter.

I think this commit would be easier to review if it was split into pure
refactoring part (extracting fill_commit_graph_info() and
find_commit_in_graph()).  On the other hand the refactoring was needed
to reduce code duplication betweem existing parse_commit_in_graph() and
new load_commit_graph_info() functions.

I guess that the difference between parse_commit_in_graph() and
load_commit_graph_info() is that the former cares only about having just
enough information that is needed for parse_commit_gently() - and does
not load graph data if commit is parsed, while the latter is about
loading commit-graph data like generation numbers.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-graph.c | 45 ++++++++++++++++++++++++++++++---------------
>  commit-graph.h |  8 ++++++++
>  commit.c       |  7 +++++--
>  commit.h       |  2 +-
>  object.c       |  2 +-
>  sha1_file.c    |  2 +-
>  6 files changed, 46 insertions(+), 20 deletions(-)

I wonder if it would be possible to add tests for this feature, for
example that commit-graph is read when it should (including those branch
lookups), and is not read when the feature should be disabled.

But the only way to test it I can think of is a stupid one: create
invalid commit graph, and check that git fails as expected (trying to
read said malformed file), and does not fail if commit graph feature is
disabled.

>

Let me reorder files (BTW, is there a way for Git to put *.h files
before *.c files in diff?) for easier review:

> diff --git a/commit-graph.h b/commit-graph.h
> index 260a468e73..96cccb10f3 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -17,6 +17,14 @@ char *get_commit_graph_filename(const char *obj_dir);
>   */
>  int parse_commit_in_graph(struct commit *item);
>  
> +/*
> + * It is possible that we loaded commit contents from the commit buffer,
> + * but we also want to ensure the commit-graph content is correctly
> + * checked and filled. Fill the graph_pos and generation members of
> + * the given commit.
> + */
> +void load_commit_graph_info(struct commit *item);
> +
>  struct tree *get_commit_tree_in_graph(const struct commit *c);
>  
>  struct commit_graph {
> diff --git a/commit-graph.c b/commit-graph.c
> index 047fa9fca5..aebd242def 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -245,6 +245,12 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
>  	return &commit_list_insert(c, pptr)->next;
>  }
>  
> +static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)
> +{
> +	const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * pos;
> +	item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> +}

The comment in the header file commit-graph.h talks about filling
graph_pos and generation members of the given commit, but I don't see
filling graph_pos member here.

Sidenote: it is a tiny little bit strange to see symbolic constants like
GRAPH_DATA_WIDTH near using magic values such as 8 and 2.

> +
>  static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
>  {
>  	uint32_t edge_value;
> @@ -292,31 +298,40 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin
>  	return 1;
>  }
>  
> +static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
> +{
> +	if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
> +		*pos = item->graph_pos;
> +		return 1;
> +	} else {
> +		return bsearch_graph(g, &(item->object.oid), pos);
> +	}
> +}

Nice refactoring here.

> +
>  int parse_commit_in_graph(struct commit *item)
>  {
> +	uint32_t pos;
> +
>  	if (!core_commit_graph)
>  		return 0;
>  	if (item->object.parsed)
>  		return 1;
> -
>  	prepare_commit_graph();
> -	if (commit_graph) {
> -		uint32_t pos;
> -		int found;
> -		if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
> -			pos = item->graph_pos;
> -			found = 1;
> -		} else {
> -			found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
> -		}
> -
> -		if (found)
> -			return fill_commit_in_graph(item, commit_graph, pos);
> -	}
> -
> +	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
> +		return fill_commit_in_graph(item, commit_graph, pos);
>  	return 0;
>  }
>  
> +void load_commit_graph_info(struct commit *item)
> +{
> +	uint32_t pos;
> +	if (!core_commit_graph)
> +		return;
> +	prepare_commit_graph();
> +	if (commit_graph && find_commit_in_graph(item, commit_graph, &pos))
> +		fill_commit_graph_info(item, commit_graph, pos);
> +}

Similar functions, different goals (as the names imply).

> +
>  static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *c)
>  {
>  	struct object_id oid;
> diff --git a/commit.c b/commit.c
> index 4d00b0a1d6..39a3749abd 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -331,7 +331,7 @@ const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
>  	return ret;
>  }
>  
> -int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size)
> +int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph)
>  {
>  	const char *tail = buffer;
>  	const char *bufptr = buffer;
> @@ -386,6 +386,9 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  	}
>  	item->date = parse_commit_date(bufptr, tail);
>  
> +	if (check_graph)
> +		load_commit_graph_info(item);
> +

All right, read commit-graph specific data after parsing commit itself.
It is at the end because commit object needs to be parsed sequentially,
and it includes more info that is contained in commit-graph CDAT+EDGE
data.

>  	return 0;
>  }
>  
> @@ -412,7 +415,7 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
>  		return error("Object %s not a commit",
>  			     oid_to_hex(&item->object.oid));
>  	}
> -	ret = parse_commit_buffer(item, buffer, size);
> +	ret = parse_commit_buffer(item, buffer, size, 0);

The parse_commit_gently() contract is that it provides only bare minimum
of information, from commit-graph if possible, and does read object from
disk and parses it only when it could not avoid it.  If it needs to
parse it, it doesn't need to fill commit-graph specific data again.

All right.

>  	if (save_commit_buffer && !ret) {
>  		set_commit_buffer(item, buffer, size);
>  		return 0;
> diff --git a/commit.h b/commit.h
> index 64436ff44e..b5afde1ae9 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -72,7 +72,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
>   */
>  struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
>  
> -int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size);
> +int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size, int check_graph);
>  int parse_commit_gently(struct commit *item, int quiet_on_missing);
>  static inline int parse_commit(struct commit *item)
>  {
> diff --git a/object.c b/object.c
> index e6ad3f61f0..efe4871325 100644
> --- a/object.c
> +++ b/object.c
> @@ -207,7 +207,7 @@ struct object *parse_object_buffer(const struct object_id *oid, enum object_type
>  	} else if (type == OBJ_COMMIT) {
>  		struct commit *commit = lookup_commit(oid);
>  		if (commit) {
> -			if (parse_commit_buffer(commit, buffer, size))
> +			if (parse_commit_buffer(commit, buffer, size, 1))

All that rigamarole was needed because of

DS>                 In some cases, including some branch lookups, the commit
DS> is parsed using parse_object_buffer() which side-steps parse_commit() in
DS> favor of parse_commit_buffer().

Here we want parse_object_buffer() to get also commit-graph specific
data, if available.  All right.

>  				return NULL;
>  			if (!get_cached_commit_buffer(commit, NULL)) {
>  				set_commit_buffer(commit, buffer, size);
> diff --git a/sha1_file.c b/sha1_file.c
> index 1b94f39c4c..0fd4f0b8b6 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1755,7 +1755,7 @@ static void check_commit(const void *buf, size_t size)
>  {
>  	struct commit c;
>  	memset(&c, 0, sizeof(c));
> -	if (parse_commit_buffer(&c, buf, size))
> +	if (parse_commit_buffer(&c, buf, size, 0))

For check we don't need commit graph data.  Looks all right.

>  		die("corrupt commit");
>  }

Best,
-- 
Jakub Narębski

  reply	other threads:[~2018-04-29 22:15 UTC|newest]

Thread overview: 162+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-03 16:51 [PATCH 0/6] Compute and consume generation numbers Derrick Stolee
2018-04-03 16:51 ` [PATCH 1/6] object.c: parse commit in graph first Derrick Stolee
2018-04-03 18:21   ` Jonathan Tan
2018-04-03 18:28     ` Jeff King
2018-04-03 18:32       ` Derrick Stolee
2018-04-03 16:51 ` [PATCH 2/6] commit: add generation number to struct commmit Derrick Stolee
2018-04-03 18:05   ` Brandon Williams
2018-04-03 18:28     ` Jeff King
2018-04-03 18:31       ` Derrick Stolee
2018-04-03 18:32       ` Brandon Williams
2018-04-03 18:44       ` Stefan Beller
2018-04-03 23:17       ` Ramsay Jones
2018-04-03 23:19         ` Jeff King
2018-04-03 18:24   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 3/6] commit-graph: compute generation numbers Derrick Stolee
2018-04-03 18:30   ` Jonathan Tan
2018-04-03 18:49     ` Stefan Beller
2018-04-03 16:51 ` [PATCH 4/6] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-03 18:31   ` Stefan Beller
2018-04-03 18:31   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 5/6] commit.c: use generation to halt paint walk Derrick Stolee
2018-04-03 19:01   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 6/6] commit-graph.txt: update future work Derrick Stolee
2018-04-03 19:04   ` Jonathan Tan
2018-04-03 16:56 ` [PATCH 0/6] Compute and consume generation numbers Derrick Stolee
2018-04-03 18:03 ` Brandon Williams
2018-04-03 18:29   ` Derrick Stolee
2018-04-03 18:47     ` Jeff King
2018-04-03 19:05       ` Jeff King
2018-04-04 15:45         ` [PATCH 7/6] ref-filter: use generation number for --contains Derrick Stolee
2018-04-04 15:45           ` [PATCH 8/6] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-04 15:48             ` Derrick Stolee
2018-04-04 17:01               ` Brandon Williams
2018-04-04 18:24               ` Jeff King
2018-04-04 18:53                 ` Derrick Stolee
2018-04-04 18:59                   ` Jeff King
2018-04-04 18:22           ` [PATCH 7/6] ref-filter: use generation number for --contains Jeff King
2018-04-04 19:06             ` Derrick Stolee
2018-04-04 19:16               ` Jeff King
2018-04-04 19:22                 ` Derrick Stolee
2018-04-04 19:42                   ` Jeff King
2018-04-04 19:45                     ` Derrick Stolee
2018-04-04 19:46                       ` Jeff King
2018-04-07 17:09     ` [PATCH 0/6] Compute and consume generation numbers Jakub Narebski
2018-04-07 16:55 ` Jakub Narebski
2018-04-08  1:06   ` Derrick Stolee
2018-04-11 19:32     ` Jakub Narebski
2018-04-11 19:58       ` Derrick Stolee
2018-04-14 16:52         ` Jakub Narebski
2018-04-21 20:44           ` Jakub Narebski
2018-04-23 13:54             ` Derrick Stolee
2018-04-09 16:41 ` [PATCH v2 00/10] " Derrick Stolee
2018-04-09 16:41   ` [PATCH v2 01/10] object.c: parse commit in graph first Derrick Stolee
2018-04-09 16:41   ` [PATCH v2 02/10] merge: check config before loading commits Derrick Stolee
2018-04-11  2:12     ` Junio C Hamano
2018-04-11 12:49       ` Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 03/10] commit: add generation number to struct commmit Derrick Stolee
2018-04-09 17:59     ` Stefan Beller
2018-04-11  2:31     ` Junio C Hamano
2018-04-11 12:57       ` Derrick Stolee
2018-04-11 23:28         ` Junio C Hamano
2018-04-09 16:42   ` [PATCH v2 04/10] commit-graph: compute generation numbers Derrick Stolee
2018-04-11  2:51     ` Junio C Hamano
2018-04-11 13:02       ` Derrick Stolee
2018-04-11 18:49         ` Stefan Beller
2018-04-11 19:26         ` Eric Sunshine
2018-04-09 16:42   ` [PATCH v2 05/10] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 06/10] commit.c: use generation to halt paint walk Derrick Stolee
2018-04-11  3:02     ` Junio C Hamano
2018-04-11 13:24       ` Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 07/10] commit-graph.txt: update future work Derrick Stolee
2018-04-12  9:12     ` Junio C Hamano
2018-04-12 11:35       ` Derrick Stolee
2018-04-13  9:53         ` Jakub Narebski
2018-04-09 16:42   ` [PATCH v2 08/10] ref-filter: use generation number for --contains Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 09/10] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 10/10] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-17 17:00   ` [PATCH v3 0/9] Compute and consume generation numbers Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 1/9] commit: add generation number to struct commmit Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 2/9] commit-graph: compute generation numbers Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 3/9] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-18 14:31       ` Jakub Narebski
2018-04-18 14:46         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 4/9] commit-graph.txt: update design document Derrick Stolee
2018-04-18 19:47       ` Jakub Narebski
2018-04-17 17:00     ` [PATCH v3 5/9] ref-filter: use generation number for --contains Derrick Stolee
2018-04-18 21:02       ` Jakub Narebski
2018-04-23 14:22         ` Derrick Stolee
2018-04-24 18:56           ` Jakub Narebski
2018-04-25 14:11             ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 6/9] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-18 22:15       ` Jakub Narebski
2018-04-23 14:31         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 7/9] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-18 23:19       ` Jakub Narebski
2018-04-23 14:40         ` Derrick Stolee
2018-04-23 21:38           ` Jakub Narebski
2018-04-24 12:31             ` Derrick Stolee
2018-04-19  8:32       ` Jakub Narebski
2018-04-17 17:00     ` [PATCH v3 8/9] commit-graph: always load commit-graph information Derrick Stolee
2018-04-17 17:50       ` Derrick Stolee
2018-04-19  0:02       ` Jakub Narebski
2018-04-23 14:49         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 9/9] merge: check config before loading commits Derrick Stolee
2018-04-19  0:04     ` [PATCH v3 0/9] Compute and consume generation numbers Jakub Narebski
2018-04-23 14:54       ` Derrick Stolee
2018-04-25 14:37     ` [PATCH v4 00/10] " Derrick Stolee
2018-04-25 14:37       ` [PATCH v4 01/10] ref-filter: fix outdated comment on in_commit_list Derrick Stolee
2018-04-28 17:54         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 02/10] commit: add generation number to struct commmit Derrick Stolee
2018-04-28 22:35         ` Jakub Narebski
2018-04-30 12:05           ` Derrick Stolee
2018-04-25 14:37       ` [PATCH v4 03/10] commit-graph: compute generation numbers Derrick Stolee
2018-04-26  2:35         ` Junio C Hamano
2018-04-26 12:58           ` Derrick Stolee
2018-04-26 13:49             ` Derrick Stolee
2018-04-29  9:08         ` Jakub Narebski
2018-05-01 12:10           ` Derrick Stolee
2018-05-02 16:15             ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 04/10] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-26  3:22         ` Junio C Hamano
2018-04-26  9:02           ` Jakub Narebski
2018-04-28 14:38             ` Jakub Narebski
2018-04-29 15:40         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 05/10] commit-graph: always load commit-graph information Derrick Stolee
2018-04-29 22:14         ` Jakub Narebski [this message]
2018-05-01 12:19           ` Derrick Stolee
2018-04-29 22:18         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 06/10] ref-filter: use generation number for --contains Derrick Stolee
2018-04-30 16:34         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 07/10] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-30 17:05         ` Jakub Narebski
2018-04-25 14:38       ` [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-30 22:19         ` Jakub Narebski
2018-05-01 11:47           ` Derrick Stolee
2018-05-02 13:05             ` Jakub Narebski
2018-05-02 13:42               ` Derrick Stolee
2018-04-25 14:38       ` [PATCH v4 09/10] merge: check config before loading commits Derrick Stolee
2018-04-30 22:54         ` Jakub Narebski
2018-05-01 11:52           ` Derrick Stolee
2018-05-02 11:41             ` Jakub Narebski
2018-04-25 14:38       ` [PATCH v4 10/10] commit-graph.txt: update design document Derrick Stolee
2018-04-30 23:32         ` Jakub Narebski
2018-05-01 12:00           ` Derrick Stolee
2018-05-02  7:57             ` Jakub Narebski
2018-04-25 14:40       ` [PATCH v4 00/10] Compute and consume generation numbers Derrick Stolee
2018-04-28 17:28         ` Jakub Narebski
2018-05-01 12:47       ` [PATCH v5 00/11] " Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 01/11] ref-filter: fix outdated comment on in_commit_list Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 02/11] commit: add generation number to struct commmit Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 03/11] commit-graph: compute generation numbers Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 04/11] commit: use generations in paint_down_to_common() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 05/11] commit-graph: always load commit-graph information Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 06/11] ref-filter: use generation number for --contains Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 07/11] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 08/11] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 09/11] commit: use generation number in remove_redundant() Derrick Stolee
2018-05-01 15:37           ` Derrick Stolee
2018-05-03 18:45           ` Jakub Narebski
2018-05-01 12:47         ` [PATCH v5 10/11] merge: check config before loading commits Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 11/11] commit-graph.txt: update design document Derrick Stolee
2018-05-03 11:18         ` [PATCH v5 00/11] Compute and consume generation numbers Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=866049irrx.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).