git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <stolee@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	"gitster@pobox.com" <gitster@pobox.com>,
	"peff@peff.net" <peff@peff.net>,
	"avarab@gmail.com" <avarab@gmail.com>
Subject: Re: [PATCH v4 03/10] commit-graph: compute generation numbers
Date: Tue, 1 May 2018 08:10:20 -0400
Message-ID: <0bd1ffe0-c727-653a-46a3-f9d4ea17bec2@gmail.com>
In-Reply-To: <86r2myidmq.fsf@gmail.com>

On 4/29/2018 5:08 AM, Jakub Narebski wrote:
> Derrick Stolee <dstolee@microsoft.com> writes:
>
>> While preparing commits to be written into a commit-graph file, compute
>> the generation numbers using a depth-first strategy.
> Sidenote: for generation numbers it does not matter if we use
> depth-first or breadth-first strategy, but it is more natural to use
> depth-first search because generation numbers need post-order processing
> (parents before child).
>
>> The only commits that are walked in this depth-first search are those
>> without a precomputed generation number. Thus, computation time will be
>> relative to the number of new commits to the commit-graph file.
> A question: what happens if the existing commit graph is from older
> version of git and has _ZERO for generation numbers?
>
> Answer: I see that we treat both _INFINITY (not in commit-graph) and
> _ZERO (in commit graph but not computed) as not computed generation
> numbers.  All right.
>
>> If a computed generation number would exceed GENERATION_NUMBER_MAX, then
>> use GENERATION_NUMBER_MAX instead.
> All right, though I guess this would remain theoretical for a long
> while.
>
> We don't have any way of testing this, at least not without recompiling
> Git with a lower value of GENERATION_NUMBER_MAX -- which means it cannot
> be tested automatically, can it?
>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   commit-graph.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 45 insertions(+)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 9ad21c3ffb..047fa9fca5 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -439,6 +439,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>>   		else
>>   			packedDate[0] = 0;
>>   
>> +		if ((*list)->generation != GENERATION_NUMBER_INFINITY)
>> +			packedDate[0] |= htonl((*list)->generation << 2);
>> +
> If we stumble upon commit marked as "not in commit-graph" while writing
> commit graph, it is a BUG(), isn't it?
>
> (Problem noticed by Junio.)

Since we are computing the values for all commits in the list, this 
condition is not important and will be removed.

>
> It is a bit strange to me that the code uses get_be32 for reading, but
> htonl for writing.  Is Git tested on non little-endian machines, like
> big-endian ppc64 or s390x, or on mixed-endian machines (or
> selectable-endian machines with data endianness set to non
> little-endian, like ia64)?  If not, could we use for example openSUSE
> Build Service (https://build.opensuse.org/) for this?

Since we are packing two values into 64 bits, I am using htonl() here to 
arrange the 30-bit generation number alongside the 34-bit commit date 
value, then writing with hashwrite(). The other 32-bit integers are 
written with hashwrite_be32() to avoid translating this data in-memory.
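
To make the layout concrete, here is a rough standalone sketch of the
packing (not the code from the patch). The helper names are made up for
illustration, and the value 0x3FFFFFFF for GENERATION_NUMBER_MAX is an
assumption that follows from the generation number being 30 bits wide.
The first 32-bit word holds the generation in its top 30 bits and date
bits 33..32 in its low 2 bits; the second word holds date bits 31..0.
Both words are stored in network byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/* Assumption: the largest value that fits in 30 bits. */
#define GENERATION_NUMBER_MAX 0x3FFFFFFFu

/* Hypothetical helper: pack generation and date into 8 big-endian bytes. */
static void pack_generation_and_date(unsigned char out[8],
				     uint32_t generation, uint64_t date)
{
	uint32_t packed[2];

	assert(generation <= GENERATION_NUMBER_MAX);
	assert(date < ((uint64_t)1 << 34));

	/* High word: generation in bits 31..2, date bits 33..32 in bits 1..0. */
	packed[0] = htonl((generation << 2) | (uint32_t)((date >> 32) & 0x3));
	/* Low word: date bits 31..0. */
	packed[1] = htonl((uint32_t)(date & 0xFFFFFFFFu));
	memcpy(out, packed, sizeof(packed));
}

/* Hypothetical helper: the inverse of pack_generation_and_date(). */
static void unpack_generation_and_date(const unsigned char in[8],
				       uint32_t *generation, uint64_t *date)
{
	uint32_t packed[2];
	uint32_t hi, lo;

	memcpy(packed, in, sizeof(packed));
	hi = ntohl(packed[0]);
	lo = ntohl(packed[1]);

	*generation = hi >> 2;
	*date = ((uint64_t)(hi & 0x3) << 32) | lo;
}
```

Because htonl() converts from host order to network order, the bytes
that reach the file are the same on little-endian and big-endian hosts.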

>
>>   		packedDate[1] = htonl((*list)->date);
>>   		hashwrite(f, packedDate, 8);
>>   
>> @@ -571,6 +574,46 @@ static void close_reachable(struct packed_oid_list *oids)
>>   	}
>>   }
>>   
>> +static void compute_generation_numbers(struct commit** commits,
>> +				       int nr_commits)
>> +{
>> +	int i;
>> +	struct commit_list *list = NULL;
> All right, commit_list will work as stack.
>
>> +
>> +	for (i = 0; i < nr_commits; i++) {
>> +		if (commits[i]->generation != GENERATION_NUMBER_INFINITY &&
>> +		    commits[i]->generation != GENERATION_NUMBER_ZERO)
>> +			continue;
> All right, we consider _INFINITY and _ZERO as not computed.  If
> generation number is computed (by 'recursion' or from commit graph), we
> (re)use it.  This means that generation number calculation is
> incremental, as intended -- good.
>
>> +
>> +		commit_list_insert(commits[i], &list);
> Start depth-first walks from commits given.
>
>> +		while (list) {
>> +			struct commit *current = list->item;
>> +			struct commit_list *parent;
>> +			int all_parents_computed = 1;
> Here all_parents_computed is a boolean flag.  I see that it is easier to
> start with assumption that all parents will have computed generation
> numbers.
>
>> +			uint32_t max_generation = 0;
> The generation number value of 0 functions as sentinel; generation
> numbers start from 1.  Not that it matters much, as lowest possible
> generation number is 1, and we could have started from that value.

Except that for a commit with no parents, we want it to receive 
generation number max_generation + 1 = 1, so this value of 0 is important.
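
As a small illustration of why the starting value matters, here is a
sketch with a made-up helper that works on a plain array of parent
generation numbers rather than on struct commit. It assumes every
parent already has a computed generation number (at least 1) and it
leaves out the clamp to GENERATION_NUMBER_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one more than the largest parent generation.
 * With no parents the loop never runs, max_generation stays 0, and
 * the result is 1.
 */
static uint32_t generation_from_parents(const uint32_t *parent_generations,
					size_t nr_parents)
{
	uint32_t max_generation = 0;
	size_t i;

	for (i = 0; i < nr_parents; i++) {
		/* Computed generation numbers start at 1. */
		assert(parent_generations[i] >= 1);
		if (parent_generations[i] > max_generation)
			max_generation = parent_generations[i];
	}

	return max_generation + 1;
}
```

If max_generation started at 1 instead, a root commit would come out
with generation number 2.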

>
>> +
>> +			for (parent = current->parents; parent; parent = parent->next) {
>> +				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
>> +				    parent->item->generation == GENERATION_NUMBER_ZERO) {
>> +					all_parents_computed = 0;
>> +					commit_list_insert(parent->item, &list);
>> +					break;
> If some parent doesn't have generation number calculated, we add it to
> stack (and break out of loop because it is depth-first walk), and mark
> this situation.  All right.
>
>> +				} else if (parent->item->generation > max_generation) {
>> +					max_generation = parent->item->generation;
> Otherwise, update max_generation.  All right.
>
>> +				}
>> +			}
>> +
>> +			if (all_parents_computed) {
>> +				current->generation = max_generation + 1;
>> +				pop_commit(&list);
>> +			}
>> +
>> +			if (current->generation > GENERATION_NUMBER_MAX)
>> +				current->generation = GENERATION_NUMBER_MAX;
> This conditional should be inside all_parents_computed test, for example
> like this:
>
>    +			if (all_parents_computed) {
>    +				current->generation = max_generation + 1;
>    +				if (current->generation > GENERATION_NUMBER_MAX)
>    +					current->generation = GENERATION_NUMBER_MAX;
>    +
>    +				pop_commit(&list);
>    +			}
>
> (Noticed by Junio.)
>
> Sidenote: when we revisit the commit, returning from depth-first walk of
> one of its parents, we calculate max_generation from scratch again.
> This does not matter for performance, as it's just data access and
> calculating maximum - any workaround to not restart those calculations
> would take more time and memory.  And it's simple.
>
>> +		}
>> +	}
>> +}
>> +
>>   void write_commit_graph(const char *obj_dir,
>>   			const char **pack_indexes,
>>   			int nr_packs,
>> @@ -694,6 +737,8 @@ void write_commit_graph(const char *obj_dir,
>>   	if (commits.nr >= GRAPH_PARENT_MISSING)
>>   		die(_("too many commits to write graph"));
>>   
>> +	compute_generation_numbers(commits.list, commits.nr);
>> +
> Nice and simple.  All right.
>
> I guess that we pass "struct commit** commits.list" and "int
> commits.nr" to compute_generation_numbers(), instead of "struct
> packed_commit_list commits", to keep the function nice and generic?

Good catch. There is no reason to not use packed_commit_list here.
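
Putting the review comments together, the function could look roughly
like the sketch below. This is a standalone toy, not the next version
of the patch: struct toy_commit, struct toy_commit_list and struct
toy_stack are stand-ins for Git's struct commit, packed_commit_list and
commit_list, and the three constant values are assumptions. It takes
the list struct as a single argument and clamps inside the
all_parents_computed branch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values, for illustration only. */
#define GENERATION_NUMBER_INFINITY 0xFFFFFFFFu
#define GENERATION_NUMBER_MAX 0x3FFFFFFFu
#define GENERATION_NUMBER_ZERO 0u

struct toy_commit {
	uint32_t generation;
	struct toy_commit **parents;
	int nr_parents;
};

struct toy_commit_list {	/* stand-in for packed_commit_list */
	struct toy_commit **list;
	int nr;
};

struct toy_stack {
	struct toy_commit **items;
	size_t nr, alloc;
};

static void stack_push(struct toy_stack *s, struct toy_commit *c)
{
	if (s->nr == s->alloc) {
		s->alloc = s->alloc ? 2 * s->alloc : 16;
		s->items = realloc(s->items, s->alloc * sizeof(*s->items));
		assert(s->items != NULL);	/* sketch only: no real error handling */
	}
	s->items[s->nr++] = c;
}

static int is_computed(const struct toy_commit *c)
{
	return c->generation != GENERATION_NUMBER_INFINITY &&
	       c->generation != GENERATION_NUMBER_ZERO;
}

static void compute_generation_numbers(const struct toy_commit_list *commits)
{
	struct toy_stack stack = { NULL, 0, 0 };
	int i, p;

	for (i = 0; i < commits->nr; i++) {
		if (is_computed(commits->list[i]))
			continue;

		stack_push(&stack, commits->list[i]);
		while (stack.nr) {
			struct toy_commit *current = stack.items[stack.nr - 1];
			int all_parents_computed = 1;
			uint32_t max_generation = 0;

			for (p = 0; p < current->nr_parents; p++) {
				struct toy_commit *parent = current->parents[p];

				if (!is_computed(parent)) {
					/* Walk into this parent first, then revisit current. */
					all_parents_computed = 0;
					stack_push(&stack, parent);
					break;
				} else if (parent->generation > max_generation) {
					max_generation = parent->generation;
				}
			}

			if (all_parents_computed) {
				current->generation = max_generation + 1;
				if (current->generation > GENERATION_NUMBER_MAX)
					current->generation = GENERATION_NUMBER_MAX;
				stack.nr--;	/* pop current */
			}
		}
	}
	free(stack.items);
}
```

Commits that already carry a computed generation number are skipped,
both as starting points and as parents, so the walk only visits
commits whose number is still missing.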

>
>>   	graph_name = get_commit_graph_filename(obj_dir);
>>   	fd = hold_lock_file_for_update(&lk, graph_name, 0);
> Best,

