git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: "Mike Hommey" <mh@glandium.org>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	git@vger.kernel.org
Subject: [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling)
Date: Wed, 18 Sep 2019 10:43:13 +0200	[thread overview]
Message-ID: <86o8ziatb2.fsf_-_@gmail.com> (raw)
In-Reply-To: <55fad895-2c18-5a91-79b9-7b958fe280c6@gmail.com> (Derrick Stolee's message of "Tue, 25 Jun 2019 06:54:29 -0400")

Derrick Stolee <stolee@gmail.com> writes:
> On 6/25/2019 3:51 AM, Jakub Narebski wrote:
>> Jakub Narebski <jnareb@gmail.com> writes:
>>> Derrick Stolee <stolee@gmail.com> writes:
[...]
>>> O.K., so the "generation number v2 (legacy)" would be incremental and
>>> backward-compatibile in use (though not in generation and validation).
[...]
>>> Do you have benchmark for this "monotonically offset corrected commit
>>> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
>>> and https://github.com/derrickstolee/gen-test ?
>> 
>> I guess this will have to wait...
>
> I have not had time to revisit this topic and re-run performance
> numbers, sorry.

I have created pull requests against `reach-perf` branch of
derrickstolee/git repository[1], and companion pull request against
gen-test repository[2] with proposed prototype of backward-compatible
corrected commit date (with monotonic offsets).

Could you please run the tests for this generation number v5?  I was not
able to do so.  It was my first time trying to compile Git on MS
Windows, and while there were no problems compiling `master` (well,
except for compilation taking a long time), I was unable to do it for
`reach-perf` branch because of independent of change compilation errors.

It would be good to test also the legacy mode, i.e. old Git (using
generation number v0, i.e. topological levels (shortest path from node
to sink, plus one) with all backward-compatible generation numbers, or
at least generation number v5.


Do I understand it correctly that for the time being because of
backward-compatibility concerns instead of incrementing the version
number there would be used some kind of "metadata" chunk (at least until
version of Git that fails hard, instead of turning off commit-graph
feature softly, on unknown version numbers dies out).  Is there some
proposal how such chunk should look like?

[1]: https://github.com/derrickstolee/git/pull/10
[2]: https://github.com/derrickstolee/gen-test/pull/1

--- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ----
From: Jakub Narębski <jnareb@gmail.com>
Subject: [PATCH] commit-graph: generation v5 (backward compatible date ceiling)

This generation number option is backward-compatible (with
reachability index v0) version of the generation number v3, that is
the corrected commit date.  It takes the simple approach of faking
"commit dates" being in the right order by adding an offset to the
commit date (which is stored in the generation number column), so the
modified commit date is always at least one more than the maximum
modified commit date of all ancestors.  Additionally the offset itself
is adjusted in such way that the offset of a commit is always at least
one more than the offsets of all ancestors.

Two following conditions are satisfied on the offset of a commit C for
every parent P:

 1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
 2. offset(C) > offset(P)

This reachability index is locally computable, which means it is
compatible with incremental commit graph feature.

It has the same problem as v3, namely that we have a "maximum clock
skew" based on the maximum generation number we can store, only more
so.

Note: it includes incidental whitespace cleanups for v3 code.

Signed-off-by: Jakub Narębski <jnareb@gmail.com>
---
Note that the diff looks a bit strange in the part adding the code for
the new compute_generation_numbers_5() subroutine, because it got
entangled in whitespace cleanup (which I shouldn't do, I'm sorry).

Relevant part of README.md from `gen-test` repository:

--------
**V5: Corrected Commit Date with Strictly Monotonic Offset.**
For a commit C, let its _offset commit date_ (denoted by odate(C))
be a commit date plus some offset, i.e. odate(C) = date(C) + offset(C),
such that:

1. Offset commit date is greater than the maximum of the commit date
   of C and the offset commit dates of its parents.

     If odate(A) < odate(B), then A cannot reach B.

2. Offset of a commit is one more than the maximum offset of a parent,
   or more

     If offset(A) < offset(B), then A cannot reach B.

This is backward-compatible version of V3: Corrected Commit Date.

### Comparing Reachability Index Versions Viability
[...]
| Index                     | Compatible? | Immutable? | Local? |
|---------------------------|-------------|------------|--------|
| Minimum Generation Number | Yes         | Yes        | Yes    |
| (Epoch, Date) pairs       | Yes         | Yes        | Yes    |
| Maximum Generation Number | Yes         | No         | No     |
| Corrected Commit Date     | No          | Yes        | Yes    |
| FELINE index              | Yes         | No         | No     |
| Offset Commit Date *NEW*  | Yes         | Yes        | Yes    |

_Note:_ The corrected commit date uses the generation number column
to store an offset of "how much do I need to add to my commit date
to get my corrected commit date?" The values stored in that column
are then not backwards-compatible.

_Note:_ The corrected commit date with strictly monotonic offset also
uses the generation number column to store the date offset, but the
offset alone can be used as generation number (as reachability index)
itself.


 commit-graph.c | 126 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 95 insertions(+), 31 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 91863d4895..633b4b24f8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -99,10 +99,9 @@ int compare_generations(struct generation *a, struct generation *b)
 			return 1;
 		return 0;
 
-	case 3:
-		ta = a->date + a->value1;
-		tb = b->date + b->value1;
-
+	case 3: /* V3: Corrected Commit Date */
+	case 5: /* V5: Strictly Monotonic Corrected Commit Date */
+		/* handle special cases, i.e. commits outside commit graph */
 		if (a->value1 == GENERATION_NUMBER_INFINITY) {
 			if (b->value1 == GENERATION_NUMBER_INFINITY)
 				return 0;
@@ -112,6 +111,10 @@ int compare_generations(struct generation *a, struct generation *b)
 			return -1;
 		}
 
+		/* corrected commit date = date + offset (correction) */
+		ta = a->date + a->value1;
+		tb = b->date + b->value1;
+
 		if (ta < tb)
 			return -1;
 		if (ta > tb)
@@ -162,6 +165,7 @@ void get_generation_version_from_commit(const struct commit *c,
 
 		case 1:
 		case 3:
+		case 5:
 			gen->value1 = c->generation;
 			gen->date = c->date;
 			break;
@@ -212,9 +216,10 @@ void set_generation_below_commit(const struct commit *c, struct generation *g)
 			break;
 
 		case 3:
+		case 5: /* ??? */
 			if (g->value1 + g->date >= gc.value1 + gc.date) {
 				g->value1 = 0;
-				g->date = gc.value1 + gc.date;			
+				g->date = gc.value1 + gc.date;
 			}
 			break;
 
@@ -363,7 +368,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 			else
 				graph->chunk_large_edges = data + chunk_offset;
 			break;
-		
+
 		case GRAPH_CHUNKID_FELINE:
 			if (graph->chunk_feline_gen)
 				chunk_repeated = 1;
@@ -971,27 +976,27 @@ static void compute_generation_numbers_3(struct packed_commit_list *commits)
 	int i;
 	struct commit_list *list = NULL;
 
-        for (i = 0; i < commits->nr; i++) {
-                if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
-                    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
-                        continue;
+	for (i = 0; i < commits->nr; i++) {
+		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
+			commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+			continue;
 
-                commit_list_insert(commits->list[i], &list);
+		commit_list_insert(commits->list[i], &list);
 
-                while (list) {
-                        struct commit *current = list->item;
-                        struct commit_list *parent;
-                        int all_parents_computed = 1;
+		while (list) {
+			struct commit *current = list->item;
+			struct commit_list *parent;
+			int all_parents_computed = 1;
 
-                        timestamp_t max_timestamp = current->date;
+			timestamp_t max_timestamp = current->date;
 
-                        for (parent = current->parents; parent; parent = parent->next) {
-                                if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
-                                    parent->item->generation == GENERATION_NUMBER_ZERO) {
-                                        all_parents_computed = 0;
-                                        commit_list_insert(parent->item, &list);
-                                        break;
-                                } else {
+			for (parent = current->parents; parent; parent = parent->next) {
+				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
+					parent->item->generation == GENERATION_NUMBER_ZERO) {
+					all_parents_computed = 0;
+					commit_list_insert(parent->item, &list);
+					break;
+				} else {
 					struct generation pg;
 					timestamp_t pt;
 					get_generation_version_from_commit(parent->item, 3, &pg);
@@ -1001,19 +1006,75 @@ static void compute_generation_numbers_3(struct packed_commit_list *commits)
 					if (pt > max_timestamp)
 						max_timestamp = pt + 1;
 				}
-                        }
+			}
+
+			if (all_parents_computed) {
+				current->generation = (uint32_t)(max_timestamp - current->date) + 1;
+				pop_commit(&list);
+
+				if (current->generation > GENERATION_NUMBER_MAX)
+					die(_("generation number gap is too high!"));
+			}
+		}
+	}
+}
+
+static void compute_generation_numbers_5(struct packed_commit_list *commits)
+{
+	int i;
+	struct commit_list *list = NULL;
+
+	for (i = 0; i < commits->nr; i++) {
+		/* skip already computed generation numbers */
+		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
+			commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+			continue;
+
+		commit_list_insert(commits->list[i], &list);
+
+		while (list) {
+			struct commit *current = list->item;
+			struct commit_list *parent;
+			int all_parents_computed = 1;
+
+			timestamp_t max_timestamp = current->date;
+			uint32_t max_generation = 0;
+
+			for (parent = current->parents; parent; parent = parent->next) {
+				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
+				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+					all_parents_computed = 0;
+					commit_list_insert(parent->item, &list);
+					break;
+
+				} else {
+					struct generation pg;
+					timestamp_t pt;
+					get_generation_version_from_commit(parent->item, 5, &pg);
+
+					pt = pg.value1 + pg.date;
+
+					if (pt > max_timestamp)
+						max_timestamp = pt + 1;
+					if (pg.value1 > max_generation)
+						max_generation = pg.value1;
+				}
+			}
 
-                        if (all_parents_computed) {
+			if (all_parents_computed) {
 				current->generation = (uint32_t)(max_timestamp - current->date) + 1;
-                                pop_commit(&list);
+				if (current->generation < max_generation + 1)
+					current->generation = max_generation + 1;
+				pop_commit(&list);
 
-                                if (current->generation > GENERATION_NUMBER_MAX)
-                                        die(_("generation number gap is too high!"));
-                        }
-                }
-        }
+				if (current->generation > GENERATION_NUMBER_MAX)
+					die(_("generation number gap is too high!"));
+			}
+		}
+	}
 }
 
+
 static void compute_generation_numbers(struct packed_commit_list *commits,
 				       int generation_version)
 {
@@ -1038,6 +1099,9 @@ static void compute_generation_numbers(struct packed_commit_list *commits,
 	case 4:
 		/* compute at write time */
 		return;
+	case 5:
+		compute_generation_numbers_5(commits);
+		return;
 	}
 }
 
-- 
2.23.0.windows.1

  reply	other threads:[~2019-09-18  8:43 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-18  0:54 Revision walking, commit dates, slop Mike Hommey
2019-05-18  1:50 ` SZEDER Gábor
2019-05-18  3:58   ` Mike Hommey
2019-05-18  4:17     ` Mike Hommey
2019-05-18 12:01       ` SZEDER Gábor
2019-05-19 22:28         ` Jakub Narebski
2019-05-20  1:33       ` Derrick Stolee
2019-05-20 11:02         ` Jakub Narebski
2019-05-20 11:20           ` Derrick Stolee
2019-05-20 13:42             ` Jakub Narebski
2019-05-20 23:27               ` Jakub Narebski
2019-05-21  1:20                 ` Derrick Stolee
2019-05-22 18:29                   ` Jakub Narebski
2019-05-22 19:06                     ` Derrick Stolee
2019-05-23 21:04                       ` Jakub Narebski
2019-06-25  7:51               ` Jakub Narebski
2019-06-25 10:54                 ` Derrick Stolee
2019-09-18  8:43                   ` Jakub Narebski [this message]
2019-09-18 12:26                     ` [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) Derrick Stolee
2019-05-21 13:14         ` [PATCH] revision: use generation for A..B --topo-order queries Derrick Stolee
2019-05-21 13:59           ` [PATCH 2/2] revision: keep topo-walk free of unintersting commits Derrick Stolee
2019-05-22  2:19             ` Mike Hommey
2019-05-21  2:00   ` Revision walking, commit dates, slop Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86o8ziatb2.fsf_-_@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=mh@glandium.org \
    --cc=stolee@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).