git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <dstolee@microsoft.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Cc: "sbeller@google.com" <sbeller@google.com>,
	"stolee@gmail.com" <stolee@gmail.com>,
	"jonathantanmy@google.com" <jonathantanmy@google.com>,
	"gitster@pobox.com" <gitster@pobox.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v2 17/18] commit-reach: make can_all_from_reach... linear
Date: Fri, 20 Jul 2018 16:33:28 +0000	[thread overview]
Message-ID: <20180720163227.105950-18-dstolee@microsoft.com> (raw)
In-Reply-To: <20180720163227.105950-1-dstolee@microsoft.com>

The can_all_from_reach_with_flags() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of
this method: fetch negotitation (haves/wants).

The first step includes using a depth-first-search (DFS) from each
'from' commit, sorted by ascending generation number. We do not walk
beyond the minimum generation number or the minimum commit date. This
DFS is likely to be faster than the existing reachable() method because
we expect previous ref values to be along the first-parent history.

If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other 'from' commits.
We also mark the visited commits using 'assign_flag' to prevent re-
walking the same commits.

We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case:

Before: 1.52 s
 After: 0.26 s

Large Case:

Before: 3.50 s
 After: 0.27 s

Note how the time increases between the two cases in the two versions.
The new code increases relative to the number of commits that need to be
walked, but not directly relative to the number of 'from' commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 122 ++++++++++++++++++++++++++++++-------------------
 commit-reach.h |   9 ++--
 upload-pack.c  |   5 +-
 3 files changed, 83 insertions(+), 53 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index f5858944fd..bc522d6840 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -514,66 +514,87 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
 	return is_descendant_of(commit, list);
 }
 
-int reachable(struct commit *from, unsigned int with_flag,
-	      unsigned int assign_flag, time_t min_commit_date)
+static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-	struct prio_queue work = { compare_commits_by_commit_date };
+	const struct commit *a = (const struct commit *)_a;
+	const struct commit *b = (const struct commit *)_b;
 
-	prio_queue_put(&work, from);
-	while (work.nr) {
-		struct commit_list *list;
-		struct commit *commit = prio_queue_get(&work);
-
-		if (commit->object.flags & with_flag) {
-			from->object.flags |= assign_flag;
-			break;
-		}
-		if (!commit->object.parsed)
-			parse_object(the_repository, &commit->object.oid);
-		if (commit->object.flags & REACHABLE)
-			continue;
-		commit->object.flags |= REACHABLE;
-		if (commit->date < min_commit_date)
-			continue;
-		for (list = commit->parents; list; list = list->next) {
-			struct commit *parent = list->item;
-			if (!(parent->object.flags & REACHABLE))
-				prio_queue_put(&work, parent);
-		}
-	}
-	from->object.flags |= REACHABLE;
-	clear_commit_marks(from, REACHABLE);
-	clear_prio_queue(&work);
-	return (from->object.flags & assign_flag);
+	if (a->generation < b->generation)
+		return -1;
+	if (a->generation > b->generation)
+		return 1;
+	return 0;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
 				 unsigned int with_flag,
 				 unsigned int assign_flag,
-				 time_t min_commit_date)
+				 time_t min_commit_date,
+				 uint32_t min_generation)
 {
+	struct commit **list = NULL;
 	int i;
+	int result = 1;
 
+	ALLOC_ARRAY(list, from->nr);
 	for (i = 0; i < from->nr; i++) {
-		struct object *from_one = from->objects[i].item;
+		list[i] = (struct commit *)from->objects[i].item;
 
-		if (from_one->flags & assign_flag)
-			continue;
-		from_one = deref_tag(the_repository, from_one, "a from object", 0);
-		if (!from_one || from_one->type != OBJ_COMMIT) {
-			/* no way to tell if this is reachable by
-			 * looking at the ancestry chain alone, so
-			 * leave a note to ourselves not to worry about
-			 * this object anymore.
-			 */
-			from->objects[i].item->flags |= assign_flag;
-			continue;
-		}
-		if (!reachable((struct commit *)from_one, with_flag, assign_flag,
-			       min_commit_date))
+		if (parse_commit(list[i]) ||
+		    list[i]->generation < min_generation)
 			return 0;
 	}
-	return 1;
+
+	QSORT(list, from->nr, compare_commits_by_gen);
+
+	for (i = 0; i < from->nr; i++) {
+		/* DFS from list[i] */
+		struct commit_list *stack = NULL;
+
+		list[i]->object.flags |= assign_flag;
+		commit_list_insert(list[i], &stack);
+
+		while (stack) {
+			struct commit_list *parent;
+
+			if (stack->item->object.flags & with_flag) {
+				pop_commit(&stack);
+				continue;
+			}
+
+			for (parent = stack->item->parents; parent; parent = parent->next) {
+				if (parent->item->object.flags & (with_flag | RESULT))
+					stack->item->object.flags |= RESULT;
+
+				if (!(parent->item->object.flags & assign_flag)) {
+					parent->item->object.flags |= assign_flag;
+
+					if (parse_commit(parent->item) ||
+					    parent->item->date < min_commit_date ||
+					    parent->item->generation < min_generation)
+						continue;
+
+					commit_list_insert(parent->item, &stack);
+					break;
+				}
+			}
+
+			if (!parent)
+				pop_commit(&stack);
+		}
+
+		if (!(list[i]->object.flags & (with_flag | RESULT))) {
+			result = 0;
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	for (i = 0; i < from->nr; i++) {
+		clear_commit_marks(list[i], RESULT);
+		clear_commit_marks(list[i], assign_flag);
+	}
+	return result;
 }
 
 int can_all_from_reach(struct commit_list *from, struct commit_list *to,
@@ -583,6 +604,7 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 	time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0;
 	struct commit_list *from_iter = from, *to_iter = to;
 	int result;
+	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
 	while (from_iter) {
 		add_object_array(&from_iter->item->object, NULL, &from_objs);
@@ -590,6 +612,9 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 		if (!parse_commit(from_iter->item)) {
 			if (from_iter->item->date < min_commit_date)
 				min_commit_date = from_iter->item->date;
+
+			if (from_iter->item->generation < min_generation)
+				min_generation = from_iter->item->generation;
 		}
 
 		from_iter = from_iter->next;
@@ -599,6 +624,9 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 		if (!parse_commit(to_iter->item)) {
 			if (to_iter->item->date < min_commit_date)
 				min_commit_date = to_iter->item->date;
+
+			if (to_iter->item->generation < min_generation)
+				min_generation = to_iter->item->generation;
 		}
 
 		to_iter->item->object.flags |= PARENT2;
@@ -607,7 +635,7 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 	}
 
 	result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1,
-					      min_commit_date);
+					      min_commit_date, min_generation);
 
 	while (from) {
 		clear_commit_marks(from->item, PARENT1);
diff --git a/commit-reach.h b/commit-reach.h
index aa202c9703..7d313e2975 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -59,19 +59,18 @@ define_commit_slab(contains_cache, enum contains_result);
 int commit_contains(struct ref_filter *filter, struct commit *commit,
 		    struct commit_list *list, struct contains_cache *cache);
 
-int reachable(struct commit *from, unsigned int with_flag,
-	      unsigned int assign_flag, time_t min_commit_date);
-
 /*
  * Determine if every commit in 'from' can reach at least one commit
  * that is marked with 'with_flag'. As we traverse, use 'assign_flag'
  * as a marker for commits that are already visited. Do not walk
- * commits with date below 'min_commit_date'.
+ * commits with date below 'min_commit_date' or generation below
+ * 'min_generation'.
  */
 int can_all_from_reach_with_flag(struct object_array *from,
 				 unsigned int with_flag,
 				 unsigned int assign_flag,
-				 time_t min_commit_date);
+				 time_t min_commit_date,
+				 uint32_t min_generation);
 int can_all_from_reach(struct commit_list *from, struct commit_list *to,
 		       int commit_date_cutoff);
 
diff --git a/upload-pack.c b/upload-pack.c
index 11c426685d..1e498f1188 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -338,11 +338,14 @@ static int got_oid(const char *hex, struct object_id *oid)
 
 static int ok_to_give_up(void)
 {
+	uint32_t min_generation = GENERATION_NUMBER_ZERO;
+
 	if (!have_obj.nr)
 		return 0;
 
 	return can_all_from_reach_with_flag(&want_obj, THEY_HAVE,
-					    COMMON_KNOWN, oldest_have);
+					    COMMON_KNOWN, oldest_have,
+					    min_generation);
 }
 
 static int get_common_commits(void)
-- 
2.18.0.118.gd4f65b8d14


  parent reply	other threads:[~2018-07-20 16:33 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 13:00 [PATCH 00/16] Consolidate reachability logic Derrick Stolee via GitGitGadget
2018-06-19 20:25 ` [PATCH 04/16] upload-pack: make reachable() more generic Derrick Stolee via GitGitGadget
2018-06-19 20:35 ` [PATCH 05/16] upload-pack: refactor ok_to_give_up() Derrick Stolee via GitGitGadget
2018-06-25 17:16 ` [PATCH 01/16] commit-reach: move walk methods from commit.c Derrick Stolee via GitGitGadget
2018-07-16 18:57   ` Stefan Beller
2018-07-16 21:31   ` Jonathan Tan
2018-06-25 17:35 ` [PATCH 02/16] commit-reach: move ref_newer from remote.c Derrick Stolee via GitGitGadget
2018-07-16 19:10   ` Stefan Beller
2018-06-25 18:01 ` [PATCH 03/16] commit-reach: move commit_contains from ref-filter Derrick Stolee via GitGitGadget
2018-07-16 19:14   ` Stefan Beller
2018-06-28 12:31 ` [PATCH 15/16] commit-reach: make can_all_from_reach... linear Derrick Stolee via GitGitGadget
2018-07-16 22:37   ` Stefan Beller
2018-07-17  1:16   ` Jonathan Tan
2018-10-01 19:16   ` René Scharfe
2018-10-01 19:26     ` Derrick Stolee
2018-10-01 20:37       ` René Scharfe
2018-10-04 22:59         ` René Scharfe
2018-10-05 12:15           ` Derrick Stolee
2018-10-05 16:51           ` Jeff King
2018-10-05 18:48             ` René Scharfe
2018-10-05 19:08               ` Jeff King
2018-10-05 19:36                 ` René Scharfe
2018-10-05 19:42                   ` Jeff King
2018-10-14 14:29                     ` René Scharfe
2018-10-15 15:31                       ` Derrick Stolee
2018-10-15 16:26                         ` René Scharfe
2018-10-16 23:09                       ` Junio C Hamano
2018-10-17  8:33                       ` Jeff King
2020-11-18  2:16                         ` Jonathan Nieder
2020-11-18  6:54                           ` Jeff King
2020-11-18 17:47                             ` René Scharfe
2018-10-05 19:12             ` Ævar Arnfjörð Bjarmason
2018-10-05 19:28               ` Jeff King
2018-10-05 19:42                 ` Ævar Arnfjörð Bjarmason
2018-10-05 19:44                   ` Jeff King
2018-07-12 20:47 ` [PATCH 06/16] upload-pack: generalize commit date cutoff Derrick Stolee via GitGitGadget
2018-07-16 19:38   ` Stefan Beller
2018-07-18 16:04     ` Derrick Stolee
2018-07-12 20:52 ` [PATCH 07/16] commit-reach: move can_all_from_reach_with_flags Derrick Stolee via GitGitGadget
2018-07-16 22:37   ` Jonathan Tan
2018-07-13 14:06 ` [PATCH 08/16] test-reach: create new test tool for ref_newer Derrick Stolee via GitGitGadget
2018-07-16 23:00   ` Jonathan Tan
2018-07-18 16:14     ` Derrick Stolee
2018-07-13 14:28 ` [PATCH 09/16] test-reach: test in_merge_bases Derrick Stolee via GitGitGadget
2018-07-13 14:38 ` [PATCH 10/16] test-reach: test is_descendant_of Derrick Stolee via GitGitGadget
2018-07-13 14:51 ` [PATCH 11/16] test-reach: test get_merge_bases_many Derrick Stolee via GitGitGadget
2018-07-16 21:24   ` Stefan Beller
2018-07-16 23:08   ` Jonathan Tan
2018-07-13 16:51 ` [PATCH 12/16] test-reach: test reduce_heads Derrick Stolee via GitGitGadget
2018-07-16 21:30   ` Stefan Beller
2018-07-16 21:59     ` Eric Sunshine
2018-07-13 17:22 ` [PATCH 13/16] test-reach: test can_all_from_reach_with_flags Derrick Stolee via GitGitGadget
2018-07-16 21:54   ` Stefan Beller
2018-07-18 16:54     ` Derrick Stolee
2018-07-17  0:10   ` Jonathan Tan
2018-07-13 18:37 ` [PATCH 14/16] commit-reach: replace ref_newer logic Derrick Stolee via GitGitGadget
2018-07-16 22:16   ` Stefan Beller
2018-07-13 19:25 ` [PATCH 16/16] commit-reach: use can_all_from_reach Derrick Stolee via GitGitGadget
2018-07-16 22:47   ` Stefan Beller
2018-07-16 13:54 ` [PATCH 00/16] Consolidate reachability logic Ramsay Jones
2018-07-16 16:18   ` Jeff King
2018-07-16 18:40     ` Eric Sunshine
2018-07-16 18:56       ` Jeff King
2018-07-16 18:59         ` Eric Sunshine
2018-07-18 12:32           ` Johannes Schindelin
2018-07-18 12:23     ` Johannes Schindelin
2018-07-18 19:21       ` Jeff King
2018-07-19 16:34         ` Johannes Schindelin
2018-07-16 17:26   ` Stefan Beller
2018-07-16 18:44     ` Eric Sunshine
2018-07-16 18:47       ` Derrick Stolee
2018-07-18 12:28         ` Johannes Schindelin
2018-07-18 15:01           ` Duy Nguyen
2018-07-18 17:01             ` Junio C Hamano
2018-07-18 17:11               ` Derrick Stolee
2018-07-19 16:37                 ` Johannes Schindelin
2018-07-19 16:32               ` Johannes Schindelin
2018-07-20 16:33 ` [PATCH v2 00/18] " Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 01/18] commit-reach: move walk methods from commit.c Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 02/18] commit.h: remove method declarations Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 03/18] commit-reach: move ref_newer from remote.c Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 04/18] commit-reach: move commit_contains from ref-filter Derrick Stolee
2018-08-28 21:24     ` Jonathan Nieder
2018-08-28 21:33       ` Derrick Stolee
2018-08-28 21:36       ` [PATCH] commit-reach: correct accidental #include of C file Jonathan Nieder
2018-08-28 21:39         ` Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 05/18] upload-pack: make reachable() more generic Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 06/18] upload-pack: refactor ok_to_give_up() Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 07/18] upload-pack: generalize commit date cutoff Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 08/18] commit-reach: move can_all_from_reach_with_flags Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 09/18] test-reach: create new test tool for ref_newer Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 10/18] test-reach: test in_merge_bases Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 11/18] test-reach: test is_descendant_of Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 12/18] test-reach: test get_merge_bases_many Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 13/18] test-reach: test reduce_heads Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 14/18] test-reach: test can_all_from_reach_with_flags Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 15/18] test-reach: test commit_contains Derrick Stolee
2018-07-23 20:35     ` Jonathan Tan
2018-07-25 18:08       ` Junio C Hamano
2018-07-25 18:30         ` Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 16/18] commit-reach: replace ref_newer logic Derrick Stolee
2018-07-20 16:33   ` Derrick Stolee [this message]
2018-07-23 20:41     ` [PATCH v2 17/18] commit-reach: make can_all_from_reach... linear Jonathan Tan
2018-08-01 20:41       ` Derrick Stolee
2018-09-12  4:14     ` Jeff King
2018-09-12  4:29       ` Jeff King
2018-09-12 13:08         ` Derrick Stolee
2018-07-20 16:33   ` [PATCH v2 18/18] commit-reach: use can_all_from_reach Derrick Stolee
2018-07-20 17:10   ` [PATCH v2 00/18] Consolidate reachability logic Stefan Beller
2018-07-20 17:15     ` Derrick Stolee
2018-07-20 22:16       ` Stefan Beller
2018-08-01 20:33         ` Derrick Stolee
2018-07-20 17:18   ` Derrick Stolee
2018-07-20 18:09     ` Eric Sunshine
2018-07-20 19:14       ` Derrick Stolee
2018-07-20 17:41   ` Duy Nguyen
2018-07-20 19:09     ` Derrick Stolee
2018-07-20 22:45   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180720163227.105950-18-dstolee@microsoft.com \
    --to=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=sbeller@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).