git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 00/13] Consolidate reachability logic
@ 2018-06-29 16:12 Derrick Stolee
  2018-06-29 16:12 ` [RFC PATCH 01/13] commit-reach: move walk methods from commit.c Derrick Stolee
                   ` (13 more replies)
  0 siblings, 14 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

This RFC is a bit unpolished because I was mostly seeing where the idea
could go. I wanted to achieve the following:

1. Consolidate several different commit walks into one file
2. Reduce duplicate reachability logic
3. Increase testability (correctness and performance)
4. Improve performance of reachability queries

My approach is mostly in three parts:

  I. Move code to a new commit-reach.c file.
 II. Add a 'test-tool reach' command to test these methods directly.
III. Modify the logic by improving performance and calling methods with
     similar logic but different prototypes.

The 'test-tool reach' command is helpful to make sure I don't break
anything as I change the logic, but also so I can test methods that are
normally only exposed by other more complicated commands. For instance,
ref_newer() is part of 'git push -f' and ok_to_give_up() is buried deep
within fetch negotiation. Both of these methods have some problematic
performacne issues that are corrected by this series. As I discovered
them, it was clear that it would be better to consolidate walk logic
instead of discovering a new walk in another file hidden somewhere.

For the ok_to_give_up() method, I refactored the method so I could pull
the logic out of the depths of fetch negotiation. In the commit
"commit-reach: make can_all_from_reach... linear" I discuss how the
existing algorithm is quadratic and how we can make it linear. Also, we
can use heuristic knowledge about the shape of the commit graph and the
usual haves/wants to get some extra performance bonus. (The heuristic is
to do a DFS with first-parents first, and stop on first found result. We
expect haves/wants to include ref tips, which typically have their
previous values in their first-parent history.)

The "test-reach" commit in particular is not split well or described
well for mailing-list review. I figured I would send the RFC instead of
tweaking it carefully, because I will need to re-do most of these
changes after more of the object-store series is merged. I don't plan to
send a v1 patch until the lookup_commit() and parse_commit() code is
stable again.

Thanks,
-Stolee

Derrick Stolee (13):
  commit-reach: move walk methods from commit.c
  commit-reach: move ref_newer from remote.c
  commit-reach: move commit_contains from ref-filter
  upload-pack: make reachable() more generic
  upload-pack: refactor ok_to_give_up()
  commit-reach: move can_all_from_reach_with_flag()
  test-reach
  test-reach: test reduce_heads()
  commit-reach: test can_all_from_reach
  commit-reach: test is_descendant_of
  commit-reach: make can_all_from_reach... linear
  commit-reach: use is_descendant_of for ref_newer
  commit-reach: use can_all_from_reach

 Makefile              |   2 +
 builtin/remote.c      |   1 +
 commit-reach.c        | 656 ++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h        |  37 +++
 commit.c              | 365 +----------------------
 commit.h              |   6 +-
 fast-import.c         |   1 +
 fetch-pack.c          |   3 +-
 http-push.c           |   1 +
 ref-filter.c          | 147 +---------
 remote.c              |  48 +---
 remote.h              |   1 -
 sha1-name.c           |   3 +-
 t/helper/test-reach.c | 128 +++++++++
 t/helper/test-tool.c  |   1 +
 t/helper/test-tool.h  |   1 +
 t/t6600-test-reach.sh | 177 ++++++++++++
 upload-pack.c         |  58 +---
 walker.c              |   3 +-
 19 files changed, 1035 insertions(+), 604 deletions(-)
 create mode 100644 commit-reach.c
 create mode 100644 commit-reach.h
 create mode 100644 t/helper/test-reach.c
 create mode 100755 t/t6600-test-reach.sh


base-commit: d4f65b8d141e041eb5e558cd9e763873e29863b9
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 01/13] commit-reach: move walk methods from commit.c
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 21:35   ` Stefan Beller
  2018-06-29 21:52   ` Junio C Hamano
  2018-06-29 16:12 ` [RFC PATCH 02/13] commit-reach: move ref_newer from remote.c Derrick Stolee
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile       |   1 +
 commit-reach.c | 359 +++++++++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h |  41 ++++++
 commit.c       | 358 ------------------------------------------------
 4 files changed, 401 insertions(+), 358 deletions(-)
 create mode 100644 commit-reach.c
 create mode 100644 commit-reach.h

diff --git a/Makefile b/Makefile
index 0cb6590f24..89ad873ce0 100644
--- a/Makefile
+++ b/Makefile
@@ -828,6 +828,7 @@ LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
 LIB_OBJS += commit-graph.o
+LIB_OBJS += commit-reach.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-reach.c b/commit-reach.c
new file mode 100644
index 0000000000..1438393165
--- /dev/null
+++ b/commit-reach.c
@@ -0,0 +1,359 @@
+#include "cache.h"
+#include "prio-queue.h"
+#include "commit-reach.h"
+
+/* Remember to update object flag allocation in object.h */
+#define PARENT1         (1u<<16)
+#define PARENT2         (1u<<17)
+#define STALE           (1u<<18)
+#define RESULT          (1u<<19)
+
+static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
+
+static int queue_has_nonstale(struct prio_queue *queue)
+{
+	int i;
+	for (i = 0; i < queue->nr; i++) {
+		struct commit *commit = queue->array[i].data;
+		if (!(commit->object.flags & STALE))
+			return 1;
+	}
+	return 0;
+}
+
+/* all input commits in one and twos[] must have been parsed! */
+static struct commit_list *paint_down_to_common(struct commit *one, int n,
+						struct commit **twos,
+						int min_generation)
+{
+	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+	struct commit_list *result = NULL;
+	int i;
+	uint32_t last_gen = GENERATION_NUMBER_INFINITY;
+
+	one->object.flags |= PARENT1;
+	if (!n) {
+		commit_list_append(one, &result);
+		return result;
+	}
+	prio_queue_put(&queue, one);
+
+	for (i = 0; i < n; i++) {
+		twos[i]->object.flags |= PARENT2;
+		prio_queue_put(&queue, twos[i]);
+	}
+
+	while (queue_has_nonstale(&queue)) {
+		struct commit *commit = prio_queue_get(&queue);
+		struct commit_list *parents;
+		int flags;
+
+		if (commit->generation > last_gen)
+			BUG("bad generation skip %8x > %8x at %s",
+			    commit->generation, last_gen,
+			    oid_to_hex(&commit->object.oid));
+		last_gen = commit->generation;
+
+		if (commit->generation < min_generation)
+			break;
+
+		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
+		if (flags == (PARENT1 | PARENT2)) {
+			if (!(commit->object.flags & RESULT)) {
+				commit->object.flags |= RESULT;
+				commit_list_insert_by_date(commit, &result);
+			}
+			/* Mark parents of a found merge stale */
+			flags |= STALE;
+		}
+		parents = commit->parents;
+		while (parents) {
+			struct commit *p = parents->item;
+			parents = parents->next;
+			if ((p->object.flags & flags) == flags)
+				continue;
+			if (parse_commit(p))
+				return NULL;
+			p->object.flags |= flags;
+			prio_queue_put(&queue, p);
+		}
+	}
+
+	clear_prio_queue(&queue);
+	return result;
+}
+
+static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos)
+{
+	struct commit_list *list = NULL;
+	struct commit_list *result = NULL;
+	int i;
+
+	for (i = 0; i < n; i++) {
+		if (one == twos[i])
+			/*
+			 * We do not mark this even with RESULT so we do not
+			 * have to clean it up.
+			 */
+			return commit_list_insert(one, &result);
+	}
+
+	if (parse_commit(one))
+		return NULL;
+	for (i = 0; i < n; i++) {
+		if (parse_commit(twos[i]))
+			return NULL;
+	}
+
+	list = paint_down_to_common(one, n, twos, 0);
+
+	while (list) {
+		struct commit *commit = pop_commit(&list);
+		if (!(commit->object.flags & STALE))
+			commit_list_insert_by_date(commit, &result);
+	}
+	return result;
+}
+
+struct commit_list *get_octopus_merge_bases(struct commit_list *in)
+{
+	struct commit_list *i, *j, *k, *ret = NULL;
+
+	if (!in)
+		return ret;
+
+	commit_list_insert(in->item, &ret);
+
+	for (i = in->next; i; i = i->next) {
+		struct commit_list *new_commits = NULL, *end = NULL;
+
+		for (j = ret; j; j = j->next) {
+			struct commit_list *bases;
+			bases = get_merge_bases(i->item, j->item);
+			if (!new_commits)
+				new_commits = bases;
+			else
+				end->next = bases;
+			for (k = bases; k; k = k->next)
+				end = k;
+		}
+		ret = new_commits;
+	}
+	return ret;
+}
+
+static int remove_redundant(struct commit **array, int cnt)
+{
+	/*
+	 * Some commit in the array may be an ancestor of
+	 * another commit.  Move such commit to the end of
+	 * the array, and return the number of commits that
+	 * are independent from each other.
+	 */
+	struct commit **work;
+	unsigned char *redundant;
+	int *filled_index;
+	int i, j, filled;
+
+	work = xcalloc(cnt, sizeof(*work));
+	redundant = xcalloc(cnt, 1);
+	ALLOC_ARRAY(filled_index, cnt - 1);
+
+	for (i = 0; i < cnt; i++)
+		parse_commit(array[i]);
+	for (i = 0; i < cnt; i++) {
+		struct commit_list *common;
+		uint32_t min_generation = array[i]->generation;
+
+		if (redundant[i])
+			continue;
+		for (j = filled = 0; j < cnt; j++) {
+			if (i == j || redundant[j])
+				continue;
+			filled_index[filled] = j;
+			work[filled++] = array[j];
+
+			if (array[j]->generation < min_generation)
+				min_generation = array[j]->generation;
+		}
+		common = paint_down_to_common(array[i], filled, work,
+					      min_generation);
+		if (array[i]->object.flags & PARENT2)
+			redundant[i] = 1;
+		for (j = 0; j < filled; j++)
+			if (work[j]->object.flags & PARENT1)
+				redundant[filled_index[j]] = 1;
+		clear_commit_marks(array[i], all_flags);
+		clear_commit_marks_many(filled, work, all_flags);
+		free_commit_list(common);
+	}
+
+	/* Now collect the result */
+	COPY_ARRAY(work, array, cnt);
+	for (i = filled = 0; i < cnt; i++)
+		if (!redundant[i])
+			array[filled++] = work[i];
+	for (j = filled, i = 0; i < cnt; i++)
+		if (redundant[i])
+			array[j++] = work[i];
+	free(work);
+	free(redundant);
+	free(filled_index);
+	return filled;
+}
+
+static struct commit_list *get_merge_bases_many_0(struct commit *one,
+						  int n,
+						  struct commit **twos,
+						  int cleanup)
+{
+	struct commit_list *list;
+	struct commit **rslt;
+	struct commit_list *result;
+	int cnt, i;
+
+	result = merge_bases_many(one, n, twos);
+	for (i = 0; i < n; i++) {
+		if (one == twos[i])
+			return result;
+	}
+	if (!result || !result->next) {
+		if (cleanup) {
+			clear_commit_marks(one, all_flags);
+			clear_commit_marks_many(n, twos, all_flags);
+		}
+		return result;
+	}
+
+	/* There are more than one */
+	cnt = commit_list_count(result);
+	rslt = xcalloc(cnt, sizeof(*rslt));
+	for (list = result, i = 0; list; list = list->next)
+		rslt[i++] = list->item;
+	free_commit_list(result);
+
+	clear_commit_marks(one, all_flags);
+	clear_commit_marks_many(n, twos, all_flags);
+
+	cnt = remove_redundant(rslt, cnt);
+	result = NULL;
+	for (i = 0; i < cnt; i++)
+		commit_list_insert_by_date(rslt[i], &result);
+	free(rslt);
+	return result;
+}
+
+struct commit_list *get_merge_bases_many(struct commit *one,
+					 int n,
+					 struct commit **twos)
+{
+	return get_merge_bases_many_0(one, n, twos, 1);
+}
+
+struct commit_list *get_merge_bases_many_dirty(struct commit *one,
+					       int n,
+					       struct commit **twos)
+{
+	return get_merge_bases_many_0(one, n, twos, 0);
+}
+
+struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
+{
+	return get_merge_bases_many_0(one, 1, &two, 1);
+}
+
+/*
+ * Is "commit" a descendant of one of the elements on the "with_commit" list?
+ */
+int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
+{
+	if (!with_commit)
+		return 1;
+	while (with_commit) {
+		struct commit *other;
+
+		other = with_commit->item;
+		with_commit = with_commit->next;
+		if (in_merge_bases(other, commit))
+			return 1;
+	}
+	return 0;
+}
+
+/*
+ * Is "commit" an ancestor of one of the "references"?
+ */
+int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference)
+{
+	struct commit_list *bases;
+	int ret = 0, i;
+	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+
+	if (parse_commit(commit))
+		return ret;
+	for (i = 0; i < nr_reference; i++) {
+		if (parse_commit(reference[i]))
+			return ret;
+		if (reference[i]->generation < min_generation)
+			min_generation = reference[i]->generation;
+	}
+
+	if (commit->generation > min_generation)
+		return ret;
+
+	bases = paint_down_to_common(commit, nr_reference, reference, commit->generation);
+	if (commit->object.flags & PARENT2)
+		ret = 1;
+	clear_commit_marks(commit, all_flags);
+	clear_commit_marks_many(nr_reference, reference, all_flags);
+	free_commit_list(bases);
+	return ret;
+}
+
+/*
+ * Is "commit" an ancestor of (i.e. reachable from) the "reference"?
+ */
+int in_merge_bases(struct commit *commit, struct commit *reference)
+{
+	return in_merge_bases_many(commit, 1, &reference);
+}
+
+struct commit_list *reduce_heads(struct commit_list *heads)
+{
+	struct commit_list *p;
+	struct commit_list *result = NULL, **tail = &result;
+	struct commit **array;
+	int num_head, i;
+
+	if (!heads)
+		return NULL;
+
+	/* Uniquify */
+	for (p = heads; p; p = p->next)
+		p->item->object.flags &= ~STALE;
+	for (p = heads, num_head = 0; p; p = p->next) {
+		if (p->item->object.flags & STALE)
+			continue;
+		p->item->object.flags |= STALE;
+		num_head++;
+	}
+	array = xcalloc(num_head, sizeof(*array));
+	for (p = heads, i = 0; p; p = p->next) {
+		if (p->item->object.flags & STALE) {
+			array[i++] = p->item;
+			p->item->object.flags &= ~STALE;
+		}
+	}
+	num_head = remove_redundant(array, num_head);
+	for (i = 0; i < num_head; i++)
+		tail = &commit_list_insert(array[i], tail)->next;
+	free(array);
+	return result;
+}
+
+void reduce_heads_replace(struct commit_list **heads)
+{
+	struct commit_list *result = reduce_heads(*heads);
+	free_commit_list(*heads);
+	*heads = result;
+}
diff --git a/commit-reach.h b/commit-reach.h
new file mode 100644
index 0000000000..244f48c5f2
--- /dev/null
+++ b/commit-reach.h
@@ -0,0 +1,41 @@
+#ifndef __COMMIT_REACH_H__
+#define __COMMIT_REACH_H__
+
+#include "commit.h"
+
+struct commit_list *get_merge_bases_many(struct commit *one,
+					 int n,
+					 struct commit **twos);
+struct commit_list *get_merge_bases_many_dirty(struct commit *one,
+					       int n,
+					       struct commit **twos);
+struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
+struct commit_list *get_octopus_merge_bases(struct commit_list *in);
+
+/* To be used only when object flags after this call no longer matter */
+struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos);
+
+int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
+int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference);
+int in_merge_bases(struct commit *commit, struct commit *reference);
+
+
+/*
+ * Takes a list of commits and returns a new list where those
+ * have been removed that can be reached from other commits in
+ * the list. It is useful for, e.g., reducing the commits
+ * randomly thrown at the git-merge command and removing
+ * redundant commits that the user shouldn't have given to it.
+ *
+ * This function destroys the STALE bit of the commit objects'
+ * flags.
+ */
+struct commit_list *reduce_heads(struct commit_list *heads);
+
+/*
+ * Like `reduce_heads()`, except it replaces the list. Use this
+ * instead of `foo = reduce_heads(foo);` to avoid memory leaks.
+ */
+void reduce_heads_replace(struct commit_list **heads);
+
+#endif
diff --git a/commit.c b/commit.c
index 598cf21cee..d4ddaf4879 100644
--- a/commit.c
+++ b/commit.c
@@ -818,364 +818,6 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 		clear_author_date_slab(&author_date);
 }
 
-/* merge-base stuff */
-
-/* Remember to update object flag allocation in object.h */
-#define PARENT1		(1u<<16)
-#define PARENT2		(1u<<17)
-#define STALE		(1u<<18)
-#define RESULT		(1u<<19)
-
-static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
-
-static int queue_has_nonstale(struct prio_queue *queue)
-{
-	int i;
-	for (i = 0; i < queue->nr; i++) {
-		struct commit *commit = queue->array[i].data;
-		if (!(commit->object.flags & STALE))
-			return 1;
-	}
-	return 0;
-}
-
-/* all input commits in one and twos[] must have been parsed! */
-static struct commit_list *paint_down_to_common(struct commit *one, int n,
-						struct commit **twos,
-						int min_generation)
-{
-	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
-	struct commit_list *result = NULL;
-	int i;
-	uint32_t last_gen = GENERATION_NUMBER_INFINITY;
-
-	one->object.flags |= PARENT1;
-	if (!n) {
-		commit_list_append(one, &result);
-		return result;
-	}
-	prio_queue_put(&queue, one);
-
-	for (i = 0; i < n; i++) {
-		twos[i]->object.flags |= PARENT2;
-		prio_queue_put(&queue, twos[i]);
-	}
-
-	while (queue_has_nonstale(&queue)) {
-		struct commit *commit = prio_queue_get(&queue);
-		struct commit_list *parents;
-		int flags;
-
-		if (commit->generation > last_gen)
-			BUG("bad generation skip %8x > %8x at %s",
-			    commit->generation, last_gen,
-			    oid_to_hex(&commit->object.oid));
-		last_gen = commit->generation;
-
-		if (commit->generation < min_generation)
-			break;
-
-		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
-		if (flags == (PARENT1 | PARENT2)) {
-			if (!(commit->object.flags & RESULT)) {
-				commit->object.flags |= RESULT;
-				commit_list_insert_by_date(commit, &result);
-			}
-			/* Mark parents of a found merge stale */
-			flags |= STALE;
-		}
-		parents = commit->parents;
-		while (parents) {
-			struct commit *p = parents->item;
-			parents = parents->next;
-			if ((p->object.flags & flags) == flags)
-				continue;
-			if (parse_commit(p))
-				return NULL;
-			p->object.flags |= flags;
-			prio_queue_put(&queue, p);
-		}
-	}
-
-	clear_prio_queue(&queue);
-	return result;
-}
-
-static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos)
-{
-	struct commit_list *list = NULL;
-	struct commit_list *result = NULL;
-	int i;
-
-	for (i = 0; i < n; i++) {
-		if (one == twos[i])
-			/*
-			 * We do not mark this even with RESULT so we do not
-			 * have to clean it up.
-			 */
-			return commit_list_insert(one, &result);
-	}
-
-	if (parse_commit(one))
-		return NULL;
-	for (i = 0; i < n; i++) {
-		if (parse_commit(twos[i]))
-			return NULL;
-	}
-
-	list = paint_down_to_common(one, n, twos, 0);
-
-	while (list) {
-		struct commit *commit = pop_commit(&list);
-		if (!(commit->object.flags & STALE))
-			commit_list_insert_by_date(commit, &result);
-	}
-	return result;
-}
-
-struct commit_list *get_octopus_merge_bases(struct commit_list *in)
-{
-	struct commit_list *i, *j, *k, *ret = NULL;
-
-	if (!in)
-		return ret;
-
-	commit_list_insert(in->item, &ret);
-
-	for (i = in->next; i; i = i->next) {
-		struct commit_list *new_commits = NULL, *end = NULL;
-
-		for (j = ret; j; j = j->next) {
-			struct commit_list *bases;
-			bases = get_merge_bases(i->item, j->item);
-			if (!new_commits)
-				new_commits = bases;
-			else
-				end->next = bases;
-			for (k = bases; k; k = k->next)
-				end = k;
-		}
-		ret = new_commits;
-	}
-	return ret;
-}
-
-static int remove_redundant(struct commit **array, int cnt)
-{
-	/*
-	 * Some commit in the array may be an ancestor of
-	 * another commit.  Move such commit to the end of
-	 * the array, and return the number of commits that
-	 * are independent from each other.
-	 */
-	struct commit **work;
-	unsigned char *redundant;
-	int *filled_index;
-	int i, j, filled;
-
-	work = xcalloc(cnt, sizeof(*work));
-	redundant = xcalloc(cnt, 1);
-	ALLOC_ARRAY(filled_index, cnt - 1);
-
-	for (i = 0; i < cnt; i++)
-		parse_commit(array[i]);
-	for (i = 0; i < cnt; i++) {
-		struct commit_list *common;
-		uint32_t min_generation = array[i]->generation;
-
-		if (redundant[i])
-			continue;
-		for (j = filled = 0; j < cnt; j++) {
-			if (i == j || redundant[j])
-				continue;
-			filled_index[filled] = j;
-			work[filled++] = array[j];
-
-			if (array[j]->generation < min_generation)
-				min_generation = array[j]->generation;
-		}
-		common = paint_down_to_common(array[i], filled, work,
-					      min_generation);
-		if (array[i]->object.flags & PARENT2)
-			redundant[i] = 1;
-		for (j = 0; j < filled; j++)
-			if (work[j]->object.flags & PARENT1)
-				redundant[filled_index[j]] = 1;
-		clear_commit_marks(array[i], all_flags);
-		clear_commit_marks_many(filled, work, all_flags);
-		free_commit_list(common);
-	}
-
-	/* Now collect the result */
-	COPY_ARRAY(work, array, cnt);
-	for (i = filled = 0; i < cnt; i++)
-		if (!redundant[i])
-			array[filled++] = work[i];
-	for (j = filled, i = 0; i < cnt; i++)
-		if (redundant[i])
-			array[j++] = work[i];
-	free(work);
-	free(redundant);
-	free(filled_index);
-	return filled;
-}
-
-static struct commit_list *get_merge_bases_many_0(struct commit *one,
-						  int n,
-						  struct commit **twos,
-						  int cleanup)
-{
-	struct commit_list *list;
-	struct commit **rslt;
-	struct commit_list *result;
-	int cnt, i;
-
-	result = merge_bases_many(one, n, twos);
-	for (i = 0; i < n; i++) {
-		if (one == twos[i])
-			return result;
-	}
-	if (!result || !result->next) {
-		if (cleanup) {
-			clear_commit_marks(one, all_flags);
-			clear_commit_marks_many(n, twos, all_flags);
-		}
-		return result;
-	}
-
-	/* There are more than one */
-	cnt = commit_list_count(result);
-	rslt = xcalloc(cnt, sizeof(*rslt));
-	for (list = result, i = 0; list; list = list->next)
-		rslt[i++] = list->item;
-	free_commit_list(result);
-
-	clear_commit_marks(one, all_flags);
-	clear_commit_marks_many(n, twos, all_flags);
-
-	cnt = remove_redundant(rslt, cnt);
-	result = NULL;
-	for (i = 0; i < cnt; i++)
-		commit_list_insert_by_date(rslt[i], &result);
-	free(rslt);
-	return result;
-}
-
-struct commit_list *get_merge_bases_many(struct commit *one,
-					 int n,
-					 struct commit **twos)
-{
-	return get_merge_bases_many_0(one, n, twos, 1);
-}
-
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-					       int n,
-					       struct commit **twos)
-{
-	return get_merge_bases_many_0(one, n, twos, 0);
-}
-
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
-{
-	return get_merge_bases_many_0(one, 1, &two, 1);
-}
-
-/*
- * Is "commit" a descendant of one of the elements on the "with_commit" list?
- */
-int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
-{
-	if (!with_commit)
-		return 1;
-	while (with_commit) {
-		struct commit *other;
-
-		other = with_commit->item;
-		with_commit = with_commit->next;
-		if (in_merge_bases(other, commit))
-			return 1;
-	}
-	return 0;
-}
-
-/*
- * Is "commit" an ancestor of one of the "references"?
- */
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference)
-{
-	struct commit_list *bases;
-	int ret = 0, i;
-	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
-
-	if (parse_commit(commit))
-		return ret;
-	for (i = 0; i < nr_reference; i++) {
-		if (parse_commit(reference[i]))
-			return ret;
-		if (reference[i]->generation < min_generation)
-			min_generation = reference[i]->generation;
-	}
-
-	if (commit->generation > min_generation)
-		return ret;
-
-	bases = paint_down_to_common(commit, nr_reference, reference, commit->generation);
-	if (commit->object.flags & PARENT2)
-		ret = 1;
-	clear_commit_marks(commit, all_flags);
-	clear_commit_marks_many(nr_reference, reference, all_flags);
-	free_commit_list(bases);
-	return ret;
-}
-
-/*
- * Is "commit" an ancestor of (i.e. reachable from) the "reference"?
- */
-int in_merge_bases(struct commit *commit, struct commit *reference)
-{
-	return in_merge_bases_many(commit, 1, &reference);
-}
-
-struct commit_list *reduce_heads(struct commit_list *heads)
-{
-	struct commit_list *p;
-	struct commit_list *result = NULL, **tail = &result;
-	struct commit **array;
-	int num_head, i;
-
-	if (!heads)
-		return NULL;
-
-	/* Uniquify */
-	for (p = heads; p; p = p->next)
-		p->item->object.flags &= ~STALE;
-	for (p = heads, num_head = 0; p; p = p->next) {
-		if (p->item->object.flags & STALE)
-			continue;
-		p->item->object.flags |= STALE;
-		num_head++;
-	}
-	array = xcalloc(num_head, sizeof(*array));
-	for (p = heads, i = 0; p; p = p->next) {
-		if (p->item->object.flags & STALE) {
-			array[i++] = p->item;
-			p->item->object.flags &= ~STALE;
-		}
-	}
-	num_head = remove_redundant(array, num_head);
-	for (i = 0; i < num_head; i++)
-		tail = &commit_list_insert(array[i], tail)->next;
-	free(array);
-	return result;
-}
-
-void reduce_heads_replace(struct commit_list **heads)
-{
-	struct commit_list *result = reduce_heads(*heads);
-	free_commit_list(*heads);
-	*heads = result;
-}
-
 static const char gpg_sig_header[] = "gpgsig";
 static const int gpg_sig_header_len = sizeof(gpg_sig_header) - 1;
 
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 02/13] commit-reach: move ref_newer from remote.c
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
  2018-06-29 16:12 ` [RFC PATCH 01/13] commit-reach: move walk methods from commit.c Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 16:12 ` [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter Derrick Stolee
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/remote.c |  1 +
 commit-reach.c   | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h   |  2 ++
 http-push.c      |  1 +
 remote.c         | 48 +-------------------------------------------
 remote.h         |  1 -
 6 files changed, 57 insertions(+), 48 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 1a82d850a2..341f704ada 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -9,6 +9,7 @@
 #include "refs.h"
 #include "refspec.h"
 #include "argv-array.h"
+#include "commit-reach.h"
 
 static const char * const builtin_remote_usage[] = {
 	N_("git remote [-v | --verbose]"),
diff --git a/commit-reach.c b/commit-reach.c
index 1438393165..80cdb738f6 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,5 +1,10 @@
 #include "cache.h"
+#include "commit.h"
+#include "decorate.h"
 #include "prio-queue.h"
+#include "tree.h"
+#include "revision.h"
+#include "tag.h"
 #include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
@@ -357,3 +362,50 @@ void reduce_heads_replace(struct commit_list **heads)
 	free_commit_list(*heads);
 	*heads = result;
 }
+
+static void unmark_and_free(struct commit_list *list, unsigned int mark)
+{
+	while (list) {
+		struct commit *commit = pop_commit(&list);
+		commit->object.flags &= ~mark;
+	}
+}
+
+int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
+{
+	struct object *o;
+	struct commit *old_commit, *new_commit;
+	struct commit_list *list, *used;
+	int found = 0;
+
+	/*
+	 * Both new_commit and old_commit must be commit-ish and new_commit is descendant of
+	 * old_commit.  Otherwise we require --force.
+	 */
+	o = deref_tag(parse_object(old_oid), NULL, 0);
+	if (!o || o->type != OBJ_COMMIT)
+		return 0;
+	old_commit = (struct commit *) o;
+
+	o = deref_tag(parse_object(new_oid), NULL, 0);
+	if (!o || o->type != OBJ_COMMIT)
+		return 0;
+	new_commit = (struct commit *) o;
+
+	if (parse_commit(new_commit) < 0)
+		return 0;
+
+	used = list = NULL;
+	commit_list_insert(new_commit, &list);
+	while (list) {
+		new_commit = pop_most_recent_commit(&list, TMP_MARK);
+		commit_list_insert(new_commit, &used);
+		if (new_commit == old_commit) {
+			found = 1;
+			break;
+		}
+	}
+	unmark_and_free(list, TMP_MARK);
+	unmark_and_free(used, TMP_MARK);
+	return found;
+}
diff --git a/commit-reach.h b/commit-reach.h
index 244f48c5f2..35ec9f0ddb 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -38,4 +38,6 @@ struct commit_list *reduce_heads(struct commit_list *heads);
  */
 void reduce_heads_replace(struct commit_list **heads);
 
+int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
+
 #endif
diff --git a/http-push.c b/http-push.c
index 7e38522098..218315b00b 100644
--- a/http-push.c
+++ b/http-push.c
@@ -13,6 +13,7 @@
 #include "argv-array.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "commit-reach.h"
 
 #ifdef EXPAT_NEEDS_XMLPARSE_H
 #include <xmlparse.h>
diff --git a/remote.c b/remote.c
index abe80c1397..dcb5a33fac 100644
--- a/remote.c
+++ b/remote.c
@@ -11,6 +11,7 @@
 #include "string-list.h"
 #include "mergesort.h"
 #include "argv-array.h"
+#include "commit-reach.h"
 
 enum map_direction { FROM_SRC, FROM_DST };
 
@@ -1781,53 +1782,6 @@ int resolve_remote_symref(struct ref *ref, struct ref *list)
 	return 1;
 }
 
-static void unmark_and_free(struct commit_list *list, unsigned int mark)
-{
-	while (list) {
-		struct commit *commit = pop_commit(&list);
-		commit->object.flags &= ~mark;
-	}
-}
-
-int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
-{
-	struct object *o;
-	struct commit *old_commit, *new_commit;
-	struct commit_list *list, *used;
-	int found = 0;
-
-	/*
-	 * Both new_commit and old_commit must be commit-ish and new_commit is descendant of
-	 * old_commit.  Otherwise we require --force.
-	 */
-	o = deref_tag(parse_object(old_oid), NULL, 0);
-	if (!o || o->type != OBJ_COMMIT)
-		return 0;
-	old_commit = (struct commit *) o;
-
-	o = deref_tag(parse_object(new_oid), NULL, 0);
-	if (!o || o->type != OBJ_COMMIT)
-		return 0;
-	new_commit = (struct commit *) o;
-
-	if (parse_commit(new_commit) < 0)
-		return 0;
-
-	used = list = NULL;
-	commit_list_insert(new_commit, &list);
-	while (list) {
-		new_commit = pop_most_recent_commit(&list, TMP_MARK);
-		commit_list_insert(new_commit, &used);
-		if (new_commit == old_commit) {
-			found = 1;
-			break;
-		}
-	}
-	unmark_and_free(list, TMP_MARK);
-	unmark_and_free(used, TMP_MARK);
-	return found;
-}
-
 /*
  * Lookup the upstream branch for the given branch and if present, optionally
  * compute the commit ahead/behind values for the pair.
diff --git a/remote.h b/remote.h
index 45ecc6cefa..56fb9cbb27 100644
--- a/remote.h
+++ b/remote.h
@@ -149,7 +149,6 @@ extern struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 				    const struct string_list *server_options);
 
 int resolve_remote_symref(struct ref *ref, struct ref *list);
-int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
 
 /*
  * Remove and free all but the first of any entries in the input list
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
  2018-06-29 16:12 ` [RFC PATCH 01/13] commit-reach: move walk methods from commit.c Derrick Stolee
  2018-06-29 16:12 ` [RFC PATCH 02/13] commit-reach: move ref_newer from remote.c Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 21:38   ` Stefan Beller
  2018-06-29 22:00   ` Junio C Hamano
  2018-06-29 16:12 ` [RFC PATCH 04/13] upload-pack: make reachable() more generic Derrick Stolee
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 119 +++++++++++++++++++++++++++++++++++++++
 commit-reach.h |  44 +++++----------
 fast-import.c  |   1 +
 ref-filter.c   | 147 +++----------------------------------------------
 4 files changed, 141 insertions(+), 170 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 80cdb738f6..6cfd7379ce 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -409,3 +409,122 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
 	unmark_and_free(used, TMP_MARK);
 	return found;
 }
+
+/*
+ * Mimicking the real stack, this stack lives on the heap, avoiding stack
+ * overflows.
+ *
+ * At each recursion step, the stack items points to the commits whose
+ * ancestors are to be inspected.
+ */
+struct contains_stack {
+	int nr, alloc;
+	struct contains_stack_entry {
+		struct commit *commit;
+		struct commit_list *parents;
+	} *contains_stack;
+};
+
+static int in_commit_list(const struct commit_list *want, struct commit *c)
+{
+	for (; want; want = want->next)
+		if (!oidcmp(&want->item->object.oid, &c->object.oid))
+			return 1;
+	return 0;
+}
+
+/*
+ * Test whether the candidate is contained in the list.
+ * Do not recurse to find out, though, but return -1 if inconclusive.
+ */
+static enum contains_result contains_test(struct commit *candidate,
+					  const struct commit_list *want,
+					  struct contains_cache *cache,
+					  uint32_t cutoff)
+{
+	enum contains_result *cached = contains_cache_at(cache, candidate);
+
+	/* If we already have the answer cached, return that. */
+	if (*cached)
+		return *cached;
+
+	/* or are we it? */
+	if (in_commit_list(want, candidate)) {
+		*cached = CONTAINS_YES;
+		return CONTAINS_YES;
+	}
+
+	/* Otherwise, we don't know; prepare to recurse */
+	parse_commit_or_die(candidate);
+
+	if (candidate->generation < cutoff)
+		return CONTAINS_NO;
+
+	return CONTAINS_UNKNOWN;
+}
+
+static void push_to_contains_stack(struct commit *candidate, struct contains_stack *contains_stack)
+{
+	ALLOC_GROW(contains_stack->contains_stack, contains_stack->nr + 1, contains_stack->alloc);
+	contains_stack->contains_stack[contains_stack->nr].commit = candidate;
+	contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
+}
+
+static enum contains_result contains_tag_algo(struct commit *candidate,
+					      const struct commit_list *want,
+					      struct contains_cache *cache)
+{
+	struct contains_stack contains_stack = { 0, 0, NULL };
+	enum contains_result result;
+	uint32_t cutoff = GENERATION_NUMBER_INFINITY;
+	const struct commit_list *p;
+
+	for (p = want; p; p = p->next) {
+		struct commit *c = p->item;
+		load_commit_graph_info(c);
+		if (c->generation < cutoff)
+			cutoff = c->generation;
+	}
+
+	result = contains_test(candidate, want, cache, cutoff);
+	if (result != CONTAINS_UNKNOWN)
+		return result;
+
+	push_to_contains_stack(candidate, &contains_stack);
+	while (contains_stack.nr) {
+		struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
+		struct commit *commit = entry->commit;
+		struct commit_list *parents = entry->parents;
+
+		if (!parents) {
+			*contains_cache_at(cache, commit) = CONTAINS_NO;
+			contains_stack.nr--;
+		}
+		/*
+		 * If we just popped the stack, parents->item has been marked,
+		 * therefore contains_test will return a meaningful yes/no.
+		 */
+		else switch (contains_test(parents->item, want, cache, cutoff)) {
+		case CONTAINS_YES:
+			*contains_cache_at(cache, commit) = CONTAINS_YES;
+			contains_stack.nr--;
+			break;
+		case CONTAINS_NO:
+			entry->parents = parents->next;
+			break;
+		case CONTAINS_UNKNOWN:
+			push_to_contains_stack(parents->item, &contains_stack);
+			break;
+		}
+	}
+	free(contains_stack.contains_stack);
+	return contains_test(candidate, want, cache, cutoff);
+}
+
+int commit_contains(struct ref_filter *filter, struct commit *commit,
+		    struct commit_list *list, struct contains_cache *cache)
+{
+	if (filter->with_commit_tag_algo)
+		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
+	return is_descendant_of(commit, list);
+}
diff --git a/commit-reach.h b/commit-reach.h
index 35ec9f0ddb..986fb388d5 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -2,42 +2,24 @@
 #define __COMMIT_REACH_H__
 
 #include "commit.h"
+#include "commit-slab.h"
+#include "ref-filter.h"
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-					 int n,
-					 struct commit **twos);
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-					       int n,
-					       struct commit **twos);
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
-struct commit_list *get_octopus_merge_bases(struct commit_list *in);
-
-/* To be used only when object flags after this call no longer matter */
-struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos);
-
-int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference);
-int in_merge_bases(struct commit *commit, struct commit *reference);
-
+int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
 
 /*
- * Takes a list of commits and returns a new list where those
- * have been removed that can be reached from other commits in
- * the list. It is useful for, e.g., reducing the commits
- * randomly thrown at the git-merge command and removing
- * redundant commits that the user shouldn't have given to it.
- *
- * This function destroys the STALE bit of the commit objects'
- * flags.
+ * Unknown has to be "0" here, because that's the default value for
+ * contains_cache slab entries that have not yet been assigned.
  */
-struct commit_list *reduce_heads(struct commit_list *heads);
+enum contains_result {
+	CONTAINS_UNKNOWN = 0,
+	CONTAINS_NO,
+	CONTAINS_YES
+};
 
-/*
- * Like `reduce_heads()`, except it replaces the list. Use this
- * instead of `foo = reduce_heads(foo);` to avoid memory leaks.
- */
-void reduce_heads_replace(struct commit_list **heads);
+define_commit_slab(contains_cache, enum contains_result);
 
-int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
+int commit_contains(struct ref_filter *filter, struct commit *commit,
+		    struct commit_list *list, struct contains_cache *cache);
 
 #endif
diff --git a/fast-import.c b/fast-import.c
index 4d55910ab9..49ce8e8426 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -171,6 +171,7 @@ Format of STDIN stream:
 #include "packfile.h"
 #include "object-store.h"
 #include "mem-pool.h"
+#include "commit-reach.h"
 
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
diff --git a/ref-filter.c b/ref-filter.c
index fa3685d91f..f4f71728ae 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -16,7 +16,7 @@
 #include "trailer.h"
 #include "wt-status.h"
 #include "commit-slab.h"
-#include "commit-graph.h"
+#include "commit-reach.h"
 
 static struct ref_msg {
 	const char *gone;
@@ -1620,144 +1620,6 @@ static int get_ref_atom_value(struct ref_array_item *ref, int atom,
 	return 0;
 }
 
-/*
- * Unknown has to be "0" here, because that's the default value for
- * contains_cache slab entries that have not yet been assigned.
- */
-enum contains_result {
-	CONTAINS_UNKNOWN = 0,
-	CONTAINS_NO,
-	CONTAINS_YES
-};
-
-define_commit_slab(contains_cache, enum contains_result);
-
-struct ref_filter_cbdata {
-	struct ref_array *array;
-	struct ref_filter *filter;
-	struct contains_cache contains_cache;
-	struct contains_cache no_contains_cache;
-};
-
-/*
- * Mimicking the real stack, this stack lives on the heap, avoiding stack
- * overflows.
- *
- * At each recursion step, the stack items points to the commits whose
- * ancestors are to be inspected.
- */
-struct contains_stack {
-	int nr, alloc;
-	struct contains_stack_entry {
-		struct commit *commit;
-		struct commit_list *parents;
-	} *contains_stack;
-};
-
-static int in_commit_list(const struct commit_list *want, struct commit *c)
-{
-	for (; want; want = want->next)
-		if (!oidcmp(&want->item->object.oid, &c->object.oid))
-			return 1;
-	return 0;
-}
-
-/*
- * Test whether the candidate is contained in the list.
- * Do not recurse to find out, though, but return -1 if inconclusive.
- */
-static enum contains_result contains_test(struct commit *candidate,
-					  const struct commit_list *want,
-					  struct contains_cache *cache,
-					  uint32_t cutoff)
-{
-	enum contains_result *cached = contains_cache_at(cache, candidate);
-
-	/* If we already have the answer cached, return that. */
-	if (*cached)
-		return *cached;
-
-	/* or are we it? */
-	if (in_commit_list(want, candidate)) {
-		*cached = CONTAINS_YES;
-		return CONTAINS_YES;
-	}
-
-	/* Otherwise, we don't know; prepare to recurse */
-	parse_commit_or_die(candidate);
-
-	if (candidate->generation < cutoff)
-		return CONTAINS_NO;
-
-	return CONTAINS_UNKNOWN;
-}
-
-static void push_to_contains_stack(struct commit *candidate, struct contains_stack *contains_stack)
-{
-	ALLOC_GROW(contains_stack->contains_stack, contains_stack->nr + 1, contains_stack->alloc);
-	contains_stack->contains_stack[contains_stack->nr].commit = candidate;
-	contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
-}
-
-static enum contains_result contains_tag_algo(struct commit *candidate,
-					      const struct commit_list *want,
-					      struct contains_cache *cache)
-{
-	struct contains_stack contains_stack = { 0, 0, NULL };
-	enum contains_result result;
-	uint32_t cutoff = GENERATION_NUMBER_INFINITY;
-	const struct commit_list *p;
-
-	for (p = want; p; p = p->next) {
-		struct commit *c = p->item;
-		load_commit_graph_info(c);
-		if (c->generation < cutoff)
-			cutoff = c->generation;
-	}
-
-	result = contains_test(candidate, want, cache, cutoff);
-	if (result != CONTAINS_UNKNOWN)
-		return result;
-
-	push_to_contains_stack(candidate, &contains_stack);
-	while (contains_stack.nr) {
-		struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
-		struct commit *commit = entry->commit;
-		struct commit_list *parents = entry->parents;
-
-		if (!parents) {
-			*contains_cache_at(cache, commit) = CONTAINS_NO;
-			contains_stack.nr--;
-		}
-		/*
-		 * If we just popped the stack, parents->item has been marked,
-		 * therefore contains_test will return a meaningful yes/no.
-		 */
-		else switch (contains_test(parents->item, want, cache, cutoff)) {
-		case CONTAINS_YES:
-			*contains_cache_at(cache, commit) = CONTAINS_YES;
-			contains_stack.nr--;
-			break;
-		case CONTAINS_NO:
-			entry->parents = parents->next;
-			break;
-		case CONTAINS_UNKNOWN:
-			push_to_contains_stack(parents->item, &contains_stack);
-			break;
-		}
-	}
-	free(contains_stack.contains_stack);
-	return contains_test(candidate, want, cache, cutoff);
-}
-
-static int commit_contains(struct ref_filter *filter, struct commit *commit,
-			   struct commit_list *list, struct contains_cache *cache)
-{
-	if (filter->with_commit_tag_algo)
-		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
-	return is_descendant_of(commit, list);
-}
-
 /*
  * Return 1 if the refname matches one of the patterns, otherwise 0.
  * A pattern can be a literal prefix (e.g. a refname "refs/heads/master"
@@ -1984,6 +1846,13 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
 	return ref_kind_from_refname(refname);
 }
 
+struct ref_filter_cbdata {
+       struct ref_array *array;
+       struct ref_filter *filter;
+       struct contains_cache contains_cache;
+       struct contains_cache no_contains_cache;
+};
+
 /*
  * A call-back given to for_each_ref().  Filter refs and keep them for
  * later object processing.
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 04/13] upload-pack: make reachable() more generic
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (2 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 22:05   ` Junio C Hamano
  2018-06-29 16:12 ` [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up() Derrick Stolee
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

In anticipation of moving the reachable() method to commit-reach.c,
modify the prototype to be more generic to flags known outside of
upload-pack.c. Also rename 'want' to 'from' to make the statement
more clear outside of the context of haves/wants negotiation.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 upload-pack.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 87c6722ea5..95c56dc027 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -28,7 +28,6 @@
 #define OUR_REF		(1u << 12)
 #define WANTED		(1u << 13)
 #define COMMON_KNOWN	(1u << 14)
-#define REACHABLE	(1u << 15)
 
 #define SHALLOW		(1u << 16)
 #define NOT_SHALLOW	(1u << 17)
@@ -334,36 +333,36 @@ static int got_oid(const char *hex, struct object_id *oid)
 	return 0;
 }
 
-static int reachable(struct commit *want)
+static int reachable(struct commit *from, int with_flag, int assign_flag)
 {
 	struct prio_queue work = { compare_commits_by_commit_date };
 
-	prio_queue_put(&work, want);
+	prio_queue_put(&work, from);
 	while (work.nr) {
 		struct commit_list *list;
 		struct commit *commit = prio_queue_get(&work);
 
-		if (commit->object.flags & THEY_HAVE) {
-			want->object.flags |= COMMON_KNOWN;
+		if (commit->object.flags & with_flag) {
+			from->object.flags |= assign_flag;
 			break;
 		}
 		if (!commit->object.parsed)
 			parse_object(&commit->object.oid);
-		if (commit->object.flags & REACHABLE)
+		if (commit->object.flags & TMP_MARK)
 			continue;
-		commit->object.flags |= REACHABLE;
+		commit->object.flags |= TMP_MARK;
 		if (commit->date < oldest_have)
 			continue;
 		for (list = commit->parents; list; list = list->next) {
 			struct commit *parent = list->item;
-			if (!(parent->object.flags & REACHABLE))
+			if (!(parent->object.flags & TMP_MARK))
 				prio_queue_put(&work, parent);
 		}
 	}
-	want->object.flags |= REACHABLE;
-	clear_commit_marks(want, REACHABLE);
+	from->object.flags |= TMP_MARK;
+	clear_commit_marks(from, TMP_MARK);
 	clear_prio_queue(&work);
-	return (want->object.flags & COMMON_KNOWN);
+	return (from->object.flags & assign_flag);
 }
 
 static int ok_to_give_up(void)
@@ -388,7 +387,7 @@ static int ok_to_give_up(void)
 			want_obj.objects[i].item->flags |= COMMON_KNOWN;
 			continue;
 		}
-		if (!reachable((struct commit *)want))
+		if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN))
 			return 0;
 	}
 	return 1;
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up()
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (3 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 04/13] upload-pack: make reachable() more generic Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 21:44   ` Stefan Beller
  2018-06-29 16:12 ` [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag() Derrick Stolee
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

In anticipation of consolidating all commit reachability algorithms,
refactor ok_to_give_up() in order to allow splitting its logic into
an external method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 upload-pack.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 95c56dc027..2f09cadbc0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -365,34 +365,40 @@ static int reachable(struct commit *from, int with_flag, int assign_flag)
 	return (from->object.flags & assign_flag);
 }
 
-static int ok_to_give_up(void)
+static int can_all_from_reach_with_flag(struct object_array from,
+					int with_flag, int assign_flag)
 {
 	int i;
 
-	if (!have_obj.nr)
-		return 0;
-
-	for (i = 0; i < want_obj.nr; i++) {
-		struct object *want = want_obj.objects[i].item;
+	for (i = 0; i < from.nr; i++) {
+		struct object *from_one = from.objects[i].item;
 
-		if (want->flags & COMMON_KNOWN)
+		if (from_one->flags & assign_flag)
 			continue;
-		want = deref_tag(want, "a want line", 0);
-		if (!want || want->type != OBJ_COMMIT) {
+		from_one = deref_tag(from_one, "a from object", 0);
+		if (!from_one || from_one->type != OBJ_COMMIT) {
 			/* no way to tell if this is reachable by
 			 * looking at the ancestry chain alone, so
 			 * leave a note to ourselves not to worry about
 			 * this object anymore.
 			 */
-			want_obj.objects[i].item->flags |= COMMON_KNOWN;
+			from.objects[i].item->flags |= assign_flag;
 			continue;
 		}
-		if (!reachable((struct commit *)want, THEY_HAVE, COMMON_KNOWN))
+		if (!reachable((struct commit *)from_one, with_flag, assign_flag))
 			return 0;
 	}
 	return 1;
 }
 
+static int ok_to_give_up(void)
+{
+	if (!have_obj.nr)
+		return 0;
+
+	return can_all_from_reach_with_flag(want_obj, THEY_HAVE, COMMON_KNOWN);
+}
+
 static int get_common_commits(void)
 {
 	struct object_id oid;
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (4 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up() Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 21:47   ` Stefan Beller
  2018-06-29 16:12 ` [RFC PATCH 07/13] test-reach Derrick Stolee
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h |  8 +++++++
 upload-pack.c  | 62 +++-----------------------------------------------
 3 files changed, 72 insertions(+), 59 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 6cfd7379ce..44c09669ec 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "decorate.h"
 #include "prio-queue.h"
 #include "tree.h"
@@ -528,3 +529,63 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
 		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
 	return is_descendant_of(commit, list);
 }
+
+static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date)
+{
+	struct prio_queue work = { compare_commits_by_commit_date };
+
+	prio_queue_put(&work, from);
+	while (work.nr) {
+		struct commit_list *list;
+		struct commit *commit = prio_queue_get(&work);
+
+		if (commit->object.flags & with_flag) {
+			from->object.flags |= assign_flag;
+			break;
+		}
+		if (!commit->object.parsed)
+			parse_object(&commit->object.oid);
+		if (commit->object.flags & TMP_MARK)
+			continue;
+		commit->object.flags |= TMP_MARK;
+		if (commit->date < min_commit_date)
+			continue;
+		for (list = commit->parents; list; list = list->next) {
+			struct commit *parent = list->item;
+			if (!(parent->object.flags & TMP_MARK))
+				prio_queue_put(&work, parent);
+		}
+	}
+	from->object.flags |= TMP_MARK;
+	clear_commit_marks(from, TMP_MARK);
+	clear_prio_queue(&work);
+	return (from->object.flags & assign_flag);
+}
+
+int can_all_from_reach_with_flag(struct object_array from,
+				 int with_flag, int assign_flag,
+				 time_t min_commit_date)
+{
+	int i;
+
+	for (i = 0; i < from.nr; i++) {
+		struct object *from_one = from.objects[i].item;
+
+		if (from_one->flags & assign_flag)
+			continue;
+		from_one = deref_tag(from_one, "a from object", 0);
+		if (!from_one || from_one->type != OBJ_COMMIT) {
+			/* no way to tell if this is reachable by
+			 * looking at the ancestry chain alone, so
+			 * leave a note to ourselves not to worry about
+			 * this object anymore.
+			 */
+			from.objects[i].item->flags |= assign_flag;
+			continue;
+		}
+		if (!reachable((struct commit *)from_one, with_flag,
+			       assign_flag, min_commit_date))
+			return 0;
+	}
+	return 1;
+}
diff --git a/commit-reach.h b/commit-reach.h
index 986fb388d5..c3da8488eb 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -22,4 +22,12 @@ define_commit_slab(contains_cache, enum contains_result);
 int commit_contains(struct ref_filter *filter, struct commit *commit,
 		    struct commit_list *list, struct contains_cache *cache);
 
+/*
+ * Can every commit or tag in the 'from' array reach a commit that has
+ * the 'with_flag' bit on? Mark the commits in 'from' that can reach
+ * such commits with 'assign_flag'.
+ */
+int can_all_from_reach_with_flag(struct object_array from, int with_flag,
+				 int assign_flag, time_t min_commit_date);
+
 #endif
diff --git a/upload-pack.c b/upload-pack.c
index 2f09cadbc0..7c58cb8f5e 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -22,6 +22,7 @@
 #include "quote.h"
 #include "upload-pack.h"
 #include "serve.h"
+#include "commit-reach.h"
 
 /* Remember to update object flag allocation in object.h */
 #define THEY_HAVE	(1u << 11)
@@ -333,70 +334,13 @@ static int got_oid(const char *hex, struct object_id *oid)
 	return 0;
 }
 
-static int reachable(struct commit *from, int with_flag, int assign_flag)
-{
-	struct prio_queue work = { compare_commits_by_commit_date };
-
-	prio_queue_put(&work, from);
-	while (work.nr) {
-		struct commit_list *list;
-		struct commit *commit = prio_queue_get(&work);
-
-		if (commit->object.flags & with_flag) {
-			from->object.flags |= assign_flag;
-			break;
-		}
-		if (!commit->object.parsed)
-			parse_object(&commit->object.oid);
-		if (commit->object.flags & TMP_MARK)
-			continue;
-		commit->object.flags |= TMP_MARK;
-		if (commit->date < oldest_have)
-			continue;
-		for (list = commit->parents; list; list = list->next) {
-			struct commit *parent = list->item;
-			if (!(parent->object.flags & TMP_MARK))
-				prio_queue_put(&work, parent);
-		}
-	}
-	from->object.flags |= TMP_MARK;
-	clear_commit_marks(from, TMP_MARK);
-	clear_prio_queue(&work);
-	return (from->object.flags & assign_flag);
-}
-
-static int can_all_from_reach_with_flag(struct object_array from,
-					int with_flag, int assign_flag)
-{
-	int i;
-
-	for (i = 0; i < from.nr; i++) {
-		struct object *from_one = from.objects[i].item;
-
-		if (from_one->flags & assign_flag)
-			continue;
-		from_one = deref_tag(from_one, "a from object", 0);
-		if (!from_one || from_one->type != OBJ_COMMIT) {
-			/* no way to tell if this is reachable by
-			 * looking at the ancestry chain alone, so
-			 * leave a note to ourselves not to worry about
-			 * this object anymore.
-			 */
-			from.objects[i].item->flags |= assign_flag;
-			continue;
-		}
-		if (!reachable((struct commit *)from_one, with_flag, assign_flag))
-			return 0;
-	}
-	return 1;
-}
-
 static int ok_to_give_up(void)
 {
 	if (!have_obj.nr)
 		return 0;
 
-	return can_all_from_reach_with_flag(want_obj, THEY_HAVE, COMMON_KNOWN);
+	return can_all_from_reach_with_flag(want_obj, THEY_HAVE,
+					    COMMON_KNOWN, oldest_have);
 }
 
 static int get_common_commits(void)
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 07/13] test-reach
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (5 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag() Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 21:54   ` Stefan Beller
  2018-06-29 16:12 ` [RFC PATCH 08/13] test-reach: test reduce_heads() Derrick Stolee
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

---
 Makefile              |   1 +
 t/helper/test-reach.c | 123 ++++++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c  |   1 +
 t/helper/test-tool.h  |   1 +
 t/t6600-test-reach.sh |  81 ++++++++++++++++++++++++++++
 5 files changed, 207 insertions(+)
 create mode 100644 t/helper/test-reach.c
 create mode 100755 t/t6600-test-reach.sh

diff --git a/Makefile b/Makefile
index 89ad873ce0..c2a2c6457e 100644
--- a/Makefile
+++ b/Makefile
@@ -716,6 +716,7 @@ TEST_BUILTINS_OBJS += test-mktemp.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
+TEST_BUILTINS_OBJS += test-reach.o
 TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
new file mode 100644
index 0000000000..d57660af45
--- /dev/null
+++ b/t/helper/test-reach.c
@@ -0,0 +1,123 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "commit-reach.h"
+#include "config.h"
+#include "parse-options.h"
+#include "tag.h"
+
+int cmd__reach(int ac, const char **av)
+{
+	struct object_id oid_A, oid_B;
+	struct commit *A, *B;
+	struct commit **X, **Y;
+	int nr_X, alloc_X, nr_Y, alloc_Y;
+	struct commit_list *list_X, *list_Y;
+	struct strbuf buf = STRBUF_INIT;
+
+	setup_git_directory();
+	get_git_config(default_git_config, 0);
+
+	if (argc < 2)
+		exit(1);
+
+	/* load input data */
+	A = B = NULL;
+	list_X = list_Y = NULL;
+	nr_X = nr_Y = 0;
+	alloc_X = alloc_Y = 16;
+	ALLOC_ARRAY(X, alloc_X);
+	ALLOC_ARRAY(Y, alloc_Y);
+
+	while (strbuf_getline(&buf, stdin) != EOF) {
+		struct object_id oid;
+		struct object *o;
+		struct commit *c;
+		if (buf.len < 3)
+			continue;
+
+		if (get_oid_committish(buf.buf + 2, &oid))
+			die("failed to resolve %s", buf.buf + 2);
+
+		o = parse_object(&oid);
+		o = deref_tag_noverify(o);
+
+		if (!o)
+			die("failed to load commit for input %s resulting in oid %s\n",
+			    buf.buf, oid_to_hex(&oid));
+
+		c = object_as_type(o, OBJ_COMMIT, 0);
+
+		if (!c)
+			die("failed to load commit for input %s resulting in oid %s\n",
+			    buf.buf, oid_to_hex(&oid));
+
+		switch (buf.buf[0]) {
+			case 'A':
+				oidcpy(&oid_A, &oid);
+				A = c;
+				break;
+
+			case 'B':
+				oidcpy(&oid_B, &oid);
+				B = c;
+				break;
+
+			case 'X':
+				ALLOC_GROW(X, nr_X + 1, alloc_X);
+				X[nr_X++] = c;
+				commit_list_insert(c, &list_X);
+				break;
+
+			case 'Y':
+				ALLOC_GROW(Y, nr_Y + 1, alloc_Y);
+				Y[nr_Y++] = c;
+				commit_list_insert(c, &list_Y);
+				break;
+
+			default:
+				die("unexpected start of line: %c", buf.buf[0]);
+		}
+	}
+	strbuf_release(&buf);
+
+	if (ac > 2 && !strcmp(av[2], "graph:off"))
+		core_commit_graph = 0;
+	if (ac > 2 && !strcmp(av[2], "graph:on"))
+		core_commit_graph = 1;
+
+	if (!strcmp(av[1], "ref_newer"))
+		printf("%s:%d\n", av[1], ref_newer(&oid_A, &oid_B));
+	else if (!strcmp(av[1], "in_merge_base"))
+		printf("%s:%d\n", av[1], in_merge_bases(A, B));
+	else if (!strcmp(av[1], "get_merge_bases_many")) {
+		struct commit_list *list = get_merge_bases_many(A, nr_X, X);
+		printf("%s(A,X):\n", av[1]);
+		while (list) {
+			printf("%s\n", oid_to_hex(&list->item->object.oid));
+			list = list->next;
+		}
+
+		list = get_merge_bases_many(B, nr_Y, Y);
+		printf("%s(B,Y):\n", av[1]);
+		while (list) {
+			printf("%s\n", oid_to_hex(&list->item->object.oid));
+			list = list->next;
+		}
+	} else if (!strcmp(av[1], "reduce_heads")) {
+		struct commit_list *list = reduce_heads(list_X);
+		printf("%s(X):\n", av[1]);
+		while (list) {
+			printf("%s\n", oid_to_hex(&list->item->object.oid));
+			list = list->next;
+		}
+
+		list = reduce_heads(list_Y);
+		printf("%s(Y):\n", av[1]);
+		while (list) {
+			printf("%s\n", oid_to_hex(&list->item->object.oid));
+			list = list->next;
+		}
+	}
+
+	exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 805a45de9c..64d250b602 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -26,6 +26,7 @@ static struct test_cmd cmds[] = {
 	{ "online-cpus", cmd__online_cpus },
 	{ "path-utils", cmd__path_utils },
 	{ "prio-queue", cmd__prio_queue },
+	{ "reach", cmd__reach },
 	{ "read-cache", cmd__read_cache },
 	{ "ref-store", cmd__ref_store },
 	{ "regex", cmd__regex },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 7116ddfb94..0b1d3534b5 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -20,6 +20,7 @@ int cmd__mktemp(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
 int cmd__path_utils(int argc, const char **argv);
 int cmd__prio_queue(int argc, const char **argv);
+int cmd__reach(int argc, const char **argv);
 int cmd__read_cache(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
new file mode 100755
index 0000000000..c9337b6b46
--- /dev/null
+++ b/t/t6600-test-reach.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+test_description='basic commit reachability tests'
+
+. ./test-lib.sh
+
+# Construct a grid-like commit graph with points (x,y)
+# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has
+# parents (x-1, y) and (x, y-1), keeping in mind that
+# we drop a parent if a coordinate is nonpositive.
+#
+#             (10,10)
+#            /       \
+#         (10,9)    (9,10)
+#        /     \   /      \
+#    (10,8)    (9,9)      (8,10)
+#   /     \    /   \      /    \
+#         ( continued...)
+#   \     /    \   /      \    /
+#    (3,1)     (2,2)      (1,3)
+#        \     /    \     /
+#         (2,1)      (2,1)
+#              \    /
+#              (1,1)
+#
+# We use branch 'comit-x-y' to refer to (x,y).
+# This grid allows interesting reachability and
+# non-reachability queries: (x,y) can reach (x',y')
+# if and only if x' <= x and y' <= y.
+test_expect_success 'setup' '
+	for i in $(test_seq 1 10)
+	do
+		test_commit "1-$i" &&
+		git branch -f commit-1-$i
+	done &&
+	for j in $(test_seq 1 9)
+	do
+		git reset --hard commit-$j-1 &&
+		x=$(($j + 1)) &&
+		test_commit "$x-1" &&
+		git branch -f commit-$x-1 &&
+
+		for i in $(test_seq 2 10)
+		do
+			git merge commit-$j-$i -m "$x-$i" &&
+			git branch -f commit-$x-$i
+		done
+	done &&
+	git commit-graph write --reachable
+'
+
+test_reach_two_modes() {
+	test-tool reach $1 graph:off <input >output &&
+	test_cmp output expect &&
+	test-tool reach $1 graph:on <input >output &&
+	test_cmp output expect
+}
+
+test_expect_success 'ref_newer:miss' '
+	cat >input <<- EOF &&
+	A:commit-5-7
+	B:commit-4-9
+	EOF
+	cat >expect <<- EOF &&
+	ref_newer:0
+	EOF
+	test_reach_two_modes "ref_newer"
+'
+
+test_expect_success 'ref_newer:miss' '
+	cat >input <<- EOF &&
+	A:commit-5-7
+	B:commit-2-3
+	EOF
+	cat >expect <<- EOF &&
+	ref_newer:1
+	EOF
+	test_reach_two_modes "ref_newer"
+'
+
+test_done
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 08/13] test-reach: test reduce_heads()
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (6 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 07/13] test-reach Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 22:06   ` Stefan Beller
  2018-06-29 16:12 ` [RFC PATCH 09/13] commit-reach: test can_all_from_reach Derrick Stolee
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t6600-test-reach.sh | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index c9337b6b46..0f60db9c60 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -78,4 +78,30 @@ test_expect_success 'ref_newer:miss' '
 	test_reach_two_modes "ref_newer"
 '
 
+test_expect_success 'reduce_heads' '
+	cat >input <<- EOF &&
+	X:commit-1-10
+	X:commit-2-8
+	X:commit-3-6
+	X:commit-4-4
+	X:commit-1-7
+	X:commit-2-5
+	X:commit-3-3
+	X:commit-5-1
+	Y:commit-10-10
+	Y:commit-3-10
+	Y:commit-9-9
+	Y:commit-8-1
+	EOF
+	printf "reduce_heads(X):\n" >expect &&
+	git rev-parse commit-5-1 >>expect &&
+	git rev-parse commit-4-4 >>expect &&
+	git rev-parse commit-3-6 >>expect &&
+	git rev-parse commit-2-8 >>expect &&
+	git rev-parse commit-1-10 >>expect &&
+	printf "reduce_heads(Y):\n" >>expect &&
+	git rev-parse commit-10-10 >>expect &&
+	test_reach_two_modes "reduce_heads"
+'
+
 test_done
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 09/13] commit-reach: test can_all_from_reach
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (7 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 08/13] test-reach: test reduce_heads() Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 16:12 ` [RFC PATCH 10/13] commit-reach: test is_descendant_of Derrick Stolee
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c        | 52 +++++++++++++++++++++++++++++++++++++++----
 commit-reach.h        |  4 +++-
 t/helper/test-reach.c |  7 ++++--
 t/t6600-test-reach.sh | 51 +++++++++++++++++++++++++++++++++++++++---
 upload-pack.c         |  2 +-
 5 files changed, 105 insertions(+), 11 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 44c09669ec..992ad5cdc7 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -562,14 +562,14 @@ static int reachable(struct commit *from, int with_flag, int assign_flag, time_t
 	return (from->object.flags & assign_flag);
 }
 
-int can_all_from_reach_with_flag(struct object_array from,
+int can_all_from_reach_with_flag(struct object_array *from,
 				 int with_flag, int assign_flag,
 				 time_t min_commit_date)
 {
 	int i;
 
-	for (i = 0; i < from.nr; i++) {
-		struct object *from_one = from.objects[i].item;
+	for (i = 0; i < from->nr; i++) {
+		struct object *from_one = from->objects[i].item;
 
 		if (from_one->flags & assign_flag)
 			continue;
@@ -580,7 +580,7 @@ int can_all_from_reach_with_flag(struct object_array from,
 			 * leave a note to ourselves not to worry about
 			 * this object anymore.
 			 */
-			from.objects[i].item->flags |= assign_flag;
+			from->objects[i].item->flags |= assign_flag;
 			continue;
 		}
 		if (!reachable((struct commit *)from_one, with_flag,
@@ -589,3 +589,47 @@ int can_all_from_reach_with_flag(struct object_array from,
 	}
 	return 1;
 }
+
+int can_all_from_reach(struct commit_list *from, struct commit_list *to)
+{
+	struct object_array from_objs = OBJECT_ARRAY_INIT;
+	time_t min_commit_date = from->item->date;
+	struct commit_list *from_iter = from;
+	struct commit_list *to_iter = to;
+	int result;
+
+	while (from_iter) {
+		add_object_array(&from_iter->item->object, NULL, &from_objs);
+
+		if (from_iter->item->date < min_commit_date)
+			min_commit_date = from_iter->item->date;
+
+		from_iter = from_iter->next;
+	}
+
+	while (to_iter) {
+		if (to_iter->item->date < min_commit_date)
+			min_commit_date = to_iter->item->date;
+
+		to_iter->item->object.flags |= PARENT2;
+
+		to_iter = to_iter->next;
+	}
+
+	result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1,
+					      min_commit_date);
+
+	while (from) {
+		clear_commit_marks(from->item, PARENT1);
+		from = from->next;
+	}
+
+	while (to) {
+		clear_commit_marks(to->item, PARENT2);
+		to = to->next;
+	}
+
+	object_array_clear(&from_objs);
+
+	return result;
+}
diff --git a/commit-reach.h b/commit-reach.h
index c3da8488eb..8ab06af2eb 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -27,7 +27,9 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
  * the 'with_flag' bit on? Mark the commits in 'from' that can reach
  * such commits with 'assign_flag'.
  */
-int can_all_from_reach_with_flag(struct object_array from, int with_flag,
+int can_all_from_reach_with_flag(struct object_array *from, int with_flag,
 				 int assign_flag, time_t min_commit_date);
 
+int can_all_from_reach(struct commit_list *from, struct commit_list *to);
+
 #endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index d57660af45..88639a2945 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -15,9 +15,9 @@ int cmd__reach(int ac, const char **av)
 	struct strbuf buf = STRBUF_INIT;
 
 	setup_git_directory();
-	get_git_config(default_git_config, 0);
+	git_config(git_default_config, NULL);
 
-	if (argc < 2)
+	if (ac < 2)
 		exit(1);
 
 	/* load input data */
@@ -117,6 +117,9 @@ int cmd__reach(int ac, const char **av)
 			printf("%s\n", oid_to_hex(&list->item->object.oid));
 			list = list->next;
 		}
+	} else if (!strcmp(av[1], "can_all_from_reach")) {
+		int result = can_all_from_reach(list_X, list_Y);
+		printf("%s(X,Y):%d\n", av[1], result);
 	}
 
 	exit(0);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 0f60db9c60..ef25e70174 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -64,7 +64,7 @@ test_expect_success 'ref_newer:miss' '
 	cat >expect <<- EOF &&
 	ref_newer:0
 	EOF
-	test_reach_two_modes "ref_newer"
+	test_reach_two_modes ref_newer
 '
 
 test_expect_success 'ref_newer:miss' '
@@ -75,7 +75,7 @@ test_expect_success 'ref_newer:miss' '
 	cat >expect <<- EOF &&
 	ref_newer:1
 	EOF
-	test_reach_two_modes "ref_newer"
+	test_reach_two_modes ref_newer
 '
 
 test_expect_success 'reduce_heads' '
@@ -101,7 +101,52 @@ test_expect_success 'reduce_heads' '
 	git rev-parse commit-1-10 >>expect &&
 	printf "reduce_heads(Y):\n" >>expect &&
 	git rev-parse commit-10-10 >>expect &&
-	test_reach_two_modes "reduce_heads"
+	test_reach_two_modes reduce_heads
+'
+
+test_expect_success 'can_all_from_reach:hit' '
+	cat >input <<- EOF &&
+	X:commit-2-10
+	X:commit-3-9
+	X:commit-4-8
+	X:commit-5-7
+	X:commit-6-6
+	X:commit-7-5
+	X:commit-8-4
+	X:commit-9-3
+	Y:commit-1-9
+	Y:commit-2-8
+	Y:commit-3-7
+	Y:commit-4-6
+	Y:commit-5-5
+	Y:commit-6-4
+	Y:commit-7-3
+	Y:commit-8-1
+	EOF
+	printf "can_all_from_reach(X,Y):1\n" >expect &&
+	test_reach_two_modes can_all_from_reach
+'
+
+test_expect_success 'can_all_from_reach:miss' '
+	cat >input <<- EOF &&
+	X:commit-2-10
+	X:commit-3-9
+	X:commit-4-8
+	X:commit-5-7
+	X:commit-6-6
+	X:commit-7-5
+	X:commit-8-4
+	X:commit-9-3
+	Y:commit-1-9
+	Y:commit-2-8
+	Y:commit-3-7
+	Y:commit-4-6
+	Y:commit-5-5
+	Y:commit-6-4
+	Y:commit-8-5
+	EOF
+	printf "can_all_from_reach(X,Y):0\n" >expect &&
+	test_reach_two_modes can_all_from_reach
 '
 
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index 7c58cb8f5e..e59fbca257 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -339,7 +339,7 @@ static int ok_to_give_up(void)
 	if (!have_obj.nr)
 		return 0;
 
-	return can_all_from_reach_with_flag(want_obj, THEY_HAVE,
+	return can_all_from_reach_with_flag(&want_obj, THEY_HAVE,
 					    COMMON_KNOWN, oldest_have);
 }
 
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 10/13] commit-reach: test is_descendant_of
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (8 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 09/13] commit-reach: test can_all_from_reach Derrick Stolee
@ 2018-06-29 16:12 ` Derrick Stolee
  2018-06-29 16:13 ` [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear Derrick Stolee
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:12 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/helper/test-reach.c |  2 ++
 t/t6600-test-reach.sh | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 88639a2945..14aaef5bff 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -89,6 +89,8 @@ int cmd__reach(int ac, const char **av)
 		printf("%s:%d\n", av[1], ref_newer(&oid_A, &oid_B));
 	else if (!strcmp(av[1], "in_merge_base"))
 		printf("%s:%d\n", av[1], in_merge_bases(A, B));
+	else if (!strcmp(av[1], "is_descendant_of"))
+		printf("%s(A,X):%d\n", av[1], is_descendant_of(A, list_X));
 	else if (!strcmp(av[1], "get_merge_bases_many")) {
 		struct commit_list *list = get_merge_bases_many(A, nr_X, X);
 		printf("%s(A,X):\n", av[1]);
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index ef25e70174..62655ae650 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -78,6 +78,31 @@ test_expect_success 'ref_newer:miss' '
 	test_reach_two_modes ref_newer
 '
 
+test_expect_success 'is_descendant_of:hit' '
+	cat >input <<- EOF &&
+	A:commit-5-7
+	X:commit-4-8
+	X:commit-6-6
+	X:commit-1-1
+	EOF
+	cat >expect <<- EOF &&
+	is_descendant_of(A,X):1
+	EOF
+	test_reach_two_modes is_descendant_of
+'
+
+test_expect_success 'is_descendant_of:miss' '
+	cat >input <<- EOF &&
+	A:commit-5-7
+	X:commit-4-8
+	X:commit-6-6
+	EOF
+	cat >expect <<- EOF &&
+	is_descendant_of(A,X):0
+	EOF
+	test_reach_two_modes is_descendant_of
+'
+
 test_expect_success 'reduce_heads' '
 	cat >input <<- EOF &&
 	X:commit-1-10
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (9 preceding siblings ...)
  2018-06-29 16:12 ` [RFC PATCH 10/13] commit-reach: test is_descendant_of Derrick Stolee
@ 2018-06-29 16:13 ` Derrick Stolee
  2018-06-29 23:18   ` Stefan Beller
  2018-06-29 16:13 ` [RFC PATCH 12/13] commit-reach: use is_descendant_of for ref_newer Derrick Stolee
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:13 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

The can_all_from_reach_with_flags() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of
this method: fetch negotitation (haves/wants).

The first step includes using a depth-first-search (DFS) from each from
commit, sorted by ascending generation number. We do not walk beyond the
minimum generation number or the minimum commit date. This DFS is likely
to be faster than the existing reachable() method because we expect
previous ref values to be along the first-parent history.

If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other from commits. We
also mark the visited commits using 'assign_flag' to prevent re-walking
the same code.

We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case
----------

Before: 1.45 s
 After: 0.34 s

Large Case
----------

Before: 5.83 s
 After: 0.37 s

Note how the time increases between the two cases in the two versions.
The new code increases relative to the number of commits that need to be
walked, but not directly relative to the number of 'from' commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 122 ++++++++++++++++++++++++++++++-------------------
 commit-reach.h |   3 +-
 upload-pack.c  |   5 +-
 3 files changed, 82 insertions(+), 48 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 992ad5cdc7..8e24455d9f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
 	return is_descendant_of(commit, list);
 }
 
-static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date)
+static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-	struct prio_queue work = { compare_commits_by_commit_date };
+	const struct commit *a = (const struct commit *)_a;
+	const struct commit *b = (const struct commit *)_b;
 
-	prio_queue_put(&work, from);
-	while (work.nr) {
-		struct commit_list *list;
-		struct commit *commit = prio_queue_get(&work);
-
-		if (commit->object.flags & with_flag) {
-			from->object.flags |= assign_flag;
-			break;
-		}
-		if (!commit->object.parsed)
-			parse_object(&commit->object.oid);
-		if (commit->object.flags & TMP_MARK)
-			continue;
-		commit->object.flags |= TMP_MARK;
-		if (commit->date < min_commit_date)
-			continue;
-		for (list = commit->parents; list; list = list->next) {
-			struct commit *parent = list->item;
-			if (!(parent->object.flags & TMP_MARK))
-				prio_queue_put(&work, parent);
-		}
-	}
-	from->object.flags |= TMP_MARK;
-	clear_commit_marks(from, TMP_MARK);
-	clear_prio_queue(&work);
-	return (from->object.flags & assign_flag);
+	if (a->generation < b->generation)
+		return -1;
+	if (a->generation > b->generation)
+		return 1;
+	return 0;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
 				 int with_flag, int assign_flag,
-				 time_t min_commit_date)
+				 time_t min_commit_date,
+				 uint32_t min_generation)
 {
+	struct commit **list = NULL;
 	int i;
+	int result = 1;
 
+	ALLOC_ARRAY(list, from->nr);
 	for (i = 0; i < from->nr; i++) {
-		struct object *from_one = from->objects[i].item;
+		list[i] = (struct commit *)from->objects[i].item;
 
-		if (from_one->flags & assign_flag)
-			continue;
-		from_one = deref_tag(from_one, "a from object", 0);
-		if (!from_one || from_one->type != OBJ_COMMIT) {
-			/* no way to tell if this is reachable by
-			 * looking at the ancestry chain alone, so
-			 * leave a note to ourselves not to worry about
-			 * this object anymore.
-			 */
-			from->objects[i].item->flags |= assign_flag;
-			continue;
-		}
-		if (!reachable((struct commit *)from_one, with_flag,
-			       assign_flag, min_commit_date))
+		parse_commit(list[i]);
+
+		if (list[i]->generation < min_generation)
 			return 0;
 	}
-	return 1;
+
+	QSORT(list, from->nr, compare_commits_by_gen);
+
+	for (i = 0; i < from->nr; i++) {
+		/* DFS from list[i] */
+		struct commit_list *stack = NULL;
+
+		list[i]->object.flags |= assign_flag;
+		commit_list_insert(list[i], &stack);
+
+		while (stack) {
+			struct commit_list *parent;
+
+			if (stack->item->object.flags & with_flag) {
+				pop_commit(&stack);
+				continue;
+			}
+
+			for (parent = stack->item->parents; parent; parent = parent->next) {
+				if (parent->item->object.flags & (with_flag | RESULT))
+					stack->item->object.flags |= RESULT;
+
+				if (!(parent->item->object.flags & assign_flag)) {
+					parent->item->object.flags |= assign_flag;
+
+					parse_commit(parent->item);
+
+					if (parent->item->date < min_commit_date ||
+					    parent->item->generation < min_generation)
+						continue;
+
+					commit_list_insert(parent->item, &stack);
+					break;
+				}
+			}
+
+			if (!parent)
+				pop_commit(&stack);
+		}
+
+		if (!(list[i]->object.flags & (with_flag | RESULT))) {
+			result = 0;
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	for (i = 0; i < from->nr; i++) {
+		clear_commit_marks(list[i], RESULT);
+		clear_commit_marks(list[i], assign_flag);
+	}
+	return result;
 }
 
 int can_all_from_reach(struct commit_list *from, struct commit_list *to)
@@ -597,6 +621,7 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to)
 	struct commit_list *from_iter = from;
 	struct commit_list *to_iter = to;
 	int result;
+	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
 	while (from_iter) {
 		add_object_array(&from_iter->item->object, NULL, &from_objs);
@@ -608,16 +633,21 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to)
 	}
 
 	while (to_iter) {
+		parse_commit(to_iter->item);
+
 		if (to_iter->item->date < min_commit_date)
 			min_commit_date = to_iter->item->date;
 
+		if (to_iter->item->generation < min_generation)
+			min_generation = to_iter->item->generation;
+
 		to_iter->item->object.flags |= PARENT2;
 
 		to_iter = to_iter->next;
 	}
 
 	result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1,
-					      min_commit_date);
+					      min_commit_date, min_generation);
 
 	while (from) {
 		clear_commit_marks(from->item, PARENT1);
diff --git a/commit-reach.h b/commit-reach.h
index 8ab06af2eb..3eb4c057e6 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -28,7 +28,8 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
  * such commits with 'assign_flag'.
  */
 int can_all_from_reach_with_flag(struct object_array *from, int with_flag,
-				 int assign_flag, time_t min_commit_date);
+				 int assign_flag, time_t min_commit_date,
+				 uint32_t min_generation);
 
 int can_all_from_reach(struct commit_list *from, struct commit_list *to);
 
diff --git a/upload-pack.c b/upload-pack.c
index e59fbca257..95ba077e2b 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -336,11 +336,14 @@ static int got_oid(const char *hex, struct object_id *oid)
 
 static int ok_to_give_up(void)
 {
+	uint32_t min_generation = GENERATION_NUMBER_ZERO;
+
 	if (!have_obj.nr)
 		return 0;
 
 	return can_all_from_reach_with_flag(&want_obj, THEY_HAVE,
-					    COMMON_KNOWN, oldest_have);
+					    COMMON_KNOWN, oldest_have,
+					    min_generation);
 }
 
 static int get_common_commits(void)
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 12/13] commit-reach: use is_descendant_of for ref_newer
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (10 preceding siblings ...)
  2018-06-29 16:13 ` [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear Derrick Stolee
@ 2018-06-29 16:13 ` Derrick Stolee
  2018-06-29 16:13 ` [RFC PATCH 13/13] commit-reach: use can_all_from_reach Derrick Stolee
  2018-06-29 17:33 ` [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
  13 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:13 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

The ref_newer method is used by 'git push' to detect if a force-push is
requried. This method does not use any kind of cutoff when walking, so
in the case of a force-push will walk all reachable commits.

The is_descendant_of method already uses paint_down_to_common along with
cutoffs. By translating the ref_newer arguments into the commit and
commit_list required by is_descendant_of, we can have one fewer commit
walk and also improve our performance!

For a copy of the Linux repository, 'test-tool reach ref_newer' presents
the following improvements with the specified input.

Input
-----
A:v4.9
B:v3.19

Before: 0.11 s
 After: 0.10 s

To test the negative case, add a new commit with parent v3.19,
regenerate the commit-graph, and then run with B pointing at that
commit.

Before: 0.52 s
 After: 0.12 s

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 17 ++++-------------
 commit.c       |  7 +++++--
 commit.h       |  6 +++++-
 fetch-pack.c   |  3 ++-
 sha1-name.c    |  3 ++-
 walker.c       |  3 ++-
 6 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 8e24455d9f..249e9a4fac 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -376,7 +376,7 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
 {
 	struct object *o;
 	struct commit *old_commit, *new_commit;
-	struct commit_list *list, *used;
+	struct commit_list *list = NULL;
 	int found = 0;
 
 	/*
@@ -396,18 +396,9 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid)
 	if (parse_commit(new_commit) < 0)
 		return 0;
 
-	used = list = NULL;
-	commit_list_insert(new_commit, &list);
-	while (list) {
-		new_commit = pop_most_recent_commit(&list, TMP_MARK);
-		commit_list_insert(new_commit, &used);
-		if (new_commit == old_commit) {
-			found = 1;
-			break;
-		}
-	}
-	unmark_and_free(list, TMP_MARK);
-	unmark_and_free(used, TMP_MARK);
+	commit_list_insert(old_commit, &list);
+	found = is_descendant_of(new_commit, list);
+	free_commit_list(list);
 	return found;
 }
 
diff --git a/commit.c b/commit.c
index d4ddaf4879..9870682673 100644
--- a/commit.c
+++ b/commit.c
@@ -556,7 +556,8 @@ void commit_list_sort_by_date(struct commit_list **list)
 }
 
 struct commit *pop_most_recent_commit(struct commit_list **list,
-				      unsigned int mark)
+				      unsigned int mark,
+				      uint32_t min_generation)
 {
 	struct commit *ret = pop_commit(list);
 	struct commit_list *parents = ret->parents;
@@ -565,7 +566,9 @@ struct commit *pop_most_recent_commit(struct commit_list **list,
 		struct commit *commit = parents->item;
 		if (!parse_commit(commit) && !(commit->object.flags & mark)) {
 			commit->object.flags |= mark;
-			commit_list_insert_by_date(commit, list);
+
+			if (commit->generation >= min_generation)
+				commit_list_insert_by_date(commit, list);
 		}
 		parents = parents->next;
 	}
diff --git a/commit.h b/commit.h
index 7e0f273720..5eaeded5e2 100644
--- a/commit.h
+++ b/commit.h
@@ -159,9 +159,13 @@ extern const char *skip_blank_lines(const char *msg);
 
 /** Removes the first commit from a list sorted by date, and adds all
  * of its parents.
+ *
+ * The parents are not added if their generation number is strictly
+ * lower than min_generation.
  **/
 struct commit *pop_most_recent_commit(struct commit_list **list,
-				      unsigned int mark);
+				      unsigned int mark,
+				      uint32_t min_generation);
 
 struct commit *pop_commit(struct commit_list **stack);
 
diff --git a/fetch-pack.c b/fetch-pack.c
index a320ce9872..351e3d4bcd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -600,7 +600,8 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	while (complete && cutoff <= complete->item->date) {
 		print_verbose(args, _("Marking %s as complete"),
 			      oid_to_hex(&complete->item->object.oid));
-		pop_most_recent_commit(&complete, COMPLETE);
+		pop_most_recent_commit(&complete, COMPLETE,
+				       GENERATION_NUMBER_ZERO);
 	}
 }
 
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7e..471a54464d 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -1141,7 +1141,8 @@ static int get_oid_oneline(const char *prefix, struct object_id *oid,
 		struct commit *commit;
 		int matches;
 
-		commit = pop_most_recent_commit(&list, ONELINE_SEEN);
+		commit = pop_most_recent_commit(&list, ONELINE_SEEN,
+						GENERATION_NUMBER_ZERO);
 		if (!parse_object(&commit->object.oid))
 			continue;
 		buf = get_commit_buffer(commit, NULL);
diff --git a/walker.c b/walker.c
index 0b162a09b9..e243fc8768 100644
--- a/walker.c
+++ b/walker.c
@@ -78,7 +78,8 @@ static int process_commit(struct walker *walker, struct commit *commit)
 		return -1;
 
 	while (complete && complete->item->date >= commit->date) {
-		pop_most_recent_commit(&complete, COMPLETE);
+		pop_most_recent_commit(&complete, COMPLETE,
+				       GENERATION_NUMBER_ZERO);
 	}
 
 	if (commit->object.flags & COMPLETE)
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 13/13] commit-reach: use can_all_from_reach
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (11 preceding siblings ...)
  2018-06-29 16:13 ` [RFC PATCH 12/13] commit-reach: use is_descendant_of for ref_newer Derrick Stolee
@ 2018-06-29 16:13 ` Derrick Stolee
  2018-06-29 23:21   ` Stefan Beller
  2018-06-29 17:33 ` [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
  13 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 16:13 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com,
	Derrick Stolee

The is_descendant_of method previously used in_merge_bases() to check if
the commit can reach any of the commits in the provided list. This had
two performance problems:

1. The performance is quadratic in worst-case.

2. A single in_merge_bases() call requires walking beyond the target
   commit in order to find the full set of boundary commits that may be
   merge-bases.

The can_all_from_reach method avoids this quadratic behavior and can
limit the search beyond the target commits using generation numbers. It
requires a small prototype adjustment to stop using commit-date as a
cutoff, as that optimization is no longer appropriate here.

Performance was meausured on a copy of the Linux repository using the
'test-tool reach is_descendant_of' command using this input:

A:v4.9
X:v4.10
X:v4.11
X:v4.12
X:v4.13
X:v4.14
X:v4.15
X:v4.16
X:v4.17
X.v3.0

Note that this input is tailored to demonstrate the quadratic nature of
the previous method, as it will compute merge-bases for v4.9 versus all
of the later versions before checking against v4.1.

Before: 0.31 s
 After: 0.27 s

Since we previously used the is_descendant_of method in the ref_newer
method, we also measured performance there using
'test-tool reach ref_newer':

Before: 0.12 s
 After: 0.11 s

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---

One thing I know is missing from this commit is a special-case to use
the old logic when there is no commit-graph present. The
can_all_from_reach() algorithm can be worse when we do not have good
generation number cutoffs. In the previous case of
can_all_from_reach_with_flags(), we already had an established pattern
of using commit date as a cutoff, so the generation number is only a
second cutoff and the algorithm cannot walk more commits than before.

 commit-reach.c        | 34 +++++++++++++++++-----------------
 commit-reach.h        |  3 ++-
 t/helper/test-reach.c |  2 +-
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 249e9a4fac..a823d6965c 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -273,17 +273,15 @@ struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
  */
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
 {
+	struct commit_list *from_list = NULL;
+	int result;
 	if (!with_commit)
 		return 1;
-	while (with_commit) {
-		struct commit *other;
 
-		other = with_commit->item;
-		with_commit = with_commit->next;
-		if (in_merge_bases(other, commit))
-			return 1;
-	}
-	return 0;
+	commit_list_insert(commit, &from_list);
+	result = can_all_from_reach(from_list, with_commit, 0);
+	free_commit_list(from_list);
+	return result;
 }
 
 /*
@@ -605,10 +603,11 @@ int can_all_from_reach_with_flag(struct object_array *from,
 	return result;
 }
 
-int can_all_from_reach(struct commit_list *from, struct commit_list *to)
+int can_all_from_reach(struct commit_list *from, struct commit_list *to,
+		       int cutoff_by_min_date)
 {
 	struct object_array from_objs = OBJECT_ARRAY_INIT;
-	time_t min_commit_date = from->item->date;
+	time_t min_commit_date = cutoff_by_min_date ? from->item->date : 0;
 	struct commit_list *from_iter = from;
 	struct commit_list *to_iter = to;
 	int result;
@@ -617,20 +616,21 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to)
 	while (from_iter) {
 		add_object_array(&from_iter->item->object, NULL, &from_objs);
 
-		if (from_iter->item->date < min_commit_date)
+		if (!parse_commit(from_iter->item) &&
+		    from_iter->item->date < min_commit_date)
 			min_commit_date = from_iter->item->date;
 
 		from_iter = from_iter->next;
 	}
 
 	while (to_iter) {
-		parse_commit(to_iter->item);
-
-		if (to_iter->item->date < min_commit_date)
-			min_commit_date = to_iter->item->date;
+		if (!parse_commit(to_iter->item)) {
+			if (to_iter->item->date < min_commit_date)
+				min_commit_date = to_iter->item->date;
 
-		if (to_iter->item->generation < min_generation)
-			min_generation = to_iter->item->generation;
+			if (to_iter->item->generation < min_generation)
+				min_generation = to_iter->item->generation;
+		}
 
 		to_iter->item->object.flags |= PARENT2;
 
diff --git a/commit-reach.h b/commit-reach.h
index 3eb4c057e6..180f865d7d 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -31,6 +31,7 @@ int can_all_from_reach_with_flag(struct object_array *from, int with_flag,
 				 int assign_flag, time_t min_commit_date,
 				 uint32_t min_generation);
 
-int can_all_from_reach(struct commit_list *from, struct commit_list *to);
+int can_all_from_reach(struct commit_list *from, struct commit_list *to,
+		       int cutoff_by_min_date);
 
 #endif
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 14aaef5bff..c05137c9f3 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -120,7 +120,7 @@ int cmd__reach(int ac, const char **av)
 			list = list->next;
 		}
 	} else if (!strcmp(av[1], "can_all_from_reach")) {
-		int result = can_all_from_reach(list_X, list_Y);
+		int result = can_all_from_reach(list_X, list_Y, 1);
 		printf("%s(X,Y):%d\n", av[1], result);
 	}
 
-- 
2.18.0.118.gd4f65b8d14


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/13] Consolidate reachability logic
  2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
                   ` (12 preceding siblings ...)
  2018-06-29 16:13 ` [RFC PATCH 13/13] commit-reach: use can_all_from_reach Derrick Stolee
@ 2018-06-29 17:33 ` Derrick Stolee
  13 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-29 17:33 UTC (permalink / raw)
  To: Derrick Stolee, git@vger.kernel.org
  Cc: peff@peff.net, sbeller@google.com, jnareb@gmail.com

This RFC is available as a GitHub pull request [1].

Thanks,
-Stolee

[1] https://github.com/derrickstolee/git/pull/8

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 01/13] commit-reach: move walk methods from commit.c
  2018-06-29 16:12 ` [RFC PATCH 01/13] commit-reach: move walk methods from commit.c Derrick Stolee
@ 2018-06-29 21:35   ` Stefan Beller
  2018-06-29 21:52   ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 21:35 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

Hi Derrick,

> +/* Remember to update object flag allocation in object.h */
> +#define PARENT1         (1u<<16)
> +#define PARENT2         (1u<<17)
> +#define STALE           (1u<<18)
> +#define RESULT          (1u<<19)

Something is up with whitespaces here, apart from that this patch
looks good.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
  2018-06-29 16:12 ` [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter Derrick Stolee
@ 2018-06-29 21:38   ` Stefan Beller
  2018-06-30  1:32     ` Derrick Stolee
  2018-06-29 22:00   ` Junio C Hamano
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 21:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

Hi Derrick,

> +struct ref_filter_cbdata {
> +       struct ref_array *array;
> +       struct ref_filter *filter;
> +       struct contains_cache contains_cache;
> +       struct contains_cache no_contains_cache;

These members also seem to be moved whitespace-inconsistently.
Could you clarify in the commit message that you re-indent them or
what happened?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up()
  2018-06-29 16:12 ` [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up() Derrick Stolee
@ 2018-06-29 21:44   ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 21:44 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

Hi Derrick,

> -static int ok_to_give_up(void)
> +static int can_all_from_reach_with_flag(struct object_array from,

This method is hard to read; at first I thought you missed a word,
but then I realized that it asks "can all 'from' members reach
['something'] and we pass in the 'flag', so maybe

  all_reachable_flags

or something?



>                         /* no way to tell if this is reachable by

While at it, you may want to fix the comment here.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()
  2018-06-29 16:12 ` [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag() Derrick Stolee
@ 2018-06-29 21:47   ` Stefan Beller
  2018-06-30  1:35     ` Derrick Stolee
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 21:47 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

Hi Derrick,

> +static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date)
[...]
> +               if (commit->date < min_commit_date)
> +                       continue;
[...]

> +int can_all_from_reach_with_flag(struct object_array from,
> +                                int with_flag, int assign_flag,
> +                                time_t min_commit_date)

Can you expand on why we introduce the min_commit_date in the commit message?
(It looks like a rebase error as I would have expected a move only,
given the subject)

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 01/13] commit-reach: move walk methods from commit.c
  2018-06-29 16:12 ` [RFC PATCH 01/13] commit-reach: move walk methods from commit.c Derrick Stolee
  2018-06-29 21:35   ` Stefan Beller
@ 2018-06-29 21:52   ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-06-29 21:52 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	jnareb@gmail.com

Derrick Stolee <dstolee@microsoft.com> writes:

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Makefile       |   1 +
>  commit-reach.c | 359 +++++++++++++++++++++++++++++++++++++++++++++++++
>  commit-reach.h |  41 ++++++
>  commit.c       | 358 ------------------------------------------------
>  4 files changed, 401 insertions(+), 358 deletions(-)
>  create mode 100644 commit-reach.c

"blame -bs -C HEAD^.." tells us that all lines except for the
initial #include of this file came from commit.c verbatim, which is
good for a "move lines" patch like this one.

>  create mode 100644 commit-reach.h

"blame -bs -C -C HEAD^.." tells us, but "blame -bs -C HEAD^.." does
not tell us, that most of the lines are from commit.h, which is a
very bad sign.  reduce_heads(), for example, have two duplicated
decl and depending on which header file is included, the source
files will risk getting inconsistent definition once the headers
start to deviate from each other.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 07/13] test-reach
  2018-06-29 16:12 ` [RFC PATCH 07/13] test-reach Derrick Stolee
@ 2018-06-29 21:54   ` Stefan Beller
  2018-06-30  1:40     ` Derrick Stolee
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 21:54 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee <dstolee@microsoft.com> wrote:

> +# Construct a grid-like commit graph with points (x,y)
> +# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has
> +# parents (x-1, y) and (x, y-1), keeping in mind that
> +# we drop a parent if a coordinate is nonpositive.
> +#
> +#             (10,10)
> +#            /       \
> +#         (10,9)    (9,10)
> +#        /     \   /      \
> +#    (10,8)    (9,9)      (8,10)
> +#   /     \    /   \      /    \
> +#         ( continued...)
> +#   \     /    \   /      \    /
> +#    (3,1)     (2,2)      (1,3)
> +#        \     /    \     /
> +#         (2,1)      (2,1)
> +#              \    /
> +#              (1,1)
> +#
> +# We use branch 'comit-x-y' to refer to (x,y).
> +# This grid allows interesting reachability and
> +# non-reachability queries: (x,y) can reach (x',y')
> +# if and only if x' <= x and y' <= y.

This is an interesting DAG, though not very common
in reality (81 merges, 18 single parent commits,
one root with a depth of at most 10).

It reminds me of FELINE as that also has 2 independent numbers :)

I guess it is easy to make up artificial test cases for that,
but I wonder if we want more variety for a generic reachability
test tool (e.g. long chains of parallel histories, octopus,
disconnected parts of the graph)

> +test_expect_success 'setup' '
[...]
> +       git commit-graph write --reachable

Is this only to test the full commit-graph?, maybe we'd
want to write out the commit graph at (5,10) or so, such
that half the walking is tested without graph.
What about author/commit dates?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
  2018-06-29 16:12 ` [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter Derrick Stolee
  2018-06-29 21:38   ` Stefan Beller
@ 2018-06-29 22:00   ` Junio C Hamano
  1 sibling, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-06-29 22:00 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	jnareb@gmail.com

Derrick Stolee <dstolee@microsoft.com> writes:

> +int commit_contains(struct ref_filter *filter, struct commit *commit,
> +		    struct commit_list *list, struct contains_cache *cache)

This is a symbol that is used to be file-local private.  Is it named
appropriately in the new context, which is "globally visible
throughout the system"?  The convention to call into it now must be
documented a lot better (e.g. how should list/cache etc are to be
prepared?).

> +{
> +	if (filter->with_commit_tag_algo)
> +		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
> +	return is_descendant_of(commit, list);
> +}
> diff --git a/commit-reach.h b/commit-reach.h
> index 35ec9f0ddb..986fb388d5 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -2,42 +2,24 @@
>  #define __COMMIT_REACH_H__
>  
>  #include "commit.h"
> +#include "commit-slab.h"
> +#include "ref-filter.h"
>  
> -struct commit_list *get_merge_bases_many(struct commit *one,
> -					 int n,
> -					 struct commit **twos);
> -struct commit_list *get_merge_bases_many_dirty(struct commit *one,
> -					       int n,
> -					       struct commit **twos);
> -struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
> -struct commit_list *get_octopus_merge_bases(struct commit_list *in);
> -
> -/* To be used only when object flags after this call no longer matter */
> -struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos);
> -
> -int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
> -int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference);
> -int in_merge_bases(struct commit *commit, struct commit *reference);
> -
> +int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
>  
>  /*
> - * Takes a list of commits and returns a new list where those
> - * have been removed that can be reached from other commits in
> - * the list. It is useful for, e.g., reducing the commits
> - * randomly thrown at the git-merge command and removing
> - * redundant commits that the user shouldn't have given to it.
> - *
> - * This function destroys the STALE bit of the commit objects'
> - * flags.

The above removal of lines is sloppy; they are mostly duplicates in
commit.h that should never have been moved here in the first place,
no?

> + * Unknown has to be "0" here, because that's the default value for
> + * contains_cache slab entries that have not yet been assigned.
>   */
> -struct commit_list *reduce_heads(struct commit_list *heads);
> +enum contains_result {
> +	CONTAINS_UNKNOWN = 0,
> +	CONTAINS_NO,
> +	CONTAINS_YES
> +};

Are these names specific enough, or were they OK in the limited
context inside ref-filter but now are overly broad as globally
visible names?  I suspect it might be the latter.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 04/13] upload-pack: make reachable() more generic
  2018-06-29 16:12 ` [RFC PATCH 04/13] upload-pack: make reachable() more generic Derrick Stolee
@ 2018-06-29 22:05   ` Junio C Hamano
  0 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2018-06-29 22:05 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	jnareb@gmail.com

Derrick Stolee <dstolee@microsoft.com> writes:

> In anticipation of moving the reachable() method to commit-reach.c,
> modify the prototype to be more generic to flags known outside of
> upload-pack.c. Also rename 'want' to 'from' to make the statement
> more clear outside of the context of haves/wants negotiation.

FWIW, I find the order of things done quite sensible.  Rename,
extend, etc., to prepare before moving and then move, not the other
way around.  You would want to do the same for some symbols named
overly broadly you moved in earlier steps, too.

Also, if you will be making this function a global one in a later
step, now is the time to give it a good name suitable in the new
global context before moving in this preparatory step.  If it will
be file-scope static in its new home, then reachable() may still be
a good enough name.  Let's keep reading...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 08/13] test-reach: test reduce_heads()
  2018-06-29 16:12 ` [RFC PATCH 08/13] test-reach: test reduce_heads() Derrick Stolee
@ 2018-06-29 22:06   ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 22:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee <dstolee@microsoft.com> wrote:
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t6600-test-reach.sh | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
> index c9337b6b46..0f60db9c60 100755
> --- a/t/t6600-test-reach.sh
> +++ b/t/t6600-test-reach.sh
> @@ -78,4 +78,30 @@ test_expect_success 'ref_newer:miss' '
>         test_reach_two_modes "ref_newer"
>  '
>
> +test_expect_success 'reduce_heads' '
> +       cat >input <<- EOF &&
> +       X:commit-1-10
> +       X:commit-2-8
> +       X:commit-3-6
> +       X:commit-4-4
> +       X:commit-1-7
> +       X:commit-2-5
> +       X:commit-3-3
> +       X:commit-5-1
> +       Y:commit-10-10
> +       Y:commit-3-10
> +       Y:commit-9-9
> +       Y:commit-8-1
> +       EOF
> +       printf "reduce_heads(X):\n" >expect &&
> +       git rev-parse commit-5-1 >>expect &&

See 6ac767e5c00 (t6036, t6042: prefer test_cmp to sequences of test, 2018-05-24)
on how to avoid some forks here:

    git rev-parse >expect \
        many HEADs or Tips &&

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear
  2018-06-29 16:13 ` [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear Derrick Stolee
@ 2018-06-29 23:18   ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 23:18 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee <dstolee@microsoft.com> wrote:
>
> The can_all_from_reach_with_flags() algorithm is currently quadratic in
> the worst case, because it calls the reachable() method for every 'from'
> without tracking which commits have already been walked or which can
> already reach a commit in 'to'.
>
> Rewrite the algorithm to walk each commit a constant number of times.
>
> We also add some optimizations that should work for the main consumer of
> this method: fetch negotitation (haves/wants).
>
> The first step includes using a depth-first-search (DFS) from each from
> commit, sorted by ascending generation number. We do not walk beyond the
> minimum generation number or the minimum commit date. This DFS is likely
> to be faster than the existing reachable() method because we expect
> previous ref values to be along the first-parent history.
>
> If we find a target commit, then we mark everything in the DFS stack as
> a RESULT. This expands the set of targets for the other from commits. We
> also mark the visited commits using 'assign_flag' to prevent re-walking
> the same code.
>
> We still need to clear our flags at the end, which is why we will have a
> total of three visits to each commit.
>
> Performance was measured on the Linux repository using
> 'test-tool reach can_all_from_reach'. The input included rows seeded by
> tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
> v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
> v3 releases and want all major v4 releases." The "large" case included
> X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
> tags to the set, which does not greatly increase the number of objects
> that are considered, but does increase the number of 'from' commits,
> demonstrating the quadratic nature of the previous code.
>
> Small Case
> ----------
>
> Before: 1.45 s
>  After: 0.34 s
>
> Large Case
> ----------
>
> Before: 5.83 s
>  After: 0.37 s
>
> Note how the time increases between the two cases in the two versions.
> The new code increases relative to the number of commits that need to be
> walked, but not directly relative to the number of 'from' commits.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-reach.c | 122 ++++++++++++++++++++++++++++++-------------------
>  commit-reach.h |   3 +-
>  upload-pack.c  |   5 +-
>  3 files changed, 82 insertions(+), 48 deletions(-)
>
> diff --git a/commit-reach.c b/commit-reach.c
> index 992ad5cdc7..8e24455d9f 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
>         return is_descendant_of(commit, list);
>  }
>
> -static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date)
> +static int compare_commits_by_gen(const void *_a, const void *_b)
>  {
> -       struct prio_queue work = { compare_commits_by_commit_date };
> +       const struct commit *a = (const struct commit *)_a;
> +       const struct commit *b = (const struct commit *)_b;
>
> -       prio_queue_put(&work, from);
> -       while (work.nr) {
> -               struct commit_list *list;
> -               struct commit *commit = prio_queue_get(&work);
> -
> -               if (commit->object.flags & with_flag) {
> -                       from->object.flags |= assign_flag;
> -                       break;
> -               }
> -               if (!commit->object.parsed)
> -                       parse_object(&commit->object.oid);
> -               if (commit->object.flags & TMP_MARK)
> -                       continue;
> -               commit->object.flags |= TMP_MARK;
> -               if (commit->date < min_commit_date)
> -                       continue;
> -               for (list = commit->parents; list; list = list->next) {
> -                       struct commit *parent = list->item;
> -                       if (!(parent->object.flags & TMP_MARK))
> -                               prio_queue_put(&work, parent);
> -               }
> -       }
> -       from->object.flags |= TMP_MARK;
> -       clear_commit_marks(from, TMP_MARK);
> -       clear_prio_queue(&work);
> -       return (from->object.flags & assign_flag);
> +       if (a->generation < b->generation)
> +               return -1;
> +       if (a->generation > b->generation)
> +               return 1;
> +       return 0;
>  }
>
>  int can_all_from_reach_with_flag(struct object_array *from,
>                                  int with_flag, int assign_flag,
> -                                time_t min_commit_date)
> +                                time_t min_commit_date,
> +                                uint32_t min_generation)
>  {
> +       struct commit **list = NULL;
>         int i;
> +       int result = 1;
>
> +       ALLOC_ARRAY(list, from->nr);
>         for (i = 0; i < from->nr; i++) {
> -               struct object *from_one = from->objects[i].item;
> +               list[i] = (struct commit *)from->objects[i].item;
>
> -               if (from_one->flags & assign_flag)
> -                       continue;
> -               from_one = deref_tag(from_one, "a from object", 0);
> -               if (!from_one || from_one->type != OBJ_COMMIT) {
> -                       /* no way to tell if this is reachable by
> -                        * looking at the ancestry chain alone, so
> -                        * leave a note to ourselves not to worry about
> -                        * this object anymore.
> -                        */
> -                       from->objects[i].item->flags |= assign_flag;
> -                       continue;
> -               }
> -               if (!reachable((struct commit *)from_one, with_flag,
> -                              assign_flag, min_commit_date))
> +               parse_commit(list[i]);
> +
> +               if (list[i]->generation < min_generation)
>                         return 0;
>         }
> -       return 1;
> +
> +       QSORT(list, from->nr, compare_commits_by_gen);
> +
> +       for (i = 0; i < from->nr; i++) {
> +               /* DFS from list[i] */
> +               struct commit_list *stack = NULL;
> +
> +               list[i]->object.flags |= assign_flag;
> +               commit_list_insert(list[i], &stack);
> +
> +               while (stack) {
> +                       struct commit_list *parent;
> +
> +                       if (stack->item->object.flags & with_flag) {
> +                               pop_commit(&stack);
> +                               continue;
> +                       }
> +
> +                       for (parent = stack->item->parents; parent; parent = parent->next) {
> +                               if (parent->item->object.flags & (with_flag | RESULT))
> +                                       stack->item->object.flags |= RESULT;
> +
> +                               if (!(parent->item->object.flags & assign_flag)) {
> +                                       parent->item->object.flags |= assign_flag;
> +
> +                                       parse_commit(parent->item);

We can use parse_commit here as it would print a warning and keep going?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 13/13] commit-reach: use can_all_from_reach
  2018-06-29 16:13 ` [RFC PATCH 13/13] commit-reach: use can_all_from_reach Derrick Stolee
@ 2018-06-29 23:21   ` Stefan Beller
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Beller @ 2018-06-29 23:21 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

Hi Derrick,
On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee <dstolee@microsoft.com> wrote:
>
> The is_descendant_of method previously used in_merge_bases() to check if
> the commit can reach any of the commits in the provided list. This had
> two performance problems:
>
> 1. The performance is quadratic in worst-case.
>
> 2. A single in_merge_bases() call requires walking beyond the target
>    commit in order to find the full set of boundary commits that may be
>    merge-bases.
>
> The can_all_from_reach method avoids this quadratic behavior and can
> limit the search beyond the target commits using generation numbers. It
> requires a small prototype adjustment to stop using commit-date as a
> cutoff, as that optimization is no longer appropriate here.
>
> Performance was meausured on a copy of the Linux repository using the
> 'test-tool reach is_descendant_of' command using this input:
>
> A:v4.9
> X:v4.10
> X:v4.11
> X:v4.12
> X:v4.13
> X:v4.14
> X:v4.15
> X:v4.16
> X:v4.17
> X.v3.0
>
> Note that this input is tailored to demonstrate the quadratic nature of
> the previous method, as it will compute merge-bases for v4.9 versus all
> of the later versions before checking against v4.1.
>
> Before: 0.31 s
>  After: 0.27 s
>
> Since we previously used the is_descendant_of method in the ref_newer
> method, we also measured performance there using
> 'test-tool reach ref_newer':
>
> Before: 0.12 s
>  After: 0.11 s
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>
> One thing I know is missing from this commit is a special-case to use
> the old logic when there is no commit-graph present. The
> can_all_from_reach() algorithm can be worse when we do not have good
> generation number cutoffs. In the previous case of
> can_all_from_reach_with_flags(), we already had an established pattern
> of using commit date as a cutoff, so the generation number is only a
> second cutoff and the algorithm cannot walk more commits than before.

I like this series,
Thanks for writing it!
Stefan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter
  2018-06-29 21:38   ` Stefan Beller
@ 2018-06-30  1:32     ` Derrick Stolee
  0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-30  1:32 UTC (permalink / raw)
  To: Stefan Beller, Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

On 6/29/2018 5:38 PM, Stefan Beller wrote:
> Hi Derrick,
>
>> +struct ref_filter_cbdata {
>> +       struct ref_array *array;
>> +       struct ref_filter *filter;
>> +       struct contains_cache contains_cache;
>> +       struct contains_cache no_contains_cache;
> These members also seem to be moved whitespace-inconsistently.
> Could you clarify in the commit message that you re-indent them or
> what happened?

Sorry, there appears to be something strange that happens when I paste 
into vim that causes my whitespace to mess up, especially leading 
whitespace. Usually `git rebase --whitespace=fix` fixes it.

I'll be more careful in my next replay of the series.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag()
  2018-06-29 21:47   ` Stefan Beller
@ 2018-06-30  1:35     ` Derrick Stolee
  0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-30  1:35 UTC (permalink / raw)
  To: Stefan Beller, Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

On 6/29/2018 5:47 PM, Stefan Beller wrote:
> Hi Derrick,
>
>> +static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date)
> [...]
>> +               if (commit->date < min_commit_date)
>> +                       continue;
> [...]
>
>> +int can_all_from_reach_with_flag(struct object_array from,
>> +                                int with_flag, int assign_flag,
>> +                                time_t min_commit_date)
> Can you expand on why we introduce the min_commit_date in the commit message?
> (It looks like a rebase error as I would have expected a move only,
> given the subject)

This is a case where this should have been two steps:

1. Remove dependence on the global 'oldest_have' in upload-pack.c by 
creating the min_commit_date parameter.

2. Move the code from upload-pack.c to commit-reach.c.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 07/13] test-reach
  2018-06-29 21:54   ` Stefan Beller
@ 2018-06-30  1:40     ` Derrick Stolee
  0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2018-06-30  1:40 UTC (permalink / raw)
  To: Stefan Beller, Derrick Stolee; +Cc: git, Jeff King, Jakub Narębski

On 6/29/2018 5:54 PM, Stefan Beller wrote:
> On Fri, Jun 29, 2018 at 9:13 AM Derrick Stolee <dstolee@microsoft.com> wrote:
>
>> +# Construct a grid-like commit graph with points (x,y)
>> +# with 1 <= x <= 10, 1 <= y <= 10, where (x,y) has
>> +# parents (x-1, y) and (x, y-1), keeping in mind that
>> +# we drop a parent if a coordinate is nonpositive.
>> +#
>> +#             (10,10)
>> +#            /       \
>> +#         (10,9)    (9,10)
>> +#        /     \   /      \
>> +#    (10,8)    (9,9)      (8,10)
>> +#   /     \    /   \      /    \
>> +#         ( continued...)
>> +#   \     /    \   /      \    /
>> +#    (3,1)     (2,2)      (1,3)
>> +#        \     /    \     /
>> +#         (2,1)      (2,1)
>> +#              \    /
>> +#              (1,1)
>> +#
>> +# We use branch 'comit-x-y' to refer to (x,y).
>> +# This grid allows interesting reachability and
>> +# non-reachability queries: (x,y) can reach (x',y')
>> +# if and only if x' <= x and y' <= y.
> This is an interesting DAG, though not very common
> in reality (81 merges, 18 single parent commits,
> one root with a depth of at most 10).
>
> It reminds me of FELINE as that also has 2 independent numbers :)

The design of this graph is exactly so you (the test writer) can know if 
two commits can reach exactly by whether the two-dimensional keys are 
comparable. It's wide enough that we can come up with interesting test 
cases for some of these walks, especially the have/wants negotiation.

>
> I guess it is easy to make up artificial test cases for that,
> but I wonder if we want more variety for a generic reachability
> test tool (e.g. long chains of parallel histories, octopus,
> disconnected parts of the graph)
>
>> +test_expect_success 'setup' '
> [...]
>> +       git commit-graph write --reachable
> Is this only to test the full commit-graph?, maybe we'd
> want to write out the commit graph at (5,10) or so, such
> that half the walking is tested without graph.
> What about author/commit dates?

There are a lot of ways we can get strange edge cases for commit-graph 
walks, especially when we start being clever. This test case is only a 
start to feeling like we have good coverage, but having a non-trivial 
walk include both outputs (yes/no) for each method is a good start.

I think after we get a basic coverage for these methods on this example, 
we can add examples as necessary when we find tricky code paths that 
need coverage.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-06-30  1:40 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-29 16:12 [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee
2018-06-29 16:12 ` [RFC PATCH 01/13] commit-reach: move walk methods from commit.c Derrick Stolee
2018-06-29 21:35   ` Stefan Beller
2018-06-29 21:52   ` Junio C Hamano
2018-06-29 16:12 ` [RFC PATCH 02/13] commit-reach: move ref_newer from remote.c Derrick Stolee
2018-06-29 16:12 ` [RFC PATCH 03/13] commit-reach: move commit_contains from ref-filter Derrick Stolee
2018-06-29 21:38   ` Stefan Beller
2018-06-30  1:32     ` Derrick Stolee
2018-06-29 22:00   ` Junio C Hamano
2018-06-29 16:12 ` [RFC PATCH 04/13] upload-pack: make reachable() more generic Derrick Stolee
2018-06-29 22:05   ` Junio C Hamano
2018-06-29 16:12 ` [RFC PATCH 05/13] upload-pack: refactor ok_to_give_up() Derrick Stolee
2018-06-29 21:44   ` Stefan Beller
2018-06-29 16:12 ` [RFC PATCH 06/13] commit-reach: move can_all_from_reach_with_flag() Derrick Stolee
2018-06-29 21:47   ` Stefan Beller
2018-06-30  1:35     ` Derrick Stolee
2018-06-29 16:12 ` [RFC PATCH 07/13] test-reach Derrick Stolee
2018-06-29 21:54   ` Stefan Beller
2018-06-30  1:40     ` Derrick Stolee
2018-06-29 16:12 ` [RFC PATCH 08/13] test-reach: test reduce_heads() Derrick Stolee
2018-06-29 22:06   ` Stefan Beller
2018-06-29 16:12 ` [RFC PATCH 09/13] commit-reach: test can_all_from_reach Derrick Stolee
2018-06-29 16:12 ` [RFC PATCH 10/13] commit-reach: test is_descendant_of Derrick Stolee
2018-06-29 16:13 ` [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear Derrick Stolee
2018-06-29 23:18   ` Stefan Beller
2018-06-29 16:13 ` [RFC PATCH 12/13] commit-reach: use is_descendant_of for ref_newer Derrick Stolee
2018-06-29 16:13 ` [RFC PATCH 13/13] commit-reach: use can_all_from_reach Derrick Stolee
2018-06-29 23:21   ` Stefan Beller
2018-06-29 17:33 ` [RFC PATCH 00/13] Consolidate reachability logic Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).