git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] a few mark_parents_uninteresting cleanups
@ 2018-05-11 18:00 Jeff King
  2018-05-11 18:00 ` [PATCH 1/4] mark_tree_contents_uninteresting(): drop missing object check Jeff King
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Jeff King @ 2018-05-11 18:00 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano

This is a follow-up to the discussion from February:

  https://public-inbox.org/git/1519522496-73090-1-git-send-email-dstolee@microsoft.com/

There I theorized that some of these extra has_object_file() checks were
unnecessary. After poking around a bit, I've convinced myself that this
is the case, so here are some patches.

After Stolee's fix in ebbed3ba04 (revision.c: reduce object database
queries, 2018-02-24), I doubt this will provide any noticeable speedup.
IMHO the value is just in simplifying the logic.

The first two patches are the actual has_object_file simplifications.
The second two are an attempt to fix some head-scratching I had while
reading mark_parents_uninteresting(). I hope the result is easier to
follow, but I may have just shuffled one confusion for another. :)

  [1/4]: mark_tree_contents_uninteresting(): drop missing object check
  [2/4]: mark_parents_uninteresting(): drop missing object check
  [3/4]: mark_parents_uninteresting(): replace list with stack
  [4/4]: mark_parents_uninteresting(): avoid most allocation

 revision.c | 90 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 50 insertions(+), 40 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] mark_tree_contents_uninteresting(): drop missing object check
  2018-05-11 18:00 [PATCH 0/4] a few mark_parents_uninteresting cleanups Jeff King
@ 2018-05-11 18:00 ` Jeff King
  2018-05-11 18:01 ` [PATCH 2/4] mark_parents_uninteresting(): " Jeff King
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2018-05-11 18:00 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano

It's generally acceptable for UNINTERESTING objects in a
traversal to be unavailable (e.g., see aeeae1b771). When
marking trees UNINTERESTING, we access the object database
twice: once to check if the object is missing (and return
quietly if it is), and then again to actually parse it.

We can instead just try to parse; if that fails, we can then
return quietly. That halves the effort we spend on locating
the object.

Note that this isn't _exactly_ the same as the original
behavior, as the parse failure could be due to other
problems than a missing object: it could be corrupted, in
which case the original code would have died. But the new
behavior is arguably better, as it covers the object being
unavailable for any reason. We'll also still issue a warning
to stderr in such a case.

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index 1cff11833e..ef70f69f08 100644
--- a/revision.c
+++ b/revision.c
@@ -52,12 +52,9 @@ static void mark_tree_contents_uninteresting(struct tree *tree)
 {
 	struct tree_desc desc;
 	struct name_entry entry;
-	struct object *obj = &tree->object;
 
-	if (!has_object_file(&obj->oid))
+	if (parse_tree_gently(tree, 1) < 0)
 		return;
-	if (parse_tree(tree) < 0)
-		die("bad tree %s", oid_to_hex(&obj->oid));
 
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
-- 
2.17.0.988.gec4b43b3e5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] mark_parents_uninteresting(): drop missing object check
  2018-05-11 18:00 [PATCH 0/4] a few mark_parents_uninteresting cleanups Jeff King
  2018-05-11 18:00 ` [PATCH 1/4] mark_tree_contents_uninteresting(): drop missing object check Jeff King
@ 2018-05-11 18:01 ` Jeff King
  2018-05-13  2:23   ` Junio C Hamano
  2018-05-11 18:02 ` [PATCH 3/4] mark_parents_uninteresting(): replace list with stack Jeff King
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2018-05-11 18:01 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano

We allow UNINTERESTING objects in a traversal to be
unavailable. As part of this, mark_parents_uninteresting()
checks whether we have a particular uninteresting parent; if
not, we will mark it as "parsed" so that later code skips
it.

This code is redundant and even a little bit harmful, so
let's drop it.

It's redundant because when our parse_object() call in
add_parents_to_list() fails, we already quietly skip
UNINTERESTING parents. This redundancy is a historical
artifact. The mark_parents_uninteresting() protection is
from 454fbbcde3 (git-rev-list: allow missing objects when
the parent is marked UNINTERESTING, 2005-07-10). Much later,
aeeae1b771 (revision traversal: allow UNINTERESTING objects
to be missing, 2009-01-27) covered more cases by making the
actual parse more gentle.

  As an aside, even if this weren't redundant, it would be
  insufficient. The gentle parsing handles both missing and
  corrupted objects, whereas the has_object_file() check
  we're getting rid of covers only missing ones.

And the code we're dropping is harmful for two reasons:

  1. We spend extra time on the object lookup, even though
     we don't actually need the information at this point
     (and will just repeat that lookup later when we parse
     for the common case that we _do_ have the object).

  2. It "lies" about the commit by setting the parsed flag,
     even though we didn't load any useful data into the
     struct. This shouldn't matter for the UNINTERESTING
     case, but we may later clear our flags and do another
     traversal in the same process. While pretty unlikely,
     it's possible that we could then look at the same
     commit without the UNINTERESTING flag, in which case
     we'd produce the wrong result (we'd think it's a commit
     with no parents, when in fact we should probably die
     due to the missing object).

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/revision.c b/revision.c
index ef70f69f08..cee4f3a4b4 100644
--- a/revision.c
+++ b/revision.c
@@ -103,18 +103,6 @@ void mark_parents_uninteresting(struct commit *commit)
 		struct commit *commit = pop_commit(&parents);
 
 		while (commit) {
-			/*
-			 * A missing commit is ok iff its parent is marked
-			 * uninteresting.
-			 *
-			 * We just mark such a thing parsed, so that when
-			 * it is popped next time around, we won't be trying
-			 * to parse it and get an error.
-			 */
-			if (!commit->object.parsed &&
-			    !has_object_file(&commit->object.oid))
-				commit->object.parsed = 1;
-
 			if (commit->object.flags & UNINTERESTING)
 				break;
 
-- 
2.17.0.988.gec4b43b3e5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] mark_parents_uninteresting(): replace list with stack
  2018-05-11 18:00 [PATCH 0/4] a few mark_parents_uninteresting cleanups Jeff King
  2018-05-11 18:00 ` [PATCH 1/4] mark_tree_contents_uninteresting(): drop missing object check Jeff King
  2018-05-11 18:01 ` [PATCH 2/4] mark_parents_uninteresting(): " Jeff King
@ 2018-05-11 18:02 ` Jeff King
  2018-05-11 18:03 ` [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation Jeff King
  2018-05-14 12:50 ` [PATCH 0/4] a few mark_parents_uninteresting cleanups Derrick Stolee
  4 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2018-05-11 18:02 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano

The mark_parents_uninteresting() function uses two loops:
an outer one to process our queue of pending parents, and an
inner one to process first-parent chains. This is a clever
optimization from 941ba8db57 (Eliminate recursion in
setting/clearing marks in commit list, 2012-01-14) to limit
the number of linked-list allocations when following
single-parent chains.

Unfortunately, this makes the result a little hard to read.
Let's replace the list with a stack. Then we don't have to
worry about doing this double-loop optimization, as we'll
just reuse the top element of the stack as we pop/push.

Signed-off-by: Jeff King <peff@peff.net>
---
The diff makes a lot more sense with "-w".

 revision.c | 67 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/revision.c b/revision.c
index cee4f3a4b4..89ff9a99ce 100644
--- a/revision.c
+++ b/revision.c
@@ -92,38 +92,57 @@ void mark_tree_uninteresting(struct tree *tree)
 	mark_tree_contents_uninteresting(tree);
 }
 
-void mark_parents_uninteresting(struct commit *commit)
+struct commit_stack {
+	struct commit **items;
+	size_t nr, alloc;
+};
+#define COMMIT_STACK_INIT { NULL, 0, 0 }
+
+static void commit_stack_push(struct commit_stack *stack, struct commit *commit)
 {
-	struct commit_list *parents = NULL, *l;
+	ALLOC_GROW(stack->items, stack->nr + 1, stack->alloc);
+	stack->items[stack->nr++] = commit;
+}
 
-	for (l = commit->parents; l; l = l->next)
-		commit_list_insert(l->item, &parents);
+static struct commit *commit_stack_pop(struct commit_stack *stack)
+{
+	return stack->nr ? stack->items[--stack->nr] : NULL;
+}
 
-	while (parents) {
-		struct commit *commit = pop_commit(&parents);
+static void commit_stack_clear(struct commit_stack *stack)
+{
+	FREE_AND_NULL(stack->items);
+	stack->nr = stack->alloc = 0;
+}
 
-		while (commit) {
-			if (commit->object.flags & UNINTERESTING)
-				break;
+void mark_parents_uninteresting(struct commit *commit)
+{
+	struct commit_stack pending = COMMIT_STACK_INIT;
+	struct commit_list *l;
 
-			commit->object.flags |= UNINTERESTING;
+	for (l = commit->parents; l; l = l->next)
+		commit_stack_push(&pending, l->item);
 
-			/*
-			 * Normally we haven't parsed the parent
-			 * yet, so we won't have a parent of a parent
-			 * here. However, it may turn out that we've
-			 * reached this commit some other way (where it
-			 * wasn't uninteresting), in which case we need
-			 * to mark its parents recursively too..
-			 */
-			if (!commit->parents)
-				break;
+	while (pending.nr > 0) {
+		struct commit *commit = commit_stack_pop(&pending);
 
-			for (l = commit->parents->next; l; l = l->next)
-				commit_list_insert(l->item, &parents);
-			commit = commit->parents->item;
-		}
+		if (commit->object.flags & UNINTERESTING)
+			return;
+		commit->object.flags |= UNINTERESTING;
+
+		/*
+		 * Normally we haven't parsed the parent
+		 * yet, so we won't have a parent of a parent
+		 * here. However, it may turn out that we've
+		 * reached this commit some other way (where it
+		 * wasn't uninteresting), in which case we need
+		 * to mark its parents recursively too..
+		 */
+		for (l = commit->parents; l; l = l->next)
+			commit_stack_push(&pending, l->item);
 	}
+
+	commit_stack_clear(&pending);
 }
 
 static void add_pending_object_with_path(struct rev_info *revs,
-- 
2.17.0.988.gec4b43b3e5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation
  2018-05-11 18:00 [PATCH 0/4] a few mark_parents_uninteresting cleanups Jeff King
                   ` (2 preceding siblings ...)
  2018-05-11 18:02 ` [PATCH 3/4] mark_parents_uninteresting(): replace list with stack Jeff King
@ 2018-05-11 18:03 ` Jeff King
  2018-05-14 12:47   ` Derrick Stolee
  2018-05-14 12:50 ` [PATCH 0/4] a few mark_parents_uninteresting cleanups Derrick Stolee
  4 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2018-05-11 18:03 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano

Commit 941ba8db57 (Eliminate recursion in setting/clearing
marks in commit list, 2012-01-14) used a clever double-loop
to avoid allocations for single-parent chains of history.
However, it did so only when following parents of parents
(which was an uncommon case), and _always_ incurred at least
one allocation to populate the list of pending parents in
the first place.

We can turn this into zero-allocation in the common case by
iterating directly over the initial parent list, and then
following up on any pending items we might have discovered.

Signed-off-by: Jeff King <peff@peff.net>
---
Again, try "-w" for more readability.

 revision.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/revision.c b/revision.c
index 89ff9a99ce..cbe041128e 100644
--- a/revision.c
+++ b/revision.c
@@ -115,32 +115,38 @@ static void commit_stack_clear(struct commit_stack *stack)
 	stack->nr = stack->alloc = 0;
 }
 
-void mark_parents_uninteresting(struct commit *commit)
+static void mark_one_parent_uninteresting(struct commit *commit,
+					  struct commit_stack *pending)
 {
-	struct commit_stack pending = COMMIT_STACK_INIT;
 	struct commit_list *l;
 
+	if (commit->object.flags & UNINTERESTING)
+		return;
+	commit->object.flags |= UNINTERESTING;
+
+	/*
+	 * Normally we haven't parsed the parent
+	 * yet, so we won't have a parent of a parent
+	 * here. However, it may turn out that we've
+	 * reached this commit some other way (where it
+	 * wasn't uninteresting), in which case we need
+	 * to mark its parents recursively too..
+	 */
 	for (l = commit->parents; l; l = l->next)
-		commit_stack_push(&pending, l->item);
+		commit_stack_push(pending, l->item);
+}
 
-	while (pending.nr > 0) {
-		struct commit *commit = commit_stack_pop(&pending);
+void mark_parents_uninteresting(struct commit *commit)
+{
+	struct commit_stack pending = COMMIT_STACK_INIT;
+	struct commit_list *l;
 
-		if (commit->object.flags & UNINTERESTING)
-			return;
-		commit->object.flags |= UNINTERESTING;
+	for (l = commit->parents; l; l = l->next)
+		mark_one_parent_uninteresting(l->item, &pending);
 
-		/*
-		 * Normally we haven't parsed the parent
-		 * yet, so we won't have a parent of a parent
-		 * here. However, it may turn out that we've
-		 * reached this commit some other way (where it
-		 * wasn't uninteresting), in which case we need
-		 * to mark its parents recursively too..
-		 */
-		for (l = commit->parents; l; l = l->next)
-			commit_stack_push(&pending, l->item);
-	}
+	while (pending.nr > 0)
+		mark_one_parent_uninteresting(commit_stack_pop(&pending),
+					      &pending);
 
 	commit_stack_clear(&pending);
 }
-- 
2.17.0.988.gec4b43b3e5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/4] mark_parents_uninteresting(): drop missing object check
  2018-05-11 18:01 ` [PATCH 2/4] mark_parents_uninteresting(): " Jeff King
@ 2018-05-13  2:23   ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2018-05-13  2:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Derrick Stolee

Jeff King <peff@peff.net> writes:

>   2. It "lies" about the commit by setting the parsed flag,
>      even though we didn't load any useful data into the
>      struct. This shouldn't matter for the UNINTERESTING
>      case, but we may later clear our flags and do another
>      traversal in the same process. While pretty unlikely,
>      it's possible that we could then look at the same
>      commit without the UNINTERESTING flag,...

Yeah, if two ranges given to tbdiff to be compared are computed
in-core, uninteresting boundary of one range is likely to be
interesting on the other range.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation
  2018-05-11 18:03 ` [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation Jeff King
@ 2018-05-14 12:47   ` Derrick Stolee
  2018-05-14 13:09     ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Derrick Stolee @ 2018-05-14 12:47 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Derrick Stolee, Junio C Hamano

On 5/11/2018 2:03 PM, Jeff King wrote:
> Commit 941ba8db57 (Eliminate recursion in setting/clearing
> marks in commit list, 2012-01-14) used a clever double-loop
> to avoid allocations for single-parent chains of history.
> However, it did so only when following parents of parents
> (which was an uncommon case), and _always_ incurred at least
> one allocation to populate the list of pending parents in
> the first place.
>
> We can turn this into zero-allocation in the common case by
> iterating directly over the initial parent list, and then
> following up on any pending items we might have discovered.

This change appears to improve performance, but I was unable to measure 
any difference between this commit and the one ahead, even when merging 
ds/generation-numbers (which significantly reduces the other costs). I 
was testing 'git status' and 'git rev-list --boundary 
master...origin/master' in the Linux repo with my copy of master 70,000+ 
commits behind origin/master.

It's still a good change, but I was hoping to find a measurable benefit :(

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Again, try "-w" for more readability.
>
>   revision.c | 44 +++++++++++++++++++++++++-------------------
>   1 file changed, 25 insertions(+), 19 deletions(-)
>
> diff --git a/revision.c b/revision.c
> index 89ff9a99ce..cbe041128e 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -115,32 +115,38 @@ static void commit_stack_clear(struct commit_stack *stack)
>   	stack->nr = stack->alloc = 0;
>   }
>   
> -void mark_parents_uninteresting(struct commit *commit)
> +static void mark_one_parent_uninteresting(struct commit *commit,
> +					  struct commit_stack *pending)
>   {
> -	struct commit_stack pending = COMMIT_STACK_INIT;
>   	struct commit_list *l;
>   
> +	if (commit->object.flags & UNINTERESTING)
> +		return;
> +	commit->object.flags |= UNINTERESTING;
> +
> +	/*
> +	 * Normally we haven't parsed the parent
> +	 * yet, so we won't have a parent of a parent
> +	 * here. However, it may turn out that we've
> +	 * reached this commit some other way (where it
> +	 * wasn't uninteresting), in which case we need
> +	 * to mark its parents recursively too..
> +	 */
>   	for (l = commit->parents; l; l = l->next)
> -		commit_stack_push(&pending, l->item);
> +		commit_stack_push(pending, l->item);
> +}
>   
> -	while (pending.nr > 0) {
> -		struct commit *commit = commit_stack_pop(&pending);
> +void mark_parents_uninteresting(struct commit *commit)
> +{
> +	struct commit_stack pending = COMMIT_STACK_INIT;
> +	struct commit_list *l;
>   
> -		if (commit->object.flags & UNINTERESTING)
> -			return;
> -		commit->object.flags |= UNINTERESTING;
> +	for (l = commit->parents; l; l = l->next)
> +		mark_one_parent_uninteresting(l->item, &pending);
>   
> -		/*
> -		 * Normally we haven't parsed the parent
> -		 * yet, so we won't have a parent of a parent
> -		 * here. However, it may turn out that we've
> -		 * reached this commit some other way (where it
> -		 * wasn't uninteresting), in which case we need
> -		 * to mark its parents recursively too..
> -		 */
> -		for (l = commit->parents; l; l = l->next)
> -			commit_stack_push(&pending, l->item);
> -	}
> +	while (pending.nr > 0)
> +		mark_one_parent_uninteresting(commit_stack_pop(&pending),
> +					      &pending);
>   
>   	commit_stack_clear(&pending);
>   }


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] a few mark_parents_uninteresting cleanups
  2018-05-11 18:00 [PATCH 0/4] a few mark_parents_uninteresting cleanups Jeff King
                   ` (3 preceding siblings ...)
  2018-05-11 18:03 ` [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation Jeff King
@ 2018-05-14 12:50 ` Derrick Stolee
  4 siblings, 0 replies; 11+ messages in thread
From: Derrick Stolee @ 2018-05-14 12:50 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Derrick Stolee, Junio C Hamano

On 5/11/2018 2:00 PM, Jeff King wrote:
> This is a follow-up to the discussion from February:
>
>    https://public-inbox.org/git/1519522496-73090-1-git-send-email-dstolee@microsoft.com/
>
> There I theorized that some of these extra has_object_file() checks were
> unnecessary. After poking around a bit, I've convinced myself that this
> is the case, so here are some patches.
>
> After Stolee's fix in ebbed3ba04 (revision.c: reduce object database
> queries, 2018-02-24), I doubt this will provide any noticeable speedup.
> IMHO the value is just in simplifying the logic.
>
> The first two patches are the actual has_object_file simplifications.
> The second two are an attempt to fix some head-scratching I had while
> reading mark_parents_uninteresting(). I hope the result is easier to
> follow, but I may have just shuffled one confusion for another. :)
>
>    [1/4]: mark_tree_contents_uninteresting(): drop missing object check
>    [2/4]: mark_parents_uninteresting(): drop missing object check
>    [3/4]: mark_parents_uninteresting(): replace list with stack
>    [4/4]: mark_parents_uninteresting(): avoid most allocation
>
>   revision.c | 90 ++++++++++++++++++++++++++++++------------------------
>   1 file changed, 50 insertions(+), 40 deletions(-)
>
> -Peff

This series looks good to me. I found Patch 3 hard to read in the diff, 
so I just looked at the final result and the new arrangement is very 
clear about how it should behave.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation
  2018-05-14 12:47   ` Derrick Stolee
@ 2018-05-14 13:09     ` Jeff King
  2018-05-14 13:25       ` Derrick Stolee
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2018-05-14 13:09 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Derrick Stolee, Junio C Hamano

On Mon, May 14, 2018 at 08:47:33AM -0400, Derrick Stolee wrote:

> On 5/11/2018 2:03 PM, Jeff King wrote:
> > Commit 941ba8db57 (Eliminate recursion in setting/clearing
> > marks in commit list, 2012-01-14) used a clever double-loop
> > to avoid allocations for single-parent chains of history.
> > However, it did so only when following parents of parents
> > (which was an uncommon case), and _always_ incurred at least
> > one allocation to populate the list of pending parents in
> > the first place.
> > 
> > We can turn this into zero-allocation in the common case by
> > iterating directly over the initial parent list, and then
> > following up on any pending items we might have discovered.
> 
> This change appears to improve performance, but I was unable to measure any
> difference between this commit and the one ahead, even when merging
> ds/generation-numbers (which significantly reduces the other costs). I was
> testing 'git status' and 'git rev-list --boundary master...origin/master' in
> the Linux repo with my copy of master 70,000+ commits behind origin/master.

I think you'd want to go the other way: this is marking uninteresting
commits, so you'd want origin/master..master, which would make those 70k
commits uninteresting.

I still doubt it will create that much difference, though. It's
more or less saving a single malloc per commit we traverse. Certainly
without the .commits file that's a drop in the bucket compared to
inflating the object data, but I wouldn't be surprised if we end up with
a few mallocs even after your patches.

Anyway, thanks for poking at it. :)

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation
  2018-05-14 13:09     ` Jeff King
@ 2018-05-14 13:25       ` Derrick Stolee
  2018-05-14 14:09         ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Derrick Stolee @ 2018-05-14 13:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Derrick Stolee, Junio C Hamano

On 5/14/2018 9:09 AM, Jeff King wrote:
> On Mon, May 14, 2018 at 08:47:33AM -0400, Derrick Stolee wrote:
>
>> On 5/11/2018 2:03 PM, Jeff King wrote:
>>> Commit 941ba8db57 (Eliminate recursion in setting/clearing
>>> marks in commit list, 2012-01-14) used a clever double-loop
>>> to avoid allocations for single-parent chains of history.
>>> However, it did so only when following parents of parents
>>> (which was an uncommon case), and _always_ incurred at least
>>> one allocation to populate the list of pending parents in
>>> the first place.
>>>
>>> We can turn this into zero-allocation in the common case by
>>> iterating directly over the initial parent list, and then
>>> following up on any pending items we might have discovered.
>> This change appears to improve performance, but I was unable to measure any
>> difference between this commit and the one ahead, even when merging
>> ds/generation-numbers (which significantly reduces the other costs). I was
>> testing 'git status' and 'git rev-list --boundary master...origin/master' in
>> the Linux repo with my copy of master 70,000+ commits behind origin/master.
> I think you'd want to go the other way: this is marking uninteresting
> commits, so you'd want origin/master..master, which would make those 70k
> commits uninteresting.

Thanks for the tip. Running 'git rev-list origin/master..master' 100 
times gave the following times, on average (with ds/generation-numbers):

PATCH 3/4: 0.19 s
PATCH 4/4: 0.16 s
     Rel %: -16%

>
> I still doubt it will create that much difference, though. It's
> more or less saving a single malloc per commit we traverse. Certainly
> without the .commits file that's a drop in the bucket compared to
> inflating the object data, but I wouldn't be surprised if we end up with
> a few mallocs even after your patches.
>
> Anyway, thanks for poking at it. :)
>
> -Peff


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation
  2018-05-14 13:25       ` Derrick Stolee
@ 2018-05-14 14:09         ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2018-05-14 14:09 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Derrick Stolee, Junio C Hamano

On Mon, May 14, 2018 at 09:25:46AM -0400, Derrick Stolee wrote:

> > I think you'd want to go the other way: this is marking uninteresting
> > commits, so you'd want origin/master..master, which would make those 70k
> > commits uninteresting.
> 
> Thanks for the tip. Running 'git rev-list origin/master..master' 100 times
> gave the following times, on average (with ds/generation-numbers):
> 
> PATCH 3/4: 0.19 s
> PATCH 4/4: 0.16 s
>     Rel %: -16%

Nifty. I liked it as just a cleanup, but I'm happier still to find that
it makes things faster (even if it's pretty small in absolute numbers).

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-05-14 14:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-11 18:00 [PATCH 0/4] a few mark_parents_uninteresting cleanups Jeff King
2018-05-11 18:00 ` [PATCH 1/4] mark_tree_contents_uninteresting(): drop missing object check Jeff King
2018-05-11 18:01 ` [PATCH 2/4] mark_parents_uninteresting(): " Jeff King
2018-05-13  2:23   ` Junio C Hamano
2018-05-11 18:02 ` [PATCH 3/4] mark_parents_uninteresting(): replace list with stack Jeff King
2018-05-11 18:03 ` [PATCH 4/4] mark_parents_uninteresting(): avoid most allocation Jeff King
2018-05-14 12:47   ` Derrick Stolee
2018-05-14 13:09     ` Jeff King
2018-05-14 13:25       ` Derrick Stolee
2018-05-14 14:09         ` Jeff King
2018-05-14 12:50 ` [PATCH 0/4] a few mark_parents_uninteresting cleanups Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).