git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 00/15] name-rev: eliminate recursion
@ 2019-09-19 21:46 SZEDER Gábor
  2019-09-19 21:46 ` [PATCH 01/15] t6120-describe: correct test repo history graph in comment SZEDER Gábor
                   ` (18 more replies)
  0 siblings, 19 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

'git name-rev' is implemented using a recursive algorithm, and,
consequently, it can segfault in deep histories (e.g. WebKit).
Moreover, thanks to a test case demonstrating this limitation, every
test run results in a dmesg entry logging the segfaulting git process.

This patch series eliminates the recursion.

Patches 1-5 and 14-15 are while-at-it cleanups I noticed on the way,
and patch 6 improves test coverage.

Patches 7-11 are preparatory refactorings that are supposed to make
this series easier to follow, and make patch 12, the one finally
eliminating the recursion, somewhat shorter, and even much shorter
when viewed with '--ignore-all-space'.  Patch 13 cleans up after those
preparatory steps.

SZEDER Gábor (15):
  t6120-describe: correct test repo history graph in comment
  t6120-describe: modernize the 'check_describe' helper
  name-rev: use strip_suffix() in get_rev_name()
  name-rev: avoid unnecessary cast in name_ref()
  name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  t6120: add a test to cover inner conditions in 'git name-rev's
    name_rev()
  name-rev: extract creating/updating a 'struct name_rev' into a helper
  name-rev: pull out deref handling from the recursion
  name-rev: restructure parsing commits and applying date cutoff
  name-rev: restructure creating/updating 'struct rev_name' instances
  name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  name-rev: eliminate recursion in name_rev()
  name-rev: cleanup name_ref()
  name-rev: plug a memory leak in name_rev()
  name-rev: plug a memory leak in name_rev() in the deref case

 builtin/name-rev.c  | 140 ++++++++++++++++++++++++++++----------------
 t/t6120-describe.sh |  72 ++++++++++++++++++-----
 2 files changed, 147 insertions(+), 65 deletions(-)

-- 
2.23.0.331.g4e51dcdf11



* [PATCH 01/15] t6120-describe: correct test repo history graph in comment
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
@ 2019-09-19 21:46 ` SZEDER Gábor
  2019-09-20 21:47   ` Junio C Hamano
  2019-09-19 21:46 ` [PATCH 02/15] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

At the top of 't6120-describe.sh' an ASCII graph illustrates the
repository's history used in this test script.  This graph is a bit
misleading, because it swapped the second merge commit's first and
second parents.

When describing/naming a commit it does make a difference which parent
is the first and which is the second/Nth, so update this graph to
accurately represent that second merge.

While at it, move this history graph from the 'test_description'
variable to a regular comment.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 2b883d8174..0bf7e0c8bc 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -1,15 +1,14 @@
 #!/bin/sh
 
-test_description='test describe
+test_description='test describe'
+
+#       ,---o----o----o-----.
+#      /   D,R   e           \
+#  o--o-----o-------------o---o----x
+#      \    B            /
+#       `---o----o----o-'
+#                A    c
 
-                       B
-        .--------------o----o----o----x
-       /                   /    /
- o----o----o----o----o----.    /
-       \        A    c        /
-        .------------o---o---o
-                   D,R   e
-'
 . ./test-lib.sh
 
 check_describe () {
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 02/15] t6120-describe: modernize the 'check_describe' helper
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
  2019-09-19 21:46 ` [PATCH 01/15] t6120-describe: correct test repo history graph in comment SZEDER Gábor
@ 2019-09-19 21:46 ` SZEDER Gábor
  2019-09-20 21:49   ` Junio C Hamano
  2019-09-19 21:46 ` [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name() SZEDER Gábor
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

The 'check_describe' helper function runs 'git describe' outside of
'test_expect_success' blocks, with extra hand-rolled code to record
and examine its exit code.

Update this helper and move the 'git describe' invocation inside the
'test_expect_success' block.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 0bf7e0c8bc..07e6793e84 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -14,14 +14,12 @@ test_description='test describe'
 check_describe () {
 	expect="$1"
 	shift
-	R=$(git describe "$@" 2>err.actual)
-	S=$?
-	cat err.actual >&3
-	test_expect_success "describe $*" '
-	test $S = 0 &&
+	describe_opts="$@"
+	test_expect_success "describe $describe_opts" '
+	R=$(git describe $describe_opts 2>err.actual) &&
 	case "$R" in
 	$expect)	echo happy ;;
-	*)	echo "Oops - $R is not $expect";
+	*)	echo "Oops - $R is not $expect" &&
 		false ;;
 	esac
 	'
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name()
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
  2019-09-19 21:46 ` [PATCH 01/15] t6120-describe: correct test repo history graph in comment SZEDER Gábor
  2019-09-19 21:46 ` [PATCH 02/15] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
@ 2019-09-19 21:46 ` SZEDER Gábor
  2019-09-20 16:36   ` René Scharfe
  2019-09-19 21:46 ` [PATCH 04/15] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

Use strip_suffix() instead of open-coding it, making the code more
idiomatic.
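
For illustration only (not part of this patch): a minimal standalone
sketch of the suffix-stripping behaviour relied on here.  The name
sketch_strip_suffix() is hypothetical and merely mimics the helper's
contract as used in the diff below: it reports whether the suffix was
found and stores the length of the string without it.  Unlike the
open-coded check being replaced, which required 'len > 2', this sketch
also matches a string that is exactly "^0".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a strip_suffix()-style helper.
 *
 * Returns 1 and stores the length of 'str' without 'suffix' in '*out'
 * when 'str' ends with 'suffix'; otherwise returns 0 and stores the
 * full length of 'str'.  The string itself is never modified.
 */
int sketch_strip_suffix(const char *str, const char *suffix, size_t *out)
{
	size_t len = strlen(str);
	size_t suflen = strlen(suffix);

	assert(out != NULL);

	if (len >= suflen && !memcmp(str + len - suflen, suffix, suflen)) {
		*out = len - suflen;
		return 1;
	}
	*out = len;
	return 0;
}
```

Because '*out' is set in both cases, a caller can pass it straight to a
"%.*s" format (after casting to int) without checking the return
value, which is the pattern the patch uses.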

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index c785fe16ba..d345456656 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -317,11 +317,11 @@ static const char *get_rev_name(const struct object *o, struct strbuf *buf)
 	if (!n->generation)
 		return n->tip_name;
 	else {
-		int len = strlen(n->tip_name);
-		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
-			len -= 2;
+		size_t len;
+		strip_suffix(n->tip_name, "^0", &len);
 		strbuf_reset(buf);
-		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
+		strbuf_addf(buf, "%.*s~%d", (int) len, n->tip_name,
+			    n->generation);
 		return buf->buf;
 	}
 }
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 04/15] name-rev: avoid unnecessary cast in name_ref()
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (2 preceding siblings ...)
  2019-09-19 21:46 ` [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name() SZEDER Gábor
@ 2019-09-19 21:46 ` SZEDER Gábor
  2019-09-20 16:37   ` René Scharfe
  2019-09-19 21:47 ` [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

Casting a 'struct object' to 'struct commit' is unnecessary there,
because it's already available in the local 'commit' variable.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index d345456656..e406ff8e17 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -268,7 +268,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		int from_tag = starts_with(path, "refs/tags/");
 
 		if (taggerdate == TIME_MAX)
-			taggerdate = ((struct commit *)o)->date;
+			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
 			 from_tag, deref);
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (3 preceding siblings ...)
  2019-09-19 21:46 ` [PATCH 04/15] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-20 15:11   ` Derrick Stolee
  2019-09-20 16:37   ` René Scharfe
  2019-09-19 21:47 ` [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
                   ` (13 subsequent siblings)
  18 siblings, 2 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e406ff8e17..dec2228cc7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -98,7 +98,7 @@ static void name_rev(struct commit *commit,
 	}
 
 	if (name == NULL) {
-		name = xmalloc(sizeof(rev_name));
+		name = xmalloc(sizeof(*name));
 		set_commit_rev_name(commit, name);
 		goto copy_data;
 	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (4 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-20 15:14   ` Derrick Stolee
  2019-09-19 21:47 ` [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

In 'builtin/name-rev.c' in the name_rev() function there is a loop
iterating over all parents of the given commit, and the loop body
looks like this:

  if (parent_number > 1) {
    if (generation > 0)
      // do stuff #1
    else
      // do stuff #2
  } else {
     // do stuff #3
  }

These conditions are not covered properly in the test suite.  As far
as purely test coverage goes, they are all executed several times over
in 't6120-describe.sh'.  However, they don't directly influence the
command's output, because the repository used in that test script
contains several branches and tags pointing somewhere into the middle
of the commit DAG, and thus result in a better name for the
to-be-named commit.  In an early version of this patch series I
managed to mess up those conditions (every single one of them at
once!), but the whole test suite still passed successfully.

So add a new test case that operates on the following history:

    -----------master
   /          /
  A----------M2
   \        /
    \---M1-C
     \ /
      B

and names the commit 'B', where:

  - The merge commit at master makes sure that the 'do stuff #3'
    affects the final name.

  - The merge commit M2 makes sure that the 'do stuff #1' part
    affects the final name.

  - And M1 makes sure that the 'do stuff #2' part affects the final
    name.
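
For illustration only (not part of this patch): a simplified sketch of
how a parent's name is derived from its child's name in the three
branches above.  'struct sketch_name' and both functions are
hypothetical; the "^0" stripping and the distance bookkeeping of the
real code are left out on purpose.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the naming state of one commit. */
struct sketch_name {
	char tip_name[128];
	int generation;
};

/*
 * Derive the name of the parent_number-th parent (1-based) from the
 * name of its child.  'child' and 'parent' must be distinct objects.
 */
void sketch_parent_name(const struct sketch_name *child, int parent_number,
			struct sketch_name *parent)
{
	assert(parent_number >= 1);

	if (parent_number > 1) {
		if (child->generation > 0)
			/* stuff #1: "<tip>~<generation>^<parent_number>" */
			snprintf(parent->tip_name, sizeof(parent->tip_name),
				 "%s~%d^%d", child->tip_name,
				 child->generation, parent_number);
		else
			/* stuff #2: "<tip>^<parent_number>" */
			snprintf(parent->tip_name, sizeof(parent->tip_name),
				 "%s^%d", child->tip_name, parent_number);
		parent->generation = 0;
	} else {
		/* stuff #3: same tip name, one generation further back */
		snprintf(parent->tip_name, sizeof(parent->tip_name),
			 "%s", child->tip_name);
		parent->generation = child->generation + 1;
	}
}

/* Render a name for output: "<tip>" or "<tip>~<generation>". */
void sketch_format_name(const struct sketch_name *name, char *buf, size_t size)
{
	if (name->generation > 0)
		snprintf(buf, size, "%s~%d", name->tip_name, name->generation);
	else
		snprintf(buf, size, "%s", name->tip_name);
}
```

Starting from { "master", 0 } and following the second parent (to M2),
the second parent (to C), the first parent (to M1) and the second
parent (to B) yields "master^2^2~1^2", the name the new test expects.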

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 07e6793e84..2a0f2204c4 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -421,4 +421,47 @@ test_expect_success 'describe complains about missing object' '
 	test_must_fail git describe $ZERO_OID
 '
 
+#   -----------master
+#  /          /
+# A----------M2
+#  \        /
+#   \---M1-C
+#    \ /
+#     B
+test_expect_success 'test' '
+	git init repo &&
+	(
+		cd repo &&
+
+		echo A >file &&
+		git add file &&
+		git commit -m A &&
+		A=$(git rev-parse HEAD) &&
+
+		git checkout --detach &&
+		echo B >file &&
+		git commit -m B file &&
+		B=$(git rev-parse HEAD) &&
+
+		git checkout $A &&
+		git merge --no-ff $B &&  # M1
+
+		echo C >file &&
+		git commit -m C file &&
+
+		git checkout $A &&
+		git merge --no-ff HEAD@{1} && # M2
+
+		git checkout master &&
+		git merge --no-ff HEAD@{1} &&
+
+		git log --graph --oneline &&
+
+		echo "$B master^2^2~1^2" >expect &&
+		git name-rev $B >actual &&
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (5 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-20 15:18   ` Derrick Stolee
  2019-09-22  8:18   ` [PATCH] name-rev: rewrite create_or_update_name() Martin Ågren
  2019-09-19 21:47 ` [PATCH 08/15] name-rev: pull out deref handling from the recursion SZEDER Gábor
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

In a later patch in this series we'll want to do this in two places.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index dec2228cc7..cb8ac2fa64 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -75,12 +75,36 @@ static int is_better_name(struct rev_name *name,
 	return 0;
 }
 
+static struct rev_name *create_or_update_name(struct commit *commit,
+					      const char *tip_name,
+					      timestamp_t taggerdate,
+					      int generation, int distance,
+					      int from_tag)
+{
+	struct rev_name *name = get_commit_rev_name(commit);
+
+	if (name == NULL) {
+		name = xmalloc(sizeof(*name));
+		set_commit_rev_name(commit, name);
+		goto copy_data;
+	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
+copy_data:
+		name->tip_name = tip_name;
+		name->taggerdate = taggerdate;
+		name->generation = generation;
+		name->distance = distance;
+		name->from_tag = from_tag;
+
+		return name;
+	} else
+		return NULL;
+}
+
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int generation, int distance, int from_tag,
 		int deref)
 {
-	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
 	int parent_number = 1;
 	char *to_free = NULL;
@@ -97,18 +121,8 @@ static void name_rev(struct commit *commit,
 			die("generation: %d, but deref?", generation);
 	}
 
-	if (name == NULL) {
-		name = xmalloc(sizeof(*name));
-		set_commit_rev_name(commit, name);
-		goto copy_data;
-	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-copy_data:
-		name->tip_name = tip_name;
-		name->taggerdate = taggerdate;
-		name->generation = generation;
-		name->distance = distance;
-		name->from_tag = from_tag;
-	} else {
+	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
+				   distance, from_tag)) {
 		free(to_free);
 		return;
 	}
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (6 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-20 15:21   ` Derrick Stolee
  2019-09-20 16:37   ` René Scharfe
  2019-09-19 21:47 ` [PATCH 09/15] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

The 'if (deref) { ... }' condition near the beginning of the recursive
name_rev() function can only ever be true in the first invocation,
because the 'deref' parameter is always 0 in the subsequent recursive
invocations.

Extract this condition from the recursion into name_rev()'s caller and
drop the function's 'deref' parameter.  This makes eliminating the
recursion a bit easier to follow, and it will be moved back into
name_rev() after the recursion is eliminated.

Furthermore, drop the condition that die()s when both 'deref' and
'generation' are non-null (which should have been a BUG() to begin
with).

Note that this change reintroduces the memory leak that was plugged in
commit 5308224633 (name-rev: avoid leaking memory in the `deref`
case, 2017-05-04), but a later patch in this series will plug it
again.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index cb8ac2fa64..42cea5c881 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -102,30 +102,19 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int generation, int distance, int from_tag,
-		int deref)
+		int generation, int distance, int from_tag)
 {
 	struct commit_list *parents;
 	int parent_number = 1;
-	char *to_free = NULL;
 
 	parse_commit(commit);
 
 	if (commit->date < cutoff)
 		return;
 
-	if (deref) {
-		tip_name = to_free = xstrfmt("%s^0", tip_name);
-
-		if (generation)
-			die("generation: %d, but deref?", generation);
-	}
-
 	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
-				   distance, from_tag)) {
-		free(to_free);
+				   distance, from_tag))
 		return;
-	}
 
 	for (parents = commit->parents;
 			parents;
@@ -144,11 +133,11 @@ static void name_rev(struct commit *commit,
 
 			name_rev(parents->item, new_name, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
-				 from_tag, 0);
+				 from_tag);
 		} else {
 			name_rev(parents->item, tip_name, taggerdate,
 				 generation + 1, distance + 1,
-				 from_tag, 0);
+				 from_tag);
 		}
 	}
 }
@@ -280,12 +269,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 	if (o && o->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *)o;
 		int from_tag = starts_with(path, "refs/tags/");
+		const char *tip_name;
 
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
-			 from_tag, deref);
+		if (deref)
+			tip_name = xstrfmt("%s^0", path);
+		else
+			tip_name = xstrdup(path);
+		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
 	}
 	return 0;
 }
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 09/15] name-rev: restructure parsing commits and applying date cutoff
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (7 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 08/15] name-rev: pull out deref handling from the recursion SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-21 12:37   ` René Scharfe
  2019-09-19 21:47 ` [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

At its beginning, the recursive name_rev() function parses the commit
it got as parameter, and returns early if the commit is older than a
cutoff limit.

Restructure this so the caller parses the commit and checks its date,
and doesn't invoke name_rev() if the commit to be passed as parameter
is older than the cutoff, i.e. both name_ref() before calling
name_rev() and name_rev() itself as it iterates over the parent
commits.

This makes eliminating the recursion a bit easier to follow, and it
will be moved back to name_rev() after the recursion is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 42cea5c881..99643aa4dc 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -107,11 +107,6 @@ static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	parse_commit(commit);
-
-	if (commit->date < cutoff)
-		return;
-
 	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
 				   distance, from_tag))
 		return;
@@ -119,6 +114,12 @@ static void name_rev(struct commit *commit,
 	for (parents = commit->parents;
 			parents;
 			parents = parents->next, parent_number++) {
+		struct commit *parent = parents->item;
+
+		parse_commit(parent);
+		if (parent->date < cutoff)
+			continue;
+
 		if (parent_number > 1) {
 			size_t len;
 			char *new_name;
@@ -131,11 +132,11 @@ static void name_rev(struct commit *commit,
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
 
-			name_rev(parents->item, new_name, taggerdate, 0,
+			name_rev(parent, new_name, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
 				 from_tag);
 		} else {
-			name_rev(parents->item, tip_name, taggerdate,
+			name_rev(parent, tip_name, taggerdate,
 				 generation + 1, distance + 1,
 				 from_tag);
 		}
@@ -269,16 +270,18 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 	if (o && o->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *)o;
 		int from_tag = starts_with(path, "refs/tags/");
-		const char *tip_name;
 
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		if (deref)
-			tip_name = xstrfmt("%s^0", path);
-		else
-			tip_name = xstrdup(path);
-		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+		if (commit->date >= cutoff) {
+			const char *tip_name;
+			if (deref)
+				tip_name = xstrfmt("%s^0", path);
+			else
+				tip_name = xstrdup(path);
+			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+		}
 	}
 	return 0;
 }
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (8 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 09/15] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-20 15:27   ` Derrick Stolee
  2019-09-19 21:47 ` [PATCH 11/15] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

At its beginning, the recursive name_rev() function creates a new
'struct rev_name' instance for each previously unvisited commit.  If
the commit was already visited, it updates the 'struct rev_name'
instance attached to the commit when this visit results in a better
name, and returns early otherwise.

Restructure this so its caller creates or updates the 'struct
rev_name' instance associated with the commit to be passed as
parameter, i.e. both name_ref() before calling name_rev() and
name_rev() itself as it iterates over the parent commits.

This makes eliminating the recursion a bit easier to follow, and it
will be moved back to name_rev() after the recursion is eliminated.

This change also plugs the memory leak that was temporarily unplugged
in the earlier "name-rev: pull out deref handling from the recursion"
patch in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 99643aa4dc..98a549fef7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -107,14 +107,12 @@ static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
-				   distance, from_tag))
-		return;
-
 	for (parents = commit->parents;
 			parents;
 			parents = parents->next, parent_number++) {
 		struct commit *parent = parents->item;
+		const char *new_name;
+		int new_generation, new_distance;
 
 		parse_commit(parent);
 		if (parent->date < cutoff)
@@ -122,7 +120,6 @@ static void name_rev(struct commit *commit,
 
 		if (parent_number > 1) {
 			size_t len;
-			char *new_name;
 
 			strip_suffix(tip_name, "^0", &len);
 			if (generation > 0)
@@ -131,15 +128,19 @@ static void name_rev(struct commit *commit,
 			else
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
-
-			name_rev(parent, new_name, taggerdate, 0,
-				 distance + MERGE_TRAVERSAL_WEIGHT,
-				 from_tag);
+			new_generation = 0;
+			new_distance = distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
-			name_rev(parent, tip_name, taggerdate,
-				 generation + 1, distance + 1,
-				 from_tag);
+			new_name = tip_name;
+			new_generation = generation + 1;
+			new_distance = distance + 1;
 		}
+
+		if (create_or_update_name(parent, new_name, taggerdate,
+					  new_generation, new_distance,
+					  from_tag))
+			name_rev(parent, new_name, taggerdate,
+				 new_generation, new_distance, from_tag);
 	}
 }
 
@@ -276,11 +277,17 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		if (commit->date >= cutoff) {
 			const char *tip_name;
+			char *to_free = NULL;
 			if (deref)
-				tip_name = xstrfmt("%s^0", path);
+				tip_name = to_free = xstrfmt("%s^0", path);
 			else
 				tip_name = xstrdup(path);
-			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+			if (create_or_update_name(commit, tip_name, taggerdate,
+						  0, 0, from_tag))
+				name_rev(commit, tip_name, taggerdate, 0, 0,
+					 from_tag);
+			else
+				free(to_free);
 		}
 	}
 	return 0;
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 11/15] name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (9 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-19 21:47 ` [PATCH 12/15] name-rev: eliminate recursion in name_rev() SZEDER Gábor
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

Following the previous patches in this series we can get the values of
name_rev()'s 'generation' and 'distance' parameters from the 'struct
rev_name' associated with the commit as well.

Let's simplify the function's signature and remove these two
unnecessary parameters.

Note that at this point we could do the same with the 'tip_name',
'taggerdate' and 'from_tag' parameters as well, but those parameters
will be necessary later, after the recursion is eliminated.

Drop name_rev()'s 'generation' and 'distance' parameters.
---
 builtin/name-rev.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 98a549fef7..f2198a8bc3 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -102,8 +102,9 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int generation, int distance, int from_tag)
+		int from_tag)
 {
+	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
 	int parent_number = 1;
 
@@ -112,7 +113,7 @@ static void name_rev(struct commit *commit,
 			parents = parents->next, parent_number++) {
 		struct commit *parent = parents->item;
 		const char *new_name;
-		int new_generation, new_distance;
+		int generation, distance;
 
 		parse_commit(parent);
 		if (parent->date < cutoff)
@@ -122,25 +123,25 @@ static void name_rev(struct commit *commit,
 			size_t len;
 
 			strip_suffix(tip_name, "^0", &len);
-			if (generation > 0)
+			if (name->generation > 0)
 				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
-						   generation, parent_number);
+						   name->generation,
+						   parent_number);
 			else
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
-			new_generation = 0;
-			new_distance = distance + MERGE_TRAVERSAL_WEIGHT;
+			generation = 0;
+			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
 			new_name = tip_name;
-			new_generation = generation + 1;
-			new_distance = distance + 1;
+			generation = name->generation + 1;
+			distance = name->distance + 1;
 		}
 
 		if (create_or_update_name(parent, new_name, taggerdate,
-					  new_generation, new_distance,
+					  generation, distance,
 					  from_tag))
-			name_rev(parent, new_name, taggerdate,
-				 new_generation, new_distance, from_tag);
+			name_rev(parent, new_name, taggerdate, from_tag);
 	}
 }
 
@@ -284,7 +285,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 				tip_name = xstrdup(path);
 			if (create_or_update_name(commit, tip_name, taggerdate,
 						  0, 0, from_tag))
-				name_rev(commit, tip_name, taggerdate, 0, 0,
+				name_rev(commit, tip_name, taggerdate,
 					 from_tag);
 			else
 				free(to_free);
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 12/15] name-rev: eliminate recursion in name_rev()
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (10 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 11/15] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-19 21:47 ` [PATCH 13/15] name-rev: cleanup name_ref() SZEDER Gábor
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space.  E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine.

Eliminate the recursion by inserting the interesting parents into a
'commit_list' and iterating until the list becomes empty.

Note that the order in which the parent commits are added to that list
is important: they must be inserted at the beginning of the list, and
their relative order must be kept as well, because otherwise
performance suffers.
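
The ordering requirement above can be illustrated with a minimal,
standalone sketch.  It is not the code from this patch: 'struct node',
'struct work_item' and walk_iteratively() are hypothetical stand-ins
for 'struct commit', 'struct commit_list' and name_rev(), and a plain
'seen' flag replaces the create_or_update_name() check.  The parents
of the current node are collected in their original order and then
spliced in front of the pending work, so the first parent's ancestry
is followed before the second parent is looked at, as a recursive walk
would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's 'struct commit'. */
struct node {
	int id;
	int seen;
	size_t nr_parents;
	struct node **parents;
};

/* Hypothetical stand-in for git's 'struct commit_list'. */
struct work_item {
	struct node *node;
	struct work_item *next;
};

/*
 * Visit 'start' and all of its ancestors without recursion.
 * The ids are recorded in 'order' (capacity 'cap') in visiting
 * order; the number of visited nodes is returned.
 */
static size_t walk_iteratively(struct node *start, int *order, size_t cap)
{
	struct work_item *list = malloc(sizeof(*list));
	size_t n = 0;

	assert(list);
	list->node = start;
	list->next = NULL;
	start->seen = 1;

	while (list) {
		struct work_item *head = list;
		struct node *node = head->node;
		struct work_item *new_items = NULL;
		struct work_item **tail = &new_items;
		size_t i;

		/* Pop the first pending item. */
		list = head->next;
		free(head);

		assert(n < cap);
		order[n++] = node->id;

		/* Collect the unseen parents, keeping their order. */
		for (i = 0; i < node->nr_parents; i++) {
			struct node *parent = node->parents[i];
			struct work_item *item;

			if (parent->seen)
				continue;
			parent->seen = 1;
			item = malloc(sizeof(*item));
			assert(item);
			item->node = parent;
			item->next = NULL;
			*tail = item;
			tail = &item->next;
		}

		/* Splice the new parents in front of the pending work. */
		*tail = list;
		list = new_items;
	}
	return n;
}
```

For a merge M with parents A and B that both descend from R, this
sketch visits M, A, R and then B.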

The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure.  Now
the recursion is gone, so flip it to expect success.

Also gone are the dmesg entries logging the segfault of the git
process on every execution of the test suite.

Unfortunately, eliminating the recursion comes with a performance
penalty: 'git name-rev --all' tends to be 15-20% slower than
before.

Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git.  This shouldn't matter in
practice, because the output has always been unordered anyway.

This patch is best viewed with '--ignore-all-space'.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c  | 85 ++++++++++++++++++++++++++-------------------
 t/t6120-describe.sh |  2 +-
 2 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index f2198a8bc3..b6fa495340 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -100,48 +100,63 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 		return NULL;
 }
 
-static void name_rev(struct commit *commit,
+static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int from_tag)
 {
-	struct rev_name *name = get_commit_rev_name(commit);
-	struct commit_list *parents;
-	int parent_number = 1;
-
-	for (parents = commit->parents;
-			parents;
-			parents = parents->next, parent_number++) {
-		struct commit *parent = parents->item;
-		const char *new_name;
-		int generation, distance;
-
-		parse_commit(parent);
-		if (parent->date < cutoff)
-			continue;
+	struct commit_list *list = NULL;
+
+	commit_list_insert(start_commit, &list);
+
+	while (list) {
+		struct commit *commit = pop_commit(&list);
+		struct rev_name *name = get_commit_rev_name(commit);
+		struct commit_list *parents, *new_parents = NULL;
+		struct commit_list **last_new_parent = &new_parents;
+		int parent_number = 1;
+
+		for (parents = commit->parents;
+				parents;
+				parents = parents->next, parent_number++) {
+			struct commit *parent = parents->item;
+			const char *new_name;
+			int generation, distance;
+
+			parse_commit(parent);
+			if (parent->date < cutoff)
+				continue;
 
-		if (parent_number > 1) {
-			size_t len;
+			if (parent_number > 1) {
+				size_t len;
+
+				strip_suffix(name->tip_name, "^0", &len);
+				if (name->generation > 0)
+					new_name = xstrfmt("%.*s~%d^%d",
+							   (int)len,
+							   name->tip_name,
+							   name->generation,
+							   parent_number);
+				else
+					new_name = xstrfmt("%.*s^%d", (int)len,
+							   name->tip_name,
+							   parent_number);
+				generation = 0;
+				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
+			} else {
+				new_name = name->tip_name;
+				generation = name->generation + 1;
+				distance = name->distance + 1;
+			}
 
-			strip_suffix(tip_name, "^0", &len);
-			if (name->generation > 0)
-				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
-						   name->generation,
-						   parent_number);
-			else
-				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
-						   parent_number);
-			generation = 0;
-			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
-		} else {
-			new_name = tip_name;
-			generation = name->generation + 1;
-			distance = name->distance + 1;
+			if (create_or_update_name(parent, new_name, taggerdate,
+						  generation, distance,
+						  from_tag))
+				last_new_parent = commit_list_append(parent,
+						  last_new_parent);
 		}
 
-		if (create_or_update_name(parent, new_name, taggerdate,
-					  generation, distance,
-					  from_tag))
-			name_rev(parent, new_name, taggerdate, from_tag);
+		*last_new_parent = list;
+		list = new_parents;
 	}
 }
 
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 2a0f2204c4..e37f02d21c 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -379,7 +379,7 @@ test_expect_success 'describe tag object' '
 	test_i18ngrep "fatal: test-blob-1 is neither a commit nor blob" actual
 '
 
-test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
+test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
 	do
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 13/15] name-rev: cleanup name_ref()
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (11 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 12/15] name-rev: eliminate recursion in name_rev() SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug a memory leak in name_rev() SZEDER Gábor
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

Earlier patches in this series moved a couple of conditions from the
recursive name_rev() function into its caller name_ref(), for no other
reason than to make eliminating the recursion a bit easier to follow.

As of the previous patch name_rev() is not recursive anymore, so let's
move all those conditions back into name_rev().

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index b6fa495340..e202835129 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -102,9 +102,23 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int from_tag)
+		int from_tag, int deref)
 {
 	struct commit_list *list = NULL;
+	char *to_free = NULL;
+
+	parse_commit(start_commit);
+	if (start_commit->date < cutoff)
+		return;
+
+	if (deref)
+		tip_name = to_free = xstrfmt("%s^0", tip_name);
+
+	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
+				   from_tag)) {
+		free(to_free);
+		return;
+	}
 
 	commit_list_insert(start_commit, &list);
 
@@ -291,20 +305,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		if (commit->date >= cutoff) {
-			const char *tip_name;
-			char *to_free = NULL;
-			if (deref)
-				tip_name = to_free = xstrfmt("%s^0", path);
-			else
-				tip_name = xstrdup(path);
-			if (create_or_update_name(commit, tip_name, taggerdate,
-						  0, 0, from_tag))
-				name_rev(commit, tip_name, taggerdate,
-					 from_tag);
-			else
-				free(to_free);
-		}
+		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
 	}
 	return 0;
 }
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 14/15] name-rev: plug a memory leak in name_rev()
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (12 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 13/15] name-rev: cleanup name_ref() SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug memory leak in name_rev() in the deref case SZEDER Gábor
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

The loop iterating over the parent commits in the name_rev() function
contains two xstrfmt() calls, and their result is leaked if the parent
commit is not processed further (because that parent has already been
visited before, and this further visit doesn't result in a better name
for its ancestors).

Make sure that the result of those xstrfmt() calls is free()d if the
parent commit is not processed further.
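
The ownership rule described above can be sketched in isolation.  This
is a simplified illustration, not the code from the patch: 'struct
name_slot', format_parent_name(), store_if_better() and offer_name()
are hypothetical names, snprintf()/malloc() stand in for xstrfmt(),
and "smaller distance wins" is an assumed stand-in for
is_better_name().  The point is only that a freshly formatted string
is freed by the caller whenever the callee declines to keep it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the best name found so far. */
struct name_slot {
	char *name;	/* owned by the slot once stored */
	int distance;
};

/* Allocate and format "<tip>^<parent_number>". */
static char *format_parent_name(const char *tip, int parent_number)
{
	int len = snprintf(NULL, 0, "%s^%d", tip, parent_number);
	char *buf;

	assert(len >= 0);
	buf = malloc((size_t)len + 1);
	assert(buf);
	snprintf(buf, (size_t)len + 1, "%s^%d", tip, parent_number);
	return buf;
}

/*
 * Take ownership of 'name' only if it beats the stored one.
 * Returns 1 if the name was stored, 0 if it was rejected.
 * In this sketch names are never shared between slots, so the
 * replaced name can be freed here; that is an assumption of the
 * sketch, not a statement about name-rev.
 */
static int store_if_better(struct name_slot *slot, char *name, int distance)
{
	if (slot->name && slot->distance <= distance)
		return 0;
	free(slot->name);
	slot->name = name;
	slot->distance = distance;
	return 1;
}

/* Format a candidate name and free it again if it is not kept. */
static void offer_name(struct name_slot *slot, const char *tip,
		       int parent_number, int distance)
{
	char *candidate = format_parent_name(tip, parent_number);

	if (!store_if_better(slot, candidate, distance))
		free(candidate);
}
```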

This results in slightly but measurably lower memory usage: the
average maximum resident size of 5 'git name-rev --all' invocations in
'linux.git' shrinks from 3256124kB to 319990kB, just about 2% less.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e202835129..3331075aa4 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -135,6 +135,7 @@ static void name_rev(struct commit *start_commit,
 			struct commit *parent = parents->item;
 			const char *new_name;
 			int generation, distance;
+			const char *new_name_to_free = NULL;
 
 			parse_commit(parent);
 			if (parent->date < cutoff)
@@ -154,6 +155,7 @@ static void name_rev(struct commit *start_commit,
 					new_name = xstrfmt("%.*s^%d", (int)len,
 							   name->tip_name,
 							   parent_number);
+				new_name_to_free = new_name;
 				generation = 0;
 				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 			} else {
@@ -167,6 +169,8 @@ static void name_rev(struct commit *start_commit,
 						  from_tag))
 				last_new_parent = commit_list_append(parent,
 						  last_new_parent);
+			else
+				free((char*) new_name_to_free);
 		}
 
 		*last_new_parent = list;
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 14/15] name-rev: plug memory leak in name_rev() in the deref case
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (13 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug a memory leak in name_rev() SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-19 22:47   ` SZEDER Gábor
  2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug a " SZEDER Gábor
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

The name_rev() function's 'tip_name' parameter is a freshly
xstrdup()ed string, so when name_rev() invokes:

  tip_name = xstrfmt("%s^0", tip_name);

then the original 'tip_name' string is leaked.

Make sure that this string is free()d after it has been used as input
for that xstrfmt() call.

This only happens when name_rev() is invoked with a tag, i.e.
relatively infrequently in a usual repository, so any reduction in
memory usage is lost in the noise.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e202835129..f867d45f0b 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -101,18 +101,22 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 }
 
 static void name_rev(struct commit *start_commit,
-		const char *tip_name, timestamp_t taggerdate,
+		const char *start_tip_name, timestamp_t taggerdate,
 		int from_tag, int deref)
 {
 	struct commit_list *list = NULL;
+	const char *tip_name;
 	char *to_free = NULL;
 
 	parse_commit(start_commit);
 	if (start_commit->date < cutoff)
 		return;
 
-	if (deref)
-		tip_name = to_free = xstrfmt("%s^0", tip_name);
+	if (deref) {
+		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
+		free((char*) start_tip_name);
+	} else
+		tip_name = start_tip_name;
 
 	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
 				   from_tag)) {
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 15/15] name-rev: plug a memory leak in name_rev() in the deref case
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (14 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug memory leak in name_rev() in the deref case SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-20 15:35   ` Derrick Stolee
  2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug memory leak in name_rev() SZEDER Gábor
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

The name_rev() function's 'tip_name' parameter is a freshly
xstrdup()ed string, so when name_rev() invokes:

  tip_name = xstrfmt("%s^0", tip_name);

then the original 'tip_name' string is leaked.

Make sure that this string is free()d after it has been used as input
for that xstrfmt() call.
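
As a rough standalone illustration of that rule (not the code from the
diff below): dup_string() and append_deref_suffix() are hypothetical
helpers, and malloc()/memcpy() stand in for xstrdup() and xstrfmt().
The function takes ownership of its heap-allocated input and frees it
as soon as the "^0" variant has been built from it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xstrdup(): heap-allocated copy of 's'. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	assert(copy);
	memcpy(copy, s, len);
	return copy;
}

/*
 * Return a new string "<name>^0" and free 'name', which must be
 * heap-allocated and must not be used by the caller afterwards.
 */
static char *append_deref_suffix(char *name)
{
	size_t len = strlen(name);
	char *result = malloc(len + 3);	/* "^0" plus the NUL */

	assert(result);
	memcpy(result, name, len);
	memcpy(result + len, "^0", 3);
	free(name);	/* the input is no longer needed */
	return result;
}
```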

This only happens when name_rev() is invoked with a tag, i.e.
relatively infrequently in a usual repository, so any reduction in
memory usage is lost in the noise.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 3331075aa4..d65de04918 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -101,18 +101,22 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 }
 
 static void name_rev(struct commit *start_commit,
-		const char *tip_name, timestamp_t taggerdate,
+		const char *start_tip_name, timestamp_t taggerdate,
 		int from_tag, int deref)
 {
 	struct commit_list *list = NULL;
+	const char *tip_name;
 	char *to_free = NULL;
 
 	parse_commit(start_commit);
 	if (start_commit->date < cutoff)
 		return;
 
-	if (deref)
-		tip_name = to_free = xstrfmt("%s^0", tip_name);
+	if (deref) {
+		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
+		free((char*) start_tip_name);
+	} else
+		tip_name = start_tip_name;
 
 	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
 				   from_tag)) {
-- 
2.23.0.331.g4e51dcdf11



* [PATCH 15/15] name-rev: plug memory leak in name_rev()
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (15 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug a " SZEDER Gábor
@ 2019-09-19 21:47 ` SZEDER Gábor
  2019-09-19 22:48   ` SZEDER Gábor
  2019-09-20 15:37 ` [PATCH 00/15] name-rev: eliminate recursion Derrick Stolee
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
  18 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 21:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, SZEDER Gábor

The loop iterating over the parent commits in the name_rev() function
contains two xstrfmt() calls, and their result is leaked if the parent
commit is not processed further (because that parent has already been
visited before, and this further visit doesn't result in a better name
for its ancestors).

Make sure that the result of those xstrfmt() calls is free()d if the
parent commit is not processed further.

This results in slightly but measurably lower memory usage: the
average maximum resident size of 5 'git name-rev --all' invocations in
'linux.git' shrinks from 3256124kB to 319990kB, just about 2% less.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index f867d45f0b..d65de04918 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -139,6 +139,7 @@ static void name_rev(struct commit *start_commit,
 			struct commit *parent = parents->item;
 			const char *new_name;
 			int generation, distance;
+			const char *new_name_to_free = NULL;
 
 			parse_commit(parent);
 			if (parent->date < cutoff)
@@ -158,6 +159,7 @@ static void name_rev(struct commit *start_commit,
 					new_name = xstrfmt("%.*s^%d", (int)len,
 							   name->tip_name,
 							   parent_number);
+				new_name_to_free = new_name;
 				generation = 0;
 				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 			} else {
@@ -171,6 +173,8 @@ static void name_rev(struct commit *start_commit,
 						  from_tag))
 				last_new_parent = commit_list_append(parent,
 						  last_new_parent);
+			else
+				free((char*) new_name_to_free);
 		}
 
 		*last_new_parent = list;
-- 
2.23.0.331.g4e51dcdf11



* Re: [PATCH 14/15] name-rev: plug memory leak in name_rev() in the deref case
  2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug memory leak in name_rev() in the deref case SZEDER Gábor
@ 2019-09-19 22:47   ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 22:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Please ignore this mail.

On Thu, Sep 19, 2019 at 11:47:10PM +0200, SZEDER Gábor wrote:
> The name_rev() function's 'tip_name' parameter is a freshly
> xstrdup()ed string, so when name_rev() invokes:
> 
>   tip_name = xstrfmt("%s^0", tip_name);
> 
> then the original 'tip_name' string is leaked.
> 
> Make sure that this string is free()d after it has been used as input
> for that xstrfmt() call.
> 
> This only happens when name_rev() is invoked with a tag, i.e.
> relatively infrequently in a usual repository, so any reduction in
> memory usage is lost in the noise.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index e202835129..f867d45f0b 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -101,18 +101,22 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>  }
>  
>  static void name_rev(struct commit *start_commit,
> -		const char *tip_name, timestamp_t taggerdate,
> +		const char *start_tip_name, timestamp_t taggerdate,
>  		int from_tag, int deref)
>  {
>  	struct commit_list *list = NULL;
> +	const char *tip_name;
>  	char *to_free = NULL;
>  
>  	parse_commit(start_commit);
>  	if (start_commit->date < cutoff)
>  		return;
>  
> -	if (deref)
> -		tip_name = to_free = xstrfmt("%s^0", tip_name);
> +	if (deref) {
> +		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
> +		free((char*) start_tip_name);
> +	} else
> +		tip_name = start_tip_name;
>  
>  	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
>  				   from_tag)) {
> -- 
> 2.23.0.331.g4e51dcdf11
> 


* Re: [PATCH 15/15] name-rev: plug memory leak in name_rev()
  2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug memory leak in name_rev() SZEDER Gábor
@ 2019-09-19 22:48   ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-19 22:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Please ignore this patch as well.

On Thu, Sep 19, 2019 at 11:47:12PM +0200, SZEDER Gábor wrote:
> The loop iterating over the parent commits in the name_rev() function
> contains two xstrfmt() calls, and their result is leaked if the parent
> commit is not processed further (because that parent has already been
> visited before, and this further visit doesn't result in a better name
> for its ancestors).
> 
> Make sure that the result of those xstrfmt() calls is free()d if the
> parent commit is not processed further.
> 
> This results in slightly but measurably lower memory usage: the
> average maximum resident size of 5 'git name-rev --all' invocations in
> 'linux.git' shrinks from 3256124kB to 319990kB, just about 2% less.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index f867d45f0b..d65de04918 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -139,6 +139,7 @@ static void name_rev(struct commit *start_commit,
>  			struct commit *parent = parents->item;
>  			const char *new_name;
>  			int generation, distance;
> +			const char *new_name_to_free = NULL;
>  
>  			parse_commit(parent);
>  			if (parent->date < cutoff)
> @@ -158,6 +159,7 @@ static void name_rev(struct commit *start_commit,
>  					new_name = xstrfmt("%.*s^%d", (int)len,
>  							   name->tip_name,
>  							   parent_number);
> +				new_name_to_free = new_name;
>  				generation = 0;
>  				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
>  			} else {
> @@ -171,6 +173,8 @@ static void name_rev(struct commit *start_commit,
>  						  from_tag))
>  				last_new_parent = commit_list_append(parent,
>  						  last_new_parent);
> +			else
> +				free((char*) new_name_to_free);
>  		}
>  
>  		*last_new_parent = list;
> -- 
> 2.23.0.331.g4e51dcdf11
> 


* Re: [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  2019-09-19 21:47 ` [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
@ 2019-09-20 15:11   ` Derrick Stolee
  2019-09-20 15:40     ` SZEDER Gábor
  2019-09-20 16:37   ` René Scharfe
  1 sibling, 1 reply; 98+ messages in thread
From: Derrick Stolee @ 2019-09-20 15:11 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index e406ff8e17..dec2228cc7 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -98,7 +98,7 @@ static void name_rev(struct commit *commit,
>  	}
>  
>  	if (name == NULL) {
> -		name = xmalloc(sizeof(rev_name));
> +		name = xmalloc(sizeof(*name));

Is this our preferred way to use xmalloc()? If so, then
I've been doing it wrong and will correct myself in the
future.

-Stolee
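
For reference, a minimal sketch of the idiom the patch switches to.
'struct rev_name_sketch' and alloc_name() are hypothetical, and plain
malloc() stands in for xmalloc().  With sizeof(*name) the allocation
size follows the declared type of the pointer, so it stays correct if
that declaration is later changed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for 'struct rev_name'. */
struct rev_name_sketch {
	const char *tip_name;
	int generation;
	int distance;
};

static struct rev_name_sketch *alloc_name(void)
{
	/* The size is derived from the pointer, not from a type name. */
	struct rev_name_sketch *name = malloc(sizeof(*name));

	assert(name);
	name->tip_name = NULL;
	name->generation = 0;
	name->distance = 0;
	return name;
}
```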


* Re: [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
  2019-09-19 21:47 ` [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
@ 2019-09-20 15:14   ` Derrick Stolee
  2019-09-20 15:44     ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: Derrick Stolee @ 2019-09-20 15:14 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> In 'builtin/name-rev.c' in the name_rev() function there is a loop
> iterating over all parents of the given commit, and the loop body
> looks like this:
> 
>   if (parent_number > 1) {
>     if (generation > 0)
>       // do stuff #1
>     else
>       // do stuff #2
>   } else {
>      // do stuff #3
>   }
> 
> These conditions are not covered properly in the test suite.  As far
> as purely test coverage goes, they are all executed several times over
> in 't6120-describe.sh'.  However, they don't directly influence the
> command's output, because the repository used in that test script
> contains several branches and tags pointing somewhere into the middle
> of the commit DAG, and thus result in a better name for the
> to-be-named commit.  In an early version of this patch series I
> managed to mess up those conditions (every single one of them at
> once!), but the whole test suite still passed successfully.
> 
> So add a new test case that operates on the following history:
> 
>     -----------master
>    /          /
>   A----------M2
>    \        /
>     \---M1-C
>      \ /
>       B
> 
> and names the commit 'B', where:
> 
>   - The merge commit at master makes sure that the 'do stuff #3'
>     part affects the final name.
> 
>   - The merge commit M2 makes sure that the 'do stuff #1' part
>     affects the final name.
> 
>   - And M1 makes sure that the 'do stuff #2' part affects the final
>     name.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  t/t6120-describe.sh | 43 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 07e6793e84..2a0f2204c4 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -421,4 +421,47 @@ test_expect_success 'describe complains about missing object' '
>  	test_must_fail git describe $ZERO_OID
>  '
>  
> +#   -----------master
> +#  /          /
> +# A----------M2
> +#  \        /
> +#   \---M1-C
> +#    \ /
> +#     B
> +test_expect_success 'test' '
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		echo A >file &&
> +		git add file &&
> +		git commit -m A &&
> +		A=$(git rev-parse HEAD) &&

Is it not enough to do something like test_commit here?

> +
> +		git checkout --detach &&
> +		echo B >file &&
> +		git commit -m B file &&
> +		B=$(git rev-parse HEAD) &&
> +
> +		git checkout $A &&
> +		git merge --no-ff $B &&  # M1
> +
> +		echo C >file &&
> +		git commit -m C file &&
> +
> +		git checkout $A &&
> +		git merge --no-ff HEAD@{1} && # M2
> +
> +		git checkout master &&
> +		git merge --no-ff HEAD@{1} &&
> +
> +		git log --graph --oneline &&
> +
> +		echo "$B master^2^2~1^2" >expect &&
> +		git name-rev $B >actual &&

This matches your description.

Thanks,
-Stolee
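
The expected name 'master^2^2~1^2' can be traced by hand through the
three branches quoted above.  The following is a hedged, standalone
sketch of that bookkeeping only: 'struct name_state' and
step_to_parent() are hypothetical, a fixed-size buffer replaces
xstrfmt(), and no commit graph is involved.  Stepping to a first
parent bumps the generation; stepping to any other parent folds the
pending generation into the name and resets it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pair of values carried along while naming. */
struct name_state {
	char tip[64];
	int generation;
};

/* Compute the state for parent number 'parent_number' (1-based). */
static struct name_state step_to_parent(const struct name_state *cur,
					int parent_number)
{
	struct name_state next;

	if (parent_number > 1) {
		size_t len = strlen(cur->tip);

		/* Drop a trailing "^0", if there is one. */
		if (len >= 2 && !strcmp(cur->tip + len - 2, "^0"))
			len -= 2;

		if (cur->generation > 0)	/* 'do stuff #1' */
			snprintf(next.tip, sizeof(next.tip), "%.*s~%d^%d",
				 (int)len, cur->tip, cur->generation,
				 parent_number);
		else				/* 'do stuff #2' */
			snprintf(next.tip, sizeof(next.tip), "%.*s^%d",
				 (int)len, cur->tip, parent_number);
		next.generation = 0;
	} else {				/* 'do stuff #3' */
		snprintf(next.tip, sizeof(next.tip), "%s", cur->tip);
		next.generation = cur->generation + 1;
	}
	return next;
}
```

Starting from "master" with generation 0 and stepping to parents 2, 2,
1 and 2 in turn (the merge at master, M2, C, M1) ends at the name the
test expects for B.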
 


* Re: [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper
  2019-09-19 21:47 ` [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
@ 2019-09-20 15:18   ` Derrick Stolee
  2019-09-22  8:18   ` [PATCH] name-rev: rewrite create_or_update_name() Martin Ågren
  1 sibling, 0 replies; 98+ messages in thread
From: Derrick Stolee @ 2019-09-20 15:18 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> In a later patch in this series we'll want to do this in two places.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 40 +++++++++++++++++++++++++++-------------
>  1 file changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index dec2228cc7..cb8ac2fa64 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -75,12 +75,36 @@ static int is_better_name(struct rev_name *name,
>  	return 0;
>  }
>  
> +static struct rev_name *create_or_update_name(struct commit *commit,
> +					      const char *tip_name,
> +					      timestamp_t taggerdate,
> +					      int generation, int distance,
> +					      int from_tag)
> +{
> +	struct rev_name *name = get_commit_rev_name(commit);
> +
> +	if (name == NULL) {
> +		name = xmalloc(sizeof(*name));
> +		set_commit_rev_name(commit, name);
> +		goto copy_data;
> +	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
> +copy_data:
> +		name->tip_name = tip_name;
> +		name->taggerdate = taggerdate;
> +		name->generation = generation;
> +		name->distance = distance;
> +		name->from_tag = from_tag;
> +
> +		return name;
> +	} else
> +		return NULL;
> +}
> +
>  static void name_rev(struct commit *commit,
>  		const char *tip_name, timestamp_t taggerdate,
>  		int generation, int distance, int from_tag,
>  		int deref)
>  {
> -	struct rev_name *name = get_commit_rev_name(commit);

A perhaps small benefit: we delay this call until after some
other checks happen. It's just looking up data in a cache, but
it may help a little.

>  	struct commit_list *parents;
>  	int parent_number = 1;
>  	char *to_free = NULL;
> @@ -97,18 +121,8 @@ static void name_rev(struct commit *commit,
>  			die("generation: %d, but deref?", generation);
>  	}
>  
> -	if (name == NULL) {
> -		name = xmalloc(sizeof(*name));
> -		set_commit_rev_name(commit, name);
> -		goto copy_data;
> -	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
> -copy_data:
> -		name->tip_name = tip_name;
> -		name->taggerdate = taggerdate;
> -		name->generation = generation;
> -		name->distance = distance;
> -		name->from_tag = from_tag;
> -	} else {
> +	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
> +				   distance, from_tag)) {

Otherwise this method extraction looks correct.

Thanks,
-Stolee



* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-19 21:47 ` [PATCH 08/15] name-rev: pull out deref handling from the recursion SZEDER Gábor
@ 2019-09-20 15:21   ` Derrick Stolee
  2019-09-20 17:42     ` SZEDER Gábor
  2019-09-20 16:37   ` René Scharfe
  1 sibling, 1 reply; 98+ messages in thread
From: Derrick Stolee @ 2019-09-20 15:21 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> The 'if (deref) { ... }' condition near the beginning of the recursive
> name_rev() function can only ever be true in the first invocation,
> because the 'deref' parameter is always 0 in the subsequent recursive
> invocations.
> 
> Extract this condition from the recursion into name_rev()'s caller and
> drop the function's 'deref' parameter.  This makes eliminating the
> recursion a bit easier to follow, and it will be moved back into
> name_rev() after the recursion is eliminated.
> 
> Furthermore, drop the condition that die()s when both 'deref' and
> 'generation' are non-null (which should have been a BUG() to begin
> with).

These changes seem sensible. I look forward to seeing how deref is
reintroduced.

> Note that this change reintroduces the memory leak that was plugged in
> commit 5308224633 (name-rev: avoid leaking memory in the `deref`
> case, 2017-05-04), but a later patch in this series will plug it
> again.

The memory leak is now for "tip_name", correct? Just tracking to make
sure it gets plugged later.

> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 27 ++++++++++-----------------
>  1 file changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index cb8ac2fa64..42cea5c881 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -102,30 +102,19 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>  
>  static void name_rev(struct commit *commit,
>  		const char *tip_name, timestamp_t taggerdate,
> -		int generation, int distance, int from_tag,
> -		int deref)
> +		int generation, int distance, int from_tag)
>  {
>  	struct commit_list *parents;
>  	int parent_number = 1;
> -	char *to_free = NULL;
>  
>  	parse_commit(commit);
>  
>  	if (commit->date < cutoff)
>  		return;
>  
> -	if (deref) {
> -		tip_name = to_free = xstrfmt("%s^0", tip_name);
> -
> -		if (generation)
> -			die("generation: %d, but deref?", generation);
> -	}
> -
>  	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
> -				   distance, from_tag)) {
> -		free(to_free);
> +				   distance, from_tag))
>  		return;
> -	}
>  
>  	for (parents = commit->parents;
>  			parents;
> @@ -144,11 +133,11 @@ static void name_rev(struct commit *commit,
>  
>  			name_rev(parents->item, new_name, taggerdate, 0,
>  				 distance + MERGE_TRAVERSAL_WEIGHT,
> -				 from_tag, 0);
> +				 from_tag);
>  		} else {
>  			name_rev(parents->item, tip_name, taggerdate,
>  				 generation + 1, distance + 1,
> -				 from_tag, 0);
> +				 from_tag);
>  		}
>  	}
>  }
> @@ -280,12 +269,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  	if (o && o->type == OBJ_COMMIT) {
>  		struct commit *commit = (struct commit *)o;
>  		int from_tag = starts_with(path, "refs/tags/");
> +		const char *tip_name;
>  
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = commit->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
> -		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
> -			 from_tag, deref);
> +		if (deref)
> +			tip_name = xstrfmt("%s^0", path);
> +		else
> +			tip_name = xstrdup(path);

(leak above, as noted in message).

> +		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
>  	}
>  	return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances
  2019-09-19 21:47 ` [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
@ 2019-09-20 15:27   ` Derrick Stolee
  2019-09-20 17:09     ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: Derrick Stolee @ 2019-09-20 15:27 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> At the beginning of the recursive name_rev() function it creates a new
> 'struct rev_name' instance for each previously unvisited commit or, if
> this visit results in better name for an already visited commit, then
> updates the 'struct rev_name' instance attached to to the commit, or
> returns early.
> 
> Restructure this so it's caller creates or updates the 'struct
> rev_name' instance associated with the commit to be passed as
> parameter, i.e. both name_ref() before calling name_rev() and
> name_rev() itself as it iterates over the parent commits.
> 
> This makes eliminating the recursion a bit easier to follow, and it
> will be moved back to name_rev() after the recursion is eliminated.
> 
> This change also plugs the memory leak that was temporarily unplugged
> in the earlier "name-rev: pull out deref handling from the recursion"
> patch in this series.
[snip]
>  
> @@ -276,11 +277,17 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  		path = name_ref_abbrev(path, can_abbreviate_output);
>  		if (commit->date >= cutoff) {
>  			const char *tip_name;
> +			char *to_free = NULL;
>  			if (deref)
> -				tip_name = xstrfmt("%s^0", path);
> +				tip_name = to_free = xstrfmt("%s^0", path);
>  			else
>  				tip_name = xstrdup(path);

So this xstrdup(path) is not a leak?

> -			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
> +			if (create_or_update_name(commit, tip_name, taggerdate,
> +						  0, 0, from_tag))
> +				name_rev(commit, tip_name, taggerdate, 0, 0,
> +					 from_tag);
> +			else
> +				free(to_free);
>  		}
>  	}
>  	return 0;
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 15/15] name-rev: plug a memory leak in name_rev() in the deref case
  2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug a " SZEDER Gábor
@ 2019-09-20 15:35   ` Derrick Stolee
  0 siblings, 0 replies; 98+ messages in thread
From: Derrick Stolee @ 2019-09-20 15:35 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> The name_rev() function's 'tip_name' parameter is a freshly
> xstrdup()ed string, so when name_rev() invokes...

This patch 15/15 seems to be the same as your 14/15, and
we should use your _other_ 15/15, right?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/15] name-rev: eliminate recursion
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (16 preceding siblings ...)
  2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug memory leak in name_rev() SZEDER Gábor
@ 2019-09-20 15:37 ` Derrick Stolee
  2019-09-20 17:37   ` SZEDER Gábor
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
  18 siblings, 1 reply; 98+ messages in thread
From: Derrick Stolee @ 2019-09-20 15:37 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: git

On 9/19/2019 5:46 PM, SZEDER Gábor wrote:
> 'git name-rev' is implemented using a recursive algorithm, and,
> consequently, it can segfault in deep histories (e.g. WebKit), and
> thanks to a test case demonstrating this limitation every test run
> results in a dmesg entry logging the segfaulting git process.
> 
> This patch series eliminates the recursion.

A noble goal! Recursing into commit history overflows the stack much
more easily than recursing into the directory hierarchy.
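
To illustrate the general technique (a minimal standalone sketch, not
code from this series): a walk over a deep history can keep its pending
work in a heap-allocated stack instead of the call stack, so the depth
is limited by available memory rather than by the thread's stack size.
'struct node' and count_reachable() are hypothetical names, and each
node is assumed to have at most two parents.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a commit; not git's 'struct commit'. */
struct node {
	struct node *parents[2];	/* NULL when a parent is absent */
	int seen;
};

/*
 * Count the nodes reachable from 'tip' without recursion.
 * Returns -1 if the worklist cannot be allocated.
 */
static long count_reachable(struct node *tip)
{
	size_t nr = 0, alloc = 16;
	struct node **stack = malloc(alloc * sizeof(*stack));
	long count = 0;

	if (!stack)
		return -1;
	stack[nr++] = tip;

	while (nr) {
		struct node *n = stack[--nr];
		int i;

		if (!n || n->seen)
			continue;
		n->seen = 1;
		count++;

		for (i = 0; i < 2; i++) {
			if (nr == alloc) {
				/* Grow the explicit stack on the heap. */
				struct node **tmp;

				alloc *= 2;
				tmp = realloc(stack, alloc * sizeof(*tmp));
				if (!tmp) {
					free(stack);
					return -1;
				}
				stack = tmp;
			}
			stack[nr++] = n->parents[i];
		}
	}
	free(stack);
	return count;
}
```

With a linear chain the explicit stack never holds more than two
entries, no matter how long the chain is.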

> Patches 1-5 and 14-15 are while-at-it cleanups I noticed on the way,
> and patch 6 improves test coverage.

These cleanups are nice, and I think I followed them pretty closely.
 
> Patches 7-11 are preparatory refactorings that are supposed to make
> this series easier to follow, and make patch 12, the one finally
> eliminating the recursion, somewhat shorter, and even much shorter
> when viewed with '--ignore-all-space'.  Patch 13 cleans up after those
> preparatory steps.

I responded to several of these, mostly with questions and not actual
recommendations. I do want to apply your patches locally so I can try
this --ignore-all-space trick to really be sure patch 12 is doing the
right thing.

Great organization of patches!

-Stolee

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  2019-09-20 15:11   ` Derrick Stolee
@ 2019-09-20 15:40     ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 15:40 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 11:11:02AM -0400, Derrick Stolee wrote:
> On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> > Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> > ---
> >  builtin/name-rev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> > index e406ff8e17..dec2228cc7 100644
> > --- a/builtin/name-rev.c
> > +++ b/builtin/name-rev.c
> > @@ -98,7 +98,7 @@ static void name_rev(struct commit *commit,
> >  	}
> >  
> >  	if (name == NULL) {
> > -		name = xmalloc(sizeof(rev_name));
> > +		name = xmalloc(sizeof(*name));
> 
> Is this our preferred way to use xmalloc()? If so, then
> I've been doing it wrong and will correct myself in the
> future.

I seem to remember that Peff mentioned in a commit message that this
is the preferred way, but can't find it at the moment.  Anyway, when
using 'sizeof(*ptr)' the type is inferred by the compiler, but when
using 'sizeof(type)' then we have to make sure that 'type' is indeed
the right type.

Besides, that 'rev_name' should have been spelled as 'struct rev_name'
to begin with.
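
A minimal standalone sketch of the idiom, using plain malloc() instead
of xmalloc() so that it compiles on its own.  'struct rev_name_sketch'
and new_name() are made-up names for illustration only:

```c
#include <stdlib.h>

/* Hypothetical struct; it only mirrors the general shape. */
struct rev_name_sketch {
	const char *tip_name;
	int generation;
	int distance;
};

static struct rev_name_sketch *new_name(void)
{
	struct rev_name_sketch *name;

	/*
	 * sizeof(*name) follows the declared type of 'name', so the
	 * allocation stays correct even if that declaration changes.
	 */
	name = malloc(sizeof(*name));
	if (!name)
		return NULL;
	name->tip_name = NULL;
	name->generation = 0;
	name->distance = 0;
	return name;
}
```

With 'sizeof(struct some_other_type)' the compiler would accept the
allocation without complaint, and the mismatch would only show up at
run time, if at all.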


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
  2019-09-20 15:14   ` Derrick Stolee
@ 2019-09-20 15:44     ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 15:44 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 11:14:56AM -0400, Derrick Stolee wrote:
> On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> > These conditions are not covered properly in the test suite.  As far
> > as purely test coverage goes, they are all executed several times over
> > in 't6120-describe.sh'.  However, they don't directly influence the
> > command's output, because the repository used in that test script
> > contains several branches and tags pointing somewhere into the middle
> > of the commit DAG, and thus result in a better name for the
> > to-be-named commit.

> > diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> > index 07e6793e84..2a0f2204c4 100755
> > --- a/t/t6120-describe.sh
> > +++ b/t/t6120-describe.sh
> > @@ -421,4 +421,47 @@ test_expect_success 'describe complains about missing object' '
> >  	test_must_fail git describe $ZERO_OID
> >  '
> >  
> > +#   -----------master
> > +#  /          /
> > +# A----------M2
> > +#  \        /
> > +#   \---M1-C
> > +#    \ /
> > +#     B
> > +test_expect_success 'test' '
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +
> > +		echo A >file &&
> > +		git add file &&
> > +		git commit -m A &&
> > +		A=$(git rev-parse HEAD) &&
> 
> Is it not enough to do something like test_commit here?

No, because 'test_commit' adds branches and tags pointing to commits
somewhere in the middle of the history, and those will serve as better
starting points for the resulting name.

> > +
> > +		git checkout --detach &&
> > +		echo B >file &&
> > +		git commit -m B file &&
> > +		B=$(git rev-parse HEAD) &&
> > +
> > +		git checkout $A &&
> > +		git merge --no-ff $B &&  # M1
> > +
> > +		echo C >file &&
> > +		git commit -m C file &&
> > +
> > +		git checkout $A &&
> > +		git merge --no-ff HEAD@{1} && # M2
> > +
> > +		git checkout master &&
> > +		git merge --no-ff HEAD@{1} &&
> > +
> > +		git log --graph --oneline &&
> > +
> > +		echo "$B master^2^2~1^2" >expect &&
> > +		git name-rev $B >actual &&
> 
> This matches your description.
> 
> Thanks,
> -Stolee
>  

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name()
  2019-09-19 21:46 ` [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name() SZEDER Gábor
@ 2019-09-20 16:36   ` René Scharfe
  2019-09-20 17:10     ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: René Scharfe @ 2019-09-20 16:36 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 19.09.19 um 23:46 schrieb SZEDER Gábor:
> Use strip_suffix() instead of open-coding it, making the code more
> idiomatic.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index c785fe16ba..d345456656 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -317,11 +317,11 @@ static const char *get_rev_name(const struct object *o, struct strbuf *buf)
>  	if (!n->generation)
>  		return n->tip_name;
>  	else {
> -		int len = strlen(n->tip_name);
> -		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
> -			len -= 2;
> +		size_t len;
> +		strip_suffix(n->tip_name, "^0", &len);
>  		strbuf_reset(buf);
> -		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
> +		strbuf_addf(buf, "%.*s~%d", (int) len, n->tip_name,
> +			    n->generation);
>  		return buf->buf;
>  	}
>  }
>

This gets rid of the repeated magic string length constant 2, which is
nice.  But why not go all the way to full strbuf-ness?  It's shorter,
looks less busy, and the extra two copied bytes shouldn't matter in a
measurable way.

	else {
		strbuf_reset(buf);
		strbuf_addstr(buf, n->tip_name);
		strbuf_strip_suffix(buf, "^0");
		strbuf_addf(buf, "~%d", n->generation);
		return buf->buf;
	}
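
For readers without the strbuf API at hand, the same formatting can be
sketched with plain libc.  This is only an illustration: format_name()
is a hypothetical helper, not something in git, and it assumes the
caller supplies a large enough output buffer.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<tip_name without a trailing ^0>~<generation>"
 * into 'out'.  Returns whatever snprintf() returns.
 */
static int format_name(char *out, size_t outsz,
		       const char *tip_name, int generation)
{
	static const char suffix[] = "^0";
	const size_t suffix_len = sizeof(suffix) - 1;
	size_t len = strlen(tip_name);

	/* Drop the suffix only when the name actually ends with it. */
	if (len >= suffix_len &&
	    !strcmp(tip_name + len - suffix_len, suffix))
		len -= suffix_len;

	return snprintf(out, outsz, "%.*s~%d", (int)len, tip_name,
			generation);
}
```

The suffix length is derived from the literal itself, so the magic
constant 2 does not appear anywhere.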


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 04/15] name-rev: avoid unnecessary cast in name_ref()
  2019-09-19 21:46 ` [PATCH 04/15] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
@ 2019-09-20 16:37   ` René Scharfe
  0 siblings, 0 replies; 98+ messages in thread
From: René Scharfe @ 2019-09-20 16:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 19.09.19 um 23:46 schrieb SZEDER Gábor:
> Casting a 'struct object' to 'struct commit' is unnecessary there,
> because it's already available in the local 'commit' variable.

That's true, but you can't see that only by reading your email.

>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index d345456656..e406ff8e17 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -268,7 +268,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo

Here's the pertinent context line; --function-context would have been too
much, I think, but -U4 would have shown it:

		struct commit *commit = (struct commit *)o;
>  		int from_tag = starts_with(path, "refs/tags/");
>
>  		if (taggerdate == TIME_MAX)
> -			taggerdate = ((struct commit *)o)->date;
> +			taggerdate = commit->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
>  		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
>  			 from_tag, deref);
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  2019-09-19 21:47 ` [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
  2019-09-20 15:11   ` Derrick Stolee
@ 2019-09-20 16:37   ` René Scharfe
  1 sibling, 0 replies; 98+ messages in thread
From: René Scharfe @ 2019-09-20 16:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 19.09.19 um 23:47 schrieb SZEDER Gábor:
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index e406ff8e17..dec2228cc7 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -98,7 +98,7 @@ static void name_rev(struct commit *commit,
>  	}
>
>  	if (name == NULL) {
> -		name = xmalloc(sizeof(rev_name));
> +		name = xmalloc(sizeof(*name));

Here are the declarations of both (and my beloved --function-context
option would only have shown the second one):

typedef struct rev_name {
	const char *tip_name;
	timestamp_t taggerdate;
	int generation;
	int distance;
	int from_tag;
} rev_name;

	struct rev_name *name = get_commit_rev_name(commit);

So your patch is correct.  Had me scratching my head when I first saw
it, though.  That old code has been present since bd321bcc51 ("Add
git-name-rev", 2005-10-26).

>  		set_commit_rev_name(commit, name);
>  		goto copy_data;
>  	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-19 21:47 ` [PATCH 08/15] name-rev: pull out deref handling from the recursion SZEDER Gábor
  2019-09-20 15:21   ` Derrick Stolee
@ 2019-09-20 16:37   ` René Scharfe
  2019-09-20 18:13     ` SZEDER Gábor
  1 sibling, 1 reply; 98+ messages in thread
From: René Scharfe @ 2019-09-20 16:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 19.09.19 um 23:47 schrieb SZEDER Gábor:
> The 'if (deref) { ... }' condition near the beginning of the recursive
> name_rev() function can only ever be true in the first invocation,
> because the 'deref' parameter is always 0 in the subsequent recursive
> invocations.
>
> Extract this condition from the recursion into name_rev()'s caller and
> drop the function's 'deref' parameter.  This makes eliminating the
> recursion a bit easier to follow, and it will be moved back into
> name_rev() after the recursion is elminated.
>
> Furthermore, drop the condition that die()s when both 'deref' and
> 'generation' are non-null (which should have been a BUG() to begin
> with).
>
> Note that this change reintroduces the memory leak that was plugged in
> in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
> case, 2017-05-04), but a later patch in this series will plug it in
> again.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 27 ++++++++++-----------------
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index cb8ac2fa64..42cea5c881 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -102,30 +102,19 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>
>  static void name_rev(struct commit *commit,
>  		const char *tip_name, timestamp_t taggerdate,
> -		int generation, int distance, int from_tag,
> -		int deref)
> +		int generation, int distance, int from_tag)
>  {
>  	struct commit_list *parents;
>  	int parent_number = 1;
> -	char *to_free = NULL;
>
>  	parse_commit(commit);
>
>  	if (commit->date < cutoff)
>  		return;
>
> -	if (deref) {
> -		tip_name = to_free = xstrfmt("%s^0", tip_name);
> -
> -		if (generation)
> -			die("generation: %d, but deref?", generation);
> -	}
> -
>  	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
> -				   distance, from_tag)) {
> -		free(to_free);
> +				   distance, from_tag))
>  		return;
> -	}
>
>  	for (parents = commit->parents;
>  			parents;
> @@ -144,11 +133,11 @@ static void name_rev(struct commit *commit,
>
>  			name_rev(parents->item, new_name, taggerdate, 0,
>  				 distance + MERGE_TRAVERSAL_WEIGHT,
> -				 from_tag, 0);
> +				 from_tag);
>  		} else {
>  			name_rev(parents->item, tip_name, taggerdate,
>  				 generation + 1, distance + 1,
> -				 from_tag, 0);
> +				 from_tag);
>  		}
>  	}
>  }
> @@ -280,12 +269,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  	if (o && o->type == OBJ_COMMIT) {
>  		struct commit *commit = (struct commit *)o;
>  		int from_tag = starts_with(path, "refs/tags/");
> +		const char *tip_name;

This should not be const because you allocate the buffer it points to
right here in the function, in each execution path.

>
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = commit->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
> -		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
> -			 from_tag, deref);
> +		if (deref)
> +			tip_name = xstrfmt("%s^0", path);
> +		else
> +			tip_name = xstrdup(path);
> +		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);

tip_name should be free(3)'d here.  Except we can't do that because
name_rev() sometimes stores that pointer in a commit slab.  Ugh.

If the (re)introduced leak doesn't impact performance and memory
usage too much then duplicating tip_name again in name_rev() or
rather your new create_or_update_name() would likely make the
lifetimes of those string buffers easier to manage.

>  	}
>  	return 0;
>  }
>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances
  2019-09-20 15:27   ` Derrick Stolee
@ 2019-09-20 17:09     ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 17:09 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 11:27:49AM -0400, Derrick Stolee wrote:
> On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> > At the beginning of the recursive name_rev() function it creates a new
> > 'struct rev_name' instance for each previously unvisited commit or, if
> > this visit results in better name for an already visited commit, then
> > updates the 'struct rev_name' instance attached to to the commit, or
> > returns early.
> > 
> > Restructure this so it's caller creates or updates the 'struct
> > rev_name' instance associated with the commit to be passed as
> > parameter, i.e. both name_ref() before calling name_rev() and
> > name_rev() itself as it iterates over the parent commits.
> > 
> > This makes eliminating the recursion a bit easier to follow, and it
> > will be moved back to name_rev() after the recursion is eliminated.
> > 
> > This change also plugs the memory leak that was temporarily unplugged
> > in the earlier "name-rev: pull out deref handling from the recursion"
> > patch in this series.
> [snip]
> >  
> > @@ -276,11 +277,17 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
> >  		path = name_ref_abbrev(path, can_abbreviate_output);
> >  		if (commit->date >= cutoff) {
> >  			const char *tip_name;
> > +			char *to_free = NULL;
> >  			if (deref)
> > -				tip_name = xstrfmt("%s^0", path);
> > +				tip_name = to_free = xstrfmt("%s^0", path);
> >  			else
> >  				tip_name = xstrdup(path);
> 
> So this xstrdup(path) is not a leak?

Well... yes, everything is leaked, eventually ;)

First of all, name_ref() is a callback function invoked by
for_each_ref(), and 'path' is one of its parameters.  This means
that we must duplicate this string, because we might need to access it
even after iterating over refs is over (when displaying the name of
the commits).

This copy then becomes the initial name_rev() invocation's 'tip_name'
parameter, and things get a bit harder to reason about because of
merges, the recursion, and being in the middle of the refactorings...

So for argument's sake look at current master's 'builtin/name-rev.c',
and assume that it has to deal with a linear history with a single
branch, and no tags.  When name_ref() is invoked with a branch, then
deref = 0, so in name_rev() the condition:

  if (deref) {
        tip_name = to_free = xstrfmt("%s^0", tip_name);

won't be fulfilled, and 'tip_name' remains unchanged.  Then comes the
initialization of the 'struct rev_name' associated with the tip commit
in the commit slab, including the

  name->tip_name = tip_name;

assignment, IOW we now have a pointer to the original 'tip_name' in
the commit slab.

Then name_rev() looks at the tip commit's parents, or rather, because
of the linear history, at its only parent.  This means that neither of
the other two xstrfmt() will be invoked, and name_rev() will be
recursively invoked with the original 'tip_name' pointer.  This will
then initialize another 'struct rev_name' instance, including yet
another pointer to the original 'tip_name'.

This will go on until the root commit is reached, and in the end every
commit will have an associated 'struct rev_name' instance with a
pointer to the original 'tip_name' string.

At this point we're done with the recursion, and since the repo has
only a single branch, we're done with for_each_ref() as well, and
finally print names of the commits.

Then it's time to clean up and free() memory, but:

  - we could only release the commit slab and free() all 'struct
    rev_name' instances within, but we can't free() the
    'name->tip_name' pointers, because, as shown above, we might have
    a bunch of pointers pointing to the same string.

  - we are at the end of cmd_name_rev() and are about to exit the
    program, so the OS will release any and all resources anyway.
    Yeah, it's far from ready to be turned into a library call...

Now, in a real repository we'll have multiple branches, tags, and
merges, so there will be cases when an existing 'struct rev_name'
instance is updated, and its 'name->tip_name' is overwritten by a
better name.  At that point the old value is potentially leaked, but
we can't really do anything about it, because we don't know whether
any other 'struct rev_name' instances point to it.

Eliminating the recursion doesn't change anything in this respect.
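
The sharing in the linear-history case can be sketched like this.  It
is a simplified standalone model, not the code in 'builtin/name-rev.c':
'struct name_sketch' and name_chain() are made-up names, and a plain
array stands in for the commit slab.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-commit record; stands in for 'struct rev_name'. */
struct name_sketch {
	const char *tip_name;
	int generation;
};

/*
 * Name a linear chain of 'n' commits starting from one ref.
 * Every record ends up pointing at the same copy of 'path'.
 * Returns that single copy (the only pointer that may be freed),
 * or NULL on allocation failure.
 */
static char *name_chain(struct name_sketch *names, size_t n,
			const char *path)
{
	char *tip_name = malloc(strlen(path) + 1);
	size_t i;

	if (!tip_name)
		return NULL;
	strcpy(tip_name, path);

	for (i = 0; i < n; i++) {
		names[i].tip_name = tip_name;	/* same pointer, no copy */
		names[i].generation = (int)i;
	}
	return tip_name;
}
```

Calling free() on each names[i].tip_name would free the same buffer n
times.  Without the returned pointer, or some form of reference
counting, no individual record can know whether it is the last user of
the string.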



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name()
  2019-09-20 16:36   ` René Scharfe
@ 2019-09-20 17:10     ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 17:10 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 06:36:30PM +0200, René Scharfe wrote:
> Am 19.09.19 um 23:46 schrieb SZEDER Gábor:
> > Use strip_suffix() instead of open-coding it, making the code more
> > idiomatic.
> >
> > Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> > ---
> >  builtin/name-rev.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> > index c785fe16ba..d345456656 100644
> > --- a/builtin/name-rev.c
> > +++ b/builtin/name-rev.c
> > @@ -317,11 +317,11 @@ static const char *get_rev_name(const struct object *o, struct strbuf *buf)
> >  	if (!n->generation)
> >  		return n->tip_name;
> >  	else {
> > -		int len = strlen(n->tip_name);
> > -		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
> > -			len -= 2;
> > +		size_t len;
> > +		strip_suffix(n->tip_name, "^0", &len);
> >  		strbuf_reset(buf);
> > -		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
> > +		strbuf_addf(buf, "%.*s~%d", (int) len, n->tip_name,
> > +			    n->generation);
> >  		return buf->buf;
> >  	}
> >  }
> >
> 
> This gets rid of the repeated magic string length constant 2, which is
> nice.  But why not go all the way to full strbuf-ness?  It's shorter,
> looks less busy, and the extra two copied bytes shouldn't matter in a
> measurable way.
> 
> 	else {
> 		strbuf_reset(buf);
> 		strbuf_addstr(buf, n->tip_name);
> 		strbuf_strip_suffix(buf, "^0");
> 		strbuf_addf(buf, "~%d", n->generation);
> 		return buf->buf;
> 	}

Oh, I like this, thanks!


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 00/15] name-rev: eliminate recursion
  2019-09-20 15:37 ` [PATCH 00/15] name-rev: eliminate recursion Derrick Stolee
@ 2019-09-20 17:37   ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 17:37 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 11:37:12AM -0400, Derrick Stolee wrote:
> On 9/19/2019 5:46 PM, SZEDER Gábor wrote:
> > 'git name-rev' is implemented using a recursive algorithm, and,
> > consequently, it can segfault in deep histories (e.g. WebKit), and
> > thanks to a test case demonstrating this limitation every test run
> > results in a dmesg entry logging the segfaulting git process.
> > 
> > This patch series eliminates the recursion.
> 
> A noble goal! Recursion into commit history is much easier to get
> stack overflows than when we recurse into the directory hierarchy.
> 
> > Patches 1-5 and 14-15 are while-at-it cleanups I noticed on the way,
> > and patch 6 improves test coverage.
> 
> These cleanups are nice, and I think I followed them pretty closely.
>  
> > Patches 7-11 are preparatory refactorings that are supposed to make
> > this series easier to follow, and make patch 12, the one finally
> > eliminating the recursion, somewhat shorter, and even much shorter
> > when viewed with '--ignore-all-space'.  Patch 13 cleans up after those
> > preparatory steps.
> 
> I responded to several of these, mostly with questions and not actual
> recommendations. I do want to apply your patches locally so I can try
> this --ignore-all-space trick to really be sure patch 12 is doing the
> right thing.

  git fetch https://github.com/szeder/git name-rev-no-recursion

(But this is sort of a v1.1, as it already includes René's suggestion
for patch 3.)


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-20 15:21   ` Derrick Stolee
@ 2019-09-20 17:42     ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 17:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 11:21:53AM -0400, Derrick Stolee wrote:
> On 9/19/2019 5:47 PM, SZEDER Gábor wrote:
> > The 'if (deref) { ... }' condition near the beginning of the recursive
> > name_rev() function can only ever be true in the first invocation,
> > because the 'deref' parameter is always 0 in the subsequent recursive
> > invocations.
> > 
> > Extract this condition from the recursion into name_rev()'s caller and
> > drop the function's 'deref' parameter.  This makes eliminating the
> > recursion a bit easier to follow, and it will be moved back into
> > name_rev() after the recursion is elminated.

s/elminated/eliminated/

> > Furthermore, drop the condition that die()s when both 'deref' and
> > 'generation' are non-null (which should have been a BUG() to begin
> > with).
> 
> These changes seem sensible. I look forward to seeing how deref is
> reintroduced.
> 
> > Note that this change reintroduces the memory leak that was plugged in
> > in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
> > case, 2017-05-04), but a later patch in this series will plug it in
> > again.
> 
> The memory leak is now for "tip_name" correct? Just tracking to make
> sure it gets plugged later.

Yes, it's 'tip_name' (the one returned by xstrfmt()).

> > -	if (deref) {
> > -		tip_name = to_free = xstrfmt("%s^0", tip_name);

> > +		if (deref)
> > +			tip_name = xstrfmt("%s^0", path);
> > +		else
> > +			tip_name = xstrdup(path);

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-20 16:37   ` René Scharfe
@ 2019-09-20 18:13     ` SZEDER Gábor
  2019-09-20 18:14       ` SZEDER Gábor
  2019-09-21 12:37       ` René Scharfe
  0 siblings, 2 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 18:13 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

> > @@ -280,12 +269,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
> >  	if (o && o->type == OBJ_COMMIT) {
> >  		struct commit *commit = (struct commit *)o;
> >  		int from_tag = starts_with(path, "refs/tags/");
> > +		const char *tip_name;
> 
> This should not be const because you allocate the buffer it points to
> right here in the function, in each execution path.

Marking it as const indicates that this function doesn't modify the
buffer that the pointer points to.
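
A small standalone sketch of the trade-off, with hypothetical helper
names and plain malloc() in place of xstrfmt()/xstrdup(): a const
pointer documents that nobody writes through it, but whoever finally
releases the buffer has to cast the const away.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a freshly allocated copy of 'path',
 * with "^0" appended when 'deref' is set.  Returns NULL on
 * allocation failure.
 */
static const char *make_tip_name(const char *path, int deref)
{
	size_t len = strlen(path);
	char *buf = malloc(len + 3);	/* room for "^0" and the NUL */

	if (!buf)
		return NULL;
	memcpy(buf, path, len);
	if (deref) {
		memcpy(buf + len, "^0", 2);
		len += 2;
	}
	buf[len] = '\0';
	return buf;
}

/* The owner must drop the const qualifier to release the buffer. */
static void release_tip_name(const char *tip_name)
{
	free((char *)tip_name);
}
```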

> >
> >  		if (taggerdate == TIME_MAX)
> >  			taggerdate = commit->date;
> >  		path = name_ref_abbrev(path, can_abbreviate_output);
> > -		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
> > -			 from_tag, deref);
> > +		if (deref)
> > +			tip_name = xstrfmt("%s^0", path);
> > +		else
> > +			tip_name = xstrdup(path);
> > +		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
> 
> tip_name should be free(3)'d here.  Except we can't do that because
> name_rev() sometimes stores that pointer in a commit slab.  Ugh.
> 
> If the (re)introduced leak doesn't impact performance and memory
> usage too much then duplicating tip_name again in name_rev() or
> rather your new create_or_update_name() would likely make the
> lifetimes of those string buffers easier to manage.

Yeah, the easiest would be if each 'struct rev_name' in the commit
slab had its own 'tip_name' string, but that would result in
a lot of duplicated strings and increased memory usage.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-20 18:13     ` SZEDER Gábor
@ 2019-09-20 18:14       ` SZEDER Gábor
  2019-09-21  9:57         ` SZEDER Gábor
  2019-09-21 12:37       ` René Scharfe
  1 sibling, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 18:14 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
> > If the (re)introduced leak doesn't impact performance and memory
> > usage too much then duplicating tip_name again in name_rev() or
> > rather your new create_or_update_name() would likely make the
> > lifetimes of those string buffers easier to manage.
> 
> Yeah, the easiest would be when each 'struct rev_name' in the commit
> slab would have its own 'tip_name' string, but that would result in
> a lot of duplicated strings and increased memory usage.

I didn't measure how much more memory would be used, though.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH 01/15] t6120-describe: correct test repo history graph in comment
  2019-09-19 21:46 ` [PATCH 01/15] t6120-describe: correct test repo history graph in comment SZEDER Gábor
@ 2019-09-20 21:47   ` Junio C Hamano
  2019-09-20 22:29     ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: Junio C Hamano @ 2019-09-20 21:47 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

SZEDER Gábor <szeder.dev@gmail.com> writes:

> At the top of 't6120-describe.sh' an ASCII graph illustrates the
> repository's history used in this test script.  This graph is a bit
> misleading, because it swapped the second merge commit's first and
> second parents.

Hmm...

> +#       ,---o----o----o-----.
> +#      /   D,R   e           \
> +#  o--o-----o-------------o---o----x
> +#      \    B            /
> +#       `---o----o----o-'
> +#                A    c

What's the first parent of the merge between 'B' and 'c' in this
picture and how does the reader figure it out?  What about the same
question on the direct parent of 'x'?  Is it generally accepted that
a straight line denotes the first ancestry, or something?  I do not
offhand see that, between these two, the new one is a clear improvement.

I do agree with the issue with illustrating topology, and it is an
issue worth addressing.  In the past when the order of parents
mattered, I experimented to find ways to depict them clearly,
without much success.  One of the things I tried was to label the
parents, like so:

> -                       B
> -        .--------------o---1o---2o----x
> -       /                   2    1
> - o----o----o----o----o----.    /
> -       \        A    c        /
> -        .------------o---o---o
> -                   D,R   e

but I did not find it very satisfactory.

In any case, since this step is about "improving" the illustration,
I'd like to see a clear improvement.  Perhaps an extra comment that
says "straight line is the first parent chain" next to the drawing
might qualify as such.

Thanks.


* Re: [PATCH 02/15] t6120-describe: modernize the 'check_describe' helper
  2019-09-19 21:46 ` [PATCH 02/15] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
@ 2019-09-20 21:49   ` Junio C Hamano
  0 siblings, 0 replies; 98+ messages in thread
From: Junio C Hamano @ 2019-09-20 21:49 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

SZEDER Gábor <szeder.dev@gmail.com> writes:

> The 'check_describe' helper function runs 'git describe' outside of
> 'test_expect_success' blocks, with extra hand-rolled code to record
> and examine its exit code.
>
> Update this helper and move the 'git describe' invocation inside the
> 'test_expect_success' block.

Thanks for a fix.  This makes quite a lot of sense.

>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  t/t6120-describe.sh | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
> index 0bf7e0c8bc..07e6793e84 100755
> --- a/t/t6120-describe.sh
> +++ b/t/t6120-describe.sh
> @@ -14,14 +14,12 @@ test_description='test describe'
>  check_describe () {
>  	expect="$1"
>  	shift
> -	R=$(git describe "$@" 2>err.actual)
> -	S=$?
> -	cat err.actual >&3
> -	test_expect_success "describe $*" '
> -	test $S = 0 &&
> +	describe_opts="$@"
> +	test_expect_success "describe $describe_opts" '
> +	R=$(git describe $describe_opts 2>err.actual) &&
>  	case "$R" in
>  	$expect)	echo happy ;;
> -	*)	echo "Oops - $R is not $expect";
> +	*)	echo "Oops - $R is not $expect" &&
>  		false ;;
>  	esac
>  	'


* Re: [PATCH 01/15] t6120-describe: correct test repo history graph in comment
  2019-09-20 21:47   ` Junio C Hamano
@ 2019-09-20 22:29     ` SZEDER Gábor
  2019-09-28  4:06       ` Junio C Hamano
  0 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-20 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Sep 20, 2019 at 02:47:38PM -0700, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
> > At the top of 't6120-describe.sh' an ASCII graph illustrates the
> > repository's history used in this test script.  This graph is a bit
> > misleading, because it swapped the second merge commit's first and
> > second parents.
> 
> Hmm...
> 
> > +#       ,---o----o----o-----.
> > +#      /   D,R   e           \
> > +#  o--o-----o-------------o---o----x
> > +#      \    B            /
> > +#       `---o----o----o-'
> > +#                A    c
> 
> What's the first parent of the merge between 'B' and 'c' in this
> picture and how does the reader figure it out?  What about the same
> question on the direct parent of 'x'?  Is it generally accepted that
> a straight line denotes the first ancestry, or something?

I've always thought that the parents are numbered from top to bottom,
i.e. 'B' is the first parent of the first merge, and the unnamed
commit at the top is the first parent of the second merge.

Would it help if it were arranged like this:

  o---o-----o----o----o-------o----x
       \   D,R   e           /
        \---o-------------o-'
         \  B            /
          `-o----o----o-'
                 A    c

This is basically how 'git log --graph' would show them, except that
this is horizontal.



* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-20 18:14       ` SZEDER Gábor
@ 2019-09-21  9:57         ` SZEDER Gábor
  2019-09-21 12:37           ` René Scharfe
  0 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-21  9:57 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
> > > If the (re)introduced leak doesn't impact performance and memory
> > > usage too much then duplicating tip_name again in name_rev() or
> > > rather your new create_or_update_name() would likely make the
> > > lifetimes of those string buffers easier to manage.
> > 
> > Yeah, the easiest would be when each 'struct rev_name' in the commit
> > slab would have its own 'tip_name' string, but that would result in
> > a lot of duplicated strings and increased memory usage.
> 
> I didn't measure how much more memory would be used, though.

So, I tried the patch below to give each 'struct rev_name' instance
its own copy of 'tip_name', and the memory usage of 'git name-rev
--all' usually increased.

The increase depends on how many merges and how many refs there are
compared to the number of commits: the fewer merges and refs, the
more the memory usage increased:

  linux:         +4.8%
  gcc:           +7.2% 
  gecko-dev:     +9.2%
  webkit:       +12.4%
  llvm-project: +19.0%

git.git is the exception with its unusually high number of merge
commits (about 25%), and the memory usage decreased by 4.4%.
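
As a rough illustration of why linear stretches of history are hit
hardest, here is a back-of-the-envelope sketch.  It is not code from
name-rev.c: both function names are made up, and malloc overhead is
ignored.  Without the patch all commits reached through first parents
share a single 'tip_name' buffer; with the patch each of them holds its
own copy.

```c
#include <stddef.h>
#include <string.h>

/* Bytes of name strings when all commits on a chain share one buffer. */
static size_t shared_bytes(const char *tip_name, size_t nr_commits)
{
	return nr_commits ? strlen(tip_name) + 1 : 0;
}

/* Bytes of name strings when every commit holds its own copy. */
static size_t per_commit_bytes(const char *tip_name, size_t nr_commits)
{
	return nr_commits * (strlen(tip_name) + 1);
}
```

The second number grows with the length of the chain, while the first
one stays constant, so long merge-free chains named from a single ref
pay the most for the duplication.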


 --- >8 ---

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6969af76c4..62ab78242b 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 		set_commit_rev_name(commit, name);
 		goto copy_data;
 	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
+		free((char*) name->tip_name);
 copy_data:
 		name->tip_name = tip_name;
 		name->taggerdate = taggerdate;
@@ -106,21 +107,19 @@ static void name_rev(struct commit *start_commit,
 {
 	struct commit_list *list = NULL;
 	const char *tip_name;
-	char *to_free = NULL;
 
 	parse_commit(start_commit);
 	if (start_commit->date < cutoff)
 		return;
 
 	if (deref) {
-		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
-		free((char*) start_tip_name);
+		tip_name = xstrfmt("%s^0", start_tip_name);
 	} else
-		tip_name = start_tip_name;
+		tip_name = strdup(start_tip_name);
 
 	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
 				   from_tag)) {
-		free(to_free);
+		free((char*) tip_name);
 		return;
 	}
 
@@ -139,7 +138,6 @@ static void name_rev(struct commit *start_commit,
 			struct commit *parent = parents->item;
 			const char *new_name;
 			int generation, distance;
-			const char *new_name_to_free = NULL;
 
 			parse_commit(parent);
 			if (parent->date < cutoff)
@@ -159,11 +157,10 @@ static void name_rev(struct commit *start_commit,
 					new_name = xstrfmt("%.*s^%d", (int)len,
 							   name->tip_name,
 							   parent_number);
-				new_name_to_free = new_name;
 				generation = 0;
 				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 			} else {
-				new_name = name->tip_name;
+				new_name = strdup(name->tip_name);
 				generation = name->generation + 1;
 				distance = name->distance + 1;
 			}
@@ -174,7 +171,7 @@ static void name_rev(struct commit *start_commit,
 				last_new_parent = commit_list_append(parent,
 						  last_new_parent);
 			else
-				free((char*) new_name_to_free);
+				free((char*) new_name);
 		}
 
 		*last_new_parent = list;
@@ -313,7 +310,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
+		name_rev(commit, path, taggerdate, from_tag, deref);
 	}
 	return 0;
 }
-- 
2.23.0.331.g4e51dcdf11



* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-20 18:13     ` SZEDER Gábor
  2019-09-20 18:14       ` SZEDER Gábor
@ 2019-09-21 12:37       ` René Scharfe
  2019-09-21 14:21         ` SZEDER Gábor
  1 sibling, 1 reply; 98+ messages in thread
From: René Scharfe @ 2019-09-21 12:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 20.09.19 um 20:13 schrieb SZEDER Gábor:
>>> @@ -280,12 +269,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>>>  	if (o && o->type == OBJ_COMMIT) {
>>>  		struct commit *commit = (struct commit *)o;
>>>  		int from_tag = starts_with(path, "refs/tags/");
>>> +		const char *tip_name;
>>
>> This should not be const because you allocate the buffer it points to
>> right here in the function, in each execution path.
>
> Marking it as const indicates that this function doesn't modify the
> buffer where the pointer points at.

Right, and that's at odds with this code:

>>> +		if (deref)
>>> +			tip_name = xstrfmt("%s^0", path);
>>> +		else
>>> +			tip_name = xstrdup(path);

... which allocates said memory and writes a string to it.

René


* Re: [PATCH 09/15] name-rev: restructure parsing commits and applying date cutoff
  2019-09-19 21:47 ` [PATCH 09/15] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
@ 2019-09-21 12:37   ` René Scharfe
  0 siblings, 0 replies; 98+ messages in thread
From: René Scharfe @ 2019-09-21 12:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 19.09.19 um 23:47 schrieb SZEDER Gábor:
> At the beginning of the recursive name_rev() function it parses the
> commit it got as parameter, and returns early if the commit is older
> than a cutoff limit.
>
> Restructure this so the caller parses the commit and checks its date,
> and doesn't invoke name_rev() if the commit to be passed as parameter
> is older than the cutoff, i.e. both name_ref() before calling
> name_rev() and name_rev() itself as it iterates over the parent
> commits.
>
> This makes eliminating the recursion a bit easier to follow, and it
> will be moved back to name_rev() after the recursion is eliminated.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 42cea5c881..99643aa4dc 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -107,11 +107,6 @@ static void name_rev(struct commit *commit,
>  	struct commit_list *parents;
>  	int parent_number = 1;
>
> -	parse_commit(commit);
> -
> -	if (commit->date < cutoff)
> -		return;
> -
>  	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
>  				   distance, from_tag))
>  		return;
> @@ -119,6 +114,12 @@ static void name_rev(struct commit *commit,
>  	for (parents = commit->parents;
>  			parents;
>  			parents = parents->next, parent_number++) {
> +		struct commit *parent = parents->item;
> +
> +		parse_commit(parent);
> +		if (parent->date < cutoff)
> +			continue;
> +
>  		if (parent_number > 1) {
>  			size_t len;
>  			char *new_name;
> @@ -131,11 +132,11 @@ static void name_rev(struct commit *commit,
>  				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
>  						   parent_number);
>

The check now also skips this allocation for old commits...

> -			name_rev(parents->item, new_name, taggerdate, 0,
> +			name_rev(parent, new_name, taggerdate, 0,
>  				 distance + MERGE_TRAVERSAL_WEIGHT,
>  				 from_tag);
>  		} else {
> -			name_rev(parents->item, tip_name, taggerdate,
> +			name_rev(parent, tip_name, taggerdate,
>  				 generation + 1, distance + 1,
>  				 from_tag);
>  		}
> @@ -269,16 +270,18 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  	if (o && o->type == OBJ_COMMIT) {
>  		struct commit *commit = (struct commit *)o;
>  		int from_tag = starts_with(path, "refs/tags/");
> -		const char *tip_name;
>
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = commit->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
> -		if (deref)
> -			tip_name = xstrfmt("%s^0", path);
> -		else
> -			tip_name = xstrdup(path);
> -		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
> +		if (commit->date >= cutoff) {
> +			const char *tip_name;
> +			if (deref)
> +				tip_name = xstrfmt("%s^0", path);
> +			else
> +				tip_name = xstrdup(path);

... and this allocation here as well.  If this improves performance
in a meaningful way then perhaps it should be kept at this place?
And if it doesn't, then an additional allocation might not hurt much?

Just a thought; I still haven't measured.

> +			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
> +		}
>  	}
>  	return 0;
>  }
>



* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-21  9:57         ` SZEDER Gábor
@ 2019-09-21 12:37           ` René Scharfe
  2019-09-22 19:05             ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: René Scharfe @ 2019-09-21 12:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 21.09.19 um 11:57 schrieb SZEDER Gábor:
> On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
>> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
>>>> If the (re)introduced leak doesn't impact performance and memory
>>>> usage too much then duplicating tip_name again in name_rev() or
>>>> rather your new create_or_update_name() would likely make the
>>>> lifetimes of those string buffers easier to manage.
>>>
>>> Yeah, the easiest would be when each 'struct rev_name' in the commit
>>> slab would have its own 'tip_name' string, but that would result in
>>> a lot of duplicated strings and increased memory usage.
>>
>> I didn't measure how much more memory would be used, though.
>
> So, I tried the patch below to give each 'struct rev_name' instance
> its own copy of 'tip_name', and the memory usage of 'git name-rev
> --all' usually increased.
>
> The increase depends on how many merges and how many refs there are
> compared to the number of commits: the fewer merges and refs, the
> more the memory usage increased:
>
>   linux:         +4.8%
>   gcc:           +7.2%
>   gecko-dev:     +9.2%
>   webkit:       +12.4%
>   llvm-project: +19.0%

Is that the overall memory usage or just for struct rev_name instances
and tip_name strings?  And how much is that in absolute terms?  (Perhaps
it's worth it to get the memory ownership question off the table at
least during the transformation to iterative processing.)

> git.git is the exception with its unusually high number of merge
> commits (about 25%), and the memory usage decreased by 4.4%.

Interesting.

I wonder why regular commits even need a struct rev_name.  Shouldn't
only tips and roots need ones?  And perhaps merges and occasional
regular "checkpoint" commits, to avoid too many duplicate traversals.

That's not exactly on-topic, though, and I didn't think all that
deeply about it, but perhaps switching to a different marking
strategy could get rid of recursion as a side-effect?  *waves hands
vaguely*

>
>
>  --- >8 ---
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 6969af76c4..62ab78242b 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>  		set_commit_rev_name(commit, name);
>  		goto copy_data;
>  	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
> +		free((char*) name->tip_name);
>  copy_data:
>  		name->tip_name = tip_name;

I would have expected a xstrdup() call here.

>  		name->taggerdate = taggerdate;
> @@ -106,21 +107,19 @@ static void name_rev(struct commit *start_commit,
>  {
>  	struct commit_list *list = NULL;
>  	const char *tip_name;
> -	char *to_free = NULL;
>
>  	parse_commit(start_commit);
>  	if (start_commit->date < cutoff)
>  		return;
>
>  	if (deref) {
> -		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
> -		free((char*) start_tip_name);
> +		tip_name = xstrfmt("%s^0", start_tip_name);
>  	} else
> -		tip_name = start_tip_name;
> +		tip_name = strdup(start_tip_name);

This would not be needed with the central xstrdup() call mentioned above.

>
>  	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
>  				   from_tag)) {
> -		free(to_free);
> +		free((char*) tip_name);
>  		return;
>  	}
>
> @@ -139,7 +138,6 @@ static void name_rev(struct commit *start_commit,
>  			struct commit *parent = parents->item;
>  			const char *new_name;
>  			int generation, distance;
> -			const char *new_name_to_free = NULL;
>
>  			parse_commit(parent);
>  			if (parent->date < cutoff)
> @@ -159,11 +157,10 @@ static void name_rev(struct commit *start_commit,
>  					new_name = xstrfmt("%.*s^%d", (int)len,
>  							   name->tip_name,
>  							   parent_number);
> -				new_name_to_free = new_name;
>  				generation = 0;
>  				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
>  			} else {
> -				new_name = name->tip_name;
> +				new_name = strdup(name->tip_name);

... and neither would this.

Sure the xstrfmt() result would be duplicated instead of being reused, but
that doesn't increase memory usage overall.

>  				generation = name->generation + 1;
>  				distance = name->distance + 1;
>  			}
> @@ -174,7 +171,7 @@ static void name_rev(struct commit *start_commit,
>  				last_new_parent = commit_list_append(parent,
>  						  last_new_parent);
>  			else
> -				free((char*) new_name_to_free);
> +				free((char*) new_name);
>  		}
>
>  		*last_new_parent = list;
> @@ -313,7 +310,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = commit->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
> -		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
> +		name_rev(commit, path, taggerdate, from_tag, deref);
>  	}
>  	return 0;
>  }
>


* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-21 12:37       ` René Scharfe
@ 2019-09-21 14:21         ` SZEDER Gábor
  2019-09-21 15:52           ` René Scharfe
  0 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-21 14:21 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Sat, Sep 21, 2019 at 02:37:05PM +0200, René Scharfe wrote:
> Am 20.09.19 um 20:13 schrieb SZEDER Gábor:
> >>> @@ -280,12 +269,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
> >>>  	if (o && o->type == OBJ_COMMIT) {
> >>>  		struct commit *commit = (struct commit *)o;
> >>>  		int from_tag = starts_with(path, "refs/tags/");
> >>> +		const char *tip_name;
> >>
> >> This should not be const because you allocate the buffer it points to
> >> right here in the function, in each execution path.
> >
> > Marking it as const indicates that this function doesn't modify the
> > buffer where the pointer points at.
> 
> Right, and that's at odds with this code:
> 
> >>> +		if (deref)
> >>> +			tip_name = xstrfmt("%s^0", path);
> >>> +		else
> >>> +			tip_name = xstrdup(path);
> 
> ... which allocates said memory and writes a string to it.

... before assigning it to the const pointer.


* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-21 14:21         ` SZEDER Gábor
@ 2019-09-21 15:52           ` René Scharfe
  0 siblings, 0 replies; 98+ messages in thread
From: René Scharfe @ 2019-09-21 15:52 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 21.09.19 um 16:21 schrieb SZEDER Gábor:
> On Sat, Sep 21, 2019 at 02:37:05PM +0200, René Scharfe wrote:
>> Am 20.09.19 um 20:13 schrieb SZEDER Gábor:
>>>>> @@ -280,12 +269,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>>>>>  	if (o && o->type == OBJ_COMMIT) {
>>>>>  		struct commit *commit = (struct commit *)o;
>>>>>  		int from_tag = starts_with(path, "refs/tags/");
>>>>> +		const char *tip_name;
>>>>
>>>> This should not be const because you allocate the buffer it points to
>>>> right here in the function, in each execution path.
>>>
>>> Marking it as const indicates that this function doesn't modify the
>>> buffer where the pointer points at.
>>
>> Right, and that's at odds with this code:
>>
>>>>> +		if (deref)
>>>>> +			tip_name = xstrfmt("%s^0", path);
>>>>> +		else
>>>>> +			tip_name = xstrdup(path);
>>
>> ... which allocates said memory and writes a string to it.
>
> ... before assigning it to the const pointer.
>

Sure, you can cast anything to anything else, and slapping on a const
qualifier is even allowed to be done implicitly for pointers to objects
(but not for pointers to pointers).  Removing it later (e.g. for
free(3)) is a warning sign; such sites need to be checked manually, as
the compiler won't do it.

The declaration says we don't modify the buffer, but then we actually
create it, which is as big a modification as can be.  That's a bit
misleading.  Is protection against accidental updates worth the
misdirection, and where would they come from?  Usually code without
such tricks is easier to read and maintain.
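
A minimal standalone sketch of the conversions in question, for
illustration only: make_tip_name() and tip_name_length() are
hypothetical helpers, and plain malloc(3) stands in for xstrfmt().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for xstrfmt("%s^0", path): returns a freshly
 * allocated buffer that the caller owns, or NULL on allocation failure.
 */
static char *make_tip_name(const char *path)
{
	size_t len = strlen(path) + strlen("^0") + 1;
	char *buf = malloc(len);

	if (!buf)
		return NULL;
	snprintf(buf, len, "%s^0", path);
	return buf;
}

/* Returns the length of the generated name, or -1 on allocation failure. */
static int tip_name_length(const char *path)
{
	/*
	 * Adding const to a pointer to an object happens implicitly, no
	 * cast needed.  (Assigning a 'char **' to a 'const char **', by
	 * contrast, is not accepted without a cast.)
	 */
	const char *tip_name = make_tip_name(path);
	int len;

	if (!tip_name)
		return -1;
	len = (int)strlen(tip_name);

	/*
	 * Dropping the qualifier again needs an explicit cast, and from
	 * here on the compiler no longer checks that freeing is valid.
	 */
	free((char *)tip_name);
	return len;
}
```

Declaring the variable as plain 'char *' would make the cast in front
of free(3) unnecessary.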

René


* [PATCH] name-rev: rewrite create_or_update_name()
  2019-09-19 21:47 ` [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
  2019-09-20 15:18   ` Derrick Stolee
@ 2019-09-22  8:18   ` Martin Ågren
  2019-12-09 12:43     ` SZEDER Gábor
  1 sibling, 1 reply; 98+ messages in thread
From: Martin Ågren @ 2019-09-22  8:18 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.

Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Hi SZEDER,

 For the record, --color-moved confirms that your patch is a move and
 the conversion around it looks good to me. I was a bit puzzled by what
 the moved code actually wanted to *do* and came up with this rewrite.

 I guess it's subjective which of these ways of writing this function is
 "better", but I found this rewrite helped in understanding this
 function. Feel free to pick up in a reroll, squash or ignore as you see
 fit. This is based on top of your whole series (your e5d77042f), but
 could perhaps go immediately after your patch 07/15.

 It seems there was some discussion around leaks and leak-plugs. That
 would conflict/interact with this. Instead of placing a call to free()
 just before the label so we can more or less goto around it, the middle
 section of this rewritten function would turn from "if no name,
 allocate one" to "if we have a name, free stuff, else allocate one".
 Again, it's subjective which way is "better", so trust your judgment.
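
For illustration, the "if we have a name, free stuff, else allocate
one" variant could look roughly like the standalone sketch below.  It
makes several simplifying assumptions that are not in the patch: the
struct is cut down to two fields, is_better_name() only compares
distances, a plain pointer ('slot') stands in for the commit slab, and
malloc(3) replaces xmalloc().

```c
#include <stdlib.h>

/* Simplified stand-in for the real 'struct rev_name'. */
struct rev_name {
	char *tip_name;	/* owned by this struct in this sketch */
	int distance;
};

/* Simplified stand-in: the real is_better_name() looks at more fields. */
static int is_better_name(const struct rev_name *name, int distance)
{
	return distance < name->distance;
}

/*
 * '*slot' plays the role of the commit slab entry.  Takes ownership of
 * 'tip_name' on success; returns NULL and leaves 'tip_name' to the
 * caller when there is nothing to do (or allocation fails).
 */
static struct rev_name *create_or_update_name(struct rev_name **slot,
					      char *tip_name, int distance)
{
	struct rev_name *name = *slot;

	if (name && !is_better_name(name, distance))
		return NULL;

	if (name) {
		/* Replacing an existing name: release the old string. */
		free(name->tip_name);
	} else {
		name = malloc(sizeof(*name));
		if (!name)
			return NULL;
		*slot = name;
	}

	name->tip_name = tip_name;
	name->distance = distance;
	return name;
}
```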

 Martin

 builtin/name-rev.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6969af76c4..03a5f0b189 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -83,21 +83,21 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 {
 	struct rev_name *name = get_commit_rev_name(commit);
 
+	if (name && !is_better_name(name, taggerdate, distance, from_tag))
+		return NULL;
+
 	if (name == NULL) {
 		name = xmalloc(sizeof(*name));
 		set_commit_rev_name(commit, name);
-		goto copy_data;
-	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-copy_data:
-		name->tip_name = tip_name;
-		name->taggerdate = taggerdate;
-		name->generation = generation;
-		name->distance = distance;
-		name->from_tag = from_tag;
-
-		return name;
-	} else
-		return NULL;
+	}
+
+	name->tip_name = tip_name;
+	name->taggerdate = taggerdate;
+	name->generation = generation;
+	name->distance = distance;
+	name->from_tag = from_tag;
+
+	return name;
 }
 
 static void name_rev(struct commit *start_commit,
-- 
2.23.0



* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-21 12:37           ` René Scharfe
@ 2019-09-22 19:05             ` SZEDER Gábor
  2019-09-23 18:43               ` René Scharfe
  0 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-22 19:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Sat, Sep 21, 2019 at 02:37:18PM +0200, René Scharfe wrote:
> Am 21.09.19 um 11:57 schrieb SZEDER Gábor:
> > On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
> >> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
> >>>> If the (re)introduced leak doesn't impact performance and memory
> >>>> usage too much then duplicating tip_name again in name_rev() or
> >>>> rather your new create_or_update_name() would likely make the
> >>>> lifetimes of those string buffers easier to manage.
> >>>
> >>> Yeah, the easiest would be when each 'struct rev_name' in the commit
> >>> slab would have its own 'tip_name' string, but that would result in
> >>> a lot of duplicated strings and increased memory usage.
> >>
> >> I didn't measure how much more memory would be used, though.
> >
> > So, I tried the patch below to give each 'struct rev_name' instance
> > its own copy of 'tip_name', and the memory usage of 'git name-rev
> > --all' usually increased.
> >
> > The increase depends on how many merges and how many refs there are
> > compared to the number of commits: the fewer merges and refs, the
> > more the memory usage increased:
> >
> >   linux:         +4.8%
> >   gcc:           +7.2%
> >   gecko-dev:     +9.2%
> >   webkit:       +12.4%
> >   llvm-project: +19.0%
> 
> Is that the overall memory usage or just for struct rev_name instances
> and tip_name strings?

It's overall memory usage, the average of five runs of:

  /usr/bin/time --format='%M' ~/src/git/git name-rev --all

> And how much is that in absolute terms?  

Maximum resident set size in kilobytes (that's what '%M' reports),
without -> with the patch:

git:     29801 ->  28514
linux:  317018 -> 332218
gcc:    106462 -> 114140
gecko:  315448 -> 344486
webkit:  55847 ->  62780
llvm:   112867 -> 134384
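
For reference, the relative change can be recomputed from these
absolute numbers with something like the following throwaway sketch
(the struct and function names are made up for illustration).  The
results should land close to the percentages quoted earlier.

```c
struct measurement {
	const char *repo;
	long before;	/* without the patch */
	long after;	/* with the patch */
};

static const struct measurement measurements[] = {
	{ "git",     29801,  28514 },
	{ "linux",  317018, 332218 },
	{ "gcc",    106462, 114140 },
	{ "gecko",  315448, 344486 },
	{ "webkit",  55847,  62780 },
	{ "llvm",   112867, 134384 },
};

/* Relative change in percent; negative means memory usage went down. */
static double percent_change(const struct measurement *m)
{
	return 100.0 * (double)(m->after - m->before) / (double)m->before;
}
```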

> (Perhaps
> it's worth it to get the memory ownership question off the table at
> least during the transformation to iterative processing.)

I looked into it only because I got curious, but other than that I
will definitely play the "beyond the scope of this patch series" card
:)

> > git.git is the exception with its unusually high number of merge
> > commits (about 25%), and the memory usage decreased by 4.4%.
> 
> Interesting.
> 
> I wonder why regular commits even need a struct rev_name.  Shouldn't
> only tips and roots need ones?  And perhaps merges and occasional
> regular "checkpoint" commits, to avoid too many duplicate traversals.

The 'struct rev_name' holds all info that's necessary to determine the
commit's name.  It seems to be much simpler to just attach one to each
commit and then retrieve it from the commit slab when printing the
name of the commit than to come up with an algorithm where only a
select set of commits get a 'struct rev_name', including how to access
those when naming a commit that doesn't have one.
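
To make that concrete, turning the stored data into a name at printing
time boils down to something like the sketch below.  It is a
simplified, standalone approximation of what get_rev_name() does, not
the actual code: the struct is cut down to two fields, and
format_rev_name() is a made-up name.

```c
#include <stdio.h>
#include <string.h>

/* Cut-down stand-in for the real 'struct rev_name'. */
struct rev_name {
	const char *tip_name;
	int generation;
};

/* Writes the name described by 'n' into 'out' (at most 'outlen' bytes). */
static void format_rev_name(const struct rev_name *n, char *out, size_t outlen)
{
	size_t len = strlen(n->tip_name);

	if (!n->generation) {
		/* The commit is the tip itself: use the name as is. */
		snprintf(out, outlen, "%s", n->tip_name);
		return;
	}

	/* Strip a trailing "^0" before appending "~<generation>". */
	if (len >= 2 && !strcmp(n->tip_name + len - 2, "^0"))
		len -= 2;
	snprintf(out, outlen, "%.*s~%d", (int)len, n->tip_name, n->generation);
}
```

With one such struct attached to every commit there is nothing to
search for: the name falls out of the commit's own entry.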

> That's not exactly on-topic, though, and I didn't think all that
> deeply about it, but perhaps switching to a different marking
> strategy could get rid of recursion as a side-effect?  *waves hands
> vaguely*

I suppose a topo-order-based history walk should be able to name all
commits in a single traversal, and, consequently, be faster.  However,
'git rev-list --all --topo-order' doesn't seem to be that much faster
than 'git name-rev --all', so it might not be worth the effort.

> >  --- >8 ---
> >
> > diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> > index 6969af76c4..62ab78242b 100644
> > --- a/builtin/name-rev.c
> > +++ b/builtin/name-rev.c
> > @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit,
> >  		set_commit_rev_name(commit, name);
> >  		goto copy_data;
> >  	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
> > +		free((char*) name->tip_name);
> >  copy_data:
> >  		name->tip_name = tip_name;
> 
> I would have expected a xstrdup() call here.

But then we'd need to release the results of all those xstrfmt()
calls at the callsites of create_or_update_name(), so instead of those
strdup() calls that you deem unnecessary we would need additional
free() calls.

> >  		name->taggerdate = taggerdate;
> > @@ -106,21 +107,19 @@ static void name_rev(struct commit *start_commit,
> >  {
> >  	struct commit_list *list = NULL;
> >  	const char *tip_name;
> > -	char *to_free = NULL;
> >
> >  	parse_commit(start_commit);
> >  	if (start_commit->date < cutoff)
> >  		return;
> >
> >  	if (deref) {
> > -		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
> > -		free((char*) start_tip_name);
> > +		tip_name = xstrfmt("%s^0", start_tip_name);
> >  	} else
> > -		tip_name = start_tip_name;
> > +		tip_name = strdup(start_tip_name);
> 
> This would not be needed with the central xstrdup() call mentioned above.
> 
> >
> >  	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
> >  				   from_tag)) {
> > -		free(to_free);
> > +		free((char*) tip_name);
> >  		return;
> >  	}
> >
> > @@ -139,7 +138,6 @@ static void name_rev(struct commit *start_commit,
> >  			struct commit *parent = parents->item;
> >  			const char *new_name;
> >  			int generation, distance;
> > -			const char *new_name_to_free = NULL;
> >
> >  			parse_commit(parent);
> >  			if (parent->date < cutoff)
> > @@ -159,11 +157,10 @@ static void name_rev(struct commit *start_commit,
> >  					new_name = xstrfmt("%.*s^%d", (int)len,
> >  							   name->tip_name,
> >  							   parent_number);
> > -				new_name_to_free = new_name;
> >  				generation = 0;
> >  				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
> >  			} else {
> > -				new_name = name->tip_name;
> > +				new_name = strdup(name->tip_name);
> 
> ... and neither would this.
> 
> Sure the xstrfmt() result would be duplicated instead of being reused, but
> that doesn't increase memory usage overall.
> 
> >  				generation = name->generation + 1;
> >  				distance = name->distance + 1;
> >  			}
> > @@ -174,7 +171,7 @@ static void name_rev(struct commit *start_commit,
> >  				last_new_parent = commit_list_append(parent,
> >  						  last_new_parent);
> >  			else
> > -				free((char*) new_name_to_free);
> > +				free((char*) new_name);
> >  		}
> >
> >  		*last_new_parent = list;
> > @@ -313,7 +310,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
> >  		if (taggerdate == TIME_MAX)
> >  			taggerdate = commit->date;
> >  		path = name_ref_abbrev(path, can_abbreviate_output);
> > -		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
> > +		name_rev(commit, path, taggerdate, from_tag, deref);
> >  	}
> >  	return 0;
> >  }
> >


* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-22 19:05             ` SZEDER Gábor
@ 2019-09-23 18:43               ` René Scharfe
  2019-09-23 18:59                 ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: René Scharfe @ 2019-09-23 18:43 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 22.09.19 um 21:05 schrieb SZEDER Gábor:
> On Sat, Sep 21, 2019 at 02:37:18PM +0200, René Scharfe wrote:
>> Am 21.09.19 um 11:57 schrieb SZEDER Gábor:
>>> On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
>>>> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
>>>>>> If the (re)introduced leak doesn't impact performance and memory
>>>>>> usage too much then duplicating tip_name again in name_rev() or
>>>>>> rather your new create_or_update_name() would likely make the
>>>>>> lifetimes of those string buffers easier to manage.
>>>>>
>>>>> Yeah, the easiest would be when each 'struct rev_name' in the commit
>>>>> slab would have its own 'tip_name' string, but that would result in
>>>>> a lot of duplicated strings and increased memory usage.
>>>>
>>>> I didn't measure how much more memory would be used, though.
>>>
>>> So, I tried the patch below to give each 'struct rev_name' instance
>>> its own copy of 'tip_name', and the memory usage of 'git name-rev
>>> --all' usually increased.
>>>
>>> The increase depends on how many merges and how many refs there are
>>> compared to the number of commits: the fewer merges and refs, the
>>> more the memory usage increased:
>>>
>>>   linux:         +4.8%
>>>   gcc:           +7.2%
>>>   gecko-dev:     +9.2%
>>>   webkit:       +12.4%
>>>   llvm-project: +19.0%
>>
>> Is that the overall memory usage or just for struct rev_name instances
>> and tip_name strings?
>
> It's overall memory usage, the average of five runs of:
>
>   /usr/bin/time --format='%M' ~/src/git/git name-rev --all
>
>> And how much is that in absolute terms?
>
> git:     29801 ->  28514
> linux:  317018 -> 332218
> gcc:    106462 -> 114140
> gecko:  315448 -> 344486
> webkit:  55847 ->  62780
> llvm:   112867 -> 134384

I only have the first two handy, and I get numbers like this with
master:

git, lots of branches with long names: 3075476
git, local clone, single branch:       1349016
linux, single branch:                  1520468

O_o

>> (Perhaps
>> it's worth it to get the memory ownership question off the table at
>> least during the transformation to iterative processing.)
>
> I looked into it only because I got curious, but other than that I
> will definitely play the "beyond the scope of this patch series" card
> :)

Fair enough.

>> I wonder why regular commits even need a struct rev_name.  Shouldn't
>> only tips and roots need ones?  And perhaps merges and occasional
>> regular "checkpoint" commits, to avoid too many duplicate traversals.
>
> The 'struct rev_name' holds all info that's necessary to determine the
> commit's name.  It seems to be much simpler to just attach one to each
> commit and then retrieve it from the commit slab when printing the
> name of the commit than to come up with an algorithm where only a
> select set of commits get a 'struct rev_name', including how to access
> those when naming a commit that doesn't have one.

Sure, the lookup of individual commits is much easier once all commits
have name tags attached.  Preparing that sounds expensive, though.
It's a trade-off favoring looking up lots of names per program run.

>>>  --- >8 ---
>>>
>>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>>> index 6969af76c4..62ab78242b 100644
>>> --- a/builtin/name-rev.c
>>> +++ b/builtin/name-rev.c
>>> @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>>>  		set_commit_rev_name(commit, name);
>>>  		goto copy_data;
>>>  	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
>>> +		free((char*) name->tip_name);
>>>  copy_data:
>>>  		name->tip_name = tip_name;
>>
>> I would have expected a xstrdup() call here.
>
> But then we'd need to release the results of all those xstrfmt()
> calls at the callsites of create_or_update_name(), so instead of those
> strdup() calls that you deem unnecessary we would need additional
> free() calls.

Correct.  That would be simpler and shouldn't affect peak memory
usage.

René


* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-23 18:43               ` René Scharfe
@ 2019-09-23 18:59                 ` SZEDER Gábor
  2019-09-23 19:55                   ` René Scharfe
  0 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-23 18:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Mon, Sep 23, 2019 at 08:43:11PM +0200, René Scharfe wrote:
> > It's overall memory usage, the average of five runs of:
> >
> >   /usr/bin/time --format='%M' ~/src/git/git name-rev --all
> >
> >> And how much is that in absolute terms?
> >
> > git:     29801 ->  28514
> > linux:  317018 -> 332218
> > gcc:    106462 -> 114140
> > gecko:  315448 -> 344486
> > webkit:  55847 ->  62780
> > llvm:   112867 -> 134384
> 
> I only have the first two handy, and I get numbers like this with
> master:
> 
> git, lots of branches with long names: 3075476
> git, local clone, single branch:       1349016
> linux, single branch:                  1520468
> 
> O_o

I have commit graph present and enabled.  Without that I get approx.
the same memory usage in my linux repo as you did (along with much
longer runtime).

Will have to clarify this in the commit messages that talk about
runtime and memory usage.



* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-23 18:59                 ` SZEDER Gábor
@ 2019-09-23 19:55                   ` René Scharfe
  2019-09-23 20:47                     ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: René Scharfe @ 2019-09-23 19:55 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 23.09.19 um 20:59 schrieb SZEDER Gábor:
> On Mon, Sep 23, 2019 at 08:43:11PM +0200, René Scharfe wrote:
>>> It's overall memory usage, the average of five runs of:
>>>
>>>   /usr/bin/time --format='%M' ~/src/git/git name-rev --all
>>>
>>>> And how much is that in absolute terms?
>>>
>>> git:     29801 ->  28514
>>> linux:  317018 -> 332218
>>> gcc:    106462 -> 114140
>>> gecko:  315448 -> 344486
>>> webkit:  55847 ->  62780
>>> llvm:   112867 -> 134384
>>
>> I only have the first two handy, and I get numbers like this with
>> master:
>>
>> git, lots of branches with long names: 3075476
>> git, local clone, single branch:       1349016
>> linux, single branch:                  1520468
>>
>> O_o
>
> I have commit graph present and enabled.  Without that I get approx.
> the same memory usage in my linux repo as you did (along with much
> longer runtime).

OK.  Cloned git afresh and measured master without a commit-graph,
then again after "git commit-graph write", and then both cases once
more with the patch below:

git:                           109880
git w/ commit-graph:            47208
git w/ patch:                   94304
git w/ commit-graph and patch:  31220

Strange numbers, at least compared to my number for the clone above:
One order of magnitude less!  Not sure what to make of it.  (Tried
the clone again, same result.)

Anyway, here's the patch:

-- >8 --
Subject: [PATCH] name-rev: use FLEX_ARRAY for tip_name in struct rev_name

Give each rev_name its very own tip_name string.  This simplifies memory
ownership, as callers of name_rev() only have to make sure the tip_name
parameter exists for the duration of the call and don't have to preserve
it for the whole run of the program.

It also saves four or eight bytes per object because this change removes
the pointer indirection.  Memory usage is still higher for linear
histories that previously shared the same tip_name value between
multiple rev_name instances.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index c785fe16ba..4162fb29ee 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -12,11 +12,11 @@
 #define CUTOFF_DATE_SLOP 86400 /* one day */

 typedef struct rev_name {
-	const char *tip_name;
 	timestamp_t taggerdate;
 	int generation;
 	int distance;
 	int from_tag;
+	char tip_name[FLEX_ARRAY];
 } rev_name;

 define_commit_slab(commit_rev_name, struct rev_name *);
@@ -97,17 +97,14 @@ static void name_rev(struct commit *commit,
 			die("generation: %d, but deref?", generation);
 	}

-	if (name == NULL) {
-		name = xmalloc(sizeof(rev_name));
-		set_commit_rev_name(commit, name);
-		goto copy_data;
-	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-copy_data:
-		name->tip_name = tip_name;
+	if (!name || is_better_name(name, taggerdate, distance, from_tag)) {
+		free(name);
+		FLEX_ALLOC_STR(name, tip_name, tip_name);
 		name->taggerdate = taggerdate;
 		name->generation = generation;
 		name->distance = distance;
 		name->from_tag = from_tag;
+		set_commit_rev_name(commit, name);
 	} else {
 		free(to_free);
 		return;
@@ -131,12 +128,14 @@ static void name_rev(struct commit *commit,
 			name_rev(parents->item, new_name, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
 				 from_tag, 0);
+			free(new_name);
 		} else {
 			name_rev(parents->item, tip_name, taggerdate,
 				 generation + 1, distance + 1,
 				 from_tag, 0);
 		}
 	}
+	free(to_free);
 }

 static int subpath_matches(const char *path, const char *filter)
@@ -270,8 +269,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		if (taggerdate == TIME_MAX)
 			taggerdate = ((struct commit *)o)->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
-			 from_tag, deref);
+		name_rev(commit, path, taggerdate, 0, 0, from_tag, deref);
 	}
 	return 0;
 }
--
2.23.0
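
For readers unfamiliar with the technique, the idea of storing the
string inside the struct can be sketched in plain C99 as follows.  This
is only an illustration under stated assumptions: 'struct named_rev'
and named_rev_new() are hypothetical names, and the sketch uses malloc()
directly instead of Git's FLEX_ALLOC_STR() macro.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for 'struct rev_name'; not Git's definition. */
struct named_rev {
	int generation;
	int distance;
	char tip_name[];	/* C99 flexible array member, must be last */
};

/*
 * Allocate the struct and its string in a single block, copying
 * 'tip_name' so that the caller keeps ownership of its own buffer.
 * Returns NULL if the allocation fails.
 */
static struct named_rev *named_rev_new(const char *tip_name,
				       int generation, int distance)
{
	size_t len = strlen(tip_name);
	struct named_rev *name = malloc(sizeof(*name) + len + 1);

	if (!name)
		return NULL;
	name->generation = generation;
	name->distance = distance;
	memcpy(name->tip_name, tip_name, len + 1);	/* includes the NUL */
	return name;
}
```

A single free() on the returned pointer releases both the struct and
the name, which is what makes the ownership question simpler.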


* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-23 19:55                   ` René Scharfe
@ 2019-09-23 20:47                     ` SZEDER Gábor
  2019-09-24 17:03                       ` René Scharfe
  0 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-23 20:47 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Mon, Sep 23, 2019 at 09:55:11PM +0200, René Scharfe wrote:
> -- >8 --
> Subject: [PATCH] name-rev: use FLEX_ARRAY for tip_name in struct rev_name
> 
> Give each rev_name its very own tip_name string.  This simplifies memory
> ownership, as callers of name_rev() only have to make sure the tip_name
> parameter exists for the duration of the call and don't have to preserve
> it for the whole run of the program.
> 
> It also saves four or eight bytes per object because this change removes
> the pointer indirection.  Memory usage is still higher for linear
> histories that previously shared the same tip_name value between
> multiple rev_name instances.

Besides looking at memory usage, have you run any performance
benchmarks?  Here it seems to make 'git name-rev --all >out' slower by
17% in the git repo and by 19.5% in the linux repo.


> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  builtin/name-rev.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index c785fe16ba..4162fb29ee 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -12,11 +12,11 @@
>  #define CUTOFF_DATE_SLOP 86400 /* one day */
> 
>  typedef struct rev_name {
> -	const char *tip_name;
>  	timestamp_t taggerdate;
>  	int generation;
>  	int distance;
>  	int from_tag;
> +	char tip_name[FLEX_ARRAY];
>  } rev_name;
> 
>  define_commit_slab(commit_rev_name, struct rev_name *);
> @@ -97,17 +97,14 @@ static void name_rev(struct commit *commit,
>  			die("generation: %d, but deref?", generation);
>  	}
> 
> -	if (name == NULL) {
> -		name = xmalloc(sizeof(rev_name));
> -		set_commit_rev_name(commit, name);
> -		goto copy_data;
> -	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
> -copy_data:
> -		name->tip_name = tip_name;
> +	if (!name || is_better_name(name, taggerdate, distance, from_tag)) {
> +		free(name);
> +		FLEX_ALLOC_STR(name, tip_name, tip_name);
>  		name->taggerdate = taggerdate;
>  		name->generation = generation;
>  		name->distance = distance;
>  		name->from_tag = from_tag;
> +		set_commit_rev_name(commit, name);
>  	} else {
>  		free(to_free);
>  		return;
> @@ -131,12 +128,14 @@ static void name_rev(struct commit *commit,
>  			name_rev(parents->item, new_name, taggerdate, 0,
>  				 distance + MERGE_TRAVERSAL_WEIGHT,
>  				 from_tag, 0);
> +			free(new_name);
>  		} else {
>  			name_rev(parents->item, tip_name, taggerdate,
>  				 generation + 1, distance + 1,
>  				 from_tag, 0);
>  		}
>  	}
> +	free(to_free);
>  }
> 
>  static int subpath_matches(const char *path, const char *filter)
> @@ -270,8 +269,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = ((struct commit *)o)->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
> -		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
> -			 from_tag, deref);
> +		name_rev(commit, path, taggerdate, 0, 0, from_tag, deref);
>  	}
>  	return 0;
>  }
> --
> 2.23.0


* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-23 20:47                     ` SZEDER Gábor
@ 2019-09-24 17:03                       ` René Scharfe
  2019-09-26 17:33                         ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: René Scharfe @ 2019-09-24 17:03 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git

Am 23.09.19 um 22:47 schrieb SZEDER Gábor:
> On Mon, Sep 23, 2019 at 09:55:11PM +0200, René Scharfe wrote:
>> -- >8 --
>> Subject: [PATCH] name-rev: use FLEX_ARRAY for tip_name in struct rev_name
>>
>> Give each rev_name its very own tip_name string.  This simplifies memory
>> ownership, as callers of name_rev() only have to make sure the tip_name
>> parameter exists for the duration of the call and don't have to preserve
>> it for the whole run of the program.
>>
>> It also saves four or eight bytes per object because this change removes
>> the pointer indirection.  Memory usage is still higher for linear
>> histories that previously shared the same tip_name value between
>> multiple rev_name instances.
>
> Besides looking at memory usage, have you run any performance
> benchmarks?  Here it seems to make 'git name-rev --all >out' slower by
> 17% in the git repo and by 19.5% in the linux repo.

Did measure now; I also see a slowdown with my patch applied:

git:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):     462.8 ms ±   2.8 ms    [User: 440.6 ms, System: 20.5 ms]
  Range (min … max):   459.6 ms … 466.5 ms    10 runs

git w/ commit-graph:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):     104.0 ms ±   1.5 ms    [User: 93.7 ms, System: 10.0 ms]
  Range (min … max):   101.5 ms … 107.1 ms    28 runs

git w/ patch:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):     475.1 ms ±   3.7 ms    [User: 458.3 ms, System: 16.0 ms]
  Range (min … max):   470.4 ms … 481.4 ms    10 runs

git w/ commit-graph and patch:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):     110.9 ms ±   1.5 ms    [User: 106.6 ms, System: 4.1 ms]
  Range (min … max):   109.0 ms … 114.7 ms    26 runs


linux:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):      6.670 s ±  0.027 s    [User: 6.450 s, System: 0.208 s]
  Range (min … max):    6.640 s …  6.721 s    10 runs

linux w/ patch:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):      6.784 s ±  0.160 s    [User: 6.567 s, System: 0.214 s]
  Range (min … max):    6.638 s …  7.211 s    10 runs

linux w/ commit-graph:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):     929.6 ms ±   5.3 ms    [User: 881.4 ms, System: 46.8 ms]
  Range (min … max):   924.1 ms … 939.5 ms    10 runs

linux w/ commit-graph and patch:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):      1.004 s ±  0.007 s    [User: 957.4 ms, System: 45.6 ms]
  Range (min … max):    0.997 s …  1.021 s    10 runs

We can reuse a strbuf instead of allocating new strings when adding
suffixes to get some of the performance loss back.  I guess it's easier
after the recursion is removed.  Numbers:

git w/ both patches:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):     448.0 ms ±   2.4 ms    [User: 428.2 ms, System: 19.6 ms]
  Range (min … max):   445.3 ms … 453.4 ms    10 runs

git w/ commit-graph and both patches:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):      98.7 ms ±   1.6 ms    [User: 93.5 ms, System: 5.0 ms]
  Range (min … max):    96.7 ms … 102.8 ms    30 runs

linux w/ both patches:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):      6.727 s ±  0.063 s    [User: 6.486 s, System: 0.226 s]
  Range (min … max):    6.675 s …  6.872 s    10 runs

linux w/ commit-graph and both patches:
Benchmark #1: ~/src/git/git name-rev --all
  Time (mean ± σ):     988.8 ms ±   4.5 ms    [User: 937.5 ms, System: 49.2 ms]
  Range (min … max):   981.4 ms … 994.8 ms    10 runs


---
 builtin/name-rev.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 4162fb29ee..7fee664574 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -75,15 +75,14 @@ static int is_better_name(struct rev_name *name,
 	return 0;
 }

-static void name_rev(struct commit *commit,
-		const char *tip_name, timestamp_t taggerdate,
+static void name_rev(struct commit *commit, struct strbuf *sb,
+		timestamp_t taggerdate,
 		int generation, int distance, int from_tag,
 		int deref)
 {
 	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
 	int parent_number = 1;
-	char *to_free = NULL;

 	parse_commit(commit);

@@ -91,7 +90,7 @@ static void name_rev(struct commit *commit,
 		return;

 	if (deref) {
-		tip_name = to_free = xstrfmt("%s^0", tip_name);
+		strbuf_addstr(sb, "^0");

 		if (generation)
 			die("generation: %d, but deref?", generation);
@@ -99,14 +98,13 @@ static void name_rev(struct commit *commit,

 	if (!name || is_better_name(name, taggerdate, distance, from_tag)) {
 		free(name);
-		FLEX_ALLOC_STR(name, tip_name, tip_name);
+		FLEX_ALLOC_MEM(name, tip_name, sb->buf, sb->len);
 		name->taggerdate = taggerdate;
 		name->generation = generation;
 		name->distance = distance;
 		name->from_tag = from_tag;
 		set_commit_rev_name(commit, name);
 	} else {
-		free(to_free);
 		return;
 	}

@@ -114,28 +112,26 @@ static void name_rev(struct commit *commit,
 			parents;
 			parents = parents->next, parent_number++) {
 		if (parent_number > 1) {
-			size_t len;
-			char *new_name;
-
-			strip_suffix(tip_name, "^0", &len);
+			int stripped = strbuf_strip_suffix(sb, "^0");
+			size_t base_len = sb->len;
 			if (generation > 0)
-				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
-						   generation, parent_number);
-			else
-				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
-						   parent_number);
+				strbuf_addf(sb, "~%d", generation);
+			strbuf_addf(sb, "^%d", parent_number);

-			name_rev(parents->item, new_name, taggerdate, 0,
+			name_rev(parents->item, sb, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
 				 from_tag, 0);
-			free(new_name);
+			strbuf_setlen(sb, base_len);
+			if (stripped)
+				strbuf_addstr(sb, "^0");
 		} else {
-			name_rev(parents->item, tip_name, taggerdate,
+			size_t base_len = sb->len;
+			name_rev(parents->item, sb, taggerdate,
 				 generation + 1, distance + 1,
 				 from_tag, 0);
+			strbuf_setlen(sb, base_len);
 		}
 	}
-	free(to_free);
 }

 static int subpath_matches(const char *path, const char *filter)
@@ -200,6 +196,7 @@ static int tipcmp(const void *a_, const void *b_)

 static int name_ref(const char *path, const struct object_id *oid, int flags, void *cb_data)
 {
+	static struct strbuf sb = STRBUF_INIT;
 	struct object *o = parse_object(the_repository, oid);
 	struct name_ref_data *data = cb_data;
 	int can_abbreviate_output = data->tags_only && data->name_only;
@@ -269,7 +266,9 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		if (taggerdate == TIME_MAX)
 			taggerdate = ((struct commit *)o)->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, path, taggerdate, 0, 0, from_tag, deref);
+		strbuf_reset(&sb);
+		strbuf_addstr(&sb, path);
+		name_rev(commit, &sb, taggerdate, 0, 0, from_tag, deref);
 	}
 	return 0;
 }
--
2.23.0
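
The save-length/append/truncate pattern used above can be sketched in
isolation like this.  It is a minimal illustration, assuming a
hypothetical 'struct buf' with buf_add(), buf_setlen() and
buf_push_parent() helpers; Git's real strbuf API is richer and differs
in its details.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable buffer; not Git's 'struct strbuf'. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Append the NUL-terminated string 's', growing the buffer if needed. */
static void buf_add(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		b->alloc = (b->len + n + 1) * 2;
		b->data = realloc(b->data, b->alloc);
		if (!b->data)
			abort();
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/* Truncate to a previously saved length; the buffer must be non-empty. */
static void buf_setlen(struct buf *b, size_t len)
{
	b->len = len;
	b->data[len] = '\0';
}

/*
 * Append "^<parent_number>" and return the length the buffer had
 * before, so that the caller can undo the change with buf_setlen().
 */
static size_t buf_push_parent(struct buf *b, int parent_number)
{
	size_t saved = b->len;
	char suffix[32];

	snprintf(suffix, sizeof(suffix), "^%d", parent_number);
	buf_add(b, suffix);
	return saved;
}
```

The point of the pattern is that one allocation is reused for every
name built during the traversal, instead of one xstrfmt() result per
visited parent.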


* Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion
  2019-09-24 17:03                       ` René Scharfe
@ 2019-09-26 17:33                         ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-09-26 17:33 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Tue, Sep 24, 2019 at 07:03:50PM +0200, René Scharfe wrote:
> Am 23.09.19 um 22:47 schrieb SZEDER Gábor:
> > On Mon, Sep 23, 2019 at 09:55:11PM +0200, René Scharfe wrote:
> >> -- >8 --
> >> Subject: [PATCH] name-rev: use FLEX_ARRAY for tip_name in struct rev_name
> >>
> >> Give each rev_name its very own tip_name string.  This simplifies memory
> >> ownership, as callers of name_rev() only have to make sure the tip_name
> >> parameter exists for the duration of the call and don't have to preserve
> >> it for the whole run of the program.
> >>
> >> It also saves four or eight bytes per object because this change removes
> >> the pointer indirection.  Memory usage is still higher for linear
> >> histories that previously shared the same tip_name value between
> >> multiple rev_name instances.
> >
> > Besides looking at memory usage, have you run any performance
> > benchmarks?  Here it seems to make 'git name-rev --all >out' slower by
> > 17% in the git repo and by 19.5% in the linux repo.
> 
> Did measure now; I also see a slowdown with my patch applied:

Thanks for confirming.

> We can reuse a strbuf instead of allocating new strings when adding
> suffixes to get some of the performance loss back.  I guess it's easier
> after the recursion is removed.  Numbers:

Agreed, the conflicts at first sight are too ugly to have these
changes in parallel cooking topics.  Furthermore, after the recursion
is gone we can measure the memory usage and performance impact of your
changes even in big linear repositories.

I think I will drop the last two patches plugging memory leaks from v2
of my series, because it seems your proposed changes do it more cleanly
and make them moot anyway.



* Re: [PATCH 01/15] t6120-describe: correct test repo history graph in comment
  2019-09-20 22:29     ` SZEDER Gábor
@ 2019-09-28  4:06       ` Junio C Hamano
  0 siblings, 0 replies; 98+ messages in thread
From: Junio C Hamano @ 2019-09-28  4:06 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

SZEDER Gábor <szeder.dev@gmail.com> writes:

>> Hmm...
>> 
>> > +#       ,---o----o----o-----.
>> > +#      /   D,R   e           \
>> > +#  o--o-----o-------------o---o----x
>> > +#      \    B            /
>> > +#       `---o----o----o-'
>> > +#                A    c
>> 
>> What's the first parent of the merge between 'B' and 'c' in this
>> picture and how does the reader figure it out?  What about the same
>> question on the direct parent of 'x'?  Is it generally accepted that
>> a straight line denotes the first ancestry, or something?
>
> I've always thought that the parents are numbered from top to bottom,
> i.e. 'B' is the first parent of the first merge, and the unnamed
> commit at the top is the first parent of the second merge.
>
> Would it help if it were arranged like this:
>
>   o---o-----o----o----o-------o----x
>        \   D,R   e           /
>         \---o-------------o-'
>          \  B            /
>           `-o----o----o-'
>                  A    c
>
> This is basically how 'git log --graph' would show them, except that
> this is horizontal.

Either is fine as long as they come with your "for a merge, earlier
parents are drawn near the top of the page" rule clearly described
near it (without such comment, I do not think either is clear enough).

Thanks.


* [PATCH v2 00/13] name-rev: eliminate recursion
  2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
                   ` (17 preceding siblings ...)
  2019-09-20 15:37 ` [PATCH 00/15] name-rev: eliminate recursion Derrick Stolee
@ 2019-11-12 10:38 ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 01/13] t6120-describe: correct test repo history graph in comment SZEDER Gábor
                     ` (14 more replies)
  18 siblings, 15 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

'git name-rev' is implemented using a recursive algorithm, and,
consequently, it can segfault in deep histories (e.g. WebKit), and
thanks to a test case demonstrating this limitation every test run
results in a dmesg entry logging the segfaulting git process.

This patch series eliminates the recursion.

Patches 1-5 are while-at-it cleanups I noticed on the way, and patch 6
improves test coverage.  Patches 7-11 are preparatory refactorings
that are supposed to make this series easier to follow, and make patch
12, the one finally eliminating the recursion, somewhat shorter, and
even much shorter when viewed with '--ignore-all-space'.  Patch 13
cleans up after those preparatory steps.

Changes since v1:

  - Patch 12 now eliminates the recursion using a LIFO 'prio_queue'
    instead of a 'commit_list' to avoid any performance penalty.

  - Commit message updates, clarifications, typofixes, missing
    signoffs, etc., most notably in patches 6 and 12.

  - Updated ASCII art history graphs.

  - Replaced the strip_suffix() cleanup in patch 3 with René's
    suggestion; now that patch needs his signoff.

  - Dropped the last two patches plugging memory leaks; René's plan
    to clean up memory ownership looked more promising, and that
    would make these two dropped patches moot anyway.
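
As a rough illustration of the general idea only (not the code in patch
12), replacing recursion with an explicit LIFO stack can be sketched as
below.  'struct node' and walk_iteratively() are hypothetical names, and
the sketch uses a plain heap-allocated array where the series uses a
'prio_queue'.

```c
#include <stdlib.h>

/* Hypothetical commit node; not Git's 'struct commit'. */
struct node {
	struct node *parents[2];
	int nr_parents;
	int visited;
};

/*
 * Visit every node reachable from 'start' without recursion, using a
 * heap-allocated LIFO stack, so that the depth of the history is
 * limited by available memory rather than by the call stack.
 * Returns the number of newly visited nodes, or -1 if an allocation
 * fails.
 */
static long walk_iteratively(struct node *start)
{
	size_t nr = 0, alloc = 16;
	struct node **stack = malloc(alloc * sizeof(*stack));
	long count = 0;

	if (!stack)
		return -1;
	stack[nr++] = start;
	while (nr) {
		struct node *n = stack[--nr];
		int i;

		if (n->visited)
			continue;
		n->visited = 1;
		count++;
		/* Push in reverse so that the first parent is popped first. */
		for (i = n->nr_parents - 1; i >= 0; i--) {
			if (nr == alloc) {
				struct node **tmp;

				alloc *= 2;
				tmp = realloc(stack, alloc * sizeof(*stack));
				if (!tmp) {
					free(stack);
					return -1;
				}
				stack = tmp;
			}
			stack[nr++] = n->parents[i];
		}
	}
	free(stack);
	return count;
}
```

With a loop like this, a long linear history only keeps a handful of
entries on the explicit stack instead of one call frame per commit.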

v1: https://public-inbox.org/git/20190919214712.7348-1-szeder.dev@gmail.com/T/#u

René Scharfe (1):
  name-rev: use strbuf_strip_suffix() in get_rev_name()

SZEDER Gábor (12):
  t6120-describe: correct test repo history graph in comment
  t6120-describe: modernize the 'check_describe' helper
  name-rev: avoid unnecessary cast in name_ref()
  name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  t6120: add a test to cover inner conditions in 'git name-rev's
    name_rev()
  name-rev: extract creating/updating a 'struct name_rev' into a helper
  name-rev: pull out deref handling from the recursion
  name-rev: restructure parsing commits and applying date cutoff
  name-rev: restructure creating/updating 'struct rev_name' instances
  name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  name-rev: eliminate recursion in name_rev()
  name-rev: cleanup name_ref()

 builtin/name-rev.c  | 147 +++++++++++++++++++++++++++++---------------
 t/t6120-describe.sh |  72 +++++++++++++++++-----
 2 files changed, 153 insertions(+), 66 deletions(-)

Range-diff:
 1:  673da20e3d !  1:  8d70ed050d t6120-describe: correct test repo history graph in comment
    @@ t/t6120-describe.sh
     -test_description='test describe
     +test_description='test describe'
     +
    -+#       ,---o----o----o-----.
    -+#      /   D,R   e           \
    -+#  o--o-----o-------------o---o----x
    -+#      \    B            /
    -+#       `---o----o----o-'
    -+#                A    c
    ++#  o---o-----o----o----o-------o----x
    ++#       \   D,R   e           /
    ++#        \---o-------------o-'
    ++#         \  B            /
    ++#          `-o----o----o-'
    ++#                 A    c
    ++#
    ++# First parent of a merge commit is on the same line, second parent below.
      
     -                       B
     -        .--------------o----o----o----x
 2:  05df899693 =  2:  3720b6859d t6120-describe: modernize the 'check_describe' helper
 3:  7b0227cfea !  3:  ad2f2eee68 name-rev: use strip_suffix() in get_rev_name()
    @@
      ## Metadata ##
    -Author: SZEDER Gábor <szeder.dev@gmail.com>
    +Author: René Scharfe <l.s.r@web.de>
     
      ## Commit message ##
    -    name-rev: use strip_suffix() in get_rev_name()
    +    name-rev: use strbuf_strip_suffix() in get_rev_name()
     
    -    Use strip_suffix() instead of open-coding it, making the code more
    -    idiomatic.
    +    get_rev_name() basically open-codes strip_suffix() before adding a
    +    string to a strbuf.
     
    +    Let's use the strbuf right from the beginning, i.e. add the whole
    +    string to the strbuf and then use strbuf_strip_suffix(), making the
    +    code more idiomatic.
    +
    +    [TODO: René's signoff!]
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
      ## builtin/name-rev.c ##
    @@ builtin/name-rev.c: static const char *get_rev_name(const struct object *o, stru
     -		int len = strlen(n->tip_name);
     -		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
     -			len -= 2;
    -+		size_t len;
    -+		strip_suffix(n->tip_name, "^0", &len);
      		strbuf_reset(buf);
     -		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
    -+		strbuf_addf(buf, "%.*s~%d", (int) len, n->tip_name,
    -+			    n->generation);
    ++		strbuf_addstr(buf, n->tip_name);
    ++		strbuf_strip_suffix(buf, "^0");
    ++		strbuf_addf(buf, "~%d", n->generation);
      		return buf->buf;
      	}
      }
 4:  40faecdc2a =  4:  c86a2ae2d0 name-rev: avoid unnecessary cast in name_ref()
 5:  c71df3dadf =  5:  4fc960cc05 name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
 6:  1dcb76072f !  6:  1493cb4484 t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
    @@ Commit message
         looks like this:
     
           if (parent_number > 1) {
    -        if (generation > 0)
    -          // do stuff #1
    -        else
    -          // do stuff #2
    +          if (generation > 0)
    +              // branch #1
    +              new_name = ...
    +          else
    +              // branch #2
    +              new_name = ...
    +          name_rev(parent, new_name, ...);
           } else {
    -         // do stuff #3
    +          // branch #3
    +          name_rev(...);
           }
     
         These conditions are not covered properly in the test suite.  As far
    @@ Commit message
         command's output, because the repository used in that test script
         contains several branches and tags pointing somewhere into the middle
         of the commit DAG, and thus result in a better name for the
    -    to-be-named commit.  In an early version of this patch series I
    -    managed to mess up those conditions (every single one of them at
    -    once!), but the whole test suite still passed successfully.
    +    to-be-named commit.  This can hide bugs: e.g. by replacing the
    +    'new_name' parameter of the first recursive name_rev() call with
    +    'tip_name' (effectively making both branch #1 and #2 a noop) 'git
    +    name-rev --all' shows thousands of bogus names in the Git repository,
    +    but the whole test suite still passes successfully.  In an early
    +    version of a later patch in this series I managed to mess up all three
    +    branches (at once!), but the test suite still passed.
     
         So add a new test case that operates on the following history:
     
    -        -----------master
    -       /          /
    -      A----------M2
    -       \        /
    -        \---M1-C
    -         \ /
    -          B
    +      A--------------master
    +       \            /
    +        \----------M2
    +         \        /
    +          \---M1-C
    +           \ /
    +            B
     
    -    and names the commit 'B', where:
    +    and names the commit 'B' to make sure that all three branches are
    +    crucial to determine 'B's name:
     
    -      - The merge commit at master makes sure that the 'do stuff #3'
    -        affects the final name.
    +      - There is only a single ref, so all names are based on 'master',
    +        without any undesired interference from other refs.
     
    -      - The merge commit M2 make sure that the 'do stuff #1' part
    -        affects the final name.
    +      - Each time name_rev() follows the second parent of a merge commit,
    +        it appends "^2" to the name.  Following 'master's second parent
    +        right at the start ensures that all commits on the ancestry path
    +        from 'master' to 'B' have a different base name from the original
    +        'tip_name' of the very first name_rev() invocation.  Currently,
    +        while name_rev() is recursive, it doesn't matter, but it will be
    +        necessary to properly cover all three branches after the recursion
    +        is eliminated later in this series.
     
    -      - And M1 makes sure that the 'do stuff #2' part affects the final
    -        name.
    +      - Following 'M2's second parent makes sure that branch #2 (i.e. when
    +        'generation = 0') affects 'B's name.
    +
    +      - Following the only parent of the non-merge commit 'C' ensures that
    +        branch #3 affects 'B's name, and that it increments 'generation'.
    +
    +      - Coming from 'C', 'generation' is 1, thus following 'M1's second
    +        parent makes sure that branch #1 affects 'B's name.
     
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
      ## t/t6120-describe.sh ##
    -@@ t/t6120-describe.sh: test_expect_success 'describe complains about missing object' '
    - 	test_must_fail git describe $ZERO_OID
    +@@ t/t6120-describe.sh: test_expect_success 'name-rev a rev shortly after epoch' '
    + 	test_cmp expect actual
      '
      
    -+#   -----------master
    -+#  /          /
    -+# A----------M2
    -+#  \        /
    -+#   \---M1-C
    -+#    \ /
    -+#     B
    -+test_expect_success 'test' '
    ++# A--------------master
    ++#  \            /
    ++#   \----------M2
    ++#    \        /
    ++#     \---M1-C
    ++#      \ /
    ++#       B
    ++test_expect_success 'name-rev covers all conditions while looking at parents' '
     +	git init repo &&
     +	(
     +		cd repo &&
    @@ t/t6120-describe.sh: test_expect_success 'describe complains about missing objec
     +		git checkout master &&
     +		git merge --no-ff HEAD@{1} &&
     +
    -+		git log --graph --oneline &&
    -+
     +		echo "$B master^2^2~1^2" >expect &&
     +		git name-rev $B >actual &&
     +
 7:  bdd8378b06 =  7:  fc842e578b name-rev: extract creating/updating a 'struct name_rev' into a helper
 8:  ce21c351f9 !  8:  7f182503e2 name-rev: pull out deref handling from the recursion
    @@ Commit message
         Extract this condition from the recursion into name_rev()'s caller and
         drop the function's 'deref' parameter.  This makes eliminating the
         recursion a bit easier to follow, and it will be moved back into
    -    name_rev() after the recursion is elminated.
    +    name_rev() after the recursion is eliminated.
     
         Furthermore, drop the condition that die()s when both 'deref' and
         'generation' are non-null (which should have been a BUG() to begin
    @@ Commit message
     
         Note that this change reintroduces the memory leak that was plugged
         in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
    -    case, 2017-05-04), but a later patch in this series will plug it in
    -    again.
    +    case, 2017-05-04), but a later patch (name-rev: restructure
    +    creating/updating 'struct rev_name' instances) in this series will
    +    plug it again.
     
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
 9:  c8acc6b597 !  9:  0cdd40b75b name-rev: restructure parsing commits and applying date cutoff
    @@ Commit message
         name_rev() and name_rev() itself as it iterates over the parent
         commits.
     
    -    This makes eliminating the recursion a bit easier to follow, and it
    -    will be moved back to name_rev() after the recursion is eliminated.
    +    This makes eliminating the recursion a bit easier to follow, and the
    +    condition moved to name_ref() will be moved back to name_rev() after
    +    the recursion is eliminated.
     
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
10:  c731f27158 ! 10:  e1733e3c56 name-rev: restructure creating/updating 'struct rev_name' instances
    @@ Commit message
         At the beginning of the recursive name_rev() function it creates a new
         'struct rev_name' instance for each previously unvisited commit or, if
         this visit results in a better name for an already visited commit, then
    -    updates the 'struct rev_name' instance attached to to the commit, or
    +    updates the 'struct rev_name' instance attached to the commit, or
         returns early.
     
         Restructure this so its caller creates or updates the 'struct
    @@ Commit message
         parameter, i.e. both name_ref() before calling name_rev() and
         name_rev() itself as it iterates over the parent commits.
     
    -    This makes eliminating the recursion a bit easier to follow, and it
    -    will be moved back to name_rev() after the recursion is eliminated.
    +    This makes eliminating the recursion a bit easier to follow, and the
    +    condition moved to name_ref() will be moved back to name_rev() after
    +    the recursion is eliminated.
     
         This change also plugs the memory leak that was temporarily unplugged
         in the earlier "name-rev: pull out deref handling from the recursion"
11:  ba14bde230 ! 11:  bd6e2e6d87 name-rev: drop name_rev()'s 'generation' and 'distance' parameters
    @@ Commit message
         'taggerdate' and 'from_tag' parameters as well, but those parameters
         will be necessary later, after the recursion is eliminated.
     
    -    Drop name_rev()'s 'generation' and 'distance' parameters.
    +    Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
      ## builtin/name-rev.c ##
     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit,
12:  2d03ac11f3 ! 12:  0cf63c6d64 name-rev: eliminate recursion in name_rev()
    @@ Commit message
         segfault when processing a deep history if it exhausts the available
         stack space.  E.g. running 'git name-rev --all' and 'git name-rev
         HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
    -    results in segfaults on my machine.
    +    results in segfaults on my machine ('ulimit -s' reports 8192kB of
    +    stack size limit), and nowadays the former segfaults in the Linux repo
    +    as well (it reached the necessary depth sometime between v5.3-rc4 and
    +    -rc5).
     
         Eliminate the recursion by inserting the interesting parents into a
    -    'commit_list' and iteratating until the list becomes empty.
    +    LIFO 'prio_queue' [1] and iterating until the queue becomes empty.
     
    -    Note that the order in which the parent commits are added to that list
    -    is important: they must be inserted at the beginning of the list, and
    -    their relative order must be kept as well, because otherwise
    -    performance suffers.
    +    Note that the parent commits must be added in reverse order to the
    +    LIFO 'prio_queue', so their relative order is preserved during
    +    processing, i.e. the first parent should come out first from the
    +    queue, because otherwise performance greatly suffers on mergy
    +    histories [2].
     
         The stacksize-limited test 'name-rev works in a deep repo' in
         't6120-describe.sh' demonstrated this issue and expected failure.  Now
    -    the recursion is gone, so flip it to expect success.
    -
    -    Also gone are the dmesg entries logging the segfault of the git
    -    process on every execution of the test suite.
    -
    -    Unfortunately, eliminating the recursion comes with a performance
    -    penaly: 'git name-rev --all' tends to be between 15-20% slower than
    -    before.
    +    the recursion is gone, so flip it to expect success.  Also gone are
    +    the dmesg entries logging the segfault of that 'git name-rev'
    +    process on every execution of the test suite.
     
         Note that this slightly changes the order of lines in the output of
         'git name-rev --all', usually swapping two lines every 35 lines in
    @@ Commit message
     
         This patch is best viewed with '--ignore-all-space'.
     
    +    [1] Early versions of this patch used a 'commit_list', resulting in
    +        ~15% performance penalty for 'git name-rev --all' in 'linux.git',
    +        presumably because of the memory allocation and release for each
    +        insertion and removal. Using a LIFO 'prio_queue' has basically no
    +        effect on performance.
    +
    +    [2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
    +        'v0.1^2~5', meaning that usually following the first parent of a
    +        merge results in the best name for its ancestors.  So when later
    +        we follow the remaining parent(s) of a merge, and reach an already
    +        named commit, then we usually find that we can't give that commit
    +        a better name, and thus we don't have to visit any of its
    +        ancestors again.
    +
    +        OTOH, if we were to follow the Nth parent of the merge first, then
    +        the name of all its ancestors would include a corresponding '^N'.
    +        Those are not the best names for those commits, so when later we
    +        reach an already named commit following the first parent of that
    +        merge, then we would have to update the name of that commit and
    +        the names of all of its ancestors as well.  Consequently, we would
    +        have to visit many commits several times, resulting in a
    +        significant slowdown.
    +
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
      ## builtin/name-rev.c ##
    +@@
    + #include "tag.h"
    + #include "refs.h"
    + #include "parse-options.h"
    ++#include "prio-queue.h"
    + #include "sha1-lookup.h"
    + #include "commit-slab.h"
    + 
     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit,
      		return NULL;
      }
    @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
     -		parse_commit(parent);
     -		if (parent->date < cutoff)
     -			continue;
    -+	struct commit_list *list = NULL;
    ++	struct prio_queue queue;
    ++	struct commit *commit;
    ++	struct commit **parents_to_queue = NULL;
    ++	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
     +
    -+	commit_list_insert(start_commit, &list);
    ++	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
    ++	prio_queue_put(&queue, start_commit);
     +
    -+	while (list) {
    -+		struct commit *commit = pop_commit(&list);
    ++	while ((commit = prio_queue_get(&queue))) {
     +		struct rev_name *name = get_commit_rev_name(commit);
    -+		struct commit_list *parents, *new_parents = NULL;
    -+		struct commit_list **last_new_parent = &new_parents;
    ++		struct commit_list *parents;
     +		int parent_number = 1;
     +
    ++		parents_to_queue_nr = 0;
    ++
     +		for (parents = commit->parents;
     +				parents;
     +				parents = parents->next, parent_number++) {
    @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
     -			distance = name->distance + 1;
     +			if (create_or_update_name(parent, new_name, taggerdate,
     +						  generation, distance,
    -+						  from_tag))
    -+				last_new_parent = commit_list_append(parent,
    -+						  last_new_parent);
    ++						  from_tag)) {
    ++				ALLOC_GROW(parents_to_queue,
    ++					   parents_to_queue_nr + 1,
    ++					   parents_to_queue_alloc);
    ++				parents_to_queue[parents_to_queue_nr] = parent;
    ++				parents_to_queue_nr++;
    ++			}
      		}
      
     -		if (create_or_update_name(parent, new_name, taggerdate,
     -					  generation, distance,
     -					  from_tag))
     -			name_rev(parent, new_name, taggerdate, from_tag);
    -+		*last_new_parent = list;
    -+		list = new_parents;
    ++		/* The first parent must come out first from the prio_queue */
    ++		while (parents_to_queue_nr)
    ++			prio_queue_put(&queue,
    ++				       parents_to_queue[--parents_to_queue_nr]);
      	}
    ++
    ++	clear_prio_queue(&queue);
    ++	free(parents_to_queue);
      }
      
    + static int subpath_matches(const char *path, const char *filter)
     
      ## t/t6120-describe.sh ##
     @@ t/t6120-describe.sh: test_expect_success 'describe tag object' '
13:  1ef69550ca ! 13:  316f7af43c name-rev: cleanup name_ref()
    @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
     -		int from_tag)
     +		int from_tag, int deref)
      {
    - 	struct commit_list *list = NULL;
    + 	struct prio_queue queue;
    + 	struct commit *commit;
    + 	struct commit **parents_to_queue = NULL;
    + 	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
     +	char *to_free = NULL;
     +
     +	parse_commit(start_commit);
    @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
     +		return;
     +	}
      
    - 	commit_list_insert(start_commit, &list);
    - 
    + 	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
    + 	prio_queue_put(&queue, start_commit);
     @@ builtin/name-rev.c: static int name_ref(const char *path, const struct object_id *oid, int flags, vo
      		if (taggerdate == TIME_MAX)
      			taggerdate = commit->date;
14:  9d513b3092 <  -:  ---------- name-rev: plug a memory leak in name_rev()
15:  8489abb62e <  -:  ---------- name-rev: plug a memory leak in name_rev() in the deref case
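
To illustrate the ordering requirement described for patch 12 above
(the first parent must come out first from a LIFO), here is a minimal
standalone sketch.  It does not use Git's 'prio_queue' API; 'int_stack',
stack_push(), stack_pop() and push_parents_reversed() are hypothetical
names, and commits are represented by plain non-negative ints.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAP 64

/* Hypothetical fixed-size LIFO standing in for the LIFO prio_queue. */
struct int_stack {
	int items[STACK_CAP];
	size_t nr;
};

static void stack_push(struct int_stack *s, int item)
{
	assert(s->nr < STACK_CAP);
	s->items[s->nr++] = item;
}

/* Returns -1 when the stack is empty (ids are non-negative here). */
static int stack_pop(struct int_stack *s)
{
	return s->nr ? s->items[--s->nr] : -1;
}

/*
 * Push the parents in reverse order, so that the first parent ends up
 * on top of the stack and is the first one to come out.
 */
static void push_parents_reversed(struct int_stack *s,
				  const int *parents, size_t nr)
{
	while (nr)
		stack_push(s, parents[--nr]);
}
```

Pushing the parents { 1, 2, 3 } this way and then popping yields 1, 2,
3, i.e. the same relative order in which the recursive version would
have visited them.
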
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 01/13] t6120-describe: correct test repo history graph in comment
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 02/13] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

At the top of 't6120-describe.sh' an ASCII graph illustrates the
repository's history used in this test script.  This graph is a bit
misleading, because it swaps the first and second parents of the
second merge commit.

When describing/naming a commit it does make a difference which parent
is the first and which is the second/Nth, so update this graph to
accurately represent that second merge.

While at it, move this history graph from the 'test_description'
variable to a regular comment.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 45047d0a72..9b184179d1 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -1,15 +1,16 @@
 #!/bin/sh
 
-test_description='test describe
+test_description='test describe'
+
+#  o---o-----o----o----o-------o----x
+#       \   D,R   e           /
+#        \---o-------------o-'
+#         \  B            /
+#          `-o----o----o-'
+#                 A    c
+#
+# First parent of a merge commit is on the same line, second parent below.
 
-                       B
-        .--------------o----o----o----x
-       /                   /    /
- o----o----o----o----o----.    /
-       \        A    c        /
-        .------------o---o---o
-                   D,R   e
-'
 . ./test-lib.sh
 
 check_describe () {
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 02/13] t6120-describe: modernize the 'check_describe' helper
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 01/13] t6120-describe: correct test repo history graph in comment SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-27 18:02     ` Jonathan Tan
  2019-11-12 10:38   ` [PATCH v2 03/13] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
                     ` (12 subsequent siblings)
  14 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

The 'check_describe' helper function runs 'git describe' outside of
'test_expect_success' blocks, with extra hand-rolled code to record
and examine its exit code.

Update this helper and move the 'git describe' invocation inside the
'test_expect_success' block.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 9b184179d1..a2988fa0c2 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -16,14 +16,12 @@ test_description='test describe'
 check_describe () {
 	expect="$1"
 	shift
-	R=$(git describe "$@" 2>err.actual)
-	S=$?
-	cat err.actual >&3
-	test_expect_success "describe $*" '
-	test $S = 0 &&
+	describe_opts="$@"
+	test_expect_success "describe $describe_opts" '
+	R=$(git describe $describe_opts 2>err.actual) &&
 	case "$R" in
 	$expect)	echo happy ;;
-	*)	echo "Oops - $R is not $expect";
+	*)	echo "Oops - $R is not $expect" &&
 		false ;;
 	esac
 	'
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 03/13] name-rev: use strbuf_strip_suffix() in get_rev_name()
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 01/13] t6120-describe: correct test repo history graph in comment SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 02/13] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 19:02     ` René Scharfe
  2019-11-12 10:38   ` [PATCH v2 04/13] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
                     ` (11 subsequent siblings)
  14 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

From: René Scharfe <l.s.r@web.de>

get_rev_name() basically open-codes strip_suffix() before adding a
string to a strbuf.

Let's use the strbuf right from the beginning, i.e. add the whole
string to the strbuf and then use strbuf_strip_suffix(), making the
code more idiomatic.

[TODO: René's signoff!]
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
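
Not part of the commit message: a minimal standalone sketch of the
"add the whole string first, then strip the suffix, then append"
approach described above.  It uses plain C strings instead of Git's
strbuf API; strip_suffix_in_place() and format_rev_name() are
hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if the NUL-terminated string in 'buf' ends with
 * 'suffix', cut the suffix off in place and return 1; otherwise leave
 * 'buf' alone and return 0.
 */
static int strip_suffix_in_place(char *buf, const char *suffix)
{
	size_t len = strlen(buf), suflen = strlen(suffix);

	if (len < suflen || strcmp(buf + len - suflen, suffix))
		return 0;
	buf[len - suflen] = '\0';
	return 1;
}

/* Build "<tip_name without a trailing ^0>~<generation>" in 'out'. */
static void format_rev_name(char *out, size_t outsz,
			    const char *tip_name, int generation)
{
	size_t len;

	assert(outsz > 0);
	snprintf(out, outsz, "%s", tip_name);
	strip_suffix_in_place(out, "^0");
	len = strlen(out);
	snprintf(out + len, outsz - len, "~%d", generation);
}
```

With these helpers a tip name of "v1.0^0" and a generation of 3 give
"v1.0~3", while a tip name without the "^0" suffix is used unchanged.
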
 builtin/name-rev.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index b0f0776947..15919adbdb 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -321,11 +321,10 @@ static const char *get_rev_name(const struct object *o, struct strbuf *buf)
 	if (!n->generation)
 		return n->tip_name;
 	else {
-		int len = strlen(n->tip_name);
-		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
-			len -= 2;
 		strbuf_reset(buf);
-		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
+		strbuf_addstr(buf, n->tip_name);
+		strbuf_strip_suffix(buf, "^0");
+		strbuf_addf(buf, "~%d", n->generation);
 		return buf->buf;
 	}
 }
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 04/13] name-rev: avoid unnecessary cast in name_ref()
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (2 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 03/13] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 05/13] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

Casting a 'struct object' to 'struct commit' is unnecessary there,
because it's already available in the local 'commit' variable.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 15919adbdb..e40f51c2b4 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -271,9 +271,9 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		struct commit *commit = (struct commit *)o;
 		int from_tag = starts_with(path, "refs/tags/");
 
 		if (taggerdate == TIME_MAX)
-			taggerdate = ((struct commit *)o)->date;
+			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
 			 from_tag, deref);
 	}
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 05/13] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (3 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 04/13] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 06/13] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
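
Not part of the commit message: a small sketch of the idiom named in
the subject line, for readers unfamiliar with it.  'struct name_record'
and name_record_new() are hypothetical and only loosely modelled on the
code touched below; plain malloc() stands in for Git's xmalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct, loosely modelled on a per-commit name record. */
struct name_record {
	const char *tip_name;
	int generation;
};

static struct name_record *name_record_new(const char *tip_name)
{
	/*
	 * sizeof(*rec) is derived from the pointer's own type, so the
	 * allocation size stays correct even if the type of 'rec' is
	 * changed later on.
	 */
	struct name_record *rec = malloc(sizeof(*rec));

	/* Sketch only: real code should handle allocation failure. */
	assert(rec != NULL);
	rec->tip_name = tip_name;
	rec->generation = 0;
	return rec;
}
```
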
 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e40f51c2b4..7e003c2702 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -102,7 +102,7 @@ static void name_rev(struct commit *commit,
 	}
 
 	if (name == NULL) {
-		name = xmalloc(sizeof(rev_name));
+		name = xmalloc(sizeof(*name));
 		set_commit_rev_name(commit, name);
 		goto copy_data;
 	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 06/13] t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (4 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 05/13] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 07/13] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

In 'builtin/name-rev.c' in the name_rev() function there is a loop
iterating over all parents of the given commit, and the loop body
looks like this:

  if (parent_number > 1) {
      if (generation > 0)
          // branch #1
          new_name = ...
      else
          // branch #2
          new_name = ...
      name_rev(parent, new_name, ...);
  } else {
      // branch #3
      name_rev(...);
  }

These conditions are not covered properly in the test suite.  As far
as pure test coverage goes, they are all executed several times over
in 't6120-describe.sh'.  However, they don't directly influence the
command's output, because the repository used in that test script
contains several branches and tags pointing somewhere into the middle
of the commit DAG, and thus result in a better name for the
to-be-named commit.  This can hide bugs: e.g. by replacing the
'new_name' parameter of the first recursive name_rev() call with
'tip_name' (effectively making both branch #1 and #2 a noop) 'git
name-rev --all' shows thousands of bogus names in the Git repository,
but the whole test suite still passes successfully.  In an early
version of a later patch in this series I managed to mess up all three
branches (at once!), but the test suite still passed.

So add a new test case that operates on the following history:

  A--------------master
   \            /
    \----------M2
     \        /
      \---M1-C
       \ /
        B

and names the commit 'B' to make sure that all three branches are
crucial to determine 'B's name:

  - There is only a single ref, so all names are based on 'master',
    without any undesired interference from other refs.

  - Each time name_rev() follows the second parent of a merge commit,
    it appends "^2" to the name.  Following 'master's second parent
    right at the start ensures that all commits on the ancestry path
    from 'master' to 'B' have a different base name from the original
    'tip_name' of the very first name_rev() invocation.  Currently,
    while name_rev() is recursive, it doesn't matter, but it will be
    necessary to properly cover all three branches after the recursion
    is eliminated later in this series.

  - Following 'M2's second parent makes sure that branch #2 (i.e. when
    'generation = 0') affects 'B's name.

  - Following the only parent of the non-merge commit 'C' ensures that
    branch #3 affects 'B's name, and that it increments 'generation'.

  - Coming from 'C', 'generation' is 1, thus following 'M1's second
    parent makes sure that branch #1 affects 'B's name.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
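
Not part of the commit message: a standalone sketch of how the three
branches described above combine into the expected name of 'B'.  It is
a simplified model, not the code in 'builtin/name-rev.c': 'struct
walk_name', step_to_parent() and format_name() are hypothetical, and
details such as stripping a trailing "^0" or comparing candidate names
are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state: the base name plus the first-parent step count. */
struct walk_name {
	char tip_name[128];
	int generation;
};

/* Move from a commit to its parent number 'parent_number' (1-based). */
static void step_to_parent(struct walk_name *w, int parent_number)
{
	size_t len = strlen(w->tip_name);

	assert(parent_number >= 1);
	if (parent_number > 1) {
		if (w->generation > 0)
			/* branch #1: record first-parent steps, then ^N */
			snprintf(w->tip_name + len,
				 sizeof(w->tip_name) - len,
				 "~%d^%d", w->generation, parent_number);
		else
			/* branch #2: no first-parent steps to record */
			snprintf(w->tip_name + len,
				 sizeof(w->tip_name) - len,
				 "^%d", parent_number);
		w->generation = 0;
	} else {
		/* branch #3: same base name, one more first-parent step */
		w->generation++;
	}
}

/* Format the final name, appending "~<generation>" only if non-zero. */
static void format_name(const struct walk_name *w, char *out, size_t outsz)
{
	if (w->generation)
		snprintf(out, outsz, "%s~%d", w->tip_name, w->generation);
	else
		snprintf(out, outsz, "%s", w->tip_name);
}
```

Starting from "master" and following parent 2 (to 'M2'), parent 2 (to
'C'), parent 1 (to 'M1') and parent 2 (to 'B') exercises branches #2,
#2, #3 and #1 in that order and produces "master^2^2~1^2", the name the
new test expects.
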
 t/t6120-describe.sh | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index a2988fa0c2..0d119e9652 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -438,4 +438,45 @@ test_expect_success 'name-rev a rev shortly after epoch' '
 	test_cmp expect actual
 '
 
+# A--------------master
+#  \            /
+#   \----------M2
+#    \        /
+#     \---M1-C
+#      \ /
+#       B
+test_expect_success 'name-rev covers all conditions while looking at parents' '
+	git init repo &&
+	(
+		cd repo &&
+
+		echo A >file &&
+		git add file &&
+		git commit -m A &&
+		A=$(git rev-parse HEAD) &&
+
+		git checkout --detach &&
+		echo B >file &&
+		git commit -m B file &&
+		B=$(git rev-parse HEAD) &&
+
+		git checkout $A &&
+		git merge --no-ff $B &&  # M1
+
+		echo C >file &&
+		git commit -m C file &&
+
+		git checkout $A &&
+		git merge --no-ff HEAD@{1} && # M2
+
+		git checkout master &&
+		git merge --no-ff HEAD@{1} &&
+
+		echo "$B master^2^2~1^2" >expect &&
+		git name-rev $B >actual &&
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 07/13] name-rev: extract creating/updating a 'struct name_rev' into a helper
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (5 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 06/13] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 08/13] name-rev: pull out deref handling from the recursion SZEDER Gábor
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

Extract the code that creates or updates a commit's 'struct rev_name'
instance into a helper function, because a later patch in this series
will need it in two places.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 7e003c2702..e43df19709 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -79,12 +79,36 @@ static int is_better_name(struct rev_name *name,
 	return 0;
 }
 
+static struct rev_name *create_or_update_name(struct commit *commit,
+					      const char *tip_name,
+					      timestamp_t taggerdate,
+					      int generation, int distance,
+					      int from_tag)
+{
+	struct rev_name *name = get_commit_rev_name(commit);
+
+	if (name == NULL) {
+		name = xmalloc(sizeof(*name));
+		set_commit_rev_name(commit, name);
+		goto copy_data;
+	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
+copy_data:
+		name->tip_name = tip_name;
+		name->taggerdate = taggerdate;
+		name->generation = generation;
+		name->distance = distance;
+		name->from_tag = from_tag;
+
+		return name;
+	} else
+		return NULL;
+}
+
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int generation, int distance, int from_tag,
 		int deref)
 {
-	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
 	int parent_number = 1;
 	char *to_free = NULL;
@@ -101,18 +125,8 @@ static void name_rev(struct commit *commit,
 			die("generation: %d, but deref?", generation);
 	}
 
-	if (name == NULL) {
-		name = xmalloc(sizeof(*name));
-		set_commit_rev_name(commit, name);
-		goto copy_data;
-	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-copy_data:
-		name->tip_name = tip_name;
-		name->taggerdate = taggerdate;
-		name->generation = generation;
-		name->distance = distance;
-		name->from_tag = from_tag;
-	} else {
+	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
+				   distance, from_tag)) {
 		free(to_free);
 		return;
 	}
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 08/13] name-rev: pull out deref handling from the recursion
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (6 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 07/13] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 09/13] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

The 'if (deref) { ... }' condition near the beginning of the recursive
name_rev() function can only ever be true in the first invocation,
because the 'deref' parameter is always 0 in the subsequent recursive
invocations.

Extract this condition from the recursion into name_rev()'s caller and
drop the function's 'deref' parameter.  This makes eliminating the
recursion a bit easier to follow, and it will be moved back into
name_rev() after the recursion is eliminated.
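The caller-side handling described above can be sketched in isolation. This is only an illustration in plain C, not the patch's code: make_tip_name() is a hypothetical helper, and malloc()/snprintf() stand in for git's xstrfmt() and xstrdup().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the caller-side logic: return a freshly
 * allocated tip name, with "^0" appended when 'deref' is set.
 * The caller owns the returned string; NULL means allocation failed.
 */
static char *make_tip_name(const char *path, int deref)
{
	const char *suffix = deref ? "^0" : "";
	size_t size = strlen(path) + strlen(suffix) + 1;
	char *name = malloc(size);

	if (!name)
		return NULL;
	snprintf(name, size, "%s%s", path, suffix);
	return name;
}
```

Because the suffix is decided once, before the walk starts, the walking function itself no longer needs to know about 'deref' at all.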

Furthermore, drop the condition that die()s when both 'deref' and
'generation' are non-zero (which should have been a BUG() to begin
with).

Note that this change reintroduces the memory leak that was plugged
in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
case, 2017-05-04), but a later patch (name-rev: restructure
creating/updating 'struct rev_name' instances) in this series will
plug it again.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e43df19709..e112a92b03 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -106,30 +106,19 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int generation, int distance, int from_tag,
-		int deref)
+		int generation, int distance, int from_tag)
 {
 	struct commit_list *parents;
 	int parent_number = 1;
-	char *to_free = NULL;
 
 	parse_commit(commit);
 
 	if (commit->date < cutoff)
 		return;
 
-	if (deref) {
-		tip_name = to_free = xstrfmt("%s^0", tip_name);
-
-		if (generation)
-			die("generation: %d, but deref?", generation);
-	}
-
 	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
-				   distance, from_tag)) {
-		free(to_free);
+				   distance, from_tag))
 		return;
-	}
 
 	for (parents = commit->parents;
 			parents;
@@ -148,11 +137,11 @@ static void name_rev(struct commit *commit,
 
 			name_rev(parents->item, new_name, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
-				 from_tag, 0);
+				 from_tag);
 		} else {
 			name_rev(parents->item, tip_name, taggerdate,
 				 generation + 1, distance + 1,
-				 from_tag, 0);
+				 from_tag);
 		}
 	}
 }
@@ -284,12 +273,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 	if (o && o->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *)o;
 		int from_tag = starts_with(path, "refs/tags/");
+		const char *tip_name;
 
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
-			 from_tag, deref);
+		if (deref)
+			tip_name = xstrfmt("%s^0", path);
+		else
+			tip_name = xstrdup(path);
+		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
 	}
 	return 0;
 }
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 09/13] name-rev: restructure parsing commits and applying date cutoff
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (7 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 08/13] name-rev: pull out deref handling from the recursion SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 10/13] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

The recursive name_rev() function begins by parsing the commit it got
as parameter, and returns early if the commit is older than a cutoff
limit.

Restructure this so the caller parses the commit and checks its date,
and doesn't invoke name_rev() if the commit to be passed as parameter
is older than the cutoff, i.e. both name_ref() before calling
name_rev() and name_rev() itself as it iterates over the parent
commits.

This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.
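As a rough illustration of checking the date on the caller's side, the sketch below filters a commit's parents before any of them would be handed to the walking function. The 'struct dated_commit' type and the parents_within_cutoff() helper are made up for this example and are much simpler than git's 'struct commit' and 'commit_list'.

```c
#include <stddef.h>

/* Toy commit: just a date and up to two parents (an assumption). */
struct dated_commit {
	long date;				/* commit timestamp */
	size_t nr_parents;
	const struct dated_commit *parents[2];
};

/*
 * Store the parents of 'commit' that are not older than 'cutoff' in
 * 'out' (which must have room for all parents), keeping their
 * relative order, and return how many were stored.
 */
static size_t parents_within_cutoff(const struct dated_commit *commit,
				    long cutoff,
				    const struct dated_commit **out)
{
	size_t i, n = 0;

	for (i = 0; i < commit->nr_parents; i++) {
		const struct dated_commit *parent = commit->parents[i];

		if (parent->date < cutoff)
			continue;	/* too old: never passed on */
		out[n++] = parent;
	}
	return n;
}
```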

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e112a92b03..5041227790 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -111,11 +111,6 @@ static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	parse_commit(commit);
-
-	if (commit->date < cutoff)
-		return;
-
 	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
 				   distance, from_tag))
 		return;
@@ -123,6 +118,12 @@ static void name_rev(struct commit *commit,
 	for (parents = commit->parents;
 			parents;
 			parents = parents->next, parent_number++) {
+		struct commit *parent = parents->item;
+
+		parse_commit(parent);
+		if (parent->date < cutoff)
+			continue;
+
 		if (parent_number > 1) {
 			size_t len;
 			char *new_name;
@@ -135,11 +136,11 @@ static void name_rev(struct commit *commit,
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
 
-			name_rev(parents->item, new_name, taggerdate, 0,
+			name_rev(parent, new_name, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
 				 from_tag);
 		} else {
-			name_rev(parents->item, tip_name, taggerdate,
+			name_rev(parent, tip_name, taggerdate,
 				 generation + 1, distance + 1,
 				 from_tag);
 		}
@@ -273,16 +274,18 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 	if (o && o->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *)o;
 		int from_tag = starts_with(path, "refs/tags/");
-		const char *tip_name;
 
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		if (deref)
-			tip_name = xstrfmt("%s^0", path);
-		else
-			tip_name = xstrdup(path);
-		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+		if (commit->date >= cutoff) {
+			const char *tip_name;
+			if (deref)
+				tip_name = xstrfmt("%s^0", path);
+			else
+				tip_name = xstrdup(path);
+			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+		}
 	}
 	return 0;
 }
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 10/13] name-rev: restructure creating/updating 'struct rev_name' instances
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (8 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 09/13] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-12 10:38   ` [PATCH v2 11/13] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

The recursive name_rev() function begins by creating a new 'struct
rev_name' instance for each previously unvisited commit or, if this
visit results in a better name for an already visited commit, by
updating the 'struct rev_name' instance attached to the commit;
otherwise it returns early.

Restructure this so its caller creates or updates the 'struct
rev_name' instance associated with the commit to be passed as
parameter, i.e. both name_ref() before calling name_rev() and
name_rev() itself as it iterates over the parent commits.

This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.

This change also plugs the memory leak that was temporarily unplugged
in the earlier "name-rev: pull out deref handling from the recursion"
patch in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 5041227790..6416c49f67 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -111,14 +111,12 @@ static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
-				   distance, from_tag))
-		return;
-
 	for (parents = commit->parents;
 			parents;
 			parents = parents->next, parent_number++) {
 		struct commit *parent = parents->item;
+		const char *new_name;
+		int new_generation, new_distance;
 
 		parse_commit(parent);
 		if (parent->date < cutoff)
@@ -126,7 +124,6 @@ static void name_rev(struct commit *commit,
 
 		if (parent_number > 1) {
 			size_t len;
-			char *new_name;
 
 			strip_suffix(tip_name, "^0", &len);
 			if (generation > 0)
@@ -135,15 +132,19 @@ static void name_rev(struct commit *commit,
 			else
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
-
-			name_rev(parent, new_name, taggerdate, 0,
-				 distance + MERGE_TRAVERSAL_WEIGHT,
-				 from_tag);
+			new_generation = 0;
+			new_distance = distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
-			name_rev(parent, tip_name, taggerdate,
-				 generation + 1, distance + 1,
-				 from_tag);
+			new_name = tip_name;
+			new_generation = generation + 1;
+			new_distance = distance + 1;
 		}
+
+		if (create_or_update_name(parent, new_name, taggerdate,
+					  new_generation, new_distance,
+					  from_tag))
+			name_rev(parent, new_name, taggerdate,
+				 new_generation, new_distance, from_tag);
 	}
 }
 
@@ -280,11 +281,17 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		if (commit->date >= cutoff) {
 			const char *tip_name;
+			char *to_free = NULL;
 			if (deref)
-				tip_name = xstrfmt("%s^0", path);
+				tip_name = to_free = xstrfmt("%s^0", path);
 			else
 				tip_name = xstrdup(path);
-			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+			if (create_or_update_name(commit, tip_name, taggerdate,
+						  0, 0, from_tag))
+				name_rev(commit, tip_name, taggerdate, 0, 0,
+					 from_tag);
+			else
+				free(to_free);
 		}
 	}
 	return 0;
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 11/13] name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (9 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 10/13] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-27 18:13     ` Jonathan Tan
  2019-11-12 10:38   ` [PATCH v2 12/13] name-rev: eliminate recursion in name_rev() SZEDER Gábor
                     ` (3 subsequent siblings)
  14 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

Following the previous patches in this series we can get the values of
name_rev()'s 'generation' and 'distance' parameters from the 'struct
rev_name' associated with the commit as well.

Let's simplify the function's signature and remove these two
unnecessary parameters.

Note that at this point we could do the same with the 'tip_name',
'taggerdate' and 'from_tag' parameters as well, but those parameters
will be necessary later, after the recursion is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6416c49f67..fc61d6fa71 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -106,8 +106,9 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int generation, int distance, int from_tag)
+		int from_tag)
 {
+	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
 	int parent_number = 1;
 
@@ -116,7 +117,7 @@ static void name_rev(struct commit *commit,
 			parents = parents->next, parent_number++) {
 		struct commit *parent = parents->item;
 		const char *new_name;
-		int new_generation, new_distance;
+		int generation, distance;
 
 		parse_commit(parent);
 		if (parent->date < cutoff)
@@ -126,25 +127,25 @@ static void name_rev(struct commit *commit,
 			size_t len;
 
 			strip_suffix(tip_name, "^0", &len);
-			if (generation > 0)
+			if (name->generation > 0)
 				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
-						   generation, parent_number);
+						   name->generation,
+						   parent_number);
 			else
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
-			new_generation = 0;
-			new_distance = distance + MERGE_TRAVERSAL_WEIGHT;
+			generation = 0;
+			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
 			new_name = tip_name;
-			new_generation = generation + 1;
-			new_distance = distance + 1;
+			generation = name->generation + 1;
+			distance = name->distance + 1;
 		}
 
 		if (create_or_update_name(parent, new_name, taggerdate,
-					  new_generation, new_distance,
+					  generation, distance,
 					  from_tag))
-			name_rev(parent, new_name, taggerdate,
-				 new_generation, new_distance, from_tag);
+			name_rev(parent, new_name, taggerdate, from_tag);
 	}
 }
 
@@ -288,7 +289,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 				tip_name = xstrdup(path);
 			if (create_or_update_name(commit, tip_name, taggerdate,
 						  0, 0, from_tag))
-				name_rev(commit, tip_name, taggerdate, 0, 0,
+				name_rev(commit, tip_name, taggerdate,
 					 from_tag);
 			else
 				free(to_free);
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 12/13] name-rev: eliminate recursion in name_rev()
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (10 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 11/13] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-27 17:57     ` Jonathan Tan
  2019-11-12 10:38   ` [PATCH v2 13/13] name-rev: cleanup name_ref() SZEDER Gábor
                     ` (2 subsequent siblings)
  14 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space.  E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine ('ulimit -s' reports 8192kB of
stack size limit), and nowadays the former segfaults in the Linux repo
as well (it reached the necessary depth sometime between v5.3-rc4 and
-rc5).

Eliminate the recursion by inserting the interesting parents into a
LIFO 'prio_queue' [1] and iterating until the queue becomes empty.

Note that the parent commits must be added in reverse order to the
LIFO 'prio_queue', so their relative order is preserved during
processing, i.e. the first parent should come out first from the
queue, because otherwise performance greatly suffers on mergy
histories [2].
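The effect of adding parents in reverse order can be seen on a toy graph. The sketch below is not the patch's code: it uses a fixed-size array as the LIFO instead of a 'prio_queue', a made-up 'struct toy_commit', and a plain 'visited' flag in place of create_or_update_name(), so each commit is visited at most once. It assumes the graph has no more than MAX_COMMITS commits.

```c
#include <stddef.h>

#define MAX_PARENTS 2
#define MAX_COMMITS 64

/* Toy commit for illustration only. */
struct toy_commit {
	int id;
	int visited;
	size_t nr_parents;
	struct toy_commit *parents[MAX_PARENTS];
};

/*
 * Visit every commit reachable from 'start' without recursion and
 * record the ids in visiting order in 'order' (room for MAX_COMMITS
 * entries).  Returns the number of commits visited.
 */
static size_t walk_first_parent_first(struct toy_commit *start, int *order)
{
	struct toy_commit *stack[MAX_COMMITS];
	size_t top = 0, n = 0;

	start->visited = 1;
	stack[top++] = start;

	while (top) {
		struct toy_commit *c = stack[--top];
		size_t i;

		order[n++] = c->id;

		/* Push the last parent first, so the first one pops next. */
		for (i = c->nr_parents; i > 0; i--) {
			struct toy_commit *p = c->parents[i - 1];

			if (p->visited)
				continue;
			p->visited = 1;
			stack[top++] = p;
		}
	}
	return n;
}
```

For a merge M with parents A and B that share the parent R, this visits M, A, R, B, which is the same order the recursive version would have used.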

The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure.  Now
the recursion is gone, so flip it to expect success.  Also gone are
the dmesg entries logging the segfaulting 'git name-rev' process on
every execution of the test suite.

Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git.  This shouldn't matter in
practice, because the output has always been unordered anyway.

This patch is best viewed with '--ignore-all-space'.

[1] Early versions of this patch used a 'commit_list', resulting in
    ~15% performance penalty for 'git name-rev --all' in 'linux.git',
    presumably because of the memory allocation and release for each
    insertion and removal. Using a LIFO 'prio_queue' has basically no
    effect on performance.

[2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
    'v0.1^2~5', meaning that usually following the first parent of a
    merge results in the best name for its ancestors.  So when later
    we follow the remaining parent(s) of a merge, and reach an already
    named commit, then we usually find that we can't give that commit
    a better name, and thus we don't have to visit any of its
    ancestors again.

    OTOH, if we were to follow the Nth parent of the merge first, then
    the name of all its ancestors would include a corresponding '^N'.
    Those are not the best names for those commits, so when later we
    reach an already named commit following the first parent of that
    merge, then we would have to update the name of that commit and
    the names of all of its ancestors as well.  Consequently, we would
    have to visit many commits several times, resulting in a
    significant slowdown.
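To make the '^N' names in [2] concrete, here is a small standalone sketch of how the tip name for the Nth parent of a merge is put together. It uses the same two format strings as the diff below, but format_merge_parent_name() is a hypothetical helper writing into a caller-supplied buffer, and the "^0" suffix is stripped by hand instead of with git's strip_suffix().

```c
#include <stdio.h>
#include <string.h>

/*
 * Write into 'buf' the tip name for the 'parent_number'-th parent
 * (2 or higher) of a commit that is 'generation' first-parent steps
 * away from 'tip_name'.  A trailing "^0" on 'tip_name' is dropped.
 */
static void format_merge_parent_name(char *buf, size_t size,
				     const char *tip_name,
				     int generation, int parent_number)
{
	size_t len = strlen(tip_name);

	if (len >= 2 && !strcmp(tip_name + len - 2, "^0"))
		len -= 2;

	if (generation > 0)
		snprintf(buf, size, "%.*s~%d^%d", (int)len, tip_name,
			 generation, parent_number);
	else
		snprintf(buf, size, "%.*s^%d", (int)len, tip_name,
			 parent_number);
}
```

Every ancestor reached through such a parent inherits the longer tip name, which is why names found via the first parent usually win.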

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c  | 99 +++++++++++++++++++++++++++++----------------
 t/t6120-describe.sh |  2 +-
 2 files changed, 65 insertions(+), 36 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index fc61d6fa71..a3b796eac4 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -6,6 +6,7 @@
 #include "tag.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "prio-queue.h"
 #include "sha1-lookup.h"
 #include "commit-slab.h"
 
@@ -104,49 +105,77 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 		return NULL;
 }
 
-static void name_rev(struct commit *commit,
+static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int from_tag)
 {
-	struct rev_name *name = get_commit_rev_name(commit);
-	struct commit_list *parents;
-	int parent_number = 1;
-
-	for (parents = commit->parents;
-			parents;
-			parents = parents->next, parent_number++) {
-		struct commit *parent = parents->item;
-		const char *new_name;
-		int generation, distance;
-
-		parse_commit(parent);
-		if (parent->date < cutoff)
-			continue;
+	struct prio_queue queue;
+	struct commit *commit;
+	struct commit **parents_to_queue = NULL;
+	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
+
+	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
+	prio_queue_put(&queue, start_commit);
+
+	while ((commit = prio_queue_get(&queue))) {
+		struct rev_name *name = get_commit_rev_name(commit);
+		struct commit_list *parents;
+		int parent_number = 1;
+
+		parents_to_queue_nr = 0;
+
+		for (parents = commit->parents;
+				parents;
+				parents = parents->next, parent_number++) {
+			struct commit *parent = parents->item;
+			const char *new_name;
+			int generation, distance;
+
+			parse_commit(parent);
+			if (parent->date < cutoff)
+				continue;
 
-		if (parent_number > 1) {
-			size_t len;
+			if (parent_number > 1) {
+				size_t len;
+
+				strip_suffix(name->tip_name, "^0", &len);
+				if (name->generation > 0)
+					new_name = xstrfmt("%.*s~%d^%d",
+							   (int)len,
+							   name->tip_name,
+							   name->generation,
+							   parent_number);
+				else
+					new_name = xstrfmt("%.*s^%d", (int)len,
+							   name->tip_name,
+							   parent_number);
+				generation = 0;
+				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
+			} else {
+				new_name = name->tip_name;
+				generation = name->generation + 1;
+				distance = name->distance + 1;
+			}
 
-			strip_suffix(tip_name, "^0", &len);
-			if (name->generation > 0)
-				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
-						   name->generation,
-						   parent_number);
-			else
-				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
-						   parent_number);
-			generation = 0;
-			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
-		} else {
-			new_name = tip_name;
-			generation = name->generation + 1;
-			distance = name->distance + 1;
+			if (create_or_update_name(parent, new_name, taggerdate,
+						  generation, distance,
+						  from_tag)) {
+				ALLOC_GROW(parents_to_queue,
+					   parents_to_queue_nr + 1,
+					   parents_to_queue_alloc);
+				parents_to_queue[parents_to_queue_nr] = parent;
+				parents_to_queue_nr++;
+			}
 		}
 
-		if (create_or_update_name(parent, new_name, taggerdate,
-					  generation, distance,
-					  from_tag))
-			name_rev(parent, new_name, taggerdate, from_tag);
+		/* The first parent must come out first from the prio_queue */
+		while (parents_to_queue_nr)
+			prio_queue_put(&queue,
+				       parents_to_queue[--parents_to_queue_nr]);
 	}
+
+	clear_prio_queue(&queue);
+	free(parents_to_queue);
 }
 
 static int subpath_matches(const char *path, const char *filter)
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 0d119e9652..09c50f3f04 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -381,7 +381,7 @@ test_expect_success 'describe tag object' '
 	test_i18ngrep "fatal: test-blob-1 is neither a commit nor blob" actual
 '
 
-test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
+test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
 	do
-- 
2.24.0.388.gde53c094ea



* [PATCH v2 13/13] name-rev: cleanup name_ref()
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (11 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 12/13] name-rev: eliminate recursion in name_rev() SZEDER Gábor
@ 2019-11-12 10:38   ` SZEDER Gábor
  2019-11-27 18:01     ` Jonathan Tan
  2019-11-12 19:17   ` [PATCH v2 00/13] name-rev: eliminate recursion Johannes Schindelin
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
  14 siblings, 1 reply; 98+ messages in thread
From: SZEDER Gábor @ 2019-11-12 10:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, René Scharfe, git, SZEDER Gábor

Earlier patches in this series moved a couple of conditions from the
recursive name_rev() function into its caller name_ref(), for no other
reason than to make eliminating the recursion a bit easier to follow.

As of the previous patch name_rev() is not recursive anymore, so let's
move all those conditions back into name_rev().

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index a3b796eac4..cc488ee319 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -107,12 +107,26 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int from_tag)
+		int from_tag, int deref)
 {
 	struct prio_queue queue;
 	struct commit *commit;
 	struct commit **parents_to_queue = NULL;
 	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
+	char *to_free = NULL;
+
+	parse_commit(start_commit);
+	if (start_commit->date < cutoff)
+		return;
+
+	if (deref)
+		tip_name = to_free = xstrfmt("%s^0", tip_name);
+
+	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
+				   from_tag)) {
+		free(to_free);
+		return;
+	}
 
 	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
 	prio_queue_put(&queue, start_commit);
@@ -309,20 +323,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		if (commit->date >= cutoff) {
-			const char *tip_name;
-			char *to_free = NULL;
-			if (deref)
-				tip_name = to_free = xstrfmt("%s^0", path);
-			else
-				tip_name = xstrdup(path);
-			if (create_or_update_name(commit, tip_name, taggerdate,
-						  0, 0, from_tag))
-				name_rev(commit, tip_name, taggerdate,
-					 from_tag);
-			else
-				free(to_free);
-		}
+		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
 	}
 	return 0;
 }
-- 
2.24.0.388.gde53c094ea



* Re: [PATCH v2 03/13] name-rev: use strbuf_strip_suffix() in get_rev_name()
  2019-11-12 10:38   ` [PATCH v2 03/13] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
@ 2019-11-12 19:02     ` René Scharfe
  0 siblings, 0 replies; 98+ messages in thread
From: René Scharfe @ 2019-11-12 19:02 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: Derrick Stolee, git

Am 12.11.19 um 11:38 schrieb SZEDER Gábor:

Thanks for keeping the ball rolling, Gábor!

> From: René Scharfe <l.s.r@web.de>
>
> get_name_rev() basically open-codes strip_suffix() before adding a
> string to a strbuf.
>
> Let's use the strbuf right from the beginning, i.e. add the whole
> string to the strbuf and then use strbuf_strip_suffix(), making the
> code more idiomatic.
>
> [TODO: René's signoff!]

Signed-off-by: René Scharfe <l.s.r@web.de>

> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  builtin/name-rev.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index b0f0776947..15919adbdb 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -321,11 +321,10 @@ static const char *get_rev_name(const struct object *o, struct strbuf *buf)
>  	if (!n->generation)
>  		return n->tip_name;
>  	else {
> -		int len = strlen(n->tip_name);
> -		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
> -			len -= 2;
>  		strbuf_reset(buf);
> -		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
> +		strbuf_addstr(buf, n->tip_name);
> +		strbuf_strip_suffix(buf, "^0");
> +		strbuf_addf(buf, "~%d", n->generation);
>  		return buf->buf;
>  	}
>  }
>


* Re: [PATCH v2 00/13] name-rev: eliminate recursion
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (12 preceding siblings ...)
  2019-11-12 10:38   ` [PATCH v2 13/13] name-rev: cleanup name_ref() SZEDER Gábor
@ 2019-11-12 19:17   ` Johannes Schindelin
  2019-11-13 19:25     ` Sebastiaan Dammann
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
  14 siblings, 1 reply; 98+ messages in thread
From: Johannes Schindelin @ 2019-11-12 19:17 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Derrick Stolee, René Scharfe, git,
	Sebastiaan Dammann


Hi,

[Cc:ing Sebastiaan, as they indicated in
https://public-inbox.org/git/CAE7Eq9hEiVf1rMNdWx55_nQsz2gVv0N%2Bs1KckK1evtmruqcHyA@mail.gmail.com/t/#u
that they would be interested in testing this]

Sebastiaan, could you test this patch series? Since you are on Windows,
you should be able to do so by

- installing Git for Windows' SDK
  (https://gitforwindows.org/#download-sdk)
- `sdk cd git` (possibly `sdk init git`, although that should be
  implied)
- `sdk init build-extra` followed by
  `/usr/src/build-extra/apply-from-public-inbox.sh
  https://public-inbox.org/git/20191112103821.30265-1-szeder.dev@gmail.com/`
- `sdk build`

The result should include a `git.exe` in `/usr/src/git/` that you can
copy to your server and test via `/path/to/git.exe name-rev ...`.

Ciao,
Johannes

On Tue, 12 Nov 2019, SZEDER Gábor wrote:

> 'git name-rev' is implemented using a recursive algorithm, and,
> consequently, it can segfault in deep histories (e.g. WebKit), and
> thanks to a test case demonstrating this limitation every test run
> results in a dmesg entry logging the segfaulting git process.
>
> This patch series eliminates the recursion.
>
> Patches 1-5 are while-at-it cleanups I noticed on the way, and patch 6
> improves test coverage.  Patches 7-11 are preparatory refactorings
> that are supposed to make this series easier to follow, and make patch
> 12, the one finally eliminating the recursion, somewhat shorter, and
> even much shorter when viewed with '--ignore-all-space'.  Patch 13
> cleans up after those preparatory steps.
>
> Changes since v1:
>
>   - Patch 12 now eliminates the recursion using a LIFO 'prio_queue'
>     instead of a 'commit_list' to avoid any performance penalty.
>
>   - Commit message updates, clarifications, typofixes, missing
>     signoffs, etc., most notably in patches 6 and 12.
>
>   - Updated ASCII art history graphs.
>
>   - Replaced the strbuf_suffix() cleanup in patch 3 with René's
>     suggestion; now that patch needs his signoff.
>
>   - Dropped the last two patches plugging memory leaks; René's plan
>     to clean up memory ownership looked more promising, and that
>     would make these two dropped patches moot anyway.
>
> v1: https://public-inbox.org/git/20190919214712.7348-1-szeder.dev@gmail.com/T/#u
>
> René Scharfe (1):
>   name-rev: use strbuf_strip_suffix() in get_rev_name()
>
> SZEDER Gábor (12):
>   t6120-describe: correct test repo history graph in comment
>   t6120-describe: modernize the 'check_describe' helper
>   name-rev: avoid unnecessary cast in name_ref()
>   name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
>   t6120: add a test to cover inner conditions in 'git name-rev's
>     name_rev()
>   name-rev: extract creating/updating a 'struct name_rev' into a helper
>   name-rev: pull out deref handling from the recursion
>   name-rev: restructure parsing commits and applying date cutoff
>   name-rev: restructure creating/updating 'struct rev_name' instances
>   name-rev: drop name_rev()'s 'generation' and 'distance' parameters
>   name-rev: eliminate recursion in name_rev()
>   name-rev: cleanup name_ref()
>
>  builtin/name-rev.c  | 147 +++++++++++++++++++++++++++++---------------
>  t/t6120-describe.sh |  72 +++++++++++++++++-----
>  2 files changed, 153 insertions(+), 66 deletions(-)
>
> Range-diff:
>  1:  673da20e3d !  1:  8d70ed050d t6120-describe: correct test repo history graph in comment
>     @@ t/t6120-describe.sh
>      -test_description='test describe
>      +test_description='test describe'
>      +
>     -+#       ,---o----o----o-----.
>     -+#      /   D,R   e           \
>     -+#  o--o-----o-------------o---o----x
>     -+#      \    B            /
>     -+#       `---o----o----o-'
>     -+#                A    c
>     ++#  o---o-----o----o----o-------o----x
>     ++#       \   D,R   e           /
>     ++#        \---o-------------o-'
>     ++#         \  B            /
>     ++#          `-o----o----o-'
>     ++#                 A    c
>     ++#
>     ++# First parent of a merge commit is on the same line, second parent below.
>
>      -                       B
>      -        .--------------o----o----o----x
>  2:  05df899693 =  2:  3720b6859d t6120-describe: modernize the 'check_describe' helper
>  3:  7b0227cfea !  3:  ad2f2eee68 name-rev: use strip_suffix() in get_rev_name()
>     @@
>       ## Metadata ##
>     -Author: SZEDER Gábor <szeder.dev@gmail.com>
>     +Author: René Scharfe <l.s.r@web.de>
>
>       ## Commit message ##
>     -    name-rev: use strip_suffix() in get_rev_name()
>     +    name-rev: use strbuf_strip_suffix() in get_rev_name()
>
>     -    Use strip_suffix() instead of open-coding it, making the code more
>     -    idiomatic.
>     +    get_rev_name() basically open-codes strip_suffix() before adding a
>     +    string to a strbuf.
>
>     +    Let's use the strbuf right from the beginning, i.e. add the whole
>     +    string to the strbuf and then use strbuf_strip_suffix(), making the
>     +    code more idiomatic.
>     +
>     +    [TODO: René's signoff!]
>          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>
>       ## builtin/name-rev.c ##
>     @@ builtin/name-rev.c: static const char *get_rev_name(const struct object *o, stru
>      -		int len = strlen(n->tip_name);
>      -		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
>      -			len -= 2;
>     -+		size_t len;
>     -+		strip_suffix(n->tip_name, "^0", &len);
>       		strbuf_reset(buf);
>      -		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
>     -+		strbuf_addf(buf, "%.*s~%d", (int) len, n->tip_name,
>     -+			    n->generation);
>     ++		strbuf_addstr(buf, n->tip_name);
>     ++		strbuf_strip_suffix(buf, "^0");
>     ++		strbuf_addf(buf, "~%d", n->generation);
>       		return buf->buf;
>       	}
>       }
>  4:  40faecdc2a =  4:  c86a2ae2d0 name-rev: avoid unnecessary cast in name_ref()
>  5:  c71df3dadf =  5:  4fc960cc05 name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
>  6:  1dcb76072f !  6:  1493cb4484 t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
>     @@ Commit message
>          looks like this:
>
>            if (parent_number > 1) {
>     -        if (generation > 0)
>     -          // do stuff #1
>     -        else
>     -          // do stuff #2
>     +          if (generation > 0)
>     +              // branch #1
>     +              new_name = ...
>     +          else
>     +              // branch #2
>     +              new_name = ...
>     +          name_rev(parent, new_name, ...);
>            } else {
>     -         // do stuff #3
>     +          // branch #3
>     +          name_rev(...);
>            }
>
>          These conditions are not covered properly in the test suite.  As far
>     @@ Commit message
>          command's output, because the repository used in that test script
>          contains several branches and tags pointing somewhere into the middle
>          of the commit DAG, and thus result in a better name for the
>     -    to-be-named commit.  In an early version of this patch series I
>     -    managed to mess up those conditions (every single one of them at
>     -    once!), but the whole test suite still passed successfully.
>     +    to-be-named commit.  This can hide bugs: e.g. by replacing the
>     +    'new_name' parameter of the first recursive name_rev() call with
>     +    'tip_name' (effectively making both branch #1 and #2 a noop) 'git
>     +    name-rev --all' shows thousands of bogus names in the Git repository,
>     +    but the whole test suite still passes successfully.  In an early
>     +    version of a later patch in this series I managed to mess up all three
>     +    branches (at once!), but the test suite still passed.
>
>          So add a new test case that operates on the following history:
>
>     -        -----------master
>     -       /          /
>     -      A----------M2
>     -       \        /
>     -        \---M1-C
>     -         \ /
>     -          B
>     +      A--------------master
>     +       \            /
>     +        \----------M2
>     +         \        /
>     +          \---M1-C
>     +           \ /
>     +            B
>
>     -    and names the commit 'B', where:
>     +    and names the commit 'B' to make sure that all three branches are
>     +    crucial to determine 'B's name:
>
>     -      - The merge commit at master makes sure that the 'do stuff #3'
>     -        affects the final name.
>     +      - There is only a single ref, so all names are based on 'master',
>     +        without any undesired interference from other refs.
>
>     -      - The merge commit M2 make sure that the 'do stuff #1' part
>     -        affects the final name.
>     +      - Each time name_rev() follows the second parent of a merge commit,
>     +        it appends "^2" to the name.  Following 'master's second parent
>     +        right at the start ensures that all commits on the ancestry path
>     +        from 'master' to 'B' have a different base name from the original
>     +        'tip_name' of the very first name_rev() invocation.  Currently,
>     +        while name_rev() is recursive, it doesn't matter, but it will be
>     +        necessary to properly cover all three branches after the recursion
>     +        is eliminated later in this series.
>
>     -      - And M1 makes sure that the 'do stuff #2' part affects the final
>     -        name.
>     +      - Following 'M2's second parent makes sure that branch #2 (i.e. when
>     +        'generation = 0') affects 'B's name.
>     +
>     +      - Following the only parent of the non-merge commit 'C' ensures that
>     +        branch #3 affects 'B's name, and that it increments 'generation'.
>     +
>     +      - Coming from 'C' 'generation' is 1, thus following 'M1's second
>     +        parent makes sure that branch #1 affects 'B's name.
>
>          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>
>       ## t/t6120-describe.sh ##
>     -@@ t/t6120-describe.sh: test_expect_success 'describe complains about missing object' '
>     - 	test_must_fail git describe $ZERO_OID
>     +@@ t/t6120-describe.sh: test_expect_success 'name-rev a rev shortly after epoch' '
>     + 	test_cmp expect actual
>       '
>
>     -+#   -----------master
>     -+#  /          /
>     -+# A----------M2
>     -+#  \        /
>     -+#   \---M1-C
>     -+#    \ /
>     -+#     B
>     -+test_expect_success 'test' '
>     ++# A--------------master
>     ++#  \            /
>     ++#   \----------M2
>     ++#    \        /
>     ++#     \---M1-C
>     ++#      \ /
>     ++#       B
>     ++test_expect_success 'name-rev covers all conditions while looking at parents' '
>      +	git init repo &&
>      +	(
>      +		cd repo &&
>     @@ t/t6120-describe.sh: test_expect_success 'describe complains about missing objec
>      +		git checkout master &&
>      +		git merge --no-ff HEAD@{1} &&
>      +
>     -+		git log --graph --oneline &&
>     -+
>      +		echo "$B master^2^2~1^2" >expect &&
>      +		git name-rev $B >actual &&
>      +
>  7:  bdd8378b06 =  7:  fc842e578b name-rev: extract creating/updating a 'struct name_rev' into a helper
>  8:  ce21c351f9 !  8:  7f182503e2 name-rev: pull out deref handling from the recursion
>     @@ Commit message
>          Extract this condition from the recursion into name_rev()'s caller and
>          drop the function's 'deref' parameter.  This makes eliminating the
>          recursion a bit easier to follow, and it will be moved back into
>     -    name_rev() after the recursion is elminated.
>     +    name_rev() after the recursion is eliminated.
>
>          Furthermore, drop the condition that die()s when both 'deref' and
>          'generation' are non-null (which should have been a BUG() to begin
>     @@ Commit message
>
>          Note that this change reintroduces the memory leak that was plugged in
>          in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
>     -    case, 2017-05-04), but a later patch in this series will plug it in
>     -    again.
>     +    case, 2017-05-04), but a later patch (name-rev: restructure
>     +    creating/updating 'struct rev_name' instances) in this series will
>     +    plug it in again.
>
>          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>
>  9:  c8acc6b597 !  9:  0cdd40b75b name-rev: restructure parsing commits and applying date cutoff
>     @@ Commit message
>          name_rev() and name_rev() itself as it iterates over the parent
>          commits.
>
>     -    This makes eliminating the recursion a bit easier to follow, and it
>     -    will be moved back to name_rev() after the recursion is eliminated.
>     +    This makes eliminating the recursion a bit easier to follow, and the
>     +    condition moved to name_ref() will be moved back to name_rev() after
>     +    the recursion is eliminated.
>
>          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>
> 10:  c731f27158 ! 10:  e1733e3c56 name-rev: restructure creating/updating 'struct rev_name' instances
>     @@ Commit message
>          At the beginning of the recursive name_rev() function it creates a new
>          'struct rev_name' instance for each previously unvisited commit or, if
>          this visit results in better name for an already visited commit, then
>     -    updates the 'struct rev_name' instance attached to to the commit, or
>     +    updates the 'struct rev_name' instance attached to the commit, or
>          returns early.
>
>          Restructure this so its caller creates or updates the 'struct
>     @@ Commit message
>          parameter, i.e. both name_ref() before calling name_rev() and
>          name_rev() itself as it iterates over the parent commits.
>
>     -    This makes eliminating the recursion a bit easier to follow, and it
>     -    will be moved back to name_rev() after the recursion is eliminated.
>     +    This makes eliminating the recursion a bit easier to follow, and the
>     +    condition moved to name_ref() will be moved back to name_rev() after
>     +    the recursion is eliminated.
>
>          This change also plugs the memory leak that was temporarily unplugged
>          in the earlier "name-rev: pull out deref handling from the recursion"
> 11:  ba14bde230 ! 11:  bd6e2e6d87 name-rev: drop name_rev()'s 'generation' and 'distance' parameters
>     @@ Commit message
>          'taggerdate' and 'from_tag' parameters as well, but those parameters
>          will be necessary later, after the recursion is eliminated.
>
>     -    Drop name_rev()'s 'generation' and 'distance' parameters.
>     +    Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>
>       ## builtin/name-rev.c ##
>      @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit,
> 12:  2d03ac11f3 ! 12:  0cf63c6d64 name-rev: eliminate recursion in name_rev()
>     @@ Commit message
>          segfault when processing a deep history if it exhausts the available
>          stack space.  E.g. running 'git name-rev --all' and 'git name-rev
>          HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
>     -    results in segfaults on my machine.
>     +    results in segfaults on my machine ('ulimit -s' reports 8192kB of
>     +    stack size limit), and nowadays the former segfaults in the Linux repo
>     +    as well (it reached the necessary depth sometime between v5.3-rc4 and
>     +    -rc5).
>
>          Eliminate the recursion by inserting the interesting parents into a
>     -    'commit_list' and iteratating until the list becomes empty.
>     +    LIFO 'prio_queue' [1] and iterating until the queue becomes empty.
>
>     -    Note that the order in which the parent commits are added to that list
>     -    is important: they must be inserted at the beginning of the list, and
>     -    their relative order must be kept as well, because otherwise
>     -    performance suffers.
>     +    Note that the parent commits must be added in reverse order to the
>     +    LIFO 'prio_queue', so their relative order is preserved during
>     +    processing, i.e. the first parent should come out first from the
>     +    queue, because otherwise performance greatly suffers on mergy
>     +    histories [2].
>
>          The stacksize-limited test 'name-rev works in a deep repo' in
>          't6120-describe.sh' demonstrated this issue and expected failure.  Now
>     -    the recursion is gone, so flip it to expect success.
>     -
>     -    Also gone are the dmesg entries logging the segfault of the git
>     -    process on every execution of the test suite.
>     -
>     -    Unfortunately, eliminating the recursion comes with a performance
>     -    penaly: 'git name-rev --all' tends to be between 15-20% slower than
>     -    before.
>     +    the recursion is gone, so flip it to expect success.  Also gone are
>     +    the dmesg entries logging the segfault of that 'git name-rev'
>     +    process on every execution of the test suite.
>
>          Note that this slightly changes the order of lines in the output of
>          'git name-rev --all', usually swapping two lines every 35 lines in
>     @@ Commit message
>
>          This patch is best viewed with '--ignore-all-space'.
>
>     +    [1] Early versions of this patch used a 'commit_list', resulting in
>     +        ~15% performance penalty for 'git name-rev --all' in 'linux.git',
>     +        presumably because of the memory allocation and release for each
>     +        insertion and removal. Using a LIFO 'prio_queue' has basically no
>     +        effect on performance.
>     +
>     +    [2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
>     +        'v0.1^2~5', meaning that usually following the first parent of a
>     +        merge results in the best name for its ancestors.  So when later
>     +        we follow the remaining parent(s) of a merge, and reach an already
>     +        named commit, then we usually find that we can't give that commit
>     +        a better name, and thus we don't have to visit any of its
>     +        ancestors again.
>     +
>     +        OTOH, if we were to follow the Nth parent of the merge first, then
>     +        the name of all its ancestors would include a corresponding '^N'.
>     +        Those are not the best names for those commits, so when later we
>     +        reach an already named commit following the first parent of that
>     +        merge, then we would have to update the name of that commit and
>     +        the names of all of its ancestors as well.  Consequently, we would
>     +        have to visit many commits several times, resulting in a
>     +        significant slowdown.
>     +
>          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>
>       ## builtin/name-rev.c ##
>     +@@
>     + #include "tag.h"
>     + #include "refs.h"
>     + #include "parse-options.h"
>     ++#include "prio-queue.h"
>     + #include "sha1-lookup.h"
>     + #include "commit-slab.h"
>     +
>      @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit,
>       		return NULL;
>       }
>     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
>      -		parse_commit(parent);
>      -		if (parent->date < cutoff)
>      -			continue;
>     -+	struct commit_list *list = NULL;
>     ++	struct prio_queue queue;
>     ++	struct commit *commit;
>     ++	struct commit **parents_to_queue = NULL;
>     ++	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
>      +
>     -+	commit_list_insert(start_commit, &list);
>     ++	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
>     ++	prio_queue_put(&queue, start_commit);
>      +
>     -+	while (list) {
>     -+		struct commit *commit = pop_commit(&list);
>     ++	while ((commit = prio_queue_get(&queue))) {
>      +		struct rev_name *name = get_commit_rev_name(commit);
>     -+		struct commit_list *parents, *new_parents = NULL;
>     -+		struct commit_list **last_new_parent = &new_parents;
>     ++		struct commit_list *parents;
>      +		int parent_number = 1;
>      +
>     ++		parents_to_queue_nr = 0;
>     ++
>      +		for (parents = commit->parents;
>      +				parents;
>      +				parents = parents->next, parent_number++) {
>     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
>      -			distance = name->distance + 1;
>      +			if (create_or_update_name(parent, new_name, taggerdate,
>      +						  generation, distance,
>     -+						  from_tag))
>     -+				last_new_parent = commit_list_append(parent,
>     -+						  last_new_parent);
>     ++						  from_tag)) {
>     ++				ALLOC_GROW(parents_to_queue,
>     ++					   parents_to_queue_nr + 1,
>     ++					   parents_to_queue_alloc);
>     ++				parents_to_queue[parents_to_queue_nr] = parent;
>     ++				parents_to_queue_nr++;
>     ++			}
>       		}
>
>      -		if (create_or_update_name(parent, new_name, taggerdate,
>      -					  generation, distance,
>      -					  from_tag))
>      -			name_rev(parent, new_name, taggerdate, from_tag);
>     -+		*last_new_parent = list;
>     -+		list = new_parents;
>     ++		/* The first parent must come out first from the prio_queue */
>     ++		while (parents_to_queue_nr)
>     ++			prio_queue_put(&queue,
>     ++				       parents_to_queue[--parents_to_queue_nr]);
>       	}
>     ++
>     ++	clear_prio_queue(&queue);
>     ++	free(parents_to_queue);
>       }
>
>     + static int subpath_matches(const char *path, const char *filter)
>
>       ## t/t6120-describe.sh ##
>      @@ t/t6120-describe.sh: test_expect_success 'describe tag object' '
> 13:  1ef69550ca ! 13:  316f7af43c name-rev: cleanup name_ref()
>     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
>      -		int from_tag)
>      +		int from_tag, int deref)
>       {
>     - 	struct commit_list *list = NULL;
>     + 	struct prio_queue queue;
>     + 	struct commit *commit;
>     + 	struct commit **parents_to_queue = NULL;
>     + 	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
>      +	char *to_free = NULL;
>      +
>      +	parse_commit(start_commit);
>     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
>      +		return;
>      +	}
>
>     - 	commit_list_insert(start_commit, &list);
>     -
>     + 	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
>     + 	prio_queue_put(&queue, start_commit);
>      @@ builtin/name-rev.c: static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>       		if (taggerdate == TIME_MAX)
>       			taggerdate = commit->date;
> 14:  9d513b3092 <  -:  ---------- name-rev: plug a memory leak in name_rev()
> 15:  8489abb62e <  -:  ---------- name-rev: plug a memory leak in name_rev() in the deref case
> --
> 2.24.0.388.gde53c094ea
>
>
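
The ordering note in patch 12 (push the parents in reverse so that the
first parent comes out of the LIFO first) can be illustrated with a
minimal sketch.  It assumes a toy 'struct node' in place of git's
'struct commit' and a fixed-size array in place of the 'prio_queue';
the names 'struct node', 'walk_iterative', 'MAX_PARENTS' and
'MAX_NODES' are hypothetical and do not appear in builtin/name-rev.c.
Unlike name_rev(), it visits every node only once and never revisits a
node to give it a better name, so it shows only the traversal order,
not the naming logic.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2
#define MAX_NODES 16

/* Hypothetical stand-in for a commit; not git's 'struct commit'. */
struct node {
	int id;
	int nr_parents;
	struct node *parents[MAX_PARENTS];
	int visited;
};

/*
 * Walk the history reachable from 'start' using an explicit LIFO stack
 * instead of recursion, storing the visit order in 'order'.
 * Returns the number of nodes visited.
 */
static size_t walk_iterative(struct node *start, int *order)
{
	struct node *stack[MAX_NODES];
	size_t stack_nr = 0, order_nr = 0;

	start->visited = 1;
	stack[stack_nr++] = start;

	while (stack_nr) {
		struct node *n = stack[--stack_nr];
		int i;

		order[order_nr++] = n->id;

		/* Push in reverse so that the first parent is popped first. */
		for (i = n->nr_parents - 1; i >= 0; i--) {
			struct node *p = n->parents[i];

			if (p->visited)
				continue;
			/* Mark on push: each node enters the stack at most once. */
			p->visited = 1;
			assert(stack_nr < MAX_NODES);
			stack[stack_nr++] = p;
		}
	}
	return order_nr;
}
```

With a merge M whose parents A and B share the root R, walking from M
visits M, A, R, B: the first parent's side is fully processed before
the second parent is looked at, which is the order the recursive
version followed.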

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 00/13] name-rev: eliminate recursion
  2019-11-12 19:17   ` [PATCH v2 00/13] name-rev: eliminate recursion Johannes Schindelin
@ 2019-11-13 19:25     ` Sebastiaan Dammann
  0 siblings, 0 replies; 98+ messages in thread
From: Sebastiaan Dammann @ 2019-11-13 19:25 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: SZEDER Gábor, Junio C Hamano, Derrick Stolee,
	René Scharfe, git

Hi Johannes,

The patch works very well on Windows.

I tested it on a repository with 146823 commits and a note for every commit.

Regards,
Sebastiaan Dammann

On Tue, 12 Nov 2019 at 20:18, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi,
>
> [Cc:ing Sebastiaan, as they indicated in
> https://public-inbox.org/git/CAE7Eq9hEiVf1rMNdWx55_nQsz2gVv0N%2Bs1KckK1evtmruqcHyA@mail.gmail.com/t/#u
> that they would be interested in testing this]
>
> Sebastiaan, could you test this patch series? Since you are on Windows,
> you should be able to do so by
>
> - installing Git for Windows' SDK
>   (https://gitforwindows.org/#download-sdk)
> - `sdk cd git` (possibly `sdk init git`, although that should be
>   implied)
> - `sdk init build-extra` followed by
>   `/usr/src/build-extra/apply-from-public-inbox.sh
>   https://public-inbox.org/git/20191112103821.30265-1-szeder.dev@gmail.com/`
> - `sdk build`
>
> The result should include a `git.exe` in `/usr/src/git/` that you can
> copy to your server and test via `/path/to/git.exe name-rev ...`.
>
> Ciao,
> Johannes
>
> On Tue, 12 Nov 2019, SZEDER Gábor wrote:
>
> > 'git name-rev' is implemented using a recursive algorithm, and,
> > consequently, it can segfault in deep histories (e.g. WebKit), and
> > thanks to a test case demonstrating this limitation every test run
> > results in a dmesg entry logging the segfaulting git process.
> >
> > This patch series eliminates the recursion.
> >
> > Patches 1-5 are while-at-it cleanups I noticed on the way, and patch 6
> > improves test coverage.  Patches 7-11 are preparatory refactorings
> > that are supposed to make this series easier to follow, and make patch
> > 12, the one finally eliminating the recursion, somewhat shorter, and
> > even much shorter when viewed with '--ignore-all-space'.  Patch 13
> > cleans up after those preparatory steps.
> >
> > Changes since v1:
> >
> >   - Patch 12 now eliminates the recursion using a LIFO 'prio_queue'
> >     instead of a 'commit_list' to avoid any performance penalty.
> >
> >   - Commit message updates, clarifications, typofixes, missing
> >     signoffs, etc., most notably in patches 6 and 12.
> >
> >   - Updated ASCII art history graphs.
> >
> >   - Replaced the strip_suffix() cleanup in patch 3 with René's
> >     suggestion; now that patch needs his signoff.
> >
> >   - Dropped the last two patches plugging memory leaks; René's plan
> >     to clean up memory ownership looked more promising, and that
> >     would make these two dropped patches moot anyway.
> >
> > v1: https://public-inbox.org/git/20190919214712.7348-1-szeder.dev@gmail.com/T/#u
> >
> > René Scharfe (1):
> >   name-rev: use strbuf_strip_suffix() in get_rev_name()
> >
> > SZEDER Gábor (12):
> >   t6120-describe: correct test repo history graph in comment
> >   t6120-describe: modernize the 'check_describe' helper
> >   name-rev: avoid unnecessary cast in name_ref()
> >   name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
> >   t6120: add a test to cover inner conditions in 'git name-rev's
> >     name_rev()
> >   name-rev: extract creating/updating a 'struct name_rev' into a helper
> >   name-rev: pull out deref handling from the recursion
> >   name-rev: restructure parsing commits and applying date cutoff
> >   name-rev: restructure creating/updating 'struct rev_name' instances
> >   name-rev: drop name_rev()'s 'generation' and 'distance' parameters
> >   name-rev: eliminate recursion in name_rev()
> >   name-rev: cleanup name_ref()
> >
> >  builtin/name-rev.c  | 147 +++++++++++++++++++++++++++++---------------
> >  t/t6120-describe.sh |  72 +++++++++++++++++-----
> >  2 files changed, 153 insertions(+), 66 deletions(-)
> >
> > Range-diff:
> >  1:  673da20e3d !  1:  8d70ed050d t6120-describe: correct test repo history graph in comment
> >     @@ t/t6120-describe.sh
> >      -test_description='test describe
> >      +test_description='test describe'
> >      +
> >     -+#       ,---o----o----o-----.
> >     -+#      /   D,R   e           \
> >     -+#  o--o-----o-------------o---o----x
> >     -+#      \    B            /
> >     -+#       `---o----o----o-'
> >     -+#                A    c
> >     ++#  o---o-----o----o----o-------o----x
> >     ++#       \   D,R   e           /
> >     ++#        \---o-------------o-'
> >     ++#         \  B            /
> >     ++#          `-o----o----o-'
> >     ++#                 A    c
> >     ++#
> >     ++# First parent of a merge commit is on the same line, second parent below.
> >
> >      -                       B
> >      -        .--------------o----o----o----x
> >  2:  05df899693 =  2:  3720b6859d t6120-describe: modernize the 'check_describe' helper
> >  3:  7b0227cfea !  3:  ad2f2eee68 name-rev: use strip_suffix() in get_rev_name()
> >     @@
> >       ## Metadata ##
> >     -Author: SZEDER Gábor <szeder.dev@gmail.com>
> >     +Author: René Scharfe <l.s.r@web.de>
> >
> >       ## Commit message ##
> >     -    name-rev: use strip_suffix() in get_rev_name()
> >     +    name-rev: use strbuf_strip_suffix() in get_rev_name()
> >
> >     -    Use strip_suffix() instead of open-coding it, making the code more
> >     -    idiomatic.
> >     +    get_rev_name() basically open-codes strip_suffix() before adding a
> >     +    string to a strbuf.
> >
> >     +    Let's use the strbuf right from the beginning, i.e. add the whole
> >     +    string to the strbuf and then use strbuf_strip_suffix(), making the
> >     +    code more idiomatic.
> >     +
> >     +    [TODO: René's signoff!]
> >          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> >
> >       ## builtin/name-rev.c ##
> >     @@ builtin/name-rev.c: static const char *get_rev_name(const struct object *o, stru
> >      -                int len = strlen(n->tip_name);
> >      -                if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
> >      -                        len -= 2;
> >     -+                size_t len;
> >     -+                strip_suffix(n->tip_name, "^0", &len);
> >                       strbuf_reset(buf);
> >      -                strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
> >     -+                strbuf_addf(buf, "%.*s~%d", (int) len, n->tip_name,
> >     -+                            n->generation);
> >     ++                strbuf_addstr(buf, n->tip_name);
> >     ++                strbuf_strip_suffix(buf, "^0");
> >     ++                strbuf_addf(buf, "~%d", n->generation);
> >                       return buf->buf;
> >               }
> >       }
> >  4:  40faecdc2a =  4:  c86a2ae2d0 name-rev: avoid unnecessary cast in name_ref()
> >  5:  c71df3dadf =  5:  4fc960cc05 name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
> >  6:  1dcb76072f !  6:  1493cb4484 t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
> >     @@ Commit message
> >          looks like this:
> >
> >            if (parent_number > 1) {
> >     -        if (generation > 0)
> >     -          // do stuff #1
> >     -        else
> >     -          // do stuff #2
> >     +          if (generation > 0)
> >     +              // branch #1
> >     +              new_name = ...
> >     +          else
> >     +              // branch #2
> >     +              new_name = ...
> >     +          name_rev(parent, new_name, ...);
> >            } else {
> >     -         // do stuff #3
> >     +          // branch #3
> >     +          name_rev(...);
> >            }
> >
> >          These conditions are not covered properly in the test suite.  As far
> >     @@ Commit message
> >          command's output, because the repository used in that test script
> >          contains several branches and tags pointing somewhere into the middle
> >          of the commit DAG, and thus result in a better name for the
> >     -    to-be-named commit.  In an early version of this patch series I
> >     -    managed to mess up those conditions (every single one of them at
> >     -    once!), but the whole test suite still passed successfully.
> >     +    to-be-named commit.  This can hide bugs: e.g. by replacing the
> >     +    'new_name' parameter of the first recursive name_rev() call with
> >     +    'tip_name' (effectively making both branch #1 and #2 a noop) 'git
> >     +    name-rev --all' shows thousands of bogus names in the Git repository,
> >     +    but the whole test suite still passes successfully.  In an early
> >     +    version of a later patch in this series I managed to mess up all three
> >     +    branches (at once!), but the test suite still passed.
> >
> >          So add a new test case that operates on the following history:
> >
> >     -        -----------master
> >     -       /          /
> >     -      A----------M2
> >     -       \        /
> >     -        \---M1-C
> >     -         \ /
> >     -          B
> >     +      A--------------master
> >     +       \            /
> >     +        \----------M2
> >     +         \        /
> >     +          \---M1-C
> >     +           \ /
> >     +            B
> >
> >     -    and names the commit 'B', where:
> >     +    and names the commit 'B' to make sure that all three branches are
> >     +    crucial to determine 'B's name:
> >
> >     -      - The merge commit at master makes sure that the 'do stuff #3'
> >     -        affects the final name.
> >     +      - There is only a single ref, so all names are based on 'master',
> >     +        without any undesired interference from other refs.
> >
> >     -      - The merge commit M2 make sure that the 'do stuff #1' part
> >     -        affects the final name.
> >     +      - Each time name_rev() follows the second parent of a merge commit,
> >     +        it appends "^2" to the name.  Following 'master's second parent
> >     +        right at the start ensures that all commits on the ancestry path
> >     +        from 'master' to 'B' have a different base name from the original
> >     +        'tip_name' of the very first name_rev() invocation.  Currently,
> >     +        while name_rev() is recursive, it doesn't matter, but it will be
> >     +        necessary to properly cover all three branches after the recursion
> >     +        is eliminated later in this series.
> >
> >     -      - And M1 makes sure that the 'do stuff #2' part affects the final
> >     -        name.
> >     +      - Following 'M2's second parent makes sure that branch #2 (i.e. when
> >     +        'generation = 0') affects 'B's name.
> >     +
> >     +      - Following the only parent of the non-merge commit 'C' ensures that
> >     +        branch #3 affects 'B's name, and that it increments 'generation'.
> >     +
> >     +      - Coming from 'C' 'generation' is 1, thus following 'M1's second
> >     +        parent makes sure that branch #1 affects 'B's name.
> >
> >          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> >
> >       ## t/t6120-describe.sh ##
> >     -@@ t/t6120-describe.sh: test_expect_success 'describe complains about missing object' '
> >     -         test_must_fail git describe $ZERO_OID
> >     +@@ t/t6120-describe.sh: test_expect_success 'name-rev a rev shortly after epoch' '
> >     +         test_cmp expect actual
> >       '
> >
> >     -+#   -----------master
> >     -+#  /          /
> >     -+# A----------M2
> >     -+#  \        /
> >     -+#   \---M1-C
> >     -+#    \ /
> >     -+#     B
> >     -+test_expect_success 'test' '
> >     ++# A--------------master
> >     ++#  \            /
> >     ++#   \----------M2
> >     ++#    \        /
> >     ++#     \---M1-C
> >     ++#      \ /
> >     ++#       B
> >     ++test_expect_success 'name-rev covers all conditions while looking at parents' '
> >      +        git init repo &&
> >      +        (
> >      +                cd repo &&
> >     @@ t/t6120-describe.sh: test_expect_success 'describe complains about missing objec
> >      +                git checkout master &&
> >      +                git merge --no-ff HEAD@{1} &&
> >      +
> >     -+                git log --graph --oneline &&
> >     -+
> >      +                echo "$B master^2^2~1^2" >expect &&
> >      +                git name-rev $B >actual &&
> >      +
> >  7:  bdd8378b06 =  7:  fc842e578b name-rev: extract creating/updating a 'struct name_rev' into a helper
> >  8:  ce21c351f9 !  8:  7f182503e2 name-rev: pull out deref handling from the recursion
> >     @@ Commit message
> >          Extract this condition from the recursion into name_rev()'s caller and
> >          drop the function's 'deref' parameter.  This makes eliminating the
> >          recursion a bit easier to follow, and it will be moved back into
> >     -    name_rev() after the recursion is elminated.
> >     +    name_rev() after the recursion is eliminated.
> >
> >          Furthermore, drop the condition that die()s when both 'deref' and
> >          'generation' are non-null (which should have been a BUG() to begin
> >     @@ Commit message
> >
> >          Note that this change reintroduces the memory leak that was plugged in
> >          in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
> >     -    case, 2017-05-04), but a later patch in this series will plug it in
> >     -    again.
> >     +    case, 2017-05-04), but a later patch (name-rev: restructure
> >     +    creating/updating 'struct rev_name' instances) in this series will
> >     +    plug it in again.
> >
> >          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> >
> >  9:  c8acc6b597 !  9:  0cdd40b75b name-rev: restructure parsing commits and applying date cutoff
> >     @@ Commit message
> >          name_rev() and name_rev() itself as it iterates over the parent
> >          commits.
> >
> >     -    This makes eliminating the recursion a bit easier to follow, and it
> >     -    will be moved back to name_rev() after the recursion is eliminated.
> >     +    This makes eliminating the recursion a bit easier to follow, and the
> >     +    condition moved to name_ref() will be moved back to name_rev() after
> >     +    the recursion is eliminated.
> >
> >          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> >
> > 10:  c731f27158 ! 10:  e1733e3c56 name-rev: restructure creating/updating 'struct rev_name' instances
> >     @@ Commit message
> >          At the beginning of the recursive name_rev() function it creates a new
> >          'struct rev_name' instance for each previously unvisited commit or, if
> >          this visit results in better name for an already visited commit, then
> >     -    updates the 'struct rev_name' instance attached to to the commit, or
> >     +    updates the 'struct rev_name' instance attached to the commit, or
> >          returns early.
> >
> >          Restructure this so its caller creates or updates the 'struct
> >     @@ Commit message
> >          parameter, i.e. both name_ref() before calling name_rev() and
> >          name_rev() itself as it iterates over the parent commits.
> >
> >     -    This makes eliminating the recursion a bit easier to follow, and it
> >     -    will be moved back to name_rev() after the recursion is eliminated.
> >     +    This makes eliminating the recursion a bit easier to follow, and the
> >     +    condition moved to name_ref() will be moved back to name_rev() after
> >     +    the recursion is eliminated.
> >
> >          This change also plugs the memory leak that was temporarily unplugged
> >          in the earlier "name-rev: pull out deref handling from the recursion"
> > 11:  ba14bde230 ! 11:  bd6e2e6d87 name-rev: drop name_rev()'s 'generation' and 'distance' parameters
> >     @@ Commit message
> >          'taggerdate' and 'from_tag' parameters as well, but those parameters
> >          will be necessary later, after the recursion is eliminated.
> >
> >     -    Drop name_rev()'s 'generation' and 'distance' parameters.
> >     +    Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> >
> >       ## builtin/name-rev.c ##
> >      @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit,
> > 12:  2d03ac11f3 ! 12:  0cf63c6d64 name-rev: eliminate recursion in name_rev()
> >     @@ Commit message
> >          segfault when processing a deep history if it exhausts the available
> >          stack space.  E.g. running 'git name-rev --all' and 'git name-rev
> >          HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
> >     -    results in segfaults on my machine.
> >     +    results in segfaults on my machine ('ulimit -s' reports 8192kB of
> >     +    stack size limit), and nowadays the former segfaults in the Linux repo
> >     +    as well (it reached the necessary depth sometime between v5.3-rc4 and
> >     +    -rc5).
> >
> >          Eliminate the recursion by inserting the interesting parents into a
> >     -    'commit_list' and iteratating until the list becomes empty.
> >     +    LIFO 'prio_queue' [1] and iterating until the queue becomes empty.
> >
> >     -    Note that the order in which the parent commits are added to that list
> >     -    is important: they must be inserted at the beginning of the list, and
> >     -    their relative order must be kept as well, because otherwise
> >     -    performance suffers.
> >     +    Note that the parent commits must be added in reverse order to the
> >     +    LIFO 'prio_queue', so their relative order is preserved during
> >     +    processing, i.e. the first parent should come out first from the
> >     +    queue, because otherwise performance greatly suffers on mergy
> >     +    histories [2].
> >
> >          The stacksize-limited test 'name-rev works in a deep repo' in
> >          't6120-describe.sh' demonstrated this issue and expected failure.  Now
> >     -    the recursion is gone, so flip it to expect success.
> >     -
> >     -    Also gone are the dmesg entries logging the segfault of the git
> >     -    process on every execution of the test suite.
> >     -
> >     -    Unfortunately, eliminating the recursion comes with a performance
> >     -    penaly: 'git name-rev --all' tends to be between 15-20% slower than
> >     -    before.
> >     +    the recursion is gone, so flip it to expect success.  Also gone are
> >     +    the dmesg entries logging the segfault of that segfaulting 'git
> >     +    name-rev' process on every execution of the test suite.
> >
> >          Note that this slightly changes the order of lines in the output of
> >          'git name-rev --all', usually swapping two lines every 35 lines in
> >     @@ Commit message
> >
> >          This patch is best viewed with '--ignore-all-space'.
> >
> >     +    [1] Early versions of this patch used a 'commit_list', resulting in
> >     +        ~15% performance penalty for 'git name-rev --all' in 'linux.git',
> >     +        presumably because of the memory allocation and release for each
> >     +        insertion and removal. Using a LIFO 'prio_queue' has basically no
> >     +        effect on performance.
> >     +
> >     +    [2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
> >     +        'v0.1^2~5', meaning that usually following the first parent of a
> >     +        merge results in the best name for its ancestors.  So when later
> >     +        we follow the remaining parent(s) of a merge, and reach an already
> >     +        named commit, then we usually find that we can't give that commit
> >     +        a better name, and thus we don't have to visit any of its
> >     +        ancestors again.
> >     +
> >     +        OTOH, if we were to follow the Nth parent of the merge first, then
> >     +        the name of all its ancestors would include a corresponding '^N'.
> >     +        Those are not the best names for those commits, so when later we
> >     +        reach an already named commit following the first parent of that
> >     +        merge, then we would have to update the name of that commit and
> >     +        the names of all of its ancestors as well.  Consequently, we would
> >     +        have to visit many commits several times, resulting in a
> >     +        significant slowdown.
> >     +
> >          Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> >
> >       ## builtin/name-rev.c ##
> >     +@@
> >     + #include "tag.h"
> >     + #include "refs.h"
> >     + #include "parse-options.h"
> >     ++#include "prio-queue.h"
> >     + #include "sha1-lookup.h"
> >     + #include "commit-slab.h"
> >     +
> >      @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit *commit,
> >                       return NULL;
> >       }
> >     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
> >      -                parse_commit(parent);
> >      -                if (parent->date < cutoff)
> >      -                        continue;
> >     -+        struct commit_list *list = NULL;
> >     ++        struct prio_queue queue;
> >     ++        struct commit *commit;
> >     ++        struct commit **parents_to_queue = NULL;
> >     ++        size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
> >      +
> >     -+        commit_list_insert(start_commit, &list);
> >     ++        memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
> >     ++        prio_queue_put(&queue, start_commit);
> >      +
> >     -+        while (list) {
> >     -+                struct commit *commit = pop_commit(&list);
> >     ++        while ((commit = prio_queue_get(&queue))) {
> >      +                struct rev_name *name = get_commit_rev_name(commit);
> >     -+                struct commit_list *parents, *new_parents = NULL;
> >     -+                struct commit_list **last_new_parent = &new_parents;
> >     ++                struct commit_list *parents;
> >      +                int parent_number = 1;
> >      +
> >     ++                parents_to_queue_nr = 0;
> >     ++
> >      +                for (parents = commit->parents;
> >      +                                parents;
> >      +                                parents = parents->next, parent_number++) {
> >     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
> >      -                        distance = name->distance + 1;
> >      +                        if (create_or_update_name(parent, new_name, taggerdate,
> >      +                                                  generation, distance,
> >     -+                                                  from_tag))
> >     -+                                last_new_parent = commit_list_append(parent,
> >     -+                                                  last_new_parent);
> >     ++                                                  from_tag)) {
> >     ++                                ALLOC_GROW(parents_to_queue,
> >     ++                                           parents_to_queue_nr + 1,
> >     ++                                           parents_to_queue_alloc);
> >     ++                                parents_to_queue[parents_to_queue_nr] = parent;
> >     ++                                parents_to_queue_nr++;
> >     ++                        }
> >                       }
> >
> >      -                if (create_or_update_name(parent, new_name, taggerdate,
> >      -                                          generation, distance,
> >      -                                          from_tag))
> >      -                        name_rev(parent, new_name, taggerdate, from_tag);
> >     -+                *last_new_parent = list;
> >     -+                list = new_parents;
> >     ++                /* The first parent must come out first from the prio_queue */
> >     ++                while (parents_to_queue_nr)
> >     ++                        prio_queue_put(&queue,
> >     ++                                       parents_to_queue[--parents_to_queue_nr]);
> >               }
> >     ++
> >     ++        clear_prio_queue(&queue);
> >     ++        free(parents_to_queue);
> >       }
> >
> >     + static int subpath_matches(const char *path, const char *filter)
> >
> >       ## t/t6120-describe.sh ##
> >      @@ t/t6120-describe.sh: test_expect_success 'describe tag object' '
> > 13:  1ef69550ca ! 13:  316f7af43c name-rev: cleanup name_ref()
> >     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
> >      -                int from_tag)
> >      +                int from_tag, int deref)
> >       {
> >     -         struct commit_list *list = NULL;
> >     +         struct prio_queue queue;
> >     +         struct commit *commit;
> >     +         struct commit **parents_to_queue = NULL;
> >     +         size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
> >      +        char *to_free = NULL;
> >      +
> >      +        parse_commit(start_commit);
> >     @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
> >      +                return;
> >      +        }
> >
> >     -         commit_list_insert(start_commit, &list);
> >     -
> >     +         memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
> >     +         prio_queue_put(&queue, start_commit);
> >      @@ builtin/name-rev.c: static int name_ref(const char *path, const struct object_id *oid, int flags, vo
> >                       if (taggerdate == TIME_MAX)
> >                               taggerdate = commit->date;
> > 14:  9d513b3092 <  -:  ---------- name-rev: plug a memory leak in name_rev()
> > 15:  8489abb62e <  -:  ---------- name-rev: plug a memory leak in name_rev() in the deref case
> > --
> > 2.24.0.388.gde53c094ea
> >
> >

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 12/13] name-rev: eliminate recursion in name_rev()
  2019-11-12 10:38   ` [PATCH v2 12/13] name-rev: eliminate recursion in name_rev() SZEDER Gábor
@ 2019-11-27 17:57     ` Jonathan Tan
  2019-12-09 12:22       ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: Jonathan Tan @ 2019-11-27 17:57 UTC (permalink / raw)
  To: szeder.dev; +Cc: gitster, stolee, l.s.r, git, Jonathan Tan

> Note that this slightly changes the order of lines in the output of
> 'git name-rev --all', usually swapping two lines every 35 lines in
> git.git or every 150 lines in linux.git.  This shouldn't matter in
> practice, because the output has always been unordered anyway.

I didn't verify that the changing of order is fine, but other than that,
this patch looks great.

> This patch is best viewed with '--ignore-all-space'.

Thanks for the tip! I ended up unindenting the loop to see the changes
better, but I should have done this instead.

> -static void name_rev(struct commit *commit,
> +static void name_rev(struct commit *start_commit,
>  		const char *tip_name, timestamp_t taggerdate,
>  		int from_tag)

There are many changes from tip_name to name->tip_name in this function
that mean that tip_name is no longer used within this function. Should
this cleanup have been done in one of the earlier patches?

Apart from that, overall, this patch looks like a straightforward good
change. When we have a parent, instead of immediately calling name_rev()
recursively, we first add it to an array, and then (in reverse order)
add it to a priority queue which is actually used as a LIFO stack.
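
To illustrate the ordering described above, here is a minimal
standalone sketch.  It is not the code from the patch: 'struct stack',
stack_push(), stack_pop() and push_parents_reversed() are made-up names
for this example, and the plain array-backed stack merely stands in for
the prio_queue used as a LIFO.  Parents are represented by their parent
numbers.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a prio_queue that is used as a LIFO stack. */
struct stack {
	int *items;
	size_t nr, alloc;
};

static void stack_push(struct stack *s, int item)
{
	if (s->nr == s->alloc) {
		s->alloc = s->alloc ? 2 * s->alloc : 4;
		s->items = realloc(s->items, s->alloc * sizeof(*s->items));
		if (!s->items)
			abort();
	}
	s->items[s->nr++] = item;
}

/* Returns 0 and stores the top item in *item, or -1 if the stack is empty. */
static int stack_pop(struct stack *s, int *item)
{
	if (!s->nr)
		return -1;
	*item = s->items[--s->nr];
	return 0;
}

/*
 * Push the collected parents in reverse order, so that the first
 * parent ends up on top of the stack and is popped first.
 */
static void push_parents_reversed(struct stack *s, const int *parents,
				  size_t nr)
{
	while (nr)
		stack_push(s, parents[--nr]);
}
```

With the parents { 1, 2, 3 } collected from a merge, calling
push_parents_reversed() and then popping until the stack is empty
yields 1, 2, 3, i.e. the same order in which the recursive version
would have descended into them.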

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 13/13] name-rev: cleanup name_ref()
  2019-11-12 10:38   ` [PATCH v2 13/13] name-rev: cleanup name_ref() SZEDER Gábor
@ 2019-11-27 18:01     ` Jonathan Tan
  2019-12-09 12:32       ` SZEDER Gábor
  0 siblings, 1 reply; 98+ messages in thread
From: Jonathan Tan @ 2019-11-27 18:01 UTC (permalink / raw)
  To: szeder.dev; +Cc: gitster, stolee, l.s.r, git, Jonathan Tan

> Earlier patches in this series moved a couple of conditions from the
> recursive name_rev() function into its caller name_ref(), for no other
> reason than to make eliminating the recursion a bit easier to follow.
> 
> Since the previous patch name_rev() is not recursive anymore, so let's
> move all those conditions back into name_rev().

I don't really see the need for this code movement, to be honest. There
is no big difference in doing the checks in one place or the other, and
if you ask me, it might even be better to do it in the caller of
name_rev(), and leave name_rev() to handle only the naming.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 02/13] t6120-describe: modernize the 'check_describe' helper
  2019-11-12 10:38   ` [PATCH v2 02/13] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
@ 2019-11-27 18:02     ` Jonathan Tan
  0 siblings, 0 replies; 98+ messages in thread
From: Jonathan Tan @ 2019-11-27 18:02 UTC (permalink / raw)
  To: szeder.dev; +Cc: gitster, stolee, l.s.r, git, Jonathan Tan

> The 'check_describe' helper function runs 'git describe' outside of
> 'test_expect_success' blocks, with extra hand-rolled code to record
> and examine its exit code.
> 
> Update this helper and move the 'git decribe' invocation inside the
> 'test_expect_success' block.

decribe -> describe

Otherwise, patches 1 and 2 are relatively straightforward and look good
to me.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v2 11/13] name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  2019-11-12 10:38   ` [PATCH v2 11/13] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
@ 2019-11-27 18:13     ` Jonathan Tan
  0 siblings, 0 replies; 98+ messages in thread
From: Jonathan Tan @ 2019-11-27 18:13 UTC (permalink / raw)
  To: szeder.dev; +Cc: gitster, stolee, l.s.r, git, Jonathan Tan

>  		if (create_or_update_name(parent, new_name, taggerdate,
> -					  new_generation, new_distance,
> +					  generation, distance,
>  					  from_tag))
> -			name_rev(parent, new_name, taggerdate,
> -				 new_generation, new_distance, from_tag);
> +			name_rev(parent, new_name, taggerdate, from_tag);
>  	}
>  }

[snip]

>  			if (create_or_update_name(commit, tip_name, taggerdate,
>  						  0, 0, from_tag))
> -				name_rev(commit, tip_name, taggerdate, 0, 0,
> +				name_rev(commit, tip_name, taggerdate,
>  					 from_tag);
>  			else
>  				free(to_free);

All invocations of name_rev() are first preceded by
create_or_update_name(), which sets the "generation" and "distance"
fields in the name accordingly, so this looks good. All preceding
patches look good too. I have already sent emails for patch 12 and 13,
so this concludes my review.
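
For reference, the reasoning about "generation" and "distance" above
can be sketched roughly as follows.  This is a simplified illustration
under assumptions, not the code in builtin/name-rev.c: 'struct
name_info', record_name() and first_parent_values() are hypothetical
names, and the real "is this a better name" check considers more than
just the distance.

```c
/* Simplified, hypothetical version of the per-commit naming record. */
struct name_info {
	int generation;
	int distance;
	int is_set;
};

/*
 * Record the name if the commit has none yet or if the new one is
 * closer.  Returns 1 if the record changed, i.e. if the caller should
 * go on and walk this commit's parents, and 0 otherwise.
 */
static int record_name(struct name_info *info, int generation, int distance)
{
	if (info->is_set && info->distance <= distance)
		return 0;
	info->generation = generation;
	info->distance = distance;
	info->is_set = 1;
	return 1;
}

/*
 * The walker needs no 'generation' and 'distance' parameters of its
 * own: it reads them back from the record and derives the values to
 * use for the first parent.
 */
static void first_parent_values(const struct name_info *info,
				int *generation, int *distance)
{
	*generation = info->generation + 1;
	*distance = info->distance + 1;
}
```

Because the record is always filled in before the walk starts, passing
the same two values again as function arguments would be redundant.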

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH v3 00/14] name-rev: eliminate recursion
  2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
                     ` (13 preceding siblings ...)
  2019-11-12 19:17   ` [PATCH v2 00/13] name-rev: eliminate recursion Johannes Schindelin
@ 2019-12-09 11:52   ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 01/14] t6120-describe: correct test repo history graph in comment SZEDER Gábor
                       ` (14 more replies)
  14 siblings, 15 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

'git name-rev' is implemented using a recursive algorithm, and,
consequently, it can segfault in deep histories (e.g. WebKit), and
thanks to a test case demonstrating this limitation every test run
results in a dmesg entry logging the segfaulting git process.

This patch series eliminates the recursion.

Changes since v2:

  - Add the new patch 12 to use 'name->tip_name' instead of
    'tip_name', to make the patch eliminating the recursion even a bit
    easier to follow (only with '--ignore-all-space', though; without
    that option that patch's diff is still mostly gibberish).
    The end result is still the same, see the empty interdiff.

  - Minor commit message updates (a typofix and René's signoff).

v2: https://public-inbox.org/git/20191112103821.30265-1-szeder.dev@gmail.com/
v1: https://public-inbox.org/git/20190919214712.7348-1-szeder.dev@gmail.com/T/#u

René Scharfe (1):
  name-rev: use strbuf_strip_suffix() in get_rev_name()

SZEDER Gábor (13):
  t6120-describe: correct test repo history graph in comment
  t6120-describe: modernize the 'check_describe' helper
  name-rev: avoid unnecessary cast in name_ref()
  name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  t6120: add a test to cover inner conditions in 'git name-rev's
    name_rev()
  name-rev: extract creating/updating a 'struct name_rev' into a helper
  name-rev: pull out deref handling from the recursion
  name-rev: restructure parsing commits and applying date cutoff
  name-rev: restructure creating/updating 'struct rev_name' instances
  name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  name-rev: use 'name->tip_name' instead of 'tip_name'
  name-rev: eliminate recursion in name_rev()
  name-rev: cleanup name_ref()

 builtin/name-rev.c  | 147 +++++++++++++++++++++++++++++---------------
 t/t6120-describe.sh |  72 +++++++++++++++++-----
 2 files changed, 153 insertions(+), 66 deletions(-)

Interdiff against v2:
Range-diff against v2:
 1:  8d70ed050d =  1:  8d70ed050d t6120-describe: correct test repo history graph in comment
 2:  3720b6859d !  2:  d2091869c8 t6120-describe: modernize the 'check_describe' helper
    @@ Commit message
         'test_expect_success' blocks, with extra hand-rolled code to record
         and examine its exit code.
     
    -    Update this helper and move the 'git decribe' invocation inside the
    +    Update this helper and move the 'git describe' invocation inside the
         'test_expect_success' block.
     
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
 3:  ad2f2eee68 !  3:  9d13032871 name-rev: use strbuf_strip_suffix() in get_rev_name()
    @@ Commit message
         string to the strbuf and then use strbuf_strip_suffix(), making the
         code more idiomatic.
     
    -    [TODO: René's signoff!]
    +    Signed-off-by: René Scharfe <l.s.r@web.de>
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
      ## builtin/name-rev.c ##
 4:  c86a2ae2d0 =  4:  b1a8d7ce03 name-rev: avoid unnecessary cast in name_ref()
 5:  4fc960cc05 =  5:  3497d0bc42 name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
 6:  1493cb4484 =  6:  43cba1a369 t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
 7:  fc842e578b =  7:  7053fc707c name-rev: extract creating/updating a 'struct name_rev' into a helper
 8:  7f182503e2 =  8:  28d957df88 name-rev: pull out deref handling from the recursion
 9:  0cdd40b75b =  9:  5bd4dede3d name-rev: restructure parsing commits and applying date cutoff
10:  e1733e3c56 = 10:  92f3897ff3 name-rev: restructure creating/updating 'struct rev_name' instances
11:  bd6e2e6d87 = 11:  cd24270f23 name-rev: drop name_rev()'s 'generation' and 'distance' parameters
 -:  ---------- > 12:  f33c0bbfd0 name-rev: use 'name->tip_name' instead of 'tip_name'
12:  0cf63c6d64 ! 13:  e5d7d291bd name-rev: eliminate recursion in name_rev()
    @@ builtin/name-rev.c: static struct rev_name *create_or_update_name(struct commit
     +				distance = name->distance + 1;
     +			}
      
    --			strip_suffix(tip_name, "^0", &len);
    +-			strip_suffix(name->tip_name, "^0", &len);
     -			if (name->generation > 0)
    --				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
    +-				new_name = xstrfmt("%.*s~%d^%d",
    +-						   (int)len,
    +-						   name->tip_name,
     -						   name->generation,
     -						   parent_number);
     -			else
    --				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
    +-				new_name = xstrfmt("%.*s^%d", (int)len,
    +-						   name->tip_name,
     -						   parent_number);
     -			generation = 0;
     -			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
     -		} else {
    --			new_name = tip_name;
    +-			new_name = name->tip_name;
     -			generation = name->generation + 1;
     -			distance = name->distance + 1;
     +			if (create_or_update_name(parent, new_name, taggerdate,
13:  316f7af43c = 14:  0b556389a3 name-rev: cleanup name_ref()
-- 
2.24.0.801.g241c134b8d


^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH v3 01/14] t6120-describe: correct test repo history graph in comment
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 02/14] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
                       ` (13 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

At the top of 't6120-describe.sh' an ASCII graph illustrates the
repository's history used in this test script.  This graph is a bit
misleading, because it swapped the second merge commit's first and
second parents.

When describing/naming a commit it does make a difference which parent
is the first and which is the second/Nth, so update this graph to
accurately represent that second merge.

While at it, move this history graph from the 'test_description'
variable to a regular comment.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 45047d0a72..9b184179d1 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -1,15 +1,16 @@
 #!/bin/sh
 
-test_description='test describe
+test_description='test describe'
+
+#  o---o-----o----o----o-------o----x
+#       \   D,R   e           /
+#        \---o-------------o-'
+#         \  B            /
+#          `-o----o----o-'
+#                 A    c
+#
+# First parent of a merge commit is on the same line, second parent below.
 
-                       B
-        .--------------o----o----o----x
-       /                   /    /
- o----o----o----o----o----.    /
-       \        A    c        /
-        .------------o---o---o
-                   D,R   e
-'
 . ./test-lib.sh
 
 check_describe () {
-- 
2.24.0.801.g241c134b8d


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v3 02/14] t6120-describe: modernize the 'check_describe' helper
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 01/14] t6120-describe: correct test repo history graph in comment SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 03/14] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
                       ` (12 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

The 'check_describe' helper function runs 'git describe' outside of
'test_expect_success' blocks, with extra hand-rolled code to record
and examine its exit code.

Update this helper and move the 'git describe' invocation inside the
'test_expect_success' block.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 9b184179d1..a2988fa0c2 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -16,14 +16,12 @@ test_description='test describe'
 check_describe () {
 	expect="$1"
 	shift
-	R=$(git describe "$@" 2>err.actual)
-	S=$?
-	cat err.actual >&3
-	test_expect_success "describe $*" '
-	test $S = 0 &&
+	describe_opts="$@"
+	test_expect_success "describe $describe_opts" '
+	R=$(git describe $describe_opts 2>err.actual) &&
 	case "$R" in
 	$expect)	echo happy ;;
-	*)	echo "Oops - $R is not $expect";
+	*)	echo "Oops - $R is not $expect" &&
 		false ;;
 	esac
 	'
-- 
2.24.0.801.g241c134b8d


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v3 03/14] name-rev: use strbuf_strip_suffix() in get_rev_name()
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 01/14] t6120-describe: correct test repo history graph in comment SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 02/14] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 04/14] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
                       ` (11 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

From: René Scharfe <l.s.r@web.de>

get_rev_name() basically open-codes strip_suffix() before adding a
string to a strbuf.

Let's use the strbuf right from the beginning, i.e. add the whole
string to the strbuf and then use strbuf_strip_suffix(), making the
code more idiomatic.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index b0f0776947..15919adbdb 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -321,11 +321,10 @@ static const char *get_rev_name(const struct object *o, struct strbuf *buf)
 	if (!n->generation)
 		return n->tip_name;
 	else {
-		int len = strlen(n->tip_name);
-		if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
-			len -= 2;
 		strbuf_reset(buf);
-		strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
+		strbuf_addstr(buf, n->tip_name);
+		strbuf_strip_suffix(buf, "^0");
+		strbuf_addf(buf, "~%d", n->generation);
 		return buf->buf;
 	}
 }
-- 
2.24.0.801.g241c134b8d


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v3 04/14] name-rev: avoid unnecessary cast in name_ref()
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (2 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 03/14] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 05/14] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
                       ` (10 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

Casting a 'struct object' to 'struct commit' is unnecessary there,
because it's already available in the local 'commit' variable.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 15919adbdb..e40f51c2b4 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -272,7 +272,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		int from_tag = starts_with(path, "refs/tags/");
 
 		if (taggerdate == TIME_MAX)
-			taggerdate = ((struct commit *)o)->date;
+			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
 			 from_tag, deref);
-- 
2.24.0.801.g241c134b8d


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v3 05/14] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (3 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 04/14] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 06/14] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e40f51c2b4..7e003c2702 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -102,7 +102,7 @@ static void name_rev(struct commit *commit,
 	}
 
 	if (name == NULL) {
-		name = xmalloc(sizeof(rev_name));
+		name = xmalloc(sizeof(*name));
 		set_commit_rev_name(commit, name);
 		goto copy_data;
 	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 06/14] t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (4 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 05/14] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 07/14] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

In 'builtin/name-rev.c' in the name_rev() function there is a loop
iterating over all parents of the given commit, and the loop body
looks like this:

  if (parent_number > 1) {
      if (generation > 0)
          // branch #1
          new_name = ...
      else
          // branch #2
          new_name = ...
      name_rev(parent, new_name, ...);
  } else {
      // branch #3
      name_rev(...);
  }

These conditions are not covered properly in the test suite.  As far
as purely test coverage goes, they are all executed several times over
in 't6120-describe.sh'.  However, they don't directly influence the
command's output, because the repository used in that test script
contains several branches and tags pointing somewhere into the middle
of the commit DAG, and thus result in a better name for the
to-be-named commit.  This can hide bugs: e.g. by replacing the
'new_name' parameter of the first recursive name_rev() call with
'tip_name' (effectively making both branch #1 and #2 a noop) 'git
name-rev --all' shows thousands of bogus names in the Git repository,
but the whole test suite still passes successfully.  In an early
version of a later patch in this series I managed to mess up all three
branches (at once!), but the test suite still passed.

So add a new test case that operates on the following history:

  A--------------master
   \            /
    \----------M2
     \        /
      \---M1-C
       \ /
        B

and names the commit 'B' to make sure that all three branches are
crucial to determine 'B's name:

  - There is only a single ref, so all names are based on 'master',
    without any undesired interference from other refs.

  - Each time name_rev() follows the second parent of a merge commit,
    it appends "^2" to the name.  Following 'master's second parent
    right at the start ensures that all commits on the ancestry path
    from 'master' to 'B' have a different base name from the original
    'tip_name' of the very first name_rev() invocation.  Currently,
    while name_rev() is recursive, it doesn't matter, but it will be
    necessary to properly cover all three branches after the recursion
    is eliminated later in this series.

  - Following 'M2's second parent makes sure that branch #2 (i.e. when
    'generation = 0') affects 'B's name.

  - Following the only parent of the non-merge commit 'C' ensures that
    branch #3 affects 'B's name, and that it increments 'generation'.

  - Coming from 'C' 'generation' is 1, thus following 'M1's second
    parent makes sure that branch #1 affects 'B's name.
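
As a rough illustration of the walk described above, the following
standalone sketch builds the expected name step by step.  It is not
code from 'builtin/name-rev.c': parent_name() and name_of_B() are
hypothetical helpers that only mimic the three branches of the loop
body, under the assumption that each step sees the 'tip_name' and
'generation' produced by the previous step.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of builtin/name-rev.c: writes into 'out'
 * the name a parent would get, mirroring the three branches above.
 * Returns the generation to carry over to that parent.
 */
static int parent_name(char *out, size_t outlen, const char *tip_name,
		       int generation, int parent_number)
{
	size_t len = strlen(tip_name);

	if (parent_number > 1) {
		/* A trailing "^0" is dropped before appending "^N". */
		if (len >= 2 && !strcmp(tip_name + len - 2, "^0"))
			len -= 2;
		if (generation > 0)	/* branch #1 */
			snprintf(out, outlen, "%.*s~%d^%d", (int)len,
				 tip_name, generation, parent_number);
		else			/* branch #2 */
			snprintf(out, outlen, "%.*s^%d", (int)len,
				 tip_name, parent_number);
		return 0;
	}
	/* branch #3: same base name, one generation further */
	snprintf(out, outlen, "%s", tip_name);
	return generation + 1;
}

/* Walks master -> M2 -> C -> M1 -> B in the history shown above. */
static const char *name_of_B(void)
{
	static char a[64], b[64];
	int gen;

	gen = parent_name(a, sizeof(a), "master", 0, 2);	/* M2 */
	gen = parent_name(b, sizeof(b), a, gen, 2);		/* C */
	gen = parent_name(a, sizeof(a), b, gen, 1);		/* M1 */
	gen = parent_name(b, sizeof(b), a, gen, 2);		/* B */
	(void)gen;
	return b;
}
```

Under these assumptions name_of_B() ends up with the same string that
the new test case expects 'git name-rev' to print for 'B'.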

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 t/t6120-describe.sh | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index a2988fa0c2..0d119e9652 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -438,4 +438,45 @@ test_expect_success 'name-rev a rev shortly after epoch' '
 	test_cmp expect actual
 '
 
+# A--------------master
+#  \            /
+#   \----------M2
+#    \        /
+#     \---M1-C
+#      \ /
+#       B
+test_expect_success 'name-rev covers all conditions while looking at parents' '
+	git init repo &&
+	(
+		cd repo &&
+
+		echo A >file &&
+		git add file &&
+		git commit -m A &&
+		A=$(git rev-parse HEAD) &&
+
+		git checkout --detach &&
+		echo B >file &&
+		git commit -m B file &&
+		B=$(git rev-parse HEAD) &&
+
+		git checkout $A &&
+		git merge --no-ff $B &&  # M1
+
+		echo C >file &&
+		git commit -m C file &&
+
+		git checkout $A &&
+		git merge --no-ff HEAD@{1} && # M2
+
+		git checkout master &&
+		git merge --no-ff HEAD@{1} &&
+
+		echo "$B master^2^2~1^2" >expect &&
+		git name-rev $B >actual &&
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 07/14] name-rev: extract creating/updating a 'struct name_rev' into a helper
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (5 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 06/14] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 08/14] name-rev: pull out deref handling from the recursion SZEDER Gábor
                       ` (7 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

In a later patch in this series we'll want to do this in two places.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 7e003c2702..e43df19709 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -79,12 +79,36 @@ static int is_better_name(struct rev_name *name,
 	return 0;
 }
 
+static struct rev_name *create_or_update_name(struct commit *commit,
+					      const char *tip_name,
+					      timestamp_t taggerdate,
+					      int generation, int distance,
+					      int from_tag)
+{
+	struct rev_name *name = get_commit_rev_name(commit);
+
+	if (name == NULL) {
+		name = xmalloc(sizeof(*name));
+		set_commit_rev_name(commit, name);
+		goto copy_data;
+	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
+copy_data:
+		name->tip_name = tip_name;
+		name->taggerdate = taggerdate;
+		name->generation = generation;
+		name->distance = distance;
+		name->from_tag = from_tag;
+
+		return name;
+	} else
+		return NULL;
+}
+
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int generation, int distance, int from_tag,
 		int deref)
 {
-	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
 	int parent_number = 1;
 	char *to_free = NULL;
@@ -101,18 +125,8 @@ static void name_rev(struct commit *commit,
 			die("generation: %d, but deref?", generation);
 	}
 
-	if (name == NULL) {
-		name = xmalloc(sizeof(*name));
-		set_commit_rev_name(commit, name);
-		goto copy_data;
-	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-copy_data:
-		name->tip_name = tip_name;
-		name->taggerdate = taggerdate;
-		name->generation = generation;
-		name->distance = distance;
-		name->from_tag = from_tag;
-	} else {
+	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
+				   distance, from_tag)) {
 		free(to_free);
 		return;
 	}
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 08/14] name-rev: pull out deref handling from the recursion
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (6 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 07/14] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 09/14] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
                       ` (6 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

The 'if (deref) { ... }' condition near the beginning of the recursive
name_rev() function can only ever be true in the first invocation,
because the 'deref' parameter is always 0 in the subsequent recursive
invocations.

Extract this condition from the recursion into name_rev()'s caller and
drop the function's 'deref' parameter.  This makes eliminating the
recursion a bit easier to follow, and it will be moved back into
name_rev() after the recursion is eliminated.

Furthermore, drop the condition that die()s when both 'deref' and
'generation' are non-zero (which should have been a BUG() to begin
with).

Note that this change reintroduces the memory leak that was plugged in
commit 5308224633 (name-rev: avoid leaking memory in the `deref`
case, 2017-05-04), but a later patch (name-rev: restructure
creating/updating 'struct rev_name' instances) in this series will
plug it again.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e43df19709..e112a92b03 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -106,30 +106,19 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int generation, int distance, int from_tag,
-		int deref)
+		int generation, int distance, int from_tag)
 {
 	struct commit_list *parents;
 	int parent_number = 1;
-	char *to_free = NULL;
 
 	parse_commit(commit);
 
 	if (commit->date < cutoff)
 		return;
 
-	if (deref) {
-		tip_name = to_free = xstrfmt("%s^0", tip_name);
-
-		if (generation)
-			die("generation: %d, but deref?", generation);
-	}
-
 	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
-				   distance, from_tag)) {
-		free(to_free);
+				   distance, from_tag))
 		return;
-	}
 
 	for (parents = commit->parents;
 			parents;
@@ -148,11 +137,11 @@ static void name_rev(struct commit *commit,
 
 			name_rev(parents->item, new_name, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
-				 from_tag, 0);
+				 from_tag);
 		} else {
 			name_rev(parents->item, tip_name, taggerdate,
 				 generation + 1, distance + 1,
-				 from_tag, 0);
+				 from_tag);
 		}
 	}
 }
@@ -284,12 +273,16 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 	if (o && o->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *)o;
 		int from_tag = starts_with(path, "refs/tags/");
+		const char *tip_name;
 
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, xstrdup(path), taggerdate, 0, 0,
-			 from_tag, deref);
+		if (deref)
+			tip_name = xstrfmt("%s^0", path);
+		else
+			tip_name = xstrdup(path);
+		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
 	}
 	return 0;
 }
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 09/14] name-rev: restructure parsing commits and applying date cutoff
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (7 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 08/14] name-rev: pull out deref handling from the recursion SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 10/14] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
                       ` (5 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

The recursive name_rev() function begins by parsing the commit it got
as parameter, and returns early if the commit is older than a cutoff
limit.

Restructure this so the caller parses the commit and checks its date,
and doesn't invoke name_rev() if the commit to be passed as parameter
is older than the cutoff, i.e. both name_ref() before calling
name_rev() and name_rev() itself as it iterates over the parent
commits.

This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index e112a92b03..5041227790 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -111,11 +111,6 @@ static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	parse_commit(commit);
-
-	if (commit->date < cutoff)
-		return;
-
 	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
 				   distance, from_tag))
 		return;
@@ -123,6 +118,12 @@ static void name_rev(struct commit *commit,
 	for (parents = commit->parents;
 			parents;
 			parents = parents->next, parent_number++) {
+		struct commit *parent = parents->item;
+
+		parse_commit(parent);
+		if (parent->date < cutoff)
+			continue;
+
 		if (parent_number > 1) {
 			size_t len;
 			char *new_name;
@@ -135,11 +136,11 @@ static void name_rev(struct commit *commit,
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
 
-			name_rev(parents->item, new_name, taggerdate, 0,
+			name_rev(parent, new_name, taggerdate, 0,
 				 distance + MERGE_TRAVERSAL_WEIGHT,
 				 from_tag);
 		} else {
-			name_rev(parents->item, tip_name, taggerdate,
+			name_rev(parent, tip_name, taggerdate,
 				 generation + 1, distance + 1,
 				 from_tag);
 		}
@@ -273,16 +274,18 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 	if (o && o->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *)o;
 		int from_tag = starts_with(path, "refs/tags/");
-		const char *tip_name;
 
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		if (deref)
-			tip_name = xstrfmt("%s^0", path);
-		else
-			tip_name = xstrdup(path);
-		name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+		if (commit->date >= cutoff) {
+			const char *tip_name;
+			if (deref)
+				tip_name = xstrfmt("%s^0", path);
+			else
+				tip_name = xstrdup(path);
+			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+		}
 	}
 	return 0;
 }
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 10/14] name-rev: restructure creating/updating 'struct rev_name' instances
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (8 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 09/14] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 11/14] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
                       ` (4 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

The recursive name_rev() function begins by creating a new 'struct
rev_name' instance for each previously unvisited commit or, if this
visit results in a better name for an already visited commit, by
updating the 'struct rev_name' instance attached to the commit;
otherwise it returns early.

Restructure this so its caller creates or updates the 'struct
rev_name' instance associated with the commit to be passed as
parameter, i.e. both name_ref() before calling name_rev() and
name_rev() itself as it iterates over the parent commits.

This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.

This change also plugs the memory leak that was temporarily unplugged
in the earlier "name-rev: pull out deref handling from the recursion"
patch in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 5041227790..6416c49f67 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -111,14 +111,12 @@ static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	if (!create_or_update_name(commit, tip_name, taggerdate, generation,
-				   distance, from_tag))
-		return;
-
 	for (parents = commit->parents;
 			parents;
 			parents = parents->next, parent_number++) {
 		struct commit *parent = parents->item;
+		const char *new_name;
+		int new_generation, new_distance;
 
 		parse_commit(parent);
 		if (parent->date < cutoff)
@@ -126,7 +124,6 @@ static void name_rev(struct commit *commit,
 
 		if (parent_number > 1) {
 			size_t len;
-			char *new_name;
 
 			strip_suffix(tip_name, "^0", &len);
 			if (generation > 0)
@@ -135,15 +132,19 @@ static void name_rev(struct commit *commit,
 			else
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
-
-			name_rev(parent, new_name, taggerdate, 0,
-				 distance + MERGE_TRAVERSAL_WEIGHT,
-				 from_tag);
+			new_generation = 0;
+			new_distance = distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
-			name_rev(parent, tip_name, taggerdate,
-				 generation + 1, distance + 1,
-				 from_tag);
+			new_name = tip_name;
+			new_generation = generation + 1;
+			new_distance = distance + 1;
 		}
+
+		if (create_or_update_name(parent, new_name, taggerdate,
+					  new_generation, new_distance,
+					  from_tag))
+			name_rev(parent, new_name, taggerdate,
+				 new_generation, new_distance, from_tag);
 	}
 }
 
@@ -280,11 +281,17 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		path = name_ref_abbrev(path, can_abbreviate_output);
 		if (commit->date >= cutoff) {
 			const char *tip_name;
+			char *to_free = NULL;
 			if (deref)
-				tip_name = xstrfmt("%s^0", path);
+				tip_name = to_free = xstrfmt("%s^0", path);
 			else
 				tip_name = xstrdup(path);
-			name_rev(commit, tip_name, taggerdate, 0, 0, from_tag);
+			if (create_or_update_name(commit, tip_name, taggerdate,
+						  0, 0, from_tag))
+				name_rev(commit, tip_name, taggerdate, 0, 0,
+					 from_tag);
+			else
+				free(to_free);
 		}
 	}
 	return 0;
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 11/14] name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (9 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 10/14] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 12/14] name-rev: use 'name->tip_name' instead of 'tip_name' SZEDER Gábor
                       ` (3 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

Following the previous patches in this series we can get the values of
name_rev()'s 'generation' and 'distance' parameters from the 'struct
rev_name' associated with the commit as well.

Let's simplify the function's signature and remove these two
unnecessary parameters.

Note that at this point we could do the same with the 'tip_name',
'taggerdate' and 'from_tag' parameters as well, but those parameters
will be necessary later, after the recursion is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6416c49f67..fc61d6fa71 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -106,8 +106,9 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int generation, int distance, int from_tag)
+		int from_tag)
 {
+	struct rev_name *name = get_commit_rev_name(commit);
 	struct commit_list *parents;
 	int parent_number = 1;
 
@@ -116,7 +117,7 @@ static void name_rev(struct commit *commit,
 			parents = parents->next, parent_number++) {
 		struct commit *parent = parents->item;
 		const char *new_name;
-		int new_generation, new_distance;
+		int generation, distance;
 
 		parse_commit(parent);
 		if (parent->date < cutoff)
@@ -126,25 +127,25 @@ static void name_rev(struct commit *commit,
 			size_t len;
 
 			strip_suffix(tip_name, "^0", &len);
-			if (generation > 0)
+			if (name->generation > 0)
 				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
-						   generation, parent_number);
+						   name->generation,
+						   parent_number);
 			else
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
-			new_generation = 0;
-			new_distance = distance + MERGE_TRAVERSAL_WEIGHT;
+			generation = 0;
+			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
 			new_name = tip_name;
-			new_generation = generation + 1;
-			new_distance = distance + 1;
+			generation = name->generation + 1;
+			distance = name->distance + 1;
 		}
 
 		if (create_or_update_name(parent, new_name, taggerdate,
-					  new_generation, new_distance,
+					  generation, distance,
 					  from_tag))
-			name_rev(parent, new_name, taggerdate,
-				 new_generation, new_distance, from_tag);
+			name_rev(parent, new_name, taggerdate, from_tag);
 	}
 }
 
@@ -288,7 +289,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 				tip_name = xstrdup(path);
 			if (create_or_update_name(commit, tip_name, taggerdate,
 						  0, 0, from_tag))
-				name_rev(commit, tip_name, taggerdate, 0, 0,
+				name_rev(commit, tip_name, taggerdate,
 					 from_tag);
 			else
 				free(to_free);
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 12/14] name-rev: use 'name->tip_name' instead of 'tip_name'
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (10 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 11/14] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 13/14] name-rev: eliminate recursion in name_rev() SZEDER Gábor
                       ` (2 subsequent siblings)
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

Following the previous patches in this series we can get the value of
name_rev()'s 'tip_name' parameter from the 'struct rev_name'
associated with the commit as well.

So let's use 'name->tip_name' instead, which makes the patch
eliminating the recursion of name_rev() a bit easier to follow.

Note that at this point we could drop the 'tip_name' parameter as
well, but that parameter will be necessary later, after the recursion
is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index fc61d6fa71..6c1e6e9868 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -126,18 +126,21 @@ static void name_rev(struct commit *commit,
 		if (parent_number > 1) {
 			size_t len;
 
-			strip_suffix(tip_name, "^0", &len);
+			strip_suffix(name->tip_name, "^0", &len);
 			if (name->generation > 0)
-				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
+				new_name = xstrfmt("%.*s~%d^%d",
+						   (int)len,
+						   name->tip_name,
 						   name->generation,
 						   parent_number);
 			else
-				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
+				new_name = xstrfmt("%.*s^%d", (int)len,
+						   name->tip_name,
 						   parent_number);
 			generation = 0;
 			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 		} else {
-			new_name = tip_name;
+			new_name = name->tip_name;
 			generation = name->generation + 1;
 			distance = name->distance + 1;
 		}
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 13/14] name-rev: eliminate recursion in name_rev()
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (11 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 12/14] name-rev: use 'name->tip_name' instead of 'tip_name' SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 11:52     ` [PATCH v3 14/14] name-rev: cleanup name_ref() SZEDER Gábor
  2019-12-09 15:08     ` [PATCH v3 00/14] name-rev: eliminate recursion Derrick Stolee
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space.  E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine ('ulimit -s' reports 8192kB of
stack size limit), and nowadays the former segfaults in the Linux repo
as well (it reached the necessary depth sometime between v5.3-rc4 and
-rc5).

Eliminate the recursion by inserting the interesting parents into a
LIFO 'prio_queue' [1] and iterating until the queue becomes empty.

Note that the parent commits must be added in reverse order to the
LIFO 'prio_queue', so their relative order is preserved during
processing, i.e. the first parent should come out first from the
queue, because otherwise performance greatly suffers on mergy
histories [2].

The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure.  Now
the recursion is gone, so flip it to expect success.  Also gone are
the dmesg entries logging the segfault of the 'git name-rev' process
on every execution of the test suite.

Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git.  This shouldn't matter in
practice, because the output has always been unordered anyway.

This patch is best viewed with '--ignore-all-space'.

[1] Early versions of this patch used a 'commit_list', resulting in
    ~15% performance penalty for 'git name-rev --all' in 'linux.git',
    presumably because of the memory allocation and release for each
    insertion and removal. Using a LIFO 'prio_queue' has basically no
    effect on performance.

[2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
    'v0.1^2~5', meaning that usually following the first parent of a
    merge results in the best name for its ancestors.  So when later
    we follow the remaining parent(s) of a merge, and reach an already
    named commit, then we usually find that we can't give that commit
    a better name, and thus we don't have to visit any of its
    ancestors again.

    OTOH, if we were to follow the Nth parent of the merge first, then
    the name of all its ancestors would include a corresponding '^N'.
    Those are not the best names for those commits, so when later we
    reach an already named commit following the first parent of that
    merge, then we would have to update the name of that commit and
    the names of all of its ancestors as well.  Consequently, we would
    have to visit many commits several times, resulting in a
    significant slowdown.
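
The ordering requirement can be sketched in isolation as follows.  This
is only an illustration, not the code of this patch: 'struct lifo',
lifo_put(), lifo_get(), first_out() and demo_first_parent() are
hypothetical stand-ins for the LIFO 'prio_queue', and plain integers
stand in for the parent commits.

```c
#include <stddef.h>

#define MAX_STACK 16

/* Hypothetical fixed-size LIFO stack standing in for the 'prio_queue'. */
struct lifo {
	int items[MAX_STACK];
	size_t nr;
};

static void lifo_put(struct lifo *s, int item)
{
	if (s->nr < MAX_STACK)
		s->items[s->nr++] = item;
}

/* Returns -1 when the stack is empty. */
static int lifo_get(struct lifo *s)
{
	return s->nr ? s->items[--s->nr] : -1;
}

/*
 * Queue 'nr' parents (given in first-to-last order) in reverse, then
 * report which one comes out first.
 */
static int first_out(const int *parents, size_t nr)
{
	struct lifo s = { {0}, 0 };

	while (nr)
		lifo_put(&s, parents[--nr]);
	return lifo_get(&s);
}

/* Parents numbered 1, 2, 3: the first parent is expected to pop first. */
static int demo_first_parent(void)
{
	static const int parents[] = { 1, 2, 3 };

	return first_out(parents, 3);
}
```

Pushing in reverse is what lets a stack reproduce the visiting order of
the recursive version, in which the first parent was fully processed
before the second one was looked at.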

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c  | 102 +++++++++++++++++++++++++++-----------------
 t/t6120-describe.sh |   2 +-
 2 files changed, 65 insertions(+), 39 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6c1e6e9868..a3b796eac4 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -6,6 +6,7 @@
 #include "tag.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "prio-queue.h"
 #include "sha1-lookup.h"
 #include "commit-slab.h"
 
@@ -104,52 +105,77 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 		return NULL;
 }
 
-static void name_rev(struct commit *commit,
+static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int from_tag)
 {
-	struct rev_name *name = get_commit_rev_name(commit);
-	struct commit_list *parents;
-	int parent_number = 1;
-
-	for (parents = commit->parents;
-			parents;
-			parents = parents->next, parent_number++) {
-		struct commit *parent = parents->item;
-		const char *new_name;
-		int generation, distance;
-
-		parse_commit(parent);
-		if (parent->date < cutoff)
-			continue;
+	struct prio_queue queue;
+	struct commit *commit;
+	struct commit **parents_to_queue = NULL;
+	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
+
+	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
+	prio_queue_put(&queue, start_commit);
+
+	while ((commit = prio_queue_get(&queue))) {
+		struct rev_name *name = get_commit_rev_name(commit);
+		struct commit_list *parents;
+		int parent_number = 1;
+
+		parents_to_queue_nr = 0;
+
+		for (parents = commit->parents;
+				parents;
+				parents = parents->next, parent_number++) {
+			struct commit *parent = parents->item;
+			const char *new_name;
+			int generation, distance;
+
+			parse_commit(parent);
+			if (parent->date < cutoff)
+				continue;
 
-		if (parent_number > 1) {
-			size_t len;
+			if (parent_number > 1) {
+				size_t len;
+
+				strip_suffix(name->tip_name, "^0", &len);
+				if (name->generation > 0)
+					new_name = xstrfmt("%.*s~%d^%d",
+							   (int)len,
+							   name->tip_name,
+							   name->generation,
+							   parent_number);
+				else
+					new_name = xstrfmt("%.*s^%d", (int)len,
+							   name->tip_name,
+							   parent_number);
+				generation = 0;
+				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
+			} else {
+				new_name = name->tip_name;
+				generation = name->generation + 1;
+				distance = name->distance + 1;
+			}
 
-			strip_suffix(name->tip_name, "^0", &len);
-			if (name->generation > 0)
-				new_name = xstrfmt("%.*s~%d^%d",
-						   (int)len,
-						   name->tip_name,
-						   name->generation,
-						   parent_number);
-			else
-				new_name = xstrfmt("%.*s^%d", (int)len,
-						   name->tip_name,
-						   parent_number);
-			generation = 0;
-			distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
-		} else {
-			new_name = name->tip_name;
-			generation = name->generation + 1;
-			distance = name->distance + 1;
+			if (create_or_update_name(parent, new_name, taggerdate,
+						  generation, distance,
+						  from_tag)) {
+				ALLOC_GROW(parents_to_queue,
+					   parents_to_queue_nr + 1,
+					   parents_to_queue_alloc);
+				parents_to_queue[parents_to_queue_nr] = parent;
+				parents_to_queue_nr++;
+			}
 		}
 
-		if (create_or_update_name(parent, new_name, taggerdate,
-					  generation, distance,
-					  from_tag))
-			name_rev(parent, new_name, taggerdate, from_tag);
+		/* The first parent must come out first from the prio_queue */
+		while (parents_to_queue_nr)
+			prio_queue_put(&queue,
+				       parents_to_queue[--parents_to_queue_nr]);
 	}
+
+	clear_prio_queue(&queue);
+	free(parents_to_queue);
 }
 
 static int subpath_matches(const char *path, const char *filter)
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 0d119e9652..09c50f3f04 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -381,7 +381,7 @@ test_expect_success 'describe tag object' '
 	test_i18ngrep "fatal: test-blob-1 is neither a commit nor blob" actual
 '
 
-test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
+test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
 	do
-- 
2.24.0.801.g241c134b8d



* [PATCH v3 14/14] name-rev: cleanup name_ref()
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (12 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 13/14] name-rev: eliminate recursion in name_rev() SZEDER Gábor
@ 2019-12-09 11:52     ` SZEDER Gábor
  2019-12-09 15:08     ` [PATCH v3 00/14] name-rev: eliminate recursion Derrick Stolee
  14 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, René Scharfe, Jonathan Tan, git,
	SZEDER Gábor

Earlier patches in this series moved a couple of conditions from the
recursive name_rev() function into its caller name_ref(), for no other
reason than to make eliminating the recursion a bit easier to follow.

Since the previous patch, name_rev() is no longer recursive, so let's
move all those conditions back into name_rev().

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 builtin/name-rev.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index a3b796eac4..cc488ee319 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -107,12 +107,26 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 
 static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
-		int from_tag)
+		int from_tag, int deref)
 {
 	struct prio_queue queue;
 	struct commit *commit;
 	struct commit **parents_to_queue = NULL;
 	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
+	char *to_free = NULL;
+
+	parse_commit(start_commit);
+	if (start_commit->date < cutoff)
+		return;
+
+	if (deref)
+		tip_name = to_free = xstrfmt("%s^0", tip_name);
+
+	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
+				   from_tag)) {
+		free(to_free);
+		return;
+	}
 
 	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
 	prio_queue_put(&queue, start_commit);
@@ -309,20 +323,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		if (commit->date >= cutoff) {
-			const char *tip_name;
-			char *to_free = NULL;
-			if (deref)
-				tip_name = to_free = xstrfmt("%s^0", path);
-			else
-				tip_name = xstrdup(path);
-			if (create_or_update_name(commit, tip_name, taggerdate,
-						  0, 0, from_tag))
-				name_rev(commit, tip_name, taggerdate,
-					 from_tag);
-			else
-				free(to_free);
-		}
+		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
 	}
 	return 0;
 }
-- 
2.24.0.801.g241c134b8d



* Re: [PATCH v2 12/13] name-rev: eliminate recursion in name_rev()
  2019-11-27 17:57     ` Jonathan Tan
@ 2019-12-09 12:22       ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 12:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitster, stolee, l.s.r, git

On Wed, Nov 27, 2019 at 09:57:51AM -0800, Jonathan Tan wrote:
> > Note that this slightly changes the order of lines in the output of
> > 'git name-rev --all', usually swapping two lines every 35 lines in
> > git.git or every 150 lines in linux.git.  This shouldn't matter in
> > practice, because the output has always been unordered anyway.
> 
> I didn't verify that the changing of order is fine, but other than that,
> this patch looks great.

FWIW, the sorted output is the same (well, it would clearly be a bug
if it wasn't):

  $ .../v2.24.0/bin/git name-rev --all |sort >orig.sorted
  $ git name-rev --all |sort >new.sorted
  $ diff -u orig.sorted new.sorted 
  $ echo $?
  0

I still don't understand where that slight change in the order of
lines comes from, though I didn't really try to understand it, to be
honest...

> > This patch is best viewed with '--ignore-all-space'.
> 
> Thanks for the tip! I ended up unindenting the loop to see the changes
> better, but I should have done this instead.

At one point I was considering an additional noop preparatory commit
that would have added one more indentation level to name_rev()'s for
loop.  That patch would have an empty diff with '--ignore-all-space',
of course, and the diff of the patch eliminating the recursion would
look sensible by default.
Would that have been worth it?  Dunno.

> > -static void name_rev(struct commit *commit,
> > +static void name_rev(struct commit *start_commit,
> >  		const char *tip_name, timestamp_t taggerdate,
> >  		int from_tag)
> 
> There are many changes from tip_name to name->tip_name in this function
> that mean that tip_name is no longer used within this function. Should
> this cleanup have been done in one of the earlier patches?

I've added a new patch to do that.

> Apart from that, overall, this patch looks like a straightforward good
> change. When we have a parent, instead of immediately calling name_rev()
> recursively, we first add it to an array, and then (in reverse order)
> add it to a priority queue which is actually used as a LIFO stack.

Yeah, 'commit->parents' is a singly linked list, so we can't iterate
over it backwards, hence that interim array.
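To illustrate the interim array outside of git, here is a minimal,
self-contained sketch.  It is not the code from the patch: 'struct node',
'struct node_list', 'struct stack' and walk() are hypothetical stand-ins
for 'struct commit', 'struct commit_list', the prio_queue used as LIFO
and name_rev(), and a plain 'visited' flag replaces the decision made by
create_or_update_name() (which can also accept a commit again when it
finds a better name).  It only shows why the parents are collected first
and then pushed in reverse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct commit / struct commit_list. */
struct node_list;
struct node {
	int id;
	int visited;
	struct node_list *parents;
};
struct node_list {
	struct node *item;
	struct node_list *next;
};

/* Growable LIFO stack of node pointers. */
struct stack {
	struct node **items;
	size_t nr, alloc;
};

static void stack_push(struct stack *s, struct node *n)
{
	if (s->nr == s->alloc) {
		s->alloc = s->alloc ? 2 * s->alloc : 16;
		s->items = realloc(s->items, s->alloc * sizeof(*s->items));
		assert(s->items);
	}
	s->items[s->nr++] = n;
}

static struct node *stack_pop(struct stack *s)
{
	return s->nr ? s->items[--s->nr] : NULL;
}

/*
 * Visit everything reachable from 'start' without recursion, in the
 * order a recursive first-parent-first walk would use.  The ids are
 * recorded in 'order' (room for 'max' entries); returns their count.
 */
size_t walk(struct node *start, int *order, size_t max)
{
	struct stack todo = { NULL, 0, 0 };
	struct stack interim = { NULL, 0, 0 };
	struct node *n;
	size_t count = 0;

	start->visited = 1;
	stack_push(&todo, start);

	while ((n = stack_pop(&todo))) {
		struct node_list *p;

		assert(count < max);
		order[count++] = n->id;

		/* The parent list can only be walked forwards... */
		interim.nr = 0;
		for (p = n->parents; p; p = p->next) {
			if (p->item->visited)
				continue;
			p->item->visited = 1;
			stack_push(&interim, p->item);
		}
		/* ...so push in reverse: the first parent ends up on top. */
		while (interim.nr)
			stack_push(&todo, stack_pop(&interim));
	}

	free(todo.items);
	free(interim.items);
	return count;
}
```

In this sketch the first parent is always popped next, so a whole
first-parent chain is processed before any second parent, just as a
recursive walk would do it, but the depth of the history only grows a
heap-allocated array instead of the call stack.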



* Re: [PATCH v2 13/13] name-rev: cleanup name_ref()
  2019-11-27 18:01     ` Jonathan Tan
@ 2019-12-09 12:32       ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 12:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitster, stolee, l.s.r, git

On Wed, Nov 27, 2019 at 10:01:02AM -0800, Jonathan Tan wrote:
> > Earlier patches in this series moved a couple of conditions from the
> > recursive name_rev() function into its caller name_ref(), for no other
> > reason than to make eliminating the recursion a bit easier to follow.
> > 
> > Since the previous patch, name_rev() is no longer recursive, so let's
> > move all those conditions back into name_rev().
> 
> I don't really see the need for this code movement, to be honest. There
> is no big difference in doing the checks in one place or the other, and
> if you ask me, it might even be better to do it in the caller of
> name_rev(), and leave name_rev() to handle only the naming.

I think it does make sense: in my view the cutoff handling and calling
create_or_update_name() is part of "handling the naming", and it's
better when they are constrained to only a single function.



* Re: [PATCH] name-rev: rewrite create_or_update_name()
  2019-09-22  8:18   ` [PATCH] name-rev: rewrite create_or_update_name() Martin Ågren
@ 2019-12-09 12:43     ` SZEDER Gábor
  0 siblings, 0 replies; 98+ messages in thread
From: SZEDER Gábor @ 2019-12-09 12:43 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Junio C Hamano, git

On Sun, Sep 22, 2019 at 10:18:46AM +0200, Martin Ågren wrote:
> This code was moved straight out of name_rev(). As such, we inherited
> the "goto" to jump from an if into an else-if. We also inherited the
> fact that "nothing to do -- return NULL" is handled last.

>  For the record, --color-moved confirms that your patch is a move and
>  the conversion around it looks good to me. I was a bit puzzled by what
>  the moved code actually wanted to *do* and came up with this rewrite.

Yeah, I had a bit of a "Huh?!" moment myself looking at that goto
jumping over the condition, too...  Initially I left it as-is to keep
this patch a pure code movement, and so that others might have a bit of
fun as well when they stumble upon it in the future ;)
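For readers who haven't seen that construct, here is a much simplified
sketch of the control flow in question.  It is not the actual
create_or_update_name(): 'struct name', update_with_goto() and
update_early_return() are hypothetical, and a smaller 'distance' stands
in for whatever "better name" means in the real code.  The first
function has the shape described above, the second handles the "nothing
to do" case first and should behave the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much simplified stand-in for 'struct rev_name'. */
struct name {
	int distance;
};

/* The 'if' branch jumps into the body of the 'else if' branch. */
struct name *update_with_goto(struct name **slot, int distance)
{
	struct name *name;

	assert(slot);
	name = *slot;

	if (!name) {
		name = malloc(sizeof(*name));
		if (!name)
			return NULL;
		*slot = name;
		goto copy_data;
	} else if (distance < name->distance) {
copy_data:
		name->distance = distance;
		return name;
	} else {
		return NULL;	/* "nothing to do" is handled last */
	}
}

/* Same behavior, with the "nothing to do" case handled first. */
struct name *update_early_return(struct name **slot, int distance)
{
	struct name *name;

	assert(slot);
	name = *slot;

	if (name && distance >= name->distance)
		return NULL;	/* the existing name is at least as good */

	if (!name) {
		name = malloc(sizeof(*name));
		if (!name)
			return NULL;
		*slot = name;
	}
	name->distance = distance;
	return name;
}
```

Both variants return the (possibly newly allocated) entry when it was
created or improved, and NULL when the existing entry is kept.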

>  It seems there was some discussion around leaks and leak-plugs. That
>  would conflict/interact with this. 

And I didn't pick it up in later versions, because René's plans to
clean up memory ownership would deal with it (and with much more) as
well.



* Re: [PATCH v3 00/14] name-rev: eliminate recursion
  2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
                       ` (13 preceding siblings ...)
  2019-12-09 11:52     ` [PATCH v3 14/14] name-rev: cleanup name_ref() SZEDER Gábor
@ 2019-12-09 15:08     ` Derrick Stolee
  2019-12-11 17:33       ` Junio C Hamano
  14 siblings, 1 reply; 98+ messages in thread
From: Derrick Stolee @ 2019-12-09 15:08 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: René Scharfe, Jonathan Tan, git

On 12/9/2019 6:52 AM, SZEDER Gábor wrote:
> 'git name-rev' is implemented using a recursive algorithm, and,
> consequently, it can segfault in deep histories (e.g. WebKit), and
> thanks to a test case demonstrating this limitation every test run
> results in a dmesg entry logging the segfaulting git process.
> 
> This patch series eliminates the recursion.
> 
> Changes since v2:
> 
>   - Add the new patch 12 to use 'name->tip_name' instead of
>     'tip_name', to make the patch eliminating the recursion even a
>     bit easier to follow (only with '--ignore-all-space', though;
>     without that option that patch's diff is still mostly gibberish).
>     The end result is still the same, see the empty interdiff.

This new commit makes sense, and I see how it adjusts the context lines
in the patch that follows. This series looks good to me.

Thanks,
-Stolee


* Re: [PATCH v3 00/14] name-rev: eliminate recursion
  2019-12-09 15:08     ` [PATCH v3 00/14] name-rev: eliminate recursion Derrick Stolee
@ 2019-12-11 17:33       ` Junio C Hamano
  0 siblings, 0 replies; 98+ messages in thread
From: Junio C Hamano @ 2019-12-11 17:33 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: SZEDER Gábor, René Scharfe, Jonathan Tan, git

Derrick Stolee <stolee@gmail.com> writes:

> On 12/9/2019 6:52 AM, SZEDER Gábor wrote:
>> 'git name-rev' is implemented using a recursive algorithm, and,
>> consequently, it can segfault in deep histories (e.g. WebKit), and
>> thanks to a test case demonstrating this limitation every test run
>> results in a dmesg entry logging the segfaulting git process.
>> 
>> This patch series eliminates the recursion.
>> 
>> Changes since v2:
>> 
>>   - Add the new patch 12 to use 'name->tip_name' instead of
>>     'tip_name', to make the patch eliminating the recursion even a
>>     bit easier to follow (only with '--ignore-all-space', though;
>>     without that option that patch's diff is still mostly gibberish).
>>     The end result is still the same, see the empty interdiff.
>
> This new commit makes sense, and I see how it adjusts the context lines
> in the patch that follows. This series looks good to me.

I've finished eyeballing the patches myself, and they seem to be in
good shape, too.

Thanks.


Thread overview: 98+ messages
2019-09-19 21:46 [PATCH 00/15] name-rev: eliminate recursion SZEDER Gábor
2019-09-19 21:46 ` [PATCH 01/15] t6120-describe: correct test repo history graph in comment SZEDER Gábor
2019-09-20 21:47   ` Junio C Hamano
2019-09-20 22:29     ` SZEDER Gábor
2019-09-28  4:06       ` Junio C Hamano
2019-09-19 21:46 ` [PATCH 02/15] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
2019-09-20 21:49   ` Junio C Hamano
2019-09-19 21:46 ` [PATCH 03/15] name-rev: use strip_suffix() in get_rev_name() SZEDER Gábor
2019-09-20 16:36   ` René Scharfe
2019-09-20 17:10     ` SZEDER Gábor
2019-09-19 21:46 ` [PATCH 04/15] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
2019-09-20 16:37   ` René Scharfe
2019-09-19 21:47 ` [PATCH 05/15] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
2019-09-20 15:11   ` Derrick Stolee
2019-09-20 15:40     ` SZEDER Gábor
2019-09-20 16:37   ` René Scharfe
2019-09-19 21:47 ` [PATCH 06/15] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
2019-09-20 15:14   ` Derrick Stolee
2019-09-20 15:44     ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 07/15] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
2019-09-20 15:18   ` Derrick Stolee
2019-09-22  8:18   ` [PATCH] name-rev: rewrite create_or_update_name() Martin Ågren
2019-12-09 12:43     ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 08/15] name-rev: pull out deref handling from the recursion SZEDER Gábor
2019-09-20 15:21   ` Derrick Stolee
2019-09-20 17:42     ` SZEDER Gábor
2019-09-20 16:37   ` René Scharfe
2019-09-20 18:13     ` SZEDER Gábor
2019-09-20 18:14       ` SZEDER Gábor
2019-09-21  9:57         ` SZEDER Gábor
2019-09-21 12:37           ` René Scharfe
2019-09-22 19:05             ` SZEDER Gábor
2019-09-23 18:43               ` René Scharfe
2019-09-23 18:59                 ` SZEDER Gábor
2019-09-23 19:55                   ` René Scharfe
2019-09-23 20:47                     ` SZEDER Gábor
2019-09-24 17:03                       ` René Scharfe
2019-09-26 17:33                         ` SZEDER Gábor
2019-09-21 12:37       ` René Scharfe
2019-09-21 14:21         ` SZEDER Gábor
2019-09-21 15:52           ` René Scharfe
2019-09-19 21:47 ` [PATCH 09/15] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
2019-09-21 12:37   ` René Scharfe
2019-09-19 21:47 ` [PATCH 10/15] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
2019-09-20 15:27   ` Derrick Stolee
2019-09-20 17:09     ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 11/15] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
2019-09-19 21:47 ` [PATCH 12/15] name-rev: eliminate recursion in name_rev() SZEDER Gábor
2019-09-19 21:47 ` [PATCH 13/15] name-rev: cleanup name_ref() SZEDER Gábor
2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug a memory leak in name_rev() SZEDER Gábor
2019-09-19 21:47 ` [PATCH 14/15] name-rev: plug memory leak in name_rev() in the deref case SZEDER Gábor
2019-09-19 22:47   ` SZEDER Gábor
2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug a " SZEDER Gábor
2019-09-20 15:35   ` Derrick Stolee
2019-09-19 21:47 ` [PATCH 15/15] name-rev: plug memory leak in name_rev() SZEDER Gábor
2019-09-19 22:48   ` SZEDER Gábor
2019-09-20 15:37 ` [PATCH 00/15] name-rev: eliminate recursion Derrick Stolee
2019-09-20 17:37   ` SZEDER Gábor
2019-11-12 10:38 ` [PATCH v2 00/13] " SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 01/13] t6120-describe: correct test repo history graph in comment SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 02/13] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
2019-11-27 18:02     ` Jonathan Tan
2019-11-12 10:38   ` [PATCH v2 03/13] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
2019-11-12 19:02     ` René Scharfe
2019-11-12 10:38   ` [PATCH v2 04/13] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 05/13] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 06/13] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 07/13] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 08/13] name-rev: pull out deref handling from the recursion SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 09/13] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 10/13] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 11/13] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
2019-11-27 18:13     ` Jonathan Tan
2019-11-12 10:38   ` [PATCH v2 12/13] name-rev: eliminate recursion in name_rev() SZEDER Gábor
2019-11-27 17:57     ` Jonathan Tan
2019-12-09 12:22       ` SZEDER Gábor
2019-11-12 10:38   ` [PATCH v2 13/13] name-rev: cleanup name_ref() SZEDER Gábor
2019-11-27 18:01     ` Jonathan Tan
2019-12-09 12:32       ` SZEDER Gábor
2019-11-12 19:17   ` [PATCH v2 00/13] name-rev: eliminate recursion Johannes Schindelin
2019-11-13 19:25     ` Sebastiaan Dammann
2019-12-09 11:52   ` [PATCH v3 00/14] " SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 01/14] t6120-describe: correct test repo history graph in comment SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 02/14] t6120-describe: modernize the 'check_describe' helper SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 03/14] name-rev: use strbuf_strip_suffix() in get_rev_name() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 04/14] name-rev: avoid unnecessary cast in name_ref() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 05/14] name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 06/14] t6120: add a test to cover inner conditions in 'git name-rev's name_rev() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 07/14] name-rev: extract creating/updating a 'struct name_rev' into a helper SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 08/14] name-rev: pull out deref handling from the recursion SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 09/14] name-rev: restructure parsing commits and applying date cutoff SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 10/14] name-rev: restructure creating/updating 'struct rev_name' instances SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 11/14] name-rev: drop name_rev()'s 'generation' and 'distance' parameters SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 12/14] name-rev: use 'name->tip_name' instead of 'tip_name' SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 13/14] name-rev: eliminate recursion in name_rev() SZEDER Gábor
2019-12-09 11:52     ` [PATCH v3 14/14] name-rev: cleanup name_ref() SZEDER Gábor
2019-12-09 15:08     ` [PATCH v3 00/14] name-rev: eliminate recursion Derrick Stolee
2019-12-11 17:33       ` Junio C Hamano
