[PATCH 00/10] name-rev: improve memory usage

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 00/10] name-rev: improve memory usage
@ 2020-02-04 21:12 René Scharfe
  2020-02-04 21:14 ` [PATCH 01/10] name-rev: rewrite create_or_update_name() René Scharfe
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:12 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

This series seeks to get reduce the size of memory allocations, the
number of reallocations and the amount of leaks in git name-rev, to
improve its performance.  It starts with a few cleanups:

Martin Ågren (1):
  name-rev: rewrite create_or_update_name()

René Scharfe (9):
  name-rev: remove unused typedef
  name-rev: respect const qualifier
  name-rev: don't _peek() in create_or_update_name()

... then plugs a minor leak:

  name-rev: don't leak path copy in name_ref()

... and gets rid of a level of indirection in commit slab usage:

  name-rev: put struct rev_name into commit slab

The next two patches eliminate reallocations while building name strings
for parent commits, which can make a surprisingly big difference in some
cases:

  name-rev: factor out get_parent_name()
  name-rev: pre-size buffer in get_parent_name()

The next one avoids building names that are be discarded right away by
checking first if they are better than a possibly present other name
assigned earlier, which only provides a small speedup, but is the right
thing to do:

  name-rev: generate name strings only if they are better

And finally a tricky one whose commit message is a lot longer than its
diff, which adds a bit of overhead and which probably needs the most
reviewer attention to make sure it won't cause double frees:

  name-rev: release unused name strings

 builtin/name-rev.c | 133 ++++++++++++++++++++++++++-------------------
 1 file changed, 76 insertions(+), 57 deletions(-)

--
2.25.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/10] name-rev: rewrite create_or_update_name()
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
@ 2020-02-04 21:14 ` René Scharfe
  2020-02-05  2:00   ` Derrick Stolee
  2020-02-05 16:45   ` Andrei Rybak
  2020-02-04 21:15 ` [PATCH 02/10] name-rev: remove unused typedef René Scharfe
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:14 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.

Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Original submission:
https://lore.kernel.org/git/20190922081846.14452-1-martin.agren@gmail.com/

 builtin/name-rev.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6b9e8c850b..c2239ac3f7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -88,21 +88,21 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 {
 	struct rev_name *name = get_commit_rev_name(commit);

+	if (name && !is_better_name(name, taggerdate, distance, from_tag))
+		return NULL;
+
 	if (name == NULL) {
 		name = xmalloc(sizeof(*name));
 		set_commit_rev_name(commit, name);
-		goto copy_data;
-	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-copy_data:
-		name->tip_name = tip_name;
-		name->taggerdate = taggerdate;
-		name->generation = generation;
-		name->distance = distance;
-		name->from_tag = from_tag;
-
-		return name;
-	} else
-		return NULL;
+	}
+
+	name->tip_name = tip_name;
+	name->taggerdate = taggerdate;
+	name->generation = generation;
+	name->distance = distance;
+	name->from_tag = from_tag;
+
+	return name;
 }

 static void name_rev(struct commit *start_commit,
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 02/10] name-rev: remove unused typedef
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
  2020-02-04 21:14 ` [PATCH 01/10] name-rev: rewrite create_or_update_name() René Scharfe
@ 2020-02-04 21:15 ` René Scharfe
  2020-02-04 21:16 ` [PATCH 03/10] name-rev: respect const qualifier René Scharfe
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:15 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

The type alias became unused with bf43abc6e6 (name-rev: use sizeof(*ptr)
instead of sizeof(type) in allocation, 2019-11-12); remove it.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index c2239ac3f7..a8de9cc561 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -16,13 +16,13 @@
  */
 #define CUTOFF_DATE_SLOP 86400

-typedef struct rev_name {
+struct rev_name {
 	const char *tip_name;
 	timestamp_t taggerdate;
 	int generation;
 	int distance;
 	int from_tag;
-} rev_name;
+};

 define_commit_slab(commit_rev_name, struct rev_name *);

--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 03/10] name-rev: respect const qualifier
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
  2020-02-04 21:14 ` [PATCH 01/10] name-rev: rewrite create_or_update_name() René Scharfe
  2020-02-04 21:15 ` [PATCH 02/10] name-rev: remove unused typedef René Scharfe
@ 2020-02-04 21:16 ` René Scharfe
  2020-02-04 21:17 ` [PATCH 04/10] name-rev: don't leak path copy in name_ref() René Scharfe
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Keep the const qualifier of the first parameter of get_rev_name() even
when casting the object pointer to a commit pointer, and further for the
parameter of get_commit_rev_name(), as all these uses are read-only.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index a8de9cc561..2e6820bd5b 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -32,7 +32,7 @@ static struct commit_rev_name rev_names;
 /* How many generations are maximally preferred over _one_ merge traversal? */
 #define MERGE_TRAVERSAL_WEIGHT 65535

-static struct rev_name *get_commit_rev_name(struct commit *commit)
+static struct rev_name *get_commit_rev_name(const struct commit *commit)
 {
 	struct rev_name **slot = commit_rev_name_peek(&rev_names, commit);

@@ -357,11 +357,11 @@ static const char *get_exact_ref_match(const struct object *o)
 static const char *get_rev_name(const struct object *o, struct strbuf *buf)
 {
 	struct rev_name *n;
-	struct commit *c;
+	const struct commit *c;

 	if (o->type != OBJ_COMMIT)
 		return get_exact_ref_match(o);
-	c = (struct commit *) o;
+	c = (const struct commit *) o;
 	n = get_commit_rev_name(c);
 	if (!n)
 		return NULL;
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 04/10] name-rev: don't leak path copy in name_ref()
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (2 preceding siblings ...)
  2020-02-04 21:16 ` [PATCH 03/10] name-rev: respect const qualifier René Scharfe
@ 2020-02-04 21:17 ` René Scharfe
  2020-02-05 14:35   ` Derrick Stolee
  2020-02-04 21:20 ` [PATCH 05/10] name-rev: don't _peek() in create_or_update_name() René Scharfe
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:17 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

name_ref() duplicates the path string and passes it to name_rev(), which
either puts it into a commit slab or ignores it if there is already a
better name, leaking it.  Move the duplication to name_rev() and release
the copy in the latter case.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 2e6820bd5b..3e22a0503e 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -121,6 +121,8 @@ static void name_rev(struct commit *start_commit,

 	if (deref)
 		tip_name = to_free = xstrfmt("%s^0", tip_name);
+	else
+		tip_name = to_free = xstrdup(tip_name);

 	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
 				   from_tag)) {
@@ -323,7 +325,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
 		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
+		name_rev(commit, path, taggerdate, from_tag, deref);
 	}
 	return 0;
 }
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 05/10] name-rev: don't _peek() in create_or_update_name()
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (3 preceding siblings ...)
  2020-02-04 21:17 ` [PATCH 04/10] name-rev: don't leak path copy in name_ref() René Scharfe
@ 2020-02-04 21:20 ` René Scharfe
  2020-02-04 21:22 ` [PATCH 06/10] name-rev: put struct rev_name into commit slab René Scharfe
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:20 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Look up the commit slab slot for the commit once using
commit_rev_name_at() and populate it in case it is empty, instead of
checking for emptiness in a separate step using commit_rev_name_peek()
via get_commit_rev_name().

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 3e22a0503e..41aed436ca 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -39,11 +39,6 @@ static struct rev_name *get_commit_rev_name(const struct commit *commit)
 	return slot ? *slot : NULL;
 }

-static void set_commit_rev_name(struct commit *commit, struct rev_name *name)
-{
-	*commit_rev_name_at(&rev_names, commit) = name;
-}
-
 static int is_better_name(struct rev_name *name,
 			  timestamp_t taggerdate,
 			  int distance,
@@ -86,15 +81,14 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 					      int generation, int distance,
 					      int from_tag)
 {
-	struct rev_name *name = get_commit_rev_name(commit);
+	struct rev_name **slot = commit_rev_name_at(&rev_names, commit);
+	struct rev_name *name = *slot;

 	if (name && !is_better_name(name, taggerdate, distance, from_tag))
 		return NULL;

-	if (name == NULL) {
-		name = xmalloc(sizeof(*name));
-		set_commit_rev_name(commit, name);
-	}
+	if (!name)
+		name = *slot = xmalloc(sizeof(*name));

 	name->tip_name = tip_name;
 	name->taggerdate = taggerdate;
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/10] name-rev: put struct rev_name into commit slab
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (4 preceding siblings ...)
  2020-02-04 21:20 ` [PATCH 05/10] name-rev: don't _peek() in create_or_update_name() René Scharfe
@ 2020-02-04 21:22 ` René Scharfe
  2020-02-04 21:23 ` [PATCH 07/10] name-rev: factor out get_parent_name() René Scharfe
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:22 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

The commit slab commit_rev_name contains a pointer to a struct rev_name,
and the actual struct is allocated separatly.  Avoid that allocation and
pointer indirection by storing the full struct in the commit slab.  Use
the tip_name member pointer to determine if the returned struct is
initialized.

Performance in the Linux repository measured with hyperfine before:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     953.5 ms ±   6.3 ms    [User: 901.2 ms, System: 52.1 ms]
  Range (min … max):   945.2 ms … 968.5 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     851.0 ms ±   3.1 ms    [User: 807.4 ms, System: 43.6 ms]
  Range (min … max):   846.7 ms … 857.0 ms    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 41aed436ca..14381a3c64 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -24,7 +24,7 @@ struct rev_name {
 	int from_tag;
 };

-define_commit_slab(commit_rev_name, struct rev_name *);
+define_commit_slab(commit_rev_name, struct rev_name);

 static timestamp_t cutoff = TIME_MAX;
 static struct commit_rev_name rev_names;
@@ -32,11 +32,16 @@ static struct commit_rev_name rev_names;
 /* How many generations are maximally preferred over _one_ merge traversal? */
 #define MERGE_TRAVERSAL_WEIGHT 65535

+static int is_valid_rev_name(const struct rev_name *name)
+{
+	return name && name->tip_name;
+}
+
 static struct rev_name *get_commit_rev_name(const struct commit *commit)
 {
-	struct rev_name **slot = commit_rev_name_peek(&rev_names, commit);
+	struct rev_name *name = commit_rev_name_peek(&rev_names, commit);

-	return slot ? *slot : NULL;
+	return is_valid_rev_name(name) ? name : NULL;
 }

 static int is_better_name(struct rev_name *name,
@@ -81,15 +86,12 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 					      int generation, int distance,
 					      int from_tag)
 {
-	struct rev_name **slot = commit_rev_name_at(&rev_names, commit);
-	struct rev_name *name = *slot;
+	struct rev_name *name = commit_rev_name_at(&rev_names, commit);

-	if (name && !is_better_name(name, taggerdate, distance, from_tag))
+	if (is_valid_rev_name(name) &&
+	    !is_better_name(name, taggerdate, distance, from_tag))
 		return NULL;

-	if (!name)
-		name = *slot = xmalloc(sizeof(*name));
-
 	name->tip_name = tip_name;
 	name->taggerdate = taggerdate;
 	name->generation = generation;
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/10] name-rev: factor out get_parent_name()
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (5 preceding siblings ...)
  2020-02-04 21:22 ` [PATCH 06/10] name-rev: put struct rev_name into commit slab René Scharfe
@ 2020-02-04 21:23 ` René Scharfe
  2020-02-04 21:24 ` [PATCH 08/10] name-rev: pre-size buffer in get_parent_name() René Scharfe
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:23 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Reduce nesting by moving code to come up with a name for the parent into
its own function.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 14381a3c64..6701fb1569 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -101,6 +101,19 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 	return name;
 }

+static char *get_parent_name(const struct rev_name *name, int parent_number)
+{
+	size_t len;
+
+	strip_suffix(name->tip_name, "^0", &len);
+	if (name->generation > 0)
+		return xstrfmt("%.*s~%d^%d", (int)len, name->tip_name,
+			       name->generation, parent_number);
+	else
+		return xstrfmt("%.*s^%d", (int)len, name->tip_name,
+			       parent_number);
+}
+
 static void name_rev(struct commit *start_commit,
 		const char *tip_name, timestamp_t taggerdate,
 		int from_tag, int deref)
@@ -148,19 +161,7 @@ static void name_rev(struct commit *start_commit,
 				continue;

 			if (parent_number > 1) {
-				size_t len;
-
-				strip_suffix(name->tip_name, "^0", &len);
-				if (name->generation > 0)
-					new_name = xstrfmt("%.*s~%d^%d",
-							   (int)len,
-							   name->tip_name,
-							   name->generation,
-							   parent_number);
-				else
-					new_name = xstrfmt("%.*s^%d", (int)len,
-							   name->tip_name,
-							   parent_number);
+				new_name = get_parent_name(name, parent_number);
 				generation = 0;
 				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 			} else {
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/10] name-rev: pre-size buffer in get_parent_name()
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (6 preceding siblings ...)
  2020-02-04 21:23 ` [PATCH 07/10] name-rev: factor out get_parent_name() René Scharfe
@ 2020-02-04 21:24 ` René Scharfe
  2020-02-05  3:19   ` Derrick Stolee
  2020-02-04 21:25 ` [PATCH 09/10] name-rev: generate name strings only if they are better René Scharfe
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:24 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

We can calculate the size of new name easily and precisely. Open-code
the xstrfmt() calls and grow the buffers as needed before filling them.
This provides a surprisingly large benefit when working with the
Chromium repository; here are the numbers measured using hyperfine
before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      5.822 s ±  0.013 s    [User: 5.304 s, System: 0.516 s]
  Range (min … max):    5.803 s …  5.837 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.527 s ±  0.003 s    [User: 1.015 s, System: 0.511 s]
  Range (min … max):    1.524 s …  1.535 s    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6701fb1569..793356edd1 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -103,15 +103,23 @@ static struct rev_name *create_or_update_name(struct commit *commit,

 static char *get_parent_name(const struct rev_name *name, int parent_number)
 {
+	struct strbuf sb = STRBUF_INIT;
 	size_t len;

 	strip_suffix(name->tip_name, "^0", &len);
-	if (name->generation > 0)
-		return xstrfmt("%.*s~%d^%d", (int)len, name->tip_name,
-			       name->generation, parent_number);
-	else
-		return xstrfmt("%.*s^%d", (int)len, name->tip_name,
-			       parent_number);
+	if (name->generation > 0) {
+		strbuf_grow(&sb, len +
+			    1 + decimal_width(name->generation) +
+			    1 + decimal_width(parent_number));
+		strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name,
+			    name->generation, parent_number);
+	} else {
+		strbuf_grow(&sb, len +
+			    1 + decimal_width(parent_number));
+		strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name,
+			    parent_number);
+	}
+	return strbuf_detach(&sb, NULL);
 }

 static void name_rev(struct commit *start_commit,
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/10] name-rev: generate name strings only if they are better
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (7 preceding siblings ...)
  2020-02-04 21:24 ` [PATCH 08/10] name-rev: pre-size buffer in get_parent_name() René Scharfe
@ 2020-02-04 21:25 ` René Scharfe
  2020-02-05 15:11   ` Derrick Stolee
  2020-02-04 21:26 ` [PATCH 10/10] name-rev: release unused name strings René Scharfe
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:25 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Leave setting the tip_name member of struct rev_name to callers of
create_or_update_name().  This avoids allocations for names that are
rejected by that function.  Here's how this affects the runtime when
working with a fresh clone of Git's own repository; performance numbers
by hyperfine before:

Benchmark #1: ./git -C ../git-pristine/ name-rev --all
  Time (mean ± σ):     437.8 ms ±   4.0 ms    [User: 422.5 ms, System: 15.2 ms]
  Range (min … max):   432.8 ms … 446.3 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../git-pristine/ name-rev --all
  Time (mean ± σ):     408.5 ms ±   1.4 ms    [User: 387.2 ms, System: 21.2 ms]
  Range (min … max):   407.1 ms … 411.7 ms    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 793356edd1..98f55bcea9 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -81,7 +81,6 @@ static int is_better_name(struct rev_name *name,
 }

 static struct rev_name *create_or_update_name(struct commit *commit,
-					      const char *tip_name,
 					      timestamp_t taggerdate,
 					      int generation, int distance,
 					      int from_tag)
@@ -92,7 +91,6 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 	    !is_better_name(name, taggerdate, distance, from_tag))
 		return NULL;

-	name->tip_name = tip_name;
 	name->taggerdate = taggerdate;
 	name->generation = generation;
 	name->distance = distance;
@@ -130,22 +128,20 @@ static void name_rev(struct commit *start_commit,
 	struct commit *commit;
 	struct commit **parents_to_queue = NULL;
 	size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
-	char *to_free = NULL;
+	struct rev_name *start_name;

 	parse_commit(start_commit);
 	if (start_commit->date < cutoff)
 		return;

+	start_name = create_or_update_name(start_commit, taggerdate, 0, 0,
+					   from_tag);
+	if (!start_name)
+		return;
 	if (deref)
-		tip_name = to_free = xstrfmt("%s^0", tip_name);
+		start_name->tip_name = xstrfmt("%s^0", tip_name);
 	else
-		tip_name = to_free = xstrdup(tip_name);
-
-	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
-				   from_tag)) {
-		free(to_free);
-		return;
-	}
+		start_name->tip_name = xstrdup(tip_name);

 	memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
 	prio_queue_put(&queue, start_commit);
@@ -161,7 +157,7 @@ static void name_rev(struct commit *start_commit,
 				parents;
 				parents = parents->next, parent_number++) {
 			struct commit *parent = parents->item;
-			const char *new_name;
+			struct rev_name *parent_name;
 			int generation, distance;

 			parse_commit(parent);
@@ -169,18 +165,23 @@ static void name_rev(struct commit *start_commit,
 				continue;

 			if (parent_number > 1) {
-				new_name = get_parent_name(name, parent_number);
 				generation = 0;
 				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
 			} else {
-				new_name = name->tip_name;
 				generation = name->generation + 1;
 				distance = name->distance + 1;
 			}

-			if (create_or_update_name(parent, new_name, taggerdate,
-						  generation, distance,
-						  from_tag)) {
+			parent_name = create_or_update_name(parent, taggerdate,
+							    generation,
+							    distance, from_tag);
+			if (parent_name) {
+				if (parent_number > 1)
+					parent_name->tip_name =
+						get_parent_name(name,
+								parent_number);
+				else
+					parent_name->tip_name = name->tip_name;
 				ALLOC_GROW(parents_to_queue,
 					   parents_to_queue_nr + 1,
 					   parents_to_queue_alloc);
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/10] name-rev: release unused name strings
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (8 preceding siblings ...)
  2020-02-04 21:25 ` [PATCH 09/10] name-rev: generate name strings only if they are better René Scharfe
@ 2020-02-04 21:26 ` René Scharfe
  2020-02-05 15:19   ` Derrick Stolee
  2020-02-05  3:28 ` [PATCH 00/10] name-rev: improve memory usage Derrick Stolee
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: René Scharfe @ 2020-02-04 21:26 UTC (permalink / raw)
  To: Git Mailing List; +Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

name_rev() assigns a name to a commit and its parents and grandparents
and so on.  Commits share their name string with their first parent,
which in turn does the same, recursively to the root.  That saves a lot
of allocations.  When a better name is found, the old name is replaced,
but its memory is not released.  That leakage can become significant.

Can we release these old strings exactly once even though they are
referenced multiple times?  Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.

Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0.  These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.

That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.

If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.

We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.

For the Chromium repo, releasing superceded names reduces the memory
footprint of name-rev --all significantly.  Here's the output of GNU
time before:

0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps

... and with this patch:

1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps

It also gets faster; hyperfine before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.534 s ±  0.006 s    [User: 1.039 s, System: 0.494 s]
  Range (min … max):    1.522 s …  1.542 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.338 s ±  0.006 s    [User: 1.047 s, System: 0.291 s]
  Range (min … max):    1.327 s …  1.346 s    10 runs

For the Linux repo it doesn't pay off; memory usage only gets down from:

0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps

... to:

0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps

The runtime actually increases slightly from:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
  Range (min … max):   824.1 ms … 838.9 ms    10 runs

... to:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
  Range (min … max):   843.4 ms … 854.3 ms    10 runs

Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca.  1000x longer in the latter.

Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 98f55bcea9..23a639ff30 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -17,7 +17,7 @@
 #define CUTOFF_DATE_SLOP 86400

 struct rev_name {
-	const char *tip_name;
+	char *tip_name;
 	timestamp_t taggerdate;
 	int generation;
 	int distance;
@@ -34,7 +34,7 @@ static struct commit_rev_name rev_names;

 static int is_valid_rev_name(const struct rev_name *name)
 {
-	return name && name->tip_name;
+	return name && (name->generation || name->tip_name);
 }

 static struct rev_name *get_commit_rev_name(const struct commit *commit)
@@ -87,9 +87,20 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 {
 	struct rev_name *name = commit_rev_name_at(&rev_names, commit);

-	if (is_valid_rev_name(name) &&
-	    !is_better_name(name, taggerdate, distance, from_tag))
-		return NULL;
+	if (is_valid_rev_name(name)) {
+		if (!is_better_name(name, taggerdate, distance, from_tag))
+			return NULL;
+
+		/*
+		 * This string might still be shared with ancestors
+		 * (generation > 0).  We can release it here regardless,
+		 * because the new name that has just won will be better
+		 * for them as well, so name_rev() will replace these
+		 * stale pointers when it processes the parents.
+		 */
+		if (!name->generation)
+			free(name->tip_name);
+	}

 	name->taggerdate = taggerdate;
 	name->generation = generation;
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/10] name-rev: rewrite create_or_update_name()
  2020-02-04 21:14 ` [PATCH 01/10] name-rev: rewrite create_or_update_name() René Scharfe
@ 2020-02-05  2:00   ` Derrick Stolee
  2020-02-05  2:35     ` Taylor Blau
  2020-02-05 16:45   ` Andrei Rybak
  1 sibling, 1 reply; 27+ messages in thread
From: Derrick Stolee @ 2020-02-05  2:00 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

On 2/4/2020 4:14 PM, René Scharfe wrote:
> This code was moved straight out of name_rev(). As such, we inherited
> the "goto" to jump from an if into an else-if. We also inherited the
> fact that "nothing to do -- return NULL" is handled last.
> 
> Rewrite the function to first handle the "nothing to do" case. Then we
> can handle the conditional allocation early before going on to populate
> the struct. No need for goto-ing.

I read this carefully and agree it is functionally equivalent.

Since you are removing a goto and rearranging if/else blocks, I thought
this response needed to be explicit, at least more than usual. Good work.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/10] name-rev: rewrite create_or_update_name()
  2020-02-05  2:00   ` Derrick Stolee
@ 2020-02-05  2:35     ` Taylor Blau
  0 siblings, 0 replies; 27+ messages in thread
From: Taylor Blau @ 2020-02-05  2:35 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: René Scharfe, Git Mailing List, SZEDER Gábor,
	Martin Ågren, Junio C Hamano

On Tue, Feb 04, 2020 at 09:00:23PM -0500, Derrick Stolee wrote:
> On 2/4/2020 4:14 PM, René Scharfe wrote:
> > This code was moved straight out of name_rev(). As such, we inherited
> > the "goto" to jump from an if into an else-if. We also inherited the
> > fact that "nothing to do -- return NULL" is handled last.
> >
> > Rewrite the function to first handle the "nothing to do" case. Then we
> > can handle the conditional allocation early before going on to populate
> > the struct. No need for goto-ing.
>
> I read this carefully and agree it is functionally equivalent.

As did I. I originally thought that my MUA was wrapping the 'else if'
line oddly, but then I discovered that it's the label's indentation that
was throwing me off.

In any case, this makes good sense. Well done indeed.

> Since you are removing a goto and rearranging if/else blocks, I thought
> this response needed to be explicit, at least more than usual. Good work.
>
> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 08/10] name-rev: pre-size buffer in get_parent_name()
  2020-02-04 21:24 ` [PATCH 08/10] name-rev: pre-size buffer in get_parent_name() René Scharfe
@ 2020-02-05  3:19   ` Derrick Stolee
  2020-02-05 15:16     ` René Scharfe
  0 siblings, 1 reply; 27+ messages in thread
From: Derrick Stolee @ 2020-02-05  3:19 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

On 2/4/2020 4:24 PM, René Scharfe wrote:
> We can calculate the size of new name easily and precisely. Open-code
> the xstrfmt() calls and grow the buffers as needed before filling them.
> This provides a surprisingly large benefit when working with the
> Chromium repository; here are the numbers measured using hyperfine
> before:
> 
> Benchmark #1: ./git -C ../chromium/src name-rev --all
>   Time (mean ± σ):      5.822 s ±  0.013 s    [User: 5.304 s, System: 0.516 s]
>   Range (min … max):    5.803 s …  5.837 s    10 runs
> 
> ... and with this patch:
> 
> Benchmark #1: ./git -C ../chromium/src name-rev --all
>   Time (mean ± σ):      1.527 s ±  0.003 s    [User: 1.015 s, System: 0.511 s]
>   Range (min … max):    1.524 s …  1.535 s    10 runs

Nice!

> +	if (name->generation > 0) {
> +		strbuf_grow(&sb, len +
> +			    1 + decimal_width(name->generation) +
> +			    1 + decimal_width(parent_number));

Just curious: these strbuf_grow() calls are what _really_ improve the
performance, right? If you dropped them, then can we expect the performance
to be similar to the old code?

> +		strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name,
> +			    name->generation, parent_number);
> +	} else {
> +		strbuf_grow(&sb, len +
> +			    1 + decimal_width(parent_number));
> +		strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name,
> +			    parent_number);
> +	}
> +	return strbuf_detach(&sb, NULL);
>  }
> 
>  static void name_rev(struct commit *start_commit,
> --
> 2.25.0
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/10] name-rev: improve memory usage
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (9 preceding siblings ...)
  2020-02-04 21:26 ` [PATCH 10/10] name-rev: release unused name strings René Scharfe
@ 2020-02-05  3:28 ` Derrick Stolee
  2020-02-05 15:20   ` Derrick Stolee
  2020-02-05 17:19 ` [PATCH 01/10 RESEND AUTHOR FIXED] name-rev: rewrite create_or_update_name() René Scharfe
  2020-02-05 17:50 ` [PATCH 11/10] name-rev: sort tip names before applying René Scharfe
  12 siblings, 1 reply; 27+ messages in thread
From: Derrick Stolee @ 2020-02-05  3:28 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

On 2/4/2020 4:12 PM, René Scharfe wrote:
> This series seeks to get reduce the size of memory allocations, the
> number of reallocations and the amount of leaks in git name-rev, to
> improve its performance.

I am generally very happy with the performance benefits and think
this series is very well organized.

>   name-rev: don't leak path copy in name_ref()
>   name-rev: generate name strings only if they are better
>   name-rev: release unused name strings

I don't have the right context to understand these patches without
applying them and investigating the methods around the changes.
They intrigue me, though, so I plan to pick this up again tomorrow.

The rest of the changes look good.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 04/10] name-rev: don't leak path copy in name_ref()
  2020-02-04 21:17 ` [PATCH 04/10] name-rev: don't leak path copy in name_ref() René Scharfe
@ 2020-02-05 14:35   ` Derrick Stolee
  0 siblings, 0 replies; 27+ messages in thread
From: Derrick Stolee @ 2020-02-05 14:35 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

On 2/4/2020 4:17 PM, René Scharfe wrote:
> name_ref() duplicates the path string and passes it to name_rev(), which
> either puts it into a commit slab or ignores it if there is already a
> better name, leaking it.  Move the duplication to name_rev() and release
> the copy in the latter case.
> 
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  builtin/name-rev.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 2e6820bd5b..3e22a0503e 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -121,6 +121,8 @@ static void name_rev(struct commit *start_commit,
> 
>  	if (deref)
>  		tip_name = to_free = xstrfmt("%s^0", tip_name);
> +	else
> +		tip_name = to_free = xstrdup(tip_name);

We now unconditionally duplicate the input tip_name and free it
within the method. Good.

>  	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
>  				   from_tag)) {
> @@ -323,7 +325,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = commit->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
> -		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
> +		name_rev(commit, path, taggerdate, from_tag, deref);

And we no longer duplicate 'path' here, as it is unconditionally duplicated
in name_ref().

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/10] name-rev: generate name strings only if they are better
  2020-02-04 21:25 ` [PATCH 09/10] name-rev: generate name strings only if they are better René Scharfe
@ 2020-02-05 15:11   ` Derrick Stolee
  2020-02-05 15:50     ` René Scharfe
  0 siblings, 1 reply; 27+ messages in thread
From: Derrick Stolee @ 2020-02-05 15:11 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

On 2/4/2020 4:25 PM, René Scharfe wrote:
> -			if (create_or_update_name(parent, new_name, taggerdate,
> -						  generation, distance,
> -						  from_tag)) {
> +			parent_name = create_or_update_name(parent, taggerdate,
> +							    generation,
> +							    distance, from_tag);
> +			if (parent_name) {

As someone unfamiliar with the name-rev code, it took me a while to see why
the algorithm isn't exponential in complexity. It technically _is_, but it
is of the form 2^{N / MERGE_TRAVERSAL_WEIGHT} = 2^{N / 65535} and only if
we create a particularly nasty commit history that would never appear in the wild.

But, the critical section is the block above. The confusing part was that
create_or_update_name() returns NULL if the name is updated, but non-NULL if
a better name is created. That makes it clear that this change is saving
allocations.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 08/10] name-rev: pre-size buffer in get_parent_name()
  2020-02-05  3:19   ` Derrick Stolee
@ 2020-02-05 15:16     ` René Scharfe
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-05 15:16 UTC (permalink / raw)
  To: Derrick Stolee, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Am 05.02.20 um 04:19 schrieb Derrick Stolee:
> On 2/4/2020 4:24 PM, René Scharfe wrote:
>> We can calculate the size of new name easily and precisely. Open-code
>> the xstrfmt() calls and grow the buffers as needed before filling them.
>> This provides a surprisingly large benefit when working with the
>> Chromium repository; here are the numbers measured using hyperfine
>> before:
>>
>> Benchmark #1: ./git -C ../chromium/src name-rev --all
>>   Time (mean ± σ):      5.822 s ±  0.013 s    [User: 5.304 s, System: 0.516 s]
>>   Range (min … max):    5.803 s …  5.837 s    10 runs
>>
>> ... and with this patch:
>>
>> Benchmark #1: ./git -C ../chromium/src name-rev --all
>>   Time (mean ± σ):      1.527 s ±  0.003 s    [User: 1.015 s, System: 0.511 s]
>>   Range (min … max):    1.524 s …  1.535 s    10 runs
>
> Nice!
>
>> +	if (name->generation > 0) {
>> +		strbuf_grow(&sb, len +
>> +			    1 + decimal_width(name->generation) +
>> +			    1 + decimal_width(parent_number));
>
> Just curious: these strbuf_grow() calls are what _really_ improve the
> performance, right? If you dropped them, then can we expect the performance
> to be similar to the old code?

The replaced xstrfmt() is defined in strbuf.c and trivially wraps
xstrvfmt(), which in turn does:

	struct strbuf buf = STRBUF_INIT;
	strbuf_vaddf(&buf, fmt, ap);
	return strbuf_detach(&buf, NULL);

And strbuf_addf() trivially wraps strbuf_vaddf().  So indeed, all
I did was to inline xstrfmt() and add the strbuf_grow() calls.

Their high impact shocked me a bit.  vsnprintf(3) really doesn't seem
to like working with a buffer that is too small (when it's supposed to
just calculate how many bytes it would have needed), at least not in
the one from glibc 2.29.

>> +		strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name,
>> +			    name->generation, parent_number);
>> +	} else {
>> +		strbuf_grow(&sb, len +
>> +			    1 + decimal_width(parent_number));
>> +		strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name,
>> +			    parent_number);
>> +	}
>> +	return strbuf_detach(&sb, NULL);
>>  }
>>
>>  static void name_rev(struct commit *start_commit,
>> --
>> 2.25.0
>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/10] name-rev: release unused name strings
  2020-02-04 21:26 ` [PATCH 10/10] name-rev: release unused name strings René Scharfe
@ 2020-02-05 15:19   ` Derrick Stolee
  0 siblings, 0 replies; 27+ messages in thread
From: Derrick Stolee @ 2020-02-05 15:19 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

On 2/4/2020 4:26 PM, René Scharfe wrote:
> The runtime actually increases slightly from:
> 
> Benchmark #1: ./git -C ../linux/ name-rev --all
>   Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
>   Range (min … max):   824.1 ms … 838.9 ms    10 runs
> 
> ... to:
> 
> Benchmark #1: ./git -C ../linux/ name-rev --all
>   Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
>   Range (min … max):   843.4 ms … 854.3 ms    10 runs
> 
> Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
> create_or_update_name() release almost 1GB, while in the Linux repo
> 240000+ calls release a bit more than 5MB, so the average discarded
> name is ca.  1000x longer in the latter.
> 
> Overall I think it's the right tradeoff to make, as it helps curb the
> memory usage in repositories with big discarded names, and the added
> overhead is small.

I agree this trade-off is worth it. Your reasoning for why it is
happening makes sense, too.

> +	if (is_valid_rev_name(name)) {
> +		if (!is_better_name(name, taggerdate, distance, from_tag))
> +			return NULL;
> +
> +		/*
> +		 * This string might still be shared with ancestors
> +		 * (generation > 0).  We can release it here regardless,
> +		 * because the new name that has just won will be better
> +		 * for them as well, so name_rev() will replace these
> +		 * stale pointers when it processes the parents.
> +		 */
> +		if (!name->generation)
> +			free(name->tip_name);
> +	}

And here, this idea of "still be shared with ancestors" is confusing
without the additional context that the name-rev algorithm is using
depth-first-search to find the "best" name. At this point, we are
trying to replace the existing name with a better one, and use
"generation == 0" to declare "I am the initial owner of tip_name".
The rest of the ancestors will replace their tip_name pointer with
the new name, all while not accessing this freed memory.

Keeping such dangling references to freed memory is certainly
dangerous, but these references are short-lived within the name_rev()
method. That limits the possible ways this could cause issues in
the future.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/10] name-rev: improve memory usage
  2020-02-05  3:28 ` [PATCH 00/10] name-rev: improve memory usage Derrick Stolee
@ 2020-02-05 15:20   ` Derrick Stolee
  0 siblings, 0 replies; 27+ messages in thread
From: Derrick Stolee @ 2020-02-05 15:20 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

On 2/4/2020 10:28 PM, Derrick Stolee wrote:
> On 2/4/2020 4:12 PM, René Scharfe wrote:
>> This series seeks to get reduce the size of memory allocations, the
>> number of reallocations and the amount of leaks in git name-rev, to
>> improve its performance.
> 
> I am generally very happy with the performance benefits and think
> this series is very well organized.
> 
>>   name-rev: don't leak path copy in name_ref()
>>   name-rev: generate name strings only if they are better
>>   name-rev: release unused name strings
> 
> I don't have the right context to understand these patches without
> applying them and investigating the methods around the changes.
> They intrigue me, though, so I plan to pick this up again tomorrow.

After looking closely at these three, they LGTM.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/10] name-rev: generate name strings only if they are better
  2020-02-05 15:11   ` Derrick Stolee
@ 2020-02-05 15:50     ` René Scharfe
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-05 15:50 UTC (permalink / raw)
  To: Derrick Stolee, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Am 05.02.20 um 16:11 schrieb Derrick Stolee:
> On 2/4/2020 4:25 PM, René Scharfe wrote:
>> -			if (create_or_update_name(parent, new_name, taggerdate,
>> -						  generation, distance,
>> -						  from_tag)) {
>> +			parent_name = create_or_update_name(parent, taggerdate,
>> +							    generation,
>> +							    distance, from_tag);
>> +			if (parent_name) {
>
> As someone unfamiliar with the name-rev code, it took me a while to see why
> the algorithm isn't exponential in complexity. It technically _is_, but it
> is of the form 2^{N / MERGE_TRAVERSAL_WEIGHT} = 2^{N / 65535} and only if
> we create a particularly nasty commit history that would never appear in the wild.

As I understand it, name_rev() attaches a name (a tag or other reference) to a
commit, and then goes through all its ancestors, depth first, and attaches a
name based on that to all of them as well.  It stops traversal if the name is
not better than a previously attached name.  Each commit has at most one name.

If we have N commits and M tags then we could traverse N * M commits and set
their names in the worst case, if the M tags are all for the very latest
commit and if the tags are ordered from worst to best.  Right?

Which implies that ordering names before attaching them one by one should
yield another speedup.  Older tags are preferred and should be tried first,
followed by younger tags and non-tags.  Hmm.. :)

> But, the critical section is the block above. The confusing part was that
> create_or_update_name() returns NULL if the name is updated, but non-NULL if
> a better name is created. That makes it clear that this change is saving
> allocations.

create_or_update_name() returns NULL if the new name is not better than an
existing one, and the old name is left untouched.  It returns a pointer to
the name record if there either was no old name or if that name was worse.
In that case it updates the name.

After this patch it doesn't set the name string (tip_name) anymore, which
is left for the caller.  And the benefit is that the caller only needs to
prepare said name string if the new name is better.

René

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/10] name-rev: rewrite create_or_update_name()
  2020-02-04 21:14 ` [PATCH 01/10] name-rev: rewrite create_or_update_name() René Scharfe
  2020-02-05  2:00   ` Derrick Stolee
@ 2020-02-05 16:45   ` Andrei Rybak
  2020-02-05 16:47     ` René Scharfe
  1 sibling, 1 reply; 27+ messages in thread
From: Andrei Rybak @ 2020-02-05 16:45 UTC (permalink / raw)
  To: René Scharfe, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Hi René,

On 2020-02-04 22:14, René Scharfe wrote:
> This code was moved straight out of name_rev(). As such, we inherited
> the "goto" to jump from an if into an else-if. We also inherited the
> fact that "nothing to do -- return NULL" is handled last.
>
> Rewrite the function to first handle the "nothing to do" case. Then we
> can handle the conditional allocation early before going on to populate
> the struct. No need for goto-ing.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> Original submission:
> https://lore.kernel.org/git/20190922081846.14452-1-martin.agren@gmail.com/

Judging by this sign-off, link to Martin's email, and shortlog from the cover letter ...

> Martin Ågren (1):
>   name-rev: rewrite create_or_update_name()

... it seems that "From: Martin Ågren <martin.agren@gmail.com>" is missing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/10] name-rev: rewrite create_or_update_name()
  2020-02-05 16:45   ` Andrei Rybak
@ 2020-02-05 16:47     ` René Scharfe
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-05 16:47 UTC (permalink / raw)
  To: Andrei Rybak, Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano

Am 05.02.20 um 17:45 schrieb Andrei Rybak:
> Hi René,
>
> On 2020-02-04 22:14, René Scharfe wrote:
>> This code was moved straight out of name_rev(). As such, we inherited
>> the "goto" to jump from an if into an else-if. We also inherited the
>> fact that "nothing to do -- return NULL" is handled last.
>>
>> Rewrite the function to first handle the "nothing to do" case. Then we
>> can handle the conditional allocation early before going on to populate
>> the struct. No need for goto-ing.
>>
>> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
>> ---
>> Original submission:
>> https://lore.kernel.org/git/20190922081846.14452-1-martin.agren@gmail.com/
>
> Judging by this sign-off, link to Martin's email, and shortlog from the cover letter ...
>
>> Martin Ågren (1):
>>   name-rev: rewrite create_or_update_name()
>
> ... it seems that "From: Martin Ågren <martin.agren@gmail.com>" is missing.

Yes, indeed.  Sorry for that. :(

René


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/10 RESEND AUTHOR FIXED] name-rev: rewrite create_or_update_name()
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (10 preceding siblings ...)
  2020-02-05  3:28 ` [PATCH 00/10] name-rev: improve memory usage Derrick Stolee
@ 2020-02-05 17:19 ` René Scharfe
  2020-02-05 17:50 ` [PATCH 11/10] name-rev: sort tip names before applying René Scharfe
  12 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-05 17:19 UTC (permalink / raw)
  To: Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano,
	Andrei Rybak

From: =?UTF-8?q?Martin=20=C3=85gren?= <martin.agren@gmail.com>

This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.

Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Missing From: line added; thanks Andrei!

Original submission:
https://lore.kernel.org/git/20190922081846.14452-1-martin.agren@gmail.com/

 builtin/name-rev.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 6b9e8c850b..c2239ac3f7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -88,21 +88,21 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 {
 	struct rev_name *name = get_commit_rev_name(commit);

+	if (name && !is_better_name(name, taggerdate, distance, from_tag))
+		return NULL;
+
 	if (name == NULL) {
 		name = xmalloc(sizeof(*name));
 		set_commit_rev_name(commit, name);
-		goto copy_data;
-	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
-copy_data:
-		name->tip_name = tip_name;
-		name->taggerdate = taggerdate;
-		name->generation = generation;
-		name->distance = distance;
-		name->from_tag = from_tag;
-
-		return name;
-	} else
-		return NULL;
+	}
+
+	name->tip_name = tip_name;
+	name->taggerdate = taggerdate;
+	name->generation = generation;
+	name->distance = distance;
+	name->from_tag = from_tag;
+
+	return name;
 }

 static void name_rev(struct commit *start_commit,
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/10] name-rev: sort tip names before applying
  2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
                   ` (11 preceding siblings ...)
  2020-02-05 17:19 ` [PATCH 01/10 RESEND AUTHOR FIXED] name-rev: rewrite create_or_update_name() René Scharfe
@ 2020-02-05 17:50 ` René Scharfe
  2020-02-05 18:23   ` Junio C Hamano
  12 siblings, 1 reply; 27+ messages in thread
From: René Scharfe @ 2020-02-05 17:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano,
	Derrick Stolee

name_ref() is called for each ref and checks if its a better name for
the referenced commit.  If that's the case it remembers it and checks if
a name based on it is better for its ancestors as well.  This in done in
the the order for_each_ref() imposes on us.

That might not be optimal.  If bad names happen to be encountered first
(as defined by is_better_name()), names derived from them may spread to
a lot of commits, only to be replaced by better names later.  Setting
better names first can avoid that.

is_better_name() prefers tags, short distances and old references.  The
distance is a measure that we need to calculate for each candidate
commit, but the other two properties are not dependent on the
relationships of commits.  Sorting the refs by them should yield better
performance than the essentially random order we currently use.

And applying older references first should also help to reduce rework
due to the fact that older commits have less ancestors than newer ones.

So add all details of names to the tip table first, then sort them
to prefer tags and older references and then apply them in this order.
Here's the performance as measures by hyperfine for the Linux repo
before:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     851.1 ms ±   4.5 ms    [User: 806.7 ms, System: 44.4 ms]
  Range (min … max):   845.9 ms … 859.5 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     736.2 ms ±   8.7 ms    [User: 688.4 ms, System: 47.5 ms]
  Range (min … max):   726.0 ms … 755.2 ms    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 60 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 8 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 23a639ff30..a9dcd25e46 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -247,6 +247,10 @@ static struct tip_table {
 	struct tip_table_entry {
 		struct object_id oid;
 		const char *refname;
+		struct commit *commit;
+		timestamp_t taggerdate;
+		unsigned int from_tag:1;
+		unsigned int deref:1;
 	} *table;
 	int nr;
 	int alloc;
@@ -254,13 +258,18 @@ static struct tip_table {
 } tip_table;

 static void add_to_tip_table(const struct object_id *oid, const char *refname,
-			     int shorten_unambiguous)
+			     int shorten_unambiguous, struct commit *commit,
+			     timestamp_t taggerdate, int from_tag, int deref)
 {
 	refname = name_ref_abbrev(refname, shorten_unambiguous);

 	ALLOC_GROW(tip_table.table, tip_table.nr + 1, tip_table.alloc);
 	oidcpy(&tip_table.table[tip_table.nr].oid, oid);
 	tip_table.table[tip_table.nr].refname = xstrdup(refname);
+	tip_table.table[tip_table.nr].commit = commit;
+	tip_table.table[tip_table.nr].taggerdate = taggerdate;
+	tip_table.table[tip_table.nr].from_tag = from_tag;
+	tip_table.table[tip_table.nr].deref = deref;
 	tip_table.nr++;
 	tip_table.sorted = 0;
 }
@@ -271,12 +280,30 @@ static int tipcmp(const void *a_, const void *b_)
 	return oidcmp(&a->oid, &b->oid);
 }

+static int cmp_by_tag_and_age(const void *a_, const void *b_)
+{
+	const struct tip_table_entry *a = a_, *b = b_;
+	int cmp;
+
+	/* Prefer tags. */
+	cmp = b->from_tag - a->from_tag;
+	if (cmp)
+		return cmp;
+
+	/* Older is better. */
+	if (a->taggerdate < b->taggerdate)
+		return -1;
+	return a->taggerdate != b->taggerdate;
+}
+
 static int name_ref(const char *path, const struct object_id *oid, int flags, void *cb_data)
 {
 	struct object *o = parse_object(the_repository, oid);
 	struct name_ref_data *data = cb_data;
 	int can_abbreviate_output = data->tags_only && data->name_only;
 	int deref = 0;
+	int from_tag = 0;
+	struct commit *commit = NULL;
 	timestamp_t taggerdate = TIME_MAX;

 	if (data->tags_only && !starts_with(path, "refs/tags/"))
@@ -325,8 +352,6 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 			return 0;
 	}

-	add_to_tip_table(oid, path, can_abbreviate_output);
-
 	while (o && o->type == OBJ_TAG) {
 		struct tag *t = (struct tag *) o;
 		if (!t->tagged)
@@ -336,17 +361,35 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
 		taggerdate = t->date;
 	}
 	if (o && o->type == OBJ_COMMIT) {
-		struct commit *commit = (struct commit *)o;
-		int from_tag = starts_with(path, "refs/tags/");
-
+		commit = (struct commit *)o;
+		from_tag = starts_with(path, "refs/tags/");
 		if (taggerdate == TIME_MAX)
 			taggerdate = commit->date;
-		path = name_ref_abbrev(path, can_abbreviate_output);
-		name_rev(commit, path, taggerdate, from_tag, deref);
 	}
+
+	add_to_tip_table(oid, path, can_abbreviate_output, commit, taggerdate,
+			 from_tag, deref);
 	return 0;
 }

+static void name_tips(void)
+{
+	int i;
+
+	/*
+	 * Try to set better names first, so that worse ones spread
+	 * less.
+	 */
+	QSORT(tip_table.table, tip_table.nr, cmp_by_tag_and_age);
+	for (i = 0; i < tip_table.nr; i++) {
+		struct tip_table_entry *e = &tip_table.table[i];
+		if (e->commit) {
+			name_rev(e->commit, e->refname, e->taggerdate,
+				 e->from_tag, e->deref);
+		}
+	}
+}
+
 static const unsigned char *nth_tip_table_ent(size_t ix, void *table_)
 {
 	struct tip_table_entry *table = table_;
@@ -559,6 +602,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 			cutoff = TIME_MIN;
 	}
 	for_each_ref(name_ref, &data);
+	name_tips();

 	if (transform_stdin) {
 		char buffer[2048];
--
2.25.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/10] name-rev: sort tip names before applying
  2020-02-05 17:50 ` [PATCH 11/10] name-rev: sort tip names before applying René Scharfe
@ 2020-02-05 18:23   ` Junio C Hamano
  2020-02-05 18:55     ` René Scharfe
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2020-02-05 18:23 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, SZEDER Gábor, Martin Ågren,
	Derrick Stolee

René Scharfe <l.s.r@web.de> writes:

> Subject: Re: [PATCH 11/10] name-rev: sort tip names before applying

"before applying" what?

> name_ref() is called for each ref and checks if its a better name for

s/its/it's/

> the referenced commit.  If that's the case it remembers it and checks if

s/case/&,/

> a name based on it is better for its ancestors as well.  This in done in
> the the order for_each_ref() imposes on us.

s/the the/the/

> That might not be optimal.  If bad names happen to be encountered first
> (as defined by is_better_name()), names derived from them may spread to
> a lot of commits, only to be replaced by better names later.  Setting
> better names first can avoid that.
>
> is_better_name() prefers tags, short distances and old references.  The
> distance is a measure that we need to calculate for each candidate
> commit, but the other two properties are not dependent on the
> relationships of commits.  Sorting the refs by them should yield better
> performance than the essentially random order we currently use.

Good insight.

> And applying older references first should also help to reduce rework
> due to the fact that older commits have less ancestors than newer ones.

"older" in the sense of committer/tagger timestamp.  I wonder it
would further help if the commit-graph is available and give us
generation number.

> So add all details of names to the tip table first, then sort them
> to prefer tags and older references and then apply them in this order.
> Here's the performance as measures by hyperfine for the Linux repo
> before:
>
> Benchmark #1: ./git -C ../linux/ name-rev --all
>   Time (mean ± σ):     851.1 ms ±   4.5 ms    [User: 806.7 ms, System: 44.4 ms]
>   Range (min … max):   845.9 ms … 859.5 ms    10 runs
>
> ... and with this patch:
>
> Benchmark #1: ./git -C ../linux/ name-rev --all
>   Time (mean ± σ):     736.2 ms ±   8.7 ms    [User: 688.4 ms, System: 47.5 ms]
>   Range (min … max):   726.0 ms … 755.2 ms    10 runs
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  builtin/name-rev.c | 60 +++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 52 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 23a639ff30..a9dcd25e46 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -247,6 +247,10 @@ static struct tip_table {
>  	struct tip_table_entry {
>  		struct object_id oid;
>  		const char *refname;
> +		struct commit *commit;
> +		timestamp_t taggerdate;
> +		unsigned int from_tag:1;
> +		unsigned int deref:1;
>  	} *table;
>  	int nr;
>  	int alloc;
> @@ -254,13 +258,18 @@ static struct tip_table {
>  } tip_table;
>
>  static void add_to_tip_table(const struct object_id *oid, const char *refname,
> -			     int shorten_unambiguous)
> +			     int shorten_unambiguous, struct commit *commit,
> +			     timestamp_t taggerdate, int from_tag, int deref)
>  {
>  	refname = name_ref_abbrev(refname, shorten_unambiguous);
>
>  	ALLOC_GROW(tip_table.table, tip_table.nr + 1, tip_table.alloc);
>  	oidcpy(&tip_table.table[tip_table.nr].oid, oid);
>  	tip_table.table[tip_table.nr].refname = xstrdup(refname);
> +	tip_table.table[tip_table.nr].commit = commit;
> +	tip_table.table[tip_table.nr].taggerdate = taggerdate;
> +	tip_table.table[tip_table.nr].from_tag = from_tag;
> +	tip_table.table[tip_table.nr].deref = deref;
>  	tip_table.nr++;
>  	tip_table.sorted = 0;
>  }
> @@ -271,12 +280,30 @@ static int tipcmp(const void *a_, const void *b_)
>  	return oidcmp(&a->oid, &b->oid);
>  }
>
> +static int cmp_by_tag_and_age(const void *a_, const void *b_)
> +{
> +	const struct tip_table_entry *a = a_, *b = b_;
> +	int cmp;
> +
> +	/* Prefer tags. */
> +	cmp = b->from_tag - a->from_tag;

We end up with negative -1 if b is not from tag and a is from tag,
even though the from_tag field is unsigned, so this is not wrong
per-se but feels a bit subtle.

> +	if (cmp)
> +		return cmp;
> +
> +	/* Older is better. */
> +	if (a->taggerdate < b->taggerdate)
> +		return -1;

We are here if both are from tag or neither is from tag.  Mental
note: let's make sure that add_to_tip_table() is always called with
usable timestamp even for a commit that is not from tag.

> +	return a->taggerdate != b->taggerdate;

OK, we know a is either of the same age as or younger than b at this
point, so we would return 1 if a is not the same age as b.

>  static int name_ref(const char *path, const struct object_id *oid, int flags, void *cb_data)
>  {
>  	struct object *o = parse_object(the_repository, oid);
>  	struct name_ref_data *data = cb_data;
>  	int can_abbreviate_output = data->tags_only && data->name_only;
>  	int deref = 0;
> +	int from_tag = 0;
> +	struct commit *commit = NULL;
>  	timestamp_t taggerdate = TIME_MAX;
>
>  	if (data->tags_only && !starts_with(path, "refs/tags/"))
> @@ -325,8 +352,6 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  			return 0;
>  	}
>
> -	add_to_tip_table(oid, path, can_abbreviate_output);
> -
>  	while (o && o->type == OBJ_TAG) {
>  		struct tag *t = (struct tag *) o;
>  		if (!t->tagged)
> @@ -336,17 +361,35 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  		taggerdate = t->date;
>  	}
>  	if (o && o->type == OBJ_COMMIT) {
> -		struct commit *commit = (struct commit *)o;
> -		int from_tag = starts_with(path, "refs/tags/");
> -
> +		commit = (struct commit *)o;
> +		from_tag = starts_with(path, "refs/tags/");
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = commit->date;

OK, so a lightweight tag gets the commit date, while an annotated
tag gets the tagger date (while peeling in the loop just above this
part).

I never make an annotated tag long after creating the commit the tag
refers to myself (i.e. on the day I tag a release or a release
candidate, I know that the commit I am creating is what I will tag
even before creating the commit, and I tag the commit soon after
that), but I can imagine that in some other workflows the
maintainers may be tagging long after the commit was made, and more
importantly, the interval of time between comitting and tagging
might be different from release to release.  I wonder if it mimicks
the topological ordering better if always compare the timestamps of
underlying commits.

> -		path = name_ref_abbrev(path, can_abbreviate_output);
> -		name_rev(commit, path, taggerdate, from_tag, deref);
>  	}
> +
> +	add_to_tip_table(oid, path, can_abbreviate_output, commit, taggerdate,
> +			 from_tag, deref);
>  	return 0;
>  }
>
> +static void name_tips(void)
> +{
> +	int i;
> +
> +	/*
> +	 * Try to set better names first, so that worse ones spread
> +	 * less.
> +	 */
> +	QSORT(tip_table.table, tip_table.nr, cmp_by_tag_and_age);
> +	for (i = 0; i < tip_table.nr; i++) {
> +		struct tip_table_entry *e = &tip_table.table[i];
> +		if (e->commit) {

Sorry, I am confused.  I didn't see anything that clears
tip_table.table[i].commit in the code.

> +			name_rev(e->commit, e->refname, e->taggerdate,
> +				 e->from_tag, e->deref);
> +		}
> +	}
> +}
> +
>  static const unsigned char *nth_tip_table_ent(size_t ix, void *table_)
>  {
>  	struct tip_table_entry *table = table_;
> @@ -559,6 +602,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  			cutoff = TIME_MIN;
>  	}
>  	for_each_ref(name_ref, &data);
> +	name_tips();
>
>  	if (transform_stdin) {
>  		char buffer[2048];
> --
> 2.25.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 11/10] name-rev: sort tip names before applying
  2020-02-05 18:23   ` Junio C Hamano
@ 2020-02-05 18:55     ` René Scharfe
  0 siblings, 0 replies; 27+ messages in thread
From: René Scharfe @ 2020-02-05 18:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, SZEDER Gábor, Martin Ågren,
	Derrick Stolee

Am 05.02.20 um 19:23 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Subject: Re: [PATCH 11/10] name-rev: sort tip names before applying
>
> "before applying" what?

The tip name.  "Using" or "attaching to commits" might be better than
"applying" here.

>> name_ref() is called for each ref and checks if its a better name for
>
> s/its/it's/
>
>> the referenced commit.  If that's the case it remembers it and checks if
>
> s/case/&,/
>
>> a name based on it is better for its ancestors as well.  This in done in
>> the the order for_each_ref() imposes on us.
>
> s/the the/the/

Right.  I was just too excited..

>> And applying older references first should also help to reduce rework
>> due to the fact that older commits have less ancestors than newer ones.
>
> "older" in the sense of committer/tagger timestamp.  I wonder it
> would further help if the commit-graph is available and give us
> generation number.

To sort by generation number instead of by timestamp?  That would at
the very least help in repos whose timestamps are wonky (imported from
timeless version control system, or commits from a computer with a broken
clock).

>> So add all details of names to the tip table first, then sort them
>> to prefer tags and older references and then apply them in this order.
>> Here's the performance as measures by hyperfine for the Linux repo
>> before:
>>
>> Benchmark #1: ./git -C ../linux/ name-rev --all
>>   Time (mean ± σ):     851.1 ms ±   4.5 ms    [User: 806.7 ms, System: 44.4 ms]
>>   Range (min … max):   845.9 ms … 859.5 ms    10 runs
>>
>> ... and with this patch:
>>
>> Benchmark #1: ./git -C ../linux/ name-rev --all
>>   Time (mean ± σ):     736.2 ms ±   8.7 ms    [User: 688.4 ms, System: 47.5 ms]
>>   Range (min … max):   726.0 ms … 755.2 ms    10 runs
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  builtin/name-rev.c | 60 +++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 52 insertions(+), 8 deletions(-)
>>
>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>> index 23a639ff30..a9dcd25e46 100644
>> --- a/builtin/name-rev.c
>> +++ b/builtin/name-rev.c
>> @@ -247,6 +247,10 @@ static struct tip_table {
>>  	struct tip_table_entry {
>>  		struct object_id oid;
>>  		const char *refname;
>> +		struct commit *commit;
>> +		timestamp_t taggerdate;
>> +		unsigned int from_tag:1;
>> +		unsigned int deref:1;
>>  	} *table;
>>  	int nr;
>>  	int alloc;
>> @@ -254,13 +258,18 @@ static struct tip_table {
>>  } tip_table;
>>
>>  static void add_to_tip_table(const struct object_id *oid, const char *refname,
>> -			     int shorten_unambiguous)
>> +			     int shorten_unambiguous, struct commit *commit,
>> +			     timestamp_t taggerdate, int from_tag, int deref)
>>  {
>>  	refname = name_ref_abbrev(refname, shorten_unambiguous);
>>
>>  	ALLOC_GROW(tip_table.table, tip_table.nr + 1, tip_table.alloc);
>>  	oidcpy(&tip_table.table[tip_table.nr].oid, oid);
>>  	tip_table.table[tip_table.nr].refname = xstrdup(refname);
>> +	tip_table.table[tip_table.nr].commit = commit;
>> +	tip_table.table[tip_table.nr].taggerdate = taggerdate;
>> +	tip_table.table[tip_table.nr].from_tag = from_tag;
>> +	tip_table.table[tip_table.nr].deref = deref;
>>  	tip_table.nr++;
>>  	tip_table.sorted = 0;
>>  }
>> @@ -271,12 +280,30 @@ static int tipcmp(const void *a_, const void *b_)
>>  	return oidcmp(&a->oid, &b->oid);
>>  }
>>
>> +static int cmp_by_tag_and_age(const void *a_, const void *b_)
>> +{
>> +	const struct tip_table_entry *a = a_, *b = b_;
>> +	int cmp;
>> +
>> +	/* Prefer tags. */
>> +	cmp = b->from_tag - a->from_tag;
>
> We end up with negative -1 if b is not from tag and a is from tag,
> even though the from_tag field is unsigned, so this is not wrong
> per-se but feels a bit subtle.

Integer promotion -- I admit I looked it up just before sending the
patch to be sure C does the right thing here.  Comparing explicitly
would be more readable.

>
>> +	if (cmp)
>> +		return cmp;
>> +
>> +	/* Older is better. */
>> +	if (a->taggerdate < b->taggerdate)
>> +		return -1;
>
> We are here if both are from tag or neither is from tag.  Mental
> note: let's make sure that add_to_tip_table() is always called with
> usable timestamp even for a commit that is not from tag.
>
>> +	return a->taggerdate != b->taggerdate;
>
> OK, we know a is either of the same age as or younger than b at this
> point, so we would return 1 if a is not the same age as b.
>
>>  static int name_ref(const char *path, const struct object_id *oid, int flags, void *cb_data)
>>  {
>>  	struct object *o = parse_object(the_repository, oid);
>>  	struct name_ref_data *data = cb_data;
>>  	int can_abbreviate_output = data->tags_only && data->name_only;
>>  	int deref = 0;
>> +	int from_tag = 0;
>> +	struct commit *commit = NULL;

Please remember this line.

>>  	timestamp_t taggerdate = TIME_MAX;
>>
>>  	if (data->tags_only && !starts_with(path, "refs/tags/"))
>> @@ -325,8 +352,6 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>>  			return 0;
>>  	}
>>
>> -	add_to_tip_table(oid, path, can_abbreviate_output);
>> -
>>  	while (o && o->type == OBJ_TAG) {
>>  		struct tag *t = (struct tag *) o;
>>  		if (!t->tagged)
>> @@ -336,17 +361,35 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>>  		taggerdate = t->date;
>>  	}
>>  	if (o && o->type == OBJ_COMMIT) {
>> -		struct commit *commit = (struct commit *)o;
>> -		int from_tag = starts_with(path, "refs/tags/");
>> -
>> +		commit = (struct commit *)o;
>> +		from_tag = starts_with(path, "refs/tags/");
>>  		if (taggerdate == TIME_MAX)
>>  			taggerdate = commit->date;
>
> OK, so a lightweight tag gets the commit date, while an annotated
> tag gets the tagger date (while peeling in the loop just above this
> part).
>
> I never make an annotated tag long after creating the commit the tag
> refers to myself (i.e. on the day I tag a release or a release
> candidate, I know that the commit I am creating is what I will tag
> even before creating the commit, and I tag the commit soon after
> that), but I can imagine that in some other workflows the
> maintainers may be tagging long after the commit was made, and more
> importantly, the interval of time between comitting and tagging
> might be different from release to release.  I wonder if it mimicks
> the topological ordering better if always compare the timestamps of
> underlying commits.

I also don't understand why lightweight tags are preferred over tag
objects both by having from_tag set and by having an older timestamp.
In my mind an annotated tag should have more weight since it's harder
to create.

>
>> -		path = name_ref_abbrev(path, can_abbreviate_output);
>> -		name_rev(commit, path, taggerdate, from_tag, deref);
>>  	}
>> +
>> +	add_to_tip_table(oid, path, can_abbreviate_output, commit, taggerdate,
>> +			 from_tag, deref);
>>  	return 0;
>>  }
>>
>> +static void name_tips(void)
>> +{
>> +	int i;
>> +
>> +	/*
>> +	 * Try to set better names first, so that worse ones spread
>> +	 * less.
>> +	 */
>> +	QSORT(tip_table.table, tip_table.nr, cmp_by_tag_and_age);
>> +	for (i = 0; i < tip_table.nr; i++) {
>> +		struct tip_table_entry *e = &tip_table.table[i];
>> +		if (e->commit) {
>
> Sorry, I am confused.  I didn't see anything that clears
> tip_table.table[i].commit in the code.

The line we remembered initializes commit to NULL.  It stays NULL for
refs that don't resolve to commits.

>
>> +			name_rev(e->commit, e->refname, e->taggerdate,
>> +				 e->from_tag, e->deref);
>> +		}
>> +	}
>> +}
>> +
>>  static const unsigned char *nth_tip_table_ent(size_t ix, void *table_)
>>  {
>>  	struct tip_table_entry *table = table_;
>> @@ -559,6 +602,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>>  			cutoff = TIME_MIN;
>>  	}
>>  	for_each_ref(name_ref, &data);
>> +	name_tips();
>>
>>  	if (transform_stdin) {
>>  		char buffer[2048];
>> --
>> 2.25.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-02-05 18:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-04 21:12 [PATCH 00/10] name-rev: improve memory usage René Scharfe
2020-02-04 21:14 ` [PATCH 01/10] name-rev: rewrite create_or_update_name() René Scharfe
2020-02-05  2:00   ` Derrick Stolee
2020-02-05  2:35     ` Taylor Blau
2020-02-05 16:45   ` Andrei Rybak
2020-02-05 16:47     ` René Scharfe
2020-02-04 21:15 ` [PATCH 02/10] name-rev: remove unused typedef René Scharfe
2020-02-04 21:16 ` [PATCH 03/10] name-rev: respect const qualifier René Scharfe
2020-02-04 21:17 ` [PATCH 04/10] name-rev: don't leak path copy in name_ref() René Scharfe
2020-02-05 14:35   ` Derrick Stolee
2020-02-04 21:20 ` [PATCH 05/10] name-rev: don't _peek() in create_or_update_name() René Scharfe
2020-02-04 21:22 ` [PATCH 06/10] name-rev: put struct rev_name into commit slab René Scharfe
2020-02-04 21:23 ` [PATCH 07/10] name-rev: factor out get_parent_name() René Scharfe
2020-02-04 21:24 ` [PATCH 08/10] name-rev: pre-size buffer in get_parent_name() René Scharfe
2020-02-05  3:19   ` Derrick Stolee
2020-02-05 15:16     ` René Scharfe
2020-02-04 21:25 ` [PATCH 09/10] name-rev: generate name strings only if they are better René Scharfe
2020-02-05 15:11   ` Derrick Stolee
2020-02-05 15:50     ` René Scharfe
2020-02-04 21:26 ` [PATCH 10/10] name-rev: release unused name strings René Scharfe
2020-02-05 15:19   ` Derrick Stolee
2020-02-05  3:28 ` [PATCH 00/10] name-rev: improve memory usage Derrick Stolee
2020-02-05 15:20   ` Derrick Stolee
2020-02-05 17:19 ` [PATCH 01/10 RESEND AUTHOR FIXED] name-rev: rewrite create_or_update_name() René Scharfe
2020-02-05 17:50 ` [PATCH 11/10] name-rev: sort tip names before applying René Scharfe
2020-02-05 18:23   ` Junio C Hamano
2020-02-05 18:55     ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).