git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Corrupt name-rev output
@ 2022-04-17 16:20 Thomas Hurst
  2022-04-20  3:13 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Hurst @ 2022-04-17 16:20 UTC (permalink / raw)
  To: git

I've noticed a series of about 20 commits in the HardenedBSD repository
fairly reliably produce garbage names from git name-rev - usually
fragments of another commit, sometimes unprintable nonsense.  Sometimes
it works just fine...

Here's a quick demo showing how to reproduce the problem:

% uname -mrs
FreeBSD 13.0-RELEASE-p11 amd64
% git --version
git version 2.35.2
% git clone --bare --mirror https://github.com/HardenedBSD/hardenedBSD.git
% cd hardenedBSD.git
% git rev-list --branches=\* |
  git name-rev --stdin --refs=heads/\* |
  egrep -v '^[0-9a-f]{40}( \([a-zA-Z0-9_/.^~-]+\))?$'
3eb67b534cab6a78b44b13e4323fd60353003089 (y:    marcel
MFC after:      3 days
Relnotes:       yes
Sponsored by:   ScaleEngine Inc.
Differential Revision:  https://reviews.freebsd.org/D3065
~3)
3ac660fc0c6eb0f876972e7e415c89f1ebed1939 (y:    marcel
MFC after:      3 days
Relnotes:       yes
Sponsored by:   ScaleEngine Inc.
Differential Revision:  https://reviews.freebsd.org/D3065
~4)
<continues>

Here's the result of another run, which includes all the affected
hashes - the name here presumably also has some non-printable garbage
not shown:

3eb67b534cab6a78b44b13e4323fd60353003089 (H~3)
3ac660fc0c6eb0f876972e7e415c89f1ebed1939 (H~4)
888a2df901616fb2900279c75580de3d4bc93278 (H~5)
58b9ad4f2e737ff922ff49ae564a32ee94c0ff6b (H~6)
59181d461b8d8c45144dc1856fa94da8a197b2ee (H~7)
9dd8627f2d53008b73ed93f92b71e24e8e319d93 (H~8)
3e1340e9928c7398551d3192a08acb678defc75e (H~9)
6ed06b88b912202db5b9d28b281ddbf4a1b0652a (H~10)
03b1b642f686502c128ac78ccf9dba7605f2aedf (H~11)
e7861fb764975a2c5f2f558b0800656679959624 (H~12)
4c28d4a51469dbb69bac67ec3b1c3a3d114ddf80 (H~13)
9ad0ed2485e40adce13b3900230640db66a38311 (H~14)
88dbdc409320e2785be4b1fcf0392999e3ab4374 (H~15)
c09997f49b7ca54f8d7305be57a037a294c32780 (H~16)
1f95262a4cda63a14016278f8ccc42c217858c36 (H~17)
50fec7a08d222bfd97499afa1c4000322c2280ff (H~18)
df3a215e38869117f8297e0910e08ae95eea5a59 (H~19)
cc81f4cf69d01cfc329827dbbc44da219bdddc05 (H~20)
5a7f7f242655f04a42e152d2c2f45819e9de4376 (H~21)
39cb585f2d4d4db10a282b06d9f7a9f07b658804 (H~22)
1264cc1d38cb1f1f03e34e395b33011108945638 (H~23)
98883cdf600f4aa3d5aba1d60248000132b6ad58 (H~24)
541df39729d32a70ba4da5184a8e82aaa3c87ae5 (H~25)
288fdc199a6936caab9d89d7deeb9b2bcf3b1f68 (H~26)

Passing these commits into name-rev as arguments finds them under
hardened/current/relro~199^2

git fsck --full does not reveal or fix anything, and the problem also
persists with a build from source from the next branch.

I was unable to reproduce on an Ubuntu machine with 2.32.0, so I used
that as a starting point for bisection and landed here:

  3656f842789d25d75da41c6c029470052a573b54
  name-rev: prefer shorter names over following merges

-- 
Thomas 'Freaky' Hurst
    http://hur.st/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Corrupt name-rev output
  2022-04-17 16:20 Corrupt name-rev output Thomas Hurst
@ 2022-04-20  3:13 ` Junio C Hamano
  2022-04-21  2:11   ` Elijah Newren
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2022-04-20  3:13 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Thomas Hurst

Thomas Hurst <tom@hur.st> writes:

> I've noticed a series of about 20 commits in the HardenedBSD repository
> fairly reliably produce garbage names from git name-rev - usually
> fragments of another commit, sometimes unprintable nonsense.  Sometimes
> it works just fine...
>
> Here's a quick demo showing how to reproduce the problem:
>
> % uname -mrs
> FreeBSD 13.0-RELEASE-p11 amd64
> % git --version
> git version 2.35.2
> % git clone --bare --mirror https://github.com/HardenedBSD/hardenedBSD.git
> % cd hardenedBSD.git
> % git rev-list --branches=\* |
>   git name-rev --stdin --refs=heads/\* |
>   egrep -v '^[0-9a-f]{40}( \([a-zA-Z0-9_/.^~-]+\))?$'
> 3eb67b534cab6a78b44b13e4323fd60353003089 (y:    marcel
> MFC after:      3 days
> Relnotes:       yes
> Sponsored by:   ScaleEngine Inc.
> Differential Revision:  https://reviews.freebsd.org/D3065
> ~3)
> 3ac660fc0c6eb0f876972e7e415c89f1ebed1939 (y:    marcel
> ...
> Passing these commits into name-rev as arguments finds them under
> hardened/current/relro~199^2
>
> git fsck --full does not reveal or fix anything, and the problem also
> persists with a build from source from the next branch.
>
> I was unable to reproduce on an Ubuntu machine with 2.32.0, so I used
> that as a starting point for bisection and landed here:
>
>   3656f842789d25d75da41c6c029470052a573b54
>   name-rev: prefer shorter names over following merges

commit 3656f842789d25d75da41c6c029470052a573b54
Author: Elijah Newren <newren@gmail.com>

Hmph, Elijah, does this ring a bell?

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Corrupt name-rev output
  2022-04-20  3:13 ` Junio C Hamano
@ 2022-04-21  2:11   ` Elijah Newren
  2022-04-21 17:55     ` René Scharfe
  0 siblings, 1 reply; 8+ messages in thread
From: Elijah Newren @ 2022-04-21  2:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Thomas Hurst, René Scharfe

On Tue, Apr 19, 2022 at 8:13 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Thomas Hurst <tom@hur.st> writes:
>
> > I've noticed a series of about 20 commits in the HardenedBSD repository
> > fairly reliably produce garbage names from git name-rev - usually
> > fragments of another commit, sometimes unprintable nonsense.  Sometimes
> > it works just fine...
> >
> > Here's a quick demo showing how to reproduce the problem:
> >
> > % uname -mrs
> > FreeBSD 13.0-RELEASE-p11 amd64
> > % git --version
> > git version 2.35.2
> > % git clone --bare --mirror https://github.com/HardenedBSD/hardenedBSD.git
> > % cd hardenedBSD.git
> > % git rev-list --branches=\* |
> >   git name-rev --stdin --refs=heads/\* |
> >   egrep -v '^[0-9a-f]{40}( \([a-zA-Z0-9_/.^~-]+\))?$'
> > 3eb67b534cab6a78b44b13e4323fd60353003089 (y:    marcel
> > MFC after:      3 days
> > Relnotes:       yes
> > Sponsored by:   ScaleEngine Inc.
> > Differential Revision:  https://reviews.freebsd.org/D3065
> > ~3)
> > 3ac660fc0c6eb0f876972e7e415c89f1ebed1939 (y:    marcel
> > ...
> > Passing these commits into name-rev as arguments finds them under
> > hardened/current/relro~199^2
> >
> > git fsck --full does not reveal or fix anything, and the problem also
> > persists with a build from source from the next branch.
> >
> > I was unable to reproduce on an Ubuntu machine with 2.32.0, so I used
> > that as a starting point for bisection and landed here:
> >
> >   3656f842789d25d75da41c6c029470052a573b54
> >   name-rev: prefer shorter names over following merges
>
> commit 3656f842789d25d75da41c6c029470052a573b54
> Author: Elijah Newren <newren@gmail.com>
>
> Hmph, Elijah, does this ring a bell?

After digging around last night and tonight, this appears to be a poor
interaction with commit 2d53975488 ("name-rev: release unused name
strings", 2020-02-04), which frees shared strings and relies on all
other users of that shared string to update their name, which
apparently seemed to rely on some intricacies of how the algorithm
worked that are no longer valid with my change, resulting in some
use-after-frees (though for some reason valgrind isn't spotting them
for me, which made it harder to track these down).

Reverting 2d53975488 fixes the problem.

Maybe this means we need to have tip_name be a string + a refcount, so
that we can know when we can safely free it?  Adding Rene to the cc
for comments.  Rene: If it helps, there's a slightly simpler
reproduction: clone the repo Thomas mentions, and then instead of his
"rev-list | name-rev | grep" sequence just run:

    git name-rev --refs=heads/\* 3eb67b534cab6a78b44b13e4323fd60353003089

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Corrupt name-rev output
  2022-04-21  2:11   ` Elijah Newren
@ 2022-04-21 17:55     ` René Scharfe
  2022-04-22 18:44       ` René Scharfe
  0 siblings, 1 reply; 8+ messages in thread
From: René Scharfe @ 2022-04-21 17:55 UTC (permalink / raw)
  To: Elijah Newren, Junio C Hamano; +Cc: Git Mailing List, Thomas Hurst

Am 21.04.22 um 04:11 schrieb Elijah Newren:
> On Tue, Apr 19, 2022 at 8:13 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Thomas Hurst <tom@hur.st> writes:
>>
>>> I've noticed a series of about 20 commits in the HardenedBSD repository
>>> fairly reliably produce garbage names from git name-rev - usually
>>> fragments of another commit, sometimes unprintable nonsense.  Sometimes
>>> it works just fine...
>>>
>>> Here's a quick demo showing how to reproduce the problem:
>>>
>>> % uname -mrs
>>> FreeBSD 13.0-RELEASE-p11 amd64
>>> % git --version
>>> git version 2.35.2
>>> % git clone --bare --mirror https://github.com/HardenedBSD/hardenedBSD.git
>>> % cd hardenedBSD.git
>>> % git rev-list --branches=\* |
>>>   git name-rev --stdin --refs=heads/\* |
>>>   egrep -v '^[0-9a-f]{40}( \([a-zA-Z0-9_/.^~-]+\))?$'
>>> 3eb67b534cab6a78b44b13e4323fd60353003089 (y:    marcel
>>> MFC after:      3 days
>>> Relnotes:       yes
>>> Sponsored by:   ScaleEngine Inc.
>>> Differential Revision:  https://reviews.freebsd.org/D3065
>>> ~3)
>>> 3ac660fc0c6eb0f876972e7e415c89f1ebed1939 (y:    marcel
>>> ...
>>> Passing these commits into name-rev as arguments finds them under
>>> hardened/current/relro~199^2
>>>
>>> git fsck --full does not reveal or fix anything, and the problem also
>>> persists with a build from source from the next branch.
>>>
>>> I was unable to reproduce on an Ubuntu machine with 2.32.0, so I used
>>> that as a starting point for bisection and landed here:
>>>
>>>   3656f842789d25d75da41c6c029470052a573b54
>>>   name-rev: prefer shorter names over following merges
>>
>> commit 3656f842789d25d75da41c6c029470052a573b54
>> Author: Elijah Newren <newren@gmail.com>
>>
>> Hmph, Elijah, does this ring a bell?
>
> After digging around last night and tonight, this appears to be a poor
> interaction with commit 2d53975488 ("name-rev: release unused name
> strings", 2020-02-04), which frees shared strings and relies on all
> other users of that shared string to update their name, which
> apparently seemed to rely on some intricacies of how the algorithm
> worked that are no longer valid with my change, resulting in some
> use-after-frees (though for some reason valgrind isn't spotting them
> for me, which made it harder to track these down).

So we need better test coverage.

The strings are freed for generation 0 because their new name was always
better for higher generations as well and thus would be replaced and not
looked at again.  I think that assumption doesn't hold anymore because
3656f84278 (name-rev: prefer shorter names over following merges,
2021-12-04) raised the bar for names to be accepted for generation > 0.

> Reverting 2d53975488 fixes the problem.

That's a good band-aid.  Reverting 3656f84278 would help as well, but
change the output.

> Maybe this means we need to have tip_name be a string + a refcount, so
> that we can know when we can safely free it?

Sure, at the cost of storing and maintaining the refcounts.

IIRC I also explored avoiding to append suffixes to candidate name
strings in get_parent_name() and instead storing a reference to the
parent.  Many tag names are shorter than the eight bytes of a pointer
and certainly shorter than an object ID, so while that would reduce the
number of strings to deal with (and thus the need to free them), it
probably would increases the overall memory usage.

>  Adding Rene to the cc
> for comments.  Rene: If it helps, there's a slightly simpler
> reproduction: clone the repo Thomas mentions, and then instead of his
> "rev-list | name-rev | grep" sequence just run:
>
>     git name-rev --refs=heads/\* 3eb67b534cab6a78b44b13e4323fd60353003089

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Corrupt name-rev output
  2022-04-21 17:55     ` René Scharfe
@ 2022-04-22 18:44       ` René Scharfe
  2022-04-23 16:47         ` Junio C Hamano
  2022-05-17 10:15         ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 8+ messages in thread
From: René Scharfe @ 2022-04-22 18:44 UTC (permalink / raw)
  To: Elijah Newren, Junio C Hamano; +Cc: Git Mailing List, Thomas Hurst

Am 21.04.22 um 19:55 schrieb René Scharfe:
> Am 21.04.22 um 04:11 schrieb Elijah Newren:
>
>> Reverting 2d53975488 fixes the problem.
>
> That's a good band-aid.
Or perhaps it's all we need.  I can't replicate the original reduction
of peak memory usage for the Chromium repo anymore.  In fact, the very
next commit, 079f970971 (name-rev: sort tip names before applying,
2020-02-05), reduced the number of times free(3) is called there from
44245 to 5, and 3656f84278 (name-rev: prefer shorter names over
following merges, 2021-12-04) brought that number down to zero.

I can't reproduce the issue with the hardenedBSD repo, by the way, but
e.g. with 'git name-rev 58b82150da' in the Linux repo.

--- >8 ---
Subject: [PATCH] Revert "name-rev: release unused name strings"

This reverts commit 2d53975488df195e1431c3f90bfb5b60018d5bf6.

3656f84278 (name-rev: prefer shorter names over following merges,
2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused
name strings, 2020-02-04) that a better name for a child is a better
name for all of its ancestors as well, because it added a penalty for
generation > 0.  This leads to strings being free(3)'d that are still
needed.

079f970971 (name-rev: sort tip names before applying, 2020-02-05)
already reduced the number of free(3) calls for the use case that
motivated the original patch (name-rev --all in the Chromium repository)
from ca. 44000 to 5, and 3656f84278 eliminated even those few.  So this
revert won't affect name-rev's performance on that particular repo.

Reported-by: Thomas Hurst <tom@hur.st>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/name-rev.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index c59b5699fe..02ea9d1633 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -18,7 +18,7 @@
 #define CUTOFF_DATE_SLOP 86400

 struct rev_name {
-	char *tip_name;
+	const char *tip_name;
 	timestamp_t taggerdate;
 	int generation;
 	int distance;
@@ -84,7 +84,7 @@ static int commit_is_before_cutoff(struct commit *commit)

 static int is_valid_rev_name(const struct rev_name *name)
 {
-	return name && (name->generation || name->tip_name);
+	return name && name->tip_name;
 }

 static struct rev_name *get_commit_rev_name(const struct commit *commit)
@@ -146,20 +146,9 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 {
 	struct rev_name *name = commit_rev_name_at(&rev_names, commit);

-	if (is_valid_rev_name(name)) {
-		if (!is_better_name(name, taggerdate, generation, distance, from_tag))
-			return NULL;
-
-		/*
-		 * This string might still be shared with ancestors
-		 * (generation > 0).  We can release it here regardless,
-		 * because the new name that has just won will be better
-		 * for them as well, so name_rev() will replace these
-		 * stale pointers when it processes the parents.
-		 */
-		if (!name->generation)
-			free(name->tip_name);
-	}
+	if (is_valid_rev_name(name) &&
+	    !is_better_name(name, taggerdate, generation, distance, from_tag))
+		return NULL;

 	name->taggerdate = taggerdate;
 	name->generation = generation;
--
2.35.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Corrupt name-rev output
  2022-04-22 18:44       ` René Scharfe
@ 2022-04-23 16:47         ` Junio C Hamano
  2022-05-17 10:15         ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2022-04-23 16:47 UTC (permalink / raw)
  To: René Scharfe; +Cc: Elijah Newren, Git Mailing List, Thomas Hurst

René Scharfe <l.s.r@web.de> writes:

> Subject: [PATCH] Revert "name-rev: release unused name strings"
>
> This reverts commit 2d53975488df195e1431c3f90bfb5b60018d5bf6.
>
> 3656f84278 (name-rev: prefer shorter names over following merges,
> 2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused
> name strings, 2020-02-04) that a better name for a child is a better
> name for all of its ancestors as well, because it added a penalty for
> generation > 0.  This leads to strings being free(3)'d that are still
> needed.

Nicely described.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Corrupt name-rev output
  2022-04-22 18:44       ` René Scharfe
  2022-04-23 16:47         ` Junio C Hamano
@ 2022-05-17 10:15         ` Ævar Arnfjörð Bjarmason
  2022-05-17 16:20           ` René Scharfe
  1 sibling, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 10:15 UTC (permalink / raw)
  To: René Scharfe
  Cc: Elijah Newren, Junio C Hamano, Git Mailing List, Thomas Hurst


On Fri, Apr 22 2022, René Scharfe wrote:

> Am 21.04.22 um 19:55 schrieb René Scharfe:
>> Am 21.04.22 um 04:11 schrieb Elijah Newren:
>>
>>> Reverting 2d53975488 fixes the problem.
>>
>> That's a good band-aid.
> Or perhaps it's all we need.  I can't replicate the original reduction
> of peak memory usage for the Chromium repo anymore.  In fact, the very
> next commit, 079f970971 (name-rev: sort tip names before applying,
> 2020-02-05), reduced the number of times free(3) is called there from
> 44245 to 5, and 3656f84278 (name-rev: prefer shorter names over
> following merges, 2021-12-04) brought that number down to zero.
>
> I can't reproduce the issue with the hardenedBSD repo, by the way, but
> e.g. with 'git name-rev 58b82150da' in the Linux repo.
>
> --- >8 ---
> Subject: [PATCH] Revert "name-rev: release unused name strings"
>
> This reverts commit 2d53975488df195e1431c3f90bfb5b60018d5bf6.
>
> 3656f84278 (name-rev: prefer shorter names over following merges,
> 2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused
> name strings, 2020-02-04) that a better name for a child is a better
> name for all of its ancestors as well, because it added a penalty for
> generation > 0.  This leads to strings being free(3)'d that are still
> needed.
>
> 079f970971 (name-rev: sort tip names before applying, 2020-02-05)
> already reduced the number of free(3) calls for the use case that
> motivated the original patch (name-rev --all in the Chromium repository)
> from ca. 44000 to 5, and 3656f84278 eliminated even those few.  So this
> revert won't affect name-rev's performance on that particular repo.
>
> Reported-by: Thomas Hurst <tom@hur.st>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  builtin/name-rev.c | 21 +++++----------------
>  1 file changed, 5 insertions(+), 16 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index c59b5699fe..02ea9d1633 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -18,7 +18,7 @@
>  #define CUTOFF_DATE_SLOP 86400
>
>  struct rev_name {
> -	char *tip_name;
> +	const char *tip_name;
>  	timestamp_t taggerdate;
>  	int generation;
>  	int distance;
> @@ -84,7 +84,7 @@ static int commit_is_before_cutoff(struct commit *commit)
>
>  static int is_valid_rev_name(const struct rev_name *name)
>  {
> -	return name && (name->generation || name->tip_name);
> +	return name && name->tip_name;
>  }
>
>  static struct rev_name *get_commit_rev_name(const struct commit *commit)
> @@ -146,20 +146,9 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>  {
>  	struct rev_name *name = commit_rev_name_at(&rev_names, commit);
>
> -	if (is_valid_rev_name(name)) {
> -		if (!is_better_name(name, taggerdate, generation, distance, from_tag))
> -			return NULL;
> -
> -		/*
> -		 * This string might still be shared with ancestors
> -		 * (generation > 0).  We can release it here regardless,
> -		 * because the new name that has just won will be better
> -		 * for them as well, so name_rev() will replace these
> -		 * stale pointers when it processes the parents.
> -		 */
> -		if (!name->generation)
> -			free(name->tip_name);
> -	}
> +	if (is_valid_rev_name(name) &&
> +	    !is_better_name(name, taggerdate, generation, distance, from_tag))
> +		return NULL;
>
>  	name->taggerdate = taggerdate;
>  	name->generation = generation;

I haven't dug into whether it's a false positive, but with this change
GCC's -fanalyzer has started complaining about a potential NULL
dereference:

    builtin/name-rev.c:230:50: error: dereference of NULL ‘name’ [CWE-476] [-Werror=analyzer-null-dereference]
      230 |                                 generation = name->generation + 1;

This "fixes" it, and passes all tests, but presumably a better fix
involves the callers of get_commit_rev_name() (or that function itself)
deciding if they're OK with NULL here earlier?:
	
	diff --git a/builtin/name-rev.c b/builtin/name-rev.c
	index 02ea9d16330..1d3a620ac72 100644
	--- a/builtin/name-rev.c
	+++ b/builtin/name-rev.c
	@@ -209,6 +209,7 @@ static void name_rev(struct commit *start_commit,
	 		struct rev_name *name = get_commit_rev_name(commit);
	 		struct commit_list *parents;
	 		int parent_number = 1;
	+		assert(name);
	 
	 		parents_to_queue_nr = 0;
	 
	

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Corrupt name-rev output
  2022-05-17 10:15         ` Ævar Arnfjörð Bjarmason
@ 2022-05-17 16:20           ` René Scharfe
  0 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2022-05-17 16:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren, Junio C Hamano, Git Mailing List, Thomas Hurst

Am 17.05.22 um 12:15 schrieb Ævar Arnfjörð Bjarmason:
>
> On Fri, Apr 22 2022, René Scharfe wrote:
>
>> Am 21.04.22 um 19:55 schrieb René Scharfe:
>>> Am 21.04.22 um 04:11 schrieb Elijah Newren:
>>>
>>>> Reverting 2d53975488 fixes the problem.
>>>
>>> That's a good band-aid.
>> Or perhaps it's all we need.  I can't replicate the original reduction
>> of peak memory usage for the Chromium repo anymore.  In fact, the very
>> next commit, 079f970971 (name-rev: sort tip names before applying,
>> 2020-02-05), reduced the number of times free(3) is called there from
>> 44245 to 5, and 3656f84278 (name-rev: prefer shorter names over
>> following merges, 2021-12-04) brought that number down to zero.
>>
>> I can't reproduce the issue with the hardenedBSD repo, by the way, but
>> e.g. with 'git name-rev 58b82150da' in the Linux repo.
>>
>> --- >8 ---
>> Subject: [PATCH] Revert "name-rev: release unused name strings"
>>
>> This reverts commit 2d53975488df195e1431c3f90bfb5b60018d5bf6.
>>
>> 3656f84278 (name-rev: prefer shorter names over following merges,
>> 2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused
>> name strings, 2020-02-04) that a better name for a child is a better
>> name for all of its ancestors as well, because it added a penalty for
>> generation > 0.  This leads to strings being free(3)'d that are still
>> needed.
>>
>> 079f970971 (name-rev: sort tip names before applying, 2020-02-05)
>> already reduced the number of free(3) calls for the use case that
>> motivated the original patch (name-rev --all in the Chromium repository)
>> from ca. 44000 to 5, and 3656f84278 eliminated even those few.  So this
>> revert won't affect name-rev's performance on that particular repo.
>>
>> Reported-by: Thomas Hurst <tom@hur.st>
>> Helped-by: Elijah Newren <newren@gmail.com>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  builtin/name-rev.c | 21 +++++----------------
>>  1 file changed, 5 insertions(+), 16 deletions(-)
>>
>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>> index c59b5699fe..02ea9d1633 100644
>> --- a/builtin/name-rev.c
>> +++ b/builtin/name-rev.c
>> @@ -18,7 +18,7 @@
>>  #define CUTOFF_DATE_SLOP 86400
>>
>>  struct rev_name {
>> -	char *tip_name;
>> +	const char *tip_name;
>>  	timestamp_t taggerdate;
>>  	int generation;
>>  	int distance;
>> @@ -84,7 +84,7 @@ static int commit_is_before_cutoff(struct commit *commit)
>>
>>  static int is_valid_rev_name(const struct rev_name *name)
>>  {
>> -	return name && (name->generation || name->tip_name);
>> +	return name && name->tip_name;
>>  }
>>
>>  static struct rev_name *get_commit_rev_name(const struct commit *commit)
>> @@ -146,20 +146,9 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>>  {
>>  	struct rev_name *name = commit_rev_name_at(&rev_names, commit);
>>
>> -	if (is_valid_rev_name(name)) {
>> -		if (!is_better_name(name, taggerdate, generation, distance, from_tag))
>> -			return NULL;
>> -
>> -		/*
>> -		 * This string might still be shared with ancestors
>> -		 * (generation > 0).  We can release it here regardless,
>> -		 * because the new name that has just won will be better
>> -		 * for them as well, so name_rev() will replace these
>> -		 * stale pointers when it processes the parents.
>> -		 */
>> -		if (!name->generation)
>> -			free(name->tip_name);
>> -	}
>> +	if (is_valid_rev_name(name) &&
>> +	    !is_better_name(name, taggerdate, generation, distance, from_tag))
>> +		return NULL;
>>
>>  	name->taggerdate = taggerdate;
>>  	name->generation = generation;
>
> I haven't dug into whether it's a false positive, but with this change
> GCC's -fanalyzer has started complaining about a potential NULL
> dereference:
>
>     builtin/name-rev.c:230:50: error: dereference of NULL ‘name’ [CWE-476] [-Werror=analyzer-null-dereference]
>       230 |                                 generation = name->generation + 1;
>
> This "fixes" it, and passes all tests, but presumably a better fix
> involves the callers of get_commit_rev_name() (or that function itself)
> deciding if they're OK with NULL here earlier?:
>
> 	diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> 	index 02ea9d16330..1d3a620ac72 100644
> 	--- a/builtin/name-rev.c
> 	+++ b/builtin/name-rev.c
> 	@@ -209,6 +209,7 @@ static void name_rev(struct commit *start_commit,
> 	 		struct rev_name *name = get_commit_rev_name(commit);
> 	 		struct commit_list *parents;
> 	 		int parent_number = 1;
> 	+		assert(name);
>
> 	 		parents_to_queue_nr = 0;
>
>

It's a false positive AFAICS because we set the name of the first commit
to go into the queue with create_or_update_name(), then go through its
parents and set their name as well before putting them into the queue to
traverse ancestors recursively.  I.e. everything we pull from the queue
must have a non-NULL name, right?

This begs the question why a free(3) call would make a difference to the
analyzer -- it doesn't affect NULL pointers after all.

Another good question is whether we should remove the validity check and
use commit_rev_name_peek() directly instead of get_commit_rev_name() in
the loop since we know it must be valid.  This might save a few cycles
and perhaps calm down the analyzer.

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-05-17 16:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-17 16:20 Corrupt name-rev output Thomas Hurst
2022-04-20  3:13 ` Junio C Hamano
2022-04-21  2:11   ` Elijah Newren
2022-04-21 17:55     ` René Scharfe
2022-04-22 18:44       ` René Scharfe
2022-04-23 16:47         ` Junio C Hamano
2022-05-17 10:15         ` Ævar Arnfjörð Bjarmason
2022-05-17 16:20           ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).