git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] khash: clarify that allocations never fail
@ 2021-07-03 10:05 René Scharfe
  2021-07-03 10:38 ` Jeff King
  2021-07-03 12:57 ` [PATCH v2] " René Scharfe
  0 siblings, 2 replies; 12+ messages in thread
From: René Scharfe @ 2021-07-03 10:05 UTC (permalink / raw)
  To: Git List; +Cc: Junio C Hamano, Eric Wong, Jeff King

We use our standard allocation functions and macros (xcalloc,
ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
the program on error, so code that's using them doesn't have to handle
allocation failures.  Make this behavior explicit by replacing the code
that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 khash.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/khash.h b/khash.h
index 21c2095216..84ff7230b6 100644
--- a/khash.h
+++ b/khash.h
@@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
 			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
 			else { /* hash table size to be changed (shrink or expand); rehash */ \
 				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
-				if (!new_flags) return -1;								\
+				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
 				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
 				if (h->n_buckets < new_n_buckets) {	/* expand */		\
 					REALLOC_ARRAY(h->keys, new_n_buckets); \
@@ -181,10 +181,10 @@ static const double __ac_HASH_UPPER = 0.77;
 		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
 			if (h->n_buckets > (h->size<<1)) {							\
 				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
-					*ret = -1; return h->n_buckets;						\
+					BUG("kh_resize_" #name " failed");					\
 				}														\
 			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
-				*ret = -1; return h->n_buckets;							\
+				BUG("kh_resize_" #name " failed");						\
 			}															\
 		} /* TODO: to implement automatically shrinking; resize() already support shrinking */ \
 		{																\
--
2.32.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-03 10:05 [PATCH] khash: clarify that allocations never fail René Scharfe
@ 2021-07-03 10:38 ` Jeff King
  2021-07-03 10:44   ` Jeff King
                     ` (2 more replies)
  2021-07-03 12:57 ` [PATCH v2] " René Scharfe
  1 sibling, 3 replies; 12+ messages in thread
From: Jeff King @ 2021-07-03 10:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Eric Wong

On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:

> We use our standard allocation functions and macros (xcalloc,
> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
> the program on error, so code that's using them doesn't have to handle
> allocation failures.  Make this behavior explicit by replacing the code
> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.

Seems like a good idea.

We're very sloppy about checking the "ret" field from kh_put_* for
errors (it's a tri-state for "already existed", "newly added", or
"error"). I think that's not a problem because as you show here, we
can't actually hit the error case. This makes that much more obvious.

Two nits if we wanted to go further:

> diff --git a/khash.h b/khash.h
> index 21c2095216..84ff7230b6 100644
> --- a/khash.h
> +++ b/khash.h
> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
> -				if (!new_flags) return -1;								\
> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\

I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
2016-02-22), but left the now-obsolete error-check.

But a few lines below...

>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>  					REALLOC_ARRAY(h->keys, new_n_buckets); \

These REALLOC_ARRAY() calls are in the same boat. You dropped the error
check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
of arrays, 2014-09-16).

Should we make the two match? I'd probably do so by making the former
match the latter, and just drop the conditional and BUG entirely.

> @@ -181,10 +181,10 @@ static const double __ac_HASH_UPPER = 0.77;
>  		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
>  			if (h->n_buckets > (h->size<<1)) {							\
>  				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
> -					*ret = -1; return h->n_buckets;						\
> +					BUG("kh_resize_" #name " failed");					\
>  				}														\
>  			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
> -				*ret = -1; return h->n_buckets;							\
> +				BUG("kh_resize_" #name " failed");						\

After the first hunk, does kh_resize_*() ever return anything but 0? If
not, can we drop its return entirely, making it more clear that it's not
expected to fail? Both for human readers, but also for the compiler
(which could then alert us at compile-time if we missed any error
cases).

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-03 10:38 ` Jeff King
@ 2021-07-03 10:44   ` Jeff King
  2021-07-03 11:35   ` Ævar Arnfjörð Bjarmason
  2021-07-03 12:57   ` René Scharfe
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2021-07-03 10:44 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List, Junio C Hamano, Eric Wong

On Sat, Jul 03, 2021 at 06:38:27AM -0400, Jeff King wrote:

> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
> 
> > We use our standard allocation functions and macros (xcalloc,
> > ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
> > the program on error, so code that's using them doesn't have to handle
> > allocation failures.  Make this behavior explicit by replacing the code
> > that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
> 
> Seems like a good idea.
> 
> We're very sloppy about checking the "ret" field from kh_put_* for
> errors (it's a tri-state for "already existed", "newly added", or
> "error"). I think that's not a problem because as you show here, we
> can't actually hit the error case. This makes that much more obvious.

Actually a quad-state, looking at the code (it distinguishes "in table
but deleted", though I don't think I've ever seen a case where that is
useful).

> Two nits if we wanted to go further:

In patch form (mostly because I was curious if I was missing any cases;
I'd probably squash it in with yours):

diff --git a/khash.h b/khash.h
index 84ff7230b6..fad486c966 100644
--- a/khash.h
+++ b/khash.h
@@ -74,7 +74,7 @@ static const double __ac_HASH_UPPER = 0.77;
 	void kh_destroy_##name(kh_##name##_t *h);					\
 	void kh_clear_##name(kh_##name##_t *h);						\
 	khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key); \
-	int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
+	void kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
 	khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret); \
 	void kh_del_##name(kh_##name##_t *h, khint_t x);
 
@@ -116,7 +116,7 @@ static const double __ac_HASH_UPPER = 0.77;
 			return __ac_iseither(h->flags, i)? h->n_buckets : i;		\
 		} else return 0;												\
 	}																	\
-	SCOPE int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
+	SCOPE void kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
 	{ /* This function uses 0.25*n_buckets bytes of working space instead of [sizeof(key_t+val_t)+.25]*n_buckets. */ \
 		khint32_t *new_flags = NULL;										\
 		khint_t j = 1;													\
@@ -173,18 +173,15 @@ static const double __ac_HASH_UPPER = 0.77;
 			h->n_occupied = h->size;									\
 			h->upper_bound = (khint_t)(h->n_buckets * __ac_HASH_UPPER + 0.5); \
 		}																\
-		return 0;														\
 	}																	\
 	SCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \
 	{																	\
 		khint_t x;														\
 		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
 			if (h->n_buckets > (h->size<<1)) {							\
-				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
-					BUG("kh_resize_" #name " failed");					\
-				}														\
-			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
-				BUG("kh_resize_" #name " failed");						\
+				kh_resize_##name(h, h->n_buckets - 1); /* clear "deleted" elements */ \
+			} else { \
+				kh_resize_##name(h, h->n_buckets + 1); /* expand the hash table */ \
 			}															\
 		} /* TODO: to implement automatically shrinking; resize() already support shrinking */ \
 		{																\

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-03 10:38 ` Jeff King
  2021-07-03 10:44   ` Jeff King
@ 2021-07-03 11:35   ` Ævar Arnfjörð Bjarmason
  2021-07-03 12:56     ` René Scharfe
  2021-07-03 12:57   ` René Scharfe
  2 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-03 11:35 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Git List, Junio C Hamano, Eric Wong


On Sat, Jul 03 2021, Jeff King wrote:

> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>
>> We use our standard allocation functions and macros (xcalloc,
>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>> the program on error, so code that's using them doesn't have to handle
>> allocation failures.  Make this behavior explicit by replacing the code
>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>
> Seems like a good idea.
>
> We're very sloppy about checking the "ret" field from kh_put_* for
> errors (it's a tri-state for "already existed", "newly added", or
> "error"). I think that's not a problem because as you show here, we
> can't actually hit the error case. This makes that much more obvious.
>
> Two nits if we wanted to go further:
>
>> diff --git a/khash.h b/khash.h
>> index 21c2095216..84ff7230b6 100644
>> --- a/khash.h
>> +++ b/khash.h
>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>> -				if (!new_flags) return -1;								\
>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>
> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
> 2016-02-22), but left the now-obsolete error-check.
>
> But a few lines below...
>
>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>
> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
> of arrays, 2014-09-16).
>
> Should we make the two match? I'd probably do so by making the former
> match the latter, and just drop the conditional and BUG entirely.

Yes, I don't see why we should be guarding theis anymore than we do
xmalloc() or other x*() functions in various places (which is what it
resolves to).

If anything we might consider renaming it via coccinelle to
XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
any errors themselves.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-03 11:35   ` Ævar Arnfjörð Bjarmason
@ 2021-07-03 12:56     ` René Scharfe
  2021-07-03 13:05       ` Ævar Arnfjörð Bjarmason
  2021-07-04  9:01       ` Jeff King
  0 siblings, 2 replies; 12+ messages in thread
From: René Scharfe @ 2021-07-03 12:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: Git List, Junio C Hamano, Eric Wong

Am 03.07.21 um 13:35 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Jul 03 2021, Jeff King wrote:
>
>> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>>
>>> We use our standard allocation functions and macros (xcalloc,
>>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>>> the program on error, so code that's using them doesn't have to handle
>>> allocation failures.  Make this behavior explicit by replacing the code
>>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>>
>> Seems like a good idea.
>>
>> We're very sloppy about checking the "ret" field from kh_put_* for
>> errors (it's a tri-state for "already existed", "newly added", or
>> "error"). I think that's not a problem because as you show here, we
>> can't actually hit the error case. This makes that much more obvious.
>>
>> Two nits if we wanted to go further:
>>
>>> diff --git a/khash.h b/khash.h
>>> index 21c2095216..84ff7230b6 100644
>>> --- a/khash.h
>>> +++ b/khash.h
>>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>>> -				if (!new_flags) return -1;								\
>>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>>
>> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
>> 2016-02-22), but left the now-obsolete error-check.
>>
>> But a few lines below...
>>
>>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>>
>> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
>> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
>> of arrays, 2014-09-16).
>>
>> Should we make the two match? I'd probably do so by making the former
>> match the latter, and just drop the conditional and BUG entirely.
>
> Yes, I don't see why we should be guarding theis anymore than we do
> xmalloc() or other x*() functions in various places (which is what it
> resolves to).

Agreed.

> If anything we might consider renaming it via coccinelle to
> XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
> any errors themselves.

I don't think there's any confusion in our internal code about the macros'
handling of allocation errors.

The following semantic patch finds a leery xmalloc() caller in
compat/mmap.c, though:

@@
expression PTR, SIZE, SIZE2;
@@
(
  PTR = xmalloc(SIZE);
|
  PTR = xcalloc(SIZE, SIZE2);
|
  PTR = xrealloc(SIZE);
|
  ALLOC_ARRAY(PTR, SIZE);
|
  CALLOC_ARRAY(PTR, SIZE);
|
  REALLOC_ARRAY(PTR, SIZE);
)
  if (
- PTR == NULL
+ 0
  ) {...}

René

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-03 10:38 ` Jeff King
  2021-07-03 10:44   ` Jeff King
  2021-07-03 11:35   ` Ævar Arnfjörð Bjarmason
@ 2021-07-03 12:57   ` René Scharfe
  2 siblings, 0 replies; 12+ messages in thread
From: René Scharfe @ 2021-07-03 12:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Junio C Hamano, Eric Wong

Am 03.07.21 um 12:38 schrieb Jeff King:
> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>
>> We use our standard allocation functions and macros (xcalloc,
>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>> the program on error, so code that's using them doesn't have to handle
>> allocation failures.  Make this behavior explicit by replacing the code
>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>
> Seems like a good idea.
>
> We're very sloppy about checking the "ret" field from kh_put_* for
> errors (it's a tri-state for "already existed", "newly added", or
> "error"). I think that's not a problem because as you show here, we
> can't actually hit the error case. This makes that much more obvious.
>
> Two nits if we wanted to go further:
>
>> diff --git a/khash.h b/khash.h
>> index 21c2095216..84ff7230b6 100644
>> --- a/khash.h
>> +++ b/khash.h
>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>> -				if (!new_flags) return -1;								\
>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>
> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
> 2016-02-22), but left the now-obsolete error-check.
>
> But a few lines below...
>
>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>
> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
> of arrays, 2014-09-16).
>
> Should we make the two match? I'd probably do so by making the former
> match the latter, and just drop the conditional and BUG entirely.

Yeah, makes sense, thank you.

>
>> @@ -181,10 +181,10 @@ static const double __ac_HASH_UPPER = 0.77;
>>  		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
>>  			if (h->n_buckets > (h->size<<1)) {							\
>>  				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
>> -					*ret = -1; return h->n_buckets;						\
>> +					BUG("kh_resize_" #name " failed");					\
>>  				}														\
>>  			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
>> -				*ret = -1; return h->n_buckets;							\
>> +				BUG("kh_resize_" #name " failed");						\
>
> After the first hunk, does kh_resize_*() ever return anything but 0? If
> not, can we drop its return entirely, making it more clear that it's not
> expected to fail? Both for human readers, but also for the compiler
> (which could then alert us at compile-time if we missed any error
> cases).

Good idea.  Both type of changes make syncing with upstream a bit
harder, but even though the return type change bleeds into the caller,
the overall change affects only a small area.

René

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] khash: clarify that allocations never fail
  2021-07-03 10:05 [PATCH] khash: clarify that allocations never fail René Scharfe
  2021-07-03 10:38 ` Jeff King
@ 2021-07-03 12:57 ` René Scharfe
  2021-07-04  9:05   ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: René Scharfe @ 2021-07-03 12:57 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason

We use our standard allocation functions and macros (xcalloc,
ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
the program on error instead, so code that's using them doesn't have to
handle allocation failures.  Make this behavior explicit by turning
kh_resize_ into a void function and removing the related unreachable
error handling code.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Changes since v1:
- Removed BUG calls.
- Made kh_resize_ void instead.

 khash.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/khash.h b/khash.h
index 21c2095216..cb79bf8856 100644
--- a/khash.h
+++ b/khash.h
@@ -74,7 +74,7 @@ static const double __ac_HASH_UPPER = 0.77;
 	void kh_destroy_##name(kh_##name##_t *h);					\
 	void kh_clear_##name(kh_##name##_t *h);						\
 	khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key); \
-	int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
+	void kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
 	khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret); \
 	void kh_del_##name(kh_##name##_t *h, khint_t x);

@@ -116,7 +116,7 @@ static const double __ac_HASH_UPPER = 0.77;
 			return __ac_iseither(h->flags, i)? h->n_buckets : i;		\
 		} else return 0;												\
 	}																	\
-	SCOPE int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
+	SCOPE void kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
 	{ /* This function uses 0.25*n_buckets bytes of working space instead of [sizeof(key_t+val_t)+.25]*n_buckets. */ \
 		khint32_t *new_flags = NULL;										\
 		khint_t j = 1;													\
@@ -126,7 +126,6 @@ static const double __ac_HASH_UPPER = 0.77;
 			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
 			else { /* hash table size to be changed (shrink or expand); rehash */ \
 				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
-				if (!new_flags) return -1;								\
 				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
 				if (h->n_buckets < new_n_buckets) {	/* expand */		\
 					REALLOC_ARRAY(h->keys, new_n_buckets); \
@@ -173,18 +172,15 @@ static const double __ac_HASH_UPPER = 0.77;
 			h->n_occupied = h->size;									\
 			h->upper_bound = (khint_t)(h->n_buckets * __ac_HASH_UPPER + 0.5); \
 		}																\
-		return 0;														\
 	}																	\
 	SCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \
 	{																	\
 		khint_t x;														\
 		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
 			if (h->n_buckets > (h->size<<1)) {							\
-				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
-					*ret = -1; return h->n_buckets;						\
-				}														\
-			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
-				*ret = -1; return h->n_buckets;							\
+				kh_resize_##name(h, h->n_buckets - 1); /* clear "deleted" elements */ \
+			} else { \
+				kh_resize_##name(h, h->n_buckets + 1); /* expand the hash table */ \
 			}															\
 		} /* TODO: to implement automatically shrinking; resize() already support shrinking */ \
 		{																\
--
2.32.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-03 12:56     ` René Scharfe
@ 2021-07-03 13:05       ` Ævar Arnfjörð Bjarmason
  2021-07-04  9:01       ` Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-03 13:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Git List, Junio C Hamano, Eric Wong


On Sat, Jul 03 2021, René Scharfe wrote:

> Am 03.07.21 um 13:35 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Jul 03 2021, Jeff King wrote:
>>
>>> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>>>
>>>> We use our standard allocation functions and macros (xcalloc,
>>>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>>>> the program on error, so code that's using them doesn't have to handle
>>>> allocation failures.  Make this behavior explicit by replacing the code
>>>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>>>
>>> Seems like a good idea.
>>>
>>> We're very sloppy about checking the "ret" field from kh_put_* for
>>> errors (it's a tri-state for "already existed", "newly added", or
>>> "error"). I think that's not a problem because as you show here, we
>>> can't actually hit the error case. This makes that much more obvious.
>>>
>>> Two nits if we wanted to go further:
>>>
>>>> diff --git a/khash.h b/khash.h
>>>> index 21c2095216..84ff7230b6 100644
>>>> --- a/khash.h
>>>> +++ b/khash.h
>>>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>>>> -				if (!new_flags) return -1;								\
>>>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>>>
>>> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
>>> 2016-02-22), but left the now-obsolete error-check.
>>>
>>> But a few lines below...
>>>
>>>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>>>
>>> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
>>> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
>>> of arrays, 2014-09-16).
>>>
>>> Should we make the two match? I'd probably do so by making the former
>>> match the latter, and just drop the conditional and BUG entirely.
>>
>> Yes, I don't see why we should be guarding theis anymore than we do
>> xmalloc() or other x*() functions in various places (which is what it
>> resolves to).
>
> Agreed.
>
>> If anything we might consider renaming it via coccinelle to
>> XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
>> any errors themselves.
>
> I don't think there's any confusion in our internal code about the macros'
> handling of allocation errors.
>
> The following semantic patch finds a leery xmalloc() caller in
> compat/mmap.c, though:
>
> @@
> expression PTR, SIZE, SIZE2;
> @@
> (
>   PTR = xmalloc(SIZE);
> |
>   PTR = xcalloc(SIZE, SIZE2);
> |
>   PTR = xrealloc(SIZE);
> |
>   ALLOC_ARRAY(PTR, SIZE);
> |
>   CALLOC_ARRAY(PTR, SIZE);
> |
>   REALLOC_ARRAY(PTR, SIZE);
> )
>   if (
> - PTR == NULL
> + 0
>   ) {...}
>
> René

Good catch, a bug as old as 730d48a2ef8 ([PATCH] If NO_MMAP is defined,
fake mmap() and munmap(), 2005-10-08).

It would be nice to have that coccinelle patch as a follow-up
patch. Perhaps along with changing that "0" to something that's simply a
syntax error, or just:

    if (0 /* always false due to (implicit?) x*() function call above */)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-03 12:56     ` René Scharfe
  2021-07-03 13:05       ` Ævar Arnfjörð Bjarmason
@ 2021-07-04  9:01       ` Jeff King
  2021-07-04  9:41         ` René Scharfe
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2021-07-04  9:01 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano,
	Eric Wong

On Sat, Jul 03, 2021 at 02:56:50PM +0200, René Scharfe wrote:

> > If anything we might consider renaming it via coccinelle to
> > XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
> > any errors themselves.
> 
> I don't think there's any confusion in our internal code about the macros'
> handling of allocation errors.

Agreed.

> The following semantic patch finds a leery xmalloc() caller in
> compat/mmap.c, though:
> 
> @@
> expression PTR, SIZE, SIZE2;
> @@
> (
>   PTR = xmalloc(SIZE);
> |
>   PTR = xcalloc(SIZE, SIZE2);
> |
>   PTR = xrealloc(SIZE);
> |
>   ALLOC_ARRAY(PTR, SIZE);
> |
>   CALLOC_ARRAY(PTR, SIZE);
> |
>   REALLOC_ARRAY(PTR, SIZE);
> )
>   if (
> - PTR == NULL
> + 0
>   ) {...}

IMHO that should not be using xmalloc() at all. It is a replacement for
system mmap, which can fail with ENOMEM, and we should be able to do the
same. Using xmalloc here is probably losing an opportunity to close
another pack window to free up memory for a new one.

I doubt it matters that much in practice (most systems are not even
compiling or using this code, and it would only matter in a tight memory
situation). But as a general rule, I think compat/ wrappers should
behave as much like (sensible) system equivalents as possible.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] khash: clarify that allocations never fail
  2021-07-03 12:57 ` [PATCH v2] " René Scharfe
@ 2021-07-04  9:05   ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2021-07-04  9:05 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Junio C Hamano, Eric Wong,
	Ævar Arnfjörð Bjarmason

On Sat, Jul 03, 2021 at 02:57:30PM +0200, René Scharfe wrote:

> We use our standard allocation functions and macros (xcalloc,
> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
> the program on error instead, so code that's using them doesn't have to
> handle allocation failures.  Make this behavior explicit by turning
> kh_resize_ into a void function and removing the related unreachable
> error handling code.

Thanks, this looks good to me.

>  	SCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \
>  	{																	\
>  		khint_t x;														\
>  		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
>  			if (h->n_buckets > (h->size<<1)) {							\
> -				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
> -					*ret = -1; return h->n_buckets;						\
> -				}														\
> -			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
> -				*ret = -1; return h->n_buckets;							\
> +				kh_resize_##name(h, h->n_buckets - 1); /* clear "deleted" elements */ \
> +			} else { \
> +				kh_resize_##name(h, h->n_buckets + 1); /* expand the hash table */ \
>  			}															\

I had to read this over again carefully, because the second "else-if"
makes the conversion less obvious. I think the original would have been
easier to read as:

  if (something) {
    if (do_option_one())
        ...error...
  } else {
    if (do_option_two())
        ...error...
  }

instead of:

  if (something) {
    if (do_option_one())
        ...error...
  } else if (do_option_two())
        ...error...
  }

All of which is to say the conversion here is correct, and I think as a
bonus the resulting code is easier to follow. ;)

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-04  9:01       ` Jeff King
@ 2021-07-04  9:41         ` René Scharfe
  2021-07-04 10:11           ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: René Scharfe @ 2021-07-04  9:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano,
	Eric Wong, Johannes Schindelin

Am 04.07.21 um 11:01 schrieb Jeff King:
> On Sat, Jul 03, 2021 at 02:56:50PM +0200, René Scharfe wrote:
>
>> The following semantic patch finds a leery xmalloc() caller in
>> compat/mmap.c, though:
>>
>> @@
>> expression PTR, SIZE, SIZE2;
>> @@
>> (
>>   PTR = xmalloc(SIZE);
>> |
>>   PTR = xcalloc(SIZE, SIZE2);
>> |
>>   PTR = xrealloc(SIZE);
>> |
>>   ALLOC_ARRAY(PTR, SIZE);
>> |
>>   CALLOC_ARRAY(PTR, SIZE);
>> |
>>   REALLOC_ARRAY(PTR, SIZE);
>> )
>>   if (
>> - PTR == NULL
>> + 0
>>   ) {...}
>

Btw. the found code is:

	start = xmalloc(length);
	if (start == NULL) {
		errno = ENOMEM;
		return MAP_FAILED;
	}

start cannot be NULL, so the check is dead code.

> IMHO that should not be using xmalloc() at all. It is a replacement for
> system mmap, which can fail with ENOMEM, and we should be able to do the
> same. Using xmalloc here is probably losing an opportunity to close
> another pack window to free up memory for a new one.

Do you mean using malloc(3) directly instead of xmalloc() would no
longer try to release pack windows?  xmalloc() hasn't done that anymore
since 9827d4c185 (packfile: drop release_pack_memory(), 2019-08-12).

xmalloc() still brings support for zero-sized allocations on platforms
whose malloc(3) doesn't like them, and it enforces GIT_ALLOC_LIMIT.
mmap() is supposed give up with EINVAL if the length is 0, so the
first feature is not actually helping.  And GIT_ALLOC_LIMIT is not
documented and only useful for testing, right?

> I doubt it matters that much in practice (most systems are not even
> compiling or using this code, and it would only matter in a tight memory
> situation). But as a general rule, I think compat/ wrappers should
> behave as much like (sensible) system equivalents as possible.

Makes sense.

René

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] khash: clarify that allocations never fail
  2021-07-04  9:41         ` René Scharfe
@ 2021-07-04 10:11           ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2021-07-04 10:11 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano,
	Eric Wong, Johannes Schindelin

On Sun, Jul 04, 2021 at 11:41:23AM +0200, René Scharfe wrote:

> Btw. the found code is:
> 
> 	start = xmalloc(length);
> 	if (start == NULL) {
> 		errno = ENOMEM;
> 		return MAP_FAILED;
> 	}
> 
> start cannot be NULL, so the check is dead code.

Yep, so it's doubly silly.

> > IMHO that should not be using xmalloc() at all. It is a replacement for
> > system mmap, which can fail with ENOMEM, and we should be able to do the
> > same. Using xmalloc here is probably losing an opportunity to close
> > another pack window to free up memory for a new one.
> 
> Do you mean using malloc(3) directly instead of xmalloc() would no
> longer try to release pack windows?  xmalloc() hasn't done that anymore
> since 9827d4c185 (packfile: drop release_pack_memory(), 2019-08-12).

No, I meant that by using xmalloc() here, if the allocation fails, we'll
immediately die(). Whereas the caller of mmap() could get the ENOMEM
error, then unmap an in-use pack window and try again.

However, I forgot that we don't actually do that (yet). We unmap windows
if we go over our own packed_git_window_size counter, but not when mmap
fails. Eric posted a patch recently to change that, though (at which
point the die() in xmalloc() would be working against us).

(Actually, we did _try_ to do something like that in xmmap prior to
9827d4c185, but I don't think it actually worked since it was based on
our own internal limit, and not what the OS would allow).

> xmalloc() still brings support for zero-sized allocations on platforms
> whose malloc(3) doesn't like them, and it enforces GIT_ALLOC_LIMIT.
> mmap() is supposed give up with EINVAL if the length is 0, so the
> first feature is not actually helping.  And GIT_ALLOC_LIMIT is not
> documented and only useful for testing, right?

Yeah, I think failing on a 0-length mmap is OK, since that's what real
systems do (and this is a wrapper). Our xmmap_gently() handles this,
which is the right spot.

I don't think of GIT_ALLOC_LIMIT as something we're committed to
publicly supporting, though I have (ab)used it once or twice myself.
However, I'm not sure if it makes sense here. True, on a system without
mmap it is a heap allocation, but to me it's conceptually very different
than a normal allocation (and on a system with mmap, we have no problem
at all requesting arbitrarily large maps). If we did want to put a limit
here, I think we'd want to do it at the xmmap layer, using a separate
variable.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-07-04 10:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-03 10:05 [PATCH] khash: clarify that allocations never fail René Scharfe
2021-07-03 10:38 ` Jeff King
2021-07-03 10:44   ` Jeff King
2021-07-03 11:35   ` Ævar Arnfjörð Bjarmason
2021-07-03 12:56     ` René Scharfe
2021-07-03 13:05       ` Ævar Arnfjörð Bjarmason
2021-07-04  9:01       ` Jeff King
2021-07-04  9:41         ` René Scharfe
2021-07-04 10:11           ` Jeff King
2021-07-03 12:57   ` René Scharfe
2021-07-03 12:57 ` [PATCH v2] " René Scharfe
2021-07-04  9:05   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).