[PATCH] Properly align memory allocations and temporary buffers

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH] Properly align memory allocations and temporary buffers
@ 2022-01-05 13:23 Jessica Clarke
  2022-01-06 21:46 ` Taylor Blau
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-05 13:23 UTC (permalink / raw)
  To: git; +Cc: Jessica Clarke

Currently git_qsort_s allocates a buffer on the stack that has no
alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
alignment for any type.

On CHERI, and thus Arm's Morello prototype, pointers are implemented as
hardware capabilities which, as well as having a normal integer address,
have additional bounds, permissions and other metadata in a second word,
so on a 64-bit architecture they are 128-bit quantities, including their
alignment requirements. Despite being 128-bit, their integer component
is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
uintmax_t does not sufficiently align an allocation.

Moreover, these capabilities have an additional "129th" tag bit, which
tracks the validity of the capability and is cleared on any invalid
operation that doesn't trap (e.g. partially overwriting a capability
will invalidate it) which, combined with the architecture's strict
checks on capability manipulation instructions, ensures it is
architecturally impossible to construct a capability that gives more
rights than those you were given in the first place. To store these tag
bits, each capability sized and aligned word in memory gains a single
tag bit that is stored in unaddressable (to the processor) memory. This
means that it is impossible to store a capability at an unaligned
address: a normal load or store of a capability will always take an
alignment fault even if the (micro)architecture supports unaligned
loads/stores for other data types, and a memcpy will, if the destination
is not appropriately aligned, copy the byte representation but lose the
tag, meaning that if it is eventually copied back and loaded from an
aligned location any attempt to dereference it will trap with a tag
fault. Thus, even char buffers that are memcpy'ed to or from must be
properly aligned on CHERI architectures if they are to hold pointers.

Address both of these by introducing a new git_max_align type put in a
union with the on-stack buffer to force its alignment, as well as a new
GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
that gets used for mem_pool_alloc. As well as making the code work on
CHERI, the former change likely also improves performance on some
architectures by making memcpy faster (either because it can use larger
block sizes or because the microarchitecture has inefficient unaligned
accesses).

Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
---
 compat/qsort_s.c  | 11 +++++++----
 git-compat-util.h | 11 +++++++++++
 mem-pool.c        |  6 +++---
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/compat/qsort_s.c b/compat/qsort_s.c
index 52d1f0a73d..1ccdb87451 100644
--- a/compat/qsort_s.c
+++ b/compat/qsort_s.c
@@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
 		int (*cmp)(const void *, const void *, void *), void *ctx)
 {
 	const size_t size = st_mult(n, s);
-	char buf[1024];
+	union {
+		char buf[1024];
+		git_max_align align;
+	} u;

 	if (!n)
 		return 0;
 	if (!b || !cmp)
 		return -1;

-	if (size < sizeof(buf)) {
-		/* The temporary array fits on the small on-stack buffer. */
-		msort_with_tmp(b, n, s, cmp, buf, ctx);
+	if (size < sizeof(u.buf)) {
+		/* The temporary array fits in the small on-stack buffer. */
+		msort_with_tmp(b, n, s, cmp, u.buf, ctx);
 	} else {
 		/* It's somewhat large, so malloc it.  */
 		char *tmp = xmalloc(size);
diff --git a/git-compat-util.h b/git-compat-util.h
index 5fa54a7afe..28581a45c5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
 #define _ALL_SOURCE 1
 #endif

+typedef union {
+	uintmax_t max_align_uintmax;
+	void *max_align_pointer;
+} git_max_align;
+
+typedef struct {
+	char unalign;
+	git_max_align aligned;
+} git_max_alignment;
+#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)
+
 /* used on Mac OS X */
 #ifdef PRECOMPOSE_UNICODE
 #include "compat/precompose_utf8.h"
diff --git a/mem-pool.c b/mem-pool.c
index ccdcad2e3d..748eff925a 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
 	struct mp_block *p = NULL;
 	void *r;

-	/* round up to a 'uintmax_t' alignment */
-	if (len & (sizeof(uintmax_t) - 1))
-		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
+	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
+	if (len & (GIT_MAX_ALIGNMENT - 1))
+		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));

 	if (pool->mp_block &&
 	    pool->mp_block->end - pool->mp_block->next_free >= len)
-- 
2.33.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
@ 2022-01-06 21:46 ` Taylor Blau
  2022-01-06 21:56   ` Jessica Clarke
  2022-01-06 22:27   ` Junio C Hamano
  2022-01-06 23:22 ` brian m. carlson
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 32+ messages in thread
From: Taylor Blau @ 2022-01-06 21:46 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: René Scharfe, git

(+cc René as another possible reviewer)

On Wed, Jan 05, 2022 at 01:23:24PM +0000, Jessica Clarke wrote:
> Currently git_qsort_s allocates a buffer on the stack that has no
> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
> alignment for any type.
>
> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
> hardware capabilities which, as well as having a normal integer address,
> have additional bounds, permissions and other metadata in a second word,
> so on a 64-bit architecture they are 128-bit quantities, including their
> alignment requirements. Despite being 128-bit, their integer component
> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
> uintmax_t does not sufficiently align an allocation.
>
> Moreover, these capabilities have an additional "129th" tag bit, which
> tracks the validity of the capability and is cleared on any invalid
> operation that doesn't trap (e.g. partially overwriting a capability
> will invalidate it) which, combined with the architecture's strict
> checks on capability manipulation instructions, ensures it is
> architecturally impossible to construct a capability that gives more
> rights than those you were given in the first place. To store these tag
> bits, each capability sized and aligned word in memory gains a single
> tag bit that is stored in unaddressable (to the processor) memory. This
> means that it is impossible to store a capability at an unaligned
> address: a normal load or store of a capability will always take an
> alignment fault even if the (micro)architecture supports unaligned
> loads/stores for other data types, and a memcpy will, if the destination
> is not appropriately aligned, copy the byte representation but lose the
> tag, meaning that if it is eventually copied back and loaded from an
> aligned location any attempt to dereference it will trap with a tag
> fault. Thus, even char buffers that are memcpy'ed to or from must be
> properly aligned on CHERI architectures if they are to hold pointers.
>
> Address both of these by introducing a new git_max_align type put in a
> union with the on-stack buffer to force its alignment, as well as a new
> GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
> that gets used for mem_pool_alloc. As well as making the code work on
> CHERI, the former change likely also improves performance on some
> architectures by making memcpy faster (either because it can use larger
> block sizes or because the microarchitecture has inefficient unaligned
> accesses).
>
> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
> ---
>  compat/qsort_s.c  | 11 +++++++----
>  git-compat-util.h | 11 +++++++++++
>  mem-pool.c        |  6 +++---
>  3 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/compat/qsort_s.c b/compat/qsort_s.c
> index 52d1f0a73d..1ccdb87451 100644
> --- a/compat/qsort_s.c
> +++ b/compat/qsort_s.c
> @@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
>  		int (*cmp)(const void *, const void *, void *), void *ctx)
>  {
>  	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	union {
> +		char buf[1024];
> +		git_max_align align;
> +	} u;

I'm not sure I understand. Clearly this union aligns buf along the width
of git_max_align. But what about the preimage makes buf unaligned?

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5fa54a7afe..28581a45c5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
>  #define _ALL_SOURCE 1
>  #endif
>
> +typedef union {
> +	uintmax_t max_align_uintmax;
> +	void *max_align_pointer;
> +} git_max_align;

OK, the purpose of this union is to be as wide as the least common
alignment between uintmax_t and void *, yes?

> +
> +typedef struct {
> +	char unalign;
> +	git_max_align aligned;
> +} git_max_alignment;
> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)

...then the offset of the aligned field within the git_max_alignment
struct is going to be that common alignment? Could you not `#define
GIT_MAX_ALIGNMENT` to be `sizeof(git_max_align)` directly, or is there
something I'm missing?

I suppose the way you wrote it here is done in order to prevent padding
on the end of the git_max_align union from artificially increasing the
value of GIT_MAX_ALIGNMENT.

In any case, I *think* what you wrote here is right. The typedef's are
uncommon to our codebase, though. I wonder how much of this is all
necessary.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-06 21:46 ` Taylor Blau
@ 2022-01-06 21:56   ` Jessica Clarke
  2022-01-06 22:27   ` Junio C Hamano
  1 sibling, 0 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-06 21:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: René Scharfe, git

On 6 Jan 2022, at 21:46, Taylor Blau <me@ttaylorr.com> wrote:
> 
> (+cc René as another possible reviewer)
> 
> On Wed, Jan 05, 2022 at 01:23:24PM +0000, Jessica Clarke wrote:
>> Currently git_qsort_s allocates a buffer on the stack that has no
>> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
>> alignment for any type.
>> 
>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>> hardware capabilities which, as well as having a normal integer address,
>> have additional bounds, permissions and other metadata in a second word,
>> so on a 64-bit architecture they are 128-bit quantities, including their
>> alignment requirements. Despite being 128-bit, their integer component
>> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
>> uintmax_t does not sufficiently align an allocation.
>> 
>> Moreover, these capabilities have an additional "129th" tag bit, which
>> tracks the validity of the capability and is cleared on any invalid
>> operation that doesn't trap (e.g. partially overwriting a capability
>> will invalidate it) which, combined with the architecture's strict
>> checks on capability manipulation instructions, ensures it is
>> architecturally impossible to construct a capability that gives more
>> rights than those you were given in the first place. To store these tag
>> bits, each capability sized and aligned word in memory gains a single
>> tag bit that is stored in unaddressable (to the processor) memory. This
>> means that it is impossible to store a capability at an unaligned
>> address: a normal load or store of a capability will always take an
>> alignment fault even if the (micro)architecture supports unaligned
>> loads/stores for other data types, and a memcpy will, if the destination
>> is not appropriately aligned, copy the byte representation but lose the
>> tag, meaning that if it is eventually copied back and loaded from an
>> aligned location any attempt to dereference it will trap with a tag
>> fault. Thus, even char buffers that are memcpy'ed to or from must be
>> properly aligned on CHERI architectures if they are to hold pointers.
>> 
>> Address both of these by introducing a new git_max_align type put in a
>> union with the on-stack buffer to force its alignment, as well as a new
>> GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
>> that gets used for mem_pool_alloc. As well as making the code work on
>> CHERI, the former change likely also improves performance on some
>> architectures by making memcpy faster (either because it can use larger
>> block sizes or because the microarchitecture has inefficient unaligned
>> accesses).
>> 
>> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
>> ---
>> compat/qsort_s.c  | 11 +++++++----
>> git-compat-util.h | 11 +++++++++++
>> mem-pool.c        |  6 +++---
>> 3 files changed, 21 insertions(+), 7 deletions(-)
>> 
>> diff --git a/compat/qsort_s.c b/compat/qsort_s.c
>> index 52d1f0a73d..1ccdb87451 100644
>> --- a/compat/qsort_s.c
>> +++ b/compat/qsort_s.c
>> @@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
>> 		int (*cmp)(const void *, const void *, void *), void *ctx)
>> {
>> 	const size_t size = st_mult(n, s);
>> -	char buf[1024];
>> +	union {
>> +		char buf[1024];
>> +		git_max_align align;
>> +	} u;
> 
> I'm not sure I understand. Clearly this union aligns buf along the width
> of git_max_align. But what about the preimage makes buf unaligned?

It’s a char array, so it can have any alignment. Its address could be
0x10007. And it doesn’t align to the width of git_max_align, it aligns
to the alignment of git_max_align. Those don’t need to be the same, the
alignment just needs to be a factor of the size.

(Technically if git_max_align has a size > 1024 then it’d also make the
union bigger, but that’s clearly absurd for any real C implementation)

>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5fa54a7afe..28581a45c5 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
>> #define _ALL_SOURCE 1
>> #endif
>> 
>> +typedef union {
>> +	uintmax_t max_align_uintmax;
>> +	void *max_align_pointer;
>> +} git_max_align;
> 
> OK, the purpose of this union is to be as wide as the least common
> alignment between uintmax_t and void *, yes?

No, the purpose is for the union’s *alignment* to be the least common
alignment between uintmax_t and void *. The size doesn’t matter for
anything.

>> +
>> +typedef struct {
>> +	char unalign;
>> +	git_max_align aligned;
>> +} git_max_alignment;
>> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)
> 
> ...then the offset of the aligned field within the git_max_alignment
> struct is going to be that common alignment? Could you not `#define
> GIT_MAX_ALIGNMENT` to be `sizeof(git_max_align)` directly, or is there
> something I'm missing?

You could, but that would over-align in cases where the alignment of
git_max_align is smaller than its size. For example, uint32_t and
uint64_t only require 2-byte alignment on m68k. Using offsetof ensures
we actually query the thing we care about, the alignment, and not the
size, which is guaranteed to be a multiple of the alignment, but not
necessarily equal to it.

> I suppose the way you wrote it here is done in order to prevent padding
> on the end of the git_max_align union from artificially increasing the
> value of GIT_MAX_ALIGNMENT.

So long as all those types have a size that is a power of two there
shouldn’t actually be any padding in the union, though it might be
legal for a hostile compiler to introduce it anyway for fun.

> In any case, I *think* what you wrote here is right. The typedef's are
> uncommon to our codebase, though. I wonder how much of this is all
> necessary.

If you’re willing to risk overaligning and wasting space then you can
just use sizeof the union. If you want it to be precise then I don’t
think you can cut any of it out (otherwise I would have done...).

Jess


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-06 21:46 ` Taylor Blau
  2022-01-06 21:56   ` Jessica Clarke
@ 2022-01-06 22:27   ` Junio C Hamano
  2022-01-06 22:56     ` Jessica Clarke
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2022-01-06 22:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jessica Clarke, René Scharfe, git

Taylor Blau <me@ttaylorr.com> writes:

> (+cc René as another possible reviewer)
>
> On Wed, Jan 05, 2022 at 01:23:24PM +0000, Jessica Clarke wrote:
>> Currently git_qsort_s allocates a buffer on the stack that has no
>> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
>> alignment for any type.
>>
>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>> hardware capabilities which, as well as having a normal integer address,
>> have additional bounds, permissions and other metadata in a second word,
>> so on a 64-bit architecture they are 128-bit quantities, including their
>> alignment requirements. Despite being 128-bit, their integer component
>> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
>> uintmax_t does not sufficiently align an allocation.

Alignment aside, if uintmax_t is 64-bit but your pointer needs
128-bit to store, saving a pointer value in uintmax_t variable would
not work correctly, I presume, as casting the 64-bit integral type
back into pointer would not be sufficient to recover the lost
information that used to be in the second word.

So, does the architecture have 128-bit uintptr_t and that is a safe
type both from the point of view of alignment and from the point of
view of not losing information?  

If that type is larger than uintmax_t, something smells wrong,
though.  max is not max anymore at that point.

IIRC, uintptr_t is optional in C99, so a simpler solution to use the
larger type between uintptr_t and uintmax_t as a replacement for how
we use uintmax_t would not quite work out of the box X-<.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-06 22:27   ` Junio C Hamano
@ 2022-01-06 22:56     ` Jessica Clarke
  2022-01-07  0:10       ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jessica Clarke @ 2022-01-06 22:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, René Scharfe, git

On 6 Jan 2022, at 22:27, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Taylor Blau <me@ttaylorr.com> writes:
> 
>> (+cc René as another possible reviewer)
>> 
>> On Wed, Jan 05, 2022 at 01:23:24PM +0000, Jessica Clarke wrote:
>>> Currently git_qsort_s allocates a buffer on the stack that has no
>>> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
>>> alignment for any type.
>>> 
>>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>>> hardware capabilities which, as well as having a normal integer address,
>>> have additional bounds, permissions and other metadata in a second word,
>>> so on a 64-bit architecture they are 128-bit quantities, including their
>>> alignment requirements. Despite being 128-bit, their integer component
>>> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
>>> uintmax_t does not sufficiently align an allocation.
> 
> Alignment aside, if uintmax_t is 64-bit but your pointer needs
> 128-bit to store, saving a pointer value in uintmax_t variable would
> not work correctly, I presume, as casting the 64-bit integral type
> back into pointer would not be sufficient to recover the lost
> information that used to be in the second word.
> 
> So, does the architecture have 128-bit uintptr_t and that is a safe
> type both from the point of view of alignment and from the point of
> view of not losing information?  

Yes. It is basically just a char * that lets you perform arbitrary
arithmetic; not an unsigned long any more for our architectures.

> If that type is larger than uintmax_t, something smells wrong,
> though.  max is not max anymore at that point.
> 
> IIRC, uintptr_t is optional in C99, so a simpler solution to use the
> larger type between uintptr_t and uintmax_t as a replacement for how
> we use uintmax_t would not quite work out of the box X-<.

This is also true of uint128_t, it doesn’t fit in a uintmax_t either.
uintmax_t was a mistake as it becomes part of the ABI and can never be
revised even when new integer types come along. uintmax_t can hold any
valid address, but will strip the metadata. It turns out almost no
software tries to put a pointer in a uintmax_t and cast it back to a
pointer.

Note that, even if we wanted to, we coldn't just map uintmax_t to a
uintptr_t, as on 32-bit architectures uintmax_t is a 64-bit integer but
uintptr_t only has a 32-bit range (plus 32 bits of metadata), and we do
have a 32-bit CHERI variant for embedded use cases.

This is one of a few warts that people will just have to deal with for
CHERI; there’s no way round it if you want anything that implements the
ideas of CHERI.

Jess


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
  2022-01-06 21:46 ` Taylor Blau
@ 2022-01-06 23:22 ` brian m. carlson
  2022-01-06 23:31   ` Jessica Clarke
  2022-01-07 14:57 ` Philip Oakley
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: brian m. carlson @ 2022-01-06 23:22 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3521 bytes --]

On 2022-01-05 at 13:23:24, Jessica Clarke wrote:
> Currently git_qsort_s allocates a buffer on the stack that has no
> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
> alignment for any type.
> 
> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
> hardware capabilities which, as well as having a normal integer address,
> have additional bounds, permissions and other metadata in a second word,
> so on a 64-bit architecture they are 128-bit quantities, including their
> alignment requirements. Despite being 128-bit, their integer component
> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
> uintmax_t does not sufficiently align an allocation.
> 
> Moreover, these capabilities have an additional "129th" tag bit, which
> tracks the validity of the capability and is cleared on any invalid
> operation that doesn't trap (e.g. partially overwriting a capability
> will invalidate it) which, combined with the architecture's strict
> checks on capability manipulation instructions, ensures it is
> architecturally impossible to construct a capability that gives more
> rights than those you were given in the first place. To store these tag
> bits, each capability sized and aligned word in memory gains a single
> tag bit that is stored in unaddressable (to the processor) memory. This
> means that it is impossible to store a capability at an unaligned
> address: a normal load or store of a capability will always take an
> alignment fault even if the (micro)architecture supports unaligned
> loads/stores for other data types, and a memcpy will, if the destination
> is not appropriately aligned, copy the byte representation but lose the
> tag, meaning that if it is eventually copied back and loaded from an
> aligned location any attempt to dereference it will trap with a tag
> fault. Thus, even char buffers that are memcpy'ed to or from must be
> properly aligned on CHERI architectures if they are to hold pointers.

I think this is going to be a problem in a lot of places, not just in
Git.  I'm pretty sure that copying data this way is specifically allowed
by C and POSIX, and thus this approach is going to break a whole lot of
things.

For example, casting a void * to uintptr_t and back should produce two
pointers that compare equal.  The C standard says that two pointers
compare equal if they're both null, both point to the same object, or
one points one past the end of an array and the other happens to point
to the beginning of another object.  If the pointers aren't null and the
original one points to valid data, then the resulting pointer (after the
two casts) would point to the same object (since that's the only valid
option that's left), and therefore could be used to access it, but that
wouldn't necessarily work in this case.

The CHERI paper I'm reading also specifically says it is not changing
uintmax_t, which is a direct violation of the C standard.  If uintptr_t
must be larger than 64 bits, then so must uintmax_t, even if that
happens to be inconvenient (because it changes the ABI from the normal
system ABI).  It sounds like, in fact, that you can't actually provide
uintptr_t with the current architecture, because it can't be provided in
a standard-compliant way.

Is there something I'm missing here, or is it the case that CHERI's
behavior isn't compliant with the C standard?
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-06 23:22 ` brian m. carlson
@ 2022-01-06 23:31   ` Jessica Clarke
  0 siblings, 0 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-06 23:31 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

On 6 Jan 2022, at 23:22, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> 
> On 2022-01-05 at 13:23:24, Jessica Clarke wrote:
>> Currently git_qsort_s allocates a buffer on the stack that has no
>> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
>> alignment for any type.
>> 
>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>> hardware capabilities which, as well as having a normal integer address,
>> have additional bounds, permissions and other metadata in a second word,
>> so on a 64-bit architecture they are 128-bit quantities, including their
>> alignment requirements. Despite being 128-bit, their integer component
>> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
>> uintmax_t does not sufficiently align an allocation.
>> 
>> Moreover, these capabilities have an additional "129th" tag bit, which
>> tracks the validity of the capability and is cleared on any invalid
>> operation that doesn't trap (e.g. partially overwriting a capability
>> will invalidate it) which, combined with the architecture's strict
>> checks on capability manipulation instructions, ensures it is
>> architecturally impossible to construct a capability that gives more
>> rights than those you were given in the first place. To store these tag
>> bits, each capability sized and aligned word in memory gains a single
>> tag bit that is stored in unaddressable (to the processor) memory. This
>> means that it is impossible to store a capability at an unaligned
>> address: a normal load or store of a capability will always take an
>> alignment fault even if the (micro)architecture supports unaligned
>> loads/stores for other data types, and a memcpy will, if the destination
>> is not appropriately aligned, copy the byte representation but lose the
>> tag, meaning that if it is eventually copied back and loaded from an
>> aligned location any attempt to dereference it will trap with a tag
>> fault. Thus, even char buffers that are memcpy'ed to or from must be
>> properly aligned on CHERI architectures if they are to hold pointers.
> 
> I think this is going to be a problem in a lot of places, not just in
> Git.  I'm pretty sure that copying data this way is specifically allowed
> by C and POSIX, and thus this approach is going to break a whole lot of
> things.

The standard says you can copy anything to an unsigned char array to
get its object representation. This we support just fine. The standard
does not say you can copy that object representation back to an object
of the original type and get a valid object of the original type. We
only support that if the unsigned char array is aligned enough.

Technically git is already wrong here by using char not unsigned char,
though that doesn’t make a difference for CHERI, nor any implementation
I know of.

> For example, casting a void * to uintptr_t and back should produce two
> pointers that compare equal.  The C standard says that two pointers
> compare equal if they're both null, both point to the same object, or
> one points one past the end of an array and the other happens to point
> to the beginning of another object.  If the pointers aren't null and the
> original one points to valid data, then the resulting pointer (after the
> two casts) would point to the same object (since that's the only valid
> option that's left), and therefore could be used to access it, but that
> wouldn't necessarily work in this case.

All of this CHERI supports. Casting to uintptr_t and back works just fine.

> The CHERI paper I'm reading also specifically says it is not changing
> uintmax_t, which is a direct violation of the C standard.  If uintptr_t
> must be larger than 64 bits, then so must uintmax_t, even if that
> happens to be inconvenient (because it changes the ABI from the normal
> system ABI).  It sounds like, in fact, that you can't actually provide
> uintptr_t with the current architecture, because it can't be provided in
> a standard-compliant way.

Every C implementation that provides 128-bit integers violates this.
Nothing cares. As explained in my reply to Junio this thread it is not
possible to provide a uintmax_t that can hold a uintptr_t on CHERI in
general. So we don’t. And almost no software cares.

Not providing a uintptr_t breaks basically all systems code ever,
except the code that’s so old it assumes unsigned long is the right
type and is already broken for CHERI as a result.

> Is there something I'm missing here, or is it the case that CHERI's
> behavior isn't compliant with the C standard?

There are some cases where we’re slightly non-compliant. So is every
implementation that exists. We believe the few restrictions we add,
which can all be easily worked around, are vastly outweighed by the
benefit of having hardware that enforces fine-grained memory safety
efficiently, and the empirical evidence is that these restrictions are
not an issue for the vast majority of real-world C code.

Jess


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-06 22:56     ` Jessica Clarke
@ 2022-01-07  0:10       ` Junio C Hamano
  2022-01-07  0:22         ` Jessica Clarke
  2022-01-07  0:31         ` brian m. carlson
  0 siblings, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2022-01-07  0:10 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: Taylor Blau, René Scharfe, git

Jessica Clarke <jrtc27@jrtc27.com> writes:

> This is also true of uint128_t, it doesn’t fit in a uintmax_t either.

uintmax_t is supposed to be an unsigned integer type capable of
representing any value of any unsigned integer type, so if you have
128-bit unsigned integer, your uintmax_t should be at last that
wide, or your uintmax_t is not uintmax_t as far as C standard is
concerned, no?

uintptr_t is an unsigned integer type that any valid pointer to void
can be converted to this type, then converted back to pointer to
void, and the result will compare equal to the original pointer.  So
the value of uintptr_t cannot be represented by uintmax_t, there is
something wrong.

> uintmax_t was a mistake as it becomes part of the ABI and can never be
> revised even when new integer types come along. uintmax_t can hold any
> valid address, but will strip the metadata.

It is a flaw in the implementation of uintmax_t on the architecture
that needs "the metadata", no?  If the implementation supports a
notion of uintptr_t (i.e. there exists an unsigned integer type that
can safely go back and forth from pointer to void), an unsigned
integer type that is at least as wide as any unsigned integer type
should certainly be able to hold what would fit in uintptr_t, no?

Puzzled.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07  0:10       ` Junio C Hamano
@ 2022-01-07  0:22         ` Jessica Clarke
  2022-01-07  0:31         ` brian m. carlson
  1 sibling, 0 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-07  0:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, René Scharfe, git

On 7 Jan 2022, at 00:10, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Jessica Clarke <jrtc27@jrtc27.com> writes:
> 
>> This is also true of uint128_t, it doesn’t fit in a uintmax_t either.
> 
> uintmax_t is supposed to be an unsigned integer type capable of
> representing any value of any unsigned integer type, so if you have
> 128-bit unsigned integer, your uintmax_t should be at last that
> wide, or your uintmax_t is not uintmax_t as far as C standard is
> concerned, no?

Yes. Every 64-bit architecture implemented by GCC and Clang violates
this. This is why uintmax_t is a terrible idea, it gets baked into your
ABI and you can’t add new integer types. People decided for 128-bit
integers it was better to add them than let uintmax_t constrain them.
We take the same approach for CHERI of not caring about uintmax_t. If
you want to hold this against CHERI, go file bugs against GCC and Clang
for violating the standard on x86_64, aarch64, mips64, powerpc64,
s390x, sparc64, and so on.

> uintptr_t is an unsigned integer type that any valid pointer to void
> can be converted to this type, then converted back to pointer to
> void, and the result will compare equal to the original pointer.  So
> the value of uintptr_t cannot be represented by uintmax_t, there is
> something wrong.
> 
>> uintmax_t was a mistake as it becomes part of the ABI and can never be
>> revised even when new integer types come along. uintmax_t can hold any
>> valid address, but will strip the metadata.
> 
> It is a flaw in the implementation of uintmax_t on the architecture
> that needs "the metadata", no?  If the implementation supports a
> notion of uintptr_t (i.e. there exists an unsigned integer type that
> can safely go back and forth from pointer to void), an unsigned
> integer type that is at least as wide as any unsigned integer type
> should certainly be able to hold what would fit in uintptr_t, no?

If you want to get really language-lawyer-y about it, you can actually
argue that this is a compliant implementation of the C standard.
Integer types are permitted to have padding bits, and some combinations
of padding bits are allowed to be trap representations. Technically, in
our representation, the metadata bits are padding bits, because they do
not contribute to the precision like value bits. It is therefore the
case that the *value* of a uintptr_t still fits into a uintmax_t, but
the latter has no padding bits, and casting the latter to the former
yields a trap representation when further cast back to a pointer. This
may not the intent of the spec, and not how anyone thinks of it because
CHERI is the first implementation that pushes the boundary here, but
it’s technically legal under that interpretation. You may disagree with
the interpretation, and I don’t like to use it most of the time because
it’s complicated and involves yet more ill-defined parts of the spec
(e.g. it says arithmetic operations on valid values (they mean objects,
I assume, as the value only includes value bits, but the input could be
a trap representation on some implementations) never generate a trap
representation other than as part of an exceptional condition such as
an overflow, but nowhere defines what counts as an arithmetic
operation).

Jess

> Puzzled.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07  0:10       ` Junio C Hamano
  2022-01-07  0:22         ` Jessica Clarke
@ 2022-01-07  0:31         ` brian m. carlson
  2022-01-07  0:39           ` Jessica Clarke
  1 sibling, 1 reply; 32+ messages in thread
From: brian m. carlson @ 2022-01-07  0:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jessica Clarke, Taylor Blau, René Scharfe, git

[-- Attachment #1: Type: text/plain, Size: 1562 bytes --]

On 2022-01-07 at 00:10:21, Junio C Hamano wrote:
> Jessica Clarke <jrtc27@jrtc27.com> writes:
> 
> > This is also true of uint128_t, it doesn’t fit in a uintmax_t either.

I don't have this type from either GCC or Clang on my Debian amd64/sid
system.  I know those compilers support 128-bit values because Rust uses
them, but they are not exposed in standard C and therefore those
compilers appear to be compliant when run in standards mode.

> uintmax_t is supposed to be an unsigned integer type capable of
> representing any value of any unsigned integer type, so if you have
> 128-bit unsigned integer, your uintmax_t should be at last that
> wide, or your uintmax_t is not uintmax_t as far as C standard is
> concerned, no?
> 
> uintptr_t is an unsigned integer type that any valid pointer to void
> can be converted to this type, then converted back to pointer to
> void, and the result will compare equal to the original pointer.  So
> the value of uintptr_t cannot be represented by uintmax_t, there is
> something wrong.

Yes, this is the case.  The C standard mandates this behavior.

Now, as far as I'm aware, the standard does not mandate that that
uintmax_t have the strictest alignment requirements of any integral
type.  It could be that other types have stricter requirements, for
example, on some systems.  But it is required that conversion from void *
to uintptr_t to uintmax_t to uintptr_t to void * preserve the
functionality of the pointer.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07  0:31         ` brian m. carlson
@ 2022-01-07  0:39           ` Jessica Clarke
  2022-01-07  1:43             ` brian m. carlson
  0 siblings, 1 reply; 32+ messages in thread
From: Jessica Clarke @ 2022-01-07  0:39 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Taylor Blau, René Scharfe, git

On 7 Jan 2022, at 00:31, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> 
> On 2022-01-07 at 00:10:21, Junio C Hamano wrote:
>> Jessica Clarke <jrtc27@jrtc27.com> writes:
>> 
>>> This is also true of uint128_t, it doesn’t fit in a uintmax_t either.
> 
> I don't have this type from either GCC or Clang on my Debian amd64/sid
> system.  I know those compilers support 128-bit values because Rust uses
> them, but they are not exposed in standard C and therefore those
> compilers appear to be compliant when run in standards mode.

Ah, I should have said (un)(signed) __int128, there is no typedef for them even in GNU C.

__int128 exists as an integer type even with -std=c99, not just
-std=gnu99. -pedantic will warn that __int128 does not exist in ISO C,
but __int128__ still exists and works with no warnings even then. So it
very much exists in standards mode.

>> uintmax_t is supposed to be an unsigned integer type capable of
>> representing any value of any unsigned integer type, so if you have
>> 128-bit unsigned integer, your uintmax_t should be at last that
>> wide, or your uintmax_t is not uintmax_t as far as C standard is
>> concerned, no?
>> 
>> uintptr_t is an unsigned integer type that any valid pointer to void
>> can be converted to this type, then converted back to pointer to
>> void, and the result will compare equal to the original pointer.  So
>> the value of uintptr_t cannot be represented by uintmax_t, there is
>> something wrong.
> 
> Yes, this is the case.  The C standard mandates this behavior.
> 
> Now, as far as I'm aware, the standard does not mandate that that
> uintmax_t have the strictest alignment requirements of any integral
> type.  It could be that other types have stricter requirements, for
> example, on some systems.  But it is required that conversion from void *
> to uintptr_t to uintmax_t to uintptr_t to void * preserve the
> functionality of the pointer.

To quote what I said to Junio in a futher reply:

> If you want to get really language-lawyer-y about it, you can actually
> argue that this is a compliant implementation of the C standard.
> Integer types are permitted to have padding bits, and some combinations
> of padding bits are allowed to be trap representations. Technically, in
> our representation, the metadata bits are padding bits, because they do
> not contribute to the precision like value bits. It is therefore the
> case that the *value* of a uintptr_t still fits into a uintmax_t, but
> the latter has no padding bits, and casting the latter to the former
> yields a trap representation when further cast back to a pointer. This
> may not the intent of the spec, and not how anyone thinks of it because
> CHERI is the first implementation that pushes the boundary here, but
> it’s technically legal under that interpretation. You may disagree with
> the interpretation, and I don’t like to use it most of the time because
> it’s complicated and involves yet more ill-defined parts of the spec
> (e.g. it says arithmetic operations on valid values (they mean objects,
> I assume, as the value only includes value bits, but the input could be
> a trap representation on some implementations) never generate a trap
> representation other than as part of an exceptional condition such as
> an overflow, but nowhere defines what counts as an arithmetic
> operation).


So, no, C does not actually require what you say. It requires that void
* -> uintptr_t -> void * give you a valid pointer. It requires that
uintptr_t -> uintmax_t preserves the *value* of the uintptr_t, which we
do, because the value is formed from only the value bits which
contribute to the precision, which is 64 bits in this case, and
uintmax_t is still 64-bit. It requires that uintmax_t -> uintptr_t,
since uintptr_t’s precision is the same as uintmax_t’s, be always
valid, which is is. But it does not require that that uintptr_t has the
same representation as the original uintptr_t, which it does not for
us. And therefore it does not require that casting that uintptr_t back
to a void * yields a valid pointer. So if you want to really dig into
the details of the standard, we are technically compliant, even if some
might argue it’s not in the spirit of the standard.

Jess


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07  0:39           ` Jessica Clarke
@ 2022-01-07  1:43             ` brian m. carlson
  2022-01-07  2:08               ` Jessica Clarke
  2022-01-07 19:30               ` Junio C Hamano
  0 siblings, 2 replies; 32+ messages in thread
From: brian m. carlson @ 2022-01-07  1:43 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: Junio C Hamano, Taylor Blau, René Scharfe, git

[-- Attachment #1: Type: text/plain, Size: 4378 bytes --]

On 2022-01-07 at 00:39:59, Jessica Clarke wrote:
> On 7 Jan 2022, at 00:31, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> > If you want to get really language-lawyer-y about it, you can actually
> > argue that this is a compliant implementation of the C standard.
> > Integer types are permitted to have padding bits, and some combinations
> > of padding bits are allowed to be trap representations. Technically, in
> > our representation, the metadata bits are padding bits, because they do
> > not contribute to the precision like value bits. It is therefore the
> > case that the *value* of a uintptr_t still fits into a uintmax_t, but
> > the latter has no padding bits, and casting the latter to the former
> > yields a trap representation when further cast back to a pointer. This
> > may not the intent of the spec, and not how anyone thinks of it because
> > CHERI is the first implementation that pushes the boundary here, but
> > it’s technically legal under that interpretation. You may disagree with
> > the interpretation, and I don’t like to use it most of the time because
> > it’s complicated and involves yet more ill-defined parts of the spec
> > (e.g. it says arithmetic operations on valid values (they mean objects,
> > I assume, as the value only includes value bits, but the input could be
> > a trap representation on some implementations) never generate a trap
> > representation other than as part of an exceptional condition such as
> > an overflow, but nowhere defines what counts as an arithmetic
> > operation).
> 
> 
> So, no, C does not actually require what you say. It requires that void
> * -> uintptr_t -> void * give you a valid pointer. It requires that
> uintptr_t -> uintmax_t preserves the *value* of the uintptr_t, which we
> do, because the value is formed from only the value bits which
> contribute to the precision, which is 64 bits in this case, and
> uintmax_t is still 64-bit. It requires that uintmax_t -> uintptr_t,
> since uintptr_t’s precision is the same as uintmax_t’s, be always
> valid, which is is. But it does not require that that uintptr_t has the
> same representation as the original uintptr_t, which it does not for
> us. And therefore it does not require that casting that uintptr_t back
> to a void * yields a valid pointer. So if you want to really dig into
> the details of the standard, we are technically compliant, even if some
> might argue it’s not in the spirit of the standard.

Sure, implementations are allowed to have padding bits.  They're also
allowed, at the moment, to use signed-magnitude or ones' complement
integers, have CHAR_BIT greater than 8, have sizeof(char) ==
sizeof(short), not implement any of the customary sizes of intN_t or
uintN_t, not provide uintptr_t, and use middle-endian numbers.

However, if your ABI is only compliant in the face of those features
(especially when it could have been written in a way which would have
been compliant without the use of those features), it's intentionally
hostile to real-world developers, and I don't think we should support
it[0].  I'd be willing to revisit this if your ABI were defined in a
reasonable, sane way, where sizeof(uintmax_t) >= sizeof(uintptr_t),
without padding bits, where the alignment of pointers from malloc is
suitable for all types, and where the alignment of a type is no greater
than sizeof(type).

I'm not opposed to a small amount of finagling for this case, but I am
very much opposed to defining your C ABI in an intentionally difficult
way.  128-bit integers in 64-bit Linux were not originally part of the C
ABIs and if the ABI is ill defined now, that's a historical accident.
But this is a new ABI for a new architecture and it could have been
defined in a responsible way, but wasn't.

As an aside, I was actually going to point out that you could propose a
nice Rust or Go ABI with the status quo, but if your C ABI requires
padding bits, then you're probably going to have a hard time doing so,
since I don't believe those languages support padding bits and they need
to support the C ABI.

[0] For the record, I care strongly about portability, and I would not
accept a runtime having any of the qualities I mentioned in the first
paragraph.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07  1:43             ` brian m. carlson
@ 2022-01-07  2:08               ` Jessica Clarke
  2022-01-07  2:11                 ` Jessica Clarke
  2022-01-07 19:30               ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Jessica Clarke @ 2022-01-07  2:08 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Taylor Blau, René Scharfe, git

On 7 Jan 2022, at 01:43, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> 
> On 2022-01-07 at 00:39:59, Jessica Clarke wrote:
>> On 7 Jan 2022, at 00:31, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>>> If you want to get really language-lawyer-y about it, you can actually
>>> argue that this is a compliant implementation of the C standard.
>>> Integer types are permitted to have padding bits, and some combinations
>>> of padding bits are allowed to be trap representations. Technically, in
>>> our representation, the metadata bits are padding bits, because they do
>>> not contribute to the precision like value bits. It is therefore the
>>> case that the *value* of a uintptr_t still fits into a uintmax_t, but
>>> the latter has no padding bits, and casting the latter to the former
>>> yields a trap representation when further cast back to a pointer. This
>>> may not the intent of the spec, and not how anyone thinks of it because
>>> CHERI is the first implementation that pushes the boundary here, but
>>> it’s technically legal under that interpretation. You may disagree with
>>> the interpretation, and I don’t like to use it most of the time because
>>> it’s complicated and involves yet more ill-defined parts of the spec
>>> (e.g. it says arithmetic operations on valid values (they mean objects,
>>> I assume, as the value only includes value bits, but the input could be
>>> a trap representation on some implementations) never generate a trap
>>> representation other than as part of an exceptional condition such as
>>> an overflow, but nowhere defines what counts as an arithmetic
>>> operation).
>> 
>> 
>> So, no, C does not actually require what you say. It requires that void
>> * -> uintptr_t -> void * give you a valid pointer. It requires that
>> uintptr_t -> uintmax_t preserves the *value* of the uintptr_t, which we
>> do, because the value is formed from only the value bits which
>> contribute to the precision, which is 64 bits in this case, and
>> uintmax_t is still 64-bit. It requires that uintmax_t -> uintptr_t,
>> since uintptr_t’s precision is the same as uintmax_t’s, be always
>> valid, which is is. But it does not require that that uintptr_t has the
>> same representation as the original uintptr_t, which it does not for
>> us. And therefore it does not require that casting that uintptr_t back
>> to a void * yields a valid pointer. So if you want to really dig into
>> the details of the standard, we are technically compliant, even if some
>> might argue it’s not in the spirit of the standard.
> 
> Sure, implementations are allowed to have padding bits.  They're also
> allowed, at the moment, to use signed-magnitude or ones' complement
> integers, have CHAR_BIT greater than 8, have sizeof(char) ==
> sizeof(short), not implement any of the customary sizes of intN_t or
> uintN_t, not provide uintptr_t, and use middle-endian numbers.
> 
> However, if your ABI is only compliant in the face of those features
> (especially when it could have been written in a way which would have
> been compliant without the use of those features), it's intentionally
> hostile to real-world developers, and I don't think we should support
> it[0].  I'd be willing to revisit this if your ABI were defined in a
> reasonable, sane way, where sizeof(uintmax_t) >= sizeof(uintptr_t),
> without padding bits, where the alignment of pointers from malloc is
> suitable for all types, and where the alignment of a type is no greater
> than sizeof(type).
> 
> I'm not opposed to a small amount of finagling for this case, but I am
> very much opposed to defining your C ABI in an intentionally difficult
> way.  128-bit integers in 64-bit Linux were not originally part of the C
> ABIs and if the ABI is ill defined now, that's a historical accident.
> But this is a new ABI for a new architecture and it could have been
> defined in a responsible way, but wasn't.

It’s not a choice. It is not possible to define uintmax_t as being able
to safely round-trip any uintptr_t in a way that preserves pointer
validity unless you want to outlaw 64-bit integers on 32-bit CHERI
implementations, and good luck with that one, for reasons I have
previously explained, so if you do not wish to support this C
implementation then upstream Git will not work on CHERI/Arm’s Morello.
Nothing in this patch causes additional burden on other architectures,
in fact it likely improves performance on some due to increased buffer
alignment, so this seems a rather unnecessary line to take. We have all
of our fork of FreeBSD, kernel and userland, built and working on
CHERI, as well as X11 and the core of a KDE desktop (KWin, Plasma,
Kate, Okular, Gwenview, Konsole; we would have more but up until
recently we’ve needed to cross-compile everything and that’s often a
pain to get things building), plus WebKit’s JavaScriptCore complete
with a working JIT, so Git would be a notable omission in the stack,
and rather important for developers to actually get work done on their
boards.

Note that C requires us to define those bits as padding bits; the range
of an integer type is defined to be 0 to 2^N-1, where N is the number
of value bits, so it would be impossible to have any metadata in a
uintptr_t.

So I take issue with “in a responsible way”. CHERI has been in
development for over 10 years, it’s been done in collaboration with
Arm, with people intimately familiar with C semantics and with
extensive analysis of what real-world software expects of C in order to
come up with the most-compatible implementation we reasonably can
that’s still feasible to implement in hardware and provides the desired
protection properties. Declaring it broken/hostile/etc feels like
throwing the baby out with the bath water; for industry to move away
from the sorry state of affairs that we still see with C and C++ where
memory safety vulnerabilities are ubiquitous you cannot just restrict
new ABIs to the status quo, as you just won’t make any progress.

> As an aside, I was actually going to point out that you could propose a
> nice Rust or Go ABI with the status quo, but if your C ABI requires
> padding bits, then you're probably going to have a hard time doing so,
> since I don't believe those languages support padding bits and they need
> to support the C ABI.

There is no such restriction in Rust that I can see. It’s true you’d
need to slightly tweak Go’s specification as uintptr is currently an
integer type and n-bit integer types are defined to be n-bit two’s
complement values. I see no reason why that couldn’t be done though, I
doubt much cares. Rust only defines the representation of its
fixed-width integer types. There’s no reason you couldn’t add an
additional integer type for a capability, like we do for C. The only
issue with Rust is that it conflates uintptr_t with size_t, for which
there are various discussion threads with Rust maintainers and a path
forward for making it work if someone has the time to invest in making
Rust work. Languages can evolve; there’s no reason they can’t for CHERI
like they do for anything else they want to add, change, deprecate or
remove.

But this is all irrelevant to discussions of C.

Jess

> [0] For the record, I care strongly about portability, and I would not
> accept a runtime having any of the qualities I mentioned in the first
> paragraph.
> -- 
> brian m. carlson (he/him or they/them)
> Toronto, Ontario, CA

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07  2:08               ` Jessica Clarke
@ 2022-01-07  2:11                 ` Jessica Clarke
  0 siblings, 0 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-07  2:11 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, Taylor Blau, René Scharfe, git

On 7 Jan 2022, at 02:08, Jessica Clarke <jrtc27@jrtc27.com> wrote:
> 
> On 7 Jan 2022, at 01:43, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>> 
>> On 2022-01-07 at 00:39:59, Jessica Clarke wrote:
>>> On 7 Jan 2022, at 00:31, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>>>> If you want to get really language-lawyer-y about it, you can actually
>>>> argue that this is a compliant implementation of the C standard.
>>>> Integer types are permitted to have padding bits, and some combinations
>>>> of padding bits are allowed to be trap representations. Technically, in
>>>> our representation, the metadata bits are padding bits, because they do
>>>> not contribute to the precision like value bits. It is therefore the
>>>> case that the *value* of a uintptr_t still fits into a uintmax_t, but
>>>> the latter has no padding bits, and casting the latter to the former
>>>> yields a trap representation when further cast back to a pointer. This
>>>> may not the intent of the spec, and not how anyone thinks of it because
>>>> CHERI is the first implementation that pushes the boundary here, but
>>>> it’s technically legal under that interpretation. You may disagree with
>>>> the interpretation, and I don’t like to use it most of the time because
>>>> it’s complicated and involves yet more ill-defined parts of the spec
>>>> (e.g. it says arithmetic operations on valid values (they mean objects,
>>>> I assume, as the value only includes value bits, but the input could be
>>>> a trap representation on some implementations) never generate a trap
>>>> representation other than as part of an exceptional condition such as
>>>> an overflow, but nowhere defines what counts as an arithmetic
>>>> operation).
>>> 
>>> 
>>> So, no, C does not actually require what you say. It requires that void
>>> * -> uintptr_t -> void * give you a valid pointer. It requires that
>>> uintptr_t -> uintmax_t preserves the *value* of the uintptr_t, which we
>>> do, because the value is formed from only the value bits which
>>> contribute to the precision, which is 64 bits in this case, and
>>> uintmax_t is still 64-bit. It requires that uintmax_t -> uintptr_t,
>>> since uintptr_t’s precision is the same as uintmax_t’s, be always
>>> valid, which is is. But it does not require that that uintptr_t has the
>>> same representation as the original uintptr_t, which it does not for
>>> us. And therefore it does not require that casting that uintptr_t back
>>> to a void * yields a valid pointer. So if you want to really dig into
>>> the details of the standard, we are technically compliant, even if some
>>> might argue it’s not in the spirit of the standard.
>> 
>> Sure, implementations are allowed to have padding bits.  They're also
>> allowed, at the moment, to use signed-magnitude or ones' complement
>> integers, have CHAR_BIT greater than 8, have sizeof(char) ==
>> sizeof(short), not implement any of the customary sizes of intN_t or
>> uintN_t, not provide uintptr_t, and use middle-endian numbers.
>> 
>> However, if your ABI is only compliant in the face of those features
>> (especially when it could have been written in a way which would have
>> been compliant without the use of those features), it's intentionally
>> hostile to real-world developers, and I don't think we should support
>> it[0].  I'd be willing to revisit this if your ABI were defined in a
>> reasonable, sane way, where sizeof(uintmax_t) >= sizeof(uintptr_t),
>> without padding bits, where the alignment of pointers from malloc is
>> suitable for all types, and where the alignment of a type is no greater
>> than sizeof(type).
>> 
>> I'm not opposed to a small amount of finagling for this case, but I am
>> very much opposed to defining your C ABI in an intentionally difficult
>> way.  128-bit integers in 64-bit Linux were not originally part of the C
>> ABIs and if the ABI is ill defined now, that's a historical accident.
>> But this is a new ABI for a new architecture and it could have been
>> defined in a responsible way, but wasn't.
> 
> It’s not a choice. It is not possible to define uintmax_t as being able
> to safely round-trip any uintptr_t in a way that preserves pointer
> validity unless you want to outlaw 64-bit integers on 32-bit CHERI
> implementations, and good luck with that one, for reasons I have
> previously explained, so if you do not wish to support this C
> implementation then upstream Git will not work on CHERI/Arm’s Morello.
> Nothing in this patch causes additional burden on other architectures,
> in fact it likely improves performance on some due to increased buffer
> alignment, so this seems a rather unnecessary line to take. We have all
> of our fork of FreeBSD, kernel and userland, built and working on
> CHERI, as well as X11 and the core of a KDE desktop (KWin, Plasma,
> Kate, Okular, Gwenview, Konsole; we would have more but up until
> recently we’ve needed to cross-compile everything and that’s often a
> pain to get things building), plus WebKit’s JavaScriptCore complete
> with a working JIT, so Git would be a notable omission in the stack,
> and rather important for developers to actually get work done on their
> boards.
> 
> Note that C requires us to define those bits as padding bits; the range
> of an integer type is defined to be 0 to 2^N-1, where N is the number
> of value bits, so it would be impossible to have any metadata in a
> uintptr_t.
> 
> So I take issue with “in a responsible way”. CHERI has been in
> development for over 10 years, it’s been done in collaboration with
> Arm, with people intimately familiar with C semantics and with
> extensive analysis of what real-world software expects of C in order to
> come up with the most-compatible implementation we reasonably can
> that’s still feasible to implement in hardware and provides the desired
> protection properties. Declaring it broken/hostile/etc feels like
> throwing the baby out with the bath water; for industry to move away
> from the sorry state of affairs that we still see with C and C++ where
> memory safety vulnerabilities are ubiquitous you cannot just restrict
> new ABIs to the status quo, as you just won’t make any progress.
> 
>> As an aside, I was actually going to point out that you could propose a
>> nice Rust or Go ABI with the status quo, but if your C ABI requires
>> padding bits, then you're probably going to have a hard time doing so,
>> since I don't believe those languages support padding bits and they need
>> to support the C ABI.
> 
> There is no such restriction in Rust that I can see. It’s true you’d
> need to slightly tweak Go’s specification as uintptr is currently an
> integer type and n-bit integer types are defined to be n-bit two’s
> complement values. I see no reason why that couldn’t be done though, I
> doubt much cares. Rust only defines the representation of its
> fixed-width integer types. There’s no reason you couldn’t add an
> additional integer type for a capability, like we do for C. The only
> issue with Rust is that it conflates uintptr_t with size_t, for which
> there are various discussion threads with Rust maintainers and a path
> forward for making it work if someone has the time to invest in making
> Rust work. Languages can evolve; there’s no reason they can’t for CHERI
> like they do for anything else they want to add, change, deprecate or
> remove.
> 
> But this is all irrelevant to discussions of C.

Oh and:

> where the alignment of pointers from malloc is
> suitable for all types

This is true of CHERI.

> and where the alignment of a type is no greater
> than sizeof(type).

This is true of CHERI, and it’s impossible for that not to be true.
Otherwise arrays don’t work.

Jess


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
  2022-01-06 21:46 ` Taylor Blau
  2022-01-06 23:22 ` brian m. carlson
@ 2022-01-07 14:57 ` Philip Oakley
  2022-01-07 16:08 ` René Scharfe
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Philip Oakley @ 2022-01-07 14:57 UTC (permalink / raw)
  To: Jessica Clarke, git

On 05/01/2022 13:23, Jessica Clarke wrote:
> Currently git_qsort_s allocates a buffer on the stack that has no
> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
> alignment for any type.
>
> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
> hardware capabilities which, as well as having a normal integer address,
> have additional bounds, permissions and other metadata in a second word,
> so on a 64-bit architecture they are 128-bit quantities, including their
> alignment requirements. Despite being 128-bit, their integer component
> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
> uintmax_t does not sufficiently align an allocation.

I did find that the introduction paragraphs in this and its companion
patch needed a little more background of the uninitiated.

I felt it needed a few words to highlight the innovative, potentially
'breaking', changes that CHERI (&co) provides, such as the hidden bit,
the shift from classic von Neumann architecture, inclusion of typing
for  values, etc. and how that affects the compromises that are within
the C standard (and those that features misreadings and conflicts with it).

Some of the follow up discussions have a great similarity with the
Windows LLP64 (`long long`) issues that also confuse the arithmetic
operation capabilities with the memory pointer capabilities.

The layout needed a bit of SPIN (situation, problem, implication, need
payoff) to see how it hangs together across all the different
implementations. I know, from experience, it's not easy to pick out the
'obvious', but implicit, foundational aspects. While, explaining them
from an implementation perspective is easy for those who know the
implementation, I found it took a few reads to get my head around the
discussion (e.g. I wasn't familiar with `typedef union`s).

Improved memory safety does sound to be worthwhile!

Philip
> Moreover, these capabilities have an additional "129th" tag bit, which
> tracks the validity of the capability and is cleared on any invalid
> operation that doesn't trap (e.g. partially overwriting a capability
> will invalidate it) which, combined with the architecture's strict
> checks on capability manipulation instructions, ensures it is
> architecturally impossible to construct a capability that gives more
> rights than those you were given in the first place. To store these tag
> bits, each capability sized and aligned word in memory gains a single
> tag bit that is stored in unaddressable (to the processor) memory. This
> means that it is impossible to store a capability at an unaligned
> address: a normal load or store of a capability will always take an
> alignment fault even if the (micro)architecture supports unaligned
> loads/stores for other data types, and a memcpy will, if the destination
> is not appropriately aligned, copy the byte representation but lose the
> tag, meaning that if it is eventually copied back and loaded from an
> aligned location any attempt to dereference it will trap with a tag
> fault. Thus, even char buffers that are memcpy'ed to or from must be
> properly aligned on CHERI architectures if they are to hold pointers.
>
> Address both of these by introducing a new git_max_align type put in a
> union with the on-stack buffer to force its alignment, as well as a new
> GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
> that gets used for mem_pool_alloc. As well as making the code work on
> CHERI, the former change likely also improves performance on some
> architectures by making memcpy faster (either because it can use larger
> block sizes or because the microarchitecture has inefficient unaligned
> accesses).
>
> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
> ---
>  compat/qsort_s.c  | 11 +++++++----
>  git-compat-util.h | 11 +++++++++++
>  mem-pool.c        |  6 +++---
>  3 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/compat/qsort_s.c b/compat/qsort_s.c
> index 52d1f0a73d..1ccdb87451 100644
> --- a/compat/qsort_s.c
> +++ b/compat/qsort_s.c
> @@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
>  		int (*cmp)(const void *, const void *, void *), void *ctx)
>  {
>  	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	union {
> +		char buf[1024];
> +		git_max_align align;
> +	} u;
>  
>  	if (!n)
>  		return 0;
>  	if (!b || !cmp)
>  		return -1;
>  
> -	if (size < sizeof(buf)) {
> -		/* The temporary array fits on the small on-stack buffer. */
> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
> +	if (size < sizeof(u.buf)) {
> +		/* The temporary array fits in the small on-stack buffer. */
> +		msort_with_tmp(b, n, s, cmp, u.buf, ctx);
>  	} else {
>  		/* It's somewhat large, so malloc it.  */
>  		char *tmp = xmalloc(size);
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5fa54a7afe..28581a45c5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
>  #define _ALL_SOURCE 1
>  #endif
>  
> +typedef union {
> +	uintmax_t max_align_uintmax;
> +	void *max_align_pointer;
> +} git_max_align;
> +
> +typedef struct {
> +	char unalign;
> +	git_max_align aligned;
> +} git_max_alignment;
> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)
> +
>  /* used on Mac OS X */
>  #ifdef PRECOMPOSE_UNICODE
>  #include "compat/precompose_utf8.h"
> diff --git a/mem-pool.c b/mem-pool.c
> index ccdcad2e3d..748eff925a 100644
> --- a/mem-pool.c
> +++ b/mem-pool.c
> @@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>  	struct mp_block *p = NULL;
>  	void *r;
>  
> -	/* round up to a 'uintmax_t' alignment */
> -	if (len & (sizeof(uintmax_t) - 1))
> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
> +	if (len & (GIT_MAX_ALIGNMENT - 1))
> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
>  
>  	if (pool->mp_block &&
>  	    pool->mp_block->end - pool->mp_block->next_free >= len)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
                   ` (2 preceding siblings ...)
  2022-01-07 14:57 ` Philip Oakley
@ 2022-01-07 16:08 ` René Scharfe
  2022-01-07 16:21   ` Jessica Clarke
  2022-01-12 13:58 ` Jessica Clarke
  2022-01-23 15:24 ` [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types Jessica Clarke
  5 siblings, 1 reply; 32+ messages in thread
From: René Scharfe @ 2022-01-07 16:08 UTC (permalink / raw)
  To: Jessica Clarke, git

Am 05.01.22 um 14:23 schrieb Jessica Clarke:
> Currently git_qsort_s allocates a buffer on the stack that has no
> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
> alignment for any type.
>
> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
> hardware capabilities which, as well as having a normal integer address,
> have additional bounds, permissions and other metadata in a second word,
> so on a 64-bit architecture they are 128-bit quantities, including their
> alignment requirements. Despite being 128-bit, their integer component
> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
> uintmax_t does not sufficiently align an allocation.
>
> Moreover, these capabilities have an additional "129th" tag bit, which
> tracks the validity of the capability and is cleared on any invalid
> operation that doesn't trap (e.g. partially overwriting a capability
> will invalidate it) which, combined with the architecture's strict
> checks on capability manipulation instructions, ensures it is
> architecturally impossible to construct a capability that gives more
> rights than those you were given in the first place. To store these tag
> bits, each capability sized and aligned word in memory gains a single
> tag bit that is stored in unaddressable (to the processor) memory. This
> means that it is impossible to store a capability at an unaligned
> address: a normal load or store of a capability will always take an
> alignment fault even if the (micro)architecture supports unaligned
> loads/stores for other data types, and a memcpy will, if the destination
> is not appropriately aligned, copy the byte representation but lose the
> tag, meaning that if it is eventually copied back and loaded from an
> aligned location any attempt to dereference it will trap with a tag
> fault. Thus, even char buffers that are memcpy'ed to or from must be
> properly aligned on CHERI architectures if they are to hold pointers.
>
> Address both of these by introducing a new git_max_align type put in a
> union with the on-stack buffer to force its alignment, as well as a new
> GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
> that gets used for mem_pool_alloc. As well as making the code work on
> CHERI, the former change likely also improves performance on some
> architectures by making memcpy faster (either because it can use larger
> block sizes or because the microarchitecture has inefficient unaligned
> accesses).
>
> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
> ---
>  compat/qsort_s.c  | 11 +++++++----
>  git-compat-util.h | 11 +++++++++++
>  mem-pool.c        |  6 +++---
>  3 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/compat/qsort_s.c b/compat/qsort_s.c
> index 52d1f0a73d..1ccdb87451 100644
> --- a/compat/qsort_s.c
> +++ b/compat/qsort_s.c
> @@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
>  		int (*cmp)(const void *, const void *, void *), void *ctx)
>  {
>  	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	union {
> +		char buf[1024];
> +		git_max_align align;
> +	} u;
>
>  	if (!n)
>  		return 0;
>  	if (!b || !cmp)
>  		return -1;
>
> -	if (size < sizeof(buf)) {
> -		/* The temporary array fits on the small on-stack buffer. */
> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
> +	if (size < sizeof(u.buf)) {
> +		/* The temporary array fits in the small on-stack buffer. */
> +		msort_with_tmp(b, n, s, cmp, u.buf, ctx);

So buf gets maximum alignment instead of char alignment (i.e. none)
because some callers use it to sort pointers, which need that on your
platform.  Makes sense.

>  	} else {
>  		/* It's somewhat large, so malloc it.  */
>  		char *tmp = xmalloc(size);

tmp is used instead of buf if the latter is not big enough, so it can
also contain pointers.  No problem, because malloc(3) returns memory
that is properly aligned for anything already.


stable-qsort.c uses the same algorithm as compat/qsort_s.c, it just
lacks the context pointer.  Shouldn't it get the same treatment?  It
is used e.g. (via the macro STABLE_QSORT) in merge-ort.c to sort
pointers..

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5fa54a7afe..28581a45c5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
>  #define _ALL_SOURCE 1
>  #endif
>
> +typedef union {
> +	uintmax_t max_align_uintmax;
> +	void *max_align_pointer;
> +} git_max_align;

For your purposes just the void * member would suffice here, right?  And
with the added uintmax_t this currently gets maximum alignment, suitable
for any of our objects.  If we were to start using __int128 etc. then
we'd have to add that to this union as well to really get the maximum
possible alignment, though.

> +
> +typedef struct {
> +	char unalign;
> +	git_max_align aligned;
> +} git_max_alignment;
> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)

C11 has alignas, alignof and max_align_t.  We only recently started to
depend on some C99 features, so perhaps it's a bit early to use
stdalign.h in Git's code base.  That's a pity, though.  The
GIT_MAX_ALIGNMENT macro is sightly enough, but using a union to get
pointer alignment is a bit more cumbersome than something like

	alignas(alignof(max_align_t)) char buf[1024];

> +
>  /* used on Mac OS X */
>  #ifdef PRECOMPOSE_UNICODE
>  #include "compat/precompose_utf8.h"
> diff --git a/mem-pool.c b/mem-pool.c
> index ccdcad2e3d..748eff925a 100644
> --- a/mem-pool.c
> +++ b/mem-pool.c
> @@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>  	struct mp_block *p = NULL;
>  	void *r;
>
> -	/* round up to a 'uintmax_t' alignment */
> -	if (len & (sizeof(uintmax_t) - 1))
> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
> +	if (len & (GIT_MAX_ALIGNMENT - 1))
> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));

OK.

>
>  	if (pool->mp_block &&
>  	    pool->mp_block->end - pool->mp_block->next_free >= len)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07 16:08 ` René Scharfe
@ 2022-01-07 16:21   ` Jessica Clarke
  0 siblings, 0 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-07 16:21 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On 7 Jan 2022, at 16:08, René Scharfe <l.s.r@web.de> wrote:
> 
> Am 05.01.22 um 14:23 schrieb Jessica Clarke:
>> Currently git_qsort_s allocates a buffer on the stack that has no
>> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
>> alignment for any type.
>> 
>> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
>> hardware capabilities which, as well as having a normal integer address,
>> have additional bounds, permissions and other metadata in a second word,
>> so on a 64-bit architecture they are 128-bit quantities, including their
>> alignment requirements. Despite being 128-bit, their integer component
>> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
>> uintmax_t does not sufficiently align an allocation.
>> 
>> Moreover, these capabilities have an additional "129th" tag bit, which
>> tracks the validity of the capability and is cleared on any invalid
>> operation that doesn't trap (e.g. partially overwriting a capability
>> will invalidate it) which, combined with the architecture's strict
>> checks on capability manipulation instructions, ensures it is
>> architecturally impossible to construct a capability that gives more
>> rights than those you were given in the first place. To store these tag
>> bits, each capability sized and aligned word in memory gains a single
>> tag bit that is stored in unaddressable (to the processor) memory. This
>> means that it is impossible to store a capability at an unaligned
>> address: a normal load or store of a capability will always take an
>> alignment fault even if the (micro)architecture supports unaligned
>> loads/stores for other data types, and a memcpy will, if the destination
>> is not appropriately aligned, copy the byte representation but lose the
>> tag, meaning that if it is eventually copied back and loaded from an
>> aligned location any attempt to dereference it will trap with a tag
>> fault. Thus, even char buffers that are memcpy'ed to or from must be
>> properly aligned on CHERI architectures if they are to hold pointers.
>> 
>> Address both of these by introducing a new git_max_align type put in a
>> union with the on-stack buffer to force its alignment, as well as a new
>> GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
>> that gets used for mem_pool_alloc. As well as making the code work on
>> CHERI, the former change likely also improves performance on some
>> architectures by making memcpy faster (either because it can use larger
>> block sizes or because the microarchitecture has inefficient unaligned
>> accesses).
>> 
>> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
>> ---
>> compat/qsort_s.c  | 11 +++++++----
>> git-compat-util.h | 11 +++++++++++
>> mem-pool.c        |  6 +++---
>> 3 files changed, 21 insertions(+), 7 deletions(-)
>> 
>> diff --git a/compat/qsort_s.c b/compat/qsort_s.c
>> index 52d1f0a73d..1ccdb87451 100644
>> --- a/compat/qsort_s.c
>> +++ b/compat/qsort_s.c
>> @@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
>> 		int (*cmp)(const void *, const void *, void *), void *ctx)
>> {
>> 	const size_t size = st_mult(n, s);
>> -	char buf[1024];
>> +	union {
>> +		char buf[1024];
>> +		git_max_align align;
>> +	} u;
>> 
>> 	if (!n)
>> 		return 0;
>> 	if (!b || !cmp)
>> 		return -1;
>> 
>> -	if (size < sizeof(buf)) {
>> -		/* The temporary array fits on the small on-stack buffer. */
>> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
>> +	if (size < sizeof(u.buf)) {
>> +		/* The temporary array fits in the small on-stack buffer. */
>> +		msort_with_tmp(b, n, s, cmp, u.buf, ctx);
> 
> So buf gets maximum alignment instead of char alignment (i.e. none)
> because some callers use it to sort pointers, which need that on your
> platform.  Makes sense.

Yes, exactly.

>> 	} else {
>> 		/* It's somewhat large, so malloc it.  */
>> 		char *tmp = xmalloc(size);
> 
> tmp is used instead of buf if the latter is not big enough, so it can
> also contain pointers.  No problem, because malloc(3) returns memory
> that is properly aligned for anything already.

Yes.

> stable-qsort.c uses the same algorithm as compat/qsort_s.c, it just
> lacks the context pointer.  Shouldn't it get the same treatment?  It
> is used e.g. (via the macro STABLE_QSORT) in merge-ort.c to sort
> pointers..

Indeed it should; I guess git_stable_qsort just isn’t used for most
common commands, or at least not with structures containing pointers,
as I haven’t seen it break yet.

>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5fa54a7afe..28581a45c5 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
>> #define _ALL_SOURCE 1
>> #endif
>> 
>> +typedef union {
>> +	uintmax_t max_align_uintmax;
>> +	void *max_align_pointer;
>> +} git_max_align;
> 
> For your purposes just the void * member would suffice here, right?  And
> with the added uintmax_t this currently gets maximum alignment, suitable
> for any of our objects.  If we were to start using __int128 etc. then
> we'd have to add that to this union as well to really get the maximum
> possible alignment, though.

For us, yes, uintmax_t’s alignment is never greater than void *’s. On
most 32-bit non-CHERI architectures though that isn’t the case, so
whilst omitting uintmax_t would be fine for qsort, it would break
mem_pool_alloc’s alignment, as this gets used for both. Plus aligning
to uintmax_t may still help performance for memcpy there (e.g. maybe it
guarantees the memcpy implementation can use a load/store pair
instruction that it wouldn’t otherwise necessarily be able to).

Regarding __int128, yes, you would, though both GCC’s and FreeBSD’s
stddef.h don’t admit __int128 exists. They do include long double in
their max_align_t, though, which generally has the same alignment
requirements as __int128 on 64-bit architectures, though I have a funny
feeling it might not on some.

>> +
>> +typedef struct {
>> +	char unalign;
>> +	git_max_align aligned;
>> +} git_max_alignment;
>> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)
> 
> C11 has alignas, alignof and max_align_t.  We only recently started to
> depend on some C99 features, so perhaps it's a bit early to use
> stdalign.h in Git's code base.  That's a pity, though.  The
> GIT_MAX_ALIGNMENT macro is sightly enough, but using a union to get
> pointer alignment is a bit more cumbersome than something like
> 
> 	alignas(alignof(max_align_t)) char buf[1024];

It is; max_align_t is our preferred solution and what our programming
guide recommends, but not everyone is ready to jump to C11, so until
then various bodges like this are needed.

Jess

>> +
>> /* used on Mac OS X */
>> #ifdef PRECOMPOSE_UNICODE
>> #include "compat/precompose_utf8.h"
>> diff --git a/mem-pool.c b/mem-pool.c
>> index ccdcad2e3d..748eff925a 100644
>> --- a/mem-pool.c
>> +++ b/mem-pool.c
>> @@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>> 	struct mp_block *p = NULL;
>> 	void *r;
>> 
>> -	/* round up to a 'uintmax_t' alignment */
>> -	if (len & (sizeof(uintmax_t) - 1))
>> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
>> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
>> +	if (len & (GIT_MAX_ALIGNMENT - 1))
>> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
> 
> OK.
> 
>> 
>> 	if (pool->mp_block &&
>> 	    pool->mp_block->end - pool->mp_block->next_free >= len)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07  1:43             ` brian m. carlson
  2022-01-07  2:08               ` Jessica Clarke
@ 2022-01-07 19:30               ` Junio C Hamano
  2022-01-07 19:33                 ` Jessica Clarke
  2022-01-07 20:56                 ` René Scharfe
  1 sibling, 2 replies; 32+ messages in thread
From: Junio C Hamano @ 2022-01-07 19:30 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Jessica Clarke, Taylor Blau, René Scharfe, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I'm not opposed to a small amount of finagling for this case, but I am
> very much opposed to defining your C ABI in an intentionally difficult
> way.

I was looking at git_qsort_s() today and thought that its use of
"char buf[1k]" on stack is something we would write for a system
without alloca().

We already have xalloca() wrapper defined in the compat-util header.
Perhaps it is a good idea to use it instead?

Both the true alloca() and xmalloc() we use as a fallback (when
<alloca.h> is not available) are supposed to return a block of
memory that is suitably aligned for any use, no?  

 compat/qsort_s.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git c/compat/qsort_s.c w/compat/qsort_s.c
index 52d1f0a73d..63660a4304 100644
--- c/compat/qsort_s.c
+++ w/compat/qsort_s.c
@@ -49,21 +49,23 @@ int git_qsort_s(void *b, size_t n, size_t s,
 		int (*cmp)(const void *, const void *, void *), void *ctx)
 {
 	const size_t size = st_mult(n, s);
-	char buf[1024];
+	char *tmp;
+	int use_alloca;
 
 	if (!n)
 		return 0;
 	if (!b || !cmp)
 		return -1;
 
-	if (size < sizeof(buf)) {
-		/* The temporary array fits on the small on-stack buffer. */
-		msort_with_tmp(b, n, s, cmp, buf, ctx);
-	} else {
-		/* It's somewhat large, so malloc it.  */
-		char *tmp = xmalloc(size);
-		msort_with_tmp(b, n, s, cmp, tmp, ctx);
+	use_alloca = (size < 1024);
+
+	tmp = use_alloca ? xalloca(size) : xmalloc(size);
+
+	msort_with_tmp(b, n, s, cmp, tmp, ctx);
+
+	if (use_alloca)
+		xalloca_free(tmp);
+	else
 		free(tmp);
-	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07 19:30               ` Junio C Hamano
@ 2022-01-07 19:33                 ` Jessica Clarke
  2022-01-07 20:56                 ` René Scharfe
  1 sibling, 0 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-07 19:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, Taylor Blau, René Scharfe, git

On 7 Jan 2022, at 19:30, Junio C Hamano <gitster@pobox.com> wrote:
> 
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
>> I'm not opposed to a small amount of finagling for this case, but I am
>> very much opposed to defining your C ABI in an intentionally difficult
>> way.
> 
> I was looking at git_qsort_s() today and thought that its use of
> "char buf[1k]" on stack is something we would write for a system
> without alloca().
> 
> We already have xalloca() wrapper defined in the compat-util header.
> Perhaps it is a good idea to use it instead?
> 
> Both the true alloca() and xmalloc() we use as a fallback (when
> <alloca.h> is not available) are supposed to return a block of
> memory that is suitably aligned for any use, no?  

Yes; I initially considered using it, but didn’t in case the fact that
xalloca falls back on malloc when alloca doesn’t exist was considered a
regression. If that’s acceptable then xalloca avoids the issue entirely
and it’s just the mem pool allocator that needs some notion of max
alignment that isn’t just uintmax_t in order to support CHERI.

Jess

> compat/qsort_s.c | 20 +++++++++++---------
> 1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git c/compat/qsort_s.c w/compat/qsort_s.c
> index 52d1f0a73d..63660a4304 100644
> --- c/compat/qsort_s.c
> +++ w/compat/qsort_s.c
> @@ -49,21 +49,23 @@ int git_qsort_s(void *b, size_t n, size_t s,
> 		int (*cmp)(const void *, const void *, void *), void *ctx)
> {
> 	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	char *tmp;
> +	int use_alloca;
> 
> 	if (!n)
> 		return 0;
> 	if (!b || !cmp)
> 		return -1;
> 
> -	if (size < sizeof(buf)) {
> -		/* The temporary array fits on the small on-stack buffer. */
> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
> -	} else {
> -		/* It's somewhat large, so malloc it.  */
> -		char *tmp = xmalloc(size);
> -		msort_with_tmp(b, n, s, cmp, tmp, ctx);
> +	use_alloca = (size < 1024);
> +
> +	tmp = use_alloca ? xalloca(size) : xmalloc(size);
> +
> +	msort_with_tmp(b, n, s, cmp, tmp, ctx);
> +
> +	if (use_alloca)
> +		xalloca_free(tmp);
> +	else
> 		free(tmp);
> -	}
> 	return 0;
> }


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07 19:30               ` Junio C Hamano
  2022-01-07 19:33                 ` Jessica Clarke
@ 2022-01-07 20:56                 ` René Scharfe
  2022-01-07 21:30                   ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: René Scharfe @ 2022-01-07 20:56 UTC (permalink / raw)
  To: Junio C Hamano, brian m. carlson; +Cc: Jessica Clarke, Taylor Blau, git

Am 07.01.22 um 20:30 schrieb Junio C Hamano:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> I'm not opposed to a small amount of finagling for this case, but I am
>> very much opposed to defining your C ABI in an intentionally difficult
>> way.
>
> I was looking at git_qsort_s() today and thought that its use of
> "char buf[1k]" on stack is something we would write for a system
> without alloca().
>
> We already have xalloca() wrapper defined in the compat-util header.
> Perhaps it is a good idea to use it instead?
>
> Both the true alloca() and xmalloc() we use as a fallback (when
> <alloca.h> is not available) are supposed to return a block of
> memory that is suitably aligned for any use, no?

If falling back to malloc(3) is fine in general then I wonder if we can
just it use always.  This would save two branches and make buffer
management trivial here.  How much worse is malloc(3) on platforms with
alloca(3)?  Do we sort lots of short lists somewhere?  In other words:
Does this stack allocation optimization actually make a measurable
difference?

Heap fragmention should not be a concern here, at least, because the
pattern of requesting, using and releasing a single allocation won't
leave holes.

>
>  compat/qsort_s.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git c/compat/qsort_s.c w/compat/qsort_s.c
> index 52d1f0a73d..63660a4304 100644
> --- c/compat/qsort_s.c
> +++ w/compat/qsort_s.c
> @@ -49,21 +49,23 @@ int git_qsort_s(void *b, size_t n, size_t s,
>  		int (*cmp)(const void *, const void *, void *), void *ctx)
>  {
>  	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	char *tmp;
> +	int use_alloca;
>
>  	if (!n)
>  		return 0;
>  	if (!b || !cmp)
>  		return -1;
>
> -	if (size < sizeof(buf)) {
> -		/* The temporary array fits on the small on-stack buffer. */
> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
> -	} else {
> -		/* It's somewhat large, so malloc it.  */
> -		char *tmp = xmalloc(size);
> -		msort_with_tmp(b, n, s, cmp, tmp, ctx);
> +	use_alloca = (size < 1024);
> +
> +	tmp = use_alloca ? xalloca(size) : xmalloc(size);
> +
> +	msort_with_tmp(b, n, s, cmp, tmp, ctx);
> +
> +	if (use_alloca)
> +		xalloca_free(tmp);
> +	else
>  		free(tmp);
> -	}
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07 20:56                 ` René Scharfe
@ 2022-01-07 21:30                   ` Junio C Hamano
  2022-01-07 23:30                     ` René Scharfe
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2022-01-07 21:30 UTC (permalink / raw)
  To: René Scharfe; +Cc: brian m. carlson, Jessica Clarke, Taylor Blau, git

René Scharfe <l.s.r@web.de> writes:

> If falling back to malloc(3) is fine in general then I wonder if we can
> just it use always.  This would save two branches and make buffer
> management trivial here.  How much worse is malloc(3) on platforms with
> alloca(3)?  Do we sort lots of short lists somewhere?  In other words:
> Does this stack allocation optimization actually make a measurable
> difference?

Well all the preimage of this came from your 04ee8b87 (compat: add
qsort_s(), 2017-01-22), so you tell me ;-)

> Heap fragmention should not be a concern here, at least, because the
> pattern of requesting, using and releasing a single allocation won't
> leave holes.

Sure.  Even in a multi-threaded environment that should be true.

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
Subject: compat/qsort_s.c: avoid using potentially unaligned access

The compatibility definition for qsort_s() uses "char buffer[1024]"
on the stack to avoid making malloc() calls for small temporary
space, which essentially hand-rolls alloca().

But the elements of the array being sorted may have alignment needs
more strict than what an array of bytes may have. &buf[0] may be
word aligned, but using the address as if it stores the first
element of an array of a struct, whose first member may need to be
aligned on double-word boundary, would be a no-no.

We could use xalloca() from git-compat-util.h, or alloca() directly
on platforms with HAVE_ALLOCA_H, but let's try using unconditionally
xmalloc() before we know the performance characteristics of the
callers.

It may not make much of an argument to inspect the current callers
and say "it shouldn't matter to any of them", but anyway:

 * The one in object-name.c is used to sort potential matches to a
   given ambiguous object name prefix in the error path;

 * The one in pack-write.c is done once per a pack .idx file being
   written to create the reverse index, so (1) the cost of malloc()
   overhead is dwarfed by the cost of the packing operation, and (2)
   the number of entries being sorted is the number of objects in a
   pack;

 * The one in ref-filter.c is used by "branch --list", "tag --list",
   and "for-each-ref", only once per operation.  We sort an array of
   pointers with entries, each corresponding to a ref that is shown.

 * The one in string-list.c is used by sort_string_list(), which is
   way too generic to assume any access patterns, so it may or may
   not matter, but I do not care too much ;-)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 compat/qsort_s.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git c/compat/qsort_s.c w/compat/qsort_s.c
index 52d1f0a73d..0f7ff30f5f 100644
--- c/compat/qsort_s.c
+++ w/compat/qsort_s.c
@@ -49,21 +49,15 @@ int git_qsort_s(void *b, size_t n, size_t s,
 		int (*cmp)(const void *, const void *, void *), void *ctx)
 {
 	const size_t size = st_mult(n, s);
-	char buf[1024];
+	char *tmp;
 
 	if (!n)
 		return 0;
 	if (!b || !cmp)
 		return -1;
 
-	if (size < sizeof(buf)) {
-		/* The temporary array fits on the small on-stack buffer. */
-		msort_with_tmp(b, n, s, cmp, buf, ctx);
-	} else {
-		/* It's somewhat large, so malloc it.  */
-		char *tmp = xmalloc(size);
-		msort_with_tmp(b, n, s, cmp, tmp, ctx);
-		free(tmp);
-	}
+	tmp = xmalloc(size);
+	msort_with_tmp(b, n, s, cmp, tmp, ctx);
+	free(tmp);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07 21:30                   ` Junio C Hamano
@ 2022-01-07 23:30                     ` René Scharfe
  2022-01-08  0:18                       ` Elijah Newren
  0 siblings, 1 reply; 32+ messages in thread
From: René Scharfe @ 2022-01-07 23:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Jessica Clarke, Taylor Blau, git, Elijah Newren

Am 07.01.22 um 22:30 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> If falling back to malloc(3) is fine in general then I wonder if we can
>> just it use always.  This would save two branches and make buffer
>> management trivial here.  How much worse is malloc(3) on platforms with
>> alloca(3)?  Do we sort lots of short lists somewhere?  In other words:
>> Does this stack allocation optimization actually make a measurable
>> difference?
>
> Well all the preimage of this came from your 04ee8b87 (compat: add
> qsort_s(), 2017-01-22), so you tell me ;-)

Right, except I stole the code almost verbatim from compat/qsort.c,
which had this optimization since 43fe901b71 (compat: Add simplified
merge sort implementation from glibc, 2008-02-05).  :)  The
optimization may have raised my eyebrow a bit at the time, but not
enough to come up with a meaningful benchmark..

https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msort.c (the
original) still uses alloca(3) (and more tricks), by the way;
https://cgit.freebsd.org/src/tree/lib/libc/stdlib/merge.c still
doesn't.

>
>> Heap fragmention should not be a concern here, at least, because the
>> pattern of requesting, using and releasing a single allocation won't
>> leave holes.
>
> Sure.  Even in a multi-threaded environment that should be true.
>
> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 -----
> Subject: compat/qsort_s.c: avoid using potentially unaligned access
>
> The compatibility definition for qsort_s() uses "char buffer[1024]"
> on the stack to avoid making malloc() calls for small temporary
> space, which essentially hand-rolls alloca().
>
> But the elements of the array being sorted may have alignment needs
> more strict than what an array of bytes may have. &buf[0] may be
> word aligned, but using the address as if it stores the first
> element of an array of a struct, whose first member may need to be
> aligned on double-word boundary, would be a no-no.
>
> We could use xalloca() from git-compat-util.h, or alloca() directly
> on platforms with HAVE_ALLOCA_H, but let's try using unconditionally
> xmalloc() before we know the performance characteristics of the
> callers.
>
> It may not make much of an argument to inspect the current callers
> and say "it shouldn't matter to any of them", but anyway:
>
>  * The one in object-name.c is used to sort potential matches to a
>    given ambiguous object name prefix in the error path;
>
>  * The one in pack-write.c is done once per a pack .idx file being
>    written to create the reverse index, so (1) the cost of malloc()
>    overhead is dwarfed by the cost of the packing operation, and (2)
>    the number of entries being sorted is the number of objects in a
>    pack;
>
>  * The one in ref-filter.c is used by "branch --list", "tag --list",
>    and "for-each-ref", only once per operation.  We sort an array of
>    pointers with entries, each corresponding to a ref that is shown.
>
>  * The one in string-list.c is used by sort_string_list(), which is
>    way too generic to assume any access patterns, so it may or may
>    not matter, but I do not care too much ;-)
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  compat/qsort_s.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git c/compat/qsort_s.c w/compat/qsort_s.c
> index 52d1f0a73d..0f7ff30f5f 100644
> --- c/compat/qsort_s.c
> +++ w/compat/qsort_s.c
> @@ -49,21 +49,15 @@ int git_qsort_s(void *b, size_t n, size_t s,
>  		int (*cmp)(const void *, const void *, void *), void *ctx)
>  {
>  	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	char *tmp;
>
>  	if (!n)
>  		return 0;
>  	if (!b || !cmp)
>  		return -1;
>
> -	if (size < sizeof(buf)) {
> -		/* The temporary array fits on the small on-stack buffer. */
> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
> -	} else {
> -		/* It's somewhat large, so malloc it.  */
> -		char *tmp = xmalloc(size);
> -		msort_with_tmp(b, n, s, cmp, tmp, ctx);
> -		free(tmp);
> -	}
> +	tmp = xmalloc(size);
> +	msort_with_tmp(b, n, s, cmp, tmp, ctx);
> +	free(tmp);
>  	return 0;
>  }

--- >8 ---
Subject: [PATCH] stable-qsort: avoid using potentially unaligned access

Like in the previous patch for compat/qsort_s.c, remove the optimization
of using an on-stack buffer to avoid small allocations.  This ensures
maximum alignment for the array elements and simplifies the code a bit.

The performance impact for the current callers is unlikely to be
noticeable:

 * compat/mingw.c::make_environment_block() uses ALLOC_ARRAY and
   ALLOC_GROW several times already, so another allocation of up to 1KB
   should not matter much.

 * diffcore-rename.c::diffcore_rename_extended() is called once per diff
   or twice per merge, and those require allocations for each object and
   more already.

 * merge-ort.c::detect_and_process_renames() is called once per merge.
   It's responsible for the two per-merge diffcore_rename_extended()
   calls mentioned above as well, though.  So this is possibly the most
   impacted caller.  Per-object allocations are likely to dwarf the
   additional small allocations in git_stable_qsort(), though.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 stable-qsort.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/stable-qsort.c b/stable-qsort.c
index 6cbaf39f7b..7ff12467cd 100644
--- a/stable-qsort.c
+++ b/stable-qsort.c
@@ -48,15 +48,9 @@ void git_stable_qsort(void *b, size_t n, size_t s,
 		      int (*cmp)(const void *, const void *))
 {
 	const size_t size = st_mult(n, s);
-	char buf[1024];
-
-	if (size < sizeof(buf)) {
-		/* The temporary array fits on the small on-stack buffer. */
-		msort_with_tmp(b, n, s, cmp, buf);
-	} else {
-		/* It's somewhat large, so malloc it.  */
-		char *tmp = xmalloc(size);
-		msort_with_tmp(b, n, s, cmp, tmp);
-		free(tmp);
-	}
+	char *tmp;
+
+	tmp = xmalloc(size);
+	msort_with_tmp(b, n, s, cmp, tmp);
+	free(tmp);
 }
--
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-07 23:30                     ` René Scharfe
@ 2022-01-08  0:18                       ` Elijah Newren
  0 siblings, 0 replies; 32+ messages in thread
From: Elijah Newren @ 2022-01-08  0:18 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, brian m. carlson, Jessica Clarke, Taylor Blau,
	Git Mailing List

On Fri, Jan 7, 2022 at 3:30 PM René Scharfe <l.s.r@web.de> wrote:
>
...
> --- >8 ---
> Subject: [PATCH] stable-qsort: avoid using potentially unaligned access
>
> Like in the previous patch for compat/qsort_s.c, remove the optimization
> of using an on-stack buffer to avoid small allocations.  This ensures
> maximum alignment for the array elements and simplifies the code a bit.
>
> The performance impact for the current callers is unlikely to be
> noticeable:
>
>  * compat/mingw.c::make_environment_block() uses ALLOC_ARRAY and
>    ALLOC_GROW several times already, so another allocation of up to 1KB
>    should not matter much.

Not familiar with this code, but that makes sense to me.

>  * diffcore-rename.c::diffcore_rename_extended() is called once per diff
>    or twice per merge, and those require allocations for each object and
>    more already.

Actually, this sort could be skipped entirely if it can determine all
renames early (either because they are found via exact matches, or
they aren't relevant for the merge result, or they are found via
basename comparisons).  This sort is only called when we have to
resort to the full quadratic pairwise comparisons between files, in
which case we're also slurping in full files, doing the diffcore-delta
work to convert each file's contents into a spanhash, and then
pairwise comparing spanhashes for every pairing of source &
destination files that remain.  That work would absolutely dwarf the
malloc of a kilobyte.  So I agree, it's not worth worrying about this
one.

>  * merge-ort.c::detect_and_process_renames() is called once per merge.
>    It's responsible for the two per-merge diffcore_rename_extended()
>    calls mentioned above as well, though.  So this is possibly the most
>    impacted caller.  Per-object allocations are likely to dwarf the
>    additional small allocations in git_stable_qsort(), though.

The particular sort call directly found in
detect_and_process_renames() was once nearly 30% of overall execution
time in merge-ort[1], but according to some local notes I kept, it
eventually dropped to about ~1% of overall execution time after
trivial directory resolution[2] (due to the fact that it could just
include some higher level directories and omit all the files
underneath them -- i.e. it had far fewer paths to sort).

Since I suspect your change here would generally just be a small
percentage of the overall git_stable_qsort() time (if it can even be
measured), and git_stable_qsort() time is a small percentage of
merge-ort runtime, I think any runtime differences here would be
negligible.

[1] https://lore.kernel.org/git/140c1e89e0ec69c5c5e8a99b632c1cf25c2325d4.1623168703.git.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.988.v4.git.1626841444.gitgitgadget@gmail.com/

>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  stable-qsort.c | 16 +++++-----------
>  1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/stable-qsort.c b/stable-qsort.c
> index 6cbaf39f7b..7ff12467cd 100644
> --- a/stable-qsort.c
> +++ b/stable-qsort.c
> @@ -48,15 +48,9 @@ void git_stable_qsort(void *b, size_t n, size_t s,
>                       int (*cmp)(const void *, const void *))
>  {
>         const size_t size = st_mult(n, s);
> -       char buf[1024];
> -
> -       if (size < sizeof(buf)) {
> -               /* The temporary array fits on the small on-stack buffer. */
> -               msort_with_tmp(b, n, s, cmp, buf);
> -       } else {
> -               /* It's somewhat large, so malloc it.  */
> -               char *tmp = xmalloc(size);
> -               msort_with_tmp(b, n, s, cmp, tmp);
> -               free(tmp);
> -       }
> +       char *tmp;
> +
> +       tmp = xmalloc(size);
> +       msort_with_tmp(b, n, s, cmp, tmp);
> +       free(tmp);
>  }
> --
> 2.34.1

Patch looks good to me.  Reviewed-by: Elijah Newren <newren@gmail.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
                   ` (3 preceding siblings ...)
  2022-01-07 16:08 ` René Scharfe
@ 2022-01-12 13:58 ` Jessica Clarke
  2022-01-12 15:47   ` René Scharfe
  2022-01-23 15:24 ` [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types Jessica Clarke
  5 siblings, 1 reply; 32+ messages in thread
From: Jessica Clarke @ 2022-01-12 13:58 UTC (permalink / raw)
  To: git

On 5 Jan 2022, at 13:23, Jessica Clarke <jrtc27@jrtc27.com> wrote:
> 
> Currently git_qsort_s allocates a buffer on the stack that has no
> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
> alignment for any type.
> 
> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
> hardware capabilities which, as well as having a normal integer address,
> have additional bounds, permissions and other metadata in a second word,
> so on a 64-bit architecture they are 128-bit quantities, including their
> alignment requirements. Despite being 128-bit, their integer component
> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
> uintmax_t does not sufficiently align an allocation.
> 
> Moreover, these capabilities have an additional "129th" tag bit, which
> tracks the validity of the capability and is cleared on any invalid
> operation that doesn't trap (e.g. partially overwriting a capability
> will invalidate it) which, combined with the architecture's strict
> checks on capability manipulation instructions, ensures it is
> architecturally impossible to construct a capability that gives more
> rights than those you were given in the first place. To store these tag
> bits, each capability sized and aligned word in memory gains a single
> tag bit that is stored in unaddressable (to the processor) memory. This
> means that it is impossible to store a capability at an unaligned
> address: a normal load or store of a capability will always take an
> alignment fault even if the (micro)architecture supports unaligned
> loads/stores for other data types, and a memcpy will, if the destination
> is not appropriately aligned, copy the byte representation but lose the
> tag, meaning that if it is eventually copied back and loaded from an
> aligned location any attempt to dereference it will trap with a tag
> fault. Thus, even char buffers that are memcpy'ed to or from must be
> properly aligned on CHERI architectures if they are to hold pointers.
> 
> Address both of these by introducing a new git_max_align type put in a
> union with the on-stack buffer to force its alignment, as well as a new
> GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align
> that gets used for mem_pool_alloc. As well as making the code work on
> CHERI, the former change likely also improves performance on some
> architectures by making memcpy faster (either because it can use larger
> block sizes or because the microarchitecture has inefficient unaligned
> accesses).
> 
> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
> ---
> compat/qsort_s.c  | 11 +++++++----
> git-compat-util.h | 11 +++++++++++
> mem-pool.c        |  6 +++---
> 3 files changed, 21 insertions(+), 7 deletions(-)

I can see the alternative fixes for qsort are now in next, as is the
cleanup of register_symlink_changes to use string sets, which just
leaves the mem-pool.c change to not assume that uintmax_t is
sufficiently aligned for every type that git will use. If I move
GIT_MAX_ALIGNMENT and its helper aggregates to mem-pool.c would that be
acceptable?

Jess

> diff --git a/compat/qsort_s.c b/compat/qsort_s.c
> index 52d1f0a73d..1ccdb87451 100644
> --- a/compat/qsort_s.c
> +++ b/compat/qsort_s.c
> @@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s,
> 		int (*cmp)(const void *, const void *, void *), void *ctx)
> {
> 	const size_t size = st_mult(n, s);
> -	char buf[1024];
> +	union {
> +		char buf[1024];
> +		git_max_align align;
> +	} u;
> 
> 	if (!n)
> 		return 0;
> 	if (!b || !cmp)
> 		return -1;
> 
> -	if (size < sizeof(buf)) {
> -		/* The temporary array fits on the small on-stack buffer. */
> -		msort_with_tmp(b, n, s, cmp, buf, ctx);
> +	if (size < sizeof(u.buf)) {
> +		/* The temporary array fits in the small on-stack buffer. */
> +		msort_with_tmp(b, n, s, cmp, u.buf, ctx);
> 	} else {
> 		/* It's somewhat large, so malloc it.  */
> 		char *tmp = xmalloc(size);
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 5fa54a7afe..28581a45c5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
> #define _ALL_SOURCE 1
> #endif
> 
> +typedef union {
> +	uintmax_t max_align_uintmax;
> +	void *max_align_pointer;
> +} git_max_align;
> +
> +typedef struct {
> +	char unalign;
> +	git_max_align aligned;
> +} git_max_alignment;
> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)
> +
> /* used on Mac OS X */
> #ifdef PRECOMPOSE_UNICODE
> #include "compat/precompose_utf8.h"
> diff --git a/mem-pool.c b/mem-pool.c
> index ccdcad2e3d..748eff925a 100644
> --- a/mem-pool.c
> +++ b/mem-pool.c
> @@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
> 	struct mp_block *p = NULL;
> 	void *r;
> 
> -	/* round up to a 'uintmax_t' alignment */
> -	if (len & (sizeof(uintmax_t) - 1))
> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
> +	if (len & (GIT_MAX_ALIGNMENT - 1))
> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
> 
> 	if (pool->mp_block &&
> 	    pool->mp_block->end - pool->mp_block->next_free >= len)
> -- 
> 2.33.1
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-12 13:58 ` Jessica Clarke
@ 2022-01-12 15:47   ` René Scharfe
  2022-01-12 15:49     ` Jessica Clarke
  0 siblings, 1 reply; 32+ messages in thread
From: René Scharfe @ 2022-01-12 15:47 UTC (permalink / raw)
  To: Jessica Clarke, git

Am 12.01.22 um 14:58 schrieb Jessica Clarke:
>
> I can see the alternative fixes for qsort are now in next, as is the
> cleanup of register_symlink_changes to use string sets, which just
> leaves the mem-pool.c change to not assume that uintmax_t is
> sufficiently aligned for every type that git will use. If I move
> GIT_MAX_ALIGNMENT and its helper aggregates to mem-pool.c would that be
> acceptable?

Defining it at its single site of use sounds like a good idea to me.
We can export it to git-compat-util.h later, iff necessary.

>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 5fa54a7afe..28581a45c5 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
>> #define _ALL_SOURCE 1
>> #endif
>>
>> +typedef union {
>> +	uintmax_t max_align_uintmax;
>> +	void *max_align_pointer;
>> +} git_max_align;
>> +
>> +typedef struct {
>> +	char unalign;
>> +	git_max_align aligned;
>> +} git_max_alignment;
>> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)

Style nit: We tend to use typedef sparingly.  And the union type doesn't
even need a name.  E.g. this would work as well, while reducing the name
space footprint:

   struct git_max_alignment {
      char unalign;
      union {
         uintmax_t max_align_uintmax;
         void *max_align_pointer;
      } aligned;
   };
   #define GIT_MAX_ALIGNMENT offsetof(struct git_max_alignment, aligned)

When someone uses a mempool for objects that requires 128-bit alignment
(e.g. because it includes a long double or uint128_t member) then we'd
need add long double to the union as well, like e.g. compat/obstack.c
does.  That would be the safe route, but currently only add 8 bytes of
unnecessary overhead to each allocation.

Perhaps mempool alignment should be configurable.  For now a comment
might suffice which indicates that GIT_MAX_ALIGNMENT is only the
maximum alignment requirement of current mempool users, not the
highest possible requirement for the machine.  Renaming it to something
with MEMPOOL in the name would help in this regard as well.

>> +
>> /* used on Mac OS X */
>> #ifdef PRECOMPOSE_UNICODE
>> #include "compat/precompose_utf8.h"
>> diff --git a/mem-pool.c b/mem-pool.c
>> index ccdcad2e3d..748eff925a 100644
>> --- a/mem-pool.c
>> +++ b/mem-pool.c
>> @@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>> 	struct mp_block *p = NULL;
>> 	void *r;
>>
>> -	/* round up to a 'uintmax_t' alignment */
>> -	if (len & (sizeof(uintmax_t) - 1))
>> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
>> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
>> +	if (len & (GIT_MAX_ALIGNMENT - 1))
>> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
>>
>> 	if (pool->mp_block &&
>> 	    pool->mp_block->end - pool->mp_block->next_free >= len)
>> --
>> 2.33.1
>>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Properly align memory allocations and temporary buffers
  2022-01-12 15:47   ` René Scharfe
@ 2022-01-12 15:49     ` Jessica Clarke
  0 siblings, 0 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-12 15:49 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On 12 Jan 2022, at 15:47, René Scharfe <l.s.r@web.de> wrote:
> 
> Am 12.01.22 um 14:58 schrieb Jessica Clarke:
>> 
>> I can see the alternative fixes for qsort are now in next, as is the
>> cleanup of register_symlink_changes to use string sets, which just
>> leaves the mem-pool.c change to not assume that uintmax_t is
>> sufficiently aligned for every type that git will use. If I move
>> GIT_MAX_ALIGNMENT and its helper aggregates to mem-pool.c would that be
>> acceptable?
> 
> Defining it at its single site of use sounds like a good idea to me.
> We can export it to git-compat-util.h later, iff necessary.
> 
>>> diff --git a/git-compat-util.h b/git-compat-util.h
>>> index 5fa54a7afe..28581a45c5 100644
>>> --- a/git-compat-util.h
>>> +++ b/git-compat-util.h
>>> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t;
>>> #define _ALL_SOURCE 1
>>> #endif
>>> 
>>> +typedef union {
>>> +	uintmax_t max_align_uintmax;
>>> +	void *max_align_pointer;
>>> +} git_max_align;
>>> +
>>> +typedef struct {
>>> +	char unalign;
>>> +	git_max_align aligned;
>>> +} git_max_alignment;
>>> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned)
> 
> Style nit: We tend to use typedef sparingly.  And the union type doesn't
> even need a name.  E.g. this would work as well, while reducing the name
> space footprint:
> 
>   struct git_max_alignment {
>      char unalign;
>      union {
>         uintmax_t max_align_uintmax;
>         void *max_align_pointer;
>      } aligned;
>   };
>   #define GIT_MAX_ALIGNMENT offsetof(struct git_max_alignment, aligned)
> 
> When someone uses a mempool for objects that requires 128-bit alignment
> (e.g. because it includes a long double or uint128_t member) then we'd
> need add long double to the union as well, like e.g. compat/obstack.c
> does.  That would be the safe route, but currently only add 8 bytes of
> unnecessary overhead to each allocation.
> 
> Perhaps mempool alignment should be configurable.  For now a comment
> might suffice which indicates that GIT_MAX_ALIGNMENT is only the
> maximum alignment requirement of current mempool users, not the
> highest possible requirement for the machine.  Renaming it to something
> with MEMPOOL in the name would help in this regard as well.

Sure, that all makes sense.

Jess

>>> +
>>> /* used on Mac OS X */
>>> #ifdef PRECOMPOSE_UNICODE
>>> #include "compat/precompose_utf8.h"
>>> diff --git a/mem-pool.c b/mem-pool.c
>>> index ccdcad2e3d..748eff925a 100644
>>> --- a/mem-pool.c
>>> +++ b/mem-pool.c
>>> @@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>>> 	struct mp_block *p = NULL;
>>> 	void *r;
>>> 
>>> -	/* round up to a 'uintmax_t' alignment */
>>> -	if (len & (sizeof(uintmax_t) - 1))
>>> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
>>> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
>>> +	if (len & (GIT_MAX_ALIGNMENT - 1))
>>> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
>>> 
>>> 	if (pool->mp_block &&
>>> 	    pool->mp_block->end - pool->mp_block->next_free >= len)
>>> --
>>> 2.33.1
>>> 
>> 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types
  2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
                   ` (4 preceding siblings ...)
  2022-01-12 13:58 ` Jessica Clarke
@ 2022-01-23 15:24 ` Jessica Clarke
  2022-01-23 20:17   ` Junio C Hamano
  2022-01-23 20:33   ` [PATCH v3] " Jessica Clarke
  5 siblings, 2 replies; 32+ messages in thread
From: Jessica Clarke @ 2022-01-23 15:24 UTC (permalink / raw)
  To: git; +Cc: Jessica Clarke

Currently mem_pool_alloc uses sizeof(uintmax_t) as a proxy for what
should be _Alignof(max_align_t) in C11. On most architectures this is
sufficient (though on m68k it is in fact overly strict, since the
de-facto ABI, which differs from the specified System V ABI, has the
maximum alignment of all types as 2 bytes), but on CHERI, and thus Arm's
Morello prototype, it is insufficient for any type that stores a
pointer, which must be aligned to 128 bits (on 64-bit architectures
extended with CHERI), whilst uintmax_t is a 64-bit integer.

Fix this by introducing our own approximation for max_align_t and a
means to compute _Alignof it without relying on C11. Currently this
union only contains uintmax_t and void *, but more types can be added as
needed.

Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
---
 mem-pool.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index ccdcad2e3d..b373cadd01 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -7,6 +7,22 @@
 
 #define BLOCK_GROWTH_SIZE (1024 * 1024 - sizeof(struct mp_block))
 
+/*
+ * The inner union is an approximation for C11's max_align_t, and the
+ * struct + offsetof computes _Alignof. This can all just be replaced
+ * with _Alignof(max_align_t) if/when C11 is part of the baseline.
+ *
+ * Add more types to the union if the current set is insufficient.
+ */
+struct git_max_alignment {
+	char unalign;
+	union {
+		uintmax_t max_align_uintmax;
+		void *max_align_pointer;
+	} aligned;
+};
+#define GIT_MAX_ALIGNMENT offsetof(struct git_max_alignment, aligned)
+
 /*
  * Allocate a new mp_block and insert it after the block specified in
  * `insert_after`. If `insert_after` is NULL, then insert block at the
@@ -69,9 +85,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
 	struct mp_block *p = NULL;
 	void *r;
 
-	/* round up to a 'uintmax_t' alignment */
-	if (len & (sizeof(uintmax_t) - 1))
-		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
+	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
+	if (len & (GIT_MAX_ALIGNMENT - 1))
+		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
 
 	if (pool->mp_block &&
 	    pool->mp_block->end - pool->mp_block->next_free >= len)
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types
  2022-01-23 15:24 ` [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types Jessica Clarke
@ 2022-01-23 20:17   ` Junio C Hamano
  2022-01-23 20:23     ` Jessica Clarke
  2022-01-23 20:33   ` [PATCH v3] " Jessica Clarke
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2022-01-23 20:17 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

Jessica Clarke <jrtc27@jrtc27.com> writes:

> Currently mem_pool_alloc uses sizeof(uintmax_t) as a proxy for what
> should be _Alignof(max_align_t) in C11. On most architectures this is

Lose "Currently", as the present tense describes the status quo, the
shape of the problematic code we have today that wants improvement
by the proposed patch.

> sufficient (though on m68k it is in fact overly strict, since the
> de-facto ABI, which differs from the specified System V ABI, has the
> maximum alignment of all types as 2 bytes), but on CHERI, and thus Arm's
> Morello prototype, it is insufficient for any type that stores a
> pointer, which must be aligned to 128 bits (on 64-bit architectures
> extended with CHERI), whilst uintmax_t is a 64-bit integer.

OK.

> Fix this by introducing our own approximation for max_align_t and a
> means to compute _Alignof it without relying on C11. Currently this
> union only contains uintmax_t and void *, but more types can be added as
> needed.

Nicely described.

> +/*
> + * The inner union is an approximation for C11's max_align_t, and the
> + * struct + offsetof computes _Alignof. This can all just be replaced
> + * with _Alignof(max_align_t) if/when C11 is part of the baseline.
> + *
> + * Add more types to the union if the current set is insufficient.
> + */
> +struct git_max_alignment {
> +	char unalign;
> +	union {
> +		uintmax_t max_align_uintmax;
> +		void *max_align_pointer;
> +	} aligned;
> +};
> +#define GIT_MAX_ALIGNMENT offsetof(struct git_max_alignment, aligned)
> +

The original computed the alignment requirement for uintmax_t as
sizeof(uintmax_t), not as

	offsetof(struct {
		char unalign;
		union { uintmax_t i; } aligned;
	}, aligned)

because if you have an array of a type, each element of it must be
aligned appropriately already for that type, without the unalignment
the outer struct enforces.  I wonder if your complex offsetof is
equivalent to sizeof(union { uintmax_t u; void *p; })?

IOW, in this struct:

	struct max_alignment_helper {
		char unalign;
		union {
			uintmax_t uintmax_t_unused;
			void *pointer_unused;
		} u[2];
	} s;

both s.u[0] and s.u[1] must be properly aligned, so wouldn't the
alignment requirement for the union type, which can be used to hold
a single value of either uintmax_t or a poinhter, be the distance
between these two array elements, i.e. sizeof(s.u[0])?

To put it differently in yet another way, wouldn't it simplify down
to this?

	union max_alignment_helper {
		uintmax_t uintmax_t_unused;
                void *pointer_unused;
	};
	#define GIT_MAX_ALIGNMENT sizeof(union max_alignment_helper);

I am not saying that the "a forcibly unaligned union in a struct" is
a bad/wrong way to express what you want to achieve.  I just do not
know if there is a reason to choose it over a seemingly simpler
sizeof(that union) without the outer struct and unalign member.

Other than that, looks OK to me.  Especially the parts that use the
macro look correctly converted.

Thanks.

> @@ -69,9 +85,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>  	struct mp_block *p = NULL;
>  	void *r;
>  
> -	/* round up to a 'uintmax_t' alignment */
> -	if (len & (sizeof(uintmax_t) - 1))
> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
> +	if (len & (GIT_MAX_ALIGNMENT - 1))
> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
>  
>  	if (pool->mp_block &&
>  	    pool->mp_block->end - pool->mp_block->next_free >= len)


>  /*
>   * Allocate a new mp_block and insert it after the block specified in
>   * `insert_after`. If `insert_after` is NULL, then insert block at the
> @@ -69,9 +85,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>  	struct mp_block *p = NULL;
>  	void *r;
>  
> -	/* round up to a 'uintmax_t' alignment */
> -	if (len & (sizeof(uintmax_t) - 1))
> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
> +	if (len & (GIT_MAX_ALIGNMENT - 1))
> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
>  
>  	if (pool->mp_block &&
>  	    pool->mp_block->end - pool->mp_block->next_free >= len)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types
  2022-01-23 20:17   ` Junio C Hamano
@ 2022-01-23 20:23     ` Jessica Clarke
  2022-01-23 20:28       ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jessica Clarke @ 2022-01-23 20:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 23 Jan 2022, at 20:17, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Jessica Clarke <jrtc27@jrtc27.com> writes:
> 
>> Currently mem_pool_alloc uses sizeof(uintmax_t) as a proxy for what
>> should be _Alignof(max_align_t) in C11. On most architectures this is
> 
> Lose "Currently", as the present tense describes the status quo, the
> shape of the problematic code we have today that wants improvement
> by the proposed patch.

Do you want a v3 or is that something you'll amend on git-am?

>> sufficient (though on m68k it is in fact overly strict, since the
>> de-facto ABI, which differs from the specified System V ABI, has the
>> maximum alignment of all types as 2 bytes), but on CHERI, and thus Arm's
>> Morello prototype, it is insufficient for any type that stores a
>> pointer, which must be aligned to 128 bits (on 64-bit architectures
>> extended with CHERI), whilst uintmax_t is a 64-bit integer.
> 
> OK.
> 
>> Fix this by introducing our own approximation for max_align_t and a
>> means to compute _Alignof it without relying on C11. Currently this
>> union only contains uintmax_t and void *, but more types can be added as
>> needed.
> 
> Nicely described.
> 
>> +/*
>> + * The inner union is an approximation for C11's max_align_t, and the
>> + * struct + offsetof computes _Alignof. This can all just be replaced
>> + * with _Alignof(max_align_t) if/when C11 is part of the baseline.
>> + *
>> + * Add more types to the union if the current set is insufficient.
>> + */
>> +struct git_max_alignment {
>> +	char unalign;
>> +	union {
>> +		uintmax_t max_align_uintmax;
>> +		void *max_align_pointer;
>> +	} aligned;
>> +};
>> +#define GIT_MAX_ALIGNMENT offsetof(struct git_max_alignment, aligned)
>> +
> 
> The original computed the alignment requirement for uintmax_t as
> sizeof(uintmax_t), not as
> 
> 	offsetof(struct {
> 		char unalign;
> 		union { uintmax_t i; } aligned;
> 	}, aligned)
> 
> because if you have an array of a type, each element of it must be
> aligned appropriately already for that type, without the unalignment
> the outer struct enforces.  I wonder if your complex offsetof is
> equivalent to sizeof(union { uintmax_t u; void *p; })?
> 
> IOW, in this struct:
> 
> 	struct max_alignment_helper {
> 		char unalign;
> 		union {
> 			uintmax_t uintmax_t_unused;
> 			void *pointer_unused;
> 		} u[2];
> 	} s;
> 
> both s.u[0] and s.u[1] must be properly aligned, so wouldn't the
> alignment requirement for the union type, which can be used to hold
> a single value of either uintmax_t or a poinhter, be the distance
> between these two array elements, i.e. sizeof(s.u[0])?
> 
> To put it differently in yet another way, wouldn't it simplify down
> to this?
> 
> 	union max_alignment_helper {
> 		uintmax_t uintmax_t_unused;
>                void *pointer_unused;
> 	};
> 	#define GIT_MAX_ALIGNMENT sizeof(union max_alignment_helper);
> 
> I am not saying that the "a forcibly unaligned union in a struct" is
> a bad/wrong way to express what you want to achieve.  I just do not
> know if there is a reason to choose it over a seemingly simpler
> sizeof(that union) without the outer struct and unalign member.

So, sizeof(X) does not always equal _Alignof(X), even for primitive
types, _Alignof need only be a factor of sizeof. The two are the same
on most architectures, and is a sensible ABI, but the exception is the
m68k case I was referring to above. On m68k, sizeof(long long) == 8,
but _Alignof(long long) == 2 (yes this is a real pain point of its ABI;
in particular int is only 2-byte aligned, but futex(2) explicitly
requires 4-byte alignment). So using sizeof definitely gets you
something sufficiently aligned, but can waste space. This doesn’t
affect CHERI/Morello, all our implementations keep sizeof == _Alignof,
but as I was changing this code I felt I should use the more precise
construct.

Jess

> Other than that, looks OK to me.  Especially the parts that use the
> macro look correctly converted.
> 
> Thanks.
> 
>> @@ -69,9 +85,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>> 	struct mp_block *p = NULL;
>> 	void *r;
>> 
>> -	/* round up to a 'uintmax_t' alignment */
>> -	if (len & (sizeof(uintmax_t) - 1))
>> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
>> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
>> +	if (len & (GIT_MAX_ALIGNMENT - 1))
>> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
>> 
>> 	if (pool->mp_block &&
>> 	    pool->mp_block->end - pool->mp_block->next_free >= len)
> 
> 
>> /*
>>  * Allocate a new mp_block and insert it after the block specified in
>>  * `insert_after`. If `insert_after` is NULL, then insert block at the
>> @@ -69,9 +85,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
>> 	struct mp_block *p = NULL;
>> 	void *r;
>> 
>> -	/* round up to a 'uintmax_t' alignment */
>> -	if (len & (sizeof(uintmax_t) - 1))
>> -		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
>> +	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
>> +	if (len & (GIT_MAX_ALIGNMENT - 1))
>> +		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
>> 
>> 	if (pool->mp_block &&
>> 	    pool->mp_block->end - pool->mp_block->next_free >= len)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types
  2022-01-23 20:23     ` Jessica Clarke
@ 2022-01-23 20:28       ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2022-01-23 20:28 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

Jessica Clarke <jrtc27@jrtc27.com> writes:

> So, sizeof(X) does not always equal _Alignof(X), even for primitive
> types, _Alignof need only be a factor of sizeof. The two are the same
> on most architectures, and is a sensible ABI, but the exception is the
> m68k case I was referring to above. On m68k, sizeof(long long) == 8,
> but _Alignof(long long) == 2 (yes this is a real pain point of its ABI;

Ah, thanks.  Having a variant of the above explanation in a comment
next to the "union within a struct" would help future readers from
asking the same question as I asked.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3] mem-pool: Don't assume uintmax_t is aligned enough for all types
  2022-01-23 15:24 ` [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types Jessica Clarke
  2022-01-23 20:17   ` Junio C Hamano
@ 2022-01-23 20:33   ` Jessica Clarke
  2022-01-24 17:11     ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Jessica Clarke @ 2022-01-23 20:33 UTC (permalink / raw)
  To: git; +Cc: Jessica Clarke

mem_pool_alloc uses sizeof(uintmax_t) as a proxy for what should be
_Alignof(max_align_t) in C11. On most architectures this is sufficient
(though on m68k it is in fact overly strict, since the de-facto ABI,
which differs from the specified System V ABI, has the maximum alignment
of all types as 2 bytes), but on CHERI, and thus Arm's Morello
prototype, it is insufficient for any type that stores a pointer, which
must be aligned to 128 bits (on 64-bit architectures extended with
CHERI), whilst uintmax_t is a 64-bit integer.

Fix this by introducing our own approximation for max_align_t and a
means to compute _Alignof it without relying on C11. Currently this
union only contains uintmax_t and void *, but more types can be added as
needed.

Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
---

Changed in v3:
* Removed first "Currently" from commit message body
* Added explanation for _Alignof vs sizeof in comment

 mem-pool.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index ccdcad2e3d..599d8e895f 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -7,6 +7,26 @@
 
 #define BLOCK_GROWTH_SIZE (1024 * 1024 - sizeof(struct mp_block))
 
+/*
+ * The inner union is an approximation for C11's max_align_t, and the
+ * struct + offsetof computes _Alignof. This can all just be replaced
+ * with _Alignof(max_align_t) if/when C11 is part of the baseline.
+ * Note that _Alignof(X) need not be the same as sizeof(X); it's only
+ * required to be a (possibly trivial) factor. They are the same for
+ * most architectures, but m68k for example has only 2-byte alignment
+ * for its 4-byte and 8-byte types, so using sizeof would waste space.
+ *
+ * Add more types to the union if the current set is insufficient.
+ */
+struct git_max_alignment {
+	char unalign;
+	union {
+		uintmax_t max_align_uintmax;
+		void *max_align_pointer;
+	} aligned;
+};
+#define GIT_MAX_ALIGNMENT offsetof(struct git_max_alignment, aligned)
+
 /*
  * Allocate a new mp_block and insert it after the block specified in
  * `insert_after`. If `insert_after` is NULL, then insert block at the
@@ -69,9 +89,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)
 	struct mp_block *p = NULL;
 	void *r;
 
-	/* round up to a 'uintmax_t' alignment */
-	if (len & (sizeof(uintmax_t) - 1))
-		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
+	/* round up to a 'GIT_MAX_ALIGNMENT' alignment */
+	if (len & (GIT_MAX_ALIGNMENT - 1))
+		len += GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - 1));
 
 	if (pool->mp_block &&
 	    pool->mp_block->end - pool->mp_block->next_free >= len)
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v3] mem-pool: Don't assume uintmax_t is aligned enough for all types
  2022-01-23 20:33   ` [PATCH v3] " Jessica Clarke
@ 2022-01-24 17:11     ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2022-01-24 17:11 UTC (permalink / raw)
  To: Jessica Clarke; +Cc: git

> +/*
> + * The inner union is an approximation for C11's max_align_t, and the
> + * struct + offsetof computes _Alignof. This can all just be replaced
> + * with _Alignof(max_align_t) if/when C11 is part of the baseline.
> + * Note that _Alignof(X) need not be the same as sizeof(X); it's only
> + * required to be a (possibly trivial) factor. They are the same for
> + * most architectures, but m68k for example has only 2-byte alignment
> + * for its 4-byte and 8-byte types, so using sizeof would waste space.
> + *
> + * Add more types to the union if the current set is insufficient.
> + */

That reads very clear.  Thanks.

Will queue.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-01-24 17:12 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
2022-01-06 21:46 ` Taylor Blau
2022-01-06 21:56   ` Jessica Clarke
2022-01-06 22:27   ` Junio C Hamano
2022-01-06 22:56     ` Jessica Clarke
2022-01-07  0:10       ` Junio C Hamano
2022-01-07  0:22         ` Jessica Clarke
2022-01-07  0:31         ` brian m. carlson
2022-01-07  0:39           ` Jessica Clarke
2022-01-07  1:43             ` brian m. carlson
2022-01-07  2:08               ` Jessica Clarke
2022-01-07  2:11                 ` Jessica Clarke
2022-01-07 19:30               ` Junio C Hamano
2022-01-07 19:33                 ` Jessica Clarke
2022-01-07 20:56                 ` René Scharfe
2022-01-07 21:30                   ` Junio C Hamano
2022-01-07 23:30                     ` René Scharfe
2022-01-08  0:18                       ` Elijah Newren
2022-01-06 23:22 ` brian m. carlson
2022-01-06 23:31   ` Jessica Clarke
2022-01-07 14:57 ` Philip Oakley
2022-01-07 16:08 ` René Scharfe
2022-01-07 16:21   ` Jessica Clarke
2022-01-12 13:58 ` Jessica Clarke
2022-01-12 15:47   ` René Scharfe
2022-01-12 15:49     ` Jessica Clarke
2022-01-23 15:24 ` [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types Jessica Clarke
2022-01-23 20:17   ` Junio C Hamano
2022-01-23 20:23     ` Jessica Clarke
2022-01-23 20:28       ` Junio C Hamano
2022-01-23 20:33   ` [PATCH v3] " Jessica Clarke
2022-01-24 17:11     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).