On 2022-01-05 at 13:23:24, Jessica Clarke wrote:
> Currently git_qsort_s allocates a buffer on the stack that has no
> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate
> alignment for any type.
> 
> On CHERI, and thus Arm's Morello prototype, pointers are implemented as
> hardware capabilities which, as well as having a normal integer address,
> have additional bounds, permissions and other metadata in a second word,
> so on a 64-bit architecture they are 128-bit quantities, including their
> alignment requirements. Despite being 128-bit, their integer component
> is still only a 64-bit field, so uintmax_t remains 64-bit, and therefore
> uintmax_t does not sufficiently align an allocation.
> 
> Moreover, these capabilities have an additional "129th" tag bit, which
> tracks the validity of the capability and is cleared on any invalid
> operation that doesn't trap (e.g. partially overwriting a capability
> will invalidate it) which, combined with the architecture's strict
> checks on capability manipulation instructions, ensures it is
> architecturally impossible to construct a capability that gives more
> rights than those you were given in the first place. To store these tag
> bits, each capability sized and aligned word in memory gains a single
> tag bit that is stored in unaddressable (to the processor) memory. This
> means that it is impossible to store a capability at an unaligned
> address: a normal load or store of a capability will always take an
> alignment fault even if the (micro)architecture supports unaligned
> loads/stores for other data types, and a memcpy will, if the destination
> is not appropriately aligned, copy the byte representation but lose the
> tag, meaning that if it is eventually copied back and loaded from an
> aligned location any attempt to dereference it will trap with a tag
> fault. Thus, even char buffers that are memcpy'ed to or from must be
> properly aligned on CHERI architectures if they are to hold pointers.

I think this is going to be a problem in a lot of places, not just in
Git.  I'm pretty sure that copying data this way is specifically allowed
by C and POSIX, and thus this approach is going to break a whole lot of
things.
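
To make the pattern concrete, here is a minimal sketch of the kind of
copy I have in mind.  The helper name is made up for illustration, and
it assumes nothing beyond standard C.  On conventional platforms the
recovered pointer is as good as the original:

```c
#include <string.h>

/*
 * Hypothetical helper: copy a pointer's bytes into a char buffer at
 * an offset that carries no pointer-alignment guarantee, copy them
 * back out, and return the recovered pointer.
 */
static void *roundtrip_via_char_buffer(void *p)
{
	unsigned char buf[sizeof(void *) + 1];
	void *q;

	memcpy(buf + 1, &p, sizeof(p));	/* destination may be misaligned */
	memcpy(&q, buf + 1, sizeof(q));	/* copy the same bytes back */
	return q;
}
```

As I read the description above, on CHERI the bytes would survive this
trip but the tag would not, so using the result would trap.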

For example, casting a void * to uintptr_t and back should produce two
pointers that compare equal.  The C standard says that two pointers
compare equal if they're both null, both point to the same object, or
one points one past the end of an array and the other happens to point
to the beginning of another object.  If the pointers aren't null and the
original one points to valid data, then the resulting pointer (after the
two casts) would point to the same object (since that's the only valid
option that's left), and therefore could be used to access it, but that
wouldn't necessarily work in this case.
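
A minimal sketch of that round trip, assuming only that the
implementation provides uintptr_t (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a pointer to uintptr_t and back, then
 * read through the result.
 */
static int read_via_uintptr(int *p)
{
	uintptr_t u = (uintptr_t)(void *)p;
	int *q = (void *)u;

	assert(q == p);	/* the round trip must compare equal */
	return *q;	/* the access whose validity is in question */
}
```

Whether the final dereference is guaranteed to work on CHERI is exactly
what I'm asking about.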

The CHERI paper I'm reading also specifically says it is not changing
uintmax_t, which is a direct violation of the C standard: uintmax_t has
to be able to represent any value of any unsigned integer type, and
uintptr_t is one.  If uintptr_t must be larger than 64 bits, then so
must uintmax_t, even if that happens to be inconvenient (because it
changes the ABI from the normal system ABI).  It sounds like, in fact,
you can't actually provide uintptr_t on the current architecture,
because it can't be provided in a standard-compliant way.
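
The relationship I'm relying on can be checked with something like the
following sketch.  The function name is made up, and comparing sizes is
only a rough proxy for the standard's actual requirement, which is about
representable values:

```c
#include <stdint.h>

/*
 * Hypothetical check: returns 1 if uintmax_t is at least as wide as
 * uintptr_t and its maximum is no smaller, 0 otherwise.
 */
static int uintmax_covers_uintptr(void)
{
	return sizeof(uintptr_t) <= sizeof(uintmax_t) &&
	       UINTPTR_MAX <= UINTMAX_MAX;
}
```

On the usual 64-bit ABIs both types are 64 bits wide and this returns 1.
With a 128-bit uintptr_t and a 64-bit uintmax_t, the size comparison
fails.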

Is there something I'm missing here, or is it the case that CHERI's
behavior isn't compliant with the C standard?
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA