git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jessica Clarke <jrtc27@jrtc27.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Taylor Blau" <me@ttaylorr.com>, "René Scharfe" <l.s.r@web.de>,
	git@vger.kernel.org
Subject: Re: [PATCH] Properly align memory allocations and temporary buffers
Date: Fri, 7 Jan 2022 02:11:26 +0000	[thread overview]
Message-ID: <40419F44-8D68-42EF-BA50-27AB2B167265@jrtc27.com> (raw)
In-Reply-To: <B8B73861-9095-47D2-AC4F-5EC8D7589401@jrtc27.com>

On 7 Jan 2022, at 02:08, Jessica Clarke <jrtc27@jrtc27.com> wrote:
> 
> On 7 Jan 2022, at 01:43, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>> 
>> On 2022-01-07 at 00:39:59, Jessica Clarke wrote:
>>> On 7 Jan 2022, at 00:31, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>>>> If you want to get really language-lawyer-y about it, you can actually
>>>> argue that this is a compliant implementation of the C standard.
>>>> Integer types are permitted to have padding bits, and some combinations
>>>> of padding bits are allowed to be trap representations. Technically, in
>>>> our representation, the metadata bits are padding bits, because they do
>>>> not contribute to the precision like value bits. It is therefore the
>>>> case that the *value* of a uintptr_t still fits into a uintmax_t, but
>>>> the latter has no padding bits, and casting the latter to the former
>>>> yields a trap representation when further cast back to a pointer. This
>>>> may not the intent of the spec, and not how anyone thinks of it because
>>>> CHERI is the first implementation that pushes the boundary here, but
>>>> it’s technically legal under that interpretation. You may disagree with
>>>> the interpretation, and I don’t like to use it most of the time because
>>>> it’s complicated and involves yet more ill-defined parts of the spec
>>>> (e.g. it says arithmetic operations on valid values (they mean objects,
>>>> I assume, as the value only includes value bits, but the input could be
>>>> a trap representation on some implementations) never generate a trap
>>>> representation other than as part of an exceptional condition such as
>>>> an overflow, but nowhere defines what counts as an arithmetic
>>>> operation).
>>> 
>>> 
>>> So, no, C does not actually require what you say. It requires that void
>>> * -> uintptr_t -> void * give you a valid pointer. It requires that
>>> uintptr_t -> uintmax_t preserves the *value* of the uintptr_t, which we
>>> do, because the value is formed from only the value bits which
>>> contribute to the precision, which is 64 bits in this case, and
>>> uintmax_t is still 64-bit. It requires that uintmax_t -> uintptr_t,
>>> since uintptr_t’s precision is the same as uintmax_t’s, be always
>>> valid, which is is. But it does not require that that uintptr_t has the
>>> same representation as the original uintptr_t, which it does not for
>>> us. And therefore it does not require that casting that uintptr_t back
>>> to a void * yields a valid pointer. So if you want to really dig into
>>> the details of the standard, we are technically compliant, even if some
>>> might argue it’s not in the spirit of the standard.
>> 
>> Sure, implementations are allowed to have padding bits.  They're also
>> allowed, at the moment, to use signed-magnitude or ones' complement
>> integers, have CHAR_BIT greater than 8, have sizeof(char) ==
>> sizeof(short), not implement any of the customary sizes of intN_t or
>> uintN_t, not provide uintptr_t, and use middle-endian numbers.
>> 
>> However, if your ABI is only compliant in the face of those features
>> (especially when it could have been written in a way which would have
>> been compliant without the use of those features), it's intentionally
>> hostile to real-world developers, and I don't think we should support
>> it[0].  I'd be willing to revisit this if your ABI were defined in a
>> reasonable, sane way, where sizeof(uintmax_t) >= sizeof(uintptr_t),
>> without padding bits, where the alignment of pointers from malloc is
>> suitable for all types, and where the alignment of a type is no greater
>> than sizeof(type).
>> 
>> I'm not opposed to a small amount of finagling for this case, but I am
>> very much opposed to defining your C ABI in an intentionally difficult
>> way.  128-bit integers in 64-bit Linux were not originally part of the C
>> ABIs and if the ABI is ill defined now, that's a historical accident.
>> But this is a new ABI for a new architecture and it could have been
>> defined in a responsible way, but wasn't.
> 
> It’s not a choice. It is not possible to define uintmax_t as being able
> to safely round-trip any uintptr_t in a way that preserves pointer
> validity unless you want to outlaw 64-bit integers on 32-bit CHERI
> implementations, and good luck with that one, for reasons I have
> previously explained, so if you do not wish to support this C
> implementation then upstream Git will not work on CHERI/Arm’s Morello.
> Nothing in this patch causes additional burden on other architectures,
> in fact it likely improves performance on some due to increased buffer
> alignment, so this seems a rather unnecessary line to take. We have all
> of our fork of FreeBSD, kernel and userland, built and working on
> CHERI, as well as X11 and the core of a KDE desktop (KWin, Plasma,
> Kate, Okular, Gwenview, Konsole; we would have more but up until
> recently we’ve needed to cross-compile everything and that’s often a
> pain to get things building), plus WebKit’s JavaScriptCore complete
> with a working JIT, so Git would be a notable omission in the stack,
> and rather important for developers to actually get work done on their
> boards.
> 
> Note that C requires us to define those bits as padding bits; the range
> of an integer type is defined to be 0 to 2^N-1, where N is the number
> of value bits, so it would be impossible to have any metadata in a
> uintptr_t.
> 
> So I take issue with “in a responsible way”. CHERI has been in
> development for over 10 years, it’s been done in collaboration with
> Arm, with people intimately familiar with C semantics and with
> extensive analysis of what real-world software expects of C in order to
> come up with the most-compatible implementation we reasonably can
> that’s still feasible to implement in hardware and provides the desired
> protection properties. Declaring it broken/hostile/etc feels like
> throwing the baby out with the bath water; for industry to move away
> from the sorry state of affairs that we still see with C and C++ where
> memory safety vulnerabilities are ubiquitous you cannot just restrict
> new ABIs to the status quo, as you just won’t make any progress.
> 
>> As an aside, I was actually going to point out that you could propose a
>> nice Rust or Go ABI with the status quo, but if your C ABI requires
>> padding bits, then you're probably going to have a hard time doing so,
>> since I don't believe those languages support padding bits and they need
>> to support the C ABI.
> 
> There is no such restriction in Rust that I can see. It’s true you’d
> need to slightly tweak Go’s specification as uintptr is currently an
> integer type and n-bit integer types are defined to be n-bit two’s
> complement values. I see no reason why that couldn’t be done though, I
> doubt much cares. Rust only defines the representation of its
> fixed-width integer types. There’s no reason you couldn’t add an
> additional integer type for a capability, like we do for C. The only
> issue with Rust is that it conflates uintptr_t with size_t, for which
> there are various discussion threads with Rust maintainers and a path
> forward for making it work if someone has the time to invest in making
> Rust work. Languages can evolve; there’s no reason they can’t for CHERI
> like they do for anything else they want to add, change, deprecate or
> remove.
> 
> But this is all irrelevant to discussions of C.

Oh and:

> where the alignment of pointers from malloc is
> suitable for all types

This is true of CHERI.

> and where the alignment of a type is no greater
> than sizeof(type).

This is true of CHERI, and it’s impossible for that not to be true.
Otherwise arrays don’t work.

Jess


  reply	other threads:[~2022-01-07  2:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-05 13:23 [PATCH] Properly align memory allocations and temporary buffers Jessica Clarke
2022-01-06 21:46 ` Taylor Blau
2022-01-06 21:56   ` Jessica Clarke
2022-01-06 22:27   ` Junio C Hamano
2022-01-06 22:56     ` Jessica Clarke
2022-01-07  0:10       ` Junio C Hamano
2022-01-07  0:22         ` Jessica Clarke
2022-01-07  0:31         ` brian m. carlson
2022-01-07  0:39           ` Jessica Clarke
2022-01-07  1:43             ` brian m. carlson
2022-01-07  2:08               ` Jessica Clarke
2022-01-07  2:11                 ` Jessica Clarke [this message]
2022-01-07 19:30               ` Junio C Hamano
2022-01-07 19:33                 ` Jessica Clarke
2022-01-07 20:56                 ` René Scharfe
2022-01-07 21:30                   ` Junio C Hamano
2022-01-07 23:30                     ` René Scharfe
2022-01-08  0:18                       ` Elijah Newren
2022-01-06 23:22 ` brian m. carlson
2022-01-06 23:31   ` Jessica Clarke
2022-01-07 14:57 ` Philip Oakley
2022-01-07 16:08 ` René Scharfe
2022-01-07 16:21   ` Jessica Clarke
2022-01-12 13:58 ` Jessica Clarke
2022-01-12 15:47   ` René Scharfe
2022-01-12 15:49     ` Jessica Clarke
2022-01-23 15:24 ` [PATCH v2] mem-pool: Don't assume uintmax_t is aligned enough for all types Jessica Clarke
2022-01-23 20:17   ` Junio C Hamano
2022-01-23 20:23     ` Jessica Clarke
2022-01-23 20:28       ` Junio C Hamano
2022-01-23 20:33   ` [PATCH v3] " Jessica Clarke
2022-01-24 17:11     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=40419F44-8D68-42EF-BA50-27AB2B167265@jrtc27.com \
    --to=jrtc27@jrtc27.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=me@ttaylorr.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).