From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS53758 23.128.96.0/24 X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id 185941F953 for ; Fri, 7 Jan 2022 16:21:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348210AbiAGQVu (ORCPT ); Fri, 7 Jan 2022 11:21:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239927AbiAGQVu (ORCPT ); Fri, 7 Jan 2022 11:21:50 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D863C061574 for ; Fri, 7 Jan 2022 08:21:49 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id w20so11999680wra.9 for ; Fri, 07 Jan 2022 08:21:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jrtc27.com; s=gmail.jrtc27.user; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=BmO8I+8v+eHjbiY26/UpqV1JKCk7WHiKE73ulfyWP+0=; b=FIhEUVudyta+WEGkU0xYaJhBOdteGdobcCAUKi2Rr/ocAHYbvU87wjQn49xdq2GkM/ 1kmwTWx2ZpPRfnOdrg7Up9X/i3q17VtNIr+5/Jl/3G0NbqFY4MNMSmCZ5TiB+2JSB41g NDx7qOSLP1tO0nfd3KK3AkjDDw9Qn+Klrmb9/vKI2VpQL9/HiNzb+LwczF/HrMcBMZmv GlqYyr3vijd/6qf/h++FIhODj5HvEZ2PYD8AiNt9cgY2qKpfL64QKZxVs50MmLERnHLu 0SKf4LuTI2im7aY/osPRIkKuQAhlXrAA+QLHOFpg6OtaK87iG8jAFWTmUO/nhypL/5FJ Ve5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=BmO8I+8v+eHjbiY26/UpqV1JKCk7WHiKE73ulfyWP+0=; b=JyFQ8wSdgRD8ofXOBINagL3yPvGhUgkc1axLtBks1swtXEYZNEDvYyNhmLj4yK0dxc WHE7uLOTHM2/RAiR/JNAllSzuiKM4pe8oR7vme0jjAnjJeaDnwhewKof6Xq8u8zLLBTJ JeYynH5h4JogVuTbYKf+aLP8dJBaRvzq8AfwfBETud5C7p4QL94bEAGxDOJ1nruHmgx3 gvkxKhbfjlcl+Rqr/GzGvqP675WTIrdmBmyd/N+joyebpU0eKm6/6YjKrP8PQJVgI+bt VOJNtuJs0RKhIZuMDy4GR1TRZYrNB1MJAz6/YSMFnvyPYmUZ8nTc5u8UpWN6EYq1sQXo twfQ== X-Gm-Message-State: AOAM533736WLRE3KiMhfbzvnpJQEzhXt2NH4DvgUCAO+y54fhmudySYe jRl+tiQH1amiSMEj+lW1kjzGGm03IV5Kss3a X-Google-Smtp-Source: ABdhPJzxoscuXyLKgxwXPyt7nGWL7+JNJHk01Ik4FMDmbRUYvqzB4ZaIBGFVoaaXX8jjbq56zMBi/Q== X-Received: by 2002:a05:6000:50b:: with SMTP id a11mr23517602wrf.387.1641572507811; Fri, 07 Jan 2022 08:21:47 -0800 (PST) Received: from smtpclient.apple (global-5-142.nat-2.net.cam.ac.uk. [131.111.5.142]) by smtp.gmail.com with ESMTPSA id i15sm5415178wrf.6.2022.01.07.08.21.47 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Jan 2022 08:21:47 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.40.0.1.81\)) Subject: Re: [PATCH] Properly align memory allocations and temporary buffers From: Jessica Clarke In-Reply-To: <3b469866-3718-9f0e-419c-bdc793de18e9@web.de> Date: Fri, 7 Jan 2022 16:21:47 +0000 Cc: git@vger.kernel.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <20220105132324.6651-1-jrtc27@jrtc27.com> <3b469866-3718-9f0e-419c-bdc793de18e9@web.de> To: =?utf-8?Q?Ren=C3=A9_Scharfe?= X-Mailer: Apple Mail (2.3693.40.0.1.81) Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 7 Jan 2022, at 16:08, Ren=C3=A9 Scharfe wrote: >=20 > Am 05.01.22 um 14:23 schrieb Jessica Clarke: >> Currently git_qsort_s allocates a buffer on the stack that has no >> alignment, and mem_pool_alloc assumes uintmax_t's size is adequate >> alignment for any type. >>=20 >> On CHERI, and thus Arm's Morello prototype, pointers are implemented = as >> hardware capabilities which, as well as having a normal integer = address, >> have additional bounds, permissions and other metadata in a second = word, >> so on a 64-bit architecture they are 128-bit quantities, including = their >> alignment requirements. Despite being 128-bit, their integer = component >> is still only a 64-bit field, so uintmax_t remains 64-bit, and = therefore >> uintmax_t does not sufficiently align an allocation. >>=20 >> Moreover, these capabilities have an additional "129th" tag bit, = which >> tracks the validity of the capability and is cleared on any invalid >> operation that doesn't trap (e.g. partially overwriting a capability >> will invalidate it) which, combined with the architecture's strict >> checks on capability manipulation instructions, ensures it is >> architecturally impossible to construct a capability that gives more >> rights than those you were given in the first place. To store these = tag >> bits, each capability sized and aligned word in memory gains a single >> tag bit that is stored in unaddressable (to the processor) memory. = This >> means that it is impossible to store a capability at an unaligned >> address: a normal load or store of a capability will always take an >> alignment fault even if the (micro)architecture supports unaligned >> loads/stores for other data types, and a memcpy will, if the = destination >> is not appropriately aligned, copy the byte representation but lose = the >> tag, meaning that if it is eventually copied back and loaded from an >> aligned location any attempt to dereference it will trap with a tag >> fault. Thus, even char buffers that are memcpy'ed to or from must be >> properly aligned on CHERI architectures if they are to hold pointers. >>=20 >> Address both of these by introducing a new git_max_align type put in = a >> union with the on-stack buffer to force its alignment, as well as a = new >> GIT_MAX_ALIGNMENT macro whose value is the alignment of git_max_align >> that gets used for mem_pool_alloc. As well as making the code work on >> CHERI, the former change likely also improves performance on some >> architectures by making memcpy faster (either because it can use = larger >> block sizes or because the microarchitecture has inefficient = unaligned >> accesses). >>=20 >> Signed-off-by: Jessica Clarke >> --- >> compat/qsort_s.c | 11 +++++++---- >> git-compat-util.h | 11 +++++++++++ >> mem-pool.c | 6 +++--- >> 3 files changed, 21 insertions(+), 7 deletions(-) >>=20 >> diff --git a/compat/qsort_s.c b/compat/qsort_s.c >> index 52d1f0a73d..1ccdb87451 100644 >> --- a/compat/qsort_s.c >> +++ b/compat/qsort_s.c >> @@ -49,16 +49,19 @@ int git_qsort_s(void *b, size_t n, size_t s, >> int (*cmp)(const void *, const void *, void *), void = *ctx) >> { >> const size_t size =3D st_mult(n, s); >> - char buf[1024]; >> + union { >> + char buf[1024]; >> + git_max_align align; >> + } u; >>=20 >> if (!n) >> return 0; >> if (!b || !cmp) >> return -1; >>=20 >> - if (size < sizeof(buf)) { >> - /* The temporary array fits on the small on-stack = buffer. */ >> - msort_with_tmp(b, n, s, cmp, buf, ctx); >> + if (size < sizeof(u.buf)) { >> + /* The temporary array fits in the small on-stack = buffer. */ >> + msort_with_tmp(b, n, s, cmp, u.buf, ctx); >=20 > So buf gets maximum alignment instead of char alignment (i.e. none) > because some callers use it to sort pointers, which need that on your > platform. Makes sense. Yes, exactly. >> } else { >> /* It's somewhat large, so malloc it. */ >> char *tmp =3D xmalloc(size); >=20 > tmp is used instead of buf if the latter is not big enough, so it can > also contain pointers. No problem, because malloc(3) returns memory > that is properly aligned for anything already. Yes. > stable-qsort.c uses the same algorithm as compat/qsort_s.c, it just > lacks the context pointer. Shouldn't it get the same treatment? It > is used e.g. (via the macro STABLE_QSORT) in merge-ort.c to sort > pointers.. Indeed it should; I guess git_stable_qsort just isn=E2=80=99t used for = most common commands, or at least not with structures containing pointers, as I haven=E2=80=99t seen it break yet. >> diff --git a/git-compat-util.h b/git-compat-util.h >> index 5fa54a7afe..28581a45c5 100644 >> --- a/git-compat-util.h >> +++ b/git-compat-util.h >> @@ -274,6 +274,17 @@ typedef unsigned long uintptr_t; >> #define _ALL_SOURCE 1 >> #endif >>=20 >> +typedef union { >> + uintmax_t max_align_uintmax; >> + void *max_align_pointer; >> +} git_max_align; >=20 > For your purposes just the void * member would suffice here, right? = And > with the added uintmax_t this currently gets maximum alignment, = suitable > for any of our objects. If we were to start using __int128 etc. then > we'd have to add that to this union as well to really get the maximum > possible alignment, though. For us, yes, uintmax_t=E2=80=99s alignment is never greater than void = *=E2=80=99s. On most 32-bit non-CHERI architectures though that isn=E2=80=99t the case, = so whilst omitting uintmax_t would be fine for qsort, it would break mem_pool_alloc=E2=80=99s alignment, as this gets used for both. Plus = aligning to uintmax_t may still help performance for memcpy there (e.g. maybe it guarantees the memcpy implementation can use a load/store pair instruction that it wouldn=E2=80=99t otherwise necessarily be able to). Regarding __int128, yes, you would, though both GCC=E2=80=99s and = FreeBSD=E2=80=99s stddef.h don=E2=80=99t admit __int128 exists. They do include long = double in their max_align_t, though, which generally has the same alignment requirements as __int128 on 64-bit architectures, though I have a funny feeling it might not on some. >> + >> +typedef struct { >> + char unalign; >> + git_max_align aligned; >> +} git_max_alignment; >> +#define GIT_MAX_ALIGNMENT offsetof(git_max_alignment, aligned) >=20 > C11 has alignas, alignof and max_align_t. We only recently started to > depend on some C99 features, so perhaps it's a bit early to use > stdalign.h in Git's code base. That's a pity, though. The > GIT_MAX_ALIGNMENT macro is sightly enough, but using a union to get > pointer alignment is a bit more cumbersome than something like >=20 > alignas(alignof(max_align_t)) char buf[1024]; It is; max_align_t is our preferred solution and what our programming guide recommends, but not everyone is ready to jump to C11, so until then various bodges like this are needed. Jess >> + >> /* used on Mac OS X */ >> #ifdef PRECOMPOSE_UNICODE >> #include "compat/precompose_utf8.h" >> diff --git a/mem-pool.c b/mem-pool.c >> index ccdcad2e3d..748eff925a 100644 >> --- a/mem-pool.c >> +++ b/mem-pool.c >> @@ -69,9 +69,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t = len) >> struct mp_block *p =3D NULL; >> void *r; >>=20 >> - /* round up to a 'uintmax_t' alignment */ >> - if (len & (sizeof(uintmax_t) - 1)) >> - len +=3D sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - = 1)); >> + /* round up to a 'GIT_MAX_ALIGNMENT' alignment */ >> + if (len & (GIT_MAX_ALIGNMENT - 1)) >> + len +=3D GIT_MAX_ALIGNMENT - (len & (GIT_MAX_ALIGNMENT - = 1)); >=20 > OK. >=20 >>=20 >> if (pool->mp_block && >> pool->mp_block->end - pool->mp_block->next_free >=3D len)