From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS17314 8.43.84.0/22 X-Spam-Status: No, score=-3.7 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, PDS_RDNS_DYNAMIC_FP,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RDNS_DYNAMIC, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 1B0EF1F8C6 for ; Thu, 5 Aug 2021 19:36:54 +0000 (UTC) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 146D63858402 for ; Thu, 5 Aug 2021 19:36:53 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 146D63858402 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1628192213; bh=UwWTdXbl0WpVBw9f3iPQ7K0MUQVe6iCegRwsM8xi/4U=; h=Subject:Date:In-Reply-To:To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=GPVo0fbHtDSJbTcCsfSL71GFH8kVkadOzPWzqmWMfEYMUtzeA+i/3pP55pJZY8VBo L+0bldiiJ+kEHx8XsWJR0BIUMjAJ0JUYuJ39I8XwicbV87/b4eKGkI8VhuhukNRhC4 omBbt0bODUZMvqxjGI4ISJa2yCQADV2y2nx0vFAg= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id 9BFCC3858402 for ; Thu, 5 Aug 2021 19:36:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9BFCC3858402 Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-435-X2QhwviwOnqhhWc1cP9XWA-1; Thu, 05 Aug 2021 15:36:26 -0400 X-MC-Unique: X2QhwviwOnqhhWc1cP9XWA-1 Received: by mail-qt1-f199.google.com with SMTP id u16-20020a05622a14d0b029028ca201eab9so2644953qtx.21 for ; Thu, 05 Aug 2021 12:36:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=sQhtyXEuEOjYBJ99EJDpHgXsuY24M0CwKExcLIUGgQg=; b=hMEmLaHaR1MbGcyA6hIJCmzvd+qMqeBC1Oa913oH60hMvJu7NkPtkCSKHN/QrdEy2S 9KCFUOgzQFSe92lVgyzmUxSYg/kdt7Bci0PDCQhEiTHaNFTSYhemwik4lB74Hl3Zokcy jhxxU5g4GfgsltSQKbOuBZ95RxtrK2Q9nfiSHWPGpi1gJ8fCoq8Zz2ynkXVJJr4cFgTp Nw86cfiNA8hfe84g7PWPbdqFjB48Y0zU3hktwSKdy3/Yo2bcmAL9EmJtt7XXzoyvTnJi oYQ1yxie/1zfwgxzTc40OQyexxbiVNOc2sLSE3EEpDG5hwh+jAHIw9qYJaDTEUrKrvq+ z4yw== X-Gm-Message-State: AOAM531WG9PKftMMIoBgt36PBkaU7ZdAJGHarGT6BDBQYfnUbC9++XpA nSvpCQZJbfhSpSgkeoY7hxinxfcouoalNtuOqqhP1ZBl7cAba4LjKDwYQsr/QZNNNeq5QkOj0WD hFk5RN2vTKxJbublaJheO X-Received: by 2002:a05:622a:d5:: with SMTP id p21mr5806083qtw.75.1628192186302; Thu, 05 Aug 2021 12:36:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzzkKiTUjnG5xJlnrC5YWDVV40UiRG5OpbQ8zSMc6K5dJAfRiP2kpyUF3QXfmZe6dDI2CMlyQ== X-Received: by 2002:a05:622a:d5:: with SMTP id p21mr5806051qtw.75.1628192185839; Thu, 05 Aug 2021 12:36:25 -0700 (PDT) Received: from smtpclient.apple (47-208-193-143.trckcmtc01.res.dyn.suddenlink.net. [47.208.193.143]) by smtp.gmail.com with ESMTPSA id 77sm3387569qkd.32.2021.08.05.12.36.24 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 05 Aug 2021 12:36:25 -0700 (PDT) Message-Id: <7A99614B-28D1-4173-BAE8-8EE561BE13E0@redhat.com> Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\)) Subject: Re: A collection of LD_AUDIT bugs that are important for tools (with better formatting for this list) Date: Thu, 5 Aug 2021 12:36:21 -0700 In-Reply-To: <20210805103243.GJ14854@arm.com> To: Szabolcs Nagy References: <8A8FF420-8316-4A22-AC4D-DA1F2D5625A5@rice.edu> <2fc830b9-35da-9b94-369f-4df683078a5c@linaro.org> <8735tguubc.fsf@oldenburg.str.redhat.com> <75b2e838-a32e-6a2a-27b2-75bc06c01118@linaro.org> <3F6132F0-1042-4285-A309-0365D014422A@redhat.com> <81b1ad4e-cda4-4ed9-61e6-6dc8884800d5@linaro.org> <20210805103243.GJ14854@arm.com> X-Mailer: Apple Mail (2.3654.120.0.1.13) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Ben Woodard via Libc-alpha Reply-To: Ben Woodard Cc: Florian Weimer , John Mellor-Crummey , libc-alpha@sourceware.org Errors-To: libc-alpha-bounces+e=80x24.org@sourceware.org Sender: "Libc-alpha" > On Aug 5, 2021, at 3:32 AM, Szabolcs Nagy wrote: >=20 > The 08/04/2021 15:11, Adhemerval Zanella wrote: >> I updated my branch with a POC patch to support SVE for rtld-audit [1]. >> In the end the layout I ended up using is: >>=20 >> typedef union >> { >> float s; >> double d; >> long double q; >> long double *z; >> } La_aarch64_vector; >>=20 >> /* Registers for entry into PLT on AArch64. */ >> typedef struct La_aarch64_regs >> { >> uint64_t lr_xreg[9]; >> La_aarch64_vector lr_vreg[8]; >> uint64_t lr_sp; >> uint64_t lr_lr; >> uint8_t lr_sve; >> uint16_t *lr_sve_pregs[4]; >> } La_aarch64_regs; >>=20 >> /* Return values for calls from PLT on AArch64. */ >> typedef struct La_aarch64_retval >> { >> /* Up to eight integer registers can be used for a return value. */ >> uint64_t lrv_xreg[8]; >> /* Up to eight V registers can be used for a return value. */ >> La_aarch64_vector lrv_vreg[8]; >> uint8_t lrv_sve; >> } La_aarch64_retval; >>=20 >> It means the if 'lr_sve' is 0 in either La_aarch64_regs or La_aarch64_re= tval >> the La_aarch64_vector contains the floating-pointer registers that can b= e >> accessed directly. Otherwise, 'La_aarch64_vector.z' points to a memory = area >> that holds up to 'lr_sve' bytes from the Z registers, which can be loade= d >> with svld1 intrinsic for instance (as tst-audit28.c does). The P registe= r >> follows the same logic, with each La_aarch64_regs.lr_sve_pregs pointing >> to an area of memory 'lr_sve/8' in size. >=20 > i'd try a more generic extension mechanism like in > the linux sigcontext struct. then it's less likely > that existing plt hook code needs to change when > new register state is present and used. >=20 > and i think we need to handle variant pcs in a > generic way: we don't know if that's sve pcs or not. >=20 > for example >=20 > struct { > uint64_t lr_x[9]; > uint64_t lr_lr; > uint64_t lr_sp; > uint64_t lr_flags; // e.g. is this a variant PCS call? > vreg_t lr_v[8]; > struct extension *lr_ext; > }; What do you think about union=E2=80=99ing the lr_v and the lr_ext? I really= did like that about Adhemeral=E2=80=99s approach. >=20 > struct extension { > struct extension *next; > uint32_t type; // e.g. sve extension > uint32_t len; // can copy the contents even for unknown type > }; >=20 > struct xreg_extension { > struct extension header; > uint64_t x[30]; > }; >=20 > struct vreg_extension { > struct extension header; > vreg_t v[24]; > }; >=20 > struct sve_extension { > struct extension header; > uint16_t vl; > zreg_t *z[32]; > preg_t *p[16]; > char data[]; > }; I like your idea of defining an extensible structure. Casting pointers is g= oing to make some compilers complain but when you are doing system programm= ing doing type polymorphism this way is kind of normal I guess. >=20 >>=20 >> So, to access the FP register as float you can use: >>=20 >> static inline float regs_vec_to_float (const La_aarch64_regs *regs, int= i) >> { >> float r; >> if (regs->lr_sve =3D=3D 0) >> r =3D regs->lr_vreg[i].s; >> else >> memcpy (&r, ®s->lr_vreg[i].z[0], sizeof (r)); >> return r; >> } >>=20 >> To implement it I had to setup lazy binding when profiling or auditing i= s >> used, even when STO_AARCH64_VARIANT_PCS is being set. Also, to not incu= r >> in performance penalties on default non-SVE configuration, the patch use= s >> a new PTL entrypoint, _dl_runtime_profile_sve, which is used iff 'hwcap' >> has HWCAP_SVE bit set. >=20 > variant pcs does not mean 'sve call' it means 'arbitrary pcs'. > so we have to save all registers. >=20 > and a base pcs call does not have to preserve sve state so > we don't need to save the z regs even if sve is present. >=20 Yeah honestly this is why I really believe that we need to define STO_AARCH= 64_VARIANT_SVE (or something like that) and go through the pain of changing= the compilers and binutils. I do not believe that my users really will car= e about the full breadth of STO_AARCH64_VARIANT_PCS and all the potential s= trange ABI variants where all the registers must be saved. I also know that= they will be willing to recompile any code which uses inter-object calls t= hat pass parameters using SVE registers in the case where they want or need= to use performance tools which will go through the audit interface. SVE parameter passing is defined in the AAPCS https://github.com/ARM-softwa= re/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing-rules while all the other PCS variants don=E2=80=99t obey the AAPCS. I w= ould argue that using STO_AARCH64_VARIANT_PCS for inter-object SVE calls wa= s a expedient kludge that needs to be backed out. Having a design that can ultimately accommodate all the variants including = ones that do not obey the AAPCS is great. However, the only problem we need= to solve in the reasonably near future is being able to audit SVE calls. I= f and when there are changes to the AAPCS, having an extensible design such= as yours, we can handle those too. (For example: I cannot tell yet if SME = https://community.arm.com/developer/ip-products/processors/b/processors-ip-= blog/posts/scalable-matrix-extension-armv9-a-architecture will ultimately l= ead to changes to the AAPCS but if so, the changes do not appear to have la= nded yet.) > the main difficulty i see is that we cannot easily tell in > a plt entry if it is for a variant pcs symbol: you have to > look at the symbol table entry using the symbol index from > the relocation. usually such code is in c, but c code does > not preserve all registers, so here it has to be in asm. > the clean way would be a different entrypoint for variant > pcs calls, but that requires linker changes (another PLT0 > like construct where variant pcs PLT can go). >=20 > alternatively use _dl_runtime_profile_vpcs entrypoint for > an elf module that has DT_AARCH64_VARIANT_PCS set and then > always save all registers present. then the default entry > point does not need to deal with extensions. this may be > slow for some hpc usecases. uuugh =E2=80=94 my users are all HPC users. Really, it comes down to I=E2=80=99m not a fan of DT_AARCH64_VARIANT_PCS fo= r SVE inter-object procedure calls. I would sort of like to leave it unaudi= table and just move SVE into its own variant. However, other than SVE I do = not know of any other uses of DT_AARCH64_VARIANT_PCS. Writing assembly wher= e all the registers must be preserved is such a pain. >=20 >>=20 >> I think this is a fair assumption since SVE has a defined set of registe= rs >> for argument passing and return values. A new ABI with either different >> argument passing or different registers would require a different PLT >> entry, but I assume this would require another symbol flag anyway (or >> at least a different ELF mark to indicate so). >>=20 >> For this POC, the profile '_dl_runtime_profile_sve' entrypoint assumes >> the largest SVE register size possible (2048 bits) and thus it requires >> a quite large stack (8976 bytes). I think it would be possible make the >> stack requirement dynamic depending of the vector length, but it would >> make the PLT audit function way more complex. >=20 > yeah, i think we need to understand how the plt hooks are > used: do they actually look at these registers? or they only > need the registers to be preserved? we may not need easy > access to the z reg contents. >=20 Tools that actually use the PLT hooks are very rare =E2=80=94 even more rar= e than tools that use LD_AUDIT. The ones that I have seen use that interfac= e to inspect and modify parameters and return values. So I would argue that= providing RW access to the Z registers is required.=20 The two broad categories that I have seen were either trivial and used for = basic curiosity and debugging (kind of like a targeted ltrace) and ones tha= t worked around bugs in the libraries or did additional security checks not= implemented in the library. The latter seems to be what the PLT portion of= the LD_AUDIT interface was designed to do. Quickly coding up security band= aids to detect and prevent exploitation until the library could be properly= fixed, has been really handy a couple of times. For example, I wrote one a= few years ago that detected prevented a buffer overflow in library where f= ixing the underlying problem in the library required an API change that wou= ld have required broader refactoring of the application. (IIRC that was lib= png or some graphics format handling library - the effectiveness of this so= lution was limited by the fact that glibc and binutils didn=E2=80=99t yet i= mplement DEPAUDIT.) >>=20 >> This patch is not complete yet: the tst-audit28 does not check if compil= er >> supports SVE (we would need a configure check to disable for such case), >> I need to add a proper comment for the _dl_runtime_profile_sve >> stack layout, the test need to check for the P register state clobbering= . >>=20 >> I also haven't check the performance penalties with this approach, and >> maybe the way I am saving/restoring the SVE register might be optimized. >>=20 >> In any case, I checked on a SVE machine and at least the testcase work >> as expected without any regressions. I also did a sniff test on a non S= VE >> machine. >>=20 >> [1] https://sourceware.org/git/?p=3Dglibc.git;a=3Dshortlog;h=3Drefs/head= s/azanella/ld-audit-fixes