From: Florian Weimer
To: Prem Mallappa via Libc-alpha
Cc: codonell@redhat.com, Michael Matz, Prem Mallappa, schwab@suse.com
Subject: Re: [PATCH 0/3] RFC: Platform Support for AMD Zen and AVX2/AVX
Date: Tue, 17 Mar 2020 10:02:17 +0100
Message-ID: <87wo7je4me.fsf@mid.deneb.enyo.de>
In-Reply-To: <20200317044646.29707-1-PMallappa@amd.com>
References: <20200317044646.29707-1-PMallappa@amd.com>

* Prem Mallappa via Libc-alpha:

> From: Prem Mallappa
>
> Hello Glibc Community,
>
> == (cross posting to libc-alpha, apologies for the spam) ==
>
> This is in response to
>
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=24979
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=24080
> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=23249
>
> It is clear that there is no panacea here. However,
> here is an attempt to address them in parts.
>
> From [1], enable customers who already have
> "haswell" libs and have seen perf benefits by loading
> them on AMD Zen.
> (Load libraries by placing them in LD_LIBRARY_PATH/zen
> or by a symbolic link zen->haswell)
>
> From [2] and [3]
> And, A futuristic generic-avx2/generic-avx libs,
> enables OS vendors to supply an optimized set.
> And haswell/zen are really a superset, hence
> keeping it made sense.
>
> By this we would like to open it up for discussion
> The haswell/zen can be intel/amd
> (or any other name, and supply ifunc based loading
> internally)

I think we cannot use the platform subdirectory for that because there
is just a single one.  If we want an Intel/AMD split, we need to
enhance the dynamic loader to try the CPU vendor directory first, and
then fall back to a shared subdirectory.  Most distributions do not
want to test and ship binaries specific to Intel or AMD CPUs.

That's a generic loader change which will need some time to implement,
but we can work on something else in the meantime:

We need to check for *all* relevant CPU flags that such code can use,
and only enable a subdirectory if they are present.  This is necessary
because virtualization and microcode updates can disable individual
CPU features.

For the new shared subdirectory, I think we should not restrict
ourselves just to AVX2, but we should also include useful extensions
that are in practice always implemented in silicon along with AVX2,
but can be separately tweaked.

This seems to be a reasonable list of CPU feature flags to start with:

  3DNOW 3DNOWEXT 3DNOWPREFETCH ABM ADX AES AVX AVX2 BMI BMI2 CET
  CLFLUSH CLFLUSHOPT CLWB CLZERO CMPXCHG16B ERMS F16C FMA FMA4
  FSGSBASE FSRM FXSR HLE LAHF LZCNT MOVBE MWAITX PCLMUL PCOMMIT PKU
  POPCNT PREFETCHW RDPID RDRAND RDSEED RDTSCP RTM SHA SSE3 SSE4.1
  SSE4.2 SSE4A SSSE3 TSC XGETBV XSAVE XSAVEC XSAVEOPT XSAVES

You (as in AMD) need to go through this list and come back with the
subset that you think should be enabled for current and future CPUs,
based on your internal roadmap and known errata for existing CPUs.  We
do not need a rationale for how you filter down the list, merely the
outcome.  (I already have the trimmed-down list from Intel.)
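
To make the two ideas above concrete (try the vendor directory first,
then fall back to a shared one, and enable a directory only when every
flag it relies on is usable), here is a minimal C sketch.  It is not
glibc code.  The FEAT_* bits, struct subdir, select_subdir, and the
requirement sets in example_dirs are hypothetical names and values
chosen for illustration; the real flag lists are exactly what still
needs to be decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration only; these are not
   glibc's internal CPU feature definitions.  */
#define FEAT_AVX     (UINT64_C (1) << 0)
#define FEAT_AVX2    (UINT64_C (1) << 1)
#define FEAT_BMI     (UINT64_C (1) << 2)
#define FEAT_BMI2    (UINT64_C (1) << 3)
#define FEAT_FMA     (UINT64_C (1) << 4)
#define FEAT_CLZERO  (UINT64_C (1) << 5)

struct subdir
{
  const char *name;    /* Directory name the loader would search.  */
  uint64_t required;   /* Every bit here must be usable.  */
};

/* A subdirectory is enabled only if none of its required flags is
   missing from the set the running system reports as usable.  */
bool
subdir_enabled (const struct subdir *d, uint64_t usable)
{
  return (usable & d->required) == d->required;
}

/* Return the name of the first enabled entry, or NULL if none
   qualifies.  Callers list the most specific directory first.  */
const char *
select_subdir (const struct subdir *dirs, size_t count, uint64_t usable)
{
  assert (dirs != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
    if (subdir_enabled (&dirs[i], usable))
      return dirs[i].name;
  return NULL;
}

/* Assumed requirement sets, made up for this example: a vendor
   directory first, then a shared directory as the fallback.  */
const struct subdir example_dirs[] =
{
  { "zen", FEAT_AVX | FEAT_AVX2 | FEAT_BMI | FEAT_BMI2 | FEAT_FMA
           | FEAT_CLZERO },
  { "generic-avx2", FEAT_AVX | FEAT_AVX2 | FEAT_BMI | FEAT_BMI2
                    | FEAT_FMA },
};
```

With these assumed sets, a system where a hypervisor masks BMI2 gets
neither directory from select_subdir and would use the default search
path, even though AVX2 itself is still reported.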