From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=-4.2 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id B679F1F8C6 for ; Fri, 16 Jul 2021 02:50:28 +0000 (UTC) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D8364386EC3A for ; Fri, 16 Jul 2021 02:50:27 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D8364386EC3A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1626403827; bh=VRcBElwupsLdhrPxZrke8duLA7MYLBN6/rjlKT2IEdk=; h=References:In-Reply-To:Date:Subject:To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=Zq51sR3dKwKvLG87N2JjbzW1ith7v1n5pyTuKUUDnK9SwImZ5NRfn6eQrFFAVHUyP mjzWkyQnUOb/QtKuyh8n3RT60ZhAREYiXrcc4y1a0bHI4G2lIiIxl+ZbZiVeaxKX1y rnSq70SOEdaIdWl3/BuM7EHpuiHTOWhPj6nj3Luc= Received: from mail-pf1-x42c.google.com (mail-pf1-x42c.google.com [IPv6:2607:f8b0:4864:20::42c]) by sourceware.org (Postfix) with ESMTPS id 5D243385C409 for ; Fri, 16 Jul 2021 02:49:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5D243385C409 Received: by mail-pf1-x42c.google.com with SMTP id p22so7614028pfh.8 for ; Thu, 15 Jul 2021 19:49:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=0f7QCUypS1Scbn3u7+lNvurJ54mVx8OaM0yqmta+zG0=; b=d5/0GAw9HZvUDumB40LOiTUz7vuWYZWtiudUWj6+JKkzSZ/9pnPcfEt6XQGi+Qv2BN IufG0ss/O9s3+EEQUtnibeYLYowy6OGlgiRtptpIGa5+3Zrur89rufO9A6FQcpBzTFTa TP31oOR7qNJP+SivF9bxKiBMO5OwpeSpaEK2hMIQWGtaiI5+8e2RpZajPeexMCFKufPZ yjVEvFJcE3A8B6mrwYcqQasX6bQyJFOhy3N5KPH8hl1H/3xl4RuzaacNzP8ARjQWqhVT ZremsLf1SR2YXQTh5KhduX622RRxrIiNOySMHxJXkJJUttF7pLavbk0fPCR/T8SBg3gD PC4w== X-Gm-Message-State: AOAM532GBDCrrNxwuF+h6bBv2A0fu9GjVWBHgyJZ18j26YwFNPf9mx1C ukX8d7AzEm3utxB8FaUwHv9biXRONmfrOjvjV0U= X-Google-Smtp-Source: ABdhPJxYtjUzgh7sIgr6Piivz0Uudnf26XLUFNbHp768fp3BN4Pizv7WOsuqooQdS44svZ0he/kfS7STQ3j1RiM9wOc= X-Received: by 2002:a62:1d86:0:b029:32a:311a:9595 with SMTP id d128-20020a621d860000b029032a311a9595mr7955989pfd.74.1626403743534; Thu, 15 Jul 2021 19:49:03 -0700 (PDT) MIME-Version: 1.0 References: <20210716023656.670004-1-jason@redhat.com> In-Reply-To: Date: Thu, 15 Jul 2021 22:48:52 -0400 Message-ID: Subject: Re: [PATCH] c++: implement C++17 hardware interference size To: Jason Merrill Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Noah Goldstein via Libc-alpha Reply-To: Noah Goldstein Cc: "Richard Earnshaw \(lists\)" , libstdc++ , gcc-patches List , GNU C Library Errors-To: libc-alpha-bounces+e=80x24.org@sourceware.org Sender: "Libc-alpha" On Thu, Jul 15, 2021 at 10:41 PM Jason Merrill via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > Adding CCs that got lost in the initial mail. > > On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill wrote: > > > The last missing piece of the C++17 standard library is the hardware > > intereference size constants. Much of the delay in implementing these > has > > been due to uncertainty about what the right values are, and even whether > > there is a single constant value that is suitable; the destructive > > interference size is intended to be used in structure layout, so program > > ABIs will depend on it. > > > > In principle, both of these values should be the same as the target's L1 > > cache line size. When compiling for a generic target that is intended to > > support a range of target CPUs with different cache line sizes, the > > constructive size should probably be the minimum size, and the > destructive > > size the maximum, unless you are constrained by ABI compatibility with > > previous code. > > > > JF Bastien's implementation proposal is summarized at > > https://github.com/itanium-cxx-abi/cxx-abi/issues/74 > > > > I implement this by adding new --params for the two sizes. Targets need > to > > override these values in targetm.target_option.override() to support the > > feature. > > > > 64 bytes still seems correct for the x86 family. > > > > I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a > > 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL > > target, so I'd think 32/64 would make more sense. > > > > He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache > > line, I've changed that to 64/256. Does that seem right? > > > > Currently the patch does not adjust the values based on -march, as in > JF's > > proposal. I'll need more guidance from the ARM/AArch64 maintainers about > > how to go about that. --param l1-cache-line-size is set based on -mtune, > > but I don't think we want -mtune to change these ABI-affecting values. > Are > > there -march values for which a smaller range than 64-256 makes sense? > > > > gcc/ChangeLog: > > > > * params.opt: Add destructive-interference-size and > > constructive-interference-size. > > * doc/invoke.texi: Document them. > > * config/aarch64/aarch64.c (aarch64_override_options_internal): > > Set them. > > * config/arm/arm.c (arm_option_override): Set them. > > * config/i386/i386-options.c (ix86_option_override_internal): > > Set them. > > > > gcc/c-family/ChangeLog: > > > > * c.opt: Add -Winterference-size. > > * c-cppbuiltin.c (cpp_atomic_builtins): Add > __GCC_DESTRUCTIVE_SIZE > > and __GCC_CONSTRUCTIVE_SIZE. > > > > gcc/cp/ChangeLog: > > > > * decl.c (cxx_init_decl_processing): Check > > --param *-interference-size values. > > > > libstdc++-v3/ChangeLog: > > > > * include/std/version: Define > __cpp_lib_hardware_interference_size. > > * libsupc++/new: Define hardware interference size variables. > > > > gcc/testsuite/ChangeLog: > > > > * g++.target/aarch64/interference.C: New test. > > * g++.target/arm/interference.C: New test. > > * g++.target/i386/interference.C: New test. > > --- > > gcc/doc/invoke.texi | 22 ++++++++++++++++++ > > gcc/c-family/c.opt | 5 ++++ > > gcc/params.opt | 15 ++++++++++++ > > gcc/c-family/c-cppbuiltin.c | 12 ++++++++++ > > gcc/config/aarch64/aarch64.c | 9 ++++++++ > > gcc/config/arm/arm.c | 6 +++++ > > gcc/config/i386/i386-options.c | 6 +++++ > > gcc/cp/decl.c | 23 +++++++++++++++++++ > > .../g++.target/aarch64/interference.C | 9 ++++++++ > > gcc/testsuite/g++.target/arm/interference.C | 9 ++++++++ > > gcc/testsuite/g++.target/i386/interference.C | 8 +++++++ > > libstdc++-v3/include/std/version | 3 +++ > > libstdc++-v3/libsupc++/new | 10 ++++++-- > > 13 files changed, 135 insertions(+), 2 deletions(-) > > create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C > > create mode 100644 gcc/testsuite/g++.target/arm/interference.C > > create mode 100644 gcc/testsuite/g++.target/i386/interference.C > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index ea8812425e9..f93cb7a20f7 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant > > stride. > > > > This setting is only useful for strides that are known and constant. > > > > +@item destructive_interference_size > > +@item constructive_interference_size > > +The values for the C++17 variables > > +@code{std::hardware_destructive_interference_size} and > > +@code{std::hardware_constructive_interference_size}. The destructive > > +interference size is the minimum recommended offset between two > > +independent concurrently-accessed objects; the constructive > > +interference size is the maximum recommended size of contiguous memory > > +accessed together. Typically both will be the size of an L1 cache > > +line for the target, in bytes. If the target can have a range of L1 > > +cache line sizes, typically the constructive interference size will be > > +the small end of the range and the destructive size will be the large > > +end. > > + > > +These values, particularly the destructive size, are intended to be > > +used for layout, and thus have ABI impact. The default values can > > +change with @samp{-march}, and users should be aware of this and > > +perhaps specify these values directly if they need to be ABI > > +compatible with code that was compiled with a different @samp{-march}. > > +Changing the default values for a generic target should be done > > +cautiously. > > + > > @item loop-interchange-max-num-stmts > > The maximum number of stmts in a loop to be interchanged. > > > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > > index 91929706aff..0398faf430a 100644 > > --- a/gcc/c-family/c.opt > > +++ b/gcc/c-family/c.opt > > @@ -722,6 +722,11 @@ Winit-list-lifetime > > C++ ObjC++ Var(warn_init_list) Warning Init(1) > > Warn about uses of std::initializer_list that can result in dangling > > pointers. > > > > +Winterference-size > > +C++ ObjC++ Var(warn_interference_size) Warning Init(1) > > +Warn about nonsensical values of --param destructive-interference-size > or > > +constructive-interference-size. > > + > > Wimplicit > > C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) > > Warn about implicit declarations. > > diff --git a/gcc/params.opt b/gcc/params.opt > > index 92b003e38cb..a81a3ec82f1 100644 > > --- a/gcc/params.opt > > +++ b/gcc/params.opt > > @@ -358,6 +358,21 @@ The maximum code size growth ratio when expanding > > into a jump table (in percent) > > Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param > > Optimization > > The size of L1 cache line. > > > > +-param=destructive-interference-size= > > +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param > > Optimization > > +The minimum recommended offset between two concurrently-accessed objects > > to > > +avoid additional performance degradation due to contention introduced by > > the > > +implementation. Typically the L1 cache line size, but can be larger to > > +accommodate a variety of target processors with different cache line > > sizes. > > +C++17 code might use this value in structure layout. > > + > > +-param=constructive-interference-size= > > +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param > > Optimization > > +The maximum recommended size of contiguous memory occupied by two > objects > > +accessed with temporal locality by concurrent threads. Typically the L1 > > cache > > +line size, but can be smaller to accommodate a variety of target > > processors with > > +different cache line sizes. > > + > > -param=l1-cache-size= > > Common Joined UInteger Var(param_l1_cache_size) Init(64) Param > > Optimization > > The size of L1 cache. > > diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c > > index f79f939bd10..a7bf2544533 100644 > > --- a/gcc/c-family/c-cppbuiltin.c > > +++ b/gcc/c-family/c-cppbuiltin.c > > @@ -741,6 +741,18 @@ cpp_atomic_builtins (cpp_reader *pfile) > > builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", > > targetm.atomic_test_and_set_trueval); > > > > + /* Macros for C++17 hardware interference size constants. Either both > > or > > + neither should be set. */ > > + gcc_assert (!param_destruct_interfere_size > > + == !param_construct_interfere_size); > > + if (param_destruct_interfere_size) > > + { > > + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", > > + param_destruct_interfere_size); > > + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", > > + param_construct_interfere_size); > > + } > > + > > /* ptr_type_node can't be used here since ptr_mode is only set when > > toplev calls backend_init which is not done with -E or pch. */ > > psize = POINTER_SIZE_UNITS; > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > index f5b25a7f704..b172fcdc93e 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -16297,6 +16297,15 @@ aarch64_override_options_internal (struct > > gcc_options *opts) > > SET_OPTION_IF_UNSET (opts, &global_options_set, > > param_l1_cache_line_size, > > > aarch64_tune_params.prefetch->l1_cache_line_size); > > + /* For a generic AArch64 target, cover the current range of cache line > > + sizes. */ > > + SET_OPTION_IF_UNSET (opts, &global_options_set, > > + param_destruct_interfere_size, > > + 256); > > + SET_OPTION_IF_UNSET (opts, &global_options_set, > > + param_construct_interfere_size, > > + 64); > > + > > if (aarch64_tune_params.prefetch->l2_cache_size >= 0) > > SET_OPTION_IF_UNSET (opts, &global_options_set, > > param_l2_cache_size, > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > index 6d781e23ee9..edfa2ad3426 100644 > > --- a/gcc/config/arm/arm.c > > +++ b/gcc/config/arm/arm.c > > @@ -3656,6 +3656,12 @@ arm_option_override (void) > > SET_OPTION_IF_UNSET (&global_options, &global_options_set, > > param_l1_cache_line_size, > > current_tune->prefetch.l1_cache_line_size); > > + /* For a generic ARM target, JF Bastien proposed using 64 for both. > > + ??? Why not 32 for constructive? */ > > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > > + param_destruct_interfere_size, 64); > > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > > + param_construct_interfere_size, 64); > > if (current_tune->prefetch.l1_cache_size >= 0) > > SET_OPTION_IF_UNSET (&global_options, &global_options_set, > > param_l1_cache_size, > > diff --git a/gcc/config/i386/i386-options.c > > b/gcc/config/i386/i386-options.c > > index 7cba655595e..3b1b3c838f8 100644 > > --- a/gcc/config/i386/i386-options.c > > +++ b/gcc/config/i386/i386-options.c > > @@ -2571,6 +2571,12 @@ ix86_option_override_internal (bool main_args_p, > > SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, > > ix86_tune_cost->l2_cache_size); > > > > + /* 64B is the accepted value for these for all x86. */ > > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > > + param_destruct_interfere_size, 64); > > + SET_OPTION_IF_UNSET (&global_options, &global_options_set, > > + param_construct_interfere_size, 64); > > + > > /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. > > */ > > if (opts->x_flag_prefetch_loop_arrays < 0 > > && HAVE_prefetch > > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c > > index 01d64a16125..880fb8948a9 100644 > > --- a/gcc/cp/decl.c > > +++ b/gcc/cp/decl.c > > @@ -4732,6 +4732,29 @@ cxx_init_decl_processing (void) > > /* Show we use EH for cleanups. */ > > if (flag_exceptions) > > using_eh_for_cleanups (); > > + > > + /* Check that the hardware interference sizes are at least > > + alignof(max_align_t), as required by the standard. */ > > + if (param_destruct_interfere_size) > > + { > > + int max_align = max_align_t_align () / BITS_PER_UNIT; > > + if (param_destruct_interfere_size < max_align) > > + error ("%<--param destructive-interference-size=%d%> is less > than " > > + "%d", param_destruct_interfere_size, max_align); > > + else if (param_destruct_interfere_size < param_l1_cache_line_size) > > + warning (OPT_Winterference_size, > > + "%<--param destructive-interference-size=%d%> " > > + "is less than %<--param l1-cache-line-size=%d%>", > > + param_destruct_interfere_size, > param_l1_cache_line_size); > > + if (param_construct_interfere_size < max_align) > > + error ("%<--param constructive-interference-size=%d%> is less > than > > " > > + "%d", param_construct_interfere_size, max_align); > > + else if (param_construct_interfere_size > > param_l1_cache_line_size) > > + warning (OPT_Winterference_size, > > + "%<--param constructive-interference-size=%d%> " > > + "is greater than %<--param l1-cache-line-size=%d%>", > > + param_construct_interfere_size, > param_l1_cache_line_size); > > + } > > } > > > > /* Enter an abi node in global-module context. returns a cookie to > > diff --git a/gcc/testsuite/g++.target/aarch64/interference.C > > b/gcc/testsuite/g++.target/aarch64/interference.C > > new file mode 100644 > > index 00000000000..0fc01655223 > > --- /dev/null > > +++ b/gcc/testsuite/g++.target/aarch64/interference.C > > @@ -0,0 +1,9 @@ > > +// Test C++17 hardware interference size constants > > +// { dg-do compile { target c++17 } } > > + > > +#include > > + > > +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent > > ones use > > +// 128 or even 256. > > +static_assert(std::hardware_destructive_interference_size == 256); > > +static_assert(std::hardware_constructive_interference_size == 64); > > diff --git a/gcc/testsuite/g++.target/arm/interference.C > > b/gcc/testsuite/g++.target/arm/interference.C > > new file mode 100644 > > index 00000000000..34fe8a52bff > > --- /dev/null > > +++ b/gcc/testsuite/g++.target/arm/interference.C > > @@ -0,0 +1,9 @@ > > +// Test C++17 hardware interference size constants > > +// { dg-do compile { target c++17 } } > > + > > +#include > > + > > +// Recent ARM CPUs have a cache line size of 64. Older ones have > > +// a size of 32, but I guess they're old enough that we don't care? > > +static_assert(std::hardware_destructive_interference_size == 64); > > +static_assert(std::hardware_constructive_interference_size == 64); > > diff --git a/gcc/testsuite/g++.target/i386/interference.C > > b/gcc/testsuite/g++.target/i386/interference.C > > new file mode 100644 > > index 00000000000..c7b910e3ada > > --- /dev/null > > +++ b/gcc/testsuite/g++.target/i386/interference.C > > @@ -0,0 +1,8 @@ > > +// Test C++17 hardware interference size constants > > +// { dg-do compile { target c++17 } } > > + > > +#include > > + > > +// It is generally agreed that these are the right values for all x86. > > +static_assert(std::hardware_destructive_interference_size == 64); > > +static_assert(std::hardware_constructive_interference_size == 64); > > diff --git a/libstdc++-v3/include/std/version > > b/libstdc++-v3/include/std/version > > index 27bcd32cb60..d5e155db48b 100644 > > --- a/libstdc++-v3/include/std/version > > +++ b/libstdc++-v3/include/std/version > > @@ -140,6 +140,9 @@ > > #define __cpp_lib_filesystem 201703 > > #define __cpp_lib_gcd 201606 > > #define __cpp_lib_gcd_lcm 201606 > > +#ifdef __GCC_DESTRUCTIVE_SIZE > > +# define __cpp_lib_hardware_interference_size 201703L > > +#endif > > #define __cpp_lib_hypot 201603 > > #define __cpp_lib_invoke 201411L > > #define __cpp_lib_lcm 201606 > > diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new > > index 3349b13fd1b..7bc67a6cb02 100644 > > --- a/libstdc++-v3/libsupc++/new > > +++ b/libstdc++-v3/libsupc++/new > > @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) > > _GLIBCXX_USE_NOEXCEPT { } > > } // extern "C++" > > > > #if __cplusplus >= 201703L > > -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER > > namespace std > > { > > +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER > > #define __cpp_lib_launder 201606 > > /// Pointer optimization barrier [ptr.launder] > > template > > @@ -205,8 +205,14 @@ namespace std > > void launder(const void*) = delete; > > void launder(volatile void*) = delete; > > void launder(const volatile void*) = delete; > > -} > > #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER > > + > > +#ifdef __GCC_DESTRUCTIVE_SIZE > > +# define __cpp_lib_hardware_interference_size 201703L > > + inline constexpr size_t hardware_destructive_interference_size = > > __GCC_DESTRUCTIVE_SIZE; > > + inline constexpr size_t hardware_constructive_interference_size = > > __GCC_CONSTRUCTIVE_SIZE; > > +#endif // __GCC_DESTRUCTIVE_SIZE > > +} > > #endif // C++17 > > > > #if __cplusplus > 201703L > > > > base-commit: f6dde32b9d487dd6e343d0a1e1d1f60783f5e735 > > prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092 > > prerequisite-patch-id: 36fa087f6b261576422f8f3b1638f76e5183c95a > > -- > > 2.27.0 > > > > > On intel x86 systems with a private L2 cache the spatial prefetcher can cause destructive interference along 128 byte aligned boundaries. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60