unofficial mirror of libc-alpha@sourceware.org
* Re: [PATCH] c++: implement C++17 hardware interference size
       [not found] <20210716023656.670004-1-jason@redhat.com>
@ 2021-07-16  2:41 ` Jason Merrill via Libc-alpha
  2021-07-16  2:48   ` Noah Goldstein via Libc-alpha
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jason Merrill via Libc-alpha @ 2021-07-16  2:41 UTC (permalink / raw)
  To: gcc-patches List; +Cc: Richard Earnshaw (lists), libstdc++, libc-alpha

Adding CCs that got lost in the initial mail.

On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill <jason@redhat.com> wrote:

> The last missing piece of the C++17 standard library is the hardware
> interference size constants.  Much of the delay in implementing these has
> been due to uncertainty about what the right values are, and even whether
> there is a single constant value that is suitable; the destructive
> interference size is intended to be used in structure layout, so program
> ABIs will depend on it.
>
> In principle, both of these values should be the same as the target's L1
> cache line size.  When compiling for a generic target that is intended to
> support a range of target CPUs with different cache line sizes, the
> constructive size should probably be the minimum size, and the destructive
> size the maximum, unless you are constrained by ABI compatibility with
> previous code.
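
A minimal sketch of how user code might consume the two constants, for readers unfamiliar with them.  This is not part of the patch: kDestructive, kConstructive, Counters and HotPair are made-up names, and the fallback value of 64 is only an assumption for libraries that do not provide the constants.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kDestructive = std::hardware_destructive_interference_size;
constexpr std::size_t kConstructive = std::hardware_constructive_interference_size;
#else
// Assumed fallback when the library does not provide the constants.
constexpr std::size_t kDestructive = 64;
constexpr std::size_t kConstructive = 64;
#endif

// Written by different threads: give each counter its own
// destructive-interference-sized slot to avoid false sharing.
struct Counters {
  alignas(kDestructive) std::atomic<long> produced{0};
  alignas(kDestructive) std::atomic<long> consumed{0};
};

// Read together: keep the pair small enough to share one cache line.
struct HotPair {
  int key;
  int value;
};

static_assert(sizeof(Counters) == 2 * kDestructive,
              "each counter occupies its own slot");
static_assert(sizeof(HotPair) <= kConstructive,
              "the pair should fit in one cache line");
```

Because sizeof(Counters) depends on the destructive size, a different value changes the layout, which is the ABI concern described above.
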
>
> JF Bastien's implementation proposal is summarized at
> https://github.com/itanium-cxx-abi/cxx-abi/issues/74
>
> I implement this by adding new --params for the two sizes.  Targets need to
> override these values in targetm.target_option.override() to support the
> feature.
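
As a rough illustration of what this means for a translation unit, the sketch below reads the two predefined macros that the patch adds (__GCC_DESTRUCTIVE_SIZE and __GCC_CONSTRUCTIVE_SIZE).  The kParam* names and the use of 0 to mean "target did not set the params" are my own conventions for the example, not something the patch defines.

```cpp
#include <cstddef>

// The macros are only predefined when the target (or the user, via
// --param destructive-interference-size=N and
// --param constructive-interference-size=N) has set the params.
#if defined(__GCC_DESTRUCTIVE_SIZE) && defined(__GCC_CONSTRUCTIVE_SIZE)
constexpr std::size_t kParamDestructive = __GCC_DESTRUCTIVE_SIZE;
constexpr std::size_t kParamConstructive = __GCC_CONSTRUCTIVE_SIZE;
#else
// Hypothetical convention for this sketch: 0 means "not provided".
constexpr std::size_t kParamDestructive = 0;
constexpr std::size_t kParamConstructive = 0;
#endif

// True when the compiler told us the sizes for this target.
constexpr bool kTargetProvidesSizes = kParamDestructive != 0;

// Mirrors the patch's gcc_assert: either both are set or neither is.
static_assert((kParamDestructive == 0) == (kParamConstructive == 0),
              "either both sizes are set or neither is");
```
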
>
> 64 bytes still seems correct for the x86 family.
>
> I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
> 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
> target, so I'd think 32/64 would make more sense.
>
> He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
> line, I've changed that to 64/256.  Does that seem right?
>
> Currently the patch does not adjust the values based on -march, as in JF's
> proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> how to go about that.  --param l1-cache-line-size is set based on -mtune,
> but I don't think we want -mtune to change these ABI-affecting values.  Are
> there -march values for which a smaller range than 64-256 makes sense?
>
> gcc/ChangeLog:
>
>         * params.opt: Add destructive-interference-size and
>         constructive-interference-size.
>         * doc/invoke.texi: Document them.
>         * config/aarch64/aarch64.c (aarch64_override_options_internal):
>         Set them.
>         * config/arm/arm.c (arm_option_override): Set them.
>         * config/i386/i386-options.c (ix86_option_override_internal):
>         Set them.
>
> gcc/c-family/ChangeLog:
>
>         * c.opt: Add -Winterference-size.
>         * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
>         and __GCC_CONSTRUCTIVE_SIZE.
>
> gcc/cp/ChangeLog:
>
>         * decl.c (cxx_init_decl_processing): Check
>         --param *-interference-size values.
>
> libstdc++-v3/ChangeLog:
>
>         * include/std/version: Define __cpp_lib_hardware_interference_size.
>         * libsupc++/new: Define hardware interference size variables.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.target/aarch64/interference.C: New test.
>         * g++.target/arm/interference.C: New test.
>         * g++.target/i386/interference.C: New test.
> ---
>  gcc/doc/invoke.texi                           | 22 ++++++++++++++++++
>  gcc/c-family/c.opt                            |  5 ++++
>  gcc/params.opt                                | 15 ++++++++++++
>  gcc/c-family/c-cppbuiltin.c                   | 12 ++++++++++
>  gcc/config/aarch64/aarch64.c                  |  9 ++++++++
>  gcc/config/arm/arm.c                          |  6 +++++
>  gcc/config/i386/i386-options.c                |  6 +++++
>  gcc/cp/decl.c                                 | 23 +++++++++++++++++++
>  .../g++.target/aarch64/interference.C         |  9 ++++++++
>  gcc/testsuite/g++.target/arm/interference.C   |  9 ++++++++
>  gcc/testsuite/g++.target/i386/interference.C  |  8 +++++++
>  libstdc++-v3/include/std/version              |  3 +++
>  libstdc++-v3/libsupc++/new                    | 10 ++++++--
>  13 files changed, 135 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
>  create mode 100644 gcc/testsuite/g++.target/arm/interference.C
>  create mode 100644 gcc/testsuite/g++.target/i386/interference.C
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ea8812425e9..f93cb7a20f7 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant stride.
>
>  This setting is only useful for strides that are known and constant.
>
> +@item destructive-interference-size
> +@item constructive-interference-size
> +The values for the C++17 variables
> +@code{std::hardware_destructive_interference_size} and
> +@code{std::hardware_constructive_interference_size}.  The destructive
> +interference size is the minimum recommended offset between two
> +independent concurrently-accessed objects; the constructive
> +interference size is the maximum recommended size of contiguous memory
> +accessed together.  Typically both will be the size of an L1 cache
> +line for the target, in bytes.  If the target can have a range of L1
> +cache line sizes, typically the constructive interference size will be
> +the small end of the range and the destructive size will be the large
> +end.
> +
> +These values, particularly the destructive size, are intended to be
> +used for layout, and thus have ABI impact.  The default values can
> +change with @samp{-march}, and users should be aware of this and
> +perhaps specify these values directly if they need to be ABI
> +compatible with code that was compiled with a different @samp{-march}.
> +Changing the default values for a generic target should be done
> +cautiously.
> +
>  @item loop-interchange-max-num-stmts
>  The maximum number of stmts in a loop to be interchanged.
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 91929706aff..0398faf430a 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -722,6 +722,11 @@ Winit-list-lifetime
>  C++ ObjC++ Var(warn_init_list) Warning Init(1)
>  Warn about uses of std::initializer_list that can result in dangling pointers.
>
> +Winterference-size
> +C++ ObjC++ Var(warn_interference_size) Warning Init(1)
> +Warn about nonsensical values of --param destructive-interference-size or
> +constructive-interference-size.
> +
>  Wimplicit
>  C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
>  Warn about implicit declarations.
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 92b003e38cb..a81a3ec82f1 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -358,6 +358,21 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
>  Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
>  The size of L1 cache line.
>
> +-param=destructive-interference-size=
> +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
> +The minimum recommended offset between two concurrently-accessed objects to
> +avoid additional performance degradation due to contention introduced by the
> +implementation.  Typically the L1 cache line size, but can be larger to
> +accommodate a variety of target processors with different cache line sizes.
> +C++17 code might use this value in structure layout.
> +
> +-param=constructive-interference-size=
> +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
> +The maximum recommended size of contiguous memory occupied by two objects
> +accessed with temporal locality by concurrent threads.  Typically the L1 cache
> +line size, but can be smaller to accommodate a variety of target processors with
> +different cache line sizes.
> +
>  -param=l1-cache-size=
>  Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
>  The size of L1 cache.
> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index f79f939bd10..a7bf2544533 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -741,6 +741,18 @@ cpp_atomic_builtins (cpp_reader *pfile)
>    builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
>                                  targetm.atomic_test_and_set_trueval);
>
> +  /* Macros for C++17 hardware interference size constants.  Either both or
> +     neither should be set.  */
> +  gcc_assert (!param_destruct_interfere_size
> +             == !param_construct_interfere_size);
> +  if (param_destruct_interfere_size)
> +    {
> +      builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
> +                                    param_destruct_interfere_size);
> +      builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
> +                                    param_construct_interfere_size);
> +    }
> +
>    /* ptr_type_node can't be used here since ptr_mode is only set when
>       toplev calls backend_init which is not done with -E  or pch.  */
>    psize = POINTER_SIZE_UNITS;
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f5b25a7f704..b172fcdc93e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -16297,6 +16297,15 @@ aarch64_override_options_internal (struct gcc_options *opts)
>      SET_OPTION_IF_UNSET (opts, &global_options_set,
>                          param_l1_cache_line_size,
>                          aarch64_tune_params.prefetch->l1_cache_line_size);
> +  /* For a generic AArch64 target, cover the current range of cache line
> +     sizes.  */
> +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> +                      param_destruct_interfere_size,
> +                      256);
> +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> +                      param_construct_interfere_size,
> +                      64);
> +
>    if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
>      SET_OPTION_IF_UNSET (opts, &global_options_set,
>                          param_l2_cache_size,
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 6d781e23ee9..edfa2ad3426 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3656,6 +3656,12 @@ arm_option_override (void)
>      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
>                          param_l1_cache_line_size,
>                          current_tune->prefetch.l1_cache_line_size);
> +  /* For a generic ARM target, JF Bastien proposed using 64 for both.
> +     ??? Why not 32 for constructive?  */
> +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> +                      param_destruct_interfere_size, 64);
> +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> +                      param_construct_interfere_size, 64);
>    if (current_tune->prefetch.l1_cache_size >= 0)
>      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
>                          param_l1_cache_size,
> diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> index 7cba655595e..3b1b3c838f8 100644
> --- a/gcc/config/i386/i386-options.c
> +++ b/gcc/config/i386/i386-options.c
> @@ -2571,6 +2571,12 @@ ix86_option_override_internal (bool main_args_p,
>    SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
>                        ix86_tune_cost->l2_cache_size);
>
> +  /* 64B is the accepted value for these for all x86.  */
> +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> +                      param_destruct_interfere_size, 64);
> +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> +                      param_construct_interfere_size, 64);
> +
>    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>    if (opts->x_flag_prefetch_loop_arrays < 0
>        && HAVE_prefetch
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 01d64a16125..880fb8948a9 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -4732,6 +4732,29 @@ cxx_init_decl_processing (void)
>    /* Show we use EH for cleanups.  */
>    if (flag_exceptions)
>      using_eh_for_cleanups ();
> +
> +  /* Check that the hardware interference sizes are at least
> +     alignof(max_align_t), as required by the standard.  */
> +  if (param_destruct_interfere_size)
> +    {
> +      int max_align = max_align_t_align () / BITS_PER_UNIT;
> +      if (param_destruct_interfere_size < max_align)
> +       error ("%<--param destructive-interference-size=%d%> is less than "
> +              "%d", param_destruct_interfere_size, max_align);
> +      else if (param_destruct_interfere_size < param_l1_cache_line_size)
> +       warning (OPT_Winterference_size,
> +                "%<--param destructive-interference-size=%d%> "
> +                "is less than %<--param l1-cache-line-size=%d%>",
> +                param_destruct_interfere_size, param_l1_cache_line_size);
> +      if (param_construct_interfere_size < max_align)
> +       error ("%<--param constructive-interference-size=%d%> is less than "
> +              "%d", param_construct_interfere_size, max_align);
> +      else if (param_construct_interfere_size > param_l1_cache_line_size)
> +       warning (OPT_Winterference_size,
> +                "%<--param constructive-interference-size=%d%> "
> +                "is greater than %<--param l1-cache-line-size=%d%>",
> +                param_construct_interfere_size, param_l1_cache_line_size);
> +    }
>  }
>
>  /* Enter an abi node in global-module context.  returns a cookie to
> diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
> new file mode 100644
> index 00000000000..0fc01655223
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/interference.C
> @@ -0,0 +1,9 @@
> +// Test C++17 hardware interference size constants
> +// { dg-do compile { target c++17 } }
> +
> +#include <new>
> +
> +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
> +// 128 or even 256.
> +static_assert(std::hardware_destructive_interference_size == 256);
> +static_assert(std::hardware_constructive_interference_size == 64);
> diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
> new file mode 100644
> index 00000000000..34fe8a52bff
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/arm/interference.C
> @@ -0,0 +1,9 @@
> +// Test C++17 hardware interference size constants
> +// { dg-do compile { target c++17 } }
> +
> +#include <new>
> +
> +// Recent ARM CPUs have a cache line size of 64.  Older ones have
> +// a size of 32, but I guess they're old enough that we don't care?
> +static_assert(std::hardware_destructive_interference_size == 64);
> +static_assert(std::hardware_constructive_interference_size == 64);
> diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
> new file mode 100644
> index 00000000000..c7b910e3ada
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/interference.C
> @@ -0,0 +1,8 @@
> +// Test C++17 hardware interference size constants
> +// { dg-do compile { target c++17 } }
> +
> +#include <new>
> +
> +// It is generally agreed that these are the right values for all x86.
> +static_assert(std::hardware_destructive_interference_size == 64);
> +static_assert(std::hardware_constructive_interference_size == 64);
> diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
> index 27bcd32cb60..d5e155db48b 100644
> --- a/libstdc++-v3/include/std/version
> +++ b/libstdc++-v3/include/std/version
> @@ -140,6 +140,9 @@
>  #define __cpp_lib_filesystem 201703
>  #define __cpp_lib_gcd 201606
>  #define __cpp_lib_gcd_lcm 201606
> +#ifdef __GCC_DESTRUCTIVE_SIZE
> +# define __cpp_lib_hardware_interference_size 201703L
> +#endif
>  #define __cpp_lib_hypot 201603
>  #define __cpp_lib_invoke 201411L
>  #define __cpp_lib_lcm 201606
> diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
> --- a/libstdc++-v3/libsupc++/new
> +++ b/libstdc++-v3/libsupc++/new
> @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
>  } // extern "C++"
>
>  #if __cplusplus >= 201703L
> -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
>  namespace std
>  {
> +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
>  #define __cpp_lib_launder 201606
>    /// Pointer optimization barrier [ptr.launder]
>    template<typename _Tp>
> @@ -205,8 +205,14 @@ namespace std
>    void launder(const void*) = delete;
>    void launder(volatile void*) = delete;
>    void launder(const volatile void*) = delete;
> -}
>  #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
> +
> +#ifdef __GCC_DESTRUCTIVE_SIZE
> +# define __cpp_lib_hardware_interference_size 201703L
> +  inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
> +  inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
> +#endif // __GCC_DESTRUCTIVE_SIZE
> +}
>  #endif // C++17
>
>  #if __cplusplus > 201703L
>
> base-commit: f6dde32b9d487dd6e343d0a1e1d1f60783f5e735
> prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092
> prerequisite-patch-id: 36fa087f6b261576422f8f3b1638f76e5183c95a
> --
> 2.27.0
>
>

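For readers skimming the decl.c hunk, the new diagnostics can be paraphrased as a pair of pure functions.  This is only a sketch outside GCC: Verdict, check_destructive and check_constructive are hypothetical names, and the real code reports through error() and warning() rather than returning a value.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch; GCC itself emits diagnostics.
enum class Verdict { Ok, Warning, Error };

// Destructive size: an error if below alignof(max_align_t), a
// -Winterference-size warning if below the L1 cache line size.
constexpr Verdict check_destructive(std::size_t value,
                                    std::size_t max_align,
                                    std::size_t l1_line) {
  return value < max_align ? Verdict::Error
       : value < l1_line   ? Verdict::Warning
                           : Verdict::Ok;
}

// Constructive size: an error if below alignof(max_align_t), a
// -Winterference-size warning if above the L1 cache line size.
constexpr Verdict check_constructive(std::size_t value,
                                     std::size_t max_align,
                                     std::size_t l1_line) {
  return value < max_align ? Verdict::Error
       : value > l1_line   ? Verdict::Warning
                           : Verdict::Ok;
}
```

With the AArch64 defaults from the patch (256 and 64) and an assumed 64-byte --param l1-cache-line-size, both checks pass; a destructive size of 32 against the same line size would only warn, as long as it is not below the max_align_t alignment.
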

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16  2:41 ` [PATCH] c++: implement C++17 hardware interference size Jason Merrill via Libc-alpha
@ 2021-07-16  2:48   ` Noah Goldstein via Libc-alpha
  2021-07-16 11:17     ` Jonathan Wakely via Libc-alpha
  2021-07-16 13:26   ` Jonathan Wakely via Libc-alpha
  2021-07-16 15:12   ` Matthias Kretz
  2 siblings, 1 reply; 21+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-07-16  2:48 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library

On Thu, Jul 15, 2021 at 10:41 PM Jason Merrill via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Adding CCs that got lost in the initial mail.
>
> On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill <jason@redhat.com> wrote:
>
> > The last missing piece of the C++17 standard library is the hardware
> > interference size constants.  Much of the delay in implementing these has
> > been due to uncertainty about what the right values are, and even whether
> > there is a single constant value that is suitable; the destructive
> > interference size is intended to be used in structure layout, so program
> > ABIs will depend on it.
> >
> > In principle, both of these values should be the same as the target's L1
> > cache line size.  When compiling for a generic target that is intended to
> > support a range of target CPUs with different cache line sizes, the
> > constructive size should probably be the minimum size, and the destructive
> > size the maximum, unless you are constrained by ABI compatibility with
> > previous code.
> >
> > JF Bastien's implementation proposal is summarized at
> > https://github.com/itanium-cxx-abi/cxx-abi/issues/74
> >
> > I implement this by adding new --params for the two sizes.  Targets need to
> > override these values in targetm.target_option.override() to support the
> > feature.
> >
> > 64 bytes still seems correct for the x86 family.
> >
> > I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
> > 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
> > target, so I'd think 32/64 would make more sense.
> >
> > He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
> > line, I've changed that to 64/256.  Does that seem right?
> >
> > Currently the patch does not adjust the values based on -march, as in JF's
> > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > but I don't think we want -mtune to change these ABI-affecting values.  Are
> > there -march values for which a smaller range than 64-256 makes sense?
> >
> > gcc/ChangeLog:
> >
> >         * params.opt: Add destructive-interference-size and
> >         constructive-interference-size.
> >         * doc/invoke.texi: Document them.
> >         * config/aarch64/aarch64.c (aarch64_override_options_internal):
> >         Set them.
> >         * config/arm/arm.c (arm_option_override): Set them.
> >         * config/i386/i386-options.c (ix86_option_override_internal):
> >         Set them.
> >
> > gcc/c-family/ChangeLog:
> >
> >         * c.opt: Add -Winterference-size.
> >         * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
> >         and __GCC_CONSTRUCTIVE_SIZE.
> >
> > gcc/cp/ChangeLog:
> >
> >         * decl.c (cxx_init_decl_processing): Check
> >         --param *-interference-size values.
> >
> > libstdc++-v3/ChangeLog:
> >
> >         * include/std/version: Define __cpp_lib_hardware_interference_size.
> >         * libsupc++/new: Define hardware interference size variables.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * g++.target/aarch64/interference.C: New test.
> >         * g++.target/arm/interference.C: New test.
> >         * g++.target/i386/interference.C: New test.
> > ---
> >  gcc/doc/invoke.texi                           | 22 ++++++++++++++++++
> >  gcc/c-family/c.opt                            |  5 ++++
> >  gcc/params.opt                                | 15 ++++++++++++
> >  gcc/c-family/c-cppbuiltin.c                   | 12 ++++++++++
> >  gcc/config/aarch64/aarch64.c                  |  9 ++++++++
> >  gcc/config/arm/arm.c                          |  6 +++++
> >  gcc/config/i386/i386-options.c                |  6 +++++
> >  gcc/cp/decl.c                                 | 23 +++++++++++++++++++
> >  .../g++.target/aarch64/interference.C         |  9 ++++++++
> >  gcc/testsuite/g++.target/arm/interference.C   |  9 ++++++++
> >  gcc/testsuite/g++.target/i386/interference.C  |  8 +++++++
> >  libstdc++-v3/include/std/version              |  3 +++
> >  libstdc++-v3/libsupc++/new                    | 10 ++++++--
> >  13 files changed, 135 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
> >  create mode 100644 gcc/testsuite/g++.target/arm/interference.C
> >  create mode 100644 gcc/testsuite/g++.target/i386/interference.C
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index ea8812425e9..f93cb7a20f7 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant stride.
> >
> >  This setting is only useful for strides that are known and constant.
> >
> > +@item destructive-interference-size
> > +@item constructive-interference-size
> > +The values for the C++17 variables
> > +@code{std::hardware_destructive_interference_size} and
> > +@code{std::hardware_constructive_interference_size}.  The destructive
> > +interference size is the minimum recommended offset between two
> > +independent concurrently-accessed objects; the constructive
> > +interference size is the maximum recommended size of contiguous memory
> > +accessed together.  Typically both will be the size of an L1 cache
> > +line for the target, in bytes.  If the target can have a range of L1
> > +cache line sizes, typically the constructive interference size will be
> > +the small end of the range and the destructive size will be the large
> > +end.
> > +
> > +These values, particularly the destructive size, are intended to be
> > +used for layout, and thus have ABI impact.  The default values can
> > +change with @samp{-march}, and users should be aware of this and
> > +perhaps specify these values directly if they need to be ABI
> > +compatible with code that was compiled with a different @samp{-march}.
> > +Changing the default values for a generic target should be done
> > +cautiously.
> > +
> >  @item loop-interchange-max-num-stmts
> >  The maximum number of stmts in a loop to be interchanged.
> >
> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index 91929706aff..0398faf430a 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -722,6 +722,11 @@ Winit-list-lifetime
> >  C++ ObjC++ Var(warn_init_list) Warning Init(1)
> >  Warn about uses of std::initializer_list that can result in dangling pointers.
> >
> > +Winterference-size
> > +C++ ObjC++ Var(warn_interference_size) Warning Init(1)
> > +Warn about nonsensical values of --param destructive-interference-size or
> > +constructive-interference-size.
> > +
> >  Wimplicit
> >  C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
> >  Warn about implicit declarations.
> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index 92b003e38cb..a81a3ec82f1 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -358,6 +358,21 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
> >  Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
> >  The size of L1 cache line.
> >
> > +-param=destructive-interference-size=
> > +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
> > +The minimum recommended offset between two concurrently-accessed objects to
> > +avoid additional performance degradation due to contention introduced by the
> > +implementation.  Typically the L1 cache line size, but can be larger to
> > +accommodate a variety of target processors with different cache line sizes.
> > +C++17 code might use this value in structure layout.
> > +
> > +-param=constructive-interference-size=
> > +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
> > +The maximum recommended size of contiguous memory occupied by two objects
> > +accessed with temporal locality by concurrent threads.  Typically the L1 cache
> > +line size, but can be smaller to accommodate a variety of target processors with
> > +different cache line sizes.
> > +
> >  -param=l1-cache-size=
> >  Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
> >  The size of L1 cache.
> > diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> > index f79f939bd10..a7bf2544533 100644
> > --- a/gcc/c-family/c-cppbuiltin.c
> > +++ b/gcc/c-family/c-cppbuiltin.c
> > @@ -741,6 +741,18 @@ cpp_atomic_builtins (cpp_reader *pfile)
> >    builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
> >                                  targetm.atomic_test_and_set_trueval);
> >
> > +  /* Macros for C++17 hardware interference size constants.  Either both or
> > +     neither should be set.  */
> > +  gcc_assert (!param_destruct_interfere_size
> > +             == !param_construct_interfere_size);
> > +  if (param_destruct_interfere_size)
> > +    {
> > +      builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
> > +                                    param_destruct_interfere_size);
> > +      builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
> > +                                    param_construct_interfere_size);
> > +    }
> > +
> >    /* ptr_type_node can't be used here since ptr_mode is only set when
> >       toplev calls backend_init which is not done with -E  or pch.  */
> >    psize = POINTER_SIZE_UNITS;
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index f5b25a7f704..b172fcdc93e 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -16297,6 +16297,15 @@ aarch64_override_options_internal (struct gcc_options *opts)
> >      SET_OPTION_IF_UNSET (opts, &global_options_set,
> >                          param_l1_cache_line_size,
> >                          aarch64_tune_params.prefetch->l1_cache_line_size);
> > +  /* For a generic AArch64 target, cover the current range of cache line
> > +     sizes.  */
> > +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> > +                      param_destruct_interfere_size,
> > +                      256);
> > +  SET_OPTION_IF_UNSET (opts, &global_options_set,
> > +                      param_construct_interfere_size,
> > +                      64);
> > +
> >    if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
> >      SET_OPTION_IF_UNSET (opts, &global_options_set,
> >                          param_l2_cache_size,
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index 6d781e23ee9..edfa2ad3426 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -3656,6 +3656,12 @@ arm_option_override (void)
> >      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> >                          param_l1_cache_line_size,
> >                          current_tune->prefetch.l1_cache_line_size);
> > +  /* For a generic ARM target, JF Bastien proposed using 64 for both.
> > +     ??? Why not 32 for constructive?  */
> > +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > +                      param_destruct_interfere_size, 64);
> > +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > +                      param_construct_interfere_size, 64);
> >    if (current_tune->prefetch.l1_cache_size >= 0)
> >      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> >                          param_l1_cache_size,
> > diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
> > index 7cba655595e..3b1b3c838f8 100644
> > --- a/gcc/config/i386/i386-options.c
> > +++ b/gcc/config/i386/i386-options.c
> > @@ -2571,6 +2571,12 @@ ix86_option_override_internal (bool main_args_p,
> >    SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
> >                        ix86_tune_cost->l2_cache_size);
> >
> > +  /* 64B is the accepted value for these for all x86.  */
> > +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > +                      param_destruct_interfere_size, 64);
> > +  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> > +                      param_construct_interfere_size, 64);
> > +
> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
> >    if (opts->x_flag_prefetch_loop_arrays < 0
> >        && HAVE_prefetch
> > diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> > index 01d64a16125..880fb8948a9 100644
> > --- a/gcc/cp/decl.c
> > +++ b/gcc/cp/decl.c
> > @@ -4732,6 +4732,29 @@ cxx_init_decl_processing (void)
> >    /* Show we use EH for cleanups.  */
> >    if (flag_exceptions)
> >      using_eh_for_cleanups ();
> > +
> > +  /* Check that the hardware interference sizes are at least
> > +     alignof(max_align_t), as required by the standard.  */
> > +  if (param_destruct_interfere_size)
> > +    {
> > +      int max_align = max_align_t_align () / BITS_PER_UNIT;
> > +      if (param_destruct_interfere_size < max_align)
> > +       error ("%<--param destructive-interference-size=%d%> is less than "
> > +              "%d", param_destruct_interfere_size, max_align);
> > +      else if (param_destruct_interfere_size < param_l1_cache_line_size)
> > +       warning (OPT_Winterference_size,
> > +                "%<--param destructive-interference-size=%d%> "
> > +                "is less than %<--param l1-cache-line-size=%d%>",
> > +                param_destruct_interfere_size, param_l1_cache_line_size);
> > +      if (param_construct_interfere_size < max_align)
> > +       error ("%<--param constructive-interference-size=%d%> is less than "
> > +              "%d", param_construct_interfere_size, max_align);
> > +      else if (param_construct_interfere_size >
> param_l1_cache_line_size)
> > +       warning (OPT_Winterference_size,
> > +                "%<--param constructive-interference-size=%d%> "
> > +                "is greater than %<--param l1-cache-line-size=%d%>",
> > +                param_construct_interfere_size,
> param_l1_cache_line_size);
> > +    }
> >  }
> >
> >  /* Enter an abi node in global-module context.  returns a cookie to
> > diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
> > new file mode 100644
> > index 00000000000..0fc01655223
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/aarch64/interference.C
> > @@ -0,0 +1,9 @@
> > +// Test C++17 hardware interference size constants
> > +// { dg-do compile { target c++17 } }
> > +
> > +#include <new>
> > +
> > +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
> > +// 128 or even 256.
> > +static_assert(std::hardware_destructive_interference_size == 256);
> > +static_assert(std::hardware_constructive_interference_size == 64);
> > diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
> > new file mode 100644
> > index 00000000000..34fe8a52bff
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/arm/interference.C
> > @@ -0,0 +1,9 @@
> > +// Test C++17 hardware interference size constants
> > +// { dg-do compile { target c++17 } }
> > +
> > +#include <new>
> > +
> > +// Recent ARM CPUs have a cache line size of 64.  Older ones have
> > +// a size of 32, but I guess they're old enough that we don't care?
> > +static_assert(std::hardware_destructive_interference_size == 64);
> > +static_assert(std::hardware_constructive_interference_size == 64);
> > diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
> > new file mode 100644
> > index 00000000000..c7b910e3ada
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/interference.C
> > @@ -0,0 +1,8 @@
> > +// Test C++17 hardware interference size constants
> > +// { dg-do compile { target c++17 } }
> > +
> > +#include <new>
> > +
> > +// It is generally agreed that these are the right values for all x86.
> > +static_assert(std::hardware_destructive_interference_size == 64);
> > +static_assert(std::hardware_constructive_interference_size == 64);
> > diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
> > index 27bcd32cb60..d5e155db48b 100644
> > --- a/libstdc++-v3/include/std/version
> > +++ b/libstdc++-v3/include/std/version
> > @@ -140,6 +140,9 @@
> >  #define __cpp_lib_filesystem 201703
> >  #define __cpp_lib_gcd 201606
> >  #define __cpp_lib_gcd_lcm 201606
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > +#endif
> >  #define __cpp_lib_hypot 201603
> >  #define __cpp_lib_invoke 201411L
> >  #define __cpp_lib_lcm 201606
> > diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
> > index 3349b13fd1b..7bc67a6cb02 100644
> > --- a/libstdc++-v3/libsupc++/new
> > +++ b/libstdc++-v3/libsupc++/new
> > @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
> >  } // extern "C++"
> >
> >  #if __cplusplus >= 201703L
> > -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> >  namespace std
> >  {
> > +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> >  #define __cpp_lib_launder 201606
> >    /// Pointer optimization barrier [ptr.launder]
> >    template<typename _Tp>
> > @@ -205,8 +205,14 @@ namespace std
> >    void launder(const void*) = delete;
> >    void launder(volatile void*) = delete;
> >    void launder(const volatile void*) = delete;
> > -}
> >  #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
> > +
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > +  inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
> > +  inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
> > +#endif // __GCC_DESTRUCTIVE_SIZE
> > +}
> >  #endif // C++17
> >
> >  #if __cplusplus > 201703L
> >
> > base-commit: f6dde32b9d487dd6e343d0a1e1d1f60783f5e735
> > prerequisite-patch-id: 62730bcaf1f07786fd756efb6f3bbd94d778c092
> > prerequisite-patch-id: 36fa087f6b261576422f8f3b1638f76e5183c95a
> > --
> > 2.27.0
> >
> >
>

On Intel x86 systems with a private L2 cache, the spatial prefetcher
can cause destructive interference along 128-byte-aligned boundaries.
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16  2:48   ` Noah Goldstein via Libc-alpha
@ 2021-07-16 11:17     ` Jonathan Wakely via Libc-alpha
  2021-07-16 13:27       ` Richard Earnshaw via Libc-alpha
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Wakely via Libc-alpha @ 2021-07-16 11:17 UTC (permalink / raw)
  To: Noah Goldstein
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library, Jason Merrill

On Fri, 16 Jul 2021 at 03:51, Noah Goldstein wrote:
> On intel x86 systems with a private L2 cache the spatial prefetcher
> can cause destructive interference along 128 byte aligned boundaries.
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60

Which is a good example of why these "constants" should never have
been standardized in the first place. Sigh.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16  2:41 ` [PATCH] c++: implement C++17 hardware interference size Jason Merrill via Libc-alpha
  2021-07-16  2:48   ` Noah Goldstein via Libc-alpha
@ 2021-07-16 13:26   ` Jonathan Wakely via Libc-alpha
  2021-07-16 15:12   ` Matthias Kretz
  2 siblings, 0 replies; 21+ messages in thread
From: Jonathan Wakely via Libc-alpha @ 2021-07-16 13:26 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library

On Fri, 16 Jul 2021 at 03:42, Jason Merrill via Libstdc++
<libstdc++@gcc.gnu.org> wrote:
> > diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
> > index 27bcd32cb60..d5e155db48b 100644
> > --- a/libstdc++-v3/include/std/version
> > +++ b/libstdc++-v3/include/std/version
> > @@ -140,6 +140,9 @@
> >  #define __cpp_lib_filesystem 201703
> >  #define __cpp_lib_gcd 201606
> >  #define __cpp_lib_gcd_lcm 201606
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > +#endif
> >  #define __cpp_lib_hypot 201603
> >  #define __cpp_lib_invoke 201411L
> >  #define __cpp_lib_lcm 201606
> > diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
> > index 3349b13fd1b..7bc67a6cb02 100644
> > --- a/libstdc++-v3/libsupc++/new
> > +++ b/libstdc++-v3/libsupc++/new
> > @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
> >  } // extern "C++"
> >
> >  #if __cplusplus >= 201703L
> > -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> >  namespace std
> >  {
> > +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> >  #define __cpp_lib_launder 201606
> >    /// Pointer optimization barrier [ptr.launder]
> >    template<typename _Tp>
> > @@ -205,8 +205,14 @@ namespace std
> >    void launder(const void*) = delete;
> >    void launder(volatile void*) = delete;
> >    void launder(const volatile void*) = delete;
> > -}
> >  #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
> > +
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > +  inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
> > +  inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
> > +#endif // __GCC_DESTRUCTIVE_SIZE
> > +}
> >  #endif // C++17
> >
> >  #if __cplusplus > 201703L

Putting aside my dislike of the entire feature, the libstdc++ parts
are fine, thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 11:17     ` Jonathan Wakely via Libc-alpha
@ 2021-07-16 13:27       ` Richard Earnshaw via Libc-alpha
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw via Libc-alpha @ 2021-07-16 13:27 UTC (permalink / raw)
  To: Jonathan Wakely, Noah Goldstein
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library



On 16/07/2021 12:17, Jonathan Wakely via Gcc-patches wrote:
> On Fri, 16 Jul 2021 at 03:51, Noah Goldstein wrote:
>> On intel x86 systems with a private L2 cache the spatial prefetcher
>> can cause destructive interference along 128 byte aligned boundaries.
>> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
> 
> Which is a good example of why these "constants" should never have
> been standardized in the first place. Sigh.
> 

+1 for that.

I'll have a chat with our architecture guys, but I've no idea if they'll 
commit to any (useful) values for either constant.

R.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16  2:41 ` [PATCH] c++: implement C++17 hardware interference size Jason Merrill via Libc-alpha
  2021-07-16  2:48   ` Noah Goldstein via Libc-alpha
  2021-07-16 13:26   ` Jonathan Wakely via Libc-alpha
@ 2021-07-16 15:12   ` Matthias Kretz
  2021-07-16 15:30     ` Jason Merrill via Libc-alpha
  2021-07-16 17:20     ` Noah Goldstein via Libc-alpha
  2 siblings, 2 replies; 21+ messages in thread
From: Matthias Kretz @ 2021-07-16 15:12 UTC (permalink / raw)
  To: gcc-patches List
  Cc: Richard Earnshaw (lists), libstdc++, libc-alpha, Jason Merrill

On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > Currently the patch does not adjust the values based on -march, as in JF's
> > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > but I don't think we want -mtune to change these ABI-affecting values. 
> > Are
> > there -march values for which a smaller range than 64-256 makes sense?

As a user who cares about ABI but also cares about maximizing performance of 
builds for a specific HPC setup I'd expect the hardware interference size 
values to be allowed to break ABIs. The point of these values is to give me 
better performance portability (but not necessarily binary portability) than 
my usual "pick 64 as a good average".

Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X should 
be interpreted as "my binary is supposed to be optimized for X, I accept 
inefficiencies on everything that's not X".

On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> On intel x86 systems with a private L2 cache the spatial prefetcher
> can cause destructive interference along 128 byte aligned boundaries.
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60

I don't understand how this feature would lead to false sharing. But maybe I 
misunderstand the spatial prefetcher. The first access to either line of a 
cache-line pair would bring both cache lines to LLC (and possibly L2). If a core 
with a different L2 reads the other cache line the cache line would be 
duplicated; if it writes to it, it would be exclusive to the other core's L2. 
The cache line pairs do not affect each other anymore. Maybe there's a minor 
inefficiency on initial transfer from memory, but isn't that all?

That said, Intel documents the spatial prefetcher exclusively for Sandy 
Bridge. So if you still believe 128 is necessary, set the destructive hardware 
interference size to 64 for all of x86 except -mtune=sandybridge.

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 15:12   ` Matthias Kretz
@ 2021-07-16 15:30     ` Jason Merrill via Libc-alpha
  2021-07-16 16:54       ` Jonathan Wakely via Libc-alpha
  2021-07-16 17:20     ` Noah Goldstein via Libc-alpha
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Merrill via Libc-alpha @ 2021-07-16 15:30 UTC (permalink / raw)
  To: Matthias Kretz
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List, libc-alpha

On Fri, Jul 16, 2021, 11:12 AM Matthias Kretz <m.kretz@gsi.de> wrote:

> On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > > Currently the patch does not adjust the values based on -march, as in JF's
> > > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > > but I don't think we want -mtune to change these ABI-affecting values.
> > > Are there -march values for which a smaller range than 64-256 makes sense?
>
> As a user who cares about ABI but also cares about maximizing performance of
> builds for a specific HPC setup I'd expect the hardware interference size
> values to be allowed to break ABIs. The point of these values is to give me
> better performance portability (but not necessarily binary portability) than
> my usual "pick 64 as a good average".
>
> Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X should
> be interpreted as "my binary is supposed to be optimized for X, I accept
> inefficiencies on everything that's not X".
>
> On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> > On intel x86 systems with a private L2 cache the spatial prefetcher
> > can cause destructive interference along 128 byte aligned boundaries.
> > https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
>
> I don't understand how this feature would lead to false sharing. But maybe I
> misunderstand the spatial prefetcher. The first access to one of the two cache
> lines pairs would bring both cache lines to LLC (and possibly L2). If a core
> with a different L2 reads the other cache line the cache line would be
> duplicated; if it writes to it, it would be exclusive to the other core's L2.
> The cache line pairs do not affect each other anymore. Maybe there's a minor
> inefficiency on initial transfer from memory, but isn't that all?
>
> That said. Intel documents the spatial prefetcher exclusively for Sandy
> Bridge. So if you still believe 128 is necessary, set the destructive hardware
> interference size to 64 for all of x86 except -mtune=sandybridge.
>

Adjusting them based on tuning would certainly simplify a significant use
case, perhaps the only reasonable use.  Cases more concerned with ABI
stability probably shouldn't use them at all. And that would mean not
needing to worry about the impossible task of finding the right values for
an entire architecture.

I'm thinking about warning by default for any use of the variables without
explicitly specifying their values on the command line. Users could disable
the warning if they're happy using whatever the defaults happen to be.

Jason

>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 15:30     ` Jason Merrill via Libc-alpha
@ 2021-07-16 16:54       ` Jonathan Wakely via Libc-alpha
  2021-07-16 18:43         ` Jason Merrill via Libc-alpha
  2021-07-16 19:26         ` Matthias Kretz
  0 siblings, 2 replies; 21+ messages in thread
From: Jonathan Wakely via Libc-alpha @ 2021-07-16 16:54 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Richard Earnshaw (lists), GNU C Library, Matthias Kretz,
	gcc-patches List, libstdc++

On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> Adjusting them based on tuning would certainly simplify a significant use
> case, perhaps the only reasonable use.  Cases more concerned with ABI
> stability probably shouldn't use them at all. And that would mean not
> needing to worry about the impossible task of finding the right values for
> an entire architecture.

But it would be quite a significant change in behaviour if -mtune
started affecting ABI, wouldn't it?

> I'm thinking about warning by default for any use of the variables without
> explicitly specifying their values on the command line. Users could disable
> the warning if they're happy using whatever the defaults happen to be.

I like that suggestion.

Maybe the warning could suggest optimal values based on the current
-mtune flag. That way -mtune wouldn't need to alter ABI, but by
combining -mtune with explicit values for the variables you get the
best performance. And -mtune without overriding the default values
preserves ABI.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 15:12   ` Matthias Kretz
  2021-07-16 15:30     ` Jason Merrill via Libc-alpha
@ 2021-07-16 17:20     ` Noah Goldstein via Libc-alpha
  2021-07-16 19:37       ` Matthias Kretz
  1 sibling, 1 reply; 21+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-07-16 17:20 UTC (permalink / raw)
  To: Matthias Kretz
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library

On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kretz@gsi.de> wrote:

> On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > > Currently the patch does not adjust the values based on -march, as in JF's
> > > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > > but I don't think we want -mtune to change these ABI-affecting values.
> > > Are there -march values for which a smaller range than 64-256 makes sense?
>
> As a user who cares about ABI but also cares about maximizing performance of
> builds for a specific HPC setup I'd expect the hardware interference size
> values to be allowed to break ABIs. The point of these values is to give me
> better performance portability (but not necessarily binary portability) than
> my usual "pick 64 as a good average".
>
> Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X should
> be interpreted as "my binary is supposed to be optimized for X, I accept
> inefficiencies on everything that's not X".
>
> On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> > On intel x86 systems with a private L2 cache the spatial prefetcher
> > can cause destructive interference along 128 byte aligned boundaries.
> > https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
>
> I don't understand how this feature would lead to false sharing. But maybe I
> misunderstand the spatial prefetcher. The first access to one of the two cache
> lines pairs would bring both cache lines to LLC (and possibly L2). If a core
> with a different L2 reads the other cache line the cache line would be
> duplicated; if it writes to it, it would be exclusive to the other core's L2.
> The cache line pairs do not affect each other anymore. Maybe there's a minor
> inefficiency on initial transfer from memory, but isn't that all?
>

If two cores that do not share an L2 cache each need exclusive access to
a cache line, the L2 spatial prefetcher could cause ping-ponging if those
two cache lines were adjacent and shared the same 128-byte alignment.
Say core A requests line x1 in exclusive; it also gets line x2 (not sure
if x2 would be in shared or exclusive). Core B then requests x2 in
exclusive; it also gets x1. Regardless of the state in which x1 comes into
core B's private L2 cache, it invalidates the exclusive state on cache line
x1 in core A's private L2 cache. If this was done in a loop (say a simple
`lock add` loop) it would cause ping-ponging on cache lines x1/x2 between
core A and B's private L2 caches.
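
For anyone who wants to measure this rather than reason about it, here is a
minimal sketch of the `lock add` loop, assuming two threads that each own one
counter. The names `counters` and `hammer` and the `Align` parameter are made
up for this illustration and are not part of the patch; whether 64 and 128
actually differ depends on the hardware.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>

// Two counters placed Align bytes apart. Align is a hypothetical knob for
// this sketch; try 64 and 128 to compare.
template <std::size_t Align>
struct counters {
  alignas(Align) std::atomic<long> a{0};
  alignas(Align) std::atomic<long> b{0};
};

// Each thread increments only its own counter. On x86 a relaxed fetch_add
// on an atomic integer is typically emitted as a lock-prefixed add/xadd.
template <std::size_t Align>
long hammer(long iterations) {
  counters<Align> c;
  auto work = [iterations](std::atomic<long>& x) {
    for (long i = 0; i < iterations; ++i)
      x.fetch_add(1, std::memory_order_relaxed);
  };
  std::thread t1(work, std::ref(c.a));
  std::thread t2(work, std::ref(c.b));
  t1.join();
  t2.join();
  // Returned so the work cannot be optimized away; equals 2 * iterations.
  return c.a.load() + c.b.load();
}
```

Timing hammer<64>(n) against hammer<128>(n) with std::chrono::steady_clock,
with the two threads pinned to cores that do not share an L2, would show
whether the extra separation matters on a given machine.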


>
> That said. Intel documents the spatial prefetcher exclusively for Sandy
> Bridge. So if you still believe 128 is necessary, set the destructive
> hardware
> interference size to 64 for all of x86 except -mtune=sandybridge.
>

AFAIK the spatial prefetcher exists on newer x86_64 machines as well.


>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 16:54       ` Jonathan Wakely via Libc-alpha
@ 2021-07-16 18:43         ` Jason Merrill via Libc-alpha
  2021-07-16 19:26         ` Matthias Kretz
  1 sibling, 0 replies; 21+ messages in thread
From: Jason Merrill via Libc-alpha @ 2021-07-16 18:43 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Richard Earnshaw (lists), GNU C Library, Matthias Kretz,
	gcc-patches List, libstdc++

On Fri, Jul 16, 2021, 12:54 PM Jonathan Wakely <jwakely@redhat.com> wrote:

> On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > Adjusting them based on tuning would certainly simplify a significant use
> > case, perhaps the only reasonable use.  Cases more concerned with ABI
> > stability probably shouldn't use them at all. And that would mean not
> > needing to worry about the impossible task of finding the right values for
> > an entire architecture.
>
> But it would be quite a significant change in behaviour if -mtune
> started affecting ABI, wouldn't it?
>

Absolutely, though with the below warning, which could mention this issue,
it would only affect the ABI of code that ignores the warning. Code that
silences it by specifying values would not be affected.

> > I'm thinking about warning by default for any use of the variables without
> > explicitly specifying their values on the command line. Users could disable
> > the warning if they're happy using whatever the defaults happen to be.
>
> I like that suggestion.
>
> Maybe the warning could suggest optimal values based on the current
> -mtune flag.


Sounds good.

> That way -mtune wouldn't need to alter ABI, but by
> combining -mtune with explicit values for the variables you get the
> best performance. And -mtune without overriding the default values
> preserves ABI.
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 16:54       ` Jonathan Wakely via Libc-alpha
  2021-07-16 18:43         ` Jason Merrill via Libc-alpha
@ 2021-07-16 19:26         ` Matthias Kretz
  2021-07-16 19:58           ` Jonathan Wakely via Libc-alpha
  1 sibling, 1 reply; 21+ messages in thread
From: Matthias Kretz @ 2021-07-16 19:26 UTC (permalink / raw)
  To: Jason Merrill, Jonathan Wakely
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library

On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
> On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > Adjusting them based on tuning would certainly simplify a significant use
> > case, perhaps the only reasonable use.  Cases more concerned with ABI
> > stability probably shouldn't use them at all. And that would mean not
> > needing to worry about the impossible task of finding the right values for
> > an entire architecture.
> 
> But it would be quite a significant change in behaviour if -mtune
> started affecting ABI, wouldn't it?

For existing code -mtune still doesn't affect ABI. The users who write 

struct keep_apart {
  alignas(std::hardware_destructive_interference_size) std::atomic<int> cat;
  alignas(std::hardware_destructive_interference_size) std::atomic<int> dog;
};

*want* to have different sizeof(keep_apart) depending on the CPU the code is 
compiled for. I.e. they *ask* for getting their ABI broken. If they wanted to 
specify the value themselves on the command line they'd have written:

struct keep_apart {
  alignas(SOME_MACRO) std::atomic<int> cat;
  alignas(SOME_MACRO) std::atomic<int> dog;
};

I would be very disappointed if std::hardware_destructive_interference_size 
and std::hardware_constructive_interference_size turn into a glorified macro.
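
To make the layout effect concrete, a small sketch under stated assumptions:
the alignment is turned into a template parameter (keep_apart_at is a
hypothetical name, not anything from the patch) so that two candidate values
can be compared in one translation unit. The sizes assume the usual 4-byte
std::atomic<int>.

```cpp
#include <atomic>
#include <cstddef>

// Same shape as keep_apart, but with the alignment as a template parameter
// (hypothetical, for illustration only).
template <std::size_t Interference>
struct keep_apart_at {
  alignas(Interference) std::atomic<int> cat;
  alignas(Interference) std::atomic<int> dog;
};

// The layout, and therefore the ABI of anything embedding the struct,
// follows the chosen value.
static_assert(sizeof(keep_apart_at<64>) == 128);
static_assert(sizeof(keep_apart_at<128>) == 256);
static_assert(alignof(keep_apart_at<64>) != alignof(keep_apart_at<128>));
```

With the standard constant in place of the template argument, which of these
two layouts a translation unit sees is decided by whatever value the compiler
was configured or told to use.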

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 17:20     ` Noah Goldstein via Libc-alpha
@ 2021-07-16 19:37       ` Matthias Kretz
  2021-07-16 21:23         ` Noah Goldstein via Libc-alpha
  0 siblings, 1 reply; 21+ messages in thread
From: Matthias Kretz @ 2021-07-16 19:37 UTC (permalink / raw)
  To: Noah Goldstein
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library

On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> > I don't understand how this feature would lead to false sharing. But maybe I
> > misunderstand the spatial prefetcher. The first access to one of the two
> > cache lines pairs would bring both cache lines to LLC (and possibly L2). If
> > a core with a different L2 reads the other cache line the cache line would
> > be duplicated; if it writes to it, it would be exclusive to the other core's
> > L2. The cache line pairs do not affect each other anymore. Maybe there's a
> > minor inefficiency on initial transfer from memory, but isn't that all?
> 
> If two cores that do not share an L2 cache need exclusive access to
> a cache-line, the L2 spatial prefetcher could cause pingponging if those
> two cache-lines were adjacent and shared the same 128 byte alignment.
> Say core A requests line x1 in exclusive, it also get line x2 (not sure
> if x2 would be in shared or exclusive), core B then requests x2 in
> exclusive, it also gets x1. Irrelevant of the state x1 comes into core B's
> private L2 cache it invalidates the exclusive state on cache-line x1 in
> core A's private L2 cache. If this was done in a loop (say a simple
> `lock add` loop) it would cause pingponging on cache-lines x1/x2 between
> core A and B's private L2 caches.

Quoting the latest ORM: "The following two hardware prefetchers fetched data 
from memory to the L2 cache and last level cache:
Spatial Prefetcher: This prefetcher strives to complete every cache line 
fetched to the L2 cache with the pair line that completes it to a 128-byte 
aligned chunk."

1. If the requested cache line is already present on some other core, the 
spatial prefetcher should not get used ("fetched data from memory").

2. The section is about data prefetching. It is unclear whether the spatial 
prefetcher applies at all for normal cache line fetches.

3. The ORM uses past tense ("The following two hardware prefetchers fetched 
data"), which indicates to me that Intel isn't doing this for newer 
generations anymore.

4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of cache 
line A and the adjacent cache line B thus is also loaded to LLC. Core 2 
requests a read of line B and thus loads line A into LLC. Now both cores have 
both cache lines in LLC. Core 1 writes to line A, which invalidates line A in 
LLC of Core 2 but does not affect line B. Core 2 writes to line B, 
invalidating line B for Core 1. => no false sharing. Where did I get my mental 
cache protocol wrong?

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 19:26         ` Matthias Kretz
@ 2021-07-16 19:58           ` Jonathan Wakely via Libc-alpha
  2021-07-17  8:14             ` Matthias Kretz
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Wakely via Libc-alpha @ 2021-07-16 19:58 UTC (permalink / raw)
  To: Matthias Kretz
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library, Jason Merrill

On Fri, 16 Jul 2021 at 20:26, Matthias Kretz <m.kretz@gsi.de> wrote:
>
> On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
> > On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > > Adjusting them based on tuning would certainly simplify a significant use
> > > case, perhaps the only reasonable use.  Cases more concerned with ABI
> > > stability probably shouldn't use them at all. And that would mean not
> > > needing to worry about the impossible task of finding the right values for
> > > an entire architecture.
> >
> > But it would be quite a significant change in behaviour if -mtune
> > started affecting ABI, wouldn't it?
>
> For existing code -mtune still doesn't affect ABI.

True, because existing code isn't using the constants.

> The users who write
>
> struct keep_apart {
>   alignas(std::hardware_destructive_interference_size) std::atomic<int> cat;
>   alignas(std::hardware_destructive_interference_size) std::atomic<int> dog;
> };
>
> *want* to have different sizeof(keep_apart) depending on the CPU the code is
> compiled for. I.e. they *ask* for getting their ABI broken.

Right, but the person who wants that and the person who chooses the
-mtune option might be different people.

A distro might add -mtune=core2 to all package builds by default, not
expecting it to cause ABI changes. Some header in a package in the
distro might start using the constants. Now everybody who includes
that header needs to use the same -mtune option as the distro default.

That change in the behaviour and expected use of an existing option
seems scary to me. Even with a warning about using the constants
(because somebody's just going to use #pragma around their use of the
constants to disable the warning, and now the ABI impact of -mtune is
much less obvious).

It's much less scary in a world where the code is written and used by
the same group of people, but for something like a linux distro it
worries me.
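
For comparison, a sketch of what an ABI-conscious header could do instead,
along the lines of "cases more concerned with ABI stability probably shouldn't
use them at all". Everything here (the mylib namespace, cacheline_pad, the
value 64, the queue_indices type) is a hypothetical example, not taken from
any real package.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical library header. The padding is a project-chosen constant, so
// the layout does not depend on the compiler's tuning options.
namespace mylib {

inline constexpr std::size_t cacheline_pad = 64;  // assumed value

struct queue_indices {
  alignas(cacheline_pad) std::atomic<std::size_t> head{0};
  alignas(cacheline_pad) std::atomic<std::size_t> tail{0};
};

// Fails the build, rather than silently changing the ABI, if the layout
// ever drifts from what was shipped.
static_assert(sizeof(queue_indices) == 2 * cacheline_pad);
static_assert(alignof(queue_indices) == cacheline_pad);

}  // namespace mylib
```

The cost is the one raised earlier in the thread: the constant is a
hand-picked average rather than a value matched to the CPU being targeted.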


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 19:37       ` Matthias Kretz
@ 2021-07-16 21:23         ` Noah Goldstein via Libc-alpha
  0 siblings, 0 replies; 21+ messages in thread
From: Noah Goldstein via Libc-alpha @ 2021-07-16 21:23 UTC (permalink / raw)
  To: Matthias Kretz
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library

On Fri, Jul 16, 2021 at 3:37 PM Matthias Kretz <m.kretz@gsi.de> wrote:

> On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> > On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> > > I don't understand how this feature would lead to false sharing. But
> > > maybe I misunderstand the spatial prefetcher. The first access to one
> > > of the two cache lines in a pair would bring both cache lines to LLC
> > > (and possibly L2). If a core with a different L2 reads the other cache
> > > line the cache line would be duplicated; if it writes to it, it would
> > > be exclusive to the other core's L2. The cache line pairs do not affect
> > > each other anymore. Maybe there's a minor inefficiency on initial
> > > transfer from memory, but isn't that all?
> >
> > If two cores that do not share an L2 cache need exclusive access to
> > a cache-line, the L2 spatial prefetcher could cause pingponging if those
> > two cache-lines were adjacent and shared the same 128 byte alignment.
> > Say core A requests line x1 in exclusive, it also gets line x2 (not sure
> > if x2 would be in shared or exclusive), core B then requests x2 in
> > exclusive, it also gets x1. Irrespective of the state in which x1 arrives
> > in core B's private L2 cache, it invalidates the exclusive state on
> > cache-line x1 in core A's private L2 cache. If this was done in a loop
> > (say a simple `lock add` loop) it would cause pingponging on cache-lines
> > x1/x2 between core A and B's private L2 caches.
>
> Quoting the latest ORM: "The following two hardware prefetchers fetched
> data from memory to the L2 cache and last level cache:
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk."
>
> 1. If the requested cache line is already present on some other core, the
> spatial prefetcher should not get used ("fetched data from memory").
>

I think this is correct, and I was wrong that a request from LLC to L2
will invoke the spatial prefetcher. So no issues with 64 bytes. Sorry for
the added confusion!

>
> 2. The section is about data prefetching. It is unclear whether the
> spatial prefetcher applies at all for normal cache line fetches.
>
> 3. The ORM uses past tense ("The following two hardware prefetchers
> fetched data"), which indicates to me that Intel isn't doing this for
> newer generations anymore.


> 4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of
> cache line A and the adjacent cache line B thus is also loaded to LLC.
> Core 2 requests a read of line B and thus loads line A into LLC. Now both
> cores have both cache lines in LLC. Core 1 writes to line A, which
> invalidates line A in LLC of Core 2 but does not affect line B. Core 2
> writes to line B, invalidating line B for Core 1. => no false sharing.
> Where did I get my mental cache protocol wrong?
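
The reasoning in point 4 can be checked against a toy invalidation model.
This is only a sketch under strong assumptions: two cores, two adjacent
lines, three states per line, and a prefetcher that acts solely on the
initial fetch and never re-fetches a pair line afterwards. All names
(ToyCoherence, distinct_lines, same_line) are hypothetical, and no real
cache protocol is this simple.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class State { Invalid, Shared, Modified };

// Two cores, two adjacent cache lines (index 0 = line A, 1 = line B).
struct ToyCoherence {
  std::array<std::array<State, 2>, 2> state{};  // state[core][line], Invalid
  int invalidations = 0;

  // A read brings the line in as Shared and downgrades a Modified copy
  // held by the other core.
  void read(std::size_t core, std::size_t line) {
    const std::size_t other = 1 - core;
    if (state[other][line] == State::Modified) state[other][line] = State::Shared;
    if (state[core][line] == State::Invalid) state[core][line] = State::Shared;
  }

  // A write needs exclusive ownership, so any copy in the other core is
  // invalidated and counted.
  void write(std::size_t core, std::size_t line) {
    const std::size_t other = 1 - core;
    if (state[other][line] != State::Invalid) {
      state[other][line] = State::Invalid;
      ++invalidations;
    }
    state[core][line] = State::Modified;
  }
};

// Point 4: each core reads one line, the pair line arrives with it, and
// afterwards each core writes only its own line.
int distinct_lines(int rounds) {
  ToyCoherence c;
  c.read(0, 0); c.read(0, 1);  // core 0 reads A, B is pulled in alongside
  c.read(1, 1); c.read(1, 0);  // core 1 reads B, A is pulled in alongside
  for (int i = 0; i < rounds; ++i) { c.write(0, 0); c.write(1, 1); }
  return c.invalidations;
}

// Contrast: both cores keep writing the same line.
int same_line(int rounds) {
  ToyCoherence c;
  for (int i = 0; i < rounds; ++i) { c.write(0, 0); c.write(1, 0); }
  return c.invalidations;
}
```

In this model the first scenario costs one invalidation per line no matter
how long the loop runs, while the second costs one on nearly every write.
The ping-pong described earlier in the thread would only appear if the
model also let the prefetcher pull the pair line back in on later requests.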


> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
>
>
>
>


* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-16 19:58           ` Jonathan Wakely via Libc-alpha
@ 2021-07-17  8:14             ` Matthias Kretz
  2021-07-17 13:32               ` Jonathan Wakely via Libc-alpha
  0 siblings, 1 reply; 21+ messages in thread
From: Matthias Kretz @ 2021-07-17  8:14 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Richard Earnshaw (lists), libstdc++, gcc-patches List,
	GNU C Library, Jason Merrill

On Friday, 16 July 2021 21:58:36 CEST Jonathan Wakely wrote:
> On Fri, 16 Jul 2021 at 20:26, Matthias Kretz <m.kretz@gsi.de> wrote:
> > On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
> > > On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > > > Adjusting them based on tuning would certainly simplify a significant
> > > > use case, perhaps the only reasonable use.  Cases more concerned with
> > > > ABI stability probably shouldn't use them at all. And that would mean
> > > > not needing to worry about the impossible task of finding the right
> > > > values for an entire architecture.
> > > 
> > > But it would be quite a significant change in behaviour if -mtune
> > > started affecting ABI, wouldn't it?
> > 
> > For existing code -mtune still doesn't affect ABI.
> 
> True, because existing code isn't using the constants.
> 
> >The users who write
> >
> > > struct keep_apart {
> > >   alignas(std::hardware_destructive_interference_size) std::atomic<int> cat;
> > >   alignas(std::hardware_destructive_interference_size) std::atomic<int> dog;
> > > };
> > 
> > *want* to have different sizeof(keep_apart) depending on the CPU the code
> > is compiled for. I.e. they *ask* for getting their ABI broken.
> 
> Right, but the person who wants that and the person who chooses the
> -mtune option might be different people.

Yes. But it was the intent of the person who wrote the code that the person 
compiling the code can change the data layout of keep_apart via -mtune. Of 
course, if the one compiling doesn't want to choose because the binary needs 
to work on the widest range of systems, then there's a problem we might want 
to solve (direction of target_clones?). (Or the developer of the library 
solves it by providing the ABI for all possible interference_size values.)
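
One way to read "providing the ABI for all possible interference_size
values" is to make the size part of the type, so that it ends up in every
mangled name, and to ship explicit instantiations for each supported
value. The following is a rough sketch of that idea, not something the
patch does. The namespace lib, basic_keep_apart, total, selected and the
list 64/128/256 are all assumptions, as is the fallback of 64 when the
library feature-test macro is absent.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

namespace lib {

// The interference size is a template argument, so two builds that pick
// different values refer to different types and different symbols.
template <std::size_t Interference>
struct basic_keep_apart {
  alignas(Interference) std::atomic<int> cat{0};
  alignas(Interference) std::atomic<int> dog{0};
};

template <std::size_t Interference>
int total(const basic_keep_apart<Interference>& k) {
  return k.cat.load() + k.dog.load();
}

// Explicit instantiations the library would ship, one per supported value
// (the list here is an assumption for illustration).
template int total<64>(const basic_keep_apart<64>&);
template int total<128>(const basic_keep_apart<128>&);
template int total<256>(const basic_keep_apart<256>&);

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t selected =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t selected = 64;  // assumed fallback
#endif

// What user code would name; it resolves to whichever variant this
// translation unit's options select.
using keep_apart = basic_keep_apart<selected>;

}  // namespace lib
```

A mismatch between caller and library then shows up as a missing symbol at
link time instead of as silent layout disagreement, at the cost of code
size and a fixed list of supported values.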

> A distro might add -mtune=core2 to all package builds by default, not
> expecting it to cause ABI changes. Some header in a package in the
> distro might start using the constants. Now everybody who includes
> that header needs to use the same -mtune option as the distro default.

If somebody writes a library with `keep_apart` in the public API/ABI then 
you're right.

> That change in the behaviour and expected use of an existing option
> seems scary to me. Even with a warning about using the constants
> (because somebody's just going to use #pragma around their use of the
> constants to disable the warning, and now the ABI impact of -mtune is
> much less obvious).

There are people who say that linking TUs compiled with different compiler 
flags is UB. In general I think that's correct, but we can make explicit 
exceptions. Up to now -mtune wouldn't lead to UB, AFAIK, though -march easily 
does. So maybe, to keep the status quo, the constants should be tied to -march 
not -mtune?

> It's much less scary in a world where the code is written and used by
> the same group of people, but for something like a linux distro it
> worries me.

The developer who wants his code to be included in a distro should care about 
binary distribution. If his code has an ABI issue, that's a bug he needs to 
fix. It's not the fault of the packager.



-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────





* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-17  8:14             ` Matthias Kretz
@ 2021-07-17 13:32               ` Jonathan Wakely via Libc-alpha
  2021-07-17 13:54                 ` Matthias Kretz
  2021-07-20 18:05                 ` Thomas Rodgers
  0 siblings, 2 replies; 21+ messages in thread
From: Jonathan Wakely via Libc-alpha @ 2021-07-17 13:32 UTC (permalink / raw)
  To: Matthias Kretz
  Cc: Richard Earnshaw (lists), GNU C Library, Jason Merrill, libstdc++,
	gcc-patches List, Jonathan Wakely

On Sat, 17 Jul 2021, 09:15 Matthias Kretz, <m.kretz@gsi.de> wrote:

> On Friday, 16 July 2021 21:58:36 CEST Jonathan Wakely wrote:
> > On Fri, 16 Jul 2021 at 20:26, Matthias Kretz <m.kretz@gsi.de> wrote:
> > > On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
> > > > On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > > > > Adjusting them based on tuning would certainly simplify a
> > > > > significant use case, perhaps the only reasonable use.  Cases more
> > > > > concerned with ABI stability probably shouldn't use them at all.
> > > > > And that would mean not needing to worry about the impossible task
> > > > > of finding the right values for an entire architecture.
> > > >
> > > > But it would be quite a significant change in behaviour if -mtune
> > > > started affecting ABI, wouldn't it?
> > >
> > > For existing code -mtune still doesn't affect ABI.
> >
> > True, because existing code isn't using the constants.
> >
> > >The users who write
> > >
> > > > struct keep_apart {
> > > >   alignas(std::hardware_destructive_interference_size) std::atomic<int> cat;
> > > >   alignas(std::hardware_destructive_interference_size) std::atomic<int> dog;
> > > > };
> > > >
> > > > *want* to have different sizeof(keep_apart) depending on the CPU the
> > > > code is compiled for. I.e. they *ask* for getting their ABI broken.
> >
> > Right, but the person who wants that and the person who chooses the
> > -mtune option might be different people.
>
> Yes. But it was the intent of the person who wrote the code that the
> person compiling the code can change the data layout of keep_apart via
> -mtune. Of course, if the one compiling doesn't want to choose because the
> binary needs to work on the widest range of systems, then there's a
> problem we might want to solve (direction of target_clones?). (Or the
> developer of the library solves it by providing the ABI for all possible
> interference_size values.)
>
> > A distro might add -mtune=core2 to all package builds by default, not
> > expecting it to cause ABI changes. Some header in a package in the
> > distro might start using the constants. Now everybody who includes
> > that header needs to use the same -mtune option as the distro default.
>
> If somebody writes a library with `keep_apart` in the public API/ABI then
> you're right.
>

Yes, it's fine if those constants don't affect anything across module
boundaries.


> > That change in the behaviour and expected use of an existing option
> > seems scary to me. Even with a warning about using the constants
> > (because somebody's just going to use #pragma around their use of the
> > constants to disable the warning, and now the ABI impact of -mtune is
> > much less obvious).
>
> There are people who say that linking TUs compiled with different compiler
> flags is UB. In general I think that's correct, but we can make explicit
> exceptions. Up to now -mtune wouldn't lead to UB, AFAIK, though -march
> easily does. So maybe, to keep the status quo, the constants should be
> tied to -march not -mtune?
>
> > It's much less scary in a world where the code is written and used by
> > the same group of people, but for something like a linux distro it
> > worries me.
>
> The developer who wants his code to be included in a distro should care
> about binary distribution. If his code has an ABI issue, that's a bug he
> needs to fix. It's not the fault of the packager.
>

Yes but in practice it's the packagers who have to deal with the bug
reports, analyze the problem, and often fix the bug too. It might not be
the packager's fault but it's often their problem :-(


* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-17 13:32               ` Jonathan Wakely via Libc-alpha
@ 2021-07-17 13:54                 ` Matthias Kretz
  2021-07-17 21:37                   ` Jason Merrill via Libc-alpha
  2021-07-20 18:05                 ` Thomas Rodgers
  1 sibling, 1 reply; 21+ messages in thread
From: Matthias Kretz @ 2021-07-17 13:54 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Richard Earnshaw (lists), GNU C Library, Jason Merrill, libstdc++,
	gcc-patches List, Jonathan Wakely

On Saturday, 17 July 2021 15:32:42 CEST Jonathan Wakely wrote:
> On Sat, 17 Jul 2021, 09:15 Matthias Kretz, <m.kretz@gsi.de> wrote:
> > If somebody writes a library with `keep_apart` in the public API/ABI then
> > you're right.
> 
> Yes, it's fine if those constants don't affect anything across module
> boundaries.

I believe a significant fraction of hardware interference size usage will be 
internal.

> > The developer who wants his code to be included in a distro should care
> > about binary distribution. If his code has an ABI issue, that's a bug he
> > needs to fix. It's not the fault of the packager.
> 
> Yes but in practice it's the packagers who have to deal with the bug
> reports, analyze the problem, and often fix the bug too. It might not be
> the packager's fault but it's often their problem 

I can imagine. But I don't think requiring users to specify the value 
according to what -mtune suggests will improve things. Users will write a 
configure/cmake/... macro to parse the value -mtune prints and pass that on 
the command line (we'll soon find this solution on SO 😜). I.e. things are 
likely to be even more broken.

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────


* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-17 13:54                 ` Matthias Kretz
@ 2021-07-17 21:37                   ` Jason Merrill via Libc-alpha
  2021-07-19  9:41                     ` Richard Earnshaw via Libc-alpha
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Merrill via Libc-alpha @ 2021-07-17 21:37 UTC (permalink / raw)
  To: Matthias Kretz
  Cc: Richard Earnshaw (lists), Jonathan Wakely, GNU C Library,
	libstdc++, gcc-patches List, Jonathan Wakely

On Sat, Jul 17, 2021 at 6:55 AM Matthias Kretz <m.kretz@gsi.de> wrote:

> On Saturday, 17 July 2021 15:32:42 CEST Jonathan Wakely wrote:
> > On Sat, 17 Jul 2021, 09:15 Matthias Kretz, <m.kretz@gsi.de> wrote:
> > > If somebody writes a library with `keep_apart` in the public API/ABI
> > > then you're right.
> >
> > Yes, it's fine if those constants don't affect anything across module
> > boundaries.
>
> I believe a significant fraction of hardware interference size usage will
> be internal.
>

I would hope for this to be the vast majority of usage.  I want the warning
to discourage people from using the interference size variables in the
public API of a library.
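
As an illustration of what "internal" use might look like, here is a
hedged sketch in which the padded layout never appears in the public
interface. The class name counters, its members, and the constant pad are
hypothetical; the fallback of 64 when the feature-test macro is missing is
likewise an assumption. In a real library the two halves would live in a
header and a .cpp file respectively.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// ---- public header: the interference size is not visible here ----
class counters {
 public:
  counters();
  ~counters();
  void bump_cat();
  void bump_dog();
  int cat() const;
  int dog() const;

 private:
  struct impl;               // layout known only inside the library
  std::unique_ptr<impl> p_;  // sizeof(counters) does not depend on tuning
};

// ---- library implementation file ----
namespace {
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t pad = std::hardware_destructive_interference_size;
#else
constexpr std::size_t pad = 64;  // assumed fallback
#endif
}  // namespace

struct counters::impl {
  alignas(pad) std::atomic<int> cat{0};
  alignas(pad) std::atomic<int> dog{0};
};

// C++17 aligned operator new takes care of the over-aligned allocation.
counters::counters() : p_(std::make_unique<impl>()) {}
counters::~counters() = default;
void counters::bump_cat() { p_->cat.fetch_add(1, std::memory_order_relaxed); }
void counters::bump_dog() { p_->dog.fetch_add(1, std::memory_order_relaxed); }
int counters::cat() const { return p_->cat.load(std::memory_order_relaxed); }
int counters::dog() const { return p_->dog.load(std::memory_order_relaxed); }
```

Callers only ever see a pointer-sized object, so rebuilding the library
with a different interference size changes performance but not the
interface that other packages compile against.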


> > > The developer who wants his code to be included in a distro should care
> > > about binary distribution. If his code has an ABI issue, that's a bug he
> > > needs to fix. It's not the fault of the packager.
> >
> > Yes but in practice it's the packagers who have to deal with the bug
> > reports, analyze the problem, and often fix the bug too. It might not be
> > the packager's fault but it's often their problem
>
> I can imagine. But I don't think requiring users to specify the value
> according to what -mtune suggests will improve things. Users will write a
> configure/cmake/... macro to parse the value -mtune prints and pass that
> on the command line (we'll soon find this solution on SO 😜). I.e. things
> are likely to be even more broken.


Simpler would be a flag to say "set them based on -mtune", e.g.
-finterference-tuning or --param destructive-interference-size=tuning.
That would be just as easy to write as -Wno-interference-size.

Jason


* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-17 21:37                   ` Jason Merrill via Libc-alpha
@ 2021-07-19  9:41                     ` Richard Earnshaw via Libc-alpha
  2021-07-20 16:43                       ` Jason Merrill via Libc-alpha
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Earnshaw via Libc-alpha @ 2021-07-19  9:41 UTC (permalink / raw)
  To: Jason Merrill, Matthias Kretz
  Cc: Richard Earnshaw (lists), Jonathan Wakely, GNU C Library,
	libstdc++, gcc-patches List, Jonathan Wakely



On 17/07/2021 22:37, Jason Merrill via Gcc-patches wrote:
> On Sat, Jul 17, 2021 at 6:55 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> 
>> On Saturday, 17 July 2021 15:32:42 CEST Jonathan Wakely wrote:
>>> On Sat, 17 Jul 2021, 09:15 Matthias Kretz, <m.kretz@gsi.de> wrote:
>>>> If somebody writes a library with `keep_apart` in the public API/ABI
>>>> then you're right.
>>>
>>> Yes, it's fine if those constants don't affect anything across module
>>> boundaries.
>>
>> I believe a significant fraction of hardware interference size usage will
>> be internal.
>>
> 
> I would hope for this to be the vast majority of usage.  I want the warning
> to discourage people from using the interference size variables in the
> public API of a library.
> 
> 
>>>> The developer who wants his code to be included in a distro should care
>>>> about binary distribution. If his code has an ABI issue, that's a bug he
>>>> needs to fix. It's not the fault of the packager.
>>>
>>> Yes but in practice it's the packagers who have to deal with the bug
>>> reports, analyze the problem, and often fix the bug too. It might not be
>>> the packager's fault but it's often their problem
>>
>> I can imagine. But I don't think requiring users to specify the value
>> according to what -mtune suggests will improve things. Users will write a
>> configure/cmake/... macro to parse the value -mtune prints and pass that
>> on the command line (we'll soon find this solution on SO 😜). I.e. things
>> are likely to be even more broken.
>
>
> Simpler would be a flag to say "set them based on -mtune", e.g.
> -finterference-tuning or --param destructive-interference-size=tuning.
> That would be just as easy to write as -Wno-interference-size.
> 
> Jason
> 

Please be very careful about an option name like that.  The x86 meaning 
and interpretation of -mtune is subtly different to that of Arm and 
AArch64 and possibly other targets as well.

Also, should the behaviour of a compiler configured with --with-cpu=foo 
be handled differently to a command-line option that sets foo 
explicitly?  In the back-end I'm not sure we can really tell the difference.

R.


* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-19  9:41                     ` Richard Earnshaw via Libc-alpha
@ 2021-07-20 16:43                       ` Jason Merrill via Libc-alpha
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Merrill via Libc-alpha @ 2021-07-20 16:43 UTC (permalink / raw)
  To: Richard Earnshaw, Matthias Kretz
  Cc: Richard Earnshaw (lists), Jonathan Wakely, GNU C Library,
	libstdc++, gcc-patches List, Jonathan Wakely

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

On 7/19/21 5:41 AM, Richard Earnshaw wrote:
> 
> 
> On 17/07/2021 22:37, Jason Merrill via Gcc-patches wrote:
>> On Sat, Jul 17, 2021 at 6:55 AM Matthias Kretz <m.kretz@gsi.de> wrote:
>>
>>> On Saturday, 17 July 2021 15:32:42 CEST Jonathan Wakely wrote:
>>>> On Sat, 17 Jul 2021, 09:15 Matthias Kretz, <m.kretz@gsi.de> wrote:
>>>>> If somebody writes a library with `keep_apart` in the public API/ABI
>>>>> then you're right.
>>>>
>>>> Yes, it's fine if those constants don't affect anything across module
>>>> boundaries.
>>>
>>> I believe a significant fraction of hardware interference size usage
>>> will be internal.
>>>
>>
>> I would hope for this to be the vast majority of usage.  I want the
>> warning to discourage people from using the interference size variables
>> in the public API of a library.
>>
>>
>>>>> The developer who wants his code to be included in a distro should care
>>>>> about binary distribution. If his code has an ABI issue, that's a bug he
>>>>> needs to fix. It's not the fault of the packager.
>>>>
>>>> Yes but in practice it's the packagers who have to deal with the bug
>>>> reports, analyze the problem, and often fix the bug too. It might not be
>>>> the packager's fault but it's often their problem
>>>
>>> I can imagine. But I don't think requiring users to specify the value
>>> according to what -mtune suggests will improve things. Users will write a
>>> configure/cmake/... macro to parse the value -mtune prints and pass that
>>> on the command line (we'll soon find this solution on SO 😜). I.e. things
>>> are likely to be even more broken.
>>
>>
>> Simpler would be a flag to say "set them based on -mtune", e.g.
>> -finterference-tuning or --param destructive-interference-size=tuning.
>> That would be just as easy to write as -Wno-interference-size.
> 
> Please be very careful about an option name like that.  The x86 meaning 
> and interpretation of -mtune is subtly different to that of Arm and 
> AArch64 and possibly other targets as well.
> 
> Also, should the behaviour of a compiler configured with --with-cpu=foo
> be handled differently to a command-line option that sets foo
> explicitly?  In the back-end I'm not sure we can really tell the
> difference.

I don't see any reason to treat them differently.  The meaning of this 
option would be "set the interference sizes to be optimal for the 
current target CPU, without regard for ABI stability".  For x86 this 
wouldn't have any effect; for Arm/AArch64 it would set them to the 
tuning L1 cache line size, if set.
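
To see what a given set of options ends up selecting, one could inspect
the two predefined macros this patch adds (__GCC_DESTRUCTIVE_SIZE and
__GCC_CONSTRUCTIVE_SIZE, per the c-cppbuiltin.c ChangeLog entry). The
sketch below assumes those macros keep these names and are simply left
undefined on targets that do not set the params. The struct, the helper
functions, and the 64/64 fallback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct interference_sizes {
  std::size_t destructive;
  std::size_t constructive;
  bool from_compiler;  // false when the fallback below was used
};

// Report the values the compiler chose for this translation unit.
constexpr interference_sizes query_interference_sizes() {
#if defined(__GCC_DESTRUCTIVE_SIZE) && defined(__GCC_CONSTRUCTIVE_SIZE)
  return {__GCC_DESTRUCTIVE_SIZE, __GCC_CONSTRUCTIVE_SIZE, true};
#else
  return {64, 64, false};  // assumed fallback, not a value from the patch
#endif
}

// Alignments must be powers of two to be usable with alignas.
constexpr bool is_pow2(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// For a generic target the thread expects the destructive size to be the
// large end of the range and the constructive size the small end.
constexpr bool plausible(const interference_sizes& s) {
  return is_pow2(s.destructive) && is_pow2(s.constructive) &&
         s.destructive >= s.constructive;
}
```

Compiling the same file with different tuning options and comparing the
reported pair would show directly whether an option has any effect on a
given target.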

Here's what I have currently:

Jason

[-- Attachment #2: 0001-c-implement-C-17-hardware-interference-size.patch --]
[-- Type: text/x-patch, Size: 21657 bytes --]

From b10bfd228f23ef2f7499802c8fd1c84798646039 Mon Sep 17 00:00:00 2001
From: Jason Merrill <jason@redhat.com>
Date: Thu, 15 Jul 2021 15:30:17 -0400
Subject: [PATCH] c++: implement C++17 hardware interference size
To: gcc-patches@gcc.gnu.org

The last missing piece of the C++17 standard library is the hardware
interference size constants.  Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.

In principle, both of these values should be the same as the target's L1
cache line size.  When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.

JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74

I implement this by adding new --params for the two sizes.  Targets need to
override these values in targetm.target_option.override() to support the
feature.

64 bytes still seems correct for the x86 family.

I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
target, so I'd think 32/64 would make more sense.

He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
line, I've changed that to 64/256.  Does that seem right?

Currently the patch does not adjust the values based on -march, as in JF's
proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
how to go about that.  --param l1-cache-line-size is set based on -mtune,
but I don't think we want -mtune to change these ABI-affecting values.  Are
there -march values for which a smaller range than 64-256 makes sense?

gcc/ChangeLog:

	* params.opt: Add destructive-interference-size and
	constructive-interference-size.
	* doc/invoke.texi: Document them.
	* config/aarch64/aarch64.c (aarch64_override_options_internal):
	Set them.
	* config/arm/arm.c (arm_option_override): Set them.
	* config/i386/i386-options.c (ix86_option_override_internal):
	Set them.

gcc/c-family/ChangeLog:

	* c.opt: Add -Winterference-size.
	* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
	and __GCC_CONSTRUCTIVE_SIZE.

gcc/cp/ChangeLog:

	* decl.c (cxx_init_decl_processing): Check
	--param *-interference-size values.

libstdc++-v3/ChangeLog:

	* include/std/version: Define __cpp_lib_hardware_interference_size.
	* libsupc++/new: Define hardware interference size variables.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/interference.C: New test.
	* g++.target/arm/interference.C: New test.
	* g++.target/i386/interference.C: New test.
---
 gcc/doc/invoke.texi                           | 61 +++++++++++++++++++
 gcc/c-family/c.opt                            |  5 ++
 gcc/common.opt                                |  4 ++
 gcc/params.opt                                | 15 +++++
 gcc/c-family/c-cppbuiltin.c                   | 12 ++++
 gcc/config/aarch64/aarch64.c                  | 23 +++++++
 gcc/config/arm/arm.c                          | 20 ++++++
 gcc/config/i386/i386-options.c                |  6 ++
 gcc/cp/constexpr.c                            | 39 ++++++++++++
 gcc/cp/decl.c                                 | 23 +++++++
 .../g++.target/aarch64/interference.C         |  9 +++
 gcc/testsuite/g++.target/arm/interference.C   |  9 +++
 gcc/testsuite/g++.target/i386/interference.C  |  8 +++
 libstdc++-v3/include/std/version              |  3 +
 libstdc++-v3/libsupc++/new                    | 10 ++-
 15 files changed, 245 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
 create mode 100644 gcc/testsuite/g++.target/arm/interference.C
 create mode 100644 gcc/testsuite/g++.target/i386/interference.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32697e6117c..6e7bc43c4cb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8992,6 +8992,27 @@ that has already been done in the current function.  Therefore,
 seemingly insignificant changes in the source program can cause the
 warnings produced by @option{-Winline} to appear or disappear.
 
+@item -Winterference-size
+@opindex Winterference-size
+Warn about use of C++17 @code{std::hardware_destructive_interference_size}
+without specifying its value with @option{--param destructive-interference-size}.
+Also warn about questionable values for that option.
+
+The ideal value for the variable depends on how widely the code being
+compiled will be deployed, and how important its ABI stability is.
+
+If performance on a specific CPU is most important, you probably want
+to use @option{-finterference-tune}.
+
+If ABI stability is important, such as if the use is in a header for a
+library, you should probably not use the hardware interference size
+variables at all.
+
+If neither of these applies to your code, i.e. the use does not affect
+ABI outside your project and you want to optimize for a generic
+target, you can turn off the warning with
+@option{-Wno-interference-size}.
+
 @item -Wint-in-bool-context
 @opindex Wint-in-bool-context
 @opindex Wno-int-in-bool-context
@@ -10472,6 +10493,23 @@ and treated equal to @option{-ffp-contract=off}.
 
 The default is @option{-ffp-contract=fast}.
 
+@item -finterference-tune
+@opindex finterference-tune
+Set @option{--param destructive-interference-size} and @option{--param
+constructive-interference-size} based on the current @option{-mtune}
+option, typically to the L1 cache line size for the particular target
+CPU, sometimes to a range if tuning for a generic target.
+
+With this option, all translation units that depend on ABI
+compatibility for the use of these variables must be compiled with
+this option, and the same @option{-mtune} (or @option{-mcpu}).
+
+Use of the C++17 hardware interference size variables in a context for
+which ABI stability is important is always dangerous, but even more so
+with this option.
+
+See also @option{-Winterference-size}.
+
 @item -fomit-frame-pointer
 @opindex fomit-frame-pointer
 Omit the frame pointer in functions that don't need one.  This avoids the
@@ -13873,6 +13911,29 @@ prefetch hints can be issued for any constant stride.
 
 This setting is only useful for strides that are known and constant.
 
+@item destructive-interference-size
+@item constructive-interference-size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}.  The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together.  Typically both will be the size of an L1 cache
+line for the target, in bytes.  For a generic target covering a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+These values, particularly the destructive size, are intended to be
+used for layout, and thus have ABI impact.  The default values are not
+guaranteed to be stable, so use of these variables in a context where
+ABI stability is important, such as the public interface of a library,
+is strongly discouraged; if they are used in that context, users can
+stabilize the values using these options.
+
+See also @option{-finterference-tune} and @option{-Winterference-size}.
+
 @item loop-interchange-max-num-stmts
 The maximum number of stmts in a loop to be interchanged.
 
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 91929706aff..0398faf430a 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -722,6 +722,11 @@ Winit-list-lifetime
 C++ ObjC++ Var(warn_init_list) Warning Init(1)
 Warn about uses of std::initializer_list that can result in dangling pointers.
 
+Winterference-size
+C++ ObjC++ Var(warn_interference_size) Warning Init(1)
+Warn about nonsensical values of --param destructive-interference-size or
+constructive-interference-size.
+
 Wimplicit
 C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
 Warn about implicit declarations.
diff --git a/gcc/common.opt b/gcc/common.opt
index d9da1131eda..58f1f48c39b 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1835,6 +1835,10 @@ finstrument-functions-exclude-file-list=
 Common RejectNegative Joined
 -finstrument-functions-exclude-file-list=filename,...	Do not instrument functions listed in files.
 
+finterference-tune
+Common Var(flag_interference_tune)
+Set C++17 hardware interference size variables based on the current CPU tuning.
+
 fipa-cp
 Common Var(flag_ipa_cp) Optimization
 Perform interprocedural constant propagation.
diff --git a/gcc/params.opt b/gcc/params.opt
index 92b003e38cb..a81a3ec82f1 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -358,6 +358,21 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
 Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
 The size of L1 cache line.
 
+-param=destructive-interference-size=
+Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
+The minimum recommended offset between two concurrently-accessed objects to
+avoid additional performance degradation due to contention introduced by the
+implementation.  Typically the L1 cache line size, but can be larger to
+accommodate a variety of target processors with different cache line sizes.
+C++17 code might use this value in structure layout.
+
+-param=constructive-interference-size=
+Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
+The maximum recommended size of contiguous memory occupied by two objects
+accessed with temporal locality by concurrent threads.  Typically the L1 cache
+line size, but can be smaller to accommodate a variety of target processors with
+different cache line sizes.
+
 -param=l1-cache-size=
 Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
 The size of L1 cache.
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index f79f939bd10..a7bf2544533 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,18 @@ cpp_atomic_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
 				 targetm.atomic_test_and_set_trueval);
 
+  /* Macros for C++17 hardware interference size constants.  Either both or
+     neither should be set.  */
+  gcc_assert (!param_destruct_interfere_size
+	      == !param_construct_interfere_size);
+  if (param_destruct_interfere_size)
+    {
+      builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
+				     param_destruct_interfere_size);
+      builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
+				     param_construct_interfere_size);
+    }
+
   /* ptr_type_node can't be used here since ptr_mode is only set when
      toplev calls backend_init which is not done with -E  or pch.  */
   psize = POINTER_SIZE_UNITS;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3bdf19d71b5..c244da98786 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16297,6 +16297,29 @@ aarch64_override_options_internal (struct gcc_options *opts)
     SET_OPTION_IF_UNSET (opts, &global_options_set,
 			 param_l1_cache_line_size,
 			 aarch64_tune_params.prefetch->l1_cache_line_size);
+
+  if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0
+      && flag_interference_tune)
+    {
+      SET_OPTION_IF_UNSET (opts, &global_options_set,
+			   param_destruct_interfere_size,
+			   aarch64_tune_params.prefetch->l1_cache_line_size);
+      SET_OPTION_IF_UNSET (opts, &global_options_set,
+			   param_construct_interfere_size,
+			   aarch64_tune_params.prefetch->l1_cache_line_size);
+    }
+  else
+    {
+      /* For a generic AArch64 target, cover the current range of cache line
+	 sizes.  */
+      SET_OPTION_IF_UNSET (opts, &global_options_set,
+			   param_destruct_interfere_size,
+			   256);
+      SET_OPTION_IF_UNSET (opts, &global_options_set,
+			   param_construct_interfere_size,
+			   64);
+    }
+
   if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
     SET_OPTION_IF_UNSET (opts, &global_options_set,
 			 param_l2_cache_size,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6d781e23ee9..9a651f0c599 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3656,6 +3656,26 @@ arm_option_override (void)
     SET_OPTION_IF_UNSET (&global_options, &global_options_set,
 			 param_l1_cache_line_size,
 			 current_tune->prefetch.l1_cache_line_size);
+  if (current_tune->prefetch.l1_cache_line_size >= 0
+      && flag_interference_tune)
+    {
+      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+			   param_destruct_interfere_size,
+			   current_tune->prefetch.l1_cache_line_size);
+      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+			   param_construct_interfere_size,
+			   current_tune->prefetch.l1_cache_line_size);
+    }
+  else
+    {
+      /* For a generic ARM target, JF Bastien proposed using 64 for both.
+	 ??? Why not 32 for constructive?  */
+      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+			   param_destruct_interfere_size, 64);
+      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+			   param_construct_interfere_size, 64);
+    }
+
   if (current_tune->prefetch.l1_cache_size >= 0)
     SET_OPTION_IF_UNSET (&global_options, &global_options_set,
 			 param_l1_cache_size,
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 3416a4f1752..2f9da1a2a28 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2571,6 +2571,12 @@ ix86_option_override_internal (bool main_args_p,
   SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
 		       ix86_tune_cost->l2_cache_size);
 
+  /* 64B is the accepted value for these for all x86.  */
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+		       param_destruct_interfere_size, 64);
+  SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+		       param_construct_interfere_size, 64);
+
   /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
   if (opts->x_flag_prefetch_loop_arrays < 0
       && HAVE_prefetch
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index f0b8d252d6b..f47701fca39 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6051,6 +6051,43 @@ inline_asm_in_constexpr_error (location_t loc)
 	  "%<constexpr%> function in C++20");
 }
 
+/* We're getting the constant value of DECL in a manifestly constant-evaluated
+   context; maybe complain about that.  */
+
+static void
+maybe_warn_about_constant_value (location_t loc, tree decl)
+{
+  if (cxx_dialect >= cxx17
+      && warn_interference_size
+      && !flag_interference_tune
+      && DECL_CONTEXT (decl) == std_node
+      && !strncmp (IDENTIFIER_POINTER (DECL_NAME (decl)),
+		   "hardware_", 9))
+    {
+      static bool warned = false;
+      if (id_equal (DECL_NAME (decl),
+			 "hardware_destructive_interference_size"))
+	{
+	  if (!global_options_set.x_param_destruct_interfere_size
+	      && warning_at (loc, OPT_Winterference_size, "use of %qD", decl)
+	      && !warned)
+	    {
+	      warned = true;
+	      inform (loc, "if this use is part of a public ABI, change it to "
+		      "use a constant or a different variable to protect "
+		      "against changes in the default");
+	      inform (loc, "the default value for this target is %d bytes.",
+		      param_destruct_interfere_size);
+	      inform (loc, "L1 cache line size from %<-mtune%> is %d bytes.",
+		      param_l1_cache_line_size);
+	      if (param_destruct_interfere_size != param_l1_cache_line_size)
+		inform (loc, "to adjust the interference size to match, use "
+			"%<-finterference-tune%>");
+	    }
+	}
+    }
+}
+
 /* Attempt to reduce the expression T to a constant value.
    On failure, issue diagnostic and return error_mark_node.  */
 /* FIXME unify with c_fully_fold */
@@ -6195,6 +6232,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 	      r = *p;
 	      break;
 	    }
+      if (ctx->manifestly_const_eval)
+	maybe_warn_about_constant_value (loc, t);
       if (COMPLETE_TYPE_P (TREE_TYPE (t))
 	  && is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false))
 	{
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 01d64a16125..880fb8948a9 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4732,6 +4732,29 @@ cxx_init_decl_processing (void)
   /* Show we use EH for cleanups.  */
   if (flag_exceptions)
     using_eh_for_cleanups ();
+
+  /* Check that the hardware interference sizes are at least
+     alignof(max_align_t), as required by the standard.  */
+  if (param_destruct_interfere_size)
+    {
+      int max_align = max_align_t_align () / BITS_PER_UNIT;
+      if (param_destruct_interfere_size < max_align)
+	error ("%<--param destructive-interference-size=%d%> is less than "
+	       "%d", param_destruct_interfere_size, max_align);
+      else if (param_destruct_interfere_size < param_l1_cache_line_size)
+	warning (OPT_Winterference_size,
+		 "%<--param destructive-interference-size=%d%> "
+		 "is less than %<--param l1-cache-line-size=%d%>",
+		 param_destruct_interfere_size, param_l1_cache_line_size);
+      if (param_construct_interfere_size < max_align)
+	error ("%<--param constructive-interference-size=%d%> is less than "
+	       "%d", param_construct_interfere_size, max_align);
+      else if (param_construct_interfere_size > param_l1_cache_line_size)
+	warning (OPT_Winterference_size,
+		 "%<--param constructive-interference-size=%d%> "
+		 "is greater than %<--param l1-cache-line-size=%d%>",
+		 param_construct_interfere_size, param_l1_cache_line_size);
+    }
 }
 
 /* Enter an abi node in global-module context.  returns a cookie to
diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
new file mode 100644
index 00000000000..0fc01655223
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
+// 128 or even 256.
+static_assert(std::hardware_destructive_interference_size == 256);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
new file mode 100644
index 00000000000..34fe8a52bff
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Recent ARM CPUs have a cache line size of 64.  Older ones have
+// a size of 32, but I guess they're old enough that we don't care?
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
new file mode 100644
index 00000000000..c7b910e3ada
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/interference.C
@@ -0,0 +1,8 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// It is generally agreed that these are the right values for all x86.
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 27bcd32cb60..d5e155db48b 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -140,6 +140,9 @@
 #define __cpp_lib_filesystem 201703
 #define __cpp_lib_gcd 201606
 #define __cpp_lib_gcd_lcm 201606
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+#endif
 #define __cpp_lib_hypot 201603
 #define __cpp_lib_invoke 201411L
 #define __cpp_lib_lcm 201606
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..7bc67a6cb02 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
 } // extern "C++"
 
 #if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
 namespace std
 {
+#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
 #define __cpp_lib_launder 201606
   /// Pointer optimization barrier [ptr.launder]
   template<typename _Tp>
@@ -205,8 +205,14 @@ namespace std
   void launder(const void*) = delete;
   void launder(volatile void*) = delete;
   void launder(const volatile void*) = delete;
-}
 #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
+
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+  inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
+  inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
+#endif // __GCC_DESTRUCTIVE_SIZE
+}
 #endif // C++17
 
 #if __cplusplus > 201703L
-- 
2.27.0
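
For readers who want to try the constants from user code, here is a minimal
sketch of how they might be consumed once this patch is applied.  It assumes
C++17 and relies on the __cpp_lib_hardware_interference_size feature-test
macro that the patch defines in <new>.  The names destructive_size,
constructive_size, counters and point are hypothetical and not part of the
patch, and the fallback of 64 is only a guess that matches the x86 value
above.

```cpp
#include <atomic>
#include <cstddef>
#include <new>  // declares the interference-size constants when available

// Hypothetical project-local names, not part of the patch.  Use the library
// constants when the feature-test macro says they exist; otherwise fall back
// to 64, which is an assumption that merely matches the patch's x86 value.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t destructive_size =
    std::hardware_destructive_interference_size;
inline constexpr std::size_t constructive_size =
    std::hardware_constructive_interference_size;
#else
inline constexpr std::size_t destructive_size = 64;
inline constexpr std::size_t constructive_size = 64;
#endif

// Two counters written by different threads: place each one at its own
// destructive-size boundary so that they do not share a cache line.
struct counters {
  alignas(destructive_size) std::atomic<int> produced{0};
  alignas(destructive_size) std::atomic<int> consumed{0};
};

// Two fields that are read together: check that they fit within the
// constructive size so that they can share a cache line.
struct point {
  int x;
  int y;
};
static_assert(sizeof(point) <= constructive_size);
```

Note that sizeof(counters) follows whatever value the constants take, which
is exactly the ABI dependence that -Winterference-size is meant to flag.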



* Re: [PATCH] c++: implement C++17 hardware interference size
  2021-07-17 13:32               ` Jonathan Wakely via Libc-alpha
  2021-07-17 13:54                 ` Matthias Kretz
@ 2021-07-20 18:05                 ` Thomas Rodgers
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Rodgers @ 2021-07-20 18:05 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Richard Earnshaw (lists), Matthias Kretz, GNU C Library,
	libstdc++, gcc-patches List

On 2021-07-17 06:32, Jonathan Wakely via Gcc-patches wrote:

> On Sat, 17 Jul 2021, 09:15 Matthias Kretz, <m.kretz@gsi.de> wrote:
>> On Friday, 16 July 2021 21:58:36 CEST Jonathan Wakely wrote:
>>> On Fri, 16 Jul 2021 at 20:26, Matthias Kretz <m.kretz@gsi.de> wrote:
>>>> On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
>>>>> On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
>>>>>> Adjusting them based on tuning would certainly simplify a
>>>>>> significant use case, perhaps the only reasonable use.  Cases more
>>>>>> concerned with ABI stability probably shouldn't use them at all.
>>>>>> And that would mean not needing to worry about the impossible task
>>>>>> of finding the right values for an entire architecture.
>>>>>
>>>>> But it would be quite a significant change in behaviour if -mtune
>>>>> started affecting ABI, wouldn't it?
>>>>
>>>> For existing code -mtune still doesn't affect ABI.
>>>
>>> True, because existing code isn't using the constants.
>>>
>>>> The users who write
>>>>
>>>> struct keep_apart {
>>>>   alignas(std::hardware_destructive_interference_size) std::atomic<int> cat;
>>>>   alignas(std::hardware_destructive_interference_size) std::atomic<int> dog;
>>>> };
>>>>
>>>> *want* to have different sizeof(keep_apart) depending on the CPU the
>>>> code is compiled for. I.e. they *ask* for getting their ABI broken.
>>>
>>> Right, but the person who wants that and the person who chooses the
>>> -mtune option might be different people.
>>
>> Yes. But it was the intent of the person who wrote the code that the
>> person compiling the code can change the data layout of keep_apart via
>> -mtune. Of course, if the one compiling doesn't want to choose because
>> the binary needs to work on the widest range of systems, then there's
>> a problem we might want to solve (direction of target_clones?). (Or the
>> developer of the library solves it by providing the ABI for all
>> possible interference_size values.)
>>
>>> A distro might add -mtune=core2 to all package builds by default, not
>>> expecting it to cause ABI changes. Some header in a package in the
>>> distro might start using the constants. Now everybody who includes
>>> that header needs to use the same -mtune option as the distro default.
>>
>> If somebody writes a library with `keep_apart` in the public API/ABI
>> then you're right.
>
> Yes, it's fine if those constants don't affect anything across module
> boundaries.
>
>>> That change in the behaviour and expected use of an existing option
>>> seems scary to me. Even with a warning about using the constants
>>> (because somebody's just going to use #pragma around their use of the
>>> constants to disable the warning, and now the ABI impact of -mtune is
>>> much less obvious).
>>
>> There are people who say that linking TUs compiled with different
>> compiler flags is UB. In general I think that's correct, but we can
>> make explicit exceptions. Up to now -mtune wouldn't lead to UB, AFAIK,
>> though -march easily does. So maybe, to keep the status quo, the
>> constants should be tied to -march not -mtune?
>>
>>> It's much less scary in a world where the code is written and used by
>>> the same group of people, but for something like a linux distro it
>>> worries me.
>>
>> The developer who wants his code to be included in a distro should care
>> about binary distribution. If his code has an ABI issue, that's a bug he
>> needs to fix. It's not the fault of the packager.
>
> Yes but in practice it's the packagers who have to deal with the bug
> reports, analyze the problem, and often fix the bug too. It might not be
> the packager's fault but it's often their problem :-(

Apropos of nothing, I can absolutely see the use of this creeping into 
Boost at some point.
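
To make the ABI concern in the quoted discussion concrete, here is a small
sketch (an illustration under stated assumptions, not code from the patch).
The keep_apart example is turned into a hypothetical template on the
alignment so that the layouts produced by two different interference sizes
can be compared within one translation unit.  The values 64 and 256 are the
x86 and generic AArch64 destructive sizes from the patch; the alias names
are made up for illustration.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical variant of keep_apart, parameterised on the alignment that
// std::hardware_destructive_interference_size would otherwise supply.
template <std::size_t Align>
struct keep_apart {
  alignas(Align) std::atomic<int> cat;
  alignas(Align) std::atomic<int> dog;
};

// Layout when the destructive size is 64 (the patch's x86 value).
using keep_apart_64 = keep_apart<64>;
// Layout when the destructive size is 256 (the patch's generic AArch64 value).
using keep_apart_256 = keep_apart<256>;

// Each member starts on its own Align boundary, so the struct occupies two
// full alignment units: the size, and hence the ABI, follows the constant.
static_assert(sizeof(keep_apart_64) == 128);
static_assert(sizeof(keep_apart_256) == 512);
```

A library that exposes such a type in a public header therefore gets a
different layout for every value of the constant, which is why the thread
suggests using a fixed constant for anything that crosses a module boundary.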


end of thread, other threads:[~2021-07-20 18:06 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210716023656.670004-1-jason@redhat.com>
2021-07-16  2:41 ` [PATCH] c++: implement C++17 hardware interference size Jason Merrill via Libc-alpha
2021-07-16  2:48   ` Noah Goldstein via Libc-alpha
2021-07-16 11:17     ` Jonathan Wakely via Libc-alpha
2021-07-16 13:27       ` Richard Earnshaw via Libc-alpha
2021-07-16 13:26   ` Jonathan Wakely via Libc-alpha
2021-07-16 15:12   ` Matthias Kretz
2021-07-16 15:30     ` Jason Merrill via Libc-alpha
2021-07-16 16:54       ` Jonathan Wakely via Libc-alpha
2021-07-16 18:43         ` Jason Merrill via Libc-alpha
2021-07-16 19:26         ` Matthias Kretz
2021-07-16 19:58           ` Jonathan Wakely via Libc-alpha
2021-07-17  8:14             ` Matthias Kretz
2021-07-17 13:32               ` Jonathan Wakely via Libc-alpha
2021-07-17 13:54                 ` Matthias Kretz
2021-07-17 21:37                   ` Jason Merrill via Libc-alpha
2021-07-19  9:41                     ` Richard Earnshaw via Libc-alpha
2021-07-20 16:43                       ` Jason Merrill via Libc-alpha
2021-07-20 18:05                 ` Thomas Rodgers
2021-07-16 17:20     ` Noah Goldstein via Libc-alpha
2021-07-16 19:37       ` Matthias Kretz
2021-07-16 21:23         ` Noah Goldstein via Libc-alpha
