unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: christoph.muellner@vrull.eu, libc-alpha@sourceware.org,
	Darius Rad <darius@bluespec.com>,
	Andrew Waterman <andrew@sifive.com>,
	philipp.tomsich@vrull.eu, Evan Green <evan@rivosinc.com>,
	DJ Delorie <dj@redhat.com>, Vineet Gupta <vineetg@rivosinc.com>,
	kito.cheng@sifive.com, jeffreyalaw@gmail.com
Subject: Re: [PATCH 2/7] RISC-V: Add Zbb optimized memchr as ifunc
Date: Tue, 30 Apr 2024 14:45:12 -0300	[thread overview]
Message-ID: <b6228914-df12-4dfe-b222-e4038641fef8@linaro.org> (raw)
In-Reply-To: <mhng-b337be0b-322d-49dd-9941-54790eb0800e@palmer-ri-x1c9a>



On 30/04/24 12:13, Palmer Dabbelt wrote:
> On Wed, 24 Apr 2024 06:36:43 PDT (-0700), adhemerval.zanella@linaro.org wrote:
>>
>>
>> On 24/04/24 10:16, Christoph Müllner wrote:
>>> On Wed, Apr 24, 2024 at 2:53 PM Adhemerval Zanella Netto
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 22/04/24 04:43, Christoph Müllner wrote:
>>>>> When building with Zbb enabled, memchr benefits from using orc.b in
>>>>> find_zero_all().  This patch changes the build system such, that a
>>>>> non-Zbb version as well as a Zbb version of this routine is built.
>>>>> Further, a ifunc resolver is provided that selects the right routine
>>>>> based on the outcome of extension probing via hwprobe().
>>>>>
>>>>> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
>>>>> ---
>>>>>  sysdeps/riscv/multiarch/memchr-generic.c      | 26 +++++++++
>>>>>  sysdeps/riscv/multiarch/memchr-zbb.c          | 30 ++++++++++
>>>>>  .../unix/sysv/linux/riscv/multiarch/Makefile  |  3 +
>>>>>  .../linux/riscv/multiarch/ifunc-impl-list.c   | 31 ++++++++--
>>>>>  .../unix/sysv/linux/riscv/multiarch/memchr.c  | 57 +++++++++++++++++++
>>>>>  5 files changed, 142 insertions(+), 5 deletions(-)
>>>>>  create mode 100644 sysdeps/riscv/multiarch/memchr-generic.c
>>>>>  create mode 100644 sysdeps/riscv/multiarch/memchr-zbb.c
>>>>>  create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/memchr.c
>>>>>
>>>>> diff --git a/sysdeps/riscv/multiarch/memchr-generic.c b/sysdeps/riscv/multiarch/memchr-generic.c
>>>>> new file mode 100644
>>>>> index 0000000000..a96c36398b
>>>>> --- /dev/null
>>>>> +++ b/sysdeps/riscv/multiarch/memchr-generic.c
>>>>> @@ -0,0 +1,26 @@
>>>>> +/* Re-include the default memchr implementation.
>>>>> +   Copyright (C) 2024 Free Software Foundation, Inc.
>>>>> +   This file is part of the GNU C Library.
>>>>> +
>>>>> +   The GNU C Library is free software; you can redistribute it and/or
>>>>> +   modify it under the terms of the GNU Lesser General Public
>>>>> +   License as published by the Free Software Foundation; either
>>>>> +   version 2.1 of the License, or (at your option) any later version.
>>>>> +
>>>>> +   The GNU C Library is distributed in the hope that it will be useful,
>>>>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>>> +   Lesser General Public License for more details.
>>>>> +
>>>>> +   You should have received a copy of the GNU Lesser General Public
>>>>> +   License along with the GNU C Library; if not, see
>>>>> +   <https://www.gnu.org/licenses/>.  */
>>>>> +
>>>>> +#include <string.h>
>>>>> +
>>>>> +#if IS_IN(libc)
>>>>> +# define MEMCHR __memchr_generic
>>>>> +# undef libc_hidden_builtin_def
>>>>> +# define libc_hidden_builtin_def(x)
>>>>> +#endif
>>>>> +#include <string/memchr.c>
>>>>> diff --git a/sysdeps/riscv/multiarch/memchr-zbb.c b/sysdeps/riscv/multiarch/memchr-zbb.c
>>>>> new file mode 100644
>>>>> index 0000000000..bead0335ae
>>>>> --- /dev/null
>>>>> +++ b/sysdeps/riscv/multiarch/memchr-zbb.c
>>>>> @@ -0,0 +1,30 @@
>>>>> +/* Re-include the default memchr implementation for Zbb.
>>>>> +   Copyright (C) 2024 Free Software Foundation, Inc.
>>>>> +   This file is part of the GNU C Library.
>>>>> +
>>>>> +   The GNU C Library is free software; you can redistribute it and/or
>>>>> +   modify it under the terms of the GNU Lesser General Public
>>>>> +   License as published by the Free Software Foundation; either
>>>>> +   version 2.1 of the License, or (at your option) any later version.
>>>>> +
>>>>> +   The GNU C Library is distributed in the hope that it will be useful,
>>>>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>>> +   Lesser General Public License for more details.
>>>>> +
>>>>> +   You should have received a copy of the GNU Lesser General Public
>>>>> +   License along with the GNU C Library; if not, see
>>>>> +   <https://www.gnu.org/licenses/>.  */
>>>>> +
>>>>> +#include <string.h>
>>>>> +
>>>>> +#if IS_IN(libc)
>>>>> +# define MEMCHR __memchr_zbb
>>>>> +# undef libc_hidden_builtin_def
>>>>> +# define libc_hidden_builtin_def(x)
>>>>> +#endif
>>>>> +/* Convince preprocessor to have Zbb instructions.  */
>>>>> +#ifndef __riscv_zbb
>>>>> +# define __riscv_zbb
>>>>> +#endif
>>>>
>>>> Is there a way to specific the compiler to enable a extension, like aarch64
>>>> -march=arch{+[no]feature}? I think ideally this should be enabled as CFLAGS
>>>> instead of messing with compiler defined pre-processor.
>>>
>>> The tools expect a list of all extensions as parameter to the -march= option.
>>> But there is no way to append extensions to an existing march string
>>> on the command line.
>>>
>>> And if we would add this feature today, it would take many years until we could
>>> use it here, because we want to remain compatible with old tools.
>>> Or we enable the optimization only when being built with new tools, but that
>>> adds even more complexity and build/test configurations.
>>>
>>> What we have is:
>>> * Preprocessor (since forever): Extension test macros (__riscv_EXTENSION)
>>> * Command line (since forever): -march=BASE_EXTENSIONLIST
>>> * GAS (since Nov 21): .option arch, +EXTENSION (in combination with
>>> option push/pop)
>>> * GCC (since Nov 23): __attribute__((target("arch=+EXTENSION")))
>>>
>>> I was not sure about using __riscv_zbb as well, but I considered it safe within
>>> ifdef tests that ensure the macro won't be set twice.
>>> If that's a concern, I could change to use something like this:
>>> #define __riscv_force_zbb
>>> #include <impl.c>
>>> #undef __riscv_force_zbb
>>> ... and change string-fza.h like this:
>>> #if defined(__riscv_zbb) || defined(__riscv_force_zbb)
>>> // orc.b
>>> #endif
>>>
>>> BR
>>> Christoph
>>
>> Another options would to parse the current march and add the extension if required,
>> something like:
>>
>> abi=$(riscv64-linux-gnu-gcc -Q --help=target | grep march | cut -d '=' -f2 | xargs)
>> if [[ ! "$abi" =~ "_zbb" ]]
>> then
>>   abi="$abi"_zbb
>> fi
> 
> That alone likely won't do it, there's a bunch of ordering rules in the ISA string handling so we might get tripped up on them.  We've got a fairly relaxed version of the rules in GCC to try and match the various older rules, though, so it might be possible to make something similar work.
> 
> We should probably just add some sort of -march=+zbb type argument.  IIRC Kito was going to do it at some point, not sure if he got around to it?

I am just pointing this out because I think the way RISCV extension selection
is currently implemented makes it awkward to provide ifunc implementation in
a agnostic way (specially now that RISCV has dozens of extensions) without
knowing the current target compiler is generating.

Some other ABI allows to either specify a ISA/chip reference (like powerpc
with -mcpu=powerX) or a ABI extension directly (like aarch64 with -march=+xxx).

> 
>> I don't have a strong preference, it is just that by not using the compiler flag
>> we won't be able to either use the builtin (__builtin_riscv_orc_b_32) and/or get
>> a possible better code generation from compiler.
> 
> I think we'd likely get slightly better codgen from telling the compiler about the bitmanip extensions.  Maybe we want something like
> 
>    diff --git a/string/memchr.c b/string/memchr.c
>    index 08b5c41667..1b62dce8d8 100644
>    --- a/string/memchr.c
>    +++ b/string/memchr.c
>    @@ -29,15 +29,19 @@
>     # define __memchr MEMCHR
>     #endif
>       +#ifndef __MEMCHR_CODEGEN_ATTRIBUTE
>    +#define __MEMCHR_CODEGEN_ATTRIBUTE
>    +#endif
>    +
>     static __always_inline const char *
>    -sadd (uintptr_t x, uintptr_t y)
>    +sadd (uintptr_t x, uintptr_t y) __MEMCHR_CODEGEN_ATTRIBUTE
>     {
>       return (const char *)(y > UINTPTR_MAX - x ? UINTPTR_MAX : x + y);
>     }
>        /* Search no more than N bytes of S for C.  */
>     void *
>    -__memchr (void const *s, int c_in, size_t n)
>    +__memchr (void const *s, int c_in, size_t n) __MEMCHR_CODEGEN_ATTRIBUTE
>     {
>       if (__glibc_unlikely (n == 0))
>         return NULL;
> 
> in the generic versions, so we can add a
> 
>    #define __MEMCHR_CODEGEN_ATTRIBUTE __attribuet__((target("+zbb")))
> 
> (or whatever the syntax is) to the Zbb-flavored versions of these routines?

Yeah, this might work and it is clear than messing with compiler-defined
macros.

> 
> It might also be worth just jumping to the fast-misaligned versions for these routines, too --the slow-misaligned stuff is there for compatibility with old stuff (though memchr aligns the pointer, so it doesn't matter so much here).

I was hopping that ABIs that would like to provide unaligned variants
for mem* routines to improve the generic code, but it seems that for some
it is easier to just add an assembly routine (as loongarch did).

For memchr, I think it should be easy to provide a unaligned version.
Something like (completely untested):

/* Search no more than N bytes of S for C.  */
void *
__memchr (void const *s, int c_in, size_t n)
{
  if (__glibc_unlikely (n == 0))
    return NULL;

#ifdef USE_MEMCHR_UNALIGNED
  /* Read the first word, but munge it so that bytes before the array
     will not match goal.  */
  const op_t *word_ptr = PTR_ALIGN_DOWN (s, sizeof (op_t));
  uintptr_t s_int = (uintptr_t) s;

  op_t word = *word_ptr;
  op_t repeated_c = repeat_bytes (c_in);
  /* Compute the address of the last byte taking in consideration possible
     overflow.  */
  const char *lbyte = sadd (s_int, n - 1);
  /* And also the address of the word containing the last byte. */
  const op_t *lword = (const op_t *) PTR_ALIGN_DOWN (lbyte, sizeof (op_t));

  find_t mask = shift_find (find_eq_all (word, repeated_c), s_int);
  if (mask != 0)
    {
      char *ret = (char *) s + index_first (mask);
      return (ret <= lbyte) ? ret : NULL;
    }
  if (word_ptr == lword)
    return NULL;
#endif

  word = *++word_ptr;
  while (word_ptr != lword)
    {
      if (has_eq (word, repeated_c))
        return (char *) word_ptr + index_first_eq (word, repeated_c);
      word = *++word_ptr;
    }

  if (has_eq (word, repeated_c))
    {
      /* We found a match, but it might be in a byte past the end of the
         array.  */
      char *ret = (char *) word_ptr + index_first_eq (word, repeated_c);
      if (ret <= lbyte)
        return ret;
    }
  return NULL;
}

> 
>>>>> +#include <string/memchr.c>
>>>>> diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
>>>>> index fcef5659d4..5586d11c89 100644
>>>>> --- a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
>>>>> +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
>>>>> @@ -1,5 +1,8 @@
>>>>>  ifeq ($(subdir),string)
>>>>>  sysdep_routines += \
>>>>> +  memchr \
>>>>> +  memchr-generic \
>>>>> +  memchr-zbb \
>>>>>    memcpy \
>>>>>    memcpy-generic \
>>>>>    memcpy_noalignment \
>>>>> diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
>>>>> index 9f806d7a9e..7321144a32 100644
>>>>> --- a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
>>>>> +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
>>>>> @@ -20,19 +20,40 @@
>>>>>  #include <string.h>
>>>>>  #include <sys/hwprobe.h>
>>>>>
>>>>> +#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
>>>>> +
>>>>>  size_t
>>>>>  __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>>>>>                       size_t max)
>>>>>  {
>>>>>    size_t i = max;
>>>>> +  struct riscv_hwprobe pairs[] = {
>>>>> +    { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 },
>>>>> +    { .key = RISCV_HWPROBE_KEY_CPUPERF_0 },
>>>>> +  };
>>>>>
>>>>> +  bool has_zbb = false;
>>>>>    bool fast_unaligned = false;
>>>>>
>>>>> -  struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 };
>>>>> -  if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0
>>>>> -      && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK)
>>>>> -          == RISCV_HWPROBE_MISALIGNED_FAST)
>>>>> -    fast_unaligned = true;
>>>>> +  if (__riscv_hwprobe (pairs, ARRAY_SIZE (pairs), 0, NULL, 0) == 0)
>>>>> +    {
>>>>> +      struct riscv_hwprobe *pair;
>>>>> +
>>>>> +      /* RISCV_HWPROBE_KEY_IMA_EXT_0  */
>>>>> +      pair = &pairs[0];
>>>>> +      if (pair->value & RISCV_HWPROBE_EXT_ZBB)
>>>>> +        has_zbb = true;
>>>>> +
>>>>> +      /* RISCV_HWPROBE_KEY_CPUPERF_0  */
>>>>> +      pair = &pairs[1];
>>>>> +      if ((pair->value & RISCV_HWPROBE_MISALIGNED_MASK)
>>>>> +        == RISCV_HWPROBE_MISALIGNED_FAST)
>>>>> +        fast_unaligned = true;
>>>>> +    }
>>>>> +
>>>>> +  IFUNC_IMPL (i, name, memchr,
>>>>> +           IFUNC_IMPL_ADD (array, i, memchr, has_zbb, __memchr_zbb)
>>>>> +           IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_generic))
>>>>>
>>>>>    IFUNC_IMPL (i, name, memcpy,
>>>>>             IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned,
>>>>> diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/memchr.c b/sysdeps/unix/sysv/linux/riscv/multiarch/memchr.c
>>>>> new file mode 100644
>>>>> index 0000000000..bc076cbf24
>>>>> --- /dev/null
>>>>> +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/memchr.c
>>>>> @@ -0,0 +1,57 @@
>>>>> +/* Multiple versions of memchr.
>>>>> +   All versions must be listed in ifunc-impl-list.c.
>>>>> +   Copyright (C) 2017-2024 Free Software Foundation, Inc.
>>>>> +   This file is part of the GNU C Library.
>>>>> +
>>>>> +   The GNU C Library is free software; you can redistribute it and/or
>>>>> +   modify it under the terms of the GNU Lesser General Public
>>>>> +   License as published by the Free Software Foundation; either
>>>>> +   version 2.1 of the License, or (at your option) any later version.
>>>>> +
>>>>> +   The GNU C Library is distributed in the hope that it will be useful,
>>>>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>>> +   Lesser General Public License for more details.
>>>>> +
>>>>> +   You should have received a copy of the GNU Lesser General Public
>>>>> +   License along with the GNU C Library; if not, see
>>>>> +   <https://www.gnu.org/licenses/>.  */
>>>>> +
>>>>> +#if IS_IN (libc)
>>>>> +/* Redefine memchr so that the compiler won't complain about the type
>>>>> +   mismatch with the IFUNC selector in strong_alias, below.  */
>>>>> +# undef memchr
>>>>> +# define memchr __redirect_memchr
>>>>> +# include <stdint.h>
>>>>> +# include <string.h>
>>>>> +# include <ifunc-init.h>
>>>>> +# include <riscv-ifunc.h>
>>>>> +# include <sys/hwprobe.h>
>>>>> +
>>>>> +extern __typeof (__redirect_memchr) __libc_memchr;
>>>>> +
>>>>> +extern __typeof (__redirect_memchr) __memchr_generic attribute_hidden;
>>>>> +extern __typeof (__redirect_memchr) __memchr_zbb attribute_hidden;
>>>>> +
>>>>> +static inline __typeof (__redirect_memchr) *
>>>>> +select_memchr_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
>>>>> +{
>>>>> +  unsigned long long int v;
>>>>> +  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_IMA_EXT_0, &v) == 0
>>>>> +      && (v & RISCV_HWPROBE_EXT_ZBB))
>>>>> +    return __memchr_zbb;
>>>>> +
>>>>> +  return __memchr_generic;
>>>>> +}
>>>>> +
>>>>> +riscv_libc_ifunc (__libc_memchr, select_memchr_ifunc);
>>>>> +
>>>>> +# undef memchr
>>>>> +strong_alias (__libc_memchr, memchr);
>>>>> +# ifdef SHARED
>>>>> +__hidden_ver1 (memchr, __GI_memchr, __redirect_memchr)
>>>>> +  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memchr);
>>>>> +# endif
>>>>> +#else
>>>>> +# include <string/memchr.c>
>>>>> +#endif

  reply	other threads:[~2024-04-30 17:45 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-22  7:43 [PATCH 0/7] Add ifunc support for existing Zbb optimizations Christoph Müllner
2024-04-22  7:43 ` [PATCH 1/7] RISC-V: Use .insn directive form for orc.b Christoph Müllner
2024-04-22  7:43 ` [PATCH 2/7] RISC-V: Add Zbb optimized memchr as ifunc Christoph Müllner
2024-04-24 12:53   ` Adhemerval Zanella Netto
2024-04-24 13:16     ` Christoph Müllner
2024-04-24 13:36       ` Adhemerval Zanella Netto
2024-04-26 11:40         ` Christoph Müllner
2024-04-30 15:13         ` Palmer Dabbelt
2024-04-30 17:45           ` Adhemerval Zanella Netto [this message]
2024-04-30 17:54             ` Palmer Dabbelt
2024-04-30 18:44               ` Vineet Gupta
2024-05-06 13:20               ` Christoph Müllner
2024-05-06 13:32                 ` Kito Cheng
2024-05-06 13:46                   ` Christoph Müllner
2024-05-06 13:58                     ` Kito Cheng
2024-04-22  7:43 ` [PATCH 3/7] RISC-V: Add Zbb optimized memrchr " Christoph Müllner
2024-04-22  7:44 ` [PATCH 4/7] RISC-V: Add Zbb optimized strchrnul " Christoph Müllner
2024-04-22  7:44 ` [PATCH 5/7] RISC-V: Add Zbb optimized strcmp " Christoph Müllner
2024-04-22  7:44 ` [PATCH 6/7] RISC-V: Add Zbb optimized strlen " Christoph Müllner
2024-04-22  7:44 ` [PATCH 7/7] RISC-V: Add Zbb optimized strncmp " Christoph Müllner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b6228914-df12-4dfe-b222-e4038641fef8@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=andrew@sifive.com \
    --cc=christoph.muellner@vrull.eu \
    --cc=darius@bluespec.com \
    --cc=dj@redhat.com \
    --cc=evan@rivosinc.com \
    --cc=jeffreyalaw@gmail.com \
    --cc=kito.cheng@sifive.com \
    --cc=libc-alpha@sourceware.org \
    --cc=palmer@dabbelt.com \
    --cc=philipp.tomsich@vrull.eu \
    --cc=vineetg@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).