unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	'GNU C Library' <libc-alpha@sourceware.org>
Cc: nd <nd@arm.com>
Subject: Re: [PATCH] Improve string benchtests
Date: Wed, 6 Feb 2019 12:53:26 -0200	[thread overview]
Message-ID: <8d43c338-50a5-9bf0-f16c-7d072a75d741@linaro.org> (raw)
In-Reply-To: <DB5PR08MB103066A4010D52D641FDD871836F0@DB5PR08MB1030.eurprd08.prod.outlook.com>



On 06/02/2019 12:01, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>>>> Same as before for wcpncpy: instead of reimplement the generic implementation
>>>> on benchtests we can just include them. And it also leads to an possible
>>>> optimization on generic implementation for wcpncpy.
>>>
>>> The point is to enable useful comparisons of string implementations. If we include
>>> the generic implementation then we just compare the generic implementation with
>>> itself in many cases. And that isn't useful. If I change a generic implementation I
>>> want to see the difference that makes in the benchmark comparison rather than
>>> showing no difference.
>>
>> My understanding is we have the generic implementation as the baseline
>> where arch-specific optimization might be applied and the idea of the 
>> comparison is to check against it.  I see no point in using a different 
>> implementation on benchtests, it should compare against exactly what 
>> glibc is currently providing.
> 
> I have to disagree, we cannot do an exact comparison unless build the generic
> string functions as part of GLIBC and call them via the PLT. Including source
> files with lots of #define magic is never going to be equivalent.
> 
> The goal here is not an accurate comparison with generic string functions but
> to enable a realistic comparison with an efficient baseline - the existing byte
> oriented implementations provide a baseline but are too slow to be useful.

The idea is not to be equivalent, since benchtests already adds the exported
libc symbol which will be called through PLT.  I do agree with you that the
byte-oriented baseline is somewhat useless now that most architecture implements
efficient word or vectorized common symbol, and the idea is also to provide
some more efficient generic string implementation (I have a long-standing patchset
'Improve generic string routines' to address this).

So my point is to which exactly should we compare on benchtests? Current we have:

  1. Byte-oriented 'simple' implementation which, as we agree, should not be
     used as a baseline.

  2. Some named 'stupid' which are usually composed implementation that might
     in fact be a faster implementation than some 'clever' ones.

  3. Compiler builtins, which also does not represent meaningful data for
     libc optimization (it will either be inline, call libc implementation, or
     mix both strategies).

  4. The libc implementations themselves, possible including all ifunc variations.

So which really give us meaningful data for future optimization? Should we keep
add multiple implementation as baselines to compare with?

What about an architecture that uses as baseline an arch-specific implementation,
which might use non optimal strategy that a future generic implementation might
use? We have examples for both string and math code on different architectures
where the generic implementation ended up performing better than the arch-specific
implementation.

Another example is your recent bench-strlen improvement (5289f1f56b7) which
added the memchr_strlen.  The generic implementation of strlen uses a similar
strategy of memchr, with the difference it does not need to materialize the
magic constant and add some loop unrolling for tail comparison. At first
it should be faster than memchr_strlen, however if the architecture has 
an optimized memchr implementation (which is a hotspot and it is usually a
target for arch-specific optimization), the memchr_strlen should be indeed
faster (and have a lower i-cache footprint). 

My point is using memchr_strlen as the *generic* implementation and also use
it as the *baseline* for performance comparison shows to the developer that
optimizing memchr would be a net gain in general than providing multiple
different optimization for multiple symbols that can be built by memchr
calls.

So I still think we should define better which exactly we need to compare
in benchtests and use the generic implementation, which will be used as
default for new ports, as the default basline. The file inclusion is just to 
avoid code duplication, I don't have a strong opinion whether to include or 
just copy-paste the code on benchtests. 

> 
>> If you want to check if the your changes improves the generic, you can
>> compare against multiples glibc builds.
> 
> That doesn't work so well given it takes a long time to rebuild GLIBC and 
> benchmarks. For all benchmarking I do, I always create a direct comparison of
> old vs new in a single run so it shows the differences and can be run repeatedly
> to confirm. The string bench is setup to do this already, so why remove this
> useful feature?
>  
>>> Maybe the name generic_xxx is confusing? It's meant to be the baseline,
>>> something which you should beat in all cases with the actual implementation.
>>
>> My understanding is the baseline should be the generic implementation which
>> is selected if the architecture does not provide an optimized one.
> 
> That means you never compare the generic implementation against a baseline.
> Given that is what we do today, I don't see why we should stop doing that.
> 
> Cheers,
> Wilco
>     
> 

  reply	other threads:[~2019-02-06 14:53 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-27 18:26 [PATCH] Improve string benchtests Wilco Dijkstra
2019-02-01 15:36 ` Wilco Dijkstra
2019-02-04 19:35 ` Adhemerval Zanella
2019-02-05 15:17   ` Wilco Dijkstra
2019-02-05 19:33     ` Adhemerval Zanella
2019-02-06 14:01       ` Wilco Dijkstra
2019-02-06 14:53         ` Adhemerval Zanella [this message]
2019-03-06 18:14           ` Wilco Dijkstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d43c338-50a5-9bf0-f16c-7d072a75d741@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=libc-alpha@sourceware.org \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).