To: Wilco Dijkstra, 'GNU C Library'
Cc: nd
References: <49967cf5-a89a-fa17-5c94-556c92705bef@linaro.org> <1dc12364-668c-0216-a569-295a0c1f394f@linaro.org>
From: Adhemerval Zanella
Subject: Re: [PATCH] Improve string benchtests
Message-ID: <8d43c338-50a5-9bf0-f16c-7d072a75d741@linaro.org>
Date: Wed, 6 Feb 2019 12:53:26 -0200

On 06/02/2019 12:01, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>>>> Same as before for wcpncpy: instead of reimplement the generic implementation
>>>> on benchtests we can just include them. And it also leads to an possible
>>>> optimization on generic implementation for wcpncpy.
>>>
>>> The point is to enable useful comparisons of string implementations. If we include
>>> the generic implementation then we just compare the generic implementation with
>>> itself in many cases. And that isn't useful. If I change a generic implementation I
>>> want to see the difference that makes in the benchmark comparison rather than
>>> showing no difference.
>>
>> My understanding is we have the generic implementation as the baseline
>> where arch-specific optimization might be applied and the idea of the
>> comparison is to check against it.  I see no point in using a different
>> implementation on benchtests, it should compare against exactly what
>> glibc is currently providing.
>
> I have to disagree, we cannot do an exact comparison unless build the generic
> string functions as part of GLIBC and call them via the PLT. Including source
> files with lots of #define magic is never going to be equivalent.
>
> The goal here is not an accurate comparison with generic string functions but
> to enable a realistic comparison with an efficient baseline - the existing byte
> oriented implementations provide a baseline but are too slow to be useful.

The idea is not to be equivalent, since benchtests already includes the
exported libc symbol, which is called through the PLT.  I do agree with you
that the byte-oriented baseline is of little use now that most architectures
implement efficient word-based or vectorized versions of the common symbols,
and the idea is also to provide more efficient generic string implementations
(I have a long-standing patchset, 'Improve generic string routines', to
address this).

So my question is: what exactly should we compare against in benchtests?
Currently we have:

  1. The byte-oriented 'simple' implementations which, as we agree, should
     not be used as a baseline.

  2. Some named 'stupid', which are usually composed implementations and
     might in fact be faster than some 'clever' ones.

  3. Compiler builtins, which also do not provide meaningful data for libc
     optimization (the compiler will either inline them, call the libc
     implementation, or mix both strategies).

  4. The libc implementations themselves, possibly including all ifunc
     variants.

So which of these really gives us meaningful data for future optimization?
Should we keep adding multiple implementations as baselines to compare with?
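
To make the first two categories concrete, here is a minimal sketch in C.
The names simple_strlen_sketch and composed_strcat_sketch are hypothetical
and are not copied from benchtests; they only illustrate a byte-at-a-time
baseline and a baseline composed from other libc calls, whose speed then
depends on how well those underlying calls are optimized.

```c
#include <stddef.h>
#include <string.h>

/* Byte-oriented baseline (hypothetical name): one load and one compare
   per character, with no word-at-a-time or vector tricks.  */
size_t
simple_strlen_sketch (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    ++p;
  return (size_t) (p - s);
}

/* Composed baseline (hypothetical name): strcat built from strlen and
   memcpy.  It inherits whatever optimization the libc applies to those
   two symbols, so it can beat a hand-written loop.  The caller must
   provide a destination buffer large enough for both strings.  */
char *
composed_strcat_sketch (char *dst, const char *src)
{
  size_t dlen = strlen (dst);
  size_t slen = strlen (src);
  /* Copy the terminating NUL as well.  */
  memcpy (dst + dlen, src, slen + 1);
  return dst;
}
```
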
What about an architecture whose baseline is an arch-specific implementation
that uses a less optimal strategy than a future generic implementation might?
We have examples in both string and math code, on different architectures,
where the generic implementation ended up performing better than the
arch-specific one.

Another example is your recent bench-strlen improvement (5289f1f56b7), which
added memchr_strlen.  The generic implementation of strlen uses a strategy
similar to memchr's, with the difference that it does not need to materialize
the magic constant and that it adds some loop unrolling for the tail
comparison.  At first glance it should be faster than memchr_strlen; however,
if the architecture has an optimized memchr implementation (which is a hotspot
and usually a target for arch-specific optimization), memchr_strlen should
indeed be faster (and have a lower i-cache footprint).

My point is that using memchr_strlen as the *generic* implementation, and also
as the *baseline* for performance comparison, shows the developer that
optimizing memchr is in general a bigger net gain than providing separate
optimizations for the multiple symbols that can be built from memchr calls.

So I still think we should better define what exactly we need to compare in
benchtests, and use the generic implementation, which is the default for new
ports, as the default baseline.  The file inclusion is just to avoid code
duplication; I don't have a strong opinion on whether to include the files or
just copy-paste the code into benchtests.

>
>> If you want to check if the your changes improves the generic, you can
>> compare against multiples glibc builds.
>
> That doesn't work so well given it takes a long time to rebuild GLIBC and
> benchmarks. For all benchmarking I do, I always create a direct comparison of
> old vs new in a single run so it shows the differences and can be run repeatedly
> to confirm.
> The string bench is setup to do this already, so why remove this
> useful feature?
>
>>> Maybe the name generic_xxx is confusing? It's meant to be the baseline,
>>> something which you should beat in all cases with the actual implementation.
>>
>> My understanding is the baseline should be the generic implementation which
>> is selected if the architecture does not provide an optimized one.
>
> That means you never compare the generic implementation against a baseline.
> Given that is what we do today, I don't see why we should stop doing that.
>
> Cheers,
> Wilco
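
For reference, the memchr_strlen idea discussed earlier in this message can
be sketched in C as follows.  The name memchr_strlen_sketch is hypothetical
and this is not the code from commit 5289f1f56b7; it assumes only the
standard memchr and a NUL-terminated input, so the search stops at the
terminator long before the size bound matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* strlen expressed as a single memchr call (hypothetical name).
   Any optimization applied to memchr benefits this function too.  */
size_t
memchr_strlen_sketch (const char *s)
{
  /* PTRDIFF_MAX is used as an "effectively unbounded" length.  This
     relies on memchr stopping at the first match, which holds for a
     NUL-terminated string.  */
  const char *end = memchr (s, '\0', PTRDIFF_MAX);
  return (size_t) (end - s);
}
```

With such a definition, strlen needs no code of its own beyond the call, so
its performance tracks the memchr implementation that the port provides.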