To: "Gabriel F. T. Gomes"
Cc: libc-alpha@sourceware.org
References: <20190329133529.22523-1-adhemerval.zanella@linaro.org> <20190329133529.22523-22-adhemerval.zanella@linaro.org> <20190624210726.nllhjjcdtruehpn6@tereshkova>
From: Adhemerval Zanella
Subject: Re: [PATCH 21/28] powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf
Message-ID: <94c7d9c7-9f30-7a77-de52-6199b7d8cd29@linaro.org>
Date: Tue, 25 Jun 2019 15:34:42 -0300
In-Reply-To: <20190624210726.nllhjjcdtruehpn6@tereshkova>

On 24/06/2019 18:07, Gabriel F. T. Gomes wrote:
> On Fri, Mar 29 2019, Adhemerval Zanella wrote:
>>
>> This patches consolidates all the powerpc llround{f} implementations on
>> the generic sysdeps/powerpc/powerpc32/fpu/s_llround{f}. The only missing
>> optimization is the power6x one which I could not make GCC generates
>> mftgpr for 32 bits output.
>
> Similar to the llrint case, no harm done as such optimization wasn't
> available, anyway.
>
>> +  /* The barrier prevents compiler from optimizing it to llround when
>> +     compiled with -fno-math-errno */
>> +  math_opt_barrier (x);
>
> I don't actually understand what this accomplishes, and I don't see any
> difference in the code generated with and without this barrier (all my
> builds have -fno-math-errno passed to the compiler). Could you help me
> understand this?
>
> This is the code I get (with or without the barrier):
>
> 00079dc0 <__llround_power6>:
>    79dc0:  fc 20 0b 10  frin    f1,f1
>    79dc4:  94 21 ff f0  stwu    r1,-16(r1)
>    79dc8:  fc 00 0e 5e  fctidz  f0,f1
>    79dcc:  d8 01 00 08  stfd    f0,8(r1)
>    79dd0:  80 61 00 08  lwz     r3,8(r1)
>    79dd4:  80 81 00 0c  lwz     r4,12(r1)
>    79dd8:  38 21 00 10  addi    r1,r1,16
>    79ddc:  4e 80 00 20  blr
>
> 00079de0 <__llround_power5plus>:
>    79de0:  94 21 ff f0  stwu    r1,-16(r1)
>    79de4:  fc 20 0b 10  frin    f1,f1
>    79de8:  fc 00 0e 5e  fctidz  f0,f1
>    79dec:  d8 01 00 08  stfd    f0,8(r1)
>    79df0:  60 00 00 00  nop
>    79df4:  60 00 00 00  nop
>    79df8:  60 00 00 00  nop
>    79dfc:  80 61 00 08  lwz     r3,8(r1)
>    79e00:  80 81 00 0c  lwz     r4,12(r1)
>    79e04:  38 21 00 10  addi    r1,r1,16
>    79e08:  4e 80 00 20  blr
>
>> --- a/sysdeps/powerpc/powerpc32/fpu/s_llround.c
>> +++ b/sysdeps/powerpc/powerpc32/fpu/s_llround.c

Without math_opt_barrier, gcc 8.2.1 20190214
(--build=powerpc64-unknown-linux-gnu --host=powerpc64-unknown-linux-gnu
--target=powerpc-glibc-linux-gnu, no extra options, built with
build-many-glibcs.py) generates the code below when invoked with these
flags:

powerpc-glibc-linux-gnu-gcc -mcpu=power4 \
  ../sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.c \
  -c -std=gnu11 -fgnu89-inline -g -O2 -mcpu=power4 -Wall -Wwrite-strings \
  -Wundef -Werror -fmerge-all-constants -frounding-math \
  -fno-stack-protector -mhard-float -Wstrict-prototypes \
  -Wold-style-definition -fno-math-errno -mlong-double-128 -mcpu=power6 \
  -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES

---
__llround_power6:
.LVL0:
.LFB46:
        .file 1 "../sysdeps/powerpc/powerpc32/fpu/s_llround.c"
        .loc 1 33 1 view -0
        .cfi_startproc
        .loc 1 35 3 view .LVU1
        .loc 1 35 10 is_stmt 0 view .LVU2
        b llround
.LVL1:
        .loc 1 35 10 view .LVU3
        .cfi_endproc
---

This is because gcc transforms "(long long int) round (x)" into a call
to llround, which creates a cyclic call (this is similar to the aarch64
{l}lrint optimization).

As a side note, __builtin_llround for powerpc64 generates the expected
code.

>>
>> [...]
>>
>> +    {
>> +      /* IEEE 1003.1 lround function. IEEE specifies "round to the nearest
>> +         integer value, rounding halfway cases away from zero, regardless of
>> +         the current rounding mode." However PowerPC Architecture defines
>> +         "round to Nearest" as "Choose the best approximation. In case of a
>> +         tie, choose the one that is even (least significant bit o).".
>> +         So we can't use the PowerPC "round to Nearest" mode. Instead we set
>> +         "round toward Zero" mode and round by adding +-0.5 before rounding
>> +         to the integer value.
>> +
>      ~~
> Two white-spaces here.

Acked.

>
>> --- a/sysdeps/powerpc/powerpc32/fpu/s_lround.S
>> +++ /dev/null
>>
>> [...]
>>
>> -  fcmpu  cr5, fp1, fp9  /* if x >= 0x7fffffff.8p0 */
>> -  fcmpu  cr1, fp1, fp8  /* if x <= -0x80000000.8p0 */
>
> OK. These have also been converted to c code in lround.c...
>
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc32/fpu/s_lround.c
>>
>> [...]
>>
>> +  if (x >= 0x7fffffff.8p0 || x <= -0x80000000.8p0)
>> +    x = (x < 0.0) ? -0x1p+52 : 0x1p+52;
>
> ... Here.
>