From: "Lucas A. M. Magalhaes via Libc-alpha"
Reply-To: "Lucas A. M. Magalhaes"
To: "Paul E. Murphy", anton@ozlabs.org, libc-alpha@sourceware.org
Subject: Re: [PATCH] powerpc64le: add optimized strlen for P9
Date: Thu, 21 May 2020 17:41:29 -0300
Message-ID: <159009368982.9928.17842426686993036466@localhost.localdomain>
In-Reply-To: <20200521191048.1566568-1-murphyp@linux.vnet.ibm.com>
References: <20200521191048.1566568-1-murphyp@linux.vnet.ibm.com>
List-Id: Libc-alpha mailing list

Quoting Paul E. Murphy via Libc-alpha (2020-05-21 16:10:48)
> This is a followup to rawmemchr/strlen from Anton. I missed
> his original strlen patch, and likewise I wasn't happy with
> the 3-4% performance drop for larger strings which occurs
> around 2.5kB as the P8 vector loop is a bit faster. As noted,
> this is up to 50% faster for small strings, and about 1% faster
> for larger strings (I hazard to guess this is some uarch difference
> between lxv and lvx).
>
> I guess this is a semi-V2 of the patch. Likewise, I need to
> double check binutils 2.26 supports the P9 insn used here.
>
> ---8<---
>
> This started as a trivial change to Anton's rawmemchr. I got
> carried away. This is a hybrid between P8's asymptotically
> faster 64B checks with extremely efficient small string checks
> e.g. <64B (and sometimes a little bit more depending on alignment).
>
> The second trick is to align to 64B by running a 48B checking loop
> 16B at a time until we naturally align to 64B (i.e. checking 48/96/144
> bytes/iteration based on the alignment after the first 5 comparisons).
> This alleviates the need to check page boundaries.
>
> Finally, explicitly use the P7 strlen with the runtime loader when building
> P9. We need to be cautious about vector/vsx extensions here on P9-only
> builds.
> ---
>  .../powerpc/powerpc64/le/power9/rtld-strlen.S |   1 +
>  sysdeps/powerpc/powerpc64/le/power9/strlen.S  | 215 ++++++++++++++++++
>  sysdeps/powerpc/powerpc64/multiarch/Makefile  |   2 +-
>  .../powerpc64/multiarch/ifunc-impl-list.c     |   4 +
>  .../powerpc64/multiarch/strlen-power9.S       |   2 +
>  sysdeps/powerpc/powerpc64/multiarch/strlen.c  |   5 +
>  6 files changed, 228 insertions(+), 1 deletion(-)
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power9/rtld-strlen.S
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power9/strlen.S
>  create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strlen-power9.S
>
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/rtld-strlen.S b/sysdeps/powerpc/powerpc64/le/power9/rtld-strlen.S
> new file mode 100644
> index 0000000000..e9d83323ac
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power9/rtld-strlen.S
> @@ -0,0 +1 @@
> +#include
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/strlen.S b/sysdeps/powerpc/powerpc64/le/power9/strlen.S
> new file mode 100644
> index 0000000000..084d6e31a8
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power9/strlen.S
> @@ -0,0 +1,215 @@
> +
> +/* Optimized rawmemchr implementation for PowerPC64/POWER9.

s/rawmemchr/strlen

Still trying to understand the rest of the patch though. =)

---
Lucas A. M. Magalhães
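
A note on the 48/96/144 figures in the quoted description: they appear to follow from modular arithmetic, since 48 = 64 - 16, so each 48B step lowers a 16B-aligned pointer's offset within its 64B block by 16. The C sketch below only illustrates that arithmetic. It is not code from the patch, and the helper name steps_to_64b_alignment is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: given a 16B-aligned
   address, count how many 48B steps are needed before the address
   becomes 64B-aligned.  */
static unsigned
steps_to_64b_alignment (uintptr_t addr)
{
  unsigned steps = 0;

  /* Assumed precondition: the earlier comparisons left us 16B-aligned.  */
  assert (addr % 16 == 0);

  while (addr % 64 != 0)
    {
      /* 48 == 64 - 16, so the offset modulo 64 drops by 16 per step.  */
      addr += 48;
      steps++;
    }
  return steps;
}
```

Starting offsets of 16, 32 and 48 within a 64B block need 1, 2 and 3 steps, i.e. 48, 96 and 144 bytes checked, and an already aligned pointer needs none. Once aligned, a 64B block cannot straddle a page, since page sizes are multiples of 64B.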