To: Wilco Dijkstra, libc-alpha@sourceware.org
Subject: [PATCH] AArch64: Update A64FX memset not to degrade at 16KB
Date: Fri, 27 Aug 2021 05:03:04 +0000
Message-Id: <20210827050304.543471-1-naohirot@fujitsu.com>
X-Mailer: git-send-email 2.17.1
List-Id: Libc-alpha mailing list
From: Naohiro Tamura via Libc-alpha
Reply-To: Naohiro Tamura

This patch updates the unroll8 code so that memset performance does not
degrade at 16KB, where performance peaks, on both FX1000 and FX700.
The two instructions inserted at the beginning of the unroll8 loop, a
cmp and a branch, are a workaround that was found heuristically.

Reviewed-by: Wilco Dijkstra
---
 sysdeps/aarch64/multiarch/memset_a64fx.S | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index 7bf759b6a753..f7dfdaace7cf 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -96,7 +96,14 @@ L(vl_agnostic): // VL Agnostic
 L(unroll8):
 	sub	count, count, tmp1
 	.p2align 4
-1:	st1b_unroll 0, 7
+	// The 2 instructions at the beginning of the following loop,
+	// cmp and branch, are a workaround so as not to degrade at
+	// the peak performance 16KB.
+	// It is found heuristically and the branch condition, b.ne,
+	// is chosen intentionally never to jump.
+1:	cmp	xzr, xzr
+	b.ne	1b
+	st1b_unroll 0, 7
 	add	dst, dst, tmp1
 	subs	count, count, tmp1
 	b.hi	1b
-- 
2.17.1
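
For readers who want to check the claim that the inserted branch never
jumps, here is a minimal C sketch of the patched loop's control flow.
It is a model under stated assumptions, not glibc code: model_unroll8,
cond_ne, block and workaround_branches are hypothetical names, a plain
memset call stands in for the st1b_unroll 0, 7 macro, and block stands
in for tmp1. It says nothing about performance, which is the actual
purpose of the workaround.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models the NE condition after "cmp a, b": true only when a != b.  */
static int
cond_ne (uint64_t a, uint64_t b)
{
  return a != b;
}

/* Hypothetical model of the patched unroll8 loop (illustration only).
   block plays the role of tmp1, the number of bytes stored per
   iteration.  Returns the number of loop iterations executed and
   counts in *workaround_branches how often the inserted b.ne jumps.  */
static size_t
model_unroll8 (unsigned char *dst, int c, size_t count, size_t block,
               size_t *workaround_branches)
{
  size_t iterations = 0;

  /* Assumption: the caller only enters the loop with at least one
     full block to store.  */
  assert (block > 0 && count >= block);

  count -= block;                       /* sub count, count, tmp1 */
  for (;;)
    {
      const uint64_t xzr = 0;           /* xzr always reads as zero */
      if (cond_ne (xzr, xzr))           /* cmp xzr, xzr ; b.ne 1b */
        {
          ++*workaround_branches;       /* never reached: 0 == 0 */
          continue;
        }
      memset (dst, c, block);           /* stand-in for st1b_unroll 0, 7 */
      dst += block;                     /* add dst, dst, tmp1 */
      iterations++;

      /* subs count, count, tmp1 ; b.hi 1b
         HI is an unsigned "higher" test on the value before the
         subtraction, so the loop repeats only if count > block.  */
      int higher = count > block;
      count -= block;                   /* may wrap on the final pass */
      if (!higher)
        break;
    }
  return iterations;
}
```

With count = 48 and block = 16, the model runs two iterations, stores
32 bytes and leaves the rest to the code after the loop, and the
workaround branch counter stays at zero.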