From: Naohiro Tamura
To: libc-alpha@sourceware.org
Subject: [PATCH v2 0/6] aarch64: Added optimized memcpy/memmove/memset for A64FX
Date: Wed, 12 May 2021 09:23:08 +0000
Message-Id: <20210512092308.900998-1-naohirot@fujitsu.com>
In-Reply-To: <20210317022849.323046-1-naohirot@fujitsu.com>

Hi Szabolcs, Wilco, Florian,

Thank you for reviewing Patch V1.

Patch V2 addresses all of the V1 comments, which were mainly about redundant assembler code. As a result, the assembler code has been minimized, and each line of the V2 assembler code is now justified by string benchmark performance data.

In terms of assembler LOC (lines of code), memcpy/memmove shrank by 60%, from 1,000 to 400 lines, and memset shrank by 55%, from 600 to 270 lines.

Please review V2.

Thanks.
Naohiro

Naohiro Tamura (6):
  config: Added HAVE_AARCH64_SVE_ASM for aarch64
  aarch64: define BTI_C and BTI_J macros as NOP unless HAVE_AARCH64_BTI
  aarch64: Added optimized memcpy and memmove for A64FX
  aarch64: Added optimized memset for A64FX
  scripts: Added Vector Length Set test helper script
  benchtests: Fixed bench-memcpy-random: buf1: mprotect failed

 benchtests/bench-memcpy-random.c            |   4 +-
 config.h.in                                 |   5 +
 manual/tunables.texi                        |   3 +-
 scripts/vltest.py                           |  82 ++++
 sysdeps/aarch64/configure                   |  28 ++
 sysdeps/aarch64/configure.ac                |  15 +
 sysdeps/aarch64/multiarch/Makefile          |   3 +-
 sysdeps/aarch64/multiarch/ifunc-impl-list.c |  13 +-
 sysdeps/aarch64/multiarch/init-arch.h       |   4 +-
 sysdeps/aarch64/multiarch/memcpy.c          |  12 +-
 sysdeps/aarch64/multiarch/memcpy_a64fx.S    | 405 ++++++++++++++++++
 sysdeps/aarch64/multiarch/memmove.c         |  12 +-
 sysdeps/aarch64/multiarch/memset.c          |  11 +-
 sysdeps/aarch64/multiarch/memset_a64fx.S    | 268 ++++++++++++
 sysdeps/aarch64/sysdep.h                    |   9 +-
 .../unix/sysv/linux/aarch64/cpu-features.c  |   4 +
 .../unix/sysv/linux/aarch64/cpu-features.h  |   4 +
 17 files changed, 868 insertions(+), 14 deletions(-)
 create mode 100755 scripts/vltest.py
 create mode 100644 sysdeps/aarch64/multiarch/memcpy_a64fx.S
 create mode 100644 sysdeps/aarch64/multiarch/memset_a64fx.S

-- 
2.17.1