From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
To: "naohirot@fujitsu.com"
Cc: 'GNU C Library'
Subject: [PATCH v3 2/5] AArch64: Improve A64FX memset
Date: Thu, 22 Jul 2021 16:00:44 +0000
List-Id: Libc-alpha mailing list

Improve performance of large memsets. Simplify alignment code. For zero memset use DC ZVA,
which almost doubles performance. For non-zero memsets use the unroll8 loop which is about 10% faster.

---

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index f7fcc7b323e1553f50a2e005b8ccef344a08127d..608e0e2e2ff5259178e2fdadf1eea8816194d879 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,10 +30,8 @@
 #define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define rest		x8
+#define rest		x2
 #define vector_length	x9
-#define vl_remainder	x10	// vector_length remainder
-#define cl_remainder	x11	// CACHE_LINE_SIZE remainder
 
 #if HAVE_AARCH64_SVE_ASM
 # if IS_IN (libc)
@@ -41,14 +39,6 @@
 
 	.arch armv8.2-a+sve
 
-	.macro dc_zva times
-	dc	zva, tmp1
-	add	tmp1, tmp1, CACHE_LINE_SIZE
-	.if \times-1
-	dc_zva "(\times-1)"
-	.endif
-	.endm
-
 	.macro st1b_unroll first=0, last=7
 	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
@@ -187,54 +177,29 @@ L(L1_prefetch): // if rest >= L1_SIZE
 	cbnz	rest, L(unroll32)
 	ret
 
+	// count >= L2_SIZE
 L(L2):
-	// align dst address at vector_length byte boundary
-	sub	tmp1, vector_length, 1
-	ands	tmp2, dst, tmp1
-	// if vl_remainder == 0
-	b.eq	1f
-	sub	vl_remainder, vector_length, tmp2
-	// process remainder until the first vector_length boundary
-	whilelt	p2.b, xzr, vl_remainder
-	st1b	z0.b, p2, [dst]
-	add	dst, dst, vl_remainder
-	sub	rest, rest, vl_remainder
-	// align dstin address at CACHE_LINE_SIZE byte boundary
-1:	mov	tmp1, CACHE_LINE_SIZE
-	ands	tmp2, dst, CACHE_LINE_SIZE - 1
-	// if cl_remainder == 0
-	b.eq	L(L2_dc_zva)
-	sub	cl_remainder, tmp1, tmp2
-	// process remainder until the first CACHE_LINE_SIZE boundary
-	mov	tmp1, xzr	// index
-2:	whilelt	p2.b, tmp1, cl_remainder
-	st1b	z0.b, p2, [dst, tmp1]
-	incb	tmp1
-	cmp	tmp1, cl_remainder
-	b.lo	2b
-	add	dst, dst, cl_remainder
-	sub	rest, rest, cl_remainder
-
-L(L2_dc_zva):
-	// zero fill
-	mov	tmp1, dst
-	dc_zva	(ZF_DIST / CACHE_LINE_SIZE) - 1
-	mov	zva_len, ZF_DIST
-	add	tmp1, zva_len, CACHE_LINE_SIZE * 2
-	// unroll
-	.p2align 3
-1:	st1b_unroll 0, 3
-	add	tmp2, dst, zva_len
-	dc	zva, tmp2
-	st1b_unroll 4, 7
-	add	tmp2, tmp2, CACHE_LINE_SIZE
-	dc	zva, tmp2
-	add	dst, dst, CACHE_LINE_SIZE * 2
-	sub	rest, rest, CACHE_LINE_SIZE * 2
-	cmp	rest, tmp1	// ZF_DIST + CACHE_LINE_SIZE * 2
-	b.ge	1b
-	cbnz	rest, L(unroll8)
-	ret
+	tst	valw, 255
+	b.ne	L(unroll8)
+	// align dst to CACHE_LINE_SIZE byte boundary
+	and	tmp2, dst, CACHE_LINE_SIZE - 1
+	sub	tmp2, tmp2, CACHE_LINE_SIZE
+	st1b	z0.b, p0, [dst, 0, mul vl]
+	st1b	z0.b, p0, [dst, 1, mul vl]
+	st1b	z0.b, p0, [dst, 2, mul vl]
+	st1b	z0.b, p0, [dst, 3, mul vl]
+	sub	dst, dst, tmp2
+	add	count, count, tmp2
+
+	// clear cachelines using DC ZVA
+	sub	count, count, CACHE_LINE_SIZE
+	.p2align 4
+1:	dc	zva, dst
+	add	dst, dst, CACHE_LINE_SIZE
+	subs	count, count, CACHE_LINE_SIZE
+	b.hi	1b
+	add	count, count, CACHE_LINE_SIZE
+	b	L(last)
 
 END (MEMSET)
 libc_hidden_builtin_def (MEMSET)
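
A note on the new L(L2) zero path: its pointer and count arithmetic can be modelled in plain C to see how many bytes each stage covers. The sketch below is only an illustration, not glibc code. The struct zva_plan, the function plan_zero_fill and the precondition on count are hypothetical names and simplifications introduced here. It assumes a 64-byte SVE vector length (512-bit, as on A64FX), so the four st1b stores cover one 256-byte cache line, and it touches no memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 256  /* same value as in memset_a64fx.S */

/* Hypothetical result record, used only for this illustration. */
struct zva_plan {
    uintptr_t first_line;  /* first cache-line-aligned address given to DC ZVA */
    size_t    head_bytes;  /* bytes skipped to reach first_line (covered by st1b) */
    size_t    zva_lines;   /* number of DC ZVA instructions executed */
    size_t    tail_bytes;  /* bytes still to be written when branching to L(last) */
};

/* Models the arithmetic of the zero-fill path; it does not write any memory. */
static struct zva_plan plan_zero_fill(uintptr_t dst, size_t count)
{
    struct zva_plan p;

    /* Simplification of this model: the real path is only entered for
       count >= L2_SIZE, which is far larger than this bound. */
    assert(count >= 2 * CACHE_LINE_SIZE);

    /* and tmp2, dst, CACHE_LINE_SIZE - 1 ; sub tmp2, tmp2, CACHE_LINE_SIZE
       tmp2 ends up in the range [-256, -1]. */
    intptr_t tmp2 = (intptr_t)(dst & (CACHE_LINE_SIZE - 1)) - CACHE_LINE_SIZE;

    /* sub dst, dst, tmp2 ; add count, count, tmp2
       dst moves to the next cache-line boundary strictly above it. */
    p.head_bytes = (size_t)(-tmp2);
    dst   += p.head_bytes;
    count -= p.head_bytes;
    p.first_line = dst;

    /* sub count, count, CACHE_LINE_SIZE, followed by the DC ZVA loop. */
    p.zva_lines = 0;
    count -= CACHE_LINE_SIZE;
    size_t before;
    do {
        p.zva_lines++;                    /* dc zva, dst */
        dst += CACHE_LINE_SIZE;           /* add dst, dst, CACHE_LINE_SIZE */
        before = count;
        count -= CACHE_LINE_SIZE;         /* subs count, count, CACHE_LINE_SIZE */
    } while (before > CACHE_LINE_SIZE);   /* b.hi 1b (unsigned higher) */

    count += CACHE_LINE_SIZE;             /* add count, count, CACHE_LINE_SIZE */
    p.tail_bytes = count;
    return p;
}
```

For example, plan_zero_fill(0x1001, 1000) skips 255 head bytes, issues two DC ZVA operations starting at 0x1100, and leaves 233 bytes for L(last), which is defined outside this diff. When dst is not already aligned, the 256 bytes written by the st1b stores overlap the first DC ZVA line; that is harmless because both write zeros.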