From mboxrd@z Thu Jan 1 00:00:00 1970
To: "naohirot@fujitsu.com"
Subject: [PATCH v4 2/5] AArch64: Improve A64FX memset for large sizes
Date: Mon, 9 Aug 2021 16:17:21 +0000
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d;
 Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-AuthSource: AM5EUR03FT006.eop-EUR03.prod.protection.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM9PR08MB7013
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'
Errors-To: libc-alpha-bounces+e=80x24.org@sourceware.org
Sender: "Libc-alpha"

v4: Slightly tweak alignment code

Improve performance of large memsets. Simplify alignment code. For zero memset use DC ZVA,
which almost doubles performance. For non-zero memsets use the unroll8 loop which is about 10% faster.

---

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index cf3d402ef681a9d98964d1751537945692a1ae68..6bc8ef5e0c84dbb59a57d114ae6ec8e3fa3822ad 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -27,14 +27,11 @@
  */
 
 #define L1_SIZE		(64*1024)	// L1 64KB
-#define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
+#define L2_SIZE		(8*1024*1024)	// L2 8MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define ZF_DIST		(CACHE_LINE_SIZE * 21)	// Zerofill distance
-#define rest		x8
+#define rest		x2
 #define vector_length	x9
-#define vl_remainder	x10	// vector_length remainder
-#define cl_remainder	x11	// CACHE_LINE_SIZE remainder
 
 #if HAVE_AARCH64_SVE_ASM
 # if IS_IN (libc)
@@ -42,14 +39,6 @@
 
 	.arch armv8.2-a+sve
 
-	.macro dc_zva times
-	dc	zva, tmp1
-	add	tmp1, tmp1, CACHE_LINE_SIZE
-	.if \times-1
-	dc_zva "(\times-1)"
-	.endif
-	.endm
-
 	.macro st1b_unroll first=0, last=7
 	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
@@ -188,54 +177,30 @@ L(L1_prefetch): // if rest >= L1_SIZE
 	cbnz	rest, L(unroll32)
 	ret
 
-L(L2):
-	// align dst address at vector_length byte boundary
-	sub	tmp1, vector_length, 1
-	ands	tmp2, dst, tmp1
-	// if vl_remainder == 0
-	b.eq	1f
-	sub	vl_remainder, vector_length, tmp2
-	// process remainder until the first vector_length boundary
-	whilelt	p2.b, xzr, vl_remainder
-	st1b	z0.b, p2, [dst]
-	add	dst, dst, vl_remainder
-	sub	rest, rest, vl_remainder
-	// align dstin address at CACHE_LINE_SIZE byte boundary
-1:	mov	tmp1, CACHE_LINE_SIZE
-	ands	tmp2, dst, CACHE_LINE_SIZE - 1
-	// if cl_remainder == 0
-	b.eq	L(L2_dc_zva)
-	sub	cl_remainder, tmp1, tmp2
-	// process remainder until the first CACHE_LINE_SIZE boundary
-	mov	tmp1, xzr	// index
-2:	whilelt	p2.b, tmp1, cl_remainder
-	st1b	z0.b, p2, [dst, tmp1]
-	incb	tmp1
-	cmp	tmp1, cl_remainder
-	b.lo	2b
-	add	dst, dst, cl_remainder
-	sub	rest, rest, cl_remainder
-
-L(L2_dc_zva):
-	// zero fill
-	mov	tmp1, dst
-	dc_zva	(ZF_DIST / CACHE_LINE_SIZE) - 1
-	mov	zva_len, ZF_DIST
-	add	tmp1, zva_len, CACHE_LINE_SIZE * 2
-	// unroll
+	// count >= L2_SIZE
 	.p2align 3
-1:	st1b_unroll 0, 3
-	add	tmp2, dst, zva_len
-	dc	zva, tmp2
-	st1b_unroll 4, 7
-	add	tmp2, tmp2, CACHE_LINE_SIZE
-	dc	zva, tmp2
-	add	dst, dst, CACHE_LINE_SIZE * 2
-	sub	rest, rest, CACHE_LINE_SIZE * 2
-	cmp	rest, tmp1	// ZF_DIST + CACHE_LINE_SIZE * 2
-	b.ge	1b
-	cbnz	rest, L(unroll8)
-	ret
+L(L2):
+	tst	valw, 255
+	b.ne	L(unroll8)
+	// align dst to CACHE_LINE_SIZE byte boundary
+	and	tmp2, dst, CACHE_LINE_SIZE - 1
+	st1b	z0.b, p0, [dst, 0, mul vl]
+	st1b	z0.b, p0, [dst, 1, mul vl]
+	st1b	z0.b, p0, [dst, 2, mul vl]
+	st1b	z0.b, p0, [dst, 3, mul vl]
+	sub	dst, dst, tmp2
+	add	count, count, tmp2
+
+	// clear cachelines using DC ZVA
+	sub	count, count, CACHE_LINE_SIZE * 2
+	.p2align 4
+1:	add	dst, dst, CACHE_LINE_SIZE
+	dc	zva, dst
+	subs	count, count, CACHE_LINE_SIZE
+	b.hi	1b
+	add	count, count, CACHE_LINE_SIZE
+	add	dst, dst, CACHE_LINE_SIZE
+	b	L(last)
 
 END (MEMSET)
 libc_hidden_builtin_def (MEMSET)
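
The address arithmetic of the new L(L2) zero-fill path can be checked with a small C model. This is only a sketch and not part of the patch. It assumes a 64-byte SVE vector length, so that the four st1b stores cover exactly one 256-byte cache line. DC ZVA is replaced by a hypothetical zero_line helper built on memset, and the final memset stands in for L(last), whose code is not shown in this diff. The names zero_line, zero_large and check_zero_large are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 256

/* Hypothetical stand-in for "dc zva": zero one whole, aligned cache line. */
static void zero_line(unsigned char *line)
{
    assert((uintptr_t)line % CACHE_LINE_SIZE == 0);
    memset(line, 0, CACHE_LINE_SIZE);
}

/* Model of the zero-fill path.  Requires n >= 3 * CACHE_LINE_SIZE; the
   assembly only reaches this path for count >= L2_SIZE.  */
static void zero_large(unsigned char *dst, size_t n)
{
    size_t tmp2 = (uintptr_t)dst % CACHE_LINE_SIZE;   /* misalignment */

    /* Four vector stores, assumed 64 bytes each: zero [dst, dst + 256).  */
    memset(dst, 0, CACHE_LINE_SIZE);

    /* "sub dst, dst, tmp2" is kept as an offset so that no pointer below
       the buffer is ever formed in C.  */
    ptrdiff_t off = -(ptrdiff_t)tmp2;
    ptrdiff_t count = (ptrdiff_t)(n + tmp2) - 2 * CACHE_LINE_SIZE;

    do {
        off += CACHE_LINE_SIZE;
        zero_line(dst + off);
        count -= CACHE_LINE_SIZE;
    } while (count > 0);                              /* b.hi 1b */

    count += CACHE_LINE_SIZE;
    off += CACHE_LINE_SIZE;

    /* Role of L(last): between 1 and CACHE_LINE_SIZE bytes remain.  */
    assert(count > 0 && count <= CACHE_LINE_SIZE);
    memset(dst + off, 0, (size_t)count);
}

/* Returns 1 if exactly n bytes at the given misalignment were zeroed
   and every byte outside that range kept its 0xAA fill pattern.  */
static int check_zero_large(size_t misalign, size_t n)
{
    static _Alignas(CACHE_LINE_SIZE) unsigned char buf[8 * CACHE_LINE_SIZE];

    if (misalign >= CACHE_LINE_SIZE || misalign + n + 1 > sizeof buf)
        return 0;
    memset(buf, 0xAA, sizeof buf);
    zero_large(buf + misalign, n);
    for (size_t i = 0; i < sizeof buf; i++) {
        int inside = i >= misalign && i < misalign + n;
        if (buf[i] != (inside ? 0 : 0xAA))
            return 0;
    }
    return 1;
}
```

Calling check_zero_large (255, 768) exercises the worst-case misalignment. The unconditional 256-byte head store always reaches the next cache-line boundary, which is why the loop can start clearing whole lines at the rounded-down dst plus CACHE_LINE_SIZE without a separate alignment loop.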