To: "naohirot@fujitsu.com"
Subject: [PATCH v3 1/5] AArch64: Improve A64FX memset
Date: Thu, 22 Jul 2021 15:59:19 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

Improve performance of small memsets by reducing instruction counts and improving
alignment. Bench-memset shows 35-45% performance gain for small sizes.

---

diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index ce54e5418b08c8bc0ecc7affff68a59272ba6397..f7fcc7b323e1553f50a2e005b8ccef344a08127d 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -30,7 +30,6 @@
 #define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define ZF_DIST		(CACHE_LINE_SIZE * 21)	// Zerofill distance
 #define rest		x8
 #define vector_length	x9
 #define vl_remainder	x10	// vector_length remainder
@@ -51,78 +50,54 @@
 	.endm
 
 	.macro st1b_unroll first=0, last=7
-	st1b	z0.b, p0, [dst, #\first, mul vl]
+	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
 	st1b_unroll "(\first+1)", \last
 	.endif
 	.endm
 
-	.macro	shortcut_for_small_size exit
-	// if rest <= vector_length * 2
-	whilelo	p0.b, xzr, count
-	whilelo	p1.b, vector_length, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	ret
-1:	// if rest > vector_length * 8
-	cmp	count, vector_length, lsl 3	// vector_length * 8
-	b.hi	\exit
-	// if rest <= vector_length * 4
-	lsl	tmp1, vector_length, 1	// vector_length * 2
-	whilelo	p2.b, tmp1, count
-	incb	tmp1
-	whilelo	p3.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	ret
-1:	// if rest <= vector_length * 8
-	lsl	tmp1, vector_length, 2	// vector_length * 4
-	whilelo	p4.b, tmp1, count
-	incb	tmp1
-	whilelo	p5.b, tmp1, count
-	b.last	1f
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	ret
-1:	lsl	tmp1, vector_length, 2	// vector_length * 4
-	incb	tmp1			// vector_length * 5
-	incb	tmp1			// vector_length * 6
-	whilelo	p6.b, tmp1, count
-	incb	tmp1
-	whilelo	p7.b, tmp1, count
-	st1b	z0.b, p0, [dstin, #0, mul vl]
-	st1b	z0.b, p1, [dstin, #1, mul vl]
-	st1b	z0.b, p2, [dstin, #2, mul vl]
-	st1b	z0.b, p3, [dstin, #3, mul vl]
-	st1b	z0.b, p4, [dstin, #4, mul vl]
-	st1b	z0.b, p5, [dstin, #5, mul vl]
-	st1b	z0.b, p6, [dstin, #6, mul vl]
-	st1b	z0.b, p7, [dstin, #7, mul vl]
-	ret
-	.endm
 
-ENTRY (MEMSET)
+#undef BTI_C
+#define BTI_C
 
+ENTRY (MEMSET)
 	PTR_ARG (0)
 	SIZE_ARG (2)
 
-	cbnz	count, 1f
-	ret
-1:	dup	z0.b, valw
 	cntb	vector_length
-	// shortcut for less than vector_length * 8
-	// gives a free ptrue to p0.b for n >= vector_length
-	shortcut_for_small_size L(vl_agnostic)
-	// end of shortcut
+	dup	z0.b, valw
+	whilelo	p0.b, vector_length, count
+	b.last	1f
+	whilelo	p1.b, xzr, count
+	st1b	z0.b, p1, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	ret
+
+	// count >= vector_length * 2
+1:	cmp	count, vector_length, lsl 2
+	add	dstend, dstin, count
+	b.hi	1f
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
+
+	// count > vector_length * 4
+1:	lsl	tmp1, vector_length, 3
+	cmp	count, tmp1
+	b.hi	L(vl_agnostic)
+	st1b	z0.b, p0, [dstin, 0, mul vl]
+	st1b	z0.b, p0, [dstin, 1, mul vl]
+	st1b	z0.b, p0, [dstin, 2, mul vl]
+	st1b	z0.b, p0, [dstin, 3, mul vl]
+	st1b	z0.b, p0, [dstend, -4, mul vl]
+	st1b	z0.b, p0, [dstend, -3, mul vl]
+	st1b	z0.b, p0, [dstend, -2, mul vl]
+	st1b	z0.b, p0, [dstend, -1, mul vl]
+	ret
 
+	.p2align 4
 L(vl_agnostic): // VL Agnostic
 	mov	rest, count
 	mov	dst, dstin