Date: Tue, 10 Aug 2021 10:36:53 +0100
To: Wilco Dijkstra
Subject: Re: [PATCH v4 1/5] AArch64: Improve A64FX memset for small sizes
Message-ID: <20210810093652.GC20410@arm.com>
From: Szabolcs Nagy via Libc-alpha
Reply-To: Szabolcs Nagy
Cc: 'GNU C Library'

The 08/09/2021 13:07, Wilco Dijkstra via Libc-alpha wrote:
> v4: Don't remove ZF_DIST yet
>
> Improve performance of small memsets by reducing instruction counts and improving
> alignment. Bench-memset shows 35-45% performance gain for small sizes.

thanks, this is OK to commit.
(if further tweaks are needed, they can be done in follow-up commits)

>
> ---
>
> diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> index ce54e5418b08c8bc0ecc7affff68a59272ba6397..cf3d402ef681a9d98964d1751537945692a1ae68 100644
> --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> @@ -51,78 +51,54 @@
>  	.endm
>  
>  	.macro st1b_unroll first=0, last=7
> -	st1b	z0.b, p0, [dst, #\first, mul vl]
> +	st1b	z0.b, p0, [dst, \first, mul vl]
>  	.if \last-\first
>  	st1b_unroll "(\first+1)", \last
>  	.endif
>  	.endm
>  
> -	.macro shortcut_for_small_size exit
> -	// if rest <= vector_length * 2
> -	whilelo	p0.b, xzr, count
> -	whilelo	p1.b, vector_length, count
> -	b.last	1f
> -	st1b	z0.b, p0, [dstin, #0, mul vl]
> -	st1b	z0.b, p1, [dstin, #1, mul vl]
> -	ret
> -1:	// if rest > vector_length * 8
> -	cmp	count, vector_length, lsl 3	// vector_length * 8
> -	b.hi	\exit
> -	// if rest <= vector_length * 4
> -	lsl	tmp1, vector_length, 1	// vector_length * 2
> -	whilelo	p2.b, tmp1, count
> -	incb	tmp1
> -	whilelo	p3.b, tmp1, count
> -	b.last	1f
> -	st1b	z0.b, p0, [dstin, #0, mul vl]
> -	st1b	z0.b, p1, [dstin, #1, mul vl]
> -	st1b	z0.b, p2, [dstin, #2, mul vl]
> -	st1b	z0.b, p3, [dstin, #3, mul vl]
> -	ret
> -1:	// if rest <= vector_length * 8
> -	lsl	tmp1, vector_length, 2	// vector_length * 4
> -	whilelo	p4.b, tmp1, count
> -	incb	tmp1
> -	whilelo	p5.b, tmp1, count
> -	b.last	1f
> -	st1b	z0.b, p0, [dstin, #0, mul vl]
> -	st1b	z0.b, p1, [dstin, #1, mul vl]
> -	st1b	z0.b, p2, [dstin, #2, mul vl]
> -	st1b	z0.b, p3, [dstin, #3, mul vl]
> -	st1b	z0.b, p4, [dstin, #4, mul vl]
> -	st1b	z0.b, p5, [dstin, #5, mul vl]
> -	ret
> -1:	lsl	tmp1, vector_length, 2	// vector_length * 4
> -	incb	tmp1			// vector_length * 5
> -	incb	tmp1			// vector_length * 6
> -	whilelo	p6.b, tmp1, count
> -	incb	tmp1
> -	whilelo	p7.b, tmp1, count
> -	st1b	z0.b, p0, [dstin, #0, mul vl]
> -	st1b	z0.b, p1, [dstin, #1, mul vl]
> -	st1b	z0.b, p2, [dstin, #2, mul vl]
> -	st1b	z0.b, p3, [dstin, #3, mul vl]
> -	st1b	z0.b, p4, [dstin, #4, mul vl]
> -	st1b	z0.b, p5, [dstin, #5, mul vl]
> -	st1b	z0.b, p6, [dstin, #6, mul vl]
> -	st1b	z0.b, p7, [dstin, #7, mul vl]
> -	ret
> -	.endm
>  
> -ENTRY (MEMSET)
> +#undef BTI_C
> +#define BTI_C
>  
> +ENTRY (MEMSET)
>  	PTR_ARG (0)
>  	SIZE_ARG (2)
>  
> -	cbnz	count, 1f
> -	ret
> -1:	dup	z0.b, valw
>  	cntb	vector_length
> -	// shortcut for less than vector_length * 8
> -	// gives a free ptrue to p0.b for n >= vector_length
> -	shortcut_for_small_size L(vl_agnostic)
> -	// end of shortcut
> +	dup	z0.b, valw
> +	whilelo	p0.b, vector_length, count
> +	b.last	1f
> +	whilelo	p1.b, xzr, count
> +	st1b	z0.b, p1, [dstin, 0, mul vl]
> +	st1b	z0.b, p0, [dstin, 1, mul vl]
> +	ret
> +
> +	// count >= vector_length * 2
> +1:	cmp	count, vector_length, lsl 2
> +	add	dstend, dstin, count
> +	b.hi	1f
> +	st1b	z0.b, p0, [dstin, 0, mul vl]
> +	st1b	z0.b, p0, [dstin, 1, mul vl]
> +	st1b	z0.b, p0, [dstend, -2, mul vl]
> +	st1b	z0.b, p0, [dstend, -1, mul vl]
> +	ret
> +
> +	// count > vector_length * 4
> +1:	lsl	tmp1, vector_length, 3
> +	cmp	count, tmp1
> +	b.hi	L(vl_agnostic)
> +	st1b	z0.b, p0, [dstin, 0, mul vl]
> +	st1b	z0.b, p0, [dstin, 1, mul vl]
> +	st1b	z0.b, p0, [dstin, 2, mul vl]
> +	st1b	z0.b, p0, [dstin, 3, mul vl]
> +	st1b	z0.b, p0, [dstend, -4, mul vl]
> +	st1b	z0.b, p0, [dstend, -3, mul vl]
> +	st1b	z0.b, p0, [dstend, -2, mul vl]
> +	st1b	z0.b, p0, [dstend, -1, mul vl]
> +	ret
>  
> +	.p2align 4
>  L(vl_agnostic):	// VL Agnostic
>  	mov	rest, count
>  	mov	dst, dstin
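The size classes in the new entry path can be checked with a small scalar model. The sketch below is hypothetical and not part of the patch. `st1b_model`, `memset_small_model` and `model_matches_memset` are illustrative names. `vl` stands in for the `cntb` result. A predicated `st1b` is modelled as a byte loop guarded by the `whilelo` condition (lane i active iff start + i < count). The model checks that each path writes exactly `count` bytes, including `count == 0`, which the old code special-cased with `cbnz`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of "st1b z0.b, pN, [base + off]" where pN was set by
   "whilelo pN.b, pstart, count": lane i is active iff pstart + i < count,
   and each active lane writes val to base[off + i]. */
static void st1b_model(unsigned char *base, size_t off, size_t vl,
                       size_t pstart, size_t count, unsigned char val)
{
    for (size_t i = 0; i < vl; i++)
        if (pstart + i < count)
            base[off + i] = val;
}

/* Scalar model of the new small-size path; vl plays the role of the cntb
   result (64 bytes on A64FX). Returns 1 if the request is handled here, or
   0 if count > 8 * vl, where the assembly branches to L(vl_agnostic). */
static int memset_small_model(unsigned char *dstin, unsigned char val,
                              size_t count, size_t vl)
{
    assert(vl > 0);

    /* whilelo p0.b, vector_length, count; b.last 1f
       The last lane of p0 is active iff vl + (vl - 1) < count,
       so the branch is not taken exactly when count < 2 * vl. */
    if (count < 2 * vl) {
        st1b_model(dstin, 0, vl, 0, count, val);   /* p1: whilelo xzr, count */
        st1b_model(dstin, vl, vl, vl, count, val); /* p0: whilelo vl, count */
        return 1;
    }

    /* From here on p0 is all-true, so every remaining store is a full vector. */
    if (count <= 4 * vl) {
        /* Two vectors from the start and two ending at dstend; they overlap
           when count < 4 * vl. */
        for (size_t k = 0; k < 2; k++) {
            st1b_model(dstin, k * vl, vl, vl, count, val);
            st1b_model(dstin, count - (k + 1) * vl, vl, vl, count, val);
        }
        return 1;
    }

    if (count > 8 * vl)
        return 0; /* b.hi L(vl_agnostic): nothing stored on this path */

    /* Four vectors from the start and four ending at dstend. */
    for (size_t k = 0; k < 4; k++) {
        st1b_model(dstin, k * vl, vl, vl, count, val);
        st1b_model(dstin, count - (k + 1) * vl, vl, vl, count, val);
    }
    return 1;
}

#define GUARD 32 /* bytes on each side of the destination that must stay untouched */

/* Compares the model with memset for every count in [0, 8 * vl + 4]. */
int model_matches_memset(size_t vl)
{
    size_t limit = 8 * vl + 4;
    size_t size = limit + 2 * GUARD;
    unsigned char *got = malloc(size);
    unsigned char *want = malloc(size);
    int ok = got != NULL && want != NULL;

    for (size_t count = 0; ok && count <= limit; count++) {
        memset(got, 0xAA, size);
        memset(want, 0xAA, size);
        int handled = memset_small_model(got + GUARD, 0x5C, count, vl);
        if (count <= 8 * vl) {
            memset(want + GUARD, 0x5C, count);
            ok = handled == 1;
        } else {
            ok = handled == 0; /* too large: the buffer must be unchanged */
        }
        ok = ok && memcmp(got, want, size) == 0;
    }
    free(got);
    free(want);
    return ok;
}
```

For example, `model_matches_memset(64)` uses the A64FX vector length of 64 bytes and exercises every count up to 8 * 64 + 4. It returns 1. The overlapping head and tail stores let each size class reuse the single all-true predicate p0 for every store. The removed macro instead computed a separate predicate (p0 to p7) per vector.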