From: naohirot--- via Libc-alpha
Reply-To: "naohirot@fujitsu.com"
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: RE: [PATCH v3 1/5] AArch64: Improve A64FX memset
Date: Mon, 2 Aug 2021 13:53:23 +0000
List-Id: Libc-alpha mailing list

Hi Wilco,

I have one question below.

> -----Original Message-----
> From: Tamura, Naohiro/田村 直広
> Sent: Wednesday, July 28, 2021 5:11 PM
> To: Wilco Dijkstra
> Cc: 'GNU C Library'
> Subject: Re: [PATCH v3 1/5] AArch64: Improve A64FX memset
>
> Hi Wilco,
>
> Thanks for the patch.
>
> I confirmed that performance is improved over master, as shown
> in the graphs [1].
> I have two comments; please find them below.
>
> Reviewed-by: Naohiro Tamura
> Tested-by: Naohiro Tamura
>
> [1] https://drive.google.com/file/d/1DfYPMd6RRS0Z_2y3VH3Q4b-r8N6TyW1c/view?usp=sharing
>
> > [PATCH v3 1/5] AArch64: Improve A64FX memset
> >
>
> Would you update the commit title so that it is not the same across
> all 5 patches?
> We need to ask distros to backport these patches.
> If all the commit titles are the same, there is more room for
> confusion and mistakes.
>
> How about "AArch64: Improve A64FX memset for less than 512B"?
>
> > Improve performance of small copies by reducing instruction counts and improving
> > alignment. Bench-memset shows 35-45% performance gain for small sizes.
> >
> > ---
> >
> > diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> > index ce54e5418b08c8bc0ecc7affff68a59272ba6397..f7fcc7b323e1553f50a2e005b8ccef344a08127d 100644
> > --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> > +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> > @@ -30,7 +30,6 @@
> >  #define L2_SIZE         (8*1024*1024)   // L2 8MB - 1MB
> >  #define CACHE_LINE_SIZE 256
> >  #define PF_DIST_L1      (CACHE_LINE_SIZE * 16)  // Prefetch distance L1
> > -#define ZF_DIST         (CACHE_LINE_SIZE * 21)  // Zerofill distance
>
> This caused a compile error.
>
> >  #define rest            x8
> >  #define vector_length   x9
> >  #define vl_remainder    x10     // vector_length remainder
> > @@ -51,78 +50,54 @@
> >          .endm
> >
> >          .macro st1b_unroll first=0, last=7
> > -        st1b    z0.b, p0, [dst, #\first, mul vl]
> > +        st1b    z0.b, p0, [dst, \first, mul vl]
> >          .if \last-\first
> >          st1b_unroll "(\first+1)", \last
> >          .endif
> >          .endm
> >
> > -        .macro  shortcut_for_small_size exit
> > -        // if rest <= vector_length * 2
> > -        whilelo p0.b, xzr, count
> > -        whilelo p1.b, vector_length, count
> > -        b.last  1f
> > -        st1b    z0.b, p0, [dstin, #0, mul vl]
> > -        st1b    z0.b, p1, [dstin, #1, mul vl]
> > -        ret
> > -1:      // if rest > vector_length * 8
> > -        cmp     count, vector_length, lsl 3     // vector_length * 8
> > -        b.hi    \exit
> > -        // if rest <= vector_length * 4
> > -        lsl     tmp1, vector_length, 1  // vector_length * 2
> > -        whilelo p2.b, tmp1, count
> > -        incb    tmp1
> > -        whilelo p3.b, tmp1, count
> > -        b.last  1f
> > -        st1b    z0.b, p0, [dstin, #0, mul vl]
> > -        st1b    z0.b, p1, [dstin, #1, mul vl]
> > -        st1b    z0.b, p2, [dstin, #2, mul vl]
> > -        st1b    z0.b, p3, [dstin, #3, mul vl]
> > -        ret
> > -1:      // if rest <= vector_length * 8
> > -        lsl     tmp1, vector_length, 2  // vector_length * 4
> > -        whilelo p4.b, tmp1, count
> > -        incb    tmp1
> > -        whilelo p5.b, tmp1, count
> > -        b.last  1f
> > -        st1b    z0.b, p0, [dstin, #0, mul vl]
> > -        st1b    z0.b, p1, [dstin, #1, mul vl]
> > -        st1b    z0.b, p2, [dstin, #2, mul vl]
> > -        st1b    z0.b, p3, [dstin, #3, mul vl]
> > -        st1b    z0.b, p4, [dstin, #4, mul vl]
> > -        st1b    z0.b, p5, [dstin, #5, mul vl]
> > -        ret
> > -1:      lsl     tmp1, vector_length, 2  // vector_length * 4
> > -        incb    tmp1                    // vector_length * 5
> > -        incb    tmp1                    // vector_length * 6
> > -        whilelo p6.b, tmp1, count
> > -        incb    tmp1
> > -        whilelo p7.b, tmp1, count
> > -        st1b    z0.b, p0, [dstin, #0, mul vl]
> > -        st1b    z0.b, p1, [dstin, #1, mul vl]
> > -        st1b    z0.b, p2, [dstin, #2, mul vl]
> > -        st1b    z0.b, p3, [dstin, #3, mul vl]
> > -        st1b    z0.b, p4, [dstin, #4, mul vl]
> > -        st1b    z0.b, p5, [dstin, #5, mul vl]
> > -        st1b    z0.b, p6, [dstin, #6, mul vl]
> > -        st1b    z0.b, p7, [dstin, #7, mul vl]
> > -        ret
> > -        .endm
> >
> > -ENTRY (MEMSET)
> > +#undef BTI_C
> > +#define BTI_C

We discussed before how the BTI_C macro should be defined, and the
conclusion at that time was "NOP" rather than empty unless HAVE_AARCH64_BTI.
The code above now defines BTI_C as empty unconditionally.
A64FX doesn't support BTI, so this code is OK,
but I'm curious why it was changed.

Thanks.
Naohiro

> >
> > +ENTRY (MEMSET)
> >          PTR_ARG (0)
> >          SIZE_ARG (2)
> >
> > -        cbnz    count, 1f
> > -        ret
> > -1:      dup     z0.b, valw
> >          cntb    vector_length
> > -        // shortcut for less than vector_length * 8
> > -        // gives a free ptrue to p0.b for n >= vector_length
> > -        shortcut_for_small_size L(vl_agnostic)
> > -        // end of shortcut
> > +        dup     z0.b, valw
> > +        whilelo p0.b, vector_length, count
> > +        b.last  1f
> > +        whilelo p1.b, xzr, count
> > +        st1b    z0.b, p1, [dstin, 0, mul vl]
> > +        st1b    z0.b, p0, [dstin, 1, mul vl]
> > +        ret
> > +
> > +        // count >= vector_length * 2
> > +1:      cmp     count, vector_length, lsl 2
> > +        add     dstend, dstin, count
> > +        b.hi    1f
> > +        st1b    z0.b, p0, [dstin, 0, mul vl]
> > +        st1b    z0.b, p0, [dstin, 1, mul vl]
> > +        st1b    z0.b, p0, [dstend, -2, mul vl]
> > +        st1b    z0.b, p0, [dstend, -1, mul vl]
> > +        ret
> > +
> > +        // count > vector_length * 4
> > +1:      lsl     tmp1, vector_length, 3
> > +        cmp     count, tmp1
> > +        b.hi    L(vl_agnostic)
> > +        st1b    z0.b, p0, [dstin, 0, mul vl]
> > +        st1b    z0.b, p0, [dstin, 1, mul vl]
> > +        st1b    z0.b, p0, [dstin, 2, mul vl]
> > +        st1b    z0.b, p0, [dstin, 3, mul vl]
> > +        st1b    z0.b, p0, [dstend, -4, mul vl]
> > +        st1b    z0.b, p0, [dstend, -3, mul vl]
> > +        st1b    z0.b, p0, [dstend, -2, mul vl]
> > +        st1b    z0.b, p0, [dstend, -1, mul vl]
> > +        ret
> >
> > +        .p2align 4
> >  L(vl_agnostic): // VL Agnostic
> >          mov     rest, count
> >          mov     dst, dstin
> >
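
For readers following the quoted hunk, here is a minimal C sketch of the overlapping-store idea that the new small-size paths appear to rely on: sizes below 2 vectors use two predicated stores, and sizes up to 4 or 8 vectors use full stores from both the start and the end of the buffer, which may overlap in the middle. It assumes a 64-byte vector length for illustration. The names small_memset_model, store_predicated, store_full and model_matches_memset are hypothetical and not part of glibc. This models the control flow only and is not the patch's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for illustration: a 64-byte vector (512-bit SVE). */
enum { VL = 64 };

/* Models "whilelo pN.b, lo, count" followed by a predicated st1b at
   dst + lo: only bytes of this one vector with index below count
   are written. */
static void store_predicated(unsigned char *dst, size_t lo, size_t count, int c)
{
    if (count > lo) {
        size_t n = count - lo;
        memset(dst + lo, c, n < VL ? n : VL);
    }
}

/* Models an st1b whose predicate is all-true: one full vector. */
static void store_full(unsigned char *p, int c)
{
    memset(p, c, VL);
}

/* Hypothetical model of the small-size paths.  Returns 1 if the size
   was handled here, 0 if it would branch to L(vl_agnostic). */
static int small_memset_model(unsigned char *dst, int c, size_t count)
{
    assert(dst != NULL);
    unsigned char *dstend = dst + count;

    if (count < 2 * VL) {              /* b.last not taken */
        store_predicated(dst, 0, count, c);
        store_predicated(dst, VL, count, c);
        return 1;
    }
    if (count <= 4 * VL) {             /* 2 from the start, 2 from the end */
        store_full(dst, c);
        store_full(dst + VL, c);
        store_full(dstend - 2 * VL, c);
        store_full(dstend - VL, c);
        return 1;
    }
    if (count <= 8 * VL) {             /* 4 from the start, 4 from the end */
        for (int i = 0; i < 4; i++) {
            store_full(dst + i * VL, c);
            store_full(dstend - (i + 1) * VL, c);
        }
        return 1;
    }
    return 0;
}

/* Compares the model with memset for every size up to 8 * VL, with
   guard bytes on both sides to catch out-of-range writes.
   Returns 1 if every size matches. */
int model_matches_memset(void)
{
    enum { GUARD = 16, MAX = 8 * VL };
    unsigned char got[MAX + 2 * GUARD], want[MAX + 2 * GUARD];

    for (size_t count = 0; count <= MAX; count++) {
        memset(got, 0xAA, sizeof got);
        memset(want, 0xAA, sizeof want);
        if (!small_memset_model(got + GUARD, 0x5C, count))
            return 0;
        memset(want + GUARD, 0x5C, count);
        if (memcmp(got, want, sizeof got) != 0)
            return 0;
    }
    return 1;
}
```

Calling model_matches_memset() from a small main checks every size from 0 to 8 * VL. The point of the overlap is that no path needs a per-size predicate once count is at least 2 vectors, which fits the patch description's claim of a reduced instruction count.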