From: naohirot--- via Libc-alpha
Reply-To: "naohirot@fujitsu.com"
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH v3 1/5] AArch64: Improve A64FX memset
Date: Wed, 28 Jul 2021 08:10:55 +0000
List-Id: Libc-alpha mailing list
Sender: "Libc-alpha"

Hi Wilco,

Thanks for the patch.

I confirmed that performance is improved over master, as shown
in the graphs [1].
I have two comments; please see them below.

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

[1] https://drive.google.com/file/d/1DfYPMd6RRS0Z_2y3VH3Q4b-r8N6TyW1c/view?usp=sharing

> [PATCH v3 1/5] AArch64: Improve A64FX memset
>

Would you update the commit title so that it is not the same across
all 5 patches?
We need to ask distros to backport these patches.
If all the commit titles are the same, there is more room for
confusion and mistakes.

How about "AArch64: Improve A64FX memset for less than 512B"?

> Improve performance of small copies by reducing instruction counts and improving
> alignment. Bench-memset shows 35-45% performance gain for small sizes.
>
> ---
>
> diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> index ce54e5418b08c8bc0ecc7affff68a59272ba6397..f7fcc7b323e1553f50a2e005b8ccef344a08127d 100644
> --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> @@ -30,7 +30,6 @@
>  #define L2_SIZE         (8*1024*1024)   // L2 8MB - 1MB
>  #define CACHE_LINE_SIZE 256
>  #define PF_DIST_L1      (CACHE_LINE_SIZE * 16)  // Prefetch distance L1
> -#define ZF_DIST         (CACHE_LINE_SIZE * 21)  // Zerofill distance

This caused a compile error.

>  #define rest            x8
>  #define vector_length   x9
>  #define vl_remainder    x10     // vector_length remainder
> @@ -51,78 +50,54 @@
>      .endm
>
>      .macro st1b_unroll first=0, last=7
> -    st1b    z0.b, p0, [dst, #\first, mul vl]
> +    st1b    z0.b, p0, [dst, \first, mul vl]
>      .if \last-\first
>      st1b_unroll "(\first+1)", \last
>      .endif
>      .endm
>
> -    .macro  shortcut_for_small_size exit
> -    // if rest <= vector_length * 2
> -    whilelo p0.b, xzr, count
> -    whilelo p1.b, vector_length, count
> -    b.last  1f
> -    st1b    z0.b, p0, [dstin, #0, mul vl]
> -    st1b    z0.b, p1, [dstin, #1, mul vl]
> -    ret
> -1:  // if rest > vector_length * 8
> -    cmp     count, vector_length, lsl 3     // vector_length * 8
> -    b.hi    \exit
> -    // if rest <= vector_length * 4
> -    lsl     tmp1, vector_length, 1  // vector_length * 2
> -    whilelo p2.b, tmp1, count
> -    incb    tmp1
> -    whilelo p3.b, tmp1, count
> -    b.last  1f
> -    st1b    z0.b, p0, [dstin, #0, mul vl]
> -    st1b    z0.b, p1, [dstin, #1, mul vl]
> -    st1b    z0.b, p2, [dstin, #2, mul vl]
> -    st1b    z0.b, p3, [dstin, #3, mul vl]
> -    ret
> -1:  // if rest <= vector_length * 8
> -    lsl     tmp1, vector_length, 2  // vector_length * 4
> -    whilelo p4.b, tmp1, count
> -    incb    tmp1
> -    whilelo p5.b, tmp1, count
> -    b.last  1f
> -    st1b    z0.b, p0, [dstin, #0, mul vl]
> -    st1b    z0.b, p1, [dstin, #1, mul vl]
> -    st1b    z0.b, p2, [dstin, #2, mul vl]
> -    st1b    z0.b, p3, [dstin, #3, mul vl]
> -    st1b    z0.b, p4, [dstin, #4, mul vl]
> -    st1b    z0.b, p5, [dstin, #5, mul vl]
> -    ret
> -1:  lsl     tmp1, vector_length, 2  // vector_length * 4
> -    incb    tmp1                    // vector_length * 5
> -    incb    tmp1                    // vector_length * 6
> -    whilelo p6.b, tmp1, count
> -    incb    tmp1
> -    whilelo p7.b, tmp1, count
> -    st1b    z0.b, p0, [dstin, #0, mul vl]
> -    st1b    z0.b, p1, [dstin, #1, mul vl]
> -    st1b    z0.b, p2, [dstin, #2, mul vl]
> -    st1b    z0.b, p3, [dstin, #3, mul vl]
> -    st1b    z0.b, p4, [dstin, #4, mul vl]
> -    st1b    z0.b, p5, [dstin, #5, mul vl]
> -    st1b    z0.b, p6, [dstin, #6, mul vl]
> -    st1b    z0.b, p7, [dstin, #7, mul vl]
> -    ret
> -    .endm
>
> -ENTRY (MEMSET)
> +#undef BTI_C
> +#define BTI_C
>
> +ENTRY (MEMSET)
>      PTR_ARG (0)
>      SIZE_ARG (2)
>
> -    cbnz    count, 1f
> -    ret
> -1:  dup     z0.b, valw
>      cntb    vector_length
> -    // shortcut for less than vector_length * 8
> -    // gives a free ptrue to p0.b for n >= vector_length
> -    shortcut_for_small_size L(vl_agnostic)
> -    // end of shortcut
> +    dup     z0.b, valw
> +    whilelo p0.b, vector_length, count
> +    b.last  1f
> +    whilelo p1.b, xzr, count
> +    st1b    z0.b, p1, [dstin, 0, mul vl]
> +    st1b    z0.b, p0, [dstin, 1, mul vl]
> +    ret
> +
> +    // count >= vector_length * 2
> +1:  cmp     count, vector_length, lsl 2
> +    add     dstend, dstin, count
> +    b.hi    1f
> +    st1b    z0.b, p0, [dstin, 0, mul vl]
> +    st1b    z0.b, p0, [dstin, 1, mul vl]
> +    st1b    z0.b, p0, [dstend, -2, mul vl]
> +    st1b    z0.b, p0, [dstend, -1, mul vl]
> +    ret
> +
> +    // count > vector_length * 4
> +1:  lsl     tmp1, vector_length, 3
> +    cmp     count, tmp1
> +    b.hi    L(vl_agnostic)
> +    st1b    z0.b, p0, [dstin, 0, mul vl]
> +    st1b    z0.b, p0, [dstin, 1, mul vl]
> +    st1b    z0.b, p0, [dstin, 2, mul vl]
> +    st1b    z0.b, p0, [dstin, 3, mul vl]
> +    st1b    z0.b, p0, [dstend, -4, mul vl]
> +    st1b    z0.b, p0, [dstend, -3, mul vl]
> +    st1b    z0.b, p0, [dstend, -2, mul vl]
> +    st1b    z0.b, p0, [dstend, -1, mul vl]
> +    ret
>
> +    .p2align 4
>  L(vl_agnostic): // VL Agnostic
>      mov     rest, count
>      mov     dst, dstin
>
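
For readers following the quoted hunk: the new small-size path appears
to replace the one-predicate-per-vector scheme with full-width stores
issued from both ends of the buffer, letting them overlap in the middle.
The C sketch below models that idea only; it is not the glibc code. The
names memset_small_sketch, store_vec and check_small_path are
hypothetical, memset() stands in for a single SVE st1b store, and the
vector length vl is passed as a parameter (A64FX uses 64-byte vectors).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one st1b with an all-true predicate:
   writes exactly vl bytes starting at p. */
static void store_vec(unsigned char *p, int c, size_t vl)
{
    memset(p, c, vl);
}

/* Sketch of the small-size dispatch. Returns 1 if the request was
   handled here, 0 if the caller should fall through to the general
   loop (L(vl_agnostic) in the quoted patch). */
static int memset_small_sketch(unsigned char *dst, int c,
                               size_t count, size_t vl)
{
    unsigned char *end = dst + count;

    if (count < 2 * vl) {
        /* Models the two predicated stores: only bytes below count
           are written, split at the vector boundary. */
        size_t first = count < vl ? count : vl;
        memset(dst, c, first);
        if (count > vl)
            memset(dst + vl, c, count - vl);
        return 1;
    }
    if (count <= 4 * vl) {
        /* Two vectors from the start, two ending at dst + count. */
        store_vec(dst, c, vl);
        store_vec(dst + vl, c, vl);
        store_vec(end - 2 * vl, c, vl);
        store_vec(end - vl, c, vl);
        return 1;
    }
    if (count <= 8 * vl) {
        /* Four vectors from each end; the two groups may overlap. */
        for (size_t i = 0; i < 4; i++) {
            store_vec(dst + i * vl, c, vl);
            store_vec(end - (i + 1) * vl, c, vl);
        }
        return 1;
    }
    return 0;
}

/* Self-check: every size up to 8 * vl must set exactly count bytes
   and leave the byte after the range untouched. */
static void check_small_path(size_t vl)
{
    unsigned char buf[8 * 256 + 1];

    assert(vl >= 1 && vl <= 256);
    for (size_t count = 0; count <= 8 * vl; count++) {
        memset(buf, 0xAA, sizeof buf);
        assert(memset_small_sketch(buf, 0x5A, count, vl) == 1);
        for (size_t i = 0; i < count; i++)
            assert(buf[i] == 0x5A);
        assert(buf[count] == 0xAA);
    }
    /* Anything larger is left to the general path. */
    assert(memset_small_sketch(buf, 0x5A, 8 * vl + 1, vl) == 0);
}
```

Because each branch is only taken when count is at least half of the
bytes it stores, the stores from the end never reach below dst and the
two groups always meet or overlap, so no gap is left. With vl = 64 the
last branch covers sizes up to 512 bytes, which matches the "less than
512B" wording suggested for the commit title.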