From: naohirot--- via Libc-alpha
Reply-To: "naohirot@fujitsu.com"
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: RE: [PATCH v3 2/5] AArch64: Improve A64FX memset
Date: Mon, 2 Aug 2021 13:29:21 +0000
List-Id: Libc-alpha mailing list

Hi Wilco,

Thank you for the patch.

I confirmed that performance is improved over master, as shown
in graphs [1][2].

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

[1] https://drive.google.com/file/d/1RxdIlJa2Wvl8eT5_TRVkvbkyS6bfZObx/view?usp=sharing
[2] https://drive.google.com/file/d/1xCLsa7qweovdQpWtfnNZZwcEi7W7z3Ok/view?usp=sharing

There is one comment and one question below.


> Subject: [PATCH v3 2/5] AArch64: Improve A64FX memset

How about "AArch64: Improve A64FX memset for more than 8MB"?

>
> Improve performance of large memsets. Simplify alignment code. For zero memset use DC ZVA,
> which almost doubles performance. For non-zero memsets use the unroll8 loop which is about 10% faster.
>
> ---
>
> diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> index f7fcc7b323e1553f50a2e005b8ccef344a08127d..608e0e2e2ff5259178e2fdadf1eea8816194d879 100644
> --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> @@ -30,10 +30,8 @@
>  #define L2_SIZE (8*1024*1024) // L2 8MB - 1MB
>  #define CACHE_LINE_SIZE 256
>  #define PF_DIST_L1 (CACHE_LINE_SIZE * 16) // Prefetch distance L1
> -#define rest x8
> +#define rest x2
>  #define vector_length x9
> -#define vl_remainder x10 // vector_length remainder
> -#define cl_remainder x11 // CACHE_LINE_SIZE remainder
>
>  #if HAVE_AARCH64_SVE_ASM
>  # if IS_IN (libc)
> @@ -41,14 +39,6 @@
>
>  	.arch armv8.2-a+sve
>
> -	.macro dc_zva times
> -	dc	zva, tmp1
> -	add	tmp1, tmp1, CACHE_LINE_SIZE
> -	.if \times-1
> -	dc_zva "(\times-1)"
> -	.endif
> -	.endm
> -
>  	.macro st1b_unroll first=0, last=7
>  	st1b	z0.b, p0, [dst, \first, mul vl]
>  	.if \last-\first
> @@ -187,54 +177,29 @@ L(L1_prefetch): // if rest >= L1_SIZE
>  	cbnz	rest, L(unroll32)
>  	ret
>
> +	// count >= L2_SIZE
>  L(L2):
> -	// align dst address at vector_length byte boundary
> -	sub	tmp1, vector_length, 1
> -	ands	tmp2, dst, tmp1
> -	// if vl_remainder == 0
> -	b.eq	1f
> -	sub	vl_remainder, vector_length, tmp2
> -	// process remainder until the first vector_length boundary
> -	whilelt	p2.b, xzr, vl_remainder
> -	st1b	z0.b, p2, [dst]
> -	add	dst, dst, vl_remainder
> -	sub	rest, rest, vl_remainder
> -	// align dstin address at CACHE_LINE_SIZE byte boundary
> -1:	mov	tmp1, CACHE_LINE_SIZE
> -	ands	tmp2, dst, CACHE_LINE_SIZE - 1
> -	// if cl_remainder == 0
> -	b.eq	L(L2_dc_zva)
> -	sub	cl_remainder, tmp1, tmp2
> -	// process remainder until the first CACHE_LINE_SIZE boundary
> -	mov	tmp1, xzr // index
> -2:	whilelt	p2.b, tmp1, cl_remainder
> -	st1b	z0.b, p2, [dst, tmp1]
> -	incb	tmp1
> -	cmp	tmp1, cl_remainder
> -	b.lo	2b
> -	add	dst, dst, cl_remainder
> -	sub	rest, rest, cl_remainder
> -
> -L(L2_dc_zva):
> -	// zero fill
> -	mov	tmp1, dst
> -	dc_zva	(ZF_DIST / CACHE_LINE_SIZE) - 1
> -	mov	zva_len, ZF_DIST
> -	add	tmp1, zva_len, CACHE_LINE_SIZE * 2
> -	// unroll
> -	.p2align 3
> -1:	st1b_unroll 0, 3
> -	add	tmp2, dst, zva_len
> -	dc	zva, tmp2
> -	st1b_unroll 4, 7
> -	add	tmp2, tmp2, CACHE_LINE_SIZE
> -	dc	zva, tmp2
> -	add	dst, dst, CACHE_LINE_SIZE * 2
> -	sub	rest, rest, CACHE_LINE_SIZE * 2
> -	cmp	rest, tmp1 // ZF_DIST + CACHE_LINE_SIZE * 2
> -	b.ge	1b
> -	cbnz	rest, L(unroll8)
> -	ret
> +	tst	valw, 255
> +	b.ne	L(unroll8)
> +	// align dst to CACHE_LINE_SIZE byte boundary
> +	and	tmp2, dst, CACHE_LINE_SIZE - 1
> +	sub	tmp2, tmp2, CACHE_LINE_SIZE
> +	st1b	z0.b, p0, [dst, 0, mul vl]
> +	st1b	z0.b, p0, [dst, 1, mul vl]
> +	st1b	z0.b, p0, [dst, 2, mul vl]
> +	st1b	z0.b, p0, [dst, 3, mul vl]
> +	sub	dst, dst, tmp2
> +	add	count, count, tmp2
> +
> +	// clear cachelines using DC ZVA
> +	sub	count, count, CACHE_LINE_SIZE
> +	.p2align 4
> +1:	dc	zva, dst

In __memset_a64fx, DC ZVA is called when the buffer size is more than
8MB and the fill data is zero. In __memset_generic, DC ZVA is called
when the buffer size is more than 256B and the fill data is zero. You
implemented the latter in [3].

With V3 patch 02, __memset_a64fx (green line) performs very close to
__memset_generic (red line) for zero fill [4].
With V3 patch 05, __memset_a64fx (green line) performs almost the same
as __memset_generic (red line) for zero fill [5].

The X axis of graphs [4][5] ranges from 256B to 64MB. The graphs were
created with the following command:

  $ cat bench-memset-zerofill.out | \
      jq -r 'del(.functions.memset.results[] | select(.char2 != 0))' | \
      plot_strings.py -l -p thru -v -

So the DC ZVA implementations in __memset_generic and __memset_a64fx
each seem appropriate.

But comparing the nonzero fill graph [6] with the zero fill graph [4],
why is DC ZVA effective only above 8MB for __memset_a64fx, even though
it is effective from smaller sizes for __memset_generic?

I still don't understand the DC ZVA behavior.

[3] https://sourceware.org/git/?p=glibc.git;a=commit;f=sysdeps/aarch64/memset.S;h=a8c5a2a9521e105da6e96eaf4029b8e4d595e4f5
[4] https://drive.google.com/file/d/1f0_sTiujCcEZTfxbQ1UZdVwAvbMbP2ii/view?usp=sharing
[5] https://drive.google.com/file/d/1Wyp3GO-9ipcphwqOQOQ9a97EwFz90SPc/view?usp=sharing
[6] https://drive.google.com/file/d/1nZ_lfj6Kz5vFCR35O0q929SceUP-wbih/view?usp=sharing

Thanks.
Naohiro

> +	add	dst, dst, CACHE_LINE_SIZE
> +	subs	count, count, CACHE_LINE_SIZE
> +	b.hi	1b
> +	add	count, count, CACHE_LINE_SIZE
> +	b	L(last)
>
>  END (MEMSET)
>  libc_hidden_builtin_def (MEMSET)
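
For readers following the new L(L2) zero-fill path quoted above, here is a minimal Python sketch that models its arithmetic on a bytearray. It is an illustration under stated assumptions, not the glibc implementation: VECTOR_LENGTH = 64 assumes the 512-bit SVE vector length of A64FX, each DC ZVA is modelled as zeroing one CACHE_LINE_SIZE block, L(last) (not shown in this hunk) is assumed to store the remaining bytes, and the function name zero_fill_l2_path is hypothetical.

```python
CACHE_LINE_SIZE = 256  # value from the quoted patch
VECTOR_LENGTH = 64     # assumption: A64FX uses 512-bit SVE vectors


def zero_fill_l2_path(mem, dst, count):
    """Model the quoted L(L2) zero-fill path on bytearray `mem`.

    `dst` is an index into `mem` standing in for the address.
    Returns the number of modelled DC ZVA operations.
    """
    end = dst + count

    # and tmp2, dst, CACHE_LINE_SIZE - 1 ; sub tmp2, tmp2, CACHE_LINE_SIZE
    tmp2 = (dst & (CACHE_LINE_SIZE - 1)) - CACHE_LINE_SIZE  # in -256..-1

    # Four unpredicated st1b stores cover dst .. dst + 4 * VL.
    mem[dst:dst + 4 * VECTOR_LENGTH] = bytes(4 * VECTOR_LENGTH)

    dst -= tmp2    # sub dst, dst, tmp2: advance to the next line boundary
    count += tmp2  # add count, count, tmp2

    count -= CACHE_LINE_SIZE  # bias so the loop stops before the tail
    zva_calls = 0
    while True:
        assert dst % CACHE_LINE_SIZE == 0
        mem[dst:dst + CACHE_LINE_SIZE] = bytes(CACHE_LINE_SIZE)  # dc zva, dst
        zva_calls += 1
        dst += CACHE_LINE_SIZE
        before = count
        count -= CACHE_LINE_SIZE           # subs count, count, CACHE_LINE_SIZE
        if not before > CACHE_LINE_SIZE:   # b.hi 1b (unsigned higher)
            break
    count += CACHE_LINE_SIZE  # undo the bias: 1..256 bytes remain

    # Assumption: L(last) stores the remaining `count` bytes.
    mem[dst:dst + count] = bytes(count)
    assert dst + count == end
    return zva_calls
```

With dst = 259 and count = 4096, the model zeroes exactly the requested range: 253 bytes are covered by the alignment stores, 15 lines by the modelled DC ZVA loop, and 3 bytes by the tail. The real path is only entered for count >= L2_SIZE; the small count here just keeps the example quick.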