From mboxrd@z Thu Jan 1 00:00:00 1970
From: naohirot--- via Libc-alpha
Reply-To: "naohirot@fujitsu.com"
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: RE: [PATCH v3 2/5] AArch64: Improve A64FX memset
Date: Tue, 3 Aug 2021 05:03:27 +0000
List-Id: Libc-alpha mailing list

Hi Wilco,

Sorry, I forgot to mention one thing
about readability.

> From: Tamura, Naohiro/田村 直広
> Sent: Monday, August 2, 2021 10:29 PM

> > +        // count >= L2_SIZE
> >  L(L2):
> > -        // align dst address at vector_length byte boundary
> > -        sub     tmp1, vector_length, 1
> > -        ands    tmp2, dst, tmp1
> > -        // if vl_remainder == 0
> > -        b.eq    1f
> > -        sub     vl_remainder, vector_length, tmp2
> > -        // process remainder until the first vector_length boundary
> > -        whilelt p2.b, xzr, vl_remainder
> > -        st1b    z0.b, p2, [dst]
> > -        add     dst, dst, vl_remainder
> > -        sub     rest, rest, vl_remainder
> > -        // align dstin address at CACHE_LINE_SIZE byte boundary
> > -1:      mov     tmp1, CACHE_LINE_SIZE
> > -        ands    tmp2, dst, CACHE_LINE_SIZE - 1
> > -        // if cl_remainder == 0
> > -        b.eq    L(L2_dc_zva)
> > -        sub     cl_remainder, tmp1, tmp2
> > -        // process remainder until the first CACHE_LINE_SIZE boundary
> > -        mov     tmp1, xzr       // index
> > -2:      whilelt p2.b, tmp1, cl_remainder
> > -        st1b    z0.b, p2, [dst, tmp1]
> > -        incb    tmp1
> > -        cmp     tmp1, cl_remainder
> > -        b.lo    2b
> > -        add     dst, dst, cl_remainder
> > -        sub     rest, rest, cl_remainder
> > -
> > -L(L2_dc_zva):
> > -        // zero fill
> > -        mov     tmp1, dst
> > -        dc_zva  (ZF_DIST / CACHE_LINE_SIZE) - 1
> > -        mov     zva_len, ZF_DIST
> > -        add     tmp1, zva_len, CACHE_LINE_SIZE * 2
> > -        // unroll
> > -        .p2align 3
> > -1:      st1b_unroll 0, 3
> > -        add     tmp2, dst, zva_len
> > -        dc      zva, tmp2
> > -        st1b_unroll 4, 7
> > -        add     tmp2, tmp2, CACHE_LINE_SIZE
> > -        dc      zva, tmp2
> > -        add     dst, dst, CACHE_LINE_SIZE * 2
> > -        sub     rest, rest, CACHE_LINE_SIZE * 2
> > -        cmp     rest, tmp1      // ZF_DIST + CACHE_LINE_SIZE * 2
> > -        b.ge    1b
> > -        cbnz    rest, L(unroll8)
> > -        ret
> > +        tst     valw, 255
> > +        b.ne    L(unroll8)
> > +        // align dst to CACHE_LINE_SIZE byte boundary
> > +        and     tmp2, dst, CACHE_LINE_SIZE - 1
> > +        sub     tmp2, tmp2, CACHE_LINE_SIZE

"tmp2" is always negative here.
I think it would be more natural and easier to understand if the
operands were reversed so that "tmp2" is positive, like this:

        sub     tmp2, CACHE_LINE_SIZE, tmp2

> > +        st1b    z0.b, p0, [dst, 0, mul vl]
> > +        st1b    z0.b, p0, [dst, 1, mul vl]
> > +        st1b    z0.b, p0, [dst, 2, mul vl]
> > +        st1b    z0.b, p0, [dst, 3, mul vl]
> > +        sub     dst, dst, tmp2

"dst" needs to be incremented here.
It is in fact incremented by the "sub", because "tmp2" is negative.
With a positive "tmp2" this would read more naturally:

        add     dst, dst, tmp2

> > +        add     count, count, tmp2

"count" needs to be decremented here.
It is in fact decremented by the "add", because "tmp2" is negative.
With a positive "tmp2" this would read more naturally:

        sub     count, count, tmp2

Thanks.
Naohiro