From: naohirot--- via Libc-alpha
Reply-To: "naohirot@fujitsu.com"
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: RE: [PATCH v3 4/5] AArch64: Improve A64FX memset
Date: Tue, 3 Aug 2021 03:05:52 +0000
List-Id: Libc-alpha mailing list
Sender: "Libc-alpha"

Hi Wilco,

Thank you for the patch.

LGTM. I confirmed that there is no performance change [1][2].

Reviewed-by: Naohiro Tamura
Tested-by: Naohiro Tamura

Regarding the commit title, how about this?
"AArch64: Improve A64FX memset by removing unroll32"

[1] https://drive.google.com/file/d/1SIw7bXX9Pi2G7wga9j5X9M2xzOHYQlIx/view?usp=sharing
[2] https://drive.google.com/file/d/1gdcuRFZbtlIpnINUMar4DIt9Ao04K36o/view?usp=sharing

Thanks.
Naohiro

> -----Original Message-----
> From: Wilco Dijkstra
> Sent: Friday, July 23, 2021 1:03 AM
> To: Tamura, Naohiro/田村 直広
> Cc: 'GNU C Library'
> Subject: [PATCH v3 4/5] AArch64: Improve A64FX memset
>
> Remove unroll32 code since it doesn't improve performance.
>
> ---
> diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> index fce257fa68120c2b101f29b438c397e10b4c275e..8665c272431b46dadea53c63ab74829c3aa99312 100644
> --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> @@ -102,22 +102,6 @@ L(vl_agnostic): // VL Agnostic
>  	ccmp	vector_length, tmp1, 0, cs
>  	b.eq	L(L1_prefetch)
>
> -L(unroll32):
> -	lsl	tmp1, vector_length, 3	// vector_length * 8
> -	lsl	tmp2, vector_length, 5	// vector_length * 32
> -	.p2align 3
> -1:	cmp	rest, tmp2
> -	b.cc	L(unroll8)
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	sub	rest, rest, tmp2
> -	b	1b
>
>  L(unroll8):
>  	lsl	tmp1, vector_length, 3
> @@ -155,7 +139,7 @@ L(L1_prefetch): // if rest >= L1_SIZE
>  	sub	rest, rest, CACHE_LINE_SIZE * 2
>  	cmp	rest, L1_SIZE
>  	b.ge	1b
> -	cbnz	rest, L(unroll32)
> +	cbnz	rest, L(unroll8)
>  	ret
>
>  // count >= L2_SIZE
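
For readers following the control flow, here is a rough C model of why dropping L(unroll32) leaves the tail handling unchanged. It is only a sketch, not glibc code: the function names are hypothetical, and it assumes that L(unroll8) loops while `rest >= vector_length * 8` in the same way the removed loop did for `vector_length * 32` (the quoted hunk shows only the first instruction of L(unroll8)). Because `vector_length * 8` divides `vector_length * 32`, both variants hand the same number of leftover bytes to whatever code follows.

```c
#include <stddef.h>

/* Hypothetical model of the old path: unroll32, then unroll8.
   `vl` is the vector length in bytes; the return value is the number
   of bytes left for the code that follows the unrolled loops.  */
size_t rest_with_unroll32(size_t rest, size_t vl)
{
    size_t step32 = vl * 32;    /* lsl tmp2, vector_length, 5 */
    size_t step8 = vl * 8;      /* lsl tmp1, vector_length, 3 */

    while (rest >= step32)      /* cmp rest, tmp2; b.cc L(unroll8) */
        rest -= step32;         /* four st1b_unroll blocks per pass */
    while (rest >= step8)       /* assumed shape of the unroll8 loop */
        rest -= step8;
    return rest;
}

/* Hypothetical model of the new path: unroll8 only.  */
size_t rest_with_unroll8_only(size_t rest, size_t vl)
{
    size_t step8 = vl * 8;

    while (rest >= step8)       /* same assumed unroll8 loop as above */
        rest -= step8;
    return rest;
}
```

With a 64-byte vector length (A64FX implements 512-bit SVE), both functions return 160 for a 100000-byte request, so the change affects only how many loop iterations are taken, which matches the report of no performance difference.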