To: "naohirot@fujitsu.com"
Subject: Re: [PATCH v3 5/5] AArch64: Improve A64FX memset
Date: Mon, 23 Aug 2021 16:50:50 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

Hi Naohiro,

> In my environment, I don't have any performance degradation by reverting unroll8,
> but 16KB performance improvement as shown in the graphs.

I still see a major regression at 1KB in the graph (it is relatively larger than the gain at 16KB),
plus many smaller regressions between 2KB-8KB.

> In your environment, do you have any performance degradation by reverting unroll8?
> If there is no disadvantage by reverting unroll8, why don't we revert it?

For me, bench-memset shows a 50% regression with the unroll8 loop reverted, plus
many smaller regressions. So I don't think reverting is a good idea.

I tried "perf stat" and, oddly enough, this loop causes a lot of branch mispredictions.
However, if you add a branch at the top of the loop that is never taken (e.g. blt, and
ensuring the sub above it sets the flags), it becomes faster than the best results so far.
If you can reproduce that, it is probably the best workaround.

> Is it HPE Apollo 80 System?
> Or does ARM Company have an account to Fujitsu FX1000 or FX700?

It has 48 cores, that's all I know...

Cheers,
Wilco