To: "naohirot@fujitsu.com"
Subject: Re: [PATCH v3 5/5] AArch64: Improve A64FX memset
Date: Thu, 26 Aug 2021 14:13:45 +0000
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'

Hi Naohiro,

> You can see the difference between FX1000 and FX700 (==Apollo 80) [1].
> The number of cores and clock are different at least.
>
> And the blt workaround worked for only FX700 but not for FX1000 as explained below.
>
> [1] https://www.fujitsu.com/global/products/computing/servers/supercomputer/specifications/

So it looks like they are different silicon and likely slightly different microarchitectures
which would explain the different behaviour.

> If you agree to the cmp and branch workaround (2 instructions at the beginning of the loop)
> below, I'll submit a patch.

Yes, the 2 instruction workaround is clearly the best solution so far. It fixes the dips
around 16KB but doesn't regress anything else. The results v4 vs v4fix [9] show there
are even some uplifts in the 1-8KB range.

[9] https://drive.google.com/file/d/1JaJG0I79VMSTGy2PqaZf1SILujE69Gi2/view?usp=sharing

> 2) Result of the cmp and branch workaround (2 instructions at the beginning of the loop)

It's interesting this works on both systems, however it's still a mystery why...
It would be a good idea to ask your CPU team about this.

Cheers,
Wilco