Date: Fri, 3 Sep 2021 16:01:58 +0100
To: Naohiro Tamura
Subject: Re: [PATCH] AArch64: Update A64FX memset not to degrade at 16KB
Message-ID: <20210903150156.GF21740@arm.com>
References: <20210827050304.543471-1-naohirot@fujitsu.com>
In-Reply-To: <20210827050304.543471-1-naohirot@fujitsu.com>
From: Szabolcs Nagy via Libc-alpha
Reply-To: Szabolcs Nagy
Cc: libc-alpha@sourceware.org, Wilco Dijkstra

The 08/27/2021 05:03, Naohiro Tamura via Libc-alpha wrote:
> This patch updates the unroll8 code so that it no longer degrades at
> 16KB, the size where performance peaks, on both FX1000 and FX700.
>
> The 2 instructions inserted at the beginning of the unroll8 loop,
> cmp and branch, are a workaround that was found heuristically.
>
> Reviewed-by: Wilco Dijkstra

thanks, i committed this now.

> ---
>  sysdeps/aarch64/multiarch/memset_a64fx.S | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> index 7bf759b6a753..f7dfdaace7cf 100644
> --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> @@ -96,7 +96,14 @@ L(vl_agnostic): // VL Agnostic
>  L(unroll8):
>  	sub	count, count, tmp1
>  	.p2align 4
> -1:	st1b_unroll 0, 7
> +	// The 2 instructions at the beginning of the following loop,
> +	// cmp and branch, are a workaround so as not to degrade at
> +	// the peak performance 16KB.
> +	// It is found heuristically and the branch condition, b.ne,
> +	// is chosen intentionally never to jump.
> +1:	cmp	xzr, xzr
> +	b.ne	1b
> +	st1b_unroll 0, 7
>  	add	dst, dst, tmp1
>  	subs	count, count, tmp1
>  	b.hi	1b
> -- 
> 2.17.1
>
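Why the inserted pair never changes control flow: `cmp xzr, xzr` subtracts the zero register from itself, so the Z flag is always set, and `b.ne` (branch if Z is clear) can never be taken. The C sketch below models that, together with the `subs`/`b.hi` loop exit, under stated assumptions: `subs_flags`, `cond_ne`, `cond_hi` and `unroll8_iterations` are hypothetical helpers, only control flow is modelled (no stores), and the code around L(unroll8) (entry threshold, tail handling) is not modelled at all. It also cannot capture why the extra instructions help on A64FX; that effect is microarchitectural and, per the patch, was found heuristically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of the NZCV flags that AArch64 SUBS/CMP sets for a - b. */
typedef struct { bool n, z, c, v; } nzcv;

static nzcv subs_flags(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    nzcv f;
    f.n = (r >> 63) != 0;                    /* result is negative */
    f.z = (r == 0);                          /* result is zero */
    f.c = (a >= b);                          /* carry set means no unsigned borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 63) != 0;  /* signed overflow */
    return f;
}

static bool cond_ne(nzcv f) { return !f.z; }          /* b.ne */
static bool cond_hi(nzcv f) { return f.c && !f.z; }   /* b.hi: unsigned higher */

/* Hypothetical model of the patched L(unroll8) loop's control flow.
   count: bytes left; step: tmp1, assumed to be the bytes written per
   iteration. Returns how many times st1b_unroll would execute. We
   require count >= step so the initial subtraction cannot wrap. */
static unsigned unroll8_iterations(uint64_t count, uint64_t step)
{
    assert(step > 0 && count >= step);
    unsigned iters = 0;
    count -= step;                          /* sub  count, count, tmp1 */
    for (;;) {
        if (cond_ne(subs_flags(0, 0)))      /* 1: cmp xzr, xzr ; b.ne 1b */
            continue;                       /* never taken: Z is always set */
        iters++;                            /* stands in for st1b_unroll 0, 7 and add dst, dst, tmp1 */
        nzcv f = subs_flags(count, step);   /* subs count, count, tmp1 */
        count -= step;
        if (!cond_hi(f))                    /* b.hi 1b */
            break;
    }
    return iters;
}
```

For example, assuming A64FX's 512-bit SVE vectors and that tmp1 holds eight vectors' worth of bytes (8 x 64 = 512), `unroll8_iterations(16384, 512)` returns 31: in this model the loop covers 15872 of the 16384 bytes, and how the real code handles the remainder lies outside the quoted hunk.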