From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Aug 2021 10:43:11 +0100
From: Szabolcs Nagy via Libc-alpha
Reply-To: Szabolcs Nagy
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH v4 4/5] AArch64: Improve A64FX memset by removing unroll32
Message-ID: <20210810094310.GF20410@arm.com>
List-Id: Libc-alpha mailing list
User-Agent: Mutt/1.9.4 (2018-02-28)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

The 08/09/2021 13:13, Wilco Dijkstra via Libc-alpha wrote:
> v4: no changes
>
> Remove unroll32 code since it doesn't improve performance.

OK to commit, but keep

Reviewed-by: Naohiro Tamura

>
> ---
>
> diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> index 55f28b644defdffb140c88da0635ef099235546c..89dba912588c243e67a9527a56b4d3a44659d542 100644
> --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> @@ -102,22 +102,6 @@ L(vl_agnostic): // VL Agnostic
>  	ccmp	vector_length, tmp1, 0, cs
>  	b.eq	L(L1_prefetch)
>
> -L(unroll32):
> -	lsl	tmp1, vector_length, 3	// vector_length * 8
> -	lsl	tmp2, vector_length, 5	// vector_length * 32
> -	.p2align 3
> -1:	cmp	rest, tmp2
> -	b.cc	L(unroll8)
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	st1b_unroll
> -	add	dst, dst, tmp1
> -	sub	rest, rest, tmp2
> -	b	1b
>
>  L(unroll8):
>  	lsl	tmp1, vector_length, 3
> @@ -155,7 +139,7 @@ L(L1_prefetch): // if rest >= L1_SIZE
>  	sub	rest, rest, CACHE_LINE_SIZE * 2
>  	cmp	rest, L1_SIZE
>  	b.ge	1b
> -	cbnz	rest, L(unroll32)
> +	cbnz	rest, L(unroll8)
>  	ret
>
>  // count >= L2_SIZE
> --