From: Szabolcs Nagy via Libc-alpha
Reply-To: Szabolcs Nagy
Date: Tue, 10 Aug 2021 10:38:31 +0100
To: Wilco Dijkstra
Cc: 'GNU C Library'
Subject: Re: [PATCH v4 2/5] AArch64: Improve A64FX memset for large sizes
Message-ID: <20210810093830.GD20410@arm.com>
List-Id: Libc-alpha mailing list

The 08/09/2021 16:17, Wilco Dijkstra via Libc-alpha wrote:
> v4: Slightly tweak alignment code
>
> Improve performance of large memsets. Simplify alignment code. For zero memset use DC ZVA,
> which almost doubles performance. For non-zero memsets use the unroll8 loop which is about 10% faster.

this is OK to commit.
you should keep Reviewed-by: Naohiro Tamura in the commit message if there are only minor tweaks or no changes.

>
> ---
>
> diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
> index cf3d402ef681a9d98964d1751537945692a1ae68..6bc8ef5e0c84dbb59a57d114ae6ec8e3fa3822ad 100644
> --- a/sysdeps/aarch64/multiarch/memset_a64fx.S
> +++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
> @@ -27,14 +27,11 @@
>   */
>
>  #define L1_SIZE		(64*1024)	// L1 64KB
> -#define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
> +#define L2_SIZE		(8*1024*1024)	// L2 8MB
>  #define CACHE_LINE_SIZE	256
>  #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
> -#define ZF_DIST		(CACHE_LINE_SIZE * 21)	// Zerofill distance
> -#define rest		x8
> +#define rest		x2
>  #define vector_length	x9
> -#define vl_remainder	x10	// vector_length remainder
> -#define cl_remainder	x11	// CACHE_LINE_SIZE remainder
>
>  #if HAVE_AARCH64_SVE_ASM
>  # if IS_IN (libc)
> @@ -42,14 +39,6 @@
>
>  	.arch armv8.2-a+sve
>
> -	.macro dc_zva times
> -	dc	zva, tmp1
> -	add	tmp1, tmp1, CACHE_LINE_SIZE
> -	.if \times-1
> -	dc_zva "(\times-1)"
> -	.endif
> -	.endm
> -
>  	.macro st1b_unroll first=0, last=7
>  	st1b	z0.b, p0, [dst, \first, mul vl]
>  	.if \last-\first
> @@ -188,54 +177,30 @@ L(L1_prefetch): // if rest >= L1_SIZE
>  	cbnz	rest, L(unroll32)
>  	ret
>
> -L(L2):
> -	// align dst address at vector_length byte boundary
> -	sub	tmp1, vector_length, 1
> -	ands	tmp2, dst, tmp1
> -	// if vl_remainder == 0
> -	b.eq	1f
> -	sub	vl_remainder, vector_length, tmp2
> -	// process remainder until the first vector_length boundary
> -	whilelt	p2.b, xzr, vl_remainder
> -	st1b	z0.b, p2, [dst]
> -	add	dst, dst, vl_remainder
> -	sub	rest, rest, vl_remainder
> -	// align dstin address at CACHE_LINE_SIZE byte boundary
> -1:	mov	tmp1, CACHE_LINE_SIZE
> -	ands	tmp2, dst, CACHE_LINE_SIZE - 1
> -	// if cl_remainder == 0
> -	b.eq	L(L2_dc_zva)
> -	sub	cl_remainder, tmp1, tmp2
> -	// process remainder until the first CACHE_LINE_SIZE boundary
> -	mov	tmp1, xzr	// index
> -2:	whilelt	p2.b, tmp1, cl_remainder
> -	st1b	z0.b, p2, [dst, tmp1]
> -	incb	tmp1
> -	cmp	tmp1, cl_remainder
> -	b.lo	2b
> -	add	dst, dst, cl_remainder
> -	sub	rest, rest, cl_remainder
> -
> -L(L2_dc_zva):
> -	// zero fill
> -	mov	tmp1, dst
> -	dc_zva	(ZF_DIST / CACHE_LINE_SIZE) - 1
> -	mov	zva_len, ZF_DIST
> -	add	tmp1, zva_len, CACHE_LINE_SIZE * 2
> -	// unroll
> +	// count >= L2_SIZE
>  	.p2align 3
> -1:	st1b_unroll 0, 3
> -	add	tmp2, dst, zva_len
> -	dc	zva, tmp2
> -	st1b_unroll 4, 7
> -	add	tmp2, tmp2, CACHE_LINE_SIZE
> -	dc	zva, tmp2
> -	add	dst, dst, CACHE_LINE_SIZE * 2
> -	sub	rest, rest, CACHE_LINE_SIZE * 2
> -	cmp	rest, tmp1	// ZF_DIST + CACHE_LINE_SIZE * 2
> -	b.ge	1b
> -	cbnz	rest, L(unroll8)
> -	ret
> +L(L2):
> +	tst	valw, 255
> +	b.ne	L(unroll8)
> +	// align dst to CACHE_LINE_SIZE byte boundary
> +	and	tmp2, dst, CACHE_LINE_SIZE - 1
> +	st1b	z0.b, p0, [dst, 0, mul vl]
> +	st1b	z0.b, p0, [dst, 1, mul vl]
> +	st1b	z0.b, p0, [dst, 2, mul vl]
> +	st1b	z0.b, p0, [dst, 3, mul vl]
> +	sub	dst, dst, tmp2
> +	add	count, count, tmp2
> +
> +	// clear cachelines using DC ZVA
> +	sub	count, count, CACHE_LINE_SIZE * 2
> +	.p2align 4
> +1:	add	dst, dst, CACHE_LINE_SIZE
> +	dc	zva, dst
> +	subs	count, count, CACHE_LINE_SIZE
> +	b.hi	1b
> +	add	count, count, CACHE_LINE_SIZE
> +	add	dst, dst, CACHE_LINE_SIZE
> +	b	L(last)
>
>  END (MEMSET)
>  libc_hidden_builtin_def (MEMSET)
> --