To: "naohirot@fujitsu.com"
Subject: Re: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Tue, 20 Apr 2021 14:44:57 +0000
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: Szabolcs Nagy, 'GNU C Library'

Hi Naohiro,

> Case 4 [1] improved the performance in the size range more than 4MB from Case 1
> 7.5-10 GB/sec [2] to 10-10.5 GB/sec [3].
>
> Case 1: DC_ZVA + L1 prefetch + L2 + prefetch [2]
> Case 2: DC_ZVA + L1 prefetch
> Case 3: DC_ZVA + L2 prefetch
> Case 4: DC_ZVA only [3]

That is great news - it simplifies the loop a lot, and it is faster too!

>> Also why would the DC_ZVA
>> need to be done so early? It seems to me that cleaning the cacheline just before
>> you write it works best since that avoids accidentally replacing it.
>>
>
> Yes, I moved it closer, please look at the change [1].

What I meant is, why is ZF_DIST so huge? I don't see how that helps. Is there any penalty
if we did it like this (or possibly with 1-2 cachelines offset)?

    dc   zva, dest_ptr
    st1b z0.b, p0, [dest_ptr, #0, mul vl]
    st1b z1.b, p0, [dest_ptr, #1, mul vl]
    st1b z2.b, p0, [dest_ptr, #2, mul vl]
    st1b z3.b, p0, [dest_ptr, #3, mul vl]

This would remove almost all initialization code from the start of L(L2_dc_zva).

Cheers,
Wilco
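
For readers without A64FX hardware, the loop shape suggested above can be approximated with a rough, portable C model. This is only a sketch, not the glibc code: the 64-byte vector length and 256-byte DC ZVA block size are assumed values for A64FX, and copy_blocks, model_dc_zva and model_st1b are hypothetical names. The model says nothing about performance; it only shows that zeroing each block immediately before its four vector stores needs no ZF_DIST look-ahead setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed A64FX values: 512-bit SVE vectors (64 bytes) and a 256-byte
   DC ZVA block. Real code should query the CPU instead of hard-coding. */
enum { VL_BYTES = 64, ZVA_BYTES = 256, VECS_PER_BLOCK = ZVA_BYTES / VL_BYTES };

/* Stand-in for "dc zva, dest_ptr": zero one aligned ZVA block. */
static void model_dc_zva(unsigned char *block)
{
    memset(block, 0, ZVA_BYTES);
}

/* Stand-in for "st1b zN.b, p0, [dest_ptr, #idx, mul vl]" with all lanes active. */
static void model_st1b(unsigned char *dest_ptr, int idx, const unsigned char *zreg)
{
    memcpy(dest_ptr + (size_t)idx * VL_BYTES, zreg, VL_BYTES);
}

/* Copy len bytes in ZVA-sized blocks. dst must be ZVA_BYTES-aligned and
   len a multiple of ZVA_BYTES; head and tail handling is left out. */
static void copy_blocks(unsigned char *dst, const unsigned char *src, size_t len)
{
    unsigned char z[VECS_PER_BLOCK][VL_BYTES];   /* models registers z0..z3 */

    assert((uintptr_t)dst % ZVA_BYTES == 0 && len % ZVA_BYTES == 0);

    for (size_t off = 0; off < len; off += ZVA_BYTES) {
        unsigned char *dest_ptr = dst + off;

        /* Load one block's worth of source data into the model registers. */
        for (int i = 0; i < VECS_PER_BLOCK; i++)
            memcpy(z[i], src + off + (size_t)i * VL_BYTES, VL_BYTES);

        /* Zero the destination block right before writing it ... */
        model_dc_zva(dest_ptr);

        /* ... then overwrite the whole block with four vector stores. */
        for (int i = 0; i < VECS_PER_BLOCK; i++)
            model_st1b(dest_ptr, i, z[i]);
    }
}
```

Calling copy_blocks(dst, src, 1024) on a 256-byte-aligned dst should leave dst identical to src, since every zeroed block is fully overwritten by the stores that follow it.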