To: "naohirot@fujitsu.com"
Subject: Re: [PATCH v3 2/5] AArch64: Improve A64FX memset
Date: Mon, 9 Aug 2021 16:16:29 +0000
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: 'GNU C Library'
List-Id: Libc-alpha mailing list

Hi Naohiro,

> > +        // align dst to CACHE_LINE_SIZE byte boundary
> > +   and     tmp2, dst, CACHE_LINE_SIZE - 1
> > +   sub     tmp2, tmp2, CACHE_LINE_SIZE

> "tmp2" becomes always minus value.
> I felt that it would be easier to understand and natural if it is reversed like this:
>
>       sub     tmp2, CACHE_LINE_SIZE, tmp2

That's not a valid instruction though. I've just removed it in v4 since we can
delay the cacheline adjustment to dst and count to later instructions.

> But comparing nonzero fill graph[6] with zero fill graph[4],
> why DC ZVA is only effective more than 8MB for __memset_a64fx in spite
> that DC ZVA is effective from smaller size for __memset_generic?

Well it seems on A64FX DC ZVA is faster only when data is not in L1. So it may
be feasible to use DC ZVA for smaller sizes.

Cheers,
Wilco
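
The alignment arithmetic discussed above can be sketched in C as follows. This is only an illustration, not code from the patch: the function names v3_offset and bytes_to_boundary are hypothetical, and CACHE_LINE_SIZE is assumed to be 256 (the A64FX cache line size). The first function mirrors the quoted and/sub pair from v3, whose result is always negative. The second gives the positive distance that the suggested reversed sub was meant to produce. A64 sub only accepts an immediate as its second source operand and has no reverse-subtract form, so the positive value would need an extra instruction (for example a neg, or loading the constant into a register first).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 256-byte cache lines, as on A64FX. */
#define CACHE_LINE_SIZE 256

/* The mask trick below only works for power-of-two line sizes. */
_Static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0,
               "CACHE_LINE_SIZE must be a power of two");

/* Mirrors the quoted v3 sequence:
 *     and tmp2, dst, CACHE_LINE_SIZE - 1
 *     sub tmp2, tmp2, CACHE_LINE_SIZE
 * The result always lies in [-CACHE_LINE_SIZE, -1]. */
static int64_t v3_offset(uintptr_t dst)
{
    int64_t tmp2 = (int64_t)(dst & (CACHE_LINE_SIZE - 1));
    tmp2 -= CACHE_LINE_SIZE;
    assert(tmp2 < 0); /* "tmp2" is always negative, as noted in the review */
    return tmp2;
}

/* Positive byte count from dst to the next cache-line boundary,
 * i.e. CACHE_LINE_SIZE - (dst & (CACHE_LINE_SIZE - 1)).
 * An already aligned dst yields a full CACHE_LINE_SIZE, not 0. */
static int64_t bytes_to_boundary(uintptr_t dst)
{
    return -v3_offset(dst);
}
```

For a dst of 0x1001 the v3 sequence yields -255 and the positive form yields 255, so dst + 255 = 0x1100 is the next 256-byte boundary.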