To: Wilco Dijkstra
Subject: RE: [PATCH v3 5/5] AArch64: Improve A64FX memset
Date: Fri, 27 Aug 2021 05:05:03 +0000
List-Id: Libc-alpha mailing list
From: naohirot--- via Libc-alpha
Reply-To: "naohirot@fujitsu.com"
Cc: 'GNU C Library'

Hi Wilco,

> > If you agree to the cmp and branch workaround (2 instructions at the beginning of the loop)
> > below, I'll submit a patch.
>
> Yes, the 2 instruction workaround is clearly the best solution so far. It fixes the dips
> around 16KB but doesn't regress anything else. The results v4 vs v4fix [9] show there
> are even some uplifts in the 1-8KB range.

Thank you for the review. I have submitted a patch [1]; please take a look.

[1] https://sourceware.org/pipermail/libc-alpha/2021-August/130569.html

> > 2) Result of the cmp and branch workaround (2 instructions at the beginning of the loop)
>
> It's interesting this works on both systems, however it's still a mystery why...
> It would be a good idea to ask your CPU team about this.

OK. In the meantime, you can find the microarchitecture manual [2] if you're interested.

[2] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.5.pdf

Thanks.
Naohiro