To: Wilco Dijkstra, Noah Goldstein
Subject: RE: [PATCH v2 2/5] benchtests: Add memset zero fill benchtest
Date: Wed, 4 Aug 2021 09:11:11 +0000
References: <20210713082214.307529-1-naohirot@fujitsu.com> <20210720063500.362313-1-naohirot@fujitsu.com>
From: naohirot--- via Libc-alpha
Reply-To: "naohirot@fujitsu.com"
Cc: GNU C Library

Hi Wilco, Noah,

> From: Tamura, Naohiro/田村 直広
> Sent: Wednesday, July 28, 2021 4:28 PM
>
> Taking Noah's comment [1] into account, the final code should be like
> the below.
> Can we agree with this code?
>
> Two results, two loop version in the mail [1] and one loop version
> below, are almost same in case of __memset_generic on a64fx as
> shown in the graph [2].
>
> -----
> static void
> __attribute__((noinline, noclone))
> do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
>              int c1 __attribute ((unused)), int c2 __attribute ((unused)),
>              size_t n)
> {
>   size_t i, iters = 32;
>   timing_t start, stop, cur, latency = 0;
>
>   CALL (impl, s, c2, n); // warm up
>
>   for (i = 0; i < iters; i++)
>     {
>       memset (s, c1, n); // alternation
>
>       TIMING_NOW (start);
>
>       CALL (impl, s, c2, n);
>
>       TIMING_NOW (stop);
>       TIMING_DIFF (cur, start, stop);
>       TIMING_ACCUM (latency, cur);
>     }
>
>   json_element_double (json_ctx, (double) latency / (double) iters);
> }
> -----

I'd like to share an interesting finding that came up when I lowered
START_SIZE from 16KB to 256B.
Currently, __memset_generic (sysdeps/aarch64/memset.S) uses DC ZVA
when the size is more than 256B and the value is zero.
However, on A64FX, DC ZVA is slower than store instructions when the
size is less than 16KB [3].
This suggests that the appropriate size at which to start using DC ZVA
may differ from CPU to CPU.
It would be interesting to see how other CPUs behave.

The code below measures four patterns (zero-over-zero, zero-over-one,
one-over-zero and one-over-one) for sizes from 256B to 64MB.
In the graph [3], the four patterns are abbreviated 0o0, 0o1, 1o0 and 1o1.


#define START_SIZE 256
#define MIN_PAGE_SIZE (getpagesize () + 64 * 1024 * 1024)

  for (c1 = 0; c1 < 2; c1++)
    for (c2 = 0; c2 < 2; c2++)
      for (i = START_SIZE; i <= MIN_PAGE_SIZE; i <<= 1)
        {
          do_test (&json_ctx, 0, c1, c2, i);
          do_test (&json_ctx, 3, c1, c2, i);
        }

I'd
like to submit a V3 patch that also incorporates the above change.

[3] https://drive.google.com/file/d/1fonjDDlF4LPLfZY9-z22DGn-yaSpGN4g/view?usp=sharing

Thanks.
Naohiro

> [1] https://sourceware.org/pipermail/libc-alpha/2021-July/129486.html
> [2] https://drive.google.com/file/d/1bptHqg5vvFAGoYgoR3w_pvclXFSP8Sr0/view?usp=sharing
>
> Thanks.
> Naohiro
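
For anyone who wants to try the four-pattern sweep on another CPU without
building the benchtests tree, a minimal standalone sketch might look like
the one below. It is not the benchtest patch itself and rests on a few
assumptions: sample_t, fill, now_ns, time_fill and run_sweep are
hypothetical names, clock_gettime stands in for TIMING_NOW, and the
system memset is called through a volatile function pointer in place of
CALL (impl, ...). Only one buffer alignment is measured here, whereas the
loop above calls do_test twice per size.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* One measurement: c2 written over a buffer just filled with c1.  */
typedef struct
{
  int c1, c2;
  size_t n;
  double ns;
} sample_t;

/* Hypothetical stand-in for CALL (impl, ...).  The volatile pointer keeps
   the compiler from merging or dropping the back-to-back fills.  */
static void *(*volatile fill) (void *, int, size_t) = memset;

/* Hypothetical stand-in for TIMING_NOW: monotonic clock in nanoseconds.  */
static uint64_t
now_ns (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/* Mean time in ns of writing c2 over n bytes that were just filled
   with c1, following the one-loop structure quoted above.  */
static double
time_fill (unsigned char *buf, int c1, int c2, size_t n, size_t iters)
{
  uint64_t total = 0;

  assert (buf != NULL && iters > 0);
  fill (buf, c2, n); /* warm up */
  for (size_t i = 0; i < iters; i++)
    {
      fill (buf, c1, n); /* alternation, not timed */
      uint64_t start = now_ns ();
      fill (buf, c2, n);
      total += now_ns () - start;
    }
  return (double) total / (double) iters;
}

/* Measure all four (c1, c2) combinations for sizes start, 2 * start, ...
   up to limit.  Returns the number of samples stored in out (at most
   cap), or 0 if the buffer cannot be allocated.  */
static size_t
run_sweep (size_t start, size_t limit, size_t iters,
           sample_t *out, size_t cap)
{
  size_t count = 0;
  unsigned char *buf;

  assert (start > 0 && start <= limit);
  buf = malloc (limit);
  if (buf == NULL)
    return 0;

  for (int c1 = 0; c1 < 2; c1++)
    for (int c2 = 0; c2 < 2; c2++)
      for (size_t n = start; n <= limit && count < cap; n <<= 1)
        {
          out[count].c1 = c1;
          out[count].c2 = c2;
          out[count].n = n;
          out[count].ns = time_fill (buf, c1, c2, n, iters);
          count++;
          if (n > limit / 2)
            break; /* the next doubling would pass limit or overflow */
        }

  free (buf);
  return count;
}
```

Calling run_sweep (256, 64 * 1024 * 1024, 32, out, 128) would cover the
same 256B to 64MB range, giving 19 sizes per pattern and 76 samples in
total. Numbers from a sketch like this are only a rough guide, since it
measures whatever memset the system selects rather than a specific
implementation such as __memset_generic.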