To: Noah Goldstein
Subject: Re: [PATCH 2/5] benchtests: Add new random cases to bench-memcpy-random.c
Date: Thu, 26 Aug 2021 17:06:29 +0000
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: GNU C Library
List-Id: Libc-alpha mailing list

Hi Noah,

> Since on x86_64 at least we use memmove to implement memcpy there are a
> few points where the logic revolves around src < dst. Particularly in L(more_8x_vec)
> which branch on the condition w.o checking for overlap. I don't think its particularly
> fair for the random tests to essentially ignore that branch and imo a better reflection
> of real-world usage. An implementation could add some really branch heavy code
> in the case ignored by the current implementation that looked good on the fixed
> benchmarks but would perform worse in the randomized tests.

Yes, such branches are a bad idea in general, particularly for small sizes.
However, I don't understand why you think that branch is ignored, given that the
current random tests will already test it (so you should see it being expensive
in profiles, and you can easily check whether doing an overlap test ends up faster).

So why would there need to be any changes to the benchmark?

> For example in cases where src/dst 4k alias forward/backward looping can make a big difference.
> On fixed tests it may be worth it to add branches that check for that, but in a randomized
> environment where the branches can miss could say otherwise.

Again, the existing benchmark should already test those branches - if they
execute frequently and are hard to predict, the results will be worse.


> I generally agree. But on the flip side there is essentially no way to tiebreak different
> alignment methods. For example in x86_64 aligning to 1x vs 2x VECs will improve about
> 50% of the fixed tests and harm 50%. I think this can be useful for determining "which method
> is expected to yield better results". There are other more branch heavy possible alignment
> configurations, which I don't think fixed tests would be able to accurately benchmark as they
> would be 100% predicted.

Yes, the fixed tests are not very good for tuning at all since they completely ignore the
effects of branch prediction.
I agree adding fixed sizes might be useful for alignment
tuning, but then it's best to keep it to smaller sizes since that's where the alignment
overhead is the largest (say from 128 bytes to 4KB rather than 4KB to 128KB).

+  uint64_t src : 26;
+  uint64_t dst : 26;

>> This doesn't make any sense, where do we need offsets larger than 2^24?
>
> We need the extra bits so store max_size offsets to alternate direction.

You don't need the extra bits; 24 bits is more than enough.

>> This doesn't make sense since this is now copying outside of the region,
>> effectively enlarging the region but also always reading from one part and
>> writing to another. The data outside the region is uninitialized which means
>> it is using the zero page which gives unrepresentative results.
>
> I see. I didn't quite grasp the purpose of the max_size. (As seen by the
> fact that I renamed "length" to "max-alignment", now "region-size").
>
> Changed it to use max_size instead of MAX_TEST_SIZE and initialize
> 3 * max_size.

That still doesn't solve the issue. Basically the original test runs within
a block of max_size. You're effectively doubling or tripling it, which means
the results are not comparable with the original code, and the results
cannot be compared between the variants in the new version either.

> I think its better to expand region size to 3x max_size as opposed to 2x
> max_size and have noise for store-forwarding (on x86 at least)

I don't see how that helps. The point of the benchmark is to be more
real-world than the fixed benchmarks. This means you will see lots of
mispredictions, cache misses and other penalties. Yes, it will tell you if you
have a bad memcpy implementation.
And that is a good thing rather than
something to be avoided.

+    /* Create a random set of copies with the given size and alignment
      distributions.  */
   for (i = 0; i < MAX_COPIES; i++)
     {
+      dst_offset  = dst_gt_src == -1
+                        ? (rand() & 1) ? MAX_TEST_SIZE + getpagesize() : 0
+                        : dst_offset;

> I don't quite understand what you mean? I change it to be
> 2 * max_size in v2 and src_offset always has a value of
> max_size if src > dst or randomized. This should prevent
> overlaps as essentially src has its own max_size buffer
> and dst has two possible max_size buffers, one below src
> and one above.

My point is that doesn't make sense even if we ignore issues in
the implementation. Basically it doubles the region size for 2 of
the 3 variants and triples it for the other, so they're not comparable.
When you can't even come to any conclusion about that src < dst
branch, what use does all this have?

The other thing is that with splitting the buffer into read-only and
write-only we go a step back towards fixed tests.
A memcpy often
reads data that has recently been written to, so doing both reads
and writes from the same buffer is closer to real-world scenarios.

> Fixed in v2

Is that this version (which doesn't appear to have fixed it) or are you
planning to post another one?

https://sourceware.org/pipermail/libc-alpha/2021-August/130493.html

void
+do_test (json_ctx_t *json_ctx, size_t max_size, int dst_gt_src)
+{
+  size_t n;
+  memset (buf1, 1, max_size);

>> Note this doesn't initialize the extra data in the buffer which creates
>> odd performance behaviour due to the zero page.
>
> Fixed in v2.

The memset looks fixed indeed. However, there are now more bugs in the
fixed size case where it does:

+ n = init_copy(3 * max_size, dst_gt_src);

> I misunderstood the purpose of "length" which is essentially total
> region size. Imo "length" in context of memcpy generally refers to
> copy size. Renamed "region-size" and set to full region size used
> which previously was 2 * max_size, now 3 * max_size.

"region-size" sounds reasonable, but you might need to update the
XML template to allow all these changes too.

>> It's much easier to change the for loop to i = 4096 to MAX_TEST_SIZE
>> and remove the various i * 1024. Then changing MAX_TEST_SIZE just works.
> Fixed in v2.

+ for (int i = 4096; i < MAX_TEST_SIZE; i = i * 2)

That looks much better indeed, but please use i <= MAX_TEST_SIZE.

Cheers,
Wilco
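
Illustration of the loop-bound remark above (i < MAX_TEST_SIZE versus
i <= MAX_TEST_SIZE). This is only a minimal sketch, not code from the patch:
count_sizes and last_size are hypothetical helpers, and 65536 in the note
below is an assumed stand-in for MAX_TEST_SIZE, whose real value is not
given in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): count the power-of-two sizes
   visited by a loop that starts at 4096 and doubles each iteration.
   INCLUSIVE selects `i <= max_test_size` instead of `i < max_test_size`.  */
size_t
count_sizes (size_t max_test_size, int inclusive)
{
  /* Guard against i * 2 wrapping around before the bound is reached.  */
  assert (max_test_size <= SIZE_MAX / 2);

  size_t count = 0;
  for (size_t i = 4096;
       inclusive ? i <= max_test_size : i < max_test_size;
       i = i * 2)
    count++;
  return count;
}

/* Hypothetical helper: largest size the loop visits, or 0 if none.  */
size_t
last_size (size_t max_test_size, int inclusive)
{
  assert (max_test_size <= SIZE_MAX / 2);

  size_t last = 0;
  for (size_t i = 4096;
       inclusive ? i <= max_test_size : i < max_test_size;
       i = i * 2)
    last = i;
  return last;
}
```

With the assumed bound of 65536, the exclusive form visits four sizes and
stops at 32768, while the inclusive form visits five and reaches 65536
itself, so the largest configured size is only exercised with <=.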