From: Matthias Kretz
To: Noah Goldstein
Cc: "Richard Earnshaw (lists)", libstdc++, gcc-patches List, GNU C Library
Subject: Re: [PATCH] c++: implement C++17 hardware interference size
Date: Fri, 16 Jul 2021 21:37:16 +0200
Message-ID: <1770208.5S6X66LlFz@excalibur>
References: <20210716023656.670004-1-jason@redhat.com> <2136759.qKCeTcHjAi@excalibur>
Organization: GSI Helmholtzzentrum für Schwerionenforschung

On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz wrote:
> > I don't understand how this feature would lead to false sharing. But maybe
> > I misunderstand the spatial prefetcher. The first access to one of the two
> > cache lines pairs would bring both cache lines to LLC (and possibly L2).
> > If a core with a different L2 reads the other cache line the cache line
> > would be duplicated; if it writes to it, it would be exclusive to the
> > other core's L2. The cache line pairs do not affect each other anymore.
> > Maybe there's a minor inefficiency on initial transfer from memory, but
> > isn't that all?
>
> If two cores that do not share an L2 cache need exclusive access to
> a cache-line, the L2 spatial prefetcher could cause pingponging if those
> two cache-lines were adjacent and shared the same 128 byte alignment.
> Say core A requests line x1 in exclusive, it also get line x2 (not sure
> if x2 would be in shared or exclusive), core B then requests x2 in
> exclusive, it also gets x1. Irrelevant of the state x1 comes into core B's
> private L2 cache it invalidates the exclusive state on cache-line x1 in
> core A's private L2 cache. If this was done in a loop (say a simple
> `lock add` loop) it would cause pingponging on cache-lines x1/x2 between
> core A and B's private L2 caches.

Quoting the latest ORM: "The following two hardware prefetchers fetched data
from memory to the L2 cache and last level cache:
Spatial Prefetcher: This prefetcher strives to complete every cache line
fetched to the L2 cache with the pair line that completes it to a 128-byte
aligned chunk."

1. If the requested cache line is already present on some other core, the
spatial prefetcher should not get used ("fetched data from memory").

2. The section is about data prefetching. It is unclear whether the spatial
prefetcher applies at all to normal cache line fetches.

3. The ORM uses past tense ("The following two hardware prefetchers fetched
data"), which indicates to me that Intel isn't doing this for newer
generations anymore.

4. If I'm wrong on points 1 & 2, consider this: Core 1 requests a read of
cache line A, and the adjacent cache line B is thus also loaded to LLC.
Core 2 requests a read of line B and thus loads line A into LLC. Now both
cores have both cache lines in LLC. Core 1 writes to line A, which
invalidates line A in LLC of Core 2 but does not affect line B. Core 2
writes to line B, invalidating line B for Core 1. => no false sharing.
Where did I get my mental cache protocol wrong?
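For concreteness, the x1/x2 layout under discussion can be sketched in C++ roughly as follows. This is only an illustration, not code from the patch: the 64-byte line size and the 128-byte pair size are assumptions hard-coded as constants, and the names AdjacentLines, SeparatePairs, same_block and check_layouts are made up for this sketch. It only checks where the two counters land in memory; whether the spatial prefetcher really causes ping-ponging on the first layout is exactly the open question and would have to be measured.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed sizes for illustration only; not queried from the hardware.
constexpr std::size_t kLineSize = 64;   // one cache line
constexpr std::size_t kPairSize = 128;  // aligned chunk the spatial prefetcher completes

// Hypothetical layout: x1 and x2 sit in adjacent lines of one 128-byte chunk.
struct alignas(kPairSize) AdjacentLines {
    alignas(kLineSize) std::atomic<long> x1{0};
    alignas(kLineSize) std::atomic<long> x2{0};
};

// Hypothetical layout: each counter starts its own 128-byte chunk.
struct SeparatePairs {
    alignas(kPairSize) std::atomic<long> x1{0};
    alignas(kPairSize) std::atomic<long> x2{0};
};

static_assert(sizeof(AdjacentLines) == 2 * kLineSize, "two lines, one pair");
static_assert(sizeof(SeparatePairs) == 2 * kPairSize, "two separate pairs");

// True if both addresses fall into the same aligned block of `block` bytes.
inline bool same_block(const void* a, const void* b, std::size_t block) {
    const auto ia = reinterpret_cast<std::uintptr_t>(a);
    const auto ib = reinterpret_cast<std::uintptr_t>(b);
    return ia / block == ib / block;
}

// Checks the placement of the counters in both layouts.
inline void check_layouts() {
    AdjacentLines adj;
    SeparatePairs sep;
    // Different cache lines, so no classic false sharing ...
    assert(!same_block(&adj.x1, &adj.x2, kLineSize));
    // ... but the same 128-byte pair, which is the case Noah describes.
    assert(same_block(&adj.x1, &adj.x2, kPairSize));
    // Aligning to the pair size keeps the counters in different pairs.
    assert(!same_block(&sep.x1, &sep.x2, kPairSize));
}
```

Timing a `lock add` loop from two cores without a shared L2 on each layout would show whether the adjacent-line case is slower in practice.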
-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────