From: Paul Zimmermann
To: Wilco Dijkstra
Cc: libc-alpha@sourceware.org
Date: Wed, 01 Dec 2021 12:12:11 +0100
Subject: Re: [PATCH] Improve hypot performance
In-Reply-To: (message from Wilco Dijkstra via Libc-alpha on Tue, 30 Nov 2021 13:04:27 +0000)
List-Id: Libc-alpha mailing list

Dear Wilco,

> Improve hypot performance significantly by using fma when available. The fma
> version has twice the throughput of the previous version and 70% of the latency.
> The non-fma version has 30% higher throughput and 10% higher latency.

I cannot reproduce these figures. On an x86_64 machine, for the fma version I
get almost identical throughput, and only 88% of the previous latency. For the
non-fma version I get 24% smaller throughput (31% larger *reciprocal*
throughput), and 51% higher latency.
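For reference, the percentages quoted here can be recomputed from the "reciprocal-throughput" and "latency" fields of the benchmark output in this message. The following is only a small Python sketch of that arithmetic: it is not part of the glibc benchtests, and the names `runs` and `compare` are made up for illustration. It assumes that throughput is simply the inverse of the reported reciprocal throughput.

```python
# Figures copied from the benchmark output quoted in this message:
# (reciprocal-throughput, latency) for each configuration.
runs = {
    "fma": {"before": (32.4478, 57.1194), "patched": (31.205, 50.1653)},
    "no-fma": {"before": (32.9285, 57.5374), "patched": (43.1054, 87.1141)},
}


def compare(before, patched):
    """Return patched/before ratios for one configuration."""
    rt_before, lat_before = before
    rt_patched, lat_patched = patched
    return {
        # Throughput is taken as 1 / reciprocal throughput (assumption),
        # so its ratio is the inverse of the reciprocal-throughput ratio.
        "throughput": rt_before / rt_patched,
        "reciprocal_throughput": rt_patched / rt_before,
        "latency": lat_patched / lat_before,
    }


results = {name: compare(r["before"], r["patched"]) for name, r in runs.items()}

for name, ratios in results.items():
    print(name, {key: round(value, 3) for key, value in ratios.items()})
```

A ratio below 1 for "throughput" means the patched version is slower; a ratio below 1 for "latency" means it is faster.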

With fma:

before:
  "hypot": {
   "workload-random": {
    "duration": 3.31399e+09,
    "iterations": 7.4e+07,
    "reciprocal-throughput": 32.4478,
    "latency": 57.1194,
    "max-throughput": 3.08188e+07,
    "min-throughput": 1.75072e+07
   }

with patch:
  "hypot": {
   "workload-random": {
    "duration": 3.33618e+09,
    "iterations": 8.2e+07,
    "reciprocal-throughput": 31.205,
    "latency": 50.1653,
    "max-throughput": 3.20462e+07,
    "min-throughput": 1.99341e+07
   }

Without fma:

before:
  "hypot": {
   "workload-random": {
    "duration": 3.34724e+09,
    "iterations": 7.4e+07,
    "reciprocal-throughput": 32.9285,
    "latency": 57.5374,
    "max-throughput": 3.03689e+07,
    "min-throughput": 1.738e+07
   }

with patch:
  "hypot": {
   "workload-random": {
    "duration": 3.38571e+09,
    "iterations": 5.2e+07,
    "reciprocal-throughput": 43.1054,
    "latency": 87.1141,
    "max-throughput": 2.31989e+07,
    "min-throughput": 1.14792e+07
   }

> Max ULP error is 0.949 with fma and 0.792 without fma.

I confirm this. More precisely here are the largest errors I find with
corresponding inputs:

Without fma:
hypot 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022 0.791664

With fma:
hypot -0x0.5a22c27a3893p-1022,0x0.9cfea180c00dap-1022 0.948811

(compared to 0.986776 for the current version, with inputs
-0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022).

Best regards,
Paul