From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 01 Dec 2021 12:12:11 +0100
Message-Id: <mwo86037g4.fsf@tomate.loria.fr>
From: Paul Zimmermann <Paul.Zimmermann@inria.fr>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
In-Reply-To: <VE1PR08MB5599DFC6FA620B3B3AFA212B83679@VE1PR08MB5599.eurprd08.prod.outlook.com>
 (message from Wilco Dijkstra via Libc-alpha on Tue, 30 Nov 2021
 13:04:27 +0000)
Subject: Re: [PATCH] Improve hypot performance
References: <VE1PR08MB5599DFC6FA620B3B3AFA212B83679@VE1PR08MB5599.eurprd08.prod.outlook.com>
Cc: libc-alpha@sourceware.org

       Dear Wilco,

> Improve hypot performance significantly by using fma when available. The fma
> version has twice the throughput of the previous version and 70% of the latency.
> The non-fma version has 30% higher throughput and 10% higher latency.

I cannot reproduce these figures. On an x86_64 machine, the fma version gives
almost identical throughput, and the latency only drops to 88% of the previous
value. The non-fma version gives 24% lower throughput (31% larger *reciprocal*
throughput) and 51% higher latency.

With fma:

before:
  "hypot": {
   "workload-random": {
    "duration": 3.31399e+09,
    "iterations": 7.4e+07,
    "reciprocal-throughput": 32.4478,
    "latency": 57.1194,
    "max-throughput": 3.08188e+07,
    "min-throughput": 1.75072e+07
   }

with patch:
  "hypot": {
   "workload-random": {
    "duration": 3.33618e+09,
    "iterations": 8.2e+07,
    "reciprocal-throughput": 31.205,
    "latency": 50.1653,
    "max-throughput": 3.20462e+07,
    "min-throughput": 1.99341e+07
   }

Without fma:

before:
  "hypot": {
   "workload-random": {
    "duration": 3.34724e+09,
    "iterations": 7.4e+07,
    "reciprocal-throughput": 32.9285,
    "latency": 57.5374,
    "max-throughput": 3.03689e+07,
    "min-throughput": 1.738e+07
   }

with patch:
  "hypot": {
   "workload-random": {
    "duration": 3.38571e+09,
    "iterations": 5.2e+07,
    "reciprocal-throughput": 43.1054,
    "latency": 87.1141,
    "max-throughput": 2.31989e+07,
    "min-throughput": 1.14792e+07
   }
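
The percentages quoted above can be recomputed from the four benchmark excerpts. The following is only a small sketch: the dictionary layout and the helper name `ratios` are illustrative choices, not part of the glibc benchtests, and only the numeric values are taken from the output above.

```python
# Figures copied from the benchmark excerpts above.
# The dictionary layout and the helper name are illustrative only.
results = {
    "fma": {
        "before": {"reciprocal-throughput": 32.4478, "latency": 57.1194},
        "patched": {"reciprocal-throughput": 31.205, "latency": 50.1653},
    },
    "no-fma": {
        "before": {"reciprocal-throughput": 32.9285, "latency": 57.5374},
        "patched": {"reciprocal-throughput": 43.1054, "latency": 87.1141},
    },
}


def ratios(before, patched):
    """Return patched/before ratios; throughput is the inverse of reciprocal throughput."""
    rt = patched["reciprocal-throughput"] / before["reciprocal-throughput"]
    return {
        "reciprocal-throughput": rt,
        "throughput": 1.0 / rt,
        "latency": patched["latency"] / before["latency"],
    }


for variant, runs in results.items():
    r = ratios(runs["before"], runs["patched"])
    print(variant, {name: round(value, 2) for name, value in r.items()})
```

A ratio below 1 for latency or reciprocal throughput means the patched version is faster; for throughput it means the patched version is slower.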

> Max ULP error is 0.949 with fma and 0.792 without fma.

I confirm this. More precisely, here are the largest errors I found, with the
corresponding inputs:

Without fma:
hypot 0x0.603e52daf0bfdp-1022,-0x0.a622d0a9a433bp-1022 0.791664

With fma:
hypot -0x0.5a22c27a3893p-1022,0x0.9cfea180c00dap-1022 0.948811

(compared to 0.986776 for the current version, with inputs
-0x0.5a934b7eac967p-1022,-0x0.b5265a7e06b82p-1022).
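
For readers who want to check such a figure, here is a minimal sketch of one way to measure the error in ulps for inputs like the ones above. It is not the tool used to obtain the numbers in this message. It assumes the inputs and the result all lie below 2^-1021, where one ulp is exactly 2^-1074, so everything can be handled as exact integers. The function name `ulp_error_subnormal` is hypothetical.

```python
import math
from fractions import Fraction

# Smallest positive double; also the ulp of every double below 2**-1021.
TINY = Fraction(1, 2**1074)


def ulp_error_subnormal(x, y, computed):
    """Error of `computed` against sqrt(x*x + y*y), in units of 2**-1074.

    Only meaningful when |computed| < 2**-1021, i.e. when its ulp is 2**-1074.
    """
    # Every finite double is an exact integer multiple of 2**-1074.
    xi = Fraction(x) / TINY
    yi = Fraction(y) / TINY
    ci = Fraction(computed) / TINY
    if abs(ci) >= 2**53:
        raise ValueError("result is outside the range where ulp = 2**-1074")
    # Reference value: integer square root with 64 extra fractional bits,
    # so the reference itself is accurate to 2**-64 ulp.
    scale = 1 << 64
    s = xi.numerator**2 + yi.numerator**2
    exact = Fraction(math.isqrt(s * scale * scale), scale)
    return float(abs(ci - exact))


# Example with the "without fma" input quoted above. The value printed
# depends on the hypot implementation behind math.hypot on this platform.
x = float.fromhex("0x0.603e52daf0bfdp-1022")
y = float.fromhex("-0x0.a622d0a9a433bp-1022")
print(ulp_error_subnormal(x, y, math.hypot(x, y)))
```

The value printed is the error of whatever hypot the Python interpreter uses, so it will only match the figures above if that implementation is the one being tested.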

Best regards,
Paul