Re: [llvm-dev] [PATCH] Add optional _Float16 support - H.J. Lu via Libc-alpha

From: "H.J. Lu via Libc-alpha" <libc-alpha@sourceware.org>
To: "Wang, Pengfei" <pengfei.wang@intel.com>, llvm-dev@lists.llvm.org
Cc: GNU C Library <libc-alpha@sourceware.org>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
	IA32 System V Application Binary Interface
	<ia32-abi@googlegroups.com>,
	Joseph Myers <joseph@codesourcery.com>
Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
Date: Tue, 13 Jul 2021 07:26:02 -0700	[thread overview]
Message-ID: <CAMe9rOppeyatAtdj--hJmiNdinTx7UO7vOAsKVqY5Xv5dfLFMA@mail.gmail.com> (raw)
In-Reply-To: <DM6PR11MB300351B195028A5A5510FADD88149@DM6PR11MB3003.namprd11.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 2348 bytes --]

On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote:
>
> > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e.,
> 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?

Here is the v2 patch to add the missing _Float16 bits.   The PDF file is at

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

> Thanks
> Pengfei
>
> -----Original Message-----
> From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of H.J. Lu via llvm-dev
> Sent: Friday, July 2, 2021 6:28 AM
> To: Joseph Myers <joseph@codesourcery.com>
> Cc: llvm-dev@lists.llvm.org; GCC Patches <gcc-patches@gcc.gnu.org>; GNU C Library <libc-alpha@sourceware.org>; IA32 System V Application Binary Interface <ia32-abi@googlegroups.com>
> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
>
> On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> >
> > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > That restricts use of _Float16 to processors with SSE.  Is that what
> > we want in the ABI, or should _Float16 be available with base 32-bit
> > x86 architecture features only, much like _Float128 and the decimal FP
> > types
>
> Yes, _Float16 requires XMM registers.
>
> > are?  (If it is restricted to SSE, we can of course ensure relevant
> > libgcc functions are built with SSE enabled, and likewise in glibc if
> > that gains
> > _Float16 functions, though maybe with some extra complications to get
> > relevant testcases to run whenever possible.)
> >
>
> _Float16 functions in libgcc should be compiled with SSE enabled.
>
> BTW, _Float16 software emulation may require more than just SSE since we need to do _Float16 load and store with XMM registers.
> There is no 16bit load/store for XMM registers without AVX512FP16.
>
> --
> H.J.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



-- 
H.J.

[-- Attachment #2: v2-0001-Add-optional-_Float16-support.patch --]
[-- Type: text/x-patch, Size: 7316 bytes --]

From b48c361b939ef9216184f1a58a9d5052bbeb7551 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Thu, 1 Jul 2021 13:58:00 -0700
Subject: [PATCH v2] Add optional _Float16 support

1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
---
 low-level-sys-info.tex | 76 ++++++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 22 deletions(-)

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index acaf30e..157509b 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a
 \subsubsection{Fundamental Types}
 
 Table~\ref{basic-types} shows the correspondence between ISO C
-scalar types and the processor scalar types.  \code{__float80},
+scalar types and the processor scalar types.  \code{_Float16},
+\code{__float80},
 \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and
 \code{__m512} types are optional.
 
@@ -79,22 +80,27 @@ scalar types and the processor scalar types.  \code{__float80},
     & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\
     & \texttt{\textit{any-type} (*)()} & & \\
     \hline
-    Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\
     \cline{2-5}
-    point & \texttt{double} & 8 & 4 & double (IEEE-754) \\
-    & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+    & \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\
     \cline{2-5}
-    & \texttt{__float80}$^{\dagger\dagger}$  & 12 & 4 & 80-bit extended (IEEE-754) \\
-    & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+    & \texttt{float} & 4 & 4 & single (IEEE-754) \\
+    \cline{2-5}
+    Floating- & \texttt{double} & 8
+	& 8$^{\dagger\dagger\dagger\dagger}$ & double (IEEE-754) \\
+    \cline{2-5}
+    point & \texttt{__float80}$^{\dagger\dagger}$  & 16 & 16 & 80-bit extended (IEEE-754) \\
+    & \texttt{long double}$^{\dagger\dagger\dagger\dagger\dagger}$  & 16 & 16 & 80-bit extended (IEEE-754) \\
     \cline{2-5}
     & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\
     \hline
-    Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
+    & \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & complex 16-bit (IEEE-754) \\
     \cline{2-5}
-    Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
-    point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+    & \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
     \cline{2-5}
-    & \texttt{_Complex __float80}$^{\dagger\dagger}$  & 24 & 4 & complex 80-bit extended (IEEE-754) \\
+    Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
+    Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+    \cline{2-5}
+    point & \texttt{_Complex __float80}$^{\dagger\dagger}$  & 24 & 4 & complex 80-bit extended (IEEE-754) \\
     & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
     \cline{2-5}
     & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\
@@ -125,6 +131,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double}
 type, on the Android{\texttrademark} platform.  More information on the
 Android{\texttrademark} platform is available from
 \url{http://www.android.com/}.}\\
+\multicolumn{5}{p{13cm}}{\myfontsize $^{\dagger\dagger\dagger\dagger\dagger\dagger}$
+The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\
   \end{tabular}
 }
 \end{table}
@@ -323,6 +331,7 @@ at the time of the call.
 \begin{table}
 \Hrule
   \caption{Register Usage}
+  \myfontsize
   \label{fig-reg-usage}
   \begin{center}
     \begin{tabular}{l|p{8.35cm}|l}
@@ -346,13 +355,29 @@ of some 64bit return types & No \\
 \EBP & callee-saved register; optionally used as frame pointer & Yes \\
 \ESI & callee-saved register & yes \\
 \EDI & callee-saved register & yes \\
-\reg{xmm0}, \reg{ymm0} & scratch registers; also used to pass and return
-\code{__m128}, \code{__m256} parameters & No\\
-\reg{xmm1}--\reg{xmm2},& scratch registers; also used to pass
-\code{__m128}, & No \\
-\reg{ymm1}--\reg{ymm2} & \code{__m256} parameters & \\
-\reg{xmm3}--\reg{xmm7},& scratch registers & No \\
-\reg{ymm3}--\reg{ymm7} & & \\
+\reg{xmm0} & scratch register; also used to pass the first \code{__m128}
+             parameter and return \code{__m128}, \code{_Float16},
+	     the real part of \code{_Complex _Float16} & No \\
+\reg{ymm0} & scratch register; also used to pass the first \code{__m256}
+             parameter and return \code{__m256} & No \\
+\reg{zmm0} & scratch register; also used to pass the first \code{__m512}
+             parameter and return \code{__m512} & No \\
+\reg{xmm1} & scratch register; also used to pass the second \code{__m128}
+             parameter and return the imaginary part of
+	     \code{_Complex _Float16} & No \\
+\reg{ymm1} & scratch register; also used to pass the second \code{__m256}
+             parameters & No \\
+\reg{zmm1} & scratch register; also used to pass the second \code{__m512}
+             parameters & No \\
+\reg{xmm2} & scratch register; also used to pass the third \code{__m128}
+             parameters & No \\
+\reg{ymm2} & scratch register; also used to pass the third \code{__m256}
+             parameters & No \\
+\reg{zmm2} & scratch register; also used to pass the third \code{__m512}
+             parameters & No \\
+\reg{xmm3}--\reg{xmm7} & scratch registers & No \\
+\reg{ymm3}--\reg{ymm7} & scratch registers & No \\
+\reg{zmm3}--\reg{zmm7} & scratch registers & No \\
 \reg{mm0} & scratch register; also used to pass and return
 \code{__m64} parameter & No\\
 \reg{mm1}--\reg{mm2} & used to pass \code{__m64} parameters & No\\
@@ -420,6 +445,8 @@ and \texttt{unions}) are always returned in memory.
     & \texttt{\textit{any-type} *} & \EAX \\
     & \texttt{\textit{any-type} (*)()} & \\
     \hline
+    & \texttt{_Float16} & \reg{xmm0} \\
+    \cline{2-3}
     & \texttt{float} & \reg{st0} \\
     \cline{2-3}
     Floating- & \texttt{double} & \reg{st0} \\
@@ -430,14 +457,19 @@ and \texttt{unions}) are always returned in memory.
     \cline{2-3}
     & \texttt{__float128} & memory \\
     \hline
-    & \texttt{_Complex float} & \EDX:\EAX \\
-    & & The real part is returned in \EAX. The imaginary part is
+    & \texttt{_Complex _Float16} & \reg{xmm0}:\reg{xmm1} \\
+    & & The real part is returned in \reg{xmm0}. The imaginary part is
+        returned \\
+    & & in \reg{xmm1}.\\
+    \cline{2-3}
+    Complex & \texttt{_Complex float} & \EDX:\EAX \\
+    floating- & & The real part is returned in \EAX. The imaginary part is
         returned \\
-    Complex & & in \EDX.\\
+    point & & in \EDX.\\
     \cline{2-3}
-    floating- & \texttt{_Complex double} & memory \\
+    & \texttt{_Complex double} & memory \\
     \cline{2-3}
-    point & \texttt{_Complex long double} & memory \\
+    & \texttt{_Complex long double} & memory \\
     \cline{2-3}
     & \texttt{_Complex __float80} & memory \\
     \cline{2-3}
-- 
2.31.1