* [PATCH] Add optional _Float16 support @ 2021-07-01 21:05 H.J. Lu via Libc-alpha 2021-07-01 22:10 ` Joseph Myers 0 siblings, 1 reply; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-07-01 21:05 UTC (permalink / raw) To: ia32-abi; +Cc: llvm-dev, libc-alpha, gcc-patches 1. Pass _Float16 and _Complex _Float16 values on stack. 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. --- low-level-sys-info.tex | 57 +++++++++++++++++++++++++++++------------- 1 file changed, 40 insertions(+), 17 deletions(-) diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index acaf30e..82956e3 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a \subsubsection{Fundamental Types} Table~\ref{basic-types} shows the correspondence between ISO C -scalar types and the processor scalar types. \code{__float80}, +scalar types and the processor scalar types. \code{_Float16}, +\code{__float80}, \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and \code{__m512} types are optional. @@ -79,22 +80,25 @@ scalar types and the processor scalar types. \code{__float80}, & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\ & \texttt{\textit{any-type} (*)()} & & \\ \hline - Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\ \cline{2-5} - point & \texttt{double} & 8 & 4 & double (IEEE-754) \\ - & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\ \cline{2-5} - & \texttt{__float80}$^{\dagger\dagger}$ & 12 & 4 & 80-bit extended (IEEE-754) \\ - & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{float} & 4 & 4 & single (IEEE-754) \\ + \cline{2-5} + Floating- & \texttt{double} & 8 + & 8$^{\dagger\dagger\dagger\dagger}$ & double (IEEE-754) \\ + \cline{2-5} + point & \texttt{__float80}$^{\dagger\dagger}$ & 16 & 16 & 80-bit extended (IEEE-754) \\ + & \texttt{long double}$^{\dagger\dagger\dagger\dagger\dagger}$ & 16 & 16 & 80-bit extended (IEEE-754) \\ \cline{2-5} & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\ \hline - Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ + & \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ \cline{2-5} - Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ - point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ + Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} - & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ + point & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\ @@ -125,6 +129,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double} type, on the Android{\texttrademark} platform. More information on the Android{\texttrademark} platform is available from \url{http://www.android.com/}.}\\ +\multicolumn{5}{p{13cm}}{\myfontsize $^{\dagger\dagger\dagger\dagger\dagger\dagger}$ +The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\ \end{tabular} } \end{table} @@ -323,6 +329,7 @@ at the time of the call. \begin{table} \Hrule \caption{Register Usage} + \myfontsize \label{fig-reg-usage} \begin{center} \begin{tabular}{l|p{8.35cm}|l} @@ -346,13 +353,29 @@ of some 64bit return types & No \\ \EBP & callee-saved register; optionally used as frame pointer & Yes \\ \ESI & callee-saved register & yes \\ \EDI & callee-saved register & yes \\ -\reg{xmm0}, \reg{ymm0} & scratch registers; also used to pass and return -\code{__m128}, \code{__m256} parameters & No\\ -\reg{xmm1}--\reg{xmm2},& scratch registers; also used to pass -\code{__m128}, & No \\ -\reg{ymm1}--\reg{ymm2} & \code{__m256} parameters & \\ -\reg{xmm3}--\reg{xmm7},& scratch registers & No \\ -\reg{ymm3}--\reg{ymm7} & & \\ +\reg{xmm0} & scratch register; also used to pass the first \code{__m128} + parameter and return \code{__m128}, \code{_Float16}, + the real part of \code{_Complex _Float16} & No \\ +\reg{ymm0} & scratch register; also used to pass the first \code{__m256} + parameter and return \code{__m256} & No \\ +\reg{zmm0} & scratch register; also used to pass the first \code{__m512} + parameter and return \code{__m512} & No \\ +\reg{xmm1} & scratch register; also used to pass the second \code{__m128} + parameter and return the imaginary part of + \code{_Complex _Float16} & No \\ +\reg{ymm1} & scratch register; also used to pass the second \code{__m256} + parameters & No \\ +\reg{zmm1} & scratch register; also used to pass the second \code{__m512} + parameters & No \\ +\reg{xmm2} & scratch register; also used to pass the third \code{__m128} + parameters & No \\ +\reg{ymm2} & scratch register; also used to pass the third \code{__m256} + parameters & No \\ +\reg{zmm2} & scratch register; also used to pass the third \code{__m512} + parameters & No \\ +\reg{xmm3}--\reg{xmm7} & scratch registers & No \\ +\reg{ymm3}--\reg{ymm7} & scratch registers & No \\ +\reg{zmm3}--\reg{zmm7} & scratch registers & No \\ \reg{mm0} & scratch register; also used to pass and return \code{__m64} parameter & No\\ \reg{mm1}--\reg{mm2} & used to pass \code{__m64} parameters & No\\ -- 2.31.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] Add optional _Float16 support 2021-07-01 21:05 [PATCH] Add optional _Float16 support H.J. Lu via Libc-alpha @ 2021-07-01 22:10 ` Joseph Myers 2021-07-01 22:27 ` H.J. Lu via Libc-alpha 0 siblings, 1 reply; 20+ messages in thread From: Joseph Myers @ 2021-07-01 22:10 UTC (permalink / raw) To: H.J. Lu; +Cc: llvm-dev, gcc-patches, libc-alpha, ia32-abi On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. That restricts use of _Float16 to processors with SSE. Is that what we want in the ABI, or should _Float16 be available with base 32-bit x86 architecture features only, much like _Float128 and the decimal FP types are? (If it is restricted to SSE, we can of course ensure relevant libgcc functions are built with SSE enabled, and likewise in glibc if that gains _Float16 functions, though maybe with some extra complications to get relevant testcases to run whenever possible.) -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Add optional _Float16 support 2021-07-01 22:10 ` Joseph Myers @ 2021-07-01 22:27 ` H.J. Lu via Libc-alpha 2021-07-01 22:40 ` Joseph Myers ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-07-01 22:27 UTC (permalink / raw) To: Joseph Myers Cc: llvm-dev, GCC Patches, GNU C Library, IA32 System V Application Binary Interface On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> wrote: > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > That restricts use of _Float16 to processors with SSE. Is that what we > want in the ABI, or should _Float16 be available with base 32-bit x86 > architecture features only, much like _Float128 and the decimal FP types Yes, _Float16 requires XMM registers. > are? (If it is restricted to SSE, we can of course ensure relevant libgcc > functions are built with SSE enabled, and likewise in glibc if that gains > _Float16 functions, though maybe with some extra complications to get > relevant testcases to run whenever possible.) > _Float16 functions in libgcc should be compiled with SSE enabled. BTW, _Float16 software emulation may require more than just SSE since we need to do _Float16 load and store with XMM registers. There is no 16bit load/store for XMM registers without AVX512FP16. -- H.J. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Add optional _Float16 support 2021-07-01 22:27 ` H.J. Lu via Libc-alpha @ 2021-07-01 22:40 ` Joseph Myers 2021-07-01 23:01 ` H.J. Lu via Libc-alpha 2021-07-01 23:33 ` Jacob Lifshay via Libc-alpha 2021-07-13 3:59 ` Wang, Pengfei via Libc-alpha 2 siblings, 1 reply; 20+ messages in thread From: Joseph Myers @ 2021-07-01 22:40 UTC (permalink / raw) To: H.J. Lu Cc: llvm-dev, GCC Patches, GNU C Library, IA32 System V Application Binary Interface On Thu, 1 Jul 2021, H.J. Lu wrote: > BTW, _Float16 software emulation may require more than just SSE > since we need to do _Float16 load and store with XMM registers. > There is no 16bit load/store for XMM registers without AVX512FP16. You should be able to make the move go via general-purpose registers (for example) if you can't do a direct 16-bit load/store for XMM registers. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Add optional _Float16 support 2021-07-01 22:40 ` Joseph Myers @ 2021-07-01 23:01 ` H.J. Lu via Libc-alpha 2021-07-01 23:05 ` [llvm-dev] " Craig Topper via Libc-alpha 0 siblings, 1 reply; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-07-01 23:01 UTC (permalink / raw) To: Joseph Myers Cc: llvm-dev, GCC Patches, GNU C Library, IA32 System V Application Binary Interface On Thu, Jul 1, 2021 at 3:40 PM Joseph Myers <joseph@codesourcery.com> wrote: > > On Thu, 1 Jul 2021, H.J. Lu wrote: > > > BTW, _Float16 software emulation may require more than just SSE > > since we need to do _Float16 load and store with XMM registers. > > There is no 16bit load/store for XMM registers without AVX512FP16. > > You should be able to make the move go via general-purpose registers (for > example) if you can't do a direct 16-bit load/store for XMM registers. > There is no 16bit move between GPRs and XMM registers without AVX512FP16. -- H.J. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-01 23:01 ` H.J. Lu via Libc-alpha @ 2021-07-01 23:05 ` Craig Topper via Libc-alpha 0 siblings, 0 replies; 20+ messages in thread From: Craig Topper via Libc-alpha @ 2021-07-01 23:05 UTC (permalink / raw) To: H.J. Lu Cc: llvm-dev, GNU C Library, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers On Thu, Jul 1, 2021 at 4:02 PM H.J. Lu via llvm-dev <llvm-dev@lists.llvm.org> wrote: > On Thu, Jul 1, 2021 at 3:40 PM Joseph Myers <joseph@codesourcery.com> > wrote: > > > > On Thu, 1 Jul 2021, H.J. Lu wrote: > > > > > BTW, _Float16 software emulation may require more than just SSE > > > since we need to do _Float16 load and store with XMM registers. > > > There is no 16bit load/store for XMM registers without AVX512FP16. > > > > You should be able to make the move go via general-purpose registers (for > > example) if you can't do a direct 16-bit load/store for XMM registers. > > > > There is no 16bit move between GPRs and XMM registers without > AVX512FP16. > > Isn't PINSRW supported since SSE1? > > -- > H.J. > _______________________________________________ > LLVM Developers mailing list > llvm-dev@lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-01 22:27 ` H.J. Lu via Libc-alpha 2021-07-01 22:40 ` Joseph Myers @ 2021-07-01 23:33 ` Jacob Lifshay via Libc-alpha 2021-07-02 7:45 ` Richard Biener via Libc-alpha 2021-07-13 3:59 ` Wang, Pengfei via Libc-alpha 2 siblings, 1 reply; 20+ messages in thread From: Jacob Lifshay via Libc-alpha @ 2021-07-01 23:33 UTC (permalink / raw) To: H.J. Lu Cc: llvm-dev, GNU C Library, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers On Thu, Jul 1, 2021, 15:28 H.J. Lu via llvm-dev <llvm-dev@lists.llvm.org> wrote: > On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> > wrote: > > > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 > registers. > > > > That restricts use of _Float16 to processors with SSE. Is that what we > > want in the ABI, or should _Float16 be available with base 32-bit x86 > > architecture features only, much like _Float128 and the decimal FP types > > Yes, _Float16 requires XMM registers. > > > are? (If it is restricted to SSE, we can of course ensure relevant > libgcc > > functions are built with SSE enabled, and likewise in glibc if that gains > > _Float16 functions, though maybe with some extra complications to get > > relevant testcases to run whenever possible.) > > > > _Float16 functions in libgcc should be compiled with SSE enabled. > > BTW, _Float16 software emulation may require more than just SSE > since we need to do _Float16 load and store with XMM registers. > There is no 16bit load/store for XMM registers without AVX512FP16. > Umm, if you just need to load/store 16-bit scalars in XMM registers you can use pextrw and pinsrw which don't require AVX. f16x8 can use any of the standard full-register load/stores. https://gcc.godbolt.org/z/ncznr9TM1 Jacob ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-01 23:33 ` Jacob Lifshay via Libc-alpha @ 2021-07-02 7:45 ` Richard Biener via Libc-alpha 2021-07-02 8:03 ` Hongtao Liu via Libc-alpha 2021-07-02 9:21 ` Jakub Jelinek via Libc-alpha 0 siblings, 2 replies; 20+ messages in thread From: Richard Biener via Libc-alpha @ 2021-07-02 7:45 UTC (permalink / raw) To: Jacob Lifshay Cc: GNU C Library, llvm-dev, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers On Fri, Jul 2, 2021 at 1:34 AM Jacob Lifshay via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Thu, Jul 1, 2021, 15:28 H.J. Lu via llvm-dev <llvm-dev@lists.llvm.org> > wrote: > > > On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> > > wrote: > > > > > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > > > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 > > registers. > > > > > > That restricts use of _Float16 to processors with SSE. Is that what we > > > want in the ABI, or should _Float16 be available with base 32-bit x86 > > > architecture features only, much like _Float128 and the decimal FP types > > > > Yes, _Float16 requires XMM registers. > > > > > are? (If it is restricted to SSE, we can of course ensure relevant > > libgcc > > > functions are built with SSE enabled, and likewise in glibc if that gains > > > _Float16 functions, though maybe with some extra complications to get > > > relevant testcases to run whenever possible.) > > > > > > > _Float16 functions in libgcc should be compiled with SSE enabled. > > > > BTW, _Float16 software emulation may require more than just SSE > > since we need to do _Float16 load and store with XMM registers. > > There is no 16bit load/store for XMM registers without AVX512FP16. > > > > Umm, if you just need to load/store 16-bit scalars in XMM registers you can > use pextrw and pinsrw which don't require AVX. f16x8 can use any of the > standard full-register load/stores. It looks like that requires SSE2, with SSE only inserts/extracts to/from MMX regs are supported. But of course GPR half-word loads and GPR->XMM moves of full size would work. > https://gcc.godbolt.org/z/ncznr9TM1 > > Jacob ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-02 7:45 ` Richard Biener via Libc-alpha @ 2021-07-02 8:03 ` Hongtao Liu via Libc-alpha 2021-07-02 9:21 ` Jakub Jelinek via Libc-alpha 1 sibling, 0 replies; 20+ messages in thread From: Hongtao Liu via Libc-alpha @ 2021-07-02 8:03 UTC (permalink / raw) To: Richard Biener Cc: GNU C Library, Jacob Lifshay, llvm-dev, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers On Fri, Jul 2, 2021 at 3:46 PM Richard Biener via llvm-dev <llvm-dev@lists.llvm.org> wrote: > > On Fri, Jul 2, 2021 at 1:34 AM Jacob Lifshay via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > On Thu, Jul 1, 2021, 15:28 H.J. Lu via llvm-dev <llvm-dev@lists.llvm.org> > > wrote: > > > > > On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> > > > wrote: > > > > > > > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > > > > > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 > > > registers. > > > > > > > > That restricts use of _Float16 to processors with SSE. Is that what we > > > > want in the ABI, or should _Float16 be available with base 32-bit x86 > > > > architecture features only, much like _Float128 and the decimal FP types > > > > > > Yes, _Float16 requires XMM registers. > > > > > > > are? (If it is restricted to SSE, we can of course ensure relevant > > > libgcc > > > > functions are built with SSE enabled, and likewise in glibc if that gains > > > > _Float16 functions, though maybe with some extra complications to get > > > > relevant testcases to run whenever possible.) > > > > > > > > > > _Float16 functions in libgcc should be compiled with SSE enabled. > > > > > > BTW, _Float16 software emulation may require more than just SSE > > > since we need to do _Float16 load and store with XMM registers. > > > There is no 16bit load/store for XMM registers without AVX512FP16. > > > > > > > Umm, if you just need to load/store 16-bit scalars in XMM registers you can > > use pextrw and pinsrw which don't require AVX. f16x8 can use any of the > > standard full-register load/stores. > > It looks like that requires SSE2, with SSE only inserts/extracts > to/from MMX regs > are supported. But of course GPR half-word loads and GPR->XMM moves of > full size would work. movd between sse registers and gpr also required sse2. > > > https://gcc.godbolt.org/z/ncznr9TM1 > > > > Jacob > _______________________________________________ > LLVM Developers mailing list > llvm-dev@lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -- BR, Hongtao ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-02 7:45 ` Richard Biener via Libc-alpha 2021-07-02 8:03 ` Hongtao Liu via Libc-alpha @ 2021-07-02 9:21 ` Jakub Jelinek via Libc-alpha 1 sibling, 0 replies; 20+ messages in thread From: Jakub Jelinek via Libc-alpha @ 2021-07-02 9:21 UTC (permalink / raw) To: Richard Biener Cc: GNU C Library, Jacob Lifshay, llvm-dev, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers On Fri, Jul 02, 2021 at 09:45:46AM +0200, Richard Biener via Gcc-patches wrote: > > > > are? (If it is restricted to SSE, we can of course ensure relevant > > > libgcc > > > > functions are built with SSE enabled, and likewise in glibc if that gains > > > > _Float16 functions, though maybe with some extra complications to get > > > > relevant testcases to run whenever possible.) > > > > > > > > > > _Float16 functions in libgcc should be compiled with SSE enabled. > > > > > > BTW, _Float16 software emulation may require more than just SSE > > > since we need to do _Float16 load and store with XMM registers. > > > There is no 16bit load/store for XMM registers without AVX512FP16. > > > > > > > Umm, if you just need to load/store 16-bit scalars in XMM registers you can > > use pextrw and pinsrw which don't require AVX. f16x8 can use any of the > > standard full-register load/stores. > > It looks like that requires SSE2, with SSE only inserts/extracts > to/from MMX regs > are supported. But of course GPR half-word loads and GPR->XMM moves of > full size would work. Loads can be done in SSE2 directly with PINSRW, that supports 16-bit load from memory to XMM reg. But SSE2 PEXTRW only supports stores into GPR and one needs SSE4.1 fo PEXTRW into memory. So, for the stores and SSE2 one needs secondary reload... Jakub ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-01 22:27 ` H.J. Lu via Libc-alpha 2021-07-01 22:40 ` Joseph Myers 2021-07-01 23:33 ` Jacob Lifshay via Libc-alpha @ 2021-07-13 3:59 ` Wang, Pengfei via Libc-alpha 2021-07-13 14:26 ` H.J. Lu via Libc-alpha 2 siblings, 1 reply; 20+ messages in thread From: Wang, Pengfei via Libc-alpha @ 2021-07-13 3:59 UTC (permalink / raw) To: H.J. Lu, Joseph Myers Cc: GNU C Library, GCC Patches, IA32 System V Application Binary Interface > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e., 1, In which case will _Float16 values return in both %xmm0 and %xmm1? 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? Thanks Pengfei -----Original Message----- From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of H.J. Lu via llvm-dev Sent: Friday, July 2, 2021 6:28 AM To: Joseph Myers <joseph@codesourcery.com> Cc: llvm-dev@lists.llvm.org; GCC Patches <gcc-patches@gcc.gnu.org>; GNU C Library <libc-alpha@sourceware.org>; IA32 System V Application Binary Interface <ia32-abi@googlegroups.com> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> wrote: > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > That restricts use of _Float16 to processors with SSE. Is that what > we want in the ABI, or should _Float16 be available with base 32-bit > x86 architecture features only, much like _Float128 and the decimal FP > types Yes, _Float16 requires XMM registers. > are? (If it is restricted to SSE, we can of course ensure relevant > libgcc functions are built with SSE enabled, and likewise in glibc if > that gains > _Float16 functions, though maybe with some extra complications to get > relevant testcases to run whenever possible.) > _Float16 functions in libgcc should be compiled with SSE enabled. BTW, _Float16 software emulation may require more than just SSE since we need to do _Float16 load and store with XMM registers. There is no 16bit load/store for XMM registers without AVX512FP16. -- H.J. _______________________________________________ LLVM Developers mailing list llvm-dev@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-13 3:59 ` Wang, Pengfei via Libc-alpha @ 2021-07-13 14:26 ` H.J. Lu via Libc-alpha 2021-07-13 14:48 ` Wang, Pengfei via Libc-alpha 2021-07-13 15:41 ` Joseph Myers 0 siblings, 2 replies; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-07-13 14:26 UTC (permalink / raw) To: Wang, Pengfei, llvm-dev Cc: GNU C Library, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers [-- Attachment #1: Type: text/plain, Size: 2348 bytes --] On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote: > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e., > 1, In which case will _Float16 values return in both %xmm0 and %xmm1? > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? Here is the v2 patch to add the missing _Float16 bits. The PDF file is at https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > Thanks > Pengfei > > -----Original Message----- > From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of H.J. Lu via llvm-dev > Sent: Friday, July 2, 2021 6:28 AM > To: Joseph Myers <joseph@codesourcery.com> > Cc: llvm-dev@lists.llvm.org; GCC Patches <gcc-patches@gcc.gnu.org>; GNU C Library <libc-alpha@sourceware.org>; IA32 System V Application Binary Interface <ia32-abi@googlegroups.com> > Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support > > On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> wrote: > > > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > > > That restricts use of _Float16 to processors with SSE. Is that what > > we want in the ABI, or should _Float16 be available with base 32-bit > > x86 architecture features only, much like _Float128 and the decimal FP > > types > > Yes, _Float16 requires XMM registers. > > > are? (If it is restricted to SSE, we can of course ensure relevant > > libgcc functions are built with SSE enabled, and likewise in glibc if > > that gains > > _Float16 functions, though maybe with some extra complications to get > > relevant testcases to run whenever possible.) > > > > _Float16 functions in libgcc should be compiled with SSE enabled. > > BTW, _Float16 software emulation may require more than just SSE since we need to do _Float16 load and store with XMM registers. > There is no 16bit load/store for XMM registers without AVX512FP16. > > -- > H.J. > _______________________________________________ > LLVM Developers mailing list > llvm-dev@lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -- H.J. [-- Attachment #2: v2-0001-Add-optional-_Float16-support.patch --] [-- Type: text/x-patch, Size: 7316 bytes --] From b48c361b939ef9216184f1a58a9d5052bbeb7551 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Thu, 1 Jul 2021 13:58:00 -0700 Subject: [PATCH v2] Add optional _Float16 support 1. Pass _Float16 and _Complex _Float16 values on stack. 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. --- low-level-sys-info.tex | 76 ++++++++++++++++++++++++++++++------------ 1 file changed, 54 insertions(+), 22 deletions(-) diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index acaf30e..157509b 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a \subsubsection{Fundamental Types} Table~\ref{basic-types} shows the correspondence between ISO C -scalar types and the processor scalar types. \code{__float80}, +scalar types and the processor scalar types. \code{_Float16}, +\code{__float80}, \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and \code{__m512} types are optional. @@ -79,22 +80,27 @@ scalar types and the processor scalar types. \code{__float80}, & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\ & \texttt{\textit{any-type} (*)()} & & \\ \hline - Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\ \cline{2-5} - point & \texttt{double} & 8 & 4 & double (IEEE-754) \\ - & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\ \cline{2-5} - & \texttt{__float80}$^{\dagger\dagger}$ & 12 & 4 & 80-bit extended (IEEE-754) \\ - & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{float} & 4 & 4 & single (IEEE-754) \\ + \cline{2-5} + Floating- & \texttt{double} & 8 + & 8$^{\dagger\dagger\dagger\dagger}$ & double (IEEE-754) \\ + \cline{2-5} + point & \texttt{__float80}$^{\dagger\dagger}$ & 16 & 16 & 80-bit extended (IEEE-754) \\ + & \texttt{long double}$^{\dagger\dagger\dagger\dagger\dagger}$ & 16 & 16 & 80-bit extended (IEEE-754) \\ \cline{2-5} & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\ \hline - Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ + & \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & complex 16-bit (IEEE-754) \\ \cline{2-5} - Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ - point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ \cline{2-5} - & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ + Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ + Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + \cline{2-5} + point & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\ @@ -125,6 +131,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double} type, on the Android{\texttrademark} platform. More information on the Android{\texttrademark} platform is available from \url{http://www.android.com/}.}\\ +\multicolumn{5}{p{13cm}}{\myfontsize $^{\dagger\dagger\dagger\dagger\dagger\dagger}$ +The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\ \end{tabular} } \end{table} @@ -323,6 +331,7 @@ at the time of the call. \begin{table} \Hrule \caption{Register Usage} + \myfontsize \label{fig-reg-usage} \begin{center} \begin{tabular}{l|p{8.35cm}|l} @@ -346,13 +355,29 @@ of some 64bit return types & No \\ \EBP & callee-saved register; optionally used as frame pointer & Yes \\ \ESI & callee-saved register & yes \\ \EDI & callee-saved register & yes \\ -\reg{xmm0}, \reg{ymm0} & scratch registers; also used to pass and return -\code{__m128}, \code{__m256} parameters & No\\ -\reg{xmm1}--\reg{xmm2},& scratch registers; also used to pass -\code{__m128}, & No \\ -\reg{ymm1}--\reg{ymm2} & \code{__m256} parameters & \\ -\reg{xmm3}--\reg{xmm7},& scratch registers & No \\ -\reg{ymm3}--\reg{ymm7} & & \\ +\reg{xmm0} & scratch register; also used to pass the first \code{__m128} + parameter and return \code{__m128}, \code{_Float16}, + the real part of \code{_Complex _Float16} & No \\ +\reg{ymm0} & scratch register; also used to pass the first \code{__m256} + parameter and return \code{__m256} & No \\ +\reg{zmm0} & scratch register; also used to pass the first \code{__m512} + parameter and return \code{__m512} & No \\ +\reg{xmm1} & scratch register; also used to pass the second \code{__m128} + parameter and return the imaginary part of + \code{_Complex _Float16} & No \\ +\reg{ymm1} & scratch register; also used to pass the second \code{__m256} + parameters & No \\ +\reg{zmm1} & scratch register; also used to pass the second \code{__m512} + parameters & No \\ +\reg{xmm2} & scratch register; also used to pass the third \code{__m128} + parameters & No \\ +\reg{ymm2} & scratch register; also used to pass the third \code{__m256} + parameters & No \\ +\reg{zmm2} & scratch register; also used to pass the third \code{__m512} + parameters & No \\ +\reg{xmm3}--\reg{xmm7} & scratch registers & No \\ +\reg{ymm3}--\reg{ymm7} & scratch registers & No \\ +\reg{zmm3}--\reg{zmm7} & scratch registers & No \\ \reg{mm0} & scratch register; also used to pass and return \code{__m64} parameter & No\\ \reg{mm1}--\reg{mm2} & used to pass \code{__m64} parameters & No\\ @@ -420,6 +445,8 @@ and \texttt{unions}) are always returned in memory. & \texttt{\textit{any-type} *} & \EAX \\ & \texttt{\textit{any-type} (*)()} & \\ \hline + & \texttt{_Float16} & \reg{xmm0} \\ + \cline{2-3} & \texttt{float} & \reg{st0} \\ \cline{2-3} Floating- & \texttt{double} & \reg{st0} \\ @@ -430,14 +457,19 @@ and \texttt{unions}) are always returned in memory. \cline{2-3} & \texttt{__float128} & memory \\ \hline - & \texttt{_Complex float} & \EDX:\EAX \\ - & & The real part is returned in \EAX. The imaginary part is + & \texttt{_Complex _Float16} & \reg{xmm0}:\reg{xmm1} \\ + & & The real part is returned in \reg{xmm0}. The imaginary part is + returned \\ + & & in \reg{xmm1}.\\ + \cline{2-3} + Complex & \texttt{_Complex float} & \EDX:\EAX \\ + floating- & & The real part is returned in \EAX. The imaginary part is returned \\ - Complex & & in \EDX.\\ + point & & in \EDX.\\ \cline{2-3} - floating- & \texttt{_Complex double} & memory \\ + & \texttt{_Complex double} & memory \\ \cline{2-3} - point & \texttt{_Complex long double} & memory \\ + & \texttt{_Complex long double} & memory \\ \cline{2-3} & \texttt{_Complex __float80} & memory \\ \cline{2-3} -- 2.31.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* RE: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-13 14:26 ` H.J. Lu via Libc-alpha @ 2021-07-13 14:48 ` Wang, Pengfei via Libc-alpha 2021-07-13 15:04 ` H.J. Lu via Libc-alpha 2021-07-13 15:41 ` Joseph Myers 1 sibling, 1 reply; 20+ messages in thread From: Wang, Pengfei via Libc-alpha @ 2021-07-13 14:48 UTC (permalink / raw) To: H.J. Lu, llvm-dev@lists.llvm.org Cc: GNU C Library, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers Hi H.J., Our LLVM implementation currently use %xmm0 for both _Complex's real part and imaginary part. Do we have special reason to use two registers? We are using one register on X64. Considering the performance, especially the register pressure, should it be better to use one register for _Complex _Float16 on 32 bits target? Thanks Pengfei -----Original Message----- From: H.J. Lu <hjl.tools@gmail.com> Sent: Tuesday, July 13, 2021 10:26 PM To: Wang, Pengfei <pengfei.wang@intel.com>; llvm-dev@lists.llvm.org Cc: Joseph Myers <joseph@codesourcery.com>; GCC Patches <gcc-patches@gcc.gnu.org>; GNU C Library <libc-alpha@sourceware.org>; IA32 System V Application Binary Interface <ia32-abi@googlegroups.com> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote: > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > Can you please explain the behavior here? Is there difference between > _Float16 and _Complex _Float16 when return? I.e., 1, In which case will _Float16 values return in both %xmm0 and %xmm1? > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? Here is the v2 patch to add the missing _Float16 bits. The PDF file is at https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > Thanks > Pengfei > > -----Original Message----- > From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of H.J. Lu > via llvm-dev > Sent: Friday, July 2, 2021 6:28 AM > To: Joseph Myers <joseph@codesourcery.com> > Cc: llvm-dev@lists.llvm.org; GCC Patches <gcc-patches@gcc.gnu.org>; > GNU C Library <libc-alpha@sourceware.org>; IA32 System V Application > Binary Interface <ia32-abi@googlegroups.com> > Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support > > On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> wrote: > > > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > > > That restricts use of _Float16 to processors with SSE. Is that what > > we want in the ABI, or should _Float16 be available with base 32-bit > > x86 architecture features only, much like _Float128 and the decimal > > FP types > > Yes, _Float16 requires XMM registers. > > > are? (If it is restricted to SSE, we can of course ensure relevant > > libgcc functions are built with SSE enabled, and likewise in glibc > > if that gains > > _Float16 functions, though maybe with some extra complications to > > get relevant testcases to run whenever possible.) > > > > _Float16 functions in libgcc should be compiled with SSE enabled. > > BTW, _Float16 software emulation may require more than just SSE since we need to do _Float16 load and store with XMM registers. > There is no 16bit load/store for XMM registers without AVX512FP16. > > -- > H.J. > _______________________________________________ > LLVM Developers mailing list > llvm-dev@lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -- H.J. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-13 14:48 ` Wang, Pengfei via Libc-alpha @ 2021-07-13 15:04 ` H.J. Lu via Libc-alpha 0 siblings, 0 replies; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-07-13 15:04 UTC (permalink / raw) To: Wang, Pengfei Cc: llvm-dev@lists.llvm.org, GNU C Library, GCC Patches, IA32 System V Application Binary Interface, Joseph Myers On Tue, Jul 13, 2021 at 7:48 AM Wang, Pengfei <pengfei.wang@intel.com> wrote: > > Hi H.J., > > Our LLVM implementation currently use %xmm0 for both _Complex's real part and imaginary part. Do we have special reason to use two registers? > We are using one register on X64. Considering the performance, especially the register pressure, should it be better to use one register for _Complex _Float16 on 32 bits target? x86-64 psABI is unrelated to i386 psABI. Using a pair of registers is more natural for complex _Float16. Since it is only used for function return value, I don't think there is a register pressure issue. > Thanks > Pengfei > > -----Original Message----- > From: H.J. Lu <hjl.tools@gmail.com> > Sent: Tuesday, July 13, 2021 10:26 PM > To: Wang, Pengfei <pengfei.wang@intel.com>; llvm-dev@lists.llvm.org > Cc: Joseph Myers <joseph@codesourcery.com>; GCC Patches <gcc-patches@gcc.gnu.org>; GNU C Library <libc-alpha@sourceware.org>; IA32 System V Application Binary Interface <ia32-abi@googlegroups.com> > Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote: > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > > > Can you please explain the behavior here? Is there difference between > > _Float16 and _Complex _Float16 when return? I.e., 1, In which case will _Float16 values return in both %xmm0 and %xmm1? > > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? > > Here is the v2 patch to add the missing _Float16 bits. The PDF file is at > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > > Thanks > > Pengfei > > > > -----Original Message----- > > From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of H.J. Lu > > via llvm-dev > > Sent: Friday, July 2, 2021 6:28 AM > > To: Joseph Myers <joseph@codesourcery.com> > > Cc: llvm-dev@lists.llvm.org; GCC Patches <gcc-patches@gcc.gnu.org>; > > GNU C Library <libc-alpha@sourceware.org>; IA32 System V Application > > Binary Interface <ia32-abi@googlegroups.com> > > Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support > > > > On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <joseph@codesourcery.com> wrote: > > > > > > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote: > > > > > > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > > > > > That restricts use of _Float16 to processors with SSE. Is that what > > > we want in the ABI, or should _Float16 be available with base 32-bit > > > x86 architecture features only, much like _Float128 and the decimal > > > FP types > > > > Yes, _Float16 requires XMM registers. > > > > > are? (If it is restricted to SSE, we can of course ensure relevant > > > libgcc functions are built with SSE enabled, and likewise in glibc > > > if that gains > > > _Float16 functions, though maybe with some extra complications to > > > get relevant testcases to run whenever possible.) > > > > > > > _Float16 functions in libgcc should be compiled with SSE enabled. > > > > BTW, _Float16 software emulation may require more than just SSE since we need to do _Float16 load and store with XMM registers. > > There is no 16bit load/store for XMM registers without AVX512FP16. > > > > -- > > H.J. > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev@lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > > -- > H.J. -- H.J. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-13 14:26 ` H.J. Lu via Libc-alpha 2021-07-13 14:48 ` Wang, Pengfei via Libc-alpha @ 2021-07-13 15:41 ` Joseph Myers 2021-07-13 16:24 ` H.J. Lu via Libc-alpha 1 sibling, 1 reply; 20+ messages in thread From: Joseph Myers @ 2021-07-13 15:41 UTC (permalink / raw) To: IA32 System V Application Binary Interface Cc: Wang, Pengfei, llvm-dev, GNU C Library, GCC Patches On Tue, 13 Jul 2021, H.J. Lu wrote: > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote: > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > > > Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e., > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1? > > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? > > Here is the v2 patch to add the missing _Float16 bits. The PDF file is at > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI This PDF shows _Complex _Float16 as having a size of 2 bytes (should be 4-byte size, 2-byte alignment). It also seems to change double from 4-byte to 8-byte alignment, which is wrong. And it's inconsistent about whether it covers the long double = double (Android) case - it shows that case for _Complex long double but not for long double itself. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-13 15:41 ` Joseph Myers @ 2021-07-13 16:24 ` H.J. Lu via Libc-alpha 2021-07-29 13:39 ` H.J. Lu via Libc-alpha 0 siblings, 1 reply; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-07-13 16:24 UTC (permalink / raw) To: IA32 System V Application Binary Interface Cc: Wang, Pengfei, llvm-dev, GNU C Library, GCC Patches [-- Attachment #1: Type: text/plain, Size: 1285 bytes --] On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers <joseph@codesourcery.com> wrote: > > On Tue, 13 Jul 2021, H.J. Lu wrote: > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote: > > > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > > > > > Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e., > > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1? > > > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? > > > > Here is the v2 patch to add the missing _Float16 bits. The PDF file is at > > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > This PDF shows _Complex _Float16 as having a size of 2 bytes (should be > 4-byte size, 2-byte alignment). > > It also seems to change double from 4-byte to 8-byte alignment, which is > wrong. And it's inconsistent about whether it covers the long double = > double (Android) case - it shows that case for _Complex long double but > not for long double itself. Here is the v3 patch with the fixes. I also updated the PDF file. > -- > Joseph S. Myers > joseph@codesourcery.com > -- H.J. [-- Attachment #2: v3-0001-Add-optional-_Float16-support.patch --] [-- Type: text/x-patch, Size: 7346 bytes --] From a02a11ef0ea066cab57eb66ef392b21d243d2734 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Thu, 1 Jul 2021 13:58:00 -0700 Subject: [PATCH v3] Add optional _Float16 support 1. Pass _Float16 and _Complex _Float16 values on stack. 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. --- low-level-sys-info.tex | 76 ++++++++++++++++++++++++++++++------------ 1 file changed, 54 insertions(+), 22 deletions(-) diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index acaf30e..9ae7995 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a \subsubsection{Fundamental Types} Table~\ref{basic-types} shows the correspondence between ISO C -scalar types and the processor scalar types. \code{__float80}, +scalar types and the processor scalar types. \code{_Float16}, +\code{__float80}, \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and \code{__m512} types are optional. @@ -79,23 +80,28 @@ scalar types and the processor scalar types. \code{__float80}, & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\ & \texttt{\textit{any-type} (*)()} & & \\ \hline - Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\ \cline{2-5} - point & \texttt{double} & 8 & 4 & double (IEEE-754) \\ - & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\ + \cline{2-5} + & \texttt{float} & 4 & 4 & single (IEEE-754) \\ + \cline{2-5} + Floating- & \texttt{double} & 8 & 4 & double (IEEE-754) \\ + point & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & 8 & 4 & double (IEEE-754) \\ \cline{2-5} & \texttt{__float80}$^{\dagger\dagger}$ & 12 & 4 & 80-bit extended (IEEE-754) \\ - & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & 12 & 4 & 80-bit extended (IEEE-754) \\ \cline{2-5} & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\ \hline - Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ + & \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger}$ & 4 & 2 & complex 16-bit (IEEE-754) \\ \cline{2-5} - Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ - point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ \cline{2-5} - & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ - & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ + Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + \cline{2-5} + point & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ + & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\ \hline @@ -125,6 +131,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double} type, on the Android{\texttrademark} platform. More information on the Android{\texttrademark} platform is available from \url{http://www.android.com/}.}\\ +\multicolumn{5}{p{13cm}}{\myfontsize $^{\dagger\dagger\dagger\dagger\dagger}$ +The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\ \end{tabular} } \end{table} @@ -323,6 +331,7 @@ at the time of the call. \begin{table} \Hrule \caption{Register Usage} + \myfontsize \label{fig-reg-usage} \begin{center} \begin{tabular}{l|p{8.35cm}|l} @@ -346,13 +355,29 @@ of some 64bit return types & No \\ \EBP & callee-saved register; optionally used as frame pointer & Yes \\ \ESI & callee-saved register & yes \\ \EDI & callee-saved register & yes \\ -\reg{xmm0}, \reg{ymm0} & scratch registers; also used to pass and return -\code{__m128}, \code{__m256} parameters & No\\ -\reg{xmm1}--\reg{xmm2},& scratch registers; also used to pass -\code{__m128}, & No \\ -\reg{ymm1}--\reg{ymm2} & \code{__m256} parameters & \\ -\reg{xmm3}--\reg{xmm7},& scratch registers & No \\ -\reg{ymm3}--\reg{ymm7} & & \\ +\reg{xmm0} & scratch register; also used to pass the first \code{__m128} + parameter and return \code{__m128}, \code{_Float16}, + the real part of \code{_Complex _Float16} & No \\ +\reg{ymm0} & scratch register; also used to pass the first \code{__m256} + parameter and return \code{__m256} & No \\ +\reg{zmm0} & scratch register; also used to pass the first \code{__m512} + parameter and return \code{__m512} & No \\ +\reg{xmm1} & scratch register; also used to pass the second \code{__m128} + parameter and return the imaginary part of + \code{_Complex _Float16} & No \\ +\reg{ymm1} & scratch register; also used to pass the second \code{__m256} + parameters & No \\ +\reg{zmm1} & scratch register; also used to pass the second \code{__m512} + parameters & No \\ +\reg{xmm2} & scratch register; also used to pass the third \code{__m128} + parameters & No \\ +\reg{ymm2} & scratch register; also used to pass the third \code{__m256} + parameters & No \\ +\reg{zmm2} & scratch register; also used to pass the third \code{__m512} + parameters & No \\ +\reg{xmm3}--\reg{xmm7} & scratch registers & No \\ +\reg{ymm3}--\reg{ymm7} & scratch registers & No \\ +\reg{zmm3}--\reg{zmm7} & scratch registers & No \\ \reg{mm0} & scratch register; also used to pass and return \code{__m64} parameter & No\\ \reg{mm1}--\reg{mm2} & used to pass \code{__m64} parameters & No\\ @@ -420,6 +445,8 @@ and \texttt{unions}) are always returned in memory. & \texttt{\textit{any-type} *} & \EAX \\ & \texttt{\textit{any-type} (*)()} & \\ \hline + & \texttt{_Float16} & \reg{xmm0} \\ + \cline{2-3} & \texttt{float} & \reg{st0} \\ \cline{2-3} Floating- & \texttt{double} & \reg{st0} \\ @@ -430,14 +457,19 @@ and \texttt{unions}) are always returned in memory. \cline{2-3} & \texttt{__float128} & memory \\ \hline - & \texttt{_Complex float} & \EDX:\EAX \\ - & & The real part is returned in \EAX. The imaginary part is + & \texttt{_Complex _Float16} & \reg{xmm0}:\reg{xmm1} \\ + & & The real part is returned in \reg{xmm0}. The imaginary part is + returned \\ + & & in \reg{xmm1}.\\ + \cline{2-3} + Complex & \texttt{_Complex float} & \EDX:\EAX \\ + floating- & & The real part is returned in \EAX. The imaginary part is returned \\ - Complex & & in \EDX.\\ + point & & in \EDX.\\ \cline{2-3} - floating- & \texttt{_Complex double} & memory \\ + & \texttt{_Complex double} & memory \\ \cline{2-3} - point & \texttt{_Complex long double} & memory \\ + & \texttt{_Complex long double} & memory \\ \cline{2-3} & \texttt{_Complex __float80} & memory \\ \cline{2-3} -- 2.31.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-13 16:24 ` H.J. Lu via Libc-alpha @ 2021-07-29 13:39 ` H.J. Lu via Libc-alpha 2021-08-24 5:55 ` John McCall via Libc-alpha 0 siblings, 1 reply; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-07-29 13:39 UTC (permalink / raw) To: IA32 System V Application Binary Interface Cc: Wang, Pengfei, llvm-dev, GNU C Library, GCC Patches [-- Attachment #1: Type: text/plain, Size: 1538 bytes --] On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers <joseph@codesourcery.com> wrote: > > > > On Tue, 13 Jul 2021, H.J. Lu wrote: > > > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote: > > > > > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. > > > > > > > > Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e., > > > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1? > > > > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? > > > > > > Here is the v2 patch to add the missing _Float16 bits. The PDF file is at > > > > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > > > This PDF shows _Complex _Float16 as having a size of 2 bytes (should be > > 4-byte size, 2-byte alignment). > > > > It also seems to change double from 4-byte to 8-byte alignment, which is > > wrong. And it's inconsistent about whether it covers the long double = > > double (Android) case - it shows that case for _Complex long double but > > not for long double itself. > > Here is the v3 patch with the fixes. I also updated the PDF file. Here is the final patch I checked in. _Complex _Float16 is changed to return in XMM0 register. The new PDF file is at https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI -- H.J. [-- Attachment #2: 0001-Add-optional-_Float16-support.patch --] [-- Type: text/x-patch, Size: 6998 bytes --] From 4ce1007486d28b13da36bbf216b2e470818d7ee1 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Thu, 1 Jul 2021 13:58:00 -0700 Subject: [PATCH] Add optional _Float16 support 1. Pass _Float16 and _Complex _Float16 values on stack. 2. Return _Float16 and _Complex _Float16 values in XMM0 register. --- low-level-sys-info.tex | 70 +++++++++++++++++++++++++++++------------- 1 file changed, 49 insertions(+), 21 deletions(-) diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index acaf30e..860ff66 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a \subsubsection{Fundamental Types} Table~\ref{basic-types} shows the correspondence between ISO C -scalar types and the processor scalar types. \code{__float80}, +scalar types and the processor scalar types. \code{_Float16}, +\code{__float80}, \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and \code{__m512} types are optional. @@ -79,23 +80,28 @@ scalar types and the processor scalar types. \code{__float80}, & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\ & \texttt{\textit{any-type} (*)()} & & \\ \hline - Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\ \cline{2-5} - point & \texttt{double} & 8 & 4 & double (IEEE-754) \\ - & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + & \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\ + \cline{2-5} + & \texttt{float} & 4 & 4 & single (IEEE-754) \\ + \cline{2-5} + Floating- & \texttt{double} & 8 & 4 & double (IEEE-754) \\ + point & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} & \texttt{__float80}$^{\dagger\dagger}$ & 12 & 4 & 80-bit extended (IEEE-754) \\ & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\ \hline - Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ + & \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger}$ & 4 & 2 & complex 16-bit (IEEE-754) \\ + \cline{2-5} + & \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\ \cline{2-5} - Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ - point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\ + Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} - & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ - & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ + point & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\ + & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\ \cline{2-5} & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\ \hline @@ -125,6 +131,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double} type, on the Android{\texttrademark} platform. More information on the Android{\texttrademark} platform is available from \url{http://www.android.com/}.}\\ +\multicolumn{5}{p{13cm}}{\myfontsize $^{\dagger\dagger\dagger\dagger\dagger}$ +The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\ \end{tabular} } \end{table} @@ -323,6 +331,7 @@ at the time of the call. \begin{table} \Hrule \caption{Register Usage} + \myfontsize \label{fig-reg-usage} \begin{center} \begin{tabular}{l|p{8.35cm}|l} @@ -346,13 +355,28 @@ of some 64bit return types & No \\ \EBP & callee-saved register; optionally used as frame pointer & Yes \\ \ESI & callee-saved register & yes \\ \EDI & callee-saved register & yes \\ -\reg{xmm0}, \reg{ymm0} & scratch registers; also used to pass and return -\code{__m128}, \code{__m256} parameters & No\\ -\reg{xmm1}--\reg{xmm2},& scratch registers; also used to pass -\code{__m128}, & No \\ -\reg{ymm1}--\reg{ymm2} & \code{__m256} parameters & \\ -\reg{xmm3}--\reg{xmm7},& scratch registers & No \\ -\reg{ymm3}--\reg{ymm7} & & \\ +\reg{xmm0} & scratch register; also used to pass the first \code{__m128} + parameter and return \code{__m128}, \code{_Float16}, + \code{_Complex _Float16} & No \\ +\reg{ymm0} & scratch register; also used to pass the first \code{__m256} + parameter and return \code{__m256} & No \\ +\reg{zmm0} & scratch register; also used to pass the first \code{__m512} + parameter and return \code{__m512} & No \\ +\reg{xmm1} & scratch register; also used to pass the second \code{__m128} + parameter & No \\ +\reg{ymm1} & scratch register; also used to pass the second \code{__m256} + parameters & No \\ +\reg{zmm1} & scratch register; also used to pass the second \code{__m512} + parameters & No \\ +\reg{xmm2} & scratch register; also used to pass the third \code{__m128} + parameters & No \\ +\reg{ymm2} & scratch register; also used to pass the third \code{__m256} + parameters & No \\ +\reg{zmm2} & scratch register; also used to pass the third \code{__m512} + parameters & No \\ +\reg{xmm3}--\reg{xmm7} & scratch registers & No \\ +\reg{ymm3}--\reg{ymm7} & scratch registers & No \\ +\reg{zmm3}--\reg{zmm7} & scratch registers & No \\ \reg{mm0} & scratch register; also used to pass and return \code{__m64} parameter & No\\ \reg{mm1}--\reg{mm2} & used to pass \code{__m64} parameters & No\\ @@ -420,6 +444,8 @@ and \texttt{unions}) are always returned in memory. & \texttt{\textit{any-type} *} & \EAX \\ & \texttt{\textit{any-type} (*)()} & \\ \hline + & \texttt{_Float16} & \reg{xmm0} \\ + \cline{2-3} & \texttt{float} & \reg{st0} \\ \cline{2-3} Floating- & \texttt{double} & \reg{st0} \\ @@ -430,14 +456,16 @@ and \texttt{unions}) are always returned in memory. \cline{2-3} & \texttt{__float128} & memory \\ \hline - & \texttt{_Complex float} & \EDX:\EAX \\ - & & The real part is returned in \EAX. The imaginary part is + & \texttt{_Complex _Float16} & \reg{xmm0} \\ + \cline{2-3} + Complex & \texttt{_Complex float} & \EDX:\EAX \\ + floating- & & The real part is returned in \EAX. The imaginary part is returned \\ - Complex & & in \EDX.\\ + point & & in \EDX.\\ \cline{2-3} - floating- & \texttt{_Complex double} & memory \\ + & \texttt{_Complex double} & memory \\ \cline{2-3} - point & \texttt{_Complex long double} & memory \\ + & \texttt{_Complex long double} & memory \\ \cline{2-3} & \texttt{_Complex __float80} & memory \\ \cline{2-3} -- 2.31.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-07-29 13:39 ` H.J. Lu via Libc-alpha @ 2021-08-24 5:55 ` John McCall via Libc-alpha 2021-08-25 12:35 ` H.J. Lu via Libc-alpha 0 siblings, 1 reply; 20+ messages in thread From: John McCall via Libc-alpha @ 2021-08-24 5:55 UTC (permalink / raw) To: ia32-abi; +Cc: Wang, Pengfei, LLVM Dev, GNU C Library, GCC Patches On Thu, Jul 29, 2021 at 9:40 AM H.J. Lu <hjl.tools@gmail.com> wrote: > On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu <hjl.tools@gmail.com> wrote: > > > > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers <joseph@codesourcery.com> > wrote: > > > > > > On Tue, 13 Jul 2021, H.J. Lu wrote: > > > > > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei < > pengfei.wang@intel.com> wrote: > > > > > > > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 > registers. > > > > > > > > > > Can you please explain the behavior here? Is there difference > between _Float16 and _Complex _Float16 when return? I.e., > > > > > 1, In which case will _Float16 values return in both %xmm0 and > %xmm1? > > > > > 2, For a single _Float16 value, are both real part and imaginary > part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? > > > > > > > > Here is the v2 patch to add the missing _Float16 bits. The PDF > file is at > > > > > > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > > > > > This PDF shows _Complex _Float16 as having a size of 2 bytes (should be > > > 4-byte size, 2-byte alignment). > > > > > > It also seems to change double from 4-byte to 8-byte alignment, which > is > > > wrong. And it's inconsistent about whether it covers the long double = > > > double (Android) case - it shows that case for _Complex long double but > > > not for long double itself. > > > > Here is the v3 patch with the fixes. I also updated the PDF file. > > Here is the final patch I checked in. _Complex _Float16 is changed to > return > in XMM0 register. The new PDF file is at > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI This should be explicit that the real part is returned in bits 0..15 and the imaginary part is returned in bits 16..31, or however we conventionally designate subcomponents of a vector. John. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-08-24 5:55 ` John McCall via Libc-alpha @ 2021-08-25 12:35 ` H.J. Lu via Libc-alpha 2021-08-25 20:32 ` John McCall via Libc-alpha 0 siblings, 1 reply; 20+ messages in thread From: H.J. Lu via Libc-alpha @ 2021-08-25 12:35 UTC (permalink / raw) To: IA32 System V Application Binary Interface Cc: Wang, Pengfei, LLVM Dev, GNU C Library, GCC Patches On Mon, Aug 23, 2021 at 10:55 PM John McCall <rjmccall@gmail.com> wrote: > > On Thu, Jul 29, 2021 at 9:40 AM H.J. Lu <hjl.tools@gmail.com> wrote: >> >> On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu <hjl.tools@gmail.com> wrote: >> > >> > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers <joseph@codesourcery.com> wrote: >> > > >> > > On Tue, 13 Jul 2021, H.J. Lu wrote: >> > > >> > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.wang@intel.com> wrote: >> > > > > >> > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers. >> > > > > >> > > > > Can you please explain the behavior here? Is there difference between _Float16 and _Complex _Float16 when return? I.e., >> > > > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1? >> > > > > 2, For a single _Float16 value, are both real part and imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively? >> > > > >> > > > Here is the v2 patch to add the missing _Float16 bits. The PDF file is at >> > > > >> > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI >> > > >> > > This PDF shows _Complex _Float16 as having a size of 2 bytes (should be >> > > 4-byte size, 2-byte alignment). >> > > >> > > It also seems to change double from 4-byte to 8-byte alignment, which is >> > > wrong. And it's inconsistent about whether it covers the long double = >> > > double (Android) case - it shows that case for _Complex long double but >> > > not for long double itself. >> > >> > Here is the v3 patch with the fixes. I also updated the PDF file. >> >> Here is the final patch I checked in. _Complex _Float16 is changed to return >> in XMM0 register. The new PDF file is at >> >> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > > This should be explicit that the real part is returned in bits 0..15 and the imaginary part is returned in bits 16..31, or however we conventionally designate subcomponents of a vector. > > John. How about this? diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index 860ff66..8f527c1 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -457,6 +457,9 @@ and \texttt{unions}) are always returned in memory. & \texttt{__float128} & memory \\ \hline & \texttt{_Complex _Float16} & \reg{xmm0} \\ + & & The real part is returned in bits 0..15. The imaginary part is + returned \\ + & & in bits 16..31.\\ \cline{2-3} Complex & \texttt{_Complex float} & \EDX:\EAX \\ floating- & & The real part is returned in \EAX. The imaginary part is https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/89eb3e52c7e5eadd58f7597508e13f34/intel386-psABI-2021-08-25.pdf -- H.J. ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [llvm-dev] [PATCH] Add optional _Float16 support 2021-08-25 12:35 ` H.J. Lu via Libc-alpha @ 2021-08-25 20:32 ` John McCall via Libc-alpha 0 siblings, 0 replies; 20+ messages in thread From: John McCall via Libc-alpha @ 2021-08-25 20:32 UTC (permalink / raw) To: ia32-abi; +Cc: Wang, Pengfei, LLVM Dev, GNU C Library, GCC Patches On Wed, Aug 25, 2021 at 8:36 AM H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Aug 23, 2021 at 10:55 PM John McCall <rjmccall@gmail.com> wrote: > > On Thu, Jul 29, 2021 at 9:40 AM H.J. Lu <hjl.tools@gmail.com> wrote: > >> On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu <hjl.tools@gmail.com> wrote: > >> > On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers <joseph@codesourcery.com> > wrote: > >> > > On Tue, 13 Jul 2021, H.J. Lu wrote: > >> > > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei < > pengfei.wang@intel.com> wrote: > >> > > > > > >> > > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 > registers. > >> > > > > > >> > > > > Can you please explain the behavior here? Is there difference > between _Float16 and _Complex _Float16 when return? I.e., > >> > > > > 1, In which case will _Float16 values return in both %xmm0 and > %xmm1? > >> > > > > 2, For a single _Float16 value, are both real part and > imaginary part returned in %xmm0? Or returned in %xmm0 and %xmm1 > respectively? > >> > > > > >> > > > Here is the v2 patch to add the missing _Float16 bits. The PDF > file is at > >> > > > > >> > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > >> > > > >> > > This PDF shows _Complex _Float16 as having a size of 2 bytes > (should be > >> > > 4-byte size, 2-byte alignment). > >> > > > >> > > It also seems to change double from 4-byte to 8-byte alignment, > which is > >> > > wrong. And it's inconsistent about whether it covers the long > double = > >> > > double (Android) case - it shows that case for _Complex long double > but > >> > > not for long double itself. > >> > > >> > Here is the v3 patch with the fixes. I also updated the PDF file. > >> > >> Here is the final patch I checked in. _Complex _Float16 is changed to > return > >> in XMM0 register. The new PDF file is at > >> > >> https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI > > > > > > This should be explicit that the real part is returned in bits 0..15 and > the imaginary part is returned in bits 16..31, or however we conventionally > designate subcomponents of a vector. > > How about this? > > diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex > index 860ff66..8f527c1 100644 > --- a/low-level-sys-info.tex > +++ b/low-level-sys-info.tex > @@ -457,6 +457,9 @@ and \texttt{unions}) are always returned in memory. > & \texttt{__float128} & memory \\ > \hline > & \texttt{_Complex _Float16} & \reg{xmm0} \\ > + & & The real part is returned in bits 0..15. The imaginary part is > + returned \\ > + & & in bits 16..31.\\ > \cline{2-3} > Complex & \texttt{_Complex float} & \EDX:\EAX \\ > floating- & & The real part is returned in \EAX. The imaginary part is > > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/uploads/89eb3e52c7e5eadd58f7597508e13f34/intel386-psABI-2021-08-25.pdf Looks good to me, thanks. John. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2021-08-25 20:33 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-01 21:05 [PATCH] Add optional _Float16 support H.J. Lu via Libc-alpha 2021-07-01 22:10 ` Joseph Myers 2021-07-01 22:27 ` H.J. Lu via Libc-alpha 2021-07-01 22:40 ` Joseph Myers 2021-07-01 23:01 ` H.J. Lu via Libc-alpha 2021-07-01 23:05 ` [llvm-dev] " Craig Topper via Libc-alpha 2021-07-01 23:33 ` Jacob Lifshay via Libc-alpha 2021-07-02 7:45 ` Richard Biener via Libc-alpha 2021-07-02 8:03 ` Hongtao Liu via Libc-alpha 2021-07-02 9:21 ` Jakub Jelinek via Libc-alpha 2021-07-13 3:59 ` Wang, Pengfei via Libc-alpha 2021-07-13 14:26 ` H.J. Lu via Libc-alpha 2021-07-13 14:48 ` Wang, Pengfei via Libc-alpha 2021-07-13 15:04 ` H.J. Lu via Libc-alpha 2021-07-13 15:41 ` Joseph Myers 2021-07-13 16:24 ` H.J. Lu via Libc-alpha 2021-07-29 13:39 ` H.J. Lu via Libc-alpha 2021-08-24 5:55 ` John McCall via Libc-alpha 2021-08-25 12:35 ` H.J. Lu via Libc-alpha 2021-08-25 20:32 ` John McCall via Libc-alpha
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).