unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg
@ 2019-01-18 13:06 Uros Bizjak
  2019-01-20 18:21 ` Uros Bizjak
  2019-01-24  8:23 ` Richard Henderson
  0 siblings, 2 replies; 7+ messages in thread
From: Uros Bizjak @ 2019-01-18 13:06 UTC (permalink / raw
  To: libc-alpha; +Cc: Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 497 bytes --]

Hello!

Attached patch fixes sysdeps/alpha/remqu.S clobbering $f3 register via
$y_is_neg path. There was missing restore of $f3 before the return
from the function.

The patch also reorders insns a bit, so it becomes similar as much as
possible to divqu.S.

Without the patch, math/big testcase from Go-1.11 testsuite (that
includes lots of corner cases that exercise remqu) FAIL, with patched
function, the testcase PASSes without problems.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>

Uros.

[-- Attachment #2: libc-alpha.diff.txt --]
[-- Type: text/plain, Size: 1439 bytes --]

diff --git a/sysdeps/alpha/remqu.S b/sysdeps/alpha/remqu.S
index c2c5caf3c20..7210628f973 100644
--- a/sysdeps/alpha/remqu.S
+++ b/sysdeps/alpha/remqu.S
@@ -59,20 +59,19 @@ __remqu:
 	subq	Y, 1, AT
 	stt	$f0, 0(sp)
 	and	Y, AT, AT
+	excb
+	beq	AT, $powerof2
 
 	stt	$f1, 8(sp)
-	excb
 	stt	$f3, 48(sp)
-	beq	AT, $powerof2
 	cfi_rel_offset ($f0, 0)
 	cfi_rel_offset ($f1, 8)
 	cfi_rel_offset ($f3, 48)
+	mf_fpcr	$f3
 
 	_ITOFT2	X, $f0, 16, Y, $f1, 24
-	mf_fpcr	$f3
 	cvtqt	$f0, $f0
 	cvtqt	$f1, $f1
-
 	blt	X, $x_is_neg
 	divt/c	$f0, $f1, $f0
 
@@ -94,12 +93,12 @@ __remqu:
 	mulq	AT, Y, AT
 	ldt	$f0, 0(sp)
 	ldt	$f3, 48(sp)
-	lda	sp, FRAME(sp)
 	cfi_remember_state
 	cfi_restore ($f0)
 	cfi_restore ($f1)
 	cfi_restore ($f3)
 	cfi_def_cfa_offset (0)
+	lda	sp, FRAME(sp)
 
 	.align	4
 	subq	X, AT, RV
@@ -116,11 +115,13 @@ $x_is_neg:
 	cfi_rel_offset ($f2, 24)
 	_ITOFS	AT, $f2, 16
 
+	.align	4
 	addt	$f0, $f2, $f0
+	unop
 	divt/c	$f0, $f1, $f0
+	unop
 
 	/* Ok, we've now the divide issued.  Continue with other checks.  */
-	.align	4
 	ldt	$f1, 8(sp)
 	unop
 	ldt	$f2, 24(sp)
@@ -246,12 +247,16 @@ $y_is_neg:
 	   quotient must be either 0 or 1, so the remainder must be X
 	   or X-Y, so just compute it directly.  */
 	cmpule	Y, X, AT
+	excb
+	mt_fpcr	$f3
 	subq	X, Y, RV
 	ldt	$f0, 0(sp)
+	ldt	$f3, 48(sp)
 	cmoveq	AT, X, RV
 
 	lda	sp, FRAME(sp)
 	cfi_restore ($f0)
+	cfi_restore ($f3)
 	cfi_def_cfa_offset (0)
 	ret	$31, (RA), 1
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg
  2019-01-18 13:06 [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg Uros Bizjak
@ 2019-01-20 18:21 ` Uros Bizjak
  2019-01-24  8:23 ` Richard Henderson
  1 sibling, 0 replies; 7+ messages in thread
From: Uros Bizjak @ 2019-01-20 18:21 UTC (permalink / raw
  To: libc-alpha; +Cc: Richard Henderson

On Fri, Jan 18, 2019 at 2:06 PM Uros Bizjak <ubizjak@gmail.com>
wrote:Please also find the missing ChangeLog entry.

Please find missing ChangeLog entry in this message. Also, please note
I don't have write access to glibc repository.

Uros.

> Attached patch fixes sysdeps/alpha/remqu.S clobbering $f3 register via
> $y_is_neg path. There was missing restore of $f3 before the return
> from the function.
>
> The patch also reorders insns a bit, so it becomes similar as much as
> possible to divqu.S.
>
> Without the patch, math/big testcase from Go-1.11 testsuite (that
> includes lots of corner cases that exercise remqu) FAIL, with patched
> function, the testcase PASSes without problems.

2019-18-01  Uroš Bizjak  <ubizjak@gmail.com>

    * sysdeps/alpha/remqu.S (__remqu): Add missing restore
    of $f3 register on $y_is_neg path.  Reorder instructions
    similar to divqu.S order.

> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
>
> Uros.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg
  2019-01-18 13:06 [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg Uros Bizjak
  2019-01-20 18:21 ` Uros Bizjak
@ 2019-01-24  8:23 ` Richard Henderson
  2019-01-24  8:35   ` Uros Bizjak
  2019-02-27 19:54   ` [PATCH, alpha]: Improve sysdeps/alpha/divqu.S and sysdeps/alpha/remqu.S Uros Bizjak
  1 sibling, 2 replies; 7+ messages in thread
From: Richard Henderson @ 2019-01-24  8:23 UTC (permalink / raw
  To: Uros Bizjak, libc-alpha

On 1/18/19 5:06 AM, Uros Bizjak wrote:
> Hello!
> 
> Attached patch fixes sysdeps/alpha/remqu.S clobbering $f3 register via
> $y_is_neg path. There was missing restore of $f3 before the return
> from the function.
> 
> The patch also reorders insns a bit, so it becomes similar as much as
> possible to divqu.S.
> 
> Without the patch, math/big testcase from Go-1.11 testsuite (that
> includes lots of corner cases that exercise remqu) FAIL, with patched
> function, the testcase PASSes without problems.


> +++ b/sysdeps/alpha/remqu.S
> @@ -59,20 +59,19 @@ __remqu:
>  	subq	Y, 1, AT
>  	stt	$f0, 0(sp)
>  	and	Y, AT, AT
> +	excb
> +	beq	AT, $powerof2
>  
>  	stt	$f1, 8(sp)
> -	excb

Why are you moving the excb above the powerof2 branch?
The path at powerof2 does not touch fpcr or issue fp insns.

> @@ -94,12 +93,12 @@ __remqu:
>  	mulq	AT, Y, AT
>  	ldt	$f0, 0(sp)
>  	ldt	$f3, 48(sp)
> -	lda	sp, FRAME(sp)
>  	cfi_remember_state
>  	cfi_restore ($f0)
>  	cfi_restore ($f1)
>  	cfi_restore ($f3)
>  	cfi_def_cfa_offset (0)
> +	lda	sp, FRAME(sp)

This change is actively wrong wrt the unwind info.

> @@ -246,12 +247,16 @@ $y_is_neg:
>  	   quotient must be either 0 or 1, so the remainder must be X
>  	   or X-Y, so just compute it directly.  */
>  	cmpule	Y, X, AT
> +	excb
> +	mt_fpcr	$f3
>  	subq	X, Y, RV
>  	ldt	$f0, 0(sp)
> +	ldt	$f3, 48(sp)
>  	cmoveq	AT, X, RV
>  
>  	lda	sp, FRAME(sp)
>  	cfi_restore ($f0)
> +	cfi_restore ($f3)
>  	cfi_def_cfa_offset (0)
>  	ret	$31, (RA), 1

This appears to be the only change required to fix the bug.

Can you walk me through why the other changes?


r~

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg
  2019-01-24  8:23 ` Richard Henderson
@ 2019-01-24  8:35   ` Uros Bizjak
  2019-01-24  9:04     ` Richard Henderson
  2019-02-27 19:54   ` [PATCH, alpha]: Improve sysdeps/alpha/divqu.S and sysdeps/alpha/remqu.S Uros Bizjak
  1 sibling, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2019-01-24  8:35 UTC (permalink / raw
  To: Richard Henderson; +Cc: libc-alpha

On Thu, Jan 24, 2019 at 9:23 AM Richard Henderson <rth@twiddle.net> wrote:
>
> On 1/18/19 5:06 AM, Uros Bizjak wrote:
> > Hello!
> >
> > Attached patch fixes sysdeps/alpha/remqu.S clobbering $f3 register via
> > $y_is_neg path. There was missing restore of $f3 before the return
> > from the function.
> >
> > The patch also reorders insns a bit, so it becomes similar as much as
> > possible to divqu.S.
> >
> > Without the patch, math/big testcase from Go-1.11 testsuite (that
> > includes lots of corner cases that exercise remqu) FAIL, with patched
> > function, the testcase PASSes without problems.
>
>
> > +++ b/sysdeps/alpha/remqu.S
> > @@ -59,20 +59,19 @@ __remqu:
> >       subq    Y, 1, AT
> >       stt     $f0, 0(sp)
> >       and     Y, AT, AT
> > +     excb
> > +     beq     AT, $powerof2
> >
> >       stt     $f1, 8(sp)
> > -     excb
>
> Why are you moving the excb above the powerof2 branch?
> The path at powerof2 does not touch fpcr or issue fp insns.

This was meant to unify the flow with the __divqu assembly, which does
the above before calling DIVBYZERO. The idea was that __divqu is used
much more than __remqu, so the later should do the same as the former.

> > @@ -94,12 +93,12 @@ __remqu:
> >       mulq    AT, Y, AT
> >       ldt     $f0, 0(sp)
> >       ldt     $f3, 48(sp)
> > -     lda     sp, FRAME(sp)
> >       cfi_remember_state
> >       cfi_restore ($f0)
> >       cfi_restore ($f1)
> >       cfi_restore ($f3)
> >       cfi_def_cfa_offset (0)
> > +     lda     sp, FRAME(sp)
>
> This change is actively wrong wrt the unwind info.

Again, this will match __divqu assembly. It looks that __divqu needs
to be fixed then.

> > @@ -246,12 +247,16 @@ $y_is_neg:
> >          quotient must be either 0 or 1, so the remainder must be X
> >          or X-Y, so just compute it directly.  */
> >       cmpule  Y, X, AT
> > +     excb
> > +     mt_fpcr $f3
> >       subq    X, Y, RV
> >       ldt     $f0, 0(sp)
> > +     ldt     $f3, 48(sp)
> >       cmoveq  AT, X, RV
> >
> >       lda     sp, FRAME(sp)
> >       cfi_restore ($f0)
> > +     cfi_restore ($f3)
> >       cfi_def_cfa_offset (0)
> >       ret     $31, (RA), 1
>
> This appears to be the only change required to fix the bug.

That is true. This part is the problematic part and clobbers $f3.
Should I resend the patch only with this part fixed?

> Can you walk me through why the other changes?

As said above, I was trying to make __remqu like __divqu, but it looks
that __divqu should be fixed in some places.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg
  2019-01-24  8:35   ` Uros Bizjak
@ 2019-01-24  9:04     ` Richard Henderson
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2019-01-24  9:04 UTC (permalink / raw
  To: Uros Bizjak; +Cc: libc-alpha

On 1/24/19 12:35 AM, Uros Bizjak wrote:
> That is true. This part is the problematic part and clobbers $f3.
> Should I resend the patch only with this part fixed?

Yes please.  We're in the final stages of release
and any changes should be most minimal.

We can fix the div/rem matchup and unwind errors
for the next release.


r~

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH, alpha]: Improve sysdeps/alpha/divqu.S and sysdeps/alpha/remqu.S
  2019-01-24  8:23 ` Richard Henderson
  2019-01-24  8:35   ` Uros Bizjak
@ 2019-02-27 19:54   ` Uros Bizjak
  2019-04-01  9:02     ` Richard Henderson
  1 sibling, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2019-02-27 19:54 UTC (permalink / raw
  To: Richard Henderson; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 567 bytes --]

Attached patch improves and unifies sysdeps/alpha/divqu.S and
sysdeps/alpha/remqu.S.

* sysdeps/alpha/divqu.S (__divqu): Move save of $f0 and excb after
conditional branch to DIVBYZERO.  Fix unwind info.
* sysdeps/alpha/remqu.S (__remqu): Move saves of $f0, $f1, $f2 and
excb after conditional branch to $powerof2.  Add missing unop
instructions and .align directives and reorder instructions to match
__divqu.

Signed-off-by: Uroš Bizjak <ubizjak@gmail.com>

Patch was tested on alphaev68-linux-gnu, also by running complete
libgo testsuite.

Uros.

[-- Attachment #2: a.diff.txt --]
[-- Type: text/plain, Size: 1762 bytes --]

diff --git a/sysdeps/alpha/divqu.S b/sysdeps/alpha/divqu.S
index f5cedd0716..3165374b6d 100644
--- a/sysdeps/alpha/divqu.S
+++ b/sysdeps/alpha/divqu.S
@@ -56,10 +56,10 @@ __divqu:
 	   that's done, we have at least 22 cycles until its results are
 	   ready -- all the time in the world to figure out how we're
 	   going to use the results.  */
-	stt	$f0, 0(sp)
-	excb
 	beq	Y, DIVBYZERO
 
+	stt	$f0, 0(sp)
+	excb
 	stt	$f1, 8(sp)
 	stt	$f3, 48(sp)
 	cfi_rel_offset ($f0, 0)
@@ -70,6 +70,7 @@ __divqu:
 	_ITOFT2	X, $f0, 16, Y, $f1, 24
 	cvtqt	$f0, $f0
 	cvtqt	$f1, $f1
+
 	blt	X, $x_is_neg
 	divt/c	$f0, $f1, $f0
 
@@ -90,12 +91,12 @@ __divqu:
 
 	ldt	$f0, 0(sp)
 	ldt	$f3, 48(sp)
+	lda	sp, FRAME(sp)
 	cfi_remember_state
 	cfi_restore ($f0)
 	cfi_restore ($f1)
 	cfi_restore ($f3)
 	cfi_def_cfa_offset (0)
-	lda	sp, FRAME(sp)
 	ret	$31, (RA), 1
 
 	.align	4
diff --git a/sysdeps/alpha/remqu.S b/sysdeps/alpha/remqu.S
index a240ee9735..3b6a62dd88 100644
--- a/sysdeps/alpha/remqu.S
+++ b/sysdeps/alpha/remqu.S
@@ -57,19 +57,19 @@ __remqu:
 	   ready -- all the time in the world to figure out how we're
 	   going to use the results.  */
 	subq	Y, 1, AT
-	stt	$f0, 0(sp)
 	and	Y, AT, AT
+	beq	AT, $powerof2
 
-	stt	$f1, 8(sp)
+	stt	$f0, 0(sp)
 	excb
+	stt	$f1, 8(sp)
 	stt	$f3, 48(sp)
-	beq	AT, $powerof2
 	cfi_rel_offset ($f0, 0)
 	cfi_rel_offset ($f1, 8)
 	cfi_rel_offset ($f3, 48)
+	mf_fpcr	$f3
 
 	_ITOFT2	X, $f0, 16, Y, $f1, 24
-	mf_fpcr	$f3
 	cvtqt	$f0, $f0
 	cvtqt	$f1, $f1
 
@@ -116,11 +116,13 @@ $x_is_neg:
 	cfi_rel_offset ($f2, 24)
 	_ITOFS	AT, $f2, 16
 
+	.align	4
 	addt	$f0, $f2, $f0
+	unop
 	divt/c	$f0, $f1, $f0
+	unop
 
 	/* Ok, we've now the divide issued.  Continue with other checks.  */
-	.align	4
 	ldt	$f1, 8(sp)
 	unop
 	ldt	$f2, 24(sp)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH, alpha]: Improve sysdeps/alpha/divqu.S and sysdeps/alpha/remqu.S
  2019-02-27 19:54   ` [PATCH, alpha]: Improve sysdeps/alpha/divqu.S and sysdeps/alpha/remqu.S Uros Bizjak
@ 2019-04-01  9:02     ` Richard Henderson
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2019-04-01  9:02 UTC (permalink / raw
  To: Uros Bizjak; +Cc: libc-alpha

On 2/28/19 2:54 AM, Uros Bizjak wrote:
> Attached patch improves and unifies sysdeps/alpha/divqu.S and
> sysdeps/alpha/remqu.S.
> 
> * sysdeps/alpha/divqu.S (__divqu): Move save of $f0 and excb after
> conditional branch to DIVBYZERO.  Fix unwind info.
> * sysdeps/alpha/remqu.S (__remqu): Move saves of $f0, $f1, $f2 and
> excb after conditional branch to $powerof2.  Add missing unop
> instructions and .align directives and reorder instructions to match
> __divqu.
> 
> Signed-off-by: Uroš Bizjak <ubizjak@gmail.com>

Thanks, committed.


r~

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-04-01  9:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-01-18 13:06 [PATCH, alpha]: Fix sysdeps/alpha/remqu.S clobbering $f3 reg Uros Bizjak
2019-01-20 18:21 ` Uros Bizjak
2019-01-24  8:23 ` Richard Henderson
2019-01-24  8:35   ` Uros Bizjak
2019-01-24  9:04     ` Richard Henderson
2019-02-27 19:54   ` [PATCH, alpha]: Improve sysdeps/alpha/divqu.S and sysdeps/alpha/remqu.S Uros Bizjak
2019-04-01  9:02     ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).