Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.

From: Ivan Zakharyaschev <imz@altlinux.org>
To: Bruno Haible <bruno@clisp.org>
Cc: bircoph@altlinux.org, bug-gnulib@gnu.org
Subject: Re: [RFC PATCH] test-c-stack2.sh: skip if the platform sent SIGILL on an invalid address.
Date: Sat, 29 Dec 2018 23:13:00 +0300 (MSK)
Message-ID: <alpine.LFD.2.20.1812292301020.6081@imap.altlinux.org>
In-Reply-To: <alpine.LFD.2.20.1812291505000.6081@imap.altlinux.org>

Here is a follow-up to the story, for those curious about what happens on
the similar IA64 architecture. And this should be it.

As for the problem on E2K itself, we should discuss it with MCST and/or 
investigate whether the missing information about the faults can be 
recovered to better satisfy POSIX.

On Sat, 29 Dec 2018, Ivan Zakharyaschev wrote:

> > > As for the SIGILL peculiarity, it has a reason in the Elbrus architecture. 

> I've studied the assembler code and found the other true 
> reason in this specific case: these are faults "hidden" in an explicitly 
> "speculative" computation which ultimately result in SIGILL. (The E2K ISA 
> is reminiscent of IA64; this can help get the idea.) The specific kind of 
> the fault is "forgotten", unfortunately.

> Besides, in many respects, including the explicitly speculative
> instructions I have just mentioned, E2K resembles IA64.
> 
> And it'd be interesting to look at how faults coming from
> speculative computations are treated in Linux/ia64, to get an idea
> whether it can be done in a manner that conforms better to POSIX.

> * * *
> 
> BTW, saving and forgetting the type of the original fault doesn't seem

I meant "not forgetting".

> to be something expensive to implement (after some thought): when a
> register is marked as invalid, it shouldn't matter anymore what value
> it holds. So, the same register can be used to save the information
> about the type of the fault.

As Dmitry Levin pointed out, probably not, because there can be too much
information (the fault and the associated address) for a single register.

> * * *
> 
> I wanted to see how Linux/ia64 handles these complications arising
> from speculative computations possibly causing a fault; and powered on
> such a machine, and had a look at the above examples with SIGILL on
> E2K: the third one, and the fifth one (speculative division by zero).
> 
> The third example from above:
> 
> imz@rx2620:~/test-speculative-SIGSEGV$ cc -Wall -O3 -xc - -S -o c.s && cat c.s
> int main(int argc, char ** argv) {
>   if (0 < argc)
>     ++*(char*)0xbad;
>   return 0xbeef;
> }
> 	.file	""
> 	.pred.safe_across_calls p1-p5,p16-p63
> 	.section	.text.startup,"ax",@progbits
> 	.align 16
> 	.align 64
> 	.global main#
> 	.type	main#, @function
> 	.proc main#
> main:
> 	.prologue
> 	.body
> 	.mmi
> 	cmp4.ge p6, p7 = 0, r32
> 	addl r14 = 2989, r0
> 	addl r8 = 48879, r0
> 	;;
> 	.mmi
> 	(p7) ld1 r15 = [r14]
> 	;;
> 	(p7) adds r15 = 1, r15
> 	nop 0
> 	;;
> 	.mib
> 	(p7) st1 [r14] = r15
> 	nop 0
> 	br.ret.sptk.many b0
> 	.endp main#
> 	.ident	"GCC: (Debian 4.6.3-14) 4.6.3"
> 	.section	.note.GNU-stack,"",@progbits
> imz@rx2620:~/test-speculative-SIGSEGV$ cc -Wall -O3 c.s && ./a.out; echo $?
> Segmentation fault
> 139

> Notes on the assembler: the possible groupings into VLIWs are
> separated by double semicolons (";;"). Predicated execution of
> instructions is marked by a prefix with the corresponding predicate
> register in parentheses, like "(p7)" in the code above:
> 
> 	.mmi
> 	(p7) ld1 r15 = [r14]
> 	;;
> 	(p7) adds r15 = 1, r15
> 	nop 0
> 	;;
> 	.mib
> 	(p7) st1 [r14] = r15
> 
> These are the "load", "add", and "store" instructions corresponding to: ++*(char*)0xbad
> 
> All this shows that gcc-4.6 on IA-64 doesn't generate speculative
> computations for the same examples that had speculative computations
> on E2K. Unfortunately, this means that we couldn't compare the
> interesting bits of the behavior between Linux/e2k and Linux/ia64
> quickly. Perhaps, editing the IA64 assembler code can give a desired
> example.

Cool! Linux/ia64 also produces SIGILL in the same situation; it seems
to have no magic. (But there is a second part of the story!)

imz@rx2620:~/test-speculative-SIGSEGV$ diff c.s c_s.s
18c18
< 	(p7) ld1 r15 = [r14]
---
> 	(p7) ld1.s r15 = [r14]
imz@rx2620:~/test-speculative-SIGSEGV$ cc c_s.s && ./a.out; echo $?
Illegal instruction
132

"ld1.s" is the "load 1 byte" instruction with the "speculative" flag.

If we do not use the "invalid" register in a "store" instruction, then
there is no fault:

imz@rx2620:~/test-speculative-SIGSEGV$ diff c_s.s c_nost.s
24,25d23
< 	(p7) st1 [r14] = r15
< 	nop 0
imz@rx2620:~/test-speculative-SIGSEGV$ cc c_nost.s && ./a.out; echo $?
239
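
A side note on the numbers printed by "echo $?" in these transcripts,
as a minimal sketch: the shell reports a child killed by signal N as
128 + N, and on a normal exit only the low 8 bits of the value
returned from main survive. The function names below are hypothetical
and only serve the illustration; the signal numbers are the ones
from <signal.h> on Linux.

```c
#include <assert.h>
#include <signal.h>

/* Shell convention: a child killed by signal N is reported as $? = 128 + N. */
int shell_status_for_signal(int sig)
{
    return 128 + sig;
}

/* On a normal exit only the low 8 bits of main's return value survive. */
int shell_status_for_return(int value)
{
    return value & 0xff;
}

/* The three statuses seen in the transcripts above. */
void check_transcript_statuses(void)
{
    assert(shell_status_for_signal(SIGSEGV) == 139); /* c.s: ld1    */
    assert(shell_status_for_signal(SIGILL) == 132);  /* c_s.s: ld1.s */
    assert(shell_status_for_return(0xbeef) == 239);  /* c_nost.s     */
}
```

So 239 means that c_nost.s ran to completion and returned 0xbeef,
rather than dying from a signal.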

And the second part:

The problem has a solution on IA64. The compiler would know how to
replay the faulty speculative computation, so it would be able to
generate code to do this non-speculatively and trigger the real fault.
And there is an instruction that checks whether a register is
"valid"[1] and helps to jump to the recovery code[2]: "chk.s".

I've implemented this approach manually in c_chk.s like this (but I
have not checked what a compiler would actually do; IA64 has other
flavors of speculative instructions, like "ld.a" etc., so there are
rich possibilities):

	.file	""
	.pred.safe_across_calls p1-p5,p16-p63
	.section	.text.startup,"ax",@progbits
	.align 16
	.align 64
	.global main#
	.type	main#, @function
	.proc main#
main:
	.prologue
	.body
	.mmi
	addl r14 = 2989, r0
	addl r8 = 48879, r0
	;;
	.mmi
	ld1.s r15 = [r14]
	;;
	.mmi
	cmp4.ge p6, p7 = 0, r32
	;;
	(p7) adds r15 = 1, r15
	nop 0
	;;
	(p7) chk.s r15, .recovery
	;;
.back:
	.mib
	(p7) st1 [r14] = r15
	nop 0
	br.ret.sptk.many b0
.recovery:
	ld1 r15 = [r14]
	//adds r15 = 1, r15
	br.cond.sptk .back
	.endp main#
	.ident	"GCC: (Debian 4.6.3-14) 4.6.3"
	.section	.note.GNU-stack,"",@progbits

imz@rx2620:~/test-speculative-SIGSEGV$ cc c_chk.s && ./a.out; echo $?
Segmentation fault
139

This produced the normal behavior (SIGSEGV), which better satisfies POSIX.

[1]: https://blogs.msdn.microsoft.com/oldnewthing/20040119-00/?p=41003
[2]: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ia64/strchr.S;h=3a29e80b52c350a76e880cbb8daa66c91fa98964;hb=HEAD#l87

[1] seems to be outdated because it shows a wrong variant of "chk.s",
but it explains that the registers are 65 bits wide, with an additional
bit for their "validity".
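
To make the role of that extra bit concrete, here is a toy model in C
of the experiments above. It is only a sketch under my own
assumptions, not a description of the real hardware or kernel: the
type and function names are hypothetical, "memory" is a 16-byte array
standing in for mapped pages, and the store checks the validity bit
before the address simply because that is what matches the SIGILL
seen with c_s.s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Outcome of one simulated instruction (hypothetical names). */
enum outcome { OK, FAULT_SEGV, FAULT_NAT_CONSUMPTION };

/* A general register: 64 value bits plus one validity ("NaT") bit. */
struct reg {
    uint64_t value;
    bool nat; /* true: the value is the result of a deferred fault */
};

/* Toy memory: only addresses below MEM_SIZE count as mapped. */
enum { MEM_SIZE = 16 };
static uint8_t memory[MEM_SIZE];

static bool mapped(uint64_t addr) { return addr < MEM_SIZE; }

/* "ld1": a normal load faults at once on an unmapped address. */
enum outcome load(struct reg *dst, uint64_t addr)
{
    if (!mapped(addr))
        return FAULT_SEGV;
    dst->value = memory[addr];
    dst->nat = false;
    return OK;
}

/* "ld1.s": a speculative load defers the fault by marking the register. */
enum outcome load_speculative(struct reg *dst, uint64_t addr)
{
    if (!mapped(addr)) {
        dst->value = 0;
        dst->nat = true; /* the kind of fault is forgotten here */
        return OK;
    }
    return load(dst, addr);
}

/* "adds": arithmetic on an invalid register leaves it invalid. */
void add_imm(struct reg *r, uint64_t imm)
{
    if (!r->nat)
        r->value += imm;
}

/* "st1": storing an invalid register is what finally faults. */
enum outcome store(uint64_t addr, const struct reg *src)
{
    if (src->nat)
        return FAULT_NAT_CONSUMPTION; /* seen as SIGILL with c_s.s */
    if (!mapped(addr))
        return FAULT_SEGV;
    memory[addr] = (uint8_t)src->value;
    return OK;
}

/* "chk.s": true means "branch to the recovery code". */
bool check_speculation(const struct reg *r) { return r->nat; }

/* The c_s.s sequence: speculative load, add, store, no check. */
enum outcome increment_without_check(uint64_t addr)
{
    struct reg r15;
    load_speculative(&r15, addr);
    add_imm(&r15, 1);
    return store(addr, &r15);
}

/* The c_chk.s idea: on an invalid register, replay non-speculatively. */
enum outcome increment_with_recovery(uint64_t addr)
{
    struct reg r15;
    load_speculative(&r15, addr);
    add_imm(&r15, 1);
    if (check_speculation(&r15)) {
        enum outcome o = load(&r15, addr); /* raises the real fault */
        if (o != OK)
            return o;
        add_imm(&r15, 1);
    }
    return store(addr, &r15);
}

/* Expected outcomes for the invalid address used in the examples. */
void demo(void)
{
    assert(increment_without_check(0xbad) == FAULT_NAT_CONSUMPTION);
    assert(increment_with_recovery(0xbad) == FAULT_SEGV);
}
```

In this model, the sequence without the check ends in the "wrong"
fault, and the sequence with the check and the replayed load ends in
the fault that a non-speculative program would have produced.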

[2] is a manually written example of this approach, which I found by
quickly searching for "chk.s" "ia64".

-- 
Best regards,
Ivan