unofficial mirror of libc-alpha@sourceware.org
* -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
@ 2020-05-27 13:03 Florian Weimer via Libc-alpha
  2020-05-27 13:39 ` Andrew Cooper via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-27 13:03 UTC
  To: xen-devel; +Cc: libc-alpha

I'm about to remove nosegneg support from upstream glibc, special builds
that use -mno-tls-direct-seg-refs, and the ability to load different
libraries built in this mode automatically, when the Linux kernel tells
us to do that.  I think the intended effect is that these special builds
do not use operands of the form %gs:(%eax) when %eax has the MSB set
because that had a performance hit with paravirtualization on 32-bit
x86.  Instead, the thread pointer is first loaded from %gs:0, and the
actual access does not use a segment prefix.
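
To make the two code-generation modes concrete, here is a minimal sketch.
The variable and function names are made up for illustration, and the
assembly in the trailing comment is only roughly what gcc -m32 -O2 emits
for a non-PIC executable; exact output varies with compiler version and
TLS access model.

```c
/* Hypothetical example; `counter` and `bump_counter` are illustrative names. */

/* One copy per thread.  On i386 Linux the variable sits at a negative
   offset from the thread pointer that the %gs segment base points at. */
static __thread int counter;

/* Increment and return the calling thread's private counter. */
int bump_counter(void)
{
    return ++counter;
}

/*
 * Approximate i386 code for a load of `counter` (local-exec model):
 *
 *   default (-mtls-direct-seg-refs):
 *       movl  %gs:counter@ntpoff, %eax     # negative offset behind %gs
 *
 *   with -mno-tls-direct-seg-refs:
 *       movl  %gs:0, %eax                  # fetch the thread pointer
 *       movl  counter@ntpoff(%eax), %eax   # plain access, no segment prefix
 */
```

In the second form the only %gs-relative access is at offset 0, so a
reduced segment limit never sees a negative offset.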

Before doing that, I'd like to ask if anybody is still using this
feature?

I know that we've been carrying nosegneg libraries for many years, in
some cases even after we stopped shipping 32-bit kernels. 8-/ The
feature has always been rather poorly documented, and the way the
dynamic loader selects those nosegneg library variants is still very
bizarre.

Thanks,
Florian



* Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
  2020-05-27 13:03 -mno-tls-direct-seg-refs support in glibc for i386 PV Xen Florian Weimer via Libc-alpha
@ 2020-05-27 13:39 ` Andrew Cooper via Libc-alpha
  2020-05-27 13:44   ` Samuel Thibault
  2020-05-27 14:00   ` Jan Beulich
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Cooper via Libc-alpha @ 2020-05-27 13:39 UTC
  To: Florian Weimer, xen-devel; +Cc: libc-alpha

On 27/05/2020 14:03, Florian Weimer wrote:
> I'm about to remove nosegneg support from upstream glibc, special builds
> that use -mno-tls-direct-seg-refs, and the ability to load different
> libraries built in this mode automatically, when the Linux kernel tells
> us to do that.  I think the intended effect is that these special builds
> do not use operands of the form %gs:(%eax) when %eax has the MSB set
> because that had a performance hit with paravirtualization on 32-bit
> x86.  Instead, the thread pointer is first loaded from %gs:0, and the
> actual access does not use a segment prefix.
>
> Before doing that, I'd like to ask if anybody is still using this
> feature?
>
> I know that we've been carrying nosegneg libraries for many years, in
> some cases even after we stopped shipping 32-bit kernels. 8-/ The
> feature has always been rather poorly documented, and the way the
> dynamic loader selects those nosegneg library variants is still very
> bizarre.

I wasn't even aware of this feature, or that there was a problem wanting
fixing.

That said, I have found:

# 32-bit x86 does not perform well with -ve segment accesses on Xen.
CFLAGS-$(CONFIG_X86_32) += $(call cc-option,$(CC),-mno-tls-direct-seg-refs)

in one of our makefiles.

Why does the MSB make any difference?  %gs still needs to remain intact
so the thread pointer can be pulled out, so there is nothing that Xen or
Linux can do in the way of lazy loading.

Beyond that, it's straight-up segment base semantics in x86.  There will
be a 1-cycle AGU delay from a non-zero base, but that has nothing to do with
Xen and applies to all segment based TLS accesses on x86, and you'll win
that back easily through reduced register pressure.

Are there any further details on the perf problem claim?  I find it
suspicious.


Either way, 32bit PV is on its last legs (not too bad, for something
which was essentially killed by the AMD64 spec).

Ring 1 counting as supervisor mode as far as pagetables goes has already
caused guests to suffer a major performance hit on hardware with
SMAP/SMEP (IvyBridge and later), as well as various speculative
mitigations (we can't rely on SMEP preventing the CPU from speculating
back into Ring 1, etc), and the forthcoming CET Shadow Stack feature
totally kills Ring1/2 as usable concepts in the architecture.

Linux is threatening to drop PV32 support, and I've recently added an
option to Xen to compile out and/or disable PV32 (both for attack
surface reduction purposes, and as a necessary consequence of using
Shadow Stacks).

With both my XenServer and upstream x86 maintainer hats on, PV32 is
solely for legacy workloads now.  People currently using PV32 obviously
don't care about performance, or haven't been taking security updates.
I severely doubt they'll notice any change from this.

~Andrew


* Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
  2020-05-27 13:39 ` Andrew Cooper via Libc-alpha
@ 2020-05-27 13:44   ` Samuel Thibault
  2020-05-27 14:15     ` Andrew Cooper via Libc-alpha
  2020-05-27 14:00   ` Jan Beulich
  1 sibling, 1 reply; 8+ messages in thread
From: Samuel Thibault @ 2020-05-27 13:44 UTC
  To: Andrew Cooper; +Cc: Florian Weimer, xen-devel, libc-alpha

Hello,

Andrew Cooper via Libc-alpha, on Wed 27 May 2020 14:39:00 +0100, wrote:
> Why does the MSB make any difference?  %gs still needs to remain intact
> so the thread pointer can be pulled out, so there is nothing that Xen or
> Linux can do in the way of lazy loading.
> 
> Beyond that, it's straight-up segment base semantics in x86.  There will
> be a 1-cycle AGU delay from a non-zero base, but that has nothing to do with
> Xen and applies to all segment based TLS accesses on x86, and you'll win
> that back easily through reduced register pressure.
> 
> Are there any further details on the perf problem claim?  I find it
> suspicious.

The concern is not about the indirection.

The concern is that to keep safe from the guest, the hypervisor has to
restrict the size of the segment, and thus negative offsets, used in the
i386 TLS model, are rejected by the processor, and the hypervisor has to
emulate these accesses, thus a high cost.

Samuel


* Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
  2020-05-27 13:39 ` Andrew Cooper via Libc-alpha
  2020-05-27 13:44   ` Samuel Thibault
@ 2020-05-27 14:00   ` Jan Beulich
  2020-05-27 14:40     ` Andrew Cooper via Libc-alpha
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2020-05-27 14:00 UTC
  To: Andrew Cooper; +Cc: Florian Weimer, xen-devel, libc-alpha

On 27.05.2020 15:39, Andrew Cooper wrote:
> On 27/05/2020 14:03, Florian Weimer wrote:
>> I'm about to remove nosegneg support from upstream glibc, special builds
>> that use -mno-tls-direct-seg-refs, and the ability to load different
>> libraries built in this mode automatically, when the Linux kernel tells
>> us to do that.  I think the intended effect is that these special builds
>> do not use operands of the form %gs:(%eax) when %eax has the MSB set
>> because that had a performance hit with paravirtualization on 32-bit
>> x86.  Instead, the thread pointer is first loaded from %gs:0, and the
>> actual access does not use a segment prefix.
>>
>> Before doing that, I'd like to ask if anybody is still using this
>> feature?
>>
>> I know that we've been carrying nosegneg libraries for many years, in
>> some cases even after we stopped shipping 32-bit kernels. 8-/ The
>> feature has always been rather poorly documented, and the way the
>> dynamic loader selects those nosegneg library variants is still very
>> bizarre.
> 
> I wasn't even aware of this feature, or that there was a problem wanting
> fixing.
> 
> That said, I have found:
> 
> # 32-bit x86 does not perform well with -ve segment accesses on Xen.
> CFLAGS-$(CONFIG_X86_32) += $(call cc-option,$(CC),-mno-tls-direct-seg-refs)
> 
> in one of our makefiles.
> 
> Why does the MSB make any difference?  %gs still needs to remain intact
> so the thread pointer can be pulled out, so there is nothing that Xen or
> Linux can do in the way of lazy loading.
> 
> Beyond that, it's straight-up segment base semantics in x86.  There will
> be a 1-cycle AGU delay from a non-zero base, but that has nothing to do with
> Xen and applies to all segment based TLS accesses on x86, and you'll win
> that back easily through reduced register pressure.
> 
> Are there any further details on the perf problem claim?  I find it
> suspicious.

To guard the hypervisor area, 32-bit Xen reduced the limits of guest
usable segment descriptors. While this works fine for flat ones (you
just chop off some space at the top), there's no way to represent a
full segment with a non-zero base. You can have the descriptor map
only the [base,XenBase] part or the [0,base) one. Hence Xen, from its
#GP handler, flipped the descriptor between the two options depending
on whether the current access was to the positive or negative part of
the TLS seg. (An in-practice use of expand down segments, as you'll
surely notice.)
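
The flipping scheme described above can be modelled in a few lines.  This
is a hypothetical sketch, not Xen code: every type and function name is
invented, the hypervisor boundary used in any example is an assumed value,
and access width and page-granular limits are ignored.  Offsets are
relative to the segment base, as the CPU checks them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; none of these names come from Xen. */
enum seg_kind { SEG_EXPAND_UP, SEG_EXPAND_DOWN };

struct seg_desc {
    enum seg_kind kind;
    uint32_t limit;
};

/* Expand-up: offsets 0..limit are valid.
   Expand-down (32-bit, B flag set): offsets limit+1..0xFFFFFFFF are valid. */
static bool seg_allows(const struct seg_desc *d, uint32_t offset)
{
    return d->kind == SEG_EXPAND_UP ? offset <= d->limit
                                    : offset > d->limit;
}

struct tls_seg {
    uint32_t base;      /* thread pointer (segment base) */
    uint32_t xen_base;  /* first linear address reserved for the hypervisor */
    struct seg_desc desc;
    unsigned faults;    /* simulated #GP count */
};

/* Cover linear [base, xen_base): the non-negative offsets. */
static void map_positive(struct tls_seg *s)
{
    s->desc.kind = SEG_EXPAND_UP;
    s->desc.limit = s->xen_base - s->base - 1;
}

/* Cover linear [0, base): offsets -base..-1, i.e. 2^32-base..0xFFFFFFFF. */
static void map_negative(struct tls_seg *s)
{
    s->desc.kind = SEG_EXPAND_DOWN;
    s->desc.limit = ~s->base;   /* 2^32 - base - 1 */
}

/* One guest access.  On a limit violation, count a fault, flip the
   descriptor to the other half, and retry once. */
static bool tls_access(struct tls_seg *s, uint32_t offset)
{
    if (seg_allows(&s->desc, offset))
        return true;
    s->faults++;
    if (s->desc.kind == SEG_EXPAND_UP)
        map_negative(s);
    else
        map_positive(s);
    return seg_allows(&s->desc, offset);
}
```

A thread that alternates between TCB accesses (positive offsets) and TLS
variable accesses (negative offsets) takes a simulated fault on every
switch in this model, which is the cost -mno-tls-direct-seg-refs avoids.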

Jan


* Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
  2020-05-27 13:44   ` Samuel Thibault
@ 2020-05-27 14:15     ` Andrew Cooper via Libc-alpha
  2020-05-27 14:20       ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper via Libc-alpha @ 2020-05-27 14:15 UTC
  To: Samuel Thibault, Florian Weimer, xen-devel, libc-alpha

On 27/05/2020 14:44, Samuel Thibault wrote:
> Hello,
>
> Andrew Cooper via Libc-alpha, on Wed 27 May 2020 14:39:00 +0100, wrote:
>> Why does the MSB make any difference?  %gs still needs to remain intact
>> so the thread pointer can be pulled out, so there is nothing that Xen or
>> Linux can do in the way of lazy loading.
>>
>> Beyond that, it's straight-up segment base semantics in x86.  There will
>> be a 1-cycle AGU delay from a non-zero base, but that has nothing to do with
>> Xen and applies to all segment based TLS accesses on x86, and you'll win
>> that back easily through reduced register pressure.
>>
>> Are there any further details on the perf problem claim?  I find it
>> suspicious.
> The concern is not about the indirection.
>
> The concern is that to keep safe from the guest, the hypervisor has to
> restrict the size of the segment, and thus negative offsets, used in the
> i386 TLS model, are rejected by the processor, and the hypervisor has to
> emulate these accesses, thus a high cost.

Oh, so the i386 TLS model relies on the calculation wrapping (modulo 4G)
when the segment limit is 4G, instead of taking a fault?

Intel states this behaviour is implementation specific (SDM Vol3
5.3.1) and may fault, while AMD doesn't discuss it at all as far as I
can tell (APM Vol2 4.12 is the right section, but I can't see this
discussed).
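
As a worked illustration of the wrapping in question, the sketch below
models the two steps separately: the limit check on the offset, and the
32-bit addition of the segment base.  The helper names are invented for
this example, and it models only the common wrapping behaviour, not what
any particular processor is guaranteed to do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Linear address = base + offset, truncated to 32 bits.  uint32_t
   arithmetic wraps modulo 2^32, so a "negative" offset such as
   0xFFFFFFF8 lands 8 bytes below the base. */
static uint32_t linear_address(uint32_t base, uint32_t offset)
{
    return (uint32_t)(base + offset);
}

/* Limit check for an expand-up segment.  It is applied to the offset
   itself, before the base is added. */
static bool within_limit(uint32_t offset, uint32_t limit)
{
    return offset <= limit;
}
```

With a 4G limit (0xFFFFFFFF) every offset passes within_limit(), so an
access at offset -8 from a base of 0x40001000 resolves to 0x40000FF8.
With any smaller limit the same offset fails the check before the
addition is ever considered.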

While I can believe it probably works on every processor these days, it
does seem like dodgy ground to base an ABI on.

It also means that Xen isn't necessarily the only affected party.  I'm
pretty sure GRSecurity use reduced segment limits as well.

I also bet it doesn't work reliably under emulation.

~Andrew


* Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
  2020-05-27 14:15     ` Andrew Cooper via Libc-alpha
@ 2020-05-27 14:20       ` Florian Weimer via Libc-alpha
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-27 14:20 UTC
  To: Andrew Cooper; +Cc: libc-alpha, xen-devel

* Andrew Cooper:

> Oh, so the i386 TLS model relies on the calculation wrapping (modulo 4G)
> when the segment limit is 4G, instead of taking a fault?

That's about it.

> Intel states this behaviour is implementation specific (SDM Vol3
> 5.3.1) and may fault, while AMD doesn't discuss it at all as far as I
> can tell (APM Vol2 4.12 is the right section, but I can't see this
> discussed).
>
> While I can believe it probably works on every processor these days, it
> does seem like dodgy ground to base an ABI on.

Sure, but it has been this way since the beginnings of NPTL, for close
to twenty years now.  The TCB is at positive offsets, and the user TLS
data at negative offsets.

> It also means that Xen isn't necessarily the only affected party.  I'm
> pretty sure GRSecurity use reduced segment limits as well.

Mostly for CS and DS, I believe, for the fake NX handling.  I think that
was never upstream, but some vendor kernels had variants of it.

> I also bet it doesn't work reliably under emulation.

It has to, given that it's so pervasively used under Linux. 8-/

Thanks,
Florian



* Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
  2020-05-27 14:00   ` Jan Beulich
@ 2020-05-27 14:40     ` Andrew Cooper via Libc-alpha
  2020-05-27 15:25       ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper via Libc-alpha @ 2020-05-27 14:40 UTC
  To: Jan Beulich; +Cc: Florian Weimer, xen-devel, libc-alpha

On 27/05/2020 15:00, Jan Beulich wrote:
> On 27.05.2020 15:39, Andrew Cooper wrote:
>> On 27/05/2020 14:03, Florian Weimer wrote:
>>> I'm about to remove nosegneg support from upstream glibc, special builds
>>> that use -mno-tls-direct-seg-refs, and the ability to load different
>>> libraries built in this mode automatically, when the Linux kernel tells
>>> us to do that.  I think the intended effect is that these special builds
>>> do not use operands of the form %gs:(%eax) when %eax has the MSB set
>>> because that had a performance hit with paravirtualization on 32-bit
>>> x86.  Instead, the thread pointer is first loaded from %gs:0, and the
>>> actual access does not use a segment prefix.
>>>
>>> Before doing that, I'd like to ask if anybody is still using this
>>> feature?
>>>
>>> I know that we've been carrying nosegneg libraries for many years, in
>>> some cases even after we stopped shipping 32-bit kernels. 8-/ The
>>> feature has always been rather poorly documented, and the way the
>>> dynamic loader selects those nosegneg library variants is still very
>>> bizarre.
>> I wasn't even aware of this feature, or that there was a problem wanting
>> fixing.
>>
>> That said, I have found:
>>
>> # 32-bit x86 does not perform well with -ve segment accesses on Xen.
>> CFLAGS-$(CONFIG_X86_32) += $(call cc-option,$(CC),-mno-tls-direct-seg-refs)
>>
>> in one of our makefiles.
>>
>> Why does the MSB make any difference?  %gs still needs to remain intact
>> so the thread pointer can be pulled out, so there is nothing that Xen or
>> Linux can do in the way of lazy loading.
>>
>> Beyond that, it's straight-up segment base semantics in x86.  There will
>> be a 1-cycle AGU delay from a non-zero base, but that has nothing to do with
>> Xen and applies to all segment based TLS accesses on x86, and you'll win
>> that back easily through reduced register pressure.
>>
>> Are there any further details on the perf problem claim?  I find it
>> suspicious.
> To guard the hypervisor area, 32-bit Xen reduced the limits of guest
> usable segment descriptors.

Right.  Segment limits are what kept the guest kernel (ring 1,
supervisor) out of Xen (ring 0, also supervisor).

> While this works fine for flat ones (you
> just chop off some space at the top), there's no way to represent a
> full segment with a non-zero base.

(From the other thread,) The problem isn't related to the base, per se.

It is that a segment with a non-4G limit now faults rather than
truncating usefully for the 32bit TLS model.

> You can have the descriptor map
> only the [base,XenBase] part or the [0,base) one. Hence Xen, from its
> #GP handler, flipped the descriptor between the two options depending
> on whether the current access was to the positive or negative part of
> the TLS seg. (An in-practice use of expand down segments, as you'll
> surely notice.)

I've found gpf_emulate_4gb() in source history.  It was specific to
32bit builds of Xen (now long gone).

What I can't figure out is why this is unnecessary in 64bit builds of
Xen.  We still enforce reduced segment limits on the guests' descriptors.

I have a worrying suspicion that Xen's ABI for PV32 (on top of a 64bit
Xen) now depends on -mno-tls-direct-seg-refs.

~Andrew


* Re: -mno-tls-direct-seg-refs support in glibc for i386 PV Xen
  2020-05-27 14:40     ` Andrew Cooper via Libc-alpha
@ 2020-05-27 15:25       ` Jan Beulich
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2020-05-27 15:25 UTC
  To: Andrew Cooper; +Cc: Florian Weimer, xen-devel, libc-alpha

On 27.05.2020 16:40, Andrew Cooper wrote:
> On 27/05/2020 15:00, Jan Beulich wrote:
>> You can have the descriptor map
>> only the [base,XenBase] part or the [0,base) one. Hence Xen, from its
>> #GP handler, flipped the descriptor between the two options depending
>> on whether the current access was to the positive or negative part of
>> the TLS seg. (An in-practice use of expand down segments, as you'll
>> surely notice.)
> 
> I've found gpf_emulate_4gb() in source history.  It was specific to
> 32bit builds of Xen (now long gone).
> 
> What I can't figure out is why this is unnecessary in 64bit builds of
> Xen.  We still enforce reduced segment limits on the guests' descriptors.

Do we? I can't find such - neither boot_compat_gdt[] has any signs
of it, nor check_descriptor(). And we don't have a need to: The
entire range is used for the r/o M2P, i.e. protection is enforced
at the paging layer. 32-bit Xen necessarily had r/w as well as
executable sub-ranges there.

Jan


end of thread, other threads:[~2020-05-27 15:25 UTC | newest]
