2020-07-03 17:12:06

by Andy Lutomirski

[permalink] [raw]
Subject: FSGSBASE seems to be busted on Xen PV

Hi Xen folks-

I did some testing of the upcoming Linux FSGSBASE support on Xen PV,
and I found what appears to be some significant bugs in the Xen
context switching code. These bugs are causing Linux selftest
failures, and they could easily cause random and hard-to-debug
failures of user programs that use the new instructions in a Xen PV
guest.

The bugs seem to boil down to the context switching code in Xen being
clever and trying to guess that a nonzero FS or GS means that the
segment base must match the in-memory descriptor. This is simply not
true if CR4.FSGSBASE is set -- the bases can have any canonical value,
under the full control of the guest, and Xen has absolutely no way of
knowing whether the values are expected to be in sync with the
selectors. (The same is true of FSGSBASE except that guest funny
business either requires MSR accesses or some descriptor table
fiddling, and guests are perhaps less likely to care)

Having written a bunch of the corresponding Linux code, I don't
there's any way around just independently saving and restoring the
selectors and the bases. At least it's relatively fast with FSGSBASE
enabled.

If you can't get this fixed in upstream Xen reasonably quickly, we may
need to disable FSGSBASE in a Xen PV guest in Linux.

--Andy


2020-07-03 17:17:03

by Andrew Cooper

[permalink] [raw]
Subject: Re: FSGSBASE seems to be busted on Xen PV

On 03/07/2020 18:10, Andy Lutomirski wrote:
> Hi Xen folks-
>
> I did some testing of the upcoming Linux FSGSBASE support on Xen PV,
> and I found what appears to be some significant bugs in the Xen
> context switching code. These bugs are causing Linux selftest
> failures, and they could easily cause random and hard-to-debug
> failures of user programs that use the new instructions in a Xen PV
> guest.
>
> The bugs seem to boil down to the context switching code in Xen being
> clever and trying to guess that a nonzero FS or GS means that the
> segment base must match the in-memory descriptor. This is simply not
> true if CR4.FSGSBASE is set -- the bases can have any canonical value,
> under the full control of the guest, and Xen has absolutely no way of
> knowing whether the values are expected to be in sync with the
> selectors. (The same is true of FSGSBASE except that guest funny
> business either requires MSR accesses or some descriptor table
> fiddling, and guests are perhaps less likely to care)
>
> Having written a bunch of the corresponding Linux code, I don't
> there's any way around just independently saving and restoring the
> selectors and the bases. At least it's relatively fast with FSGSBASE
> enabled.
>
> If you can't get this fixed in upstream Xen reasonably quickly, we may
> need to disable FSGSBASE in a Xen PV guest in Linux.

This has come up several times before, but if its actually breaking
userspace then Xen needs to change.

I'll see about making something which is rather more robust.

~Andrew

2020-07-03 22:32:37

by Thomas Gleixner

[permalink] [raw]
Subject: Re: FSGSBASE seems to be busted on Xen PV

Andrew Cooper <[email protected]> writes:
> On 03/07/2020 18:10, Andy Lutomirski wrote:
>> If you can't get this fixed in upstream Xen reasonably quickly, we may
>> need to disable FSGSBASE in a Xen PV guest in Linux.
>
> This has come up several times before, but if its actually breaking
> userspace then Xen needs to change.
>
> I'll see about making something which is rather more robust.

You mean disabling XEN PV completely? That would be indeed very robust
and allows us to get rid of lots of obscure code. Feel free to add my
Acked-by :)

Thanks,

tglx