When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set
on AMD cpus. Xen will disable this via setup_clear_cpu_cap(), so test
cpu_caps_cleared to not have disabled this bit.
This bug/feature bit is kind of special as it will be used very early
when switching threads. Setting the bit and clearing it a little bit
later leaves a critical window where things can go wrong. This time
window has enlarged a little bit by using setup_clear_cpu_cap() instead
of the hypervisor's set_cpu_features callback. It seems this larger
window now makes it rather easy to hit the problem.
The proper solution is to never set the bit in case of Xen.
Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/kernel/cpu/amd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c36140d788fe..f659b6f534b7 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -800,7 +800,9 @@ static void init_amd(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
/* AMD CPUs don't reset SS attributes on SYSRET */
- set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
+ if (!test_bit(X86_BUG_SYSRET_SS_ATTRS,
+ (unsigned long *)cpu_caps_cleared))
+ set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
}
#ifdef CONFIG_X86_32
--
2.12.0
On Tue, Apr 25, 2017 at 08:00:14PM +0200, Juergen Gross wrote:
> When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set
> on AMD cpus. Xen will disable this via setup_clear_cpu_cap(), so test
> cpu_caps_cleared to not have disabled this bit.
>
> This bug/feature bit is kind of special as it will be used very early
> when switching threads. Setting the bit and clearing it a little bit
> later leaves a critical window where things can go wrong. This time
> window has enlarged a little bit by using setup_clear_cpu_cap() instead
> of the hypervisor's set_cpu_features callback. It seems this larger
> window now makes it rather easy to hit the problem.
>
> The proper solution is to never set the bit in case of Xen.
>
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> arch/x86/kernel/cpu/amd.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index c36140d788fe..f659b6f534b7 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -800,7 +800,9 @@ static void init_amd(struct cpuinfo_x86 *c)
> set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
>
> /* AMD CPUs don't reset SS attributes on SYSRET */
> - set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
> + if (!test_bit(X86_BUG_SYSRET_SS_ATTRS,
> + (unsigned long *)cpu_caps_cleared))
> + set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
> }
Hold on, AFAICT we have this order:
c_init() -> init_amd() sets X86_BUG_SYSRET_SS_ATTRS
init_hypervisor->x86_hyper->set_cpu_features-> clear_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
and all is good.
And I remember seeing a patchset doing some xen cpuid cleanup so I'm
assuming you're doing setup_clear_cpu_cap() now? And now we have to wag
the dog?
Confused.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 25/04/17 20:24, Borislav Petkov wrote:
> On Tue, Apr 25, 2017 at 08:00:14PM +0200, Juergen Gross wrote:
>> When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set
>> on AMD cpus. Xen will disable this via setup_clear_cpu_cap(), so test
>> cpu_caps_cleared to not have disabled this bit.
>>
>> This bug/feature bit is kind of special as it will be used very early
>> when switching threads. Setting the bit and clearing it a little bit
>> later leaves a critical window where things can go wrong. This time
>> window has enlarged a little bit by using setup_clear_cpu_cap() instead
>> of the hypervisor's set_cpu_features callback. It seems this larger
>> window now makes it rather easy to hit the problem.
>>
>> The proper solution is to never set the bit in case of Xen.
>>
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> arch/x86/kernel/cpu/amd.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index c36140d788fe..f659b6f534b7 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -800,7 +800,9 @@ static void init_amd(struct cpuinfo_x86 *c)
>> set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
>>
>> /* AMD CPUs don't reset SS attributes on SYSRET */
>> - set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>> + if (!test_bit(X86_BUG_SYSRET_SS_ATTRS,
>> + (unsigned long *)cpu_caps_cleared))
>> + set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>> }
>
> Hold on, AFAICT we have this order:
>
> c_init() -> init_amd() sets X86_BUG_SYSRET_SS_ATTRS
And what happens when there is a scheduling event right here?
__switch_to() will see X86_BUG_SYSRET_SS_ATTRS set and take a wrong
path.
> init_hypervisor->x86_hyper->set_cpu_features-> clear_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
And now (with setup_clear_cpu_cap()) the bit is cleared a little bit
later. And all should be good. But it isn't.
>
> and all is good.
>
> And I remember seeing a patchset doing some xen cpuid cleanup so I'm
> assuming you're doing setup_clear_cpu_cap() now? And now we have to wag
> the dog?
No, now the time window with X86_BUG_SYSRET_SS_ATTRS set is so long we
actually see the problem happening.
Juergen
On Tue, Apr 25, 2017 at 08:34:34PM +0200, Juergen Gross wrote:
> And what happens when there is a scheduling event right here?
> __switch_to() will see X86_BUG_SYSRET_SS_ATTRS set and take a wrong
> path.
So the whole thing we're doing right now is wrong: set bit and then
clear bit.
We should not set the bit at all and there won't be any window to get it
wrong.
So can we do something like this instead:
if (!cpu_has(c, X86_FEATURE_XENPV))
set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
or is XENPV the wrong thing to test?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 25/04/17 20:18, Borislav Petkov wrote:
> On Tue, Apr 25, 2017 at 08:34:34PM +0200, Juergen Gross wrote:
>> And what happens when there is a scheduling event right here?
>> __switch_to() will see X86_BUG_SYSRET_SS_ATTRS set and take a wrong
>> path.
> So the whole thing we're doing right now is wrong: set bit and then
> clear bit.
>
> We should not set the bit at all and there won't be any window to get it
> wrong.
>
> So can we do something like this instead:
>
> if (!cpu_has(c, X86_FEATURE_XENPV))
> set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>
> or is XENPV the wrong thing to test?
>
X86_BUG_SYSRET_SS_ATTRS only actually causes a problem if you enter the
kernel via an IDT vector (forcing %ss to NULL), then exiting the kernel
via the optimistic sysret path, which on AMD loads the %ss selector, but
apparently doesn't update the segment cache (and %ss.dpl in particular).
The problem (for all ring-deprivileged virtuailsation; not just Xen PV),
is that
savesegment(ss, ss_sel);
if (ss_sel != __KERNEL_DS)
loadsegment(ss, __KERNEL_DS);
tries to load %ss with an RPL of 0, and things blow up.
~Andrew
On Tue, Apr 25, 2017 at 09:17:13PM +0100, Andrew Cooper wrote:
> The problem (for all ring-deprivileged virtuailsation; not just Xen PV),
> is that
I know what that that bug flag is for.
I'm asking whether the xen guest boot sets a flag early - like XENPV,
for example - which can differentiate between baremetal and virt boot in
a fairly generic manner so that we don't set that bug flag then instead
of the set and then clear dance we do currently.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 25/04/17 21:18, Borislav Petkov wrote:
> On Tue, Apr 25, 2017 at 08:34:34PM +0200, Juergen Gross wrote:
>> And what happens when there is a scheduling event right here?
>> __switch_to() will see X86_BUG_SYSRET_SS_ATTRS set and take a wrong
>> path.
>
> So the whole thing we're doing right now is wrong: set bit and then
> clear bit.
Right. And this is handled by my patch.
The really clean solution would be to add this test to set_cpu_bug()
et al. Don't set/clear the bit if anyone selected to force a value.
The force variants would be capable to overwrite, the normal variants
wouldn't. This would require a lot of research to avoid pitfalls with
today's handling, though. OTOH one could remove all the calls to
apply_forced_caps().
>
> We should not set the bit at all and there won't be any window to get it
> wrong.
>
> So can we do something like this instead:
>
> if (!cpu_has(c, X86_FEATURE_XENPV))
> set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>
> or is XENPV the wrong thing to test?
This would work. OTOH I'd prefer to test whether the bit should be
forced to remain zero than use the knowledge _who_ is trying to force
it.
Juergen
On Wed, Apr 26, 2017 at 06:45:42AM +0200, Juergen Gross wrote:
> The really clean solution would be to add this test to set_cpu_bug()
No, the really clean solution is to set it once and not play toggle
games.
> This would work. OTOH I'd prefer to test whether the bit should be
> forced to remain zero than use the knowledge _who_ is trying to force
> it.
Because we're in the business of investigating who did?
Nah, we should set it or clear it once and not do funky toggle games.
Especially if in the future something else changes and timing windows
grow and we refactor stuff and yadda yadda...
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 26/04/17 08:35, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 06:45:42AM +0200, Juergen Gross wrote:
>> The really clean solution would be to add this test to set_cpu_bug()
>
> No, the really clean solution is to set it once and not play toggle
> games.
>
>> This would work. OTOH I'd prefer to test whether the bit should be
>> forced to remain zero than use the knowledge _who_ is trying to force
>> it.
>
> Because we're in the business of investigating who did?
>
> Nah, we should set it or clear it once and not do funky toggle games.
> Especially if in the future something else changes and timing windows
> grow and we refactor stuff and yadda yadda...
So what else is my patch doing? It is avoiding to set the bit in case
somebody (i.e. Xen) was forcing it to remain zero.
I'm not feeling strong about it. So if you want to test for
X86_FEATURE_XENPV to avoid setting X86_BUG_SYSRET_SS_ATTRS I'm fine
with it.
Will send V2 with that change.
Juergen
On Wed, Apr 26, 2017 at 08:24:12PM +0200, Juergen Gross wrote:
> I'm not feeling strong about it. So if you want to test for
> X86_FEATURE_XENPV to avoid setting X86_BUG_SYSRET_SS_ATTRS I'm fine
> with it.
>
> Will send V2 with that change.
And remove the corresponding
clear_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
in xen_set_cpu_features().
So that we can set it once, only on !XENPV feature set.
/me looks again at the code...
Gah, except that we do
set_cpu_cap(c, X86_FEATURE_XENPV);
and that runs as part of init_hypervisor() and it runs *after* c_init().
So, back to square one. :-\
So lemme try to explain again what I mean:
I'd like to have a generic way of detecting whether I'm running as a xen
guest at ->c_init() time and depending on the result of that detection,
to set X86_BUG_SYSRET_SS_ATTRS or not set it.
Does that make more sense?
Thanks.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 27/04/17 00:04, Borislav Petkov wrote:
> On Wed, Apr 26, 2017 at 08:24:12PM +0200, Juergen Gross wrote:
>> I'm not feeling strong about it. So if you want to test for
>> X86_FEATURE_XENPV to avoid setting X86_BUG_SYSRET_SS_ATTRS I'm fine
>> with it.
>>
>> Will send V2 with that change.
>
> And remove the corresponding
>
> clear_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>
> in xen_set_cpu_features().
Okay, you are right, we can omit this one now.
> So that we can set it once, only on !XENPV feature set.
>
> /me looks again at the code...
>
> Gah, except that we do
>
> set_cpu_cap(c, X86_FEATURE_XENPV);
>
> and that runs as part of init_hypervisor() and it runs *after* c_init().
No, this is called by xen_start_kernel(), long before c_init().
> So, back to square one. :-\
>
> So lemme try to explain again what I mean:
>
> I'd like to have a generic way of detecting whether I'm running as a xen
> guest at ->c_init() time and depending on the result of that detection,
> to set X86_BUG_SYSRET_SS_ATTRS or not set it.
>
> Does that make more sense?
This does make sense and it is working, as Sander could confirm.
Juergen