2018-12-20 20:30:18

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

It was found that AWS x1 instances (Xen-based) lack xen.git commit
1f1d183d4900 (x86/HVM: don't give the wrong impression of WRMSR succeeding)
and because of that the wrmsr_safe() check in cache_alloc_hsw_probe()
doesn't help: the consequent rdmsr() blows up with

unchecked MSR access error: RDMSR from 0xc90 at rIP:
0xffffffff88c5bba3 (native_read_msr+0x3/0x30)

The issue should definitely get fixed on AWS side. We can, however, simply
workaround this in Linux and live happily after.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kernel/cpu/intel_rdt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 44272b7107ad..0acee6cd07a8 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -213,7 +213,8 @@ static inline void cache_alloc_hsw_probe(void)

if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
return;
- rdmsr(IA32_L3_CBM_BASE, l, h);
+ if (rdmsr_safe(IA32_L3_CBM_BASE, &l, &h))
+ return;

/* If all the bits were set in MSR, return success */
if (l != max_cbm)
--
2.19.2



2018-12-20 17:33:09

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

Borislav Petkov <[email protected]> writes:

> On Thu, Dec 20, 2018 at 02:40:46PM +0100, Vitaly Kuznetsov wrote:
>> It was found that AWS x1 instances (Xen-based) lack xen.git commit
>> 1f1d183d4900 (x86/HVM: don't give the wrong impression of WRMSR succeeding)
>> and because of that the wrmsr_safe() check in cache_alloc_hsw_probe()
>> doesn't help: the consequent rdmsr() blows up with
>>
>> unchecked MSR access error: RDMSR from 0xc90 at rIP:
>> 0xffffffff88c5bba3 (native_read_msr+0x3/0x30)
>>
>> The issue should definitely get fixed on AWS side. We can, however, simply
>> workaround this in Linux and live happily after.
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> arch/x86/kernel/cpu/intel_rdt.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
>> index 44272b7107ad..0acee6cd07a8 100644
>> --- a/arch/x86/kernel/cpu/intel_rdt.c
>> +++ b/arch/x86/kernel/cpu/intel_rdt.c
>> @@ -213,7 +213,8 @@ static inline void cache_alloc_hsw_probe(void)
>>
>> if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
>> return;
>> - rdmsr(IA32_L3_CBM_BASE, l, h);
>> + if (rdmsr_safe(IA32_L3_CBM_BASE, &l, &h))
>> + return;
>>
>> /* If all the bits were set in MSR, return success */
>> if (l != max_cbm)
>> --
>
> Does the below hunk work too?
>

Yes, it does, thanks!

--
Vitaly

2018-12-20 21:39:54

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

On Thu, Dec 20, 2018 at 02:40:46PM +0100, Vitaly Kuznetsov wrote:
> It was found that AWS x1 instances (Xen-based) lack xen.git commit
> 1f1d183d4900 (x86/HVM: don't give the wrong impression of WRMSR succeeding)
> and because of that the wrmsr_safe() check in cache_alloc_hsw_probe()
> doesn't help: the consequent rdmsr() blows up with
>
> unchecked MSR access error: RDMSR from 0xc90 at rIP:
> 0xffffffff88c5bba3 (native_read_msr+0x3/0x30)
>
> The issue should definitely get fixed on AWS side. We can, however, simply
> workaround this in Linux and live happily after.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kernel/cpu/intel_rdt.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 44272b7107ad..0acee6cd07a8 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -213,7 +213,8 @@ static inline void cache_alloc_hsw_probe(void)
>
> if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
> return;
> - rdmsr(IA32_L3_CBM_BASE, l, h);
> + if (rdmsr_safe(IA32_L3_CBM_BASE, &l, &h))
> + return;
>
> /* If all the bits were set in MSR, return success */
> if (l != max_cbm)
> --

Does the below hunk work too?

Reinette, Babu: is there any reason for rdt_init_res_defs() and
check_quirks() to run before get_rdt_resources() which looks at CPUID
bits?

If not, we need to do something like below, for guests.

Thx.

---
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c3a9dc63edf2..64cb0fb31862 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -969,17 +969,13 @@ static int __init resctrl_late_init(void)
struct rdt_resource *r;
int state, ret;

- /*
- * Initialize functions(or definitions) that are different
- * between vendors here.
- */
+ if (!get_rdt_resources())
+ return -ENODEV;
+
rdt_init_res_defs();

check_quirks();

- if (!get_rdt_resources())
- return -ENODEV;
-
rdt_init_padding();

state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,



--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-01-09 11:51:27

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

On Tue, Jan 08, 2019 at 09:39:46PM -0800, Reinette Chatre wrote:
> To clarify please the last sentence applies to real Haswell server CPUs
> (not virtualized as prompted this work, my apologies for the confusion)
> that support Intel RDT but does not have CPUID enumeration for this
> feature. With Vitaly's patch this hardware would still be detected as
> supporting CAT but if CPUID enumeration is moved earlier then from what
> I can tell this hardware would be considered as not supporting the
> feature anymore.

Ok, so hw "forgot" to add CPUID again. And CPU guys should know better
but everytime we hear "important reasons" why they dropped the ball
there.

So, assuming RDT is not going to be supported in a guest, we need a
proper fix to disable it when in a guest. So the RDT init path needs
something like this then:

---
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c3a9dc63edf2..1e5a1cb49e9c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -969,6 +969,9 @@ static int __init resctrl_late_init(void)
struct rdt_resource *r;
int state, ret;

+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return -ENODEV;
+
/*
* Initialize functions(or definitions) that are different
* between vendors here.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-01-09 12:25:34

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

Borislav Petkov <[email protected]> writes:

> So, assuming RDT is not going to be supported in a guest

Hm, why is that? In theory, hypervisors can pass through or emulate the
required MSRs...

--
Vitaly

2019-01-09 13:16:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

On Wed, Jan 09, 2019 at 01:09:31PM +0100, Vitaly Kuznetsov wrote:
> Hm, why is that? In theory, hypervisors can pass through or emulate the
> required MSRs...

...and when the theory becomes reality we'll remove the check.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-01-09 21:20:45

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

On Wed, Jan 9, 2019 at 5:00 AM Borislav Petkov <[email protected]> wrote:
>
> On Wed, Jan 09, 2019 at 01:09:31PM +0100, Vitaly Kuznetsov wrote:
> > Hm, why is that? In theory, hypervisors can pass through or emulate the
> > required MSRs...
>
> ...and when the theory becomes reality we'll remove the check.

In practice that may be a long time coming. We don't have many CLOSIDs, or
bits in a cache mask, at the h/w level. If you start trying to
subdivide those resources
to pass a subset to a guest, then you'll quickly find that you have no
flexibility in the
guest to do anything useful. It would only work if you limited to
two, or perhaps three
guests.

-Tony

2019-01-10 10:34:43

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

Tony Luck <[email protected]> writes:

> On Wed, Jan 9, 2019 at 5:00 AM Borislav Petkov <[email protected]> wrote:
>>
>> On Wed, Jan 09, 2019 at 01:09:31PM +0100, Vitaly Kuznetsov wrote:
>> > Hm, why is that? In theory, hypervisors can pass through or emulate the
>> > required MSRs...
>>
>> ...and when the theory becomes reality we'll remove the check.
>
> In practice that may be a long time coming. We don't have many CLOSIDs, or
> bits in a cache mask, at the h/w level. If you start trying to
> subdivide those resources to pass a subset to a guest, then you'll
> quickly find that you have no flexibility in the guest to do anything
> useful. It would only work if you limited to two, or perhaps three
> guests.

Running a single guest on a physical CPU is a very common scenario. In
fact, sharing cores is very rare for public clouds: e.g. all worthy
instance types on AWS/Azure give you dedicated cores and I don't see why
hypervisor can't pass through resctl features.

The other thing is: how can we be sure that there's no hypervisor
exposing these feature already? Even if open-source hypervisors like
KVM/Xen don't do it it doesn't prove anything: there are numerous
proprietary hypervisors and who knows what they do.

The original issue which triggered the discussion was discovered on AWS
Xen where the host is buggy and I suggested a simple short-term
workaround, I'm no expert in rdt/qos so I'm leaving this up to the
maintainers to decide which fix deserves to go in (if any).

--
Vitaly

2019-01-10 11:56:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue

On Thu, Jan 10, 2019 at 11:32:47AM +0100, Vitaly Kuznetsov wrote:
> The other thing is: how can we be sure that there's no hypervisor
> exposing these feature already? Even if open-source hypervisors like
> KVM/Xen don't do it it doesn't prove anything: there are numerous
> proprietary hypervisors and who knows what they do.

Well, we have this thing called CPUID which enumerates features -
regardless of running on baremetal or on a hypervisor. But apparently
some Intel CPUs dropped the ball there. Which caused the f*ckup we are
in right now.

If you really want to not foreclose that feature for guests - and it
sounds like you do - the other thing I can think of is an early quirk
*setting* those feature bits for those Intel CPUs which "forgot" to set
them and then changing the resctrl code to test CPUID bits first.

This way, the kernel is presented with a unified view on what is
supported by the underlying machine - and that machine can be a HV or
baremetal - kernel doesn't care.

> The original issue which triggered the discussion was discovered on
> AWS Xen where the host is buggy and I suggested a simple short-term
> workaround

And I'm sick'n'tired of simple-short term workarounds and band-aids and
the kernel always bending because people dropped the ball and those
being perpetuated, because, sure, we'll add one more "fix" on the pile,
who cares... :-\

If AWS xen is broken, then that should be fixed - not the kernel.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.