2021-04-21 02:36:40

by Like Xu

Subject: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0

Architectural LBR does not have MSR_LBR_TOS (0x000001c9).
With ARCH_LBR, lbr_tos is never set and stays 0, so check_msr()
ends up probing MSR 0x0. In a guest that probe fails, which sets
x86_pmu.lbr_nr = 0 and prevents the guest LBR from being initialized.

Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5272f349dca2..5036496caa60 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
u64 val_old, val_new, val_tmp;

/*
- * Disable the check for real HW, so we don't
+ * Disable the check for real HW or non-sense msr, so we don't
* mess with potentionaly enabled registers:
*/
- if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) || !msr)
return true;

/*
--
2.30.2
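A minimal userspace sketch of the early exit this patch adds to check_msr(). It is not the kernel code: is_hypervisor stands in for boot_cpu_has(X86_FEATURE_HYPERVISOR), and check_msr_sketch(), failing_probe() and lbr_nr_after_check() are hypothetical names. The caller's shape (a failed check on lbr_tos zeroes lbr_nr) follows the commit message; the mask value is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the write/read-back probe that the real
 * check_msr() performs on an MSR when running as a guest. */
typedef bool (*msr_probe_fn)(unsigned long msr, uint64_t mask);

/* Models only the patched early exit. is_hypervisor replaces
 * boot_cpu_has(X86_FEATURE_HYPERVISOR). */
static bool check_msr_sketch(bool is_hypervisor, unsigned long msr,
			     uint64_t mask, msr_probe_fn probe)
{
	/* Bare metal, or an MSR index that was never set (0): don't probe. */
	if (!is_hypervisor || !msr)
		return true;
	return probe(msr, mask);
}

/* Stands in for any probe that fails, e.g. the probe of MSR 0x0
 * described in the commit message. */
static bool failing_probe(unsigned long msr, uint64_t mask)
{
	(void)msr;
	(void)mask;
	return false;
}

/* Assumed caller shape: a failed check on lbr_tos zeroes lbr_nr.
 * The mask 0x3 is illustrative. */
static int lbr_nr_after_check(bool is_hypervisor, unsigned long lbr_tos,
			      int lbr_nr)
{
	if (!check_msr_sketch(is_hypervisor, lbr_tos, 0x3, failing_probe))
		return 0;
	return lbr_nr;
}
```

With lbr_tos left at 0, a guest keeps its lbr_nr instead of having it zeroed by a probe of MSR 0x0.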


2021-04-21 21:12:57

by Sean Christopherson

Subject: Re: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0

On Wed, Apr 21, 2021, Like Xu wrote:
> The Architecture LBR does not have MSR_LBR_TOS (0x000001c9).
> When ARCH_LBR we don't set lbr_tos, the failure from the
> check_msr() against MSR 0x000 will make x86_pmu.lbr_nr = 0,
> thereby preventing the initialization of the guest LBR.
>
> Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
> Signed-off-by: Like Xu <[email protected]>
> Reviewed-by: Kan Liang <[email protected]>
> ---
> arch/x86/events/intel/core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 5272f349dca2..5036496caa60 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
> u64 val_old, val_new, val_tmp;
>
> /*
> - * Disable the check for real HW, so we don't
> + * Disable the check for real HW or non-sense msr, so we don't

I think this should be "undefined MSR" or something along those lines. MSR 0x0
is a "real" MSR: on Intel CPUs it's an alias for IA32_MC0_ADDR. At least it's
supposed to be; most/all Intel CPUs incorrectly alias it to IA32_MC0_CTL.

Anyways, my point is that if your definition of "nonsense" is any MSR that is
not a valid perf MSR, then this check is woefully incomplete. If your
definition is a nonsensical value, then this comment is simply wrong.

What you're really looking for is precisely the case where the MSR was zero
initialized and never defined.
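To illustrate the "zero initialized and never defined" case, here is a hedged sketch. struct lbr_desc_sketch and msr_is_defined() are hypothetical, trimmed-down analogues of the x86_pmu LBR fields; the FROM MSR values (0x680 legacy, 0x1500 Arch LBR) and the depth of 32 are illustrative. The point is that a designated initializer that omits lbr_tos leaves it at 0, so "!msr" really tests "was this index ever assigned".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down analogue of the x86_pmu LBR fields. */
struct lbr_desc_sketch {
	unsigned long lbr_tos;	/* 0 means "never assigned" */
	unsigned long lbr_from;
	int lbr_nr;
};

/* Legacy LBR: the TOS MSR (0x1c9) is filled in. */
static const struct lbr_desc_sketch legacy_lbr = {
	.lbr_tos = 0x1c9, .lbr_from = 0x680, .lbr_nr = 32,
};

/* Arch LBR: lbr_tos is simply omitted, so it is zero-filled. */
static const struct lbr_desc_sketch arch_lbr = {
	.lbr_from = 0x1500, .lbr_nr = 32,
};

/* What the "!msr" test is really asking: was this MSR index ever set? */
static bool msr_is_defined(unsigned long msr)
{
	return msr != 0;
}
```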

> * mess with potentionaly enabled registers:
> */
> - if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
> + if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) || !msr)
> return true;
>
> /*
> --
> 2.30.2
>

2021-04-22 01:33:27

by Like Xu

Subject: Re: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0

On 2021/4/21 23:30, Sean Christopherson wrote:
> On Wed, Apr 21, 2021, Like Xu wrote:
>> The Architecture LBR does not have MSR_LBR_TOS (0x000001c9).
>> When ARCH_LBR we don't set lbr_tos, the failure from the
>> check_msr() against MSR 0x000 will make x86_pmu.lbr_nr = 0,
>> thereby preventing the initialization of the guest LBR.
>>
>> Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
>> Signed-off-by: Like Xu <[email protected]>
>> Reviewed-by: Kan Liang <[email protected]>
>> ---
>> arch/x86/events/intel/core.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 5272f349dca2..5036496caa60 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
>> u64 val_old, val_new, val_tmp;
>>
>> /*
>> - * Disable the check for real HW, so we don't
>> + * Disable the check for real HW or non-sense msr, so we don't
>
> I think this should be "undefined MSR" or something along those lines. MSR 0x0
> is a "real" MSR, on Intel CPUs it's an alias for IA32_MC0_ADDR; at least it's
> supposed to be, most/all Intel CPUs incorrectly alias it to IA32_MC0_CTL.

Thank you, Sean.

<idle>-0 [000] dN.. 38980.032347: read_msr: 0, value fff

Is there a history or a specification behind this kind of alias?

#define MSR_IA32_MC0_ADDR 0x00000402
#define MSR_IA32_MC0_CTL 0x00000400

>
> Anyways, my point is that if your definition of "nonsense" is any MSR that is
> not a valid perf MSR, then this check is woefully incomplete. If your
> definition is a nonsensical value, then this comment is simply wrong.
>
> What you're really looking for is precisely the case where the MSR was zero
> initialized and never defined.
>
>> * mess with potentionaly enabled registers:
>> */
>> - if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
>> + if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) || !msr)
>> return true;
>>
>> /*
>> --
>> 2.30.2
>>

2021-04-22 01:49:13

by Sean Christopherson

Subject: Re: [PATCH RESEND 1/2] perf/x86: Skip checking MSR for MSR 0x0

On Thu, Apr 22, 2021, Like Xu wrote:
> On 2021/4/21 23:30, Sean Christopherson wrote:
> > On Wed, Apr 21, 2021, Like Xu wrote:
> > > The Architecture LBR does not have MSR_LBR_TOS (0x000001c9).
> > > When ARCH_LBR we don't set lbr_tos, the failure from the
> > > check_msr() against MSR 0x000 will make x86_pmu.lbr_nr = 0,
> > > thereby preventing the initialization of the guest LBR.
> > >
> > > Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
> > > Signed-off-by: Like Xu <[email protected]>
> > > Reviewed-by: Kan Liang <[email protected]>
> > > ---
> > > arch/x86/events/intel/core.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index 5272f349dca2..5036496caa60 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -4751,10 +4751,10 @@ static bool check_msr(unsigned long msr, u64 mask)
> > > u64 val_old, val_new, val_tmp;
> > > /*
> > > - * Disable the check for real HW, so we don't
> > > + * Disable the check for real HW or non-sense msr, so we don't
> >
> > I think this should be "undefined MSR" or something along those lines. MSR 0x0
> > is a "real" MSR, on Intel CPUs it's an alias for IA32_MC0_ADDR; at least it's
> > supposed to be, most/all Intel CPUs incorrectly alias it to IA32_MC0_CTL.
>
> Thank you, Sean.
>
> <idle>-0 [000] dN.. 38980.032347: read_msr: 0, value fff
>
> Is there a history or a specification behind this kind of alias?

It's kinda documented in the SDM under "2.1 ARCHITECTURAL MSRS":

0H 0 IA32_P5_MC_ADDR (P5_MC_ADDR) Pentium Processor (05_01H)
1H 1 IA32_P5_MC_TYPE (P5_MC_TYPE) DF_DM = 05_01H

The history is that very early machine check support only had a single "bank",
with MSR 0x0 holding the address and MSR 0x1 holding the type. When the MSRs were
relocated to the 0x400 range, presumably to have room to grow the list, the MSRs
were aliased to maintain backwards compatibility (again, an assumption).

Unfortunately, that backwards compatibility apparently didn't get tested, and MSR
0x0 ended up aliased to 0x400 instead of 0x402.
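A small simulation of that aliasing, as a hedged sketch: struct msr_file_sketch, msr_slot(), sim_rdmsr() and sim_wrmsr() are hypothetical and model only the two bank-0 MSRs. The alias_bug flag switches between the intended mapping (0x0 to IA32_MC0_ADDR, 0x402) and the observed one (0x0 to IA32_MC0_CTL, 0x400). Only MSR 0x0 is modeled; how MSR 0x1 maps is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_MC0_CTL  0x400u
#define MSR_IA32_MC0_ADDR 0x402u

/* Tiny simulated MSR file holding only the two bank-0 MSRs used here. */
struct msr_file_sketch {
	uint64_t mc0_ctl;
	uint64_t mc0_addr;
	int alias_bug;	/* 1: 0x0 -> MC0_CTL (observed); 0: 0x0 -> MC0_ADDR (intended) */
};

/* Resolve an MSR index to its backing storage, applying the 0x0 alias. */
static uint64_t *msr_slot(struct msr_file_sketch *f, unsigned int msr)
{
	if (msr == 0x0)
		msr = f->alias_bug ? MSR_IA32_MC0_CTL : MSR_IA32_MC0_ADDR;
	switch (msr) {
	case MSR_IA32_MC0_CTL:
		return &f->mc0_ctl;
	case MSR_IA32_MC0_ADDR:
		return &f->mc0_addr;
	default:
		return NULL;	/* not modeled here (real HW may #GP) */
	}
}

static uint64_t sim_rdmsr(struct msr_file_sketch *f, unsigned int msr)
{
	uint64_t *slot = msr_slot(f, msr);

	return slot ? *slot : 0;
}

static void sim_wrmsr(struct msr_file_sketch *f, unsigned int msr,
		      uint64_t val)
{
	uint64_t *slot = msr_slot(f, msr);

	if (slot)
		*slot = val;
}
```

With alias_bug set, writing 0 to MSR 0x0 clears MC0_CTL. With the intended mapping, the same write lands in MC0_ADDR and leaves the bank enabled.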

The only reason I'm aware of all this is because SGX is soft-disabled by ucode
if any of the machine check banks are disabled by writing MCn_CTL. Some folks
found out the hard way that doing WRMSR with an uninitialized index, i.e.
WRMSR(0), would disable SGX.
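A hedged sketch of that failure mode. The names, the bank count of 4, and the rule "a bank counts as disabled once its MCi_CTL reads 0" are all assumptions made for illustration; the real ucode check is not documented here. The sketch uses the standard MCi_CTL layout (0x400 + 4 * i) and the buggy 0x0 to MC0_CTL alias. A config struct whose MSR index was never filled in issues WRMSR(0, 0), which clears MC0_CTL and trips the modeled SGX check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS_SKETCH 4	/* hypothetical bank count */

/* Simulated MCi_CTL values; all banks start fully enabled. */
static uint64_t mc_ctl[NR_BANKS_SKETCH] = { ~0ULL, ~0ULL, ~0ULL, ~0ULL };

/* Assumed model of the ucode rule: SGX stays usable only while no
 * bank's MCi_CTL has been cleared to 0. */
static bool sgx_usable(void)
{
	for (int i = 0; i < NR_BANKS_SKETCH; i++)
		if (mc_ctl[i] == 0)
			return false;
	return true;
}

/* Simulated WRMSR: models only MCi_CTL (0x400 + 4 * i) plus the buggy
 * MSR 0x0 -> MC0_CTL alias; every other index is ignored. */
static void sim_wrmsr_mc(unsigned int msr, uint64_t val)
{
	if (msr == 0x0)
		msr = 0x400;
	if (msr >= 0x400 && msr < 0x400 + 4 * NR_BANKS_SKETCH &&
	    (msr - 0x400) % 4 == 0)
		mc_ctl[(msr - 0x400) / 4] = val;
}

/* Hypothetical config whose MSR index may never have been set. */
struct wrmsr_cfg_sketch {
	unsigned int msr_index;
	uint64_t value;
};
```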

If you want a good giggle, you can verify on pretty much any Intel silicon:

$ rdmsr 0x400
ff
$ wrmsr 0x0 0
$ rdmsr 0x400
0

> #define MSR_IA32_MC0_ADDR 0x00000402
> #define MSR_IA32_MC0_CTL 0x00000400