2023-04-17 16:23:02

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

Scalable MCA systems may report errors found during boot-time polling
through the ACPI Boot Error Record Table (BERT). The errors are logged
in an "x86 Processor" Common Platform Error Record (CPER). The format of
the x86 CPER does not include a logical CPU number, but it does provide
the logical APIC ID for the logical CPU. Also, it does not explicitly
provide MCA error information, but it can share this information using
an "MSR Context" defined in the CPER format.

The MCA error information is parsed by
1) Checking that the context matches the Scalable MCA register space.
2) Finding the logical CPU that matches the logical APIC ID from the
CPER.
3) Filling in struct mce with the relevant data and logging it.

All the above is done when the BERT is processed during late init. This
can be scheduled on any CPU, and it may be preemptible.

This results in two issues.
1) mce_setup() includes a call to smp_processor_id(). This will throw a
warning if preemption is enabled.
2) mce_setup() will pull info from the executing CPU, so some info in
struct mce may be incorrect for the CPU with the error. For example,
in a dual-socket system, an error logged in socket 1 CPU but
processed by a socket 0 CPU will save the PPIN of the socket 0 CPU.

Fix both issues by scheduling mce_setup() to run on the logical CPU
indicated in the error record. Preemption is disabled when calling
smp_call_function_*() resolving issue #1. And the error info is gathered
from the proper logical CPU resolving issue #2.

Furthermore, smp_call_function_*() handles calls with invalid CPU
numbers, etc. So extra checking by the caller is not necessary.

Fixes: 4a24d80b8c3e ("x86/mce, cper: Pass x86 CPER through the MCA handling chain")
Signed-off-by: Yazen Ghannam <[email protected]>
Cc: [email protected]
---
arch/x86/kernel/cpu/mce/apei.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 8ed341714686..5c0381a4a66f 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -63,6 +63,11 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
}
EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);

+static void __mce_setup(void *info)
+{
+ mce_setup((struct mce *)info);
+}
+
int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
{
const u64 *i_mce = ((const u64 *) (ctx_info + 1));
@@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
if (ctx_info->reg_arr_size < 48)
return -EINVAL;

- mce_setup(&m);
-
- m.extcpu = -1;
- m.socketid = -1;
-
- for_each_possible_cpu(cpu) {
- if (cpu_data(cpu).initial_apicid == lapic_id) {
- m.extcpu = cpu;
- m.socketid = cpu_data(m.extcpu).phys_proc_id;
+ for_each_possible_cpu(cpu)
+ if (cpu_data(cpu).initial_apicid == lapic_id)
break;
- }
- }

- m.apicid = lapic_id;
+ if (smp_call_function_single(cpu, __mce_setup, &m, 1))
+ return -EINVAL;
+
m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
m.status = *i_mce;
m.addr = *(i_mce + 1);
--
2.34.1


2023-04-17 17:32:13

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On 4/17/23 13:17, Luck, Tony wrote:
>> int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>
> I'm a bit puzzled why this function has "smca" in the name. I think it is called
> unconditionally on Intel as well as AMD systems ... and looks like it will do useful
> things if Intel someday has a hybrid server that reports APEI errors (maybe it is
> already useful on hybrid clients?).
>
> Otherwise this look looks fine.
>
> Acked-by: Tony Luck <[email protected]>
>

Thanks Tony.

The function expects the data to match the "MCAX" register layout used on
Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets
for legacy MCA and MCAX. But the rest will be different.

This could be extended though, if other systems will use it.

Thanks,
Yazen

2023-04-17 17:34:32

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

> int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)

I'm a bit puzzled why this function has "smca" in the name. I think it is called
unconditionally on Intel as well as AMD systems ... and looks like it will do useful
things if Intel someday has a hybrid server that reports APEI errors (maybe it is
already useful on hybrid clients?).

Otherwise this look looks fine.

Acked-by: Tony Luck <[email protected]>

-Tony

2023-04-17 17:41:43

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

> The function expects the data to match the "MCAX" register layout used on
> Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets
> for legacy MCA and MCAX. But the rest will be different.

Ah yes. Looking at the code, rather than the patch I see that it first checks for SMCA
and returns if it isn't on an SMCA system.

if (!boot_cpu_has(X86_FEATURE_SMCA))
return -EINVAL;

> This could be extended though, if other systems will use it.

I'll keep it in mind if Intel makes a hybrid server.

-Tony

2023-06-09 14:36:55

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On 4/17/23 1:39 PM, Luck, Tony wrote:
>> The function expects the data to match the "MCAX" register layout used on
>> Scalable MCA systems. CTL, STATUS, ADDR, and MISC will be at the same offsets
>> for legacy MCA and MCAX. But the rest will be different.
>
> Ah yes. Looking at the code, rather than the patch I see that it first checks for SMCA
> and returns if it isn't on an SMCA system.
>
> if (!boot_cpu_has(X86_FEATURE_SMCA))
> return -EINVAL;
>
>> This could be extended though, if other systems will use it.
>
> I'll keep it in mind if Intel makes a hybrid server.
>

Hi all,

Any further comments on this?

Thanks,
Yazen

2023-06-15 15:29:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On Mon, Apr 17, 2023 at 04:20:06PM +0000, Yazen Ghannam wrote:
> @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
> if (ctx_info->reg_arr_size < 48)
> return -EINVAL;
>
> - mce_setup(&m);
> -
> - m.extcpu = -1;
> - m.socketid = -1;
> -
> - for_each_possible_cpu(cpu) {
> - if (cpu_data(cpu).initial_apicid == lapic_id) {
> - m.extcpu = cpu;
> - m.socketid = cpu_data(m.extcpu).phys_proc_id;
> + for_each_possible_cpu(cpu)
> + if (cpu_data(cpu).initial_apicid == lapic_id)
> break;
> - }
> - }
>
> - m.apicid = lapic_id;
> + if (smp_call_function_single(cpu, __mce_setup, &m, 1))

I can see the following call-chain from NMI context which is a no-no:

ghes_notify_nmi
|-> ghes_in_nmi_spool_from_list
|-> ghes_in_nmi_queue_one_entry
|-> __ghes_panic
|-> __ghes_print_estatus
|-> cper_estatus_print
|-> cper_estatus_print_section
|-> cper_print_proc_ia
|-> arch_apei_report_x86_error
|-> apei_smca_report_x86_error
|-> smp_call_function_single


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-15 15:59:40

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On 6/15/2023 11:20 AM, Borislav Petkov wrote:
> On Mon, Apr 17, 2023 at 04:20:06PM +0000, Yazen Ghannam wrote:
>> @@ -97,20 +102,13 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>> if (ctx_info->reg_arr_size < 48)
>> return -EINVAL;
>>
>> - mce_setup(&m);
>> -
>> - m.extcpu = -1;
>> - m.socketid = -1;
>> -
>> - for_each_possible_cpu(cpu) {
>> - if (cpu_data(cpu).initial_apicid == lapic_id) {
>> - m.extcpu = cpu;
>> - m.socketid = cpu_data(m.extcpu).phys_proc_id;
>> + for_each_possible_cpu(cpu)
>> + if (cpu_data(cpu).initial_apicid == lapic_id)
>> break;
>> - }
>> - }
>>
>> - m.apicid = lapic_id;
>> + if (smp_call_function_single(cpu, __mce_setup, &m, 1))
>
> I can see the following call-chain from NMI context which is a no-no:
>
> ghes_notify_nmi
> |-> ghes_in_nmi_spool_from_list
> |-> ghes_in_nmi_queue_one_entry
> |-> __ghes_panic
> |-> __ghes_print_estatus
> |-> cper_estatus_print
> |-> cper_estatus_print_section
> |-> cper_print_proc_ia
> |-> arch_apei_report_x86_error
> |-> apei_smca_report_x86_error
> |-> smp_call_function_single
>
>

Right, but in practice SMCA errors are not reported through GHES at
runtime. They will only come in through BERT at boot time. There aren't
any plans to change this, so the NMI issue won't be encountered.

I can include this info in the commit message and/or code comments. Is
this okay?

We can solve the NMI issue if it ever comes up in the future. Unless
there's an obvious change to avoid this now. Any suggestions?

Thanks,
Yazen


2023-06-15 16:39:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote:
> We can solve the NMI issue if it ever comes up in the future. Unless there's
> an obvious change to avoid this now. Any suggestions?

Yes, solve it right from the get-go. "It cannot happen now" is not good
enough. It should not be even technically possible.

Just report what's logged into BERT - nothing more. Whoever needs the
remaining info, can dump it from the machine.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-06-15 17:26:39

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On 6/15/2023 12:20 PM, Borislav Petkov wrote:
> On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote:
>> We can solve the NMI issue if it ever comes up in the future. Unless there's
>> an obvious change to avoid this now. Any suggestions?
>
> Yes, solve it right from the get-go. "It cannot happen now" is not good
> enough. It should not be even technically possible.
>

Okay, understood.

> Just report what's logged into BERT - nothing more. Whoever needs the
> remaining info, can dump it from the machine.
>

Will do.

Thanks,
Yazen


2023-06-15 17:56:34

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On 6/15/2023 1:02 PM, Yazen Ghannam wrote:
> On 6/15/2023 12:20 PM, Borislav Petkov wrote:
>> On Thu, Jun 15, 2023 at 11:34:21AM -0400, Yazen Ghannam wrote:
>>> We can solve the NMI issue if it ever comes up in the future. Unless
>>> there's
>>> an obvious change to avoid this now. Any suggestions?
>>
>> Yes, solve it right from the get-go. "It cannot happen now" is not good
>> enough. It should not be even technically possible.
>>
>
> Okay, understood.
>
>> Just report what's logged into BERT - nothing more. Whoever needs the
>> remaining info, can dump it from the machine.
>>
>
> Will do.
>

How about these changes? I can split this into separate preemption and
PPIN patches.

The PPIN isn't coming directly from BERT. But it seems we can use the
per_cpu value which is set up during CPU init.

Thanks,
Yazen

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 8ed341714686..db16dc3c7b03 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -97,15 +97,19 @@ int apei_smca_report_x86_error(struct
cper_ia_proc_ctx *ctx_info, u64 lapic_id)
if (ctx_info->reg_arr_size < 48)
return -EINVAL;

+ get_cpu();
mce_setup(&m);
+ put_cpu();

m.extcpu = -1;
m.socketid = -1;
+ m.ppin = 0;

for_each_possible_cpu(cpu) {
if (cpu_data(cpu).initial_apicid == lapic_id) {
m.extcpu = cpu;
m.socketid = cpu_data(m.extcpu).phys_proc_id;
+ m.ppin = cpu_data(m.extcpu).ppin;
break;
}
}


2023-06-15 18:05:32

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

> for_each_possible_cpu(cpu) {
> if (cpu_data(cpu).initial_apicid == lapic_id) {
> m.extcpu = cpu;

Are there any other places in the kernel where we need to find the
Linux CPU number starting from the apic id?

I briefly looked a while back when I needed this for some
debug use case and didn't find it then. But Linux changes
and maybe there are some now.

Which is a long way to say I'd like to see boot code initialize an array so the above
could just be:

m.extcpu = lapic_to_cpu[lapic_id];

But we'd need at least a couple of use cases to justify.

-Tony

2023-06-16 14:24:38

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

On 6/15/2023 1:54 PM, Luck, Tony wrote:
>> for_each_possible_cpu(cpu) {
>> if (cpu_data(cpu).initial_apicid == lapic_id) {
>> m.extcpu = cpu;
>
> Are there any other places in the kernel where we need to find the
> Linux CPU number starting from the apic id?
>
> I briefly looked a while back when I needed this for some
> debug use case and didn't find it then. But Linux changes
> and maybe there are some now.
>
> Which is a long way to say I'd like to see boot code initialize an array so the above
> could just be:
>
> m.extcpu = lapic_to_cpu[lapic_id];
>

So functionally the inverse of "x86_cpu_to_apicid", right? Though not
per_cpu, of course.

I'll try and write this up if I don't find anything like it.

> But we'd need at least a couple of use cases to justify.
>

Can you describe your original use case?

Thanks,
Yazen


2023-06-16 16:16:38

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH] x86/mce: Schedule mce_setup() on correct CPU for CPER decoding

> Can you describe your original use case?

<vague_mode>
I was playing with a potential future feature where I was given the x2apicid and
had to find the Linux CPU number.
</vaguemode>

-Tony