2024-04-04 15:15:10

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 06/16] x86/mce/amd: Prep DFR handler before enabling banks

Scalable MCA systems use the per-bank MCA_CONFIG register to enable
deferred error interrupts. This is done as part of SMCA configuration.

Currently, the deferred error interrupt handler is set up after SMCA
configuration.

Move the deferred error interrupt handler set up before SMCA
configuration. This ensures the kernel is ready to receive the
interrupts before the hardware is configured to send them.

Signed-off-by: Yazen Ghannam <[email protected]>
---

Notes:
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* No change.

arch/x86/kernel/cpu/mce/amd.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 3093fed06194..e8e78d91082b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -589,6 +589,9 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
u32 low = 0, high = 0;
int def_offset = -1, def_new;

+ if (!mce_flags.succor)
+ return;
+
if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
return;

@@ -768,6 +771,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
u32 low = 0, high = 0, address = 0;
int offset = -1;

+ deferred_error_interrupt_enable(c);

for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
if (mce_flags.smca)
@@ -794,9 +798,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
offset = prepare_threshold_block(bank, block, address, offset, high);
}
}
-
- if (mce_flags.succor)
- deferred_error_interrupt_enable(c);
}

/*
--
2.34.1



2024-04-24 18:35:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 06/16] x86/mce/amd: Prep DFR handler before enabling banks

On Thu, Apr 04, 2024 at 10:13:49AM -0500, Yazen Ghannam wrote:
> Scalable MCA systems use the per-bank MCA_CONFIG register to enable
> deferred error interrupts. This is done as part of SMCA configuration.
>
> Currently, the deferred error interrupt handler is set up after SMCA
> configuration.
>
> Move the deferred error interrupt handler set up before SMCA
> configuration. This ensures the kernel is ready to receive the
> interrupts before the hardware is configured to send them.
>
> Signed-off-by: Yazen Ghannam <[email protected]>
> ---
>
> Notes:
> Link:
> https://lkml.kernel.org/r/[email protected]
>
> v1->v2:
> * No change.
>
> arch/x86/kernel/cpu/mce/amd.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 3093fed06194..e8e78d91082b 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -589,6 +589,9 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
> u32 low = 0, high = 0;
> int def_offset = -1, def_new;
>
> + if (!mce_flags.succor)

Does succor imply smca?

Because you have now this order:

deferred_error_interrupt_enable(c);

...

configure_smca(bank);

and that one tests mce_flags.smca

Now, if succor didn't imply smca, we'll enable the DF irq handler for
no good reason on (succor=true && smca=false) systems.

If the implication is given:

Reviewed-by: Borislav Petkov (AMD) <[email protected]>

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-25 13:33:25

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 06/16] x86/mce/amd: Prep DFR handler before enabling banks



On 4/24/2024 2:34 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2024 at 10:13:49AM -0500, Yazen Ghannam wrote:
>> Scalable MCA systems use the per-bank MCA_CONFIG register to enable
>> deferred error interrupts. This is done as part of SMCA configuration.
>>
>> Currently, the deferred error interrupt handler is set up after SMCA
>> configuration.
>>
>> Move the deferred error interrupt handler set up before SMCA
>> configuration. This ensures the kernel is ready to receive the
>> interrupts before the hardware is configured to send them.
>>
>> Signed-off-by: Yazen Ghannam <[email protected]>
>> ---
>>
>> Notes:
>> Link:
>> https://lkml.kernel.org/r/[email protected]
>>
>> v1->v2:
>> * No change.
>>
>> arch/x86/kernel/cpu/mce/amd.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 3093fed06194..e8e78d91082b 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -589,6 +589,9 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
>> u32 low = 0, high = 0;
>> int def_offset = -1, def_new;
>>
>> + if (!mce_flags.succor)
>
> Does succor imply smca?
>
> Because you have now this order:
>
> deferred_error_interrupt_enable(c);
>
> ...
>
> configure_smca(bank);
>
> and that one tests mce_flags.smca
>
> Now, if succor didn't imply smca, we'll enable the DF irq handler for
> no good reason on (succor=true && smca=false) systems.
>
> If the implication is given:
>
> Reviewed-by: Borislav Petkov (AMD) <[email protected]>
>

They are independent features. SUCCOR is the feature that defines the deferred
error interrupt and data poisoning. This predates SMCA. SUCCOR was introduced
in the later Family 15h systems.

Thanks,
Yazen

2024-04-29 13:11:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 06/16] x86/mce/amd: Prep DFR handler before enabling banks

On Thu, Apr 25, 2024 at 09:31:58AM -0400, Yazen Ghannam wrote:
> They are independent features. SUCCOR is the feature that defines the deferred
> error interrupt and data poisoning. This predates SMCA. SUCCOR was introduced
> in the later Family 15h systems.

.. and as we've established, it is not really enabled on them for
whatever reason so for simplicity's sake, we'll simply assume that it
was enabled together with SMCA and that would simplify a lot of things
in the code.

Please put that in the commit message so that we know.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-29 13:58:50

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 06/16] x86/mce/amd: Prep DFR handler before enabling banks

On 4/29/2024 8:38 AM, Borislav Petkov wrote:
> On Thu, Apr 25, 2024 at 09:31:58AM -0400, Yazen Ghannam wrote:
>> They are independent features. SUCCOR is the feature that defines the deferred
>> error interrupt and data poisoning. This predates SMCA. SUCCOR was introduced
>> in the later Family 15h systems.
>
> ... and as we've established, it is not really enabled on them for
> whatever reason so for simplicity's sake, we'll simply assume that it
> was enabled together with SMCA and that would simplify a lot of things
> in the code.
>
> Please put that in the commit message so that we know.
>
> Thx.
>

Okay, will do.

Thanks,
Yazen