2022-03-08 23:22:50

by Borislav Petkov

Subject: Re: [PATCH] x86/mce: Unify vendors grading logic and provide AMD machine error checks

On Tue, Mar 08, 2022 at 12:41:34PM -0600, Carlos Bilbao wrote:
> AMD's severity grading covers very few machine errors. In the graded cases
> there are no user-readable messages, complicating debugging of critical
> hardware errors. Furthermore, with the current implementation AMD MCEs have
> no support for the severities-coverage file. Adding new severities for AMD
> with the current logic would be too convoluted.
>
> Fix the above issues by adding AMD severities to the severity table,
> alongside the Intel MCEs. Unify the severity grading logic of both
> vendors. Label the vendor-specific cases (e.g. cases with different
> registers) where checks cannot be implicit with the available features.
>
> Signed-off-by: Carlos Bilbao <[email protected]>
> ---
> arch/x86/include/asm/mce.h | 7 ++
> arch/x86/kernel/cpu/mce/severity.c | 188 +++++++++++++++--------------
> 2 files changed, 103 insertions(+), 92 deletions(-)

Sorry, maybe you're too new to this and you probably haven't read the
old discussions we have had about the severity grading turd. In order to
save you some time: adding more to that macro insanity is not going to
happen.

The AMD severity grading functions are *actually* readable vs this
abomination which I hate with passion.

If you want to add more logic, you should add to mce_severity_amd(),
perhaps call other helper functions which grade based on a certain
aspect of the error type, split the logic, use comments, etc, but
*definitely* not this.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
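
The suggestion above (grow mce_severity_amd() by calling small helpers that
each grade one aspect of the error, with comments, rather than adding rows
to the macro table) could take roughly the following shape. This is only a
user-space sketch, not kernel code and not the patch that was later sent:
every sk_* name, the severity enum, the message strings and the grading
policy are assumptions made for illustration. Only the MCi_STATUS bit
positions are meant to match the hardware register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS bits, redefined locally so the sketch builds outside the kernel. */
#define SK_STATUS_VAL      (1ULL << 63)  /* register contents are valid */
#define SK_STATUS_OVER     (1ULL << 62)  /* an earlier error was overwritten */
#define SK_STATUS_UC       (1ULL << 61)  /* uncorrected error */
#define SK_STATUS_PCC      (1ULL << 57)  /* processor context corrupt */
#define SK_STATUS_DEFERRED (1ULL << 44)  /* AMD: deferred error */

/* Hypothetical severity levels, ordered from least to most severe. */
enum sk_severity {
	SK_SEV_KEEP,
	SK_SEV_DEFERRED,
	SK_SEV_UNCORRECTED,
	SK_SEV_PANIC,
};

/* A grade plus a human-readable reason, so logs say why it was chosen. */
struct sk_grade {
	enum sk_severity sev;
	const char *msg;
};

/* Each helper grades one aspect and returns true if it made a decision. */
static bool sk_grade_context_corrupt(uint64_t status, struct sk_grade *g)
{
	if (!(status & SK_STATUS_PCC))
		return false;

	g->sev = SK_SEV_PANIC;
	g->msg = "Processor context corrupt";
	return true;
}

static bool sk_grade_uncorrected(uint64_t status, struct sk_grade *g)
{
	if (!(status & SK_STATUS_UC))
		return false;

	if (status & SK_STATUS_OVER) {
		/* Information about an earlier error was lost: assume the worst. */
		g->sev = SK_SEV_PANIC;
		g->msg = "Uncorrected error with overflow";
	} else {
		g->sev = SK_SEV_UNCORRECTED;
		g->msg = "Uncorrected error";
	}
	return true;
}

static bool sk_grade_deferred(uint64_t status, struct sk_grade *g)
{
	if (!(status & SK_STATUS_DEFERRED))
		return false;

	g->sev = SK_SEV_DEFERRED;
	g->msg = "Deferred error";
	return true;
}

/* Top-level grading: try the helpers from most to least severe. */
struct sk_grade sk_grade_amd(uint64_t status)
{
	struct sk_grade g = { SK_SEV_KEEP, "No special grading" };

	if (!(status & SK_STATUS_VAL)) {
		g.msg = "Status register not valid";
		return g;
	}

	if (sk_grade_context_corrupt(status, &g) ||
	    sk_grade_uncorrected(status, &g) ||
	    sk_grade_deferred(status, &g))
		return g;

	return g;
}
```

Covering a new error class in this layout means adding one more helper and
one more line to the chain in sk_grade_amd(), and each helper carries its
own message string, which is one way to get the user-readable output the
original patch description asked for.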


2022-03-09 16:33:19

by Bilbao, Carlos

Subject: Re: [PATCH] x86/mce: Unify vendors grading logic and provide AMD machine error checks

On 3/8/2022 1:32 PM, Borislav Petkov wrote:
> On Tue, Mar 08, 2022 at 12:41:34PM -0600, Carlos Bilbao wrote:
>> AMD's severity grading covers very few machine errors. In the graded cases
>> there are no user-readable messages, complicating debugging of critical
>> hardware errors. Furthermore, with the current implementation AMD MCEs have
>> no support for the severities-coverage file. Adding new severities for AMD
>> with the current logic would be too convoluted.
>>
>> Fix the above issues by adding AMD severities to the severity table,
>> alongside the Intel MCEs. Unify the severity grading logic of both
>> vendors. Label the vendor-specific cases (e.g. cases with different
>> registers) where checks cannot be implicit with the available features.
>>
>> Signed-off-by: Carlos Bilbao <[email protected]>
>> ---
>> arch/x86/include/asm/mce.h | 7 ++
>> arch/x86/kernel/cpu/mce/severity.c | 188 +++++++++++++++--------------
>> 2 files changed, 103 insertions(+), 92 deletions(-)
>
> Sorry, maybe you're too new to this and you probably haven't read the
> old discussions we have had about the severity grading turd. In order to
> save you some time: adding more to that macro insanity is not going to
> happen.
>
> The AMD severity grading functions are *actually* readable vs this
> abomination which I hate with passion.
>
> If you want to add more logic, you should add to mce_severity_amd(),
> perhaps call other helper functions which grade based on a certain
> aspect of the error type, split the logic, use comments, etc, but
> *definitely* not this.
>
> Thx.
>

Understood, sending a new patch in that direction.

Thanks,
Carlos