2015-05-13 22:39:29

by Aravind Gopalakrishnan

Subject: [PATCH] x86, mce, amd: Read mcgstatus before we log the error

Code to read mcgstatus was introduced with patch 44612a3ac.
However, that seems to have been accidentally removed in a3a529d10.
Adding that back here.

Signed-off-by: Aravind Gopalakrishnan <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce_amd.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index e99b150..7edfa4c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -319,6 +319,7 @@ static void __log_error(unsigned int bank, bool threshold_err, u64 misc)
 		return;
 
 	mce_setup(&m);
+	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
 
 	m.status = status;
 	m.bank = bank;
--
2.4.0


2015-05-14 09:50:14

by Borislav Petkov

Subject: Re: [PATCH] x86, mce, amd: Read mcgstatus before we log the error

On Wed, May 13, 2015 at 12:37:04PM -0500, Aravind Gopalakrishnan wrote:
> Code to read mcgstatus was introduced with patch 44612a3ac.
> However, that seems to have been accidentally removed in a3a529d10.
> Adding that back here.
>
> Signed-off-by: Aravind Gopalakrishnan <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce_amd.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index e99b150..7edfa4c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -319,6 +319,7 @@ static void __log_error(unsigned int bank, bool threshold_err, u64 misc)
>  		return;
>
>  	mce_setup(&m);
> +	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);

Any meaningful bits in that MSR we wanna know when getting a
thresholding or deferred error? Are they even defined?

If yes, RIPV should always be 1b, EIPV too, MCIP can't be set.

-ENOMOREUSEFULBITS.
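
For reference, the three bits in question can be picked apart as in the rough sketch below. It is a standalone userspace illustration, not kernel code: the masks follow the architectural IA32_MCG_STATUS layout (RIPV bit 0, EIPV bit 1, MCIP bit 2) that the kernel's MCG_STATUS_* definitions also use, while struct mcg_flags and decode_mcg_status() are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_MCG_STATUS. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/* Hypothetical helper type for illustration; not a kernel structure. */
struct mcg_flags {
	bool ripv;
	bool eipv;
	bool mcip;
};

/* Split a raw MCG_STATUS value into its three defined flag bits. */
static inline struct mcg_flags decode_mcg_status(uint64_t mcgstatus)
{
	struct mcg_flags f = {
		.ripv = (mcgstatus & MCG_STATUS_RIPV) != 0,
		.eipv = (mcgstatus & MCG_STATUS_EIPV) != 0,
		.mcip = (mcgstatus & MCG_STATUS_MCIP) != 0,
	};

	return f;
}
```

With the values described above (RIPV and EIPV set, MCIP clear), the raw register value would be 0x3, so decoding it yields no information beyond what is already known on this path.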

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-14 14:56:22

by Aravind Gopalakrishnan

Subject: Re: [PATCH] x86, mce, amd: Read mcgstatus before we log the error

On 5/14/2015 4:50 AM, Borislav Petkov wrote:
> On Wed, May 13, 2015 at 12:37:04PM -0500, Aravind Gopalakrishnan wrote:
>>
>> mce_setup(&m);
>> + rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
> Any meaningful bits in that MSR we wanna know when getting a
> thresholding or deferred error? Are they even defined?

Ah. Good point. RIPV is not defined for Deferred errors.
For thresholding, we'll hit the interrupt handler only if we hit the
threshold and it is not a UC error (for which RIPV is not defined).
Else, the counter would be incremented, but it would cause a #MC anyway.

> If yes, RIPV should always be 1b, EIPV too, MCIP can't be set.
>
> -ENOMOREUSEFULBITS.
>

Thanks,
-Aravind.