2020-02-17 13:08:40

by Prarit Bhargava

[permalink] [raw]
Subject: [PATCH v2] x86/mce: Do not log spurious corrected mce errors

A user has reported that they are seeing spurious corrected errors on
their hardware.

Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
"spurious corrected errors may be logged in the IA32_MC0_STATUS register
with the valid field (bit 63) set, the uncorrected error field (bit 61)
not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
an MCA Error Code (bits [15:0]) of 0x0005."

Block these spurious errors from the console and logs.

Links to Intel Specification updates:
HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html

Signed-off-by: Prarit Bhargava <[email protected]>
Co-developed-by: Alexander Krupp <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kernel/cpu/mce/core.c | 2 ++
arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++
arch/x86/kernel/cpu/mce/internal.h | 1 +
3 files changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c4f949611e4..fe3983d551cc 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m)
{
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
return amd_filter_mce(m);
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+ return intel_filter_mce(m);

return false;
}
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 5627b1091b85..989148e6746c 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c)
{
intel_clear_lmce();
}
+
+bool intel_filter_mce(struct mce *m)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
+ if ((c->x86 == 6) &&
+ ((c->x86_model == INTEL_FAM6_HASWELL) ||
+ (c->x86_model == INTEL_FAM6_HASWELL_L) ||
+ (c->x86_model == INTEL_FAM6_BROADWELL) ||
+ (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
+ (m->bank == 0) &&
+ ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
+ return true;
+
+ return false;
+}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b785c0d0b590..821faba5b05d 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m);
#else
static inline bool amd_filter_mce(struct mce *m) { return false; };
#endif
+extern bool intel_filter_mce(struct mce *m);

#endif /* __X86_MCE_INTERNAL_H__ */
--
2.21.1


2020-02-18 16:13:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mce: Do not log spurious corrected mce errors

On Mon, Feb 17, 2020 at 08:06:59AM -0500, Prarit Bhargava wrote:
> A user has reported that they are seeing spurious corrected errors on
> their hardware.
>
> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
> "spurious corrected errors may be logged in the IA32_MC0_STATUS register
> with the valid field (bit 63) set, the uncorrected error field (bit 61)
> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
> an MCA Error Code (bits [15:0]) of 0x0005."
>
> Block these spurious errors from the console and logs.
>
> Links to Intel Specification updates:
> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html

My previous review comment still holds:

Those links tend to get stale with time. If you really want to refer to
the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add
them there as an attachment and add the link to the entry to the commit
message.

> Signed-off-by: Prarit Bhargava <[email protected]>
> Co-developed-by: Alexander Krupp <[email protected]>

WARNING: Co-developed-by: must be immediately followed by Signed-off-by:
#36:

See Documentation/process/submitting-patches.rst for more detail.

> Cc: Tony Luck <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/x86/kernel/cpu/mce/core.c | 2 ++
> arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++
> arch/x86/kernel/cpu/mce/internal.h | 1 +
> 3 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2c4f949611e4..fe3983d551cc 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m)
> {
> if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> return amd_filter_mce(m);
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> + return intel_filter_mce(m);
>
> return false;
> }
> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
> index 5627b1091b85..989148e6746c 100644
> --- a/arch/x86/kernel/cpu/mce/intel.c
> +++ b/arch/x86/kernel/cpu/mce/intel.c
> @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c)
> {
> intel_clear_lmce();
> }
> +
> +bool intel_filter_mce(struct mce *m)
> +{
> + struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> + /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
> + if ((c->x86 == 6) &&
> + ((c->x86_model == INTEL_FAM6_HASWELL) ||
> + (c->x86_model == INTEL_FAM6_HASWELL_L) ||
> + (c->x86_model == INTEL_FAM6_BROADWELL) ||
> + (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
> + (m->bank == 0) &&
> + ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
> + return true;
> +
> + return false;
> +}
> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
> index b785c0d0b590..821faba5b05d 100644
> --- a/arch/x86/kernel/cpu/mce/internal.h
> +++ b/arch/x86/kernel/cpu/mce/internal.h
> @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m);
> #else
> static inline bool amd_filter_mce(struct mce *m) { return false; };
> #endif
> +extern bool intel_filter_mce(struct mce *m);

It doesn't even build:

ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce':
/home/boris/kernel/linux/arch/x86/kernel/cpu/mce/core.c:1881: undefined reference to `intel_filter_mce'
make: *** [Makefile:1077: vmlinux] Error 1

Hint: do it like it is done for amd_filter_mce() but in the respective
#ifdef CONFIG_X86_MCE_INTEL place.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-18 16:18:20

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mce: Do not log spurious corrected mce errors



On 2/18/20 11:13 AM, Borislav Petkov wrote:
> On Mon, Feb 17, 2020 at 08:06:59AM -0500, Prarit Bhargava wrote:
>> A user has reported that they are seeing spurious corrected errors on
>> their hardware.
>>
>> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
>> "spurious corrected errors may be logged in the IA32_MC0_STATUS register
>> with the valid field (bit 63) set, the uncorrected error field (bit 61)
>> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
>> an MCA Error Code (bits [15:0]) of 0x0005."
>>
>> Block these spurious errors from the console and logs.
>>
>> Links to Intel Specification updates:
>> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
>> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
>> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
>> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html
>
> My previous review comment still holds:
>
> Those links tend to get stale with time. If you really want to refer to
> the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add
> them there as an attachment and add the link to the entry to the commit
> message.
>
>> Signed-off-by: Prarit Bhargava <[email protected]>
>> Co-developed-by: Alexander Krupp <[email protected]>
>
> WARNING: Co-developed-by: must be immediately followed by Signed-off-by:
> #36:
>
> See Documentation/process/submitting-patches.rst for more detail.
>
>> Cc: Tony Luck <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
>> arch/x86/kernel/cpu/mce/core.c | 2 ++
>> arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++
>> arch/x86/kernel/cpu/mce/internal.h | 1 +
>> 3 files changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 2c4f949611e4..fe3983d551cc 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m)
>> {
>> if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>> return amd_filter_mce(m);
>> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
>> + return intel_filter_mce(m);
>>
>> return false;
>> }
>> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
>> index 5627b1091b85..989148e6746c 100644
>> --- a/arch/x86/kernel/cpu/mce/intel.c
>> +++ b/arch/x86/kernel/cpu/mce/intel.c
>> @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c)
>> {
>> intel_clear_lmce();
>> }
>> +
>> +bool intel_filter_mce(struct mce *m)
>> +{
>> + struct cpuinfo_x86 *c = &boot_cpu_data;
>> +
>> + /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
>> + if ((c->x86 == 6) &&
>> + ((c->x86_model == INTEL_FAM6_HASWELL) ||
>> + (c->x86_model == INTEL_FAM6_HASWELL_L) ||
>> + (c->x86_model == INTEL_FAM6_BROADWELL) ||
>> + (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
>> + (m->bank == 0) &&
>> + ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
>> + return true;
>> +
>> + return false;
>> +}
>> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
>> index b785c0d0b590..821faba5b05d 100644
>> --- a/arch/x86/kernel/cpu/mce/internal.h
>> +++ b/arch/x86/kernel/cpu/mce/internal.h
>> @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m);
>> #else
>> static inline bool amd_filter_mce(struct mce *m) { return false; };
>> #endif
>> +extern bool intel_filter_mce(struct mce *m);
>
> It doesn't even build:
>
> ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce':
> /home/boris/kernel/linux/arch/x86/kernel/cpu/mce/core.c:1881: undefined reference to `intel_filter_mce'
> make: *** [Makefile:1077: vmlinux] Error 1
>
> Hint: do it like it is done for amd_filter_mce() but in the respective
> #ifdef CONFIG_X86_MCE_INTEL place.'

That's weird. I just re-compiled it on my system without any issue. Can I have
your .config to double check my compile? I'm using the standard fedora config FWIW.

P.
>

2020-02-18 16:24:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mce: Do not log spurious corrected mce errors

On Tue, Feb 18, 2020 at 11:17:34AM -0500, Prarit Bhargava wrote:
> That's weird. I just re-compiled it on my system without any issue. Can I have
> your .config to double check my compile? I'm using the standard fedora config FWIW.

Just disable CONFIG_X86_MCE_INTEL

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-19 12:27:43

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mce: Do not log spurious corrected mce errors



On 2/18/20 11:13 AM, Borislav Petkov wrote:
> On Mon, Feb 17, 2020 at 08:06:59AM -0500, Prarit Bhargava wrote:
>> A user has reported that they are seeing spurious corrected errors on
>> their hardware.
>>
>> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
>> "spurious corrected errors may be logged in the IA32_MC0_STATUS register
>> with the valid field (bit 63) set, the uncorrected error field (bit 61)
>> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
>> an MCA Error Code (bits [15:0]) of 0x0005."
>>
>> Block these spurious errors from the console and logs.
>>
>> Links to Intel Specification updates:
>> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
>> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
>> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
>> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html
>
> My previous review comment still holds:
>
> Those links tend to get stale with time. If you really want to refer to
> the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add
> them there as an attachment and add the link to the entry to the commit
> message.
>
>> Signed-off-by: Prarit Bhargava <[email protected]>
>> Co-developed-by: Alexander Krupp <[email protected]>
>
> WARNING: Co-developed-by: must be immediately followed by Signed-off-by:
> #36:

Borislav, when I've been using your Co-developed-by & not using a Signed-off-by
process but when I run checkpatch.py I get the following warning:

WARNING: Co-developed-by and Signed-off-by: name/email do not match
#21:
Co-developed-by: Alexander Krupp <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>

When I submitted this patch I looked at other commits in the kernel near
top-of-tree and they have Signed-off-by followed by Co-developed-by, and also
took your suggestion of not using a Signed-off-by for Alexander. That's why I
chose to display them this way in v2.

Examples:

126196100063 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
40ca1bf580ef ("PCI: brcmstb: Add MSI support")

I'm now thoroughly confused as to what the correct format is. It seems like
checkpatch.py is telling me to include a Signed-off-by in addition to the
Co-developed-by for Alexander but you explicitly told me not to.

P.

>
> See Documentation/process/submitting-patches.rst for more detail.
>
>> Cc: Tony Luck <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
>> arch/x86/kernel/cpu/mce/core.c | 2 ++
>> arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++
>> arch/x86/kernel/cpu/mce/internal.h | 1 +
>> 3 files changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 2c4f949611e4..fe3983d551cc 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m)
>> {
>> if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>> return amd_filter_mce(m);
>> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
>> + return intel_filter_mce(m);
>>
>> return false;
>> }
>> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
>> index 5627b1091b85..989148e6746c 100644
>> --- a/arch/x86/kernel/cpu/mce/intel.c
>> +++ b/arch/x86/kernel/cpu/mce/intel.c
>> @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c)
>> {
>> intel_clear_lmce();
>> }
>> +
>> +bool intel_filter_mce(struct mce *m)
>> +{
>> + struct cpuinfo_x86 *c = &boot_cpu_data;
>> +
>> + /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
>> + if ((c->x86 == 6) &&
>> + ((c->x86_model == INTEL_FAM6_HASWELL) ||
>> + (c->x86_model == INTEL_FAM6_HASWELL_L) ||
>> + (c->x86_model == INTEL_FAM6_BROADWELL) ||
>> + (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
>> + (m->bank == 0) &&
>> + ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
>> + return true;
>> +
>> + return false;
>> +}
>> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
>> index b785c0d0b590..821faba5b05d 100644
>> --- a/arch/x86/kernel/cpu/mce/internal.h
>> +++ b/arch/x86/kernel/cpu/mce/internal.h
>> @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m);
>> #else
>> static inline bool amd_filter_mce(struct mce *m) { return false; };
>> #endif
>> +extern bool intel_filter_mce(struct mce *m);
>
> It doesn't even build:
>
> ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce':
> /home/boris/kernel/linux/arch/x86/kernel/cpu/mce/core.c:1881: undefined reference to `intel_filter_mce'
> make: *** [Makefile:1077: vmlinux] Error 1
>
> Hint: do it like it is done for amd_filter_mce() but in the respective
> #ifdef CONFIG_X86_MCE_INTEL place.
>

2020-02-19 13:04:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2] x86/mce: Do not log spurious corrected mce errors

On Wed, Feb 19, 2020 at 07:25:59AM -0500, Prarit Bhargava wrote:
> When I submitted this patch I looked at other commits in the kernel near
> top-of-tree and they have Signed-off-by followed by Co-developed-by, and also
> took your suggestion of not using a Signed-off-by for Alexander. That's why I

I said:

"This is not how this is expressed. Either you write that in free text in
the commit message or you use Co-developed-by. More details in

Documentation/process/submitting-patches.rst"

> I'm now thoroughly confused as to what the correct format is. It seems like
> checkpatch.py is telling me to include a Signed-off-by in addition to the
> Co-developed-by for Alexander but you explicitly told me not to.
>
> > See Documentation/process/submitting-patches.rst for more detail.

You need to start reading my replies in their entirety and finally read that
document I've pointed to twice:

"Co-developed-by: states that the patch was co-created by multiple developers;
it is a used to give attribution to co-authors (in addition to the author
attributed by the From: tag) when several people work on a single patch. Since
^^^^^
Co-developed-by: denotes authorship, every Co-developed-by: must be immediately
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
followed by a Signed-off-by: of the associated co-author."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I.e., basically what checkpatch is saying.

So you either

a) write in free text in the commit message something like

"This is based on a patch submitted to RH bugzilla by Alexander Krupp
<[email protected]>"

OR

b) use

Co-developed-by: Alexander Krupp <[email protected]>
Signed-off-by: Alexander Krupp <[email protected]>

Either a) XOR b).

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette