2020-02-19 13:17:16

by Prarit Bhargava

[permalink] [raw]
Subject: [PATCH v3] x86/mce: Do not log spurious corrected mce errors

A user has reported that they are seeing spurious corrected errors on
their hardware.

Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
"spurious corrected errors may be logged in the IA32_MC0_STATUS register
with the valid field (bit 63) set, the uncorrected error field (bit 61)
not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
an MCA Error Code (bits [15:0]) of 0x0005." The Errata PDFs are linked in
the bugzilla below.

Block these spurious errors from the console and logs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206587
Co-developed-by: Alexander Krupp <[email protected]>
Signed-off-by: Alexander Krupp <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
v2: Fix intel_filter_mce() declaration
v3: Fix !CONFIG_X86_MCE_INTEL compile

arch/x86/kernel/cpu/mce/core.c | 2 ++
arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++
arch/x86/kernel/cpu/mce/internal.h | 5 +++++
3 files changed, 24 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c4f949611e4..fe3983d551cc 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m)
{
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
return amd_filter_mce(m);
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+ return intel_filter_mce(m);

return false;
}
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 5627b1091b85..989148e6746c 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c)
{
intel_clear_lmce();
}
+
+bool intel_filter_mce(struct mce *m)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
+ if ((c->x86 == 6) &&
+ ((c->x86_model == INTEL_FAM6_HASWELL) ||
+ (c->x86_model == INTEL_FAM6_HASWELL_L) ||
+ (c->x86_model == INTEL_FAM6_BROADWELL) ||
+ (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
+ (m->bank == 0) &&
+ ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
+ return true;
+
+ return false;
+}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b785c0d0b590..f6e0419969c5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -175,5 +175,10 @@ extern bool amd_filter_mce(struct mce *m);
#else
static inline bool amd_filter_mce(struct mce *m) { return false; };
#endif
+#ifdef CONFIG_X86_MCE_INTEL
+extern bool intel_filter_mce(struct mce *m);
+#else
+static inline bool intel_filter_mce(struct mce *m) { return false; };
+#endif

#endif /* __X86_MCE_INTERNAL_H__ */
--
2.21.1


Subject: [tip: ras/core] x86/mce: Do not log spurious corrected mce errors

The following commit has been merged into the ras/core branch of tip:

Commit-ID: 2976908e4198aa02fc3f76802358f69396267189
Gitweb: https://git.kernel.org/tip/2976908e4198aa02fc3f76802358f69396267189
Author: Prarit Bhargava <[email protected]>
AuthorDate: Wed, 19 Feb 2020 08:16:11 -05:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Wed, 19 Feb 2020 18:14:49 +01:00

x86/mce: Do not log spurious corrected mce errors

A user has reported that they are seeing spurious corrected errors on
their hardware.

Intel Errata HSD131, HSM142, HSW131, and BDM48 report that "spurious
corrected errors may be logged in the IA32_MC0_STATUS register with
the valid field (bit 63) set, the uncorrected error field (bit 61) not
set, a Model Specific Error Code (bits [31:16]) of 0x000F, and an MCA
Error Code (bits [15:0]) of 0x0005." The Errata PDFs are linked in the
bugzilla below.

Block these spurious errors from the console and logs.

[ bp: Move the intel_filter_mce() header declarations into the already
existing CONFIG_X86_MCE_INTEL ifdeffery. ]

Co-developed-by: Alexander Krupp <[email protected]>
Signed-off-by: Alexander Krupp <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206587
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 2 ++
arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++
arch/x86/kernel/cpu/mce/internal.h | 2 ++
3 files changed, 21 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c4f949..fe3983d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m)
{
if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
return amd_filter_mce(m);
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+ return intel_filter_mce(m);

return false;
}
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 5627b10..989148e 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c)
{
intel_clear_lmce();
}
+
+bool intel_filter_mce(struct mce *m)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ /* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
+ if ((c->x86 == 6) &&
+ ((c->x86_model == INTEL_FAM6_HASWELL) ||
+ (c->x86_model == INTEL_FAM6_HASWELL_L) ||
+ (c->x86_model == INTEL_FAM6_BROADWELL) ||
+ (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
+ (m->bank == 0) &&
+ ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
+ return true;
+
+ return false;
+}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b785c0d..97db184 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -48,6 +48,7 @@ void cmci_disable_bank(int bank);
void intel_init_cmci(void);
void intel_init_lmce(void);
void intel_clear_lmce(void);
+bool intel_filter_mce(struct mce *m);
#else
# define cmci_intel_adjust_timer mce_adjust_timer_default
static inline bool mce_intel_cmci_poll(void) { return false; }
@@ -56,6 +57,7 @@ static inline void cmci_disable_bank(int bank) { }
static inline void intel_init_cmci(void) { }
static inline void intel_init_lmce(void) { }
static inline void intel_clear_lmce(void) { }
+static inline bool intel_filter_mce(struct mce *m) { return false; };
#endif

void mce_timer_kick(unsigned long interval);