Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
"spurious corrected errors may be logged in the IA32_MC0_STATUS register
with the valid field (bit 63) set, the uncorrected error field (bit 61)
not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
an MCA Error Code (bits [15:0]) of 0x0005."
Block these spurious errors from the console and logs.
Links to Intel Specification updates:
HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html
Signed-off-by: Alexander Krupp <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
arch/x86/kernel/cpu/mce/core.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c4f949611e4..d893cc764a06 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -121,6 +121,8 @@ static struct irq_work mce_irq_work;
static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
+static int (*quirk_noprint)(struct mce *m);
+
/*
* CPU/chipset specific EDAC code can register a notifier call here to print
* MCE errors in a human-readable form.
@@ -232,6 +234,9 @@ struct mca_msr_regs msr_ops = {
static void __print_mce(struct mce *m)
{
+ if (quirk_noprint && quirk_noprint(m))
+ return;
+
pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
m->extcpu,
(m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
@@ -1622,6 +1627,15 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
m->cs = regs->cs;
}
+static int quirk_spurious_ce_noprint(struct mce *m)
+{
+ if (m->bank == 0 &&
+ (m->status & 0xa0000000ffffffff) == 0x80000000000f0005)
+ return 1;
+
+ return 0;
+}
+
/* Add per CPU specific workarounds here */
static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
{
@@ -1696,6 +1710,13 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
if (c->x86 == 6 && c->x86_model == 45)
quirk_no_way_out = quirk_sandybridge_ifu;
+
+ if ((c->x86 == 6) &&
+ ((c->x86_model == 0x3c) || (c->x86_model == 0x3d) ||
+ (c->x86_model == 0x45) || (c->x86_model == 0x46))) {
+ pr_info("MCE errata HSD131, HSM142, HSW131, or BDM48 workaround enabled.\n");
+ quirk_noprint = quirk_spurious_ce_noprint;
+ }
}
if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
--
2.21.1
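For readers following the status test in quirk_spurious_ce_noprint(), here is a minimal userspace sketch of how the two constants in the patch decompose into the fields quoted from the errata. The EX_* names and is_spurious_ce() are local to this sketch and are not kernel identifiers; only the bit positions and the two 64-bit constants come from the patch and the erratum text.

```c
#include <assert.h>
#include <stdint.h>

/* Local names for this sketch only; bit positions are from the erratum text. */
#define EX_STATUS_VAL    (1ULL << 63)           /* valid field */
#define EX_STATUS_UC     (1ULL << 61)           /* uncorrected error field */
#define EX_STATUS_MSCOD  0x00000000ffff0000ULL  /* Model Specific Error Code, bits [31:16] */
#define EX_STATUS_MCACOD 0x000000000000ffffULL  /* MCA Error Code, bits [15:0] */

/* Bits the check looks at, and the values they must have. */
#define EX_SPURIOUS_MASK  (EX_STATUS_VAL | EX_STATUS_UC | EX_STATUS_MSCOD | EX_STATUS_MCACOD)
#define EX_SPURIOUS_VALUE (EX_STATUS_VAL | (0x000fULL << 16) | 0x0005ULL)

/* The composed constants are the literals used in the patch. */
static_assert(EX_SPURIOUS_MASK == 0xa0000000ffffffffULL, "mask matches the patch");
static_assert(EX_SPURIOUS_VALUE == 0x80000000000f0005ULL, "value matches the patch");

/* Nonzero if a bank/status pair has the signature described by the errata. */
int is_spurious_ce(int bank, uint64_t status)
{
	return bank == 0 && (status & EX_SPURIOUS_MASK) == EX_SPURIOUS_VALUE;
}
```

Status bits outside the mask (for example bits 62, 60 and below down to bit 32) are ignored by this test, so they may hold any value without affecting the match.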
On Wed, Feb 05, 2020 at 07:58:31AM -0500, Prarit Bhargava wrote:
> Subject: Re: [PATCH] x86/mce: Enable HSD131, HSM142, HSW131, BDM48, and HSM142
That subject is unreadable for humans.
> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
> "spurious corrected errors may be logged in the IA32_MC0_STATUS register
> with the valid field (bit 63) set, the uncorrected error field (bit 61)
> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
> an MCA Error Code (bits [15:0]) of 0x0005."
>
> Block these spurious errors from the console and logs.
Are they being hit in the wild or why do we need this?
> Links to Intel Specification updates:
> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html
Those links tend to get stale with time. If you really want to refer to
the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add
them there as an attachment and add the link to the entry to the commit
message.
> Signed-off-by: Alexander Krupp <[email protected]>
What's that Signed-off-by: tag supposed to mean?
> Signed-off-by: Prarit Bhargava <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> arch/x86/kernel/cpu/mce/core.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
If at all, this should be done by adding an intel_filter_mce() function
and calling it from filter_mce() so that such errors don't get logged.
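A minimal userspace sketch of that shape, for illustration only. The struct definitions, the boot_cpu variable and the vendor enum are reduced stand-ins, not the kernel's types, and the bodies are an assumption about how such a filter might look rather than the code that should be merged. The model list follows the patch, with its last entry read as 0x46.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-ins for the kernel types; only the fields used here. */
struct mce {
	uint64_t status;
	int bank;
};

struct cpuinfo {
	int vendor;
	int family;
	int model;
};

enum { EX_VENDOR_INTEL, EX_VENDOR_OTHER };

/* Stand-in for the kernel's description of the boot CPU. */
struct cpuinfo boot_cpu = { EX_VENDOR_INTEL, 6, 0x3c };

#define EX_SPURIOUS_MASK  0xa0000000ffffffffULL
#define EX_SPURIOUS_VALUE 0x80000000000f0005ULL

/* The expected value must not have bits set outside the mask. */
static_assert((EX_SPURIOUS_VALUE & EX_SPURIOUS_MASK) == EX_SPURIOUS_VALUE,
	      "value lies within mask");

/* True for the family 6 models listed in the patch. */
bool model_has_spurious_ce(const struct cpuinfo *c)
{
	if (c->family != 6)
		return false;

	switch (c->model) {
	case 0x3c:
	case 0x3d:
	case 0x45:
	case 0x46: /* assumption: the patch writes 46, read here as 0x46 */
		return true;
	default:
		return false;
	}
}

/* Vendor-specific filter: true means "drop this record, do not log it". */
bool intel_filter_mce(const struct mce *m)
{
	return model_has_spurious_ce(&boot_cpu) &&
	       m->bank == 0 &&
	       (m->status & EX_SPURIOUS_MASK) == EX_SPURIOUS_VALUE;
}

/* Generic entry point that dispatches on the CPU vendor. */
bool filter_mce(const struct mce *m)
{
	if (boot_cpu.vendor == EX_VENDOR_INTEL)
		return intel_filter_mce(m);

	return false;
}
```

Filtering at this point keeps the record out of the logging path entirely, instead of only suppressing the console print as the quirk_noprint hook does.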
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2/6/20 6:10 AM, Borislav Petkov wrote:
> On Wed, Feb 05, 2020 at 07:58:31AM -0500, Prarit Bhargava wrote:
>
>> Subject: Re: [PATCH] x86/mce: Enable HSD131, HSM142, HSW131, BDM48, and HSM142
>
> That subject is unreadable for humans.
Yeah :/ I couldn't think of a better one. Maybe "Block spurious corrected
errors on some Intel processors"? Any other suggestion?
>
>> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
>> "spurious corrected errors may be logged in the IA32_MC0_STATUS register
>> with the valid field (bit 63) set, the uncorrected error field (bit 61)
>> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
>> an MCA Error Code (bits [15:0]) of 0x0005."
>>
>> Block these spurious errors from the console and logs.
>
> Are they being hit in the wild or why do we need this?
Alexander, cc'd, is being hit by this in the wild.
>
>> Links to Intel Specification updates:
>> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
>> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
>> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
>> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html
>
> Those links tend to get stale with time. If you really want to refer to
> the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add
> them there as an attachment and add the link to the entry to the commit
> message.
>
>> Signed-off-by: Alexander Krupp <[email protected]>
>
> What's that Signed-off-by: tag supposed to mean?
>
>> Signed-off-by: Prarit Bhargava <[email protected]>
>> Cc: Tony Luck <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
>> arch/x86/kernel/cpu/mce/core.c | 21 +++++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>
> If at all, this should be done by adding an intel_filter_mce() function
> and calling it from filter_mce() so that such errors don't get logged.
I'll take a look over there.
P.
>
> Thx.
>
On 2/6/20 7:53 AM, Prarit Bhargava wrote:
>
>
> On 2/6/20 6:10 AM, Borislav Petkov wrote:
>> On Wed, Feb 05, 2020 at 07:58:31AM -0500, Prarit Bhargava wrote:
>>> Signed-off-by: Alexander Krupp <[email protected]>
>>
>> What's that Signed-off-by: tag supposed to mean?
Sorry. I missed this question, but I really don't understand the question.
Alexander posted a patch in a kernel bugzilla @ Red Hat and I modified the patch
with some additional changes. I don't want him to lose credit for the work so
he's got a proper Signed-off-by tag for this patch.
P.
On Thu, Feb 06, 2020 at 07:53:34AM -0500, Prarit Bhargava wrote:
> Yeah :/ I couldn't think of a better one. Maybe "Block spurious corrected
> errors on some Intel processors"? Any other suggestion?
"Do not log ..."
> Alexander, cc'd, is being hit by this in the wild.
Do say that in the commit message.
> >> Signed-off-by: Alexander Krupp <[email protected]>
> >
> > What's that Signed-off-by: tag supposed to mean?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You missed this one.
--
Regards/Gruss,
Boris.
On Thu, Feb 06, 2020 at 08:05:24AM -0500, Prarit Bhargava wrote:
> Sorry. I missed this question, but I really don't understand the question.
> Alexander posted a patch in a kernel bugzilla @ Red Hat and I modified the patch
> with some additional changes. I don't want him to lose credit for the work so
> he's got a proper Signed-off-by tag for this patch.
That is not how co-authorship is expressed. Either you write that in free text in
the commit message or you use Co-developed-by. More details are in
Documentation/process/submitting-patches.rst.
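As an illustration of the Co-developed-by convention, the tag block of this patch could look like the sketch below. It assumes the rules described in submitting-patches.rst: each Co-developed-by: is immediately followed by that co-author's Signed-off-by:, and the submitter's Signed-off-by: comes last. Names and addresses are kept as they appear in the patch.

```
Co-developed-by: Alexander Krupp <[email protected]>
Signed-off-by: Alexander Krupp <[email protected]>
Signed-off-by: Prarit Bhargava <[email protected]>
```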
--
Regards/Gruss,
Boris.