2014-10-27 00:57:24

by Chen Yucong

[permalink] [raw]
Subject: [PATCH 0/2] RAS: add the support for handling UCNA/DEFERRED error

And on the basis of the first patch, the second patch adds the support
for identifying and handling UCNA/DEFERRED error in machine_check_poll.

[PATCH 1/2] x86, mce, severity: extend the the mce_severity
[PATCH 2/2] x86, mce: support memory error recovery for both UCNA


2014-10-27 00:57:46

by Chen Yucong

[permalink] [raw]
Subject: [PATCH 1/2] x86, mce, severity: extend the the mce_severity mechanism to handle UCNA/DEFERRED error

Until now, the mce_severity mechanism can only identify the
severity of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it
is not able to filter out DEFERRED error for ADM platform.

This patch aims to extend the mce_severity mechanism for
handling UCNA/DEFERRED error. In order to do this, the patch
introduces a new severity level - MCE_UCNA/DEFERRED_SEVERITY.

Signed-off-by: Chen Yucong <[email protected]>
---
arch/x86/include/asm/mce.h | 4 ++++
arch/x86/kernel/cpu/mcheck/mce-internal.h | 2 ++
arch/x86/kernel/cpu/mcheck/mce-severity.c | 6 +++++-
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 958b90f..40b35a5 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -34,6 +34,10 @@
#define MCI_STATUS_S (1ULL<<56) /* Signaled machine check */
#define MCI_STATUS_AR (1ULL<<55) /* Action required */

+/* AMD-specific bits */
+#define MCI_STATUS_DEFERRED (1ULL<<44) /* declare an uncorrected error */
+#define MCI_STATUS_POISON (1ULL<<43) /* access poisonous data */
+
/*
* Note that the full MCACOD field of IA32_MCi_STATUS MSR is
* bits 15:0. But bit 12 is the 'F' bit, defined for corrected
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 09edd0b..d32fcbb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -3,6 +3,8 @@

enum severity_level {
MCE_NO_SEVERITY,
+ MCE_DEFERRED_SEVERITY,
+ MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY,
MCE_KEEP_SEVERITY,
MCE_SOME_SEVERITY,
MCE_AO_SEVERITY,
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index c370e1c..c12e0a7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -83,13 +83,17 @@ static struct severity {
KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
),
MCESEV(
+ DEFERRED, "Deferred error",
+ NOSER, MASK(MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON, MCI_STATUS_DEFERRED)
+ ),
+ MCESEV(
KEEP, "Corrected error",
NOSER, BITCLR(MCI_STATUS_UC)
),

/* ignore OVER for UCNA */
MCESEV(
- KEEP, "Uncorrected no action required",
+ UCNA, "Uncorrected no action required",
SER, MASK(MCI_UC_SAR, MCI_STATUS_UC)
),
MCESEV(
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 61a9668ce..fdc422e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1101,7 +1101,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
* When machine check was for corrected handler don't touch,
* unless we're panicing.
*/
- if (severity == MCE_KEEP_SEVERITY && !no_way_out)
+ if ((severity == MCE_KEEP_SEVERITY ||
+ severity == MCE_UCNA_SEVERITY) && !no_way_out)
continue;
__set_bit(i, toclear);
if (severity == MCE_NO_SEVERITY) {
--
1.7.10.4

2014-10-27 00:58:10

by Chen Yucong

[permalink] [raw]
Subject: [PATCH 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

Uncorrected no action required (UCNA) - is a UCR error that is not
signaled via a machine check exception and, instead, is reported to
system software as a corrected machine check error. UCNA errors indicate
that some data in the system is corrupted, but the data has not been
consumed and the processor state is valid and you may continue execution
on this processor. UCNA errors require no action from system software
to continue execution. Note that UCNA errors are supported by the
processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set.
-- Intel SDM Volume 3B

Deferred errors are errors that cannot be corrected by hardware, but
do not cause an immediate interruption in program flow, loss of data
integrity, or corruption of processor state. These errors indicate
that data has been corrupted but not consumed. Hardware writes information
to the status and address registers in the corresponding bank that
identifies the source of the error if deferred errors are enabled for
logging. Deferred errors are not reported via machine check exceptions;
they can be seen by polling the MCi_STATUS registers.
-- ADM64 APM Volume 2

Above two items, both UCNA and Deferred errors belong to detected
errors, but they can't be corrected by hardware, and this is very
similar to Software Recoverable Action Optional (SRAO) errors.
Therefore, we can take some actions that have been used for handling
SRAO errors to handle UCNA and Deferred errors.

Signed-off-by: Chen Yucong <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 55 ++++++++++++++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index fdc422e..7439077 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -575,6 +575,47 @@ static void mce_read_aux(struct mce *m, int i)
}
}

+static bool mem_deferred_error(struct mce *m)
+{
+ int severity;
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
+ severity = mce_severity(m, mca_cfg.tolerant, NULL);
+
+ if (c->x86_vendor == X86_VENDOR_AMD) {
+ /*
+ * AMD BKDGs - Machine Check Error Codes
+ *
+ * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
+ * a memory-specific error. Note that this field encodes info-
+ * rmation about memory-hierarchy level involved in the error.
+ */
+ if (severity == MCE_DEFERRED_SEVERITY)
+ return (m->status & 0xff00) == BIT(8);
+ } else if (c->x86_vendor == X86_VENDOR_INTEL) {
+ /*
+ * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
+ *
+ * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
+ * indicating a memory error. Bit 8 is used for indicating a
+ * cache hierarchy error. The combination of bit 2 and bit 3
+ * is used for indicating a `generic' cache hierarchy error
+ * But we can't just blindly check the above bits, because if
+ * bit 11 is set, then it is a bus/interconnect error - and
+ * either way the above bits just gives more detail on what
+ * bus/interconnect error happened. Note that bit 12 can be
+ * ignored, as it's the "filter" bit.
+ */
+ if (severity == MCE_UCNA_SEVERITY)
+ return (m->status & 0xef80) == BIT(7) ||
+ (m->status & 0xef00) == BIT(8) ||
+ (m->status & 0xeffc) == 0xc;
+ }
+
+ return false;
+}
+
DEFINE_PER_CPU(unsigned, mce_poll_count);

/*
@@ -630,6 +671,16 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)

if (!(flags & MCP_TIMESTAMP))
m.tsc = 0;
+
+ /*
+ * In the cases where we don't have a valid address after all,
+ * do not add it into the ring buffer.
+ */
+ if (mem_deferred_error(&m) && (m.status & MCI_STATUS_ADDRV)) {
+ mce_ring_add(m.addr >> PAGE_SHIFT);
+ mce_schedule_work();
+ }
+
/*
* Don't get the IP here because it's unlikely to
* have anything to do with the actual error location.
@@ -1098,8 +1149,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
severity = mce_severity(&m, cfg->tolerant, NULL);

/*
- * When machine check was for corrected handler don't touch,
- * unless we're panicing.
+ * When machine check was for corrected/deferred handler don't
+ * touch, unless we're panicing.
*/
if ((severity == MCE_KEEP_SEVERITY ||
severity == MCE_UCNA_SEVERITY) && !no_way_out)
--
1.7.10.4

2014-10-27 09:37:34

by Chen Yucong

[permalink] [raw]
Subject: Re: [PATCH 0/2] RAS: add the support for handling UCNA/DEFERRED error

On Mon, 2014-10-27 at 08:56 +0800, Chen Yucong wrote:

Hi all,

At the suggestion of Boris, the first patch extends the mce_severity
mechanism for handling UCNA/DEFERRED error.
Link: https://lkml.org/lkml/2014/10/23/190

And on the basis of the first patch, the second patch adds the support
for identifying and handling UCNA/DEFERRED error in machine_check_poll.

[PATCH 1/2] x86, mce, severity: extend the the mce_severity
[PATCH 2/2] x86, mce: support memory error recovery for both UCNA

thx!
cyc

2014-10-27 23:10:13

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

+ m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
+ severity = mce_severity(m, mca_cfg.tolerant, NULL);

This seems a big hack to make mce_severity() work when called from
CMCI context (when MCG_STATUS register is not set). It would also
be confusing as the subsequent logged entries would show MCIP and RIPV
bits set in the mcg_status.

If someone can think of a less hacky way to do this, that would be good. Otherwise
the code needs a comment, and should reset m->mcg_status to avoid making logs
that have incorrect data.

-Tony

2014-10-28 02:22:12

by Chen Yucong

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

On Mon, 2014-10-27 at 23:10 +0000, Luck, Tony wrote:
> + m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
> + severity = mce_severity(m, mca_cfg.tolerant, NULL);
>
> This seems a big hack to make mce_severity() work when called from
> CMCI context (when MCG_STATUS register is not set). It would also
> be confusing as the subsequent logged entries would show MCIP and RIPV
> bits set in the mcg_status.
>
In fact, I have already noticed this issue from the start. But the
Intel SDM document that MCIP/RIPV/EIPV are specific to machine check
exception. And I don't know if the above flag bits will be checked in
CMCI context by error log/decode handlers.

> If someone can think of a less hacky way to do this, that would be good. Otherwise
> the code needs a comment, and should reset m->mcg_status to avoid making logs
> that have incorrect data.
>
Yes! the above code snippet should be commented. And another method
that can be used for restoring m->mcgstatus is shown below.

+ u8 mcgs = m->mcgstatus & 0xff;
+
+ m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
+ severity = mce_severity(m, mca_cfg.tolerant, NULL);
+ m->mcgstatus = (m->mcgstatus & ~0xff) | mcgs;

thx!
cyc

2014-10-29 02:16:37

by Chen Yucong

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

On Mon, 2014-10-27 at 23:10 +0000, Luck, Tony wrote:
> + m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
> + severity = mce_severity(m, mca_cfg.tolerant, NULL);
>
> This seems a big hack to make mce_severity() work when called from
> CMCI context (when MCG_STATUS register is not set). It would also
> be confusing as the subsequent logged entries would show MCIP and RIPV
> bits set in the mcg_status.
>
> If someone can think of a less hacky way to do this, that would be good. Otherwise
> the code needs a comment, and should reset m->mcg_status to avoid making logs
> that have incorrect data.
>
Hi all,

At the suggestion of Tony, this patch add a comment, and restore m->mcgstatus to avoid
making logs that have incorrect data.

thx!
cyc

From: Chen Yucong <[email protected]>

Signed-off-by: Chen Yucong <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 64 ++++++++++++++++++++++++++++++++++++--
1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index fdc422e..d285d26 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -575,6 +575,56 @@ static void mce_read_aux(struct mce *m, int i)
}
}

+static bool mem_deferred_error(struct mce *m)
+{
+ int severity;
+ u8 mcgs = m->mcgstatus & 0xff;
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ /*
+ * mce_severity is specific to machine check exception, and it will
+ * check MCIP/EIPV/RIPV bits. In order to get pass the check, we need
+ * to set MCIP and RIPV.
+ */
+ m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
+ severity = mce_severity(m, mca_cfg.tolerant, NULL);
+
+ /* restore the original value of m->mcgstatus */
+ m->mcgstatus = (m->mcgstatus & ~0xff) | mcgs;
+
+ if (c->x86_vendor == X86_VENDOR_AMD) {
+ /*
+ * AMD BKDGs - Machine Check Error Codes
+ *
+ * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
+ * a memory-specific error. Note that this field encodes info-
+ * rmation about memory-hierarchy level involved in the error.
+ */
+ if (severity == MCE_DEFERRED_SEVERITY)
+ return (m->status & 0xff00) == BIT(8);
+ } else if (c->x86_vendor == X86_VENDOR_INTEL) {
+ /*
+ * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
+ *
+ * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
+ * indicating a memory error. Bit 8 is used for indicating a
+ * cache hierarchy error. The combination of bit 2 and bit 3
+ * is used for indicating a `generic' cache hierarchy error
+ * But we can't just blindly check the above bits, because if
+ * bit 11 is set, then it is a bus/interconnect error - and
+ * either way the above bits just gives more detail on what
+ * bus/interconnect error happened. Note that bit 12 can be
+ * ignored, as it's the "filter" bit.
+ */
+ if (severity == MCE_UCNA_SEVERITY)
+ return (m->status & 0xef80) == BIT(7) ||
+ (m->status & 0xef00) == BIT(8) ||
+ (m->status & 0xeffc) == 0xc;
+ }
+
+ return false;
+}
+
DEFINE_PER_CPU(unsigned, mce_poll_count);

/*
@@ -630,6 +680,16 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)

if (!(flags & MCP_TIMESTAMP))
m.tsc = 0;
+
+ /*
+ * In the cases where we don't have a valid address after all,
+ * do not add it into the ring buffer.
+ */
+ if (mem_deferred_error(&m) && (m.status & MCI_STATUS_ADDRV)) {
+ mce_ring_add(m.addr >> PAGE_SHIFT);
+ mce_schedule_work();
+ }
+
/*
* Don't get the IP here because it's unlikely to
* have anything to do with the actual error location.
@@ -1098,8 +1158,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
severity = mce_severity(&m, cfg->tolerant, NULL);

/*
- * When machine check was for corrected handler don't touch,
- * unless we're panicing.
+ * When machine check was for corrected/deferred handler don't
+ * touch, unless we're panicing.
*/
if ((severity == MCE_KEEP_SEVERITY ||
severity == MCE_UCNA_SEVERITY) && !no_way_out)
--
1.7.10.4


2014-11-04 02:11:46

by Chen Yucong

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

On Wed, 2014-10-29 at 10:16 +0800, Chen Yucong wrote:
> On Mon, 2014-10-27 at 23:10 +0000, Luck, Tony wrote:
> > + m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
> > + severity = mce_severity(m, mca_cfg.tolerant, NULL);
> >
> > This seems a big hack to make mce_severity() work when called from
> > CMCI context (when MCG_STATUS register is not set). It would also
> > be confusing as the subsequent logged entries would show MCIP and RIPV
> > bits set in the mcg_status.
> >
> > If someone can think of a less hacky way to do this, that would be good. Otherwise
> > the code needs a comment, and should reset m->mcg_status to avoid making logs
> > that have incorrect data.
> >
> Hi all,
>
> At the suggestion of Tony, this patch add a comment, and restore m->mcgstatus to avoid
> making logs that have incorrect data.
>

Hi Tony,

Do you have any more comments for the two patches?

thx!
cyc
>
> From: Chen Yucong <[email protected]>
>
> Signed-off-by: Chen Yucong <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 64 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index fdc422e..d285d26 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -575,6 +575,56 @@ static void mce_read_aux(struct mce *m, int i)
> }
> }
>
> +static bool mem_deferred_error(struct mce *m)
> +{
> + int severity;
> + u8 mcgs = m->mcgstatus & 0xff;
> + struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> + /*
> + * mce_severity is specific to machine check exception, and it will
> + * check MCIP/EIPV/RIPV bits. In order to get pass the check, we need
> + * to set MCIP and RIPV.
> + */
> + m->mcgstatus |= (MCG_STATUS_MCIP|MCG_STATUS_RIPV);
> + severity = mce_severity(m, mca_cfg.tolerant, NULL);
> +
> + /* restore the original value of m->mcgstatus */
> + m->mcgstatus = (m->mcgstatus & ~0xff) | mcgs;
> +
> + if (c->x86_vendor == X86_VENDOR_AMD) {
> + /*
> + * AMD BKDGs - Machine Check Error Codes
> + *
> + * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
> + * a memory-specific error. Note that this field encodes info-
> + * rmation about memory-hierarchy level involved in the error.
> + */
> + if (severity == MCE_DEFERRED_SEVERITY)
> + return (m->status & 0xff00) == BIT(8);
> + } else if (c->x86_vendor == X86_VENDOR_INTEL) {
> + /*
> + * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
> + *
> + * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
> + * indicating a memory error. Bit 8 is used for indicating a
> + * cache hierarchy error. The combination of bit 2 and bit 3
> + * is used for indicating a `generic' cache hierarchy error
> + * But we can't just blindly check the above bits, because if
> + * bit 11 is set, then it is a bus/interconnect error - and
> + * either way the above bits just gives more detail on what
> + * bus/interconnect error happened. Note that bit 12 can be
> + * ignored, as it's the "filter" bit.
> + */
> + if (severity == MCE_UCNA_SEVERITY)
> + return (m->status & 0xef80) == BIT(7) ||
> + (m->status & 0xef00) == BIT(8) ||
> + (m->status & 0xeffc) == 0xc;
> + }
> +
> + return false;
> +}
> +
> DEFINE_PER_CPU(unsigned, mce_poll_count);
>
> /*
> @@ -630,6 +680,16 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>
> if (!(flags & MCP_TIMESTAMP))
> m.tsc = 0;
> +
> + /*
> + * In the cases where we don't have a valid address after all,
> + * do not add it into the ring buffer.
> + */
> + if (mem_deferred_error(&m) && (m.status & MCI_STATUS_ADDRV)) {
> + mce_ring_add(m.addr >> PAGE_SHIFT);
> + mce_schedule_work();
> + }
> +
> /*
> * Don't get the IP here because it's unlikely to
> * have anything to do with the actual error location.
> @@ -1098,8 +1158,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> severity = mce_severity(&m, cfg->tolerant, NULL);
>
> /*
> - * When machine check was for corrected handler don't touch,
> - * unless we're panicing.
> + * When machine check was for corrected/deferred handler don't
> + * touch, unless we're panicing.
> */
> if ((severity == MCE_KEEP_SEVERITY ||
> severity == MCE_UCNA_SEVERITY) && !no_way_out)

2014-11-04 11:38:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

On Wed, Oct 29, 2014 at 10:16:04AM +0800, Chen Yucong wrote:
> + /*
> + * mce_severity is specific to machine check exception, and it will
> + * check MCIP/EIPV/RIPV bits. In order to get pass the check, we need
> + * to set MCIP and RIPV.
> + */

No, no more hacks please! Less hacks and cleaner code instead.

You need to integrate it properly in the current solution, or, if the
current solution doesn't provide a proper way of integrating UCNA
errors, then you need to extend it and improve it to do so.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--