2014-11-05 04:48:39

by Chen Yucong

[permalink] [raw]
Subject: [PATCH 0/2 v2] RAS: add the support for handling UCNA/DEFERRED error

Hi all,

At the suggestion of Boris, the first patch extends the mce_severity
mechanism for handling UCNA/DEFERRED error.
Link: https://lkml.org/lkml/2014/10/23/190

The first patch have also eliminated a big hack to make mce_severity() work
when called from non-exception context on the advice of Tony and Boris.
Link: https://lkml.org/lkml/2014/10/27/1017

And on the basis of the first patch, the second patch adds the support
for identifying and handling UCNA/DEFERRED error in machine_check_poll.

[PATCH 1/2] x86, mce, severity: extend the the mce_severity
[PATCH 2/2] x86, mce: support memory error recovery for both UCNA

thx!
cyc


2014-11-05 04:49:05

by Chen Yucong

[permalink] [raw]
Subject: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for ADM platform.

This patch aims to extend the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Signed-off-by: Chen Yucong <[email protected]>
---
arch/x86/include/asm/mce.h | 4 ++++
arch/x86/kernel/cpu/mcheck/mce-internal.h | 4 +++-
arch/x86/kernel/cpu/mcheck/mce-severity.c | 21 ++++++++++++++++-----
arch/x86/kernel/cpu/mcheck/mce.c | 14 ++++++++------
4 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 958b90f..40b35a5 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -34,6 +34,10 @@
#define MCI_STATUS_S (1ULL<<56) /* Signaled machine check */
#define MCI_STATUS_AR (1ULL<<55) /* Action required */

+/* AMD-specific bits */
+#define MCI_STATUS_DEFERRED (1ULL<<44) /* declare an uncorrected error */
+#define MCI_STATUS_POISON (1ULL<<43) /* access poisonous data */
+
/*
* Note that the full MCACOD field of IA32_MCi_STATUS MSR is
* bits 15:0. But bit 12 is the 'F' bit, defined for corrected
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 09edd0b..10b4690 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -3,6 +3,8 @@

enum severity_level {
MCE_NO_SEVERITY,
+ MCE_DEFERRED_SEVERITY,
+ MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY,
MCE_KEEP_SEVERITY,
MCE_SOME_SEVERITY,
MCE_AO_SEVERITY,
@@ -21,7 +23,7 @@ struct mce_bank {
char attrname[ATTR_LEN]; /* attribute name */
};

-int mce_severity(struct mce *a, int tolerant, char **msg);
+int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp);
struct dentry *mce_get_debugfs_dir(void);

extern struct mce_bank *mce_banks;
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index c370e1c..d31618d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -31,6 +31,7 @@

enum context { IN_KERNEL = 1, IN_USER = 2 };
enum ser { SER_REQUIRED = 1, NO_SER = 2 };
+enum exception { EXCP_CONTEXT = 1, NO_EXCP = 2 };

static struct severity {
u64 mask;
@@ -40,6 +41,7 @@ static struct severity {
unsigned char mcgres;
unsigned char ser;
unsigned char context;
+ unsigned char excp;
unsigned char covered;
char *msg;
} severities[] = {
@@ -48,6 +50,8 @@ static struct severity {
#define USER .context = IN_USER
#define SER .ser = SER_REQUIRED
#define NOSER .ser = NO_SER
+#define EXCP .excp = EXCP_CONTEXT
+#define ONEXCP .excp = NO_EXCP
#define BITCLR(x) .mask = x, .result = 0
#define BITSET(x) .mask = x, .result = x
#define MCGMASK(x, y) .mcgmask = x, .mcgres = y
@@ -71,16 +75,20 @@ static struct severity {
/* When MCIP is not set something is very confused */
MCESEV(
PANIC, "MCIP not set in MCA handler",
- MCGMASK(MCG_STATUS_MCIP, 0)
+ EXCP, MCGMASK(MCG_STATUS_MCIP, 0)
),
/* Neither return not error IP -- no chance to recover -> PANIC */
MCESEV(
PANIC, "Neither restart nor error IP",
- MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
+ EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
),
MCESEV(
PANIC, "In kernel and no restart IP",
- KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
+ EXCP, KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
+ ),
+ MCESEV(
+ DEFERRED, "Deferred error",
+ NOSER, MASK(MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON, MCI_STATUS_DEFERRED)
),
MCESEV(
KEEP, "Corrected error",
@@ -89,7 +97,7 @@ static struct severity {

/* ignore OVER for UCNA */
MCESEV(
- KEEP, "Uncorrected no action required",
+ UCNA, "Uncorrected no action required",
SER, MASK(MCI_UC_SAR, MCI_STATUS_UC)
),
MCESEV(
@@ -178,8 +186,9 @@ static int error_context(struct mce *m)
return ((m->cs & 3) == 3) ? IN_USER : IN_KERNEL;
}

-int mce_severity(struct mce *m, int tolerant, char **msg)
+int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)
{
+ enum exception excp = (is_excp ? EXCP_CONTEXT : NO_EXCP);
enum context ctx = error_context(m);
struct severity *s;

@@ -194,6 +203,8 @@ int mce_severity(struct mce *m, int tolerant, char **msg)
continue;
if (s->context && ctx != s->context)
continue;
+ if (s->excp && excp != s->excp)
+ continue;
if (msg)
*msg = s->msg;
s->covered = 1;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 61a9668ce..453e9bf 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -668,7 +668,8 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
if (quirk_no_way_out)
quirk_no_way_out(i, m, regs);
}
- if (mce_severity(m, mca_cfg.tolerant, msg) >= MCE_PANIC_SEVERITY)
+ if (mce_severity(m, mca_cfg.tolerant, msg, true) >=
+ MCE_PANIC_SEVERITY)
ret = 1;
}
return ret;
@@ -754,7 +755,7 @@ static void mce_reign(void)
for_each_possible_cpu(cpu) {
int severity = mce_severity(&per_cpu(mces_seen, cpu),
mca_cfg.tolerant,
- &nmsg);
+ &nmsg, true);
if (severity > global_worst) {
msg = nmsg;
global_worst = severity;
@@ -1095,13 +1096,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
*/
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);

- severity = mce_severity(&m, cfg->tolerant, NULL);
+ severity = mce_severity(&m, cfg->tolerant, NULL, true);

/*
- * When machine check was for corrected handler don't touch,
- * unless we're panicing.
+ * When machine check was for corrected/deferred handler don't
+ * touch, unless we're panicing.
*/
- if (severity == MCE_KEEP_SEVERITY && !no_way_out)
+ if ((severity == MCE_KEEP_SEVERITY ||
+ severity == MCE_UCNA_SEVERITY) && !no_way_out)
continue;
__set_bit(i, toclear);
if (severity == MCE_NO_SEVERITY) {
--
1.7.10.4

2014-11-05 04:49:24

by Chen Yucong

[permalink] [raw]
Subject: [PATCH 2/2 v2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

Uncorrected no action required (UCNA) - is a UCR error that is not
signaled via a machine check exception and, instead, is reported to
system software as a corrected machine check error. UCNA errors indicate
that some data in the system is corrupted, but the data has not been
consumed and the processor state is valid and you may continue execution
on this processor. UCNA errors require no action from system software
to continue execution. Note that UCNA errors are supported by the
processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set.
-- Intel SDM Volume 3B

Deferred errors are errors that cannot be corrected by hardware, but
do not cause an immediate interruption in program flow, loss of data
integrity, or corruption of processor state. These errors indicate
that data has been corrupted but not consumed. Hardware writes information
to the status and address registers in the corresponding bank that
identifies the source of the error if deferred errors are enabled for
logging. Deferred errors are not reported via machine check exceptions;
they can be seen by polling the MCi_STATUS registers.
-- ADM64 APM Volume 2

Above two items, both UCNA and Deferred errors belong to detected
errors, but they can't be corrected by hardware, and this is very
similar to Software Recoverable Action Optional (SRAO) errors.
Therefore, we can take some actions that have been used for handling
SRAO errors to handle UCNA and Deferred errors.

Signed-off-by: Chen Yucong <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 50 ++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 453e9bf..37f7649 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -575,6 +575,46 @@ static void mce_read_aux(struct mce *m, int i)
}
}

+static bool mem_deferred_error(struct mce *m)
+{
+ int severity;
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+
+ severity = mce_severity(m, mca_cfg.tolerant, NULL, false);
+
+ if (c->x86_vendor == X86_VENDOR_AMD) {
+ /*
+ * AMD BKDGs - Machine Check Error Codes
+ *
+ * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
+ * a memory-specific error. Note that this field encodes info-
+ * rmation about memory-hierarchy level involved in the error.
+ */
+ if (severity == MCE_DEFERRED_SEVERITY)
+ return (m->status & 0xff00) == BIT(8);
+ } else if (c->x86_vendor == X86_VENDOR_INTEL) {
+ /*
+ * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
+ *
+ * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
+ * indicating a memory error. Bit 8 is used for indicating a
+ * cache hierarchy error. The combination of bit 2 and bit 3
+ * is used for indicating a `generic' cache hierarchy error
+ * But we can't just blindly check the above bits, because if
+ * bit 11 is set, then it is a bus/interconnect error - and
+ * either way the above bits just gives more detail on what
+ * bus/interconnect error happened. Note that bit 12 can be
+ * ignored, as it's the "filter" bit.
+ */
+ if (severity == MCE_UCNA_SEVERITY)
+ return (m->status & 0xef80) == BIT(7) ||
+ (m->status & 0xef00) == BIT(8) ||
+ (m->status & 0xeffc) == 0xc;
+ }
+
+ return false;
+}
+
DEFINE_PER_CPU(unsigned, mce_poll_count);

/*
@@ -630,6 +670,16 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)

if (!(flags & MCP_TIMESTAMP))
m.tsc = 0;
+
+ /*
+ * In the cases where we don't have a valid address after all,
+ * do not add it into the ring buffer.
+ */
+ if (mem_deferred_error(&m) && (m.status & MCI_STATUS_ADDRV)) {
+ mce_ring_add(m.addr >> PAGE_SHIFT);
+ mce_schedule_work();
+ }
+
/*
* Don't get the IP here because it's unlikely to
* have anything to do with the actual error location.
--
1.7.10.4

2014-11-05 18:27:39

by Tony Luck

[permalink] [raw]
Subject: Re: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

> +#define ONEXCP .excp = NO_EXCP

Shouldn't this be named "NOEXCP" and used in the initializations
for the deferred and UCNA table entries?

-Tony

2014-11-06 01:55:40

by Chen Yucong

[permalink] [raw]
Subject: Re: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

On Wed, 2014-11-05 at 10:27 -0800, Tony Luck wrote:
> > +#define ONEXCP .excp = NO_EXCP
>
I'm sorry, this is a typing error. Thanks!

> Shouldn't this be named "NOEXCP" and used in the initializations
> for the deferred and UCNA table entries?
>
In fact, "NOEXCP" can be used in the initialization for the deferred
and UCNA table entries. But it may affect the following snippet in
do_machine_check().

/*
* When machine check was for corrected/deferred handler don't
* touch, unless we're panicing.
*/
if ((severity == MCE_KEEP_SEVERITY ||
severity == MCE_UCNA_SEVERITY) && !no_way_out)
continue;

If `no_way_out' equals 1, we may need to dump/decode corrected/deferred
error information. So if we use "NOEXCP" to initialize the deferred and
UCNA table entries, do_machine_check will skip checking deferred/UCNA
entry when `no_way_out' is set to 1.

thx!
cyc

2014-11-06 15:35:48

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

On Wed, Nov 05, 2014 at 12:47:52PM +0800, Chen Yucong wrote:
> Until now, the mce_severity mechanism can only identify the severity
> of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
> out DEFERRED error for ADM platform.
>
> This patch aims to extend the mce_severity mechanism for handling
> UCNA/DEFERRED error. In order to do this, the patch introduces a new
> severity level - MCE_UCNA/DEFERRED_SEVERITY.
>
> In addition, mce_severity is specific to machine check exception,
> and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
> mechanism in non-exception context, the patch also introduces a new
> argument (is_excp) for mce_severity. `is_excp' is used to explicitly
> specify the calling context of mce_severity.
>
> Signed-off-by: Chen Yucong <[email protected]>
> ---
> arch/x86/include/asm/mce.h | 4 ++++
> arch/x86/kernel/cpu/mcheck/mce-internal.h | 4 +++-
> arch/x86/kernel/cpu/mcheck/mce-severity.c | 21 ++++++++++++++++-----
> arch/x86/kernel/cpu/mcheck/mce.c | 14 ++++++++------
> 4 files changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 958b90f..40b35a5 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -34,6 +34,10 @@
> #define MCI_STATUS_S (1ULL<<56) /* Signaled machine check */
> #define MCI_STATUS_AR (1ULL<<55) /* Action required */
>
> +/* AMD-specific bits */
> +#define MCI_STATUS_DEFERRED (1ULL<<44) /* declare an uncorrected error */
> +#define MCI_STATUS_POISON (1ULL<<43) /* access poisonous data */

In file included from drivers/edac/mce_amd.c:4:0:
drivers/edac/mce_amd.h:35:0: warning: "MCI_STATUS_DEFERRED" redefined
#define MCI_STATUS_DEFERRED BIT_64(44)
^
In file included from drivers/edac/mce_amd.h:6:0,
from drivers/edac/mce_amd.c:4:
./arch/x86/include/asm/mce.h:38:0: note: this is the location of the previous definition
#define MCI_STATUS_DEFERRED (1ULL<<44) /* declare an uncorrected error */
^
In file included from drivers/edac/mce_amd.c:4:0:
drivers/edac/mce_amd.h:36:0: warning: "MCI_STATUS_POISON" redefined
#define MCI_STATUS_POISON BIT_64(43)
^
In file included from drivers/edac/mce_amd.h:6:0,
from drivers/edac/mce_amd.c:4:
./arch/x86/include/asm/mce.h:39:0: note: this is the location of the previous definition
#define MCI_STATUS_POISON (1ULL<<43) /* access poisonous data */
^
...

> +
> /*
> * Note that the full MCACOD field of IA32_MCi_STATUS MSR is
> * bits 15:0. But bit 12 is the 'F' bit, defined for corrected
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
> index 09edd0b..10b4690 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
> +++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
> @@ -3,6 +3,8 @@
>
> enum severity_level {
> MCE_NO_SEVERITY,
> + MCE_DEFERRED_SEVERITY,
> + MCE_UCNA_SEVERITY = MCE_DEFERRED_SEVERITY,
> MCE_KEEP_SEVERITY,
> MCE_SOME_SEVERITY,
> MCE_AO_SEVERITY,
> @@ -21,7 +23,7 @@ struct mce_bank {
> char attrname[ATTR_LEN]; /* attribute name */
> };
>
> -int mce_severity(struct mce *a, int tolerant, char **msg);
> +int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp);
> struct dentry *mce_get_debugfs_dir(void);
>
> extern struct mce_bank *mce_banks;
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> index c370e1c..d31618d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -31,6 +31,7 @@
>
> enum context { IN_KERNEL = 1, IN_USER = 2 };
> enum ser { SER_REQUIRED = 1, NO_SER = 2 };
> +enum exception { EXCP_CONTEXT = 1, NO_EXCP = 2 };
>
> static struct severity {
> u64 mask;
> @@ -40,6 +41,7 @@ static struct severity {
> unsigned char mcgres;
> unsigned char ser;
> unsigned char context;
> + unsigned char excp;
> unsigned char covered;
> char *msg;
> } severities[] = {
> @@ -48,6 +50,8 @@ static struct severity {
> #define USER .context = IN_USER
> #define SER .ser = SER_REQUIRED
> #define NOSER .ser = NO_SER
> +#define EXCP .excp = EXCP_CONTEXT
> +#define ONEXCP .excp = NO_EXCP
> #define BITCLR(x) .mask = x, .result = 0
> #define BITSET(x) .mask = x, .result = x
> #define MCGMASK(x, y) .mcgmask = x, .mcgres = y
> @@ -71,16 +75,20 @@ static struct severity {
> /* When MCIP is not set something is very confused */
> MCESEV(
> PANIC, "MCIP not set in MCA handler",
> - MCGMASK(MCG_STATUS_MCIP, 0)
> + EXCP, MCGMASK(MCG_STATUS_MCIP, 0)
> ),
> /* Neither return not error IP -- no chance to recover -> PANIC */
> MCESEV(
> PANIC, "Neither restart nor error IP",
> - MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
> + EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
> ),
> MCESEV(
> PANIC, "In kernel and no restart IP",
> - KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
> + EXCP, KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
> + ),
> + MCESEV(
> + DEFERRED, "Deferred error",
> + NOSER, MASK(MCI_STATUS_UC|MCI_STATUS_DEFERRED|MCI_STATUS_POISON, MCI_STATUS_DEFERRED)
> ),
> MCESEV(
> KEEP, "Corrected error",
> @@ -89,7 +97,7 @@ static struct severity {
>
> /* ignore OVER for UCNA */
> MCESEV(
> - KEEP, "Uncorrected no action required",
> + UCNA, "Uncorrected no action required",
> SER, MASK(MCI_UC_SAR, MCI_STATUS_UC)
> ),
> MCESEV(
> @@ -178,8 +186,9 @@ static int error_context(struct mce *m)
> return ((m->cs & 3) == 3) ? IN_USER : IN_KERNEL;
> }
>
> -int mce_severity(struct mce *m, int tolerant, char **msg)
> +int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)

You're adding a function argument which is carrying redundant info which
is already present in *m...

> {
> + enum exception excp = (is_excp ? EXCP_CONTEXT : NO_EXCP);

... and so this should be:

excp = ((m->mcg_status & MCG_STATUS_MCIP) ? EXCP_CONTEXT : NO_EXCP);

> enum context ctx = error_context(m);
> struct severity *s;
>
> @@ -194,6 +203,8 @@ int mce_severity(struct mce *m, int tolerant, char **msg)
> continue;
> if (s->context && ctx != s->context)
> continue;
> + if (s->excp && excp != s->excp)
> + continue;
> if (msg)
> *msg = s->msg;
> s->covered = 1;
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 61a9668ce..453e9bf 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -668,7 +668,8 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
> if (quirk_no_way_out)
> quirk_no_way_out(i, m, regs);
> }
> - if (mce_severity(m, mca_cfg.tolerant, msg) >= MCE_PANIC_SEVERITY)
> + if (mce_severity(m, mca_cfg.tolerant, msg, true) >=
> + MCE_PANIC_SEVERITY)
> ret = 1;
> }
> return ret;
> @@ -754,7 +755,7 @@ static void mce_reign(void)
> for_each_possible_cpu(cpu) {
> int severity = mce_severity(&per_cpu(mces_seen, cpu),
> mca_cfg.tolerant,
> - &nmsg);
> + &nmsg, true);
> if (severity > global_worst) {
> msg = nmsg;
> global_worst = severity;
> @@ -1095,13 +1096,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> */
> add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>
> - severity = mce_severity(&m, cfg->tolerant, NULL);
> + severity = mce_severity(&m, cfg->tolerant, NULL, true);
>
> /*
> - * When machine check was for corrected handler don't touch,
> - * unless we're panicing.
> + * When machine check was for corrected/deferred handler don't
> + * touch, unless we're panicing.
> */
> - if (severity == MCE_KEEP_SEVERITY && !no_way_out)
> + if ((severity == MCE_KEEP_SEVERITY ||
> + severity == MCE_UCNA_SEVERITY) && !no_way_out)
> continue;
> __set_bit(i, toclear);
> if (severity == MCE_NO_SEVERITY) {
> --
> 1.7.10.4
>
>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-06 15:41:15

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

On Thu, Nov 06, 2014 at 09:54:19AM +0800, Chen Yucong wrote:
> If `no_way_out' equals 1, we may need to dump/decode corrected/deferred
> error information. So if we use "NOEXCP" to initialize the deferred and
> UCNA table entries, do_machine_check will skip checking deferred/UCNA
> entry when `no_way_out' is set to 1.

I think we want to log every error when we're panicking, even the
deferred ones.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-06 15:49:01

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2 v2] x86, mce: support memory error recovery for both UCNA and Deferred error in machine_check_poll

On Wed, Nov 05, 2014 at 12:47:53PM +0800, Chen Yucong wrote:
> Uncorrected no action required (UCNA) - is a UCR error that is not

Please explain "UCR" if you're using it out of the rest of the SDM text.

> signaled via a machine check exception and, instead, is reported to
> system software as a corrected machine check error. UCNA errors indicate
> that some data in the system is corrupted, but the data has not been
> consumed and the processor state is valid and you may continue execution
> on this processor. UCNA errors require no action from system software
> to continue execution. Note that UCNA errors are supported by the
> processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set.
> -- Intel SDM Volume 3B
>
> Deferred errors are errors that cannot be corrected by hardware, but
> do not cause an immediate interruption in program flow, loss of data
> integrity, or corruption of processor state. These errors indicate
> that data has been corrupted but not consumed. Hardware writes information
> to the status and address registers in the corresponding bank that
> identifies the source of the error if deferred errors are enabled for
> logging. Deferred errors are not reported via machine check exceptions;
> they can be seen by polling the MCi_STATUS registers.
> -- ADM64 APM Volume 2
^^^

Please try to spell "AMD" correctly. Your first patch has "ADM" too.

>
> Above two items, both UCNA and Deferred errors belong to detected
> errors, but they can't be corrected by hardware, and this is very
> similar to Software Recoverable Action Optional (SRAO) errors.
> Therefore, we can take some actions that have been used for handling
> SRAO errors to handle UCNA and Deferred errors.


>
> Signed-off-by: Chen Yucong <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 50 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 453e9bf..37f7649 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -575,6 +575,46 @@ static void mce_read_aux(struct mce *m, int i)
> }
> }
>
> +static bool mem_deferred_error(struct mce *m)
> +{
> + int severity;
> + struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> + severity = mce_severity(m, mca_cfg.tolerant, NULL, false);
> +
> + if (c->x86_vendor == X86_VENDOR_AMD) {
> + /*
> + * AMD BKDGs - Machine Check Error Codes
> + *
> + * Bit 8 of ErrCode[15:0] of MCi_STATUS is used for indicating
> + * a memory-specific error. Note that this field encodes info-
> + * rmation about memory-hierarchy level involved in the error.
> + */
> + if (severity == MCE_DEFERRED_SEVERITY)
> + return (m->status & 0xff00) == BIT(8);
> + } else if (c->x86_vendor == X86_VENDOR_INTEL) {
> + /*
> + * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
> + *
> + * Bit 7 of the MCACOD field of IA32_MCi_STATUS is used for
> + * indicating a memory error. Bit 8 is used for indicating a
> + * cache hierarchy error. The combination of bit 2 and bit 3
> + * is used for indicating a `generic' cache hierarchy error
> + * But we can't just blindly check the above bits, because if
> + * bit 11 is set, then it is a bus/interconnect error - and
> + * either way the above bits just gives more detail on what
> + * bus/interconnect error happened. Note that bit 12 can be
> + * ignored, as it's the "filter" bit.
> + */
> + if (severity == MCE_UCNA_SEVERITY)
> + return (m->status & 0xef80) == BIT(7) ||
> + (m->status & 0xef00) == BIT(8) ||
> + (m->status & 0xeffc) == 0xc;
> + }
> +
> + return false;
> +}

This function is combining deferred and memory error checks. Please do
this differently:

if (memory_error(m))
if (deferred_error(m))
blabla

so that the memory_error() function can be used in other code paths
separately.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-06 17:27:29

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

>> +int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)
>
> You're adding a function argument which is carrying redundant info which
> is already present in *m...
>
>> {
>> + enum exception excp = (is_excp ? EXCP_CONTEXT : NO_EXCP);
>
> ... and so this should be:
>
> excp = ((m->mcg_status & MCG_STATUS_MCIP) ? EXCP_CONTEXT : NO_EXCP);

That only works if you trust that MCG_STATUS.MCIP is correctly set to indicate whether
we are in MCE or CMCI context. The current code doesn't do that - we check for, and flag
it as a fatal error if we find ourselves in the MCE handler with MCIP==0. If you add the
code you suggest, then it completely neuters the severity check:

MCESEV(
PANIC, "MCIP not set in MCA handler",
MCGMASK(MCG_STATUS_MCIP, 0)
),

I'm also a bit worried about the check for DEFERRED errors in
the severity table. That isn't conditional on an:
if (intel) do_onething(); else /*amd/ do_anotherthing();
So if we can misinterpret some bits on an Intel cpu as if
we had a deferred error.

Overall, this might have seemed like a good idea to begin with,
but we are piling more complexity into mce_severity() [a routine
which everyone agrees is already tough to understand].

It doesn't even buy us some simple code in the polling path.
We still have to do more checks on MCi_STATUS.MCACOD above
and beyond what we get back from mce_severity()

Boris: Do you still want to keep pushing this way? Or should
we look back fondly at version 1 of this patch?

-Tony


????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-11-06 18:22:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

On Thu, Nov 06, 2014 at 05:27:14PM +0000, Luck, Tony wrote:
> >> +int mce_severity(struct mce *m, int tolerant, char **msg, bool is_excp)
> >
> > You're adding a function argument which is carrying redundant info which
> > is already present in *m...
> >
> >> {
> >> + enum exception excp = (is_excp ? EXCP_CONTEXT : NO_EXCP);
> >
> > ... and so this should be:
> >
> > excp = ((m->mcg_status & MCG_STATUS_MCIP) ? EXCP_CONTEXT : NO_EXCP);
>
> That only works if you trust that MCG_STATUS.MCIP is correctly set to indicate whether
> we are in MCE or CMCI context. The current code doesn't do that - we check for, and flag
> it as a fatal error if we find ourselves in the MCE handler with MCIP==0. If you add the
> code you suggest, then it completely neuters the severity check:
>
> MCESEV(
> PANIC, "MCIP not set in MCA handler",
> MCGMASK(MCG_STATUS_MCIP, 0)
> ),

I was looking at the version Chen did:

MCESEV(
PANIC, "MCIP not set in MCA handler",
EXCP, MCGMASK(MCG_STATUS_MCIP, 0)
),

and then

if (s->excp && excp != s->excp)
continue;

Basically, this check is being done only for machine check exceptions
only.

> I'm also a bit worried about the check for DEFERRED errors in
> the severity table. That isn't conditional on an:
> if (intel) do_onething(); else /*amd/ do_anotherthing();
> So if we can misinterpret some bits on an Intel cpu as if
> we had a deferred error.
>
> Overall, this might have seemed like a good idea to begin with,
> but we are piling more complexity into mce_severity() [a routine
> which everyone agrees is already tough to understand].
>
> It doesn't even buy us some simple code in the polling path.
> We still have to do more checks on MCi_STATUS.MCACOD above
> and beyond what we get back from mce_severity()
>
> Boris: Do you still want to keep pushing this way? Or should
> we look back fondly at version 1 of this patch?

You mean the one which doesn't touch mce_severity() at all and decides
on deferred errors in a separate, completely unrelated function? Yeah,
that might be cleaner after all.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-06 18:32:41

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

> Basically, this check is being done only for machine check exceptions
> only.

But you proposed setting excp by looking at mcg_status:
> excp = ((m->mcg_status & MCG_STATUS_MCIP) ? EXCP_CONTEXT : NO_EXCP);

Which makes the code rather self referential. If we actually did arrive in MCE handler
with MCIP == 0 ... then your code would pretend that we'd arrived here from the
poll code, and skip over the test for MCIP - so fail to report that MCIP wasn't set.

-Tony

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-11-06 18:56:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

On Thu, Nov 06, 2014 at 06:32:37PM +0000, Luck, Tony wrote:
> > Basically, this check is being done only for machine check exceptions
> > only.
>
> But you proposed setting excp by looking at mcg_status:
> > excp = ((m->mcg_status & MCG_STATUS_MCIP) ? EXCP_CONTEXT : NO_EXCP);
>
> Which makes the code rather self referential. If we actually did arrive in MCE handler
> with MCIP == 0 ... then your code would pretend that we'd arrived here from the
> poll code, and skip over the test for MCIP - so fail to report that MCIP wasn't set.

Is that ever possible - to have a discrepancy between the setting of
MCIP and where we call mce_severity()?

I'm under the assumption that at all times, when we get a MCE, MCIP will
be set. For example, mce_gather_info() reads MCG_STATUS before we call
mce_severity() in do_machine_check().

Or am I missing something?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-11-06 21:24:06

by Tony Luck

[permalink] [raw]
Subject: RE: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

> I'm under the assumption that at all times, when we get a MCE, MCIP will
> be set. For example, mce_gather_info() reads MCG_STATUS before we call
> mce_severity() in do_machine_check().
>
> Or am I missing something?

Architecturally it is true that MCIP will be set when machine check is signaled.
But, sometimes there are bugs. BIOS has a hook to get an SMI to see the
event before the OS sees the machine check - which gives lots of scope for
things to not happen by the book. If MCIP isn't set correctly, I'd like to get
on and panic quickly - because all sorts of bad things will happen if a nested
machine check happens and isn't caught because MCIP wasn't set in the first
machine check.

-Tony
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-11-07 12:12:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2 v2] x86, mce, severity: extend the the mce_severity

On Thu, Nov 06, 2014 at 09:24:01PM +0000, Luck, Tony wrote:
> Architecturally it is true that MCIP will be set when machine check is
> signaled. But, sometimes there are bugs. BIOS has a hook to get an SMI
> to see the event before the OS sees the machine check - which gives
> lots of scope for things to not happen by the book. If MCIP isn't set
> correctly, I'd like to get on and panic quickly - because all sorts
> of bad things will happen if a nested machine check happens and isn't
> caught because MCIP wasn't set in the first machine check.

I won't even ask about your actual experience with stuff not setting
MCIP :-)

Anyway, noted. So for the mce_severity rewrite we should make sure to
note from which context we're being called as the more reliable method,
instead of looking at MCGSTATUS.MCIP.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--