2015-04-30 19:52:27

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: [PATCH 0/4] Enable deferred error interrupts

Newer AMD processors can generate deferred errors and can be configured
to generate APIC interrupts on such events.

This patchset introduces a new interrupt handler for deferred errors and
configures the HW if the feature is present.

Patch1: Defines SUCCOR cpuid bit. This indicates prescence of features
such as data poisoning and deferred error interrupts in hardware.
Patch2: Implement the interrupt handler.
- setup vector number, build the interrupt and implement handler
function in this patch.
Patch3, Patch 4: Cleanups in the code. No functional changes are introduced.

Aravind Gopalakrishnan (4):
x86/mce: Define 'SUCCOR' cpuid bit
x86/mce/amd: Introduce deferred error interrupt handler
x86, irq: Cleanup ordering of vector numbers
x86/mce/amd: Rename setup_APIC_mce

arch/x86/include/asm/entry_arch.h | 3 +
arch/x86/include/asm/hardirq.h | 3 +
arch/x86/include/asm/hw_irq.h | 2 +
arch/x86/include/asm/irq_vectors.h | 11 ++--
arch/x86/include/asm/mce.h | 6 +-
arch/x86/include/asm/trace/irq_vectors.h | 6 ++
arch/x86/include/asm/traps.h | 3 +-
arch/x86/kernel/cpu/mcheck/mce.c | 1 +
arch/x86/kernel/cpu/mcheck/mce_amd.c | 105 ++++++++++++++++++++++++++++++-
arch/x86/kernel/entry_64.S | 5 ++
arch/x86/kernel/irq.c | 6 ++
arch/x86/kernel/irqinit.c | 4 ++
arch/x86/kernel/traps.c | 4 ++
13 files changed, 150 insertions(+), 9 deletions(-)

--
1.9.1


2015-04-30 19:52:47

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit

SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
It indicates support for data poisoning in HW and deferred error
interrupts.

Add new bitfield in mce_vendor_flags for this.
We use this to verify prescence of deferred error interrupts
before we enable them in mce_amd.c

Signed-off-by: Aravind Gopalakrishnan <[email protected]>
---
arch/x86/include/asm/mce.h | 3 ++-
arch/x86/kernel/cpu/mcheck/mce.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1f5a86d..dfcb664 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -118,7 +118,8 @@ struct mca_config {

struct mce_vendor_flags {
__u64 overflow_recov : 1, /* cpuid_ebx(80000007) */
- __reserved_0 : 63;
+ succor : 1,
+ __reserved_0 : 62;
};
extern struct mce_vendor_flags mce_flags;

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e535533..de61f62e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1640,6 +1640,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
case X86_VENDOR_AMD:
mce_amd_feature_init(c);
mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
+ mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;
break;
default:
break;
--
1.9.1

2015-04-30 19:53:23

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

Changes introduced in the patch-
- Assign vector number 0xf4 for Deferred errors
- Declare deferred_interrupt, allocate gate and bind it
to DEFERRED_APIC_VECTOR.
- Declare smp_deferred_interrupt to be used as the
entry point for the interrupt in mce_amd.c
- Define trace_deferred_interrupt for tracing
- Enable deferred error interrupt selectively upon detection
of 'succor' bitfield
- Setup amd_deferred_error_interrupt() to handle the interrupt
and assign it to def_int_vector if feature is present in HW.
Else, let default handler deal with it.
- Provide Deferred error interrupt stats on
/proc/interrupts by incrementing irq_deferred_count

Signed-off-by: Aravind Gopalakrishnan <[email protected]>
---
arch/x86/include/asm/entry_arch.h | 3 +
arch/x86/include/asm/hardirq.h | 3 +
arch/x86/include/asm/hw_irq.h | 2 +
arch/x86/include/asm/irq_vectors.h | 1 +
arch/x86/include/asm/mce.h | 3 +
arch/x86/include/asm/trace/irq_vectors.h | 6 ++
arch/x86/include/asm/traps.h | 3 +-
arch/x86/kernel/cpu/mcheck/mce_amd.c | 101 +++++++++++++++++++++++++++++++
arch/x86/kernel/entry_64.S | 5 ++
arch/x86/kernel/irq.c | 6 ++
arch/x86/kernel/irqinit.c | 4 ++
arch/x86/kernel/traps.c | 4 ++
12 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index dc5fa66..f7b957b 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -50,4 +50,7 @@ BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
#endif

+#ifdef CONFIG_X86_MCE_AMD
+BUILD_INTERRUPT(deferred_interrupt, DEFERRED_APIC_VECTOR)
+#endif
#endif
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 0f5fb6b..448451c 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -33,6 +33,9 @@ typedef struct {
#ifdef CONFIG_X86_MCE_THRESHOLD
unsigned int irq_threshold_count;
#endif
+#ifdef CONFIG_X86_MCE_AMD
+ unsigned int irq_deferred_count;
+#endif
#if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
unsigned int irq_hv_callback_count;
#endif
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e9571dd..7cf2670 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -73,6 +73,7 @@ extern asmlinkage void invalidate_interrupt31(void);
extern asmlinkage void irq_move_cleanup_interrupt(void);
extern asmlinkage void reboot_interrupt(void);
extern asmlinkage void threshold_interrupt(void);
+extern asmlinkage void deferred_interrupt(void);

extern asmlinkage void call_function_interrupt(void);
extern asmlinkage void call_function_single_interrupt(void);
@@ -87,6 +88,7 @@ extern void trace_spurious_interrupt(void);
extern void trace_thermal_interrupt(void);
extern void trace_reschedule_interrupt(void);
extern void trace_threshold_interrupt(void);
+extern void trace_deferred_interrupt(void);
extern void trace_call_function_interrupt(void);
extern void trace_call_function_single_interrupt(void);
#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 666c89e..cee723f 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -113,6 +113,7 @@
#define IRQ_WORK_VECTOR 0xf6

#define UV_BAU_MESSAGE 0xf5
+#define DEFERRED_APIC_VECTOR 0xf4

/* Vector on which hypervisor callbacks will be delivered */
#define HYPERVISOR_CALLBACK_VECTOR 0xf3
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index dfcb664..b21b887 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -224,6 +224,9 @@ void do_machine_check(struct pt_regs *, long);
extern void (*mce_threshold_vector)(void);
extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu);

+/* Deferred error interrupt handler */
+extern void (*deferred_int_vector)(void);
+
/*
* Thermal handler
*/
diff --git a/arch/x86/include/asm/trace/irq_vectors.h b/arch/x86/include/asm/trace/irq_vectors.h
index 4cab890..3c1f0a7 100644
--- a/arch/x86/include/asm/trace/irq_vectors.h
+++ b/arch/x86/include/asm/trace/irq_vectors.h
@@ -101,6 +101,12 @@ DEFINE_IRQ_VECTOR_EVENT(call_function_single);
DEFINE_IRQ_VECTOR_EVENT(threshold_apic);

/*
+ * deferred_apic - called when entering/exiting a deferred apic interrupt
+ * vector handler
+ */
+DEFINE_IRQ_VECTOR_EVENT(deferred_apic);
+
+/*
* thermal_apic - called when entering/exiting a thermal apic interrupt
* vector handler
*/
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 4e49d7d..ef937b7 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -108,7 +108,8 @@ extern int panic_on_unrecovered_nmi;
void math_emulate(struct math_emu_info *);
#ifndef CONFIG_X86_32
asmlinkage void smp_thermal_interrupt(void);
-asmlinkage void mce_threshold_interrupt(void);
+asmlinkage void smp_threshold_interrupt(void);
+asmlinkage void smp_deferred_interrupt(void);
#endif

extern enum ctx_state ist_enter(struct pt_regs *regs);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 55ad9b3..ce82f0b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -12,6 +12,8 @@
* - added support for AMD Family 0x10 processors
* May 2012
* - major scrubbing
+ * April 2015
+ * - add support for deferred error interrupts
*
* All MC4_MISCi registers are shared between multi-cores
*/
@@ -32,6 +34,7 @@
#include <asm/idle.h>
#include <asm/mce.h>
#include <asm/msr.h>
+#include <asm/trace/irq_vectors.h>

#define NR_BLOCKS 9
#define THRESHOLD_MAX 0xFFF
@@ -47,6 +50,13 @@
#define MASK_BLKPTR_LO 0xFF000000
#define MCG_XBLK_ADDR 0xC0000400

+/* Deferred error settings */
+#define MSR_CU_DEF_ERR 0xC0000410
+#define MASK_DEF_LVTOFF 0x000000F0
+#define MASK_DEF_INT_TYPE 0x00000006
+#define DEF_LVT_OFF 0x2
+#define DEF_INT_TYPE_APIC 0x2
+
static const char * const th_names[] = {
"load_store",
"insn_fetch",
@@ -60,6 +70,15 @@ static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
static DEFINE_PER_CPU(unsigned char, bank_map); /* see which banks are on */

static void amd_threshold_interrupt(void);
+static void amd_deferred_error_interrupt(void);
+
+/* Setup default deferred error interrupt handler */
+static void default_deferred_interrupt(void)
+{
+ pr_err("Unexpected deferred interrupt at vector %x\n",
+ DEFERRED_APIC_VECTOR);
+}
+void (*def_int_vector)(void) = default_deferred_interrupt;

/*
* CPU Initialization
@@ -205,6 +224,62 @@ static int setup_APIC_mce(int reserved, int new)
return reserved;
}

+static int setup_APIC_deferred_error(int reserved, int new)
+{
+ if (reserved < 0 && !setup_APIC_eilvt(new, DEFERRED_APIC_VECTOR,
+ APIC_EILVT_MSG_FIX, 0))
+ return new;
+
+ return reserved;
+}
+
+static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
+{
+ u32 low = 0, high = 0;
+ int def_offset = -1, def_new;
+
+ if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
+ return;
+
+ def_new = (low & MASK_DEF_LVTOFF) >> 4;
+ if (c->x86 == 0x15 && c->x86_model == 0x60 &&
+ !(low & MASK_DEF_LVTOFF)) {
+ def_new = DEF_LVT_OFF;
+ low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
+ }
+
+ def_offset = setup_APIC_deferred_error(def_offset, def_new);
+
+ if ((def_offset == def_new) &&
+ (def_int_vector != amd_deferred_error_interrupt))
+ def_int_vector = amd_deferred_error_interrupt;
+
+ low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
+ wrmsr(MSR_CU_DEF_ERR, low, high);
+}
+
+static inline void __smp_deferred_interrupt(void)
+{
+ inc_irq_stat(irq_deferred_count);
+ def_int_vector();
+}
+
+asmlinkage __visible void smp_deferred_interrupt(void)
+{
+ entering_irq();
+ __smp_deferred_interrupt();
+ exiting_ack_irq();
+}
+
+asmlinkage __visible void smp_trace_deferred_interrupt(void)
+{
+ entering_irq();
+ trace_deferred_apic_entry(DEFERRED_APIC_VECTOR);
+ __smp_deferred_interrupt();
+ trace_deferred_apic_exit(DEFERRED_APIC_VECTOR);
+ exiting_ack_irq();
+}
+
/* cpu init entry point, called from mce.c with preempt off */
void mce_amd_feature_init(struct cpuinfo_x86 *c)
{
@@ -262,6 +337,32 @@ init:
mce_threshold_block_init(&b, offset);
}
}
+
+ if (mce_flags.succor)
+ deferred_error_interrupt_enable(c);
+}
+
+/* Apic interrupt handler for deferred errors */
+static void amd_deferred_error_interrupt(void)
+{
+ u64 status;
+ unsigned int bank;
+ struct mce m;
+
+ for (bank = 0; bank < mca_cfg.banks; ++bank) {
+ rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
+
+ if (!(status & MCI_STATUS_VAL) ||
+ !(status & MCI_STATUS_DEFERRED))
+ continue;
+
+ mce_setup(&m);
+ m.bank = bank;
+ m.status = status;
+ mce_log(&m);
+ wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
+ break;
+ }
}

/*
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e952f6b..820965c 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -914,6 +914,11 @@ apicinterrupt THRESHOLD_APIC_VECTOR \
threshold_interrupt smp_threshold_interrupt
#endif

+#ifdef CONFIG_X86_MCE_AMD
+apicinterrupt DEFERRED_APIC_VECTOR \
+ deferred_interrupt smp_deferred_interrupt
+#endif
+
#ifdef CONFIG_X86_THERMAL_VECTOR
apicinterrupt THERMAL_APIC_VECTOR \
thermal_interrupt smp_thermal_interrupt
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index e5952c2..406f204 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -116,6 +116,12 @@ int arch_show_interrupts(struct seq_file *p, int prec)
seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
seq_puts(p, " Threshold APIC interrupts\n");
#endif
+#ifdef CONFIG_X86_MCE_AMD
+ seq_printf(p, "%*s: ", prec, "DEF");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", irq_stats(j)->irq_deferred_count);
+ seq_puts(p, " Deferred APIC interrupts\n");
+#endif
#ifdef CONFIG_X86_MCE
seq_printf(p, "%*s: ", prec, "MCE");
for_each_online_cpu(j)
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index cd10a64..e0ffd29 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -135,6 +135,10 @@ static void __init apic_intr_init(void)
alloc_intr_gate(THRESHOLD_APIC_VECTOR, threshold_interrupt);
#endif

+#ifdef CONFIG_X86_MCE_AMD
+ alloc_intr_gate(DEFERRED_APIC_VECTOR, deferred_interrupt);
+#endif
+
#ifdef CONFIG_X86_LOCAL_APIC
/* self generated IPI for local APIC timer */
alloc_intr_gate(LOCAL_TIMER_VECTOR, apic_timer_interrupt);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 324ab52..dbfe07c2 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -827,6 +827,10 @@ asmlinkage __visible void __attribute__((weak)) smp_threshold_interrupt(void)
{
}

+asmlinkage __visible void __attribute__((weak)) smp_deferred_interrupt(void)
+{
+}
+
/*
* 'math_state_restore()' saves the current math information in the
* old math state array, and gets the new ones from the current task
--
1.9.1

2015-04-30 19:52:37

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: [PATCH 3/4] x86, irq: Cleanup ordering of vector numbers

Enforcing proper descending order of vector number assignments here.
No functional change.

Signed-off-by: Aravind Gopalakrishnan <[email protected]>
---
arch/x86/include/asm/irq_vectors.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index cee723f..244d486 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -102,11 +102,6 @@
*/
#define X86_PLATFORM_IPI_VECTOR 0xf7

-/* Vector for KVM to deliver posted interrupt IPI */
-#ifdef CONFIG_HAVE_KVM
-#define POSTED_INTR_VECTOR 0xf2
-#endif
-
/*
* IRQ work vector:
*/
@@ -118,6 +113,11 @@
/* Vector on which hypervisor callbacks will be delivered */
#define HYPERVISOR_CALLBACK_VECTOR 0xf3

+/* Vector for KVM to deliver posted interrupt IPI */
+#ifdef CONFIG_HAVE_KVM
+#define POSTED_INTR_VECTOR 0xf2
+#endif
+
/*
* Local APIC timer IRQ vector is on a different priority level,
* to work around the 'lost local interrupt if more than 2 IRQ
--
1.9.1

2015-04-30 19:52:32

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: [PATCH 4/4] x86/mce/amd: Rename setup_APIC_mce

'setup_APIC_mce' doesn't give us an indication of why we are
going to program LVT. Make that explicit by renaming it to
setup_APIC_mce_threshold so we know.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce_amd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index ce82f0b..2952864 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -215,7 +215,7 @@ static void mce_threshold_block_init(struct threshold_block *b, int offset)
threshold_restart_bank(&tr);
};

-static int setup_APIC_mce(int reserved, int new)
+static int setup_APIC_mce_threshold(int reserved, int new)
{
if (reserved < 0 && !setup_APIC_eilvt(new, THRESHOLD_APIC_VECTOR,
APIC_EILVT_MSG_FIX, 0))
@@ -327,7 +327,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)

b.interrupt_enable = 1;
new = (high & MASK_LVTOFF_HI) >> 20;
- offset = setup_APIC_mce(offset, new);
+ offset = setup_APIC_mce_threshold(offset, new);

if ((offset == new) &&
(mce_threshold_vector != amd_threshold_interrupt))
--
1.9.1

2015-04-30 20:41:52

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On Thu, Apr 30, 2015 at 7:49 AM, Aravind Gopalakrishnan
<[email protected]> wrote:
> Changes introduced in the patch-
> - Assign vector number 0xf4 for Deferred errors
> - Declare deferred_interrupt, allocate gate and bind it
> to DEFERRED_APIC_VECTOR.
> - Declare smp_deferred_interrupt to be used as the
> entry point for the interrupt in mce_amd.c
> - Define trace_deferred_interrupt for tracing
> - Enable deferred error interrupt selectively upon detection
> of 'succor' bitfield
> - Setup amd_deferred_error_interrupt() to handle the interrupt
> and assign it to def_int_vector if feature is present in HW.
> Else, let default handler deal with it.
> - Provide Deferred error interrupt stats on
> /proc/interrupts by incrementing irq_deferred_count

You're calling these "deferred interrupts" all over (e.g.
irq_deferred_count, deferred_int_handler, etc). That seems like it'll
be confusing. They're deferred errors, not deferred interrupts.

--Andy

2015-05-01 04:16:42

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler


On 4/30/15 3:41 PM, Andy Lutomirski wrote:
> On Thu, Apr 30, 2015 at 7:49 AM, Aravind Gopalakrishnan
> <[email protected]> wrote:
>> Changes introduced in the patch-
>> - Assign vector number 0xf4 for Deferred errors
>> - Declare deferred_interrupt, allocate gate and bind it
>> to DEFERRED_APIC_VECTOR.
>> - Declare smp_deferred_interrupt to be used as the
>> entry point for the interrupt in mce_amd.c
>> - Define trace_deferred_interrupt for tracing
>> - Enable deferred error interrupt selectively upon detection
>> of 'succor' bitfield
>> - Setup amd_deferred_error_interrupt() to handle the interrupt
>> and assign it to def_int_vector if feature is present in HW.
>> Else, let default handler deal with it.
>> - Provide Deferred error interrupt stats on
>> /proc/interrupts by incrementing irq_deferred_count
> You're calling these "deferred interrupts" all over (e.g.
> irq_deferred_count, deferred_int_handler, etc). That seems like it'll
> be confusing. They're deferred errors, not deferred interrupts.
>

I used the term as it is an interrupt due to the deferred error.
Would 'deferred_err_interrupt' be more apt? Maybe
'irq_deferred_error_count' for the counter?

Thanks,
-Aravind.

2015-05-01 07:18:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/4] Enable deferred error interrupts


* Aravind Gopalakrishnan <[email protected]> wrote:

> Newer AMD processors can generate deferred errors and can be configured
> to generate APIC interrupts on such events.

What's the wider context of this? What is it good for?

I suspect it's MCE related, but only from the diffstat:

> arch/x86/kernel/cpu/mcheck/mce.c | 1 +
> arch/x86/kernel/cpu/mcheck/mce_amd.c | 105 ++++++++++++++++++++++++++++++-

Please provide proper high level description for the changes.

Thanks,

Ingo

2015-05-01 09:36:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On Thu, Apr 30, 2015 at 11:16:30PM -0500, Aravind Gopalakrishnan wrote:
> I used the term as it is an interrupt due to the deferred error.
> Would 'deferred_err_interrupt' be more apt? Maybe 'irq_deferred_error_count'
> for the counter?

Yeah, I think it is important to stick to the "deferred error" naming
as those are interrupts announcing deferred errors and not deferred
interrupts.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-01 10:25:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit

On Thu, Apr 30, 2015 at 09:49:22AM -0500, Aravind Gopalakrishnan wrote:
> SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
> It indicates support for data poisoning in HW and deferred error
> interrupts.
>
> Add new bitfield in mce_vendor_flags for this.
> We use this to verify prescence of deferred error interrupts
> before we enable them in mce_amd.c
>
> Signed-off-by: Aravind Gopalakrishnan <[email protected]>
> ---
> arch/x86/include/asm/mce.h | 3 ++-
> arch/x86/kernel/cpu/mcheck/mce.c | 1 +
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 1f5a86d..dfcb664 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -118,7 +118,8 @@ struct mca_config {
>
> struct mce_vendor_flags {
> __u64 overflow_recov : 1, /* cpuid_ebx(80000007) */
> - __reserved_0 : 63;
> + succor : 1,

Please add that CPUID bit definition from the commit message here too so
that we know what it means.

> + __reserved_0 : 62;
> };
> extern struct mce_vendor_flags mce_flags;
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index e535533..de61f62e 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1640,6 +1640,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
> case X86_VENDOR_AMD:
> mce_amd_feature_init(c);
> mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
> + mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;

mce_flags.succor = !!(cpuid_ebx(0x80000007) & BIT(1));

is a common way of assigning truth values from bits in the kernel.

You can change the above one to use BIT(0) too, while at it, and
vertically align the assignments.

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-01 14:50:54

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 0/4] Enable deferred error interrupts

On 5/1/2015 2:18 AM, Ingo Molnar wrote:
> * Aravind Gopalakrishnan <[email protected]> wrote:
>
>> Newer AMD processors can generate deferred errors and can be configured
>> to generate APIC interrupts on such events.
> What's the wider context of this? What is it good for?
>
> I suspect it's MCE related, but only from the diffstat:

Deferred errors indicate error conditions that were not corrected, but
require no action from S/W (or action is optional).
These errors provide info about a latent UC MCE that can occur when a
poisoned data is consumed by the processor.

HTH,

I shall include the short description in the cover letter of V2.

Thanks,
-Aravind.


>> arch/x86/kernel/cpu/mcheck/mce.c | 1 +
>> arch/x86/kernel/cpu/mcheck/mce_amd.c | 105 ++++++++++++++++++++++++++++++-
> Please provide proper high level description for the changes.
>
>

2015-05-01 14:51:05

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On 5/1/2015 4:36 AM, Borislav Petkov wrote:
> On Thu, Apr 30, 2015 at 11:16:30PM -0500, Aravind Gopalakrishnan wrote:
>> I used the term as it is an interrupt due to the deferred error.
>> Would 'deferred_err_interrupt' be more apt? Maybe 'irq_deferred_error_count'
>> for the counter?
> Yeah, I think it is important to stick to the "deferred error" naming
> as those are interrupts announcing deferred errors and not deferred
> interrupts.
>

Ok. I'll shall do this substitution in V2:
s/deferred_interrupt/deferred_error

and in mce_amd.c, s/def_int_vector/def_err_int_vector

unless you have any objections.

Thanks,
-Aravind.

2015-05-01 14:54:31

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit

On 5/1/2015 5:25 AM, Borislav Petkov wrote:
> On Thu, Apr 30, 2015 at 09:49:22AM -0500, Aravind Gopalakrishnan wrote:
>> SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
>> It indicates support for data poisoning in HW and deferred error
>> interrupts.
>>
>>
>>
>> struct mce_vendor_flags {
>> __u64 overflow_recov : 1, /* cpuid_ebx(80000007) */
>> - __reserved_0 : 63;
>> + succor : 1,
> Please add that CPUID bit definition from the commit message here too so
> that we know what it means.

Will do.

Shall I beef up comment regarding 'overflow_recov' too?
Something like 'overflow recovery cpuid bit indicates that overflow
conditions are not fatal'
would provide a better indication of the usage of the bit IMHO.

>> case X86_VENDOR_AMD:
>> mce_amd_feature_init(c);
>> mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
>> + mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;
> mce_flags.succor = !!(cpuid_ebx(0x80000007) & BIT(1));
>
> is a common way of assigning truth values from bits in the kernel.
>
> You can change the above one to use BIT(0) too, while at it, and
> vertically align the assignments.
>
>


Ok, Will do.

Thanks,
-Aravind.

2015-05-01 15:09:58

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit

On 04/30/2015 07:49 AM, Aravind Gopalakrishnan wrote:
> @@ -1640,6 +1640,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
> case X86_VENDOR_AMD:
> mce_amd_feature_init(c);
> mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
> + mce_flags.succor = (cpuid_ebx(0x80000007) & 0x2) ? 1 : 0;
> break;
> default:
> break;

Is there a reason to add the cpuid detection like this instead of adding
an X86_FEATURE_ for it and using cpu_has() and friends? Doing that
would also let you see the bit in /proc/cpuinfo.

2015-05-01 16:20:55

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit

On Fri, May 01, 2015 at 08:09:40AM -0700, Dave Hansen wrote:
> Is there a reason to add the cpuid detection like this instead of adding
> an X86_FEATURE_ for it and using cpu_has() and friends? Doing that
> would also let you see the bit in /proc/cpuinfo.

Well, as those are RAS-specific, it didn't seem to make a whole lotta
sense to advertise them globally (they're used solely in the MCE code).

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-03 09:01:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/4] x86/mce: Define 'SUCCOR' cpuid bit

On Fri, May 01, 2015 at 09:54:19AM -0500, Aravind Gopalakrishnan wrote:
> Shall I beef up comment regarding 'overflow_recov' too?
> Something like 'overflow recovery cpuid bit indicates that overflow
> conditions are not fatal'
> would provide a better indication of the usage of the bit IMHO.

Ok.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-03 09:22:33

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On Thu, Apr 30, 2015 at 09:49:23AM -0500, Aravind Gopalakrishnan wrote:
> Changes introduced in the patch-
> - Assign vector number 0xf4 for Deferred errors
> - Declare deferred_interrupt, allocate gate and bind it
> to DEFERRED_APIC_VECTOR.
> - Declare smp_deferred_interrupt to be used as the
> entry point for the interrupt in mce_amd.c
> - Define trace_deferred_interrupt for tracing
> - Enable deferred error interrupt selectively upon detection
> of 'succor' bitfield
> - Setup amd_deferred_error_interrupt() to handle the interrupt
> and assign it to def_int_vector if feature is present in HW.
> Else, let default handler deal with it.
> - Provide Deferred error interrupt stats on
> /proc/interrupts by incrementing irq_deferred_count

This commit message should explain the feature in more high-level way,
what is it good for and so on, not what you're adding.

That I can see. :-)

> Signed-off-by: Aravind Gopalakrishnan <[email protected]>
> ---
> arch/x86/include/asm/entry_arch.h | 3 +
> arch/x86/include/asm/hardirq.h | 3 +
> arch/x86/include/asm/hw_irq.h | 2 +
> arch/x86/include/asm/irq_vectors.h | 1 +
> arch/x86/include/asm/mce.h | 3 +
> arch/x86/include/asm/trace/irq_vectors.h | 6 ++
> arch/x86/include/asm/traps.h | 3 +-
> arch/x86/kernel/cpu/mcheck/mce_amd.c | 101 +++++++++++++++++++++++++++++++
> arch/x86/kernel/entry_64.S | 5 ++
> arch/x86/kernel/irq.c | 6 ++
> arch/x86/kernel/irqinit.c | 4 ++
> arch/x86/kernel/traps.c | 4 ++
> 12 files changed, 140 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> index dc5fa66..f7b957b 100644
> --- a/arch/x86/include/asm/entry_arch.h
> +++ b/arch/x86/include/asm/entry_arch.h
> @@ -50,4 +50,7 @@ BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
> BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
> #endif
>
> +#ifdef CONFIG_X86_MCE_AMD
> +BUILD_INTERRUPT(deferred_interrupt, DEFERRED_APIC_VECTOR)

All the other names are written out so you can simply do

BUILD_INTERRUPT(deferred_error_interrupt, DEFERRED_ERROR_VECTOR)

> +#endif
> #endif
> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
> index 0f5fb6b..448451c 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -33,6 +33,9 @@ typedef struct {
> #ifdef CONFIG_X86_MCE_THRESHOLD
> unsigned int irq_threshold_count;
> #endif
> +#ifdef CONFIG_X86_MCE_AMD
> + unsigned int irq_deferred_count;

Right
unsigned int irq_deferred_error_count;

> +#endif
> #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
> unsigned int irq_hv_callback_count;
> #endif
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index e9571dd..7cf2670 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h

...

> +static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
> +{
> + u32 low = 0, high = 0;
> + int def_offset = -1, def_new;
> +
> + if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
> + return;
> +
> + def_new = (low & MASK_DEF_LVTOFF) >> 4;
> + if (c->x86 == 0x15 && c->x86_model == 0x60 &&
> + !(low & MASK_DEF_LVTOFF)) {

What's the family check for? for BIOSes which don't set the LVT offset
to 2, as they should?

If so, we probably should say

pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");

or similar...

> + def_new = DEF_LVT_OFF;
> + low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
> + }
> +
> + def_offset = setup_APIC_deferred_error(def_offset, def_new);
> +
> + if ((def_offset == def_new) &&
> + (def_int_vector != amd_deferred_error_interrupt))
> + def_int_vector = amd_deferred_error_interrupt;
> +
> + low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
> + wrmsr(MSR_CU_DEF_ERR, low, high);
> +}

...

> +/* Apic interrupt handler for deferred errors */
> +static void amd_deferred_error_interrupt(void)
> +{
> + u64 status;
> + unsigned int bank;
> + struct mce m;
> +
> + for (bank = 0; bank < mca_cfg.banks; ++bank) {
> + rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
> +
> + if (!(status & MCI_STATUS_VAL) ||
> + !(status & MCI_STATUS_DEFERRED))
> + continue;
> +
> + mce_setup(&m);
> + m.bank = bank;
> + m.status = status;
> + mce_log(&m);
> + wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
> + break;
> + }

That's very similar to what we do in the end of
amd_threshold_interrupt(). You could add a generic __log_error() static
helper in a pre-patch and then call it here.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-04 15:30:06

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On 5/3/2015 4:22 AM, Borislav Petkov wrote:
> On Thu, Apr 30, 2015 at 09:49:23AM -0500, Aravind Gopalakrishnan wrote:
>> Changes introduced in the patch-
>> - Assign vector number 0xf4 for Deferred errors
>> - Declare deferred_interrupt, allocate gate and bind it
>> to DEFERRED_APIC_VECTOR.
>> - Declare smp_deferred_interrupt to be used as the
>> entry point for the interrupt in mce_amd.c
>> - Define trace_deferred_interrupt for tracing
>> - Enable deferred error interrupt selectively upon detection
>> of 'succor' bitfield
>> - Setup amd_deferred_error_interrupt() to handle the interrupt
>> and assign it to def_int_vector if feature is present in HW.
>> Else, let default handler deal with it.
>> - Provide Deferred error interrupt stats on
>> /proc/interrupts by incrementing irq_deferred_count
> This commit message should explain the feature in more high-level way,
> what is it good for and so on, not what you're adding.
>
> That I can see. :-)

Okay, I'll include a short description of deferred errors here for V2.

>> +#endif
>> #endif
>> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
>> index 0f5fb6b..448451c 100644
>> --- a/arch/x86/include/asm/hardirq.h
>> +++ b/arch/x86/include/asm/hardirq.h
>> @@ -33,6 +33,9 @@ typedef struct {
>> #ifdef CONFIG_X86_MCE_THRESHOLD
>> unsigned int irq_threshold_count;
>> #endif
>> +#ifdef CONFIG_X86_MCE_AMD
>> + unsigned int irq_deferred_count;
> Right
> unsigned int irq_deferred_error_count;

Ack.

>> +static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
>> +{
>> + u32 low = 0, high = 0;
>> + int def_offset = -1, def_new;
>> +
>> + if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
>> + return;
>> +
>> + def_new = (low & MASK_DEF_LVTOFF) >> 4;
>> + if (c->x86 == 0x15 && c->x86_model == 0x60 &&
>> + !(low & MASK_DEF_LVTOFF)) {
> What's the family check for? for BIOSes which don't set the LVT offset
> to 2, as they should?
>
> If so, we probably should say
>
> pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
>
> or similar...

Yeah. I meant to provide a comment at least for this.
Forgot to do that.

I'll print out a error message as you suggested (considering we do this
in other places like threshold setup or IBS setup..)

>> +/* Apic interrupt handler for deferred errors */
>> +static void amd_deferred_error_interrupt(void)
>> +{
>> + u64 status;
>> + unsigned int bank;
>> + struct mce m;
>> +
>> + for (bank = 0; bank < mca_cfg.banks; ++bank) {
>> + rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
>> +
>> + if (!(status & MCI_STATUS_VAL) ||
>> + !(status & MCI_STATUS_DEFERRED))
>> + continue;
>> +
>> + mce_setup(&m);
>> + m.bank = bank;
>> + m.status = status;
>> + mce_log(&m);
>> + wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
>> + break;
>> + }
> That's very similar to what we do in the end of
> amd_threshold_interrupt(). You could add a generic __log_error() static
> helper in a pre-patch and then call it here.
>

Right. I think a __log_error() is a good idea.
Except, in amd_threshold_interrupt(), we have-
m.misc = ((u64)high << 32) | low;

which, is actually useless as we don't use m.misc anywhere in
amd_decode_mce() or anywhere else in the decoding pipeline AFAICT.
We only print out if 'misc' is valid and we only need status bits for that-
((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),

But, more importantly, we don't setup 'm.addr' here (in
amd_threshold_interrupt() or in amd_deferred_error_interrupt())
Which means anytime we pass an error to be decoded from the interrupt
handlers, we don't get any info about the error address.

So, we can do one of these-
1. Remove m.misc setup in amd_threshold_interrupt() and
rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log()
2. Since we have mce_read_aux() that reads misc and addr registers, we
can move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and
use it here in mce_amd.c

Thoughts?

Thanks,
-Aravind.

2015-05-04 15:47:14

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On Mon, May 04, 2015 at 10:29:50AM -0500, Aravind Gopalakrishnan wrote:
> >What's the family check for? for BIOSes which don't set the LVT offset
> >to 2, as they should?
> >
> >If so, we probably should say
> >
> > pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
> >
> >or similar...
>
> Yeah. I meant to provide a comment at least for this.
> Forgot to do that.
>
> I'll print out a error message as you suggested (considering we do this in
> other places like threshold setup or IBS setup..)

lvt_off_valid() does that already. Adding Robert.

> Right. I think a __log_error() is a good idea.
> Except, in amd_threshold_interrupt(), we have-
> m.misc = ((u64)high << 32) | low;
>
> which, is actually useless as we don't use m.misc anywhere in
> amd_decode_mce() or anywhere else in the decoding pipeline AFAICT.
> We only print out if 'misc' is valid and we only need status bits for that-
> ((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),
>
> But, more importantly, we don't setup 'm.addr' here (in
> amd_threshold_interrupt() or in amd_deferred_error_interrupt())
> Which means anytime we pass an error to be decoded from the interrupt
> handlers, we don't get any info about the error address.

So what are we reporting with a deferred error if it is not a
full-fledged MCE? We better fix that otherwise we probably shouldn't
even report those. I mean, userspace is supposed to do some error
handling based on error info but if that info's missing, we might just
as well panic right then and there, right?

> So, we can do one of these-
> 1. Remove m.misc setup in amd_threshold_interrupt() and
> rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log()
> 2. Since we have mce_read_aux() that reads misc and addr registers, we can
> move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and use it
> here in mce_amd.c
>
> Thoughts?

Makes sense but you need to first check though, which registers are
valid in the hw when a threshold/deferred error happens and collect
them. Only then we can do proper recovery.

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-04 17:08:32

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On 5/4/2015 10:46 AM, Borislav Petkov wrote:
> On Mon, May 04, 2015 at 10:29:50AM -0500, Aravind Gopalakrishnan wrote:
>>> What's the family check for? for BIOSes which don't set the LVT offset
>>> to 2, as they should?
>>>
>>> If so, we probably should say
>>>
>>> pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
>>>
>>> or similar...
>> Yeah. I meant to provide a comment at least for this.
>> Forgot to do that.
>>
>> I'll print out a error message as you suggested (considering we do this in
>> other places like threshold setup or IBS setup..)
> lvt_off_valid() does that already. Adding Robert.

Not sure if lvt_off_valid() can be reused for deferred error interrupt
setup.
It expects some some of info to be in struct threshold_block which is
fine for threshold errors and the shifts for offset are different too.

For deferred errors, the workaround is a little different as it applies
to only the given family/model right now.
If the workaround needs to be applied for future processors, we can
extend the family check for those right?


>> Right. I think a __log_error() is a good idea.
>> Except, in amd_threshold_interrupt(), we have-
>> m.misc = ((u64)high << 32) | low;
>>
>> which, is actually useless as we don't use m.misc anywhere in
>> amd_decode_mce() or anywhere else in the decoding pipeline AFAICT.
>> We only print out if 'misc' is valid and we only need status bits for that-
>> ((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),
>>
>> But, more importantly, we don't setup 'm.addr' here (in
>> amd_threshold_interrupt() or in amd_deferred_error_interrupt())
>> Which means anytime we pass an error to be decoded from the interrupt
>> handlers, we don't get any info about the error address.
> So what are we reporting with a deferred error if it is not a
> full-fledged MCE? We better fix that otherwise we probably shouldn't
> even report those. I mean, userspace is supposed to do some error
> handling based on error info but if that info's missing, we might just
> as well panic right then and there, right?

Oh no.. It is a proper MCE.
I was simply saying that when we look at dmesg logs after an error
happens, we would not see any useful info regarding the error address.
Here's an example-
[ 1314.651485] [Hardware Error]: Deferred error.
[ 1314.651611] [Hardware Error]: CPU:0 (15:60:0)
MC4_STATUS[Over|CE|MiscV|-|AddrV|Deferred|-|UECC]: 0xdc04b00005080813
[ 1314.651898] [Hardware Error]: MC4 Error Address: 0x0000000000000000
^^^^^^^^^^^^^^^^^^^^

The Error Address will always be logged as 0x0 as m->addr in
amd_decode_mce() is 0x0.
If we setup 'm.addr' in amd_threshold_interrupt() and
amd_deferred_error_interrupt() properly, then amd_decode_mce() would
actually have
some value in m->addr to report.

I didn't mean to say HW doesn't provide us the information in the addr
and/or the misc registers.

>> So, we can do one of these-
>> 1. Remove m.misc setup in amd_threshold_interrupt() and
>> rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log()
>> 2. Since we have mce_read_aux() that reads misc and addr registers, we can
>> move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and use it
>> here in mce_amd.c
>>
>> Thoughts?
> Makes sense but you need to first check though, which registers are
> valid in the hw when a threshold/deferred error happens and collect
> them. Only then we can do proper recovery.
>
>

The addr, misc registers are still valid for threshold, deferred errors.
(Of course, misc is valid only if m->status & MCI_STATUS_MISCV)

My point was, in __log_error(), we can read relevant status and addr
MSRs to be passed to mce_log() as those are the only pieces of
information we use in the decoding chain; and discard the m.misc
assignment we do for threshold errors.

If userspace tools absolutely need 'misc' info too, we can go with
option (2) as mentioned above..

Thanks,
-Aravind.

2015-05-04 18:47:03

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On Mon, May 04, 2015 at 12:08:16PM -0500, Aravind Gopalakrishnan wrote:
> Not sure if lvt_off_valid() can be reused for deferred error interrupt
> setup. It expects some some of info to be in struct threshold_block
> which is fine for threshold errors and the shifts for offset are
> different too.

I meant that thresholding and IBS is being taken care of by that
function.

> For deferred errors, the workaround is a little different as it
> applies to only the given family/model right now. If the workaround
> needs to be applied for future processors, we can extend the family
> check for those right?

Or, you can do the check for all families as we're behind a CPUID bit
anyway. This is why CPUID bits are a good thing :-)

> If we setup 'm.addr' in amd_threshold_interrupt() and
> amd_deferred_error_interrupt() properly, then amd_decode_mce() would
> actually have some value in m->addr to report.
>
> I didn't mean to say HW doesn't provide us the information in the addr
> and/or the misc registers.

So you can use mce_read_aux(), yeah, you can move it to mce-internal.h

> The addr, misc registers are still valid for threshold, deferred errors.
> (Of course, misc is valid only if m->status & MCI_STATUS_MISCV)
>
> My point was, in __log_error(), we can read relevant status and addr MSRs to
> be passed to mce_log() as those are the only pieces of information we use in
> the decoding chain; and discard the m.misc assignment we do for threshold
> errors.

But MCx_MISC is important for thresholding errors, it carries the ErrCnt
and stuff.

So you can pass a parameter to __log_error(..., threshold=true, misc)
and do

if (threshold)
m.misc = misc;

Right?

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-04 19:07:05

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On 5/4/2015 1:46 PM, Borislav Petkov wrote:
>> For deferred errors, the workaround is a little different as it
>> applies to only the given family/model right now. If the workaround
>> needs to be applied for future processors, we can extend the family
>> check for those right?
> Or, you can do the check for all families as we're behind a CPUID bit
> anyway. This is why CPUID bits are a good thing :-)

Yep. Ok, Will do that.

>> If we setup 'm.addr' in amd_threshold_interrupt() and
>> amd_deferred_error_interrupt() properly, then amd_decode_mce() would
>> actually have some value in m->addr to report.
>>
>> I didn't mean to say HW doesn't provide us the information in the addr
>> and/or the misc registers.
> So you can use mce_read_aux(), yeah, you can move it to mce-internal.h


Ok, will do.
Is it ok to grow another patch in a V2 for this instead of fixing it in
this patch since it's a real bug?
That should be helpful when someone wants to look up git logs of why
this was done..

>> The addr, misc registers are still valid for threshold, deferred errors.
>> (Of course, misc is valid only if m->status & MCI_STATUS_MISCV)
>>
>> My point was, in __log_error(), we can read relevant status and addr MSRs to
>> be passed to mce_log() as those are the only pieces of information we use in
>> the decoding chain; and discard the m.misc assignment we do for threshold
>> errors.
> But MCx_MISC is important for thresholding errors, it carries the ErrCnt
> and stuff.
>
> So you can pass a parameter to __log_error(..., threshold=true, misc)
> and do
>
> if (threshold)
> m.misc = misc;
>
> Right?
>

Yeah, just wanted to keep __log_error() as generic as possible and not
special case for threshold.
But ok, since MCx_MISC is needed, I'll work it up as you suggested.

Thanks,
-Aravind.

2015-05-04 19:14:41

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On Mon, May 04, 2015 at 02:06:43PM -0500, Aravind Gopalakrishnan wrote:
> Is it ok to grow another patch in a V2 for this instead of fixing
> it in this patch since it's a real bug? That should be helpful when
> someone wants to look up git logs of why this was done..

Yes, a prepatch please.

> Yeah, just wanted to keep __log_error() as generic as possible and not
> special case for threshold.

Not important as it is going to be used in mce_amd.c only anyway. It's
main goal is to avoid code duplication - nothing else.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-05-05 18:39:26

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On 5/4/2015 1:46 PM, Borislav Petkov wrote:
> So you can use mce_read_aux(), yeah, you can move it to mce-internal.h

Re-using mce_read_aux() was not as trivial as I initially thought.
The MISC address value we read in amd_threshold_interrupt() could also
be the value
in MSR0xc0000408 or MSR0xc0000409 (for a bank == 4 case). But in
mce_read_aux(), we will only
look at MSR_IA32_MCx_MISC(i) (which is 0x413 for bank = 4)

So, instead of mucking around with mce_read_aux(), I am reusing the
'misc' value from amd_threshold_interrupt()
and just adding rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr)

> So you can pass a parameter to __log_error(..., threshold=true, misc)
> and do
>
> if (threshold)
> m.misc = misc;
>

Here's how I have it currently-
static void __log_error(unsigned int bank, bool is_thr, u64 misc)
{
struct mce m;

mce_setup(&m);
rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
if (!(m.status & MCI_STATUS_VAL))
return;

if (is_thr)
m.misc = misc;

m.bank = bank;
rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
mce_log(&m);

wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
}

and works fine..

Before patch:

[76916.275587] [Hardware Error]: Corrected error, no action required.
[76916.279576] [Hardware Error]: CPU:0 (15:60:0)
MC0_STATUS[-|CE|-|-|AddrV|-|-|CECC]: 0x840041000028017b
[76916.279576] [Hardware Error]: MC0 Error Address: 0x0000000000000000

Corrected error output:
[ 102.623490] [Hardware Error]: Corrected error, no action required.
[ 102.623668] [Hardware Error]: CPU:0 (15:60:0)
MC0_STATUS[-|CE|-|-|AddrV|-|-|CECC]: 0x840041000028017b
[ 102.623930] [Hardware Error]: MC0 Error Address: 0x00001f808f0ff040

Thanks,
-Aravind.

2015-05-05 20:28:44

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

Should you check whether the address is valid before blindly reading the register?

> m.bank = bank;
if (m.status & MCI_STATUS_ADDRV)
rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
> mce_log(&m);

-Tony
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-05-05 20:34:12

by Aravind Gopalakrishnan

[permalink] [raw]
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

On 5/5/2015 3:28 PM, Luck, Tony wrote:
> Should you check whether the address is valid before blindly reading the register?
>
>> m.bank = bank;
> if (m.status & MCI_STATUS_ADDRV)
> rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
>> mce_log(&m);

Yes, missed that when I sent the snippet.
Fixed it:)

Thanks,
-Aravind.