2017-03-30 10:45:01

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 0/8] arm64: acpi: apei: handle SEI notification type for ARMv8

ARM APEI extension proposal added SEI (asynchronous SError interrupt)
notification type for ARMv8.

Add a new GHES error source handling function for SEI. In firmware
first mode, if an error source's notification type is SEI. Then GHES
could parse and report the detail error information.

In firmware first mode, we could assume:
1) The ghes for sei is a global table, firmware should report one sei at a time.
2) SEI is masked while exception is processing
3) SEI may interrupt while irq handler is running
4) SEA may interrupt while SEI handler is running

So, the memory area used to transfer hardware error information from
BIOS to Linux can be determined only in NMI, SEI(arm64), IRQ or timer
handler. We add a virtual page for SEI context.

Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
is used to synchronize Unrecoverable errors. That is, containable errors
architecturally consumed by the PE and not silently propagated.

With ESB it is generally possible to isolate an unrecoverable error
between two ESB instructions. So, it's possible to recovery from
unrecoverable errors reported by asynchronous SError interrupt.

Since SEI is asynchronous, a SEI generated on user process may propagate
to another user process via shared memory, which may cause unrecoverable.
We have a idea which only recover sei from user process which has no
writable shared memory.

The idea is still in the discussion, any comments is welcome.

This series is based on Tyler's series "[PATCH V13 00/10] Add UEFI 2.6 and ACPI 6.1 updates
for RAS on ARM64" and the latest mainline.

v3: add ARM processor error information trace event
add a per-cpu variable to indecate sei is processing
remove nmi_{enter/exit} protection for ghes_notify_sei()
reserve a virtual page for processing ghes table in SEI context
don't always inject vabt for sei
serprate do_sei from bad_mode
add esb which make recovery from sei possible
add an idea to recovery from sei for shared memory process

v2: add kvm guest SEI notification support
add nmi_{entry, exit} to protect ghes_notify_sei

https://lkml.org/lkml/2017/3/7/962
https://lkml.org/lkml/2017/3/3/189

Wang Xiongfeng (1):
arm64: exception: check shared writable page in SEI handler

Xie XiuQi (7):
trace: ras: add ARM processor error information trace event
acpi: apei: handle SEI notification type for ARMv8
arm64: apei: add a per-cpu variable to indecate sei is processing
APEI: GHES: reserve a virtual page for SEI context
arm64: KVM: add guest SEI support
arm64: RAS: add ras extension runtime detection
arm64: exception: handle asynchronous SError interrupt

arch/arm64/Kconfig | 16 +++
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/esr.h | 14 +++
arch/arm64/include/asm/sysreg.h | 2 +
arch/arm64/include/asm/system_misc.h | 1 +
arch/arm64/kernel/cpufeature.c | 11 ++
arch/arm64/kernel/entry.S | 70 ++++++++++-
arch/arm64/kernel/traps.c | 217 +++++++++++++++++++++++++++++++++++
arch/arm64/kvm/handle_exit.c | 22 +++-
drivers/acpi/apei/Kconfig | 14 +++
drivers/acpi/apei/ghes.c | 166 +++++++++++++++++++--------
include/acpi/ghes.h | 3 +
include/linux/cper.h | 5 +
include/ras/ras_event.h | 87 ++++++++++++++
14 files changed, 575 insertions(+), 56 deletions(-)

--
1.8.3.1


2017-03-30 10:35:05

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 6/8] arm64: RAS: add ras extension runtime detection

According to <<RAS Extension PRD03>> document, we add RAS extension
feature runtime detection, which would be used for error recovery
in the future.

Signed-off-by: Xie XiuQi <[email protected]>
Reviewed-by: Kefeng Wang <[email protected]>
---
arch/arm64/include/asm/cpucaps.h | 3 ++-
arch/arm64/include/asm/sysreg.h | 2 ++
arch/arm64/kernel/cpufeature.c | 11 +++++++++++
3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index fb78a5d..3847cf8 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -37,7 +37,8 @@
#define ARM64_HAS_NO_FPSIMD 16
#define ARM64_WORKAROUND_REPEAT_TLBI 17
#define ARM64_WORKAROUND_QCOM_FALKOR_E1003 18
+#define ARM64_HAS_RAS_EXTN 19

-#define ARM64_NCAPS 19
+#define ARM64_NCAPS 20

#endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ac24b6e..32964c7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -157,6 +157,7 @@
#define ID_AA64ISAR0_AES_SHIFT 4

/* id_aa64pfr0 */
+#define ID_AA64PFR0_RAS_SHIFT 28
#define ID_AA64PFR0_GIC_SHIFT 24
#define ID_AA64PFR0_ASIMD_SHIFT 20
#define ID_AA64PFR0_FP_SHIFT 16
@@ -165,6 +166,7 @@
#define ID_AA64PFR0_EL1_SHIFT 4
#define ID_AA64PFR0_EL0_SHIFT 0

+#define ID_AA64PFR0_RAS_V1 0x1
#define ID_AA64PFR0_FP_NI 0xf
#define ID_AA64PFR0_FP_SUPPORTED 0x0
#define ID_AA64PFR0_ASIMD_NI 0xf
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index abda8e8..b0fb81e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -98,6 +98,7 @@
};

static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_RAS_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_GIC_SHIFT, 4, 0),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
@@ -860,6 +861,16 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
.min_field_value = 0,
.matches = has_no_fpsimd,
},
+ {
+ .desc = "ARM64 RAS Extension Support",
+ .capability = ARM64_HAS_RAS_EXTN,
+ .def_scope = SCOPE_SYSTEM,
+ .matches = has_cpuid_feature,
+ .sys_reg = SYS_ID_AA64PFR0_EL1,
+ .sign = FTR_UNSIGNED,
+ .field_pos = ID_AA64PFR0_RAS_SHIFT,
+ .min_field_value = ID_AA64PFR0_RAS_V1,
+ },
{},
};

--
1.8.3.1

2017-03-30 10:34:59

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
is used to synchronize Unrecoverable errors. That is, containable errors
architecturally consumed by the PE and not silently propagated.

With ESB it is generally possible to isolate an unrecoverable error
between two ESB instructions. So, it's possible to recovery from
unrecoverable errors reported by asynchronous SError interrupt.

If ARMv8.2 RAS Extension is not support, ESB is treated as a NOP.

Signed-off-by: Xie XiuQi <[email protected]>
Signed-off-by: Wang Xiongfeng <[email protected]>
---
arch/arm64/Kconfig | 16 ++++++++++
arch/arm64/include/asm/esr.h | 14 +++++++++
arch/arm64/kernel/entry.S | 70 ++++++++++++++++++++++++++++++++++++++++++--
arch/arm64/kernel/traps.c | 54 ++++++++++++++++++++++++++++++++--
4 files changed, 150 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 859a90e..7402175 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -911,6 +911,22 @@ endmenu

menu "ARMv8.2 architectural features"

+config ARM64_ESB
+ bool "Enable support for Error Synchronization Barrier (ESB)"
+ default n
+ help
+ Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
+ is used to synchronize Unrecoverable errors. That is, containable errors
+ architecturally consumed by the PE and not silently propagated.
+
+ Without ESB it is not generally possible to isolate an Unrecoverable
+ error because it is not known which instruction generated the error.
+
+ Selecting this option allows inject esb instruction before the exception
+ change. If ARMv8.2 RAS Extension is not support, ESB is treated as a NOP.
+
+ Note that ESB instruction can introduce slight overhead, so say N if unsure.
+
config ARM64_UAO
bool "Enable support for User Access Override (UAO)"
default y
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f20c64a..22f9c90 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -106,6 +106,20 @@
#define ESR_ELx_AR (UL(1) << 14)
#define ESR_ELx_CM (UL(1) << 8)

+#define ESR_Elx_DFSC_SEI (0x11)
+
+#define ESR_ELx_AET_SHIFT (10)
+#define ESR_ELx_AET_MAX (7)
+#define ESR_ELx_AET_MASK (UL(7) << ESR_ELx_AET_SHIFT)
+#define ESR_ELx_AET(esr) (((esr) & ESR_ELx_AET_MASK) >> ESR_ELx_AET_SHIFT)
+
+#define ESR_ELx_AET_UC (0)
+#define ESR_ELx_AET_UEU (1)
+#define ESR_ELx_AET_UEO (2)
+#define ESR_ELx_AET_UER (3)
+#define ESR_ELx_AET_CE (6)
+
+
/* ISS field definitions for exceptions taken in to Hyp */
#define ESR_ELx_CV (UL(1) << 24)
#define ESR_ELx_COND_SHIFT (20)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 43512d4..d8a7306 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -69,7 +69,14 @@
#define BAD_FIQ 2
#define BAD_ERROR 3

+ .arch_extension ras
+
.macro kernel_entry, el, regsize = 64
+#ifdef CONFIG_ARM64_ESB
+ .if \el == 0
+ esb
+ .endif
+#endif
sub sp, sp, #S_FRAME_SIZE
.if \regsize == 32
mov w0, w0 // zero upper 32 bits of x0
@@ -208,6 +215,7 @@ alternative_else_nop_endif
#endif

.if \el == 0
+ msr daifset, #0xF // Set flags
ldr x23, [sp, #S_SP] // load return stack pointer
msr sp_el0, x23
#ifdef CONFIG_ARM64_ERRATUM_845719
@@ -226,6 +234,15 @@ alternative_else_nop_endif

msr elr_el1, x21 // set up the return data
msr spsr_el1, x22
+
+#ifdef CONFIG_ARM64_ESB
+ .if \el == 0
+ esb // Error Synchronization Barrier
+ mrs x21, disr_el1 // Check for deferred error
+ tbnz x21, #31, el1_sei
+ .endif
+#endif
+
ldp x0, x1, [sp, #16 * 0]
ldp x2, x3, [sp, #16 * 1]
ldp x4, x5, [sp, #16 * 2]
@@ -318,7 +335,7 @@ ENTRY(vectors)
ventry el1_sync_invalid // Synchronous EL1t
ventry el1_irq_invalid // IRQ EL1t
ventry el1_fiq_invalid // FIQ EL1t
- ventry el1_error_invalid // Error EL1t
+ ventry el1_error // Error EL1t

ventry el1_sync // Synchronous EL1h
ventry el1_irq // IRQ EL1h
@@ -328,7 +345,7 @@ ENTRY(vectors)
ventry el0_sync // Synchronous 64-bit EL0
ventry el0_irq // IRQ 64-bit EL0
ventry el0_fiq_invalid // FIQ 64-bit EL0
- ventry el0_error_invalid // Error 64-bit EL0
+ ventry el0_error // Error 64-bit EL0

#ifdef CONFIG_COMPAT
ventry el0_sync_compat // Synchronous 32-bit EL0
@@ -508,12 +525,31 @@ el1_preempt:
ret x24
#endif

+ .align 6
+el1_error:
+ kernel_entry 1
+el1_sei:
+ /*
+ * asynchronous SError interrupt from kernel
+ */
+ mov x0, sp
+ mrs x1, esr_el1
+ mov x2, #1 // exception level of SEI generated
+ b do_sei
+ENDPROC(el1_error)
+
+
/*
* EL0 mode handlers.
*/
.align 6
el0_sync:
kernel_entry 0
+#ifdef CONFIG_ARM64_ESB
+ mrs x26, disr_el1
+ tbnz x26, #31, el0_sei // check DISR.A
+ msr daifclr, #0x4 // unmask SEI
+#endif
mrs x25, esr_el1 // read the syndrome register
lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
cmp x24, #ESR_ELx_EC_SVC64 // SVC in 64-bit state
@@ -688,8 +724,38 @@ el0_inv:
ENDPROC(el0_sync)

.align 6
+el0_error:
+ kernel_entry 0
+el0_sei:
+ /*
+ * asynchronous SError interrupt from userspace
+ */
+ ct_user_exit
+ mov x0, sp
+ mrs x1, esr_el1
+ mov x2, #0
+ bl do_sei
+ b ret_to_user
+ENDPROC(el0_error)
+
+ .align 6
el0_irq:
kernel_entry 0
+#ifdef CONFIG_ARM64_ESB
+ mrs x26, disr_el1
+ tbz x26, #31, el0_irq_naked // check DISR.A
+
+ mov x0, sp
+ mrs x1, esr_el1
+ mov x2, 0
+
+ /*
+ * The SEI generated at EL0 is not affect this irq context,
+ * so after sei handler, we continue process this irq.
+ */
+ bl do_sei
+ msr daifclr, #0x4 // unmask SEI
+#endif
el0_irq_naked:
enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index b6d6727..99be6d8 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -643,6 +643,34 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
handler[reason], smp_processor_id(), esr,
esr_get_class_string(esr));

+ die("Oops - bad mode", regs, 0);
+ local_irq_disable();
+ panic("bad mode");
+}
+
+static const char *sei_context[] = {
+ "userspace", /* EL0 */
+ "kernel", /* EL1 */
+};
+
+static const char *sei_severity[] = {
+ [0 ... ESR_ELx_AET_MAX] = "Unknown",
+ [ESR_ELx_AET_UC] = "Uncontainable",
+ [ESR_ELx_AET_UEU] = "Unrecoverable",
+ [ESR_ELx_AET_UEO] = "Restartable",
+ [ESR_ELx_AET_UER] = "Recoverable",
+ [ESR_ELx_AET_CE] = "Corrected",
+};
+
+DEFINE_PER_CPU(int, sei_in_process);
+asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
+{
+ int aet = ESR_ELx_AET(esr);
+ console_verbose();
+
+ pr_crit("Asynchronous SError interrupt detected on CPU%d, %s, %s\n",
+ smp_processor_id(), sei_context[el], sei_severity[aet]);
+
/*
* In firmware first mode, we could assume firmware will only generate one
* of cper records at a time. There is no risk for one cpu to parse ghes table.
@@ -653,9 +681,31 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
this_cpu_dec(sei_in_process);
}

- die("Oops - bad mode", regs, 0);
+ if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&
+ cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
+ siginfo_t info;
+ void __user *pc = (void __user *)instruction_pointer(regs);
+
+ if (aet >= ESR_ELx_AET_UEO)
+ return;
+
+ if (aet == ESR_ELx_AET_UEU) {
+ info.si_signo = SIGILL;
+ info.si_errno = 0;
+ info.si_code = ILL_ILLOPC;
+ info.si_addr = pc;
+
+ current->thread.fault_address = 0;
+ current->thread.fault_code = 0;
+
+ force_sig_info(info.si_signo, &info, current);
+
+ return;
+ }
+ }
+
local_irq_disable();
- panic("bad mode");
+ panic("Asynchronous SError interrupt");
}

/*
--
1.8.3.1

2017-03-30 10:35:02

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 5/8] arm64: KVM: add guest SEI support

Add ghes handling for SEI so that the host kernel could parse and
report detailed error information for SEI which occur in the guest
kernel.

If there were no CPER records, (or the system doesn't support SEI as a GHES
notification mechanism), then yes we should still call kvm_inject_vabt().

Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/include/asm/system_misc.h | 1 +
arch/arm64/kernel/traps.c | 14 ++++++++++++++
arch/arm64/kvm/handle_exit.c | 22 ++++++++++++++++++++--
3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 5b2cecd..d68d61f 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -59,5 +59,6 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
#endif /* __ASSEMBLY__ */

int handle_guest_sea(unsigned long addr, unsigned int esr);
+int handle_guest_sei(unsigned long addr, unsigned int esr);

#endif /* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 955dc8c..b6d6727 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -618,6 +618,20 @@ const char *esr_get_class_string(u32 esr)
DEFINE_PER_CPU(int, sei_in_process);

/*
+ * Handle asynchronous SError interrupt that occur in a guest kernel.
+ */
+int handle_guest_sei(unsigned long addr, unsigned int esr)
+{
+ int ret = -ENOENT;
+
+ if(IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
+ ret = ghes_notify_sei();
+ }
+
+ return ret;
+}
+
+/*
* bad_mode handles the impossible case in the exception vector. This is always
* fatal.
*/
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index fa1b18e..c89d83a 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -28,6 +28,7 @@
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_psci.h>
+#include <asm/system_misc.h>

#define CREATE_TRACE_POINTS
#include "trace.h"
@@ -177,6 +178,23 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
return arm_exit_handlers[hsr_ec];
}

+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+
+ if (handle_guest_sei((unsigned long)fault_ipa,
+ kvm_vcpu_get_hsr(vcpu))) {
+ kvm_err("Failed to handle guest SEI, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
+ kvm_vcpu_trap_get_class(vcpu),
+ (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
+ (unsigned long)kvm_vcpu_get_hsr(vcpu));
+
+ kvm_inject_vabt(vcpu);
+ }
+
+ return 0;
+}
+
/*
* Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
* proper exit to userspace.
@@ -200,7 +218,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
*vcpu_pc(vcpu) -= adj;
}

- kvm_inject_vabt(vcpu);
+ kvm_handle_guest_sei(vcpu, run);
return 1;
}

@@ -210,7 +228,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
case ARM_EXCEPTION_IRQ:
return 1;
case ARM_EXCEPTION_EL1_SERROR:
- kvm_inject_vabt(vcpu);
+ kvm_handle_guest_sei(vcpu, run);
return 1;
case ARM_EXCEPTION_TRAP:
/*
--
1.8.3.1

2017-03-30 10:34:54

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 2/8] acpi: apei: handle SEI notification type for ARMv8

ARM APEI extension proposal added SEI (asynchronous SError interrupt)
notification type for ARMv8.

Add a new GHES error source handling function for SEI. In firmware
first mode, if an error source's notification type is SEI. Then GHES
could parse and report the detail error information.

Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/kernel/traps.c | 10 ++++++++
drivers/acpi/apei/Kconfig | 14 ++++++++++++
drivers/acpi/apei/ghes.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
include/acpi/ghes.h | 1 +
4 files changed, 83 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e52be6a..53710a2 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -47,6 +47,8 @@
#include <asm/system_misc.h>
#include <asm/sysreg.h>

+#include <acpi/ghes.h>
+
static const char *handler[]= {
"Synchronous Abort",
"IRQ",
@@ -625,6 +627,14 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
handler[reason], smp_processor_id(), esr,
esr_get_class_string(esr));

+ /*
+ * In firmware first mode, we could assume firmware will only generate one
+ * of cper records at a time. There is no risk for one cpu to parse ghes table.
+ */
+ if (IS_ENABLED(CONFIG_ACPI_APEI_SEI) && ESR_ELx_EC(esr) == ESR_ELx_EC_SERROR) {
+ ghes_notify_sei();
+ }
+
die("Oops - bad mode", regs, 0);
local_irq_disable();
panic("bad mode");
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b54bd33..c00c9d34 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,20 @@ config ACPI_APEI_SEA
option allows the OS to look for such hardware error record, and
take appropriate action.

+config ACPI_APEI_SEI
+ bool "APEI Asynchronous SError Interrupt logging/recovering support"
+ depends on ARM64 && ACPI_APEI_GHES
+ help
+ This option should be enabled if the system supports
+ firmware first handling of SEI (asynchronous SError interrupt).
+
+ SEI happens with invalid instruction access or asynchronous exceptions
+ on ARMv8 systems. If a system supports firmware first handling of SEI,
+ the platform analyzes and handles hardware error notifications from
+ SEI, and it may then form a HW error record for the OS to parse and
+ handle. This option allows the OS to look for such hardware error
+ record, and take appropriate action.
+
config ACPI_APEI_MEMORY_FAILURE
bool "APEI memory error recovering support"
depends on ACPI_APEI && MEMORY_FAILURE
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6be0333..045d101 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -862,6 +862,51 @@ static inline void ghes_sea_remove(struct ghes *ghes)
}
#endif /* CONFIG_ACPI_APEI_SEA */

+#ifdef CONFIG_ACPI_APEI_SEI
+static LIST_HEAD(ghes_sei);
+
+int ghes_notify_sei(void)
+{
+ struct ghes *ghes;
+ int ret = -ENOENT;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ghes, &ghes_sei, list) {
+ if(!ghes_proc(ghes))
+ ret = 0;
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
+static void ghes_sei_add(struct ghes *ghes)
+{
+ mutex_lock(&ghes_list_mutex);
+ list_add_rcu(&ghes->list, &ghes_sei);
+ mutex_unlock(&ghes_list_mutex);
+}
+
+static void ghes_sei_remove(struct ghes *ghes)
+{
+ mutex_lock(&ghes_list_mutex);
+ list_del_rcu(&ghes->list);
+ mutex_unlock(&ghes_list_mutex);
+ synchronize_rcu();
+}
+#else /* CONFIG_ACPI_APEI_SEI */
+static inline void ghes_sei_add(struct ghes *ghes)
+{
+ pr_err(GHES_PFX "ID: %d, trying to add SEI notification which is not supported\n",
+ ghes->generic->header.source_id);
+}
+
+static inline void ghes_sei_remove(struct ghes *ghes)
+{
+ pr_err(GHES_PFX "ID: %d, trying to remove SEI notification which is not supported\n",
+ ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEI */
+
#ifdef CONFIG_HAVE_ACPI_APEI_NMI
/*
* printk is not safe in NMI context. So in NMI handler, we allocate
@@ -1111,6 +1156,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
goto err;
}
break;
+ case ACPI_HEST_NOTIFY_SEI:
+ if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEI)) {
+ pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
+ generic->header.source_id);
+ goto err;
+ }
+ break;
case ACPI_HEST_NOTIFY_NMI:
if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1179,6 +1231,9 @@ static int ghes_probe(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_SEA:
ghes_sea_add(ghes);
break;
+ case ACPI_HEST_NOTIFY_SEI:
+ ghes_sei_add(ghes);
+ break;
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_add(ghes);
break;
@@ -1224,6 +1279,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_SEA:
ghes_sea_remove(ghes);
break;
+ case ACPI_HEST_NOTIFY_SEI:
+ ghes_sei_remove(ghes);
+ break;
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_remove(ghes);
break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 2a727dc..10d752e 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -100,5 +100,6 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
}

int ghes_notify_sea(void);
+int ghes_notify_sei(void);

#endif /* GHES_H */
--
1.8.3.1

2017-03-30 10:34:57

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler

From: Wang Xiongfeng <[email protected]>

Since SEI is asynchronous, the error data has been consumed. So we must
suppose that all the memory data current process can write are
contaminated. If the process doesn't have shared writable pages, the
process will be killed, and the system will continue running normally.
Otherwise, the system must be terminated, because the error has been
propagated to other processes running on other cores, and recursively
the error may be propagated to several another processes.

Signed-off-by: Wang Xiongfeng <[email protected]>
Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/kernel/traps.c | 149 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 144 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 99be6d8..b222589 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -34,6 +34,8 @@
#include <linux/sched/task_stack.h>
#include <linux/syscalls.h>
#include <linux/mm_types.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>

#include <asm/atomic.h>
#include <asm/bug.h>
@@ -662,7 +664,144 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
[ESR_ELx_AET_CE] = "Corrected",
};

+static void shared_writable_pte_entry(pte_t *pte, unsigned long addr,
+ struct mm_walk *walk)
+{
+ int *is_shared_writable = walk->private;
+ struct vm_area_struct *vma = walk->vma;
+ struct page *page = NULL;
+ int mapcount = -1;
+
+ if (!pte_write(__pte(pgprot_val(vma->vm_page_prot))))
+ return;
+
+ if (pte_present(*pte)) {
+ page = vm_normal_page(vma, addr, *pte);
+ } else if (is_swap_pte(*pte)) {
+ swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+ if (!non_swap_entry(swpent))
+ mapcount = swp_swapcount(swpent);
+ else if (is_migration_entry(swpent))
+ page = migration_entry_to_page(swpent);
+ }
+
+ if (mapcount == -1 && page)
+ mapcount = page_mapcount(page);
+ if (mapcount >= 2)
+ *is_shared_writable = 1;
+}
+
+static void shared_writable_pmd_entry(pmd_t *pmd, unsigned long addr,
+ struct mm_walk *walk)
+{
+ struct page *page;
+ int mapcount;
+ int *is_shared_writable = walk->private;
+
+ if (!pmd_write(*pmd))
+ return;
+
+ page = pmd_page(*pmd);
+ if (page) {
+ mapcount = page_mapcount(page);
+ if (mapcount >= 2)
+ *is_shared_writable = 1;
+ }
+}
+
+static int shared_writable_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
+{
+ pte_t *pte;
+
+ if (pmd_trans_huge(*pmd)) {
+ shared_writable_pmd_entry(pmd, addr, walk);
+ return 0;
+ }
+
+ if (pmd_trans_unstable(pmd))
+ return 0;
+
+ pte = pte_offset_map(pmd, addr);
+ for (; addr != end; pte++, addr += PAGE_SIZE)
+ shared_writable_pte_entry(pte, addr, walk);
+ return 0;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+static int shared_writable_hugetlb_range(pte_t *pte, unsigned long hmask,
+ unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
+{
+ struct vm_area_struct *vma = walk->vma;
+ int *is_shared_writable = walk->private;
+ struct page *page = NULL;
+ int mapcount;
+
+ if (!pte_write(*pte))
+ return 0;
+
+ if (pte_present(*pte)) {
+ page = vm_normal_page(vma, addr, *pte);
+ } else if (is_swap_pte(*pte)) {
+ swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+ if (is_migration_entry(swpent))
+ page = migration_entry_to_page(swpent);
+ }
+
+ if (page) {
+ mapcount = page_mapcount(page);
+
+ if (mapcount >= 2)
+ *is_shared_writable = 1;
+ }
+ return 0;
+}
+#endif
+
+/*
+ *Check whether there exists a page in mm_struct which is shared with other
+ process and writable (not COW) at the same time. 0 means existing such a page.
+ */
+int mm_shared_writable(struct mm_struct *mm)
+{
+ struct vm_area_struct *vma;
+ int is_shared_writable = 0;
+ struct mm_walk shared_writable_walk = {
+ .pmd_entry = shared_writable_pte_range,
+#ifdef CONFIG_HUGETLB_PAGE
+ .hugetlb_entry = shared_writable_hugetlb_range,
+#endif
+ .mm = mm,
+ .private = &is_shared_writable,
+ };
+
+ if (!mm)
+ return -EPERM;
+
+ vma = mm->mmap;
+ while (vma) {
+ walk_page_vma(vma, &shared_writable_walk);
+ if (is_shared_writable)
+ return 1;
+ vma = vma->vm_next;
+ }
+ return 0;
+}
+
DEFINE_PER_CPU(int, sei_in_process);
+
+/*
+ * Since SEI is asynchronous, the error data has been consumed. So we must
+ * suppose that all the memory data current process can write are
+ * contaminated. If the process doesn't have shared writable pages, the
+ * process will be killed, and the system will continue running normally.
+ * Otherwise, the system must be terminated, because the error has been
+ * propagated to other processes running on other cores, and recursively
+ * the error may be propagated to several another processes.
+ */
asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
{
int aet = ESR_ELx_AET(esr);
@@ -684,16 +823,16 @@ asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&
cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
siginfo_t info;
- void __user *pc = (void __user *)instruction_pointer(regs);

if (aet >= ESR_ELx_AET_UEO)
return;

- if (aet == ESR_ELx_AET_UEU) {
- info.si_signo = SIGILL;
+ if (aet == ESR_ELx_AET_UEU &&
+ !mm_shared_writable(current->mm)) {
+ info.si_signo = SIGKILL;
info.si_errno = 0;
- info.si_code = ILL_ILLOPC;
- info.si_addr = pc;
+ info.si_code = 0;
+ info.si_addr = 0;

current->thread.fault_address = 0;
current->thread.fault_code = 0;
--
1.8.3.1

2017-03-30 10:34:47

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

Add a new trace event for ARM processor error information, so that
the user will know what error occurred. With this information the
user may take appropriate action.

These trace events are consistent with the ARM processor error
information table which defined in UEFI 2.6 spec section N.2.4.4.1.

---
v2: add trace enabled condition as Steven's suggestion.
fix a typo.
---

Cc: Steven Rostedt <[email protected]>
Cc: Tyler Baicar <[email protected]>
Signed-off-by: Xie XiuQi <[email protected]>
---
drivers/acpi/apei/ghes.c | 10 ++++++
include/linux/cper.h | 5 +++
include/ras/ras_event.h | 87 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 102 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 81eabc6..6be0333 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -518,9 +518,19 @@ static void ghes_do_proc(struct ghes *ghes,
else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM) &&
trace_arm_event_enabled()) {
struct cper_sec_proc_arm *arm_err;
+ struct cper_arm_err_info *err_info;
+ int i;

arm_err = acpi_hest_generic_data_payload(gdata);
trace_arm_event(arm_err);
+
+ if (trace_arm_proc_err_enabled()) {
+ err_info = (struct cper_arm_err_info *)(arm_err + 1);
+ for (i = 0; i < arm_err->err_info_num; i++) {
+ trace_arm_proc_err(err_info);
+ err_info += 1;
+ }
+ }
} else if (trace_unknown_sec_event_enabled()) {
void *unknown_err = acpi_hest_generic_data_payload(gdata);
trace_unknown_sec_event(&sec_type,
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 85450f3..0cae900 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -270,6 +270,11 @@ enum {
#define CPER_ARM_INFO_VALID_VIRT_ADDR 0x0008
#define CPER_ARM_INFO_VALID_PHYSICAL_ADDR 0x0010

+#define CPER_ARM_INFO_TYPE_CACHE 0
+#define CPER_ARM_INFO_TYPE_TLB 1
+#define CPER_ARM_INFO_TYPE_BUS 2
+#define CPER_ARM_INFO_TYPE_UARCH 3
+
#define CPER_ARM_INFO_FLAGS_FIRST 0x0001
#define CPER_ARM_INFO_FLAGS_LAST 0x0002
#define CPER_ARM_INFO_FLAGS_PROPAGATED 0x0004
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 13befad..026b094 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -206,6 +206,93 @@
__entry->running_state, __entry->psci_state)
);

+#define ARM_PROC_ERR_TYPE \
+ EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
+ EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
+ EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
+ EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
+
+#define ARM_PROC_ERR_FLAGS \
+ EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
+ EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
+ EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
+ EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
+
+/*
+ * First define the enums in MM_ACTION_RESULT to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+ARM_PROC_ERR_TYPE
+ARM_PROC_ERR_FLAGS
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) { a, b },
+#define EMe(a, b) { a, b }
+
+TRACE_EVENT(arm_proc_err,
+
+ TP_PROTO(const struct cper_arm_err_info *err),
+
+ TP_ARGS(err),
+
+ TP_STRUCT__entry(
+ __field(u8, type)
+ __field(u16, multiple_error)
+ __field(u8, flags)
+ __field(u64, error_info)
+ __field(u64, virt_fault_addr)
+ __field(u64, physical_fault_addr)
+ ),
+
+ TP_fast_assign(
+ __entry->type = err->type;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
+ __entry->multiple_error = err->multiple_error;
+ else
+ __entry->multiple_error = ~0;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
+ __entry->flags = err->flags;
+ else
+ __entry->flags = ~0;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
+ __entry->error_info = err->error_info;
+ else
+ __entry->error_info = 0ULL;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
+ __entry->virt_fault_addr = err->virt_fault_addr;
+ else
+ __entry->virt_fault_addr = 0ULL;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
+ __entry->physical_fault_addr = err->physical_fault_addr;
+ else
+ __entry->physical_fault_addr = 0ULL;
+ ),
+
+ TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
+ " error info: %016llx; virtual address: %016llx;"
+ " physical address: %016llx",
+ __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
+ __entry->multiple_error,
+ __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),
+ __entry->error_info, __entry->virt_fault_addr,
+ __entry->physical_fault_addr)
+);
+
/*
* Unknown Section Report
*
--
1.8.3.1

2017-03-30 10:41:51

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

Add a new trace event for ARM processor error information, so that
the user will know what error occurred. With this information the
user may take appropriate action.

These trace events are consistent with the ARM processor error
information table which defined in UEFI 2.6 spec section N.2.4.4.1.

---
v2: add trace enabled condition as Steven's suggestion.
fix a typo.
---

Cc: Steven Rostedt <[email protected]>
Cc: Tyler Baicar <[email protected]>
Signed-off-by: Xie XiuQi <[email protected]>
---
drivers/acpi/apei/ghes.c | 10 ++++++
include/linux/cper.h | 5 +++
include/ras/ras_event.h | 87 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 102 insertions(+)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 81eabc6..6be0333 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -518,9 +518,19 @@ static void ghes_do_proc(struct ghes *ghes,
else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM) &&
trace_arm_event_enabled()) {
struct cper_sec_proc_arm *arm_err;
+ struct cper_arm_err_info *err_info;
+ int i;

arm_err = acpi_hest_generic_data_payload(gdata);
trace_arm_event(arm_err);
+
+ if (trace_arm_proc_err_enabled()) {
+ err_info = (struct cper_arm_err_info *)(arm_err + 1);
+ for (i = 0; i < arm_err->err_info_num; i++) {
+ trace_arm_proc_err(err_info);
+ err_info += 1;
+ }
+ }
} else if (trace_unknown_sec_event_enabled()) {
void *unknown_err = acpi_hest_generic_data_payload(gdata);
trace_unknown_sec_event(&sec_type,
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 85450f3..0cae900 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -270,6 +270,11 @@ enum {
#define CPER_ARM_INFO_VALID_VIRT_ADDR 0x0008
#define CPER_ARM_INFO_VALID_PHYSICAL_ADDR 0x0010

+#define CPER_ARM_INFO_TYPE_CACHE 0
+#define CPER_ARM_INFO_TYPE_TLB 1
+#define CPER_ARM_INFO_TYPE_BUS 2
+#define CPER_ARM_INFO_TYPE_UARCH 3
+
#define CPER_ARM_INFO_FLAGS_FIRST 0x0001
#define CPER_ARM_INFO_FLAGS_LAST 0x0002
#define CPER_ARM_INFO_FLAGS_PROPAGATED 0x0004
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 13befad..026b094 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -206,6 +206,93 @@
__entry->running_state, __entry->psci_state)
);

+#define ARM_PROC_ERR_TYPE \
+ EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
+ EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
+ EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
+ EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
+
+#define ARM_PROC_ERR_FLAGS \
+ EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
+ EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
+ EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
+ EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
+
+/*
+ * First define the enums in MM_ACTION_RESULT to be exported to userspace
+ * via TRACE_DEFINE_ENUM().
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+ARM_PROC_ERR_TYPE
+ARM_PROC_ERR_FLAGS
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) { a, b },
+#define EMe(a, b) { a, b }
+
+TRACE_EVENT(arm_proc_err,
+
+ TP_PROTO(const struct cper_arm_err_info *err),
+
+ TP_ARGS(err),
+
+ TP_STRUCT__entry(
+ __field(u8, type)
+ __field(u16, multiple_error)
+ __field(u8, flags)
+ __field(u64, error_info)
+ __field(u64, virt_fault_addr)
+ __field(u64, physical_fault_addr)
+ ),
+
+ TP_fast_assign(
+ __entry->type = err->type;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
+ __entry->multiple_error = err->multiple_error;
+ else
+ __entry->multiple_error = ~0;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
+ __entry->flags = err->flags;
+ else
+ __entry->flags = ~0;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
+ __entry->error_info = err->error_info;
+ else
+ __entry->error_info = 0ULL;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
+ __entry->virt_fault_addr = err->virt_fault_addr;
+ else
+ __entry->virt_fault_addr = 0ULL;
+
+ if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
+ __entry->physical_fault_addr = err->physical_fault_addr;
+ else
+ __entry->physical_fault_addr = 0ULL;
+ ),
+
+ TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
+ " error info: %016llx; virtual address: %016llx;"
+ " physical address: %016llx",
+ __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
+ __entry->multiple_error,
+ __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),
+ __entry->error_info, __entry->virt_fault_addr,
+ __entry->physical_fault_addr)
+);
+
/*
* Unknown Section Report
*
--
1.8.3.1

2017-03-30 10:42:28

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 2/8] acpi: apei: handle SEI notification type for ARMv8

ARM APEI extension proposal added SEI (asynchronous SError interrupt)
notification type for ARMv8.

Add a new GHES error source handling function for SEI. In firmware
first mode, if an error source's notification type is SEI. Then GHES
could parse and report the detail error information.

Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/kernel/traps.c | 10 ++++++++
drivers/acpi/apei/Kconfig | 14 ++++++++++++
drivers/acpi/apei/ghes.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++
include/acpi/ghes.h | 1 +
4 files changed, 83 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index e52be6a..53710a2 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -47,6 +47,8 @@
#include <asm/system_misc.h>
#include <asm/sysreg.h>

+#include <acpi/ghes.h>
+
static const char *handler[]= {
"Synchronous Abort",
"IRQ",
@@ -625,6 +627,14 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
handler[reason], smp_processor_id(), esr,
esr_get_class_string(esr));

+ /*
+ * In firmware first mode, we could assume firmware will only generate one
+ * of cper records at a time. There is no risk for one cpu to parse ghes table.
+ */
+ if (IS_ENABLED(CONFIG_ACPI_APEI_SEI) && ESR_ELx_EC(esr) == ESR_ELx_EC_SERROR) {
+ ghes_notify_sei();
+ }
+
die("Oops - bad mode", regs, 0);
local_irq_disable();
panic("bad mode");
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b54bd33..c00c9d34 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,20 @@ config ACPI_APEI_SEA
option allows the OS to look for such hardware error record, and
take appropriate action.

+config ACPI_APEI_SEI
+ bool "APEI Asynchronous SError Interrupt logging/recovering support"
+ depends on ARM64 && ACPI_APEI_GHES
+ help
+ This option should be enabled if the system supports
+ firmware first handling of SEI (asynchronous SError interrupt).
+
+ SEI happens with invalid instruction access or asynchronous exceptions
+ on ARMv8 systems. If a system supports firmware first handling of SEI,
+ the platform analyzes and handles hardware error notifications from
+ SEI, and it may then form a HW error record for the OS to parse and
+ handle. This option allows the OS to look for such hardware error
+ record, and take appropriate action.
+
config ACPI_APEI_MEMORY_FAILURE
bool "APEI memory error recovering support"
depends on ACPI_APEI && MEMORY_FAILURE
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6be0333..045d101 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -862,6 +862,51 @@ static inline void ghes_sea_remove(struct ghes *ghes)
}
#endif /* CONFIG_ACPI_APEI_SEA */

+#ifdef CONFIG_ACPI_APEI_SEI
+static LIST_HEAD(ghes_sei);
+
+int ghes_notify_sei(void)
+{
+ struct ghes *ghes;
+ int ret = -ENOENT;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ghes, &ghes_sei, list) {
+ if(!ghes_proc(ghes))
+ ret = 0;
+ }
+ rcu_read_unlock();
+ return ret;
+}
+
+static void ghes_sei_add(struct ghes *ghes)
+{
+ mutex_lock(&ghes_list_mutex);
+ list_add_rcu(&ghes->list, &ghes_sei);
+ mutex_unlock(&ghes_list_mutex);
+}
+
+static void ghes_sei_remove(struct ghes *ghes)
+{
+ mutex_lock(&ghes_list_mutex);
+ list_del_rcu(&ghes->list);
+ mutex_unlock(&ghes_list_mutex);
+ synchronize_rcu();
+}
+#else /* CONFIG_ACPI_APEI_SEI */
+static inline void ghes_sei_add(struct ghes *ghes)
+{
+ pr_err(GHES_PFX "ID: %d, trying to add SEI notification which is not supported\n",
+ ghes->generic->header.source_id);
+}
+
+static inline void ghes_sei_remove(struct ghes *ghes)
+{
+ pr_err(GHES_PFX "ID: %d, trying to remove SEI notification which is not supported\n",
+ ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEI */
+
#ifdef CONFIG_HAVE_ACPI_APEI_NMI
/*
* printk is not safe in NMI context. So in NMI handler, we allocate
@@ -1111,6 +1156,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
goto err;
}
break;
+ case ACPI_HEST_NOTIFY_SEI:
+ if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEI)) {
+ pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
+ generic->header.source_id);
+ goto err;
+ }
+ break;
case ACPI_HEST_NOTIFY_NMI:
if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1179,6 +1231,9 @@ static int ghes_probe(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_SEA:
ghes_sea_add(ghes);
break;
+ case ACPI_HEST_NOTIFY_SEI:
+ ghes_sei_add(ghes);
+ break;
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_add(ghes);
break;
@@ -1224,6 +1279,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
case ACPI_HEST_NOTIFY_SEA:
ghes_sea_remove(ghes);
break;
+ case ACPI_HEST_NOTIFY_SEI:
+ ghes_sei_remove(ghes);
+ break;
case ACPI_HEST_NOTIFY_NMI:
ghes_nmi_remove(ghes);
break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 2a727dc..10d752e 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -100,5 +100,6 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
}

int ghes_notify_sea(void);
+int ghes_notify_sei(void);

#endif /* GHES_H */
--
1.8.3.1

2017-03-30 10:41:17

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 4/8] APEI: GHES: reserve a virtual page for SEI context

On arm64 platform, SEI may interrupt code which had interrupts masked.
But SEI could be masked, so it's not treated as NMI, however SEA is
treated as NMI.

So, the memory area used to transfer hardware error information from
BIOS to Linux can be determined only in NMI, SEI(arm64), IRQ or timer
handler.

In this patch, we add a virtual page for SEI context.

Signed-off-by: Xie XiuQi <[email protected]>
---
drivers/acpi/apei/ghes.c | 98 +++++++++++++++++++++++-------------------------
1 file changed, 47 insertions(+), 51 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 045d101..b1f9b1f 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -108,26 +108,33 @@

/*
* Because the memory area used to transfer hardware error information
- * from BIOS to Linux can be determined only in NMI, IRQ or timer
- * handler, but general ioremap can not be used in atomic context, so
- * a special version of atomic ioremap is implemented for that.
+ * from BIOS to Linux can be determined only in NMI, SEI (ARM64), IRQ or
+ * timer handler, but general ioremap can not be used in atomic context,
+ * so a special version of atomic ioremap is implemented for that.
*/

/*
- * Two virtual pages are used, one for IRQ/PROCESS context, the other for
- * NMI context (optionally).
+ * 3 virtual pages are used, one for IRQ/PROCESS context, one for SEI
+ * (on ARM64 platform), and the other for NMI context (optionally).
*/
+#ifdef CONFIG_ACPI_APEI_SEI
+#define GHES_IOREMAP_PAGES 3
+#define GHES_IOREMAP_SEI_PAGE(base) ((base) + PAGE_SIZE*2)
+#else
#define GHES_IOREMAP_PAGES 2
+#endif
+
#define GHES_IOREMAP_IRQ_PAGE(base) (base)
#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE)

/* virtual memory area for atomic ioremap */
static struct vm_struct *ghes_ioremap_area;
/*
- * These 2 spinlock is used to prevent atomic ioremap virtual memory
+ * These 3 spinlock is used to prevent atomic ioremap virtual memory
* area from being mapped simultaneously.
*/
static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
+static DEFINE_SPINLOCK(ghes_ioremap_lock_sei);
static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);

static struct gen_pool *ghes_estatus_pool;
@@ -155,54 +162,55 @@ static void ghes_ioremap_exit(void)
free_vm_area(ghes_ioremap_area);
}

-static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
+static void __iomem *ghes_ioremap_pfn(u64 pfn)
{
- unsigned long vaddr;
+ unsigned long vaddr, flags = 0;
phys_addr_t paddr;
pgprot_t prot;

- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-
- paddr = pfn << PAGE_SHIFT;
- prot = arch_apei_get_mem_attribute(paddr);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
-
- return (void __iomem *)vaddr;
-}
-
-static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
-{
- unsigned long vaddr, paddr;
- pgprot_t prot;
-
- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
+ if (in_nmi()) {
+ raw_spin_lock(&ghes_ioremap_lock_nmi);
+ vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
+ } else if (this_cpu_read(sei_in_process)) {
+ spin_lock_irqsave(&ghes_ioremap_lock_sei, flags);
+ vaddr = (unsigned long)GHES_IOREMAP_SEI_PAGE(ghes_ioremap_area->addr);
+ } else {
+ spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
+ vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
+ }

paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
-
ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);

return (void __iomem *)vaddr;
}

-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
+static void ghes_iounmap(void __iomem *vaddr_ptr)
{
unsigned long vaddr = (unsigned long __force)vaddr_ptr;
void *base = ghes_ioremap_area->addr;
+ unsigned long page, flags = 0;
+
+ if (in_nmi()) {
+ page = (unsigned long)GHES_IOREMAP_NMI_PAGE(base);
+ } else if (this_cpu_read(sei_in_process)) {
+ page = (unsigned long)GHES_IOREMAP_SEI_PAGE(base);
+ } else {
+ page = (unsigned long)GHES_IOREMAP_IRQ_PAGE(base);
+ }

- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base));
+ BUG_ON(vaddr != page);
unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
arch_apei_flush_tlb_one(vaddr);
-}
-
-static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
-{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;

- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ if (in_nmi()) {
+ raw_spin_unlock(&ghes_ioremap_lock_nmi);
+ } else if (this_cpu_read(sei_in_process)) {
+ spin_unlock_irqrestore(&ghes_ioremap_lock_sei, flags);
+ } else {
+ spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
+ }
}

static int ghes_estatus_pool_init(void)
@@ -327,20 +335,13 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
int from_phys)
{
void __iomem *vaddr;
- unsigned long flags = 0;
- int in_nmi = in_nmi();
u64 offset;
u32 trunk;

while (len > 0) {
offset = paddr - (paddr & PAGE_MASK);
- if (in_nmi) {
- raw_spin_lock(&ghes_ioremap_lock_nmi);
- vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
- } else {
- spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
- vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
- }
+ vaddr = ghes_ioremap_pfn(paddr >> PAGE_SHIFT);
+
trunk = PAGE_SIZE - offset;
trunk = min(trunk, len);
if (from_phys)
@@ -350,13 +351,8 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
len -= trunk;
paddr += trunk;
buffer += trunk;
- if (in_nmi) {
- ghes_iounmap_nmi(vaddr);
- raw_spin_unlock(&ghes_ioremap_lock_nmi);
- } else {
- ghes_iounmap_irq(vaddr);
- spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
- }
+
+ ghes_iounmap(vaddr);
}
}

--
1.8.3.1

2017-03-30 10:34:49

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 3/8] arm64: apei: add a per-cpu variable to indecate sei is processing

Add a per-cpu variable to indicate sei is processing, with which we could use to
reserve a separate virtual space address page for sei in next patch

Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/kernel/traps.c | 4 ++++
include/acpi/ghes.h | 2 ++
2 files changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 53710a2..955dc8c 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -615,6 +615,8 @@ const char *esr_get_class_string(u32 esr)
return esr_class_str[ESR_ELx_EC(esr)];
}

+DEFINE_PER_CPU(int, sei_in_process);
+
/*
* bad_mode handles the impossible case in the exception vector. This is always
* fatal.
@@ -632,7 +634,9 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
* of cper records at a time. There is no risk for one cpu to parse ghes table.
*/
if (IS_ENABLED(CONFIG_ACPI_APEI_SEI) && ESR_ELx_EC(esr) == ESR_ELx_EC_SERROR) {
+ this_cpu_inc(sei_in_process);
ghes_notify_sei();
+ this_cpu_dec(sei_in_process);
}

die("Oops - bad mode", regs, 0);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 10d752e..eed79ea 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -102,4 +102,6 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
int ghes_notify_sea(void);
int ghes_notify_sei(void);

+DECLARE_PER_CPU(int, sei_in_process);
+
#endif /* GHES_H */
--
1.8.3.1

2017-03-30 10:42:09

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 3/8] arm64: apei: add a per-cpu variable to indecate sei is processing

Add a per-cpu variable to indicate sei is processing, with which we could use to
reserve a separate virtual space address page for sei in next patch

Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/kernel/traps.c | 4 ++++
include/acpi/ghes.h | 2 ++
2 files changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 53710a2..955dc8c 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -615,6 +615,8 @@ const char *esr_get_class_string(u32 esr)
return esr_class_str[ESR_ELx_EC(esr)];
}

+DEFINE_PER_CPU(int, sei_in_process);
+
/*
* bad_mode handles the impossible case in the exception vector. This is always
* fatal.
@@ -632,7 +634,9 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
* of cper records at a time. There is no risk for one cpu to parse ghes table.
*/
if (IS_ENABLED(CONFIG_ACPI_APEI_SEI) && ESR_ELx_EC(esr) == ESR_ELx_EC_SERROR) {
+ this_cpu_inc(sei_in_process);
ghes_notify_sei();
+ this_cpu_dec(sei_in_process);
}

die("Oops - bad mode", regs, 0);
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 10d752e..eed79ea 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -102,4 +102,6 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
int ghes_notify_sea(void);
int ghes_notify_sei(void);

+DECLARE_PER_CPU(int, sei_in_process);
+
#endif /* GHES_H */
--
1.8.3.1

2017-03-30 10:43:25

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
is used to synchronize Unrecoverable errors. That is, containable errors
architecturally consumed by the PE and not silently propagated.

With ESB it is generally possible to isolate an unrecoverable error
between two ESB instructions. So, it's possible to recovery from
unrecoverable errors reported by asynchronous SError interrupt.

If ARMv8.2 RAS Extension is not support, ESB is treated as a NOP.

Signed-off-by: Xie XiuQi <[email protected]>
Signed-off-by: Wang Xiongfeng <[email protected]>
---
arch/arm64/Kconfig | 16 ++++++++++
arch/arm64/include/asm/esr.h | 14 +++++++++
arch/arm64/kernel/entry.S | 70 ++++++++++++++++++++++++++++++++++++++++++--
arch/arm64/kernel/traps.c | 54 ++++++++++++++++++++++++++++++++--
4 files changed, 150 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 859a90e..7402175 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -911,6 +911,22 @@ endmenu

menu "ARMv8.2 architectural features"

+config ARM64_ESB
+ bool "Enable support for Error Synchronization Barrier (ESB)"
+ default n
+ help
+ Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
+ is used to synchronize Unrecoverable errors. That is, containable errors
+ architecturally consumed by the PE and not silently propagated.
+
+ Without ESB it is not generally possible to isolate an Unrecoverable
+ error because it is not known which instruction generated the error.
+
+ Selecting this option allows inject esb instruction before the exception
+ change. If ARMv8.2 RAS Extension is not support, ESB is treated as a NOP.
+
+ Note that ESB instruction can introduce slight overhead, so say N if unsure.
+
config ARM64_UAO
bool "Enable support for User Access Override (UAO)"
default y
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f20c64a..22f9c90 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -106,6 +106,20 @@
#define ESR_ELx_AR (UL(1) << 14)
#define ESR_ELx_CM (UL(1) << 8)

+#define ESR_Elx_DFSC_SEI (0x11)
+
+#define ESR_ELx_AET_SHIFT (10)
+#define ESR_ELx_AET_MAX (7)
+#define ESR_ELx_AET_MASK (UL(7) << ESR_ELx_AET_SHIFT)
+#define ESR_ELx_AET(esr) (((esr) & ESR_ELx_AET_MASK) >> ESR_ELx_AET_SHIFT)
+
+#define ESR_ELx_AET_UC (0)
+#define ESR_ELx_AET_UEU (1)
+#define ESR_ELx_AET_UEO (2)
+#define ESR_ELx_AET_UER (3)
+#define ESR_ELx_AET_CE (6)
+
+
/* ISS field definitions for exceptions taken in to Hyp */
#define ESR_ELx_CV (UL(1) << 24)
#define ESR_ELx_COND_SHIFT (20)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 43512d4..d8a7306 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -69,7 +69,14 @@
#define BAD_FIQ 2
#define BAD_ERROR 3

+ .arch_extension ras
+
.macro kernel_entry, el, regsize = 64
+#ifdef CONFIG_ARM64_ESB
+ .if \el == 0
+ esb
+ .endif
+#endif
sub sp, sp, #S_FRAME_SIZE
.if \regsize == 32
mov w0, w0 // zero upper 32 bits of x0
@@ -208,6 +215,7 @@ alternative_else_nop_endif
#endif

.if \el == 0
+ msr daifset, #0xF // Set flags
ldr x23, [sp, #S_SP] // load return stack pointer
msr sp_el0, x23
#ifdef CONFIG_ARM64_ERRATUM_845719
@@ -226,6 +234,15 @@ alternative_else_nop_endif

msr elr_el1, x21 // set up the return data
msr spsr_el1, x22
+
+#ifdef CONFIG_ARM64_ESB
+ .if \el == 0
+ esb // Error Synchronization Barrier
+ mrs x21, disr_el1 // Check for deferred error
+ tbnz x21, #31, el1_sei
+ .endif
+#endif
+
ldp x0, x1, [sp, #16 * 0]
ldp x2, x3, [sp, #16 * 1]
ldp x4, x5, [sp, #16 * 2]
@@ -318,7 +335,7 @@ ENTRY(vectors)
ventry el1_sync_invalid // Synchronous EL1t
ventry el1_irq_invalid // IRQ EL1t
ventry el1_fiq_invalid // FIQ EL1t
- ventry el1_error_invalid // Error EL1t
+ ventry el1_error // Error EL1t

ventry el1_sync // Synchronous EL1h
ventry el1_irq // IRQ EL1h
@@ -328,7 +345,7 @@ ENTRY(vectors)
ventry el0_sync // Synchronous 64-bit EL0
ventry el0_irq // IRQ 64-bit EL0
ventry el0_fiq_invalid // FIQ 64-bit EL0
- ventry el0_error_invalid // Error 64-bit EL0
+ ventry el0_error // Error 64-bit EL0

#ifdef CONFIG_COMPAT
ventry el0_sync_compat // Synchronous 32-bit EL0
@@ -508,12 +525,31 @@ el1_preempt:
ret x24
#endif

+ .align 6
+el1_error:
+ kernel_entry 1
+el1_sei:
+ /*
+ * asynchronous SError interrupt from kernel
+ */
+ mov x0, sp
+ mrs x1, esr_el1
+ mov x2, #1 // exception level of SEI generated
+ b do_sei
+ENDPROC(el1_error)
+
+
/*
* EL0 mode handlers.
*/
.align 6
el0_sync:
kernel_entry 0
+#ifdef CONFIG_ARM64_ESB
+ mrs x26, disr_el1
+ tbnz x26, #31, el0_sei // check DISR.A
+ msr daifclr, #0x4 // unmask SEI
+#endif
mrs x25, esr_el1 // read the syndrome register
lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
cmp x24, #ESR_ELx_EC_SVC64 // SVC in 64-bit state
@@ -688,8 +724,38 @@ el0_inv:
ENDPROC(el0_sync)

.align 6
+el0_error:
+ kernel_entry 0
+el0_sei:
+ /*
+ * asynchronous SError interrupt from userspace
+ */
+ ct_user_exit
+ mov x0, sp
+ mrs x1, esr_el1
+ mov x2, #0
+ bl do_sei
+ b ret_to_user
+ENDPROC(el0_error)
+
+ .align 6
el0_irq:
kernel_entry 0
+#ifdef CONFIG_ARM64_ESB
+ mrs x26, disr_el1
+ tbz x26, #31, el0_irq_naked // check DISR.A
+
+ mov x0, sp
+ mrs x1, esr_el1
+ mov x2, 0
+
+ /*
+ * The SEI generated at EL0 is not affect this irq context,
+ * so after sei handler, we continue process this irq.
+ */
+ bl do_sei
+ msr daifclr, #0x4 // unmask SEI
+#endif
el0_irq_naked:
enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index b6d6727..99be6d8 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -643,6 +643,34 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
handler[reason], smp_processor_id(), esr,
esr_get_class_string(esr));

+ die("Oops - bad mode", regs, 0);
+ local_irq_disable();
+ panic("bad mode");
+}
+
+static const char *sei_context[] = {
+ "userspace", /* EL0 */
+ "kernel", /* EL1 */
+};
+
+static const char *sei_severity[] = {
+ [0 ... ESR_ELx_AET_MAX] = "Unknown",
+ [ESR_ELx_AET_UC] = "Uncontainable",
+ [ESR_ELx_AET_UEU] = "Unrecoverable",
+ [ESR_ELx_AET_UEO] = "Restartable",
+ [ESR_ELx_AET_UER] = "Recoverable",
+ [ESR_ELx_AET_CE] = "Corrected",
+};
+
+DEFINE_PER_CPU(int, sei_in_process);
+asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
+{
+ int aet = ESR_ELx_AET(esr);
+ console_verbose();
+
+ pr_crit("Asynchronous SError interrupt detected on CPU%d, %s, %s\n",
+ smp_processor_id(), sei_context[el], sei_severity[aet]);
+
/*
* In firmware first mode, we could assume firmware will only generate one
* of cper records at a time. There is no risk for one cpu to parse ghes table.
@@ -653,9 +681,31 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
this_cpu_dec(sei_in_process);
}

- die("Oops - bad mode", regs, 0);
+ if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&
+ cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
+ siginfo_t info;
+ void __user *pc = (void __user *)instruction_pointer(regs);
+
+ if (aet >= ESR_ELx_AET_UEO)
+ return;
+
+ if (aet == ESR_ELx_AET_UEU) {
+ info.si_signo = SIGILL;
+ info.si_errno = 0;
+ info.si_code = ILL_ILLOPC;
+ info.si_addr = pc;
+
+ current->thread.fault_address = 0;
+ current->thread.fault_code = 0;
+
+ force_sig_info(info.si_signo, &info, current);
+
+ return;
+ }
+ }
+
local_irq_disable();
- panic("bad mode");
+ panic("Asynchronous SError interrupt");
}

/*
--
1.8.3.1

2017-03-30 10:43:27

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler

From: Wang Xiongfeng <[email protected]>

Since SEI is asynchronous, the error data has been consumed. So we must
suppose that all the memory data current process can write are
contaminated. If the process doesn't have shared writable pages, the
process will be killed, and the system will continue running normally.
Otherwise, the system must be terminated, because the error has been
propagated to other processes running on other cores, and recursively
the error may be propagated to several another processes.

Signed-off-by: Wang Xiongfeng <[email protected]>
Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/kernel/traps.c | 149 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 144 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 99be6d8..b222589 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -34,6 +34,8 @@
#include <linux/sched/task_stack.h>
#include <linux/syscalls.h>
#include <linux/mm_types.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>

#include <asm/atomic.h>
#include <asm/bug.h>
@@ -662,7 +664,144 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
[ESR_ELx_AET_CE] = "Corrected",
};

+static void shared_writable_pte_entry(pte_t *pte, unsigned long addr,
+ struct mm_walk *walk)
+{
+ int *is_shared_writable = walk->private;
+ struct vm_area_struct *vma = walk->vma;
+ struct page *page = NULL;
+ int mapcount = -1;
+
+ if (!pte_write(__pte(pgprot_val(vma->vm_page_prot))))
+ return;
+
+ if (pte_present(*pte)) {
+ page = vm_normal_page(vma, addr, *pte);
+ } else if (is_swap_pte(*pte)) {
+ swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+ if (!non_swap_entry(swpent))
+ mapcount = swp_swapcount(swpent);
+ else if (is_migration_entry(swpent))
+ page = migration_entry_to_page(swpent);
+ }
+
+ if (mapcount == -1 && page)
+ mapcount = page_mapcount(page);
+ if (mapcount >= 2)
+ *is_shared_writable = 1;
+}
+
+static void shared_writable_pmd_entry(pmd_t *pmd, unsigned long addr,
+ struct mm_walk *walk)
+{
+ struct page *page;
+ int mapcount;
+ int *is_shared_writable = walk->private;
+
+ if (!pmd_write(*pmd))
+ return;
+
+ page = pmd_page(*pmd);
+ if (page) {
+ mapcount = page_mapcount(page);
+ if (mapcount >= 2)
+ *is_shared_writable = 1;
+ }
+}
+
+static int shared_writable_pte_range(pmd_t *pmd, unsigned long addr,
+ unsigned long end, struct mm_walk *walk)
+{
+ pte_t *pte;
+
+ if (pmd_trans_huge(*pmd)) {
+ shared_writable_pmd_entry(pmd, addr, walk);
+ return 0;
+ }
+
+ if (pmd_trans_unstable(pmd))
+ return 0;
+
+ pte = pte_offset_map(pmd, addr);
+ for (; addr != end; pte++, addr += PAGE_SIZE)
+ shared_writable_pte_entry(pte, addr, walk);
+ return 0;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+static int shared_writable_hugetlb_range(pte_t *pte, unsigned long hmask,
+ unsigned long addr, unsigned long end,
+ struct mm_walk *walk)
+{
+ struct vm_area_struct *vma = walk->vma;
+ int *is_shared_writable = walk->private;
+ struct page *page = NULL;
+ int mapcount;
+
+ if (!pte_write(*pte))
+ return 0;
+
+ if (pte_present(*pte)) {
+ page = vm_normal_page(vma, addr, *pte);
+ } else if (is_swap_pte(*pte)) {
+ swp_entry_t swpent = pte_to_swp_entry(*pte);
+
+ if (is_migration_entry(swpent))
+ page = migration_entry_to_page(swpent);
+ }
+
+ if (page) {
+ mapcount = page_mapcount(page);
+
+ if (mapcount >= 2)
+ *is_shared_writable = 1;
+ }
+ return 0;
+}
+#endif
+
+/*
+ *Check whether there exists a page in mm_struct which is shared with other
+ process and writable (not COW) at the same time. 0 means existing such a page.
+ */
+int mm_shared_writable(struct mm_struct *mm)
+{
+ struct vm_area_struct *vma;
+ int is_shared_writable = 0;
+ struct mm_walk shared_writable_walk = {
+ .pmd_entry = shared_writable_pte_range,
+#ifdef CONFIG_HUGETLB_PAGE
+ .hugetlb_entry = shared_writable_hugetlb_range,
+#endif
+ .mm = mm,
+ .private = &is_shared_writable,
+ };
+
+ if (!mm)
+ return -EPERM;
+
+ vma = mm->mmap;
+ while (vma) {
+ walk_page_vma(vma, &shared_writable_walk);
+ if (is_shared_writable)
+ return 1;
+ vma = vma->vm_next;
+ }
+ return 0;
+}
+
DEFINE_PER_CPU(int, sei_in_process);
+
+/*
+ * Since SEI is asynchronous, the error data has been consumed. So we must
+ * suppose that all the memory data current process can write are
+ * contaminated. If the process doesn't have shared writable pages, the
+ * process will be killed, and the system will continue running normally.
+ * Otherwise, the system must be terminated, because the error has been
+ * propagated to other processes running on other cores, and recursively
+ * the error may be propagated to several another processes.
+ */
asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
{
int aet = ESR_ELx_AET(esr);
@@ -684,16 +823,16 @@ asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&
cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
siginfo_t info;
- void __user *pc = (void __user *)instruction_pointer(regs);

if (aet >= ESR_ELx_AET_UEO)
return;

- if (aet == ESR_ELx_AET_UEU) {
- info.si_signo = SIGILL;
+ if (aet == ESR_ELx_AET_UEU &&
+ !mm_shared_writable(current->mm)) {
+ info.si_signo = SIGKILL;
info.si_errno = 0;
- info.si_code = ILL_ILLOPC;
- info.si_addr = pc;
+ info.si_code = 0;
+ info.si_addr = 0;

current->thread.fault_address = 0;
current->thread.fault_code = 0;
--
1.8.3.1

2017-03-30 10:43:31

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 6/8] arm64: RAS: add ras extension runtime detection

According to <<RAS Extension PRD03>> document, we add RAS extension
feature runtime detection, which would be used for error recovery
in the future.

Signed-off-by: Xie XiuQi <[email protected]>
Reviewed-by: Kefeng Wang <[email protected]>
---
arch/arm64/include/asm/cpucaps.h | 3 ++-
arch/arm64/include/asm/sysreg.h | 2 ++
arch/arm64/kernel/cpufeature.c | 11 +++++++++++
3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index fb78a5d..3847cf8 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -37,7 +37,8 @@
#define ARM64_HAS_NO_FPSIMD 16
#define ARM64_WORKAROUND_REPEAT_TLBI 17
#define ARM64_WORKAROUND_QCOM_FALKOR_E1003 18
+#define ARM64_HAS_RAS_EXTN 19

-#define ARM64_NCAPS 19
+#define ARM64_NCAPS 20

#endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ac24b6e..32964c7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -157,6 +157,7 @@
#define ID_AA64ISAR0_AES_SHIFT 4

/* id_aa64pfr0 */
+#define ID_AA64PFR0_RAS_SHIFT 28
#define ID_AA64PFR0_GIC_SHIFT 24
#define ID_AA64PFR0_ASIMD_SHIFT 20
#define ID_AA64PFR0_FP_SHIFT 16
@@ -165,6 +166,7 @@
#define ID_AA64PFR0_EL1_SHIFT 4
#define ID_AA64PFR0_EL0_SHIFT 0

+#define ID_AA64PFR0_RAS_V1 0x1
#define ID_AA64PFR0_FP_NI 0xf
#define ID_AA64PFR0_FP_SUPPORTED 0x0
#define ID_AA64PFR0_ASIMD_NI 0xf
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index abda8e8..b0fb81e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -98,6 +98,7 @@
};

static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_RAS_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_GIC_SHIFT, 4, 0),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
@@ -860,6 +861,16 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
.min_field_value = 0,
.matches = has_no_fpsimd,
},
+ {
+ .desc = "ARM64 RAS Extension Support",
+ .capability = ARM64_HAS_RAS_EXTN,
+ .def_scope = SCOPE_SYSTEM,
+ .matches = has_cpuid_feature,
+ .sys_reg = SYS_ID_AA64PFR0_EL1,
+ .sign = FTR_UNSIGNED,
+ .field_pos = ID_AA64PFR0_RAS_SHIFT,
+ .min_field_value = ID_AA64PFR0_RAS_V1,
+ },
{},
};

--
1.8.3.1

2017-03-30 10:34:44

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 4/8] APEI: GHES: reserve a virtual page for SEI context

On arm64 platform, SEI may interrupt code which had interrupts masked.
But SEI could be masked, so it's not treated as NMI, however SEA is
treated as NMI.

So, the memory area used to transfer hardware error information from
BIOS to Linux can be determined only in NMI, SEI(arm64), IRQ or timer
handler.

In this patch, we add a virtual page for SEI context.

Signed-off-by: Xie XiuQi <[email protected]>
---
drivers/acpi/apei/ghes.c | 98 +++++++++++++++++++++++-------------------------
1 file changed, 47 insertions(+), 51 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 045d101..b1f9b1f 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -108,26 +108,33 @@

/*
* Because the memory area used to transfer hardware error information
- * from BIOS to Linux can be determined only in NMI, IRQ or timer
- * handler, but general ioremap can not be used in atomic context, so
- * a special version of atomic ioremap is implemented for that.
+ * from BIOS to Linux can be determined only in NMI, SEI (ARM64), IRQ or
+ * timer handler, but general ioremap can not be used in atomic context,
+ * so a special version of atomic ioremap is implemented for that.
*/

/*
- * Two virtual pages are used, one for IRQ/PROCESS context, the other for
- * NMI context (optionally).
+ * 3 virtual pages are used, one for IRQ/PROCESS context, one for SEI
+ * (on ARM64 platform), and the other for NMI context (optionally).
*/
+#ifdef CONFIG_ACPI_APEI_SEI
+#define GHES_IOREMAP_PAGES 3
+#define GHES_IOREMAP_SEI_PAGE(base) ((base) + PAGE_SIZE*2)
+#else
#define GHES_IOREMAP_PAGES 2
+#endif
+
#define GHES_IOREMAP_IRQ_PAGE(base) (base)
#define GHES_IOREMAP_NMI_PAGE(base) ((base) + PAGE_SIZE)

/* virtual memory area for atomic ioremap */
static struct vm_struct *ghes_ioremap_area;
/*
- * These 2 spinlock is used to prevent atomic ioremap virtual memory
+ * These 3 spinlock is used to prevent atomic ioremap virtual memory
* area from being mapped simultaneously.
*/
static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi);
+static DEFINE_SPINLOCK(ghes_ioremap_lock_sei);
static DEFINE_SPINLOCK(ghes_ioremap_lock_irq);

static struct gen_pool *ghes_estatus_pool;
@@ -155,54 +162,55 @@ static void ghes_ioremap_exit(void)
free_vm_area(ghes_ioremap_area);
}

-static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
+static void __iomem *ghes_ioremap_pfn(u64 pfn)
{
- unsigned long vaddr;
+ unsigned long vaddr, flags = 0;
phys_addr_t paddr;
pgprot_t prot;

- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-
- paddr = pfn << PAGE_SHIFT;
- prot = arch_apei_get_mem_attribute(paddr);
- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
-
- return (void __iomem *)vaddr;
-}
-
-static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
-{
- unsigned long vaddr, paddr;
- pgprot_t prot;
-
- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
+ if (in_nmi()) {
+ raw_spin_lock(&ghes_ioremap_lock_nmi);
+ vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
+ } else if (this_cpu_read(sei_in_process)) {
+ spin_lock_irqsave(&ghes_ioremap_lock_sei, flags);
+ vaddr = (unsigned long)GHES_IOREMAP_SEI_PAGE(ghes_ioremap_area->addr);
+ } else {
+ spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
+ vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
+ }

paddr = pfn << PAGE_SHIFT;
prot = arch_apei_get_mem_attribute(paddr);
-
ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);

return (void __iomem *)vaddr;
}

-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)
+static void ghes_iounmap(void __iomem *vaddr_ptr)
{
unsigned long vaddr = (unsigned long __force)vaddr_ptr;
void *base = ghes_ioremap_area->addr;
+ unsigned long page, flags = 0;
+
+ if (in_nmi()) {
+ page = (unsigned long)GHES_IOREMAP_NMI_PAGE(base);
+ } else if (this_cpu_read(sei_in_process)) {
+ page = (unsigned long)GHES_IOREMAP_SEI_PAGE(base);
+ } else {
+ page = (unsigned long)GHES_IOREMAP_IRQ_PAGE(base);
+ }

- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base));
+ BUG_ON(vaddr != page);
unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
arch_apei_flush_tlb_one(vaddr);
-}
-
-static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
-{
- unsigned long vaddr = (unsigned long __force)vaddr_ptr;
- void *base = ghes_ioremap_area->addr;

- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base));
- unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
- arch_apei_flush_tlb_one(vaddr);
+ if (in_nmi()) {
+ raw_spin_unlock(&ghes_ioremap_lock_nmi);
+ } else if (this_cpu_read(sei_in_process)) {
+ spin_unlock_irqrestore(&ghes_ioremap_lock_sei, flags);
+ } else {
+ spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
+ }
}

static int ghes_estatus_pool_init(void)
@@ -327,20 +335,13 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
int from_phys)
{
void __iomem *vaddr;
- unsigned long flags = 0;
- int in_nmi = in_nmi();
u64 offset;
u32 trunk;

while (len > 0) {
offset = paddr - (paddr & PAGE_MASK);
- if (in_nmi) {
- raw_spin_lock(&ghes_ioremap_lock_nmi);
- vaddr = ghes_ioremap_pfn_nmi(paddr >> PAGE_SHIFT);
- } else {
- spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
- vaddr = ghes_ioremap_pfn_irq(paddr >> PAGE_SHIFT);
- }
+ vaddr = ghes_ioremap_pfn(paddr >> PAGE_SHIFT);
+
trunk = PAGE_SIZE - offset;
trunk = min(trunk, len);
if (from_phys)
@@ -350,13 +351,8 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
len -= trunk;
paddr += trunk;
buffer += trunk;
- if (in_nmi) {
- ghes_iounmap_nmi(vaddr);
- raw_spin_unlock(&ghes_ioremap_lock_nmi);
- } else {
- ghes_iounmap_irq(vaddr);
- spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags);
- }
+
+ ghes_iounmap(vaddr);
}
}

--
1.8.3.1

2017-03-30 10:44:22

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 5/8] arm64: KVM: add guest SEI support

Add ghes handling for SEI so that the host kernel could parse and
report detailed error information for SEI which occur in the guest
kernel.

If there were no CPER records, (or the system doesn't support SEI as a GHES
notification mechanism), then yes we should still call kvm_inject_vabt().

Signed-off-by: Xie XiuQi <[email protected]>
---
arch/arm64/include/asm/system_misc.h | 1 +
arch/arm64/kernel/traps.c | 14 ++++++++++++++
arch/arm64/kvm/handle_exit.c | 22 ++++++++++++++++++++--
3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 5b2cecd..d68d61f 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -59,5 +59,6 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
#endif /* __ASSEMBLY__ */

int handle_guest_sea(unsigned long addr, unsigned int esr);
+int handle_guest_sei(unsigned long addr, unsigned int esr);

#endif /* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 955dc8c..b6d6727 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -618,6 +618,20 @@ const char *esr_get_class_string(u32 esr)
DEFINE_PER_CPU(int, sei_in_process);

/*
+ * Handle asynchronous SError interrupt that occur in a guest kernel.
+ */
+int handle_guest_sei(unsigned long addr, unsigned int esr)
+{
+ int ret = -ENOENT;
+
+ if(IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
+ ret = ghes_notify_sei();
+ }
+
+ return ret;
+}
+
+/*
* bad_mode handles the impossible case in the exception vector. This is always
* fatal.
*/
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index fa1b18e..c89d83a 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -28,6 +28,7 @@
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_psci.h>
+#include <asm/system_misc.h>

#define CREATE_TRACE_POINTS
#include "trace.h"
@@ -177,6 +178,23 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
return arm_exit_handlers[hsr_ec];
}

+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+
+ if (handle_guest_sei((unsigned long)fault_ipa,
+ kvm_vcpu_get_hsr(vcpu))) {
+ kvm_err("Failed to handle guest SEI, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
+ kvm_vcpu_trap_get_class(vcpu),
+ (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
+ (unsigned long)kvm_vcpu_get_hsr(vcpu));
+
+ kvm_inject_vabt(vcpu);
+ }
+
+ return 0;
+}
+
/*
* Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
* proper exit to userspace.
@@ -200,7 +218,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
*vcpu_pc(vcpu) -= adj;
}

- kvm_inject_vabt(vcpu);
+ kvm_handle_guest_sei(vcpu, run);
return 1;
}

@@ -210,7 +228,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
case ARM_EXCEPTION_IRQ:
return 1;
case ARM_EXCEPTION_EL1_SERROR:
- kvm_inject_vabt(vcpu);
+ kvm_handle_guest_sei(vcpu, run);
return 1;
case ARM_EXCEPTION_TRAP:
/*
--
1.8.3.1

2017-03-30 10:44:25

by Xie XiuQi

[permalink] [raw]
Subject: [PATCH v3 0/8] arm64: acpi: apei: handle SEI notification type for ARMv8

ARM APEI extension proposal added SEI (asynchronous SError interrupt)
notification type for ARMv8.

Add a new GHES error source handling function for SEI. In firmware
first mode, if an error source's notification type is SEI. Then GHES
could parse and report the detail error information.

In firmware first mode, we could assume:
1) The ghes for sei is a global table, firmware should report one sei at a time.
2) SEI is masked while exception is processing
3) SEI may interrupt while irq handler is running
4) SEA may interrupt while SEI handler is running

So, the memory area used to transfer hardware error information from
BIOS to Linux can be determined only in NMI, SEI(arm64), IRQ or timer
handler. We add a virtual page for SEI context.

Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
is used to synchronize Unrecoverable errors. That is, containable errors
architecturally consumed by the PE and not silently propagated.

With ESB it is generally possible to isolate an unrecoverable error
between two ESB instructions. So, it's possible to recovery from
unrecoverable errors reported by asynchronous SError interrupt.

Since SEI is asynchronous, a SEI generated on user process may propagate
to another user process via shared memory, which may cause unrecoverable.
We have a idea which only recover sei from user process which has no
writable shared memory.

The idea is still in the discussion, any comments is welcome.

This series is based on Tyler's series "[PATCH V13 00/10] Add UEFI 2.6 and ACPI 6.1 updates
for RAS on ARM64" and the latest mainline.

v3: add ARM processor error information trace event
add a per-cpu variable to indecate sei is processing
remove nmi_{enter/exit} protection for ghes_notify_sei()
reserve a virtual page for processing ghes table in SEI context
don't always inject vabt for sei
serprate do_sei from bad_mode
add esb which make recovery from sei possible
add an idea to recovery from sei for shared memory process

v2: add kvm guest SEI notification support
add nmi_{entry, exit} to protect ghes_notify_sei

https://lkml.org/lkml/2017/3/7/962
https://lkml.org/lkml/2017/3/3/189

Wang Xiongfeng (1):
arm64: exception: check shared writable page in SEI handler

Xie XiuQi (7):
trace: ras: add ARM processor error information trace event
acpi: apei: handle SEI notification type for ARMv8
arm64: apei: add a per-cpu variable to indecate sei is processing
APEI: GHES: reserve a virtual page for SEI context
arm64: KVM: add guest SEI support
arm64: RAS: add ras extension runtime detection
arm64: exception: handle asynchronous SError interrupt

arch/arm64/Kconfig | 16 +++
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/esr.h | 14 +++
arch/arm64/include/asm/sysreg.h | 2 +
arch/arm64/include/asm/system_misc.h | 1 +
arch/arm64/kernel/cpufeature.c | 11 ++
arch/arm64/kernel/entry.S | 70 ++++++++++-
arch/arm64/kernel/traps.c | 217 +++++++++++++++++++++++++++++++++++
arch/arm64/kvm/handle_exit.c | 22 +++-
drivers/acpi/apei/Kconfig | 14 +++
drivers/acpi/apei/ghes.c | 166 +++++++++++++++++++--------
include/acpi/ghes.h | 3 +
include/linux/cper.h | 5 +
include/ras/ras_event.h | 87 ++++++++++++++
14 files changed, 575 insertions(+), 56 deletions(-)

--
1.8.3.1

2017-03-30 16:02:30

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

On Thu, 30 Mar 2017 18:31:01 +0800
Xie XiuQi <[email protected]> wrote:

> Add a new trace event for ARM processor error information, so that
> the user will know what error occurred. With this information the
> user may take appropriate action.
>
> These trace events are consistent with the ARM processor error
> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>
> ---
> v2: add trace enabled condition as Steven's suggestion.
> fix a typo.
> ---
>
> Cc: Steven Rostedt <[email protected]>
> Cc: Tyler Baicar <[email protected]>
> Signed-off-by: Xie XiuQi <[email protected]>
> ---
> drivers/acpi/apei/ghes.c | 10 ++++++
> include/linux/cper.h | 5 +++
> include/ras/ras_event.h | 87 ++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 102 insertions(+)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 81eabc6..6be0333 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -518,9 +518,19 @@ static void ghes_do_proc(struct ghes *ghes,
> else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM) &&
> trace_arm_event_enabled()) {
> struct cper_sec_proc_arm *arm_err;
> + struct cper_arm_err_info *err_info;
> + int i;
>
> arm_err = acpi_hest_generic_data_payload(gdata);
> trace_arm_event(arm_err);
> +
> + if (trace_arm_proc_err_enabled()) {
> + err_info = (struct cper_arm_err_info *)(arm_err + 1);
> + for (i = 0; i < arm_err->err_info_num; i++) {
> + trace_arm_proc_err(err_info);
> + err_info += 1;
> + }
> + }
> } else if (trace_unknown_sec_event_enabled()) {
> void *unknown_err = acpi_hest_generic_data_payload(gdata);
> trace_unknown_sec_event(&sec_type,
> diff --git a/include/linux/cper.h b/include/linux/cper.h
> index 85450f3..0cae900 100644
> --- a/include/linux/cper.h
> +++ b/include/linux/cper.h
> @@ -270,6 +270,11 @@ enum {
> #define CPER_ARM_INFO_VALID_VIRT_ADDR 0x0008
> #define CPER_ARM_INFO_VALID_PHYSICAL_ADDR 0x0010
>
> +#define CPER_ARM_INFO_TYPE_CACHE 0
> +#define CPER_ARM_INFO_TYPE_TLB 1
> +#define CPER_ARM_INFO_TYPE_BUS 2
> +#define CPER_ARM_INFO_TYPE_UARCH 3
> +
> #define CPER_ARM_INFO_FLAGS_FIRST 0x0001
> #define CPER_ARM_INFO_FLAGS_LAST 0x0002
> #define CPER_ARM_INFO_FLAGS_PROPAGATED 0x0004
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index 13befad..026b094 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -206,6 +206,93 @@
> __entry->running_state, __entry->psci_state)
> );
>
> +#define ARM_PROC_ERR_TYPE \
> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
> +
> +#define ARM_PROC_ERR_FLAGS \
> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
> +

Can flags have more than one bit set?

> +/*
> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
> + * via TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> +
> +ARM_PROC_ERR_TYPE
> +ARM_PROC_ERR_FLAGS
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) { a, b },
> +#define EMe(a, b) { a, b }
> +
> +TRACE_EVENT(arm_proc_err,
> +
> + TP_PROTO(const struct cper_arm_err_info *err),
> +
> + TP_ARGS(err),
> +
> + TP_STRUCT__entry(
> + __field(u8, type)
> + __field(u16, multiple_error)
> + __field(u8, flags)
> + __field(u64, error_info)
> + __field(u64, virt_fault_addr)
> + __field(u64, physical_fault_addr)
> + ),
> +
> + TP_fast_assign(
> + __entry->type = err->type;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
> + __entry->multiple_error = err->multiple_error;
> + else
> + __entry->multiple_error = ~0;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
> + __entry->flags = err->flags;
> + else
> + __entry->flags = ~0;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
> + __entry->error_info = err->error_info;
> + else
> + __entry->error_info = 0ULL;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
> + __entry->virt_fault_addr = err->virt_fault_addr;
> + else
> + __entry->virt_fault_addr = 0ULL;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
> + __entry->physical_fault_addr = err->physical_fault_addr;
> + else
> + __entry->physical_fault_addr = 0ULL;

else statements add a lot of branch conditions, and kills the branch
prediction. What about setting up default values, and then setting only
if the if statement is true.


memset(&__entry->error_info, 0,
sizeof(__entry) - offsetof(typeof(__entry), error_info)));
__entry->multiple_error = ~0;
__entry->flags = ~0;

if (...)

> + ),
> +
> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
> + " error info: %016llx; virtual address: %016llx;"
> + " physical address: %016llx",
> + __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
> + __entry->multiple_error,
> + __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),

Again, can flags have more than one bit set? If so, then this wont work.

-- Steve

> + __entry->error_info, __entry->virt_fault_addr,
> + __entry->physical_fault_addr)
> +);
> +
> /*
> * Unknown Section Report
> *

2017-03-31 16:20:54

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] acpi: apei: handle SEI notification type for ARMv8

Hi Xie XiuQi,

On 30/03/17 11:31, Xie XiuQi wrote:
> ARM APEI extension proposal added SEI (asynchronous SError interrupt)
> notification type for ARMv8.
>
> Add a new GHES error source handling function for SEI. In firmware
> first mode, if an error source's notification type is SEI. Then GHES
> could parse and report the detail error information.

The APEI additions are unsafe until patch 4 as SEA can interrupt SEI and
deadlock while trying to take the same set of locks. This patch needs to be
after that interaction is fixed/prevented, or we should prevent it by adding a
depends-on-not to the Kconfig to prevent SEI and SEA being registered at the
same time. (as a short term fix).

(more comments on this on that later patch)


> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index e52be6a..53710a2 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c

> @@ -625,6 +627,14 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)

bad_mode() is called in other scenarios too, for example executing an undefined
instruction at EL1. You split the SError path out of the vectors in patch 7, I
think we should do that here.


> handler[reason], smp_processor_id(), esr,
> esr_get_class_string(esr));
>
> + /*
> + * In firmware first mode, we could assume firmware will only generate one
> + * of cper records at a time. There is no risk for one cpu to parse ghes table.
> + */

I don't follow this comment, is this saying SError can't interrupt SError? We
already get this guarantee as the CPU masks SError when it takes an exception.

Firmware can generate multiple CPER records for a single 'event'. The CPER
records are the 'Data' in ACPI:Table 18-343 Generic Error Data Entry, and there
are 'zero or more' of these with a 'Generic Error Status Block' header that
describes the overall event. (Table 18-342).

I don't think we need this comment.


> + if (IS_ENABLED(CONFIG_ACPI_APEI_SEI) && ESR_ELx_EC(esr) == ESR_ELx_EC_SERROR) {
> + ghes_notify_sei();
> + }

> die("Oops - bad mode", regs, 0);
> local_irq_disable();
> panic("bad mode");

Thanks,

James

2017-03-31 16:23:08

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] APEI: GHES: reserve a virtual page for SEI context

Hi Xie XiuQi,

On 30/03/17 11:31, Xie XiuQi wrote:
> On arm64 platform, SEI may interrupt code which had interrupts masked.
> But SEI could be masked, so it's not treated as NMI, however SEA is
> treated as NMI.
>
> So, the memory area used to transfer hardware error information from
> BIOS to Linux can be determined only in NMI, SEI(arm64), IRQ or timer
> handler.
>
> In this patch, we add a virtual page for SEI context.

I don't think this is the best way of solving the interaction problem. If we
ever need to add another notification method this gets even more complicated,
and the ioremap code has to know how these methods can interact.


> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 045d101..b1f9b1f 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -155,54 +162,55 @@ static void ghes_ioremap_exit(void)

> -static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
> -{
> - unsigned long vaddr, paddr;
> - pgprot_t prot;
> -
> - vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
> + if (in_nmi()) {
> + raw_spin_lock(&ghes_ioremap_lock_nmi);
> + vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
> + } else if (this_cpu_read(sei_in_process)) {

> + spin_lock_irqsave(&ghes_ioremap_lock_sei, flags);

I think this one should be a raw_spin_lock. I'm pretty sure this is for RT-linux
where spin_lock() on a contented lock will call schedule() in the same way
mutex_lock() does. If interrupts were masked by arch code then you need to use
raw_spin_lock.
So now we need to know how we got in here, we interrupted the SError handler so
this should only be Synchronous External Abort. Having to know how we got here
is another problem with this approach.


As suggested earlier[0], I think the best way is to allocate one page of virtual
address space per struct ghes, and move the locks out to the notify calls, which
can know how they are called.

I've pushed a branch to:
http://www.linux-arm.org/git?p=linux-jm.git;a=commit;h=refs/heads/apei_ioremap_rework/v1

I intend to post those patches as an RFC series later in the cycle, I've only
build tested it so far.


Thanks,

James

> + vaddr = (unsigned long)GHES_IOREMAP_SEI_PAGE(ghes_ioremap_area->addr);
> + } else {
> + spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
> + vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
> + }


[0] https://lkml.org/lkml/2017/3/20/434

2017-04-06 09:07:47

by Xie XiuQi

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

Hi Steve,

Sorry for reply late.

On 2017/3/31 0:02, Steven Rostedt wrote:
> On Thu, 30 Mar 2017 18:31:01 +0800
> Xie XiuQi <[email protected]> wrote:
>
>> Add a new trace event for ARM processor error information, so that
>> the user will know what error occurred. With this information the
>> user may take appropriate action.
>>
>> These trace events are consistent with the ARM processor error
>> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>>
>> ---
>> v2: add trace enabled condition as Steven's suggestion.
>> fix a typo.
>> ---
>>
>> Cc: Steven Rostedt <[email protected]>
>> Cc: Tyler Baicar <[email protected]>
>> Signed-off-by: Xie XiuQi <[email protected]>
>> ---
>> drivers/acpi/apei/ghes.c | 10 ++++++
>> include/linux/cper.h | 5 +++
>> include/ras/ras_event.h | 87 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 102 insertions(+)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 81eabc6..6be0333 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -518,9 +518,19 @@ static void ghes_do_proc(struct ghes *ghes,
>> else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM) &&
>> trace_arm_event_enabled()) {
>> struct cper_sec_proc_arm *arm_err;
>> + struct cper_arm_err_info *err_info;
>> + int i;
>>
>> arm_err = acpi_hest_generic_data_payload(gdata);
>> trace_arm_event(arm_err);
>> +
>> + if (trace_arm_proc_err_enabled()) {
>> + err_info = (struct cper_arm_err_info *)(arm_err + 1);
>> + for (i = 0; i < arm_err->err_info_num; i++) {
>> + trace_arm_proc_err(err_info);
>> + err_info += 1;
>> + }
>> + }
>> } else if (trace_unknown_sec_event_enabled()) {
>> void *unknown_err = acpi_hest_generic_data_payload(gdata);
>> trace_unknown_sec_event(&sec_type,
>> diff --git a/include/linux/cper.h b/include/linux/cper.h
>> index 85450f3..0cae900 100644
>> --- a/include/linux/cper.h
>> +++ b/include/linux/cper.h
>> @@ -270,6 +270,11 @@ enum {
>> #define CPER_ARM_INFO_VALID_VIRT_ADDR 0x0008
>> #define CPER_ARM_INFO_VALID_PHYSICAL_ADDR 0x0010
>>
>> +#define CPER_ARM_INFO_TYPE_CACHE 0
>> +#define CPER_ARM_INFO_TYPE_TLB 1
>> +#define CPER_ARM_INFO_TYPE_BUS 2
>> +#define CPER_ARM_INFO_TYPE_UARCH 3
>> +
>> #define CPER_ARM_INFO_FLAGS_FIRST 0x0001
>> #define CPER_ARM_INFO_FLAGS_LAST 0x0002
>> #define CPER_ARM_INFO_FLAGS_PROPAGATED 0x0004
>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>> index 13befad..026b094 100644
>> --- a/include/ras/ras_event.h
>> +++ b/include/ras/ras_event.h
>> @@ -206,6 +206,93 @@
>> __entry->running_state, __entry->psci_state)
>> );
>>
>> +#define ARM_PROC_ERR_TYPE \
>> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
>> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
>> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
>> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
>> +
>> +#define ARM_PROC_ERR_FLAGS \
>> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
>> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
>> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
>> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
>> +
>
> Can flags have more than one bit set?

Yes, indeed. ARM_PROC_ERR_FLAGS might have more than on bit set.
Could we use __print_flags instead? like:

#define show_proc_err_flags(flags) __print_flags(flags, "|", \
{ CPER_ARM_INFO_FLAGS_FIRST, "First error captured" }, \
{ CPER_ARM_INFO_FLAGS_LAST, "Last error captured" }, \
{ CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" }, \
{ CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" }}

>
>> +/*
>> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
>> + * via TRACE_DEFINE_ENUM().
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>> +
>> +ARM_PROC_ERR_TYPE
>> +ARM_PROC_ERR_FLAGS
>> +
>> +/*
>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>> + * that will be printed in the output.
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) { a, b },
>> +#define EMe(a, b) { a, b }
>> +
>> +TRACE_EVENT(arm_proc_err,
>> +
>> + TP_PROTO(const struct cper_arm_err_info *err),
>> +
>> + TP_ARGS(err),
>> +
>> + TP_STRUCT__entry(
>> + __field(u8, type)
>> + __field(u16, multiple_error)
>> + __field(u8, flags)
>> + __field(u64, error_info)
>> + __field(u64, virt_fault_addr)
>> + __field(u64, physical_fault_addr)
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->type = err->type;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>> + __entry->multiple_error = err->multiple_error;
>> + else
>> + __entry->multiple_error = ~0;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>> + __entry->flags = err->flags;
>> + else
>> + __entry->flags = ~0;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>> + __entry->error_info = err->error_info;
>> + else
>> + __entry->error_info = 0ULL;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>> + __entry->virt_fault_addr = err->virt_fault_addr;
>> + else
>> + __entry->virt_fault_addr = 0ULL;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>> + __entry->physical_fault_addr = err->physical_fault_addr;
>> + else
>> + __entry->physical_fault_addr = 0ULL;
>
> else statements add a lot of branch conditions, and kills the branch
> prediction. What about setting up default values, and then setting only
> if the if statement is true.
>
>
> memset(&__entry->error_info, 0,
> sizeof(__entry) - offsetof(typeof(__entry), error_info)));
> __entry->multiple_error = ~0;
> __entry->flags = ~0;
>
> if (...)

I'll use this style next version, thanks.

>
>> + ),
>> +
>> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
>> + " error info: %016llx; virtual address: %016llx;"
>> + " physical address: %016llx",
>> + __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
>> + __entry->multiple_error,
>> + __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),
>
> Again, can flags have more than one bit set? If so, then this wont work.
>
> -- Steve
>
>> + __entry->error_info, __entry->virt_fault_addr,
>> + __entry->physical_fault_addr)
>> +);
>> +
>> /*
>> * Unknown Section Report
>> *
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> .
>

--
Thanks,
Xie XiuQi

2017-04-06 09:16:28

by Xie XiuQi

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] acpi: apei: handle SEI notification type for ARMv8

Hi James,

Sorry for reply late, and thanks for your comments.

On 2017/4/1 0:20, James Morse wrote:
> Hi Xie XiuQi,
>
> On 30/03/17 11:31, Xie XiuQi wrote:
>> ARM APEI extension proposal added SEI (asynchronous SError interrupt)
>> notification type for ARMv8.
>>
>> Add a new GHES error source handling function for SEI. In firmware
>> first mode, if an error source's notification type is SEI. Then GHES
>> could parse and report the detail error information.
>
> The APEI additions are unsafe until patch 4 as SEA can interrupt SEI and
> deadlock while trying to take the same set of locks. This patch needs to be
> after that interaction is fixed/prevented, or we should prevent it by adding a
> depends-on-not to the Kconfig to prevent SEI and SEA being registered at the
> same time. (as a short term fix).

Will fix later.

>
> (more comments on this on that later patch)
>
>
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index e52be6a..53710a2 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>
>> @@ -625,6 +627,14 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
>
> bad_mode() is called in other scenarios too, for example executing an undefined
> instruction at EL1. You split the SError path out of the vectors in patch 7, I
> think we should do that here.
>
>
>> handler[reason], smp_processor_id(), esr,
>> esr_get_class_string(esr));
>>
>> + /*
>> + * In firmware first mode, we could assume firmware will only generate one
>> + * of cper records at a time. There is no risk for one cpu to parse ghes table.
>> + */
>
> I don't follow this comment, is this saying SError can't interrupt SError? We
> already get this guarantee as the CPU masks SError when it takes an exception.
>
> Firmware can generate multiple CPER records for a single 'event'. The CPER
> records are the 'Data' in ACPI:Table 18-343 Generic Error Data Entry, and there
> are 'zero or more' of these with a 'Generic Error Status Block' header that
> describes the overall event. (Table 18-342).
>
> I don't think we need this comment.

Thanks for your explanation, OK, I'll remove this comment.

>
>
>> + if (IS_ENABLED(CONFIG_ACPI_APEI_SEI) && ESR_ELx_EC(esr) == ESR_ELx_EC_SERROR) {
>> + ghes_notify_sei();
>> + }
>
>> die("Oops - bad mode", regs, 0);
>> local_irq_disable();
>> panic("bad mode");
>
> Thanks,
>
> James
>
>
> .
>

--
Thanks,
Xie XiuQi

2017-04-06 09:32:13

by Xie XiuQi

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] APEI: GHES: reserve a virtual page for SEI context

Hi James,

Thanks for your comments.

On 2017/4/1 0:22, James Morse wrote:
> Hi Xie XiuQi,
>
> On 30/03/17 11:31, Xie XiuQi wrote:
>> On arm64 platform, SEI may interrupt code which had interrupts masked.
>> But SEI could be masked, so it's not treated as NMI, however SEA is
>> treated as NMI.
>>
>> So, the memory area used to transfer hardware error information from
>> BIOS to Linux can be determined only in NMI, SEI(arm64), IRQ or timer
>> handler.
>>
>> In this patch, we add a virtual page for SEI context.
>
> I don't think this is the best way of solving the interaction problem. If we
> ever need to add another notification method this gets even more complicated,
> and the ioremap code has to know how these methods can interact.
>
>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 045d101..b1f9b1f 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -155,54 +162,55 @@ static void ghes_ioremap_exit(void)
>
>> -static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
>> -{
>> - unsigned long vaddr, paddr;
>> - pgprot_t prot;
>> -
>> - vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
>> + if (in_nmi()) {
>> + raw_spin_lock(&ghes_ioremap_lock_nmi);
>> + vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
>> + } else if (this_cpu_read(sei_in_process)) {
>
>> + spin_lock_irqsave(&ghes_ioremap_lock_sei, flags);
>
> I think this one should be a raw_spin_lock. I'm pretty sure this is for RT-linux
> where spin_lock() on a contented lock will call schedule() in the same way
> mutex_lock() does. If interrupts were masked by arch code then you need to use
> raw_spin_lock.
> So now we need to know how we got in here, we interrupted the SError handler so
> this should only be Synchronous External Abort. Having to know how we got here
> is another problem with this approach.
>
>
> As suggested earlier[0], I think the best way is to allocate one page of virtual
> address space per struct ghes, and move the locks out to the notify calls, which
> can know how they are called.
>
> I've pushed a branch to:
> http://www.linux-arm.org/git?p=linux-jm.git;a=commit;h=refs/heads/apei_ioremap_rework/v1
>

Good! I could rebase on your patch next time.

> I intend to post those patches as an RFC series later in the cycle, I've only
> build tested it so far.
>
>
> Thanks,
>
> James
>
>> + vaddr = (unsigned long)GHES_IOREMAP_SEI_PAGE(ghes_ioremap_area->addr);
>> + } else {
>> + spin_lock_irqsave(&ghes_ioremap_lock_irq, flags);
>> + vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
>> + }
>
>
> [0] https://lkml.org/lkml/2017/3/20/434
>
>
> .
>

--
Thanks,
Xie XiuQi

2017-04-07 15:57:20

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler

Hi Xie XiuQi,

On 30/03/17 11:31, Xie XiuQi wrote:
> From: Wang Xiongfeng <[email protected]>
>
> Since SEI is asynchronous, the error data has been consumed. So we must
> suppose that all the memory data current process can write are
> contaminated. If the process doesn't have shared writable pages, the
> process will be killed, and the system will continue running normally.
> Otherwise, the system must be terminated, because the error has been
> propagated to other processes running on other cores, and recursively
> the error may be propagated to several another processes.

This is pretty complicated. We can't guarantee that another CPU hasn't modified
the page tables while we do this, (so its racy). We can't guarantee that the
corrupt data hasn't been sent over the network or written to disk in the mean
time (so its not enough).

The scenario you have is a write of corrupt data to memory where another CPU
reading it doesn't know the value is corrupt.

The hardware gives us quite a lot of help containing errors. The RAS
specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
Architectural error propagation', and then classifies it in '2.1.3
Architecturally infected, containable and uncontainable' as uncontained because
the value is no longer in the general-purpose registers. For uncontained errors
we should panic().

We shouldn't need to try to track errors after we get a notification as the
hardware has done this for us.


Firmware-first does complicate this if events like this are not delivered using
a synchronous external abort, as Linux may have PSTATE.A masked preventing
SError Interrupts from being taken. It looks like PSTATE.A is masked much more
often than is necessary. I will look into cleaning this up.


Thanks,

James

2017-04-12 08:38:54

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler

Hi James,


On 2017/4/7 23:56, James Morse wrote:
> Hi Xie XiuQi,
>
> On 30/03/17 11:31, Xie XiuQi wrote:
>> From: Wang Xiongfeng <[email protected]>
>>
>> Since SEI is asynchronous, the error data has been consumed. So we must
>> suppose that all the memory data current process can write are
>> contaminated. If the process doesn't have shared writable pages, the
>> process will be killed, and the system will continue running normally.
>> Otherwise, the system must be terminated, because the error has been
>> propagated to other processes running on other cores, and recursively
>> the error may be propagated to several another processes.
>
> This is pretty complicated. We can't guarantee that another CPU hasn't modified
> the page tables while we do this, (so its racy). We can't guarantee that the
> corrupt data hasn't been sent over the network or written to disk in the mean
> time (so its not enough).
>
> The scenario you have is a write of corrupt data to memory where another CPU
> reading it doesn't know the value is corrupt.
>
> The hardware gives us quite a lot of help containing errors. The RAS
> specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
> Architectural error propagation', and then classifies it in '2.1.3
> Architecturally infected, containable and uncontainable' as uncontained because
> the value is no longer in the general-purpose registers. For uncontained errors
> we should panic().
>
> We shouldn't need to try to track errors after we get a notification as the
> hardware has done this for us.
>
Thanks for your comments. I think what you said is reasonable. We will remove this
patch and use AET fields of ESR_ELx to determine whether we should kill current
process or just panic.
>
> Firmware-first does complicate this if events like this are not delivered using
> a synchronous external abort, as Linux may have PSTATE.A masked preventing
> SError Interrupts from being taken. It looks like PSTATE.A is masked much more
> often than is necessary. I will look into cleaning this up.
>
>
> Thanks,
>
> James
>
> .
>
Thanks,
Wang Xiongfeng

2017-04-13 08:48:21

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi Xiuqi,

On 2017/3/30 18:31, Xie XiuQi wrote:
> Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
> is used to synchronize Unrecoverable errors. That is, containable errors
> architecturally consumed by the PE and not silently propagated.
>
> With ESB it is generally possible to isolate an unrecoverable error
> between two ESB instructions. So, it's possible to recovery from

> /* ISS field definitions for exceptions taken in to Hyp */
> #define ESR_ELx_CV (UL(1) << 24)
> #define ESR_ELx_COND_SHIFT (20)
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 43512d4..d8a7306 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -69,7 +69,14 @@
> #define BAD_FIQ 2
> #define BAD_ERROR 3
>
> + .arch_extension ras
> +
> .macro kernel_entry, el, regsize = 64
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb
> + .endif
> +#endif
> sub sp, sp, #S_FRAME_SIZE
> .if \regsize == 32
> mov w0, w0 // zero upper 32 bits of x0
> @@ -208,6 +215,7 @@ alternative_else_nop_endif
> #endif
>
> .if \el == 0
> + msr daifset, #0xF // Set flags
> ldr x23, [sp, #S_SP] // load return stack pointer
> msr sp_el0, x23
> #ifdef CONFIG_ARM64_ERRATUM_845719
> @@ -226,6 +234,15 @@ alternative_else_nop_endif
>
> msr elr_el1, x21 // set up the return data
> msr spsr_el1, x22
> +
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb // Error Synchronization Barrier
> + mrs x21, disr_el1 // Check for deferred error
> + tbnz x21, #31, el1_sei

We may need to clear disr_el1.A after reading it because the hardware won't clear it.

> + .endif
> +#endif
> +
> ldp x0, x1, [sp, #16 * 0]
> ldp x2, x3, [sp, #16 * 1]
> ldp x4, x5, [sp, #16 * 2]
> @@ -318,7 +335,7 @@ ENTRY(vectors)
> ventry el1_sync_invalid // Synchronous EL1t
> ventry el1_irq_invalid // IRQ EL1t
> ventry el1_fiq_invalid // FIQ EL1t
> - ventry el1_error_invalid // Error EL1t
> + ventry el1_error // Error EL1t
>
> ventry el1_sync // Synchronous EL1h
> ventry el1_irq // IRQ EL1h
> @@ -328,7 +345,7 @@ ENTRY(vectors)
> ventry el0_sync // Synchronous 64-bit EL0
> ventry el0_irq // IRQ 64-bit EL0
> ventry el0_fiq_invalid // FIQ 64-bit EL0
> - ventry el0_error_invalid // Error 64-bit EL0
> + ventry el0_error // Error 64-bit EL0
>
> #ifdef CONFIG_COMPAT
> ventry el0_sync_compat // Synchronous 32-bit EL0
> @@ -508,12 +525,31 @@ el1_preempt:
> ret x24
> #endif
>
> + .align 6
> +el1_error:
> + kernel_entry 1
> +el1_sei:
> + /*
> + * asynchronous SError interrupt from kernel
> + */
> + mov x0, sp
> + mrs x1, esr_el1
> + mov x2, #1 // exception level of SEI generated
> + b do_sei
> +ENDPROC(el1_error)
> +
> +
> /*
> * EL0 mode handlers.
> */
> .align 6
> el0_sync:
> kernel_entry 0
> +#ifdef CONFIG_ARM64_ESB
> + mrs x26, disr_el1
> + tbnz x26, #31, el0_sei // check DISR.A
> + msr daifclr, #0x4 // unmask SEI
> +#endif
> mrs x25, esr_el1 // read the syndrome register
> lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
> cmp x24, #ESR_ELx_EC_SVC64 // SVC in 64-bit state
> @@ -688,8 +724,38 @@ el0_inv:
> ENDPROC(el0_sync)
>
> .align 6
> +el0_error:
> + kernel_entry 0
> +el0_sei:
> + /*
> + * asynchronous SError interrupt from userspace
> + */
> + ct_user_exit
> + mov x0, sp
> + mrs x1, esr_el1
> + mov x2, #0
> + bl do_sei
> + b ret_to_user
> +ENDPROC(el0_error)
> +
> + .align 6
> el0_irq:
> kernel_entry 0
> +#ifdef CONFIG_ARM64_ESB
> + mrs x26, disr_el1
> + tbz x26, #31, el0_irq_naked // check DISR.A
> +
> + mov x0, sp
> + mrs x1, esr_el1
> + mov x2, 0
> +
> + /*
> + * The SEI generated at EL0 is not affect this irq context,
> + * so after sei handler, we continue process this irq.
> + */
> + bl do_sei
> + msr daifclr, #0x4 // unmask SEI
> +#endif
> el0_irq_naked:
> enable_dbg
> #ifdef CONFIG_TRACE_IRQFLAGS

Thanks,
Wang Xiongfeng

2017-04-13 10:52:01

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi,

On Thu, Mar 30, 2017 at 06:31:07PM +0800, Xie XiuQi wrote:
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index f20c64a..22f9c90 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -106,6 +106,20 @@
> #define ESR_ELx_AR (UL(1) << 14)
> #define ESR_ELx_CM (UL(1) << 8)
>
> +#define ESR_Elx_DFSC_SEI (0x11)

We should probably have a definition for the uncategorized DFSC value,
too.

[...]

> index 43512d4..d8a7306 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -69,7 +69,14 @@
> #define BAD_FIQ 2
> #define BAD_ERROR 3
>
> + .arch_extension ras

Generally, arch_extension is a warning sign that code isn't going to
work with contemporary assemblers, which we likely need to support.

> +
> .macro kernel_entry, el, regsize = 64
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb

Here, I think that we'll need to macro this such that we can build with
existing toolchains.

e.g. in <asm/assembler.h> we need something like:

#define HINT_IMM_ESB 16

.macro ESB
hint #HINT_IMM_ESB
.endm

> + .endif
> +#endif
> sub sp, sp, #S_FRAME_SIZE
> .if \regsize == 32
> mov w0, w0 // zero upper 32 bits of x0
> @@ -208,6 +215,7 @@ alternative_else_nop_endif
> #endif
>
> .if \el == 0
> + msr daifset, #0xF // Set flags

Elsewhere in head.S we use helpers to fiddle with DAIF bits.

Please be consistent with that. Add an enable_all macro if we need one.

> ldr x23, [sp, #S_SP] // load return stack pointer
> msr sp_el0, x23
> #ifdef CONFIG_ARM64_ERRATUM_845719
> @@ -226,6 +234,15 @@ alternative_else_nop_endif
>
> msr elr_el1, x21 // set up the return data
> msr spsr_el1, x22
> +
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb // Error Synchronization Barrier
> + mrs x21, disr_el1 // Check for deferred error

We'll need an <asm/sysreg.h> definition for this register. With that, we
can use mrs_s here.

> + tbnz x21, #31, el1_sei
> + .endif
> +#endif
> +
> ldp x0, x1, [sp, #16 * 0]
> ldp x2, x3, [sp, #16 * 1]
> ldp x4, x5, [sp, #16 * 2]
> @@ -318,7 +335,7 @@ ENTRY(vectors)
> ventry el1_sync_invalid // Synchronous EL1t
> ventry el1_irq_invalid // IRQ EL1t
> ventry el1_fiq_invalid // FIQ EL1t
> - ventry el1_error_invalid // Error EL1t
> + ventry el1_error // Error EL1t
>
> ventry el1_sync // Synchronous EL1h
> ventry el1_irq // IRQ EL1h
> @@ -328,7 +345,7 @@ ENTRY(vectors)
> ventry el0_sync // Synchronous 64-bit EL0
> ventry el0_irq // IRQ 64-bit EL0
> ventry el0_fiq_invalid // FIQ 64-bit EL0
> - ventry el0_error_invalid // Error 64-bit EL0
> + ventry el0_error // Error 64-bit EL0
>
> #ifdef CONFIG_COMPAT
> ventry el0_sync_compat // Synchronous 32-bit EL0
> @@ -508,12 +525,31 @@ el1_preempt:
> ret x24
> #endif
>
> + .align 6
> +el1_error:
> + kernel_entry 1
> +el1_sei:
> + /*
> + * asynchronous SError interrupt from kernel
> + */
> + mov x0, sp
> + mrs x1, esr_el1

I don't think this is correct if we branched here from kernel_exit.
Surely we want the DISR_EL1 value, and ESR_EL1 is unrelated?

> + mov x2, #1 // exception level of SEI generated
> + b do_sei

You don't need to figure out the EL here. In do_sei() we can determine
the exception level from the regs (e.g. using user_mode(regs)).

> +ENDPROC(el1_error)
> +
> +
> /*
> * EL0 mode handlers.
> */
> .align 6
> el0_sync:
> kernel_entry 0
> +#ifdef CONFIG_ARM64_ESB
> + mrs x26, disr_el1
> + tbnz x26, #31, el0_sei // check DISR.A
> + msr daifclr, #0x4 // unmask SEI
> +#endif

Why do we duplicate this across the EL0 handlers, rather than making it
common in the el0 kernel_entry code?

> mrs x25, esr_el1 // read the syndrome register
> lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
> cmp x24, #ESR_ELx_EC_SVC64 // SVC in 64-bit state
> @@ -688,8 +724,38 @@ el0_inv:
> ENDPROC(el0_sync)
>
> .align 6
> +el0_error:
> + kernel_entry 0
> +el0_sei:
> + /*
> + * asynchronous SError interrupt from userspace
> + */
> + ct_user_exit
> + mov x0, sp
> + mrs x1, esr_el1

As with el1_sei, I don't think this is correct if we branched to
el0_sei. As far as I am aware, ESR_EL1 will contain whatever exception
we took, and the value we want is in DISR_EL1.

> + mov x2, #0

This EL parameter can go.

> + bl do_sei
> + b ret_to_user
> +ENDPROC(el0_error)
> +
> + .align 6
> el0_irq:
> kernel_entry 0
> +#ifdef CONFIG_ARM64_ESB
> + mrs x26, disr_el1
> + tbz x26, #31, el0_irq_naked // check DISR.A
> +
> + mov x0, sp
> + mrs x1, esr_el1
> + mov x2, 0
> +
> + /*
> + * The SEI generated at EL0 is not affect this irq context,
> + * so after sei handler, we continue process this irq.
> + */
> + bl do_sei
> + msr daifclr, #0x4 // unmask SEI

This rough pattern is duplicated several times across the EL0 entry
paths. I think it should be made common.

> +#endif
> el0_irq_naked:
> enable_dbg
> #ifdef CONFIG_TRACE_IRQFLAGS
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index b6d6727..99be6d8 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -643,6 +643,34 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
> handler[reason], smp_processor_id(), esr,
> esr_get_class_string(esr));
>
> + die("Oops - bad mode", regs, 0);
> + local_irq_disable();
> + panic("bad mode");
> +}
> +
> +static const char *sei_context[] = {
> + "userspace", /* EL0 */
> + "kernel", /* EL1 */
> +};

This should go. It's only used in one place, and would be clearer with
the strings inline. More on that below.

> +
> +static const char *sei_severity[] = {

Please name this for what it actually represents:

static const char *esr_aet_str[] = {

> + [0 ... ESR_ELx_AET_MAX] = "Unknown",

For consistency with esr_class_str, please make this:

[0 ... ESR_ELx_AET_MAX] = "UNRECOGNIZED AET",

... which makes it clear that this isn't some AET value which reports an
"Unknown" status.

> + [ESR_ELx_AET_UC] = "Uncontainable",
> + [ESR_ELx_AET_UEU] = "Unrecoverable",
> + [ESR_ELx_AET_UEO] = "Restartable",
> + [ESR_ELx_AET_UER] = "Recoverable",
> + [ESR_ELx_AET_CE] = "Corrected",
> +};
> +
> +DEFINE_PER_CPU(int, sei_in_process);

A previous patch added definition of this.

> +asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
> +{
> + int aet = ESR_ELx_AET(esr);

The AET field is only valid when the DFSC is 0b010001, so we need to
check that before we interpret AET.

> + console_verbose();
> +
> + pr_crit("Asynchronous SError interrupt detected on CPU%d, %s, %s\n",
> + smp_processor_id(), sei_context[el], sei_severity[aet]);

We should dump the full ESR_ELx value, regardless of what automated
decoding we do, so that we have a chance of debugging issues in the
field.

It would also be nice to align with how bad_mode reports this today.
Please make this:

pr_crit("SError detected on CPU%d while in %s mode: code: 0x%08x -- %s\n",
smp_processor_id(), user_mode(regs) ? "user" : "kernel",
esr, esr_aet_str[aet]);

... though it might be best to dump the raw SPSR rather than trying to
say user/kernel, so that we can distinguish EL1/EL2 with VHE, etc.

> +
> /*
> * In firmware first mode, we could assume firmware will only generate one
> * of cper records at a time. There is no risk for one cpu to parse ghes table.
> @@ -653,9 +681,31 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
> this_cpu_dec(sei_in_process);
> }
>
> - die("Oops - bad mode", regs, 0);
> + if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&

Please use user_mode(regs), and get rid of the el parameter to this
function entirely.

> + cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
> + siginfo_t info;
> + void __user *pc = (void __user *)instruction_pointer(regs);
> +
> + if (aet >= ESR_ELx_AET_UEO)
> + return;

We need to check the DFSC first, and 0b111 is a reserved value (which
the ARM ARM doesn't define the recoverability of), so I don't think this
is correct.

We should probably test the DSFC, then switch on the AET value, so as to
handle only the cases we are aware of.

> +
> + if (aet == ESR_ELx_AET_UEU) {
> + info.si_signo = SIGILL;
> + info.si_errno = 0;
> + info.si_code = ILL_ILLOPC;
> + info.si_addr = pc;

An unrecoverable error is not necessarily a particular bad instruction,
so I'm not sure this makes sense.

Thanks,
Mark.

2017-04-14 07:05:02

by Xie XiuQi

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi Mark,

Thanks for your comments.

On 2017/4/13 18:51, Mark Rutland wrote:
> Hi,
>
> On Thu, Mar 30, 2017 at 06:31:07PM +0800, Xie XiuQi wrote:
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index f20c64a..22f9c90 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -106,6 +106,20 @@
>> #define ESR_ELx_AR (UL(1) << 14)
>> #define ESR_ELx_CM (UL(1) << 8)
>>
>> +#define ESR_Elx_DFSC_SEI (0x11)
>
> We should probably have a definition for the uncategorized DFSC value,
> too.
>

Will do, thanks.

How about "#define ESR_Elx_DFSC_UNCATEGORIZED (0)" ?

> [...]
>
>> index 43512d4..d8a7306 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -69,7 +69,14 @@
>> #define BAD_FIQ 2
>> #define BAD_ERROR 3
>>
>> + .arch_extension ras
>
> Generally, arch_extension is a warning sign that code isn't going to
> work with contemporary assemblers, which we likely need to support.
>
>> +
>> .macro kernel_entry, el, regsize = 64
>> +#ifdef CONFIG_ARM64_ESB
>> + .if \el == 0
>> + esb
>
> Here, I think that we'll need to macro this such that we can build with
> existing toolchains.
>
> e.g. in <asm/assembler.h> we need something like:
>
> #define HINT_IMM_ESB 16
>
> .macro ESB
> hint #HINT_IMM_ESB
> .endm
>

Good, thanks for your suggestion. I'll use this macro in next versin.

>> + .endif
>> +#endif
>> sub sp, sp, #S_FRAME_SIZE
>> .if \regsize == 32
>> mov w0, w0 // zero upper 32 bits of x0
>> @@ -208,6 +215,7 @@ alternative_else_nop_endif
>> #endif
>>
>> .if \el == 0
>> + msr daifset, #0xF // Set flags
>
> Elsewhere in head.S we use helpers to fiddle with DAIF bits.
>
> Please be consistent with that. Add an enable_all macro if we need one.

OK, I'll do it refer to head.S.

>
>> ldr x23, [sp, #S_SP] // load return stack pointer
>> msr sp_el0, x23
>> #ifdef CONFIG_ARM64_ERRATUM_845719
>> @@ -226,6 +234,15 @@ alternative_else_nop_endif
>>
>> msr elr_el1, x21 // set up the return data
>> msr spsr_el1, x22
>> +
>> +#ifdef CONFIG_ARM64_ESB
>> + .if \el == 0
>> + esb // Error Synchronization Barrier
>> + mrs x21, disr_el1 // Check for deferred error
>
> We'll need an <asm/sysreg.h> definition for this register. With that, we
> can use mrs_s here.

OK, thanks.

>
>> + tbnz x21, #31, el1_sei
>> + .endif
>> +#endif
>> +
>> ldp x0, x1, [sp, #16 * 0]
>> ldp x2, x3, [sp, #16 * 1]
>> ldp x4, x5, [sp, #16 * 2]
>> @@ -318,7 +335,7 @@ ENTRY(vectors)
>> ventry el1_sync_invalid // Synchronous EL1t
>> ventry el1_irq_invalid // IRQ EL1t
>> ventry el1_fiq_invalid // FIQ EL1t
>> - ventry el1_error_invalid // Error EL1t
>> + ventry el1_error // Error EL1t
>>
>> ventry el1_sync // Synchronous EL1h
>> ventry el1_irq // IRQ EL1h
>> @@ -328,7 +345,7 @@ ENTRY(vectors)
>> ventry el0_sync // Synchronous 64-bit EL0
>> ventry el0_irq // IRQ 64-bit EL0
>> ventry el0_fiq_invalid // FIQ 64-bit EL0
>> - ventry el0_error_invalid // Error 64-bit EL0
>> + ventry el0_error // Error 64-bit EL0
>>
>> #ifdef CONFIG_COMPAT
>> ventry el0_sync_compat // Synchronous 32-bit EL0
>> @@ -508,12 +525,31 @@ el1_preempt:
>> ret x24
>> #endif
>>
>> + .align 6
>> +el1_error:
>> + kernel_entry 1
>> +el1_sei:
>> + /*
>> + * asynchronous SError interrupt from kernel
>> + */
>> + mov x0, sp
>> + mrs x1, esr_el1
>
> I don't think this is correct if we branched here from kernel_exit.
> Surely we want the DISR_EL1 value, and ESR_EL1 is unrelated?

Yes, indeed. I'll change it in next version.

>
>> + mov x2, #1 // exception level of SEI generated
>> + b do_sei
>
> You don't need to figure out the EL here. In do_sei() we can determine
> the exception level from the regs (e.g. using user_mode(regs)).

Yes, you're right. I'll fix it.

>
>> +ENDPROC(el1_error)
>> +
>> +
>> /*
>> * EL0 mode handlers.
>> */
>> .align 6
>> el0_sync:
>> kernel_entry 0
>> +#ifdef CONFIG_ARM64_ESB
>> + mrs x26, disr_el1
>> + tbnz x26, #31, el0_sei // check DISR.A
>> + msr daifclr, #0x4 // unmask SEI
>> +#endif
>
> Why do we duplicate this across the EL0 handlers, rather than making it
> common in the el0 kernel_entry code?

It's different between el0_sync and el0_irq. If sei come from el0_sync,
the context is the same process, so we could just return to user space
after do_sei, instead of continue the syscall. The process may be killed
very likely, it's not necessary to continue the system call.

However, if sei come from el0_irq, in the irq context which is not related
the current process, we should continue the irq handler after do_sei.

So I think we should handle these two situation differently. If I'm wrong,
please correct me, Thanks.

>
>> mrs x25, esr_el1 // read the syndrome register
>> lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
>> cmp x24, #ESR_ELx_EC_SVC64 // SVC in 64-bit state
>> @@ -688,8 +724,38 @@ el0_inv:
>> ENDPROC(el0_sync)
>>
>> .align 6
>> +el0_error:
>> + kernel_entry 0
>> +el0_sei:
>> + /*
>> + * asynchronous SError interrupt from userspace
>> + */
>> + ct_user_exit
>> + mov x0, sp
>> + mrs x1, esr_el1
>
> As with el1_sei, I don't think this is correct if we branched to
> el0_sei. As far as I am aware, ESR_EL1 will contain whatever exception
> we took, and the value we want is in DISR_EL1.

Will fix, thanks.

>
>> + mov x2, #0
>
> This EL parameter can go.
>
>> + bl do_sei
>> + b ret_to_user
>> +ENDPROC(el0_error)
>> +
>> + .align 6
>> el0_irq:
>> kernel_entry 0
>> +#ifdef CONFIG_ARM64_ESB
>> + mrs x26, disr_el1
>> + tbz x26, #31, el0_irq_naked // check DISR.A
>> +
>> + mov x0, sp
>> + mrs x1, esr_el1
>> + mov x2, 0
>> +
>> + /*
>> + * The SEI generated at EL0 is not affect this irq context,
>> + * so after sei handler, we continue process this irq.
>> + */
>> + bl do_sei
>> + msr daifclr, #0x4 // unmask SEI
>
> This rough pattern is duplicated several times across the EL0 entry
> paths. I think it should be made common.

I agree, I'll do it in next version, thanks.

>
>> +#endif
>> el0_irq_naked:
>> enable_dbg
>> #ifdef CONFIG_TRACE_IRQFLAGS
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index b6d6727..99be6d8 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -643,6 +643,34 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
>> handler[reason], smp_processor_id(), esr,
>> esr_get_class_string(esr));
>>
>> + die("Oops - bad mode", regs, 0);
>> + local_irq_disable();
>> + panic("bad mode");
>> +}
>> +
>> +static const char *sei_context[] = {
>> + "userspace", /* EL0 */
>> + "kernel", /* EL1 */
>> +};
>
> This should go. It's only used in one place, and would be clearer with
> the strings inline. More on that below.
>

OK, thanks.

>> +
>> +static const char *sei_severity[] = {
>
> Please name this for what it actually represents:
>
> static const char *esr_aet_str[] = {
>
>> + [0 ... ESR_ELx_AET_MAX] = "Unknown",
>
> For consistency with esr_class_str, please make this:
>
> [0 ... ESR_ELx_AET_MAX] = "UNRECOGNIZED AET",
>
> ... which makes it clear that this isn't some AET value which reports an
> "Unknown" status.
>

OK, thanks.

>> + [ESR_ELx_AET_UC] = "Uncontainable",
>> + [ESR_ELx_AET_UEU] = "Unrecoverable",
>> + [ESR_ELx_AET_UEO] = "Restartable",
>> + [ESR_ELx_AET_UER] = "Recoverable",
>> + [ESR_ELx_AET_CE] = "Corrected",
>> +};
>> +
>> +DEFINE_PER_CPU(int, sei_in_process);
>
> A previous patch added definition of this.
>

I'll remote it, thanks.

>> +asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
>> +{
>> + int aet = ESR_ELx_AET(esr);
>
> The AET field is only valid when the DFSC is 0b010001, so we need to
> check that before we interpret AET.
>

Will fix.

>> + console_verbose();
>> +
>> + pr_crit("Asynchronous SError interrupt detected on CPU%d, %s, %s\n",
>> + smp_processor_id(), sei_context[el], sei_severity[aet]);
>
> We should dump the full ESR_ELx value, regardless of what automated
> decoding we do, so that we have a chance of debugging issues in the
> field.
>
> It would also be nice to align with how bad_mode reports this today.
> Please make this:
>
> pr_crit("SError detected on CPU%d while in %s mode: code: 0x%08x -- %s\n",
> smp_processor_id(), user_mode(regs) ? "user" : "kernel",
> esr, esr_aet_str[aet]);
>
> ... though it might be best to dump the raw SPSR rather than trying to
> say user/kernel, so that we can distinguish EL1/EL2 with VHE, etc.
>

OK, I'll modify in next version.

>> +
>> /*
>> * In firmware first mode, we could assume firmware will only generate one
>> * of cper records at a time. There is no risk for one cpu to parse ghes table.
>> @@ -653,9 +681,31 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
>> this_cpu_dec(sei_in_process);
>> }
>>
>> - die("Oops - bad mode", regs, 0);
>> + if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&
>
> Please use user_mode(regs), and get rid of the el parameter to this
> function entirely.
>

Will fix.

>> + cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
>> + siginfo_t info;
>> + void __user *pc = (void __user *)instruction_pointer(regs);
>> +
>> + if (aet >= ESR_ELx_AET_UEO)
>> + return;
>
> We need to check the DFSC first, and 0b111 is a reserved value (which
> the ARM ARM doesn't define the recoverability of), so I don't think this
> is correct.
>
> We should probably test the DSFC, then switch on the AET value, so as to
> handle only the cases we are aware of.

Will fix.

>
>> +
>> + if (aet == ESR_ELx_AET_UEU) {
>> + info.si_signo = SIGILL;
>> + info.si_errno = 0;
>> + info.si_code = ILL_ILLOPC;
>> + info.si_addr = pc;
>
> An unrecoverable error is not necessarily a particular bad instruction,
> so I'm not sure this makes sense.
>

Generally, a SEI is generated when PE consumes an uncorrectable error.
So, may be we could send a SIGBUS instead.

I'm not sure too. Any other suggestion?

> Thanks,
> Mark.
>
> .
>

--
Thanks,
Xie XiuQi

2017-04-14 20:36:14

by Tyler Baicar

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

On 3/30/2017 4:31 AM, Xie XiuQi wrote:
> Add a new trace event for ARM processor error information, so that
> the user will know what error occurred. With this information the
> user may take appropriate action.
>
> These trace events are consistent with the ARM processor error
> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>
> ---
> v2: add trace enabled condition as Steven's suggestion.
> fix a typo.
> ---
>
> Cc: Steven Rostedt <[email protected]>
> Cc: Tyler Baicar <[email protected]>
> Signed-off-by: Xie XiuQi <[email protected]>
> ---
...
>
> +#define ARM_PROC_ERR_TYPE \
> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
> +
> +#define ARM_PROC_ERR_FLAGS \
> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
> +
Hello Xie XiuQi,

This isn't compiling for me because of these definitions. Here you are
using ARM_*, but below in the TP_printk you are using ARCH_*. The
compiler complains the ARCH_* ones are undefined:

./include/trace/../../include/ras/ras_event.h:278:37: error:
'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
__print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
./include/trace/../../include/ras/ras_event.h:280:38: error:
'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
__print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),

> +/*
> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
> + * via TRACE_DEFINE_ENUM().
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> +
> +ARM_PROC_ERR_TYPE
> +ARM_PROC_ERR_FLAGS
Are the above two lines supposed to be here?
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) { a, b },
> +#define EMe(a, b) { a, b }
> +
> +TRACE_EVENT(arm_proc_err,
I think it would be better to keep this similar to the naming of the
current RAS trace events (right now we have mc_event, arm_event,
aer_event, etc.). I would suggest using "arm_err_info_event" since this
is handling the error information structures of the arm errors.
> +
> + TP_PROTO(const struct cper_arm_err_info *err),
> +
> + TP_ARGS(err),
> +
> + TP_STRUCT__entry(
> + __field(u8, type)
> + __field(u16, multiple_error)
> + __field(u8, flags)
> + __field(u64, error_info)
> + __field(u64, virt_fault_addr)
> + __field(u64, physical_fault_addr)
Validation bits should also be a part of this structure that way user
space tools will know which of these fields are valid.
> + ),
> +
> + TP_fast_assign(
> + __entry->type = err->type;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
> + __entry->multiple_error = err->multiple_error;
> + else
> + __entry->multiple_error = ~0;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
> + __entry->flags = err->flags;
> + else
> + __entry->flags = ~0;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
> + __entry->error_info = err->error_info;
> + else
> + __entry->error_info = 0ULL;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
> + __entry->virt_fault_addr = err->virt_fault_addr;
> + else
> + __entry->virt_fault_addr = 0ULL;
> +
> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
> + __entry->physical_fault_addr = err->physical_fault_addr;
> + else
> + __entry->physical_fault_addr = 0ULL;
> + ),
> +
> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
I think the "ARM Processor Error:" part of this should just be removed.
Here's the output with this removed and the trace event renamed to
arm_err_info_event. I think this looks much cleaner and matches the
style used with the arm_event.

<idle>-0 [020] .ns. 366.592434: arm_event: affinity
level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running
state: 1; PSCI state: 0
<idle>-0 [020] .ns. 366.592437: arm_err_info_event:
type cache error; count: 0; flags: 0x3; error info: 0000000000c20058;
virtual address: 0000000000000000; physical address: 0000000000000000

Thanks,
Tyler

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

2017-04-17 03:09:17

by Xie XiuQi

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

Hi Tyler,

Thanks for your comments and testing.

On 2017/4/15 4:36, Baicar, Tyler wrote:
> On 3/30/2017 4:31 AM, Xie XiuQi wrote:
>> Add a new trace event for ARM processor error information, so that
>> the user will know what error occurred. With this information the
>> user may take appropriate action.
>>
>> These trace events are consistent with the ARM processor error
>> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>>
>> ---
>> v2: add trace enabled condition as Steven's suggestion.
>> fix a typo.
>> ---
>>
>> Cc: Steven Rostedt <[email protected]>
>> Cc: Tyler Baicar <[email protected]>
>> Signed-off-by: Xie XiuQi <[email protected]>
>> ---
> ...
>> +#define ARM_PROC_ERR_TYPE \
>> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
>> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
>> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
>> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
>> +
>> +#define ARM_PROC_ERR_FLAGS \
>> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
>> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
>> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
>> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
>> +
> Hello Xie XiuQi,
>
> This isn't compiling for me because of these definitions. Here you are using ARM_*, but below in the TP_printk you are using ARCH_*. The compiler complains the ARCH_* ones are undefined:
>
> ./include/trace/../../include/ras/ras_event.h:278:37: error: 'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
> __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
> ./include/trace/../../include/ras/ras_event.h:280:38: error: 'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
> __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),

Sorry, it's a typo. It should be ARM_xxx.

>
>> +/*
>> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
>> + * via TRACE_DEFINE_ENUM().
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>> +
>> +ARM_PROC_ERR_TYPE
>> +ARM_PROC_ERR_FLAGS
> Are the above two lines supposed to be here?
>> +
>> +/*
>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>> + * that will be printed in the output.
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) { a, b },
>> +#define EMe(a, b) { a, b }
>> +
>> +TRACE_EVENT(arm_proc_err,
> I think it would be better to keep this similar to the naming of the current RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I would suggest using "arm_err_info_event" since this is handling the error information structures of the arm errors.
>> +
>> + TP_PROTO(const struct cper_arm_err_info *err),
>> +
>> + TP_ARGS(err),
>> +
>> + TP_STRUCT__entry(
>> + __field(u8, type)
>> + __field(u16, multiple_error)
>> + __field(u8, flags)
>> + __field(u64, error_info)
>> + __field(u64, virt_fault_addr)
>> + __field(u64, physical_fault_addr)
> Validation bits should also be a part of this structure that way user space tools will know which of these fields are valid.

Could we use the default value to check the validation which we have checked in TP_fast_assign?

>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->type = err->type;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>> + __entry->multiple_error = err->multiple_error;
>> + else
>> + __entry->multiple_error = ~0;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>> + __entry->flags = err->flags;
>> + else
>> + __entry->flags = ~0;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>> + __entry->error_info = err->error_info;
>> + else
>> + __entry->error_info = 0ULL;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>> + __entry->virt_fault_addr = err->virt_fault_addr;
>> + else
>> + __entry->virt_fault_addr = 0ULL;
>> +
>> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>> + __entry->physical_fault_addr = err->physical_fault_addr;
>> + else
>> + __entry->physical_fault_addr = 0ULL;
>> + ),
>> +
>> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
> I think the "ARM Processor Error:" part of this should just be removed. Here's the output with this removed and the trace event renamed to arm_err_info_event. I think this looks much cleaner and matches the style used with the arm_event.
>
> <idle>-0 [020] .ns. 366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
> <idle>-0 [020] .ns. 366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000

I agree. It looks much better.

>
> Thanks,
> Tyler
>

--
Thanks,
Xie XiuQi

2017-04-17 03:16:58

by Xie XiuQi

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

Hi Tyler,

On 2017/4/17 11:08, Xie XiuQi wrote:
> Hi Tyler,
>
> Thanks for your comments and testing.
>
> On 2017/4/15 4:36, Baicar, Tyler wrote:
>> On 3/30/2017 4:31 AM, Xie XiuQi wrote:
>>> Add a new trace event for ARM processor error information, so that
>>> the user will know what error occurred. With this information the
>>> user may take appropriate action.
>>>
>>> These trace events are consistent with the ARM processor error
>>> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>>>
>>> ---
>>> v2: add trace enabled condition as Steven's suggestion.
>>> fix a typo.
>>> ---
>>>
>>> Cc: Steven Rostedt <[email protected]>
>>> Cc: Tyler Baicar <[email protected]>
>>> Signed-off-by: Xie XiuQi <[email protected]>
>>> ---
>> ...
>>> +#define ARM_PROC_ERR_TYPE \
>>> + EM ( CPER_ARM_INFO_TYPE_CACHE, "cache error" ) \
>>> + EM ( CPER_ARM_INFO_TYPE_TLB, "TLB error" ) \
>>> + EM ( CPER_ARM_INFO_TYPE_BUS, "bus error" ) \
>>> + EMe ( CPER_ARM_INFO_TYPE_UARCH, "micro-architectural error" )
>>> +
>>> +#define ARM_PROC_ERR_FLAGS \
>>> + EM ( CPER_ARM_INFO_FLAGS_FIRST, "First error captured" ) \
>>> + EM ( CPER_ARM_INFO_FLAGS_LAST, "Last error captured" ) \
>>> + EM ( CPER_ARM_INFO_FLAGS_PROPAGATED, "Propagated" ) \
>>> + EMe ( CPER_ARM_INFO_FLAGS_OVERFLOW, "Overflow" )
>>> +
>> Hello Xie XiuQi,
>>
>> This isn't compiling for me because of these definitions. Here you are using ARM_*, but below in the TP_printk you are using ARCH_*. The compiler complains the ARCH_* ones are undefined:
>>
>> ./include/trace/../../include/ras/ras_event.h:278:37: error: 'ARCH_PROC_ERR_TYPE' undeclared (first use in this function)
>> __print_symbolic(__entry->type, ARCH_PROC_ERR_TYPE),
>> ./include/trace/../../include/ras/ras_event.h:280:38: error: 'ARCH_PROC_ERR_FLAGS' undeclared (first use in this function)
>> __print_symbolic(__entry->flags, ARCH_PROC_ERR_FLAGS),
>
> Sorry, it's a typo. It should be ARM_xxx.
>
>>
>>> +/*
>>> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
>>> + * via TRACE_DEFINE_ENUM().
>>> + */
>>> +#undef EM
>>> +#undef EMe
>>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>>> +
>>> +ARM_PROC_ERR_TYPE
>>> +ARM_PROC_ERR_FLAGS
>> Are the above two lines supposed to be here?
>>> +
>>> +/*
>>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>>> + * that will be printed in the output.
>>> + */
>>> +#undef EM
>>> +#undef EMe
>>> +#define EM(a, b) { a, b },
>>> +#define EMe(a, b) { a, b }
>>> +
>>> +TRACE_EVENT(arm_proc_err,
>> I think it would be better to keep this similar to the naming of the current RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I would suggest using "arm_err_info_event" since this is handling the error information structures of the arm errors.
>>> +
>>> + TP_PROTO(const struct cper_arm_err_info *err),
>>> +
>>> + TP_ARGS(err),
>>> +
>>> + TP_STRUCT__entry(
>>> + __field(u8, type)
>>> + __field(u16, multiple_error)
>>> + __field(u8, flags)
>>> + __field(u64, error_info)
>>> + __field(u64, virt_fault_addr)
>>> + __field(u64, physical_fault_addr)
>> Validation bits should also be a part of this structure that way user space tools will know which of these fields are valid.
>
> Could we use the default value to check the validation which we have checked in TP_fast_assign?
>
>>> + ),
>>> +
>>> + TP_fast_assign(
>>> + __entry->type = err->type;
>>> +
>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>>> + __entry->multiple_error = err->multiple_error;
>>> + else
>>> + __entry->multiple_error = ~0;
>>> +
>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>>> + __entry->flags = err->flags;
>>> + else
>>> + __entry->flags = ~0;
>>> +
>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>>> + __entry->error_info = err->error_info;
>>> + else
>>> + __entry->error_info = 0ULL;
>>> +
>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>>> + __entry->virt_fault_addr = err->virt_fault_addr;
>>> + else
>>> + __entry->virt_fault_addr = 0ULL;
>>> +
>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>>> + __entry->physical_fault_addr = err->physical_fault_addr;
>>> + else
>>> + __entry->physical_fault_addr = 0ULL;
>>> + ),
>>> +
>>> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
>> I think the "ARM Processor Error:" part of this should just be removed. Here's the output with this removed and the trace event renamed to arm_err_info_event. I think this looks much cleaner and matches the style used with the arm_event.
>>
>> <idle>-0 [020] .ns. 366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
>> <idle>-0 [020] .ns. 366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000
>

As this section is ARM Processor Error Section, how about use arm_proc_err_event?

> I agree. It looks much better.
>
>>
>> Thanks,
>> Tyler
>>
>

--
Thanks,
Xie XiuQi

2017-04-17 17:18:54

by Tyler Baicar

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

On 4/16/2017 9:16 PM, Xie XiuQi wrote:
> On 2017/4/17 11:08, Xie XiuQi wrote:
>>> On 3/30/2017 4:31 AM, Xie XiuQi wrote:
>>>> Add a new trace event for ARM processor error information, so that
>>>> the user will know what error occurred. With this information the
>>>> user may take appropriate action.
>>>>
>>>> These trace events are consistent with the ARM processor error
>>>> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>>>>
>>>> ---
>>>> v2: add trace enabled condition as Steven's suggestion.
>>>> fix a typo.
>>>> ---
>>>>
>>>> Cc: Steven Rostedt <[email protected]>
>>>> Cc: Tyler Baicar <[email protected]>
>>>> Signed-off-by: Xie XiuQi <[email protected]>
>>>> ---
>>>
...
>>>> +/*
>>>> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
>>>> + * via TRACE_DEFINE_ENUM().
>>>> + */
>>>> +#undef EM
>>>> +#undef EMe
>>>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>>>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>>>> +
>>>> +ARM_PROC_ERR_TYPE
>>>> +ARM_PROC_ERR_FLAGS
>>> Are the above two lines supposed to be here?
>>>> +
>>>> +/*
>>>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>>>> + * that will be printed in the output.
>>>> + */
>>>> +#undef EM
>>>> +#undef EMe
>>>> +#define EM(a, b) { a, b },
>>>> +#define EMe(a, b) { a, b }
>>>> +
>>>> +TRACE_EVENT(arm_proc_err,
>>> I think it would be better to keep this similar to the naming of the current RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I would suggest using "arm_err_info_event" since this is handling the error information structures of the arm errors.
>>>> +
>>>> + TP_PROTO(const struct cper_arm_err_info *err),
>>>> +
>>>> + TP_ARGS(err),
>>>> +
>>>> + TP_STRUCT__entry(
>>>> + __field(u8, type)
>>>> + __field(u16, multiple_error)
>>>> + __field(u8, flags)
>>>> + __field(u64, error_info)
>>>> + __field(u64, virt_fault_addr)
>>>> + __field(u64, physical_fault_addr)
>>> Validation bits should also be a part of this structure that way user space tools will know which of these fields are valid.
>> Could we use the default value to check the validation which we have checked in TP_fast_assign?
Yes, true...I guess we really don't need the validation bits then.
>>>> + ),
>>>> +
>>>> + TP_fast_assign(
>>>> + __entry->type = err->type;
>>>> +
>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>>>> + __entry->multiple_error = err->multiple_error;
>>>> + else
>>>> + __entry->multiple_error = ~0;
>>>> +
>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>>>> + __entry->flags = err->flags;
>>>> + else
>>>> + __entry->flags = ~0;
>>>> +
>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>>>> + __entry->error_info = err->error_info;
>>>> + else
>>>> + __entry->error_info = 0ULL;
>>>> +
>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>>>> + __entry->virt_fault_addr = err->virt_fault_addr;
>>>> + else
>>>> + __entry->virt_fault_addr = 0ULL;
>>>> +
>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>>>> + __entry->physical_fault_addr = err->physical_fault_addr;
>>>> + else
>>>> + __entry->physical_fault_addr = 0ULL;
>>>> + ),
>>>> +
>>>> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
>>> I think the "ARM Processor Error:" part of this should just be removed. Here's the output with this removed and the trace event renamed to arm_err_info_event. I think this looks much cleaner and matches the style used with the arm_event.
>>>
>>> <idle>-0 [020] .ns. 366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
>>> <idle>-0 [020] .ns. 366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000
> As this section is ARM Processor Error Section, how about use arm_proc_err_event?
This is not for the ARM Processor Error Section, that is what the
arm_event is handling. What you are adding this trace support for here
is called the ARM Processor Error Information (UEFI 2.6 spec section
N.2.4.4.1). So I think your trace event here should be called
arm_err_info_event. This will also be consistent with the other two
trace events that I'm planning on adding:

arm_ctx_info_event: ARM Processor Context Information (UEFI 2.6 section
N.2.4.4.2)
arm_vendor_info_event: This is the "Vendor Specific Error Information"
in the ARM Processor Error Section (Table 260). It's possible I may just
add this into the arm_event trace event, but I haven't looked into it
enough yet.

Thanks,
Tyler

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

2017-04-18 01:18:24

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi Mark,

I have some confusion about the RAS feature when VHE is enabled. Does RAS spec support
the situation when VHE is enabled. When VHE is disabled, the hyperviosr delegates the error
exception to EL1 by setting HCR_EL2.VSE to 1, and this will inject a virtual SEI into OS.
My understanding is that HCR_EL2.VSE is only used to inject a virtual SEI into EL1.
But when VHE is enabled, the host OS will run at EL2. We can't inject a virtual SEI into
host OS. I don't know if RAS spec can handle this situation.

Thanks,
Wang Xiongfeng

On 2017/4/13 18:51, Mark Rutland wrote:
> Hi,
>
> On Thu, Mar 30, 2017 at 06:31:07PM +0800, Xie XiuQi wrote:
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index f20c64a..22f9c90 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -106,6 +106,20 @@
>> #define ESR_ELx_AR (UL(1) << 14)
>> #define ESR_ELx_CM (UL(1) << 8)
>>
>> +#define ESR_Elx_DFSC_SEI (0x11)
>
> We should probably have a definition for the uncategorized DFSC value,
> too.
>
> [...]
>
>> index 43512d4..d8a7306 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -69,7 +69,14 @@
>> #define BAD_FIQ 2
>> #define BAD_ERROR 3
>>
>> + .arch_extension ras
>
> Generally, arch_extension is a warning sign that code isn't going to
> work with contemporary assemblers, which we likely need to support.
>
>> +
>> .macro kernel_entry, el, regsize = 64
>> +#ifdef CONFIG_ARM64_ESB
>> + .if \el == 0
>> + esb
>
> Here, I think that we'll need to macro this such that we can build with
> existing toolchains.
>
> e.g. in <asm/assembler.h> we need something like:
>
> #define HINT_IMM_ESB 16
>
> .macro ESB
> hint #HINT_IMM_ESB
> .endm
>
>> + .endif
>> +#endif
>> sub sp, sp, #S_FRAME_SIZE
>> .if \regsize == 32
>> mov w0, w0 // zero upper 32 bits of x0
>> @@ -208,6 +215,7 @@ alternative_else_nop_endif
>> #endif
>>
>> .if \el == 0
>> + msr daifset, #0xF // Set flags
>
> Elsewhere in head.S we use helpers to fiddle with DAIF bits.
>
> Please be consistent with that. Add an enable_all macro if we need one.
>
>> ldr x23, [sp, #S_SP] // load return stack pointer
>> msr sp_el0, x23
>> #ifdef CONFIG_ARM64_ERRATUM_845719
>> @@ -226,6 +234,15 @@ alternative_else_nop_endif
>>
>> msr elr_el1, x21 // set up the return data
>> msr spsr_el1, x22
>> +
>> +#ifdef CONFIG_ARM64_ESB
>> + .if \el == 0
>> + esb // Error Synchronization Barrier
>> + mrs x21, disr_el1 // Check for deferred error
>
> We'll need an <asm/sysreg.h> definition for this register. With that, we
> can use mrs_s here.
>
>> + tbnz x21, #31, el1_sei
>> + .endif
>> +#endif
>> +
>> ldp x0, x1, [sp, #16 * 0]
>> ldp x2, x3, [sp, #16 * 1]
>> ldp x4, x5, [sp, #16 * 2]
>> @@ -318,7 +335,7 @@ ENTRY(vectors)
>> ventry el1_sync_invalid // Synchronous EL1t
>> ventry el1_irq_invalid // IRQ EL1t
>> ventry el1_fiq_invalid // FIQ EL1t
>> - ventry el1_error_invalid // Error EL1t
>> + ventry el1_error // Error EL1t
>>
>> ventry el1_sync // Synchronous EL1h
>> ventry el1_irq // IRQ EL1h
>> @@ -328,7 +345,7 @@ ENTRY(vectors)
>> ventry el0_sync // Synchronous 64-bit EL0
>> ventry el0_irq // IRQ 64-bit EL0
>> ventry el0_fiq_invalid // FIQ 64-bit EL0
>> - ventry el0_error_invalid // Error 64-bit EL0
>> + ventry el0_error // Error 64-bit EL0
>>
>> #ifdef CONFIG_COMPAT
>> ventry el0_sync_compat // Synchronous 32-bit EL0
>> @@ -508,12 +525,31 @@ el1_preempt:
>> ret x24
>> #endif
>>
>> + .align 6
>> +el1_error:
>> + kernel_entry 1
>> +el1_sei:
>> + /*
>> + * asynchronous SError interrupt from kernel
>> + */
>> + mov x0, sp
>> + mrs x1, esr_el1
>
> I don't think this is correct if we branched here from kernel_exit.
> Surely we want the DISR_EL1 value, and ESR_EL1 is unrelated?
>
>> + mov x2, #1 // exception level of SEI generated
>> + b do_sei
>
> You don't need to figure out the EL here. In do_sei() we can determine
> the exception level from the regs (e.g. using user_mode(regs)).
>
>> +ENDPROC(el1_error)
>> +
>> +
>> /*
>> * EL0 mode handlers.
>> */
>> .align 6
>> el0_sync:
>> kernel_entry 0
>> +#ifdef CONFIG_ARM64_ESB
>> + mrs x26, disr_el1
>> + tbnz x26, #31, el0_sei // check DISR.A
>> + msr daifclr, #0x4 // unmask SEI
>> +#endif
>
> Why do we duplicate this across the EL0 handlers, rather than making it
> common in the el0 kernel_entry code?
>
>> mrs x25, esr_el1 // read the syndrome register
>> lsr x24, x25, #ESR_ELx_EC_SHIFT // exception class
>> cmp x24, #ESR_ELx_EC_SVC64 // SVC in 64-bit state
>> @@ -688,8 +724,38 @@ el0_inv:
>> ENDPROC(el0_sync)
>>
>> .align 6
>> +el0_error:
>> + kernel_entry 0
>> +el0_sei:
>> + /*
>> + * asynchronous SError interrupt from userspace
>> + */
>> + ct_user_exit
>> + mov x0, sp
>> + mrs x1, esr_el1
>
> As with el1_sei, I don't think this is correct if we branched to
> el0_sei. As far as I am aware, ESR_EL1 will contain whatever exception
> we took, and the value we want is in DISR_EL1.
>
>> + mov x2, #0
>
> This EL parameter can go.
>
>> + bl do_sei
>> + b ret_to_user
>> +ENDPROC(el0_error)
>> +
>> + .align 6
>> el0_irq:
>> kernel_entry 0
>> +#ifdef CONFIG_ARM64_ESB
>> + mrs x26, disr_el1
>> + tbz x26, #31, el0_irq_naked // check DISR.A
>> +
>> + mov x0, sp
>> + mrs x1, esr_el1
>> + mov x2, 0
>> +
>> + /*
>> + * The SEI generated at EL0 is not affect this irq context,
>> + * so after sei handler, we continue process this irq.
>> + */
>> + bl do_sei
>> + msr daifclr, #0x4 // unmask SEI
>
> This rough pattern is duplicated several times across the EL0 entry
> paths. I think it should be made common.
>
>> +#endif
>> el0_irq_naked:
>> enable_dbg
>> #ifdef CONFIG_TRACE_IRQFLAGS
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index b6d6727..99be6d8 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -643,6 +643,34 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
>> handler[reason], smp_processor_id(), esr,
>> esr_get_class_string(esr));
>>
>> + die("Oops - bad mode", regs, 0);
>> + local_irq_disable();
>> + panic("bad mode");
>> +}
>> +
>> +static const char *sei_context[] = {
>> + "userspace", /* EL0 */
>> + "kernel", /* EL1 */
>> +};
>
> This should go. It's only used in one place, and would be clearer with
> the strings inline. More on that below.
>
>> +
>> +static const char *sei_severity[] = {
>
> Please name this for what it actually represents:
>
> static const char *esr_aet_str[] = {
>
>> + [0 ... ESR_ELx_AET_MAX] = "Unknown",
>
> For consistency with esr_class_str, please make this:
>
> [0 ... ESR_ELx_AET_MAX] = "UNRECOGNIZED AET",
>
> ... which makes it clear that this isn't some AET value which reports an
> "Unknown" status.
>
>> + [ESR_ELx_AET_UC] = "Uncontainable",
>> + [ESR_ELx_AET_UEU] = "Unrecoverable",
>> + [ESR_ELx_AET_UEO] = "Restartable",
>> + [ESR_ELx_AET_UER] = "Recoverable",
>> + [ESR_ELx_AET_CE] = "Corrected",
>> +};
>> +
>> +DEFINE_PER_CPU(int, sei_in_process);
>
> A previous patch added definition of this.
>
>> +asmlinkage void do_sei(struct pt_regs *regs, unsigned int esr, int el)
>> +{
>> + int aet = ESR_ELx_AET(esr);
>
> The AET field is only valid when the DFSC is 0b010001, so we need to
> check that before we interpret AET.
>
>> + console_verbose();
>> +
>> + pr_crit("Asynchronous SError interrupt detected on CPU%d, %s, %s\n",
>> + smp_processor_id(), sei_context[el], sei_severity[aet]);
>
> We should dump the full ESR_ELx value, regardless of what automated
> decoding we do, so that we have a chance of debugging issues in the
> field.
>
> It would also be nice to align with how bad_mode reports this today.
> Please make this:
>
> pr_crit("SError detected on CPU%d while in %s mode: code: 0x%08x -- %s\n",
> smp_processor_id(), user_mode(regs) ? "user" : "kernel",
> esr, esr_aet_str[aet]);
>
> ... though it might be best to dump the raw SPSR rather than trying to
> say user/kernel, so that we can distinguish EL1/EL2 with VHE, etc.
>
>> +
>> /*
>> * In firmware first mode, we could assume firmware will only generate one
>> * of cper records at a time. There is no risk for one cpu to parse ghes table.
>> @@ -653,9 +681,31 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
>> this_cpu_dec(sei_in_process);
>> }
>>
>> - die("Oops - bad mode", regs, 0);
>> + if (el == 0 && IS_ENABLED(CONFIG_ARM64_ESB) &&
>
> Please use user_mode(regs), and get rid of the el parameter to this
> function entirely.
>
>> + cpus_have_cap(ARM64_HAS_RAS_EXTN)) {
>> + siginfo_t info;
>> + void __user *pc = (void __user *)instruction_pointer(regs);
>> +
>> + if (aet >= ESR_ELx_AET_UEO)
>> + return;
>
> We need to check the DFSC first, and 0b111 is a reserved value (which
> the ARM ARM doesn't define the recoverability of), so I don't think this
> is correct.
>
> We should probably test the DSFC, then switch on the AET value, so as to
> handle only the cases we are aware of.
>
>> +
>> + if (aet == ESR_ELx_AET_UEU) {
>> + info.si_signo = SIGILL;
>> + info.si_errno = 0;
>> + info.si_code = ILL_ILLOPC;
>> + info.si_addr = pc;
>
> An unrecoverable error is not necessarily a particular bad instruction,
> so I'm not sure this makes sense.
>
> Thanks,
> Mark.
>
> .
>

2017-04-18 02:22:46

by Xie XiuQi

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] trace: ras: add ARM processor error information trace event

Hi Tyler,

On 2017/4/18 1:18, Baicar, Tyler wrote:
> On 4/16/2017 9:16 PM, Xie XiuQi wrote:
>> On 2017/4/17 11:08, Xie XiuQi wrote:
>>>> On 3/30/2017 4:31 AM, Xie XiuQi wrote:
>>>>> Add a new trace event for ARM processor error information, so that
>>>>> the user will know what error occurred. With this information the
>>>>> user may take appropriate action.
>>>>>
>>>>> These trace events are consistent with the ARM processor error
>>>>> information table which defined in UEFI 2.6 spec section N.2.4.4.1.
>>>>>
>>>>> ---
>>>>> v2: add trace enabled condition as Steven's suggestion.
>>>>> fix a typo.
>>>>> ---
>>>>>
>>>>> Cc: Steven Rostedt <[email protected]>
>>>>> Cc: Tyler Baicar <[email protected]>
>>>>> Signed-off-by: Xie XiuQi <[email protected]>
>>>>> ---
>>>>
> ...
>>>>> +/*
>>>>> + * First define the enums in MM_ACTION_RESULT to be exported to userspace
>>>>> + * via TRACE_DEFINE_ENUM().
>>>>> + */
>>>>> +#undef EM
>>>>> +#undef EMe
>>>>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>>>>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>>>>> +
>>>>> +ARM_PROC_ERR_TYPE
>>>>> +ARM_PROC_ERR_FLAGS
>>>> Are the above two lines supposed to be here?
>>>>> +
>>>>> +/*
>>>>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>>>>> + * that will be printed in the output.
>>>>> + */
>>>>> +#undef EM
>>>>> +#undef EMe
>>>>> +#define EM(a, b) { a, b },
>>>>> +#define EMe(a, b) { a, b }
>>>>> +
>>>>> +TRACE_EVENT(arm_proc_err,
>>>> I think it would be better to keep this similar to the naming of the current RAS trace events (right now we have mc_event, arm_event, aer_event, etc.). I would suggest using "arm_err_info_event" since this is handling the error information structures of the arm errors.
>>>>> +
>>>>> + TP_PROTO(const struct cper_arm_err_info *err),
>>>>> +
>>>>> + TP_ARGS(err),
>>>>> +
>>>>> + TP_STRUCT__entry(
>>>>> + __field(u8, type)
>>>>> + __field(u16, multiple_error)
>>>>> + __field(u8, flags)
>>>>> + __field(u64, error_info)
>>>>> + __field(u64, virt_fault_addr)
>>>>> + __field(u64, physical_fault_addr)
>>>> Validation bits should also be a part of this structure that way user space tools will know which of these fields are valid.
>>> Could we use the default value to check the validation which we have checked in TP_fast_assign?
> Yes, true...I guess we really don't need the validation bits then.
>>>>> + ),
>>>>> +
>>>>> + TP_fast_assign(
>>>>> + __entry->type = err->type;
>>>>> +
>>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_MULTI_ERR)
>>>>> + __entry->multiple_error = err->multiple_error;
>>>>> + else
>>>>> + __entry->multiple_error = ~0;
>>>>> +
>>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_FLAGS)
>>>>> + __entry->flags = err->flags;
>>>>> + else
>>>>> + __entry->flags = ~0;
>>>>> +
>>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
>>>>> + __entry->error_info = err->error_info;
>>>>> + else
>>>>> + __entry->error_info = 0ULL;
>>>>> +
>>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
>>>>> + __entry->virt_fault_addr = err->virt_fault_addr;
>>>>> + else
>>>>> + __entry->virt_fault_addr = 0ULL;
>>>>> +
>>>>> + if (err->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
>>>>> + __entry->physical_fault_addr = err->physical_fault_addr;
>>>>> + else
>>>>> + __entry->physical_fault_addr = 0ULL;
>>>>> + ),
>>>>> +
>>>>> + TP_printk("ARM Processor Error: type %s; count: %u; flags: %s;"
>>>> I think the "ARM Processor Error:" part of this should just be removed. Here's the output with this removed and the trace event renamed to arm_err_info_event. I think this looks much cleaner and matches the style used with the arm_event.
>>>>
>>>> <idle>-0 [020] .ns. 366.592434: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510f8000; running state: 1; PSCI state: 0
>>>> <idle>-0 [020] .ns. 366.592437: arm_err_info_event: type cache error; count: 0; flags: 0x3; error info: 0000000000c20058; virtual address: 0000000000000000; physical address: 0000000000000000
>> As this section is ARM Processor Error Section, how about use arm_proc_err_event?
> This is not for the ARM Processor Error Section, that is what the arm_event is handling. What you are adding this trace support for here is called the ARM Processor Error Information (UEFI 2.6 spec section N.2.4.4.1). So I think your trace event here should be called arm_err_info_event. This will also be consistent with the other two trace events that I'm planning on adding:
>
> arm_ctx_info_event: ARM Processor Context Information (UEFI 2.6 section N.2.4.4.2)
> arm_vendor_info_event: This is the "Vendor Specific Error Information" in the ARM Processor Error Section (Table 260). It's possible I may just add this into the arm_event trace event, but I haven't looked into it enough yet.
>

OK, I see. Thanks for your explanation.

> Thanks,
> Tyler
>

--
Thanks,
Xie XiuQi

2017-04-18 10:53:48

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi Wang Xiongfeng,

On 18/04/17 02:09, Xiongfeng Wang wrote:
> I have some confusion about the RAS feature when VHE is enabled. Does RAS spec support
> the situation when VHE is enabled. When VHE is disabled, the hyperviosr delegates the error
> exception to EL1 by setting HCR_EL2.VSE to 1, and this will inject a virtual SEI into OS.

(The ARM-ARM also requires the HCR_EL2.AMO to be set so that physical SError
Interrupts are taken to EL2, meaning EL1 can never receive a physical SError)


> My understanding is that HCR_EL2.VSE is only used to inject a virtual SEI into EL1.

... mine too ...

> But when VHE is enabled, the host OS will run at EL2. We can't inject a virtual SEI into
> host OS. I don't know if RAS spec can handle this situation.

The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
describe a way to inject these as they are generated by the CPU.

Am I right in thinking you want this to use SError Interrupts as an APEI
notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)

This is straightforward for the hyper-visor to implement using Virtual SError.
I don't think its not always feasible for the host as Physical SError is routed
to EL3 by SCR_EL3.EA, meaning there is no hardware generated SError that can
reach EL2. Another APEI notification mechanism may be more appropriate.

EL3 may be able to 'fake' an SError by returning into the appropriate EL2 vector
if the exception came from EL{0,1}, or from EL2 and PSTATE.A is clear.
If the SError came from EL2 and the ESR_EL3.IESB bit is set, we can write an
appropriate ESR into DISR.
You cant use SError to cover all the possible RAS exceptions. We already have
this problem using SEI if PSTATE.A was set and the exception was an imprecise
abort from EL2. We can't return to the interrupted context and we can't deliver
an SError to EL2 either.

Setting SCR_EL3.EA allows firmware to handle these ugly corner cases. Notifying
the OS is a separate problem where APEI's SEI may not always be the best choice.


Thanks,

James

2017-04-19 02:43:59

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi James,

Thanks for your reply.

On 2017/4/18 18:51, James Morse wrote:
> Hi Wang Xiongfeng,
>
> On 18/04/17 02:09, Xiongfeng Wang wrote:
>> I have some confusion about the RAS feature when VHE is enabled. Does RAS spec support
>> the situation when VHE is enabled. When VHE is disabled, the hyperviosr delegates the error
>> exception to EL1 by setting HCR_EL2.VSE to 1, and this will inject a virtual SEI into OS.
>
> (The ARM-ARM also requires the HCR_EL2.AMO to be set so that physical SError
> Interrupts are taken to EL2, meaning EL1 can never receive a physical SError)
>
>
>> My understanding is that HCR_EL2.VSE is only used to inject a virtual SEI into EL1.
>
> ... mine too ...
>
>> But when VHE is enabled, the host OS will run at EL2. We can't inject a virtual SEI into
>> host OS. I don't know if RAS spec can handle this situation.
>
> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
> describe a way to inject these as they are generated by the CPU.
>
> Am I right in thinking you want this to use SError Interrupts as an APEI
> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)

Yes, using sei as an APEI notification is one part of my consideration. Another use is for ESB.
RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
describes the SEI recovery process when ESB is implemented.

In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs in EL0 and not been taken immediately,
and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. The ESB at SVC entry is
used for preventing the error propagating from user space to kernel space. The EL3 SEI handler collects
the errors and fills in the APEI table, and then jump to EL2 SEI handler. EL2 SEI handler inject
an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an SEI is pending.
Then ESB is executed again, and DISR_EL1.A is set by hardware (2.4.4 ESB and virtual errors), so that
the following process can be executed.

So we want to inject a vSEI into OS, when control is returned from EL3/2 to OS, no matter whether
it is on host OS or guest OS. I don't know if my understanding is right here.
>
> This is straightforward for the hyper-visor to implement using Virtual SError.
> I don't think its not always feasible for the host as Physical SError is routed
> to EL3 by SCR_EL3.EA, meaning there is no hardware generated SError that can
> reach EL2. Another APEI notification mechanism may be more appropriate.

>
> EL3 may be able to 'fake' an SError by returning into the appropriate EL2 vector
> if the exception came from EL{0,1}, or from EL2 and PSTATE.A is clear.
> If the SError came from EL2 and the ESR_EL3.IESB bit is set, we can write an
> appropriate ESR into DISR.

Yes, this can work. When VHE is enabled, we can set DISR.A by software, and 'fake'
an SError by returning into the EL2 SEI vector.

> You cant use SError to cover all the possible RAS exceptions. We already have
> this problem using SEI if PSTATE.A was set and the exception was an imprecise
> abort from EL2. We can't return to the interrupted context and we can't deliver
> an SError to EL2 either.

SEI came from EL2 and PSTATE.A is set. Is it the situation where VHE is enabled and CPU is running
in kernel space. If SEI occurs in kernel space, can we just panic or shutdown.
>
> Setting SCR_EL3.EA allows firmware to handle these ugly corner cases. Notifying
> the OS is a separate problem where APEI's SEI may not always be the best choice.
>
>
> Thanks,
>
> James
>
> .
>
Thanks,
Wang Xiongfeng


2017-04-20 08:54:02

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi Wang Xiongfeng,

On 19/04/17 03:37, Xiongfeng Wang wrote:
> On 2017/4/18 18:51, James Morse wrote:
>> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
>> describe a way to inject these as they are generated by the CPU.
>>
>> Am I right in thinking you want this to use SError Interrupts as an APEI
>> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)
>
> Yes, using sei as an APEI notification is one part of my consideration. Another use is for ESB.
> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
> describes the SEI recovery process when ESB is implemented.
>
> In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs in EL0 and not been taken immediately,
> and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. The ESB at SVC entry is
> used for preventing the error propagating from user space to kernel space. The EL3 SEI handler collects

> the errors and fills in the APEI table, and then jump to EL2 SEI handler. EL2 SEI handler inject
> an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an SEI is pending.

This step has confused me. How would this work with VHE where the host runs at
EL2 and there is nothing at Host:EL1?
>From your description I assume you have some firmware resident at EL2.


> Then ESB is executed again, and DISR_EL1.A is set by hardware (2.4.4 ESB and virtual errors), so that
> the following process can be executed.


> So we want to inject a vSEI into OS, when control is returned from EL3/2 to OS, no matter whether
> it is on host OS or guest OS.

I disagree. With Linux/KVM you can't use Virtual SError to notify the host OS.
The host OS expects to receive Physical SError, these shouldn't be taken to EL2
unless a guest is running. Notifications from firmware that use SEA or SEI
should follow the routing rules in the ARM-ARM, which means they should never
reach a guest OS.

For VHE the host runs at EL2 and sets HCR_EL2.AMO. Any firmware notification
should come from EL3 to the host at EL2. The host may choose to notify the
guest, but this should always go via Qemu.

For non-VHE systems the host runs at EL1 and KVM lives at EL2 to do the world
switch. When KVM is running a guest it sets HCR_EL2.AMO, when it has switched
back to the host it clears it.

Notifications from EL3 that pretend to be SError should route SError to EL2 or
EL1 depending on HCR_EL2.AMO.
When KVM gets a Synchronous External Abort or an SError while a guest is running
it will switch back to the host and tell the handle_exit() code. Again the host
may choose to notify the guest, but this should always go via Qemu.

The ARM-ARM pseudo code for the routing rules is in
AArch64.TakePhysicalSErrorException(). Firmware injecting fake SError should
behave exactly like this.


Newly appeared in the ARM-ARM is HCR_EL2.TEA, which takes Synchronous External
Aborts to EL2 (if SCR_EL3 hasn't claimed them). We should set/clear this bit
like we do HCR_EL2.AMO if the CPU has the RAS extensions.


>> You cant use SError to cover all the possible RAS exceptions. We already have
>> this problem using SEI if PSTATE.A was set and the exception was an imprecise
>> abort from EL2. We can't return to the interrupted context and we can't deliver
>> an SError to EL2 either.
>
> SEI came from EL2 and PSTATE.A is set. Is it the situation where VHE is enabled and CPU is running
> in kernel space. If SEI occurs in kernel space, can we just panic or shutdown.

firmware-panic? We can do a little better than that:
If the IESB bit is set in the ESR we can behave as if this were an ESB and have
firmware write an appropriate ESR to DISR_EL1 if PSTATE.A is set and the
exception came from EL2.

Linux should have an ESB in its __switch_to(), I've re-worked the daif masking
in entry.S so that PSTATE.A will always be unmasked here.

Other than these two cases, yes, this CPU really is wrecked. Firmware can power
it off via PSCI, and could notify another CPU about what happened. UEFI's table
250 'Processor Generic Error Section' has a 'Processor ID' field, with a note
that on ARM this is the MPIDR_EL1. Table 260 'ARM Processor Error Section' has
something similar.


Thanks,

James

2017-04-21 10:47:32

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi XiuQi,

On 2017/3/30 18:31, Xie XiuQi wrote:
> Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
> is used to synchronize Unrecoverable errors. That is, containable errors
> architecturally consumed by the PE and not silently propagated.
>
> With ESB it is generally possible to isolate an unrecoverable error
> between two ESB instructions. So, it's possible to recovery from
> unrecoverable errors reported by asynchronous SError interrupt.
>
> If ARMv8.2 RAS Extension is not support, ESB is treated as a NOP.
>
> Signed-off-by: Xie XiuQi <[email protected]>
> Signed-off-by: Wang Xiongfeng <[email protected]>
> ---
> arch/arm64/Kconfig | 16 ++++++++++
> arch/arm64/include/asm/esr.h | 14 +++++++++
> arch/arm64/kernel/entry.S | 70 ++++++++++++++++++++++++++++++++++++++++++--
> arch/arm64/kernel/traps.c | 54 ++++++++++++++++++++++++++++++++--
> 4 files changed, 150 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 859a90e..7402175 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -911,6 +911,22 @@ endmenu
>
> menu "ARMv8.2 architectural features"
>
> +config ARM64_ESB
> + bool "Enable support for Error Synchronization Barrier (ESB)"
> + default n
> + help
> + Error Synchronization Barrier (ESB; part of the ARMv8.2 Extensions)
> + is used to synchronize Unrecoverable errors. That is, containable errors
> + architecturally consumed by the PE and not silently propagated.
> +
> + Without ESB it is not generally possible to isolate an Unrecoverable
> + error because it is not known which instruction generated the error.
> +
> + Selecting this option allows inject esb instruction before the exception
> + change. If ARMv8.2 RAS Extension is not support, ESB is treated as a NOP.
> +
> + Note that ESB instruction can introduce slight overhead, so say N if unsure.
> +
> config ARM64_UAO
> bool "Enable support for User Access Override (UAO)"
> default y
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index f20c64a..22f9c90 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -106,6 +106,20 @@
> #define ESR_ELx_AR (UL(1) << 14)
> #define ESR_ELx_CM (UL(1) << 8)
>
> +#define ESR_Elx_DFSC_SEI (0x11)
> +
> +#define ESR_ELx_AET_SHIFT (10)
> +#define ESR_ELx_AET_MAX (7)
> +#define ESR_ELx_AET_MASK (UL(7) << ESR_ELx_AET_SHIFT)
> +#define ESR_ELx_AET(esr) (((esr) & ESR_ELx_AET_MASK) >> ESR_ELx_AET_SHIFT)
> +
> +#define ESR_ELx_AET_UC (0)
> +#define ESR_ELx_AET_UEU (1)
> +#define ESR_ELx_AET_UEO (2)
> +#define ESR_ELx_AET_UER (3)
> +#define ESR_ELx_AET_CE (6)
> +
> +
> /* ISS field definitions for exceptions taken in to Hyp */
> #define ESR_ELx_CV (UL(1) << 24)
> #define ESR_ELx_COND_SHIFT (20)
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 43512d4..d8a7306 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -69,7 +69,14 @@
> #define BAD_FIQ 2
> #define BAD_ERROR 3
>
> + .arch_extension ras
> +
> .macro kernel_entry, el, regsize = 64
because we also want to take SEI exception in kernel space, we may need to unmask SEI in kernel_entry.
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb
> + .endif
> +#endif
> sub sp, sp, #S_FRAME_SIZE
> .if \regsize == 32
> mov w0, w0 // zero upper 32 bits of x0
> @@ -208,6 +215,7 @@ alternative_else_nop_endif
> #endif
>
> .if \el == 0
> + msr daifset, #0xF // Set flags
> ldr x23, [sp, #S_SP] // load return stack pointer
> msr sp_el0, x23
> #ifdef CONFIG_ARM64_ERRATUM_845719
> @@ -226,6 +234,15 @@ alternative_else_nop_endif
>
> msr elr_el1, x21 // set up the return data
> msr spsr_el1, x22
> +
> +#ifdef CONFIG_ARM64_ESB
> + .if \el == 0
> + esb // Error Synchronization Barrier
> + mrs x21, disr_el1 // Check for deferred error
> + tbnz x21, #31, el1_sei
> + .endif
> +#endif
> +
> ldp x0, x1, [sp, #16 * 0]
> ldp x2, x3, [sp, #16 * 1]
> ldp x4, x5, [sp, #16 * 2]
> @@ -318,7 +335,7 @@ ENTRY(vectors)
> ventry el1_sync_invalid // Synchronous EL1t
> ventry el1_irq_invalid // IRQ EL1t
> ventry el1_fiq_invalid // FIQ EL1t
> - ventry el1_error_invalid // Error EL1t
> + ventry el1_error // Error EL1t
>
> ventry el1_sync // Synchronous EL1h
> ventry el1_irq // IRQ EL1h
> @@ -328,7 +345,7 @@ ENTRY(vectors)
> ventry el0_sync // Synchronous 64-bit EL0
> ventry el0_irq // IRQ 64-bit EL0
> ventry el0_fiq_invalid // FIQ 64-bit EL0
> - ventry el0_error_invalid // Error 64-bit EL0
> + ventry el0_error // Error 64-bit EL0

for the situation 'Current EL with SPx', we also need to change the sError vector to el1_error.
>
> #ifdef CONFIG_COMPAT
> ventry el0_sync_compat // Synchronous 32-bit EL0
> @@ -508,12 +525,31 @@ el1_preempt:
> ret x24
> #endif
>
> + .align 6
> +el1_error:
> + kernel_entry 1
> +el1_sei:
> + /*
> + * asynchronous SError interrupt from kernel
> + */
> + mov x0, sp
> + mrs x1, esr_el1
> + mov x2, #1 // exception level of SEI generated
> + b do_sei
> +ENDPROC(el1_error)
> +
> +


Thanks,

Wang Xiongfeng


2017-04-21 11:34:53

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi James,

Thanks for your reply.

On 2017/4/20 16:52, James Morse wrote:
> Hi Wang Xiongfeng,
>
> On 19/04/17 03:37, Xiongfeng Wang wrote:
>> On 2017/4/18 18:51, James Morse wrote:
>>> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
>>> describe a way to inject these as they are generated by the CPU.
>>>
>>> Am I right in thinking you want this to use SError Interrupts as an APEI
>>> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)
>>
>> Yes, using sei as an APEI notification is one part of my consideration. Another use is for ESB.
>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
>> describes the SEI recovery process when ESB is implemented.
>>
>> In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs in EL0 and not been taken immediately,
>> and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. The ESB at SVC entry is
>> used for preventing the error propagating from user space to kernel space. The EL3 SEI handler collects
>
>> the errors and fills in the APEI table, and then jump to EL2 SEI handler. EL2 SEI handler inject
>> an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an SEI is pending.
>
> This step has confused me. How would this work with VHE where the host runs at
> EL2 and there is nothing at Host:EL1?

RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
I don't know whether the step described in the section is designed for guest os or host os or both.
Yes, it doesn't work with VHE where the host runs at EL2.

>>From your description I assume you have some firmware resident at EL2.

Our actual SEI step is planned as follows:
Host OS: EL0/EL1 -> EL3 -> EL0/EL1
Guest OS: EL0/EL1 -> EL3 -> EL2 -> EL0/EL1

Our problem is that, when returning to EL0/EL1, whether we should jump to EL1 SEI vector or just return to where the SEI is taken from?
In guest os situation, we can inject an vSEI and return to where the SEI is taken from.
But in host os situation, we can't inject an vSEI (if we don't set HCR_EL2.AMO), so we have to jump to EL1 SEI vector.
Then the process following ESB won't be executed becuase SEI is taken to EL3 from the ESB instruction in EL1, and when control
is returned to OS, we are in EL1 SEI vector rather than the ESB instruction.

It is ok to just ignore the process following the ESB instruction in el0_sync, because the process will be sent SIGBUS signal.
But for el0_irq, the irq process following the ESB may should be executed because the irq is not related to the user process.

If we set HCR_EL2.AMO when SCR_EL3.EA is set. We can still inject an vSEI into host OS in EL3.
Physical SError won't be taken to EL2 because SCR_EL3.EA is set. But this may be too racy is
not consistent with ARM rules.
>
>
>> Then ESB is executed again, and DISR_EL1.A is set by hardware (2.4.4 ESB and virtual errors), so that
>> the following process can be executed.
>
>
>> So we want to inject a vSEI into OS, when control is returned from EL3/2 to OS, no matter whether
>> it is on host OS or guest OS.
>
> I disagree. With Linux/KVM you can't use Virtual SError to notify the host OS.
> The host OS expects to receive Physical SError, these shouldn't be taken to EL2
> unless a guest is running. Notifications from firmware that use SEA or SEI
> should follow the routing rules in the ARM-ARM, which means they should never
> reach a guest OS.

Yes, I agree. When running the host os, exception should not be taken to EL2, because
EL2 SEI vector always suppose that exception is taken from guest os and save
guest context in the first place.
>
> For VHE the host runs at EL2 and sets HCR_EL2.AMO. Any firmware notification
> should come from EL3 to the host at EL2. The host may choose to notify the
> guest, but this should always go via Qemu.
>
> For non-VHE systems the host runs at EL1 and KVM lives at EL2 to do the world
> switch. When KVM is running a guest it sets HCR_EL2.AMO, when it has switched
> back to the host it clears it.
>
> Notifications from EL3 that pretend to be SError should route SError to EL2 or
> EL1 depending on HCR_EL2.AMO.
> When KVM gets a Synchronous External Abort or an SError while a guest is running
> it will switch back to the host and tell the handle_exit() code. Again the host
> may choose to notify the guest, but this should always go via Qemu.
>
> The ARM-ARM pseudo code for the routing rules is in
> AArch64.TakePhysicalSErrorException(). Firmware injecting fake SError should
> behave exactly like this.
>
>
> Newly appeared in the ARM-ARM is HCR_EL2.TEA, which takes Synchronous External
> Aborts to EL2 (if SCR_EL3 hasn't claimed them). We should set/clear this bit
> like we do HCR_EL2.AMO if the CPU has the RAS extensions.
>
>
>>> You cant use SError to cover all the possible RAS exceptions. We already have
>>> this problem using SEI if PSTATE.A was set and the exception was an imprecise
>>> abort from EL2. We can't return to the interrupted context and we can't deliver
>>> an SError to EL2 either.
>>
>> SEI came from EL2 and PSTATE.A is set. Is it the situation where VHE is enabled and CPU is running
>> in kernel space. If SEI occurs in kernel space, can we just panic or shutdown.
>
> firmware-panic? We can do a little better than that:
> If the IESB bit is set in the ESR we can behave as if this were an ESB and have
> firmware write an appropriate ESR to DISR_EL1 if PSTATE.A is set and the
> exception came from EL2.

This method may solve the problem I said above.
Is IESB a new bit added int ESR in the newest RAS spec?
>
> Linux should have an ESB in its __switch_to(), I've re-worked the daif masking
> in entry.S so that PSTATE.A will always be unmasked here.
>
> Other than these two cases, yes, this CPU really is wrecked. Firmware can power
> it off via PSCI, and could notify another CPU about what happened. UEFI's table
> 250 'Processor Generic Error Section' has a 'Processor ID' field, with a note
> that on ARM this is the MPIDR_EL1. Table 260 'ARM Processor Error Section' has
> something similar.
>
>
> Thanks,
>
> James
>
> .
>
Thanks,

Wang Xiongfeng

2017-04-24 17:15:54

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi Wang Xiongfeng,

On 21/04/17 12:33, Xiongfeng Wang wrote:
> On 2017/4/20 16:52, James Morse wrote:
>> On 19/04/17 03:37, Xiongfeng Wang wrote:
>>> On 2017/4/18 18:51, James Morse wrote:
>>>> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
>>>> describe a way to inject these as they are generated by the CPU.
>>>>
>>>> Am I right in thinking you want this to use SError Interrupts as an APEI
>>>> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)
>>>
>>> Yes, using sei as an APEI notification is one part of my consideration. Another use is for ESB.
>>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
>>> describes the SEI recovery process when ESB is implemented.
>>>
>>> In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs in EL0 and not been taken immediately,
>>> and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. The ESB at SVC entry is
>>> used for preventing the error propagating from user space to kernel space. The EL3 SEI handler collects
>>
>>> the errors and fills in the APEI table, and then jump to EL2 SEI handler. EL2 SEI handler inject
>>> an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an SEI is pending.
>>
>> This step has confused me. How would this work with VHE where the host runs at
>> EL2 and there is nothing at Host:EL1?
>
> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
> I don't know whether the step described in the section is designed for guest os or host os or both.
> Yes, it doesn't work with VHE where the host runs at EL2.

If it uses Virtual SError, it must be for an OS running at EL1, as the code
running at EL2 (VHE:Linux, firmware or Xen) must set HCR_EL2.AMO to make this work.

I can't find the section you describe in this RAS spec:
https://developer.arm.com/docs/ddi0587/latest/armreliabilityavailabilityandserviceabilityrasspecificationarmv8forthearmv8aarchitectureprofile


>> >From your description I assume you have some firmware resident at EL2.

> Our actual SEI step is planned as follows:

Ah, okay. You can't use Virtual SError at all then, as firmware doesn't own EL2:
KVM does. KVM will save/restore the HCR_EL2 register as part of its world
switch. Firmware must not modify the hyp registers behind the hyper-visors back.


> Host OS: EL0/EL1 -> EL3 -> EL0/EL1

i.e. While the host was running, so EL3 sees HCR_EL2.AMO clear, and directs the
SEI to EL1.

> Guest OS: EL0/EL1 -> EL3 -> EL2 -> EL0/EL1

i.e. While the guest was running, so EL3 sees HCR_EL2.AMO set, directs the SEI
to EL2 where KVM performs the world-switch and returns to host EL1.

Looks sensible.


> In guest os situation, we can inject an vSEI and return to where the SEI is taken from.

An SEI from firmware should always end up in the host OS. If a guest was running
KVM is responsible for doing the world switch to restore host EL1 and return
there. This should all work today.


> But in host os situation, we can't inject an vSEI (if we don't set HCR_EL2.AMO), so we have to jump to EL1 SEI vector.

Because your firmware is at EL3 you have to check PSTATE.A and jump into the
SError vector in both cases, you just choose if its VBAR_EL2 or VBAR_EL1 based
on HCR_EL2.AMO.


> Then the process following ESB won't be executed becuase SEI is taken to EL3 from the ESB instruction in EL1, and when control
> is returned to OS, we are in EL1 SEI vector rather than the ESB instruction.

Firmware should set ELR_EL1 so that an eret from the EL1 SError vector takes you
back to the ESB instruction that took us to EL3 in the first place.


There is a problem here mixing SError as the CPU->Software notification of RAS
errors and as APEI's SEI Software->Software notification that a firmware-first
error has been handled by EL3.

To elaborate:
The routine in this patch was something like:
1 ESB
2 Read DISR_EL1
3 If set, branch to the SError handler.

If we have firmware-first ESB will generate an SError as PSTATE.A at EL{1,2}
doesn't mask SError for EL3. Firmware can then handle the RAS error. With this
the CPU->Software story is done. Firmware notifying the OS via APEI is a
different problem.

If the error doesn't need to be notified to the OS via SEI, firmware can return
to step 1 above with DISR_EL1 clear. The OS may be told via another mechanism
such as polling or an irq.

If PSTATE.A was clear, firmware can return to the SError vector. If PSTATE.A was
set and firmware knows this was due to an ESB, it can set DISR_EL1. The problem
is firmware can't know if the SError was due to an ESB.

The only time we are likely to have PSTATE.A masked is when changing exception
level. For this we can rely on IESB, so step 1 isn't necessary in the routine
above. Firmware knows if an SError taken to EL3 is due to an IESB, as the
ESR_EL3.IESB bit will be set, in this case it can write to DISR_EL1 and return.

In Linux we can use IESB for changes in exception level, and ESB with PSTATE.A
clear for any other need. (e.g. __switch_to()). Firmware can notify us using SEI
if we use either of these two patterns. This may not work for other OS (Xen?),
but this is a problem with using SEI as an APEI notification method.

(Some cleanup of our SError unmasking is needed, I have some patches I will post
on a branch shortly)


> It is ok to just ignore the process following the ESB instruction in el0_sync, because the process will be sent SIGBUS signal.

I don't understand. How will Linux know the process caused an error if we
neither take an SError nor read DISR_EL1 after an ESB?


> But for el0_irq, the irq process following the ESB may should be executed because the irq is not related to the user process.

If we exit EL0 for an IRQ, and take an SError due to IESB we know the fault was
due to EL0, not the exception level we entered to process the IRQ.


> If we set HCR_EL2.AMO when SCR_EL3.EA is set. We can still inject an vSEI into host OS in EL3.

(from EL3?) Only if the host OS is running at EL1 and isn't using EL2. Firmware
can't know if we're using EL2 so this can't work.


> Physical SError won't be taken to EL2 because SCR_EL3.EA is set. But this may be too racy is
> not consistent with ARM rules.

EL3 flipping bits in EL2 system registers behind EL2 software's back is going to
have strange side effects. e.g. some Hypervisor may not change HCR_EL2 when
switching VM-A to VM-B as the configuration is the same; now the virtual SError
will be delivered to the wrong VM. (KVM doesn't do this, but Xen might.)


>>>> You cant use SError to cover all the possible RAS exceptions. We already have
>>>> this problem using SEI if PSTATE.A was set and the exception was an imprecise
>>>> abort from EL2. We can't return to the interrupted context and we can't deliver
>>>> an SError to EL2 either.
>>>
>>> SEI came from EL2 and PSTATE.A is set. Is it the situation where VHE is enabled and CPU is running
>>> in kernel space. If SEI occurs in kernel space, can we just panic or shutdown.
>>
>> firmware-panic? We can do a little better than that:
>> If the IESB bit is set in the ESR we can behave as if this were an ESB and have
>> firmware write an appropriate ESR to DISR_EL1 if PSTATE.A is set and the
>> exception came from EL2.
>
> This method may solve the problem I said above.
> Is IESB a new bit added int ESR in the newest RAS spec?

The latest ARM-ARM (DDI0487B.a) describes it as part of the "RAS Extension in
ARMv8.2" in section 'A1.7.5 The Reliability, Availability, and Serviceability
(RAS) Extension'.

Jumping to 'D7.2.28 ESR_ELx, Exception Syndrome Register (ELx)', then page
D7-2284 for the 'ISS encoding for an SError interrupt', it indicates "the SError
interrupt was synchronized by the implicit [ESB] and taken immediately"


As above, with IESB for changes in exception level, and unmasking PSTATE.A for
any other use of ESB, Linux can make sure that firmware can always notify us of
an APEI event using SEI.

Another OS (or even UEFI) may not do the same, so firmware should be prepared
for ESB to be executed with PSTATE.A masked (i.e, not using IESB). If there is
no wider firmware-policy for this (reboot, notify a remote CPU), then returning
to the faulting location and trying to inject SError later is all we can do.


Thanks,

James

2017-04-28 03:00:28

by Xiongfeng Wang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Hi James,

Thanks for your explanation and suggests.

On 2017/4/25 1:14, James Morse wrote:
> Hi Wang Xiongfeng,
>
> On 21/04/17 12:33, Xiongfeng Wang wrote:
>> On 2017/4/20 16:52, James Morse wrote:
>>> On 19/04/17 03:37, Xiongfeng Wang wrote:
>>>> On 2017/4/18 18:51, James Morse wrote:
>>>>> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
>>>>> describe a way to inject these as they are generated by the CPU.
>>>>>
>>>>> Am I right in thinking you want this to use SError Interrupts as an APEI
>>>>> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)
>>>>
>>>> Yes, using sei as an APEI notification is one part of my consideration. Another use is for ESB.
>>>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
>>>> describes the SEI recovery process when ESB is implemented.
>>>>
>>>> In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs in EL0 and not been taken immediately,
>>>> and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. The ESB at SVC entry is
>>>> used for preventing the error propagating from user space to kernel space. The EL3 SEI handler collects
>>>
>>>> the errors and fills in the APEI table, and then jump to EL2 SEI handler. EL2 SEI handler inject
>>>> an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an SEI is pending.
>>>
>>> This step has confused me. How would this work with VHE where the host runs at
>>> EL2 and there is nothing at Host:EL1?
>>
>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
>> I don't know whether the step described in the section is designed for guest os or host os or both.
>> Yes, it doesn't work with VHE where the host runs at EL2.
>
> If it uses Virtual SError, it must be for an OS running at EL1, as the code
> running at EL2 (VHE:Linux, firmware or Xen) must set HCR_EL2.AMO to make this work.
>
> I can't find the section you describe in this RAS spec:
> https://developer.arm.com/docs/ddi0587/latest/armreliabilityavailabilityandserviceabilityrasspecificationarmv8forthearmv8aarchitectureprofile
>
I got the RAS spec from my company's ARM account. The document number is PRD03-PRDC-010953 .

>
>>> >From your description I assume you have some firmware resident at EL2.
>
>> Our actual SEI step is planned as follows:
>
> Ah, okay. You can't use Virtual SError at all then, as firmware doesn't own EL2:
> KVM does. KVM will save/restore the HCR_EL2 register as part of its world
> switch. Firmware must not modify the hyp registers behind the hyper-visors back.
>
>
>> Host OS: EL0/EL1 -> EL3 -> EL0/EL1
>
> i.e. While the host was running, so EL3 sees HCR_EL2.AMO clear, and directs the
> SEI to EL1.
>
>> Guest OS: EL0/EL1 -> EL3 -> EL2 -> EL0/EL1
>
> i.e. While the guest was running, so EL3 sees HCR_EL2.AMO set, directs the SEI
> to EL2 where KVM performs the world-switch and returns to host EL1.
>
> Looks sensible.
>
>
>> In guest os situation, we can inject an vSEI and return to where the SEI is taken from.
>
> An SEI from firmware should always end up in the host OS. If a guest was running
> KVM is responsible for doing the world switch to restore host EL1 and return
> there. This should all work today.
>
>
>> But in host os situation, we can't inject an vSEI (if we don't set HCR_EL2.AMO), so we have to jump to EL1 SEI vector.
>
> Because your firmware is at EL3 you have to check PSTATE.A and jump into the
> SError vector in both cases, you just choose if its VBAR_EL2 or VBAR_EL1 based
> on HCR_EL2.AMO.
>
>
>> Then the process following ESB won't be executed becuase SEI is taken to EL3 from the ESB instruction in EL1, and when control
>> is returned to OS, we are in EL1 SEI vector rather than the ESB instruction.
>
> Firmware should set ELR_EL1 so that an eret from the EL1 SError vector takes you
> back to the ESB instruction that took us to EL3 in the first place.
>
>
> There is a problem here mixing SError as the CPU->Software notification of RAS
> errors and as APEI's SEI Software->Software notification that a firmware-first
> error has been handled by EL3.
>
> To elaborate:
> The routine in this patch was something like:
> 1 ESB
> 2 Read DISR_EL1
> 3 If set, branch to the SError handler.
>
> If we have firmware-first ESB will generate an SError as PSTATE.A at EL{1,2}
> doesn't mask SError for EL3. Firmware can then handle the RAS error. With this
> the CPU->Software story is done. Firmware notifying the OS via APEI is a
> different problem.
>
> If the error doesn't need to be notified to the OS via SEI, firmware can return
> to step 1 above with DISR_EL1 clear. The OS may be told via another mechanism
> such as polling or an irq.
>
> If PSTATE.A was clear, firmware can return to the SError vector. If PSTATE.A was
> set and firmware knows this was due to an ESB, it can set DISR_EL1. The problem
> is firmware can't know if the SError was due to an ESB.

Yes, if implicit ESB is not implemented, we can't know if the sError was due to
an ESB unless we only set PSTATE.A when ESB is executed.
>
> The only time we are likely to have PSTATE.A masked is when changing exception
> level. For this we can rely on IESB, so step 1 isn't necessary in the routine
> above. Firmware knows if an SError taken to EL3 is due to an IESB, as the
> ESR_EL3.IESB bit will be set, in this case it can write to DISR_EL1 and return.
>
> In Linux we can use IESB for changes in exception level, and ESB with PSTATE.A
> clear for any other need. (e.g. __switch_to()). Firmware can notify us using SEI
> if we use either of these two patterns. This may not work for other OS (Xen?),
> but this is a problem with using SEI as an APEI notification method.
>
> (Some cleanup of our SError unmasking is needed, I have some patches I will post
> on a branch shortly)
>
>
>> It is ok to just ignore the process following the ESB instruction in el0_sync, because the process will be sent SIGBUS signal.
>
> I don't understand. How will Linux know the process caused an error if we
> neither take an SError nor read DISR_EL1 after an ESB?
>
I think there may be some misunderstanding here. The ESB instruction is placed in kernel_entry
of el0_sync and el0_irq. For the el0_sync, such as an syscall from userspace, after ESB is executed,
we check whether DISR.A is set. If it is not set, we go on to process the syscall. If it is set, we
jump to sError vector and then just eret.
But for el0_irq, after ESB is executed, we check whether DISR.A is set. If it is set, we jump to
(using BL) sError vector, process SEI, return back to irq handler and process irq, and then eret.

>
>> But for el0_irq, the irq process following the ESB may should be executed because the irq is not related to the user process.
>
> If we exit EL0 for an IRQ, and take an SError due to IESB we know the fault was
> due to EL0, not the exception level we entered to process the IRQ.
>
>
>> If we set HCR_EL2.AMO when SCR_EL3.EA is set. We can still inject an vSEI into host OS in EL3.
>
> (from EL3?) Only if the host OS is running at EL1 and isn't using EL2. Firmware
> can't know if we're using EL2 so this can't work.
>
>
>> Physical SError won't be taken to EL2 because SCR_EL3.EA is set. But this may be too racy is
>> not consistent with ARM rules.
>
> EL3 flipping bits in EL2 system registers behind EL2 software's back is going to
> have strange side effects. e.g. some Hypervisor may not change HCR_EL2 when
> switching VM-A to VM-B as the configuration is the same; now the virtual SError
> will be delivered to the wrong VM. (KVM doesn't do this, but Xen might.)
>
>
>>>>> You cant use SError to cover all the possible RAS exceptions. We already have
>>>>> this problem using SEI if PSTATE.A was set and the exception was an imprecise
>>>>> abort from EL2. We can't return to the interrupted context and we can't deliver
>>>>> an SError to EL2 either.
>>>>
>>>> SEI came from EL2 and PSTATE.A is set. Is it the situation where VHE is enabled and CPU is running
>>>> in kernel space. If SEI occurs in kernel space, can we just panic or shutdown.
>>>
>>> firmware-panic? We can do a little better than that:
>>> If the IESB bit is set in the ESR we can behave as if this were an ESB and have
>>> firmware write an appropriate ESR to DISR_EL1 if PSTATE.A is set and the
>>> exception came from EL2.
>>
>> This method may solve the problem I said above.
>> Is IESB a new bit added int ESR in the newest RAS spec?
>
> The latest ARM-ARM (DDI0487B.a) describes it as part of the "RAS Extension in
> ARMv8.2" in section 'A1.7.5 The Reliability, Availability, and Serviceability
> (RAS) Extension'.
>
> Jumping to 'D7.2.28 ESR_ELx, Exception Syndrome Register (ELx)', then page
> D7-2284 for the 'ISS encoding for an SError interrupt', it indicates "the SError
> interrupt was synchronized by the implicit [ESB] and taken immediately"
>
>
> As above, with IESB for changes in exception level, and unmasking PSTATE.A for
> any other use of ESB, Linux can make sure that firmware can always notify us of
> an APEI event using SEI.
>
> Another OS (or even UEFI) may not do the same, so firmware should be prepared
> for ESB to be executed with PSTATE.A masked (i.e, not using IESB). If there is
> no wider firmware-policy for this (reboot, notify a remote CPU), then returning
> to the faulting location and trying to inject SError later is all we can do.
>
>
> Thanks,
>
> James
>
> .
>
Thanks,
Wang Xiongfeng