2017-06-26 12:25:55

by Dongjiu Geng

[permalink] [raw]
Subject: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

when SError happen, kvm notifies user space to record the CPER,
user space specifies and passes the contents of ESR_EL1 on taking
a virtual SError interrupt to KVM, KVM enables virtual system
error or asynchronous abort with this specifies syndrome. This
patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
saves the virtual SError syndrome, it becomes the ESR_EL1 value when
HCR_EL2.VSE injects an SError. This register is added by the
RAS Extensions.

Changes since v3:
(1) Move restore VSESR_EL2 value logic to a helper method
(2) In the world-switch, not save VSESR_EL2, because no one cares the
old VSESR_EL2 value
(3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
a virtual system error

Signed-off-by: Dongjiu Geng <[email protected]>
Signed-off-by: Quanming Wu <[email protected]>
---
Documentation/virtual/kvm/api.txt | 10 ++++++++++
arch/arm/include/asm/kvm_host.h | 1 +
arch/arm/kvm/arm.c | 7 +++++++
arch/arm/kvm/guest.c | 5 +++++
arch/arm64/include/asm/esr.h | 2 ++
arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/asm/sysreg.h | 3 +++
arch/arm64/kvm/guest.c | 14 ++++++++++++++
arch/arm64/kvm/handle_exit.c | 25 +++++++++++++++++++------
arch/arm64/kvm/hyp/switch.c | 14 ++++++++++++++
include/uapi/linux/kvm.h | 8 ++++++++
12 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 3c248f7..852ac55 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3377,6 +3377,16 @@ struct kvm_ppc_resize_hpt {
__u32 pad;
};

+4.104 KVM_ARM_SEI
+
+Capability: KVM_EXIT_SERROR_INTR
+Architectures: arm/arm64
+Type: vcpu ioctl
+Parameters: u64 (syndrome)
+Returns: 0 in case of success
+
+Pend an virtual system error or asynchronous abort with user space specified.
+
5. The kvm_run structure
------------------------

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 31ee468..566292a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -244,6 +244,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);

int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
int exception_index);
+int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);

static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
unsigned long hyp_stack_ptr,
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 96dba7c..583294f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -987,6 +987,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
return -EFAULT;
return kvm_arm_vcpu_has_attr(vcpu, &attr);
}
+ case KVM_ARM_SEI: {
+ u64 syndrome;
+
+ if (copy_from_user(&syndrome, argp, sizeof(syndrome)))
+ return -EFAULT;
+ return kvm_vcpu_ioctl_sei(vcpu, &syndrome);
+ }
default:
return -EINVAL;
}
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index fa6182a..72505bf 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -248,6 +248,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return -EINVAL;
}

+int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
+{
+ return 0;
+}
+
int __attribute_const__ kvm_target_cpu(void)
{
switch (read_cpuid_part()) {
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 22f9c90..d009c99 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -127,6 +127,8 @@
#define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
#define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)

+#define VSESR_ELx_IDS_ISS_MASK ((1UL << 25) - 1)
+
/* ESR value templates for specific events */

/* BRK instruction trap from AArch64 state */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5f64ab2..93dc3d1 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
return vcpu->arch.fault.esr_el2;
}

+static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.fault.vsesr_el2;
+}
+
+static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ vcpu->arch.fault.vsesr_el2 = val;
+}
+
static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
{
u32 esr = kvm_vcpu_get_hsr(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e7705e7..b6242fb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -86,6 +86,7 @@ struct kvm_vcpu_fault_info {
u32 esr_el2; /* Hyp Syndrom Register */
u64 far_el2; /* Hyp Fault Address Register */
u64 hpfar_el2; /* Hyp IPA Fault Address Register */
+ u32 vsesr_el2; /* Virtual SError Exception Syndrome Register */
};

/*
@@ -385,6 +386,7 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
+int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);

static inline void __cpu_init_stage2(void)
{
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 32964c7..3af6907 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -125,6 +125,9 @@
#define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4)
#define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3)

+/* virtual SError exception syndrome register in armv8.2 */
+#define REG_VSESR_EL2 sys_reg(3, 4, 5, 2, 3)
+
#define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM | \
(!!x)<<8 | 0x1f)
#define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM | \
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index b37446a..21c20b0 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -277,6 +277,20 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
return -EINVAL;
}

+int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
+{
+ u64 reg = *syndrome;
+
+ if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && reg)
+ /* set vsesr_el2[24:0] with value that user space specified */
+ kvm_vcpu_set_vsesr(vcpu, reg & VSESR_ELx_IDS_ISS_MASK);
+
+ /* inject virtual system Error or asynchronous abort */
+ kvm_inject_vabt(vcpu);
+
+ return 0;
+}
+
int __attribute_const__ kvm_target_cpu(void)
{
unsigned long implementor = read_cpuid_implementor();
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index c89d83a..23c02c2 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -180,7 +180,11 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)

static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+ unsigned long hva, fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+ struct kvm_memory_slot *memslot;
+ int hsr, ret = 1;
+ bool writable;
+ gfn_t gfn;

if (handle_guest_sei((unsigned long)fault_ipa,
kvm_vcpu_get_hsr(vcpu))) {
@@ -190,9 +194,20 @@ static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
(unsigned long)kvm_vcpu_get_hsr(vcpu));

kvm_inject_vabt(vcpu);
+ } else {
+ hsr = kvm_vcpu_get_hsr(vcpu);
+
+ gfn = fault_ipa >> PAGE_SHIFT;
+ memslot = gfn_to_memslot(vcpu->kvm, gfn);
+ hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
+
+ run->exit_reason = KVM_EXIT_SERROR_INTR;
+ run->serror_intr.syndrome_info = hsr;
+ run->serror_intr.address = hva;
+ ret = 0;
}

- return 0;
+ return ret;
}

/*
@@ -218,8 +233,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
*vcpu_pc(vcpu) -= adj;
}

- kvm_handle_guest_sei(vcpu, run);
- return 1;
+ return kvm_handle_guest_sei(vcpu, run);
}

exception_index = ARM_EXCEPTION_CODE(exception_index);
@@ -228,8 +242,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
case ARM_EXCEPTION_IRQ:
return 1;
case ARM_EXCEPTION_EL1_SERROR:
- kvm_handle_guest_sei(vcpu, run);
- return 1;
+ return kvm_handle_guest_sei(vcpu, run);
case ARM_EXCEPTION_TRAP:
/*
* See ARM ARM B1.14.1: "Hyp traps on instructions
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index aede165..6d17c97 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -67,6 +67,13 @@ static hyp_alternate_select(__activate_traps_arch,
__activate_traps_nvhe, __activate_traps_vhe,
ARM64_HAS_VIRT_HOST_EXTN);

+static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu)
+{
+ if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
+ (vcpu->arch.hcr_el2 & HCR_VSE))
+ write_sysreg_s(vcpu->arch.fault.vsesr_el2, REG_VSESR_EL2);
+}
+
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
@@ -86,6 +93,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
isb();
}
write_sysreg(val, hcr_el2);
+
+ /*
+ * If the virtual SError interrupt is taken to EL1 using AArch64,
+ * then VSESR_EL2 provides the syndrome value reported in ESR_EL1.
+ */
+ __sysreg_set_vsesr(vcpu);
+
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
/*
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 27fe556..ab91fe3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -205,6 +205,7 @@ struct kvm_hyperv_exit {
#define KVM_EXIT_S390_STSI 25
#define KVM_EXIT_IOAPIC_EOI 26
#define KVM_EXIT_HYPERV 27
+#define KVM_EXIT_SERROR_INTR 28

/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -360,6 +361,11 @@ struct kvm_run {
struct {
__u8 vector;
} eoi;
+ /* KVM_EXIT_SERROR_INTR */
+ struct {
+ __u32 syndrome_info;
+ __u64 address;
+ } serror_intr;
/* KVM_EXIT_HYPERV */
struct kvm_hyperv_exit hyperv;
/* Fix the size of the union. */
@@ -1301,6 +1307,8 @@ struct kvm_s390_ucas_mapping {
#define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
/* Available with KVM_CAP_X86_SMM */
#define KVM_SMI _IO(KVMIO, 0xb7)
+/* Available with KVM_EXIT_SERROR_INTR */
+#define KVM_ARM_SEI _IO(KVMIO, 0xb8)

#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
--
2.10.1


2017-07-03 08:39:10

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

Hi Dongjiu,

On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
> when SError happen, kvm notifies user space to record the CPER,
> user space specifies and passes the contents of ESR_EL1 on taking
> a virtual SError interrupt to KVM, KVM enables virtual system
> error or asynchronous abort with this specifies syndrome. This
> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
> HCR_EL2.VSE injects an SError. This register is added by the
> RAS Extensions.

This commit message is confusing and doesn't help me understand the
patch.

I think this patch is trying to do too many things. I suggest you split
the patch into (at least) one patch that captures exception information
from the world-switch path, one patch that deals with the new exit
reason, and finally a patch with the new ioctl. That way you can write
a commit message for each patch describing first what the patch does,
and then why this is a good idea.

Neverthess, I added some random comments below.

>
> Changes since v3:
> (1) Move restore VSESR_EL2 value logic to a helper method
> (2) In the world-switch, not save VSESR_EL2, because no one cares the
> old VSESR_EL2 value
> (3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
> a virtual system error
>
> Signed-off-by: Dongjiu Geng <[email protected]>
> Signed-off-by: Quanming Wu <[email protected]>
> ---
> Documentation/virtual/kvm/api.txt | 10 ++++++++++
> arch/arm/include/asm/kvm_host.h | 1 +
> arch/arm/kvm/arm.c | 7 +++++++
> arch/arm/kvm/guest.c | 5 +++++
> arch/arm64/include/asm/esr.h | 2 ++
> arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
> arch/arm64/include/asm/kvm_host.h | 2 ++
> arch/arm64/include/asm/sysreg.h | 3 +++
> arch/arm64/kvm/guest.c | 14 ++++++++++++++
> arch/arm64/kvm/handle_exit.c | 25 +++++++++++++++++++------
> arch/arm64/kvm/hyp/switch.c | 14 ++++++++++++++
> include/uapi/linux/kvm.h | 8 ++++++++
> 12 files changed, 95 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 3c248f7..852ac55 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3377,6 +3377,16 @@ struct kvm_ppc_resize_hpt {
> __u32 pad;
> };
>
> +4.104 KVM_ARM_SEI
> +
> +Capability: KVM_EXIT_SERROR_INTR
> +Architectures: arm/arm64
> +Type: vcpu ioctl
> +Parameters: u64 (syndrome)
> +Returns: 0 in case of success
> +
> +Pend an virtual system error or asynchronous abort with user space specified.
> +

I don't understand enough about what this ioctl does by just reading
this text. Did you mean to say something like "Record that userspace
wishes to inject a pending virtual system error to the VCPU. Next time
the VCPU is run, th

> 5. The kvm_run structure
> ------------------------
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468..566292a 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -244,6 +244,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>
> int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> int exception_index);
> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
>
> static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
> unsigned long hyp_stack_ptr,
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 96dba7c..583294f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -987,6 +987,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> return -EFAULT;
> return kvm_arm_vcpu_has_attr(vcpu, &attr);
> }
> + case KVM_ARM_SEI: {
> + u64 syndrome;
> +
> + if (copy_from_user(&syndrome, argp, sizeof(syndrome)))
> + return -EFAULT;
> + return kvm_vcpu_ioctl_sei(vcpu, &syndrome);
> + }
> default:
> return -EINVAL;
> }
> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> index fa6182a..72505bf 100644
> --- a/arch/arm/kvm/guest.c
> +++ b/arch/arm/kvm/guest.c
> @@ -248,6 +248,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
> return -EINVAL;
> }
>
> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
> +{
> + return 0;
> +}
> +
> int __attribute_const__ kvm_target_cpu(void)
> {
> switch (read_cpuid_part()) {
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 22f9c90..d009c99 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -127,6 +127,8 @@
> #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
> #define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
>
> +#define VSESR_ELx_IDS_ISS_MASK ((1UL << 25) - 1)
> +
> /* ESR value templates for specific events */
>
> /* BRK instruction trap from AArch64 state */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5f64ab2..93dc3d1 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
> return vcpu->arch.fault.esr_el2;
> }
>
> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.fault.vsesr_el2;
> +}
> +
> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
> +{
> + vcpu->arch.fault.vsesr_el2 = val;
> +}
> +
> static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
> {
> u32 esr = kvm_vcpu_get_hsr(vcpu);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e7705e7..b6242fb 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -86,6 +86,7 @@ struct kvm_vcpu_fault_info {
> u32 esr_el2; /* Hyp Syndrom Register */
> u64 far_el2; /* Hyp Fault Address Register */
> u64 hpfar_el2; /* Hyp IPA Fault Address Register */
> + u32 vsesr_el2; /* Virtual SError Exception Syndrome Register */
> };
>
> /*
> @@ -385,6 +386,7 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
> struct kvm_device_attr *attr);
> int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> struct kvm_device_attr *attr);
> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
>
> static inline void __cpu_init_stage2(void)
> {
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 32964c7..3af6907 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -125,6 +125,9 @@
> #define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4)
> #define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3)
>
> +/* virtual SError exception syndrome register in armv8.2 */
> +#define REG_VSESR_EL2 sys_reg(3, 4, 5, 2, 3)
> +
> #define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM | \
> (!!x)<<8 | 0x1f)
> #define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM | \
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index b37446a..21c20b0 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -277,6 +277,20 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
> return -EINVAL;
> }
>
> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
> +{
> + u64 reg = *syndrome;
> +
> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && reg)
> + /* set vsesr_el2[24:0] with value that user space specified */
> + kvm_vcpu_set_vsesr(vcpu, reg & VSESR_ELx_IDS_ISS_MASK);

Are you really supposed to fail silently here if trying to do this on a
system that doesn't have RAS ?

Why can you not set reg to 0 here? That doesn't seem to be documented
anywhere, and shouldn't the function return -EINVAL in this case?

Also, if you do this, but don't run the VCPU, then migrate the VM, and
run the VCPU on the new destination, isn't this information lost?


> +
> + /* inject virtual system Error or asynchronous abort */
> + kvm_inject_vabt(vcpu);
> +
> + return 0;
> +}
> +
> int __attribute_const__ kvm_target_cpu(void)
> {
> unsigned long implementor = read_cpuid_implementor();
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index c89d83a..23c02c2 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -180,7 +180,11 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>
> static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> - unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> + unsigned long hva, fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> + struct kvm_memory_slot *memslot;
> + int hsr, ret = 1;
> + bool writable;
> + gfn_t gfn;
>
> if (handle_guest_sei((unsigned long)fault_ipa,
> kvm_vcpu_get_hsr(vcpu))) {
> @@ -190,9 +194,20 @@ static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
> (unsigned long)kvm_vcpu_get_hsr(vcpu));
>
> kvm_inject_vabt(vcpu);
> + } else {
> + hsr = kvm_vcpu_get_hsr(vcpu);
> +
> + gfn = fault_ipa >> PAGE_SHIFT;
> + memslot = gfn_to_memslot(vcpu->kvm, gfn);
> + hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
> +
> + run->exit_reason = KVM_EXIT_SERROR_INTR;
> + run->serror_intr.syndrome_info = hsr;
> + run->serror_intr.address = hva;
> + ret = 0;
> }
>
> - return 0;
> + return ret;
> }
>
> /*
> @@ -218,8 +233,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> *vcpu_pc(vcpu) -= adj;
> }
>
> - kvm_handle_guest_sei(vcpu, run);
> - return 1;
> + return kvm_handle_guest_sei(vcpu, run);
> }
>
> exception_index = ARM_EXCEPTION_CODE(exception_index);
> @@ -228,8 +242,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> case ARM_EXCEPTION_IRQ:
> return 1;
> case ARM_EXCEPTION_EL1_SERROR:
> - kvm_handle_guest_sei(vcpu, run);
> - return 1;
> + return kvm_handle_guest_sei(vcpu, run);
> case ARM_EXCEPTION_TRAP:
> /*
> * See ARM ARM B1.14.1: "Hyp traps on instructions
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index aede165..6d17c97 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -67,6 +67,13 @@ static hyp_alternate_select(__activate_traps_arch,
> __activate_traps_nvhe, __activate_traps_vhe,
> ARM64_HAS_VIRT_HOST_EXTN);
>
> +static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu)
> +{
> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
> + (vcpu->arch.hcr_el2 & HCR_VSE))
> + write_sysreg_s(vcpu->arch.fault.vsesr_el2, REG_VSESR_EL2);
> +}
> +
> static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> {
> u64 val;
> @@ -86,6 +93,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> isb();
> }
> write_sysreg(val, hcr_el2);
> +
> + /*
> + * If the virtual SError interrupt is taken to EL1 using AArch64,
> + * then VSESR_EL2 provides the syndrome value reported in ESR_EL1.
> + */
> + __sysreg_set_vsesr(vcpu);
> +
> /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
> write_sysreg(1 << 15, hstr_el2);
> /*
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 27fe556..ab91fe3 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -205,6 +205,7 @@ struct kvm_hyperv_exit {
> #define KVM_EXIT_S390_STSI 25
> #define KVM_EXIT_IOAPIC_EOI 26
> #define KVM_EXIT_HYPERV 27
> +#define KVM_EXIT_SERROR_INTR 28
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -360,6 +361,11 @@ struct kvm_run {
> struct {
> __u8 vector;
> } eoi;
> + /* KVM_EXIT_SERROR_INTR */
> + struct {
> + __u32 syndrome_info;
> + __u64 address;
> + } serror_intr;
> /* KVM_EXIT_HYPERV */
> struct kvm_hyperv_exit hyperv;
> /* Fix the size of the union. */
> @@ -1301,6 +1307,8 @@ struct kvm_s390_ucas_mapping {
> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
> /* Available with KVM_CAP_X86_SMM */
> #define KVM_SMI _IO(KVMIO, 0xb7)
> +/* Available with KVM_EXIT_SERROR_INTR */
> +#define KVM_ARM_SEI _IO(KVMIO, 0xb8)
>
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> --
> 2.10.1
>

Thanks,
-Christoffer

2017-07-04 04:47:30

by Dongjiu Geng

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

Hi Christoffer,
thanks for the review.


On 2017/7/3 16:39, Christoffer Dall wrote:
> Hi Dongjiu,
>
> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>> when SError happen, kvm notifies user space to record the CPER,
>> user space specifies and passes the contents of ESR_EL1 on taking
>> a virtual SError interrupt to KVM, KVM enables virtual system
>> error or asynchronous abort with this specifies syndrome. This
>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>> HCR_EL2.VSE injects an SError. This register is added by the
>> RAS Extensions.
>
> This commit message is confusing and doesn't help me understand the
> patch.
(1) what is the rationale for the guest OS SError interrupt(SEI) handling in the RAS solution?
you can refer to document: "RAS_Extension_PRD03-PRDC-010953-32-0, 6.5.3 Example software sequences"
a). In the firmware-first RAS solution, when guest OS happen a SError interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
b). The firmware logs, triages, and delegates the error exception to the hypervisor. As the error came from guest OS EL1, firmware
does by faking an SError interrupt exception entry to EL2.
c). Control transfers to the hypervisor's delegated error recovery agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
Virtual SError interrupt to delegate an asynchronous abort to EL1, by setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.

(2) what is this patch mainly do?
As mentioned above, the hypervisor needs to enable virtual SError and pass the virtual syndrome to the guest OS.

a). when Control transfers to the hypervisor from firmware by faking an SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
host VA address( Qemu translate this VA address to the virtual machine physical address(IPA)) using below new added "serror_intr" struct.
/* KVM_EXIT_SERROR_INTR */
struct {
__u32 syndrome_info;
__u64 address;
} serror_intr;

b). Qemu gets the address(host VA) delivered by KVM, translate this host VA address to virtual machine physical address(IPA), and runtime record this virtual
machine physical address(IPA) to the guest OS's APEI table.

c). Qemu gets the syndrome_info delivered by KVM, it refers to this syndrome value(but can be different from it) to specify the virtual SError interrupt's syndrome through setting VESR_EL2.

the vsesr_el2 is armv8.2 register, its explanation can be found in "RAS_Extension_PRD03-PRDC-010953-33-0, 5.6.18 VSESR_EL2, Virtual SError Exception Syndrome Register"

>>The VSESR_EL2 characteristics are:
>>Purpose:
>>Provides the syndrome value reported to software on taking a virtual SError interrupt exception:
>> ? If the virtual SError interrupt is taken to EL1 using AArch64 then VSESR_EL2 provides the
>>syndrome value reported in ESR_EL1.
>> ? If the virtual SError interrupt is taken to EL1 using AArch32 then VSESR_EL2 provides the
>> syndrome values reported in DFSR.{AET, ExT} and the remainder of the DFSR is set as
>> defined by VMSAv8-32.

so in the KVM, I added a new IOCTL(#define KVM_ARM_SEI _IO(KVMIO, 0xb8)) to pass the virtual SError syndrome value specified by Qemu and enable a virtual System Error.


d). when world switch to guest OS, guest OS will happen virtual SError(this virtual SError can not be route to EL3 firmware), guest OS uses the specified syndrome value to do the recovery and
parses the guest OS CPER which is dynamically recorded by the Qemu in the APEI table .



>
> I think this patch is trying to do too many things. I suggest you split
> the patch into (at least) one patch that captures exception information
> from the world-switch path, one patch that deals with the new exit
> reason, and finally a patch with the new ioctl. That way you can write
> a commit message for each patch describing first what the patch does,
> and then why this is a good idea.
Ok, thanks for the good suggestion.

>
> Neverthess, I added some random comments below.
>
>>
>> Changes since v3:
>> (1) Move restore VSESR_EL2 value logic to a helper method
>> (2) In the world-switch, not save VSESR_EL2, because no one cares the
>> old VSESR_EL2 value
>> (3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
>> a virtual system error
>>
>> Signed-off-by: Dongjiu Geng <[email protected]>
>> Signed-off-by: Quanming Wu <[email protected]>
>> ---
>> Documentation/virtual/kvm/api.txt | 10 ++++++++++
>> arch/arm/include/asm/kvm_host.h | 1 +
>> arch/arm/kvm/arm.c | 7 +++++++
>> arch/arm/kvm/guest.c | 5 +++++
>> arch/arm64/include/asm/esr.h | 2 ++
>> arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
>> arch/arm64/include/asm/kvm_host.h | 2 ++
>> arch/arm64/include/asm/sysreg.h | 3 +++
>> arch/arm64/kvm/guest.c | 14 ++++++++++++++
>> arch/arm64/kvm/handle_exit.c | 25 +++++++++++++++++++------
>> arch/arm64/kvm/hyp/switch.c | 14 ++++++++++++++
>> include/uapi/linux/kvm.h | 8 ++++++++
>> 12 files changed, 95 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 3c248f7..852ac55 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3377,6 +3377,16 @@ struct kvm_ppc_resize_hpt {
>> __u32 pad;
>> };
>>
>> +4.104 KVM_ARM_SEI
>> +
>> +Capability: KVM_EXIT_SERROR_INTR
>> +Architectures: arm/arm64
>> +Type: vcpu ioctl
>> +Parameters: u64 (syndrome)
>> +Returns: 0 in case of success
>> +
>> +Pend an virtual system error or asynchronous abort with user space specified.
>> +
>
> I don't understand enough about what this ioctl does by just reading
> this text. Did you mean to say something like "Record that userspace
> wishes to inject a pending virtual system error to the VCPU. Next time
> the VCPU is run, th
Christoffer, sorry to confuse you.
This IOCTL mainly do two things:

(1) set the VCPU struct's vsesr_el2 to pass the virtual SError's syndrome value that Qemu specified.
(2) enable virtual SError

+int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
+{
+ u64 reg = *syndrome;
+
+ if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && reg)
+ /* set vsesr_el2[24:0] with value that user space specified */
+ kvm_vcpu_set_vsesr(vcpu, reg & VSESR_ELx_IDS_ISS_MASK);

+
+ /* inject virtual system Error or asynchronous abort */
+ kvm_inject_vabt(vcpu);
+
+ return 0;
+}

>
>> 5. The kvm_run structure
>> ------------------------
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 31ee468..566292a 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -244,6 +244,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
>>
>> int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> int exception_index);
>> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
>>
>> static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>> unsigned long hyp_stack_ptr,
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 96dba7c..583294f 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -987,6 +987,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>> return -EFAULT;
>> return kvm_arm_vcpu_has_attr(vcpu, &attr);
>> }
>> + case KVM_ARM_SEI: {
>> + u64 syndrome;
>> +
>> + if (copy_from_user(&syndrome, argp, sizeof(syndrome)))
>> + return -EFAULT;
>> + return kvm_vcpu_ioctl_sei(vcpu, &syndrome);
>> + }
>> default:
>> return -EINVAL;
>> }
>> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
>> index fa6182a..72505bf 100644
>> --- a/arch/arm/kvm/guest.c
>> +++ b/arch/arm/kvm/guest.c
>> @@ -248,6 +248,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>> return -EINVAL;
>> }
>>
>> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
>> +{
>> + return 0;
>> +}
>> +
>> int __attribute_const__ kvm_target_cpu(void)
>> {
>> switch (read_cpuid_part()) {
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index 22f9c90..d009c99 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -127,6 +127,8 @@
>> #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
>> #define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
>>
>> +#define VSESR_ELx_IDS_ISS_MASK ((1UL << 25) - 1)
>> +
>> /* ESR value templates for specific events */
>>
>> /* BRK instruction trap from AArch64 state */
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 5f64ab2..93dc3d1 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
>> return vcpu->arch.fault.esr_el2;
>> }
>>
>> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
>> +{
>> + return vcpu->arch.fault.vsesr_el2;
>> +}
>> +
>> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
>> +{
>> + vcpu->arch.fault.vsesr_el2 = val;
>> +}
>> +
>> static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
>> {
>> u32 esr = kvm_vcpu_get_hsr(vcpu);
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index e7705e7..b6242fb 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -86,6 +86,7 @@ struct kvm_vcpu_fault_info {
>> u32 esr_el2; /* Hyp Syndrom Register */
>> u64 far_el2; /* Hyp Fault Address Register */
>> u64 hpfar_el2; /* Hyp IPA Fault Address Register */
>> + u32 vsesr_el2; /* Virtual SError Exception Syndrome Register */
>> };
>>
>> /*
>> @@ -385,6 +386,7 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>> struct kvm_device_attr *attr);
>> int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>> struct kvm_device_attr *attr);
>> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
>>
>> static inline void __cpu_init_stage2(void)
>> {
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index 32964c7..3af6907 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -125,6 +125,9 @@
>> #define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4)
>> #define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3)
>>
>> +/* virtual SError exception syndrome register in armv8.2 */
>> +#define REG_VSESR_EL2 sys_reg(3, 4, 5, 2, 3)
>> +
>> #define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM | \
>> (!!x)<<8 | 0x1f)
>> #define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM | \
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index b37446a..21c20b0 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -277,6 +277,20 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>> return -EINVAL;
>> }
>>
>> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
>> +{
>> + u64 reg = *syndrome;
>> +
>> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && reg)
>> + /* set vsesr_el2[24:0] with value that user space specified */
>> + kvm_vcpu_set_vsesr(vcpu, reg & VSESR_ELx_IDS_ISS_MASK);
>
> Are you really supposed to fail silently here if trying to do this on a
> system that doesn't have RAS ?
This register vsesr_el2 is only introduced by armv8.2.
so if system does not have RAS, the setting will be failed because this register does not exist. I need to "return -EINVAL"
for this case. thank you for pointing that.


>
> Why can you not set reg to 0 here? That doesn't seem to be documented
> anywhere, and shouldn't the function return -EINVAL in this case?
Because the default value is 0. if want to set to 0, it can not call this API.
zero is meaningless syndrome value. no one care the zero value.


>
> Also, if you do this, but don't run the VCPU, then migrate the VM, and
> run the VCPU on the new destination, isn't this information lost?
>
>
>> +
>> + /* inject virtual system Error or asynchronous abort */
>> + kvm_inject_vabt(vcpu);
>> +
>> + return 0;
>> +}
>> +
>> int __attribute_const__ kvm_target_cpu(void)
>> {
>> unsigned long implementor = read_cpuid_implementor();
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index c89d83a..23c02c2 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -180,7 +180,11 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>>
>> static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> {
>> - unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>> + unsigned long hva, fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>> + struct kvm_memory_slot *memslot;
>> + int hsr, ret = 1;
>> + bool writable;
>> + gfn_t gfn;
>>
>> if (handle_guest_sei((unsigned long)fault_ipa,
>> kvm_vcpu_get_hsr(vcpu))) {
>> @@ -190,9 +194,20 @@ static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> (unsigned long)kvm_vcpu_get_hsr(vcpu));
>>
>> kvm_inject_vabt(vcpu);
>> + } else {
>> + hsr = kvm_vcpu_get_hsr(vcpu);
>> +
>> + gfn = fault_ipa >> PAGE_SHIFT;
>> + memslot = gfn_to_memslot(vcpu->kvm, gfn);
>> + hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>> +
>> + run->exit_reason = KVM_EXIT_SERROR_INTR;
>> + run->serror_intr.syndrome_info = hsr;
>> + run->serror_intr.address = hva;
>> + ret = 0;
>> }
>>
>> - return 0;
>> + return ret;
>> }
>>
>> /*
>> @@ -218,8 +233,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> *vcpu_pc(vcpu) -= adj;
>> }
>>
>> - kvm_handle_guest_sei(vcpu, run);
>> - return 1;
>> + return kvm_handle_guest_sei(vcpu, run);
>> }
>>
>> exception_index = ARM_EXCEPTION_CODE(exception_index);
>> @@ -228,8 +242,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>> case ARM_EXCEPTION_IRQ:
>> return 1;
>> case ARM_EXCEPTION_EL1_SERROR:
>> - kvm_handle_guest_sei(vcpu, run);
>> - return 1;
>> + return kvm_handle_guest_sei(vcpu, run);
>> case ARM_EXCEPTION_TRAP:
>> /*
>> * See ARM ARM B1.14.1: "Hyp traps on instructions
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index aede165..6d17c97 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -67,6 +67,13 @@ static hyp_alternate_select(__activate_traps_arch,
>> __activate_traps_nvhe, __activate_traps_vhe,
>> ARM64_HAS_VIRT_HOST_EXTN);
>>
>> +static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu)
>> +{
>> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
>> + (vcpu->arch.hcr_el2 & HCR_VSE))
>> + write_sysreg_s(vcpu->arch.fault.vsesr_el2, REG_VSESR_EL2);
>> +}
>> +
>> static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>> {
>> u64 val;
>> @@ -86,6 +93,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>> isb();
>> }
>> write_sysreg(val, hcr_el2);
>> +
>> + /*
>> + * If the virtual SError interrupt is taken to EL1 using AArch64,
>> + * then VSESR_EL2 provides the syndrome value reported in ESR_EL1.
>> + */
>> + __sysreg_set_vsesr(vcpu);
>> +
>> /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
>> write_sysreg(1 << 15, hstr_el2);
>> /*
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 27fe556..ab91fe3 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -205,6 +205,7 @@ struct kvm_hyperv_exit {
>> #define KVM_EXIT_S390_STSI 25
>> #define KVM_EXIT_IOAPIC_EOI 26
>> #define KVM_EXIT_HYPERV 27
>> +#define KVM_EXIT_SERROR_INTR 28
>>
>> /* For KVM_EXIT_INTERNAL_ERROR */
>> /* Emulate instruction failed. */
>> @@ -360,6 +361,11 @@ struct kvm_run {
>> struct {
>> __u8 vector;
>> } eoi;
>> + /* KVM_EXIT_SERROR_INTR */
>> + struct {
>> + __u32 syndrome_info;
>> + __u64 address;
>> + } serror_intr;
>> /* KVM_EXIT_HYPERV */
>> struct kvm_hyperv_exit hyperv;
>> /* Fix the size of the union. */
>> @@ -1301,6 +1307,8 @@ struct kvm_s390_ucas_mapping {
>> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>> /* Available with KVM_CAP_X86_SMM */
>> #define KVM_SMI _IO(KVMIO, 0xb7)
>> +/* Available with KVM_EXIT_SERROR_INTR */
>> +#define KVM_ARM_SEI _IO(KVMIO, 0xb8)
>>
>> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>> --
>> 2.10.1
>>
>
> Thanks,
> -Christoffer
>
> .
>

2017-07-04 08:50:00

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

Hi Dongjiu,

On Tue, Jul 04, 2017 at 12:46:23PM +0800, gengdongjiu wrote:
> Hi Christoffer,
> thanks for the review.
>
>
> On 2017/7/3 16:39, Christoffer Dall wrote:
> > Hi Dongjiu,
> >
> > On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
> >> when SError happen, kvm notifies user space to record the CPER,
> >> user space specifies and passes the contents of ESR_EL1 on taking
> >> a virtual SError interrupt to KVM, KVM enables virtual system
> >> error or asynchronous abort with this specifies syndrome. This
> >> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
> >> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
> >> HCR_EL2.VSE injects an SError. This register is added by the
> >> RAS Extensions.
> >
> > This commit message is confusing and doesn't help me understand the
> > patch.
> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in the RAS solution?
> you can refer to document: "RAS_Extension_PRD03-PRDC-010953-32-0, 6.5.3 Example software sequences"
> a). In the firmware-first RAS solution, when guest OS happen a SError interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
> b). The firmware logs, triages, and delegates the error exception to the hypervisor. As the error came from guest OS EL1, firmware
> does by faking an SError interrupt exception entry to EL2.
> c). Control transfers to the hypervisor's delegated error recovery agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
> Virtual SError interrupt to delegate an asynchronous abort to EL1, by setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.
>
> (2) what is this patch mainly do?
> As mentioned above, the hypervisor needs to enable virtual SError and pass the virtual syndrome to the guest OS.
>
> a). when Control transfers to the hypervisor from firmware by faking an SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
> host VA address( Qemu translate this VA address to the virtual machine physical address(IPA)) using below new added "serror_intr" struct.
> /* KVM_EXIT_SERROR_INTR */
> struct {
> __u32 syndrome_info;
> __u64 address;
> } serror_intr;
>
> b). Qemu gets the address(host VA) delivered by KVM, translate this host VA address to virtual machine physical address(IPA), and runtime record this virtual
> machine physical address(IPA) to the guest OS's APEI table.
>
> c). Qemu gets the syndrome_info delivered by KVM, it refers to this syndrome value(but can be different from it) to specify the virtual SError interrupt's syndrome through setting VESR_EL2.
>
> the vsesr_el2 is armv8.2 register, its explanation can be found in "RAS_Extension_PRD03-PRDC-010953-33-0, 5.6.18 VSESR_EL2, Virtual SError Exception Syndrome Register"
>
> >>The VSESR_EL2 characteristics are:
> >>Purpose:
> >>Provides the syndrome value reported to software on taking a virtual SError interrupt exception:
> >> — If the virtual SError interrupt is taken to EL1 using AArch64 then VSESR_EL2 provides the
> >>syndrome value reported in ESR_EL1.
> >> — If the virtual SError interrupt is taken to EL1 using AArch32 then VSESR_EL2 provides the
> >> syndrome values reported in DFSR.{AET, ExT} and the remainder of the DFSR is set as
> >> defined by VMSAv8-32.
>
> so in the KVM, I added a new IOCTL(#define KVM_ARM_SEI _IO(KVMIO, 0xb8)) to pass the virtual SError syndrome value specified by Qemu and enable a virtual System Error.
>
>
> d). when world switch to guest OS, guest OS will happen virtual SError(this virtual SError can not be route to EL3 firmware), guest OS uses the specified syndrome value to do the recovery and
> parses the guest OS CPER which is dynamically recorded by the Qemu in the APEI table .
>
>

Please format the text nicely, and try to simplify the message when
resubmitting these patches. I hope it will be easier for you to write
a meaningful commit message if you split these patches into multiple
ones.

One specific thing currently lacking from the commit message is a
discussion of why this API is needed in the first place - why can't we
reuse KVM_SET_ONE_REG, for example.

>
> >
> > I think this patch is trying to do too many things. I suggest you split
> > the patch into (at least) one patch that captures exception information
> > from the world-switch path, one patch that deals with the new exit
> > reason, and finally a patch with the new ioctl. That way you can write
> > a commit message for each patch describing first what the patch does,
> > and then why this is a good idea.
> Ok, thanks for the good suggestion.
>
> >
> > Neverthess, I added some random comments below.
> >
> >>
> >> Changes since v3:
> >> (1) Move restore VSESR_EL2 value logic to a helper method
> >> (2) In the world-switch, not save VSESR_EL2, because no one cares the
> >> old VSESR_EL2 value
> >> (3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
> >> a virtual system error
> >>
> >> Signed-off-by: Dongjiu Geng <[email protected]>
> >> Signed-off-by: Quanming Wu <[email protected]>
> >> ---
> >> Documentation/virtual/kvm/api.txt | 10 ++++++++++
> >> arch/arm/include/asm/kvm_host.h | 1 +
> >> arch/arm/kvm/arm.c | 7 +++++++
> >> arch/arm/kvm/guest.c | 5 +++++
> >> arch/arm64/include/asm/esr.h | 2 ++
> >> arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
> >> arch/arm64/include/asm/kvm_host.h | 2 ++
> >> arch/arm64/include/asm/sysreg.h | 3 +++
> >> arch/arm64/kvm/guest.c | 14 ++++++++++++++
> >> arch/arm64/kvm/handle_exit.c | 25 +++++++++++++++++++------
> >> arch/arm64/kvm/hyp/switch.c | 14 ++++++++++++++
> >> include/uapi/linux/kvm.h | 8 ++++++++
> >> 12 files changed, 95 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 3c248f7..852ac55 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -3377,6 +3377,16 @@ struct kvm_ppc_resize_hpt {
> >> __u32 pad;
> >> };
> >>
> >> +4.104 KVM_ARM_SEI
> >> +
> >> +Capability: KVM_EXIT_SERROR_INTR
> >> +Architectures: arm/arm64
> >> +Type: vcpu ioctl
> >> +Parameters: u64 (syndrome)
> >> +Returns: 0 in case of success
> >> +
> >> +Pend an virtual system error or asynchronous abort with user space specified.
> >> +
> >
> > I don't understand enough about what this ioctl does by just reading
> > this text. Did you mean to say something like "Record that userspace
> > wishes to inject a pending virtual system error to the VCPU. Next time
> > the VCPU is run, th
> Christoffer, sorry to confuse you.
> This IOCTL mainly do two things:
>
> (1) set the VCPU struct's vsesr_el2 to pass the virtual SError's syndrome value that Qemu specified.
> (2) enable virtual SError
>
> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
> +{
> + u64 reg = *syndrome;
> +
> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && reg)
> + /* set vsesr_el2[24:0] with value that user space specified */
> + kvm_vcpu_set_vsesr(vcpu, reg & VSESR_ELx_IDS_ISS_MASK);
>
> +
> + /* inject virtual system Error or asynchronous abort */
> + kvm_inject_vabt(vcpu);
> +
> + return 0;
> +}
>

Dongjiu, I can read the code, of course. My comment was to tell you
that the text you add to the api.txt file should be meaningful and
explanatory on its own; that's the whole point of having such a file.

> >
> >> 5. The kvm_run structure
> >> ------------------------
> >>
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 31ee468..566292a 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -244,6 +244,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
> >>
> >> int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
> >> int exception_index);
> >> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
> >>
> >> static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
> >> unsigned long hyp_stack_ptr,
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index 96dba7c..583294f 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -987,6 +987,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> >> return -EFAULT;
> >> return kvm_arm_vcpu_has_attr(vcpu, &attr);
> >> }
> >> + case KVM_ARM_SEI: {
> >> + u64 syndrome;
> >> +
> >> + if (copy_from_user(&syndrome, argp, sizeof(syndrome)))
> >> + return -EFAULT;
> >> + return kvm_vcpu_ioctl_sei(vcpu, &syndrome);
> >> + }
> >> default:
> >> return -EINVAL;
> >> }
> >> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> >> index fa6182a..72505bf 100644
> >> --- a/arch/arm/kvm/guest.c
> >> +++ b/arch/arm/kvm/guest.c
> >> @@ -248,6 +248,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
> >> return -EINVAL;
> >> }
> >>
> >> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> int __attribute_const__ kvm_target_cpu(void)
> >> {
> >> switch (read_cpuid_part()) {
> >> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> >> index 22f9c90..d009c99 100644
> >> --- a/arch/arm64/include/asm/esr.h
> >> +++ b/arch/arm64/include/asm/esr.h
> >> @@ -127,6 +127,8 @@
> >> #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
> >> #define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
> >>
> >> +#define VSESR_ELx_IDS_ISS_MASK ((1UL << 25) - 1)
> >> +
> >> /* ESR value templates for specific events */
> >>
> >> /* BRK instruction trap from AArch64 state */
> >> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> >> index 5f64ab2..93dc3d1 100644
> >> --- a/arch/arm64/include/asm/kvm_emulate.h
> >> +++ b/arch/arm64/include/asm/kvm_emulate.h
> >> @@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
> >> return vcpu->arch.fault.esr_el2;
> >> }
> >>
> >> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
> >> +{
> >> + return vcpu->arch.fault.vsesr_el2;
> >> +}
> >> +
> >> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
> >> +{
> >> + vcpu->arch.fault.vsesr_el2 = val;
> >> +}
> >> +
> >> static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
> >> {
> >> u32 esr = kvm_vcpu_get_hsr(vcpu);
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index e7705e7..b6242fb 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -86,6 +86,7 @@ struct kvm_vcpu_fault_info {
> >> u32 esr_el2; /* Hyp Syndrom Register */
> >> u64 far_el2; /* Hyp Fault Address Register */
> >> u64 hpfar_el2; /* Hyp IPA Fault Address Register */
> >> + u32 vsesr_el2; /* Virtual SError Exception Syndrome Register */
> >> };
> >>
> >> /*
> >> @@ -385,6 +386,7 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
> >> struct kvm_device_attr *attr);
> >> int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> >> struct kvm_device_attr *attr);
> >> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
> >>
> >> static inline void __cpu_init_stage2(void)
> >> {
> >> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> >> index 32964c7..3af6907 100644
> >> --- a/arch/arm64/include/asm/sysreg.h
> >> +++ b/arch/arm64/include/asm/sysreg.h
> >> @@ -125,6 +125,9 @@
> >> #define REG_PSTATE_PAN_IMM sys_reg(0, 0, 4, 0, 4)
> >> #define REG_PSTATE_UAO_IMM sys_reg(0, 0, 4, 0, 3)
> >>
> >> +/* virtual SError exception syndrome register in armv8.2 */
> >> +#define REG_VSESR_EL2 sys_reg(3, 4, 5, 2, 3)
> >> +
> >> #define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM | \
> >> (!!x)<<8 | 0x1f)
> >> #define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM | \
> >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> >> index b37446a..21c20b0 100644
> >> --- a/arch/arm64/kvm/guest.c
> >> +++ b/arch/arm64/kvm/guest.c
> >> @@ -277,6 +277,20 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
> >> return -EINVAL;
> >> }
> >>
> >> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
> >> +{
> >> + u64 reg = *syndrome;
> >> +
> >> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && reg)
> >> + /* set vsesr_el2[24:0] with value that user space specified */
> >> + kvm_vcpu_set_vsesr(vcpu, reg & VSESR_ELx_IDS_ISS_MASK);
> >
> > Are you really supposed to fail silently here if trying to do this on a
> > system that doesn't have RAS ?
> This register vsesr_el2 is only introduced by armv8.2.
> so if system does not have RAS, the setting will be failed because this register does not exist. I need to "return -EINVAL"
> for this case. thank you for pointing that.
>
>
> >
> > Why can you not set reg to 0 here? That doesn't seem to be documented
> > anywhere, and shouldn't the function return -EINVAL in this case?
> Because the default value is 0. if want to set to 0, it can not call this API.
> zero is meaningless syndrome value. no one care the zero value.
>

What if userspace wants to drive a CPU reset of some form. Is there any
mechanism for resetting the register then?

>
> >
> > Also, if you do this, but don't run the VCPU, then migrate the VM, and
> > run the VCPU on the new destination, isn't this information lost?
> >
> >

Please consider the migration point for your next version as well.

Thanks,
-Christoffer

2017-07-04 10:15:40

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

Hi gengdongjiu,

Can you give us a specific example of an error you are trying to handle?
How would a non-KVM user space process handle the error?

KVM-users should be regular user space processes, we should not have a KVM-way
and everyone-else-way of handling errors.


On 04/07/17 05:46, gengdongjiu wrote:
> On 2017/7/3 16:39, Christoffer Dall wrote:
>> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>>> when SError happen, kvm notifies user space to record the CPER,
>>> user space specifies and passes the contents of ESR_EL1 on taking
>>> a virtual SError interrupt to KVM, KVM enables virtual system
>>> error or asynchronous abort with this specifies syndrome. This
>>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>>> HCR_EL2.VSE injects an SError. This register is added by the
>>> RAS Extensions.
>>
>> This commit message is confusing and doesn't help me understand the
>> patch.
> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in the RAS solution?

> a). In the firmware-first RAS solution, when guest OS happen a SError interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
> b). The firmware logs, triages, and delegates the error exception to the hypervisor. As the error came from guest OS EL1, firmware
> does by faking an SError interrupt exception entry to EL2.
> c). Control transfers to the hypervisor's delegated error recovery agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
> Virtual SError interrupt to delegate an asynchronous abort to EL1, by setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.

So (a): a physical-CPU hardware error occurs, and then (c) we tell Qemu/kvmtool
via a KVM-specific API.

Don't do this, it doesn't work for non-KVM users. You are exposing host-specific
implementation details to user space. What if I discover the same error via a
Polling GHES, or one of the IRQ flavours?

User space should not have to know, or care, how linux is notified about APEI
RAS errors.


> (2) what is this patch mainly do?
> As mentioned above, the hypervisor needs to enable virtual SError and pass the virtual syndrome to the guest OS.
>
> a). when Control transfers to the hypervisor from firmware by faking an SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
> host VA address( Qemu translate this VA address to the virtual machine physical address(IPA)) using below new added "serror_intr" struct.
> /* KVM_EXIT_SERROR_INTR */
> struct {
> __u32 syndrome_info;
> __u64 address;
> } serror_intr;

This is for a guest exit to host user-space. Here you are telling Qemu that a
physical CPU hardware error occurred. Qemu/kvmtool should not be expected to
parse the ESR, this is the job of the operating system.

When you're using ACPI firmware-first, SError/SEI is just a notification, the
important data is in the CPER records, which Qemu can't access, (and should be
processed by Linux APEI code).


It looks like you've calculated an address from FAR_EL2/HPFAR_EL2. For an
SError, these are meaningless.

(These registers hold real values for Synchronous External Abort, but for
firmware-first we should prefer the CPER records.)


> b). Qemu gets the address(host VA) delivered by KVM, translate this host VA address to virtual machine physical address(IPA), and runtime record this virtual
> machine physical address(IPA) to the guest OS's APEI table.

I agree with this step, but you're acting on the wrong data. (You're converting
fault_ipa -> virtual address -> fault_ipa, something isn't right ...)

Qemu should react to a signal like BUS_MCEERR_A{R,O} from memory_failure(). This
mechanism serves all user space processes, not just kvm users. This is where the
user-space virtual address should come from. Qemu/kvmtool have to generate the
guest IPA once they discover the affected memory was presented to the guest
through KVM.


Your KVM-specific mechanism exposes too much raw information (raw ESR values to
user space), and only serves applications using KVM.

If there is another type of CPER record where we should notify userspace, please
do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
drivers/firmware/efi/cper.c. These should consider all user-space applications,
not just users of KVM, and not just on arm64.


> c). Qemu gets the syndrome_info delivered by KVM, it refers to this syndrome value(but can be different from it) to specify the virtual SError interrupt's syndrome through setting VESR_EL2.

'but can be different from it' is because a classification step is required, the
operating system should do this. We should only signal Qemu/kvmtool for errors
that can actually be handled. Some APEI notifications may be for corrected
errors, (I would hope these always come through a polled GHES), Linux shouldn't
interrupt user space for a corrected error.

For memory errors we already have BUS_MCEERR_AR - action-required, and
BUS_MCEERR_AO - action-optional.

For a TLB error, (Table 250 of UEFI 2.6), what is Qemu expected to do? Linux has
to classify the error and handle it as far as possible. In most cases the error
is either handled (no notification required), or fatal. Memory errors are the
only example I've found so far where an application can do additional work to
handle the error.

Can you give us a specific example of an error you are trying to handle?
How would a non-KVM user space process handle the error?



Thanks,

James

2017-07-04 10:50:23

by Dongjiu Geng

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

Hi James,
Thanks for the review. I will read your comments carefully and then reply to you.


On 2017/7/4 18:14, James Morse wrote:
> Hi gengdongjiu,
>
> Can you give us a specific example of an error you are trying to handle?
> How would a non-KVM user space process handle the error?
>
> KVM-users should be regular user space processes, we should not have a KVM-way
> and everyone-else-way of handling errors.
>
>
> On 04/07/17 05:46, gengdongjiu wrote:
>> On 2017/7/3 16:39, Christoffer Dall wrote:
>>> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>>>> when SError happen, kvm notifies user space to record the CPER,
>>>> user space specifies and passes the contents of ESR_EL1 on taking
>>>> a virtual SError interrupt to KVM, KVM enables virtual system
>>>> error or asynchronous abort with this specifies syndrome. This
>>>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>>>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>>>> HCR_EL2.VSE injects an SError. This register is added by the
>>>> RAS Extensions.
>>>
>>> This commit message is confusing and doesn't help me understand the
>>> patch.
>> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in the RAS solution?
>
>> a). In the firmware-first RAS solution, when guest OS happen a SError interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
>> b). The firmware logs, triages, and delegates the error exception to the hypervisor. As the error came from guest OS EL1, firmware
>> does by faking an SError interrupt exception entry to EL2.
>> c). Control transfers to the hypervisor's delegated error recovery agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
>> Virtual SError interrupt to delegate an asynchronous abort to EL1, by setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.
>
> So (a): a physical-CPU hardware error occurs, and then (c) we tell Qemu/kvmtool
> via a KVM-specific API.
>
> Don't do this, it doesn't work for non-KVM users. You are exposing host-specific
> implementation details to user space. What if I discover the same error via a
> Polling GHES, or one of the IRQ flavours?
>
> User space should not have to know, or care, how linux is notified about APEI
> RAS errors.
>
>
>> (2) what is this patch mainly do?
>> As mentioned above, the hypervisor needs to enable virtual SError and pass the virtual syndrome to the guest OS.
>>
>> a). when Control transfers to the hypervisor from firmware by faking an SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
>> host VA address( Qemu translate this VA address to the virtual machine physical address(IPA)) using below new added "serror_intr" struct.
>> /* KVM_EXIT_SERROR_INTR */
>> struct {
>> __u32 syndrome_info;
>> __u64 address;
>> } serror_intr;
>
> This is for a guest exit to host user-space. Here you are telling Qemu that a
> physical CPU hardware error occurred. Qemu/kvmtool should not be expected to
> parse the ESR, this is the job of the operating system.
>
> When you're using ACPI firmware-first, SError/SEI is just a notification, the
> important data is in the CPER records, which Qemu can't access, (and should be
> processed by Linux APEI code).
>
>
> It looks like you've calculated an address from FAR_EL2/HPFAR_EL2. For an
> SError, these are meaningless.
>
> (These registers hold real values for Synchronous External Abort, but for
> firmware-first we should prefer the CPER records.)
>
>
>> b). Qemu gets the address(host VA) delivered by KVM, translate this host VA address to virtual machine physical address(IPA), and runtime record this virtual
>> machine physical address(IPA) to the guest OS's APEI table.
>
> I agree with this step, but you're acting on the wrong data. (You're converting
> fault_ipa -> virtual address -> fault_ipa, something isn't right ...)
>
> Qemu should react to a signal like BUS_MCEERR_A{R,O} from memory_failure(). This
> mechanism serves all user space processes, not just kvm users. This is where the
> user-space virtual address should come from. Qemu/kvmtool have to generate the
> guest IPA once they discover the affected memory was presented to the guest
> through KVM.
>
>
> Your KVM-specific mechanism exposes too much raw information (raw ESR values to
> user space), and only serves applications using KVM.
>
> If there is another type of CPER record where we should notify userspace, please
> do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
> drivers/firmware/efi/cper.c. These should consider all user-space applications,
> not just users of KVM, and not just on arm64.
>
>
>> c). Qemu gets the syndrome_info delivered by KVM, it refers to this syndrome value(but can be different from it) to specify the virtual SError interrupt's syndrome through setting VESR_EL2.
>
> 'but can be different from it' is because a classification step is required, the
> operating system should do this. We should only signal Qemu/kvmtool for errors
> that can actually be handled. Some APEI notifications may be for corrected
> errors, (I would hope these always come through a polled GHES), Linux shouldn't
> interrupt user space for a corrected error.
>
> For memory errors we already have BUS_MCEERR_AR - action-required, and
> BUS_MCEERR_AO - action-optional.
>
> For a TLB error, (Table 250 of UEFI 2.6), what is Qemu expected to do? Linux has
> to classify the error and handle it as far as possible. In most cases the error
> is either handled (no notification required), or fatal. Memory errors are the
> only example I've found so far where an application can do additional work to
> handle the error.
>
> Can you give us a specific example of an error you are trying to handle?
> How would a non-KVM user space process handle the error?
>
>
>
> Thanks,
>
> James
>
>
> .
>

2017-07-05 08:14:31

by Dongjiu Geng

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

Hi James,


On 2017/7/4 18:14, James Morse wrote:
> Hi gengdongjiu,
>
> Can you give us a specific example of an error you are trying to handle?
For example:
guest OS user space accesses device type memory, but happen SError. because the
SError is asynchronous faults, it does not take immediately. when guest OS call "SVC" to enter guest os
kernel space, the ESB instruction(Error Synchronization Barrier) will defter this SError. so the SError happen immediately.


> How would a non-KVM user space process handle the error?
it is indeed, non-KVM user space can not get the notification from hypervisor or host kernel. thanks for the pointing out
do you mean still Signal SIGBUS from memory_failure?


>
> KVM-users should be regular user space processes, we should not have a KVM-way
> and everyone-else-way of handling errors.
>
>
> On 04/07/17 05:46, gengdongjiu wrote:
>> On 2017/7/3 16:39, Christoffer Dall wrote:
>>> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>>>> when SError happen, kvm notifies user space to record the CPER,
>>>> user space specifies and passes the contents of ESR_EL1 on taking
>>>> a virtual SError interrupt to KVM, KVM enables virtual system
>>>> error or asynchronous abort with this specifies syndrome. This
>>>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>>>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>>>> HCR_EL2.VSE injects an SError. This register is added by the
>>>> RAS Extensions.
>>>
>>> This commit message is confusing and doesn't help me understand the
>>> patch.
>> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in the RAS solution?
>
>> a). In the firmware-first RAS solution, when guest OS happen a SError interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
>> b). The firmware logs, triages, and delegates the error exception to the hypervisor. As the error came from guest OS EL1, firmware
>> does by faking an SError interrupt exception entry to EL2.
>> c). Control transfers to the hypervisor's delegated error recovery agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
>> Virtual SError interrupt to delegate an asynchronous abort to EL1, by setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.
>
> So (a): a physical-CPU hardware error occurs, and then (c) we tell Qemu/kvmtool
> via a KVM-specific API.
>
> Don't do this, it doesn't work for non-KVM users. You are exposing host-specific
> implementation details to user space. What if I discover the same error via a
> Polling GHES, or one of the IRQ flavours?
James, you mainly concern the way that "tell Qemu/kvmtool via a KVM-specific API", right?
so how about still delivered SIGBUS same as the SEA(Synchronous External Abort)?

by the way, what is your meaning of below words?
>"What if I discover the same error via a Polling GHES, or one of the IRQ flavours?"


>
> User space should not have to know, or care, how linux is notified about APEI
> RAS errors.
>
>
>> (2) what is this patch mainly do?
>> As mentioned above, the hypervisor needs to enable virtual SError and pass the virtual syndrome to the guest OS.
>>
>> a). when Control transfers to the hypervisor from firmware by faking an SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
>> host VA address( Qemu translate this VA address to the virtual machine physical address(IPA)) using below new added "serror_intr" struct.
>> /* KVM_EXIT_SERROR_INTR */
>> struct {
>> __u32 syndrome_info;
>> __u64 address;
>> } serror_intr;
>
> This is for a guest exit to host user-space. Here you are telling Qemu that a
> physical CPU hardware error occurred. Qemu/kvmtool should not be expected to
> parse the ESR, this is the job of the operating system.
it does not want Qemu/kvmtool to parse the ESR.
Qemu/kvmtool can refer to the ESR to specify the vsesr's value, only for reference.

As mentioned above, firmware does by faking an SError interrupt exception entry to EL2.
so the esr_El2 may contain some useful information, Qemu can refer to this value to set the vsesr_el2(esr_el1).

when qemu specified the vsesr value, do you mean not refer to the esr_el2 value?
if so, what is the suggested value for the vsesr_el2 value?

>
> When you're using ACPI firmware-first, SError/SEI is just a notification, the
> important data is in the CPER records, which Qemu can't access, (and should be
> processed by Linux APEI code).
Yes, I agree with you.

>
>
> It looks like you've calculated an address from FAR_EL2/HPFAR_EL2. For an
> SError, these are meaningless.
>
> (These registers hold real values for Synchronous External Abort, but for
> firmware-first we should prefer the CPER records.)
>
>
>> b). Qemu gets the address(host VA) delivered by KVM, translate this host VA address to virtual machine physical address(IPA), and runtime record this virtual
>> machine physical address(IPA) to the guest OS's APEI table.
>
> I agree with this step, but you're acting on the wrong data. (You're converting
> fault_ipa -> virtual address -> fault_ipa, something isn't right ...)
consider again, using the fault_ipa indeed has problem. because SEI is asynchronous faults. the IPA which is recorded in the register may not the real error address.
thanks for the pointing out.
>
> Qemu should react to a signal like BUS_MCEERR_A{R,O} from memory_failure(). This
> mechanism serves all user space processes, not just kvm users. This is where the
> user-space virtual address should come from. Qemu/kvmtool have to generate the
> guest IPA once they discover the affected memory was presented to the guest
> through KVM.
so do you suggest to use SIGBUS to notify Qemu/kvmtool both for the SEA/SEI?
if so, how the Qemu know the error is SEA or SEI. the siginfo_t for the SIGBUS only
include the si_code(BUS_MCEERR_A{R,O}) and si_address(host VA).


>
>
> Your KVM-specific mechanism exposes too much raw information (raw ESR values to
> user space), and only serves applications using KVM.
thanks for the pointing out.

>
> If there is another type of CPER record where we should notify userspace, please
> do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
> drivers/firmware/efi/cper.c. These should consider all user-space applications,
> not just users of KVM, and not just on arm64.

here I have a question, in the "drivers/acpi/apei/ghes.c" code, it only handle the memory section of CPER.
if the section type of CPER is processor, it will not notify user-space. so only let userspace handle the memory section is reasonable?

As shown blow, only when the section is memory section, it will call ghes_handle_memory_failure.

450 static void ghes_do_proc(struct ghes *ghes,
451 const struct acpi_hest_generic_status *estatus)
452 {
453 int sev, sec_sev;
454 struct acpi_hest_generic_data *gdata;
455 uuid_le sec_type;
456 uuid_le *fru_id = &NULL_UUID_LE;
457 char *fru_text = "";
458
459 sev = ghes_severity(estatus->error_severity);
460 apei_estatus_for_each_section(estatus, gdata) {
461 sec_sev = ghes_severity(gdata->error_severity);
462 sec_type = *(uuid_le *)gdata->section_type;
463
464 if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
465 fru_id = (uuid_le *)gdata->fru_id;
466 if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
467 fru_text = gdata->fru_text;
468
469 if (!uuid_le_cmp(sec_type,
470 CPER_SEC_PLATFORM_MEM)) {
471 struct cper_sec_mem_err *mem_err;
472
473 mem_err = acpi_hest_generic_data_payload(gdata);
474 ghes_edac_report_mem_error(ghes, sev, mem_err);
475
476 arch_apei_report_mem_error(sev, mem_err);
477 ghes_handle_memory_failure(gdata, sev);
478 }

>
>
>> c). Qemu gets the syndrome_info delivered by KVM, it refers to this syndrome value(but can be different from it) to specify the virtual SError interrupt's syndrome through setting VESR_EL2.
>
> 'but can be different from it' is because a classification step is required, the
> operating system should do this. We should only signal Qemu/kvmtool for errors
> that can actually be handled. Some APEI notifications may be for corrected
> errors, (I would hope these always come through a polled GHES), Linux shouldn't
> interrupt user space for a corrected error.
classification step is need, for the error that Qemu/kvmtool can actually handled, what is vsesr_el2's value if qemu specify it?
If qemu sets arbitrary value for vsesr, it may affect guest os error recovery. so here I let Qemu refer to the esr_el2's value.

here I show a example in the "RAS_Extension_PRD03-PRDC-010953-33-0", in this example, it even Sets up SPSR_EL1, ELR_EL1 and ESR_EL1 with copies of SPSR_EL2, ELR_EL2 and
ESR_EL2.

Variant: asynchronous External Abort with delegated exception handling
The example above requires the hypervisor to know that it can delegate the error exception to the OS using a
virtual error interrupt. In this example:
 The error exception is taken asynchronously.
 Before the load instruction completes, the software executes an SVC instruction.
 The OS uses the ESB instruction at the SVC exception entry point. This barriers the error as described
above.
 The firmware logs, triages, and delegates the error exception to the hypervisor. As the error came
from EL1, it does by faking an SError interrupt exception entry to EL2.
 Control transfers to the hypervisor’s delegated error recovery agent. It repeats the process of triaging
the error.
 The hypervisor, however, delegates the error exception to the OS using the delegated exception
handler model.
— If it were to try a virtual SError interrupt, then this would be masked on return to EL1.
 The delegated exception handler model is described in [5], but, briefly, the hypervisor:
— Stashes the original SPSR_EL2, ELR_EL2 and ESR_EL2 in a local record.
— Stashes the current SPSR_EL1, ELR_EL1 and ESR_EL1 in a local record.
 These registers may contain live data.
— Sets up SPSR_EL1, ELR_EL1 and ESR_EL1 with copies of SPSR_EL2, ELR_EL2 and
ESR_EL2.
 That is, the syndrome for the error.
— Sets SPSR_EL2 and ELR_EL2 to point to the OS delegated error recovery agent entry point,
and executes an ERET instruction.
 Control returns to the OS. It triages the error and discovers that the error was taken from an ESB
instruction at an exception entry point and so it might be able to recover from the error.
 The OS returns from its delegated error recovery agent by executing a Hypervisor call, asking the
hypervisor to schedule a virtual SError interrupt. The hypervisor:
— Restores the original SPSR_EL2, ELR_EL2 and ESR_EL2 from its local record.
— Sets HCR_EL2.VSE to 1.
— Returns back to the OS at the point the error exception was taken from (in this case, the ESB).
 The virtual SError interrupt is processed as described above.
 The error has been logged, triaged, and contained to the EL0 application.


>
> For memory errors we already have BUS_MCEERR_AR - action-required, and
> BUS_MCEERR_AO - action-optional.
>
> For a TLB error, (Table 250 of UEFI 2.6), what is Qemu expected to do? Linux has
> to classify the error and handle it as far as possible. In most cases the error
> is either handled (no notification required), or fatal. Memory errors are the
> only example I've found so far where an application can do additional work to
> handle the error.
James, only memory errors needs application to do additional work. UEFI spec mentioned that?

>
> Can you give us a specific example of an error you are trying to handle?
> How would a non-KVM user space process handle the error?
>
>
>
> Thanks,
>
> James
>
>
> .
>

2017-07-06 16:40:26

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

Hi gengdongjiu,

On 05/07/17 09:14, gengdongjiu wrote:
> On 2017/7/4 18:14, James Morse wrote:
>> Can you give us a specific example of an error you are trying to handle?

> For example:
> guest OS user space accesses device type memory, but happen SError. because the
> SError is asynchronous faults, it does not take immediately. when guest OS call "SVC" to enter guest os
> kernel space, the ESB instruction(Error Synchronization Barrier) will defter this SError. so the SError happen immediately.

Ah, this isn't necessarily a 'RAS notification' SError/SEI, it may be a
'vanilla', v8.0 SError.

You've given a guest access to a physical device (how?), the guest has done
something, which caused the device to respond with SError.

Do you have a specific use-case for this? What is the ESR? What kinds of CPER
records does firmware generate? (if any)


We have to be careful here as devices can still generate asynchronous-interrupts
using SError, these aren't contained by ESB barriers. For these we should fall
back to KVM's v8.0 SError behaviour. KVM can tell them apart as the APEI code
doesn't claim the SError as an SEI notification, and with the RAS extensions the
ESR has the 'IDS' bit set.


>> How would a non-KVM user space process handle the error?

> it is indeed, non-KVM user space can not get the notification from hypervisor or host kernel. thanks for the pointing out
> do you mean still Signal SIGBUS from memory_failure?

No, I was assuming this was a RAS notification SEI, (because your patch 1/3 of
touched the RAS cpu-features) being given to user space to handle.

Instead, can I ask how the host should handle this SError if it had accessed the
device itself?


I agree device pass-through is going to be a special case for KVM, but before
the host can deliver a device RAS error into the guest that was using the
device, it needs to fully understand what the error means:

The error may mean that the careful configuration that makes device-passthrough
safe no longer works, letting the guest continue to access the device may let it
damage another guest or the hyper visor.


We may need a way for the host RAS code to identify the driver responsible, to
handle the device error, or delegate it if that's appropriate.


[...]

>> So (a): a physical-CPU hardware error occurs, and then (c) we tell Qemu/kvmtool
>> via a KVM-specific API.
>>
>> Don't do this, it doesn't work for non-KVM users. You are exposing host-specific
>> implementation details to user space. What if I discover the same error via a
>> Polling GHES, or one of the IRQ flavours?

> James, you mainly concern the way that "tell Qemu/kvmtool via a KVM-specific API", right?
> so how about still delivered SIGBUS same as the SEA(Synchronous External Abort)?

> by the way, what is your meaning of below words?
> >"What if I discover the same error via a Polling GHES, or one of the IRQ flavours?"

This was my mistaken assumption that you were passing an APEI RAS SEI
notification to user space via a KVM specific API. This wouldn't work for
applications not using KVM, or notifications not using SEI.

Here I was asking what happens if the notification used NOTIFY_POLL or
NOTIFY_IRQ (instead of NOTIFY_SEI) in the GHES, but this isn't relevant as it
doesn't look like this is a APEI RAS notification.

[...]


>> If there is another type of CPER record where we should notify userspace, please
>> do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
>> drivers/firmware/efi/cper.c. These should consider all user-space applications,
>> not just users of KVM, and not just on arm64.
>
> here I have a question, in the "drivers/acpi/apei/ghes.c" code, it only handle the memory section of CPER.

Yes, we are certainly missing processing for the other record types.


> if the section type of CPER is processor, it will not notify user-space. so only let userspace handle the memory section is reasonable?

I think the only errors that user-space can know more than the kernel are memory
errors. These are the only RAS errors we should expect user space to handle. All
the others fall into either 'corrected by the kernel' or 'fatal for userspace -
SIGKILL'.


>> For memory errors we already have BUS_MCEERR_AR - action-required, and
>> BUS_MCEERR_AO - action-optional.
>>
>> For a TLB error, (Table 250 of UEFI 2.6), what is Qemu expected to do? Linux has
>> to classify the error and handle it as far as possible. In most cases the error
>> is either handled (no notification required), or fatal. Memory errors are the
>> only example I've found so far where an application can do additional work to
>> handle the error.

> James, only memory errors needs application to do additional work. UEFI spec mentioned that?

No, its my observation based on the record types. Memory is the only thing an
application can change. Everything else belongs to the kernel.
For a corrupt page of anonymous memory, there is nothing the kernel can do, but
report the data lost. The application (e.g. web browser) may know what the
corrupt data was, and if/how it can retrieve it again. This isn't true of
Processor/Cache/TLB/PCIe errors, which cover the other CPER records.


Thanks,

James