Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751081AbdCNJp0 (ORCPT ); Tue, 14 Mar 2017 05:45:26 -0400 Received: from foss.arm.com ([217.140.101.70]:58648 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750924AbdCNJpY (ORCPT ); Tue, 14 Mar 2017 05:45:24 -0400 Message-ID: <58C7BBA2.4080208@arm.com> Date: Tue, 14 Mar 2017 09:45:06 +0000 From: James Morse User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.6.0 MIME-Version: 1.0 To: Xie XiuQi CC: marc.zyngier@arm.com, fu.wei@linaro.org, catalin.marinas@arm.com, will.deacon@arm.com, zjzhang@codeaurora.org, wangkefeng.wang@huawei.com, zhengqiang10@huawei.com, wangxiongfeng2@huawei.com, shiju.jose@huawei.com, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, hanjun.guo@linaro.org, guohanjun@huawei.com, linux-arm-kernel@lists.infradead.org Subject: Re: [PATCH v2 3/3] arm64: KVM: add guest SEI support References: <1488946181-130774-1-git-send-email-xiexiuqi@huawei.com> <1488946181-130774-4-git-send-email-xiexiuqi@huawei.com> In-Reply-To: <1488946181-130774-4-git-send-email-xiexiuqi@huawei.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4074 Lines: 121 Hi Xie XiuQi, On 08/03/17 04:09, Xie XiuQi wrote: > Add ghes handling for SEI so that the host kernel could parse and > report detailed error information for SEI which occur in the guest > kernel. How does this interact with Synchronous External Abort as a notify method? Both of these take the in_nmi() path through APEI. SError Interrupts are masked during exception processing, so we don't have to worry about them becoming recursive. For SEA the firmware has to promise not to invoke another SEA while we are still processing the first, and SEI will be masked if we took it as an exception. What happens if we take an SEA while processing another event notified via SEI? Can this happen on your platform? Can someone else build a platform where this happens? Does the GHES APEI code need to be able to handle this? If we need to support both at the same time we will need to change Linux's APEI code to reserve a page of virtual address space per GHES entry, instead of one for NMI and one for IRQ. > diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h > index 5b2cecd..d68d61f 100644 > --- a/arch/arm64/include/asm/system_misc.h > +++ b/arch/arm64/include/asm/system_misc.h > @@ -59,5 +59,6 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int, > #endif /* __ASSEMBLY__ */ > > int handle_guest_sea(unsigned long addr, unsigned int esr); > +int handle_guest_sei(unsigned long addr, unsigned int esr); > > #endif /* __ASM_SYSTEM_MISC_H */ > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c > index 65dbfa9..cf9f569 100644 > --- a/arch/arm64/kernel/traps.c > +++ b/arch/arm64/kernel/traps.c > @@ -616,6 +616,24 @@ const char *esr_get_class_string(u32 esr) > } > > /* > + * Handle asynchronous SError interrupt that occur in a guest kernel. > + */ > +int handle_guest_sei(unsigned long addr, unsigned int esr) > +{ > + /* > + * synchronize_rcu() will wait for nmi_exit(), so no need to > + * rcu_read_lock(). > + */ This comment was true for patch 4 of Tyler's series, but not-true when we got to patch 10. Please remove it, > + if(IS_ENABLED(CONFIG_ACPI_APEI_SEI)) { > + rcu_read_lock(); Please put the rcu calls against the thing using them. > + ghes_notify_sei(); > + rcu_read_unlock(); > + } > + > + return 0; > +} > + > +/* > * bad_mode handles the impossible case in the exception vector. This is always > * fatal. > */ > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c > index 1bfe30d..8c7dba0 100644 > --- a/arch/arm64/kvm/handle_exit.c > +++ b/arch/arm64/kvm/handle_exit.c > @@ -172,6 +173,23 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu) > return arm_exit_handlers[hsr_ec]; > } > > +static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run) > +{ > + unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); > + > + if (handle_guest_sei((unsigned long)fault_ipa, > + kvm_vcpu_get_hsr(vcpu))) { > + kvm_err("Failed to handle guest SEI, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n", > + kvm_vcpu_trap_get_class(vcpu), > + (unsigned long)kvm_vcpu_trap_get_fault(vcpu), > + (unsigned long)kvm_vcpu_get_hsr(vcpu)); > + } > + > + kvm_inject_vabt(vcpu); Always inject an SError Interrupt? How should this work when Qemu supports guest-RAS too? If we do want to kill the guest for RAS-related reasons we should go via user-space to allow Qemu to handle the error and potentially notify the guest. This would let Qemu generate CPER records for the guest, mirroring what just happened with the firmware-generated records. As on the other thread: if there were CPER records processed by handle_guest_sei() we should continue as normal as the fault was handled in some way. If there were no CPER records, (or the system doesn't support SEI as a GHES notification mechanism), then yes we should still call kvm_inject_vabt(). A suggestion of how do this: [0], if you have a better suggestion please chime in! Thanks, James [0] https://www.spinics.net/lists/kvm/msg146131.html