Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757517AbdCUNKu (ORCPT ); Tue, 21 Mar 2017 09:10:50 -0400
Received: from foss.arm.com ([217.140.101.70]:52358 "EHLO foss.arm.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1757150AbdCUNKr (ORCPT ); Tue, 21 Mar 2017 09:10:47 -0400
Message-ID: <58D1263E.90800@arm.com>
Date: Tue, 21 Mar 2017 13:10:22 +0000
From: James Morse
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.6.0
MIME-Version: 1.0
To: gengdongjiu , xiexiuqi@huawei.com
CC: Marc Zyngier , catalin.marinas@arm.com, will.deacon@arm.com,
        christoffer.dall@linaro.org, rkrcmar@redhat.com,
        suzuki.poulose@arm.com, andre.przywara@arm.com,
        mark.rutland@arm.com, vladimir.murzin@arm.com,
        linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
        kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
        wangxiongfeng2@huawei.com, wuquanming@huawei.com,
        huangshaoyu@huawei.com
Subject: Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS
References: <1489996534-8270-1-git-send-email-gengdongjiu@huawei.com>
        <7055772d-2a20-6e0c-2bf8-204bc9ef52a5@arm.com>
        <22fb583f-a33e-15f8-a059-fb112b27dd4f@arm.com>
        <58CFF058.8020205@arm.com>
        <76795e20-2f20-1e54-cfa5-7444f28b18ee@huawei.com>
In-Reply-To: <76795e20-2f20-1e54-cfa5-7444f28b18ee@huawei.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5061
Lines: 116

Hi,

On 21/03/17 06:32, gengdongjiu wrote:
> On 2017/3/20 23:08, James Morse wrote:
>> On 20/03/17 13:58, Marc Zyngier wrote:
>>> On 20/03/17 12:28, gengdongjiu wrote:
>>>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>>> In the RAS implementation, hardware pass the virtual SEI
>>>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>>>> the guest OS

(I've juggled the order of your replies:)

> so for both SEA and SEI, do you prefer to below steps?
> EL0/EL1 SEI/SEA ---> EL3 firmware first handle ------> EL2 hypervisor notify
> the Qemu to inject SEI/SEA------>Qemu call KVM API to inject SEA/SEI---->KVM
> inject SEA/SEI to guest OS

Yes, to expand your EL2 hypervisor notify Qemu step:
1 The host should call its APEI code to parse the CPER records.
2 User space processes are then notified via SIGBUS (or for rasdaemon, trace
  points).
3 Qemu can take the address delivered via SIGBUS and generate CPER records for
  the guest. It knows how to convert host addresses to guest IPAs, and it knows
  where in guest memory to write the CPER records.
4 Qemu can then notify the guest via whatever mechanism it advertised via the
  HEST/GHES table. It might not be the same mechanism that the host received
  the notification through.

Steps 1 and 2 are the same even if no guest is running, so we don't have to add
any special case for KVM. This is existing code that x86 uses.
We can test the Qemu parts without any firmware support and the APEI path in the
host and guest is the same.


>> Is anyone from Huawei looking at adding RAS support for Qemu?
> yes, I am looking at Qemu and want to add RAS support.

Great, support in Qemu is one of the missing pieces. On x86 it looks like it
emulates machine-check-exceptions, which is how x86 did this before
firmware-first and APEI became the standard.


> do you mean let Qemu inject both the SEA and SEI?

To do the notification, yes. It needs to happen after the CPER records have been
written, and the mechanism and CPER memory location need to match what the guest
was told via the HEST/GHES table. If Qemu didn't tell the guest about
firmware-first, it can still deliver the guest an SError Interrupt.

SEA should be possible to do with the KVM_SET_REG API, GPIO/GSIV and the other
kind of interrupts can use irqfd.
For SEI we may need to add an API call to KVM to let it pend SError with a
specific ESR.


>> How does this work with firmware first?

> when the Guest OS triggers an SEI, it will firstly trap to EL3 firmware, El3 firmware records the error
> info to the APEI table,

These are CPER records in a memory area pointed to by one of HEST's GHES entries?


> then copy the ESR_EL3 ELR_EL3 to ESR_EL2 ELR_EL2 and transfers control to the
> hypervisor, hypervisor delegates the error exception to EL1 guest

This is a problem, just because the error occurred while the guest was running
doesn't mean we should deliver it directly to the guest. Some of these errors
will be fatal for the CPU and the host should try and power it off to contain
the fault. For example: CPER's 'micro-architectural error', should the guest
power-off the vCPU? All that really does is return to the hypervisor, the error
hasn't been contained.

Firmware should handle the error first, then the host, finally the guest via Qemu.


> OS by setting HCR_EL2.VSE to 1 and pass the virtual SEI syndrome through vsesr_el2.
> The EL1 guest OS check the DISR_EL1 syndrome information to decide to
> terminate the application, or do some other recovery action. because the HCR_EL2.AMO is set, so in fact, read
> DISR_EL1, it returns the VDISR_EL2. and VDISR_EL2 is loaded from VSESR_EL2, so here I pass the virtual SEI
> syndrome vsesr_el2.

So this is how an SError Interrupt's ESR gets into a guest. How does it get hold
of the CPER records?


>> If we took a Physical SError Interrupt the CPER records are in the hosts memory.
>> To deliver a RAS event to the guest something needs to generate CPER records and
>> put them in the guest memory. Only Qemu knows where these memory regions are.
>>
>> Put another way, what is the guest expected to do with this SError interrupt?
>
> No, we do not only panic,if it is EL0 application SEI.
> the OS error recovery
> agent will terminate the EL0 application to isolate the error; If it is EL1 guest
> OS SError, guest OS can see whether it can recover. if the error was in a read-only file cache buffer, guest OS
> can invalidate the page and reload the data from disk.

How do we get an address for memory failure? SError is asynchronous, I don't
think it sets the FAR. (SEA is synchronous and it's not guaranteed to set the
FAR.) As far as I understand this information is in the CPER records in host
memory.
If we did have an address it would be a host address, how is it converted to a
guest IPA? I think Qemu should do this as part of its CPER record generation,
once the host has decided the error wasn't catastrophic.


Thanks,

James
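
To illustrate the host-address-to-guest-IPA conversion discussed above, here is
a minimal sketch in C. It is not Qemu code: struct ram_slot and
hva_to_guest_ipa() are hypothetical names made up for this example, and a real
VMM would build the table from the memory regions it registered with KVM. The
host address would typically be the si_addr value delivered with the SIGBUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest RAM region as the VMM sees it. */
struct ram_slot {
    uint64_t guest_ipa; /* start of the region in guest IPA space */
    uint64_t host_va;   /* start of the same region in the VMM's address space */
    uint64_t size;      /* length of the region in bytes */
};

/*
 * Translate a host virtual address (e.g. si_addr from a SIGBUS) into a
 * guest IPA by searching the slot table.
 *
 * Returns true and stores the result in *ipa when the address falls inside
 * a slot; returns false when the address is not guest memory, in which case
 * the error is not something to report to the guest.
 */
bool hva_to_guest_ipa(const struct ram_slot *slots, size_t nslots,
                      uint64_t hva, uint64_t *ipa)
{
    assert(ipa != NULL);

    for (size_t i = 0; i < nslots; i++) {
        const struct ram_slot *s = &slots[i];

        /* Compare the offset against size so the check cannot overflow. */
        if (hva >= s->host_va && hva - s->host_va < s->size) {
            *ipa = s->guest_ipa + (hva - s->host_va);
            return true;
        }
    }
    return false;
}
```

The resulting IPA is what would go into the CPER record written to guest
memory, before the guest is notified via the mechanism advertised in HEST/GHES.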