Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S935336AbdCWPHI (ORCPT );
	Thu, 23 Mar 2017 11:07:08 -0400
Received: from foss.arm.com ([217.140.101.70]:57704 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934465AbdCWPGi (ORCPT );
	Thu, 23 Mar 2017 11:06:38 -0400
Message-ID: <58D3E469.8090408@arm.com>
Date: Thu, 23 Mar 2017 15:06:17 +0000
From: James Morse
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.6.0
MIME-Version: 1.0
To: Dongjiu Geng
CC: rkrcmar@redhat.com, christoffer.dall@linaro.org, marc.zyngier@arm.com,
	linux@armlinux.org.uk, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, xiexiuqi@huawei.com,
	wangxiongfeng2@huawei.com, wuquanming@huawei.com,
	huangshaoyu@huawei.com
Subject: Re: [PATCH] arm/arm64: KVM: send SIGBUS error to qemu
References: <1490274061-487-1-git-send-email-gengdongjiu@huawei.com>
In-Reply-To: <1490274061-487-1-git-send-email-gengdongjiu@huawei.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2598
Lines: 82

Hi Dongjiu Geng,

On 23/03/17 13:01, Dongjiu Geng wrote:
> when the pfn is KVM_PFN_ERR_HWPOISON, it indicates to send
> SIGBUS signal from KVM's fault-handling code to qemu, qemu
> can handle this signal according to the fault address.

I'm afraid I beat you to it on this one:
https://www.spinics.net/lists/arm-kernel/msg568919.html

(Are you the same gengdj who asked me to post that patch?:
https://lkml.org/lkml/2017/3/5/187 )

We don't need upstream KVM to do this until either arm or arm64 has
ARCH_SUPPORTS_MEMORY_FAILURE.
Punit and Tyler have discovered problems with the way arm64's hugepage and
hwpoison interact:
https://www.spinics.net/lists/arm-kernel/msg568995.html


Some comments on the differences:

> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..1307ec400de3 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1237,6 +1237,20 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>  	__coherent_cache_guest_page(vcpu, pfn, size);
>  }
>
> +static void kvm_send_hwpoison_signal(unsigned long address,
> +				struct task_struct *tsk)
> +{
> +	siginfo_t info;
> +
> +	info.si_signo = SIGBUS;
> +	info.si_errno = 0;
> +	info.si_code = BUS_MCEERR_AR;
> +	info.si_addr = (void __user *)address;
> +	info.si_addr_lsb = PAGE_SHIFT;

Any version of this patch should handle hugepage for the sizes KVM uses in its
stage2 mappings. By just passing PAGE_SHIFT you let the guest fault for each
page that makes up the hugepage.


> +
> +	send_sig_info(SIGBUS, &info, tsk);
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>  			  unsigned long fault_status)
> @@ -1309,6 +1323,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (is_error_noslot_pfn(pfn))
>  		return -EFAULT;
>
> +	if (is_error_hwpoison_pfn(pfn)) {
> +		kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn),
> +					current);
> +		return -EFAULT;

This will return -EFAULT from the KVM_RUN ioctl(). Is Qemu expected to know it
should try again? This is indistinguishable from the is_error_noslot_pfn()
error above.

x86 returns 0 from this path (kvm_handle_bad_page() in arch/x86/kvm/mmu.c), as
the SIGBUS should arrive first. If the SIGBUS is handled, the error has been
resolved and Qemu can call KVM_RUN again. Returning an error and sending
SIGBUS suggests there are two problems.

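To make the return-value concern concrete, here is a minimal sketch of how a
userspace run loop might classify the result of KVM_RUN. The names run_action
and classify_kvm_run() are hypothetical and are not taken from Qemu. The only
assumption about the kernel side is the usual ioctl() convention: 0 on a normal
exit, -1 with errno set on failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical: what a VMM's vcpu loop does after KVM_RUN returns. */
enum run_action {
	RUN_AGAIN,	/* handle the exit reason (if any), then call KVM_RUN again */
	RUN_FATAL,	/* unrecoverable error: stop the vcpu */
};

/*
 * ret is the ioctl() return value, err is the errno observed with it.
 * Assumes the ioctl() convention of 0 on success and -1 on failure.
 */
static enum run_action classify_kvm_run(int ret, int err)
{
	assert(ret == 0 || ret == -1);

	if (ret == 0)
		return RUN_AGAIN;

	/* Interrupted by a signal: the handler has already run, so retry. */
	if (err == EINTR || err == EAGAIN)
		return RUN_AGAIN;

	/* Anything else, including EFAULT, looks the same from here. */
	return RUN_FATAL;
}
```

With the patch as posted, both the hwpoison case and the noslot case reach the
last branch with EFAULT, so a loop like this cannot tell that the SIGBUS
handler has already dealt with the problem. Returning 0 lands in the first
branch instead.
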
> +	}
> +
>  	if (kvm_is_device_pfn(pfn)) {
>  		mem_type = PAGE_S2_DEVICE;
>  		flags |= KVM_S2PTE_FLAG_IS_IOMAP;


Thanks,

James
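
For the si_addr_lsb comment above, a small userspace sketch of what that field
means to the receiver of the signal may help. The helper names
mapping_size_to_lsb() and poison_range_from_siginfo() are hypothetical, and the
2MB hugepage in the note below is only an example size. The sketch assumes
si_addr_lsb is the log2 of the size of the affected region, so the receiver
treats 1 << si_addr_lsb bytes around si_addr as poisoned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power-of-two mapping size in bytes. */
static unsigned int mapping_size_to_lsb(uint64_t size)
{
	unsigned int lsb = 0;

	assert(size != 0 && (size & (size - 1)) == 0);
	while ((size >> lsb) != 1)
		lsb++;
	return lsb;
}

struct poison_range {
	uint64_t start;	/* first affected byte, aligned down to len */
	uint64_t len;	/* number of affected bytes */
};

/* Hypothetical helper: the region a SIGBUS handler would treat as lost. */
static struct poison_range poison_range_from_siginfo(uint64_t addr,
						     unsigned int lsb)
{
	struct poison_range r;

	assert(lsb < 64);
	r.len = (uint64_t)1 << lsb;
	r.start = addr & ~(r.len - 1);
	return r;
}
```

For a fault at 0x40201234 inside a 2MB hugepage, an lsb of 21 describes the
whole 0x40200000-0x403fffff region in one signal, whereas an lsb of 12
describes only the 4K page at 0x40201000, leaving the other pages of the
hugepage to fault one at a time.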