Date: Thu, 25 Jun 2020 17:47:01 -0400
From: Vivek Goyal <vgoyal@redhat.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: virtio-fs@redhat.com, vkuznets@redhat.com, pbonzini@redhat.com,
    sean.j.christopherson@intel.com
Subject: [RFC PATCH] kvm,x86: Exit to user space in case of page fault error
Message-ID: <20200625214701.GA180786@redhat.com>

Page fault error handling in kvm is a little inconsistent when a page
fault reports an error. If we are doing the fault synchronously, we
capture the error (-EFAULT) returned by __gfn_to_pfn_memslot() and exit
to user space, and qemu reports "error: kvm run failed Bad address".

But if we are doing an async page fault, async_pf_execute() simply
ignores the error reported by get_user_pages_remote() or by
kvm_mmu_do_page_fault(). It is assumed that the page fault was
successful, and either a page-ready event is injected into the guest or
the guest is brought out of the artificial halt state and run again. In
both cases, when the guest retries the instruction it takes an exit
again, because the page fault was not resolved on the previous attempt,
and this loop continues forever.

Faulting in a loop would make sense if the error were temporary and
would be resolved on retry, but I don't see any intention in the code
to determine whether the error is temporary. Whether a fault is done
synchronously or asynchronously depends on many variables (see
kvm_can_do_async_pf()), but none of those variables is whether the
error is temporary. That makes it very unpredictable whether kvm will
exit to qemu with an error or just retry and go into an infinite loop.

This patch makes the behavior consistent: instead of getting into an
infinite loop retrying the page fault, exit to user space and stop the
VM if a page fault error happens. In the future this can be improved by
injecting errors into the guest; as of now we don't have any race-free
method to do that.

When a page fault error happens on the async path, save the faulting
gfn, and the next time the guest retries, do a sync fault instead of an
async fault, so that if the error is encountered again we exit to qemu
and avoid the infinite loop. As of now only one error gfn is stored,
which means it could be overwritten before the guest retries. But this
is just a hint, and if we miss it once we will catch it some other
time. If this becomes an issue, we could later maintain an array of
error gfns.
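To make the intended mechanism concrete, here is a minimal standalone
sketch of the one-slot hint in plain userspace C. This is illustrative
only, not kernel code; all names below (handle_guest_fault,
resolve_backing_page, etc.) are invented for this example, and the real
logic lives in try_async_pf()/kvm_can_do_async_pf() in the patch below.

  #include <errno.h>
  #include <stdbool.h>
  #include <stdio.h>

  typedef unsigned long long gfn_t;

  #define NO_ERROR_GFN ((gfn_t)-1)

  /* One-slot hint: last gfn whose async fault failed. */
  static gfn_t error_gfn = NO_ERROR_GFN;

  /* Stand-in for __gfn_to_pfn_memslot()/get_user_pages_remote()
   * failing permanently, e.g. truncated file backing guest memory. */
  static bool resolve_backing_page(gfn_t gfn)
  {
          (void)gfn;
          return false;
  }

  static bool can_do_async_pf(gfn_t gfn)
  {
          /* The hint forces a known-bad gfn down the sync path. */
          return error_gfn != gfn;
  }

  static int handle_guest_fault(gfn_t gfn)
  {
          if (can_do_async_pf(gfn)) {
                  /* Async path: the error is only recorded as a hint;
                   * the guest is resumed and retries the instruction. */
                  if (!resolve_backing_page(gfn))
                          error_gfn = gfn;
                  return 0;
          }
          /* Sync path: the error propagates and the VM exits to
           * user space instead of retrying forever. */
          return resolve_backing_page(gfn) ? 0 : -EFAULT;
  }

  int main(void)
  {
          gfn_t gfn = 0x1234;

          /* First fault goes async; failure only records the hint. */
          printf("first fault: %d\n", handle_guest_fault(gfn));
          /* Retry matches error_gfn, goes sync, surfaces -EFAULT. */
          printf("retry:       %d\n", handle_guest_fault(gfn));
          return 0;
  }

Running this prints 0 for the first fault (the guest is simply resumed)
and -14 (-EFAULT) for the retry, which is the point at which the real
patch lets qemu stop the VM rather than loop.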
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.h              |  2 +-
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/x86.c              | 14 +++++++++++---
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index be5363b21540..3c0677b9d3d5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -778,6 +778,7 @@ struct kvm_vcpu_arch {
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 		bool pageready_pending;
+		gfn_t error_gfn;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 444bb9c54548..d0a2a12c7bb6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -60,7 +60,7 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, u32 cr0, u32 cr4, u32 efer);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     bool accessed_dirty, gpa_t new_eptp);
-bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
+bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn);
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 			  u64 fault_address, char *insn, int insn_len);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 76817d13c86e..a882a6a9f7a7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4078,7 +4078,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	if (!async)
 		return false; /* *pfn has correct page already */
 
-	if (!prefault && kvm_can_do_async_pf(vcpu)) {
+	if (!prefault && kvm_can_do_async_pf(vcpu, cr2_or_gpa >> PAGE_SHIFT)) {
 		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
 		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
 			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3b92db412335..a6af7e9831b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10380,7 +10380,9 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	    work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
 		return;
 
-	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
+	r = kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
+	if (r < 0)
+		vcpu->arch.apf.error_gfn = work->cr2_or_gpa >> PAGE_SHIFT;
 }
 
 static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
@@ -10490,7 +10492,7 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 	return true;
 }
 
-bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
+bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	if (unlikely(!lapic_in_kernel(vcpu) ||
 		     kvm_event_needs_reinjection(vcpu) ||
@@ -10504,7 +10506,13 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 	 * If interrupts are off we cannot even use an artificial
 	 * halt state.
 	 */
-	return kvm_arch_interrupt_allowed(vcpu);
+	if (!kvm_arch_interrupt_allowed(vcpu))
+		return false;
+
+	if (vcpu->arch.apf.error_gfn == gfn)
+		return false;
+
+	return true;
 }
 
 bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
-- 
2.25.4