Date: Mon, 19 Jun 2017 16:51:45 +0200
From: Radim Krčmář <rkrcmar@redhat.com>
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm, Paolo Bonzini, Wanpeng Li
Subject: Re: [PATCH v2 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
Message-ID: <20170619145144.GA10325@potion>
References: <1497493615-18512-1-git-send-email-wanpeng.li@hotmail.com>
 <1497493615-18512-4-git-send-email-wanpeng.li@hotmail.com>
 <20170616133702.GA6360@potion>
 <20170616153832.GA5980@potion>

2017-06-17 13:52+0800, Wanpeng Li:
> 2017-06-16 23:38 GMT+08:00 Radim Krčmář:
> > 2017-06-16 22:24+0800, Wanpeng Li:
> >> 2017-06-16 21:37 GMT+08:00 Radim Krčmář:
> >> > 2017-06-14 19:26-0700, Wanpeng Li:
> >> >> From: Wanpeng Li
> >> >>
> >> >> Add an async_page_fault field to vcpu->arch.exception to identify an
> >> >> async page fault, and construct the expected VM-exit information
> >> >> fields.  Force a nested VM exit from nested_vmx_check_exception() if
> >> >> the injected #PF is an async page fault.
> >> >>
> >> >> Cc: Paolo Bonzini
> >> >> Cc: Radim Krčmář
> >> >> Signed-off-by: Wanpeng Li
> >> >> ---
> >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> >> @@ -452,7 +452,11 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
> >> >>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> >> >>  {
> >> >>  	++vcpu->stat.pf_guest;
> >> >> -	vcpu->arch.cr2 = fault->address;
> >> >> +	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
> >> >
> >> > I think we need to act as if arch.exception.async_page_fault was not
> >> > pending in kvm_vcpu_ioctl_x86_get_vcpu_events().  Otherwise, if we
> >> > migrate with a pending async_page_fault exception, we'd inject it as a
> >> > normal #PF, which could confuse/kill the nested guest.
> >> >
> >> > And kvm_vcpu_ioctl_x86_set_vcpu_events() should clear the flag for
> >> > sanity as well.
> >>
> >> Do you mean we should add a field like async_page_fault to
> >> kvm_vcpu_events::exception, then save arch.exception.async_page_fault
> >> to events->exception.async_page_fault through KVM_GET_VCPU_EVENTS and
> >> restore events->exception.async_page_fault to
> >> arch.exception.async_page_fault through KVM_SET_VCPU_EVENTS?
> >
> > No, I thought we could get away with a disgusting hack of hiding the
> > exception from userspace, which would work for migration, but not if
> > local userspace did KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS ...
> >
> > Extending the userspace interface would work, but I'd do it as a last
> > resort, after all conservative solutions have failed.
> > async_pf migration is very crude, so exposing the exception is just an
> > ugly workaround for the local case.  Adding the flag would also require
> > userspace configuration of async_pf features for the guest to keep
> > compatibility.
> >
> > I see two options that might be simpler than adding the userspace flag:
> >
> >  1) do the nested VM exit sooner, at the place where we now queue the #PF,
> >  2) queue the #PF later, save the async_pf in some intermediate
> >     structure and consume it at the place where you proposed the nested
> >     VM exit.
>
> How about something like this, so that we don't report exception events
> when "is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
> vcpu->arch.exception.async_page_fault" holds?  Losing a reschedule
> optimization is not that important in L1.
>
> @@ -3072,13 +3074,16 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  					       struct kvm_vcpu_events *events)
>  {
>  	process_nmi(vcpu);
> -	events->exception.injected =
> -		vcpu->arch.exception.pending &&
> -		!kvm_exception_is_soft(vcpu->arch.exception.nr);
> -	events->exception.nr = vcpu->arch.exception.nr;
> -	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
> -	events->exception.pad = 0;
> -	events->exception.error_code = vcpu->arch.exception.error_code;
> +	if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
> +	      vcpu->arch.exception.async_page_fault)) {
> +		events->exception.injected =
> +			vcpu->arch.exception.pending &&
> +			!kvm_exception_is_soft(vcpu->arch.exception.nr);
> +		events->exception.nr = vcpu->arch.exception.nr;
> +		events->exception.has_error_code = vcpu->arch.exception.has_error_code;
> +		events->exception.pad = 0;
> +		events->exception.error_code = vcpu->arch.exception.error_code;
> +	}

This adds a bug when userspace does KVM_GET_VCPU_EVENTS and
KVM_SET_VCPU_EVENTS without migration -- KVM would drop the async_pf and
an L1 process would get stuck as a result.  I realized this bug only
after the first mail, sorry for the confusing paragraph.

We'd need to add a similar condition to
kvm_vcpu_ioctl_x86_set_vcpu_events(), so that userspace SET doesn't drop
the hidden exception, but that is far beyond the realm of acceptable
code.
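
Roughly, the SET side would need a mirror image of your hunk -- an
untested sketch, only reusing the condition from your GET-side change,
just to illustrate why I don't like it:

	/* Sketch only: keep a hidden pending async_pf #PF instead of
	 * letting userspace overwrite it with the empty exception state
	 * that KVM_GET_VCPU_EVENTS reported above. */
	if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
	      vcpu->arch.exception.async_page_fault)) {
		vcpu->arch.exception.pending = events->exception.injected;
		vcpu->arch.exception.nr = events->exception.nr;
		vcpu->arch.exception.has_error_code = events->exception.has_error_code;
		vcpu->arch.exception.error_code = events->exception.error_code;
	}

Both ioctls would then silently disagree with the state that userspace
sees and sets, which is why I don't consider it acceptable.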
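
And to make option (2) above more concrete, I mean something along
these lines -- all names below are invented and nothing is tested:

	/* Sketch: stash the async_pf instead of queueing it as a #PF
	 * right away; 'struct kvm_queued_async_pf' and the 'apf_stash'
	 * field are hypothetical. */
	struct kvm_queued_async_pf {
		bool		pending;
		u32		error_code;
		unsigned long	address;	/* what would go into CR2 */
	};

	/* Producer: called where kvm_inject_page_fault() is called today,
	 * so vcpu->arch.exception is not touched at all. */
	static void kvm_stash_async_pf(struct kvm_vcpu *vcpu,
				       struct x86_exception *fault)
	{
		vcpu->arch.apf_stash.pending = true;
		vcpu->arch.apf_stash.error_code = fault->error_code;
		vcpu->arch.apf_stash.address = fault->address;
	}

	/* Consumer: the injection path, where you proposed the nested VM
	 * exit -- turn the stashed fault into a normal #PF for L1, or
	 * into a nested VM exit if L2 is running. */

vcpu->arch.exception never sees the async_pf this way, so
KVM_{GET,SET}_VCPU_EVENTS need no special casing at all.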