From: Wanpeng Li
Date: Wed, 21 Jun 2017 17:53:18 +0800
Subject: Re: [PATCH v2 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
To: Radim Krčmář
Cc: "linux-kernel@vger.kernel.org", kvm, Paolo Bonzini, Wanpeng Li

2017-06-21 0:12 GMT+08:00 Radim Krčmář:
> 2017-06-20 05:47+0800, Wanpeng Li:
>> 2017-06-19 22:51 GMT+08:00 Radim Krčmář:
>> > 2017-06-17 13:52+0800, Wanpeng Li:
>> >> 2017-06-16 23:38 GMT+08:00 Radim Krčmář:
>> >> > 2017-06-16 22:24+0800, Wanpeng Li:
>> >> >> 2017-06-16 21:37 GMT+08:00 Radim Krčmář:
>> >> >> > 2017-06-14 19:26-0700, Wanpeng Li:
>> >> >> >> From: Wanpeng Li
>> >> >> >>
>> >> >> >> Add an async_page_fault field to vcpu->arch.exception to identify an async
>> >> >> >> page fault, and constructs the expected vm-exit information fields. Force
>> >> >> >> a nested VM exit from nested_vmx_check_exception() if the injected #PF
>> >> >> >> is async page fault.
>> >> >> >>
>> >> >> >> Cc: Paolo Bonzini
>> >> >> >> Cc: Radim Krčmář
>> >> >> >> Signed-off-by: Wanpeng Li
>> >> >> >> ---
>> >> >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >> >> >> @@ -452,7 +452,11 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
>> >> >> >>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>> >> >> >>  {
>> >> >> >>         ++vcpu->stat.pf_guest;
>> >> >> >> -       vcpu->arch.cr2 = fault->address;
>> >> >> >> +       vcpu->arch.exception.async_page_fault = fault->async_page_fault;
>> >> >> >
>> >> >> > I think we need to act as if arch.exception.async_page_fault was not
>> >> >> > pending in kvm_vcpu_ioctl_x86_get_vcpu_events(). Otherwise, if we
>> >> >> > migrate with pending async_page_fault exception, we'd inject it as a
>> >> >> > normal #PF, which could confuse/kill the nested guest.
>> >> >> >
>> >> >> > And kvm_vcpu_ioctl_x86_set_vcpu_events() should clean the flag for
>> >> >> > sanity as well.
>> >> >>
>> >> >> Do you mean we should add a field like async_page_fault to
>> >> >> kvm_vcpu_events::exception, then saves arch.exception.async_page_fault
>> >> >> to events->exception.async_page_fault through KVM_GET_VCPU_EVENTS and
>> >> >> restores events->exception.async_page_fault to
>> >> >> arch.exception.async_page_fault through KVM_SET_VCPU_EVENTS?
>> >> >
>> >> > No, I thought we could get away with a disgusting hack of hiding the
>> >> > exception from userspace, which would work for migration, but not if
>> >> > local userspace did KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS ...
>> >> >
>> >> > Extending the userspace interface would work, but I'd do it as a last
>> >> > resort, after all conservative solutions have failed.
>> >> > async_pf migration is very crude, so exposing the exception is just an
>> >> > ugly workaround for the local case. Adding the flag would also require
>> >> > userspace configuration of async_pf features for the guest to keep
>> >> > compatibility.
>> >> >
>> >> > I see two options that might be simpler than adding the userspace flag:
>> >> >
>> >> > 1) do the nested VM exit sooner, at the place where we now queue #PF,
>> >> > 2) queue the #PF later, save the async_pf in some intermediate
>> >> >    structure and consume it at the place where you proposed the nested
>> >> >    VM exit.
>> >>
>> >> How about something like this, which does not report exception events when
>> >> "is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
>> >> vcpu->arch.exception.async_page_fault" holds, since losing a reschedule
>> >> optimization is not that important in L1.
>> >>
>> >> @@ -3072,13 +3074,16 @@ static void
>> >> kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>> >>                                     struct kvm_vcpu_events *events)
>> >>  {
>> >>         process_nmi(vcpu);
>> >> -       events->exception.injected =
>> >> -               vcpu->arch.exception.pending &&
>> >> -               !kvm_exception_is_soft(vcpu->arch.exception.nr);
>> >> -       events->exception.nr = vcpu->arch.exception.nr;
>> >> -       events->exception.has_error_code = vcpu->arch.exception.has_error_code;
>> >> -       events->exception.pad = 0;
>> >> -       events->exception.error_code = vcpu->arch.exception.error_code;
>> >> +       if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
>> >> +             vcpu->arch.exception.async_page_fault)) {
>> >> +               events->exception.injected =
>> >> +                       vcpu->arch.exception.pending &&
>> >> +                       !kvm_exception_is_soft(vcpu->arch.exception.nr);
>> >> +               events->exception.nr = vcpu->arch.exception.nr;
>> >> +               events->exception.has_error_code = vcpu->arch.exception.has_error_code;
>> >> +               events->exception.pad = 0;
>> >> +               events->exception.error_code = vcpu->arch.exception.error_code;
>> >> +       }
>> >
>> > This adds a bug when userspace does KVM_GET_VCPU_EVENTS and
>> > KVM_SET_VCPU_EVENTS without migration -- KVM would drop the async_pf and
>> > a L1 process gets stuck as a result.
>> >
>> > We'd need to add a similar condition to
>> > kvm_vcpu_ioctl_x86_set_vcpu_events(), so userspace SET doesn't drop it,
>> > but that is far beyond the realm of acceptable code.
>>
>> Do you mean the current state of patchset v2 can be accepted?
>> Otherwise, what should be done next?
>
> No, sorry, that one has the migration bug (the async_page_fault gets
> dropped on destination).
>
> You proposed to add the flag to the userspace interface, which is a
> sound solution. I was asking to look for a different one, because the
> flag is a work-around for an implementation detail, which isn't a good
> thing to put into a userspace interface ...
>
> Still, I looked at the early VM exit (1) and it doesn't fit well into
> SVM's model of single nested VM exit location, so it's out.
>
> The remaining contender is to add a paravirtualized event for apf and
> only convert it into nested VM exit or #PF in inject_pending_event().
> The end result would likely be a slightly better version of the
> exception flag ...
>
> I guess that doing a prototype of the userspace interface extension is a
> good follow up.

Yeah, I have done this in patch 3/4 of v3 and in a separate qemu patch.
Please have a look. :)

Regards,
Wanpeng Li