Date: Wed, 2 Aug 2017 22:26:29 +0200
From: Radim Krčmář
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Wanpeng Li
Subject: Re: [PATCH v3] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2
Message-ID: <20170802202628.GB32403@flask>
In-Reply-To: <1501670903-3368-1-git-send-email-wanpeng.li@hotmail.com>

2017-08-02 03:48-0700, Wanpeng Li:
> From: Wanpeng Li
>
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
> CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
> RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
> Call Trace:
>  vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>  ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>  kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
>  ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
>  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>  kvm_vcpu_ioctl+0x340/0x700 [kvm]
>  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>  ? __fget+0xfc/0x210
>  do_vfs_ioctl+0xa4/0x6a0
>  ? __fget+0x11d/0x210
>  SyS_ioctl+0x79/0x90
>  do_syscall_64+0x8f/0x750
>  ? trace_hardirqs_on_thunk+0x1a/0x1c
>  entry_SYSCALL64_slow_path+0x25/0x25
>
> This can be reproduced by booting an L1 guest with the 'noapic' grub
> parameter, which tells the kernel not to make use of any IOAPICs that may
> be present in the system.
>
> The external_intr variable in vmx_check_nested_events() is the req_int_win
> variable passed from vcpu_enter_guest(), which is set when L0's userspace
> requests an IRQ window. I observed that the scenario
> (!kvm_cpu_has_interrupt(vcpu) && L0's userspace requests an IRQ window) is
> true, so there is no interrupt which L1 requires to inject to L2, and we
> should not attempt to emulate "Acknowledge interrupt on exit" for the IRQ
> window requirement in this scenario.
>
> The SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
> interruption information (valid interrupt) is always set to 1 on
> EXIT_REASON_EXTERNAL_INTERRUPT. We don't want to break hypervisors
> expecting an interrupt in that case, so we should do a userspace VM exit
> when the window is open and then inject the userspace interrupt with a
> VM exit.
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Signed-off-by: Wanpeng Li
> ---
> v2 -> v3:
>  * request an irq window and don't nested vmexit
> v1 -> v2:
>  * update patch description
>  * check nested_exit_intr_ack_set() first
>
>  arch/x86/kvm/vmx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 80b20e8..9ef2ec3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10761,7 +10761,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  		return 0;
>  	}
>
> -	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> +	if ((kvm_cpu_has_interrupt(vcpu) ||
> +	     (external_intr && !nested_exit_intr_ack_set(vcpu))) &&

I think it would be safer to also add something like the second hunk I
posted (that also takes nested_exit_on_intr() into account).

The issue is that we're allowing L2's GUEST_RFLAGS and
GUEST_INTERRUPTIBILITY_INFO to disable userspace interrupt injection even
though neither affects delivery of interrupts into L1.
This means that L2 can block/postpone the delivery to L1 by doing
"cli; busy_loop/normal_critical_section".

Thanks.

>  	    nested_exit_on_intr(vcpu)) {
>  		if (vmx->nested.nested_run_pending)
>  			return -EBUSY;
> --
> 2.7.4
>