Date: Wed, 2 Aug 2017 22:26:29 +0200
From: Radim =?utf-8?B?S3LEjW3DocWZ?= <rkrcmar@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
        Paolo Bonzini <pbonzini@redhat.com>,
        Wanpeng Li <wanpeng.li@hotmail.com>
Subject: Re: [PATCH v3] KVM: nVMX: Fix attempting to emulate "Acknowledge
 interrupt on exit" when there is no interrupt which L1 requires to inject to
 L2
Message-ID: <20170802202628.GB32403@flask>
References: <1501670903-3368-1-git-send-email-wanpeng.li@hotmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1501670903-3368-1-git-send-email-wanpeng.li@hotmail.com>
Sender: linux-kernel-owner@vger.kernel.org

2017-08-02 03:48-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> ------------[ cut here ]------------
>  WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>  CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
>  RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
> Call Trace:
>   vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>   ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>   kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
>   ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
>   ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>   kvm_vcpu_ioctl+0x340/0x700 [kvm]
>   ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>   ? __fget+0xfc/0x210
>   do_vfs_ioctl+0xa4/0x6a0
>   ? __fget+0x11d/0x210
>   SyS_ioctl+0x79/0x90
>   do_syscall_64+0x8f/0x750
>   ? trace_hardirqs_on_thunk+0x1a/0x1c
>   entry_SYSCALL64_slow_path+0x25/0x25
> 
> This can be reproduced by booting the L1 guest with the 'noapic' grub
> parameter, which tells the kernel not to make use of any IOAPICs that may be
> present in the system.
> 
> Actually, the external_intr variable in nested_vmx_vmexit() is the req_int_win
> variable passed from vcpu_enter_guest(), which means that L0's userspace
> requests an irq window. I observed that the scenario (!kvm_cpu_has_interrupt(vcpu) &&
> L0's userspace requests an irq window) is true, so there is no interrupt which
> L1 requires to inject to L2; we should not attempt to emulate "Acknowledge
> interrupt on exit" for the irq window requirement in this scenario.
> 
> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
> interrupt information (valid interrupt) is always set to 1 on
> EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
> expecting an interrupt in that case, so we should do a userspace VM exit
> when the window is open and then inject the userspace interrupt with a
> VM exit.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v2 -> v3:
>  * request an irq window and don't do a nested vmexit
> v1 -> v2:
>  * update patch description
>  * check nested_exit_intr_ack_set() first 
> 
>  arch/x86/kvm/vmx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 80b20e8..9ef2ec3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10761,7 +10761,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>  		return 0;
>  	}
>  
> -	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> +	if ((kvm_cpu_has_interrupt(vcpu) ||
> +	    (external_intr && !nested_exit_intr_ack_set(vcpu))) &&

I think it would be safer to also add something like the second hunk I
posted (that also takes nested_exit_on_intr() into account).

The issue is that we're allowing L2's GUEST_RFLAGS and
GUEST_INTERRUPTIBILITY_INFO to disable userspace interrupt injection
even though neither affects delivery of interrupts into L1.
This means that L2 can block/postpone the delivery to L1 by doing "cli;
busy_loop/normal_critical_section".
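
For reference, the condition in the hunk above can be modelled as a plain
boolean function. This is only a stand-alone sketch, not kernel code: the
function and parameter names are hypothetical, and the four booleans are
assumed to stand in for kvm_cpu_has_interrupt(vcpu), external_intr,
nested_exit_intr_ack_set(vcpu) and nested_exit_on_intr(vcpu). It does not
cover the window-open check discussed in this reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the condition before the patch:
 * a bare irq window request is enough to trigger the nested exit. */
static bool intr_exit_before(bool has_interrupt, bool external_intr,
			     bool intr_ack_set, bool exit_on_intr)
{
	(void)intr_ack_set;	/* not consulted before the patch */
	return (has_interrupt || external_intr) && exit_on_intr;
}

/* Hypothetical model of the condition after the patch:
 * an irq window request only counts when "Acknowledge interrupt on
 * exit" is not set. */
static bool intr_exit_after(bool has_interrupt, bool external_intr,
			    bool intr_ack_set, bool exit_on_intr)
{
	return (has_interrupt || (external_intr && !intr_ack_set)) &&
	       exit_on_intr;
}

/* The reported scenario: no pending interrupt, userspace requests an
 * irq window, and L1 uses "Acknowledge interrupt on exit". */
static void check_reported_scenario(void)
{
	assert(intr_exit_before(false, true, true, true));
	assert(!intr_exit_after(false, true, true, true));
}
```

In this model the old condition takes the nested exit in the reported
scenario with no interrupt to acknowledge, and the new condition does not.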

Thanks.

>  	    nested_exit_on_intr(vcpu)) {
>  		if (vmx->nested.nested_run_pending)
>  			return -EBUSY;
> -- 
> 2.7.4
>