Date: Thu, 8 Feb 2018 20:04:43 +0800
From: Chao Gao
To: Liran Alon
Cc: pbonzini@redhat.com, mingo@redhat.com, x86@kernel.org, tglx@linutronix.de,
    rkrcmar@redhat.com, linux-kernel@vger.kernel.org, hpa@zytor.com,
    kvm@vger.kernel.org
Subject: Re: [PATCH] x86/kvm/vmx: Don't halt vcpu when L1 is injecting events to L2
Message-ID: <20180208120441.GA118540@skl-4s-chao.sh.intel.com>
References: <94d2625d-5055-4834-ba3c-b1a25117b762@default>
In-Reply-To: <94d2625d-5055-4834-ba3c-b1a25117b762@default>

On Thu, Feb 08, 2018 at 04:09:36AM -0800, Liran Alon wrote:
>
>----- pbonzini@redhat.com wrote:
>
>> On 08/02/2018 06:13, Chao Gao wrote:
>> > Although L2 is in halt state, it will be in the active state after
>> > VM entry if the VM entry is vectoring. Halting the vcpu here means
>> > the event won't be injected to L2 and this decision isn't reported
>> > to L1. Thus L0 drops an event that should be injected to L2.
>> >
>> > Because virtual interrupt delivery may wake L2 vcpu, if VID is enabled,
>> > do the same thing -- don't halt L2.
>>
>> This second part seems wrong to me, or at least overly general.
>> Perhaps you mean if RVI>0?
>>
>> Thanks,
>>
>> Paolo
>
>I would first recommend splitting this commit.
>The first commit should handle only the case of a vectoring VM entry.
>It should also specify in the commit message that it is based on Intel SDM 26.6.2 Activity State:
>("If the VM entry is vectoring, the logical processor is in the active state after VM entry.")
>That part of the code seems correct to me.
>
>The second commit seems wrong to me as well.
>(I would also mention here that it is based on Intel SDM 26.6.5
>Interrupt-Window Exiting and Virtual-Interrupt Delivery:
>"These events wake the logical processor if it just entered the HLT state because of a VM entry")
>
>Paolo, I think that your suggestion is not sufficient either.
>Consider the case where the APIC's TPR blocks the interrupt specified in RVI.
>
>I think that we should just remove the check for VID from the commit,
>and instead fix another bug which I describe below.
>
>If lapic_in_kernel()==false, then APICv is not active and VID is not exposed to L1
>(in current KVM code; Jim already criticized that this is wrong).
>
>Otherwise, kvm_vcpu_halt() will change mp_state to KVM_MP_STATE_HALTED.
>Eventually, vcpu_run() will call vcpu_block(), which will reach kvm_vcpu_has_events().
>That function is responsible for checking if there are any pending interrupts,
>including pending interrupts as a result of VID being enabled and RVI>0

The difference between checking VID and RVI here and in vcpu_run() is that
"nested_run_pending" is set in the former case. Would it cause any problem
if we don't set it?

Thanks
Chao

>(while also taking into account the APIC's TPR).
>The logic that checks for pending interrupts is kvm_cpu_has_interrupt(),
>which eventually reaches apic_has_interrupt_for_ppr().
>If APICv is enabled, apic_has_interrupt_for_ppr() will call vmx_sync_pir_to_irr(),
>which calls vmx_hwapic_irr_update().
>
>However, the max_irr returned to apic_has_interrupt_for_ppr() does not consider the
>interrupt pending in RVI, which I think is the real bug to fix here.
>In the non-nested case, RVI can never be larger than max_irr, because that is how L0 KVM manages RVI.
>However, in the nested case, L1 can set RVI in the VMCS arbitrarily
>(we just copy GUEST_INTR_STATUS from vmcs12 into vmcs02).
>
>A possible patch to fix this is to change vmx_hwapic_irr_update() such that
>if is_guest_mode(vcpu)==true, we should return max(max_irr, rvi) and propagate
>that value to apic_has_interrupt_for_ppr().
>Need to verify that it doesn't break other flows, but I think it makes sense.
>What do you think?
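
A rough sketch of that idea, purely for illustration: the helper name below
is made up, and a real patch would also have to deal with
vmx_hwapic_irr_update()'s void return type and its existing nested-exit
handling.

	/*
	 * Illustrative only: highest pending vector to consider while L2
	 * runs.  L1 may have handed us an arbitrary RVI through vmcs12, so
	 * it is not necessarily covered by L0's max_irr bookkeeping.
	 */
	static int effective_max_irr(struct kvm_vcpu *vcpu, int max_irr)
	{
		if (is_guest_mode(vcpu)) {
			/* RVI is the low byte of the guest interrupt status. */
			u8 rvi = vmcs_read16(GUEST_INTR_STATUS) & 0xff;

			max_irr = max(max_irr, (int)rvi);
		}

		return max_irr;
	}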
>
>Regards,
>-Liran
>
>>
>> > Signed-off-by: Chao Gao
>> > ---
>> >  arch/x86/kvm/vmx.c | 10 ++++++++--
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> > index bb5b488..e1fe4e4 100644
>> > --- a/arch/x86/kvm/vmx.c
>> > +++ b/arch/x86/kvm/vmx.c
>> > @@ -10985,8 +10985,14 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>> >  	if (ret)
>> >  		return ret;
>> >
>> > -	if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
>> > -		return kvm_vcpu_halt(vcpu);
>> > +	if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
>> > +		u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
>> > +		u32 exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
>> > +
>> > +		if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
>> > +		    !(exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY))
>> > +			return kvm_vcpu_halt(vcpu);
>> > +	}
>>
>> >  	vmx->nested.nested_run_pending = 1;
>> >
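
For comparison, a minimal sketch of Paolo's "RVI>0" variant of the hunk
above (illustrative only, not a tested patch; as discussed above it is
still incomplete, since the APIC's TPR can mask the vector held in RVI):

	if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) {
		u32 intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
		/* RVI is the low byte of the guest interrupt status field. */
		u8 rvi = vmcs12->guest_intr_status & 0xff;

		/*
		 * Stay halted only if this VM entry is not vectoring and no
		 * virtual interrupt is pending for virtual-interrupt delivery.
		 */
		if (!(intr_info & VECTORING_INFO_VALID_MASK) &&
		    !(nested_cpu_has_vid(vmcs12) && rvi))
			return kvm_vcpu_halt(vcpu);
	}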