Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751678AbdG1EW1 (ORCPT ); Fri, 28 Jul 2017 00:22:27 -0400 Received: from szxga01-in.huawei.com ([45.249.212.187]:10290 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751613AbdG1EWZ (ORCPT ); Fri, 28 Jul 2017 00:22:25 -0400 Message-ID: <597ABBEF.2080802@huawei.com> Date: Fri, 28 Jul 2017 12:22:07 +0800 From: "Longpeng (Mike)" User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1 MIME-Version: 1.0 To: Paolo Bonzini CC: , , Huangweidong , Gonglei , wangxin , =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , Subject: Re: [PATCH 3/4] KVM: VMX: simplify and fix vmx_vcpu_pi_load References: <20170606105707.23207-1-pbonzini@redhat.com> <20170606105707.23207-4-pbonzini@redhat.com> In-Reply-To: <20170606105707.23207-4-pbonzini@redhat.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.177.246.209] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020202.597ABBFA.007C,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2014-11-16 11:51:01, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 978c060c2d7ebf773ee420434bdba940 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5036 Lines: 156 On 2017/6/6 18:57, Paolo Bonzini wrote: > The simplify part: do not touch pi_desc.nv, we can set it when the > VCPU is first created. Likewise, pi_desc.sn is only handled by > vmx_vcpu_pi_load, do not touch it in __pi_post_block. > > The fix part: do not check kvm_arch_has_assigned_device, instead > check the SN bit to figure out whether vmx_vcpu_pi_put ran before. > This matches what the previous patch did in pi_post_block. > > Cc: Longpeng (Mike) > Cc: Huangweidong > Cc: Gonglei > Cc: wangxin > Cc: Radim Krčmář > Signed-off-by: Paolo Bonzini > --- > arch/x86/kvm/vmx.c | 68 ++++++++++++++++++++++++++++-------------------------- > 1 file changed, 35 insertions(+), 33 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 0f4714fe4908..81047f373747 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -2184,43 +2184,41 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) > struct pi_desc old, new; > unsigned int dest; > > - if (!kvm_arch_has_assigned_device(vcpu->kvm) || > - !irq_remapping_cap(IRQ_POSTING_CAP) || > - !kvm_vcpu_apicv_active(vcpu)) > + /* > + * In case of hot-plug or hot-unplug, we may have to undo > + * vmx_vcpu_pi_put even if there is no assigned device. And we > + * always keep PI.NDST up to date for simplicity: it makes the > + * code easier, and CPU migration is not a fast path. > + */ > + if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu) > + return; Hi Paolo, I'm confused with the following scenario: (suppose the VM has a assigned devices) step 1. the running vcpu is be preempted --> vmx_vcpu_pi_put [ SET pi.sn ] step 2. hot-unplug the assigned devices step 3. the vcpu is scheduled in --> vmx_vcpu_pi_load [ CLEAR pi.sn ] step 4. the running vcpu is be preempted again --> vmx_vcpu_pi_put [ direct return ] step 5. the vcpu is migrated to another pcpu step 6. the vcpu is scheduled in --> vmx_vcpu_pi_load [ above check fails and continue to execute the follow parts ] I think vmx_vcpu_pi_load should return direct in step6, because vmx_vcpu_pi_put in step4 did nothing. So maybe the above check has a potential problem. Please kindly figure out if I misunderstand anything important :) -- Regards, Longpeng(Mike) > + > + /* > + * First handle the simple case where no cmpxchg is necessary; just > + * allow posting non-urgent interrupts. > + * > + * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change > + * PI.NDST: pi_post_block will do it for us and the wakeup_handler > + * expects the VCPU to be on the blocked_vcpu_list that matches > + * PI.NDST. > + */ > + if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || > + vcpu->cpu == cpu) { > + pi_clear_sn(pi_desc); > return; > + } > > + /* The full case. */ > do { > old.control = new.control = pi_desc->control; > > - /* > - * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there > - * are two possible cases: > - * 1. After running 'pre_block', context switch > - * happened. For this case, 'sn' was set in > - * vmx_vcpu_put(), so we need to clear it here. > - * 2. After running 'pre_block', we were blocked, > - * and woken up by some other guy. For this case, > - * we don't need to do anything, 'pi_post_block' > - * will do everything for us. However, we cannot > - * check whether it is case #1 or case #2 here > - * (maybe, not needed), so we also clear sn here, > - * I think it is not a big deal. > - */ > - if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) { > - if (vcpu->cpu != cpu) { > - dest = cpu_physical_id(cpu); > - > - if (x2apic_enabled()) > - new.ndst = dest; > - else > - new.ndst = (dest << 8) & 0xFF00; > - } > + dest = cpu_physical_id(cpu); > > - /* set 'NV' to 'notification vector' */ > - new.nv = POSTED_INTR_VECTOR; > - } > + if (x2apic_enabled()) > + new.ndst = dest; > + else > + new.ndst = (dest << 8) & 0xFF00; > > - /* Allow posting non-urgent interrupts */ > new.sn = 0; > } while (cmpxchg(&pi_desc->control, old.control, > new.control) != old.control); > @@ -9259,6 +9257,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) > > vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED; > > + /* > + * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR > + * or POSTED_INTR_WAKEUP_VECTOR. > + */ > + vmx->pi_desc.nv = POSTED_INTR_VECTOR; > + vmx->pi_desc.sn = 1; > + > return &vmx->vcpu; > > free_vmcs: > @@ -11249,9 +11254,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu) > else > new.ndst = (dest << 8) & 0xFF00; > > - /* Allow posting non-urgent interrupts */ > - new.sn = 0; > - > /* set 'NV' to 'notification vector' */ > new.nv = POSTED_INTR_VECTOR; > } while (cmpxchg(&pi_desc->control, old.control, -- Regards, Longpeng(Mike)