Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752083AbbLVGmP (ORCPT ); Tue, 22 Dec 2015 01:42:15 -0500 Received: from mail-pa0-f47.google.com ([209.85.220.47]:35743 "EHLO mail-pa0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751367AbbLVGmL (ORCPT ); Tue, 22 Dec 2015 01:42:11 -0500 Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts To: "Wu, Feng" , "pbonzini@redhat.com" , "rkrcmar@redhat.com" References: <1450229853-3886-1-git-send-email-feng.wu@intel.com> <1450229853-3886-3-git-send-email-feng.wu@intel.com> <56775ADB.10602@gmail.com> <56775D67.1030703@gmail.com> Cc: "kvm@vger.kernel.org" , "linux-kernel@vger.kernel.org" From: Yang Zhang Message-ID: <5678F0BE.2020409@gmail.com> Date: Tue, 22 Dec 2015 14:42:06 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6829 Lines: 201 On 2015/12/22 12:36, Wu, Feng wrote: > > >> -----Original Message----- >> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com] >> Sent: Monday, December 21, 2015 10:01 AM >> To: Wu, Feng ; pbonzini@redhat.com; >> rkrcmar@redhat.com >> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org >> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d >> posted-interrupts >> >> On 2015/12/21 9:55, Wu, Feng wrote: >>> >>> >>>> -----Original Message----- >>>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel- >>>> owner@vger.kernel.org] On Behalf Of Yang Zhang >>>> Sent: Monday, December 21, 2015 9:50 AM >>>> To: Wu, Feng ; pbonzini@redhat.com; >>>> rkrcmar@redhat.com >>>> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org >>>> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d >>>> posted-interrupts >>>> >>>> On 2015/12/16 9:37, Feng Wu wrote: >>>>> Use vector-hashing to deliver lowest-priority interrupts for >>>>> VT-d posted-interrupts. >>>>> >>>>> Signed-off-by: Feng Wu >>>>> --- >>>>> arch/x86/kvm/lapic.c | 67 >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> arch/x86/kvm/lapic.h | 2 ++ >>>>> arch/x86/kvm/vmx.c | 12 ++++++++-- >>>>> 3 files changed, 79 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >>>>> index e29001f..d4f2c8f 100644 >>>>> --- a/arch/x86/kvm/lapic.c >>>>> +++ b/arch/x86/kvm/lapic.c >>>>> @@ -854,6 +854,73 @@ out: >>>>> } >>>>> >>>>> /* >>>>> + * This routine handles lowest-priority interrupts using vector-hashing >>>>> + * mechanism. As an example, modern Intel CPUs use this method to >> handle >>>>> + * lowest-priority interrupts. >>>>> + * >>>>> + * Here is the details about the vector-hashing mechanism: >>>>> + * 1. For lowest-priority interrupts, store all the possible destination >>>>> + * vCPUs in an array. >>>>> + * 2. Use "guest vector % max number of destination vCPUs" to find the >> right >>>>> + * destination vCPU in the array for the lowest-priority interrupt. >>>>> + */ >>>>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm, >>>>> + struct kvm_lapic_irq *irq) >>>>> +{ >>>>> + struct kvm_apic_map *map; >>>>> + struct kvm_vcpu *vcpu = NULL; >>>>> + >>>>> + if (irq->shorthand) >>>>> + return NULL; >>>>> + >>>>> + rcu_read_lock(); >>>>> + map = rcu_dereference(kvm->arch.apic_map); >>>>> + >>>>> + if (!map) >>>>> + goto out; >>>>> + >>>>> + if ((irq->dest_mode != APIC_DEST_PHYSICAL) && >>>>> + kvm_lowest_prio_delivery(irq)) { >>>>> + u16 cid; >>>>> + int i, idx = 0; >>>>> + unsigned long bitmap = 1; >>>>> + unsigned int dest_vcpus = 0; >>>>> + struct kvm_lapic **dst = NULL; >>>>> + >>>>> + >>>>> + if (!kvm_apic_logical_map_valid(map)) >>>>> + goto out; >>>>> + >>>>> + apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap); >>>>> + >>>>> + if (cid >= ARRAY_SIZE(map->logical_map)) >>>>> + goto out; >>>>> + >>>>> + dst = map->logical_map[cid]; >>>>> + >>>>> + for_each_set_bit(i, &bitmap, 16) { >>>>> + if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) { >>>>> + clear_bit(i, &bitmap); >>>>> + continue; >>>>> + } >>>>> + } >>>>> + >>>>> + dest_vcpus = hweight16(bitmap); >>>>> + >>>>> + if (dest_vcpus != 0) { >>>>> + idx = kvm_vector_2_index(irq->vector, dest_vcpus, >>>>> + &bitmap, 16); >>>>> + vcpu = dst[idx-1]->vcpu; >>>>> + } >>>>> + } >>>>> + >>>>> +out: >>>>> + rcu_read_unlock(); >>>>> + return vcpu; >>>>> +} >>>>> +EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest); >>>>> + >>>>> +/* >>>>> * Add a pending IRQ into lapic. >>>>> * Return 1 if successfully added and 0 if discarded. >>>>> */ >>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h >>>>> index 6890ef0..52bffce 100644 >>>>> --- a/arch/x86/kvm/lapic.h >>>>> +++ b/arch/x86/kvm/lapic.h >>>>> @@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, >>>> struct kvm_lapic_irq *irq, >>>>> struct kvm_vcpu **dest_vcpu); >>>>> int kvm_vector_2_index(u32 vector, u32 dest_vcpus, >>>>> const unsigned long *bitmap, u32 bitmap_size); >>>>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm, >>>>> + struct kvm_lapic_irq *irq); >>>>> #endif >>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>>> index 5eb56ed..3f89189 100644 >>>>> --- a/arch/x86/kvm/vmx.c >>>>> +++ b/arch/x86/kvm/vmx.c >>>>> @@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm >> *kvm, >>>> unsigned int host_irq, >>>>> */ >>>>> >>>>> kvm_set_msi_irq(e, &irq); >>>>> - if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) >>>>> - continue; >>>>> + >>>>> + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) { >>>>> + if (!kvm_vector_hashing_enabled() || >>>>> + irq.delivery_mode != >>>> APIC_DM_LOWEST) >>>>> + continue; >>>>> + >>>>> + vcpu = kvm_intr_vector_hashing_dest(kvm, &irq); >>>>> + if (!vcpu) >>>>> + continue; >>>>> + } >>>> >>>> I am a little confused with the 'continue'. If the destination is not >>>> single vcpu, shouldn't we rollback to use non-PI mode? >>> >>> Here is the logic: >>> - If it is single destination, we will use PI no matter it is fixed or lowest-priority. >>> - If it is not single destination: >>> a) It is fixed, we will use non-PI >>> b) It is lowest-priority and vector-hashing is enabled, we will use PI >>> c) otherwise, use non-PI >> >> If it is single destination previously, then change to no-single mode. >> Can current code cover this case? > > In my test, before setting irq affinity (change single vcpu to non-single vcpu > in this case), the guest will mask the interrupt first, so before getting here, IRTE > has been changed back to remapped mode already(when guest masks the MSIx, > we will change back to remapped mode), hence nothing needed here. > > Digging into the linux code (guest) a bit more, I found that if interrupt remapping > is not enabled in the guest (IR is not supported for guest anyway), it will always > mask the MSI/MSIx before setting the irq affinity. So the code should work > well currently. We should not rely on guest's behavior. From code level, it need be fixed. > > However, for robustness, I think explicitly changing IRTE back to remapped > mode for the 'continue' case should be a good idea. This is what i am looking for. > > Radim, Paolo, what are your guys' options about this? Any comments are > appreciated! Thanks a lot! > > Thanks, > Feng > -- best regards yang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/