Date: Thu, 19 Jul 2018 18:28:27 +0200
From: Radim Krčmář
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Vitaly Kuznetsov
Subject: Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
Message-ID: <20180719162826.GB11749@flask>
References: <1530598891-21370-1-git-send-email-wanpengli@tencent.com> <1530598891-21370-3-git-send-email-wanpengli@tencent.com>
In-Reply-To: <1530598891-21370-3-git-send-email-wanpengli@tencent.com>
X-Mailing-List: linux-kernel@vger.kernel.org

2018-07-03 14:21+0800, Wanpeng Li:
> From: Wanpeng Li
>
> Implement paravirtual apic hooks to enable PV IPIs.
>
> apic->send_IPI_mask
> apic->send_IPI_mask_allbutself
> apic->send_IPI_allbutself
> apic->send_IPI_all
>
> The PV IPIs supports maximal 128 vCPUs VM, it is big enough for cloud
> environment currently, supporting more vCPUs needs to introduce more
> complex logic, in the future this might be extended if needed.
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Cc: Vitaly Kuznetsov
> Signed-off-by: Wanpeng Li
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
>  }
>
>  #ifdef CONFIG_SMP
> +
> +#ifdef CONFIG_X86_64
> +static void __send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> +	unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
> +	int cpu, apic_id;
> +
> +	if (cpumask_empty(mask))
> +		return;
> +
> +	local_irq_save(flags);
> +
> +	for_each_cpu(cpu, mask) {
> +		apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> +		if (apic_id < BITS_PER_LONG)
> +			__set_bit(apic_id, &ipi_bitmap_low);
> +		else if (apic_id < 2 * BITS_PER_LONG)
> +			__set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);

It'd be nicer with 'unsigned long ipi_bitmap[2]' and a single

	__set_bit(apic_id, ipi_bitmap);

> +	}
> +
> +	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);

and

	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);

Still, the main problem is that we can only address 128 APICs.

A simple improvement would reuse the vector field (as we need only 8 bits)
and put an 'offset' in the rest.  The offset would say which cluster of 128
we are addressing.  24 bits of offset results in 2^31 total addressable
CPUs (we probably shouldn't even use that many bits).
The downside of this is that we can only address 128 at a time.

It's basically the same as x2apic cluster mode, only with a cluster size
of 128 instead of 16, so the code should be a straightforward port.
And because x2apic code doesn't seem to use any division by the cluster
size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and make
the cluster size 192. :)

But because it is very similar to x2apic, I'd really need some real
performance data to see if this benefits a real workload.
Hardware could further optimize LAPIC (apicv, vapic) in the future, which
we'd lose by using paravirt.  E.g.
AMD's acceleration should be superior to this when using < 8 VCPUs as they
can use logical xAPIC and send without VM exits (when all VCPUs are running).

> +
> +	local_irq_restore(flags);
> +}
> +
> +static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> +	__send_ipi_mask(mask, vector);
> +}
> +
> +static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
> +{
> +	unsigned int this_cpu = smp_processor_id();
> +	struct cpumask new_mask;
> +	const struct cpumask *local_mask;
> +
> +	cpumask_copy(&new_mask, mask);
> +	cpumask_clear_cpu(this_cpu, &new_mask);
> +	local_mask = &new_mask;
> +	__send_ipi_mask(local_mask, vector);
> +}
> +
> +static void kvm_send_ipi_allbutself(int vector)
> +{
> +	kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
> +}
> +
> +static void kvm_send_ipi_all(int vector)
> +{
> +	__send_ipi_mask(cpu_online_mask, vector);

These should be faster when using the native APIC shorthand -- is this
the "Broadcast" in your tests?

> +}
> +
> +/*
> + * Set the IPI entry points
> + */
> +static void kvm_setup_pv_ipi(void)
> +{
> +	apic->send_IPI_mask = kvm_send_ipi_mask;
> +	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> +	apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
> +	apic->send_IPI_all = kvm_send_ipi_all;
> +	pr_info("KVM setup pv IPIs\n");
> +}
> +#endif
> +
>  static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
>  {
>  	native_smp_prepare_cpus(max_cpus);
> @@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
>
>  static void __init kvm_apic_init(void)
>  {
> +#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> +		num_possible_cpus() <= 2 * BITS_PER_LONG)

It looks like num_possible_cpus() is actually NR_CPUS, so the feature would
never be used on a standard Linux distro.
And we're using APIC_ID, which can be higher even if the maximum CPU
number is lower.  Just remove it.
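
To make the 'offset' idea from earlier in this mail a bit more concrete,
here is a rough user-space sketch of how the third hypercall argument could
carry both the vector and a cluster number, with a two-word bitmap selecting
APIC IDs inside that cluster.  Everything named pv_ipi_* is hypothetical and
only for illustration; it is not the patch's ABI, and a real guest would pass
the resulting values to kvm_hypercall3() rather than inspect them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 8 bits hold the vector, the next 24 bits hold
 * the cluster number ("offset").  Not taken from the patch. */
#define PV_IPI_VECTOR_BITS   8
#define PV_IPI_CLUSTER_BITS  24
#define PV_IPI_CLUSTER_SIZE  128	/* two 64-bit bitmap words */

struct pv_ipi_args {
	uint64_t bitmap[2];	/* bit n selects APIC ID cluster * 128 + n */
	uint64_t vector_arg;	/* vector | (cluster << 8) */
};

/* Combine an 8-bit vector and a 24-bit cluster number into one argument. */
static uint64_t pv_ipi_pack(uint8_t vector, uint32_t cluster)
{
	assert(cluster < (UINT32_C(1) << PV_IPI_CLUSTER_BITS));
	return (uint64_t)vector | ((uint64_t)cluster << PV_IPI_VECTOR_BITS);
}

/*
 * Mark apic_id as a destination if it falls into the cluster already
 * encoded in args->vector_arg.  Returns 1 if the bit was set, 0 if the
 * APIC ID belongs to a different cluster and needs a separate call.
 */
static int pv_ipi_add(struct pv_ipi_args *args, uint32_t apic_id)
{
	uint32_t cluster = apic_id / PV_IPI_CLUSTER_SIZE;
	uint32_t bit = apic_id % PV_IPI_CLUSTER_SIZE;

	if (cluster != (args->vector_arg >> PV_IPI_VECTOR_BITS))
		return 0;

	args->bitmap[bit / 64] |= UINT64_C(1) << (bit % 64);
	return 1;
}
```

With this layout a caller would start one pv_ipi_args per cluster, e.g.
vector_arg = pv_ipi_pack(vector, 1) for APIC IDs 128..255, and issue one
hypercall per non-empty cluster.  24 cluster bits times 128 IDs per cluster
gives the 2^31 addressable IDs mentioned above.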