From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Roman Kagan
Cc: kvm@vger.kernel.org, x86@kernel.org, Paolo Bonzini,
        Radim Krčmář, "K. Y.
Srinivasan" , Haiyang Zhang , Stephen Hemminger , "Michael Kelley \(EOSG\)" , Mohammed Gamal , Cathy Avery , Wanpeng Li , linux-kernel@vger.kernel.org Subject: Re: [PATCH v2 2/5] KVM: x86: hyperv: introduce vp_index_to_vcpu_idx mapping References: <20180628135313.17468-1-vkuznets@redhat.com> <20180628135313.17468-3-vkuznets@redhat.com> <20180629101134.GA15656@rkaganb.sw.ru> <87y3exdh2o.fsf@vitty.brq.redhat.com> <20180629111227.GB15656@rkaganb.sw.ru> Date: Fri, 29 Jun 2018 13:37:44 +0200 In-Reply-To: <20180629111227.GB15656@rkaganb.sw.ru> (Roman Kagan's message of "Fri, 29 Jun 2018 14:12:28 +0300") Message-ID: <87tvplddrr.fsf@vitty.brq.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 29 Jun 2018 11:37:47 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Fri, 29 Jun 2018 11:37:47 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'vkuznets@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Roman Kagan writes: > On Fri, Jun 29, 2018 at 12:26:23PM +0200, Vitaly Kuznetsov wrote: >> Roman Kagan writes: >> >> > On Thu, Jun 28, 2018 at 03:53:10PM +0200, Vitaly Kuznetsov wrote: >> >> While it is easy to get VP index from vCPU index the reverse task is hard. >> >> Basically, to solve it we have to walk all vCPUs checking if their VP index >> >> matches. For hypercalls like HvFlushVirtualAddress{List,Space}* and the >> >> upcoming HvSendSyntheticClusterIpi* where a single CPU may be specified in >> >> the whole set this is obviously sub-optimal. >> >> >> >> As VP index can be set to anything <= U32_MAX by userspace using plain >> >> [0..MAX_VP_INDEX] array is not a viable option. Use condensed sorted >> >> array with logarithmic search complexity instead. Use RCU to make read >> >> access as fast as possible and maintain atomicity of updates. >> > >> > Quoting TLFS 5.0C section 7.8.1: >> > >> >> Virtual processors are identified by using an index (VP index). The >> >> maximum number of virtual processors per partition supported by the >> >> current implementation of the hypervisor can be obtained through CPUID >> >> leaf 0x40000005. A virtual processor index must be less than the >> >> maximum number of virtual processors per partition. >> > >> > so this is a dense index, and VP_INDEX >= KVM_MAX_VCPUS is invalid. I >> > think we're better off enforcing this in kvm_hv_set_msr and keep the >> > translation simple. If the algorithm in get_vcpu_by_vpidx is not good >> > enough (and yes it can be made to return NULL early on vpidx >= >> > KVM_MAX_VCPUS instead of taking the slow path) then a simple index array >> > of KVM_MAX_VCPUS entries should certainly do. >> >> Sure, we can use pre-allocated [0..KVM_MAX_VCPUS] array instead and put >> limits on what userspace can assign VP_INDEX to. Howver, while thinking >> about it I decided to go with the more complex condensed array approach >> because the tendency is for KVM_MAX_VCPUS to grow and we will be >> pre-allocating more and more memory for no particular reason (so I think >> even 'struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]' in 'struct kvm' will need >> to be converted to something else eventually). > > We're talking of kilobytes here. 
>> Anyway, I'm flexible, and if you think we should go this way now, I'll
>> do it in v3. We can re-think this when we later decide to raise
>> KVM_MAX_VCPUS significantly.
>
> Although there's no strict requirement for that, I think every sensible
> userspace will allocate VP_INDEX linearly, resulting in it being equal
> to KVM's vcpu index. So we've yet to see a case where get_vcpu_by_vpidx
> doesn't take the fast path. If it ever starts appearing in the profiles
> we may consider optimizing it, but ATM I don't even think introducing
> the translation array is justified.

It was Radim who suggested it in the first place :-)

The problem we're trying to solve here is: with PV TLB flush and IPI we
need to walk through the supplied list of VP_INDEXes and get vCPU ids.
Usually they match, but in case they don't we fall back to a full scan
for every VP_INDEX in the supplied list. Now let's say we have 128 CPUs.
We'll need to perform up to 128 * 128 extra comparisons on every
hypercall. Not good.

So instead of using get_vcpu_by_vpidx() I opted for walking the whole
vCPU list and checking whether each vCPU's VP_INDEX is in the supplied
set (a rough sketch follows below). This way we end up with 128
comparisons in the example above (worst case scenario). However, we
lose in simple scenarios, like when only one VP_INDEX is specified in
the set: we still need to walk the whole list.

So having the translation array (one way or another) is IMO justified.
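For illustration, the walk I'm describing looks roughly like this.
hv_vp_in_sparse_set() and the function name are made up for the sketch;
kvm_for_each_vcpu() and vcpu_to_hv_vcpu() are the existing helpers:

/*
 * Sketch only (not the actual patch): mark all vCPUs whose VP_INDEX is
 * in the guest-supplied sparse set. One pass over the vCPUs with one
 * membership test each -- ~N comparisons total, instead of up to N
 * comparisons per VP_INDEX entry when get_vcpu_by_vpidx() misses its
 * fast path.
 */
static void hv_mark_vcpus_in_set(struct kvm *kvm, u64 valid_bank_mask,
				 u64 *sparse_banks,
				 unsigned long *vcpu_bitmap)
{
	struct kvm_vcpu *vcpu;
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		u32 vp_index = vcpu_to_hv_vcpu(vcpu)->vp_index;

		/* hypothetical helper: test VP_INDEX against the set */
		if (hv_vp_in_sparse_set(vp_index, valid_bank_mask,
					sparse_banks))
			__set_bit(i, vcpu_bitmap);
	}
}

--
Vitaly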