Date: Wed, 19 Jan 2022 15:55:47 +0800
From: Zeng Guang
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, Dave Hansen, "Luck, Tony",
 Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 "H. Peter Anvin", Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
 "Huang, Kai", x86@kernel.org, linux-kernel@vger.kernel.org,
 "Hu, Robert", "Gao, Chao"
Subject: Re: [PATCH v5 8/8] KVM: VMX: Resize PID-pointer table on demand
 for IPI virtualization
References: <20211231142849.611-1-guang.zeng@intel.com>
 <20211231142849.611-9-guang.zeng@intel.com>
 <43200b86-aa40-f7a3-d571-dc5fc3ebd421@intel.com>
 <67262b95-d577-0620-79bf-20fc37906869@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/19/2022 1:15 AM, Sean Christopherson wrote:
> On Mon, Jan 17, 2022, Zeng Guang wrote:
>> On 1/15/2022 12:18 AM, Sean Christopherson wrote:
>>> Userspace can simply do KVM_CREATE_VCPU until it hits KVM_MAX_VCPU_IDS...
>> IIUC, what you proposed is to use max_vcpus in KVM for the x86 arch
>> (currently not present yet) and provide a new API for userspace to tell
>> KVM how many vCPUs the current VM session has prior to vCPU creation,
>> so that IPIv can set up the PID table with this information in one shot.
>> I think this leaves several things uncertain:
>> 1. We cannot identify the exact max APIC ID corresponding to max_vcpus.
>> The APIC ID definition is platform dependent. In theory a large APIC ID
>> could be assigned to one vCPU even when running with a small max_vcpus,
>> so we cannot derive the max supported APIC ID from max_vcpus.
> Gah, I conflated KVM_CAP_MAX_VCPUS and KVM_MAX_VCPU_IDS. But the underlying idea
> still works: extend KVM_MAX_VCPU_IDS to allow userspace to lower the max allowed
> vCPU ID to reduce the memory footprint of densely "packed" and/or small VMs.

Possibly it won't work as well as expected. From the user's perspective,
assigning a max APIC ID requires knowledge of the APIC ID implementation
on each platform, and it's hard for users to determine an appropriate
value for every VM session. Users may know their exact demand for vCPU
resources, such as the SMP CPU count or the max CPUs for CPU hotplug, but
quite possibly neither know nor care what the APIC IDs should be. If an
improper value is provided, we not only miss the goal of reducing the
memory footprint, but may also see unexpected failures at vCPU creation,
e.g. when an actual vCPU ID (= APIC ID) is larger than the assigned max
APIC ID. So this solution still seems to have a potential problem.
Besides, it also requires changes in the userspace hypervisor (QEMU etc.)
and in KVM (arch code, vCPU creation policy, etc.), which unnecessarily
couples those modules together. From this point of view, it offers little
advantage beyond simplifying IPIv's memory management of the PID table.

>> 2. We cannot optimize the memory consumption of the PID table to the
>> minimum at run time.
>> In the case "-smp small_n,maxcpus=large_N", KVM has to allocate memory
>> to accommodate large_N vCPUs from the beginning, no matter whether all
>> maxcpus will ever run.
> That's a feature. E.g. if userspace defines a max vCPU ID that is larger than
> what is required at boot, e.g. to hotplug vCPUs, then consuming a few extra pages
> of memory to ensure that IPIv will be supported for hotplugged vCPUs is very
> desirable behavior. Observing poor performance on hotplugged vCPUs because the
> host was under memory pressure is far worse.
>
> And the goal isn't to achieve the smallest memory footprint possible, it's to
> avoid allocating 32kb of memory when userspace wants to run a VM with only a
> handful of vCPUs, i.e. when 4kb will suffice. Consuming 32kb of memory for a VM
> with hundreds of vCPUs is a non-issue, e.g. it's highly unlikely to be running
> multiple such VMs on a single host, and such hosts will likely have hundreds of
> gb of RAM. Conversely, hosts running small VMs will likely run tens or hundreds
> of small VMs, e.g. for container scenarios, in which case reducing the per-VM
> memory footprint is much more valuable and also easier to achieve.

Agree. That is the purpose of this patch. With the solution we proposed,
IPIv uses as little memory as possible in all kinds of scenarios, keeping
the table at 4KB in most cases instead of 32KB. It's self-adaptive, a
standalone function module in KVM, introduces no extra limitation, and
scales even with a future extension of KVM_MAX_VCPU_IDS or a new APIC ID
implementation. What do you think? :)
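
To make the self-adaptive behavior concrete, the core of the idea is
roughly the following. This is an illustrative sketch only: the helper
and field names (vmx_expand_pid_table(), pid_table, pid_table_order) are
placeholders, not necessarily the literal code in this series.

/*
 * Sketch: grow the PID-pointer table on demand at vCPU creation.
 *
 * Each table entry is 8 bytes, so a single 4KB page covers vCPU IDs
 * 0..511; covering the full KVM_MAX_VCPU_IDS range (4096) would need
 * 32KB. The table starts at one page and grows only when a vCPU with
 * a larger ID is actually created.
 */
static int vmx_expand_pid_table(struct kvm_vmx *kvm_vmx, u32 vcpu_id)
{
	unsigned int order = get_order((vcpu_id + 1) * sizeof(u64));
	u64 *new_table;

	/* Nothing to do if the current table already covers this ID. */
	if (kvm_vmx->pid_table && order <= kvm_vmx->pid_table_order)
		return 0;

	new_table = (u64 *)__get_free_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO,
					    order);
	if (!new_table)
		return -ENOMEM;

	if (kvm_vmx->pid_table) {
		/* Carry over the PID pointers of already-created vCPUs. */
		memcpy(new_table, kvm_vmx->pid_table,
		       PAGE_SIZE << kvm_vmx->pid_table_order);
		free_pages((unsigned long)kvm_vmx->pid_table,
			   kvm_vmx->pid_table_order);
	}

	kvm_vmx->pid_table = new_table;
	kvm_vmx->pid_table_order = order;

	/*
	 * The real code must also propagate the new table address into
	 * each vCPU's VMCS and serialize against concurrent vCPU
	 * creation; that plumbing is omitted here.
	 */
	return 0;
}

With 8-byte entries, a VM whose highest vCPU ID stays below 512 never
grows past the initial 4KB page, which is where the 4KB-vs-32KB numbers
above come from.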
>> 3. Potential backward-compatibility problem
>> If running with an old QEMU version, KVM cannot get the expected
>> information and has to fall back to KVM_MAX_VCPU_IDS by default.
>> That's feasible, but brings no memory optimization for the PID table.
> That's totally fine. This is purely a memory optimization, IPIv will still work
> as intended if userspace doesn't lower the max vCPU ID, it'll just consume a bit
> more memory.
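
For completeness, the userspace side of the suggested alternative would
look roughly like this, assuming KVM_CAP_MAX_VCPU_ID were extended to be
settable through KVM_ENABLE_CAP. Today that capability is query-only, so
this exact call, and the cap_max_vcpu_id() helper, are hypothetical.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical userspace flow: cap the VM's maximum vCPU ID before
 * creating any vCPUs, so KVM can size the PID-pointer table once.
 * Making KVM_CAP_MAX_VCPU_ID writable is the proposed extension,
 * not existing ABI.
 */
int cap_max_vcpu_id(int vm_fd, __u32 max_vcpu_id)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_MAX_VCPU_ID;
	cap.args[0] = max_vcpu_id;

	/* 0 on success, -1 with errno set on failure. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

A VMM that knows it will create at most, say, 8 vCPUs with contiguous
APIC IDs could invoke this right after KVM_CREATE_VM; the objection in
point 1 above is precisely that APIC IDs are not guaranteed to be
contiguous or small, so userspace may not be able to pick a safe value.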