Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18;
Message-ID: <2207797e-6441-8abc-9ffc-d231fa4ca3fc@intel.com>
Date:   Tue, 25 Jan 2022 00:40:01 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Firefox/91.0 Thunderbird/91.5.0
Subject: Re: [PATCH v5 8/8] KVM: VMX: Resize PID-ponter table on demand for
 IPI virtualization
Content-Language: en-US
To:     Sean Christopherson <seanjc@google.com>
Cc:     Paolo Bonzini <pbonzini@redhat.com>,
        Vitaly Kuznetsov <vkuznets@redhat.com>,
        Wanpeng Li <wanpengli@tencent.com>,
        Jim Mattson <jmattson@google.com>,
        Joerg Roedel <joro@8bytes.org>,
        "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        "Luck, Tony" <tony.luck@intel.com>,
        Kan Liang <kan.liang@linux.intel.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Kim Phillips <kim.phillips@amd.com>,
        Jarkko Sakkinen <jarkko@kernel.org>,
        Jethro Beekman <jethro@fortanix.com>,
        "Huang, Kai" <kai.huang@intel.com>,
        "x86@kernel.org" <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "Hu, Robert" <robert.hu@intel.com>,
        "Gao, Chao" <chao.gao@intel.com>
References: <20211231142849.611-1-guang.zeng@intel.com>
 <20211231142849.611-9-guang.zeng@intel.com> <YeCjHbdAikyIFQc9@google.com>
 <43200b86-aa40-f7a3-d571-dc5fc3ebd421@intel.com>
 <YeGiVCn0wNH9eqxX@google.com>
 <67262b95-d577-0620-79bf-20fc37906869@intel.com>
 <Yeb1vkEclYzD27R/@google.com>
 <aba84be5-562a-369e-913d-1b834c141cc6@intel.com>
 <Yei0d0KVnNphPrP3@google.com>
From:   Zeng Guang <guang.zeng@intel.com>
In-Reply-To: <Yei0d0KVnNphPrP3@google.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Precedence: bulk

On 1/20/2022 9:01 AM, Sean Christopherson wrote:
> On Wed, Jan 19, 2022, Zeng Guang wrote:
>> It's self-adaptive , standalone function module in kvm, no any extra
>> limitation introduced
> I disagree.  Its failure mode on OOM is to degrade guest performance, _that_ is
> a limitation.  OOM is absolutely something that should be immediately communicated
> to userspace in a way that userspace can take action.
If memory allocation fails, the PID-pointer table stops being updated and
KVM keeps using the old one.  All IPIs sent from other vcpus to the newly
created vcpu will then go through APIC-write VM-exits and get no performance
benefit from IPI virtualization.  Right, that is a limitation, though it
doesn't affect the effectiveness of IPI virtualization among the existing
vcpus.
>> and scalable even future extension on KVM_MAX_VCPU_IDS or new apic id
>> implementation released.
>>
>> How do you think ? :)
> Heh, I think I've made it quite clear that I think it's unnecesary complexity in
> KVM.  It's not a hill I'll die on, e.g. if Paolo and others feel it's the right
> approach then so be it, but I really, really dislike the idea of dynamically
> changing the table, KVM has a long and sordid history of botching those types
> of flows/features.

To follow your proposal, we are considering the following implementation:
1. Define a new field apic_id_limit in struct kvm_arch, initialized to
KVM_MAX_VCPU_IDS by default.

2. Add a new vm ioctl, KVM_SET_APICID_LIMIT, that lets user space set the
max apic id required for the vm before any vcpu is created. Currently QEMU
calculates the APIC ID limit for up to the max cpus assigned, including
hotpluggable cpus. It simply uses the package/die/core/smt model to get the
bit width of the id field at each level (not fully compliant with CPUID
1f/0b) and builds the apic id for a specific vcpu index. We can notify kvm
of this apic id limit to ensure enough memory is allocated for the PID table.

3. At vcpu creation, check that the id is less than
min(apic_id_limit, KVM_MAX_VCPU_IDS). Otherwise return an error.

4. During the first vcpu creation, allocate memory for a PID table that
covers vcpus with ids up to apic_id_limit. Proper locking is still needed
to protect the PID table setup from race conditions. If OOM happens, the
current vcpu creation fails as well and the error is returned to user space.
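
To make the flow in steps 1-4 concrete, here is a rough user-space model.
It is only a sketch, not kernel code: struct vm_model, the three vm_*()
helpers, the pthread mutex, the value chosen for KVM_MAX_VCPU_IDS and the
-EBUSY/-EINVAL error codes are all assumptions made for illustration. Only
apic_id_limit, KVM_MAX_VCPU_IDS and KVM_SET_APICID_LIMIT come from the
proposal above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define KVM_MAX_VCPU_IDS 4096   /* assumed value, for illustration only */

/* Hypothetical stand-in for the relevant part of struct kvm_arch. */
struct vm_model {
    pthread_mutex_t lock;       /* protects everything below */
    uint32_t apic_id_limit;     /* step 1: defaults to KVM_MAX_VCPU_IDS */
    uint64_t *pid_table;        /* one 64-bit PID pointer per possible id */
    unsigned int created_vcpus;
};

static void vm_init(struct vm_model *vm)
{
    pthread_mutex_init(&vm->lock, NULL);
    vm->apic_id_limit = KVM_MAX_VCPU_IDS;
    vm->pid_table = NULL;
    vm->created_vcpus = 0;
}

/* Step 2: model of the proposed KVM_SET_APICID_LIMIT vm ioctl. */
static int vm_set_apicid_limit(struct vm_model *vm, uint32_t limit)
{
    int ret = 0;

    pthread_mutex_lock(&vm->lock);
    if (vm->created_vcpus)
        ret = -EBUSY;           /* only allowed before vcpu creation */
    else if (limit == 0 || limit > KVM_MAX_VCPU_IDS)
        ret = -EINVAL;
    else
        vm->apic_id_limit = limit;
    pthread_mutex_unlock(&vm->lock);
    return ret;
}

/* Steps 3 and 4: check the id, then allocate the table exactly once. */
static int vm_create_vcpu(struct vm_model *vm, uint32_t id)
{
    uint32_t limit;
    int ret = 0;

    pthread_mutex_lock(&vm->lock);
    limit = vm->apic_id_limit < KVM_MAX_VCPU_IDS ?
            vm->apic_id_limit : KVM_MAX_VCPU_IDS;
    if (id >= limit) {
        ret = -EINVAL;          /* step 3: id out of range */
        goto out;
    }
    if (!vm->pid_table) {
        /* Step 4: first vcpu, size the table for every allowed id. */
        vm->pid_table = calloc(limit, sizeof(*vm->pid_table));
        if (!vm->pid_table) {
            ret = -ENOMEM;      /* OOM fails this vcpu creation */
            goto out;
        }
    }
    assert(vm->pid_table != NULL);
    vm->created_vcpus++;
out:
    pthread_mutex_unlock(&vm->lock);
    return ret;
}
```

With this shape the table size is fixed before any vcpu can use it, so it
never has to be resized or swapped at runtime, and an allocation failure
shows up to user space as a failed vcpu creation.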

Please let us know whether we can proceed with this solution. Thanks.