Date:   Tue, 14 Dec 2021 18:56:11 +0000
From:   Sean Christopherson <seanjc@google.com>
To:     Kechen Lu <kechenl@nvidia.com>
Cc:     kvm@vger.kernel.org, pbonzini@redhat.com, wanpengli@tencent.com,
        rkrcmar@redhat.com, vkuznets@redhat.com, somduttar@nvidia.com,
        linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] KVM: x86: add kvm per-vCPU exits disable capability
Message-ID: <Ybjoy5h9a8nKK9X4@google.com>
References: <20211214033227.264714-1-kechenl@nvidia.com>
In-Reply-To: <20211214033227.264714-1-kechenl@nvidia.com>

On Mon, Dec 13, 2021, Kechen Lu wrote:
> ---
>  Documentation/virt/kvm/api.rst  | 8 +++++++-
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/cpuid.c            | 2 +-
>  arch/x86/kvm/svm/svm.c          | 2 +-
>  arch/x86/kvm/vmx/vmx.c          | 4 ++--
>  arch/x86/kvm/x86.c              | 5 ++++-
>  arch/x86/kvm/x86.h              | 5 +++--
>  include/uapi/linux/kvm.h        | 4 +++-
>  8 files changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aeeb071c7688..9a44896dc950 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6580,6 +6580,9 @@ branch to guests' 0x200 interrupt vector.
>  
>  :Architectures: x86
>  :Parameters: args[0] defines which exits are disabled
> +             args[1] defines vCPU bitmask based on vCPU ID, 1 on
> +                     corresponding vCPU ID bit would enable exists
> +                     on that vCPU
>  :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
>  
>  Valid bits in args[0] are::
> @@ -6588,13 +6591,16 @@ Valid bits in args[0] are::
>    #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
>    #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
>    #define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
> +  #define KVM_X86_DISABLE_EXITS_PER_VCPU         (1UL << 63)

This doesn't scale.  args[1] is a single 64-bit mask, and there are already
plenty of use cases for VMs with 65+ vCPUs.  At a glance, I don't see anything
fundamentally wrong with simply supporting a vCPU-scoped ioctl().
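
To make the limit concrete, here is a rough userspace model.  The helper names
are made up for illustration and nothing below is a KVM API; it only shows that
a u64 mask has no way to name vCPU IDs 64 and up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a 64-bit mask express this vCPU ID at all? */
static bool vcpu_id_fits_in_mask(unsigned int vcpu_id)
{
	return vcpu_id < 64;
}

/* Hypothetical helper: is this vCPU selected by the 64-bit mask? */
static bool vcpu_selected(uint64_t mask, unsigned int vcpu_id)
{
	/* IDs 64+ cannot be expressed, so they can never be selected. */
	if (!vcpu_id_fits_in_mask(vcpu_id))
		return false;

	return (mask >> vcpu_id) & 1;
}
```

Even with every bit set, vcpu_selected() can't report anything useful for
vCPU 64, which is the scaling problem in a nutshell.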

The VM-scoped version already has an undocumented requirement that it be called
before vCPUs are created, because neither VMX nor SVM will update the controls
if exits are disabled after vCPUs are created.  That means the flags checked at
runtime can be purely per-vCPU, with each vCPU picking up the per-VM flags when
it is created.
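
Roughly, as a standalone model of that split.  This is completely untested
against real KVM: the structs and functions are illustrative stand-ins, not the
actual kvm/kvm_vcpu fields, and the per-vCPU override is hypothetical.

```c
#include <stdbool.h>

/* Illustrative stand-ins only; these are not the real KVM structures. */
struct vm_model {
	bool hlt_in_guest;	/* VM-wide default, set before vCPUs exist */
	int created_vcpus;
};

struct vcpu_model {
	bool hlt_in_guest;	/* the only flag consulted at runtime */
};

/* vCPU creation snapshots the VM-wide default into the vCPU. */
static void vcpu_model_create(struct vm_model *vm, struct vcpu_model *vcpu)
{
	vcpu->hlt_in_guest = vm->hlt_in_guest;
	vm->created_vcpus++;
}

/*
 * Hypothetical vCPU-scoped override.  Real code would presumably also have
 * to update the VMX/SVM controls for this vCPU; that part is not modeled.
 */
static void vcpu_model_disable_hlt_exits(struct vcpu_model *vcpu)
{
	vcpu->hlt_in_guest = true;
}

/* The runtime check looks only at the vCPU, never at the VM. */
static bool vcpu_model_hlt_in_guest(const struct vcpu_model *vcpu)
{
	return vcpu->hlt_in_guest;
}
```

With this shape, changing the VM-wide flag after a vCPU exists has no effect on
that vCPU, which matches the existing (undocumented) behavior.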

Probably worth formalizing that requirement too, e.g.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 85127b3e3690..6c9bc022a522 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5775,6 +5775,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
                if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
                        break;

+               mutex_lock(&kvm->lock);
+               if (kvm->created_vcpus)
+                       goto disable_exits_unlock;
+
                if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
                        kvm_can_mwait_in_guest())
                        kvm->arch.mwait_in_guest = true;
@@ -5785,6 +5789,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
                if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
                        kvm->arch.cstate_in_guest = true;
                r = 0;
+disable_exits_unlock:
+               mutex_unlock(&kvm->lock);
                break;
        case KVM_CAP_MSR_PLATFORM_INFO:
                kvm->arch.guest_can_read_msr_platform_info = cap->args[0];