From: Thomas Gleixner
To: Yang Zhong, x86@kernel.org, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, pbonzini@redhat.com
Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com,
 jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com
Subject: Re: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly
In-Reply-To: <20211208000359.2853257-16-yang.zhong@intel.com>
References: <20211208000359.2853257-1-yang.zhong@intel.com>
 <20211208000359.2853257-16-yang.zhong@intel.com>
Date: Sat, 11 Dec 2021 01:10:47 +0100
Message-ID: <87pmq4vw54.ffs@tglx>

On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 5089f2e7dc22..9811dc98d550 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>  	fpstate->is_guest = true;
>  
>  	gfpu->fpstate = fpstate;
> +	gfpu->xfd_err = XFD_ERR_GUEST_DISABLED;

This wants to be part of the previous patch, which introduces the
field.

>  	gfpu->user_xfeatures = fpu_user_cfg.default_features;
>  	gfpu->user_perm = fpu_user_cfg.default_features;
>  	fpu_init_guest_permissions(gfpu);
> @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
>  		fpu->fpstate = guest_fps;
>  		guest_fps->in_use = true;
>  	} else {
> +		fpu_save_guest_xfd_err(guest_fpu);

Hmm. See below.
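[ For context: fpu_save_guest_xfd_err() is introduced by an earlier
  patch in this series and is not quoted here. Presumably it boils down
  to something like the sketch below; the XFD_ERR_GUEST_DISABLED early
  return is an assumption inferred from the hunk above, not code taken
  from the series:

	static void fpu_save_guest_xfd_err(struct fpu_guest *guest_fpu)
	{
		/* Nothing to do when guest XFD_ERR tracking is disabled */
		if (guest_fpu->xfd_err == XFD_ERR_GUEST_DISABLED)
			return;
		/* Preserve the guest's XFD_ERR before it can be clobbered */
		rdmsrl(MSR_XFD_ERR, guest_fpu->xfd_err);
	} ]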
>  		guest_fps->in_use = false;
>  		fpu->fpstate = fpu->__task_fpstate;
>  		fpu->__task_fpstate = NULL;

> @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_steal_time_set_preempted(vcpu);
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  
> +	if (vcpu->preempted)
> +		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);

I'm not really excited about the thought of an exception cause register
being part of guest clobbered state.

Aside of that, I really have to ask why all of this is needed in the
first place. #NM in the guest is a slow path, right? So why are you
trying to optimize for it?

The straightforward solution to this is:

 1) Trap #NM and MSR_XFD_ERR writes

 2) When the guest triggers #NM it takes a VMEXIT and the host does:

        rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

    injects the #NM and goes on.

 3) When the guest writes to MSR_XFD_ERR it takes a VMEXIT and the
    host does:

        vcpu->arch.guest_fpu.xfd_err = msrval;
        wrmsrl(MSR_XFD_ERR, msrval);

    and goes back.

 4) Before entering the preemption disabled section of the VCPU loop
    do:

        if (vcpu->arch.guest_fpu.xfd_err)
                wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

 5) Before leaving the preemption disabled section of the VCPU loop
    do:

        if (vcpu->arch.guest_fpu.xfd_err)
                wrmsrl(MSR_XFD_ERR, 0);

It's really that simple and pretty much zero overhead for the regular
case. If the guest triggers #NM at high frequency, then taking the
VMEXITs is the least of the problems. That's not a realistic use case,
really. A consolidated sketch of 2) and 3) is appended below the
sign-off.

Hmm?

Thanks,

        tglx
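---
Appended: a completely untested sketch of what 2) and 3) could look
like on the KVM side. handle_exception_nm() and kvm_set_xfd_err() are
made-up names for illustration, not functions from this series:

	/* #NM intercept: 2) above */
	static int handle_exception_nm(struct kvm_vcpu *vcpu)
	{
		/* Latch the guest's XFD_ERR before anything can clobber it */
		rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
		/* Reflect the #NM back into the guest and resume */
		kvm_queue_exception(vcpu, NM_VECTOR);
		return 1;
	}

	/* Emulated guest WRMSR to MSR_XFD_ERR: 3) above */
	static int kvm_set_xfd_err(struct kvm_vcpu *vcpu, u64 msrval)
	{
		/* Keep the software copy and the hardware MSR coherent */
		vcpu->arch.guest_fpu.xfd_err = msrval;
		wrmsrl(MSR_XFD_ERR, msrval);
		return 0;
	}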