Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
Message-ID: <67dfdb0c-c483-ec3c-4fb8-57086ffde9bf@intel.com>
Date:   Thu, 24 Feb 2022 10:52:23 -0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.5.0
Content-Language: en-US
To:     "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
        tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
        luto@kernel.org, peterz@infradead.org
Cc:     sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com,
        ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com,
        hpa@zytor.com, jgross@suse.com, jmattson@google.com,
        joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org,
        pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com,
        tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com,
        thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org,
        linux-kernel@vger.kernel.org
References: <20220224155630.52734-1-kirill.shutemov@linux.intel.com>
 <20220224155630.52734-10-kirill.shutemov@linux.intel.com>
From:   Dave Hansen <dave.hansen@intel.com>
Subject: Re: [PATCHv4 09/30] x86/tdx: Add MSR support for TDX guests
In-Reply-To: <20220224155630.52734-10-kirill.shutemov@linux.intel.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Precedence: bulk

On 2/24/22 07:56, Kirill A. Shutemov wrote:
> Use hypercall to emulate MSR read/write for the TDX platform.
> 
> There are two viable approaches for doing MSRs in a TD guest:
> 
> 1. Execute the RDMSR/WRMSR instructions like most VMs and bare metal
>    do. Some will succeed, others will cause a #VE. All of those that
>    cause a #VE will be handled with a TDCALL.
> 2. Use paravirt infrastructure.  The paravirt hook has to keep a list
>    of which MSRs would cause a #VE and use a TDCALL.  All other MSRs
>    execute RDMSR/WRMSR instructions directly.
> 
> The second option can be ruled out because the list of MSRs was
> challenging to maintain. That leaves option #1 as the only viable
> solution for the minimal TDX support.
> 
> For performance-critical MSR writes (like TSC_DEADLINE), future patches
> will replace the WRMSR/#VE sequence with the direct TDCALL.

This will still leave us with a list of non-#VE-inducing MSRs.  That's
not great.  But, if we miss an MSR in the performance-critical list, the
result is a slow WRMSR->#VE.  If we miss an MSR in the paravirt
approach, we induce a fatal #VE.

Please add something to that effect if you revise this patch.

> RDMSR and WRMSR specification details can be found in
> Guest-Host-Communication Interface (GHCI) for Intel Trust Domain
> Extensions (Intel TDX) specification, sec titled "TDG.VP.
> VMCALL<Instruction.RDMSR>" and "TDG.VP.VMCALL<Instruction.WRMSR>".
> 
> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/coco/tdx.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/arch/x86/coco/tdx.c b/arch/x86/coco/tdx.c
> index 0a2e6be0cdae..89992593a209 100644
> --- a/arch/x86/coco/tdx.c
> +++ b/arch/x86/coco/tdx.c
> @@ -116,6 +116,44 @@ void __cpuidle tdx_safe_halt(void)
>  		WARN_ONCE(1, "HLT instruction emulation failed\n");
>  }
>  
> +static bool read_msr(struct pt_regs *regs)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_MSR_READ,

Just a minor note: these "EXIT_REASON_FOO"'s in r11 are effectively
*the* hypercall being made, right?

The hypercall is being made in response to what would have otherwise
been a MSR read VMEXIT.  But, it's a *bit* goofy to see them here when
the TDX guest isn't doing any kind of VMEXIT.

I wish there were some clarity around it, but it's not a deal breaker.

> +		.r12 = regs->cx,
> +	};
> +
> +	/*
> +	 * Emulate the MSR read via hypercall. More info about ABI
> +	 * can be found in TDX Guest-Host-Communication Interface
> +	 * (GHCI), section titled "TDG.VP.VMCALL<Instruction.RDMSR>".
> +	 */
> +	if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
> +		return false;
> +
> +	regs->ax = lower_32_bits(args.r11);
> +	regs->dx = upper_32_bits(args.r11);
> +	return true;
> +}
> +
> +static bool write_msr(struct pt_regs *regs)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_MSR_WRITE,
> +		.r12 = regs->cx,
> +		.r13 = (u64)regs->dx << 32 | regs->ax,
> +	};
> +
> +	/*
> +	 * Emulate the MSR write via hypercall. More info about ABI
> +	 * can be found in TDX Guest-Host-Communication Interface
> +	 * (GHCI) section titled "TDG.VP.VMCALL<Instruction.WRMSR>".
> +	 */
> +	return !__tdx_hypercall(&args, 0);
> +}
> +
>  void tdx_get_ve_info(struct ve_info *ve)
>  {
>  	struct tdx_module_output out;
> @@ -158,6 +196,10 @@ static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
>  	switch (ve->exit_reason) {
>  	case EXIT_REASON_HLT:
>  		return handle_halt();
> +	case EXIT_REASON_MSR_READ:
> +		return read_msr(regs);
> +	case EXIT_REASON_MSR_WRITE:
> +		return write_msr(regs);
>  	default:
>  		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>  		return false;

I still think it's annoying that all these WRMSR's are turned into #VE,
but this does seem like the best approach given the architecture that we
have.  Having the optimized ones seems like a good compromise.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>