Message-ID: <9b4b3581-925b-32a8-8a4f-fdd8d98f2164@intel.com>
Date:   Thu, 24 Feb 2022 10:42:56 -0800
To:     "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
        tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
        luto@kernel.org, peterz@infradead.org
Cc:     sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com,
        ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com,
        hpa@zytor.com, jgross@suse.com, jmattson@google.com,
        joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org,
        pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com,
        tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com,
        thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org,
        linux-kernel@vger.kernel.org
References: <20220224155630.52734-1-kirill.shutemov@linux.intel.com>
 <20220224155630.52734-9-kirill.shutemov@linux.intel.com>
From:   Dave Hansen <dave.hansen@intel.com>
Subject: Re: [PATCHv4 08/30] x86/tdx: Add HLT support for TDX guests
In-Reply-To: <20220224155630.52734-9-kirill.shutemov@linux.intel.com>

On 2/24/22 07:56, Kirill A. Shutemov wrote:
> The HLT instruction is a privileged instruction; executing it stops
> instruction execution and places the processor in a HALT state. It
> is used in the kernel for cases like reboot, the idle loop and exception
> fixup handlers. For the idle case, interrupts will be enabled (using STI)
> before the HLT instruction (this is also called safe_halt()).
> 
> To support the HLT instruction in TDX guests, it needs to be emulated
> using TDVMCALL (hypercall to VMM). More details about it can be found
> in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication
> Interface (GHCI) specification, section TDVMCALL[Instruction.HLT].
> 
> In TDX guests, executing HLT instruction will generate a #VE, which is
> used to emulate the HLT instruction. But #VE based emulation will not
> work for the safe_halt() flavor, because it requires STI instruction to
> be executed just before the TDCALL. Since idle loop is the only user of
> safe_halt() variant, handle it as a special case.
> 
> To avoid *safe_halt() call in the idle function, define the
> tdx_guest_idle() and use it to override the "x86_idle" function pointer
> for a valid TDX guest.
> 
> Alternative choices like PV ops have been considered for adding
> safe_halt() support. But they were rejected because HLT paravirt calls
> only exist under PARAVIRT_XXL, and enabling it in TDX guests just for
> the safe_halt() use case is not worth the cost.

Thanks for all the history and background here.
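
For anyone following along, here is how I read the two halt flavors
described above, as a minimal userspace sketch. It is a model, not the
kernel code: every sketch_* name and the flag value are made up for
illustration, and the idle-path arguments (IRQs reported as enabled,
STI requested) are my assumption from the changelog rather than
something shown in the quoted hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define SKETCH_HCALL_ISSUE_STI	0x1u
#define SKETCH_EXIT_REASON_HLT	12u

/* Records what the modelled hypercall saw, so both paths can be compared. */
struct sketch_vcpu {
	bool irqs_enabled;		/* models RFLAGS.IF */
	bool reported_irq_disabled;	/* value handed to the "VMM" */
	unsigned int halts;
};

static struct sketch_vcpu sketch_cpu;

/* Models the hypercall helper: an optional STI right before the call. */
static uint64_t sketch_hypercall(uint64_t reason, bool irq_disabled,
				 unsigned int flags)
{
	if (flags & SKETCH_HCALL_ISSUE_STI)
		sketch_cpu.irqs_enabled = true;	/* nothing runs in between */

	if (reason != SKETCH_EXIT_REASON_HLT)
		return 1;	/* non-zero models a failed hypercall */

	sketch_cpu.reported_irq_disabled = irq_disabled;
	sketch_cpu.halts++;
	return 0;
}

static uint64_t sketch_halt(const bool irq_disabled, const bool do_sti)
{
	return sketch_hypercall(SKETCH_EXIT_REASON_HLT, irq_disabled,
				do_sti ? SKETCH_HCALL_ISSUE_STI : 0);
}

/* #VE path: report the current IRQ state and never issue STI. */
static bool sketch_handle_halt(void)
{
	const bool irq_disabled = !sketch_cpu.irqs_enabled;
	const bool do_sti = false;

	return sketch_halt(irq_disabled, do_sti) == 0;
}

/* Idle path (assumed): report IRQs as enabled and request the STI. */
static void sketch_safe_halt(void)
{
	const bool irq_disabled = false;
	const bool do_sti = true;
	const uint64_t ret = sketch_halt(irq_disabled, do_sti);

	assert(ret == 0);
	(void)ret;
}

/* Models the "x86_idle" override: a function pointer with a default. */
static void sketch_default_idle(void) { }
static void (*sketch_idle)(void) = sketch_default_idle;

static void sketch_guest_init(void)
{
	sketch_idle = sketch_safe_halt;
}
```

The point of the model is only that the STI decision travels as a flag
into the helper, so the idle path never has a gap between enabling
interrupts and entering the halt.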

> diff --git a/arch/x86/coco/tdcall.S b/arch/x86/coco/tdcall.S
> index c4dd9468e7d9..3c35a056974d 100644
> --- a/arch/x86/coco/tdcall.S
> +++ b/arch/x86/coco/tdcall.S
> @@ -138,6 +138,19 @@ SYM_FUNC_START(__tdx_hypercall)
>  
>  	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>  
> +	/*
> +	 * For the idle loop STI needs to be called directly before the TDCALL
> +	 * that enters idle (EXIT_REASON_HLT case). STI instruction enables
> +	 * interrupts only one instruction later. If there is a window between
> +	 * STI and the instruction that emulates the HALT state, there is a
> +	 * chance for interrupts to happen in this window, which can delay the
> +	 * HLT operation indefinitely. Since this is not the desired
> +	 * result, conditionally call STI before TDCALL.
> +	 */
> +	testq $TDX_HCALL_ISSUE_STI, %rsi
> +	jz .Lskip_sti
> +	sti
> +.Lskip_sti:
>  	tdcall
>  
>  	/*
> diff --git a/arch/x86/coco/tdx.c b/arch/x86/coco/tdx.c
> index 86a2f35e7308..0a2e6be0cdae 100644
> --- a/arch/x86/coco/tdx.c
> +++ b/arch/x86/coco/tdx.c
> @@ -7,6 +7,7 @@
>  #include <linux/cpufeature.h>
>  #include <asm/coco.h>
>  #include <asm/tdx.h>
> +#include <asm/vmx.h>
>  
>  /* TDX module Call Leaf IDs */
>  #define TDX_GET_INFO			1
> @@ -59,6 +60,62 @@ static void get_info(void)
>  	td_info.attributes = out.rdx;
>  }
>  
> +static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_HLT,
> +		.r12 = irq_disabled,
> +	};
> +
> +	/*
> +	 * Emulate HLT operation via hypercall. More info about ABI
> +	 * can be found in TDX Guest-Host-Communication Interface
> +	 * (GHCI), section 3.8 TDG.VP.VMCALL<Instruction.HLT>.
> +	 *
> +	 * The VMM uses the "IRQ disabled" param to understand IRQ
> +	 * enabled status (RFLAGS.IF) of the TD guest and to determine
> +	 * whether or not it should schedule the halted vCPU if an
> +	 * IRQ becomes pending. E.g. if IRQs are disabled, the VMM
> +	 * can keep the vCPU in virtual HLT, even if an IRQ is
> +	 * pending, without hanging/breaking the guest.
> +	 */
> +	return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
> +}
> +
> +static bool handle_halt(void)
> +{
> +	/*
> +	 * Since non safe halt is mainly used in CPU offlining
> +	 * and the guest will always stay in the halt state, don't
> +	 * call the STI instruction (set do_sti as false).
> +	 */
> +	const bool irq_disabled = irqs_disabled();
> +	const bool do_sti = false;
> +
> +	if (__halt(irq_disabled, do_sti))
> +		return false;
> +
> +	return true;
> +}

One other note: I really do like the silly:

	const bool do_sti = false;

variables as opposed to doing gunk like:

	__halt(irq_disabled, false);

Thanks for doing that.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>