Received: by 2002:a05:6a10:9afc:0:0:0:0 with SMTP id t28csp1752331pxm; Thu, 24 Feb 2022 08:43:21 -0800 (PST) X-Google-Smtp-Source: ABdhPJx4mT9rxpra1blnA1VSBoxq9qUT3H/QZukbFslJbhsxWY2e5BApVkNoHqFP3+BDwNJuSsqH X-Received: by 2002:a05:6a00:1943:b0:4cb:79c9:fa48 with SMTP id s3-20020a056a00194300b004cb79c9fa48mr3693299pfk.47.1645721000872; Thu, 24 Feb 2022 08:43:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1645721000; cv=none; d=google.com; s=arc-20160816; b=1HYzoxDHoA88kUcYpecuAVNrEOSvrapSnZxJwngoXj0+cv3s9FHMGkWoea7dghano1 S4fY6kHEezbB/D9oEzL3BjYwnB5bS+ZSXX2DaE0DYHtqMFNWTcYEk68KusXfFurydCfl iwb5FgG/rhiz/L4SFEbtiYOG9a0GS5I0ka0mkq1dJmWAKspLmWllmbJWwRRFePI9xtM8 DRWYu2mMGy5W0r7JlXDbICTmF6D1no60o6QAtgjMV9ZtuWrRfdAMrybRRNC7CupFzz4D SE2F9nZ4Bj3DCH8MpullA8Bw+UfptbloqXGR0Ywg/zfigWziHCrFlgbz1pWooq0EDswb VUyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=AMSphy0YfSlqvbAPOSeiz40OzeKpNMEOPe3CyWjrZVQ=; b=dN3wnu2JZJeieMpj3/jhw/2gsl9JfcZBtkyUa4RNrC5LbHihssHkn0qwYN7FtA5eCA hSng4Rsz/fryPsTn51hWpaYmOwlYuNGxvvEki290LPUBwNquLgqCuZIuYrJxupeAmA+9 KGWOCjs4Kxf3Re/f2DhPVfll24pJtaXnDVbTRX0PlLGIgcOWMq9ebQ1zIBbMTow+4u1n ABii/9BnTRVbe2Dsg0h4DGQ7X10iHj/qInBssHouZh8hSiUoyT0kztxyCjJ8PLCf/BrD XJtVWGMV+viJu0t4nIUVmsdj+QZ+LiCo3du5dA0FuPbYYlaXj1xR2H+D8wQH+ueIS5to mYrw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CiWPajI+; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id 62si2585178pgc.281.2022.02.24.08.43.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 24 Feb 2022 08:43:20 -0800 (PST) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CiWPajI+; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 8A3B018E421; Thu, 24 Feb 2022 08:22:54 -0800 (PST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234754AbiBXP51 (ORCPT + 99 others); Thu, 24 Feb 2022 10:57:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236724AbiBXP5G (ORCPT ); Thu, 24 Feb 2022 10:57:06 -0500 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E398712764 for ; Thu, 24 Feb 2022 07:56:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645718193; x=1677254193; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EDcsdaa3CooVKW3VeAd7SwKOoBvqieWwTDQ5K+cpg/U=; b=CiWPajI+jDxF8/UHQ0mVtGMpT5wg6i09fX3YDCZ00wDMRT0R66bXTq4p qefXJ5qcODNrD5QEUsEPUDPlEU+LhK7GP4VRodgm2Mn2WrQnOC54RGFhI D04LmEh1g/Z86DbkhajJiAVG2zH2JLD6XEeuIi9OCtX0tXxWVK+2h+Hd8 +JGgyKqqObYF77zDio5VaO9LKy8gHhhN3eoe2IIz/cXzZfDm4xzakVatl 9vGaG15pWK5G9uYAVJ4Yp9UoZYnQijK8sQPdbbLIQWgGvVGmQF9MT8kw1 5GYY+yFeuBPUAhe2ogCvo1fCriklqdj9rANoS/xwvGDnIrp9YfyYdO93K A==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="251094346" X-IronPort-AV: E=Sophos;i="5.90,134,1643702400"; d="scan'208";a="251094346" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Feb 2022 07:56:31 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,134,1643702400"; d="scan'208";a="628512993" Received: from black.fi.intel.com ([10.237.72.28]) by FMSMGA003.fm.intel.com with ESMTP; 24 Feb 2022 07:56:24 -0800 Received: by black.fi.intel.com (Postfix, from userid 1000) id 8AB627FA; Thu, 24 Feb 2022 17:56:34 +0200 (EET) From: "Kirill A. Shutemov" To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com, ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com, hpa@zytor.com, jgross@suse.com, jmattson@google.com, joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org, pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com, tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv4 08/30] x86/tdx: Add HLT support for TDX guests Date: Thu, 24 Feb 2022 18:56:08 +0300 Message-Id: <20220224155630.52734-9-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220224155630.52734-1-kirill.shutemov@linux.intel.com> References: <20220224155630.52734-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The HLT instruction is a privileged instruction, executing it stops instruction execution and places the processor in a HALT state. It is used in kernel for cases like reboot, idle loop and exception fixup handlers. For the idle case, interrupts will be enabled (using STI) before the HLT instruction (this is also called safe_halt()). To support the HLT instruction in TDX guests, it needs to be emulated using TDVMCALL (hypercall to VMM). More details about it can be found in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication Interface (GHCI) specification, section TDVMCALL[Instruction.HLT]. In TDX guests, executing HLT instruction will generate a #VE, which is used to emulate the HLT instruction. But #VE based emulation will not work for the safe_halt() flavor, because it requires STI instruction to be executed just before the TDCALL. Since idle loop is the only user of safe_halt() variant, handle it as a special case. To avoid *safe_halt() call in the idle function, define the tdx_guest_idle() and use it to override the "x86_idle" function pointer for a valid TDX guest. Alternative choices like PV ops have been considered for adding safe_halt() support. But it was rejected because HLT paravirt calls only exist under PARAVIRT_XXL, and enabling it in TDX guest just for safe_halt() use case is not worth the cost. Co-developed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kuppuswamy Sathyanarayanan Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Signed-off-by: Kirill A. Shutemov --- arch/x86/coco/tdcall.S | 13 ++++++++ arch/x86/coco/tdx.c | 66 ++++++++++++++++++++++++++++++++++++-- arch/x86/include/asm/tdx.h | 4 +++ arch/x86/kernel/process.c | 4 +++ 4 files changed, 85 insertions(+), 2 deletions(-) diff --git a/arch/x86/coco/tdcall.S b/arch/x86/coco/tdcall.S index c4dd9468e7d9..3c35a056974d 100644 --- a/arch/x86/coco/tdcall.S +++ b/arch/x86/coco/tdcall.S @@ -138,6 +138,19 @@ SYM_FUNC_START(__tdx_hypercall) movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx + /* + * For the idle loop STI needs to be called directly before the TDCALL + * that enters idle (EXIT_REASON_HLT case). STI instruction enables + * interrupts only one instruction later. If there is a window between + * STI and the instruction that emulates the HALT state, there is a + * chance for interrupts to happen in this window, which can delay the + * HLT operation indefinitely. Since this is the not the desired + * result, conditionally call STI before TDCALL. + */ + testq $TDX_HCALL_ISSUE_STI, %rsi + jz .Lskip_sti + sti +.Lskip_sti: tdcall /* diff --git a/arch/x86/coco/tdx.c b/arch/x86/coco/tdx.c index 86a2f35e7308..0a2e6be0cdae 100644 --- a/arch/x86/coco/tdx.c +++ b/arch/x86/coco/tdx.c @@ -7,6 +7,7 @@ #include #include #include +#include /* TDX module Call Leaf IDs */ #define TDX_GET_INFO 1 @@ -59,6 +60,62 @@ static void get_info(void) td_info.attributes = out.rdx; } +static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti) +{ + struct tdx_hypercall_args args = { + .r10 = TDX_HYPERCALL_STANDARD, + .r11 = EXIT_REASON_HLT, + .r12 = irq_disabled, + }; + + /* + * Emulate HLT operation via hypercall. More info about ABI + * can be found in TDX Guest-Host-Communication Interface + * (GHCI), section 3.8 TDG.VP.VMCALL. + * + * The VMM uses the "IRQ disabled" param to understand IRQ + * enabled status (RFLAGS.IF) of the TD guest and to determine + * whether or not it should schedule the halted vCPU if an + * IRQ becomes pending. E.g. if IRQs are disabled, the VMM + * can keep the vCPU in virtual HLT, even if an IRQ is + * pending, without hanging/breaking the guest. + */ + return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0); +} + +static bool handle_halt(void) +{ + /* + * Since non safe halt is mainly used in CPU offlining + * and the guest will always stay in the halt state, don't + * call the STI instruction (set do_sti as false). + */ + const bool irq_disabled = irqs_disabled(); + const bool do_sti = false; + + if (__halt(irq_disabled, do_sti)) + return false; + + return true; +} + +void __cpuidle tdx_safe_halt(void) +{ + /* + * For do_sti=true case, __tdx_hypercall() function enables + * interrupts using the STI instruction before the TDCALL. So + * set irq_disabled as false. + */ + const bool irq_disabled = false; + const bool do_sti = true; + + /* + * Use WARN_ONCE() to report the failure. + */ + if (__halt(irq_disabled, do_sti)) + WARN_ONCE(1, "HLT instruction emulation failed\n"); +} + void tdx_get_ve_info(struct ve_info *ve) { struct tdx_module_output out; @@ -98,8 +155,13 @@ static bool virt_exception_user(struct pt_regs *regs, struct ve_info *ve) /* Handle the kernel #VE */ static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve) { - pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); - return false; + switch (ve->exit_reason) { + case EXIT_REASON_HLT: + return handle_halt(); + default: + pr_warn("Unexpected #VE: %lld\n", ve->exit_reason); + return false; + } } bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 34cf998ad534..e6e23ade53a6 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -13,6 +13,7 @@ #define TDX_HYPERCALL_STANDARD 0 #define TDX_HCALL_HAS_OUTPUT BIT(0) +#define TDX_HCALL_ISSUE_STI BIT(1) #define TDX_SEAMCALL_VMFAILINVALID 0x8000FF00FFFF0000ULL @@ -79,9 +80,12 @@ void tdx_get_ve_info(struct ve_info *ve); bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve); +void tdx_safe_halt(void); + #else static inline void tdx_early_init(void) { }; +static inline void tdx_safe_halt(void) { }; #endif /* CONFIG_INTEL_TDX_GUEST */ diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index e131d71b3cae..2e90d57cf86e 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -46,6 +46,7 @@ #include #include #include +#include #include "process.h" @@ -873,6 +874,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c) } else if (prefer_mwait_c1_over_halt(c)) { pr_info("using mwait in idle threads\n"); x86_idle = mwait_idle; + } else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) { + pr_info("using TDX aware idle routine\n"); + x86_idle = tdx_safe_halt; } else x86_idle = default_idle; } -- 2.34.1