From: Kuppuswamy Sathyanarayanan
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, Paolo Bonzini, David Hildenbrand, Andrea Arcangeli, Josh Poimboeuf, "H .
Peter Anvin"
Cc: Dave Hansen, Tony Luck, Dan Williams, Andi Kleen, Kirill Shutemov, Sean Christopherson, Kuppuswamy Sathyanarayanan, Kuppuswamy Sathyanarayanan, linux-kernel@vger.kernel.org
Subject: [PATCH v7 2/6] x86/boot: Avoid #VE during boot for TDX platforms
Date: Tue, 5 Oct 2021 16:05:46 -0700
Message-Id: <20211005230550.1819406-3-sathyanarayanan.kuppuswamy@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20211005230550.1819406-1-sathyanarayanan.kuppuswamy@linux.intel.com>
References: <20211005230550.1819406-1-sathyanarayanan.kuppuswamy@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

There are a few MSRs and control register bits which the kernel
normally needs to modify during boot. However, TDX disallows
modification of these registers to help provide consistent security
guarantees. Fortunately, TDX ensures that these are all in the correct
state before the kernel loads, which means the kernel has no need to
modify them.

The conditions to avoid are:

  * Any writes to the EFER MSR
  * Clearing CR0.NE
  * Clearing CR4.MCE

This theoretically makes guest boot more fragile. If, for instance,
EFER was set up incorrectly and a WRMSR was performed, it will trigger
an early exception panic, or a triple fault if it happens before early
exception handling is set up. However, this is likely to trip up the
guest BIOS long before control reaches the kernel. In any case, these
kinds of problems are unlikely to occur in production environments,
and developers have good debug tools to fix them quickly.

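For illustration only, the three rules above can be modelled in plain C.
This is a minimal user-space sketch, not kernel code: struct fake_cpu and
the helpers efer_set_bits(), boot_cr4() and boot_cr0() are hypothetical
names that do not appear in the patch, and the "registers" are ordinary
variables. Only the bit positions (EFER.SCE/LME/NX, CR0.PE/NE/PG,
CR4.PAE/MCE) are the architectural ones.

```c
#include <stdint.h>

/* Architectural bit positions */
#define EFER_SCE (1u << 0)
#define EFER_LME (1u << 8)
#define EFER_NX  (1u << 11)
#define CR0_PE   (1u << 0)
#define CR0_NE   (1u << 5)
#define CR0_PG   (1u << 31)
#define CR4_PAE  (1u << 5)
#define CR4_MCE  (1u << 6)

/* Hypothetical stand-in for the CPU: counts EFER writes (the WRMSRs) */
struct fake_cpu {
	uint64_t efer;
	unsigned int efer_writes;
};

/* Rule 1: only write EFER when the desired value differs from the current one */
void efer_set_bits(struct fake_cpu *cpu, uint64_t bits)
{
	uint64_t want = cpu->efer | bits;

	if (want == cpu->efer)
		return;		/* nothing to change: skip the write */
	cpu->efer = want;
	cpu->efer_writes++;
}

/* Rule 3: drop every current CR4 bit except MCE, then OR in the wanted bits */
uint32_t boot_cr4(uint32_t cur, uint32_t enable)
{
	return (cur & CR4_MCE) | enable;
}

/* Rule 2: any constant loaded into CR0 must include NE */
uint32_t boot_cr0(void)
{
	return CR0_PG | CR0_NE | CR0_PE;
}
```

In this model, calling efer_set_bits() with bits that are already set
leaves efer_writes untouched, which corresponds to skipping the WRMSR
when EFER already holds the desired value.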
Signed-off-by: Sean Christopherson
Reviewed-by: Andi Kleen
Reviewed-by: Dan Williams
Signed-off-by: Kuppuswamy Sathyanarayanan
---

Changes since v6:
 * None

 arch/x86/boot/compressed/head_64.S   | 16 ++++++++++++----
 arch/x86/boot/compressed/pgtable.h   |  2 +-
 arch/x86/kernel/head_64.S            | 20 ++++++++++++++++++--
 arch/x86/realmode/rm/trampoline_64.S | 23 +++++++++++++++++++----
 4 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 572c535cf45b..e71001b380fe 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -643,12 +643,20 @@ SYM_CODE_START(trampoline_32bit_src)
 	movl	$MSR_EFER, %ecx
 	rdmsr
 	btsl	$_EFER_LME, %eax
+	/* Avoid writing EFER if no change was made (for TDX guest) */
+	jc	1f
 	wrmsr
-	popl	%edx
+1:	popl	%edx
 	popl	%ecx

 	/* Enable PAE and LA57 (if required) paging modes */
-	movl	$X86_CR4_PAE, %eax
+	movl	%cr4, %eax
+	/*
+	 * Clear all bits except CR4.MCE, which is preserved.
+	 * Clearing CR4.MCE will #VE in TDX guests.
+	 */
+	andl	$X86_CR4_MCE, %eax
+	orl	$X86_CR4_PAE, %eax
 	testl	%edx, %edx
 	jz	1f
 	orl	$X86_CR4_LA57, %eax
@@ -662,8 +670,8 @@ SYM_CODE_START(trampoline_32bit_src)
 	pushl	$__KERNEL_CS
 	pushl	%eax

-	/* Enable paging again */
-	movl	$(X86_CR0_PG | X86_CR0_PE), %eax
+	/* Enable paging again. Avoid clearing X86_CR0_NE for TDX */
+	movl	$(X86_CR0_PG | X86_CR0_NE | X86_CR0_PE), %eax
 	movl	%eax, %cr0

 	lret
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 6ff7e81b5628..cc9b2529a086 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,7 +6,7 @@
 #define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0

 #define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE	0x70
+#define TRAMPOLINE_32BIT_CODE_SIZE	0x80

 #define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8b3ebd2bb85..96beac9eff42 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -141,7 +141,13 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 1:

 	/* Enable PAE mode, PGE and LA57 */
-	movl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
+	movq	%cr4, %rcx
+	/*
+	 * Clear all bits except CR4.MCE, which is preserved.
+	 * Clearing CR4.MCE will #VE in TDX guests.
+	 */
+	andl	$X86_CR4_MCE, %ecx
+	orl	$(X86_CR4_PAE | X86_CR4_PGE), %ecx
 #ifdef CONFIG_X86_5LEVEL
 	testl	$1, __pgtable_l5_enabled(%rip)
 	jz	1f
@@ -229,13 +235,23 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	/* Setup EFER (Extended Feature Enable Register) */
 	movl	$MSR_EFER, %ecx
 	rdmsr
+	/*
+	 * Preserve current value of EFER for comparison and to skip
+	 * EFER writes if no change was made (for TDX guest)
+	 */
+	movl	%eax, %edx
 	btsl	$_EFER_SCE, %eax	/* Enable System Call */
 	btl	$20,%edi		/* No Execute supported? */
 	jnc	1f
 	btsl	$_EFER_NX, %eax
 	btsq	$_PAGE_BIT_NX,early_pmd_flags(%rip)
-1:	wrmsr				/* Make changes effective */

+	/* Avoid writing EFER if no change was made (for TDX guest) */
+1:	cmpl	%edx, %eax
+	je	1f
+	xor	%edx, %edx
+	wrmsr				/* Make changes effective */
+1:
 	/* Setup cr0 */
 	movl	$CR0_STATE, %eax
 	/* Make changes effective */
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index ae112a91592f..0fdd74054044 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -143,13 +143,27 @@ SYM_CODE_START(startup_32)
 	movl	%eax, %cr3

 	# Set up EFER
+	movl	$MSR_EFER, %ecx
+	rdmsr
+	/*
+	 * Skip writing to EFER if the register already has desired
+	 * value (to avoid #VE for the TDX guest).
+	 */
+	cmp	pa_tr_efer, %eax
+	jne	.Lwrite_efer
+	cmp	pa_tr_efer + 4, %edx
+	je	.Ldone_efer
+.Lwrite_efer:
 	movl	pa_tr_efer, %eax
 	movl	pa_tr_efer + 4, %edx
-	movl	$MSR_EFER, %ecx
 	wrmsr

-	# Enable paging and in turn activate Long Mode
-	movl	$(X86_CR0_PG | X86_CR0_WP | X86_CR0_PE), %eax
+.Ldone_efer:
+	/*
+	 * Enable paging and in turn activate Long Mode. Avoid clearing
+	 * X86_CR0_NE for TDX.
+	 */
+	movl	$(X86_CR0_PG | X86_CR0_WP | X86_CR0_NE | X86_CR0_PE), %eax
 	movl	%eax, %cr0

 	/*
@@ -169,7 +183,8 @@ SYM_CODE_START(pa_trampoline_compat)
 	movl	$rm_stack_end, %esp
 	movw	$__KERNEL_DS, %dx

-	movl	$X86_CR0_PE, %eax
+	/* Avoid clearing X86_CR0_NE for TDX */
+	movl	$(X86_CR0_NE | X86_CR0_PE), %eax
 	movl	%eax, %cr0
 	ljmpl	$__KERNEL32_CS, $pa_startup_32
 SYM_CODE_END(pa_trampoline_compat)
-- 
2.25.1