From: John Andersen <john.s.andersen@intel.com>
To: corbet@lwn.net, pbonzini@redhat.com, tglx@linutronix.de, mingo@redhat.com,
    bp@alien8.de, x86@kernel.org, hpa@zytor.com, shuah@kernel.org,
    sean.j.christopherson@intel.com, liran.alon@oracle.com,
    drjones@redhat.com, rick.p.edgecombe@intel.com, kristen@linux.intel.com
Cc: vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com,
    joro@8bytes.org, mchehab+huawei@kernel.org, gregkh@linuxfoundation.org,
    paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com, jgross@suse.com,
    mike.kravetz@oracle.com, oneukum@suse.com, luto@kernel.org,
    peterz@infradead.org, fenghua.yu@intel.com, reinette.chatre@intel.com,
    vineela.tummalapalli@intel.com, dave.hansen@linux.intel.com,
    john.s.andersen@intel.com, arjan@linux.intel.com, caoj.fnst@cn.fujitsu.com,
    bhe@redhat.com, nivedita@alum.mit.edu, keescook@chromium.org,
    dan.j.williams@intel.com, eric.auger@redhat.com, aaronlewis@google.com,
    peterx@redhat.com, makarandsonare@google.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    linux-kselftest@vger.kernel.org, kernel-hardening@lists.openwall.com
Subject: [PATCH 4/4] X86: Use KVM CR pin MSRs
Date: Wed, 17 Jun 2020 12:07:57 -0700
Message-Id: <20200617190757.27081-5-john.s.andersen@intel.com>
In-Reply-To: <20200617190757.27081-1-john.s.andersen@intel.com>
References: <20200617190757.27081-1-john.s.andersen@intel.com>

Strengthen existing control register pinning when running paravirtualized
under KVM. For each control register, check which bits KVM supports pinning
and pin only those supported bits that are already pinned by the existing
native protection. Write to the KVM CR0/CR4 pinned MSRs to enable pinning.

Initiate KVM-assisted pinning directly after native pinning is set up on the
boot CPU. For non-boot CPUs, initiate paravirtualized pinning during CPU
identification. Identification of non-boot CPUs takes place after the boot
CPU has set up native CR pinning, so non-boot CPUs read the pinned bits
established by the boot CPU and request that those same bits be pinned. All
CPUs therefore request paravirtualized pinning of the same bits that are
already pinned natively.

Guests using the kexec system call currently do not support paravirtualized
control register pinning. This is because early boot code writes known-good
values to the control registers, and these values do not contain the
protected bits: CPU feature identification is done later, when the kernel
properly checks whether it can enable protections. As such, the pv_cr_pin
command line option has been added, which instructs the kernel to disable
kexec in favor of enabling paravirtualized control register pinning.
crashkernel is also disabled when the pv_cr_pin parameter is specified, due
to its reliance on kexec.

When kexec is fixed, we will still need a way for a kernel with pinning
enabled to know whether the kernel it is attempting to load also has
support. If a kernel with this enabled attempts to kexec a kernel where this
is not supported, it would trigger a fault almost immediately.

Liran suggested adding a section to the built image that acts as a flag to
signify support for being kexec'd by a kernel with pinning enabled. Should
that approach be implemented, the command line flag (pv_cr_pin) would likely
still be desired for some deprecation period. We wouldn't want the default
behavior to change from being able to kexec older kernels to not being able
to, as this might break some users' workflows.
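For illustration, the guest-side handshake reduces to "read the ALLOWED mask,
intersect it with the natively pinned bits, write the PINNED MSR". The
following standalone sketch is not the in-tree code: the MSRs and the
rdmsrl()/wrmsrl() accesses are stubbed with plain variables so the masking
logic can be compiled and run in user space; in the kernel the reads and
writes go to the MSR_KVM_CR*_PIN_ALLOWED and MSR_KVM_CR*_PINNED_HIGH MSRs
added earlier in this series.

/*
 * Standalone sketch of the guest-side pinning request (not kernel code).
 * The "MSRs" are plain variables standing in for the real ALLOWED and
 * PINNED_HIGH MSRs.
 */
#include <stdint.h>
#include <stdio.h>

#define X86_CR0_WP	(1UL << 16)
#define X86_CR4_UMIP	(1UL << 11)
#define X86_CR4_SMEP	(1UL << 20)
#define X86_CR4_SMAP	(1UL << 21)

/* Stub "MSRs": pretend the host allows pinning exactly these bits. */
static uint64_t cr0_pin_allowed = X86_CR0_WP;
static uint64_t cr4_pin_allowed = X86_CR4_SMEP | X86_CR4_SMAP;
static uint64_t cr0_pinned, cr4_pinned;

int main(void)
{
	/* Bits already pinned natively on the boot CPU. */
	uint64_t cr0_pinned_bits = X86_CR0_WP;
	uint64_t cr4_pinned_bits = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP;
	uint64_t mask;

	/* Request pinning only for bits the hypervisor supports. */
	mask = cr0_pin_allowed;			/* stands in for rdmsrl(ALLOWED)     */
	cr0_pinned = cr0_pinned_bits & mask;	/* stands in for wrmsrl(PINNED_HIGH) */

	mask = cr4_pin_allowed;
	cr4_pinned = cr4_pinned_bits & mask;

	/* UMIP drops out here because this stub host does not allow it. */
	printf("CR0 pinned %#llx, CR4 pinned %#llx\n",
	       (unsigned long long)cr0_pinned,
	       (unsigned long long)cr4_pinned);
	return 0;
}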
Signed-off-by: John Andersen <john.s.andersen@intel.com>
---
 .../admin-guide/kernel-parameters.txt |  11 ++++++
 arch/x86/Kconfig                      |  10 +++++
 arch/x86/include/asm/kvm_para.h       |  28 +++++++++++++
 arch/x86/kernel/cpu/common.c          |   5 +++
 arch/x86/kernel/kvm.c                 |  39 +++++++++++++++++++
 arch/x86/kernel/setup.c               |   8 ++++
 6 files changed, 101 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 89386f6f3ab6..54fb2b5ab8fc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3926,6 +3926,17 @@
 			[KNL] Number of legacy pty's. Overwrites compiled-in
 			default number.
 
+	pv_cr_pin	[SECURITY,X86]
+			Enable paravirtualized control register pinning. When
+			running paravirtualized under KVM, request that KVM not
+			allow the guest to disable kernel protection features
+			set in CPU control registers. Specifying this option
+			will disable kexec (and crashkernel). If kexec support
+			has not been compiled into the kernel and host KVM
+			supports paravirtualized control register pinning, it
+			will be active by default without the need to specify
+			this parameter.
+
 	quiet		[KNL] Disable most log messages
 
 	r128=		[HW,DRM]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67f6a40b5e93..bc0b27483001 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -800,6 +800,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select PARAVIRT_CR_PIN
 	select ARCH_CPUIDLE_HALTPOLL
 	default y
 	---help---
@@ -835,6 +836,15 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config PARAVIRT_CR_PIN
+	bool "Paravirtual bit pinning for CR0 and CR4"
+	depends on KVM_GUEST
+	help
+	  Select this option to have the virtualised guest request that the
+	  hypervisor disallow it from disabling protections set in control
+	  registers. The hypervisor will prevent exploits from disabling
+	  features such as SMEP, SMAP, UMIP, and WP.
+
 config JAILHOUSE_GUEST
 	bool "Jailhouse non-root cell support"
 	depends on X86_64 && PCI
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 57fd1966c4ea..f021531e98dc 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -112,6 +112,23 @@ static inline void kvm_spinlock_init(void)
 }
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
+#ifdef CONFIG_PARAVIRT_CR_PIN
+void __init kvm_paravirt_cr_pinning_init(void);
+void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+				   unsigned long cr4_pinned_bits);
+#else
+static inline void kvm_paravirt_cr_pinning_init(void)
+{
+	return;
+}
+
+static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+						 unsigned long cr4_pinned_bits)
+{
+	return;
+}
+#endif /* CONFIG_PARAVIRT_CR_PIN */
+
 #else /* CONFIG_KVM_GUEST */
 #define kvm_async_pf_task_wait_schedule(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
@@ -145,6 +162,17 @@ static inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 {
 	return false;
 }
+
+static inline void kvm_paravirt_cr_pinning_init(void)
+{
+	return;
+}
+
+static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+						 unsigned long cr4_pinned_bits)
+{
+	return;
+}
 #endif
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 921e67086a00..ee17223b1fa8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -416,6 +417,8 @@ static void __init setup_cr_pinning(void)
 	mask = (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP);
 	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & mask;
 	static_key_enable(&cr_pinning.key);
+
+	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
 }
 
 /*
@@ -1551,6 +1554,8 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
 	mtrr_ap_init();
 	validate_apic_and_package_id(c);
 	x86_spec_ctrl_setup_ap();
+
+	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
 }
 
 static __init int setup_noclflush(char *arg)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7e6403a8d861..def913b86a99 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -23,6 +23,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -33,6 +35,7 @@
 #include
 #include
 #include
+#include
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
@@ -723,6 +726,7 @@ static void __init kvm_apic_init(void)
 static void __init kvm_init_platform(void)
 {
 	kvmclock_init();
+	kvm_paravirt_cr_pinning_init();
 	x86_platform.apic_post_init = kvm_apic_init;
 }
@@ -877,6 +881,41 @@ void __init kvm_spinlock_init(void)
 
 #endif	/* CONFIG_PARAVIRT_SPINLOCKS */
 
+#ifdef CONFIG_PARAVIRT_CR_PIN
+static int kvm_paravirt_cr_pinning_enabled __ro_after_init;
+
+void __init kvm_paravirt_cr_pinning_init(void)
+{
+#ifdef CONFIG_KEXEC_CORE
+	if (!cmdline_find_option_bool(boot_command_line, "pv_cr_pin"))
+		return;
+
+	/* Paravirtualized CR pinning is currently incompatible with kexec */
+	kexec_load_disabled = 1;
+#endif
+
+	kvm_paravirt_cr_pinning_enabled = 1;
+}
+
+void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+				   unsigned long cr4_pinned_bits)
+{
+	u64 mask;
+
+	if (!kvm_paravirt_cr_pinning_enabled)
+		return;
+
+	if (!kvm_para_has_feature(KVM_FEATURE_CR_PIN))
+		return;
+
+	rdmsrl(MSR_KVM_CR0_PIN_ALLOWED, mask);
+	wrmsrl(MSR_KVM_CR0_PINNED_HIGH, cr0_pinned_bits & mask);
+
+	rdmsrl(MSR_KVM_CR4_PIN_ALLOWED, mask);
+	wrmsrl(MSR_KVM_CR4_PINNED_HIGH, cr4_pinned_bits & mask);
+}
+#endif
+
 #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
 static void kvm_disable_host_haltpoll(void *i)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d9c678b37a9b..ed3bcc85d40d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -27,6 +27,9 @@
 #include
 #include
 #include
+#include
+#include
+
 #include
 #include
 #include
@@ -502,6 +505,11 @@ static void __init reserve_crashkernel(void)
 		return;
 	}
 
+	if (cmdline_find_option_bool(boot_command_line, "pv_cr_pin")) {
+		pr_info("Ignoring crashkernel since pv_cr_pin present in cmdline\n");
+		return;
+	}
+
 	/* 0 means: find the address automatically */
 	if (!crash_base) {
 		/*
-- 
2.21.0
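As a companion to the setup.c and kvm.c hunks above, the way pv_cr_pin gates
kexec and crashkernel can be exercised with the following standalone sketch.
It is not kernel code: strstr() stands in for cmdline_find_option_bool(),
which does proper word matching, and plain ints stand in for
kexec_load_disabled and the crashkernel reservation.

/*
 * Standalone sketch of the pv_cr_pin boot-time decisions (not kernel code).
 */
#include <stdio.h>
#include <string.h>

static int kexec_load_disabled;
static int crashkernel_reserved;
static int pv_cr_pinning_enabled;

static int cmdline_has(const char *cmdline, const char *opt)
{
	return strstr(cmdline, opt) != NULL;	/* simplified matching */
}

static void paravirt_cr_pinning_init(const char *cmdline, int kexec_built_in)
{
	if (kexec_built_in) {
		/* With kexec compiled in, pinning must be opted into. */
		if (!cmdline_has(cmdline, "pv_cr_pin"))
			return;
		kexec_load_disabled = 1;	/* pinning and kexec conflict */
	}
	pv_cr_pinning_enabled = 1;
}

static void reserve_crashkernel(const char *cmdline)
{
	if (cmdline_has(cmdline, "pv_cr_pin")) {
		printf("Ignoring crashkernel since pv_cr_pin present in cmdline\n");
		return;
	}
	crashkernel_reserved = 1;
}

int main(void)
{
	const char *cmdline = "root=/dev/vda1 pv_cr_pin crashkernel=256M";

	paravirt_cr_pinning_init(cmdline, /* kexec_built_in */ 1);
	reserve_crashkernel(cmdline);

	printf("pinning=%d kexec_load_disabled=%d crashkernel=%d\n",
	       pv_cr_pinning_enabled, kexec_load_disabled,
	       crashkernel_reserved);
	return 0;
}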