From: "David P. Reed" <dpreed@deepplum.com>
Reed" To: dpreed@deepplum.com Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" , Allison Randal , Enrico Weigelt , Greg Kroah-Hartman , Kate Stewart , "Peter Zijlstra (Intel)" , Randy Dunlap , Martin Molnar , Andy Lutomirski , Alexandre Chartre , Jann Horn , Dave Hansen , linux-kernel@vger.kernel.org Subject: [PATCH] Fix undefined operation VMXOFF during reboot and crash Date: Wed, 10 Jun 2020 14:12:50 -0400 Message-Id: <20200610181254.2142-1-dpreed@deepplum.com> X-Mailer: git-send-email 2.26.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Classification-ID: 863a6ae4-a1f9-4ba7-93f9-e74dc2bfae29-1-1 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is done on all CPUS, to allow the INIT IPI to function, since INIT is suppressed when CPUs are in VMX root operation. However, VMXOFF causes an undefined operation fault if the CPU is not in VMX operation, that is, VMXON has not been executed, or VMXOFF has been executed, but VMX is enabled. This fix makes the reboot work more reliably by modifying the #UD handler to skip the VMXOFF if VMX is enabled on the CPU and the VMXOFF is executed as part of cpu_emergency_vmxoff(). The logic in reboot.c is also corrected, since the point of forcing the processor out of VMX root operation is because when VMX root operation is enabled, the processor INIT signal is always masked. See Intel SDM section on differences between VMX Root operation and normal operation. Thus every CPU must be forced out of VMX operation. Since the CPU will hang rather than restart, a manual "reset" is the only way out of this state (or if there is a BMC, it can issue a RESET to the chip). Signed-off-by: David P. Reed --- arch/x86/include/asm/virtext.h | 24 ++++++++++++---- arch/x86/kernel/reboot.c | 13 ++------- arch/x86/kernel/traps.c | 52 ++++++++++++++++++++++++++++++++-- 3 files changed, 71 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..ea2d67191684 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -13,12 +13,16 @@ #ifndef _ASM_X86_VIRTEX_H #define _ASM_X86_VIRTEX_H +#include + #include #include #include #include +DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff); + /* * VMX functions: */ @@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void) /** Disable VMX on the current CPU * * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * on the CPU previously. Only call this function directly if you know VMX + * is enabled *and* CPU is in VMX root operation. */ static inline void cpu_vmxoff(void) { @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void) return __read_cr4() & X86_CR4_VMXE; } -/** Disable VMX if it is enabled on the current CPU +/** Force disable VMX if it is enabled on the current CPU. + * Note that if CPU is not in VMX root operation this + * VMXOFF will fault an undefined operation fault. + * So the 'doing_emergency_vmxoff' percpu flag is set, + * the trap handler for just restarts execution after + * the VMXOFF instruction. * - * You shouldn't call this if cpu_has_vmx() returns 0. + * You shouldn't call this directly if cpu_has_vmx() returns 0. 
 arch/x86/include/asm/virtext.h | 24 ++++++++++++----
 arch/x86/kernel/reboot.c       | 13 ++-------
 arch/x86/kernel/traps.c        | 52 ++++++++++++++++++++++++++++++++--
 3 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 9aad0e0876fb..ea2d67191684 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -13,12 +13,16 @@
 #ifndef _ASM_X86_VIRTEX_H
 #define _ASM_X86_VIRTEX_H
 
+#include <linux/percpu.h>
+
 #include <asm/processor.h>
 
 #include <asm/vmx.h>
 #include <asm/svm.h>
 #include <asm/tlbflush.h>
 
+DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff);
+
 /*
  * VMX functions:
  */
@@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void)
 /** Disable VMX on the current CPU
  *
  * vmxoff causes a undefined-opcode exception if vmxon was not run
- * on the CPU previously. Only call this function if you know VMX
- * is enabled.
+ * on the CPU previously. Only call this function directly if you know VMX
+ * is enabled *and* CPU is in VMX root operation.
  */
 static inline void cpu_vmxoff(void)
 {
@@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void)
 	return __read_cr4() & X86_CR4_VMXE;
 }
 
-/** Disable VMX if it is enabled on the current CPU
+/** Force disable VMX if it is enabled on the current CPU.
+ * Note that if the CPU is not in VMX root operation this
+ * VMXOFF will raise an undefined-opcode (#UD) fault.
+ * So the 'doing_emergency_vmxoff' percpu flag is set, and
+ * the trap handler for the fault just restarts execution
+ * after the VMXOFF instruction.
  *
- * You shouldn't call this if cpu_has_vmx() returns 0.
+ * You shouldn't call this directly if cpu_has_vmx() returns 0.
  */
 static inline void __cpu_emergency_vmxoff(void)
 {
-	if (cpu_vmx_enabled())
+	if (cpu_vmx_enabled()) {
+		this_cpu_write(doing_emergency_vmxoff, 1);
 		cpu_vmxoff();
+		this_cpu_write(doing_emergency_vmxoff, 0);
+	}
 }
 
-/** Disable VMX if it is supported and enabled on the current CPU
+/** Force disable VMX if it is supported and enabled on the current CPU
  */
 static inline void cpu_emergency_vmxoff(void)
 {
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 3ca43be4f9cf..abc8b51a57c7 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
 	 *
 	 * For safety, we will avoid running the nmi_shootdown_cpus()
 	 * stuff unnecessarily, but we don't have a way to check
-	 * if other CPUs have VMX enabled. So we will call it only if the
-	 * CPU we are running on has VMX enabled.
-	 *
-	 * We will miss cases where VMX is not enabled on all CPUs. This
-	 * shouldn't do much harm because KVM always enable VMX on all
-	 * CPUs anyway. But we can miss it on the small window where KVM
-	 * is still enabling VMX.
+	 * if other CPUs have VMX enabled.
 	 */
-	if (cpu_has_vmx() && cpu_vmx_enabled()) {
+	if (cpu_has_vmx()) {
 		/* Disable VMX on this CPU. */
-		cpu_vmxoff();
+		cpu_emergency_vmxoff();
 
 		/* Halt and disable VMX on the other CPUs */
 		nmi_shootdown_cpus(vmxoff_nmi);
-
 	}
 }
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4cc541051994..2dcf57ef467e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -59,6 +60,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_X86_64
 #include
@@ -70,6 +72,8 @@
 #include
 #endif
 
+DEFINE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff) = 0;
+
 DECLARE_BITMAP(system_vectors, NR_VECTORS);
 
 static inline void cond_local_irq_enable(struct pt_regs *regs)
@@ -115,6 +119,43 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
 	return 0;
 }
 
+/*
+ * Fix any unwanted undefined operation fault due to VMXOFF instruction that
+ * is needed to ensure that CPU is not in VMX root operation at time of
+ * a reboot/panic CPU reset. There is no safe and reliable way to know
+ * if a processor is in VMX root operation, other than to skip the
+ * VMXOFF. It is safe to just skip any VMXOFF that might generate this
+ * exception, when VMX operation is enabled in CR4. In the extremely
+ * rare case that a VMXOFF is erroneously executed while VMX is enabled,
+ * but VMXON has not been executed yet, the undefined opcode fault
+ * should not be missed by valid code, though it would be an error.
+ * To detect this, we could somehow restrict the instruction address
+ * to the specific use during reboot/panic.
+ */
+static int fixup_emergency_vmxoff(struct pt_regs *regs, int trapnr)
+{
+	const static u8 insn_vmxoff[3] = { 0x0f, 0x01, 0xc4 };
+	u8 ud[3];
+
+	if (trapnr != X86_TRAP_UD)
+		return 0;
+	if (!cpu_vmx_enabled())
+		return 0;
+	if (!this_cpu_read(doing_emergency_vmxoff))
+		return 0;
+
+	/* undefined instruction must be in kernel and be VMXOFF */
+	if (regs->ip < TASK_SIZE_MAX)
+		return 0;
+	if (probe_kernel_address((u8 *)regs->ip, ud))
+		return 0;
+	if (memcmp(ud, insn_vmxoff, sizeof(insn_vmxoff)))
+		return 0;
+
+	regs->ip += sizeof(insn_vmxoff);
+	return 1;
+}
+
 static nokprobe_inline int
 do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
 		  struct pt_regs *regs, long error_code)
@@ -193,9 +234,16 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
 	/*
 	 * WARN*()s end up here; fix them up before we call the
 	 * notifier chain.
+	 * Also, VMXOFF causes unwanted fault during reboot
+	 * if VMX is enabled, but not in VMX root operation. Fix
+	 * before calling notifier chain.
 	 */
-	if (!user_mode(regs) && fixup_bug(regs, trapnr))
-		return;
+	if (!user_mode(regs)) {
+		if (fixup_bug(regs, trapnr))
+			return;
+		if (fixup_emergency_vmxoff(regs, trapnr))
+			return;
+	}
 
 	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
 			NOTIFY_STOP) {
-- 
2.26.2
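P.S. For anyone who wants to sanity-check the byte matching without building
a kernel, the following stand-alone user-space program (illustration only,
not part of the patch; skip_if_vmxoff() and the fake text[] buffer are
invented for the example) mimics what fixup_emergency_vmxoff() does with the
bytes at the faulting instruction pointer:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* VMXOFF encodes as 0f 01 c4, the same bytes the patch checks for. */
    static const uint8_t insn_vmxoff[3] = { 0x0f, 0x01, 0xc4 };

    /*
     * Return the resume offset if the bytes at ip are VMXOFF, or 0 if the
     * fault should be handled normally (mirrors the return-0/return-1 logic
     * of fixup_emergency_vmxoff(), minus the CR4 and percpu-flag checks).
     */
    static size_t skip_if_vmxoff(const uint8_t *text, size_t ip)
    {
            if (memcmp(text + ip, insn_vmxoff, sizeof(insn_vmxoff)))
                    return 0;
            return ip + sizeof(insn_vmxoff);
    }

    int main(void)
    {
            /* Fake instruction stream: VMXOFF followed by RET (c3). */
            const uint8_t text[] = { 0x0f, 0x01, 0xc4, 0xc3 };
            size_t resume = skip_if_vmxoff(text, 0);

            printf("resume at byte offset %zu\n", resume);   /* prints 3 */
            return 0;
    }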