Received: by 2002:a25:683:0:0:0:0:0 with SMTP id 125csp814966ybg; Wed, 10 Jun 2020 14:39:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzQtQJ8ltd3bmwh3O/dtXq8JvH+6D1lw13DiE0GSiZHJHe5cLyIspAbJeqMe8oKauoHAhoM X-Received: by 2002:a17:906:73db:: with SMTP id n27mr5212422ejl.16.1591825162962; Wed, 10 Jun 2020 14:39:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1591825162; cv=none; d=google.com; s=arc-20160816; b=FaF1W8i9JV62xFndcxh49QfF9SFiWBrdh70waGz4JE6NRxfGv5XxVOOD6MYZ0n02M/ 0ZHkdHfwSlCM7nuREHMId4fAirEnoHGVW7D0X571sxqbZMwgTJJCAqINCVSYHTPmsFIT XqFiwRLq0my1eyPhZ/92jyhC/nIlVIdPPV6gViPZ0+BzGaYBhlF3CuCEB3UEraUljT3C 9cInKMWzuds+nDKkWCrxowkvMNYAJH+8z/X+P8gXtNCIJbXDuQmsfrT+5XNXVDG/fbDT CBa0u3EoPGznoI5s4iFmEUTs6FaAKyDevf/TCk2HZ4juQ09alBH5WcZ1AkVbMUt/4an1 6eLw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:ironport-sdr:ironport-sdr; bh=CrBK+A3W+9Tpc1zKftHne+9P9DRmvw+J1TXlqluEV+c=; b=m4g0RJA4gb4Ufa6TcvCiwFbw3SckfKylWKnRyHUSFyoOd2hAwgWfcr2Jk++V16elSo I1LA8T+3YaWXmoeg8usT6xDaauSuouq40MsQzWi8IPvHnAVBpAad1h3NVrc7E6i96S/K AUL3CaPrp9EGiI/if9zOx6fQf+7g/Ums+D+fYUPvblpdsfNQTnDVTDhcuT6Yai/JiHCP F/Hl+1VkfwXO9WxXHeK5QVs6jOZFHlKq8ZT3IfZmR9NXCelTwqkU1RF0M7JsuJa0jW+G mkRQ/VpmYZTR7ZG6sAXiJa5mRKJkD784abvK9xGUQX0wZ8pOOVbE5JJnKG7x7arlAFgF 4Wcw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id e23si724731ejb.425.2020.06.10.14.39.00; Wed, 10 Jun 2020 14:39:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726277AbgFJVgG (ORCPT + 99 others); Wed, 10 Jun 2020 17:36:06 -0400 Received: from mga03.intel.com ([134.134.136.65]:8907 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726114AbgFJVgF (ORCPT ); Wed, 10 Jun 2020 17:36:05 -0400 IronPort-SDR: 743q4c38cybqGE+XYfq4CNivY8uXZwV2V9WjKFap4l31AMJM913au8trOfcavxvV9gqbQpTpQ9 mBRTUTbcIwCA== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jun 2020 14:36:04 -0700 IronPort-SDR: VGR49SxO4V7TZXUsKgr0PkSZES+SlQ/oVEiz8EZpTNdpGpxR8a9I7NZe29I5vOtDH+GDwr9SBA QVlAn+F0J8YQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,497,1583222400"; d="scan'208";a="275115369" Received: from sjchrist-coffee.jf.intel.com (HELO linux.intel.com) ([10.54.74.152]) by orsmga006.jf.intel.com with ESMTP; 10 Jun 2020 14:36:04 -0700 Date: Wed, 10 Jun 2020 14:36:04 -0700 From: Sean Christopherson To: "David P. Reed" Cc: Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" , Allison Randal , Enrico Weigelt , Greg Kroah-Hartman , Kate Stewart , "Peter Zijlstra (Intel)" , Randy Dunlap , Martin Molnar , Andy Lutomirski , Alexandre Chartre , Jann Horn , Dave Hansen , linux-kernel@vger.kernel.org Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash Message-ID: <20200610213604.GG18790@linux.intel.com> References: <20200610181254.2142-1-dpreed@deepplum.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200610181254.2142-1-dpreed@deepplum.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 10, 2020 at 02:12:50PM -0400, David P. Reed wrote: > If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is > done on all CPUS, to allow the INIT IPI to function, since > INIT is suppressed when CPUs are in VMX root operation. > However, VMXOFF causes an undefined operation fault if the CPU is not > in VMX operation, that is, VMXON has not been executed, or VMXOFF > has been executed, but VMX is enabled. This fix makes the reboot > work more reliably by modifying the #UD handler to skip the VMXOFF > if VMX is enabled on the CPU and the VMXOFF is executed as part > of cpu_emergency_vmxoff(). > The logic in reboot.c is also corrected, since the point of forcing > the processor out of VMX root operation is because when VMX root > operation is enabled, the processor INIT signal is always masked. > See Intel SDM section on differences between VMX Root operation and normal > operation. Thus every CPU must be forced out of VMX operation. > Since the CPU will hang rather than restart, a manual "reset" is the > only way out of this state (or if there is a BMC, it can issue a RESET > to the chip). > > Signed-off-by: David P. Reed > --- > @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void) > return __read_cr4() & X86_CR4_VMXE; > } > > -/** Disable VMX if it is enabled on the current CPU > +/** Force disable VMX if it is enabled on the current CPU. > + * Note that if CPU is not in VMX root operation this > + * VMXOFF will fault an undefined operation fault. > + * So the 'doing_emergency_vmxoff' percpu flag is set, > + * the trap handler for just restarts execution after > + * the VMXOFF instruction. > * > - * You shouldn't call this if cpu_has_vmx() returns 0. > + * You shouldn't call this directly if cpu_has_vmx() returns 0. > */ > static inline void __cpu_emergency_vmxoff(void) > { > - if (cpu_vmx_enabled()) > + if (cpu_vmx_enabled()) { > + this_cpu_write(doing_emergency_vmxoff, 1); > cpu_vmxoff(); > + this_cpu_write(doing_emergency_vmxoff, 0); > + } > } ... > +/* > + * Fix any unwanted undefined operation fault due to VMXOFF instruction that > + * is needed to ensure that CPU is not in VMX root operation at time of > + * a reboot/panic CPU reset. There is no safe and reliable way to know > + * if a processor is in VMX root operation, other than to skip the > + * VMXOFF. It is safe to just skip any VMXOFF that might generate this > + * exception, when VMX operation is enabled in CR4. In the extremely > + * rare case that a VMXOFF is erroneously executed while VMX is enabled, > + * but VMXON has not been executed yet, the undefined opcode fault > + * should not be missed by valid code, though it would be an error. > + * To detect this, we could somehow restrict the instruction address > + * to the specific use during reboot/panic. > + */ > +static int fixup_emergency_vmxoff(struct pt_regs *regs, int trapnr) > +{ > + const static u8 insn_vmxoff[3] = { 0x0f, 0x01, 0xc4 }; > + u8 ud[3]; > + > + if (trapnr != X86_TRAP_UD) > + return 0; > + if (!cpu_vmx_enabled()) > + return 0; > + if (!this_cpu_read(doing_emergency_vmxoff)) > + return 0; > + > + /* undefined instruction must be in kernel and be VMXOFF */ > + if (regs->ip < TASK_SIZE_MAX) > + return 0; > + if (probe_kernel_address((u8 *)regs->ip, ud)) > + return 0; > + if (memcmp(ud, insn_vmxoff, sizeof(insn_vmxoff))) > + return 0; > + > + regs->ip += sizeof(insn_vmxoff); > + return 1; > +} > + > static nokprobe_inline int > do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str, > struct pt_regs *regs, long error_code) > @@ -193,9 +234,16 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str, > /* > * WARN*()s end up here; fix them up before we call the > * notifier chain. > + * Also, VMXOFF causes unwanted fault during reboot > + * if VMX is enabled, but not in VMX root operation. Fix > + * before calling notifier chain. > */ > - if (!user_mode(regs) && fixup_bug(regs, trapnr)) > - return; > + if (!user_mode(regs)) { > + if (fixup_bug(regs, trapnr)) > + return; > + if (fixup_emergency_vmxoff(regs, trapnr)) > + return; > + } Isn't this just a really kludgy way of doing fixup on vmxoff? E.g. wouldn't the below patch do the trick? diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h index 9aad0e0876fb..54bc84d7028d 100644 --- a/arch/x86/include/asm/virtext.h +++ b/arch/x86/include/asm/virtext.h @@ -32,13 +32,15 @@ static inline int cpu_has_vmx(void) /** Disable VMX on the current CPU * - * vmxoff causes a undefined-opcode exception if vmxon was not run - * on the CPU previously. Only call this function if you know VMX - * is enabled. + * VMXOFF causes a #UD if the CPU is not post-VMXON, eat any #UDs to handle + * races with a hypervisor doing VMXOFF, e.g. if an NMI arrived between VMXOFF + * and clearing CR4.VMXE. */ static inline void cpu_vmxoff(void) { - asm volatile ("vmxoff"); + asm volatile("1: vmxoff\n\t" + "2:\n\t" + _ASM_EXTABLE(1b, 2b)); cr4_clear_bits(X86_CR4_VMXE); }