Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp643955ybz; Wed, 22 Apr 2020 05:22:34 -0700 (PDT) X-Google-Smtp-Source: APiQypKSc/itdz1nG+rk0sYoJK9M6w59LOImsi9oS+taeq5NkFFLh4Rqth+kt3CXWei+Hea6rQ81 X-Received: by 2002:a17:906:28d7:: with SMTP id p23mr2224844ejd.305.1587558154667; Wed, 22 Apr 2020 05:22:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1587558154; cv=none; d=google.com; s=arc-20160816; b=O7/0O1nFJFc4ZNSTDHjXVR0f37yqNqRANq1KLev+GwvzB9AEafBUaWcfQEHTvWqN8E REQ0xV4q72xDpml9W674Xmk23UyK0RLPyRAQMat6h3a+lfFy4D4/4omGUKicAe1/OWYv dZ09TV+fwhPJvVgaMhV5/F87E4l5W2Xfl0Qu5ul7TwkjlP03bGJ8Qmn7Z+vtYzbYZ8yU BrYf9Jce7iXHNMipMzr18b7MWPRcclfPqC4lxOKHoHFzl7FtDCrTV6rCHOHrbvXptc61 AaEL0XySgV/KGDHLyR2WaZDBqMB+IbYxfUmM1QRx/Unkyo38b8xNRAHByO74bbNOGWTR s7Pg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=WGVY9L2TwAFqY19ofhU/karLPXxFdc6LDpJBQa10fNU=; b=wfTJMBQ5AOFYaMkVx+S5aF5XwBejSByv2sTrXgdwYo5txseTPAVw3czfjkAoxo3F/V jhSR0SEARKwemjSGN+61LpusV4+lEe+stgMv+h95o0nD01qArbQGFdiHAM5oVcqTk6mX EPOkOEQvTm9E7woCRmrhM5G5eKzG2OduLCxNMYD4XTmkKmAZSHiCBkHb5aA0jc+tejb1 ogYKn2ieu2UTh+WvsOAGkI/EqmqenSC4+0QuC7UrPZK/E680B5sDXYoHfmty4yJX1XhM ARtPmQXPjwlO2qH7RMkDz9pdjBBhzQrid4Nbm8XrgO2BIJElTCoAOWJpheuxR+YmDgdT Kr4A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=vVWf1uSf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id e16si3315883eds.503.2020.04.22.05.22.11; Wed, 22 Apr 2020 05:22:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=vVWf1uSf; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726766AbgDVKMW (ORCPT + 99 others); Wed, 22 Apr 2020 06:12:22 -0400 Received: from mail.kernel.org ([198.145.29.99]:45068 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728778AbgDVKMN (ORCPT ); Wed, 22 Apr 2020 06:12:13 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id C5F0320575; Wed, 22 Apr 2020 10:12:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1587550332; bh=EHVLxDqmDsahDyCiv52kZHDo+TWJVyTygZscD6oy+zw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vVWf1uSf3wagMvHYWnjI8UoT87b9Z/RnW0YS+Qdnl34z+EfC6duMfhwNG3ed8tMaD 2uV2379WdXUQhmptepj7uE7ZQwN3Onhku9yppn06Qx0lIUvtwIT4LBX3Ls2M42X3Nq Ubuuk8xWlzsquMFEROk3R/3ZXoLMsIHWABVB2IPQ= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Sean Christopherson , Vitaly Kuznetsov , Paolo Bonzini Subject: [PATCH 4.14 061/199] KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support Date: Wed, 22 Apr 2020 11:56:27 +0200 Message-Id: <20200422095104.355453920@linuxfoundation.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200422095057.806111593@linuxfoundation.org> References: <20200422095057.806111593@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Sean Christopherson commit 31603d4fc2bb4f0815245d496cb970b27b4f636a upstream. VMCLEAR all in-use VMCSes during a crash, even if kdump's NMI shootdown interrupted a KVM update of the percpu in-use VMCS list. Because NMIs are not blocked by disabling IRQs, it's possible that crash_vmclear_local_loaded_vmcss() could be called while the percpu list of VMCSes is being modified, e.g. in the middle of list_add() in vmx_vcpu_load_vmcs(). This potential corner case was called out in the original commit[*], but the analysis of its impact was wrong. Skipping the VMCLEARs is wrong because it all but guarantees that a loaded, and therefore cached, VMCS will live across kexec and corrupt memory in the new kernel. Corruption will occur because the CPU's VMCS cache is non-coherent, i.e. not snooped, and so the writeback of VMCS memory on its eviction will overwrite random memory in the new kernel. The VMCS will live because the NMI shootdown also disables VMX, i.e. the in-progress VMCLEAR will #UD, and existing Intel CPUs do not flush the VMCS cache on VMXOFF. Furthermore, interrupting list_add() and list_del() is safe due to crash_vmclear_local_loaded_vmcss() using forward iteration. list_add() ensures the new entry is not visible to forward iteration unless the entire add completes, via WRITE_ONCE(prev->next, new). A bad "prev" pointer could be observed if the NMI shootdown interrupted list_del() or list_add(), but list_for_each_entry() does not consume ->prev. In addition to removing the temporary disabling of VMCLEAR, open code loaded_vmcs_init() in __loaded_vmcs_clear() and reorder VMCLEAR so that the VMCS is deleted from the list only after it's been VMCLEAR'd. Deleting the VMCS before VMCLEAR would allow a race where the NMI shootdown could arrive between list_del() and vmcs_clear() and thus neither flow would execute a successful VMCLEAR. Alternatively, more code could be moved into loaded_vmcs_init(), but that gets rather silly as the only other user, alloc_loaded_vmcs(), doesn't need the smp_wmb() and would need to work around the list_del(). Update the smp_*() comments related to the list manipulation, and opportunistically reword them to improve clarity. [*] https://patchwork.kernel.org/patch/1675731/#3720461 Fixes: 8f536b7697a0 ("KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Message-Id: <20200321193751.24985-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 67 ++++++++++++----------------------------------------- 1 file changed, 16 insertions(+), 51 deletions(-) --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1674,43 +1674,15 @@ static void vmcs_load(struct vmcs *vmcs) } #ifdef CONFIG_KEXEC_CORE -/* - * This bitmap is used to indicate whether the vmclear - * operation is enabled on all cpus. All disabled by - * default. - */ -static cpumask_t crash_vmclear_enabled_bitmap = CPU_MASK_NONE; - -static inline void crash_enable_local_vmclear(int cpu) -{ - cpumask_set_cpu(cpu, &crash_vmclear_enabled_bitmap); -} - -static inline void crash_disable_local_vmclear(int cpu) -{ - cpumask_clear_cpu(cpu, &crash_vmclear_enabled_bitmap); -} - -static inline int crash_local_vmclear_enabled(int cpu) -{ - return cpumask_test_cpu(cpu, &crash_vmclear_enabled_bitmap); -} - static void crash_vmclear_local_loaded_vmcss(void) { int cpu = raw_smp_processor_id(); struct loaded_vmcs *v; - if (!crash_local_vmclear_enabled(cpu)) - return; - list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu), loaded_vmcss_on_cpu_link) vmcs_clear(v->vmcs); } -#else -static inline void crash_enable_local_vmclear(int cpu) { } -static inline void crash_disable_local_vmclear(int cpu) { } #endif /* CONFIG_KEXEC_CORE */ static void __loaded_vmcs_clear(void *arg) @@ -1722,19 +1694,24 @@ static void __loaded_vmcs_clear(void *ar return; /* vcpu migration can race with cpu offline */ if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs) per_cpu(current_vmcs, cpu) = NULL; - crash_disable_local_vmclear(cpu); + + vmcs_clear(loaded_vmcs->vmcs); + if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched) + vmcs_clear(loaded_vmcs->shadow_vmcs); + list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link); /* - * we should ensure updating loaded_vmcs->loaded_vmcss_on_cpu_link - * is before setting loaded_vmcs->vcpu to -1 which is done in - * loaded_vmcs_init. Otherwise, other cpu can see vcpu = -1 fist - * then adds the vmcs into percpu list before it is deleted. + * Ensure all writes to loaded_vmcs, including deleting it from its + * current percpu list, complete before setting loaded_vmcs->vcpu to + * -1, otherwise a different cpu can see vcpu == -1 first and add + * loaded_vmcs to its percpu list before it's deleted from this cpu's + * list. Pairs with the smp_rmb() in vmx_vcpu_load_vmcs(). */ smp_wmb(); - loaded_vmcs_init(loaded_vmcs); - crash_enable_local_vmclear(cpu); + loaded_vmcs->cpu = -1; + loaded_vmcs->launched = 0; } static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs) @@ -2497,18 +2474,17 @@ static void vmx_vcpu_load(struct kvm_vcp if (!already_loaded) { loaded_vmcs_clear(vmx->loaded_vmcs); local_irq_disable(); - crash_disable_local_vmclear(cpu); /* - * Read loaded_vmcs->cpu should be before fetching - * loaded_vmcs->loaded_vmcss_on_cpu_link. - * See the comments in __loaded_vmcs_clear(). + * Ensure loaded_vmcs->cpu is read before adding loaded_vmcs to + * this cpu's percpu list, otherwise it may not yet be deleted + * from its previous cpu's percpu list. Pairs with the + * smb_wmb() in __loaded_vmcs_clear(). */ smp_rmb(); list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link, &per_cpu(loaded_vmcss_on_cpu, cpu)); - crash_enable_local_vmclear(cpu); local_irq_enable(); } @@ -3804,17 +3780,6 @@ static int hardware_enable(void) INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu)); spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu)); - /* - * Now we can enable the vmclear operation in kdump - * since the loaded_vmcss_on_cpu list on this cpu - * has been initialized. - * - * Though the cpu is not in VMX operation now, there - * is no problem to enable the vmclear operation - * for the loaded_vmcss_on_cpu list is empty! - */ - crash_enable_local_vmclear(cpu); - rdmsrl(MSR_IA32_FEATURE_CONTROL, old); test_bits = FEATURE_CONTROL_LOCKED;