Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp4095716pxb; Mon, 8 Feb 2021 07:51:37 -0800 (PST) X-Google-Smtp-Source: ABdhPJw7kdeMD/U1e3+uMxaqJ10Vrxa1V4Uy45o/9NCbw3NFD7RsVO/P2vd40AXtXptZlJJdeHkX X-Received: by 2002:aa7:d345:: with SMTP id m5mr13353557edr.30.1612799496854; Mon, 08 Feb 2021 07:51:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1612799496; cv=none; d=google.com; s=arc-20160816; b=RR1Axf8onHitjU6IzGBT1EtIRy7jqL61p+LOaELbwm1zPd1UgfwdvlL2zNu8izqrQ6 SGotkgi8MMEO6MKHylQ3HB0pAnlEcjOTRlF6gLBZ05nJje3H+pLmTVCzO1ELgswAzPM7 3n2RDi6fCJt0yOtfnPhAGIzpMa3zBP1wBdWv8UKJRkRJzIaf5+I8AfSQrcrpNpDwVw9D GDqUIL6KXmjxKWX7qZ9nXnkeHYz71ViEkjV6WTNhxXMPYf+FX+M5jYBTQfY7cEwANulv st2/s4j+d3P8rbzFigee0MLUorGwgrYN642pUsSacOzgU0DVEb/pSD5AtUKHI/Xph3Iv Pmng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=obsGtwH8Ye0G0iJCYzeVjPXYeOz29DwzW6885q/x0xI=; b=yxwCA12MgBvXrRlbVy2xj7TT9uWQ4vUJQaUI20xtlUkgB+lNSEhb4+lyRj7/PyVyKi lJxatfB7Q+ogy6FHlrFu39qy5gFiYiPXSc8OCXasfkAnHyZFF+jrzTUj2I+el/pnQRqT V1XMm+lyIUqOJdauu7AFE3dLUNEBbsEllLCas6Wz+YgEziGGThTxgxnpgC4rxQIzU4qd YZpzW5Wh6JfEerZiqkc7hN90guDSl7XmXcN1zY3M7bsE8No1ZIohaku9SrweZlSvPuun a32f2RyYy4LnZPn/5v0Y7+wprKNmmiIXULCKhN6Pa2rvsu5beQu/cm+0AO5g3BMQbJWR Eypg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=WS5MkvZx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id fx25si10981636ejb.58.2021.02.08.07.51.11; Mon, 08 Feb 2021 07:51:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=WS5MkvZx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234126AbhBHPrt (ORCPT + 99 others); Mon, 8 Feb 2021 10:47:49 -0500 Received: from mail.kernel.org ([198.145.29.99]:52604 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233084AbhBHPGv (ORCPT ); Mon, 8 Feb 2021 10:06:51 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id E15DF64EDD; Mon, 8 Feb 2021 15:05:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1612796714; bh=T7bESprAvYwMSM13wEooklKYp3qfd468WFuEe5W0Pzw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=WS5MkvZxVZVTkIbE8RQRouIKyOFlQQsahnyl9w+BSpFYWx0V2P3EDcG6PEqRNq3L+ Mndn7CTOlIcmH8GLot3mANMupsxTaeNzhVTcjt1ivq5qHDsERLfZ+hS0i82POFb4B3 0+MZ1a4BR1s8iJ6S0ZfgqB6RVcbmB79n5QEruRQ4= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Jan Kiszka , Dave Hansen , Borislav Petkov , "Peter Zijlstra (Intel)" , Thomas Gleixner Subject: [PATCH 4.9 39/43] x86/apic: Add extra serialization for non-serializing MSRs Date: Mon, 8 Feb 2021 16:01:05 +0100 Message-Id: <20210208145807.899436995@linuxfoundation.org> X-Mailer: git-send-email 2.30.0 In-Reply-To: <20210208145806.281758651@linuxfoundation.org> References: <20210208145806.281758651@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen commit 25a068b8e9a4eb193d755d58efcb3c98928636e0 upstream. Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for MFENCE; LFENCE. Short summary: we have special MSRs that have weaker ordering than all the rest. Add fencing consistent with current SDM recommendations. This is not known to cause any issues in practice, only in theory. Longer story below: The reason the kernel uses a different semantic is that the SDM changed (roughly in late 2017). The SDM changed because folks at Intel were auditing all of the recommended fences in the SDM and realized that the x2apic fences were insufficient. Why was the pain MFENCE judged insufficient? WRMSR itself is normally a serializing instruction. No fences are needed because the instruction itself serializes everything. But, there are explicit exceptions for this serializing behavior written into the WRMSR instruction documentation for two classes of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs. Back to x2apic: WRMSR is *not* serializing in this specific case. But why is MFENCE insufficient? MFENCE makes writes visible, but only affects load/store instructions. WRMSR is unfortunately not a load/store instruction and is unaffected by MFENCE. This means that a non-serializing WRMSR could be reordered by the CPU to execute before the writes made visible by the MFENCE have even occurred in the first place. This means that an x2apic IPI could theoretically be triggered before there is any (visible) data to process. Does this affect anything in practice? I honestly don't know. It seems quite possible that by the time an interrupt gets to consume the (not yet) MFENCE'd data, it has become visible, mostly by accident. To be safe, add the SDM-recommended fences for all x2apic WRMSRs. This also leaves open the question of the _other_ weakly-ordered WRMSR: MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as the x2APIC MSRs, it seems substantially less likely to be a problem in practice. While writes to the in-memory Local Vector Table (LVT) might theoretically be reordered with respect to a weakly-ordered WRMSR like TSC_DEADLINE, the SDM has this to say: In x2APIC mode, the WRMSR instruction is used to write to the LVT entry. The processor ensures the ordering of this write and any subsequent WRMSR to the deadline; no fencing is required. But, that might still leave xAPIC exposed. The safest thing to do for now is to add the extra, recommended LFENCE. [ bp: Massage commit message, fix typos, drop accidentally added newline to tools/arch/x86/include/asm/barrier.h. ] Reported-by: Jan Kiszka Signed-off-by: Dave Hansen Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Acked-by: Thomas Gleixner Cc: Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/apic.h | 10 ---------- arch/x86/include/asm/barrier.h | 18 ++++++++++++++++++ arch/x86/kernel/apic/apic.c | 4 ++++ arch/x86/kernel/apic/x2apic_cluster.c | 6 ++++-- arch/x86/kernel/apic/x2apic_phys.c | 6 ++++-- 5 files changed, 30 insertions(+), 14 deletions(-) --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -176,16 +176,6 @@ static inline void lapic_update_tsc_freq #endif /* !CONFIG_X86_LOCAL_APIC */ #ifdef CONFIG_X86_X2APIC -/* - * Make previous memory operations globally visible before - * sending the IPI through x2apic wrmsr. We need a serializing instruction or - * mfence for this. - */ -static inline void x2apic_wrmsr_fence(void) -{ - asm volatile("mfence" : : : "memory"); -} - static inline void native_apic_msr_write(u32 reg, u32 v) { if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR || --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -110,4 +110,22 @@ do { \ #include +/* + * Make previous memory operations globally visible before + * a WRMSR. + * + * MFENCE makes writes visible, but only affects load/store + * instructions. WRMSR is unfortunately not a load/store + * instruction and is unaffected by MFENCE. The LFENCE ensures + * that the WRMSR is not reordered. + * + * Most WRMSRs are full serializing instructions themselves and + * do not require this barrier. This is only required for the + * IA32_TSC_DEADLINE and X2APIC MSRs. + */ +static inline void weak_wrmsr_fence(void) +{ + asm volatile("mfence; lfence" : : : "memory"); +} + #endif /* _ASM_X86_BARRIER_H */ --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -42,6 +42,7 @@ #include #include #include +#include #include #include #include @@ -476,6 +477,9 @@ static int lapic_next_deadline(unsigned { u64 tsc; + /* This MSR is special and need a special fence: */ + weak_wrmsr_fence(); + tsc = rdtsc(); wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR)); return 0; --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -27,7 +27,8 @@ static void x2apic_send_IPI(int cpu, int { u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL); } @@ -40,7 +41,8 @@ __x2apic_send_IPI_mask(const struct cpum unsigned long flags; u32 dest; - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); local_irq_save(flags); --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -40,7 +40,8 @@ static void x2apic_send_IPI(int cpu, int { u32 dest = per_cpu(x86_cpu_to_apicid, cpu); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL); } @@ -51,7 +52,8 @@ __x2apic_send_IPI_mask(const struct cpum unsigned long this_cpu; unsigned long flags; - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); local_irq_save(flags);