Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp5938469ybf; Thu, 5 Mar 2020 09:51:35 -0800 (PST) X-Google-Smtp-Source: ADFU+vt4a16zBMrpgIMILiSOoUxGZBOJm7514Ym7y62PAL+u3Ppn+LucXC5U0tx67Q1nasm4cnFT X-Received: by 2002:a05:6808:1c4:: with SMTP id x4mr164160oic.83.1583430695333; Thu, 05 Mar 2020 09:51:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583430695; cv=none; d=google.com; s=arc-20160816; b=y2qUvFSXm9ZHg0+O7oGFfbuwjDpnvX/C2KBtPeoudpM26QEsHCjsIDy6D+4WY1ejMp KPKJn1G4jNRbDhj5NeAH3jo2JSQty6UxQfVO0epwgnxjVhxwZRxbhoJwPwYf1jE5muwe VqTj0k123v0m+jOGXBN9JskMG1+09t5NpVj9hfO569fUglH3KsTAgW63Y2yCUQdP2N3m FlIHR+xCd1pZm1SIHgN0fF08/hH7lUvproF8M3F8vIcN3Es/55zkua3UUxV7J1zGWHtx OpJwVhqd79hvAXDXf83xVxJdiijrIemc8xy0rEL0tOD6Bd+SkG/Mmt5RFuxHUvad7X9+ S9Vg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:in-reply-to:references:date :from:cc:to:subject; bh=79G8OPRNNYjTwt3ESA+DPf98ea7d1o0vu8kRGFSJjjw=; b=GTzaOMnAyaSpuduemfpoN1snU3+E6GjTFjqPkUnsTQ7VTVbr89x27xeCzSVxQZh9eO FEGlrf6Dct/F8MSc59zsNFpbI5AWYb1v/4q4q1WIwyAxPcn5uhc+eAR1QOWwb7vnyztz xJaLY3q4A1ISardrK72osB6FfmefGx3kACstR4gvSoK/oAVv20Hv8kS6nQRmhk+RMqXN QvOsoWkXG625y0JoOqjMUvCoAg6nEI+bBWrJaJNZcmIyRGD3Au5kSIE9l8r2gftuaE75 PviNNHwwcyE0wNFwdOtboyETe4mdW21njWh+2SN4Sp76ok9M/zVMaDNXM74zjQxRnHEG hDOA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n14si3945786oig.110.2020.03.05.09.51.23; Thu, 05 Mar 2020 09:51:35 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726273AbgCERuy (ORCPT + 99 others); Thu, 5 Mar 2020 12:50:54 -0500 Received: from mga14.intel.com ([192.55.52.115]:24953 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726020AbgCERuy (ORCPT ); Thu, 5 Mar 2020 12:50:54 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Mar 2020 09:50:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,518,1574150400"; d="scan'208";a="320276072" Received: from viggo.jf.intel.com (HELO localhost.localdomain) ([10.54.77.144]) by orsmga001.jf.intel.com with ESMTP; 05 Mar 2020 09:50:53 -0800 Subject: [RFC][PATCH 2/2] x86: add extra serialization for non-serializing MSRs To: linux-kernel@vger.kernel.org Cc: Dave Hansen , jan.kiszka@siemens.com, x86@kernel.org, peterz@infradead.org From: Dave Hansen Date: Thu, 05 Mar 2020 09:47:08 -0800 References: <20200305174706.0D6B8EE4@viggo.jf.intel.com> In-Reply-To: <20200305174706.0D6B8EE4@viggo.jf.intel.com> Message-Id: <20200305174708.F77040DD@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain "mfence" while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for "mfence;lfence". Short summary: we have special MSRs that have weaker ordering than all the rest. Add fencing consistent with current SDM recommendatrions. This is not known to cause any issues in practice, only in theory. Longer story below: The reason the kernel uses a different semantic is that the SDM changed (roughly in late 2017). The SDM changed because folks at Intel were auditing all of the recommended fences in the SDM and realized that the x2apic fences were insufficient. Why was the pain "mfence" judged insufficient? WRMSR itself is normally a serializing instruction. No fences are needed because because the instruction itself serializes everything. But, there are explicit exceptions for this serializing behavior written into the WRMSR instruction documentation for two classes of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs. Back to x2apic: WRMSR is *not* serializing in this specific case. But why is MFENCE insufficient? MFENCE makes writes visible, but only affects load/store instructions. WRMSR is unfortunately not a load/store instruction and is unaffected by MEFNCE. This means that a non-serializing WRMSR could be reordered by the CPU to execute before the writes made visible by the MFENCE have even occurred in the first place. This mean that an x2apic IPI could theoretically be triggered before there is any (visible) data to process. Does this affect anything in practice? I honestly don't know. It seems quite possible that by the time an interrupt gets to consume the (not yet) MFENCE'd data, it has become visible, mostly by accident. To be safe, add the SDM-recommended fences for all x2apic WRMSRs. This also leaves open the question of the _other_ weakly-ordered WRMSR: MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as the x2APIC MSRs, it seems substantially less likely to be a problem in practice. While writes to the in-memory Local Vector Table (LVT) might theoretically be reordered with respect to a weakly-ordered WRMSR like TSC_DEADLINE, the SDM has this to say: In x2APIC mode, the WRMSR instruction is used to write to the LVT entry. The processor ensures the ordering of this write and any subsequent WRMSR to the deadline; no fencing is required. But, that might still leave xAPIC exposed. The safest thing to do for now is to add the extra, recommended LFENCE. Reported-by: Jan Kiszka Cc: x86@kernel.org Cc: Peter Zijlstra --- b/arch/x86/include/asm/apic.h | 10 ---------- b/arch/x86/include/asm/barrier.h | 18 ++++++++++++++++++ b/arch/x86/kernel/apic/apic.c | 4 ++++ b/arch/x86/kernel/apic/x2apic_cluster.c | 6 ++++-- b/arch/x86/kernel/apic/x2apic_phys.c | 9 ++++++--- b/tools/arch/x86/include/asm/barrier.h | 1 + 6 files changed, 33 insertions(+), 15 deletions(-) diff -puN arch/x86/include/asm/apic.h~x2apic-wrmsr-serialization arch/x86/include/asm/apic.h --- a/arch/x86/include/asm/apic.h~x2apic-wrmsr-serialization 2020-03-05 09:42:38.876901038 -0800 +++ b/arch/x86/include/asm/apic.h 2020-03-05 09:42:38.891901038 -0800 @@ -195,16 +195,6 @@ static inline bool apic_needs_pit(void) #endif /* !CONFIG_X86_LOCAL_APIC */ #ifdef CONFIG_X86_X2APIC -/* - * Make previous memory operations globally visible before - * sending the IPI through x2apic wrmsr. We need a serializing instruction or - * mfence for this. - */ -static inline void x2apic_wrmsr_fence(void) -{ - asm volatile("mfence" : : : "memory"); -} - static inline void native_apic_msr_write(u32 reg, u32 v) { if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR || diff -puN arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization arch/x86/include/asm/barrier.h --- a/arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization 2020-03-05 09:42:38.878901038 -0800 +++ b/arch/x86/include/asm/barrier.h 2020-03-05 09:42:38.893901038 -0800 @@ -84,4 +84,22 @@ do { \ #include +/* + * Make previous memory operations globally visible before + * a WRMSR. + * + * MFENCE makes writes visible, but only affects load/store + * instructions. WRMSR is unfortunately not a load/store + * instruction and is unaffected by MEFNCE. The LFENCE ensures + * that the WRMSR is not reordered. + * + * Most WRMSRs are full serializing instructions themselves and + * do not require this barrier. This is only required for the + * IA32_TSC_DEADLINE and X2APIC MSRs. + */ +static inline void weak_wrmsr_fence(void) +{ + asm volatile("mfence; lfence" : : : "memory"); +} + #endif /* _ASM_X86_BARRIER_H */ diff -puN arch/x86/kernel/apic/apic.c~x2apic-wrmsr-serialization arch/x86/kernel/apic/apic.c --- a/arch/x86/kernel/apic/apic.c~x2apic-wrmsr-serialization 2020-03-05 09:42:38.880901038 -0800 +++ b/arch/x86/kernel/apic/apic.c 2020-03-05 09:42:38.892901038 -0800 @@ -42,6 +42,7 @@ #include #include #include +#include #include #include #include @@ -474,6 +475,9 @@ static int lapic_next_deadline(unsigned { u64 tsc; + /* This MSR is special and need a special fence: */ + weak_wrmsr_fence(); + tsc = rdtsc(); wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR)); return 0; diff -puN arch/x86/kernel/apic/x2apic_cluster.c~x2apic-wrmsr-serialization arch/x86/kernel/apic/x2apic_cluster.c --- a/arch/x86/kernel/apic/x2apic_cluster.c~x2apic-wrmsr-serialization 2020-03-05 09:42:38.882901038 -0800 +++ b/arch/x86/kernel/apic/x2apic_cluster.c 2020-03-05 09:42:38.892901038 -0800 @@ -29,7 +29,8 @@ static void x2apic_send_IPI(int cpu, int { u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL); } @@ -41,7 +42,8 @@ __x2apic_send_IPI_mask(const struct cpum unsigned long flags; u32 dest; - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); local_irq_save(flags); tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask); diff -puN arch/x86/kernel/apic/x2apic_phys.c~x2apic-wrmsr-serialization arch/x86/kernel/apic/x2apic_phys.c --- a/arch/x86/kernel/apic/x2apic_phys.c~x2apic-wrmsr-serialization 2020-03-05 09:42:38.885901038 -0800 +++ b/arch/x86/kernel/apic/x2apic_phys.c 2020-03-05 09:42:38.892901038 -0800 @@ -37,7 +37,8 @@ static void x2apic_send_IPI(int cpu, int { u32 dest = per_cpu(x86_cpu_to_apicid, cpu); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL); } @@ -48,7 +49,8 @@ __x2apic_send_IPI_mask(const struct cpum unsigned long this_cpu; unsigned long flags; - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); local_irq_save(flags); @@ -116,7 +118,8 @@ void __x2apic_send_IPI_shorthand(int vec { unsigned long cfg = __prepare_ICR(which, vector, 0); - x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); native_x2apic_icr_write(cfg, 0); } diff -puN tools/arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization tools/arch/x86/include/asm/barrier.h --- a/tools/arch/x86/include/asm/barrier.h~x2apic-wrmsr-serialization 2020-03-05 09:42:38.887901038 -0800 +++ b/tools/arch/x86/include/asm/barrier.h 2020-03-05 09:42:38.892901038 -0800 @@ -43,4 +43,5 @@ do { \ ___p1; \ }) #endif /* defined(__x86_64__) */ + #endif /* _TOOLS_LINUX_ASM_X86_BARRIER_H */ _