Date: Sun, 29 Oct 2023 11:35:26 +0100
From: Borislav Petkov
To: Andrew Cooper
Cc: Peter Zijlstra, X86 ML, Kishon VijayAbraham, LKML
Subject: [PATCH -v3] x86/barrier: Do not serialize MSR accesses on AMD
Message-ID: <20231029103526.GAZT41bj8qMjt9+7Ql@fat_crate.local>
References: <20230704074631.GAZKPOV/9BfqP0aU8v@fat_crate.local>
 <20230704090132.GP4253@hirez.programming.kicks-ass.net>
 <20230704092222.GBZKPkzgdM8rbPe7zA@fat_crate.local>
 <20231027153327.GKZTvYR3qslaTUjtCT@fat_crate.local>
 <20231027153458.GMZTvYou1tlK6HD8/Y@fat_crate.local>
 <20231027185641.GE26550@noisy.programming.kicks-ass.net>
 <20231027191633.GRZTwMkaiW1nyvnzzO@fat_crate.local>
 <20231027192907.GSZTwPg8v7NF6+Zn0w@fat_crate.local>
 <3c56e807-945c-4996-9ac1-3205a23248ab@citrix.com>
 <20231027202328.GTZTwcQIh8wFyZUtQd@fat_crate.local>
In-Reply-To: <20231027202328.GTZTwcQIh8wFyZUtQd@fat_crate.local>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 27, 2023 at 10:23:28PM +0200, Borislav Petkov wrote:
> So the feature bit should be named something more specific:
>
> X86_FEATURE_APIC_TSC_MSRS_NEED_FENCING
>
> or so.

Plain and simple:

---
From: "Borislav Petkov (AMD)"
Date: Fri, 27 Oct 2023 14:24:16 +0200

AMD does not have the requirement for a synchronization barrier when
accessing a certain group of MSRs. Do not incur that unnecessary penalty
there.

While at it, move weak_wrmsr_fence() to processor.h to avoid include
hell. Untangling that file properly is a matter for another day.

Some notes on the performance aspect of why this is relevant, courtesy
of Kishon VijayAbraham:

On an AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM
shows that the x2AVIC IPI rate is 3% to 4% lower than the AVIC IPI rate.
The ipi-bench is modified so that the IPIs are sent between two vCPUs in
the same CCX. This also requires pinning the vCPU to a physical core to
prevent any latencies. This simulates the use case of pinning vCPUs to
the thread of a single CCX to avoid interrupt IPI latency.

In order to avoid run-to-run variance (for both x2AVIC and AVIC), the
following configuration is applied:

  1) Disable Power States in BIOS (to prevent the system from going to
     a lower power state)

  2) Run the system at a fixed frequency of 2500MHz (to prevent the
     system from increasing the frequency under higher load)

With the above configuration:

*) Performance measured using ipi-bench for AVIC:
  Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second
                                     from 48 vCPUs simultaneously]

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second
                                     from 48 vCPUs simultaneously]

From the above, x2AVIC latency is ~4% higher than AVIC. However, x2AVIC
performance is expected to be better than or equivalent to AVIC. Analysis
of the perf captures shows that significant time is spent in
weak_wrmsr_fence() invoked by x2apic_send_IPI().

With the fix to skip weak_wrmsr_fence():

*) Performance measured using ipi-bench for x2AVIC:
  Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU]

  Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second
                                     from 48 vCPUs simultaneously]

Comparing the performance of x2AVIC with and without the fix, it can be
seen that the performance improves by ~4%.

Performance captured using an unmodified ipi-bench with the 'mesh-ipi'
option, with and without weak_wrmsr_fence(), on a Zen4 system also showed
a significant performance improvement without weak_wrmsr_fence(). The
'mesh-ipi' option ignores CCX or CCD and just picks a random vCPU.

  Average throughput (10 iterations) with weak_wrmsr_fence(),
        Cumulative throughput: 4933374 IPI/s

  Average throughput (10 iterations) without weak_wrmsr_fence(),
        Cumulative throughput: 6355156 IPI/s

[1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench

Signed-off-by: Borislav Petkov (AMD)
---
 arch/x86/include/asm/barrier.h     | 18 ------------------
 arch/x86/include/asm/cpufeatures.h |  2 +-
 arch/x86/include/asm/processor.h   | 18 ++++++++++++++++++
 arch/x86/kernel/cpu/amd.c          |  3 +++
 arch/x86/kernel/cpu/common.c       |  7 +++++++
 arch/x86/kernel/cpu/hygon.c        |  3 +++
 6 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 35389b2af88e..0216f63a366b 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -81,22 +81,4 @@ do { \
 
 #include
 
-/*
- * Make previous memory operations globally visible before
- * a WRMSR.
- *
- * MFENCE makes writes visible, but only affects load/store
- * instructions. WRMSR is unfortunately not a load/store
- * instruction and is unaffected by MFENCE. The LFENCE ensures
- * that the WRMSR is not reordered.
- *
- * Most WRMSRs are full serializing instructions themselves and
- * do not require this barrier. This is only required for the
- * IA32_TSC_DEADLINE and X2APIC MSRs.
- */
-static inline void weak_wrmsr_fence(void)
-{
-	asm volatile("mfence; lfence" : : : "memory");
-}
-
 #endif /* _ASM_X86_BARRIER_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 58cb9495e40f..0091f1008314 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,10 +308,10 @@
 #define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */
 #define X86_FEATURE_USER_SHSTK		(11*32+23) /* Shadow stack support for user mode applications */
-
 #define X86_FEATURE_SRSO		(11*32+24) /* "" AMD BTB untrain RETs */
 #define X86_FEATURE_SRSO_ALIAS		(11*32+25) /* "" AMD BTB untrain RETs through aliasing */
 #define X86_FEATURE_IBPB_ON_VMEXIT	(11*32+26) /* "" Issue an IBPB only on VMEXIT */
+#define X86_FEATURE_APIC_MSRS_FENCE	(11*32+27) /* "" IA32_TSC_DEADLINE and X2APIC MSRs need fencing */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4b130d894cb6..061aa86b4662 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -752,4 +752,22 @@ enum mds_mitigations {
 
 extern bool gds_ucode_mitigated(void);
 
+/*
+ * Make previous memory operations globally visible before
+ * a WRMSR.
+ *
+ * MFENCE makes writes visible, but only affects load/store
+ * instructions. WRMSR is unfortunately not a load/store
+ * instruction and is unaffected by MFENCE. The LFENCE ensures
+ * that the WRMSR is not reordered.
+ *
+ * Most WRMSRs are full serializing instructions themselves and
+ * do not require this barrier. This is only required for the
+ * IA32_TSC_DEADLINE and X2APIC MSRs.
+ */
+static inline void weak_wrmsr_fence(void)
+{
+	alternative("mfence; lfence", "", ALT_NOT(X86_FEATURE_APIC_MSRS_FENCE));
+}
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index a7eab05e5f29..841e21213668 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1162,6 +1162,9 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
 	     cpu_has_amd_erratum(c, amd_erratum_1485))
 		msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
+
+	/* AMD CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+	clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9058da9ae011..4d4b87c6885d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1856,6 +1856,13 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
 #endif
 
+
+	/*
+	 * Set default APIC and TSC_DEADLINE MSR fencing flag. AMD and
+	 * Hygon will clear it in ->c_init() below.
+	 */
+	set_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
+
 	/*
 	 * Vendor-specific initialization. In this section we
 	 * canonicalize the feature flags, meaning if there are
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index 6f247d66758d..f0cd95502faa 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -354,6 +354,9 @@ static void init_hygon(struct cpuinfo_x86 *c)
 	set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
 
 	check_null_seg_clears_base(c);
+
+	/* Hygon CPUs don't need fencing after x2APIC/TSC_DEADLINE MSR writes. */
+	clear_cpu_cap(c, X86_FEATURE_APIC_MSRS_FENCE);
 }
 
 static void cpu_detect_tlb_hygon(struct cpuinfo_x86 *c)
-- 
2.42.0.rc0.25.ga82fb66fed25

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
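
For readers following the flag flow in the patch above, here is a minimal userspace sketch of it, for illustration only. Everything in it is a hypothetical stand-in and not kernel code: struct cpu_model, model_identify_cpu(), model_weak_wrmsr_fence() and the fences_issued counter are invented for this example. It assumes the ordering shown in the diff (default set in identify_cpu(), cleared by the AMD and Hygon init hooks). The real weak_wrmsr_fence() is patched at boot through alternative() rather than branching at run time, and the real fence is the "mfence; lfence" pair, which this model only counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vendor tag; the kernel uses c->x86_vendor and ->c_init(). */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON };

/* Hypothetical stand-in for struct cpuinfo_x86, reduced to one flag. */
struct cpu_model {
	enum vendor vendor;
	bool apic_msrs_fence;	/* models X86_FEATURE_APIC_MSRS_FENCE */
};

/* Counts how many times the modelled fence would have executed. */
static unsigned long fences_issued;

/*
 * Models the ordering in the patch: set the flag by default, then let
 * the vendor-specific init clear it for AMD and Hygon.
 */
static void model_identify_cpu(struct cpu_model *c)
{
	assert(c != NULL);

	c->apic_msrs_fence = true;		/* set_cpu_cap() default */

	if (c->vendor == VENDOR_AMD || c->vendor == VENDOR_HYGON)
		c->apic_msrs_fence = false;	/* clear_cpu_cap() in init */
}

/*
 * Models weak_wrmsr_fence(): the fence runs only when the flag is set.
 * A run-time branch is used here purely for readability; the kernel
 * patches the instructions out instead.
 */
static void model_weak_wrmsr_fence(const struct cpu_model *c)
{
	assert(c != NULL);

	if (c->apic_msrs_fence)
		fences_issued++;	/* kernel: mfence; lfence */
}
```

Calling model_identify_cpu() on an AMD or Hygon model leaves apic_msrs_fence false, so model_weak_wrmsr_fence() does not bump fences_issued; any other vendor keeps the default and counts one fence per call.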