Received: by 2002:ac0:da4c:0:0:0:0:0 with SMTP id a12csp457435imi; Fri, 22 Jul 2022 02:48:42 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vMK2LFdFVEBU+gfWyiN2zrgvZTdpOezmD6DQuFlkaSxbpBMkHDRMqz5IvsrEpg0YVIOWhr X-Received: by 2002:a17:907:7b9c:b0:72b:5652:a14a with SMTP id ne28-20020a1709077b9c00b0072b5652a14amr2480291ejc.86.1658483322592; Fri, 22 Jul 2022 02:48:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658483322; cv=none; d=google.com; s=arc-20160816; b=FIIu88M9Us8f9Q9k3RMmA0sI3x4bkF5oSKUmBU/A+e4Ch2fvB4rRGNTCPZpEyji+4W oBeNmiZYt90GtC6hiVEs1rt7ZEfEQRHjFeCF3Q3G4o5Uxh+AaFsCE96UkgN+KqQzqkvN PdWeOGY8hHTkBHEvSZ9DvC3iLIhHh9tVwaHIp1f2bWwiZWJi1v9k23IscNAXjPOZss0D NcqOuFm2NQDZgaawjSZ8sb+aqdVJMcy4JstzQ779CkSKLMu+fKoAIVsw0VVTUknEi3i0 xuOLMECy4RuU0gd/r+igfIN41KoITXEmFlzdQgg6MQ3lK0Tm8ybjjLkTGMXSKlh6WijD NKbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=M+Q3Thke6nxEUqtEllaos2Bsy5HgMmtrGYO/APH5EV8=; b=JgVkU5P7JyadAjF5u3fYNyx8pDfjjMEm+E5VvVLc6dI35sylZ/CYKSvTTRyearuEnp 5GHCy64X4F5Bzbbu1NMSLKpTKO4+FFzDkxgKT706pyUBih4xfQyzX8fiepLUr4zuFRw6 djM90lX2xxYw/TCgT98pE2Pn1r4JObnhlFqDVvtLkcVrMDTqHONWBsq2tDzhE7IX25S6 a2k4sBRd5lnIdWb0hM8tx4xCVdyu1TKFwjbwOePXlDVkYr3ih3lT5ZxUumV8fudqzEYe 3oDhbprKhkhyYoDXEnTq3U75t/mxBSE5Z/FOT4N7BskGlox4x4QWnQisV98OlTsyimgX WTlw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=YSi5ZHQT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h11-20020a056402280b00b0043bb917b3fesi6504405ede.243.2022.07.22.02.48.18; Fri, 22 Jul 2022 02:48:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=YSi5ZHQT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235405AbiGVJOV (ORCPT + 99 others); Fri, 22 Jul 2022 05:14:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235086AbiGVJNY (ORCPT ); Fri, 22 Jul 2022 05:13:24 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 558E6A7200; Fri, 22 Jul 2022 02:10:42 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9923761E86; Fri, 22 Jul 2022 09:10:41 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A2BFDC341C6; Fri, 22 Jul 2022 09:10:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1658481041; bh=seRPMTj96YlQMAcbki8HQXrzTJgGhYbVsCfNyj/pMR4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YSi5ZHQTAz9iiM5h8HRvXj3HZzVeRps4ng+c5DK+II5p0o/HXo9db6Nr2M3xoQth7 rxBJJIw4bw3DpL2zvvKzJeqZiiqinT2QUa4sA+LKCNin0AmwVuL/YaF26rJvoCzd1R R8i/xfP8AeOzdifNFERQcGsSE1SbKulJYJk/tvhI= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Tim Chen , "Peter Zijlstra (Intel)" , Borislav Petkov , Josh Poimboeuf , Thadeu Lima de Souza Cascardo Subject: [PATCH 5.18 33/70] intel_idle: Disable IBRS during long idle Date: Fri, 22 Jul 2022 11:07:28 +0200 Message-Id: <20220722090652.573428452@linuxfoundation.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220722090650.665513668@linuxfoundation.org> References: <20220722090650.665513668@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Peter Zijlstra commit bf5835bcdb9635c97f85120dba9bfa21e111130f upstream. Having IBRS enabled while the SMT sibling is idle unnecessarily slows down the running sibling. OTOH, disabling IBRS around idle takes two MSR writes, which will increase the idle latency. Therefore, only disable IBRS around deeper idle states. Shallow idle states are bounded by the tick in duration, since NOHZ is not allowed for them by virtue of their short target residency. Only do this for mwait-driven idle, since that keeps interrupts disabled across idle, which makes disabling IBRS vs IRQ-entry a non-issue. Note: C6 is a random threshold, most importantly C1 probably shouldn't disable IBRS, benchmarking needed. Suggested-by: Tim Chen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov Reviewed-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Signed-off-by: Thadeu Lima de Souza Cascardo Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/nospec-branch.h | 1 arch/x86/kernel/cpu/bugs.c | 6 ++++ drivers/idle/intel_idle.c | 44 ++++++++++++++++++++++++++++++----- 3 files changed, 45 insertions(+), 6 deletions(-) --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -255,6 +255,7 @@ static inline void indirect_branch_predi /* The Intel SPEC CTRL MSR base value cache */ extern u64 x86_spec_ctrl_base; extern void write_spec_ctrl_current(u64 val, bool force); +extern u64 spec_ctrl_current(void); /* * With retpoline, we must use IBRS to restrict branch prediction --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -79,6 +79,12 @@ void write_spec_ctrl_current(u64 val, bo wrmsrl(MSR_IA32_SPEC_CTRL, val); } +u64 spec_ctrl_current(void) +{ + return this_cpu_read(x86_spec_ctrl_current); +} +EXPORT_SYMBOL_GPL(spec_ctrl_current); + /* * The vendor and possibly platform specific bits which can be modified in * x86_spec_ctrl_base. --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -47,11 +47,13 @@ #include #include #include +#include #include #include #include #include #include +#include #include #include @@ -106,6 +108,12 @@ static unsigned int mwait_substates __in #define CPUIDLE_FLAG_ALWAYS_ENABLE BIT(15) /* + * Disable IBRS across idle (when KERNEL_IBRS), is exclusive vs IRQ_ENABLE + * above. + */ +#define CPUIDLE_FLAG_IBRS BIT(16) + +/* * MWAIT takes an 8-bit "hint" in EAX "suggesting" * the C-state (top nibble) and sub-state (bottom nibble) * 0x00 means "MWAIT(C1)", 0x10 means "MWAIT(C2)" etc. @@ -159,6 +167,24 @@ static __cpuidle int intel_idle_irq(stru return ret; } +static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + bool smt_active = sched_smt_active(); + u64 spec_ctrl = spec_ctrl_current(); + int ret; + + if (smt_active) + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + + ret = __intel_idle(dev, drv, index); + + if (smt_active) + wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl); + + return ret; +} + /** * intel_idle_s2idle - Ask the processor to enter the given idle state. * @dev: cpuidle device of the target CPU. @@ -680,7 +706,7 @@ static struct cpuidle_state skl_cstates[ { .name = "C6", .desc = "MWAIT 0x20", - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 85, .target_residency = 200, .enter = &intel_idle, @@ -688,7 +714,7 @@ static struct cpuidle_state skl_cstates[ { .name = "C7s", .desc = "MWAIT 0x33", - .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 124, .target_residency = 800, .enter = &intel_idle, @@ -696,7 +722,7 @@ static struct cpuidle_state skl_cstates[ { .name = "C8", .desc = "MWAIT 0x40", - .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 200, .target_residency = 800, .enter = &intel_idle, @@ -704,7 +730,7 @@ static struct cpuidle_state skl_cstates[ { .name = "C9", .desc = "MWAIT 0x50", - .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 480, .target_residency = 5000, .enter = &intel_idle, @@ -712,7 +738,7 @@ static struct cpuidle_state skl_cstates[ { .name = "C10", .desc = "MWAIT 0x60", - .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 890, .target_residency = 5000, .enter = &intel_idle, @@ -741,7 +767,7 @@ static struct cpuidle_state skx_cstates[ { .name = "C6", .desc = "MWAIT 0x20", - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 133, .target_residency = 600, .enter = &intel_idle, @@ -1686,6 +1712,12 @@ static void __init intel_idle_init_cstat if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE) drv->states[drv->state_count].enter = intel_idle_irq; + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) && + cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IBRS) { + WARN_ON_ONCE(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE); + drv->states[drv->state_count].enter = intel_idle_ibrs; + } + if ((disabled_states_mask & BIT(drv->state_count)) || ((icpu->use_acpi || force_use_acpi) && intel_idle_off_by_default(mwait_hint) &&