Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp3078931rwd; Fri, 16 Jun 2023 11:55:34 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4kd2SglExamBLB8Xw/4dE5+c6NfeiRCcwORpdGHK14LcImBSImBXxz3JFd7EikYbM7k9N7 X-Received: by 2002:a05:6808:8ef:b0:398:1186:88b9 with SMTP id d15-20020a05680808ef00b00398118688b9mr2761819oic.30.1686941733785; Fri, 16 Jun 2023 11:55:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686941733; cv=none; d=google.com; s=arc-20160816; b=w5mIJfRdHTBbxXd96cbwi2YTnaDtLNkeMR+DKzhm6ZM6+cwXQK87GDKlgwVl66Fz61 61ORQl28I3ANNFAF2U52M92bCPWnybdCXjaLydjOiXK6HfcLRJf7JVx8cGfHfXnOCRuU 4OilVowgH6rEkZtY/YQ04Ss813rRgLqNw65I3TEDnf7QfXDh4EjL8INXnPdqIXcU/iuU 8Q9exDu0wXoEQzUfBW1Hwa3Vs8IYTqBcHIjCkxM/iuaKufSJ4TiJiNRTBnKOAMUUkvmD MUTg6IKuG0QaVink+lxSBA2MzH6FeRQlV5lJKHjLPeiJ/wgIs5Tzp3hKVwurxzSzSfc1 d2xg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=WRyYdkfyfB/9ORwqKdXXDwPQP99PSDaWw5+88bchAgI=; b=gPB/cXGqBeQjFmOiodMbt+yEYpAXDoS2OvnCS5yTZIu2PFLsHl1YrmC0xWEfhXU9fe o4UJDAywDKN+etR9l9b4m5xS56kMcJg1GYlCBJc4NvubKwYcY5cwG1npjBHuM5CjDbbo zR3C+W0nOB+74n5XCfQOxZAE+wMjhkZNqXIXUa0yQIUYYh1kX3UMxIc69AVAtwURRUfj uqWqalzteIGIbW+st1m4Mu6Ijx7oysBmAG+igRKCKw6UW49Va7YsEXqrKalBvMv9WHwB ubmoNb9fl+77i9p8xGc52kry2bMQRlU0rf6LAsiHSh18XQTqA9hWrK6R/rQFEk1/U0gu dDew== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=VA5DXYVG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i4-20020a63e444000000b00551df478a00si3175132pgk.137.2023.06.16.11.55.19; Fri, 16 Jun 2023 11:55:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=VA5DXYVG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344589AbjFPS2E (ORCPT + 99 others); Fri, 16 Jun 2023 14:28:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229705AbjFPS1z (ORCPT ); Fri, 16 Jun 2023 14:27:55 -0400 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6EBFD35AB; Fri, 16 Jun 2023 11:27:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686940074; x=1718476074; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zxtEr+49Z8SAPXXw+VLSmKO/k0+mLZ8hYyIUwukrFTo=; b=VA5DXYVGxNPxlswh6cAFi17GZfVvbKo+t3G1CbZJHCYyx72uwd5zhFAP tx/TWcf7ry4AIHAFYHj6cJrunAg1Mk6lphQQDP5ESFU6DtRMfusrxi+sC R4ltbnhphJryIIepbe+yaoUgzhh835HyCF/VqjaMvNOBBDam2e48DYfre 4mxMFq9j5AomJaTcN9TUWk1/lHP9BMyVd4Kl6v1tEzMNV63cVP/SI+gT7 TLG6ki9ADlvsJh3ZxOpjMwpUblAcidD//olpHuPtAizEqtdQNcfH6PR5S a3jHXotkC5YNDgQKqHijmcRsj2LW+uI4IwHoDb8JahNZFD4/DzA82DOMN A==; X-IronPort-AV: E=McAfee;i="6600,9927,10743"; a="361815197" X-IronPort-AV: E=Sophos;i="6.00,248,1681196400"; d="scan'208";a="361815197" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jun 2023 11:27:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10743"; a="783018175" X-IronPort-AV: E=Sophos;i="6.00,248,1681196400"; d="scan'208";a="783018175" Received: from agluck-desk3.sc.intel.com ([172.25.222.74]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jun 2023 11:27:51 -0700 From: Tony Luck To: Borislav Petkov Cc: Yazen Ghannam , Smita.KoralahalliChannabasappa@amd.com, dave.hansen@linux.intel.com, x86@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Tony Luck Subject: [PATCH v6 3/4] x86/mce: Handle AMD threshold interrupt storms Date: Fri, 16 Jun 2023 11:27:43 -0700 Message-Id: <20230616182744.17632-4-tony.luck@intel.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230616182744.17632-1-tony.luck@intel.com> References: <20230411173841.70491-1-tony.luck@intel.com> <20230616182744.17632-1-tony.luck@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Smita Koralahalli Add hook into core storm handling code for AMD threshold interrupts. Disable the interrupt on the corresponding CPU and bank. Re-enable back the interrupts if enough consecutive polls of the bank show no corrected errors. Turning off the threshold interrupts is the best solution on AMD systems as other error severities will still be handled even if the threshold interrupts are disabled. [Tony: Updated places where storm tracking variables moved into a structure] Signed-off-by: Smita Koralahalli Signed-off-by: Tony Luck --- arch/x86/kernel/cpu/mce/internal.h | 2 ++ arch/x86/kernel/cpu/mce/amd.c | 49 ++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/mce/core.c | 3 ++ 3 files changed, 54 insertions(+) diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h index eae88a824d97..22899d28138f 100644 --- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -232,6 +232,7 @@ extern bool filter_mce(struct mce *m); #ifdef CONFIG_X86_MCE_AMD extern bool amd_filter_mce(struct mce *m); +void mce_amd_handle_storm(int bank, bool on); /* * If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits @@ -259,6 +260,7 @@ static __always_inline void smca_extract_err_addr(struct mce *m) #else static inline bool amd_filter_mce(struct mce *m) { return false; } +static inline void mce_amd_handle_storm(int bank, bool on) {} static inline void smca_extract_err_addr(struct mce *m) { } #endif diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 0b971f974096..b19f3eb70187 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -468,6 +468,47 @@ static void threshold_restart_bank(void *_tr) wrmsr(tr->b->address, lo, hi); } +static void _reset_block(struct threshold_block *block) +{ + struct thresh_restart tr; + + memset(&tr, 0, sizeof(tr)); + tr.b = block; + threshold_restart_bank(&tr); +} + +static void toggle_interrupt_reset_block(struct threshold_block *block, bool on) +{ + if (!block) + return; + + block->interrupt_enable = !!on; + _reset_block(block); +} + +void mce_amd_handle_storm(int bank, bool on) +{ + struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL; + struct threshold_bank **bp = this_cpu_read(threshold_banks); + unsigned long flags; + + if (!bp) + return; + + local_irq_save(flags); + + first_block = bp[bank]->blocks; + if (!first_block) + goto end; + + toggle_interrupt_reset_block(first_block, on); + + list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj) + toggle_interrupt_reset_block(block, on); +end: + local_irq_restore(flags); +} + static void mce_threshold_block_init(struct threshold_block *b, int offset) { struct thresh_restart tr = { @@ -868,6 +909,7 @@ static void amd_threshold_interrupt(void) struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL; struct threshold_bank **bp = this_cpu_read(threshold_banks); unsigned int bank, cpu = smp_processor_id(); + u64 status; /* * Validate that the threshold bank has been initialized already. The @@ -881,6 +923,13 @@ static void amd_threshold_interrupt(void) if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank))) continue; + rdmsrl(mca_msr_reg(bank, MCA_STATUS), status); + track_cmci_storm(bank, status); + + /* Return early on an interrupt storm */ + if (this_cpu_read(storm_desc.bank_storm[bank])) + return; + first_block = bp[bank]->blocks; if (!first_block) continue; diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index cd9d9ea5bb0a..d4c9dc194d56 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -2055,6 +2055,9 @@ static void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) void mce_handle_storm(int bank, bool on) { switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_AMD: + mce_amd_handle_storm(bank, on); + break; } } -- 2.40.1