From: Tony Luck
To: Borislav Petkov
Cc: Yazen Ghannam, Smita.KoralahalliChannabasappa@amd.com,
    dave.hansen@linux.intel.com, x86@kernel.org,
    linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
    patches@lists.linux.dev, Tony Luck
Subject: [PATCH v9 0/3] Handle corrected machine check interrupt storms
Date: Wed, 4 Oct 2023 11:36:20 -0700
Message-ID: <20231004183623.17067-1-tony.luck@intel.com>
In-Reply-To: <20230929181626.210782-1-tony.luck@intel.com>
References: <20230929181626.210782-1-tony.luck@intel.com>

Linux CMCI storm mitigation is a big hammer that just disables the CMCI
interrupt globally and switches to polling all banks.
There are two problems with this:

1) It really is a big hammer. It means that errors reported in other
   banks from different functional units are all subject to the same
   polling delay before being processed.

2) Intel systems signal some uncorrected errors using CMCI (e.g. memory
   controller patrol scrub on Icelake Xeon and newer). Delaying the
   processing of these error reports negates some of the benefit of the
   patrol scrubber providing early notice of errors before they are
   consumed and cause a machine check.

This series throws away the old storm implementation and replaces it
with one that keeps track of the weather on each separate machine check
bank. When a storm is detected from a bank on Intel, the storm is
mitigated by setting a very high threshold for corrected errors to
signal CMCI. This threshold does not affect signaling CMCI for
uncorrected errors.

Signed-off-by: Tony Luck
---

Changes since v8:

Fixed issue reported by lkp with a randconfig build with neither
CONFIG_X86_MCE_INTEL nor CONFIG_X86_MCE_AMD set, by making a cleaner
division between the storm tracking code in threshold.c and the rest of
the code, using more function accessors that can be stubbed out.

Tony Luck (3):
  x86/mce: Remove old CMCI storm mitigation code
  x86/mce: Add per-bank CMCI storm mitigation
  x86/mce: Handle Intel threshold interrupt storms

 arch/x86/kernel/cpu/mce/internal.h  |  48 ++++-
 arch/x86/kernel/cpu/mce/core.c      |  45 ++---
 arch/x86/kernel/cpu/mce/intel.c     | 303 ++++++++++++----------------
 arch/x86/kernel/cpu/mce/threshold.c | 115 +++++++++++
 4 files changed, 304 insertions(+), 207 deletions(-)

base-commit: 6465e260f48790807eef06b583b38ca9789b6072
-- 
2.41.0