Date: Thu, 23 Mar 2023 11:00:55 -0700
From: Tony Luck
To: Yazen Ghannam
Cc: Borislav Petkov, Smita.KoralahalliChannabasappa@amd.com, dave.hansen@linux.intel.com, hpa@zytor.com, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, patches@lists.linux.dev
Subject: Re: [PATCH v3 3/5] x86/mce: Introduce mce_handle_storm() to deal with begin/end of storms
References: <20230317172042.117201-1-tony.luck@intel.com> <20230317172042.117201-4-tony.luck@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 23, 2023 at 11:22:22AM -0400, Yazen Ghannam wrote:
> On Fri, Mar 17, 2023 at 10:20:40AM -0700, Tony Luck wrote:
> > +void mce_intel_handle_storm(int bank, bool on)
> > +{
> > +	if (on)
> > +		cmci_set_threshold(bank, cmci_threshold[bank]);
> > +	else
> > +		cmci_set_threshold(bank, CMCI_STORM_THRESHOLD);
>
> I think these conditions are reversed. When storm handling is 'on' we should
> use CMCI_STORM_THRESHOLD, and when off use the saved bank threshold.
>
> > +}
> > +
> >  static void cmci_storm_begin(int bank)
> >  {
> >  	__set_bit(bank, this_cpu_ptr(mce_poll_banks));
> > @@ -211,13 +219,13 @@ void track_cmci_storm(int bank, u64 status)
> >  		if (history & GENMASK_ULL(STORM_END_POLL_THRESHOLD - 1, 0))
> >  			return;
> >  		pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank);
> > -		cmci_set_threshold(bank, cmci_threshold[bank]);
> > +		mce_handle_storm(bank, true);
>
> Should be 'false' when the storm subsides.
>
> >  		cmci_storm_end(bank);
> >  	} else {
> >  		if (hweight64(history) < STORM_BEGIN_THRESHOLD)
> >  			return;
> >  		pr_notice("CPU%d BANK%d CMCI storm detected\n", smp_processor_id(), bank);
> > -		cmci_set_threshold(bank, CMCI_STORM_THRESHOLD);
> > +		mce_handle_storm(bank, false);
>
> Should be 'true' when the storm starts.
>
> >  		cmci_storm_begin(bank);
> >  	}
> >  }

There's a saying that two wrongs do not make a right (but three lefts do).

My code was working, but only because the second mistake cancelled out
the first.

With both changed as you suggest (diff below), the code still works,
and makes sense too!
Thanks

-Tony

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 74b560476424..c3e1bb790680 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -677,13 +677,13 @@ void track_cmci_storm(int bank, u64 status)
 		if (history & GENMASK_ULL(STORM_END_POLL_THRESHOLD - 1, 0))
 			return;
 		pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank);
-		mce_handle_storm(bank, true);
+		mce_handle_storm(bank, false);
 		cmci_storm_end(bank);
 	} else {
 		if (hweight64(history) < STORM_BEGIN_THRESHOLD)
 			return;
 		pr_notice("CPU%d BANK%d CMCI storm detected\n", smp_processor_id(), bank);
-		mce_handle_storm(bank, false);
+		mce_handle_storm(bank, true);
 		cmci_storm_begin(bank);
 	}
 }
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 6cc9aa97c092..20c2143a68c1 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -134,9 +134,9 @@ static void cmci_set_threshold(int bank, int thresh)
 void mce_intel_handle_storm(int bank, bool on)
 {
 	if (on)
-		cmci_set_threshold(bank, cmci_threshold[bank]);
-	else
 		cmci_set_threshold(bank, CMCI_STORM_THRESHOLD);
+	else
+		cmci_set_threshold(bank, cmci_threshold[bank]);
 }
 
 /*