Date: Mon, 27 Nov 2023 11:50:26 -0800
From: Tony Luck
To: Borislav Petkov
Cc: Yazen Ghannam, Smita.KoralahalliChannabasappa@amd.com,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev
Subject: Re: [PATCH v9 2/3] x86/mce: Add per-bank CMCI storm mitigation
References: <20230929181626.210782-1-tony.luck@intel.com>
	<20231004183623.17067-1-tony.luck@intel.com>
	<20231004183623.17067-3-tony.luck@intel.com>
	<20231019151211.GHZTFHS3osBIL1IJbF@fat_crate.local>
	<20231114192324.GAZVPJLGZmfJBS181/@fat_crate.local>
	<20231121115448.GCZVyaiNkNvb4t2NxB@fat_crate.local>
In-Reply-To: <20231121115448.GCZVyaiNkNvb4t2NxB@fat_crate.local>

On Tue, Nov 21, 2023 at 12:54:48PM +0100, Borislav Petkov wrote:
> On Tue, Nov 14, 2023 at 02:04:46PM -0800, Tony Luck wrote:
> > Whichever of the timer and the CMCI happens first will run. Second to
> > arrive will pend the interrupt and be handled when interrupts are
> > enabled as the first completes.
>
> So I still don't like the timer calling machine_check_poll() and
> cmci_mc_poll_banks() doing the same without any proper synchronization
> between the two.

But it isn't doing the same thing. The timer calls:

	machine_check_poll(0, this_cpu_ptr(&mce_poll_banks));

and cmci_mc_poll_banks() calls:

	machine_check_poll(0, this_cpu_ptr(&mce_banks_owned));

A bank is either in the bitmap of banks to poll from the timer, or in
one of the per-cpu bitmaps of banks "owned" by that CPU to be checked
when a CMCI occurs. But it can't be in both.

> Yes, when you get a CMCI interrupt, you poll and do the call the storm
> code. Now what happens if the polling runs from softirq context and you
> get a CMCI interrupt at exactly the same time. I.e., is
> machine_check_poll() reentrant and audited properly?

So nothing bad happens. If Linux was polling some set of banks from the
timer and is interrupted by CMCI, the interrupt will check some disjoint
set of banks. All the history tracking code is done per-bank, so there
is no overlap.

> I hope I'm making more sense.

Yes. Totally making sense. I was under the mistaken impression that
the mce timers used TIMER_IRQSAFE and the nested CMCI while processing
a timed poll couldn't happen. So I learned something here too.

I'll think of some comment to add to the history tracking code to
summarize this thread.

-Tony
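
As an illustration of the disjoint-bitmap argument in the reply above,
here is a minimal user-space sketch in C. It is not the kernel code:
struct cpu_banks, claim_bank(), poll_mask(), bank_polls[] and NUM_BANKS
are hypothetical names invented for this example. They stand in for the
per-CPU timer-poll bitmap, the per-CPU "owned" bitmap, and the per-bank
history tracking discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS 32	/* hypothetical bank count for this sketch */

/* Hypothetical stand-ins for the two per-CPU bitmaps in the thread. */
struct cpu_banks {
	uint32_t poll_banks;	/* banks checked from the timer */
	uint32_t owned_banks;	/* banks checked when a CMCI arrives */
};

/* Per-bank history: one slot per bank, so no slot is shared. */
static unsigned int bank_polls[NUM_BANKS];

/*
 * Move a bank to CMCI ownership. It is removed from the timer bitmap
 * in the same step, so it is never present in both bitmaps.
 */
static void claim_bank(struct cpu_banks *c, int bank)
{
	assert(bank >= 0 && bank < NUM_BANKS);
	c->poll_banks &= ~(UINT32_C(1) << bank);
	c->owned_banks |= UINT32_C(1) << bank;
}

/*
 * Visit every bank set in the mask and update only that bank's
 * history slot. Returns the number of banks visited.
 */
static int poll_mask(uint32_t mask)
{
	int visited = 0;

	for (int bank = 0; bank < NUM_BANKS; bank++) {
		if (mask & (UINT32_C(1) << bank)) {
			bank_polls[bank]++;
			visited++;
		}
	}
	return visited;
}
```

Starting from a poll bitmap of 0xff and claiming banks 3 and 5,
poll_mask() over the poll bitmap visits six banks and over the owned
bitmap visits two. Because the two masks never share a bit, a "CMCI"
pass nested inside a "timer" pass would never touch the same
bank_polls[] slot.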