Date: Mon, 5 Dec 2022 09:38:02 -0600
From: Geoff Blake
To: Robin Murphy
Subject: RE: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better
In-Reply-To: <83d16969-9d23-1dc5-c9dd-03542b43a52e@arm.com>
Message-ID: <2bb86e97-6cef-700e-70ed-4f303da10fd9@amazon.com>
References: <99fd664c-bf59-b8c0-29d0-6eccfc1c8e80@amazon.com> <83d16969-9d23-1dc5-c9dd-03542b43a52e@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> > > > From my perspective, this is a worse solution as now we're sweeping an
> > > > issue under the rug and consuming CPU cycles handling IRQs we should not
> > > > be getting in the first place.
> > > > While an overflow IRQ from the cmn should
> > > > not be high frequency, there is a non-zero chance in the future it could
> > > > be and this could lead to a very hard to debug performance issue instead
> > > > of the current problem, which is discovering we need to clean up better
> > > > from a noisy kernel message.
> > >
> > > Kexec is not the only possible source of spurious IRQs. If they cause a
> > > problem for this driver, that cannot be robustly addressed by trying to
> > > rely on whatever software might happen to run before this driver.
> >
> > Sure, I can agree with the assertion a spurious IRQ could come from
> > anywhere, in that case though, shouldn't the behavior still be to log
> > spurious IRQs as a warning instead of silently sinking them?
>
> We still have to handle the interrupt anyway to avoid it getting
> disabled behind our back, and beyond that it's not really something
> that's actionable by the user. What would we say?
>
> 	dev_warn(dev, "Something harmless, and in some cases expected,
> happened! If you've just rebooted after a kernel panic, maybe try having
> the kernel not panic?");
>
> Perhaps that should be a core IRQ helper so that many other drivers can
> also call it too?
>
> Furthermore if you're worried about performance implications from a
> theoretical interrupt storm, I can tell you from experience that logging
> to a serial console from a high-frequency interrupt handler is one of
> the best ways to cripple a system to the point where reaching for the
> power switch is the only option.

Logging unexpected events is necessary to give clues about what is going
wrong before things implode on fully remote machines. If you prefer to
handle the IRQ here rather than in the bad_irq section, then can we at
least have a WARN_ON() in the case where a spurious IRQ happens but no
overflow bit is set?

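For illustration only, a minimal user-space sketch of that check might look
like the following. It is a model, not the arm-cmn driver: struct model_dtc,
model_handle_irq() and the MODEL_IRQ_* values are hypothetical names. The
one-shot flag stands in for what the kernel's WARN_ON_ONCE() would do;
warning only once is an assumption here, chosen so that an interrupt storm
cannot flood the console.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for per-DTC state; not the driver's real struct. */
struct model_dtc {
	uint32_t overflow_status;	/* models the PMU overflow status bits */
	unsigned long spurious;		/* IRQs seen with no overflow bit set */
	bool warned;			/* set once the one-shot warning has fired */
};

/* Hypothetical return codes, loosely modelled on the kernel's irqreturn_t. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

static enum model_irqreturn model_handle_irq(struct model_dtc *dtc)
{
	assert(dtc != NULL);

	if (dtc->overflow_status == 0) {
		/* Spurious: nothing overflowed. Count it and warn only once. */
		dtc->spurious++;
		if (!dtc->warned) {
			dtc->warned = true;
			fprintf(stderr, "model-cmn: IRQ with no overflow bit set\n");
		}
		/* Still report it as handled so the line is not disabled. */
		return MODEL_IRQ_HANDLED;
	}

	/* Genuine overflow: acknowledge by clearing the status bits. */
	dtc->overflow_status = 0;
	return MODEL_IRQ_HANDLED;
}
```

In this model a second spurious IRQ bumps the counter but prints nothing, so
the event stays visible in the log without adding per-interrupt console cost.
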
> The DTC_CTL documentation seems fairly unambiguous:
>
>   [0]  dt_en  Enables debug, trace, and PMU features
>
> The design intent is that the PMU counters do not count when the entire
> PMU feature is disabled. I'm pretty sure I did confirm that empirically
> during development too (I recall the sheer number of different "enable"
> bits baffled me at the beginning, and there was actually one that did
> nothing, which I think did eventually get removed from the documentation).
>
> Of course clearing PMCR_PMU_EN is sufficient to simply stop counting,
> which we also depend on for correct operation, but I believe clearing
> DT_EN allows it to put all of the DT logic into a quiescent state.

I took the other patch, which only writes 0 to DTC_CTL.dt_en, and ran it
in a kexec loop with the PMU active for a few hours. I did not see any
more spurious IRQs (whereas with the stock driver I could reproduce them
in under 10 tries).

You are correct, Robin: that is all that is needed, and my code was overly
cautious.

- Geoff