Date: Wed, 30 Nov 2022 17:13:27 -0600
From: Geoff Blake
To: Robin Murphy
Subject: RE: [PATCH 1/2] perf/arm-cmn: Cope with spurious IRQs better
Message-ID: <99fd664c-bf59-b8c0-29d0-6eccfc1c8e80@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> > From my perspective, this is a worse solution as now we're sweeping an
> > issue under the rug and consuming CPU cycles handling IRQs we should not
> > be getting in the first place.
> > While an overflow IRQ from the cmn should
> > not be high frequency, there is a non-zero chance in the future it could
> > be and this could lead to a very hard to debug performance issue instead
> > of the current problem, which is discovering we need to clean up better
> > from a noisy kernel message.
>
> Kexec is not the only possible source of spurious IRQs. If they cause a
> problem for this driver, that cannot be robustly addressed by trying to
> rely on whatever software might happen to run before this driver.

Sure, I can agree with the assertion that a spurious IRQ could come from
anywhere. In that case, though, shouldn't the behavior still be to log
spurious IRQs as a warning instead of silently sinking them?

> > The driver as best I can grok currently is optimized to limit the amount
> > of register writes for the common use-case, which is setting and unsetting
> > events, so all the wiring for the PMU to feed events to the DTC is done up
> > front on load: DTC_CTL's DT_EN bit is set immediately during probe, as is
> > OVFL_INTR_EN. All the DN states and DTM PMU_CONFIG_PMU_EN is deferred
> > for when an event is actually set, and here we go through all of them
> > anyways for each event unless its bynodeid, so the expense of setting
> > events grows linearly with the mesh size anyways.
>
> If arm_cmn_init_dtc() writing 0 to PMCR didn't stop the PMU then we've
> got bigger problems, because that's how we expect to start and stop it
> in normal operation. I'm not ruling out that some subtle bug in that
> regard might exist, since I've still not yet had a chance to reproduce
> and observe this behaviour on my board, but I've also not seen
> sufficient evidence to suggest that that is the case either.
> (Now that I'm looking closely, I think there *is* actually a small
> oversight for the DTMs, but that would lead to different symptoms than
> you reported)
> At least the writes to PMOVSR_CLR *did* clearly work, because you're
> seeing the "nobody cared" message from the IRQ core rather than the
> WARN_ON(!dtc->counters[i]) which would happen if a fresh overflow was
> actually asserted. Currently I would expect to see up to 4 of those
> messages since there can be up to 4 IRQs, but once those are all
> requested, enabled, and "handled", all the spurious initially-latched
> state should be cleared and any *new* overflows will be indicated in
> PMOVSR. I don't see how a single IRQ could ever be unhandled more than
> once anyway, if the first time disables it.

I do see 4 of these "nobody cared" messages every time I've reproduced
it, but saw no need to copy and paste all of them into the original post.

Looking back over the code, I see more clearly why you assert that we only
need to clear the DT_EN bit: the PMU is off at the DTC with PMCR set to 0
on init. That is really hard to see, though, with bits of configuration
being done in so many different places, and it is still not easy to verify
that unsetting that one bit is sufficient to avoid odd corner cases.

Is there any argument against me taking another pass and trying to
separate discovery from a shared reset/initialization code path?

-Geoff
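
To make the "log spurious IRQs instead of silently sinking them" suggestion
concrete, here is a minimal userspace sketch. It is not the arm-cmn driver
code and not the patch under discussion: struct model_dtc, model_handle_irq(),
NUM_COUNTERS and WARN_BUDGET are all hypothetical names, and a plain struct
field stands in for the overflow status register. The idea it illustrates is
to claim an interrupt that arrives with nothing latched, but to count it and
print a bounded number of warnings so the event is still visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_COUNTERS 8  /* hypothetical: counters per modelled DTC */
#define WARN_BUDGET  4  /* hypothetical: warnings printed before going quiet */

enum irq_result { IRQ_RESULT_NONE, IRQ_RESULT_HANDLED };

/* Hypothetical model of one DTC; none of these fields are driver names. */
struct model_dtc {
    uint32_t overflow_status;          /* stands in for an overflow status register */
    bool counter_in_use[NUM_COUNTERS]; /* true if an event owns the counter */
    unsigned int spurious_seen;        /* total IRQs seen with no status latched */
    unsigned int stray_overflows;      /* overflows on counters nobody owns */
    unsigned int warnings_printed;     /* warnings emitted so far */
};

static enum irq_result model_handle_irq(struct model_dtc *dtc)
{
    assert(dtc != NULL);

    uint32_t status = dtc->overflow_status;

    if (status == 0) {
        /* Nothing latched: treat as spurious, but leave a bounded trace. */
        dtc->spurious_seen++;
        if (dtc->warnings_printed < WARN_BUDGET) {
            dtc->warnings_printed++;
            fprintf(stderr, "model_dtc: spurious IRQ (%u so far)\n",
                    dtc->spurious_seen);
        }
        return IRQ_RESULT_HANDLED;
    }

    for (int i = 0; i < NUM_COUNTERS; i++) {
        /* An overflow on a counter with no owner is counted separately. */
        if ((status & (1u << i)) && !dtc->counter_in_use[i])
            dtc->stray_overflows++;
    }

    dtc->overflow_status = 0; /* models clearing the latched overflow bits */
    return IRQ_RESULT_HANDLED;
}
```

Capping the number of warnings is one way to keep the log useful without
letting a high-frequency source flood it; the running count in spurious_seen
remains available even after the warnings stop.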