Date: Wed, 29 Nov 2023 16:35:01 +0000
From: Mark Rutland
To: "Ashley, William"
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, will@kernel.org
Subject: Re: armv8pmu: Pending overflow interrupt is discarded when perf event is disabled

Hi William,

On Mon, Nov 20, 2023 at 10:32:10PM +0000, Ashley, William wrote:
> Adding linux-arm-kernel@lists.infradead.org and linux-kernel@vger.kernel.org,
> sorry for the noise.

Thanks for this!

For the benefit of others, the original mail (and attachment) can be found at:

  https://lore.kernel.org/linux-perf-users/950001BD-490C-4BAC-8EEA-CDB9F7C4ADFC@amazon.com/

For some reason, links and whitespace seem to have been mangled in the
resend; I'm not sure what happened there.

I've added Will Deacon, as Will and I co-maintain the ARM PMU drivers.

> On 11/20/23, 12:36 PM, "Ashley, William" wrote:
>
> An issue [1] was opened in the rr-debugger project reporting occasional missed
> perf event overflow signals on arm64. I've been digging into this and think I
> understand what's happening, but wanted to confirm my understanding.
>
> The attached example application, derived from an rr-debugger test case, reports
> when the value of a counter doesn't increase by the expected period +/- some
> tolerance. When it is ping-ponged between cores (e.g. with taskset) at a high
> frequency, it frequently reports increases of ~2x the expected. I've confirmed
> this same behavior on kernels 5.4, 5.10, 6.1 and 6.5.
>
> I found armv8pmu_disable_intens [2] that is called as part of event
> de-scheduling and contains
>   /* Clear the overflow flag in case an interrupt is pending. */
>   write_pmovsclr(mask);
> which results in any pending overflow interrupt being dropped. I added some
> debug output here and indeed there is a correlation of this bit being high at
> the point of the above code and the reproducer identifying a missed signal.

I think you're right that if we had an overflow asserted at this point, we'll
throw away the occurrence of the overflow (and so not call
perf_event_overflow() and generate a sample, etc).

It looks like we only lose the occurrence of the overflow; the actual counts
will get sampled correctly and when we next reprogram the event,
armpmu_event_set_period() should set up the next overflow period.

> This behavior does not occur with pseudo-NMIs (irqchip.gicv3_pseudo_nmi=1)
> enabled.

That's interesting, because it implies that the PMU overflow interrupt is
being recognised by the CPU while regular interrupts are disabled.

There are some narrow races where that could occur (e.g. taking a timer or
scheduler IRQ *just* as an overflow occurs), and some other cases I'd expect
RR to avoid by construction (e.g. if RR isn't using mode exclusion and also
counts kernel events). It's also worth noting that this means there are races
even with pseudo-NMI where overflows could be lost.

How often do you see the overflow being lost?

Does RR set any of the perf_event_attr::exclude_* bits? If not, does RR
intentionally count events that occur within the kernel?

> When an event is not being explicitly torn down (e.g. being closed), this seems
> like an undesirable behavior.

I agree it's undesirable, though my understanding is that for most other users
this isn't any worse than losing samples for other reasons (e.g. the perf ring
buffer being full), and for the general case of sample-based profiling, losing
a sample every so often isn't the end of the world.
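The sequence above can be illustrated with a small user-space C model. This is a sketch under stated assumptions, not kernel code. The struct and every model_* function are hypothetical. The counter is assumed to be 32 bits wide, and the period of 1000 is arbitrary.

model_set_period() and model_update() only approximate what armpmu_event_set_period() and armpmu_event_update() appear to do: load -left into the counter, then fold the masked delta into the count. The model omits details such as clamping and multiple counters. Clearing the flag in model_sched_out() stands in for write_pmovsclr(mask).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit PMU counter plus perf bookkeeping. */
struct model_event {
    uint32_t counter;     /* hardware counter value */
    bool overflow;        /* latched overflow flag (one PMOVS bit) */
    uint32_t prev;        /* counter value at last update, like prev_count */
    int64_t period;       /* sample period */
    int64_t period_left;  /* events remaining until the next sample */
    uint64_t count;       /* accumulated event count */
};

/* Simplified armpmu_event_set_period(): load -left so the counter
 * wraps to zero after `left` more events. */
static void model_set_period(struct model_event *e)
{
    int64_t left = e->period_left;

    if (left <= -e->period)
        left = e->period;
    else if (left <= 0)
        left += e->period;
    e->period_left = left;
    e->prev = (uint32_t)-left;
    e->counter = e->prev;
}

/* Simplified armpmu_event_update(): fold the 32-bit delta into the count. */
static void model_update(struct model_event *e)
{
    uint32_t delta = e->counter - e->prev; /* modular 32-bit arithmetic */

    e->prev = e->counter;
    e->count += delta;
    e->period_left -= delta;
}

/* Advance the counter by n events, latching the flag when it wraps. */
static void model_count(struct model_event *e, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        if (++e->counter == 0)
            e->overflow = true;
}

/* Overflow IRQ: a sample is only generated if the flag is still set. */
static bool model_irq(struct model_event *e)
{
    if (!e->overflow)
        return false;
    e->overflow = false;
    model_update(e);
    model_set_period(e);
    return true;
}

/* Deschedule: clearing the flag stands in for write_pmovsclr(mask). */
static void model_sched_out(struct model_event *e)
{
    e->overflow = false;
    model_update(e);
}

/* One normal overflow, one overflow still pending at deschedule, then one
 * more after rescheduling. Records the count at each delivered sample. */
static size_t model_run(uint64_t samples[3], uint64_t *final_count)
{
    struct model_event e = { .period = 1000, .period_left = 1000 };
    size_t n = 0;

    model_set_period(&e);       /* schedule in */
    model_count(&e, 1000);
    if (model_irq(&e))
        samples[n++] = e.count;

    model_count(&e, 1000);      /* overflows while IRQs are masked */
    model_sched_out(&e);        /* pending overflow is discarded */
    model_set_period(&e);       /* schedule back in */

    model_count(&e, 1000);
    if (model_irq(&e))
        samples[n++] = e.count;

    *final_count = e.count;
    return n;
}
```

In this model the two delivered samples land at counts 1000 and 3000. The overflow at 2000 is discarded, so a consumer such as the rr test sees a gap of 2000 events, about twice the period. The final count of 3000 is still exact. This is consistent with both the ~2x increases reported and the observation that only the occurrence of the overflow is lost.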
> I haven't attempted to demo it yet, but I suspect
> an application disabling an event temporarily could occasionally see the same
> missed overflow signals. Is my understanding here correct?

That sounds right to me, though I haven't checked that end-to-end yet.

> Does anyone have thoughts on how this could be addressed without creating
> other issues?

We should be able to detect overflow from the counter value alone, so we might
be able to account for that when we actually read the event out, or when we
schedule it back in and reprogram the period.

I'm not sure if we can reasonably do that when scheduling the event out, since
if we're switching tasks, that'll queue up IRQ work which will be triggered in
the context of the next task.

We might be able to figure out that we have an overflow when we schedule the
event in under armpmu_start(), but I'll need to go digging to see if there are
any free-running counters as the comment above the call to
armpmu_event_set_period() says, or whether that's a historical artifact.

I suspect we might need to restructure the code somewhat to be able to catch
overflows more consistently. I'll see if I can come up with something, but we
might not be able to guarantee this in all cases.

Mark.

> [1] https://github.com/rr-debugger/rr/issues/3607
> [2] https://github.com/torvalds/linux/blob/c42d9eeef8e5ba9292eda36fd8e3c11f35ee065c/drivers/perf/arm_pmuv3.c#L652C20-L652C43
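Detecting an overflow from the counter value alone, as suggested above, could look roughly like the following user-space C sketch. It is not a proposed patch, and it rests on three assumptions:

- The counter was loaded with (-left) truncated to the counter width, which is our reading of armpmu_event_set_period().
- left is greater than zero.
- Fewer than 2^width events have elapsed since programming.

The helpers counter_mask(), program_value() and counter_overflowed() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask covering a counter of `width` bits (1..64). */
static uint64_t counter_mask(unsigned width)
{
    return width >= 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* Value loaded into the counter so that it wraps after `left` events:
 * (-left) truncated to the counter width. */
static uint64_t program_value(uint64_t left, unsigned width)
{
    return (UINT64_C(0) - left) & counter_mask(width);
}

/* Hypothetical check: has the counter passed its overflow point since it
 * was programmed? Uses only counter values, not the overflow flag, so it
 * still works if the flag was cleared (e.g. by write_pmovsclr()).
 * Assumes left > 0 and fewer than 2^width events since programming. */
static bool counter_overflowed(uint64_t programmed, uint64_t now,
                               uint64_t left, unsigned width)
{
    uint64_t elapsed = (now - programmed) & counter_mask(width);

    return elapsed >= left;
}
```

Take a 32-bit counter with left = 1000. program_value() gives 0xFFFFFC18, and a later reading of 5 means 1005 events have elapsed, so counter_overflowed() reports an overflow. A read or schedule-in path could, in principle, use a check like this to generate the missed sample even after the flag has been cleared. Where that is safe to do is the open question raised above.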