Date: Thu, 11 Jan 2024 10:41:27 +0000
From: Mark Rutland
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Arnaldo Carvalho de Melo, LKML, Mingwei Zhang, Ian Rogers,
	Kan Liang
Subject: Re: [PATCH RESEND 1/2] perf/core: Update perf_adjust_freq_unthr_context()
References: <20240109213623.449371-1-namhyung@kernel.org>

On Wed, Jan 10, 2024 at 10:27:27AM -0800, Namhyung Kim wrote:
> On Wed, Jan 10, 2024 at 6:49 AM Mark Rutland wrote:
> > On Tue, Jan 09, 2024 at 01:36:22PM -0800, Namhyung Kim wrote:
> > > It was unnecessarily disabling and enabling PMUs for each event. It
> > > should be done at the PMU level. Add a pmu_ctx->nr_freq counter to
> > > check it at each PMU. As the pmu context has separate active lists
> > > for the pinned and flexible groups, factor out a new function to do
> > > the job.
> > >
> > > Another minor optimization is that it can skip PMUs with
> > > CAP_NO_INTERRUPT even if it needs to unthrottle sampling events.
> > >
> > > Reviewed-by: Ian Rogers
> > > Reviewed-by: Kan Liang
> > > Tested-by: Mingwei Zhang
> > > Signed-off-by: Namhyung Kim
> >
> > Hi,
> >
> > I've taken a quick look and I don't think this is quite right for
> > hybrid/big.LITTLE, but I think that should be relatively simple to
> > fix (more on that below).
>
> Thanks for your review!

No problem!
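
For context, the per-event disable/enable that the commit message refers
to comes from the pre-patch loop, which looked roughly like this (a
simplified sketch; the throttle and frequency-adjustment bookkeeping
inside the loop body is elided):

	static void
	perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
				       bool unthrottle)
	{
		struct perf_event *event;

		if (!(ctx->nr_freq || unthrottle))
			return;

		raw_spin_lock(&ctx->lock);

		list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
			if (event->state != PERF_EVENT_STATE_ACTIVE)
				continue;

			/*
			 * One disable/enable bracket per event, even when
			 * several events sit on the same PMU.
			 */
			perf_pmu_disable(event->pmu);
			/* ... unthrottle and/or re-adjust the period ... */
			perf_pmu_enable(event->pmu);
		}

		raw_spin_unlock(&ctx->lock);
	}

Hoisting the disable/enable out to the perf_event_pmu_context level, as
the patch quoted below does, means one bracket per PMU per tick instead
of one per event.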
> > This seems to be a bunch of optimizations; was that based on
> > inspection alone, or have you found a workload where this has a
> > measurable impact?
>
> It's from a code inspection, but I think Mingwei reported some
> excessive MSR accesses for KVM use cases. Anyway, it'd increase the
> interrupt latency if you have slow (uncore) PMUs and lots of events on
> those PMUs.

Makes sense; it would be good if we could put something in the commit
message mentioning that.

[...]

> > > +static void
> > > +perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > > +{
> > > +	struct perf_event_pmu_context *pmu_ctx;
> > > +
> > > +	/*
> > > +	 * only need to iterate over all events iff:
> > > +	 * - context has events in frequency mode (needs freq adjust)
> > > +	 * - there are events to unthrottle on this cpu
> > > +	 */
> > > +	if (!(ctx->nr_freq || unthrottle))
> > > +		return;
> > > +
> > > +	raw_spin_lock(&ctx->lock);
> > > +
> > > +	list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> > > +		if (!(pmu_ctx->nr_freq || unthrottle))
> > > +			continue;
> > > +		if (pmu_ctx->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT)
> > > +			continue;
> > > +
> > > +		perf_pmu_disable(pmu_ctx->pmu);
> > > +		perf_adjust_freq_unthr_events(&pmu_ctx->pinned_active);
> > > +		perf_adjust_freq_unthr_events(&pmu_ctx->flexible_active);
> > > +		perf_pmu_enable(pmu_ctx->pmu);
> > > +	}
> >
> > I don't think this is correct for big.LITTLE/hybrid systems.
> >
> > Imagine a system where CPUs 0-1 have pmu_a, CPUs 2-3 have pmu_b, and
> > a task has events for both pmu_a and pmu_b. The perf_event_context
> > for that task will have a perf_event_pmu_context for each PMU in its
> > pmu_ctx_list.
> >
> > Say that task is run on CPU0, and perf_event_task_tick() is called.
> > That will call perf_adjust_freq_unthr_context(), and it will iterate
> > over the pmu_ctx_list. Note that regardless of pmu_ctx->nr_freq, if
> > 'unthrottle' is true, we'll go ahead and call the following for all
> > of the pmu contexts in the pmu_ctx_list:
> >
> > 	perf_pmu_disable(pmu_ctx->pmu);
> > 	perf_adjust_freq_unthr_events(&pmu_ctx->pinned_active);
> > 	perf_adjust_freq_unthr_events(&pmu_ctx->flexible_active);
> > 	perf_pmu_enable(pmu_ctx->pmu);
> >
> > ... and that means we might call that for pmu_b, even though it's not
> > associated with CPU0. That could be fatal depending on what those
> > callbacks do.
>
> Thanks for pointing that out. I missed the hybrid cases.
>
> > The old logic avoided that possibility implicitly, since the events
> > for pmu_b couldn't be active, and so the check at the start of the
> > loop would skip all of pmu_b's events:
> >
> > 	if (event->state != PERF_EVENT_STATE_ACTIVE)
> > 		continue;
> >
> > We could do similar by keeping track of how many active events each
> > perf_event_pmu_context has, which'd allow us to do something like:
> >
> > 	if (pmu_ctx->nr_active == 0)
> > 		continue;
> >
> > How does that sound to you?
>
> Sounds good. Maybe we can just test if both active lists are empty.

Good idea, I think that'd be simpler and less fragile.

Thanks,
Mark.
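
For illustration, the guard the thread converges on — skipping any
perf_event_pmu_context whose active lists are both empty — might look
something like this (a hypothetical sketch against the snippet quoted
above, not the final patch as merged):

	list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
		if (!(pmu_ctx->nr_freq || unthrottle))
			continue;
		/*
		 * No active events means this PMU's events cannot be
		 * running on this CPU (e.g. it belongs to the other half
		 * of a big.LITTLE/hybrid system), so leave it alone.
		 */
		if (list_empty(&pmu_ctx->pinned_active) &&
		    list_empty(&pmu_ctx->flexible_active))
			continue;
		if (pmu_ctx->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT)
			continue;

		perf_pmu_disable(pmu_ctx->pmu);
		perf_adjust_freq_unthr_events(&pmu_ctx->pinned_active);
		perf_adjust_freq_unthr_events(&pmu_ctx->flexible_active);
		perf_pmu_enable(pmu_ctx->pmu);
	}

Testing the lists directly avoids maintaining a separate nr_active
counter that could drift out of sync with the lists, which is the
"simpler and less fragile" point above.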