Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F590C6FD1D for ; Mon, 20 Mar 2023 17:48:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230217AbjCTRsE (ORCPT ); Mon, 20 Mar 2023 13:48:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229914AbjCTRrO (ORCPT ); Mon, 20 Mar 2023 13:47:14 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id AEAD6C163 for ; Mon, 20 Mar 2023 10:42:56 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 53CD4165C; Mon, 20 Mar 2023 10:28:20 -0700 (PDT) Received: from merodach.members.linode.com (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CCF183F67D; Mon, 20 Mar 2023 10:27:33 -0700 (PDT) From: James Morse To: x86@kernel.org, linux-kernel@vger.kernel.org Cc: Fenghua Yu , Reinette Chatre , Thomas Gleixner , Ingo Molnar , Borislav Petkov , H Peter Anvin , Babu Moger , James Morse , shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS , carl@os.amperecomputing.com, lcherian@marvell.com, bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com, xingxin.hx@openanolis.org, baolin.wang@linux.alibaba.com, Jamie Iles , Xin Hao , peternewman@google.com Subject: [PATCH v3 09/19] x86/resctrl: Queue mon_event_read() instead of sending an IPI Date: Mon, 20 Mar 2023 17:26:10 +0000 Message-Id: <20230320172620.18254-10-james.morse@arm.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230320172620.18254-1-james.morse@arm.com> References: <20230320172620.18254-1-james.morse@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org x86 is blessed with an abundance of monitors, one per RMID, that can be read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC, the number implemented is up to the manufacturer. This means when there are fewer monitors than needed, they need to be allocated and freed. Worse, the domain may be broken up into slices, and the MMIO accesses for each slice may need performing from different CPUs. These two details mean MPAMs monitor code needs to be able to sleep, and IPI another CPU in the domain to read from a resource that has been sliced. mon_event_read() already invokes mon_event_count() via IPI, which means this isn't possible. On systems using nohz-full, some CPUs need to be interrupted to run kernel work as they otherwise stay in user-space running realtime workloads. Interrupting these CPUs should be avoided, and scheduling work on them may never complete. Change mon_event_read() to pick a housekeeping CPU, (one that is not using nohz_full) and schedule mon_event_count() and wait. If all the CPUs in a domain are using nohz-full, then an IPI is used as the fallback. This function is only used in response to a user-space filesystem request (not the timing sensitive overflow code). This allows MPAM to hide the slice behaviour from resctrl, and to keep the monitor-allocation in monitor.c. When the IPI fallback is used on machines where MPAM needs to make an access on multiple CPUs, the counter read will always fail. Tested-by: Shaopeng Tan Signed-off-by: James Morse --- Changes since v2: * Use cpumask_any_housekeeping() and fallback to an IPI if needed --- arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 19 +++++++++++++++++-- arch/x86/kernel/cpu/resctrl/internal.h | 2 +- arch/x86/kernel/cpu/resctrl/monitor.c | 6 ++++-- 3 files changed, 22 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c index eb07d4435391..b06e86839d00 100644 --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c @@ -19,6 +19,7 @@ #include #include #include +#include #include "internal.h" /* @@ -527,8 +528,13 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r, struct rdt_domain *d, struct rdtgroup *rdtgrp, int evtid, int first) { + int cpu; + + /* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */ + lockdep_assert_held(&rdtgroup_mutex); + /* - * setup the parameters to send to the IPI to read the data. + * setup the parameters to pass to mon_event_count() to read the data. */ rr->rgrp = rdtgrp; rr->evtid = evtid; @@ -537,7 +543,16 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r, rr->val = 0; rr->first = first; - smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1); + cpu = get_cpu(); + if (cpumask_test_cpu(cpu, &d->cpu_mask)) { + mon_event_count(rr); + put_cpu(); + } else { + put_cpu(); + + cpu = cpumask_any_housekeeping(&d->cpu_mask); + smp_call_on_cpu(cpu, mon_event_count, rr, false); + } } int rdtgroup_mondata_show(struct seq_file *m, void *arg) diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h index 0b5fd5a0cda2..a07557390895 100644 --- a/arch/x86/kernel/cpu/resctrl/internal.h +++ b/arch/x86/kernel/cpu/resctrl/internal.h @@ -563,7 +563,7 @@ int alloc_rmid(u32 closid); void free_rmid(u32 closid, u32 rmid); int rdt_get_mon_l3_config(struct rdt_resource *r); bool __init rdt_cpu_has(int flag); -void mon_event_count(void *info); +int mon_event_count(void *info); int rdtgroup_mondata_show(struct seq_file *m, void *arg); void mon_event_read(struct rmid_read *rr, struct rdt_resource *r, struct rdt_domain *d, struct rdtgroup *rdtgrp, diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c index 3bec5c59ca0e..5e9e876c3409 100644 --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -550,10 +550,10 @@ static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr) } /* - * This is called via IPI to read the CQM/MBM counters + * This is scheduled by mon_event_read() to read the CQM/MBM counters * on a domain. */ -void mon_event_count(void *info) +int mon_event_count(void *info) { struct rdtgroup *rdtgrp, *entry; struct rmid_read *rr = info; @@ -586,6 +586,8 @@ void mon_event_count(void *info) */ if (ret == 0) rr->err = 0; + + return 0; } /* -- 2.39.2