2024-02-13 18:55:07

by James Morse

Subject: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hello!

It's been back and forth over whether this series should be rebased onto Tony's
SNC series. This version isn't; it's based on tip/x86/cache.
(I have the rebased-and-tested versions if anyone needs them)

This version just has minor cleanup from the previous one.
* An unused parameter in unused code has gone,
* I've added a comment about the sizing of the index array - it only matters on arm64.

Changes are also noted on each patch.

~

This series does two things: it changes resctrl to call resctrl_arch_rmid_read()
in a way that works for MPAM, and it separates the locking so that the arch code
and filesystem code don't have to share a mutex. I tried to split this as two
series, but these touch similar call sites, so it would create more work.

(What's MPAM? See the cover letter of the first series. [1])

On x86 the RMID is an independent number. MPAM's equivalent is PMG, but this
isn't an independent number - it extends the PARTID (same as CLOSID) space
with bits that aren't used to select the configuration. The monitors can
then be told to match specific PMG values, allowing monitor-groups to be
created.

But MPAM expects the monitors to always monitor by PARTID. The
Cache-storage-utilisation counters can only work this way.
(In the MPAM spec, not setting the MATCH_PARTID bit is CONSTRAINED
UNPREDICTABLE - which is Arm's term meaning portable software can't rely on
this.)

It gets worse, as some SoCs may have very few PMG bits. I've seen the
datasheet for one that has a single bit of PMG space.

To be usable, MPAM's counters always need the PARTID and the PMG.
For resctrl, this means always making the CLOSID available when the RMID
is used.

To ensure RMID are always unique, this series combines the CLOSID and RMID
into an index, and manages RMID based on that. For x86, the index and RMID
would always be the same.
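
As a rough illustration (not the exact helpers from the series; the
MPAM-style variant and its MPAM_PMG_BITS constant are hypothetical),
the (CLOSID, RMID) to index mapping could look like this:

	/* x86: the CLOSID plays no part in the index. */
	static inline u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
	{
		return rmid;
	}

	/* MPAM-style sketch: the PMG bits extend the PARTID (CLOSID). */
	static inline u32 mpam_rmid_idx_encode(u32 closid, u32 rmid)
	{
		return (closid << MPAM_PMG_BITS) | rmid;
	}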


Currently the architecture specific code in the cpuhp callbacks takes the
rdtgroup_mutex. This means the filesystem code would have to export this
lock, resulting in an ill-defined interface between the two, and the possibility
of cross-architecture lock-ordering headaches.

The second part of this series adds a domain_list_lock to protect writes to the
domain list, and protects readers of the domain list with RCU - or cpus_read_lock().

RCU is used to allow lockless readers of the domain list. To get MPAM's monitors
working, it's very likely they'll need to be plumbed up to perf. An uncore PMU
driver would need to be a lockless reader of the domain list.
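
A minimal sketch of what such a lockless reader could look like, assuming
the rdt_resource/rdt_domain types used throughout this series (illustrative
only, not code from the series):

	struct rdt_domain *d;

	rcu_read_lock();
	list_for_each_entry_rcu(d, &r->domains, list) {
		/* Find the domain that contains this CPU */
		if (cpumask_test_cpu(cpu, &d->cpu_mask))
			break;
	}
	rcu_read_unlock();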

This series is based on tip/x86/cache's commit fc747eebef73, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/monitors_and_locking/v9

Bugs welcome,

Thanks,

James

[1] https://lore.kernel.org/lkml/[email protected]/
[v1] https://lore.kernel.org/all/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/
[v3] https://lore.kernel.org/r/[email protected]
[v4] https://lore.kernel.org/r/[email protected]
[v5] https://lore.kernel.org/lkml/[email protected]/
[v6] https://lore.kernel.org/all/[email protected]/
[v7] https://lore.kernel.org/r/[email protected]/
[v8] https://lore.kernel.org/lkml/[email protected]/

James Morse (24):
tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
x86/resctrl: kfree() rmid_ptrs from resctrl_exit()
x86/resctrl: Create helper for RMID allocation and mondata dir
creation
x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
x86/resctrl: Track the closid with the rmid
x86/resctrl: Access per-rmid structures by index
x86/resctrl: Allow RMID allocation to be scoped by CLOSID
x86/resctrl: Track the number of dirty RMID a CLOSID has
x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding
x86/resctrl: Allocate the cleanest CLOSID by searching
closid_num_dirty_rmid
x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
x86/resctrl: Queue mon_event_read() instead of sending an IPI
x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
x86/resctrl: Allow arch to allocate memory needed in
resctrl_arch_rmid_read()
x86/resctrl: Make resctrl_mounted checks explicit
x86/resctrl: Move alloc/mon static keys into helpers
x86/resctrl: Make rdt_enable_key the arch's decision to switch
x86/resctrl: Add helpers for system wide mon/alloc capable
x86/resctrl: Add CPU online callback for resctrl work
x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but
cpu
x86/resctrl: Add CPU offline callback for resctrl work
x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
x86/resctrl: Separate arch and fs resctrl locks

arch/x86/include/asm/resctrl.h | 90 +++++
arch/x86/kernel/cpu/resctrl/core.c | 102 ++---
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 48 ++-
arch/x86/kernel/cpu/resctrl/internal.h | 67 +++-
arch/x86/kernel/cpu/resctrl/monitor.c | 463 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 15 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 359 ++++++++++++-----
include/linux/resctrl.h | 48 ++-
include/linux/tick.h | 9 +-
9 files changed, 921 insertions(+), 280 deletions(-)

--
2.39.2



2024-02-13 18:56:30

by James Morse

Subject: [PATCH v9 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
used for different control groups.

This means that once a CLOSID is allocated, all its monitoring IDs may still be
dirty, and held in limbo.

Instead of allocating the first free CLOSID, on architectures where
CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search
closid_num_dirty_rmid[] to find the cleanest CLOSID.

The CLOSID found is returned to closid_alloc() for the free list
to be updated.

Signed-off-by: James Morse <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
---
Changes since v4:
* Dropped stale section from comment

Changes since v5:
* Renamed some variables.

Changes since v7:
* Made the comment over closid_num_dirty_rmid() a non-kdoc comment.
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/monitor.c | 45 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++---
3 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 872ba1a34103..b7b9d9230bef 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -566,5 +566,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init thread_throttle_mode_init(void);
void __init mbm_config_rftype_init(const char *config);
void rdt_staged_configs_clear(void);
+bool closid_allocated(unsigned int closid);
+int resctrl_find_cleanest_closid(void);

#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 13b0c8d14f3d..101f1b112d17 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -386,6 +386,51 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
return ERR_PTR(-ENOSPC);
}

+/**
+ * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
+ * RMID are clean, or the CLOSID that has
+ * the most clean RMID.
+ *
+ * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
+ * may not be able to allocate clean RMID. To avoid this the allocator will
+ * choose the CLOSID with the most clean RMID.
+ *
+ * When the CLOSID and RMID are independent numbers, the first free CLOSID will
+ * be returned.
+ */
+int resctrl_find_cleanest_closid(void)
+{
+ u32 cleanest_closid = ~0;
+ int i = 0;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+ return -EIO;
+
+ for (i = 0; i < closids_supported(); i++) {
+ int num_dirty;
+
+ if (closid_allocated(i))
+ continue;
+
+ num_dirty = closid_num_dirty_rmid[i];
+ if (num_dirty == 0)
+ return i;
+
+ if (cleanest_closid == ~0)
+ cleanest_closid = i;
+
+ if (num_dirty < closid_num_dirty_rmid[cleanest_closid])
+ cleanest_closid = i;
+ }
+
+ if (cleanest_closid == ~0)
+ return -ENOSPC;
+
+ return cleanest_closid;
+}
+
/*
* For MPAM the RMID value is not unique, and has to be considered with
* the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bc6e0f83c847..8fc46204a6cc 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -137,13 +137,22 @@ static void closid_init(void)

static int closid_alloc(void)
{
- u32 closid = ffs(closid_free_map);
+ int cleanest_closid;
+ u32 closid;

lockdep_assert_held(&rdtgroup_mutex);

- if (closid == 0)
- return -ENOSPC;
- closid--;
+ if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+ cleanest_closid = resctrl_find_cleanest_closid();
+ if (cleanest_closid < 0)
+ return cleanest_closid;
+ closid = cleanest_closid;
+ } else {
+ closid = ffs(closid_free_map);
+ if (closid == 0)
+ return -ENOSPC;
+ closid--;
+ }
__clear_bit(closid, &closid_free_map);

return closid;
@@ -163,7 +172,7 @@ void closid_free(int closid)
* Return: true if @closid is currently associated with a resource group,
* false if @closid is free
*/
-static bool closid_allocated(unsigned int closid)
+bool closid_allocated(unsigned int closid)
{
lockdep_assert_held(&rdtgroup_mutex);

--
2.39.2


2024-02-13 18:57:34

by James Morse

Subject: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

rmid_ptrs[] is allocated from dom_data_init() but never free()d.

While the exit text ends up in the linker script's DISCARD section,
the direction of travel is for resctrl to be/have loadable modules.

Add rdt_put_mon_l3_config() to clean up any memory allocated
by rdt_get_mon_l3_config().

There is no reason to backport this to a stable kernel.

Signed-off-by: James Morse <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Reviewed-by: Babu Moger <[email protected]>
---
Changes since v5:
* This patch is new

Changes since v6:
* Removed struct rdt_resource argument, added __exit markers to match the
only caller.
* Added a whole stack of functions to maintain symmetry.

Changes since v7:
* Moved the eventual kfree() call to be after rdtgroup_exit()

Changes since v8:
* Removed an unused parameter from some unused code.
---
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 15 +++++++++++++++
3 files changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index aa9810a64258..9641c42d0f85 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -990,8 +990,14 @@ late_initcall(resctrl_late_init);

static void __exit resctrl_exit(void)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
cpuhp_remove_state(rdt_online);
+
rdtgroup_exit();
+
+ if (r->mon_capable)
+ rdt_put_mon_l3_config();
}

__exitcall(resctrl_exit);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 52e7e7deee10..61c763604fc9 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -544,6 +544,7 @@ void closid_free(int closid);
int alloc_rmid(void);
void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
+void __exit rdt_put_mon_l3_config(void);
bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3a6c069614eb..3a73db0579d8 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -719,6 +719,16 @@ static int dom_data_init(struct rdt_resource *r)
return 0;
}

+static void __exit dom_data_exit(void)
+{
+ mutex_lock(&rdtgroup_mutex);
+
+ kfree(rmid_ptrs);
+ rmid_ptrs = NULL;
+
+ mutex_unlock(&rdtgroup_mutex);
+}
+
static struct mon_evt llc_occupancy_event = {
.name = "llc_occupancy",
.evtid = QOS_L3_OCCUP_EVENT_ID,
@@ -814,6 +824,11 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
return 0;
}

+void __exit rdt_put_mon_l3_config(void)
+{
+ dom_data_exit();
+}
+
void __init intel_rdt_mbm_apply_quirk(void)
{
int cf_index;
--
2.39.2


2024-02-13 19:07:34

by James Morse

Subject: [PATCH v9 05/24] x86/resctrl: Track the closid with the rmid

x86's RMID are independent of the CLOSID. An RMID can be allocated,
used and freed without considering the CLOSID.

MPAM's equivalent feature is PMG, which is not an independent number,
it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of
'RMID' can be allocated for a single CLOSID.
i.e. if there is 1 bit of PMG space, then each CLOSID can have two
monitor groups.

To allow resctrl to disambiguate RMID values for different CLOSID,
everything in resctrl that keeps an RMID value needs to know the CLOSID
too. The CLOSID will always be ignored on x86.

Signed-off-by: James Morse <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>

---
Is there a better term for 'the unique identifier for a monitor group'?
Using RMID for that here may be confusing...

Changes since v1:
* Added comment in struct rmid_entry

Changes since v2:
* Moved X86_RESCTRL_BAD_CLOSID from a subsequent patch

Changes since v3:
* Renamed X86_RESCTRL_BAD_CLOSID to EMPTY
* Clarified a few comments and kernel-doc

Changes since v5:
* Use entry->closid from the iterator, instead of the parent control group.
* Move the reserved defines into this patch to reduce the churn.
* Added some kernel doc.
* Renamed some arch closid parameters as 'unused'.

Changes since v6:
* Changes to comments.

Changes since v7:
* Changes to comments.
---
arch/x86/include/asm/resctrl.h | 7 +++
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 73 +++++++++++++++--------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 ++--
include/linux/resctrl.h | 16 ++++-
6 files changed, 77 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 255a78d9d906..cc6e1bce7b1a 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -7,6 +7,13 @@
#include <linux/sched.h>
#include <linux/jump_label.h>

+/*
+ * This value can never be a valid CLOSID, and is used when mapping a
+ * (closid, rmid) pair to an index and back. On x86 only the RMID is
+ * needed. The index is a software defined value.
+ */
+#define X86_RESCTRL_EMPTY_CLOSID ((u32)~0)
+
/**
* struct resctrl_pqr_state - State cache for the PQR MSR
* @cur_rmid: The cached Resource Monitoring ID
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 61c763604fc9..ae0e3338abc4 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -542,7 +542,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(void);
-void free_rmid(u32 rmid);
+void free_rmid(u32 closid, u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
void __exit rdt_put_mon_l3_config(void);
bool __init rdt_cpu_has(int flag);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3a73db0579d8..3dad4134d2c9 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -24,7 +24,20 @@

#include "internal.h"

+/**
+ * struct rmid_entry - dirty tracking for all RMID.
+ * @closid: The CLOSID for this entry.
+ * @rmid: The RMID for this entry.
+ * @busy: The number of domains with cached data using this RMID.
+ * @list: Member of the rmid_free_lru list when busy == 0.
+ *
+ * Depending on the architecture the correct monitor is accessed using
+ * both @closid and @rmid, or @rmid only.
+ *
+ * Take the rdtgroup_mutex when accessing.
+ */
struct rmid_entry {
+ u32 closid;
u32 rmid;
int busy;
struct list_head list;
@@ -136,7 +149,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
return val;
}

-static inline struct rmid_entry *__rmid_entry(u32 rmid)
+static inline struct rmid_entry *__rmid_entry(u32 closid, u32 rmid)
{
struct rmid_entry *entry;

@@ -190,7 +203,8 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
}

void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid)
+ u32 unused, u32 rmid,
+ enum resctrl_event_id eventid)
{
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
@@ -230,7 +244,8 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
}

int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid, u64 *val)
+ u32 unused, u32 rmid, enum resctrl_event_id eventid,
+ u64 *val)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
@@ -285,9 +300,9 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
if (nrmid >= r->num_rmid)
break;

- entry = __rmid_entry(nrmid);
+ entry = __rmid_entry(X86_RESCTRL_EMPTY_CLOSID, nrmid);// temporary

- if (resctrl_arch_rmid_read(r, d, entry->rmid,
+ if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
QOS_L3_OCCUP_EVENT_ID, &val)) {
rmid_dirty = true;
} else {
@@ -342,7 +357,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
- err = resctrl_arch_rmid_read(r, d, entry->rmid,
+ err = resctrl_arch_rmid_read(r, d, entry->closid,
+ entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
if (err || val <= resctrl_rmid_realloc_threshold)
@@ -366,7 +382,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
list_add_tail(&entry->list, &rmid_free_lru);
}

-void free_rmid(u32 rmid)
+void free_rmid(u32 closid, u32 rmid)
{
struct rmid_entry *entry;

@@ -375,7 +391,7 @@ void free_rmid(u32 rmid)

lockdep_assert_held(&rdtgroup_mutex);

- entry = __rmid_entry(rmid);
+ entry = __rmid_entry(closid, rmid);

if (is_llc_occupancy_enabled())
add_rmid_to_limbo(entry);
@@ -383,8 +399,8 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
- enum resctrl_event_id evtid)
+static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 closid,
+ u32 rmid, enum resctrl_event_id evtid)
{
switch (evtid) {
case QOS_L3_MBM_TOTAL_EVENT_ID:
@@ -396,20 +412,21 @@ static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
}
}

-static int __mon_event_count(u32 rmid, struct rmid_read *rr)
+static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
struct mbm_state *m;
u64 tval = 0;

if (rr->first) {
- resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
- m = get_mbm_state(rr->d, rmid, rr->evtid);
+ resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
+ m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
if (m)
memset(m, 0, sizeof(struct mbm_state));
return 0;
}

- rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
+ rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, rr->evtid,
+ &tval);
if (rr->err)
return rr->err;

@@ -421,6 +438,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
/*
* mbm_bw_count() - Update bw count from values previously read by
* __mon_event_count().
+ * @closid: The closid used to identify the cached mbm_state.
* @rmid: The rmid used to identify the cached mbm_state.
* @rr: The struct rmid_read populated by __mon_event_count().
*
@@ -429,7 +447,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
* __mon_event_count() is compared with the chunks value from the previous
* invocation. This must be called once per second to maintain values in MBps.
*/
-static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
+static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
struct mbm_state *m = &rr->d->mbm_local[rmid];
u64 cur_bw, bytes, cur_bytes;
@@ -456,7 +474,7 @@ void mon_event_count(void *info)

rdtgrp = rr->rgrp;

- ret = __mon_event_count(rdtgrp->mon.rmid, rr);
+ ret = __mon_event_count(rdtgrp->closid, rdtgrp->mon.rmid, rr);

/*
* For Ctrl groups read data from child monitor groups and
@@ -467,7 +485,8 @@ void mon_event_count(void *info)

if (rdtgrp->type == RDTCTRL_GROUP) {
list_for_each_entry(entry, head, mon.crdtgrp_list) {
- if (__mon_event_count(entry->mon.rmid, rr) == 0)
+ if (__mon_event_count(entry->closid, entry->mon.rmid,
+ rr) == 0)
ret = 0;
}
}
@@ -578,7 +597,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
}

-static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
+static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
+ u32 closid, u32 rmid)
{
struct rmid_read rr;

@@ -593,12 +613,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
if (is_mbm_total_enabled()) {
rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
rr.val = 0;
- __mon_event_count(rmid, &rr);
+ __mon_event_count(closid, rmid, &rr);
}
if (is_mbm_local_enabled()) {
rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
rr.val = 0;
- __mon_event_count(rmid, &rr);
+ __mon_event_count(closid, rmid, &rr);

/*
* Call the MBA software controller only for the
@@ -606,7 +626,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
* the software controller explicitly.
*/
if (is_mba_sc(NULL))
- mbm_bw_count(rmid, &rr);
+ mbm_bw_count(closid, rmid, &rr);
}
}

@@ -663,11 +683,11 @@ void mbm_handle_overflow(struct work_struct *work)
d = container_of(work, struct rdt_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
- mbm_update(r, d, prgrp->mon.rmid);
+ mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);

head = &prgrp->mon.crdtgrp_list;
list_for_each_entry(crgrp, head, mon.crdtgrp_list)
- mbm_update(r, d, crgrp->mon.rmid);
+ mbm_update(r, d, crgrp->closid, crgrp->mon.rmid);

if (is_mba_sc(NULL))
update_mba_bw(prgrp, d);
@@ -710,10 +730,11 @@ static int dom_data_init(struct rdt_resource *r)
}

/*
- * RMID 0 is special and is always allocated. It's used for all
- * tasks that are not monitored.
+ * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are special and
+ * are always allocated. These are used for the rdtgroup_default
+ * control group, which will be setup later in rdtgroup_init().
*/
- entry = __rmid_entry(0);
+ entry = __rmid_entry(RESCTRL_RESERVED_CLOSID, RESCTRL_RESERVED_RMID);
list_del(&entry->list);

return 0;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8f559eeae08e..65bee6f11015 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -752,7 +752,7 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
* anymore when this group would be used for pseudo-locking. This
* is safe to call on platforms not capable of monitoring.
*/
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);

ret = 0;
goto out;
@@ -787,7 +787,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)

ret = rdtgroup_locksetup_user_restore(rdtgrp);
if (ret) {
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
return ret;
}

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f455a10b74ab..ad7da7254f4d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2837,7 +2837,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)

head = &rdtgrp->mon.crdtgrp_list;
list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
- free_rmid(sentry->mon.rmid);
+ free_rmid(sentry->closid, sentry->mon.rmid);
list_del(&sentry->mon.crdtgrp_list);

if (atomic_read(&sentry->waitcount) != 0)
@@ -2877,7 +2877,7 @@ static void rmdir_all_sub(void)
cpumask_or(&rdtgroup_default.cpu_mask,
&rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);

- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);

kernfs_remove(rdtgrp->kn);
list_del(&rdtgrp->rdtgroup_list);
@@ -3305,7 +3305,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
return ret;
}

@@ -3315,7 +3315,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
{
if (rdt_mon_capable)
- free_rmid(rgrp->mon.rmid);
+ free_rmid(rgrp->closid, rgrp->mon.rmid);
}

static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
@@ -3574,7 +3574,7 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
update_closid_rmid(tmpmask, NULL);

rdtgrp->flags = RDT_DELETED;
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);

/*
* Remove the rdtgrp from the parent ctrl_mon group's list
@@ -3620,8 +3620,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);

+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
closid_free(rdtgrp->closid);
- free_rmid(rdtgrp->mon.rmid);

rdtgroup_ctrl_remove(rdtgrp);

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 66942d7fba7f..bd4ec22b5a96 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -6,6 +6,10 @@
#include <linux/list.h>
#include <linux/pid.h>

+/* CLOSID, RMID value used by the default control group */
+#define RESCTRL_RESERVED_CLOSID 0
+#define RESCTRL_RESERVED_RMID 0
+
#ifdef CONFIG_PROC_CPU_RESCTRL

int proc_resctrl_show(struct seq_file *m,
@@ -225,6 +229,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
* for this resource and domain.
* @r: resource that the counter should be read from.
* @d: domain that the counter should be read from.
+ * @closid: closid that matches the rmid. Depending on the architecture, the
+ * counter may match traffic of both @closid and @rmid, or @rmid
+ * only.
* @rmid: rmid of the counter to read.
* @eventid: eventid to read, e.g. L3 occupancy.
* @val: result of the counter read in bytes.
@@ -235,20 +242,25 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
* 0 on success, or -EIO, -EINVAL etc on error.
*/
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid, u64 *val);
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ u64 *val);
+

/**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid
* and eventid.
* @r: The domain's resource.
* @d: The rmid's domain.
+ * @closid: closid that matches the rmid. Depending on the architecture, the
+ * counter may match traffic of both @closid and @rmid, or @rmid only.
* @rmid: The rmid whose counter values should be reset.
* @eventid: The eventid whose counter values should be reset.
*
* This can be called from any CPU.
*/
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid);
+ u32 closid, u32 rmid,
+ enum resctrl_event_id eventid);

/**
* resctrl_arch_reset_rmid_all() - Reset all private state associated with
--
2.39.2


2024-02-13 19:13:50

by James Morse

Subject: [PATCH v9 17/24] x86/resctrl: Move alloc/mon static keys into helpers

resctrl enables three static keys depending on the features it has enabled.
Another architecture's context switch code may look different; any
static keys that control it should be buried behind helpers.

Move the alloc/mon logic into arch-specific helpers as a preparatory step
for making the rdt_enable_key's status something the arch code decides.

This means other architectures don't have to mirror the static keys.

Signed-off-by: James Morse <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
---
arch/x86/include/asm/resctrl.h | 20 ++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 5 -----
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 29c4cc343787..3c9137b6ad4f 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -42,6 +42,26 @@ DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);

+static inline void resctrl_arch_enable_alloc(void)
+{
+ static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_disable_alloc(void)
+{
+ static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_enable_mon(void)
+{
+ static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+}
+
+static inline void resctrl_arch_disable_mon(void)
+{
+ static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+}
+
/*
* __resctrl_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR
*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9bfda6963794..78580855139d 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -94,9 +94,6 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
return container_of(kfc, struct rdt_fs_context, kfc);
}

-DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
-DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
-
/**
* struct mon_evt - Entry in the event list of a resource
* @evtid: event id
@@ -452,8 +449,6 @@ extern struct mutex rdtgroup_mutex;

extern struct rdt_hw_resource rdt_resources_all[];
extern struct rdtgroup rdtgroup_default;
-DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
-
extern struct dentry *debugfs_resctrl;

enum resctrl_res_level {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 857fbbc3c839..231207f09e04 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2668,9 +2668,9 @@ static int rdt_get_tree(struct fs_context *fc)
goto out_psl;

if (rdt_alloc_capable)
- static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+ resctrl_arch_enable_alloc();
if (rdt_mon_capable)
- static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+ resctrl_arch_enable_mon();

if (rdt_alloc_capable || rdt_mon_capable) {
static_branch_enable_cpuslocked(&rdt_enable_key);
@@ -2946,8 +2946,8 @@ static void rdt_kill_sb(struct super_block *sb)
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
schemata_list_destroy();
rdtgroup_destroy_root();
- static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
- static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+ resctrl_arch_disable_alloc();
+ resctrl_arch_disable_mon();
static_branch_disable_cpuslocked(&rdt_enable_key);
resctrl_mounted = false;
kernfs_kill_sb(sb);
--
2.39.2


2024-02-13 19:14:38

by James Morse

Subject: [PATCH v9 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable

resctrl reads rdt_alloc_capable or rdt_mon_capable to determine
whether any of the resources support the corresponding features.
resctrl also uses the static-keys that affect the architecture's
context-switch code to determine the same thing.

This forces another architecture to have the same static-keys.

As the static-key is enabled based on the capable flag, and none of
the filesystem uses of these are in the scheduler path, move the
capable flags behind helpers, and use these in the filesystem
code instead of the static-key.

After this change, only the architecture code manages and uses
the static-keys to ensure __resctrl_sched_in() does not need
runtime checks.

This avoids multiple architectures having to define the same
static-keys.

Cases where the static-key implicitly tested if the resctrl
filesystem was mounted all have an explicit check added by a
previous patch.

Signed-off-by: James Morse <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
---
Changes since v1:
* Added missing conversion in mkdir_rdt_prepare_rmid_free()

Changes since v3:
* Expanded the commit message.

Change since v7:
* Added a few missing resctrl_arch_mon_capable() conversions that crept in
during a rebase.
---
arch/x86/include/asm/resctrl.h | 13 ++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 2 --
arch/x86/kernel/cpu/resctrl/monitor.c | 4 +--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 ++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 38 +++++++++++------------
5 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index b74aa34dc9e8..12dbd2588ca7 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -38,10 +38,18 @@ struct resctrl_pqr_state {

DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);

+extern bool rdt_alloc_capable;
+extern bool rdt_mon_capable;
+
DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);

+static inline bool resctrl_arch_alloc_capable(void)
+{
+ return rdt_alloc_capable;
+}
+
static inline void resctrl_arch_enable_alloc(void)
{
static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
@@ -54,6 +62,11 @@ static inline void resctrl_arch_disable_alloc(void)
static_branch_dec_cpuslocked(&rdt_enable_key);
}

+static inline bool resctrl_arch_mon_capable(void)
+{
+ return rdt_mon_capable;
+}
+
static inline void resctrl_arch_enable_mon(void)
{
static_branch_enable_cpuslocked(&rdt_mon_enable_key);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 78580855139d..3ee855c37447 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -137,8 +137,6 @@ struct rmid_read {
void *arch_mon_ctx;
};

-extern bool rdt_alloc_capable;
-extern bool rdt_mon_capable;
extern unsigned int rdt_mon_features;
extern struct list_head resctrl_schema_all;
extern bool resctrl_mounted;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index d5d8a58d96f2..92d7ba674003 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -817,7 +817,7 @@ void mbm_handle_overflow(struct work_struct *work)
* If the filesystem has been unmounted this work no longer needs to
* run.
*/
- if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
+ if (!resctrl_mounted || !resctrl_arch_mon_capable())
goto out_unlock;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -854,7 +854,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
* When a domain comes online there is no guarantee the filesystem is
* mounted. If not, there is no need to catch counter overflow.
*/
- if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
+ if (!resctrl_mounted || !resctrl_arch_mon_capable())
return;
cpu = cpumask_any_housekeeping(&dom->cpu_mask);
dom->mbm_work_cpu = cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index d8f44113ed1f..8056bed033cc 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -581,7 +581,7 @@ static int rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
if (ret)
goto err_cpus;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = rdtgroup_kn_mode_restrict(rdtgrp, "mon_groups");
if (ret)
goto err_cpus_list;
@@ -628,7 +628,7 @@ static int rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
if (ret)
goto err_cpus;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = rdtgroup_kn_mode_restore(rdtgrp, "mon_groups", 0777);
if (ret)
goto err_cpus_list;
@@ -776,7 +776,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
{
int ret;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = alloc_rmid(rdtgrp->closid);
if (ret < 0) {
rdt_last_cmd_puts("Out of RMIDs\n");
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7e57ac9d81f7..ed5fc677a99d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -641,13 +641,13 @@ static int __rdtgroup_move_task(struct task_struct *tsk,

static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
{
- return (rdt_alloc_capable && (r->type == RDTCTRL_GROUP) &&
+ return (resctrl_arch_alloc_capable() && (r->type == RDTCTRL_GROUP) &&
resctrl_arch_match_closid(t, r->closid));
}

static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r)
{
- return (rdt_mon_capable && (r->type == RDTMON_GROUP) &&
+ return (resctrl_arch_mon_capable() && (r->type == RDTMON_GROUP) &&
resctrl_arch_match_rmid(t, r->mon.parent->closid,
r->mon.rmid));
}
@@ -2632,7 +2632,7 @@ static int rdt_get_tree(struct fs_context *fc)

closid_init();

- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
flags |= RFTYPE_MON;

ret = rdtgroup_add_files(rdtgroup_default.kn, flags);
@@ -2645,7 +2645,7 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_schemata_free;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = mongroup_create_dir(rdtgroup_default.kn,
&rdtgroup_default, "mon_groups",
&kn_mongrp);
@@ -2667,12 +2667,12 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_psl;

- if (rdt_alloc_capable)
+ if (resctrl_arch_alloc_capable())
resctrl_arch_enable_alloc();
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
resctrl_arch_enable_mon();

- if (rdt_alloc_capable || rdt_mon_capable)
+ if (resctrl_arch_alloc_capable() || resctrl_arch_mon_capable())
resctrl_mounted = true;

if (is_mbm_enabled()) {
@@ -2686,10 +2686,10 @@ static int rdt_get_tree(struct fs_context *fc)
out_psl:
rdt_pseudo_lock_release();
out_mondata:
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
kernfs_remove(kn_mondata);
out_mongrp:
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
kernfs_remove(kn_mongrp);
out_info:
kernfs_remove(kn_info);
@@ -2944,9 +2944,9 @@ static void rdt_kill_sb(struct super_block *sb)
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
schemata_list_destroy();
rdtgroup_destroy_root();
- if (rdt_alloc_capable)
+ if (resctrl_arch_alloc_capable())
resctrl_arch_disable_alloc();
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
resctrl_arch_disable_mon();
resctrl_mounted = false;
kernfs_kill_sb(sb);
@@ -3326,7 +3326,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
{
int ret;

- if (!rdt_mon_capable)
+ if (!resctrl_arch_mon_capable())
return 0;

ret = alloc_rmid(rdtgrp->closid);
@@ -3348,7 +3348,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)

static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
{
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
free_rmid(rgrp->closid, rgrp->mon.rmid);
}

@@ -3412,7 +3412,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,

if (rtype == RDTCTRL_GROUP) {
files = RFTYPE_BASE | RFTYPE_CTRL;
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
files |= RFTYPE_MON;
} else {
files = RFTYPE_BASE | RFTYPE_MON;
@@ -3521,7 +3521,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,

list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
/*
* Create an empty mon_groups directory to hold the subset
* of tasks and cpus to monitor.
@@ -3576,14 +3576,14 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
* allocation is supported, add a control and monitoring
* subdirectory
*/
- if (rdt_alloc_capable && parent_kn == rdtgroup_default.kn)
+ if (resctrl_arch_alloc_capable() && parent_kn == rdtgroup_default.kn)
return rdtgroup_mkdir_ctrl_mon(parent_kn, name, mode);

/*
* If RDT monitoring is supported and the parent directory is a valid
* "mon_groups" directory, add a monitoring subdirectory.
*/
- if (rdt_mon_capable && is_mon_groups(parent_kn, name))
+ if (resctrl_arch_mon_capable() && is_mon_groups(parent_kn, name))
return rdtgroup_mkdir_mon(parent_kn, name, mode);

return -EPERM;
@@ -3918,7 +3918,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
* If resctrl is mounted, remove all the
* per domain monitor data directories.
*/
- if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
+ if (resctrl_mounted && resctrl_arch_mon_capable())
rmdir_mondata_subdir_allrdtgrp(r, d->id);

if (is_mbm_enabled())
@@ -4001,7 +4001,7 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
* by rdt_get_tree() calling mkdir_mondata_all().
* If resctrl is mounted, add per domain monitor data directories.
*/
- if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
+ if (resctrl_mounted && resctrl_arch_mon_capable())
mkdir_mondata_subdir_allrdtgrp(r, d);

return 0;
--
2.39.2


2024-02-13 19:15:01

by James Morse

Subject: [PATCH v9 20/24] x86/resctrl: Add CPU online callback for resctrl work

The resctrl architecture specific code may need to create a domain when
a CPU comes online; it also needs to reset the CPU's PQR_ASSOC register.
The resctrl filesystem code needs to update the rdtgroup_default CPU
mask when CPUs are brought online.

Currently this is all done in one function, resctrl_online_cpu().
This will need to be split into architecture and filesystem parts
before resctrl can be moved to /fs/.

Pull the rdtgroup_default update work out as a filesystem-specific
cpu_online helper. resctrl_online_cpu() is the obvious name for this,
which means the version in core.c needs renaming.

resctrl_online_cpu() is called by the arch code once it has done the
work to add the new CPU to any domains.

In future patches, resctrl_online_cpu() will take the rdtgroup_mutex
itself.

Signed-off-by: James Morse <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
---
Changes since v3:
* Renamed err to ret

Changes since v4:
* Changes in capitalisation.

Changes since v5:
* More changes in capitalisation.
* Made resctrl_online_cpu() return void.
---
arch/x86/kernel/cpu/resctrl/core.c | 8 ++++----
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++++
include/linux/resctrl.h | 1 +
3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index d1dc80a21ea9..4627d447bc3d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -606,16 +606,16 @@ static void clear_closid_rmid(int cpu)
RESCTRL_RESERVED_CLOSID);
}

-static int resctrl_online_cpu(unsigned int cpu)
+static int resctrl_arch_online_cpu(unsigned int cpu)
{
struct rdt_resource *r;

mutex_lock(&rdtgroup_mutex);
for_each_capable_rdt_resource(r)
domain_add_cpu(cpu, r);
- /* The cpu is set in default rdtgroup after online. */
- cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
clear_closid_rmid(cpu);
+
+ resctrl_online_cpu(cpu);
mutex_unlock(&rdtgroup_mutex);

return 0;
@@ -967,7 +967,7 @@ static int __init resctrl_late_init(void)

state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"x86/resctrl/cat:online:",
- resctrl_online_cpu, resctrl_offline_cpu);
+ resctrl_arch_online_cpu, resctrl_offline_cpu);
if (state < 0)
return state;

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ed5fc677a99d..38d3b19a3aca 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4007,6 +4007,14 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

+void resctrl_online_cpu(unsigned int cpu)
+{
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ /* The CPU is set in default rdtgroup after online. */
+ cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+}
+
/*
* rdtgroup_init - rdtgroup initialization
*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index bf460c912bf5..4c4bad3c34e4 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -223,6 +223,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_online_cpu(unsigned int cpu);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
--
2.39.2


2024-02-13 19:16:14

by James Morse

Subject: [PATCH v9 24/24] x86/resctrl: Separate arch and fs resctrl locks

resctrl has one mutex that is taken by both the architecture specific code
and the filesystem parts. The two interact via cpuhp, where the
architecture code updates the domain list. Filesystem handlers that
walk the domains list should not run concurrently with the cpuhp
callback modifying the list.

Exposing a lock from the filesystem code means the interface is not
cleanly defined, and creates the possibility of cross-architecture
lock ordering headaches. The interaction only exists so that certain
filesystem paths are serialised against CPU hotplug. The CPU hotplug
code already has a mechanism to do this using cpus_read_lock().

MPAM's monitors have an overflow interrupt, so it needs to be possible
to walk the domains list in irq context. RCU is ideal for this,
but some paths need to be able to sleep to allocate memory.

Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part
of a cpuhp callback, cpus_read_lock() must always be taken first.
rdtgroup_schemata_write() already does this.
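
For illustration, the resulting lock ordering (visible in the limbo and
overflow handlers changed below) is:

	cpus_read_lock();		/* arch: protects the domain list */
	mutex_lock(&rdtgroup_mutex);	/* fs: protects rdtgroup state */
	...
	mutex_unlock(&rdtgroup_mutex);
	cpus_read_unlock();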

Most of the filesystem code's domain list walkers are currently
protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live().
The exceptions are rdt_bit_usage_show() and the mon_config helpers
which take the lock directly.

Make the domain list protected by RCU. An architecture-specific
lock prevents concurrent writers. rdt_bit_usage_show() could
walk the domain list using RCU, but to keep all the filesystem
operations the same, this is changed to call cpus_read_lock().
The mon_config helpers send multiple IPIs, so take the cpus_read_lock()
in these cases.

The other filesystem list walkers need to be able to sleep.
Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the
cpuhp callbacks can't be invoked when file system operations are
occurring.

Add lockdep_assert_cpus_held() in the cases where the
rdtgroup_kn_lock_live() call isn't obvious.

Resctrl's domain online/offline calls now need to take the
rdtgroup_mutex themselves.

Signed-off-by: James Morse <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
---
Changes since v2:
* Reworded a comment,
* Added a lockdep assertion
* Moved clear_closid_rmid() outside the locked region of cpu
online/offline

Changes since v3:
* Added a header include

Changes since v5:
* Made rdt_bit_usage_show() take the cpus_read_lock() instead of using
RCU.

Changes since v6:
* Added lockdep_is_cpus_held() to get_domain_from_cpu().
* Added cpus_read_lock() around overflow and limbo handlers.
---
arch/x86/kernel/cpu/resctrl/core.c | 44 +++++++++++----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 15 ++++-
arch/x86/kernel/cpu/resctrl/monitor.c | 8 +++
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 3 +
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 68 ++++++++++++++++++-----
include/linux/resctrl.h | 2 +-
6 files changed, 112 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b03a6c658ae5..9f1aa555a8ea 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -16,6 +16,7 @@

#define pr_fmt(fmt) "resctrl: " fmt

+#include <linux/cpu.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/cacheinfo.h>
@@ -25,8 +26,15 @@
#include <asm/resctrl.h>
#include "internal.h"

-/* Mutex to protect rdtgroup access. */
-DEFINE_MUTEX(rdtgroup_mutex);
+/*
+ * rdt_domain structures are kfree()d when their last CPU goes offline,
+ * and allocated when the first CPU in a new domain comes online.
+ * The rdt_resource's domain list is updated when this happens. Readers of
+ * the domain list must either take cpus_read_lock(), or rely on an RCU
+ * read-side critical section, to avoid observing concurrent modification.
+ * All writers take this mutex:
+ */
+static DEFINE_MUTEX(domain_list_lock);

/*
* The cached resctrl_pqr_state is strictly per CPU and can never be
@@ -354,6 +362,15 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

+ /*
+ * Walking r->domains, ensure it can't race with cpuhp.
+ * Because this is called via IPI by rdt_ctrl_update(), assertions
+ * about locks this thread holds will lead to false positives. Check
+ * someone is holding the CPUs lock.
+ */
+ if (IS_ENABLED(CONFIG_LOCKDEP))
+ WARN_ON_ONCE(!lockdep_is_cpus_held());
+
list_for_each_entry(d, &r->domains, list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))
@@ -510,6 +527,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
struct rdt_domain *d;
int err;

+ lockdep_assert_held(&domain_list_lock);
+
d = rdt_find_domain(r, id, &add_pos);
if (IS_ERR(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -543,11 +562,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

- list_add_tail(&d->list, add_pos);
+ list_add_tail_rcu(&d->list, add_pos);

err = resctrl_online_domain(r, d);
if (err) {
- list_del(&d->list);
+ list_del_rcu(&d->list);
+ synchronize_rcu();
domain_free(hw_dom);
}
}
@@ -558,6 +578,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;

+ lockdep_assert_held(&domain_list_lock);
+
d = rdt_find_domain(r, id, NULL);
if (IS_ERR_OR_NULL(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -568,7 +590,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
resctrl_offline_domain(r, d);
- list_del(&d->list);
+ list_del_rcu(&d->list);
+ synchronize_rcu();

/*
* rdt_domain "d" is going to be freed below, so clear
@@ -598,13 +621,13 @@ static int resctrl_arch_online_cpu(unsigned int cpu)
{
struct rdt_resource *r;

- mutex_lock(&rdtgroup_mutex);
+ mutex_lock(&domain_list_lock);
for_each_capable_rdt_resource(r)
domain_add_cpu(cpu, r);
- clear_closid_rmid(cpu);
+ mutex_unlock(&domain_list_lock);

+ clear_closid_rmid(cpu);
resctrl_online_cpu(cpu);
- mutex_unlock(&rdtgroup_mutex);

return 0;
}
@@ -613,13 +636,14 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
{
struct rdt_resource *r;

- mutex_lock(&rdtgroup_mutex);
resctrl_offline_cpu(cpu);

+ mutex_lock(&domain_list_lock);
for_each_capable_rdt_resource(r)
domain_remove_cpu(cpu, r);
+ mutex_unlock(&domain_list_lock);
+
clear_closid_rmid(cpu);
- mutex_unlock(&rdtgroup_mutex);

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 64db51455df3..dc59643498bf 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -212,6 +212,9 @@ static int parse_line(char *line, struct resctrl_schema *s,
struct rdt_domain *d;
unsigned long dom_id;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
(r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
@@ -316,6 +319,9 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
struct rdt_domain *d;
u32 idx;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

@@ -381,11 +387,9 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return -EINVAL;
buf[nbytes - 1] = '\0';

- cpus_read_lock();
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(of->kn);
- cpus_read_unlock();
return -ENOENT;
}
rdt_last_cmd_clear();
@@ -447,7 +451,6 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
out:
rdt_staged_configs_clear();
rdtgroup_kn_unlock(of->kn);
- cpus_read_unlock();
return ret ?: nbytes;
}

@@ -467,6 +470,9 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
bool sep = false;
u32 ctrl_val;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(dom, &r->domains, list) {
if (sep)
@@ -537,6 +543,9 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
{
int cpu;

+ /* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
/*
* Setup the parameters to pass to mon_event_count() to read the data.
*/
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 67edd4c440f0..c34a35ec0f03 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -15,6 +15,7 @@
* Software Developer Manual June 2016, volume 3, section 17.17.
*/

+#include <linux/cpu.h>
#include <linux/module.h>
#include <linux/sizes.h>
#include <linux/slab.h>
@@ -472,6 +473,9 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)

lockdep_assert_held(&rdtgroup_mutex);

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);

entry->busy = 0;
@@ -778,6 +782,7 @@ void cqm_handle_limbo(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
struct rdt_domain *d;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

d = container_of(work, struct rdt_domain, cqm_limbo.work);
@@ -792,6 +797,7 @@ void cqm_handle_limbo(struct work_struct *work)
}

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
}

/**
@@ -823,6 +829,7 @@ void mbm_handle_overflow(struct work_struct *work)
struct rdt_resource *r;
struct rdt_domain *d;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

/*
@@ -856,6 +863,7 @@ void mbm_handle_overflow(struct work_struct *work)

out_unlock:
mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
}

/**
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8056bed033cc..884b88e25141 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -844,6 +844,9 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
struct rdt_domain *d_i;
bool ret = false;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
return true;

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 777e9f680332..011e17efb1a6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -35,6 +35,10 @@
DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
DEFINE_STATIC_KEY_FALSE(rdt_mon_enable_key);
DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
+
+/* Mutex to protect rdtgroup access. */
+DEFINE_MUTEX(rdtgroup_mutex);
+
static struct kernfs_root *rdt_root;
struct rdtgroup rdtgroup_default;
LIST_HEAD(rdt_all_groups);
@@ -1014,6 +1018,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
bool sep = false;
u32 ctrl_val;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
list_for_each_entry(dom, &r->domains, list) {
@@ -1074,6 +1079,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
}
seq_putc(seq, '\n');
mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
return 0;
}

@@ -1329,6 +1335,9 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
struct rdt_domain *d;
u32 ctrl;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
@@ -1593,6 +1602,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
struct rdt_domain *dom;
bool sep = false;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

list_for_each_entry(dom, &r->domains, list) {
@@ -1609,6 +1619,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
seq_puts(s, "\n");

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();

return 0;
}
@@ -1690,6 +1701,9 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
unsigned long dom_id, val;
struct rdt_domain *d;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
next:
if (!tok || tok[0] == '\0')
return 0;
@@ -1736,6 +1750,7 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
if (nbytes == 0 || buf[nbytes - 1] != '\n')
return -EINVAL;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

rdt_last_cmd_clear();
@@ -1745,6 +1760,7 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();

return ret ?: nbytes;
}
@@ -1760,6 +1776,7 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
if (nbytes == 0 || buf[nbytes - 1] != '\n')
return -EINVAL;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

rdt_last_cmd_clear();
@@ -1769,6 +1786,7 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();

return ret ?: nbytes;
}
@@ -2245,6 +2263,9 @@ static int set_cache_qos_cfg(int level, bool enable)
struct rdt_domain *d;
int cpu;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (level == RDT_RESOURCE_L3)
update = l3_qos_cfg_update;
else if (level == RDT_RESOURCE_L2)
@@ -2444,6 +2465,7 @@ struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)

rdtgroup_kn_get(rdtgrp, kn);

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

/* Was this group deleted while we waited? */
@@ -2461,6 +2483,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
return;

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
rdtgroup_kn_put(rdtgrp, kn);
}

@@ -2793,6 +2817,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
struct rdt_domain *d;
int i;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

@@ -3077,6 +3104,9 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_domain *dom;
int ret;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
list_for_each_entry(dom, &r->domains, list) {
ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
if (ret)
@@ -3907,13 +3937,13 @@ static void domain_destroy_mon_state(struct rdt_domain *d)

void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- lockdep_assert_held(&rdtgroup_mutex);
+ mutex_lock(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
mba_sc_domain_destroy(r, d);

if (!r->mon_capable)
- return;
+ goto out_unlock;

/*
* If resctrl is mounted, remove all the
@@ -3938,6 +3968,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
}

domain_destroy_mon_state(d);
+
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
}

static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
@@ -3973,20 +4006,22 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)

int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- int err;
+ int err = 0;

- lockdep_assert_held(&rdtgroup_mutex);
+ mutex_lock(&rdtgroup_mutex);

- if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
+ if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA) {
/* RDT_RESOURCE_MBA is never mon_capable */
- return mba_sc_domain_allocate(r, d);
+ err = mba_sc_domain_allocate(r, d);
+ goto out_unlock;
+ }

if (!r->mon_capable)
- return 0;
+ goto out_unlock;

err = domain_setup_mon_state(r, d);
if (err)
- return err;
+ goto out_unlock;

if (is_mbm_enabled()) {
INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
@@ -4006,15 +4041,18 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
if (resctrl_mounted && resctrl_arch_mon_capable())
mkdir_mondata_subdir_allrdtgrp(r, d);

- return 0;
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
+
+ return err;
}

void resctrl_online_cpu(unsigned int cpu)
{
- lockdep_assert_held(&rdtgroup_mutex);
-
+ mutex_lock(&rdtgroup_mutex);
/* The CPU is set in default rdtgroup after online. */
cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+ mutex_unlock(&rdtgroup_mutex);
}

static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
@@ -4033,8 +4071,7 @@ void resctrl_offline_cpu(unsigned int cpu)
struct rdtgroup *rdtgrp;
struct rdt_domain *d;

- lockdep_assert_held(&rdtgroup_mutex);
-
+ mutex_lock(&rdtgroup_mutex);
list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
clear_childcpus(rdtgrp, cpu);
@@ -4043,7 +4080,7 @@ void resctrl_offline_cpu(unsigned int cpu)
}

if (!l3->mon_capable)
- return;
+ goto out_unlock;

d = get_domain_from_cpu(cpu, l3);
if (d) {
@@ -4057,6 +4094,9 @@ void resctrl_offline_cpu(unsigned int cpu)
cqm_setup_limbo_handler(d, 0, cpu);
}
}
+
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
}

/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 270ff1d5c051..a365f67131ec 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -159,7 +159,7 @@ struct resctrl_schema;
* @cache_level: Which cache level defines scope of this resource
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
- * @domains: All domains for this resource
+ * @domains: RCU list of all domains for this resource
* @name: Name to use in "schemata" file.
* @data_width: Character width of data when displaying
* @default_ctrl: Specifies default cache cbm or memory B/W percent.
--
2.39.2


2024-02-13 23:17:02

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

Hi James,

On 2/13/2024 10:44 AM, James Morse wrote:
> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>
> While the exit text ends up in the linker script's DISCARD section,
> the direction of travel is for resctrl to be/have loadable modules.
>
> Add resctrl_put_mon_l3_config() to cleanup any memory allocated
> by rdt_get_mon_l3_config().
>
> There is no reason to backport this to a stable kernel.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Babu Moger <[email protected]>
> Tested-by: Carl Worth <[email protected]> # arm64
> Reviewed-by: Babu Moger <[email protected]>

Thank you.

Reviewed-by: Reinette Chatre <[email protected]>

Reinette

2024-02-13 23:27:14

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

(+Tony)

Hi Boris,

Could you please consider this series for inclusion?

Thank you very much.

Reinette

On 2/13/2024 10:44 AM, James Morse wrote:
> Hello!
>
> [...]

2024-02-14 15:14:46

by Babu Moger

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Sanity-tested the series again on an AMD system. Everything looks good.

Tested-by: Babu Moger <[email protected]>

On 2/13/24 12:44, James Morse wrote:
> Hello!
>
> [...]

--
Thanks
Babu Moger

2024-02-17 00:34:28

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
> Hello!
>
> It's been back and forth for whether this series should be rebased onto Tony's
> SNC series. This version isn't, its based on tip/x86/cache.
> (I have the rebased-and-tested versions if anyone needs them)

In case James' patches go first, I took a crack at basing my SNC series
on top of his patches (specifically the mpam/monitors_and_locking/v9
branch of git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git).

Result is here:

git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
Branch: james_then_snc

The end result ought to be pretty similar to the
"rebased-and-tested" versions that James mentions above.

-Tony

2024-02-17 10:56:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
> Hello!
>
> It's been back and forth for whether this series should be rebased onto Tony's
> SNC series. This version isn't, its based on tip/x86/cache.
> (I have the rebased-and-tested versions if anyone needs them)

The set applied on top of tip:x86/cache gives:

vmlinux.o: in function `get_domain_from_cpu':
(.text+0x150f33): undefined reference to `lockdep_is_cpus_held'
ld: vmlinux.o: in function `rdt_ctrl_update':
(.text+0x150fbc): undefined reference to `lockdep_is_cpus_held'

Config

00-14-04-randconfig-x86_64-26892.cfg

attached.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



2024-02-19 16:50:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote:

> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
>> [...]
>
> The set applied on top of tip:x86/cache gives:
>
> vmlinux.o: in function `get_domain_from_cpu':
> (.text+0x150f33): undefined reference to `lockdep_is_cpus_held'
> ld: vmlinux.o: in function `rdt_ctrl_update':
> (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held'

Wants to be folded into patch 24.

Thanks,

tglx
---
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(i
* about locks this thread holds will lead to false positives. Check
* someone is holding the CPUs lock.
*/
- if (IS_ENABLED(CONFIG_LOCKDEP))
- lockdep_is_cpus_held();
+ if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP))
+ WARN_ON_ONCE(!lockdep_is_cpus_held());

list_for_each_entry(d, &r->domains, list) {
/* Find the domain that contains this CPU */

2024-02-19 16:53:58

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hi Thomas,

On 19/02/2024 16:49, Thomas Gleixner wrote:
> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote:
>> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
>>> [...]
>>
>> The set applied on top of tip:x86/cache gives:
>>
>> vmlinux.o: in function `get_domain_from_cpu':
>> (.text+0x150f33): undefined reference to `lockdep_is_cpus_held'
>> ld: vmlinux.o: in function `rdt_ctrl_update':
>> (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held'
>
> Wants to be folded into patch 24.

Thanks - I'm just putting a v10 together to fix this.


Thanks,

James

2024-02-19 16:54:07

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hi Babu,

On 14/02/2024 15:01, Moger, Babu wrote:
> Sanity tested the series again on AMD system. Everything looks good.
>
> Tested-by: Babu Moger <[email protected]>


Thanks!

James

2024-02-19 16:54:15

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

Hi Reinette,

On 13/02/2024 23:14, Reinette Chatre wrote:
> On 2/13/2024 10:44 AM, James Morse wrote:
>> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>>
>> While the exit text ends up in the linker script's DISCARD section,
>> the direction of travel is for resctrl to be/have loadable modules.
>>
>> Add resctrl_put_mon_l3_config() to cleanup any memory allocated
>> by rdt_get_mon_l3_config().
>>
>> There is no reason to backport this to a stable kernel.

> Reviewed-by: Reinette Chatre <[email protected]>


Thanks!

James

2024-02-19 16:54:29

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hi Boris,

Thanks for the config - as Thomas pointed out, this is coming from an awkward
lockdep annotation; it turns out it also depends on HOTPLUG_CPU.

I'll post a v10 with the collected tags and this fixed.


Thanks,

James

On 17/02/2024 10:55, Borislav Petkov wrote:
> The set applied on top of tip:x86/cache gives:
>
> vmlinux.o: in function `get_domain_from_cpu':
> (.text+0x150f33): undefined reference to `lockdep_is_cpus_held'
> ld: vmlinux.o: in function `rdt_ctrl_update':
> (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held'
>
> [...]


2024-02-19 17:56:12

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

On 19/02/2024 17:51, Borislav Petkov wrote:
> On Mon, Feb 19, 2024 at 04:53:38PM +0000, James Morse wrote:
>> Thanks - I'm just putting a v10 together to fix this.
>
> No need - I'll fold it in.

Thanks!

James

Subject: [tip: x86/cache] x86/resctrl: Separate arch and fs resctrl locks

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: fb700810d30b9eb333a7bf447012e1158e35c62f
Gitweb: https://git.kernel.org/tip/fb700810d30b9eb333a7bf447012e1158e35c62f
Author: James Morse <[email protected]>
AuthorDate: Tue, 13 Feb 2024 18:44:38
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 19 Feb 2024 19:28:07 +01:00

x86/resctrl: Separate arch and fs resctrl locks

resctrl has one mutex that is taken by the architecture-specific code, and the
filesystem parts. The two interact via cpuhp, where the architecture code
updates the domain list. Filesystem handlers that walk the domains list should
not run concurrently with the cpuhp callback modifying the list.

Exposing a lock from the filesystem code means the interface is not cleanly
defined, and creates the possibility of cross-architecture lock ordering
headaches. The interaction only exists so that certain filesystem paths are
serialised against CPU hotplug. The CPU hotplug code already has a mechanism to
do this using cpus_read_lock().

MPAM's monitors have an overflow interrupt, so it needs to be possible to walk
the domains list in irq context. RCU is ideal for this, but some paths need to
be able to sleep to allocate memory.

Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp
callback, cpus_read_lock() must always be taken first.
rdtgroup_schemata_write() already does this.

Most of the filesystem code's domain list walkers are currently protected by
the rdtgroup_mutex taken in rdtgroup_kn_lock_live(). The exceptions are
rdt_bit_usage_show() and the mon_config helpers which take the lock directly.

Make the domain list protected by RCU. An architecture-specific lock prevents
concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU,
but to keep all the filesystem operations the same, this is changed to call
cpus_read_lock(). The mon_config helpers send multiple IPIs, take the
cpus_read_lock() in these cases.

The other filesystem list walkers need to be able to sleep. Add
cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't
be invoked when file system operations are occurring.

Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live()
call isn't obvious.

Resctrl's domain online/offline calls now need to take the rdtgroup_mutex
themselves.
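
As an aside for illustration: with the domain list published via
list_add_tail_rcu(), a lockless reader - such as the MPAM overflow interrupt
handler mentioned above - can walk it without taking any lock. A minimal
sketch, assuming a hypothetical helper name (this is not code from the
series), with all use of the domain kept inside the read-side critical
section:

	/*
	 * Hypothetical lockless reader of r->domains - a sketch only.
	 * The domain may be freed once the RCU read-side critical
	 * section ends, so nothing about @d may escape it.
	 */
	static bool cpu_has_domain(struct rdt_resource *r, int cpu)
	{
		struct rdt_domain *d;
		bool found = false;

		rcu_read_lock();
		list_for_each_entry_rcu(d, &r->domains, list) {
			if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
				found = true;
				break;
			}
		}
		rcu_read_unlock();

		return found;
	}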

[ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ]

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/kernel/cpu/resctrl/core.c | 44 ++++++++++----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 15 ++++-
arch/x86/kernel/cpu/resctrl/monitor.c | 8 +++-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 3 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 68 +++++++++++++++++-----
include/linux/resctrl.h | 2 +-
6 files changed, 112 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b03a6c6..8a4ef4f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -16,6 +16,7 @@

#define pr_fmt(fmt) "resctrl: " fmt

+#include <linux/cpu.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/cacheinfo.h>
@@ -25,8 +26,15 @@
#include <asm/resctrl.h>
#include "internal.h"

-/* Mutex to protect rdtgroup access. */
-DEFINE_MUTEX(rdtgroup_mutex);
+/*
+ * rdt_domain structures are kfree()d when their last CPU goes offline,
+ * and allocated when the first CPU in a new domain comes online.
+ * The rdt_resource's domain list is updated when this happens. Readers of
+ * the domain list must either take cpus_read_lock(), or rely on an RCU
+ * read-side critical section, to avoid observing concurrent modification.
+ * All writers take this mutex:
+ */
+static DEFINE_MUTEX(domain_list_lock);

/*
* The cached resctrl_pqr_state is strictly per CPU and can never be
@@ -354,6 +362,15 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

+ /*
+ * Walking r->domains, ensure it can't race with cpuhp.
+ * Because this is called via IPI by rdt_ctrl_update(), assertions
+ * about locks this thread holds will lead to false positives. Check
+ * someone is holding the CPUs lock.
+ */
+ if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP))
+ WARN_ON_ONCE(!lockdep_is_cpus_held());
+
list_for_each_entry(d, &r->domains, list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))
@@ -510,6 +527,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
struct rdt_domain *d;
int err;

+ lockdep_assert_held(&domain_list_lock);
+
d = rdt_find_domain(r, id, &add_pos);
if (IS_ERR(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -543,11 +562,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}

- list_add_tail(&d->list, add_pos);
+ list_add_tail_rcu(&d->list, add_pos);

err = resctrl_online_domain(r, d);
if (err) {
- list_del(&d->list);
+ list_del_rcu(&d->list);
+ synchronize_rcu();
domain_free(hw_dom);
}
}
@@ -558,6 +578,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;

+ lockdep_assert_held(&domain_list_lock);
+
d = rdt_find_domain(r, id, NULL);
if (IS_ERR_OR_NULL(d)) {
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -568,7 +590,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
cpumask_clear_cpu(cpu, &d->cpu_mask);
if (cpumask_empty(&d->cpu_mask)) {
resctrl_offline_domain(r, d);
- list_del(&d->list);
+ list_del_rcu(&d->list);
+ synchronize_rcu();

/*
* rdt_domain "d" is going to be freed below, so clear
@@ -598,13 +621,13 @@ static int resctrl_arch_online_cpu(unsigned int cpu)
{
struct rdt_resource *r;

- mutex_lock(&rdtgroup_mutex);
+ mutex_lock(&domain_list_lock);
for_each_capable_rdt_resource(r)
domain_add_cpu(cpu, r);
- clear_closid_rmid(cpu);
+ mutex_unlock(&domain_list_lock);

+ clear_closid_rmid(cpu);
resctrl_online_cpu(cpu);
- mutex_unlock(&rdtgroup_mutex);

return 0;
}
@@ -613,13 +636,14 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
{
struct rdt_resource *r;

- mutex_lock(&rdtgroup_mutex);
resctrl_offline_cpu(cpu);

+ mutex_lock(&domain_list_lock);
for_each_capable_rdt_resource(r)
domain_remove_cpu(cpu, r);
+ mutex_unlock(&domain_list_lock);
+
clear_closid_rmid(cpu);
- mutex_unlock(&rdtgroup_mutex);

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 20b02d6..7997b47 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -212,6 +212,9 @@ static int parse_line(char *line, struct resctrl_schema *s,
struct rdt_domain *d;
unsigned long dom_id;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
(r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
@@ -316,6 +319,9 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
struct rdt_domain *d;
u32 idx;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

@@ -381,11 +387,9 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return -EINVAL;
buf[nbytes - 1] = '\0';

- cpus_read_lock();
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(of->kn);
- cpus_read_unlock();
return -ENOENT;
}
rdt_last_cmd_clear();
@@ -447,7 +451,6 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
out:
rdt_staged_configs_clear();
rdtgroup_kn_unlock(of->kn);
- cpus_read_unlock();
return ret ?: nbytes;
}

@@ -467,6 +470,9 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
bool sep = false;
u32 ctrl_val;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(dom, &r->domains, list) {
if (sep)
@@ -537,6 +543,9 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
{
int cpu;

+ /* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
/*
* Setup the parameters to pass to mon_event_count() to read the data.
*/
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 67edd4c..c34a35e 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -15,6 +15,7 @@
* Software Developer Manual June 2016, volume 3, section 17.17.
*/

+#include <linux/cpu.h>
#include <linux/module.h>
#include <linux/sizes.h>
#include <linux/slab.h>
@@ -472,6 +473,9 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)

lockdep_assert_held(&rdtgroup_mutex);

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);

entry->busy = 0;
@@ -778,6 +782,7 @@ void cqm_handle_limbo(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
struct rdt_domain *d;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

d = container_of(work, struct rdt_domain, cqm_limbo.work);
@@ -792,6 +797,7 @@ void cqm_handle_limbo(struct work_struct *work)
}

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
}

/**
@@ -823,6 +829,7 @@ void mbm_handle_overflow(struct work_struct *work)
struct rdt_resource *r;
struct rdt_domain *d;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

/*
@@ -856,6 +863,7 @@ void mbm_handle_overflow(struct work_struct *work)

out_unlock:
mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
}

/**
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8056bed..884b88e 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -844,6 +844,9 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
struct rdt_domain *d_i;
bool ret = false;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
return true;

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 777e9f6..011e17e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -35,6 +35,10 @@
DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
DEFINE_STATIC_KEY_FALSE(rdt_mon_enable_key);
DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
+
+/* Mutex to protect rdtgroup access. */
+DEFINE_MUTEX(rdtgroup_mutex);
+
static struct kernfs_root *rdt_root;
struct rdtgroup rdtgroup_default;
LIST_HEAD(rdt_all_groups);
@@ -1014,6 +1018,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
bool sep = false;
u32 ctrl_val;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
list_for_each_entry(dom, &r->domains, list) {
@@ -1074,6 +1079,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
}
seq_putc(seq, '\n');
mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
return 0;
}

@@ -1329,6 +1335,9 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
struct rdt_domain *d;
u32 ctrl;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
@@ -1593,6 +1602,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
struct rdt_domain *dom;
bool sep = false;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

list_for_each_entry(dom, &r->domains, list) {
@@ -1609,6 +1619,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
seq_puts(s, "\n");

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();

return 0;
}
@@ -1690,6 +1701,9 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
unsigned long dom_id, val;
struct rdt_domain *d;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
next:
if (!tok || tok[0] == '\0')
return 0;
@@ -1736,6 +1750,7 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
if (nbytes == 0 || buf[nbytes - 1] != '\n')
return -EINVAL;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

rdt_last_cmd_clear();
@@ -1745,6 +1760,7 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();

return ret ?: nbytes;
}
@@ -1760,6 +1776,7 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
if (nbytes == 0 || buf[nbytes - 1] != '\n')
return -EINVAL;

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

rdt_last_cmd_clear();
@@ -1769,6 +1786,7 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();

return ret ?: nbytes;
}
@@ -2245,6 +2263,9 @@ static int set_cache_qos_cfg(int level, bool enable)
struct rdt_domain *d;
int cpu;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (level == RDT_RESOURCE_L3)
update = l3_qos_cfg_update;
else if (level == RDT_RESOURCE_L2)
@@ -2444,6 +2465,7 @@ struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)

rdtgroup_kn_get(rdtgrp, kn);

+ cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

/* Was this group deleted while we waited? */
@@ -2461,6 +2483,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
return;

mutex_unlock(&rdtgroup_mutex);
+ cpus_read_unlock();
+
rdtgroup_kn_put(rdtgrp, kn);
}

@@ -2793,6 +2817,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
struct rdt_domain *d;
int i;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;

@@ -3077,6 +3104,9 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_domain *dom;
int ret;

+ /* Walking r->domains, ensure it can't race with cpuhp */
+ lockdep_assert_cpus_held();
+
list_for_each_entry(dom, &r->domains, list) {
ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
if (ret)
@@ -3907,13 +3937,13 @@ static void domain_destroy_mon_state(struct rdt_domain *d)

void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- lockdep_assert_held(&rdtgroup_mutex);
+ mutex_lock(&rdtgroup_mutex);

if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
mba_sc_domain_destroy(r, d);

if (!r->mon_capable)
- return;
+ goto out_unlock;

/*
* If resctrl is mounted, remove all the
@@ -3938,6 +3968,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
}

domain_destroy_mon_state(d);
+
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
}

static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
@@ -3973,20 +4006,22 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)

int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
{
- int err;
+ int err = 0;

- lockdep_assert_held(&rdtgroup_mutex);
+ mutex_lock(&rdtgroup_mutex);

- if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
+ if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA) {
/* RDT_RESOURCE_MBA is never mon_capable */
- return mba_sc_domain_allocate(r, d);
+ err = mba_sc_domain_allocate(r, d);
+ goto out_unlock;
+ }

if (!r->mon_capable)
- return 0;
+ goto out_unlock;

err = domain_setup_mon_state(r, d);
if (err)
- return err;
+ goto out_unlock;

if (is_mbm_enabled()) {
INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
@@ -4006,15 +4041,18 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
if (resctrl_mounted && resctrl_arch_mon_capable())
mkdir_mondata_subdir_allrdtgrp(r, d);

- return 0;
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
+
+ return err;
}

void resctrl_online_cpu(unsigned int cpu)
{
- lockdep_assert_held(&rdtgroup_mutex);
-
+ mutex_lock(&rdtgroup_mutex);
/* The CPU is set in default rdtgroup after online. */
cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+ mutex_unlock(&rdtgroup_mutex);
}

static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
@@ -4033,8 +4071,7 @@ void resctrl_offline_cpu(unsigned int cpu)
struct rdtgroup *rdtgrp;
struct rdt_domain *d;

- lockdep_assert_held(&rdtgroup_mutex);
-
+ mutex_lock(&rdtgroup_mutex);
list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
clear_childcpus(rdtgrp, cpu);
@@ -4043,7 +4080,7 @@ void resctrl_offline_cpu(unsigned int cpu)
}

if (!l3->mon_capable)
- return;
+ goto out_unlock;

d = get_domain_from_cpu(cpu, l3);
if (d) {
@@ -4057,6 +4094,9 @@ void resctrl_offline_cpu(unsigned int cpu)
cqm_setup_limbo_handler(d, 0, cpu);
}
}
+
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
}

/*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 270ff1d..a365f67 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -159,7 +159,7 @@ struct resctrl_schema;
* @cache_level: Which cache level defines scope of this resource
* @cache: Cache allocation related data
* @membw: If the component has bandwidth controls, their properties.
- * @domains: All domains for this resource
+ * @domains: RCU list of all domains for this resource
* @name: Name to use in "schemata" file.
* @data_width: Character width of data when displaying
* @default_ctrl: Specifies default cache cbm or memory B/W percent.

Subject: [tip: x86/cache] x86/resctrl: Add CPU online callback for resctrl work

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 1b3e50ce7f5001f1e0edaf7d6abea43b264db7ee
Gitweb: https://git.kernel.org/tip/1b3e50ce7f5001f1e0edaf7d6abea43b264db7ee
Author: James Morse <[email protected]>
AuthorDate: Tue, 13 Feb 2024 18:44:34
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Fri, 16 Feb 2024 19:18:33 +01:00

x86/resctrl: Add CPU online callback for resctrl work

The resctrl architecture-specific code may need to create a domain when a CPU
comes online; it also needs to reset the CPU's PQR_ASSOC register. The resctrl
filesystem code needs to update the rdtgroup_default CPU mask when CPUs are
brought online.

Currently, this is all done in one function, resctrl_online_cpu(). It will
need to be split into architecture and filesystem parts before resctrl can be
moved to /fs/.

Pull the rdtgroup_default update work out as a filesystem-specific cpu_online
helper. resctrl_online_cpu() is the obvious name for this, which means the
version in core.c needs renaming.

resctrl_online_cpu() is called by the arch code once it has done the work to
add the new CPU to any domains.

In future patches, resctrl_online_cpu() will take the rdtgroup_mutex itself.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/kernel/cpu/resctrl/core.c | 8 ++++----
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++++
include/linux/resctrl.h | 1 +
3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index d1dc80a..4627d44 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -606,16 +606,16 @@ static void clear_closid_rmid(int cpu)
RESCTRL_RESERVED_CLOSID);
}

-static int resctrl_online_cpu(unsigned int cpu)
+static int resctrl_arch_online_cpu(unsigned int cpu)
{
struct rdt_resource *r;

mutex_lock(&rdtgroup_mutex);
for_each_capable_rdt_resource(r)
domain_add_cpu(cpu, r);
- /* The cpu is set in default rdtgroup after online. */
- cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
clear_closid_rmid(cpu);
+
+ resctrl_online_cpu(cpu);
mutex_unlock(&rdtgroup_mutex);

return 0;
@@ -967,7 +967,7 @@ static int __init resctrl_late_init(void)

state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"x86/resctrl/cat:online:",
- resctrl_online_cpu, resctrl_offline_cpu);
+ resctrl_arch_online_cpu, resctrl_offline_cpu);
if (state < 0)
return state;

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ed5fc67..38d3b19 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4007,6 +4007,14 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
return 0;
}

+void resctrl_online_cpu(unsigned int cpu)
+{
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ /* The CPU is set in default rdtgroup after online. */
+ cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+}
+
/*
* rdtgroup_init - rdtgroup initialization
*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index bf460c9..4c4bad3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -223,6 +223,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_online_cpu(unsigned int cpu);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid

Subject: [tip: x86/cache] x86/resctrl: Move alloc/mon static keys into helpers

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 5db6a4a75c95f6967d57906ba7b82756d1985d63
Gitweb: https://git.kernel.org/tip/5db6a4a75c95f6967d57906ba7b82756d1985d63
Author: James Morse <[email protected]>
AuthorDate: Tue, 13 Feb 2024 18:44:31
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Fri, 16 Feb 2024 19:18:32 +01:00

x86/resctrl: Move alloc/mon static keys into helpers

resctrl enables three static keys depending on the features it has enabled.
Another architecture's context-switch code may look different, so any static
keys that control it should be buried behind helpers.

Move the alloc/mon logic into arch-specific helpers as a preparatory step for
making the rdt_enable_key's status something the arch code decides.

This means other architectures don't have to mirror the static keys.
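
For example, an architecture whose context-switch path is not driven by
static keys could plausibly implement these helpers as no-ops. A sketch under
that assumption (not code from this series):

	static inline void resctrl_arch_enable_alloc(void) { }
	static inline void resctrl_arch_disable_alloc(void) { }
	static inline void resctrl_arch_enable_mon(void) { }
	static inline void resctrl_arch_disable_mon(void) { }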

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/include/asm/resctrl.h | 20 ++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 5 -----
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 29c4cc3..3c9137b 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -42,6 +42,26 @@ DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);

+static inline void resctrl_arch_enable_alloc(void)
+{
+ static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_disable_alloc(void)
+{
+ static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_enable_mon(void)
+{
+ static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+}
+
+static inline void resctrl_arch_disable_mon(void)
+{
+ static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+}
+
/*
* __resctrl_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR
*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9bfda69..7858085 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -94,9 +94,6 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
return container_of(kfc, struct rdt_fs_context, kfc);
}

-DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
-DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
-
/**
* struct mon_evt - Entry in the event list of a resource
* @evtid: event id
@@ -452,8 +449,6 @@ extern struct mutex rdtgroup_mutex;

extern struct rdt_hw_resource rdt_resources_all[];
extern struct rdtgroup rdtgroup_default;
-DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
-
extern struct dentry *debugfs_resctrl;

enum resctrl_res_level {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 857fbbc..231207f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2668,9 +2668,9 @@ static int rdt_get_tree(struct fs_context *fc)
goto out_psl;

if (rdt_alloc_capable)
- static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+ resctrl_arch_enable_alloc();
if (rdt_mon_capable)
- static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+ resctrl_arch_enable_mon();

if (rdt_alloc_capable || rdt_mon_capable) {
static_branch_enable_cpuslocked(&rdt_enable_key);
@@ -2946,8 +2946,8 @@ static void rdt_kill_sb(struct super_block *sb)
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
schemata_list_destroy();
rdtgroup_destroy_root();
- static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
- static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+ resctrl_arch_disable_alloc();
+ resctrl_arch_disable_mon();
static_branch_disable_cpuslocked(&rdt_enable_key);
resctrl_mounted = false;
kernfs_kill_sb(sb);

Subject: [tip: x86/cache] x86/resctrl: Add helpers for system wide mon/alloc capable

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 30017b60706c2ba72a0a4da7d5ef8f5fa95a2f01
Gitweb: https://git.kernel.org/tip/30017b60706c2ba72a0a4da7d5ef8f5fa95a2f01
Author: James Morse <[email protected]>
AuthorDate: Tue, 13 Feb 2024 18:44:33
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Fri, 16 Feb 2024 19:18:33 +01:00

x86/resctrl: Add helpers for system wide mon/alloc capable

resctrl reads rdt_alloc_capable or rdt_mon_capable to determine whether any of
the resources support the corresponding features. resctrl also uses the
static keys that affect the architecture's context-switch code to determine the
same thing.

This forces another architecture to have the same static keys.

As the static key is enabled based on the capable flag, and none of the
filesystem uses of these are in the scheduler path, move the capable flags
behind helpers, and use these in the filesystem code instead of the static key.

After this change, only the architecture code manages and uses the static keys
to ensure __resctrl_sched_in() does not need runtime checks.

This avoids multiple architectures having to define the same static keys.

Cases where the static key implicitly tested if the resctrl filesystem was
mounted all have an explicit check now.
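
Concretely, checks like the one in mbm_handle_overflow() change shape as
follows (excerpt from the diff below):

	/* Before: relies on the static key */
	if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
		goto out_unlock;

	/* After: explicit capable-flag helper */
	if (!resctrl_mounted || !resctrl_arch_mon_capable())
		goto out_unlock;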

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/include/asm/resctrl.h | 13 ++++++++-
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 4 +-
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 38 +++++++++++-----------
5 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index b74aa34..12dbd25 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -38,10 +38,18 @@ struct resctrl_pqr_state {

DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);

+extern bool rdt_alloc_capable;
+extern bool rdt_mon_capable;
+
DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);

+static inline bool resctrl_arch_alloc_capable(void)
+{
+ return rdt_alloc_capable;
+}
+
static inline void resctrl_arch_enable_alloc(void)
{
static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
@@ -54,6 +62,11 @@ static inline void resctrl_arch_disable_alloc(void)
static_branch_dec_cpuslocked(&rdt_enable_key);
}

+static inline bool resctrl_arch_mon_capable(void)
+{
+ return rdt_mon_capable;
+}
+
static inline void resctrl_arch_enable_mon(void)
{
static_branch_enable_cpuslocked(&rdt_mon_enable_key);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 7858085..3ee855c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -137,8 +137,6 @@ struct rmid_read {
void *arch_mon_ctx;
};

-extern bool rdt_alloc_capable;
-extern bool rdt_mon_capable;
extern unsigned int rdt_mon_features;
extern struct list_head resctrl_schema_all;
extern bool resctrl_mounted;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index d5d8a58..92d7ba6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -817,7 +817,7 @@ void mbm_handle_overflow(struct work_struct *work)
* If the filesystem has been unmounted this work no longer needs to
* run.
*/
- if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
+ if (!resctrl_mounted || !resctrl_arch_mon_capable())
goto out_unlock;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -854,7 +854,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
* When a domain comes online there is no guarantee the filesystem is
* mounted. If not, there is no need to catch counter overflow.
*/
- if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
+ if (!resctrl_mounted || !resctrl_arch_mon_capable())
return;
cpu = cpumask_any_housekeeping(&dom->cpu_mask);
dom->mbm_work_cpu = cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index d8f4411..8056bed 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -581,7 +581,7 @@ static int rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
if (ret)
goto err_cpus;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = rdtgroup_kn_mode_restrict(rdtgrp, "mon_groups");
if (ret)
goto err_cpus_list;
@@ -628,7 +628,7 @@ static int rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
if (ret)
goto err_cpus;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = rdtgroup_kn_mode_restore(rdtgrp, "mon_groups", 0777);
if (ret)
goto err_cpus_list;
@@ -776,7 +776,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
{
int ret;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = alloc_rmid(rdtgrp->closid);
if (ret < 0) {
rdt_last_cmd_puts("Out of RMIDs\n");
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7e57ac9..ed5fc67 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -641,13 +641,13 @@ static int __rdtgroup_move_task(struct task_struct *tsk,

static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
{
- return (rdt_alloc_capable && (r->type == RDTCTRL_GROUP) &&
+ return (resctrl_arch_alloc_capable() && (r->type == RDTCTRL_GROUP) &&
resctrl_arch_match_closid(t, r->closid));
}

static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r)
{
- return (rdt_mon_capable && (r->type == RDTMON_GROUP) &&
+ return (resctrl_arch_mon_capable() && (r->type == RDTMON_GROUP) &&
resctrl_arch_match_rmid(t, r->mon.parent->closid,
r->mon.rmid));
}
@@ -2632,7 +2632,7 @@ static int rdt_get_tree(struct fs_context *fc)

closid_init();

- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
flags |= RFTYPE_MON;

ret = rdtgroup_add_files(rdtgroup_default.kn, flags);
@@ -2645,7 +2645,7 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_schemata_free;

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
ret = mongroup_create_dir(rdtgroup_default.kn,
&rdtgroup_default, "mon_groups",
&kn_mongrp);
@@ -2667,12 +2667,12 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_psl;

- if (rdt_alloc_capable)
+ if (resctrl_arch_alloc_capable())
resctrl_arch_enable_alloc();
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
resctrl_arch_enable_mon();

- if (rdt_alloc_capable || rdt_mon_capable)
+ if (resctrl_arch_alloc_capable() || resctrl_arch_mon_capable())
resctrl_mounted = true;

if (is_mbm_enabled()) {
@@ -2686,10 +2686,10 @@ static int rdt_get_tree(struct fs_context *fc)
out_psl:
rdt_pseudo_lock_release();
out_mondata:
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
kernfs_remove(kn_mondata);
out_mongrp:
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
kernfs_remove(kn_mongrp);
out_info:
kernfs_remove(kn_info);
@@ -2944,9 +2944,9 @@ static void rdt_kill_sb(struct super_block *sb)
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
schemata_list_destroy();
rdtgroup_destroy_root();
- if (rdt_alloc_capable)
+ if (resctrl_arch_alloc_capable())
resctrl_arch_disable_alloc();
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
resctrl_arch_disable_mon();
resctrl_mounted = false;
kernfs_kill_sb(sb);
@@ -3326,7 +3326,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
{
int ret;

- if (!rdt_mon_capable)
+ if (!resctrl_arch_mon_capable())
return 0;

ret = alloc_rmid(rdtgrp->closid);
@@ -3348,7 +3348,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)

static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
{
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
free_rmid(rgrp->closid, rgrp->mon.rmid);
}

@@ -3412,7 +3412,7 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,

if (rtype == RDTCTRL_GROUP) {
files = RFTYPE_BASE | RFTYPE_CTRL;
- if (rdt_mon_capable)
+ if (resctrl_arch_mon_capable())
files |= RFTYPE_MON;
} else {
files = RFTYPE_BASE | RFTYPE_MON;
@@ -3521,7 +3521,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,

list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);

- if (rdt_mon_capable) {
+ if (resctrl_arch_mon_capable()) {
/*
* Create an empty mon_groups directory to hold the subset
* of tasks and cpus to monitor.
@@ -3576,14 +3576,14 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
* allocation is supported, add a control and monitoring
* subdirectory
*/
- if (rdt_alloc_capable && parent_kn == rdtgroup_default.kn)
+ if (resctrl_arch_alloc_capable() && parent_kn == rdtgroup_default.kn)
return rdtgroup_mkdir_ctrl_mon(parent_kn, name, mode);

/*
* If RDT monitoring is supported and the parent directory is a valid
* "mon_groups" directory, add a monitoring subdirectory.
*/
- if (rdt_mon_capable && is_mon_groups(parent_kn, name))
+ if (resctrl_arch_mon_capable() && is_mon_groups(parent_kn, name))
return rdtgroup_mkdir_mon(parent_kn, name, mode);

return -EPERM;
@@ -3918,7 +3918,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
* If resctrl is mounted, remove all the
* per domain monitor data directories.
*/
- if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
+ if (resctrl_mounted && resctrl_arch_mon_capable())
rmdir_mondata_subdir_allrdtgrp(r, d->id);

if (is_mbm_enabled())
@@ -4001,7 +4001,7 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
* by rdt_get_tree() calling mkdir_mondata_all().
* If resctrl is mounted, add per domain monitor data directories.
*/
- if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
+ if (resctrl_mounted && resctrl_arch_mon_capable())
mkdir_mondata_subdir_allrdtgrp(r, d);

return 0;

Subject: [tip: x86/cache] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 6eac36bb9eb0349c983313c71692c19d50b56878
Gitweb: https://git.kernel.org/tip/6eac36bb9eb0349c983313c71692c19d50b56878
Author: James Morse <[email protected]>
AuthorDate: Tue, 13 Feb 2024 18:44:24
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Fri, 16 Feb 2024 19:18:32 +01:00

x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be used
for different control groups.

This means once a CLOSID is allocated, all its monitoring ids may still be
dirty, and held in limbo.

Instead of allocating the first free CLOSID, on architectures where
CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search
closid_num_dirty_rmid[] to find the cleanest CLOSID.

The CLOSID found is returned to closid_alloc() for the free list
to be updated.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 45 +++++++++++++++++++++++++-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++---
3 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 872ba1a..b7b9d92 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -566,5 +566,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init thread_throttle_mode_init(void);
void __init mbm_config_rftype_init(const char *config);
void rdt_staged_configs_clear(void);
+bool closid_allocated(unsigned int closid);
+int resctrl_find_cleanest_closid(void);

#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 13b0c8d..101f1b1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -386,6 +386,51 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
return ERR_PTR(-ENOSPC);
}

+/**
+ * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
+ * RMID are clean, or the CLOSID that has
+ * the most clean RMID.
+ *
+ * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
+ * may not be able to allocate clean RMID. To avoid this the allocator will
+ * choose the CLOSID with the most clean RMID.
+ *
+ * When the CLOSID and RMID are independent numbers, the first free CLOSID will
+ * be returned.
+ */
+int resctrl_find_cleanest_closid(void)
+{
+ u32 cleanest_closid = ~0;
+ int i = 0;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+ return -EIO;
+
+ for (i = 0; i < closids_supported(); i++) {
+ int num_dirty;
+
+ if (closid_allocated(i))
+ continue;
+
+ num_dirty = closid_num_dirty_rmid[i];
+ if (num_dirty == 0)
+ return i;
+
+ if (cleanest_closid == ~0)
+ cleanest_closid = i;
+
+ if (num_dirty < closid_num_dirty_rmid[cleanest_closid])
+ cleanest_closid = i;
+ }
+
+ if (cleanest_closid == ~0)
+ return -ENOSPC;
+
+ return cleanest_closid;
+}
+
/*
* For MPAM the RMID value is not unique, and has to be considered with
* the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bc6e0f8..8fc4620 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -137,13 +137,22 @@ static void closid_init(void)

static int closid_alloc(void)
{
- u32 closid = ffs(closid_free_map);
+ int cleanest_closid;
+ u32 closid;

lockdep_assert_held(&rdtgroup_mutex);

- if (closid == 0)
- return -ENOSPC;
- closid--;
+ if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+ cleanest_closid = resctrl_find_cleanest_closid();
+ if (cleanest_closid < 0)
+ return cleanest_closid;
+ closid = cleanest_closid;
+ } else {
+ closid = ffs(closid_free_map);
+ if (closid == 0)
+ return -ENOSPC;
+ closid--;
+ }
__clear_bit(closid, &closid_free_map);

return closid;
@@ -163,7 +172,7 @@ void closid_free(int closid)
* Return: true if @closid is currently associated with a resource group,
* false if @closid is free
*/
-static bool closid_allocated(unsigned int closid)
+bool closid_allocated(unsigned int closid)
{
lockdep_assert_held(&rdtgroup_mutex);
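
(As a worked example of the search above - a stand-alone sketch with
made-up numbers, not kernel code: suppose five CLOSIDs where 0 and 2 are
allocated, and the free ones 1, 3 and 4 have 4, 0 and 2 dirty RMID
respectively. CLOSID 3 is fully clean, so it is returned.)

	#include <stdio.h>

	/* Illustrative snapshot; the arrays stand in for closid_allocated()
	 * and closid_num_dirty_rmid[].
	 */
	int main(void)
	{
		int allocated[] = { 1, 0, 1, 0, 0 };
		int num_dirty[] = { 0, 4, 0, 0, 2 };
		unsigned int cleanest = ~0U;
		int i;

		for (i = 0; i < 5; i++) {
			if (allocated[i])
				continue;
			if (num_dirty[i] == 0) {
				cleanest = i;	/* fully clean: done */
				break;
			}
			if (cleanest == ~0U || num_dirty[i] < num_dirty[cleanest])
				cleanest = i;
		}

		printf("cleanest CLOSID: %u\n", cleanest);	/* prints 3 */
		return 0;
	}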


Subject: [tip: x86/cache] x86/resctrl: Track the closid with the rmid

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 40fc735b78f0c81cea7d1c511cfd83892cb4d679
Gitweb: https://git.kernel.org/tip/40fc735b78f0c81cea7d1c511cfd83892cb4d679
Author: James Morse <[email protected]>
AuthorDate: Tue, 13 Feb 2024 18:44:19
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Fri, 16 Feb 2024 19:18:31 +01:00

x86/resctrl: Track the closid with the rmid

x86's RMID are independent of the CLOSID. An RMID can be allocated,
used and freed without considering the CLOSID.

MPAM's equivalent feature is PMG, which is not an independent number,
it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of
'RMID' can be allocated for a single CLOSID.
i.e. if there is 1 bit of PMG space, then each CLOSID can have two
monitor groups.

To allow resctrl to disambiguate RMID values for different CLOSID,
everything in resctrl that keeps an RMID value needs to know the CLOSID
too. This will always be ignored on x86.
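
(For illustration - a sketch of packing the (CLOSID, PMG) pair into the
software-defined index that the X86_RESCTRL_EMPTY_CLOSID comment in the
patch below describes. The helper names and the fixed one-bit PMG width
are assumptions, not the series' actual API. On x86 the index would simply
be the RMID, with the closid argument ignored.)

	#define PMG_BITS	1	/* e.g. a SoC with a single PMG bit */

	static inline u32 idx_encode(u32 closid, u32 rmid)
	{
		/* Each CLOSID owns 2^PMG_BITS monitor ids */
		return (closid << PMG_BITS) | rmid;
	}

	static inline void idx_decode(u32 idx, u32 *closid, u32 *rmid)
	{
		*closid = idx >> PMG_BITS;
		*rmid = idx & ((1U << PMG_BITS) - 1);
	}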

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Shaopeng Tan <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-by: Peter Newman <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/include/asm/resctrl.h | 7 ++-
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 73 ++++++++++++++--------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 ++--
include/linux/resctrl.h | 16 ++++-
6 files changed, 77 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 255a78d..cc6e1bc 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -7,6 +7,13 @@
#include <linux/sched.h>
#include <linux/jump_label.h>

+/*
+ * This value can never be a valid CLOSID, and is used when mapping a
+ * (closid, rmid) pair to an index and back. On x86 only the RMID is
+ * needed. The index is a software defined value.
+ */
+#define X86_RESCTRL_EMPTY_CLOSID ((u32)~0)
+
/**
* struct resctrl_pqr_state - State cache for the PQR MSR
* @cur_rmid: The cached Resource Monitoring ID
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 61c7636..ae0e333 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -542,7 +542,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
int alloc_rmid(void);
-void free_rmid(u32 rmid);
+void free_rmid(u32 closid, u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
void __exit rdt_put_mon_l3_config(void);
bool __init rdt_cpu_has(int flag);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3a73db0..3dad413 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -24,7 +24,20 @@

#include "internal.h"

+/**
+ * struct rmid_entry - dirty tracking for all RMID.
+ * @closid: The CLOSID for this entry.
+ * @rmid: The RMID for this entry.
+ * @busy: The number of domains with cached data using this RMID.
+ * @list: Member of the rmid_free_lru list when busy == 0.
+ *
+ * Depending on the architecture the correct monitor is accessed using
+ * both @closid and @rmid, or @rmid only.
+ *
+ * Take the rdtgroup_mutex when accessing.
+ */
struct rmid_entry {
+ u32 closid;
u32 rmid;
int busy;
struct list_head list;
@@ -136,7 +149,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
return val;
}

-static inline struct rmid_entry *__rmid_entry(u32 rmid)
+static inline struct rmid_entry *__rmid_entry(u32 closid, u32 rmid)
{
struct rmid_entry *entry;

@@ -190,7 +203,8 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
}

void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid)
+ u32 unused, u32 rmid,
+ enum resctrl_event_id eventid)
{
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct arch_mbm_state *am;
@@ -230,7 +244,8 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
}

int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid, u64 *val)
+ u32 unused, u32 rmid, enum resctrl_event_id eventid,
+ u64 *val)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
@@ -285,9 +300,9 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
if (nrmid >= r->num_rmid)
break;

- entry = __rmid_entry(nrmid);
+ entry = __rmid_entry(X86_RESCTRL_EMPTY_CLOSID, nrmid);// temporary

- if (resctrl_arch_rmid_read(r, d, entry->rmid,
+ if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
QOS_L3_OCCUP_EVENT_ID, &val)) {
rmid_dirty = true;
} else {
@@ -342,7 +357,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
cpu = get_cpu();
list_for_each_entry(d, &r->domains, list) {
if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
- err = resctrl_arch_rmid_read(r, d, entry->rmid,
+ err = resctrl_arch_rmid_read(r, d, entry->closid,
+ entry->rmid,
QOS_L3_OCCUP_EVENT_ID,
&val);
if (err || val <= resctrl_rmid_realloc_threshold)
@@ -366,7 +382,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
list_add_tail(&entry->list, &rmid_free_lru);
}

-void free_rmid(u32 rmid)
+void free_rmid(u32 closid, u32 rmid)
{
struct rmid_entry *entry;

@@ -375,7 +391,7 @@ void free_rmid(u32 rmid)

lockdep_assert_held(&rdtgroup_mutex);

- entry = __rmid_entry(rmid);
+ entry = __rmid_entry(closid, rmid);

if (is_llc_occupancy_enabled())
add_rmid_to_limbo(entry);
@@ -383,8 +399,8 @@ void free_rmid(u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}

-static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
- enum resctrl_event_id evtid)
+static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 closid,
+ u32 rmid, enum resctrl_event_id evtid)
{
switch (evtid) {
case QOS_L3_MBM_TOTAL_EVENT_ID:
@@ -396,20 +412,21 @@ static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
}
}

-static int __mon_event_count(u32 rmid, struct rmid_read *rr)
+static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
struct mbm_state *m;
u64 tval = 0;

if (rr->first) {
- resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
- m = get_mbm_state(rr->d, rmid, rr->evtid);
+ resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
+ m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
if (m)
memset(m, 0, sizeof(struct mbm_state));
return 0;
}

- rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
+ rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, rr->evtid,
+ &tval);
if (rr->err)
return rr->err;

@@ -421,6 +438,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
/*
* mbm_bw_count() - Update bw count from values previously read by
* __mon_event_count().
+ * @closid: The closid used to identify the cached mbm_state.
* @rmid: The rmid used to identify the cached mbm_state.
* @rr: The struct rmid_read populated by __mon_event_count().
*
@@ -429,7 +447,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
* __mon_event_count() is compared with the chunks value from the previous
* invocation. This must be called once per second to maintain values in MBps.
*/
-static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
+static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
struct mbm_state *m = &rr->d->mbm_local[rmid];
u64 cur_bw, bytes, cur_bytes;
@@ -456,7 +474,7 @@ void mon_event_count(void *info)

rdtgrp = rr->rgrp;

- ret = __mon_event_count(rdtgrp->mon.rmid, rr);
+ ret = __mon_event_count(rdtgrp->closid, rdtgrp->mon.rmid, rr);

/*
* For Ctrl groups read data from child monitor groups and
@@ -467,7 +485,8 @@ void mon_event_count(void *info)

if (rdtgrp->type == RDTCTRL_GROUP) {
list_for_each_entry(entry, head, mon.crdtgrp_list) {
- if (__mon_event_count(entry->mon.rmid, rr) == 0)
+ if (__mon_event_count(entry->closid, entry->mon.rmid,
+ rr) == 0)
ret = 0;
}
}
@@ -578,7 +597,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
}

-static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
+static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
+ u32 closid, u32 rmid)
{
struct rmid_read rr;

@@ -593,12 +613,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
if (is_mbm_total_enabled()) {
rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
rr.val = 0;
- __mon_event_count(rmid, &rr);
+ __mon_event_count(closid, rmid, &rr);
}
if (is_mbm_local_enabled()) {
rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
rr.val = 0;
- __mon_event_count(rmid, &rr);
+ __mon_event_count(closid, rmid, &rr);

/*
* Call the MBA software controller only for the
@@ -606,7 +626,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
* the software controller explicitly.
*/
if (is_mba_sc(NULL))
- mbm_bw_count(rmid, &rr);
+ mbm_bw_count(closid, rmid, &rr);
}
}

@@ -663,11 +683,11 @@ void mbm_handle_overflow(struct work_struct *work)
d = container_of(work, struct rdt_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
- mbm_update(r, d, prgrp->mon.rmid);
+ mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);

head = &prgrp->mon.crdtgrp_list;
list_for_each_entry(crgrp, head, mon.crdtgrp_list)
- mbm_update(r, d, crgrp->mon.rmid);
+ mbm_update(r, d, crgrp->closid, crgrp->mon.rmid);

if (is_mba_sc(NULL))
update_mba_bw(prgrp, d);
@@ -710,10 +730,11 @@ static int dom_data_init(struct rdt_resource *r)
}

/*
- * RMID 0 is special and is always allocated. It's used for all
- * tasks that are not monitored.
+ * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are special and
+ * are always allocated. These are used for the rdtgroup_default
+ * control group, which will be setup later in rdtgroup_init().
*/
- entry = __rmid_entry(0);
+ entry = __rmid_entry(RESCTRL_RESERVED_CLOSID, RESCTRL_RESERVED_RMID);
list_del(&entry->list);

return 0;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8f559ee..65bee6f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -752,7 +752,7 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
* anymore when this group would be used for pseudo-locking. This
* is safe to call on platforms not capable of monitoring.
*/
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);

ret = 0;
goto out;
@@ -787,7 +787,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)

ret = rdtgroup_locksetup_user_restore(rdtgrp);
if (ret) {
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
return ret;
}

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f455a10..ad7da72 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2837,7 +2837,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)

head = &rdtgrp->mon.crdtgrp_list;
list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
- free_rmid(sentry->mon.rmid);
+ free_rmid(sentry->closid, sentry->mon.rmid);
list_del(&sentry->mon.crdtgrp_list);

if (atomic_read(&sentry->waitcount) != 0)
@@ -2877,7 +2877,7 @@ static void rmdir_all_sub(void)
cpumask_or(&rdtgroup_default.cpu_mask,
&rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);

- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);

kernfs_remove(rdtgrp->kn);
list_del(&rdtgrp->rdtgroup_list);
@@ -3305,7 +3305,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
if (ret) {
rdt_last_cmd_puts("kernfs subdir error\n");
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
return ret;
}

@@ -3315,7 +3315,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
{
if (rdt_mon_capable)
- free_rmid(rgrp->mon.rmid);
+ free_rmid(rgrp->closid, rgrp->mon.rmid);
}

static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
@@ -3574,7 +3574,7 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
update_closid_rmid(tmpmask, NULL);

rdtgrp->flags = RDT_DELETED;
- free_rmid(rdtgrp->mon.rmid);
+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);

/*
* Remove the rdtgrp from the parent ctrl_mon group's list
@@ -3620,8 +3620,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
update_closid_rmid(tmpmask, NULL);

+ free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
closid_free(rdtgrp->closid);
- free_rmid(rdtgrp->mon.rmid);

rdtgroup_ctrl_remove(rdtgrp);

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 66942d7..bd4ec22 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -6,6 +6,10 @@
#include <linux/list.h>
#include <linux/pid.h>

+/* CLOSID, RMID value used by the default control group */
+#define RESCTRL_RESERVED_CLOSID 0
+#define RESCTRL_RESERVED_RMID 0
+
#ifdef CONFIG_PROC_CPU_RESCTRL

int proc_resctrl_show(struct seq_file *m,
@@ -225,6 +229,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
* for this resource and domain.
* @r: resource that the counter should be read from.
* @d: domain that the counter should be read from.
+ * @closid: closid that matches the rmid. Depending on the architecture, the
+ * counter may match traffic of both @closid and @rmid, or @rmid
+ * only.
* @rmid: rmid of the counter to read.
* @eventid: eventid to read, e.g. L3 occupancy.
* @val: result of the counter read in bytes.
@@ -235,20 +242,25 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
* 0 on success, or -EIO, -EINVAL etc on error.
*/
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid, u64 *val);
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ u64 *val);
+

/**
* resctrl_arch_reset_rmid() - Reset any private state associated with rmid
* and eventid.
* @r: The domain's resource.
* @d: The rmid's domain.
+ * @closid: closid that matches the rmid. Depending on the architecture, the
+ * counter may match traffic of both @closid and @rmid, or @rmid only.
* @rmid: The rmid whose counter values should be reset.
* @eventid: The eventid whose counter values should be reset.
*
* This can be called from any CPU.
*/
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
- u32 rmid, enum resctrl_event_id eventid);
+ u32 closid, u32 rmid,
+ enum resctrl_event_id eventid);

/**
* resctrl_arch_reset_rmid_all() - Reset all private state associated with

Subject: [tip: x86/cache] x86/resctrl: Free rmid_ptrs from resctrl_exit()

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 3f7b07380d58cfbb6a2d3aa672dcc76c0f4b0745
Gitweb: https://git.kernel.org/tip/3f7b07380d58cfbb6a2d3aa672dcc76c0f4b0745
Author: James Morse <[email protected]>
AuthorDate: Tue, 13 Feb 2024 18:44:16
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Fri, 16 Feb 2024 19:18:31 +01:00

x86/resctrl: Free rmid_ptrs from resctrl_exit()

rmid_ptrs[] is allocated from dom_data_init() but never free()d.

While the exit text ends up in the linker script's DISCARD section,
the direction of travel is for resctrl to be/have loadable modules.

Add resctrl_put_mon_l3_config() to cleanup any memory allocated
by rdt_get_mon_l3_config().

There is no reason to backport this to a stable kernel.

Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Tested-by: Babu Moger <[email protected]>
Tested-by: Carl Worth <[email protected]> # arm64
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 15 +++++++++++++++
3 files changed, 22 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index aa9810a..9641c42 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -990,8 +990,14 @@ late_initcall(resctrl_late_init);

static void __exit resctrl_exit(void)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
cpuhp_remove_state(rdt_online);
+
rdtgroup_exit();
+
+ if (r->mon_capable)
+ rdt_put_mon_l3_config();
}

__exitcall(resctrl_exit);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 52e7e7d..61c7636 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -544,6 +544,7 @@ void closid_free(int closid);
int alloc_rmid(void);
void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
+void __exit rdt_put_mon_l3_config(void);
bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3a6c069..3a73db0 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -719,6 +719,16 @@ static int dom_data_init(struct rdt_resource *r)
return 0;
}

+static void __exit dom_data_exit(void)
+{
+ mutex_lock(&rdtgroup_mutex);
+
+ kfree(rmid_ptrs);
+ rmid_ptrs = NULL;
+
+ mutex_unlock(&rdtgroup_mutex);
+}
+
static struct mon_evt llc_occupancy_event = {
.name = "llc_occupancy",
.evtid = QOS_L3_OCCUP_EVENT_ID,
@@ -814,6 +824,11 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
return 0;
}

+void __exit rdt_put_mon_l3_config(void)
+{
+ dom_data_exit();
+}
+
void __init intel_rdt_mbm_apply_quirk(void)
{
int cf_index;

2024-02-20 10:40:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

On Mon, Feb 19, 2024 at 04:53:38PM +0000, James Morse wrote:
> Thanks - I'm just putting a v10 together to fix this.

No need - I'll fold it in.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-02-20 15:28:03

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

On 13.02.24 19:44, James Morse wrote:
> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>
> While the exit text ends up in the linker script's DISCARD section,
> the direction of travel is for resctrl to be/have loadable modules.
>
> Add resctrl_put_mon_l3_config() to cleanup any memory allocated
> by rdt_get_mon_l3_config().
>
> There is no reason to backport this to a stable kernel.
>
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Babu Moger <[email protected]>
> Tested-by: Carl Worth <[email protected]> # arm64
> Reviewed-by: Babu Moger <[email protected]>
> ---

[...]


> +static void __exit dom_data_exit(void)
> +{
> + mutex_lock(&rdtgroup_mutex);
> +
> + kfree(rmid_ptrs);
> + rmid_ptrs = NULL;
> +
> + mutex_unlock(&rdtgroup_mutex);

Just curious: is grabbing that mutex really required?

Against which race are we trying to protect ourselves?

I suspect this mutex is not required here: if we could be racing with
someone else, likely freeing that memory would not be safe either.

Apart from that LGTM.

--
Cheers,

David / dhildenb


2024-02-20 15:46:25

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

Hi David,

On 20/02/2024 15:27, David Hildenbrand wrote:
> On 13.02.24 19:44, James Morse wrote:
>> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>>
>> While the exit text ends up in the linker script's DISCARD section,
>> the direction of travel is for resctrl to be/have loadable modules.
>>
>> Add resctrl_put_mon_l3_config() to cleanup any memory allocated
>> by rdt_get_mon_l3_config().
>>
>> There is no reason to backport this to a stable kernel.

>> +static void __exit dom_data_exit(void)
>> +{
>> +    mutex_lock(&rdtgroup_mutex);
>> +
>> +    kfree(rmid_ptrs);
>> +    rmid_ptrs = NULL;
>> +
>> +    mutex_unlock(&rdtgroup_mutex);
>
> Just curious: is grabbing that mutex really required?
>
> Against which race are we trying to protect ourselves?

Not a race, but it's to protect against having to think about memory ordering!


> I suspect this mutex is not required here: if we could be racing with someone else, likely
> freeing that memory would not be safe either.

All the accesses to that variable take the mutex; it's necessary to take the mutex to
ensure the most recently stored values are seen. In this case the array values don't
matter, but rmid_ptrs is written under the mutex too.
There is almost certainly a control dependency that means the CPU calling dom_data_exit()
will see the value of rmid_ptrs from dom_data_init() - but it's much simpler to check that
all accesses take the mutex.

With MPAM this code can be invoked from an error IRQ signalled by the hardware, so it
could happen anytime.
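
(A minimal sketch of the pattern being relied on - the mutex release on one
CPU pairs with the next acquire on another, so a reader that takes the lock
is guaranteed to see the last value stored under it. Illustrative code, not
the actual resctrl functions:)

	static DEFINE_MUTEX(example_mutex);	/* stand-in for rdtgroup_mutex */
	static struct rmid_entry *rmid_ptrs;

	static void writer(void)
	{
		mutex_lock(&example_mutex);
		rmid_ptrs = NULL;		/* store made under the lock */
		mutex_unlock(&example_mutex);	/* release publishes the store */
	}

	static bool reader(void)
	{
		bool freed;

		mutex_lock(&example_mutex);	/* acquire pairs with the release */
		freed = !rmid_ptrs;		/* sees the latest store */
		mutex_unlock(&example_mutex);
		return freed;
	}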


> Apart from that LGTM.

Thanks for taking a look!


Thanks,

James


2024-02-20 15:54:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

On Tue, Feb 20 2024 at 15:46, James Morse wrote:
> On 20/02/2024 15:27, David Hildenbrand wrote:
>> On 13.02.24 19:44, James Morse wrote:
>>> +static void __exit dom_data_exit(void)
>>> +{
>>> +    mutex_lock(&rdtgroup_mutex);
>>> +
>>> +    kfree(rmid_ptrs);
>>> +    rmid_ptrs = NULL;
>>> +
>>> +    mutex_unlock(&rdtgroup_mutex);
>>
>> Just curious: is grabbing that mutex really required?
>>
>> Against which race are we trying to protect ourselves?
>
>> Not a race, but it's to protect against having to think about memory ordering!
>
>>> I suspect this mutex is not required here: if we could be racing with someone else, likely
>> freeing that memory would not be safe either.
>
> All the accesses to that variable take the mutex; it's necessary to take the mutex to
> ensure the most recently stored values are seen. In this case the array values don't
> matter, but rmid_ptrs is written under the mutex too.
> There is almost certainly a control dependency that means the CPU calling dom_data_exit()
> will see the value of rmid_ptrs from dom_data_init() - but it's much simpler to check that
> all accesses take the mutex.
>
> With MPAM this code can be invoked from an error IRQ signalled by the hardware, so it
> could happen anytime.

Which does not work because you can't acquire a mutex from hard
interrupt context.

Thanks,

tglx

2024-02-20 16:04:30

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

On 20/02/2024 15:54, Thomas Gleixner wrote:
> On Tue, Feb 20 2024 at 15:46, James Morse wrote:
>> On 20/02/2024 15:27, David Hildenbrand wrote:
>>> On 13.02.24 19:44, James Morse wrote:
>>>> +static void __exit dom_data_exit(void)
>>>> +{
>>>> +    mutex_lock(&rdtgroup_mutex);
>>>> +
>>>> +    kfree(rmid_ptrs);
>>>> +    rmid_ptrs = NULL;
>>>> +
>>>> +    mutex_unlock(&rdtgroup_mutex);
>>>
>>> Just curious: is grabbing that mutex really required?
>>>
>>> Against which race are we trying to protect ourselves?
>>
>> Not a race, but it's to protect against having to think about memory ordering!
>>
>>> I suspect this mutex is not required here: if we could be racing with someone else, likely
>>> freeing that memory would not be safe either.
>>
>> All the accesses to that variable take the mutex; it's necessary to take the mutex to
>> ensure the most recently stored values are seen. In this case the array values don't
>> matter, but rmid_ptrs is written under the mutex too.
>> There is almost certainly a control dependency that means the CPU calling dom_data_exit()
>> will see the value of rmid_ptrs from dom_data_init() - but it's much simpler to check that
>> all accesses take the mutex.
>>
>> With MPAM this code can be invoked from an error IRQ signalled by the hardware, so it
>> could happen anytime.
>
> Which does not work because you can't acquire a mutex from hard
> interrupt context.

Indeed - which is why that happens via schedule_work() [0]

My point was that it's non-obvious where/when this will happen, so taking the lock and
forgetting about it is the simplest thing to do.
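
(The shape of that pattern, as an illustrative sketch rather than MPAM's
actual handler: the hard-IRQ handler only queues work, and the mutex is
taken later from process context.)

	static DEFINE_MUTEX(teardown_mutex);	/* illustrative names throughout */

	static void teardown_fn(struct work_struct *work)
	{
		mutex_lock(&teardown_mutex);	/* safe: process context */
		/* ... tear down and free resources ... */
		mutex_unlock(&teardown_mutex);
	}
	static DECLARE_WORK(teardown_work, teardown_fn);

	static irqreturn_t error_irq_handler(int irq, void *dev_id)
	{
		/* Can't take a mutex here; defer to the workqueue. */
		schedule_work(&teardown_work);
		return IRQ_HANDLED;
	}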


Thanks,

James


[0]
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/drivers/platform/mpam/mpam_devices.c?h=mpam/snapshot/v6.7-rc2&id=7da1c7f9d9ef723f829bf44ed96e1fc4a46ef29f#n1299


2024-02-20 16:13:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v9 02/24] x86/resctrl: kfree() rmid_ptrs from resctrl_exit()

On Tue, Feb 20 2024 at 16:01, James Morse wrote:
> On 20/02/2024 15:54, Thomas Gleixner wrote:
>>> With MPAM this code can be invoked from an error IRQ signalled by the hardware, so it
>>> could happen anytime.
>>
>> Which does not work because you can't acquire a mutex from hard
>> interrupt context.
>
> Indeed - which is why that happens via schedule_work() [0]
>
> My point was that it's non-obvious where/when this will happen, so taking the lock and
> forgetting about it is the simplest thing to do.

Makes sense.

Thanks,

tglx

2024-02-20 18:19:06

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hi Tony,

On 2/16/2024 4:28 PM, Tony Luck wrote:
> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
>> Hello!
>>
>> It's been back and forth for whether this series should be rebased onto Tony's
>> SNC series. This version isn't, its based on tip/x86/cache.
>> (I have the rebased-and-tested versions if anyone needs them)
>
> In case James' patches go first, I took a crack at basing my SNC series
> on top of his patches (specifically the mpam/monitors_and_locking/v9
> branch of git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git).
>
> Result is here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
> Branch: james_then_snc
>
> The end result of which ought to be pretty similar to the
> "rebased-and-tested" versions that James mentions above.

As I understand, Babu withdrew his "Reviewed-by" tag [1]. Will you
be posting this new version (with tip tag ordering)?

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/

2024-02-20 18:48:57

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

>> The end result of which ought to be pretty similar to the
>> "rebased-and-tested" versions that James mentions above.
>
> As I understand, Babu withdrew his "Reviewed-by" tag [1]. Will you
> be posting this new version (with tip tag ordering)?

Reinette,

A couple of my patches required extensive massaging to apply on top
of James' series (which I see is now in TIP x86/cache).

I'll rebase to tip and remove all the Reviewed and Tested tags and post
as v15 soon.

-Tony

2024-02-20 21:00:09

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote:
> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote:
>
> > On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
> >> Hello!
> >>
> >> It's been back and forth for whether this series should be rebased onto Tony's
> >> SNC series. This version isn't, its based on tip/x86/cache.
> >> (I have the rebased-and-tested versions if anyone needs them)
> >
> > The set applied ontop of tip:x86/cache gives:
> >
> > vmlinux.o: in function `get_domain_from_cpu':
> > (.text+0x150f33): undefined reference to `lockdep_is_cpus_held'
> > ld: vmlinux.o: in function `rdt_ctrl_update':
> > (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held'
>
> Wants to be folded into patch 24.
>
> Thanks,
>
> tglx
> ---
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(i
> * about locks this thread holds will lead to false positives. Check
> * someone is holding the CPUs lock.
> */
> - if (IS_ENABLED(CONFIG_LOCKDEP))
> - lockdep_is_cpus_held();
> + if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP))
> + WARN_ON_ONCE(!lockdep_is_cpus_held());
>
> list_for_each_entry(d, &r->domains, list) {
> /* Find the domain that contains this CPU */

Testing tip x86/cache, that WARN fires while running
tools/testing/selftests/resctrl/resctrl_tests.

Everything runs OK if I drop the top commit:
fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks")

-Tony


[ 663.817986] ------------[ cut here ]------------
[ 663.822667] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/resctrl/core.c:372 get_domain_from_cpu+0x45/0x50
[ 663.832332] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif sunrpc binfmt_misc kvm dax_hmem cxl_acpi irqbypass vfat iTCO_wdt rapl intel_pmc_bxt iTCO_vendor_support intel_cstate fat intel_uncore cxl_core pcspkr acpi_ipmi isst_if_mmio isst_if_mbox_pci i2c_i801 isst_if_common mei_me i2c_smbus mei intel_pch_thermal intel_vsec ioatdma ipmi_si ipmi_devintf joydev ipmi_msghandler acpi_power_meter acpi_pad loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 ixgbe igb sha256_ssse3 ast
[ 663.832534] mdio sha1_ssse3 i2c_algo_bit dca wmi ip6_tables ip_tables fuse
[ 663.929224] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1+ #247
[ 663.935662] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0021.P06.2104260458 04/26/2021
[ 663.946175] RIP: 0010:get_domain_from_cpu+0x45/0x50
[ 663.951061] Code: 73 40 89 ef 48 39 f0 75 0a eb 16 48 8b 00 48 39 f0 74 0e 48 0f a3 78 18 73 f1 5b 5d c3 cc cc cc cc 31 c0 5b 5d c3 cc cc cc cc <0f> 0b eb cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90
[ 663.969807] RSP: 0018:ff22c48cc0003f80 EFLAGS: 00010046
[ 663.975042] RAX: 0000000000000000 RBX: ffffffff92e4a3a0 RCX: 0000000000000001
[ 663.982174] RDX: 0000000000000000 RSI: ffffffff92aa8289 RDI: ffffffff92b56a1e
[ 663.989305] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
[ 663.996436] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff92e4a3a0
[ 664.003570] R13: 0000000000000000 R14: ffffffff910672c0 R15: ff22c48ce0e17df0
[ 664.010699] FS: 0000000000000000(0000) GS:ff1f2c6cdde00000(0000) knlGS:0000000000000000
[ 664.018785] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 664.024530] CR2: 00007f608c001048 CR3: 0000000582e38003 CR4: 0000000000771ef0
[ 664.031664] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 664.038794] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 664.045927] PKRU: 55555554
[ 664.048641] Call Trace:
[ 664.051093] <IRQ>
[ 664.053113] ? get_domain_from_cpu+0x45/0x50
[ 664.057391] ? __warn+0x81/0x170
[ 664.060637] ? get_domain_from_cpu+0x45/0x50
[ 664.064917] ? report_bug+0x18d/0x1c0
[ 664.068593] ? handle_bug+0x3c/0x80
[ 664.072089] ? exc_invalid_op+0x13/0x60
[ 664.075931] ? asm_exc_invalid_op+0x16/0x20
[ 664.080124] ? __pfx_rdt_ctrl_update+0x10/0x10
[ 664.084574] ? get_domain_from_cpu+0x45/0x50
[ 664.088854] rdt_ctrl_update+0x20/0x70
[ 664.092613] __flush_smp_call_function_queue+0xdd/0x560
[ 664.097848] __sysvec_call_function+0x32/0x110
[ 664.102301] sysvec_call_function+0x99/0xc0
[ 664.106497] </IRQ>
[ 664.108602] <TASK>
[ 664.110708] asm_sysvec_call_function+0x16/0x20
[ 664.115246] RIP: 0010:cpuidle_enter_state+0xfb/0x4f0
[ 664.120221] Code: c0 48 0f a3 05 16 4d 0a 01 0f 82 fb 02 00 00 31 ff e8 d9 32 06 ff 45 84 ff 0f 85 cb 02 00 00 e8 ab 70 18 ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 eb 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[ 664.138966] RSP: 0018:ffffffff92e03e28 EFLAGS: 00000202
[ 664.144191] RAX: 00000000002458c1 RBX: ff54c484be002f38 RCX: 000000000000001f
[ 664.151323] RDX: 0000000000000000 RSI: ffffffff92be36c1 RDI: ffffffff92b56a1e
[ 664.158454] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001
[ 664.165588] R10: 0000000000000003 R11: ff1f2c6cdde34de4 R12: ffffffff930d0560
[ 664.172719] R13: 0000009a8ea28623 R14: 0000000000000002 R15: 0000000000000000
[ 664.179860] ? cpuidle_enter_state+0xf5/0x4f0
[ 664.184221] cpuidle_enter+0x29/0x40
[ 664.187808] do_idle+0x231/0x290
[ 664.191052] cpu_startup_entry+0x26/0x30
[ 664.194983] rest_init+0xf1/0x190
[ 664.198303] arch_call_rest_init+0xa/0x30
[ 664.202323] start_kernel+0x8b8/0xac0
[ 664.205993] x86_64_start_reservations+0x14/0x30
[ 664.210617] x86_64_start_kernel+0x92/0xa0
[ 664.214725] secondary_startup_64_no_verify+0x184/0x18b
[ 664.219965] </TASK>
[ 664.222159] irq event stamp: 2382018
[ 664.225737] hardirqs last enabled at (2382017): [<ffffffff9212dea5>] cpuidle_enter_state+0xf5/0x4f0
[ 664.234873] hardirqs last disabled at (2382018): [<ffffffff9212b29a>] sysvec_call_function+0xa/0xc0
[ 664.243920] softirqs last enabled at (2382012): [<ffffffff91120705>] __irq_exit_rcu+0xa5/0x110
[ 664.252621] softirqs last disabled at (2381797): [<ffffffff91120705>] __irq_exit_rcu+0xa5/0x110
[ 664.261311] ---[ end trace 0000000000000000 ]---

2024-02-20 22:59:15

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking



On 2/20/2024 12:59 PM, Tony Luck wrote:
> On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote:
>> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote:
>>
>>> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
>>>> Hello!
>>>>
>>>> It's been back and forth for whether this series should be rebased onto Tony's
>>>> SNC series. This version isn't, its based on tip/x86/cache.
>>>> (I have the rebased-and-tested versions if anyone needs them)
>>>
>>> The set applied ontop of tip:x86/cache gives:
>>>
>>> vmlinux.o: in function `get_domain_from_cpu':
>>> (.text+0x150f33): undefined reference to `lockdep_is_cpus_held'
>>> ld: vmlinux.o: in function `rdt_ctrl_update':
>>> (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held'
>>
>> Wants to be folded into patch 24.
>>
>> Thanks,
>>
>> tglx
>> ---
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(i
>> * about locks this thread holds will lead to false positives. Check
>> * someone is holding the CPUs lock.
>> */
>> - if (IS_ENABLED(CONFIG_LOCKDEP))
>> - lockdep_is_cpus_held();
>> + if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP))
>> + WARN_ON_ONCE(!lockdep_is_cpus_held());
>>
>> list_for_each_entry(d, &r->domains, list) {
>> /* Find the domain that contains this CPU */
>
> Testing tip x86/cache, that WARN fires while running
> tools/testing/selftests/resctrl/resctrl_tests.
>
> Everything runs OK if I drop the top commit:
> fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks")

The new WARN_ON_ONCE() is why this is now encountered. The comment notes that
lockdep_is_cpus_held() is used to determine if "someone is holding the
CPUs lock" but it seems that lockdep_is_cpus_held() still only checks
if "current" is holding cpu_hotplug_lock and that is not possible
when running the code via IPI.

The trace that Tony shared notes that this is triggered by get_domain_from_cpu()
called via rdt_ctrl_update(). rdt_ctrl_update() is only run via IPI:

resctrl_arch_update_domains() {
...
lockdep_assert_cpus_held();
...
on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
...
}

and

reset_all_ctrls() {
...
lockdep_assert_cpus_held();
...
on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
}

I sprinkled some debug_show_held_locks(current) calls to confirm and encountered
the following when reproducing the trace using the resctrl tests:

[ 202.914334] resctrl_arch_update_domains:355
[ 202.919971] 4 locks held by resctrl_tests/3330:
[ 202.925169] #0: ff11001086e09408 (sb_writers#15){.+.+}-{0:0}, at: ksys_write+0x69/0x100
[ 202.934375] #1: ff110010bb653688 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf7/0x200
[ 202.944348] #2: ffffffff8346c890 (cpu_hotplug_lock){++++}-{0:0}, at: rdtgroup_kn_lock_live+0x4c/0xa0
[ 202.954774] #3: ffffffff8344ae68 (rdtgroup_mutex){+.+.}-{3:3}, at: rdtgroup_kn_lock_live+0x5a/0xa0
[ 202.965030] get_domain_from_cpu:366
[ 202.969087] no locks held by swapper/0/0.
[ 202.973697] ------------[ cut here ]------------
[ 202.978979] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/resctrl/core.c:375 get_domain_from_cpu+0x6f/0x80
<SNIP>
[ 203.200095] Call Trace:
[ 203.202947] <TASK>
[ 203.205406] ? __warn+0x84/0x180
[ 203.209123] ? get_domain_from_cpu+0x6f/0x80
[ 203.214011] ? report_bug+0x1c7/0x1e0
[ 203.218214] ? handle_bug+0x3c/0x80
[ 203.222230] ? exc_invalid_op+0x18/0x80
[ 203.227198] ? asm_exc_invalid_op+0x1a/0x20
[ 203.232529] ? __pfx_rdt_ctrl_update+0x20/0x20
[ 203.238146] ? get_domain_from_cpu+0x6f/0x80
[ 203.243548] rdt_ctrl_update+0x26/0x80
<SNIP>


So even though it is confirmed via lockdep_assert_cpus_held() that
resctrl_arch_update_domains() holds cpu_hotplug_lock, it does not seem possible
to have a similar lockdep check in the function called by it via IPI
(rdt_ctrl_update()). It thus does not look like the lockdep checking within
get_domain_from_cpu() can be accurate, and I cannot see what it can be replaced
with to make it accurate. Any guidance will be appreciated. Perhaps we should
just drop the lockdep check in get_domain_from_cpu() (but keep detailed
comments about the context)?

Reinette

2024-02-20 23:26:07

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

> So even though it is confirmed via lockdep_assert_cpus_held() that
> resctrl_arch_update_domains() holds cpu_hotplug_lock, it does not seem possible
> to have a similar lockdep check in the function called by it via IPI
> (rdt_ctrl_update()). It thus does not look like the lockdep checking within
> get_domain_from_cpu() can be accurate, and I cannot see what it can be replaced
> with to make it accurate. Any guidance will be appreciated. Perhaps we should
> just drop the lockdep check in get_domain_from_cpu() (but keep detailed
> comments about the context)?

Reinette,

Both the places where this has problems (reset_all_ctrls() and
resctrl_arch_update_domains()) have similar structure:


list_for_each_entry(d, &r->domains, list) {
cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
}

on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);


Maybe, instead of collecting all CPUs that need to do something and then
having each of them backtrack and search for the domain from a resource
(that is passed in the msr_param argument), the code could be restructured
to pass the domain to the target function. Like this:


list_for_each_entry(d, &r->domains, list) {
msr_param.dom = d;
smp_call_function_single(cpumask_any(&d->cpu_mask), rdt_ctrl_update, &msr_param, 1);
}

I'll try coding this up to see if it works.

-Tony

2024-02-21 00:35:41

by Luck, Tony

[permalink] [raw]
Subject: [PATCH] x86/resctrl: Fix WARN in get_domain_from_cpu()

reset_all_ctrls() and resctrl_arch_update_domains() use
on_each_cpu_mask() to call rdt_ctrl_update() on potentially
one CPU from each domain.

But this means rdt_ctrl_update() needs to figure out which domain
to apply changes to. Doing so requires a search of all domains
in a resource, which can only be done safely if cpus_lock is
held. Both callers do hold this lock, but there isn't a way
for a function called on another CPU via IPI to verify this.

Fix by adding the target domain to the msr_param structure and
calling for each domain separately using smp_call_function_single().

Signed-off-by: Tony Luck <[email protected]>

---
Either apply on top of tip x86/cache:

fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks")

or merge this into that commit.
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 10 +----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 50 +++++------------------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 ++-----
4 files changed, 16 insertions(+), 59 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c99f26ebe7a6..c30d7697b431 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -383,6 +383,7 @@ static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
*/
struct msr_param {
struct rdt_resource *res;
+ struct rdt_domain *dom;
u32 low;
u32 high;
};
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8a4ef4f5bddc..8d8b8abcda98 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -390,16 +390,8 @@ void rdt_ctrl_update(void *arg)
struct msr_param *m = arg;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
struct rdt_resource *r = m->res;
- int cpu = smp_processor_id();
- struct rdt_domain *d;

- d = get_domain_from_cpu(cpu, r);
- if (d) {
- hw_res->msr_update(d, m, r);
- return;
- }
- pr_warn_once("cpu %d not found in any domain for resource %s\n",
- cpu, r->name);
+ hw_res->msr_update(m->dom, m, r);
}

/*
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 7997b47743a2..aed702d06314 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -272,22 +272,6 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
- struct resctrl_staged_config *cfg, u32 idx,
- cpumask_var_t cpu_mask)
-{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
-
- if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
- cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
- hw_dom->ctrl_val[idx] = cfg->new_ctrl;
-
- return true;
- }
-
- return false;
-}
-
int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
@@ -315,17 +299,13 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
enum resctrl_conf_type t;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
+ int cpu;
u32 idx;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
- msr_param.res = NULL;
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
for (t = 0; t < CDP_NUM_TYPES; t++) {
@@ -334,29 +314,19 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
continue;

idx = get_config_index(closid, t);
- if (!apply_config(hw_dom, cfg, idx, cpu_mask))
+ if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
continue;
-
- if (!msr_param.res) {
- msr_param.low = idx;
- msr_param.high = msr_param.low + 1;
- msr_param.res = r;
- } else {
- msr_param.low = min(msr_param.low, idx);
- msr_param.high = max(msr_param.high, idx + 1);
- }
+ hw_dom->ctrl_val[idx] = cfg->new_ctrl;
+ cpu = cpumask_any(&d->cpu_mask);
+
+ msr_param.low = idx;
+ msr_param.high = msr_param.low + 1;
+ msr_param.res = r;
+ msr_param.dom = d;
+ smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);
}
}

- if (cpumask_empty(cpu_mask))
- goto done;
-
- /* Update resource control msr on all the CPUs. */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
-done:
- free_cpumask_var(cpu_mask);
-
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 011e17efb1a6..da4f13db4161 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2813,16 +2813,13 @@ static int reset_all_ctrls(struct rdt_resource *r)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
+ int cpu;
int i;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
msr_param.res = r;
msr_param.low = 0;
msr_param.high = hw_res->num_closid;
@@ -2834,17 +2831,14 @@ static int reset_all_ctrls(struct rdt_resource *r)
*/
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
- cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+ cpu = cpumask_any(&d->cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
hw_dom->ctrl_val[i] = r->default_ctrl;
+ msr_param.dom = d;
+ smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);
}

- /* Update CBM on all the CPUs in cpu_mask */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
- free_cpumask_var(cpu_mask);
-
return 0;
}

--
2.43.0


2024-02-21 05:11:12

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH] x86/resctrl: Fix WARN in get_domain_from_cpu()

Hi Tony,

Regarding the implication made in the subject ...
from what I understand, the WARN is a false positive.

On 2/20/2024 4:34 PM, Tony Luck wrote:
> reset_all_ctrls() and resctrl_arch_update_domains() use
> on_each_cpu_mask() to call rdt_ctrl_update() on potentially
> one CPU from each domain.
>
> But this means rdt_ctrl_update() needs to figure out which domain
> to apply changes to. Doing so requires a search of all domains
> in a resource, which can only be done safely if cpus_lock is
> held. Both callers do hold this lock, but there isn't a way
> for a function called on another CPU via IPI to verify this.
>
> Fix by adding the target domain to the msr_param structure and
> calling for each domain separately using smp_call_function_single()

This sounds reasonable to me. Thank you for the proposal.

> Signed-off-by: Tony Luck <[email protected]>
>
> ---
> Either apply on top of tip x86/cache:
>
> fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks")
>
> or merge this into that commit.

I do not know if it would be preferred to take this approach as
part of this work or just remove the WARN and add this
improvement/refactoring later as a follow-up.

> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> arch/x86/kernel/cpu/resctrl/core.c | 10 +----
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 50 +++++------------------
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 ++-----
> 4 files changed, 16 insertions(+), 59 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index c99f26ebe7a6..c30d7697b431 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -383,6 +383,7 @@ static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
> */
> struct msr_param {
> struct rdt_resource *res;
> + struct rdt_domain *dom;
> u32 low;
> u32 high;
> };
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 8a4ef4f5bddc..8d8b8abcda98 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -390,16 +390,8 @@ void rdt_ctrl_update(void *arg)
> struct msr_param *m = arg;
> struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
> struct rdt_resource *r = m->res;
> - int cpu = smp_processor_id();
> - struct rdt_domain *d;
>
> - d = get_domain_from_cpu(cpu, r);
> - if (d) {
> - hw_res->msr_update(d, m, r);
> - return;
> - }
> - pr_warn_once("cpu %d not found in any domain for resource %s\n",
> - cpu, r->name);
> + hw_res->msr_update(m->dom, m, r);

It looks redundant to provide struct msr_param as well as two of its
members as parameters.

> }
>
> /*
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 7997b47743a2..aed702d06314 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -272,22 +272,6 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
> }
> }
>
> -static bool apply_config(struct rdt_hw_domain *hw_dom,
> - struct resctrl_staged_config *cfg, u32 idx,
> - cpumask_var_t cpu_mask)
> -{
> - struct rdt_domain *dom = &hw_dom->d_resctrl;
> -
> - if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
> - cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
> - hw_dom->ctrl_val[idx] = cfg->new_ctrl;
> -
> - return true;
> - }
> -
> - return false;
> -}
> -
> int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> u32 closid, enum resctrl_conf_type t, u32 cfg_val)
> {
> @@ -315,17 +299,13 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
> struct rdt_hw_domain *hw_dom;
> struct msr_param msr_param;
> enum resctrl_conf_type t;
> - cpumask_var_t cpu_mask;
> struct rdt_domain *d;
> + int cpu;
> u32 idx;
>
> /* Walking r->domains, ensure it can't race with cpuhp */
> lockdep_assert_cpus_held();
>
> - if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
> - return -ENOMEM;
> -
> - msr_param.res = NULL;
> list_for_each_entry(d, &r->domains, list) {
> hw_dom = resctrl_to_arch_dom(d);
> for (t = 0; t < CDP_NUM_TYPES; t++) {
> @@ -334,29 +314,19 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
> continue;
>
> idx = get_config_index(closid, t);
> - if (!apply_config(hw_dom, cfg, idx, cpu_mask))
> + if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
> continue;
> -
> - if (!msr_param.res) {
> - msr_param.low = idx;
> - msr_param.high = msr_param.low + 1;
> - msr_param.res = r;
> - } else {
> - msr_param.low = min(msr_param.low, idx);
> - msr_param.high = max(msr_param.high, idx + 1);
> - }
> + hw_dom->ctrl_val[idx] = cfg->new_ctrl;
> + cpu = cpumask_any(&d->cpu_mask);
> +
> + msr_param.low = idx;
> + msr_param.high = msr_param.low + 1;
> + msr_param.res = r;
> + msr_param.dom = d;
> + smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);

When CDP is enabled, could this not end up sending an IPI to the same CPU
twice, each requesting the CPU to do one MSR write, instead of sending a
single IPI to write all the needed MSRs?
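
To illustrate, a sketch of the sequence (illustrative comment, not code
from the patch):

	/*
	 * With CDP enabled, staged_config[CDP_CODE] and staged_config[CDP_DATA]
	 * can both carry a new value for the same domain d, so the loop above
	 * does:
	 *
	 *   t == CDP_CODE: idx = get_config_index(closid, CDP_CODE)
	 *                  -> smp_call_function_single() to cpumask_any(&d->cpu_mask)
	 *   t == CDP_DATA: idx = get_config_index(closid, CDP_DATA)
	 *                  -> a second IPI to the same CPU, for the adjacent MSR
	 *
	 * A single msr_param spanning both indices would need only one IPI.
	 */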


Reinette

2024-02-21 12:06:51

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hi Tony, Reinette,

On 20/02/2024 22:58, Reinette Chatre wrote:
> On 2/20/2024 12:59 PM, Tony Luck wrote:
>> On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote:
>>> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote:
>>>
>>>> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
>>>>> Hello!
>>>>>
>>>>> It's been back and forth for whether this series should be rebased onto Tony's
>>>>> SNC series. This version isn't, its based on tip/x86/cache.
>>>>> (I have the rebased-and-tested versions if anyone needs them)
>>>>
>>>> The set applied ontop of tip:x86/cache gives:

>> Testing tip x86/cache that WARN fires while running
>> tools/tests/selftests/resctrl/resctrl_test.

I evidently need to build a newer version of that tool.


>> Everthing runs OK if I drop the top commit:
>> fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks")
>
> The new WARN_ON_ONCE() is why this encountered. The comment notes that
> lockdep_is_cpus_held() is used to determine if "someone is holding the
> CPUs lock" but it seems that lockdep_is_cpus_held() still only checks
> if "current" is holding cpu_hotplug_lock and that is not possible
> when running the code via IPI.

I was evidently mistaken that this was the difference between
lockdep_is_cpus_held() and lockdep_assert_cpus_held().

It's a false positive; ripping out the check is the simplest thing to do.
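
A sketch of the sequence that trips it (illustrative only):

	/*
	 * CPU A (holds the lock)           CPU B (IPI target)
	 * ----------------------           ------------------
	 * cpus_read_lock();
	 * on_each_cpu_mask(...,
	 *     rdt_ctrl_update, ...);  -->  rdt_ctrl_update()
	 *                                    get_domain_from_cpu()
	 *                                      lockdep_is_cpus_held() checks
	 *                                      B's "current", which does not
	 *                                      hold cpu_hotplug_lock
	 *                                      -> WARN fires, falsely
	 * cpus_read_unlock();
	 */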


> So even though it is confirmed via lockdep_assert_cpus_held() that
> resctrl_arch_update_domains() holds cpu_hotplug_lock, it does not seem possible
> to have a similar lockdep check in the function called by it (resctrl_arch_update_domains())
> via IPI. It thus does not look like that lockdep checking within
> get_domain_from_cpu() can be accurate and I cannot see what it can be replaced with
> to make it accurate. Any guidance will be appreciated. Perhaps we should just drop (but
> with detailed context comments remaining) the lockdep check in get_domain_from_cpu()?


Thanks,

James

2024-02-21 12:07:04

by James Morse

[permalink] [raw]
Subject: Re: [PATCH] x86/resctrl: Fix WARN in get_domain_from_cpu()

Hi Tony,

On 21/02/2024 00:34, Tony Luck wrote:
> reset_all_ctrls() and resctrl_arch_update_domains() use
> on_each_cpu_mask() to call rdt_ctrl_update() on potentially
> one CPU from each domain.
>
> But this means rdt_ctrl_update() needs to figure out which domain
> to apply changes to. Doing so requires a search of all domains
> in a resource, which can only be done safely if cpus_lock is
> held. Both callers do hold this lock, but there isn't a way
> for a function called on another CPU via IPI to verify this.
>
> Fix by adding the target domain to the msr_param structure and
> calling for each domain separately using smp_call_function_single()

Cunning - this trades the memory allocation for multiple IPIs. I think this is much better
for the case where only the local domain's configuration is modified.

With the double IPI (when both CDP configurations are changed) fixed:
Reviewed-by: James Morse <[email protected]>

I think we should rip out the false-positive check; I'll post a patch to do that.
I'll double check this was the only IPI path; if so, it's safe again after this patch and we
can add lockdep_assert_cpus_held(). If anyone ever hits this during a bisect it should be
clear(er).


Thanks,

James

2024-02-21 16:49:21

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hi James,

On 2/21/2024 4:06 AM, James Morse wrote:
> Hi Tony, Reinette,
>
> On 20/02/2024 22:58, Reinette Chatre wrote:
>> On 2/20/2024 12:59 PM, Tony Luck wrote:
>>> On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote:
>>>> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote:
>>>>
>>>>> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote:
>>>>>> Hello!
>>>>>>
>>>>>> It's been back and forth for whether this series should be rebased onto Tony's
>>>>>> SNC series. This version isn't, its based on tip/x86/cache.
>>>>>> (I have the rebased-and-tested versions if anyone needs them)
>>>>>
>>>>> The set applied ontop of tip:x86/cache gives:
>
>>> Testing tip x86/cache that WARN fires while running
>>> tools/tests/selftests/resctrl/resctrl_test.
>
> I evidently need to build a newer version of that tool.

There have been a lot of changes in the last few cycles. You can find the
latest version on the "next" branch of the kselftest repo [1], where the most
recent enhancements are queued up for inclusion. If you are interested, there
is one more series [2] pending merge to the kselftest repo; it adds a new test
for the recent non-contiguous CBM support.

Reinette

[1] git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
[2] https://lore.kernel.org/lkml/[email protected]/

2024-02-21 17:30:38

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

>>>> Testing tip x86/cache that WARN fires while running
>>>> tools/tests/selftests/resctrl/resctrl_test.
>>
>> I evidently need to build a newer version of that tool.
>
> There have been a lot of changes in the last few cycles. You can find the
> latest version on the "next" branch of the kselftest repo [1], where the most
> recent enhancements are queued up for inclusion. If you are interested, there
> is one more series [2] pending merge to the kselftest repo; it adds a new test
> for the recent non-contiguous CBM support.

James,

You didn't see this WARN because it isn't in your patch. Diff between what
is in your tree and what was applied to TIP:

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 9f1aa555a8ea..8a4ef4f5bddc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
* about locks this thread holds will lead to false positives. Check
* someone is holding the CPUs lock.
*/
- if (IS_ENABLED(CONFIG_LOCKDEP))
- lockdep_is_cpus_held();
+ if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP))
+ WARN_ON_ONCE(!lockdep_is_cpus_held());

list_for_each_entry(d, &r->domains, list) {
/* Find the domain that contains this CPU */
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index dc59643498bf..7997b47743a2 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -567,7 +567,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
* cpumask_any_housekeeping() prefers housekeeping CPUs, but
* are all the CPUs nohz_full? If yes, pick a CPU to IPI.
* MPAM's resctrl_arch_rmid_read() is unable to read the
- * counters on some platforms if its called in irq context.
+ * counters on some platforms if its called in IRQ context.
*/
if (tick_nohz_full_cpu(cpu))
smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);

-Tony

2024-02-21 19:32:20

by Luck, Tony

[permalink] [raw]
Subject: [PATCH v2] x86/resctrl: Fix WARN in get_domain_from_cpu()

reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
to call rdt_ctrl_update() on potentially one CPU from each domain.

But this means rdt_ctrl_update() needs to figure out which domain to
apply changes to. Doing so requires a search of all domains in a resource,
which can only be done safely if cpus_lock is held. Both callers do hold
this lock, but there isn't a way for a function called on another CPU
via IPI to verify this.

Fix by adding the target domain to the msr_param structure and passing an
array with CDP_NUM_TYPES entries, then calling for each domain separately
using smp_call_function_single().

Change the low level cat_wrmsr(), mba_wrmsr_intel(), and mba_wrmsr_amd()
functions to just take a msr_param structure since it contains the
rdt_resource and rdt_domain information.

Signed-off-by: Tony Luck <[email protected]>

---

Changes since v1:

* Avoid double IPI to the same core when CDP is enabled.
* Don't pass the resource and domain to functions that can
get these from the msr_param structure.
* Clean up some fir tree issues in functions that I changed.


arch/x86/kernel/cpu/resctrl/internal.h | 4 +-
arch/x86/kernel/cpu/resctrl/core.c | 56 +++++++++------------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 59 +++++++----------------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 23 ++++-----
4 files changed, 50 insertions(+), 92 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c99f26ebe7a6..2f21358b9621 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -383,6 +383,7 @@ static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
*/
struct msr_param {
struct rdt_resource *res;
+ struct rdt_domain *dom;
u32 low;
u32 high;
};
@@ -442,8 +443,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+ void (*msr_update)(struct msr_param *m);
unsigned int mon_scale;
unsigned int mbm_width;
unsigned int mbm_cfg_mask;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8a4ef4f5bddc..643bf64bf1be 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -56,14 +56,9 @@ int max_name_width, max_data_width;
*/
bool rdt_alloc_capable;

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+static void mba_wrmsr_intel(struct msr_param *m);
+static void cat_wrmsr(struct msr_param *m);
+static void mba_wrmsr_amd(struct msr_param *m);

#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)

@@ -309,12 +304,11 @@ static void rdt_get_cdp_l2_config(void)
rdt_get_cdp_config(RDT_RESOURCE_L2);
}

-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void mba_wrmsr_amd(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -334,25 +328,22 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
return r->default_ctrl;
}

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r)
+static void mba_wrmsr_intel(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
- wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], r));
+ wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], m->res));
}

-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void cat_wrmsr(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -387,19 +378,14 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r)

void rdt_ctrl_update(void *arg)
{
+ struct rdt_hw_resource *hw_res;
struct msr_param *m = arg;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
- struct rdt_resource *r = m->res;
- int cpu = smp_processor_id();
- struct rdt_domain *d;
+ int t;

- d = get_domain_from_cpu(cpu, r);
- if (d) {
- hw_res->msr_update(d, m, r);
- return;
- }
- pr_warn_once("cpu %d not found in any domain for resource %s\n",
- cpu, r->name);
+ hw_res = resctrl_to_arch_res(m->res);
+ for (t = 0; t < CDP_NUM_TYPES; t++)
+ if (m[t].dom)
+ hw_res->msr_update(m + t);
}

/*
@@ -472,9 +458,11 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
hw_dom->ctrl_val = dc;
setup_default_ctrlval(r, dc);

+ m.res = r;
+ m.dom = d;
m.low = 0;
m.high = hw_res->num_closid;
- hw_res->msr_update(d, &m, r);
+ hw_res->msr_update(&m);
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 7997b47743a2..09f6e624f1bb 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -272,22 +272,6 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
- struct resctrl_staged_config *cfg, u32 idx,
- cpumask_var_t cpu_mask)
-{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
-
- if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
- cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
- hw_dom->ctrl_val[idx] = cfg->new_ctrl;
-
- return true;
- }
-
- return false;
-}
-
int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
@@ -304,59 +288,50 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
msr_param.res = r;
msr_param.low = idx;
msr_param.high = idx + 1;
- hw_res->msr_update(d, &msr_param, r);
+ hw_res->msr_update(&msr_param);

return 0;
}

int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
{
+ struct msr_param msr_param[CDP_NUM_TYPES];
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
- struct msr_param msr_param;
enum resctrl_conf_type t;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
+ bool need_update;
+ int cpu;
u32 idx;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
- msr_param.res = NULL;
+ memset(msr_param, 0, sizeof(msr_param));
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
+ need_update = false;
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
continue;

idx = get_config_index(closid, t);
- if (!apply_config(hw_dom, cfg, idx, cpu_mask))
+ if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
continue;
-
- if (!msr_param.res) {
- msr_param.low = idx;
- msr_param.high = msr_param.low + 1;
- msr_param.res = r;
- } else {
- msr_param.low = min(msr_param.low, idx);
- msr_param.high = max(msr_param.high, idx + 1);
- }
+ hw_dom->ctrl_val[idx] = cfg->new_ctrl;
+ cpu = cpumask_any(&d->cpu_mask);
+
+ msr_param[t].low = idx;
+ msr_param[t].high = msr_param[t].low + 1;
+ msr_param[t].res = r;
+ msr_param[t].dom = d;
+ need_update = true;
}
+ if (need_update)
+ smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);
}

- if (cpumask_empty(cpu_mask))
- goto done;
-
- /* Update resource control msr on all the CPUs. */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
-done:
- free_cpumask_var(cpu_mask);
-
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 011e17efb1a6..5d9ff8883c60 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2811,21 +2811,19 @@ static int rdt_init_fs_context(struct fs_context *fc)
static int reset_all_ctrls(struct rdt_resource *r)
{
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+ struct msr_param msr_param[CDP_NUM_TYPES];
struct rdt_hw_domain *hw_dom;
- struct msr_param msr_param;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
+ int cpu;
int i;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
- msr_param.res = r;
- msr_param.low = 0;
- msr_param.high = hw_res->num_closid;
+ memset(msr_param, 0, sizeof(msr_param));
+ msr_param[0].res = r;
+ msr_param[0].low = 0;
+ msr_param[0].high = hw_res->num_closid;

/*
* Disable resource control for this resource by setting all
@@ -2834,17 +2832,14 @@ static int reset_all_ctrls(struct rdt_resource *r)
*/
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
- cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+ cpu = cpumask_any(&d->cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
hw_dom->ctrl_val[i] = r->default_ctrl;
+ msr_param[0].dom = d;
+ smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);
}

- /* Update CBM on all the CPUs in cpu_mask */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
- free_cpumask_var(cpu_mask);
-
return 0;
}

--
2.43.0


2024-02-21 22:59:59

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v2] x86/resctrl: Fix WARN in get_domain_from_cpu()

Hi Tony,

On 2/21/2024 11:31 AM, Tony Luck wrote:
> reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
> to call rdt_ctrl_update() on potentially one CPU from each domain.
>
> But this means rdt_ctrl_update() needs to figure out which domain to
> apply changes to. Doing so requires a search of all domains in a resource,
> which can only be done safely if cpus_lock is held. Both callers do hold
> this lock, but there isn't a way for a function called on another CPU
> via IPI to verify this.
>
> Fix by adding the target domain to the msr_param structure and passing an
> array with CDP_NUM_TYPES entries, then calling for each domain separately
> using smp_call_function_single().

This work contains no changes to get_domain_from_cpu(). I expect the WARN
within it to be removed as intended with [1] and then this work can build
on that without urgency. As I understand, to support the stated goal of this
work, I expect get_domain_from_cpu() in the end to not have any WARN or
IS_ENABLED checks, but just a lockdep_assert_cpus_held().
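
That is, I expect something like this sketch:

	struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
	{
		struct rdt_domain *d;

		lockdep_assert_cpus_held();

		list_for_each_entry(d, &r->domains, list) {
			/* Find the domain that contains this CPU */
			if (cpumask_test_cpu(cpu, &d->cpu_mask))
				return d;
		}

		return NULL;
	}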

Do you have different expectations?

> Change the low level cat_wrmsr(), mba_wrmsr_intel(), and mba_wrmsr_amd()
> functions to just take a msr_param structure since it contains the
> rdt_resource and rdt_domain information.

Could moving the rdt_domain into msr_param be done in a separate patch?

..

> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 7997b47743a2..09f6e624f1bb 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -272,22 +272,6 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
> }
> }
>
> -static bool apply_config(struct rdt_hw_domain *hw_dom,
> - struct resctrl_staged_config *cfg, u32 idx,
> - cpumask_var_t cpu_mask)
> -{
> - struct rdt_domain *dom = &hw_dom->d_resctrl;
> -
> - if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
> - cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
> - hw_dom->ctrl_val[idx] = cfg->new_ctrl;
> -
> - return true;
> - }
> -
> - return false;
> -}
> -
> int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> u32 closid, enum resctrl_conf_type t, u32 cfg_val)
> {
> @@ -304,59 +288,50 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> msr_param.res = r;
> msr_param.low = idx;
> msr_param.high = idx + 1;
> - hw_res->msr_update(d, &msr_param, r);
> + hw_res->msr_update(&msr_param);
>

Is this missing setting the domain in msr_param?

> return 0;
> }
>
> int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
> {
> + struct msr_param msr_param[CDP_NUM_TYPES];
> struct resctrl_staged_config *cfg;
> struct rdt_hw_domain *hw_dom;
> - struct msr_param msr_param;
> enum resctrl_conf_type t;
> - cpumask_var_t cpu_mask;
> struct rdt_domain *d;
> + bool need_update;
> + int cpu;
> u32 idx;
>
> /* Walking r->domains, ensure it can't race with cpuhp */
> lockdep_assert_cpus_held();
>
> - if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
> - return -ENOMEM;
> -
> - msr_param.res = NULL;
> + memset(msr_param, 0, sizeof(msr_param));
> list_for_each_entry(d, &r->domains, list) {
> hw_dom = resctrl_to_arch_dom(d);
> + need_update = false;
> for (t = 0; t < CDP_NUM_TYPES; t++) {
> cfg = &hw_dom->d_resctrl.staged_config[t];
> if (!cfg->have_new_ctrl)
> continue;
>
> idx = get_config_index(closid, t);
> - if (!apply_config(hw_dom, cfg, idx, cpu_mask))
> + if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
> continue;
> -
> - if (!msr_param.res) {
> - msr_param.low = idx;
> - msr_param.high = msr_param.low + 1;
> - msr_param.res = r;
> - } else {
> - msr_param.low = min(msr_param.low, idx);
> - msr_param.high = max(msr_param.high, idx + 1);
> - }
> + hw_dom->ctrl_val[idx] = cfg->new_ctrl;
> + cpu = cpumask_any(&d->cpu_mask);
> +
> + msr_param[t].low = idx;
> + msr_param[t].high = msr_param[t].low + 1;
> + msr_param[t].res = r;
> + msr_param[t].dom = d;
> + need_update = true;
> }
> + if (need_update)
> + smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);

It is not clear to me why it is needed to pass this additional data. Why not
just use the original mechanism of letting the low and high of msr_param span the
multiple indices that need updating? There can still be a "need_update" but it
can be set when msr_param gets its first data. Any other index needing updating
can just update low/high and a single msr_param can be used.
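
Something like this sketch, reusing the low/high widening that the removed
code above already had:

	if (!msr_param.res) {
		msr_param.low = idx;
		msr_param.high = idx + 1;
		msr_param.res = r;
		msr_param.dom = d;
	} else {
		msr_param.low = min(msr_param.low, idx);
		msr_param.high = max(msr_param.high, idx + 1);
	}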

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/

2024-02-21 23:58:05

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v2] x86/resctrl: Fix WARN in get_domain_from_cpu()

On Wed, Feb 21, 2024 at 02:59:43PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 2/21/2024 11:31 AM, Tony Luck wrote:
> > reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
> > to call rdt_ctrl_update() on potentially one CPU from each domain.
> >
> > But this means rdt_ctrl_update() needs to figure out which domain to
> > apply changes to. Doing so requires a search of all domains in a resource,
> > which can only be done safely if cpus_lock is held. Both callers do hold
> > this lock, but there isn't a way for a function called on another CPU
> > via IPI to verify this.
> >
> > Fix by adding the target domain to the msr_param structure and passing an
> > array with CDP_NUM_TYPES entries, then calling for each domain separately
> > using smp_call_function_single().
>
> This work contains no changes to get_domain_from_cpu(). I expect the WARN
> within it to be removed as intended with [1] and then this work can build
> on that without urgency. As I understand, to support the stated goal of this
> work, I expect get_domain_from_cpu() in the end to not have any WARN or
> IS_ENABLED checks, but just a lockdep_assert_cpus_held().
>
> Do you have different expectations?

Same expectations. Boris should apply the simple fix (delete the WARN
that is giving a false positive) for this current cycle.

If there is support for my patch (with changes/fixes you point out
below), then it could be added in the future and get_domain_from_cpu()
can use lockdep_assert_cpus_held().

> > Change the low level cat_wrmsr(), mba_wrmsr_intel(), and mba_wrmsr_amd()
> > functions to just take a msr_param structure since it contains the
> > rdt_resource and rdt_domain information.
>
> Could moving the rdt_domain into msr_param be done in a separate patch?

I can break it into more pieces if there is enthusiasm to apply it.

> ...
>
> > diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> > index 7997b47743a2..09f6e624f1bb 100644
> > --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> > +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> > @@ -272,22 +272,6 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
> > }
> > }
> >
> > -static bool apply_config(struct rdt_hw_domain *hw_dom,
> > - struct resctrl_staged_config *cfg, u32 idx,
> > - cpumask_var_t cpu_mask)
> > -{
> > - struct rdt_domain *dom = &hw_dom->d_resctrl;
> > -
> > - if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
> > - cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
> > - hw_dom->ctrl_val[idx] = cfg->new_ctrl;
> > -
> > - return true;
> > - }
> > -
> > - return false;
> > -}
> > -
> > int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> > u32 closid, enum resctrl_conf_type t, u32 cfg_val)
> > {
> > @@ -304,59 +288,50 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> > msr_param.res = r;
> > msr_param.low = idx;
> > msr_param.high = idx + 1;
> > - hw_res->msr_update(d, &msr_param, r);
> > + hw_res->msr_update(&msr_param);
> >
>
> Is this missing setting the domain in msr_param?

Indeed yes.

> > return 0;
> > }
> >
> > int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
> > {
> > + struct msr_param msr_param[CDP_NUM_TYPES];
> > struct resctrl_staged_config *cfg;
> > struct rdt_hw_domain *hw_dom;
> > - struct msr_param msr_param;
> > enum resctrl_conf_type t;
> > - cpumask_var_t cpu_mask;
> > struct rdt_domain *d;
> > + bool need_update;
> > + int cpu;
> > u32 idx;
> >
> > /* Walking r->domains, ensure it can't race with cpuhp */
> > lockdep_assert_cpus_held();
> >
> > - if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
> > - return -ENOMEM;
> > -
> > - msr_param.res = NULL;
> > + memset(msr_param, 0, sizeof(msr_param));
> > list_for_each_entry(d, &r->domains, list) {
> > hw_dom = resctrl_to_arch_dom(d);
> > + need_update = false;
> > for (t = 0; t < CDP_NUM_TYPES; t++) {
> > cfg = &hw_dom->d_resctrl.staged_config[t];
> > if (!cfg->have_new_ctrl)
> > continue;
> >
> > idx = get_config_index(closid, t);
> > - if (!apply_config(hw_dom, cfg, idx, cpu_mask))
> > + if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
> > continue;
> > -
> > - if (!msr_param.res) {
> > - msr_param.low = idx;
> > - msr_param.high = msr_param.low + 1;
> > - msr_param.res = r;
> > - } else {
> > - msr_param.low = min(msr_param.low, idx);
> > - msr_param.high = max(msr_param.high, idx + 1);
> > - }
> > + hw_dom->ctrl_val[idx] = cfg->new_ctrl;
> > + cpu = cpumask_any(&d->cpu_mask);
> > +
> > + msr_param[t].low = idx;
> > + msr_param[t].high = msr_param[t].low + 1;
> > + msr_param[t].res = r;
> > + msr_param[t].dom = d;
> > + need_update = true;
> > }
> > + if (need_update)
> > + smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);
>
> It is not clear to me why it is needed to pass this additional data. Why not
> just use the original mechanism of letting the low and high of msr_param span the
> multiple indices that need updating? There can still be a "need_update" but it
> can be set when msr_param gets its first data. Any other index needing updating
> can just update low/high and a single msr_param can be used.

For some reason this morning I thought that the domain needed to be
different. It isn't, so keeping the code that just adjusts the range
of MSRs will work just fine.

The "need_update" variable isn't required. I will move the
msr_param.res = NULL;
inside the for each domain loop, and can use non-NULL value to
decide whether to IPI a CPU.
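
Roughly (a sketch, untested):

	list_for_each_entry(d, &r->domains, list) {
		hw_dom = resctrl_to_arch_dom(d);
		msr_param.res = NULL;
		for (t = 0; t < CDP_NUM_TYPES; t++) {
			/* ... skip unchanged configs, widen msr_param.low/high ... */
		}
		if (msr_param.res)
			smp_call_function_single(cpumask_any(&d->cpu_mask),
						 rdt_ctrl_update, &msr_param, 1);
	}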

-Tony

Subject: [tip: x86/cache] x86/resctrl: Remove lockdep annotation that triggers false positive

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: c0d848fcb09d80a5f48b99f85e448185125ef59f
Gitweb: https://git.kernel.org/tip/c0d848fcb09d80a5f48b99f85e448185125ef59f
Author: James Morse <[email protected]>
AuthorDate: Wed, 21 Feb 2024 12:23:06
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Thu, 22 Feb 2024 16:15:38 +01:00

x86/resctrl: Remove lockdep annotation that triggers false positive

get_domain_from_cpu() walks a list of domains to find the one that
contains the specified CPU. This needs to be protected against races
with CPU hotplug when the list is modified. It has recently gained
a lockdep annotation to check this.

The lockdep annotation causes false positives when called via IPI as the
lock is held, but by another process. Remove it.

[ bp: Refresh it ontop of x86/cache. ]

Fixes: fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks")
Reported-by: Tony Luck <[email protected]>
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/all/ZdUSwOM9UUNpw84Y@agluck-desk3
---
arch/x86/kernel/cpu/resctrl/core.c | 9 ---------
1 file changed, 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8a4ef4f..83e4034 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -362,15 +362,6 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

- /*
- * Walking r->domains, ensure it can't race with cpuhp.
- * Because this is called via IPI by rdt_ctrl_update(), assertions
- * about locks this thread holds will lead to false positives. Check
- * someone is holding the CPUs lock.
- */
- if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP))
- WARN_ON_ONCE(!lockdep_is_cpus_held());
-
list_for_each_entry(d, &r->domains, list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))

2024-02-22 19:00:55

by Luck, Tony

[permalink] [raw]
Subject: [PATCH v3 0/2] x86/resctrl: Pass domain to target CPU

When a function is called via IPI, it isn't possible for assertions
in source code to check that the right locks are held when those
locks were obtained by the sender of the IPI.

Restructure some code to avoid the need for the check.

Patch 1 has the actual fix

Patch 2 is just some code cleanups

Changes since V2:

1) Rebased on TIP x86/cache

2) Added a missed setting of msr_param.dom in resctrl_arch_update_one()

3) Dropped the code that used separate msr_param structures for CDP

4) Added lockdep_assert_cpus_held() to get_domain_from_cpu()

5) Split into two patches

Tony Luck (2):
x86/resctrl: Pass domain to target CPU
x86/resctrl: Simply call convention for MSR update functions

arch/x86/kernel/cpu/resctrl/internal.h | 4 +-
arch/x86/kernel/cpu/resctrl/core.c | 55 +++++++++--------------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 42 +++++------------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 ++----
4 files changed, 37 insertions(+), 78 deletions(-)


base-commit: c0d848fcb09d80a5f48b99f85e448185125ef59f
--
2.43.0


2024-02-22 19:52:10

by Luck, Tony

[permalink] [raw]
Subject: [PATCH v3 2/2] x86/resctrl: Simply call convention for MSR update functions

The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
and mba_wrmsr_amd() all take three arguments:

(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)

But struct msr_param has always contained the rdt_resource, and now
contains the rdt_domain too.

Change to just pass struct msr_param as a single parameter. Clean
up formatting and fix some firtree parameter ordering.

No functional change.

Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 3 +-
arch/x86/kernel/cpu/resctrl/core.c | 39 +++++++++--------------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 +-
3 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c30d7697b431..2f21358b9621 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -443,8 +443,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+ void (*msr_update)(struct msr_param *m);
unsigned int mon_scale;
unsigned int mbm_width;
unsigned int mbm_cfg_mask;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8d378fc7a50b..7751eea19fd2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -56,14 +56,9 @@ int max_name_width, max_data_width;
*/
bool rdt_alloc_capable;

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+static void mba_wrmsr_intel(struct msr_param *m);
+static void cat_wrmsr(struct msr_param *m);
+static void mba_wrmsr_amd(struct msr_param *m);

#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)

@@ -309,12 +304,11 @@ static void rdt_get_cdp_l2_config(void)
rdt_get_cdp_config(RDT_RESOURCE_L2);
}

-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void mba_wrmsr_amd(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -334,25 +328,22 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
return r->default_ctrl;
}

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r)
+static void mba_wrmsr_intel(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
- wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], r));
+ wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], m->res));
}

-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void cat_wrmsr(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -384,7 +375,7 @@ void rdt_ctrl_update(void *arg)
struct msr_param *m = arg;

hw_res = resctrl_to_arch_res(m->res);
- hw_res->msr_update(m->dom, m, m->res);
+ hw_res->msr_update(m);
}

/*
@@ -461,7 +452,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
m.dom = d;
m.low = 0;
m.high = hw_res->num_closid;
- hw_res->msr_update(d, &m, r);
+ hw_res->msr_update(&m);
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index a3a0fd80daa8..7471f6b747b6 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -289,7 +289,7 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
msr_param.dom = d;
msr_param.low = idx;
msr_param.high = idx + 1;
- hw_res->msr_update(d, &msr_param, r);
+ hw_res->msr_update(&msr_param);

return 0;
}
--
2.43.0


2024-02-22 23:28:45

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] x86/resctrl: Pass domain to target CPU

Hi Tony,

Is sending new versions of a patch series in response to the previous
versions a new custom? I am finding the SNC thread [1] to have become
a maze, and now this thread is headed in the same direction. My understanding
of the custom (supported by [2]) is to send new versions as a new thread.
This thread complicates it even more by mixing versions of different
features in the same email thread.

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#explicit-in-reply-to-headers

2024-02-22 23:28:48

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v3 0/2] x86/resctrl: Pass domain to target CPU

> Is sending new versions of a patch series in response to the previous
> versions a new custom? I am finding the SNC thread [1] to have become
> a maze, and now this thread is headed in the same direction. My understanding
> of the custom (supported by [2]) is to send new versions as a new thread.
> This thread complicates it even more by mixing versions of different
> features in the same email thread.

Reinette,

Not new for me. I've always (tried) to link everything together.

But thanks for this link:

[2] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#explicit-in-reply-to-headers

I see that this isn't desired. I'll switch over to adding a Link: URL in the cover
letter going forward.[1]

-Tony

[1] I'm going to need a v4 if only to s/Simply/Simplify/ in the subject of part 2 :-(

2024-02-27 22:08:05

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] x86/resctrl: Simply call convention for MSR update functions

Hi Tony,

On 2/22/2024 10:50 AM, Tony Luck wrote:
> The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
> and mba_wrmsr_amd() all take three arguments:
>
> (struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
>
> But struct msr_param has always contained the rdt_resource, and now
> contains the rdt_domain too.
>
> Change to just pass struct msr_param as a single parameter. Clean
> up formatting and fix some fir tree parameter ordering.

Please stick to imperative tone. For example (feel free to improve):
struct msr_param contains pointers to both struct rdt_resource
and struct rdt_domain, thus only struct msr_param is necessary.

Pass struct msr_param as a single parameter. Clean
up formatting and fix some fir tree declaration ordering.

The patch looks good to me, thank you.

Reinette

2024-03-08 21:39:06

by Luck, Tony

[permalink] [raw]
Subject: [PATCH v5 0/2] x86/resctrl: Pass domain to target CPU

reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
to call rdt_ctrl_update() on potentially one CPU from each domain.

But this means rdt_ctrl_update() needs to figure out which domain to
apply changes to. Doing so requires a search of all domains in a resource,
which can only be done safely if cpus_lock is held. Both callers do hold
this lock, but there isn't a way for a function called on another CPU
via IPI to verify this.

Commit c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
false positive") removed the incorrect assertions.

Add the target domain to the msr_param structure and
call rdt_ctrl_update() for each domain separately using
smp_call_function_single(). This means that rdt_ctrl_update() doesn't
need to search for the domain and get_domain_from_cpu() can safely assert
that the cpus_lock is held since the remaining callers do not use IPI.

Signed-off-by: Tony Luck <[email protected]>

---
Changes since V4: Link: https://lore.kernel.org/all/[email protected]/

Reinette: Only assign "cpu" once in resctrl_arch_update_domains() [but
see change from James below]

James: Use smp_call_function_any() instead of cpumask_any() +
smp_call_function_single() to avoid unnecessary IPI in both
resctrl_arch_update_domains() and reset_all_ctrls(). This
eliminates a need for the "cpu" local variable.
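
The difference, as a sketch: smp_call_function_any() runs the function
on the local CPU when it is in the mask, so no IPI at all is needed in
that case.

	/* Before: cpumask_any() may pick a remote CPU even when the
	 * local CPU is in the domain's mask, forcing an IPI.
	 */
	cpu = cpumask_any(&d->cpu_mask);
	smp_call_function_single(cpu, rdt_ctrl_update, &msr_param, 1);

	/* After: run locally when possible, otherwise IPI one CPU in the mask */
	smp_call_function_any(&d->cpu_mask, rdt_ctrl_update, &msr_param, 1);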

Tony Luck (2):
x86/resctrl: Pass domain to target CPU
x86/resctrl: Simplify call convention for MSR update functions

arch/x86/kernel/cpu/resctrl/internal.h | 5 ++-
arch/x86/kernel/cpu/resctrl/core.c | 55 +++++++++--------------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 40 ++++-------------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 +----
4 files changed, 34 insertions(+), 78 deletions(-)


base-commit: c0d848fcb09d80a5f48b99f85e448185125ef59f
--
2.43.0


2024-03-08 21:39:17

by Luck, Tony

[permalink] [raw]
Subject: [PATCH v5 1/2] x86/resctrl: Pass domain to target CPU

reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
to call rdt_ctrl_update() on potentially one CPU from each domain.

But this means rdt_ctrl_update() needs to figure out which domain to
apply changes to. Doing so requires a search of all domains in a resource,
which can only be done safely if cpus_lock is held. Both callers do hold
this lock, but there isn't a way for a function called on another CPU
via IPI to verify this.

Commit c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
false positive") removed the incorrect assertions.

Add the target domain to the msr_param structure and
call rdt_ctrl_update() for each domain separately using
smp_call_function_single(). This means that rdt_ctrl_update() doesn't
need to search for the domain and get_domain_from_cpu() can safely assert
that the cpus_lock is held since the remaining callers do not use IPI.

Signed-off-by: Tony Luck <[email protected]>
Tested-by: Maciej Wieczor-Retman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 17 ++++------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 38 +++++------------------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 ++-----
4 files changed, 17 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c99f26ebe7a6..bc999471f072 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -378,11 +378,13 @@ static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
/**
* struct msr_param - set a range of MSRs from a domain
* @res: The resource to use
+ * @dom: The domain to update
* @low: Beginning index from base MSR
* @high: End index
*/
struct msr_param {
struct rdt_resource *res;
+ struct rdt_domain *dom;
u32 low;
u32 high;
};
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 83e40341583e..acf52aa185e0 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -362,6 +362,8 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

+ lockdep_assert_cpus_held();
+
list_for_each_entry(d, &r->domains, list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))
@@ -378,19 +380,11 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r)

void rdt_ctrl_update(void *arg)
{
+ struct rdt_hw_resource *hw_res;
struct msr_param *m = arg;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
- struct rdt_resource *r = m->res;
- int cpu = smp_processor_id();
- struct rdt_domain *d;

- d = get_domain_from_cpu(cpu, r);
- if (d) {
- hw_res->msr_update(d, m, r);
- return;
- }
- pr_warn_once("cpu %d not found in any domain for resource %s\n",
- cpu, r->name);
+ hw_res = resctrl_to_arch_res(m->res);
+ hw_res->msr_update(m->dom, m, m->res);
}

/*
@@ -463,6 +457,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
hw_dom->ctrl_val = dc;
setup_default_ctrlval(r, dc);

+ m.dom = d;
m.low = 0;
m.high = hw_res->num_closid;
hw_res->msr_update(d, &m, r);
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 7997b47743a2..165d8d453c04 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -272,22 +272,6 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
- struct resctrl_staged_config *cfg, u32 idx,
- cpumask_var_t cpu_mask)
-{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
-
- if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
- cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
- hw_dom->ctrl_val[idx] = cfg->new_ctrl;
-
- return true;
- }
-
- return false;
-}
-
int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
@@ -302,6 +286,7 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
hw_dom->ctrl_val[idx] = cfg_val;

msr_param.res = r;
+ msr_param.dom = d;
msr_param.low = idx;
msr_param.high = idx + 1;
hw_res->msr_update(d, &msr_param, r);
@@ -315,48 +300,39 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
enum resctrl_conf_type t;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
u32 idx;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
- msr_param.res = NULL;
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
+ msr_param.res = NULL;
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
continue;

idx = get_config_index(closid, t);
- if (!apply_config(hw_dom, cfg, idx, cpu_mask))
+ if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
continue;
+ hw_dom->ctrl_val[idx] = cfg->new_ctrl;

if (!msr_param.res) {
msr_param.low = idx;
msr_param.high = msr_param.low + 1;
msr_param.res = r;
+ msr_param.dom = d;
} else {
msr_param.low = min(msr_param.low, idx);
msr_param.high = max(msr_param.high, idx + 1);
}
}
+ if (msr_param.res)
+ smp_call_function_any(&d->cpu_mask, rdt_ctrl_update, &msr_param, 1);
}

- if (cpumask_empty(cpu_mask))
- goto done;
-
- /* Update resource control msr on all the CPUs. */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
-done:
- free_cpumask_var(cpu_mask);
-
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 011e17efb1a6..02f213f1c51c 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2813,16 +2813,12 @@ static int reset_all_ctrls(struct rdt_resource *r)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
int i;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
msr_param.res = r;
msr_param.low = 0;
msr_param.high = hw_res->num_closid;
@@ -2834,17 +2830,13 @@ static int reset_all_ctrls(struct rdt_resource *r)
*/
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
- cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
hw_dom->ctrl_val[i] = r->default_ctrl;
+ msr_param.dom = d;
+ smp_call_function_any(&d->cpu_mask, rdt_ctrl_update, &msr_param, 1);
}

- /* Update CBM on all the CPUs in cpu_mask */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
- free_cpumask_var(cpu_mask);
-
return 0;
}

--
2.43.0


2024-03-08 21:39:19

by Luck, Tony

[permalink] [raw]
Subject: [PATCH v5 2/2] x86/resctrl: Simplify call convention for MSR update functions

The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
and mba_wrmsr_amd() all take three arguments:

(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)

struct msr_param contains pointers to both struct rdt_resource and struct
rdt_domain, thus only struct msr_param is necessary.

Pass struct msr_param as a single parameter. Clean up formatting and
fix some fir tree declaration ordering.

No functional change.

Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Tested-by: Maciej Wieczor-Retman <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
---
arch/x86/kernel/cpu/resctrl/internal.h | 3 +-
arch/x86/kernel/cpu/resctrl/core.c | 40 +++++++++--------------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 +-
3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index bc999471f072..8f40fb35db78 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -444,8 +444,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+ void (*msr_update)(struct msr_param *m);
unsigned int mon_scale;
unsigned int mbm_width;
unsigned int mbm_cfg_mask;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index acf52aa185e0..7751eea19fd2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -56,14 +56,9 @@ int max_name_width, max_data_width;
*/
bool rdt_alloc_capable;

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+static void mba_wrmsr_intel(struct msr_param *m);
+static void cat_wrmsr(struct msr_param *m);
+static void mba_wrmsr_amd(struct msr_param *m);

#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)

@@ -309,12 +304,11 @@ static void rdt_get_cdp_l2_config(void)
rdt_get_cdp_config(RDT_RESOURCE_L2);
}

-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void mba_wrmsr_amd(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -334,25 +328,22 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
return r->default_ctrl;
}

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r)
+static void mba_wrmsr_intel(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
- wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], r));
+ wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], m->res));
}

-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void cat_wrmsr(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -384,7 +375,7 @@ void rdt_ctrl_update(void *arg)
struct msr_param *m = arg;

hw_res = resctrl_to_arch_res(m->res);
- hw_res->msr_update(m->dom, m, m->res);
+ hw_res->msr_update(m);
}

/*
@@ -457,10 +448,11 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
hw_dom->ctrl_val = dc;
setup_default_ctrlval(r, dc);

+ m.res = r;
m.dom = d;
m.low = 0;
m.high = hw_res->num_closid;
- hw_res->msr_update(d, &m, r);
+ hw_res->msr_update(&m);
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 165d8d453c04..b7291f60399c 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -289,7 +289,7 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
msr_param.dom = d;
msr_param.low = idx;
msr_param.high = idx + 1;
- hw_res->msr_update(d, &msr_param, r);
+ hw_res->msr_update(&msr_param);

return 0;
}
--
2.43.0
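
Reduced to a minimal sketch taken from the hunks above (kernel-doc and unrelated fields omitted), the simplified convention is: struct msr_param carries everything an update function needs, so callers fill it in and pass a single pointer:

struct msr_param {
	struct rdt_resource *res;	/* the resource to use */
	struct rdt_domain *dom;		/* the domain to update */
	u32 low;			/* beginning index from base MSR */
	u32 high;			/* end index */
};

static void cat_wrmsr(struct msr_param *m)
{
	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
	unsigned int i;

	for (i = m->low; i < m->high; i++)
		wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
}

A caller such as domain_setup_ctrlval() then just does:

	m.res = r;
	m.dom = d;
	m.low = 0;
	m.high = hw_res->num_closid;
	hw_res->msr_update(&m);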


2024-03-08 23:13:03

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v5 0/2] x86/resctrl: Pass domain to target CPU

Hi Tony,

On 3/8/2024 1:38 PM, Tony Luck wrote:
> reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
> to call rdt_ctrl_update() on potentially one CPU from each domain.
>
> But this means rdt_ctrl_update() needs to figure out which domain to
> apply changes to. Doing so requires a search of all domains in a resource,
> which can only be done safely if cpus_lock is held. Both callers do hold
> this lock, but there isn't a way for a function called on another CPU
> via IPI to verify this.
>
> Commit c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
> false positive") removed the incorrect assertions.
>
> Add the target domain to the msr_param structure and
> call rdt_ctrl_update() for each domain separately using
> smp_call_function_single(). This means that rdt_ctrl_update() doesn't
> need to search for the domain and get_domain_from_cpu() can safely assert
> that the cpus_lock is held since the remaining callers do not use IPI.
>
> Signed-off-by: Tony Luck <[email protected]>
>
> ---
> Changes since V4: Link: https://lore.kernel.org/all/[email protected]/
>
> Reinette: Only assign "cpu" once in resctrl_arch_update_domains() [but
> see change from James below]
>
> James: Use smp_call_function_any() instead of cpumask_any() +
> smp_call_function_single() to avoid unnecessary IPI in both
> resctrl_arch_update_domains() and reset_all_ctrls(). This
> eliminates a need for the "cpu" local variable.

Great catch, thank you for doing this.

Also, could you please stop using "In-Reply-To:" to thread new
versions under previous ones? In this case v5 is even threaded
under v3, not v4. You previously [1] agreed to stop and did so
for v4, but not here.

Reinette

[1] https://lore.kernel.org/lkml/SJ1PR11MB60839AD3BC4DB15BF5B0BBE0FC562@SJ1PR11MB6083.namprd11.prod.outlook.com/

2024-03-08 23:26:39

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v5 0/2] x86/resctrl: Pass domain to target CPU

> Also, could you please stop using "In-Reply-To:" to thread new
> versions under previous ones? In this case v5 is even threaded
> under v3, not v4. You previously [1] agreed to stop and did so
> for v4, but not here.

Reinette,

Sorry. When sending a patch or series, I create a SEND shell script
containing the "git send-email" command line with all the "--to" and
"--cc" arguments to send to the right people and mailing lists.

I copied an old SEND script and forgot to delete the "--in-reply-to"
argument from the script :-(

Will try to do better.

-Tony

2024-03-12 20:06:19

by Babu Moger

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] x86/resctrl: Simplify call convention for MSR update functions



On 3/8/24 15:38, Tony Luck wrote:
> The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
> and mba_wrmsr_amd() all take three arguments:
>
> (struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
>
> struct msr_param contains pointers to both struct rdt_resource and struct
> rdt_domain, thus only struct msr_param is necessary.
>
> Pass struct msr_param as a single parameter. Clean up formatting and
> fix some fir tree declaration ordering.
>
> No functional change.
>
> Suggested-by: Reinette Chatre <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> Tested-by: Maciej Wieczor-Retman <[email protected]>
> Reviewed-by: Reinette Chatre <[email protected]>

Reviewed-by: Babu Moger <[email protected]>

2024-03-12 20:07:06

by Babu Moger

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] x86/resctrl: Pass domain to target CPU



On 3/8/24 15:38, Tony Luck wrote:
> reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
> to call rdt_ctrl_update() on potentially one CPU from each domain.
>
> But this means rdt_ctrl_update() needs to figure out which domain to
> apply changes to. Doing so requires a search of all domains in a resource,
> which can only be done safely if cpus_lock is held. Both callers do hold
> this lock, but there isn't a way for a function called on another CPU
> via IPI to verify this.
>
> Commit c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
> false positive") removed the incorrect assertions.
>
> Add the target domain to the msr_param structure and
> call rdt_ctrl_update() for each domain separately using
> smp_call_function_single(). This means that rdt_ctrl_update() doesn't
> need to search for the domain and get_domain_from_cpu() can safely assert
> that the cpus_lock is held since the remaining callers do not use IPI.
>
> Signed-off-by: Tony Luck <[email protected]>
> Tested-by: Maciej Wieczor-Retman <[email protected]>
> Reviewed-by: Reinette Chatre <[email protected]>
> Reviewed-by: James Morse <[email protected]>

Reviewed-by: Babu Moger <[email protected]>


2024-03-12 20:08:55

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v5 1/2] x86/resctrl: Pass domain to target CPU

> Reviewed-by: Babu Moger <[email protected]>

Babu, thanks for both this and the part 2 review.

-Tony

2024-03-29 04:55:57

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v5 0/2] x86/resctrl: Pass domain to target CPU

Hi Boris,

Could you please consider this series for inclusion?

Thank you very much.

Reinette

On 3/8/2024 1:38 PM, Tony Luck wrote:
> reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
> to call rdt_ctrl_update() on potentially one CPU from each domain.
>
> But this means rdt_ctrl_update() needs to figure out which domain to
> apply changes to. Doing so requires a search of all domains in a resource,
> which can only be done safely if cpus_lock is held. Both callers do hold
> this lock, but there isn't a way for a function called on another CPU
> via IPI to verify this.
>
> Commit c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
> false positive") removed the incorrect assertions.
>
> Add the target domain to the msr_param structure and
> call rdt_ctrl_update() for each domain separately using
> smp_call_function_single(). This means that rdt_ctrl_update() doesn't
> need to search for the domain and get_domain_from_cpu() can safely assert
> that the cpus_lock is held since the remaining callers do not use IPI.
>
> Signed-off-by: Tony Luck <[email protected]>
>
> ---
> Changes since V4: Link: https://lore.kernel.org/all/[email protected]/
>
> Reinette: Only assign "cpu" once in resctrl_arch_update_domains() [but
> see change from James below]
>
> James: Use smp_call_function_any() instead of cpumask_any() +
> smp_call_function_single() to avoid unnecessary IPI in both
> resctrl_arch_update_domains() and reset_all_ctrls(). This
> eliminates a need for the "cpu" local variable.
>
> Tony Luck (2):
> x86/resctrl: Pass domain to target CPU
> x86/resctrl: Simplify call convention for MSR update functions
>
> arch/x86/kernel/cpu/resctrl/internal.h | 5 ++-
> arch/x86/kernel/cpu/resctrl/core.c | 55 +++++++++--------------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 40 ++++-------------
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 +----
> 4 files changed, 34 insertions(+), 78 deletions(-)
>
>
> base-commit: c0d848fcb09d80a5f48b99f85e448185125ef59f

Subject: [tip: x86/cache] x86/resctrl: Pass domain to target CPU

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: e3ca96e479c91d6ee657d3caa5092a6a3a620f9f
Gitweb: https://git.kernel.org/tip/e3ca96e479c91d6ee657d3caa5092a6a3a620f9f
Author: Tony Luck <[email protected]>
AuthorDate: Fri, 08 Mar 2024 13:38:45 -08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 24 Apr 2024 13:41:41 +02:00

x86/resctrl: Pass domain to target CPU

reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
to call rdt_ctrl_update() on potentially one CPU from each domain.

But this means rdt_ctrl_update() needs to figure out which domain to
apply changes to. Doing so requires a search of all domains in a resource,
which can only be done safely if cpus_lock is held. Both callers do hold
this lock, but there isn't a way for a function called on another CPU
via IPI to verify this.

Commit

c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
false positive")

removed the incorrect assertions.

Add the target domain to the msr_param structure and call
rdt_ctrl_update() for each domain separately using
smp_call_function_single(). This means that rdt_ctrl_update() doesn't
need to search for the domain and get_domain_from_cpu() can safely
assert that the cpus_lock is held since the remaining callers do not use
IPI.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: James Morse <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Maciej Wieczor-Retman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 17 +++-------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 38 ++++------------------
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 +------
4 files changed, 17 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 83e4034..acf52aa 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -362,6 +362,8 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
{
struct rdt_domain *d;

+ lockdep_assert_cpus_held();
+
list_for_each_entry(d, &r->domains, list) {
/* Find the domain that contains this CPU */
if (cpumask_test_cpu(cpu, &d->cpu_mask))
@@ -378,19 +380,11 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r)

void rdt_ctrl_update(void *arg)
{
+ struct rdt_hw_resource *hw_res;
struct msr_param *m = arg;
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
- struct rdt_resource *r = m->res;
- int cpu = smp_processor_id();
- struct rdt_domain *d;

- d = get_domain_from_cpu(cpu, r);
- if (d) {
- hw_res->msr_update(d, m, r);
- return;
- }
- pr_warn_once("cpu %d not found in any domain for resource %s\n",
- cpu, r->name);
+ hw_res = resctrl_to_arch_res(m->res);
+ hw_res->msr_update(m->dom, m, m->res);
}

/*
@@ -463,6 +457,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
hw_dom->ctrl_val = dc;
setup_default_ctrlval(r, dc);

+ m.dom = d;
m.low = 0;
m.high = hw_res->num_closid;
hw_res->msr_update(d, &m, r);
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 7997b47..165d8d4 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -272,22 +272,6 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
}
}

-static bool apply_config(struct rdt_hw_domain *hw_dom,
- struct resctrl_staged_config *cfg, u32 idx,
- cpumask_var_t cpu_mask)
-{
- struct rdt_domain *dom = &hw_dom->d_resctrl;
-
- if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
- cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
- hw_dom->ctrl_val[idx] = cfg->new_ctrl;
-
- return true;
- }
-
- return false;
-}
-
int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
@@ -302,6 +286,7 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
hw_dom->ctrl_val[idx] = cfg_val;

msr_param.res = r;
+ msr_param.dom = d;
msr_param.low = idx;
msr_param.high = idx + 1;
hw_res->msr_update(d, &msr_param, r);
@@ -315,48 +300,39 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
enum resctrl_conf_type t;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
u32 idx;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
- msr_param.res = NULL;
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
+ msr_param.res = NULL;
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &hw_dom->d_resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
continue;

idx = get_config_index(closid, t);
- if (!apply_config(hw_dom, cfg, idx, cpu_mask))
+ if (cfg->new_ctrl == hw_dom->ctrl_val[idx])
continue;
+ hw_dom->ctrl_val[idx] = cfg->new_ctrl;

if (!msr_param.res) {
msr_param.low = idx;
msr_param.high = msr_param.low + 1;
msr_param.res = r;
+ msr_param.dom = d;
} else {
msr_param.low = min(msr_param.low, idx);
msr_param.high = max(msr_param.high, idx + 1);
}
}
+ if (msr_param.res)
+ smp_call_function_any(&d->cpu_mask, rdt_ctrl_update, &msr_param, 1);
}

- if (cpumask_empty(cpu_mask))
- goto done;
-
- /* Update resource control msr on all the CPUs. */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
-done:
- free_cpumask_var(cpu_mask);
-
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1a8687f..ab2d315 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -379,11 +379,13 @@ static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
/**
* struct msr_param - set a range of MSRs from a domain
* @res: The resource to use
+ * @dom: The domain to update
* @low: Beginning index from base MSR
* @high: End index
*/
struct msr_param {
struct rdt_resource *res;
+ struct rdt_domain *dom;
u32 low;
u32 high;
};
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 011e17e..02f213f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2813,16 +2813,12 @@ static int reset_all_ctrls(struct rdt_resource *r)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
- cpumask_var_t cpu_mask;
struct rdt_domain *d;
int i;

/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();

- if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
- return -ENOMEM;
-
msr_param.res = r;
msr_param.low = 0;
msr_param.high = hw_res->num_closid;
@@ -2834,17 +2830,13 @@ static int reset_all_ctrls(struct rdt_resource *r)
*/
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
- cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);

for (i = 0; i < hw_res->num_closid; i++)
hw_dom->ctrl_val[i] = r->default_ctrl;
+ msr_param.dom = d;
+ smp_call_function_any(&d->cpu_mask, rdt_ctrl_update, &msr_param, 1);
}

- /* Update CBM on all the CPUs in cpu_mask */
- on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-
- free_cpumask_var(cpu_mask);
-
return 0;
}
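
One detail worth calling out from the ctrlmondata.c hunk above: all changed indices within a domain are folded into a single contiguous [low, high) range, so one IPI per domain covers every staged change. Condensed sketch only, with the staged-config iteration around it omitted:

	if (!msr_param.res) {		/* first changed index in this domain */
		msr_param.low = idx;
		msr_param.high = idx + 1;
		msr_param.res = r;
		msr_param.dom = d;
	} else {			/* widen the range to also cover idx */
		msr_param.low = min(msr_param.low, idx);
		msr_param.high = max(msr_param.high, idx + 1);
	}

After the per-type loop, a single smp_call_function_any(&d->cpu_mask, rdt_ctrl_update, &msr_param, 1) flushes the merged range, but only if msr_param.res was set, i.e. something actually changed in that domain.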


Subject: [tip: x86/cache] x86/resctrl: Simplify call convention for MSR update functions

The following commit has been merged into the x86/cache branch of tip:

Commit-ID: bd4955d4bc2182ccb660c9c30a4dd7f36feaf943
Gitweb: https://git.kernel.org/tip/bd4955d4bc2182ccb660c9c30a4dd7f36feaf943
Author: Tony Luck <[email protected]>
AuthorDate: Fri, 08 Mar 2024 13:38:46 -08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 24 Apr 2024 13:47:00 +02:00

x86/resctrl: Simplify call convention for MSR update functions

The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
and mba_wrmsr_amd() all take three arguments:

(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)

struct msr_param contains pointers to both struct rdt_resource and struct
rdt_domain, thus only struct msr_param is necessary.

Pass struct msr_param as a single parameter. Clean up formatting and
fix some fir tree declaration ordering.

No functional change.

Suggested-by: Reinette Chatre <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Maciej Wieczor-Retman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/resctrl/core.c | 40 ++++++++--------------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 2 +-
arch/x86/kernel/cpu/resctrl/internal.h | 3 +--
3 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index acf52aa..7751eea 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -56,14 +56,9 @@ int max_name_width, max_data_width;
*/
bool rdt_alloc_capable;

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+static void mba_wrmsr_intel(struct msr_param *m);
+static void cat_wrmsr(struct msr_param *m);
+static void mba_wrmsr_amd(struct msr_param *m);

#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)

@@ -309,12 +304,11 @@ static void rdt_get_cdp_l2_config(void)
rdt_get_cdp_config(RDT_RESOURCE_L2);
}

-static void
-mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void mba_wrmsr_amd(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -334,25 +328,22 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
return r->default_ctrl;
}

-static void
-mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r)
+static void mba_wrmsr_intel(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

/* Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
- wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], r));
+ wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], m->res));
}

-static void
-cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
+static void cat_wrmsr(struct msr_param *m)
{
+ struct rdt_hw_resource *hw_res = resctrl_to_arch_res(m->res);
+ struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(m->dom);
unsigned int i;
- struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);

for (i = m->low; i < m->high; i++)
wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
@@ -384,7 +375,7 @@ void rdt_ctrl_update(void *arg)
struct msr_param *m = arg;

hw_res = resctrl_to_arch_res(m->res);
- hw_res->msr_update(m->dom, m, m->res);
+ hw_res->msr_update(m);
}

/*
@@ -457,10 +448,11 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
hw_dom->ctrl_val = dc;
setup_default_ctrlval(r, dc);

+ m.res = r;
m.dom = d;
m.low = 0;
m.high = hw_res->num_closid;
- hw_res->msr_update(d, &m, r);
+ hw_res->msr_update(&m);
return 0;
}

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 165d8d4..b7291f6 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -289,7 +289,7 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
msr_param.dom = d;
msr_param.low = idx;
msr_param.high = idx + 1;
- hw_res->msr_update(d, &msr_param, r);
+ hw_res->msr_update(&msr_param);

return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ab2d315..f1d9268 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -445,8 +445,7 @@ struct rdt_hw_resource {
struct rdt_resource r_resctrl;
u32 num_closid;
unsigned int msr_base;
- void (*msr_update) (struct rdt_domain *d, struct msr_param *m,
- struct rdt_resource *r);
+ void (*msr_update)(struct msr_param *m);
unsigned int mon_scale;
unsigned int mbm_width;
unsigned int mbm_cfg_mask;