2023-09-14 17:34:23

by James Morse

[permalink] [raw]
Subject: [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

This series does two things: it changes resctrl to call resctrl_arch_rmid_read()
in a way that works for MPAM, and it separates the locking so that the arch code
and filesystem code don't have to share a mutex. I tried to split this as two
series, but these touch similar call sites, so it would create more work.

(What's MPAM? See the cover letter of the first series. [1])

On x86 the RMID is an independent number. MPAM's equivalent is PMG, but this
isn't an independent number - it extends the PARTID (same as CLOSID) space
with bits that aren't used to select the configuration. The monitors can
then be told to match specific PMG values, allowing monitor-groups to be
created.

But, MPAM expects the monitors to always monitor by PARTID. The
Cache-storage-utilisation counters can only work this way.
(In the MPAM spec, not setting the MATCH_PARTID bit is made CONSTRAINED
UNPREDICTABLE - which is Arm's term to mean portable software can't rely on
this.)
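
As a rough sketch, the matching rules above could be expressed as C like
this (this is not the MPAM driver; the struct and all names are
illustrative only):

struct mon_filter {
	u16  partid;		/* equivalent of resctrl's CLOSID */
	u8   pmg;		/* extends the PARTID space */
	bool match_partid;	/* must always be set */
	bool match_pmg;		/* optional: what makes monitor-groups work */
};

static bool mon_counts_traffic(const struct mon_filter *f, u16 partid, u8 pmg)
{
	if (!f->match_partid)	/* CONSTRAINED UNPREDICTABLE - don't do this */
		return false;
	if (f->partid != partid)
		return false;
	return !f->match_pmg || f->pmg == pmg;
}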

It gets worse, as some SoCs may have very few PMG bits. I've seen the
datasheet for one that has a single bit of PMG space.

To be usable, MPAM's counters always need the PARTID and the PMG.
For resctrl, this means always making the CLOSID available when the RMID
is used.

To ensure RMIDs are always unique, this series combines the CLOSID and RMID
into an index, and manages RMID based on that. For x86, the index and RMID
would always be the same.
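
For illustration, the encoding could look something like this sketch (the
series adds resctrl_arch_rmid_idx_encode() for this; the CLOSID-dependent
arm and the rmids_per_closid name below are made up, not MPAM's real code):

static inline u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
{
	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
		return closid * rmids_per_closid + rmid; /* hypothetical */

	/* x86: the index and the RMID are always the same */
	return rmid;
}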


Currently the architecture specific code in the cpuhp callbacks takes the
rdtgroup_mutex. This means the filesystem code would have to export this
lock, resulting in an ill-defined interface between the two, and the possibility
of cross-architecture lock-ordering headaches.

The second part of this series adds a domain_list_lock to protect writes to the
domain list, and protects the domain list with RCU - or cpus_read_lock().

RCU is used to allow lockless readers of the domain list. To get MPAM's monitors
working, it's very likely they'll need to be plumbed up to perf. An uncore PMU
driver would need to be a lockless reader of the domain list.
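
Roughly, given a resource r and a cpu, such a reader would look like this
(a sketch only, using the domain list members that appear elsewhere in
this series):

	struct rdt_domain *d;

	rcu_read_lock();
	list_for_each_entry_rcu(d, &r->domains, list) {
		if (cpumask_test_cpu(cpu, &d->cpu_mask))
			break;	/* found the domain for this CPU */
	}
	rcu_read_unlock();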

This series is based on v6.6-rc1, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/monitors_and_locking/v6

Bugs welcome,

Thanks,

James

[1] https://lore.kernel.org/lkml/[email protected]/
[v1] https://lore.kernel.org/all/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/
[v3] https://lore.kernel.org/r/[email protected]
[v4] https://lore.kernel.org/r/[email protected]
[v6] https://lore.kernel.org/lkml/[email protected]/


James Morse (24):
tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
x86/resctrl: Create helper for RMID allocation and mondata dir
creation
x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
x86/resctrl: Track the closid with the rmid
x86/resctrl: Access per-rmid structures by index
x86/resctrl: Allow RMID allocation to be scoped by CLOSID
x86/resctrl: Track the number of dirty RMID a CLOSID has
x86/resctrl: Use set_bit()/clear_bit() instead of open coding
x86/resctrl: Allocate the cleanest CLOSID by searching
closid_num_dirty_rmid
x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
x86/resctrl: Queue mon_event_read() instead of sending an IPI
x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
x86/resctrl: Allow arch to allocate memory needed in
resctrl_arch_rmid_read()
x86/resctrl: Make resctrl_mounted checks explicit
x86/resctrl: Move alloc/mon static keys into helpers
x86/resctrl: Make rdt_enable_key the arch's decision to switch
x86/resctrl: Add helpers for system wide mon/alloc capable
x86/resctrl: Add CPU online callback for resctrl work
x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but
cpu
x86/resctrl: Add cpu offline callback for resctrl work
x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
x86/resctrl: Separate arch and fs resctrl locks

arch/x86/include/asm/resctrl.h | 90 +++++
arch/x86/kernel/cpu/resctrl/core.c | 78 ++--
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 47 ++-
arch/x86/kernel/cpu/resctrl/internal.h | 56 ++-
arch/x86/kernel/cpu/resctrl/monitor.c | 434 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 15 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 345 ++++++++++++-----
include/linux/resctrl.h | 43 ++-
include/linux/tick.h | 9 +-
9 files changed, 857 insertions(+), 260 deletions(-)

--
2.39.2


2023-09-14 17:34:31

by James Morse

[permalink] [raw]
Subject: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

rmid_ptrs[] is allocated from dom_data_init() but never free()d.

While the exit text ends up in the linker script's DISCARD section,
the direction of travel is for resctrl to be/have loadable modules.

Add resctrl_exit_mon_l3_config() to clean up any memory allocated
by rdt_get_mon_l3_config().

There is no reason to backport this to a stable kernel.

Signed-off-by: James Morse <[email protected]>
---
Changes since v5:
* This patch is new
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 10 ++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 +++++
3 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 85ceaf9a31ac..57cf1e6a57bd 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -537,6 +537,7 @@ void closid_free(int closid);
int alloc_rmid(void);
void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
+void resctrl_exit_mon_l3_config(struct rdt_resource *r);
bool __init rdt_cpu_has(int flag);
void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ded1fc7cb7cb..cfb3f632a4b2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
return 0;
}

+void resctrl_exit_mon_l3_config(struct rdt_resource *r)
+{
+ mutex_lock(&rdtgroup_mutex);
+
+ kfree(rmid_ptrs);
+ rmid_ptrs = NULL;
+
+ mutex_unlock(&rdtgroup_mutex);
+}
+
static struct mon_evt llc_occupancy_event = {
.name = "llc_occupancy",
.evtid = QOS_L3_OCCUP_EVENT_ID,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 725344048f85..a2158c266e41 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)

void __exit rdtgroup_exit(void)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+ if (r->mon_capable)
+ resctrl_exit_mon_l3_config(r);
+
debugfs_remove_recursive(debugfs_resctrl);
unregister_filesystem(&rdt_fs_type);
sysfs_remove_mount_point(fs_kobj, "resctrl");
--
2.39.2

2023-09-14 17:34:43

by James Morse

[permalink] [raw]
Subject: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
used for different control groups.

This means once a CLOSID is allocated, all its monitoring ids may still be
dirty, and held in limbo.

Instead of allocating the first free CLOSID, on architectures where
CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is enabled, search
closid_num_dirty_rmid[] to find the cleanest CLOSID.

The CLOSID found is returned to closid_alloc() for the free list
to be updated.

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-By: Peter Newman <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v4:
* Dropped stale section from comment
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/monitor.c | 42 ++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 +++++++++---
3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ad6e874d9ed2..f06d3d3e0808 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
void __init thread_throttle_mode_init(void);
void __init mbm_config_rftype_init(const char *config);
void rdt_staged_configs_clear(void);
+bool closid_allocated(unsigned int closid);
+int resctrl_find_cleanest_closid(void);

#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 0c783301d106..0bbed8c62d42 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
return ERR_PTR(-ENOSPC);
}

+/**
+ * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
+ * RMID are clean, or the CLOSID that has
+ * the most clean RMID.
+ *
+ * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
+ * may not be able to allocate clean RMID. To avoid this the allocator will
+ * choose the CLOSID with the most clean RMID.
+ *
+ * When the CLOSID and RMID are independent numbers, the first free CLOSID will
+ * be returned.
+ */
+int resctrl_find_cleanest_closid(void)
+{
+ u32 cleanest_closid = ~0, iter_num_dirty;
+ int i = 0;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+ return -EIO;
+
+ for (i = 0; i < closids_supported(); i++) {
+ if (closid_allocated(i))
+ continue;
+
+ iter_num_dirty = closid_num_dirty_rmid[i];
+ if (iter_num_dirty == 0)
+ return i;
+
+ if (cleanest_closid == ~0)
+ cleanest_closid = i;
+
+ if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
+ cleanest_closid = i;
+ }
+
+ if (cleanest_closid == ~0)
+ return -ENOSPC;
+ return cleanest_closid;
+}
+
/*
* For MPAM the RMID value is not unique, and has to be considered with
* the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index fa449ee0d1a7..1f8f1c417a4b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -132,11 +132,20 @@ static void closid_init(void)

static int closid_alloc(void)
{
- u32 closid = ffs(closid_free_map);
+ u32 closid;
+ int err;

- if (closid == 0)
- return -ENOSPC;
- closid--;
+ if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+ err = resctrl_find_cleanest_closid();
+ if (err < 0)
+ return err;
+ closid = err;
+ } else {
+ closid = ffs(closid_free_map);
+ if (closid == 0)
+ return -ENOSPC;
+ closid--;
+ }
clear_bit(closid, &closid_free_map);

return closid;
@@ -154,7 +163,7 @@ void closid_free(int closid)
* Return: true if @closid is currently associated with a resource group,
* false if @closid is free
*/
-static bool closid_allocated(unsigned int closid)
+bool closid_allocated(unsigned int closid)
{
return !test_bit(closid, &closid_free_map);
}
--
2.39.2

2023-09-14 17:34:43

by James Morse

[permalink] [raw]
Subject: [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers

When switching tasks, the CLOSID and RMID that the new task should
use are stored in struct task_struct. For x86 the CLOSID known by resctrl,
the value in task_struct, and the value written to the CPU register are
all the same thing.

MPAM's CPU interface has two different PARTIDs: one for data accesses,
the other for instruction fetch. Storing resctrl's CLOSID value in
struct task_struct implies the arch code knows whether resctrl is using
CDP.

Move the matching and setting of the struct task_struct properties
to use helpers. This allows arm64 to store the hardware format of
the register, instead of having to convert it each time.

__rdtgroup_move_task()'s use of READ_ONCE()/WRITE_ONCE() ensures torn
values aren't seen as another CPU may schedule the task being moved
while the value is being changed. MPAM has an additional corner-case
here as the PMG bits extend the PARTID space. If the scheduler sees a
new-CLOSID but old-RMID, the task will dirty an RMID that the limbo code
is not watching, causing an inaccurate count. x86's RMIDs are independent
values, so the limbo code will still be watching the old-RMID in this
circumstance.
To avoid this, arm64 needs the CLOSID and RMID to be WRITE_ONCE()d
together; both values must be provided together.

Because MPAM's RMID values are not unique, the CLOSID must be provided
when matching the RMID.
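
For a flavour of why the pair matters, an MPAM-like arch could implement
the setter by packing both values into one word, so a concurrent reader
can never observe a new CLOSID with an old RMID. This is only a sketch -
the masks and the thread field are invented names, not arm64's real code:

static inline void resctrl_arch_set_closid_rmid(struct task_struct *tsk,
						u32 closid, u32 rmid)
{
	u64 regval = FIELD_PREP(PARTID_MASK, closid) |	/* hypothetical mask */
		     FIELD_PREP(PMG_MASK, rmid);	/* hypothetical mask */

	/* A single store: the CLOSID and RMID change together, never torn */
	WRITE_ONCE(tsk->thread.mpam_regval, regval);	/* hypothetical field */
}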

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-By: Peter Newman <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* __rdtgroup_move_task() changed to set CLOSID from different CLOSID place
depending on group type
---
arch/x86/include/asm/resctrl.h | 18 ++++++++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++++++++++----------
2 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index db4c84dde2d5..1d274dbabc44 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -95,6 +95,24 @@ static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
return val * scale;
}

+static inline void resctrl_arch_set_closid_rmid(struct task_struct *tsk,
+ u32 closid, u32 rmid)
+{
+ WRITE_ONCE(tsk->closid, closid);
+ WRITE_ONCE(tsk->rmid, rmid);
+}
+
+static inline bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+ return READ_ONCE(tsk->closid) == closid;
+}
+
+static inline bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 ignored,
+ u32 rmid)
+{
+ return READ_ONCE(tsk->rmid) == rmid;
+}
+
static inline void resctrl_sched_in(struct task_struct *tsk)
{
if (static_branch_likely(&rdt_enable_key))
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1f8f1c417a4b..64a0f71f6a5d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -97,7 +97,7 @@ void rdt_staged_configs_clear(void)
*
* Using a global CLOSID across all resources has some advantages and
* some drawbacks:
- * + We can simply set "current->closid" to assign a task to a resource
+ * + We can simply set current's closid to assign a task to a resource
* group.
* + Context switch code can avoid extra memory references deciding which
* CLOSID to load into the PQR_ASSOC MSR
@@ -563,14 +563,26 @@ static void update_task_closid_rmid(struct task_struct *t)
_update_task_closid_rmid(t);
}

+static bool task_in_rdtgroup(struct task_struct *tsk, struct rdtgroup *rdtgrp)
+{
+ u32 closid, rmid = rdtgrp->mon.rmid;
+
+ if (rdtgrp->type == RDTCTRL_GROUP)
+ closid = rdtgrp->closid;
+ else if (rdtgrp->type == RDTMON_GROUP)
+ closid = rdtgrp->mon.parent->closid;
+ else
+ return false;
+
+ return resctrl_arch_match_closid(tsk, closid) &&
+ resctrl_arch_match_rmid(tsk, closid, rmid);
+}
+
static int __rdtgroup_move_task(struct task_struct *tsk,
struct rdtgroup *rdtgrp)
{
/* If the task is already in rdtgrp, no need to move the task. */
- if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
- tsk->rmid == rdtgrp->mon.rmid) ||
- (rdtgrp->type == RDTMON_GROUP && tsk->rmid == rdtgrp->mon.rmid &&
- tsk->closid == rdtgrp->mon.parent->closid))
+ if (task_in_rdtgroup(tsk, rdtgrp))
return 0;

/*
@@ -581,19 +593,19 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
* For monitor groups, can move the tasks only from
* their parent CTRL group.
*/
-
- if (rdtgrp->type == RDTCTRL_GROUP) {
- WRITE_ONCE(tsk->closid, rdtgrp->closid);
- WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
- } else if (rdtgrp->type == RDTMON_GROUP) {
- if (rdtgrp->mon.parent->closid == tsk->closid) {
- WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
- } else {
- rdt_last_cmd_puts("Can't move task to different control group\n");
- return -EINVAL;
- }
+ if (rdtgrp->type == RDTMON_GROUP &&
+ !resctrl_arch_match_closid(tsk, rdtgrp->mon.parent->closid)) {
+ rdt_last_cmd_puts("Can't move task to different control group\n");
+ return -EINVAL;
}

+ if (rdtgrp->type == RDTMON_GROUP)
+ resctrl_arch_set_closid_rmid(tsk, rdtgrp->mon.parent->closid,
+ rdtgrp->mon.rmid);
+ else
+ resctrl_arch_set_closid_rmid(tsk, rdtgrp->closid,
+ rdtgrp->mon.rmid);
+
/*
* Ensure the task's closid and rmid are written before determining if
* the task is current that will decide if it will be interrupted.
@@ -615,14 +627,15 @@ static int __rdtgroup_move_task(struct task_struct *tsk,

static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
{
- return (rdt_alloc_capable &&
- (r->type == RDTCTRL_GROUP) && (t->closid == r->closid));
+ return (rdt_alloc_capable && (r->type == RDTCTRL_GROUP) &&
+ resctrl_arch_match_closid(t, r->closid));
}

static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r)
{
- return (rdt_mon_capable &&
- (r->type == RDTMON_GROUP) && (t->rmid == r->mon.rmid));
+ return (rdt_mon_capable && (r->type == RDTMON_GROUP) &&
+ resctrl_arch_match_rmid(t, r->mon.parent->closid,
+ r->mon.rmid));
}

/**
@@ -822,7 +835,7 @@ int proc_resctrl_show(struct seq_file *s, struct pid_namespace *ns,
rdtg->mode != RDT_MODE_EXCLUSIVE)
continue;

- if (rdtg->closid != tsk->closid)
+ if (!resctrl_arch_match_closid(tsk, rdtg->closid))
continue;

seq_printf(s, "res:%s%s\n", (rdtg == &rdtgroup_default) ? "/" : "",
@@ -830,7 +843,8 @@ int proc_resctrl_show(struct seq_file *s, struct pid_namespace *ns,
seq_puts(s, "mon:");
list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
mon.crdtgrp_list) {
- if (tsk->rmid != crg->mon.rmid)
+ if (!resctrl_arch_match_rmid(tsk, crg->mon.parent->closid,
+ crg->mon.rmid))
continue;
seq_printf(s, "%s", crg->kn->name);
break;
@@ -2691,8 +2705,8 @@ static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
for_each_process_thread(p, t) {
if (!from || is_closid_match(t, from) ||
is_rmid_match(t, from)) {
- WRITE_ONCE(t->closid, to->closid);
- WRITE_ONCE(t->rmid, to->mon.rmid);
+ resctrl_arch_set_closid_rmid(t, to->closid,
+ to->mon.rmid);

/*
* Order the closid/rmid stores above before the loads
--
2.39.2

2023-09-14 17:34:44

by James Morse

[permalink] [raw]
Subject: [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work

The resctrl architecture specific code may need to free a domain when
a CPU goes offline. It also needs to reset the CPU's PQR_ASSOC register.
Amongst other things, the resctrl filesystem code needs to clear this
CPU from the cpu_mask of any control and monitor groups.

Currently this is all done in core.c and called from
resctrl_offline_cpu(), making the split between architecture and
filesystem code unclear.

Move the filesystem work to remove the CPU from the control and monitor
groups into a filesystem helper called resctrl_offline_cpu(), and rename
the one in core.c to resctrl_arch_offline_cpu().

The rdtgroup_mutex is unlocked and locked again in the call in
preparation for changing the locking rules for the architecture
code.

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-By: Peter Newman <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/kernel/cpu/resctrl/core.c | 25 +++++--------------------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 24 ++++++++++++++++++++++++
include/linux/resctrl.h | 1 +
3 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 37aa124f1e4c..00b1592fd059 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -623,31 +623,15 @@ static int resctrl_arch_online_cpu(unsigned int cpu)
return 0;
}

-static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
+static int resctrl_arch_offline_cpu(unsigned int cpu)
{
- struct rdtgroup *cr;
-
- list_for_each_entry(cr, &r->mon.crdtgrp_list, mon.crdtgrp_list) {
- if (cpumask_test_and_clear_cpu(cpu, &cr->cpu_mask)) {
- break;
- }
- }
-}
-
-static int resctrl_offline_cpu(unsigned int cpu)
-{
- struct rdtgroup *rdtgrp;
struct rdt_resource *r;

mutex_lock(&rdtgroup_mutex);
+ resctrl_offline_cpu(cpu);
+
for_each_capable_rdt_resource(r)
domain_remove_cpu(cpu, r);
- list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
- if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
- clear_childcpus(rdtgrp, cpu);
- break;
- }
- }
clear_closid_rmid(cpu);
mutex_unlock(&rdtgroup_mutex);

@@ -970,7 +954,8 @@ static int __init resctrl_late_init(void)

state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"x86/resctrl/cat:online:",
- resctrl_arch_online_cpu, resctrl_offline_cpu);
+ resctrl_arch_online_cpu,
+ resctrl_arch_offline_cpu);
if (state < 0)
return state;

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 49f100c73838..f06a80d8fa3b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3878,6 +3878,30 @@ void resctrl_online_cpu(unsigned int cpu)
cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
}

+static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
+{
+ struct rdtgroup *cr;
+
+ list_for_each_entry(cr, &r->mon.crdtgrp_list, mon.crdtgrp_list) {
+ if (cpumask_test_and_clear_cpu(cpu, &cr->cpu_mask))
+ break;
+ }
+}
+
+void resctrl_offline_cpu(unsigned int cpu)
+{
+ struct rdtgroup *rdtgrp;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+ if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
+ clear_childcpus(rdtgrp, cpu);
+ break;
+ }
+ }
+}
+
/*
* rdtgroup_init - rdtgroup initialization
*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 0888d1975161..74886cda5f66 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -226,6 +226,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
void resctrl_online_cpu(unsigned int cpu);
+void resctrl_offline_cpu(unsigned int cpu);

/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
--
2.39.2

2023-09-14 17:36:36

by James Morse

[permalink] [raw]
Subject: [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow

The limbo and overflow code picks a CPU to use from the domain's list
of online CPUs. Work is then scheduled on these CPUs to maintain
the limbo list and any counters that may overflow.

cpumask_any() may pick a CPU that is marked nohz_full, which will
either penalise the work that CPU was dedicated to, or delay the
processing of limbo list or counters that may overflow. Perhaps
indefinitely. Delaying the overflow handling will skew the bandwidth
values calculated by mba_sc, which expects to be called once a second.

Add cpumask_any_housekeeping() as a replacement for cpumask_any()
that prefers housekeeping CPUs. This helper will still return
a nohz_full CPU if that is the only option. The CPU to use is
re-evaluated each time the limbo/overflow work runs. This ensures
the work will move off a nohz_full CPU once a housekeeping CPU is
available.

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-By: Peter Newman <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v3:
* typos fixed

Changes since v4:
* Made temporary variables unsigned

Changes since v5:
* Restructured cpumask_any_housekeeping() to avoid later churn.
---
arch/x86/kernel/cpu/resctrl/internal.h | 24 ++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 17 ++++++++++++-----
2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f06d3d3e0808..37bb3de37a4a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -7,6 +7,7 @@
#include <linux/kernfs.h>
#include <linux/fs_context.h>
#include <linux/jump_label.h>
+#include <linux/tick.h>
#include <asm/resctrl.h>

#define L3_QOS_CDP_ENABLE 0x01ULL
@@ -55,6 +56,29 @@
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)

+/**
+ * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
+ * aren't marked nohz_full
+ * @mask: The mask to pick a CPU from.
+ *
+ * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
+ * nohz_full, these are preferred.
+ */
+static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
+{
+ unsigned int cpu, hk_cpu;
+
+ cpu = cpumask_any(mask);
+ if (!tick_nohz_full_cpu(cpu))
+ return cpu;
+
+ hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
+ if (hk_cpu < nr_cpu_ids)
+ cpu = hk_cpu;
+
+ return cpu;
+}
+
struct rdt_fs_context {
struct kernfs_fs_context kfc;
bool enable_cdpl2;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 0bbed8c62d42..993837e46db1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -782,9 +782,9 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
void cqm_handle_limbo(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
- int cpu = smp_processor_id();
struct rdt_resource *r;
struct rdt_domain *d;
+ int cpu;

mutex_lock(&rdtgroup_mutex);

@@ -793,8 +793,10 @@ void cqm_handle_limbo(struct work_struct *work)

__check_limbo(d, false);

- if (has_busy_rmid(d))
+ if (has_busy_rmid(d)) {
+ cpu = cpumask_any_housekeeping(&d->cpu_mask);
schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
+ }

mutex_unlock(&rdtgroup_mutex);
}
@@ -804,7 +806,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;

- cpu = cpumask_any(&dom->cpu_mask);
+ cpu = cpumask_any_housekeeping(&dom->cpu_mask);
dom->cqm_work_cpu = cpu;

schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
@@ -814,10 +816,10 @@ void mbm_handle_overflow(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
struct rdtgroup *prgrp, *crgrp;
- int cpu = smp_processor_id();
struct list_head *head;
struct rdt_resource *r;
struct rdt_domain *d;
+ int cpu;

mutex_lock(&rdtgroup_mutex);

@@ -838,6 +840,11 @@ void mbm_handle_overflow(struct work_struct *work)
update_mba_bw(prgrp, d);
}

+ /*
+ * Re-check for housekeeping CPUs. This allows the overflow handler to
+ * move off a nohz_full CPU quickly.
+ */
+ cpu = cpumask_any_housekeeping(&d->cpu_mask);
schedule_delayed_work_on(cpu, &d->mbm_over, delay);

out_unlock:
@@ -851,7 +858,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)

if (!static_branch_likely(&rdt_mon_enable_key))
return;
- cpu = cpumask_any(&dom->cpu_mask);
+ cpu = cpumask_any_housekeeping(&dom->cpu_mask);
dom->mbm_work_cpu = cpu;
schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
}
--
2.39.2

2023-09-14 17:37:01

by James Morse

[permalink] [raw]
Subject: [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit

The rdt_enable_key is switched when resctrl is mounted, and used to
prevent a second mount of the filesystem. It also enables the
architecture's context switch code.

This requires another architecture to have the same set of static-keys,
as resctrl depends on them too. The existing users of these static-keys
are implicitly also checking if the filesystem is mounted.

Make the resctrl_mounted checks explicit: resctrl can keep track of
whether it has been mounted once. This doesn't need to be combined with
whether the arch code is context switching the CLOSID.

rdt_mon_enable_key is never used just to test that resctrl is mounted,
but it does have this implication. Add a resctrl_mounted check to all uses
of rdt_mon_enable_key. This will allow rdt_mon_enable_key to be swapped
with a helper in a subsequent patch.

This will allow the static-key changing to be moved behind resctrl_arch_
calls.
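
For a flavour of where this goes (a sketch based on this series' later
"Add helpers for system wide mon/alloc capable" patch; the exact helper
there may differ):

static inline bool resctrl_arch_mon_capable(void)
{
	return rdt_mon_capable;
}

so that callers end up reading:

	if (resctrl_mounted && resctrl_arch_mon_capable())
		...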

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-By: Peter Newman <[email protected]>
Signed-off-by: James Morse <[email protected]>

---
Changes since v3:
* Removed a newline.
* Rephrased commit message

Changes since v4:
* Rephrased comment.
---
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 12 ++++++++++--
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 23 +++++++++++++++++------
3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 66d9ebb5e03a..0bcfb2da9109 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -143,6 +143,7 @@ extern bool rdt_alloc_capable;
extern bool rdt_mon_capable;
extern unsigned int rdt_mon_features;
extern struct list_head resctrl_schema_all;
+extern bool resctrl_mounted;

enum rdt_group_type {
RDTCTRL_GROUP = 0,
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 28a2c8765faf..7bbe3d91b1f1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -836,7 +836,11 @@ void mbm_handle_overflow(struct work_struct *work)

mutex_lock(&rdtgroup_mutex);

- if (!static_branch_likely(&rdt_mon_enable_key))
+ /*
+ * If the filesystem has been unmounted this work no longer needs to
+ * run.
+ */
+ if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
goto out_unlock;

r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -869,7 +873,11 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
unsigned long delay = msecs_to_jiffies(delay_ms);
int cpu;

- if (!static_branch_likely(&rdt_mon_enable_key))
+ /*
+ * When a domain comes online there is no guarantee the filesystem is
+ * mounted. If not, there is no need to catch counter overflow.
+ */
+ if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
return;
cpu = cpumask_any_housekeeping(&dom->cpu_mask);
dom->mbm_work_cpu = cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 64a0f71f6a5d..5a7d6f6b5018 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -42,6 +42,9 @@ LIST_HEAD(rdt_all_groups);
/* list of entries for the schemata file */
LIST_HEAD(resctrl_schema_all);

+/* The filesystem can only be mounted once. */
+bool resctrl_mounted;
+
/* Kernel fs node for "info" directory under root */
static struct kernfs_node *kn_info;

@@ -819,7 +822,7 @@ int proc_resctrl_show(struct seq_file *s, struct pid_namespace *ns,
mutex_lock(&rdtgroup_mutex);

/* Return empty if resctrl has not been mounted. */
- if (!static_branch_unlikely(&rdt_enable_key)) {
+ if (!resctrl_mounted) {
seq_puts(s, "res:\nmon:\n");
goto unlock;
}
@@ -2495,7 +2498,7 @@ static int rdt_get_tree(struct fs_context *fc)
/*
* resctrl file system can only be mounted once.
*/
- if (static_branch_unlikely(&rdt_enable_key)) {
+ if (resctrl_mounted) {
ret = -EBUSY;
goto out;
}
@@ -2543,8 +2546,10 @@ static int rdt_get_tree(struct fs_context *fc)
if (rdt_mon_capable)
static_branch_enable_cpuslocked(&rdt_mon_enable_key);

- if (rdt_alloc_capable || rdt_mon_capable)
+ if (rdt_alloc_capable || rdt_mon_capable) {
static_branch_enable_cpuslocked(&rdt_enable_key);
+ resctrl_mounted = true;
+ }

if (is_mbm_enabled()) {
r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -2815,6 +2820,7 @@ static void rdt_kill_sb(struct super_block *sb)
static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
static_branch_disable_cpuslocked(&rdt_mon_enable_key);
static_branch_disable_cpuslocked(&rdt_enable_key);
+ resctrl_mounted = false;
kernfs_kill_sb(sb);
mutex_unlock(&rdtgroup_mutex);
cpus_read_unlock();
@@ -3774,7 +3780,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
* If resctrl is mounted, remove all the
* per domain monitor data directories.
*/
- if (static_branch_unlikely(&rdt_mon_enable_key))
+ if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
rmdir_mondata_subdir_allrdtgrp(r, d->id);

if (is_mbm_enabled())
@@ -3851,8 +3857,13 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
if (is_llc_occupancy_enabled())
INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);

- /* If resctrl is mounted, add per domain monitor data directories. */
- if (static_branch_unlikely(&rdt_mon_enable_key))
+ /*
+ * If the filesystem is not mounted then only the default resource group
+ * exists. Creation of its directories is deferred until mount time
+ * by rdt_get_tree() calling mkdir_mondata_all().
+ * If resctrl is mounted, add per domain monitor data directories.
+ */
+ if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
mkdir_mondata_subdir_allrdtgrp(r, d);

return 0;
--
2.39.2

2023-09-14 17:37:09

by James Morse

[permalink] [raw]
Subject: [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers

resctrl enables three static keys depending on the features it has enabled.
Another architecture's context switch code may look different; any
static keys that control it should be buried behind helpers.

Move the alloc/mon logic into arch-specific helpers as a preparatory step
for making the rdt_enable_key's status something the arch code decides.

This means other architectures don't have to mirror the static keys.

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-By: Peter Newman <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
arch/x86/include/asm/resctrl.h | 20 ++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 5 -----
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 29c4cc343787..3c9137b6ad4f 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -42,6 +42,26 @@ DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);

+static inline void resctrl_arch_enable_alloc(void)
+{
+ static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_disable_alloc(void)
+{
+ static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_enable_mon(void)
+{
+ static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+}
+
+static inline void resctrl_arch_disable_mon(void)
+{
+ static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+}
+
/*
* __resctrl_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR
*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 0bcfb2da9109..ef50789e2b44 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -93,9 +93,6 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
return container_of(kfc, struct rdt_fs_context, kfc);
}

-DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
-DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
-
/**
* struct mon_evt - Entry in the event list of a resource
* @evtid: event id
@@ -453,8 +450,6 @@ extern struct mutex rdtgroup_mutex;

extern struct rdt_hw_resource rdt_resources_all[];
extern struct rdtgroup rdtgroup_default;
-DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
-
extern struct dentry *debugfs_resctrl;

enum resctrl_res_level {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5a7d6f6b5018..4c0e012142e2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2542,9 +2542,9 @@ static int rdt_get_tree(struct fs_context *fc)
goto out_psl;

if (rdt_alloc_capable)
- static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+ resctrl_arch_enable_alloc();
if (rdt_mon_capable)
- static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+ resctrl_arch_enable_mon();

if (rdt_alloc_capable || rdt_mon_capable) {
static_branch_enable_cpuslocked(&rdt_enable_key);
@@ -2817,8 +2817,8 @@ static void rdt_kill_sb(struct super_block *sb)
rdt_pseudo_lock_release();
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
schemata_list_destroy();
- static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
- static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+ resctrl_arch_disable_alloc();
+ resctrl_arch_disable_mon();
static_branch_disable_cpuslocked(&rdt_enable_key);
resctrl_mounted = false;
kernfs_kill_sb(sb);
--
2.39.2

2023-09-14 20:15:03

by James Morse

[permalink] [raw]
Subject: [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID

MPAM's RMID values are not unique unless the CLOSID is considered as well.

alloc_rmid() expects the RMID to be an independent number.

Pass the CLOSID in to alloc_rmid(). Use this to compare indexes when
allocating. If the CLOSID is not relevant to the index, this ends up
comparing the free RMID with itself, and the first free entry will be
used. With MPAM the CLOSID is included in the index, so this becomes a
walk of the free RMID entries, until one that matches the supplied
CLOSID is found.

Reviewed-by: Shaopeng Tan <[email protected]>
Tested-by: Shaopeng Tan <[email protected]>
Tested-By: Peter Newman <[email protected]>
Signed-off-by: James Morse <[email protected]>
---
Changes since v2:
* Rephrased comment in resctrl_find_free_rmid() to describe this in terms of
list_first_entry()
* Rephrased comment above alloc_rmid()

Changes since v3:
* Flipped conditions in alloc_rmid()

Changes since v4:
* Typo in comment

Changes since v5:
* Reworded two comments.
---
arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 52 +++++++++++++++++------
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
4 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ab96af8d9953..ad6e874d9ed2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -535,7 +535,7 @@ void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
int closids_supported(void);
void closid_free(int closid);
-int alloc_rmid(void);
+int alloc_rmid(u32 closid);
void free_rmid(u32 closid, u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);
void resctrl_exit_mon_l3_config(struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index be0b7cb6e1f5..d286aba1ee63 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -345,24 +345,50 @@ bool has_busy_rmid(struct rdt_domain *d)
return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
}

-/*
- * As of now the RMIDs allocation is global.
- * However we keep track of which packages the RMIDs
- * are used to optimize the limbo list management.
- */
-int alloc_rmid(void)
+static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
{
- struct rmid_entry *entry;
-
- lockdep_assert_held(&rdtgroup_mutex);
+ struct rmid_entry *itr;
+ u32 itr_idx, cmp_idx;

if (list_empty(&rmid_free_lru))
- return rmid_limbo_count ? -EBUSY : -ENOSPC;
+ return rmid_limbo_count ? ERR_PTR(-EBUSY) : ERR_PTR(-ENOSPC);
+
+ list_for_each_entry(itr, &rmid_free_lru, list) {
+ /*
+ * Get the index of this free RMID, and the index it would need
+ * to be if it were used with this CLOSID.
+ * If the CLOSID is irrelevant on this architecture, the two
+ * index values are always same on every entry and thus the
+ * very first entry will be returned.
+
+ */
+ itr_idx = resctrl_arch_rmid_idx_encode(itr->closid, itr->rmid);
+ cmp_idx = resctrl_arch_rmid_idx_encode(closid, itr->rmid);
+
+ if (itr_idx == cmp_idx)
+ return itr;
+ }
+
+ return ERR_PTR(-ENOSPC);
+}
+
+/*
+ * For MPAM the RMID value is not unique, and has to be considered with
+ * the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
+ * allows all domains to be managed by a single free list.
+ * Each domain also has a rmid_busy_llc to reduce the work of the limbo handler.
+ */
+int alloc_rmid(u32 closid)
+{
+ struct rmid_entry *entry;
+
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ entry = resctrl_find_free_rmid(closid);
+ if (IS_ERR(entry))
+ return PTR_ERR(entry);

- entry = list_first_entry(&rmid_free_lru,
- struct rmid_entry, list);
list_del(&entry->list);
-
return entry->rmid;
}

diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 65bee6f11015..d8f44113ed1f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -777,7 +777,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
int ret;

if (rdt_mon_capable) {
- ret = alloc_rmid();
+ ret = alloc_rmid(rdtgrp->closid);
if (ret < 0) {
rdt_last_cmd_puts("Out of RMIDs\n");
return ret;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 61f338d96906..ac1a6437469f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3172,7 +3172,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
if (!rdt_mon_capable)
return 0;

- ret = alloc_rmid();
+ ret = alloc_rmid(rdtgrp->closid);
if (ret < 0) {
rdt_last_cmd_puts("Out of RMIDs\n");
return ret;
--
2.39.2

2023-09-27 07:50:41

by Shaopeng Tan (Fujitsu)

[permalink] [raw]
Subject: RE: [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hello James,

I reviewed this patch series (v6) and ran the resctrl selftests on an
Intel(R) Xeon(R) Gold 6254 CPU with nohz_full enabled/disabled;
there were no problems.

<reviewed-by:[email protected]>
<tested-by:[email protected]>



2023-09-29 16:23:37

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Hello!

On 27/09/2023 08:38, Shaopeng Tan (Fujitsu) wrote:
> I reviewed this patch series(v6) and ran resctrl selftest on
> Intel(R) Xeon(R) Gold 6254 CPU with nohz_full enabled/disabled,
> there is no problem.
>
> <reviewed-by:[email protected]>
> <tested-by:[email protected]>

Thanks for your testing and review!

James

2023-10-02 17:30:07

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 725344048f85..a2158c266e41 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>
> void __exit rdtgroup_exit(void)
> {
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> + if (r->mon_capable)
> + resctrl_exit_mon_l3_config(r);
> +
> debugfs_remove_recursive(debugfs_resctrl);
> unregister_filesystem(&rdt_fs_type);
> sysfs_remove_mount_point(fs_kobj, "resctrl");

You did not respond to me when I requested that this be done differently [1].
Without a response letting me know the faults of my proposal or following the
recommendation, I conclude that my feedback was ignored.

Reinette

[1] https://lore.kernel.org/lkml/[email protected]/

2023-10-03 21:13:54

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index be0b7cb6e1f5..d286aba1ee63 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -345,24 +345,50 @@ bool has_busy_rmid(struct rdt_domain *d)
> return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
> }
>
> -/*
> - * As of now the RMIDs allocation is global.
> - * However we keep track of which packages the RMIDs
> - * are used to optimize the limbo list management.
> - */
> -int alloc_rmid(void)
> +static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
> {
> - struct rmid_entry *entry;
> -
> - lockdep_assert_held(&rdtgroup_mutex);
> + struct rmid_entry *itr;
> + u32 itr_idx, cmp_idx;
>
> if (list_empty(&rmid_free_lru))
> - return rmid_limbo_count ? -EBUSY : -ENOSPC;
> + return rmid_limbo_count ? ERR_PTR(-EBUSY) : ERR_PTR(-ENOSPC);
> +
> + list_for_each_entry(itr, &rmid_free_lru, list) {
> + /*
> + * Get the index of this free RMID, and the index it would need
> + * to be if it were used with this CLOSID.
> + * If the CLOSID is irrelevant on this architecture, the two
> + * index values are always same on every entry and thus the

"are always same" -> "are always the same"?

> + * very first entry will be returned.
> +

Stray empty line.

> + */
> + itr_idx = resctrl_arch_rmid_idx_encode(itr->closid, itr->rmid);
> + cmp_idx = resctrl_arch_rmid_idx_encode(closid, itr->rmid);
> +
> + if (itr_idx == cmp_idx)
> + return itr;
> + }
> +
> + return ERR_PTR(-ENOSPC);
> +}
> +
> +/*

Rest of the patch looks good.

Reinette

2023-10-03 21:15:03

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
> used for different control groups.
>
> This means once a CLOSID is allocated, all its monitoring ids may still be
> dirty, and held in limbo.
>
> Instead of allocating the first free CLOSID, on architectures where
> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>
> The CLOSID found is returned to closid_alloc() for the free list
> to be updated.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>

Reviewed-by: Reinette Chatre <[email protected]>

Reinette

2023-10-03 21:15:53

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> When switching tasks, the CLOSID and RMID that the new task should
> use are stored in struct task_struct. For x86 the CLOSID known by resctrl,
> the value in task_struct, and the value written to the CPU register are
> all the same thing.
>
> MPAM's CPU interface has two different PARTID's one for data accesses
> the other for instruction fetch. Storing resctrl's CLOSID value in
> struct task_struct implies the arch code knows whether resctrl is using
> CDP.
>
> Move the matching and setting of the struct task_struct properties
> to use helpers. This allows arm64 to store the hardware format of
> the register, instead of having to convert it each time.
>
> __rdtgroup_move_task()s use of READ_ONCE()/WRITE_ONCE() ensures torn
> values aren't seen as another CPU may schedule the task being moved
> while the value is being changed. MPAM has an additional corner-case
> here as the PMG bits extend the PARTID space. If the scheduler sees a
> new-CLOSID but old-RMID, the task will dirty an RMID that the limbo code
> is not watching causing an inaccurate count. x86's RMID are independent
> values, so the limbo code will still be watching the old-RMID in this
> circumstance.
> To avoid this, arm64 needs both the CLOSID/RMID WRITE_ONCE()d together.
> Both values must be provided together.
>
> Because MPAM's RMID values are not unique, the CLOSID must be provided
> when matching the RMID.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---

Reviewed-by: Reinette Chatre <[email protected]>

Reinette

2023-10-03 21:16:41

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The limbo and overflow code picks a CPU to use from the domain's list
> of online CPUs. Work is then scheduled on these CPUs to maintain
> the limbo list and any counters that may overflow.
>
> cpumask_any() may pick a CPU that is marked nohz_full, which will
> either penalise the work that CPU was dedicated to, or delay the
> processing of limbo list or counters that may overflow. Perhaps
> indefinitely. Delaying the overflow handling will skew the bandwidth
> values calculated by mba_sc, which expects to be called once a second.
>
> Add cpumask_any_housekeeping() as a replacement for cpumask_any()
> that prefers housekeeping CPUs. This helper will still return
> a nohz_full CPU if that is the only option. The CPU to use is
> re-evaluated each time the limbo/overflow work runs. This ensures
> the work will move off a nohz_full CPU once a housekeeping CPU is
> available.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v3:
> * typos fixed
>
> Changes since v4:
> * Made temporary variables unsigned
>
> Changes since v5:
> * Restructured cpumask_any_housekeeping() to avoid later churn.
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 24 ++++++++++++++++++++++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 17 ++++++++++++-----
> 2 files changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index f06d3d3e0808..37bb3de37a4a 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -7,6 +7,7 @@
> #include <linux/kernfs.h>
> #include <linux/fs_context.h>
> #include <linux/jump_label.h>
> +#include <linux/tick.h>
> #include <asm/resctrl.h>
>

Please maintain the empty line between groups of headers.


...

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 0bbed8c62d42..993837e46db1 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -782,9 +782,9 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
> void cqm_handle_limbo(struct work_struct *work)
> {
> unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
> - int cpu = smp_processor_id();
> struct rdt_resource *r;
> struct rdt_domain *d;
> + int cpu;
>
> mutex_lock(&rdtgroup_mutex);
>
> @@ -793,8 +793,10 @@ void cqm_handle_limbo(struct work_struct *work)
>
> __check_limbo(d, false);
>
> - if (has_busy_rmid(d))
> + if (has_busy_rmid(d)) {
> + cpu = cpumask_any_housekeeping(&d->cpu_mask);
> schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
> + }
>

ok - but if you do change the CPU the worker is running on then
I also expect d->cqm_work_cpu to be updated. Otherwise the offline
code will not be able to determine if the worker needs to move.

> mutex_unlock(&rdtgroup_mutex);
> }
> @@ -804,7 +806,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
> unsigned long delay = msecs_to_jiffies(delay_ms);
> int cpu;
>
> - cpu = cpumask_any(&dom->cpu_mask);
> + cpu = cpumask_any_housekeeping(&dom->cpu_mask);
> dom->cqm_work_cpu = cpu;
>
> schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
> @@ -814,10 +816,10 @@ void mbm_handle_overflow(struct work_struct *work)
> {
> unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
> struct rdtgroup *prgrp, *crgrp;
> - int cpu = smp_processor_id();
> struct list_head *head;
> struct rdt_resource *r;
> struct rdt_domain *d;
> + int cpu;
>
> mutex_lock(&rdtgroup_mutex);
>
> @@ -838,6 +840,11 @@ void mbm_handle_overflow(struct work_struct *work)
> update_mba_bw(prgrp, d);
> }
>
> + /*
> + * Re-check for housekeeping CPUs. This allows the overflow handler to
> + * move off a nohz_full CPU quickly.
> + */
> + cpu = cpumask_any_housekeeping(&d->cpu_mask);
> schedule_delayed_work_on(cpu, &d->mbm_over, delay);
>

Similar to above I expect a change like this to
be accompanied by an update to d->mbm_work_cpu.

> out_unlock:
> @@ -851,7 +858,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
>
> if (!static_branch_likely(&rdt_mon_enable_key))
> return;
> - cpu = cpumask_any(&dom->cpu_mask);
> + cpu = cpumask_any_housekeeping(&dom->cpu_mask);
> dom->mbm_work_cpu = cpu;
> schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
> }


Reinette

2023-10-03 21:19:30

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The rdt_enable_key is switched when resctrl is mounted, and used to
> prevent a second mount of the filesystem. It also enables the
> architecture's context switch code.
>
> This requires another architecture to have the same set of static-keys,
> as resctrl depends on them too. The existing users of these static-keys
> are implicitly also checking if the filesystem is mounted.
>
> Make the resctrl_mounted checks explicit: resctrl can keep track of
> whether it has been mounted once. This doesn't need to be combined with
> whether the arch code is context switching the CLOSID.
>
> rdt_mon_enable_key is never used just to test that resctrl is mounted,
> but does also have this implication. Add a resctrl_mounted to all uses
> of rdt_mon_enable_key. This will allow rdt_mon_enable_key to be swapped
> with a helper in a subsequent patch.
>
> This will allow the static-key changing to be moved behind resctrl_arch_
> calls.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>
>
> ---

Reviewed-by: Reinette Chatre <[email protected]>

Reinette

2023-10-03 21:19:56

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> resctrl enables three static keys depending on the features it has enabled.
> Another architecture's context switch code may look different, any
> static keys that control it should be buried behind helpers.
>
> Move the alloc/mon logic into arch-specific helpers as a preparatory step
> for making the rdt_enable_key's status something the arch code decides.
>
> This means other architectures don't have to mirror the static keys.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---

Reviewed-by: Reinette Chatre <[email protected]>

Reinette

2023-10-03 21:23:24

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The resctrl architecture specific code may need to free a domain when
> a CPU goes offline, it also needs to reset the CPUs PQR_ASSOC register.
> Amongst other things, the resctrl filesystem code needs to clear this
> CPU from the cpu_mask of any control and monitor groups.
>
> Currently this is all done in core.c and called from
> resctrl_offline_cpu(), making the split between architecture and
> filesystem code unclear.
>
> Move the filesystem work to remove the CPU from the control and monitor
> groups into a filesystem helper called resctrl_offline_cpu(), and rename
> the one in core.c resctrl_arch_offline_cpu().
>
> The rdtgroup_mutex is unlocked and locked again in the call in
> preparation for changing the locking rules for the architecture
> code.

This last paragraph may cause some confusion since this refactoring
is not changing any current locking. I'll defer to you if you prefer
to keep it.

>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---

Reviewed-by: Reinette Chatre <[email protected]>

Reinette

2023-10-04 18:01:39

by Moger, Babu

[permalink] [raw]
Subject: Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

Hi James,

On 9/14/23 12:21, James Morse wrote:
> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>
> While the exit text ends up in the linker script's DISCARD section,
> the direction of travel is for resctrl to be/have loadable modules.
>
> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
> by rdt_get_mon_l3_config().
>
> There is no reason to backport this to a stable kernel.
>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v5:
> * This patch is new
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 1 +
> arch/x86/kernel/cpu/resctrl/monitor.c | 10 ++++++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 5 +++++
> 3 files changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 85ceaf9a31ac..57cf1e6a57bd 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -537,6 +537,7 @@ void closid_free(int closid);
> int alloc_rmid(void);
> void free_rmid(u32 rmid);
> int rdt_get_mon_l3_config(struct rdt_resource *r);
> +void resctrl_exit_mon_l3_config(struct rdt_resource *r);
> bool __init rdt_cpu_has(int flag);
> void mon_event_count(void *info);
> int rdtgroup_mondata_show(struct seq_file *m, void *arg);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index ded1fc7cb7cb..cfb3f632a4b2 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
> return 0;
> }
>
> +void resctrl_exit_mon_l3_config(struct rdt_resource *r)
> +{
> + mutex_lock(&rdtgroup_mutex);
> +
> + kfree(rmid_ptrs);
> + rmid_ptrs = NULL;
> +
> + mutex_unlock(&rdtgroup_mutex);
> +}

What is the need for passing "rdt_resource *r" here?
Is mutex_lock required?

Thanks
Babu

2023-10-05 17:16:22

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

Hi Babu,

On 04/10/2023 19:00, Moger, Babu wrote:
> On 9/14/23 12:21, James Morse wrote:
>> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>>
>> While the exit text ends up in the linker script's DISCARD section,
>> the direction of travel is for resctrl to be/have loadable modules.
>>
>> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
>> by rdt_get_mon_l3_config().
>>
>> There is no reason to backport this to a stable kernel.

>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 85ceaf9a31ac..57cf1e6a57bd 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -537,6 +537,7 @@ void closid_free(int closid);
>> int alloc_rmid(void);
>> void free_rmid(u32 rmid);
>> int rdt_get_mon_l3_config(struct rdt_resource *r);
>> +void resctrl_exit_mon_l3_config(struct rdt_resource *r);
>> bool __init rdt_cpu_has(int flag);
>> void mon_event_count(void *info);
>> int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index ded1fc7cb7cb..cfb3f632a4b2 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
>> return 0;
>> }
>>
>> +void resctrl_exit_mon_l3_config(struct rdt_resource *r)
>> +{
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + kfree(rmid_ptrs);
>> + rmid_ptrs = NULL;
>> +
>> + mutex_unlock(&rdtgroup_mutex);
>> +}

> What is the need for passing "rdt_resource *r" here?

My vain belief that monitors should be supported on something other than L3, but I agree
that isn't what resctrl does today. I'll remove it.


> Is mutex_lock required?

Reads and writes to rmid_ptrs[] are protected by that lock. This ensures no-one reads the
value while it's being free()d, and after this function releases the lock, anyone trying
to read it sees NULL.
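
Roughly, the reader side is (illustrative sketch):

	/* Readers must hold rdtgroup_mutex, so they can't race with the kfree() */
	static struct rmid_entry *__rmid_entry(u32 idx)
	{
		lockdep_assert_held(&rdtgroup_mutex);

		return &rmid_ptrs[idx];
	}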

(This is all moot as its only caller is marked __exit, so gets discarded by the linker)



Thanks,

James

2023-10-05 17:17:51

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

Hi Reinette,

On 02/10/2023 18:00, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 725344048f85..a2158c266e41 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>>
>> void __exit rdtgroup_exit(void)
>> {
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> + if (r->mon_capable)
>> + resctrl_exit_mon_l3_config(r);
>> +
>> debugfs_remove_recursive(debugfs_resctrl);
>> unregister_filesystem(&rdt_fs_type);
>> sysfs_remove_mount_point(fs_kobj, "resctrl");
>
> You did not respond to me when I requested that this be done differently [1].
> Without a response letting me know the faults of my proposal or following the
> recommendation I conclude that my feedback was ignored.

Not so - I just trimmed the bits that didn't need a response. I can respond 'Yes' to each
one if you prefer, but I find that adds more noise than signal.

This is my attempt at 'doing the cleanup properly', which is what you said your preference
was. (no machine on the planet can ever run this code, the __exit section is always
discarded by the linker).

Reading through again, I missed that you wanted this called from resctrl_exit(). (The
naming suggests I did this originally, but it didn't work out).
I don't think this works as the code in resctrl_exit() remains part of the arch code after
the move, but allocating rmid_ptrs[] stays part of the fs code.

resctrl_exit() in core.c gets renamed as resctrl_arch_exit(), and rdtgroup_exit() takes on
the name resctrl_exit() as it's part of the exposed interface.


Thanks,

James

2023-10-05 17:37:09

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow

Hi Reinette,

On 03/10/2023 22:15, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> The limbo and overflow code picks a CPU to use from the domain's list
>> of online CPUs. Work is then scheduled on these CPUs to maintain
>> the limbo list and any counters that may overflow.
>>
>> cpumask_any() may pick a CPU that is marked nohz_full, which will
>> either penalise the work that CPU was dedicated to, or delay the
>> processing of limbo list or counters that may overflow. Perhaps
>> indefinitely. Delaying the overflow handling will skew the bandwidth
>> values calculated by mba_sc, which expects to be called once a second.
>>
>> Add cpumask_any_housekeeping() as a replacement for cpumask_any()
>> that prefers housekeeping CPUs. This helper will still return
>> a nohz_full CPU if that is the only option. The CPU to use is
>> re-evaluated each time the limbo/overflow work runs. This ensures
>> the work will move off a nohz_full CPU once a housekeeping CPU is
>> available.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 0bbed8c62d42..993837e46db1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c

>> @@ -793,8 +793,10 @@ void cqm_handle_limbo(struct work_struct *work)
>>
>> __check_limbo(d, false);
>>
>> - if (has_busy_rmid(d))
>> + if (has_busy_rmid(d)) {
>> + cpu = cpumask_any_housekeeping(&d->cpu_mask);
>> schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
>> + }
>>
>
> ok - but if you do change the CPU the worker is running on then
> I also expect d->cqm_work_cpu to be updated. Otherwise the offline
> code will not be able to determine if the worker needs to move.

Good point - I missed this.


Thanks,

James

2023-10-05 18:04:58

by Reinette Chatre

[permalink] [raw]
Subject: Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

Hi James,

On 10/5/2023 10:05 AM, James Morse wrote:
> On 02/10/2023 18:00, Reinette Chatre wrote:
>> On 9/14/2023 10:21 AM, James Morse wrote:
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 725344048f85..a2158c266e41 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>>>
>>> void __exit rdtgroup_exit(void)
>>> {
>>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +
>>> + if (r->mon_capable)
>>> + resctrl_exit_mon_l3_config(r);
>>> +
>>> debugfs_remove_recursive(debugfs_resctrl);
>>> unregister_filesystem(&rdt_fs_type);
>>> sysfs_remove_mount_point(fs_kobj, "resctrl");
>>
>> You did not respond to me when I requested that this be done differently [1].
>> Without a response letting me know the faults of my proposal or following the
>> recommendation I conclude that my feedback was ignored.
>
> Not so - I just trimmed the bits that didn't need a response. I can respond 'Yes' to each
> one if you prefer, but I find that adds more noise than signal.

I do not expect a response to every review feedback but no response
is assumed to mean that you agree with the feedback.

>
> This is my attempt at 'doing the cleanup properly', which is what you said your preference
> was. (no machine on the planet can ever run this code, the __exit section is always
> discarded by the linker).
>
> Reading through again, I missed that you wanted this called from resctrl_exit(). (The

Right. And not responding to that created the expectation that you agreed
with the request.

> naming suggests I did this originally, but it didn't work out).
> I don't think this works as the code in resctrl_exit() remains part of the arch code after
> the move, but allocating rmid_ptrs[] stays part of the fs code.
>
> resctrl_exit() in core.c gets renamed as resctrl_arch_exit(), and rdtgroup_exit() takes on
> the name resctrl_exit() as it's part of the exposed interface.

I expect memory allocation/free to be symmetrical. Doing otherwise
complicates the code. Having this memory freed in rdtgroup_exit() only
seems appropriate if it is allocated from rdtgroup_init().
Neither rmid_ptrs[] nor closid_num_dirty_rmid are allocated in
rdtgroup_init() so freeing it in rdtgroup_exit() is not appropriate.

If you are planning to move resctrl_exit() to be arch code then I expect
resctrl_late_init() to be split with the rmid_ptrs[]/closid_num_dirty_rmid
allocation moving to fs code. Freeing that memory can follow at that time.

Reinette

2023-10-05 20:14:08

by Moger, Babu

[permalink] [raw]
Subject: Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

Hi James,

On 9/14/2023 12:21 PM, James Morse wrote:
> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
> used for different control groups.
>
> This means once a CLOSID is allocated, all its monitoring ids may still be
> dirty, and held in limbo.
>
> Instead of allocating the first free CLOSID, on architectures where
> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>
> The CLOSID found is returned to closid_alloc() for the free list
> to be updated.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v4:
> * Dropped stale section from comment
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
> arch/x86/kernel/cpu/resctrl/monitor.c | 42 ++++++++++++++++++++++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 +++++++++---
> 3 files changed, 58 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index ad6e874d9ed2..f06d3d3e0808 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> void __init thread_throttle_mode_init(void);
> void __init mbm_config_rftype_init(const char *config);
> void rdt_staged_configs_clear(void);
> +bool closid_allocated(unsigned int closid);
> +int resctrl_find_cleanest_closid(void);
>
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 0c783301d106..0bbed8c62d42 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
> return ERR_PTR(-ENOSPC);
> }
>
> +/**
> + * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
> + * RMID are clean, or the CLOSID that has
> + * the most clean RMID.
> + *
> + * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
> + * may not be able to allocate clean RMID. To avoid this the allocator will
> + * choose the CLOSID with the most clean RMID.
> + *
> + * When the CLOSID and RMID are independent numbers, the first free CLOSID will
> + * be returned.
> + */
> +int resctrl_find_cleanest_closid(void)
> +{
> + u32 cleanest_closid = ~0, iter_num_dirty;

Just naming it num_dirty should have been fine. I will leave it to you.

> + int i = 0;
> +
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> + if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> + return -EIO;
> +
> + for (i = 0; i < closids_supported(); i++) {
> + if (closid_allocated(i))
> + continue;
> +
> + iter_num_dirty = closid_num_dirty_rmid[i];
> + if (iter_num_dirty == 0)
> + return i;
> +
> + if (cleanest_closid == ~0)
> + cleanest_closid = i;
> +
> + if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
> + cleanest_closid = i;
> + }
> +
> + if (cleanest_closid == ~0)
> + return -ENOSPC;
> + return cleanest_closid;

A blank line before the return would look cleaner:

	if (cleanest_closid == ~0)
		return -ENOSPC;

	return cleanest_closid;

Thanks
Babu

> +}
> +
> /*
> * For MPAM the RMID value is not unique, and has to be considered with
> * the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index fa449ee0d1a7..1f8f1c417a4b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -132,11 +132,20 @@ static void closid_init(void)
>
> static int closid_alloc(void)
> {
> - u32 closid = ffs(closid_free_map);
> + u32 closid;
> + int err;
>
> - if (closid == 0)
> - return -ENOSPC;
> - closid--;
> + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> + err = resctrl_find_cleanest_closid();
> + if (err < 0)
> + return err;
> + closid = err;
> + } else {
> + closid = ffs(closid_free_map);
> + if (closid == 0)
> + return -ENOSPC;
> + closid--;
> + }
> clear_bit(closid, &closid_free_map);
>
> return closid;
> @@ -154,7 +163,7 @@ void closid_free(int closid)
> * Return: true if @closid is currently associated with a resource group,
> * false if @closid is free
> */
> -static bool closid_allocated(unsigned int closid)
> +bool closid_allocated(unsigned int closid)
> {
> return !test_bit(closid, &closid_free_map);
> }

2023-10-05 20:27:09

by Moger, Babu

[permalink] [raw]
Subject: Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

Hi James,

One more comment.

On 9/14/2023 12:21 PM, James Morse wrote:
> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
> used for different control groups.
>
> This means once a CLOSID is allocated, all its monitoring ids may still be
> dirty, and held in limbo.
>
> Instead of allocating the first free CLOSID, on architectures where
> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>
> The CLOSID found is returned to closid_alloc() for the free list
> to be updated.
>
> Reviewed-by: Shaopeng Tan <[email protected]>
> Tested-by: Shaopeng Tan <[email protected]>
> Tested-By: Peter Newman <[email protected]>
> Signed-off-by: James Morse <[email protected]>
> ---
> Changes since v4:
> * Dropped stale section from comment
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
> arch/x86/kernel/cpu/resctrl/monitor.c | 42 ++++++++++++++++++++++++++
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 +++++++++---
> 3 files changed, 58 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index ad6e874d9ed2..f06d3d3e0808 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> void __init thread_throttle_mode_init(void);
> void __init mbm_config_rftype_init(const char *config);
> void rdt_staged_configs_clear(void);
> +bool closid_allocated(unsigned int closid);
> +int resctrl_find_cleanest_closid(void);
>
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 0c783301d106..0bbed8c62d42 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
> return ERR_PTR(-ENOSPC);
> }
>
> +/**
> + * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
> + * RMID are clean, or the CLOSID that has
> + * the most clean RMID.
> + *
> + * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
> + * may not be able to allocate clean RMID. To avoid this the allocator will
> + * choose the CLOSID with the most clean RMID.
> + *
> + * When the CLOSID and RMID are independent numbers, the first free CLOSID will
> + * be returned.
> + */
> +int resctrl_find_cleanest_closid(void)
> +{
> + u32 cleanest_closid = ~0, iter_num_dirty;
> + int i = 0;
> +
> + lockdep_assert_held(&rdtgroup_mutex);
> +
> + if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> + return -EIO;
> +
> + for (i = 0; i < closids_supported(); i++) {
> + if (closid_allocated(i))
> + continue;
> +
> + iter_num_dirty = closid_num_dirty_rmid[i];
> + if (iter_num_dirty == 0)
> + return i;
> +
> + if (cleanest_closid == ~0)
> + cleanest_closid = i;
> +
> + if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
> + cleanest_closid = i;
> + }
> +
> + if (cleanest_closid == ~0)
> + return -ENOSPC;
> + return cleanest_closid;
> +}
> +
> /*
> * For MPAM the RMID value is not unique, and has to be considered with
> * the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index fa449ee0d1a7..1f8f1c417a4b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -132,11 +132,20 @@ static void closid_init(void)
>
> static int closid_alloc(void)
> {
> - u32 closid = ffs(closid_free_map);
> + u32 closid;
> + int err;

Naming "err" seems odd here. How about cleanest_closid?

Thanks

Babu

>
> - if (closid == 0)
> - return -ENOSPC;
> - closid--;
> + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> + err = resctrl_find_cleanest_closid();
> + if (err < 0)
> + return err;
> + closid = err;
> + } else {
> + closid = ffs(closid_free_map);
> + if (closid == 0)
> + return -ENOSPC;
> + closid--;
> + }
> clear_bit(closid, &closid_free_map);
>
> return closid;
> @@ -154,7 +163,7 @@ void closid_free(int closid)
> * Return: true if @closid is currently associated with a resource group,
> * false if @closid is free
> */
> -static bool closid_allocated(unsigned int closid)
> +bool closid_allocated(unsigned int closid)
> {
> return !test_bit(closid, &closid_free_map);
> }

2023-10-24 12:07:11

by Maciej Wieczor-Retman

[permalink] [raw]
Subject: Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

On 2023-09-14 at 17:21:24 +0000, James Morse wrote:
>MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
>used for different control groups.
>
>This means once a CLOSID is allocated, all its monitoring ids may still be
>dirty, and held in limbo.
>
>Instead of allocating the first free CLOSID, on architectures where
>CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search

"CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID" ->
"CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID"?

>closid_num_dirty_rmid[] to find the cleanest CLOSID.
>
>The CLOSID found is returned to closid_alloc() for the free list
>to be updated.

--
Kind regards
Maciej Wieczór-Retman

2023-10-25 17:57:04

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

Hi Babu,

On 05/10/2023 21:26, Moger, Babu wrote:
> On 9/14/2023 12:21 PM, James Morse wrote:
>> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
>> used for different control groups.
>>
>> This means once a CLOSID is allocated, all its monitoring ids may still be
>> dirty, and held in limbo.
>>
>> Instead of allocating the first free CLOSID, on architectures where
>> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
>> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>>
>> The CLOSID found is returned to closid_alloc() for the free list
>> to be updated.


>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index fa449ee0d1a7..1f8f1c417a4b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -132,11 +132,20 @@ static void closid_init(void)
>>     static int closid_alloc(void)
>>   {
>> -    u32 closid = ffs(closid_free_map);
>> +    u32 closid;
>> +    int err;

> Naming "err" seems odd here.  How about cleanest_closid ?

That's just habit because the value might be an error code until it's been checked, and
once it has, it's called 'closid'. But sure, if you think that is clearer.
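
i.e., roughly (sketch):

	static int closid_alloc(void)
	{
		int cleanest_closid;
		u32 closid;

		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
			cleanest_closid = resctrl_find_cleanest_closid();
			if (cleanest_closid < 0)
				return cleanest_closid;
			closid = cleanest_closid;
		} else {
			closid = ffs(closid_free_map);
			if (closid == 0)
				return -ENOSPC;
			closid--;
		}
		clear_bit(closid, &closid_free_map);

		return closid;
	}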


Thanks,

James

2023-10-25 17:57:04

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid

Hi Babu,

On 05/10/2023 21:13, Moger, Babu wrote:
> On 9/14/2023 12:21 PM, James Morse wrote:
>> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
>> used for different control groups.
>>
>> This means once a CLOSID is allocated, all its monitoring ids may still be
>> dirty, and held in limbo.
>>
>> Instead of allocating the first free CLOSID, on architectures where
>> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
>> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>>
>> The CLOSID found is returned to closid_alloc() for the free list
>> to be updated.

>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index ad6e874d9ed2..f06d3d3e0808 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>   void __init thread_throttle_mode_init(void);
>>   void __init mbm_config_rftype_init(const char *config);
>>   void rdt_staged_configs_clear(void);
>> +bool closid_allocated(unsigned int closid);
>> +int resctrl_find_cleanest_closid(void);
>>     #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 0c783301d106..0bbed8c62d42 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
>>       return ERR_PTR(-ENOSPC);
>>   }
>>   +/**
>> + * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
>> + *                                  RMID are clean, or the CLOSID that has
>> + *                                  the most clean RMID.
>> + *
>> + * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
>> + * may not be able to allocate clean RMID. To avoid this the allocator will
>> + * choose the CLOSID with the most clean RMID.
>> + *
>> + * When the CLOSID and RMID are independent numbers, the first free CLOSID will
>> + * be returned.
>> + */
>> +int resctrl_find_cleanest_closid(void)
>> +{
>> +    u32 cleanest_closid = ~0, iter_num_dirty;
>
> Just naming num_dirty should have been fine.  I will leave it you.

That was to make it obvious it's something to do with the loop, so the value can't be
relied on outside it. I'll rename it and move the declaration into the loop; that way
it's out of scope if someone tries to use it later.
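
i.e., the loop becomes roughly:

	for (i = 0; i < closids_supported(); i++) {
		u32 num_dirty;

		if (closid_allocated(i))
			continue;

		num_dirty = closid_num_dirty_rmid[i];
		if (num_dirty == 0)
			return i;

		if (cleanest_closid == ~0)
			cleanest_closid = i;

		if (num_dirty < closid_num_dirty_rmid[cleanest_closid])
			cleanest_closid = i;
	}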


>> +    int i = 0;
>> +
>> +    lockdep_assert_held(&rdtgroup_mutex);
>> +
>> +    if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> +        return -EIO;
>> +
>> +    for (i = 0; i < closids_supported(); i++) {
>> +        if (closid_allocated(i))
>> +            continue;
>> +
>> +        iter_num_dirty = closid_num_dirty_rmid[i];
>> +        if (iter_num_dirty == 0)
>> +            return i;
>> +
>> +        if (cleanest_closid == ~0)
>> +            cleanest_closid = i;
>> +
>> +        if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
>> +            cleanest_closid = i;
>> +    }
>> +
>> +    if (cleanest_closid == ~0)
>> +        return -ENOSPC;
>> +    return cleanest_closid;
>
> Line before the return looks clean.
>
> *       if (cleanest_closid == ~0)
> +        return -ENOSPC;
> +
> +    return cleanest_closid;

Sure,


Thanks,

James

2023-10-25 17:57:23

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()

Hi Reinette,

On 05/10/2023 19:04, Reinette Chatre wrote:
> On 10/5/2023 10:05 AM, James Morse wrote:
>> On 02/10/2023 18:00, Reinette Chatre wrote:
>>> On 9/14/2023 10:21 AM, James Morse wrote:
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 725344048f85..a2158c266e41 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>>>>
>>>> void __exit rdtgroup_exit(void)
>>>> {
>>>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>>> +
>>>> + if (r->mon_capable)
>>>> + resctrl_exit_mon_l3_config(r);
>>>> +
>>>> debugfs_remove_recursive(debugfs_resctrl);
>>>> unregister_filesystem(&rdt_fs_type);
>>>> sysfs_remove_mount_point(fs_kobj, "resctrl");
>>>
>>> You did not respond to me when I requested that this be done differently [1].
>>> Without a response letting me know the faults of my proposal or following the
>>> recommendation I conclude that my feedback was ignored.
>>
>> Not so - I just trimmed the bits that didn't need a response. I can respond 'Yes' to each
>> one if you prefer, but I find that adds more noise than signal.
>
> I do not expect a response to every review feedback but no response
> is assumed to mean that you agree with the feedback.
>
>>
>> This is my attempt at 'doing the cleanup properly', which is what you said your preference
>> was. (no machine on the planet can ever run this code, the __exit section is always
>> discarded by the linker).
>>
>> Reading through again, I missed that you wanted this called from resctrl_exit(). (The
>
> Right. And not responding to that created expectation that you agreed with the
> request.
>
>> naming suggests I did this originally, but it didn't work out).
>> I don't think this works as the code in resctrl_exit() remains part of the arch code after
>> the move, but allocating rmid_ptrs[] stays part of the fs code.
>>
>> resctrl_exit() in core.c gets renamed as resctrl_arch_exit(), and rdtgroup_exit() takes on
>> the name resctrl_exit() as it's part of the exposed interface.
>
> I expect memory allocation/free to be symmetrical. Doing otherwise
> complicates the code. Having this memory freed in rdtgroup_exit() only
> seems appropriate if it is allocated from rdtgroup_init().
> Neither rmid_ptrs[] nor closid_num_dirty_rmid are allocated in
> rdtgroup_init() so freeing it in rdtgroup_exit() is not appropriate.

It probably makes more sense when you see how things get split up. I was trying to reduce
the churn of adding something in one place, then moving it later.

For now I've added all the functions to make this thing symmetric.
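
e.g. dom_data_init() now has a matching dom_data_exit() called from the
same layer (sketch; the name may yet change):

	static void dom_data_exit(void)
	{
		mutex_lock(&rdtgroup_mutex);

		kfree(rmid_ptrs);
		rmid_ptrs = NULL;

		mutex_unlock(&rdtgroup_mutex);
	}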



James


> If you are planning to move resctrl_exit() to be arch code then I expect
> resctrl_late_init() to be split with the rmid_ptrs[]/closid_num_dirty_rmid
> allocation moving to fs code. Freeing that memory can follow at that time.

2023-10-25 17:58:08

by James Morse

[permalink] [raw]
Subject: Re: [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work

Hi Reinette,

On 03/10/2023 22:23, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> The resctrl architecture specific code may need to free a domain when
>> a CPU goes offline, it also needs to reset the CPUs PQR_ASSOC register.
>> Amongst other things, the resctrl filesystem code needs to clear this
>> CPU from the cpu_mask of any control and monitor groups.
>>
>> Currently this is all done in core.c and called from
>> resctrl_offline_cpu(), making the split between architecture and
>> filesystem code unclear.
>>
>> Move the filesystem work to remove the CPU from the control and monitor
>> groups into a filesystem helper called resctrl_offline_cpu(), and rename
>> the one in core.c resctrl_arch_offline_cpu().
>>
>> The rdtgroup_mutex is unlocked and locked again in the call in
>> preparation for changing the locking rules for the architecture
>> code.
>
> This last paragraph may cause some confusion since this refactoring
> is not changing any current locking. I'll defer to you if you prefer
> to keep it.

Hmm, that is referring to an earlier version that looked funny and that I felt needed
explanation. I've removed that paragraph.


>> Reviewed-by: Shaopeng Tan <[email protected]>
>> Tested-by: Shaopeng Tan <[email protected]>
>> Tested-By: Peter Newman <[email protected]>
>> Signed-off-by: James Morse <[email protected]>
>> ---
>
> Reviewed-by: Reinette Chatre <[email protected]>


Thanks!

James